OP #1 2020-03-08 18:36:24

firstman
Member
Registered: 2019-04-06
Posts: 279
Points: 279

C on Linux has built-in regular expression support, no need to hunt around for a library.

https://stackoverflow.com/questions/9656161/why-regexec-of-c-does-not-match-this-pattern-but-match-of-javascript-works

#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void) {
  int r;
  regex_t reg;
  regmatch_t match[2];
  const char *line = "----------------------- Page 1-----------------------";

  /* regcomp() returns nonzero if the pattern fails to compile */
  r = regcomp(&reg, "[-]{23}[ ]*Page[ ]*([0-9]*)[-]{23}", REG_ICASE | REG_EXTENDED);
  /*                                    ^------^ capture page number */
  if (r != 0) {
    printf("regcomp failed (%d)\n", r);
    return 1;
  }

  r = regexec(&reg, line, 2, match, 0);
  if (r == 0) {
    printf("Match!\n");
    /* %.*s takes its precision as an int, so cast the regoff_t difference */
    printf("0: [%.*s]\n", (int)(match[0].rm_eo - match[0].rm_so), line + match[0].rm_so);
    printf("1: [%.*s]\n", (int)(match[1].rm_eo - match[1].rm_so), line + match[1].rm_so);
  } else {
    printf("NO match!\n");
  }

  regfree(&reg);  /* release the memory allocated by regcomp() */
  return 0;
}

This compiles and runs as-is.

It will probably not build with Visual C++ on Windows, but it should work with MinGW; I have not tested this.

Output:

Match!
0: [----------------------- Page 1-----------------------]
1: [1]

OP #2 2020-03-08 18:37:52

firstman

Re: C on Linux has built-in regular expression support, no need to hunt around for a library.

https://www.mitchr.me/SS/exampleCode/AUPG/regex_example.c.html

/* -*- Mode:C; Coding:us-ascii-unix; fill-column:132 -*- */
/**********************************************************************************************************************************/
/**
   @file      regex_example.c
   @author    Mitch Richling <https://www.mitchr.me/>
   @Copyright Copyright 1994,2014 by Mitch Richling.  All rights reserved.
   @brief     UNIX regex tools@EOL
   @Keywords  UNIX regular expressions regex
   @Std       ISOC POSIX.2 (IEEE Std 1003.2) BSD4.3
   @Tested    
              - Solaris 2.8
              - MacOS X.2
              - Linux (RH 7.3)

   This is an example program intended to illustrate very basic use of regular expressions.
  
   Grumpy programmer note: IEEE Std 1003.2, generally referred to as 'POSIX.2' is a bit vague regarding several details like how
   back references work.  It also has a couple of errors (like how a single ')' is treated in a regular expression).  Because of
   this, most actual implementations of the standard will have several minor inconsistencies that one must watch out for.  My best
   advice is to "read the man page" on the platforms you wish to run on and to avoid exotic things.  For example, avoid things like
   the BSD REG_NOSPEC and REG_PEND options.  Another option is to simply carry your favorite regular expression library with you.
   For example, C++11 has very good regex support, and the BOOST library has a very nice regex class for older C++ versions.  PCRE
   is probably the most popular alternative, FOSS regular expression library available.
***********************************************************************************************************************************/

#include <sys/types.h>          /* UNIX types      POSIX */
#include <regex.h>              /* Regular Exp     POSIX */
#include <stdio.h>              /* I/O lib         C89   */
#include <string.h>             /* Strings         C89   */
#include <stdlib.h>             /* Standard Lib    C89   */

/**********************************************************************************************************************************/
#define MAX_SUB_EXPR_CNT 256
#define MAX_SUB_EXPR_LEN 256
#define MAX_ERR_STR_LEN  256

/**********************************************************************************************************************************/
int main(int argc, char *argv[]) {
  int i;                                /* Loop variable.                          */
  char p[MAX_SUB_EXPR_LEN];             /* For string manipulation                 */
  regex_t aCmpRegex;                    /* Our compiled regex                      */
  char *aStrRegex;                      /* Pointer to the string holding the regex */
  regmatch_t pMatch[MAX_SUB_EXPR_CNT];  /* Hold partial matches.                   */
  char **aLineToMatch;                  /* Holds each line that we wish to match   */
  int result;                           /* Return from regcomp() and regexec()     */
  char outMsgBuf[MAX_ERR_STR_LEN];      /* Holds error messages from regerror()    */
  char *testStrings[] = { "This should match... hello",
                          "This could match... hello!",
                          "More than one hello.. hello",
                          "No chance of a match...",
                          NULL};

  /* use aStrRegex for readability. */
  aStrRegex = "(.*)(hello)+";
  printf("Regex to use: %s\n", aStrRegex);

  /* Compile the regex */
  if( (result = regcomp(&aCmpRegex, aStrRegex, REG_EXTENDED)) ) {
    printf("Error compiling regex(%d).\n", result);
    regerror(result, &aCmpRegex, outMsgBuf, sizeof(outMsgBuf));
    printf("Error msg: %s\n", outMsgBuf);
    exit(1);
  } /* end if */

  /*  Possible last argument to regcomp (|'ed together):
        REG_EXTENDED  Use extended regular expressions
        REG_BASIC     Use basic regular expressions
        REG_NOSPEC    Special character support off (Not POSIX.2)
        REG_ICASE     Ignore upper/lower case distinctions
        REG_NOSUB     No sub-strings (just check for match/no match)
        REG_NEWLINE   Compile for newline-sensitive matching
        REG_PEND      Specify alternate string ending (Not POSIX.2) */


  /* Apply our regular expression to some strings. */
  for(aLineToMatch=testStrings; *aLineToMatch != NULL; aLineToMatch++) {
    printf("String: %s\n", *aLineToMatch);
    printf("        %s\n", "0123456789012345678901234567890123456789");
    printf("        %s\n", "0         1         2         3");
    /* compare and check result (MAX_SUB_EXPR_CNT max sub-expressions).*/
    if( !(result = regexec(&aCmpRegex, *aLineToMatch, MAX_SUB_EXPR_CNT, pMatch, 0)) ) {
      /* Last argument to regexec (|'ed together):
         REG_NOTBOL    Start of the string is NOT the start of a line
         REG_NOTEOL    $ shouldn't match end of string (gotta have a newline)
         REG_STARTEND  Not POSIX.2 */
      printf("Result: We have a match!\n");
      for(i=0;i<=(int)aCmpRegex.re_nsub;i++) {
        printf("Match(%2d/%2d): (%2d,%2d): ", 
               i, 
               (int)(aCmpRegex.re_nsub), 
               (int)(pMatch[i].rm_so), 
               (int)(pMatch[i].rm_eo));

          if( (pMatch[i].rm_so >= 0) && (pMatch[i].rm_eo >= 1) && 
              (pMatch[i].rm_so != pMatch[i].rm_eo) ) {
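            /* Clamp the copy length so a long match cannot overflow p. */
            size_t len = (size_t)(pMatch[i].rm_eo-pMatch[i].rm_so);
            if(len > sizeof(p)-1) len = sizeof(p)-1;
            strncpy(p, &((*aLineToMatch)[pMatch[i].rm_so]), len);
            p[len] = '\0';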
            printf("'%s'", p);
          } /* end if */
          printf("\n");
      } /* end for */
      printf("\n");
    } else {
      switch(result) {
        case REG_NOMATCH   : printf("String did not match the pattern\n");                   break;
        ////Some typical return codes:
        //case REG_BADPAT    : printf("invalid regular expression\n");                         break;
        //case REG_ECOLLATE  : printf("invalid collating element\n");                          break;
        //case REG_ECTYPE    : printf("invalid character class\n");                            break;
        //case REG_EESCAPE   : printf("`\' applied to unescapable character\n");               break;
        //case REG_ESUBREG   : printf("invalid backreference number\n");                       break;
        //case REG_EBRACK    : printf("brackets `[ ]' not balanced\n");                        break;
        //case REG_EPAREN    : printf("parentheses `( )' not balanced\n");                     break;
        //case REG_EBRACE    : printf("braces `{ }' not balanced\n");                          break;
        //case REG_BADBR     : printf("invalid repetition count(s) in `{ }'\n");               break;
        //case REG_ERANGE    : printf("invalid character range in `[ ]'\n");                   break;
        //case REG_ESPACE    : printf("Ran out of memory\n");                                  break;
        //case REG_BADRPT    : printf("`?', `*', or `+' operand invalid\n");                   break;
        //case REG_EMPTY     : printf("empty (sub)expression\n");                              break;
        //case REG_ASSERT    : printf("can't happen - you found a bug\n");                     break;
        //case REG_INVARG    : printf("A bad option was passed\n");                            break;
        //case REG_ILLSEQ    : printf("illegal byte sequence\n");                              break;
        default              : printf("Unknown error\n");                                      break;
      } /* end switch */
      regerror(result, &aCmpRegex, outMsgBuf, sizeof(outMsgBuf));
      printf("Result: Error msg: %s\n\n", outMsgBuf);
    } /* end if/else */
  } /* end for */
  
  /* Free up resources for the regular expression */
  regfree(&aCmpRegex);

  exit(0);
} /* end func main */

Another demo. Output:

Regex to use: (.*)(hello)+
String: This should match... hello
        0123456789012345678901234567890123456789
        0         1         2         3
Result: We have a match!
Match( 0/ 2): ( 0,26): 'This should match... hello'
Match( 1/ 2): ( 0,21): 'This should match... '
Match( 2/ 2): (21,26): 'hello'

String: This could match... hello!
        0123456789012345678901234567890123456789
        0         1         2         3
Result: We have a match!
Match( 0/ 2): ( 0,25): 'This could match... hello'
Match( 1/ 2): ( 0,20): 'This could match... '
Match( 2/ 2): (20,25): 'hello'

String: More than one hello.. hello
        0123456789012345678901234567890123456789
        0         1         2         3
Result: We have a match!
Match( 0/ 2): ( 0,27): 'More than one hello.. hello'
Match( 1/ 2): ( 0,22): 'More than one hello.. '
Match( 2/ 2): (22,27): 'hello'

String: No chance of a match...
        0123456789012345678901234567890123456789
        0         1         2         3
String did not match the pattern
Result: Error msg: No match
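
Note that each regexec() call reports only one match, which is why "More than one hello.. hello" produces a single result above. To visit every occurrence, one common approach is to call regexec() again starting just past the previous match. Below is a minimal sketch of that idea; the helper name count_matches is my own, not part of <regex.h>, and it assumes an extended regex and non-overlapping matches.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): counts non-overlapping
   matches of an extended regex in text.
   Returns -1 if the pattern does not compile. */
int count_matches(const char *pattern, const char *text) {
  regex_t re;
  regmatch_t m[1];
  int count = 0;
  int flags = 0;
  const char *cursor = text;

  assert(pattern != NULL && text != NULL);
  if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
    return -1;
  }

  /* Offsets in m[0] are relative to cursor, not to text. */
  while (regexec(&re, cursor, 1, m, flags) == 0) {
    count++;
    if (m[0].rm_eo == m[0].rm_so) {
      /* Empty match: step over one character so the loop makes progress. */
      if (cursor[m[0].rm_eo] == '\0') {
        break;
      }
      cursor += m[0].rm_eo + 1;
    } else {
      cursor += m[0].rm_eo;
    }
    /* Later searches no longer start at the beginning of the line. */
    flags = REG_NOTBOL;
  }

  regfree(&re);
  return count;
}
```

For example, count_matches("hello", "More than one hello.. hello") returns 2. With the demo's own pattern "(.*)(hello)+" the count would still be 1, because the greedy .* swallows everything up to the last hello.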


OP #3 2020-03-08 18:40:11

firstman

Re: C on Linux has built-in regular expression support, no need to hunt around for a library.

https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_74/rtref/regexec.htm

A demo from IBM:

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
 
int main(void)
{
   regex_t    preg;
   char       *string = "a very simple simple simple string";
   char       *pattern = "\\(sim[a-z]le\\) \\1";
   int        rc;
   size_t     nmatch = 2;
   regmatch_t pmatch[2];
 
   if (0 != (rc = regcomp(&preg, pattern, 0))) {
      printf("regcomp() failed, returning nonzero (%d)\n", rc);
      exit(EXIT_FAILURE);
   }
 
   if (0 != (rc = regexec(&preg, string, nmatch, pmatch, 0))) {
      printf("Failed to match '%s' with '%s',returning %d.\n",
             string, pattern, rc);
   }
   else {
      /* regoff_t is not guaranteed to be int, so cast for %.*s and %d */
      printf("With the whole expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             (int)(pmatch[0].rm_eo - pmatch[0].rm_so), &string[pmatch[0].rm_so],
             (int)pmatch[0].rm_so, (int)pmatch[0].rm_eo - 1);
      printf("With the sub-expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             (int)(pmatch[1].rm_eo - pmatch[1].rm_so), &string[pmatch[1].rm_so],
             (int)pmatch[1].rm_so, (int)pmatch[1].rm_eo - 1);
   }
   regfree(&preg);
   return 0;
 
   /****************************************************************************
      The output should be similar to :
 
      With the whole expression, a matched substring "simple simple" is found
      at position 7 to 19.
      With the sub-expression, a matched substring "simple" is found
      at position 7 to 12.
   ****************************************************************************/
}

Output:

With the whole expression, a matched substring "simple simple" is found at position 7 to 19.
With the sub-expression, a matched substring "simple" is found at position 7 to 12.


OP #5 2020-03-08 18:50:55

firstman

Re: C on Linux has built-in regular expression support, no need to hunt around for a library.

Building on the first post, here is a simple routine for parsing an AT command response string:

#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(void) {
  int r, i;
  regex_t reg;
  regmatch_t match[7];
  const char *line = "+CCLK: \"20/03/08,17:31:52+32\"";

  /* Six capture groups, each exactly two digits */
  r = regcomp(&reg, "\\+CCLK: \"([0-9]{2})/([0-9]{2})/([0-9]{2}),([0-9]{2}):([0-9]{2}):([0-9]{2})", REG_ICASE | REG_EXTENDED);
  if (r != 0) {
    printf("regcomp failed (%d)\n", r);
    return 1;
  }

  r = regexec(&reg, line, 7, match, 0);

  if (r == 0) {
    printf("Match!\n");
    /* match[0] is the whole match; match[1..6] are the capture groups */
    for (i = 0; i < 7; i++) {
      printf("%d: [%.*s]\n", i, (int)(match[i].rm_eo - match[i].rm_so), line + match[i].rm_so);
    }
  } else {
    printf("NO match!\n");
  }

  regfree(&reg);  /* release the memory allocated by regcomp() */
  return 0;
}

Output:

Match!
0: [+CCLK: "20/03/08,17:31:52]
1: [20]
2: [03]
3: [08]
4: [17]
5: [31]
6: [52]
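
The program above only prints the captured text. If the fields are needed as numbers, the same pattern can be wrapped in a function that fills in a struct. This is only a sketch: the struct cclk_time type, its field names, and parse_cclk are names I made up, and reading the six groups as year/month/day/hour/minute/second is my assumption about the +CCLK reply format.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical container for the six captured fields. */
struct cclk_time {
  int year, month, day, hour, minute, second;
};

/* Hypothetical helper: parses a +CCLK reply line into out.
   Returns 0 on success, -1 if regcomp() fails or the line does not match. */
int parse_cclk(const char *line, struct cclk_time *out) {
  regex_t reg;
  regmatch_t m[7];
  int *fields[6];
  int i, rc;

  assert(line != NULL && out != NULL);
  fields[0] = &out->year;
  fields[1] = &out->month;
  fields[2] = &out->day;
  fields[3] = &out->hour;
  fields[4] = &out->minute;
  fields[5] = &out->second;

  if (regcomp(&reg,
              "\\+CCLK: \"([0-9]{2})/([0-9]{2})/([0-9]{2}),"
              "([0-9]{2}):([0-9]{2}):([0-9]{2})",
              REG_EXTENDED) != 0) {
    return -1;
  }
  rc = regexec(&reg, line, 7, m, 0);
  regfree(&reg);
  if (rc != 0) {
    return -1;
  }

  for (i = 0; i < 6; i++) {
    /* The pattern guarantees each group is exactly two decimal digits. */
    const char *s = line + m[i + 1].rm_so;
    *fields[i] = (s[0] - '0') * 10 + (s[1] - '0');
  }
  return 0;
}
```

Called with the line from the post, this fills in 20, 3, 8, 17, 31, 52. The year stays two-digit, and the trailing "+32" is ignored just as in the original pattern.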


OP #6 2020-03-08 19:15:47

firstman

Re: C on Linux has built-in regular expression support, no need to hunt around for a library.

https://stackoverflow.com/questions/7899119/what-does-s-mean-in-printf

#include <stdio.h>

/* Prints at most str_len characters of str, so str need not be NUL-terminated at that point */
void f(const char *str, int str_len)
{
  printf("%.*s\n", str_len, str);
}

int main() {
    int precision = 8;
    int biggerPrecision = 16;
    const char *greetings = "Hello world";

    printf("|%.8s|\n", greetings);
    printf("|%.*s|\n", precision , greetings);
    printf("|%16s|\n", greetings);
    printf("|%*s|\n", biggerPrecision , greetings);

    return 0;
}

Output:

|Hello wo|
|Hello wo|
|     Hello world|
|     Hello world|

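An unexpected bonus: a handy printf feature. With %.*s the precision is taken from an int argument, which is how the regex examples earlier in this thread print a matched substring without copying it first.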
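
The same format also works with snprintf() when the matched text needs to be kept rather than printed. A small sketch, assuming a regmatch_t already filled in by regexec(); copy_match is a hypothetical helper name, not something <regex.h> provides.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: copies the text described by *m out of subject
   into dst, truncating if dst is too small (dst is always NUL-terminated).
   Returns the length of the matched text, or -1 if the group did not
   take part in the match (rm_so is -1 in that case). */
int copy_match(char *dst, size_t dst_size, const char *subject,
               const regmatch_t *m) {
  assert(dst != NULL && dst_size > 0 && subject != NULL && m != NULL);
  if (m->rm_so < 0) {
    dst[0] = '\0';
    return -1;
  }
  /* The precision limits how many characters %s reads from subject. */
  return snprintf(dst, dst_size, "%.*s",
                  (int)(m->rm_eo - m->rm_so), subject + m->rm_so);
}
```

A return value greater than or equal to dst_size means the copy was truncated, following the usual snprintf() convention.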

离线

OP #8 2020-03-08 22:21:35

firstman

Re: C on Linux has built-in regular expression support, no need to hunt around for a library.

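OggyJFX wrote:

I have always wondered: are regular expressions a standardized thing, or does every piece of software have its own rules?

The rules are broadly the same everywhere, with minor differences: https://tool.oschina.net/uploads/apidocs/jquery/regexp.html

Some libraries do offer more, though. The Java and Python regex libraries also provide substitution, for example, which this C library does not.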
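
Substitution can still be built by hand from the offsets that regexec() returns. Here is a minimal sketch of replacing only the first match with a literal string; replace_first is a hypothetical name of my own, and there is no support for back-references such as \1 in the replacement text.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of text in which the
   first match of the extended regex pattern is replaced by repl (taken
   literally). If nothing matches, an unchanged copy is returned.
   Returns NULL if regcomp() or malloc() fails. The caller frees the result. */
char *replace_first(const char *pattern, const char *text, const char *repl) {
  regex_t re;
  regmatch_t m[1];
  size_t text_len, repl_len, head, tail;
  char *out;
  int rc;

  assert(pattern != NULL && text != NULL && repl != NULL);
  if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
    return NULL;
  }
  rc = regexec(&re, text, 1, m, 0);
  regfree(&re);

  text_len = strlen(text);
  if (rc != 0) {
    /* No match: hand back a plain copy, including the terminating NUL. */
    out = malloc(text_len + 1);
    if (out != NULL) {
      memcpy(out, text, text_len + 1);
    }
    return out;
  }

  repl_len = strlen(repl);
  head = (size_t)m[0].rm_so;            /* bytes before the match */
  tail = text_len - (size_t)m[0].rm_eo; /* bytes after the match  */
  out = malloc(head + repl_len + tail + 1);
  if (out == NULL) {
    return NULL;
  }
  memcpy(out, text, head);
  memcpy(out + head, repl, repl_len);
  /* tail + 1 also copies the terminating NUL. */
  memcpy(out + head + repl_len, text + m[0].rm_eo, tail + 1);
  return out;
}
```

For example, replace_first("[0-9]+", "Page 12 of 30", "N") gives "Page N of 30". A replace-all version would loop in the same way, continuing the search just past each match.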
