• Regular Expressions for Data Analysis



    Basic matching rules

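    Symbol     Example           Description
    literal    hello             Matches the literal characters
    re1|re2    he|she            Matches either re1 or re2
    .          a.b               Matches any single character except \n
    [x-y]      [A-Z]             Matches one character within the given range
    [^…]       [^abc], [^a-z]    Matches one character that is not in the set
    ^          ^hello world      Matches only at the start of the string
    $          hello world$      Matches only at the end of the string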
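The rules in the table above can be tried out with Python's re module. The following is a minimal sketch; the sample strings are made up for illustration and are not from the original article.

```python
import re

# Literal match: "hello" is found wherever it appears.
literal = re.findall(r"hello", "hello world, hello regex")

# Alternation: matches either "he" or "she".
alternation = re.findall(r"he|she", "he said she left")

# "." matches any single character except a newline,
# so "a\nb" is not matched.
dot = re.findall(r"a.b", "acb a-b a\nb")

# A range matches one character inside it; a negated set
# matches one character outside it.
upper = re.findall(r"[A-Z]", "Data Analysis")
not_abc = re.findall(r"[^abc]", "aXbYc")

# Anchors: "^" pins the match to the start, "$" to the end.
starts = re.search(r"^hello world", "hello world!") is not None
ends = re.search(r"hello world$", "say hello world") is not None
```

Note that findall returns every non-overlapping match as a list of strings, which is convenient for checking what a pattern actually picks up.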

    Controlling the number of repetitions

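    Symbol    Example          Description
    *         [A-Za-z0-9]*     Matches zero or more repetitions of the preceding expression (here, letters or digits)
    +         [a-z]+\.com      Matches one or more repetitions of the preceding expression (here, lowercase letters)
    ?         [hello]?         Matches zero or one occurrence of the preceding expression (here, one of the characters h, e, l, o)
    {N}       [0-9]{3}         Matches exactly N repetitions (here, three digits)
    {M,N}     [0-9]{1,3}       Matches between M and N repetitions of the preceding expression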
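The quantifiers above can be checked with a short Python sketch. The sample text, the `\b` word boundaries, and the `colou?r` pattern are assumptions added for illustration; they are not part of the original table.

```python
import re

text = "id=7 code=123 site=example.com ip=10.0.0.255"

# "+" needs at least one lowercase letter before ".com".
domains = re.findall(r"[a-z]+\.com", text)

# "{3}" requires exactly three digits; \b stops the pattern
# from matching inside a longer run of digits.
three_digits = re.findall(r"\b[0-9]{3}\b", text)

# "{1,3}" accepts one to three digits.
one_to_three = re.findall(r"\b[0-9]{1,3}\b", text)

# "?" makes the preceding item optional.
optional = re.findall(r"colou?r", "color colour")

# "*" also accepts zero repetitions, so an empty string matches.
star_empty = re.fullmatch(r"[A-Za-z0-9]*", "") is not None
```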

    Matching by surrounding context (lookaround)

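    Symbol     Example               Description
    (?<=…)     (?<=hel)[a-z]{2}      Matches the two letters that follow hel (the "lo" in hello)
    (?=…)      [a-z]{2}(?=rld)       Matches the two letters that precede rld (the "wo" in world)
    (?<!…)     (?<!192\.168\.)…      Matches text that is not preceded by 192.168.
    (?!…)      (?!\.cn)              Matches a position that is not followed by .cn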
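The four lookaround forms can be sketched in Python as follows. The first two patterns come from the table; the negative lookbehind and negative lookahead patterns, along with the host and domain lists, are hypothetical examples chosen only to make the behavior visible.

```python
import re

# Positive lookbehind: two letters that come right after "hel".
after_hel = re.findall(r"(?<=hel)[a-z]{2}", "hello")

# Positive lookahead: two letters that come right before "rld".
before_rld = re.findall(r"[a-z]{2}(?=rld)", "world")

# Negative lookbehind (hypothetical pattern): keep hosts ending in
# "1.5" only when that ending is not preceded by "192.168.".
hosts = ["192.168.1.5", "10.0.1.5"]
not_private = [h for h in hosts
               if re.search(r"(?<!192\.168\.)1\.5$", h)]

# Negative lookahead (hypothetical pattern): keep names whose
# first label is not followed by ".cn".
domains = ["example.com", "example.cn"]
not_cn = [d for d in domains
          if re.search(r"^[a-z]+(?!\.cn)\.[a-z]+$", d)]
```

Lookarounds check the surrounding text without consuming it, which is why the matched results contain only the two letters and not "hel" or "rld".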

    Comments from the re package source

    The special characters are:
        "."      Matches any character except a newline.
        "^"      Matches the start of the string.
        "$"      Matches the end of the string or just before the newline at
                 the end of the string.
        "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
                 Greedy means that it will match as many repetitions as possible.
        "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
        "?"      Matches 0 or 1 (greedy) of the preceding RE.
        *?,+?,?? Non-greedy versions of the previous three special characters.
        {m,n}    Matches from m to n repetitions of the preceding RE.
        {m,n}?   Non-greedy version of the above.
        "\\"     Either escapes special characters or signals a special sequence.
        []       Indicates a set of characters.
                 A "^" as the first character indicates a complementing set.
        "|"      A|B, creates an RE that will match either A or B.
        (...)    Matches the RE inside the parentheses.
                 The contents can be retrieved or matched later in the string.
        (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
        (?:...)  Non-grouping version of regular parentheses.
        (?P<name>...) The substring matched by the group is accessible by name.
        (?P=name)     Matches the text matched earlier by the group named name.
        (?#...)  A comment; ignored.
        (?=...)  Matches if ... matches next, but doesn't consume the string.
        (?!...)  Matches if ... doesn't match next.
        (?<=...) Matches if preceded by ... (must be fixed length).
        (?<!...) Matches if not preceded by ... (must be fixed length).
        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.
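Several entries in the comment above (greedy versus non-greedy repetition, named groups, backreferences, and non-capturing groups) are easier to follow with a concrete run. This is a small sketch on a made-up HTML-like string, not a recommended way to parse HTML.

```python
import re

html = "<b>bold</b><i>italic</i>"

# Greedy: ".*" takes as much as it can, so one match spans everything.
greedy = re.findall(r"<.*>", html)

# Non-greedy: ".*?" stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.*?>", html)

# Named groups plus a backreference: (?P=tag) must repeat the text
# captured by the group named "tag".
m = re.search(r"<(?P<tag>[a-z]+)>(?P<body>.*?)</(?P=tag)>", html)
tag, body = m.group("tag"), m.group("body")

# Non-capturing group: groups "ab" for the "+" quantifier
# without creating a numbered group.
repeated = re.fullmatch(r"(?:ab)+", "ababab")
group_count = len(repeated.groups())
```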
    

    re.compile()

    re.compile(pattern, flags=0)
    

    Compiles a regular expression pattern into a pattern object, which can then be reused for matching.
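As a minimal sketch of how a compiled pattern might be reused (the name `word_re` and the sample strings are illustrative assumptions):

```python
import re

# Compile once; flags are supplied at compile time.
word_re = re.compile(r"[a-z]+", flags=re.IGNORECASE)

# The pattern object exposes the same operations as the module-level
# functions, without passing the pattern again.
first = word_re.search("42 Apples")
all_words = word_re.findall("Data analysis WITH regex")
```

Compiling is mostly useful when the same pattern is applied many times, for example once per row of a dataset.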

    re.match()

    re.match(pattern, string, flags=0)
    

    Tries to match the pattern at the start of the string and returns None if the match fails.
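A short sketch of this behavior, using a made-up string: the same text matches or fails depending on whether the pattern occurs at position 0.

```python
import re

# "hello" sits at the start of the string, so match succeeds.
m1 = re.match(r"hello", "hello world")

# "world" occurs later in the string, so match returns None.
m2 = re.match(r"world", "hello world")

matched_text = m1.group()
matched_span = m1.span()
```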

    re.search()

    re.search(pattern, string, flags=0)
    

    Scans the whole string and returns the first successful match.
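The difference from re.match can be seen in a small sketch (sample strings are assumptions): search finds the first match anywhere in the string and ignores later ones.

```python
import re

# The first run of digits is "66"; the later "3" is not returned.
m = re.search(r"[0-9]+", "order 66 shipped in 3 days")
found = m.group()
position = m.span()

# As with match, a failed search returns None.
no_match = re.search(r"[0-9]+", "no digits here")
```

To collect every match rather than only the first, re.findall or re.finditer can be used instead.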

    re.sub()

    re.sub(pattern, repl, string, count=0, flags=0)
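re.sub returns a copy of string in which matches of pattern are replaced by repl; with count=0 (the default) all matches are replaced. The sketch below uses made-up sample text to show a plain replacement, the count limit, group references in repl, and a function as repl.

```python
import re

text = "2023-01-15 and 2024-12-31"

# Replace every run of digits with "#".
masked = re.sub(r"[0-9]+", "#", text)

# count=1 replaces only the first match.
first_only = re.sub(r"[0-9]+", "#", text, count=1)

# repl can refer to captured groups: YYYY-MM-DD becomes DD/MM/YYYY.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

# repl can also be a function that receives the match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2),
                 "3 apples, 10 pears")
```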
    

    Further reading

    Runoob tutorial (菜鸟教程)

  • Original article: https://blog.csdn.net/weixin_42213421/article/details/127967125