Monday, December 14, 2020

Text replacement with Python regular expressions

https://docs.python.org/3/library/re.html#re.sub

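Delete everything before a \ (the pattern also removes the \ itself and the spaces around it, and it only matches when the text before the \ contains no whitespace):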

re.sub(r'^\S* \\ ', '', the_string)
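A minimal sketch of the call above in action. The sample strings are made up for illustration; only the pattern comes from this note.

```python
import re

# Hypothetical input: a one-word prefix, then " \ ", then the text to keep.
the_string = r'prefix \ the text to keep'
result = re.sub(r'^\S* \\ ', '', the_string)
print(result)

# If the prefix contains whitespace, ^\S* cannot reach the " \ " separator,
# so nothing matches and the string comes back unchanged.
unchanged = re.sub(r'^\S* \\ ', '', r'two words \ rest')
print(unchanged)
```

The first call prints only the part after the separator; the second prints its input untouched, because re.sub returns the original string when the pattern does not match.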

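The r in front of the opening single quote marks the literal as a raw string: Python leaves the backslashes inside it alone, so the text between the quotes reaches the re module exactly as written. This is the usual way to write a regular expression.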
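To make the effect of the r prefix concrete, here is a small sketch comparing raw and ordinary string literals. The variable names are only for illustration.

```python
# Without the r prefix, Python interprets backslash escapes itself.
newline = '\n'        # one character: a newline
raw_newline = r'\n'   # two characters: a backslash and the letter n
print(len(newline), len(raw_newline))

# The same pattern text written with and without the r prefix.
raw_pattern = r'^\S* \\ '
plain_pattern = '^\\S* \\\\ '   # every backslash has to be doubled
print(raw_pattern == plain_pattern)
```

Both literals produce the same eight-character pattern; the raw form is simply easier to read and harder to get wrong.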

^ matches the start of the string.

\S matches any non-whitespace character.

* matches zero or more repetitions of whatever precedes it (here, \S).

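In \\, the first \ is the escape character: it escapes the \ that follows, so the second \ stands for a literal backslash and matches a \ in the input. A \ that is not itself escaped (like the first one) keeps its normal role in a regular expression, which is to escape the character after it.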
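The escaping rule can be checked directly. This is a rough sketch; chr(92) is used only to build a single backslash without adding another layer of quoting.

```python
import re

backslash = chr(92)            # a single backslash character
text = 'a' + backslash + 'b'

# r'\\' is a two-character pattern that matches one literal backslash.
matches = re.findall(r'\\', text)
print(matches)

# A pattern made of one lone backslash has nothing left to escape,
# so the re module rejects it.
try:
    re.compile(backslash)
    lone_backslash_ok = True
except re.error:
    lone_backslash_ok = False
print(lone_backslash_ok)

# re.escape produces the escaped form automatically.
escaped = re.escape(backslash)
print(escaped == backslash * 2)
```

When the text to match comes from a variable rather than being typed by hand, re.escape is usually the safer choice.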

----------

*

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s.
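A quick sketch of the behaviour described in the quoted passage, using the ab* example from the docs plus the \S* from the pattern in this note. The input strings are invented for illustration.

```python
import re

# b* may match zero b's, so a lone 'a' still matches.
zero = re.match(r'ab*', 'a').group()

# * is greedy: it takes as many b's as it can before stopping.
many = re.match(r'ab*', 'abbbc').group()

# \S* stops at the first whitespace character...
word = re.match(r'\S*', 'abc def').group()

# ...and matches the empty string if the input starts with whitespace.
empty = re.match(r'\S*', ' abc').group()

print(repr(zero), repr(many), repr(word), repr(empty))
```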

\S

Matches any character which is not a whitespace character. This is the opposite of \s. If the ASCII flag is used this becomes the equivalent of [^ \t\n\r\f\v].
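The difference the ASCII flag makes shows up with a character such as the no-break space (U+00A0), which counts as whitespace in Unicode but is not in the ASCII set listed above. A small sketch, with a made-up sample string:

```python
import re

nbsp = '\u00a0'              # NO-BREAK SPACE
sample = 'a' + nbsp + 'b'

# Default (Unicode) matching for str patterns: the no-break space is
# whitespace, so \S skips it.
unicode_mode = re.findall(r'\S', sample)

# With re.ASCII, only [ \t\n\r\f\v] count as whitespace, so \S matches it.
ascii_mode = re.findall(r'\S', sample, flags=re.ASCII)

print(len(unicode_mode), len(ascii_mode))
```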