Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Regular expression in python: removing brackets with brackets inside

I have a Wiktionary dump and am struggling to find an appropriate regex pattern to remove the double brackets in an expression. Here is an example of the expressions:

line = "# Test is a cool word {{source|{{nom w pc|Chantal|Bouchard}}, ''La langue et le nombril'', Presses de l'Université de Montréal (PUM), 2020, p. 174}}."

I am looking to remove the brackets and everything inside them when the expression begins with {{source|:

Expected result: # Test is a cool word.

I tried using re.sub like this: line = re.sub(r"{{source\|.*?}}", "", line)

but I got # Test is a cool word, ''La langue et le nombril'', Presses de l'Université de Montréal (PUM), 2020, p. 174}}.
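
The truncated result can be reproduced with the standard re module alone: the lazy .*? ends the match at the first }}, and that }} closes the inner {{nom w pc|...}} template rather than the outer one. A minimal sketch, assuming the pipe is escaped so it is a literal | (the braces are escaped too, only for clarity):

```python
import re

line = ("# Test is a cool word {{source|{{nom w pc|Chantal|Bouchard}}, "
        "''La langue et le nombril'', Presses de l'Université de Montréal (PUM), "
        "2020, p. 174}}.")

# Lazy pattern: the escaped pipe is a literal "|", not alternation.
lazy = re.compile(r"\{\{source\|.*?\}\}")

# The match stops at the first "}}", which belongs to the inner template.
matched_text = lazy.search(line).group(0)
print(matched_text)
# => {{source|{{nom w pc|Chantal|Bouchard}}

# Substituting therefore leaves the tail of the outer template behind,
# including its closing "}}".
leftover = lazy.sub("", line)
print(leftover)
```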

I could also have another sentence like this: line = "# Test is a cool word {{source|Nicolas|Daniel, Presses de l'Université de Montréal 4}}"

Thank you for your help!



1 Answer


You can install the PyPI regex library (type pip install regex in the terminal/console and press ENTER), and then use:

import regex  # third-party module: pip install regex

# (?1) recurses into Group 1, so nested {{...}} pairs are matched as a unit
rx = r"\s*{{source\|(?>[^{}]|({{(?:[^{}]++|(?1))*}}))*}}\s*"
line = "# Test is a cool word {{source|{{nom w pc|Chantal|Bouchard}}, ''La langue et le nombril'', Presses de l'Université de Montréal (PUM), 2020, p. 174}}."
print(regex.sub(rx, '', line))
# => # Test is a cool word.

See the Python demo. The regex is

\s*{{source\|(?>[^{}]|({{(?:[^{}]++|(?1))*}}))*}}\s*

See the regex demo. Details:

  • \s* - zero or more whitespaces
  • {{source\| - a literal {{source| string
  • (?>[^{}]|({{(?:[^{}]++|(?1))*}}))* - zero or more repetitions (inside an atomic group, which does not backtrack) of:
    • [^{}] - a char other than { and }
    • | - or
    • ({{(?:[^{}]++|(?1))*}}) - Group 1 (it is necessary for recursion): {{, then zero or more occurrences of either one or more chars other than { and } or the Group 1 pattern recursed, and then a }} string
  • }} - a }} string
  • \s* - zero or more whitespaces.
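
If installing a third-party module is not an option, a depth-counting scan with only the standard library can approximate the same result. This is a minimal sketch and not the recursive regex described above: the function name strip_templates and its opener parameter are hypothetical, and it assumes the {{ and }} pairs in the text are balanced (an unbalanced template is left untouched).

```python
def strip_templates(text, opener="{{source|"):
    """Remove each template starting with `opener`, nested {{...}} included."""
    out = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            out.append(text[pos:])  # no more templates: keep the rest
            break
        # Walk forward, counting how many "{{" are still open.
        depth = 0
        i = start
        end = -1
        while i < len(text):
            pair = text[i:i + 2]
            if pair == "{{":
                depth += 1
                i += 2
            elif pair == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    end = i  # index just past the matching "}}"
                    break
            else:
                i += 1
        if end == -1:
            out.append(text[pos:])  # unbalanced: leave the rest as-is
            break
        # Drop whitespace around the template, like the \s* in the regex.
        out.append(text[pos:start].rstrip())
        pos = end
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return "".join(out)


line = ("# Test is a cool word {{source|{{nom w pc|Chantal|Bouchard}}, "
        "''La langue et le nombril'', Presses de l'Université de Montréal (PUM), "
        "2020, p. 174}}.")
second = ("# Test is a cool word {{source|Nicolas|Daniel, "
          "Presses de l'Université de Montréal 4}}")

print(strip_templates(line))
# => # Test is a cool word.
print(strip_templates(second))
# => # Test is a cool word
```

Unlike the regex, this scan tolerates a stray single { or } inside the template, so the two approaches can differ on malformed input.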
