Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Regular expression to find files with underscores and optional extension

This is for work, so I've changed the extensions and files to protect the innocent.

I am parsing text from a description, looking for file names in the format word_here. A name can contain as many underscores as needed, plus an optional extension. I came up with this regular expression, which works:

Test 1

import re

text = 'Some text here: * my_file_stuff.mat * other_file * third_file *'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 1

python test_regex.py

['my_file_stuff.mat', 'other_file', 'third_file']

The problem is that it doesn't work for input like this:

Test 2

import re

text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|mat]*)'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 2

python test_regex.py
['my_file|a', 'nother_file.mat|', 'O_HERES_ONE|', '_O_HERES_ANOTHER|']

I modified my regex to include the vertical bar:

Test 3

import re

text = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

FILE_REG_EX = r'([\w]+_+[\w]+\.*[py|plot]*)\|'
res = re.findall(FILE_REG_EX, text)

print(res)

Output 3

python test_regex.py
['my_file', 'another_file.mat', 'O_HERES_ONE', 'O_HERES_ANOTHER']

That works for the second input, but no longer for the first. Part of the issue is that I will be searching a description for text to look up where a file is located, and I have no way of knowing what formatting it will use for files, only that they will be something in the form of MY_FILE_HERE01.py, with or without the extension.

I've tried using the not symbol to exclude the vertical bars in front and back, but that seems to come up empty for both strings.



1 Answer


You may use this regex for both kinds of input:

[a-zA-Z\d]+_\w+(?:\.(?:py|mat))?

RegEx Details:

  • [a-zA-Z\d]+: Match 1+ letters or digits
  • _: Match an underscore
  • \w+: Match 1+ word characters
  • (?:\.(?:py|mat))?: Optionally match .py or .mat
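
As a quick check, here is a minimal sketch that runs this pattern against both test strings from the question. The helper name find_file_names is hypothetical and only there for illustration; the pattern and the input strings are the ones shown above.

```python
import re

# Pattern from the answer, with the backslashes written out.
# No capturing groups are used, so findall() returns whole matches.
FILE_REG_EX = re.compile(r'[a-zA-Z\d]+_\w+(?:\.(?:py|mat))?')


def find_file_names(text):
    """Return every file-name-like token in text (hypothetical helper)."""
    return FILE_REG_EX.findall(text)


# The two inputs from the question.
text_1 = 'Some text here: * my_file_stuff.mat * other_file * third_file *'
text_2 = '|my_file|another_file.mat|O_HERES_ONE|_O_HERES_ANOTHER| | | |'

print(find_file_names(text_1))
# ['my_file_stuff.mat', 'other_file', 'third_file']
print(find_file_names(text_2))
# ['my_file', 'another_file.mat', 'O_HERES_ONE', 'O_HERES_ANOTHER']
```

Two things are worth noting. Because the match must start with a letter or digit, the leading underscore of _O_HERES_ANOTHER is dropped. And the extension is written as the group (?:py|mat) rather than the character class [py|mat] used in the question: a character class matches any single one of the listed characters, including the literal |, which is why the original pattern swallowed the vertical bars.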
