Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Checking for diacritics with a regular expression

Simple problem: an existing project lets me add extra fields (with regular expressions as additional checks on those fields) to support custom input forms. I need to add a new form but cannot change how this project works. The form lets a visitor enter a first and last name plus initials. Until now, the RegEx ^[a-zA-Z.]*$ worked just fine.
Then someone noticed that it rejects letters with diacritics: a Turkish name like Ömür was not accepted as valid, although it needs to be.

So I have two options:

  1. Remove the check completely, which would allow users to enter garbage.
  2. Write a regular expression that would also include diacritic letters but still no digits, spaces or other non-letters.

Since I cannot change the code of the project, I only have these two options. I would prefer option 2 but now wonder what the proper RegEx should be. (The project is written in C# 4.0.)


1 Answer

You can use the Unicode category escape for letters, \p{L} (this also covers the A-Za-z ranges):

^[.\p{L}]*$

See on regularexpressions.info:

\p{L} or \p{Letter}

Matches a single Unicode code point that has the property "letter". See Unicode Character Properties in the tutorial for a complete list of properties. Each Unicode code point has exactly one property. Can be used inside character classes.
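
To see the difference between the two patterns, here is a minimal C# sketch that runs the original check and the \p{L} version against a few inputs. The class name and the sample strings are made up for illustration; only the two patterns come from the question and the answer above. It sticks to syntax that should compile under C# 4.0.

```csharp
using System;
using System.Text.RegularExpressions;

class DiacriticsCheck
{
    static void Main()
    {
        // Pattern from the question: ASCII letters and dots only.
        var asciiOnly = new Regex(@"^[a-zA-Z.]*$");
        // Pattern from the answer: any Unicode letter (\p{L}) or a dot.
        var anyLetter = new Regex(@"^[.\p{L}]*$");

        // Hypothetical sample inputs; "\u00D6m\u00FCr" is "Ömür",
        // written with escapes so the source file encoding does not matter.
        string[] samples = { "\u00D6m\u00FCr", "J.R.", "Smith", "R2D2", "Anne Marie" };

        foreach (string s in samples)
        {
            Console.WriteLine("{0,-12} ascii-only={1,-6} any-letter={2}",
                s, asciiOnly.IsMatch(s), anyLetter.IsMatch(s));
        }
    }
}
```

Only "Ömür" should come out differently: the ASCII-only pattern rejects it and the \p{L} pattern accepts it. "R2D2" (digits) and "Anne Marie" (space) are rejected by both, so option 2 from the question still holds. As far as I know, .NET only recognises the short category names, so use \p{L} there rather than the long form \p{Letter} mentioned in the quote.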

