Regex hangs the program
I have the following regex:
Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
Dim str As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"
Dim ar As Integer
Dim getfile As New Regex(str)
Dim mgetfile As MatchCollection = getfile.Matches(origen)
ar = mgetfile.Count
When I evaluate this, it works and returns /p:"c:\mis doc umentos\mis imagenes construida\archivo.txt", which is basically the file path.
But if I change the origen line to:
Dim origen As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
Here the closing quote of the file path is immediately followed by "/cid:45", which should invalidate the pattern. But instead of getting mgetfile.Count = 0, the program hangs, and if I debug it I get "property evaluation failed".
The reason your program hangs is catastrophic backtracking. Parts of your regex, namely (\w+\s*\w+)+ and \w+.\w+, allow so many permutations that the regex engine gets stuck in an almost endless loop. The RegexBuddy debugger exits after 1,000,000 steps.
This only happens when the pattern cannot match, because a failure forces the regex engine to try every other permutation the pattern allows before giving up. In general, nested quantifiers (a repeated group that itself contains repeated elements) are dangerous.
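To see the hang without freezing the program, one option is to give the Regex a match timeout. The following is only a sketch: the module name, the CountMatches helper and the one-second limit are made up for illustration, and it assumes .NET Framework 4.5 or later, where the Regex constructor accepts a TimeSpan. Note that Matches is evaluated lazily, so the engine only runs once Count is read, which is likely why the debugger reports a failed property evaluation.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BacktrackingDemo
    ' Hypothetical helper: returns the number of matches,
    ' or -1 if the engine was still backtracking when the timeout expired.
    Public Function CountMatches(pattern As String, input As String, timeout As TimeSpan) As Integer
        Dim rx As New Regex(pattern, RegexOptions.None, timeout)
        Try
            ' Matches() is lazy; reading Count is what actually runs the engine.
            Return rx.Matches(input).Count
        Catch ex As RegexMatchTimeoutException
            Return -1
        End Try
    End Function

    Sub Main()
        ' The pattern and both inputs are taken from the question.
        Dim pattern As String = "(^|\s)/p:""\w:(\\(\w+[\s]*\w+)+)+\\\w+.\w+""(\s|$)"
        Dim good As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
        Dim bad As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "

        Console.WriteLine(CountMatches(pattern, good, TimeSpan.FromSeconds(1)))
        Console.WriteLine(CountMatches(pattern, bad, TimeSpan.FromSeconds(1)))
    End Sub
End Module
```

A timeout does not fix the pattern; it only turns the hang into an exception you can handle.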
What are the real requirements? To match a path that only contains letters, numbers, underscores, and backslashes? Or just a string between the quotes? Perhaps you could shed some light on this ...
Until then, I suggest the following:
"(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"
This changes a few things. (\\(?>[\w\s]+)) matches a backslash followed by one or more alphanumeric characters and spaces. Once they have been matched, the regex engine refuses to try another permutation. This is achieved with an atomic group (?>...); other regex flavors offer the possessive quantifier ++ for the same purpose, but the .NET engine does not accept that syntax.
After that, it matches a dot (your version will match any character) and a sequence of alphanumeric characters. Then the quote, and then it checks if there is a space or the end of the line. If not, the regex will fail and exit quickly.
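As a rough check of that behaviour, the sketch below runs a non-backtracking version of the pattern against both strings from the question. It is written with an atomic group (?>...) because the .NET engine does not accept the possessive ++ syntax; the module, constant and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AtomicGroupDemo
    ' Atomic-group form of the suggested pattern (illustrative name).
    Public Const PathPattern As String = "(?<=^|\s)/p:""\w:(\\(?>[\w\s]+))+\.\w+""(?=\s|$)"

    ' Hypothetical helper: returns the matched /p: argument, or Nothing if there is none.
    Public Function FindPathArgument(input As String) As String
        Dim m As Match = Regex.Match(input, PathPattern)
        Return If(m.Success, m.Value, Nothing)
    End Function

    Sub Main()
        Dim good As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
        Dim bad As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "

        ' The first call finds the argument; the second fails quickly instead of hanging.
        Console.WriteLine(If(FindPathArgument(good), "(no match)"))
        Console.WriteLine(If(FindPathArgument(bad), "(no match)"))
    End Sub
End Module
```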
If you want to match a string between quotes, then
"(?<=^|\s)/p:""[^""]+""(?=\s|$)"
is the best and fastest way.
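If only the path itself is needed, the quoted-string pattern can be given a capturing group around [^""]+. The group is an addition made for this sketch and is not part of the pattern above; the module and function names are likewise hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuotedPathDemo
    ' Same pattern as above, with parentheses added around the quoted content.
    Public Const QuotedPattern As String = "(?<=^|\s)/p:""([^""]+)""(?=\s|$)"

    ' Hypothetical helper: returns the text between the quotes, or Nothing if there is no match.
    Public Function ExtractQuotedPath(input As String) As String
        Dim m As Match = Regex.Match(input, QuotedPattern)
        Return If(m.Success, m.Groups(1).Value, Nothing)
    End Function

    Sub Main()
        Dim good As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt"" /cid:45 423 /z:65 /a:23 /m:39 /t:45rt "
        Dim bad As String = " /c /p:""c:\mis doc umentos\mis imagenes construida\archivo.txt""/cid:45 423 /z:65 /a:23 /m:39 /t:45rt "

        Console.WriteLine(If(ExtractQuotedPath(good), "(no match)"))
        Console.WriteLine(If(ExtractQuotedPath(bad), "(no match)"))
    End Sub
End Module
```

Because [^""]+ can only stop at a quote, there is a single way to match the quoted part, so a failed lookahead is reported almost immediately.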