Confusion about the reluctant quantifier in Java regular expressions
Why am I getting the output ab for the following regex code with a reluctant quantifier?
Pattern p = Pattern.compile("abc*?");
Matcher m = p.matcher("abcfoo");
while (m.find())
    System.out.println(m.group()); // prints "ab"
Likewise, why am I getting empty indices for the following code?
Pattern p = Pattern.compile(".*?");
Matcher m = p.matcher("abcfoo");
while (m.find())
    System.out.println(m.group());
In addition to Konrad Rudolph's answer:
abc*? matches "ab" in any case, and "c" only if necessary. Because nothing follows the *?, the regex engine stops immediately. If you have abc*?f instead, it will match "abcf", because the "c" has to be consumed before the "f" can match. Another expression: .*? matches nothing (only the empty string), because the whole pattern is optional. .*?f, again, will match "abcf".
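The four patterns discussed above can be checked with a small sketch. The class name ReluctantTail and the helper findAll are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
final class ReluctantTail {

    // Collects every match that Matcher.find() reports, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "abcfoo";
        // Nothing follows c*?, so it settles for zero repetitions.
        System.out.println(findAll("abc*?", input));      // [ab]
        // The trailing f forces c*? to consume the "c".
        System.out.println(findAll("abc*?f", input));     // [abcf]
        // Fully optional: one empty match at each position 0 through 6.
        System.out.println(findAll(".*?", input).size()); // 7
        // .*? grows only until the first "f" can be matched.
        System.out.println(findAll(".*?f", input));       // [abcf]
    }
}
```

The seven empty matches of .*? are what show up as blank lines in the question's second loop.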
*? matches zero or more repetitions, but as few as possible (and, by the way, it is usually called "non-greedy" rather than "reluctant"). Therefore, if a zero-length match is possible, that is the match the engine prefers.
What exactly do you want to achieve? Perhaps non-greedy matching is not what you want.
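To see the difference between the greedy and the non-greedy form side by side, here is a minimal sketch that prints each match together with its start and end index, which makes zero-length matches visible. The class GreedyVsReluctant and the helper spans are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions for illustration.
final class GreedyVsReluctant {

    // Formats each match as "start-end:text".
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy c* takes every available "c".
        System.out.println(spans("abc*", "abccc"));  // [0-5:abccc]
        // Reluctant c*? takes none, because nothing after it demands more.
        System.out.println(spans("abc*?", "abccc")); // [0-2:ab]
        // Greedy .* swallows the input, then matches empty once at the end.
        System.out.println(spans(".*", "abc"));      // [0-3:abc, 3-3:]
        // Reluctant .*? matches empty at every position.
        System.out.println(spans(".*?", "abc"));     // [0-0:, 1-1:, 2-2:, 3-3:]
    }
}
```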
It never makes sense to have a reluctant quantifier as the last thing in a regex. A reluctant quantifier only matches as much as it has to in order to achieve an overall match. This means there must be something after the quantifier to force it to match more than its minimum.
If it seems odd to have a feature that can be used in such a senseless way, that is probably because reluctant quantifiers are an add-on, something that is not possible with "real" regular expressions. Other examples of senseless usage are the "quantifier" {1}, and \b+ or any other zero-width assertion (^, $, lookarounds, etc.) with a quantifier. Some flavors treat the latter as a syntax error; Java allows it, but of course applies the assertion only once.
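A short sketch of the first two points, using a made-up class PointlessQuantifiers and helper firstMatch: a trailing reluctant quantifier stops at its minimum, and {1} changes nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions for illustration.
final class PointlessQuantifiers {

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Last in the regex: a+? stops at its minimum of one "a".
        System.out.println(firstMatch("a+?", "aaa"));    // a
        // Followed by b: a+? must grow until the "b" can match.
        System.out.println(firstMatch("a+?b", "aaab"));  // aaab
        // {1} repeats exactly once, so it behaves like no quantifier at all.
        System.out.println(firstMatch("a{1}", "aaa"));   // a
    }
}
```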
The reluctant modifier ? makes .* match as few characters as possible; it matches more characters only when backtracking requires it.
Here's an illustrative example that uses a regular expression to find a non-empty prefix of a string that is also a suffix of it (without overlapping).
The capturing group \1 in the first pattern is greedy: it matches everything first and takes less as it backtracks. This way the pattern finds the longest possible prefix/suffix:
System.out.println(
    "abracadabra".replaceAll("^(.+).*\\1$", "($1)")
); // prints "(abra)"
Now \1 in the second pattern is reluctant; it matches as little as possible at first (a single character) and takes more as it backtracks. This way the pattern finds the shortest prefix/suffix:
System.out.println(
    "abracadabra".replaceAll("^(.+?).*\\1$", "($1)")
); // prints "(a)"
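The same two patterns can also be used with a Matcher to read the captured prefix directly instead of rewriting the string. This is only a sketch: the class BorderDemo, the constants LONGEST and SHORTEST, and the method border are names invented here, and the input is assumed to contain no line breaks (the dot does not match them by default).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions for illustration.
final class BorderDemo {

    // Greedy group: tries the longest prefix first.
    static final Pattern LONGEST = Pattern.compile("^(.+).*\\1$");
    // Reluctant group: tries the shortest prefix first.
    static final Pattern SHORTEST = Pattern.compile("^(.+?).*\\1$");

    // Returns the captured prefix that is also a suffix, or null if none exists.
    static String border(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(border(LONGEST, "abracadabra"));  // abra
        System.out.println(border(SHORTEST, "abracadabra")); // a
        System.out.println(border(LONGEST, "abc"));          // null
    }
}
```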
In your case, .*? can match the empty string and never needs to backtrack and match more, since that is already enough for the overall pattern to match.
Here's another illustrative example of a reluctant quantifier, this time on a finite repetition:
Here, x{3,5} is greedy and will take as much as possible.
System.out.println(
    "xxxxxxx".replaceAll("x{3,5}", "Y")
); // prints "Yxx"
Here, x{3,5}? is reluctant and will take as little as possible.
System.out.println(
    "xxxxxxx".replaceAll("x{3,5}?", "Y")
); // prints "YYx"
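Listing the individual matches shows where the two results come from. The sketch below uses a made-up class BoundedRepeat with a helper findAll; seven x characters leave two over after one greedy match of five, and one over after two reluctant matches of three.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions for illustration.
final class BoundedRepeat {

    // Collects every match that Matcher.find() reports, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Greedy: one match of five; the remaining "xx" is too short for another.
        System.out.println(findAll("x{3,5}", "xxxxxxx"));  // [xxxxx]
        // Reluctant: two matches of three; one "x" is left over.
        System.out.println(findAll("x{3,5}?", "xxxxxxx")); // [xxx, xxx]
    }
}
```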
*? is also called the lazy star.
^abc*?f
*? ---> repeats 0 or more times, as few as possible
^ ---> anchors the match at the start of the string
Example: abcf00abcf00 ---> matches "abcf"00abcf00
In this case c must be consumed to reach f.
abc*?
*? ---> repeats 0 or more times, as few as possible
Matches ab
Example: abcabcabcabc ---> matches "ab"c"ab"c"ab"c"ab"c
abc.*
.* ---> matches any characters except line breaks, as many as possible
Example: abcabababbababab ---> matches "abcabababbababab"
ab.*?
Example: ababababbababab ---> matches "ab""ab""ab""ab"b"ab""ab""ab"
abc?
Matches ab or abc
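The examples in this answer can be run through Java's regex engine with a small sketch like the following. The class LazyStarCheck and its helper findAll are invented names, and the input "ab abc" for the last pattern is an assumption added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are assumptions for illustration.
final class LazyStarCheck {

    // Collects every match that Matcher.find() reports, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Anchored at the start, so only the first "abcf" is found.
        System.out.println(findAll("^abc*?f", "abcf00abcf00"));     // [abcf]
        // Lazy c*? never takes the "c".
        System.out.println(findAll("abc*?", "abcabcabcabc"));       // [ab, ab, ab, ab]
        // Greedy .* runs to the end of the input.
        System.out.println(findAll("abc.*", "abcabababbababab"));   // [abcabababbababab]
        // Lazy .*? adds nothing: four "ab" before the doubled b, three after.
        System.out.println(findAll("ab.*?", "ababababbababab").size()); // 7
        // Optional c: matched where present, skipped where absent.
        System.out.println(findAll("abc?", "ab abc"));              // [ab, abc]
    }
}
```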