This article explains what .*, .*?, and .+? mean in regular expressions.
1. .*
The dot . matches any single character except the newline character \n, and * means zero or more occurrences of the preceding item. Together, .* matches any run of characters (other than newlines) of length zero or more. Without a trailing ?, the quantifier is greedy: it takes as many characters as it can. For example, a.*b matches the longest string that starts with a and ends with b. Searching aabab with it matches the entire string aabab. This is called greedy matching.
As another example, the pattern src=`.*` (the backticks are literal delimiter characters here) matches the longest string that starts with src=` and ends with `. Searching <img src=``test.jpg` width=`60px` height=`80px`/> with it returns src=``test.jpg` width=`60px` height=`80px`.
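The greedy behavior described above can be checked with a short sketch. The article does not name a regex engine, so Python's re module is assumed here, and the variable names are only illustrative.

```python
import re

# Greedy: .* takes as much as it can, then backs off only until b matches.
greedy = re.search(r"a.*b", "aabab")
print(greedy.group())  # aabab

# The backticks are ordinary literal characters in both pattern and input.
tag = "<img src=``test.jpg` width=`60px` height=`80px`/>"
src_greedy = re.search(r"src=`.*`", tag)
print(src_greedy.group())  # src=``test.jpg` width=`60px` height=`80px`

# By default . does not match \n, so .* cannot run past the line break.
one_line = re.search(r"a.*b", "ab\nb")
print(one_line.group())  # ab
```

In the last case the match is limited to the first line, since the dot stops at \n unless the re.DOTALL flag is passed.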
2. .*?
When ? follows * or +, the quantifier becomes lazy (non-greedy): any number of repetitions is still allowed, but the engine uses the fewest repetitions that let the overall match succeed.
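a.*?b matches the shortest string that starts with a and ends with b. Applied to aabab, it matches aab (characters 1 to 3) and ab (characters 4 to 5). The search still begins at the leftmost a, which is why the first match is aab rather than the ab at characters 2 to 3.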
For example, the pattern src=`.*?` matches the shortest string that starts with src=` and ends with `. Nothing is required between the two delimiters, because * allows zero occurrences. Searching <img src=``test.jpg` width=`60px` height=`80px`/> with it returns src=``.
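The lazy matches described in this section can be reproduced with a minimal sketch, again assuming Python's re module. The names lazy_spans and src_lazy are illustrative, and the positions are converted to the 1-based character numbers used in the text.

```python
import re

# Lazy: .*? uses the fewest characters that still let b match.
# Each tuple is (first character, last character, matched text), 1-based.
lazy_spans = [(m.start() + 1, m.end(), m.group())
              for m in re.finditer(r"a.*?b", "aabab")]
print(lazy_spans)  # [(1, 3, 'aab'), (4, 5, 'ab')]

# With zero characters allowed, the two adjacent backticks already match.
tag = "<img src=``test.jpg` width=`60px` height=`80px`/>"
src_lazy = re.search(r"src=`.*?`", tag)
print(src_lazy.group())  # src=``
```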
3. .+?
As above, a ? after * or + makes the quantifier lazy: any number of repetitions is allowed, but the engine uses the fewest that let the overall match succeed. The difference from .*? is that + requires at least one occurrence.
a.+?b matches the shortest string that starts with a and ends with b with at least one character between them. Applied to ababccaab, it matches abab (characters 1 to 4) and aab (characters 7 to 9). The result is not ab, ab, and aab, because at least one character must appear between a and b.
For example, the pattern src=`.+?` matches the shortest string that starts with src=` and ends with ` with at least one character in between, because + means one or more. Searching <img src=``test.jpg` width=`60px` height=`80px`/> with it returns src=``test.jpg`. Unlike .*?, it cannot match src=``, because at least one character must appear between src=` and the closing `.
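To see .+? and .*? side by side, here is a small comparison on the same inputs. It is a sketch that assumes Python's re module, with illustrative variable names.

```python
import re

text = "ababccaab"

# .+? needs at least one character between a and b.
plus_lazy = re.findall(r"a.+?b", text)
print(plus_lazy)  # ['abab', 'aab']

# .*? accepts zero characters, so the short ab matches appear.
star_lazy = re.findall(r"a.*?b", text)
print(star_lazy)  # ['ab', 'ab', 'aab']

# The first backtick after src=` is consumed by .+?, so the match
# runs on to the next backtick instead of stopping at src=``.
tag = "<img src=``test.jpg` width=`60px` height=`80px`/>"
src_plus = re.search(r"src=`.+?`", tag)
print(src_plus.group())  # src=``test.jpg`
```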