Added initial support for recursive expressions.

Updated docs and tests accordingly.

[SVN r54994]
This commit is contained in:
John Maddock
2009-07-17 10:23:50 +00:00
parent 02a629baf7
commit 5a6bc29d7c
44 changed files with 1013 additions and 264 deletions

@ -28,7 +28,7 @@
Syntax</a>
</h3></div></div></div>
<a name="boost_regex.syntax.perl_syntax.synopsis"></a><h4>
<a name="id650518"></a>
<a name="id693116"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.synopsis">Synopsis</a>
</h4>
<p>
@ -43,7 +43,7 @@
</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">perl</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
</pre>
<a name="boost_regex.syntax.perl_syntax.perl_regular_expression_syntax"></a><h4>
<a name="id650665"></a>
<a name="id693264"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.perl_regular_expression_syntax">Perl
Regular Expression Syntax</a>
</h4>
@ -53,7 +53,7 @@
</p>
<pre class="programlisting">.[{()\*+?|^$</pre>
<a name="boost_regex.syntax.perl_syntax.wildcard"></a><h5>
<a name="id650689"></a>
<a name="id693288"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.wildcard">Wildcard</a>
</h5>
<p>
@ -73,7 +73,7 @@
</li>
</ul></div>
<a name="boost_regex.syntax.perl_syntax.anchors"></a><h5>
<a name="id650736"></a>
<a name="id693334"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.anchors">Anchors</a>
</h5>
<p>
@ -83,7 +83,7 @@
A '$' character shall match the end of a line.
</p>
<a name="boost_regex.syntax.perl_syntax.marked_sub_expressions"></a><h5>
<a name="id650758"></a>
<a name="id693356"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.marked_sub_expressions">Marked
sub-expressions</a>
</h5>
@ -94,7 +94,7 @@
can also be repeated, or referred to by a back-reference.
</p>
<a name="boost_regex.syntax.perl_syntax.non_marking_grouping"></a><h5>
<a name="id650784"></a>
<a name="id693382"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_marking_grouping">Non-marking
grouping</a>
</h5>
@ -107,7 +107,7 @@
without splitting out any separate sub-expressions.
</p>
<a name="boost_regex.syntax.perl_syntax.repeats"></a><h5>
<a name="id650820"></a>
<a name="id693418"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.repeats">Repeats</a>
</h5>
<p>
@ -188,7 +188,7 @@
to be applied to.
</p>
<a name="boost_regex.syntax.perl_syntax.non_greedy_repeats"></a><h5>
<a name="id651056"></a>
<a name="id693655"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_greedy_repeats">Non greedy
repeats</a>
</h5>
@ -218,7 +218,7 @@
while consuming as little input as possible.
</p>
<a name="boost_regex.syntax.perl_syntax.pocessive_repeats"></a><h5>
<a name="id651115"></a>
<a name="id693714"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.pocessive_repeats">Possessive
repeats</a>
</h5>
@ -250,7 +250,7 @@
while giving nothing back.
</p>
<a name="boost_regex.syntax.perl_syntax.back_references"></a><h5>
<a name="id651174"></a>
<a name="id693772"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.back_references">Back references</a>
</h5>
<p>
@ -360,7 +360,7 @@
named "two".
</p>
<a name="boost_regex.syntax.perl_syntax.alternation"></a><h5>
<a name="id651394"></a>
<a name="id693992"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.alternation">Alternation</a>
</h5>
<p>
@ -387,7 +387,7 @@
<code class="literal">(?:abc)??</code> has exactly the same effect.
</p>
<a name="boost_regex.syntax.perl_syntax.character_sets"></a><h5>
<a name="id651462"></a>
<a name="id694060"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_sets">Character sets</a>
</h5>
<p>
@ -399,7 +399,7 @@
A bracket expression may contain any combination of the following:
</p>
<a name="boost_regex.syntax.perl_syntax.single_characters"></a><h6>
<a name="id651493"></a>
<a name="id694092"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.single_characters">Single characters</a>
</h6>
<p>
@ -407,7 +407,7 @@
'b', or 'c'.
</p>
<a name="boost_regex.syntax.perl_syntax.character_ranges"></a><h6>
<a name="id651515"></a>
<a name="id694113"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_ranges">Character
ranges</a>
</h6>
@ -421,7 +421,7 @@
sensitive.
</p>
<a name="boost_regex.syntax.perl_syntax.negation"></a><h6>
<a name="id651547"></a>
<a name="id694146"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.negation">Negation</a>
</h6>
<p>
@ -430,7 +430,7 @@
matches any character that is not in the range <code class="literal">a-c</code>.
</p>
<a name="boost_regex.syntax.perl_syntax.character_classes"></a><h6>
<a name="id651575"></a>
<a name="id694173"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_classes">Character
classes</a>
</h6>
@ -441,7 +441,7 @@
class names</a>.
</p>
<a name="boost_regex.syntax.perl_syntax.collating_elements"></a><h6>
<a name="id651607"></a>
<a name="id694206"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.collating_elements">Collating
Elements</a>
</h6>
@ -463,7 +463,7 @@
matches a <code class="literal">\0</code> character.
</p>
<a name="boost_regex.syntax.perl_syntax.equivalence_classes"></a><h6>
<a name="id651670"></a>
<a name="id694268"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.equivalence_classes">Equivalence
classes</a>
</h6>
@ -480,7 +480,7 @@
or even all locales on one platform.
</p>
<a name="boost_regex.syntax.perl_syntax.escaped_characters"></a><h6>
<a name="id651718"></a>
<a name="id694316"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.escaped_characters">Escaped
Characters</a>
</h6>
@ -492,7 +492,7 @@
is <span class="emphasis"><em>not</em></span> a "word" character.
</p>
<a name="boost_regex.syntax.perl_syntax.combinations"></a><h6>
<a name="id651786"></a>
<a name="id694384"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.combinations">Combinations</a>
</h6>
<p>
@ -500,7 +500,7 @@
<code class="literal">[[:digit:]a-c[.NUL.]]</code>.
</p>
<a name="boost_regex.syntax.perl_syntax.escapes"></a><h5>
<a name="id651808"></a>
<a name="id694406"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.escapes">Escapes</a>
</h5>
<p>
@ -692,7 +692,7 @@
</tbody>
</table></div>
<a name="boost_regex.syntax.perl_syntax._quot_single_character_quot__character_classes_"></a><h6>
<a name="id653698"></a>
<a name="id696296"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax._quot_single_character_quot__character_classes_">"Single
character" character classes:</a>
</h6>
@ -894,7 +894,7 @@
</tbody>
</table></div>
<a name="boost_regex.syntax.perl_syntax.character_properties"></a><h6>
<a name="id654298"></a>
<a name="id696896"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.character_properties">Character
Properties</a>
</h6>
@ -1002,7 +1002,7 @@
as does <code class="literal">\p{digit}</code>.
</p>
<a name="boost_regex.syntax.perl_syntax.word_boundaries"></a><h6>
<a name="id654587"></a>
<a name="id697185"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.word_boundaries">Word Boundaries</a>
</h6>
<p>
@ -1021,7 +1021,7 @@
<code class="literal">\B</code> Matches only when not at a word boundary.
</p>
<a name="boost_regex.syntax.perl_syntax.buffer_boundaries"></a><h6>
<a name="id654638"></a>
<a name="id697237"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.buffer_boundaries">Buffer boundaries</a>
</h6>
<p>
@ -1046,7 +1046,7 @@
to the regular expression <code class="literal">\n*\z</code>
</p>
<a name="boost_regex.syntax.perl_syntax.continuation_escape"></a><h6>
<a name="id654679"></a>
<a name="id697278"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.continuation_escape">Continuation
Escape</a>
</h6>
@ -1058,7 +1058,7 @@
one ended.
</p>
<a name="boost_regex.syntax.perl_syntax.quoting_escape"></a><h6>
<a name="id654701"></a>
<a name="id697299"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.quoting_escape">Quoting escape</a>
</h6>
<p>
@ -1071,7 +1071,7 @@
<span class="special">\*+</span><span class="identifier">aaa</span>
</pre>
<a name="boost_regex.syntax.perl_syntax.unicode_escapes"></a><h6>
<a name="id654748"></a>
<a name="id697346"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.unicode_escapes">Unicode escapes</a>
</h6>
<p>
@ -1081,7 +1081,7 @@
followed by a sequence of zero or more combining characters.
</p>
<a name="boost_regex.syntax.perl_syntax.matching_line_endings"></a><h6>
<a name="id654774"></a>
<a name="id697372"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.matching_line_endings">Matching
Line Endings</a>
</h6>
@ -1090,7 +1090,7 @@
sequence; specifically, it is identical to the expression <code class="literal">(?&gt;\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])</code>.
</p>
<a name="boost_regex.syntax.perl_syntax.keeping_back_some_text"></a><h6>
<a name="id654800"></a>
<a name="id697399"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.keeping_back_some_text">Keeping
back some text</a>
</h6>
@ -1105,7 +1105,7 @@
This can be used to simulate variable width lookbehind assertions.
</p>
<a name="boost_regex.syntax.perl_syntax.any_other_escape"></a><h6>
<a name="id654830"></a>
<a name="id697429"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.any_other_escape">Any other
escape</a>
</h6>
@ -1114,7 +1114,7 @@
\@ matches a literal '@'.
</p>
<a name="boost_regex.syntax.perl_syntax.perl_extended_patterns"></a><h5>
<a name="id654847"></a>
<a name="id697446"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.perl_extended_patterns">Perl
Extended Patterns</a>
</h5>
@ -1123,7 +1123,7 @@
<code class="literal">(?</code>.
</p>
<a name="boost_regex.syntax.perl_syntax.named_subexpressions"></a><h6>
<a name="id654869"></a>
<a name="id697467"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.named_subexpressions">Named
Subexpressions</a>
</h6>
@ -1145,14 +1145,14 @@
format string for search and replace operations, or in the <a class="link" href="../ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a> member functions.
</p>
<a name="boost_regex.syntax.perl_syntax.comments"></a><h6>
<a name="id654964"></a>
<a name="id697562"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.comments">Comments</a>
</h6>
<p>
<code class="literal">(?# ... )</code> is treated as a comment; its contents are ignored.
</p>
<a name="boost_regex.syntax.perl_syntax.modifiers"></a><h6>
<a name="id654986"></a>
<a name="id697585"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.modifiers">Modifiers</a>
</h6>
<p>
@ -1166,7 +1166,7 @@
pattern only.
</p>
<a name="boost_regex.syntax.perl_syntax.non_marking_groups"></a><h6>
<a name="id655021"></a>
<a name="id697620"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.non_marking_groups">Non-marking
groups</a>
</h6>
@ -1175,7 +1175,7 @@
an additional sub-expression.
</p>
<a name="boost_regex.syntax.perl_syntax.branch_reset"></a><h6>
<a name="id655043"></a>
<a name="id697641"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.branch_reset">Branch reset</a>
</h6>
<p>
@ -1197,7 +1197,7 @@
# 1 2 2 3 2 3 4
</pre>
<a name="boost_regex.syntax.perl_syntax.lookahead"></a><h6>
<a name="id655080"></a>
<a name="id697678"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.lookahead">Lookahead</a>
</h6>
<p>
@ -1220,7 +1220,7 @@
could be used to validate the password.
</p>
<a name="boost_regex.syntax.perl_syntax.lookbehind"></a><h6>
<a name="id655154"></a>
<a name="id697753"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.lookbehind">Lookbehind</a>
</h6>
<p>
@ -1234,7 +1234,7 @@
(pattern must be of fixed length).
</p>
<a name="boost_regex.syntax.perl_syntax.independent_sub_expressions"></a><h6>
<a name="id655187"></a>
<a name="id697785"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.independent_sub_expressions">Independent
sub-expressions</a>
</h6>
@ -1246,8 +1246,32 @@
be considered, if this doesn't allow the expression as a whole to match then
no match is found at all.
</p>
<a name="boost_regex.syntax.perl_syntax.recursive_expressions"></a><h6>
<a name="id697816"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions">Recursive
Expressions</a>
</h6>
<p>
<code class="literal">(?<span class="emphasis"><em>N</em></span>) (?-<span class="emphasis"><em>N</em></span>) (?+<span class="emphasis"><em>N</em></span>)
(?R) (?0)</code>
</p>
<p>
<code class="literal">(?R)</code> and <code class="literal">(?0)</code> recurse to the start
of the entire pattern.
</p>
<p>
<code class="literal">(?<span class="emphasis"><em>N</em></span>)</code> executes sub-expression <span class="emphasis"><em>N</em></span>
recursively, for example <code class="literal">(?2)</code> will recurse to sub-expression
2.
</p>
<p>
<code class="literal">(?-<span class="emphasis"><em>N</em></span>)</code> and <code class="literal">(?+<span class="emphasis"><em>N</em></span>)</code>
are relative recursions, so for example <code class="literal">(?-1)</code> recurses
to the last sub-expression to be declared, and <code class="literal">(?+1)</code> recurses
to the next sub-expression to be declared.
</p>
<a name="boost_regex.syntax.perl_syntax.conditional_expressions"></a><h6>
<a name="id655218"></a>
<a name="id697914"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.conditional_expressions">Conditional
Expressions</a>
</h6>
@ -1261,12 +1285,35 @@
if the <span class="emphasis"><em>condition</em></span> is true, otherwise fails.
</p>
<p>
<span class="emphasis"><em>condition</em></span> may be either a forward lookahead assert,
or the index of a marked sub-expression (the condition becomes true if the
sub-expression has been matched).
<span class="emphasis"><em>condition</em></span> may be either: a forward lookahead assert,
the index of a marked sub-expression (the condition becomes true if the sub-expression
has been matched), or the index of a recursion (the condition becomes true
if we are executing directly inside the specified recursion).
</p>
<p>
Here is a summary of the possible predicates:
</p>
<div class="itemizedlist"><ul type="disc">
<li>
<code class="literal">(?(?=assert)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span>
if the forward look-ahead assert matches, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
</li>
<li>
<code class="literal">(?(?!assert)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span>
if the forward look-ahead assert does not match, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
</li>
<li>
<code class="literal">(?(R)yes-pattern|no-pattern)</code> Executes <span class="emphasis"><em>yes-pattern</em></span>
if we are executing inside a recursion, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
</li>
<li>
<code class="literal">(?(R<span class="emphasis"><em>N</em></span>)yes-pattern|no-pattern)</code> Executes
<span class="emphasis"><em>yes-pattern</em></span> if we are executing inside a recursion
to sub-expression <span class="emphasis"><em>N</em></span>, otherwise executes <span class="emphasis"><em>no-pattern</em></span>.
</li>
</ul></div>
<a name="boost_regex.syntax.perl_syntax.operator_precedence"></a><h5>
<a name="id655273"></a>
<a name="id698047"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.operator_precedence">Operator
precedence</a>
</h5>
@ -1301,7 +1348,7 @@
</li>
</ol></div>
<a name="boost_regex.syntax.perl_syntax.what_gets_matched"></a><h4>
<a name="id655363"></a>
<a name="id698137"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">What gets
matched</a>
</h4>
@ -1476,7 +1523,7 @@
</tbody>
</table></div>
<a name="boost_regex.syntax.perl_syntax.variations"></a><h4>
<a name="id656077"></a>
<a name="id698850"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.variations">Variations</a>
</h4>
<p>
@ -1485,7 +1532,7 @@
and <code class="literal">JScript</code></a> are all synonyms for <code class="literal">perl</code>.
</p>
<a name="boost_regex.syntax.perl_syntax.options"></a><h4>
<a name="id656124"></a>
<a name="id698897"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.options">Options</a>
</h4>
<p>
@ -1497,7 +1544,7 @@
are to be applied.
</p>
<a name="boost_regex.syntax.perl_syntax.pattern_modifiers"></a><h4>
<a name="id656172"></a>
<a name="id698945"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.pattern_modifiers">Pattern
Modifiers</a>
</h4>
@ -1509,7 +1556,7 @@
and <code class="literal">no_mod_s</code></a>.
</p>
<a name="boost_regex.syntax.perl_syntax.references"></a><h4>
<a name="id656224"></a>
<a name="id698998"></a>
<a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.references">References</a>
</h4>
<p>