Add checked constructors to the Unicode iterators that need them.

Update icu support code to use the new checking-constructors.
Update tests to check the full Unicode character range (as of Unicode V6).
Add minimal docs describing the iterators.

[SVN r73271]
John Maddock
2011-07-21 10:01:09 +00:00
parent 03ef9626ba
commit d08bfeff25
89 changed files with 1426 additions and 1088 deletions


@ -3,7 +3,7 @@
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>POSIX Extended Regular Expression Syntax</title>
<link rel="stylesheet" href="../../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.74.0">
<meta name="generator" content="DocBook XSL Stylesheets V1.76.1">
<link rel="home" href="../../index.html" title="Boost.Regex">
<link rel="up" href="../syntax.html" title="Regular Expression Syntax">
<link rel="prev" href="perl_syntax.html" title="Perl Regular Expression Syntax">
@ -22,13 +22,13 @@
<div class="spirit-nav">
<a accesskey="p" href="perl_syntax.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../syntax.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="basic_syntax.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section" lang="en">
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="boost_regex.syntax.basic_extended"></a><a class="link" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">POSIX Extended Regular
Expression Syntax</a>
</h3></div></div></div>
<a name="boost_regex.syntax.basic_extended.synopsis"></a><h4>
<a name="id1000989"></a>
<a name="boost_regex.syntax.basic_extended.synopsis-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.synopsis">Synopsis</a>
</h4>
<p>
@ -38,13 +38,13 @@
the flag <code class="computeroutput"><span class="identifier">extended</span></code> to the
regex constructor, for example:
</p>
<pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Extended expression:
</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">);</span>
<span class="comment">// e2 is a case insensitive POSIX-Extended expression:
</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
<pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Extended expression:</span>
<span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">);</span>
<span class="comment">// e2 is a case insensitive POSIX-Extended expression:</span>
<span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e2</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">extended</span><span class="special">|</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">icase</span><span class="special">);</span>
</pre>
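<p>
A minimal sketch of the same idea as a reusable helper. It assumes the standard library's <code class="computeroutput">std::regex</code> as a stand-in for <code class="computeroutput">boost::regex</code> so that it builds without Boost; the helper name <code class="computeroutput">matches_extended</code> and the sample pattern in the note below are hypothetical and not part of Boost.Regex.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Boost.Regex API): returns true when the whole of
// `text` matches `pattern` compiled as a POSIX-Extended expression.
// std::regex is used here as a stand-in for boost::regex.
bool matches_extended(const std::string& pattern, const std::string& text,
                      bool ignore_case = false)
{
    std::regex::flag_type flags = std::regex::extended;
    if (ignore_case)
        flags |= std::regex::icase;  // mirrors extended|icase in the synopsis
    const std::regex e(pattern, flags);
    return std::regex_match(text, e);
}
```

<p>
Under these assumptions, <code class="computeroutput">matches_extended("ab+c", "ABBC")</code> should be false, while passing <code class="computeroutput">true</code> as the third argument should make the same call succeed.
</p>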
<a name="boost_regex.posix_extended_syntax"></a><a name="boost_regex.syntax.basic_extended.posix_extended_syntax"></a><h4>
<a name="id1001164"></a>
<a name="boost_regex.syntax.basic_extended.posix_extended_syntax-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.posix_extended_syntax">POSIX
Extended Syntax</a>
</h4>
@ -54,25 +54,25 @@
</p>
<pre class="programlisting">.[{}()\*+?|^$</pre>
<a name="boost_regex.syntax.basic_extended.wildcard_"></a><h5>
<a name="id1001186"></a>
<a name="boost_regex.syntax.basic_extended.wildcard_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.wildcard_">Wildcard:</a>
</h5>
<p>
The single character '.' when used outside of a character set will match
any single character except:
</p>
<div class="itemizedlist"><ul type="disc">
<li>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
The NULL character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_null</span></code>
is passed to the matching algorithms.
</li>
<li>
<li class="listitem">
The newline character when the flag <code class="computeroutput"><span class="identifier">match_not_dot_newline</span></code>
is passed to the matching algorithms.
</li>
</ul></div>
<a name="boost_regex.syntax.basic_extended.anchors_"></a><h5>
<a name="id1001238"></a>
<a name="boost_regex.syntax.basic_extended.anchors_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.anchors_">Anchors:</a>
</h5>
<p>
@ -84,7 +84,7 @@
of an expression, or the last character of a sub-expression.
</p>
<a name="boost_regex.syntax.basic_extended.marked_sub_expressions_"></a><h5>
<a name="id1001260"></a>
<a name="boost_regex.syntax.basic_extended.marked_sub_expressions_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.marked_sub_expressions_">Marked
sub-expressions:</a>
</h5>
@ -96,7 +96,7 @@
to by a back-reference.
</p>
<a name="boost_regex.syntax.basic_extended.repeats_"></a><h5>
<a name="id1001294"></a>
<a name="boost_regex.syntax.basic_extended.repeats_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.repeats_">Repeats:</a>
</h5>
<p>
@ -182,7 +182,7 @@ cab
operator to be applied to.
</p>
<a name="boost_regex.syntax.basic_extended.back_references_"></a><h5>
<a name="id1001600"></a>
<a name="boost_regex.syntax.basic_extended.back_references_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.back_references_">Back references:</a>
</h5>
<p>
@ -212,7 +212,7 @@ cab
</p></td></tr>
</table></div>
<a name="boost_regex.syntax.basic_extended.alternation"></a><h5>
<a name="id1001664"></a>
<a name="boost_regex.syntax.basic_extended.alternation-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.alternation">Alternation</a>
</h5>
<p>
@ -225,7 +225,7 @@ cab
will match either of "abd" or "abef".
</p>
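<p>
The grouped alternation described above can be sketched as follows. The expression itself is not visible in this hunk, so the pattern <code class="computeroutput">ab(d|ef)</code> is an assumption inferred from the two strings it is said to match; <code class="computeroutput">std::regex</code> stands in for <code class="computeroutput">boost::regex</code>, and the function name is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match against an assumed pattern in which
// parentheses limit the scope of the alternation to "d" or "ef".
bool matches_abd_or_abef(const std::string& text)
{
    static const std::regex e("ab(d|ef)", std::regex::extended);
    return std::regex_match(text, e);
}
```

<p>
Without the parentheses, <code class="computeroutput">abd|ef</code> would instead be read as a choice between "abd" and "ef", because alternation has the lowest precedence.
</p>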
<a name="boost_regex.syntax.basic_extended.character_sets_"></a><h5>
<a name="id1001731"></a>
<a name="boost_regex.syntax.basic_extended.character_sets_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_sets_">Character
sets:</a>
</h5>
@ -238,7 +238,7 @@ cab
A bracket expression may contain any combination of the following:
</p>
<a name="boost_regex.syntax.basic_extended.single_characters_"></a><h6>
<a name="id1001751"></a>
<a name="boost_regex.syntax.basic_extended.single_characters_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.single_characters_">Single
characters:</a>
</h6>
@ -247,7 +247,7 @@ cab
or 'c'.
</p>
<a name="boost_regex.syntax.basic_extended.character_ranges_"></a><h6>
<a name="id1001782"></a>
<a name="boost_regex.syntax.basic_extended.character_ranges_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_ranges_">Character
ranges:</a>
</h6>
@ -263,7 +263,7 @@ cab
the code points of the characters only.
</p>
<a name="boost_regex.syntax.basic_extended.negation_"></a><h6>
<a name="id1001844"></a>
<a name="boost_regex.syntax.basic_extended.negation_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.negation_">Negation:</a>
</h6>
<p>
@ -272,7 +272,7 @@ cab
range <code class="computeroutput"><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span></code>.
</p>
<a name="boost_regex.syntax.basic_extended.character_classes_"></a><h6>
<a name="id1001898"></a>
<a name="boost_regex.syntax.basic_extended.character_classes_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_classes_">Character
classes:</a>
</h6>
@ -282,7 +282,7 @@ cab
<a class="link" href="character_classes.html" title="Character Class Names">character class names</a>.
</p>
<a name="boost_regex.syntax.basic_extended.collating_elements_"></a><h6>
<a name="id1001949"></a>
<a name="boost_regex.syntax.basic_extended.collating_elements_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.collating_elements_">Collating
Elements:</a>
</h6>
@ -310,7 +310,7 @@ cab
matches a NUL character.
</p>
<a name="boost_regex.syntax.basic_extended.equivalence_classes_"></a><h6>
<a name="id1002051"></a>
<a name="boost_regex.syntax.basic_extended.equivalence_classes_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.equivalence_classes_">Equivalence
classes:</a>
</h6>
@ -327,7 +327,7 @@ cab
or even all locales on one platform.
</p>
<a name="boost_regex.syntax.basic_extended.combinations_"></a><h6>
<a name="id1002109"></a>
<a name="boost_regex.syntax.basic_extended.combinations_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.combinations_">Combinations:</a>
</h6>
<p>
@ -335,21 +335,21 @@ cab
<code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]]</span></code>.
</p>
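<p>
A small sketch of a combined bracket expression, assuming <code class="computeroutput">std::regex</code> with the extended grammar as a stand-in for <code class="computeroutput">boost::regex</code>. It mixes a character class with a range; the collating element <code class="computeroutput">[.NUL.]</code> from the example above is deliberately left out, because support for symbolic collating names may vary between standard library implementations. The function name is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `text` is non-empty and every character is
// either a digit (character class) or one of 'a'..'c' (character range).
bool only_digits_or_a_to_c(const std::string& text)
{
    static const std::regex e("[[:digit:]a-c]+", std::regex::extended);
    return std::regex_match(text, e);
}
```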
<a name="boost_regex.syntax.basic_extended.escapes"></a><h5>
<a name="id1002162"></a>
<a name="boost_regex.syntax.basic_extended.escapes-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes">Escapes</a>
</h5>
<p>
The POSIX standard defines no escape sequences for POSIX-Extended regular
expressions, except that:
</p>
<div class="itemizedlist"><ul type="disc">
<li>
<div class="itemizedlist"><ul class="itemizedlist" type="disc">
<li class="listitem">
Any special character preceded by an escape shall match itself.
</li>
<li>
<li class="listitem">
The effect of any ordinary character being preceded by an escape is undefined.
</li>
<li>
<li class="listitem">
An escape inside a character class declaration shall match itself: in
other words the escape character is not "special" inside a
character class declaration; so <code class="computeroutput"><span class="special">[\^]</span></code>
@ -361,7 +361,7 @@ cab
extensions are also supported by Boost.Regex:
</p>
<a name="boost_regex.syntax.basic_extended.escapes_matching_a_specific_character"></a><h6>
<a name="id1002214"></a>
<a name="boost_regex.syntax.basic_extended.escapes_matching_a_specific_character-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.escapes_matching_a_specific_character">Escapes
matching a specific character</a>
</h6>
@ -550,7 +550,7 @@ cab
</tbody>
</table></div>
<a name="boost_regex.syntax.basic_extended._quot_single_character_quot__character_classes_"></a><h6>
<a name="id1002522"></a>
<a name="boost_regex.syntax.basic_extended._quot_single_character_quot__character_classes_-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended._quot_single_character_quot__character_classes_">"Single
character" character classes:</a>
</h6>
@ -704,7 +704,7 @@ cab
</tbody>
</table></div>
<a name="boost_regex.syntax.basic_extended.character_properties"></a><h6>
<a name="id1003023"></a>
<a name="boost_regex.syntax.basic_extended.character_properties-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.character_properties">Character
Properties</a>
</h6>
@ -811,7 +811,7 @@ cab
matches any "digit" character, as does <code class="computeroutput"><span class="special">\</span><span class="identifier">p</span><span class="special">{</span><span class="identifier">digit</span><span class="special">}</span></code>.
</p>
<a name="boost_regex.syntax.basic_extended.word_boundaries"></a><h6>
<a name="id1003342"></a>
<a name="boost_regex.syntax.basic_extended.word_boundaries-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.word_boundaries">Word Boundaries</a>
</h6>
<p>
@ -886,7 +886,7 @@ cab
</tbody>
</table></div>
<a name="boost_regex.syntax.basic_extended.buffer_boundaries"></a><h6>
<a name="id1003502"></a>
<a name="boost_regex.syntax.basic_extended.buffer_boundaries-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.buffer_boundaries">Buffer
boundaries</a>
</h6>
@ -977,7 +977,7 @@ cab
</tbody>
</table></div>
<a name="boost_regex.syntax.basic_extended.continuation_escape"></a><h6>
<a name="id1003694"></a>
<a name="boost_regex.syntax.basic_extended.continuation_escape-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.continuation_escape">Continuation
Escape</a>
</h6>
@ -989,7 +989,7 @@ cab
match to start where the last one ended.
</p>
<a name="boost_regex.syntax.basic_extended.quoting_escape"></a><h6>
<a name="id1003722"></a>
<a name="boost_regex.syntax.basic_extended.quoting_escape-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.quoting_escape">Quoting
escape</a>
</h6>
@ -1003,7 +1003,7 @@ cab
<span class="special">\*+</span><span class="identifier">aaa</span>
</pre>
<a name="boost_regex.syntax.basic_extended.unicode_escapes"></a><h6>
<a name="id1003802"></a>
<a name="boost_regex.syntax.basic_extended.unicode_escapes-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.unicode_escapes">Unicode
escapes</a>
</h6>
@ -1054,7 +1054,7 @@ cab
</tbody>
</table></div>
<a name="boost_regex.syntax.basic_extended.any_other_escape"></a><h6>
<a name="id1003908"></a>
<a name="boost_regex.syntax.basic_extended.any_other_escape-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.any_other_escape">Any other
escape</a>
</h6>
@ -1063,44 +1063,44 @@ cab
\@ matches a literal '@'.
</p>
<a name="boost_regex.syntax.basic_extended.operator_precedence"></a><h5>
<a name="id1003925"></a>
<a name="boost_regex.syntax.basic_extended.operator_precedence-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.operator_precedence">Operator
precedence</a>
</h5>
<p>
The order of precedence of operators is as follows:
</p>
<div class="orderedlist"><ol type="1">
<li>
<div class="orderedlist"><ol class="orderedlist" type="1">
<li class="listitem">
Collation-related bracket symbols <code class="computeroutput"><span class="special">[==]</span>
<span class="special">[::]</span> <span class="special">[..]</span></code>
</li>
<li>
<li class="listitem">
Escaped characters <code class="computeroutput"><span class="special">\</span></code>
</li>
<li>
<li class="listitem">
Character set (bracket expression) <code class="computeroutput"><span class="special">[]</span></code>
</li>
<li>
<li class="listitem">
Grouping <code class="computeroutput"><span class="special">()</span></code>
</li>
<li>
<li class="listitem">
Single-character-ERE duplication <code class="computeroutput"><span class="special">*</span>
<span class="special">+</span> <span class="special">?</span>
<span class="special">{</span><span class="identifier">m</span><span class="special">,</span><span class="identifier">n</span><span class="special">}</span></code>
</li>
<li>
<li class="listitem">
Concatenation
</li>
<li>
<li class="listitem">
Anchoring ^$
</li>
<li>
<li class="listitem">
Alternation <code class="computeroutput"><span class="special">|</span></code>
</li>
</ol></div>
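<p>
The effect of the precedence list can be checked with a few whole-string matches. This is only an illustrative sketch: it uses <code class="computeroutput">std::regex</code> with the extended grammar as a stand-in for <code class="computeroutput">boost::regex</code>, and the helper name <code class="computeroutput">full_match</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match of `text` against `pattern`
// compiled as a POSIX-Extended expression.
bool full_match(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern, std::regex::extended);
    return std::regex_match(text, e);
}

// Duplication binds tighter than concatenation:
//   "ab*"   is 'a' followed by zero or more 'b', not a repeated "ab".
//   "(ab)*" uses grouping to repeat the whole sequence.
// Concatenation binds tighter than alternation:
//   "ab|cd"   is a choice between "ab" and "cd".
//   "a(b|c)d" uses grouping to limit the choice to the middle character.
```

<p>
For instance, <code class="computeroutput">full_match("ab*", "abab")</code> should fail while <code class="computeroutput">full_match("(ab)*", "abab")</code> should succeed.
</p>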
<a name="boost_regex.syntax.basic_extended.what_gets_matched"></a><h5>
<a name="id1004087"></a>
<a name="boost_regex.syntax.basic_extended.what_gets_matched-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.what_gets_matched">What
Gets Matched</a>
</h5>
@ -1110,11 +1110,11 @@ cab
rule</a>.
</p>
<a name="boost_regex.syntax.basic_extended.variations"></a><h4>
<a name="id1004108"></a>
<a name="boost_regex.syntax.basic_extended.variations-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.variations">Variations</a>
</h4>
<a name="boost_regex.syntax.basic_extended.egrep"></a><h5>
<a name="id1004122"></a>
<a name="boost_regex.syntax.basic_extended.egrep-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.egrep">Egrep</a>
</h5>
<p>
@ -1135,7 +1135,7 @@ cab
used with the -E option.
</p>
<a name="boost_regex.syntax.basic_extended.awk"></a><h5>
<a name="id1004224"></a>
<a name="boost_regex.syntax.basic_extended.awk-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.awk">awk</a>
</h5>
<p>
@ -1149,7 +1149,7 @@ cab
these by default anyway.
</p>
<a name="boost_regex.syntax.basic_extended.options"></a><h4>
<a name="id1004249"></a>
<a name="boost_regex.syntax.basic_extended.options-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.options">Options</a>
</h4>
<p>
@ -1162,7 +1162,7 @@ cab
modify how the case and locale sensitivity are to be applied.
</p>
<a name="boost_regex.syntax.basic_extended.references"></a><h4>
<a name="id1004327"></a>
<a name="boost_regex.syntax.basic_extended.references-heading"></a>
<a class="link" href="basic_extended.html#boost_regex.syntax.basic_extended.references">References</a>
</h4>
<p>
@ -1183,7 +1183,7 @@ cab
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 1998 -2010 John Maddock<p>
<td align="right"><div class="copyright-footer">Copyright &#169; 1998-2010 John Maddock<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>