Fix gcc warnings from ICU wrappers.

Add optional support for marked sub-expression location information.
Add support for ${n} in format replacement text.
Fixes #2556.
Fixes #2269.
Fixes #2514.

[SVN r50370]
John Maddock
2008-12-23 11:46:00 +00:00
parent c997a1fcc6
commit b4152cd74d
94 changed files with 1344 additions and 1068 deletions

@ -3,8 +3,8 @@
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>POSIX Basic Regular Expression Syntax</title>
<link rel="stylesheet" href="../../../../../../doc/html/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot_2006-12-17_0120">
<link rel="start" href="../../index.html" title="Boost.Regex">
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot_8125">
<link rel="home" href="../../index.html" title="Boost.Regex">
<link rel="up" href="../syntax.html" title="Regular Expression Syntax">
<link rel="prev" href="basic_extended.html" title="POSIX Extended Regular Expression Syntax">
<link rel="next" href="character_classes.html" title="Character Class Names">
@ -24,18 +24,18 @@
</div>
<div class="section" lang="en">
<div class="titlepage"><div><div><h3 class="title">
<a name="boost_regex.syntax.basic_syntax"></a><a href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax"> POSIX Basic Regular
<a name="boost_regex.syntax.basic_syntax"></a><a class="link" href="basic_syntax.html" title="POSIX Basic Regular Expression Syntax"> POSIX Basic Regular
Expression Syntax</a>
</h3></div></div></div>
<a name="boost_regex.syntax.basic_syntax.synopsis"></a><h4>
<a name="id518850"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.synopsis">Synopsis</a>
<a name="id546330"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.synopsis">Synopsis</a>
</h4>
<p>
The POSIX-Basic regular expression syntax is used by the Unix utility <code class="computeroutput"><span class="identifier">sed</span></code>, and variations are used by <code class="computeroutput"><span class="identifier">grep</span></code> and <code class="computeroutput"><span class="identifier">emacs</span></code>.
You can construct POSIX basic regular expressions in Boost.Regex by passing
the flag <code class="computeroutput"><span class="identifier">basic</span></code> to the regex
constructor (see <a href="../ref/syntax_option_type.html" title="syntax_option_type"><code class="computeroutput"><span class="identifier">syntax_option_type</span></code></a>), for example:
constructor (see <a class="link" href="../ref/syntax_option_type.html" title="syntax_option_type"><code class="computeroutput"><span class="identifier">syntax_option_type</span></code></a>), for example:
</p>
<pre class="programlisting"><span class="comment">// e1 is a case sensitive POSIX-Basic expression:
</span><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e1</span><span class="special">(</span><span class="identifier">my_expression</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">basic</span><span class="special">);</span>
@ -45,8 +45,8 @@
<a name="boost_regex.posix_basic"></a><p>
</p>
<a name="boost_regex.syntax.basic_syntax.posix_basic_syntax"></a><h4>
<a name="id519142"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.posix_basic_syntax">POSIX
<a name="id546622"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.posix_basic_syntax">POSIX
Basic Syntax</a>
</h4>
<p>
@ -55,8 +55,8 @@
</p>
<pre class="programlisting">.[\*^$</pre>
<a name="boost_regex.syntax.basic_syntax.wildcard_"></a><h5>
<a name="id519181"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.wildcard_">Wildcard:</a>
<a name="id546661"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.wildcard_">Wildcard:</a>
</h5>
<p>
The single character '.' when used outside of a character set will match
@ -73,8 +73,8 @@
</li>
</ul></div>
<a name="boost_regex.syntax.basic_syntax.anchors_"></a><h5>
<a name="id519250"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.anchors_">Anchors:</a>
<a name="id546729"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.anchors_">Anchors:</a>
</h5>
<p>
A '^' character shall match the start of a line when used as the first character
@ -85,8 +85,8 @@
of an expression, or the last character of a sub-expression.
</p>
<a name="boost_regex.syntax.basic_syntax.marked_sub_expressions_"></a><h5>
<a name="id519286"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.marked_sub_expressions_">Marked
<a name="id546766"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.marked_sub_expressions_">Marked
sub-expressions:</a>
</h5>
<p>
@ -97,8 +97,8 @@
by a back-reference.
</p>
<a name="boost_regex.syntax.basic_syntax.repeats_"></a><h5>
<a name="id519343"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.repeats_">Repeats:</a>
<a name="id546822"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.repeats_">Repeats:</a>
</h5>
<p>
Any atom (a single character, a marked sub-expression, or a character class)
@ -155,8 +155,8 @@ aaaa
to.
</p>
<a name="boost_regex.syntax.basic_syntax.back_references_"></a><h5>
<a name="id519587"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.back_references_">Back references:</a>
<a name="id547066"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.back_references_">Back references:</a>
</h5>
<p>
An escape character followed by a digit <span class="emphasis"><em>n</em></span>, where <span class="emphasis"><em>n</em></span>
@ -173,8 +173,8 @@ aaaa
</p>
<pre class="programlisting">aaabba</pre>
<a name="boost_regex.syntax.basic_syntax.character_sets_"></a><h5>
<a name="id519661"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_sets_">Character
<a name="id547141"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_sets_">Character
sets:</a>
</h5>
<p>
@ -186,8 +186,8 @@ aaaa
A bracket expression may contain any combination of the following:
</p>
<a name="boost_regex.syntax.basic_syntax.single_characters_"></a><h6>
<a name="id519697"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.single_characters_">Single
<a name="id547177"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.single_characters_">Single
characters:</a>
</h6>
<p>
@ -195,8 +195,8 @@ aaaa
or 'c'.
</p>
<a name="boost_regex.syntax.basic_syntax.character_ranges_"></a><h6>
<a name="id519747"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_ranges_">Character
<a name="id547227"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_ranges_">Character
ranges:</a>
</h6>
<p>
@ -211,8 +211,8 @@ aaaa
of the characters only.
</p>
<a name="boost_regex.syntax.basic_syntax.negation_"></a><h6>
<a name="id519839"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.negation_">Negation:</a>
<a name="id547319"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.negation_">Negation:</a>
</h6>
<p>
If the bracket-expression begins with the ^ character, then it matches the
@ -220,18 +220,18 @@ aaaa
range a-c.
</p>
<a name="boost_regex.syntax.basic_syntax.character_classes_"></a><h6>
<a name="id519900"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_classes_">Character
<a name="id547380"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.character_classes_">Character
classes:</a>
</h6>
<p>
An expression of the form <code class="computeroutput"><span class="special">[[:</span><span class="identifier">name</span><span class="special">:]]</span></code>
matches the named character class "name", for example <code class="computeroutput"><span class="special">[[:</span><span class="identifier">lower</span><span class="special">:]]</span></code> matches any lower case character. See
<a href="character_classes.html" title="Character Class Names">character class names</a>.
<a class="link" href="character_classes.html" title="Character Class Names">character class names</a>.
</p>
<a name="boost_regex.syntax.basic_syntax.collating_elements_"></a><h6>
<a name="id519983"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.collating_elements_">Collating
<a name="id547463"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.collating_elements_">Collating
Elements:</a>
</h6>
<p>
@ -255,19 +255,19 @@ aaaa
</p>
<pre class="programlisting">[[.NUL.]]</pre>
<p>
matches a 'NUL' character. See <a href="collating_names.html" title="Collating Names">collating
matches a 'NUL' character. See <a class="link" href="collating_names.html" title="Collating Names">collating
element names</a>.
</p>
<a name="boost_regex.syntax.basic_syntax.equivalence_classes_"></a><h6>
<a name="id520132"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.equivalence_classes_">Equivalence
<a name="id547611"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.equivalence_classes_">Equivalence
classes:</a>
</h6>
<p>
An expression of the form <code class="computeroutput"><span class="special">[[=</span><span class="identifier">col</span><span class="special">=]]</span></code>,
matches any character or collating element whose primary sort key is the
same as that for collating element <span class="emphasis"><em>col</em></span>, as with collating
elements the name <span class="emphasis"><em>col</em></span> may be a <a href="collating_names.html" title="Collating Names">collating
elements the name <span class="emphasis"><em>col</em></span> may be a <a class="link" href="collating_names.html" title="Collating Names">collating
symbolic name</a>. A primary sort key is one that ignores case, accentuation,
or locale-specific tailorings; so for example <code class="computeroutput"><span class="special">[[=</span><span class="identifier">a</span><span class="special">=]]</span></code> matches
any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation
@ -276,16 +276,16 @@ aaaa
or even all locales on one platform.
</p>
<a name="boost_regex.syntax.basic_syntax.combinations_"></a><h6>
<a name="id520236"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.combinations_">Combinations:</a>
<a name="id547716"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.combinations_">Combinations:</a>
</h6>
<p>
All of the above can be combined in one character set declaration, for example:
<code class="computeroutput"><span class="special">[[:</span><span class="identifier">digit</span><span class="special">:]</span><span class="identifier">a</span><span class="special">-</span><span class="identifier">c</span><span class="special">[.</span><span class="identifier">NUL</span><span class="special">.]].</span></code>
</p>
<a name="boost_regex.syntax.basic_syntax.escapes"></a><h5>
<a name="id520314"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.escapes">Escapes</a>
<a name="id547794"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.escapes">Escapes</a>
</h5>
<p>
With the exception of the escape sequences \{, \}, \(, and \), which are
@ -299,45 +299,45 @@ aaaa
will match either a literal '\' or a '^'.
</p>
<a name="boost_regex.syntax.basic_syntax.what_gets_matched"></a><h4>
<a name="id520371"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.what_gets_matched">What Gets
<a name="id547851"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.what_gets_matched">What Gets
Matched</a>
</h4>
<p>
When there is more than one way to match a regular expression, the "best"
possible match is obtained using the <a href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest
possible match is obtained using the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">leftmost-longest
rule</a>.
</p>
<a name="boost_regex.syntax.basic_syntax.variations"></a><h4>
<a name="id520411"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.variations">Variations</a>
<a name="id547890"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.variations">Variations</a>
</h4>
<a name="boost_regex.grep_syntax"></a><p>
</p>
<a name="boost_regex.syntax.basic_syntax.grep"></a><h5>
<a name="id520443"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.grep">Grep</a>
<a name="id547923"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.grep">Grep</a>
</h5>
<p>
When an expression is compiled with the flag <code class="computeroutput"><span class="identifier">grep</span></code>
set, then the expression is treated as a newline separated list of <a href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic expressions</a>, a match
set, then the expression is treated as a newline separated list of <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic expressions</a>, a match
is found if any of the expressions in the list match, for example:
</p>
<pre class="programlisting"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"abc\ndef"</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span><span class="special">::</span><span class="identifier">grep</span><span class="special">);</span>
</pre>
<p>
will match either of the <a href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic
will match either of the <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic
expressions</a> "abc" or "def".
</p>
<p>
As its name suggests, this behavior is consistent with the Unix utility grep.
</p>
<a name="boost_regex.syntax.basic_syntax.emacs"></a><h5>
<a name="id520587"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.emacs">emacs</a>
<a name="id548067"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.emacs">emacs</a>
</h5>
<p>
In addition to the <a href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic features</a>
In addition to the <a class="link" href="basic_syntax.html#boost_regex.posix_basic">POSIX-Basic features</a>
the following characters are also special:
</p>
<div class="informaltable"><table class="table">
@ -606,29 +606,29 @@ aaaa
</table></div>
<p>
Finally, you should note that emacs style regular expressions are matched
according to the <a href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">Perl
according to the <a class="link" href="perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched">Perl
"depth first search" rules</a>. Emacs expressions are matched
this way because they contain Perl-like extensions that do not interact
well with the <a href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">POSIX-style
well with the <a class="link" href="leftmost_longest_rule.html" title="The Leftmost Longest Rule">POSIX-style
leftmost-longest rule</a>.
</p>
<a name="boost_regex.syntax.basic_syntax.options"></a><h4>
<a name="id521082"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.options">Options</a>
<a name="id548562"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.options">Options</a>
</h4>
<p>
There are a <a href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions">variety
There are a <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions">variety
of flags</a> that may be combined with the <code class="computeroutput"><span class="identifier">basic</span></code>
and <code class="computeroutput"><span class="identifier">grep</span></code> options when constructing
the regular expression, in particular note that the <a href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code>, <code class="computeroutput"><span class="identifier">no_char_classes</span></code>,
the regular expression, in particular note that the <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">newline_alt</span></code>, <code class="computeroutput"><span class="identifier">no_char_classes</span></code>,
<code class="computeroutput"><span class="identifier">no_intervals</span></code>, <code class="computeroutput"><span class="identifier">bk_plus_qm</span></code>
and <code class="computeroutput"><span class="identifier">bk_plus_vbar</span></code></a> options
all alter the syntax, while the <a href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code> and <code class="computeroutput"><span class="identifier">icase</span></code>
all alter the syntax, while the <a class="link" href="../ref/syntax_option_type/syntax_option_type_basic.html" title="Options for POSIX Basic Regular Expressions"><code class="computeroutput"><span class="identifier">collate</span></code> and <code class="computeroutput"><span class="identifier">icase</span></code>
options</a> modify how the case and locale sensitivity are to be applied.
</p>
<a name="boost_regex.syntax.basic_syntax.references"></a><h4>
<a name="id521255"></a>
<a href="basic_syntax.html#boost_regex.syntax.basic_syntax.references">References</a>
<a name="id548735"></a>
<a class="link" href="basic_syntax.html#boost_regex.syntax.basic_syntax.references">References</a>
</h4>
<p>
<a href="http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target="_top">IEEE