< html >
< head >
< meta http-equiv = "Content-Type" content = "text/html; charset=ISO-8859-1" >
< title > POSIX Basic Regular Expression Syntax< / title >
< link rel = "stylesheet" href = "../../../../../../doc/html/boostbook.css" type = "text/css" >
< meta name = "generator" content = "DocBook XSL Stylesheets Vsnapshot_2006-12-17_0120" >
< link rel = "start" href = "../../index.html" title = "Boost.Regex" >
< link rel = "up" href = "../syntax.html" title = "Regular Expression Syntax" >
< link rel = "prev" href = "basic_extended.html" title = "POSIX Extended Regular Expression Syntax" >
< link rel = "next" href = "character_classes.html" title = "Character Class Names" >
< / head >
< body bgcolor = "white" text = "black" link = "#0000FF" vlink = "#840084" alink = "#0000FF" >
< table cellpadding = "2" width = "100%" > < tr >
< td valign = "top" > < img alt = "Boost C++ Libraries" width = "277" height = "86" src = "../../../../../../boost.png" > < / td >
< td align = "center" > < a href = "../../../../../../index.html" > Home< / a > < / td >
< td align = "center" > < a href = "../../../../../../libs/libraries.htm" > Libraries< / a > < / td >
< td align = "center" > < a href = "http://www.boost.org/users/people.html" > People< / a > < / td >
< td align = "center" > < a href = "http://www.boost.org/users/faq.html" > FAQ< / a > < / td >
< td align = "center" > < a href = "../../../../../../more/index.htm" > More< / a > < / td >
< / tr > < / table >
< hr >
< div class = "spirit-nav" >
< a accesskey = "p" href = "basic_extended.html" > < img src = "../../../../../../doc/html/images/prev.png" alt = "Prev" > < / a > < a accesskey = "u" href = "../syntax.html" > < img src = "../../../../../../doc/html/images/up.png" alt = "Up" > < / a > < a accesskey = "h" href = "../../index.html" > < img src = "../../../../../../doc/html/images/home.png" alt = "Home" > < / a > < a accesskey = "n" href = "character_classes.html" > < img src = "../../../../../../doc/html/images/next.png" alt = "Next" > < / a >
< / div >
< div class = "section" lang = "en" >
< div class = "titlepage" > < div > < div > < h3 class = "title" >
< a name = "boost_regex.syntax.basic_syntax" > < / a > < a href = "basic_syntax.html" title = "POSIX Basic Regular Expression Syntax" > POSIX Basic Regular
Expression Syntax< / a >
< / h3 > < / div > < / div > < / div >
< a name = "boost_regex.syntax.basic_syntax.synopsis" > < / a > < h4 >
< a name = "id518850" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.synopsis" > Synopsis< / a >
< / h4 >
< p >
The POSIX-Basic regular expression syntax is used by the Unix utility < code class = "computeroutput" > < span class = "identifier" > sed< / span > < / code > , and variations are used by < code class = "computeroutput" > < span class = "identifier" > grep< / span > < / code > and < code class = "computeroutput" > < span class = "identifier" > emacs< / span > < / code > .
You can construct POSIX basic regular expressions in Boost.Regex by passing
the flag < code class = "computeroutput" > < span class = "identifier" > basic< / span > < / code > to the regex
constructor (see < a href = "../ref/syntax_option_type.html" title = "syntax_option_type" > < code class = "computeroutput" > < span class = "identifier" > syntax_option_type< / span > < / code > < / a > ), for example:
< / p >
< pre class = "programlisting" > < span class = "comment" > // e1 is a case sensitive POSIX-Basic expression:
< / span > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "identifier" > e1< / span > < span class = "special" > (< / span > < span class = "identifier" > my_expression< / span > < span class = "special" > ,< / span > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "special" > ::< / span > < span class = "identifier" > basic< / span > < span class = "special" > );< / span >
< span class = "comment" > // e2 is a case-insensitive POSIX-Basic expression:
< / span > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "identifier" > e2< / span > < span class = "special" > (< / span > < span class = "identifier" > my_expression< / span > < span class = "special" > ,< / span > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "special" > ::< / span > < span class = "identifier" > basic< / span > < span class = "special" > |< / span > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "special" > ::< / span > < span class = "identifier" > icase< / span > < span class = "special" > );< / span >
< / pre >
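< p >
As a rough, compilable sketch of the construction above, the following uses < code class = "computeroutput" > std::regex< / code > from the C++ standard library in place of Boost.Regex, on the assumption that its < code class = "computeroutput" > basic< / code > and < code class = "computeroutput" > icase< / code > flags play the same role as the Boost flags of the same name. The helper < code class = "computeroutput" > matches_basic< / code > is a hypothetical name introduced only for illustration.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches the POSIX-Basic
// expression `expr`. std::regex is used as a stand-in for boost::regex.
bool matches_basic(const std::string& expr, const std::string& text,
                   bool ignore_case = false)
{
    std::regex::flag_type flags = std::regex::basic;
    if (ignore_case)
        flags = flags | std::regex::icase;   // add case-insensitive matching
    const std::regex e(expr, flags);
    return std::regex_match(text, e);
}
```

< p >
With this helper, the expression < code class = "computeroutput" > ab*c< / code > should match "ABBBC" only when the third argument is true.
< / p >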
< a name = "boost_regex.posix_basic" > < / a > < p >
< / p >
< a name = "boost_regex.syntax.basic_syntax.posix_basic_syntax" > < / a > < h4 >
< a name = "id519142" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.posix_basic_syntax" > POSIX
Basic Syntax< / a >
< / h4 >
< p >
In POSIX-Basic regular expressions, all characters match themselves except
for the following special characters:
< / p >
< pre class = "programlisting" > .[\*^$< / pre >
< a name = "boost_regex.syntax.basic_syntax.wildcard_" > < / a > < h5 >
< a name = "id519181" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.wildcard_" > Wildcard:< / a >
< / h5 >
< p >
The single character '.' when used outside of a character set will match
any single character except:
< / p >
< div class = "itemizedlist" > < ul type = "disc" >
< li >
The NUL character when the flag < code class = "computeroutput" > < span class = "identifier" > match_not_dot_null< / span > < / code >
is passed to the matching algorithms.
< / li >
< li >
The newline character when the flag < code class = "computeroutput" > < span class = "identifier" > match_not_dot_newline< / span > < / code >
is passed to the matching algorithms.
< / li >
< / ul > < / div >
< a name = "boost_regex.syntax.basic_syntax.anchors_" > < / a > < h5 >
< a name = "id519250" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.anchors_" > Anchors:< / a >
< / h5 >
< p >
A '^' character shall match the start of a line when used as the first character
of an expression, or the first character of a sub-expression.
< / p >
< p >
A '$' character shall match the end of a line when used as the last character
of an expression, or the last character of a sub-expression.
< / p >
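< p >
The anchor rules above can be tried out with a small search helper. This is only a sketch: it uses < code class = "computeroutput" > std::regex< / code > with the < code class = "computeroutput" > basic< / code > flag as a stand-in for Boost.Regex, and < code class = "computeroutput" > found_basic< / code > is a hypothetical name.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the POSIX-Basic expression `expr` matches
// somewhere inside `text`. std::regex stands in for boost::regex.
bool found_basic(const std::string& expr, const std::string& text)
{
    const std::regex e(expr, std::regex::basic);
    return std::regex_search(text, e);
}
```

< p >
Without an anchor, < code class = "computeroutput" > abc< / code > is found anywhere in the text; with a leading ^ it is found only at the start, and with a trailing $ only at the end.
< / p >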
< a name = "boost_regex.syntax.basic_syntax.marked_sub_expressions_" > < / a > < h5 >
< a name = "id519286" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.marked_sub_expressions_" > Marked
sub-expressions:< / a >
< / h5 >
< p >
A section beginning < code class = "computeroutput" > < span class = "special" > \(< / span > < / code > and ending
< code class = "computeroutput" > < span class = "special" > \)< / span > < / code > acts as a marked sub-expression.
Whatever matched the sub-expression is split out in a separate field by the
matching algorithms. Marked sub-expressions can also be repeated, or referred
to by a back-reference.
< / p >
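< p >
To illustrate how a marked sub-expression is reported as a separate field, here is a minimal sketch that again assumes < code class = "computeroutput" > std::regex< / code > with the < code class = "computeroutput" > basic< / code > flag as a stand-in for Boost.Regex. The helper < code class = "computeroutput" > capture_basic< / code > is hypothetical.
< / p >

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: matches the whole of `text` against the POSIX-Basic
// expression `expr` and returns what marked sub-expression `n` captured.
// Field 0 is the complete match. An empty string is returned when the
// text does not match or `n` is out of range.
std::string capture_basic(const std::string& expr, const std::string& text,
                          std::size_t n)
{
    const std::regex e(expr, std::regex::basic);
    std::smatch what;
    if (!std::regex_match(text, what, e) || n >= what.size())
        return std::string();
    return what[n].str();
}
```

< p >
For the expression < code class = "computeroutput" > \(a*\)b< / code > and the text "aaab", field 1 holds "aaa" and field 0 holds the whole match.
< / p >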
< a name = "boost_regex.syntax.basic_syntax.repeats_" > < / a > < h5 >
< a name = "id519343" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.repeats_" > Repeats:< / a >
< / h5 >
< p >
Any atom (a single character, a marked sub-expression, or a character class)
can be repeated with the * operator.
< / p >
< p >
For example < code class = "computeroutput" > < span class = "identifier" > a< / span > < span class = "special" > *< / span > < / code >
will match the letter a repeated zero or more times (an atom
repeated zero times matches an empty string), so the expression < code class = "computeroutput" > < span class = "identifier" > a< / span > < span class = "special" > *< / span > < span class = "identifier" > b< / span > < / code >
will match any of the following:
< / p >
< pre class = "programlisting" > b
ab
aaaaaaaab
< / pre >
< p >
An atom can also be repeated with a bounded repeat:
< / p >
< p >
< code class = "computeroutput" > < span class = "identifier" > a< / span > < span class = "special" > \{< / span > < span class = "identifier" > n< / span > < span class = "special" > \}< / span > < / code > Matches
'a' repeated exactly n times.
< / p >
< p >
< code class = "computeroutput" > < span class = "identifier" > a< / span > < span class = "special" > \{< / span > < span class = "identifier" > n< / span > < span class = "special" > ,\}< / span > < / code > Matches
'a' repeated n or more times.
< / p >
< p >
< code class = "computeroutput" > < span class = "identifier" > a< / span > < span class = "special" > \{< / span > < span class = "identifier" > n< / span > < span class = "special" > ,< / span > < span class = "identifier" > m< / span > < span class = "special" > \}< / span > < / code > Matches 'a' repeated between n and m times
inclusive.
< / p >
< p >
For example:
< / p >
< pre class = "programlisting" > ^a\{2,3\}$< / pre >
< p >
Will match either of:
< / p >
< pre class = "programlisting" > aa
aaa
< / pre >
< p >
But neither of:
< / p >
< pre class = "programlisting" > a
aaaa
< / pre >
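< p >
The bounded repeat forms can be checked with a short sketch. It assumes < code class = "computeroutput" > std::regex< / code > with the < code class = "computeroutput" > basic< / code > flag as a stand-in for Boost.Regex, and writes the braces in their escaped POSIX-Basic form; < code class = "computeroutput" > repeat_match< / code > is a hypothetical helper name.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches the POSIX-Basic
// expression `expr`. std::regex stands in for boost::regex.
bool repeat_match(const std::string& expr, const std::string& text)
{
    const std::regex e(expr, std::regex::basic);
    return std::regex_match(text, e);
}

// In C++ source each backslash of the expression is doubled:
//   "^a\\{2,3\\}$"  is the expression  ^a\{2,3\}$   (two or three a's)
//   "a\\{2\\}"      is the expression  a\{2\}       (exactly two a's)
//   "a\\{2,\\}"     is the expression  a\{2,\}      (two or more a's)
```

< p >
Note that the backslashes have to be doubled inside a C++ string literal, which is easy to overlook when copying expressions from this page.
< / p >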
< p >
It is an error to use a repeat operator, if the preceding construct can not
be repeated, for example:
< / p >
< pre class = "programlisting" > a\(*\)< / pre >
< p >
Will raise an error, as there is nothing for the * operator to be applied
to.
< / p >
< a name = "boost_regex.syntax.basic_syntax.back_references_" > < / a > < h5 >
< a name = "id519587" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.back_references_" > Back references:< / a >
< / h5 >
< p >
An escape character followed by a digit < span class = "emphasis" > < em > n< / em > < / span > , where < span class = "emphasis" > < em > n< / em > < / span >
is in the range 1-9, matches the same string that was matched by sub-expression
< span class = "emphasis" > < em > n< / em > < / span > . For example the expression:
< / p >
< pre class = "programlisting" > ^\(a*\)b*\1$< / pre >
< p >
Will match the string:
< / p >
< pre class = "programlisting" > aaabbaaa< / pre >
< p >
But not the string:
< / p >
< pre class = "programlisting" > aaabba< / pre >
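< p >
A back-reference can be exercised with the following sketch, which assumes < code class = "computeroutput" > std::regex< / code > with the < code class = "computeroutput" > basic< / code > flag as a stand-in for Boost.Regex. It uses the expression < code class = "computeroutput" > ^\(a*\)b*\1$< / code > , chosen so that the text must end with exactly the run of a's it started with; < code class = "computeroutput" > backref_match< / code > is a hypothetical name.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` consists of a run of a's, then any
// number of b's, then the same run of a's again.
bool backref_match(const std::string& text)
{
    // The expression is ^\(a*\)b*\1$ ; \1 must repeat whatever \(a*\) matched.
    static const std::regex e("^\\(a*\\)b*\\1$", std::regex::basic);
    return std::regex_match(text, e);
}
```

< p >
The text "aaabbaaa" is accepted because the trailing "aaa" repeats sub-expression 1, while "aaabba" is rejected because no choice of sub-expression 1 lets the remainder line up.
< / p >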
< a name = "boost_regex.syntax.basic_syntax.character_sets_" > < / a > < h5 >
< a name = "id519661" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.character_sets_" > Character
sets:< / a >
< / h5 >
< p >
A character set is a bracket-expression starting with [ and ending with ];
it defines a set of characters, and matches any single character that is
a member of that set.
< / p >
< p >
A bracket expression may contain any combination of the following:
< / p >
< a name = "boost_regex.syntax.basic_syntax.single_characters_" > < / a > < h6 >
< a name = "id519697" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.single_characters_" > Single
characters:< / a >
< / h6 >
< p >
For example < code class = "computeroutput" > < span class = "special" > [< / span > < span class = "identifier" > abc< / span > < span class = "special" > ]< / span > < / code > , will match any of the characters 'a', 'b',
or 'c'.
< / p >
< a name = "boost_regex.syntax.basic_syntax.character_ranges_" > < / a > < h6 >
< a name = "id519747" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.character_ranges_" > Character
ranges:< / a >
< / h6 >
< p >
For example < code class = "computeroutput" > < span class = "special" > [< / span > < span class = "identifier" > a< / span > < span class = "special" > -< / span > < span class = "identifier" > c< / span > < span class = "special" > ]< / span > < / code >
will match any single character in the range 'a' to 'c'. By default, for
POSIX-Basic regular expressions, a character < span class = "emphasis" > < em > x< / em > < / span > is within
the range < span class = "emphasis" > < em > y< / em > < / span > to < span class = "emphasis" > < em > z< / em > < / span > , if it collates
within that range; this results in locale specific behavior. This behavior
can be turned off by unsetting the < code class = "computeroutput" > < span class = "identifier" > collate< / span > < / code >
option flag when constructing the regular expression - in which case whether
a character appears within a range is determined by comparing the code points
of the characters only.
< / p >
< a name = "boost_regex.syntax.basic_syntax.negation_" > < / a > < h6 >
< a name = "id519839" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.negation_" > Negation:< / a >
< / h6 >
< p >
If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example < code class = "computeroutput" > < span class = "special" > [^< / span > < span class = "identifier" > a< / span > < span class = "special" > -< / span > < span class = "identifier" > c< / span > < span class = "special" > ]< / span > < / code > matches any character that is not in the
range a-c.
< / p >
< a name = "boost_regex.syntax.basic_syntax.character_classes_" > < / a > < h6 >
< a name = "id519900" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.character_classes_" > Character
classes:< / a >
< / h6 >
< p >
An expression of the form < code class = "computeroutput" > < span class = "special" > [[:< / span > < span class = "identifier" > name< / span > < span class = "special" > :]]< / span > < / code >
matches the named character class "name", for example < code class = "computeroutput" > < span class = "special" > [[:< / span > < span class = "identifier" > lower< / span > < span class = "special" > :]]< / span > < / code > matches any lower case character. See
< a href = "character_classes.html" title = "Character Class Names" > character class names< / a > .
< / p >
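< p >
Single characters, ranges, negation and named classes can be tried with a sketch like the one below. It assumes < code class = "computeroutput" > std::regex< / code > with the < code class = "computeroutput" > basic< / code > flag as a stand-in for Boost.Regex; note that the standard library compares range end points by code point unless its own < code class = "computeroutput" > collate< / code > flag is set, so the locale-specific behaviour described above is not shown. The helper < code class = "computeroutput" > set_match< / code > is hypothetical.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches the POSIX-Basic
// bracket expression `expr`. No collate flag is passed, so ranges are
// compared by code point in this sketch.
bool set_match(const std::string& expr, const std::string& text)
{
    const std::regex e(expr, std::regex::basic);
    return std::regex_match(text, e);
}
```

< p >
For example, < code class = "computeroutput" > [a-c]< / code > accepts "b" and rejects "d", < code class = "computeroutput" > [^a-c]< / code > does the opposite, and < code class = "computeroutput" > [[:lower:]]< / code > accepts "q" but not "Q".
< / p >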
< a name = "boost_regex.syntax.basic_syntax.collating_elements_" > < / a > < h6 >
< a name = "id519983" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.collating_elements_" > Collating
Elements:< / a >
< / h6 >
< p >
An expression of the form < code class = "computeroutput" > < span class = "special" > [[.< / span > < span class = "identifier" > col< / span > < span class = "special" > .]]< / span > < / code > matches
the collating element < span class = "emphasis" > < em > col< / em > < / span > . A collating element is any
single character, or any sequence of characters that collates as a single
unit. Collating elements may also be used as the end point of a range, for
example: < code class = "computeroutput" > < span class = "special" > [[.< / span > < span class = "identifier" > ae< / span > < span class = "special" > .]-< / span > < span class = "identifier" > c< / span > < span class = "special" > ]< / span > < / code >
matches the character sequence "ae", plus any single character
in the range "ae"-c, assuming that "ae" is treated as
a single collating element in the current locale.
< / p >
< p >
Collating elements may be used in place of escapes (which are not normally
allowed inside character sets), for example < code class = "computeroutput" > < span class = "special" > [[.^.]< / span > < span class = "identifier" > abc< / span > < span class = "special" > ]< / span > < / code > would
match any one of the characters 'a', 'b', 'c' or '^'.
< / p >
< p >
As an extension, a collating element may also be specified via its symbolic
name, for example:
< / p >
< pre class = "programlisting" > [[.NUL.]]< / pre >
< p >
matches a 'NUL' character. See < a href = "collating_names.html" title = "Collating Names" > collating
element names< / a > .
< / p >
< a name = "boost_regex.syntax.basic_syntax.equivalence_classes_" > < / a > < h6 >
< a name = "id520132" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.equivalence_classes_" > Equivalence
classes:< / a >
< / h6 >
< p >
An expression of the form < code class = "computeroutput" > < span class = "special" > [[=< / span > < span class = "identifier" > col< / span > < span class = "special" > =]]< / span > < / code >
matches any character or collating element whose primary sort key is the
same as that for collating element < span class = "emphasis" > < em > col< / em > < / span > . As with collating
elements, the name < span class = "emphasis" > < em > col< / em > < / span > may be a < a href = "collating_names.html" title = "Collating Names" > collating
symbolic name< / a > . A primary sort key is one that ignores case, accentuation,
or locale-specific tailorings; so for example < code class = "computeroutput" > < span class = "special" > [[=< / span > < span class = "identifier" > a< / span > < span class = "special" > =]]< / span > < / code > matches
any of the characters: a, &Agrave;, &Aacute;, &Acirc;, &Atilde;, &Auml;, &Aring;, A, &agrave;, &aacute;, &acirc;, &atilde;, &auml; and &aring;. Unfortunately, the implementation
of this relies on the platform's collation and localisation support,
so this feature cannot be relied upon to work portably across all platforms,
or even all locales on one platform.
< / p >
< a name = "boost_regex.syntax.basic_syntax.combinations_" > < / a > < h6 >
< a name = "id520236" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.combinations_" > Combinations:< / a >
< / h6 >
< p >
All of the above can be combined in one character set declaration, for example:
< code class = "computeroutput" > < span class = "special" > [[:< / span > < span class = "identifier" > digit< / span > < span class = "special" > :]< / span > < span class = "identifier" > a< / span > < span class = "special" > -< / span > < span class = "identifier" > c< / span > < span class = "special" > [.< / span > < span class = "identifier" > NUL< / span > < span class = "special" > .]].< / span > < / code >
< / p >
< a name = "boost_regex.syntax.basic_syntax.escapes" > < / a > < h5 >
< a name = "id520314" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.escapes" > Escapes< / a >
< / h5 >
< p >
With the exception of the escape sequences \{, \}, \(, and \), which are
documented above, an escape followed by any character matches that character.
This can be used to make the special characters
< / p >
< pre class = "programlisting" > .[\*^$< / pre >
< p >
"ordinary". Note that the escape character loses its special meaning
inside a character set, so < code class = "computeroutput" > < span class = "special" > [\^]< / span > < / code >
will match either a literal '\' or a '^'.
< / p >
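< p >
The two escape rules above (an escape makes a special character ordinary, and the escape character is itself ordinary inside a character set) can be sketched as follows, assuming < code class = "computeroutput" > std::regex< / code > with the < code class = "computeroutput" > basic< / code > flag as a stand-in for Boost.Regex. The helper < code class = "computeroutput" > escape_match< / code > is a hypothetical name.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches the POSIX-Basic
// expression `expr`. std::regex stands in for boost::regex.
bool escape_match(const std::string& expr, const std::string& text)
{
    const std::regex e(expr, std::regex::basic);
    return std::regex_match(text, e);
}

// In C++ source each backslash of the expression is doubled:
//   "a\\.b"  is the expression  a\.b   (a, a literal dot, b)
//   "a\\*"   is the expression  a\*    (a followed by a literal star)
//   "[\\^]"  is the expression  [\^]   (a backslash or a caret)
```

< p >
So < code class = "computeroutput" > a\.b< / code > accepts "a.b" but not "axb", and < code class = "computeroutput" > [\^]< / code > accepts a single backslash or a single caret.
< / p >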
< a name = "boost_regex.syntax.basic_syntax.what_gets_matched" > < / a > < h4 >
< a name = "id520371" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.what_gets_matched" > What Gets
Matched< / a >
< / h4 >
< p >
When there is more than one way to match a regular expression, the "best"
possible match is obtained using the < a href = "leftmost_longest_rule.html" title = "The Leftmost Longest Rule" > leftmost-longest
rule< / a > .
< / p >
< a name = "boost_regex.syntax.basic_syntax.variations" > < / a > < h4 >
< a name = "id520411" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.variations" > Variations< / a >
< / h4 >
< a name = "boost_regex.grep_syntax" > < / a > < p >
< / p >
< a name = "boost_regex.syntax.basic_syntax.grep" > < / a > < h5 >
< a name = "id520443" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.grep" > Grep< / a >
< / h5 >
< p >
When an expression is compiled with the flag < code class = "computeroutput" > < span class = "identifier" > grep< / span > < / code >
set, then the expression is treated as a newline-separated list of < a href = "basic_syntax.html#boost_regex.posix_basic" > POSIX-Basic expressions< / a > ; a match
is found if any of the expressions in the list match, for example:
< / p >
< pre class = "programlisting" > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "identifier" > e< / span > < span class = "special" > (< / span > < span class = "string" > "abc\ndef"< / span > < span class = "special" > ,< / span > < span class = "identifier" > boost< / span > < span class = "special" > ::< / span > < span class = "identifier" > regex< / span > < span class = "special" > ::< / span > < span class = "identifier" > grep< / span > < span class = "special" > );< / span >
< / pre >
< p >
will match either of the < a href = "basic_syntax.html#boost_regex.posix_basic" > POSIX-Basic
expressions< / a > "abc" or "def".
< / p >
< p >
As its name suggests, this behavior is consistent with the Unix utility grep.
< / p >
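< p >
The newline-separated behaviour can be sketched with the standard library, whose < code class = "computeroutput" > std::regex::grep< / code > flag is assumed here to treat a newline in the expression the same way as the Boost < code class = "computeroutput" > grep< / code > flag described above. The helper < code class = "computeroutput" > grep_match< / code > is a hypothetical name.
< / p >

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole matches either of the two
// POSIX-Basic expressions "abc" or "def", which are given as one
// newline-separated list under the grep flag.
bool grep_match(const std::string& text)
{
    static const std::regex e("abc\ndef", std::regex::grep);
    return std::regex_match(text, e);
}
```

< p >
Under that assumption the helper accepts "abc" and "def" individually, but not the concatenation "abcdef".
< / p >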
< a name = "boost_regex.syntax.basic_syntax.emacs" > < / a > < h5 >
< a name = "id520587" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.emacs" > emacs< / a >
< / h5 >
< p >
In addition to the < a href = "basic_syntax.html#boost_regex.posix_basic" > POSIX-Basic features< / a >
the following characters are also special:
< / p >
< div class = "informaltable" > < table class = "table" >
< colgroup >
< col >
< col >
< / colgroup >
< thead > < tr >
< th >
< p >
Character
< / p >
< / th >
< th >
< p >
Description
< / p >
< / th >
< / tr > < / thead >
< tbody >
< tr >
< td >
< p >
+
< / p >
< / td >
< td >
< p >
repeats the preceding atom one or more times.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
?
< / p >
< / td >
< td >
< p >
repeats the preceding atom zero or one times.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
*?
< / p >
< / td >
< td >
< p >
A non-greedy version of *.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
+?
< / p >
< / td >
< td >
< p >
A non-greedy version of +.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
??
< / p >
< / td >
< td >
< p >
A non-greedy version of ?.
< / p >
< / td >
< / tr >
< / tbody >
< / table > < / div >
< p >
And the following escape sequences are also recognised:
< / p >
< div class = "informaltable" > < table class = "table" >
< colgroup >
< col >
< col >
< / colgroup >
< thead > < tr >
< th >
< p >
Escape
< / p >
< / th >
< th >
< p >
Description
< / p >
< / th >
< / tr > < / thead >
< tbody >
< tr >
< td >
< p >
\|
< / p >
< / td >
< td >
< p >
specifies an alternative.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\(?: ... \)
< / p >
< / td >
< td >
< p >
is a non-marking grouping construct - allows you to lexically group
something without spitting out an extra sub-expression.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\w
< / p >
< / td >
< td >
< p >
matches any word character.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\W
< / p >
< / td >
< td >
< p >
matches any non-word character.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\sx
< / p >
< / td >
< td >
< p >
matches any character in the syntax group x. The following emacs
groupings are supported: 's', ' ', '_', 'w', '.', ')', '(', '"',
'\'', '&gt;' and '&lt;'. Refer to the emacs docs for details.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\Sx
< / p >
< / td >
< td >
< p >
matches any character not in the syntax grouping x.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\c and \C
< / p >
< / td >
< td >
< p >
These are not supported.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\`
< / p >
< / td >
< td >
< p >
matches zero characters only at the start of a buffer (or string
being matched).
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\'
< / p >
< / td >
< td >
< p >
matches zero characters only at the end of a buffer (or string being
matched).
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\b
< / p >
< / td >
< td >
< p >
matches zero characters at a word boundary.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\B
< / p >
< / td >
< td >
< p >
matches zero characters, not at a word boundary.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\&lt;
< / p >
< / td >
< td >
< p >
matches zero characters only at the start of a word.
< / p >
< / td >
< / tr >
< tr >
< td >
< p >
\&gt;
< / p >
< / td >
< td >
< p >
matches zero characters only at the end of a word.
< / p >
< / td >
< / tr >
< / tbody >
< / table > < / div >
< p >
Finally, you should note that emacs style regular expressions are matched
according to the < a href = "perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched" > Perl
"depth first search" rules< / a > . Emacs expressions are matched
this way because they contain Perl-like extensions that do not interact
well with the < a href = "leftmost_longest_rule.html" title = "The Leftmost Longest Rule" > POSIX-style
leftmost-longest rule< / a > .
< / p >
< a name = "boost_regex.syntax.basic_syntax.options" > < / a > < h4 >
< a name = "id521082" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.options" > Options< / a >
< / h4 >
< p >
There are a < a href = "../ref/syntax_option_type/syntax_option_type_basic.html" title = "Options for POSIX Basic Regular Expressions" > variety
of flags< / a > that may be combined with the < code class = "computeroutput" > < span class = "identifier" > basic< / span > < / code >
and < code class = "computeroutput" > < span class = "identifier" > grep< / span > < / code > options when constructing
the regular expression. In particular, note that the < a href = "../ref/syntax_option_type/syntax_option_type_basic.html" title = "Options for POSIX Basic Regular Expressions" > < code class = "computeroutput" > < span class = "identifier" > newline_alt< / span > < / code > , < code class = "computeroutput" > < span class = "identifier" > no_char_classes< / span > < / code > ,
< code class = "computeroutput" > < span class = "identifier" > no_intervals< / span > < / code > , < code class = "computeroutput" > < span class = "identifier" > bk_plus_qm< / span > < / code >
and < code class = "computeroutput" > < span class = "identifier" > bk_vbar< / span > < / code > < / a > options
all alter the syntax, while the < a href = "../ref/syntax_option_type/syntax_option_type_basic.html" title = "Options for POSIX Basic Regular Expressions" > < code class = "computeroutput" > < span class = "identifier" > collate< / span > < / code > and < code class = "computeroutput" > < span class = "identifier" > icase< / span > < / code >
options< / a > modify how the case and locale sensitivity are to be applied.
< / p >
< a name = "boost_regex.syntax.basic_syntax.references" > < / a > < h4 >
< a name = "id521255" > < / a >
< a href = "basic_syntax.html#boost_regex.syntax.basic_syntax.references" > References< / a >
< / h4 >
< p >
< a href = "http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target = "_top" > IEEE
Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions
and Headers, Section 9, Regular Expressions (FWD.1).< / a >
< / p >
< p >
< a href = "http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target = "_top" > IEEE
Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
Utilities, Section 4, Utilities, grep (FWD.1).< / a >
< / p >
< p >
< a href = "http://www.gnu.org/software/emacs/" target = "_top" > Emacs Version 21.3.< / a >
< / p >
< / div >
< table xmlns:rev = "http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width = "100%" > < tr >
< td align = "left" > < / td >
< td align = "right" > < div class = "copyright-footer" > Copyright &copy; 1998-2007 John Maddock< p >
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at < a href = "http://www.boost.org/LICENSE_1_0.txt" target = "_top" > http://www.boost.org/LICENSE_1_0.txt< / a > )
< / p >
< / div > < / td >
< / tr > < / table >
< hr >
< div class = "spirit-nav" >
< a accesskey = "p" href = "basic_extended.html" > < img src = "../../../../../../doc/html/images/prev.png" alt = "Prev" > < / a > < a accesskey = "u" href = "../syntax.html" > < img src = "../../../../../../doc/html/images/up.png" alt = "Up" > < / a > < a accesskey = "h" href = "../../index.html" > < img src = "../../../../../../doc/html/images/home.png" alt = "Home" > < / a > < a accesskey = "n" href = "character_classes.html" > < img src = "../../../../../../doc/html/images/next.png" alt = "Next" > < / a >
< / div >
< / body >
< / html >