< html >
< head >
< meta http-equiv = "Content-Type" content = "text/html; charset=UTF-8" >
< title > POSIX Basic Regular Expression Syntax</ title >
< link rel = "stylesheet" href = "../../../../../../doc/src/boostbook.css" type = "text/css" >
< meta name = "generator" content = "DocBook XSL Stylesheets V1.79.1" >
< link rel = "home" href = "../../index.html" title = "Boost.Regex 7.0.1" >
< link rel = "up" href = "../syntax.html" title = "Regular Expression Syntax" >
< link rel = "prev" href = "basic_extended.html" title = "POSIX Extended Regular Expression Syntax" >
< link rel = "next" href = "character_classes.html" title = "Character Class Names" >
</ head >
< body bgcolor = "white" text = "black" link = "#0000FF" vlink = "#840084" alink = "#0000FF" >
< table cellpadding = "2" width = "100%" >< tr >
< td valign = "top" >< img alt = "Boost C++ Libraries" width = "277" height = "86" src = "../../../../../../boost.png" ></ td >
< td align = "center" >< a href = "../../../../../../index.html" > Home</ a ></ td >
< td align = "center" >< a href = "../../../../../../libs/libraries.htm" > Libraries</ a ></ td >
< td align = "center" >< a href = "http://www.boost.org/users/people.html" > People</ a ></ td >
< td align = "center" >< a href = "http://www.boost.org/users/faq.html" > FAQ</ a ></ td >
< td align = "center" >< a href = "../../../../../../more/index.htm" > More</ a ></ td >
</ tr ></ table >
< hr >
< div class = "spirit-nav" >
< a accesskey = "p" href = "basic_extended.html" >< img src = "../../../../../../doc/src/images/prev.png" alt = "Prev" ></ a >< a accesskey = "u" href = "../syntax.html" >< img src = "../../../../../../doc/src/images/up.png" alt = "Up" ></ a >< a accesskey = "h" href = "../../index.html" >< img src = "../../../../../../doc/src/images/home.png" alt = "Home" ></ a >< a accesskey = "n" href = "character_classes.html" >< img src = "../../../../../../doc/src/images/next.png" alt = "Next" ></ a >
</ div >
< div class = "section" >
< div class = "titlepage" >< div >< div >< h3 class = "title" >
< a name = "boost_regex.syntax.basic_syntax" ></ a >< a class = "link" href = "basic_syntax.html" title = "POSIX Basic Regular Expression Syntax" > POSIX Basic Regular
Expression Syntax</ a >
</ h3 ></ div ></ div ></ div >
< h4 >
< a name = "boost_regex.syntax.basic_syntax.h0" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.synopsis" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.synopsis" > Synopsis</ a >
</ h4 >
< p >
The POSIX-Basic regular expression syntax is used by the Unix utility < code class = "computeroutput" >< span class = "identifier" > sed</ span ></ code > , and variations are used by < code class = "computeroutput" >< span class = "identifier" > grep</ span ></ code > and < code class = "computeroutput" >< span class = "identifier" > emacs</ span ></ code > .
You can construct POSIX basic regular expressions in Boost.Regex by passing
the flag < code class = "computeroutput" >< span class = "identifier" > basic</ span ></ code > to the regex
constructor (see < a class = "link" href = "../ref/syntax_option_type.html" title = "syntax_option_type" >< code class = "computeroutput" >< span class = "identifier" > syntax_option_type</ span ></ code ></ a > ), for example:
</ p >
< pre class = "programlisting" >< span class = "comment" > // e1 is a case sensitive POSIX-Basic expression:</ span >
< span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span > < span class = "identifier" > e1</ span >< span class = "special" > (</ span >< span class = "identifier" > my_expression</ span >< span class = "special" > ,</ span > < span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span >< span class = "special" > ::</ span >< span class = "identifier" > basic</ span >< span class = "special" > );</ span >
< span class = "comment" > // e2 is a case insensitive POSIX-Basic expression:</ span >
< span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span > < span class = "identifier" > e2</ span >< span class = "special" > (</ span >< span class = "identifier" > my_expression</ span >< span class = "special" > ,</ span > < span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span >< span class = "special" > ::</ span >< span class = "identifier" > basic</ span >< span class = "special" > |</ span >< span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span >< span class = "special" > ::</ span >< span class = "identifier" > icase</ span >< span class = "special" > );</ span >
</ pre >
< a name = "boost_regex.posix_basic" ></ a >< h4 >
< a name = "boost_regex.syntax.basic_syntax.h1" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.posix_basic_syntax" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.posix_basic_syntax" > POSIX
Basic Syntax</ a >
</ h4 >
< p >
In POSIX-Basic regular expressions, all characters match themselves except
for the following special characters:
</ p >
< pre class = "programlisting" > .[\*^$</ pre >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h2" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.wildcard" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.wildcard" > Wildcard:</ a >
</ h5 >
< p >
The single character '.' when used outside of a character set will match
any single character except:
</ p >
< div class = "itemizedlist" >< ul class = "itemizedlist" style = "list-style-type: disc; " >
< li class = "listitem" >
The NULL character when the flag < code class = "computeroutput" >< span class = "identifier" > match_not_dot_null</ span ></ code >
is passed to the matching algorithms.
</ li >
< li class = "listitem" >
The newline character when the flag < code class = "computeroutput" >< span class = "identifier" > match_not_dot_newline</ span ></ code >
is passed to the matching algorithms.
</ li >
</ ul ></ div >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h3" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.anchors" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.anchors" > Anchors:</ a >
</ h5 >
< p >
A '^' character shall match the start of a line when used as the first character
of an expression, or the first character of a sub-expression.
</ p >
< p >
A '$' character shall match the end of a line when used as the last character
of an expression, or the last character of a sub-expression.
</ p >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h4" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.marked_sub_expressions" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.marked_sub_expressions" > Marked sub-expressions:</ a >
</ h5 >
< p >
A section beginning < code class = "computeroutput" >< span class = "special" > \(</ span ></ code > and ending
< code class = "computeroutput" >< span class = "special" > \)</ span ></ code > acts as a marked sub-expression.
Whatever matched the sub-expression is split out in a separate field by the
matching algorithms. Marked sub-expressions can also be repeated, or referred to
by a back-reference.
</ p >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h5" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.repeats" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.repeats" > Repeats:</ a >
</ h5 >
< p >
Any atom (a single character, a marked sub-expression, or a character class)
can be repeated with the * operator.
</ p >
< p >
For example < code class = "computeroutput" >< span class = "identifier" > a</ span >< span class = "special" > *</ span ></ code >
will match the letter a repeated zero or more times (an atom
repeated zero times matches an empty string), so the expression < code class = "computeroutput" >< span class = "identifier" > a</ span >< span class = "special" > *</ span >< span class = "identifier" > b</ span ></ code >
will match any of the following:
</ p >
< pre class = "programlisting" > b
ab
aaaaaaaab
</ pre >
< p >
An atom can also be repeated with a bounded repeat:
</ p >
< p >
< code class = "computeroutput" >< span class = "identifier" > a</ span >< span class = "special" > \{</ span >< span class = "identifier" > n</ span >< span class = "special" > \}</ span ></ code > Matches
'a' repeated exactly n times.
</ p >
< p >
< code class = "computeroutput" >< span class = "identifier" > a</ span >< span class = "special" > \{</ span >< span class = "identifier" > n</ span >< span class = "special" > ,\}</ span ></ code > Matches
'a' repeated n or more times.
</ p >
< p >
< code class = "computeroutput" >< span class = "identifier" > a</ span >< span class = "special" > \{</ span >< span class = "identifier" > n</ span >< span class = "special" > ,</ span > < span class = "identifier" > m</ span >< span class = "special" > \}</ span ></ code > Matches 'a' repeated between n and m times
inclusive.
</ p >
< p >
For example:
</ p >
< pre class = "programlisting" > ^a\{2,3\}$</ pre >
< p >
Will match either of:
</ p >
< pre class = "programlisting" > aa
aaa
</ pre >
< p >
But neither of:
</ p >
< pre class = "programlisting" > a
aaaa
</ pre >
< p >
It is an error to use a repeat operator if the preceding construct cannot
be repeated, for example:
</ p >
< pre class = "programlisting" > a\(*\)</ pre >
< p >
Will raise an error, as there is nothing for the * operator to be applied
to.
</ p >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h6" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.back_references" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.back_references" > Back
references:</ a >
</ h5 >
< p >
An escape character followed by a digit < span class = "emphasis" >< em > n</ em ></ span > , where < span class = "emphasis" >< em > n</ em ></ span >
is in the range 1-9, matches the same string that was matched by sub-expression
< span class = "emphasis" >< em > n</ em ></ span > . For example the expression:
</ p >
< pre class = "programlisting" > ^\(a*\)[^a]*\1$</ pre >
< p >
Will match the string:
</ p >
< pre class = "programlisting" > aaabbaaa</ pre >
< p >
But not the string:
</ p >
< pre class = "programlisting" > aaabba</ pre >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h7" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.character_sets" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.character_sets" > Character
sets:</ a >
</ h5 >
< p >
A character set is a bracket-expression starting with < code class = "literal" > [</ code >
and ending with < code class = "literal" > ]</ code > ; it defines a set of characters, and
matches any single character that is a member of that set.
</ p >
< p >
A bracket expression may contain any combination of the following:
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h8" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.single_characters" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.single_characters" > Single
characters:</ a >
</ h6 >
< p >
For example < code class = "computeroutput" >< span class = "special" > [</ span >< span class = "identifier" > abc</ span >< span class = "special" > ]</ span ></ code > will match any of the characters 'a', 'b',
or 'c'.
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h9" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.character_ranges" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.character_ranges" > Character
ranges:</ a >
</ h6 >
< p >
For example < code class = "computeroutput" >< span class = "special" > [</ span >< span class = "identifier" > a</ span >< span class = "special" > -</ span >< span class = "identifier" > c</ span >< span class = "special" > ]</ span ></ code >
will match any single character in the range 'a' to 'c'. By default, for
POSIX-Basic regular expressions, a character < span class = "emphasis" >< em > x</ em ></ span > is within
the range < span class = "emphasis" >< em > y</ em ></ span > to < span class = "emphasis" >< em > z</ em ></ span > , if it collates
within that range; this results in locale specific behavior. This behavior
can be turned off by unsetting the < code class = "computeroutput" >< span class = "identifier" > collate</ span ></ code >
option flag when constructing the regular expression - in which case whether
a character appears within a range is determined by comparing the code points
of the characters only.
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h10" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.negation" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.negation" > Negation:</ a >
</ h6 >
< p >
If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example < code class = "computeroutput" >< span class = "special" > [^</ span >< span class = "identifier" > a</ span >< span class = "special" > -</ span >< span class = "identifier" > c</ span >< span class = "special" > ]</ span ></ code > matches any character that is not in the
range a-c.
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h11" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.character_classes" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.character_classes" > Character
classes:</ a >
</ h6 >
< p >
An expression of the form < code class = "computeroutput" >< span class = "special" > [[:</ span >< span class = "identifier" > name</ span >< span class = "special" > :]]</ span ></ code >
matches the named character class "name", for example < code class = "computeroutput" >< span class = "special" > [[:</ span >< span class = "identifier" > lower</ span >< span class = "special" > :]]</ span ></ code > matches any lower case character. See
< a class = "link" href = "character_classes.html" title = "Character Class Names" > character class names</ a > .
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h12" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.collating_elements" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.collating_elements" > Collating
Elements:</ a >
</ h6 >
< p >
An expression of the form < code class = "computeroutput" >< span class = "special" > [[.</ span >< span class = "identifier" > col</ span >< span class = "special" > .]]</ span ></ code > matches
the collating element < span class = "emphasis" >< em > col</ em ></ span > . A collating element is any
single character, or any sequence of characters that collates as a single
unit. Collating elements may also be used as the end point of a range, for
example: < code class = "computeroutput" >< span class = "special" > [[.</ span >< span class = "identifier" > ae</ span >< span class = "special" > .]-</ span >< span class = "identifier" > c</ span >< span class = "special" > ]</ span ></ code >
matches the character sequence "ae", plus any single character
in the range "ae"-c, assuming that "ae" is treated as
a single collating element in the current locale.
</ p >
< p >
Collating elements may be used in place of escapes (which are not normally
allowed inside character sets), for example < code class = "computeroutput" >< span class = "special" > [[.^.]</ span >< span class = "identifier" > abc</ span >< span class = "special" > ]</ span ></ code > would
match any one of the characters 'a', 'b', 'c' or '^'.
</ p >
< p >
As an extension, a collating element may also be specified via its symbolic
name, for example:
</ p >
< pre class = "programlisting" > [[.NUL.]]</ pre >
< p >
matches a 'NUL' character. See < a class = "link" href = "collating_names.html" title = "Collating Names" > collating
element names</ a > .
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h13" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.equivalence_classes" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.equivalence_classes" > Equivalence
classes:</ a >
</ h6 >
< p >
An expression of the form < code class = "computeroutput" >< span class = "special" > [[=</ span >< span class = "identifier" > col</ span >< span class = "special" > =]]</ span ></ code >
matches any character or collating element whose primary sort key is the
same as that for collating element < span class = "emphasis" >< em > col</ em ></ span > ; as with collating
elements, the name < span class = "emphasis" >< em > col</ em ></ span > may be a < a class = "link" href = "collating_names.html" title = "Collating Names" > collating
symbolic name</ a > . A primary sort key is one that ignores case, accentuation,
or locale-specific tailorings; so for example < code class = "computeroutput" >< span class = "special" > [[=</ span >< span class = "identifier" > a</ span >< span class = "special" > =]]</ span ></ code > matches
any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately, the implementation
of this is reliant on the platform's collation and localisation support;
this feature cannot be relied upon to work portably across all platforms,
or even all locales on one platform.
</ p >
< h6 >
< a name = "boost_regex.syntax.basic_syntax.h14" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.combinations" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.combinations" > Combinations:</ a >
</ h6 >
< p >
All of the above can be combined in one character set declaration, for example:
< code class = "computeroutput" >< span class = "special" > [[:</ span >< span class = "identifier" > digit</ span >< span class = "special" > :]</ span >< span class = "identifier" > a</ span >< span class = "special" > -</ span >< span class = "identifier" > c</ span >< span class = "special" > [.</ span >< span class = "identifier" > NUL</ span >< span class = "special" > .]]</ span ></ code > .
</ p >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h15" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.escapes" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.escapes" > Escapes</ a >
</ h5 >
< p >
With the exception of the escape sequences \{, \}, \(, and \), which are
documented above, an escape followed by any character matches that character.
This can be used to make the special characters
</ p >
< pre class = "programlisting" > .[\*^$</ pre >
< p >
"ordinary". Note that the escape character loses its special meaning
inside a character set, so < code class = "computeroutput" >< span class = "special" > [\^]</ span ></ code >
will match either a literal '\' or a '^'.
</ p >
< h4 >
< a name = "boost_regex.syntax.basic_syntax.h16" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.what_gets_matched" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.what_gets_matched" > What
Gets Matched</ a >
</ h4 >
< p >
When there is more than one way to match a regular expression, the "best"
possible match is obtained using the < a class = "link" href = "leftmost_longest_rule.html" title = "The Leftmost Longest Rule" > leftmost-longest
rule</ a > .
</ p >
< h4 >
< a name = "boost_regex.syntax.basic_syntax.h17" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.variations" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.variations" > Variations</ a >
</ h4 >
< a name = "boost_regex.grep_syntax" ></ a >< h5 >
< a name = "boost_regex.syntax.basic_syntax.h18" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.grep" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.grep" > Grep</ a >
</ h5 >
< p >
When an expression is compiled with the flag < code class = "computeroutput" >< span class = "identifier" > grep</ span ></ code >
set, then the expression is treated as a newline-separated list of < a class = "link" href = "basic_syntax.html#boost_regex.posix_basic" > POSIX-Basic expressions</ a > ; a match
is found if any of the expressions in the list match, for example:
</ p >
< pre class = "programlisting" >< span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span > < span class = "identifier" > e</ span >< span class = "special" > (</ span >< span class = "string" > "abc\ndef"</ span >< span class = "special" > ,</ span > < span class = "identifier" > boost</ span >< span class = "special" > ::</ span >< span class = "identifier" > regex</ span >< span class = "special" > ::</ span >< span class = "identifier" > grep</ span >< span class = "special" > );</ span >
</ pre >
< p >
will match either of the < a class = "link" href = "basic_syntax.html#boost_regex.posix_basic" > POSIX-Basic
expressions</ a > "abc" or "def".
</ p >
< p >
As its name suggests, this behavior is consistent with the Unix utility grep.
</ p >
< h5 >
< a name = "boost_regex.syntax.basic_syntax.h19" ></ a >
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.emacs" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.emacs" > emacs</ a >
</ h5 >
< p >
When an expression is compiled with the flag < code class = "computeroutput" >< span class = "identifier" > emacs</ span ></ code >
set, then in addition to the < a class = "link" href = "basic_syntax.html#boost_regex.posix_basic" > POSIX-Basic features</ a > ,
the following characters are also special:
</ p >
< div class = "informaltable" >< table class = "table" >
< colgroup >
< col >
< col >
</ colgroup >
< thead >< tr >
< th >
< p >
Character
</ p >
</ th >
< th >
< p >
Description
</ p >
</ th >
</ tr ></ thead >
< tbody >
< tr >
< td >
< p >
+
</ p >
</ td >
< td >
< p >
repeats the preceding atom one or more times.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
?
</ p >
</ td >
< td >
< p >
repeats the preceding atom zero or one times.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
*?
</ p >
</ td >
< td >
< p >
A non-greedy version of *.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
+?
</ p >
</ td >
< td >
< p >
A non-greedy version of +.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
??
</ p >
</ td >
< td >
< p >
A non-greedy version of ?.
</ p >
</ td >
</ tr >
</ tbody >
</ table ></ div >
< p >
And the following escape sequences are also recognised:
</ p >
< div class = "informaltable" >< table class = "table" >
< colgroup >
< col >
< col >
</ colgroup >
< thead >< tr >
< th >
< p >
Escape
</ p >
</ th >
< th >
< p >
Description
</ p >
</ th >
</ tr ></ thead >
< tbody >
< tr >
< td >
< p >
\|
</ p >
</ td >
< td >
< p >
specifies an alternative.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\(?: ... \)
</ p >
</ td >
< td >
< p >
is a non-marking grouping construct - allows you to lexically group
something without spitting out an extra sub-expression.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\w
</ p >
</ td >
< td >
< p >
matches any word character.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\W
</ p >
</ td >
< td >
< p >
matches any non-word character.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\sx
</ p >
</ td >
< td >
< p >
matches any character in the syntax group x, the following emacs
groupings are supported: 's', ' ', '_', 'w', '.', ')', '(', '"',
'\'', '>' and '<'. Refer to the emacs docs for details.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\Sx
</ p >
</ td >
< td >
< p >
matches any character not in the syntax grouping x.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\c and \C
</ p >
</ td >
< td >
< p >
These are not supported.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\`
</ p >
</ td >
< td >
< p >
matches zero characters only at the start of a buffer (or string
being matched).
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\'
</ p >
</ td >
< td >
< p >
matches zero characters only at the end of a buffer (or string
being matched).
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\b
</ p >
</ td >
< td >
< p >
matches zero characters at a word boundary.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\B
</ p >
</ td >
< td >
< p >
matches zero characters, not at a word boundary.
</ p >
</ td >
</ tr >
< tr >
< td >
< p >
\<
</ p >
2007-06-08 09:23:23 +00:00
</ td >
< td >
2010-07-08 22:49:58 +00:00
< p >
matches zero characters only at the start of a word.
</ p >
2007-06-08 09:23:23 +00:00
</ td >
</ tr >
< tr >
< td >
2010-07-08 22:49:58 +00:00
< p >
\>
</ p >
2007-06-08 09:23:23 +00:00
</ td >
< td >
2010-07-08 22:49:58 +00:00
< p >
matches zero characters only at the end of a word.
</ p >
2007-06-08 09:23:23 +00:00
</ td >
</ tr >
</ tbody >
</ table ></ div >
< p >
Finally, note that emacs-style regular expressions are matched
2008-12-23 11:46:00 +00:00
according to the < a class = "link" href = "perl_syntax.html#boost_regex.syntax.perl_syntax.what_gets_matched" > Perl
2007-06-08 09:23:23 +00:00
"depth first search" rules</ a > . Emacs expressions are matched
this way because they contain Perl-like extensions that do not interact
2008-12-23 11:46:00 +00:00
well with the < a class = "link" href = "leftmost_longest_rule.html" title = "The Leftmost Longest Rule" > POSIX-style
2007-06-08 09:23:23 +00:00
leftmost-longest rule</ a > .
</ p >
2011-12-24 17:51:57 +00:00
< h4 >
< a name = "boost_regex.syntax.basic_syntax.h20" ></ a >
2012-11-29 10:28:07 +00:00
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.options" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.options" > Options</ a >
2007-12-14 10:11:21 +00:00
</ h4 >
2007-06-08 09:23:23 +00:00
< p >
2008-12-23 11:46:00 +00:00
There are a < a class = "link" href = "../ref/syntax_option_type/syntax_option_type_basic.html" title = "Options for POSIX Basic Regular Expressions" > variety
2007-12-14 10:11:21 +00:00
of flags</ a > that may be combined with the < code class = "computeroutput" >< span class = "identifier" > basic</ span ></ code >
and < code class = "computeroutput" >< span class = "identifier" > grep</ span ></ code > options when constructing
2008-12-23 11:46:00 +00:00
the regular expression. In particular, note that the < a class = "link" href = "../ref/syntax_option_type/syntax_option_type_basic.html" title = "Options for POSIX Basic Regular Expressions" >< code class = "computeroutput" >< span class = "identifier" > newline_alt</ span ></ code > , < code class = "computeroutput" >< span class = "identifier" > no_char_classes</ span ></ code > ,
2007-12-14 10:11:21 +00:00
< code class = "computeroutput" >< span class = "identifier" > no_intervals</ span ></ code > , < code class = "computeroutput" >< span class = "identifier" > bk_plus_qm</ span ></ code >
and < code class = "computeroutput" >< span class = "identifier" > bk_vbar</ span ></ code ></ a > options
2008-12-23 11:46:00 +00:00
all alter the syntax, while the < a class = "link" href = "../ref/syntax_option_type/syntax_option_type_basic.html" title = "Options for POSIX Basic Regular Expressions" >< code class = "computeroutput" >< span class = "identifier" > collate</ span ></ code > and < code class = "computeroutput" >< span class = "identifier" > icase</ span ></ code >
2007-06-08 09:23:23 +00:00
options</ a > modify how case and locale sensitivity are applied.
</ p >
2011-12-24 17:51:57 +00:00
< h4 >
< a name = "boost_regex.syntax.basic_syntax.h21" ></ a >
2012-11-29 10:28:07 +00:00
< span class = "phrase" >< a name = "boost_regex.syntax.basic_syntax.references" ></ a ></ span >< a class = "link" href = "basic_syntax.html#boost_regex.syntax.basic_syntax.references" > References</ a >
2007-12-14 10:11:21 +00:00
</ h4 >
2007-06-08 09:23:23 +00:00
< p >
< a href = "http://www.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap09.html" target = "_top" > IEEE
Std 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions
and Headers, Section 9, Regular Expressions (FWD.1).</ a >
</ p >
< p >
< a href = "http://www.opengroup.org/onlinepubs/000095399/utilities/grep.html" target = "_top" > IEEE
Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
Utilities, Section 4, Utilities, grep (FWD.1).</ a >
</ p >
< p >
< a href = "http://www.gnu.org/software/emacs/" target = "_top" > Emacs Version 21.3.</ a >
</ p >
</ div >
< table xmlns:rev = "http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width = "100%" >< tr >
< td align = "left" ></ td >
2020-10-12 18:22:57 +01:00
< td align = "right" >< div class = "copyright-footer" > Copyright © 1998-2013 John Maddock< p >
2007-11-07 03:23:31 +00:00
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at < a href = "http://www.boost.org/LICENSE_1_0.txt" target = "_top" > http://www.boost.org/LICENSE_1_0.txt</ a > )
2007-12-14 10:11:21 +00:00
</ p >
</ div ></ td >
2007-06-08 09:23:23 +00:00
</ tr ></ table >
< hr >
< div class = "spirit-nav" >
2010-07-08 22:49:58 +00:00
< a accesskey = "p" href = "basic_extended.html" >< img src = "../../../../../../doc/src/images/prev.png" alt = "Prev" ></ a >< a accesskey = "u" href = "../syntax.html" >< img src = "../../../../../../doc/src/images/up.png" alt = "Up" ></ a >< a accesskey = "h" href = "../../index.html" >< img src = "../../../../../../doc/src/images/home.png" alt = "Home" ></ a >< a accesskey = "n" href = "character_classes.html" >< img src = "../../../../../../doc/src/images/next.png" alt = "Next" ></ a >
2007-06-08 09:23:23 +00:00
</ div >
</ body >
</ html >