A marked sub-expression is useful for lexically grouping part of a regular expression,
but has the side effect of producing an extra field in the result. As
an alternative, you can lexically group part of a regular expression without
generating a marked sub-expression by using <codeclass="computeroutput"><spanclass="special">(?:</span></code>
and <codeclass="computeroutput"><spanclass="special">)</span></code>. For example, <codeclass="computeroutput"><spanclass="special">(?:</span><spanclass="identifier">ab</span><spanclass="special">)+</span></code>
will repeat <codeclass="computeroutput"><spanclass="identifier">ab</span></code> without splitting
out any separate sub-expressions.
</p>
<p>
Any atom (a single character, a marked sub-expression, or a character class)
can be repeated with the <codeclass="computeroutput"><spanclass="special">*</span></code>,
<codeclass="computeroutput"><spanclass="special">+</span></code>, <codeclass="computeroutput"><spanclass="special">?</span></code>,
and <codeclass="computeroutput"><spanclass="special">{}</span></code> operators.
</p>
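<p>
The difference between a marked group and a non-marking group can be sketched
as follows. This is a minimal illustration that assumes the standard library's
std::regex (ECMAScript grammar) as a stand-in for Boost.Regex, since both accept
the (?: ) syntax; the helper names are hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex is used here in place of boost::regex so that the
// sketch builds with the standard library alone.

// Returns the number of marked sub-expressions the pattern defines.
unsigned marked_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}

// Returns true if the whole of `text` matches `pattern`.
bool matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<p>
With these helpers, "(ab)+" reports one marked sub-expression while "(?:ab)+"
reports none, yet both accept the same input such as "ababab".
</p>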
<p>
The <codeclass="computeroutput"><spanclass="special">*</span></code> operator will match the
preceding atom zero or more times, for example the expression <codeclass="computeroutput"><spanclass="identifier">a</span><spanclass="special">*</span><spanclass="identifier">b</span></code>
will match any of the following:
</p>
<preclass="programlisting">
<spanclass="identifier">b</span>
<spanclass="identifier">ab</span>
<spanclass="identifier">aaaaaaaab</span>
</pre>
<p>
The <codeclass="computeroutput"><spanclass="special">+</span></code> operator will match the
preceding atom one or more times, for example the expression <codeclass="computeroutput"><spanclass="identifier">a</span><spanclass="special">+</span><spanclass="identifier">b</span></code>
will match any of the following:
</p>
<preclass="programlisting">
<spanclass="identifier">ab</span>
<spanclass="identifier">aaaaaaaab</span>
</pre>
<p>
But will not match:
</p>
<preclass="programlisting">
<spanclass="identifier">b</span>
</pre>
<p>
The <codeclass="computeroutput"><spanclass="special">?</span></code> operator will match the
preceding atom zero or one times, for example the expression ca?b will match
any of the following:
</p>
<preclass="programlisting">
<spanclass="identifier">cb</span>
<spanclass="identifier">cab</span>
</pre>
<p>
But will not match:
</p>
<preclass="programlisting">
<spanclass="identifier">caab</span>
</pre>
<p>
An atom can also be repeated with a bounded repeat:
<codeclass="computeroutput"><spanclass="identifier">a</span><spanclass="special">{</span><spanclass="identifier">n</span><spanclass="special">,</span><spanclass="identifier">m</span><spanclass="special">}</span></code> matches 'a' repeated between n and m times
inclusive.
</p>
<p>
For example:
</p>
<preclass="programlisting">^a{2,3}$</pre>
<p>
Will match either of:
</p>
<preclass="programlisting">
<spanclass="identifier">aa</span>
<spanclass="identifier">aaa</span>
</pre>
<p>
But neither of:
</p>
<preclass="programlisting">
<spanclass="identifier">a</span>
<spanclass="identifier">aaaa</span>
</pre>
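<p>
The repeat examples above can be checked with a short sketch. It assumes
std::regex (ECMAScript grammar) in place of Boost.Regex, which is expected to
treat these basic repeat operators the same way; the function name is
hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// Returns true only if the entire input is matched by the pattern.
bool matches_whole(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

<p>
For instance, matches_whole("a*b", "b") and matches_whole("^a{2,3}$", "aaa")
hold, whereas matches_whole("a+b", "b") and matches_whole("ca?b", "caab") do not.
</p>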
<p>
It is an error to use a repeat operator if the preceding construct cannot
be repeated.
</p>
<p>
The normal repeat operators are "greedy", that is to say they will
consume as much input as possible. There are non-greedy versions available
that will consume as little input as possible while still producing a match.
</p>
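<p>
The difference between greedy and non-greedy repeats is easiest to see by
searching the same text with both forms. The sketch below assumes std::regex
(ECMAScript grammar) as a stand-in for Boost.Regex; the helper name and the
sample patterns are illustrative only.
</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// Returns the text of the first match found in `text`, or an empty string
// if the pattern does not match anywhere.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return m.str(0);
    }
    return std::string();
}
```

<p>
Searching "&lt;a&gt;&lt;b&gt;" with the greedy pattern &lt;.+&gt; yields the whole
string, while the non-greedy pattern &lt;.+?&gt; stops at the first closing
bracket and yields only "&lt;a&gt;".
</p>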
<p>
<codeclass="computeroutput"><spanclass="special">*?</span></code> Matches the previous atom
zero or more times, while consuming as little input as possible.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">+?</span></code> Matches the previous atom
one or more times, while consuming as little input as possible.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">??</span></code> Matches the previous atom
zero or one times, while consuming as little input as possible.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">{</span><spanclass="identifier">n</span><spanclass="special">,}?</span></code> Matches the previous atom n or more times,
while consuming as little input as possible.
</p>
<p>
The <codeclass="computeroutput"><spanclass="special">|</span></code> operator will match either
of its arguments, so for example: <codeclass="computeroutput"><spanclass="identifier">abc</span><spanclass="special">|</span><spanclass="identifier">def</span></code> will
match either "abc" or "def".
</p>
<p>
Parentheses can be used to group alternations, for example: <codeclass="computeroutput"><spanclass="identifier">ab</span><spanclass="special">(</span><spanclass="identifier">d</span><spanclass="special">|</span><spanclass="identifier">ef</span><spanclass="special">)</span></code>
will match either of "abd" or "abef".
</p>
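<p>
A grouped alternation also records which branch was taken. The following sketch
assumes std::regex (ECMAScript grammar) in place of Boost.Regex and uses a
hypothetical helper name; it reuses the ab(d|ef) pattern from the text above.
</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// Matches the whole of `text` against ab(d|ef) and returns the alternative
// that was chosen ("d" or "ef"), or an empty string when there is no match.
std::string chosen_alternative(const std::string& text) {
    static const std::regex re("ab(d|ef)");
    std::smatch m;
    if (std::regex_match(text, m, re)) {
        return m.str(1);
    }
    return std::string();
}
```

<p>
Calling chosen_alternative("abd") gives "d" and chosen_alternative("abef")
gives "ef"; an input such as "abdef" matches neither branch as a whole.
</p>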
<p>
Empty alternatives are not allowed (these are almost always a mistake), but
if you really want an empty alternative use <codeclass="computeroutput"><spanclass="special">(?:)</span></code>
as a placeholder, for example <codeclass="computeroutput"><spanclass="special">(?:)|</span><spanclass="identifier">abc</span></code>. The expression
<codeclass="computeroutput"><spanclass="special">(?:</span><spanclass="identifier">abc</span><spanclass="special">)??</span></code> has exactly the same effect.
</p>
<p>
For example <codeclass="computeroutput"><spanclass="special">[</span><spanclass="identifier">abc</span><spanclass="special">]</span></code> will match any of the characters 'a', 'b',
or 'c'.
</p>
<p>
For example <codeclass="computeroutput"><spanclass="special">[</span><spanclass="identifier">a</span><spanclass="special">-</span><spanclass="identifier">c</span><spanclass="special">]</span></code>
will match any single character in the range 'a' to 'c'. By default, for
Perl regular expressions, a character x is within the range y to z if the
code point of the character lies within the code points of the endpoints of
the range. Alternatively, if you set the <ahref="../ref/syntax_option_type/syntax_option_type_perl.html"title="Options for Perl Regular Expressions"><codeclass="computeroutput"><spanclass="identifier">collate</span></code> flag</a> when constructing the
regular expression, then ranges are locale sensitive.
</p>
<p>
If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example <codeclass="computeroutput"><spanclass="special">[^</span><spanclass="identifier">a</span><spanclass="special">-</span><spanclass="identifier">c</span><spanclass="special">]</span></code> matches any character that is not in the
range <codeclass="computeroutput"><spanclass="identifier">a</span><spanclass="special">-</span><spanclass="identifier">c</span></code>.
</p>
<p>
An expression of the form <codeclass="computeroutput"><spanclass="special">[[:</span><spanclass="identifier">name</span><spanclass="special">:]]</span></code>
matches the named character class "name", for example <codeclass="computeroutput"><spanclass="special">[[:</span><spanclass="identifier">lower</span><spanclass="special">:]]</span></code> matches any lower case character. See
<ahref="character_classes.html"title="Character Class Names">character class names</a>.
</p>
<p>
An expression of the form <codeclass="computeroutput"><spanclass="special">[[.</span><spanclass="identifier">col</span><spanclass="special">.]]</span></code> matches
the collating element <spanclass="emphasis"><em>col</em></span>. A collating element is any
single character, or any sequence of characters that collates as a single
unit. Collating elements may also be used as the end point of a range, for
example <codeclass="computeroutput"><spanclass="special">[[.</span><spanclass="identifier">ae</span><spanclass="special">.]-</span><spanclass="identifier">c</span><spanclass="special">]</span></code>
matches the character sequence "ae", plus any single character
in the range "ae"-c, assuming that "ae" is treated as
a single collating element in the current locale.
</p>
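<p>
The simpler bracket expressions described above (single characters, ranges,
negation and named classes) can be tried out with the sketch below. It assumes
std::regex (ECMAScript grammar) as a stand-in for Boost.Regex and the default
"C" locale; collating elements are left out because they depend on locale
support. The helper name is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// Returns true if the single character `c` is accepted by the
// bracket expression `set`, for example "[a-c]" or "[[:lower:]]".
bool in_set(const std::string& set, char c) {
    const std::regex re(set);
    return std::regex_match(std::string(1, c), re);
}
```

<p>
For example, in_set("[a-c]", 'b') and in_set("[^a-c]", 'd') hold, while
in_set("[[:lower:]]", 'Q') does not.
</p>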
<p>
As an extension, a collating element may also be specified via its <ahref="collating_names.html"title="Collating Names">symbolic name</a>.
</p>
<p>
An expression of the form <codeclass="computeroutput"><spanclass="special">[[=</span><spanclass="identifier">col</span><spanclass="special">=]]</span></code>
matches any character or collating element whose primary sort key is the
same as that for collating element <spanclass="emphasis"><em>col</em></span>. As with collating
elements, the name <spanclass="emphasis"><em>col</em></span> may be a <ahref="collating_names.html"title="Collating Names">symbolic
name</a>. A primary sort key is one that ignores case, accentuation, or
locale-specific tailorings; so for example <codeclass="computeroutput"><spanclass="special">[[=</span><spanclass="identifier">a</span><spanclass="special">=]]</span></code> matches
any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation
of this is reliant on the platform's collation and localisation support;
this feature cannot be relied upon to work portably across all platforms.
</p>
<p>
All the escape sequences that match a single character, or a single character
class, are permitted within a character class definition. For example <codeclass="computeroutput"><spanclass="special">[\[\]]</span></code> would match either of <codeclass="computeroutput"><spanclass="special">[</span></code> or <codeclass="computeroutput"><spanclass="special">]</span></code>
while <codeclass="computeroutput"><spanclass="special">[\</span><spanclass="identifier">W</span><spanclass="special">\</span><spanclass="identifier">d</span><spanclass="special">]</span></code>
would match any character that is either a "digit", <spanclass="emphasis"><em>or</em></span>
is <spanclass="emphasis"><em>not</em></span> a "word" character.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">N</span><spanclass="special">{</span><spanclass="identifier">name</span><spanclass="special">}</span></code>
Matches the single character which has the <ahref="collating_names.html"title="Collating Names">symbolic
name</a> <spanclass="emphasis"><em>name</em></span>. For example <codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">N</span><spanclass="special">{</span><spanclass="identifier">newline</span><spanclass="special">}</span></code> matches the single character \n.
</p>
<p>
For example <codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">pd</span></code>
matches any "digit" character, as does <codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">p</span><spanclass="special">{</span><spanclass="identifier">digit</span><spanclass="special">}</span></code>.
</p>
<p>
The following match only at buffer boundaries: a "buffer" in this
context is the whole of the input text that is being matched against (note
that ^ and $ may match embedded newlines within the text).
</p>
<p>
\` Matches at the start of a buffer only.
</p>
<p>
\' Matches at the end of a buffer only.
</p>
<p>
\A Matches at the start of a buffer only (the same as \`).
</p>
<p>
\z Matches at the end of a buffer only (the same as \').
</p>
<p>
\Z Matches an optional sequence of newlines at the end of a buffer: equivalent
to the regular expression <codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">n</span><spanclass="special">*\</span><spanclass="identifier">z</span></code>.
</p>
<p>
The escape sequence <codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">Q</span></code>
begins a "quoted sequence": all the subsequent characters are treated
as literals, until either the end of the regular expression or \E is found.
For example the expression <codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">Q</span><spanclass="special">\*+\</span><spanclass="identifier">Ea</span><spanclass="special">+</span></code> would match either of
<codeclass="computeroutput">\*+a</code> or <codeclass="computeroutput">\*+aaa</code>.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">C</span></code>
Matches a single code point: in Boost regex this has exactly the same effect
as a "." operator.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">\</span><spanclass="identifier">X</span></code> Matches a combining character sequence:
that is any non-combining character followed by a sequence of zero or more
combining characters.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">(?:</span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> lexically groups pattern, without generating
an additional sub-expression.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">(?=</span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> consumes zero characters, only if pattern
matches.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">(?!</span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> consumes zero characters, only if pattern
does not match.
</p>
<p>
Lookahead is typically used to create the logical AND of two regular expressions,
for example if a password must contain a lower case letter, an upper case
letter, a punctuation symbol, and be at least 6 characters long, then the
expression <codeclass="computeroutput">(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}</code>
could be used to validate the password.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">(?<=</span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> consumes zero characters, only if pattern
could be matched against the characters preceding the current position (pattern
must be of fixed length).
</p>
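<p>
The password rule used above to motivate lookahead (a lower case letter, an
upper case letter, a punctuation symbol, and at least 6 characters) can be
sketched with three positive lookaheads followed by a length check. This
assumes std::regex (ECMAScript grammar) as a stand-in for Boost.Regex; the
function name is hypothetical and the pattern is one possible way to express
the rule.
</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// Each (?=...) lookahead checks one requirement without consuming input,
// so all three must hold at the start before .{6,} checks the length.
bool is_valid_password(const std::string& candidate) {
    static const std::regex policy(
        "(?=.*[[:lower:]])"   // contains a lower case letter
        "(?=.*[[:upper:]])"   // contains an upper case letter
        "(?=.*[[:punct:]])"   // contains a punctuation symbol
        ".{6,}");             // is at least 6 characters long
    return std::regex_match(candidate, policy);
}
```

<p>
A candidate such as "Passw0rd!" satisfies every lookahead and the length
check, whereas "password!" fails the upper case requirement and "Aa!" is too
short.
</p>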
<p>
<codeclass="computeroutput"><spanclass="special">(?<!</span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> consumes zero characters, only if pattern
could not be matched against the characters preceding the current position
(pattern must be of fixed length).
</p>
<p>
<codeclass="computeroutput"><spanclass="special">(?></span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> <spanclass="emphasis"><em>pattern</em></span> is matched
independently of the surrounding patterns; the expression will never backtrack
into <spanclass="emphasis"><em>pattern</em></span>. Independent sub-expressions are typically
used to improve performance; only the best possible match for pattern will
be considered, and if this doesn't allow the expression as a whole to match then
no match is found at all.
</p>
<p>
<codeclass="computeroutput"><spanclass="special">(?(</span><spanclass="identifier">condition</span><spanclass="special">)</span><spanclass="identifier">yes</span><spanclass="special">-</span><spanclass="identifier">pattern</span><spanclass="special">|</span><spanclass="identifier">no</span><spanclass="special">-</span><spanclass="identifier">pattern</span><spanclass="special">)</span></code> attempts to match <spanclass="emphasis"><em>yes-pattern</em></span>
if the <spanclass="emphasis"><em>condition</em></span> is true, otherwise attempts to match
<spanclass="emphasis"><em>no-pattern</em></span>.
</p>
<p>
The same as <codeclass="computeroutput"><spanclass="identifier">S</span><spanclass="special">{</span><spanclass="number">0</span><spanclass="special">,</span><spanclass="number">1</span><spanclass="special">}</span></code>,
The same as <codeclass="computeroutput"><spanclass="identifier">S</span><spanclass="special">{</span><spanclass="number">0</span><spanclass="special">,</span><spanclass="number">1</span><spanclass="special">}?</span></code>,
<codeclass="computeroutput"><spanclass="identifier">JavaScript</span></code> and <codeclass="computeroutput"><spanclass="identifier">JScript</span></code></a> are all synonyms for <codeclass="computeroutput"><spanclass="identifier">perl</span></code>.
</p>
<p>
There are a <ahref="../ref/syntax_option_type/syntax_option_type_perl.html"title="Options for Perl Regular Expressions">variety
of flags</a> that may be combined with the <codeclass="computeroutput"><spanclass="identifier">perl</span></code>
option when constructing the regular expression. In particular, note that
the <codeclass="computeroutput"><spanclass="identifier">newline_alt</span></code> option alters
the syntax, while the <codeclass="computeroutput"><spanclass="identifier">collate</span></code>,
<codeclass="computeroutput"><spanclass="identifier">nosubs</span></code> and <codeclass="computeroutput"><spanclass="identifier">icase</span></code> options modify how the case and locale
sensitivity are to be applied.
</p>
<p>
The perl <codeclass="computeroutput"><spanclass="identifier">smix</span></code> modifiers can
either be applied using a <codeclass="computeroutput"><spanclass="special">(?</span><spanclass="identifier">smix</span><spanclass="special">-</span><spanclass="identifier">smix</span><spanclass="special">)</span></code> prefix to the regular expression, or with
one of the <ahref="../ref/syntax_option_type/syntax_option_type_perl.html"title="Options for Perl Regular Expressions">regex-compile
time flags <codeclass="computeroutput"><spanclass="identifier">no_mod_m</span></code>, <codeclass="computeroutput"><spanclass="identifier">mod_x</span></code>, <codeclass="computeroutput"><spanclass="identifier">mod_s</span></code>,
and <codeclass="computeroutput"><spanclass="identifier">no_mod_s</span></code></a>.