Merged regex-4 branch.

[SVN r18431]
John Maddock
2003-05-17 11:55:51 +00:00
parent f0f32bdda1
commit 1f15026060
42 changed files with 7254 additions and 7501 deletions

79
doc/Attic/standards.html Normal file

@ -0,0 +1,79 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Standards Conformance</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Standards Conformance</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>C++</H3>
<P>Boost.Regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>, which will appear in a
future C++ standard technical report (and hopefully in a future version of the
standard).&nbsp; Currently there are some differences in how the regular
expression traits classes are defined; these will be fixed in a future release.</P>
<H3>ECMAScript / JavaScript</H3>
<P>All of the ECMAScript regular expression syntax features are supported, except
that:</P>
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
definitions ( [...] ).</P>
<P>The escape sequence \u matches any upper case character (the same as
[[:upper:]])&nbsp;rather than a Unicode escape sequence; use \x{DDDD} for
Unicode escape sequences.</P>
<H3>Perl</H3>
<P>Almost all Perl features are supported, except for:</P>
<P>\N{name}&nbsp; Use [[:name:]] instead.</P>
<P>\pP and \PP</P>
<P>(?imsx-imsx)</P>
<P>(?&lt;=pattern)</P>
<P>(?&lt;!pattern)</P>
<P>(?{code})</P>
<P>(??{code})</P>
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
<P>These embarrassments / limitations will be removed in due course, mainly
dependent upon user demand.</P>
<H3>POSIX</H3>
<P>All the POSIX basic and extended regular expression features are supported,
except that:</P>
<P>No character collating names are recognized except those specified in the POSIX
standard for the C locale, unless they are explicitly registered with the
traits class.</P>
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
Win32.&nbsp; Implementing this feature requires knowledge of the format of the
string sort keys produced by the system; if you need this, and the default
implementation doesn't work on your platform, then you will need to supply a
custom traits class.</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

426
doc/Attic/sub_match.html Normal file

@ -0,0 +1,426 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: sub_match</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">sub_match</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>#include &lt;<A href="../../boost/regex.hpp">boost/regex.hpp</A>&gt;
</P>
<P>Regular expressions are different from many simple pattern-matching algorithms
in that as well as finding an overall match they can also produce
sub-expression matches: each sub-expression being delimited in the pattern by a
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.htm">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
<P>When the marked sub-expression denoted by an object of type sub_match&lt;&gt;
participated in a regular expression match then member <CODE>matched</CODE> evaluates
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
range of characters <CODE>[first,second)</CODE> which formed that match.
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
contain undefined values.</P>
<P>If an object of type <CODE>sub_match&lt;&gt;</CODE> represents sub-expression 0
- that is to say the whole match - then member <CODE>matched</CODE> is always
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
character range that formed the partial match.</P>
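<P>The behaviour of <CODE>matched</CODE> can be sketched as follows. This is an
illustration only: it uses <CODE>std::regex</CODE> from C++11 (which grew out of
the proposal this library tracks) so that it builds without Boost, and it
assumes that the <CODE>boost::</CODE> equivalents behave the same way for this
pattern. The helper name <CODE>group_participated</CODE> is hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: reports whether sub-expression `index` took part
// in a successful whole-string match of `text` against `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool group_participated(const std::string& pattern,
                        const std::string& text,
                        std::size_t index)
{
    std::regex re(pattern);
    std::smatch what;
    if (!std::regex_match(text, what, re))
        return false;              // no overall match at all
    return what[index].matched;    // the sub_match flag described above
}
```

<P>With the pattern "(a)|(b)" and the input "b", sub-expression 2 participates
in the match while sub-expression 1 does not, so only the former reports
<CODE>matched</CODE> as true.</P>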
<PRE>
namespace boost{
template &lt;class BidirectionalIterator&gt;
class sub_match : public std::pair&lt;BidirectionalIterator, BidirectionalIterator&gt;
{
public:
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::value_type value_type;
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::difference_type difference_type;
typedef BidirectionalIterator iterator;
bool matched;
difference_type length()const;
operator basic_string&lt;value_type&gt;()const;
basic_string&lt;value_type&gt; str()const;
int compare(const sub_match&amp; s)const;
int compare(const basic_string&lt;value_type&gt;&amp; s)const;
int compare(const value_type* s)const;
};
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);
} // namespace boost</PRE>
<H3>Description</H3>
<H4>
sub_match members</H4>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::value_type value_type;</PRE>
<P>The type pointed to by the iterators.</P>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::difference_type difference_type;</PRE>
<P>A type that represents the difference between two iterators.</P>
<PRE>typedef iterator iterator_type;</PRE>
<P>The iterator type.</P>
<PRE>iterator first</PRE>
<P>An iterator denoting the position of the start of the match.</P>
<PRE>iterator second</PRE>
<P>An iterator denoting the position of the end of the match.</P>
<PRE>bool matched</PRE>
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
<PRE>difference_type length()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string&lt;value_type&gt;()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>basic_string&lt;value_type&gt; str()const;</PRE>
<P><B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>int compare(const sub_match&amp; s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string&lt;value_type&gt;&amp; s)const;</PRE>
<P><B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
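<P>A minimal sketch of the member functions in use. It is written against
<CODE>std::regex</CODE> (C++11) so that it compiles without Boost; the
assumption is that <CODE>boost::sub_match</CODE> gives the same results for
<CODE>str()</CODE>, <CODE>length()</CODE> and <CODE>compare()</CODE>. The names
<CODE>capture_info</CODE> and <CODE>describe_first_group</CODE> are invented for
the illustration.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record of what the sub_match members returned.
struct capture_info {
    bool found;             // did the overall search succeed?
    std::string text;       // result of str()
    std::ptrdiff_t length;  // result of length()
    int order;              // result of compare(other)
};

// Searches `input` for `pattern` and queries sub-expression 1.
// Assumption: std::regex stands in for boost::regex here.
capture_info describe_first_group(const std::string& pattern,
                                  const std::string& input,
                                  const std::string& other)
{
    capture_info info{false, std::string(), 0, 0};
    std::regex re(pattern);
    std::smatch what;
    if (std::regex_search(input, what, re)) {
        const std::ssub_match& sub = what[1];
        info.found = true;
        info.text = sub.str();            // empty if the group did not match
        info.length = sub.length();       // 0 if the group did not match
        info.order = sub.compare(other);  // same as str().compare(other)
    }
    return info;
}
```

<P>A sub-expression that did not participate in the match yields an empty
string and a length of zero, which is why <CODE>matched</CODE> should be
checked when an empty match and no match need to be told apart.</P>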
<H4>
sub_match non-member operators</H4>
<PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P><B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);</PRE>
<P> <B>
Effects: </B>returns <CODE>(os &lt;&lt; m.str())</CODE>.</P>
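<P>The non-member operators allow a sub_match to be compared with a string, or
streamed, without calling <CODE>str()</CODE> explicitly. The sketch below
assumes <CODE>std::regex</CODE> (C++11) as a stand-in for this library so that
it builds without Boost; both helper names are hypothetical.</P>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: true when sub-expression 1 of the first match of
// `pattern` in `input` compares equal to `expected` via operator==.
// Assumption: std::regex stands in for boost::regex here.
bool first_group_equals(const std::string& pattern,
                        const std::string& input,
                        const std::string& expected)
{
    std::regex re(pattern);
    std::smatch what;
    return std::regex_search(input, what, re) && what[1] == expected;
}

// Hypothetical helper: writes sub-expression 1 to a stream via operator<<
// and returns the streamed text (empty if nothing matched).
std::string stream_first_group(const std::string& pattern,
                               const std::string& input)
{
    std::regex re(pattern);
    std::smatch what;
    std::ostringstream out;
    if (std::regex_search(input, what, re))
        out << what[1];   // equivalent to out << what[1].str()
    return out.str();
}
```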
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

773
doc/Attic/syntax.html Normal file

@ -0,0 +1,773 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Regular Expression Syntax</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Regular Expression Syntax</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>This section covers the regular expression syntax used by this library. This is
a programmer's guide; the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
</P>
<H3>Literals
</H3>
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
a "\". A literal is a character that matches itself, or matches the result of
traits_type::translate(), where traits_type is the traits template parameter to
class basic_regex.</P>
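<P>As a short sketch of the escaping rule: a special character preceded by "\"
matches itself. The example uses <CODE>std::regex</CODE> (C++11) so that it
builds without Boost, on the assumption that this library treats these patterns
the same way; <CODE>matches_whole</CODE> is a hypothetical helper. Note that in
C++ source each backslash in the pattern must itself be doubled.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
// Assumption: std::regex stands in for boost::regex here.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<P>With this helper, the pattern "a\.b" (written "a\\.b" in source) matches
only "a.b", whereas the unescaped pattern "a.b" also matches "axb".</P>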
<H3>Wildcard
</H3>
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
when <I>match_not_dot_newline</I> is passed to the matching algorithms, the
dot does not match a newline character.
</P>
<H3>Repeats
</H3>
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once; if the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only; if the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
</P>
<P>Examples:
</P>
<P>"ba*" will match all of "b", "ba", "baaa" etc.
</P>
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
</P>
<P>"ba?" will match "b" or "ba".
</P>
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
</P>
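<P>The examples above can be checked with a few lines of code. This sketch uses
<CODE>std::regex</CODE> (C++11) so that it builds without Boost, assuming the
default syntax of this library agrees for these four patterns;
<CODE>repeat_matches</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
// Assumption: std::regex stands in for boost::regex here.
bool repeat_matches(const std::string& pattern, const std::string& text)
{
    std::regex re(pattern);
    return std::regex_match(text, re);
}
```

<P>For instance, <CODE>repeat_matches("ba+", "b")</CODE> is false because "+"
requires at least one "a", while <CODE>repeat_matches("ba*", "b")</CODE> is
true.</P>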
<H3>Non-greedy repeats
</H3>
<P>Whenever the "extended" regular expression syntax is in use (the default) then
non-greedy repeats are possible by appending a '?' after the repeat; a
non-greedy repeat is one which will match the <I>shortest</I> possible string.
</P>
<P>For example, to match HTML tag pairs one could use something like:
</P>
<P>"&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;"
</P>
<P>In this case $1 will contain the text between the tag pairs, and will be the
shortest possible matching string.&nbsp;
</P>
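<P>The difference between the greedy and non-greedy forms can be sketched as
follows, with "b" substituted for "tagname". The code uses
<CODE>std::regex</CODE> (C++11) so that it builds without Boost, and assumes
this library gives the same result for an input with no newlines;
<CODE>first_tag_body</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured between the first <b> tag
// pair found in `html`, using either a greedy or a non-greedy repeat.
// Assumption: std::regex stands in for boost::regex here.
std::string first_tag_body(const std::string& html, bool greedy)
{
    const std::string repeat = greedy ? "(.*)" : "(.*?)";
    std::regex re("<\\s*b[^>]*>" + repeat + "<\\s*/b\\s*>");
    std::smatch what;
    if (!std::regex_search(html, what, re))
        return std::string();
    return what[1].str();   // the $1 referred to in the text above
}
```

<P>For the input "&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;" the
non-greedy form captures "one", while the greedy form runs on to the last
closing tag and captures "one&lt;/b&gt; and &lt;b&gt;two".</P>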
<H3>Parentheses
</H3>
<P>Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match; on exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match - for example if it is
part of an alternative that is not taken - then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
</P>
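<P>The behaviour of marked sub-expressions can be sketched as follows,
assuming std::regex and std::smatch as stand-ins for the Boost.Regex types of
the same names; both helper functions are hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Offset of sub-expression 1 after matching "(ab)*" against all of `text`,
// or std::string::npos when the expression does not match the whole text.
std::size_t last_repeat_offset(const std::string& text)
{
    static const std::regex re("(ab)*");
    std::smatch what;
    if (!std::regex_match(text, what, re))
        return std::string::npos;
    return static_cast<std::size_t>(what.position(1));
}

// True when sub-expression `index` of "(a)|(b)" took part in matching `text`.
bool took_part(const std::string& text, std::size_t index)
{
    static const std::regex re("(a)|(b)");
    std::smatch what;
    return std::regex_match(text, what, re) && what[index].matched;
}
```

<P>For "ababab" the first helper reports offset 4, the start of the final
"ab"; for the text "b" only the second sub-expression of "(a)|(b)" is marked
as matched.</P>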
<H3>Non-Marking Parentheses
</H3>
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression; in this case a
non-marking parenthesis (?:expression) can be used. For example the following
expression creates no sub-expressions:
</P>
<P>"(?:abc)*"</P>
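<P>One way to confirm that a group is non-marking is to ask the compiled
expression how many marked sub-expressions it holds. The sketch assumes
std::regex as a stand-in for Boost.Regex; <CODE>marked_groups</CODE> is a
hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Number of marked sub-expressions that `pattern` creates.
unsigned marked_groups(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}
```

<P>Here "(?:abc)*" yields zero marked sub-expressions, whereas "(abc)*" yields
one.</P>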
<H3>Forward Lookahead Asserts&nbsp;
</H3>
<P>There are two forms of these; one for positive forward lookahead asserts, and
one for negative lookahead asserts:</P>
<P>"(?=abc)" matches zero characters only if they are followed by the expression
"abc".</P>
<P>"(?!abc)" matches zero characters only if they are not followed by the
expression "abc".</P>
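<P>A small sketch of both lookahead forms, assuming std::regex (ECMAScript
grammar) as a stand-in for Boost.Regex; the function names are hypothetical.
Note that the asserted text is not part of the reported match.</P>

```cpp
#include <regex>
#include <string>

// Returns the text matched by "foo(?=bar)" in `text`, or "" if there is none.
// The lookahead consumes nothing, so a successful match is just "foo".
std::string match_before_bar(const std::string& text)
{
    static const std::regex re("foo(?=bar)");
    std::smatch what;
    return std::regex_search(text, what, re) ? what[0].str() : std::string();
}

// True when "foo" occurs in `text` and is not immediately followed by "bar".
bool foo_not_before_bar(const std::string& text)
{
    static const std::regex re("foo(?!bar)");
    return std::regex_search(text, re);
}
```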
<H3>Independent sub-expressions</H3>
<P>"(?&gt;expression)" matches "expression" as an independent atom (the algorithm
will not backtrack into it if a failure occurs later in the expression).</P>
<H3>Alternatives
</H3>
<P>Alternatives occur when the expression can match either one sub-expression or
another. Each alternative is separated by a "|", or a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
</P>
<P>Examples:
</P>
<P>"a(b|c)" could match "ab" or "ac".
</P>
<P>"abc|def" could match "abc" or "def".
</P>
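<P>The scope of the "|" operator can be illustrated with a brief sketch,
assuming std::regex (ECMAScript grammar) as a stand-in for Boost.Regex; the
function names are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// "abc|def": each alternative is the whole of "abc" or the whole of "def".
bool is_abc_or_def(const std::string& text)
{
    static const std::regex re("abc|def");
    return std::regex_match(text, re);
}

// "a(b|c)": the parentheses limit the alternatives to "b" or "c".
bool is_ab_or_ac(const std::string& text)
{
    static const std::regex re("a(b|c)");
    return std::regex_match(text, re);
}
```

<P>Because each alternative extends as far as possible, "abc|def" does not
match "abcef"; that would require the grouping "ab(c|d)ef".</P>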
<H3>Sets
</H3>
<P>A set is a set of characters that can match any single character that is a
member of the set. Sets are delimited by "[" and "]" and can contain literals,
character ranges, character classes, collating elements and equivalence
classes. Set declarations that start with "^" contain the complement of the
elements that follow.
</P>
<P>Examples:
</P>
<P>Character literals:
</P>
<P>"[abc]" will match either of "a", "b", or "c".
</P>
<P>"[^abc]" will match any character other than "a", "b", or "c".
</P>
<P>Character ranges:
</P>
<P>"[a-z]" will match any character in the range "a" to "z".
</P>
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
</P>
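<P>The literal and range examples above can be exercised by counting how many
characters of a string belong to a set. This is only a sketch: it assumes
std::regex (ECMAScript grammar, without the collate flag) as a stand-in for
Boost.Regex, and <CODE>count_matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of `pattern` found in `text`.
// With a single set as the pattern, this is the number of member characters.
std::ptrdiff_t count_matches(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```

<P>For the text "aXbYc", the set "[abc]" accounts for three characters and
"[^abc]" for the remaining two.</P>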
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range, and ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale-specific behavior is disabled by default (in perl mode), in
which case ranges collate according to ASCII character code.
</P>
<P>Character classes are denoted using the syntax "[:classname:]" within a set
declaration, for example "[[:space:]]" is the set of all whitespace characters.
Character classes are only available if the flag regex_constants::char_classes
is set. The available character classes are:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="50%">alnum</TD>
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">alpha</TD>
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
characters may also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">blank</TD>
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">cntrl</TD>
<TD vAlign="top" width="50%">Any control character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">digit</TD>
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">graph</TD>
<TD vAlign="top" width="50%">Any graphical character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">lower</TD>
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">print</TD>
<TD vAlign="top" width="50%">Any printable character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">punct</TD>
<TD vAlign="top" width="50%">Any punctuation character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">space</TD>
<TD vAlign="top" width="50%">Any whitespace character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">upper</TD>
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">xdigit</TD>
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">word</TD>
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
the underscore.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">Unicode</TD>
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
applies to the wide character traits classes only.</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
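<P>A short sketch of character classes inside set declarations. It assumes
std::regex (ECMAScript grammar) as a stand-in for Boost.Regex and uses only
the POSIX class names from the table above (not "word" or "Unicode"); the
function names are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Returns the first run of digits in `text`, or "" if there is none.
std::string first_number(const std::string& text)
{
    static const std::regex re("[[:digit:]]+");
    std::smatch what;
    return std::regex_search(text, what, re) ? what[0].str() : std::string();
}

// True when `text` is a letter or underscore followed by letters, digits
// or underscores. A class may be mixed with literals inside one set.
bool is_identifier(const std::string& text)
{
    static const std::regex re("[[:alpha:]_][[:alnum:]_]*");
    return std::regex_match(text, re);
}
```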
<P>There are some shortcuts that can be used in place of the character classes;
provided the flag regex_constants::escape_in_lists is set, you can use:
</P>
<P>\w in place of [:word:]
</P>
<P>\s in place of [:space:]
</P>
<P>\d in place of [:digit:]
</P>
<P>\l in place of [:lower:]
</P>
<P>\u in place of [:upper:]&nbsp;
</P>
<P>Collating elements take the general form [.tagname.] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
equivalent to [,]. The library supports all the standard POSIX collating
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
"nj", "dz", "lj", each in lower, upper and title case variations.
Multi-character collating elements can result in the set matching more than one
character, for example [[.ae.]] would match two characters, but note that
[^[.ae.]] would only match one character.&nbsp;
</P>
<P>
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, and match any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentuation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.&nbsp;
</P>
<P>To include a literal "-" in a set declaration, make it the first character
after the opening "[" or "[^", the endpoint of a range, or a collating element;
or, if the flag regex_constants::escape_in_lists is set, precede it with an escape
character as in "[\-]". To include a literal "[" or "]" or "^" in a set,
make it the endpoint of a range or a collating element, or precede it with an
escape character if the flag regex_constants::escape_in_lists is set.
</P>
<H3>Line anchors
</H3>
<P>An anchor is something that matches the null string at the start or end of a
line: "^" matches the null string at the start of a line, "$" matches the null
string at the end of a line.
</P>
<H3>Back references
</H3>
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string: NB this is different to some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
</P>
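<P>The "(.*)\1" example can be tried with the following sketch, which assumes
std::regex (ECMAScript grammar, where back references are always enabled) as
a stand-in for Boost.Regex; <CODE>is_doubled</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// True when `text` consists of some string followed by an exact copy of it.
// "\1" must match the same characters that sub-expression 1 matched.
bool is_doubled(const std::string& text)
{
    static const std::regex re("(.*)\\1");
    return std::regex_match(text, re);
}
```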
<H3>Characters by code
</H3>
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose octal code is 103, whereas "(\010)3"
represents the character whose octal code is 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; notice the latter example is a Unicode
character.</P>
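<P>As a limited sketch of matching a character by its hexadecimal code, the
function below looks for the ASCII escape character (0x1B) followed by "[".
It assumes std::regex as a stand-in for Boost.Regex; std::regex accepts only
the two-digit \xHH form, so the \0 octal and \x{...} forms described above are
not exercised here. <CODE>has_ansi_escape</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// True when `text` contains the character with code 0x1B followed by '['.
bool has_ansi_escape(const std::string& text)
{
    static const std::regex re("\\x1B\\[");
    return std::regex_search(text, re);
}
```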
<H3>Word operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library.
</P>
<P>"\w" matches any single character that is a member of the "word" character
class, this is identical to the expression "[[:word:]]".
</P>
<P>"\W" matches any single character that is not a member of the "word" character
class, this is identical to the expression "[^[:word:]]".
</P>
<P>"\&lt;" matches the null string at the start of a word.
</P>
<P>"\&gt;" matches the null string at the end of a word.
</P>
<P>"\b" matches the null string at either the start or the end of a word.
</P>
<P>"\B" matches a null string within a word.
</P>
<P>The start of the sequence passed to the matching algorithms is considered to be
a potential start of a word unless the flag match_not_bow is set. The end of
the sequence passed to the matching algorithms is considered to be a potential
end of a word unless the flag match_not_eow is set.
</P>
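<P>The word boundary operator "\b" can be sketched as follows, assuming
std::regex (ECMAScript grammar) as a stand-in for Boost.Regex; std::regex does
not provide "\&lt;" or "\&gt;", so only "\b" is shown.
<CODE>has_whole_word_cat</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// True when "cat" appears in `text` as a whole word.
// The start and end of the text count as potential word boundaries.
bool has_whole_word_cat(const std::string& text)
{
    static const std::regex re("\\bcat\\b");
    return std::regex_search(text, re);
}
```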
<H3>Buffer operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library, and Perl regular expressions:
</P>
<P>"\`" matches the start of a buffer.
</P>
<P>"\A" matches the start of the buffer.
</P>
<P>"\'" matches the end of a buffer.
</P>
<P>"\z" matches the end of a buffer.
</P>
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
followed by the end of the buffer.
</P>
<P>A buffer is considered to consist of the whole sequence passed to the matching
algorithms, unless the flags match_not_bob or match_not_eob are set.
</P>
<H3>Escape operator
</H3>
<P>The escape character "\" has several meanings.
</P>
<P>Inside a set declaration the escape character is a normal character unless the
flag regex_constants::escape_in_lists is set in which case whatever follows the
escape is a literal character regardless of its normal meaning.
</P>
<P>The escape operator may introduce an operator for example: back references, or
a word operator.
</P>
<P>The escape operator may make the following character normal, for example "\*"
represents a literal "*" rather than the repeat operator.
</P>
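<P>The difference between "*" and "\*" is easy to demonstrate. The sketch
assumes std::regex (ECMAScript grammar) as a stand-in for Boost.Regex; both
function names are hypothetical. Note that the backslash must itself be
doubled inside a C++ string literal.</P>

```cpp
#include <regex>
#include <string>

// "a\*b": the escaped "*" is a literal character.
bool is_literal_a_star_b(const std::string& text)
{
    static const std::regex re("a\\*b");
    return std::regex_match(text, re);
}

// "a*b": the unescaped "*" is the repeat operator applied to "a".
bool is_repeated_a_then_b(const std::string& text)
{
    static const std::regex re("a*b");
    return std::regex_match(text, re);
}
```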
<H4>Single character escape sequences
</H4>
<P>The following escape sequences are aliases for single characters:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="33%">Escape sequence
</TD>
<TD vAlign="top" width="33%">Character code
</TD>
<TD vAlign="top" width="33%">Meaning
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\a
</TD>
<TD vAlign="top" width="33%">0x07
</TD>
<TD vAlign="top" width="33%">Bell character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\f
</TD>
<TD vAlign="top" width="33%">0x0C
</TD>
<TD vAlign="top" width="33%">Form feed.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\n
</TD>
<TD vAlign="top" width="33%">0x0A
</TD>
<TD vAlign="top" width="33%">Newline character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\r
</TD>
<TD vAlign="top" width="33%">0x0D
</TD>
<TD vAlign="top" width="33%">Carriage return.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\t
</TD>
<TD vAlign="top" width="33%">0x09
</TD>
<TD vAlign="top" width="33%">Tab character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\v
</TD>
<TD vAlign="top" width="33%">0x0B
</TD>
<TD vAlign="top" width="33%">Vertical tab.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\e
</TD>
<TD vAlign="top" width="33%">0x1B
</TD>
<TD vAlign="top" width="33%">ASCII Escape character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\0dd
</TD>
<TD vAlign="top" width="33%">0dd
</TD>
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
more octal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\xXX
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\x{XX}
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits, optionally a Unicode character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\cZ
</TD>
<TD vAlign="top" width="33%">Z-@
</TD>
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
ASCII character greater than or equal to the character code for '@'.
</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
<H4>Miscellaneous escape sequences:
</H4>
<P>The following are provided mostly for perl compatibility, but note that there
are some differences in the meanings of \l \L \u and \U:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\w
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\W
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\s
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\S
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\d
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\D
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\l
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\L
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\u
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\U
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\C
</TD>
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\X
</TD>
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
example "a\x 0301" (a letter a with an acute).
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\Q
</TD>
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
treated as a literal character until a \E end quote operator is found.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\E
</TD>
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
with \Q.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
</TABLE>
</P>
<H3>What gets matched?
</H3>
<P>
When the expression is compiled as a Perl-compatible regex then the matching
algorithms will perform a depth first search on the state machine and report
the first match found.</P>
<P>
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string, if more than one
string starting at a given location can match then it matches the longest
possible string, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match - but is only useful if the user is less concerned
about what matched - for example it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, then the match
chosen is the one with the longest first sub-expression, if that is the same
for two or more matches, then the second sub-expression will be examined and so
on.
</P><P>
The examples in the following table illustrate the main differences between Perl and
POSIX regular expression matching rules:
</P>
<P>
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
<TBODY>
<TR>
<TD vAlign="top" width="25%">
<P>Expression</P>
</TD>
<TD vAlign="top" width="25%">
<P>Text</P>
</TD>
<TD vAlign="top" width="25%">
<P>POSIX leftmost longest match</P>
</TD>
<TD vAlign="top" width="25%">
<P>ECMAScript depth first search match</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>a|ab</CODE></P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
xaby</CODE>
</P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
"ab"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"a"</CODE></P></TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*([[:alnum:]]+).*</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
" abc def xyz "</CODE></P></TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "abc"</P>
</TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "z"</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*(a|xayy)</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
zzxayyzz</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"zzxayy"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>"zzxa"</CODE></P>
</TD>
</TR>
</TBODY></TABLE>
</P>
<P>These differences between Perl matching rules, and POSIX matching rules, mean
that these two regular expression syntaxes differ not only in the features
offered, but also in the form that the state machine takes and/or the
algorithms used to traverse the state machine.</p>
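<P>The depth first search column of the table can be reproduced with a short
sketch. It assumes std::regex as a stand-in for Boost.Regex, with the
ECMAScript grammar playing the role of the Perl-compatible mode;
<CODE>first_match</CODE> is a hypothetical helper. Passing
<CODE>std::regex::extended</CODE> as the flag is expected to select the POSIX
leftmost longest rule instead, although that is not verified here.</P>

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text` (empty if none),
// compiling the pattern with the given grammar flags.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex::flag_type flags = std::regex::ECMAScript)
{
    const std::regex re(pattern, flags);
    std::smatch what;
    return std::regex_search(text, what, re) ? what[0].str() : std::string();
}
```

<P>With the default flag, "a|ab" against "xaby" reports "a", and ".*(a|xayy)"
against "zzxayyzz" reports "zzxa", because the first alternative that leads to
a match is taken.</P>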
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

@ -0,0 +1,332 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: syntax_option_type</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">syntax_option_type</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted.&nbsp; For convenience
note that all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
<PRE>namespace std{ namespace regex_constants{
typedef bitmask_type syntax_option_type;
// these flags are standardized:
static const syntax_option_type normal;
static const syntax_option_type icase;
static const syntax_option_type nosubs;
static const syntax_option_type optimize;
static const syntax_option_type collate;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type basic;
static const syntax_option_type extended;
static const syntax_option_type awk;
static const syntax_option_type grep;
static const syntax_option_type egrep;
static const syntax_option_type sed = basic;
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
} // namespace regex_constants
} // namespace std</PRE>
<H3>Description</H3>
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
<P>Note that for convenience all the constants listed here are duplicated within
the scope of class template basic_regex, so you can use any of:</P>
<PRE>boost::regex_constants::constant_name</PRE>
<P>or</P>
<PRE>boost::regex::constant_name</PRE>
<P>or</P>
<PRE>boost::wregex::constant_name</PRE>
<P>in an interchangeable manner.</P>
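<P>A brief sketch of combining a grammar flag with a modifier flag, and of the
duplicated constants. It assumes std::regex and std::regex_constants as
stand-ins for the boost:: names above, and uses only the standardized flags
ECMAScript and icase; the function names are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Case-insensitive whole-string match: one grammar flag (ECMAScript)
// combined with the icase modifier using bitwise OR.
bool equals_ignoring_case(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}

// The constants nested in the regex classes have the same values as the
// namespace-scope constants, so the spellings are interchangeable.
bool nested_constants_agree()
{
    return std::regex::icase == std::regex_constants::icase
        && std::wregex::icase == std::regex_constants::icase;
}
```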
<P>
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="316">
<P>Element</P>
</TD>
<TD vAlign="top" width="50%">
<P>Effect if set</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>normal</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine uses its
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
(FWD.1).</P>
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>icase</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that matching of regular expressions against a character container
sequence shall be performed without regard to case.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>nosubs</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that when a regular expression is matched against a character
container sequence, then no sub-expression matches are to be stored in the
supplied match_results structure.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>optimize</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the regular expression engine should pay more attention to the
speed with which regular expressions are matched, and less to the speed with
which regular expression objects are constructed. Otherwise it has no
detectable effect on the program output.&nbsp; This currently has no effect for
boost.regex.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>collate</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>ECMAScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JavaScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>basic</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
Portable Operating System Interface (POSIX ), Base Definitions and Headers,
Section 9, Regular Expressions (FWD.1).
</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>extended</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX extended regular expressions in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and
Headers, Section 9, Regular Expressions (FWD.1).</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>awk</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk
(FWD.1).</P>
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
character classes permitted.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>grep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX ), Shells and Utilities, Section 4,
Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX basic syntax, but with the newline character
acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>egrep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep when given the -E option in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX ), Shells and
Utilities, Section 4, Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX extended syntax, but with the newline
character acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>sed</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as basic.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>perl</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
</TABLE>
</P>
<P>The following constants are specific to this particular regular expression
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>:</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
characters, for example [\]] represents the set of characters containing only
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::char_classes</TD>
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
are allowed inside character set declarations, for example "[[:word:]]"
represents the set of all characters that belong to the character class "word".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::intervals</TD>
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
a's.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
ordinary characters in all situations.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
has the same effect as the alternation operator "|".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
more repetition operator and "\?" represents the zero or one repetition
operator. When this bit is not set then "+" and "?" are used instead.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
bounded repetitions and "{" and "}" are normal characters. This is the opposite
of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
<TD vAlign="top" width="45%">When this bit is set then back references are
allowed.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
alternation operator and "|" is an ordinary character. This is the opposite of
default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::use_except</TD>
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
exception will be thrown on error.&nbsp; Use of this flag is deprecated -
basic_regex will always throw on error.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::failbit</TD>
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see whether a regular expression is
valid before it is used.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::literal</TD>
<TD vAlign="top" width="45%">All characters in the string are treated as literals;
there are no special characters or escape sequences.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor; equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
</TR>
</TABLE>
</P>
<HR>
<P>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>


@ -0,0 +1,68 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Thread Safety</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Thread Safety</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>Class <A href="basic_regex.html">basic_regex</A>&lt;&gt; and its typedefs regex
and wregex are thread safe, in that compiled regular expressions can safely be
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
is now thread safe, in that the results of a match can be safely copied from
one thread to another (for example, one thread may find matches and push
match_results instances onto a queue, while another thread pops them off the
other end). Otherwise, use a separate instance of <A href="match_results.html">match_results</A>
per thread.
</P>
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
thread safe; regular expressions compiled with <I>regcomp</I> can also be
shared between threads.
</P>
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
gets its own RegEx instance (apartment threading) - this is a consequence of
RegEx handling both compiling and matching regular expressions.
</P>
<P>Finally, note that changing the global locale invalidates all compiled regular
expressions; therefore calling <I>set_locale</I> from one thread while another
uses regular expressions <I>will</I> produce unpredictable results.
</P>
<P>
There is also a requirement that there is only one thread executing prior to
the start of main().</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

BIN
doc/Attic/uarrow.gif Normal file

Binary file not shown.

Size: 1.6 KiB

79
doc/standards.html Normal file

@ -0,0 +1,79 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Standards Conformance</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Standards Conformance</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>C++</H3>
<P>Boost.Regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>, which will appear in a
future C++ standard technical report (and hopefully in a future version of the
standard).&nbsp; Currently there are some differences in how the regular
expression traits classes are defined; these will be fixed in a future release.</P>
<H3>ECMAScript / JavaScript</H3>
<P>All of the ECMAScript regular expression syntax features are supported, except
that:</P>
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
definitions ( [...] ).</P>
<P>The escape sequence \u matches any upper case character (the same as
[[:upper:]])&nbsp;rather than a Unicode escape sequence; use \x{DDDD} for
Unicode escape sequences.</P>
<H3>Perl</H3>
<P>Almost all Perl features are supported, except for:</P>
<P>\N{name}&nbsp; Use [[:name:]] instead.</P>
<P>\pP and \PP</P>
<P>(?imsx-imsx)</P>
<P>(?&lt;=pattern)</P>
<P>(?&lt;!pattern)</P>
<P>(?{code})</P>
<P>(??{code})</P>
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
<P>These embarrassments / limitations will be removed in due course, mainly
dependent upon user demand.</P>
<H3>POSIX</H3>
<P>All the POSIX basic and extended regular expression features are supported,
except that:</P>
<P>No character collating names are recognized except those specified in the POSIX
standard for the C locale, unless they are explicitly registered with the
traits class.</P>
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
Win32.&nbsp; Implementing this feature requires knowledge of the format of the
string sort keys produced by the system; if you need this, and the default
implementation doesn't work on your platform, then you will need to supply a
custom traits class.</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

426
doc/sub_match.html Normal file

@ -0,0 +1,426 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: sub_match</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">sub_match</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>#include &lt;<A href="../../boost/regex.hpp">boost/regex.hpp</A>&gt;
</P>
<P>Regular expressions are different from many simple pattern-matching algorithms
in that as well as finding an overall match they can also produce
sub-expression matches: each sub-expression being delimited in the pattern by a
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.html">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
<P>When the marked sub-expression denoted by an object of type sub_match&lt;&gt;
participated in a regular expression match then member <CODE>matched</CODE> evaluates
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
range of characters <CODE>[first,second)</CODE> which formed that match.
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
contain undefined values.</P>
<P>If an object of type <CODE>sub_match&lt;&gt;</CODE> represents sub-expression 0
- that is to say the whole match - then member <CODE>matched</CODE> is always
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
character range that formed the partial match.</P>
<PRE>
namespace boost{
template &lt;class BidirectionalIterator&gt;
class sub_match : public std::pair&lt;BidirectionalIterator, BidirectionalIterator&gt;
{
public:
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::value_type value_type;
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::difference_type difference_type;
typedef BidirectionalIterator iterator;
bool matched;
difference_type length()const;
operator basic_string&lt;value_type&gt;()const;
basic_string&lt;value_type&gt; str()const;
int compare(const sub_match&amp; s)const;
int compare(const basic_string&lt;value_type&gt;&amp; s)const;
int compare(const value_type* s)const;
};
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);
} // namespace boost</PRE>
<H3>Description</H3>
<H4>
sub_match members</H4>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::value_type value_type;</PRE>
<P>The type pointed to by the iterators.</P>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::difference_type difference_type;</PRE>
<P>A type that represents the difference between two iterators.</P>
<PRE>typedef iterator iterator_type;</PRE>
<P>The iterator type.</P>
<PRE>iterator first</PRE>
<P>An iterator denoting the position of the start of the match.</P>
<PRE>iterator second</PRE>
<P>An iterator denoting the position of the end of the match.</P>
<PRE>bool matched</PRE>
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
<PRE>difference_type length()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string&lt;value_type&gt;()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>basic_string&lt;value_type&gt; str()const;</PRE>
<P><B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>int compare(const sub_match&amp; s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string&lt;value_type&gt;&amp; s)const;</PRE>
<P><B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
<H4>
sub_match non-member operators</H4>
<PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P><B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);</PRE>
<P> <B>
Effects: </B>returns <CODE>(os &lt;&lt; m.str())</CODE>.</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

773
doc/syntax.html Normal file
View File

@ -0,0 +1,773 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Regular Expression Syntax</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Regular Expression Syntax</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>This section covers the regular expression syntax used by this library. It is
a programmer's guide: the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
</P>
<H3>Literals
</H3>
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
a "\". A literal is a character that matches itself, or matches the result of
traits_type::translate(), where traits_type is the traits template parameter to
class basic_regex.</P>
<H3>Wildcard
</H3>
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
and when <I>match_not_dot_newline</I> is passed to the matching algorithms,
the dot does not match a newline character.
</P>
<H3>Repeats
</H3>
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once; if the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only; if the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
</P>
<P>Examples:
</P>
<P>"ba*" will match all of "b", "ba", "baaa" etc.
</P>
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
</P>
<P>"ba?" will match "b" or "ba".
</P>
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
</P>
<H3>Non-greedy repeats
</H3>
<P>Whenever the "extended" regular expression syntax is in use (the default) then
non-greedy repeats are possible by appending a '?' after the repeat; a
non-greedy repeat is one which will match the <I>shortest</I> possible string.
</P>
<P>For example, to match HTML tag pairs one could use something like:
</P>
<P>"&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;"
</P>
<P>In this case $1 will contain the text between the tag pairs, and will be the
shortest possible matching string.&nbsp;
</P>
<H3>Parentheses
</H3>
<P>Parentheses serve two purposes, to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match. On exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match (for example if it is
part of an alternative that is not taken) then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
</P>
<H3>Non-Marking Parentheses
</H3>
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to create another marked sub-expression; in this case
non-marking parentheses (?:expression) can be used. For example the following
expression creates no sub-expressions:
</P>
<P>"(?:abc)*"</P>
<H3>Forward Lookahead Asserts&nbsp;
</H3>
<P>There are two forms of these; one for positive forward lookahead asserts, and
one for negative lookahead asserts:</P>
<P>"(?=abc)" matches zero characters only if they are followed by the expression
"abc".</P>
<P>"(?!abc)" matches zero characters only if they are not followed by the
expression "abc".</P>
<H3>Independent sub-expressions</H3>
<P>"(?&gt;expression)" matches "expression" as an independent atom (the algorithm
will not backtrack into it if a failure occurs later in the expression).</P>
<H3>Alternatives
</H3>
<P>Alternatives occur when the expression can match either one sub-expression or
another. Each alternative is separated by a "|", or a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
</P>
<P>Examples:
</P>
<P>"a(b|c)" could match "ab" or "ac".
</P>
<P>"abc|def" could match "abc" or "def".
</P>
<H3>Sets
</H3>
<P>A set is a set of characters that can match any single character that is a
member of the set. Sets are delimited by "[" and "]" and can contain literals,
character ranges, character classes, collating elements and equivalence
classes. Set declarations that start with "^" contain the complement of the
elements that follow.
</P>
<P>Examples:
</P>
<P>Character literals:
</P>
<P>"[abc]" will match either of "a", "b", or "c".
</P>
<P>"[^abc]" will match any character other than "a", "b", or "c".
</P>
<P>Character ranges:
</P>
<P>"[a-z]" will match any character in the range "a" to "z".
</P>
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
</P>
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range, and ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale-specific behavior is disabled by default (in perl mode), in
which case ranges collate according to ASCII character code.
</P>
<P>Character classes are denoted using the syntax "[:classname:]" within a set
declaration, for example "[[:space:]]" is the set of all whitespace characters.
Character classes are only available if the flag regex_constants::char_classes
is set. The available character classes are:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="50%">alnum</TD>
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">alpha</TD>
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
characters may also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">blank</TD>
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">cntrl</TD>
<TD vAlign="top" width="50%">Any control character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">digit</TD>
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">graph</TD>
<TD vAlign="top" width="50%">Any graphical character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">lower</TD>
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">print</TD>
<TD vAlign="top" width="50%">Any printable character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">punct</TD>
<TD vAlign="top" width="50%">Any punctuation character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">space</TD>
<TD vAlign="top" width="50%">Any whitespace character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">upper</TD>
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">xdigit</TD>
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">word</TD>
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
the underscore.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">Unicode</TD>
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
applies to the wide character traits classes only.</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
<P>There are some shortcuts that can be used in place of the character classes,
provided the flag regex_constants::escape_in_lists is set then you can use:
</P>
<P>\w in place of [:word:]
</P>
<P>\s in place of [:space:]
</P>
<P>\d in place of [:digit:]
</P>
<P>\l in place of [:lower:]
</P>
<P>\u in place of [:upper:]&nbsp;
</P>
<P>Collating elements take the general form [.tagname.] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
equivalent to [,]. The library supports all the standard POSIX collating
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
"nj", "dz", "lj", each in lower, upper and title case variations.
Multi-character collating elements can result in the set matching more than one
character, for example [[.ae.]] would match two characters, but note that
[^[.ae.]] would only match one character.&nbsp;
</P>
<P>
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, and match any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.&nbsp;
</P>
<P>To include a literal "-" in a set declaration, either make it the first character
after the opening "[" or "[^", make it the endpoint of a range, write it as a
collating element, or, if the flag regex_constants::escape_in_lists is set,
precede it with an escape character as in "[\-]". To include a literal "[" or
"]" or "^" in a set, make it the endpoint of a range, write it as a collating
element, or precede it with an escape character if the flag
regex_constants::escape_in_lists is set.
</P>
<H3>Line anchors
</H3>
<P>An anchor is something that matches the null string at the start or end of a
line: "^" matches the null string at the start of a line, "$" matches the null
string at the end of a line.
</P>
<H3>Back references
</H3>
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second, etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string; note that this is different to some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
</P>
<H3>Characters by code
</H3>
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose code is 103, "(\010)3" represents the
character 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; notice the latter example is a Unicode
character.</P>
<H3>Word operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library.
</P>
<P>"\w" matches any single character that is a member of the "word" character
class, this is identical to the expression "[[:word:]]".
</P>
<P>"\W" matches any single character that is not a member of the "word" character
class, this is identical to the expression "[^[:word:]]".
</P>
<P>"\&lt;" matches the null string at the start of a word.
</P>
<P>"\&gt;" matches the null string at the end of the word.
</P>
<P>"\b" matches the null string at either the start or the end of a word.
</P>
<P>"\B" matches a null string within a word.
</P>
<P>The start of the sequence passed to the matching algorithms is considered to be
a potential start of a word unless the flag match_not_bow is set. The end of
the sequence passed to the matching algorithms is considered to be a potential
end of a word unless the flag match_not_eow is set.
</P>
<H3>Buffer operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library, and Perl regular expressions:
</P>
<P>"\`" matches the start of a buffer.
</P>
<P>"\A" matches the start of the buffer.
</P>
<P>"\'" matches the end of a buffer.
</P>
<P>"\z" matches the end of a buffer.
</P>
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
followed by the end of the buffer.
</P>
<P>A buffer is considered to consist of the whole sequence passed to the matching
algorithms, unless the flags match_not_bob or match_not_eob are set.
</P>
<H3>Escape operator
</H3>
<P>The escape character "\" has several meanings.
</P>
<P>Inside a set declaration the escape character is a normal character unless the
flag regex_constants::escape_in_lists is set in which case whatever follows the
escape is a literal character regardless of its normal meaning.
</P>
<P>The escape operator may introduce an operator for example: back references, or
a word operator.
</P>
<P>The escape operator may make the following character normal, for example "\*"
represents a literal "*" rather than the repeat operator.
</P>
<H4>Single character escape sequences
</H4>
<P>The following escape sequences are aliases for single characters:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="33%">Escape sequence
</TD>
<TD vAlign="top" width="33%">Character code
</TD>
<TD vAlign="top" width="33%">Meaning
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\a
</TD>
<TD vAlign="top" width="33%">0x07
</TD>
<TD vAlign="top" width="33%">Bell character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\f
</TD>
<TD vAlign="top" width="33%">0x0C
</TD>
<TD vAlign="top" width="33%">Form feed.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\n
</TD>
<TD vAlign="top" width="33%">0x0A
</TD>
<TD vAlign="top" width="33%">Newline character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\r
</TD>
<TD vAlign="top" width="33%">0x0D
</TD>
<TD vAlign="top" width="33%">Carriage return.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\t
</TD>
<TD vAlign="top" width="33%">0x09
</TD>
<TD vAlign="top" width="33%">Tab character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\v
</TD>
<TD vAlign="top" width="33%">0x0B
</TD>
<TD vAlign="top" width="33%">Vertical tab.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\e
</TD>
<TD vAlign="top" width="33%">0x1B
</TD>
<TD vAlign="top" width="33%">ASCII Escape character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\0dd
</TD>
<TD vAlign="top" width="33%">0dd
</TD>
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
more octal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\xXX
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\x{XX}
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits, optionally a Unicode character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\cZ
</TD>
<TD vAlign="top" width="33%">Z-@
</TD>
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
ASCII character greater than or equal to the character code for '@'.
</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
<H4>Miscellaneous escape sequences:
</H4>
<P>The following are provided mostly for perl compatibility, but note that there
are some differences in the meanings of \l \L \u and \U:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\w
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\W
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\s
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\S
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\d
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\D
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\l
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\L
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\u
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\U
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\C
</TD>
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\X
</TD>
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
example "a\x 0301" (a letter a with an acute).
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\Q
</TD>
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
treated as a literal character until a \E end quote operator is found.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\E
</TD>
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
with \Q.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
</TABLE>
</P>
<H3>What gets matched?
</H3>
<P>
When the expression is compiled as a Perl-compatible regex then the matching
algorithms will perform a depth first search on the state machine and report
the first match found.</P>
<P>
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string, if more than one
string starting at a given location can match then it matches the longest
possible string, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match - but is only useful if the user is less concerned
about what matched - for example it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, then the match
chosen is the one with the longest first sub-expression, if that is the same
for two or more matches, then the second sub-expression will be examined and so
on.
</P><P>
The following examples illustrate the main differences between Perl and
POSIX regular expression matching rules:
</P>
<P>
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
<TBODY>
<TR>
<TD vAlign="top" width="25%">
<P>Expression</P>
</TD>
<TD vAlign="top" width="25%">
<P>Text</P>
</TD>
<TD vAlign="top" width="25%">
<P>POSIX leftmost longest match</P>
</TD>
<TD vAlign="top" width="25%">
<P>ECMAScript depth first search match</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>a|ab</CODE></P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
xaby</CODE>
</P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
"ab"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"a"</CODE></P></TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*([[:alnum:]]+).*</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
" abc def xyz "</CODE></P></TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "abc"</P>
</TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "z"</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*(a|xayy)</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
zzxayyzz</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"zzxayy"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>"zzxa"</CODE></P>
</TD>
</TR>
</TBODY></TABLE>
</P>
<P>These differences between Perl matching rules, and POSIX matching rules, mean
that these two regular expression syntaxes differ not only in the features
offered, but also in the form that the state machine takes and/or the
algorithms used to traverse the state machine.</p>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

332
doc/syntax_option_type.html Normal file
View File

@ -0,0 +1,332 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: syntax_option_type</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">syntax_option_type</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted.&nbsp; For convenience,
all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
<PRE>namespace std{ namespace regex_constants{
typedef bitmask_type syntax_option_type;
// these flags are standardized:
static const syntax_option_type normal;
static const syntax_option_type icase;
static const syntax_option_type nosubs;
static const syntax_option_type optimize;
static const syntax_option_type collate;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type basic;
static const syntax_option_type extended;
static const syntax_option_type awk;
static const syntax_option_type grep;
static const syntax_option_type egrep;
static const syntax_option_type sed = basic;
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
} // namespace regex_constants
} // namespace std</PRE>
<H3>Description</H3>
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
<P>Note that for convenience all the constants listed here are duplicated within
the scope of class template basic_regex, so you can use any of:</P>
<PRE>boost::regex_constants::constant_name</PRE>
<P>or</P>
<PRE>boost::regex::constant_name</PRE>
<P>or</P>
<PRE>boost::wregex::constant_name</PRE>
<P>in an interchangeable manner.</P>
<P>
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="316">
<P>Element</P>
</TD>
<TD vAlign="top" width="50%">
<P>Effect if set</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>normal</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine uses its
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
(FWD.1).</P>
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>icase</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that matching of regular expressions against a character container
sequence shall be performed without regard to case.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>nosubs</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that when a regular expression is matched against a character
container sequence, then no sub-expression matches are to be stored in the
supplied match_results structure.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>optimize</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the regular expression engine should pay more attention to the
speed with which regular expressions are matched, and less to the speed with
which regular expression objects are constructed. Otherwise it has no
detectable effect on the program output.&nbsp; This currently has no effect for
Boost.Regex.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>collate</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>ECMAScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JavaScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>basic</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
Portable Operating System Interface (POSIX), Base Definitions and Headers,
Section 9, Regular Expressions (FWD.1).
</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>extended</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX extended regular expressions in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and
Headers, Section 9, Regular Expressions (FWD.1).</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>awk</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX), Shells and Utilities, Section 4, awk
(FWD.1).</P>
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
character classes permitted.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>grep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX), Shells and Utilities, Section 4,
Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX basic syntax, but with the newline character
acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>egrep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep when given the -E option in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX), Shells and
Utilities, Section 4, Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX extended syntax, but with the newline
character acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>sed</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as basic.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>perl</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
</TABLE>
</P>
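<P>The effect of a few of these options can be sketched as follows. This is an
illustration only: it assumes <CODE>std::regex</CODE> as a stand-in for
<CODE>boost::regex</CODE>, and the helpers <CODE>full_match</CODE> and
<CODE>option_demo</CODE> are hypothetical names introduced for the example.</P>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here.

// Compiles `pattern` with the given options and tests it against all of `text`.
inline bool full_match(const std::string& text,
                       const std::string& pattern,
                       std::regex::flag_type options)
{
    const std::regex re(pattern, options);
    return std::regex_match(text, re);
}

// Hypothetical demonstration: returns true when every check below holds.
inline bool option_demo()
{
    using std::regex;
    return full_match("HELLO", "hello", regex::ECMAScript | regex::icase) // icase ignores case
        && !full_match("HELLO", "hello", regex::ECMAScript)               // default is case sensitive
        && full_match("aaa", "a+", regex::extended)                       // "+" repeats in extended
        && full_match("a+", "a+", regex::basic);                          // "+" is literal in basic
}
```

<P>Note that exactly one grammar element is combined with the modifier
<CODE>icase</CODE>, in line with the rule stated above the table.</P>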
<P>The following constants are specific to this particular regular expression
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>:</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
characters, for example [\]] represents the set of characters containing only
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::char_classes</TD>
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
are allowed inside character set declarations, for example "[[:word:]]"
represents the set of all characters that belong to the character class "word".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::intervals</TD>
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
a's.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
ordinary characters in all situations.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
has the same effect as the alternation operator "|".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
more repetition operator and "\?" represents the zero or one repetition
operator. When this bit is not set then "+" and "?" are used instead.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
bounded repetitions and "{" and "}" are normal characters. This is the opposite
of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
<TD vAlign="top" width="45%">When this bit is set then back references are
allowed.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
alternation operator and "|" is an ordinary character. This is the opposite of
default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::use_except</TD>
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
exception will be thrown on error.&nbsp; Use of this flag is deprecated -
basic_regex will always throw on error.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::failbit</TD>
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see if a regular expression is
valid before usage.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::literal</TD>
<TD vAlign="top" width="45%">All characters in the string are treated as literals;
there are no special characters or escape sequences.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
</TR>
</TABLE>
</P>
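<P>The backslashed operator style that the <CODE>bk_</CODE> flags describe can be
illustrated with the POSIX basic grammar. The sketch below is an analogy only: it assumes
the standard <CODE>std::regex::basic</CODE> option rather than the
<CODE>regbase</CODE> flags themselves, which are specific to this implementation, and the
helper names are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Assumption: the standard std::regex::basic grammar is used as an analogue;
// the regbase::bk_* flags themselves are not used here.
inline bool basic_match(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern, std::regex::basic);
    return std::regex_match(text, re);
}

// Hypothetical demonstration: returns true when every check below holds.
inline bool bk_style_demo()
{
    return basic_match("abab", "\\(ab\\)\\{2\\}") // "\(" "\)" group, "\{" "\}" bound a repeat
        && basic_match("(ab)", "(ab)")             // plain "(" and ")" are ordinary characters
        && basic_match("aa", "\\(a\\)\\1");        // "\1" refers back to the first group
}
```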
<HR>
<P>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

doc/thread_safety.html Normal file

@ -0,0 +1,68 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Thread Safety</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Thread Safety</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>Class <A href="basic_regex.html">basic_regex</A>&lt;&gt; and its typedefs regex
and wregex are thread safe, in that compiled regular expressions can safely be
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
is now thread safe, in that the results of a match can be safely copied from
one thread to another (for example one thread may find matches and push
match_results instances onto a queue, while another thread pops them off the
other end). Otherwise, use a separate instance of <A href="match_results.html">match_results</A>
per thread.
</P>
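<P>The sharing pattern described above can be sketched as follows: one compiled expression
is read by several threads, while each thread keeps its own iterators and match results.
The sketch assumes <CODE>std::regex</CODE> and <CODE>std::thread</CODE> as stand-ins for
<CODE>boost::regex</CODE> and the platform threading API; <CODE>count_in_parallel</CODE>
is a hypothetical helper.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Assumption: std::regex and std::thread stand in for boost::regex and the
// platform threading API.
// Counts the matches in each text on its own thread, sharing one compiled
// expression between all of the threads.
inline std::vector<std::size_t> count_in_parallel(const std::regex& re,
                                                  const std::vector<std::string>& texts)
{
    std::vector<std::size_t> counts(texts.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < texts.size(); ++i) {
        workers.emplace_back([&re, &texts, &counts, i] {
            // The iterators (and the match results they hold) are local to this thread.
            std::sregex_iterator it(texts[i].begin(), texts[i].end(), re);
            std::sregex_iterator end;
            for (; it != end; ++it) ++counts[i];   // each thread writes only its own slot
        });
    }
    for (std::thread& w : workers) w.join();
    return counts;
}
```

<P>The expression is only ever accessed through a const reference, so no locking is
used in this sketch.</P>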
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
thread safe; regular expressions compiled with <I>regcomp</I> can also be
shared between threads.
</P>
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
gets its own RegEx instance (apartment threading) - this is a consequence of
RegEx handling both compiling and matching regular expressions.
</P>
<P>Finally, note that changing the global locale invalidates all compiled regular
expressions; therefore calling <I>set_locale</I> from one thread while another
uses regular expressions <I>will</I> produce unpredictable results.
</P>
<P>
There is also a requirement that there is only one thread executing prior to
the start of main().</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

BIN
doc/uarrow.gif Normal file

Binary file not shown.


doc/vc71-performance.html Normal file

@ -0,0 +1,705 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Regular Expression Performance Comparison (Visual Studio.NET 2003)</title>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
<META content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot" name="Template">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
</head>
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
<h2>Regular Expression Performance Comparison</h2>
<p>The following tables provide comparisons between the following regular
expression libraries:</p>
<p><a href="http://research.microsoft.com/projects/greta"> GRETA</a>.</p>
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
- this is provided for comparison as a typical non-backtracking implementation.</p>
<p>Philip Hazel's <a href="http://www.pcre.org">PCRE</a> library.</p>
<h3>Details</h3>
<p>Machine: Intel Pentium 4 2.8GHz PC.</p>
<p>Compiler: Microsoft Visual C++ version 7.1.</p>
<p>C++ Standard Library: Dinkumware standard library version 313.</p>
<p>OS: Win32.</p>
<p>Boost version: 1.31.0.</p>
<p>PCRE version: 3.9.</p>
<p>As ever, care should be taken in interpreting the results. Only sensible regular
expressions (rather than pathological cases) are given; most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms.&nbsp; In each case the first figure given is the time taken
relative to the fastest library in that row (so a value of 1.0 is as good as it
gets), while the second figure is the actual time taken.</p>
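<p>As a sketch of how the two figures relate, the block below assumes that each relative
figure is the actual time divided by the fastest actual time in the same row. That
assumption is consistent with the tables on this page, although the published times are
rounded, so recomputed ratios may differ in the last digit. The function name
<code>relative_times</code> is hypothetical.</p>

```cpp
#include <algorithm>
#include <vector>

// Assumption: relative score = actual time / fastest actual time in the row.
// Divides each absolute time by the smallest one, so the best entry scores 1.0.
inline std::vector<double> relative_times(const std::vector<double>& seconds)
{
    std::vector<double> result;
    if (seconds.empty()) return result;
    const double best = *std::min_element(seconds.begin(), seconds.end());
    result.reserve(seconds.size());
    for (double s : seconds) result.push_back(s / best);
    return result;
}
```

<p>For example, passing the six actual times from the first row of Comparison 1 (0.541,
2.35, 0.0851, 0.0851, 3.6 and 0.0275 seconds) yields ratios that round to the relative
figures shown in that row.</p>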
<h3>Averages</h3>
<p>The following are the average relative scores for all the tests. The perfect
regular expression library&nbsp;would score 1; in practice anything less than 2
is pretty good.</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td>6.90669</td>
<td>23.751</td>
<td>1.62553</td>
<td>1.38213</td>
<td>110.973</td>
<td>1.69371</td>
</tr>
</table>
<br>
<br>
<h3>Comparison 1: Long Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within a long English language text was measured
(<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a>
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19MB).&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>Twain</code></td>
<td>19.7<br>
(0.541s)</td>
<td>85.5<br>
(2.35s)</td>
<td>3.09<br>
(0.0851s)</td>
<td>3.09<br>
(0.0851s)</td>
<td>131<br>
(3.6s)</td>
<td><font color="#008000">1<br>
(0.0275s)</font></td>
</tr>
<tr>
<td><code>Huck[[:alpha:]]+</code></td>
<td>11<br>
(0.55s)</td>
<td>93.4<br>
(4.68s)</td>
<td>3.4<br>
(0.17s)</td>
<td>3.35<br>
(0.168s)</td>
<td>124<br>
(6.19s)</td>
<td><font color="#008000">1<br>
(0.0501s)</font></td>
</tr>
<tr>
<td><code>[[:alpha:]]+ing</code></td>
<td>11.3<br>
(6.82s)</td>
<td>21.3<br>
(12.8s)</td>
<td>1.83<br>
(1.1s)</td>
<td><font color="#008000">1<br>
(0.601s)</font></td>
<td>6.47<br>
(3.89s)</td>
<td>4.75<br>
(2.85s)</td>
</tr>
<tr>
<td><code>^[^ ]*?Twain</code></td>
<td>5.75<br>
(1.15s)</td>
<td>17.1<br>
(3.43s)</td>
<td><font color="#008000">1<br>
(0.2s)</font></td>
<td>1.3<br>
(0.26s)</td>
<td>NA</td>
<td>3.8<br>
(0.761s)</td>
</tr>
<tr>
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
<td>28.5<br>
(3.1s)</td>
<td>77.2<br>
(8.4s)</td>
<td>2.3<br>
(0.251s)</td>
<td><font color="#008000">1<br>
(0.109s)</font></td>
<td>191<br>
(20.8s)</td>
<td>1.77<br>
(0.193s)</td>
</tr>
<tr>
<td><code>(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
<td>16.2<br>
(4.14s)</td>
<td>49<br>
(12.5s)</td>
<td>1.65<br>
(0.42s)</td>
<td><font color="#008000">1<br>
(0.255s)</font></td>
<td>NA</td>
<td>2.43<br>
(0.62s)</td>
</tr>
</table>
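<p>A "find all occurrences" measurement of this kind might be sketched as below. This is
not the harness used to produce the figures above: it assumes <code>std::regex</code> as a
stand-in for the libraries measured, and <code>SearchResult</code> and
<code>timed_find_all</code> are hypothetical names.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Assumption: std::regex stands in for the libraries measured; the timing is
// illustrative only.
struct SearchResult {
    std::size_t matches;   // number of non-overlapping occurrences found
    double seconds;        // wall-clock time spent searching
};

inline SearchResult timed_find_all(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);   // compile once, outside the timed region
    const auto start = std::chrono::steady_clock::now();
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;      // default-constructed end-of-sequence iterator
    const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return SearchResult{n, elapsed.count()};
}
```

<p>A realistic measurement would repeat the search many times and report the best or mean
time, since a single run over a short text is dominated by timer resolution.</p>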
<br>
<br>
<h3>Comparison 2: Medium Sized Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within a medium sized English language text was
measured (the first 50K from mtent12.txt).&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>Twain</code></td>
<td>9.49<br>
(0.00274s)</td>
<td>40.7<br>
(0.0117s)</td>
<td>1.54<br>
(0.000445s)</td>
<td>1.56<br>
(0.00045s)</td>
<td>13.5<br>
(0.00391s)</td>
<td><font color="#008000">1<br>
(0.000289s)</font></td>
</tr>
<tr>
<td><code>Huck[[:alpha:]]+</code></td>
<td>14.3<br>
(0.0027s)</td>
<td>62.3<br>
(0.0117s)</td>
<td>2.26<br>
(0.000425s)</td>
<td>2.29<br>
(0.000431s)</td>
<td>1.27<br>
(0.000239s)</td>
<td><font color="#008000">1<br>
(0.000188s)</font></td>
</tr>
<tr>
<td><code>[[:alpha:]]+ing</code></td>
<td>7.34<br>
(0.0178s)</td>
<td>13.7<br>
(0.0331s)</td>
<td><font color="#008000">1<br>
(0.00243s)</font></td>
<td><font color="#008000">1.02<br>
(0.00246s)</font></td>
<td>7.36<br>
(0.0178s)</td>
<td>5.87<br>
(0.0142s)</td>
</tr>
<tr>
<td><code>^[^ ]*?Twain</code></td>
<td>8.34<br>
(0.00579s)</td>
<td>24.8<br>
(0.0172s)</td>
<td>1.52<br>
(0.00105s)</td>
<td><font color="#008000">1<br>
(0.000694s)</font></td>
<td>NA</td>
<td>2.81<br>
(0.00195s)</td>
</tr>
<tr>
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
<td>12.9<br>
(0.00781s)</td>
<td>35.1<br>
(0.0213s)</td>
<td>1.67<br>
(0.00102s)</td>
<td><font color="#008000">1<br>
(0.000606s)</font></td>
<td>81.5<br>
(0.0494s)</td>
<td>1.94<br>
(0.00117s)</td>
</tr>
<tr>
<td><code>(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
<td>15.6<br>
(0.0106s)</td>
<td>46.6<br>
(0.0319s)</td>
<td>2.72<br>
(0.00186s)</td>
<td><font color="#008000">1<br>
(0.000684s)</font></td>
<td>311<br>
(0.213s)</td>
<td>1.72<br>
(0.00117s)</td>
</tr>
</table>
<br>
<br>
<h3>Comparison 3:&nbsp;C++ Code&nbsp;Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within the C++ source file <a href="../../../boost/crc.hpp">
boost/crc.hpp</a>&nbsp;was measured.&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>^(template[[:space:]]*&lt;[^;:{]+&gt;[[:space:]]*)?(class|struct)[[:space:]]*(\&lt;\w+\&gt;([
]*\([^)]*\))?[[:space:]]*)*(\&lt;\w*\&gt;)[[:space:]]*(&lt;[^;:{]+&gt;[[:space:]]*)?(\{|:[^;\{()]*\{)</code></td>
<td>8.88<br>
(0.000792s)</td>
<td>46.4<br>
(0.00414s)</td>
<td>1.19<br>
(0.000106s)</td>
<td><font color="#008000">1<br>
(8.92e-005s)</font></td>
<td>688<br>
(0.0614s)</td>
<td>3.23<br>
(0.000288s)</td>
</tr>
<tr>
<td><code>(^[
]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\&lt;([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\&gt;|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\&lt;(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\&gt;</code></td>
<td><font color="#008000">1<br>
(0.00571s)</font></td>
<td>5.31<br>
(0.0303s)</td>
<td>2.47<br>
(0.0141s)</td>
<td>1.92<br>
(0.011s)</td>
<td>NA</td>
<td>3.29<br>
(0.0188s)</td>
</tr>
<tr>
<td><code>^[ ]*#[ ]*include[ ]+("[^"]+"|&lt;[^&gt;]+&gt;)</code></td>
<td>5.78<br>
(0.00172s)</td>
<td>26.3<br>
(0.00783s)</td>
<td>1.12<br>
(0.000333s)</td>
<td><font color="#008000">1<br>
(0.000298s)</font></td>
<td>128<br>
(0.0382s)</td>
<td>1.74<br>
(0.000518s)</td>
</tr>
<tr>
<td><code>^[ ]*#[ ]*include[ ]+("boost/[^"]+"|&lt;boost/[^&gt;]+&gt;)</code></td>
<td>10.2<br>
(0.00305s)</td>
<td>28.4<br>
(0.00845s)</td>
<td>1.12<br>
(0.000333s)</td>
<td><font color="#008000">1<br>
(0.000298s)</font></td>
<td>155<br>
(0.0463s)</td>
<td>1.74<br>
(0.000519s)</td>
</tr>
</table>
<br>
<H3>Comparison 4: HTML Document Search
</H3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within the html file <a href="../../libraries.htm">libs/libraries.htm</a>
was measured.&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>beman|john|dave</code></td>
<td>11<br>
(0.00297s)</td>
<td>34.3<br>
(0.00922s)</td>
<td>1.78<br>
(0.000479s)</td>
<td><font color="#008000">1<br>
(0.000269s)</font></td>
<td>55.2<br>
(0.0149s)</td>
<td>1.85<br>
(0.000499s)</td>
</tr>
<tr>
<td><code>&lt;p&gt;.*?&lt;/p&gt;</code></td>
<td>5.38<br>
(0.00145s)</td>
<td>21.8<br>
(0.00587s)</td>
<td><font color="#008000">1.02<br>
(0.000274s)</font></td>
<td><font color="#008000">1<br>
(0.000269s)</font></td>
<td>NA</td>
<td><font color="#008000">1.05<br>
(0.000283s)</font></td>
</tr>
<tr>
<td><code>&lt;a[^&gt;]+href=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;</code></td>
<td>4.51<br>
(0.00207s)</td>
<td>12.6<br>
(0.00579s)</td>
<td>1.34<br>
(0.000616s)</td>
<td><font color="#008000">1<br>
(0.000459s)</font></td>
<td>343<br>
(0.158s)</td>
<td><font color="#008000">1.09<br>
(0.000499s)</font></td>
</tr>
<tr>
<td><code>&lt;h[12345678][^&gt;]*&gt;.*?&lt;/h[12345678]&gt;</code></td>
<td>7.39<br>
(0.00143s)</td>
<td>29.6<br>
(0.00571s)</td>
<td>1.87<br>
(0.000362s)</td>
<td><font color="#008000">1<br>
(0.000193s)</font></td>
<td>NA</td>
<td>1.27<br>
(0.000245s)</td>
</tr>
<tr>
<td><code>&lt;img[^&gt;]+src=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;</code></td>
<td>6.73<br>
(0.00145s)</td>
<td>27.3<br>
(0.00587s)</td>
<td>1.2<br>
(0.000259s)</td>
<td>1.32<br>
(0.000283s)</td>
<td>148<br>
(0.0319s)</td>
<td><font color="#008000">1<br>
(0.000215s)</font></td>
</tr>
<tr>
<td><code>&lt;font[^&gt;]+face=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;.*?&lt;/font&gt;</code></td>
<td>6.93<br>
(0.00153s)</td>
<td>27<br>
(0.00595s)</td>
<td>1.22<br>
(0.000269s)</td>
<td>1.31<br>
(0.000289s)</td>
<td>NA</td>
<td><font color="#008000">1<br>
(0.00022s)</font></td>
</tr>
</table>
<br>
<br>
<h3>Comparison 5: Simple Matches</h3>
<p>For each of the following regular expressions the time taken to match against
the text indicated was measured.&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>Text</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>abc</code></td>
<td>abc</td>
<td>1.31<br>
(2.2e-007s)</td>
<td>1.94<br>
(3.25e-007s)</td>
<td>1.26<br>
(2.1e-007s)</td>
<td>1.24<br>
(2.08e-007s)</td>
<td>3.03<br>
(5.06e-007s)</td>
<td><font color="#008000">1<br>
(1.67e-007s)</font></td>
</tr>
<tr>
<td><code>^([0-9]+)(\-| |$)(.*)$</code></td>
<td>100- this is a line of ftp response which contains a message string</td>
<td>1.52<br>
(6.88e-007s)</td>
<td>2.28<br>
(1.03e-006s)</td>
<td>1.5<br>
(6.78e-007s)</td>
<td>1.5<br>
(6.78e-007s)</td>
<td>329<br>
(0.000149s)</td>
<td><font color="#008000">1<br>
(4.53e-007s)</font></td>
</tr>
<tr>
<td><code>([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td>
<td>1234-5678-1234-456</td>
<td>2.04<br>
(1.03e-006s)</td>
<td>2.83<br>
(1.43e-006s)</td>
<td>2.12<br>
(1.07e-006s)</td>
<td>2.04<br>
(1.03e-006s)</td>
<td>30.8<br>
(1.56e-005s)</td>
<td><font color="#008000">1<br>
(5.05e-007s)</font></td>
</tr>
<tr>
<td><code>^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
<td>john_maddock@compuserve.com</td>
<td>1.48<br>
(1.78e-006s)</td>
<td>2.1<br>
(2.52e-006s)</td>
<td>1.35<br>
(1.62e-006s)</td>
<td>1.32<br>
(1.59e-006s)</td>
<td>165<br>
(0.000198s)</td>
<td><font color="#008000">1<br>
(1.2e-006s)</font></td>
</tr>
<tr>
<td><code>^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
<td>foo12@foo.edu</td>
<td>1.28<br>
(1.41e-006s)</td>
<td>1.9<br>
(2.1e-006s)</td>
<td>1.42<br>
(1.57e-006s)</td>
<td>1.38<br>
(1.53e-006s)</td>
<td>107<br>
(0.000119s)</td>
<td><font color="#008000">1<br>
(1.11e-006s)</font></td>
</tr>
<tr>
<td><code>^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
<td>bob.smith@foo.tv</td>
<td>1.29<br>
(1.43e-006s)</td>
<td>1.9<br>
(2.1e-006s)</td>
<td>1.42<br>
(1.57e-006s)</td>
<td>1.38<br>
(1.53e-006s)</td>
<td>119<br>
(0.000132s)</td>
<td><font color="#008000">1<br>
(1.11e-006s)</font></td>
</tr>
<tr>
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
<td>EH10 2QQ</td>
<td>1.26<br>
(4.63e-007s)</td>
<td>1.77<br>
(6.49e-007s)</td>
<td>1.3<br>
(4.77e-007s)</td>
<td>1.2<br>
(4.4e-007s)</td>
<td>9.15<br>
(3.36e-006s)</td>
<td><font color="#008000">1<br>
(3.68e-007s)</font></td>
</tr>
<tr>
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
<td>G1 1AA</td>
<td><font color="#008000">1.06<br>
(4.73e-007s)</font></td>
<td>1.59<br>
(7.07e-007s)</td>
<td><font color="#008000">1.05<br>
(4.68e-007s)</font></td>
<td><font color="#008000">1<br>
(4.44e-007s)</font></td>
<td>12.9<br>
(5.73e-006s)</td>
<td>1.63<br>
(7.26e-007s)</td>
</tr>
<tr>
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
<td>SW1 1ZZ</td>
<td>1.26<br>
(9.17e-007s)</td>
<td>1.84<br>
(1.34e-006s)</td>
<td>1.28<br>
(9.26e-007s)</td>
<td>1.21<br>
(8.78e-007s)</td>
<td>8.42<br>
(6.11e-006s)</td>
<td><font color="#008000">1<br>
(7.26e-007s)</font></td>
</tr>
<tr>
<td><code>^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
<td>4/1/2001</td>
<td>1.57<br>
(9.73e-007s)</td>
<td>2.28<br>
(1.41e-006s)</td>
<td>1.25<br>
(7.73e-007s)</td>
<td>1.26<br>
(7.83e-007s)</td>
<td>11.2<br>
(6.95e-006s)</td>
<td><font color="#008000">1<br>
(6.21e-007s)</font></td>
</tr>
<tr>
<td><code>^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
<td>12/12/2001</td>
<td>1.52<br>
(9.56e-007s)</td>
<td>2.06<br>
(1.3e-006s)</td>
<td>1.29<br>
(8.12e-007s)</td>
<td>1.24<br>
(7.83e-007s)</td>
<td>12.4<br>
(7.8e-006s)</td>
<td><font color="#008000">1<br>
(6.3e-007s)</font></td>
</tr>
<tr>
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
<td>123</td>
<td>2.11<br>
(7.35e-007s)</td>
<td>3.18<br>
(1.11e-006s)</td>
<td>2.5<br>
(8.7e-007s)</td>
<td>2.44<br>
(8.5e-007s)</td>
<td>5.26<br>
(1.83e-006s)</td>
<td><font color="#008000">1<br>
(3.49e-007s)</font></td>
</tr>
<tr>
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
<td>+3.14159</td>
<td>1.31<br>
(4.96e-007s)</td>
<td>1.92<br>
(7.26e-007s)</td>
<td>1.26<br>
(4.77e-007s)</td>
<td>1.2<br>
(4.53e-007s)</td>
<td>9.71<br>
(3.66e-006s)</td>
<td><font color="#008000">1<br>
(3.77e-007s)</font></td>
</tr>
<tr>
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
<td>-3.14159</td>
<td>1.32<br>
(4.97e-007s)</td>
<td>1.92<br>
(7.26e-007s)</td>
<td>1.24<br>
(4.67e-007s)</td>
<td>1.2<br>
(4.53e-007s)</td>
<td>9.7<br>
(3.66e-006s)</td>
<td><font color="#008000">1<br>
(3.78e-007s)</font></td>
</tr>
</table>
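<p>The operation timed in this comparison is a whole-text match of one expression against
one short string. A minimal sketch follows, assuming <code>std::regex</code> with its
default ECMAScript grammar as a stand-in for the libraries measured;
<code>MatchCase</code> and <code>count_full_matches</code> are hypothetical names.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One row of the table: an expression and the text it is matched against.
struct MatchCase {
    std::string pattern;
    std::string text;
};

// Assumption: std::regex with its default ECMAScript grammar stands in for the
// libraries measured. Returns how many cases match their text in full.
inline std::size_t count_full_matches(const std::vector<MatchCase>& cases)
{
    std::size_t matched = 0;
    for (const MatchCase& c : cases) {
        const std::regex re(c.pattern);
        if (std::regex_match(c.text, re)) ++matched;
    }
    return matched;
}
```

<p>Feeding it rows from the table, such as the postcode expression with "EH10 2QQ" or the
date expression with "4/1/2001", should count each as a full match under these
assumptions.</p>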
<br>
<br>
<hr>
<p>Copyright John Maddock April 2003, all rights reserved.</p>
</body>
</html>