Merged regex-4 branch.

[SVN r18431]
John Maddock
2003-05-17 11:55:51 +00:00
parent f0f32bdda1
commit 1f15026060
42 changed files with 7254 additions and 7501 deletions


79
doc/Attic/standards.html Normal file

@ -0,0 +1,79 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Standards Conformance</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Standards Conformance</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>C++</H3>
<P>Boost.Regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>, which will appear in a
future C++ standard technical report (and hopefully in a future version of the
standard).&nbsp; Currently there are some differences in how the regular
expression traits classes are defined; these will be fixed in a future release.</P>
<H3>ECMAScript / JavaScript</H3>
<P>All of the ECMAScript regular expression syntax features are supported, except
that:</P>
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
definitions ( [...] ).</P>
<P>The escape sequence \u matches any upper case character (the same as
[[:upper:]])&nbsp;rather than a Unicode escape sequence; use \x{DDDD} for
Unicode escape sequences.</P>
<H3>Perl</H3>
<P>Almost all Perl features are supported, except for:</P>
<P>\N{name}&nbsp; Use [[:name:]] instead.</P>
<P>\pP and \PP</P>
<P>(?imsx-imsx)</P>
<P>(?&lt;=pattern)</P>
<P>(?&lt;!pattern)</P>
<P>(?{code})</P>
<P>(??{code})</P>
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
<P>These embarrassments / limitations will be removed in due course, mainly
dependent upon user demand.</P>
<H3>POSIX</H3>
<P>All the POSIX basic and extended regular expression features are supported,
except that:</P>
<P>No character collating names are recognized except those specified in the POSIX
standard for the C locale, unless they are explicitly registered with the
traits class.</P>
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
Win32.&nbsp; Implementing this feature requires knowledge of the format of the
string sort keys produced by the system; if you need this, and the default
implementation doesn't work on your platform, then you will need to supply a
custom traits class.</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

426
doc/Attic/sub_match.html Normal file

@ -0,0 +1,426 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: sub_match</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">sub_match</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>#include &lt;<A href="../../boost/regex.hpp">boost/regex.hpp</A>&gt;
</P>
<P>Regular expressions are different from many simple pattern-matching algorithms
in that, as well as finding an overall match, they can also produce
sub-expression matches: each sub-expression being delimited in the pattern by a
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.htm">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
<P>When the marked sub-expression denoted by an object of type sub_match&lt;&gt;
participated in a regular expression match then member <CODE>matched</CODE> evaluates
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
range of characters <CODE>[first,second)</CODE> which formed that match.
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
contain undefined values.</P>
<P>If an object of type <CODE>sub_match&lt;&gt;</CODE> represents sub-expression 0
- that is to say the whole match - then member <CODE>matched</CODE> is always
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
character range that formed the partial match.</P>
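<P>The behaviour of <CODE>matched</CODE> described above can be sketched as follows. This is a
minimal illustration only: it uses <CODE>std::regex</CODE> and <CODE>std::smatch</CODE> from the C++11
standard library, on the assumption that they behave like their <CODE>boost::</CODE> counterparts
for these calls, and the helper name <CODE>participated</CODE> is hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not part of the library): reports whether
// sub-expression `index` took part in a search of `text` for `pattern`.
bool participated(const std::string& pattern, const std::string& text, std::size_t index)
{
    std::regex re(pattern);
    std::smatch what;
    if (!std::regex_search(text, what, re))
        return false;              // no overall match at all
    return what[index].matched;    // false for an alternative that was not taken
}
```

<P>For example, searching "b" with "(a)|(b)" should leave sub-expression 1 unmatched while
sub-expressions 0 and 2 are matched.</P>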
<PRE>
namespace boost{
template &lt;class BidirectionalIterator&gt;
class sub_match : public std::pair&lt;BidirectionalIterator, BidirectionalIterator&gt;
{
public:
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::value_type value_type;
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::difference_type difference_type;
typedef BidirectionalIterator iterator;
bool matched;
difference_type length()const;
operator basic_string&lt;value_type&gt;()const;
basic_string&lt;value_type&gt; str()const;
int compare(const sub_match&amp; s)const;
int compare(const basic_string&lt;value_type&gt;&amp; s)const;
int compare(const value_type* s)const;
};
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);
} // namespace boost</PRE>
<H3>Description</H3>
<H4>
sub_match members</H4>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::value_type value_type;</PRE>
<P>The type pointed to by the iterators.</P>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::difference_type difference_type;</PRE>
<P>A type that represents the difference between two iterators.</P>
<PRE>typedef iterator iterator_type;</PRE>
<P>The iterator type.</P>
<PRE>iterator first</PRE>
<P>An iterator denoting the position of the start of the match.</P>
<PRE>iterator second</PRE>
<P>An iterator denoting the position of the end of the match.</P>
<PRE>bool matched</PRE>
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
<PRE>difference_type length()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string&lt;value_type&gt;()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>basic_string&lt;value_type&gt; str()const;</PRE>
<P><B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>int compare(const sub_match&amp; s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string&lt;value_type&gt;&amp; s)const;</PRE>
<P><B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
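<P>A short sketch of the members <CODE>str()</CODE>, <CODE>length()</CODE> and <CODE>compare()</CODE> in use.
It assumes <CODE>std::regex</CODE> (C++11) as a stand-in for <CODE>boost::regex</CODE>; the struct
<CODE>FieldInfo</CODE>, the function <CODE>describe_field</CODE> and the "key=value" pattern are
hypothetical and exist only for illustration.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical summary of one sub_match.
struct FieldInfo {
    std::string text;   // result of str()
    long length;        // result of length()
    int order;          // sign of compare(other): -1, 0 or 1
};

// Finds the first "key=value" pair in `line` and describes the key,
// comparing it against `other`. Returns an empty FieldInfo if nothing matches.
FieldInfo describe_field(const std::string& line, const std::string& other)
{
    static const std::regex re("(\\w+)=(\\w*)");
    std::smatch what;
    FieldInfo info{ "", 0, 0 };
    if (std::regex_search(line, what, re)) {
        const std::ssub_match& key = what[1];
        info.text = key.str();
        info.length = static_cast<long>(key.length());
        const int c = key.compare(other);
        info.order = (c > 0) - (c < 0);   // normalise to -1, 0 or 1
    }
    return info;
}
```

<P>Only the sign of <CODE>compare()</CODE> is meaningful, which is why the sketch normalises it
before storing it.</P>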
<H4>
sub_match non-member operators</H4>
<PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P><B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);</PRE>
<P> <B>
Effects: </B>returns <CODE>(os &lt;&lt; m.str())</CODE>.</P>
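<P>The non-member operators let a sub_match be compared with a string or streamed without an
explicit call to <CODE>str()</CODE>. The sketch below assumes <CODE>std::regex</CODE> (C++11), which
provides the same family of operators, in place of <CODE>boost::regex</CODE>; the helpers
<CODE>first_number</CODE> and <CODE>key_is</CODE> are hypothetical.</P>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the first run of digits from `text`,
// writing the sub_match to a stream with operator<<.
std::string first_number(const std::string& text)
{
    static const std::regex re("(\\d+)");
    std::smatch what;
    std::ostringstream os;
    if (std::regex_search(text, what, re))
        os << what[1];               // equivalent to os << what[1].str()
    return os.str();
}

// Hypothetical helper: compares the key of a "key=value" line with a
// null-terminated string, using operator== on the sub_match directly.
bool key_is(const std::string& line, const char* expected)
{
    static const std::regex re("^(\\w+)=");
    std::smatch what;
    return std::regex_search(line, what, re) && what[1] == expected;
}
```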
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

773
doc/Attic/syntax.html Normal file

@ -0,0 +1,773 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Regular Expression Syntax</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Regular Expression Syntax</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>This section covers the regular expression syntax used by this library. It is
a programmer's guide: the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
</P>
<H3>Literals
</H3>
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
a "\". A literal is a character that matches itself, or matches the result of
traits_type::translate(), where traits_type is the traits template parameter to
class basic_regex.</P>
<H3>Wildcard
</H3>
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
when <I>match_not_dot_newline</I> is passed to the matching algorithms,
the dot does not match a newline character.
</P>
<H3>Repeats
</H3>
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once; if the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only; if the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
</P>
<P>Examples:
</P>
<P>"ba*" will match all of "b", "ba", "baaa" etc.
</P>
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
</P>
<P>"ba?" will match "b" or "ba".
</P>
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
</P>
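<P>The examples above can be checked with a few lines of code. This sketch assumes
<CODE>std::regex</CODE> (C++11) with its default syntax rather than <CODE>boost::regex</CODE>, and the
helper <CODE>matches_all</CODE> is a hypothetical name; it tests whether a pattern matches the
whole of a string.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
bool matches_all(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<P>With this helper, <CODE>matches_all("ba+", "b")</CODE> is expected to be false, while
<CODE>matches_all("ba{2,4}", "baaa")</CODE> is expected to be true.</P>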
<H3>Non-greedy repeats
</H3>
<P>Whenever the "extended" regular expression syntax is in use (the default) then
non-greedy repeats are possible by appending a '?' after the repeat; a
non-greedy repeat is one which will match the <I>shortest</I> possible string.
</P>
<P>For example, to match HTML tag pairs one could use something like:
</P>
<P>"&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;"
</P>
<P>In this case $1 will contain the text between the tag pairs, and will be the
shortest possible matching string.&nbsp;
</P>
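<P>The difference between the greedy and non-greedy forms can be seen by running both over
the same input. The sketch below assumes <CODE>std::regex</CODE> (C++11) in place of
<CODE>boost::regex</CODE> and fixes "tagname" to the tag <CODE>b</CODE>; the helper
<CODE>bold_text</CODE> is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns $1 for the first <b>...</b> pair in `html`,
// using either a non-greedy (.*?) or a greedy (.*) repeat for the tag body.
// Returns an empty string when nothing matches.
std::string bold_text(const std::string& html, bool lazy)
{
    const std::string body = lazy ? "(.*?)" : "(.*)";
    const std::regex re("<\\s*b[^>]*>" + body + "<\\s*/b\\s*>");
    std::smatch what;
    return std::regex_search(html, what, re) ? what[1].str() : std::string();
}
```

<P>Given two tag pairs on one line, the non-greedy form should stop at the first closing tag,
whereas the greedy form runs on to the last one.</P>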
<H3>Parenthesis
</H3>
<P>Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match; on exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match - for example if it is
part of an alternative that is not taken - then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
</P>
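<P>The indexing rules can be seen in a small sketch. It is written against 
std::regex and std::smatch as stand-ins for boost::regex and its match_results, 
on the assumption that both report sub-expressions the same way for these 
patterns; the function name is hypothetical.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

void sub_expression_demo() {
    const std::string text = "ababab";
    std::smatch what;
    assert(std::regex_match(text, what, std::regex("(ab)*")));
    assert(what[0].str() == "ababab"); // sub-expression 0 is the whole match
    assert(what[1].str() == "ab");     // sub-expression 1 is the final "ab"
    assert(what.position(1) == 4);     // which starts at offset 4

    const std::string other = "b";
    assert(std::regex_match(other, what, std::regex("(a)|(b)")));
    assert(!what[1].matched);          // the alternative that was not taken
    assert(what[2].matched && what[2].str() == "b");
}
```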
<H3>Non-Marking Parentheses
</H3>
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression; in this case a
non-marking parenthesis (?:expression) can be used. For example the following
expression creates no sub-expressions:
</P>
<P>"(?:abc)*"</P>
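<P>One way to confirm this is to ask the compiled expression how many marked 
sub-expressions it holds. The sketch below uses std::regex::mark_count() and 
assumes boost::regex reports the same counts; the function name is 
hypothetical.</P>

```cpp
#include <cassert>
#include <regex>

void non_marking_demo() {
    assert(std::regex("(?:abc)*").mark_count() == 0u); // grouping only
    assert(std::regex("(abc)*").mark_count() == 1u);   // grouping and marking
    // Grouping still works: the repeat applies to the whole of "abc".
    assert(std::regex_match("abcabc", std::regex("(?:abc)*")));
}
```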
<H3>Forward Lookahead Asserts&nbsp;
</H3>
<P>There are two forms of these; one for positive forward lookahead asserts, and
one for negative lookahead asserts:</P>
<P>"(?=abc)" matches zero characters only if they are followed by the expression
"abc".</P>
<P>"(?!abc)" matches zero characters only if they are not followed by the
expression "abc".</P>
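<P>A short sketch of both forms follows, using std::regex in place of 
boost::regex so that it is self-contained. The patterns and the helper 
matched_text are made up for the illustration.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of the first match of `pattern` in `text`,
// or an empty string when there is no match.
std::string matched_text(const std::string& pattern, const std::string& text) {
    std::smatch what;
    if (!std::regex_search(text, what, std::regex(pattern))) return std::string();
    return what[0].str();
}

void lookahead_demo() {
    // Positive assert: "bar" must follow, but is not part of the match.
    assert(matched_text("foo(?=bar)", "foobar") == "foo");
    // Negative assert: the match is refused when "bar" follows.
    assert(matched_text("foo(?!bar)", "foobar").empty());
    assert(matched_text("foo(?!bar)", "foobaz") == "foo");
}
```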
<H3>Independent sub-expressions</H3>
<P>"(?&gt;expression)" matches "expression" as an independent atom (the algorithm
will not backtrack into it if a failure occurs later in the expression).</P>
<H3>Alternatives
</H3>
<P>Alternatives occur when the expression can match either one sub-expression or
another. Each alternative is separated by a "|", or a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
</P>
<P>Examples:
</P>
<P>"a(b|c)" could match "ab" or "ac".
</P>
<P>"abc|def" could match "abc" or "def".
</P>
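<P>The scope of each alternative can be checked as below. The sketch uses 
std::regex for self-containment and a hypothetical helper alt_matches; the same 
results are expected from boost::regex.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool alt_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

void alternative_demo() {
    assert(alt_matches("a(b|c)", "ab") && alt_matches("a(b|c)", "ac"));
    // "|" takes everything on either side: the choice is "abc" or "def".
    assert(alt_matches("abc|def", "abc") && alt_matches("abc|def", "def"));
    assert(!alt_matches("abc|def", "abdef"));
    // Parentheses narrow the alternative to a single position.
    assert(alt_matches("ab(c|d)ef", "abdef"));
}
```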
<H3>Sets
</H3>
<P>A set is a set of characters that can match any single character that is a
member of the set. Sets are delimited by "[" and "]" and can contain literals,
character ranges, character classes, collating elements and equivalence
classes. Set declarations that start with "^" contain the complement of the
elements that follow.
</P>
<P>Examples:
</P>
<P>Character literals:
</P>
<P>"[abc]" will match either of "a", "b", or "c".
</P>
<P>"[^abc]" will match any character other than "a", "b", or "c".
</P>
<P>Character ranges:
</P>
<P>"[a-z]" will match any character in the range "a" to "z".
</P>
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
</P>
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range, and will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale-specific behavior is disabled by default (in perl mode), so that
ranges collate according to ASCII character code.
</P>
<P>Character classes are denoted using the syntax "[:classname:]" within a set
declaration, for example "[[:space:]]" is the set of all whitespace characters.
Character classes are only available if the flag regex_constants::char_classes
is set. The available character classes are:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="50%">alnum</TD>
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">alpha</TD>
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
characters may also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">blank</TD>
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">cntrl</TD>
<TD vAlign="top" width="50%">Any control character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">digit</TD>
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">graph</TD>
<TD vAlign="top" width="50%">Any graphical character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">lower</TD>
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">print</TD>
<TD vAlign="top" width="50%">Any printable character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">punct</TD>
<TD vAlign="top" width="50%">Any punctuation character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">space</TD>
<TD vAlign="top" width="50%">Any whitespace character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">upper</TD>
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">xdigit</TD>
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">word</TD>
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
the underscore.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">Unicode</TD>
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
applies to the wide character traits classes only.</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
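<P>Literal sets, ranges and character classes can be tried out with a sketch 
like the following. It uses std::regex under the default "C" locale and a 
hypothetical helper set_matches; the "Unicode" and "word" class names in the 
table are not exercised because they are not assumed to exist outside this 
library.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool set_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

void set_demo() {
    assert(set_matches("[abc]", "b") && !set_matches("[abc]", "d"));
    assert(set_matches("[^abc]", "d") && !set_matches("[^abc]", "a"));
    assert(set_matches("[a-z]+", "hello") && !set_matches("[^A-Z]", "Q"));
    // Character classes are written inside the set's own brackets.
    assert(set_matches("[[:space:]]", " "));
    assert(set_matches("[[:digit:]]+", "2003"));
    assert(set_matches("[[:alpha:][:digit:]]+", "regex4"));
    assert(set_matches("[[:xdigit:]]+", "1f15"));
}
```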
<P>There are some shortcuts that can be used in place of the character classes.
Provided the flag regex_constants::escape_in_lists is set, you can use:
</P>
<P>\w in place of [:word:]
</P>
<P>\s in place of [:space:]
</P>
<P>\d in place of [:digit:]
</P>
<P>\l in place of [:lower:]
</P>
<P>\u in place of [:upper:]&nbsp;
</P>
<P>Collating elements take the general form [.tagname.] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
equivalent to [,]. The library supports all the standard POSIX collating
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
"nj", "dz", "lj", each in lower, upper and title case variations.
Multi-character collating elements can result in the set matching more than one
character, for example [[.ae.]] would match two characters, but note that
[^[.ae.]] would only match one character.&nbsp;
</P>
<P>
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, and matches any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentuation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.&nbsp;
</P>
<P>To include a literal "-" in a set declaration, make it the first character
after the opening "[" or "[^", the endpoint of a range, or a collating element;
or, if the flag regex_constants::escape_in_lists is set, precede it with an
escape character as in "[\-]". To include a literal "[", "]" or "^" in a set,
make it the endpoint of a range or a collating element, or precede it with an
escape character if the flag regex_constants::escape_in_lists is set.
</P>
<H3>Line anchors
</H3>
<P>An anchor is something that matches the null string at the start or end of a
line: "^" matches the null string at the start of a line, "$" matches the null
string at the end of a line.
</P>
<H3>Back references
</H3>
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string; note that this is different from some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
</P>
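<P>The point that a back reference repeats the matched text, not the 
sub-expression, can be sketched as follows. std::regex is used here as a 
stand-in for boost::regex and the helper name is hypothetical; the behavior for 
a sub-expression that took no part in the match is deliberately not exercised, 
since implementations differ on it.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool backref_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

void back_reference_demo() {
    assert(backref_matches("(.*)\\1", "abcabc"));
    assert(backref_matches("(.*)\\1", "xyzxyz"));
    assert(!backref_matches("(.*)\\1", "abcabd"));
    // "\1" repeats what "(a|b)" matched, so "ab" is not accepted.
    assert(backref_matches("(a|b)\\1", "aa"));
    assert(!backref_matches("(a|b)\\1", "ab"));
}
```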
<H3>Characters by code
</H3>
<P>This is an extension to the algorithm that is not available in other libraries;
it consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose octal code is 103, while "(\010)3"
represents the character whose octal code is 10 followed by "3". To match
characters by their hexadecimal code, use \x followed by a string of hexadecimal
digits, optionally enclosed inside {}, for example \xf0 or \x{aff}; notice that
the latter example is a Unicode character.</P>
<H3>Word operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library.
</P>
<P>"\w" matches any single character that is a member of the "word" character
class, this is identical to the expression "[[:word:]]".
</P>
<P>"\W" matches any single character that is not a member of the "word" character
class, this is identical to the expression "[^[:word:]]".
</P>
<P>"\&lt;" matches the null string at the start of a word.
</P>
<P>"\&gt;" matches the null string at the end of the word.
</P>
<P>"\b" matches the null string at either the start or the end of a word.
</P>
<P>"\B" matches a null string within a word.
</P>
<P>The start of the sequence passed to the matching algorithms is considered to be
a potential start of a word unless the flag match_not_bow is set. The end of
the sequence passed to the matching algorithms is considered to be a potential
end of a word unless the flag match_not_eow is set.
</P>
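<P>The word operators that std::regex shares with this library ("\w", "\W", "\b" 
and "\B") can be sketched as below; "\&lt;" and "\&gt;" are left out because they 
are not assumed to be available outside Boost.Regex. The helper name found is 
hypothetical.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches somewhere inside `text`.
bool found(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

void word_operator_demo() {
    assert(found("\\bcat\\b", "the cat sat"));   // "cat" as a whole word
    assert(!found("\\bcat\\b", "concatenate"));  // no word boundary either side
    assert(found("\\Bcat\\B", "concatenate"));   // strictly inside a word
    assert(std::regex_match("snake_case1", std::regex("\\w+")));
    assert(std::regex_match(" ", std::regex("\\W")));
}
```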
<H3>Buffer operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library, and Perl regular expressions:
</P>
<P>"\`" matches the start of a buffer.
</P>
<P>"\A" matches the start of the buffer.
</P>
<P>"\'" matches the end of a buffer.
</P>
<P>"\z" matches the end of a buffer.
</P>
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
followed by the end of the buffer.
</P>
<P>A buffer is considered to consist of the whole sequence passed to the matching
algorithms, unless the flags match_not_bob or match_not_eob are set.
</P>
<H3>Escape operator
</H3>
<P>The escape character "\" has several meanings.
</P>
<P>Inside a set declaration the escape character is a normal character unless the
flag regex_constants::escape_in_lists is set in which case whatever follows the
escape is a literal character regardless of its normal meaning.
</P>
<P>The escape operator may introduce an operator, for example a back reference or
a word operator.
</P>
<P>The escape operator may make the following character normal, for example "\*"
represents a literal "*" rather than the repeat operator.
</P>
<H4>Single character escape sequences
</H4>
<P>The following escape sequences are aliases for single characters:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="33%">Escape sequence
</TD>
<TD vAlign="top" width="33%">Character code
</TD>
<TD vAlign="top" width="33%">Meaning
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\a
</TD>
<TD vAlign="top" width="33%">0x07
</TD>
<TD vAlign="top" width="33%">Bell character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\f
</TD>
<TD vAlign="top" width="33%">0x0C
</TD>
<TD vAlign="top" width="33%">Form feed.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\n
</TD>
<TD vAlign="top" width="33%">0x0A
</TD>
<TD vAlign="top" width="33%">Newline character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\r
</TD>
<TD vAlign="top" width="33%">0x0D
</TD>
<TD vAlign="top" width="33%">Carriage return.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\t
</TD>
<TD vAlign="top" width="33%">0x09
</TD>
<TD vAlign="top" width="33%">Tab character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\v
</TD>
<TD vAlign="top" width="33%">0x0B
</TD>
<TD vAlign="top" width="33%">Vertical tab.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\e
</TD>
<TD vAlign="top" width="33%">0x1B
</TD>
<TD vAlign="top" width="33%">ASCII Escape character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\0dd
</TD>
<TD vAlign="top" width="33%">0dd
</TD>
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
more octal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\xXX
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\x{XX}
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits, optionally a Unicode character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\cZ
</TD>
<TD vAlign="top" width="33%">Z-@
</TD>
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
ASCII character greater than or equal to the character code for '@'.
</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
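<P>A few of the aliases in the table can be verified directly. The sketch below 
is limited to escapes that std::regex is known to share with this library 
("\t", "\n" and two-digit "\x" codes); the octal, "\x{...}", "\a", "\e" and "\c" 
forms are left out because their availability elsewhere is not assumed. The 
helper name is hypothetical.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool escape_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

void single_escape_demo() {
    assert(escape_matches("a\\tb", "a\tb"));  // \t is the tab character, 0x09
    assert(escape_matches("\\n", "\n"));      // \n is the newline character, 0x0A
    assert(escape_matches("\\x41", "A"));     // 0x41 is the code for 'A'
    assert(escape_matches("\\x1B", "\x1b"));  // 0x1B is the ASCII escape character
}
```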
<H4>Miscellaneous escape sequences
</H4>
<P>The following are provided mostly for perl compatibility, but note that there
are some differences in the meanings of \l \L \u and \U:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\w
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\W
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\s
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\S
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\d
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\D
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\l
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\L
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\u
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\U
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\C
</TD>
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\X
</TD>
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
example "a\x 0301" (a letter a with an acute).
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\Q
</TD>
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
treated as a literal character until a \E end quote operator is found.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\E
</TD>
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
with \Q.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
</TABLE>
</P>
<H3>What gets matched?
</H3>
<P>
When the expression is compiled as a Perl-compatible regex then the matching
algorithms will perform a depth first search on the state machine and report
the first match found.</P>
<P>
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string; if more than one
string starting at a given location can match then it matches the longest
possible string, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match - but is only useful if the user is less concerned
about what matched - for example it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, then the match
chosen is the one with the longest first sub-expression; if that is the same
for two or more matches, then the second sub-expression will be examined and so
on.
</P><P>
The examples in the following table illustrate the main differences between Perl and
POSIX regular expression matching rules:
</P>
<P>
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
<TBODY>
<TR>
<TD vAlign="top" width="25%">
<P>Expression</P>
</TD>
<TD vAlign="top" width="25%">
<P>Text</P>
</TD>
<TD vAlign="top" width="25%">
<P>POSIX leftmost longest match</P>
</TD>
<TD vAlign="top" width="25%">
<P>ECMAScript depth first search match</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>a|ab</CODE></P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
xaby</CODE>
</P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
"ab"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"a"</CODE></P></TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*([[:alnum:]]+).*</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
" abc def xyz "</CODE></P></TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "abc"</P>
</TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "z"</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*(a|xayy)</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
zzxayyzz</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"zzxayy"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>"zzxa"</CODE></P>
</TD>
</TR>
</TBODY></TABLE>
</P>
<P>These differences between Perl matching rules, and POSIX matching rules, mean
that these two regular expression syntaxes differ not only in the features
offered, but also in the form that the state machine takes and/or the
algorithms used to traverse the state machine.</p>
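<P>The last row of the table can be reproduced with a sketch along these lines. 
It uses std::regex with the ECMAScript and extended grammar flags as stand-ins 
for the Perl and POSIX modes described here; the helper first_match_as is 
hypothetical, and the POSIX result is only checked loosely because not every 
implementation is assumed to apply the leftmost-longest rule.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of the first match of `pattern` in `text` when the
// pattern is compiled with the given grammar flag; empty if there is no match.
std::string first_match_as(const std::string& pattern, const std::string& text,
                           std::regex::flag_type grammar) {
    std::smatch what;
    if (!std::regex_search(text, what, std::regex(pattern, grammar))) return std::string();
    return what[0].str();
}

void matching_rules_demo() {
    const std::string text = "zzxayyzz";
    const std::string perl_style  = first_match_as(".*(a|xayy)", text, std::regex::ECMAScript);
    const std::string posix_style = first_match_as(".*(a|xayy)", text, std::regex::extended);
    // Depth first search reports the first match found.
    assert(perl_style == "zzxa");
    // Leftmost-longest matching should report "zzxayy"; only the weaker
    // property "at least as long" is asserted here.
    assert(posix_style.size() >= perl_style.size());
}
```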
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

@ -0,0 +1,332 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: syntax_option_type</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">syntax_option_type</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted.&nbsp; For convenience,
all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
<PRE>namespace std{ namespace regex_constants{
typedef bitmask_type syntax_option_type;
// these flags are standardized:
static const syntax_option_type normal;
static const syntax_option_type icase;
static const syntax_option_type nosubs;
static const syntax_option_type optimize;
static const syntax_option_type collate;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type basic;
static const syntax_option_type extended;
static const syntax_option_type awk;
static const syntax_option_type grep;
static const syntax_option_type egrep;
static const syntax_option_type sed = basic;
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
} // namespace regex_constants
} // namespace std</PRE>
<H3>Description</H3>
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below; a valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
<P>Note that for convenience all the constants listed here are duplicated within
the scope of class template basic_regex, so you can use any of:</P>
<PRE>boost::regex_constants::constant_name</PRE>
<P>or</P>
<PRE>boost::regex::constant_name</PRE>
<P>or</P>
<PRE>boost::wregex::constant_name</PRE>
<P>in an interchangeable manner.</P>
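<P>As a sketch of this interchangeability, the following spells the same option 
through the namespace and through the class. It is written against the std 
names used in the synopsis above, on the assumption that the boost:: names 
behave the same way; the function name is hypothetical.</P>

```cpp
#include <cassert>
#include <regex>

void option_scope_demo() {
    // The same constant, reached through the namespace and through the class.
    std::regex via_namespace("hello", std::regex_constants::icase);
    std::regex via_class("hello", std::regex::icase);
    assert(via_namespace.flags() == via_class.flags());

    // icase: matching is performed without regard to case.
    assert(std::regex_match("HeLLo", via_class));
    assert(!std::regex_match("HeLLo", std::regex("hello")));
}
```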
<P>
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="316">
<P>Element</P>
</TD>
<TD vAlign="top" width="50%">
<P>Effect if set</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>normal</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine uses its
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
(FWD.1).</P>
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>icase</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that matching of regular expressions against a character container
sequence shall be performed without regard to case.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>nosubs</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that when a regular expression is matched against a character
container sequence, then no sub-expression matches are to be stored in the
supplied match_results structure.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>optimize</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the regular expression engine should pay more attention to the
speed with which regular expressions are matched, and less to the speed with
which regular expression objects are constructed. Otherwise it has no
detectable effect on the program output.&nbsp; This currently has no effect for
boost.regex.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>collate</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>ECMAScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JavaScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>basic</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
Portable Operating System Interface (POSIX ), Base Definitions and Headers,
Section 9, Regular Expressions (FWD.1).
</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>extended</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX extended regular expressions in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and
Headers, Section 9, Regular Expressions (FWD.1).</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>awk</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk
(FWD.1).</P>
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
character classes permitted.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>grep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX ), Shells and Utilities, Section 4,
Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX basic syntax, but with the newline character
acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>egrep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep when given the -E option in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX ), Shells and
Utilities, Section 4, Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX extended syntax, but with the newline
character acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>sed</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as basic.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>perl</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
</TABLE>
</P>
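<P>The practical difference between the basic and extended grammars can be 
sketched as below, again using the std names from the synopsis rather than the 
boost:: ones. The patterns are made up for the illustration and the function 
name is hypothetical.</P>

```cpp
#include <cassert>
#include <regex>

void grammar_flag_demo() {
    // POSIX basic: bounds and groups are written with a leading backslash.
    assert(std::regex_match("aa", std::regex("a\\{2\\}", std::regex::basic)));
    assert(std::regex_match("abab", std::regex("\\(ab\\)*", std::regex::basic)));
    // POSIX extended: the same operators are written without the backslash.
    assert(std::regex_match("aa", std::regex("a{2}", std::regex::extended)));
    assert(std::regex_match("abab", std::regex("(ab)*", std::regex::extended)));
}
```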
<P>The following constants are specific to this particular regular expression
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>:</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
characters, for example [\]] represents the set of characters containing only
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::char_classes</TD>
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
are allowed inside character set declarations, for example "[[:word:]]"
represents the set of all characters that belong to the character class "word".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::intervals</TD>
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
a's.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
ordinary characters in all situations.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
has the same effect as the alternation operator "|".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
more repetition operator and "\?" represents the zero or one repetition
operator. When this bit is not set then "+" and "?" are used instead.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
bounded repetitions and "{" and "}" are normal characters. This is the opposite
of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
<TD vAlign="top" width="45%">When this bit is set then back references are
allowed.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
alternation operator and "|" is an ordinary character. This is the opposite of
default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::use_except</TD>
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
exception will be thrown on error.&nbsp; Use of this flag is deprecated -
basic_regex will always throw on error.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::failbit</TD>
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see if a regular expression is
valid before usage.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::literal</TD>
<TD vAlign="top" width="45%">All characters in the string are treated as literals,
there are no special characters or escape sequences.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
</TR>
</TABLE>
</P>
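<P>As a rough illustration of how syntax flags change the meaning of a pattern,
the sketch below contrasts a grammar in which "\(", "\)", "\{" and "\}" act as
operators with one in which the unescaped characters do. It is only a sketch
under stated assumptions: it uses the standard library's <CODE>std::regex</CODE>
as a stand-in for <CODE>boost::regex</CODE>, and the standard grammar flags <CODE>std::regex::basic</CODE>
and <CODE>std::regex::ECMAScript</CODE> in place of individual regbase bits
such as bk_parens and bk_braces. The helper names <CODE>compiles</CODE> and <CODE>full_match</CODE>
are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// NOTE: std::regex is a stand-in for boost::regex, and the standard grammar
// flags stand in for combinations of the regbase bits listed in the table.
// This is an assumption made for illustration, not the Boost.Regex API.

// Returns true if `pattern` compiles under the given syntax flags.
// A bad expression is reported by an exception (std::regex_error here).
bool compiles(const std::string& pattern, std::regex::flag_type flags)
{
    try {
        std::regex re(pattern, flags);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Returns true if the whole of `text` matches `pattern` under `flags`.
bool full_match(const std::string& pattern, const std::string& text,
                std::regex::flag_type flags)
{
    return std::regex_match(text, std::regex(pattern, flags));
}
```

<P>Under these assumptions, the pattern "\(ab\)\{2\}" compiled with the basic
grammar and the pattern "(ab){2}" compiled with the ECMAScript grammar both
match "abab", while "(ab" only compiles with the basic grammar, where "(" is an
ordinary character.</P>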
<HR>
<P>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>


@ -0,0 +1,68 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Thread Safety</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Thread Safety</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>Class <A href="basic_regex.html">basic_regex</A>&lt;&gt; and its typedefs regex
and wregex are thread safe, in that compiled regular expressions can safely be
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
is now thread safe, in that the results of a match can be safely copied from
one thread to another (for example, one thread may find matches and push
match_results instances onto a queue while another thread pops them off the
other end). Otherwise, use a separate instance of <A href="match_results.html">match_results</A>
per thread.
</P>
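<P>The sharing pattern described above might look roughly like the following
sketch: one compiled expression is shared read-only between worker threads,
and each thread keeps its own match results. The sketch assumes that <CODE>std::regex</CODE>
and <CODE>std::thread</CODE> are acceptable stand-ins for <CODE>boost::regex</CODE>
and whatever threading library is in use; the function name <CODE>count_matches_per_input</CODE>
is hypothetical.</P>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// NOTE: std::regex and std::thread are stand-ins chosen for illustration;
// this is not the Boost.Regex API itself.

// Counts the matches of one shared, read-only regex in each input string,
// using one worker thread per input.
std::vector<std::size_t> count_matches_per_input(
    const std::regex& shared_re, const std::vector<std::string>& inputs)
{
    std::vector<std::size_t> counts(inputs.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&shared_re, &inputs, &counts, i] {
            // Each iterator owns its own match_results, so no match state
            // is shared between threads; only the compiled regex is.
            std::sregex_iterator first(inputs[i].begin(), inputs[i].end(),
                                       shared_re);
            std::sregex_iterator last;
            // Each thread writes to a distinct element of `counts`.
            counts[i] = static_cast<std::size_t>(std::distance(first, last));
        });
    }
    for (std::thread& worker : workers)
        worker.join();
    return counts;
}
```

<P>The compiled expression is only ever read by the workers, and every piece of
mutable match state is local to a single thread, which is the arrangement the
paragraph above recommends.</P>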
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
thread safe; regular expressions compiled with <I>regcomp</I> can also be
shared between threads.
</P>
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
gets its own RegEx instance (apartment threading). This is a consequence of
RegEx handling both compiling and matching regular expressions.
</P>
<P>Finally, note that changing the global locale invalidates all compiled regular
expressions; calling <I>set_locale</I> from one thread while another
uses regular expressions <I>will</I> therefore produce unpredictable results.
</P>
<P>
There is also a requirement that there is only one thread executing prior to
the start of main().</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

BIN
doc/Attic/uarrow.gif Normal file

Binary file not shown.

Size: 1.6 KiB

79
doc/standards.html Normal file

@ -0,0 +1,79 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Standards Conformance</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Standards Conformance</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>C++</H3>
<P>Boost.Regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>, which will appear in a
future C++ standard technical report (and hopefully in a future version of the
standard).&nbsp; Currently there are some differences in how the regular
expression traits classes are defined; these will be fixed in a future release.</P>
<H3>ECMAScript / JavaScript</H3>
<P>All of the ECMAScript regular expression syntax features are supported, except
that:</P>
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
definitions ( [...] ).</P>
<P>The escape sequence \u matches any upper case character (the same as
[[:upper:]])&nbsp;rather than a Unicode escape sequence; use \x{DDDD} for
Unicode escape sequences.</P>
<H3>Perl</H3>
<P>Almost all Perl features are supported, except for:</P>
<P>\N{name}&nbsp; Use the collating element syntax [[.name.]] instead.</P>
<P>\pP and \PP</P>
<P>(?imsx-imsx)</P>
<P>(?&lt;=pattern)</P>
<P>(?&lt;!pattern)</P>
<P>(?{code})</P>
<P>(??{code})</P>
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
<P>These embarrassments / limitations will be removed in due course, mainly
dependent upon user demand.</P>
<H3>POSIX</H3>
<P>All the POSIX basic and extended regular expression features are supported,
except that:</P>
<P>No character collating names are recognized except those specified in the POSIX
standard for the C locale, unless they are explicitly registered with the
traits class.</P>
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
Win32.&nbsp; Implementing this feature requires knowledge of the format of the
string sort keys produced by the system; if you need this, and the default
implementation doesn't work on your platform, then you will need to supply a
custom traits class.</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

426
doc/sub_match.html Normal file

@ -0,0 +1,426 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: sub_match</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">sub_match</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>#include &lt;<A href="../../boost/regex.hpp">boost/regex.hpp</A>&gt;
</P>
<P>Regular expressions are different from many simple pattern-matching algorithms
in that as well as finding an overall match they can also produce
sub-expression matches: each sub-expression being delimited in the pattern by a
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.html">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
<P>When the marked sub-expression denoted by an object of type sub_match&lt;&gt;
participated in a regular expression match then member <CODE>matched</CODE> evaluates
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
range of characters <CODE>[first,second)</CODE> which formed that match.
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
contain undefined values.</P>
<P>If an object of type <CODE>sub_match&lt;&gt;</CODE> represents sub-expression 0
- that is to say the whole match - then member <CODE>matched</CODE> is always
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
character range that formed the partial match.</P>
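<P>The behaviour of <CODE>matched</CODE>, <CODE>first</CODE> and <CODE>second</CODE>
can be sketched with a pattern in which only one of two marked sub-expressions
can take part in any given match. The sketch assumes that <CODE>std::regex</CODE>
and <CODE>std::smatch</CODE> behave like their Boost counterparts for this
purpose; the helper names <CODE>matched_alternative</CODE> and <CODE>sub_expression_text</CODE>
are hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// NOTE: std::regex / std::smatch are stand-ins for boost::regex /
// boost::smatch; this is an illustration, not the Boost.Regex API itself.

// Returns which alternative of "(a+)|(b+)" matched the whole of `text`:
// 1 or 2, or 0 if there was no match at all.
int matched_alternative(const std::string& text)
{
    static const std::regex re("(a+)|(b+)");
    std::smatch what;
    if (!std::regex_match(text, what, re))
        return 0;
    // Exactly one of the two marked sub-expressions participated.
    return what[1].matched ? 1 : 2;
}

// Returns the characters in [first, second) for sub-expression `n`, or an
// empty string if that sub-expression did not participate in the match.
std::string sub_expression_text(const std::smatch& what, std::size_t n)
{
    const std::ssub_match& sub = what[n];
    return sub.matched ? std::string(sub.first, sub.second) : std::string();
}
```

<P>Note that <CODE>first</CODE> and <CODE>second</CODE> are only read after <CODE>matched</CODE>
has been checked, since they carry no useful value for a sub-expression that
did not participate.</P>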
<PRE>
namespace boost{
template &lt;class BidirectionalIterator&gt;
class sub_match : public std::pair&lt;BidirectionalIterator, BidirectionalIterator&gt;
{
public:
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::value_type value_type;
typedef typename iterator_traits&lt;BidirectionalIterator&gt;::difference_type difference_type;
typedef BidirectionalIterator iterator;
bool matched;
difference_type length()const;
operator basic_string&lt;value_type&gt;()const;
basic_string&lt;value_type&gt; str()const;
int compare(const sub_match&amp; s)const;
int compare(const basic_string&lt;value_type&gt;&amp; s)const;
int compare(const value_type* s)const;
};
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator, class traits, class Allocator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const std::basic_string&lt;typename iterator_traits&lt;BidirectionalIterator&gt;::value_type, traits, Allocator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs);
template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os,
const sub_match&lt;BidirectionalIterator&gt;&amp; m);
} // namespace boost</PRE>
<H3>Description</H3>
<H4>
sub_match members</H4>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::value_type value_type;</PRE>
<P>The type pointed to by the iterators.</P>
<PRE>typedef typename std::iterator_traits&lt;iterator&gt;::difference_type difference_type;</PRE>
<P>A type that represents the difference between two iterators.</P>
<PRE>typedef BidirectionalIterator iterator;</PRE>
<P>The iterator type.</P>
<PRE>iterator first</PRE>
<P>An iterator denoting the position of the start of the match.</P>
<PRE>iterator second</PRE>
<P>An iterator denoting the position of the end of the match.</P>
<PRE>bool matched</PRE>
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
<PRE>difference_type length()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string&lt;value_type&gt;()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>basic_string&lt;value_type&gt; str()const;</PRE>
<P><B>
Effects: </B>returns <CODE>(matched ? basic_string&lt;value_type&gt;(first,
second) : basic_string&lt;value_type&gt;())</CODE>.</P><PRE>int compare(const sub_match&amp; s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string&lt;value_type&gt;&amp; s)const;</PRE>
<P><B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
<P> <B>
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
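<P>The member functions above can be exercised with a short sketch such as the
following. It assumes that <CODE>std::ssub_match</CODE> mirrors the members
documented here, which is an assumption made so that the example needs only the
standard library; the names <CODE>sub_match_summary</CODE> and <CODE>summarize</CODE>
are hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// NOTE: std::ssub_match is a stand-in for boost::sub_match; this is an
// illustration, not the Boost.Regex API itself.

struct sub_match_summary
{
    std::ptrdiff_t length; // result of sub.length()
    std::string text;      // result of sub.str()
    int order;             // sign of sub.compare(other): -1, 0 or 1
};

// Collects the results of length(), str() and compare() for one sub_match.
sub_match_summary summarize(const std::ssub_match& sub, const std::string& other)
{
    sub_match_summary s;
    s.length = sub.length();
    s.text = sub.str();
    const int c = sub.compare(other);
    s.order = (c > 0) - (c < 0);
    return s;
}
```

<P>For a sub-expression that did not participate in the match, <CODE>length()</CODE>
is zero and <CODE>str()</CODE> is empty, so such a sub_match compares equal to
an empty string.</P>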
<H4>
sub_match non-member operators</H4>
<PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P><B>
Effects: </B>returns <CODE>lhs.compare(rhs) &lt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt;= 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs);</PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.compare(rhs) &gt; 0</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const* rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt; rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &gt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; lhs,
const sub_match&lt;BidirectionalIterator&gt;&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs &lt;= rhs.str()</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator == (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator != (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt; (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt; rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &gt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &gt;= rhs</CODE>.</P><PRE>template &lt;class BidirectionalIterator&gt;
bool operator &lt;= (const sub_match&lt;BidirectionalIterator&gt;&amp; lhs,
typename iterator_traits&lt;BidirectionalIterator&gt;::value_type const&amp; rhs); </PRE>
<P> <B>
Effects: </B>returns <CODE>lhs.str() &lt;= rhs</CODE>.</P><PRE>template &lt;class charT, class traits, class BidirectionalIterator&gt;
basic_ostream&lt;charT, traits&gt;&amp;
operator &lt;&lt; (basic_ostream&lt;charT, traits&gt;&amp; os
const sub_match&lt;BidirectionalIterator&gt;&amp; m);</PRE>
<P> <B>
Effects: </B>returns <CODE>(os &lt;&lt; m.str())</CODE>.</P>
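<P>The comparison and streaming operators above can be exercised with a short sketch. It assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header, whose <CODE>sub_match</CODE> follows the same proposal, rather than Boost.Regex itself; the helper names <CODE>stream_first_group</CODE> and <CODE>first_group_is</CODE> are hypothetical.</P>

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Returns what the first sub-expression of "(.)bc" captured from `text`
// after writing it to a stream, or "" if `text` does not match.
std::string stream_first_group(const std::string& text)
{
    std::smatch m;
    if (!std::regex_match(text, m, std::regex("(.)bc")))
        return "";
    std::ostringstream os;
    os << m[1];                 // equivalent to os << m[1].str()
    return os.str();
}

// Compares the first sub-expression against a single character
// (sub_match on the left, value_type on the right).
bool first_group_is(const std::string& text, char c)
{
    std::smatch m;
    return std::regex_match(text, m, std::regex("(.)bc")) && m[1] == c;
}
```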
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

773
doc/syntax.html Normal file

@ -0,0 +1,773 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Regular Expression Syntax</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Regular Expression Syntax</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>This section covers the regular expression syntax used by this library. It is
a programmer's guide: the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
</P>
<H3>Literals
</H3>
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
a "\". A literal is a character that matches itself, or matches the result of
traits_type::translate(), where traits_type is the traits template parameter to
class basic_regex.</P>
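<P>As a small illustration of escaping a special character, the sketch below assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header with its default grammar as a stand-in for this library; the helper <CODE>matches</CODE> is a hypothetical name.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` is matched by `pattern`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// "a\.b" (written "a\\.b" in C++ source) treats the dot as a literal,
// whereas "a.b" treats it as the wildcard.
```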
<H3>Wildcard
</H3>
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
and when <I>match_not_dot_newline</I> is passed to the matching algorithms,
the dot does not match a newline character.
</P>
<H3>Repeats
</H3>
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once; if the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only; if the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
</P>
<P>Examples:
</P>
<P>"ba*" will match all of "b", "ba", "baaa" etc.
</P>
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
</P>
<P>"ba?" will match "b" or "ba".
</P>
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
</P>
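<P>The repeat examples above can be checked with a sketch like the following. It assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header in its default grammar rather than Boost.Regex, and the helper <CODE>matches</CODE> is a hypothetical name.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` is matched by `pattern`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// The repeat operator binds to the shortest previous sub-expression,
// so in "ba*" only the "a" is repeated, never the "b".
bool repeat_binds_to_last_char()
{
    return matches("ba*", "baaa") && !matches("ba*", "baba");
}
```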
<H3>Non-greedy repeats
</H3>
<P>Whenever the "extended" regular expression syntax is in use (the default) then
non-greedy repeats are possible by appending a '?' after the repeat; a
non-greedy repeat is one which will match the <I>shortest</I> possible string.
</P>
<P>For example, to match HTML tag pairs one could use something like:
</P>
<P>"&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;"
</P>
<P>In this case $1 will contain the text between the tag pairs, and will be the
shortest possible matching string.&nbsp;
</P>
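<P>The difference between a greedy and a non-greedy repeat can be seen in a sketch such as this one. It assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header as a stand-in for this library; <CODE>first_capture</CODE> is a hypothetical helper and the tag name <CODE>b</CODE> is chosen only for illustration.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Searches `text` for `pattern` and returns what sub-expression 1 matched,
// or "" when there is no match.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return "";
    return m[1].str();
}

// "(.*?)" stops at the first closing tag; "(.*)" runs on to the last one.
const char* const sample = "<b>one</b> and <b>two</b>";
```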
<H3>Parenthesis
</H3>
<P>Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match; on exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match - for example if it is
part of an alternative that is not taken - then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
</P>
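<P>A sketch of how marked sub-expressions are reported, assuming the C++11 standard library <CODE>&lt;regex&gt;</CODE> header rather than Boost.Regex; the <CODE>GroupInfo</CODE> struct and <CODE>group_info</CODE> helper are hypothetical names introduced for the example.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// What one sub-expression contributed to a successful match.
struct GroupInfo {
    bool matched;          // did the sub-expression take part in the match?
    std::string text;      // the characters it matched
    std::ptrdiff_t offset; // where that text starts in the input, or -1
};

// Matches the whole of `text` against `pattern` and describes group `n`
// (group 0 is the whole expression).
GroupInfo group_info(const std::string& pattern, const std::string& text,
                     std::size_t n)
{
    GroupInfo info{false, "", -1};
    std::smatch m;
    if (std::regex_match(text, m, std::regex(pattern)) &&
        n < m.size() && m[n].matched) {
        info.matched = true;
        info.text = m[n].str();
        info.offset = m.position(n);
    }
    return info;
}
```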
<H3>Non-Marking Parenthesis
</H3>
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression; in this case a
non-marking parenthesis (?:expression) can be used. For example the following
expression creates no sub-expressions:
</P>
<P>"(?:abc)*"</P>
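<P>One way to confirm that a non-marking group creates no sub-expression is to count the marked groups. The sketch assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header as a stand-in for this library; <CODE>marked_groups</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Number of marked sub-expressions in `pattern`.
unsigned marked_groups(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}
```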
<H3>Forward Lookahead Asserts&nbsp;
</H3>
<P>There are two forms of these; one for positive forward lookahead asserts, and
one for negative lookahead asserts:</P>
<P>"(?=abc)" matches zero characters only if they are followed by the expression
"abc".</P>
<P>"(?!abc)" matches zero characters only if they are not followed by the
expression "abc".</P>
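<P>The two lookahead forms can be sketched as follows, assuming the C++11 standard library <CODE>&lt;regex&gt;</CODE> header rather than Boost.Regex. The helpers <CODE>found</CODE> and <CODE>matched_text</CODE> are hypothetical names; note that the lookahead itself contributes no characters to the match.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when `pattern` matches somewhere inside `text`.
bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

// The text of the first match of `pattern` in `text`, or "" if none.
std::string matched_text(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    return std::regex_search(text, m, std::regex(pattern)) ? m.str() : "";
}
```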
<H3>Independent sub-expressions</H3>
<P>"(?&gt;expression)" matches "expression" as an independent atom (the algorithm
will not backtrack into it if a failure occurs later in the expression).</P>
<H3>Alternatives
</H3>
<P>Alternatives occur when the expression can match either one sub-expression or
another. Each alternative is separated by a "|", or a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
</P>
<P>Examples:
</P>
<P>"a(b|c)" could match "ab" or "ac".
</P>
<P>"abc|def" could match "abc" or "def".
</P>
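<P>The alternation examples, and the way "|" takes the largest possible sub-expression on each side, can be sketched like this. The code assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header as a stand-in for this library, and <CODE>matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` is matched by `pattern`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// "abc|def" means (abc) or (def); it does not mean "ab", then c-or-d, then "ef".
// Parentheses are needed to limit the alternation to the middle character.
bool alternation_scope_demo()
{
    return !matches("abc|def", "abdef") && matches("ab(c|d)ef", "abdef");
}
```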
<H3>Sets
</H3>
<P>A set is a set of characters that can match any single character that is a
member of the set. Sets are delimited by "[" and "]" and can contain literals,
character ranges, character classes, collating elements and equivalence
classes. Set declarations that start with "^" contain the complement of the
elements that follow.
</P>
<P>Examples:
</P>
<P>Character literals:
</P>
<P>"[abc]" will match either of "a", "b", or "c".
</P>
<P>"[^abc]" will match any character other than "a", "b", or "c".
</P>
<P>Character ranges:
</P>
<P>"[a-z]" will match any character in the range "a" to "z".
</P>
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
</P>
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range, and ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale specific behavior is disabled by default (in perl mode), in
which case ranges collate according to ASCII character code.
</P>
<P>Character classes are denoted using the syntax "[:classname:]" within a set
declaration, for example "[[:space:]]" is the set of all whitespace characters.
Character classes are only available if the flag regex_constants::char_classes
is set. The available character classes are:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="50%">alnum</TD>
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">alpha</TD>
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
characters may also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">blank</TD>
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">cntrl</TD>
<TD vAlign="top" width="50%">Any control character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">digit</TD>
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">graph</TD>
<TD vAlign="top" width="50%">Any graphical character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">lower</TD>
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">print</TD>
<TD vAlign="top" width="50%">Any printable character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">punct</TD>
<TD vAlign="top" width="50%">Any punctuation character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">space</TD>
<TD vAlign="top" width="50%">Any whitespace character.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">upper</TD>
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
also be included depending upon the locale.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">xdigit</TD>
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">word</TD>
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
the underscore.</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="50%">Unicode</TD>
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
applies to the wide character traits classes only.</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
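<P>A few of the set forms described above, sketched under the assumption that the C++11 standard library <CODE>&lt;regex&gt;</CODE> header (default grammar, default "C" locale) behaves like this library for these simple cases; <CODE>matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` is matched by `pattern`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// A named character class is written inside a set, hence the doubled
// brackets: "[[:digit:]]" is a set whose only member is the class "digit".
bool is_identifier(const std::string& text)
{
    return matches("[[:alpha:]_][[:alnum:]_]*", text);
}
```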
<P>There are some shortcuts that can be used in place of the character classes.
Provided the flag regex_constants::escape_in_lists is set, you can use:
</P>
<P>\w in place of [:word:]
</P>
<P>\s in place of [:space:]
</P>
<P>\d in place of [:digit:]
</P>
<P>\l in place of [:lower:]
</P>
<P>\u in place of [:upper:]&nbsp;
</P>
<P>Collating elements take the general form [.tagname.] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
equivalent to [,]. The library supports all the standard POSIX collating
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
"nj", "dz", "lj", each in lower, upper and title case variations.
Multi-character collating elements can result in the set matching more than one
character, for example [[.ae.]] would match two characters, but note that
[^[.ae.]] would only match one character.&nbsp;
</P>
<P>
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, and matches any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.&nbsp;
</P>
<P>To include a literal "-" in a set declaration, make it the first character
after the opening "[" or "[^", the endpoint of a range, or a collating element;
or, if the flag regex_constants::escape_in_lists is set, precede it with an
escape character as in "[\-]". To include a literal "[" or "]" or "^" in a set,
make it the endpoint of a range or a collating element, or precede it with an
escape character if the flag regex_constants::escape_in_lists is set.
</P>
<H3>Line anchors
</H3>
<P>An anchor is something that matches the null string at the start or end of a
line: "^" matches the null string at the start of a line, "$" matches the null
string at the end of a line.
</P>
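<P>The anchors can be sketched on a single-line input as below. The code assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header rather than Boost.Regex, and deliberately avoids embedded newlines, where the two libraries' defaults may differ; <CODE>found</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when `pattern` matches somewhere inside `text`.
bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

// "^" and "$" consume no characters; they only constrain where the
// neighbouring text is allowed to sit.
```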
<H3>Back references
</H3>
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string; note that this is different to some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
</P>
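<P>The "(.*)\1" example can be tried with a sketch like the following, which assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header as a stand-in for this library; <CODE>is_doubled</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when `text` consists of some string immediately followed by
// a second copy of itself.
bool is_doubled(const std::string& text)
{
    // "\\1" in C++ source is the regex "\1": whatever group 1 matched.
    return std::regex_match(text, std::regex("(.*)\\1"));
}
```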
<H3>Characters by code
</H3>
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose code is 103, "(\010)3" represents the
character 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; notice the latter example is a Unicode
character.</P>
<H3>Word operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library.
</P>
<P>"\w" matches any single character that is a member of the "word" character
class, this is identical to the expression "[[:word:]]".
</P>
<P>"\W" matches any single character that is not a member of the "word" character
class, this is identical to the expression "[^[:word:]]".
</P>
<P>"\&lt;" matches the null string at the start of a word.
</P>
<P>"\&gt;" matches the null string at the end of the word.
</P>
<P>"\b" matches the null string at either the start or the end of a word.
</P>
<P>"\B" matches a null string within a word.
</P>
<P>The start of the sequence passed to the matching algorithms is considered to be
a potential start of a word unless the flag match_not_bow is set. The end of
the sequence passed to the matching algorithms is considered to be a potential
end of a word unless the flag match_not_eow is set.
</P>
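<P>A sketch of the word operators, assuming the C++11 standard library <CODE>&lt;regex&gt;</CODE> header rather than Boost.Regex. That header is only assumed to provide \w, \W, \b and \B, so the GNU-style "\&lt;" and "\&gt;" are not shown; <CODE>find_offset</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Offset of the first place `pattern` matches inside `text`, or -1 if none.
long find_offset(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)))
        return -1;
    return static_cast<long>(m.position(0));
}

// In "concat cat" the first "cat" sits inside a word (offset 3) and the
// second one is a word of its own (offset 7).
```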
<H3>Buffer operators
</H3>
<P>The following operators are provided for compatibility with the GNU regular
expression library, and Perl regular expressions:
</P>
<P>"\`" matches the start of a buffer.
</P>
<P>"\A" matches the start of the buffer.
</P>
<P>"\'" matches the end of a buffer.
</P>
<P>"\z" matches the end of a buffer.
</P>
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
followed by the end of the buffer.
</P>
<P>A buffer is considered to consist of the whole sequence passed to the matching
algorithms, unless the flags match_not_bob or match_not_eob are set.
</P>
<H3>Escape operator
</H3>
<P>The escape character "\" has several meanings.
</P>
<P>Inside a set declaration the escape character is a normal character unless the
flag regex_constants::escape_in_lists is set in which case whatever follows the
escape is a literal character regardless of its normal meaning.
</P>
<P>The escape operator may introduce an operator for example: back references, or
a word operator.
</P>
<P>The escape operator may make the following character normal, for example "\*"
represents a literal "*" rather than the repeat operator.
</P>
<H4>Single character escape sequences
</H4>
<P>The following escape sequences are aliases for single characters:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="33%">Escape sequence
</TD>
<TD vAlign="top" width="33%">Character code
</TD>
<TD vAlign="top" width="33%">Meaning
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\a
</TD>
<TD vAlign="top" width="33%">0x07
</TD>
<TD vAlign="top" width="33%">Bell character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\f
</TD>
<TD vAlign="top" width="33%">0x0C
</TD>
<TD vAlign="top" width="33%">Form feed.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\n
</TD>
<TD vAlign="top" width="33%">0x0A
</TD>
<TD vAlign="top" width="33%">Newline character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\r
</TD>
<TD vAlign="top" width="33%">0x0D
</TD>
<TD vAlign="top" width="33%">Carriage return.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\t
</TD>
<TD vAlign="top" width="33%">0x09
</TD>
<TD vAlign="top" width="33%">Tab character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\v
</TD>
<TD vAlign="top" width="33%">0x0B
</TD>
<TD vAlign="top" width="33%">Vertical tab.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\e
</TD>
<TD vAlign="top" width="33%">0x1B
</TD>
<TD vAlign="top" width="33%">ASCII Escape character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\0dd
</TD>
<TD vAlign="top" width="33%">0dd
</TD>
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
more octal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\xXX
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\x{XX}
</TD>
<TD vAlign="top" width="33%">0xXX
</TD>
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
hexadecimal digits, optionally a Unicode character.
</TD>
<TD>&nbsp;</TD>
</TR>
<TR>
<TD>&nbsp;</TD>
<TD vAlign="top" width="33%">\cZ
</TD>
<TD vAlign="top" width="33%">Z-@
</TD>
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
ASCII character greater than or equal to the character code for '@'.
</TD>
<TD>&nbsp;</TD>
</TR>
</TABLE>
</P>
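<P>Some of the single character escapes in the table can be sketched as follows. The code assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header as a stand-in for this library and uses only the two-digit \xXX form, \t and \n; the \0dd, \x{XX} and \cZ forms are left out because that header is not assumed to accept them. <CODE>matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True when the whole of `text` is matched by `pattern`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// In the patterns below the backslash is doubled so that the escape reaches
// the regex engine; in the subject strings "\t" and "\n" are ordinary C++
// escapes producing the tab and newline characters themselves.
```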
<H4>Miscellaneous escape sequences:
</H4>
<P>The following are provided mostly for perl compatibility, but note that there
are some differences in the meanings of \l \L \u and \U:
<BR>
&nbsp;
</P>
<P>
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\w
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\W
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\s
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\S
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\d
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\D
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\l
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\L
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\u
</TD>
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\U
</TD>
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\C
</TD>
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\X
</TD>
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
example "a\x{0301}" (a letter a with an acute).
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\Q
</TD>
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
treated as a literal character until a \E end quote operator is found.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
<TR>
<TD width="5%">&nbsp;</TD>
<TD vAlign="top" width="45%">\E
</TD>
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
with \Q.
</TD>
<TD width="5%">&nbsp;</TD>
</TR>
</TABLE>
</P>
<H3>What gets matched?
</H3>
<P>
When the expression is compiled as a Perl-compatible regex then the matching
algorithms will perform a depth first search on the state machine and report
the first match found.</P>
<P>
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string; if more than one
string starting at a given location can match then it matches the longest
possible string, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match - but is only useful if the user is less concerned
about what matched - for example it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, then the match
chosen is the one with the longest first sub-expression; if that is the same
for two or more matches, then the second sub-expression will be examined and so
on.
</P><P>
The examples in the following table illustrate the main differences between Perl and
POSIX regular expression matching rules:
</P>
<P>
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
<TBODY>
<TR>
<TD vAlign="top" width="25%">
<P>Expression</P>
</TD>
<TD vAlign="top" width="25%">
<P>Text</P>
</TD>
<TD vAlign="top" width="25%">
<P>POSIX leftmost longest match</P>
</TD>
<TD vAlign="top" width="25%">
<P>ECMAScript depth first search match</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>a|ab</CODE></P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
xaby</CODE>
</P>
</TD>
<TD vAlign="top" width="25%">
<P><CODE>
"ab"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"a"</CODE></P></TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*([[:alnum:]]+).*</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
" abc def xyz "</CODE></P></TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "abc"</P>
</TD>
<TD vAlign="top" width="25%">
<P>$0 = " abc def xyz "<BR>
$1 = "z"</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="25%">
<P><CODE>
.*(a|xayy)</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
zzxayyzz</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>
"zzxayy"</CODE></P></TD>
<TD vAlign="top" width="25%">
<P><CODE>"zzxa"</CODE></P>
</TD>
</TR>
</TBODY></TABLE>
</P>
<P>These differences between Perl matching rules, and POSIX matching rules, mean
that these two regular expression syntaxes differ not only in the features
offered, but also in the form that the state machine takes and/or the
algorithms used to traverse the state machine.</p>
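<P>The depth first search column of the table can be reproduced with a sketch like the one below. It assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header rather than Boost.Regex, and <CODE>first_match</CODE> is a hypothetical helper. Passing <CODE>std::regex_constants::extended</CODE> instead is expected to give the POSIX column, but standard library implementations are not assumed to agree on that, so only the ECMAScript results are stated here.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Searches `text` with `pattern` compiled under `grammar` and returns what
// sub-expression `group` matched (group 0 is the whole match), or "" if the
// search fails.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex_constants::syntax_option_type grammar,
                        std::size_t group = 0)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern, grammar)) ||
        group >= m.size())
        return "";
    return m[group].str();
}
```

<P>With the depth first rules the first alternative that succeeds wins, which is why "a|ab" reports "a" even though "ab" would be longer.</P>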
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

332
doc/syntax_option_type.html Normal file

@ -0,0 +1,332 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: syntax_option_type</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">syntax_option_type</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<H3>Synopsis</H3>
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted.&nbsp; For convenience,
note that all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
<PRE>namespace std{ namespace regex_constants{
typedef bitmask_type syntax_option_type;
// these flags are standardized:
static const syntax_option_type normal;
static const syntax_option_type icase;
static const syntax_option_type nosubs;
static const syntax_option_type optimize;
static const syntax_option_type collate;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type basic;
static const syntax_option_type extended;
static const syntax_option_type awk;
static const syntax_option_type grep;
static const syntax_option_type egrep;
static const syntax_option_type sed = basic;
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
} // namespace regex_constants
} // namespace std</PRE>
<H3>Description</H3>
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
<P>Note that for convenience all the constants listed here are duplicated within
the scope of class template basic_regex, so you can use any of:</P>
<PRE>boost::regex_constants::constant_name</PRE>
<P>or</P>
<PRE>boost::regex::constant_name</PRE>
<P>or</P>
<PRE>boost::wregex::constant_name</PRE>
<P>in an interchangeable manner.</P>
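<P>The interchangeable spellings can be sketched as follows. The code assumes the C++11 standard library <CODE>&lt;regex&gt;</CODE> header, which offers the same two scopes under namespace <CODE>std</CODE>, rather than the <CODE>boost::</CODE> names listed above; <CODE>matches_ignoring_case</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compiles `pattern` twice, once with each spelling of the same flags,
// and reports whether both expressions match the whole of `text`.
bool matches_ignoring_case(const std::string& pattern, const std::string& text)
{
    // Constants taken from the scope of the basic_regex instantiation.
    std::regex a(pattern, std::regex::ECMAScript | std::regex::icase);
    // The same constants taken from the regex_constants namespace.
    std::regex b(pattern, std::regex_constants::ECMAScript |
                          std::regex_constants::icase);
    return std::regex_match(text, a) && std::regex_match(text, b);
}
```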
<P>
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="316">
<P>Element</P>
</TD>
<TD vAlign="top" width="50%">
<P>Effect if set</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>normal</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine uses its
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
(FWD.1).</P>
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>icase</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that matching of regular expressions against a character container
sequence shall be performed without regard to case.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>nosubs</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that when a regular expression is matched against a character
container sequence, then no sub-expression matches are to be stored in the
supplied match_results structure.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>optimize</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the regular expression engine should pay more attention to the
speed with which regular expressions are matched, and less to the speed with
which regular expression objects are constructed. Otherwise it has no
detectable effect on the program output.&nbsp; This currently has no effect for
boost.regex.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>collate</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>ECMAScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JavaScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>JScript</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>basic</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
Portable Operating System Interface (POSIX), Base Definitions and Headers,
Section 9, Regular Expressions (FWD.1).
</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>extended</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX extended regular expressions in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and
Headers, Section 9, Regular Expressions (FWD.1).</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>awk</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX), Shells and Utilities, Section 4, awk
(FWD.1).</P>
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
character classes permitted.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>grep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
Operating System Interface (POSIX), Shells and Utilities, Section 4,
Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX basic syntax, but with the newline character
acting as an alternation character.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>egrep</P>
</TD>
<TD vAlign="top" width="50%">
<P>Specifies that the grammar recognized by the regular expression engine is the
same as that used by POSIX utility grep when given the -E option in IEEE Std
1003.1-2001, Portable Operating System Interface (POSIX), Shells and
Utilities, Section 4, Utilities, grep (FWD.1).</P>
<P>That is to say, the same as POSIX extended syntax, but with the newline
character acting as an alternation character in addition to "|".</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>sed</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as basic.</P>
</TD>
</TR>
<TR>
<TD vAlign="top" width="316">
<P>perl</P>
</TD>
<TD vAlign="top" width="50%">
<P>The same as normal.</P>
</TD>
</TR>
</TABLE>
</P>
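<P>The effect of choosing a grammar can be sketched as follows. This is an
illustration only: it uses std::regex (the standard-library descendant of the
proposal cited on this page) as a stand-in so that it builds without Boost, and
it assumes that the boost:: equivalents behave the same way for this simple
case. The helper name matches_with is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` with the given syntax options and
// report whether it matches the whole of `text`.
bool matches_with(const std::string& pattern, const std::string& text,
                  std::regex_constants::syntax_option_type options)
{
   std::regex re(pattern, options);
   return std::regex_match(text, re);
}
```

<P>With the extended grammar the pattern "a+" matches "aaa", because "+" is a
repetition operator. With the basic grammar "+" is an ordinary character, so
the same pattern matches only the literal text "a+". Options such as icase can
be combined with the grammar element using the bitwise or operator.</P>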
<P>The following constants are specific to this particular regular expression
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>:</P>
<P>
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
<TR>
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
characters, for example [\]] represents the set of characters containing only
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::char_classes</TD>
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
are allowed inside character set declarations, for example "[[:word:]]"
represents the set of all characters that belong to the character class "word".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::intervals</TD>
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
a's.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
ordinary characters in all situations.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
has the same effect as the alternation operator "|".</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
more repetition operator and "\?" represents the zero or one repetition
operator. When this bit is not set then "+" and "?" are used instead.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
bounded repetitions and "{" and "}" are normal characters. This is the opposite
of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
<TD vAlign="top" width="45%">When this bit is set then back references are
allowed.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
alternation operator and "|" is an ordinary character. This is the opposite of
default behavior.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::use_except</TD>
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
exception will be thrown on error.&nbsp; Use of this flag is deprecated -
basic_regex will always throw on error.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::failbit</TD>
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see if a regular expression is
valid before usage.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%">regbase::literal</TD>
<TD vAlign="top" width="45%">All characters in the string are treated as literals;
there are no special characters or escape sequences.</TD>
</TR>
<TR>
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
</TR>
</TABLE>
</P>
<HR>
<P>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

68
doc/thread_safety.html Normal file

@ -0,0 +1,68 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Thread Safety</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Thread Safety</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<P>Class <A href="basic_regex.html">basic_regex</A>&lt;&gt; and its typedefs regex
and wregex are thread safe, in that compiled regular expressions can safely be
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
is now thread safe, in that the results of a match can be safely copied from
one thread to another (for example one thread may find matches and push
match_results instances onto a queue, while another thread pops them off the
other end). Otherwise, use a separate instance of <A href="match_results.html">match_results</A>
per thread.
</P>
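<P>The sharing pattern described above can be sketched as follows. This is an
illustration under stated assumptions, not part of the library: it uses
std::regex and std::thread (both from later C++ standards than this page) as
stand-ins so that it builds without Boost, and the helper name
count_matches_in_parallel is hypothetical. One compiled expression is shared
read-only, while each thread keeps its own iterators and match results.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: count the matches of `re` in each input string, using
// one thread per input. The compiled expression is shared (read-only) between
// the threads; each thread owns its iterators, and therefore its own match
// results, and writes only to its own slot of the output vector.
std::vector<std::size_t> count_matches_in_parallel(
   const std::regex& re, const std::vector<std::string>& inputs)
{
   std::vector<std::size_t> counts(inputs.size(), 0);
   std::vector<std::thread> workers;
   for(std::size_t i = 0; i < inputs.size(); ++i)
   {
      workers.emplace_back([&re, &inputs, &counts, i]
      {
         std::sregex_iterator first(inputs[i].begin(), inputs[i].end(), re);
         std::sregex_iterator last;
         for(; first != last; ++first)
            ++counts[i];
      });
   }
   // wait for every worker before reading the results:
   for(std::thread& t : workers)
      t.join();
   return counts;
}
```

<P>The expression is compiled once, before any thread starts, and is never
modified afterwards; that is the usage the guarantee above is about.</P>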
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
thread safe; regular expressions compiled with <I>regcomp</I> can also be
shared between threads.
</P>
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
gets its own RegEx instance (apartment threading) - this is a consequence of
RegEx handling both compiling and matching regular expressions.
</P>
<P>Finally note that changing the global locale invalidates all compiled regular
expressions; therefore calling <I>set_locale</I> from one thread while another
uses regular expressions <I>will</I> produce unpredictable results.
</P>
<P>
There is also a requirement that there is only one thread executing prior to
the start of main().</P>
<HR>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
17 May 2003
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
</p>
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a>&nbsp;1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting documentation.
Dr John Maddock makes no representations about the suitability of this software
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
</body>
</html>

BIN
doc/uarrow.gif Normal file

Binary file not shown.


705
doc/vc71-performance.html Normal file

@ -0,0 +1,705 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Regular Expression Performance Comparison (Visual Studio.NET 2003)</title>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
<META content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot" name="Template">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
</head>
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
<h2>Regular Expression Performance Comparison</h2>
<p>The following tables provide comparisons between the following regular
expression libraries:</p>
<p><a href="http://research.microsoft.com/projects/greta"> GRETA</a>.</p>
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
- this is provided for comparison as a typical non-backtracking implementation.</p>
<p>Philip Hazel's <a href="http://www.pcre.org">PCRE</a> library.</p>
<h3>Details</h3>
<p>Machine: Intel Pentium 4 2.8GHz PC.</p>
<p>Compiler: Microsoft Visual C++ version 7.1.</p>
<p>C++ Standard Library: Dinkumware standard library version 313.</p>
<p>OS: Win32.</p>
<p>Boost version: 1.31.0.</p>
<p>PCRE version: 3.9.</p>
<p>As ever, care should be taken in interpreting the results. Only sensible regular
expressions (rather than pathological cases) are given; most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms.&nbsp; In each case the first figure given is the time taken
relative to the fastest library for that test (so a value of 1.0 is as good as it
gets), while the second figure is the actual time taken in seconds.</p>
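<p>The relationship between the two figures can be sketched in a few lines of
C++. This is an illustration only, not the code of the benchmark program; the
helper name relative_scores is hypothetical.</p>

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: divide every timing by the fastest one, so that the
// fastest library scores exactly 1 and the others score proportionally more.
// Assumes all timings are positive.
std::vector<double> relative_scores(const std::vector<double>& seconds)
{
   std::vector<double> scores;
   if(seconds.empty())
      return scores;
   const double fastest = *std::min_element(seconds.begin(), seconds.end());
   scores.reserve(seconds.size());
   for(double s : seconds)
      scores.push_back(s / fastest);
   return scores;
}
```

<p>For instance, feeding in the six timings from the <code>Twain</code> row of
Comparison 1 (0.541, 2.35, 0.0851, 0.0851, 3.6 and 0.0275 seconds) gives
roughly 19.7, 85.5, 3.09, 3.09, 131 and 1, which are the relative figures shown
in that row.</p>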
<h3>Averages</h3>
<p>The following are the average relative scores for all the tests: the perfect
regular expression library&nbsp;would score 1; in practice anything less than 2
is pretty good.</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td>6.90669</td>
<td>23.751</td>
<td>1.62553</td>
<td>1.38213</td>
<td>110.973</td>
<td>1.69371</td>
</tr>
</table>
<br>
<br>
<h3>Comparison 1: Long Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within a long English language text was measured
(<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a>
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19MB).&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>Twain</code></td>
<td>19.7<br>
(0.541s)</td>
<td>85.5<br>
(2.35s)</td>
<td>3.09<br>
(0.0851s)</td>
<td>3.09<br>
(0.0851s)</td>
<td>131<br>
(3.6s)</td>
<td><font color="#008000">1<br>
(0.0275s)</font></td>
</tr>
<tr>
<td><code>Huck[[:alpha:]]+</code></td>
<td>11<br>
(0.55s)</td>
<td>93.4<br>
(4.68s)</td>
<td>3.4<br>
(0.17s)</td>
<td>3.35<br>
(0.168s)</td>
<td>124<br>
(6.19s)</td>
<td><font color="#008000">1<br>
(0.0501s)</font></td>
</tr>
<tr>
<td><code>[[:alpha:]]+ing</code></td>
<td>11.3<br>
(6.82s)</td>
<td>21.3<br>
(12.8s)</td>
<td>1.83<br>
(1.1s)</td>
<td><font color="#008000">1<br>
(0.601s)</font></td>
<td>6.47<br>
(3.89s)</td>
<td>4.75<br>
(2.85s)</td>
</tr>
<tr>
<td><code>^[^ ]*?Twain</code></td>
<td>5.75<br>
(1.15s)</td>
<td>17.1<br>
(3.43s)</td>
<td><font color="#008000">1<br>
(0.2s)</font></td>
<td>1.3<br>
(0.26s)</td>
<td>NA</td>
<td>3.8<br>
(0.761s)</td>
</tr>
<tr>
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
<td>28.5<br>
(3.1s)</td>
<td>77.2<br>
(8.4s)</td>
<td>2.3<br>
(0.251s)</td>
<td><font color="#008000">1<br>
(0.109s)</font></td>
<td>191<br>
(20.8s)</td>
<td>1.77<br>
(0.193s)</td>
</tr>
<tr>
<td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
<td>16.2<br>
(4.14s)</td>
<td>49<br>
(12.5s)</td>
<td>1.65<br>
(0.42s)</td>
<td><font color="#008000">1<br>
(0.255s)</font></td>
<td>NA</td>
<td>2.43<br>
(0.62s)</td>
</tr>
</table>
<br>
<br>
<h3>Comparison 2: Medium Sized Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within a medium sized English language text was
measured (the first 50K from mtent12.txt).&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>Twain</code></td>
<td>9.49<br>
(0.00274s)</td>
<td>40.7<br>
(0.0117s)</td>
<td>1.54<br>
(0.000445s)</td>
<td>1.56<br>
(0.00045s)</td>
<td>13.5<br>
(0.00391s)</td>
<td><font color="#008000">1<br>
(0.000289s)</font></td>
</tr>
<tr>
<td><code>Huck[[:alpha:]]+</code></td>
<td>14.3<br>
(0.0027s)</td>
<td>62.3<br>
(0.0117s)</td>
<td>2.26<br>
(0.000425s)</td>
<td>2.29<br>
(0.000431s)</td>
<td>1.27<br>
(0.000239s)</td>
<td><font color="#008000">1<br>
(0.000188s)</font></td>
</tr>
<tr>
<td><code>[[:alpha:]]+ing</code></td>
<td>7.34<br>
(0.0178s)</td>
<td>13.7<br>
(0.0331s)</td>
<td><font color="#008000">1<br>
(0.00243s)</font></td>
<td><font color="#008000">1.02<br>
(0.00246s)</font></td>
<td>7.36<br>
(0.0178s)</td>
<td>5.87<br>
(0.0142s)</td>
</tr>
<tr>
<td><code>^[^ ]*?Twain</code></td>
<td>8.34<br>
(0.00579s)</td>
<td>24.8<br>
(0.0172s)</td>
<td>1.52<br>
(0.00105s)</td>
<td><font color="#008000">1<br>
(0.000694s)</font></td>
<td>NA</td>
<td>2.81<br>
(0.00195s)</td>
</tr>
<tr>
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
<td>12.9<br>
(0.00781s)</td>
<td>35.1<br>
(0.0213s)</td>
<td>1.67<br>
(0.00102s)</td>
<td><font color="#008000">1<br>
(0.000606s)</font></td>
<td>81.5<br>
(0.0494s)</td>
<td>1.94<br>
(0.00117s)</td>
</tr>
<tr>
<td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
<td>15.6<br>
(0.0106s)</td>
<td>46.6<br>
(0.0319s)</td>
<td>2.72<br>
(0.00186s)</td>
<td><font color="#008000">1<br>
(0.000684s)</font></td>
<td>311<br>
(0.213s)</td>
<td>1.72<br>
(0.00117s)</td>
</tr>
</table>
<br>
<br>
<h3>Comparison 3:&nbsp;C++ Code&nbsp;Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within the C++ source file <a href="../../../boost/crc.hpp">
boost/crc.hpp</a>&nbsp;was measured.&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code> ^(template[[:space:]]*&lt;[^;:{]+&gt;[[:space:]]*)?(class|struct)[[:space:]]*(\&lt;\w+\&gt;([
]*\([^)]*\))?[[:space:]]*)*(\&lt;\w*\&gt;)[[:space:]]*(&lt;[^;:{]+&gt;[[:space:]]*)?(\{|:[^;\{()]*\{)</code></td>
<td>8.88<br>
(0.000792s)</td>
<td>46.4<br>
(0.00414s)</td>
<td>1.19<br>
(0.000106s)</td>
<td><font color="#008000">1<br>
(8.92e-005s)</font></td>
<td>688<br>
(0.0614s)</td>
<td>3.23<br>
(0.000288s)</td>
</tr>
<tr>
<td><code>(^[
]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\&lt;([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\&gt;|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\&lt;(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\&gt;</code></td>
<td><font color="#008000">1<br>
(0.00571s)</font></td>
<td>5.31<br>
(0.0303s)</td>
<td>2.47<br>
(0.0141s)</td>
<td>1.92<br>
(0.011s)</td>
<td>NA</td>
<td>3.29<br>
(0.0188s)</td>
</tr>
<tr>
<td><code>^[ ]*#[ ]*include[ ]+("[^"]+"|&lt;[^&gt;]+&gt;)</code></td>
<td>5.78<br>
(0.00172s)</td>
<td>26.3<br>
(0.00783s)</td>
<td>1.12<br>
(0.000333s)</td>
<td><font color="#008000">1<br>
(0.000298s)</font></td>
<td>128<br>
(0.0382s)</td>
<td>1.74<br>
(0.000518s)</td>
</tr>
<tr>
<td><code>^[ ]*#[ ]*include[ ]+("boost/[^"]+"|&lt;boost/[^&gt;]+&gt;)</code></td>
<td>10.2<br>
(0.00305s)</td>
<td>28.4<br>
(0.00845s)</td>
<td>1.12<br>
(0.000333s)</td>
<td><font color="#008000">1<br>
(0.000298s)</font></td>
<td>155<br>
(0.0463s)</td>
<td>1.74<br>
(0.000519s)</td>
</tr>
</table>
<br>
<H3>Comparison 4: HTML Document Search
</H3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within the html file <a href="../../libraries.htm">libs/libraries.htm</a>
was measured.&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>beman|john|dave</code></td>
<td>11<br>
(0.00297s)</td>
<td>34.3<br>
(0.00922s)</td>
<td>1.78<br>
(0.000479s)</td>
<td><font color="#008000">1<br>
(0.000269s)</font></td>
<td>55.2<br>
(0.0149s)</td>
<td>1.85<br>
(0.000499s)</td>
</tr>
<tr>
<td><code>&lt;p&gt;.*?&lt;/p&gt;</code></td>
<td>5.38<br>
(0.00145s)</td>
<td>21.8<br>
(0.00587s)</td>
<td><font color="#008000">1.02<br>
(0.000274s)</font></td>
<td><font color="#008000">1<br>
(0.000269s)</font></td>
<td>NA</td>
<td><font color="#008000">1.05<br>
(0.000283s)</font></td>
</tr>
<tr>
<td><code> &lt;a[^&gt;]+href=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;</code></td>
<td>4.51<br>
(0.00207s)</td>
<td>12.6<br>
(0.00579s)</td>
<td>1.34<br>
(0.000616s)</td>
<td><font color="#008000">1<br>
(0.000459s)</font></td>
<td>343<br>
(0.158s)</td>
<td><font color="#008000">1.09<br>
(0.000499s)</font></td>
</tr>
<tr>
<td><code> &lt;h[12345678][^&gt;]*&gt;.*?&lt;/h[12345678]&gt;</code></td>
<td>7.39<br>
(0.00143s)</td>
<td>29.6<br>
(0.00571s)</td>
<td>1.87<br>
(0.000362s)</td>
<td><font color="#008000">1<br>
(0.000193s)</font></td>
<td>NA</td>
<td>1.27<br>
(0.000245s)</td>
</tr>
<tr>
<td><code> &lt;img[^&gt;]+src=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;</code></td>
<td>6.73<br>
(0.00145s)</td>
<td>27.3<br>
(0.00587s)</td>
<td>1.2<br>
(0.000259s)</td>
<td>1.32<br>
(0.000283s)</td>
<td>148<br>
(0.0319s)</td>
<td><font color="#008000">1<br>
(0.000215s)</font></td>
</tr>
<tr>
<td><code> &lt;font[^&gt;]+face=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;.*?&lt;/font&gt;</code></td>
<td>6.93<br>
(0.00153s)</td>
<td>27<br>
(0.00595s)</td>
<td>1.22<br>
(0.000269s)</td>
<td>1.31<br>
(0.000289s)</td>
<td>NA</td>
<td><font color="#008000">1<br>
(0.00022s)</font></td>
</tr>
</table>
<br>
<br>
<h3>Comparison 5: Simple Matches</h3>
<p>For each of the following regular expressions the time taken to match against
the text indicated was measured.&nbsp;</p>
<table border="1" cellspacing="1">
<tr>
<td><strong>Expression</strong></td>
<td><strong>Text</strong></td>
<td><strong>GRETA</strong></td>
<td><strong>GRETA<br>
(non-recursive mode)</strong></td>
<td><strong>Boost</strong></td>
<td><strong>Boost + C++ locale</strong></td>
<td><strong>POSIX</strong></td>
<td><strong>PCRE</strong></td>
</tr>
<tr>
<td><code>abc</code></td>
<td>abc</td>
<td>1.31<br>
(2.2e-007s)</td>
<td>1.94<br>
(3.25e-007s)</td>
<td>1.26<br>
(2.1e-007s)</td>
<td>1.24<br>
(2.08e-007s)</td>
<td>3.03<br>
(5.06e-007s)</td>
<td><font color="#008000">1<br>
(1.67e-007s)</font></td>
</tr>
<tr>
<td><code>^([0-9]+)(\-| |$)(.*)$</code></td>
<td>100- this is a line of ftp response which contains a message string</td>
<td>1.52<br>
(6.88e-007s)</td>
<td>2.28<br>
(1.03e-006s)</td>
<td>1.5<br>
(6.78e-007s)</td>
<td>1.5<br>
(6.78e-007s)</td>
<td>329<br>
(0.000149s)</td>
<td><font color="#008000">1<br>
(4.53e-007s)</font></td>
</tr>
<tr>
<td><code>([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td>
<td>1234-5678-1234-456</td>
<td>2.04<br>
(1.03e-006s)</td>
<td>2.83<br>
(1.43e-006s)</td>
<td>2.12<br>
(1.07e-006s)</td>
<td>2.04<br>
(1.03e-006s)</td>
<td>30.8<br>
(1.56e-005s)</td>
<td><font color="#008000">1<br>
(5.05e-007s)</font></td>
</tr>
<tr>
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
<td>john_maddock@compuserve.com</td>
<td>1.48<br>
(1.78e-006s)</td>
<td>2.1<br>
(2.52e-006s)</td>
<td>1.35<br>
(1.62e-006s)</td>
<td>1.32<br>
(1.59e-006s)</td>
<td>165<br>
(0.000198s)</td>
<td><font color="#008000">1<br>
(1.2e-006s)</font></td>
</tr>
<tr>
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
<td>foo12@foo.edu</td>
<td>1.28<br>
(1.41e-006s)</td>
<td>1.9<br>
(2.1e-006s)</td>
<td>1.42<br>
(1.57e-006s)</td>
<td>1.38<br>
(1.53e-006s)</td>
<td>107<br>
(0.000119s)</td>
<td><font color="#008000">1<br>
(1.11e-006s)</font></td>
</tr>
<tr>
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
<td>bob.smith@foo.tv</td>
<td>1.29<br>
(1.43e-006s)</td>
<td>1.9<br>
(2.1e-006s)</td>
<td>1.42<br>
(1.57e-006s)</td>
<td>1.38<br>
(1.53e-006s)</td>
<td>119<br>
(0.000132s)</td>
<td><font color="#008000">1<br>
(1.11e-006s)</font></td>
</tr>
<tr>
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
<td>EH10 2QQ</td>
<td>1.26<br>
(4.63e-007s)</td>
<td>1.77<br>
(6.49e-007s)</td>
<td>1.3<br>
(4.77e-007s)</td>
<td>1.2<br>
(4.4e-007s)</td>
<td>9.15<br>
(3.36e-006s)</td>
<td><font color="#008000">1<br>
(3.68e-007s)</font></td>
</tr>
<tr>
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
<td>G1 1AA</td>
<td><font color="#008000">1.06<br>
(4.73e-007s)</font></td>
<td>1.59<br>
(7.07e-007s)</td>
<td><font color="#008000">1.05<br>
(4.68e-007s)</font></td>
<td><font color="#008000">1<br>
(4.44e-007s)</font></td>
<td>12.9<br>
(5.73e-006s)</td>
<td>1.63<br>
(7.26e-007s)</td>
</tr>
<tr>
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
<td>SW1 1ZZ</td>
<td>1.26<br>
(9.17e-007s)</td>
<td>1.84<br>
(1.34e-006s)</td>
<td>1.28<br>
(9.26e-007s)</td>
<td>1.21<br>
(8.78e-007s)</td>
<td>8.42<br>
(6.11e-006s)</td>
<td><font color="#008000">1<br>
(7.26e-007s)</font></td>
</tr>
<tr>
<td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
<td>4/1/2001</td>
<td>1.57<br>
(9.73e-007s)</td>
<td>2.28<br>
(1.41e-006s)</td>
<td>1.25<br>
(7.73e-007s)</td>
<td>1.26<br>
(7.83e-007s)</td>
<td>11.2<br>
(6.95e-006s)</td>
<td><font color="#008000">1<br>
(6.21e-007s)</font></td>
</tr>
<tr>
<td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
<td>12/12/2001</td>
<td>1.52<br>
(9.56e-007s)</td>
<td>2.06<br>
(1.3e-006s)</td>
<td>1.29<br>
(8.12e-007s)</td>
<td>1.24<br>
(7.83e-007s)</td>
<td>12.4<br>
(7.8e-006s)</td>
<td><font color="#008000">1<br>
(6.3e-007s)</font></td>
</tr>
<tr>
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
<td>123</td>
<td>2.11<br>
(7.35e-007s)</td>
<td>3.18<br>
(1.11e-006s)</td>
<td>2.5<br>
(8.7e-007s)</td>
<td>2.44<br>
(8.5e-007s)</td>
<td>5.26<br>
(1.83e-006s)</td>
<td><font color="#008000">1<br>
(3.49e-007s)</font></td>
</tr>
<tr>
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
<td>+3.14159</td>
<td>1.31<br>
(4.96e-007s)</td>
<td>1.92<br>
(7.26e-007s)</td>
<td>1.26<br>
(4.77e-007s)</td>
<td>1.2<br>
(4.53e-007s)</td>
<td>9.71<br>
(3.66e-006s)</td>
<td><font color="#008000">1<br>
(3.77e-007s)</font></td>
</tr>
<tr>
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
<td>-3.14159</td>
<td>1.32<br>
(4.97e-007s)</td>
<td>1.92<br>
(7.26e-007s)</td>
<td>1.24<br>
(4.67e-007s)</td>
<td>1.2<br>
(4.53e-007s)</td>
<td>9.7<br>
(3.66e-006s)</td>
<td><font color="#008000">1<br>
(3.78e-007s)</font></td>
</tr>
</table>
<br>
<br>
<hr>
<p>Copyright John Maddock April 2003, all rights reserved.</p>
</body>
</html>


@ -0,0 +1,115 @@
/*
*
* Copyright (c) 2003
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE regex_iterator_example_2.cpp
* VERSION see <boost/version.hpp>
* DESCRIPTION: regex_iterator example 2: searches a cpp file for class definitions,
* using global data.
*/
#include <string>
#include <map>
#include <algorithm> // std::for_each
#include <fstream>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;
const char* re =
// possibly leading whitespace:
"^[[:space:]]*"
// possible template declaration:
"(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
// class or struct:
"(class|struct)[[:space:]]*"
// leading declspec macros etc:
"("
"\\<\\w+\\>"
"("
"[[:blank:]]*\\([^)]*\\)"
")?"
"[[:space:]]*"
")*"
// the class name
"(\\<\\w*\\>)[[:space:]]*"
// template specialisation parameters
"(<[^;:{]+>)?[[:space:]]*"
// terminate in { or :
"(\\{|:[^;\\{()]*\\{)";
boost::regex expression(re);
map_type class_index;
bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
   // what[0] contains the whole string
   // what[5] contains the class name.
   // what[6] contains the template specialisation if any.
   // add class name and position to map:
   class_index[what[5].str() + what[6].str()] = what.position(5);
   return true;
}
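void load_file(std::string& s, std::istream& is)
{
   s.erase();
   // in_avail() can return -1 (no characters available), so only use it
   // as a capacity hint when it is positive:
   std::streamsize avail = is.rdbuf()->in_avail();
   if(avail > 0)
      s.reserve(static_cast<std::string::size_type>(avail));
   char c;
   while(is.get(c))
   {
      // grow the buffer in large steps so that appending stays cheap:
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}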
int main(int argc, const char** argv)
{
   std::string text;
   for(int i = 1; i < argc; ++i)
   {
      cout << "Processing file " << argv[i] << endl;
      std::ifstream fs(argv[i]);
      load_file(text, fs);
      // construct our iterators:
      boost::regex_iterator<std::string::const_iterator> m1(text.begin(), text.end(), expression);
      boost::regex_iterator<std::string::const_iterator> m2;
      std::for_each(m1, m2, &regex_callback);
      // copy results:
      cout << class_index.size() << " matches found" << endl;
      map_type::iterator c, d;
      c = class_index.begin();
      d = class_index.end();
      while(c != d)
      {
         cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
         ++c;
      }
      class_index.erase(class_index.begin(), class_index.end());
   }
   return 0;
}


@ -0,0 +1,138 @@
/*
*
* Copyright (c) 1998-2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE regex_replace_example.cpp
* VERSION see <boost/version.hpp>
* DESCRIPTION: regex_replace example:
* converts a C++ file to syntax highlighted HTML.
*/
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>
// purpose:
// takes the contents of a file and transforms them into
// syntax highlighted code in HTML format
boost::regex e1, e2;
extern const char* expression_text;
extern const char* format_string;
extern const char* pre_expression;
extern const char* pre_format;
extern const char* header_text;
extern const char* footer_text;
void load_file(std::string& s, std::istream& is)
{
s.erase();
s.reserve(is.rdbuf()->in_avail());
char c;
while(is.get(c))
{
if(s.capacity() == s.size())
s.reserve(s.capacity() * 3);
s.append(1, c);
}
}
int main(int argc, const char** argv)
{
try{
e1.assign(expression_text);
e2.assign(pre_expression);
for(int i = 1; i < argc; ++i)
{
std::cout << "Processing file " << argv[i] << std::endl;
std::ifstream fs(argv[i]);
std::string in;
load_file(in, fs);
std::string out_name = std::string(argv[i]) + std::string(".htm");
std::ofstream os(out_name.c_str());
os << header_text;
// strip '<' and '>' first by outputting to a
// temporary string stream
std::ostringstream t(std::ios::out | std::ios::binary);
std::ostream_iterator<char> oi(t);
boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format, boost::match_default | boost::format_all);
// then output to final output stream
// adding syntax highlighting:
std::string s(t.str());
std::ostream_iterator<char> out(os);
boost::regex_replace(out, s.begin(), s.end(), e1, format_string, boost::match_default | boost::format_all);
os << footer_text;
}
}
catch(...)
{ return -1; }
return 0;
}
// definitions of the patterns declared extern above:
const char* pre_expression = "(<)|(>)|\\r";
const char* pre_format = "(?1&lt;)(?2&gt;)";
const char* expression_text = // preprocessor directives: index 1
"(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
// comment: index 2
"(//[^\\n]*|/\\*.*?\\*/)|"
// literals: index 3
"\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
// string literals: index 4
"('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
// keywords: index 5
"\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
"|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
"|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
"|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
"|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
"|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
"|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
"|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
"|using|virtual|void|volatile|wchar_t|while)\\>"
;
const char* format_string = "(?1<font color=\"#008040\">$&</font>)"
"(?2<I><font color=\"#000080\">$&</font></I>)"
"(?3<font color=\"#0000A0\">$&</font>)"
"(?4<font color=\"#0000FF\">$&</font>)"
"(?5<B>$&</B>)";
const char* header_text = "<HTML>\n<HEAD>\n"
"<TITLE>Auto-generated html formatted source</TITLE>\n"
"<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
"</HEAD>\n"
"<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
"<P> </P>\n<PRE>";
const char* footer_text = "</PRE>\n</BODY>\n</HTML>\n";


@ -0,0 +1,75 @@
/*
*
* Copyright (c) 2003
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE regex_token_iterator_example_1.cpp
* VERSION see <boost/version.hpp>
* DESCRIPTION: regex_token_iterator example: split a string into tokens.
*/
#include <boost/regex.hpp>
#include <iostream>
#include <string>
using namespace std;
#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
//
// problem with std::getline under MSVC6sp3
istream& getline(istream& is, std::string& s)
{
s.erase();
char c;
// read up to the newline; stop cleanly at end of input
while(is.get(c) && c != '\n')
{
s.append(1, c);
}
return is;
}
#endif
int main(int argc, char**)
{
string s;
do{
if(argc == 1)
{
cout << "Enter text to split (or \"quit\" to exit): ";
getline(cin, s);
if(s == "quit") break;
}
else
s = "This is a string of tokens";
boost::regex re("\\s+");
boost::regex_token_iterator<std::string::const_iterator> i(s.begin(), s.end(), re, -1);
boost::regex_token_iterator<std::string::const_iterator> j;
unsigned count = 0;
while(i != j)
{
cout << *i++ << endl;
count++;
}
cout << "There were " << count << " tokens found." << endl;
}while(argc == 1);
return 0;
}


@ -0,0 +1,92 @@
/*
*
* Copyright (c) 2003
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE regex_token_iterator_example_2.cpp
* VERSION see <boost/version.hpp>
* DESCRIPTION: regex_token_iterator example: print linked URLs.
*/
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
boost::regex::normal | boost::regbase::icase);
void load_file(std::string& s, std::istream& is)
{
s.erase();
//
// attempt to grow string buffer to match file size,
// this doesn't always work...
s.reserve(is.rdbuf()->in_avail());
char c;
while(is.get(c))
{
// grow the buffer geometrically, in case
// in_avail (above) returned zero:
if(s.capacity() == s.size())
s.reserve(s.capacity() * 3);
s.append(1, c);
}
}
int main(int argc, char** argv)
{
std::string s;
int i;
for(i = 1; i < argc; ++i)
{
std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
s.erase();
std::ifstream is(argv[i]);
load_file(s, is);
boost::regex_token_iterator<std::string::const_iterator>
i(s.begin(), s.end(), e, 1);
boost::regex_token_iterator<std::string::const_iterator> j;
while(i != j)
{
std::cout << *i++ << std::endl;
}
}
//
// alternative method:
// test the array-literal constructor, and split out the whole
// match as well as $1....
//
for(i = 1; i < argc; ++i)
{
std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
s.erase();
std::ifstream is(argv[i]);
load_file(s, is);
const int subs[] = {1, 0,};
boost::regex_token_iterator<std::string::const_iterator>
i(s.begin(), s.end(), e, subs);
boost::regex_token_iterator<std::string::const_iterator> j;
while(i != j)
{
std::cout << *i++ << std::endl;
}
}
return 0;
}

faq.htm

@ -1,205 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++ - FAQ</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, FAQ.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<p><font color="#FF0000">Q. Why does using parentheses in a
regular expression change the result of a match?</font></p>
<p>A. Parentheses don't only mark; they determine what the best
match is as well. regex++ tries to follow the POSIX standard
leftmost longest rule for determining what matched. So if there
is more than one possible match after considering the whole
expression, it looks next at the first sub-expression and then
the second sub-expression and so on. So...</p>
<pre>&quot;(0*)([0-9]*)&quot; against &quot;00123&quot; would produce
$1 = &quot;00&quot;
$2 = &quot;123&quot;</pre>
<p>whereas</p>
<pre>&quot;0*([0-9]*)&quot; against &quot;00123&quot; would produce
$1 = &quot;00123&quot;</pre>
<p>If you think about it, had $1 only matched the &quot;123&quot;,
this would be &quot;less good&quot; than the match &quot;00123&quot;
which is both further to the left and longer. If you want $1 to
match only the &quot;123&quot; part, then you need to use
something like:</p>
<pre>&quot;0*([1-9][0-9]*)&quot;</pre>
<p>as the expression.</p>
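<p>A minimal sketch of that last expression, assuming the C++11
std::regex classes as a stand-in for regex++ (the function name
significant_digits is hypothetical). For this particular expression
there is only one way to divide the match, so the result does not
depend on which matching rules the engine follows:</p>

```cpp
#include <regex>
#include <string>

// Returns the text captured by $1 for "0*([1-9][0-9]*)", or an empty
// string when the whole input is not matched.
std::string significant_digits(const std::string& text)
{
    static const std::regex re("0*([1-9][0-9]*)");
    std::smatch what;
    if (std::regex_match(text, what, re))
        return what[1].str();
    return std::string();
}
```

<p>Called with &quot;00123&quot;, the function returns &quot;123&quot;:
the leading zeros are consumed by 0* and never reach $1.</p>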
<p><font color="#FF0000">Q. Configure says that my compiler is
unable to merge template instances, what does this mean?</font> </p>
<p>A. When you compile template code, you can end up with the
same template instances in multiple translation units - this will
lead to link time errors unless your compiler/linker is smart
enough to merge these template instances into a single record in
the executable file. If you see this warning after running
configure, then you can still link to libregex++.a if: </p>
<ol>
<li>You use only the low-level template classes (reg_expression&lt;&gt;
match_results&lt;&gt; etc), from a single translation
unit, and use no other part of regex++.</li>
<li>You use only the POSIX API functions (regcomp regexec etc),
and no other part of regex++.</li>
<li>You use only the high level class RegEx, and no other
part of regex++. </li>
</ol>
<p>Another option is to create a master include file, which
#include's all the regex++ source files, and all the source files
in which you use regex++. You then compile and link this master
file as a single translation unit. </p>
<p><font color="#FF0000">Q. Configure says that my compiler is
unable to merge template instances from archive files, what does
this mean?</font> </p>
<p>A. When you compile template code, you can end up with the
same template instances in multiple translation units - this will
lead to link time errors unless your compiler/linker is smart
enough to merge these template instances into a single record in
the executable file. Some compilers are able to do this for
normal .cpp or .o files, but fail if the object file has been
placed in a library archive. If you see this warning after
running configure, then you can still link to libregex++.a if: </p>
<ol>
<li>You use only the low-level template classes (reg_expression&lt;&gt;
match_results&lt;&gt; etc), and use no other part of
regex++.</li>
<li>You use only the POSIX API functions (regcomp regexec etc),
and no other part of regex++.</li>
<li>You use only the high level class RegEx, and no other
part of regex++. </li>
</ol>
<p>Another option is to add the regex++ source files directly to
your project instead of linking to libregex++.a. Generally you
should do this only if you are getting link time errors with
libregex++.a. </p>
<p><font color="#FF0000">Q. Configure says that my compiler can't
merge templates containing switch statements, what does this
mean?</font> </p>
<p>A. Some compilers can't merge templates that contain static
data - this includes switch statements which implicitly generate
static data as well as code. Principally this affects the egcs
compiler - but note gcc 2.81 also suffers from this problem - the
compiler will compile and link the code - but the code will not
run because the code and the static data it uses have become
separated. The default behaviour of regex++ is to try and fix
this problem by declaring &quot;problem&quot; templates inside
unnamed namespaces, so that the templates have internal linkage.
Note that this can result in a great deal of code bloat. If the
compiler doesn't support namespaces, or if code bloat becomes a
problem, then follow the guidelines above for placing all the
templates used in a single translation unit, and edit boost/regex/config.hpp
so that BOOST_REGEX_NO_TEMPLATE_SWITCH_MERGE is no longer defined.
</p>
<p><font color="#FF0000">Q. I can't get regex++ to work with
escape characters, what's going on?</font> </p>
<p>A. If you embed regular expressions in C++ code, then remember
that escape characters are processed twice: once by the C++
compiler, and once by the regex++ expression compiler, so to pass
the regular expression \d+ to regex++, you need to embed &quot;\\d+&quot;
in your code. Likewise to match a literal backslash you will need
to embed &quot;\\\\&quot; in your code. </p>
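<p>The double processing can be sketched as follows, assuming the
C++11 std::regex classes as a stand-in for regex++ (the two function
names are hypothetical). The C++ compiler strips one level of
backslashes before the expression compiler ever sees the string:</p>

```cpp
#include <regex>
#include <string>

// "\\d+" in source code is the three-character expression \d+ once
// the C++ compiler has processed the string literal.
bool is_all_digits(const std::string& text)
{
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// "\\\\" in source code is the two-character expression made of two
// backslashes, which the regex compiler reads as one literal backslash.
bool contains_backslash(const std::string& text)
{
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```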
<p><font color="#FF0000">Q. Why don't character ranges work
properly?</font> <br>
A. The POSIX standard specifies that character range expressions
are locale sensitive - so for example the expression [A-Z] will
match any collating element that collates between 'A' and 'Z'.
That means that for most locales other than &quot;C&quot; or
&quot;POSIX&quot;, [A-Z] would match the single character 't' for
example, which is not what most people expect - or at least not
what most people have come to expect from regular expression
engines. For this reason, the default behaviour of regex++ is to
turn locale sensitive collation off by setting the regbase::nocollate
compile time flag (this is set by regbase::normal). However, if
you set a non-default compile time flag - for example regbase::extended
or regbase::basic - then locale dependent collation will be
enabled. This also applies to the POSIX API functions, which use
either regbase::extended or regbase::basic internally; in that
case use REG_NOCOLLATE in combination with either
REG_BASIC or REG_EXTENDED when invoking regcomp if you don't want
locale sensitive collation. <i>[Note - when regbase::nocollate is in
effect, the library behaves &quot;as if&quot; the LC_COLLATE
locale category were always &quot;C&quot;, regardless of what it is
actually set to - end note</i>]. </p>
<p><font color="#FF0000">Q. Why can't I use the &quot;convenience&quot;
versions of query_match/reg_search/reg_grep/reg_format/reg_merge?</font>
</p>
<p>A. These versions may or may not be available depending upon
the capabilities of your compiler, the rules determining the
format of these functions are quite complex - and only the
versions visible to a standard compliant compiler are given in
the help. To find out what your compiler supports, run &lt;boost/regex.hpp&gt;
through your C++ pre-processor, and search the output file for
the function that you are interested in. </p>
<p><font color="#FF0000">Q. Why are there no throw specifications
on any of the functions? What exceptions can the library throw?</font>
</p>
<p>A. Not all compilers support (or honor) throw specifications,
others support them but with reduced efficiency. Throw
specifications may be added at a later date as compilers begin to
handle this better. The library should throw only three types of
exception: boost::bad_expression can be thrown by reg_expression
when compiling a regular expression, std::runtime_error can be
thrown when a call to reg_expression::imbue tries to open a
message catalogue that doesn't exist or when a call to RegEx::GrepFiles
or RegEx::FindFiles tries to open a file that cannot be opened,
finally std::bad_alloc can be thrown by just about any of the
functions in this library. </p>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>


@ -1,243 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, Format String Reference</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Format
String Reference.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="format_string"></a>Format String Syntax</h3>
<p>Format strings are used by the algorithms <a
href="template_class_ref.htm#reg_format">regex_format</a> and <a
href="template_class_ref.htm#reg_merge">regex_merge</a>, and are
used to transform one string into another. </p>
<p>There are three kinds of format string: sed, perl and extended.
The extended syntax is the default, so it is covered first. </p>
<p><b><i>Extended format syntax</i></b> </p>
<p>In format strings, all characters are treated as literals
except: ()$\?: </p>
<p>To use any of these as literals you must prefix them with the
escape character \ </p>
<p>The following special sequences are recognized: <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Grouping:</i> </p>
<p>Use the parenthesis characters ( and ) to group sub-expressions
within the format string, use \( and \) to represent literal '('
and ')'. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Sub-expression expansions:</i> </p>
<p>The following perl like expressions expand to a particular
matched sub-expression: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$`</td>
<td valign="top" width="43%">Expands to all the text from
the end of the previous match to the start of the current
match, if there was no previous match in the current
operation, then everything from the start of the input
string to the start of the match.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$'</td>
<td valign="top" width="43%">Expands to all the text from
the end of the match to the end of the input string.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$&amp;</td>
<td valign="top" width="43%">Expands to all of the
current match.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$0</td>
<td valign="top" width="43%">Expands to all of the
current match.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$N</td>
<td valign="top" width="43%">Expands to the text that
matched sub-expression <i>N</i>.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
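<p>A minimal sketch of $N and $&amp; expansion, assuming the C++11
std::regex_replace function as a stand-in for regex_merge; its default
format syntax accepts the same $N and $&amp; sequences. The function
names swap_words and bracket_numbers are hypothetical:</p>

```cpp
#include <regex>
#include <string>

// Swaps each pair of whitespace-separated words: $1 and $2 expand to
// the text matched by the first and second sub-expressions.
std::string swap_words(const std::string& text)
{
    static const std::regex re("(\\w+)\\s+(\\w+)");
    return std::regex_replace(text, re, "$2 $1");
}

// Wraps every run of digits in brackets: $& expands to the whole of
// the current match.
std::string bracket_numbers(const std::string& text)
{
    static const std::regex re("[0-9]+");
    return std::regex_replace(text, re, "[$&]");
}
```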
<p><i>Conditional expressions:</i> </p>
<p>Conditional expressions allow two different format strings to
be selected dependent upon whether a sub-expression participated
in the match or not: </p>
<p>?Ntrue_expression:false_expression </p>
<p>Executes true_expression if sub-expression <i>N</i>
participated in the match, otherwise executes false_expression. </p>
<p>Example: suppose we search for &quot;(while)|(for)&quot; then
the format string &quot;?1WHILE:FOR&quot; would output what
matched, but in upper case. <br>
&nbsp; <br>
&nbsp; </p>
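<p>The effect of this conditional can be sketched in code. The sketch
assumes the C++11 std::regex classes as a stand-in for regex++;
std::regex has no ?N format syntax, so the choice is made by hand by
testing whether sub-expression 1 participated in the match. The
function name upper_case_keywords is hypothetical:</p>

```cpp
#include <regex>
#include <string>

// Emulates the format string "?1WHILE:FOR" for the expression
// "(while)|(for)": emit WHILE when sub-expression 1 participated in
// the match, FOR otherwise. Unmatched text is copied through unchanged.
std::string upper_case_keywords(const std::string& input)
{
    static const std::regex re("(while)|(for)");
    std::string out;
    std::string::const_iterator last = input.begin();
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it)
    {
        const std::smatch& m = *it;
        out.append(last, m[0].first);            // text before the match
        out += m[1].matched ? "WHILE" : "FOR";   // the conditional itself
        last = m[0].second;
    }
    out.append(last, input.end());               // text after the last match
    return out;
}
```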
<p><i>Escape sequences:</i> </p>
<p>The following escape sequences are also allowed: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\a</td>
<td valign="top" width="43%">The bell character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\f</td>
<td valign="top" width="43%">The form feed character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\n</td>
<td valign="top" width="43%">The newline character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\r</td>
<td valign="top" width="43%">The carriage return
character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\t</td>
<td valign="top" width="43%">The tab character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\v</td>
<td valign="top" width="43%">A vertical tab character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\x</td>
<td valign="top" width="43%">A hexadecimal character -
for example \x0D.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\x{}</td>
<td valign="top" width="43%">A possible unicode
hexadecimal character - for example \x{1A0}</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\cx</td>
<td valign="top" width="43%">The ASCII escape character
x, for example \c@ is equivalent to escape-@.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\e</td>
<td valign="top" width="43%">The ASCII escape character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\dd</td>
<td valign="top" width="43%">An octal character constant,
for example \10.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><b><i>Perl format strings</i></b> </p>
<p>Perl format strings are the same as the default syntax except
that the characters ()?: have no special meaning. </p>
<p><b><i>Sed format strings</i></b> </p>
<p>Sed format strings use only the characters \ and &amp; as
special characters. </p>
<p>\n where n is a digit, is expanded to the nth sub-expression. </p>
<p>&amp; is expanded to the whole of the match (equivalent to \0).
</p>
<p>Other escape sequences are expanded as per the default syntax.
<br>
</p>
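<p>A minimal sketch of sed-style expansion, assuming the C++11
std::regex_replace function with the std::regex_constants::format_sed
flag as a stand-in for the regex++ sed format. The function names
reorder_date and bracket_digits are hypothetical:</p>

```cpp
#include <regex>
#include <string>

// Rewrites "YYYY-MM-DD" as "DD/MM/YYYY": in a sed format string, \n
// expands to the nth sub-expression.
std::string reorder_date(const std::string& text)
{
    static const std::regex re("([0-9]+)-([0-9]+)-([0-9]+)");
    return std::regex_replace(text, re, "\\3/\\2/\\1",
                              std::regex_constants::format_sed);
}

// Wraps each run of digits in brackets: in a sed format string, &
// expands to the whole of the match.
std::string bracket_digits(const std::string& text)
{
    static const std::regex re("[0-9]+");
    return std::regex_replace(text, re, "[&]",
                              std::regex_constants::format_sed);
}
```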
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>


@ -1,572 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, RegEx Class Reference</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, RegEx Class
Reference. </h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="RegEx"></a><i>Class RegEx</i></h3>
<p>#include &lt;boost/cregex.hpp&gt; </p>
<p>The class RegEx provides a high level simplified interface to
the regular expression library. This class only handles narrow
character strings, and regular expressions always follow the
&quot;normal&quot; syntax - that is the same as the standard
POSIX extended syntax, but with locale specific collation
disabled, and with escape characters allowed inside character set
declarations. </p>
<pre><b>typedef</b> <b>bool</b> (*GrepCallback)(<b>const</b> RegEx&amp; expression);
<b>typedef</b> <b>bool</b> (*GrepFileCallback)(<b>const</b> <b>char</b>* file, <b>const</b> RegEx&amp; expression);
<b>typedef</b> <b>bool</b> (*FindFilesCallback)(<b>const</b> <b>char</b>* file);
<b>class</b>&nbsp; RegEx
{
<b>public</b>:
&nbsp;&nbsp; RegEx();
&nbsp;&nbsp; RegEx(<b>const</b> RegEx&amp; o);
&nbsp;&nbsp; ~RegEx();
&nbsp;&nbsp; RegEx(<b>const</b> <b>char</b>* c, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; <strong>explicit</strong> RegEx(<b>const</b> std::string&amp; s, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; RegEx&amp; <b>operator</b>=(<b>const</b> RegEx&amp; o);
&nbsp;&nbsp; RegEx&amp; <b>operator</b>=(<b>const</b> <b>char</b>* p);
&nbsp;&nbsp; RegEx&amp; <b>operator</b>=(<b>const</b> std::string&amp; s);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> SetExpression(<b>const</b> std::string&amp; s, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; std::string Expression()<b>const</b>;
&nbsp;&nbsp; <font color="#000080"><i>//
</i>&nbsp;&nbsp;<i>// now matching operators: </i>
&nbsp;&nbsp; <i>// </i></font>
&nbsp;&nbsp; <b>bool</b> Match(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>bool</b> Match(<b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>bool</b> Search(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>bool</b> Search(<b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;<b>unsigned</b> <b>int</b>&gt;&amp; v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;<b>unsigned</b> <b>int</b>&gt;&amp; v, <b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> std::string&amp; files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> std::string&amp; files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; std::string Merge(<b>const</b> std::string&amp; in, <b>const</b> std::string&amp; fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; std::string Merge(<b>const</b> char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned int </b>flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> Split(std::vector&lt;std::string&gt;&amp; v, std::string&amp; s, <b>unsigned</b> flags = match_default, <b>unsigned</b> max_count = ~0);
&nbsp;&nbsp; <font color="#000080"><i>//
</i>&nbsp;&nbsp; <i>// now operators for returning what matched in more detail:
</i>&nbsp;&nbsp; <i>//
</i></font>&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Position(<b>int</b> i = 0)<b>const</b>;
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Length(<b>int</b> i = 0)<b>const</b>;
&nbsp;&nbsp; <strong>bool</strong> Matched(<strong>int</strong> i = 0)<strong>const</strong>;
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Line()<b>const</b>;
&nbsp;&nbsp; <b>unsigned int</b> Marks() const;
&nbsp;&nbsp; std::string What(<b>int</b> i)<b>const</b>;
&nbsp;&nbsp; std::string <b>operator</b>[](<b>int</b> i)<b>const</b> ;
&nbsp;&nbsp; <strong>static const unsigned int</strong> npos;
}; &nbsp; &nbsp; </pre>
<p>Member functions for class RegEx are defined as follows: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx();</td>
<td valign="top" width="42%">Default constructor,
constructs an instance of RegEx without any valid
expression.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx(<b>const</b>
RegEx&amp; o);</td>
<td valign="top" width="42%">Copy constructor, all the
properties of parameter <i>o</i> are copied.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx(<b>const</b> <b>char</b>*
c, <b>bool</b> icase = <b>false</b>);</td>
<td valign="top" width="42%">Constructs an instance of
RegEx, setting the expression to <i>c</i>, if <i>icase</i>
is <i>true</i> then matching is insensitive to case,
otherwise it is sensitive to case. Throws <i>bad_expression</i>
on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx(<b>const</b> std::string&amp;
s, <b>bool</b> icase = <b>false</b>);</td>
<td valign="top" width="42%">Constructs an instance of
RegEx, setting the expression to <i>s</i>, if <i>icase </i>is
<i>true</i> then matching is insensitive to case,
otherwise it is sensitive to case. Throws <i>bad_expression</i>
on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx&amp; <b>operator</b>=(<b>const</b>
RegEx&amp; o);</td>
<td valign="top" width="42%">Default assignment operator.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx&amp; <b>operator</b>=(<b>const</b>
<b>char</b>* p);</td>
<td valign="top" width="42%">Assignment operator,
equivalent to calling <i>SetExpression(p, false).</i>
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx&amp; <b>operator</b>=(<b>const</b>
std::string&amp; s);</td>
<td valign="top" width="42%">Assignment operator,
equivalent to calling <i>SetExpression(s, false).</i>
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);</td>
<td valign="top" width="42%">Sets the current expression
to <i>p</i>, if <i>icase</i> is <i>true</i> then matching
is insensitive to case, otherwise it is sensitive to case.
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
SetExpression(<b>const</b> std::string&amp; s, <b>bool</b>
icase = <b>false</b>);</td>
<td valign="top" width="42%">Sets the current expression
to <i>s</i>, if <i>icase</i> is <i>true</i> then matching
is insensitive to case, otherwise it is sensitive to case.
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string Expression()<b>const</b>;</td>
<td valign="top" width="42%">Returns a copy of the
current regular expression.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Attempts to match the
current expression against the text <i>p</i> using the
match flags <i>flags</i> - see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the expression matches the whole
of the input string.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default) ;</td>
<td valign="top" width="42%">Attempts to match the
current expression against the text <i>s</i> using the
match flags <i>flags</i> - see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the expression matches the whole
of the input string.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Attempts to find a match for
the current expression somewhere in the text <i>p</i>
using the match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the match succeeds.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default) ;</td>
<td valign="top" width="42%">Attempts to find a match for
the current expression somewhere in the text <i>s</i>
using the match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the match succeeds.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>p</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match found calls the call-back function <i>cb</i>
as: cb(*this); <p>If at any stage the call-back function
returns false then the grep operation terminates,
otherwise continues until no further matches are found.
Returns the number of matches found.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(GrepCallback cb, <b>const</b> std::string&amp; s, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>s</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match found calls the call-back function <i>cb</i>
as: cb(*this); <p>If at any stage the call-back function
returns false then the grep operation terminates,
otherwise continues until no further matches are found.
Returns the number of matches found. </p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b> <b>char</b>*
p, <b>unsigned</b> <b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>p</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes a copy of what matched onto <i>v</i>.
Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>s</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes a copy of what matched onto <i>v</i>.
Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;<b>unsigned int</b>&gt;&amp; v, <b>const</b>
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>p</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes the starting index of what matched
onto <i>v</i>. Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;<b>unsigned int</b>&gt;&amp; v, <b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>s</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes the starting index of what matched
onto <i>v</i>. Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>*
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the files <i>files</i> using the
match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match calls the call-back function cb.&nbsp; <p>If
the call-back returns false then the algorithm returns
without considering further matches in the current file,
or any further files.&nbsp; </p>
<p>The parameter <i>files</i> can include wild card
characters '*' and '?', if the parameter <i>recurse</i>
is true then searches sub-directories for matching file
names.&nbsp; </p>
<p>Returns the total number of matches found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
GrepFiles(GrepFileCallback cb, <b>const</b> std::string&amp;
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the files <i>files</i> using the
match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match calls the call-back function cb.&nbsp; <p>If
the call-back returns false then the algorithm returns
without considering further matches in the current file,
or any further files.&nbsp; </p>
<p>The parameter <i>files</i> can include wild card
characters '*' and '?', if the parameter <i>recurse</i>
is true then searches sub-directories for matching file
names.&nbsp; </p>
<p>Returns the total number of matches found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>*
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Searches <i>files</i> to
find all those which contain at least one match of the
current expression using the match flags <i>flags </i>-
see <a href="template_class_ref.htm#match_type">match
flags</a>. For each matching file calls the call-back
function cb.&nbsp; <p>If the call-back returns false then
the algorithm returns without considering any further
files.&nbsp; </p>
<p>The parameter <i>files</i> can include wild card
characters '*' and '?', if the parameter <i>recurse</i>
is true then searches sub-directories for matching file
names.&nbsp; </p>
<p>Returns the total number of files found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
FindFiles(FindFilesCallback cb, <b>const</b> std::string&amp;
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Searches <i>files</i> to
find all those which contain at least one match of the
current expression using the match flags <i>flags </i>-
see <a href="template_class_ref.htm#match_type">match
flags</a>. For each matching file calls the call-back
function cb.&nbsp; <p>If the call-back returns false then
the algorithm returns without considering any further
files.&nbsp; </p>
<p>The parameter <i>files</i> can include wild card
characters '*' and '?', if the parameter <i>recurse</i>
is true then searches sub-directories for matching file
names.&nbsp; </p>
<p>Returns the total number of files found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string Merge(<b>const</b>
std::string&amp; in, <b>const</b> std::string&amp; fmt, <b>bool</b>
copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Performs a search and
replace operation: searches through the string <i>in</i>
for all occurrences of the current expression, for each
occurrence replaces the match with the format string <i>fmt</i>.
Uses <i>flags</i> to determine what gets matched, and how
the format string should be treated. If <i>copy</i> is
true then all unmatched sections of input are copied
unchanged to output, if the flag <em>format_first_only</em>
is set then only the first occurrence of the pattern found
is replaced. Returns the new string. See also <a
href="format_string.htm#format_string">format string
syntax</a>, <a href="template_class_ref.htm#match_type">match
flags</a> and <a
href="template_class_ref.htm#format_flags">format flags</a>.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string Merge(<b>const</b>
char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>,
<b>unsigned int </b>flags = match_default);</td>
<td valign="top" width="42%">Performs a search and
replace operation: searches through the string <i>in</i>
for all occurrences of the current expression, for each
occurrence replaces the match with the format string <i>fmt</i>.
Uses <i>flags</i> to determine what gets matched, and how
the format string should be treated. If <i>copy</i> is
true then all unmatched sections of input are copied
unchanged to output, if the flag <em>format_first_only</em>
is set then only the first occurrence of the pattern found
is replaced. Returns the new string. See also <a
href="format_string.htm#format_string">format string
syntax</a>, <a href="template_class_ref.htm#match_type">match
flags</a> and <a
href="template_class_ref.htm#format_flags">format flags</a>.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top"><b>unsigned</b> Split(std::vector&lt;std::string&gt;&amp;
v, std::string&amp; s, <b>unsigned</b> flags =
match_default, <b>unsigned</b> max_count = ~0);</td>
<td valign="top">Splits the input string and pushes each
resulting string onto the vector <i>v</i>. If the expression contains no marked
sub-expressions, then one string is output for each
section of the input that does not match the expression.
If the expression does contain marked sub-expressions,
then outputs one string for each marked sub-expression
each time a match occurs. Outputs no more than <i>max_count
</i>strings. Before returning, deletes from the input
string <i>s</i> all of the input that has been processed
(all of the string if <i>max_count</i> was not reached).
Returns the number of strings pushed onto the vector.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Position(<b>int</b> i = 0)<b>const</b>;</td>
<td valign="top" width="42%">Returns the position of what
matched sub-expression <i>i</i>. If <i>i = 0</i> then
returns the position of the whole match. Returns RegEx::npos
if the supplied index is invalid, or if the specified sub-expression
did not participate in the match.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Length(<b>int</b> i = 0)<b>const</b>;</td>
<td valign="top" width="42%">Returns the length of what
matched sub-expression <i>i</i>. If <i>i = 0</i> then
returns the length of the whole match. Returns RegEx::npos
if the supplied index is invalid, or if the specified sub-expression
did not participate in the match.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td><strong>bool</strong> Matched(<strong>int</strong> i
= 0)<strong>const</strong>;</td>
<td>Returns true if sub-expression <em>i</em> was
matched, false otherwise.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Line()<b>const</b>;</td>
<td valign="top" width="42%">Returns the line on which
the match occurred; line numbers start from 1, not zero. If no
match occurred then returns RegEx::npos.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned int</b> Marks()
const;</td>
<td valign="top" width="42%">Returns the number of marked
sub-expressions contained in the expression. Note that
this includes the whole match (sub-expression zero), so
the value returned is always &gt;= 1.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string What(<b>int</b>
i)<b>const</b>;</td>
<td valign="top" width="42%">Returns a copy of what
matched sub-expression <i>i</i>. If <i>i = 0</i> then
returns a copy of the whole match. Returns a null string
if the index is invalid or if the specified sub-expression
did not participate in a match.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string <b>operator</b>[](<b>int</b>
i)<b>const</b> ;</td>
<td valign="top" width="42%">Returns <i>What(i)</i>. <p>Can
be used to simplify access to sub-expression matches, and
make usage more Perl-like.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
</table>
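<p>The difference between whole-input matching (<i>Match</i>), searching
(<i>Search</i>) and collecting every match (<i>Grep</i> with a vector of
strings) can be sketched as follows. This is only an illustration: it uses
the standard library's std::regex as a stand-in rather than the RegEx class
itself, and the function names match_whole, search_anywhere and grep_all are
hypothetical helpers, not part of this library.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Stand-in for the behaviour described for Match: true only if the
// expression matches the whole of the input text.
bool match_whole(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern);
    return std::regex_match(text, e);
}

// Stand-in for the behaviour described for Search: true if the
// expression matches somewhere inside the input text.
bool search_anywhere(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern);
    return std::regex_search(text, e);
}

// Stand-in for the behaviour described for Grep with a string vector:
// pushes a copy of each match onto out and returns the number of matches.
std::size_t grep_all(const std::string& pattern, const std::string& text,
                     std::vector<std::string>& out)
{
    const std::regex e(pattern);
    std::size_t count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it) {
        out.push_back(it->str());
        ++count;
    }
    return count;
}
```

<p>With the pattern [0-9]+, match_whole accepts "12345" but rejects
"abc 12345", whereas search_anywhere accepts both.</p>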
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>

150
index.htm
View File

@ -1,150 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="keywords"
content="regex++, regular expressions, regular expression library, C++">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>regex++, Index</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="277" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Index.</h3>
<p align="left"><i>(Version 3.31, 16th Dec 2001)</i>&nbsp;
</p>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3 align="center">Contents</h3>
<ul>
<li><a href="introduction.htm#intro">Introduction</a></li>
<li><a href="introduction.htm#Installation">Installation and
Configuration</a> </li>
<li><a href="template_class_ref.htm#regbase">Template Class
and Algorithm Reference</a> <ul>
<li>Class <a href="template_class_ref.htm#regbase">regbase</a></li>
<li>Class <a
href="template_class_ref.htm#bad_expression">bad_expression</a>
</li>
<li>Class <a
href="template_class_ref.htm#reg_expression">reg_expression</a>
</li>
<li>Class <a
href="template_class_ref.htm#regex_char_traits">char_regex_traits</a></li>
<li>Class <a href="template_class_ref.htm#reg_match">match_results</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#query_match">regex_match</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#reg_search">regex_search</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#reg_grep">regex_grep</a></li>
<li>Algorithm <a
href="template_class_ref.htm#reg_format">regex_format</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#reg_merge">regex_merge</a></li>
<li>Algorithm <a
href="template_class_ref.htm#regex_split">regex_split</a>
</li>
<li><a href="template_class_ref.htm#partial_matches">Partial
regular expression matches</a></li>
</ul>
</li>
<li>Class <a href="hl_ref.htm#RegEx">RegEx</a> reference</li>
<li><a href="posix_ref.htm#posix">POSIX Compatibility
Functions</a></li>
<li><a href="syntax.htm#syntax">Regular Expression Syntax</a></li>
<li><a href="format_string.htm#format_string">Format String
Syntax</a></li>
<li><a href="appendix.htm#implementation">Appendices</a> <ul>
<li><a href="appendix.htm#implementation">Implementation
notes</a></li>
<li><a href="appendix.htm#threads">Thread safety</a></li>
<li><a href="appendix.htm#localisation">Localization</a></li>
<li><a href="appendix.htm#demos">Example Applications</a>
<ul>
<li><a
href="example/snippets/regex_match_example.cpp">regex_match_example.cpp</a>:
ftp based regex_match example.</li>
<li><a
href="example/snippets/regex_search_example.cpp">regex_search_example.cpp</a>:
regex_search example: searches a cpp file
for class definitions.</li>
<li><a
href="example/snippets/regex_grep_example_1.cpp">regex_grep_example_1.cpp</a>:
regex_grep example 1: searches a cpp file
for class definitions.</li>
<li><a
href="example/snippets/regex_merge_example.cpp">regex_merge_example.cpp</a>:
regex_merge example: converts a C++ file
to syntax highlighted HTML.</li>
<li><a
href="example/snippets/regex_grep_example_2.cpp">regex_grep_example_2.cpp</a>:
regex_grep example 2: searches a cpp file
for class definitions, using a global
callback function. </li>
<li><a
href="example/snippets/regex_grep_example_3.cpp">regex_grep_example_3.cpp</a>:
regex_grep example 3: searches a cpp file
for class definitions, using a bound
member function callback.</li>
<li><a
href="example/snippets/regex_grep_example_4.cpp">regex_grep_example_4.cpp</a>:
regex_grep example 4: searches a cpp file
for class definitions, using a C++
Builder closure as a callback.</li>
<li><a
href="example/snippets/regex_split_example_1.cpp">regex_split_example_1.cpp</a>:
regex_split example: split a string into
tokens.</li>
<li><a
href="example/snippets/regex_split_example_2.cpp">regex_split_example_2.cpp</a>:
regex_split example: spit out linked
URLs.</li>
</ul>
</li>
<li><a href="appendix.htm#headers">Header Files.</a></li>
<li><a href="appendix.htm#redist">Redistributables</a></li>
<li><a href="appendix.htm#upgrade">Note for upgraders</a></li>
</ul>
</li>
<li><a href="appendix.htm#furtherInfo">Further Information (Contacts
and Acknowledgements)</a></li>
<li><a href="faq.htm">FAQ</a></li>
</ul>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
</body>
</html>

9
index.html Normal file
View File

@ -0,0 +1,9 @@
<html>
<head>
<meta http-equiv="refresh" content="0; URL=doc/index.html">
</head>
<body>
Automatic redirection failed, please go to <A href="doc/index.html">doc/index.html</A>.
</body>
</html>

View File

@ -1,476 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="keywords"
content="regex++, regular expressions, regular expression library, C++">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>regex++, Introduction</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Introduction.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="intro"></a><i>Introduction</i></h3>
<p>Regular expressions are a form of pattern-matching that is
often used in text processing; many users will be familiar with
the Unix utilities <i>grep</i>, <i>sed</i> and <i>awk</i>, and
the programming language <i>perl</i>, each of which makes
extensive use of regular expressions. Traditionally C++ users
have been limited to the POSIX C APIs for manipulating regular
expressions, and while regex++ does provide these APIs, they do
not represent the best way to use the library. For example regex++
can cope with wide character strings, or search and replace
operations (in a manner analogous to either sed or perl),
something that traditional C libraries cannot do.</p>
<p>The class <a href="template_class_ref.htm#reg_expression">boost::reg_expression</a>
is the key class in this library; it represents a &quot;machine
readable&quot; regular expression, and is very closely modelled
on std::basic_string, think of it as a string plus the actual
state-machine required by the regular expression algorithms. Like
std::basic_string there are two typedefs that are almost always
the means by which this class is referenced:</p>
<pre><b>namespace </b>boost{
<b>template</b> &lt;<b>class</b> charT,
<b> class</b> traits = regex_traits&lt;charT&gt;,
<b>class</b> Allocator = std::allocator&lt;charT&gt; &gt;
<b>class</b> reg_expression;
<b>typedef</b> reg_expression&lt;<b>char</b>&gt; regex;
<b>typedef</b> reg_expression&lt;<b>wchar_t&gt;</b> wregex;
}</pre>
<p>To see how this library can be used, imagine that we are
writing a credit card processing application. Credit card numbers
generally come as a string of 16-digits, separated into groups of
4-digits, and separated by either a space or a hyphen. Before
storing a credit card number in a database (not necessarily
something your customers will appreciate!), we may want to verify
that the number is in the correct format. To match any digit we
could use the regular expression [0-9], however ranges of
characters like this are actually locale dependent. Instead we
should use the POSIX standard form [[:digit:]], or the regex++
and perl shorthand for this \d (note that many older libraries
tended to be hard-coded to the C-locale, consequently this was
not an issue for them). That leaves us with the following regular
expression to validate credit card number formats:</p>
<p>(\d{4}[- ]){3}\d{4}</p>
<p>Here the parentheses act to group (and mark for future
reference) sub-expressions, and the {4} means &quot;repeat
exactly 4 times&quot;. This is an example of the extended regular
expression syntax used by perl, awk and egrep. Regex++ also
supports the older &quot;basic&quot; syntax used by sed and grep,
but this is generally less useful, unless you already have some
basic regular expressions that you need to reuse.</p>
<p>Now let's take that expression and place it in some C++ code to
validate the format of a credit card number:</p>
<pre><b>bool</b> validate_card_format(<b>const</b> std::string&amp; s)
{
<b>static</b> <b>const</b> <a
href="template_class_ref.htm#reg_expression">boost::regex</a> e(&quot;(\\d{4}[- ]){3}\\d{4}&quot;);
<b>return</b> <a href="template_class_ref.htm#query_match">regex_match</a>(s, e);
}</pre>
<p>Note how we had to add some extra escapes to the expression:
remember that the escape is seen once by the C++ compiler, before
it gets to be seen by the regular expression engine, consequently
escapes in regular expressions have to be doubled up when
embedding them in C/C++ code. Also note that all the examples
assume that your compiler supports Koenig lookup, if yours
doesn't (for example VC6), then you will have to add some boost::
prefixes to some of the function calls in the examples.</p>
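<p>The doubling of escapes can be seen by writing the same pattern twice.
The sketch below is an illustration under two assumptions that go beyond
this document: it uses std::regex from the standard library as a stand-in
for boost::regex, and it uses a raw string literal, which needs a C++11
compiler. The names escaped_pattern and raw_pattern are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// The same regular expression written two ways. Both literals hold the
// text (\d{4}[- ]){3}\d{4} once the compiler has processed them.
const char* const escaped_pattern = "(\\d{4}[- ]){3}\\d{4}";   // escapes doubled
const char* const raw_pattern = R"((\d{4}[- ]){3}\d{4})";      // raw literal, no doubling

// Checks that s is four groups of four digits, each of the first three
// groups followed by a space or a hyphen.
bool validate_card_format(const std::string& s)
{
    static const std::regex e(raw_pattern);
    return std::regex_match(s, e);
}
```

<p>Because the two literals compare equal as strings, either form can be
passed to the regular expression constructor; the raw form is simply easier
to compare against the expression as written in prose.</p>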
<p>Those of you who are familiar with credit card processing
will have realised that while the format used above is suitable
for human readable card numbers, it does not represent the format
required by online credit card systems; these require the number
as a string of 16 (or possibly 15) digits, without any
intervening spaces. What we need is a means to convert easily
between the two formats, and this is where search and replace
comes in. Those who are familiar with the utilities <i>sed</i>
and <i>perl</i> will already be ahead here; we need two strings -
one a regular expression - the other a &quot;<a
href="format_string.htm">format string</a>&quot; that provides a
description of the text to replace the match with. In regex++
this search and replace operation is performed with the algorithm
regex_merge; for our credit card example we can write two
algorithms like this to provide the format conversions:</p>
<pre>
<i>// match any format with the regular expression:
</i><b>const</b> boost::regex e(&quot;\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z&quot;);
<b>const</b> std::string machine_format(&quot;\\1\\2\\3\\4&quot;);
<b>const</b> std::string human_format(&quot;\\1-\\2-\\3-\\4&quot;);
std::string machine_readable_card_number(<b>const</b> std::string&amp; s)
{
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, machine_format, boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(<b>const</b> std::string&amp; s)
{
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, human_format, boost::match_default | boost::format_sed);
}</pre>
<p>Here we've used marked sub-expressions in the regular
expression to split out the four parts of the card number as
separate fields, the format string then uses the sed-like syntax
to replace the matched text with the reformatted version.</p>
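<p>For readers who want to try the conversion without building this
library, the same idea can be sketched with the standard library's
std::regex. This is a stand-in, not the regex_merge algorithm itself:
std::regex_replace plays the role of regex_merge, the anchors ^ and $
replace \A and \z (which the default std::regex grammar does not accept),
and the format strings use the $1 form rather than the sed-like \1 form.
The name card_pattern is hypothetical.</p>

```cpp
#include <regex>
#include <string>

namespace {
// Four marked sub-expressions, one per group of digits, with an optional
// space or hyphen between groups.
const char* const card_pattern =
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$";
}

// Rewrites a card number as an unbroken run of digits.
std::string machine_readable_card_number(const std::string& s)
{
    static const std::regex e(card_pattern);
    return std::regex_replace(s, e, "$1$2$3$4");
}

// Rewrites a card number as hyphen-separated groups.
std::string human_readable_card_number(const std::string& s)
{
    static const std::regex e(card_pattern);
    return std::regex_replace(s, e, "$1-$2-$3-$4");
}
```

<p>Input that does not match the expression is copied to the output
unchanged, so callers that need to reject malformed numbers should validate
the format first.</p>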
<p>In the examples above, we haven't directly manipulated the
results of a regular expression match, however in general the
result of a match contains a number of sub-expression matches in
addition to the overall match. When the library needs to report a
regular expression match it does so using an instance of the
class <a href="template_class_ref.htm#reg_match">match_results</a>,
as before there are typedefs of this class for the most common
cases: </p>
<pre><b>namespace </b>boost{
<b>typedef</b> match_results&lt;<b>const</b> <b>char</b>*&gt; cmatch;
<b>typedef</b> match_results&lt;<b>const</b> <b>wchar_t</b>*&gt; wcmatch;
<strong>typedef</strong> match_results&lt;std::string::const_iterator&gt; smatch;
<strong>typedef</strong> match_results&lt;std::wstring::const_iterator&gt; wsmatch;
}</pre>
<p>The algorithms <a href="template_class_ref.htm#reg_search">regex_search</a>
and <a href="template_class_ref.htm#reg_grep">regex_grep</a> (i.e.
finding all matches in a string) make use of match_results to
report what matched.</p>
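<p>As an illustration of reading sub-expression matches out of a match
result, the sketch below searches some text for a card-like number and
returns its four groups. It assumes std::regex and std::smatch from the
standard library as stand-ins for boost::regex and boost::smatch; the
function name card_groups is hypothetical.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the four digit groups of the first card-like number found in
// text, or an empty vector when no such number is present.
std::vector<std::string> card_groups(const std::string& text)
{
    static const std::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
    std::smatch what;
    std::vector<std::string> groups;
    if (std::regex_search(text, what, e)) {
        // what[0] is the whole match; what[1] to what[4] are the
        // marked sub-expressions, in order of their opening parenthesis.
        for (std::size_t i = 1; i < what.size(); ++i)
            groups.push_back(what[i].str());
    }
    return groups;
}
```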
<p>Note that these algorithms are not restricted to searching
regular C-strings, any bidirectional iterator type can be
searched, allowing for the possibility of seamlessly searching
almost any kind of data. </p>
<p>For search and replace operations, in addition to the algorithm
<a href="template_class_ref.htm#reg_merge">regex_merge</a> that
we have already seen, the algorithm <a
href="template_class_ref.htm#reg_format">regex_format</a> takes
the result of a match and a format string, and produces a new
string by merging the two.</p>
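<p>Merging a single match result with a format string can be sketched as
below. This is an assumption-laden illustration rather than regex_format
itself: it uses the format member function of std::smatch from the standard
library, which plays a comparable role, and the $1 style of format string.
The function name format_first_card is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Finds the first card-like number in text and returns it reformatted as
// hyphen-separated groups. Returns an empty string if nothing matches.
std::string format_first_card(const std::string& text)
{
    static const std::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
    std::smatch what;
    if (!std::regex_search(text, what, e))
        return std::string();
    // format() produces a new string from the match result and the format
    // string; $1 to $4 refer to the marked sub-expressions.
    return what.format("$1-$2-$3-$4");
}
```

<p>Unlike a search and replace over the whole input, only the matched text
contributes to the result; the surrounding text is not copied.</p>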
<p>For those that dislike templates, there is a high level
wrapper class RegEx that is an encapsulation of the lower level
template code - it provides a simplified interface for those that
don't need the full power of the library, and supports only
narrow characters, and the &quot;extended&quot; regular
expression syntax. </p>
<p>The <a href="posix_ref.htm#posix">POSIX API</a> functions:
regcomp, regexec, regfree and regerror, are available in both
narrow character and Unicode versions, and are provided for those
who need compatibility with these APIs. </p>
<p>Finally, note that the library now has run-time <a
href="appendix.htm#localisation">localization</a> support, and
recognizes the full POSIX regular expression syntax - including
advanced features like multi-character collating elements and
equivalence classes - as well as providing compatibility with
other regular expression libraries including GNU and BSD4 regex
packages, and to a more limited extent Perl 5. </p>
<h3><a name="Installation"></a><i>Installation and Configuration
Options</i> </h3>
<p><em>[ </em><strong><i>Important</i></strong><em>: If you are
upgrading from the 2.x version of this library then you will find
a number of changes to the documented header names and library
interfaces; existing code should still compile unchanged, however
- see </em><a href="appendix.htm#upgrade"><font color="#0000FF"><em>Note
for Upgraders</em></font></a><em>. ]</em></p>
<p>When you extract the library from its zip file, you must
preserve its internal directory structure (for example by using
the -d option when extracting). If you didn't do that when
extracting, then you'd better stop reading this, delete the files
you just extracted, and try again! </p>
<p>This library should not need configuring before use; most
popular compilers/standard libraries/platforms are already
supported &quot;as is&quot;. If you do experience configuration
problems, or just want to test the configuration with your
compiler, then the process is the same as for all of boost; see
the <a href="../config/config.htm">configuration library
documentation</a>.</p>
<p>The library will encase all code inside namespace boost. </p>
<p>Unlike some other template libraries, this library consists of
a mixture of template code (in the headers) and static code and
data (in cpp files). Consequently it is necessary to build the
library's support code into a library or archive file before you
can use it; instructions for specific platforms are as follows: </p>
<p><b>Borland C++ Builder:</b> </p>
<ul>
<li>Open up a console window and change to the
&lt;boost&gt;\libs\regex\build directory. </li>
<li>Select the appropriate makefile (bcb4.mak for C++ Builder
4, bcb5.mak for C++ Builder 5, and bcb6.mak for C++
Builder 6). </li>
<li>Invoke the makefile (pass the full path to your version
of make if you have more than one version installed; the
makefile relies on the path to make to obtain your C++
Builder installation directory and tools) for example: </li>
</ul>
<pre>make -fbcb5.mak</pre>
<p>The build process will build a variety of .lib and .dll files
(the exact number depends upon the version of Borland's tools you
are using). The .lib and .dll files will be in a sub-directory
called bcb4 or bcb5 depending upon the makefile used. To install
the libraries into your development system use:</p>
<p>make -fbcb5.mak install</p>
<p>Library files will be copied to &lt;BCROOT&gt;/lib and the
dlls to &lt;BCROOT&gt;/bin, where &lt;BCROOT&gt; corresponds to
the install path of your Borland C++ tools. </p>
<p>You may also remove temporary files created during the build
process (excluding lib and dll files) by using:</p>
<p>make -fbcb5.mak clean</p>
<p>Finally, when you use regex++ it is only necessary for you to
add the &lt;boost&gt; root directory to your list of include
directories for that project. It is not necessary for you to
manually add a .lib file to the project; the headers will
automatically select the correct .lib file for your build mode
and tell the linker to include it. There is one caveat, however:
the library cannot tell the difference between VCL and non-VCL
enabled builds when building a GUI application from the command
line. If you build from the command line with the 5.5 command
line tools then you must define the pre-processor symbol _NO_VCL
in order to ensure that the correct link libraries are selected;
the C++ Builder IDE normally sets this automatically. Hint: users
of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg
in order to set this option permanently. </p>
<p>If you would prefer to do a static link to the regex libraries
even when using the dll runtime then define
BOOST_REGEX_STATIC_LINK, and if you want to suppress automatic
linking altogether (and supply your own custom build of the lib)
then define BOOST_REGEX_NO_LIB.</p>
<p>If you are building with C++ Builder 6, you will find that
&lt;boost/regex.hpp&gt; cannot be used in a pre-compiled header
(the actual problem is in &lt;locale&gt;, which gets included by
&lt;boost/regex.hpp&gt;). If this causes problems for you, then
try defining BOOST_NO_STD_LOCALE when building; this will disable
some features throughout boost, but may save you a lot in compile
times!</p>
<p><b>Microsoft Visual C++ 6</b><strong> and 7</strong></p>
<p>You need version 6 of MSVC to build this library. If you are
using VC5 then you may want to look at one of the previous
releases of this <a
href="http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm">library</a>
</p>
<p>Open up a command prompt, which has the necessary MSVC
environment variables defined (for example by using the batch
file Vcvars32.bat installed by the Visual Studio installation),
and change to the &lt;boost&gt;\libs\regex\build directory. </p>
<p>Select the correct makefile - vc6.mak for &quot;vanilla&quot;
Visual C++ 6 or vc6-stlport.mak if you are using STLPort.</p>
<p>Invoke the makefile like this:</p>
<p>nmake -fvc6.mak</p>
<p>You will now have a collection of lib and dll files in a
&quot;vc6&quot; subdirectory; to install these into your
development system use:</p>
<p>nmake -fvc6.mak install</p>
<p>The lib files will be copied to your &lt;VC6&gt;\lib directory
and the dll files to &lt;VC6&gt;\bin, where &lt;VC6&gt; is the
root of your Visual C++ 6 installation.</p>
<p>You can delete all the temporary files created during the
build (excluding lib and dll files) using:</p>
<p>nmake -fvc6.mak clean </p>
<p>Finally when you use regex++ it is only necessary for you to
add the &lt;boost&gt; root directory to your list of include
directories for that project. It is not necessary for you to
manually add a .lib file to the project; the headers will
automatically select the correct .lib file for your build mode
and tell the linker to include it. </p>
<p>Note that if you want to statically link to the regex library
when using the dynamic C++ runtime, define
BOOST_REGEX_STATIC_LINK when building your project (this only has
an effect for release builds). If you want to add the source
directly to your project then define BOOST_REGEX_NO_LIB to
disable automatic library selection.</p>
<p><strong><i>Important</i></strong><em>: there have been some
reports of compiler-optimisation bugs affecting this library (particularly
with VC6 versions prior to Service Pack 5). The workaround is to
build the library using /Oityb1 rather than /O2, that is, to use
all optimisation settings except /Oa. This problem is reported to
affect some standard library code as well (in fact I'm not sure
if the problem is with the regex code or the underlying standard
library), so it's probably worthwhile applying this workaround in
normal practice in any case.</em></p>
<p>Note: if you have replaced the C++ standard library that comes
with VC6, then when you build the library you must ensure that
the environment variables &quot;INCLUDE&quot; and &quot;LIB&quot;
have been updated to reflect the include and library paths for
the new library - see vcvars32.bat (part of your Visual Studio
installation) for more details. Alternatively if STLPort is in c:/stlport
then you could use:</p>
<p>nmake INCLUDES=&quot;-Ic:/stlport/stlport&quot; XLFLAGS=&quot;/LIBPATH:c:/stlport/lib&quot;
-fvc6-stlport.mak</p>
<p>If you are building with the full STLPort v4.x, then use the
vc6-stlport.mak file provided and set the environment variable
STLPORT_PATH to point to the location of your STLport
installation (note that the full STLPort libraries appear not to
support single-thread static builds). </p>
<p><b>GCC(2.95)</b> </p>
<p>There is a conservative makefile for the g++ compiler. From
the command prompt change to the &lt;boost&gt;/libs/regex/build
directory and type: </p>
<p>make -fgcc.mak </p>
<p>At the end of the build process you should have a gcc sub-directory
containing release and debug versions of the library (libboost_regex.a
and libboost_regex_debug.a). When you build projects that use
regex++, you will need to add the boost install directory to your
list of include paths and add &lt;boost&gt;/libs/regex/build/gcc/libboost_regex.a
to your list of library files. </p>
<p>There is also a makefile to build the library as a shared
library:</p>
<p>make -fgcc-shared.mak</p>
<p>which will build libboost_regex.so and libboost_regex_debug.so.</p>
<p>Both of these makefiles support the following environment
variables:</p>
<p>CXXFLAGS: extra compiler options - note that this applies to
both the debug and release builds.</p>
<p>INCLUDES: additional include directories.</p>
<p>LDFLAGS: additional linker options.</p>
<p>LIBS: additional library files.</p>
<p>For the more adventurous there is a configure script in
&lt;boost&gt;/libs/config; see the <a href="../config/config.htm">config
library documentation</a>.</p>
<p><b>Sun Workshop 6.1</b></p>
<p>There is a makefile for the Sun (6.1) compiler (C++ version 3.12).
From the command prompt change to the &lt;boost&gt;/libs/regex/build
directory and type: </p>
<p>dmake -f sunpro.mak </p>
<p>At the end of the build process you should have a sunpro sub-directory
containing single and multithread versions of the library (libboost_regex.a,
libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so).
When you build projects that use regex++, you will need to add
the boost install directory to your list of include paths and add
&lt;boost&gt;/libs/regex/build/sunpro/ to your library search
path. </p>
<p>This makefile supports the following environment
variables:</p>
<p>CXXFLAGS: extra compiler options - note that this applies to
both the single and multithreaded builds.</p>
<p>INCLUDES: additional include directories.</p>
<p>LDFLAGS: additional linker options.</p>
<p>LIBS: additional library files.</p>
<p>LIBSUFFIX: a suffix to mangle the library name with (defaults
to nothing).</p>
<p>This makefile does not set any architecture specific options
like -xarch=v9; you can set these by defining the appropriate
macros, for example:</p>
<p>dmake CXXFLAGS=&quot;-xarch=v9&quot; LDFLAGS=&quot;-xarch=v9&quot;
LIBSUFFIX=&quot;_v9&quot; -f sunpro.mak</p>
<p>will build v9 variants of the regex library named
libboost_regex_v9.a etc.</p>
<p><b>Other compilers:</b> </p>
<p>There is a generic makefile (<a href="build/generic.mak">generic.mak</a>)
provided in &lt;boost-root&gt;/libs/regex/build - see that
makefile for details of environment variables that need to be set
before use. Alternatively you can use the <a
href="../../tools/build/index.html">Jam based build system</a>.
If you need to configure the library for your platform, then
refer to the <a href="../config/config.htm">config library
documentation</a>.</p>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
</body>
</html>

43
performance/Jamfile Normal file

@ -0,0 +1,43 @@
subproject libs/regex/performance ;
SOURCES = command_line main time_boost time_greta time_localised_boost time_pcre time_posix time_safe_greta ;
if $(HS_REGEX_PATH)
{
HS_SOURCES = $(HS_REGEX_PATH)/regcomp.c $(HS_REGEX_PATH)/regerror.c $(HS_REGEX_PATH)/regexec.c $(HS_REGEX_PATH)/regfree.c ;
POSIX_OPTS = <define>BOOST_HAS_POSIX=1 <include>$(HS_REGEX_PATH) ;
}
else if $(USE_POSIX)
{
POSIX_OPTS = <define>BOOST_HAS_POSIX=1 ;
}
if $(PCRE_PATH)
{
PCRE_SOURCES = $(PCRE_PATH)/chartables.c $(PCRE_PATH)/get.c $(PCRE_PATH)/pcre.c $(PCRE_PATH)/study.c ;
PCRE_OPTS = <define>BOOST_HAS_PCRE=1 <include>$(PCRE_PATH) ;
}
else if $(USE_PCRE)
{
PCRE_OPTS = <define>BOOST_HAS_PCRE=1 <find-library>pcre ;
}
exe regex_comparison :
$(SOURCES).cpp
$(HS_SOURCES)
$(PCRE_SOURCES)
<lib>../build/boost_regex
<lib>../../test/build/boost_prg_exec_monitor
:
<include>$(BOOST_ROOT)
<define>BOOST_REGEX_NO_LIB=1
<define>BOOST_REGEX_STATIC_LINK=1
$(POSIX_OPTS)
$(PCRE_OPTS)
;


@ -0,0 +1,470 @@
#include <iostream>
#include <iomanip>
#include <fstream>
#include <deque>
#include <sstream>
#include <stdexcept>
#include <iterator>
#include <boost/regex.hpp>
#include <boost/version.hpp>
#include "regex_comparison.hpp"
#ifdef BOOST_HAS_PCRE
#include "pcre.h" // for pcre version number
#endif
//
// globals:
//
bool time_boost = false;
bool time_localised_boost = false;
bool time_greta = false;
bool time_safe_greta = false;
bool time_posix = false;
bool time_pcre = false;
bool test_matches = false;
bool test_code = false;
bool test_html = false;
bool test_short_twain = false;
bool test_long_twain = false;
std::string html_template_file;
std::string html_out_file;
std::string html_contents;
std::list<results> result_list;
// the following let us compute averages:
double greta_total = 0;
double safe_greta_total = 0;
double boost_total = 0;
double locale_boost_total = 0;
double posix_total = 0;
double pcre_total = 0;
unsigned greta_test_count = 0;
unsigned safe_greta_test_count = 0;
unsigned boost_test_count = 0;
unsigned locale_boost_test_count = 0;
unsigned posix_test_count = 0;
unsigned pcre_test_count = 0;
int handle_argument(const std::string& what)
{
if(what == "-b")
time_boost = true;
else if(what == "-bl")
time_localised_boost = true;
#ifdef BOOST_HAS_GRETA
else if(what == "-g")
time_greta = true;
else if(what == "-gs")
time_safe_greta = true;
#endif
#ifdef BOOST_HAS_POSIX
else if(what == "-posix")
time_posix = true;
#endif
#ifdef BOOST_HAS_PCRE
else if(what == "-pcre")
time_pcre = true;
#endif
else if(what == "-all")
{
time_boost = true;
time_localised_boost = true;
#ifdef BOOST_HAS_GRETA
time_greta = true;
time_safe_greta = true;
#endif
#ifdef BOOST_HAS_POSIX
time_posix = true;
#endif
#ifdef BOOST_HAS_PCRE
time_pcre = true;
#endif
}
else if(what == "-test-matches")
test_matches = true;
else if(what == "-test-code")
test_code = true;
else if(what == "-test-html")
test_html = true;
else if(what == "-test-short-twain")
test_short_twain = true;
else if(what == "-test-long-twain")
test_long_twain = true;
else if(what == "-test-all")
{
test_matches = true;
test_code = true;
test_html = true;
test_short_twain = true;
test_long_twain = true;
}
else if((what == "-h") || (what == "--help"))
return show_usage();
else if((what[0] == '-') || (what[0] == '/'))
{
std::cerr << "Unknown argument: \"" << what << "\"" << std::endl;
return 1;
}
else if(html_template_file.size() == 0)
{
html_template_file = what;
load_file(html_contents, what.c_str());
}
else if(html_out_file.size() == 0)
html_out_file = what;
else
{
std::cerr << "Unexpected argument: \"" << what << "\"" << std::endl;
return 1;
}
return 0;
}
int show_usage()
{
std::cout <<
"Usage\n"
"regex_comparison [-h] [library options] [test options] [html_template html_output_file]\n"
" -h Show help\n\n"
" library options:\n"
" -b Apply tests to boost library\n"
" -bl Apply tests to boost library with C++ locale\n"
#ifdef BOOST_HAS_GRETA
" -g Apply tests to GRETA library\n"
" -gs Apply tests to GRETA library (in non-recursive mode)\n"
#endif
#ifdef BOOST_HAS_POSIX
" -posix Apply tests to POSIX library\n"
#endif
#ifdef BOOST_HAS_PCRE
" -pcre Apply tests to PCRE library\n"
#endif
" -all Apply tests to all libraries\n\n"
" test options:\n"
" -test-matches Test short matches\n"
" -test-code Test c++ code examples\n"
" -test-html Test HTML document searches\n"
" -test-short-twain Test short searches\n"
" -test-long-twain Test long searches\n"
" -test-all Test everything\n";
return 1;
}
// Reads the whole of "file" into "text"; throws std::runtime_error
// if the file cannot be opened.
void load_file(std::string& text, const char* file)
{
std::ifstream is(file);
if(!is.good())
{
std::string msg("Unable to open file: \"");
msg.append(file);
msg.append("\"");
throw std::runtime_error(msg);
}
is.seekg(0, std::ios_base::end);
std::istream::pos_type pos = is.tellg();
is.seekg(0, std::ios_base::beg);
text.erase();
text.reserve(pos);
std::istreambuf_iterator<char> it(is);
std::copy(it, std::istreambuf_iterator<char>(), std::back_inserter(text));
}
void print_result(std::ostream& os, double time, double best)
{
static const char* suffixes[] = {"s", "ms", "us", "ns", "ps", };
if(time < 0)
{
os << "<td>NA</td>";
return;
}
double rel = time / best;
bool highlight = ((rel > 0) && (rel < 1.1));
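unsigned suffix = 0;
// scale sub-second times up to ms, us, ns or ps so that the printed
// value is at least 1; the bound on suffix keeps us inside suffixes[]:
while((time > 0) && (time < 1) && (suffix < 4))
{
time *= 1000;
++suffix;
}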
os << "<td>";
if(highlight)
os << "<font color=\"#008000\">";
if(rel <= 1000)
os << std::setprecision(3) << rel;
else
os << (int)rel;
os << "<BR>(";
if(time <= 1000)
os << std::setprecision(3) << time;
else
os << (int)time;
os << suffixes[suffix] << ")";
if(highlight)
os << "</font>";
os << "</td>";
}
std::string html_quote(const std::string& in)
{
static const boost::regex e("(<)|(>)|(&)|(\")");
static const std::string format("(?1&lt;)(?2&gt;)(?3&amp;)(?4&quot;)");
return regex_replace(in, e, format, boost::match_default | boost::format_all);
}
void output_html_results(bool show_description, const std::string& tagname)
{
std::stringstream os;
if(result_list.size())
{
//
// start by outputting the table header:
//
os << "<table border=\"1\" cellspacing=\"1\">\n";
os << "<tr><td><strong>Expression</strong></td>";
if(show_description)
os << "<td><strong>Text</strong></td>";
#if defined(BOOST_HAS_GRETA)
if(time_greta == true)
os << "<td><strong>GRETA</strong></td>";
if(time_safe_greta == true)
os << "<td><strong>GRETA<BR>(non-recursive mode)</strong></td>";
#endif
if(time_boost == true)
os << "<td><strong>Boost</strong></td>";
if(time_localised_boost == true)
os << "<td><strong>Boost + C++ locale</strong></td>";
#if defined(BOOST_HAS_POSIX)
if(time_posix == true)
os << "<td><strong>POSIX</strong></td>";
#endif
#ifdef BOOST_HAS_PCRE
if(time_pcre == true)
os << "<td><strong>PCRE</strong></td>";
#endif
os << "</tr>\n";
//
// Now enumerate through all the test results:
//
std::list<results>::const_iterator first, last;
first = result_list.begin();
last = result_list.end();
while(first != last)
{
os << "<tr><td><code>" << html_quote(first->expression) << "</code></td>";
if(show_description)
os << "<td>" << html_quote(first->description) << "</td>";
#if defined(BOOST_HAS_GRETA)
if(time_greta == true)
{
print_result(os, first->greta_time, first->factor);
if(first->greta_time > 0)
{
greta_total += first->greta_time / first->factor;
++greta_test_count;
}
}
if(time_safe_greta == true)
{
print_result(os, first->safe_greta_time, first->factor);
if(first->safe_greta_time > 0)
{
safe_greta_total += first->safe_greta_time / first->factor;
++safe_greta_test_count;
}
}
#endif
if(time_boost == true)
{
print_result(os, first->boost_time, first->factor);
if(first->boost_time > 0)
{
boost_total += first->boost_time / first->factor;
++boost_test_count;
}
}
if(time_localised_boost == true)
{
print_result(os, first->localised_boost_time, first->factor);
if(first->localised_boost_time > 0)
{
locale_boost_total += first->localised_boost_time / first->factor;
++locale_boost_test_count;
}
}
#if defined(BOOST_HAS_POSIX)
if(time_posix == true)
{
print_result(os, first->posix_time, first->factor);
if(first->posix_time > 0)
{
posix_total += first->posix_time / first->factor;
++posix_test_count;
}
}
#endif
#if defined(BOOST_HAS_PCRE)
if(time_pcre == true)
{
print_result(os, first->pcre_time, first->factor);
if(first->pcre_time > 0)
{
pcre_total += first->pcre_time / first->factor;
++pcre_test_count;
}
}
#endif
os << "</tr>\n";
++first;
}
os << "</table>\n";
result_list.clear();
}
else
{
os << "<P><I>Results not available...</I></P>\n";
}
std::string result = os.str();
std::string::size_type pos = html_contents.find(tagname);
if(pos != std::string::npos)
{
html_contents.replace(pos, tagname.size(), result);
}
}
std::string get_boost_version()
{
std::stringstream os;
os << (BOOST_VERSION / 100000) << '.' << ((BOOST_VERSION / 100) % 1000) << '.' << (BOOST_VERSION % 100);
return os.str();
}
std::string get_averages_table()
{
std::stringstream os;
//
// start by outputting the table header:
//
os << "<table border=\"1\" cellspacing=\"1\">\n";
os << "<tr>";
#if defined(BOOST_HAS_GRETA)
if(time_greta == true)
{
os << "<td><strong>GRETA</strong></td>";
}
if(time_safe_greta == true)
{
os << "<td><strong>GRETA<BR>(non-recursive mode)</strong></td>";
}
#endif
if(time_boost == true)
{
os << "<td><strong>Boost</strong></td>";
}
if(time_localised_boost == true)
{
os << "<td><strong>Boost + C++ locale</strong></td>";
}
#if defined(BOOST_HAS_POSIX)
if(time_posix == true)
{
os << "<td><strong>POSIX</strong></td>";
}
#endif
#ifdef BOOST_HAS_PCRE
if(time_pcre == true)
{
os << "<td><strong>PCRE</strong></td>";
}
#endif
os << "</tr>\n";
//
// Now enumerate through all averages:
//
os << "<tr>";
#if defined(BOOST_HAS_GRETA)
if(time_greta == true)
os << "<td>" << (greta_total / greta_test_count) << "</td>\n";
if(time_safe_greta == true)
os << "<td>" << (safe_greta_total / safe_greta_test_count) << "</td>\n";
#endif
if(time_boost == true)
os << "<td>" << (boost_total / boost_test_count) << "</td>\n";
if(time_localised_boost == true)
os << "<td>" << (locale_boost_total / locale_boost_test_count) << "</td>\n";
#if defined(BOOST_HAS_POSIX)
if(time_posix == true)
os << "<td>" << (posix_total / posix_test_count) << "</td>\n";
#endif
#if defined(BOOST_HAS_PCRE)
if(time_pcre == true)
os << "<td>" << (pcre_total / pcre_test_count) << "</td>\n";
#endif
os << "</tr>\n";
os << "</table>\n";
return os.str();
}
void output_final_html()
{
if(html_out_file.size())
{
//
// start with search and replace ops:
//
std::string::size_type pos;
pos = html_contents.find("%compiler%");
if(pos != std::string::npos)
{
html_contents.replace(pos, 10, BOOST_COMPILER);
}
pos = html_contents.find("%library%");
if(pos != std::string::npos)
{
html_contents.replace(pos, 9, BOOST_STDLIB);
}
pos = html_contents.find("%os%");
if(pos != std::string::npos)
{
html_contents.replace(pos, 4, BOOST_PLATFORM);
}
pos = html_contents.find("%boost%");
if(pos != std::string::npos)
{
html_contents.replace(pos, 7, get_boost_version());
}
pos = html_contents.find("%pcre%");
if(pos != std::string::npos)
{
#ifdef PCRE_MINOR
html_contents.replace(pos, 6, BOOST_STRINGIZE(PCRE_MAJOR.PCRE_MINOR));
#else
html_contents.replace(pos, 6, "N/A");
#endif
}
pos = html_contents.find("%averages%");
if(pos != std::string::npos)
{
html_contents.replace(pos, 10, get_averages_table());
}
//
// now write the output to file:
//
std::ofstream os(html_out_file.c_str());
os << html_contents;
}
else
{
std::cout << html_contents;
}
}

70
performance/input.html Normal file

@ -0,0 +1,70 @@
<html>
<head>
<title>Regular Expression Performance Comparison</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
<meta name="Template" content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
</head>
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
<h2>Regular Expression Performance Comparison</h2>
<p>
The following tables provide comparisons between the following regular
expression libraries:</p>
<p><a href="http://research.microsoft.com/projects/greta">GRETA</a>.</p>
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
- this is provided for comparison as a typical non-backtracking implementation.</p>
<P>Philip Hazel's <A href="http://www.pcre.org">PCRE</A> library.</P>
<H3>Details</H3>
<P>Machine: Intel Pentium 4 2.8GHz PC.</P>
<P>Compiler: %compiler%.</P>
<P>C++ Standard Library: %library%.</P>
<P>OS: %os%.</P>
<P>Boost version: %boost%.</P>
<P>PCRE version: %pcre%.</P>
<P>
As ever, care should be taken in interpreting the results: only sensible regular
expressions (rather than pathological cases) are given, and most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms.</P>
<H3>Averages</H3>
<P>The following are the average relative scores for all the tests: the perfect
regular expression library&nbsp;would score 1; in practice anything less than 2
is pretty good.</P>
<P>%averages%</P>
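<p>The arithmetic behind an average relative score can be sketched as below. This is an illustration under stated assumptions: each test is taken to record the library's own time and the best time achieved by any library on that test, a non-positive time is taken to mean the test was not run, and the names timing and average_relative_score are hypothetical.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one test: this library's time and the
// fastest time any library achieved on the same test.
struct timing
{
    double library_time;
    double best_time;
};

// Mean of library_time / best_time over the tests that were actually run.
// Returns -1 when no test contributed a score.
double average_relative_score(const std::vector<timing>& tests)
{
    double total = 0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < tests.size(); ++i)
    {
        if (tests[i].library_time > 0 && tests[i].best_time > 0)
        {
            total += tests[i].library_time / tests[i].best_time;
            ++count;
        }
    }
    return count ? total / count : -1.0;
}
```

<p>A library that was fastest on every test would score exactly 1; one that took twice the best time on half of the tests and matched it on the rest would score 1.5.</p>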
<h3>Comparison 1: Long Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within a long English language text was measured
(<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a>
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19Mb).&nbsp;</p>
<P>%long_twain_search%</P>
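<p>As a rough sketch of what "finding all occurrences" involves, the code below counts every non-overlapping match in a text and times a single scan. It uses std::regex and std::chrono (assumptions, not the timing code behind these tables), and the names count_matches and time_one_scan are hypothetical; a real benchmark would repeat the scan many times to get a stable figure.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Number of non-overlapping matches of `e` in `text`.
std::size_t count_matches(const std::regex& e, const std::string& text)
{
    std::sregex_iterator first(text.begin(), text.end(), e), last;
    return static_cast<std::size_t>(std::distance(first, last));
}

// Seconds taken by one full scan of `text`; the expression is compiled
// before the clock starts so that only the search itself is timed.
double time_one_scan(const std::string& expression, const std::string& text)
{
    typedef std::chrono::steady_clock clock;
    const std::regex e(expression);
    const clock::time_point start = clock::now();
    const std::size_t n = count_matches(e, text);
    (void)n;   // the count is not needed here, only the elapsed time
    return std::chrono::duration<double>(clock::now() - start).count();
}
```

<p>For a short text the elapsed time will be close to the clock's resolution, which is one reason the comparison uses a long document.</p>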
<h3>Comparison 2: Medium Sized Search</h3>
<p>For each of the following regular expressions the time taken to find all
occurrences of the expression within a medium sized English language text was
measured (the first 50K from mtent12.txt).&nbsp;</p>
<P>%short_twain_search%</P>
<H3>Comparison 3:&nbsp;C++ Code&nbsp;Search</H3>
<P>For each of the following regular expressions the time taken to find all
occurrences of the expression within the C++ source file <A href="../../../boost/crc.hpp">
boost/crc.hpp</A>&nbsp;was measured.&nbsp;</P>
<P>%code_search%</P>
<H3>Comparison 4: HTML Document Search</H3>
<P>For each of the following regular expressions the time taken to find all
occurrences of the expression within the html file <A href="../../libraries.htm">libs/libraries.htm</A>
was measured.&nbsp;</P>
<P>%html_search%</P>
<H3>Comparison 5: Simple Matches</H3>
<p>
For each of the following regular expressions the time taken to match against
the text indicated was measured.&nbsp;</p>
<P>%short_matches%</P>
<hr>
<p>Copyright John Maddock April 2003, all rights reserved.</p>
</body>
</html>

251
performance/main.cpp Normal file

@ -0,0 +1,251 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include <iostream>
#include <fstream>
#include <iterator>
#include <cassert>
#include <boost/test/execution_monitor.hpp>
#include "regex_comparison.hpp"
void test_match(const std::string& re, const std::string& text, const std::string& description, bool icase)
{
double time;
results r(re, description);
std::cout << "Testing: \"" << re << "\" against \"" << description << "\"" << std::endl;
#ifdef BOOST_HAS_GRETA
if(time_greta == true)
{
time = g::time_match(re, text, icase);
r.greta_time = time;
std::cout << "\tGRETA regex: " << time << "s\n";
}
if(time_safe_greta == true)
{
time = gs::time_match(re, text, icase);
r.safe_greta_time = time;
std::cout << "\tSafe GRETA regex: " << time << "s\n";
}
#endif
if(time_boost == true)
{
time = b::time_match(re, text, icase);
r.boost_time = time;
std::cout << "\tBoost regex: " << time << "s\n";
}
if(time_localised_boost == true)
{
time = bl::time_match(re, text, icase);
r.localised_boost_time = time;
std::cout << "\tBoost regex (C++ locale): " << time << "s\n";
}
#ifdef BOOST_HAS_POSIX
if(time_posix == true)
{
time = posix::time_match(re, text, icase);
r.posix_time = time;
std::cout << "\tPOSIX regex: " << time << "s\n";
}
#endif
#ifdef BOOST_HAS_PCRE
if(time_pcre == true)
{
time = pcr::time_match(re, text, icase);
r.pcre_time = time;
std::cout << "\tPCRE regex: " << time << "s\n";
}
#endif
r.finalise();
result_list.push_back(r);
}
void test_find_all(const std::string& re, const std::string& text, const std::string& description, bool icase)
{
std::cout << "Testing: " << re << std::endl;
double time;
results r(re, description);
#ifdef BOOST_HAS_GRETA
if(time_greta == true)
{
time = g::time_find_all(re, text, icase);
r.greta_time = time;
std::cout << "\tGRETA regex: " << time << "s\n";
}
if(time_safe_greta == true)
{
time = gs::time_find_all(re, text, icase);
r.safe_greta_time = time;
std::cout << "\tSafe GRETA regex: " << time << "s\n";
}
#endif
if(time_boost == true)
{
time = b::time_find_all(re, text, icase);
r.boost_time = time;
std::cout << "\tBoost regex: " << time << "s\n";
}
if(time_localised_boost == true)
{
time = bl::time_find_all(re, text, icase);
r.localised_boost_time = time;
std::cout << "\tBoost regex (C++ locale): " << time << "s\n";
}
#ifdef BOOST_HAS_POSIX
if(time_posix == true)
{
time = posix::time_find_all(re, text, icase);
r.posix_time = time;
std::cout << "\tPOSIX regex: " << time << "s\n";
}
#endif
#ifdef BOOST_HAS_PCRE
if(time_pcre == true)
{
time = pcr::time_find_all(re, text, icase);
r.pcre_time = time;
std::cout << "\tPCRE regex: " << time << "s\n";
}
#endif
r.finalise();
result_list.push_back(r);
}
int cpp_main(int argc, char * argv[])
{
// start by processing the command line args:
if(argc < 2)
return show_usage();
int result = 0;
for(int c = 1; c < argc; ++c)
{
result += handle_argument(argv[c]);
}
if(result)
return result;
if(test_matches)
{
// start with a simple test, this is basically a measure of the minimal overhead
// involved in calling a regex matcher:
test_match("abc", "abc");
// these are from the regex docs:
test_match("^([0-9]+)(\\-| |$)(.*)$", "100- this is a line of ftp response which contains a message string");
test_match("([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}", "1234-5678-1234-456");
// these are from http://www.regxlib.com/
test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "john_maddock@compuserve.com");
test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "foo12@foo.edu");
test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "bob.smith@foo.tv");
test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "EH10 2QQ");
test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "G1 1AA");
test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "SW1 1ZZ");
test_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "4/1/2001");
test_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "12/12/2001");
test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "123");
test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "+3.14159");
test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "-3.14159");
}
output_html_results(true, "%short_matches%");
std::string file_contents;
if(test_code)
{
load_file(file_contents, "../../../boost/crc.hpp");
const char* highlight_expression = // preprocessor directives: index 1
"(^[ \t]*#(?:[^\\\\\\n]|\\\\[^\\n_[:punct:][:alnum:]]*[\\n[:punct:][:word:]])*)|"
// comment: index 2
"(//[^\\n]*|/\\*.*?\\*/)|"
// literals: index 3
"\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
// string literals: index 4
"('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
// keywords: index 5
"\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
"|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
"|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
"|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
"|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
"|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
"|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
"|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
"|using|virtual|void|volatile|wchar_t|while)\\>"
;
const char* class_expression = "^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
"(class|struct)[[:space:]]*(\\<\\w+\\>([ \t]*\\([^)]*\\))?"
"[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?"
"(\\{|:[^;\\{()]*\\{)";
const char* include_expression = "^[ \t]*#[ \t]*include[ \t]+(\"[^\"]+\"|<[^>]+>)";
const char* boost_include_expression = "^[ \t]*#[ \t]*include[ \t]+(\"boost/[^\"]+\"|<boost/[^>]+>)";
test_find_all(class_expression, file_contents);
test_find_all(highlight_expression, file_contents);
test_find_all(include_expression, file_contents);
test_find_all(boost_include_expression, file_contents);
}
output_html_results(false, "%code_search%");
if(test_html)
{
load_file(file_contents, "../../../libs/libraries.htm");
test_find_all("beman|john|dave", file_contents, true);
test_find_all("<p>.*?</p>", file_contents, true);
test_find_all("<a[^>]+href=(\"[^\"]*\"|[^[:space:]]+)[^>]*>", file_contents, true);
test_find_all("<h[12345678][^>]*>.*?</h[12345678]>", file_contents, true);
test_find_all("<img[^>]+src=(\"[^\"]*\"|[^[:space:]]+)[^>]*>", file_contents, true);
test_find_all("<font[^>]+face=(\"[^\"]*\"|[^[:space:]]+)[^>]*>.*?</font>", file_contents, true);
}
output_html_results(false, "%html_search%");
if(test_short_twain)
{
load_file(file_contents, "short_twain.txt");
test_find_all("Twain", file_contents);
test_find_all("Huck[[:alpha:]]+", file_contents);
test_find_all("[[:alpha:]]+ing", file_contents);
test_find_all("^[^\n]*?Twain", file_contents);
test_find_all("Tom|Sawyer|Huckleberry|Finn", file_contents);
test_find_all("(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)", file_contents);
}
output_html_results(false, "%short_twain_search%");
if(test_long_twain)
{
load_file(file_contents, "mtent13.txt");
test_find_all("Twain", file_contents);
test_find_all("Huck[[:alpha:]]+", file_contents);
test_find_all("[[:alpha:]]+ing", file_contents);
test_find_all("^[^\n]*?Twain", file_contents);
test_find_all("Tom|Sawyer|Huckleberry|Finn", file_contents);
time_posix = false;
test_find_all("(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)", file_contents);
time_posix = true;
}
output_html_results(false, "%long_twain_search%");
output_final_html();
return 0;
}


@ -0,0 +1,136 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* All rights reserved.
* May not be transferred or disclosed to a third party without
* prior consent of the author.
*
*/
#ifndef REGEX_COMPARISON_HPP
#define REGEX_COMPARISON_HPP
#include <string>
#include <list>
#include <boost/limits.hpp>
//
// globals:
//
extern bool time_boost;
extern bool time_localised_boost;
extern bool time_greta;
extern bool time_safe_greta;
extern bool time_posix;
extern bool time_pcre;
extern bool test_matches;
extern bool test_short_twain;
extern bool test_long_twain;
extern bool test_code;
extern bool test_html;
extern std::string html_template_file;
extern std::string html_out_file;
extern std::string html_contents;
int handle_argument(const std::string& what);
int show_usage();
void load_file(std::string& text, const char* file);
void output_html_results(bool show_description, const std::string& tagname);
void output_final_html();
struct results
{
double boost_time;
double localised_boost_time;
double greta_time;
double safe_greta_time;
double posix_time;
double pcre_time;
double factor;
std::string expression;
std::string description;
results(const std::string& ex, const std::string& desc)
: boost_time(-1),
localised_boost_time(-1),
greta_time(-1),
safe_greta_time(-1),
posix_time(-1),
pcre_time(-1),
factor(std::numeric_limits<double>::max()),
expression(ex),
description(desc)
{}
void finalise()
{
if((boost_time >= 0) && (boost_time < factor))
factor = boost_time;
if((localised_boost_time >= 0) && (localised_boost_time < factor))
factor = localised_boost_time;
if((greta_time >= 0) && (greta_time < factor))
factor = greta_time;
if((safe_greta_time >= 0) && (safe_greta_time < factor))
factor = safe_greta_time;
if((posix_time >= 0) && (posix_time < factor))
factor = posix_time;
if((pcre_time >= 0) && (pcre_time < factor))
factor = pcre_time;
}
};
extern std::list<results> result_list;
namespace b {
// boost tests:
double time_match(const std::string& re, const std::string& text, bool icase);
double time_find_all(const std::string& re, const std::string& text, bool icase);
}
namespace bl {
// localised boost tests:
double time_match(const std::string& re, const std::string& text, bool icase);
double time_find_all(const std::string& re, const std::string& text, bool icase);
}
namespace pcr {
// pcre tests:
double time_match(const std::string& re, const std::string& text, bool icase);
double time_find_all(const std::string& re, const std::string& text, bool icase);
}
namespace g {
// greta tests:
double time_match(const std::string& re, const std::string& text, bool icase);
double time_find_all(const std::string& re, const std::string& text, bool icase);
}
namespace gs {
// safe greta tests:
double time_match(const std::string& re, const std::string& text, bool icase);
double time_find_all(const std::string& re, const std::string& text, bool icase);
}
namespace posix {
// posix tests:
double time_match(const std::string& re, const std::string& text, bool icase);
double time_find_all(const std::string& re, const std::string& text, bool icase);
}
void test_match(const std::string& re, const std::string& text, const std::string& description, bool icase = false);
void test_find_all(const std::string& re, const std::string& text, const std::string& description, bool icase = false);
inline void test_match(const std::string& re, const std::string& text, bool icase = false)
{ test_match(re, text, text, icase); }
inline void test_find_all(const std::string& re, const std::string& text, bool icase = false)
{ test_find_all(re, text, "", icase); }
#define REPEAT_COUNT 10
#endif
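
The `finalise()` member above selects the smallest non-negative time as `factor`, with -1 marking a library that was not timed. A minimal sketch of that selection follows. The helpers `smallest_valid_time` and `relative_time` are hypothetical names, not part of the benchmark. Dividing each time by the factor is an assumption about how the value is used, since the reporting code is not shown here.

```cpp
#include <limits>
#include <vector>

// Hypothetical helper, not part of regex_comparison.hpp: returns the
// smallest non-negative entry, or numeric_limits<double>::max() when
// every entry is negative (no library was timed).
inline double smallest_valid_time(const std::vector<double>& times)
{
   double factor = std::numeric_limits<double>::max();
   for(std::vector<double>::size_type i = 0; i < times.size(); ++i)
   {
      if((times[i] >= 0) && (times[i] < factor))
         factor = times[i];
   }
   return factor;
}

// Assumed use of the factor: express a time relative to the fastest
// library. A negative input (library skipped) is passed through unchanged.
inline double relative_time(double t, double factor)
{
   return (t < 0) ? t : t / factor;
}
```

With times of -1, 2.0 and 0.5 the factor is 0.5, so the 2.0 entry would be reported as four times slower than the fastest library.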


@ -0,0 +1,98 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include "regex_comparison.hpp"
#include <algorithm> // std::min
#include <boost/timer.hpp>
#include <boost/regex.hpp>
namespace b{
double time_match(const std::string& re, const std::string& text, bool icase)
{
boost::regex e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
boost::smatch what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_match(text, what, e);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_match(text, what, e);
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
bool dummy_grep_proc(const boost::smatch&)
{ return true; }
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
boost::regex e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
boost::smatch what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_grep(&dummy_grep_proc, text, e);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
if(result >10)
return result / iter;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_grep(&dummy_grep_proc, text, e);
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
}
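
Every timing function in this commit follows the same pattern: double the iteration count until one batch takes at least 0.5 seconds, then rerun that batch `REPEAT_COUNT` times and report the least time per call. The sketch below isolates that pattern under stated assumptions. `std::chrono::steady_clock` stands in for `boost::timer`, and `time_callable` with its parameters is a hypothetical name, not something defined in the benchmark.

```cpp
#include <algorithm>
#include <chrono>

// Sketch of the calibrate-then-repeat timing loop. std::chrono replaces
// boost::timer here; time_callable and its parameters are hypothetical.
template <class F>
double time_callable(F f, double min_seconds, int repeat_count)
{
   typedef std::chrono::steady_clock clock;
   long iter = 1;
   double result = 0;
   // Calibration: double the batch size until one batch takes at
   // least min_seconds.
   do
   {
      clock::time_point start = clock::now();
      for(long counter = 0; counter < iter; ++counter)
         f();
      result = std::chrono::duration<double>(clock::now() - start).count();
      iter *= 2;
   }while(result < min_seconds);
   iter /= 2; // undo the final doubling: iter is the batch size just timed
   // Repeat the calibrated batch and keep the smallest elapsed time.
   for(int repeats = 0; repeats < repeat_count; ++repeats)
   {
      clock::time_point start = clock::now();
      for(long counter = 0; counter < iter; ++counter)
         f();
      double run = std::chrono::duration<double>(clock::now() - start).count();
      result = std::min(run, result);
   }
   return result / iter; // seconds per call
}
```

Taking the minimum rather than the mean discards runs disturbed by other activity on the machine, which is the consistency the source comments refer to. The callable must do observable work, otherwise an optimiser may remove the loop and calibration will not terminate in reasonable time.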

125
performance/time_greta.cpp Normal file

@ -0,0 +1,125 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include "regex_comparison.hpp"
#if defined(BOOST_HAS_GRETA)
#include <algorithm> // std::min
#include <cassert>
#include <boost/timer.hpp>
#include "regexpr2.h"
namespace g{
double time_match(const std::string& re, const std::string& text, bool icase)
{
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE));
regex::match_results what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
assert(e.match(text, what));
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text, what);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text, what);
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE));
regex::match_results what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text.begin(), text.end(), what);
while(what.backref(0).matched)
{
e.match(what.backref(0).end(), text.end(), what);
}
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
if(result > 10)
return result / iter;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text.begin(), text.end(), what);
while(what.backref(0).matched)
{
e.match(what.backref(0).end(), text.end(), what);
}
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
}
#else
namespace g {
double time_match(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
}
#endif


@ -0,0 +1,98 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include "regex_comparison.hpp"
#include <algorithm> // std::min
#include <boost/timer.hpp>
#include <boost/regex.hpp>
namespace bl{
double time_match(const std::string& re, const std::string& text, bool icase)
{
boost::reg_expression<char, boost::cpp_regex_traits<char> > e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
boost::smatch what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_match(text, what, e);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_match(text, what, e);
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
bool dummy_grep_proc(const boost::smatch&)
{ return true; }
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
boost::reg_expression<char, boost::cpp_regex_traits<char> > e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
boost::smatch what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_grep(&dummy_grep_proc, text, e);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
if(result >10)
return result / iter;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
boost::regex_grep(&dummy_grep_proc, text, e);
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
}

180
performance/time_pcre.cpp Normal file

@ -0,0 +1,180 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include <algorithm> // std::min
#include <cassert>
#include <cfloat>
#include <cstdlib> // free
#include "regex_comparison.hpp"
#ifdef BOOST_HAS_PCRE
#include "pcre.h"
#include <boost/timer.hpp>
namespace pcr{
double time_match(const std::string& re, const std::string& text, bool icase)
{
pcre *ppcre;
const char *error;
int erroffset;
int what[50];
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
if(0 == (ppcre = pcre_compile(re.c_str(), (icase ? PCRE_CASELESS | PCRE_ANCHORED | PCRE_DOTALL | PCRE_MULTILINE : PCRE_ANCHORED | PCRE_DOTALL | PCRE_MULTILINE),
&error, &erroffset, NULL)))
{
free(ppcre);
return -1;
}
pcre_extra *pe;
pe = pcre_study(ppcre, 0, &error);
if(error)
{
free(ppcre);
free(pe);
return -1;
}
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
}
run = tim.elapsed();
result = std::min(run, result);
}
free(ppcre);
free(pe);
return result / iter;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
pcre *ppcre;
const char *error;
int erroffset;
int what[50];
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
int exec_result;
int matches;
if(0 == (ppcre = pcre_compile(re.c_str(), (icase ? PCRE_CASELESS | PCRE_DOTALL | PCRE_MULTILINE : PCRE_DOTALL | PCRE_MULTILINE), &error, &erroffset, NULL)))
{
free(ppcre);
return -1;
}
pcre_extra *pe;
pe = pcre_study(ppcre, 0, &error);
if(error)
{
free(ppcre);
free(pe);
return -1;
}
do
{
int startoff;
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
matches = 0;
startoff = 0;
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
while(exec_result >= 0)
{
++matches;
startoff = what[1];
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
}
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
if(result >10)
{
// release the compiled pattern and study data before the early return:
free(ppcre);
free(pe);
return result / iter;
}
result = DBL_MAX;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
int startoff;
matches = 0;
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
matches = 0;
startoff = 0;
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
while(exec_result >= 0)
{
++matches;
startoff = what[1];
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
}
}
run = tim.elapsed();
result = std::min(run, result);
}
// release the compiled pattern and study data, as time_match does:
free(ppcre);
free(pe);
return result / iter;
}
}
#else
namespace pcr{
double time_match(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
}
#endif

143
performance/time_posix.cpp Normal file

@ -0,0 +1,143 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include <algorithm> // std::min
#include <cassert>
#include <cfloat>
#include <cstring> // memset
#include "regex_comparison.hpp"
#ifdef BOOST_HAS_POSIX
#include <boost/timer.hpp>
#include "regex.h"
namespace posix{
double time_match(const std::string& re, const std::string& text, bool icase)
{
regex_t e;
regmatch_t what[20];
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
if(0 != regcomp(&e, re.c_str(), (icase ? REG_ICASE | REG_EXTENDED : REG_EXTENDED)))
return -1;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
regexec(&e, text.c_str(), e.re_nsub, what, 0);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
regexec(&e, text.c_str(), e.re_nsub, what, 0);
}
run = tim.elapsed();
result = std::min(run, result);
}
regfree(&e);
return result / iter;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
regex_t e;
regmatch_t what[20];
memset(what, 0, sizeof(what));
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
int exec_result;
int matches;
if(0 != regcomp(&e, re.c_str(), (icase ? REG_ICASE | REG_EXTENDED : REG_EXTENDED)))
return -1;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
what[0].rm_so = 0;
what[0].rm_eo = text.size();
matches = 0;
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
while(exec_result == 0)
{
++matches;
what[0].rm_so = what[0].rm_eo;
what[0].rm_eo = text.size();
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
}
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
if(result >10)
{
// release the compiled expression before the early return:
regfree(&e);
return result / iter;
}
result = DBL_MAX;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
what[0].rm_so = 0;
what[0].rm_eo = text.size();
matches = 0;
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
while(exec_result == 0)
{
++matches;
what[0].rm_so = what[0].rm_eo;
what[0].rm_eo = text.size();
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
}
}
run = tim.elapsed();
result = std::min(run, result);
}
// release the compiled expression, as time_match does:
regfree(&e);
return result / iter;
}
}
#else
namespace posix{
double time_match(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
}
#endif
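
The find-all loop above relies on `REG_STARTEND`, which is an extension and is not available in every POSIX regex implementation. The sketch below shows a find-all loop that advances the buffer pointer instead. It assumes the system `<regex.h>` rather than `boost/regex.h`, and `count_matches` is a hypothetical helper that is not part of the benchmark.

```cpp
#include <regex.h>   // system POSIX header, assumed available (not boost/regex.h)
#include <string>

// Hypothetical helper: counts non-overlapping matches of an extended
// POSIX expression in text, or returns -1 if the expression fails to compile.
inline int count_matches(const std::string& re, const std::string& text)
{
   regex_t e;
   if(0 != regcomp(&e, re.c_str(), REG_EXTENDED))
      return -1;
   int matches = 0;
   const char* p = text.c_str();
   regmatch_t what[1];
   int flags = 0;
   while(0 == regexec(&e, p, 1, what, flags))
   {
      ++matches;
      if(what[0].rm_eo == what[0].rm_so)
      {
         // Empty match: step one character forward so the loop terminates.
         if(p[what[0].rm_eo] == '\0')
            break;
         p += what[0].rm_eo + 1;
      }
      else
      {
         p += what[0].rm_eo;
      }
      flags = REG_NOTBOL; // p no longer points at the start of the text
   }
   regfree(&e); // release the compiled expression on every path
   return matches;
}
```

For example, `count_matches("[0-9]+", "a1 b22 c333")` visits the three digit runs in turn. The empty-match branch guards against an expression such as `a*` looping forever at one position.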


@ -0,0 +1,127 @@
/*
*
* Copyright (c) 2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
#include "regex_comparison.hpp"
#if defined(BOOST_HAS_GRETA)
#include <algorithm> // std::min
#include <cassert>
#include <boost/timer.hpp>
#include "regexpr2.h"
namespace gs{
double time_match(const std::string& re, const std::string& text, bool icase)
{
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE), regex::MODE_SAFE);
regex::match_results what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
assert(e.match(text, what));
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text, what);
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text, what);
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE), regex::MODE_SAFE);
regex::match_results what;
boost::timer tim;
int iter = 1;
int counter, repeats;
double result = 0;
double run;
do
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text.begin(), text.end(), what);
while(what.backref(0).matched)
{
e.match(what.backref(0).end(), text.end(), what);
}
}
result = tim.elapsed();
iter *= 2;
}while(result < 0.5);
iter /= 2;
if(result > 10)
return result / iter;
// repeat test and report least value for consistency:
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
{
tim.restart();
for(counter = 0; counter < iter; ++counter)
{
e.match(text.begin(), text.end(), what);
while(what.backref(0).matched)
{
e.match(what.backref(0).end(), text.end(), what);
}
}
run = tim.elapsed();
result = std::min(run, result);
}
return result / iter;
}
}
#else
namespace gs{
double time_match(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
double time_find_all(const std::string& re, const std::string& text, bool icase)
{
return -1;
}
}
#endif


@ -1,314 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, POSIX API Reference</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, POSIX API
Reference. </h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="posix"></a><i>POSIX compatibility library</i></h3>
<pre>#include &lt;boost/cregex.hpp&gt;
<i>or</i>:
#include &lt;boost/regex.h&gt;</pre>
<p>The following functions are available for users who need a
POSIX compatible C library. They are available in both Unicode
and narrow character versions; the standard POSIX API names are
macros that expand to one version or the other, depending upon
whether UNICODE is defined. </p>
<p><b>Important</b>: Note that all the symbols defined here are
enclosed inside namespace <i>boost</i> when used in C++ programs,
unless you use #include &lt;boost/regex.h&gt; instead - in which
case the symbols are still defined in namespace boost, but are
made available in the global namespace as well.</p>
<p>The functions are defined as: </p>
<pre>extern &quot;C&quot; {
<b>int</b> regcompA(regex_tA*, <b>const</b> <b>char</b>*, <b>int</b>);
<b>unsigned</b> <b>int</b> regerrorA(<b>int</b>, <b>const</b> regex_tA*, <b>char</b>*, <b>unsigned</b> <b>int</b>);
<b>int</b> regexecA(<b>const</b> regex_tA*, <b>const</b> <b>char</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
<b>void</b> regfreeA(regex_tA*);
<b>int</b> regcompW(regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>int</b>);
<b>unsigned</b> <b>int</b> regerrorW(<b>int</b>, <b>const</b> regex_tW*, <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>);
<b>int</b> regexecW(<b>const</b> regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
<b>void</b> regfreeW(regex_tW*);
#ifdef UNICODE
#define regcomp regcompW
#define regerror regerrorW
#define regexec regexecW
#define regfree regfreeW
#define regex_t regex_tW
#else
#define regcomp regcompA
#define regerror regerrorA
#define regexec regexecA
#define regfree regfreeA
#define regex_t regex_tA
#endif
}</pre>
<p>All the functions operate on structure <b>regex_t</b>, which
exposes two public members: </p>
<p><b>unsigned int re_nsub</b> this is filled in by <b>regcomp</b>
and indicates the number of sub-expressions contained in the
regular expression. </p>
<p><b>const TCHAR* re_endp</b> points to the end of the
expression to compile when the flag REG_PEND is set. </p>
<p><i>Footnote: regex_t is actually a #define: it is either
regex_tA or regex_tW depending upon whether UNICODE is defined.
Likewise, TCHAR is either char or wchar_t depending upon the
macro UNICODE.</i> </p>
<p><b>regcomp</b> takes a pointer to a <b>regex_t</b>, a pointer
to the expression to compile and a flags parameter which can be a
combination of: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_EXTENDED</td>
<td valign="top" width="45%">Compiles modern regular
expressions. Equivalent to regbase::char_classes |
regbase::intervals | regbase::bk_refs.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_BASIC</td>
<td valign="top" width="45%">Compiles basic (obsolete)
regular expression syntax. Equivalent to regbase::char_classes
| regbase::intervals | regbase::limited_ops | regbase::bk_braces
| regbase::bk_parens | regbase::bk_refs.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NOSPEC</td>
<td valign="top" width="45%">All characters are ordinary,
the expression is a literal string.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_ICASE</td>
<td valign="top" width="45%">Compiles for matching that
ignores character case.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NOSUB</td>
<td valign="top" width="45%">Has no effect in this
library.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NEWLINE</td>
<td valign="top" width="45%">When this flag is set a dot
does not match the newline character.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_PEND</td>
<td valign="top" width="45%">When this flag is set the
re_endp parameter of the regex_t structure must point to
the end of the regular expression to compile.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NOCOLLATE</td>
<td valign="top" width="45%">When this flag is set then
locale dependent collation for character ranges is turned
off.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_ESCAPE_IN_LISTS</td>
<td valign="top" width="45%">When this flag is set, then
escape sequences are permitted in bracket expressions (character
sets).</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NEWLINE_ALT&nbsp;</td>
<td valign="top" width="45%">When this flag is set then
the newline character is equivalent to the alternation
operator |.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_PERL&nbsp;</td>
<td valign="top" width="45%">&nbsp;A shortcut for perl-like
behavior: REG_EXTENDED | REG_NOCOLLATE |
REG_ESCAPE_IN_LISTS</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_AWK</td>
<td valign="top" width="45%">A shortcut for awk-like
behavior: REG_EXTENDED | REG_ESCAPE_IN_LISTS</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_GREP</td>
<td valign="top" width="45%">A shortcut for grep like
behavior: REG_BASIC | REG_NEWLINE_ALT</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_EGREP</td>
<td valign="top" width="45%">&nbsp;A shortcut for egrep
like behavior: REG_EXTENDED | REG_NEWLINE_ALT</td>
<td width="5%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><b>regerror</b> maps an error code to a human readable
string. It takes the following parameters: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="50%">int code</td>
<td valign="top" width="50%">The error code.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">const regex_t* e</td>
<td valign="top" width="50%">The regular expression (can
be null).</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">char* buf</td>
<td valign="top" width="50%">The buffer to fill in with
the error message.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">unsigned int buf_size</td>
<td valign="top" width="50%">The length of buf.</td>
<td>&nbsp;</td>
</tr>
</table>
<p>If the error code is OR'ed with REG_ITOA then the message that
results is the printable name of the code rather than a message,
for example &quot;REG_BADPAT&quot;. If the code is REG_ATOI then <b>e</b>
must not be null and <b>e-&gt;re_endp</b> must point to the
printable name of an error code; the return value is then the
value of the error code. For any other value of <b>code</b>, the
return value is the number of characters in the error message. If
the return value is greater than or equal to <b>buf_size</b> then
<b>regerror</b> will have to be called again with a larger buffer.</p>
<p><b>regexec</b> finds the first occurrence of expression <b>e</b>
within string <b>buf</b>. If <b>len</b> is non-zero then *<b>m</b>
is filled in with what matched the regular expression: <b>m[0]</b>
contains what matched the whole expression, <b>m[1] </b>the first sub-expression
and so on; see <b>regmatch_t</b> in the header file declaration for
more details. The <b>eflags</b> parameter can be a combination of:
<br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="50%">REG_NOTBOL</td>
<td valign="top" width="50%">Parameter <b>buf </b>does
not represent the start of a line.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">REG_NOTEOL</td>
<td valign="top" width="50%">Parameter <b>buf</b> does
not terminate at the end of a line.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">REG_STARTEND</td>
<td valign="top" width="50%">The string searched starts
at buf + pmatch[0].rm_so and ends at buf + pmatch[0].rm_eo.</td>
<td>&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p>Finally <b>regfree</b> frees all the memory that was allocated
by regcomp. </p>
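<p>The compile, search and free cycle can be sketched as follows. This is a hedged sketch against the platform's POSIX <code>&lt;regex.h&gt;</code> (an assumption; this library's own versions and the REG_STARTEND extension are not used), and <code>find_first</code> is a hypothetical helper name.</p>

```cpp
#include <regex.h>   // POSIX API (assumed: the system header)
#include <cassert>
#include <vector>

// Hypothetical helper: compile `pattern`, search `text`, and return the
// [rm_so, rm_eo) offsets of the whole match and of each sub-expression.
// Returns an empty vector if the pattern is invalid or nothing matches.
std::vector<regmatch_t> find_first(const char* pattern, const char* text,
                                   int eflags = 0)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return {};
    std::vector<regmatch_t> m(re.re_nsub + 1);   // slot 0 is the whole match
    int rc = regexec(&re, text, m.size(), m.data(), eflags);
    regfree(&re);                                // release what regcomp allocated
    if (rc != 0)
        m.clear();
    return m;
}
```

<p>For example, <code>find_first("p(a)rameter", "ABCparameterXYZ")</code> yields offsets 3 and 12 for the whole match and 4 and 5 for the first sub-expression, while passing REG_NOTBOL stops &quot;^&quot; from matching at the start of the text.</p>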
<p><i>Footnote: this is an abridged reference to the POSIX API
functions. It is provided for compatibility with other libraries,
rather than as an API to be used in new code (unless you need access
from a language other than C++). This version of these functions
should also happily coexist with other versions, as the names
used are macros that expand to the actual function names.</i> <br>
</p>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>

@ -1,742 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, Regular Expression Syntax</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Regular
Expression Syntax.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="syntax"></a><i>Regular expression syntax</i></h3>
<p>This section covers the regular expression syntax used by this
library. This is a programmer's guide: the actual syntax presented
to your program's users will depend upon the flags used during
expression compilation. </p>
<p><i>Literals</i> </p>
<p>All characters are literals except: &quot;.&quot;, &quot;|&quot;,
&quot;*&quot;, &quot;?&quot;, &quot;+&quot;, &quot;(&quot;,
&quot;)&quot;, &quot;{&quot;, &quot;}&quot;, &quot;[&quot;,
&quot;]&quot;, &quot;^&quot;, &quot;$&quot; and &quot;\&quot;.
These characters are literals when preceded by a &quot;\&quot;. A
literal is a character that matches itself, or matches the result
of traits_type::translate(), where traits_type is the traits
template parameter to class reg_expression. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Wildcard</i> </p>
<p>The dot character &quot;.&quot; matches any single character,
with two exceptions: when <i>match_not_dot_null</i> is passed to the matching
algorithms, the dot does not match a null character; when <i>match_not_dot_newline</i>
is passed to the matching algorithms, the dot does not match
a newline character. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Repeats</i> </p>
<p>A repeat is an expression that is repeated an arbitrary number
of times. An expression followed by &quot;*&quot; can be repeated
any number of times including zero. An expression followed by
&quot;+&quot; can be repeated any number of times, but at least
once; if the expression is compiled with the flag regbase::bk_plus_qm
then &quot;+&quot; is an ordinary character and &quot;\+&quot;
represents a repeat of once or more. An expression followed by
&quot;?&quot; may be repeated zero or one times only; if the
expression is compiled with the flag regbase::bk_plus_qm then
&quot;?&quot; is an ordinary character and &quot;\?&quot;
represents the repeat zero or once operator. When it is necessary
to specify the minimum and maximum number of repeats explicitly,
the bounds operator &quot;{}&quot; may be used: thus &quot;a{2}&quot;
is the letter &quot;a&quot; repeated exactly twice, &quot;a{2,4}&quot;
represents the letter &quot;a&quot; repeated between 2 and 4
times, and &quot;a{2,}&quot; represents the letter &quot;a&quot;
repeated at least twice with no upper limit. Note that there must
be no white-space inside the {}, and there is no upper limit on
the values of the lower and upper bounds. When the expression is
compiled with the flag regbase::bk_braces then &quot;{&quot; and
&quot;}&quot; are ordinary characters and &quot;\{&quot; and
&quot;\}&quot; are used to delimit bounds instead. All repeat
expressions refer to the shortest possible previous sub-expression:
a single character, a character set, or a sub-expression grouped
with &quot;()&quot; for example. </p>
<p>Examples: </p>
<p>&quot;ba*&quot; will match all of &quot;b&quot;, &quot;ba&quot;,
&quot;baaa&quot; etc. </p>
<p>&quot;ba+&quot; will match &quot;ba&quot; or &quot;baaaa&quot;
for example but not &quot;b&quot;. </p>
<p>&quot;ba?&quot; will match &quot;b&quot; or &quot;ba&quot;. </p>
<p>&quot;ba{2,4}&quot; will match &quot;baa&quot;, &quot;baaa&quot;
and &quot;baaaa&quot;. </p>
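<p>The repeat examples above can be checked with a short program. This is a minimal sketch that assumes std::regex from the C++ standard library (ECMAScript grammar) as a stand-in for this library's own classes, so flags such as regbase::bk_plus_qm are not shown; the function names are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern`.
bool matches_all(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

void repeat_examples()
{
    // "*": zero or more repeats of the preceding "a".
    assert(matches_all("ba*", "b") && matches_all("ba*", "baaa"));
    // "+": at least one repeat.
    assert(matches_all("ba+", "baaaa") && !matches_all("ba+", "b"));
    // "?": zero or one repeat.
    assert(matches_all("ba?", "b") && matches_all("ba?", "ba"));
    assert(!matches_all("ba?", "baa"));
    // "{2,4}": between two and four repeats.
    assert(matches_all("ba{2,4}", "baa") && matches_all("ba{2,4}", "baaaa"));
    assert(!matches_all("ba{2,4}", "ba") && !matches_all("ba{2,4}", "baaaaa"));
}
```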
<p><i>Non-greedy repeats</i> </p>
<p>Whenever the &quot;extended&quot; regular expression syntax is
in use (the default) then non-greedy repeats are possible by
appending a '?' after the repeat; a non-greedy repeat is one
which will match the <i>shortest</i> possible string. </p>
<p>For example to match html tag pairs one could use something
like: </p>
<p>&quot;&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;&quot;
</p>
<p>In this case $1 will contain the text between the tag pairs,
and will be the shortest possible matching string. <br>
&nbsp; <br>
&nbsp; </p>
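<p>The difference between a greedy and a non-greedy repeat in the tag example can be sketched as follows. This is a hedged sketch using std::regex from the C++ standard library rather than this library's classes; the tag name &quot;b&quot; and the helper <code>first_capture</code> are illustrative assumptions.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of sub-expression 1 for the first match,
// or an empty string if there is no match.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m[1].str();
    return std::string();
}

void non_greedy_examples()
{
    const std::string html = "<b>one</b> and <b>two</b>";
    // Non-greedy ".*?": stops at the first closing tag.
    assert(first_capture("<\\s*b[^>]*>(.*?)<\\s*/b\\s*>", html) == "one");
    // Greedy ".*": runs on to the last closing tag.
    assert(first_capture("<\\s*b[^>]*>(.*)<\\s*/b\\s*>", html)
           == "one</b> and <b>two");
}
```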
<p><i>Parentheses</i> </p>
<p>Parentheses serve two purposes: to group items together into a
sub-expression, and to mark what generated the match. For example
the expression &quot;(ab)*&quot; would match all of the string
&quot;ababab&quot;. The matching algorithms <a
href="template_class_ref.htm#query_match">regex_match</a> and <a
href="template_class_ref.htm#reg_search">regex_search</a> each
take an instance of <a href="template_class_ref.htm#reg_match">match_results</a>
that reports what caused the match; on exit from these functions
the <a href="template_class_ref.htm#reg_match">match_results</a>
contains information both on what the whole expression matched
and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the
final &quot;ab&quot; of the matching string. It is permissible
for sub-expressions to match null strings. If a sub-expression
takes no part in a match - for example if it is part of an
alternative that is not taken - then both of the iterators that
are returned for that sub-expression point to the end of the
input string, and the <i>matched</i> member for that sub-expression
is <i>false</i>. Sub-expressions are indexed from left to right
starting from 1; sub-expression 0 is the whole expression. </p>
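<p>How sub-expressions are reported can be sketched as follows. This is a minimal sketch that assumes std::regex and std::smatch from the C++ standard library in place of this library's regex_match and match_results; the function name is hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

void sub_expression_examples()
{
    std::smatch m;
    const std::string text = "ababab";
    assert(std::regex_match(text, m, std::regex("(ab)*")));
    assert(m[0].str() == "ababab");   // sub-expression 0: the whole match
    assert(m.position(1) == 4);       // sub-expression 1: the final "ab"
    assert(m[1].str() == "ab");

    // A sub-expression in an alternative that was not taken is unmatched.
    const std::string other = "b";
    assert(std::regex_match(other, m, std::regex("(a)|(b)")));
    assert(!m[1].matched);
    assert(m[2].matched && m[2].str() == "b");
}
```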
<p><i>Non-Marking Parentheses</i> </p>
<p>Sometimes you need to group sub-expressions with parentheses,
but don't want the parentheses to generate another marked sub-expression;
in this case non-marking parentheses (?:expression) can be used.
For example the following expression creates no sub-expressions: </p>
<p>&quot;(?:abc)*&quot;</p>
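<p>The effect on the number of marked sub-expressions can be checked directly. This is a hedged sketch using std::regex from the C++ standard library, whose <code>mark_count()</code> member is assumed here as the stand-in for this library's equivalent; the function name is hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

void non_marking_examples()
{
    // Ordinary parentheses create one marked sub-expression...
    assert(std::regex("(abc)*").mark_count() == 1u);
    // ...non-marking parentheses create none.
    assert(std::regex("(?:abc)*").mark_count() == 0u);

    std::smatch m;
    const std::string text = "abcabc";
    assert(std::regex_match(text, m, std::regex("(?:abc)*")));
    assert(m.size() == 1u);   // only sub-expression 0, the whole match
}
```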
<p><em>Forward Lookahead Asserts</em>&nbsp; </p>
<p>There are two forms of these; one for positive forward
lookahead asserts, and one for negative lookahead asserts:</p>
<p>&quot;(?=abc)&quot; matches zero characters only if they are
followed by the expression &quot;abc&quot;.</p>
<p>&quot;(?!abc)&quot; matches zero characters only if they are
not followed by the expression &quot;abc&quot;.</p>
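<p>Both forms of lookahead can be sketched as follows. This is a minimal sketch that assumes std::regex from the C++ standard library rather than this library's classes; <code>found</code> is a hypothetical helper and the leading &quot;x&quot; is only there to give the assert something to follow.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the text matched by `pattern` in `text`,
// or "<none>" if there is no match.
std::string found(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str();
    return "<none>";
}

void lookahead_examples()
{
    // Positive lookahead: "x" only when followed by "abc"; the "abc"
    // itself is not part of the match.
    assert(found("x(?=abc)", "xabc") == "x");
    assert(found("x(?=abc)", "xabd") == "<none>");
    // Negative lookahead: "x" only when not followed by "abc".
    assert(found("x(?!abc)", "xabc") == "<none>");
    assert(found("x(?!abc)", "xabd") == "x");
}
```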
<p><i>Alternatives</i> </p>
<p>Alternatives occur when the expression can match either one
sub-expression or another. Each alternative is separated by a
&quot;|&quot;, or a &quot;\|&quot; if the flag regbase::bk_vbar
is set, or by a newline character if the flag regbase::newline_alt
is set. Each alternative is the largest possible previous sub-expression;
this is the opposite behaviour from repetition operators. </p>
<p>Examples: </p>
<p>&quot;a(b|c)&quot; could match &quot;ab&quot; or &quot;ac&quot;.
</p>
<p>&quot;abc|def&quot; could match &quot;abc&quot; or &quot;def&quot;.
<br>
&nbsp; <br>
&nbsp; </p>
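<p>The alternation examples, and the point that each alternative is the largest possible sub-expression, can be sketched as follows. This is a hedged sketch using std::regex from the C++ standard library rather than this library's classes; the function names are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern`.
bool matches_all(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

void alternative_examples()
{
    // Inside a group the alternatives are just "b" and "c".
    assert(matches_all("a(b|c)", "ab") && matches_all("a(b|c)", "ac"));
    assert(!matches_all("a(b|c)", "ad"));
    // Ungrouped, the alternatives are the whole of "abc" and "def"...
    assert(matches_all("abc|def", "abc") && matches_all("abc|def", "def"));
    // ...not "ab", then "c" or "d", then "ef".
    assert(!matches_all("abc|def", "abdef"));
}
```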
<p><i>Sets</i> </p>
<p>A set is a set of characters that can match any single
character that is a member of the set. Sets are delimited by
&quot;[&quot; and &quot;]&quot; and can contain literals,
character ranges, character classes, collating elements and
equivalence classes. Set declarations that start with &quot;^&quot;
contain the complement of the elements that follow. </p>
<p>Examples: </p>
<p>Character literals: </p>
<p>&quot;[abc]&quot; will match either of &quot;a&quot;, &quot;b&quot;,
or &quot;c&quot;. </p>
<p>&quot;[^abc]&quot; will match any character other than &quot;a&quot;,
&quot;b&quot;, or &quot;c&quot;. </p>
<p>Character ranges: </p>
<p>&quot;[a-z]&quot; will match any character in the range &quot;a&quot;
to &quot;z&quot;. </p>
<p>&quot;[^A-Z]&quot; will match any character other than those
in the range &quot;A&quot; to &quot;Z&quot;. </p>
<p>Note that character ranges are highly locale dependent: they
match any character that collates between the endpoints of the
range, so ranges will only behave according to ASCII rules when the
default &quot;C&quot; locale is in effect. For example if the
library is compiled with the Win32 localization model, then [a-z]
will match the ASCII characters a-z, and also 'A', 'B' etc, but
not 'Z' which collates just after 'z'. This locale specific
behaviour can be disabled by specifying regbase::nocollate when
compiling; this is the default behaviour when using regbase::normal,
and forces ranges to collate according to ASCII character code.
Likewise, if you use the POSIX C API functions then setting
REG_NOCOLLATE turns off locale dependent collation. </p>
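<p>The literal and range examples for sets can be sketched as follows. This is a minimal sketch that assumes std::regex from the C++ standard library, compiled without its collate option so that ranges follow character codes; it does not exercise this library's locale dependent collation, and the function names are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the single character `c` is matched by the set `pattern`.
bool set_matches(const std::string& pattern, char c)
{
    return std::regex_match(std::string(1, c), std::regex(pattern));
}

void set_examples()
{
    // Character literals.
    assert(set_matches("[abc]", 'a') && !set_matches("[abc]", 'd'));
    // A leading "^" takes the complement of the set.
    assert(set_matches("[^abc]", 'd') && !set_matches("[^abc]", 'a'));
    // Character ranges, compared by character code here.
    assert(set_matches("[a-z]", 'q') && !set_matches("[a-z]", 'Q'));
    assert(set_matches("[^A-Z]", 'q') && !set_matches("[^A-Z]", 'Q'));
}
```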
<p>Character classes are denoted using the syntax &quot;[:classname:]&quot;
within a set declaration, for example &quot;[[:space:]]&quot; is
the set of all whitespace characters. Character classes are only
available if the flag regbase::char_classes is set. The available
character classes are: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="50%">alnum</td>
<td valign="top" width="50%">Any alpha numeric character.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">alpha</td>
<td valign="top" width="50%">Any alphabetical character a-z
and A-Z. Other characters may also be included depending
upon the locale.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">blank</td>
<td valign="top" width="50%">Any blank character, either
a space or a tab.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">cntrl</td>
<td valign="top" width="50%">Any control character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">digit</td>
<td valign="top" width="50%">Any digit 0-9.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">graph</td>
<td valign="top" width="50%">Any graphical character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">lower</td>
<td valign="top" width="50%">Any lower case character a-z.
Other characters may also be included depending upon the
locale.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">print</td>
<td valign="top" width="50%">Any printable character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">punct</td>
<td valign="top" width="50%">Any punctuation character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">space</td>
<td valign="top" width="50%">Any whitespace character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">upper</td>
<td valign="top" width="50%">Any upper case character A-Z.
Other characters may also be included depending upon the
locale.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">xdigit</td>
<td valign="top" width="50%">Any hexadecimal digit
character, 0-9, a-f and A-F.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">word</td>
<td valign="top" width="50%">Any word character - all
alphanumeric characters plus the underscore.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">unicode</td>
<td valign="top" width="50%">Any character whose code is
greater than 255, this applies to the wide character
traits classes only.</td>
<td>&nbsp;</td>
</tr>
</table>
<p>There are some shortcuts that can be used in place of the
character classes; provided the flag regbase::escape_in_lists is
set, you can use: </p>
<p>\w in place of [:word:] </p>
<p>\s in place of [:space:] </p>
<p>\d in place of [:digit:] </p>
<p>\l in place of [:lower:] </p>
<p>\u in place of [:upper:] <br>
&nbsp; <br>
&nbsp; </p>
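<p>Character classes and their shortcuts can be sketched as follows. This is a hedged sketch using std::regex from the C++ standard library rather than this library's classes; it covers only the classes and shortcuts that std::regex is known to share (\l and \u, and the &quot;word&quot; and &quot;unicode&quot; class names, are left out), and the function names are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern`.
bool matches_all(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

void character_class_examples()
{
    // A class name goes inside a set: "[[:space:]]", not "[:space:]".
    assert(matches_all("[[:space:]]", " ") && matches_all("[[:space:]]", "\t"));
    assert(matches_all("[[:digit:]]+", "2003"));
    assert(!matches_all("[[:digit:]]+", "20x3"));
    assert(matches_all("[[:xdigit:]]+", "1f15026060"));
    // Shortcuts for the same classes.
    assert(matches_all("\\s", " "));
    assert(matches_all("\\d+", "2003"));
    // "\w" is the alphanumeric characters plus the underscore.
    assert(matches_all("\\w+", "regex_4"));
    assert(matches_all("[[:alnum:]_]+", "regex_4"));
}
```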
<p>Collating elements take the general form [.tagname.] inside a
set declaration, where <i>tagname</i> is either a single
character, or a name of a collating element, for example [[.a.]]
is equivalent to [a], and [[.comma.]] is equivalent to [,]. The
library supports all the standard POSIX collating element names,
and in addition the following digraphs: &quot;ae&quot;, &quot;ch&quot;,
&quot;ll&quot;, &quot;ss&quot;, &quot;nj&quot;, &quot;dz&quot;,
&quot;lj&quot;, each in lower, upper and title case variations.
Multi-character collating elements can result in the set matching
more than one character, for example [[.ae.]] would match two
characters, but note that [^[.ae.]] would only match one
character. <br>
&nbsp; <br>
&nbsp; </p>
<p>Equivalence classes take the general form [=tagname=] inside a
set declaration, where <i>tagname</i> is either a single
character, or a name of a collating element, and matches any
character that is a member of the same primary equivalence class
as the collating element [.tagname.]. An equivalence class is a
set of characters that collate the same, a primary equivalence
class is a set of characters whose primary sort key are all the
same (for example strings are typically collated by character,
then by accent, and then by case; the primary sort key then
relates to the character, the secondary to the accentation, and
the tertiary to the case). If there is no equivalence class
corresponding to <i>tagname</i>, then [=tagname=] is exactly the
same as [.tagname.]. Unfortunately there is no locale independent
method of obtaining the primary sort key for a character, except
under Win32. For other operating systems the library will &quot;guess&quot;
the primary sort key from the full sort key (obtained from <i>strxfrm</i>),
so equivalence classes are probably best considered broken under
any operating system other than Win32. <br>
&nbsp; <br>
&nbsp; </p>
<p>To include a literal &quot;-&quot; in a set declaration, either
make it the first character after the opening &quot;[&quot; or
&quot;[^&quot;, the endpoint of a range, or a collating element, or,
if the flag regbase::escape_in_lists is set, precede it with an
escape character as in &quot;[\-]&quot;. To include a literal
&quot;[&quot; or &quot;]&quot; or &quot;^&quot; in a set,
make them the endpoint of a range or a collating element, or
precede them with an escape character if the flag regbase::escape_in_lists
is set. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Line anchors</i> </p>
<p>An anchor is something that matches the null string at the
start or end of a line: &quot;^&quot; matches the null string at
the start of a line, &quot;$&quot; matches the null string at the
end of a line. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Back references</i> </p>
<p>A back reference is a reference to a previous sub-expression
that has already been matched; the reference is to what the sub-expression
matched, not to the expression itself. A back reference consists
of the escape character &quot;\&quot; followed by a digit &quot;1&quot;
to &quot;9&quot;: &quot;\1&quot; refers to the first sub-expression,
&quot;\2&quot; to the second etc. For example the expression
&quot;(.*)\1&quot; matches any string that is repeated about its
mid-point, for example &quot;abcabc&quot; or &quot;xyzxyz&quot;. A
back reference to a sub-expression that did not participate in
any match matches the null string: NB this is different to some
other regular expression matchers. Back references are only
available if the expression is compiled with the flag regbase::bk_refs
set. <br>
&nbsp; <br>
&nbsp; </p>
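<p>The back reference example can be sketched as follows. This is a minimal sketch that assumes std::regex from the C++ standard library, where back references are always enabled, rather than this library's classes and its regbase::bk_refs flag; the function name is hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

void back_reference_examples()
{
    const std::regex repeated("(.*)\\1");
    std::smatch m;

    // "\1" must match the same text that sub-expression 1 matched.
    const std::string good = "abcabc";
    assert(std::regex_match(good, m, repeated));
    assert(m[1].str() == "abc");
    assert(std::regex_match(std::string("xyzxyz"), repeated));

    // The two halves differ, so there is no match.
    assert(!std::regex_match(std::string("abcabd"), repeated));
}
```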
<p><i>Characters by code</i> </p>
<p>This is an extension to the algorithm that is not available in
other libraries. It consists of the escape character followed by
the digit &quot;0&quot; followed by the octal character code. For
example &quot;\023&quot; represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break
the expression up: &quot;\0103&quot; represents the character
whose code is 103, &quot;(\010)3&quot; represents the character 10
followed by &quot;3&quot;. To match characters by their
hexadecimal code, use \x followed by a string of hexadecimal
digits, optionally enclosed inside {}, for example \xf0 or
\x{aff}; notice the latter example is a Unicode character. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Word operators</i> </p>
<p>The following operators are provided for compatibility with
the GNU regular expression library. </p>
<p>&quot;\w&quot; matches any single character that is a member
of the &quot;word&quot; character class, this is identical to the
expression &quot;[[:word:]]&quot;. </p>
<p>&quot;\W&quot; matches any single character that is not a
member of the &quot;word&quot; character class, this is identical
to the expression &quot;[^[:word:]]&quot;. </p>
<p>&quot;\&lt;&quot; matches the null string at the start of a
word. </p>
<p>&quot;\&gt;&quot; matches the null string at the end of the
word. </p>
<p>&quot;\b&quot; matches the null string at either the start or
the end of a word. </p>
<p>&quot;\B&quot; matches a null string within a word. </p>
<p>The start of the sequence passed to the matching algorithms is
considered to be a potential start of a word unless the flag
match_not_bow is set. The end of the sequence passed to the
matching algorithms is considered to be a potential end of a word
unless the flag match_not_eow is set. <br>
&nbsp; <br>
&nbsp; </p>
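<p>The word boundary operators can be sketched as follows. This is a hedged sketch using std::regex from the C++ standard library rather than this library's classes; it covers only \b, \B and \W, since the GNU style \&lt; and \&gt; operators and the match_not_bow and match_not_eow flags are not part of std::regex. The function names are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches somewhere inside `text`.
bool occurs_in(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

void word_operator_examples()
{
    // "\b": null string at the start or end of a word.
    assert(occurs_in("\\bcat\\b", "a cat sat"));
    assert(!occurs_in("\\bcat\\b", "concatenate"));
    // "\B": null string within a word.
    assert(occurs_in("\\Bcat\\B", "concatenate"));
    assert(!occurs_in("\\Bcat\\B", "a cat sat"));
    // "\W": any character that is not a word character.
    assert(occurs_in("\\W", "ab-1") && !occurs_in("\\W", "ab_1"));
}
```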
<p><i>Buffer operators</i> </p>
<p>The following operators are provided for compatibility with the
GNU regular expression library, and Perl regular expressions: </p>
<p>&quot;\`&quot; matches the start of a buffer. </p>
<p>&quot;\A&quot; matches the start of the buffer. </p>
<p>&quot;\'&quot; matches the end of a buffer. </p>
<p>&quot;\z&quot; matches the end of a buffer. </p>
<p>&quot;\Z&quot; matches the end of a buffer, or possibly one or
more new line characters followed by the end of the buffer. </p>
<p>A buffer is considered to consist of the whole sequence passed
to the matching algorithms, unless the flags match_not_bob or
match_not_eob are set. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Escape operator</i> </p>
<p>The escape character &quot;\&quot; has several meanings. </p>
<p>Inside a set declaration the escape character is a normal
character unless the flag regbase::escape_in_lists is set in
which case whatever follows the escape is a literal character
regardless of its normal meaning. </p>
<p>The escape operator may introduce an operator for example:
back references, or a word operator. </p>
<p>The escape operator may make the following character normal,
for example &quot;\*&quot; represents a literal &quot;*&quot;
rather than the repeat operator. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Single character escape sequences</i> </p>
<p>The following escape sequences are aliases for single
characters: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="33%">Escape sequence </td>
<td valign="top" width="33%">Character code </td>
<td valign="top" width="33%">Meaning </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\a </td>
<td valign="top" width="33%">0x07 </td>
<td valign="top" width="33%">Bell character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\f </td>
<td valign="top" width="33%">0x0C </td>
<td valign="top" width="33%">Form feed. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\n </td>
<td valign="top" width="33%">0x0A </td>
<td valign="top" width="33%">Newline character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\r </td>
<td valign="top" width="33%">0x0D </td>
<td valign="top" width="33%">Carriage return. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\t </td>
<td valign="top" width="33%">0x09 </td>
<td valign="top" width="33%">Tab character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\v </td>
<td valign="top" width="33%">0x0B </td>
<td valign="top" width="33%">Vertical tab. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\e </td>
<td valign="top" width="33%">0x1B </td>
<td valign="top" width="33%">ASCII Escape character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\0dd </td>
<td valign="top" width="33%">0dd </td>
<td valign="top" width="33%">An octal character code,
where <i>dd</i> is one or more octal digits. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\xXX </td>
<td valign="top" width="33%">0xXX </td>
<td valign="top" width="33%">A hexadecimal character
code, where XX is one or more hexadecimal digits. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\x{XX} </td>
<td valign="top" width="33%">0xXX </td>
<td valign="top" width="33%">A hexadecimal character
code, where XX is one or more hexadecimal digits,
optionally a unicode character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\cZ </td>
<td valign="top" width="33%">Z-@ </td>
<td valign="top" width="33%">An ASCII escape sequence
control-Z, where Z is any ASCII character greater than or
equal to the character code for '@'. </td>
<td>&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><i>Miscellaneous escape sequences:</i> </p>
<p>The following are provided mostly for Perl compatibility, but
note that there are some differences in the meanings of \l \L \u
and \U: <br>
&nbsp; </p>
<table border="0" cellpadding="6" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\w </td>
<td valign="top" width="45%">Equivalent to [[:word:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\W </td>
<td valign="top" width="45%">Equivalent to [^[:word:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\s </td>
<td valign="top" width="45%">Equivalent to [[:space:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\S </td>
<td valign="top" width="45%">Equivalent to [^[:space:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\d </td>
<td valign="top" width="45%">Equivalent to [[:digit:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\D </td>
<td valign="top" width="45%">Equivalent to [^[:digit:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\l </td>
<td valign="top" width="45%">Equivalent to [[:lower:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\L </td>
<td valign="top" width="45%">Equivalent to [^[:lower:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\u </td>
<td valign="top" width="45%">Equivalent to [[:upper:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\U </td>
<td valign="top" width="45%">Equivalent to [^[:upper:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\C </td>
<td valign="top" width="45%">Any single character,
equivalent to '.'. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\X </td>
<td valign="top" width="45%">Match any Unicode combining
character sequence, for example &quot;a\x 0301&quot; (a
letter a with an acute). </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\Q </td>
<td valign="top" width="45%">The begin quote operator,
everything that follows is treated as a literal character
until a \E end quote operator is found. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\E </td>
<td valign="top" width="45%">The end quote operator,
terminates a sequence begun with \Q. </td>
<td width="5%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><i>What gets matched?</i> </p>
<p>The regular expression library will match the first possible
matching string. If more than one string starting at a given
location can match then it matches the longest possible string,
unless the flag match_any is set, in which case the first match
encountered is returned. Use of the match_any option can reduce
the time taken to find the match - but is only useful if the user
is less concerned about what matched - for example it would not
be suitable for search and replace operations. In cases where
there are multiple possible matches all starting at the same
location, and all of the same length, then the match chosen is
the one with the longest first sub-expression; if that is the
same for two or more matches, then the second sub-expression will
be examined and so on. <br>
</p>
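<p>The &quot;first possible location, then longest&quot; rule can be contrasted with first-alternative matching. This is a hedged sketch using std::regex from the C++ standard library rather than this library's classes: its POSIX extended grammar is assumed to stand in for the longest-match rule described above, and its ECMAScript grammar for a matcher that takes the first alternative that succeeds. <code>first_match</code> is a hypothetical helper.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of the first match of `pattern` in `text`,
// or an empty string if there is none.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex::flag_type grammar)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern, grammar)))
        return m.str();
    return std::string();
}

void what_gets_matched_examples()
{
    // POSIX extended grammar: of the matches starting at the first
    // possible location, the longest one is chosen.
    assert(first_match("a|ab", "xab", std::regex::extended) == "ab");
    // ECMAScript grammar: the first alternative that succeeds wins.
    assert(first_match("a|ab", "xab", std::regex::ECMAScript) == "a");
}
```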
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>

File diff suppressed because it is too large Load Diff

@ -0,0 +1,52 @@
/*
*
* Copyright (c) 1998-2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE: recursion_test.cpp
* VERSION: see <boost/version.hpp>
* DESCRIPTION: Test for indefinite recursion and/or stack overrun.
*/
#include <string>
#include <boost/regex.hpp>
#include <boost/test/test_tools.hpp>
int test_main( int argc, char* argv[] )
{
   std::string bad_text(1024, ' ');
   std::string good_text(200, ' ');
   good_text.append("xyz");
   boost::smatch what;
   boost::regex e1("(.+)+xyz");
   BOOST_CHECK(boost::regex_search(good_text, what, e1));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e1), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e1));
   BOOST_CHECK(boost::regex_match(good_text, what, e1));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e1), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e1));
   boost::regex e2("abc|[[:space:]]+(xyz)?[[:space:]]+xyz");
   BOOST_CHECK(boost::regex_search(good_text, what, e2));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e2), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e2));
   return 0;
}

@ -0,0 +1,63 @@
/*
*
* Copyright (c) 1998-2002
* Dr John Maddock
*
* Permission to use, copy, modify, distribute and sell this software
* and its documentation for any purpose is hereby granted without fee,
* provided that the above copyright notice appear in all copies and
* that both that copyright notice and this permission notice appear
* in supporting documentation. Dr John Maddock makes no representations
* about the suitability of this software for any purpose.
* It is provided "as is" without express or implied warranty.
*
*/
/*
* LOCATION: see http://www.boost.org for most recent version.
* FILE: recursion_test.cpp
* VERSION: see <boost/version.hpp>
* DESCRIPTION: Test for indefinite recursion and/or stack overrun.
*/
#include <string>
#include <boost/regex.hpp>
#include <boost/test/test_tools.hpp>
int test_main( int argc, char* argv[] )
{
   // this regex will recurse twice for each whitespace character matched:
   boost::regex e("([[:space:]]|.)+");
   std::string bad_text(1024*1024*4, ' ');
   std::string good_text(200, ' ');
   boost::smatch what;
   //
   // Over and over: We want to make sure that after a stack error has
   // been triggered, that we can still conduct a good search and that
   // subsequent stack failures still do the right thing:
   //
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   return 0;
}

908
test/regress/v3_tests.txt Normal file
@ -0,0 +1,908 @@
;
;
; this file contains a script of tests to run through regress.exe
;
; comments start with a semicolon and proceed to the end of the line
;
; changes to regular expression compile flags start with a "-" as the first
; non-whitespace character and consist of a list of the printable names
; of the flags, for example "match_default"
;
; Other lines contain a test to perform using the current flag status.
; The first token contains the expression to compile, the second the string
; to match it against. If the second token is "!" then the expression should
; not compile; that is, the first token is an invalid regular expression.
; Otherwise the string is followed by a list of integers that specify what
; should match: each pair gives the starting and ending positions of a
; subexpression, starting with the zeroth subexpression (the whole match).
; A value of -1 indicates that the subexpression should not take part in the
; match at all; if the first value is -1 then no part of the expression should
; match the string.
;
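; As an illustration only (commented out, so it is not run as a test), the line
;    a(b|c) ab 0 2 1 2
; which appears later in this file, reads as follows: compile "a(b|c)", match
; it against "ab", and expect the whole match at positions 0 to 2 and the
; first subexpression at positions 1 to 2. Judging from tests such as
; "a a 0 1", the ending position is one past the last matched character.
; Likewise
;    ( !
; expects the expression "(" to be rejected at compile time, and
;    Z aaa -1 -1
; expects no match at all.
;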
- match_default normal REG_EXTENDED
;
; try some really simple literals:
a a 0 1
Z Z 0 1
Z aaa -1 -1
Z xxxxZZxxx 4 5
; and some simple brackets:
(a) zzzaazz 3 4 3 4
() zzz 0 0 0 0
() "" 0 0 0 0
( !
) !
(aa !
aa) !
a b -1 -1
\(\) () 0 2
\(a\) (a) 0 3
\() !
(\) !
p(a)rameter ABCparameterXYZ 3 12 4 5
[pq](a)rameter ABCparameterXYZ 3 12 4 5
; now try escaped brackets:
- match_default bk_parens REG_BASIC
\(a\) zzzaazz 3 4 3 4
\(\) zzz 0 0 0 0
\(\) "" 0 0 0 0
\( !
\) !
\(aa !
aa\) !
() () 0 2
(a) (a) 0 3
(\) !
\() !
; now move on to "." wildcards
- match_default normal REG_EXTENDED REG_STARTEND
. a 0 1
. \n 0 1
. \r 0 1
. \0 0 1
- match_default normal match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE
. a 0 1
. \n -1 -1
. \r -1 -1
. \0 0 1
- match_default normal match_not_dot_null match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE
. \n -1 -1
. \r -1 -1
; this *WILL* produce an error from the POSIX API functions:
- match_default normal match_not_dot_null match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE REG_NO_POSIX_TEST
. \0 -1 -1
;
; now move on to the repetition ops,
; starting with operator *
- match_default normal REG_EXTENDED
a* b 0 0
ab* a 0 1
ab* ab 0 2
ab* sssabbbbbbsss 3 10
ab*c* a 0 1
ab*c* abbb 0 4
ab*c* accc 0 4
ab*c* abbcc 0 5
*a !
\<* !
\>* !
\n* \n\n 0 2
\** ** 0 2
\* * 0 1
; now try operator +
ab+ a -1 -1
ab+ ab 0 2
ab+ sssabbbbbbsss 3 10
ab+c+ a -1 -1
ab+c+ abbb -1 -1
ab+c+ accc -1 -1
ab+c+ abbcc 0 5
+a !
\<+ !
\>+ !
\n+ \n\n 0 2
\+ + 0 1
\+ ++ 0 1
\++ ++ 0 2
- match_default normal bk_plus_qm REG_EXTENDED REG_NO_POSIX_TEST
+ + 0 1
\+ !
a\+ aa 0 2
; now try operator ?
- match_default normal REG_EXTENDED
a? b 0 0
ab? a 0 1
ab? ab 0 2
ab? sssabbbbbbsss 3 5
ab?c? a 0 1
ab?c? abbb 0 2
ab?c? accc 0 2
ab?c? abcc 0 3
?a !
\<? !
\>? !
\n? \n\n 0 1
\? ? 0 1
\? ?? 0 1
\?? ?? 0 1
- match_default normal bk_plus_qm REG_EXTENDED REG_NO_POSIX_TEST
? ? 0 1
\? !
a\? aa 0 1
a\? b 0 0
- match_default normal limited_ops
a? a? 0 2
a+ a+ 0 2
a\? a? 0 2
a\+ a+ 0 2
; now try operator {}
- match_default normal REG_EXTENDED
a{2} a -1 -1
a{2} aa 0 2
a{2} aaa 0 2
a{2,} a -1 -1
a{2,} aa 0 2
a{2,} aaaaa 0 5
a{2,4} a -1 -1
a{2,4} aa 0 2
a{2,4} aaa 0 3
a{2,4} aaaa 0 4
a{2,4} aaaaa 0 4
; spaces are now allowed inside {}
"a{ 2 , 4 }" aaaaa 0 4
a{} !
"a{ }" !
a{2 !
a} !
\{\} {} 0 2
- match_default normal bk_braces
a\{2\} a -1 -1
a\{2\} aa 0 2
a\{2\} aaa 0 2
a\{2,\} a -1 -1
a\{2,\} aa 0 2
a\{2,\} aaaaa 0 5
a\{2,4\} a -1 -1
a\{2,4\} aa 0 2
a\{2,4\} aaa 0 3
a\{2,4\} aaaa 0 4
a\{2,4\} aaaaa 0 4
"a\{ 2 , 4 \}" aaaaa 0 4
{} {} 0 2
; now test the alternation operator |
- match_default normal REG_EXTENDED
a|b a 0 1
a|b b 0 1
a(b|c) ab 0 2 1 2
a(b|c) ac 0 2 1 2
a(b|c) ad -1 -1 -1 -1
|c !
c| !
(|) !
(a|) !
(|a) !
a\| a| 0 2
- match_default normal limited_ops
a| a| 0 2
a\| a| 0 2
| | 0 1
- match_default normal bk_vbar REG_NO_POSIX_TEST
a| a| 0 2
a\|b a 0 1
a\|b b 0 1
; now test the set operator []
- match_default normal REG_EXTENDED
; try some literals first
[abc] a 0 1
[abc] b 0 1
[abc] c 0 1
[abc] d -1 -1
[^bcd] a 0 1
[^bcd] b -1 -1
[^bcd] d -1 -1
[^bcd] e 0 1
a[b]c abc 0 3
a[ab]c abc 0 3
a[^ab]c adc 0 3
a[]b]c a]c 0 3
a[[b]c a[c 0 3
a[-b]c a-c 0 3
a[^]b]c adc 0 3
a[^-b]c adc 0 3
a[b-]c a-c 0 3
a[b !
a[] !
; then some ranges
[b-e] a -1 -1
[b-e] b 0 1
[b-e] e 0 1
[b-e] f -1 -1
[^b-e] a 0 1
[^b-e] b -1 -1
[^b-e] e -1 -1
[^b-e] f 0 1
a[1-3]c a2c 0 3
a[3-1]c !
a[1-3-5]c !
a[1- !
; and some classes
a[[:alpha:]]c abc 0 3
a[[:unknown:]]c !
a[[: !
a[[:alpha !
a[[:alpha:] !
a[[:alpha,:] !
a[[:]:]]b !
a[[:-:]]b !
a[[:alph:]] !
a[[:alphabet:]] !
[[:alnum:]]+ -%@a0X_- 3 6
[[:alpha:]]+ -%@aX_0- 3 5
[[:blank:]]+ "a \tb" 1 4
[[:cntrl:]]+ a\n\tb 1 3
[[:digit:]]+ a019b 1 4
[[:graph:]]+ " a%b " 1 4
[[:lower:]]+ AabC 1 3
; This test fails with STLPort, disable for now as this is a corner case anyway...
;[[:print:]]+ "\na b\n" 1 4
[[:punct:]]+ " %-&\t" 1 4
[[:space:]]+ "a \n\t\rb" 1 5
[[:upper:]]+ aBCd 1 3
[[:xdigit:]]+ p0f3Cx 1 5
; now test flag settings:
- escape_in_lists REG_NO_POSIX_TEST
[\n] \n 0 1
- REG_NO_POSIX_TEST
[\n] \n -1 -1
[\n] \\ 0 1
[[:class:] : 0 1
[[:class:] [ 0 1
[[:class:] c 0 1
; line anchors
- match_default normal REG_EXTENDED
^ab ab 0 2
^ab xxabxx -1 -1
^ab xx\nabzz 3 5
ab$ ab 0 2
ab$ abxx -1 -1
ab$ ab\nzz 0 2
- match_default match_not_bol match_not_eol normal REG_EXTENDED REG_NOTBOL REG_NOTEOL
^ab ab -1 -1
^ab xxabxx -1 -1
^ab xx\nabzz 3 5
ab$ ab -1 -1
ab$ abxx -1 -1
ab$ ab\nzz 0 2
; back references
- match_default normal REG_EXTENDED
a(b)\2c !
a(b\1)c !
a(b*)c\1d abbcbbd 0 7 1 3
a(b*)c\1d abbcbd -1 -1
a(b*)c\1d abbcbbbd -1 -1
^(.)\1 abc -1 -1
a([bc])\1d abcdabbd 4 8 5 6
; strictly speaking this is at best ambiguous and at worst wrong, but it is
; what most regular expression implementations will match.
a(([bc])\2)*d abbccd 0 6 3 5 3 4
a(([bc])\2)*d abbcbd -1 -1
a((b)*\2)*d abbbd 0 5 1 4 2 3
(ab*)[ab]*\1 ababaaa 0 7 0 1
(a)\1bcd aabcd 0 5 0 1
(a)\1bc*d aabcd 0 5 0 1
(a)\1bc*d aabd 0 4 0 1
(a)\1bc*d aabcccd 0 7 0 1
(a)\1bc*[ce]d aabcccd 0 7 0 1
^(a)\1b(c)*cd$ aabcccd 0 7 0 1 4 5
;
; characters by code:
- match_default normal REG_EXTENDED REG_STARTEND
\0101 A 0 1
\00 \0 0 1
\0 \0 0 1
\0172 z 0 1
;
; word operators:
\w a 0 1
\w z 0 1
\w A 0 1
\w Z 0 1
\w _ 0 1
\w } -1 -1
\w ` -1 -1
\w [ -1 -1
\w @ -1 -1
; non-word:
\W a -1 -1
\W z -1 -1
\W A -1 -1
\W Z -1 -1
\W _ -1 -1
\W } 0 1
\W ` 0 1
\W [ 0 1
\W @ 0 1
; word start:
\<abcd " abcd" 2 6
\<ab cab -1 -1
\<ab "\nab" 1 3
\<tag ::tag 2 5
; word end:
abc\> abc 0 3
abc\> abcd -1 -1
abc\> abc\n 0 3
abc\> abc:: 0 3
; word boundary:
\babcd " abcd" 2 6
\bab cab -1 -1
\bab "\nab" 1 3
\btag ::tag 2 5
abc\b abc 0 3
abc\b abcd -1 -1
abc\b abc\n 0 3
abc\b abc:: 0 3
; within word:
\B ab 1 1
a\Bb ab 0 2
a\B ab 0 1
a\B a -1 -1
a\B "a " -1 -1
;
; buffer operators:
\`abc abc 0 3
\`abc \nabc -1 -1
\`abc " abc" -1 -1
abc\' abc 0 3
abc\' abc\n -1 -1
abc\' "abc " -1 -1
;
; extra escape sequences:
\a \a 0 1
\f \f 0 1
\n \n 0 1
\r \r 0 1
\t \t 0 1
\v \v 0 1
;
; now follow various complex expressions designed to try and bust the matcher:
a(((b)))c abc 0 3 1 2 1 2 1 2
a(b|(c))d abd 0 3 1 2 -1 -1
a(b|(c))d acd 0 3 1 2 1 2
a(b*|c)d abbd 0 4 1 3
; just gotta have one DFA-buster, of course
a[ab]{20} aaaaabaaaabaaaabaaaab 0 21
; and an inline expansion in case somebody gets tricky
a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab] aaaaabaaaabaaaabaaaab 0 21
; and in case somebody just slips in an NFA...
a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab](wee|week)(knights|night) aaaaabaaaabaaaabaaaabweeknights 0 31 21 24 24 31
; one really big one
1234567890123456789012345678901234567890123456789012345678901234567890 a1234567890123456789012345678901234567890123456789012345678901234567890b 1 71
; fish for problems as brackets go past 8
[ab][cd][ef][gh][ij][kl][mn] xacegikmoq 1 8
[ab][cd][ef][gh][ij][kl][mn][op] xacegikmoq 1 9
[ab][cd][ef][gh][ij][kl][mn][op][qr] xacegikmoqy 1 10
[ab][cd][ef][gh][ij][kl][mn][op][q] xacegikmoqy 1 10
; and as parentheses go past 9:
(a)(b)(c)(d)(e)(f)(g)(h) zabcdefghi 1 9 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
(a)(b)(c)(d)(e)(f)(g)(h)(i) zabcdefghij 1 10 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
(a)(b)(c)(d)(e)(f)(g)(h)(i)(j) zabcdefghijk 1 11 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11
(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k) zabcdefghijkl 1 12 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12
(a)d|(b)c abc 1 3 -1 -1 1 2
"_+((www)|(ftp)|(mailto)):_*" "_wwwnocolon _mailto:" 12 20 13 19 -1 -1 -1 -1 13 19
; subtleties of matching
a(b)?c\1d acd 0 3 -1 -1
a(b?c)+d accd 0 4 2 3
(wee|week)(knights|night) weeknights 0 10 0 3 3 10
.* abc 0 3
a(b|(c))d abd 0 3 1 2 -1 -1
a(b|(c))d acd 0 3 1 2 1 2
a(b*|c|e)d abbd 0 4 1 3
a(b*|c|e)d acd 0 3 1 2
a(b*|c|e)d ad 0 2 1 1
a(b?)c abc 0 3 1 2
a(b?)c ac 0 2 1 1
a(b+)c abc 0 3 1 2
a(b+)c abbbc 0 5 1 4
a(b*)c ac 0 2 1 1
(a|ab)(bc([de]+)f|cde) abcdef 0 6 0 1 1 6 3 5
a([bc]?)c abc 0 3 1 2
a([bc]?)c ac 0 2 1 1
a([bc]+)c abc 0 3 1 2
a([bc]+)c abcc 0 4 1 3
a([bc]+)bc abcbc 0 5 1 3
a(bb+|b)b abb 0 3 1 2
a(bbb+|bb+|b)b abb 0 3 1 2
a(bbb+|bb+|b)b abbb 0 4 1 3
a(bbb+|bb+|b)bb abbb 0 4 1 2
(.*).* abcdef 0 6 0 6
(a*)* bc 0 0 0 0
; do we get the right subexpression when it is used more than once?
a(b|c)*d ad 0 2 -1 -1
a(b|c)*d abcd 0 4 2 3
a(b|c)+d abd 0 3 1 2
a(b|c)+d abcd 0 4 2 3
a(b|c?)+d ad 0 2 1 1
a(b|c?)+d abcd 0 4 2 3
a(b|c){0,0}d ad 0 2 -1 -1
a(b|c){0,1}d ad 0 2 -1 -1
a(b|c){0,1}d abd 0 3 1 2
a(b|c){0,2}d ad 0 2 -1 -1
a(b|c){0,2}d abcd 0 4 2 3
a(b|c){0,}d ad 0 2 -1 -1
a(b|c){0,}d abcd 0 4 2 3
a(b|c){1,1}d abd 0 3 1 2
a(b|c){1,2}d abd 0 3 1 2
a(b|c){1,2}d abcd 0 4 2 3
a(b|c){1,}d abd 0 3 1 2
a(b|c){1,}d abcd 0 4 2 3
a(b|c){2,2}d acbd 0 4 2 3
a(b|c){2,2}d abcd 0 4 2 3
a(b|c){2,4}d abcd 0 4 2 3
a(b|c){2,4}d abcbd 0 5 3 4
a(b|c){2,4}d abcbcd 0 6 4 5
a(b|c){2,}d abcd 0 4 2 3
a(b|c){2,}d abcbd 0 5 3 4
a(b+|((c)*))+d abd 0 3 1 2 -1 -1 -1 -1
a(b+|((c)*))+d abcd 0 4 2 3 2 3 2 3
- match_default normal REG_EXTENDED REG_STARTEND REG_NOSPEC literal
\**?/{} \\**?/{} 0 7
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST ; we disable POSIX testing because it can't handle escapes in sets
; try to match C++ syntax elements:
; line comment:
//[^\n]* "++i //here is a line comment\n" 4 28
; block comment:
/\*([^*]|\*+[^*/])*\*+/ "/* here is a block comment */" 0 29 26 27
/\*([^*]|\*+[^*/])*\*+/ "/**/" 0 4 -1 -1
/\*([^*]|\*+[^*/])*\*+/ "/***/" 0 5 -1 -1
/\*([^*]|\*+[^*/])*\*+/ "/****/" 0 6 -1 -1
/\*([^*]|\*+[^*/])*\*+/ "/*****/" 0 7 -1 -1
/\*([^*]|\*+[^*/])*\*+/ "/*****/*/" 0 7 -1 -1
; preprocessor directives:
^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol" 0 19 -1 -1
^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) #x" 0 25 -1 -1
^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) \\ \r\n foo();\\\r\n printf(#x);" 0 53 28 42
; literals:
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFF 0 4 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 35 0 2 0 2 -1 -1 0 2 -1 -1 -1 -1 -1 -1
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFu 0 5 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFL 0 5 0 4 0 4 -1 -1 4 5 -1 -1 -1 -1
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFFFFFFFFFFFFFFFuint64 0 24 0 18 0 18 -1 -1 19 24 19 24 22 24
; strings:
'([^\\']|\\.)*' '\\x3A' 0 6 4 5
'([^\\']|\\.)*' '\\'' 0 4 1 3
'([^\\']|\\.)*' '\\n' 0 4 1 3
; now try and test some unicode specific characters:
- match_default normal REG_PERL REG_UNICODE_ONLY
[[:unicode:]]+ a\0300\0400z 1 3
[\x10-\xff] \39135\12409 -1 -1
[\01-\05]{5} \36865\36865\36865\36865\36865 -1 -1
; finally try some case insensitive matches:
- match_default normal REG_EXTENDED REG_ICASE
; upper and lower have no meaning here so they fail, however these
; may compile with other libraries...
;[[:lower:]] !
;[[:upper:]] !
0123456789@abcdefghijklmnopqrstuvwxyz\[\\\]\^_`ABCDEFGHIJKLMNOPQRSTUVWXYZ\{\|\} 0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\} 0 72
; known and suspected bugs:
- match_default normal REG_EXTENDED
\( ( 0 1
\) ) 0 1
\$ $ 0 1
\^ ^ 0 1
\. . 0 1
\* * 0 1
\+ + 0 1
\? ? 0 1
\[ [ 0 1
\] ] 0 1
\| | 0 1
\\ \\ 0 1
# # 0 1
\# # 0 1
a- a- 0 2
\- - 0 1
\{ { 0 1
\} } 0 1
0 0 0 1
1 1 0 1
9 9 0 1
b b 0 1
B B 0 1
< < 0 1
> > 0 1
w w 0 1
W W 0 1
` ` 0 1
' ' 0 1
\n \n 0 1
, , 0 1
a a 0 1
f f 0 1
n n 0 1
r r 0 1
t t 0 1
v v 0 1
c c 0 1
x x 0 1
: : 0 1
(\.[[:alnum:]]+){2} "w.a.b " 1 5 3 5
- match_default normal REG_EXTENDED REG_ICASE
a A 0 1
A a 0 1
[abc]+ abcABC 0 6
[ABC]+ abcABC 0 6
[a-z]+ abcABC 0 6
[A-Z]+ abzANZ 0 6
[a-Z]+ abzABZ 0 6
[A-z]+ abzABZ 0 6
[[:lower:]]+ abyzABYZ 0 8
[[:upper:]]+ abzABZ 0 6
[[:word:]]+ abcZZZ 0 6
[[:alpha:]]+ abyzABYZ 0 8
[[:alnum:]]+ 09abyzABYZ 0 10
; updated tests for version 2:
- match_default normal REG_EXTENDED
\x41 A 0 1
\xff \255 0 1
\xFF \255 0 1
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
\c@ \0 0 1
- match_default normal REG_EXTENDED
\cA \1 0 1
\cz \58 0 1
\c= !
\c? !
=: =: 0 2
; word start:
[[:<:]]abcd " abcd" 2 6
[[:<:]]ab cab -1 -1
[[:<:]]ab "\nab" 1 3
[[:<:]]tag ::tag 2 5
; word end:
abc[[:>:]] abc 0 3
abc[[:>:]] abcd -1 -1
abc[[:>:]] abc\n 0 3
abc[[:>:]] abc:: 0 3
; collating elements and rewritten set code:
- match_default normal REG_EXTENDED REG_STARTEND
[[.zero.]] 0 0 1
[[.one.]] 1 0 1
[[.two.]] 2 0 1
[[.three.]] 3 0 1
[[.a.]] baa 1 2
[[.right-curly-bracket.]] } 0 1
[[.NUL.]] \0 0 1
[[:<:]z] !
[a[:>:]] !
[[=a=]] a 0 1
[[=right-curly-bracket=]] } 0 1
- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
[[.A.]] A 0 1
[[.A.]] a 0 1
[[.A.]-b]+ AaBb 0 4
[A-[.b.]]+ AaBb 0 4
[[.a.]-B]+ AaBb 0 4
[a-[.B.]]+ AaBb 0 4
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
[\x61] a 0 1
[\x61-c]+ abcd 0 3
[a-\x63]+ abcd 0 3
- match_default normal REG_EXTENDED REG_STARTEND
[[.a.]-c]+ abcd 0 3
[a-[.c.]]+ abcd 0 3
[[:alpha:]-a] !
[a-[:alpha:]] !
; try multi-character ligatures:
[[.ae.]] ae 0 2
[[.ae.]] aE -1 -1
[[.AE.]] AE 0 2
[[.Ae.]] Ae 0 2
[[.ae.]-b] a -1 -1
[[.ae.]-b] b 0 1
[[.ae.]-b] ae 0 2
[a-[.ae.]] a 0 1
[a-[.ae.]] b -1 -1
[a-[.ae.]] ae 0 2
- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
[[.ae.]] AE 0 2
[[.ae.]] Ae 0 2
[[.AE.]] Ae 0 2
[[.Ae.]] aE 0 2
[[.AE.]-B] a -1 -1
[[.Ae.]-b] b 0 1
[[.Ae.]-b] B 0 1
[[.ae.]-b] AE 0 2
- match_default normal REG_EXTENDED REG_STARTEND
; extended Perl-style escape sequences:
\e \27 0 1
\x1b \27 0 1
\x{1b} \27 0 1
\x{} !
\x{ !
\x} !
\x !
\x{yy !
\x{1b !
- match_default normal REG_EXTENDED REG_STARTEND REG_NO_POSIX_TEST
\l+ ABabcAB 2 5
[\l]+ ABabcAB 2 5
[a-\l] !
[\l-a] !
[\L] !
\L+ abABCab 2 5
\u+ abABCab 2 5
[\u]+ abABCab 2 5
[\U] !
\U+ ABabcAB 2 5
\d+ ab012ab 2 5
[\d]+ ab012ab 2 5
[\D] !
\D+ 01abc01 2 5
\s+ "ab ab" 2 5
[\s]+ "ab ab" 2 5
[\S] !
\S+ " abc " 2 5
- match_default normal REG_EXTENDED REG_STARTEND
\Qabc !
\Qabc\E abcd 0 3
\Qabc\Ed abcde 0 4
\Q+*?\\E +*?\\ 0 4
\C+ abcde 0 5
\X+ abcde 0 5
- match_default normal REG_EXTENDED REG_STARTEND REG_UNICODE_ONLY
\X+ a\768\769 0 3
\X+ \2309\2307 0 2 ;DEVANAGARI script
\X+ \2489\2494 0 2 ;BENGALI script
- match_default normal REG_EXTENDED REG_STARTEND
\Aabc abc 0 3
\Aabc aabc -1 -1
abc\z abc 0 3
abc\z abcd -1 -1
abc\Z abc\n\n 0 3
abc\Z abc 0 3
\Gabc abc 0 3
\Gabc dabcd -1 -1
a\Gbc abc -1 -1
a\Aab abc -1 -1
;
; now test grep,
; basically check all our restart types - line, word, etc
; checking each one for null and non-null matches.
;
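; As an illustration only (commented out, so it is not run as a test): with
; REG_GREP set, the integer list appears to hold one start/end pair per
; successive match rather than one pair per subexpression. For example
;    a+b+ "aabaabbb ab" 0 3 3 8 9 11
; expects three matches: "aab" at 0 to 3, "aabbb" at 3 to 8 and "ab" at
; 9 to 11.
;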
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP
a " a a a aa" 1 2 3 4 5 6 7 8 8 9
a+b+ "aabaabbb ab" 0 3 3 8 9 11
a(b*|c|e)d adabbdacd 0 2 2 6 6 9
a "\na\na\na\naa" 1 2 3 4 5 6 7 8 8 9
^ " \n\n \n\n\n" 0 0 4 4 5 5 8 8 9 9 10 10
^ab "ab \nab ab\n" 0 2 5 7
^[^\n]*\n " \n \n\n \n" 0 4 4 7 7 8 8 11
\<abc "abcabc abc\n\nabc" 0 3 7 10 12 15
\< " ab a aaa " 2 2 5 5 7 7
\<\w+\W+ " aa aa a " 1 5 5 9 9 11
\Aabc "abc abc" 0 3
\G\w+\W+ "abc abc a cbbb " 0 5 5 9 9 11 11 18
\Ga+b+ "aaababb abb" 0 4 4 7
abc abc 0 3
abc " abc abcabc " 1 4 5 8 8 11
\n\n " \n\n\n \n \n\n\n\n " 1 3 18 20 20 22
$ " \n\n \n\n\n" 3 3 4 4 7 7 8 8 9 9 10 10
\b " abb a abbb " 2 2 5 5 6 6 7 7 8 8 12 12
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP REG_ICASE
A " a a a aa" 1 2 3 4 5 6 7 8 8 9
A+B+ "aabaabbb ab" 0 3 3 8 9 11
A(B*|c|e)D adabbdacd 0 2 2 6 6 9
A "\na\na\na\naa" 1 2 3 4 5 6 7 8 8 9
^aB "Ab \nab Ab\n" 0 2 5 7
\<abc "Abcabc aBc\n\nabc" 0 3 7 10 12 15
ABC abc 0 3
abc " ABC ABCABC " 1 4 5 8 8 11
;
; now test merge,
;
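; As an illustration only (commented out, so it is not run as a test): with
; REG_MERGE set, each test appears to take four tokens, namely the expression,
; the text to search, the format string and the expected output. For example
;    (a+)b+ "...aaabbb,,," $1 aaa
; expects only the text of the first subexpression, because format_no_copy
; is set and the unmatched text is dropped, whereas
;    a+ "...aaa,,," bbb "...bbb,,,"
; later in this file runs without format_no_copy and so expects the
; unmatched text to be copied through around the replacement.
;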
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_no_copy
; start by testing subs:
a+ "...aaa,,," $` "..."
a+ "...aaa,,," $' ",,,"
a+ "...aaa,,," $& "aaa"
a+ "...aaa,,," $0 aaa
a+ "...aaa,,," $1 ""
a+ "...aaa,,," $15 ""
(a+)b+ "...aaabbb,,," $1 aaa
[[:digit:]]* 123ab <$0> <123><><><>
[[:digit:]]* 123ab1 <$0> <123><><><1>
; and now escapes:
a+ "...aaa,,," $x "$x"
a+ "...aaa,,," \a "\a"
a+ "...aaa,,," \f "\f"
a+ "...aaa,,," \n "\n"
a+ "...aaa,,," \r "\r"
a+ "...aaa,,," \t "\t"
a+ "...aaa,,," \v "\v"
a+ "...aaa,,," \x21 "!"
a+ "...aaa,,," \x{21} "!"
a+ "...aaa,,," \c@ \0
a+ "...aaa,,," \e \27
a+ "...aaa,,," \0101 A
a+ "...aaa,,," (\0101) A
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_sed format_no_copy
(a+)(b+) ...aabb,, \0 aabb
(a+)(b+) ...aabb,, \1 aa
(a+)(b+) ...aabb,, \2 bb
(a+)(b+) ...aabb,, & aabb
(a+)(b+) ...aabb,, $ $
(a+)(b+) ...aabb,, $1 $1
(a+)(b+) ...aabb,, ()?: ()?:
(a+)(b+) ...aabb,, \\ \\
(a+)(b+) ...aabb,, \& &
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_perl format_no_copy
(a+)(b+) ...aabb,, $0 aabb
(a+)(b+) ...aabb,, $1 aa
(a+)(b+) ...aabb,, $2 bb
(a+)(b+) ...aabb,, $& aabb
(a+)(b+) ...aabb,, & &
(a+)(b+) ...aabb,, \0 \0
(a+)(b+) ...aabb,, ()?: ()?:
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE
; move to copying unmatched data:
a+ "...aaa,,," bbb "...bbb,,,"
a+(b+) "...aaabb,,," $1 "...bb,,,"
a+(b+) "...aaabb,,,ab*abbb?" $1 "...bb,,,b*bbb?"
(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A)(?2B) "...AB,,,AB*AB?"
(a+)|(b+) "...aaabb,,,ab*abbb?" ?1A:B "...AB,,,AB*AB?"
(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A:B)C "...ACBC,,,ACBC*ACBC?"
(a+)|(b+) "...aaabb,,,ab*abbb?" ?1:B "...B,,,B*B?"
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_first_only
; move to copying unmatched data, but replace first occurrence only:
a+ "...aaa,,," bbb "...bbb,,,"
a+(b+) "...aaabb,,," $1 "...bb,,,"
a+(b+) "...aaabb,,,ab*abbb?" $1 "...bb,,,ab*abbb?"
(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A)(?2B) "...Abb,,,ab*abbb?"
;
; changes to newline handling with 2.11:
;
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP
^. " \n \r\n " 0 1 3 4 7 8
.$ " \n \r\n " 1 2 4 5 8 9
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP REG_UNICODE_ONLY
^. " \8232 \8233 " 0 1 3 4 5 6
.$ " \8232 \8233 " 1 2 3 4 6 7
;
; non-greedy repeats added 21/04/00
- match_default normal REG_EXTENDED
a** !
a*? aa 0 0
a?? aa 0 0
a++ !
a+? aa 0 1
a{1,3}{1} !
a{1,3}? aaa 0 1
\w+?w ...ccccccwcccccw 3 10
\W+\w+?w ...ccccccwcccccw 0 10
abc|\w+? abd 0 1
abc|\w+? abcd 0 3
<\s*tag[^>]*>(.*?)<\s*/tag\s*> " <tag>here is some text</tag> <tag></tag>" 1 29 6 23
<\s*tag[^>]*>(.*?)<\s*/tag\s*> " < tag attr=\"something\">here is some text< /tag > <tag></tag>" 1 49 24 41
;
; non-marking parentheses added 25/04/00
- match_default normal REG_EXTENDED
(?:abc)+ xxabcabcxx 2 8
(?:a+)(b+) xaaabbbx 1 7 4 7
(a+)(?:b+) xaaabbba 1 7 1 4
(?:(a+)b+) xaaabbba 1 7 1 4
(?:a+(b+)) xaaabbba 1 7 4 7
a+(?#b+)b+ xaaabbba 1 7
(a)(?:b|$) ab 0 2 0 1
(a)(?:b|$) a 0 1 0 1
;
; try some partial matches:
- match_partial match_default normal REG_EXTENDED REG_NO_POSIX_TEST
(xyz)(.*)abc xyzaaab -1 -1 0 3 3 7
(xyz)(.*)abc xyz -1 -1 0 3 3 3
(xyz)(.*)abc xy -1 -1 -1 -1 -1 -1
;
; forward lookahead asserts added 21/01/02
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
((?:(?!a|b)\w)+)(\w+) " xxxabaxxx " 2 11 2 5 5 11
/\*(?:(?!\*/).)*\*/ " /**/ " 2 6
/\*(?:(?!\*/).)*\*/ " /***/ " 2 7
/\*(?:(?!\*/).)*\*/ " /********/ " 2 12
/\*(?:(?!\*/).)*\*/ " /* comment */ " 2 15
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)<\s*/\s*a\s*> " <a href=\"here\">here</a> " 1 24 16 20
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)<\s*/\s*a\s*> " <a href=\"here\">here< / a > " 1 28 16 20
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)(?=<\s*/\s*a\s*>) " <a href=\"here\">here</a> " 1 20 16 20
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)(?=<\s*/\s*a\s*>) " <a href=\"here\">here< / a > " 1 20 16 20
; filename matching:
^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ command.com 0 11
^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ PRN -1 -1
^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ COM2 -1 -1
; password checking:
^(?=.*\d).{4,8}$ abc3 0 4
^(?=.*\d).{4,8}$ abc3def4 0 8
^(?=.*\d).{4,8}$ ab2 -1 -1
^(?=.*\d).{4,8}$ abcdefg -1 -1
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ abc3 -1 -1
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ abC3 0 4
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ ABCD3 -1 -1

File diff suppressed because it is too large