mirror of
https://github.com/boostorg/regex.git
synced 2025-07-16 05:42:15 +02:00
Merged regex-4 branch.
[SVN r18431]
This commit is contained in:
79
doc/Attic/standards.html
Normal file
@ -0,0 +1,79 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Standards Conformance</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Standards Conformance</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>C++</H3>
|
||||
<P>Boost.Regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>, which will appear in a
|
||||
future C++ standard technical report (and hopefully in a future version of the
|
||||
standard). Currently there are some differences in how the regular
|
||||
expression traits classes are defined; these will be fixed in a future release.</P>
|
||||
<H3>ECMAScript / JavaScript</H3>
|
||||
<P>All of the ECMAScript regular expression syntax features are supported, except
|
||||
that:</P>
|
||||
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
|
||||
definitions ( [...] ).</P>
|
||||
<P>The escape sequence \u matches any upper case character (the same as
|
||||
[[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for
|
||||
Unicode escape sequences.</P>
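<P>A minimal sketch of how patterns can be written to avoid the two differences above. It uses <CODE>std::regex</CODE> as a stand-in so that it compiles with the standard library alone; the assumption is that the same POSIX character class syntax behaves the same way in Boost.Regex. The helper names <CODE>all_upper</CODE> and <CODE>no_whitespace</CODE> are hypothetical, and using <CODE>[^[:space:]]</CODE> in place of \S inside a set is a suggested workaround, not something the text above prescribes.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if every character of s is upper case.
// [[:upper:]] is the class the text above gives as the meaning of \u.
bool all_upper(const std::string& s)
{
    static const std::regex re("[[:upper:]]+");
    return std::regex_match(s, re);
}

// Hypothetical helper: true if s is non-empty and contains no white space.
// A negated POSIX class is used instead of placing \S inside [...].
bool no_whitespace(const std::string& s)
{
    static const std::regex re("[^[:space:]]+");
    return std::regex_match(s, re);
}
```

<P>Spelling the class out explicitly keeps the pattern independent of how a particular engine interprets \u or a negated escape inside a set.</P>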
|
||||
<H3>Perl</H3>
|
||||
<P>Almost all Perl features are supported, except for:</P>
|
||||
<P>\N{name} Use [[:name:]] instead.</P>
|
||||
<P>\pP and \PP</P>
|
||||
<P>(?imsx-imsx)</P>
|
||||
<P>(?<=pattern)</P>
|
||||
<P>(?<!pattern)</P>
|
||||
<P>(?{code})</P>
|
||||
<P>(??{code})</P>
|
||||
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
|
||||
<P>These embarrassments / limitations will be removed in due course, mainly
|
||||
dependent upon user demand.</P>
|
||||
<H3>POSIX</H3>
|
||||
<P>All the POSIX basic and extended regular expression features are supported,
|
||||
except that:</P>
|
||||
<P>No character collating names are recognized except those specified in the POSIX
|
||||
standard for the C locale, unless they are explicitly registered with the
|
||||
traits class.</P>
|
||||
<P>Character equivalence classes ( [[=a=]] etc.) are probably buggy except on
|
||||
Win32. Implementing this feature requires knowledge of the format of the
|
||||
string sort keys produced by the system; if you need this, and the default
|
||||
implementation doesn't work on your platform, then you will need to supply a
|
||||
custom traits class.</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
426
doc/Attic/sub_match.html
Normal file
@ -0,0 +1,426 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: sub_match</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">sub_match</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>#include <<A href="../../boost/regex.hpp">boost/regex.hpp</A>>
|
||||
</P>
|
||||
<P>Regular expressions are different from many simple pattern-matching algorithms
|
||||
in that as well as finding an overall match they can also produce
|
||||
sub-expression matches: each sub-expression being delimited in the pattern by a
|
||||
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.html">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
|
||||
<P>When the marked sub-expression denoted by an object of type sub_match<>
|
||||
participated in a regular expression match then member <CODE>matched</CODE> evaluates
|
||||
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
|
||||
range of characters <CODE>[first,second)</CODE> which formed that match.
|
||||
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
|
||||
contain undefined values.</P>
|
||||
<P>If an object of type <CODE>sub_match<></CODE> represents sub-expression 0
|
||||
- that is to say the whole match - then member <CODE>matched</CODE> is always
|
||||
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
|
||||
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
|
||||
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
|
||||
character range that formed the partial match.</P>
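<P>The behaviour of <CODE>matched</CODE>, <CODE>first</CODE> and <CODE>second</CODE> can be sketched as follows. The example uses <CODE>std::sub_match</CODE> and <CODE>std::smatch</CODE> from the standard library, which descend from the standardization proposal this library targets; the assumption is that the Boost types behave the same way for these members. The helper names are hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: true if sub-expression `index` took part in a match
// of `pattern` against the whole of `text`.
bool group_participated(const std::string& pattern, const std::string& text,
                        std::size_t index)
{
    std::smatch what;                          // indexed collection of sub_match objects
    std::regex re(pattern);
    if (!std::regex_match(text, what, re))
        return false;
    const std::ssub_match& sub = what[index];  // obtained by subscripting match_results
    return sub.matched;
}

// Hypothetical helper: the characters in [first, second) for a sub-expression
// that matched, or an empty string when it did not participate.
std::string group_text(const std::string& pattern, const std::string& text,
                       std::size_t index)
{
    std::smatch what;
    std::regex re(pattern);
    if (!std::regex_match(text, what, re) || !what[index].matched)
        return std::string();
    // first and second are only read after matched has been checked.
    return std::string(what[index].first, what[index].second);
}
```

<P>With the pattern "(a)|(b)" and the text "b", sub-expression 2 participates and sub-expression 1 does not, so only the former has a meaningful range.</P>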
|
||||
<PRE>
|
||||
namespace boost{
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
|
||||
{
|
||||
public:
|
||||
typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
|
||||
typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
|
||||
typedef BidirectionalIterator iterator;
|
||||
|
||||
bool matched;
|
||||
|
||||
difference_type length()const;
|
||||
operator basic_string<value_type>()const;
|
||||
basic_string<value_type> str()const;
|
||||
|
||||
int compare(const sub_match& s)const;
|
||||
int compare(const basic_string<value_type>& s)const;
|
||||
int compare(const value_type* s)const;
|
||||
};
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
|
||||
template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);
|
||||
|
||||
} // namespace boost</PRE>
|
||||
<H3>Description</H3>
|
||||
<H4>
|
||||
sub_match members</H4>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::value_type value_type;</PRE>
|
||||
<P>The type pointed to by the iterators.</P>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::difference_type difference_type;</PRE>
|
||||
<P>A type that represents the difference between two iterators.</P>
|
||||
<PRE>typedef iterator iterator_type;</PRE>
|
||||
<P>The iterator type.</P>
|
||||
<PRE>iterator first</PRE>
|
||||
<P>An iterator denoting the position of the start of the match.</P>
|
||||
<PRE>iterator second</PRE>
|
||||
<P>An iterator denoting the position of the end of the match.</P>
|
||||
<PRE>bool matched</PRE>
|
||||
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
|
||||
<PRE>difference_type length()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string<value_type>()const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
second) : basic_string<value_type>())</CODE>.</P><PRE>basic_string<value_type> str()const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
|
||||
second) : basic_string<value_type>())</CODE>.</P><PRE>int compare(const sub_match& s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string<value_type>& s)const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
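<P>A short sketch of the member functions described above, again using the standard library's <CODE>std::sub_match</CODE> as a stand-in on the assumption that it mirrors the Boost interface for these members. The <CODE>KeyValue</CODE> structure and <CODE>split_key_value</CODE> function are hypothetical names introduced only for this illustration.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for the illustration.
struct KeyValue {
    std::string key;
    std::string value;
    std::ptrdiff_t key_length;  // result of length() on the key sub-match
    int order;                  // result of compare(): key against value
};

// Hypothetical helper: splits "key=value" and records what the
// sub_match members report for each part.
bool split_key_value(const std::string& line, KeyValue& out)
{
    static const std::regex re("(\\w+)=(\\w*)");
    std::smatch what;
    if (!std::regex_match(line, what, re))
        return false;
    out.key = what[1].str();               // explicit str()
    const std::string value = what[2];     // implicit conversion to basic_string
    out.value = value;
    out.key_length = what[1].length();     // distance(first, second) when matched
    out.order = what[1].compare(what[2]);  // same as str().compare(s.str())
    return true;
}
```

<P>Both <CODE>str()</CODE> and the conversion operator copy the matched characters, so the results remain valid after the searched string has gone away.</P>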
|
||||
<H4>
|
||||
sub_match non-member operators</H4>
|
||||
<PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) < 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) <= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) >= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) > 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(os << m.str())</CODE>.</P>
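<P>The non-member operators let a sub-expression match be compared with a string or written to a stream without calling <CODE>str()</CODE> explicitly. The sketch below uses the standard library's equivalents and assumes the Boost operators behave identically; <CODE>first_word_is</CODE> and <CODE>first_word_streamed</CODE> are hypothetical helpers.</P>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: true if the first word of `text` equals `word`.
bool first_word_is(const std::string& text, const char* word)
{
    static const std::regex re("\\s*(\\w+).*");
    std::smatch what;
    if (!std::regex_match(text, what, re))
        return false;
    return what[1] == word;   // sub_match compared with a character pointer
}

// Hypothetical helper: writes the first word of `text` to a stream,
// wrapped in brackets, and returns what was written.
std::string first_word_streamed(const std::string& text)
{
    static const std::regex re("\\s*(\\w+).*");
    std::smatch what;
    std::ostringstream os;
    if (std::regex_match(text, what, re))
        os << '[' << what[1] << ']';   // same effect as os << what[1].str()
    return os.str();
}
```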
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
773
doc/Attic/syntax.html
Normal file
@ -0,0 +1,773 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Regular Expression Syntax</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Regular Expression Syntax</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>This section covers the regular expression syntax used by this library. It is
a programmer's guide: the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
|
||||
</P>
|
||||
<H3>Literals
|
||||
</H3>
|
||||
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
|
||||
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
|
||||
a "\". A literal is a character that matches itself, or matches the result of
|
||||
traits_type::translate(), where traits_type is the traits template parameter to
|
||||
class basic_regex.</P>
|
||||
<H3>Wildcard
|
||||
</H3>
|
||||
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
when <I>match_not_dot_newline</I> is passed to the matching algorithms,
the dot does not match a newline character.
|
||||
</P>
|
||||
<H3>Repeats
|
||||
</H3>
|
||||
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once. If the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only. If the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"ba*" will match all of "b", "ba", "baaa" etc.
|
||||
</P>
|
||||
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
|
||||
</P>
|
||||
<P>"ba?" will match "b" or "ba".
|
||||
</P>
|
||||
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
|
||||
</P>
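<P>The examples above can be checked with a short sketch. It assumes std::regex with its default ECMAScript grammar as a stand-in for Boost.Regex (no bk_plus_qm or bk_braces behavior is modeled); matches_fully is a hypothetical helper name.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
// std::regex is assumed here in place of boost::regex.
bool matches_fully(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);        // default ECMAScript grammar
    return std::regex_match(text, re);   // the entire input must be consumed
}
```

<P>For instance, matches_fully("ba+", "b") is false because "+" requires at least one "a", while matches_fully("ba*", "b") is true.</P>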
|
||||
<H3>Non-greedy repeats
|
||||
</H3>
|
||||
<P>Whenever the "extended" regular expression syntax is in use (the default) then
|
||||
non-greedy repeats are possible by appending a '?' after the repeat; a
|
||||
non-greedy repeat is one which will match the <I>shortest</I> possible string.
|
||||
</P>
|
||||
<P>For example to match html tag pairs one could use something like:
|
||||
</P>
|
||||
<P>"&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;"
|
||||
</P>
|
||||
<P>In this case $1 will contain the text between the tag pairs, and will be the
|
||||
shortest possible matching string.
|
||||
</P>
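<P>A minimal sketch of the tag-pair expression, with "b" substituted for tagname. It assumes std::regex (ECMAScript grammar) in place of Boost.Regex; first_bold_text is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text between the first <b>...</b> pair, or "" if none.
// std::regex is assumed here in place of boost::regex.
std::string first_bold_text(const std::string& html) {
    // (.*?) is the non-greedy repeat: it stops at the first closing tag.
    const std::regex re(R"(<\s*b[^>]*>(.*?)<\s*/b\s*>)");
    std::smatch m;
    return std::regex_search(html, m, re) ? m[1].str() : std::string();
}
```

<P>With a greedy "(.*)" in the same position, the first sub-expression would instead run up to the last closing tag in the input.</P>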
|
||||
<H3>Parentheses
|
||||
</H3>
|
||||
<P>Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example, the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match. On exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above,
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match (for example, if it is
part of an alternative that is not taken), then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
member for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
|
||||
</P>
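<P>The indexing rules above can be sketched as follows, assuming std::regex and std::smatch as stand-ins for the Boost.Regex types; submatch_offset is a hypothetical helper that reports where a given sub-expression matched.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: offset in `text` at which sub-expression `n` matched,
// or -1 if the match failed or that sub-expression took no part in it.
// std::regex is assumed here in place of boost::regex.
long submatch_offset(const std::string& pattern, const std::string& text,
                     std::size_t n) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(text, m, re) || !m[n].matched) return -1;
    return static_cast<long>(m.position(n));
}
```

<P>For "(ab)*" against "ababab", sub-expression 0 starts at offset 0 and sub-expression 1 at offset 4, the final "ab".</P>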
|
||||
<H3>Non-Marking Parentheses
|
||||
</H3>
|
||||
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression. In this case a
non-marking parenthesis (?:expression) can be used. For example, the following
expression creates no sub-expressions:
|
||||
</P>
|
||||
<P>"(?:abc)*"</P>
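<P>A short sketch of the difference, assuming std::regex (whose mark_count() member reports the number of marked sub-expressions) in place of Boost.Regex; marked_groups is a hypothetical helper.</P>

```cpp
#include <regex>

// Hypothetical helper: number of marked sub-expressions in `pattern`.
// std::regex is assumed here in place of boost::regex.
unsigned marked_groups(const char* pattern) {
    return std::regex(pattern).mark_count();
}
```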
|
||||
<H3>Forward Lookahead Asserts
|
||||
</H3>
|
||||
<P>There are two forms of these; one for positive forward lookahead asserts, and
|
||||
one for negative lookahead asserts:</P>
|
||||
<P>"(?=abc)" matches zero characters only if they are followed by the expression
|
||||
"abc".</P>
|
||||
<P>"(?!abc)" matches zero characters only if they are not followed by the
|
||||
expression "abc".</P>
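<P>Both forms can be sketched as below. The example assumes std::regex (ECMAScript grammar) in place of Boost.Regex; first_match is a hypothetical helper. Note that the text inspected by the assert is not part of the reported match.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: text matched by the first occurrence of `pattern`
// in `text`, or "" if there is no match.
// std::regex is assumed here in place of boost::regex.
std::string first_match(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}
```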
|
||||
<H3>Independent sub-expressions</H3>
|
||||
<P>"(?>expression)" matches "expression" as an independent atom (the algorithm
|
||||
will not backtrack into it if a failure occurs later in the expression).</P>
|
||||
<H3>Alternatives
|
||||
</H3>
|
||||
<P>Alternatives occur when the expression can match either one sub-expression or
another. Each alternative is separated by a "|", or a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"a(b|c)" could match "ab" or "ac".
|
||||
</P>
|
||||
<P>"abc|def" could match "abc" or "def".
|
||||
</P>
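<P>The two examples can be sketched as follows, assuming std::regex (ECMAScript grammar, "|" as the separator) in place of Boost.Regex; accepted is a hypothetical helper that filters candidate strings.</P>

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the candidates that `pattern` matches in full.
// std::regex is assumed here in place of boost::regex.
std::vector<std::string> accepted(const std::string& pattern,
                                  const std::vector<std::string>& candidates) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (const auto& c : candidates)
        if (std::regex_match(c, re)) out.push_back(c);
    return out;
}
```

<P>Because each alternative is the largest possible sub-expression, "abc|def" means "abc" or "def", not "ab", then "c" or "d", then "ef".</P>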
|
||||
<H3>Sets
|
||||
</H3>
|
||||
<P>A set is a set of characters that can match any single character that is a
|
||||
member of the set. Sets are delimited by "[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and equivalence
|
||||
classes. Set declarations that start with "^" contain the complement of the
|
||||
elements that follow.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>Character literals:
|
||||
</P>
|
||||
<P>"[abc]" will match any of "a", "b", or "c".
|
||||
</P>
|
||||
<P>"[^abc]" will match any character other than "a", "b", or "c".
|
||||
</P>
|
||||
<P>Character ranges:
|
||||
</P>
|
||||
<P>"[a-z]" will match any character in the range "a" to "z".
|
||||
</P>
|
||||
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
|
||||
</P>
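<P>A small sketch of the four set examples, assuming std::regex in place of Boost.Regex and plain ASCII input (the collate flag is not set); count_matches is a hypothetical helper.</P>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: number of single characters of `text` that are
// matched by the set expression `set_pattern`.
// std::regex is assumed here in place of boost::regex.
std::ptrdiff_t count_matches(const std::string& set_pattern,
                             const std::string& text) {
    const std::regex re(set_pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```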
|
||||
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range, and ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example, if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc., but not 'Z', which collates just after
'z'. This locale-specific behavior is disabled by default (in perl mode), in
which case ranges collate according to ASCII character code.
|
||||
</P>
|
||||
<P>Character classes are denoted using the syntax "[:classname:]" within a set
|
||||
declaration, for example "[[:space:]]" is the set of all whitespace characters.
|
||||
Character classes are only available if the flag regex_constants::char_classes
|
||||
is set. The available character classes are:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="50%">alnum</TD>
|
||||
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">alpha</TD>
|
||||
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
|
||||
characters may also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">blank</TD>
|
||||
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">cntrl</TD>
|
||||
<TD vAlign="top" width="50%">Any control character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">digit</TD>
|
||||
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">graph</TD>
|
||||
<TD vAlign="top" width="50%">Any graphical character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">lower</TD>
|
||||
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">print</TD>
|
||||
<TD vAlign="top" width="50%">Any printable character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">punct</TD>
|
||||
<TD vAlign="top" width="50%">Any punctuation character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">space</TD>
|
||||
<TD vAlign="top" width="50%">Any whitespace character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">upper</TD>
|
||||
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">xdigit</TD>
|
||||
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">word</TD>
|
||||
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
|
||||
the underscore.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">Unicode</TD>
|
||||
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
|
||||
applies to the wide character traits classes only.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
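<P>The class names in the table can be tried out with a sketch like the one below. It assumes std::regex in place of Boost.Regex and the default "C" locale; keep_class is a hypothetical helper, and only class names that the standard library also accepts (for example digit, alpha, space, upper, xdigit) are used.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove every character of `text` that is NOT a
// member of the named character class, e.g. keep_class("digit", ...).
// std::regex is assumed here in place of boost::regex.
std::string keep_class(const std::string& class_name, const std::string& text) {
    const std::regex others("[^[:" + class_name + ":]]");
    return std::regex_replace(text, others, "");
}
```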
|
||||
<P>There are some shortcuts that can be used in place of the character classes.
Provided the flag regex_constants::escape_in_lists is set, you can use:
|
||||
</P>
|
||||
<P>\w in place of [:word:]
|
||||
</P>
|
||||
<P>\s in place of [:space:]
|
||||
</P>
|
||||
<P>\d in place of [:digit:]
|
||||
</P>
|
||||
<P>\l in place of [:lower:]
|
||||
</P>
|
||||
<P>\u in place of [:upper:]
|
||||
</P>
|
||||
<P>Collating elements take the general form [.tagname.] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
|
||||
equivalent to [,]. The library supports all the standard POSIX collating
|
||||
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
|
||||
"nj", "dz", "lj", each in lower, upper and title case variations.
|
||||
Multi-character collating elements can result in the set matching more than one
|
||||
character, for example [[.ae.]] would match two characters, but note that
|
||||
[^[.ae.]] would only match one character.
|
||||
</P>
|
||||
<P>
|
||||
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character or the name of a collating
element, and match any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example, strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentuation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale-independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.
|
||||
</P>
|
||||
<P>To include a literal "-" in a set declaration, make it the first character
after the opening "[" or "[^", the endpoint of a range, or a collating element; or,
if the flag regex_constants::escape_in_lists is set, precede it with an escape
character as in "[\-]". To include a literal "[", "]" or "^" in a set,
make it the endpoint of a range or a collating element, or precede it with an
escape character if the flag regex_constants::escape_in_lists is set.
|
||||
</P>
|
||||
<H3>Line anchors
|
||||
</H3>
|
||||
<P>An anchor is something that matches the null string at the start or end of a
|
||||
line: "^" matches the null string at the start of a line, "$" matches the null
|
||||
string at the end of a line.
|
||||
</P>
|
||||
<H3>Back references
|
||||
</H3>
|
||||
<P>A back reference is a reference to a previous sub-expression that has already
been matched. The reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second, and so on. For example, the expression "(.*)\1" matches any string that
is repeated about its mid-point, such as "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string; note that this is different from some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
|
||||
</P>
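<P>The "(.*)\1" example can be sketched as below, assuming std::regex (ECMAScript grammar, where back references are always available) in place of Boost.Regex; is_doubled is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` consists of some string immediately
// followed by a copy of itself.
// std::regex is assumed here in place of boost::regex.
bool is_doubled(const std::string& text) {
    // \1 refers to whatever the first sub-expression actually matched.
    static const std::regex re(R"((.*)\1)");
    return std::regex_match(text, re);
}
```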
|
||||
<H3>Characters by code
|
||||
</H3>
|
||||
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example, "\023" represents the character whose octal
code is 23. Where ambiguity could occur, use parentheses to break the expression
up: "\0103" represents the character whose code is 103, whereas "(\010)3" represents the
character 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; note that the latter example is a Unicode
character.</P>
|
||||
<H3>Word operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library.
|
||||
</P>
|
||||
<P>"\w" matches any single character that is a member of the "word" character
class; this is identical to the expression "[[:word:]]".
|
||||
</P>
|
||||
<P>"\W" matches any single character that is not a member of the "word" character
class; this is identical to the expression "[^[:word:]]".
|
||||
</P>
|
||||
<P>"\<" matches the null string at the start of a word.
|
||||
</P>
|
||||
<P>"\>" matches the null string at the end of a word.
|
||||
</P>
|
||||
<P>"\b" matches the null string at either the start or the end of a word.
|
||||
</P>
|
||||
<P>"\B" matches a null string within a word.
|
||||
</P>
|
||||
<P>The start of the sequence passed to the matching algorithms is considered to be
|
||||
a potential start of a word unless the flag match_not_bow is set. The end of
|
||||
the sequence passed to the matching algorithms is considered to be a potential
|
||||
end of a word unless the flag match_not_eow is set.
|
||||
</P>
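<P>A minimal sketch of the "\b" operator, assuming std::regex (ECMAScript grammar) in place of Boost.Regex. The GNU-style "\&lt;" and "\&gt;" operators are not part of that grammar, so only "\b" is shown; contains_word is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `word` occurs in `text` as a whole word.
// Assumes `word` contains no regular expression special characters.
// std::regex is assumed here in place of boost::regex.
bool contains_word(const std::string& word, const std::string& text) {
    const std::regex re("\\b" + word + "\\b");
    return std::regex_search(text, re);
}
```

<P>With default match flags, the start and end of the sequence count as potential word boundaries, so contains_word("cat", "cat") is true.</P>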
|
||||
<H3>Buffer operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library, and Perl regular expressions:
|
||||
</P>
|
||||
<P>"\`" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\A" matches the start of the buffer.
|
||||
</P>
|
||||
<P>"\'" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\z" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
|
||||
followed by the end of the buffer.
|
||||
</P>
|
||||
<P>A buffer is considered to consist of the whole sequence passed to the matching
|
||||
algorithms, unless the flags match_not_bob or match_not_eob are set.
|
||||
</P>
|
||||
<H3>Escape operator
|
||||
</H3>
|
||||
<P>The escape character "\" has several meanings.
|
||||
</P>
|
||||
<P>Inside a set declaration the escape character is a normal character unless the
|
||||
flag regex_constants::escape_in_lists is set in which case whatever follows the
|
||||
escape is a literal character regardless of its normal meaning.
|
||||
</P>
|
||||
<P>The escape operator may introduce an operator for example: back references, or
|
||||
a word operator.
|
||||
</P>
|
||||
<P>The escape operator may make the following character normal, for example "\*"
|
||||
represents a literal "*" rather than the repeat operator.
|
||||
</P>
|
||||
<H4>Single character escape sequences
|
||||
</H4>
|
||||
<P>The following escape sequences are aliases for single characters:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="33%">Escape sequence
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Character code
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Meaning
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\a
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x07
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Bell character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\f
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0C
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Form feed.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\n
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0A
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Newline character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\r
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0D
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Carriage return.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\t
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x09
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Tab character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\v
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Vertical tab.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\e
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x1B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">ASCII Escape character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
|
||||
more octal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\x{XX}
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits, optionally a Unicode character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\cZ
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Z-@
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
|
||||
ASCII character greater than or equal to the character code for '@'.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
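<P>Some of the escapes in the table can be checked with the sketch below. It assumes std::regex (ECMAScript grammar) in place of Boost.Regex, so only escapes that grammar shares with the table (\f, \n, \r, \t, \v and \xXX) are exercised; escape_code is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the 7-bit character code matched by a
// single-character escape such as "\\t", or -1 if none matches.
// std::regex is assumed here in place of boost::regex.
int escape_code(const std::string& escape) {
    const std::regex re(escape);
    for (int code = 0; code < 128; ++code) {
        const std::string s(1, static_cast<char>(code));
        if (std::regex_match(s, re)) return code;
    }
    return -1;
}
```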
|
||||
<H4>Miscellaneous escape sequences:
|
||||
</H4>
|
||||
<P>The following are provided mostly for perl compatibility, but note that there
|
||||
are some differences in the meanings of \l \L \u and \U:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\w
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\W
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\s
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\S
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\d
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\D
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\l
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\L
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\u
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\U
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\C
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\X
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
|
||||
example "a\x 0301" (a letter a with an acute).
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\Q
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
|
||||
treated as a literal character until a \E end quote operator is found.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\E
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
|
||||
with \Q.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
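<P>The "equivalent to" column can be spot-checked with a sketch like this one. It assumes std::regex (ECMAScript grammar) in place of Boost.Regex, so only \d, \D, \s, \S, \w and \W are covered, and "[[:alnum:]_]" is used as an assumed spelling of the word class; same_effect is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if expressions `a` and `b` select exactly the
// same characters of `text` (each selected character is replaced by '#').
// std::regex is assumed here in place of boost::regex.
bool same_effect(const std::string& a, const std::string& b,
                 const std::string& text) {
    return std::regex_replace(text, std::regex(a), "#") ==
           std::regex_replace(text, std::regex(b), "#");
}
```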
|
||||
<H3>What gets matched?
|
||||
</H3>
|
||||
<P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
<P>
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string. If more than one
string starting at a given location can match, then the longest
possible string is matched, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match, but is only useful if the user is less concerned
about what matched; for example, it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, the match
chosen is the one with the longest first sub-expression; if that is the same
for two or more matches, then the second sub-expression is examined, and so
on.
|
||||
</P><P>
|
||||
The examples in the following table illustrate the main differences between Perl and
POSIX regular expression matching rules:
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
xaby</CODE>
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"ab"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"a"</CODE></P></TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*([[:alnum:]]+).*</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
" abc def xyz "</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*(a|xayy)</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
zzxayyzz</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"zzxayy"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
</P>
|
||||
<P>These differences between Perl matching rules and POSIX matching rules mean
|
||||
that these two regular expression syntaxes differ not only in the features
|
||||
offered, but also in the form that the state machine takes and/or the
|
||||
algorithms used to traverse the state machine.</p>
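<P>The depth first search column of the table above can be reproduced with the sketch below. It assumes std::regex with its default ECMAScript grammar as a stand-in for a Perl-compatible Boost.Regex expression; the POSIX leftmost longest column is deliberately not exercised. SearchResult and search_first are hypothetical names.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the whole match ($0) and the first
// sub-expression ($1); both are empty if nothing matched.
struct SearchResult {
    std::string whole;
    std::string first_group;
};

// Hypothetical helper: first match of `pattern` in `text` under depth
// first search rules. std::regex is assumed in place of boost::regex.
SearchResult search_first(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) return {};
    return {m.str(0), m.str(1)};
}
```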
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
332
doc/Attic/syntax_option_type.html
Normal file
@ -0,0 +1,332 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: syntax_option_type</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">syntax_option_type</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted. For convenience,
all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<PRE>namespace std{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type syntax_option_type;
|
||||
// these flags are standardized:
|
||||
static const syntax_option_type normal;
|
||||
static const syntax_option_type icase;
|
||||
static const syntax_option_type nosubs;
|
||||
static const syntax_option_type optimize;
|
||||
static const syntax_option_type collate;
|
||||
static const syntax_option_type ECMAScript = normal;
|
||||
static const syntax_option_type JavaScript = normal;
|
||||
static const syntax_option_type JScript = normal;
|
||||
static const syntax_option_type basic;
|
||||
static const syntax_option_type extended;
|
||||
static const syntax_option_type awk;
|
||||
static const syntax_option_type grep;
|
||||
static const syntax_option_type egrep;
|
||||
static const syntax_option_type sed = basic;
|
||||
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
|
||||
} // namespace regex_constants
|
||||
} // namespace std</PRE>
|
||||
<H3>Description</H3>
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
|
||||
<P>Note that for convenience all the constants listed here are duplicated within
|
||||
the scope of class template basic_regex, so you can use any of:</P>
|
||||
<PRE>boost::regex_constants::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::regex::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::wregex::constant_name</PRE>
|
||||
<P>in an interchangeable manner.</P>
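<P>As a sketch of how such flags are combined and passed, the example below uses the standard library counterparts std::regex_constants::icase and std::regex::icase, which are assumed here to mirror the Boost.Regex constants of the same name; matches_with is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string match of `text` against `pattern`,
// compiled with the given syntax option flags.
// std::regex is assumed here in place of boost::regex.
bool matches_with(const std::string& pattern, const std::string& text,
                  std::regex::flag_type flags) {
    return std::regex_match(text, std::regex(pattern, flags));
}
```

<P>The flags form a bitmask, so a grammar element and modifiers such as icase are combined with the | operator, using either the namespace-scope or the class-scope spelling.</P>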
|
||||
<P>
|
||||
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>Element</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Effect if set</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>normal</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine uses its
|
||||
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
|
||||
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
|
||||
(FWD.1).</P>
|
||||
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>icase</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that matching of regular expressions against a character container
|
||||
sequence shall be performed without regard to case.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>nosubs</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that when a regular expression is matched against a character
|
||||
container sequence, then no sub-expression matches are to be stored in the
|
||||
supplied match_results structure.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>optimize</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the regular expression engine should pay more attention to the
|
||||
speed with which regular expressions are matched, and less to the speed with
|
||||
which regular expression objects are constructed. Otherwise it has no
|
||||
detectable effect on the program output. This currently has no effect for
|
||||
boost.regex.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>collate</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>ECMAScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JavaScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>basic</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
|
||||
Portable Operating System Interface (POSIX), Base Definitions and Headers,
|
||||
Section 9, Regular Expressions (FWD.1).
|
||||
</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>extended</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX extended regular expressions in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and
|
||||
Headers, Section 9, Regular Expressions (FWD.1).</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>awk</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX), Shells and Utilities, Section 4, awk
|
||||
(FWD.1).</P>
|
||||
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
|
||||
character classes permitted.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>grep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX), Shells and Utilities, Section 4,
|
||||
Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX basic syntax, but with the newline character
|
||||
acting as an alternation character (in POSIX basic syntax "|" itself is an ordinary character).</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>egrep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep when given the -E option in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX), Shells and
|
||||
Utilities, Section 4, Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX extended syntax, but with the newline
|
||||
character acting as an alternation character in addition to "|".</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>sed</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as basic.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>perl</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
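<P>The effect of a few of the flags in the table above can be sketched as follows. The example uses <CODE>std::regex</CODE> as a stand-in for <CODE>boost::regex</CODE> (an assumption, chosen so the code needs no Boost installation), and the function names are hypothetical helpers introduced only for illustration. It shows <CODE>icase</CODE>, <CODE>nosubs</CODE>, and the difference between the <CODE>basic</CODE> and <CODE>extended</CODE> grammars for the same pattern text.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Case-insensitive whole-string match using the default (ECMAScript) grammar.
bool matches_ignoring_case(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern,
                  std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_match(text, re);
}

// Number of entries stored in match_results for the pattern "(a)(b)".
// With nosubs set only the overall match is stored, even though the
// pattern contains two marked sub-expressions.
std::size_t stored_matches(bool nosubs)
{
    std::regex_constants::syntax_option_type flags = std::regex_constants::ECMAScript;
    if (nosubs)
        flags = flags | std::regex_constants::nosubs;
    std::regex re("(a)(b)", flags);
    const std::string text = "ab";
    std::smatch what;
    std::regex_match(text, what, re);
    return what.size();
}

// The pattern "a+" under a given grammar, matched against "aaa".
// In POSIX basic syntax "+" is an ordinary character; in extended
// syntax it is the one-or-more repeat operator.
bool plus_is_repeat(std::regex_constants::syntax_option_type grammar)
{
    std::regex re("a+", grammar);
    const std::string text = "aaa";
    return std::regex_match(text, re);
}
```

<P>Exactly one grammar flag is combined with any number of the modifier flags (<CODE>icase</CODE>, <CODE>nosubs</CODE>, <CODE>optimize</CODE>, <CODE>collate</CODE>) using the bitwise-or operator.</P>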
|
||||
<P>The following constants are specific to this particular regular expression
|
||||
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>:</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
|
||||
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
|
||||
characters, for example [\]] represents the set of characters containing only
|
||||
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::char_classes</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
|
||||
are allowed inside character set declarations, for example "[[:word:]]"
|
||||
represents the set of all characters that belong to the character class "word".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::intervals</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
|
||||
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
|
||||
a's.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
|
||||
ordinary characters in all situations.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
|
||||
has the same effect as the alternation operator "|".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
|
||||
more repetition operator and "\?" represents the zero or one repetition
|
||||
operator. When this bit is not set then "+" and "?" are used instead.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
|
||||
bounded repetitions and "{" and "}" are normal characters. This is the opposite
|
||||
of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
|
||||
group sub-expressions and "(" and ")" are ordinary characters. This is the
|
||||
opposite of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then back references are
|
||||
allowed.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
|
||||
alternation operator and "|" is an ordinary character. This is the opposite of
|
||||
default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::use_except</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
|
||||
exception will be thrown on error. Use of this flag is deprecated -
|
||||
basic_regex will always throw on error.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::failbit</TD>
|
||||
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
|
||||
not set, then this bit should be checked to see if a regular expression is
|
||||
valid before usage.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::literal</TD>
|
||||
<TD vAlign="top" width="45%">All characters in the string are treated as literals;
|
||||
there are no special characters or escape sequences.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
|
||||
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
|
||||
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
68
doc/Attic/thread_safety.html
Normal file
@ -0,0 +1,68 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Thread Safety</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Thread Safety</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Class <A href="basic_regex.html">basic_regex</A><> and its typedefs regex
|
||||
and wregex are thread safe, in that compiled regular expressions can safely be
|
||||
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
|
||||
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
|
||||
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
|
||||
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
|
||||
is now thread safe, in that the results of a match can be safely copied from
|
||||
one thread to another (for example one thread may find matches and push
|
||||
match_results instances onto a queue, while another thread pops them off the
|
||||
other end); otherwise use a separate instance of <A href="match_results.html">match_results</A>
|
||||
per thread.
|
||||
</P>
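<P>The sharing pattern described above can be sketched as follows: one compiled expression is shared, read-only, by several threads, and each thread uses its own <CODE>match_results</CODE> instance. The sketch uses <CODE>std::regex</CODE> and <CODE>std::thread</CODE> as stand-ins for the Boost classes (an assumption, so that it builds without Boost), and <CODE>count_matches_in_parallel</CODE> is a hypothetical helper, not a library function.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Counts how many of the inputs contain a match for the shared expression.
// One worker thread is started per input string.
std::size_t count_matches_in_parallel(const std::regex& shared_re,
                                      const std::vector<std::string>& inputs)
{
    // One result slot per thread, so no two threads write the same object.
    std::vector<std::size_t> hits(inputs.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i)
    {
        workers.emplace_back([&shared_re, &inputs, &hits, i] {
            std::smatch what;  // per-thread match_results instance
            if (std::regex_search(inputs[i], what, shared_re))
                hits[i] = 1;
        });
    }
    for (std::thread& t : workers)
        t.join();

    std::size_t total = 0;
    for (std::size_t h : hits)
        total += h;
    return total;
}
```

<P>The compiled expression is only ever accessed through a const reference while the threads run; compiling or assigning to it concurrently is outside what the paragraph above promises.</P>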
|
||||
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
|
||||
thread safe; regular expressions compiled with <I>regcomp</I> can also be
|
||||
shared between threads.
|
||||
</P>
|
||||
<P>The class<A href="regex.html"> RegEx</A> is only thread safe if each thread
|
||||
gets its own RegEx instance (apartment threading) - this is a consequence of
|
||||
RegEx handling both compiling and matching regular expressions.
|
||||
</P>
|
||||
<P>Finally note that changing the global locale invalidates all compiled regular
|
||||
expressions; therefore calling <I>set_locale</I> from one thread while another
|
||||
uses regular expressions <I>will</I> produce unpredictable results.
|
||||
</P>
|
||||
<P>
|
||||
There is also a requirement that there is only one thread executing prior to
|
||||
the start of main().</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
BIN
doc/Attic/uarrow.gif
Normal file
Binary file not shown.
After Width: | Height: | Size: 1.6 KiB |
79
doc/standards.html
Normal file
@ -0,0 +1,79 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Standards Conformance</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Standards Conformance</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>C++</H3>
|
||||
<P>Boost.regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>, which will appear in a
|
||||
future C++ standard technical report (and hopefully in a future version of the
|
||||
standard). Currently there are some differences in how the regular
|
||||
expression traits classes are defined; these will be fixed in a future release.</P>
|
||||
<H3>ECMAScript / JavaScript</H3>
|
||||
<P>All of the ECMAScript regular expression syntax features are supported, except
|
||||
that:</P>
|
||||
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
|
||||
definitions ( [...] ).</P>
|
||||
<P>The escape sequence \u matches any upper case character (the same as
|
||||
[[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for
|
||||
Unicode escape sequences.</P>
|
||||
<H3>Perl</H3>
|
||||
<P>Almost all Perl features are supported, except for:</P>
|
||||
<P>\N{name} Use [[:name:]] instead.</P>
|
||||
<P>\pP and \PP</P>
|
||||
<P>(?imsx-imsx)</P>
|
||||
<P>(?<=pattern)</P>
|
||||
<P>(?<!pattern)</P>
|
||||
<P>(?{code})</P>
|
||||
<P>(??{code})</P>
|
||||
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
|
||||
<P>These embarrassments / limitations will be removed in due course, mainly
|
||||
dependent upon user demand.</P>
|
||||
<H3>POSIX</H3>
|
||||
<P>All the POSIX basic and extended regular expression features are supported,
|
||||
except that:</P>
|
||||
<P>No character collating names are recognized except those specified in the POSIX
|
||||
standard for the C locale, unless they are explicitly registered with the
|
||||
traits class.</P>
|
||||
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
|
||||
Win32. Implementing this feature requires knowledge of the format of the
|
||||
string sort keys produced by the system; if you need this, and the default
|
||||
implementation doesn't work on your platform, then you will need to supply a
|
||||
custom traits class.</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
426
doc/sub_match.html
Normal file
@ -0,0 +1,426 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: sub_match</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">sub_match</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>#include <<A href="../../boost/regex.hpp">boost/regex.hpp</A>>
|
||||
</P>
|
||||
<P>Regular expressions are different from many simple pattern-matching algorithms
|
||||
in that as well as finding an overall match they can also produce
|
||||
sub-expression matches: each sub-expression being delimited in the pattern by a
|
||||
pair of parentheses (...). There has to be some method for reporting
|
||||
sub-expression matches back to the user: this is achieved by defining a
|
||||
class <I><A href="match_results.html">match_results</A></I> that acts as an
|
||||
indexed collection of sub-expression matches, each sub-expression match being
|
||||
contained in an object of type <I>sub_match</I>
|
||||
.</P>
|
||||
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
|
||||
of type <EM><A href="match_results.html">match_results</A></EM>
|
||||
.</P>
|
||||
<P>When the marked sub-expression denoted by an object of type sub_match<>
|
||||
participated in a regular expression match then member <CODE>matched</CODE> evaluates
|
||||
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
|
||||
range of characters <CODE>[first,second)</CODE> which formed that match.
|
||||
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
|
||||
contain undefined values.</P>
|
||||
<P>If an object of type <CODE>sub_match<></CODE> represents sub-expression 0
|
||||
- that is to say the whole match - then member <CODE>matched</CODE> is always
|
||||
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
|
||||
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
|
||||
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
|
||||
character range that formed the partial match.</P>
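<P>The meaning of <CODE>matched</CODE>, <CODE>first</CODE> and <CODE>second</CODE> can be illustrated with an alternation in which only one marked sub-expression can take part in any given match. This is a sketch under the assumption that <CODE>std::regex</CODE> and <CODE>std::ssub_match</CODE> behave like their Boost counterparts for this purpose; <CODE>describe_group</CODE> and <CODE>describe_alternation</CODE> are hypothetical helper names.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Describes sub-expression n of a match as "[text]" when it took part in
// the match, or "<unmatched>" when it did not.
std::string describe_group(const std::smatch& what, std::size_t n)
{
    const std::ssub_match& sub = what[n];
    if (!sub.matched)
        return "<unmatched>";
    // [first, second) is the range of characters that formed the match.
    return "[" + std::string(sub.first, sub.second) + "]";
}

// Matches "(a+)|(b+)" against the whole text and reports sub-expressions
// 0 (the whole match), 1 and 2, separated by spaces.
std::string describe_alternation(const std::string& text)
{
    static const std::regex re("(a+)|(b+)");
    std::smatch what;
    if (!std::regex_match(text, what, re))
        return "no match";
    return describe_group(what, 0) + " "
         + describe_group(what, 1) + " "
         + describe_group(what, 2);
}
```

<P>The helper checks <CODE>matched</CODE> before touching <CODE>first</CODE> and <CODE>second</CODE>, since the iterators of a sub-expression that did not participate carry no usable value.</P>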
|
||||
<PRE>
|
||||
namespace boost{
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
|
||||
{
|
||||
public:
|
||||
typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
|
||||
typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
|
||||
typedef BidirectionalIterator iterator;
|
||||
|
||||
bool matched;
|
||||
|
||||
difference_type length()const;
|
||||
operator basic_string<value_type>()const;
|
||||
basic_string<value_type> str()const;
|
||||
|
||||
int compare(const sub_match& s)const;
|
||||
int compare(const basic_string<value_type>& s)const;
|
||||
int compare(const value_type* s)const;
|
||||
};
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
|
||||
template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);
|
||||
|
||||
} // namespace boost</PRE>
|
||||
<H3>Description</H3>
|
||||
<H4>
|
||||
sub_match members</H4>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::value_type value_type;</PRE>
|
||||
<P>The type pointed to by the iterators.</P>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::difference_type difference_type;</PRE>
|
||||
<P>A type that represents the difference between two iterators.</P>
|
||||
<PRE>typedef iterator iterator_type;</PRE>
|
||||
<P>The iterator type.</P>
|
||||
<PRE>iterator first</PRE>
|
||||
<P>An iterator denoting the position of the start of the match.</P>
|
||||
<PRE>iterator second</PRE>
|
||||
<P>An iterator denoting the position of the end of the match.</P>
|
||||
<PRE>bool matched</PRE>
|
||||
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
|
||||
<PRE>difference_type length()const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string<value_type>()const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
|
||||
second) : basic_string<value_type>())</CODE>.</P><PRE>basic_string<value_type> str()const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
|
||||
second) : basic_string<value_type>())</CODE>.</P><PRE>int compare(const sub_match& s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string<value_type>& s)const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
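<P>The members above can be exercised with a short sketch. It uses <CODE>std::sub_match</CODE> from the standard regex header as a stand-in, on the assumption that it behaves like <CODE>boost::sub_match</CODE> for these members; <CODE>group_length</CODE> and <CODE>group_text</CODE> are hypothetical helper names, not part of either library.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: length() of sub-expression 1 after searching `text`.
// Yields 0 when the search fails or the group did not participate.
inline long group_length(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return 0;
    return static_cast<long>(m[1].length());
}

// Hypothetical helper: str() of sub-expression 1; empty if it did not participate.
inline std::string group_text(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return std::string();
    return m[1].str();
}
```

<P>With the pattern <CODE>x|(y)</CODE> and the input <CODE>x</CODE>, sub-expression 1 takes no part in the match, so both helpers report an empty result.</P>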
|
||||
<H4>
|
||||
sub_match non-member operators</H4>
|
||||
<PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) < 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) <= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) >= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) > 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(os << m.str())</CODE>.</P>
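<P>The non-member operators let a sub_match be compared with a character pointer, or streamed, without calling <CODE>str()</CODE> explicitly. The sketch below assumes <CODE>std::sub_match</CODE> as a stand-in for the Boost class; <CODE>group_equals</CODE> and <CODE>group_streamed</CODE> are hypothetical helpers.</P>

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: true when sub-expression 1 equals `expected`,
// checked with the operand in both positions.
inline bool group_equals(const std::string& text, const std::string& pattern,
                         const char* expected) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return false;
    return (m[1] == expected) && (expected == m[1]);
}

// Hypothetical helper: streams sub-expression 1 between square brackets.
inline std::string group_streamed(const std::string& text, const std::string& pattern) {
    std::smatch m;
    std::ostringstream out;
    if (std::regex_search(text, m, std::regex(pattern))) out << '[' << m[1] << ']';
    return out.str();
}
```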
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
773
doc/syntax.html
Normal file
@ -0,0 +1,773 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Regular Expression Syntax</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Regular Expression Syntax</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>This section covers the regular expression syntax used by this library. It is
a programmer's guide: the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
|
||||
</P>
|
||||
<H3>Literals
|
||||
</H3>
|
||||
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
|
||||
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
|
||||
a "\". A literal is a character that matches itself, or matches the result of
|
||||
traits_type::translate(), where traits_type is the traits template parameter to
|
||||
class basic_regex.</P>
|
||||
<H3>Wildcard
|
||||
</H3>
|
||||
<P>The dot character "." matches any single character except: when <I>match_not_dot_null</I>
|
||||
is passed to the matching algorithms, the dot does not match a null character;
|
||||
when <I>match_not_dot_newline</I> is passed to the matching algorithms, then
|
||||
the dot does not match a newline character.
|
||||
</P>
|
||||
<H3>Repeats
|
||||
</H3>
|
||||
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once. If the expression is compiled with the flag regex_constants::bk_plus_qm,
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only. If the
expression is compiled with the flag regex_constants::bk_plus_qm, then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces, then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"ba*" will match all of "b", "ba", "baaa" etc.
|
||||
</P>
|
||||
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
|
||||
</P>
|
||||
<P>"ba?" will match "b" or "ba".
|
||||
</P>
|
||||
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
|
||||
</P>
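<P>The repeat examples above can be checked mechanically. This is a minimal sketch that assumes <CODE>std::regex</CODE> (default ECMAScript grammar) treats these operators the same way as Boost.Regex does; <CODE>repeat_matches</CODE> is a hypothetical helper name.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
inline bool repeat_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<P>For instance <CODE>repeat_matches("ba+", "b")</CODE> is false, because "+" requires at least one "a", while <CODE>repeat_matches("ba{2,4}", "baaa")</CODE> is true.</P>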
|
||||
<H3>Non-greedy repeats
|
||||
</H3>
|
||||
<P>Whenever the "extended" regular expression syntax is in use (the default) then
|
||||
non-greedy repeats are possible by appending a '?' after the repeat; a
|
||||
non-greedy repeat is one which will match the <I>shortest</I> possible string.
|
||||
</P>
|
||||
<P>For example to match html tag pairs one could use something like:
|
||||
</P>
|
||||
<P>"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
|
||||
</P>
|
||||
<P>In this case $1 will contain the text between the tag pairs, and will be the
|
||||
shortest possible matching string.
|
||||
</P>
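<P>To make the difference between greedy and non-greedy repeats concrete, the sketch below applies the tag-pair expression with "b" substituted for <I>tagname</I>. It assumes <CODE>std::regex</CODE> as a stand-in for Boost.Regex; <CODE>first_capture</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: text of sub-expression 1 for the first match of
// `pattern` in `text`, or an empty string when nothing matches.
inline std::string first_capture(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return std::string();
    return m[1].str();
}
```

<P>On the input <CODE>&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;</CODE> the non-greedy form captures only "one", whereas replacing <CODE>(.*?)</CODE> with <CODE>(.*)</CODE> captures everything up to the last closing tag.</P>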
|
||||
<H3>Parentheses
|
||||
</H3>
|
||||
<P>Parentheses serve two purposes, to group items together into a sub-expression,
|
||||
and to mark what generated the match. For example the expression "(ab)*" would
|
||||
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
|
||||
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
|
||||
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
|
||||
that reports what caused the match. On exit from these functions the <A href="template_class_ref.htm#reg_match">
|
||||
match_results</A> contains information both on what the whole expression
|
||||
matched and on what each sub-expression matched. In the example above
|
||||
match_results[1] would contain a pair of iterators denoting the final "ab" of
|
||||
the matching string. It is permissible for sub-expressions to match null
|
||||
strings. If a sub-expression takes no part in a match - for example if it is
|
||||
part of an alternative that is not taken - then both of the iterators that are
|
||||
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
|
||||
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
|
||||
from left to right starting from 1; sub-expression 0 is the whole expression.
|
||||
</P>
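<P>The behaviour described above can be observed with a small sketch. It assumes <CODE>std::match_results</CODE> reports sub-expressions in the same way as the Boost class; <CODE>GroupInfo</CODE> and <CODE>group_info</CODE> are hypothetical names introduced for illustration.</P>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record describing one sub-expression of a match.
struct GroupInfo {
    bool matched;          // did the sub-expression take part in the match?
    std::string text;      // what it matched (empty if it took no part)
    std::ptrdiff_t offset; // start offset in the input, -1 if it took no part
};

// Hypothetical helper: matches `pattern` against all of `text` and
// reports on sub-expression `n` (0 is the whole expression).
inline GroupInfo group_info(const std::string& pattern, const std::string& text,
                            std::size_t n) {
    std::smatch m;
    GroupInfo info{false, std::string(), -1};
    if (std::regex_match(text, m, std::regex(pattern)) && n < m.size()) {
        info.matched = m[n].matched;
        info.text = m[n].str();
        if (m[n].matched) info.offset = m.position(n);
    }
    return info;
}
```

<P>For "(ab)*" against "ababab", sub-expression 1 reports the final "ab" at offset 4; for "(a)|(b)" against "b", sub-expression 1 is part of the alternative that was not taken, so its <I>matched</I> flag is false.</P>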
|
||||
<H3>Non-Marking Parentheses
|
||||
</H3>
|
||||
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression. In this case a
non-marking parenthesis (?:expression) can be used. For example the following
expression creates no sub-expressions:
|
||||
</P>
|
||||
<P>"(?:abc)*"</P>
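<P>One way to confirm that a group is non-marking is to count the marked sub-expressions in the compiled expression. The sketch assumes <CODE>std::regex::mark_count()</CODE> as a stand-in for the Boost equivalent; <CODE>marked_groups</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: number of marked sub-expressions in `pattern`.
inline unsigned marked_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```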
|
||||
<H3>Forward Lookahead Asserts
|
||||
</H3>
|
||||
<P>There are two forms of these; one for positive forward lookahead asserts, and
|
||||
one for negative lookahead asserts:</P>
|
||||
<P>"(?=abc)" matches zero characters only if they are followed by the expression
|
||||
"abc".</P>
|
||||
<P>"(?!abc)" matches zero characters only if they are not followed by the
|
||||
expression "abc".</P>
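<P>Because a lookahead assert consumes no characters, it only constrains where the surrounding expression may match. The sketch below assumes <CODE>std::regex</CODE> handles both forms like Boost.Regex; <CODE>match_position</CODE> and <CODE>match_text</CODE> are hypothetical helpers.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: offset of the first match of `pattern` in `text`,
// or -1 when there is no match.
inline long match_position(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return -1;
    return static_cast<long>(m.position(0));
}

// Hypothetical helper: text of the first match, or an empty string.
inline std::string match_text(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return std::string();
    return m[0].str();
}
```

<P>In "foobaz foobar" the expression "foo(?=bar)" first matches at offset 7, and "foo(?!bar)" at offset 0; in both cases the matched text is just "foo".</P>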
|
||||
<H3>Independent sub-expressions</H3>
|
||||
<P>"(?>expression)" matches "expression" as an independent atom (the algorithm
|
||||
will not backtrack into it if a failure occurs later in the expression).</P>
|
||||
<H3>Alternatives
|
||||
</H3>
|
||||
<P>Alternatives occur when the expression can match either one sub-expression or
|
||||
another; each alternative is separated by a "|", or a "\|" if the flag
|
||||
regex_constants::bk_vbar is set, or by a newline character if the flag
|
||||
regex_constants::newline_alt is set. Each alternative is the largest possible
|
||||
previous sub-expression; this is the opposite behavior from repetition
|
||||
operators.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"a(b|c)" could match "ab" or "ac".
|
||||
</P>
|
||||
<P>"abc|def" could match "abc" or "def".
|
||||
</P>
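<P>The two examples can be verified as follows. This sketch assumes <CODE>std::regex</CODE> with "|" as the alternation operator (no bk_vbar or newline_alt equivalent is used); <CODE>alt_matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
inline bool alt_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<P>Note that "abc|def" does not match "abdef": each alternative is the whole of "abc" or the whole of "def", not just the characters adjacent to the "|".</P>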
|
||||
<H3>Sets
|
||||
</H3>
|
||||
<P>A set is a set of characters that can match any single character that is a
|
||||
member of the set. Sets are delimited by "[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and equivalence
|
||||
classes. Set declarations that start with "^" contain the complement of the
|
||||
elements that follow.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>Character literals:
|
||||
</P>
|
||||
<P>"[abc]" will match either of "a", "b", or "c".
|
||||
</P>
|
||||
<P>"[^abc]" will match any character other than "a", "b", or "c".
|
||||
</P>
|
||||
<P>Character ranges:
|
||||
</P>
|
||||
<P>"[a-z]" will match any character in the range "a" to "z".
|
||||
</P>
|
||||
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
|
||||
</P>
|
||||
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range. Ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale specific behavior is disabled by default (in perl mode), in
which case ranges collate according to ASCII character code.
|
||||
</P>
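<P>The literal and range examples can be tried with the sketch below. It assumes <CODE>std::regex</CODE> compiled without the collate flag, so ranges follow character codes as in the default behavior described above; <CODE>set_matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the set expression `pattern` matches
// the single character `c`.
inline bool set_matches(const std::string& pattern, char c) {
    return std::regex_match(std::string(1, c), std::regex(pattern));
}
```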
|
||||
<P>Character classes are denoted using the syntax "[:classname:]" within a set
|
||||
declaration, for example "[[:space:]]" is the set of all whitespace characters.
|
||||
Character classes are only available if the flag regex_constants::char_classes
|
||||
is set. The available character classes are:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="50%">alnum</TD>
|
||||
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">alpha</TD>
|
||||
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
|
||||
characters may also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">blank</TD>
|
||||
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">cntrl</TD>
|
||||
<TD vAlign="top" width="50%">Any control character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">digit</TD>
|
||||
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">graph</TD>
|
||||
<TD vAlign="top" width="50%">Any graphical character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">lower</TD>
|
||||
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">print</TD>
|
||||
<TD vAlign="top" width="50%">Any printable character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">punct</TD>
|
||||
<TD vAlign="top" width="50%">Any punctuation character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">space</TD>
|
||||
<TD vAlign="top" width="50%">Any whitespace character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">upper</TD>
|
||||
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">xdigit</TD>
|
||||
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">word</TD>
|
||||
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
|
||||
the underscore.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">Unicode</TD>
|
||||
<TD vAlign="top" width="50%">Any character whose code is greater than 255; this
|
||||
applies to the wide character traits classes only.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<P>There are some shortcuts that can be used in place of the character classes.
Provided the flag regex_constants::escape_in_lists is set, you can use:
|
||||
</P>
|
||||
<P>\w in place of [:word:]
|
||||
</P>
|
||||
<P>\s in place of [:space:]
|
||||
</P>
|
||||
<P>\d in place of [:digit:]
|
||||
</P>
|
||||
<P>\l in place of [:lower:]
|
||||
</P>
|
||||
<P>\u in place of [:upper:]
|
||||
</P>
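<P>The correspondence between named classes and their shortcuts can be spot-checked. The sketch uses <CODE>std::regex</CODE> as a stand-in and therefore only exercises \d, \s and \w; the \l and \u shortcuts and the "word" and "Unicode" class names are assumed to be specific to this library and are not covered. <CODE>class_matches</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
inline bool class_matches(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```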
|
||||
<P>Collating elements take the general form [.tagname.] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
|
||||
equivalent to [,]. The library supports all the standard POSIX collating
|
||||
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
|
||||
"nj", "dz", "lj", each in lower, upper and title case variations.
|
||||
Multi-character collating elements can result in the set matching more than one
|
||||
character, for example [[.ae.]] would match two characters, but note that
|
||||
[^[.ae.]] would only match one character.
|
||||
</P>
|
||||
<P>
|
||||
Equivalence classes take the general form [=tagname=] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, and matches any character that is a member of the same primary
|
||||
equivalence class as the collating element [.tagname.]. An equivalence class is
|
||||
a set of characters that collate the same, a primary equivalence class is a set
|
||||
of characters whose primary sort keys are all the same (for example strings are
|
||||
typically collated by character, then by accent, and then by case; the primary
|
||||
sort key then relates to the character, the secondary to the accentation, and
|
||||
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
|
||||
locale independent method of obtaining the primary sort key for a character,
|
||||
except under Win32. For other operating systems the library will "guess" the
|
||||
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
|
||||
equivalence classes are probably best considered broken under any operating
|
||||
system other than Win32.
|
||||
</P>
|
||||
<P>To include a literal "-" in a set declaration, make it the first character
after the opening "[" or "[^", the endpoint of a range, or a collating element;
or, if the flag regex_constants::escape_in_lists is set, precede it with an escape
character as in "[\-]". To include a literal "[" or "]" or "^" in a set,
make them the endpoint of a range or a collating element, or precede them with an
escape character if the flag regex_constants::escape_in_lists is set.
|
||||
</P>
|
||||
<H3>Line anchors
|
||||
</H3>
|
||||
<P>An anchor is something that matches the null string at the start or end of a
|
||||
line: "^" matches the null string at the start of a line, "$" matches the null
|
||||
string at the end of a line.
|
||||
</P>
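<P>A brief sketch of the two anchors on a single-line input follows. It assumes <CODE>std::regex</CODE> as a stand-in; behavior at embedded newlines is not exercised here. <CODE>anchored_search</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches somewhere in `text`.
inline bool anchored_search(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```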
|
||||
<H3>Back references
|
||||
</H3>
|
||||
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string: NB this is different to some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
|
||||
</P>
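<P>The "(.*)\1" example can be checked with the sketch below. It assumes <CODE>std::regex</CODE> as a stand-in, where back references are always enabled in the default grammar rather than controlled by a bk_refs flag; <CODE>is_doubled</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `text` consists of some string
// immediately followed by a copy of itself.
inline bool is_doubled(const std::string& text) {
    static const std::regex doubled("(.*)\\1");
    return std::regex_match(text, doubled);
}
```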
|
||||
<H3>Characters by code
|
||||
</H3>
|
||||
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose code is 103, "(\010)3" represents the
character 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; notice the latter example is a Unicode
character.</P>
|
||||
<H3>Word operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library.
|
||||
</P>
|
||||
<P>"\w" matches any single character that is a member of the "word" character
|
||||
class; this is identical to the expression "[[:word:]]".
|
||||
</P>
|
||||
<P>"\W" matches any single character that is not a member of the "word" character
|
||||
class; this is identical to the expression "[^[:word:]]".
|
||||
</P>
|
||||
<P>"\<" matches the null string at the start of a word.
|
||||
</P>
|
||||
<P>"\>" matches the null string at the end of the word.
|
||||
</P>
|
||||
<P>"\b" matches the null string at either the start or the end of a word.
|
||||
</P>
|
||||
<P>"\B" matches a null string within a word.
|
||||
</P>
|
||||
<P>The start of the sequence passed to the matching algorithms is considered to be
|
||||
a potential start of a word unless the flag match_not_bow is set. The end of
|
||||
the sequence passed to the matching algorithms is considered to be a potential
|
||||
end of a word unless the flag match_not_eow is set.
|
||||
</P>
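<P>The effect of "\b" and "\B" is easiest to see by searching for a word inside a longer one. The sketch assumes <CODE>std::regex</CODE> as a stand-in, which is assumed not to provide the GNU-style "\&lt;" and "\&gt;" operators, so only "\b" and "\B" are shown; <CODE>word_search</CODE> is a hypothetical helper.</P>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches somewhere in `text`.
inline bool word_search(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```

<P>Here "\bcat\b" finds "cat" in "a cat sat" but not in "concatenate", while "\Bcat\B" does the opposite.</P>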
|
||||
<H3>Buffer operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library, and Perl regular expressions:
|
||||
</P>
|
||||
<P>"\`" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\A" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\'" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\z" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
|
||||
followed by the end of the buffer.
|
||||
</P>
|
||||
<P>A buffer is considered to consist of the whole sequence passed to the matching
|
||||
algorithms, unless the flags match_not_bob or match_not_eob are set.
|
||||
</P>
|
||||
<H3>Escape operator
|
||||
</H3>
|
||||
<P>The escape character "\" has several meanings.
|
||||
</P>
|
||||
<P>Inside a set declaration the escape character is a normal character unless the
|
||||
flag regex_constants::escape_in_lists is set in which case whatever follows the
|
||||
escape is a literal character regardless of its normal meaning.
|
||||
</P>
|
||||
<P>The escape operator may introduce an operator, for example a back reference or
a word operator.
|
||||
</P>
|
||||
<P>The escape operator may make the following character normal, for example "\*"
|
||||
represents a literal "*" rather than the repeat operator.
|
||||
</P>
|
||||
<H4>Single character escape sequences
|
||||
</H4>
|
||||
<P>The following escape sequences are aliases for single characters:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="33%">Escape sequence
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Character code
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Meaning
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\a
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x07
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Bell character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\f
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0C
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Form feed.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\n
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0A
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Newline character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\r
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0D
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Carriage return.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\t
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x09
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Tab character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\v
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Vertical tab.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\e
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x1B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">ASCII Escape character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
|
||||
more octal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\x{XX}
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits, optionally a Unicode character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\cZ
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Z-@
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
|
||||
ASCII character greater than or equal to the character code for '@'.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
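<P>The table above can be tried out with a short sketch. It assumes the standard
library's std::regex (default ECMAScript grammar) as a stand-in for Boost.Regex,
so only escapes that both are known to share are exercised. The helper names
control_code and matches are hypothetical and exist only for this illustration.</P>

```cpp
#include <regex>
#include <string>

// Arithmetic from the \cZ row: the control character is the letter's
// code minus the code for '@' (hypothetical helper, not a library call).
constexpr char control_code(char letter) {
    return static_cast<char>(letter - '@');
}

// True if the whole of `text` matches `pattern`. std::regex is an
// assumption here; it stands in for boost::regex.
bool matches(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

<P>Under these assumptions, control_code('Z') yields 0x1A and control_code('I')
yields the tab character, while matches("\t", "\\t") and matches("A", "\\x41")
both report a match.</P>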
|
||||
<H4>Miscellaneous escape sequences:
|
||||
</H4>
|
||||
<P>The following are provided mostly for Perl compatibility, but note that there
are some differences in the meanings of \l, \L, \u and \U:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\w
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\W
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\s
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\S
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\d
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\D
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\l
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\L
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\u
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\U
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\C
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\X
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
example "a\x{0301}" (a letter a with an acute).
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\Q
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
|
||||
treated as a literal character until a \E end quote operator is found.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\E
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
|
||||
with \Q.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
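<P>As a rough check of the equivalences listed above, the following sketch compares
a shorthand escape with its bracketed character class. It assumes std::regex as a
stand-in for Boost.Regex, and therefore only covers the \d, \D and \s escapes,
which the standard ECMAScript grammar is known to share. The helper name agree
is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Returns true when the shorthand escape and the bracketed class give the
// same answer for a full match of `text`. Uses std::regex as an assumed
// stand-in for boost::regex.
bool agree(const std::string& text,
           const std::string& shorthand,
           const std::string& bracketed) {
    const std::regex a(shorthand);
    const std::regex b(bracketed);
    return std::regex_match(text, a) == std::regex_match(text, b);
}
```

<P>For instance, agree("2003", "\\d+", "[[:digit:]]+") is expected to hold, as is
agree("abc", "\\D+", "[^[:digit:]]+").</P>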
|
||||
<H3>What gets matched?
|
||||
</H3>
|
||||
<P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
<P>
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string. If more than one
string starting at a given location can match, then the longest possible
string is matched, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match, but is only useful if the user is less concerned
about what matched; for example, it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, then the match
chosen is the one with the longest first sub-expression. If that is the same
for two or more matches, then the second sub-expression will be examined, and so
on.
|
||||
</P><P>
|
||||
The following examples illustrate the main differences between Perl and
|
||||
POSIX regular expression matching rules:
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
xaby</CODE>
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"ab"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"a"</CODE></P></TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*([[:alnum:]]+).*</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
" abc def xyz "</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*(a|xayy)</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
zzxayyzz</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"zzxayy"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
|
||||
<P>These differences between Perl matching rules, and POSIX matching rules, mean
|
||||
that these two regular expression syntaxes differ not only in the features
|
||||
offered, but also in the form that the state machine takes and/or the
|
||||
algorithms used to traverse the state machine.</p>
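<P>The first row of the table can be reproduced with a small sketch. It assumes
std::regex as a stand-in for Boost.Regex, with the ECMAScript grammar playing the
role of the Perl-style depth first search and the extended grammar playing the
role of POSIX leftmost longest matching. The helper name first_match is
hypothetical, and standard library implementations may differ in how closely they
follow the leftmost longest rule for more complicated expressions.</P>

```cpp
#include <regex>
#include <string>

// Returns the first match of `pattern` within `text` under the given
// grammar, or an empty string if there is no match. std::regex is an
// assumption; it stands in for boost::regex.
std::string first_match(const std::string& pattern,
                        const std::string& text,
                        std::regex::flag_type grammar) {
    const std::regex re(pattern, grammar);
    std::smatch what;
    return std::regex_search(text, what, re) ? what.str(0) : std::string();
}
```

<P>With the expression a|ab and the text xaby, the ECMAScript grammar takes the
first alternative that succeeds, while the extended grammar prefers the longer
alternative.</P>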
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
332
doc/syntax_option_type.html
Normal file
@ -0,0 +1,332 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: syntax_option_type</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">syntax_option_type</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted. For convenience,
all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<PRE>namespace std{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type syntax_option_type;
|
||||
// these flags are standardized:
|
||||
static const syntax_option_type normal;
|
||||
static const syntax_option_type icase;
|
||||
static const syntax_option_type nosubs;
|
||||
static const syntax_option_type optimize;
|
||||
static const syntax_option_type collate;
|
||||
static const syntax_option_type ECMAScript = normal;
|
||||
static const syntax_option_type JavaScript = normal;
|
||||
static const syntax_option_type JScript = normal;
|
||||
static const syntax_option_type basic;
|
||||
static const syntax_option_type extended;
|
||||
static const syntax_option_type awk;
|
||||
static const syntax_option_type grep;
|
||||
static const syntax_option_type egrep;
|
||||
static const syntax_option_type sed = basic;
|
||||
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
|
||||
} // namespace regex_constants
|
||||
} // namespace std</PRE>
|
||||
<H3>Description</H3>
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
|
||||
<P>Note that for convenience all the constants listed here are duplicated within
|
||||
the scope of class template basic_regex, so you can use any of:</P>
|
||||
<PRE>boost::regex_constants::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::regex::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::wregex::constant_name</PRE>
|
||||
<P>in an interchangeable manner.</P>
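<P>A brief sketch of how such constants are combined and passed to a regular
expression follows. It assumes std::regex, whose std::regex_constants namespace
and class-scope duplicates mirror the arrangement described here, as a stand-in
for Boost.Regex. The helper names are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Compiles `pattern` with a grammar flag and icase combined as a bitmask,
// then tests for a full match. std::regex is an assumed stand-in for
// boost::regex.
bool matches_ignoring_case(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}

// The namespace-scope constant and the class-scope duplicate name the
// same value, so either spelling can be used.
bool same_constant() {
    return std::regex_constants::icase == std::regex::icase;
}
```

<P>Here matches_ignoring_case("HELLO World", "hello world") is expected to report a
match, whereas the same expression compiled without icase would not match that
text.</P>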
|
||||
<P>
|
||||
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>Element</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Effect if set</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>normal</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine uses its
|
||||
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
|
||||
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
|
||||
(FWD.1).</P>
|
||||
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>icase</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that matching of regular expressions against a character container
|
||||
sequence shall be performed without regard to case.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>nosubs</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that when a regular expression is matched against a character
|
||||
container sequence, then no sub-expression matches are to be stored in the
|
||||
supplied match_results structure.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>optimize</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the regular expression engine should pay more attention to the
|
||||
speed with which regular expressions are matched, and less to the speed with
|
||||
which regular expression objects are constructed. Otherwise it has no
|
||||
detectable effect on the program output. This currently has no effect for
|
||||
boost.regex.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>collate</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>ECMAScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JavaScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>basic</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
|
||||
Portable Operating System Interface (POSIX), Base Definitions and Headers,
|
||||
Section 9, Regular Expressions (FWD.1).
|
||||
</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>extended</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX extended regular expressions in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and
|
||||
Headers, Section 9, Regular Expressions (FWD.1).</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>awk</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX), Shells and Utilities, Section 4, awk
|
||||
(FWD.1).</P>
|
||||
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
|
||||
character classes permitted.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>grep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX), Shells and Utilities, Section 4,
|
||||
Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX basic syntax, but with the newline character
|
||||
acting as an alternation character in addition to "|".</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>egrep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep when given the -E option in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX), Shells and
|
||||
Utilities, Section 4, Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX extended syntax, but with the newline
|
||||
character acting as an alternation character in addition to "|".</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>sed</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as basic.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>perl</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<P>The following constants are specific to this particular regular expression
|
||||
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>:</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
|
||||
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
|
||||
characters, for example [\]] represents the set of characters containing only
|
||||
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::char_classes</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
|
||||
are allowed inside character set declarations, for example "[[:word:]]"
|
||||
represents the set of all characters that belong to the character class "word".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::intervals</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
|
||||
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
|
||||
a's.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
|
||||
ordinary characters in all situations.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
|
||||
has the same effect as the alternation operator "|".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
|
||||
more repetition operator and "\?" represents the zero or one repetition
|
||||
operator. When this bit is not set then "+" and "?" are used instead.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
|
||||
bounded repetitions and "{" and "}" are normal characters. This is the opposite
|
||||
of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then back references are
|
||||
allowed.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
|
||||
alternation operator and "|" is an ordinary character. This is the opposite of
|
||||
default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::use_except</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
|
||||
exception will be thrown on error. Use of this flag is deprecated -
|
||||
basic_regex will always throw on error.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::failbit</TD>
|
||||
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see if a regular expression is
valid before usage.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::literal</TD>
|
||||
<TD vAlign="top" width="45%">All characters in the string are treated as literals,
|
||||
there are no special characters or escape sequences.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
|
||||
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
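<P>The backslash-style operators described by bk_parens and bk_braces are the
conventions of the POSIX basic grammar, so their effect can be sketched by
comparing the basic and extended grammars. The sketch assumes std::regex as a
stand-in for Boost.Regex, since the Boost-specific regbase flags themselves are
not part of the standard library. The helper names are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Full match under the POSIX basic grammar, where "\(" "\)" group and
// "\{" "\}" delimit bounded repeats. std::regex is an assumed stand-in.
bool matches_basic(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex::basic);
    return std::regex_match(text, re);
}

// Full match under the POSIX extended grammar, where the unescaped
// "(" ")" and "{" "}" are the operators instead.
bool matches_extended(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(text, re);
}
```

<P>Both matches_basic("abab", "\\(ab\\)\\{2\\}") and
matches_extended("abab", "(ab){2}") are expected to report a match: the two
expressions describe the same repeat, written in the two operator styles.</P>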
|
||||
<HR>
|
||||
<P>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
68
doc/thread_safety.html
Normal file
@ -0,0 +1,68 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Thread Safety</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Thread Safety</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Class <A href="basic_regex.html">basic_regex</A>&lt;&gt; and its typedefs regex
|
||||
and wregex are thread safe, in that compiled regular expressions can safely be
|
||||
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
|
||||
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
|
||||
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
|
||||
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
is now thread safe, in that the results of a match can be safely copied from
one thread to another (for example one thread may find matches and push
match_results instances onto a queue, while another thread pops them off the
other end). Otherwise, use a separate instance of <A href="match_results.html">match_results</A>
per thread.
|
||||
</P>
|
||||
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
thread safe; regular expressions compiled with <I>regcomp</I> can also be
shared between threads.
|
||||
</P>
|
||||
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
|
||||
gets its own RegEx instance (apartment threading) - this is a consequence of
|
||||
RegEx handling both compiling and matching regular expressions.
|
||||
</P>
|
||||
<P>Finally, note that changing the global locale invalidates all compiled regular
expressions; therefore, calling <I>set_locale</I> from one thread while another
uses regular expressions <I>will</I> produce unpredictable results.
|
||||
</P>
|
||||
<P>
|
||||
There is also a requirement that there is only one thread executing prior to
|
||||
the start of main().</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
BIN
doc/uarrow.gif
Normal file
Binary file not shown.
After Width: | Height: | Size: 1.6 KiB |
705
doc/vc71-performance.html
Normal file
@ -0,0 +1,705 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Regular Expression Performance Comparison (Visual Studio.NET 2003)</title>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
|
||||
<META content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot" name="Template">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
</head>
|
||||
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
|
||||
<h2>Regular Expression Performance Comparison</h2>
|
||||
<p>The following tables provide comparisons between the following regular
|
||||
expression libraries:</p>
|
||||
<p><a href="http://research.microsoft.com/projects/greta"> GRETA</a>.</p>
|
||||
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
|
||||
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
|
||||
- this is provided for comparison as a typical non-backtracking implementation.</p>
|
||||
<p>Philip Hazel's <a href="http://www.pcre.org">PCRE</a> library.</p>
|
||||
<h3>Details</h3>
|
||||
<p>Machine: Intel Pentium 4 2.8GHz PC.</p>
|
||||
<p>Compiler: Microsoft Visual C++ version 7.1.</p>
|
||||
<p>C++ Standard Library: Dinkumware standard library version 313.</p>
|
||||
<p>OS: Win32.</p>
|
||||
<p>Boost version: 1.31.0.</p>
|
||||
<p>PCRE version: 3.9.</p>
|
||||
<p>As ever, care should be taken in interpreting the results. Only sensible regular
expressions (rather than pathological cases) are given; most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms. In each case the first figure given is the relative
time taken (so a value of 1.0 is as good as it gets), while the second figure
is the actual time taken.</p>
|
||||
<h3>Averages</h3>
|
||||
<p>The following are the average relative scores for all the tests: the perfect
|
||||
regular expression library would score 1, in practice anything less than 2
|
||||
is pretty good.</p>
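<p>An average relative score of this kind could be computed along the following lines. This is only a sketch: the text does not say how "NA" cells were treated, so skipping them is an assumption, and the names <code>NA</code> and <code>average_relative_score</code> are hypothetical.</p>

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Marker for an "NA" table cell (no result for that library and expression).
const double NA = std::numeric_limits<double>::quiet_NaN();

// Arithmetic mean of the available relative scores for one library.
// NA cells are skipped (an assumption); returns NA if nothing is available.
double average_relative_score(const std::vector<double>& scores)
{
    double sum = 0.0;
    int count = 0;
    for (double s : scores) {
        if (std::isnan(s))
            continue;  // no measurement for this expression
        sum += s;
        ++count;
    }
    return count == 0 ? NA : sum / count;
}
```

<p>A library that scores 1 on every test averages exactly 1; a single very slow test can raise the average considerably, so the per-test tables remain worth reading.</p>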
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>6.90669</td>
|
||||
<td>23.751</td>
|
||||
<td>1.62553</td>
|
||||
<td>1.38213</td>
|
||||
<td>110.973</td>
|
||||
<td>1.69371</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 1: Long Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within a long English language text was measured
|
||||
(<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a>
|
||||
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19MB). </p>
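<p>A "find all occurrences" measurement can be sketched as below. The sketch uses the C++ standard library's <code>std::regex</code> as a stand-in so that it is self-contained; it is not the harness that produced these figures, and <code>count_matches</code> and <code>time_search</code> are hypothetical helper names.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count every non-overlapping occurrence of `re` in `text`.
std::size_t count_matches(const std::string& text, const std::regex& re)
{
    return static_cast<std::size_t>(
        std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                      std::sregex_iterator()));
}

// Wall-clock seconds taken by one complete search of `text`.
double time_search(const std::string& text, const std::regex& re)
{
    const auto start = std::chrono::steady_clock::now();
    volatile std::size_t found = count_matches(text, re);  // keep the search from being optimised away
    (void)found;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

<p>The expression is compiled once outside the timed region, so only the search itself is measured. For short texts a single run is too brief to time reliably and should be repeated many times.</p>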
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Twain</code></td>
|
||||
<td>19.7<br>
|
||||
(0.541s)</td>
|
||||
<td>85.5<br>
|
||||
(2.35s)</td>
|
||||
<td>3.09<br>
|
||||
(0.0851s)</td>
|
||||
<td>3.09<br>
|
||||
(0.0851s)</td>
|
||||
<td>131<br>
|
||||
(3.6s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.0275s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Huck[[:alpha:]]+</code></td>
|
||||
<td>11<br>
|
||||
(0.55s)</td>
|
||||
<td>93.4<br>
|
||||
(4.68s)</td>
|
||||
<td>3.4<br>
|
||||
(0.17s)</td>
|
||||
<td>3.35<br>
|
||||
(0.168s)</td>
|
||||
<td>124<br>
|
||||
(6.19s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.0501s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>[[:alpha:]]+ing</code></td>
|
||||
<td>11.3<br>
|
||||
(6.82s)</td>
|
||||
<td>21.3<br>
|
||||
(12.8s)</td>
|
||||
<td>1.83<br>
|
||||
(1.1s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.601s)</font></td>
|
||||
<td>6.47<br>
|
||||
(3.89s)</td>
|
||||
<td>4.75<br>
|
||||
(2.85s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[^ ]*?Twain</code></td>
|
||||
<td>5.75<br>
|
||||
(1.15s)</td>
|
||||
<td>17.1<br>
|
||||
(3.43s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.2s)</font></td>
|
||||
<td>1.3<br>
|
||||
(0.26s)</td>
|
||||
<td>NA</td>
|
||||
<td>3.8<br>
|
||||
(0.761s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
|
||||
<td>28.5<br>
|
||||
(3.1s)</td>
|
||||
<td>77.2<br>
|
||||
(8.4s)</td>
|
||||
<td>2.3<br>
|
||||
(0.251s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.109s)</font></td>
|
||||
<td>191<br>
|
||||
(20.8s)</td>
|
||||
<td>1.77<br>
|
||||
(0.193s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
|
||||
<td>16.2<br>
|
||||
(4.14s)</td>
|
||||
<td>49<br>
|
||||
(12.5s)</td>
|
||||
<td>1.65<br>
|
||||
(0.42s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.255s)</font></td>
|
||||
<td>NA</td>
|
||||
<td>2.43<br>
|
||||
(0.62s)</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 2: Medium Sized Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within a medium sized English language text was
|
||||
measured (the first 50K from mtent12.txt). </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Twain</code></td>
|
||||
<td>9.49<br>
|
||||
(0.00274s)</td>
|
||||
<td>40.7<br>
|
||||
(0.0117s)</td>
|
||||
<td>1.54<br>
|
||||
(0.000445s)</td>
|
||||
<td>1.56<br>
|
||||
(0.00045s)</td>
|
||||
<td>13.5<br>
|
||||
(0.00391s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000289s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Huck[[:alpha:]]+</code></td>
|
||||
<td>14.3<br>
|
||||
(0.0027s)</td>
|
||||
<td>62.3<br>
|
||||
(0.0117s)</td>
|
||||
<td>2.26<br>
|
||||
(0.000425s)</td>
|
||||
<td>2.29<br>
|
||||
(0.000431s)</td>
|
||||
<td>1.27<br>
|
||||
(0.000239s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000188s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>[[:alpha:]]+ing</code></td>
|
||||
<td>7.34<br>
|
||||
(0.0178s)</td>
|
||||
<td>13.7<br>
|
||||
(0.0331s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.00243s)</font></td>
|
||||
<td><font color="#008000">1.02<br>
|
||||
(0.00246s)</font></td>
|
||||
<td>7.36<br>
|
||||
(0.0178s)</td>
|
||||
<td>5.87<br>
|
||||
(0.0142s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[^ ]*?Twain</code></td>
|
||||
<td>8.34<br>
|
||||
(0.00579s)</td>
|
||||
<td>24.8<br>
|
||||
(0.0172s)</td>
|
||||
<td>1.52<br>
|
||||
(0.00105s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000694s)</font></td>
|
||||
<td>NA</td>
|
||||
<td>2.81<br>
|
||||
(0.00195s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
|
||||
<td>12.9<br>
|
||||
(0.00781s)</td>
|
||||
<td>35.1<br>
|
||||
(0.0213s)</td>
|
||||
<td>1.67<br>
|
||||
(0.00102s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000606s)</font></td>
|
||||
<td>81.5<br>
|
||||
(0.0494s)</td>
|
||||
<td>1.94<br>
|
||||
(0.00117s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
|
||||
<td>15.6<br>
|
||||
(0.0106s)</td>
|
||||
<td>46.6<br>
|
||||
(0.0319s)</td>
|
||||
<td>2.72<br>
|
||||
(0.00186s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000684s)</font></td>
|
||||
<td>311<br>
|
||||
(0.213s)</td>
|
||||
<td>1.72<br>
|
||||
(0.00117s)</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 3: C++ Code Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within the C++ source file <a href="../../../boost/crc.hpp">
|
||||
boost/crc.hpp</a> was measured. </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([
|
||||
]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{)</code></td>
|
||||
<td>8.88<br>
|
||||
(0.000792s)</td>
|
||||
<td>46.4<br>
|
||||
(0.00414s)</td>
|
||||
<td>1.19<br>
|
||||
(0.000106s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(8.92e-005s)</font></td>
|
||||
<td>688<br>
|
||||
(0.0614s)</td>
|
||||
<td>3.23<br>
|
||||
(0.000288s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>(^[
|
||||
]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\></code></td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.00571s)</font></td>
|
||||
<td>5.31<br>
|
||||
(0.0303s)</td>
|
||||
<td>2.47<br>
|
||||
(0.0141s)</td>
|
||||
<td>1.92<br>
|
||||
(0.011s)</td>
|
||||
<td>NA</td>
|
||||
<td>3.29<br>
|
||||
(0.0188s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>)</code></td>
|
||||
<td>5.78<br>
|
||||
(0.00172s)</td>
|
||||
<td>26.3<br>
|
||||
(0.00783s)</td>
|
||||
<td>1.12<br>
|
||||
(0.000333s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000298s)</font></td>
|
||||
<td>128<br>
|
||||
(0.0382s)</td>
|
||||
<td>1.74<br>
|
||||
(0.000518s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>)</code></td>
|
||||
<td>10.2<br>
|
||||
(0.00305s)</td>
|
||||
<td>28.4<br>
|
||||
(0.00845s)</td>
|
||||
<td>1.12<br>
|
||||
(0.000333s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000298s)</font></td>
|
||||
<td>155<br>
|
||||
(0.0463s)</td>
|
||||
<td>1.74<br>
|
||||
(0.000519s)</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<h3>Comparison 4: HTML Document Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within the html file <a href="../../libraries.htm">libs/libraries.htm</a>
|
||||
was measured. </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>beman|john|dave</code></td>
|
||||
<td>11<br>
|
||||
(0.00297s)</td>
|
||||
<td>34.3<br>
|
||||
(0.00922s)</td>
|
||||
<td>1.78<br>
|
||||
(0.000479s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000269s)</font></td>
|
||||
<td>55.2<br>
|
||||
(0.0149s)</td>
|
||||
<td>1.85<br>
|
||||
(0.000499s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code><p>.*?</p></code></td>
|
||||
<td>5.38<br>
|
||||
(0.00145s)</td>
|
||||
<td>21.8<br>
|
||||
(0.00587s)</td>
|
||||
<td><font color="#008000">1.02<br>
|
||||
(0.000274s)</font></td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000269s)</font></td>
|
||||
<td>NA</td>
|
||||
<td><font color="#008000">1.05<br>
|
||||
(0.000283s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> <a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*></code></td>
|
||||
<td>4.51<br>
|
||||
(0.00207s)</td>
|
||||
<td>12.6<br>
|
||||
(0.00579s)</td>
|
||||
<td>1.34<br>
|
||||
(0.000616s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000459s)</font></td>
|
||||
<td>343<br>
|
||||
(0.158s)</td>
|
||||
<td><font color="#008000">1.09<br>
|
||||
(0.000499s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> <h[12345678][^>]*>.*?</h[12345678]></code></td>
|
||||
<td>7.39<br>
|
||||
(0.00143s)</td>
|
||||
<td>29.6<br>
|
||||
(0.00571s)</td>
|
||||
<td>1.87<br>
|
||||
(0.000362s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000193s)</font></td>
|
||||
<td>NA</td>
|
||||
<td>1.27<br>
|
||||
(0.000245s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> <img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*></code></td>
|
||||
<td>6.73<br>
|
||||
(0.00145s)</td>
|
||||
<td>27.3<br>
|
||||
(0.00587s)</td>
|
||||
<td>1.2<br>
|
||||
(0.000259s)</td>
|
||||
<td>1.32<br>
|
||||
(0.000283s)</td>
|
||||
<td>148<br>
|
||||
(0.0319s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000215s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> <font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font></code></td>
|
||||
<td>6.93<br>
|
||||
(0.00153s)</td>
|
||||
<td>27<br>
|
||||
(0.00595s)</td>
|
||||
<td>1.22<br>
|
||||
(0.000269s)</td>
|
||||
<td>1.31<br>
|
||||
(0.000289s)</td>
|
||||
<td>NA</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.00022s)</font></td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 5: Simple Matches</h3>
|
||||
<p>For each of the following regular expressions the time taken to match against
|
||||
the text indicated was measured. </p>
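<p>Individual matches of this kind take well under a microsecond, so a measurement has to repeat the match many times and report the mean. The following is a minimal sketch of that idea using <code>std::regex</code> as a stand-in; it is not the original test program, and <code>matches</code> and <code>seconds_per_match</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>

// True if the whole of `text` matches `re`.
bool matches(const std::string& text, const std::regex& re)
{
    return std::regex_match(text, re);
}

// Mean wall-clock seconds per match attempt, averaged over `repeats` runs.
double seconds_per_match(const std::string& text, const std::regex& re, int repeats)
{
    assert(repeats > 0);
    int hits = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i)
        hits += matches(text, re) ? 1 : 0;
    const auto stop = std::chrono::steady_clock::now();
    volatile int sink = hits;  // use the result so the loop is not optimised away
    (void)sink;
    return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

<p>With the postcode expression from the table, <code>matches</code> accepts "EH10 2QQ", "G1 1AA" and "SW1 1ZZ" and rejects a string such as "12345".</p>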
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>Text</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>abc</code></td>
|
||||
<td>abc</td>
|
||||
<td>1.31<br>
|
||||
(2.2e-007s)</td>
|
||||
<td>1.94<br>
|
||||
(3.25e-007s)</td>
|
||||
<td>1.26<br>
|
||||
(2.1e-007s)</td>
|
||||
<td>1.24<br>
|
||||
(2.08e-007s)</td>
|
||||
<td>3.03<br>
|
||||
(5.06e-007s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.67e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^([0-9]+)(\-| |$)(.*)$</code></td>
|
||||
<td>100- this is a line of ftp response which contains a message string</td>
|
||||
<td>1.52<br>
|
||||
(6.88e-007s)</td>
|
||||
<td>2.28<br>
|
||||
(1.03e-006s)</td>
|
||||
<td>1.5<br>
|
||||
(6.78e-007s)</td>
|
||||
<td>1.5<br>
|
||||
(6.78e-007s)</td>
|
||||
<td>329<br>
|
||||
(0.000149s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(4.53e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td>
|
||||
<td>1234-5678-1234-456</td>
|
||||
<td>2.04<br>
|
||||
(1.03e-006s)</td>
|
||||
<td>2.83<br>
|
||||
(1.43e-006s)</td>
|
||||
<td>2.12<br>
|
||||
(1.07e-006s)</td>
|
||||
<td>2.04<br>
|
||||
(1.03e-006s)</td>
|
||||
<td>30.8<br>
|
||||
(1.56e-005s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(5.05e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
|
||||
<td>john_maddock@compuserve.com</td>
|
||||
<td>1.48<br>
|
||||
(1.78e-006s)</td>
|
||||
<td>2.1<br>
|
||||
(2.52e-006s)</td>
|
||||
<td>1.35<br>
|
||||
(1.62e-006s)</td>
|
||||
<td>1.32<br>
|
||||
(1.59e-006s)</td>
|
||||
<td>165<br>
|
||||
(0.000198s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.2e-006s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
|
||||
<td>foo12@foo.edu</td>
|
||||
<td>1.28<br>
|
||||
(1.41e-006s)</td>
|
||||
<td>1.9<br>
|
||||
(2.1e-006s)</td>
|
||||
<td>1.42<br>
|
||||
(1.57e-006s)</td>
|
||||
<td>1.38<br>
|
||||
(1.53e-006s)</td>
|
||||
<td>107<br>
|
||||
(0.000119s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.11e-006s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
|
||||
<td>bob.smith@foo.tv</td>
|
||||
<td>1.29<br>
|
||||
(1.43e-006s)</td>
|
||||
<td>1.9<br>
|
||||
(2.1e-006s)</td>
|
||||
<td>1.42<br>
|
||||
(1.57e-006s)</td>
|
||||
<td>1.38<br>
|
||||
(1.53e-006s)</td>
|
||||
<td>119<br>
|
||||
(0.000132s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.11e-006s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
|
||||
<td>EH10 2QQ</td>
|
||||
<td>1.26<br>
|
||||
(4.63e-007s)</td>
|
||||
<td>1.77<br>
|
||||
(6.49e-007s)</td>
|
||||
<td>1.3<br>
|
||||
(4.77e-007s)</td>
|
||||
<td>1.2<br>
|
||||
(4.4e-007s)</td>
|
||||
<td>9.15<br>
|
||||
(3.36e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.68e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
|
||||
<td>G1 1AA</td>
|
||||
<td><font color="#008000">1.06<br>
|
||||
(4.73e-007s)</font></td>
|
||||
<td>1.59<br>
|
||||
(7.07e-007s)</td>
|
||||
<td><font color="#008000">1.05<br>
|
||||
(4.68e-007s)</font></td>
|
||||
<td><font color="#008000">1<br>
|
||||
(4.44e-007s)</font></td>
|
||||
<td>12.9<br>
|
||||
(5.73e-006s)</td>
|
||||
<td>1.63<br>
|
||||
(7.26e-007s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
|
||||
<td>SW1 1ZZ</td>
|
||||
<td>1.26<br>
|
||||
(9.17e-007s)</td>
|
||||
<td>1.84<br>
|
||||
(1.34e-006s)</td>
|
||||
<td>1.28<br>
|
||||
(9.26e-007s)</td>
|
||||
<td>1.21<br>
|
||||
(8.78e-007s)</td>
|
||||
<td>8.42<br>
|
||||
(6.11e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(7.26e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
|
||||
<td>4/1/2001</td>
|
||||
<td>1.57<br>
|
||||
(9.73e-007s)</td>
|
||||
<td>2.28<br>
|
||||
(1.41e-006s)</td>
|
||||
<td>1.25<br>
|
||||
(7.73e-007s)</td>
|
||||
<td>1.26<br>
|
||||
(7.83e-007s)</td>
|
||||
<td>11.2<br>
|
||||
(6.95e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(6.21e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
|
||||
<td>12/12/2001</td>
|
||||
<td>1.52<br>
|
||||
(9.56e-007s)</td>
|
||||
<td>2.06<br>
|
||||
(1.3e-006s)</td>
|
||||
<td>1.29<br>
|
||||
(8.12e-007s)</td>
|
||||
<td>1.24<br>
|
||||
(7.83e-007s)</td>
|
||||
<td>12.4<br>
|
||||
(7.8e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(6.3e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
|
||||
<td>123</td>
|
||||
<td>2.11<br>
|
||||
(7.35e-007s)</td>
|
||||
<td>3.18<br>
|
||||
(1.11e-006s)</td>
|
||||
<td>2.5<br>
|
||||
(8.7e-007s)</td>
|
||||
<td>2.44<br>
|
||||
(8.5e-007s)</td>
|
||||
<td>5.26<br>
|
||||
(1.83e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.49e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
|
||||
<td>+3.14159</td>
|
||||
<td>1.31<br>
|
||||
(4.96e-007s)</td>
|
||||
<td>1.92<br>
|
||||
(7.26e-007s)</td>
|
||||
<td>1.26<br>
|
||||
(4.77e-007s)</td>
|
||||
<td>1.2<br>
|
||||
(4.53e-007s)</td>
|
||||
<td>9.71<br>
|
||||
(3.66e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.77e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
|
||||
<td>-3.14159</td>
|
||||
<td>1.32<br>
|
||||
(4.97e-007s)</td>
|
||||
<td>1.92<br>
|
||||
(7.26e-007s)</td>
|
||||
<td>1.24<br>
|
||||
(4.67e-007s)</td>
|
||||
<td>1.2<br>
|
||||
(4.53e-007s)</td>
|
||||
<td>9.7<br>
|
||||
(3.66e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.78e-007s)</font></td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<hr>
|
||||
<p>Copyright John Maddock April 2003, all rights reserved.</p>
|
||||
</body>
|
||||
</html>
|