mirror of
https://github.com/boostorg/regex.git
synced 2025-07-12 20:06:38 +02:00
Merged regex-4 branch.
[SVN r18431]
1304
appendix.htm
79
doc/Attic/standards.html
Normal file
@ -0,0 +1,79 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Standards Conformance</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Standards Conformance</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>C++</H3>
|
||||
<P>Boost.Regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
regular expression standardization proposal</A>, which will appear in a
future C++ standard technical report (and hopefully in a future version of the
standard). Currently there are some differences in how the regular
expression traits classes are defined; these will be fixed in a future release.</P>
|
||||
<H3>ECMAScript / JavaScript</H3>
|
||||
<P>All of the ECMAScript regular expression syntax features are supported, except
|
||||
that:</P>
|
||||
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
|
||||
definitions ( [...] ).</P>
|
||||
<P>The escape sequence \u matches any upper case character (the same as
|
||||
[[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for
|
||||
Unicode escape sequences.</P>
|
||||
<H3>Perl</H3>
|
||||
<P>Almost all Perl features are supported, except for:</P>
|
||||
<P>\N{name} Use [[:name:]] instead.</P>
|
||||
<P>\pP and \PP</P>
|
||||
<P>(?imsx-imsx)</P>
|
||||
<P>(?<=pattern)</P>
|
||||
<P>(?<!pattern)</P>
|
||||
<P>(?{code})</P>
|
||||
<P>(??{code})</P>
|
||||
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
|
||||
<P>These limitations will be removed in due course, with priority depending
mainly upon user demand.</P>
|
||||
<H3>POSIX</H3>
|
||||
<P>All the POSIX basic and extended regular expression features are supported,
|
||||
except that:</P>
|
||||
<P>No character collating names are recognized except those specified in the POSIX
|
||||
standard for the C locale, unless they are explicitly registered with the
|
||||
traits class.</P>
|
||||
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
|
||||
Win32. Implementing this feature requires knowledge of the format of the
|
||||
string sort keys produced by the system; if you need this, and the default
|
||||
implementation doesn't work on your platform, then you will need to supply a
|
||||
custom traits class.</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
426
doc/Attic/sub_match.html
Normal file
@ -0,0 +1,426 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: sub_match</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">sub_match</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>#include <<A href="../../boost/regex.hpp">boost/regex.hpp</A>>
|
||||
</P>
|
||||
<P>Regular expressions are different from many simple pattern-matching algorithms
in that as well as finding an overall match they can also produce
sub-expression matches: each sub-expression being delimited in the pattern by a
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.htm">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
|
||||
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
|
||||
<P>When the marked sub-expression denoted by an object of type <CODE>sub_match<></CODE>
participated in a regular expression match then member <CODE>matched</CODE> evaluates
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
range of characters <CODE>[first,second)</CODE> which formed that match.
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
contain undefined values.</P>
|
||||
<P>If an object of type <CODE>sub_match<></CODE> represents sub-expression 0
|
||||
- that is to say the whole match - then member <CODE>matched</CODE> is always
|
||||
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
|
||||
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
|
||||
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
|
||||
character range that formed the partial match.</P>
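<P>The following is a minimal sketch of how <CODE>matched</CODE>, <CODE>first</CODE>
and <CODE>second</CODE> might be used. It is written against the standard library's
&lt;regex&gt; header, whose <CODE>sub_match</CODE> grew out of the same standardization
proposal, so that it compiles without Boost; the assumption is that replacing
<CODE>std::</CODE> with <CODE>boost::</CODE> and including boost/regex.hpp gives the same
behaviour. The helper name <CODE>group_or</CODE> is hypothetical and not part of either library.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns the text of capture group `n`, or `fallback`
// when that group did not participate in the match (matched == false).
std::string group_or(const std::smatch& m, std::size_t n,
                     const std::string& fallback)
{
    // Subscripting a match_results object yields a sub_match.
    const std::ssub_match& sub = m[n];
    if (!sub.matched)
        return fallback;          // first/second must not be relied upon here
    // [first, second) is the range of characters that formed the match.
    return std::string(sub.first, sub.second);
}
```

<P>For the expression <CODE>a(b)?c</CODE> matched against "ac", group 1 takes no part
in the match, so the fallback is returned for it, while group 0 yields the whole match.</P>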
|
||||
<PRE>
|
||||
namespace boost{
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
|
||||
{
|
||||
public:
|
||||
typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
|
||||
typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
|
||||
typedef BidirectionalIterator iterator;
|
||||
|
||||
bool matched;
|
||||
|
||||
difference_type length()const;
|
||||
operator basic_string<value_type>()const;
|
||||
basic_string<value_type> str()const;
|
||||
|
||||
int compare(const sub_match& s)const;
|
||||
int compare(const basic_string<value_type>& s)const;
|
||||
int compare(const value_type* s)const;
|
||||
};
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
|
||||
template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);
|
||||
|
||||
} // namespace boost</PRE>
|
||||
<H3>Description</H3>
|
||||
<H4>
|
||||
sub_match members</H4>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::value_type value_type;</PRE>
|
||||
<P>The type pointed to by the iterators.</P>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::difference_type difference_type;</PRE>
|
||||
<P>A type that represents the difference between two iterators.</P>
|
||||
<PRE>typedef iterator iterator_type;</PRE>
|
||||
<P>The iterator type.</P>
|
||||
<PRE>iterator first</PRE>
|
||||
<P>An iterator denoting the position of the start of the match.</P>
|
||||
<PRE>iterator second</PRE>
|
||||
<P>An iterator denoting the position of the end of the match.</P>
|
||||
<PRE>bool matched</PRE>
|
||||
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
|
||||
<PRE>difference_type length()const;</PRE>
<P> <B>
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string<value_type>()const;</PRE>
|
||||
|
||||
<P> <B>
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
second) : basic_string<value_type>())</CODE>.</P><PRE>basic_string<value_type> str()const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
|
||||
second) : basic_string<value_type>())</CODE>.</P><PRE>int compare(const sub_match& s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string<value_type>& s)const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
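<P>As an illustration of the members described above, the sketch below collects the
results of <CODE>length()</CODE>, <CODE>str()</CODE> and <CODE>compare()</CODE> for one
sub-expression. It uses the standard library's <CODE>std::ssub_match</CODE> so that it
is self-contained; the corresponding Boost.Regex type is assumed to behave the same way.
<CODE>SubInfo</CODE> and <CODE>describe</CODE> are hypothetical names introduced only
for this example.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record of what the sub_match members report.
struct SubInfo {
    std::ptrdiff_t length;  // result of length()
    std::string    text;    // result of str()
    int            order;   // sign of compare(other): -1, 0 or +1
};

SubInfo describe(const std::ssub_match& sub, const std::string& other)
{
    SubInfo info;
    info.length = sub.length();      // 0 when the sub-expression did not match
    info.text   = sub.str();         // empty when the sub-expression did not match
    const int c = sub.compare(other);  // same as str().compare(other)
    info.order  = (c > 0) - (c < 0);   // normalise to -1, 0 or +1
    return info;
}
```

<P>An unmatched sub-expression therefore reports a length of zero and an empty string,
and compares as an empty string would.</P>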
|
||||
<H4>
|
||||
sub_match non-member operators</H4>
|
||||
<PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) < 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) <= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) >= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) > 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(os << m.str())</CODE>.</P>
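<P>The non-member operators allow a <CODE>sub_match</CODE> to be compared directly with
strings and written to a stream without calling <CODE>str()</CODE> first. The sketch
below is one possible use, again written against the standard library's &lt;regex&gt;
header on the assumption that Boost.Regex behaves identically; the function
<CODE>key_is</CODE> and the "key=value" input format are hypothetical.</P>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: parses "key=value", stores the value in `value_out`,
// and reports whether the key equals `expected_key`.
bool key_is(const std::string& line, const char* expected_key,
            std::string& value_out)
{
    static const std::regex re("(\\w+)=(\\w*)");
    std::smatch m;
    if (!std::regex_match(line, m, re))
        return false;

    std::ostringstream os;
    os << m[2];                    // operator<< writes m[2].str()
    value_out = os.str();

    return m[1] == expected_key;   // sub_match compared with const value_type*
}
```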
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
773
doc/Attic/syntax.html
Normal file
@ -0,0 +1,773 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Regular Expression Syntax</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Regular Expression Syntax</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>This section covers the regular expression syntax used by this library. This is
a programmer's guide; the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
|
||||
</P>
|
||||
<H3>Literals
|
||||
</H3>
|
||||
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
|
||||
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
|
||||
a "\". A literal is a character that matches itself, or matches the result of
|
||||
traits_type::translate(), where traits_type is the traits template parameter to
|
||||
class basic_regex.</P>
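<P>When arbitrary text has to be matched literally, one option is to place a "\" in
front of each of the special characters listed above. The sketch below does this; the
function name <CODE>escape_literal</CODE> is hypothetical, and the sketch assumes the
default syntax and text that will be used outside a character set ( [...] ), where
different escaping rules may apply.</P>

```cpp
#include <string>

// Hypothetical helper: prefixes each special character with a backslash so
// that the resulting expression matches `text` literally.
std::string escape_literal(const std::string& text)
{
    // The special characters listed in the paragraph above.
    static const std::string special = ".|*?+(){}[]^$\\";
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (special.find(text[i]) != std::string::npos)
            out += '\\';
        out += text[i];
    }
    return out;
}
```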
|
||||
<H3>Wildcard
|
||||
</H3>
|
||||
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
when <I>match_not_dot_newline</I> is passed to the matching algorithms, the
dot does not match a newline character.
|
||||
</P>
|
||||
<H3>Repeats
|
||||
</H3>
|
||||
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once; if the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only; if the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"ba*" will match all of "b", "ba", "baaa" etc.
|
||||
</P>
|
||||
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
|
||||
</P>
|
||||
<P>"ba?" will match "b" or "ba".
|
||||
</P>
|
||||
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
|
||||
</P>
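<P>The repeat examples above can be checked with a short sketch. It assumes that <CODE>std::regex</CODE>
(the standardized form of the proposal this library follows, with its default ECMAScript grammar)
treats these operators the same way as Boost.Regex; <CODE>matches_whole</CODE> is a hypothetical
helper, not part of either library.</P>

```cpp
#include <regex>
#include <string>

// Returns true when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex here.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<P>For instance, <CODE>matches_whole("ba{2,4}", "baaa")</CODE> is true, while
<CODE>matches_whole("ba+", "b")</CODE> is false because "+" requires at least one "a".</P>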
|
||||
<H3>Non-greedy repeats
|
||||
</H3>
|
||||
<P>Whenever the "extended" regular expression syntax is in use (the default) then
|
||||
non-greedy repeats are possible by appending a '?' after the repeat; a
|
||||
non-greedy repeat is one which will match the <I>shortest</I> possible string.
|
||||
</P>
|
||||
<P>For example to match html tag pairs one could use something like:
|
||||
</P>
|
||||
<P>"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
|
||||
</P>
|
||||
<P>In this case $1 will contain the text between the tag pairs, and will be the
|
||||
shortest possible matching string.
|
||||
</P>
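<P>The difference between a greedy and a non-greedy repeat can be sketched as follows. The example
uses a simplified tag pattern and assumes <CODE>std::regex</CODE> as a stand-in for Boost.Regex;
<CODE>first_capture</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns what the first marked sub-expression of `pattern` matched in
// `text`, or an empty string when there is no match.
// Assumption: std::regex stands in for boost::regex here.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern)))
        return what[1].str();
    return std::string();
}
```

<P>With the input "&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;", the non-greedy "(.*?)" stops at
the first closing tag, whereas the greedy "(.*)" runs on to the last one.</P>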
|
||||
<H3>Parentheses
|
||||
</H3>
|
||||
<P>Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match. On exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match - for example if it is
part of an alternative that is not taken - then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
member for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
|
||||
</P>
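<P>A minimal sketch of inspecting marked sub-expressions, assuming <CODE>std::regex</CODE> and
<CODE>std::smatch</CODE> behave like their Boost.Regex counterparts here. <CODE>GroupInfo</CODE> and
<CODE>group_info</CODE> are hypothetical names introduced for illustration.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// What one sub-expression matched: its text, its offset in the input,
// and whether it took part in the match at all.
struct GroupInfo
{
    std::string text;
    std::ptrdiff_t offset;
    bool matched;
};

// Matches `pattern` against the whole of `text` and reports on
// sub-expression number `index` (0 is the whole expression).
// Assumption: std::regex stands in for boost::regex here.
GroupInfo group_info(const std::string& pattern, const std::string& text,
                     std::size_t index)
{
    std::smatch what;
    GroupInfo info = { std::string(), -1, false };
    if (std::regex_match(text, what, std::regex(pattern)) && what[index].matched)
    {
        info.text = what[index].str();
        info.offset = what.position(index);
        info.matched = true;
    }
    return info;
}
```

<P>For "(ab)*" against "ababab", sub-expression 1 reports the final "ab" at offset 4. For "(a)|(b)"
against "b", sub-expression 1 is part of the alternative that is not taken, so it is reported as unmatched.</P>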
|
||||
<H3>Non-Marking Parentheses
|
||||
</H3>
|
||||
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression. In this case a
non-marking parenthesis (?:expression) can be used. For example the following
expression creates no sub-expressions:
|
||||
</P>
|
||||
<P>"(?:abc)*"</P>
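<P>One way to confirm that a non-marking group creates no sub-expression is to ask the compiled
expression how many marked groups it holds. The sketch below assumes <CODE>std::regex</CODE> as a
stand-in for Boost.Regex; <CODE>marked_groups</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Number of marked sub-expressions in `pattern`.
// Assumption: std::regex stands in for boost::regex here.
unsigned marked_groups(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}
```

<P>Here <CODE>marked_groups("(?:abc)*")</CODE> yields 0, whereas <CODE>marked_groups("(abc)*")</CODE> yields 1.</P>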
|
||||
<H3>Forward Lookahead Asserts
|
||||
</H3>
|
||||
<P>There are two forms of these; one for positive forward lookahead asserts, and
|
||||
one for negative lookahead asserts:</P>
|
||||
<P>"(?=abc)" matches zero characters only if they are followed by the expression
|
||||
"abc".</P>
|
||||
<P>"(?!abc)" matches zero characters only if they are not followed by the
|
||||
expression "abc".</P>
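<P>The two lookahead forms can be sketched as below, assuming <CODE>std::regex</CODE> handles them
the same way as Boost.Regex. <CODE>find_first</CODE> is a hypothetical helper, and the patterns
"foo(?=bar)" and "foo(?!bar)" are illustrative only.</P>

```cpp
#include <regex>
#include <string>

// Searches `text` for `pattern`; on success stores the matched text in
// `out` and returns true.
// Assumption: std::regex stands in for boost::regex here.
bool find_first(const std::string& pattern, const std::string& text,
                std::string& out)
{
    std::smatch what;
    if (!std::regex_search(text, what, std::regex(pattern)))
        return false;
    out = what.str();
    return true;
}
```

<P>Note that the assertion itself consumes nothing: searching "foobar" for "foo(?=bar)" reports only
"foo" as the matched text.</P>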
|
||||
<H3>Independent sub-expressions</H3>
|
||||
<P>"(?>expression)" matches "expression" as an independent atom (the algorithm
|
||||
will not backtrack into it if a failure occurs later in the expression).</P>
|
||||
<H3>Alternatives
|
||||
</H3>
|
||||
<P>Alternatives occur when the expression can match either one sub-expression or
another. Each alternative is separated by a "|", or a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"a(b|c)" could match "ab" or "ac".
|
||||
</P>
|
||||
<P>"abc|def" could match "abc" or "def".
|
||||
</P>
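<P>The sketch below illustrates that each alternative is the largest possible sub-expression. It
assumes <CODE>std::regex</CODE> as a stand-in for Boost.Regex; <CODE>is_full_match</CODE> is a
hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns true when `pattern` matches the whole of `text`.
// Assumption: std::regex stands in for boost::regex here.
bool is_full_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<P>"abc|def" accepts "abc" and "def" but not "abdef": the "|" separates "abc" from "def", not "c" from "d".</P>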
|
||||
<H3>Sets
|
||||
</H3>
|
||||
<P>A set is a set of characters that can match any single character that is a
|
||||
member of the set. Sets are delimited by "[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and equivalence
|
||||
classes. Set declarations that start with "^" contain the complement of the
|
||||
elements that follow.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>Character literals:
|
||||
</P>
|
||||
<P>"[abc]" will match any of "a", "b", or "c".
|
||||
</P>
|
||||
<P>"[^abc]" will match any character other than "a", "b", or "c".
|
||||
</P>
|
||||
<P>Character ranges:
|
||||
</P>
|
||||
<P>"[a-z]" will match any character in the range "a" to "z".
|
||||
</P>
|
||||
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
|
||||
</P>
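<P>The set examples above can be exercised one character at a time. This is a sketch assuming
<CODE>std::regex</CODE> as a stand-in for Boost.Regex, without the collate flag, so ranges follow
character codes; <CODE>set_accepts</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns true when the set expression `set` matches the single
// character `c`.
// Assumption: std::regex stands in for boost::regex here.
bool set_accepts(const std::string& set, char c)
{
    return std::regex_match(std::string(1, c), std::regex(set));
}
```

<P>For example <CODE>set_accepts("[a-z]", 'q')</CODE> is true and <CODE>set_accepts("[^A-Z]", 'Q')</CODE> is false.</P>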
|
||||
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range. Ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale-specific behavior is disabled by default (in perl mode), which
forces ranges to collate according to ASCII character code.
|
||||
</P>
|
||||
<P>Character classes are denoted using the syntax "[:classname:]" within a set
|
||||
declaration, for example "[[:space:]]" is the set of all whitespace characters.
|
||||
Character classes are only available if the flag regex_constants::char_classes
|
||||
is set. The available character classes are:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="50%">alnum</TD>
|
||||
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">alpha</TD>
|
||||
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
|
||||
characters may also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">blank</TD>
|
||||
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">cntrl</TD>
|
||||
<TD vAlign="top" width="50%">Any control character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">digit</TD>
|
||||
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">graph</TD>
|
||||
<TD vAlign="top" width="50%">Any graphical character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">lower</TD>
|
||||
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">print</TD>
|
||||
<TD vAlign="top" width="50%">Any printable character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">punct</TD>
|
||||
<TD vAlign="top" width="50%">Any punctuation character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">space</TD>
|
||||
<TD vAlign="top" width="50%">Any whitespace character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">upper</TD>
|
||||
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">xdigit</TD>
|
||||
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">word</TD>
|
||||
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
|
||||
the underscore.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">Unicode</TD>
|
||||
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
|
||||
applies to the wide character traits classes only.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
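<P>As an illustration of character classes inside a set, the sketch below collects every run of
characters belonging to a class. It assumes <CODE>std::regex</CODE> as a stand-in for Boost.Regex
and uses only the POSIX class names from the table; <CODE>find_all</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every non-overlapping match of `pattern` in `text`, in order.
// Assumption: std::regex stands in for boost::regex here.
std::vector<std::string> find_all(const std::string& pattern,
                                  const std::string& text)
{
    const std::regex re(pattern);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it)
        found.push_back(it->str());
    return found;
}
```

<P>Calling <CODE>find_all("[[:alnum:]]+", " abc def xyz ")</CODE> yields the three words "abc", "def" and "xyz".</P>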
|
||||
<P>There are some shortcuts that can be used in place of the character classes;
provided the flag regex_constants::escape_in_lists is set, you can use:
|
||||
</P>
|
||||
<P>\w in place of [:word:]
|
||||
</P>
|
||||
<P>\s in place of [:space:]
|
||||
</P>
|
||||
<P>\d in place of [:digit:]
|
||||
</P>
|
||||
<P>\l in place of [:lower:]
|
||||
</P>
|
||||
<P>\u in place of [:upper:]
|
||||
</P>
|
||||
<P>Collating elements take the general form [.tagname.] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
|
||||
equivalent to [,]. The library supports all the standard POSIX collating
|
||||
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
|
||||
"nj", "dz", "lj", each in lower, upper and title case variations.
|
||||
Multi-character collating elements can result in the set matching more than one
|
||||
character, for example [[.ae.]] would match two characters, but note that
|
||||
[^[.ae.]] would only match one character.
|
||||
</P>
|
||||
<P>
|
||||
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, and match any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.
|
||||
</P>
|
||||
<P>To include a literal "-" in a set declaration, either make it the first character
after the opening "[" or "[^", make it the endpoint of a range, write it as a
collating element, or, if the flag regex_constants::escape_in_lists is set,
precede it with an escape character as in "[\-]". To include a literal "[", "]"
or "^" in a set, make it the endpoint of a range, write it as a collating
element, or precede it with an escape character if the flag
regex_constants::escape_in_lists is set.
|
||||
</P>
|
||||
<H3>Line anchors
|
||||
</H3>
|
||||
<P>An anchor is something that matches the null string at the start or end of a
|
||||
line: "^" matches the null string at the start of a line, "$" matches the null
|
||||
string at the end of a line.
|
||||
</P>
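<P>A minimal sketch of the two anchors on single-line input, assuming <CODE>std::regex</CODE> as a
stand-in for Boost.Regex; <CODE>found_in</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns true when `pattern` matches somewhere inside `text`.
// Assumption: std::regex stands in for boost::regex here.
bool found_in(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}
```

<P>"^abc" is found in "abcdef" but not in "xabc"; "abc$" is found in "xabc" but not in "abcx".</P>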
|
||||
<H3>Back references
|
||||
</H3>
|
||||
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string: NB this is different to some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
|
||||
</P>
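<P>The "(.*)\1" example can be sketched as follows, assuming <CODE>std::regex</CODE> as a stand-in
for Boost.Regex. <CODE>is_doubled</CODE> is a hypothetical helper; note that the backslash has to be
doubled inside a C++ string literal.</P>

```cpp
#include <regex>
#include <string>

// Returns true when `text` consists of some string followed by an exact
// repeat of itself, using the back reference \1.
// Assumption: std::regex stands in for boost::regex here.
bool is_doubled(const std::string& text)
{
    static const std::regex doubled("(.*)\\1");
    return std::regex_match(text, doubled);
}
```

<P>"abcabc" and "xyzxyz" are accepted; "abcab" is not, because no string repeated twice produces it.</P>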
|
||||
<H3>Characters by code
|
||||
</H3>
|
||||
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose code is 103, "(\010)3" represents the
character 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; notice the latter example is a Unicode
|
||||
character.</P>
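<P>A small sketch of matching by hexadecimal code. It assumes <CODE>std::regex</CODE> as a stand-in
for Boost.Regex and therefore uses only the two-digit \xHH form; the octal and braced forms described
above are not exercised here. <CODE>matches_by_code</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns true when `pattern` (which may contain \xHH escapes) matches
// the whole of `text`.
// Assumption: std::regex stands in for boost::regex here.
bool matches_by_code(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<P>Since 0x41 and 0x42 are the ASCII codes for "A" and "B", the pattern "\x41\x42" (written
<CODE>"\\x41\\x42"</CODE> in C++ source) matches "AB".</P>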
|
||||
<H3>Word operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library.
|
||||
</P>
|
||||
<P>"\w" matches any single character that is a member of the "word" character
|
||||
class, this is identical to the expression "[[:word:]]".
|
||||
</P>
|
||||
<P>"\W" matches any single character that is not a member of the "word" character
|
||||
class, this is identical to the expression "[^[:word:]]".
|
||||
</P>
|
||||
<P>"\<" matches the null string at the start of a word.
|
||||
</P>
|
||||
<P>"\>" matches the null string at the end of a word.
|
||||
</P>
|
||||
<P>"\b" matches the null string at either the start or the end of a word.
|
||||
</P>
|
||||
<P>"\B" matches a null string within a word.
|
||||
</P>
|
||||
<P>The start of the sequence passed to the matching algorithms is considered to be
|
||||
a potential start of a word unless the flag match_not_bow is set. The end of
|
||||
the sequence passed to the matching algorithms is considered to be a potential
|
||||
end of a word unless the flag match_not_eow is set.
|
||||
</P>
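<P>The word-boundary operators and the <I>match_not_bow</I> flag can be sketched as below. The sketch
assumes <CODE>std::regex</CODE> as a stand-in for Boost.Regex and so covers only "\b" and "\B";
<CODE>contains_at_boundary</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns true when `pattern` matches somewhere in `text`, honouring the
// supplied match flags (for example match_not_bow).
// Assumption: std::regex stands in for boost::regex here.
bool contains_at_boundary(
    const std::string& pattern, const std::string& text,
    std::regex_constants::match_flag_type flags =
        std::regex_constants::match_default)
{
    return std::regex_search(text, std::regex(pattern), flags);
}
```

<P>"\bcat\b" is found in "a cat sat" but not in "concatenate", where "\Bcat\B" is found instead.
Passing <CODE>match_not_bow</CODE> stops the start of the sequence from counting as the start of a word.</P>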
|
||||
<H3>Buffer operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library, and Perl regular expressions:
|
||||
</P>
|
||||
<P>"\`" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\A" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\'" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\z" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
|
||||
followed by the end of the buffer.
|
||||
</P>
|
||||
<P>A buffer is considered to consist of the whole sequence passed to the matching
|
||||
algorithms, unless the flags match_not_bob or match_not_eob are set.
|
||||
</P>
|
||||
<H3>Escape operator
|
||||
</H3>
|
||||
<P>The escape character "\" has several meanings.
|
||||
</P>
|
||||
<P>Inside a set declaration the escape character is a normal character unless the
|
||||
flag regex_constants::escape_in_lists is set in which case whatever follows the
|
||||
escape is a literal character regardless of its normal meaning.
|
||||
</P>
|
||||
<P>The escape operator may introduce an operator for example: back references, or
|
||||
a word operator.
|
||||
</P>
|
||||
<P>The escape operator may make the following character normal, for example "\*"
|
||||
represents a literal "*" rather than the repeat operator.
|
||||
</P>
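<P>The effect of escaping an operator can be sketched as follows, assuming <CODE>std::regex</CODE> as
a stand-in for Boost.Regex; <CODE>escaped_match</CODE> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Returns true when `pattern` matches the whole of `text`.
// Assumption: std::regex stands in for boost::regex here.
bool escaped_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<P>The pattern "a\*b" matches only the literal text "a*b", whereas "a*b" treats "*" as a repeat and
matches "aab" but not "a*b".</P>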
|
||||
<H4>Single character escape sequences
|
||||
</H4>
|
||||
<P>The following escape sequences are aliases for single characters:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="33%">Escape sequence
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Character code
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Meaning
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\a
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x07
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Bell character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\f
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0C
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Form feed.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\n
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0A
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Newline character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\r
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0D
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Carriage return.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\t
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x09
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Tab character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\v
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Vertical tab.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\e
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x1B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">ASCII Escape character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
|
||||
more octal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\x{XX}
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits, optionally a Unicode character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\cZ
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Z-@
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
|
||||
ASCII character greater than or equal to the character code for '@'.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<H4>Miscellaneous escape sequences
|
||||
</H4>
|
||||
<P>The following are provided mostly for perl compatibility, but note that there
|
||||
are some differences in the meanings of \l \L \u and \U:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\w
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\W
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\s
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\S
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\d
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\D
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\l
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\L
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\u
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\U
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\C
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\X
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
|
||||
example "a\x{0301}" (a letter a with an acute).
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\Q
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
|
||||
treated as a literal character until a \E end quote operator is found.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\E
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
|
||||
with \Q.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<H3>What gets matched?
|
||||
</H3>
|
||||
<P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
<P>
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string. If more than one
string starting at a given location can match then it matches the longest
possible string, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match, but is only useful if the user is less concerned
about what matched; for example it would not be suitable for search and
replace operations. In cases where there are multiple possible matches all
starting at the same location, and all of the same length, the match
chosen is the one with the longest first sub-expression; if that is the same
for two or more matches, then the second sub-expression will be examined and so
on.
|
||||
</P><P>
|
||||
The examples in the following table illustrate the main differences between Perl and
|
||||
POSIX regular expression matching rules:
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
xaby</CODE>
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"ab"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"a"</CODE></P></TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*([[:alnum:]]+).*</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
" abc def xyz "</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*(a|xayy)</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
zzxayyzz</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"zzxayy"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
</P>
|
||||
<P>These differences between Perl matching rules, and POSIX matching rules, mean
|
||||
that these two regular expression syntaxes differ not only in the features
|
||||
offered, but also in the form that the state machine takes and/or the
|
||||
algorithms used to traverse the state machine.</p>
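<P>The depth first search column of the table can be reproduced with a short sketch. It assumes
<CODE>std::regex</CODE> with its default ECMAScript grammar as a stand-in for Boost.Regex in Perl
mode; the POSIX leftmost longest column is not exercised. <CODE>search_groups</CODE> is a
hypothetical helper.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Searches `text` for `pattern` and returns $0, $1, ... as strings;
// the result is empty when nothing matches.
// Assumption: std::regex stands in for boost::regex here.
std::vector<std::string> search_groups(const std::string& pattern,
                                       const std::string& text)
{
    std::vector<std::string> groups;
    std::smatch what;
    if (std::regex_search(text, what, std::regex(pattern)))
        for (std::size_t i = 0; i < what.size(); ++i)
            groups.push_back(what[i].str());
    return groups;
}
```

<P>For "a|ab" against "xaby" the first alternative wins and $0 is "a"; for ".*(a|xayy)" against
"zzxayyzz" the greedy ".*" backs off only as far as the first point where an alternative fits, giving "zzxa".</P>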
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
332
doc/Attic/syntax_option_type.html
Normal file
@ -0,0 +1,332 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: syntax_option_type</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">syntax_option_type</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted. For convenience,
note that all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<PRE>namespace std{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type syntax_option_type;
|
||||
// these flags are standardized:
|
||||
static const syntax_option_type normal;
|
||||
static const syntax_option_type icase;
|
||||
static const syntax_option_type nosubs;
|
||||
static const syntax_option_type optimize;
|
||||
static const syntax_option_type collate;
|
||||
static const syntax_option_type ECMAScript = normal;
|
||||
static const syntax_option_type JavaScript = normal;
|
||||
static const syntax_option_type JScript = normal;
|
||||
static const syntax_option_type basic;
|
||||
static const syntax_option_type extended;
|
||||
static const syntax_option_type awk;
|
||||
static const syntax_option_type grep;
|
||||
static const syntax_option_type egrep;
|
||||
static const syntax_option_type sed = basic;
|
||||
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
|
||||
} // namespace regex_constants
|
||||
} // namespace std</PRE>
|
||||
<H3>Description</H3>
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
|
||||
<P>Note that for convenience all the constants listed here are duplicated within
|
||||
the scope of class template basic_regex, so you can use any of:</P>
|
||||
<PRE>boost::regex_constants::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::regex::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::wregex::constant_name</PRE>
|
||||
<P>in an interchangeable manner.</P>
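<P>The interchangeable spellings can be sketched as follows. The example assumes the standardized
<CODE>std::regex_constants</CODE>, <CODE>std::regex</CODE> and <CODE>std::wregex</CODE> mirror the
Boost.Regex names above; <CODE>scoped_constants_agree</CODE> and <CODE>matches_ignoring_case</CODE>
are hypothetical helpers.</P>

```cpp
#include <regex>
#include <string>

// Returns true when the namespace-scope constant and the copies nested in
// the narrow and wide regex classes denote the same flag.
// Assumption: std::regex stands in for boost::regex here.
bool scoped_constants_agree()
{
    return std::regex::icase == std::regex_constants::icase
        && std::wregex::icase == std::regex_constants::icase;
}

// Compiles `pattern` with exactly one grammar flag plus icase, then
// matches it against the whole of `text`.
bool matches_ignoring_case(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern,
        std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_match(text, re);
}
```

<P>Option flags are combined with "|"; here one grammar element is paired with <CODE>icase</CODE>, so
"hello" matches "HeLLo".</P>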
|
||||
<P>
|
||||
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>Element</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Effect if set</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>normal</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine uses its
|
||||
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
|
||||
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
|
||||
(FWD.1).</P>
|
||||
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>icase</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that matching of regular expressions against a character container
|
||||
sequence shall be performed without regard to case.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>nosubs</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that when a regular expression is matched against a character
|
||||
container sequence, then no sub-expression matches are to be stored in the
|
||||
supplied match_results structure.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>optimize</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the regular expression engine should pay more attention to the
|
||||
speed with which regular expressions are matched, and less to the speed with
|
||||
which regular expression objects are constructed. Otherwise it has no
|
||||
detectable effect on the program output. This currently has no effect for
|
||||
boost.regex.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>collate</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>ECMAScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JavaScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>basic</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
|
||||
Portable Operating System Interface (POSIX), Base Definitions and Headers,
|
||||
Section 9, Regular Expressions (FWD.1).
|
||||
</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>extended</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX extended regular expressions in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and
|
||||
Headers, Section 9, Regular Expressions (FWD.1).</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>awk</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX), Shells and Utilities, Section 4, awk
|
||||
(FWD.1).</P>
|
||||
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
|
||||
character classes permitted.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>grep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX), Shells and Utilities, Section 4,
|
||||
Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX basic syntax, but with the newline character
acting as an alternation character.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>egrep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep when given the -E option in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX), Shells and
|
||||
Utilities, Section 4, Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX extended syntax, but with the newline
|
||||
character acting as an alternation character in addition to "|".</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>sed</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as basic.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>perl</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
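<P>The effect of choosing a grammar element can be sketched by compiling the
same pattern text under different grammars. This is a minimal illustration under
the assumption that <CODE>std::regex</CODE> stands in for <CODE>boost::regex</CODE>; the helper
<CODE>whole_match</CODE> is hypothetical. In POSIX basic syntax "+" is an ordinary
character, whereas in POSIX extended and ECMAScript syntax it is the
one-or-more repetition operator.</P>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex so the sketch builds
// without Boost. whole_match is a hypothetical helper, not a library function.

// Compiles `pattern` under the requested grammar and tests whether it
// matches the whole of `text`.
bool whole_match(const std::string& text,
                 const std::string& pattern,
                 std::regex::flag_type grammar)
{
    const std::regex re(pattern, grammar);
    return std::regex_match(text, re);
}
```

<P>For example, <CODE>whole_match("aaa", "a+", std::regex::extended)</CODE> treats "+" as
a repeat, while the same call with <CODE>std::regex::basic</CODE> looks for the two
literal characters "a+".</P>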
|
||||
<P>The following constants are specific to this particular regular expression
|
||||
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>:</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
|
||||
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
|
||||
characters, for example [\]] represents the set of characters containing only
|
||||
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::char_classes</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
|
||||
are allowed inside character set declarations, for example "[[:word:]]"
|
||||
represents the set of all characters that belong to the character class "word".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::intervals</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
|
||||
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
|
||||
a's.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
|
||||
ordinary characters in all situations.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
|
||||
has the same effect as the alternation operator "|".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
|
||||
more repetition operator and "\?" represents the zero or one repetition
|
||||
operator. When this bit is not set then "+" and "?" are used instead.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
|
||||
bounded repetitions and "{" and "}" are normal characters. This is the opposite
|
||||
of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
|
||||
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then back references are
|
||||
allowed.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
|
||||
alternation operator and "|" is an ordinary character. This is the opposite of
|
||||
default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::use_except</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
|
||||
exception will be thrown on error. Use of this flag is deprecated -
|
||||
basic_regex will always throw on error.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::failbit</TD>
|
||||
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see if a regular expression is
valid before usage.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::literal</TD>
|
||||
<TD vAlign="top" width="45%">All characters in the string are treated as literals;
there are no special characters or escape sequences.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
|
||||
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
68
doc/Attic/thread_safety.html
Normal file
@ -0,0 +1,68 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Thread Safety</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Thread Safety</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Class <A href="basic_regex.html">basic_regex</A><> and its typedefs regex
|
||||
and wregex are thread safe, in that compiled regular expressions can safely be
|
||||
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
|
||||
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
|
||||
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
|
||||
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
|
||||
is now thread safe, in that the results of a match can be safely copied from
|
||||
one thread to another (for example one thread may find matches and push
|
||||
match_results instances onto a queue, while another thread pops them off the
|
||||
other end); otherwise, use a separate instance of <A href="match_results.html">match_results</A>
|
||||
per thread.
|
||||
</P>
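<P>The sharing rule above can be sketched as follows: one compiled expression is
read by several threads, while each thread keeps its own match_results object.
This is only an illustration under stated assumptions: <CODE>std::regex</CODE> and
<CODE>std::thread</CODE> stand in for <CODE>boost::regex</CODE> and whatever threading facility
the program uses, and <CODE>count_matches_in_parallel</CODE> is a hypothetical helper.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Assumption: std::regex / std::thread stand in for boost::regex and the
// program's threading library. The function name is hypothetical.

// Counts how many inputs contain a match, using one worker thread per input.
std::size_t count_matches_in_parallel(const std::regex& shared_re,
                                      const std::vector<std::string>& inputs)
{
    // One result slot per thread, so no two threads write the same element.
    std::vector<char> found(inputs.size(), 0);
    std::vector<std::thread> workers;

    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&shared_re, &inputs, &found, i] {
            std::smatch what;  // separate match_results instance per thread
            if (std::regex_search(inputs[i], what, shared_re))
                found[i] = 1;
        });
    }
    for (std::thread& t : workers)
        t.join();

    std::size_t total = 0;
    for (char f : found)
        total += static_cast<std::size_t>(f);
    return total;
}
```

<P>The compiled expression is only ever read through a const reference here; the
per-thread state lives entirely in each thread's own match_results object.</P>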
|
||||
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
|
||||
thread safe; regular expressions compiled with <I>regcomp</I> can also be
|
||||
shared between threads.
|
||||
</P>
|
||||
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
|
||||
gets its own RegEx instance (apartment threading) - this is a consequence of
|
||||
RegEx handling both compiling and matching regular expressions.
|
||||
</P>
|
||||
<P>Finally, note that changing the global locale invalidates all compiled regular
expressions; therefore, calling <I>set_locale</I> from one thread while another
|
||||
uses regular expressions <I>will</I> produce unpredictable results.
|
||||
</P>
|
||||
<P>
|
||||
There is also a requirement that there is only one thread executing prior to
|
||||
the start of main().</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
BIN
doc/Attic/uarrow.gif
Normal file
Binary file not shown.
After Width: | Height: | Size: 1.6 KiB |
79
doc/standards.html
Normal file
@ -0,0 +1,79 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Standards Conformance</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Standards Conformance</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>C++</H3>
|
||||
<P>Boost.regex is intended to conform to the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>, which will appear in a
|
||||
future C++ standard technical report (and hopefully in a future version of the
|
||||
standard). Currently there are some differences in how the regular
|
||||
expression traits classes are defined; these will be fixed in a future release.</P>
|
||||
<H3>ECMAScript / JavaScript</H3>
|
||||
<P>All of the ECMAScript regular expression syntax features are supported, except
|
||||
that:</P>
|
||||
<P>Negated class escapes (\S, \D and \W) are not permitted inside character class
|
||||
definitions ( [...] ).</P>
|
||||
<P>The escape sequence \u matches any upper case character (the same as
|
||||
[[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for
|
||||
Unicode escape sequences.</P>
|
||||
<H3>Perl</H3>
|
||||
<P>Almost all Perl features are supported, except for:</P>
|
||||
<P>\N{name} Use [[:name:]] instead.</P>
|
||||
<P>\pP and \PP</P>
|
||||
<P>(?imsx-imsx)</P>
|
||||
<P>(?<=pattern)</P>
|
||||
<P>(?<!pattern)</P>
|
||||
<P>(?{code})</P>
|
||||
<P>(??{code})</P>
|
||||
<P>(?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)</P>
|
||||
<P>These embarrassments / limitations will be removed in due course, mainly
|
||||
dependent upon user demand.</P>
|
||||
<H3>POSIX</H3>
|
||||
<P>All the POSIX basic and extended regular expression features are supported,
|
||||
except that:</P>
|
||||
<P>No character collating names are recognized except those specified in the POSIX
|
||||
standard for the C locale, unless they are explicitly registered with the
|
||||
traits class.</P>
|
||||
<P>Character equivalence classes ( [[=a=]] etc) are probably buggy except on
|
||||
Win32. Implementing this feature requires knowledge of the format of the
|
||||
string sort keys produced by the system; if you need this, and the default
|
||||
implementation doesn't work on your platform, then you will need to supply a
|
||||
custom traits class.</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
426
doc/sub_match.html
Normal file
@ -0,0 +1,426 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: sub_match</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">sub_match</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>#include <<A href="../../boost/regex.hpp">boost/regex.hpp</A>>
|
||||
</P>
|
||||
<P>Regular expressions are different from many simple pattern-matching algorithms
in that as well as finding an overall match they can also produce
sub-expression matches: each sub-expression being delimited in the pattern by a
pair of parentheses (...). There has to be some method for reporting
sub-expression matches back to the user: this is achieved by defining a
class <I><A href="match_results.html">match_results</A></I> that acts as an
indexed collection of sub-expression matches, each sub-expression match being
contained in an object of type <I>sub_match</I>.</P>
<P>Objects of type <EM>sub_match</EM> may only be obtained by subscripting an object
of type <EM><A href="match_results.html">match_results</A></EM>.</P>
|
||||
<P>When the marked sub-expression denoted by an object of type sub_match<>
|
||||
participated in a regular expression match then member <CODE>matched</CODE> evaluates
|
||||
to true, and members <CODE>first</CODE> and <CODE>second</CODE> denote the
|
||||
range of characters <CODE>[first,second)</CODE> which formed that match.
|
||||
Otherwise <CODE>matched</CODE> is false, and members <CODE>first</CODE> and <CODE>second</CODE>
|
||||
contain undefined values.</P>
|
||||
<P>If an object of type <CODE>sub_match<></CODE> represents sub-expression 0
|
||||
- that is to say the whole match - then member <CODE>matched</CODE> is always
|
||||
true, unless a partial match was obtained as a result of the flag <CODE>match_partial</CODE>
|
||||
being passed to a regular expression algorithm, in which case member <CODE>matched</CODE>
|
||||
is false, and members <CODE>first</CODE> and <CODE>second</CODE> represent the
|
||||
character range that formed the partial match.</P>
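<P>The roles of <CODE>matched</CODE>, <CODE>first</CODE> and <CODE>second</CODE> can be sketched as
follows. This is a minimal illustration, assuming that <CODE>std::regex</CODE> and
<CODE>std::ssub_match</CODE> stand in for their Boost counterparts; the helpers
<CODE>describe</CODE> and <CODE>match_and_describe</CODE> are hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex so the sketch builds
// without Boost. Both helper functions are hypothetical.

// Returns the text of sub-expression n, or a marker if it did not participate.
std::string describe(const std::smatch& what, std::size_t n)
{
    const std::ssub_match& sub = what[n];  // obtained by subscripting match_results
    if (!sub.matched)
        return "<unmatched>";              // first/second must not be relied on here
    return std::string(sub.first, sub.second);  // the range [first, second)
}

// Matches the whole of `text` against `pattern` and describes sub-expression n.
std::string match_and_describe(const std::string& text,
                               const std::string& pattern,
                               std::size_t n)
{
    const std::regex re(pattern);
    std::smatch what;
    if (!std::regex_match(text, what, re))
        return "<no match>";
    return describe(what, n);
}
```

<P>With the pattern "(a)|(b)" and the text "b", sub-expression 0 (the whole match)
and sub-expression 2 both denote "b", while sub-expression 1 did not participate
in the match.</P>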
|
||||
<PRE>
|
||||
namespace boost{
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
|
||||
{
|
||||
public:
|
||||
typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
|
||||
typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
|
||||
typedef BidirectionalIterator iterator;
|
||||
|
||||
bool matched;
|
||||
|
||||
difference_type length()const;
|
||||
operator basic_string<value_type>()const;
|
||||
basic_string<value_type> str()const;
|
||||
|
||||
int compare(const sub_match& s)const;
|
||||
int compare(const basic_string<value_type>& s)const;
|
||||
int compare(const value_type* s)const;
|
||||
};
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
template <class BidirectionalIterator, class traits, class Allocator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);
|
||||
|
||||
template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
|
||||
|
||||
template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);
|
||||
|
||||
} // namespace boost</PRE>
|
||||
<H3>Description</H3>
|
||||
<H4>
|
||||
sub_match members</H4>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::value_type value_type;</PRE>
|
||||
<P>The type pointed to by the iterators.</P>
|
||||
<PRE>typedef typename std::iterator_traits<iterator>::difference_type difference_type;</PRE>
|
||||
<P>A type that represents the difference between two iterators.</P>
|
||||
<PRE>typedef iterator iterator_type;</PRE>
|
||||
<P>The iterator type.</P>
|
||||
<PRE>iterator first</PRE>
|
||||
<P>An iterator denoting the position of the start of the match.</P>
|
||||
<PRE>iterator second</PRE>
|
||||
<P>An iterator denoting the position of the end of the match.</P>
|
||||
<PRE>bool matched</PRE>
|
||||
<P>A Boolean value denoting whether this sub-expression participated in the match.</P>
|
||||
<PRE>difference_type length()const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(matched ? distance(first, second) : 0)</CODE>.</P><PRE>operator basic_string<value_type>()const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
|
||||
second) : basic_string<value_type>())</CODE>.</P><PRE>basic_string<value_type> str()const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>(matched ? basic_string<value_type>(first,
|
||||
second) : basic_string<value_type>())</CODE>.</P><PRE>int compare(const sub_match& s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s.str())</CODE>.</P><PRE>int compare(const basic_string<value_type>& s)const;</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P><PRE>int compare(const value_type* s)const;</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>str().compare(s)</CODE>.</P>
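<P>The member functions above can be tried out with a short sketch. It assumes std::regex from the C++ standard library as a stand-in for Boost.Regex, so that it compiles without Boost; the two libraries may differ in detail. The names <I>SubInfo</I> and <I>inspect</I> are hypothetical helpers introduced only for this illustration.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record of what one sub-expression reported.
struct SubInfo {
    bool matched;      // sub_match::matched
    std::string text;  // sub_match::str()
    long length;       // sub_match::length()
};

// Searches `input` for `pattern` and inspects sub-expression `n`.
// Assumption: std::regex is used here in place of boost::regex.
SubInfo inspect(const std::string& pattern, const std::string& input, std::size_t n) {
    const std::regex re(pattern);
    std::smatch what;
    SubInfo info{false, "", 0};
    if (std::regex_search(input, what, re) && n < what.size()) {
        const std::ssub_match& sub = what[n];
        info.matched = sub.matched;
        info.text = sub.str();                          // empty if !matched
        info.length = static_cast<long>(sub.length());  // 0 if !matched
    }
    return info;
}
```

<P>For the pattern "(a)|(b)" and the input "b", sub-expression 1 takes no part in the match, so it reports an empty string and a length of zero, while sub-expression 2 reports "b".</P>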
|
||||
<H4>
|
||||
sub_match non-member operators</H4>
|
||||
<PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) == 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) != 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) < 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P><B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) <= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) >= 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.compare(rhs) > 0</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const* rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs == rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs != rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs < rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs > rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs >= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
|
||||
const sub_match<BidirectionalIterator>& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs <= rhs.str()</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator == (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() == rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator != (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() != rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator < (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() < rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator > (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() > rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator >= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() >= rhs</CODE>.</P><PRE>template <class BidirectionalIterator>
|
||||
bool operator <= (const sub_match<BidirectionalIterator>& lhs,
|
||||
typename iterator_traits<BidirectionalIterator>::value_type const& rhs); </PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>lhs.str() <= rhs</CODE>.</P><PRE>template <class charT, class traits, class BidirectionalIterator>
|
||||
basic_ostream<charT, traits>&
|
||||
operator << (basic_ostream<charT, traits>& os,
|
||||
const sub_match<BidirectionalIterator>& m);</PRE>
|
||||
|
||||
<P> <B>
|
||||
Effects: </B>returns <CODE>(os << m.str())</CODE>.</P>
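<P>A minimal sketch of the non-member operators in use follows. It assumes std::regex from the C++ standard library as a stand-in for Boost.Regex (so that it builds without Boost), and the helper names <I>key_equals</I>, <I>key_sorts_before_value</I> and <I>print_value</I> are hypothetical.</P>

```cpp
#include <regex>
#include <sstream>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// The pattern splits "key=value" into two marked sub-expressions.
static const std::regex key_value_re("(\\w+)=(\\w+)");

// Compares sub-expression 1 with a C string: sub_match == const value_type*.
bool key_equals(const std::string& input, const char* expected) {
    std::smatch what;
    if (!std::regex_match(input, what, key_value_re)) return false;
    return what[1] == expected;
}

// Compares two sub-expressions with each other: sub_match < sub_match.
bool key_sorts_before_value(const std::string& input) {
    std::smatch what;
    if (!std::regex_match(input, what, key_value_re)) return false;
    return what[1] < what[2];
}

// Streams sub-expression 2; operator<< writes the same text as str().
std::string print_value(const std::string& input) {
    std::smatch what;
    std::ostringstream os;
    if (std::regex_match(input, what, key_value_re))
        os << what[2];
    return os.str();
}
```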
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
773
doc/syntax.html
Normal file
@ -0,0 +1,773 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Regular Expression Syntax</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Regular Expression Syntax</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>This section covers the regular expression syntax used by this library. It is
a programmer's guide; the actual syntax presented to your program's users will
depend upon the flags used during expression compilation.
|
||||
</P>
|
||||
<H3>Literals
|
||||
</H3>
|
||||
<P>All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
|
||||
"}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
|
||||
a "\". A literal is a character that matches itself, or matches the result of
|
||||
traits_type::translate(), where traits_type is the traits template parameter to
|
||||
class basic_regex.</P>
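<P>As a rough illustration of the rule above, the following sketch builds a pattern that matches a piece of text literally by preceding each of the listed special characters with "\". It assumes std::regex from the C++ standard library with its default grammar as a stand-in for Boost.Regex; <I>escape_literal</I> and <I>matches_literally</I> are hypothetical helpers, not part of either library.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: puts "\" in front of every character that the
// text above lists as special, so the result matches `text` literally.
std::string escape_literal(const std::string& text) {
    static const std::string special = ".|*?+(){}[]^$\\";
    std::string pattern;
    for (char c : text) {
        if (special.find(c) != std::string::npos)
            pattern += '\\';
        pattern += c;
    }
    return pattern;
}

// True if the whole of `candidate` matches the literal form of `text`.
// Assumption: std::regex stands in for boost::regex in this sketch.
bool matches_literally(const std::string& text, const std::string& candidate) {
    return std::regex_match(candidate, std::regex(escape_literal(text)));
}
```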
|
||||
<H3>Wildcard
|
||||
</H3>
|
||||
<P>The dot character "." matches any single character, with two exceptions: when <I>match_not_dot_null</I>
is passed to the matching algorithms, the dot does not match a null character;
when <I>match_not_dot_newline</I> is passed to the matching algorithms, the dot
does not match a newline character.
|
||||
</P>
|
||||
<H3>Repeats
|
||||
</H3>
|
||||
<P>A repeat is an expression that is repeated an arbitrary number of times. An
expression followed by "*" can be repeated any number of times, including zero.
An expression followed by "+" can be repeated any number of times, but at least
once. If the expression is compiled with the flag regex_constants::bk_plus_qm
then "+" is an ordinary character and "\+" represents a repeat of once or more.
An expression followed by "?" may be repeated zero or one times only. If the
expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
ordinary character and "\?" represents the repeat zero or once operator. When
it is necessary to specify the minimum and maximum number of repeats
explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a"
repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
no upper limit. Note that there must be no white-space inside the {}, and there
is no upper limit on the values of the lower and upper bounds. When the
expression is compiled with the flag regex_constants::bk_braces then "{" and
"}" are ordinary characters and "\{" and "\}" are used to delimit bounds
instead. All repeat expressions refer to the shortest possible previous
sub-expression: a single character, a character set, or a sub-expression
grouped with "()", for example.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"ba*" will match all of "b", "ba", "baaa" etc.
|
||||
</P>
|
||||
<P>"ba+" will match "ba" or "baaaa" for example but not "b".
|
||||
</P>
|
||||
<P>"ba?" will match "b" or "ba".
|
||||
</P>
|
||||
<P>"ba{2,4}" will match "baa", "baaa" and "baaaa".
|
||||
</P>
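<P>The repeat examples above can be checked with a small sketch. It assumes std::regex from the C++ standard library with its default grammar as a stand-in for Boost.Regex, and <I>whole_match</I> is a hypothetical helper that reports whether an entire string matches a pattern.</P>

```cpp
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex in this sketch.
bool whole_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<P>With this helper, whole_match("ba+", "b") is false because "+" requires at least one "a", while whole_match("ba{2,4}", "baaa") is true.</P>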
|
||||
<H3>Non-greedy repeats
|
||||
</H3>
|
||||
<P>Whenever the "extended" regular expression syntax is in use (the default) then
|
||||
non-greedy repeats are possible by appending a '?' after the repeat; a
|
||||
non-greedy repeat is one which will match the <I>shortest</I> possible string.
|
||||
</P>
|
||||
<P>For example to match html tag pairs one could use something like:
|
||||
</P>
|
||||
<P>"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
|
||||
</P>
|
||||
<P>In this case $1 will contain the text between the tag pairs, and will be the
|
||||
shortest possible matching string.
|
||||
</P>
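<P>To make the difference between greedy and non-greedy repeats concrete, here is a minimal sketch. It assumes std::regex from the C++ standard library as a stand-in for Boost.Regex, uses "b" as an example tag name, and introduces the hypothetical helper <I>first_capture</I>, which returns what sub-expression 1 matched.</P>

```cpp
#include <regex>
#include <string>

// Returns the text of sub-expression 1 for the first match of `pattern`
// in `text`, or an empty string if nothing matched.
// Assumption: std::regex stands in for boost::regex in this sketch.
std::string first_capture(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch what;
    if (std::regex_search(text, what, re) && what.size() > 1)
        return what[1].str();
    return std::string();
}
```

<P>Applied to the text "<b>one</b> and <b>two</b>", the non-greedy pattern "<\s*b[^>]*>(.*?)<\s*/b\s*>" captures only "one", whereas the greedy form with "(.*)" runs on to the last closing tag.</P>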
|
||||
<H3>Parentheses
|
||||
</H3>
|
||||
<P>Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms <A href="template_class_ref.htm#query_match">
regex_match</A> and <A href="template_class_ref.htm#reg_search">regex_search</A>
each take an instance of <A href="template_class_ref.htm#reg_match">match_results</A>
that reports what caused the match. On exit from these functions the <A href="template_class_ref.htm#reg_match">
match_results</A> contains information both on what the whole expression
matched and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the final "ab" of
the matching string. It is permissible for sub-expressions to match null
strings. If a sub-expression takes no part in a match - for example if it is
part of an alternative that is not taken - then both of the iterators that are
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
member for that sub-expression is <I>false</I>. Sub-expressions are indexed
from left to right starting from 1; sub-expression 0 is the whole expression.
|
||||
</P>
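<P>The "(ab)*" example can be sketched as follows, assuming std::regex from the C++ standard library as a stand-in for Boost.Regex. The <I>GroupReport</I> structure and the <I>report</I> function are hypothetical names used only here.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical summary of a match against "(ab)*".
struct GroupReport {
    std::string whole;   // sub-expression 0: the whole match
    std::string group1;  // what sub-expression 1 matched last
    long group1_offset;  // offset of sub-expression 1 in the input, -1 if no match
};

// Assumption: std::regex stands in for boost::regex in this sketch.
GroupReport report(const std::string& text) {
    static const std::regex re("(ab)*");
    std::smatch what;
    GroupReport r{"", "", -1};
    if (std::regex_match(text, what, re)) {
        r.whole = what[0].str();
        r.group1 = what[1].str();
        r.group1_offset = static_cast<long>(what.position(1));
    }
    return r;
}
```

<P>For the input "ababab" the whole match is "ababab", and sub-expression 1 holds the final "ab", which starts at offset 4.</P>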
|
||||
<H3>Non-Marking Parentheses
|
||||
</H3>
|
||||
<P>Sometimes you need to group sub-expressions with parentheses, but don't want
the parentheses to generate another marked sub-expression. In this case a
non-marking parenthesis (?:expression) can be used. For example the following
expression creates no sub-expressions:
|
||||
</P>
|
||||
<P>"(?:abc)*"</P>
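<P>One way to confirm that a non-marking group creates no sub-expression is to ask the compiled expression how many marked groups it holds. The sketch below assumes std::regex from the C++ standard library as a stand-in for Boost.Regex; <I>marked_groups</I> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// Number of marked sub-expressions in `pattern`.
// Assumption: std::regex stands in for boost::regex in this sketch.
unsigned marked_groups(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```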
|
||||
<H3>Forward Lookahead Asserts
|
||||
</H3>
|
||||
<P>There are two forms of these; one for positive forward lookahead asserts, and
|
||||
one for negative lookahead asserts:</P>
|
||||
<P>"(?=abc)" matches zero characters only if they are followed by the expression
|
||||
"abc".</P>
|
||||
<P>"(?!abc)" matches zero characters only if they are not followed by the
|
||||
expression "abc".</P>
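<P>Because a lookahead assert consumes no characters, the text it inspects is not part of the match. The following sketch shows this, assuming std::regex from the C++ standard library as a stand-in for Boost.Regex; <I>find_offset</I> and <I>match_length</I> are hypothetical helpers.</P>

```cpp
#include <regex>
#include <string>

// Offset of the first match of `pattern` in `text`, or -1 if there is none.
// Assumption: std::regex stands in for boost::regex in this sketch.
long find_offset(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch what;
    if (std::regex_search(text, what, re))
        return static_cast<long>(what.position(0));
    return -1;
}

// Length of the first match of `pattern` in `text`, or -1 if there is none.
long match_length(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch what;
    if (std::regex_search(text, what, re))
        return static_cast<long>(what.length(0));
    return -1;
}
```

<P>Searching "foobaz foobar" for "foo(?=bar)" finds the second "foo", and the match is three characters long because "bar" is only looked at, not consumed.</P>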
|
||||
<H3>Independent sub-expressions</H3>
|
||||
<P>"(?>expression)" matches "expression" as an independent atom (the algorithm
|
||||
will not backtrack into it if a failure occurs later in the expression).</P>
|
||||
<H3>Alternatives
|
||||
</H3>
|
||||
<P>Alternatives occur when the expression can match either one sub-expression or
another. Alternatives are separated by a "|", or by a "\|" if the flag
regex_constants::bk_vbar is set, or by a newline character if the flag
regex_constants::newline_alt is set. Each alternative is the largest possible
previous sub-expression; this is the opposite behavior from repetition
operators.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>"a(b|c)" could match "ab" or "ac".
|
||||
</P>
|
||||
<P>"abc|def" could match "abc" or "def".
|
||||
</P>
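<P>The point that an alternative takes the largest possible sub-expression can be seen by comparing "abc|def" with "ab(c|d)ef". This sketch assumes std::regex from the C++ standard library as a stand-in for Boost.Regex; <I>matches_all_of</I> is a hypothetical helper.</P>

```cpp
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for boost::regex in this sketch.
bool matches_all_of(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<P>Here "abc|def" accepts "abc" or "def" but not "abdef"; grouping the alternative as "ab(c|d)ef" is needed to accept "abdef".</P>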
|
||||
<H3>Sets
|
||||
</H3>
|
||||
<P>A set is a set of characters that can match any single character that is a
|
||||
member of the set. Sets are delimited by "[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and equivalence
|
||||
classes. Set declarations that start with "^" contain the complement of the
|
||||
elements that follow.
|
||||
</P>
|
||||
<P>Examples:
|
||||
</P>
|
||||
<P>Character literals:
|
||||
</P>
|
||||
<P>"[abc]" will match either of "a", "b", or "c".
|
||||
</P>
|
||||
<P>"[^abc]" will match any character other than "a", "b", or "c".
|
||||
</P>
|
||||
<P>Character ranges:
|
||||
</P>
|
||||
<P>"[a-z]" will match any character in the range "a" to "z".
|
||||
</P>
|
||||
<P>"[^A-Z]" will match any character other than those in the range "A" to "Z".
|
||||
</P>
|
||||
<P>Note that character ranges are highly locale dependent if the flag
regex_constants::collate is set: they match any character that collates between
the endpoints of the range, so ranges will only behave according to ASCII rules
when the default "C" locale is in effect. For example if the library is
compiled with the Win32 localization model, then [a-z] will match the ASCII
characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
'z'. This locale specific behavior is disabled by default (in perl mode), in
which case ranges collate according to ASCII character code.
|
||||
</P>
|
||||
<P>Character classes are denoted using the syntax "[:classname:]" within a set
|
||||
declaration, for example "[[:space:]]" is the set of all whitespace characters.
|
||||
Character classes are only available if the flag regex_constants::char_classes
|
||||
is set. The available character classes are:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="50%">alnum</TD>
|
||||
<TD vAlign="top" width="50%">Any alpha numeric character.</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">alpha</TD>
|
||||
<TD vAlign="top" width="50%">Any alphabetical character a-z and A-Z. Other
|
||||
characters may also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">blank</TD>
|
||||
<TD vAlign="top" width="50%">Any blank character, either a space or a tab.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">cntrl</TD>
|
||||
<TD vAlign="top" width="50%">Any control character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">digit</TD>
|
||||
<TD vAlign="top" width="50%">Any digit 0-9.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">graph</TD>
|
||||
<TD vAlign="top" width="50%">Any graphical character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">lower</TD>
|
||||
<TD vAlign="top" width="50%">Any lower case character a-z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">print</TD>
|
||||
<TD vAlign="top" width="50%">Any printable character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">punct</TD>
|
||||
<TD vAlign="top" width="50%">Any punctuation character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">space</TD>
|
||||
<TD vAlign="top" width="50%">Any whitespace character.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">upper</TD>
|
||||
<TD vAlign="top" width="50%">Any upper case character A-Z. Other characters may
|
||||
also be included depending upon the locale.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">xdigit</TD>
|
||||
<TD vAlign="top" width="50%">Any hexadecimal digit character, 0-9, a-f and A-F.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">word</TD>
|
||||
<TD vAlign="top" width="50%">Any word character - all alphanumeric characters plus
|
||||
the underscore.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="50%">Unicode</TD>
|
||||
<TD vAlign="top" width="50%">Any character whose code is greater than 255, this
|
||||
applies to the wide character traits classes only.</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
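<P>A few of the character classes in the table can be exercised with the sketch below. It assumes std::regex from the C++ standard library, running in the default "C" locale, as a stand-in for Boost.Regex; only class names that both libraries are known to share are used, and <I>count_matches</I> is a hypothetical helper.</P>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of `pattern` in `text`.
// Assumption: std::regex stands in for boost::regex in this sketch.
std::size_t count_matches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}
```

<P>For example, "[[:digit:]]" is found three times in "a1b22c", and the negated set "[^[:alpha:]]" is found twice in "a1b2".</P>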
|
||||
<P>There are some shortcuts that can be used in place of the character classes,
|
||||
provided the flag regex_constants::escape_in_lists is set then you can use:
|
||||
</P>
|
||||
<P>\w in place of [:word:]
|
||||
</P>
|
||||
<P>\s in place of [:space:]
|
||||
</P>
|
||||
<P>\d in place of [:digit:]
|
||||
</P>
|
||||
<P>\l in place of [:lower:]
|
||||
</P>
|
||||
<P>\u in place of [:upper:]
|
||||
</P>
|
||||
<P>Collating elements take the general form [.tagname.] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
|
||||
equivalent to [,]. The library supports all the standard POSIX collating
|
||||
element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
|
||||
"nj", "dz", "lj", each in lower, upper and title case variations.
|
||||
Multi-character collating elements can result in the set matching more than one
|
||||
character, for example [[.ae.]] would match two characters, but note that
|
||||
[^[.ae.]] would only match one character.
|
||||
</P>
|
||||
<P>
|
||||
Equivalence classes take the general form [=tagname=] inside a set declaration,
where <I>tagname</I> is either a single character, or a name of a collating
element, and match any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
a set of characters that collate the same; a primary equivalence class is a set
of characters whose primary sort keys are all the same (for example strings are
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentuation, and
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>,
then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
equivalence classes are probably best considered broken under any operating
system other than Win32.
|
||||
</P>
|
||||
<P>To include a literal "-" in a set declaration, either make it the first character
after the opening "[" or "[^", make it the endpoint of a range, write it as a
collating element, or, if the flag regex_constants::escape_in_lists is set,
precede it with an escape character as in "[\-]". To include a literal "[" or
"]" or "^" in a set, make it the endpoint of a range, write it as a collating
element, or precede it with an escape character if the flag
regex_constants::escape_in_lists is set.
|
||||
</P>
|
||||
<H3>Line anchors
|
||||
</H3>
|
||||
<P>An anchor is something that matches the null string at the start or end of a
|
||||
line: "^" matches the null string at the start of a line, "$" matches the null
|
||||
string at the end of a line.
|
||||
</P>
|
||||
<H3>Back references
|
||||
</H3>
|
||||
<P>A back reference is a reference to a previous sub-expression that has already
been matched; the reference is to what the sub-expression matched, not to the
expression itself. A back reference consists of the escape character "\"
followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
to the second etc. For example the expression "(.*)\1" matches any string that
is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
reference to a sub-expression that did not participate in any match matches
the null string: NB this is different to some other regular expression
matchers. Back references are only available if the expression is compiled with
the flag regex_constants::bk_refs set.
|
||||
</P>
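<P>The "(.*)\1" example can be sketched as below, assuming std::regex from the C++ standard library with its default grammar as a stand-in for Boost.Regex. The helpers <I>is_doubled</I> and <I>repeated_half</I> are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex in this sketch.
// "\1" refers back to whatever the first sub-expression matched.
static const std::regex doubled_re("(.*)\\1");

// True if `text` consists of some string written twice in a row.
bool is_doubled(const std::string& text) {
    return std::regex_match(text, doubled_re);
}

// The half that is repeated, or an empty string if `text` is not doubled.
std::string repeated_half(const std::string& text) {
    std::smatch what;
    if (std::regex_match(text, what, doubled_re))
        return what[1].str();
    return std::string();
}
```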
|
||||
<H3>Characters by code
|
||||
</H3>
|
||||
<P>This is an extension to the algorithm that is not available in other libraries.
It consists of the escape character followed by the digit "0" followed by the
octal character code. For example "\023" represents the character whose octal
code is 23. Where ambiguity could occur use parentheses to break the expression
up: "\0103" represents the character whose code is 103, "(\010)3" represents the
character 10 followed by "3". To match characters by their hexadecimal code,
use \x followed by a string of hexadecimal digits, optionally enclosed inside
{}, for example \xf0 or \x{aff}; notice the latter example is a Unicode
character.</P>
|
||||
<H3>Word operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library.
|
||||
</P>
|
||||
<P>"\w" matches any single character that is a member of the "word" character
|
||||
class, this is identical to the expression "[[:word:]]".
|
||||
</P>
|
||||
<P>"\W" matches any single character that is not a member of the "word" character
|
||||
class, this is identical to the expression "[^[:word:]]".
|
||||
</P>
|
||||
<P>"\<" matches the null string at the start of a word.
|
||||
</P>
|
||||
<P>"\>" matches the null string at the end of a word.
|
||||
</P>
|
||||
<P>"\b" matches the null string at either the start or the end of a word.
|
||||
</P>
|
||||
<P>"\B" matches a null string within a word.
|
||||
</P>
|
||||
<P>The start of the sequence passed to the matching algorithms is considered to be
|
||||
a potential start of a word unless the flag match_not_bow is set. The end of
|
||||
the sequence passed to the matching algorithms is considered to be a potential
|
||||
end of a word unless the flag match_not_eow is set.
|
||||
</P>
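<P>The "\b" and "\B" operators and the effect of match_not_bow can be sketched as follows. The sketch assumes std::regex from the C++ standard library as a stand-in for Boost.Regex; the GNU-style "\<" and "\>" operators are left out because the stand-in is not assumed to accept them. The helpers <I>word_offset</I> and <I>starts_with_cat</I> are hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Offset of the first match of `pattern` in `text`, or -1 if there is none.
// Assumption: std::regex stands in for boost::regex in this sketch.
long word_offset(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch what;
    if (std::regex_search(text, what, re))
        return static_cast<long>(what.position(0));
    return -1;
}

// Searches for "cat" at a word start. When `allow_bow` is false the flag
// match_not_bow is passed, so the start of `text` does not count as a
// potential start of a word.
bool starts_with_cat(const std::string& text, bool allow_bow) {
    static const std::regex re("\\bcat");
    const std::regex_constants::match_flag_type flags =
        allow_bow ? std::regex_constants::match_default
                  : std::regex_constants::match_not_bow;
    return std::regex_search(text, re, flags);
}
```

<P>In "concat cat", the pattern "\bcat\b" finds only the free-standing word at offset 7, while "\Bcat" finds the "cat" inside "concat" at offset 3.</P>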
|
||||
<H3>Buffer operators
|
||||
</H3>
|
||||
<P>The following operators are provided for compatibility with the GNU regular
|
||||
expression library, and Perl regular expressions:
|
||||
</P>
|
||||
<P>"\`" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\A" matches the start of a buffer.
|
||||
</P>
|
||||
<P>"\'" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\z" matches the end of a buffer.
|
||||
</P>
|
||||
<P>"\Z" matches the end of a buffer, or possibly one or more new line characters
|
||||
followed by the end of the buffer.
|
||||
</P>
|
||||
<P>A buffer is considered to consist of the whole sequence passed to the matching
|
||||
algorithms, unless the flags match_not_bob or match_not_eob are set.
|
||||
</P>
|
||||
<H3>Escape operator
|
||||
</H3>
|
||||
<P>The escape character "\" has several meanings.
|
||||
</P>
|
||||
<P>Inside a set declaration the escape character is a normal character unless the
|
||||
flag regex_constants::escape_in_lists is set in which case whatever follows the
|
||||
escape is a literal character regardless of its normal meaning.
|
||||
</P>
|
||||
<P>The escape operator may introduce an operator, for example a back reference or
a word operator.
|
||||
</P>
|
||||
<P>The escape operator may make the following character normal, for example "\*"
|
||||
represents a literal "*" rather than the repeat operator.
|
||||
</P>
|
||||
<H4>Single character escape sequences
|
||||
</H4>
|
||||
<P>The following escape sequences are aliases for single characters:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="33%">Escape sequence
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Character code
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Meaning
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\a
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x07
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Bell character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\f
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0C
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Form feed.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\n
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0A
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Newline character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\r
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0D
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Carriage return.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\t
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x09
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Tab character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\v
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x0B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Vertical tab.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\e
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0x1B
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">ASCII Escape character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0dd
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An octal character code, where <I>dd</I> is one or
|
||||
more octal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\x{XX}
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">0xXX
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">A hexadecimal character code, where XX is one or more
|
||||
hexadecimal digits, optionally a Unicode character.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD> </TD>
|
||||
<TD vAlign="top" width="33%">\cZ
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">Z-@
|
||||
</TD>
|
||||
<TD vAlign="top" width="33%">An ASCII escape sequence control-Z, where Z is any
|
||||
ASCII character greater than or equal to the character code for '@'.
|
||||
</TD>
|
||||
<TD> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
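<P>A few rows of the table above can be checked with a short sketch. It assumes <CODE>std::regex</CODE> (C++11) in place of Boost.Regex, and therefore only exercises escapes that the standard ECMAScript grammar also defines; <CODE>\a</CODE>, <CODE>\e</CODE> and <CODE>\x{XX}</CODE> are deliberately left out because <CODE>std::regex</CODE> may not accept them. The helper name <CODE>escape_matches</CODE> is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the escape sequence `escape` (for example
// the C++ literal "\\t") matches exactly the one-character string `expected`.
bool escape_matches(const std::string& escape, char expected)
{
    const std::regex re(escape);
    const std::string subject(1, expected);  // one-character subject
    return std::regex_match(subject, re);
}
```

<P>For instance, <CODE>escape_matches("\\x41", 'A')</CODE> checks that the hexadecimal escape 0x41 denotes the letter "A".</P>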
|
||||
<H4>Miscellaneous escape sequences:
|
||||
</H4>
|
||||
<P>The following are provided mostly for Perl compatibility, but note that there
are some differences in the meanings of \l, \L, \u and \U:
|
||||
<BR>
|
||||
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table4" cellSpacing="0" cellPadding="6" width="100%" border="0">
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\w
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\W
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:word:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\s
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\S
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:space:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\d
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\D
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:digit:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\l
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\L
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:lower:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\u
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\U
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Equivalent to [^[:upper:]].
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\C
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Any single character, equivalent to '.'.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\X
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">Match any Unicode combining character sequence, for
example "a\x{0301}" (a letter a with an acute).
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\Q
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The begin quote operator, everything that follows is
|
||||
treated as a literal character until a \E end quote operator is found.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD width="5%"> </TD>
|
||||
<TD vAlign="top" width="45%">\E
|
||||
</TD>
|
||||
<TD vAlign="top" width="45%">The end quote operator, terminates a sequence begun
|
||||
with \Q.
|
||||
</TD>
|
||||
<TD width="5%"> </TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
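<P>The equivalences in the first rows of the table can be illustrated with a small sketch. It assumes <CODE>std::regex</CODE> (C++11) rather than Boost.Regex, so only <CODE>\d</CODE>, <CODE>\D</CODE>, <CODE>\s</CODE> and <CODE>\S</CODE> are shown; the remaining entries (<CODE>\l</CODE>, <CODE>\u</CODE>, <CODE>\C</CODE>, <CODE>\X</CODE>, <CODE>\Q</CODE>, <CODE>\E</CODE>) are specific to this library and are not exercised. The helper name <CODE>matches_all</CODE> is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool matches_all(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    return std::regex_match(text, re);
}

// Pairs that are expected to behave alike:
//   "\\d+"  and  "[[:digit:]]+"
//   "\\D+"  and  "[^[:digit:]]+"
//   "\\s+"  and  "[[:space:]]+"
//   "\\S+"  and  "[^[:space:]]+"
```

<P>Each shorthand and its bracketed form should give the same verdict for a given input, for example both accept "2003" and both reject "20a3" in the digit case.</P>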
|
||||
<H3>What gets matched?
|
||||
</H3>
|
||||
<P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
<P>
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
algorithms will match the first possible matching string. If more than one
string starting at a given location can match, then the longest possible
string is matched, unless the flag match_any is set, in which case the first
match encountered is returned. Use of the match_any option can reduce the time
taken to find the match, but it is only useful if the user is less concerned
about what matched; for example it would not be suitable for search and
replace operations. In cases where there are multiple possible matches, all
starting at the same location and all of the same length, the match
chosen is the one with the longest first sub-expression; if that is the same
for two or more matches, then the second sub-expression is examined, and so
on.
|
||||
</P><P>
|
||||
The following examples illustrate the main differences between Perl and
POSIX regular expression matching rules:
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
xaby</CODE>
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"ab"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"a"</CODE></P></TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*([[:alnum:]]+).*</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
" abc def xyz "</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*(a|xayy)</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
zzxayyzz</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"zzxayy"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
</P>
|
||||
<P>These differences between Perl matching rules, and POSIX matching rules, mean
|
||||
that these two regular expression syntaxes differ not only in the features
|
||||
offered, but also in the form that the state machine takes and/or the
|
||||
algorithms used to traverse the state machine.</p>
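<P>The first row of the table can be reproduced with a short sketch. It assumes <CODE>std::regex</CODE> (C++11) in place of Boost.Regex, with the <CODE>ECMAScript</CODE> grammar standing in for Perl-style depth first matching and the <CODE>extended</CODE> grammar for POSIX leftmost longest matching. Only the simplest row is shown, since standard library implementations may differ in how closely they follow the POSIX rule for more involved patterns. The helper name <CODE>first_match</CODE> is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// in `text` under the given grammar, or an empty string if nothing matches.
std::string first_match(const std::string& text,
                        const std::string& pattern,
                        std::regex::flag_type grammar)
{
    const std::regex re(pattern, grammar);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m.str();  // the whole match, i.e. $0
    return std::string();
}
```

<P>Searching "xaby" for <CODE>a|ab</CODE> is expected to report "a" under <CODE>std::regex::ECMAScript</CODE> (the first alternative that succeeds wins) and "ab" under <CODE>std::regex::extended</CODE> (the longest match at the leftmost position wins).</P>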
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
332
doc/syntax_option_type.html
Normal file
@ -0,0 +1,332 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: syntax_option_type</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">syntax_option_type</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>Type syntax_option_type is an implementation defined bitmask type that controls
how a regular expression string is to be interpreted. For convenience,
all the constants listed here are also duplicated within the scope
of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<PRE>namespace std{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type syntax_option_type;
|
||||
// these flags are standardized:
|
||||
static const syntax_option_type normal;
|
||||
static const syntax_option_type icase;
|
||||
static const syntax_option_type nosubs;
|
||||
static const syntax_option_type optimize;
|
||||
static const syntax_option_type collate;
|
||||
static const syntax_option_type ECMAScript = normal;
|
||||
static const syntax_option_type JavaScript = normal;
|
||||
static const syntax_option_type JScript = normal;
|
||||
static const syntax_option_type basic;
|
||||
static const syntax_option_type extended;
|
||||
static const syntax_option_type awk;
|
||||
static const syntax_option_type grep;
|
||||
static const syntax_option_type egrep;
|
||||
static const syntax_option_type sed = basic;
|
||||
static const syntax_option_type perl;
// these are boost.regex specific:
static const syntax_option_type escape_in_lists;
static const syntax_option_type char_classes;
static const syntax_option_type intervals;
static const syntax_option_type limited_ops;
static const syntax_option_type newline_alt;
static const syntax_option_type bk_plus_qm;
static const syntax_option_type bk_braces;
static const syntax_option_type bk_parens;
static const syntax_option_type bk_refs;
static const syntax_option_type bk_vbar;
static const syntax_option_type use_except;
static const syntax_option_type failbit;
static const syntax_option_type literal;
static const syntax_option_type nocollate;
static const syntax_option_type perlex;
static const syntax_option_type emacs;
|
||||
} // namespace regex_constants
|
||||
} // namespace std</PRE>
|
||||
<H3>Description</H3>
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below. A valid value of type <CODE>syntax_option_type</CODE> will always have
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
or perl</CODE> set.</P>
|
||||
<P>Note that for convenience all the constants listed here are duplicated within
|
||||
the scope of class template basic_regex, so you can use any of:</P>
|
||||
<PRE>boost::regex_constants::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::regex::constant_name</PRE>
|
||||
<P>or</P>
|
||||
<PRE>boost::wregex::constant_name</PRE>
|
||||
<P>in an interchangeable manner.</P>
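<P>As a rough illustration of this interchangeability, the sketch below compares the spellings of one flag. It is written against <CODE>std::regex</CODE> (C++11), which is an assumption: the standard library mirrors the layout described here, with <CODE>std::regex_constants</CODE>, <CODE>std::regex</CODE> and <CODE>std::wregex</CODE> playing the roles of the three <CODE>boost::</CODE> scopes above. The function name <CODE>spellings_agree</CODE> is hypothetical.</P>

```cpp
#include <regex>

// Hypothetical check: the nested constants name the same flag as the
// namespace-scope constant, and produce identically flagged expressions.
bool spellings_agree()
{
    const bool same_value =
        std::regex::icase == std::regex_constants::icase &&
        std::wregex::icase == std::regex_constants::icase;

    // Two expressions built with the two spellings report the same flags.
    const std::regex via_class("abc", std::regex::icase);
    const std::regex via_namespace("abc", std::regex_constants::icase);

    return same_value && via_class.flags() == via_namespace.flags();
}
```

<P>Which spelling to use is then purely a matter of style; the class-scoped form tends to be shorter when a <CODE>regex</CODE> typedef is already in scope.</P>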
|
||||
<P>
|
||||
<TABLE id="Table2" height="1274" cellSpacing="1" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>Element</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Effect if set</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>normal</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine uses its
|
||||
normal semantics: that is the same as that given in the ECMA-262, ECMAScript
|
||||
Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
|
||||
(FWD.1).</P>
|
||||
<P>boost.regex also recognizes most perl-compatible extensions in this mode.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>icase</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that matching of regular expressions against a character container
|
||||
sequence shall be performed without regard to case.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>nosubs</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that when a regular expression is matched against a character
|
||||
container sequence, then no sub-expression matches are to be stored in the
|
||||
supplied match_results structure.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>optimize</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the regular expression engine should pay more attention to the
|
||||
speed with which regular expressions are matched, and less to the speed with
|
||||
which regular expression objects are constructed. Otherwise it has no
|
||||
detectable effect on the program output. This currently has no effect for
|
||||
boost.regex.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>collate</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>ECMAScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JavaScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>JScript</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>basic</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
|
||||
Portable Operating System Interface (POSIX ), Base Definitions and Headers,
|
||||
Section 9, Regular Expressions (FWD.1).
|
||||
</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>extended</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX extended regular expressions in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and
|
||||
Headers, Section 9, Regular Expressions (FWD.1).</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>awk</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk
|
||||
(FWD.1).</P>
|
||||
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in
|
||||
character classes permitted.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>grep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
|
||||
Operating System Interface (POSIX ), Shells and Utilities, Section 4,
|
||||
Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX basic syntax, but with the newline character
|
||||
acting as an alternation character in addition to "|".</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>egrep</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>Specifies that the grammar recognized by the regular expression engine is the
|
||||
same as that used by POSIX utility grep when given the -E option in IEEE Std
|
||||
1003.1-2001, Portable Operating System Interface (POSIX ), Shells and
|
||||
Utilities, Section 4, Utilities, grep (FWD.1).</P>
|
||||
<P>That is to say, the same as POSIX extended syntax, but with the newline
|
||||
character acting as an alternation character in addition to "|".</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>sed</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as basic.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="316">
|
||||
<P>perl</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="50%">
|
||||
<P>The same as normal.</P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
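<P>A couple of the standardized flags from the table above can be sketched as follows. The example assumes <CODE>std::regex</CODE> (C++11) rather than Boost.Regex, and the helper name <CODE>whole_match_with</CODE> is hypothetical. It shows <CODE>icase</CODE> combined with a grammar flag, and the difference between the <CODE>basic</CODE> and <CODE>extended</CODE> grammars in their treatment of "+".</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`
// compiled with the given combination of syntax flags.
bool whole_match_with(const std::string& text,
                      const std::string& pattern,
                      std::regex::flag_type flags)
{
    const std::regex re(pattern, flags);
    return std::regex_match(text, re);
}

// Flags are combined with operator|, one grammar flag plus any modifiers:
//   std::regex::ECMAScript | std::regex::icase   case-insensitive matching
//   std::regex::extended                         "+" is the repeat operator
//   std::regex::basic                            "+" is an ordinary character
```

<P>Under these assumptions "aa" matches <CODE>a+</CODE> with <CODE>extended</CODE> but not with <CODE>basic</CODE>, where the same pattern matches only the literal text "a+".</P>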
|
||||
<P>The following constants are specific to this particular regular expression
|
||||
implementation and do not appear in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">
|
||||
regular expression standardization proposal</A>:</P>
|
||||
<P>
|
||||
<TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0">
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::escape_in_lists</TD>
|
||||
<TD vAlign="top" width="45%">Allows the use of the escape "\" character in sets of
|
||||
characters, for example [\]] represents the set of characters containing only
|
||||
"]". If this flag is not set then "\" is an ordinary character inside sets.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::char_classes</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, character classes [:classname:]
|
||||
are allowed inside character set declarations, for example "[[:word:]]"
|
||||
represents the set of all characters that belong to the character class "word".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::intervals</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, repetition intervals are
|
||||
allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
|
||||
a's.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::limited_ops</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set all of "+", "?" and "|" are
|
||||
ordinary characters in all situations.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::newline_alt</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set, then the newline character "\n"
|
||||
has the same effect as the alternation operator "|".</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_plus_qm</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\+" represents the one or
|
||||
more repetition operator and "\?" represents the zero or one repetition
|
||||
operator. When this bit is not set then "+" and "?" are used instead.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_braces</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\{" and "\}" are used for
|
||||
bounded repetitions and "{" and "}" are normal characters. This is the opposite
|
||||
of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_parens</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\(" and "\)" are used to
group sub-expressions and "(" and ")" are ordinary characters. This is the
opposite of default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_refs</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then back references are
|
||||
allowed.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::bk_vbar</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then "\|" represents the
|
||||
alternation operator and "|" is an ordinary character. This is the opposite of
|
||||
default behavior.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::use_except</TD>
|
||||
<TD vAlign="top" width="45%">When this bit is set then a <A href="#bad_expression">bad_expression</A>
|
||||
exception will be thrown on error. Use of this flag is deprecated -
|
||||
basic_regex will always throw on error.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::failbit</TD>
|
||||
<TD vAlign="top" width="45%">This bit is set on error. If regbase::use_except is
not set, then this bit should be checked to see if a regular expression is
valid before usage.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%">regbase::literal</TD>
|
||||
<TD vAlign="top" width="45%">All characters in the string are treated as literals;
there are no special characters or escape sequences.</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="45%" height="24">regbase::emacs</TD>
|
||||
<TD vAlign="top" width="45%" height="24">Provides compatibility with the emacs
editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></P>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
68
doc/thread_safety.html
Normal file
@ -0,0 +1,68 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Boost.Regex: Thread Safety</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<P>
|
||||
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
|
||||
<TR>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<TD width="353">
|
||||
<H1 align="center">Boost.Regex</H1>
|
||||
<H2 align="center">Thread Safety</H2>
|
||||
</TD>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<P>Class <A href="basic_regex.html">basic_regex</A>&lt;&gt; and its typedefs regex
|
||||
and wregex are thread safe, in that compiled regular expressions can safely be
|
||||
shared between threads. The matching algorithms <A href="regex_match.html">regex_match</A>,
|
||||
<A href="regex_search.html">regex_search</A>, <A href="regex_grep.html">regex_grep</A>,
|
||||
<A href="regex_format.html">regex_format</A> and <A href="regex_merge.html">regex_merge</A>
|
||||
are all re-entrant and thread safe. Class <A href="match_results.html">match_results</A>
is now thread safe, in that the results of a match can be safely copied from
one thread to another (for example one thread may find matches and push
match_results instances onto a queue, while another thread pops them off the
other end). Otherwise use a separate instance of <A href="match_results.html">match_results</A>
per thread.
|
||||
</P>
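<P>The sharing pattern described above might look like the following sketch: one compiled expression is shared read-only between threads, while each thread owns its match results. It is written against <CODE>std::regex</CODE> and <CODE>std::thread</CODE> (C++11), which is an assumption made to keep the example self-contained; the function name <CODE>count_matching_lines</CODE> and its parameters are hypothetical, and <CODE>thread_count</CODE> is assumed to be at least 1.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical example: counts how many of `lines` contain a match for
// `pattern`, splitting the work over `thread_count` threads.
std::size_t count_matching_lines(const std::vector<std::string>& lines,
                                 const std::string& pattern,
                                 unsigned thread_count)
{
    const std::regex shared(pattern);                  // compiled once, shared read-only
    std::vector<std::size_t> partial(thread_count, 0); // one counter slot per thread
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&, t] {
            std::smatch local;                         // per-thread match results
            // Thread t handles lines t, t + thread_count, t + 2 * thread_count, ...
            for (std::size_t i = t; i < lines.size(); i += thread_count) {
                if (std::regex_search(lines[i], local, shared))
                    ++partial[t];
            }
        });
    }
    for (std::thread& w : workers)
        w.join();

    std::size_t total = 0;
    for (std::size_t n : partial)
        total += n;
    return total;
}
```

<P>Each worker writes only to its own counter slot and its own match results, so the single compiled expression is the only object shared between threads, and it is never modified after construction.</P>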
|
||||
<P>The <A href="posix_api.html">POSIX API functions</A> are all re-entrant and
thread safe; regular expressions compiled with <I>regcomp</I> can also be
shared between threads.
|
||||
</P>
|
||||
<P>The class <A href="regex.html">RegEx</A> is only thread safe if each thread
|
||||
gets its own RegEx instance (apartment threading) - this is a consequence of
|
||||
RegEx handling both compiling and matching regular expressions.
|
||||
</P>
|
||||
<P>Finally, note that changing the global locale invalidates all compiled regular
expressions; therefore calling <I>set_locale</I> from one thread while another
uses regular expressions <I>will</I> produce unpredictable results.
|
||||
</P>
|
||||
<P>
|
||||
There is also a requirement that there is only one thread executing prior to
|
||||
the start of main().</P>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
17 May 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" -->
|
||||
</p>
|
||||
<P><I>&copy; Copyright <a href="mailto:jm@regex.fsnet.co.uk">John Maddock</a> 1998-<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></I></P>
|
||||
<P align="left"><I>Permission to use, copy, modify, distribute and sell this software
|
||||
and its documentation for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and that both that
|
||||
copyright notice and this permission notice appear in supporting documentation.
|
||||
Dr John Maddock makes no representations about the suitability of this software
|
||||
for any purpose. It is provided "as is" without express or implied warranty.</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
BIN
doc/uarrow.gif
Normal file
Binary file not shown.
After Width: | Height: | Size: 1.6 KiB |
705
doc/vc71-performance.html
Normal file
@ -0,0 +1,705 @@
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>Regular Expression Performance Comparison (Visual Studio.NET 2003)</title>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
|
||||
<META content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot" name="Template">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
</head>
|
||||
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
|
||||
<h2>Regular Expression Performance Comparison</h2>
|
||||
<p>The following tables provide comparisons between the following regular
|
||||
expression libraries:</p>
|
||||
<p><a href="http://research.microsoft.com/projects/greta"> GRETA</a>.</p>
|
||||
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
|
||||
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
|
||||
- this is provided for comparison as a typical non-backtracking implementation.</p>
|
||||
<p>Philip Hazel's <a href="http://www.pcre.org">PCRE</a> library.</p>
|
||||
<h3>Details</h3>
|
||||
<p>Machine: Intel Pentium 4 2.8GHz PC.</p>
|
||||
<p>Compiler: Microsoft Visual C++ version 7.1.</p>
|
||||
<p>C++ Standard Library: Dinkumware standard library version 313.</p>
|
||||
<p>OS: Win32.</p>
|
||||
<p>Boost version: 1.31.0.</p>
|
||||
<p>PCRE version: 3.9.</p>
|
||||
<p>As ever, care should be taken in interpreting the results. Only sensible regular
expressions (rather than pathological cases) are given; most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms. In each case the first figure given is the relative
time taken (so a value of 1.0 is as good as it gets), while the second figure
is the actual time taken.</p>
|
||||
<h3>Averages</h3>
|
||||
<p>The following are the average relative scores for all the tests: the perfect
regular expression library would score 1; in practice anything less than 2
is pretty good.</p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>6.90669</td>
|
||||
<td>23.751</td>
|
||||
<td>1.62553</td>
|
||||
<td>1.38213</td>
|
||||
<td>110.973</td>
|
||||
<td>1.69371</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 1: Long Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within a long English language text was measured
|
||||
(<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a>
|
||||
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19MB). </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Twain</code></td>
|
||||
<td>19.7<br>
|
||||
(0.541s)</td>
|
||||
<td>85.5<br>
|
||||
(2.35s)</td>
|
||||
<td>3.09<br>
|
||||
(0.0851s)</td>
|
||||
<td>3.09<br>
|
||||
(0.0851s)</td>
|
||||
<td>131<br>
|
||||
(3.6s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.0275s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Huck[[:alpha:]]+</code></td>
|
||||
<td>11<br>
|
||||
(0.55s)</td>
|
||||
<td>93.4<br>
|
||||
(4.68s)</td>
|
||||
<td>3.4<br>
|
||||
(0.17s)</td>
|
||||
<td>3.35<br>
|
||||
(0.168s)</td>
|
||||
<td>124<br>
|
||||
(6.19s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.0501s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>[[:alpha:]]+ing</code></td>
|
||||
<td>11.3<br>
|
||||
(6.82s)</td>
|
||||
<td>21.3<br>
|
||||
(12.8s)</td>
|
||||
<td>1.83<br>
|
||||
(1.1s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.601s)</font></td>
|
||||
<td>6.47<br>
|
||||
(3.89s)</td>
|
||||
<td>4.75<br>
|
||||
(2.85s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[^ ]*?Twain</code></td>
|
||||
<td>5.75<br>
|
||||
(1.15s)</td>
|
||||
<td>17.1<br>
|
||||
(3.43s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.2s)</font></td>
|
||||
<td>1.3<br>
|
||||
(0.26s)</td>
|
||||
<td>NA</td>
|
||||
<td>3.8<br>
|
||||
(0.761s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
|
||||
<td>28.5<br>
|
||||
(3.1s)</td>
|
||||
<td>77.2<br>
|
||||
(8.4s)</td>
|
||||
<td>2.3<br>
|
||||
(0.251s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.109s)</font></td>
|
||||
<td>191<br>
|
||||
(20.8s)</td>
|
||||
<td>1.77<br>
|
||||
(0.193s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
|
||||
<td>16.2<br>
|
||||
(4.14s)</td>
|
||||
<td>49<br>
|
||||
(12.5s)</td>
|
||||
<td>1.65<br>
|
||||
(0.42s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.255s)</font></td>
|
||||
<td>NA</td>
|
||||
<td>2.43<br>
|
||||
(0.62s)</td>
|
||||
</tr>
|
||||
</table>
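<p>A measurement of the kind described above (the time taken to find all occurrences of an expression in a text) might look like the sketch below. It is not the harness behind these figures: <code>std::regex</code> stands in for the libraries under test, and the names <code>search_result</code> and <code>time_find_all</code> are hypothetical. A real benchmark would also repeat the scan many times and average the result.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

struct search_result
{
   std::size_t matches;   // number of non-overlapping matches found
   double seconds;        // wall-clock time for one complete scan
};

// Scan the whole text once, counting every match, and time the scan.
search_result time_find_all(const std::string& text, const std::regex& e)
{
   const auto start = std::chrono::steady_clock::now();
   const std::size_t n = static_cast<std::size_t>(std::distance(
      std::sregex_iterator(text.begin(), text.end(), e),
      std::sregex_iterator()));
   const auto stop = std::chrono::steady_clock::now();
   assert(stop >= start);   // steady_clock never runs backwards
   return { n, std::chrono::duration<double>(stop - start).count() };
}
```

<p>Dividing each library's <code>seconds</code> by the smallest value in the row then gives the relative figure shown in the tables.</p>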
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 2: Medium Sized Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within a medium sized English language text was
|
||||
measured (the first 50K from mtent12.txt). </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Twain</code></td>
|
||||
<td>9.49<br>
|
||||
(0.00274s)</td>
|
||||
<td>40.7<br>
|
||||
(0.0117s)</td>
|
||||
<td>1.54<br>
|
||||
(0.000445s)</td>
|
||||
<td>1.56<br>
|
||||
(0.00045s)</td>
|
||||
<td>13.5<br>
|
||||
(0.00391s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000289s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Huck[[:alpha:]]+</code></td>
|
||||
<td>14.3<br>
|
||||
(0.0027s)</td>
|
||||
<td>62.3<br>
|
||||
(0.0117s)</td>
|
||||
<td>2.26<br>
|
||||
(0.000425s)</td>
|
||||
<td>2.29<br>
|
||||
(0.000431s)</td>
|
||||
<td>1.27<br>
|
||||
(0.000239s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000188s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>[[:alpha:]]+ing</code></td>
|
||||
<td>7.34<br>
|
||||
(0.0178s)</td>
|
||||
<td>13.7<br>
|
||||
(0.0331s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.00243s)</font></td>
|
||||
<td><font color="#008000">1.02<br>
|
||||
(0.00246s)</font></td>
|
||||
<td>7.36<br>
|
||||
(0.0178s)</td>
|
||||
<td>5.87<br>
|
||||
(0.0142s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[^ ]*?Twain</code></td>
|
||||
<td>8.34<br>
|
||||
(0.00579s)</td>
|
||||
<td>24.8<br>
|
||||
(0.0172s)</td>
|
||||
<td>1.52<br>
|
||||
(0.00105s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000694s)</font></td>
|
||||
<td>NA</td>
|
||||
<td>2.81<br>
|
||||
(0.00195s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>Tom|Sawyer|Huckleberry|Finn</code></td>
|
||||
<td>12.9<br>
|
||||
(0.00781s)</td>
|
||||
<td>35.1<br>
|
||||
(0.0213s)</td>
|
||||
<td>1.67<br>
|
||||
(0.00102s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000606s)</font></td>
|
||||
<td>81.5<br>
|
||||
(0.0494s)</td>
|
||||
<td>1.94<br>
|
||||
(0.00117s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)</code></td>
|
||||
<td>15.6<br>
|
||||
(0.0106s)</td>
|
||||
<td>46.6<br>
|
||||
(0.0319s)</td>
|
||||
<td>2.72<br>
|
||||
(0.00186s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000684s)</font></td>
|
||||
<td>311<br>
|
||||
(0.213s)</td>
|
||||
<td>1.72<br>
|
||||
(0.00117s)</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 3: C++ Code Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within the C++ source file <a href="../../../boost/crc.hpp">
|
||||
boost/crc.hpp</a> was measured. </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([
|
||||
]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{)</code></td>
|
||||
<td>8.88<br>
|
||||
(0.000792s)</td>
|
||||
<td>46.4<br>
|
||||
(0.00414s)</td>
|
||||
<td>1.19<br>
|
||||
(0.000106s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(8.92e-005s)</font></td>
|
||||
<td>688<br>
|
||||
(0.0614s)</td>
|
||||
<td>3.23<br>
|
||||
(0.000288s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>(^[
|
||||
]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\></code></td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.00571s)</font></td>
|
||||
<td>5.31<br>
|
||||
(0.0303s)</td>
|
||||
<td>2.47<br>
|
||||
(0.0141s)</td>
|
||||
<td>1.92<br>
|
||||
(0.011s)</td>
|
||||
<td>NA</td>
|
||||
<td>3.29<br>
|
||||
(0.0188s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>)</code></td>
|
||||
<td>5.78<br>
|
||||
(0.00172s)</td>
|
||||
<td>26.3<br>
|
||||
(0.00783s)</td>
|
||||
<td>1.12<br>
|
||||
(0.000333s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000298s)</font></td>
|
||||
<td>128<br>
|
||||
(0.0382s)</td>
|
||||
<td>1.74<br>
|
||||
(0.000518s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[ ]*#[ ]*include[ ]+("boost/[^"]+"|&lt;boost/[^&gt;]+&gt;)</code></td>
|
||||
<td>10.2<br>
|
||||
(0.00305s)</td>
|
||||
<td>28.4<br>
|
||||
(0.00845s)</td>
|
||||
<td>1.12<br>
|
||||
(0.000333s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000298s)</font></td>
|
||||
<td>155<br>
|
||||
(0.0463s)</td>
|
||||
<td>1.74<br>
|
||||
(0.000519s)</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<h3>Comparison 4: HTML Document Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within the html file <a href="../../libraries.htm">libs/libraries.htm</a>
|
||||
was measured. </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>beman|john|dave</code></td>
|
||||
<td>11<br>
|
||||
(0.00297s)</td>
|
||||
<td>34.3<br>
|
||||
(0.00922s)</td>
|
||||
<td>1.78<br>
|
||||
(0.000479s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000269s)</font></td>
|
||||
<td>55.2<br>
|
||||
(0.0149s)</td>
|
||||
<td>1.85<br>
|
||||
(0.000499s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>&lt;p&gt;.*?&lt;/p&gt;</code></td>
|
||||
<td>5.38<br>
|
||||
(0.00145s)</td>
|
||||
<td>21.8<br>
|
||||
(0.00587s)</td>
|
||||
<td><font color="#008000">1.02<br>
|
||||
(0.000274s)</font></td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000269s)</font></td>
|
||||
<td>NA</td>
|
||||
<td><font color="#008000">1.05<br>
|
||||
(0.000283s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> &lt;a[^&gt;]+href=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;</code></td>
|
||||
<td>4.51<br>
|
||||
(0.00207s)</td>
|
||||
<td>12.6<br>
|
||||
(0.00579s)</td>
|
||||
<td>1.34<br>
|
||||
(0.000616s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000459s)</font></td>
|
||||
<td>343<br>
|
||||
(0.158s)</td>
|
||||
<td><font color="#008000">1.09<br>
|
||||
(0.000499s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> &lt;h[12345678][^&gt;]*&gt;.*?&lt;/h[12345678]&gt;</code></td>
|
||||
<td>7.39<br>
|
||||
(0.00143s)</td>
|
||||
<td>29.6<br>
|
||||
(0.00571s)</td>
|
||||
<td>1.87<br>
|
||||
(0.000362s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000193s)</font></td>
|
||||
<td>NA</td>
|
||||
<td>1.27<br>
|
||||
(0.000245s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> &lt;img[^&gt;]+src=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;</code></td>
|
||||
<td>6.73<br>
|
||||
(0.00145s)</td>
|
||||
<td>27.3<br>
|
||||
(0.00587s)</td>
|
||||
<td>1.2<br>
|
||||
(0.000259s)</td>
|
||||
<td>1.32<br>
|
||||
(0.000283s)</td>
|
||||
<td>148<br>
|
||||
(0.0319s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.000215s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> &lt;font[^&gt;]+face=("[^"]*"|[^[:space:]]+)[^&gt;]*&gt;.*?&lt;/font&gt;</code></td>
|
||||
<td>6.93<br>
|
||||
(0.00153s)</td>
|
||||
<td>27<br>
|
||||
(0.00595s)</td>
|
||||
<td>1.22<br>
|
||||
(0.000269s)</td>
|
||||
<td>1.31<br>
|
||||
(0.000289s)</td>
|
||||
<td>NA</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(0.00022s)</font></td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<h3>Comparison 5: Simple Matches</h3>
|
||||
<p>For each of the following regular expressions the time taken to match against
|
||||
the text indicated was measured. </p>
|
||||
<table border="1" cellspacing="1">
|
||||
<tr>
|
||||
<td><strong>Expression</strong></td>
|
||||
<td><strong>Text</strong></td>
|
||||
<td><strong>GRETA</strong></td>
|
||||
<td><strong>GRETA<br>
|
||||
(non-recursive mode)</strong></td>
|
||||
<td><strong>Boost</strong></td>
|
||||
<td><strong>Boost + C++ locale</strong></td>
|
||||
<td><strong>POSIX</strong></td>
|
||||
<td><strong>PCRE</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>abc</code></td>
|
||||
<td>abc</td>
|
||||
<td>1.31<br>
|
||||
(2.2e-007s)</td>
|
||||
<td>1.94<br>
|
||||
(3.25e-007s)</td>
|
||||
<td>1.26<br>
|
||||
(2.1e-007s)</td>
|
||||
<td>1.24<br>
|
||||
(2.08e-007s)</td>
|
||||
<td>3.03<br>
|
||||
(5.06e-007s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.67e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^([0-9]+)(\-| |$)(.*)$</code></td>
|
||||
<td>100- this is a line of ftp response which contains a message string</td>
|
||||
<td>1.52<br>
|
||||
(6.88e-007s)</td>
|
||||
<td>2.28<br>
|
||||
(1.03e-006s)</td>
|
||||
<td>1.5<br>
|
||||
(6.78e-007s)</td>
|
||||
<td>1.5<br>
|
||||
(6.78e-007s)</td>
|
||||
<td>329<br>
|
||||
(0.000149s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(4.53e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}</code></td>
|
||||
<td>1234-5678-1234-456</td>
|
||||
<td>2.04<br>
|
||||
(1.03e-006s)</td>
|
||||
<td>2.83<br>
|
||||
(1.43e-006s)</td>
|
||||
<td>2.12<br>
|
||||
(1.07e-006s)</td>
|
||||
<td>2.04<br>
|
||||
(1.03e-006s)</td>
|
||||
<td>30.8<br>
|
||||
(1.56e-005s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(5.05e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
|
||||
<td>john_maddock@compuserve.com</td>
|
||||
<td>1.48<br>
|
||||
(1.78e-006s)</td>
|
||||
<td>2.1<br>
|
||||
(2.52e-006s)</td>
|
||||
<td>1.35<br>
|
||||
(1.62e-006s)</td>
|
||||
<td>1.32<br>
|
||||
(1.59e-006s)</td>
|
||||
<td>165<br>
|
||||
(0.000198s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.2e-006s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
|
||||
<td>foo12@foo.edu</td>
|
||||
<td>1.28<br>
|
||||
(1.41e-006s)</td>
|
||||
<td>1.9<br>
|
||||
(2.1e-006s)</td>
|
||||
<td>1.42<br>
|
||||
(1.57e-006s)</td>
|
||||
<td>1.38<br>
|
||||
(1.53e-006s)</td>
|
||||
<td>107<br>
|
||||
(0.000119s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.11e-006s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$</code></td>
|
||||
<td>bob.smith@foo.tv</td>
|
||||
<td>1.29<br>
|
||||
(1.43e-006s)</td>
|
||||
<td>1.9<br>
|
||||
(2.1e-006s)</td>
|
||||
<td>1.42<br>
|
||||
(1.57e-006s)</td>
|
||||
<td>1.38<br>
|
||||
(1.53e-006s)</td>
|
||||
<td>119<br>
|
||||
(0.000132s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(1.11e-006s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
|
||||
<td>EH10 2QQ</td>
|
||||
<td>1.26<br>
|
||||
(4.63e-007s)</td>
|
||||
<td>1.77<br>
|
||||
(6.49e-007s)</td>
|
||||
<td>1.3<br>
|
||||
(4.77e-007s)</td>
|
||||
<td>1.2<br>
|
||||
(4.4e-007s)</td>
|
||||
<td>9.15<br>
|
||||
(3.36e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.68e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
|
||||
<td>G1 1AA</td>
|
||||
<td><font color="#008000">1.06<br>
|
||||
(4.73e-007s)</font></td>
|
||||
<td>1.59<br>
|
||||
(7.07e-007s)</td>
|
||||
<td><font color="#008000">1.05<br>
|
||||
(4.68e-007s)</font></td>
|
||||
<td><font color="#008000">1<br>
|
||||
(4.44e-007s)</font></td>
|
||||
<td>12.9<br>
|
||||
(5.73e-006s)</td>
|
||||
<td>1.63<br>
|
||||
(7.26e-007s)</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$</code></td>
|
||||
<td>SW1 1ZZ</td>
|
||||
<td>1.26<br>
|
||||
(9.17e-007s)</td>
|
||||
<td>1.84<br>
|
||||
(1.34e-006s)</td>
|
||||
<td>1.28<br>
|
||||
(9.26e-007s)</td>
|
||||
<td>1.21<br>
|
||||
(8.78e-007s)</td>
|
||||
<td>8.42<br>
|
||||
(6.11e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(7.26e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
|
||||
<td>4/1/2001</td>
|
||||
<td>1.57<br>
|
||||
(9.73e-007s)</td>
|
||||
<td>2.28<br>
|
||||
(1.41e-006s)</td>
|
||||
<td>1.25<br>
|
||||
(7.73e-007s)</td>
|
||||
<td>1.26<br>
|
||||
(7.83e-007s)</td>
|
||||
<td>11.2<br>
|
||||
(6.95e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(6.21e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code> ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$</code></td>
|
||||
<td>12/12/2001</td>
|
||||
<td>1.52<br>
|
||||
(9.56e-007s)</td>
|
||||
<td>2.06<br>
|
||||
(1.3e-006s)</td>
|
||||
<td>1.29<br>
|
||||
(8.12e-007s)</td>
|
||||
<td>1.24<br>
|
||||
(7.83e-007s)</td>
|
||||
<td>12.4<br>
|
||||
(7.8e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(6.3e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
|
||||
<td>123</td>
|
||||
<td>2.11<br>
|
||||
(7.35e-007s)</td>
|
||||
<td>3.18<br>
|
||||
(1.11e-006s)</td>
|
||||
<td>2.5<br>
|
||||
(8.7e-007s)</td>
|
||||
<td>2.44<br>
|
||||
(8.5e-007s)</td>
|
||||
<td>5.26<br>
|
||||
(1.83e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.49e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
|
||||
<td>+3.14159</td>
|
||||
<td>1.31<br>
|
||||
(4.96e-007s)</td>
|
||||
<td>1.92<br>
|
||||
(7.26e-007s)</td>
|
||||
<td>1.26<br>
|
||||
(4.77e-007s)</td>
|
||||
<td>1.2<br>
|
||||
(4.53e-007s)</td>
|
||||
<td>9.71<br>
|
||||
(3.66e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.77e-007s)</font></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>^[-+]?[[:digit:]]*\.?[[:digit:]]*$</code></td>
|
||||
<td>-3.14159</td>
|
||||
<td>1.32<br>
|
||||
(4.97e-007s)</td>
|
||||
<td>1.92<br>
|
||||
(7.26e-007s)</td>
|
||||
<td>1.24<br>
|
||||
(4.67e-007s)</td>
|
||||
<td>1.2<br>
|
||||
(4.53e-007s)</td>
|
||||
<td>9.7<br>
|
||||
(3.66e-006s)</td>
|
||||
<td><font color="#008000">1<br>
|
||||
(3.78e-007s)</font></td>
|
||||
</tr>
|
||||
</table>
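<p>Unlike the searches in the earlier comparisons, the simple-match tests above check whether an entire text matches an expression. A minimal sketch of such a check is given below; it uses <code>std::regex</code> as a stand-in for the libraries measured here, and the helper name <code>matches_whole</code> is hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true only if the whole of `text` matches `expression`;
// a match of just part of the text does not count.
bool matches_whole(const std::string& text, const std::string& expression)
{
   assert(!expression.empty());   // an empty pattern is not a useful test
   const std::regex e(expression);
   return std::regex_match(text, e);
}
```

<p>With the postcode expression from the table, <code>matches_whole("EH10 2QQ", ...)</code> and <code>matches_whole("G1 1AA", ...)</code> are both true, while a text with trailing characters after the postcode is rejected. A timing run would construct the expression once, outside the measured loop.</p>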
|
||||
<br>
|
||||
<br>
|
||||
<hr>
|
||||
<p>Copyright John Maddock April 2003, all rights reserved.</p>
|
||||
</body>
|
||||
</html>
|
115
example/snippets/regex_iterator_example.cpp
Normal file
@ -0,0 +1,115 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2003
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
/*
|
||||
* LOCATION: see http://www.boost.org for most recent version.
|
||||
* FILE regex_iterator_example.cpp
|
||||
* VERSION see <boost/version.hpp>
|
||||
* DESCRIPTION: regex_iterator example 2: searches a cpp file for class definitions,
|
||||
* using global data.
|
||||
*/
|
||||
|
||||
#include <algorithm>
#include <string>
|
||||
#include <map>
|
||||
#include <fstream>
|
||||
#include <iostream>
|
||||
#include <boost/regex.hpp>
|
||||
|
||||
using namespace std;
|
||||
|
||||
// purpose:
|
||||
// takes the contents of a file in the form of a string
|
||||
// and searches for all the C++ class definitions, storing
|
||||
// their locations in a map from class name to character offset
|
||||
|
||||
typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;
|
||||
|
||||
const char* re =
|
||||
// possibly leading whitespace:
|
||||
"^[[:space:]]*"
|
||||
// possible template declaration:
|
||||
"(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
|
||||
// class or struct:
|
||||
"(class|struct)[[:space:]]*"
|
||||
// leading declspec macros etc:
|
||||
"("
|
||||
"\\<\\w+\\>"
|
||||
"("
|
||||
"[[:blank:]]*\\([^)]*\\)"
|
||||
")?"
|
||||
"[[:space:]]*"
|
||||
")*"
|
||||
// the class name
|
||||
"(\\<\\w*\\>)[[:space:]]*"
|
||||
// template specialisation parameters
|
||||
"(<[^;:{]+>)?[[:space:]]*"
|
||||
// terminate in { or :
|
||||
"(\\{|:[^;\\{()]*\\{)";
|
||||
|
||||
|
||||
boost::regex expression(re);
|
||||
map_type class_index;
|
||||
|
||||
bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
|
||||
{
|
||||
// what[0] contains the whole match
|
||||
// what[5] contains the class name.
|
||||
// what[6] contains the template specialisation if any.
|
||||
// add class name and position to map:
|
||||
class_index[what[5].str() + what[6].str()] = what.position(5);
|
||||
return true;
|
||||
}
|
||||
|
||||
void load_file(std::string& s, std::istream& is)
|
||||
{
|
||||
s.erase();
|
||||
s.reserve(is.rdbuf()->in_avail());
|
||||
char c;
|
||||
while(is.get(c))
|
||||
{
|
||||
if(s.capacity() == s.size())
|
||||
s.reserve(s.capacity() * 3);
|
||||
s.append(1, c);
|
||||
}
|
||||
}
|
||||
|
||||
int main(int argc, const char** argv)
|
||||
{
|
||||
std::string text;
|
||||
for(int i = 1; i < argc; ++i)
|
||||
{
|
||||
cout << "Processing file " << argv[i] << endl;
|
||||
std::ifstream fs(argv[i]);
|
||||
load_file(text, fs);
|
||||
// construct our iterators:
|
||||
boost::regex_iterator<std::string::const_iterator> m1(text.begin(), text.end(), expression);
|
||||
boost::regex_iterator<std::string::const_iterator> m2;
|
||||
std::for_each(m1, m2, &regex_callback);
|
||||
// copy results:
|
||||
cout << class_index.size() << " matches found" << endl;
|
||||
map_type::iterator c, d;
|
||||
c = class_index.begin();
|
||||
d = class_index.end();
|
||||
while(c != d)
|
||||
{
|
||||
cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
|
||||
++c;
|
||||
}
|
||||
class_index.erase(class_index.begin(), class_index.end());
|
||||
}
|
||||
return 0;
|
||||
}
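The idea of the example above, walking over every match with a regex iterator and recording a capture group together with its position, can be tried without Boost using the sketch below. It is an illustration under stated assumptions, not a port: `std::regex` replaces `boost::regex`, the expression is deliberately simplified (no template prefix, no declspec macros, no specialisations, and `\b` instead of `\<` and `\>`), and the names `class_map` and `index_classes` are hypothetical.

```cpp
#include <cassert>   // for the checks mentioned in the note below
#include <cstddef>
#include <map>
#include <regex>
#include <string>

typedef std::map<std::string, std::ptrdiff_t> class_map;

// Map each class or struct name that is followed by a definition
// (an opening brace, optionally after a base-class list) to the
// character offset of the name within `text`.
class_map index_classes(const std::string& text)
{
   static const std::regex e(
      "\\b(?:class|struct)\\s+(\\w+)\\s*(?:\\{|:[^;{()]*\\{)");
   class_map result;
   for (std::sregex_iterator i(text.begin(), text.end(), e), end; i != end; ++i)
      result[(*i)[1].str()] = i->position(1);   // group 1 is the name
   return result;
}
```

Forward declarations such as `class C;` are skipped because the expression requires an opening brace; for instance `assert(index_classes("class C;").empty());` holds.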
|
||||
|
||||
|
138
example/snippets/regex_replace_example.cpp
Normal file
@ -0,0 +1,138 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 1998-2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
/*
|
||||
* LOCATION: see http://www.boost.org for most recent version.
|
||||
* FILE regex_replace_example.cpp
|
||||
* VERSION see <boost/version.hpp>
|
||||
* DESCRIPTION: regex_replace example:
|
||||
* converts a C++ file to syntax highlighted HTML.
|
||||
*/
|
||||
|
||||
#include <iostream>
|
||||
#include <fstream>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
#include <iterator>
|
||||
#include <boost/regex.hpp>
|
||||
|
||||
// purpose:
|
||||
// takes the contents of a file and transforms them into
// syntax highlighted code in html format
|
||||
|
||||
boost::regex e1, e2;
|
||||
extern const char* expression_text;
|
||||
extern const char* format_string;
|
||||
extern const char* pre_expression;
|
||||
extern const char* pre_format;
|
||||
extern const char* header_text;
|
||||
extern const char* footer_text;
|
||||
|
||||
void load_file(std::string& s, std::istream& is)
|
||||
{
|
||||
s.erase();
|
||||
s.reserve(is.rdbuf()->in_avail());
|
||||
char c;
|
||||
while(is.get(c))
|
||||
{
|
||||
if(s.capacity() == s.size())
|
||||
s.reserve(s.capacity() * 3);
|
||||
s.append(1, c);
|
||||
}
|
||||
}
|
||||
|
||||
int main(int argc, const char** argv)
|
||||
{
|
||||
try{
|
||||
e1.assign(expression_text);
|
||||
e2.assign(pre_expression);
|
||||
for(int i = 1; i < argc; ++i)
|
||||
{
|
||||
std::cout << "Processing file " << argv[i] << std::endl;
|
||||
std::ifstream fs(argv[i]);
|
||||
std::string in;
|
||||
load_file(in, fs);
|
||||
std::string out_name = std::string(argv[i]) + std::string(".htm");
|
||||
std::ofstream os(out_name.c_str());
|
||||
os << header_text;
|
||||
// strip '<' and '>' first by outputting to a
|
||||
// temporary string stream
|
||||
std::ostringstream t(std::ios::out | std::ios::binary);
|
||||
std::ostream_iterator<char> oi(t);
|
||||
boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format, boost::match_default | boost::format_all);
|
||||
// then output to final output stream
|
||||
// adding syntax highlighting:
|
||||
std::string s(t.str());
|
||||
std::ostream_iterator<char> out(os);
|
||||
boost::regex_replace(out, s.begin(), s.end(), e1, format_string, boost::match_default | boost::format_all);
|
||||
os << footer_text;
|
||||
}
|
||||
}
|
||||
catch(...)
|
||||
{ return -1; }
|
||||
return 0;
|
||||
}
|
||||
|
||||
extern const char* pre_expression = "(<)|(>)|\\r";
|
||||
extern const char* pre_format = "(?1&lt;)(?2&gt;)";
|
||||
|
||||
|
||||
const char* expression_text = // preprocessor directives: index 1
|
||||
"(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
|
||||
// comment: index 2
|
||||
"(//[^\\n]*|/\\*.*?\\*/)|"
|
||||
// literals: index 3
|
||||
"\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
|
||||
// string literals: index 4
|
||||
"('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
|
||||
// keywords: index 5
|
||||
"\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
|
||||
"|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
|
||||
"|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
|
||||
"|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
|
||||
"|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
|
||||
"|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
|
||||
"|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
|
||||
"|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
|
||||
"|using|virtual|void|volatile|wchar_t|while)\\>"
|
||||
;
|
||||
|
||||
const char* format_string = "(?1<font color=\"#008040\">$&</font>)"
|
||||
"(?2<I><font color=\"#000080\">$&</font></I>)"
|
||||
"(?3<font color=\"#0000A0\">$&</font>)"
|
||||
"(?4<font color=\"#0000FF\">$&</font>)"
|
||||
"(?5<B>$&</B>)";
|
||||
|
||||
const char* header_text = "<HTML>\n<HEAD>\n"
"<TITLE>Auto-generated html formatted source</TITLE>\n"
|
||||
"<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
|
||||
"</HEAD>\n"
|
||||
"<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
|
||||
"<P> </P>\n<PRE>";
|
||||
|
||||
const char* footer_text = "</PRE>\n</BODY>\n</HTML>\n";
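The two-pass structure of the example above (first make the text safe for HTML, then wrap recognised tokens in markup) can be sketched with the standard library alone. This is a swapped-in technique, not the example's own: `std::regex_replace` has no equivalent of the conditional `(?N...)` format syntax enabled by `boost::format_all`, so the sketch uses one plain replacement per character. The function name `highlight`, the tiny keyword list and the extra escaping of `&` are assumptions made for illustration.

```cpp
#include <cassert>   // for the checks mentioned in the note below
#include <regex>
#include <string>

// Escape HTML-special characters, then embolden a few keywords.
std::string highlight(const std::string& source)
{
   // Pass 1: '&' must be handled first so that the entities produced
   // for '<' and '>' are not escaped a second time.
   std::string s = std::regex_replace(source, std::regex("&"), "&amp;");
   s = std::regex_replace(s, std::regex("<"), "&lt;");
   s = std::regex_replace(s, std::regex(">"), "&gt;");
   // Pass 2: "$&" in the format string stands for the whole match.
   static const std::regex keywords("\\b(?:int|return|if|else|while)\\b");
   return std::regex_replace(s, keywords, "<B>$&</B>");
}
```

For example, `highlight("if (a < b) return 1;")` yields `<B>if</B> (a &lt; b) <B>return</B> 1;`. Doing the escaping before the highlighting matters: in the opposite order the tags added by the second pass would themselves be escaped.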
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
75
example/snippets/regex_token_iterator_example_1.cpp
Normal file
@ -0,0 +1,75 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2003
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
/*
|
||||
* LOCATION: see http://www.boost.org for most recent version.
|
||||
* FILE regex_token_iterator_example_1.cpp
|
||||
* VERSION see <boost/version.hpp>
|
||||
* DESCRIPTION: regex_token_iterator example: split a string into tokens.
|
||||
*/
|
||||
|
||||
|
||||
#include <boost/regex.hpp>
|
||||
|
||||
#include <iostream>
|
||||
using namespace std;
|
||||
|
||||
|
||||
#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
|
||||
//
|
||||
// problem with std::getline under MSVC6sp3
|
||||
istream& getline(istream& is, std::string& s)
|
||||
{
|
||||
s.erase();
|
||||
char c;
// stop at the end of the line, or at end of input so that
// reaching EOF cannot loop forever:
while(is.get(c) && c != '\n')
{
s.append(1, c);
}
|
||||
return is;
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
int main(int argc, char**)
|
||||
{
|
||||
string s;
|
||||
do{
|
||||
if(argc == 1)
|
||||
{
|
||||
cout << "Enter text to split (or \"quit\" to exit): ";
|
||||
getline(cin, s);
|
||||
if(s == "quit") break;
|
||||
}
|
||||
else
|
||||
s = "This is a string of tokens";
|
||||
|
||||
boost::regex re("\\s+");
|
||||
boost::regex_token_iterator<std::string::const_iterator> i(s.begin(), s.end(), re, -1);
|
||||
boost::regex_token_iterator<std::string::const_iterator> j;
|
||||
|
||||
unsigned count = 0;
|
||||
while(i != j)
|
||||
{
|
||||
cout << *i++ << endl;
|
||||
count++;
|
||||
}
|
||||
cout << "There were " << count << " tokens found." << endl;
|
||||
|
||||
}while(argc == 1);
|
||||
return 0;
|
||||
}
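The splitting technique used above, a token iterator with sub-match index -1, also works with the standard library's `std::sregex_token_iterator`, which has the same interface for this purpose. The sketch below is a self-contained illustration; the function name `split_on_whitespace` is hypothetical.

```cpp
#include <cassert>   // for the checks mentioned in the note below
#include <regex>
#include <string>
#include <vector>

// Split `s` on runs of whitespace and return the pieces in order.
std::vector<std::string> split_on_whitespace(const std::string& s)
{
   static const std::regex re("\\s+");
   // Index -1 selects the text between matches, not the matches themselves.
   std::sregex_token_iterator i(s.begin(), s.end(), re, -1), end;
   return std::vector<std::string>(i, end);
}
```

Applied to the default text of the example, `"This is a string of tokens"`, it returns six tokens, so `assert(split_on_whitespace("This is a string of tokens").size() == 6);` holds.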
|
||||
|
92
example/snippets/regex_token_iterator_example_2.cpp
Normal file
@ -0,0 +1,92 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2003
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
/*
|
||||
* LOCATION: see http://www.boost.org for most recent version.
|
||||
* FILE regex_token_iterator_example_2.cpp
|
||||
* VERSION see <boost/version.hpp>
|
||||
* DESCRIPTION: regex_token_iterator example: split out linked URLs.
|
||||
*/
|
||||
|
||||
|
||||
#include <fstream>
|
||||
#include <iostream>
|
||||
#include <iterator>
|
||||
#include <boost/regex.hpp>
|
||||
|
||||
boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
|
||||
boost::regex::normal | boost::regbase::icase);
|
||||
|
||||
void load_file(std::string& s, std::istream& is)
|
||||
{
|
||||
s.erase();
|
||||
//
|
||||
// attempt to grow string buffer to match file size,
|
||||
// this doesn't always work...
|
||||
s.reserve(is.rdbuf()->in_avail());
|
||||
char c;
|
||||
while(is.get(c))
|
||||
{
|
||||
// use logarithmic growth strategy, in case
|
||||
// in_avail (above) returned zero:
|
||||
if(s.capacity() == s.size())
|
||||
s.reserve(s.capacity() * 3);
|
||||
s.append(1, c);
|
||||
}
|
||||
}
|
||||
|
||||
int main(int argc, char** argv)
|
||||
{
|
||||
std::string s;
|
||||
int i;
|
||||
for(i = 1; i < argc; ++i)
|
||||
{
|
||||
std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
|
||||
s.erase();
|
||||
std::ifstream is(argv[i]);
|
||||
load_file(s, is);
|
||||
boost::regex_token_iterator<std::string::const_iterator>
|
||||
i(s.begin(), s.end(), e, 1);
|
||||
boost::regex_token_iterator<std::string::const_iterator> j;
|
||||
while(i != j)
|
||||
{
|
||||
std::cout << *i++ << std::endl;
|
||||
}
|
||||
}
|
||||
//
|
||||
// alternative method:
|
||||
// test the array-literal constructor, and split out the whole
|
||||
// match as well as $1....
|
||||
//
|
||||
for(i = 1; i < argc; ++i)
|
||||
{
|
||||
std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
|
||||
s.erase();
|
||||
std::ifstream is(argv[i]);
|
||||
load_file(s, is);
|
||||
const int subs[] = {1, 0,};
|
||||
boost::regex_token_iterator<std::string::const_iterator>
|
||||
i(s.begin(), s.end(), e, subs);
|
||||
boost::regex_token_iterator<std::string::const_iterator> j;
|
||||
while(i != j)
|
||||
{
|
||||
std::cout << *i++ << std::endl;
|
||||
}
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
205
faq.htm
@ -1,205 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++ - FAQ</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, FAQ.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><font color="#FF0000">Q. Why does using parentheses in a
|
||||
regular expression change the result of a match?</font></p>
|
||||
|
||||
<p>A. Parentheses don't only mark; they determine what the best
|
||||
match is as well. regex++ tries to follow the POSIX standard
|
||||
leftmost longest rule for determining what matched. So if there
|
||||
is more than one possible match after considering the whole
|
||||
expression, it looks next at the first sub-expression and then
|
||||
the second sub-expression and so on. So...</p>
|
||||
|
||||
<pre>"(0*)([0-9]*)" against "00123" would produce
|
||||
$1 = "00"
|
||||
$2 = "123"</pre>
|
||||
|
||||
<p>whereas</p>
|
||||
|
||||
<pre>"0*([0-9]*)" against "00123" would produce
|
||||
$1 = "00123"</pre>
|
||||
|
||||
<p>If you think about it, had $1 only matched the "123",
|
||||
this would be "less good" than the match "00123"
|
||||
which is both further to the left and longer. If you want $1 to
|
||||
match only the "123" part, then you need to use
|
||||
something like:</p>
|
||||
|
||||
<pre>"0*([1-9][0-9]*)"</pre>
|
||||
|
||||
<p>as the expression.</p>
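<p>The first and last expressions above can be checked with a short sketch.
This is an illustration under stated assumptions, not the library's own code:
it uses the standard std::regex (ECMAScript grammar) in place of regex++, and
the helper name first_capture is hypothetical. The middle expression is left
out on purpose, because Perl-style engines such as std::regex do not apply the
leftmost longest rule and may split that match differently.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by sub-expression 1 when
// `pattern` matches the whole of `text`, or an empty string otherwise.
// Assumption: std::regex stands in for regex++ in this sketch.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    if (std::regex_match(text, what, std::regex(pattern)))
        return what[1].str();   // copy $1 out before `what` goes away
    return std::string();
}
```

<p>Calling first_capture("0*([1-9][0-9]*)", "00123") yields only the "123"
part, because the leading zeros can never be consumed by [1-9].</p>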
|
||||
|
||||
<p><font color="#FF0000">Q. Configure says that my compiler is
|
||||
unable to merge template instances, what does this mean?</font> </p>
|
||||
|
||||
<p>A. When you compile template code, you can end up with the
|
||||
same template instances in multiple translation units - this will
|
||||
lead to link time errors unless your compiler/linker is smart
|
||||
enough to merge these template instances into a single record in
|
||||
the executable file. If you see this warning after running
|
||||
configure, then you can still link to libregex++.a if: </p>
|
||||
|
||||
<ol>
|
||||
<li>You use only the low-level template classes (reg_expression<>
|
||||
match_results<> etc), from a single translation
|
||||
unit, and use no other part of regex++.</li>
|
||||
<li>You use only the POSIX API functions (regcomp regexec etc),
|
||||
and no other part of regex++.</li>
|
||||
<li>You use only the high level class RegEx, and no other
|
||||
part of regex++. </li>
|
||||
</ol>
|
||||
|
||||
<p>Another option is to create a master include file, which
|
||||
#include's all the regex++ source files, and all the source files
|
||||
in which you use regex++. You then compile and link this master
|
||||
file as a single translation unit. </p>
|
||||
|
||||
<p><font color="#FF0000">Q. Configure says that my compiler is
|
||||
unable to merge template instances from archive files, what does
|
||||
this mean?</font> </p>
|
||||
|
||||
<p>A. When you compile template code, you can end up with the
|
||||
same template instances in multiple translation units - this will
|
||||
lead to link time errors unless your compiler/linker is smart
|
||||
enough to merge these template instances into a single record in
|
||||
the executable file. Some compilers are able to do this for
|
||||
normal .cpp or .o files, but fail if the object file has been
|
||||
placed in a library archive. If you see this warning after
|
||||
running configure, then you can still link to libregex++.a if: </p>
|
||||
|
||||
<ol>
|
||||
<li>You use only the low-level template classes (reg_expression<>
|
||||
match_results<> etc), and use no other part of
|
||||
regex++.</li>
|
||||
<li>You use only the POSIX API functions (regcomp regexec etc),
|
||||
and no other part of regex++.</li>
|
||||
<li>You use only the high level class RegEx, and no other
|
||||
part of regex++. </li>
|
||||
</ol>
|
||||
|
||||
<p>Another option is to add the regex++ source files directly to
|
||||
your project instead of linking to libregex++.a; generally you
|
||||
should do this only if you are getting link time errors with
|
||||
libregex++.a. </p>
|
||||
|
||||
<p><font color="#FF0000">Q. Configure says that my compiler can't
|
||||
merge templates containing switch statements, what does this
|
||||
mean?</font> </p>
|
||||
|
||||
<p>A. Some compilers can't merge templates that contain static
|
||||
data - this includes switch statements which implicitly generate
|
||||
static data as well as code. Principally this affects the egcs
|
||||
compiler - but note gcc 2.81 also suffers from this problem - the
|
||||
compiler will compile and link the code - but the code will not
|
||||
run because the code and the static data it uses have become
|
||||
separated. The default behaviour of regex++ is to try and fix
|
||||
this problem by declaring "problem" templates inside
|
||||
unnamed namespaces, so that the templates have internal linkage.
|
||||
Note that this can result in a great deal of code bloat. If the
|
||||
compiler doesn't support namespaces, or if code bloat becomes a
|
||||
problem, then follow the guidelines above for placing all the
|
||||
templates used in a single translation unit, and edit boost/regex/config.hpp
|
||||
so that BOOST_REGEX_NO_TEMPLATE_SWITCH_MERGE is no longer defined.
|
||||
</p>
|
||||
|
||||
<p><font color="#FF0000">Q. I can't get regex++ to work with
|
||||
escape characters, what's going on?</font> </p>
|
||||
|
||||
<p>A. If you embed regular expressions in C++ code, then remember
|
||||
that escape characters are processed twice: once by the C++
|
||||
compiler, and once by the regex++ expression compiler, so to pass
|
||||
the regular expression \d+ to regex++, you need to embed "\\d+"
|
||||
in your code. Likewise to match a literal backslash you will need
|
||||
to embed "\\\\" in your code. </p>
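<p>A minimal sketch of the double escaping described above. It assumes the
standard std::regex rather than regex++ (the C++ string literal rules are the
same for both), and the function names are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// The C++ literal "\\d+" reaches the regex compiler as the expression \d+
// (one or more digits).
bool is_all_digits(const std::string& text)
{
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// The C++ literal "\\\\" reaches the regex compiler as an escaped backslash,
// which matches a single literal backslash in the input.
bool contains_backslash(const std::string& text)
{
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```

<p>Note that the input text follows the same literal rules: the C++ literal
"C:\\temp" holds one backslash, which contains_backslash finds.</p>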
|
||||
|
||||
<p><font color="#FF0000">Q. Why don't character ranges work
|
||||
properly?</font> <br>
|
||||
A. The POSIX standard specifies that character range expressions
|
||||
are locale sensitive - so for example the expression [A-Z] will
|
||||
match any collating element that collates between 'A' and 'Z'.
|
||||
That means that for most locales other than "C" or
|
||||
"POSIX", [A-Z] would match the single character 't' for
|
||||
example, which is not what most people expect - or at least not
|
||||
what most people have come to expect from regular expression
|
||||
engines. For this reason, the default behaviour of regex++ is to
|
||||
turn locale sensitive collation off by setting the regbase::nocollate
|
||||
compile time flag (this is set by regbase::normal). However if
|
||||
you set a non-default compile time flag - for example regbase::extended
|
||||
or regbase::basic, then locale dependent collation will be
|
||||
enabled. This also applies to the POSIX API functions, which use
|
||||
either regbase::extended or regbase::basic internally; in the
|
||||
latter case use REG_NOCOLLATE in combination with either
|
||||
REG_BASIC or REG_EXTENDED when invoking regcomp if you don't want
|
||||
locale sensitive collation. <i>[Note - when regbase::nocollate is in
|
||||
effect, the library behaves "as if" the LC_COLLATE
|
||||
locale category were always "C", regardless of what it is
|
||||
actually set to - end note</i>]. </p>
|
||||
|
||||
<p><font color="#FF0000"> Q. Why can't I use the "convenience"
|
||||
versions of query_match/reg_search/reg_grep/reg_format/reg_merge?</font>
|
||||
</p>
|
||||
|
||||
<p>A. These versions may or may not be available depending upon
|
||||
the capabilities of your compiler; the rules determining the
|
||||
format of these functions are quite complex - and only the
|
||||
versions visible to a standard compliant compiler are given in
|
||||
the help. To find out what your compiler supports, run <boost/regex.hpp>
|
||||
through your C++ pre-processor, and search the output file for
|
||||
the function that you are interested in. </p>
|
||||
|
||||
<p><font color="#FF0000">Q. Why are there no throw specifications
|
||||
on any of the functions? What exceptions can the library throw?</font>
|
||||
</p>
|
||||
|
||||
<p>A. Not all compilers support (or honor) throw specifications,
|
||||
others support them but with reduced efficiency. Throw
|
||||
specifications may be added at a later date as compilers begin to
|
||||
handle this better. The library should throw only three types of
|
||||
exception: boost::bad_expression can be thrown by reg_expression
|
||||
when compiling a regular expression, std::runtime_error can be
|
||||
thrown when a call to reg_expression::imbue tries to open a
|
||||
message catalogue that doesn't exist or when a call to RegEx::GrepFiles
|
||||
or RegEx::FindFiles tries to open a file that cannot be opened,
|
||||
finally std::bad_alloc can be thrown by just about any of the
|
||||
functions in this library. </p>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
@ -1,243 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, Format String Reference</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Format
|
||||
String Reference.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="format_string"></a>Format String Syntax</h3>
|
||||
|
||||
<p>Format strings are used by the algorithms <a
|
||||
href="template_class_ref.htm#reg_format">regex_format</a> and <a
|
||||
href="template_class_ref.htm#reg_merge">regex_merge</a>, and are
|
||||
used to transform one string into another. </p>
|
||||
|
||||
<p>There are three kinds of format string: sed, perl and extended;
|
||||
the extended syntax is the default so this is covered first. </p>
|
||||
|
||||
<p><b><i>Extended format syntax</i></b> </p>
|
||||
|
||||
<p>In format strings, all characters are treated as literals
|
||||
except: ()$\?: </p>
|
||||
|
||||
<p>To use any of these as literals you must prefix them with the
|
||||
escape character \ </p>
|
||||
|
||||
<p>The following special sequences are recognized: <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Grouping:</i> </p>
|
||||
|
||||
<p>Use the parenthesis characters ( and ) to group sub-expressions
|
||||
within the format string, use \( and \) to represent literal '('
|
||||
and ')'. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Sub-expression expansions:</i> </p>
|
||||
|
||||
<p>The following perl like expressions expand to a particular
|
||||
matched sub-expression: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$`</td>
|
||||
<td valign="top" width="43%">Expands to all the text from
|
||||
the end of the previous match to the start of the current
|
||||
match, if there was no previous match in the current
|
||||
operation, then everything from the start of the input
|
||||
string to the start of the match.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$'</td>
|
||||
<td valign="top" width="43%">Expands to all the text from
|
||||
the end of the match to the end of the input string.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$&</td>
|
||||
<td valign="top" width="43%">Expands to all of the
|
||||
current match.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$0</td>
|
||||
<td valign="top" width="43%">Expands to all of the
|
||||
current match.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$N</td>
|
||||
<td valign="top" width="43%">Expands to the text that
|
||||
matched sub-expression <i>N</i>.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
</table>
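<p>The $N and $&amp; expansions in the table can be tried with the sketch
below. It is an assumption-laden illustration: it uses std::regex_replace from
the standard library, whose default format syntax accepts the same two
expansions, rather than regex_format or regex_merge, and the function names are
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Reorders a yyyy-mm-dd date into dd/mm/yyyy using $N expansions.
std::string reorder_date(const std::string& text)
{
    static const std::regex date("(\\d{4})-(\\d{2})-(\\d{2})");
    return std::regex_replace(text, date, "$3/$2/$1");
}

// Wraps every run of digits in square brackets using $& (the whole match).
std::string bracket_numbers(const std::string& text)
{
    static const std::regex number("\\d+");
    return std::regex_replace(text, number, "[$&]");
}
```

<p>Text that is not part of any match is copied to the output unchanged, so
only the matched portions are rewritten.</p>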
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><i>Conditional expressions:</i> </p>
|
||||
|
||||
<p>Conditional expressions allow two different format strings to
|
||||
be selected dependent upon whether a sub-expression participated
|
||||
in the match or not: </p>
|
||||
|
||||
<p>?Ntrue_expression:false_expression </p>
|
||||
|
||||
<p>Executes true_expression if sub-expression <i>N</i>
|
||||
participated in the match, otherwise executes false_expression. </p>
|
||||
|
||||
<p>Example: suppose we search for "(while)|(for)" then
|
||||
the format string "?1WHILE:FOR" would output what
|
||||
matched, but in upper case. <br>
|
||||
<br>
|
||||
</p>
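<p>The effect of the "?1WHILE:FOR" example can be sketched by hand. The code
below does not use the conditional format syntax itself (std::regex, used here
as a stand-in, has no such syntax); it spells out the same choice by testing
whether sub-expression 1 participated in each match. The function name is
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hand-written equivalent of the format string "?1WHILE:FOR" applied to the
// expression "(while)|(for)": emit WHILE when sub-expression 1 took part in
// the match, FOR otherwise. Unmatched text is copied through unchanged.
std::string upper_case_keywords(const std::string& text)
{
    static const std::regex keyword("(while)|(for)");
    std::string out;
    std::string::const_iterator last = text.begin();
    for (std::sregex_iterator it(text.begin(), text.end(), keyword), end;
         it != end; ++it)
    {
        const std::smatch& m = *it;
        out.append(last, m[0].first);           // text between matches
        out += m[1].matched ? "WHILE" : "FOR";  // the ?1 true:false choice
        last = m[0].second;
    }
    out.append(last, text.end());               // text after the last match
    return out;
}
```

<p>Because the expression has only two alternatives, a match in which
sub-expression 1 did not participate must have come from "(for)".</p>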
|
||||
|
||||
<p><i>Escape sequences:</i> </p>
|
||||
|
||||
<p>The following escape sequences are also allowed: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\a</td>
|
||||
<td valign="top" width="43%">The bell character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\f</td>
|
||||
<td valign="top" width="43%">The form feed character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\n</td>
|
||||
<td valign="top" width="43%">The newline character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\r</td>
|
||||
<td valign="top" width="43%">The carriage return
|
||||
character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\t</td>
|
||||
<td valign="top" width="43%">The tab character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\v</td>
|
||||
<td valign="top" width="43%">A vertical tab character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\x</td>
|
||||
<td valign="top" width="43%">A hexadecimal character -
|
||||
for example \x0D.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\x{}</td>
|
||||
<td valign="top" width="43%">A possible unicode
|
||||
hexadecimal character - for example \x{1A0}</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\cx</td>
|
||||
<td valign="top" width="43%">The ASCII escape character
|
||||
x, for example \c@ is equivalent to escape-@.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\e</td>
|
||||
<td valign="top" width="43%">The ASCII escape character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\dd</td>
|
||||
<td valign="top" width="43%">An octal character constant,
|
||||
for example \10.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><b><i>Perl format strings</i></b> </p>
|
||||
|
||||
<p>Perl format strings are the same as the default syntax except
|
||||
that the characters ()?: have no special meaning. </p>
|
||||
|
||||
<p><b><i>Sed format strings</i></b> </p>
|
||||
|
||||
<p>Sed format strings use only the characters \ and & as
|
||||
special characters. </p>
|
||||
|
||||
<p>\n where n is a digit, is expanded to the nth sub-expression. </p>
|
||||
|
||||
<p>& is expanded to the whole of the match (equivalent to \0).
|
||||
</p>
|
||||
|
||||
<p>Other escape sequences are expanded as per the default syntax.
|
||||
<br>
|
||||
</p>
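<p>A short sketch of sed format strings, assuming the standard library's
std::regex_constants::format_sed flag as a stand-in for the regex++ sed format
option; the function names are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Swaps two space-separated words using the sed-style \2 and \1 expansions.
std::string swap_words(const std::string& text)
{
    static const std::regex pair("(\\w+) (\\w+)");
    return std::regex_replace(text, pair, "\\2 \\1",
                              std::regex_constants::format_sed);
}

// Wraps each word in angle brackets; & expands to the whole match.
std::string mark_words(const std::string& text)
{
    static const std::regex word("\\w+");
    return std::regex_replace(text, word, "<&>",
                              std::regex_constants::format_sed);
}
```

<p>As with regular expressions, the backslashes in the format string must be
doubled inside a C++ string literal.</p>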
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
572
hl_ref.htm
@ -1,572 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, RegEx Class Reference</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, RegEx Class
|
||||
Reference. </h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="RegEx"></a><i>Class RegEx</i></h3>
|
||||
|
||||
<p>#include <boost/cregex.hpp> </p>
|
||||
|
||||
<p>The class RegEx provides a high level simplified interface to
|
||||
the regular expression library. This class handles only narrow
|
||||
character strings, and regular expressions always follow the
|
||||
"normal" syntax - that is the same as the standard
|
||||
POSIX extended syntax, but with locale specific collation
|
||||
disabled, and escape characters inside character set declarations
|
||||
are allowed. </p>
|
||||
|
||||
<pre><b>typedef</b> <b>bool</b> (*GrepCallback)(<b>const</b> RegEx& expression);
|
||||
<b>typedef</b> <b>bool</b> (*GrepFileCallback)(<b>const</b> <b>char</b>* file, <b>const</b> RegEx& expression);
|
||||
<b>typedef</b> <b>bool</b> (*FindFilesCallback)(<b>const</b> <b>char</b>* file);
|
||||
|
||||
<b>class</b> RegEx
|
||||
{
|
||||
<b>public</b>:
|
||||
RegEx();
|
||||
RegEx(<b>const</b> RegEx& o);
|
||||
~RegEx();
|
||||
RegEx(<b>const</b> <b>char</b>* c, <b>bool</b> icase = <b>false</b>);
|
||||
<strong>explicit</strong> RegEx(<b>const</b> std::string& s, <b>bool</b> icase = <b>false</b>);
|
||||
RegEx& <b>operator</b>=(<b>const</b> RegEx& o);
|
||||
RegEx& <b>operator</b>=(<b>const</b> <b>char</b>* p);
|
||||
RegEx& <b>operator</b>=(<b>const</b> std::string& s);
|
||||
<b>unsigned</b> <b>int</b> SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);
|
||||
<b>unsigned</b> <b>int</b> SetExpression(<b>const</b> std::string& s, <b>bool</b> icase = <b>false</b>);
|
||||
std::string Expression()<b>const</b>;
|
||||
<font color="#000080"><i>//
|
||||
</i> <i>// now matching operators: </i>
|
||||
<i>// </i></font>
|
||||
<b>bool</b> Match(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>bool</b> Match(<b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>bool</b> Search(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>bool</b> Search(<b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<std::string>& v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<std::string>& v, <b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<<b>unsigned</b> <b>int</b>>& v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<<b>unsigned</b> <b>int</b>>& v, <b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> std::string& files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> std::string& files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
std::string Merge(<b>const</b> std::string& in, <b>const</b> std::string& fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
std::string Merge(<b>const</b> char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned int </b>flags = match_default);
|
||||
<b>unsigned</b> Split(std::vector<std::string>& v, std::string& s, <b>unsigned</b> flags = match_default, <b>unsigned</b> max_count = ~0);
|
||||
<font color="#000080"><i>//
|
||||
</i> <i>// now operators for returning what matched in more detail:
|
||||
</i> <i>//
|
||||
</i></font> <b>unsigned</b> <b>int</b> Position(<b>int</b> i = 0)<b>const</b>;
|
||||
<b>unsigned</b> <b>int</b> Length(<b>int</b> i = 0)<b>const</b>;
|
||||
<strong>bool</strong> Matched(<strong>int</strong> i = 0)<strong>const</strong>;
|
||||
<b>unsigned</b> <b>int</b> Line()<b>const</b>;
|
||||
<b>unsigned int</b> Marks() const;
|
||||
std::string What(<b>int</b> i)<b>const</b>;
|
||||
std::string <b>operator</b>[](<b>int</b> i)<b>const</b> ;
|
||||
|
||||
<strong>static const unsigned int</strong> npos;
|
||||
}; </pre>
|
||||
|
||||
<p>Member functions for class RegEx are defined as follows: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx();</td>
|
||||
<td valign="top" width="42%">Default constructor,
|
||||
constructs an instance of RegEx without any valid
|
||||
expression.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx(<b>const</b>
|
||||
RegEx& o);</td>
|
||||
<td valign="top" width="42%">Copy constructor, all the
|
||||
properties of parameter <i>o</i> are copied.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx(<b>const</b> <b>char</b>*
|
||||
c, <b>bool</b> icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Constructs an instance of
|
||||
RegEx, setting the expression to <i>c</i>, if <i>icase</i>
|
||||
is <i>true</i> then matching is insensitive to case,
|
||||
otherwise it is sensitive to case. Throws <i>bad_expression</i>
|
||||
on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx(<b>const</b> std::string&
|
||||
s, <b>bool</b> icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Constructs an instance of
|
||||
RegEx, setting the expression to <i>s</i>, if <i>icase </i>is
|
||||
<i>true</i> then matching is insensitive to case,
|
||||
otherwise it is sensitive to case. Throws <i>bad_expression</i>
|
||||
on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx& <b>operator</b>=(<b>const</b>
|
||||
RegEx& o);</td>
|
||||
<td valign="top" width="42%">Default assignment operator.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx& <b>operator</b>=(<b>const</b>
|
||||
<b>char</b>* p);</td>
|
||||
<td valign="top" width="42%">Assignment operator,
|
||||
equivalent to calling <i>SetExpression(p, false).</i>
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx& <b>operator</b>=(<b>const</b>
|
||||
std::string& s);</td>
|
||||
<td valign="top" width="42%">Assignment operator,
|
||||
equivalent to calling <i>SetExpression(s, false).</i>
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Sets the current expression
|
||||
to <i>p</i>, if <i>icase</i> is <i>true</i> then matching
|
||||
is insensitive to case, otherwise it is sensitive to case.
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
SetExpression(<b>const</b> std::string& s, <b>bool</b>
|
||||
icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Sets the current expression
|
||||
to <i>s</i>, if <i>icase</i> is <i>true</i> then matching
|
||||
is insensitive to case, otherwise it is sensitive to case.
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string Expression()<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns a copy of the
|
||||
current regular expression.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
|
||||
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Attempts to match the
|
||||
current expression against the text <i>p</i> using the
|
||||
match flags <i>flags</i> - see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the expression matches the whole
|
||||
of the input string.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default) ;</td>
|
||||
<td valign="top" width="42%">Attempts to match the
|
||||
current expression against the text <i>s</i> using the
|
||||
match flags <i>flags</i> - see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the expression matches the whole
|
||||
of the input string.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
|
||||
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Attempts to find a match for
|
||||
the current expression somewhere in the text <i>p</i>
|
||||
using the match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the match succeeds.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default) ;</td>
|
||||
<td valign="top" width="42%">Attempts to find a match for
|
||||
the current expression somewhere in the text <i>s</i>
|
||||
using the match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the match succeeds.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>p</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match found calls the call-back function <i>cb</i>
|
||||
as: cb(*this); <p>If at any stage the call-back function
|
||||
returns false then the grep operation terminates,
|
||||
otherwise continues until no further matches are found.
|
||||
Returns the number of matches found.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(GrepCallback cb, <b>const</b> std::string& s, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>s</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match found calls the call-back function <i>cb</i>
|
||||
as: cb(*this); <p>If at any stage the call-back function
|
||||
returns false then the grep operation terminates,
|
||||
otherwise continues until no further matches are found.
|
||||
Returns the number of matches found. </p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<std::string>& v, <b>const</b> <b>char</b>*
|
||||
p, <b>unsigned</b> <b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>p</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes a copy of what matched onto <i>v</i>.
|
||||
Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<std::string>& v, <b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>s</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes a copy of what matched onto <i>v</i>.
|
||||
Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<<b>unsigned int</b>>& v, <b>const</b>
|
||||
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>p</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes the starting index of what matched
|
||||
onto <i>v</i>. Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<<b>unsigned int</b>>& v, <b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>s</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes the starting index of what matched
|
||||
onto <i>v</i>. Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>*
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the files <i>files</i> using the
|
||||
match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match calls the call-back function cb. <p>If
|
||||
the call-back returns false then the algorithm returns
|
||||
without considering further matches in the current file,
|
||||
or any further files. </p>
|
||||
<p>The parameter <i>files</i> can include wild card
|
||||
characters '*' and '?', if the parameter <i>recurse</i>
|
||||
is true then searches sub-directories for matching file
|
||||
names. </p>
|
||||
<p>Returns the total number of matches found.</p>
|
||||
<p>May throw an exception derived from std::runtime_error
|
||||
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
GrepFiles(GrepFileCallback cb, <b>const</b> std::string&
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the files <i>files</i> using the
|
||||
match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match calls the call-back function cb. <p>If
|
||||
the call-back returns false then the algorithm returns
|
||||
without considering further matches in the current file,
|
||||
or any further files. </p>
|
||||
<p>The parameter <i>files</i> can include wild card
|
||||
characters '*' and '?', if the parameter <i>recurse</i>
|
||||
is true then searches sub-directories for matching file
|
||||
names. </p>
|
||||
<p>Returns the total number of matches found.</p>
|
||||
<p>May throw an exception derived from std::runtime_error
|
||||
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>*
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Searches <i>files</i> to
|
||||
find all those which contain at least one match of the
|
||||
current expression using the match flags <i>flags </i>-
|
||||
see <a href="template_class_ref.htm#match_type">match
|
||||
flags</a>. For each matching file calls the call-back
|
||||
function cb. <p>If the call-back returns false then
|
||||
the algorithm returns without considering any further
|
||||
files. </p>
|
||||
<p>The parameter <i>files</i> can include wild card
|
||||
characters '*' and '?', if the parameter <i>recurse</i>
|
||||
is true then searches sub-directories for matching file
|
||||
names. </p>
|
||||
<p>Returns the total number of files found.</p>
|
||||
<p>May throw an exception derived from std::runtime_error
|
||||
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
FindFiles(FindFilesCallback cb, <b>const</b> std::string&
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Searches <i>files</i> to
|
||||
find all those which contain at least one match of the
|
||||
current expression using the match flags <i>flags </i>-
|
||||
see <a href="template_class_ref.htm#match_type">match
|
||||
flags</a>. For each matching file calls the call-back
|
||||
function cb. <p>If the call-back returns false then
|
||||
the algorithm returns without considering any further
|
||||
files. </p>
|
||||
<p>The parameter <i>files</i> can include wild card
|
||||
characters '*' and '?', if the parameter <i>recurse</i>
|
||||
is true then searches sub-directories for matching file
|
||||
names. </p>
|
||||
<p>Returns the total number of files found.</p>
|
||||
<p>May throw an exception derived from std::runtime_error
|
||||
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string Merge(<b>const</b>
|
||||
std::string& in, <b>const</b> std::string& fmt, <b>bool</b>
|
||||
copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Performs a search and
|
||||
replace operation: searches through the string <i>in</i>
|
||||
for all occurrences of the current expression, for each
|
||||
occurrence replaces the match with the format string <i>fmt</i>.
|
||||
Uses <i>flags</i> to determine what gets matched, and how
|
||||
the format string should be treated. If <i>copy</i> is
|
||||
true then all unmatched sections of input are copied
|
||||
unchanged to output, if the flag <em>format_first_only</em>
|
||||
is set then only the first occurrence of the pattern found
|
||||
is replaced. Returns the new string. See also <a
href="format_string.htm#format_string">format string
|
||||
syntax</a>, <a href="template_class_ref.htm#match_type">match
|
||||
flags</a> and <a
|
||||
href="template_class_ref.htm#format_flags">format flags</a>.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string Merge(<b>const</b>
|
||||
char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>,
|
||||
<b>unsigned int </b>flags = match_default);</td>
|
||||
<td valign="top" width="42%">Performs a search and
|
||||
replace operation: searches through the string <i>in</i>
|
||||
for all occurrences of the current expression, for each
|
||||
occurrence replaces the match with the format string <i>fmt</i>.
|
||||
Uses <i>flags</i> to determine what gets matched, and how
|
||||
the format string should be treated. If <i>copy</i> is
|
||||
true then all unmatched sections of input are copied
|
||||
unchanged to output, if the flag <em>format_first_only</em>
|
||||
is set then only the first occurrence of the pattern found
|
||||
is replaced. Returns the new string. See also <a
href="format_string.htm#format_string">format string
|
||||
syntax</a>, <a href="template_class_ref.htm#match_type">match
|
||||
flags</a> and <a
|
||||
href="template_class_ref.htm#format_flags">format flags</a>.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top"><b>unsigned</b> Split(std::vector<std::string>&
|
||||
v, std::string& s, <b>unsigned</b> flags =
|
||||
match_default, <b>unsigned</b> max_count = ~0);</td>
|
||||
<td valign="top">Splits the input string and pushes each
|
||||
one onto the vector. If the expression contains no marked
|
||||
sub-expressions, then one string is output for each
|
||||
section of the input that does not match the expression.
|
||||
If the expression does contain marked sub-expressions,
|
||||
then outputs one string for each marked sub-expression
|
||||
each time a match occurs. Outputs no more than <i>max_count
|
||||
</i>strings. Before returning, deletes from the input
|
||||
string <i>s</i> all of the input that has been processed
|
||||
(all of the string if <i>max_count</i> was not reached).
|
||||
Returns the number of strings pushed onto the vector.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Position(<b>int</b> i = 0)<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns the position of what
|
||||
matched sub-expression <i>i</i>. If <i>i = 0</i> then
|
||||
returns the position of the whole match. Returns RegEx::npos
|
||||
if the supplied index is invalid, or if the specified sub-expression
|
||||
did not participate in the match.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Length(<b>int</b> i = 0)<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns the length of what
|
||||
matched sub-expression <i>i</i>. If <i>i = 0</i> then
|
||||
returns the length of the whole match. Returns RegEx::npos
|
||||
if the supplied index is invalid, or if the specified sub-expression
|
||||
did not participate in the match.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td><strong>bool</strong> Matched(<strong>int</strong> i
|
||||
= 0)<strong>const</strong>;</td>
|
||||
<td>Returns true if sub-expression <em>i</em> was
|
||||
matched, false otherwise.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Line()<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns the line on which
|
||||
the match occurred; line numbers start from 1, not zero. If no
|
||||
match occurred then returns RegEx::npos.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned int</b> Marks()
|
||||
const;</td>
|
||||
<td valign="top" width="42%">Returns the number of marked
|
||||
sub-expressions contained in the expression. Note that
|
||||
this includes the whole match (sub-expression zero), so
|
||||
the value returned is always >= 1.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string What(<b>int</b>
|
||||
i)<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns a copy of what
|
||||
matched sub-expression <i>i</i>. If <i>i = 0</i> then
|
||||
returns a copy of the whole match. Returns a null string
|
||||
if the index is invalid or if the specified sub-expression
|
||||
did not participate in a match.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string <b>operator</b>[](<b>int</b>
|
||||
i)<b>const</b> ;</td>
|
||||
<td valign="top" width="42%">Returns <i>What(i)</i>. <p>Can
|
||||
be used to simplify access to sub-expression matches, and
|
||||
make usage more perl-like.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
</table>
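<p>The difference between whole-string matching, searching, and the
vector-filling Grep overloads described in the table can be sketched as
follows. This is only an illustration: it uses the standard library's
std::regex as a stand-in rather than the RegEx class itself, and the
helper names match_whole, search_anywhere, grep_strings and
grep_positions are hypothetical, chosen to mirror the documented member
functions.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Stand-in for RegEx::Match: succeeds only if the expression matches
// the whole of the input string.
bool match_whole(const std::regex& e, const std::string& s)
{
    return std::regex_match(s, e);
}

// Stand-in for RegEx::Search: succeeds if the expression matches
// somewhere in the input string.
bool search_anywhere(const std::regex& e, const std::string& s)
{
    return std::regex_search(s, e);
}

// Stand-in for Grep(std::vector<std::string>&, ...): pushes a copy of
// each match onto v and returns the number of matches found.
std::size_t grep_strings(const std::regex& e, const std::string& s,
                         std::vector<std::string>& v)
{
    std::size_t count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        v.push_back(it->str());
        ++count;
    }
    return count;
}

// Stand-in for Grep(std::vector<unsigned int>&, ...): pushes the
// starting index of each match onto v and returns the number of matches.
std::size_t grep_positions(const std::regex& e, const std::string& s,
                           std::vector<std::size_t>& v)
{
    std::size_t count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        v.push_back(static_cast<std::size_t>(it->position()));
        ++count;
    }
    return count;
}
```

<p>With the expression [0-9]+ and the text "abc 123", match_whole fails
because the letters are not covered by the expression, while
search_anywhere succeeds.</p>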
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
150
index.htm
@ -1,150 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="keywords"
|
||||
content="regex++, regular expressions, regular expression library, C++">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>regex++, Index</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="277" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Index.</h3>
|
||||
<p align="left"><i>(Version 3.31, 16th Dec 2001)</i>
|
||||
</p>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3 align="center">Contents</h3>
|
||||
|
||||
<ul>
|
||||
<li><a href="introduction.htm#intro">Introduction</a></li>
|
||||
<li><a href="introduction.htm#Installation">Installation and
|
||||
Configuration</a> </li>
|
||||
<li><a href="template_class_ref.htm#regbase">Template Class
|
||||
and Algorithm Reference</a> <ul>
|
||||
<li>Class <a href="template_class_ref.htm#regbase">regbase</a></li>
|
||||
<li>Class <a
|
||||
href="template_class_ref.htm#bad_expression">bad_expression</a>
|
||||
</li>
|
||||
<li>Class <a
|
||||
href="template_class_ref.htm#reg_expression">reg_expression</a>
|
||||
</li>
|
||||
<li>Class <a
|
||||
href="template_class_ref.htm#regex_char_traits">char_regex_traits</a></li>
|
||||
<li>Class <a href="template_class_ref.htm#reg_match">match_results</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#query_match">regex_match</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_search">regex_search</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_grep">regex_grep</a></li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_format">regex_format</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_merge">regex_merge</a></li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#regex_split">regex_split</a>
|
||||
</li>
|
||||
<li><a href="template_class_ref.htm#partial_matches">Partial
|
||||
regular expression matches</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Class <a href="hl_ref.htm#RegEx">RegEx</a> reference</li>
|
||||
<li><a href="posix_ref.htm#posix">POSIX Compatibility
|
||||
Functions</a></li>
|
||||
<li><a href="syntax.htm#syntax">Regular Expression Syntax</a></li>
|
||||
<li><a href="format_string.htm#format_string">Format String
|
||||
Syntax</a></li>
|
||||
<li><a href="appendix.htm#implementation">Appendices</a> <ul>
|
||||
<li><a href="appendix.htm#implementation">Implementation
|
||||
notes</a></li>
|
||||
<li><a href="appendix.htm#threads">Thread safety</a></li>
|
||||
<li><a href="appendix.htm#localisation">Localization</a></li>
|
||||
<li><a href="appendix.htm#demos">Example Applications</a>
|
||||
<ul>
|
||||
<li><a
|
||||
href="example/snippets/regex_match_example.cpp">regex_match_example.cpp</a>:
|
||||
ftp based regex_match example.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_search_example.cpp">regex_search_example.cpp</a>:
|
||||
regex_search example: searches a cpp file
|
||||
for class definitions.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_1.cpp">regex_grep_example_1.cpp</a>:
|
||||
regex_grep example 1: searches a cpp file
|
||||
for class definitions.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_merge_example.cpp">regex_merge_example.cpp</a>:
|
||||
regex_merge example: converts a C++ file
|
||||
to syntax highlighted HTML.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_2.cpp">regex_grep_example_2.cpp</a>:
|
||||
regex_grep example 2: searches a cpp file
|
||||
for class definitions, using a global
|
||||
callback function. </li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_3.cpp">regex_grep_example_3.cpp</a>:
|
||||
regex_grep example 3: searches a cpp file
|
||||
for class definitions, using a bound
|
||||
member function callback.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_4.cpp">regex_grep_example_4.cpp</a>:
|
||||
regex_grep example 4: searches a cpp file
|
||||
for class definitions, using a C++
|
||||
Builder closure as a callback.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_split_example_1.cpp">regex_split_example_1.cpp</a>:
|
||||
regex_split example: split a string into
|
||||
tokens.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_split_example_2.cpp">regex_split_example_2.cpp</a>:
|
||||
regex_split example: spit out linked
|
||||
URLs.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="appendix.htm#headers">Header Files.</a></li>
|
||||
<li><a href="appendix.htm#redist">Redistributables</a></li>
|
||||
<li><a href="appendix.htm#upgrade">Note for upgraders</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="appendix.htm#furtherInfo">Further Information (Contacts
|
||||
and Acknowledgements)</a></li>
|
||||
<li><a href="faq.htm">FAQ</a></li>
|
||||
</ul>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
9
index.html
Normal file
@ -0,0 +1,9 @@
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="refresh" content="0; URL=doc/index.html">
|
||||
</head>
|
||||
<body>
|
||||
Automatic redirection failed, please go to <A href="doc/index.html">doc/index.html</A>.
|
||||
</body>
|
||||
</html>
|
||||
|
476
introduction.htm
@ -1,476 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="keywords"
|
||||
content="regex++, regular expressions, regular expression library, C++">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>regex++, Introduction</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Introduction.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="intro"></a><i>Introduction</i></h3>
|
||||
|
||||
<p>Regular expressions are a form of pattern-matching that is
|
||||
often used in text processing; many users will be familiar with
|
||||
the Unix utilities <i>grep</i>, <i>sed</i> and <i>awk</i>, and
|
||||
the programming language <i>perl</i>, each of which makes
|
||||
extensive use of regular expressions. Traditionally C++ users
|
||||
have been limited to the POSIX C APIs for manipulating regular
|
||||
expressions, and while regex++ does provide these APIs, they do
|
||||
not represent the best way to use the library. For example regex++
|
||||
can cope with wide character strings, or search and replace
|
||||
operations (in a manner analogous to either sed or perl),
|
||||
something that traditional C libraries cannot do.</p>
|
||||
|
||||
<p>The class <a href="template_class_ref.htm#reg_expression">boost::reg_expression</a>
|
||||
is the key class in this library; it represents a "machine
|
||||
readable" regular expression, and is very closely modelled
|
||||
on std::basic_string, think of it as a string plus the actual
|
||||
state-machine required by the regular expression algorithms. Like
|
||||
std::basic_string there are two typedefs that are almost always
|
||||
the means by which this class is referenced:</p>
|
||||
|
||||
<pre><b>namespace </b>boost{
|
||||
|
||||
<b>template</b> <<b>class</b> charT,
|
||||
<b> class</b> traits = regex_traits<charT>,
|
||||
<b>class</b> Allocator = std::allocator<charT> >
|
||||
<b>class</b> reg_expression;
|
||||
|
||||
<b>typedef</b> reg_expression<<b>char</b>> regex;
|
||||
<b>typedef</b> reg_expression<<b>wchar_t</b>> wregex;
|
||||
|
||||
}</pre>
|
||||
|
||||
<p>To see how this library can be used, imagine that we are
|
||||
writing a credit card processing application. Credit card numbers
|
||||
generally come as a string of 16 digits, separated into groups of
4 digits by either a space or a hyphen. Before
|
||||
storing a credit card number in a database (not necessarily
|
||||
something your customers will appreciate!), we may want to verify
|
||||
that the number is in the correct format. To match any digit we
|
||||
could use the regular expression [0-9]; however, ranges of
|
||||
characters like this are actually locale dependent. Instead we
|
||||
should use the POSIX standard form [[:digit:]], or the regex++
|
||||
and perl shorthand for this \d (note that many older libraries
|
||||
tended to be hard-coded to the C-locale, consequently this was
|
||||
not an issue for them). That leaves us with the following regular
|
||||
expression to validate credit card number formats:</p>
|
||||
|
||||
<p>(\d{4}[- ]){3}\d{4}</p>
|
||||
|
||||
<p>Here the parentheses act to group (and mark for future
|
||||
reference) sub-expressions, and the {4} means "repeat
|
||||
exactly 4 times". This is an example of the extended regular
|
||||
expression syntax used by perl, awk and egrep. Regex++ also
|
||||
supports the older "basic" syntax used by sed and grep,
|
||||
but this is generally less useful, unless you already have some
|
||||
basic regular expressions that you need to reuse.</p>
|
||||
|
||||
<p>Now let's take that expression and place it in some C++ code to
|
||||
validate the format of a credit card number:</p>
|
||||
|
||||
<pre><b>bool</b> validate_card_format(<b>const</b> std::string& s)
|
||||
{
|
||||
<b>static</b> <b>const</b> <a
|
||||
href="template_class_ref.htm#reg_expression">boost::regex</a> e("(\\d{4}[- ]){3}\\d{4}");
|
||||
<b>return</b> <a href="template_class_ref.htm#query_match">regex_match</a>(s, e);
|
||||
}</pre>
|
||||
|
||||
<p>Note how we had to add some extra escapes to the expression:
|
||||
remember that the escape is seen once by the C++ compiler, before
|
||||
it gets to be seen by the regular expression engine, consequently
|
||||
escapes in regular expressions have to be doubled up when
|
||||
embedding them in C/C++ code. Also note that all the examples
|
||||
assume that your compiler supports Koenig lookup; if yours
|
||||
doesn't (for example VC6), then you will have to add some boost::
|
||||
prefixes to some of the function calls in the examples.</p>
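<p>The effect of doubling the escapes can be seen by writing the same
expression in two ways. The sketch below is an assumption-laden
illustration, not part of regex++: it uses the standard library's
std::regex and std::regex_match in place of boost::regex and
regex_match, and the second function, validate_card_format_raw, is a
hypothetical name that relies on C++11 raw string literals, which avoid
the doubled backslashes altogether.</p>

```cpp
#include <regex>
#include <string>

// Escapes are doubled because the C++ compiler consumes one level of
// backslashes before the regular expression engine sees the string.
bool validate_card_format(const std::string& s)
{
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// The same expression as a raw string literal: the text between R"( and )"
// reaches the regular expression engine unchanged, so no doubling is needed.
bool validate_card_format_raw(const std::string& s)
{
    static const std::regex e(R"((\d{4}[- ]){3}\d{4})");
    return std::regex_match(s, e);
}
```

<p>Both functions accept "1234-5678-9012-3456" and
"1234 5678 9012 3456", and both reject a number with no separators,
since the expression requires a space or hyphen after each of the first
three groups.</p>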
|
||||
|
||||
<p>Those of you who are familiar with credit card processing,
|
||||
will have realised that while the format used above is suitable
|
||||
for human readable card numbers, it does not represent the format
|
||||
required by online credit card systems; these require the number
|
||||
as a string of 16 (or possibly 15) digits, without any
|
||||
intervening spaces. What we need is a means to convert easily
|
||||
between the two formats, and this is where search and replace
|
||||
comes in. Those who are familiar with the utilities <i>sed</i>
|
||||
and <i>perl</i> will already be ahead here; we need two strings -
|
||||
one a regular expression - the other a "<a
|
||||
href="format_string.htm">format string</a>" that provides a
|
||||
description of the text to replace the match with. In regex++
|
||||
this search and replace operation is performed with the algorithm
|
||||
regex_merge, for our credit card example we can write two
|
||||
algorithms like this to provide the format conversions:</p>
|
||||
|
||||
<pre>
|
||||
<i>// match any format with the regular expression:
|
||||
</i><b>const</b> boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
|
||||
<b>const</b> std::string machine_format("\\1\\2\\3\\4");
|
||||
<b>const</b> std::string human_format("\\1-\\2-\\3-\\4");
|
||||
|
||||
std::string machine_readable_card_number(<b>const</b> std::string& s)
|
||||
{
|
||||
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, machine_format, boost::match_default | boost::format_sed);
|
||||
}
|
||||
|
||||
std::string human_readable_card_number(<b>const</b> std::string& s)
|
||||
{
|
||||
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, human_format, boost::match_default | boost::format_sed);
|
||||
}</pre>
|
||||
|
||||
<p>Here we've used marked sub-expressions in the regular
|
||||
expression to split out the four parts of the card number as
|
||||
separate fields, the format string then uses the sed-like syntax
|
||||
to replace the matched text with the reformatted version.</p>
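<p>For readers who want to try the conversion without regex++, a rough
equivalent can be written with the standard library. This is a sketch
under stated assumptions, not the regex_merge algorithm itself:
std::regex_replace is swapped in for regex_merge, ^ and $ replace \A and
\z (which are not part of the ECMAScript grammar that std::regex uses by
default), and the format strings use the standard library's default $1
syntax rather than the sed-like \1 syntax. The name card_expr is
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Matches a card number in either format; each group of digits is a
// marked sub-expression so that it can be referred to in a format string.
static const std::regex card_expr(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// Joins the four marked sub-expressions with no separators.
std::string machine_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_expr, "$1$2$3$4");
}

// Joins the four marked sub-expressions with hyphens.
std::string human_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_expr, "$1-$2-$3-$4");
}
```

<p>Input that does not match the expression is returned unchanged,
because unmatched text is copied to the output by default.</p>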
|
||||
|
||||
<p>In the examples above, we haven't directly manipulated the
results of a regular expression match. In general, however, the
result of a match contains a number of sub-expression matches in
addition to the overall match. When the library needs to report a
regular expression match it does so using an instance of the
class <a href="template_class_ref.htm#reg_match">match_results</a>;
as before there are typedefs of this class for the most common
cases: </p>
|
||||
|
||||
<pre><b>namespace </b>boost{
|
||||
<b>typedef</b> match_results&lt;<b>const</b> <b>char</b>*&gt; cmatch;
<b>typedef</b> match_results&lt;<b>const</b> <b>wchar_t</b>*&gt; wcmatch;
<strong>typedef</strong> match_results&lt;std::string::const_iterator&gt; smatch;
<strong>typedef</strong> match_results&lt;std::wstring::const_iterator&gt; wsmatch;
|
||||
}</pre>
|
||||
|
||||
<p>The algorithms <a href="template_class_ref.htm#reg_search">regex_search</a>
|
||||
and <a href="template_class_ref.htm#reg_grep">regex_grep</a> (i.e.
|
||||
finding all matches in a string) make use of match_results to
|
||||
report what matched.</p>
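<p>As a rough illustration of how a match result is read back, the
sketch below uses std::regex and std::smatch as stand-ins for the
regex++ types. The function name first_pair and the key=value
expression are assumptions made for the example only:</p>

```cpp
#include <regex>
#include <string>

// Finds the first "key=value" pair in text. On success the overall match
// is what[0], and the two marked sub-expressions are what[1] and what[2].
bool first_pair(const std::string& text, std::string& key, std::string& value)
{
   static const std::regex e("(\\w+)=(\\w+)");
   std::smatch what;
   if (!std::regex_search(text, what, e))
      return false;
   key = what[1].str();
   value = what[2].str();
   return true;
}
```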
|
||||
|
||||
<p>Note that these algorithms are not restricted to searching
regular C-strings: any bidirectional iterator type can be
searched, allowing for the possibility of seamlessly searching
almost any kind of data. </p>
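<p>To make the iterator point concrete, here is a minimal sketch that
searches a std::list&lt;char&gt;, whose iterators are bidirectional
but not random access. It again uses std::regex as a stand-in, and
first_number is a hypothetical helper name:</p>

```cpp
#include <list>
#include <regex>
#include <string>

// Searches a std::list<char> and returns the first run of digits found,
// or an empty string if there is none.
std::string first_number(const std::list<char>& data)
{
   typedef std::list<char>::const_iterator iterator;
   static const std::regex e("[0-9]+");
   // match_results is parameterised on the iterator type being searched
   std::match_results<iterator> what;
   if (!std::regex_search(data.begin(), data.end(), what, e))
      return std::string();
   return what[0].str();
}
```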
|
||||
|
||||
<p>For search and replace operations, in addition to the algorithm
|
||||
<a href="template_class_ref.htm#reg_merge">regex_merge</a> that
|
||||
we have already seen, the algorithm <a
|
||||
href="template_class_ref.htm#reg_format">regex_format</a> takes
|
||||
the result of a match and a format string, and produces a new
|
||||
string by merging the two.</p>
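<p>Merging a match with a format string can be sketched as follows.
This is not regex_format itself: it uses the format member of
std::match_results as the nearest standard-library equivalent, with
the default $N format syntax, and to_day_first is an illustrative
name:</p>

```cpp
#include <regex>
#include <string>

// Turns an ISO date (YYYY-MM-DD) into DD/MM/YYYY by formatting the result
// of a match; returns an empty string if the text is not an ISO date.
std::string to_day_first(const std::string& text)
{
   static const std::regex e("(\\d{4})-(\\d{2})-(\\d{2})");
   std::smatch what;
   if (!std::regex_match(text, what, e))
      return std::string();
   // $1..$3 refer to the marked sub-expressions held in "what"
   return what.format("$3/$2/$1");
}
```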
|
||||
|
||||
<p>For those that dislike templates, there is a high level
|
||||
wrapper class RegEx that is an encapsulation of the lower level
|
||||
template code - it provides a simplified interface for those that
|
||||
don't need the full power of the library, and supports only
|
||||
narrow characters, and the "extended" regular
|
||||
expression syntax. </p>
|
||||
|
||||
<p>The <a href="posix_ref.htm#posix">POSIX API</a> functions:
|
||||
regcomp, regexec, regfree and regerror, are available in both
|
||||
narrow character and Unicode versions, and are provided for those
|
||||
who need compatibility with these APIs. </p>
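<p>The calling sequence for these functions is compile, execute, free.
The sketch below assumes the platform's own POSIX header
&lt;regex.h&gt; rather than the one supplied with this library, and
posix_contains is a hypothetical helper name:</p>

```cpp
#include <regex.h>   // the platform's POSIX header (an assumption of this sketch)

// Returns true if text contains a match for the POSIX extended expression.
bool posix_contains(const char* expression, const char* text)
{
   regex_t re;
   // REG_NOSUB: we only want a yes/no answer, not sub-expression offsets
   if (regcomp(&re, expression, REG_EXTENDED | REG_NOSUB) != 0)
      return false;                 // the expression failed to compile
   const int status = regexec(&re, text, 0, 0, 0);
   regfree(&re);                    // always release the compiled expression
   return status == 0;
}
```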
|
||||
|
||||
<p>Finally, note that the library now has run-time <a
|
||||
href="appendix.htm#localisation">localization</a> support, and
|
||||
recognizes the full POSIX regular expression syntax - including
|
||||
advanced features like multi-character collating elements and
|
||||
equivalence classes - as well as providing compatibility with
|
||||
other regular expression libraries including GNU and BSD4 regex
|
||||
packages, and to a more limited extent perl 5. </p>
|
||||
|
||||
<h3><a name="Installation"></a><i>Installation and Configuration
|
||||
Options</i> </h3>
|
||||
|
||||
<p><em>[ </em><strong><i>Important</i></strong><em>: If you are
|
||||
upgrading from the 2.x version of this library then you will find
|
||||
a number of changes to the documented header names and library
interfaces; existing code should still compile unchanged, however
|
||||
- see </em><a href="appendix.htm#upgrade"><font color="#0000FF"><em>Note
|
||||
for Upgraders</em></font></a><em>. ]</em></p>
|
||||
|
||||
<p>When you extract the library from its zip file, you must
|
||||
preserve its internal directory structure (for example by using
|
||||
the -d option when extracting). If you didn't do that when
|
||||
extracting, then you'd better stop reading this, delete the files
|
||||
you just extracted, and try again! </p>
|
||||
|
||||
<p>This library should not need configuring before use; most
|
||||
popular compilers/standard libraries/platforms are already
|
||||
supported "as is". If you do experience configuration
|
||||
problems, or just want to test the configuration with your
|
||||
compiler, then the process is the same as for all of boost; see
|
||||
the <a href="../config/config.htm">configuration library
|
||||
documentation</a>.</p>
|
||||
|
||||
<p>The library will encase all code inside namespace boost. </p>
|
||||
|
||||
<p>Unlike some other template libraries, this library consists of
|
||||
a mixture of template code (in the headers) and static code and
|
||||
data (in cpp files). Consequently it is necessary to build the
|
||||
library's support code into a library or archive file before you
can use it. Instructions for specific platforms are as follows: </p>
|
||||
|
||||
<p><b>Borland C++ Builder:</b> </p>
|
||||
|
||||
<ul>
|
||||
<li>Open up a console window and change to the
|
||||
<boost>\libs\regex\build directory. </li>
|
||||
<li>Select the appropriate makefile (bcb4.mak for C++ Builder
|
||||
4, bcb5.mak for C++ Builder 5, and bcb6.mak for C++
|
||||
Builder 6). </li>
|
||||
<li>Invoke the makefile (pass the full path to your version
of make if you have more than one version installed; the
makefile relies on the path to make to obtain your C++
Builder installation directory and tools), for example: </li>
|
||||
</ul>
|
||||
|
||||
<pre>make -fbcb5.mak</pre>
|
||||
|
||||
<p>The build process will build a variety of .lib and .dll files
(the exact number depends upon the version of Borland's tools you
are using). The .lib and .dll files will be in a sub-directory
called bcb4 or bcb5 depending upon the makefile used. To install
the libraries into your development system use:</p>
|
||||
|
||||
<p>make -fbcb5.mak install</p>
|
||||
|
||||
<p>The library files will be copied to <BCROOT>/lib and the
dll files to <BCROOT>/bin, where <BCROOT> corresponds to
the install path of your Borland C++ tools. </p>
|
||||
|
||||
<p>You may also remove temporary files created during the build
|
||||
process (excluding lib and dll files) by using:</p>
|
||||
|
||||
<p>make -fbcb5.mak clean</p>
|
||||
|
||||
<p>Finally, when you use regex++ it is only necessary for you to
add the <boost> root directory to your list of include
directories for that project. It is not necessary for you to
manually add a .lib file to the project; the headers will
automatically select the correct .lib file for your build mode
and tell the linker to include it. There is one caveat, however:
the library cannot tell the difference between VCL and non-VCL
enabled builds when building a GUI application from the command
line. If you build from the command line with the 5.5 command
line tools then you must define the pre-processor symbol _NO_VCL
in order to ensure that the correct link libraries are selected;
the C++ Builder IDE normally sets this automatically. Hint: users
of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg
in order to set this option permanently. </p>
|
||||
|
||||
<p>If you would prefer to do a static link to the regex libraries
|
||||
even when using the dll runtime then define
|
||||
BOOST_REGEX_STATIC_LINK, and if you want to suppress automatic
|
||||
linking altogether (and supply your own custom build of the lib)
|
||||
then define BOOST_REGEX_NO_LIB.</p>
|
||||
|
||||
<p>If you are building with C++ Builder 6, you will find that
<boost/regex.hpp> cannot be used in a pre-compiled header
(the actual problem is in <locale>, which gets included by
<boost/regex.hpp>). If this causes problems for you, then
try defining BOOST_NO_STD_LOCALE when building; this will disable
some features throughout boost, but may save you a lot in compile
times!</p>
|
||||
|
||||
<p><b>Microsoft Visual C++ 6</b><strong> and 7</strong></p>
|
||||
|
||||
<p>You need at least version 6 of MSVC to build this library. If you are
using VC5 then you may want to look at one of the previous
releases of this <a
href="http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm">library</a>.
</p>
|
||||
|
||||
<p>Open up a command prompt, which has the necessary MSVC
|
||||
environment variables defined (for example by using the batch
|
||||
file Vcvars32.bat installed by the Visual Studio installation),
|
||||
and change to the <boost>\libs\regex\build directory. </p>
|
||||
|
||||
<p>Select the correct makefile - vc6.mak for "vanilla"
|
||||
Visual C++ 6 or vc6-stlport.mak if you are using STLPort.</p>
|
||||
|
||||
<p>Invoke the makefile like this:</p>
|
||||
|
||||
<p>nmake -fvc6.mak</p>
|
||||
|
||||
<p>You will now have a collection of lib and dll files in a
"vc6" subdirectory. To install these into your
development system use:</p>
|
||||
|
||||
<p>nmake -fvc6.mak install</p>
|
||||
|
||||
<p>The lib files will be copied to your <VC6>\lib directory
|
||||
and the dll files to <VC6>\bin, where <VC6> is the
|
||||
root of your Visual C++ 6 installation.</p>
|
||||
|
||||
<p>You can delete all the temporary files created during the
|
||||
build (excluding lib and dll files) using:</p>
|
||||
|
||||
<p>nmake -fvc6.mak clean </p>
|
||||
|
||||
<p>Finally, when you use regex++ it is only necessary for you to
|
||||
add the <boost> root directory to your list of include
|
||||
directories for that project. It is not necessary for you to
|
||||
manually add a .lib file to the project; the headers will
|
||||
automatically select the correct .lib file for your build mode
|
||||
and tell the linker to include it. </p>
|
||||
|
||||
<p>Note that if you want to statically link to the regex library
|
||||
when using the dynamic C++ runtime, define
|
||||
BOOST_REGEX_STATIC_LINK when building your project (this only has
|
||||
an effect for release builds). If you want to add the source
|
||||
directly to your project then define BOOST_REGEX_NO_LIB to
|
||||
disable automatic library selection.</p>
|
||||
|
||||
<p><strong><i>Important</i></strong><em>: there have been some
|
||||
reports of compiler-optimisation bugs affecting this library (particularly
with VC6 versions prior to service pack 5). The workaround is to
build the library using /Oityb1 rather than /O2, that is, to use
all optimisation settings except /Oa. This problem is reported to
|
||||
affect some standard library code as well (in fact I'm not sure
|
||||
if the problem is with the regex code or the underlying standard
|
||||
library), so it's probably worthwhile applying this workaround in
|
||||
normal practice in any case.</em></p>
|
||||
|
||||
<p>Note: if you have replaced the C++ standard library that comes
|
||||
with VC6, then when you build the library you must ensure that
|
||||
the environment variables "INCLUDE" and "LIB"
|
||||
have been updated to reflect the include and library paths for
|
||||
the new library - see vcvars32.bat (part of your Visual Studio
|
||||
installation) for more details. Alternatively if STLPort is in c:/stlport
|
||||
then you could use:</p>
|
||||
|
||||
<p>nmake INCLUDES="-Ic:/stlport/stlport" XLFLAGS="/LIBPATH:c:/stlport/lib"
|
||||
-fvc6-stlport.mak</p>
|
||||
|
||||
<p>If you are building with the full STLPort v4.x, then use the
|
||||
vc6-stlport.mak file provided and set the environment variable
|
||||
STLPORT_PATH to point to the location of your STLport
|
||||
installation (Note that the full STLPort libraries appear not to
|
||||
support single-thread static builds). <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><b>GCC(2.95)</b> </p>
|
||||
|
||||
<p>There is a conservative makefile for the g++ compiler. From
|
||||
the command prompt change to the <boost>/libs/regex/build
|
||||
directory and type: </p>
|
||||
|
||||
<p>make -fgcc.mak </p>
|
||||
|
||||
<p>At the end of the build process you should have a gcc sub-directory
|
||||
containing release and debug versions of the library (libboost_regex.a
|
||||
and libboost_regex_debug.a). When you build projects that use
|
||||
regex++, you will need to add the boost install directory to your
|
||||
list of include paths and add <boost>/libs/regex/build/gcc/libboost_regex.a
|
||||
to your list of library files. </p>
|
||||
|
||||
<p>There is also a makefile to build the library as a shared
|
||||
library:</p>
|
||||
|
||||
<p>make -fgcc-shared.mak</p>
|
||||
|
||||
<p>which will build libboost_regex.so and libboost_regex_debug.so.</p>
|
||||
|
||||
<p>Both of these makefiles support the following environment
variables:</p>
|
||||
|
||||
<p>CXXFLAGS: extra compiler options - note that this applies to
|
||||
both the debug and release builds.</p>
|
||||
|
||||
<p>INCLUDES: additional include directories.</p>
|
||||
|
||||
<p>LDFLAGS: additional linker options.</p>
|
||||
|
||||
<p>LIBS: additional library files.</p>
|
||||
|
||||
<p>For the more adventurous there is a configure script in
|
||||
<boost>/libs/config; see the <a href="../config/config.htm">config
|
||||
library documentation</a>.</p>
|
||||
|
||||
<p><b>Sun Workshop 6.1</b></p>
|
||||
|
||||
<p>There is a makefile for the Sun (6.1) compiler (C++ version 3.12).
|
||||
From the command prompt change to the <boost>/libs/regex/build
|
||||
directory and type: </p>
|
||||
|
||||
<p>dmake -f sunpro.mak </p>
|
||||
|
||||
<p>At the end of the build process you should have a sunpro sub-directory
|
||||
containing single and multithread versions of the library (libboost_regex.a,
|
||||
libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so).
|
||||
When you build projects that use regex++, you will need to add
|
||||
the boost install directory to your list of include paths and add
|
||||
<boost>/libs/regex/build/sunpro/ to your library search
|
||||
path. </p>
|
||||
|
||||
<p>This makefile supports the following environment
variables:</p>
|
||||
|
||||
<p>CXXFLAGS: extra compiler options - note that this applies to
|
||||
both the single and multithreaded builds.</p>
|
||||
|
||||
<p>INCLUDES: additional include directories.</p>
|
||||
|
||||
<p>LDFLAGS: additional linker options.</p>
|
||||
|
||||
<p>LIBS: additional library files.</p>
|
||||
|
||||
<p>LIBSUFFIX: a suffix to mangle the library name with (defaults
|
||||
to nothing).</p>
|
||||
|
||||
<p>This makefile does not set any architecture specific options
like -xarch=v9; you can set these by defining the appropriate
macros, for example:</p>
|
||||
|
||||
<p>dmake CXXFLAGS="-xarch=v9" LDFLAGS="-xarch=v9"
|
||||
LIBSUFFIX="_v9" -f sunpro.mak</p>
|
||||
|
||||
<p>will build v9 variants of the regex library named
|
||||
libboost_regex_v9.a etc.</p>
|
||||
|
||||
<p><b>Other compilers:</b> </p>
|
||||
|
||||
<p>There is a generic makefile (<a href="build/generic.mak">generic.mak</a>)
|
||||
provided in <boost-root>/libs/regex/build - see that
|
||||
makefile for details of environment variables that need to be set
|
||||
before use. Alternatively you can use the <a
|
||||
href="../../tools/build/index.html">Jam based build system</a>.
|
||||
If you need to configure the library for your platform, then
|
||||
refer to the <a href="../config/config.htm">config library
|
||||
documentation</a>.</p>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
43
performance/Jamfile
Normal file
@ -0,0 +1,43 @@
|
||||
|
||||
subproject libs/regex/performance ;
|
||||
|
||||
SOURCES = command_line main time_boost time_greta time_localised_boost time_pcre time_posix time_safe_greta ;
|
||||
|
||||
if $(HS_REGEX_PATH)
|
||||
{
|
||||
HS_SOURCES = $(HS_REGEX_PATH)/regcomp.c $(HS_REGEX_PATH)/regerror.c $(HS_REGEX_PATH)/regexec.c $(HS_REGEX_PATH)/regfree.c ;
|
||||
POSIX_OPTS = <define>BOOST_HAS_POSIX=1 <include>$(HS_REGEX_PATH) ;
|
||||
}
|
||||
else if $(USE_POSIX)
|
||||
{
|
||||
POSIX_OPTS = <define>BOOST_HAS_POSIX=1 ;
|
||||
}
|
||||
|
||||
if $(PCRE_PATH)
|
||||
{
|
||||
PCRE_SOURCES = $(PCRE_PATH)/chartables.c $(PCRE_PATH)/get.c $(PCRE_PATH)/pcre.c $(PCRE_PATH)/study.c ;
|
||||
PCRE_OPTS = <define>BOOST_HAS_PCRE=1 <include>$(PCRE_PATH) ;
|
||||
}
|
||||
else if $(USE_PCRE)
|
||||
{
|
||||
PCRE_OPTS = <define>BOOST_HAS_PCRE=1 <find-library>pcre ;
|
||||
}
|
||||
|
||||
|
||||
exe regex_comparison :
|
||||
$(SOURCES).cpp
|
||||
$(HS_SOURCES)
|
||||
$(PCRE_SOURCES)
|
||||
<lib>../build/boost_regex
|
||||
<lib>../../test/build/boost_prg_exec_monitor
|
||||
:
|
||||
<include>$(BOOST_ROOT)
|
||||
<define>BOOST_REGEX_NO_LIB=1
|
||||
<define>BOOST_REGEX_STATIC_LINK=1
|
||||
$(POSIX_OPTS)
|
||||
$(PCRE_OPTS)
|
||||
;
|
||||
|
||||
|
||||
|
||||
|
470
performance/command_line.cpp
Normal file
@ -0,0 +1,470 @@
|
||||
|
||||
#include <iostream>
|
||||
#include <iomanip>
|
||||
#include <fstream>
|
||||
#include <deque>
|
||||
#include <sstream>
|
||||
#include <stdexcept>
|
||||
#include <iterator>
|
||||
#include <boost/regex.hpp>
|
||||
#include <boost/version.hpp>
|
||||
#include "regex_comparison.hpp"
|
||||
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
#include "pcre.h" // for pcre version number
|
||||
#endif
|
||||
|
||||
//
|
||||
// globals:
|
||||
//
|
||||
bool time_boost = false;
|
||||
bool time_localised_boost = false;
|
||||
bool time_greta = false;
|
||||
bool time_safe_greta = false;
|
||||
bool time_posix = false;
|
||||
bool time_pcre = false;
|
||||
|
||||
bool test_matches = false;
|
||||
bool test_code = false;
|
||||
bool test_html = false;
|
||||
bool test_short_twain = false;
|
||||
bool test_long_twain = false;
|
||||
|
||||
|
||||
std::string html_template_file;
|
||||
std::string html_out_file;
|
||||
std::string html_contents;
|
||||
std::list<results> result_list;
|
||||
|
||||
// the following let us compute averages:
|
||||
double greta_total = 0;
|
||||
double safe_greta_total = 0;
|
||||
double boost_total = 0;
|
||||
double locale_boost_total = 0;
|
||||
double posix_total = 0;
|
||||
double pcre_total = 0;
|
||||
unsigned greta_test_count = 0;
|
||||
unsigned safe_greta_test_count = 0;
|
||||
unsigned boost_test_count = 0;
|
||||
unsigned locale_boost_test_count = 0;
|
||||
unsigned posix_test_count = 0;
|
||||
unsigned pcre_test_count = 0;
|
||||
|
||||
int handle_argument(const std::string& what)
|
||||
{
|
||||
if(what == "-b")
|
||||
time_boost = true;
|
||||
else if(what == "-bl")
|
||||
time_localised_boost = true;
|
||||
#ifdef BOOST_HAS_GRETA
|
||||
else if(what == "-g")
|
||||
time_greta = true;
|
||||
else if(what == "-gs")
|
||||
time_safe_greta = true;
|
||||
#endif
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
else if(what == "-posix")
|
||||
time_posix = true;
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
else if(what == "-pcre")
|
||||
time_pcre = true;
|
||||
#endif
|
||||
else if(what == "-all")
|
||||
{
|
||||
time_boost = true;
|
||||
time_localised_boost = true;
|
||||
#ifdef BOOST_HAS_GRETA
|
||||
time_greta = true;
|
||||
time_safe_greta = true;
|
||||
#endif
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
time_posix = true;
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
time_pcre = true;
|
||||
#endif
|
||||
}
|
||||
else if(what == "-test-matches")
|
||||
test_matches = true;
|
||||
else if(what == "-test-code")
|
||||
test_code = true;
|
||||
else if(what == "-test-html")
|
||||
test_html = true;
|
||||
else if(what == "-test-short-twain")
|
||||
test_short_twain = true;
|
||||
else if(what == "-test-long-twain")
|
||||
test_long_twain = true;
|
||||
else if(what == "-test-all")
|
||||
{
|
||||
test_matches = true;
|
||||
test_code = true;
|
||||
test_html = true;
|
||||
test_short_twain = true;
|
||||
test_long_twain = true;
|
||||
}
|
||||
else if((what == "-h") || (what == "--help"))
|
||||
return show_usage();
|
||||
else if((what[0] == '-') || (what[0] == '/'))
|
||||
{
|
||||
std::cerr << "Unknown argument: \"" << what << "\"" << std::endl;
|
||||
return 1;
|
||||
}
|
||||
else if(html_template_file.size() == 0)
|
||||
{
|
||||
html_template_file = what;
|
||||
load_file(html_contents, what.c_str());
|
||||
}
|
||||
else if(html_out_file.size() == 0)
|
||||
html_out_file = what;
|
||||
else
|
||||
{
|
||||
std::cerr << "Unexpected argument: \"" << what << "\"" << std::endl;
|
||||
return 1;
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
int show_usage()
|
||||
{
|
||||
std::cout <<
|
||||
"Usage\n"
|
||||
"regex_comparison [-h] [library options] [test options] [html_template html_output_file]\n"
|
||||
" -h Show help\n\n"
|
||||
" library options:\n"
|
||||
" -b Apply tests to boost library\n"
|
||||
" -bl Apply tests to boost library with C++ locale\n"
|
||||
#ifdef BOOST_HAS_GRETA
|
||||
" -g Apply tests to GRETA library\n"
|
||||
" -gs Apply tests to GRETA library (in non-recursive mode)\n"
|
||||
#endif
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
" -posix Apply tests to POSIX library\n"
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
" -pcre Apply tests to PCRE library\n"
|
||||
#endif
|
||||
" -all Apply tests to all libraries\n\n"
|
||||
" test options:\n"
|
||||
" -test-matches Test short matches\n"
|
||||
" -test-code Test c++ code examples\n"
|
||||
" -test-html Test HTML document searches\n"
|
||||
" -test-short-twain Test short searches\n"
|
||||
" -test-long-twain Test long searches\n"
|
||||
" -test-all Test everything\n";
|
||||
return 1;
|
||||
}
|
||||
|
||||
void load_file(std::string& text, const char* file)
|
||||
{
|
||||
std::deque<char> temp_copy;
|
||||
std::ifstream is(file);
|
||||
if(!is.good())
|
||||
{
|
||||
std::string msg("Unable to open file: \"");
|
||||
msg.append(file);
|
||||
msg.append("\"");
|
||||
throw std::runtime_error(msg);
|
||||
}
|
||||
is.seekg(0, std::ios_base::end);
|
||||
std::istream::pos_type pos = is.tellg();
|
||||
is.seekg(0, std::ios_base::beg);
|
||||
text.erase();
|
||||
text.reserve(pos);
|
||||
std::istreambuf_iterator<char> it(is);
|
||||
std::copy(it, std::istreambuf_iterator<char>(), std::back_inserter(text));
|
||||
}
|
||||
|
||||
void print_result(std::ostream& os, double time, double best)
|
||||
{
|
||||
static const char* suffixes[] = {"s", "ms", "us", "ns", "ps", };
|
||||
|
||||
if(time < 0)
|
||||
{
|
||||
os << "<td>NA</td>";
|
||||
return;
|
||||
}
|
||||
double rel = time / best;
|
||||
bool highlight = ((rel > 0) && (rel < 1.1));
|
||||
unsigned suffix = 0;
|
||||
while((time < 1) && (suffix < 4)) // scale sub-second times up to ms, us, ns or ps
|
||||
{
|
||||
time *= 1000;
|
||||
++suffix;
|
||||
}
|
||||
os << "<td>";
|
||||
if(highlight)
|
||||
os << "<font color=\"#008000\">";
|
||||
if(rel <= 1000)
|
||||
os << std::setprecision(3) << rel;
|
||||
else
|
||||
os << (int)rel;
|
||||
os << "<BR>(";
|
||||
if(time <= 1000)
|
||||
os << std::setprecision(3) << time;
|
||||
else
|
||||
os << (int)time;
|
||||
os << suffixes[suffix] << ")";
|
||||
if(highlight)
|
||||
os << "</font>";
|
||||
os << "</td>";
|
||||
}
|
||||
|
||||
std::string html_quote(const std::string& in)
|
||||
{
|
||||
static const boost::regex e("(<)|(>)|(&)|(\")");
|
||||
static const std::string format("(?1&lt;)(?2&gt;)(?3&amp;)(?4&quot;)");
|
||||
return regex_replace(in, e, format, boost::match_default | boost::format_all);
|
||||
}
|
||||
|
||||
void output_html_results(bool show_description, const std::string& tagname)
|
||||
{
|
||||
std::stringstream os;
|
||||
if(result_list.size())
|
||||
{
|
||||
//
|
||||
// start by outputting the table header:
|
||||
//
|
||||
os << "<table border=\"1\" cellspacing=\"1\">\n";
|
||||
os << "<tr><td><strong>Expression</strong></td>";
|
||||
if(show_description)
|
||||
os << "<td><strong>Text</strong></td>";
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
if(time_greta == true)
|
||||
os << "<td><strong>GRETA</strong></td>";
|
||||
if(time_safe_greta == true)
|
||||
os << "<td><strong>GRETA<BR>(non-recursive mode)</strong></td>";
|
||||
#endif
|
||||
if(time_boost == true)
|
||||
os << "<td><strong>Boost</strong></td>";
|
||||
if(time_localised_boost == true)
|
||||
os << "<td><strong>Boost + C++ locale</strong></td>";
|
||||
#if defined(BOOST_HAS_POSIX)
|
||||
if(time_posix == true)
|
||||
os << "<td><strong>POSIX</strong></td>";
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
if(time_pcre == true)
|
||||
os << "<td><strong>PCRE</strong></td>";
|
||||
#endif
|
||||
os << "</tr>\n";
|
||||
|
||||
//
|
||||
// Now enumerate through all the test results:
|
||||
//
|
||||
std::list<results>::const_iterator first, last;
|
||||
first = result_list.begin();
|
||||
last = result_list.end();
|
||||
while(first != last)
|
||||
{
|
||||
os << "<tr><td><code>" << html_quote(first->expression) << "</code></td>";
|
||||
if(show_description)
|
||||
os << "<td>" << html_quote(first->description) << "</td>";
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
if(time_greta == true)
|
||||
{
|
||||
print_result(os, first->greta_time, first->factor);
|
||||
if(first->greta_time > 0)
|
||||
{
|
||||
greta_total += first->greta_time / first->factor;
|
||||
++greta_test_count;
|
||||
}
|
||||
}
|
||||
if(time_safe_greta == true)
|
||||
{
|
||||
print_result(os, first->safe_greta_time, first->factor);
|
||||
if(first->safe_greta_time > 0)
|
||||
{
|
||||
safe_greta_total += first->safe_greta_time / first->factor;
|
||||
++safe_greta_test_count;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
if(time_boost == true)
{
print_result(os, first->boost_time, first->factor);
if(first->boost_time > 0)
{
boost_total += first->boost_time / first->factor;
++boost_test_count;
}
}
if(time_localised_boost == true)
{
print_result(os, first->localised_boost_time, first->factor);
if(first->localised_boost_time > 0)
{
locale_boost_total += first->localised_boost_time / first->factor;
++locale_boost_test_count;
}
}
// the POSIX column is only emitted when BOOST_HAS_POSIX is defined,
// matching the guard used for the table header above:
#if defined(BOOST_HAS_POSIX)
if(time_posix == true)
{
print_result(os, first->posix_time, first->factor);
if(first->posix_time > 0)
{
posix_total += first->posix_time / first->factor;
++posix_test_count;
}
}
#endif
|
||||
#if defined(BOOST_HAS_PCRE)
|
||||
if(time_pcre == true)
|
||||
{
|
||||
print_result(os, first->pcre_time, first->factor);
|
||||
if(first->pcre_time > 0)
|
||||
{
|
||||
pcre_total += first->pcre_time / first->factor;
|
||||
++pcre_test_count;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
os << "</tr>\n";
|
||||
++first;
|
||||
}
|
||||
os << "</table>\n";
|
||||
result_list.clear();
|
||||
}
|
||||
else
|
||||
{
|
||||
os << "<P><I>Results not available...</I></P>\n";
|
||||
}
|
||||
|
||||
std::string result = os.str();
|
||||
|
||||
std::string::size_type pos = html_contents.find(tagname);
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
html_contents.replace(pos, tagname.size(), result);
|
||||
}
|
||||
}
|
||||
|
||||
std::string get_boost_version()
|
||||
{
|
||||
std::stringstream os;
|
||||
os << (BOOST_VERSION / 100000) << '.' << ((BOOST_VERSION / 100) % 1000) << '.' << (BOOST_VERSION % 100);
|
||||
return os.str();
|
||||
}
|
||||
|
||||
std::string get_averages_table()
|
||||
{
|
||||
std::stringstream os;
|
||||
//
|
||||
// start by outputting the table header:
|
||||
//
|
||||
os << "<table border=\"1\" cellspacing=\"1\">\n";
|
||||
os << "<tr>";
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
if(time_greta == true)
|
||||
{
|
||||
os << "<td><strong>GRETA</strong></td>";
|
||||
}
|
||||
if(time_safe_greta == true)
|
||||
{
|
||||
os << "<td><strong>GRETA<BR>(non-recursive mode)</strong></td>";
|
||||
}
|
||||
|
||||
#endif
|
||||
if(time_boost == true)
|
||||
{
|
||||
os << "<td><strong>Boost</strong></td>";
|
||||
}
|
||||
if(time_localised_boost == true)
|
||||
{
|
||||
os << "<td><strong>Boost + C++ locale</strong></td>";
|
||||
}
|
||||
#if defined(BOOST_HAS_POSIX)
|
||||
if(time_posix == true)
|
||||
{
|
||||
os << "<td><strong>POSIX</strong></td>";
|
||||
}
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
if(time_pcre == true)
|
||||
{
|
||||
os << "<td><strong>PCRE</strong></td>";
|
||||
}
|
||||
#endif
|
||||
os << "</tr>\n";
|
||||
|
||||
//
|
||||
// Now enumerate through all averages:
|
||||
//
|
||||
os << "<tr>";
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
if(time_greta == true)
|
||||
os << "<td>" << (greta_total / greta_test_count) << "</td>\n";
|
||||
if(time_safe_greta == true)
|
||||
os << "<td>" << (safe_greta_total / safe_greta_test_count) << "</td>\n";
|
||||
#endif
|
||||
if(time_boost == true)
os << "<td>" << (boost_total / boost_test_count) << "</td>\n";
if(time_localised_boost == true)
os << "<td>" << (locale_boost_total / locale_boost_test_count) << "</td>\n";
#if defined(BOOST_HAS_POSIX)
if(time_posix == true)
os << "<td>" << (posix_total / posix_test_count) << "</td>\n";
#endif
|
||||
#if defined(BOOST_HAS_PCRE)
|
||||
if(time_pcre == true)
|
||||
os << "<td>" << (pcre_total / pcre_test_count) << "</td>\n";
|
||||
#endif
|
||||
os << "</tr>\n";
|
||||
os << "</table>\n";
|
||||
return os.str();
|
||||
}
|
||||
|
||||
void output_final_html()
|
||||
{
|
||||
if(html_out_file.size())
|
||||
{
|
||||
//
|
||||
// start with search and replace ops:
|
||||
//
|
||||
std::string::size_type pos;
|
||||
pos = html_contents.find("%compiler%");
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
html_contents.replace(pos, 10, BOOST_COMPILER);
|
||||
}
|
||||
pos = html_contents.find("%library%");
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
html_contents.replace(pos, 9, BOOST_STDLIB);
|
||||
}
|
||||
pos = html_contents.find("%os%");
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
html_contents.replace(pos, 4, BOOST_PLATFORM);
|
||||
}
|
||||
pos = html_contents.find("%boost%");
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
html_contents.replace(pos, 7, get_boost_version());
|
||||
}
|
||||
pos = html_contents.find("%pcre%");
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
#ifdef PCRE_MINOR
|
||||
html_contents.replace(pos, 6, BOOST_STRINGIZE(PCRE_MAJOR.PCRE_MINOR));
|
||||
#else
|
||||
html_contents.replace(pos, 6, "N/A");
|
||||
#endif
|
||||
}
|
||||
pos = html_contents.find("%averages%");
|
||||
if(pos != std::string::npos)
|
||||
{
|
||||
html_contents.replace(pos, 10, get_averages_table());
|
||||
}
|
||||
//
|
||||
// now write the output to file:
|
||||
//
|
||||
std::ofstream os(html_out_file.c_str());
|
||||
os << html_contents;
|
||||
}
|
||||
else
|
||||
{
|
||||
std::cout << html_contents;
|
||||
}
|
||||
}
|
70
performance/input.html
Normal file
@ -0,0 +1,70 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>Regular Expression Performance Comparison</title>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
|
||||
<meta name="Template" content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
</head>
|
||||
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
|
||||
<h2>Regular Expression Performance Comparison</h2>
|
||||
<p>
|
||||
The following tables provide comparisons between the following regular
|
||||
expression libraries:</p>
|
||||
<p><a href="http://research.microsoft.com/projects/greta">GRETA</a>.</p>
|
||||
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
|
||||
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
|
||||
- this is provided for comparison as a typical non-backtracking implementation.</p>
|
||||
<P>Philip Hazel's <A href="http://www.pcre.org">PCRE</A> library.</P>
|
||||
<H3>Details</H3>
|
||||
<P>Machine: Intel Pentium 4 2.8GHz PC.</P>
|
||||
<P>Compiler: %compiler%.</P>
|
||||
<P>C++ Standard Library: %library%.</P>
|
||||
<P>OS: %os%.</P>
|
||||
<P>Boost version: %boost%.</P>
|
||||
<P>PCRE version: %pcre%.</P>
|
||||
<P>
|
||||
As ever, care should be taken in interpreting the results. Only sensible regular
expressions (rather than pathological cases) are given; most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms.</P>
|
||||
<H3>Averages</H3>
|
||||
<P>The following are the average relative scores for all the tests: the perfect
|
||||
regular expression library would score 1; in practice, anything less than 2
|
||||
is pretty good.</P>
|
||||
<P>%averages%</P>
|
||||
<h3>Comparison 1: Long Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within a long English language text was measured
|
||||
(<a href="ftp://ibiblio.org/pub/docs/books/gutenberg/etext02/mtent12.zip">mtent12.txt</a>
|
||||
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19Mb). </p>
|
||||
<P>%long_twain_search%</P>
|
||||
<h3>Comparison 2: Medium Sized Search</h3>
|
||||
<p>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within a medium sized English language text was
|
||||
measured (the first 50K from mtent12.txt). </p>
|
||||
<P>%short_twain_search%</P>
|
||||
<H3>Comparison 3: C++ Code Search</H3>
|
||||
<P>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within the C++ source file <A href="../../../boost/crc.hpp">
|
||||
boost/crc.hpp</A> was measured. </P>
|
||||
<P>%code_search%</P>
|
||||
<H3>Comparison 4: HTML Document Search</H3>
|
||||
<P>For each of the following regular expressions the time taken to find all
|
||||
occurrences of the expression within the HTML file <A href="../../libraries.htm">libs/libraries.htm</A>
|
||||
was measured. </P>
|
||||
<P>%html_search%</P>
|
||||
<H3>Comparison 5: Simple Matches</H3>
|
||||
<p>
|
||||
For each of the following regular expressions the time taken to match against
|
||||
the text indicated was measured. </p>
|
||||
<P>%short_matches%</P>
|
||||
<hr>
|
||||
<p>Copyright John Maddock April 2003, all rights reserved.</p>
|
||||
</body>
|
||||
</html>
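The "Averages" paragraph in the template above says that a perfect library would score 1. A minimal sketch of one way such a relative score could be computed follows, assuming each time is divided by the fastest valid time for the same test (which is what `results::finalise` in `regex_comparison.hpp` appears to prepare by storing the minimum in `factor`). The function `relative_scores` is hypothetical and does not appear in the benchmark sources.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of the benchmark sources.
// Divides every valid time by the fastest valid time, so the fastest
// library scores 1. Negative times mean "not measured" and map to -1.
std::vector<double> relative_scores(const std::vector<double>& times)
{
   const double unset = std::numeric_limits<double>::max();
   double factor = unset;
   // Same selection rule as results::finalise(): smallest non-negative time.
   for(std::size_t i = 0; i < times.size(); ++i)
   {
      if((times[i] >= 0) && (times[i] < factor))
         factor = times[i];
   }
   // Guard against "nothing measured" and against division by zero.
   const bool usable = (factor > 0) && (factor < unset);
   std::vector<double> scores;
   for(std::size_t i = 0; i < times.size(); ++i)
   {
      scores.push_back((usable && (times[i] >= 0)) ? times[i] / factor : -1.0);
   }
   return scores;
}
```

Under this assumption, averaging each library's scores over all tests would give the figure reported under "Averages".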
|
251
performance/main.cpp
Normal file
@ -0,0 +1,251 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include <iostream>
|
||||
#include <fstream>
|
||||
#include <iterator>
|
||||
#include <cassert>
|
||||
#include <boost/test/execution_monitor.hpp>
|
||||
#include "regex_comparison.hpp"
|
||||
|
||||
|
||||
void test_match(const std::string& re, const std::string& text, const std::string& description, bool icase)
|
||||
{
|
||||
double time;
|
||||
results r(re, description);
|
||||
|
||||
std::cout << "Testing: \"" << re << "\" against \"" << description << "\"" << std::endl;
|
||||
|
||||
#ifdef BOOST_HAS_GRETA
|
||||
if(time_greta == true)
|
||||
{
|
||||
time = g::time_match(re, text, icase);
|
||||
r.greta_time = time;
|
||||
std::cout << "\tGRETA regex: " << time << "s\n";
|
||||
}
|
||||
if(time_safe_greta == true)
|
||||
{
|
||||
time = gs::time_match(re, text, icase);
|
||||
r.safe_greta_time = time;
|
||||
std::cout << "\tSafe GRETA regex: " << time << "s\n";
|
||||
}
|
||||
#endif
|
||||
if(time_boost == true)
|
||||
{
|
||||
time = b::time_match(re, text, icase);
|
||||
r.boost_time = time;
|
||||
std::cout << "\tBoost regex: " << time << "s\n";
|
||||
}
|
||||
if(time_localised_boost == true)
|
||||
{
|
||||
time = bl::time_match(re, text, icase);
|
||||
r.localised_boost_time = time;
|
||||
std::cout << "\tBoost regex (C++ locale): " << time << "s\n";
|
||||
}
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
if(time_posix == true)
|
||||
{
|
||||
time = posix::time_match(re, text, icase);
|
||||
r.posix_time = time;
|
||||
std::cout << "\tPOSIX regex: " << time << "s\n";
|
||||
}
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
if(time_pcre == true)
|
||||
{
|
||||
time = pcr::time_match(re, text, icase);
|
||||
r.pcre_time = time;
|
||||
std::cout << "\tPCRE regex: " << time << "s\n";
|
||||
}
|
||||
#endif
|
||||
r.finalise();
|
||||
result_list.push_back(r);
|
||||
}
|
||||
|
||||
void test_find_all(const std::string& re, const std::string& text, const std::string& description, bool icase)
|
||||
{
|
||||
std::cout << "Testing: " << re << std::endl;
|
||||
|
||||
double time;
|
||||
results r(re, description);
|
||||
|
||||
#ifdef BOOST_HAS_GRETA
|
||||
if(time_greta == true)
|
||||
{
|
||||
time = g::time_find_all(re, text, icase);
|
||||
r.greta_time = time;
|
||||
std::cout << "\tGRETA regex: " << time << "s\n";
|
||||
}
|
||||
if(time_safe_greta == true)
|
||||
{
|
||||
time = gs::time_find_all(re, text, icase);
|
||||
r.safe_greta_time = time;
|
||||
std::cout << "\tSafe GRETA regex: " << time << "s\n";
|
||||
}
|
||||
#endif
|
||||
if(time_boost == true)
|
||||
{
|
||||
time = b::time_find_all(re, text, icase);
|
||||
r.boost_time = time;
|
||||
std::cout << "\tBoost regex: " << time << "s\n";
|
||||
}
|
||||
if(time_localised_boost == true)
|
||||
{
|
||||
time = bl::time_find_all(re, text, icase);
|
||||
r.localised_boost_time = time;
|
||||
std::cout << "\tBoost regex (C++ locale): " << time << "s\n";
|
||||
}
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
if(time_posix == true)
|
||||
{
|
||||
time = posix::time_find_all(re, text, icase);
|
||||
r.posix_time = time;
|
||||
std::cout << "\tPOSIX regex: " << time << "s\n";
|
||||
}
|
||||
#endif
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
if(time_pcre == true)
|
||||
{
|
||||
time = pcr::time_find_all(re, text, icase);
|
||||
r.pcre_time = time;
|
||||
std::cout << "\tPCRE regex: " << time << "s\n";
|
||||
}
|
||||
#endif
|
||||
r.finalise();
|
||||
result_list.push_back(r);
|
||||
}
|
||||
|
||||
int cpp_main(int argc, char * argv[])
|
||||
{
|
||||
// start by processing the command line args:
|
||||
if(argc < 2)
|
||||
return show_usage();
|
||||
int result = 0;
|
||||
for(int c = 1; c < argc; ++c)
|
||||
{
|
||||
result += handle_argument(argv[c]);
|
||||
}
|
||||
if(result)
|
||||
return result;
|
||||
|
||||
if(test_matches)
|
||||
{
|
||||
// start with a simple test, this is basically a measure of the minimal overhead
|
||||
// involved in calling a regex matcher:
|
||||
test_match("abc", "abc");
|
||||
// these are from the regex docs:
|
||||
test_match("^([0-9]+)(\\-| |$)(.*)$", "100- this is a line of ftp response which contains a message string");
|
||||
test_match("([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}", "1234-5678-1234-456");
|
||||
// these are from http://www.regxlib.com/
|
||||
test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "john_maddock@compuserve.com");
|
||||
test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "foo12@foo.edu");
|
||||
test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "bob.smith@foo.tv");
|
||||
test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "EH10 2QQ");
|
||||
test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "G1 1AA");
|
||||
test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "SW1 1ZZ");
|
||||
test_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "4/1/2001");
|
||||
test_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "12/12/2001");
|
||||
test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "123");
|
||||
test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "+3.14159");
|
||||
test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "-3.14159");
|
||||
}
|
||||
output_html_results(true, "%short_matches%");
|
||||
|
||||
std::string file_contents;
|
||||
|
||||
if(test_code)
|
||||
{
|
||||
load_file(file_contents, "../../../boost/crc.hpp");
|
||||
|
||||
const char* highlight_expression = // preprocessor directives: index 1
|
||||
"(^[ \t]*#(?:[^\\\\\\n]|\\\\[^\\n_[:punct:][:alnum:]]*[\\n[:punct:][:word:]])*)|"
|
||||
// comment: index 2
|
||||
"(//[^\\n]*|/\\*.*?\\*/)|"
|
||||
// literals: index 3
|
||||
"\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
|
||||
// string literals: index 4
|
||||
"('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
|
||||
// keywords: index 5
|
||||
"\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
|
||||
"|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
|
||||
"|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
|
||||
"|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
|
||||
"|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
|
||||
"|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
|
||||
"|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
|
||||
"|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
|
||||
"|using|virtual|void|volatile|wchar_t|while)\\>"
|
||||
;
|
||||
|
||||
const char* class_expression = "^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
|
||||
"(class|struct)[[:space:]]*(\\<\\w+\\>([ \t]*\\([^)]*\\))?"
|
||||
"[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?"
|
||||
"(\\{|:[^;\\{()]*\\{)";
|
||||
|
||||
const char* include_expression = "^[ \t]*#[ \t]*include[ \t]+(\"[^\"]+\"|<[^>]+>)";
|
||||
const char* boost_include_expression = "^[ \t]*#[ \t]*include[ \t]+(\"boost/[^\"]+\"|<boost/[^>]+>)";
|
||||
|
||||
|
||||
test_find_all(class_expression, file_contents);
|
||||
test_find_all(highlight_expression, file_contents);
|
||||
test_find_all(include_expression, file_contents);
|
||||
test_find_all(boost_include_expression, file_contents);
|
||||
}
|
||||
output_html_results(false, "%code_search%");
|
||||
|
||||
if(test_html)
|
||||
{
|
||||
load_file(file_contents, "../../../libs/libraries.htm");
|
||||
test_find_all("beman|john|dave", file_contents, true);
|
||||
test_find_all("<p>.*?</p>", file_contents, true);
|
||||
test_find_all("<a[^>]+href=(\"[^\"]*\"|[^[:space:]]+)[^>]*>", file_contents, true);
|
||||
test_find_all("<h[12345678][^>]*>.*?</h[12345678]>", file_contents, true);
|
||||
test_find_all("<img[^>]+src=(\"[^\"]*\"|[^[:space:]]+)[^>]*>", file_contents, true);
|
||||
test_find_all("<font[^>]+face=(\"[^\"]*\"|[^[:space:]]+)[^>]*>.*?</font>", file_contents, true);
|
||||
}
|
||||
output_html_results(false, "%html_search%");
|
||||
|
||||
if(test_short_twain)
|
||||
{
|
||||
load_file(file_contents, "short_twain.txt");
|
||||
|
||||
test_find_all("Twain", file_contents);
|
||||
test_find_all("Huck[[:alpha:]]+", file_contents);
|
||||
test_find_all("[[:alpha:]]+ing", file_contents);
|
||||
test_find_all("^[^\n]*?Twain", file_contents);
|
||||
test_find_all("Tom|Sawyer|Huckleberry|Finn", file_contents);
|
||||
test_find_all("(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)", file_contents);
|
||||
}
|
||||
output_html_results(false, "%short_twain_search%");
|
||||
|
||||
if(test_long_twain)
|
||||
{
|
||||
load_file(file_contents, "mtent13.txt");
|
||||
|
||||
test_find_all("Twain", file_contents);
|
||||
test_find_all("Huck[[:alpha:]]+", file_contents);
|
||||
test_find_all("[[:alpha:]]+ing", file_contents);
|
||||
test_find_all("^[^\n]*?Twain", file_contents);
|
||||
test_find_all("Tom|Sawyer|Huckleberry|Finn", file_contents);
|
||||
time_posix = false;
|
||||
test_find_all("(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)", file_contents);
|
||||
time_posix = true;
|
||||
}
|
||||
output_html_results(false, "%long_twain_search%");
|
||||
|
||||
output_final_html();
|
||||
return 0;
|
||||
}
|
||||
|
136
performance/regex_comparison.hpp
Normal file
@ -0,0 +1,136 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* All rights reserved.
|
||||
* May not be transferred or disclosed to a third party without
|
||||
* prior consent of the author.
|
||||
*
|
||||
*/
|
||||
|
||||
|
||||
#ifndef REGEX_COMPARISON_HPP
|
||||
#define REGEX_COMPARISON_HPP
|
||||
|
||||
#include <string>
|
||||
#include <list>
|
||||
#include <boost/limits.hpp>
|
||||
|
||||
//
|
||||
// globals:
|
||||
//
|
||||
extern bool time_boost;
|
||||
extern bool time_localised_boost;
|
||||
extern bool time_greta;
|
||||
extern bool time_safe_greta;
|
||||
extern bool time_posix;
|
||||
extern bool time_pcre;
|
||||
|
||||
extern bool test_matches;
|
||||
extern bool test_short_twain;
|
||||
extern bool test_long_twain;
|
||||
extern bool test_code;
|
||||
extern bool test_html;
|
||||
|
||||
extern std::string html_template_file;
|
||||
extern std::string html_out_file;
|
||||
extern std::string html_contents;
|
||||
|
||||
|
||||
int handle_argument(const std::string& what);
|
||||
int show_usage();
|
||||
void load_file(std::string& text, const char* file);
|
||||
void output_html_results(bool show_description, const std::string& tagname);
|
||||
void output_final_html();
|
||||
|
||||
|
||||
struct results
|
||||
{
|
||||
double boost_time;
|
||||
double localised_boost_time;
|
||||
double greta_time;
|
||||
double safe_greta_time;
|
||||
double posix_time;
|
||||
double pcre_time;
|
||||
double factor;
|
||||
std::string expression;
|
||||
std::string description;
|
||||
results(const std::string& ex, const std::string& desc)
|
||||
: boost_time(-1),
|
||||
localised_boost_time(-1),
|
||||
greta_time(-1),
|
||||
safe_greta_time(-1),
|
||||
posix_time(-1),
|
||||
pcre_time(-1),
|
||||
factor(std::numeric_limits<double>::max()),
|
||||
expression(ex),
|
||||
description(desc)
|
||||
{}
|
||||
void finalise()
|
||||
{
|
||||
if((boost_time >= 0) && (boost_time < factor))
|
||||
factor = boost_time;
|
||||
if((localised_boost_time >= 0) && (localised_boost_time < factor))
|
||||
factor = localised_boost_time;
|
||||
if((greta_time >= 0) && (greta_time < factor))
|
||||
factor = greta_time;
|
||||
if((safe_greta_time >= 0) && (safe_greta_time < factor))
|
||||
factor = safe_greta_time;
|
||||
if((posix_time >= 0) && (posix_time < factor))
|
||||
factor = posix_time;
|
||||
if((pcre_time >= 0) && (pcre_time < factor))
|
||||
factor = pcre_time;
|
||||
}
|
||||
};
|
||||
|
||||
extern std::list<results> result_list;
|
||||
|
||||
|
||||
namespace b {
|
||||
// boost tests:
|
||||
double time_match(const std::string& re, const std::string& text, bool icase);
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase);
|
||||
|
||||
}
|
||||
namespace bl {
|
||||
// localised boost tests:
|
||||
double time_match(const std::string& re, const std::string& text, bool icase);
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase);
|
||||
|
||||
}
|
||||
namespace pcr {
|
||||
// pcre tests:
|
||||
double time_match(const std::string& re, const std::string& text, bool icase);
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase);
|
||||
|
||||
}
|
||||
namespace g {
|
||||
// greta tests:
|
||||
double time_match(const std::string& re, const std::string& text, bool icase);
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase);
|
||||
|
||||
}
|
||||
namespace gs {
|
||||
// safe greta tests:
|
||||
double time_match(const std::string& re, const std::string& text, bool icase);
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase);
|
||||
|
||||
}
|
||||
namespace posix {
|
||||
// POSIX tests:
|
||||
double time_match(const std::string& re, const std::string& text, bool icase);
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase);
|
||||
|
||||
}
|
||||
void test_match(const std::string& re, const std::string& text, const std::string& description, bool icase = false);
|
||||
void test_find_all(const std::string& re, const std::string& text, const std::string& description, bool icase = false);
|
||||
inline void test_match(const std::string& re, const std::string& text, bool icase = false)
|
||||
{ test_match(re, text, text, icase); }
|
||||
inline void test_find_all(const std::string& re, const std::string& text, bool icase = false)
|
||||
{ test_find_all(re, text, "", icase); }
|
||||
|
||||
|
||||
#define REPEAT_COUNT 10
|
||||
|
||||
#endif
|
98
performance/time_boost.cpp
Normal file
@ -0,0 +1,98 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include "regex_comparison.hpp"
|
||||
#include <algorithm> // std::min
#include <boost/timer.hpp>
|
||||
#include <boost/regex.hpp>
|
||||
|
||||
namespace b{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
boost::regex e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
|
||||
boost::smatch what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_match(text, what, e);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_match(text, what, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
bool dummy_grep_proc(const boost::smatch&)
|
||||
{ return true; }
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
boost::regex e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
|
||||
boost::smatch what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_grep(&dummy_grep_proc, text, e);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
if(result >10)
|
||||
return result / iter;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_grep(&dummy_grep_proc, text, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
}
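Each timing function in this commit follows the same two-phase pattern: double the iteration count until one batch takes at least 0.5 seconds, then rerun that batch `REPEAT_COUNT` times and report the smallest time per call. A generic sketch of that pattern follows. It is not the benchmark's own code: it swaps `boost::timer` for `std::chrono::steady_clock`, and the name `time_callable` and its parameters are assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper, not part of the benchmark sources.
// Returns the best observed time in seconds for a single call of f.
template <class F>
double time_callable(F f, double min_seconds = 0.5, int repeat_count = 10)
{
   typedef std::chrono::steady_clock clock;
   long iter = 1;
   double result = 0;
   // Phase 1: calibrate. Double the batch size until one batch is long
   // enough to be measured reliably.
   for(;;)
   {
      clock::time_point start = clock::now();
      for(long counter = 0; counter < iter; ++counter)
         f();
      result = std::chrono::duration<double>(clock::now() - start).count();
      if(result >= min_seconds)
         break;
      iter *= 2;
   }
   // Phase 2: repeat the calibrated batch and keep the least value,
   // which filters out runs disturbed by other system activity.
   for(int repeats = 0; repeats < repeat_count; ++repeats)
   {
      clock::time_point start = clock::now();
      for(long counter = 0; counter < iter; ++counter)
         f();
      double run = std::chrono::duration<double>(clock::now() - start).count();
      result = std::min(run, result);
   }
   return result / iter;
}
```

A call such as `time_callable([&]{ boost::regex_match(text, what, e); })` would then stand in for the hand-written loops, at the cost of an extra indirection that the original code avoids.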
|
125
performance/time_greta.cpp
Normal file
@ -0,0 +1,125 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include "regex_comparison.hpp"
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
#include <cassert>
|
||||
#include <algorithm> // std::min
#include <boost/timer.hpp>
|
||||
#include "regexpr2.h"
|
||||
|
||||
namespace g{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE));
|
||||
regex::match_results what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
assert(e.match(text, what));
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text, what);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text, what);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE));
|
||||
regex::match_results what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text.begin(), text.end(), what);
|
||||
while(what.backref(0).matched)
|
||||
{
|
||||
e.match(what.backref(0).end(), text.end(), what);
|
||||
}
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
if(result > 10)
|
||||
return result / iter;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text.begin(), text.end(), what);
|
||||
while(what.backref(0).matched)
|
||||
{
|
||||
e.match(what.backref(0).end(), text.end(), what);
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
namespace g {
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#endif
|
||||
|
98
performance/time_localised_boost.cpp
Normal file
@ -0,0 +1,98 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include "regex_comparison.hpp"
|
||||
#include <algorithm> // std::min
#include <boost/timer.hpp>
|
||||
#include <boost/regex.hpp>
|
||||
|
||||
namespace bl{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
boost::reg_expression<char, boost::cpp_regex_traits<char> > e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
|
||||
boost::smatch what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_match(text, what, e);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_match(text, what, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
bool dummy_grep_proc(const boost::smatch&)
|
||||
{ return true; }
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
boost::reg_expression<char, boost::cpp_regex_traits<char> > e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
|
||||
boost::smatch what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_grep(&dummy_grep_proc, text, e);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
if(result >10)
|
||||
return result / iter;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
boost::regex_grep(&dummy_grep_proc, text, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
}
|
180
performance/time_pcre.cpp
Normal file
@ -0,0 +1,180 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include <cassert>
|
||||
#include <cfloat>
#include <cstdlib>   // free
#include <algorithm> // std::min
|
||||
#include "regex_comparison.hpp"
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
#include "pcre.h"
|
||||
#include <boost/timer.hpp>
|
||||
|
||||
namespace pcr{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
pcre *ppcre;
|
||||
const char *error;
|
||||
int erroffset;
|
||||
|
||||
int what[50];
|
||||
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
|
||||
if(0 == (ppcre = pcre_compile(re.c_str(), (icase ? PCRE_CASELESS | PCRE_ANCHORED | PCRE_DOTALL | PCRE_MULTILINE : PCRE_ANCHORED | PCRE_DOTALL | PCRE_MULTILINE),
|
||||
&error, &erroffset, NULL)))
|
||||
{
|
||||
free(ppcre);
|
||||
return -1;
|
||||
}
|
||||
|
||||
pcre_extra *pe;
|
||||
pe = pcre_study(ppcre, 0, &error);
|
||||
if(error)
|
||||
{
|
||||
free(ppcre);
|
||||
free(pe);
|
||||
return -1;
|
||||
}
|
||||
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
free(ppcre);
|
||||
free(pe);
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
pcre *ppcre;
|
||||
const char *error;
|
||||
int erroffset;
|
||||
|
||||
int what[50];
|
||||
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
int exec_result;
|
||||
int matches;
|
||||
|
||||
if(0 == (ppcre = pcre_compile(re.c_str(), (icase ? PCRE_CASELESS | PCRE_DOTALL | PCRE_MULTILINE : PCRE_DOTALL | PCRE_MULTILINE), &error, &erroffset, NULL)))
|
||||
{
|
||||
free(ppcre);
|
||||
return -1;
|
||||
}
|
||||
|
||||
pcre_extra *pe;
|
||||
pe = pcre_study(ppcre, 0, &error);
|
||||
if(error)
|
||||
{
|
||||
free(ppcre);
|
||||
free(pe);
|
||||
return -1;
|
||||
}
|
||||
|
||||
do
|
||||
{
|
||||
int startoff;
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
matches = 0;
|
||||
startoff = 0;
|
||||
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
|
||||
while(exec_result >= 0)
|
||||
{
|
||||
++matches;
|
||||
startoff = what[1];
|
||||
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
|
||||
}
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
if(result > 10)
{
// release the compiled pattern and study data before the early return:
free(ppcre);
free(pe);
return result / iter;
}
|
||||
|
||||
result = DBL_MAX;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
int startoff;
|
||||
matches = 0;
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
matches = 0;
|
||||
startoff = 0;
|
||||
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
|
||||
while(exec_result >= 0)
|
||||
{
|
||||
++matches;
|
||||
startoff = what[1];
|
||||
exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
// release the compiled pattern and study data, as time_match does:
free(ppcre);
free(pe);
return result / iter;
|
||||
}
|
||||
|
||||
}
|
||||
#else
|
||||
|
||||
namespace pcr{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#endif
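The PCRE find-all loop above restarts each search at `what[1]`, the end of the previous match. If an expression can match the empty string, that offset never advances and the loop would not terminate. None of the benchmark expressions appear to do this, but a guarded loop is cheap. The sketch below shows the idea with `std::regex` rather than PCRE, so it runs without external libraries; `count_matches` is a hypothetical name.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not part of the benchmark sources.
// Counts successive non-overlapping matches of e in text.
std::size_t count_matches(const std::regex& e, const std::string& text)
{
   std::size_t matches = 0;
   std::string::const_iterator start = text.begin();
   std::smatch what;
   std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
   while(std::regex_search(start, text.end(), what, e, flags))
   {
      ++matches;
      if(what[0].first == what[0].second)
      {
         // Empty match: step forward one character, or stop at the end,
         // so that the search position always advances.
         if(what[0].second == text.end())
            break;
         start = what[0].second + 1;
      }
      else
      {
         start = what[0].second;
      }
      // Later searches do not begin at the true start of the text.
      flags |= std::regex_constants::match_prev_avail;
   }
   return matches;
}
```

The equivalent change in the PCRE loop would be to compare `what[0]` with `what[1]` and bump `startoff` by one when they are equal.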
|
143
performance/time_posix.cpp
Normal file
@ -0,0 +1,143 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include <cassert>
|
||||
#include <cfloat>
#include <cstring>   // memset
#include <algorithm> // std::min
|
||||
#include "regex_comparison.hpp"
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
#include <boost/timer.hpp>
|
||||
#include "regex.h"
|
||||
|
||||
namespace posix{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
regex_t e;
|
||||
regmatch_t what[20];
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
if(0 != regcomp(&e, re.c_str(), (icase ? REG_ICASE | REG_EXTENDED : REG_EXTENDED)))
|
||||
return -1;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
regexec(&e, text.c_str(), e.re_nsub, what, 0);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
regexec(&e, text.c_str(), e.re_nsub, what, 0);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
regfree(&e);
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
regex_t e;
|
||||
regmatch_t what[20];
|
||||
memset(what, 0, sizeof(what));
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
int exec_result;
|
||||
int matches;
|
||||
if(0 != regcomp(&e, re.c_str(), (icase ? REG_ICASE | REG_EXTENDED : REG_EXTENDED)))
|
||||
return -1;
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
what[0].rm_so = 0;
|
||||
what[0].rm_eo = text.size();
|
||||
matches = 0;
|
||||
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
|
||||
while(exec_result == 0)
|
||||
{
|
||||
++matches;
|
||||
what[0].rm_so = what[0].rm_eo;
|
||||
what[0].rm_eo = text.size();
|
||||
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
|
||||
}
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
if(result > 10)
{
regfree(&e); // release the compiled expression before the early return
return result / iter;
}
|
||||
|
||||
result = DBL_MAX;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
what[0].rm_so = 0;
|
||||
what[0].rm_eo = text.size();
|
||||
matches = 0;
|
||||
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
|
||||
while(exec_result == 0)
|
||||
{
|
||||
++matches;
|
||||
what[0].rm_so = what[0].rm_eo;
|
||||
what[0].rm_eo = text.size();
|
||||
exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
regfree(&e); // release the compiled expression, as time_match does
return result / iter;
|
||||
}
|
||||
|
||||
}
|
||||
#else
|
||||
|
||||
namespace posix{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
}
|
||||
#endif
|
127
performance/time_safe_greta.cpp
Normal file
@ -0,0 +1,127 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
#include "regex_comparison.hpp"
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
|
||||
#include <cassert>
|
||||
#include <boost/timer.hpp>
|
||||
#include "regexpr2.h"
|
||||
|
||||
namespace gs{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE), regex::MODE_SAFE);
|
||||
regex::match_results what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
assert(e.match(text, what));
|
||||
do
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text, what);
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text, what);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE), regex::MODE_SAFE);
|
||||
regex::match_results what;
|
||||
boost::timer tim;
|
||||
int iter = 1;
|
||||
int counter, repeats;
|
||||
double result = 0;
|
||||
double run;
|
||||
do
|
||||
{
|
||||
bool r;
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text.begin(), text.end(), what);
|
||||
while(what.backref(0).matched)
|
||||
{
|
||||
e.match(what.backref(0).end(), text.end(), what);
|
||||
}
|
||||
}
|
||||
result = tim.elapsed();
|
||||
iter *= 2;
|
||||
}while(result < 0.5);
|
||||
iter /= 2;
|
||||
|
||||
if(result > 10)
|
||||
return result / iter;
|
||||
|
||||
// repeat test and report least value for consistency:
|
||||
for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
|
||||
{
|
||||
tim.restart();
|
||||
for(counter = 0; counter < iter; ++counter)
|
||||
{
|
||||
e.match(text.begin(), text.end(), what);
|
||||
while(what.backref(0).matched)
|
||||
{
|
||||
e.match(what.backref(0).end(), text.end(), what);
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
namespace gs{
|
||||
|
||||
double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
#endif
|
||||
|
314
posix_ref.htm
@ -1,314 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, POSIX API Reference</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, POSIX API
|
||||
Reference. </h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="posix"></a><i>POSIX compatibility library</i></h3>
|
||||
|
||||
<pre>#include <boost/cregex.hpp>
|
||||
<i>or</i>:
|
||||
#include <boost/regex.h></pre>
|
||||
|
||||
<p>The following functions are available for users who need a
POSIX compatible C library. They are available in both Unicode
and narrow character versions; the standard POSIX API names are
macros that expand to one version or the other depending upon
whether UNICODE is defined. </p>
|
||||
|
||||
<p><b>Important</b>: Note that all the symbols defined here are
|
||||
enclosed inside namespace <i>boost</i> when used in C++ programs,
|
||||
unless you use #include <boost/regex.h> instead - in which
|
||||
case the symbols are still defined in namespace boost, but are
|
||||
made available in the global namespace as well.</p>
|
||||
|
||||
<p>The functions are defined as: </p>
|
||||
|
||||
<pre>extern "C" {
|
||||
<b>int</b> regcompA(regex_tA*, <b>const</b> <b>char</b>*, <b>int</b>);
|
||||
<b>unsigned</b> <b>int</b> regerrorA(<b>int</b>, <b>const</b> regex_tA*, <b>char</b>*, <b>unsigned</b> <b>int</b>);
|
||||
<b>int</b> regexecA(<b>const</b> regex_tA*, <b>const</b> <b>char</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
|
||||
<b>void</b> regfreeA(regex_tA*);
|
||||
|
||||
<b>int</b> regcompW(regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>int</b>);
|
||||
<b>unsigned</b> <b>int</b> regerrorW(<b>int</b>, <b>const</b> regex_tW*, <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>);
|
||||
<b>int</b> regexecW(<b>const</b> regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
|
||||
<b>void</b> regfreeW(regex_tW*);
|
||||
|
||||
#ifdef UNICODE
|
||||
#define regcomp regcompW
|
||||
#define regerror regerrorW
|
||||
#define regexec regexecW
|
||||
#define regfree regfreeW
|
||||
#define regex_t regex_tW
|
||||
#else
|
||||
#define regcomp regcompA
|
||||
#define regerror regerrorA
|
||||
#define regexec regexecA
|
||||
#define regfree regfreeA
|
||||
#define regex_t regex_tA
|
||||
#endif
|
||||
}</pre>
|
||||
|
||||
<p>All the functions operate on structure <b>regex_t</b>, which
|
||||
exposes two public members: </p>
|
||||
|
||||
<p><b>unsigned int re_nsub</b> this is filled in by <b>regcomp</b>
|
||||
and indicates the number of sub-expressions contained in the
|
||||
regular expression. </p>
|
||||
|
||||
<p><b>const TCHAR* re_endp</b> points to the end of the
|
||||
expression to compile when the flag REG_PEND is set. </p>
|
||||
|
||||
<p><i>Footnote: regex_t is actually a #define: it is either
regex_tA or regex_tW depending upon whether UNICODE is defined.
Likewise, TCHAR is either char or wchar_t depending upon the
macro UNICODE.</i> </p>
|
||||
|
||||
<p><b>regcomp</b> takes a pointer to a <b>regex_t</b>, a pointer
|
||||
to the expression to compile and a flags parameter which can be a
|
||||
combination of: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_EXTENDED</td>
|
||||
<td valign="top" width="45%">Compiles modern regular
|
||||
expressions. Equivalent to regbase::char_classes |
|
||||
regbase::intervals | regbase::bk_refs.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_BASIC</td>
|
||||
<td valign="top" width="45%">Compiles basic (obsolete)
|
||||
regular expression syntax. Equivalent to regbase::char_classes
|
||||
| regbase::intervals | regbase::limited_ops | regbase::bk_braces
|
||||
| regbase::bk_parens | regbase::bk_refs.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NOSPEC</td>
|
||||
<td valign="top" width="45%">All characters are ordinary,
|
||||
the expression is a literal string.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_ICASE</td>
|
||||
<td valign="top" width="45%">Compiles for matching that
|
||||
ignores character case.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NOSUB</td>
|
||||
<td valign="top" width="45%">Has no effect in this
|
||||
library.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NEWLINE</td>
|
||||
<td valign="top" width="45%">When this flag is set a dot
|
||||
does not match the newline character.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_PEND</td>
|
||||
<td valign="top" width="45%">When this flag is set the
|
||||
re_endp parameter of the regex_t structure must point to
|
||||
the end of the regular expression to compile.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NOCOLLATE</td>
|
||||
<td valign="top" width="45%">When this flag is set then
|
||||
locale dependent collation for character ranges is turned
|
||||
off.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_ESCAPE_IN_LISTS</td>
|
||||
<td valign="top" width="45%">When this flag is set, then
|
||||
escape sequences are permitted in bracket expressions (character
|
||||
sets).</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NEWLINE_ALT </td>
|
||||
<td valign="top" width="45%">When this flag is set then
|
||||
the newline character is equivalent to the alternation
|
||||
operator |.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_PERL </td>
|
||||
<td valign="top" width="45%"> A shortcut for perl-like
|
||||
behavior: REG_EXTENDED | REG_NOCOLLATE |
|
||||
REG_ESCAPE_IN_LISTS</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_AWK</td>
|
||||
<td valign="top" width="45%">A shortcut for awk-like
|
||||
behavior: REG_EXTENDED | REG_ESCAPE_IN_LISTS</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_GREP</td>
|
||||
<td valign="top" width="45%">A shortcut for grep like
|
||||
behavior: REG_BASIC | REG_NEWLINE_ALT</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_EGREP</td>
|
||||
<td valign="top" width="45%"> A shortcut for egrep
|
||||
like behavior: REG_EXTENDED | REG_NEWLINE_ALT</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><b>regerror</b> maps an error code to a human readable string,
and takes the following parameters: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="50%">int code</td>
|
||||
<td valign="top" width="50%">The error code.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">const regex_t* e</td>
|
||||
<td valign="top" width="50%">The regular expression (can
|
||||
be null).</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">char* buf</td>
|
||||
<td valign="top" width="50%">The buffer to fill in with
|
||||
the error message.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">unsigned int buf_size</td>
|
||||
<td valign="top" width="50%">The length of buf.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p>If the error code is OR'ed with REG_ITOA then the message that
results is the printable name of the code rather than a message,
for example "REG_BADPAT". If the code is REG_ATOI then <b>e</b>
must not be null and <b>e->re_endp</b> must point to the
printable name of an error code; the return value is then the
value of the error code. For any other value of <b>code</b>, the
return value is the number of characters in the error message. If
the return value is greater than or equal to <b>buf_size</b> then
<b>regerror</b> will have to be called again with a larger buffer.</p>
|
||||
|
||||
<p><b>regexec</b> finds the first occurrence of expression <b>e</b>
|
||||
within string <b>buf</b>. If <b>len</b> is non-zero then *<b>m</b>
|
||||
is filled in with what matched the regular expression, <b>m[0]</b>
|
||||
contains what matched the whole expression, <b>m[1]</b> the first sub-expression
|
||||
etc, see <b>regmatch_t</b> in the header file declaration for
|
||||
more details. The <b>eflags</b> parameter can be a combination of:
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="50%">REG_NOTBOL</td>
|
||||
<td valign="top" width="50%">Parameter <b>buf </b>does
|
||||
not represent the start of a line.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">REG_NOTEOL</td>
|
||||
<td valign="top" width="50%">Parameter <b>buf</b> does
|
||||
not terminate at the end of a line.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">REG_STARTEND</td>
|
||||
<td valign="top" width="50%">The string searched starts
|
||||
at buf + pmatch[0].rm_so and ends at buf + pmatch[0].rm_eo.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p>Finally <b>regfree</b> frees all the memory that was allocated
|
||||
by regcomp. </p>
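<p>The four calls described above fit together roughly as follows. This is a
minimal sketch rather than the library's own code: it assumes the system's
POSIX regex.h header instead of boost/regex.h (the function names are the
same), and count_matches is a hypothetical helper name.</p>

```cpp
#include <cassert>
#include <regex.h>   // assumption: the system POSIX header, not <boost/regex.h>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `pattern` in
// `text`. Returns -1 if the pattern does not compile, in which case the
// message produced by regerror is stored in `error`.
int count_matches(const char* pattern, const char* text, std::string& error)
{
    regex_t e;
    const int code = regcomp(&e, pattern, REG_EXTENDED);
    if (code != 0)
    {
        char buf[256];
        regerror(code, &e, buf, sizeof(buf)); // map the code to a message
        error = buf;
        return -1;
    }

    int matches = 0;
    int eflags = 0;
    regmatch_t what[1];      // only the whole match (index 0) is needed
    const char* p = text;
    while (regexec(&e, p, 1, what, eflags) == 0)
    {
        ++matches;
        const char* next = p + what[0].rm_eo;
        if (what[0].rm_eo == what[0].rm_so)
        {
            // empty match: stop at the end of the text, else step one character
            if (*next == '\0')
                break;
            ++next;
        }
        p = next;
        eflags = REG_NOTBOL; // later searches no longer start at a line start
    }

    regfree(&e);             // release what regcomp allocated
    return matches;
}
```

<p>For instance, count_matches("a+", "aa b aaa", err) finds the two runs of
"a". The loop advances the string pointer itself rather than relying on
REG_STARTEND, which is an extension that not every implementation offers.</p>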
|
||||
|
||||
<p><i>Footnote: this is an abridged reference to the POSIX API
functions. They are provided for compatibility with other libraries,
rather than as an API to be used in new code (unless you need access
from a language other than C++). This version of these functions
should also happily coexist with other versions, as the names
used are macros that expand to the actual function names.</i> <br>
|
||||
</p>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
742
syntax.htm
@ -1,742 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, Regular Expression Syntax</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Regular
|
||||
Expression Syntax.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="syntax"></a><i>Regular expression syntax</i></h3>
|
||||
|
||||
<p>This section covers the regular expression syntax used by this
library. It is a programmer's guide; the actual syntax presented
to your program's users will depend upon the flags used during
expression compilation. </p>
|
||||
|
||||
<p><i>Literals</i> </p>
|
||||
|
||||
<p>All characters are literals except: ".", "|",
|
||||
"*", "?", "+", "(",
|
||||
")", "{", "}", "[",
|
||||
"]", "^", "$" and "\".
|
||||
These characters are literals when preceded by a "\". A
|
||||
literal is a character that matches itself, or matches the result
|
||||
of traits_type::translate(), where traits_type is the traits
|
||||
template parameter to class reg_expression. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Wildcard</i> </p>
|
||||
|
||||
<p>The dot character "." matches any single character,
with two exceptions: when <i>match_not_dot_null</i> is passed to the matching
algorithms, the dot does not match a null character; and when <i>match_not_dot_newline</i>
is passed to the matching algorithms, the dot does not match
a newline character. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Repeats</i> </p>
|
||||
|
||||
<p>A repeat is an expression that is repeated an arbitrary number
|
||||
of times. An expression followed by "*" can be repeated
|
||||
any number of times including zero. An expression followed by
|
||||
"+" can be repeated any number of times, but at least
|
||||
once, if the expression is compiled with the flag regbase::bk_plus_qm
|
||||
then "+" is an ordinary character and "\+"
|
||||
represents a repeat of once or more. An expression followed by
|
||||
"?" may be repeated zero or one times only, if the
|
||||
expression is compiled with the flag regbase::bk_plus_qm then
|
||||
"?" is an ordinary character and "\?"
|
||||
represents the repeat zero or once operator. When it is necessary
|
||||
to specify the minimum and maximum number of repeats explicitly,
|
||||
the bounds operator "{}" may be used, thus "a{2}"
|
||||
is the letter "a" repeated exactly twice, "a{2,4}"
|
||||
represents the letter "a" repeated between 2 and 4
|
||||
times, and "a{2,}" represents the letter "a"
|
||||
repeated at least twice with no upper limit. Note that there must
|
||||
be no white-space inside the {}, and there is no upper limit on
|
||||
the values of the lower and upper bounds. When the expression is
|
||||
compiled with the flag regbase::bk_braces then "{" and
|
||||
"}" are ordinary characters and "\{" and
|
||||
"\}" are used to delimit bounds instead. All repeat
|
||||
expressions refer to the shortest possible previous sub-expression:
|
||||
a single character; a character set, or a sub-expression grouped
|
||||
with "()" for example. </p>
|
||||
|
||||
<p>Examples: </p>
|
||||
|
||||
<p>"ba*" will match all of "b", "ba",
|
||||
"baaa" etc. </p>
|
||||
|
||||
<p>"ba+" will match "ba" or "baaaa"
|
||||
for example but not "b". </p>
|
||||
|
||||
<p>"ba?" will match "b" or "ba". </p>
|
||||
|
||||
<p>"ba{2,4}" will match "baa", "baaa"
|
||||
and "baaaa". </p>
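<p>The repeat examples above can be checked mechanically. The sketch below
uses the standard std::regex class (ECMAScript grammar) as a stand-in for this
library, on the assumption that both treat these simple repeats in the same
way; whole_match is a hypothetical helper name.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches the whole of `text`.
// std::regex stands in for the library described in this document.
bool whole_match(const std::string& pattern, const std::string& text)
{
    const std::regex e(pattern);
    return std::regex_match(text, e);
}
```

<p>With this helper, whole_match("ba+", "b") is false because "+" requires at
least one "a", while whole_match("ba{2,4}", "baaa") is true because three
repeats lie within the bounds.</p>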
|
||||
|
||||
<p><i>Non-greedy repeats</i> </p>
|
||||
|
||||
<p>Whenever the "extended" regular expression syntax is
|
||||
in use (the default) then non-greedy repeats are possible by
|
||||
appending a '?' after the repeat; a non-greedy repeat is one
|
||||
which will match the <i>shortest</i> possible string. </p>
|
||||
|
||||
<p>For example to match html tag pairs one could use something
|
||||
like: </p>
|
||||
|
||||
<p>"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
|
||||
</p>
|
||||
|
||||
<p>In this case $1 will contain the text between the tag pairs,
|
||||
and will be the shortest possible matching string. <br>
|
||||
<br>
|
||||
</p>
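<p>The difference between the greedy and non-greedy forms can be seen by
capturing the text between a pair of tags. This is a sketch only: std::regex
is assumed as a stand-in for the library, the tag name "b" is an arbitrary
choice, and first_capture is a hypothetical helper name.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns what the first marked sub-expression of
// `pattern` matched in `text`, or an empty string if nothing matched.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    const std::regex e(pattern);
    if (std::regex_search(text, what, e) && what.size() > 1)
        return what[1].str();
    return std::string();
}
```

<p>Given the input "<b>one</b> and <b>two</b>", the non-greedy expression
"<\s*b[^>]*>(.*?)<\s*/b\s*>" captures "one", whereas the same expression
written with "(.*)" captures everything up to the last closing tag.</p>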
|
||||
|
||||
<p><i>Parentheses</i> </p>
|
||||
|
||||
<p>Parentheses serve two purposes: to group items together into a
|
||||
sub-expression, and to mark what generated the match. For example
|
||||
the expression "(ab)*" would match all of the string
|
||||
"ababab". The matching algorithms <a
|
||||
href="template_class_ref.htm#query_match">regex_match</a> and <a
|
||||
href="template_class_ref.htm#reg_search">regex_search</a> each
|
||||
take an instance of <a href="template_class_ref.htm#reg_match">match_results</a>
|
||||
that reports what caused the match; on exit from these functions
|
||||
the <a href="template_class_ref.htm#reg_match">match_results</a>
|
||||
contains information both on what the whole expression matched
|
||||
and on what each sub-expression matched. In the example above
|
||||
match_results[1] would contain a pair of iterators denoting the
|
||||
final "ab" of the matching string. It is permissible
|
||||
for sub-expressions to match null strings. If a sub-expression
|
||||
takes no part in a match - for example if it is part of an
|
||||
alternative that is not taken - then both of the iterators that
|
||||
are returned for that sub-expression point to the end of the
|
||||
input string, and the <i>matched</i> parameter for that sub-expression
|
||||
is <i>false</i>. Sub-expressions are indexed from left to right
|
||||
starting from 1, sub-expression 0 is the whole expression. </p>
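<p>The behaviour of marked sub-expressions described above can be illustrated
as follows. This is a sketch that assumes std::regex and std::smatch as
stand-ins for the library's own classes; sub_info and sub_expression are
hypothetical names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Summary of one sub-expression after matching the whole input.
struct sub_info
{
    bool matched;          // did the sub-expression take part in the match?
    std::ptrdiff_t offset; // start offset in the input (valid only if matched)
    std::string text;      // what it matched (empty if not matched)
};

// Hypothetical helper: matches `pattern` against all of `text` and reports
// on sub-expression `n` (0 is the whole expression).
sub_info sub_expression(const std::string& pattern, const std::string& text, std::size_t n)
{
    sub_info r = { false, 0, std::string() };
    std::smatch what;
    const std::regex e(pattern);
    if (std::regex_match(text, what, e) && n < what.size() && what[n].matched)
    {
        r.matched = true;
        r.offset = what[n].first - text.begin();
        r.text = what[n].str();
    }
    return r;
}
```

<p>For "(ab)*" matched against "ababab", sub-expression 1 reports the final
"ab" at offset 4. For "(a)|(b)" matched against "b", sub-expression 1 takes no
part in the match and is reported as not matched.</p>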
|
||||
|
||||
<p><i>Non-Marking Parentheses</i> </p>
|
||||
|
||||
<p>Sometimes you need to group sub-expressions with parentheses,
but don't want the parentheses to spit out another marked sub-expression.
In this case a non-marking parenthesis (?:expression) can be used.
For example the following expression creates no sub-expressions: </p>
|
||||
|
||||
<p>"(?:abc)*"</p>
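<p>One way to confirm that a group is non-marking is to count the marked
sub-expressions in the compiled expression. The sketch below assumes
std::regex as a stand-in for the library and uses its mark_count member;
marked_count is a hypothetical helper name.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: number of marked sub-expressions in `pattern`,
// as reported by std::regex::mark_count.
unsigned marked_count(const std::string& pattern)
{
    const std::regex e(pattern);
    return e.mark_count();
}
```

<p>Here marked_count("(?:abc)*") is 0, while the marking form
marked_count("(abc)*") is 1.</p>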
|
||||
|
||||
<p><em>Forward Lookahead Asserts</em> </p>
|
||||
|
||||
<p>There are two forms of these; one for positive forward
|
||||
lookahead asserts, and one for negative lookahead asserts:</p>
|
||||
|
||||
<p>"(?=abc)" matches zero characters only if they are
|
||||
followed by the expression "abc".</p>
|
||||
|
||||
<p>"(?!abc)" matches zero characters only if they are
|
||||
not followed by the expression "abc".</p>
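<p>Because a lookahead assert matches zero characters, the text it tests is
not part of the reported match. A short sketch, assuming std::regex as a
stand-in for the library, with "foo" and "bar" as arbitrary sample text and
matched_text as a hypothetical helper name:</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns what the whole expression matched the first
// time it is found in `text`, or an empty string if it is not found.
std::string matched_text(const std::string& pattern, const std::string& text)
{
    std::smatch what;
    const std::regex e(pattern);
    if (std::regex_search(text, what, e))
        return what[0].str();
    return std::string();
}
```

<p>Searching "foobar" with "foo(?=bar)" yields "foo" only: the "bar" is
required but not consumed. Searching the same text with "foo(?!bar)" finds
nothing, while searching "foobaz" with it yields "foo".</p>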
|
||||
|
||||
<p><i>Alternatives</i> </p>
|
||||
|
||||
<p>Alternatives occur when the expression can match either one
sub-expression or another. Each alternative is separated by a
"|", or a "\|" if the flag regbase::bk_vbar
is set, or by a newline character if the flag regbase::newline_alt
is set. Each alternative is the largest possible previous sub-expression;
this is the opposite behaviour from repetition operators. </p>
|
||||
|
||||
<p>Examples: </p>
|
||||
|
||||
<p>"a(b|c)" could match "ab" or "ac".
|
||||
</p>
|
||||
|
||||
<p>"abc|def" could match "abc" or "def".
|
||||
<br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Sets</i> </p>
|
||||
|
||||
<p>A set is a collection of characters that matches any single
character that is a member of the set. Sets are delimited by
"[" and "]" and can contain literals,
character ranges, character classes, collating elements and
equivalence classes. Set declarations that start with "^"
contain the complement of the elements that follow. </p>
|
||||
|
||||
<p>Examples: </p>
|
||||
|
||||
<p>Character literals: </p>
|
||||
|
||||
<p>"[abc]" will match either of "a", "b",
|
||||
or "c". </p>
|
||||
|
||||
<p>"[^abc]" will match any character other than "a",
|
||||
"b", or "c". </p>
|
||||
|
||||
<p>Character ranges: </p>
|
||||
|
||||
<p>"[a-z]" will match any character in the range "a"
|
||||
to "z". </p>
|
||||
|
||||
<p>"[^A-Z]" will match any character other than those
|
||||
in the range "A" to "Z". </p>
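<p>The literal and range examples above can be tested one character at a time.
The sketch below assumes std::regex with its default settings (ranges compared
by character code, as in the "C" locale) as a stand-in for the library; in_set
is a hypothetical helper name.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the single character `c` is a member of
// the set written as `set_expr`, for example "[a-z]".
bool in_set(const std::string& set_expr, char c)
{
    const std::string s(1, c);
    const std::regex e(set_expr);
    return std::regex_match(s, e);
}
```

<p>For example in_set("[a-z]", 'q') is true and in_set("[^A-Z]", 'Q') is
false. Under a locale-sensitive collation the range results may differ, as the
note that follows explains.</p>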
|
||||
|
||||
<p>Note that character ranges are highly locale dependent: they
|
||||
match any character that collates between the endpoints of the
|
||||
range, ranges will only behave according to ASCII rules when the
|
||||
default "C" locale is in effect. For example if the
|
||||
library is compiled with the Win32 localization model, then [a-z]
|
||||
will match the ASCII characters a-z, and also 'A', 'B' etc, but
|
||||
not 'Z' which collates just after 'z'. This locale specific
|
||||
behaviour can be disabled by specifying regbase::nocollate when
|
||||
compiling, this is the default behaviour when using regbase::normal,
|
||||
and forces ranges to collate according to ASCII character code.
|
||||
Likewise, if you use the POSIX C API functions then setting
|
||||
REG_NOCOLLATE turns off locale dependent collation. </p>
|
||||
|
||||
<p>Character classes are denoted using the syntax "[:classname:]"
|
||||
within a set declaration, for example "[[:space:]]" is
|
||||
the set of all whitespace characters. Character classes are only
|
||||
available if the flag regbase::char_classes is set. The available
|
||||
character classes are: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="50%">alnum</td>
|
||||
<td valign="top" width="50%">Any alphanumeric character.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">alpha</td>
|
||||
<td valign="top" width="50%">Any alphabetical character a-z
|
||||
and A-Z. Other characters may also be included depending
|
||||
upon the locale.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">blank</td>
|
||||
<td valign="top" width="50%">Any blank character, either
|
||||
a space or a tab.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">cntrl</td>
|
||||
<td valign="top" width="50%">Any control character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">digit</td>
|
||||
<td valign="top" width="50%">Any digit 0-9.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">graph</td>
|
||||
<td valign="top" width="50%">Any graphical character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">lower</td>
|
||||
<td valign="top" width="50%">Any lower case character a-z.
|
||||
Other characters may also be included depending upon the
|
||||
locale.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">print</td>
|
||||
<td valign="top" width="50%">Any printable character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">punct</td>
|
||||
<td valign="top" width="50%">Any punctuation character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">space</td>
|
||||
<td valign="top" width="50%">Any whitespace character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">upper</td>
|
||||
<td valign="top" width="50%">Any upper case character A-Z.
|
||||
Other characters may also be included depending upon the
|
||||
locale.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">xdigit</td>
|
||||
<td valign="top" width="50%">Any hexadecimal digit
|
||||
character, 0-9, a-f and A-F.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">word</td>
|
||||
<td valign="top" width="50%">Any word character - all
|
||||
alphanumeric characters plus the underscore.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">unicode</td>
|
||||
<td valign="top" width="50%">Any character whose code is
greater than 255; this applies to the wide character
traits classes only.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
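<p>As a rough illustration of how the class names in the table behave
inside a set declaration, the sketch below uses std::regex from the C++
standard library as a stand-in for the library documented here (an
assumption made so that the example is self-contained). The helper name
find_span is hypothetical. The inputs and expected offsets mirror entries
from the regression script, such as "[[:alnum:]]+" against "-%@a0X_-".</p>

```cpp
#include <regex>
#include <string>
#include <utility>

// Returns the [start, end) offsets of the first match of `pattern` in
// `text`, or {-1, -1} if there is no match. std::regex (default ECMAScript
// grammar) is an assumption here, standing in for the documented library.
std::pair<int, int> find_span(const std::string& pattern, const std::string& text)
{
   std::regex re(pattern);
   std::smatch what;
   if (!std::regex_search(text, what, re))
      return std::make_pair(-1, -1);
   int start = static_cast<int>(what.position(0));
   return std::make_pair(start, start + static_cast<int>(what.length(0)));
}
```

<p>For example, find_span("[[:digit:]]+", "a019b") reports the span from
offset 1 to offset 4, because only "019" belongs to the digit class.</p>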
|
||||
|
||||
<p>Some shortcuts can be used in place of the character
classes. Provided the flag regbase::escape_in_lists is set, you
can use: </p>
|
||||
|
||||
<p>\w in place of [:word:] </p>
|
||||
|
||||
<p>\s in place of [:space:] </p>
|
||||
|
||||
<p>\d in place of [:digit:] </p>
|
||||
|
||||
<p>\l in place of [:lower:] </p>
|
||||
|
||||
<p>\u in place of [:upper:] <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p>Collating elements take the general form [.tagname.] inside a
set declaration, where <i>tagname</i> is either a single
character or the name of a collating element; for example,
[[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to
[,]. The library supports all the standard POSIX collating element
names, and in addition the following digraphs: "ae", "ch", "ll",
"ss", "nj", "dz" and "lj", each in lower, upper and title case
variations. Multi-character collating elements can result in the
set matching more than one character; for example, [[.ae.]] would
match two characters, but note that [^[.ae.]] would only match one
character. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p>Equivalence classes take the general form [=tagname=] inside a
set declaration, where <i>tagname</i> is either a single
character or the name of a collating element, and match any
character that is a member of the same primary equivalence class
as the collating element [.tagname.]. An equivalence class is a
set of characters that collate the same; a primary equivalence
class is a set of characters whose primary sort keys are all the
same (for example, strings are typically collated by character,
then by accent, and then by case; the primary sort key then
relates to the character, the secondary to the accentuation, and
the tertiary to the case). If there is no equivalence class
corresponding to <i>tagname</i>, then [=tagname=] is exactly the
same as [.tagname.]. Unfortunately there is no locale-independent
method of obtaining the primary sort key for a character, except
under Win32. For other operating systems the library will "guess"
the primary sort key from the full sort key (obtained from <i>strxfrm</i>),
so equivalence classes are probably best considered broken under
any operating system other than Win32. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p>To include a literal "-" in a set declaration, either make it
the first character after the opening "[" or "[^", make it the
endpoint of a range, write it as a collating element, or, if the
flag regbase::escape_in_lists is set, precede it with an escape
character as in "[\-]". To include a literal "[", "]" or "^" in a
set, make it the endpoint of a range, write it as a collating
element, or precede it with an escape character if the flag
regbase::escape_in_lists is set. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Line anchors</i> </p>
|
||||
|
||||
<p>An anchor is something that matches the null string at the
|
||||
start or end of a line: "^" matches the null string at
|
||||
the start of a line, "$" matches the null string at the
|
||||
end of a line. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Back references</i> </p>
|
||||
|
||||
<p>A back reference is a reference to a previous sub-expression
that has already been matched; the reference is to what the
sub-expression matched, not to the expression itself. A back
reference consists of the escape character "\" followed by a digit
"1" to "9": "\1" refers to the first sub-expression, "\2" to the
second, and so on. For example, the expression "(.*)\1" matches any
string that is repeated about its mid-point, such as "abcabc" or
"xyzxyz". A back reference to a sub-expression that did not
participate in any match matches the null string; note that this
differs from some other regular expression matchers. Back
references are only available if the expression is compiled with
the flag regbase::bk_refs set. <br>
|
||||
<br>
|
||||
</p>
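<p>The "(.*)\1" example can be tried out with a short sketch. It assumes
std::regex from the C++ standard library as a stand-in for the library
documented here, and the function name is_doubled is hypothetical. Only the
case where the sub-expression takes part in the match is shown; the
treatment of a back reference to an unmatched sub-expression varies between
implementations.</p>

```cpp
#include <regex>
#include <string>

// True if the whole of `text` is some string followed by an exact copy of
// itself, that is, if it matches "(.*)\1" in its entirety.
// std::regex is an assumption here, standing in for the documented library.
bool is_doubled(const std::string& text)
{
   static const std::regex re("(.*)\\1");
   return std::regex_match(text, re);
}
```

<p>With this helper, "abcabc" and "xyzxyz" are accepted, while "abcabd" is
rejected because the second half is not a copy of what the sub-expression
matched.</p>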
|
||||
|
||||
<p><i>Characters by code</i> </p>
|
||||
|
||||
<p>This is an extension to the algorithm that is not available in
other libraries; it consists of the escape character followed by
the digit "0" followed by the octal character code. For example,
"\023" represents the character whose octal code is 23. Where
ambiguity could occur, use parentheses to break the expression up:
"\0103" represents the character whose octal code is 103, whereas
"(\010)3" represents the character with octal code 10 followed by
"3". To match characters by their hexadecimal code, use \x followed
by a string of hexadecimal digits, optionally enclosed inside {},
for example \xf0 or \x{aff}; note that the latter example is a
Unicode character. <br>
|
||||
<br>
|
||||
</p>
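<p>The arithmetic behind the octal form can be sketched in a few lines.
This is an illustrative decoder only, not the library's parser. The name
octal_escape_value is hypothetical, and the sketch assumes that every octal
digit after the leading "\0" belongs to the code, which is why "\0103" and
"(\010)3" differ.</p>

```cpp
#include <string>

// Decodes an octal escape of the form "\0ddd" and returns the character
// code, or -1 if `escape` does not have that form. Assumption: all octal
// digits after the leading "\0" are part of the code.
int octal_escape_value(const std::string& escape)
{
   if (escape.size() < 2 || escape[0] != '\\' || escape[1] != '0')
      return -1;
   int value = 0;
   for (std::string::size_type i = 2; i < escape.size(); ++i)
   {
      if (escape[i] < '0' || escape[i] > '7')
         return -1;                       // not an octal digit
      value = value * 8 + (escape[i] - '0');
   }
   return value;
}
```

<p>For instance, "\0101" decodes to 65, the ASCII code for "A", and "\0172"
decodes to 122, the ASCII code for "z".</p>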
|
||||
|
||||
<p><i>Word operators</i> </p>
|
||||
|
||||
<p>The following operators are provided for compatibility with
|
||||
the GNU regular expression library. </p>
|
||||
|
||||
<p>"\w" matches any single character that is a member
of the "word" character class; this is identical to the
expression "[[:word:]]". </p>

<p>"\W" matches any single character that is not a
member of the "word" character class; this is identical
to the expression "[^[:word:]]". </p>
|
||||
|
||||
<p>"\<" matches the null string at the start of a
|
||||
word. </p>
|
||||
|
||||
<p>"\>" matches the null string at the end of a
word. </p>
|
||||
|
||||
<p>"\b" matches the null string at either the start or
|
||||
the end of a word. </p>
|
||||
|
||||
<p>"\B" matches a null string within a word. </p>
|
||||
|
||||
<p>The start of the sequence passed to the matching algorithms is
|
||||
considered to be a potential start of a word unless the flag
|
||||
match_not_bow is set. The end of the sequence passed to the
|
||||
matching algorithms is considered to be a potential end of a word
|
||||
unless the flag match_not_eow is set. <br>
|
||||
<br>
|
||||
</p>
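<p>A small sketch of the "\b" operator follows. It assumes std::regex from
the C++ standard library as a stand-in, which supports "\b" and "\B" but
not the GNU-style "\<" and "\>" operators. The function name has_whole_word
is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// True if `word` occurs in `text` with a word boundary on both sides.
// Assumes `word` holds only word characters, so it needs no escaping.
// std::regex is an assumption here, standing in for the documented library.
bool has_whole_word(const std::string& text, const std::string& word)
{
   const std::regex re("\\b" + word + "\\b");
   return std::regex_search(text, re);
}
```

<p>Here "tag" is found as a whole word in "::tag", but "ab" is not found as
a whole word in "cab", because there is no word boundary between "c" and
"a".</p>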
|
||||
|
||||
<p><i>Buffer operators</i> </p>
|
||||
|
||||
<p>The following operators are provided for compatibility with the
GNU regular expression library, and Perl regular expressions: </p>
|
||||
|
||||
<p>"\`" matches the start of a buffer. </p>
|
||||
|
||||
<p>"\A" matches the start of the buffer. </p>
|
||||
|
||||
<p>"\'" matches the end of a buffer. </p>
|
||||
|
||||
<p>"\z" matches the end of a buffer. </p>
|
||||
|
||||
<p>"\Z" matches the end of a buffer, or possibly one or
|
||||
more new line characters followed by the end of the buffer. </p>
|
||||
|
||||
<p>A buffer is considered to consist of the whole sequence passed
|
||||
to the matching algorithms, unless the flags match_not_bob or
|
||||
match_not_eob are set. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Escape operator</i> </p>
|
||||
|
||||
<p>The escape character "\" has several meanings. </p>
|
||||
|
||||
<p>Inside a set declaration the escape character is a normal
|
||||
character unless the flag regbase::escape_in_lists is set in
|
||||
which case whatever follows the escape is a literal character
|
||||
regardless of its normal meaning. </p>
|
||||
|
||||
<p>The escape operator may introduce an operator, for example
a back reference or a word operator. </p>
|
||||
|
||||
<p>The escape operator may make the following character normal,
|
||||
for example "\*" represents a literal "*"
|
||||
rather than the repeat operator. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Single character escape sequences</i> </p>
|
||||
|
||||
<p>The following escape sequences are aliases for single
|
||||
characters: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="33%">Escape sequence </td>
|
||||
<td valign="top" width="33%">Character code </td>
|
||||
<td valign="top" width="33%">Meaning </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\a </td>
|
||||
<td valign="top" width="33%">0x07 </td>
|
||||
<td valign="top" width="33%">Bell character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\f </td>
|
||||
<td valign="top" width="33%">0x0C </td>
|
||||
<td valign="top" width="33%">Form feed. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\n </td>
|
||||
<td valign="top" width="33%">0x0A </td>
|
||||
<td valign="top" width="33%">Newline character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\r </td>
|
||||
<td valign="top" width="33%">0x0D </td>
|
||||
<td valign="top" width="33%">Carriage return. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\t </td>
|
||||
<td valign="top" width="33%">0x09 </td>
|
||||
<td valign="top" width="33%">Tab character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\v </td>
|
||||
<td valign="top" width="33%">0x0B </td>
|
||||
<td valign="top" width="33%">Vertical tab. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\e </td>
|
||||
<td valign="top" width="33%">0x1B </td>
|
||||
<td valign="top" width="33%">ASCII Escape character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\0dd </td>
|
||||
<td valign="top" width="33%">0dd </td>
|
||||
<td valign="top" width="33%">An octal character code,
|
||||
where <i>dd</i> is one or more octal digits. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\xXX </td>
|
||||
<td valign="top" width="33%">0xXX </td>
|
||||
<td valign="top" width="33%">A hexadecimal character
|
||||
code, where XX is one or more hexadecimal digits. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\x{XX} </td>
|
||||
<td valign="top" width="33%">0xXX </td>
|
||||
<td valign="top" width="33%">A hexadecimal character
|
||||
code, where XX is one or more hexadecimal digits,
|
||||
optionally a unicode character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\cZ </td>
|
||||
<td valign="top" width="33%">z-@ </td>
|
||||
<td valign="top" width="33%">An ASCII escape sequence
|
||||
control-Z, where Z is any ASCII character greater than or
|
||||
equal to the character code for '@'. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
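<p>The fixed entries of the table can be written down as a lookup, and the
\cZ row as a subtraction. The sketch below is illustrative only: both
function names are hypothetical, and an ASCII execution character set is
assumed.</p>

```cpp
// Character code for a single-letter escape such as \n, taken from the
// table above; returns -1 for letters the table does not list.
int single_char_escape_code(char letter)
{
   switch (letter)
   {
   case 'a': return 0x07;  // bell
   case 'f': return 0x0C;  // form feed
   case 'n': return 0x0A;  // newline
   case 'r': return 0x0D;  // carriage return
   case 't': return 0x09;  // tab
   case 'v': return 0x0B;  // vertical tab
   case 'e': return 0x1B;  // ASCII escape
   default:  return -1;
   }
}

// Character code for \cZ: the code of Z minus the code of '@'.
// Assumes an ASCII execution character set and Z >= '@'.
int control_escape_code(char z)
{
   return z - '@';
}
```

<p>Under these assumptions \cA yields code 1 and \c[ yields 0x1B, the same
code as \e.</p>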
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><i>Miscellaneous escape sequences:</i> </p>
|
||||
|
||||
<p>The following are provided mostly for Perl compatibility, but
note that there are some differences in the meanings of \l, \L, \u
and \U: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="6" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\w </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:word:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\W </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:word:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\s </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:space:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\S </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:space:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\d </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:digit:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\D </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:digit:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\l </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:lower:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\L </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:lower:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\u </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:upper:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\U </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:upper:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\C </td>
|
||||
<td valign="top" width="45%">Any single character,
|
||||
equivalent to '.'. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\X </td>
|
||||
<td valign="top" width="45%">Match any Unicode combining
|
||||
character sequence, for example "a\x 0301" (a
|
||||
letter a with an acute). </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\Q </td>
|
||||
<td valign="top" width="45%">The begin quote operator,
|
||||
everything that follows is treated as a literal character
|
||||
until a \E end quote operator is found. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\E </td>
|
||||
<td valign="top" width="45%">The end quote operator,
|
||||
terminates a sequence begun with \Q. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><i>What gets matched?</i> </p>
|
||||
|
||||
<p>The regular expression library will match the first possible
matching string. If more than one string starting at a given
location can match, then it matches the longest possible string,
unless the flag match_any is set, in which case the first match
encountered is returned. Use of the match_any option can reduce
the time taken to find the match, but is only useful if the user
is less concerned about what matched; for example, it would not
be suitable for search and replace operations. In cases where
there are multiple possible matches all starting at the same
location, and all of the same length, the match chosen is the one
with the longest first sub-expression; if that is the same for
two or more matches, then the second sub-expression will be
examined, and so on. <br>
|
||||
</p>
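<p>The "first position, then longest" rule can be illustrated without a
regular expression engine at all. The sketch below applies it to a list of
literal alternatives only. It is not the library's matching algorithm, the
name leftmost_longest is hypothetical, and the sub-expression tie-break is
left out.</p>

```cpp
#include <string>
#include <utility>
#include <vector>

// Picks a match among literal alternatives: the earliest start position
// wins, and among candidates at that position the longest one wins.
// Returns {start, length}, or {-1, 0} if nothing matches.
std::pair<int, int> leftmost_longest(const std::string& text,
                                     const std::vector<std::string>& alternatives)
{
   for (std::string::size_type pos = 0; pos <= text.size(); ++pos)
   {
      int best = -1;
      for (const std::string& alt : alternatives)
      {
         // compare() clips the substring at the end of text, so an
         // alternative that runs past the end simply fails to match.
         if (text.compare(pos, alt.size(), alt) == 0
             && static_cast<int>(alt.size()) > best)
            best = static_cast<int>(alt.size());
      }
      if (best >= 0)
         return std::make_pair(static_cast<int>(pos), best);
   }
   return std::make_pair(-1, 0);
}
```

<p>Given the text "xxweeknights" and the alternatives "wee" and "week",
both match at offset 2, and the longer one, "week", is chosen.</p>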
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
52
test/pathology/bad_expression_test.cpp
Normal file
@ -0,0 +1,52 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 1998-2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
/*
|
||||
* LOCATION: see http://www.boost.org for most recent version.
|
||||
* FILE: bad_expression_test.cpp
|
||||
* VERSION: see <boost/version.hpp>
|
||||
* DESCRIPTION: Test for indefinite recursion and/or stack overrun.
|
||||
*/
|
||||
|
||||
#include <string>
#include <boost/regex.hpp>
#include <boost/test/test_tools.hpp>

int test_main( int argc, char* argv[] )
{
   std::string bad_text(1024, ' ');
   std::string good_text(200, ' ');
   good_text.append("xyz");

   boost::smatch what;

   boost::regex e1("(.+)+xyz");

   BOOST_CHECK(boost::regex_search(good_text, what, e1));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e1), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e1));

   BOOST_CHECK(boost::regex_match(good_text, what, e1));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e1), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e1));

   boost::regex e2("abc|[[:space:]]+(xyz)?[[:space:]]+xyz");

   BOOST_CHECK(boost::regex_search(good_text, what, e2));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e2), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e2));

   return 0;
}
|
63
test/pathology/recursion_test.cpp
Normal file
@ -0,0 +1,63 @@
|
||||
/*
|
||||
*
|
||||
* Copyright (c) 1998-2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Permission to use, copy, modify, distribute and sell this software
|
||||
* and its documentation for any purpose is hereby granted without fee,
|
||||
* provided that the above copyright notice appear in all copies and
|
||||
* that both that copyright notice and this permission notice appear
|
||||
* in supporting documentation. Dr John Maddock makes no representations
|
||||
* about the suitability of this software for any purpose.
|
||||
* It is provided "as is" without express or implied warranty.
|
||||
*
|
||||
*/
|
||||
|
||||
/*
|
||||
* LOCATION: see http://www.boost.org for most recent version.
|
||||
* FILE: recursion_test.cpp
|
||||
* VERSION: see <boost/version.hpp>
|
||||
* DESCRIPTION: Test for indefinite recursion and/or stack overrun.
|
||||
*/
|
||||
|
||||
#include <string>
#include <boost/regex.hpp>
#include <boost/test/test_tools.hpp>

int test_main( int argc, char* argv[] )
{
   // this regex will recurse twice for each whitespace character matched:
   boost::regex e("([[:space:]]|.)+");

   std::string bad_text(1024*1024*4, ' ');
   std::string good_text(200, ' ');

   boost::smatch what;

   //
   // Over and over: We want to make sure that after a stack error has
   // been triggered, that we can still conduct a good search and that
   // subsequent stack failures still do the right thing:
   //
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_search(good_text, what, e));

   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));
   BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
   BOOST_CHECK(boost::regex_match(good_text, what, e));

   return 0;
}
|
908
test/regress/v3_tests.txt
Normal file
@ -0,0 +1,908 @@
|
||||
;
|
||||
;
|
||||
; this file contains a script of tests to run through regress.exe
|
||||
;
|
||||
; comments start with a semicolon and proceed to the end of the line
|
||||
;
|
||||
; changes to regular expression compile flags start with a "-" as the first
|
||||
; non-whitespace character and consist of a list of the printable names
|
||||
; of the flags, for example "match_default"
|
||||
;
|
||||
; Other lines contain a test to perform using the current flag status
|
||||
; the first token contains the expression to compile, the second the string
|
||||
; to match it against. If the second string is "!" then the expression should
|
||||
; not compile, that is the first string is an invalid regular expression.
|
||||
; This is then followed by a list of integers that specify what should match,
|
||||
; each pair represents the starting and ending positions of a subexpression
|
||||
; starting with the zeroth subexpression (the whole match).
|
||||
; A value of -1 indicates that the subexpression should not take part in the
|
||||
; match at all, if the first value is -1 then no part of the expression should
|
||||
; match the string.
|
||||
;
|
||||
|
||||
- match_default normal REG_EXTENDED
|
||||
|
||||
;
|
||||
; try some really simple literals:
|
||||
a a 0 1
|
||||
Z Z 0 1
|
||||
Z aaa -1 -1
|
||||
Z xxxxZZxxx 4 5
|
||||
|
||||
; and some simple brackets:
|
||||
(a) zzzaazz 3 4 3 4
|
||||
() zzz 0 0 0 0
|
||||
() "" 0 0 0 0
|
||||
( !
|
||||
) !
|
||||
(aa !
|
||||
aa) !
|
||||
a b -1 -1
|
||||
\(\) () 0 2
|
||||
\(a\) (a) 0 3
|
||||
\() !
|
||||
(\) !
|
||||
p(a)rameter ABCparameterXYZ 3 12 4 5
|
||||
[pq](a)rameter ABCparameterXYZ 3 12 4 5
|
||||
|
||||
; now try escaped brackets:
|
||||
- match_default bk_parens REG_BASIC
|
||||
\(a\) zzzaazz 3 4 3 4
|
||||
\(\) zzz 0 0 0 0
|
||||
\(\) "" 0 0 0 0
|
||||
\( !
|
||||
\) !
|
||||
\(aa !
|
||||
aa\) !
|
||||
() () 0 2
|
||||
(a) (a) 0 3
|
||||
(\) !
|
||||
\() !
|
||||
|
||||
; now move on to "." wildcards
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
. a 0 1
|
||||
. \n 0 1
|
||||
. \r 0 1
|
||||
. \0 0 1
|
||||
- match_default normal match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE
|
||||
. a 0 1
|
||||
. \n -1 -1
|
||||
. \r -1 -1
|
||||
. \0 0 1
|
||||
- match_default normal match_not_dot_null match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE
|
||||
. \n -1 -1
|
||||
. \r -1 -1
|
||||
; this *WILL* produce an error from the POSIX API functions:
|
||||
- match_default normal match_not_dot_null match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE REG_NO_POSIX_TEST
|
||||
. \0 -1 -1
|
||||
|
||||
|
||||
;
|
||||
; now move on to the repetition ops,
|
||||
; starting with operator *
|
||||
- match_default normal REG_EXTENDED
|
||||
a* b 0 0
|
||||
ab* a 0 1
|
||||
ab* ab 0 2
|
||||
ab* sssabbbbbbsss 3 10
|
||||
ab*c* a 0 1
|
||||
ab*c* abbb 0 4
|
||||
ab*c* accc 0 4
|
||||
ab*c* abbcc 0 5
|
||||
*a !
|
||||
\<* !
|
||||
\>* !
|
||||
\n* \n\n 0 2
|
||||
\** ** 0 2
|
||||
\* * 0 1
|
||||
|
||||
; now try operator +
|
||||
ab+ a -1 -1
|
||||
ab+ ab 0 2
|
||||
ab+ sssabbbbbbsss 3 10
|
||||
ab+c+ a -1 -1
|
||||
ab+c+ abbb -1 -1
|
||||
ab+c+ accc -1 -1
|
||||
ab+c+ abbcc 0 5
|
||||
+a !
|
||||
\<+ !
|
||||
\>+ !
|
||||
\n+ \n\n 0 2
|
||||
\+ + 0 1
|
||||
\+ ++ 0 1
|
||||
\++ ++ 0 2
|
||||
- match_default normal bk_plus_qm REG_EXTENDED REG_NO_POSIX_TEST
|
||||
+ + 0 1
|
||||
\+ !
|
||||
a\+ aa 0 2
|
||||
|
||||
; now try operator ?
|
||||
- match_default normal REG_EXTENDED
|
||||
a? b 0 0
|
||||
ab? a 0 1
|
||||
ab? ab 0 2
|
||||
ab? sssabbbbbbsss 3 5
|
||||
ab?c? a 0 1
|
||||
ab?c? abbb 0 2
|
||||
ab?c? accc 0 2
|
||||
ab?c? abcc 0 3
|
||||
?a !
|
||||
\<? !
|
||||
\>? !
|
||||
\n? \n\n 0 1
|
||||
\? ? 0 1
|
||||
\? ?? 0 1
|
||||
\?? ?? 0 1
|
||||
- match_default normal bk_plus_qm REG_EXTENDED REG_NO_POSIX_TEST
|
||||
? ? 0 1
|
||||
\? !
|
||||
a\? aa 0 1
|
||||
a\? b 0 0
|
||||
|
||||
- match_default normal limited_ops
|
||||
a? a? 0 2
|
||||
a+ a+ 0 2
|
||||
a\? a? 0 2
|
||||
a\+ a+ 0 2
|
||||
|
||||
; now try operator {}
|
||||
- match_default normal REG_EXTENDED
|
||||
a{2} a -1 -1
|
||||
a{2} aa 0 2
|
||||
a{2} aaa 0 2
|
||||
a{2,} a -1 -1
|
||||
a{2,} aa 0 2
|
||||
a{2,} aaaaa 0 5
|
||||
a{2,4} a -1 -1
|
||||
a{2,4} aa 0 2
|
||||
a{2,4} aaa 0 3
|
||||
a{2,4} aaaa 0 4
|
||||
a{2,4} aaaaa 0 4
|
||||
; spaces are now allowed inside {}
|
||||
"a{ 2 , 4 }" aaaaa 0 4
|
||||
a{} !
|
||||
"a{ }" !
|
||||
a{2 !
|
||||
a} !
|
||||
\{\} {} 0 2
|
||||
|
||||
- match_default normal bk_braces
|
||||
a\{2\} a -1 -1
|
||||
a\{2\} aa 0 2
|
||||
a\{2\} aaa 0 2
|
||||
a\{2,\} a -1 -1
|
||||
a\{2,\} aa 0 2
|
||||
a\{2,\} aaaaa 0 5
|
||||
a\{2,4\} a -1 -1
|
||||
a\{2,4\} aa 0 2
|
||||
a\{2,4\} aaa 0 3
|
||||
a\{2,4\} aaaa 0 4
|
||||
a\{2,4\} aaaaa 0 4
|
||||
"a\{ 2 , 4 \}" aaaaa 0 4
|
||||
{} {} 0 2
|
||||
|
||||
; now test the alternation operator |
|
||||
- match_default normal REG_EXTENDED
|
||||
a|b a 0 1
|
||||
a|b b 0 1
|
||||
a(b|c) ab 0 2 1 2
|
||||
a(b|c) ac 0 2 1 2
|
||||
a(b|c) ad -1 -1 -1 -1
|
||||
|c !
|
||||
c| !
|
||||
(|) !
|
||||
(a|) !
|
||||
(|a) !
|
||||
a\| a| 0 2
|
||||
- match_default normal limited_ops
|
||||
a| a| 0 2
|
||||
a\| a| 0 2
|
||||
| | 0 1
|
||||
- match_default normal bk_vbar REG_NO_POSIX_TEST
|
||||
a| a| 0 2
|
||||
a\|b a 0 1
|
||||
a\|b b 0 1
|
||||
|
||||
; now test the set operator []
|
||||
- match_default normal REG_EXTENDED
|
||||
; try some literals first
|
||||
[abc] a 0 1
|
||||
[abc] b 0 1
|
||||
[abc] c 0 1
|
||||
[abc] d -1 -1
|
||||
[^bcd] a 0 1
|
||||
[^bcd] b -1 -1
|
||||
[^bcd] d -1 -1
|
||||
[^bcd] e 0 1
|
||||
a[b]c abc 0 3
|
||||
a[ab]c abc 0 3
|
||||
a[^ab]c adc 0 3
|
||||
a[]b]c a]c 0 3
|
||||
a[[b]c a[c 0 3
|
||||
a[-b]c a-c 0 3
|
||||
a[^]b]c adc 0 3
|
||||
a[^-b]c adc 0 3
|
||||
a[b-]c a-c 0 3
|
||||
a[b !
|
||||
a[] !
|
||||
|
||||
; then some ranges
|
||||
[b-e] a -1 -1
|
||||
[b-e] b 0 1
|
||||
[b-e] e 0 1
|
||||
[b-e] f -1 -1
|
||||
[^b-e] a 0 1
|
||||
[^b-e] b -1 -1
|
||||
[^b-e] e -1 -1
|
||||
[^b-e] f 0 1
|
||||
a[1-3]c a2c 0 3
|
||||
a[3-1]c !
|
||||
a[1-3-5]c !
|
||||
a[1- !
|
||||
|
||||
; and some classes
|
||||
a[[:alpha:]]c abc 0 3
|
||||
a[[:unknown:]]c !
|
||||
a[[: !
|
||||
a[[:alpha !
|
||||
a[[:alpha:] !
|
||||
a[[:alpha,:] !
|
||||
a[[:]:]]b !
|
||||
a[[:-:]]b !
|
||||
a[[:alph:]] !
|
||||
a[[:alphabet:]] !
|
||||
[[:alnum:]]+ -%@a0X_- 3 6
|
||||
[[:alpha:]]+ -%@aX_0- 3 5
|
||||
[[:blank:]]+ "a \tb" 1 4
|
||||
[[:cntrl:]]+ a\n\tb 1 3
|
||||
[[:digit:]]+ a019b 1 4
|
||||
[[:graph:]]+ " a%b " 1 4
|
||||
[[:lower:]]+ AabC 1 3
|
||||
; This test fails with STLPort, disable for now as this is a corner case anyway...
|
||||
;[[:print:]]+ "\na b\n" 1 4
|
||||
[[:punct:]]+ " %-&\t" 1 4
|
||||
[[:space:]]+ "a \n\t\rb" 1 5
|
||||
[[:upper:]]+ aBCd 1 3
|
||||
[[:xdigit:]]+ p0f3Cx 1 5
|
||||
|
||||
; now test flag settings:
|
||||
- escape_in_lists REG_NO_POSIX_TEST
|
||||
[\n] \n 0 1
|
||||
- REG_NO_POSIX_TEST
|
||||
[\n] \n -1 -1
|
||||
[\n] \\ 0 1
|
||||
[[:class:] : 0 1
|
||||
[[:class:] [ 0 1
|
||||
[[:class:] c 0 1
|
||||
|
||||
; line anchors
|
||||
- match_default normal REG_EXTENDED
|
||||
^ab ab 0 2
|
||||
^ab xxabxx -1 -1
|
||||
^ab xx\nabzz 3 5
|
||||
ab$ ab 0 2
|
||||
ab$ abxx -1 -1
|
||||
ab$ ab\nzz 0 2
|
||||
- match_default match_not_bol match_not_eol normal REG_EXTENDED REG_NOTBOL REG_NOTEOL
|
||||
^ab ab -1 -1
|
||||
^ab xxabxx -1 -1
|
||||
^ab xx\nabzz 3 5
|
||||
ab$ ab -1 -1
|
||||
ab$ abxx -1 -1
|
||||
ab$ ab\nzz 0 2
|
||||
|
||||
; back references
|
||||
- match_default normal REG_EXTENDED
|
||||
a(b)\2c !
|
||||
a(b\1)c !
|
||||
a(b*)c\1d abbcbbd 0 7 1 3
|
||||
a(b*)c\1d abbcbd -1 -1
|
||||
a(b*)c\1d abbcbbbd -1 -1
|
||||
^(.)\1 abc -1 -1
|
||||
a([bc])\1d abcdabbd 4 8 5 6
|
||||
; strictly speaking this is at best ambiguous, at worst wrong, this is what most
|
||||
; re implementations will match though.
|
||||
a(([bc])\2)*d abbccd 0 6 3 5 3 4
|
||||
|
||||
a(([bc])\2)*d abbcbd -1 -1
|
||||
a((b)*\2)*d abbbd 0 5 1 4 2 3
|
||||
(ab*)[ab]*\1 ababaaa 0 7 0 1
|
||||
(a)\1bcd aabcd 0 5 0 1
|
||||
(a)\1bc*d aabcd 0 5 0 1
|
||||
(a)\1bc*d aabd 0 4 0 1
|
||||
(a)\1bc*d aabcccd 0 7 0 1
|
||||
(a)\1bc*[ce]d aabcccd 0 7 0 1
|
||||
^(a)\1b(c)*cd$ aabcccd 0 7 0 1 4 5
|
||||
|
||||
;
|
||||
; characters by code:
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
\0101 A 0 1
|
||||
\00 \0 0 1
|
||||
\0 \0 0 1
|
||||
\0172 z 0 1
|
||||
|
||||
;
|
||||
; word operators:
|
||||
\w a 0 1
|
||||
\w z 0 1
|
||||
\w A 0 1
|
||||
\w Z 0 1
|
||||
\w _ 0 1
|
||||
\w } -1 -1
|
||||
\w ` -1 -1
|
||||
\w [ -1 -1
|
||||
\w @ -1 -1
|
||||
; non-word:
|
||||
\W a -1 -1
|
||||
\W z -1 -1
|
||||
\W A -1 -1
|
||||
\W Z -1 -1
|
||||
\W _ -1 -1
|
||||
\W } 0 1
|
||||
\W ` 0 1
|
||||
\W [ 0 1
|
||||
\W @ 0 1
|
||||
; word start:
|
||||
\<abcd " abcd" 2 6
|
||||
\<ab cab -1 -1
|
||||
\<ab "\nab" 1 3
|
||||
\<tag ::tag 2 5
|
||||
;word end:
|
||||
abc\> abc 0 3
|
||||
abc\> abcd -1 -1
|
||||
abc\> abc\n 0 3
|
||||
abc\> abc:: 0 3
|
||||
; word boundary:
|
||||
\babcd " abcd" 2 6
|
||||
\bab cab -1 -1
|
||||
\bab "\nab" 1 3
|
||||
\btag ::tag 2 5
|
||||
abc\b abc 0 3
|
||||
abc\b abcd -1 -1
|
||||
abc\b abc\n 0 3
|
||||
abc\b abc:: 0 3
|
||||
; within word:
|
||||
\B ab 1 1
|
||||
a\Bb ab 0 2
|
||||
a\B ab 0 1
|
||||
a\B a -1 -1
|
||||
a\B "a " -1 -1
|
||||
|
||||
;
|
||||
; buffer operators:
|
||||
\`abc abc 0 3
|
||||
\`abc \nabc -1 -1
|
||||
\`abc " abc" -1 -1
|
||||
abc\' abc 0 3
|
||||
abc\' abc\n -1 -1
|
||||
abc\' "abc " -1 -1
|
||||
|
||||
;
|
||||
; extra escape sequences:
|
||||
\a \a 0 1
|
||||
\f \f 0 1
|
||||
\n \n 0 1
|
||||
\r \r 0 1
|
||||
\t \t 0 1
|
||||
\v \v 0 1
|
||||
|
||||
|
||||
;
|
||||
; now follows various complex expressions designed to try and bust the matcher:
|
||||
a(((b)))c abc 0 3 1 2 1 2 1 2
|
||||
a(b|(c))d abd 0 3 1 2 -1 -1
|
||||
a(b|(c))d acd 0 3 1 2 1 2
|
||||
a(b*|c)d abbd 0 4 1 3
|
||||
; just gotta have one DFA-buster, of course
|
||||
a[ab]{20} aaaaabaaaabaaaabaaaab 0 21
|
||||
; and an inline expansion in case somebody gets tricky
|
||||
a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab] aaaaabaaaabaaaabaaaab 0 21
|
||||
; and in case somebody just slips in an NFA...
|
||||
a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab](wee|week)(knights|night) aaaaabaaaabaaaabaaaabweeknights 0 31 21 24 24 31
|
||||
; one really big one
|
||||
1234567890123456789012345678901234567890123456789012345678901234567890 a1234567890123456789012345678901234567890123456789012345678901234567890b 1 71
|
||||
; fish for problems as brackets go past 8
|
||||
[ab][cd][ef][gh][ij][kl][mn] xacegikmoq 1 8
|
||||
[ab][cd][ef][gh][ij][kl][mn][op] xacegikmoq 1 9
|
||||
[ab][cd][ef][gh][ij][kl][mn][op][qr] xacegikmoqy 1 10
|
||||
[ab][cd][ef][gh][ij][kl][mn][op][q] xacegikmoqy 1 10
|
||||
; and as parentheses go past 9:
|
||||
(a)(b)(c)(d)(e)(f)(g)(h) zabcdefghi 1 9 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
|
||||
(a)(b)(c)(d)(e)(f)(g)(h)(i) zabcdefghij 1 10 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
|
||||
(a)(b)(c)(d)(e)(f)(g)(h)(i)(j) zabcdefghijk 1 11 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11
|
||||
(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k) zabcdefghijkl 1 12 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12
|
||||
(a)d|(b)c abc 1 3 -1 -1 1 2
|
||||
"_+((www)|(ftp)|(mailto)):_*" "_wwwnocolon _mailto:" 12 20 13 19 -1 -1 -1 -1 13 19
|
||||
|
||||
; subtleties of matching
|
||||
a(b)?c\1d acd 0 3 -1 -1
|
||||
a(b?c)+d accd 0 4 2 3
|
||||
(wee|week)(knights|night) weeknights 0 10 0 3 3 10
|
||||
.* abc 0 3
|
||||
a(b|(c))d abd 0 3 1 2 -1 -1
|
||||
a(b|(c))d acd 0 3 1 2 1 2
|
||||
a(b*|c|e)d abbd 0 4 1 3
|
||||
a(b*|c|e)d acd 0 3 1 2
|
||||
a(b*|c|e)d ad 0 2 1 1
|
||||
a(b?)c abc 0 3 1 2
|
||||
a(b?)c ac 0 2 1 1
|
||||
a(b+)c abc 0 3 1 2
|
||||
a(b+)c abbbc 0 5 1 4
|
||||
a(b*)c ac 0 2 1 1
|
||||
(a|ab)(bc([de]+)f|cde) abcdef 0 6 0 1 1 6 3 5
|
||||
a([bc]?)c abc 0 3 1 2
|
||||
a([bc]?)c ac 0 2 1 1
|
||||
a([bc]+)c abc 0 3 1 2
|
||||
a([bc]+)c abcc 0 4 1 3
|
||||
a([bc]+)bc abcbc 0 5 1 3
|
||||
a(bb+|b)b abb 0 3 1 2
|
||||
a(bbb+|bb+|b)b abb 0 3 1 2
|
||||
a(bbb+|bb+|b)b abbb 0 4 1 3
|
||||
a(bbb+|bb+|b)bb abbb 0 4 1 2
|
||||
(.*).* abcdef 0 6 0 6
|
||||
(a*)* bc 0 0 0 0
|
||||
|
||||
; do we get the right subexpression when it is used more than once?
|
||||
a(b|c)*d ad 0 2 -1 -1
|
||||
a(b|c)*d abcd 0 4 2 3
|
||||
a(b|c)+d abd 0 3 1 2
|
||||
a(b|c)+d abcd 0 4 2 3
|
||||
a(b|c?)+d ad 0 2 1 1
|
||||
a(b|c?)+d abcd 0 4 2 3
|
||||
a(b|c){0,0}d ad 0 2 -1 -1
|
||||
a(b|c){0,1}d ad 0 2 -1 -1
|
||||
a(b|c){0,1}d abd 0 3 1 2
|
||||
a(b|c){0,2}d ad 0 2 -1 -1
|
||||
a(b|c){0,2}d abcd 0 4 2 3
|
||||
a(b|c){0,}d ad 0 2 -1 -1
|
||||
a(b|c){0,}d abcd 0 4 2 3
|
||||
a(b|c){1,1}d abd 0 3 1 2
|
||||
a(b|c){1,2}d abd 0 3 1 2
|
||||
a(b|c){1,2}d abcd 0 4 2 3
|
||||
a(b|c){1,}d abd 0 3 1 2
|
||||
a(b|c){1,}d abcd 0 4 2 3
|
||||
a(b|c){2,2}d acbd 0 4 2 3
|
||||
a(b|c){2,2}d abcd 0 4 2 3
|
||||
a(b|c){2,4}d abcd 0 4 2 3
|
||||
a(b|c){2,4}d abcbd 0 5 3 4
|
||||
a(b|c){2,4}d abcbcd 0 6 4 5
|
||||
a(b|c){2,}d abcd 0 4 2 3
|
||||
a(b|c){2,}d abcbd 0 5 3 4
|
||||
a(b+|((c)*))+d abd 0 3 1 2 -1 -1 -1 -1
|
||||
a(b+|((c)*))+d abcd 0 4 2 3 2 3 2 3
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_NOSPEC literal
|
||||
\**?/{} \\**?/{} 0 7
|
||||
|
||||
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST ; we disable POSIX testing because it can't handle escapes in sets
|
||||
; try to match C++ syntax elements:
|
||||
; line comment:
|
||||
//[^\n]* "++i //here is a line comment\n" 4 28
|
||||
; block comment:
|
||||
/\*([^*]|\*+[^*/])*\*+/ "/* here is a block comment */" 0 29 26 27
|
||||
/\*([^*]|\*+[^*/])*\*+/ "/**/" 0 4 -1 -1
|
||||
/\*([^*]|\*+[^*/])*\*+/ "/***/" 0 5 -1 -1
|
||||
/\*([^*]|\*+[^*/])*\*+/ "/****/" 0 6 -1 -1
|
||||
/\*([^*]|\*+[^*/])*\*+/ "/*****/" 0 7 -1 -1
|
||||
/\*([^*]|\*+[^*/])*\*+/ "/*****/*/" 0 7 -1 -1
|
||||
; preprocessor directives:
|
||||
^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol" 0 19 -1 -1
|
||||
^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) #x" 0 25 -1 -1
|
||||
^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) \\ \r\n foo();\\\r\n printf(#x);" 0 53 28 42
|
||||
; literals:
|
||||
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFF 0 4 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1
|
||||
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 35 0 2 0 2 -1 -1 0 2 -1 -1 -1 -1 -1 -1
|
||||
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFu 0 5 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1
|
||||
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFL 0 5 0 4 0 4 -1 -1 4 5 -1 -1 -1 -1
|
||||
((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFFFFFFFFFFFFFFFuint64 0 24 0 18 0 18 -1 -1 19 24 19 24 22 24
|
||||
; strings:
|
||||
'([^\\']|\\.)*' '\\x3A' 0 6 4 5
|
||||
'([^\\']|\\.)*' '\\'' 0 4 1 3
|
||||
'([^\\']|\\.)*' '\\n' 0 4 1 3
|
||||
|
||||
; now try to test some Unicode-specific characters:
|
||||
- match_default normal REG_PERL REG_UNICODE_ONLY
|
||||
[[:unicode:]]+ a\0300\0400z 1 3
|
||||
[\x10-\xff] \39135\12409 -1 -1
|
||||
[\01-\05]{5} \36865\36865\36865\36865\36865 -1 -1
|
||||
|
||||
; finally try some case insensitive matches:
|
||||
- match_default normal REG_EXTENDED REG_ICASE
|
||||
; upper and lower have no meaning here, so they fail; however, these
|
||||
; may compile with other libraries...
|
||||
;[[:lower:]] !
|
||||
;[[:upper:]] !
|
||||
0123456789@abcdefghijklmnopqrstuvwxyz\[\\\]\^_`ABCDEFGHIJKLMNOPQRSTUVWXYZ\{\|\} 0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\} 0 72
|
||||
|
||||
; known and suspected bugs:
|
||||
- match_default normal REG_EXTENDED
|
||||
\( ( 0 1
|
||||
\) ) 0 1
|
||||
\$ $ 0 1
|
||||
\^ ^ 0 1
|
||||
\. . 0 1
|
||||
\* * 0 1
|
||||
\+ + 0 1
|
||||
\? ? 0 1
|
||||
\[ [ 0 1
|
||||
\] ] 0 1
|
||||
\| | 0 1
|
||||
\\ \\ 0 1
|
||||
# # 0 1
|
||||
\# # 0 1
|
||||
a- a- 0 2
|
||||
\- - 0 1
|
||||
\{ { 0 1
|
||||
\} } 0 1
|
||||
0 0 0 1
|
||||
1 1 0 1
|
||||
9 9 0 1
|
||||
b b 0 1
|
||||
B B 0 1
|
||||
< < 0 1
|
||||
> > 0 1
|
||||
w w 0 1
|
||||
W W 0 1
|
||||
` ` 0 1
|
||||
' ' 0 1
|
||||
\n \n 0 1
|
||||
, , 0 1
|
||||
a a 0 1
|
||||
f f 0 1
|
||||
n n 0 1
|
||||
r r 0 1
|
||||
t t 0 1
|
||||
v v 0 1
|
||||
c c 0 1
|
||||
x x 0 1
|
||||
: : 0 1
|
||||
(\.[[:alnum:]]+){2} "w.a.b " 1 5 3 5
|
||||
|
||||
- match_default normal REG_EXTENDED REG_ICASE
|
||||
a A 0 1
|
||||
A a 0 1
|
||||
[abc]+ abcABC 0 6
|
||||
[ABC]+ abcABC 0 6
|
||||
[a-z]+ abcABC 0 6
|
||||
[A-Z]+ abzANZ 0 6
|
||||
[a-Z]+ abzABZ 0 6
|
||||
[A-z]+ abzABZ 0 6
|
||||
[[:lower:]]+ abyzABYZ 0 8
|
||||
[[:upper:]]+ abzABZ 0 6
|
||||
[[:word:]]+ abcZZZ 0 6
|
||||
[[:alpha:]]+ abyzABYZ 0 8
|
||||
[[:alnum:]]+ 09abyzABYZ 0 10
|
||||
|
||||
; updated tests for version 2:
|
||||
- match_default normal REG_EXTENDED
|
||||
\x41 A 0 1
|
||||
\xff \255 0 1
|
||||
\xFF \255 0 1
|
||||
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
|
||||
\c@ \0 0 1
|
||||
- match_default normal REG_EXTENDED
|
||||
\cA \1 0 1
|
||||
\cz \58 0 1
|
||||
\c= !
|
||||
\c? !
|
||||
=: =: 0 2
|
||||
|
||||
; word start:
|
||||
[[:<:]]abcd " abcd" 2 6
|
||||
[[:<:]]ab cab -1 -1
|
||||
[[:<:]]ab "\nab" 1 3
|
||||
[[:<:]]tag ::tag 2 5
|
||||
; word end:
|
||||
abc[[:>:]] abc 0 3
|
||||
abc[[:>:]] abcd -1 -1
|
||||
abc[[:>:]] abc\n 0 3
|
||||
abc[[:>:]] abc:: 0 3
|
||||
|
||||
; collating elements and rewritten set code:
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
[[.zero.]] 0 0 1
|
||||
[[.one.]] 1 0 1
|
||||
[[.two.]] 2 0 1
|
||||
[[.three.]] 3 0 1
|
||||
[[.a.]] baa 1 2
|
||||
[[.right-curly-bracket.]] } 0 1
|
||||
[[.NUL.]] \0 0 1
|
||||
[[:<:]z] !
|
||||
[a[:>:]] !
|
||||
[[=a=]] a 0 1
|
||||
[[=right-curly-bracket=]] } 0 1
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
|
||||
[[.A.]] A 0 1
|
||||
[[.A.]] a 0 1
|
||||
[[.A.]-b]+ AaBb 0 4
|
||||
[A-[.b.]]+ AaBb 0 4
|
||||
[[.a.]-B]+ AaBb 0 4
|
||||
[a-[.B.]]+ AaBb 0 4
|
||||
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
|
||||
[\x61] a 0 1
|
||||
[\x61-c]+ abcd 0 3
|
||||
[a-\x63]+ abcd 0 3
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
[[.a.]-c]+ abcd 0 3
|
||||
[a-[.c.]]+ abcd 0 3
|
||||
[[:alpha:]-a] !
|
||||
[a-[:alpha:]] !
|
||||
|
||||
; try multi-character ligatures:
|
||||
[[.ae.]] ae 0 2
|
||||
[[.ae.]] aE -1 -1
|
||||
[[.AE.]] AE 0 2
|
||||
[[.Ae.]] Ae 0 2
|
||||
[[.ae.]-b] a -1 -1
|
||||
[[.ae.]-b] b 0 1
|
||||
[[.ae.]-b] ae 0 2
|
||||
[a-[.ae.]] a 0 1
|
||||
[a-[.ae.]] b -1 -1
|
||||
[a-[.ae.]] ae 0 2
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
|
||||
[[.ae.]] AE 0 2
|
||||
[[.ae.]] Ae 0 2
|
||||
[[.AE.]] Ae 0 2
|
||||
[[.Ae.]] aE 0 2
|
||||
[[.AE.]-B] a -1 -1
|
||||
[[.Ae.]-b] b 0 1
|
||||
[[.Ae.]-b] B 0 1
|
||||
[[.ae.]-b] AE 0 2
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
; extended Perl-style escape sequences:
|
||||
\e \27 0 1
|
||||
\x1b \27 0 1
|
||||
\x{1b} \27 0 1
|
||||
\x{} !
|
||||
\x{ !
|
||||
\x} !
|
||||
\x !
|
||||
\x{yy !
|
||||
\x{1b !
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_NO_POSIX_TEST
|
||||
\l+ ABabcAB 2 5
|
||||
[\l]+ ABabcAB 2 5
|
||||
[a-\l] !
|
||||
[\l-a] !
|
||||
[\L] !
|
||||
\L+ abABCab 2 5
|
||||
\u+ abABCab 2 5
|
||||
[\u]+ abABCab 2 5
|
||||
[\U] !
|
||||
\U+ ABabcAB 2 5
|
||||
\d+ ab012ab 2 5
|
||||
[\d]+ ab012ab 2 5
|
||||
[\D] !
|
||||
\D+ 01abc01 2 5
|
||||
\s+ "ab ab" 2 5
|
||||
[\s]+ "ab ab" 2 5
|
||||
[\S] !
|
||||
\S+ " abc " 2 5
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
\Qabc !
|
||||
\Qabc\E abcd 0 3
|
||||
\Qabc\Ed abcde 0 4
|
||||
\Q+*?\\E +*?\\ 0 4
|
||||
|
||||
\C+ abcde 0 5
|
||||
\X+ abcde 0 5
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_UNICODE_ONLY
|
||||
\X+ a\768\769 0 3
|
||||
\X+ \2309\2307 0 2 ;DEVANAGARI script
|
||||
\X+ \2489\2494 0 2 ;BENGALI script
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND
|
||||
\Aabc abc 0 3
|
||||
\Aabc aabc -1 -1
|
||||
abc\z abc 0 3
|
||||
abc\z abcd -1 -1
|
||||
abc\Z abc\n\n 0 3
|
||||
abc\Z abc 0 3
|
||||
|
||||
|
||||
\Gabc abc 0 3
|
||||
\Gabc dabcd -1 -1
|
||||
a\Gbc abc -1 -1
|
||||
a\Aab abc -1 -1
|
||||
|
||||
;
|
||||
; now test grep,
|
||||
; basically check all our restart types - line, word, etc
|
||||
; checking each one for null and non-null matches.
|
||||
;
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP
|
||||
a " a a a aa" 1 2 3 4 5 6 7 8 8 9
|
||||
a+b+ "aabaabbb ab" 0 3 3 8 9 11
|
||||
a(b*|c|e)d adabbdacd 0 2 2 6 6 9
|
||||
a "\na\na\na\naa" 1 2 3 4 5 6 7 8 8 9
|
||||
|
||||
^ " \n\n \n\n\n" 0 0 4 4 5 5 8 8 9 9 10 10
|
||||
^ab "ab \nab ab\n" 0 2 5 7
|
||||
^[^\n]*\n " \n \n\n \n" 0 4 4 7 7 8 8 11
|
||||
\<abc "abcabc abc\n\nabc" 0 3 7 10 12 15
|
||||
\< " ab a aaa " 2 2 5 5 7 7
|
||||
\<\w+\W+ " aa aa a " 1 5 5 9 9 11
|
||||
|
||||
\Aabc "abc abc" 0 3
|
||||
\G\w+\W+ "abc abc a cbbb " 0 5 5 9 9 11 11 18
|
||||
\Ga+b+ "aaababb abb" 0 4 4 7
|
||||
|
||||
abc abc 0 3
|
||||
abc " abc abcabc " 1 4 5 8 8 11
|
||||
\n\n " \n\n\n \n \n\n\n\n " 1 3 18 20 20 22
|
||||
|
||||
$ " \n\n \n\n\n" 3 3 4 4 7 7 8 8 9 9 10 10
|
||||
\b " abb a abbb " 2 2 5 5 6 6 7 7 8 8 12 12
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP REG_ICASE
|
||||
A " a a a aa" 1 2 3 4 5 6 7 8 8 9
|
||||
A+B+ "aabaabbb ab" 0 3 3 8 9 11
|
||||
A(B*|c|e)D adabbdacd 0 2 2 6 6 9
|
||||
A "\na\na\na\naa" 1 2 3 4 5 6 7 8 8 9
|
||||
|
||||
^aB "Ab \nab Ab\n" 0 2 5 7
|
||||
\<abc "Abcabc aBc\n\nabc" 0 3 7 10 12 15
|
||||
|
||||
ABC abc 0 3
|
||||
abc " ABC ABCABC " 1 4 5 8 8 11
|
||||
|
||||
|
||||
;
|
||||
; now test merge,
|
||||
;
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_no_copy
|
||||
; start by testing subs:
|
||||
a+ "...aaa,,," $` "..."
|
||||
a+ "...aaa,,," $' ",,,"
|
||||
a+ "...aaa,,," $& "aaa"
|
||||
a+ "...aaa,,," $0 aaa
|
||||
a+ "...aaa,,," $1 ""
|
||||
a+ "...aaa,,," $15 ""
|
||||
(a+)b+ "...aaabbb,,," $1 aaa
|
||||
[[:digit:]]* 123ab <$0> <123><><><>
|
||||
[[:digit:]]* 123ab1 <$0> <123><><><1>
|
||||
|
||||
; and now escapes:
|
||||
a+ "...aaa,,," $x "$x"
|
||||
a+ "...aaa,,," \a "\a"
|
||||
a+ "...aaa,,," \f "\f"
|
||||
a+ "...aaa,,," \n "\n"
|
||||
a+ "...aaa,,," \r "\r"
|
||||
a+ "...aaa,,," \t "\t"
|
||||
a+ "...aaa,,," \v "\v"
|
||||
|
||||
a+ "...aaa,,," \x21 "!"
|
||||
a+ "...aaa,,," \x{21} "!"
|
||||
a+ "...aaa,,," \c@ \0
|
||||
a+ "...aaa,,," \e \27
|
||||
a+ "...aaa,,," \0101 A
|
||||
a+ "...aaa,,," (\0101) A
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_sed format_no_copy
|
||||
(a+)(b+) ...aabb,, \0 aabb
|
||||
(a+)(b+) ...aabb,, \1 aa
|
||||
(a+)(b+) ...aabb,, \2 bb
|
||||
(a+)(b+) ...aabb,, & aabb
|
||||
(a+)(b+) ...aabb,, $ $
|
||||
(a+)(b+) ...aabb,, $1 $1
|
||||
(a+)(b+) ...aabb,, ()?: ()?:
|
||||
(a+)(b+) ...aabb,, \\ \\
|
||||
(a+)(b+) ...aabb,, \& &
|
||||
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_perl format_no_copy
|
||||
(a+)(b+) ...aabb,, $0 aabb
|
||||
(a+)(b+) ...aabb,, $1 aa
|
||||
(a+)(b+) ...aabb,, $2 bb
|
||||
(a+)(b+) ...aabb,, $& aabb
|
||||
(a+)(b+) ...aabb,, & &
|
||||
(a+)(b+) ...aabb,, \0 \0
|
||||
(a+)(b+) ...aabb,, ()?: ()?:
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE
|
||||
; move to copying unmatched data:
|
||||
a+ "...aaa,,," bbb "...bbb,,,"
|
||||
a+(b+) "...aaabb,,," $1 "...bb,,,"
|
||||
a+(b+) "...aaabb,,,ab*abbb?" $1 "...bb,,,b*bbb?"
|
||||
|
||||
(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A)(?2B) "...AB,,,AB*AB?"
|
||||
(a+)|(b+) "...aaabb,,,ab*abbb?" ?1A:B "...AB,,,AB*AB?"
|
||||
(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A:B)C "...ACBC,,,ACBC*ACBC?"
|
||||
(a+)|(b+) "...aaabb,,,ab*abbb?" ?1:B "...B,,,B*B?"
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_first_only
|
||||
; move to copying unmatched data, but replace first occurrence only:
|
||||
a+ "...aaa,,," bbb "...bbb,,,"
|
||||
a+(b+) "...aaabb,,," $1 "...bb,,,"
|
||||
a+(b+) "...aaabb,,,ab*abbb?" $1 "...bb,,,ab*abbb?"
|
||||
(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A)(?2B) "...Abb,,,ab*abbb?"
|
||||
|
||||
;
|
||||
; changes to newline handling with 2.11:
|
||||
;
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP
|
||||
|
||||
^. " \n \r\n " 0 1 3 4 7 8
|
||||
.$ " \n \r\n " 1 2 4 5 8 9
|
||||
|
||||
- match_default normal REG_EXTENDED REG_STARTEND REG_GREP REG_UNICODE_ONLY
|
||||
^. " \8232 \8233 " 0 1 3 4 5 6
|
||||
.$ " \8232 \8233 " 1 2 3 4 6 7
|
||||
|
||||
;
|
||||
; non-greedy repeats added 21/04/00
|
||||
- match_default normal REG_EXTENDED
|
||||
a** !
|
||||
a*? aa 0 0
|
||||
a?? aa 0 0
|
||||
a++ !
|
||||
a+? aa 0 1
|
||||
a{1,3}{1} !
|
||||
a{1,3}? aaa 0 1
|
||||
\w+?w ...ccccccwcccccw 3 10
|
||||
\W+\w+?w ...ccccccwcccccw 0 10
|
||||
abc|\w+? abd 0 1
|
||||
abc|\w+? abcd 0 3
|
||||
<\s*tag[^>]*>(.*?)<\s*/tag\s*> " <tag>here is some text</tag> <tag></tag>" 1 29 6 23
|
||||
<\s*tag[^>]*>(.*?)<\s*/tag\s*> " < tag attr=\"something\">here is some text< /tag > <tag></tag>" 1 49 24 41
|
||||
|
||||
;
|
||||
; non-marking parentheses added 25/04/00
|
||||
- match_default normal REG_EXTENDED
|
||||
(?:abc)+ xxabcabcxx 2 8
|
||||
(?:a+)(b+) xaaabbbx 1 7 4 7
|
||||
(a+)(?:b+) xaaabbba 1 7 1 4
|
||||
(?:(a+)b+) xaaabbba 1 7 1 4
|
||||
(?:a+(b+)) xaaabbba 1 7 4 7
|
||||
a+(?#b+)b+ xaaabbba 1 7
|
||||
(a)(?:b|$) ab 0 2 0 1
|
||||
(a)(?:b|$) a 0 1 0 1
|
||||
|
||||
|
||||
;
|
||||
; try some partial matches:
|
||||
- match_partial match_default normal REG_EXTENDED REG_NO_POSIX_TEST
|
||||
(xyz)(.*)abc xyzaaab -1 -1 0 3 3 7
|
||||
(xyz)(.*)abc xyz -1 -1 0 3 3 3
|
||||
(xyz)(.*)abc xy -1 -1 -1 -1 -1 -1
|
||||
|
||||
;
|
||||
; forward lookahead asserts added 21/01/02
|
||||
- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
|
||||
((?:(?!a|b)\w)+)(\w+) " xxxabaxxx " 2 11 2 5 5 11
|
||||
|
||||
/\*(?:(?!\*/).)*\*/ " /**/ " 2 6
|
||||
/\*(?:(?!\*/).)*\*/ " /***/ " 2 7
|
||||
/\*(?:(?!\*/).)*\*/ " /********/ " 2 12
|
||||
/\*(?:(?!\*/).)*\*/ " /* comment */ " 2 15
|
||||
|
||||
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)<\s*/\s*a\s*> " <a href=\"here\">here</a> " 1 24 16 20
|
||||
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)<\s*/\s*a\s*> " <a href=\"here\">here< / a > " 1 28 16 20
|
||||
|
||||
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)(?=<\s*/\s*a\s*>) " <a href=\"here\">here</a> " 1 20 16 20
|
||||
<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)(?=<\s*/\s*a\s*>) " <a href=\"here\">here< / a > " 1 20 16 20
|
||||
|
||||
; filename matching:
|
||||
^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ command.com 0 11
|
||||
^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ PRN -1 -1
|
||||
^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ COM2 -1 -1
|
||||
|
||||
; password checking:
|
||||
^(?=.*\d).{4,8}$ abc3 0 4
|
||||
^(?=.*\d).{4,8}$ abc3def4 0 8
|
||||
^(?=.*\d).{4,8}$ ab2 -1 -1
|
||||
^(?=.*\d).{4,8}$ abcdefg -1 -1
|
||||
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ abc3 -1 -1
|
||||
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ abC3 0 4
|
||||
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ ABCD3 -1 -1
|
||||
|
||||
|
||||
|
||||
|
||||
|
1016
traits_class_ref.htm