forked from boostorg/regex
Compare commits
20 Commits
svn-branch
...
svn-branch
Author | SHA1 | Date | |
---|---|---|---|
61741228da | |||
e3aacc5c55 | |||
dc90d07749 | |||
300ca31723 | |||
c65dd3be41 | |||
5a29047906 | |||
cdd130ae2d | |||
9f47972bcf | |||
0633ba70f2 | |||
8330b19cec | |||
b8eab985e5 | |||
f90d8c667e | |||
4b7f14e72d | |||
89515b9a8e | |||
3075aaba4a | |||
50b8204753 | |||
02c652c01e | |||
b3f7d35f68 | |||
f1312f16c4 | |||
9ebe515adf |
@ -25,25 +25,32 @@
|
||||
<BR>
|
||||
<BR>
|
||||
<HR>
|
||||
<P>The author can be contacted at john@johnmaddock.co.uk; the
|
||||
home page for this library is at <A href="http://www.boost.org">www.boost.org</A>.</P>
|
||||
<P>I am indebted to Robert Sedgewick's "Algorithms in C++" for forcing me to think
|
||||
about algorithms and their performance, and to the folks at boost for forcing
|
||||
me to <I>think</I>, period. The following people have all contributed useful
|
||||
comments or fixes: Dave Abrahams, Mike Allison, Edan Ayal, Jayashree
|
||||
Balasubramanian, Jan Bölsche, Beman Dawes, Paul Baxter, David Bergman, David
|
||||
Dennerline, Edward Diener, Peter Dimov, Robert Dunn, Fabio Forno, Tobias
|
||||
Gabrielsson, Rob Gillen, Marc Gregoire, Chris Hecker, Nick Hodapp, Jesse Jones,
|
||||
Martin Jost, Boris Krasnovskiy, Jan Hermelink, Max Leung, Wei-hao Lin, Jens
|
||||
Maurer, Richard Peters, Heiko Schmidt, Jason Shirk, Gerald Slacik, Scobie
|
||||
Smith, Mike Smyth, Alexander Sokolovsky, Hervé Poirier, Michael Raykh, Marc
|
||||
Recht, Scott VanCamp, Bruno Voigt, Alexey Voinov, Jerry Waldorf, Rob Ward,
|
||||
Lealon Watts, Thomas Witt and Yuval Yosef. I am also grateful to the manuals
|
||||
supplied with the Henry Spencer, Perl and GNU regular expression libraries -
|
||||
wherever possible I have tried to maintain compatibility with these libraries
|
||||
and with the POSIX standard - the code however is entirely my own, including
|
||||
any bugs! I can absolutely guarantee that I will not fix any bugs I don't know
|
||||
about, so if you have any comments or spot any bugs, please get in touch.</P>
|
||||
<P>The author can be contacted at john@johnmaddock.co.uk; the home page for
|
||||
this library is at <A href="http://www.boost.org">www.boost.org</A>.</P>
|
||||
<P>I am indebted to <A href="http://www.cs.princeton.edu/~rs/">Robert Sedgewick's
|
||||
"Algorithms in C++" </A>for forcing me to think about algorithms and their
|
||||
performance, and to the folks at <A href="http://www.boost.org">boost</A> for
|
||||
forcing me to <I>think</I>, period.</P>
|
||||
<P><A href="http://www.boost-consulting.com">Eric Niebler</A>, author of the <A href="http://research.microsoft.com/projects/greta">
|
||||
GRETA regular expression component</A>, has shared several important ideas,
|
||||
in a series of long discussions.</P>
|
||||
<P>Pete Becker, of <A href="http://www.dinkumware.com/">Dinkumware Ltd</A>, has
|
||||
helped enormously with the standardisation proposal language.</P>
|
||||
<P>The following people have all contributed useful comments or fixes: Dave
|
||||
Abrahams, Mike Allison, Edan Ayal, Jayashree Balasubramanian, Jan Bölsche,
|
||||
Beman Dawes, Paul Baxter, David Bergman, David Dennerline, Edward Diener, Peter
|
||||
Dimov, Robert Dunn, Fabio Forno, Tobias Gabrielsson, Rob Gillen, Marc Gregoire,
|
||||
Chris Hecker, Nick Hodapp, Jesse Jones, Martin Jost, Boris Krasnovskiy, Jan
|
||||
Hermelink, Max Leung, Wei-hao Lin, Jens Maurer, Richard Peters, Heiko Schmidt,
|
||||
Jason Shirk, Gerald Slacik, Scobie Smith, Mike Smyth, Alexander Sokolovsky,
|
||||
Hervé Poirier, Michael Raykh, Marc Recht, Scott VanCamp, Bruno Voigt, Alexey
|
||||
Voinov, Jerry Waldorf, Rob Ward, Lealon Watts, John Wismar, Thomas Witt and
|
||||
Yuval Yosef. I am also grateful to the manuals supplied with the Henry Spencer,
|
||||
Perl and GNU regular expression libraries - wherever possible I have tried to
|
||||
maintain compatibility with these libraries and with the POSIX standard - the
|
||||
code however is entirely my own, including any bugs! I can absolutely guarantee
|
||||
that I will not fix any bugs I don't know about, so if you have any comments or
|
||||
spot any bugs, please get in touch.</P>
|
||||
<P>Useful further information can be found at:</P>
|
||||
<P>Short tutorials on regular expressions can be <A href="http://etext.lib.virginia.edu/helpsheets/regex.html">
|
||||
found here</A> and <A href="http://www.devshed.com/Server_Side/Administration/RegExp/page1.html">here</A>.</P>
|
||||
@ -72,8 +79,7 @@
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -1,153 +1,114 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||
<title>Boost.Regex: FAQ</title>
|
||||
<meta http-equiv="Content-Type" content=
|
||||
"text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<p></p>
|
||||
|
||||
<table id="Table1" cellspacing="1" cellpadding="1" width="100%"
|
||||
border="0">
|
||||
<tr>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt=
|
||||
"C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<td width="353">
|
||||
<h1 align="center">Boost.Regex</h1>
|
||||
|
||||
<h2 align="center">FAQ</h2>
|
||||
</td>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt=
|
||||
"Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
|
||||
<hr>
|
||||
<font color="#ff0000"><font color="#ff0000"></font></font>
|
||||
<p><font color="#ff0000"><font color="#ff0000"><font color=
|
||||
"#ff0000"> Q. Why can't I use the "convenience" versions of
|
||||
regex_match / regex_search / regex_grep / regex_format /
|
||||
regex_merge?</font></font></font></p>
|
||||
|
||||
<p>A. These versions may or may not be available depending upon the
|
||||
capabilities of your compiler, the rules determining the format of
|
||||
these functions are quite complex - and only the versions visible
|
||||
to a standard compliant compiler are given in the help. To find out
|
||||
what your compiler supports, run <boost/regex.hpp> through
|
||||
your C++ pre-processor, and search the output file for the function
|
||||
that you are interested in.<font color="#ff0000"><font color=
|
||||
"#ff0000"></font></font></p>
|
||||
|
||||
<p><font color="#ff0000"><font color="#ff0000">Q. I can't get
|
||||
regex++ to work with escape characters, what's going
|
||||
on?</font></font></p>
|
||||
|
||||
<p>A. If you embed regular expressions in C++ code, then remember
|
||||
that escape characters are processed twice: once by the C++
|
||||
compiler, and once by the regex++ expression compiler, so to pass
|
||||
the regular expression \d+ to regex++, you need to embed "\\d+" in
|
||||
your code. Likewise to match a literal backslash you will need to
|
||||
embed "\\\\" in your code. <font color="#ff0000"></font></p>
|
||||
|
||||
<p><font color="#ff0000">Q. Why does using parenthesis in a POSIX
|
||||
regular expression change the result of a match?</font></p>
|
||||
|
||||
<p>For POSIX (extended and basic) regular expressions, but not for
|
||||
perl regexes, parentheses don't only mark; they determine what the
|
||||
best match is as well. When the expression is compiled as a POSIX
|
||||
basic or extended regex then Boost.regex follows the POSIX standard
|
||||
leftmost longest rule for determining what matched. So if there is
|
||||
more than one possible match after considering the whole
|
||||
expression, it looks next at the first sub-expression and then the
|
||||
second sub-expression and so on. So...</p>
|
||||
|
||||
<pre>
|
||||
<head>
|
||||
<title>Boost.Regex: FAQ</title>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<p></p>
|
||||
<table id="Table1" cellspacing="1" cellpadding="1" width="100%" border="0">
|
||||
<tr>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<td width="353">
|
||||
<h1 align="center">Boost.Regex</h1>
|
||||
<h2 align="center">FAQ</h2>
|
||||
</td>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<hr>
|
||||
<font color="#ff0000"><font color="#ff0000"></font></font>
|
||||
<p><font color="#ff0000"><font color="#ff0000"><font color="#ff0000"> Q. Why can't I
|
||||
use the "convenience" versions of regex_match / regex_search / regex_grep /
|
||||
regex_format / regex_merge?</font></font></font></p>
|
||||
<p>A. These versions may or may not be available depending upon the capabilities
|
||||
of your compiler, the rules determining the format of these functions are quite
|
||||
complex - and only the versions visible to a standard compliant compiler are
|
||||
given in the help. To find out what your compiler supports, run
|
||||
<boost/regex.hpp> through your C++ pre-processor, and search the output
|
||||
file for the function that you are interested in.<font color="#ff0000"><font color="#ff0000"></font></font></p>
|
||||
<p><font color="#ff0000"><font color="#ff0000">Q. I can't get regex++ to work with
|
||||
escape characters, what's going on?</font></font></p>
|
||||
<p>A. If you embed regular expressions in C++ code, then remember that escape
|
||||
characters are processed twice: once by the C++ compiler, and once by the
|
||||
regex++ expression compiler, so to pass the regular expression \d+ to regex++,
|
||||
you need to embed "\\d+" in your code. Likewise to match a literal backslash
|
||||
you will need to embed "\\\\" in your code. <font color="#ff0000"></font>
|
||||
</p>
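<p>A minimal sketch of the double-escaping rule described above. It uses std::regex from the C++ standard library as a stand-in for regex++ (an assumption made so the example is self-contained; the string-literal escaping is the same), and the helper names all_digits and has_backslash are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `text` consists only of digits.
// The C++ literal "\\d+" reaches the regex compiler as the three characters \d+.
bool all_digits(const std::string& text)
{
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// Hypothetical helper: true when `text` contains a backslash.
// The C++ literal "\\\\" is the two characters \\, which the regex
// compiler then reads as one literal backslash.
bool has_backslash(const std::string& text)
{
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```

<p>Each level of processing removes one layer of backslashes, which is why a single literal backslash costs four characters in the source file.</p>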
|
||||
<p><font color="#ff0000">Q. Why does using parenthesis in a POSIX regular expression
|
||||
change the result of a match?</font></p>
|
||||
<p>For POSIX (extended and basic) regular expressions, but not for perl regexes,
|
||||
parentheses don't only mark; they determine what the best match is as well.
|
||||
When the expression is compiled as a POSIX basic or extended regex then
|
||||
Boost.regex follows the POSIX standard leftmost longest rule for determining
|
||||
what matched. So if there is more than one possible match after considering the
|
||||
whole expression, it looks next at the first sub-expression and then the second
|
||||
sub-expression and so on. So...</p>
|
||||
<pre>
|
||||
"(0*)([0-9]*)" against "00123" would produce
|
||||
$1 = "00"
|
||||
$2 = "123"
|
||||
</pre>
|
||||
|
||||
<p>where as</p>
|
||||
|
||||
<pre>
|
||||
"0*([0-9)*" against "00123" would produce
|
||||
<p>where as</p>
|
||||
<pre>
|
||||
"0*([0-9])*" against "00123" would produce
|
||||
$1 = "00123"
|
||||
</pre>
|
||||
|
||||
<p>If you think about it, had $1 only matched the "123", this would
|
||||
be "less good" than the match "00123" which is both further to the
|
||||
left and longer. If you want $1 to match only the "123" part, then
|
||||
you need to use something like:</p>
|
||||
|
||||
<pre>
|
||||
<p>If you think about it, had $1 only matched the "123", this would be "less good"
|
||||
than the match "00123" which is both further to the left and longer. If you
|
||||
want $1 to match only the "123" part, then you need to use something like:</p>
|
||||
<pre>
|
||||
"0*([1-9][0-9]*)"
|
||||
</pre>
|
||||
|
||||
<p>as the expression.</p>
|
||||
|
||||
<p><font color="#ff0000">Q. Why don't character ranges work
|
||||
properly (POSIX mode only)?</font><br>
|
||||
A. The POSIX standard specifies that character range expressions
|
||||
are locale sensitive - so for example the expression [A-Z] will
|
||||
match any collating element that collates between 'A' and 'Z'. That
|
||||
means that for most locales other than "C" or "POSIX", [A-Z] would
|
||||
match the single character 't' for example, which is not what most
|
||||
people expect - or at least not what most people have come to
|
||||
expect from regular expression engines. For this reason, the
|
||||
default behaviour of boost.regex (perl mode) is to turn locale
|
||||
sensitive collation off by not setting the regex_constants::collate
|
||||
compile time flag. However if you set a non-default compile time
|
||||
flag - for example regex_constants::extended or
|
||||
regex_constants::basic, then locale dependent collation will be
|
||||
enabled, this also applies to the POSIX API functions which use
|
||||
either regex_constants::extended or regex_constants::basic
|
||||
internally. <i>[Note - when regex_constants::nocollate in effect,
|
||||
the library behaves "as if" the LC_COLLATE locale category were
|
||||
always "C", regardless of what its actually set to - end
|
||||
note</i>].</p>
|
||||
|
||||
<p><font color="#ff0000">Q. Why are there no throw specifications
|
||||
on any of the functions? What exceptions can the library
|
||||
throw?</font></p>
|
||||
|
||||
<p>A. Not all compilers support (or honor) throw specifications,
|
||||
others support them but with reduced efficiency. Throw
|
||||
specifications may be added at a later date as compilers begin to
|
||||
handle this better. The library should throw only three types of
|
||||
exception: boost::bad_expression can be thrown by basic_regex when
|
||||
compiling a regular expression, std::runtime_error can be thrown
|
||||
when a call to basic_regex::imbue tries to open a message catalogue
|
||||
that doesn't exist, or when a call to regex_search or regex_match
|
||||
results in an "everlasting" search, or when a call to
|
||||
RegEx::GrepFiles or RegEx::FindFiles tries to open a file that
|
||||
cannot be opened, finally std::bad_alloc can be thrown by just
|
||||
about any of the functions in this library.</p>
|
||||
|
||||
<p></p>
|
||||
|
||||
<hr>
|
||||
<p>as the expression.</p>
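<p>The suggested expression can be tried out with a short sketch. It uses std::regex as a stand-in for Boost.Regex (an assumption), and strip_leading_zeros is a hypothetical helper. Because $1 must begin with a non-zero digit, the captured text should be the same whichever matching rule is in force.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns $1 for "0*([1-9][0-9]*)" matched against
// the whole of `text`, or an empty string when the expression does not match.
std::string strip_leading_zeros(const std::string& text)
{
    static const std::regex e("0*([1-9][0-9]*)");
    std::smatch what;
    if (!std::regex_match(text, what, e))
        return std::string();
    return what[1].str();  // the first marked sub-expression
}
```

<p>Since [1-9] cannot match a zero, the leading zeros can only be consumed by the unmarked 0* part of the expression.</p>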
|
||||
<p><font color="#ff0000">Q. Why don't character ranges work properly (POSIX mode
|
||||
only)?</font><br>
|
||||
A. The POSIX standard specifies that character range expressions are locale
|
||||
sensitive - so for example the expression [A-Z] will match any collating
|
||||
element that collates between 'A' and 'Z'. That means that for most locales
|
||||
other than "C" or "POSIX", [A-Z] would match the single character 't' for
|
||||
example, which is not what most people expect - or at least not what most
|
||||
people have come to expect from regular expression engines. For this reason,
|
||||
the default behaviour of boost.regex (perl mode) is to turn locale sensitive
|
||||
collation off by not setting the regex_constants::collate compile time flag.
|
||||
However if you set a non-default compile time flag - for example
|
||||
regex_constants::extended or regex_constants::basic, then locale dependent
|
||||
collation will be enabled, this also applies to the POSIX API functions which
|
||||
use either regex_constants::extended or regex_constants::basic internally. <i>[Note
|
||||
- when regex_constants::nocollate in effect, the library behaves "as if" the
|
||||
LC_COLLATE locale category were always "C", regardless of what its actually set
|
||||
to - end note</i>].</p>
|
||||
<p><font color="#ff0000">Q. Why are there no throw specifications on any of the
|
||||
functions? What exceptions can the library throw?</font></p>
|
||||
<p>A. Not all compilers support (or honor) throw specifications, others support
|
||||
them but with reduced efficiency. Throw specifications may be added at a later
|
||||
date as compilers begin to handle this better. The library should throw only
|
||||
three types of exception: boost::bad_expression can be thrown by basic_regex
|
||||
when compiling a regular expression, std::runtime_error can be thrown when a
|
||||
call to basic_regex::imbue tries to open a message catalogue that doesn't
|
||||
exist, or when a call to regex_search or regex_match results in an
|
||||
"everlasting" search, or when a call to RegEx::GrepFiles or
|
||||
RegEx::FindFiles tries to open a file that cannot be opened, finally
|
||||
std::bad_alloc can be thrown by just about any of the functions in this
|
||||
library.</p>
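<p>A minimal sketch of handling a failed expression compile. It assumes std::regex as a stand-in, where std::regex_error (derived from std::runtime_error) plays the role that boost::bad_expression plays in the answer above; the helper name compiles is hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: reports whether `pattern` compiles as a regular expression.
bool compiles(const std::string& pattern)
{
    try {
        std::regex e(pattern);  // compiling the expression may throw
        return true;
    } catch (const std::regex_error&) {
        // Stand-in for catching boost::bad_expression when using Boost.Regex.
        return false;
    }
}
```

<p>std::bad_alloc is deliberately left uncaught here, on the assumption that an out-of-memory condition is better handled further up the call stack.</p>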
|
||||
<p></p>
|
||||
<hr>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
@ -26,14 +26,14 @@
|
||||
<br>
|
||||
<hr>
|
||||
<h3>Synopsis</h3>
|
||||
<p>The type <code>match_flag_type</code> is an implementation defined bitmask type
|
||||
(17.3.2.1.2) that controls how a regular expression is matched against a
|
||||
<p>The type <code>match_flag_type</code> is an implementation specific bitmask
|
||||
type (17.3.2.1.2) that controls how a regular expression is matched against a
|
||||
character sequence. The behavior of the format flags is described in more
|
||||
detail in the <A href="format_syntax.html">format syntax guide</A>.</p>
|
||||
<pre>
|
||||
namespace std{ namespace regex_constants{
|
||||
namespace boost{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type match_flag_type;
|
||||
typedef <EM>implementation-specific-bitmask-type</EM> match_flag_type;
|
||||
|
||||
static const match_flag_type match_default = 0;
|
||||
static const match_flag_type match_not_bob;
|
||||
@ -59,11 +59,11 @@ static const match_flag_type format_first_only;
|
||||
static const match_flag_type format_all;
|
||||
|
||||
} // namespace regex_constants
|
||||
} // namespace std
|
||||
} // namespace boost
|
||||
</pre>
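<p>Because match_flag_type is a bitmask type, its elements can be combined with the bitwise OR operator. The sketch below assumes std::regex_constants as a stand-in for boost::regex_constants and uses only match_default, match_not_bol and match_not_eol; the helper name found is hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: searches `text` for `pattern` using the given match flags.
bool found(const std::string& text, const std::string& pattern,
           std::regex_constants::match_flag_type flags =
               std::regex_constants::match_default)
{
    const std::regex e(pattern);
    return std::regex_search(text, e, flags);
}
```

<p>With match_not_bol set, "^" no longer matches at the start of the sequence, and with match_not_eol set, "$" no longer matches at its end, so found("abc", "^a", std::regex_constants::match_not_bol) is expected to be false.</p>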
|
||||
<h3>Description</h3>
|
||||
<p>The type <code>match_flag_type</code> is an implementation defined bitmask type
|
||||
(17.3.2.1.2). When matching a regular expression against a sequence of
|
||||
<p>The type <code>match_flag_type</code> is an implementation specific bitmask
|
||||
type (17.3.2.1.2). When matching a regular expression against a sequence of
|
||||
characters [first, last) then setting its elements has the effects listed in
|
||||
the table below:</p>
|
||||
<p></p>
|
||||
@ -271,10 +271,10 @@ static const match_flag_type format_all;
|
||||
<br>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -344,7 +344,7 @@ const_iterator end()const;
|
||||
<p><b>Effects:</b> Returns a terminating iterator that enumerates over all the
|
||||
marked sub-expression matches stored in *this.</p>
|
||||
<h4><A name="format"></A>match_results reformatting</h4>
|
||||
<pre>template <class OutputIterator>
|
||||
<pre><A name=m12></A>template <class OutputIterator>
|
||||
OutputIterator format(OutputIterator out,
|
||||
const string_type& fmt,
|
||||
<A href="match_flag_type.html" >match_flag_type</A> flags = format_default);
|
||||
|
File diff suppressed because it is too large
@ -42,7 +42,7 @@
|
||||
iterator first,
|
||||
iterator last,
|
||||
<b>const</b> basic_regex<charT, traits, Allocator>& e,
|
||||
<b>unsigned</b> flags = match_default)
|
||||
boost::match_flag_type flags = match_default)
|
||||
</pre>
|
||||
<p>The library also defines the following convenience versions, which take either
|
||||
a const charT*, or a const std::basic_string<>& in place of a pair of
|
||||
@ -53,13 +53,13 @@
|
||||
<b>unsigned</b> <b>int</b> regex_grep(Predicate foo,
|
||||
<b>const</b> charT* str,
|
||||
<b>const</b> basic_regex<charT, traits, Allocator>& e,
|
||||
<b>unsigned</b> flags = match_default);
|
||||
boost::match_flag_type flags = match_default);
|
||||
|
||||
<b>template</b> <<b>class</b> Predicate, <b>class</b> ST, <b>class</b> SA, <b>class</b> Allocator, <b>class</b> charT, <b>class</b> traits>
|
||||
<b>unsigned</b> <b>int</b> regex_grep(Predicate foo,
|
||||
<b>const</b> std::basic_string<charT, ST, SA>& s,
|
||||
<b>const</b> basic_regex<charT, traits, Allocator>& e,
|
||||
<b>unsigned</b> flags = match_default);
|
||||
boost::match_flag_type flags = match_default);
|
||||
</pre>
|
||||
<p>The parameters for the primary version of regex_grep have the following
|
||||
meanings: </p>
|
||||
@ -370,11 +370,10 @@ index[std::string(what[5].first, what[5].second) + std::string(what[6].first, wh
|
||||
<hr>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -294,7 +294,7 @@ void</B> IndexClasses(map_type& m, <B>const</B> std::string& file)
|
||||
start = file.begin();
|
||||
end = file.end();
|
||||
boost::<a href="match_results.html">match_results</a><std::string::const_iterator> what;
|
||||
<B>unsigned</B> <B>int</B> flags = boost::match_default;
|
||||
boost::match_flag_type flags = boost::match_default;
|
||||
<B>while</B>(regex_search(start, end, what, expression, flags))
|
||||
{
|
||||
<FONT color=#000080> <I>// what[0] contains the whole string
|
||||
@ -314,11 +314,10 @@ void</B> IndexClasses(map_type& m, <B>const</B> std::string& file)
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -38,15 +38,15 @@
|
||||
<PRE><B>template</B> <<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1, <B>class</B> Traits2, <B>class</B> Alloc2>
|
||||
std::size_t regex_split(OutputIterator out,
|
||||
std::basic_string<charT, Traits1, Alloc1>& s,
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
<B> unsigned</B> flags,
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
<STRONG> </STRONG>boost::match_flag_type flags,
|
||||
std::size_t max_split);
|
||||
|
||||
<B>template</B> <<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1, <B>class</B> Traits2, <B>class</B> Alloc2>
|
||||
std::size_t regex_split(OutputIterator out,
|
||||
std::basic_string<charT, Traits1, Alloc1>& s,
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
<B>unsigned</B> flags = match_default);
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
boost::match_flag_type flags = match_default);
|
||||
|
||||
<B>template</B> <<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1>
|
||||
std::size_t regex_split(OutputIterator out,
|
||||
@ -134,11 +134,10 @@ boost::regex e(<FONT color=#000080>"<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -76,7 +76,7 @@ typedef regex_token_iterator<const char*> cregex_token_i
|
||||
typedef regex_token_iterator<std::string::const_iterator> sregex_token_iterator;
|
||||
#ifndef BOOST_NO_WREGEX
|
||||
typedef regex_token_iterator<const wchar_t*> wcregex_token_iterator;
|
||||
typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_iterator;
|
||||
typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_iterator;
|
||||
#endif
|
||||
</PRE>
|
||||
<H3><A name="description"></A>Description</H3>
|
||||
@ -84,7 +84,8 @@ typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_
|
||||
<P><B> Effects:</B> constructs an end of sequence iterator.</P>
|
||||
<PRE><A name=c2></A>regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
|
||||
int submatch = 0, match_flag_type m = match_default);</PRE>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>.</P>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>. Object re shall exist
|
||||
for the lifetime of the iterator constructed from it.</P>
|
||||
<P><B> Effects:</B> constructs a regex_token_iterator that will enumerate one
|
||||
string for each regular expression match of the expression <EM>re</EM> found
|
||||
within the sequence <EM>[a,b)</EM>, using match flags <EM>m</EM>. The
|
||||
@ -99,7 +100,8 @@ typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_
|
||||
configured</A> in non-recursive mode).</P>
|
||||
<PRE><A name=c3></A>regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
|
||||
const std::vector<int>& submatches, match_flag_type m = match_default);</PRE>
|
||||
<P><B> Preconditions:</B> <CODE>submatches.size() && !re.empty()</CODE>.</P>
|
||||
<P><B> Preconditions:</B> <CODE>submatches.size() && !re.empty()</CODE>.
|
||||
Object re shall exist for the lifetime of the iterator constructed from it.</P>
|
||||
<P><B> Effects:</B> constructs a regex_token_iterator that will enumerate <EM>submatches.size()</EM>
|
||||
strings for each regular expression match of the expression <EM>re</EM> found
|
||||
within the sequence <EM>[a,b)</EM>, using match flags <EM>m</EM>. For
|
||||
@ -118,7 +120,8 @@ typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_
|
||||
<PRE><A name=c4></A>template <std::size_t N>
|
||||
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
|
||||
const int (&submatches)[R], match_flag_type m = match_default);</PRE>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>.</P>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>. Object re shall exist
|
||||
for the lifetime of the iterator constructed from it.</P>
|
||||
<P><STRONG>Effects:</STRONG></B> constructs a regex_token_iterator that will
|
||||
enumerate <EM>R</EM> strings for each regular expression match of the
|
||||
expression <EM>re</EM> found within the sequence <EM>[a,b)</EM>, using match
|
||||
|
@ -24,10 +24,12 @@
|
||||
</P>
|
||||
<HR>
|
||||
<p></p>
|
||||
<P>Under construction.</P>
|
||||
<P>The current boost.regex traits class design will be migrated to that specified
|
||||
in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">regular
|
||||
expression standardization proposal</A>. </P>
|
||||
<P>
|
||||
Under construction: the current design will be replaced by that specified in
|
||||
the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">regular
|
||||
expression standardization proposal</A>, the current (obsolete) design has
|
||||
its <A href="http://cvs.sourceforge.net/viewcvs.py/*checkout*/boost/boost/libs/regex/Attic/traits_class_ref.htm?rev=1.11">
|
||||
documentation archived online</A>.</P>
|
||||
<P>
|
||||
<HR>
|
||||
<P></P>
|
||||
@ -36,11 +38,9 @@
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
@ -91,18 +91,18 @@
|
||||
<P>Parentheses serve two purposes: to group items together into a sub-expression,
|
||||
and to mark what generated the match. For example the expression "(ab)*" would
|
||||
match all of the string "ababab". The matching algorithms <A href="regex_match.html">
|
||||
regex_match</A> and <A href="regex_search.html">regex_search</A>
|
||||
each take an instance of <A href="match_results.html">match_results</A>
|
||||
that reports what caused the match; on exit from these functions the <A href="match_results.html">
|
||||
match_results</A> contains information both on what the whole expression
|
||||
matched and on what each sub-expression matched. In the example above
|
||||
match_results[1] would contain a pair of iterators denoting the final "ab" of
|
||||
the matching string. It is permissible for sub-expressions to match null
|
||||
strings. If a sub-expression takes no part in a match - for example if it is
|
||||
part of an alternative that is not taken - then both of the iterators that are
|
||||
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
|
||||
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
|
||||
from left to right starting from 1; sub-expression 0 is the whole expression.
|
||||
regex_match</A> and <A href="regex_search.html">regex_search</A> each take
|
||||
an instance of <A href="match_results.html">match_results</A> that reports what
|
||||
caused the match; on exit from these functions the <A href="match_results.html">match_results</A>
|
||||
contains information both on what the whole expression matched and on what each
|
||||
sub-expression matched. In the example above match_results[1] would contain a
|
||||
pair of iterators denoting the final "ab" of the matching string. It is
|
||||
permissible for sub-expressions to match null strings. If a sub-expression
|
||||
takes no part in a match - for example if it is part of an alternative that is
|
||||
not taken - then both of the iterators that are returned for that
|
||||
sub-expression point to the end of the input string, and the <I>matched</I> parameter
|
||||
for that sub-expression is <I>false</I>. Sub-expressions are indexed from left
|
||||
to right starting from 1; sub-expression 0 is the whole expression.
|
||||
</P>
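<P>The indexing rules above can be sketched in code. This is a minimal illustration that assumes the standard library's std::regex (the standardized descendant of this design, with its default ECMAScript syntax) rather than boost::regex itself; the helper names <CODE>capture</CODE>, <CODE>capture_offset</CODE> and <CODE>participated</CODE> are hypothetical.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Text captured by sub-expression `index` when `pattern` matches all of
// `text`; empty if there is no match or the sub-expression took no part.
// (Hypothetical helper, not part of any library.)
std::string capture(const std::string& pattern, const std::string& text,
                    std::size_t index) {
    const std::regex re(pattern);
    std::smatch what;
    if (!std::regex_match(text, what, re) || index >= what.size() ||
        !what[index].matched) {
        return std::string();
    }
    return what[index].str();
}

// Offset of that capture within `text`, or -1 under the same conditions.
std::ptrdiff_t capture_offset(const std::string& pattern,
                              const std::string& text, std::size_t index) {
    const std::regex re(pattern);
    std::smatch what;
    if (!std::regex_match(text, what, re) || index >= what.size() ||
        !what[index].matched) {
        return -1;
    }
    return what.position(index);
}

// Whether sub-expression `index` took part in a successful match.
bool participated(const std::string& pattern, const std::string& text,
                  std::size_t index) {
    const std::regex re(pattern);
    std::smatch what;
    return std::regex_match(text, what, re) && index < what.size() &&
           what[index].matched;
}
```

<P>With these helpers, <CODE>capture("(ab)*", "ababab", 1)</CODE> yields the final "ab" (at offset 4), and for the expression "(a)|(b)" matched against "b" the first sub-expression reports that it did not participate.</P>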
|
||||
<H3>Non-Marking Parenthesis
|
||||
</H3>
|
||||
@ -143,7 +143,7 @@
|
||||
<P>A set is a set of characters that can match any single character that is a
|
||||
member of the set. Sets are delimited by "[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and equivalence
|
||||
classes. Set declarations that start with "^" contain the compliment of the
|
||||
classes. Set declarations that start with "^" contain the complement of the
|
||||
elements that follow.
|
||||
</P>
|
||||
<P>Examples:
|
||||
@ -293,7 +293,7 @@
|
||||
[^[.ae.]] would only match one character.
|
||||
</P>
|
||||
<P>
|
||||
Equivalence classes take the general form[=tagname=] inside a set declaration,
|
||||
Equivalence classes take the generalform[=tagname=] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, and matches any character that is a member of the same primary
|
||||
equivalence class as the collating element [.tagname.]. An equivalence class is
|
||||
@ -302,7 +302,7 @@
|
||||
typically collated by character, then by accent, and then by case; the primary
|
||||
sort key then relates to the character, the secondary to the accentation, and
|
||||
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>
|
||||
, then[=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
|
||||
,then[=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
|
||||
locale independent method of obtaining the primary sort key for a character,
|
||||
except under Win32. For other operating systems the library will "guess" the
|
||||
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
|
||||
@ -666,106 +666,103 @@
|
||||
<H3>What gets matched?
|
||||
</H3>
|
||||
<P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
<P>
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
|
||||
algorithms will match the first possible matching string; if more than one
|
||||
string starting at a given location can match then it matches the longest
|
||||
possible string, unless the flag match_any is set, in which case the first
|
||||
match encountered is returned. Use of the match_any option can reduce the time
|
||||
taken to find the match - but is only useful if the user is less concerned
|
||||
about what matched - for example it would not be suitable for search and
|
||||
replace operations. In cases where there are multiple possible matches all
|
||||
starting at the same location, and all of the same length, then the match
|
||||
chosen is the one with the longest first sub-expression, if that is the same
|
||||
for two or more matches, then the second sub-expression will be examined and so
|
||||
on.
|
||||
</P><P>
|
||||
The following examples illustrate the main differences between Perl and
|
||||
POSIX regular expression matching rules:
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
|
||||
algorithms will match the first possible matching string; if more than one
|
||||
string starting at a given location can match then it matches the longest
|
||||
possible string, unless the flag match_any is set, in which case the first
|
||||
match encountered is returned. Use of the match_any option can reduce the time
|
||||
taken to find the match - but is only useful if the user is less concerned
|
||||
about what matched - for example it would not be suitable for search and
|
||||
replace operations. In cases where there are multiple possible matches all
|
||||
starting at the same location, and all of the same length, then the match
|
||||
chosen is the one with the longest first sub-expression, if that is the same
|
||||
for two or more matches, then the second sub-expression will be examined and so
|
||||
on.
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
xaby</CODE>
|
||||
The following examples illustrate the main differences between Perl and
|
||||
POSIX regular expression matching rules:
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"ab"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"a"</CODE></P></TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*([[:alnum:]]+).*</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
" abc def xyz "</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*(a|xayy)</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
zzxayyzz</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"zzxayy"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> xaby</CODE>
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> "ab"</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> "a"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> .*([[:alnum:]]+).*</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> " abc def xyz "</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> .*(a|xayy)</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> zzxayyzz</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> "zzxayy"</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
|
||||
<P>These differences between Perl matching rules, and POSIX matching rules, mean
|
||||
that these two regular expression syntaxes differ not only in the features
|
||||
offered, but also in the form that the state machine takes and/or the
|
||||
algorithms used to traverse the state machine.</p>
|
||||
<HR>
|
||||
algorithms used to traverse the state machine.</P>
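<P>The depth first column of the table can be reproduced with a short sketch. It assumes std::regex from the standard library in place of boost::regex, and the helper name <CODE>first_match</CODE> is hypothetical. Only the ECMAScript (Perl-like) results are checked here, since how closely an implementation follows the POSIX leftmost longest rule may vary.</P>

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text`, compiled with
// the given syntax flags, or an empty string when nothing matches.
// (Hypothetical helper for illustration only.)
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex::flag_type syntax) {
    const std::regex re(pattern, syntax);
    std::smatch what;
    return std::regex_search(text, what, re) ? what.str(0) : std::string();
}
```

<P>Calling <CODE>first_match("a|ab", "xaby", std::regex::ECMAScript)</CODE> gives "a", because the first alternative that succeeds is reported. Passing <CODE>std::regex::extended</CODE> instead requests POSIX rules, under which the table expects "ab".</P>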
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
@ -24,13 +24,15 @@
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>Type syntax_option_type is an implementation defined bitmask type that controls
|
||||
how a regular expression string is to be interpreted. For convenience
|
||||
note that all the constants listed here are also duplicated within the scope
|
||||
of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<P>Type syntax_option_type is an implementation specific bitmask type that
|
||||
controls how a regular expression string is to be interpreted. For
|
||||
convenience, note that all the constants listed here are also duplicated within
|
||||
the scope of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<PRE>namespace std{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type syntax_option_type;
|
||||
typedef <EM>implementation-specific-bitmask-type</EM>
|
||||
|
||||
syntax_option_type;
|
||||
// these flags are standardized:
|
||||
static const syntax_option_type normal;
|
||||
static const syntax_option_type icase;
|
||||
@ -50,7 +52,7 @@ static const syntax_option_type perl;<BR>// these are boost.regex specific:<BR>s
|
||||
} // namespace regex_constants
|
||||
} // namespace std</PRE>
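<P>Because the type is a bitmask, options are combined with <CODE>operator|</CODE>. The sketch below assumes std::regex, where the constants are also visible as members of the regex class and where <CODE>ECMAScript</CODE> is the default grammar (the synopsis above names it <CODE>normal</CODE>); the helper name <CODE>matches_ignoring_case</CODE> is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Compiles `pattern` with one grammar flag plus the icase modifier and
// reports whether it matches all of `text`.
// (Hypothetical helper for illustration only.)
bool matches_ignoring_case(const std::string& pattern,
                           const std::string& text) {
    // Exactly one grammar element (ECMAScript) combined with a modifier.
    const std::regex::flag_type flags =
        std::regex::ECMAScript | std::regex::icase;
    const std::regex re(pattern, flags);
    return std::regex_match(text, re);
}
```

<P>The same pattern compiled without <CODE>icase</CODE> would reject input that differs only in case.</P>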
|
||||
<H3>Description</H3>
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation specific bitmask
|
||||
type (17.3.2.1.2). Setting its elements has the effects listed in the table
|
||||
below, a valid value of type <CODE>syntax_option_type</CODE> will always have
|
||||
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
|
||||
@ -314,18 +316,15 @@ static const syntax_option_type perl;<BR>// these are boost.regex specific:<BR>s
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
@ -25,25 +25,32 @@
|
||||
<BR>
|
||||
<BR>
|
||||
<HR>
|
||||
<P>The author can be contacted at john@johnmaddock.co.uk; the
|
||||
home page for this library is at <A href="http://www.boost.org">www.boost.org</A>.</P>
|
||||
<P>I am indebted to Robert Sedgewick's "Algorithms in C++" for forcing me to think
|
||||
about algorithms and their performance, and to the folks at boost for forcing
|
||||
me to <I>think</I>, period. The following people have all contributed useful
|
||||
comments or fixes: Dave Abrahams, Mike Allison, Edan Ayal, Jayashree
|
||||
Balasubramanian, Jan Bölsche, Beman Dawes, Paul Baxter, David Bergman, David
|
||||
Dennerline, Edward Diener, Peter Dimov, Robert Dunn, Fabio Forno, Tobias
|
||||
Gabrielsson, Rob Gillen, Marc Gregoire, Chris Hecker, Nick Hodapp, Jesse Jones,
|
||||
Martin Jost, Boris Krasnovskiy, Jan Hermelink, Max Leung, Wei-hao Lin, Jens
|
||||
Maurer, Richard Peters, Heiko Schmidt, Jason Shirk, Gerald Slacik, Scobie
|
||||
Smith, Mike Smyth, Alexander Sokolovsky, Hervé Poirier, Michael Raykh, Marc
|
||||
Recht, Scott VanCamp, Bruno Voigt, Alexey Voinov, Jerry Waldorf, Rob Ward,
|
||||
Lealon Watts, Thomas Witt and Yuval Yosef. I am also grateful to the manuals
|
||||
supplied with the Henry Spencer, Perl and GNU regular expression libraries -
|
||||
wherever possible I have tried to maintain compatibility with these libraries
|
||||
and with the POSIX standard - the code however is entirely my own, including
|
||||
any bugs! I can absolutely guarantee that I will not fix any bugs I don't know
|
||||
about, so if you have any comments or spot any bugs, please get in touch.</P>
|
||||
<P>The author can be contacted at john@johnmaddock.co.uk; the home page for
|
||||
this library is at <A href="http://www.boost.org">www.boost.org</A>.</P>
|
||||
<P>I am indebted to <A href="http://www.cs.princeton.edu/~rs/">Robert Sedgewick's
|
||||
"Algorithms in C++" </A>for forcing me to think about algorithms and their
|
||||
performance, and to the folks at <A href="http://www.boost.org">boost</A> for
|
||||
forcing me to <I>think</I>, period.</P>
|
||||
<P><A href="http://www.boost-consulting.com">Eric Niebler</A>, author of the <A href="http://research.microsoft.com/projects/greta">
|
||||
GRETA regular expression component</A>, has shared several important ideas,
|
||||
in a series of long discussions.</P>
|
||||
<P>Pete Becker, of <A href="http://www.dinkumware.com/">Dinkumware Ltd</A>, has
|
||||
helped enormously with the standardisation proposal language.</P>
|
||||
<P>The following people have all contributed useful comments or fixes: Dave
|
||||
Abrahams, Mike Allison, Edan Ayal, Jayashree Balasubramanian, Jan Bölsche,
|
||||
Beman Dawes, Paul Baxter, David Bergman, David Dennerline, Edward Diener, Peter
|
||||
Dimov, Robert Dunn, Fabio Forno, Tobias Gabrielsson, Rob Gillen, Marc Gregoire,
|
||||
Chris Hecker, Nick Hodapp, Jesse Jones, Martin Jost, Boris Krasnovskiy, Jan
|
||||
Hermelink, Max Leung, Wei-hao Lin, Jens Maurer, Richard Peters, Heiko Schmidt,
|
||||
Jason Shirk, Gerald Slacik, Scobie Smith, Mike Smyth, Alexander Sokolovsky,
|
||||
Hervé Poirier, Michael Raykh, Marc Recht, Scott VanCamp, Bruno Voigt, Alexey
|
||||
Voinov, Jerry Waldorf, Rob Ward, Lealon Watts, John Wismar, Thomas Witt and
|
||||
Yuval Yosef. I am also grateful to the manuals supplied with the Henry Spencer,
|
||||
Perl and GNU regular expression libraries - wherever possible I have tried to
|
||||
maintain compatibility with these libraries and with the POSIX standard - the
|
||||
code however is entirely my own, including any bugs! I can absolutely guarantee
|
||||
that I will not fix any bugs I don't know about, so if you have any comments or
|
||||
spot any bugs, please get in touch.</P>
|
||||
<P>Useful further information can be found at:</P>
|
||||
<P>Short tutorials on regular expressions can be <A href="http://etext.lib.virginia.edu/helpsheets/regex.html">
|
||||
found here</A> and <A href="http://www.devshed.com/Server_Side/Administration/RegExp/page1.html">here</A>.</P>
|
||||
@ -72,8 +79,7 @@
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
229
doc/faq.html
@ -1,153 +1,114 @@
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||
<title>Boost.Regex: FAQ</title>
|
||||
<meta http-equiv="Content-Type" content=
|
||||
"text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<p></p>
|
||||
|
||||
<table id="Table1" cellspacing="1" cellpadding="1" width="100%"
|
||||
border="0">
|
||||
<tr>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt=
|
||||
"C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<td width="353">
|
||||
<h1 align="center">Boost.Regex</h1>
|
||||
|
||||
<h2 align="center">FAQ</h2>
|
||||
</td>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt=
|
||||
"Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<br>
|
||||
<br>
|
||||
|
||||
|
||||
<hr>
|
||||
<font color="#ff0000"><font color="#ff0000"></font></font>
|
||||
<p><font color="#ff0000"><font color="#ff0000"><font color=
|
||||
"#ff0000"> Q. Why can't I use the "convenience" versions of
|
||||
regex_match / regex_search / regex_grep / regex_format /
|
||||
regex_merge?</font></font></font></p>
|
||||
|
||||
<p>A. These versions may or may not be available depending upon the
|
||||
capabilities of your compiler; the rules determining the format of
|
||||
these functions are quite complex - and only the versions visible
|
||||
to a standard compliant compiler are given in the help. To find out
|
||||
what your compiler supports, run <boost/regex.hpp> through
|
||||
your C++ pre-processor, and search the output file for the function
|
||||
that you are interested in.<font color="#ff0000"><font color=
|
||||
"#ff0000"></font></font></p>
|
||||
|
||||
<p><font color="#ff0000"><font color="#ff0000">Q. I can't get
|
||||
regex++ to work with escape characters, what's going
|
||||
on?</font></font></p>
|
||||
|
||||
<p>A. If you embed regular expressions in C++ code, then remember
|
||||
that escape characters are processed twice: once by the C++
|
||||
compiler, and once by the regex++ expression compiler, so to pass
|
||||
the regular expression \d+ to regex++, you need to embed "\\d+" in
|
||||
your code. Likewise to match a literal backslash you will need to
|
||||
embed "\\\\" in your code. <font color="#ff0000"></font></p>
|
||||
|
||||
<p><font color="#ff0000">Q. Why does using parentheses in a POSIX
|
||||
regular expression change the result of a match?</font></p>
|
||||
|
||||
<p>A. For POSIX (extended and basic) regular expressions, but not for
|
||||
perl regexes, parentheses don't only mark; they determine what the
|
||||
best match is as well. When the expression is compiled as a POSIX
|
||||
basic or extended regex then Boost.regex follows the POSIX standard
|
||||
leftmost longest rule for determining what matched. So if there is
|
||||
more than one possible match after considering the whole
|
||||
expression, it looks next at the first sub-expression and then the
|
||||
second sub-expression and so on. So...</p>
|
||||
|
||||
<pre>
|
||||
<head>
|
||||
<title>Boost.Regex: FAQ</title>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||||
<link rel="stylesheet" type="text/css" href="../../../boost.css">
|
||||
</head>
|
||||
<body>
|
||||
<p></p>
|
||||
<table id="Table1" cellspacing="1" cellpadding="1" width="100%" border="0">
|
||||
<tr>
|
||||
<td valign="top" width="300">
|
||||
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
|
||||
</td>
|
||||
<td width="353">
|
||||
<h1 align="center">Boost.Regex</h1>
|
||||
<h2 align="center">FAQ</h2>
|
||||
</td>
|
||||
<td width="50">
|
||||
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
<br>
|
||||
<br>
|
||||
<hr>
|
||||
<font color="#ff0000"><font color="#ff0000"></font></font>
|
||||
<p><font color="#ff0000"><font color="#ff0000"><font color="#ff0000"> Q. Why can't I
|
||||
use the "convenience" versions of regex_match / regex_search / regex_grep /
|
||||
regex_format / regex_merge?</font></font></font></p>
|
||||
<p>A. These versions may or may not be available depending upon the capabilities
|
||||
of your compiler; the rules determining the format of these functions are quite
|
||||
complex - and only the versions visible to a standard compliant compiler are
|
||||
given in the help. To find out what your compiler supports, run
|
||||
<boost/regex.hpp> through your C++ pre-processor, and search the output
|
||||
file for the function that you are interested in.<font color="#ff0000"><font color="#ff0000"></font></font></p>
|
||||
<p><font color="#ff0000"><font color="#ff0000">Q. I can't get regex++ to work with
|
||||
escape characters, what's going on?</font></font></p>
|
||||
<p>A. If you embed regular expressions in C++ code, then remember that escape
|
||||
characters are processed twice: once by the C++ compiler, and once by the
|
||||
regex++ expression compiler, so to pass the regular expression \d+ to regex++,
|
||||
you need to embed "\\d+" in your code. Likewise to match a literal backslash
|
||||
you will need to embed "\\\\" in your code. <font color="#ff0000"></font>
|
||||
</p>
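<p>The double escaping described above looks like this in practice. The sketch assumes std::regex in place of regex++, and the helper names <code>is_all_digits</code> and <code>contains_backslash</code> are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// The regular expression \d+ must be written "\\d+" in C++ source: the
// compiler turns "\\" into one backslash before the regex engine sees it.
// (Hypothetical helper for illustration only.)
bool is_all_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// A literal backslash needs the expression \\, which is spelled "\\\\"
// in C++ source. (Hypothetical helper for illustration only.)
bool contains_backslash(const std::string& text) {
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```

<p>In C++11 and later a raw string literal such as <code>R"(\d+)"</code> spells the same expression without the extra layer of escaping.</p>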
|
||||
<p><font color="#ff0000">Q. Why does using parentheses in a POSIX regular expression
|
||||
change the result of a match?</font></p>
|
||||
<p>A. For POSIX (extended and basic) regular expressions, but not for perl regexes,
|
||||
parentheses don't only mark; they determine what the best match is as well.
|
||||
When the expression is compiled as a POSIX basic or extended regex then
|
||||
Boost.regex follows the POSIX standard leftmost longest rule for determining
|
||||
what matched. So if there is more than one possible match after considering the
|
||||
whole expression, it looks next at the first sub-expression and then the second
|
||||
sub-expression and so on. So...</p>
|
||||
<pre>
|
||||
"(0*)([0-9]*)" against "00123" would produce
|
||||
$1 = "00"
|
||||
$2 = "123"
|
||||
</pre>
|
||||
|
||||
<p>whereas</p>
|
||||
|
||||
<pre>
|
||||
"0*([0-9)*" against "00123" would produce
|
||||
<p>whereas</p>
|
||||
<pre>
|
||||
"0*([0-9])*" against "00123" would produce
|
||||
$1 = "00123"
|
||||
</pre>
|
||||
|
||||
<p>If you think about it, had $1 only matched the "123", this would
|
||||
be "less good" than the match "00123" which is both further to the
|
||||
left and longer. If you want $1 to match only the "123" part, then
|
||||
you need to use something like:</p>
|
||||
|
||||
<pre>
|
||||
<p>If you think about it, had $1 only matched the "123", this would be "less good"
|
||||
than the match "00123" which is both further to the left and longer. If you
|
||||
want $1 to match only the "123" part, then you need to use something like:</p>
|
||||
<pre>
|
||||
"0*([1-9][0-9]*)"
|
||||
</pre>
|
||||
|
||||
<p>as the expression.</p>
|
||||
|
||||
<p><font color="#ff0000">Q. Why don't character ranges work
|
||||
properly (POSIX mode only)?</font><br>
|
||||
A. The POSIX standard specifies that character range expressions
|
||||
are locale sensitive - so for example the expression [A-Z] will
|
||||
match any collating element that collates between 'A' and 'Z'. That
|
||||
means that for most locales other than "C" or "POSIX", [A-Z] would
|
||||
match the single character 't' for example, which is not what most
|
||||
people expect - or at least not what most people have come to
|
||||
expect from regular expression engines. For this reason, the
|
||||
default behaviour of boost.regex (perl mode) is to turn locale
|
||||
sensitive collation off by not setting the regex_constants::collate
|
||||
compile time flag. However if you set a non-default compile time
|
||||
flag - for example regex_constants::extended or
|
||||
regex_constants::basic, then locale dependent collation will be
|
||||
enabled, this also applies to the POSIX API functions which use
|
||||
either regex_constants::extended or regex_constants::basic
|
||||
internally. <i>[Note - when regex_constants::nocollate is in effect,
|
||||
the library behaves "as if" the LC_COLLATE locale category were
|
||||
always "C", regardless of what it's actually set to - end
|
||||
note</i>].</p>
|
||||
|
||||
<p><font color="#ff0000">Q. Why are there no throw specifications
|
||||
on any of the functions? What exceptions can the library
|
||||
throw?</font></p>
|
||||
|
||||
<p>A. Not all compilers support (or honor) throw specifications,
|
||||
others support them but with reduced efficiency. Throw
|
||||
specifications may be added at a later date as compilers begin to
|
||||
handle this better. The library should throw only three types of
|
||||
exception: boost::bad_expression can be thrown by basic_regex when
|
||||
compiling a regular expression, std::runtime_error can be thrown
|
||||
when a call to basic_regex::imbue tries to open a message catalogue
|
||||
that doesn't exist, or when a call to regex_search or regex_match
|
||||
results in an "everlasting" search, or when a call to
|
||||
RegEx::GrepFiles or RegEx::FindFiles tries to open a file that
|
||||
cannot be opened; finally std::bad_alloc can be thrown by just
|
||||
about any of the functions in this library.</p>
|
||||
|
||||
<p></p>
|
||||
|
||||
<hr>
|
||||
<p>as the expression.</p>
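<p>The suggested expression can be tried out as follows. This is a sketch that assumes std::regex with the <code>extended</code> (POSIX extended) syntax flag standing in for Boost.Regex; the helper name <code>significant_digits</code> is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Strips leading zeros by capturing only the part that starts with a
// non-zero digit. Returns an empty string when `text` does not match.
// (Hypothetical helper for illustration only.)
std::string significant_digits(const std::string& text) {
    static const std::regex re("0*([1-9][0-9]*)", std::regex::extended);
    std::smatch what;
    return std::regex_match(text, what, re) ? what.str(1) : std::string();
}
```

<p>Because the sub-expression must begin with a digit from 1 to 9, the leading zeros can only be consumed by the "0*" outside the parentheses, so $1 is "123" for the input "00123".</p>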
|
||||
<p><font color="#ff0000">Q. Why don't character ranges work properly (POSIX mode
|
||||
only)?</font><br>
|
||||
A. The POSIX standard specifies that character range expressions are locale
|
||||
sensitive - so for example the expression [A-Z] will match any collating
|
||||
element that collates between 'A' and 'Z'. That means that for most locales
|
||||
other than "C" or "POSIX", [A-Z] would match the single character 't' for
|
||||
example, which is not what most people expect - or at least not what most
|
||||
people have come to expect from regular expression engines. For this reason,
|
||||
the default behaviour of boost.regex (perl mode) is to turn locale sensitive
|
||||
collation off by not setting the regex_constants::collate compile time flag.
|
||||
However if you set a non-default compile time flag - for example
|
||||
regex_constants::extended or regex_constants::basic, then locale dependent
|
||||
collation will be enabled, this also applies to the POSIX API functions which
|
||||
use either regex_constants::extended or regex_constants::basic internally. <i>[Note
|
||||
- when regex_constants::nocollate is in effect, the library behaves "as if" the
|
||||
LC_COLLATE locale category were always "C", regardless of what it's actually set
|
||||
to - end note</i>].</p>
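<p>The non-collating default can be illustrated with a short sketch. It assumes std::regex, which likewise leaves the <code>collate</code> flag unset unless it is requested, so a range is interpreted by character code; the helper name <code>is_ascii_upper</code> is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Tests a single character against [A-Z] compiled without the collate
// flag, so the range covers the character codes from 'A' to 'Z' only.
// (Hypothetical helper for illustration only.)
bool is_ascii_upper(char c) {
    static const std::regex upper("[A-Z]");
    return std::regex_match(std::string(1, c), upper);
}
```

<p>Here 't' is rejected, which is the behaviour most people expect from a character range.</p>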
|
||||
<p><font color="#ff0000">Q. Why are there no throw specifications on any of the
|
||||
functions? What exceptions can the library throw?</font></p>
|
||||
<p>A. Not all compilers support (or honor) throw specifications, others support
|
||||
them but with reduced efficiency. Throw specifications may be added at a later
|
||||
date as compilers begin to handle this better. The library should throw only
|
||||
three types of exception: boost::bad_expression can be thrown by basic_regex
|
||||
when compiling a regular expression, std::runtime_error can be thrown when a
|
||||
call to basic_regex::imbue tries to open a message catalogue that doesn't
|
||||
exist, or when a call to regex_search or regex_match results in an
|
||||
"everlasting" search, or when a call to RegEx::GrepFiles or
|
||||
RegEx::FindFiles tries to open a file that cannot be opened; finally
|
||||
std::bad_alloc can be thrown by just about any of the functions in this
|
||||
library.</p>
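<p>Handling a failed compilation might look like the sketch below. It assumes std::regex, where the exception thrown for a malformed expression is std::regex_error; treating that as the counterpart of boost::bad_expression is an assumption made for this example, and the helper name <code>compiles</code> is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Reports whether `pattern` is a well-formed regular expression by
// attempting to compile it and catching the resulting exception.
// (Hypothetical helper for illustration only.)
bool compiles(const std::string& pattern) {
    try {
        const std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        // Malformed expression, for example an unmatched parenthesis.
        return false;
    }
}
```

<p>Memory exhaustion is deliberately not caught here, so std::bad_alloc would still propagate to the caller.</p>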
|
||||
<p></p>
|
||||
<hr>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>© Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
@ -46,10 +46,10 @@
|
||||
<dl class="index">
|
||||
<dt><a href="syntax_option_type.html">syntax_option_type</a></dt> <dt><a href="match_flag_type.html">
|
||||
match_flag_type</a></dt> <dt><a href="bad_expression.html">class bad_expression</a></dt>
|
||||
<dt><a href="regex_traits.html">class regex_traits</a></dt> <dt><a href="basic_regex.html">
|
||||
class template basic_regex</a></dt> <dt><a href="sub_match.html">class template
|
||||
sub_match</a></dt> <dt><a href="match_results.html">class template
|
||||
match_results</a></dt>
|
||||
<dt><a href="regex_traits.html">class regex_traits</a></dt>
|
||||
<dt><a href="basic_regex.html">class template basic_regex</a></dt>
|
||||
<dt><a href="sub_match.html">class template sub_match</a></dt>
|
||||
<dt><a href="match_results.html">class template match_results</a></dt>
|
||||
</dl>
|
||||
</dd>
|
||||
<dt>Algorithms</dt>
|
||||
@ -66,6 +66,25 @@
|
||||
<dt><a href="regex_token_iterator.html">regex_token_iterator</a></dt>
|
||||
</dl>
|
||||
</dd>
|
||||
<dt>Typedefs</dt>
|
||||
<dd>
|
||||
<dl class="index">
|
||||
<dt><a href="basic_regex.html">regex</a> [ = basic_regex<char> ]</dt>
|
||||
<dt><a href="basic_regex.html">wregex</a> [ = basic_regex<wchar_t> ]</dt>
|
||||
<dt><a href="match_results.html">cmatch</a> [ = match_results<const char*> ]</dt>
|
||||
<dt><a href="match_results.html">wcmatch</a> [ = match_results<const wchar_t*> ]</dt>
|
||||
<dt><a href="match_results.html">smatch</a> [ = match_results<std::string::const_iterator> ]</dt>
|
||||
<dt><a href="match_results.html">wsmatch</a> [ = match_results<std::wstring::const_iterator> ]</dt>
|
||||
<dt><a href="regex_iterator.html">cregex_iterator</a> [ = regex_iterator<const char*>]</dt>
|
||||
<dt><a href="regex_iterator.html">wcregex_iterator</a> [ = regex_iterator<const wchar_t*>]</dt>
|
||||
<dt><a href="regex_iterator.html">sregex_iterator</a> [ = regex_iterator<std::string::const_iterator>]</dt>
|
||||
<dt><a href="regex_iterator.html">wsregex_iterator</a> [ = regex_iterator<std::wstring::const_iterator>]</dt>
|
||||
<dt><a href="regex_token_iterator.html">cregex_token_iterator</a> [ = regex_token_iterator<const char*>]</dt>
|
||||
<dt><a href="regex_token_iterator.html">wcregex_token_iterator</a> [ = regex_token_iterator<const wchar_t*>]</dt>
|
||||
<dt><a href="regex_token_iterator.html">sregex_token_iterator</a> [ = regex_token_iterator<std::string::const_iterator>]</dt>
|
||||
<dt><a href="regex_token_iterator.html">wsregex_token_iterator</a> [ = regex_token_iterator<std::wstring::const_iterator>]</dt>
|
||||
</dl>
|
||||
</dd>
|
||||
<dt>Misc.</dt>
|
||||
<dd>
|
||||
<dl class="index">
|
||||
|
@ -26,14 +26,14 @@
|
||||
<br>
|
||||
<hr>
|
||||
<h3>Synopsis</h3>
|
||||
<p>The type <code>match_flag_type</code> is an implementation defined bitmask type
|
||||
(17.3.2.1.2) that controls how a regular expression is matched against a
|
||||
<p>The type <code>match_flag_type</code> is an implementation specific bitmask
|
||||
type (17.3.2.1.2) that controls how a regular expression is matched against a
|
||||
character sequence. The behavior of the format flags is described in more
|
||||
detail in the <A href="format_syntax.html">format syntax guide</A>.</p>
|
||||
<pre>
|
||||
namespace std{ namespace regex_constants{
|
||||
namespace boost{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type match_flag_type;
|
||||
typedef <EM>implementation-specific-bitmask-type</EM> match_flag_type;
|
||||
|
||||
static const match_flag_type match_default = 0;
|
||||
static const match_flag_type match_not_bob;
|
||||
@ -59,11 +59,11 @@ static const match_flag_type format_first_only;
|
||||
static const match_flag_type format_all;
|
||||
|
||||
} // namespace regex_constants
|
||||
} // namespace std
|
||||
} // namespace boost
|
||||
</pre>
|
||||
<h3>Description</h3>
|
||||
<p>The type <code>match_flag_type</code> is an implementation defined bitmask type
|
||||
(17.3.2.1.2). When matching a regular expression against a sequence of
|
||||
<p>The type <code>match_flag_type</code> is an implementation specific bitmask
|
||||
type (17.3.2.1.2). When matching a regular expression against a sequence of
|
||||
characters [first, last) then setting its elements has the effects listed in
|
||||
the table below:</p>
|
||||
<p></p>
|
||||
@ -271,10 +271,10 @@ static const match_flag_type format_all;
|
||||
<br>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -344,7 +344,7 @@ const_iterator end()const;
|
||||
<p><b>Effects:</b> Returns a terminating iterator that enumerates over all the
|
||||
marked sub-expression matches stored in *this.</p>
|
||||
<h4><A name="format"></A>match_results reformatting</h4>
|
||||
<pre>template <class OutputIterator>
|
||||
<pre><A name=m12></A>template <class OutputIterator>
|
||||
OutputIterator format(OutputIterator out,
|
||||
const string_type& fmt,
|
||||
<A href="match_flag_type.html" >match_flag_type</A> flags = format_default);
|
||||
|
998
doc/regex.html
File diff suppressed because it is too large
@ -42,7 +42,7 @@
|
||||
iterator first,
|
||||
iterator last,
|
||||
<b>const</b> basic_regex<charT, traits, Allocator>& e,
|
||||
<b>unsigned</b> flags = match_default)
|
||||
boost::match_flag_type flags = match_default)
|
||||
</pre>
|
||||
<p>The library also defines the following convenience versions, which take either
|
||||
a const charT*, or a const std::basic_string<>& in place of a pair of
|
||||
@ -53,13 +53,13 @@
|
||||
<b>unsigned</b> <b>int</b> regex_grep(Predicate foo,
|
||||
<b>const</b> charT* str,
|
||||
<b>const</b> basic_regex<charT, traits, Allocator>& e,
|
||||
<b>unsigned</b> flags = match_default);
|
||||
boost::match_flag_type flags = match_default);
|
||||
|
||||
<b>template</b> <<b>class</b> Predicate, <b>class</b> ST, <b>class</b> SA, <b>class</b> Allocator, <b>class</b> charT, <b>class</b> traits>
|
||||
<b>unsigned</b> <b>int</b> regex_grep(Predicate foo,
|
||||
<b>const</b> std::basic_string<charT, ST, SA>& s,
|
||||
<b>const</b> basic_regex<charT, traits, Allocator>& e,
|
||||
<b>unsigned</b> flags = match_default);
|
||||
boost::match_flag_type flags = match_default);
|
||||
</pre>
|
||||
<p>The parameters for the primary version of regex_grep have the following
|
||||
meanings: </p>
|
||||
@ -370,11 +370,10 @@ index[std::string(what[5].first, what[5].second) + std::string(what[6].first, wh
|
||||
<hr>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -294,7 +294,7 @@ void</B> IndexClasses(map_type& m, <B>const</B> std::string& file)
|
||||
start = file.begin();
|
||||
end = file.end();
|
||||
boost::<a href="match_results.html">match_results</a><std::string::const_iterator> what;
|
||||
<B>unsigned</B> <B>int</B> flags = boost::match_default;
|
||||
boost::match_flag_type flags = boost::match_default;
|
||||
<B>while</B>(regex_search(start, end, what, expression, flags))
|
||||
{
|
||||
<FONT color=#000080> <I>// what[0] contains the whole string
|
||||
@ -314,11 +314,10 @@ void</B> IndexClasses(map_type& m, <B>const</B> std::string& file)
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -38,15 +38,15 @@
|
||||
<PRE><B>template</B> <<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1, <B>class</B> Traits2, <B>class</B> Alloc2>
|
||||
std::size_t regex_split(OutputIterator out,
|
||||
std::basic_string<charT, Traits1, Alloc1>& s,
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
<B> unsigned</B> flags,
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
<STRONG> </STRONG>boost::match_flag_type flags,
|
||||
std::size_t max_split);
|
||||
|
||||
<B>template</B> <<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1, <B>class</B> Traits2, <B>class</B> Alloc2>
|
||||
std::size_t regex_split(OutputIterator out,
|
||||
std::basic_string<charT, Traits1, Alloc1>& s,
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
<B>unsigned</B> flags = match_default);
|
||||
<B> const</B> basic_regex<charT, Traits2, Alloc2>& e,
|
||||
boost::match_flag_type flags = match_default);
|
||||
|
||||
<B>template</B> <<B>class</B> OutputIterator, <B>class</B> charT, <B>class</B> Traits1, <B>class</B> Alloc1>
|
||||
std::size_t regex_split(OutputIterator out,
|
||||
@ -134,11 +134,10 @@ boost::regex e(<FONT color=#000080>"<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
04 Feb 2004
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2004<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
|
@ -76,7 +76,7 @@ typedef regex_token_iterator<const char*> cregex_token_i
|
||||
typedef regex_token_iterator<std::string::const_iterator> sregex_token_iterator;
|
||||
#ifndef BOOST_NO_WREGEX
|
||||
typedef regex_token_iterator<const wchar_t*> wcregex_token_iterator;
|
||||
typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_iterator;
|
||||
typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_iterator;
|
||||
#endif
|
||||
</PRE>
|
||||
<H3><A name="description"></A>Description</H3>
|
||||
@ -84,7 +84,8 @@ typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_
|
||||
<P><B> Effects:</B> constructs an end of sequence iterator.</P>
|
||||
<PRE><A name=c2></A>regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
|
||||
int submatch = 0, match_flag_type m = match_default);</PRE>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>.</P>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>. Object re shall exist
|
||||
for the lifetime of the iterator constructed from it.</P>
|
||||
<P><B> Effects:</B> constructs a regex_token_iterator that will enumerate one
|
||||
string for each regular expression match of the expression <EM>re</EM> found
|
||||
within the sequence <EM>[a,b)</EM>, using match flags <EM>m</EM>. The
|
||||
@ -99,7 +100,8 @@ typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_
|
||||
configured</A> in non-recursive mode).</P>
|
||||
<PRE><A name=c3></A>regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
|
||||
const std::vector<int>& submatches, match_flag_type m = match_default);</PRE>
|
||||
<P><B> Preconditions:</B> <CODE>submatches.size() && !re.empty()</CODE>.</P>
|
||||
<P><B> Preconditions:</B> <CODE>submatches.size() && !re.empty()</CODE>.
|
||||
Object re shall exist for the lifetime of the iterator constructed from it.</P>
|
||||
<P><B> Effects:</B> constructs a regex_token_iterator that will enumerate <EM>submatches.size()</EM>
|
||||
strings for each regular expression match of the expression <EM>re</EM> found
|
||||
within the sequence <EM>[a,b)</EM>, using match flags <EM>m</EM>. For
|
||||
@ -118,7 +120,8 @@ typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_
|
||||
<PRE><A name=c4></A>template <std::size_t R>
|
||||
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
|
||||
const int (&submatches)[R], match_flag_type m = match_default);</PRE>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>.</P>
|
||||
<P><B> Preconditions: </B><CODE>!re.empty()</CODE>. Object re shall exist
|
||||
for the lifetime of the iterator constructed from it.</P>
|
||||
<P><STRONG>Effects:</STRONG> constructs a regex_token_iterator that will
|
||||
enumerate <EM>R</EM> strings for each regular expression match of the
|
||||
expression <EM>re</EM> found within the sequence <EM>[a,b)</EM>, using match
|
||||
|
@ -24,10 +24,12 @@
|
||||
</P>
|
||||
<HR>
|
||||
<p></p>
|
||||
<P>Under construction.</P>
|
||||
<P>The current boost.regex traits class design will be migrated to that specified
|
||||
in the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">regular
|
||||
expression standardization proposal</A>. </P>
|
||||
<P>
|
||||
Under construction: the current design will be replaced by that specified in
|
||||
the <A href="http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1429.htm">regular
|
||||
expression standardization proposal</A>; the current (obsolete) design has
|
||||
its <A href="http://cvs.sourceforge.net/viewcvs.py/*checkout*/boost/boost/libs/regex/Attic/traits_class_ref.htm?rev=1.11">
|
||||
documentation archived online</A>.</P>
|
||||
<P>
|
||||
<HR>
|
||||
<P></P>
|
||||
@ -36,11 +38,9 @@
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
199
doc/syntax.html
@ -91,18 +91,18 @@
|
||||
<P>Parentheses serve two purposes, to group items together into a sub-expression,
|
||||
and to mark what generated the match. For example the expression "(ab)*" would
|
||||
match all of the string "ababab". The matching algorithms <A href="regex_match.html">
|
||||
regex_match</A> and <A href="regex_search.html">regex_search</A>
|
||||
each take an instance of <A href="match_results.html">match_results</A>
|
||||
that reports what caused the match, on exit from these functions the <A href="match_results.html">
|
||||
match_results</A> contains information both on what the whole expression
|
||||
matched and on what each sub-expression matched. In the example above
|
||||
match_results[1] would contain a pair of iterators denoting the final "ab" of
|
||||
the matching string. It is permissible for sub-expressions to match null
|
||||
strings. If a sub-expression takes no part in a match - for example if it is
|
||||
part of an alternative that is not taken - then both of the iterators that are
|
||||
returned for that sub-expression point to the end of the input string, and the <I>matched</I>
|
||||
parameter for that sub-expression is <I>false</I>. Sub-expressions are indexed
|
||||
from left to right starting from 1, sub-expression 0 is the whole expression.
|
||||
regex_match</A> and <A href="regex_search.html">regex_search</A> each take
an instance of <A href="match_results.html">match_results</A> that reports what
caused the match; on exit from these functions the <A href="match_results.html">match_results</A>
contains information both on what the whole expression matched and on what each
sub-expression matched. In the example above match_results[1] would contain a
pair of iterators denoting the final "ab" of the matching string. It is
permissible for sub-expressions to match null strings. If a sub-expression
takes no part in a match - for example if it is part of an alternative that is
not taken - then both of the iterators that are returned for that
sub-expression point to the end of the input string, and the <I>matched</I> parameter
for that sub-expression is <I>false</I>. Sub-expressions are indexed from left
to right starting from 1; sub-expression 0 is the whole expression.
|
||||
</P>
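<P>The behaviour described above can be sketched as follows. The example uses std::regex from the C++ standard library as a stand-in for Boost.Regex (an assumption, so that it builds without Boost); the struct capture_info and the function capture are hypothetical names introduced only for illustration.</P>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical summary of one marked sub-expression.
struct capture_info
{
    bool matched;           // did the sub-expression take part in the match?
    std::ptrdiff_t offset;  // start of the sub-match within the text, or -1
    std::string text;       // the characters matched, empty if not matched
};

// Matches all of `text` against `pattern` and reports sub-expression `index`;
// index 0 is the whole expression, 1 is the first marked sub-expression.
capture_info capture(const std::string& pattern, const std::string& text, std::size_t index)
{
    capture_info info = { false, -1, std::string() };
    std::smatch what;
    if (std::regex_match(text, what, std::regex(pattern)) && index < what.size())
    {
        info.matched = what[index].matched;
        if (info.matched)
        {
            // first/second are the pair of iterators delimiting the sub-match
            info.offset = what[index].first - text.begin();
            info.text = what[index].str();
        }
    }
    return info;
}
```

<P>For the expression "(ab)*" and the text "ababab", capture(..., 1) reports the final "ab", which starts at offset 4, while capture(..., 0) reports the whole string.</P>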
|
||||
<H3>Non-Marking Parenthesis
|
||||
</H3>
|
||||
@ -143,7 +143,7 @@
|
||||
<P>A set is a set of characters that can match any single character that is a
|
||||
member of the set. Sets are delimited by "[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and equivalence
|
||||
classes. Set declarations that start with "^" contain the compliment of the
|
||||
classes. Set declarations that start with "^" contain the complement of the
|
||||
elements that follow.
|
||||
</P>
|
||||
<P>Examples:
|
||||
@ -293,7 +293,7 @@
|
||||
[^[.ae.]] would only match one character.
|
||||
</P>
|
||||
<P>
|
||||
Equivalence classes take the general form [=tagname=] inside a set declaration,
|
||||
Equivalence classes take the general form [=tagname=] inside a set declaration,
|
||||
where <I>tagname</I> is either a single character, or a name of a collating
|
||||
element, and matches any character that is a member of the same primary
|
||||
equivalence class as the collating element [.tagname.]. An equivalence class is
|
||||
@ -302,7 +302,7 @@
|
||||
typically collated by character, then by accent, and then by case; the primary
|
||||
sort key then relates to the character, the secondary to the accentuation, and
|
||||
the tertiary to the case). If there is no equivalence class corresponding to <I>tagname</I>
|
||||
, then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
|
||||
, then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
|
||||
locale independent method of obtaining the primary sort key for a character,
|
||||
except under Win32. For other operating systems the library will "guess" the
|
||||
primary sort key from the full sort key (obtained from <I>strxfrm</I>), so
|
||||
@ -666,106 +666,103 @@
|
||||
<H3>What gets matched?
|
||||
</H3>
|
||||
<P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
When the expression is compiled as a Perl-compatible regex then the matching
|
||||
algorithms will perform a depth first search on the state machine and report
|
||||
the first match found.</P>
|
||||
<P>
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
|
||||
algorithms will match the first possible matching string, if more than one
|
||||
string starting at a given location can match then it matches the longest
|
||||
possible string, unless the flag match_any is set, in which case the first
|
||||
match encountered is returned. Use of the match_any option can reduce the time
|
||||
taken to find the match - but is only useful if the user is less concerned
|
||||
about what matched - for example it would not be suitable for search and
|
||||
replace operations. In cases where there are multiple possible matches all
|
||||
starting at the same location, and all of the same length, then the match
|
||||
chosen is the one with the longest first sub-expression, if that is the same
|
||||
for two or more matches, then the second sub-expression will be examined and so
|
||||
on.
|
||||
</P><P>
|
||||
The following examples illustrate the main differences between Perl and
|
||||
POSIX regular expression matching rules:
|
||||
When the expression is compiled as a POSIX-compatible regex then the matching
|
||||
algorithms will match the first possible matching string, if more than one
|
||||
string starting at a given location can match then it matches the longest
|
||||
possible string, unless the flag match_any is set, in which case the first
|
||||
match encountered is returned. Use of the match_any option can reduce the time
|
||||
taken to find the match - but is only useful if the user is less concerned
|
||||
about what matched - for example it would not be suitable for search and
|
||||
replace operations. In cases where there are multiple possible matches all
|
||||
starting at the same location, and all of the same length, then the match
|
||||
chosen is the one with the longest first sub-expression, if that is the same
|
||||
for two or more matches, then the second sub-expression will be examined and so
|
||||
on.
|
||||
</P>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
xaby</CODE>
|
||||
The following examples illustrate the main differences between Perl and
|
||||
POSIX regular expression matching rules:
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"ab"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"a"</CODE></P></TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*([[:alnum:]]+).*</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
" abc def xyz "</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
.*(a|xayy)</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
zzxayyzz</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>
|
||||
"zzxayy"</CODE></P></TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
|
||||
<P>
|
||||
<TABLE id="Table5" cellSpacing="1" cellPadding="7" width="624" border="1">
|
||||
<TBODY>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Expression</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>Text</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>POSIX leftmost longest match</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>ECMAScript depth first search match</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>a|ab</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> xaby</CODE>
|
||||
</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> "ab"</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> "a"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> .*([[:alnum:]]+).*</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> " abc def xyz "</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "abc"</P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P>$0 = " abc def xyz "<BR>
|
||||
$1 = "z"</P>
|
||||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> .*(a|xayy)</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> zzxayyzz</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE> "zzxayy"</CODE></P>
|
||||
</TD>
|
||||
<TD vAlign="top" width="25%">
|
||||
<P><CODE>"zzxa"</CODE></P>
|
||||
</TD>
|
||||
</TR>
|
||||
</TBODY></TABLE>
|
||||
<P>These differences between Perl matching rules, and POSIX matching rules, mean
|
||||
that these two regular expression syntaxes differ not only in the features
|
||||
offered, but also in the form that the state machine takes and/or the
|
||||
algorithms used to traverse the state machine.</p>
|
||||
<HR>
|
||||
algorithms used to traverse the state machine.</P>
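<P>The first row of the table can be reproduced with the sketch below. It is written against std::regex from the C++ standard library rather than Boost.Regex (an assumption, so that it builds without Boost): the ECMAScript grammar plays the role of the Perl-style depth first search and the extended grammar the role of the POSIX leftmost longest rule. The function name first_match is hypothetical, and standard library implementations may not follow the POSIX rule in every case, so the other rows are best checked against the library actually in use.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// found in `text` under the given grammar, or an empty string if none.
std::string first_match(const std::string& pattern,
                        const std::string& text,
                        std::regex_constants::syntax_option_type syntax)
{
    std::regex e(pattern, syntax);
    std::smatch what;
    return std::regex_search(text, what, e) ? what.str(0) : std::string();
}
```

<P>With the expression "a|ab" and the text "xaby", passing std::regex_constants::ECMAScript selects the first alternative that succeeds ("a"), whereas std::regex_constants::extended selects the longest match starting at that position ("ab").</P>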
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
@ -24,13 +24,15 @@
|
||||
</P>
|
||||
<HR>
|
||||
<H3>Synopsis</H3>
|
||||
<P>Type syntax_option_type is an implementation defined bitmask type that controls
|
||||
how a regular expression string is to be interpreted. For convenience
|
||||
note that all the constants listed here, are also duplicated within the scope
|
||||
of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<P>Type syntax_option_type is an implementation specific bitmask type that
controls how a regular expression string is to be interpreted. For
convenience, note that all the constants listed here are also duplicated within
the scope of class template <A href="basic_regex.html">basic_regex</A>.</P>
|
||||
<PRE>namespace std{ namespace regex_constants{
|
||||
|
||||
typedef bitmask_type syntax_option_type;
|
||||
typedef <EM>implementation-specific-bitmask-type</EM> syntax_option_type;
|
||||
// these flags are standardized:
|
||||
static const syntax_option_type normal;
|
||||
static const syntax_option_type icase;
|
||||
@ -50,7 +52,7 @@ static const syntax_option_type perl;<BR>// these are boost.regex specific:<BR>s
|
||||
} // namespace regex_constants
|
||||
} // namespace std</PRE>
|
||||
<H3>Description</H3>
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation defined bitmask
|
||||
<P>The type <CODE>syntax_option_type</CODE> is an implementation specific bitmask
|
||||
type (17.3.2.1.2). Setting its elements has the effects listed in the table
|
||||
below, a valid value of type <CODE>syntax_option_type</CODE> will always have
|
||||
exactly one of the elements <CODE>normal, basic, extended, awk, grep, egrep, sed
|
||||
@ -314,18 +316,15 @@ static const syntax_option_type perl;<BR>// these are boost.regex specific:<BR>s
|
||||
</TR>
|
||||
</TABLE>
|
||||
</P>
|
||||
<HR>
|
||||
<HR>
|
||||
<p>Revised
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
|
||||
24 Oct 2003
|
||||
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
|
||||
<p><i>&copy; Copyright John Maddock 1998-
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
|
||||
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
|
||||
<P><I>Use, modification and distribution are subject to the Boost Software License,
|
||||
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
|
||||
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
|
||||
|
@ -35,6 +35,7 @@ using std::getline;
|
||||
#include <boost/regex.hpp>
|
||||
#include <boost/timer.hpp>
|
||||
#include <boost/smart_ptr.hpp>
|
||||
#include <boost/minmax.hpp>
|
||||
|
||||
#if (defined(_MSC_VER) && (_MSC_VER <= 1300)) || defined(__sgi)
|
||||
// maybe no Koenig lookup, use using declaration instead:
|
||||
@ -145,7 +146,7 @@ int main(int argc, char**argv)
|
||||
double tim;
|
||||
bool result;
|
||||
int iters = 100;
|
||||
double wait_time = std::min(t.elapsed_min() * 1000, 1.0);
|
||||
double wait_time = boost::std_min(t.elapsed_min() * 1000, 1.0);
|
||||
|
||||
while(true)
|
||||
{
|
||||
|
@ -213,7 +213,7 @@ public:
|
||||
{
|
||||
difference_type dist = boost::re_detail::distance(a,b);
|
||||
states *= states;
|
||||
difference_type lim = std::numeric_limits<difference_type>::max() - 1000 - states;
|
||||
difference_type lim = (std::numeric_limits<difference_type>::max)() - 1000 - states;
|
||||
if(dist > (difference_type)(lim / states))
|
||||
max_state_count = lim;
|
||||
else
|
||||
|
@ -57,7 +57,7 @@ template class BOOST_REGEX_DECL reg_expression< BOOST_REGEX_CHAR_T >;
|
||||
# include BOOST_ABI_SUFFIX
|
||||
#endif
|
||||
|
||||
#elif defined(BOOST_MSVC) || defined(__GNUC__)
|
||||
#elif (defined(BOOST_MSVC) && defined(_MSC_EXTENSIONS)) || defined(__GNUC__)
|
||||
|
||||
# ifndef BOOST_REGEX_INSTANTIATE
|
||||
# define template extern template
|
||||
|
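Note on the `# define template extern template` line above: it relies on explicit instantiation declarations, which were a compiler extension when this change was made (hence the added `_MSC_EXTENSIONS` check) and became standard in C++11. A minimal single-file sketch of the mechanism, using a hypothetical `tally` template rather than the library's own classes:

```cpp
#include <cassert>

// Hypothetical template standing in for a class the library pre-instantiates.
template <class T>
struct tally
{
    T value;
    T next() { return ++value; }
};

// Explicit instantiation declaration: tells the compiler that tally<int> is
// instantiated elsewhere, so it need not generate the members here.
extern template struct tally<int>;

// Explicit instantiation definition: normally placed in exactly one source
// file of the library. It is shown here only to keep the sketch self-contained.
template struct tally<int>;
```

In a real build the declaration would sit in the header seen by users and the definition in one library source file, so the member code is emitted once and shared by every client.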
@ -78,7 +78,7 @@ void perl_matcher<BidiIterator, Allocator, traits, Allocator2>::estimate_max_sta
|
||||
difference_type dist = boost::re_detail::distance(base, last);
|
||||
traits_size_type states = static_cast<traits_size_type>(re.size());
|
||||
states *= states;
|
||||
difference_type lim = std::numeric_limits<difference_type>::max() - 1000 - states;
|
||||
difference_type lim = (std::numeric_limits<difference_type>::max)() - 1000 - states;
|
||||
if(dist > (difference_type)(lim / states))
|
||||
max_state_count = lim;
|
||||
else
|
||||
@ -205,10 +205,10 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::find_imp()
|
||||
else
|
||||
{
|
||||
// start again:
|
||||
search_base = position = (*m_presult)[0].second;
|
||||
search_base = position = m_result[0].second;
|
||||
// If last match was null and match_not_null was not set then increment
|
||||
// our start position, otherwise we go into an infinite loop:
|
||||
if(((m_match_flags & match_not_null) == 0) && (m_presult->length() == 0))
|
||||
if(((m_match_flags & match_not_null) == 0) && (m_result.length() == 0))
|
||||
{
|
||||
if(position == last)
|
||||
return false;
|
||||
|
@ -546,7 +546,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_dot_repeat
|
||||
return match_dot_repeat_slow();
|
||||
|
||||
const re_repeat* rep = static_cast<const re_repeat*>(pstate);
|
||||
unsigned count = std::min(static_cast<unsigned>(re_detail::distance(position, last)), static_cast<unsigned>(rep->greedy ? rep->max : rep->min));
|
||||
unsigned count = (std::min)(static_cast<unsigned>(re_detail::distance(position, last)), static_cast<unsigned>(rep->greedy ? rep->max : rep->min));
|
||||
if(rep->min > count)
|
||||
return false; // not enough text left to match
|
||||
std::advance(position, count);
|
||||
@ -593,7 +593,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_char_repea
|
||||
if(::boost::is_random_access_iterator<BidiIterator>::value)
|
||||
{
|
||||
BidiIterator end = position;
|
||||
std::advance(end, std::min((unsigned)re_detail::distance(position, last), desired));
|
||||
std::advance(end, (std::min)((unsigned)re_detail::distance(position, last), desired));
|
||||
BidiIterator origin(position);
|
||||
while((position != end) && (traits_inst.translate(*position, icase) == what))
|
||||
{
|
||||
@ -660,7 +660,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_set_repeat
|
||||
if(::boost::is_random_access_iterator<BidiIterator>::value)
|
||||
{
|
||||
BidiIterator end = position;
|
||||
std::advance(end, std::min((unsigned)re_detail::distance(position, last), desired));
|
||||
std::advance(end, (std::min)((unsigned)re_detail::distance(position, last), desired));
|
||||
BidiIterator origin(position);
|
||||
while((position != end) && map[(traits_uchar_type)traits_inst.translate(*position, icase)])
|
||||
{
|
||||
@ -727,7 +727,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_long_set_r
|
||||
if(::boost::is_random_access_iterator<BidiIterator>::value)
|
||||
{
|
||||
BidiIterator end = position;
|
||||
std::advance(end, std::min((unsigned)re_detail::distance(position, last), desired));
|
||||
std::advance(end, (std::min)((unsigned)re_detail::distance(position, last), desired));
|
||||
BidiIterator origin(position);
|
||||
while((position != end) && (position != re_is_set_member(position, last, set, re)))
|
||||
{
|
||||
|
@ -400,7 +400,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_dot_repeat
|
||||
// start by working out how much we can skip:
|
||||
//
|
||||
const re_repeat* rep = static_cast<const re_repeat*>(pstate);
|
||||
unsigned count = std::min(static_cast<unsigned>(re_detail::distance(position, last)), (rep->greedy ? rep->max : rep->min));
|
||||
unsigned count = (std::min)(static_cast<unsigned>(re_detail::distance(position, last)), (rep->greedy ? rep->max : rep->min));
|
||||
if(rep->min > count)
|
||||
return false; // not enough text left to match
|
||||
std::advance(position, count);
|
||||
@ -458,7 +458,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_char_repea
|
||||
if(::boost::is_random_access_iterator<BidiIterator>::value)
|
||||
{
|
||||
BidiIterator end = position;
|
||||
std::advance(end, std::min((unsigned)re_detail::distance(position, last), desired));
|
||||
std::advance(end, (std::min)((unsigned)re_detail::distance(position, last), desired));
|
||||
BidiIterator origin(position);
|
||||
while((position != end) && (traits_inst.translate(*position, icase) == what))
|
||||
{
|
||||
@ -507,8 +507,16 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_char_repea
|
||||
return false;
|
||||
if(position == last)
|
||||
return false;
|
||||
position = ++save_pos;
|
||||
++count;
|
||||
position = save_pos;
|
||||
if(traits_inst.translate(*position, icase) == what)
|
||||
{
|
||||
++position;
|
||||
++count;
|
||||
}
|
||||
else
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}while(true);
|
||||
#ifdef __BORLANDC__
|
||||
#pragma option pop
|
||||
@ -538,7 +546,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_set_repeat
|
||||
if(::boost::is_random_access_iterator<BidiIterator>::value)
|
||||
{
|
||||
BidiIterator end = position;
|
||||
std::advance(end, std::min((unsigned)re_detail::distance(position, last), desired));
|
||||
std::advance(end, (std::min)((unsigned)re_detail::distance(position, last), desired));
|
||||
BidiIterator origin(position);
|
||||
while((position != end) && map[(traits_uchar_type)traits_inst.translate(*position, icase)])
|
||||
{
|
||||
@ -587,8 +595,16 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_set_repeat
|
||||
return false;
|
||||
if(position == last)
|
||||
return false;
|
||||
position = ++save_pos;
|
||||
++count;
|
||||
position = save_pos;
|
||||
if(map[(traits_uchar_type)traits_inst.translate(*position, icase)])
|
||||
{
|
||||
++position;
|
||||
++count;
|
||||
}
|
||||
else
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}while(true);
|
||||
#ifdef __BORLANDC__
|
||||
#pragma option pop
|
||||
@ -618,7 +634,7 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_long_set_r
|
||||
if(::boost::is_random_access_iterator<BidiIterator>::value)
|
||||
{
|
||||
BidiIterator end = position;
|
||||
std::advance(end, std::min((unsigned)re_detail::distance(position, last), desired));
|
||||
std::advance(end, (std::min)((unsigned)re_detail::distance(position, last), desired));
|
||||
BidiIterator origin(position);
|
||||
while((position != end) && (position != re_is_set_member(position, last, set, re)))
|
||||
{
|
||||
@ -667,8 +683,16 @@ bool perl_matcher<BidiIterator, Allocator, traits, Allocator2>::match_long_set_r
|
||||
return false;
|
||||
if(position == last)
|
||||
return false;
|
||||
position = ++save_pos;
|
||||
++count;
|
||||
position = save_pos;
|
||||
if(position != re_is_set_member(position, last, set, re))
|
||||
{
|
||||
++position;
|
||||
++count;
|
||||
}
|
||||
else
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}while(true);
|
||||
#ifdef __BORLANDC__
|
||||
#pragma option pop
|
||||
|
@ -3,8 +3,8 @@
|
||||
* Copyright (c) 1998-2002
|
||||
* Dr John Maddock
|
||||
*
|
||||
* Use, modification and distribution are subject to the
|
||||
* Boost Software License, Version 1.0. (See accompanying file
|
||||
* Use, modification and distribution are subject to the
|
||||
* Boost Software License, Version 1.0. (See accompanying file
|
||||
* LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
||||
*
|
||||
*/
|
||||
@ -36,18 +36,111 @@ class match_results;
|
||||
|
||||
namespace re_detail{
|
||||
|
||||
template <class O, class I>
|
||||
O BOOST_REGEX_CALL re_copy_out(O out, I first, I last)
|
||||
// make_upper and make_lower should ideally be implemented in regex_traits
|
||||
#if defined(_WIN32) && !defined(BOOST_REGEX_NO_W32)
|
||||
|
||||
//
|
||||
// VC6 needs to link to user32.lib, as do all compilers that
|
||||
// claim to be VC6/7 compatible:
|
||||
//
|
||||
#if defined(_MSC_VER) && !defined(__BORLANDC__)
|
||||
#pragma comment(lib, "user32.lib")
|
||||
#endif
|
||||
|
||||
inline wchar_t make_upper(wchar_t c)
|
||||
{
|
||||
return LOWORD(::CharUpperW(reinterpret_cast<wchar_t*>(static_cast<unsigned short>(c))));
|
||||
}
|
||||
|
||||
inline char make_upper(char c)
|
||||
{
|
||||
return static_cast<char>(LOWORD(::CharUpperA(reinterpret_cast<char*>(static_cast<unsigned short>(c)))));
|
||||
}
|
||||
|
||||
inline wchar_t make_lower(wchar_t c)
|
||||
{
|
||||
return LOWORD(::CharLowerW(reinterpret_cast<wchar_t*>(static_cast<unsigned short>(c))));
|
||||
}
|
||||
|
||||
inline char make_lower(char c)
|
||||
{
|
||||
return static_cast<char>(LOWORD(::CharLowerA(reinterpret_cast<char*>(static_cast<unsigned short>(c)))));
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
// TODO: make this traits class sensitive:
|
||||
#ifndef BOOST_NO_WREGEX
|
||||
inline wchar_t make_upper(wchar_t c)
|
||||
{
|
||||
return (std::towupper)(c);
|
||||
}
|
||||
|
||||
inline wchar_t make_lower(wchar_t c)
|
||||
{
|
||||
return (std::towlower)(c);
|
||||
}
|
||||
#endif
|
||||
inline char make_upper(char c)
|
||||
{
|
||||
return static_cast<char>((std::toupper)(c));
|
||||
}
|
||||
|
||||
inline char make_lower(char c)
|
||||
{
|
||||
return static_cast<char>((std::tolower)(c));
|
||||
}
|
||||
|
||||
#endif //defined(_WIN32) && !defined(BOOST_REGEX_NO_W32)
|
||||
|
||||
typedef enum {
|
||||
case_nochange,
|
||||
case_oneupper,
|
||||
case_onelower,
|
||||
case_allupper,
|
||||
case_alllower
|
||||
} case_flags_type;
|
||||
|
||||
// traits_type is unused, but provided to make it possible to use it for case conversion
|
||||
template <class O, class charT, class traits_type>
|
||||
void BOOST_REGEX_CALL output_char(O& out, charT c, traits_type& /*t*/, case_flags_type& f)
|
||||
{
|
||||
switch (f) {
|
||||
case case_oneupper:
|
||||
f = case_nochange;
|
||||
// drop through
|
||||
case case_allupper:
|
||||
*out = make_upper(c);
|
||||
break;
|
||||
case case_onelower:
|
||||
f = case_nochange;
|
||||
// drop through
|
||||
case case_alllower:
|
||||
*out = make_lower(c);
|
||||
break;
|
||||
default:
|
||||
*out = c;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
template <class O, class I, class traits_type>
|
||||
O BOOST_REGEX_CALL re_copy_out(O out, I first, I last, traits_type& t, case_flags_type& f)
|
||||
{
|
||||
while(first != last)
|
||||
{
|
||||
*out = *first;
|
||||
if (f != case_nochange)
|
||||
output_char(out, *first, t, f);
|
||||
else
|
||||
*out = *first;
|
||||
|
||||
++out;
|
||||
++first;
|
||||
}
|
||||
return out;
|
||||
}
|
||||
|
||||
|
||||
template <class charT, class traits_type>
|
||||
void BOOST_REGEX_CALL re_skip_format(const charT*& fmt, const traits_type& traits_inst)
|
||||
{
|
||||
@ -134,10 +227,11 @@ namespace{
|
||||
// _reg_format_aux does the actual work:
|
||||
//
|
||||
template <class OutputIterator, class Iterator, class Allocator, class charT, class traits_type>
|
||||
OutputIterator BOOST_REGEX_CALL _reg_format_aux(OutputIterator out,
|
||||
const match_results<Iterator, Allocator>& m,
|
||||
OutputIterator BOOST_REGEX_CALL _reg_format_aux(OutputIterator out,
|
||||
const match_results<Iterator, Allocator>& m,
|
||||
const charT*& fmt,
|
||||
match_flag_type flags, const traits_type& traits_inst)
|
||||
match_flag_type flags, const traits_type& traits_inst,
|
||||
case_flags_type& case_flags)
|
||||
{
|
||||
#ifdef __BORLANDC__
|
||||
#pragma option push -w-8037
|
||||
@ -171,11 +265,13 @@ OutputIterator BOOST_REGEX_CALL _reg_format_aux(OutputIterator out,
|
||||
switch(traits_inst.syntax_type((traits_size_type)(traits_uchar_type)(*fmt)))
|
||||
{
|
||||
case traits_type::syntax_start_buffer:
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[-1].first), Iterator(m[-1].second)));
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[-1].first), Iterator(m[-1].second),
|
||||
traits_inst, case_flags));
|
||||
++fmt;
|
||||
continue;
|
||||
case traits_type::syntax_end_buffer:
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[-2].first), Iterator(m[-2].second)));
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[-2].first), Iterator(m[-2].second),
|
||||
traits_inst, case_flags));
|
||||
++fmt;
|
||||
continue;
|
||||
case traits_type::syntax_digit:
|
||||
@ -183,14 +279,16 @@ OutputIterator BOOST_REGEX_CALL _reg_format_aux(OutputIterator out,
|
||||
expand_sub:
|
||||
unsigned int index = traits_inst.toi(fmt, fmt_end, 10);
|
||||
if(index < m.size())
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[index].first), Iterator(m[index].second)));
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[index].first), Iterator(m[index].second),
|
||||
traits_inst, case_flags));
|
||||
continue;
|
||||
}
|
||||
}
|
||||
// anything else:
|
||||
if(*fmt == '&')
|
||||
{
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[0].first), Iterator(m[0].second)));
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[0].first), Iterator(m[0].second),
|
||||
traits_inst, case_flags));
|
||||
++fmt;
|
||||
}
|
||||
else
|
||||
@ -327,6 +425,32 @@ expand_sub:
|
||||
else
|
||||
c = (charT)traits_inst.toi(fmt, fmt_end, -8);
|
||||
break;
|
||||
|
||||
case traits_type::syntax_u:
|
||||
++fmt;
|
||||
if(flags & format_sed) break;
|
||||
case_flags = case_oneupper;
|
||||
continue;
|
||||
case traits_type::syntax_l:
|
||||
++fmt;
|
||||
if(flags & format_sed) break;
|
||||
case_flags = case_onelower;
|
||||
continue;
|
||||
case traits_type::syntax_U:
|
||||
++fmt;
|
||||
if(flags & format_sed) break;
|
||||
case_flags = case_allupper;
|
||||
continue;
|
||||
case traits_type::syntax_L:
|
||||
++fmt;
|
||||
if(flags & format_sed) break;
|
||||
case_flags = case_alllower;
|
||||
continue;
|
||||
case traits_type::syntax_E:
|
||||
++fmt;
|
||||
if(flags & format_sed) break;
|
||||
case_flags = case_nochange;
|
||||
continue;
|
||||
default:
|
||||
//c = *fmt;
|
||||
++fmt;
|
||||
@ -346,7 +470,7 @@ expand_sub:
|
||||
else
|
||||
{
|
||||
++fmt; // recurse
|
||||
oi_assign(&out, _reg_format_aux(out, m, fmt, flags, traits_inst));
|
||||
oi_assign(&out, _reg_format_aux(out, m, fmt, flags, traits_inst, case_flags));
|
||||
continue;
|
||||
}
|
||||
case traits_type::syntax_close_bracket:
|
||||
@ -395,7 +519,7 @@ expand_sub:
|
||||
unsigned int id = traits_inst.toi(fmt, fmt_end, 10);
|
||||
if(m[id].matched)
|
||||
{
|
||||
oi_assign(&out, _reg_format_aux(out, m, fmt, flags | regex_constants::format_is_if, traits_inst));
|
||||
oi_assign(&out, _reg_format_aux(out, m, fmt, flags | regex_constants::format_is_if, traits_inst, case_flags));
|
||||
if(traits_inst.syntax_type((traits_size_type)(traits_uchar_type)(*(fmt-1))) == traits_type::syntax_colon)
|
||||
re_skip_format(fmt, traits_inst);
|
||||
}
|
||||
@ -403,7 +527,7 @@ expand_sub:
|
||||
{
|
||||
re_skip_format(fmt, traits_inst);
|
||||
if(traits_inst.syntax_type((traits_size_type)(traits_uchar_type)(*(fmt-1))) == traits_type::syntax_colon)
|
||||
oi_assign(&out, _reg_format_aux(out, m, fmt, flags | regex_constants::format_is_if, traits_inst));
|
||||
oi_assign(&out, _reg_format_aux(out, m, fmt, flags | regex_constants::format_is_if, traits_inst, case_flags));
|
||||
}
|
||||
return out;
|
||||
}
|
||||
@ -412,11 +536,13 @@ expand_sub:
|
||||
default_opt:
|
||||
if((flags & format_sed) && (*fmt == '&'))
|
||||
{
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[0].first), Iterator(m[0].second)));
|
||||
oi_assign(&out, re_copy_out(out, Iterator(m[0].first), Iterator(m[0].second),
|
||||
traits_inst, case_flags));
|
||||
++fmt;
|
||||
continue;
|
||||
}
|
||||
*out = *fmt;
|
||||
|
||||
output_char(out, *fmt, traits_inst, case_flags);
|
||||
++out;
|
||||
++fmt;
|
||||
}
|
||||
@ -441,10 +567,10 @@ public:
|
||||
string_out_iterator& operator++() { return *this; }
|
||||
string_out_iterator& operator++(int) { return *this; }
|
||||
string_out_iterator& operator*() { return *this; }
|
||||
string_out_iterator& operator=(typename S::value_type v)
|
||||
{
|
||||
out->append(1, v);
|
||||
return *this;
|
||||
string_out_iterator& operator=(typename S::value_type v)
|
||||
{
|
||||
out->append(1, v);
|
||||
return *this;
|
||||
}
|
||||
};
|
||||
|
||||
@ -467,9 +593,17 @@ public:
|
||||
bool BOOST_REGEX_CALL operator()(const boost::match_results<Iterator, alloc_type>& m)
|
||||
{
|
||||
const charT* f = fmt;
|
||||
case_flags_type cf = case_nochange;
|
||||
if(0 == (flags & format_no_copy))
|
||||
oi_assign(out, re_copy_out(*out, Iterator(m[-1].first), Iterator(m[-1].second)));
|
||||
oi_assign(out, _reg_format_aux(*out, m, f, flags, *pt));
|
||||
{
|
||||
oi_assign(out, re_copy_out(
|
||||
*out,
|
||||
Iterator(m[-1].first),
|
||||
Iterator(m[-1].second),
|
||||
*pt,
|
||||
cf));
|
||||
}
|
||||
oi_assign(out, _reg_format_aux(*out, m, f, flags, *pt, cf));
|
||||
*last = m[-2].first;
|
||||
return flags & format_first_only ? false : true;
|
||||
}
|
||||
@ -485,7 +619,9 @@ OutputIterator regex_format(OutputIterator out,
|
||||
)
|
||||
{
|
||||
regex_traits<charT> t;
|
||||
return re_detail::_reg_format_aux(out, m, fmt, flags, t);
|
||||
|
||||
re_detail::case_flags_type cf = re_detail::case_nochange;
|
||||
return re_detail::_reg_format_aux(out, m, fmt, flags, t, cf);
|
||||
}
|
||||
|
||||
template <class OutputIterator, class Iterator, class Allocator, class charT>
|
||||
@ -497,12 +633,14 @@ OutputIterator regex_format(OutputIterator out,
|
||||
{
|
||||
regex_traits<charT> t;
|
||||
const charT* start = fmt.c_str();
|
||||
return re_detail::_reg_format_aux(out, m, start, flags, t);
|
||||
}
|
||||
|
||||
re_detail::case_flags_type cf = re_detail::case_nochange;
|
||||
return re_detail::_reg_format_aux(out, m, start, flags, t, cf);
|
||||
}
|
||||
|
||||
template <class Iterator, class Allocator, class charT>
|
||||
std::basic_string<charT> regex_format(const match_results<Iterator, Allocator>& m,
|
||||
const charT* fmt,
|
||||
std::basic_string<charT> regex_format(const match_results<Iterator, Allocator>& m,
|
||||
const charT* fmt,
|
||||
match_flag_type flags = format_all)
|
||||
{
|
||||
std::basic_string<charT> result;
|
||||
@ -512,8 +650,8 @@ std::basic_string<charT> regex_format(const match_results<Iterator, Allocator>&
|
||||
}
|
||||
|
||||
template <class Iterator, class Allocator, class charT>
|
||||
std::basic_string<charT> regex_format(const match_results<Iterator, Allocator>& m,
|
||||
const std::basic_string<charT>& fmt,
|
||||
std::basic_string<charT> regex_format(const match_results<Iterator, Allocator>& m,
|
||||
const std::basic_string<charT>& fmt,
|
||||
match_flag_type flags = format_all)
|
||||
{
|
||||
std::basic_string<charT> result;
|
||||
|
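Note on the `case_flags_type` changes above: the new `output_char` treats the one-character modes as one-shot (the flag resets to `case_nochange` after a single character), while the all-upper and all-lower modes persist until changed. The sketch below reproduces that state machine on a plain `std::string`. It is an illustration under assumptions, not the library's code: `apply_case_escapes` is a hypothetical driver, the escapes are assumed to be spelled `\u`, `\l`, `\U`, `\L` and `\E`, and the `format_sed` check from the diff is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

enum case_flags_type { case_nochange, case_oneupper, case_onelower, case_allupper, case_alllower };

// Append one character, applying the pending case flag.
// The one-shot modes clear themselves after a single character.
inline void output_char(std::string& out, char c, case_flags_type& f)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    switch (f)
    {
    case case_oneupper:
        f = case_nochange;
        out += static_cast<char>(std::toupper(uc));
        break;
    case case_allupper:
        out += static_cast<char>(std::toupper(uc));
        break;
    case case_onelower:
        f = case_nochange;
        out += static_cast<char>(std::tolower(uc));
        break;
    case case_alllower:
        out += static_cast<char>(std::tolower(uc));
        break;
    default:
        out += c;
        break;
    }
}

// Hypothetical driver: interprets the case escapes in a literal string.
inline std::string apply_case_escapes(const std::string& fmt)
{
    std::string out;
    case_flags_type f = case_nochange;
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] == '\\' && i + 1 < fmt.size())
        {
            switch (fmt[i + 1])
            {
            case 'u': f = case_oneupper; ++i; continue;
            case 'l': f = case_onelower; ++i; continue;
            case 'U': f = case_allupper; ++i; continue;
            case 'L': f = case_alllower; ++i; continue;
            case 'E': f = case_nochange; ++i; continue;
            default: break; // unknown escape: emit the backslash as-is
            }
        }
        output_char(out, fmt[i], f);
    }
    return out;
}
```

Passing the flag by reference, as the diff does through `_reg_format_aux`, is what lets the state survive across recursive calls and across the copied sub-expressions.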
@ -55,23 +55,22 @@ inline unsigned int regex_grep(Predicate foo,
|
||||
return count; // we've reached the end, don't try and find an extra null match.
|
||||
if(m.length() == 0)
|
||||
{
|
||||
if(m[0].second == last)
|
||||
return count;
|
||||
// we found a NULL-match, now try to find
|
||||
// a non-NULL one at the same position:
|
||||
BidiIterator last_end(m[0].second);
|
||||
if(last_end == last)
|
||||
return count;
|
||||
match_results<BidiIterator, match_allocator_type> m2(m);
|
||||
matcher.setf(match_not_null | match_continuous);
|
||||
if(matcher.find())
|
||||
{
|
||||
++count;
|
||||
last_end = m[0].second;
|
||||
if(0 == foo(m))
|
||||
return count;
|
||||
}
|
||||
else
|
||||
{
|
||||
// reset match back to where it was:
|
||||
m.set_second(last_end);
|
||||
m = m2;
|
||||
}
|
||||
matcher.unsetf((match_not_null | match_continuous) & ~flags);
|
||||
}
|
||||
|
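Note on the zero-length match handling above: any loop that repeatedly searches from the end of the previous match has to step forward after an empty match, or it will find the same empty match forever. The sketch below shows only that core rule. It uses `std::regex` as a stand-in for the library's matcher, `match_offsets` is a hypothetical helper, and it does not attempt the refinement in the diff of first retrying for a non-empty match at the same position.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Collect the start offset of every match of re in text, empty matches included.
inline std::vector<std::size_t> match_offsets(const std::string& text, const std::regex& re)
{
    std::vector<std::size_t> offsets;
    std::string::const_iterator pos = text.begin();
    std::smatch m;
    while (std::regex_search(pos, text.cend(), m, re))
    {
        offsets.push_back(static_cast<std::size_t>(m[0].first - text.begin()));
        pos = m[0].second;
        if (m.length(0) == 0)
        {
            // Empty match: stop at the end of the text, otherwise advance by
            // one character so the next search cannot return the same match.
            if (pos == text.end())
                break;
            ++pos;
        }
    }
    return offsets;
}
```

A production loop would also pass a flag such as `std::regex_constants::match_prev_avail` on later iterations so that anchors and word boundaries see the preceding character. That is omitted here for brevity.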
@ -76,6 +76,14 @@ template <class BidirectionalIterator,
|
||||
class traits = regex_traits<charT>,
|
||||
class Allocator = BOOST_DEFAULT_ALLOCATOR(charT) >
|
||||
class regex_iterator
|
||||
#ifndef BOOST_NO_STD_ITERATOR
|
||||
: public std::iterator<
|
||||
std::forward_iterator_tag,
|
||||
match_results<BidirectionalIterator>,
|
||||
typename re_detail::regex_iterator_traits<BidirectionalIterator>::difference_type,
|
||||
const match_results<BidirectionalIterator>*,
|
||||
const match_results<BidirectionalIterator>& >
|
||||
#endif
|
||||
{
|
||||
private:
|
||||
typedef regex_iterator_implementation<BidirectionalIterator, charT, traits, Allocator> impl;
|
||||
|
@ -39,7 +39,7 @@ OutputIterator regex_replace(OutputIterator out,
|
||||
Iterator l = first;
|
||||
re_detail::merge_out_predicate<OutputIterator, Iterator, charT, Allocator, traits> oi(out, l, fmt, flags, e.get_traits());
|
||||
regex_grep(oi, first, last, e, flags);
|
||||
return (flags & format_no_copy) ? out : re_detail::re_copy_out(out, l, last);
|
||||
return (flags & format_no_copy) ? out : std::copy(l, last, out);
|
||||
}
|
||||
|
||||
template <class OutputIterator, class Iterator, class traits, class Allocator, class charT>
|
||||
|
@ -51,11 +51,7 @@ template <class BidirectionalIterator,
|
||||
class regex_token_iterator_implementation
|
||||
{
|
||||
typedef basic_regex<charT, traits, Allocator> regex_type;
|
||||
#if 1
|
||||
typedef sub_match<BidirectionalIterator> value_type;
|
||||
#else
|
||||
typedef std::basic_string<charT> value_type;
|
||||
#endif
|
||||
|
||||
match_results<BidirectionalIterator> what; // current match
|
||||
BidirectionalIterator end; // end of search area
|
||||
@ -163,6 +159,14 @@ template <class BidirectionalIterator,
|
||||
class traits = regex_traits<charT>,
|
||||
class Allocator = BOOST_DEFAULT_ALLOCATOR(charT) >
|
||||
class regex_token_iterator
|
||||
#ifndef BOOST_NO_STD_ITERATOR
|
||||
: public std::iterator<
|
||||
std::forward_iterator_tag,
|
||||
sub_match<BidirectionalIterator>,
|
||||
typename re_detail::regex_iterator_traits<BidirectionalIterator>::difference_type,
|
||||
const sub_match<BidirectionalIterator>*,
|
||||
const sub_match<BidirectionalIterator>& >
|
||||
#endif
|
||||
{
|
||||
private:
|
||||
typedef regex_token_iterator_implementation<BidirectionalIterator, charT, traits, Allocator> impl;
|
||||
|
@ -35,6 +35,8 @@ struct sub_match : public std::pair<BidiIterator, BidiIterator>
|
||||
typedef typename re_detail::regex_iterator_traits<BidiIterator>::difference_type difference_type;
|
||||
#endif
|
||||
typedef BidiIterator iterator_type;
|
||||
typedef BidiIterator iterator;
|
||||
typedef BidiIterator const_iterator;
|
||||
|
||||
bool matched;
|
||||
|
||||
|
@ -63,7 +63,7 @@ struct results
|
||||
safe_greta_time(-1),
|
||||
posix_time(-1),
|
||||
pcre_time(-1),
|
||||
factor(std::numeric_limits<double>::max()),
|
||||
factor((std::numeric_limits<double>::max)()),
|
||||
expression(ex),
|
||||
description(desc)
|
||||
{}
|
||||
|
@ -9,6 +9,7 @@
|
||||
*
|
||||
*/
|
||||
|
||||
#include <boost/minmax.hpp>
|
||||
#include "regex_comparison.hpp"
|
||||
#include <boost/timer.hpp>
|
||||
#include <boost/regex.hpp>
|
||||
@ -45,7 +46,7 @@ double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
boost::regex_match(text, what, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
@ -86,7 +87,7 @@ double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
boost::regex_grep(&dummy_grep_proc, text, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
@ -9,6 +9,7 @@
|
||||
*
|
||||
*/
|
||||
|
||||
#include <boost/minmax.hpp>
|
||||
#include "regex_comparison.hpp"
|
||||
#if defined(BOOST_HAS_GRETA)
|
||||
#include <cassert>
|
||||
@ -48,7 +49,7 @@ double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
e.match(text, what);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
@ -94,7 +95,7 @@ double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
@ -12,6 +12,7 @@
|
||||
#include "regex_comparison.hpp"
|
||||
#include <boost/timer.hpp>
|
||||
#include <boost/regex.hpp>
|
||||
#include <boost/minmax.hpp>
|
||||
|
||||
namespace bl{
|
||||
|
||||
@ -45,7 +46,7 @@ double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
boost::regex_match(text, what, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
@ -86,7 +87,7 @@ double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
boost::regex_grep(&dummy_grep_proc, text, e);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
@ -11,6 +11,7 @@
|
||||
|
||||
#include <cassert>
|
||||
#include <cfloat>
|
||||
#include <boost/minmax.hpp>
|
||||
#include "regex_comparison.hpp"
|
||||
#ifdef BOOST_HAS_PCRE
|
||||
#include "pcre.h"
|
||||
@ -69,7 +70,7 @@ double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
free(ppcre);
|
||||
free(pe);
|
||||
@ -152,7 +153,7 @@ double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
@ -11,6 +11,7 @@
|
||||
|
||||
#include <cassert>
|
||||
#include <cfloat>
|
||||
#include <boost/minmax.hpp>
|
||||
#include "regex_comparison.hpp"
|
||||
#ifdef BOOST_HAS_POSIX
|
||||
#include <boost/timer.hpp>
|
||||
@ -50,7 +51,7 @@ double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
regexec(&e, text.c_str(), e.re_nsub, what, 0);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
regfree(&e);
|
||||
return result / iter;
|
||||
@ -116,7 +117,7 @@ double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
@ -14,6 +14,7 @@
|
||||
|
||||
#include <cassert>
|
||||
#include <boost/timer.hpp>
|
||||
#include <boost/minmax.hpp>
|
||||
#include "regexpr2.h"
|
||||
|
||||
namespace gs{
|
||||
@ -49,7 +50,7 @@ double time_match(const std::string& re, const std::string& text, bool icase)
|
||||
e.match(text, what);
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
@ -96,7 +97,7 @@ double time_find_all(const std::string& re, const std::string& text, bool icase)
|
||||
}
|
||||
}
|
||||
run = tim.elapsed();
|
||||
result = std::min(run, result);
|
||||
result = std_min(run, result);
|
||||
}
|
||||
return result / iter;
|
||||
}
|
||||
|
@ -136,7 +136,7 @@ nl_catd message_cat = (nl_catd)-1;
|
||||
unsigned int message_count = 0;
|
||||
std::string* mess_locale;
|
||||
|
||||
BOOST_REGEX_DECL char* re_custom_error_messages[] = {
|
||||
char* re_custom_error_messages[] = {
|
||||
0,
|
||||
0,
|
||||
0,
|
||||
@ -182,8 +182,8 @@ std::size_t BOOST_REGEX_CALL _re_get_message(char* buf, std::size_t len, std::si
|
||||
|
||||
#ifndef BOOST_NO_WREGEX
|
||||
|
||||
BOOST_REGEX_DECL boost::regex_wchar_type re_zero_w;
|
||||
BOOST_REGEX_DECL boost::regex_wchar_type re_ten_w;
|
||||
boost::regex_wchar_type re_zero_w;
|
||||
boost::regex_wchar_type re_ten_w;
|
||||
|
||||
unsigned int nlsw_count = 0;
|
||||
std::string* wlocale_name = 0;
|
||||
|
@ -108,7 +108,7 @@ std::list<collate_name_t>* pcoll_names = 0;
|
||||
|
||||
HINSTANCE hresmod = 0;
|
||||
|
||||
BOOST_REGEX_DECL char* re_custom_error_messages[] = {
|
||||
char* re_custom_error_messages[] = {
|
||||
0,
|
||||
0,
|
||||
0,
|
||||
@ -147,8 +147,8 @@ enum syntax_map_size
|
||||
|
||||
#ifndef BOOST_NO_WREGEX
|
||||
|
||||
BOOST_REGEX_DECL boost::regex_wchar_type re_zero_w;
|
||||
BOOST_REGEX_DECL boost::regex_wchar_type re_ten_w;
|
||||
boost::regex_wchar_type re_zero_w;
|
||||
boost::regex_wchar_type re_ten_w;
|
||||
|
||||
bool isPlatformNT = false;
|
||||
|
||||
|
@ -1,94 +0,0 @@
|
||||
# copyright John Maddock 2003
|
||||
|
||||
#
|
||||
# This Jamfile tests the ability of some Windows compilers
|
||||
# to automatically link to the right lib file,
|
||||
# it is not generally applicable.
|
||||
#
|
||||
|
||||
subproject libs/regex/test/auto-link-test ;
|
||||
|
||||
# bring in the rules for testing
|
||||
import testing ;
|
||||
|
||||
run
|
||||
# sources
|
||||
<template>../../build/regex-options
|
||||
../regress/parse.cpp
|
||||
../regress/regress.cpp
|
||||
../regress/tests.cpp
|
||||
<lib>../../../test/build/boost_prg_exec_monitor
|
||||
:
|
||||
: # input files
|
||||
../regress/tests.txt
|
||||
: # requirements
|
||||
<library-path>../../../../stage/lib
|
||||
<define>BOOST_LIB_DIAGNOSTIC=1
|
||||
: # program name
|
||||
regex_regress
|
||||
|
||||
;
|
||||
|
||||
run
|
||||
# sources
|
||||
<template>../../build/regex-options
|
||||
../regress/parse.cpp
|
||||
../regress/regress.cpp
|
||||
../regress/tests.cpp
|
||||
<lib>../../../test/build/boost_prg_exec_monitor
|
||||
:
|
||||
: # input files
|
||||
../regress/tests.txt
|
||||
: # requirements
|
||||
<library-path>../../../../stage/lib
|
||||
<define>TEST_UNICODE=1
|
||||
<define>BOOST_LIB_DIAGNOSTIC=1
|
||||
: # program name
|
||||
wide_regex_regress
|
||||
|
||||
;
|
||||
|
||||
# and now the dll versions:
|
||||
|
||||
run
|
||||
# sources
|
||||
<template>../../build/regex-options
|
||||
../regress/parse.cpp
|
||||
../regress/regress.cpp
|
||||
../regress/tests.cpp
|
||||
<lib>../../../test/build/boost_prg_exec_monitor
|
||||
:
|
||||
: # input files
|
||||
../regress/tests.txt
|
||||
: # requirements
|
||||
<library-path>../../../../stage/lib
|
||||
<define>BOOST_ALL_DYN_LINK=1
|
||||
<runtime-link>dynamic
|
||||
<define>BOOST_LIB_DIAGNOSTIC=1
|
||||
: # program name
|
||||
regex_regress_dll
|
||||
|
||||
;
|
||||
|
||||
run
|
||||
# sources
|
||||
<template>../../build/regex-options
|
||||
../regress/parse.cpp
|
||||
../regress/regress.cpp
|
||||
../regress/tests.cpp
|
||||
<lib>../../../test/build/boost_prg_exec_monitor
|
||||
:
|
||||
: # input files
|
||||
../regress/tests.txt
|
||||
: # requirements
|
||||
<define>BOOST_ALL_DYN_LINK=1
|
||||
<runtime-link>dynamic
|
||||
<library-path>../../../../stage/lib
|
||||
<define>TEST_UNICODE=1
|
||||
<define>BOOST_LIB_DIAGNOSTIC=1
|
||||
: # program name
|
||||
wide_regex_regress_dll
|
||||
|
||||
;
|
||||
|
||||
|
@ -1,16 +0,0 @@
|
||||
# copyright John Maddock 2003
|
||||
|
||||
subproject libs/regex/test/captures ;
|
||||
|
||||
EX_SOURCES = c_regex_traits c_regex_traits_common cpp_regex_traits
|
||||
cregex fileiter posix_api regex regex_debug
|
||||
regex_synch w32_regex_traits wide_posix_api instances winstances ;
|
||||
|
||||
lib boost_regex_extra : ../../src/$(EX_SOURCES).cpp <template>../../build/regex-options
|
||||
:
|
||||
<define>BOOST_REGEX_MATCH_EXTRA=1
|
||||
:
|
||||
;
|
||||
|
||||
|
||||
|
@ -1,90 +0,0 @@

#include <boost/regex.hpp>
#include <boost/test/test_tools.hpp>
#include <boost/array.hpp>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof(x[0]))

template <class T>
void test_captures(const std::string& regx, const std::string& text, T& expected)
{
   boost::regex e(regx);
   boost::smatch what;
   if(boost::regex_match(text, what, e, boost::match_extra))
   {
      unsigned i, j;
      BOOST_TEST(what.size() == ARRAY_SIZE(expected));
      for(i = 0; i < what.size(); ++i)
      {
         BOOST_TEST(what.captures(i).size() <= ARRAY_SIZE(expected[i]));
         for(j = 0; j < what.captures(i).size(); ++j)
         {
            BOOST_TEST(what.captures(i)[j] == expected[i][j]);
         }
      }
   }
}

int test_main(int , char* [])
{
   typedef const char* pchar;
   pchar e1[4][5] =
   {
      { "aBBcccDDDDDeeeeeeee", },
      { "a", "BB", "ccc", "DDDDD", "eeeeeeee", },
      { "a", "ccc", "eeeeeeee", },
      { "BB", "DDDDD", },
   };
   test_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee", e1);
   pchar e2[4][2] =
   {
      { "abd" },
      { "b", "" },
      { "" },
   };
   test_captures("a(b+|((c)*))+d", "abd", e2);
   pchar e3[3][1] =
   {
      { "abcbar" },
      { "abc" },
   };
   test_captures("(.*)bar|(.*)bah", "abcbar", e3);
   pchar e4[3][1] =
   {
      { "abcbah" },
      { 0, },
      { "abc" },
   };
   test_captures("(.*)bar|(.*)bah", "abcbah", e4);
   pchar e5[2][16] =
   {
      { "now is the time for all good men to come to the aid of the party" },
      { "now", "is", "the", "time", "for", "all", "good", "men", "to", "come", "to", "the", "aid", "of", "the", "party" },
   };
   test_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party", e5);
   pchar e6[2][16] =
   {
      { "now is the time for all good men to come to the aid of the party" },
      { "now", "is", "the", "time", "for", "all", "good", "men", "to", "come", "to", "the", "aid", "of", "the", "party" },
   };
   test_captures("^(?>(\\w+)\\W*)*$", "now is the time for all good men to come to the aid of the party", e6);
   pchar e7[4][14] =
   {
      { "now is the time for all good men to come to the aid of the party" },
      { "now" },
      { "is", "the", "time", "for", "all", "good", "men", "to", "come", "to", "the", "aid", "of", "the" },
      { "party" },
   };
   test_captures("^(\\w+)\\W+(?>(\\w+)\\W+)*(\\w+)$", "now is the time for all good men to come to the aid of the party", e7);
   pchar e8[5][9] =
   {
      { "now is the time for all good men to come to the aid of the party" } ,
      { "now" },
      { "is", "for", "men", "to", "of" },
      { "the", "time", "all", "good", "to", "come", "the", "aid", "the" },
      { "party" },
   };
   test_captures("^(\\w+)\\W+(?>(\\w+)\\W+(?:(\\w+)\\W+){0,2})*(\\w+)$", "now is the time for all good men to come to the aid of the party", e8);
   return 0;
}

@ -87,7 +87,19 @@ int main()
   r.assign(s, boost::regex::perl);
   r.assign(c_exp, c_exp+1);
   r.assign(c_exp, c_exp+1, boost::regex::perl);
#endif
#endif

#ifndef BOOST_NO_STD_ITERATOR
   //
   //check iterators work with std lib algorithms:
   //
   boost::cregex_iterator ri, rj;
   std::distance(ri, rj);
   std::advance(ri, 0);
   boost::cregex_token_iterator rk, rm;
   std::distance(rk, rm);
   std::advance(rk, 0);
#endif
   return 0;
}

@ -1018,6 +1018,7 @@ ab.{2,5}? ab__ 0 4
ab.{2,5}? ab_______ 0 4
ab.{2,5}?xy ab______xy -1 -1
ab.{2,5}xy ab_xy -1 -1
(.*?).somesite \n\n555.somesite 2 14 2 5

; now again for single character repeats:

@ -1054,6 +1055,7 @@ ab_{2,5}? ab__ 0 4
ab_{2,5}? ab_______ 0 4
ab_{2,5}?xy ab______xy -1 -1
ab_{2,5}xy ab_xy -1 -1
(5*?).somesite //555.somesite 2 14 2 5

; and again for sets:
ab[_,;]*xy abxy_ 0 4
@ -1089,6 +1091,7 @@ ab[_,;]{2,5}? ab__ 0 4
ab[_,;]{2,5}? ab_______ 0 4
ab[_,;]{2,5}?xy ab______xy -1 -1
ab[_,;]{2,5}xy ab_xy -1 -1
(\d*?).somesite //555.somesite 2 14 2 5

; and again for tricky sets with digraphs:
ab[_[.ae.]]*xy abxy_ 0 4
@ -1124,6 +1127,7 @@ ab[_[.ae.]]{2,5}? ab__ 0 4
ab[_[.ae.]]{2,5}? ab_______ 0 4
ab[_[.ae.]]{2,5}?xy ab______xy -1 -1
ab[_[.ae.]]{2,5}xy ab_xy -1 -1
([5[.ae.]]*?).somesite //555.somesite 2 14 2 5

; new bugs detected in spring 2003:
- normal match_continuous REG_NO_POSIX_TEST
@ -1249,3 +1253,16 @@ a(bbb+|bb+|b)bb abbb 0 4
(.*).* abcdef 0 6
(a*)* bc 0 0

; new merge tests for case conversions:
- match_default normal REG_PERL REG_STARTEND REG_MERGE
abc "xyzabcCD" "\u$&" "xyzAbcCD"
abc "xyzabcCD" "\U$&\E" "xyzABCCD"
ABC "xyzABCCD" "\l$&" "xyzaBCCD"
ABC "xyzABCCD" "\L$&\E" "xyzabcCD"

- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_sed
abc "xyzabcCD" "\u\0" "xyzuabcCD"
abc "xyzabcCD" "\U\0\E" "xyzUabcECD"
ABC "xyzABCCD" "\l\0" "xyzlABCCD"
ABC "xyzABCCD" "\L\0\E" "xyzLABCECD"
