Compare commits

...

5 Commits

Author SHA1 Message Date
8441264579 This commit was manufactured by cvs2svn to create tag
'Version_1_30_1'.

[SVN r19444]
2003-08-04 17:55:29 +00:00
644e41ba31 Added minor gcc 3.3 fixes (warning suppression)
[SVN r19415]
2003-08-03 11:59:11 +00:00
51fe53f437 Fixed bug that effect some searches
[SVN r18003]
2003-03-19 12:19:14 +00:00
6d4dd63cba Merged from main truck
[SVN r17910]
2003-03-14 12:18:04 +00:00
686a939498 This commit was manufactured by cvs2svn to create branch 'RC_1_30_0'.
[SVN r17693]
2003-03-01 19:43:06 +00:00
15 changed files with 45 additions and 7510 deletions


205
faq.htm

@ -1,205 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++ - FAQ</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, FAQ.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<p><font color="#FF0000">Q. Why does using parentheses in a
regular expression change the result of a match?</font></p>
<p>A. Parentheses don't only mark; they determine what the best
match is as well. regex++ tries to follow the POSIX standard
leftmost longest rule for determining what matched. So if there
is more than one possible match after considering the whole
expression, it looks next at the first sub-expression and then
the second sub-expression and so on. So...</p>
<pre>&quot;(0*)([0-9]*)&quot; against &quot;00123&quot; would produce
$1 = &quot;00&quot;
$2 = &quot;123&quot;</pre>
<p>whereas</p>
<pre>&quot;0*([0-9]*)&quot; against &quot;00123&quot; would produce
$1 = &quot;00123&quot;</pre>
<p>If you think about it, had $1 only matched the &quot;123&quot;,
this would be &quot;less good&quot; than the match &quot;00123&quot;
which is both further to the left and longer. If you want $1 to
match only the &quot;123&quot; part, then you need to use
something like:</p>
<pre>&quot;0*([1-9][0-9]*)&quot;</pre>
<p>as the expression.</p>
<p><font color="#FF0000">Q. Configure says that my compiler is
unable to merge template instances, what does this mean?</font> </p>
<p>A. When you compile template code, you can end up with the
same template instances in multiple translation units - this will
lead to link time errors unless your compiler/linker is smart
enough to merge these template instances into a single record in
the executable file. If you see this warning after running
configure, then you can still link to libregex++.a if: </p>
<ol>
<li>You use only the low-level template classes (reg_expression&lt;&gt;
match_results&lt;&gt; etc), from a single translation
unit, and use no other part of regex++.</li>
<li>You use only the POSIX API functions (regcomp regexec etc),
and no other part of regex++.</li>
<li>You use only the high level class RegEx, and no other
part of regex++. </li>
</ol>
<p>Another option is to create a master include file, which
#include's all the regex++ source files, and all the source files
in which you use regex++. You then compile and link this master
file as a single translation unit. </p>
<p><font color="#FF0000">Q. Configure says that my compiler is
unable to merge template instances from archive files, what does
this mean?</font> </p>
<p>A. When you compile template code, you can end up with the
same template instances in multiple translation units - this will
lead to link time errors unless your compiler/linker is smart
enough to merge these template instances into a single record in
the executable file. Some compilers are able to do this for
normal .cpp or .o files, but fail if the object file has been
placed in a library archive. If you see this warning after
running configure, then you can still link to libregex++.a if: </p>
<ol>
<li>You use only the low-level template classes (reg_expression&lt;&gt;
match_results&lt;&gt; etc), and use no other part of
regex++.</li>
<li>You use only the POSIX API functions (regcomp regexec etc),
and no other part of regex++.</li>
<li>You use only the high level class RegEx, and no other
part of regex++. </li>
</ol>
<p>Another option is to add the regex++ source files directly to
your project instead of linking to libregex++.a. Generally you
should do this only if you are getting link time errors with
libregex++.a. </p>
<p><font color="#FF0000">Q. Configure says that my compiler can't
merge templates containing switch statements, what does this
mean?</font> </p>
<p>A. Some compilers can't merge templates that contain static
data - this includes switch statements which implicitly generate
static data as well as code. Principally this affects the egcs
compiler - but note gcc 2.81 also suffers from this problem - the
compiler will compile and link the code - but the code will not
run because the code and the static data it uses have become
separated. The default behaviour of regex++ is to try and fix
this problem by declaring &quot;problem&quot; templates inside
unnamed namespaces, so that the templates have internal linkage.
Note that this can result in a great deal of code bloat. If the
compiler doesn't support namespaces, or if code bloat becomes a
problem, then follow the guidelines above for placing all the
templates used in a single translation unit, and edit boost/regex/config.hpp
so that BOOST_REGEX_NO_TEMPLATE_SWITCH_MERGE is no longer defined.
</p>
<p><font color="#FF0000">Q. I can't get regex++ to work with
escape characters, what's going on?</font> </p>
<p>A. If you embed regular expressions in C++ code, then remember
that escape characters are processed twice: once by the C++
compiler, and once by the regex++ expression compiler, so to pass
the regular expression \d+ to regex++, you need to embed &quot;\\d+&quot;
in your code. Likewise to match a literal backslash you will need
to embed &quot;\\\\&quot; in your code. </p>
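<p>The double processing described above can be sketched as follows.
This is only an illustration: it uses std::regex from the C++ standard
library as a stand-in for regex++ (the string literal rules are the
same), and the function names is_all_digits and contains_backslash are
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// The source text "\\d+" is seen by the regex compiler as the pattern \d+
// because the C++ compiler has already collapsed each \\ into one backslash.
bool is_all_digits(const std::string& text)
{
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// The source text "\\\\" reaches the regex compiler as \\ which in turn
// matches a single literal backslash in the input.
bool contains_backslash(const std::string& text)
{
    static const std::regex backslash("\\\\");
    return std::regex_search(text, backslash);
}
```

<p>With these definitions, is_all_digits("12345") is true, and
contains_backslash("C:\\temp") is true because the input itself holds
one backslash once the C++ compiler has processed the literal.</p>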
<p><font color="#FF0000">Q. Why don't character ranges work
properly?</font> <br>
A. The POSIX standard specifies that character range expressions
are locale sensitive - so for example the expression [A-Z] will
match any collating element that collates between 'A' and 'Z'.
That means that for most locales other than &quot;C&quot; or
&quot;POSIX&quot;, [A-Z] would match the single character 't' for
example, which is not what most people expect - or at least not
what most people have come to expect from regular expression
engines. For this reason, the default behaviour of regex++ is to
turn locale sensitive collation off by setting the regbase::nocollate
compile time flag (this is set by regbase::normal). However, if
you set a non-default compile time flag, for example regbase::extended
or regbase::basic, then locale dependent collation will be
enabled. This also applies to the POSIX API functions, which use
either regbase::extended or regbase::basic internally; in the
latter case use REG_NOCOLLATE in combination with either
REG_BASIC or REG_EXTENDED when invoking regcomp if you don't want
locale sensitive collation. <i>[Note - when regbase::nocollate is in
effect, the library behaves &quot;as if&quot; the LC_COLLATE
locale category were always &quot;C&quot;, regardless of what it's
actually set to - end note]</i>. </p>
<p><font color="#FF0000">Q. Why can't I use the &quot;convenience&quot;
versions of query_match/reg_search/reg_grep/reg_format/reg_merge?</font>
</p>
<p>A. These versions may or may not be available depending upon
the capabilities of your compiler. The rules determining the
format of these functions are quite complex, and only the
versions visible to a standard compliant compiler are given in
the help. To find out what your compiler supports, run &lt;boost/regex.hpp&gt;
through your C++ pre-processor, and search the output file for
the function that you are interested in. </p>
<p><font color="#FF0000">Q. Why are there no throw specifications
on any of the functions? What exceptions can the library throw?</font>
</p>
<p>A. Not all compilers support (or honor) throw specifications;
others support them but with reduced efficiency. Throw
specifications may be added at a later date as compilers begin to
handle this better. The library should throw only three types of
exception: boost::bad_expression can be thrown by reg_expression
when compiling a regular expression; std::runtime_error can be
thrown when a call to reg_expression::imbue tries to open a
message catalogue that doesn't exist, or when a call to RegEx::GrepFiles
or RegEx::FindFiles tries to open a file that cannot be opened;
and std::bad_alloc can be thrown by just about any of the
functions in this library. </p>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>


@ -1,243 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, Format String Reference</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Format
String Reference.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="format_string"></a>Format String Syntax</h3>
<p>Format strings are used by the algorithms <a
href="template_class_ref.htm#reg_format">regex_format</a> and <a
href="template_class_ref.htm#reg_merge">regex_merge</a>, and are
used to transform one string into another. </p>
<p>There are three kinds of format string: sed, Perl and extended.
The extended syntax is the default, so it is covered first. </p>
<p><b><i>Extended format syntax</i></b> </p>
<p>In format strings, all characters are treated as literals
except: ()$\?: </p>
<p>To use any of these as literals you must prefix them with the
escape character \ </p>
<p>The following special sequences are recognized: <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Grouping:</i> </p>
<p>Use the parenthesis characters ( and ) to group sub-expressions
within the format string; use \( and \) to represent literal '('
and ')'. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Sub-expression expansions:</i> </p>
<p>The following Perl-like expressions expand to a particular
matched sub-expression: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$`</td>
<td valign="top" width="43%">Expands to all the text from
the end of the previous match to the start of the current
match, if there was no previous match in the current
operation, then everything from the start of the input
string to the start of the match.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$'</td>
<td valign="top" width="43%">Expands to all the text from
the end of the match to the end of the input string.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$&amp;</td>
<td valign="top" width="43%">Expands to all of the
current match.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$0</td>
<td valign="top" width="43%">Expands to all of the
current match.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">$N</td>
<td valign="top" width="43%">Expands to the text that
matched sub-expression <i>N</i>.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
</table>
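<p>A minimal sketch of these expansions follows. It is an assumption-laden
illustration rather than regex++ code: it relies on std::regex_replace
from the C++ standard library, whose default format syntax happens to
accept the same $N, $&amp;, $` and $' sequences. The function names are
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Swap the two sides of every "key=value" pair using $1 and $2.
std::string swap_pair(const std::string& text)
{
    static const std::regex pair("(\\w+)=(\\w+)");
    return std::regex_replace(text, pair, "$2=$1");
}

// Wrap every run of digits in brackets using $& (the whole match).
std::string bracket_numbers(const std::string& text)
{
    static const std::regex number("[0-9]+");
    return std::regex_replace(text, number, "[$&]");
}

// Replace the first '-' with the text before it ($`) and the text after it ($').
// Unmatched input is still copied to the output, so the context appears twice.
std::string show_context(const std::string& text)
{
    static const std::regex dash("-");
    return std::regex_replace(text, dash, "<$`|$'>",
                              std::regex_constants::format_first_only);
}
```

<p>For example, swap_pair("key=value") gives "value=key",
bracket_numbers("a1b22") gives "a[1]b[22]", and show_context("abc-def")
gives "abc&lt;abc|def&gt;def".</p>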
<p><br>
&nbsp; </p>
<p><i>Conditional expressions:</i> </p>
<p>Conditional expressions allow two different format strings to
be selected dependent upon whether a sub-expression participated
in the match or not: </p>
<p>?Ntrue_expression:false_expression </p>
<p>Executes true_expression if sub-expression <i>N</i>
participated in the match, otherwise executes false_expression. </p>
<p>Example: suppose we search for &quot;(while)|(for)&quot; then
the format string &quot;?1WHILE:FOR&quot; would output what
matched, but in upper case. <br>
&nbsp; <br>
&nbsp; </p>
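<p>The meaning of &quot;participated in the match&quot; can be shown by writing
the conditional out by hand. The sketch below does not use the ?N syntax
at all; it assumes std::regex from the C++ standard library (which has no
conditional format expressions) and tests the matched flag of
sub-expression 1 directly. The function name upper_keyword is
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hand-written equivalent of applying the format string "?1WHILE:FOR"
// to the first match of "(while)|(for)" in text.
std::string upper_keyword(const std::string& text)
{
    static const std::regex keyword("(while)|(for)");
    std::smatch what;
    if (!std::regex_search(text, what, keyword))
        return std::string();   // no match at all, nothing to format

    // what[1].matched is true only when sub-expression 1 took part in
    // the match; otherwise the second alternative must have matched.
    return what[1].matched ? "WHILE" : "FOR";
}
```

<p>Here upper_keyword("while (x)") yields "WHILE" and
upper_keyword("for (;;)") yields "FOR".</p>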
<p><i>Escape sequences:</i> </p>
<p>The following escape sequences are also allowed: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\a</td>
<td valign="top" width="43%">The bell character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\f</td>
<td valign="top" width="43%">The form feed character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\n</td>
<td valign="top" width="43%">The newline character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\r</td>
<td valign="top" width="43%">The carriage return
character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\t</td>
<td valign="top" width="43%">The tab character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\v</td>
<td valign="top" width="43%">A vertical tab character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\x</td>
<td valign="top" width="43%">A hexadecimal character -
for example \x0D.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\x{}</td>
<td valign="top" width="43%">A possible Unicode
hexadecimal character - for example \x{1A0}</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\cx</td>
<td valign="top" width="43%">The ASCII escape character
x, for example \c@ is equivalent to escape-@.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\e</td>
<td valign="top" width="43%">The ASCII escape character.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="8%">&nbsp;</td>
<td valign="top" width="40%">\dd</td>
<td valign="top" width="43%">An octal character constant,
for example \10.</td>
<td valign="top" width="9%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><b><i>Perl format strings</i></b> </p>
<p>Perl format strings are the same as the default syntax except
that the characters ()?: have no special meaning. </p>
<p><b><i>Sed format strings</i></b> </p>
<p>Sed format strings use only the characters \ and &amp; as
special characters. </p>
<p>\n where n is a digit, is expanded to the nth sub-expression. </p>
<p>&amp; is expanded to the whole of the match (equivalent to \0).
</p>
<p>Other escape sequences are expanded as per the default syntax.
<br>
</p>
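<p>The sed rules can be tried out with the sketch below. As an assumption,
it uses the format_sed flag of std::regex_replace from the C++ standard
library, which applies the same \n and &amp; conventions, rather than
regex++ itself. The function names are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Reorder "year-month-day" into "day/month/year" using \1, \2 and \3.
std::string reorder_date(const std::string& text)
{
    static const std::regex date("([0-9]+)-([0-9]+)-([0-9]+)");
    return std::regex_replace(text, date, "\\3/\\2/\\1",
                              std::regex_constants::format_sed);
}

// Surround every lower-case word with angle brackets using & (the whole match).
std::string mark_words(const std::string& text)
{
    static const std::regex word("[a-z]+");
    return std::regex_replace(text, word, "<&>",
                              std::regex_constants::format_sed);
}
```

<p>For example, reorder_date("2003-08-04") gives "04/08/2003" and
mark_words("ab cd") gives "&lt;ab&gt; &lt;cd&gt;". Note that the
backslashes are doubled in the C++ source for the usual string literal
reasons.</p>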
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>


@ -1,572 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, RegEx Class Reference</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, RegEx Class
Reference. </h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="RegEx"></a><i>Class RegEx</i></h3>
<p>#include &lt;boost/cregex.hpp&gt; </p>
<p>The class RegEx provides a high level simplified interface to
the regular expression library. This class only handles narrow
character strings, and regular expressions always follow the
&quot;normal&quot; syntax - that is, the same as the standard
POSIX extended syntax, but with locale specific collation
disabled, and with escape characters inside character set
declarations allowed. </p>
<pre><b>typedef</b> <b>bool</b> (*GrepCallback)(<b>const</b> RegEx&amp; expression);
<b>typedef</b> <b>bool</b> (*GrepFileCallback)(<b>const</b> <b>char</b>* file, <b>const</b> RegEx&amp; expression);
<b>typedef</b> <b>bool</b> (*FindFilesCallback)(<b>const</b> <b>char</b>* file);
<b>class</b>&nbsp; RegEx
{
<b>public</b>:
&nbsp;&nbsp; RegEx();
&nbsp;&nbsp; RegEx(<b>const</b> RegEx&amp; o);
&nbsp;&nbsp; ~RegEx();
&nbsp;&nbsp; RegEx(<b>const</b> <b>char</b>* c, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; <strong>explicit</strong> RegEx(<b>const</b> std::string&amp; s, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; RegEx&amp; <b>operator</b>=(<b>const</b> RegEx&amp; o);
&nbsp;&nbsp; RegEx&amp; <b>operator</b>=(<b>const</b> <b>char</b>* p);
&nbsp;&nbsp; RegEx&amp; <b>operator</b>=(<b>const</b> std::string&amp; s);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> SetExpression(<b>const</b> std::string&amp; s, <b>bool</b> icase = <b>false</b>);
&nbsp;&nbsp; std::string Expression()<b>const</b>;
&nbsp;&nbsp; <font color="#000080"><i>//
</i>&nbsp;&nbsp;<i>// now matching operators: </i>
&nbsp;&nbsp; <i>// </i></font>
&nbsp;&nbsp; <b>bool</b> Match(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>bool</b> Match(<b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>bool</b> Search(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>bool</b> Search(<b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;<b>unsigned</b> <b>int</b>&gt;&amp; v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Grep(std::vector&lt;<b>unsigned</b> <b>int</b>&gt;&amp; v, <b>const</b> std::string&amp; s, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> std::string&amp; files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> std::string&amp; files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; std::string Merge(<b>const</b> std::string&amp; in, <b>const</b> std::string&amp; fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags = match_default);
&nbsp;&nbsp; std::string Merge(<b>const</b> char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned int </b>flags = match_default);
&nbsp;&nbsp; <b>unsigned</b> Split(std::vector&lt;std::string&gt;&amp; v, std::string&amp; s, <b>unsigned</b> flags = match_default, <b>unsigned</b> max_count = ~0);
&nbsp;&nbsp; <font color="#000080"><i>//
</i>&nbsp;&nbsp; <i>// now operators for returning what matched in more detail:
</i>&nbsp;&nbsp; <i>//
</i></font>&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Position(<b>int</b> i = 0)<b>const</b>;
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Length(<b>int</b> i = 0)<b>const</b>;
&nbsp;&nbsp; <strong>bool</strong> Matched(<strong>int</strong> i = 0)<strong>const</strong>;
&nbsp;&nbsp; <b>unsigned</b> <b>int</b> Line()<b>const</b>;
&nbsp;&nbsp; <b>unsigned int</b> Marks() <b>const</b>;
&nbsp;&nbsp; std::string What(<b>int</b> i)<b>const</b>;
&nbsp;&nbsp; std::string <b>operator</b>[](<b>int</b> i)<b>const</b> ;
&nbsp;&nbsp; <strong>static const unsigned int</strong> npos;
}; &nbsp; &nbsp; </pre>
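<p>The difference between Match (the whole input must match), Search (a
match anywhere will do) and the vector form of Grep (collect every match
and return the count) can be sketched as below. This is not the RegEx
class: it is a stand-in built on std::regex from the C++ standard
library, and the free functions match_whole, search_anywhere and grep_all
are hypothetical names chosen for the illustration.</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// Mirrors the documented behaviour of Match: true only if the
// expression matches the whole of the input string.
bool match_whole(const std::string& expression, const std::string& text)
{
    return std::regex_match(text, std::regex(expression));
}

// Mirrors the documented behaviour of Search: true if the expression
// matches somewhere in the input string.
bool search_anywhere(const std::string& expression, const std::string& text)
{
    return std::regex_search(text, std::regex(expression));
}

// Mirrors the vector<string> form of Grep: pushes a copy of each match
// onto v and returns the number of matches found.
unsigned int grep_all(const std::string& expression, const std::string& text,
                      std::vector<std::string>& v)
{
    const std::regex re(expression);
    unsigned int count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        v.push_back(it->str());
        ++count;
    }
    return count;
}
```

<p>With the expression "[0-9]+", match_whole succeeds on "123" but fails on
"abc123", while search_anywhere succeeds on both; grep_all on "a1b22c333"
returns 3.</p>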
<p>Member functions for class RegEx are defined as follows: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx();</td>
<td valign="top" width="42%">Default constructor,
constructs an instance of RegEx without any valid
expression.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx(<b>const</b>
RegEx&amp; o);</td>
<td valign="top" width="42%">Copy constructor, all the
properties of parameter <i>o</i> are copied.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx(<b>const</b> <b>char</b>*
c, <b>bool</b> icase = <b>false</b>);</td>
<td valign="top" width="42%">Constructs an instance of
RegEx, setting the expression to <i>c</i>, if <i>icase</i>
is <i>true</i> then matching is insensitive to case,
otherwise it is sensitive to case. Throws <i>bad_expression</i>
on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx(<b>const</b> std::string&amp;
s, <b>bool</b> icase = <b>false</b>);</td>
<td valign="top" width="42%">Constructs an instance of
RegEx, setting the expression to <i>s</i>, if <i>icase </i>is
<i>true</i> then matching is insensitive to case,
otherwise it is sensitive to case. Throws <i>bad_expression</i>
on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx&amp; <b>operator</b>=(<b>const</b>
RegEx&amp; o);</td>
<td valign="top" width="42%">Default assignment operator.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx&amp; <b>operator</b>=(<b>const</b>
<b>char</b>* p);</td>
<td valign="top" width="42%">Assignment operator,
equivalent to calling <i>SetExpression(p, false).</i>
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">RegEx&amp; <b>operator</b>=(<b>const</b>
std::string&amp; s);</td>
<td valign="top" width="42%">Assignment operator,
equivalent to calling <i>SetExpression(s, false).</i>
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);</td>
<td valign="top" width="42%">Sets the current expression
to <i>p</i>, if <i>icase</i> is <i>true</i> then matching
is insensitive to case, otherwise it is sensitive to case.
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
SetExpression(<b>const</b> std::string&amp; s, <b>bool</b>
icase = <b>false</b>);</td>
<td valign="top" width="42%">Sets the current expression
to <i>s</i>, if <i>icase</i> is <i>true</i> then matching
is insensitive to case, otherwise it is sensitive to case.
Throws <i>bad_expression</i> on failure.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string Expression()<b>const</b>;</td>
<td valign="top" width="42%">Returns a copy of the
current regular expression.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Attempts to match the
current expression against the text <i>p</i> using the
match flags <i>flags</i> - see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the expression matches the whole
of the input string.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default) ;</td>
<td valign="top" width="42%">Attempts to match the
current expression against the text <i>s</i> using the
match flags <i>flags</i> - see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the expression matches the whole
of the input string.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Attempts to find a match for
the current expression somewhere in the text <i>p</i>
using the match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the match succeeds.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default) ;</td>
<td valign="top" width="42%">Attempts to find a match for
the current expression somewhere in the text <i>s</i>
using the match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
Returns <i>true</i> if the match succeeds.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>p</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match found calls the call-back function <i>cb</i>
as: cb(*this); <p>If at any stage the call-back function
returns false then the grep operation terminates,
otherwise continues until no further matches are found.
Returns the number of matches found.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(GrepCallback cb, <b>const</b> std::string&amp; s, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>s</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match found calls the call-back function <i>cb</i>
as: cb(*this); <p>If at any stage the call-back function
returns false then the grep operation terminates,
otherwise continues until no further matches are found.
Returns the number of matches found. </p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b> <b>char</b>*
p, <b>unsigned</b> <b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>p</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes a copy of what matched onto <i>v</i>.
Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;std::string&gt;&amp; v, <b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>s</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes a copy of what matched onto <i>v</i>.
Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;<b>unsigned int</b>&gt;&amp; v, <b>const</b>
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>p</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes the starting index of what matched
onto <i>v</i>. Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Grep(std::vector&lt;<b>unsigned int</b>&gt;&amp; v, <b>const</b>
std::string&amp; s, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the text <i>s</i> using the match
flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match pushes the starting index of what matched
onto <i>v</i>. Returns the number of matches found.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>*
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the files <i>files</i> using the
match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match calls the call-back function cb.&nbsp; <p>If
the call-back returns false then the algorithm returns
without considering further matches in the current file,
or any further files.&nbsp; </p>
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names.&nbsp; </p>
<p>Returns the total number of matches found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
GrepFiles(GrepFileCallback cb, <b>const</b> std::string&amp;
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Finds all matches of the
current expression in the files <i>files</i> using the
match flags <i>flags </i>- see <a
href="template_class_ref.htm#match_type">match flags</a>.
For each match calls the call-back function cb.&nbsp; <p>If
the call-back returns false then the algorithm returns
without considering further matches in the current file,
or any further files.&nbsp; </p>
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names.&nbsp; </p>
<p>Returns the total number of matches found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>*
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Searches <i>files</i> to
find all those which contain at least one match of the
current expression using the match flags <i>flags </i>-
see <a href="template_class_ref.htm#match_type">match
flags</a>. For each matching file calls the call-back
function cb.&nbsp; <p>If the call-back returns false then
the algorithm returns without considering any further
files.&nbsp; </p>
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names.&nbsp; </p>
<p>Returns the total number of files found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
FindFiles(FindFilesCallback cb, <b>const</b> std::string&amp;
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
<b>int</b> flags = match_default);</td>
<td valign="top" width="42%">Searches <i>files</i> to
find all those which contain at least one match of the
current expression using the match flags <i>flags </i>-
see <a href="template_class_ref.htm#match_type">match
flags</a>. For each matching file calls the call-back
function cb.&nbsp; <p>If the call-back returns false then
the algorithm returns without considering any further
files.&nbsp; </p>
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names.&nbsp; </p>
<p>Returns the total number of files found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string Merge(<b>const</b>
std::string&amp; in, <b>const</b> std::string&amp; fmt, <b>bool</b>
copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags =
match_default);</td>
<td valign="top" width="42%">Performs a search and
replace operation: searches through the string <i>in</i>
for all occurrences of the current expression, for each
occurrence replaces the match with the format string <i>fmt</i>.
Uses <i>flags</i> to determine what gets matched, and how
the format string should be treated. If <i>copy</i> is
true then all unmatched sections of input are copied
unchanged to output. If the flag <em>format_first_only</em>
is set then only the first occurrence of the pattern found
is replaced. Returns the new string. See also <a
href="format_string.htm#format_string">format string
syntax</a>, <a href="template_class_ref.htm#match_type">match
flags</a> and <a
href="template_class_ref.htm#format_flags">format flags</a>.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string Merge(<b>const</b>
char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>,
<b>unsigned int </b>flags = match_default);</td>
<td valign="top" width="42%">Performs a search and
replace operation: searches through the string <i>in</i>
for all occurrences of the current expression, for each
occurrence replaces the match with the format string <i>fmt</i>.
Uses <i>flags</i> to determine what gets matched, and how
the format string should be treated. If <i>copy</i> is
true then all unmatched sections of input are copied
unchanged to output. If the flag <em>format_first_only</em>
is set then only the first occurrence of the pattern found
is replaced. Returns the new string. See also <a
href="format_string.htm#format_string">format string
syntax</a>, <a href="template_class_ref.htm#match_type">match
flags</a> and <a
href="template_class_ref.htm#format_flags">format flags</a>.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top"><b>unsigned</b> Split(std::vector&lt;std::string&gt;&amp;
v, std::string&amp; s, <b>unsigned</b> flags =
match_default, <b>unsigned</b> max_count = ~0);</td>
<td valign="top">Splits the input string and pushes each
resulting string onto the vector. If the expression contains no marked
sub-expressions, then one string is output for each
section of the input that does not match the expression.
If the expression does contain marked sub-expressions,
then one string is output for each marked sub-expression
each time a match occurs. Outputs no more than <i>max_count
</i>strings. Before returning, deletes from the input
string <i>s</i> all of the input that has been processed
(all of the string if <i>max_count</i> was not reached).
Returns the number of strings pushed onto the vector.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Position(<b>int</b> i = 0)<b>const</b>;</td>
<td valign="top" width="42%">Returns the position of what
matched sub-expression <i>i</i>. If <i>i = 0</i> then
returns the position of the whole match. Returns RegEx::npos
if the supplied index is invalid, or if the specified sub-expression
did not participate in the match.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Length(<b>int</b> i = 0)<b>const</b>;</td>
<td valign="top" width="42%">Returns the length of what
matched sub-expression <i>i</i>. If <i>i = 0</i> then
returns the length of the whole match. Returns RegEx::npos
if the supplied index is invalid, or if the specified sub-expression
did not participate in the match.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td><strong>bool</strong> Matched(<strong>int</strong> i
= 0)<strong>const</strong>;</td>
<td>Returns true if sub-expression <em>i</em> was
matched, false otherwise.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
Line()<b>const</b>;</td>
<td valign="top" width="42%">Returns the line on which
the match occurred; line numbers start from 1, not zero. If no
match occurred then returns RegEx::npos.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%"><b>unsigned int</b> Marks()
const;</td>
<td valign="top" width="42%">Returns the number of marked
sub-expressions contained in the expression. Note that
this includes the whole match (sub-expression zero), so
the value returned is always &gt;= 1.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string What(<b>int</b>
i)<b>const</b>;</td>
<td valign="top" width="42%">Returns a copy of what
matched sub-expression <i>i</i>. If <i>i = 0</i> then
returns a copy of the whole match. Returns a null string
if the index is invalid or if the specified sub-expression
did not participate in a match.</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
<tr>
<td valign="top" width="7%">&nbsp;</td>
<td valign="top" width="43%">std::string <b>operator</b>[](<b>int</b>
i)<b>const</b> ;</td>
<td valign="top" width="42%">Returns <i>What(i);</i> <p>Can
be used to simplify access to sub-expression matches, and
make usage more perl-like.</p>
</td>
<td valign="top" width="7%">&nbsp;</td>
</tr>
</table>
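<p>The vector-filling Grep overloads described in the table can be
illustrated with a short sketch. This is not the RegEx class itself:
it uses the standard library's std::regex as a stand-in, and the
helper names grep_strings and grep_positions are hypothetical, chosen
only to mirror the two kinds of overload (copies of what matched, and
starting indexes of what matched).</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for Grep(std::vector<std::string>&, ...):
// pushes a copy of each match onto v and returns the number of matches.
unsigned int grep_strings(const std::regex& e, const std::string& s,
                          std::vector<std::string>& v)
{
    unsigned int count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        v.push_back(it->str());
        ++count;
    }
    return count;
}

// Hypothetical stand-in for Grep(std::vector<unsigned int>&, ...):
// pushes the starting index of each match onto v and returns the count.
unsigned int grep_positions(const std::regex& e, const std::string& s,
                            std::vector<unsigned int>& v)
{
    unsigned int count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        v.push_back(static_cast<unsigned int>(it->position()));
        ++count;
    }
    return count;
}
```

<p>Both helpers return the number of matches found, as the table
describes for Grep; match flags are omitted from the sketch for
brevity.</p>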
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>


@ -125,8 +125,10 @@
// If there isn't good enough wide character support then there will
// be no wide character regular expressions:
//
#if (defined(BOOST_NO_CWCHAR) || defined(BOOST_NO_CWCTYPE) || defined(BOOST_NO_STD_WSTRING)) && !defined(BOOST_NO_WREGEX)
# define BOOST_NO_WREGEX
#if (defined(BOOST_NO_CWCHAR) || defined(BOOST_NO_CWCTYPE) || defined(BOOST_NO_STD_WSTRING))
# if !defined(BOOST_NO_WREGEX)
# define BOOST_NO_WREGEX
# endif
#else
# if defined(__sgi) && defined(__SGI_STL_PORT)
// STLPort on IRIX is misconfigured: <cwctype> does not compile
@ -645,3 +647,4 @@ inline void pointer_construct(T* p, const T& t)


@ -1990,6 +1990,8 @@ unsigned int BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::probe_re
{
case re_detail::syntax_element_startmark:
case re_detail::syntax_element_endmark:
if(static_cast<const re_detail::re_brace*>(dat)->index == -2)
return regbase::restart_any;
return probe_restart(dat->next.p);
case re_detail::syntax_element_start_line:
return regbase::restart_line;
@ -2018,7 +2020,7 @@ unsigned int BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::fixup_le
if((leading_lit) && (static_cast<re_detail::re_literal*>(dat)->length > 2))
{
// we can do a literal search for the leading literal string
// using Knuth-Morris-Pratt (or whatever), and only then check for
// using Knuth-Morris-Pratt (or whatever), and only then check for
// matches. We need a decent length string though to make it
// worth while.
_leading_string = reinterpret_cast<charT*>(reinterpret_cast<char*>(dat) + sizeof(re_detail::re_literal));
@ -2066,10 +2068,14 @@ unsigned int BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::fixup_le
case re_detail::syntax_element_rep:
if((len == 0) && (1 == fixup_leading_rep(dat->next.p, static_cast<re_detail::re_repeat*>(dat)->alt.p) ))
{
static_cast<re_detail::re_repeat*>(dat)->leading = true;
static_cast<re_detail::re_repeat*>(dat)->leading = leading_lit;
return len;
}
return len;
case re_detail::syntax_element_startmark:
if(static_cast<const re_detail::re_brace*>(dat)->index == -2)
return 0;
// fall through:
default:
break;
}
@ -2115,3 +2121,4 @@ void BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::fail(unsigned in


@ -56,8 +56,10 @@ inline int string_compare(const std::basic_string<C,T,A>& s, const C* p)
{ return s.compare(p); }
inline int string_compare(const std::string& s, const char* p)
{ return std::strcmp(s.c_str(), p); }
# ifndef BOOST_NO_WREGEX
inline int string_compare(const std::wstring& s, const wchar_t* p)
{ return std::wcscmp(s.c_str(), p); }
# endif
# define STR_COMP(s,p) string_compare(s,p)
#endif
@ -753,6 +755,15 @@ bool query_match_aux(iterator first,
start_loop[cur_acc] = first;
continue;
}
else if((unsigned int)accumulators[cur_acc] < static_cast<const re_repeat*>(ptr)->min)
{
// the repeat was null, and we haven't gone round min times yet,
// since all subsequent repeats will be null as well, just update
// our repeat count and skip out.
accumulators[cur_acc] = static_cast<const re_repeat*>(ptr)->min;
ptr = static_cast<const re_repeat*>(ptr)->alt.p;
continue;
}
goto failure;
}
// see if we can skip the repeat:
@ -809,6 +820,15 @@ bool query_match_aux(iterator first,
start_loop[cur_acc] = first;
continue;
}
else if((first == start_loop[cur_acc]) && accumulators[cur_acc] && ((unsigned int)accumulators[cur_acc] < static_cast<const re_repeat*>(ptr)->min))
{
// the repeat was null, and we haven't gone round min times yet,
// since all subsequent repeats will be null as well, just update
// our repeat count and skip out.
accumulators[cur_acc] = static_cast<const re_repeat*>(ptr)->min;
ptr = static_cast<const re_repeat*>(ptr)->alt.p;
continue;
}
// if we get here then neither option is allowed so fail:
goto failure;
@ -826,7 +846,7 @@ bool query_match_aux(iterator first,
if(flags & match_not_eob)
goto failure;
iterator p(first);
while((p != last) && traits_inst.is_separator(traits_inst.translate(*first, icase)))++p;
while((p != last) && traits_inst.is_separator(traits_inst.translate(*p, icase)))++p;
if(p != last)
goto failure;
ptr = ptr->next.p;
@ -958,6 +978,12 @@ bool query_match_aux(iterator first,
goto failure;
ptr = ptr->next.p;
continue;
case syntax_element_backref:
if(temp_match[static_cast<const re_brace*>(ptr)->index].first
!= temp_match[static_cast<const re_brace*>(ptr)->index].second)
goto failure;
ptr = ptr->next.p;
continue;
default:
goto failure;
}

150
index.htm

@ -1,150 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="keywords"
content="regex++, regular expressions, regular expression library, C++">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>regex++, Index</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="277" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Index.</h3>
<p align="left"><i>(Version 3.31, 16th Dec 2001)</i>&nbsp;
</p>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3 align="center">Contents</h3>
<ul>
<li><a href="introduction.htm#intro">Introduction</a></li>
<li><a href="introduction.htm#Installation">Installation and
Configuration</a> </li>
<li><a href="template_class_ref.htm#regbase">Template Class
and Algorithm Reference</a> <ul>
<li>Class <a href="template_class_ref.htm#regbase">regbase</a></li>
<li>Class <a
href="template_class_ref.htm#bad_expression">bad_expression</a>
</li>
<li>Class <a
href="template_class_ref.htm#reg_expression">reg_expression</a>
</li>
<li>Class <a
href="template_class_ref.htm#regex_char_traits">char_regex_traits</a></li>
<li>Class <a href="template_class_ref.htm#reg_match">match_results</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#query_match">regex_match</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#reg_search">regex_search</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#reg_grep">regex_grep</a></li>
<li>Algorithm <a
href="template_class_ref.htm#reg_format">regex_format</a>
</li>
<li>Algorithm <a
href="template_class_ref.htm#reg_merge">regex_merge</a></li>
<li>Algorithm <a
href="template_class_ref.htm#regex_split">regex_split</a>
</li>
<li><a href="template_class_ref.htm#partial_matches">Partial
regular expression matches</a></li>
</ul>
</li>
<li>Class <a href="hl_ref.htm#RegEx">RegEx</a> reference</li>
<li><a href="posix_ref.htm#posix">POSIX Compatibility
Functions</a></li>
<li><a href="syntax.htm#syntax">Regular Expression Syntax</a></li>
<li><a href="format_string.htm#format_string">Format String
Syntax</a></li>
<li><a href="appendix.htm#implementation">Appendices</a> <ul>
<li><a href="appendix.htm#implementation">Implementation
notes</a></li>
<li><a href="appendix.htm#threads">Thread safety</a></li>
<li><a href="appendix.htm#localisation">Localization</a></li>
<li><a href="appendix.htm#demos">Example Applications</a>
<ul>
<li><a
href="example/snippets/regex_match_example.cpp">regex_match_example.cpp</a>:
ftp based regex_match example.</li>
<li><a
href="example/snippets/regex_search_example.cpp">regex_search_example.cpp</a>:
regex_search example: searches a cpp file
for class definitions.</li>
<li><a
href="example/snippets/regex_grep_example_1.cpp">regex_grep_example_1.cpp</a>:
regex_grep example 1: searches a cpp file
for class definitions.</li>
<li><a
href="example/snippets/regex_merge_example.cpp">regex_merge_example.cpp</a>:
regex_merge example: converts a C++ file
to syntax highlighted HTML.</li>
<li><a
href="example/snippets/regex_grep_example_2.cpp">regex_grep_example_2.cpp</a>:
regex_grep example 2: searches a cpp file
for class definitions, using a global
callback function. </li>
<li><a
href="example/snippets/regex_grep_example_3.cpp">regex_grep_example_3.cpp</a>:
regex_grep example 3: searches a cpp file
for class definitions, using a bound
member function callback.</li>
<li><a
href="example/snippets/regex_grep_example_4.cpp">regex_grep_example_4.cpp</a>:
regex_grep example 4: searches a cpp file
for class definitions, using a C++
Builder closure as a callback.</li>
<li><a
href="example/snippets/regex_split_example_1.cpp">regex_split_example_1.cpp</a>:
regex_split example: split a string into
tokens.</li>
<li><a
href="example/snippets/regex_split_example_2.cpp">regex_split_example_2.cpp</a>:
regex_split example: spit out linked
URLs.</li>
</ul>
</li>
<li><a href="appendix.htm#headers">Header Files.</a></li>
<li><a href="appendix.htm#redist">Redistributables</a></li>
<li><a href="appendix.htm#upgrade">Note for upgraders</a></li>
</ul>
</li>
<li><a href="appendix.htm#furtherInfo">Further Information (Contacts
and Acknowledgements)</a></li>
<li><a href="faq.htm">FAQ</a></li>
</ul>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
</body>
</html>


@ -1,476 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="keywords"
content="regex++, regular expressions, regular expression library, C++">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>regex++, Introduction</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Introduction.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="intro"></a><i>Introduction</i></h3>
<p>Regular expressions are a form of pattern-matching that is
often used in text processing; many users will be familiar with
the Unix utilities <i>grep</i>, <i>sed</i> and <i>awk</i>, and
the programming language <i>perl</i>, each of which makes
extensive use of regular expressions. Traditionally C++ users
have been limited to the POSIX C APIs for manipulating regular
expressions, and while regex++ does provide these APIs, they do
not represent the best way to use the library. For example regex++
can cope with wide character strings, or search and replace
operations (in a manner analogous to either sed or perl),
something that traditional C libraries cannot do.</p>
<p>The class <a href="template_class_ref.htm#reg_expression">boost::reg_expression</a>
is the key class in this library; it represents a &quot;machine
readable&quot; regular expression, and is very closely modelled
on std::basic_string: think of it as a string plus the actual
state-machine required by the regular expression algorithms. As with
std::basic_string, there are two typedefs that are almost always
the means by which this class is referenced:</p>
<pre><b>namespace </b>boost{
<b>template</b> &lt;<b>class</b> charT,
<b> class</b> traits = regex_traits&lt;charT&gt;,
<b>class</b> Allocator = std::allocator&lt;charT&gt; &gt;
<b>class</b> reg_expression;
<b>typedef</b> reg_expression&lt;<b>char</b>&gt; regex;
<b>typedef</b> reg_expression&lt;<b>wchar_t&gt;</b> wregex;
}</pre>
<p>To see how this library can be used, imagine that we are
writing a credit card processing application. Credit card numbers
generally come as a string of 16 digits, separated into groups of
4 digits by either a space or a hyphen. Before
storing a credit card number in a database (not necessarily
something your customers will appreciate!), we may want to verify
that the number is in the correct format. To match any digit we
could use the regular expression [0-9]; however, ranges of
characters like this are actually locale dependent. Instead we
should use the POSIX standard form [[:digit:]], or the regex++
and perl shorthand for this, \d (note that many older libraries
tended to be hard-coded to the C-locale, so this was
not an issue for them). That leaves us with the following regular
expression to validate credit card number formats:</p>
<p>(\d{4}[- ]){3}\d{4}</p>
<p>Here the parentheses act to group (and mark for future
reference) sub-expressions, and the {4} means &quot;repeat
exactly 4 times&quot;. This is an example of the extended regular
expression syntax used by perl, awk and egrep. Regex++ also
supports the older &quot;basic&quot; syntax used by sed and grep,
but this is generally less useful, unless you already have some
basic regular expressions that you need to reuse.</p>
<p>Now let's take that expression and place it in some C++ code to
validate the format of a credit card number:</p>
<pre><b>bool</b> validate_card_format(<b>const</b> std::string&amp; s)
{
<b>static</b> <b>const</b> <a
href="template_class_ref.htm#reg_expression">boost::regex</a> e(&quot;(\\d{4}[- ]){3}\\d{4}&quot;);
<b>return</b> <a href="template_class_ref.htm#query_match">regex_match</a>(s, e);
}</pre>
<p>Note how we had to add some extra escapes to the expression:
remember that the escape is seen once by the C++ compiler, before
it gets to be seen by the regular expression engine, so
escapes in regular expressions have to be doubled up when
embedding them in C/C++ code. Also note that all the examples
assume that your compiler supports Koenig lookup; if yours
doesn't (for example VC6), then you will have to add some boost::
prefixes to some of the function calls in the examples.</p>
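<p>The doubling of escapes can be seen side by side in the following
sketch. It is only an illustration under stated assumptions: it uses
std::regex from the standard library rather than boost::regex, the
function names are hypothetical, and the second function relies on
raw string literals, a later C++ language feature that may not be
available to every compiler this library supports.</p>

```cpp
#include <regex>
#include <string>

// Ordinary string literal: every backslash meant for the regular
// expression engine has to be written twice.
bool validate_card_format_escaped(const std::string& s)
{
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// Raw string literal (assumes a C++11 or later compiler): the
// expression is written exactly as the regex engine sees it.
bool validate_card_format_raw(const std::string& s)
{
    static const std::regex e(R"((\d{4}[- ]){3}\d{4})");
    return std::regex_match(s, e);
}
```

<p>Both functions compile the same expression, so they accept and
reject exactly the same inputs.</p>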
<p>Those of you who are familiar with credit card processing
will have realised that while the format used above is suitable
for human readable card numbers, it does not represent the format
required by online credit card systems; these require the number
as a string of 16 (or possibly 15) digits, without any
intervening spaces. What we need is a means to convert easily
between the two formats, and this is where search and replace
comes in. Those who are familiar with the utilities <i>sed</i>
and <i>perl</i> will already be ahead here; we need two strings -
one a regular expression - the other a &quot;<a
href="format_string.htm">format string</a>&quot; that provides a
description of the text to replace the match with. In regex++
this search and replace operation is performed with the algorithm
regex_merge; for our credit card example we can write two
algorithms like this to provide the format conversions:</p>
<pre>
<i>// match any format with the regular expression:
</i><b>const</b> boost::regex e(&quot;\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z&quot;);
<b>const</b> std::string machine_format(&quot;\\1\\2\\3\\4&quot;);
<b>const</b> std::string human_format(&quot;\\1-\\2-\\3-\\4&quot;);
std::string machine_readable_card_number(<b>const</b> std::string&amp; s)
{
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, machine_format, boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(<b>const</b> std::string&amp; s)
{
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, human_format, boost::match_default | boost::format_sed);
}</pre>
<p>Here we've used marked sub-expressions in the regular
expression to split out the four parts of the card number as
separate fields; the format string then uses the sed-like syntax
to replace the matched text with the reformatted version.</p>
<p>In the examples above, we haven't directly manipulated the
results of a regular expression match; in general, however, the
result of a match contains a number of sub-expression matches in
addition to the overall match. When the library needs to report a
regular expression match it does so using an instance of the
class <a href="template_class_ref.htm#reg_match">match_results</a>;
as before there are typedefs of this class for the most common
cases: </p>
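<p>For readers without the library to hand, the same two conversions
can be approximated with the standard library. This is a sketch under
assumptions, not the regex++ code above: it uses std::regex_replace
in place of regex_merge, the default ECMAScript-style $N placeholders
in place of the sed-like \N syntax, and ^ and $ in place of \A and \z.
The name card_re is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Matches a card number with optional space or hyphen separators and
// captures its four groups of digits.
const std::regex card_re(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// Joins the four captured groups with no separators.
std::string machine_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_re, "$1$2$3$4");
}

// Joins the four captured groups with hyphens.
std::string human_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_re, "$1-$2-$3-$4");
}
```

<p>Input that does not match the expression at all is returned
unchanged, because unmatched text is copied through by default.</p>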
<pre><b>namespace </b>boost{
<b>typedef</b> match_results&lt;<b>const</b> <b>char</b>*&gt; cmatch;
<b>typedef</b> match_results&lt;<b>const</b> <b>wchar_t</b>*&gt; wcmatch;
<strong>typedef</strong> match_results&lt;std::string::const_iterator&gt; smatch;
<strong>typedef</strong> match_results&lt;std::wstring::const_iterator&gt; wsmatch;
}</pre>
<p>The algorithms <a href="template_class_ref.htm#reg_search">regex_search</a>
and <a href="template_class_ref.htm#reg_grep">regex_grep</a> (i.e.
finding all matches in a string) make use of match_results to
report what matched.</p>
<p>Note that these algorithms are not restricted to searching
regular C-strings: any bidirectional iterator type can be
searched, allowing for the possibility of seamlessly searching
almost any kind of data. </p>
<p>For search and replace operations in addition to the algorithm
<a href="template_class_ref.htm#reg_merge">regex_merge</a> that
we have already seen, the algorithm <a
href="template_class_ref.htm#reg_format">regex_format</a> takes
the result of a match and a format string, and produces a new
string by merging the two.</p>
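<p>The idea of merging a match result with a format string can be
sketched as follows. This is an assumption-laden illustration rather
than regex_format itself: it uses the format member function of
std::smatch from the standard library, with ECMAScript-style $N
placeholders, and the function name reformat_date is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Finds the first yyyy-mm-dd date in s and returns it rewritten as
// dd/mm/yyyy; returns an empty string if no date is found.
std::string reformat_date(const std::string& s)
{
    static const std::regex e("(\\d{4})-(\\d{2})-(\\d{2})");
    std::smatch m;
    if (!std::regex_search(s, m, e))
        return std::string();
    // Only the formatted match is produced; text around the match
    // is not copied to the result.
    return m.format("$3/$2/$1");
}
```

<p>Unlike a search and replace over the whole input, this produces
only the text generated from the single match and the format
string.</p>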
<p>For those that dislike templates, there is a high level
wrapper class RegEx that is an encapsulation of the lower level
template code - it provides a simplified interface for those that
don't need the full power of the library, and supports only
narrow characters, and the &quot;extended&quot; regular
expression syntax. </p>
<p>The <a href="posix_ref.htm#posix">POSIX API</a> functions:
regcomp, regexec, regfree and regerror, are available in both
narrow character and Unicode versions, and are provided for those
who need compatibility with these APIs. </p>
<p>Finally, note that the library now has run-time <a
href="appendix.htm#localisation">localization</a> support, and
recognizes the full POSIX regular expression syntax - including
advanced features like multi-character collating elements and
equivalence classes - as well as providing compatibility with
other regular expression libraries including GNU and BSD4 regex
packages, and to a more limited extent perl 5. </p>
<h3><a name="Installation"></a><i>Installation and Configuration
Options</i> </h3>
<p><em>[ </em><strong><i>Important</i></strong><em>: If you are
upgrading from the 2.x version of this library then you will find
a number of changes to the documented header names and library
interfaces; existing code should still compile unchanged, however
- see </em><a href="appendix.htm#upgrade"><font color="#0000FF"><em>Note
for Upgraders</em></font></a><em>. ]</em></p>
<p>When you extract the library from its zip file, you must
preserve its internal directory structure (for example by using
the -d option when extracting). If you didn't do that when
extracting, then you'd better stop reading this, delete the files
you just extracted, and try again! </p>
<p>This library should not need configuring before use; most
popular compilers/standard libraries/platforms are already
supported &quot;as is&quot;. If you do experience configuration
problems, or just want to test the configuration with your
compiler, then the process is the same as for all of boost; see
the <a href="../config/config.htm">configuration library
documentation</a>.</p>
<p>The library will encase all code inside namespace boost. </p>
<p>Unlike some other template libraries, this library consists of
a mixture of template code (in the headers) and static code and
data (in cpp files). Consequently it is necessary to build the
library's support code into a library or archive file before you
can use it; instructions for specific platforms are as follows: </p>
<p><b>Borland C++ Builder:</b> </p>
<ul>
<li>Open up a console window and change to the
&lt;boost&gt;\libs\regex\build directory. </li>
<li>Select the appropriate makefile (bcb4.mak for C++ Builder
4, bcb5.mak for C++ Builder 5, and bcb6.mak for C++
Builder 6). </li>
<li>Invoke the makefile (pass the full path to your version
of make if you have more than one version installed, the
makefile relies on the path to make to obtain your C++
Builder installation directory and tools) for example: </li>
</ul>
<pre>make -fbcb5.mak</pre>
<p>The build process will build a variety of .lib and .dll files
(the exact number depends upon the version of Borland's tools you
are using); the .lib and .dll files will be in a sub-directory
called bcb4 or bcb5 depending upon the makefile used. To install
the libraries into your development system use:</p>
<p>make -fbcb5.mak install</p>
<p>Library files will be copied to &lt;BCROOT&gt;/lib and the
dlls to &lt;BCROOT&gt;/bin, where &lt;BCROOT&gt; corresponds to
the install path of your Borland C++ tools. </p>
<p>You may also remove temporary files created during the build
process (excluding lib and dll files) by using:</p>
<p>make -fbcb5.mak clean</p>
<p>Finally when you use regex++ it is only necessary for you to
add the &lt;boost&gt; root directory to your list of include
directories for that project. It is not necessary for you to
manually add a .lib file to the project; the headers will
automatically select the correct .lib file for your build mode
and tell the linker to include it. There is one caveat however:
the library can not tell the difference between VCL and non-VCL
enabled builds when building a GUI application from the command
line. If you build from the command line with the 5.5 command
line tools then you must define the pre-processor symbol _NO_VCL
in order to ensure that the correct link libraries are selected;
the C++ Builder IDE normally sets this automatically. Hint: users
of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg
in order to set this option permanently. </p>
<p>If you would prefer to do a static link to the regex libraries
even when using the dll runtime then define
BOOST_REGEX_STATIC_LINK, and if you want to suppress automatic
linking altogether (and supply your own custom build of the lib)
then define BOOST_REGEX_NO_LIB.</p>
<p>If you are building with C++ Builder 6, you will find that
&lt;boost/regex.hpp&gt; can not be used in a pre-compiled header
(the actual problem is in &lt;locale&gt; which gets included by
&lt;boost/regex.hpp&gt;). If this causes problems for you, then
try defining BOOST_NO_STD_LOCALE when building; this will disable
some features throughout boost, but may save you a lot in compile
times!</p>
<p><b>Microsoft Visual C++ 6</b><strong> and 7</strong></p>
<p>You need version 6 of MSVC to build this library. If you are
using VC5 then you may want to look at one of the previous
releases of this <a
href="http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm">library</a>.
</p>
<p>Open up a command prompt, which has the necessary MSVC
environment variables defined (for example by using the batch
file Vcvars32.bat installed by the Visual Studio installation),
and change to the &lt;boost&gt;\libs\regex\build directory. </p>
<p>Select the correct makefile - vc6.mak for &quot;vanilla&quot;
Visual C++ 6 or vc6-stlport.mak if you are using STLPort.</p>
<p>Invoke the makefile like this:</p>
<p>nmake -fvc6.mak</p>
<p>You will now have a collection of lib and dll files in a
&quot;vc6&quot; subdirectory, to install these into your
development system use:</p>
<p>nmake -fvc6.mak install</p>
<p>The lib files will be copied to your &lt;VC6&gt;\lib directory
and the dll files to &lt;VC6&gt;\bin, where &lt;VC6&gt; is the
root of your Visual C++ 6 installation.</p>
<p>You can delete all the temporary files created during the
build (excluding lib and dll files) using:</p>
<p>nmake -fvc6.mak clean </p>
<p>Finally when you use regex++ it is only necessary for you to
add the &lt;boost&gt; root directory to your list of include
directories for that project. It is not necessary for you to
manually add a .lib file to the project; the headers will
automatically select the correct .lib file for your build mode
and tell the linker to include it. </p>
<p>Note that if you want to statically link to the regex library
when using the dynamic C++ runtime, define
BOOST_REGEX_STATIC_LINK when building your project (this only has
an effect for release builds). If you want to add the source
directly to your project then define BOOST_REGEX_NO_LIB to
disable automatic library selection.</p>
<p><strong><i>Important</i></strong><em>: there have been some
reports of compiler-optimisation bugs affecting this library (particularly
with VC6 versions prior to service pack 5). The workaround is to
build the library using /Oityb1 rather than /O2, that is, to use
all optimisation settings except /Oa. This problem is reported to
affect some standard library code as well (in fact I'm not sure
if the problem is with the regex code or the underlying standard
library), so it's probably worthwhile applying this workaround in
normal practice in any case.</em></p>
<p>Note: if you have replaced the C++ standard library that comes
with VC6, then when you build the library you must ensure that
the environment variables &quot;INCLUDE&quot; and &quot;LIB&quot;
have been updated to reflect the include and library paths for
the new library - see vcvars32.bat (part of your Visual Studio
installation) for more details. Alternatively if STLPort is in c:/stlport
then you could use:</p>
<p>nmake INCLUDES=&quot;-Ic:/stlport/stlport&quot; XLFLAGS=&quot;/LIBPATH:c:/stlport/lib&quot;
-fvc6-stlport.mak</p>
<p>If you are building with the full STLPort v4.x, then use the
vc6-stlport.mak file provided and set the environment variable
STLPORT_PATH to point to the location of your STLport
installation (Note that the full STLPort libraries appear not to
support single-thread static builds). <br>
&nbsp; <br>
&nbsp; </p>
<p><b>GCC(2.95)</b> </p>
<p>There is a conservative makefile for the g++ compiler. From
the command prompt change to the &lt;boost&gt;/libs/regex/build
directory and type: </p>
<p>make -fgcc.mak </p>
<p>At the end of the build process you should have a gcc sub-directory
containing release and debug versions of the library (libboost_regex.a
and libboost_regex_debug.a). When you build projects that use
regex++, you will need to add the boost install directory to your
list of include paths and add &lt;boost&gt;/libs/regex/build/gcc/libboost_regex.a
to your list of library files. </p>
<p>There is also a makefile to build the library as a shared
library:</p>
<p>make -fgcc-shared.mak</p>
<p>which will build libboost_regex.so and libboost_regex_debug.so.</p>
<p>Both of these makefiles support the following environment
variables:</p>
<p>CXXFLAGS: extra compiler options - note that this applies to
both the debug and release builds.</p>
<p>INCLUDES: additional include directories.</p>
<p>LDFLAGS: additional linker options.</p>
<p>LIBS: additional library files.</p>
<p>For the more adventurous there is a configure script in
&lt;boost&gt;/libs/config; see the <a href="../config/config.htm">config
library documentation</a>.</p>
<p><b>Sun Workshop 6.1</b></p>
<p>There is a makefile for the Sun (6.1) compiler (C++ version 3.12).
From the command prompt change to the &lt;boost&gt;/libs/regex/build
directory and type: </p>
<p>dmake -f sunpro.mak </p>
<p>At the end of the build process you should have a sunpro sub-directory
containing single and multithread versions of the library (libboost_regex.a,
libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so).
When you build projects that use regex++, you will need to add
the boost install directory to your list of include paths and add
&lt;boost&gt;/libs/regex/build/sunpro/ to your library search
path. </p>
<p>This makefile supports the following environment
variables:</p>
<p>CXXFLAGS: extra compiler options - note that this applies to
both the single and multithreaded builds.</p>
<p>INCLUDES: additional include directories.</p>
<p>LDFLAGS: additional linker options.</p>
<p>LIBS: additional library files.</p>
<p>LIBSUFFIX: a suffix to mangle the library name with (defaults
to nothing).</p>
<p>This makefile does not set any architecture specific options
like -xarch=v9; you can set these by defining the appropriate
macros, for example:</p>
<p>dmake CXXFLAGS=&quot;-xarch=v9&quot; LDFLAGS=&quot;-xarch=v9&quot;
LIBSUFFIX=&quot;_v9&quot; -f sunpro.mak</p>
<p>will build v9 variants of the regex library named
libboost_regex_v9.a etc.</p>
<p><b>Other compilers:</b> </p>
<p>There is a generic makefile (<a href="build/generic.mak">generic.mak</a>)
provided in &lt;boost-root&gt;/libs/regex/build - see that
makefile for details of environment variables that need to be set
before use. Alternatively you can use the <a
href="../../tools/build/index.html">Jam based build system</a>.
If you need to configure the library for your platform, then
refer to the <a href="../config/config.htm">config library
documentation</a>.</p>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
</body>
</html>

View File

@ -1,314 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, POSIX API Reference</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, POSIX API
Reference. </h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="posix"></a><i>POSIX compatibility library</i></h3>
<pre>#include &lt;boost/cregex.hpp&gt;
<i>or</i>:
#include &lt;boost/regex.h&gt;</pre>
<p>The following functions are available for users who need a
POSIX compatible C library. They are available in both Unicode
and narrow character versions; the standard POSIX API names are
macros that expand to one version or the other depending upon
whether UNICODE is defined or not. </p>
<p><b>Important</b>: Note that all the symbols defined here are
enclosed inside namespace <i>boost</i> when used in C++ programs,
unless you use #include &lt;boost/regex.h&gt; instead - in which
case the symbols are still defined in namespace boost, but are
made available in the global namespace as well.</p>
<p>The functions are defined as: </p>
<pre>extern &quot;C&quot; {
<b>int</b> regcompA(regex_tA*, <b>const</b> <b>char</b>*, <b>int</b>);
<b>unsigned</b> <b>int</b> regerrorA(<b>int</b>, <b>const</b> regex_tA*, <b>char</b>*, <b>unsigned</b> <b>int</b>);
<b>int</b> regexecA(<b>const</b> regex_tA*, <b>const</b> <b>char</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
<b>void</b> regfreeA(regex_tA*);
<b>int</b> regcompW(regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>int</b>);
<b>unsigned</b> <b>int</b> regerrorW(<b>int</b>, <b>const</b> regex_tW*, <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>);
<b>int</b> regexecW(<b>const</b> regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
<b>void</b> regfreeW(regex_tW*);
#ifdef UNICODE
#define regcomp regcompW
#define regerror regerrorW
#define regexec regexecW
#define regfree regfreeW
#define regex_t regex_tW
#else
#define regcomp regcompA
#define regerror regerrorA
#define regexec regexecA
#define regfree regfreeA
#define regex_t regex_tA
#endif
}</pre>
<p>All the functions operate on structure <b>regex_t</b>, which
exposes two public members: </p>
<p><b>unsigned int re_nsub</b> this is filled in by <b>regcomp</b>
and indicates the number of sub-expressions contained in the
regular expression. </p>
<p><b>const TCHAR* re_endp</b> points to the end of the
expression to compile when the flag REG_PEND is set. </p>
<p><i>Footnote: regex_t is actually a #define - it is either
regex_tA or regex_tW depending upon whether UNICODE is defined or
not, TCHAR is either char or wchar_t again depending upon the
macro UNICODE.</i> </p>
<p><b>regcomp</b> takes a pointer to a <b>regex_t</b>, a pointer
to the expression to compile and a flags parameter which can be a
combination of: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_EXTENDED</td>
<td valign="top" width="45%">Compiles modern regular
expressions. Equivalent to regbase::char_classes |
regbase::intervals | regbase::bk_refs.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_BASIC</td>
<td valign="top" width="45%">Compiles basic (obsolete)
regular expression syntax. Equivalent to regbase::char_classes
| regbase::intervals | regbase::limited_ops | regbase::bk_braces
| regbase::bk_parens | regbase::bk_refs.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NOSPEC</td>
<td valign="top" width="45%">All characters are ordinary,
the expression is a literal string.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_ICASE</td>
<td valign="top" width="45%">Compiles for matching that
ignores character case.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NOSUB</td>
<td valign="top" width="45%">Has no effect in this
library.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NEWLINE</td>
<td valign="top" width="45%">When this flag is set a dot
does not match the newline character.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_PEND</td>
<td valign="top" width="45%">When this flag is set the
re_endp parameter of the regex_t structure must point to
the end of the regular expression to compile.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NOCOLLATE</td>
<td valign="top" width="45%">When this flag is set then
locale dependent collation for character ranges is turned
off.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
        <td valign="top" width="45%">REG_ESCAPE_IN_LISTS</td>
<td valign="top" width="45%">When this flag is set, then
escape sequences are permitted in bracket expressions (character
sets).</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_NEWLINE_ALT&nbsp;</td>
<td valign="top" width="45%">When this flag is set then
the newline character is equivalent to the alternation
operator |.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_PERL&nbsp;</td>
<td valign="top" width="45%">&nbsp;A shortcut for perl-like
behavior: REG_EXTENDED | REG_NOCOLLATE |
REG_ESCAPE_IN_LISTS</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_AWK</td>
<td valign="top" width="45%">A shortcut for awk-like
behavior: REG_EXTENDED | REG_ESCAPE_IN_LISTS</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">REG_GREP</td>
        <td valign="top" width="45%">A shortcut for grep-like
        behavior: REG_BASIC | REG_NEWLINE_ALT</td>
        <td width="5%">&nbsp;</td>
    </tr>
    <tr>
        <td width="5%">&nbsp;</td>
        <td valign="top" width="45%">REG_EGREP</td>
        <td valign="top" width="45%">&nbsp;A shortcut for egrep-like
        behavior: REG_EXTENDED | REG_NEWLINE_ALT</td>
<td width="5%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><b>regerror</b> takes the following parameters, it maps an
error code to a human readable string: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="50%">int code</td>
<td valign="top" width="50%">The error code.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">const regex_t* e</td>
<td valign="top" width="50%">The regular expression (can
be null).</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">char* buf</td>
<td valign="top" width="50%">The buffer to fill in with
the error message.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">unsigned int buf_size</td>
<td valign="top" width="50%">The length of buf.</td>
<td>&nbsp;</td>
</tr>
</table>
<p>If the error code is OR'ed with REG_ITOA then the message that
results is the printable name of the code rather than a message,
for example &quot;REG_BADPAT&quot;. If the code is REG_ATOI then <b>e</b>
must not be null and <b>e-&gt;re_endp</b> must point to the
printable name of an error code; the return value is then the
value of the error code. For any other value of <b>code</b>, the
return value is the number of characters in the error message; if
the return value is greater than or equal to <b>buf_size</b> then
<b>regerror</b> will have to be called again with a larger buffer.</p>
<p><b>regexec</b> finds the first occurrence of expression <b>e</b>
within string <b>buf</b>. If <b>len</b> is non-zero then *<b>m</b>
is filled in with what matched the regular expression: <b>m[0]</b>
contains what matched the whole expression, <b>m[1] </b>the first sub-expression
etc; see <b>regmatch_t</b> in the header file declaration for
more details. The <b>eflags</b> parameter can be a combination of:
<br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="50%">REG_NOTBOL</td>
<td valign="top" width="50%">Parameter <b>buf </b>does
not represent the start of a line.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">REG_NOTEOL</td>
<td valign="top" width="50%">Parameter <b>buf</b> does
not terminate at the end of a line.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">REG_STARTEND</td>
        <td valign="top" width="50%">The string searched starts
        at buf + m[0].rm_so and ends at buf + m[0].rm_eo.</td>
<td>&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p>Finally <b>regfree</b> frees all the memory that was allocated
by regcomp. </p>
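<p>The call sequence described above (regcomp, then regexec, then
regfree, with regerror for diagnostics) can be sketched as follows.
This is only an illustration: it assumes the operating system's own
POSIX header &lt;regex.h&gt; as a stand-in for &lt;boost/regex.h&gt;, and
the helper names posix_capture and posix_error are hypothetical,
not part of this library.</p>

```cpp
#include <regex.h>   // assumption: system POSIX header, stand-in for <boost/regex.h>
#include <string>

// Hypothetical helper: compiles `pattern` as an extended expression, searches
// `text`, and returns the text of sub-expression `index` (0 is the whole match).
// Returns an empty string if compilation fails or nothing matched.
std::string posix_capture(const char* pattern, const char* text, unsigned index)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return std::string();          // compilation failed, nothing to free

    std::string result;
    regmatch_t m[10];                  // m[0] whole match, m[1]..m[9] sub-expressions
    if (index < 10 && index <= re.re_nsub
        && regexec(&re, text, 10, m, 0) == 0
        && m[index].rm_so != -1)       // -1 marks a sub-expression that took no part
    {
        result.assign(text + m[index].rm_so, text + m[index].rm_eo);
    }
    regfree(&re);                      // release memory allocated by regcomp
    return result;
}

// Hypothetical helper: returns the regerror message for a pattern that fails
// to compile, or an empty string if the pattern compiles cleanly.
std::string posix_error(const char* pattern)
{
    regex_t re;
    const int code = regcomp(&re, pattern, REG_EXTENDED);
    if (code == 0)
    {
        regfree(&re);
        return std::string();
    }
    char buf[256];
    regerror(code, &re, buf, sizeof buf);
    return buf;
}
```

<p>For example, posix_capture("([a-z]+)@([a-z]+)", "mail bob@example now", 2)
yields the second sub-expression of the first match.</p>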
<p><i>Footnote: this is an abridged reference to the POSIX API
functions, it is provided for compatibility with other libraries,
rather than an API to be used in new code (unless you need access
from a language other than C++). This version of these functions
should also happily coexist with other versions, as the names
used are macros that expand to the actual function names.</i> <br>
</p>
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>

View File

@ -514,9 +514,9 @@ void BOOST_REGEX_CALL c_traits_base::do_update_ctype()
if(std::isxdigit(i))
class_map[i] |= char_class_xdigit;
}
-class_map['_'] |= char_class_underscore;
-class_map[' '] |= char_class_blank;
-class_map['\t'] |= char_class_blank;
+class_map[(unsigned char)'_'] |= char_class_underscore;
+class_map[(unsigned char)' '] |= char_class_blank;
+class_map[(unsigned char)'\t'] |= char_class_blank;
for(i = 0; i < map_size; ++i)
{
lower_case_map[i] = (char)std::tolower(i);

View File

@ -241,7 +241,7 @@ message_data<char>::message_data(const std::locale& l, const std::string& regex_
#endif
for(std::size_t j = 0; j < s.size(); ++j)
{
-syntax_map[s[j]] = (unsigned char)(i);
+syntax_map[(unsigned char)s[j]] = (unsigned char)(i);
}
}

View File

@ -1,742 +0,0 @@
<html>
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Regex++, Regular Expression Syntax</title>
</head>
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
<p>&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top"><h3><img src="../../c++boost.gif"
alt="C++ Boost" width="276" height="86"></h3>
</td>
<td valign="top"><h3 align="center">Regex++, Regular
Expression Syntax.</h3>
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
<p align="left"><i>Dr John Maddock</i></p>
<p align="left"><i>Permission to use, copy, modify,
distribute and sell this software and its documentation
for any purpose is hereby granted without fee, provided
that the above copyright notice appear in all copies and
that both that copyright notice and this permission
notice appear in supporting documentation. Dr John
Maddock makes no representations about the suitability of
this software for any purpose. It is provided &quot;as is&quot;
without express or implied warranty.</i></p>
</td>
</tr>
</table>
<hr>
<h3><a name="syntax"></a><i>Regular expression syntax</i></h3>
<p>This section covers the regular expression syntax used by this
library. This is a programmer's guide; the actual syntax presented
to your program's users will depend upon the flags used during
expression compilation. </p>
<p><i>Literals</i> </p>
<p>All characters are literals except: &quot;.&quot;, &quot;|&quot;,
&quot;*&quot;, &quot;?&quot;, &quot;+&quot;, &quot;(&quot;,
&quot;)&quot;, &quot;{&quot;, &quot;}&quot;, &quot;[&quot;,
&quot;]&quot;, &quot;^&quot;, &quot;$&quot; and &quot;\&quot;.
These characters are literals when preceded by a &quot;\&quot;. A
literal is a character that matches itself, or matches the result
of traits_type::translate(), where traits_type is the traits
template parameter to class reg_expression. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Wildcard</i> </p>
<p>The dot character &quot;.&quot; matches any single character,
with two exceptions: when <i>match_not_dot_null</i> is passed to the matching
algorithms, the dot does not match a null character; when <i>match_not_dot_newline</i>
is passed to the matching algorithms, the dot does not match
a newline character. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Repeats</i> </p>
<p>A repeat is an expression that is repeated an arbitrary number
of times. An expression followed by &quot;*&quot; can be repeated
any number of times including zero. An expression followed by
&quot;+&quot; can be repeated any number of times, but at least
once, if the expression is compiled with the flag regbase::bk_plus_qm
then &quot;+&quot; is an ordinary character and &quot;\+&quot;
represents a repeat of once or more. An expression followed by
&quot;?&quot; may be repeated zero or one times only, if the
expression is compiled with the flag regbase::bk_plus_qm then
&quot;?&quot; is an ordinary character and &quot;\?&quot;
represents the repeat zero or once operator. When it is necessary
to specify the minimum and maximum number of repeats explicitly,
the bounds operator &quot;{}&quot; may be used, thus &quot;a{2}&quot;
is the letter &quot;a&quot; repeated exactly twice, &quot;a{2,4}&quot;
represents the letter &quot;a&quot; repeated between 2 and 4
times, and &quot;a{2,}&quot; represents the letter &quot;a&quot;
repeated at least twice with no upper limit. Note that there must
be no white-space inside the {}, and there is no upper limit on
the values of the lower and upper bounds. When the expression is
compiled with the flag regbase::bk_braces then &quot;{&quot; and
&quot;}&quot; are ordinary characters and &quot;\{&quot; and
&quot;\}&quot; are used to delimit bounds instead. All repeat
expressions refer to the shortest possible previous sub-expression:
a single character, a character set, or a sub-expression grouped
with &quot;()&quot;, for example. </p>
<p>Examples: </p>
<p>&quot;ba*&quot; will match all of &quot;b&quot;, &quot;ba&quot;,
&quot;baaa&quot; etc. </p>
<p>&quot;ba+&quot; will match &quot;ba&quot; or &quot;baaaa&quot;
for example but not &quot;b&quot;. </p>
<p>&quot;ba?&quot; will match &quot;b&quot; or &quot;ba&quot;. </p>
<p>&quot;ba{2,4}&quot; will match &quot;baa&quot;, &quot;baaa&quot;
and &quot;baaaa&quot;. </p>
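<p>The repeat examples above can be checked with a few lines of code.
The sketch below assumes the C++ standard library's std::regex
(default ECMAScript grammar) as a stand-in for this library's
reg_expression; the helper name repeat_matches is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for this library's reg_expression.
bool repeat_matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

<p>With this helper, repeat_matches("ba{2,4}", "baaa") holds, while
repeat_matches("ba+", "b") does not.</p>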
<p><i>Non-greedy repeats</i> </p>
<p>Whenever the &quot;extended&quot; regular expression syntax is
in use (the default) then non-greedy repeats are possible by
appending a '?' after the repeat; a non-greedy repeat is one
which will match the <i>shortest</i> possible string. </p>
<p>For example to match html tag pairs one could use something
like: </p>
<p>&quot;&lt;\s*tagname[^&gt;]*&gt;(.*?)&lt;\s*/tagname\s*&gt;&quot;
</p>
<p>In this case $1 will contain the text between the tag pairs,
and will be the shortest possible matching string. <br>
&nbsp; <br>
&nbsp; </p>
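<p>The difference between a greedy and a non-greedy repeat is easiest
to see side by side. The sketch below assumes std::regex from the
C++ standard library rather than this library, and the helper name
first_capture is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches `text` for `pattern` and returns what the
// first marked sub-expression matched, or an empty string if nothing matched.
// Assumption: std::regex stands in for this library's reg_expression.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m[1].str();
    return std::string();
}
```

<p>Applied to the text "&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;", the
non-greedy pattern "&lt;b&gt;(.*?)&lt;/b&gt;" captures only "one", whereas the
greedy "&lt;b&gt;(.*)&lt;/b&gt;" captures everything up to the last closing tag.</p>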
<p><i>Parentheses</i> </p>
<p>Parentheses serve two purposes, to group items together into a
sub-expression, and to mark what generated the match. For example
the expression &quot;(ab)*&quot; would match all of the string
&quot;ababab&quot;. The matching algorithms <a
href="template_class_ref.htm#query_match">regex_match</a> and <a
href="template_class_ref.htm#reg_search">regex_search</a> each
take an instance of <a href="template_class_ref.htm#reg_match">match_results</a>
that reports what caused the match, on exit from these functions
the <a href="template_class_ref.htm#reg_match">match_results</a>
contains information both on what the whole expression matched
and on what each sub-expression matched. In the example above
match_results[1] would contain a pair of iterators denoting the
final &quot;ab&quot; of the matching string. It is permissible
for sub-expressions to match null strings. If a sub-expression
takes no part in a match - for example if it is part of an
alternative that is not taken - then both of the iterators that
are returned for that sub-expression point to the end of the
input string, and the <i>matched</i> member for that sub-expression
is <i>false</i>. Sub-expressions are indexed from left to right
starting from 1; sub-expression 0 is the whole expression. </p>
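<p>A short sketch of how sub-expression results can be inspected
follows. It assumes std::regex and std::smatch from the C++ standard
library in place of this library's regex_match and match_results;
the type capture_info and the function capture are hypothetical
names. Only the <i>matched</i> flag and the captured text are shown;
where the iterators of an unmatched sub-expression point is not
something this sketch relies on.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical result type: whether a sub-expression took part in the
// match, and the text it matched.
struct capture_info
{
    bool matched;
    std::string text;
};

// Hypothetical helper: matches the whole of `text` against `pattern` and
// reports on sub-expression `index` (0 is the whole expression).
// Assumption: std::regex stands in for this library's matching algorithms.
capture_info capture(const std::string& pattern, const std::string& text,
                     std::size_t index)
{
    capture_info result = { false, std::string() };
    std::smatch m;
    if (std::regex_match(text, m, std::regex(pattern)) && index < m.size())
    {
        result.matched = m[index].matched;
        result.text = m[index].str();
    }
    return result;
}
```

<p>For the expression "(ab)*" and the input "ababab", index 0 reports
the whole string and index 1 reports the final "ab".</p>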
<p><i>Non-Marking Parentheses</i> </p>
<p>Sometimes you need to group sub-expressions with parentheses,
but don't want the parentheses to generate another marked sub-expression;
in this case non-marking parentheses (?:expression) can be used.
For example the following expression creates no sub-expressions: </p>
<p>&quot;(?:abc)*&quot;</p>
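<p>One way to confirm that a group is non-marking is to count the
marked sub-expressions in the compiled expression. The sketch below
assumes std::regex, whose mark_count() member reports that number;
the helper name marked_count is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: number of marked sub-expressions in `pattern`,
// not counting the whole expression.
// Assumption: std::regex stands in for this library's reg_expression.
unsigned marked_count(const std::string& pattern)
{
    return std::regex(pattern).mark_count();
}
```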
<p><em>Forward Lookahead Asserts</em>&nbsp; </p>
<p>There are two forms of these; one for positive forward
lookahead asserts, and one for negative lookahead asserts:</p>
<p>&quot;(?=abc)&quot; matches zero characters only if they are
followed by the expression &quot;abc&quot;.</p>
<p>&quot;(?!abc)&quot; matches zero characters only if they are
not followed by the expression &quot;abc&quot;.</p>
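<p>Because a lookahead assert matches zero characters, the text it
tests is not part of the reported match. A minimal sketch, assuming
std::regex in place of this library and using the hypothetical
helper name whole_match:</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches `text` for `pattern` and returns the text of
// the whole match, or an empty string if there is no match.
// Assumption: std::regex stands in for this library's regex_search.
std::string whole_match(const std::string& pattern, const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m[0].str();
    return std::string();
}
```

<p>Searching "fooabc" with "foo(?=abc)" reports just "foo"; the
lookahead has checked "abc" without consuming it.</p>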
<p><i>Alternatives</i> </p>
<p>Alternatives occur when the expression can match either one
sub-expression or another, each alternative is separated by a
&quot;|&quot;, or a &quot;\|&quot; if the flag regbase::bk_vbar
is set, or by a newline character if the flag regbase::newline_alt
is set. Each alternative is the largest possible previous sub-expression;
this is the opposite behaviour from repetition operators. </p>
<p>Examples: </p>
<p>&quot;a(b|c)&quot; could match &quot;ab&quot; or &quot;ac&quot;.
</p>
<p>&quot;abc|def&quot; could match &quot;abc&quot; or &quot;def&quot;.
<br>
&nbsp; <br>
&nbsp; </p>
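<p>The point that each alternative is the largest possible
sub-expression can be checked directly: "abc|def" means "abc" or
"def", not "ab", then "c" or "d", then "ef". The sketch assumes
std::regex as a stand-in for this library, and alternative_matches
is a hypothetical helper name.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
// Assumption: std::regex stands in for this library's regex_match.
bool alternative_matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```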
<p><i>Sets</i> </p>
<p>A set is a set of characters that can match any single
character that is a member of the set. Sets are delimited by
&quot;[&quot; and &quot;]&quot; and can contain literals,
character ranges, character classes, collating elements and
equivalence classes. Set declarations that start with &quot;^&quot;
contain the complement of the elements that follow. </p>
<p>Examples: </p>
<p>Character literals: </p>
<p>&quot;[abc]&quot; will match either of &quot;a&quot;, &quot;b&quot;,
or &quot;c&quot;. </p>
<p>&quot;[^abc]&quot; will match any character other than &quot;a&quot;,
&quot;b&quot;, or &quot;c&quot;. </p>
<p>Character ranges: </p>
<p>&quot;[a-z]&quot; will match any character in the range &quot;a&quot;
to &quot;z&quot;. </p>
<p>&quot;[^A-Z]&quot; will match any character other than those
in the range &quot;A&quot; to &quot;Z&quot;. </p>
<p>Note that character ranges are highly locale dependent: they
match any character that collates between the endpoints of the
range, ranges will only behave according to ASCII rules when the
default &quot;C&quot; locale is in effect. For example if the
library is compiled with the Win32 localization model, then [a-z]
will match the ASCII characters a-z, and also 'A', 'B' etc, but
not 'Z' which collates just after 'z'. This locale specific
behaviour can be disabled by specifying regbase::nocollate when
compiling, this is the default behaviour when using regbase::normal,
and forces ranges to collate according to ASCII character code.
Likewise, if you use the POSIX C API functions then setting
REG_NOCOLLATE turns off locale dependent collation. </p>
<p>Character classes are denoted using the syntax &quot;[:classname:]&quot;
within a set declaration, for example &quot;[[:space:]]&quot; is
the set of all whitespace characters. Character classes are only
available if the flag regbase::char_classes is set. The available
character classes are: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="50%">alnum</td>
<td valign="top" width="50%">Any alpha numeric character.</td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">alpha</td>
<td valign="top" width="50%">Any alphabetical character a-z
and A-Z. Other characters may also be included depending
upon the locale.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">blank</td>
<td valign="top" width="50%">Any blank character, either
a space or a tab.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">cntrl</td>
<td valign="top" width="50%">Any control character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">digit</td>
<td valign="top" width="50%">Any digit 0-9.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">graph</td>
<td valign="top" width="50%">Any graphical character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">lower</td>
<td valign="top" width="50%">Any lower case character a-z.
Other characters may also be included depending upon the
locale.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">print</td>
<td valign="top" width="50%">Any printable character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">punct</td>
<td valign="top" width="50%">Any punctuation character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">space</td>
<td valign="top" width="50%">Any whitespace character.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">upper</td>
<td valign="top" width="50%">Any upper case character A-Z.
Other characters may also be included depending upon the
locale.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">xdigit</td>
<td valign="top" width="50%">Any hexadecimal digit
character, 0-9, a-f and A-F.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">word</td>
<td valign="top" width="50%">Any word character - all
alphanumeric characters plus the underscore.</td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="50%">unicode</td>
<td valign="top" width="50%">Any character whose code is
greater than 255, this applies to the wide character
traits classes only.</td>
<td>&nbsp;</td>
</tr>
</table>
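<p>Literal sets, negated sets, ranges and character classes can be
compared by counting how many characters of a string each one
accepts. The sketch below assumes std::regex with its default flags,
under which ranges compare by character code rather than by locale
collation (roughly the regbase::nocollate behaviour described
above); count_matching is a hypothetical helper name.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: counts the characters of `text` that are matched,
// one at a time, by the set expression `set` (for example "[[:space:]]").
// Assumption: std::regex stands in for this library's reg_expression.
std::size_t count_matching(const std::string& set, const std::string& text)
{
    const std::regex re(set);
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        if (std::regex_match(text.substr(i, 1), re))
            ++count;
    }
    return count;
}
```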
<p>There are some shortcuts that can be used in place of the
character classes; provided the flag regbase::escape_in_lists is
set, you can use: </p>
<p>\w in place of [:word:] </p>
<p>\s in place of [:space:] </p>
<p>\d in place of [:digit:] </p>
<p>\l in place of [:lower:] </p>
<p>\u in place of [:upper:] <br>
&nbsp; <br>
&nbsp; </p>
<p>Collating elements take the general form [.tagname.] inside a
set declaration, where <i>tagname</i> is either a single
character or the name of a collating element. For example, [[.a.]]
is equivalent to [a], and [[.comma.]] is equivalent to [,]. The
library supports all the standard POSIX collating element names,
and in addition the following digraphs: &quot;ae&quot;, &quot;ch&quot;,
&quot;ll&quot;, &quot;ss&quot;, &quot;nj&quot;, &quot;dz&quot;,
&quot;lj&quot;, each in lower, upper and title case variations.
Multi-character collating elements can result in the set matching
more than one character: for example, [[.ae.]] matches two
characters, but note that [^[.ae.]] matches only one
character. <br>
&nbsp; <br>
&nbsp; </p>
<p>Equivalence classes take the general form [=tagname=] inside a
set declaration, where <i>tagname</i> is either a single
character or the name of a collating element. An equivalence class
matches any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An
equivalence class is a set of characters that collate the same; a
primary equivalence class is a set of characters whose primary
sort keys are all the same. For example, strings are typically
collated by character, then by accent, and then by case; the
primary sort key then relates to the character, the secondary to
the accentuation, and the tertiary to the case. If there is no
equivalence class corresponding to <i>tagname</i>, then
[=tagname=] is exactly the same as [.tagname.]. Unfortunately
there is no locale-independent method of obtaining the primary
sort key for a character, except under Win32. For other operating
systems the library will &quot;guess&quot; the primary sort key
from the full sort key (obtained from <i>strxfrm</i>), so
equivalence classes are probably best considered broken under any
operating system other than Win32. <br>
&nbsp; <br>
&nbsp; </p>
<p>To include a literal &quot;-&quot; in a set declaration, make
it the first character after the opening &quot;[&quot; or
&quot;[^&quot;, the endpoint of a range, or a collating element;
or, if the flag regbase::escape_in_lists is set, precede it with
an escape character, as in &quot;[\-]&quot;. To include a literal
&quot;[&quot;, &quot;]&quot; or &quot;^&quot; in a set, make it
the endpoint of a range or a collating element, or precede it
with an escape character if the flag regbase::escape_in_lists
is set. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Line anchors</i> </p>
<p>An anchor is something that matches the null string at the
start or end of a line: &quot;^&quot; matches the null string at
the start of a line, &quot;$&quot; matches the null string at the
end of a line. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Back references</i> </p>
<p>A back reference is a reference to a previous sub-expression
that has already been matched; the reference is to what the
sub-expression matched, not to the expression itself. A back
reference consists of the escape character &quot;\&quot; followed
by a digit &quot;1&quot; to &quot;9&quot;: &quot;\1&quot; refers
to the first sub-expression, &quot;\2&quot; to the second, and so
on. For example, the expression &quot;(.*)\1&quot; matches any
string that is repeated about its mid-point, such as
&quot;abcabc&quot; or &quot;xyzxyz&quot;. A back reference to a
sub-expression that did not participate in any match matches the
null string; note that this differs from some other regular
expression matchers. Back references are only available if the
expression is compiled with the flag regbase::bk_refs set. <br>
&nbsp; <br>
&nbsp; </p>
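<p>The &quot;(.*)\1&quot; example can be tried out with a small
sketch. It uses std::regex from the C++ standard library as a
stand-in, not the Regex++ interface, so there is no regbase::bk_refs
flag to set; the function names are hypothetical. </p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `s` is some string immediately
// followed by a copy of itself, i.e. it matches "(.*)\1" in full.
// std::regex is used only as a stand-in for Regex++.
bool is_doubled(const std::string& s) {
    static const std::regex re("(.*)\\1");
    return std::regex_match(s, re);
}

// Hypothetical helper: returns what the first sub-expression matched,
// or an empty string when there is no match.
std::string repeated_half(const std::string& s) {
    static const std::regex re("(.*)\\1");
    std::smatch m;
    return std::regex_match(s, m, re) ? m[1].str() : std::string();
}
```

<p>Note that the back reference compares against the text that the
sub-expression matched (&quot;abc&quot;), which is why
&quot;abcabd&quot; is rejected even though both halves would match
&quot;.*&quot; on their own. </p>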
<p><i>Characters by code</i> </p>
<p>This is an extension to the algorithm that is not available in
other libraries. It consists of the escape character followed by
the digit &quot;0&quot; followed by the octal character code. For
example, &quot;\023&quot; represents the character whose octal
code is 23. Where ambiguity could occur, use parentheses to break
the expression up: &quot;\0103&quot; represents the character
whose octal code is 103, while &quot;(\010)3&quot; represents the
character whose octal code is 10 followed by &quot;3&quot;. To
match characters by their hexadecimal code, use \x followed by a
string of hexadecimal digits, optionally enclosed inside {}, for
example \xf0 or \x{aff}; note that the latter example is a
Unicode character. <br>
&nbsp; <br>
&nbsp; </p>
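<p>The arithmetic behind these escapes can be sketched in plain
C++. This is not the library's parser: the function names are
hypothetical, and the assumption that every following digit is
consumed is inferred from the &quot;\0103&quot; example above. </p>

```cpp
#include <cstddef>
#include <string>

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes one escape of the form \0ddd, \xhh or
// \x{hh} at the start of `s` and returns its character code, or -1
// if `s` does not start with such an escape.
long decode_char_code(const std::string& s) {
    if (s.size() < 2 || s[0] != '\\') return -1;
    std::size_t i = 2;
    long code = 0;
    if (s[1] == '0') {
        // Octal: assume every following octal digit belongs to the code.
        for (; i < s.size() && s[i] >= '0' && s[i] <= '7'; ++i)
            code = code * 8 + (s[i] - '0');
        return code;
    }
    if (s[1] != 'x') return -1;
    const bool braced = i < s.size() && s[i] == '{';
    if (braced) ++i;
    const std::size_t first = i;
    for (; i < s.size() && hex_value(s[i]) >= 0; ++i)
        code = code * 16 + hex_value(s[i]);
    if (i == first) return -1;  // no hex digits at all
    if (braced && (i == s.size() || s[i] != '}')) return -1;
    return code;
}
```

<p>Under these assumptions &quot;\0103&quot; decodes to octal 103
(decimal 67), whereas &quot;\010&quot; on its own decodes to octal
10 (decimal 8), which is why the parentheses in
&quot;(\010)3&quot; change the meaning. </p>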
<p><i>Word operators</i> </p>
<p>The following operators are provided for compatibility with
the GNU regular expression library. </p>
<p>&quot;\w&quot; matches any single character that is a member
of the &quot;word&quot; character class; this is identical to the
expression &quot;[[:word:]]&quot;. </p>
<p>&quot;\W&quot; matches any single character that is not a
member of the &quot;word&quot; character class; this is identical
to the expression &quot;[^[:word:]]&quot;. </p>
<p>&quot;\&lt;&quot; matches the null string at the start of a
word. </p>
<p>&quot;\&gt;&quot; matches the null string at the end of a
word. </p>
<p>&quot;\b&quot; matches the null string at either the start or
the end of a word. </p>
<p>&quot;\B&quot; matches a null string within a word. </p>
<p>The start of the sequence passed to the matching algorithms is
considered to be a potential start of a word unless the flag
match_not_bow is set. The end of the sequence passed to the
matching algorithms is considered to be a potential end of a word
unless the flag match_not_eow is set. <br>
&nbsp; <br>
&nbsp; </p>
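<p>The positions at which the word operators match can be modelled
with a few predicates. This is a minimal sketch rather than the
library's implementation: the function names are hypothetical, the
&quot;word&quot; class is approximated with std::isalnum plus the
underscore in the current C locale, and the match_not_bow and
match_not_eow flags are not modelled. </p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Approximation of the "word" class: alphanumerics plus underscore.
bool is_word_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Models \< : position i (0..s.size()) is the start of a word.
bool at_word_start(const std::string& s, std::size_t i) {
    const bool prev = i > 0 && is_word_char(s[i - 1]);
    const bool next = i < s.size() && is_word_char(s[i]);
    return !prev && next;
}

// Models \> : position i is the end of a word.
bool at_word_end(const std::string& s, std::size_t i) {
    const bool prev = i > 0 && is_word_char(s[i - 1]);
    const bool next = i < s.size() && is_word_char(s[i]);
    return prev && !next;
}

// Models \b : position i is either the start or the end of a word.
bool at_word_boundary(const std::string& s, std::size_t i) {
    return at_word_start(s, i) || at_word_end(s, i);
}
```

<p>For the text &quot;foo bar&quot;, positions 0 and 4 are word
starts and positions 3 and 7 are word ends; the start and end of the
sequence count as potential boundaries, as described above. </p>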
<p><i>Buffer operators</i> </p>
<p>The following operators are provided for compatibility with the
GNU regular expression library, and Perl regular expressions: </p>
<p>&quot;\`&quot; matches the start of a buffer. </p>
<p>&quot;\A&quot; matches the start of a buffer. </p>
<p>&quot;\'&quot; matches the end of a buffer. </p>
<p>&quot;\z&quot; matches the end of a buffer. </p>
<p>&quot;\Z&quot; matches the end of a buffer, or possibly one or
more new line characters followed by the end of the buffer. </p>
<p>A buffer is considered to consist of the whole sequence passed
to the matching algorithms, unless the flags match_not_bob or
match_not_eob are set. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Escape operator</i> </p>
<p>The escape character &quot;\&quot; has several meanings. </p>
<p>Inside a set declaration the escape character is a normal
character unless the flag regbase::escape_in_lists is set in
which case whatever follows the escape is a literal character
regardless of its normal meaning. </p>
<p>The escape operator may introduce an operator, for example a
back reference or a word operator. </p>
<p>The escape operator may make the following character normal,
for example &quot;\*&quot; represents a literal &quot;*&quot;
rather than the repeat operator. <br>
&nbsp; <br>
&nbsp; </p>
<p><i>Single character escape sequences</i> </p>
<p>The following escape sequences are aliases for single
characters: <br>
&nbsp; </p>
<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="33%">Escape sequence </td>
<td valign="top" width="33%">Character code </td>
<td valign="top" width="33%">Meaning </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\a </td>
<td valign="top" width="33%">0x07 </td>
<td valign="top" width="33%">Bell character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\f </td>
<td valign="top" width="33%">0x0C </td>
<td valign="top" width="33%">Form feed. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\n </td>
<td valign="top" width="33%">0x0A </td>
<td valign="top" width="33%">Newline character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\r </td>
<td valign="top" width="33%">0x0D </td>
<td valign="top" width="33%">Carriage return. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\t </td>
<td valign="top" width="33%">0x09 </td>
<td valign="top" width="33%">Tab character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\v </td>
<td valign="top" width="33%">0x0B </td>
<td valign="top" width="33%">Vertical tab. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\e </td>
<td valign="top" width="33%">0x1B </td>
<td valign="top" width="33%">ASCII Escape character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\0dd </td>
<td valign="top" width="33%">0dd </td>
<td valign="top" width="33%">An octal character code,
where <i>dd</i> is one or more octal digits. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\xXX </td>
<td valign="top" width="33%">0xXX </td>
<td valign="top" width="33%">A hexadecimal character
code, where XX is one or more hexadecimal digits. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\x{XX} </td>
<td valign="top" width="33%">0xXX </td>
<td valign="top" width="33%">A hexadecimal character
code, where XX is one or more hexadecimal digits,
optionally a Unicode character. </td>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
<td valign="top" width="33%">\cZ </td>
<td valign="top" width="33%">Z-@ </td>
<td valign="top" width="33%">An ASCII escape sequence
control-Z, where Z is any ASCII character greater than or
equal to the character code for '@'. </td>
<td>&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
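<p>The character code given for \cZ in the table is simply the code
of Z minus the code of '@'. A small sketch of that arithmetic
follows; the function name is hypothetical, and no case folding is
applied because the table does not say whether the library performs
any. </p>

```cpp
// Hypothetical helper: returns the code denoted by the escape \cZ,
// computed as Z - '@'. Returns -1 when `z` is not an ASCII character
// with a code greater than or equal to that of '@'.
int control_code(char z) {
    const unsigned char u = static_cast<unsigned char>(z);
    if (u < '@' || u > 0x7F) return -1;
    return u - '@';
}
```

<p>For instance, \cG gives 0x07 (the bell character, the same as \a)
and \c[ gives 0x1B (the same as \e). </p>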
<p><i>Miscellaneous escape sequences:</i> </p>
<p>The following are provided mostly for Perl compatibility, but
note that there are some differences in the meanings of \l \L \u
and \U: <br>
&nbsp; </p>
<table border="0" cellpadding="6" cellspacing="0" width="100%">
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\w </td>
<td valign="top" width="45%">Equivalent to [[:word:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\W </td>
<td valign="top" width="45%">Equivalent to [^[:word:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\s </td>
<td valign="top" width="45%">Equivalent to [[:space:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\S </td>
<td valign="top" width="45%">Equivalent to [^[:space:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\d </td>
<td valign="top" width="45%">Equivalent to [[:digit:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\D </td>
<td valign="top" width="45%">Equivalent to [^[:digit:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\l </td>
<td valign="top" width="45%">Equivalent to [[:lower:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\L </td>
<td valign="top" width="45%">Equivalent to [^[:lower:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\u </td>
<td valign="top" width="45%">Equivalent to [[:upper:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\U </td>
<td valign="top" width="45%">Equivalent to [^[:upper:]]. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\C </td>
<td valign="top" width="45%">Any single character,
equivalent to '.'. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\X </td>
<td valign="top" width="45%">Match any Unicode combining
character sequence, for example &quot;a\x 0301&quot; (a
letter a with an acute). </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\Q </td>
<td valign="top" width="45%">The begin quote operator,
everything that follows is treated as a literal character
until a \E end quote operator is found. </td>
<td width="5%">&nbsp;</td>
</tr>
<tr>
<td width="5%">&nbsp;</td>
<td valign="top" width="45%">\E </td>
<td valign="top" width="45%">The end quote operator,
terminates a sequence begun with \Q. </td>
<td width="5%">&nbsp;</td>
</tr>
</table>
<p><br>
&nbsp; </p>
<p><i>What gets matched?</i> </p>
<p>The regular expression library will match the first possible
matching string. If more than one string starting at a given
location can match, then it matches the longest possible string,
unless the flag match_any is set, in which case the first match
encountered is returned. Use of the match_any option can reduce
the time taken to find the match, but is only useful if the user
is less concerned about what matched; for example, it would not
be suitable for search and replace operations. In cases where
there are multiple possible matches all starting at the same
location, and all of the same length, the match chosen is
the one with the longest first sub-expression; if that is the
same for two or more matches, then the second sub-expression will
be examined, and so on. <br>
</p>
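<p>The selection rule above (without match_any) can be expressed as
an ordering over candidate matches. The sketch below only models
that ordering; the Candidate type and function names are
hypothetical, and the library itself does not necessarily enumerate
candidates in this way. </p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one possible match.
struct Candidate {
    std::size_t start;                     // offset where the match begins
    std::size_t length;                    // total length of the match
    std::vector<std::size_t> sub_lengths;  // lengths of sub-expressions 1, 2, ...
};

// True when `a` is preferred over `b` under the rule described above.
bool preferred(const Candidate& a, const Candidate& b) {
    if (a.start != b.start) return a.start < b.start;     // first possible match
    if (a.length != b.length) return a.length > b.length; // then the longest
    // Then the longest first sub-expression, then the second, and so on
    // (std::vector compares lexicographically).
    return a.sub_lengths > b.sub_lengths;
}

// Index of the preferred candidate; `all` must not be empty.
std::size_t best_index(const std::vector<Candidate>& all) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < all.size(); ++i)
        if (preferred(all[i], all[best])) best = i;
    return best;
}
```

<p>With match_any set, the comparison would stop being applied as
soon as any candidate is found, which is why that option is faster
but gives less predictable results. </p>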
<hr>
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
</body>
</html>
