mirror of
https://github.com/boostorg/regex.git
synced 2025-06-25 20:01:37 +02:00
Compare commits
5 Commits
svn-branch
...
boost-1.30
Author | SHA1 | Date | |
---|---|---|---|
8441264579 | |||
644e41ba31 | |||
51fe53f437 | |||
6d4dd63cba | |||
686a939498 |
1304
appendix.htm
File diff suppressed because it is too large
205
faq.htm
@ -1,205 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++ - FAQ</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, FAQ.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><font color="#FF0000">Q. Why does using parentheses in a
regular expression change the result of a match?</font></p>
|
||||
|
||||
<p>A. Parentheses don't only mark sub-expressions; they also help determine
what the best match is. regex++ tries to follow the POSIX standard
leftmost-longest rule for determining what matched. If there
is more than one possible match after considering the whole
expression, it looks next at the first sub-expression, then
the second sub-expression, and so on. So:</p>
|
||||
|
||||
<pre>"(0*)([0-9]*)" against "00123" would produce
|
||||
$1 = "00"
|
||||
$2 = "123"</pre>
|
||||
|
||||
<p>whereas</p>
|
||||
|
||||
<pre>"0*([0-9]*)" against "00123" would produce
|
||||
$1 = "00123"</pre>
|
||||
|
||||
<p>If you think about it, had $1 only matched the "123",
|
||||
this would be "less good" than the match "00123"
|
||||
which is both further to the left and longer. If you want $1 to
|
||||
match only the "123" part, then you need to use
|
||||
something like:</p>
|
||||
|
||||
<pre>"0*([1-9][0-9]*)"</pre>
|
||||
|
||||
<p>as the expression.</p>
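<p>As a minimal sketch of the last expression, the function below uses std::regex from the C++ standard library as a stand-in for regex++ (an assumption made for illustration, not the regex++ API). With its default grammar, std::regex follows Perl-style matching rather than the POSIX leftmost-longest rule, so only the unambiguous expression is shown. The name strip_leading_zeros is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Returns the text captured by $1 when "0*([1-9][0-9]*)" matches the
// whole of `text`, or an empty string when there is no match.
// std::regex is a stand-in here; it is not the regex++ interface.
std::string strip_leading_zeros(const std::string& text) {
    static const std::regex re("0*([1-9][0-9]*)");
    std::smatch what;
    if (std::regex_match(text, what, re)) {
        return what[1].str();  // sub-expression 1: digits after the zeros
    }
    return std::string();
}
```

<p>Because the sub-expression must start with a non-zero digit, the leading zeros can only be consumed by the 0* outside the parentheses, whichever matching rule the engine applies.</p>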
|
||||
|
||||
<p><font color="#FF0000">Q. Configure says that my compiler is
|
||||
unable to merge template instances, what does this mean?</font> </p>
|
||||
|
||||
<p>A. When you compile template code, you can end up with the
|
||||
same template instances in multiple translation units - this will
|
||||
lead to link time errors unless your compiler/linker is smart
|
||||
enough to merge these template instances into a single record in
|
||||
the executable file. If you see this warning after running
|
||||
configure, then you can still link to libregex++.a if: </p>
|
||||
|
||||
<ol>
|
||||
<li>You use only the low-level template classes (reg_expression<>
|
||||
match_results<> etc), from a single translation
|
||||
unit, and use no other part of regex++.</li>
|
||||
<li>You use only the POSIX API functions (regcomp regexec etc),
|
||||
and no other part of regex++.</li>
|
||||
<li>You use only the high level class RegEx, and no other
|
||||
part of regex++. </li>
|
||||
</ol>
|
||||
|
||||
<p>Another option is to create a master include file, which
|
||||
#include's all the regex++ source files, and all the source files
|
||||
in which you use regex++. You then compile and link this master
|
||||
file as a single translation unit. </p>
|
||||
|
||||
<p><font color="#FF0000">Q. Configure says that my compiler is
|
||||
unable to merge template instances from archive files, what does
|
||||
this mean?</font> </p>
|
||||
|
||||
<p>A. When you compile template code, you can end up with the
|
||||
same template instances in multiple translation units - this will
|
||||
lead to link time errors unless your compiler/linker is smart
|
||||
enough to merge these template instances into a single record in
|
||||
the executable file. Some compilers are able to do this for
|
||||
normal .cpp or .o files, but fail if the object file has been
|
||||
placed in a library archive. If you see this warning after
|
||||
running configure, then you can still link to libregex++.a if: </p>
|
||||
|
||||
<ol>
|
||||
<li>You use only the low-level template classes (reg_expression<>
|
||||
match_results<> etc), and use no other part of
|
||||
regex++.</li>
|
||||
<li>You use only the POSIX API functions (regcomp regexec etc),
|
||||
and no other part of regex++.</li>
|
||||
<li>You use only the high level class RegEx, and no other
|
||||
part of regex++. </li>
|
||||
</ol>
|
||||
|
||||
<p>Another option is to add the regex++ source files directly to
your project instead of linking to libregex++.a. Generally you
should do this only if you are getting link time errors with
libregex++.a. </p>
|
||||
|
||||
<p><font color="#FF0000">Q. Configure says that my compiler can't
|
||||
merge templates containing switch statements, what does this
|
||||
mean?</font> </p>
|
||||
|
||||
<p>A. Some compilers can't merge templates that contain static
data. This includes switch statements, which implicitly generate
static data as well as code. Principally this affects the egcs
compiler, but note that gcc 2.81 also suffers from this problem: the
compiler will compile and link the code, but the code will not
run because the code and the static data it uses have become
separated. The default behaviour of regex++ is to try to fix
this problem by declaring "problem" templates inside
unnamed namespaces, so that the templates have internal linkage.
Note that this can result in a great deal of code bloat. If the
compiler doesn't support namespaces, or if code bloat becomes a
problem, then follow the guidelines above for placing all the
templates used in a single translation unit, and edit boost/regex/config.hpp
so that BOOST_REGEX_NO_TEMPLATE_SWITCH_MERGE is no longer defined.
</p>
|
||||
|
||||
<p><font color="#FF0000">Q. I can't get regex++ to work with
|
||||
escape characters, what's going on?</font> </p>
|
||||
|
||||
<p>A. If you embed regular expressions in C++ code, then remember
|
||||
that escape characters are processed twice: once by the C++
|
||||
compiler, and once by the regex++ expression compiler, so to pass
|
||||
the regular expression \d+ to regex++, you need to embed "\\d+"
|
||||
in your code. Likewise to match a literal backslash you will need
|
||||
to embed "\\\\" in your code. </p>
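<p>The two levels of escaping can be sketched as follows. This example assumes a C++11 compiler and uses std::regex as a stand-in for the regex++ expression compiler. All function names are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// "\\d+" in source code is the three-character string \d+ at run time.
inline std::string digits_pattern() { return "\\d+"; }

// "\\\\" in source code is the two-character string \\ at run time,
// which the regex compiler then reads as one literal backslash.
inline std::string backslash_pattern() { return "\\\\"; }

// A raw string literal (C++11) removes the compiler's level of escaping.
inline std::string raw_digits_pattern() { return R"(\d+)"; }

// True when `text` consists only of digits.
bool is_all_digits(const std::string& text) {
    static const std::regex re(digits_pattern());
    return std::regex_match(text, re);
}

// True when `text` contains at least one backslash character.
bool contains_backslash(const std::string& text) {
    static const std::regex re(backslash_pattern());
    return std::regex_search(text, re);
}
```

<p>Checking the length of the pattern string at run time is a quick way to confirm what the regex compiler actually receives.</p>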
|
||||
|
||||
<p><font color="#FF0000">Q. Why don't character ranges work
|
||||
properly?</font> <br>
|
||||
A. The POSIX standard specifies that character range expressions
are locale sensitive, so for example the expression [A-Z] will
match any collating element that collates between 'A' and 'Z'.
That means that for most locales other than "C" or
"POSIX", [A-Z] would match the single character 't' for
example, which is not what most people expect, or at least not
what most people have come to expect from regular expression
engines. For this reason, the default behaviour of regex++ is to
turn locale sensitive collation off by setting the regbase::nocollate
compile time flag (this is set by regbase::normal). However, if
you set a non-default compile time flag, for example regbase::extended
or regbase::basic, then locale dependent collation will be
enabled. This also applies to the POSIX API functions, which use
either regbase::extended or regbase::basic internally; with those
functions, use REG_NOCOLLATE in combination with either
REG_BASIC or REG_EXTENDED when invoking regcomp if you don't want
locale sensitive collation. <i>[Note: when regbase::nocollate is in
effect, the library behaves "as if" the LC_COLLATE
locale category were always "C", regardless of what it is
actually set to. End note</i>]. </p>
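<p>The non-collating default can be illustrated with std::regex, used here as a stand-in for regex++ (an assumption, not the regex++ API). std::regex also compares range endpoints by character code unless its own std::regex::collate flag is set, which is roughly analogous to regbase::nocollate. The name in_upper_range is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// True when `ch` is matched by the range [A-Z] with collation off.
// The expression is compiled without the std::regex::collate flag,
// so the range is compared by character code, not by locale order.
bool in_upper_range(char ch) {
    static const std::regex re("[A-Z]");
    const std::string text(1, ch);
    return std::regex_match(text, re);
}
```

<p>With collation off, 't' falls outside the range, which is the behaviour most people expect.</p>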
|
||||
|
||||
<p><font color="#FF0000"> Q. Why can't I use the "convenience"
|
||||
versions of query_match/reg_search/reg_grep/reg_format/reg_merge?</font>
|
||||
</p>
|
||||
|
||||
<p>A. These versions may or may not be available depending upon
the capabilities of your compiler. The rules determining the
format of these functions are quite complex, and only the
versions visible to a standard compliant compiler are given in
the help. To find out what your compiler supports, run <boost/regex.hpp>
through your C++ pre-processor, and search the output file for
the function that you are interested in. </p>
|
||||
|
||||
<p><font color="#FF0000">Q. Why are there no throw specifications
|
||||
on any of the functions? What exceptions can the library throw?</font>
|
||||
</p>
|
||||
|
||||
<p>A. Not all compilers support (or honor) throw specifications;
others support them but with reduced efficiency. Throw
specifications may be added at a later date as compilers begin to
handle this better. The library should throw only three types of
exception: boost::bad_expression can be thrown by reg_expression
when compiling a regular expression; std::runtime_error can be
thrown when a call to reg_expression::imbue tries to open a
message catalogue that doesn't exist, or when a call to RegEx::GrepFiles
or RegEx::FindFiles tries to open a file that cannot be opened;
finally, std::bad_alloc can be thrown by just about any of the
functions in this library. </p>
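<p>A caller would typically guard expression compilation with a try block. The sketch below uses std::regex and std::regex_error as stand-ins for reg_expression and boost::bad_expression. That pairing is an assumed analogy for illustration, and the name compiles is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Returns true when `expression` compiles and false when the regex
// compiler rejects it. std::regex_error stands in for
// boost::bad_expression (an assumed analogy, not the regex++ API).
bool compiles(const std::string& expression) {
    try {
        std::regex re(expression);
        (void)re;      // only the success of construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;  // malformed expression
    }
    // std::bad_alloc is deliberately not caught and reaches the caller.
}
```

<p>Catching only the compile-time exception keeps out-of-memory conditions visible to code further up the call stack.</p>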
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
@ -1,243 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, Format String Reference</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Format
|
||||
String Reference.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="format_string"></a>Format String Syntax</h3>
|
||||
|
||||
<p>Format strings are used by the algorithms <a
|
||||
href="template_class_ref.htm#reg_format">regex_format</a> and <a
|
||||
href="template_class_ref.htm#reg_merge">regex_merge</a>, and are
|
||||
used to transform one string into another. </p>
|
||||
|
||||
<p>There are three kinds of format string: sed, perl and extended.
The extended syntax is the default, so it is covered first. </p>
|
||||
|
||||
<p><b><i>Extended format syntax</i></b> </p>
|
||||
|
||||
<p>In format strings, all characters are treated as literals
|
||||
except: ()$\?: </p>
|
||||
|
||||
<p>To use any of these as literals you must prefix them with the
|
||||
escape character \ </p>
|
||||
|
||||
<p>The following special sequences are recognized: <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Grouping:</i> </p>
|
||||
|
||||
<p>Use the parenthesis characters ( and ) to group sub-expressions
|
||||
within the format string, use \( and \) to represent literal '('
|
||||
and ')'. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Sub-expression expansions:</i> </p>
|
||||
|
||||
<p>The following perl like expressions expand to a particular
|
||||
matched sub-expression: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$`</td>
|
||||
<td valign="top" width="43%">Expands to all the text from
|
||||
the end of the previous match to the start of the current
|
||||
match, if there was no previous match in the current
|
||||
operation, then everything from the start of the input
|
||||
string to the start of the match.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$'</td>
|
||||
<td valign="top" width="43%">Expands to all the text from
|
||||
the end of the match to the end of the input string.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$&</td>
|
||||
<td valign="top" width="43%">Expands to all of the
|
||||
current match.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$0</td>
|
||||
<td valign="top" width="43%">Expands to all of the
|
||||
current match.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">$N</td>
|
||||
<td valign="top" width="43%">Expands to the text that
|
||||
matched sub-expression <i>N</i>.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
</table>
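<p>Two of these expansions, $N and $&, can be tried with std::regex_replace from the C++ standard library, whose default format syntax happens to accept them. This is a stand-in sketch rather than a call to regex_merge, and the function names are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Swaps two space-separated words using the $1 and $2 expansions.
std::string swap_words(const std::string& text) {
    static const std::regex re("(\\w+) (\\w+)");
    return std::regex_replace(text, re, "$2, $1");
}

// Wraps every match of "b" in brackets using $& (the whole match).
std::string bracket_b(const std::string& text) {
    static const std::regex re("b");
    return std::regex_replace(text, re, "[$&]");
}
```

<p>Text that is not part of any match is copied to the output unchanged, so only the matched portions are rewritten.</p>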
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><i>Conditional expressions:</i> </p>
|
||||
|
||||
<p>Conditional expressions allow two different format strings to
|
||||
be selected dependent upon whether a sub-expression participated
|
||||
in the match or not: </p>
|
||||
|
||||
<p>?Ntrue_expression:false_expression </p>
|
||||
|
||||
<p>Executes true_expression if sub-expression <i>N</i>
|
||||
participated in the match, otherwise executes false_expression. </p>
|
||||
|
||||
<p>Example: suppose we search for "(while)|(for)" then
|
||||
the format string "?1WHILE:FOR" would output what
|
||||
matched, but in upper case. <br>
|
||||
<br>
|
||||
</p>
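<p>The effect of the conditional in this example can be spelled out by hand. std::regex has no conditional format syntax, so the sketch below tests whether sub-expression 1 participated in each match and picks the replacement accordingly. It illustrates what "?1WHILE:FOR" means and is not how regex++ implements it. The name upper_keyword is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Replaces each "while" with "WHILE" and each "for" with "FOR",
// choosing the output by whether sub-expression 1 participated.
std::string upper_keyword(const std::string& text) {
    static const std::regex re("(while)|(for)");
    std::string out;
    std::string::const_iterator last = text.begin();
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);            // text before this match
        out += m[1].matched ? "WHILE" : "FOR";   // the ?1 true:false choice
        last = m[0].second;
    }
    out.append(last, text.end());                // text after the last match
    return out;
}
```

<p>The matched flag of a sub-expression is the same information the conditional consults: it is true only when that alternative took part in the match.</p>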
|
||||
|
||||
<p><i>Escape sequences:</i> </p>
|
||||
|
||||
<p>The following escape sequences are also allowed: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\a</td>
|
||||
<td valign="top" width="43%">The bell character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\f</td>
|
||||
<td valign="top" width="43%">The form feed character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\n</td>
|
||||
<td valign="top" width="43%">The newline character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\r</td>
|
||||
<td valign="top" width="43%">The carriage return
|
||||
character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\t</td>
|
||||
<td valign="top" width="43%">The tab character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\v</td>
|
||||
<td valign="top" width="43%">A vertical tab character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\x</td>
|
||||
<td valign="top" width="43%">A hexadecimal character -
|
||||
for example \x0D.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\x{}</td>
|
||||
<td valign="top" width="43%">A possible unicode
|
||||
hexadecimal character - for example \x{1A0}</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\cx</td>
|
||||
<td valign="top" width="43%">The ASCII control character
x, for example \c@ is equivalent to control-@.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\e</td>
|
||||
<td valign="top" width="43%">The ASCII escape character.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="8%"> </td>
|
||||
<td valign="top" width="40%">\dd</td>
|
||||
<td valign="top" width="43%">An octal character constant,
|
||||
for example \10.</td>
|
||||
<td valign="top" width="9%"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><b><i>Perl format strings</i></b> </p>
|
||||
|
||||
<p>Perl format strings are the same as the default syntax except
|
||||
that the characters ()?: have no special meaning. </p>
|
||||
|
||||
<p><b><i>Sed format strings</i></b> </p>
|
||||
|
||||
<p>Sed format strings use only the characters \ and & as
|
||||
special characters. </p>
|
||||
|
||||
<p>\n where n is a digit, is expanded to the nth sub-expression. </p>
|
||||
|
||||
<p>& is expanded to the whole of the match (equivalent to \0).
|
||||
</p>
|
||||
|
||||
<p>Other escape sequences are expanded as per the default syntax.
|
||||
<br>
|
||||
</p>
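<p>The sed rules for & and \n can be tried with std::regex_replace and its std::regex_constants::format_sed flag, used here as a stand-in for the regex++ sed format (an assumption for illustration). The function names are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Swaps two space-separated words using the sed expansions \1 and \2.
std::string swap_words_sed(const std::string& text) {
    static const std::regex re("(\\w+) (\\w+)");
    return std::regex_replace(text, re, "\\2, \\1",
                              std::regex_constants::format_sed);
}

// Wraps every match of "b" in brackets using & (the whole match).
std::string bracket_match_sed(const std::string& text) {
    static const std::regex re("b");
    return std::regex_replace(text, re, "[&]",
                              std::regex_constants::format_sed);
}
```

<p>Note that the format string is itself a C++ string literal, so each sed backslash has to be doubled in source code.</p>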
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
572
hl_ref.htm
@ -1,572 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, RegEx Class Reference</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, RegEx Class
|
||||
Reference. </h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="RegEx"></a><i>Class RegEx</i></h3>
|
||||
|
||||
<p>#include <boost/cregex.hpp> </p>
|
||||
|
||||
<p>The class RegEx provides a high level simplified interface to
the regular expression library. This class only handles narrow
character strings, and regular expressions always follow the
"normal" syntax: that is, the same as the standard
POSIX extended syntax, but with locale specific collation
disabled, and with escape characters allowed inside character set
declarations. </p>
|
||||
|
||||
<pre><b>typedef</b> <b>bool</b> (*GrepCallback)(<b>const</b> RegEx& expression);
|
||||
<b>typedef</b> <b>bool</b> (*GrepFileCallback)(<b>const</b> <b>char</b>* file, <b>const</b> RegEx& expression);
|
||||
<b>typedef</b> <b>bool</b> (*FindFilesCallback)(<b>const</b> <b>char</b>* file);
|
||||
|
||||
<b>class</b> RegEx
|
||||
{
|
||||
<b>public</b>:
|
||||
RegEx();
|
||||
RegEx(<b>const</b> RegEx& o);
|
||||
~RegEx();
|
||||
RegEx(<b>const</b> <b>char</b>* c, <b>bool</b> icase = <b>false</b>);
|
||||
<strong>explicit</strong> RegEx(<b>const</b> std::string& s, <b>bool</b> icase = <b>false</b>);
|
||||
RegEx& <b>operator</b>=(<b>const</b> RegEx& o);
|
||||
RegEx& <b>operator</b>=(<b>const</b> <b>char</b>* p);
|
||||
RegEx& <b>operator</b>=(<b>const</b> std::string& s);
|
||||
<b>unsigned</b> <b>int</b> SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);
|
||||
<b>unsigned</b> <b>int</b> SetExpression(<b>const</b> std::string& s, <b>bool</b> icase = <b>false</b>);
|
||||
std::string Expression()<b>const</b>;
|
||||
<font color="#000080"><i>//
|
||||
</i> <i>// now matching operators: </i>
|
||||
<i>// </i></font>
|
||||
<b>bool</b> Match(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>bool</b> Match(<b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>bool</b> Search(<b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>bool</b> Search(<b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(GrepCallback cb, <b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<std::string>& v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<std::string>& v, <b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<<b>unsigned</b> <b>int</b>>& v, <b>const</b> <b>char</b>* p, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> Grep(std::vector<<b>unsigned</b> <b>int</b>>& v, <b>const</b> std::string& s, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> GrepFiles(GrepFileCallback cb, <b>const</b> std::string& files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>* files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> <b>int</b> FindFiles(FindFilesCallback cb, <b>const</b> std::string& files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
std::string Merge(<b>const</b> std::string& in, <b>const</b> std::string& fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
std::string Merge(<b>const</b> <b>char</b>* in, <b>const</b> <b>char</b>* fmt, <b>bool</b> copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags = match_default);
|
||||
<b>unsigned</b> Split(std::vector<std::string>& v, std::string& s, <b>unsigned</b> flags = match_default, <b>unsigned</b> max_count = ~0);
|
||||
<font color="#000080"><i>//
|
||||
</i> <i>// now operators for returning what matched in more detail:
|
||||
</i> <i>//
|
||||
</i></font> <b>unsigned</b> <b>int</b> Position(<b>int</b> i = 0)<b>const</b>;
|
||||
<b>unsigned</b> <b>int</b> Length(<b>int</b> i = 0)<b>const</b>;
|
||||
<strong>bool</strong> Matched(<strong>int</strong> i = 0)<strong>const</strong>;
|
||||
<b>unsigned</b> <b>int</b> Line()<b>const</b>;
|
||||
<b>unsigned int</b> Marks() const;
|
||||
std::string What(<b>int</b> i)<b>const</b>;
|
||||
std::string <b>operator</b>[](<b>int</b> i)<b>const</b> ;
|
||||
|
||||
<strong>static const unsigned int</strong> npos;
|
||||
}; </pre>
|
||||
|
||||
<p>Member functions for class RegEx are defined as follows: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx();</td>
|
||||
<td valign="top" width="42%">Default constructor,
|
||||
constructs an instance of RegEx without any valid
|
||||
expression.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx(<b>const</b>
|
||||
RegEx& o);</td>
|
||||
<td valign="top" width="42%">Copy constructor, all the
|
||||
properties of parameter <i>o</i> are copied.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx(<b>const</b> <b>char</b>*
|
||||
c, <b>bool</b> icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Constructs an instance of
|
||||
RegEx, setting the expression to <i>c</i>, if <i>icase</i>
|
||||
is <i>true</i> then matching is insensitive to case,
|
||||
otherwise it is sensitive to case. Throws <i>bad_expression</i>
|
||||
on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx(<b>const</b> std::string&
|
||||
s, <b>bool</b> icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Constructs an instance of
|
||||
RegEx, setting the expression to <i>s</i>, if <i>icase </i>is
|
||||
<i>true</i> then matching is insensitive to case,
|
||||
otherwise it is sensitive to case. Throws <i>bad_expression</i>
|
||||
on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx& <b>operator</b>=(<b>const</b>
|
||||
RegEx& o);</td>
|
||||
<td valign="top" width="42%">Default assignment operator.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx& <b>operator</b>=(<b>const</b>
|
||||
<b>char</b>* p);</td>
|
||||
<td valign="top" width="42%">Assignment operator,
|
||||
equivalent to calling <i>SetExpression(p, false).</i>
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">RegEx& <b>operator</b>=(<b>const</b>
|
||||
std::string& s);</td>
|
||||
<td valign="top" width="42%">Assignment operator,
|
||||
equivalent to calling <i>SetExpression(s, false).</i>
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
SetExpression(<b>const</b> <b>char</b>* p, <b>bool</b> icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Sets the current expression
|
||||
to <i>p</i>, if <i>icase</i> is <i>true</i> then matching
|
||||
is insensitive to case, otherwise it is sensitive to case.
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
SetExpression(<b>const</b> std::string& s, <b>bool</b>
|
||||
icase = <b>false</b>);</td>
|
||||
<td valign="top" width="42%">Sets the current expression
|
||||
to <i>s</i>, if <i>icase</i> is <i>true</i> then matching
|
||||
is insensitive to case, otherwise it is sensitive to case.
|
||||
Throws <i>bad_expression</i> on failure.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string Expression()<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns a copy of the
|
||||
current regular expression.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
|
||||
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Attempts to match the
|
||||
current expression against the text <i>p</i> using the
|
||||
match flags <i>flags</i> - see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the expression matches the whole
|
||||
of the input string.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Match(<b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default) ;</td>
|
||||
<td valign="top" width="42%">Attempts to match the
|
||||
current expression against the text <i>s</i> using the
|
||||
match flags <i>flags</i> - see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the expression matches the whole
|
||||
of the input string.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
|
||||
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Attempts to find a match for
|
||||
the current expression somewhere in the text <i>p</i>
|
||||
using the match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the match succeeds.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>bool</b> Search(<b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default) ;</td>
|
||||
<td valign="top" width="42%">Attempts to find a match for
|
||||
the current expression somewhere in the text <i>s</i>
|
||||
using the match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
Returns <i>true</i> if the match succeeds.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(GrepCallback cb, <b>const</b> <b>char</b>* p, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>p</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match found calls the call-back function <i>cb</i>
|
||||
as: cb(*this); <p>If at any stage the call-back function
|
||||
returns false then the grep operation terminates,
|
||||
otherwise continues until no further matches are found.
|
||||
Returns the number of matches found.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(GrepCallback cb, <b>const</b> std::string& s, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>s</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match found calls the call-back function <i>cb</i>
|
||||
as: cb(*this); <p>If at any stage the call-back function
|
||||
returns false then the grep operation terminates,
|
||||
otherwise continues until no further matches are found.
|
||||
Returns the number of matches found. </p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<std::string>& v, <b>const</b> <b>char</b>*
|
||||
p, <b>unsigned</b> <b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>p</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes a copy of what matched onto <i>v</i>.
|
||||
Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<std::string>& v, <b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>s</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes a copy of what matched onto <i>v</i>.
|
||||
Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<<b>unsigned int</b>>& v, <b>const</b>
|
||||
<b>char</b>* p, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>p</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes the starting index of what matched
|
||||
onto <i>v</i>. Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Grep(std::vector<<b>unsigned int</b>>& v, <b>const</b>
|
||||
std::string& s, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the text <i>s</i> using the match
|
||||
flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match pushes the starting index of what matched
|
||||
onto <i>v</i>. Returns the number of matches found.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
GrepFiles(GrepFileCallback cb, <b>const</b> <b>char</b>*
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the files <i>files</i> using the
|
||||
match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match calls the call-back function cb. <p>If
|
||||
the call-back returns false then the algorithm returns
|
||||
without considering further matches in the current file,
|
||||
or any further files. </p>
|
||||
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names. </p>
<p>Returns the total number of matches found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
GrepFiles(GrepFileCallback cb, <b>const</b> std::string&
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Finds all matches of the
|
||||
current expression in the files <i>files</i> using the
|
||||
match flags <i>flags </i>- see <a
|
||||
href="template_class_ref.htm#match_type">match flags</a>.
|
||||
For each match calls the call-back function cb. <p>If
|
||||
the call-back returns false then the algorithm returns
|
||||
without considering further matches in the current file,
|
||||
or any further files. </p>
|
||||
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names. </p>
<p>Returns the total number of matches found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
FindFiles(FindFilesCallback cb, <b>const</b> <b>char</b>*
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Searches <i>files</i> to
|
||||
find all those which contain at least one match of the
|
||||
current expression using the match flags <i>flags </i>-
|
||||
see <a href="template_class_ref.htm#match_type">match
|
||||
flags</a>. For each matching file calls the call-back
|
||||
function cb. <p>If the call-back returns false then
|
||||
the algorithm returns without considering any further
|
||||
files. </p>
|
||||
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names. </p>
<p>Returns the total number of files found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
FindFiles(FindFilesCallback cb, <b>const</b> std::string&
|
||||
files, <b>bool</b> recurse = <b>false</b>, <b>unsigned</b>
|
||||
<b>int</b> flags = match_default);</td>
|
||||
<td valign="top" width="42%">Searches <i>files</i> to
|
||||
find all those which contain at least one match of the
|
||||
current expression using the match flags <i>flags </i>-
|
||||
see <a href="template_class_ref.htm#match_type">match
|
||||
flags</a>. For each matching file calls the call-back
|
||||
function cb. <p>If the call-back returns false then
|
||||
the algorithm returns without considering any further
|
||||
files. </p>
|
||||
<p>The parameter <i>files</i> can include the wild card
characters '*' and '?'. If the parameter <i>recurse</i>
is true then sub-directories are also searched for matching file
names. </p>
<p>Returns the total number of files found.</p>
<p>May throw an exception derived from std::runtime_error
if file I/O fails.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string Merge(<b>const</b>
|
||||
std::string& in, <b>const</b> std::string& fmt, <b>bool</b>
|
||||
copy = <b>true</b>, <b>unsigned</b> <b>int</b> flags =
|
||||
match_default);</td>
|
||||
<td valign="top" width="42%">Performs a search and
|
||||
replace operation: searches through the string <i>in</i>
|
||||
for all occurrences of the current expression, for each
|
||||
occurrence replaces the match with the format string <i>fmt</i>.
|
||||
Uses <i>flags</i> to determine what gets matched, and how
|
||||
the format string should be treated. If <i>copy</i> is
|
||||
true then all unmatched sections of input are copied
|
||||
unchanged to output, if the flag <em>format_first_only</em>
|
||||
is set then only the first occurrence of the pattern found
|
||||
is replaced. Returns the new string. See <a
|
||||
href="format_string.htm#format_string">also format string
|
||||
syntax</a>, <a href="template_class_ref.htm#match_type">match
|
||||
flags</a> and <a
|
||||
href="template_class_ref.htm#format_flags">format flags</a>.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string Merge(<b>const</b>
|
||||
char* in, <b>const</b> char* fmt, <b>bool</b> copy = <b>true</b>,
|
||||
<b>unsigned int </b>flags = match_default);</td>
|
||||
<td valign="top" width="42%">Performs a search and
|
||||
replace operation: searches through the string <i>in</i>
|
||||
for all occurrences of the current expression, for each
|
||||
occurrence replaces the match with the format string <i>fmt</i>.
|
||||
Uses <i>flags</i> to determine what gets matched, and how
|
||||
the format string should be treated. If <i>copy</i> is
|
||||
true then all unmatched sections of input are copied
|
||||
unchanged to output, if the flag <em>format_first_only</em>
|
||||
is set then only the first occurrence of the pattern found
|
||||
is replaced. Returns the new string. See <a
|
||||
href="format_string.htm#format_string">also format string
|
||||
syntax</a>, <a href="template_class_ref.htm#match_type">match
|
||||
flags</a> and <a
|
||||
href="template_class_ref.htm#format_flags">format flags</a>.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top"><b>unsigned</b> Split(std::vector<std::string>&
|
||||
v, std::string& s, <b>unsigned</b> flags =
|
||||
match_default, <b>unsigned</b> max_count = ~0);</td>
|
||||
<td valign="top">Splits the input string and pushes each
resulting string onto the vector <i>v</i>. If the expression contains no marked
sub-expressions, then one string is output for each
section of the input that does not match the expression.
If the expression does contain marked sub-expressions,
then outputs one string for each marked sub-expression
each time a match occurs. Outputs no more than <i>max_count</i>
strings. Before returning, deletes from the input
string <i>s</i> all of the input that has been processed
(all of the string if <i>max_count</i> was not reached).
Returns the number of strings pushed onto the vector.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Position(<b>int</b> i = 0)<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns the position of what
|
||||
matched sub-expression <i>i</i>. If <i>i = 0</i> then
|
||||
returns the position of the whole match. Returns RegEx::npos
|
||||
if the supplied index is invalid, or if the specified sub-expression
|
||||
did not participate in the match.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Length(<b>int</b> i = 0)<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns the length of what
|
||||
matched sub-expression <i>i</i>. If <i>i = 0</i> then
|
||||
returns the length of the whole match. Returns RegEx::npos
|
||||
if the supplied index is invalid, or if the specified sub-expression
|
||||
did not participate in the match.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td><strong>bool</strong> Matched(<strong>int</strong> i
|
||||
= 0)<strong>const</strong>;</td>
|
||||
<td>Returns true if sub-expression <em>i</em> was
|
||||
matched, false otherwise.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned</b> <b>int</b>
|
||||
Line()<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns the line on which
the match occurred; line numbers start from 1, not zero. If no
match occurred then returns RegEx::npos.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%"><b>unsigned int</b> Marks()
|
||||
const;</td>
|
||||
<td valign="top" width="42%">Returns the number of marked
|
||||
sub-expressions contained in the expression. Note that
|
||||
this includes the whole match (sub-expression zero), so
|
||||
the value returned is always >= 1.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string What(<b>int</b>
|
||||
i)<b>const</b>;</td>
|
||||
<td valign="top" width="42%">Returns a copy of what
|
||||
matched sub-expression <i>i</i>. If <i>i = 0</i> then
|
||||
returns a copy of the whole match. Returns a null string
|
||||
if the index is invalid or if the specified sub-expression
|
||||
did not participate in a match.</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td valign="top" width="7%"> </td>
|
||||
<td valign="top" width="43%">std::string <b>operator</b>[](<b>int</b>
|
||||
i)<b>const</b> ;</td>
|
||||
<td valign="top" width="42%">Returns <i>What(i)</i>. <p>Can
be used to simplify access to sub-expression matches, and
make usage more perl-like.</p>
|
||||
</td>
|
||||
<td valign="top" width="7%"> </td>
|
||||
</tr>
|
||||
</table>
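<p>The member functions in the table above can be illustrated with a minimal sketch. It does not use the RegEx class itself: it assumes a C++11 compiler and uses std::regex as a stand-in, so the helper names (match_whole, search_any, grep_strings, grep_positions, merge) are hypothetical and only mirror the behaviour described for Match, Search, Grep and Merge.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helpers built on std::regex (C++11), not on the RegEx class.

// Like Match(): true only if the expression matches the whole of s.
bool match_whole(const std::regex& e, const std::string& s) {
    return std::regex_match(s, e);
}

// Like Search(): true if the expression matches somewhere in s.
bool search_any(const std::regex& e, const std::string& s) {
    return std::regex_search(s, e);
}

// Like Grep(std::vector<std::string>&, ...): pushes a copy of each match
// onto out and returns the number of matches found.
std::size_t grep_strings(const std::regex& e, const std::string& s,
                         std::vector<std::string>& out) {
    std::size_t count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        out.push_back(it->str());
        ++count;
    }
    return count;
}

// Like Grep(std::vector<unsigned int>&, ...): pushes the starting index of
// each match onto out and returns the number of matches found.
std::size_t grep_positions(const std::regex& e, const std::string& s,
                           std::vector<unsigned int>& out) {
    std::size_t count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        out.push_back(static_cast<unsigned int>(it->position()));
        ++count;
    }
    return count;
}

// Like Merge(): replaces each match with fmt. When copy is false the
// unmatched sections of the input are dropped from the output.
std::string merge(const std::regex& e, const std::string& in,
                  const std::string& fmt, bool copy = true,
                  std::regex_constants::match_flag_type flags =
                      std::regex_constants::match_default) {
    if (!copy) {
        flags |= std::regex_constants::format_no_copy;
    }
    return std::regex_replace(in, e, fmt, flags);
}
```

<p>With the expression [cb]at and the text "cat bat cat", match_whole fails on the full text while search_any succeeds, and the two grep helpers each report three matches. Passing std::regex_constants::format_first_only to merge plays the role that the <em>format_first_only</em> flag plays in the table.</p>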
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
@ -125,8 +125,10 @@
|
||||
// If there isn't good enough wide character support then there will
|
||||
// be no wide character regular expressions:
|
||||
//
|
||||
#if (defined(BOOST_NO_CWCHAR) || defined(BOOST_NO_CWCTYPE) || defined(BOOST_NO_STD_WSTRING)) && !defined(BOOST_NO_WREGEX)
|
||||
# define BOOST_NO_WREGEX
|
||||
#if (defined(BOOST_NO_CWCHAR) || defined(BOOST_NO_CWCTYPE) || defined(BOOST_NO_STD_WSTRING))
|
||||
# if !defined(BOOST_NO_WREGEX)
|
||||
# define BOOST_NO_WREGEX
|
||||
# endif
|
||||
#else
|
||||
# if defined(__sgi) && defined(__SGI_STL_PORT)
|
||||
// STLPort on IRIX is misconfigured: <cwctype> does not compile
|
||||
@ -645,3 +647,4 @@ inline void pointer_construct(T* p, const T& t)
|
||||
|
||||
|
||||
|
||||
|
||||
|
@ -1990,6 +1990,8 @@ unsigned int BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::probe_re
|
||||
{
|
||||
case re_detail::syntax_element_startmark:
|
||||
case re_detail::syntax_element_endmark:
|
||||
if(static_cast<const re_detail::re_brace*>(dat)->index == -2)
|
||||
return regbase::restart_any;
|
||||
return probe_restart(dat->next.p);
|
||||
case re_detail::syntax_element_start_line:
|
||||
return regbase::restart_line;
|
||||
@ -2018,7 +2020,7 @@ unsigned int BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::fixup_le
|
||||
if((leading_lit) && (static_cast<re_detail::re_literal*>(dat)->length > 2))
|
||||
{
|
||||
// we can do a literal search for the leading literal string
|
||||
// using Knuth-Morris-Pratt (or whatever), and only then check for
|
||||
// using Knuth-Morris-Pratt (or whatever), and only then check for
|
||||
// matches. We need a decent length string though to make it
|
||||
// worth while.
|
||||
_leading_string = reinterpret_cast<charT*>(reinterpret_cast<char*>(dat) + sizeof(re_detail::re_literal));
|
||||
@ -2066,10 +2068,14 @@ unsigned int BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::fixup_le
|
||||
case re_detail::syntax_element_rep:
|
||||
if((len == 0) && (1 == fixup_leading_rep(dat->next.p, static_cast<re_detail::re_repeat*>(dat)->alt.p) ))
|
||||
{
|
||||
static_cast<re_detail::re_repeat*>(dat)->leading = true;
|
||||
static_cast<re_detail::re_repeat*>(dat)->leading = leading_lit;
|
||||
return len;
|
||||
}
|
||||
return len;
|
||||
case re_detail::syntax_element_startmark:
|
||||
if(static_cast<const re_detail::re_brace*>(dat)->index == -2)
|
||||
return 0;
|
||||
// fall through:
|
||||
default:
|
||||
break;
|
||||
}
|
||||
@ -2115,3 +2121,4 @@ void BOOST_REGEX_CALL reg_expression<charT, traits, Allocator>::fail(unsigned in
|
||||
|
||||
|
||||
|
||||
|
||||
|
@ -56,8 +56,10 @@ inline int string_compare(const std::basic_string<C,T,A>& s, const C* p)
|
||||
{ return s.compare(p); }
|
||||
inline int string_compare(const std::string& s, const char* p)
|
||||
{ return std::strcmp(s.c_str(), p); }
|
||||
# ifndef BOOST_NO_WREGEX
|
||||
inline int string_compare(const std::wstring& s, const wchar_t* p)
|
||||
{ return std::wcscmp(s.c_str(), p); }
|
||||
# endif
|
||||
# define STR_COMP(s,p) string_compare(s,p)
|
||||
#endif
|
||||
|
||||
@ -753,6 +755,15 @@ bool query_match_aux(iterator first,
|
||||
start_loop[cur_acc] = first;
|
||||
continue;
|
||||
}
|
||||
else if((unsigned int)accumulators[cur_acc] < static_cast<const re_repeat*>(ptr)->min)
|
||||
{
|
||||
// the repeat was null, and we haven't gone round min times yet,
|
||||
// since all subsequent repeats will be null as well, just update
|
||||
// our repeat count and skip out.
|
||||
accumulators[cur_acc] = static_cast<const re_repeat*>(ptr)->min;
|
||||
ptr = static_cast<const re_repeat*>(ptr)->alt.p;
|
||||
continue;
|
||||
}
|
||||
goto failure;
|
||||
}
|
||||
// see if we can skip the repeat:
|
||||
@ -809,6 +820,15 @@ bool query_match_aux(iterator first,
|
||||
start_loop[cur_acc] = first;
|
||||
continue;
|
||||
}
|
||||
else if((first == start_loop[cur_acc]) && accumulators[cur_acc] && ((unsigned int)accumulators[cur_acc] < static_cast<const re_repeat*>(ptr)->min))
|
||||
{
|
||||
// the repeat was null, and we haven't gone round min times yet,
|
||||
// since all subsequent repeats will be null as well, just update
|
||||
// our repeat count and skip out.
|
||||
accumulators[cur_acc] = static_cast<const re_repeat*>(ptr)->min;
|
||||
ptr = static_cast<const re_repeat*>(ptr)->alt.p;
|
||||
continue;
|
||||
}
|
||||
|
||||
// if we get here then neither option is allowed so fail:
|
||||
goto failure;
|
||||
@ -826,7 +846,7 @@ bool query_match_aux(iterator first,
|
||||
if(flags & match_not_eob)
|
||||
goto failure;
|
||||
iterator p(first);
|
||||
while((p != last) && traits_inst.is_separator(traits_inst.translate(*first, icase)))++p;
|
||||
while((p != last) && traits_inst.is_separator(traits_inst.translate(*p, icase)))++p;
|
||||
if(p != last)
|
||||
goto failure;
|
||||
ptr = ptr->next.p;
|
||||
@ -958,6 +978,12 @@ bool query_match_aux(iterator first,
|
||||
goto failure;
|
||||
ptr = ptr->next.p;
|
||||
continue;
|
||||
case syntax_element_backref:
|
||||
if(temp_match[static_cast<const re_brace*>(ptr)->index].first
|
||||
!= temp_match[static_cast<const re_brace*>(ptr)->index].second)
|
||||
goto failure;
|
||||
ptr = ptr->next.p;
|
||||
continue;
|
||||
default:
|
||||
goto failure;
|
||||
}
|
||||
|
150
index.htm
@ -1,150 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="keywords"
|
||||
content="regex++, regular expressions, regular expression library, C++">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>regex++, Index</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="277" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Index.</h3>
|
||||
<p align="left"><i>(Version 3.31, 16th Dec 2001)</i>
|
||||
</p>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3 align="center">Contents</h3>
|
||||
|
||||
<ul>
|
||||
<li><a href="introduction.htm#intro">Introduction</a></li>
|
||||
<li><a href="introduction.htm#Installation">Installation and
|
||||
Configuration</a> </li>
|
||||
<li><a href="template_class_ref.htm#regbase">Template Class
|
||||
and Algorithm Reference</a> <ul>
|
||||
<li>Class <a href="template_class_ref.htm#regbase">regbase</a></li>
|
||||
<li>Class <a
|
||||
href="template_class_ref.htm#bad_expression">bad_expression</a>
|
||||
</li>
|
||||
<li>Class <a
|
||||
href="template_class_ref.htm#reg_expression">reg_expression</a>
|
||||
</li>
|
||||
<li>Class <a
|
||||
href="template_class_ref.htm#regex_char_traits">char_regex_traits</a></li>
|
||||
<li>Class <a href="template_class_ref.htm#reg_match">match_results</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#query_match">regex_match</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_search">regex_search</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_grep">regex_grep</a></li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_format">regex_format</a>
|
||||
</li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#reg_merge">regex_merge</a></li>
|
||||
<li>Algorithm <a
|
||||
href="template_class_ref.htm#regex_split">regex_split</a>
|
||||
</li>
|
||||
<li><a href="template_class_ref.htm#partial_matches">Partial
|
||||
regular expression matches</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Class <a href="hl_ref.htm#RegEx">RegEx</a> reference</li>
|
||||
<li><a href="posix_ref.htm#posix">POSIX Compatibility
|
||||
Functions</a></li>
|
||||
<li><a href="syntax.htm#syntax">Regular Expression Syntax</a></li>
|
||||
<li><a href="format_string.htm#format_string">Format String
|
||||
Syntax</a></li>
|
||||
<li><a href="appendix.htm#implementation">Appendices</a> <ul>
|
||||
<li><a href="appendix.htm#implementation">Implementation
|
||||
notes</a></li>
|
||||
<li><a href="appendix.htm#threads">Thread safety</a></li>
|
||||
<li><a href="appendix.htm#localisation">Localization</a></li>
|
||||
<li><a href="appendix.htm#demos">Example Applications</a>
|
||||
<ul>
|
||||
<li><a
|
||||
href="example/snippets/regex_match_example.cpp">regex_match_example.cpp</a>:
|
||||
ftp based regex_match example.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_search_example.cpp">regex_search_example.cpp</a>:
|
||||
regex_search example: searches a cpp file
|
||||
for class definitions.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_1.cpp">regex_grep_example_1.cpp</a>:
|
||||
regex_grep example 1: searches a cpp file
|
||||
for class definitions.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_merge_example.cpp">regex_merge_example.cpp</a>:
|
||||
regex_merge example: converts a C++ file
|
||||
to syntax highlighted HTML.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_2.cpp">regex_grep_example_2.cpp</a>:
|
||||
regex_grep example 2: searches a cpp file
|
||||
for class definitions, using a global
|
||||
callback function. </li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_3.cpp">regex_grep_example_3.cpp</a>:
|
||||
regex_grep example 3: searches a cpp file
|
||||
for class definitions, using a bound
|
||||
member function callback.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_grep_example_4.cpp">regex_grep_example_4.cpp</a>:
|
||||
regex_grep example 4: searches a cpp file
|
||||
for class definitions, using a C++
|
||||
Builder closure as a callback.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_split_example_1.cpp">regex_split_example_1.cpp</a>:
|
||||
regex_split example: split a string into
|
||||
tokens.</li>
|
||||
<li><a
|
||||
href="example/snippets/regex_split_example_2.cpp">regex_split_example_2.cpp</a>:
|
||||
regex_split example: spit out linked
URLs.</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="appendix.htm#headers">Header Files.</a></li>
|
||||
<li><a href="appendix.htm#redist">Redistributables</a></li>
|
||||
<li><a href="appendix.htm#upgrade">Note for upgraders</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="appendix.htm#furtherInfo">Further Information (Contacts
|
||||
and Acknowledgements)</a></li>
|
||||
<li><a href="faq.htm">FAQ</a></li>
|
||||
</ul>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
476
introduction.htm
@ -1,476 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="keywords"
|
||||
content="regex++, regular expressions, regular expression library, C++">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>regex++, Introduction</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Introduction.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="intro"></a><i>Introduction</i></h3>
|
||||
|
||||
<p>Regular expressions are a form of pattern-matching that is
often used in text processing; many users will be familiar with
the Unix utilities <i>grep</i>, <i>sed</i> and <i>awk</i>, and
the programming language <i>perl</i>, each of which makes
extensive use of regular expressions. Traditionally C++ users
have been limited to the POSIX C APIs for manipulating regular
expressions, and while regex++ does provide these APIs, they do
not represent the best way to use the library. For example, regex++
can cope with wide character strings and with search and replace
operations (in a manner analogous to either sed or perl),
something that traditional C libraries cannot do.</p>
|
||||
|
||||
<p>The class <a href="template_class_ref.htm#reg_expression">boost::reg_expression</a>
is the key class in this library; it represents a "machine
readable" regular expression, and is very closely modelled
on std::basic_string: think of it as a string plus the actual
state-machine required by the regular expression algorithms. As with
std::basic_string, there are two typedefs that are almost always
the means by which this class is referenced:</p>
|
||||
|
||||
<pre><b>namespace </b>boost{
|
||||
|
||||
<b>template</b> <<b>class</b> charT,
|
||||
<b> class</b> traits = regex_traits<charT>,
|
||||
<b>class</b> Allocator = std::allocator<charT> >
|
||||
<b>class</b> reg_expression;
|
||||
|
||||
<b>typedef</b> reg_expression<<b>char</b>> regex;
|
||||
<b>typedef</b> reg_expression<<b>wchar_t</b>> wregex;
|
||||
|
||||
}</pre>
|
||||
|
||||
<p>To see how this library can be used, imagine that we are
writing a credit card processing application. Credit card numbers
generally come as a string of 16 digits, separated into groups of
4 digits by either a space or a hyphen. Before
storing a credit card number in a database (not necessarily
something your customers will appreciate!), we may want to verify
that the number is in the correct format. To match any digit we
could use the regular expression [0-9]; however, ranges of
characters like this are actually locale dependent. Instead we
should use the POSIX standard form [[:digit:]], or the regex++
and perl shorthand for this, \d (note that many older libraries
tended to be hard-coded to the C-locale, so this was
not an issue for them). That leaves us with the following regular
expression to validate credit card number formats:</p>
|
||||
|
||||
<p>(\d{4}[- ]){3}\d{4}</p>
|
||||
|
||||
<p>Here the parentheses act to group (and mark for future
reference) sub-expressions, and the {4} means "repeat
exactly 4 times". This is an example of the extended regular
expression syntax used by perl, awk and egrep. Regex++ also
supports the older "basic" syntax used by sed and grep,
but this is generally less useful, unless you already have some
basic regular expressions that you need to reuse.</p>
|
||||
|
||||
<p>Now let's take that expression and place it in some C++ code to
validate the format of a credit card number:</p>
|
||||
|
||||
<pre><b>bool</b> validate_card_format(<b>const</b> std::string s)
|
||||
{
|
||||
<b>static</b> <b>const</b> <a
|
||||
href="template_class_ref.htm#reg_expression">boost::regex</a> e("(\\d{4}[- ]){3}\\d{4}");
|
||||
<b>return</b> <a href="template_class_ref.htm#query_match">regex_match</a>(s, e);
|
||||
}</pre>
|
||||
|
||||
<p>Note how we had to add some extra escapes to the expression:
|
||||
remember that the escape is seen once by the C++ compiler, before
|
||||
it gets to be seen by the regular expression engine, consequently
|
||||
escapes in regular expressions have to be doubled up when
|
||||
embedding them in C/C++ code. Also note that all the examples
|
||||
assume that your compiler supports Koenig lookup; if yours
|
||||
doesn't (for example VC6), then you will have to add some boost::
|
||||
prefixes to some of the function calls in the examples.</p>
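<p>As a rough illustration of the escaping rule, the sketch below spells the
same expression twice: once with doubled backslashes, and once as a C++11 raw
string literal, which needs no doubling. It assumes a C++11 compiler and uses
std::regex from the standard library as a stand-in for boost::regex so that it
builds without Boost; the names doubled_escapes, raw_literal and
validate_card_format_sketch are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex (C++11) stands in for boost::regex here.
// The same pattern written two ways; both hold (\d{4}[- ]){3}\d{4}.
const std::string doubled_escapes = "(\\d{4}[- ]){3}\\d{4}";
const std::string raw_literal = R"((\d{4}[- ]){3}\d{4})";

// Hypothetical helper mirroring validate_card_format above.
bool validate_card_format_sketch(const std::string& s)
{
    static const std::regex e(raw_literal);
    // regex_match succeeds only if the whole string matches.
    return std::regex_match(s, e);
}
```

<p>Raw string literals were not available to the compilers discussed here, so
the doubled form remains the portable spelling.</p>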
|
||||
|
||||
<p>Those of you who are familiar with credit card processing
|
||||
will have realised that while the format used above is suitable
|
||||
for human readable card numbers, it does not represent the format
|
||||
required by online credit card systems; these require the number
|
||||
as a string of 16 (or possibly 15) digits, without any
|
||||
intervening spaces. What we need is a means to convert easily
|
||||
between the two formats, and this is where search and replace
|
||||
comes in. Those who are familiar with the utilities <i>sed</i>
|
||||
and <i>perl</i> will already be ahead here; we need two strings -
|
||||
one a regular expression - the other a "<a
|
||||
href="format_string.htm">format string</a>" that provides a
|
||||
description of the text to replace the match with. In regex++
|
||||
this search and replace operation is performed with the algorithm
|
||||
regex_merge. For our credit card example we can write two
|
||||
algorithms like this to provide the format conversions:</p>
|
||||
|
||||
<pre>
|
||||
<i>// match any format with the regular expression:
|
||||
</i><b>const</b> boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
|
||||
<b>const</b> std::string machine_format("\\1\\2\\3\\4");
|
||||
<b>const</b> std::string human_format("\\1-\\2-\\3-\\4");
|
||||
|
||||
std::string machine_readable_card_number(<b>const</b> std::string& s)
|
||||
{
|
||||
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, machine_format, boost::match_default | boost::format_sed);
|
||||
}
|
||||
|
||||
std::string human_readable_card_number(<b>const</b> std::string& s)
|
||||
{
|
||||
<b>return</b> <a href="template_class_ref.htm#reg_merge">regex_merge</a>(s, e, human_format, boost::match_default | boost::format_sed);
|
||||
}</pre>
|
||||
|
||||
<p>Here we've used marked sub-expressions in the regular
|
||||
expression to split out the four parts of the card number as
|
||||
separate fields; the format string then uses the sed-like syntax
|
||||
to replace the matched text with the reformatted version.</p>
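<p>The same two conversions can be sketched with the C++11 standard library.
This is only an approximation of the code above: std::regex_replace is assumed
to play the role of regex_merge, ^ and $ replace \A and \z (which the default
std::regex grammar does not accept), and the names card_re, to_machine_format
and to_human_format are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::regex and std::regex_replace stand in for
// boost::regex and regex_merge.
static const std::regex card_re(
    R"(^(\d{3,4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4})$)");

std::string to_machine_format(const std::string& s)
{
    // format_sed selects sed-style \1..\9 references in the format string.
    return std::regex_replace(s, card_re, std::string("\\1\\2\\3\\4"),
                              std::regex_constants::format_sed);
}

std::string to_human_format(const std::string& s)
{
    return std::regex_replace(s, card_re, std::string("\\1-\\2-\\3-\\4"),
                              std::regex_constants::format_sed);
}
```

<p>Text that does not match the expression is returned unchanged, so callers
that need to reject bad input should validate it first.</p>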
|
||||
|
||||
<p>In the examples above, we haven't directly manipulated the
|
||||
results of a regular expression match; in general, however, the
|
||||
result of a match contains a number of sub-expression matches in
|
||||
addition to the overall match. When the library needs to report a
|
||||
regular expression match it does so using an instance of the
|
||||
class <a href="template_class_ref.htm#reg_match">match_results</a>,
|
||||
as before there are typedefs of this class for the most common
|
||||
cases: </p>
|
||||
|
||||
<pre><b>namespace </b>boost{
|
||||
<b>typedef</b> match_results<<b>const</b> <b>char</b>*> cmatch;
|
||||
<b>typedef</b> match_results<<b>const</b> <b>wchar_t</b>*> wcmatch;
|
||||
<strong>typedef</strong> match_results<std::string::const_iterator> smatch;
|
||||
<strong>typedef</strong> match_results<std::wstring::const_iterator> wsmatch;
|
||||
}</pre>
|
||||
|
||||
<p>The algorithms <a href="template_class_ref.htm#reg_search">regex_search</a>
|
||||
and <a href="template_class_ref.htm#reg_grep">regex_grep</a> (i.e.
|
||||
finding all matches in a string) make use of match_results to
|
||||
report what matched.</p>
|
||||
|
||||
<p>Note that these algorithms are not restricted to searching
|
||||
regular C-strings; any bidirectional iterator type can be
|
||||
searched, allowing for the possibility of seamlessly searching
|
||||
almost any kind of data. </p>
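<p>To make the idea of sub-expression matches concrete, here is a minimal
sketch that searches a string and copies out the overall match followed by each
marked sub-expression. It assumes std::smatch and std::regex_search from the
C++11 standard library as stand-ins for the boost:: equivalents; card_fields is
a hypothetical helper.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Assumption: std::smatch and std::regex_search stand in for the
// boost:: equivalents. Returns the whole match followed by each
// marked sub-expression, or an empty vector when nothing matches.
std::vector<std::string> card_fields(const std::string& text)
{
    static const std::regex e(R"((\d{4})[- ](\d{4})[- ](\d{4})[- ](\d{4}))");
    std::smatch what;
    std::vector<std::string> fields;
    if (std::regex_search(text, what, e))
    {
        // what[0] is the overall match; what[1]..what[4] are the groups.
        for (std::size_t i = 0; i < what.size(); ++i)
            fields.push_back(what[i].str());
    }
    return fields;
}
```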
|
||||
|
||||
<p>For search and replace operations in addition to the algorithm
|
||||
<a href="template_class_ref.htm#reg_merge">regex_merge</a> that
|
||||
we have already seen, the algorithm <a
|
||||
href="template_class_ref.htm#reg_format">regex_format</a> takes
|
||||
the result of a match and a format string, and produces a new
|
||||
string by merging the two.</p>
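<p>Merging a match with a format string can be sketched as follows. The
standard library has no function called regex_format, so this example assumes
that std::match_results::format is a reasonable analogue; masked_card is a
hypothetical helper that keeps only the last group of a card number.</p>

```cpp
#include <regex>
#include <string>

// Assumption: std::match_results::format plays the role of regex_format.
// Returns the first card number found in text with all but the last
// four digits masked, or an empty string when there is no match.
std::string masked_card(const std::string& text)
{
    static const std::regex e(R"((\d{4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4}))");
    std::smatch what;
    if (!std::regex_search(text, what, e))
        return std::string();
    // Only the match is formatted; the surrounding text is not copied.
    return what.format("****-****-****-\\4",
                       std::regex_constants::format_sed);
}
```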
|
||||
|
||||
<p>For those that dislike templates, there is a high level
|
||||
wrapper class RegEx that is an encapsulation of the lower level
|
||||
template code - it provides a simplified interface for those that
|
||||
don't need the full power of the library, and supports only
|
||||
narrow characters, and the "extended" regular
|
||||
expression syntax. </p>
|
||||
|
||||
<p>The <a href="posix_ref.htm#posix">POSIX API</a> functions:
|
||||
regcomp, regexec, regfree and regerror, are available in both
|
||||
narrow character and Unicode versions, and are provided for those
|
||||
who need compatibility with these APIs. </p>
|
||||
|
||||
<p>Finally, note that the library now has run-time <a
|
||||
href="appendix.htm#localisation">localization</a> support, and
|
||||
recognizes the full POSIX regular expression syntax - including
|
||||
advanced features like multi-character collating elements and
|
||||
equivalence classes - as well as providing compatibility with
|
||||
other regular expression libraries including GNU and BSD4 regex
|
||||
packages, and to a more limited extent perl 5. </p>
|
||||
|
||||
<h3><a name="Installation"></a><i>Installation and Configuration
|
||||
Options</i> </h3>
|
||||
|
||||
<p><em>[ </em><strong><i>Important</i></strong><em>: If you are
|
||||
upgrading from the 2.x version of this library then you will find
|
||||
a number of changes to the documented header names and library
|
||||
interfaces; existing code should still compile unchanged, however
|
||||
- see </em><a href="appendix.htm#upgrade"><font color="#0000FF"><em>Note
|
||||
for Upgraders</em></font></a><em>. ]</em></p>
|
||||
|
||||
<p>When you extract the library from its zip file, you must
|
||||
preserve its internal directory structure (for example by using
|
||||
the -d option when extracting). If you didn't do that when
|
||||
extracting, then you'd better stop reading this, delete the files
|
||||
you just extracted, and try again! </p>
|
||||
|
||||
<p>This library should not need configuring before use; most
|
||||
popular compilers/standard libraries/platforms are already
|
||||
supported "as is". If you do experience configuration
|
||||
problems, or just want to test the configuration with your
|
||||
compiler, then the process is the same as for all of boost; see
|
||||
the <a href="../config/config.htm">configuration library
|
||||
documentation</a>.</p>
|
||||
|
||||
<p>The library will encase all code inside namespace boost. </p>
|
||||
|
||||
<p>Unlike some other template libraries, this library consists of
|
||||
a mixture of template code (in the headers) and static code and
|
||||
data (in cpp files). Consequently it is necessary to build the
|
||||
library's support code into a library or archive file before you
|
||||
can use it. Instructions for specific platforms are as follows: </p>
|
||||
|
||||
<p><b>Borland C++ Builder:</b> </p>
|
||||
|
||||
<ul>
|
||||
<li>Open up a console window and change to the
|
||||
<boost>\libs\regex\build directory. </li>
|
||||
<li>Select the appropriate makefile (bcb4.mak for C++ Builder
|
||||
4, bcb5.mak for C++ Builder 5, and bcb6.mak for C++
|
||||
Builder 6). </li>
|
||||
<li>Invoke the makefile (pass the full path to your version
|
||||
of make if you have more than one version installed, the
|
||||
makefile relies on the path to make to obtain your C++
|
||||
Builder installation directory and tools) for example: </li>
|
||||
</ul>
|
||||
|
||||
<pre>make -fbcb5.mak</pre>
|
||||
|
||||
<p>The build process will build a variety of .lib and .dll files
|
||||
(the exact number depends upon the version of Borland's tools you
|
||||
are using). The .lib and .dll files will be in a sub-directory
|
||||
called bcb4 or bcb5 depending upon the makefile used. To install
|
||||
the libraries into your development system use:</p>
|
||||
|
||||
<p>make -fbcb5.mak install</p>
|
||||
|
||||
<p>Library files will be copied to <BCROOT>/lib and the
|
||||
dll files to <BCROOT>/bin, where <BCROOT> corresponds to
|
||||
the install path of your Borland C++ tools. </p>
|
||||
|
||||
<p>You may also remove temporary files created during the build
|
||||
process (excluding lib and dll files) by using:</p>
|
||||
|
||||
<p>make -fbcb5.mak clean</p>
|
||||
|
||||
<p>Finally when you use regex++ it is only necessary for you to
|
||||
add the <boost> root directory to your list of include
|
||||
directories for that project. It is not necessary for you to
|
||||
manually add a .lib file to the project; the headers will
|
||||
automatically select the correct .lib file for your build mode
|
||||
and tell the linker to include it. There is one caveat however:
|
||||
the library can not tell the difference between VCL and non-VCL
|
||||
enabled builds when building a GUI application from the command
|
||||
line. If you build from the command line with the 5.5 command
|
||||
line tools then you must define the pre-processor symbol _NO_VCL
|
||||
in order to ensure that the correct link libraries are selected:
|
||||
the C++ Builder IDE normally sets this automatically. Hint: users
|
||||
of the 5.5 command line tools may want to add a -D_NO_VCL to bcc32.cfg
|
||||
in order to set this option permanently. </p>
|
||||
|
||||
<p>If you would prefer to do a static link to the regex libraries
|
||||
even when using the dll runtime then define
|
||||
BOOST_REGEX_STATIC_LINK, and if you want to suppress automatic
|
||||
linking altogether (and supply your own custom build of the lib)
|
||||
then define BOOST_REGEX_NO_LIB.</p>
|
||||
|
||||
<p>If you are building with C++ Builder 6, you will find that
|
||||
<boost/regex.hpp> can not be used in a pre-compiled header
|
||||
(the actual problem is in <locale> which gets included by
|
||||
<boost/regex.hpp>). If this causes problems for you, then
|
||||
try defining BOOST_NO_STD_LOCALE when building; this will disable
|
||||
some features throughout boost, but may save you a lot in compile
|
||||
times!</p>
|
||||
|
||||
<p><b>Microsoft Visual C++ 6</b><strong> and 7</strong></p>
|
||||
|
||||
<p>You need version 6 of MSVC to build this library. If you are
|
||||
using VC5 then you may want to look at one of the previous
|
||||
releases of this <a
|
||||
href="http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm">library</a>
|
||||
</p>
|
||||
|
||||
<p>Open up a command prompt, which has the necessary MSVC
|
||||
environment variables defined (for example by using the batch
|
||||
file Vcvars32.bat installed by the Visual Studio installation),
|
||||
and change to the <boost>\libs\regex\build directory. </p>
|
||||
|
||||
<p>Select the correct makefile - vc6.mak for "vanilla"
|
||||
Visual C++ 6 or vc6-stlport.mak if you are using STLPort.</p>
|
||||
|
||||
<p>Invoke the makefile like this:</p>
|
||||
|
||||
<p>nmake -fvc6.mak</p>
|
||||
|
||||
<p>You will now have a collection of lib and dll files in a
"vc6" subdirectory. To install these into your
|
||||
development system use:</p>
|
||||
|
||||
<p>nmake -fvc6.mak install</p>
|
||||
|
||||
<p>The lib files will be copied to your <VC6>\lib directory
|
||||
and the dll files to <VC6>\bin, where <VC6> is the
|
||||
root of your Visual C++ 6 installation.</p>
|
||||
|
||||
<p>You can delete all the temporary files created during the
|
||||
build (excluding lib and dll files) using:</p>
|
||||
|
||||
<p>nmake -fvc6.mak clean </p>
|
||||
|
||||
<p>Finally when you use regex++ it is only necessary for you to
|
||||
add the <boost> root directory to your list of include
|
||||
directories for that project. It is not necessary for you to
|
||||
manually add a .lib file to the project; the headers will
|
||||
automatically select the correct .lib file for your build mode
|
||||
and tell the linker to include it. </p>
|
||||
|
||||
<p>Note that if you want to statically link to the regex library
|
||||
when using the dynamic C++ runtime, define
|
||||
BOOST_REGEX_STATIC_LINK when building your project (this only has
|
||||
an effect for release builds). If you want to add the source
|
||||
directly to your project then define BOOST_REGEX_NO_LIB to
|
||||
disable automatic library selection.</p>
|
||||
|
||||
<p><strong><i>Important</i></strong><em>: there have been some
|
||||
reports of compiler-optimisation bugs affecting this library (particularly
|
||||
with VC6 versions prior to service pack 5). The workaround is to
|
||||
build the library using /Oityb1 rather than /O2. That is to use
|
||||
all optimisation settings except /Oa. This problem is reported to
|
||||
affect some standard library code as well (in fact I'm not sure
|
||||
if the problem is with the regex code or the underlying standard
|
||||
library), so it's probably worthwhile applying this workaround in
|
||||
normal practice in any case.</em></p>
|
||||
|
||||
<p>Note: if you have replaced the C++ standard library that comes
|
||||
with VC6, then when you build the library you must ensure that
|
||||
the environment variables "INCLUDE" and "LIB"
|
||||
have been updated to reflect the include and library paths for
|
||||
the new library - see vcvars32.bat (part of your Visual Studio
|
||||
installation) for more details. Alternatively if STLPort is in c:/stlport
|
||||
then you could use:</p>
|
||||
|
||||
<p>nmake INCLUDES="-Ic:/stlport/stlport" XLFLAGS="/LIBPATH:c:/stlport/lib"
|
||||
-fvc6-stlport.mak</p>
|
||||
|
||||
<p>If you are building with the full STLPort v4.x, then use the
|
||||
vc6-stlport.mak file provided and set the environment variable
|
||||
STLPORT_PATH to point to the location of your STLport
|
||||
installation (Note that the full STLPort libraries appear not to
|
||||
support single-thread static builds). <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><b>GCC(2.95)</b> </p>
|
||||
|
||||
<p>There is a conservative makefile for the g++ compiler. From
|
||||
the command prompt change to the <boost>/libs/regex/build
|
||||
directory and type: </p>
|
||||
|
||||
<p>make -fgcc.mak </p>
|
||||
|
||||
<p>At the end of the build process you should have a gcc sub-directory
|
||||
containing release and debug versions of the library (libboost_regex.a
|
||||
and libboost_regex_debug.a). When you build projects that use
|
||||
regex++, you will need to add the boost install directory to your
|
||||
list of include paths and add <boost>/libs/regex/build/gcc/libboost_regex.a
|
||||
to your list of library files. </p>
|
||||
|
||||
<p>There is also a makefile to build the library as a shared
|
||||
library:</p>
|
||||
|
||||
<p>make -fgcc-shared.mak</p>
|
||||
|
||||
<p>which will build libboost_regex.so and libboost_regex_debug.so.</p>
|
||||
|
||||
<p>Both of these makefiles support the following environment
|
||||
variables:</p>
|
||||
|
||||
<p>CXXFLAGS: extra compiler options - note that this applies to
|
||||
both the debug and release builds.</p>
|
||||
|
||||
<p>INCLUDES: additional include directories.</p>
|
||||
|
||||
<p>LDFLAGS: additional linker options.</p>
|
||||
|
||||
<p>LIBS: additional library files.</p>
|
||||
|
||||
<p>For the more adventurous there is a configure script in
|
||||
<boost>/libs/config; see the <a href="../config/config.htm">config
|
||||
library documentation</a>.</p>
|
||||
|
||||
<p><b>Sun Workshop 6.1</b></p>
|
||||
|
||||
<p>There is a makefile for the Sun (6.1) compiler (C++ version 3.12).
|
||||
From the command prompt change to the <boost>/libs/regex/build
|
||||
directory and type: </p>
|
||||
|
||||
<p>dmake -f sunpro.mak </p>
|
||||
|
||||
<p>At the end of the build process you should have a sunpro sub-directory
|
||||
containing single and multithread versions of the library (libboost_regex.a,
|
||||
libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so).
|
||||
When you build projects that use regex++, you will need to add
|
||||
the boost install directory to your list of include paths and add
|
||||
<boost>/libs/regex/build/sunpro/ to your library search
|
||||
path. </p>
|
||||
|
||||
<p>This makefile supports the following environment
|
||||
variables:</p>
|
||||
|
||||
<p>CXXFLAGS: extra compiler options - note that this applies to
|
||||
both the single and multithreaded builds.</p>
|
||||
|
||||
<p>INCLUDES: additional include directories.</p>
|
||||
|
||||
<p>LDFLAGS: additional linker options.</p>
|
||||
|
||||
<p>LIBS: additional library files.</p>
|
||||
|
||||
<p>LIBSUFFIX: a suffix to mangle the library name with (defaults
|
||||
to nothing).</p>
|
||||
|
||||
<p>This makefile does not set any architecture specific options
|
||||
like -xarch=v9; you can set these by defining the appropriate
|
||||
macros, for example:</p>
|
||||
|
||||
<p>dmake CXXFLAGS="-xarch=v9" LDFLAGS="-xarch=v9"
|
||||
LIBSUFFIX="_v9" -f sunpro.mak</p>
|
||||
|
||||
<p>will build v9 variants of the regex library named
|
||||
libboost_regex_v9.a etc.</p>
|
||||
|
||||
<p><b>Other compilers:</b> </p>
|
||||
|
||||
<p>There is a generic makefile (<a href="build/generic.mak">generic.mak</a>)
|
||||
provided in <boost-root>/libs/regex/build - see that
|
||||
makefile for details of environment variables that need to be set
|
||||
before use. Alternatively you can use the <a
|
||||
href="../../tools/build/index.html">Jam based build system</a>.
|
||||
If you need to configure the library for your platform, then
|
||||
refer to the <a href="../config/config.htm">config library
|
||||
documentation</a>.</p>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2001 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
314
posix_ref.htm
@ -1,314 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, POSIX API Reference</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, POSIX API
|
||||
Reference. </h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="posix"></a><i>POSIX compatibility library</i></h3>
|
||||
|
||||
<pre>#include <boost/cregex.hpp>
|
||||
<i>or</i>:
|
||||
#include <boost/regex.h></pre>
|
||||
|
||||
<p>The following functions are available for users who need a
|
||||
POSIX compatible C library. They are available in both Unicode
|
||||
and narrow character versions; the standard POSIX API names are
|
||||
macros that expand to one version or the other depending upon
|
||||
whether UNICODE is defined or not. </p>
|
||||
|
||||
<p><b>Important</b>: Note that all the symbols defined here are
|
||||
enclosed inside namespace <i>boost</i> when used in C++ programs,
|
||||
unless you use #include <boost/regex.h> instead - in which
|
||||
case the symbols are still defined in namespace boost, but are
|
||||
made available in the global namespace as well.</p>
|
||||
|
||||
<p>The functions are defined as: </p>
|
||||
|
||||
<pre>extern "C" {
|
||||
<b>int</b> regcompA(regex_tA*, <b>const</b> <b>char</b>*, <b>int</b>);
|
||||
<b>unsigned</b> <b>int</b> regerrorA(<b>int</b>, <b>const</b> regex_tA*, <b>char</b>*, <b>unsigned</b> <b>int</b>);
|
||||
<b>int</b> regexecA(<b>const</b> regex_tA*, <b>const</b> <b>char</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
|
||||
<b>void</b> regfreeA(regex_tA*);
|
||||
|
||||
<b>int</b> regcompW(regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>int</b>);
|
||||
<b>unsigned</b> <b>int</b> regerrorW(<b>int</b>, <b>const</b> regex_tW*, <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>);
|
||||
<b>int</b> regexecW(<b>const</b> regex_tW*, <b>const</b> <b>wchar_t</b>*, <b>unsigned</b> <b>int</b>, regmatch_t*, <b>int</b>);
|
||||
<b>void</b> regfreeW(regex_tW*);
|
||||
|
||||
#ifdef UNICODE
|
||||
#define regcomp regcompW
|
||||
#define regerror regerrorW
|
||||
#define regexec regexecW
|
||||
#define regfree regfreeW
|
||||
#define regex_t regex_tW
|
||||
#else
|
||||
#define regcomp regcompA
|
||||
#define regerror regerrorA
|
||||
#define regexec regexecA
|
||||
#define regfree regfreeA
|
||||
#define regex_t regex_tA
|
||||
#endif
|
||||
}</pre>
|
||||
|
||||
<p>All the functions operate on structure <b>regex_t</b>, which
|
||||
exposes two public members: </p>
|
||||
|
||||
<p><b>unsigned int re_nsub</b> this is filled in by <b>regcomp</b>
|
||||
and indicates the number of sub-expressions contained in the
|
||||
regular expression. </p>
|
||||
|
||||
<p><b>const TCHAR* re_endp</b> points to the end of the
|
||||
expression to compile when the flag REG_PEND is set. </p>
|
||||
|
||||
<p><i>Footnote: regex_t is actually a #define - it is either
|
||||
regex_tA or regex_tW depending upon whether UNICODE is defined or
|
||||
not. TCHAR is either char or wchar_t, again depending upon the
|
||||
macro UNICODE.</i> </p>
|
||||
|
||||
<p><b>regcomp</b> takes a pointer to a <b>regex_t</b>, a pointer
|
||||
to the expression to compile and a flags parameter which can be a
|
||||
combination of: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_EXTENDED</td>
|
||||
<td valign="top" width="45%">Compiles modern regular
|
||||
expressions. Equivalent to regbase::char_classes |
|
||||
regbase::intervals | regbase::bk_refs.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_BASIC</td>
|
||||
<td valign="top" width="45%">Compiles basic (obsolete)
|
||||
regular expression syntax. Equivalent to regbase::char_classes
|
||||
| regbase::intervals | regbase::limited_ops | regbase::bk_braces
|
||||
| regbase::bk_parens | regbase::bk_refs.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NOSPEC</td>
|
||||
<td valign="top" width="45%">All characters are ordinary,
|
||||
the expression is a literal string.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_ICASE</td>
|
||||
<td valign="top" width="45%">Compiles for matching that
|
||||
ignores character case.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NOSUB</td>
|
||||
<td valign="top" width="45%">Has no effect in this
|
||||
library.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NEWLINE</td>
|
||||
<td valign="top" width="45%">When this flag is set a dot
|
||||
does not match the newline character.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_PEND</td>
|
||||
<td valign="top" width="45%">When this flag is set the
|
||||
re_endp parameter of the regex_t structure must point to
|
||||
the end of the regular expression to compile.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NOCOLLATE</td>
|
||||
<td valign="top" width="45%">When this flag is set then
|
||||
locale dependent collation for character ranges is turned
|
||||
off.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_ESCAPE_IN_LISTS</td>
|
||||
<td valign="top" width="45%">When this flag is set, then
|
||||
escape sequences are permitted in bracket expressions (character
|
||||
sets).</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_NEWLINE_ALT </td>
|
||||
<td valign="top" width="45%">When this flag is set then
|
||||
the newline character is equivalent to the alternation
|
||||
operator |.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_PERL </td>
|
||||
<td valign="top" width="45%"> A shortcut for perl-like
|
||||
behavior: REG_EXTENDED | REG_NOCOLLATE |
|
||||
REG_ESCAPE_IN_LISTS</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_AWK</td>
|
||||
<td valign="top" width="45%">A shortcut for awk-like
|
||||
behavior: REG_EXTENDED | REG_ESCAPE_IN_LISTS</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_GREP</td>
|
||||
<td valign="top" width="45%">A shortcut for grep like
|
||||
behavior: REG_BASIC | REG_NEWLINE_ALT</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">REG_EGREP</td>
|
||||
<td valign="top" width="45%"> A shortcut for egrep
|
||||
like behavior: REG_EXTENDED | REG_NEWLINE_ALT</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
</table>
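<p>As a hedged illustration of combining compile flags, the sketch below
performs a case-insensitive search. It assumes the system's POSIX
<regex.h> rather than the Boost header, so only the standard flags are
shown: extensions such as REG_PERL or REG_NEWLINE_ALT are not available there,
and REG_NOSUB, which has no effect in this library, does take effect in the
system implementation. The helper name contains_ignoring_case is
hypothetical.</p>

```cpp
#include <regex.h>  // assumption: the system POSIX header, not Boost's

// Returns true when text contains a match for pattern, ignoring case.
bool contains_ignoring_case(const char* pattern, const char* text)
{
    regex_t re;
    // REG_NOSUB: only a yes/no answer is needed, no match positions.
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return false;
    const bool found = regexec(&re, text, 0, nullptr, 0) == 0;
    regfree(&re);
    return found;
}
```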
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><b>regerror</b> maps an error code to a human readable string. It takes
the following parameters: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="50%">int code</td>
|
||||
<td valign="top" width="50%">The error code.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">const regex_t* e</td>
|
||||
<td valign="top" width="50%">The regular expression (can
|
||||
be null).</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">char* buf</td>
|
||||
<td valign="top" width="50%">The buffer to fill in with
|
||||
the error message.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">unsigned int buf_size</td>
|
||||
<td valign="top" width="50%">The length of buf.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p>If the error code is OR'ed with REG_ITOA then the message that
|
||||
results is the printable name of the code rather than a message,
|
||||
for example "REG_BADPAT". If the code is REG_ATOI then <b>e</b>
|
||||
must not be null and <b>e->re_endp</b> must point to the
|
||||
printable name of an error code, the return value is then the
|
||||
value of the error code. For any other value of <b>code</b>, the
|
||||
return value is the number of characters in the error message; if
|
||||
the return value is greater than or equal to <b>buf_size</b> then
|
||||
<b>regerror</b> will have to be called again with a larger buffer.</p>
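<p>The "call again with a larger buffer" rule suggests a two-step pattern: ask
for the required size first, then fetch the message. The sketch below assumes
the system's POSIX <regex.h> (where the returned size includes the
terminating null) rather than the Boost header; compile_error_message is a
hypothetical helper.</p>

```cpp
#include <regex.h>  // assumption: the system POSIX header, not Boost's
#include <cstddef>
#include <string>
#include <vector>

// Returns the message for a failed compilation of pattern, or an
// empty string when the pattern compiles successfully.
std::string compile_error_message(const char* pattern)
{
    regex_t re;
    const int code = regcomp(&re, pattern, REG_EXTENDED);
    if (code == 0)
    {
        regfree(&re);
        return std::string();
    }
    // First call: a zero-length buffer just reports the size needed.
    const std::size_t needed = regerror(code, &re, nullptr, 0);
    std::vector<char> buf(needed > 0 ? needed : 1, '\0');
    // Second call: the buffer is now large enough for the whole message.
    regerror(code, &re, buf.data(), buf.size());
    return std::string(buf.data());
}
```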
|
||||
|
||||
<p><b>regexec</b> finds the first occurrence of expression <b>e</b>
|
||||
within string <b>buf</b>. If <b>len</b> is non-zero then *<b>m</b>
|
||||
is filled in with what matched the regular expression: <b>m[0]</b>
|
||||
contains what matched the whole expression, <b>m[1]</b> the first sub-expression
|
||||
etc, see <b>regmatch_t</b> in the header file declaration for
|
||||
more details. The <b>eflags</b> parameter can be a combination of:
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="50%">REG_NOTBOL</td>
|
||||
<td valign="top" width="50%">Parameter <b>buf </b>does
|
||||
not represent the start of a line.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">REG_NOTEOL</td>
|
||||
<td valign="top" width="50%">Parameter <b>buf</b> does
|
||||
not terminate at the end of a line.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">REG_STARTEND</td>
|
||||
<td valign="top" width="50%">The string searched starts
|
||||
at buf + pmatch[0].rm_so and ends at buf + pmatch[0].rm_eo.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p>Finally <b>regfree</b> frees all the memory that was allocated
|
||||
by regcomp. </p>
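<p>Putting the four functions together, a complete compile, execute and free
cycle might look like the sketch below. It assumes the system's POSIX
<regex.h> rather than the Boost header, and first_group is a
hypothetical helper that extracts the first marked sub-expression.</p>

```cpp
#include <regex.h>  // assumption: the system POSIX header, not Boost's
#include <string>

// Returns the text matched by the first sub-expression of pattern
// within text, or an empty string if there is no such match.
std::string first_group(const char* pattern, const char* text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return std::string();
    regmatch_t m[2];  // m[0]: overall match, m[1]: first sub-expression
    std::string result;
    // rm_so is -1 for a sub-expression that did not take part in the match.
    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1)
        result.assign(text + m[1].rm_so, text + m[1].rm_eo);
    regfree(&re);  // release whatever regcomp allocated
    return result;
}
```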
|
||||
|
||||
<p><i>Footnote: this is an abridged reference to the POSIX API
|
||||
functions; it is provided for compatibility with other libraries,
|
||||
rather than an API to be used in new code (unless you need access
|
||||
from a language other than C++). This version of these functions
|
||||
should also happily coexist with other versions, as the names
|
||||
used are macros that expand to the actual function names.</i> <br>
|
||||
</p>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
@ -514,9 +514,9 @@ void BOOST_REGEX_CALL c_traits_base::do_update_ctype()
|
||||
if(std::isxdigit(i))
|
||||
class_map[i] |= char_class_xdigit;
|
||||
}
|
||||
- class_map['_'] |= char_class_underscore;
- class_map[' '] |= char_class_blank;
- class_map['\t'] |= char_class_blank;
+ class_map[(unsigned char)'_'] |= char_class_underscore;
+ class_map[(unsigned char)' '] |= char_class_blank;
+ class_map[(unsigned char)'\t'] |= char_class_blank;
|
||||
for(i = 0; i < map_size; ++i)
|
||||
{
|
||||
lower_case_map[i] = (char)std::tolower(i);
|
||||
|
@ -241,7 +241,7 @@ message_data<char>::message_data(const std::locale& l, const std::string& regex_
|
||||
#endif
|
||||
for(std::size_t j = 0; j < s.size(); ++j)
|
||||
{
|
||||
- syntax_map[s[j]] = (unsigned char)(i);
+ syntax_map[(unsigned char)s[j]] = (unsigned char)(i);
|
||||
}
|
||||
}
|
||||
|
||||
|
742
syntax.htm
@ -1,742 +0,0 @@
|
||||
<html>
|
||||
|
||||
<head>
|
||||
<meta http-equiv="Content-Type"
|
||||
content="text/html; charset=iso-8859-1">
|
||||
<meta name="Template"
|
||||
content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
|
||||
<title>Regex++, Regular Expression Syntax</title>
|
||||
</head>
|
||||
|
||||
<body bgcolor="#FFFFFF" link="#0000FF" vlink="#800080">
|
||||
|
||||
<p> </p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td valign="top"><h3><img src="../../c++boost.gif"
|
||||
alt="C++ Boost" width="276" height="86"></h3>
|
||||
</td>
|
||||
<td valign="top"><h3 align="center">Regex++, Regular
|
||||
Expression Syntax.</h3>
|
||||
<p align="left"><i>Copyright (c) 1998-2001 </i></p>
|
||||
<p align="left"><i>Dr John Maddock</i></p>
|
||||
<p align="left"><i>Permission to use, copy, modify,
|
||||
distribute and sell this software and its documentation
|
||||
for any purpose is hereby granted without fee, provided
|
||||
that the above copyright notice appear in all copies and
|
||||
that both that copyright notice and this permission
|
||||
notice appear in supporting documentation. Dr John
|
||||
Maddock makes no representations about the suitability of
|
||||
this software for any purpose. It is provided "as is"
|
||||
without express or implied warranty.</i></p>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<hr>
|
||||
|
||||
<h3><a name="syntax"></a><i>Regular expression syntax</i></h3>
|
||||
|
||||
<p>This section covers the regular expression syntax used by this
|
||||
library. This is a programmer's guide; the actual syntax presented
|
||||
to your program's users will depend upon the flags used during
|
||||
expression compilation. </p>
|
||||
|
||||
<p><i>Literals</i> </p>
|
||||
|
||||
<p>All characters are literals except: ".", "|",
|
||||
"*", "?", "+", "(",
|
||||
")", "{", "}", "[",
|
||||
"]", "^", "$" and "\".
|
||||
These characters are literals when preceded by a "\". A
|
||||
literal is a character that matches itself, or matches the result
|
||||
of traits_type::translate(), where traits_type is the traits
|
||||
template parameter to class reg_expression. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Wildcard</i> </p>
|
||||
|
||||
<p>The dot character "." matches any single character
|
||||
except: when <i>match_not_dot_null</i> is passed to the matching
|
||||
algorithms, the dot does not match a null character; when <i>match_not_dot_newline</i>
|
||||
is passed to the matching algorithms, then the dot does not match
|
||||
a newline character. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Repeats</i> </p>
|
||||
|
||||
<p>A repeat is an expression that is repeated an arbitrary number
|
||||
of times. An expression followed by "*" can be repeated
|
||||
any number of times including zero. An expression followed by
|
||||
"+" can be repeated any number of times, but at least
|
||||
once; if the expression is compiled with the flag regbase::bk_plus_qm
|
||||
then "+" is an ordinary character and "\+"
|
||||
represents a repeat of once or more. An expression followed by
|
||||
"?" may be repeated zero or one times only; if the
|
||||
expression is compiled with the flag regbase::bk_plus_qm then
|
||||
"?" is an ordinary character and "\?"
|
||||
represents the repeat zero or once operator. When it is necessary
|
||||
to specify the minimum and maximum number of repeats explicitly,
|
||||
the bounds operator "{}" may be used, thus "a{2}"
|
||||
is the letter "a" repeated exactly twice, "a{2,4}"
|
||||
represents the letter "a" repeated between 2 and 4
|
||||
times, and "a{2,}" represents the letter "a"
|
||||
repeated at least twice with no upper limit. Note that there must
|
||||
be no white-space inside the {}, and there is no upper limit on
|
||||
the values of the lower and upper bounds. When the expression is
|
||||
compiled with the flag regbase::bk_braces then "{" and
|
||||
"}" are ordinary characters and "\{" and
|
||||
"\}" are used to delimit bounds instead. All repeat
|
||||
expressions refer to the shortest possible previous sub-expression:
|
||||
a single character, a character set, or a sub-expression grouped
|
||||
with "()" for example. </p>
|
||||
|
||||
<p>Examples: </p>
|
||||
|
||||
<p>"ba*" will match all of "b", "ba",
|
||||
"baaa" etc. </p>
|
||||
|
||||
<p>"ba+" will match "ba" or "baaaa"
|
||||
for example but not "b". </p>
|
||||
|
||||
<p>"ba?" will match "b" or "ba". </p>
|
||||
|
||||
<p>"ba{2,4}" will match "baa", "baaa"
|
||||
and "baaaa". </p>
|
||||
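<p>The repeat examples above can be checked with a few lines of C++. This is a minimal sketch, not part of the original documentation: it assumes the standard library's std::regex (ECMAScript grammar), whose repeat operators behave the same way for these patterns, rather than this library's own API. The helper name matches_whole is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Returns true when the whole of `text` matches `pattern`.
// Assumption: std::regex (ECMAScript grammar) stands in for this
// library's matcher; the repeat operators * + ? {m,n} agree for
// the simple patterns shown in the examples.
bool matches_whole(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    return std::regex_match(text, re);
}
```

<p>For instance, matches_whole("ba{2,4}", "baaa") should hold, while matches_whole("ba+", "b") should not, because "+" requires at least one repeat.</p>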
|
||||
<p><i>Non-greedy repeats</i> </p>
|
||||
|
||||
<p>Whenever the "extended" regular expression syntax is
|
||||
in use (the default) then non-greedy repeats are possible by
|
||||
appending a '?' after the repeat; a non-greedy repeat is one
|
||||
which will match the <i>shortest</i> possible string. </p>
|
||||
|
||||
<p>For example, to match HTML tag pairs one could use something
|
||||
like: </p>
|
||||
|
||||
<p>"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
|
||||
</p>
|
||||
|
||||
<p>In this case $1 will contain the text between the tag pairs,
|
||||
and will be the shortest possible matching string. <br>
|
||||
<br>
|
||||
</p>
|
||||
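<p>The difference between a greedy and a non-greedy repeat can be seen by comparing what the first sub-expression captures. The sketch below is an illustration only: it assumes std::regex (ECMAScript grammar) in place of this library's API, uses a simplified tag pattern, and the helper name first_capture is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Returns the text captured by sub-expression 1 of the first match of
// `pattern` in `text`, or an empty string if there is no match.
// Assumption: std::regex (ECMAScript grammar) is used as a stand-in.
std::string first_capture(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m[1].str();
    return std::string();
}
```

<p>With the input "&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;", the non-greedy pattern "&lt;b&gt;(.*?)&lt;/b&gt;" captures only "one", whereas the greedy pattern "&lt;b&gt;(.*)&lt;/b&gt;" captures everything up to the last closing tag.</p>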
|
||||
<p><i>Parentheses</i> </p>
|
||||
|
||||
<p>Parentheses serve two purposes: to group items together into a
|
||||
sub-expression, and to mark what generated the match. For example
|
||||
the expression "(ab)*" would match all of the string
|
||||
"ababab". The matching algorithms <a
|
||||
href="template_class_ref.htm#query_match">regex_match</a> and <a
|
||||
href="template_class_ref.htm#reg_search">regex_search</a> each
|
||||
take an instance of <a href="template_class_ref.htm#reg_match">match_results</a>
|
||||
that reports what caused the match; on exit from these functions
|
||||
the <a href="template_class_ref.htm#reg_match">match_results</a>
|
||||
contains information both on what the whole expression matched
|
||||
and on what each sub-expression matched. In the example above
|
||||
match_results[1] would contain a pair of iterators denoting the
|
||||
final "ab" of the matching string. It is permissible
|
||||
for sub-expressions to match null strings. If a sub-expression
|
||||
takes no part in a match - for example if it is part of an
|
||||
alternative that is not taken - then both of the iterators that
|
||||
are returned for that sub-expression point to the end of the
|
||||
input string, and the <i>matched</i> parameter for that sub-expression
|
||||
is <i>false</i>. Sub-expressions are indexed from left to right
|
||||
starting from 1; sub-expression 0 is the whole expression. </p>
|
||||
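<p>The behaviour of marked sub-expressions described above can be sketched as follows. This is an assumption-laden illustration: it uses std::regex and std::smatch from the standard library rather than this library's match_results, and the GroupInfo struct and group_info function are hypothetical names.</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>

// What one sub-expression contributed to a whole-string match.
struct GroupInfo
{
    bool matched;          // did the sub-expression take part in the match?
    std::string text;      // the text it matched (empty if not matched)
    std::ptrdiff_t offset; // start offset within the input, or -1
};

// Matches `pattern` against the whole of `text` and reports on
// sub-expression `index` (0 is the whole expression).
// Assumption: std::regex (ECMAScript grammar) is used as a stand-in.
GroupInfo group_info(const std::string& pattern, const std::string& text,
                     std::size_t index)
{
    const std::regex re(pattern);
    std::smatch m;
    GroupInfo info{false, std::string(), -1};
    if (std::regex_match(text, m, re) && index < m.size() && m[index].matched)
    {
        info.matched = true;
        info.text = m[index].str();
        info.offset = m.position(index);
    }
    return info;
}
```

<p>For "(ab)*" matched against "ababab", sub-expression 1 reports the final "ab" at offset 4. For "(a)|(b)" matched against "b", sub-expression 1 belongs to the alternative that was not taken, so its matched flag is false.</p>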
|
||||
<p><i>Non-Marking Parentheses</i> </p>
|
||||
|
||||
<p>Sometimes you need to group sub-expressions with parentheses,
|
||||
but don't want the parentheses to generate another marked sub-expression;
|
||||
in this case a non-marking parenthesis (?:expression) can be used.
|
||||
For example the following expression creates no sub-expressions: </p>
|
||||
|
||||
<p>"(?:abc)*"</p>
|
||||
|
||||
<p><em>Forward Lookahead Asserts</em> </p>
|
||||
|
||||
<p>There are two forms of these; one for positive forward
|
||||
lookahead asserts, and one for negative lookahead asserts:</p>
|
||||
|
||||
<p>"(?=abc)" matches zero characters only if they are
|
||||
followed by the expression "abc".</p>
|
||||
|
||||
<p>"(?!abc)" matches zero characters only if they are
|
||||
not followed by the expression "abc".</p>
|
||||
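<p>Because lookahead asserts match zero characters, the asserted text never becomes part of the match. A minimal sketch, assuming std::regex (ECMAScript grammar) as a stand-in for this library and a hypothetical helper named matched_text:</p>

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text`,
// or an empty string when nothing matches.
// Assumption: std::regex (ECMAScript grammar) is used as a stand-in.
std::string matched_text(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m.str();
    return std::string();
}
```

<p>Here "foo(?=bar)" finds "foo" in "foobar" but nothing in "foobaz", and "foo(?!bar)" does the opposite; in both cases the matched text is just "foo".</p>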
|
||||
<p><i>Alternatives</i> </p>
|
||||
|
||||
<p>Alternatives occur when the expression can match either one
|
||||
sub-expression or another; each alternative is separated by a
|
||||
"|", or a "\|" if the flag regbase::bk_vbar
|
||||
is set, or by a newline character if the flag regbase::newline_alt
|
||||
is set. Each alternative is the largest possible previous sub-expression;
|
||||
this is the opposite behaviour from repetition operators. </p>
|
||||
|
||||
<p>Examples: </p>
|
||||
|
||||
<p>"a(b|c)" could match "ab" or "ac".
|
||||
</p>
|
||||
|
||||
<p>"abc|def" could match "abc" or "def".
|
||||
<br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Sets</i> </p>
|
||||
|
||||
<p>A set is a set of characters that can match any single
|
||||
character that is a member of the set. Sets are delimited by
|
||||
"[" and "]" and can contain literals,
|
||||
character ranges, character classes, collating elements and
|
||||
equivalence classes. Set declarations that start with "^"
|
||||
contain the complement of the elements that follow. </p>
|
||||
|
||||
<p>Examples: </p>
|
||||
|
||||
<p>Character literals: </p>
|
||||
|
||||
<p>"[abc]" will match either of "a", "b",
|
||||
or "c". </p>
|
||||
|
||||
<p>"[^abc]" will match any character other than "a",
|
||||
"b", or "c". </p>
|
||||
|
||||
<p>Character ranges: </p>
|
||||
|
||||
<p>"[a-z]" will match any character in the range "a"
|
||||
to "z". </p>
|
||||
|
||||
<p>"[^A-Z]" will match any character other than those
|
||||
in the range "A" to "Z". </p>
|
||||
|
||||
<p>Note that character ranges are highly locale dependent: they
|
||||
match any character that collates between the endpoints of the
|
||||
range; ranges will only behave according to ASCII rules when the
|
||||
default "C" locale is in effect. For example if the
|
||||
library is compiled with the Win32 localization model, then [a-z]
|
||||
will match the ASCII characters a-z, and also 'A', 'B' etc, but
|
||||
not 'Z' which collates just after 'z'. This locale specific
|
||||
behaviour can be disabled by specifying regbase::nocollate when
|
||||
compiling, this is the default behaviour when using regbase::normal,
|
||||
and forces ranges to collate according to ASCII character code.
|
||||
Likewise, if you use the POSIX C API functions then setting
|
||||
REG_NOCOLLATE turns off locale dependent collation. </p>
|
||||
|
||||
<p>Character classes are denoted using the syntax "[:classname:]"
|
||||
within a set declaration, for example "[[:space:]]" is
|
||||
the set of all whitespace characters. Character classes are only
|
||||
available if the flag regbase::char_classes is set. The available
|
||||
character classes are: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="50%">alnum</td>
|
||||
<td valign="top" width="50%">Any alphanumeric character.</td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">alpha</td>
|
||||
<td valign="top" width="50%">Any alphabetical character a-z
|
||||
and A-Z. Other characters may also be included depending
|
||||
upon the locale.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">blank</td>
|
||||
<td valign="top" width="50%">Any blank character, either
|
||||
a space or a tab.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">cntrl</td>
|
||||
<td valign="top" width="50%">Any control character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">digit</td>
|
||||
<td valign="top" width="50%">Any digit 0-9.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">graph</td>
|
||||
<td valign="top" width="50%">Any graphical character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">lower</td>
|
||||
<td valign="top" width="50%">Any lower case character a-z.
|
||||
Other characters may also be included depending upon the
|
||||
locale.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">print</td>
|
||||
<td valign="top" width="50%">Any printable character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">punct</td>
|
||||
<td valign="top" width="50%">Any punctuation character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">space</td>
|
||||
<td valign="top" width="50%">Any whitespace character.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">upper</td>
|
||||
<td valign="top" width="50%">Any upper case character A-Z.
|
||||
Other characters may also be included depending upon the
|
||||
locale.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">xdigit</td>
|
||||
<td valign="top" width="50%">Any hexadecimal digit
|
||||
character, 0-9, a-f and A-F.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">word</td>
|
||||
<td valign="top" width="50%">Any word character - all
|
||||
alphanumeric characters plus the underscore.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="50%">unicode</td>
|
||||
<td valign="top" width="50%">Any character whose code is
|
||||
greater than 255, this applies to the wide character
|
||||
traits classes only.</td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p>There are some shortcuts that can be used in place of the
|
||||
character classes. Provided the flag regbase::escape_in_lists is
|
||||
set, you can use: </p>
|
||||
|
||||
<p>\w in place of [:word:] </p>
|
||||
|
||||
<p>\s in place of [:space:] </p>
|
||||
|
||||
<p>\d in place of [:digit:] </p>
|
||||
|
||||
<p>\l in place of [:lower:] </p>
|
||||
|
||||
<p>\u in place of [:upper:] <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p>Collating elements take the general form [.tagname.] inside a
|
||||
set declaration, where <i>tagname</i> is either a single
|
||||
character, or a name of a collating element, for example [[.a.]]
|
||||
is equivalent to [a], and [[.comma.]] is equivalent to [,]. The
|
||||
library supports all the standard POSIX collating element names,
|
||||
and in addition the following digraphs: "ae", "ch",
|
||||
"ll", "ss", "nj", "dz",
|
||||
"lj", each in lower, upper and title case variations.
|
||||
Multi-character collating elements can result in the set matching
|
||||
more than one character, for example [[.ae.]] would match two
|
||||
characters, but note that [^[.ae.]] would only match one
|
||||
character. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p>Equivalence classes take the general form [=tagname=] inside a
|
||||
set declaration, where <i>tagname</i> is either a single
|
||||
character, or a name of a collating element, and matches any
|
||||
character that is a member of the same primary equivalence class
|
||||
as the collating element [.tagname.]. An equivalence class is a
|
||||
set of characters that collate the same, a primary equivalence
|
||||
class is a set of characters whose primary sort keys are all the
|
||||
same (for example strings are typically collated by character,
|
||||
then by accent, and then by case; the primary sort key then
|
||||
relates to the character, the secondary to the accentation, and
|
||||
the tertiary to the case). If there is no equivalence class
|
||||
corresponding to <i>tagname</i>, then [=tagname=] is exactly the
|
||||
same as [.tagname.]. Unfortunately there is no locale independent
|
||||
method of obtaining the primary sort key for a character, except
|
||||
under Win32. For other operating systems the library will "guess"
|
||||
the primary sort key from the full sort key (obtained from <i>strxfrm</i>),
|
||||
so equivalence classes are probably best considered broken under
|
||||
any operating system other than Win32. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p>To include a literal "-" in a set declaration:
|
||||
make it the first character after the opening "[" or
|
||||
"[^", the endpoint of a range, a collating element, or
|
||||
if the flag regbase::escape_in_lists is set then precede with an
|
||||
escape character as in "[\-]". To include a literal
|
||||
"[" or "]" or "^" in a set then
|
||||
make them the endpoint of a range, a collating element, or
|
||||
precede with an escape character if the flag regbase::escape_in_lists
|
||||
is set. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Line anchors</i> </p>
|
||||
|
||||
<p>An anchor is something that matches the null string at the
|
||||
start or end of a line: "^" matches the null string at
|
||||
the start of a line, "$" matches the null string at the
|
||||
end of a line. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Back references</i> </p>
|
||||
|
||||
<p>A back reference is a reference to a previous sub-expression
|
||||
that has already been matched; the reference is to what the sub-expression
|
||||
matched, not to the expression itself. A back reference consists
|
||||
of the escape character "\" followed by a digit "1"
|
||||
to "9", "\1" refers to the first sub-expression,
|
||||
"\2" to the second etc. For example the expression
|
||||
"(.*)\1" matches any string that is repeated about its
|
||||
mid-point for example "abcabc" or "xyzxyz". A
|
||||
back reference to a sub-expression that did not participate in
|
||||
any match, matches the null string: NB this is different to some
|
||||
other regular expression matchers. Back references are only
|
||||
available if the expression is compiled with the flag regbase::bk_refs
|
||||
set. <br>
|
||||
<br>
|
||||
</p>
|
||||
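<p>The "(.*)\1" example can be tried directly. The sketch below assumes std::regex (ECMAScript grammar), which also supports "\1" style back references, in place of this library's API; the function name is_doubled is hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Returns true when `text` consists of some string repeated twice,
// i.e. when the whole of `text` matches "(.*)\1".
// Assumption: std::regex (ECMAScript grammar) is used as a stand-in.
bool is_doubled(const std::string& text)
{
    // The back reference \1 must match the same characters that
    // sub-expression 1 matched, not merely the same pattern.
    static const std::regex re(R"((.*)\1)");
    return std::regex_match(text, re);
}
```

<p>Both "abcabc" and "xyzxyz" satisfy is_doubled, while "abcabd" does not, because the second half must repeat the first half exactly.</p>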
|
||||
<p><i>Characters by code</i> </p>
|
||||
|
||||
<p>This is an extension to the algorithm that is not available in
|
||||
other libraries; it consists of the escape character followed by
|
||||
the digit "0" followed by the octal character code. For
|
||||
example "\023" represents the character whose octal
|
||||
code is 23. Where ambiguity could occur use parentheses to break
|
||||
the expression up: "\0103" represents the character
|
||||
whose code is 103, "(\010)3" represents the character 10
|
||||
followed by "3". To match characters by their
|
||||
hexadecimal code, use \x followed by a string of hexadecimal
|
||||
digits, optionally enclosed inside {}, for example \xf0 or
|
||||
\x{aff}; note that the latter example is a Unicode character. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Word operators</i> </p>
|
||||
|
||||
<p>The following operators are provided for compatibility with
|
||||
the GNU regular expression library. </p>
|
||||
|
||||
<p>"\w" matches any single character that is a member
|
||||
of the "word" character class; this is identical to the
|
||||
expression "[[:word:]]". </p>
|
||||
|
||||
<p>"\W" matches any single character that is not a
|
||||
member of the "word" character class; this is identical
|
||||
to the expression "[^[:word:]]". </p>
|
||||
|
||||
<p>"\<" matches the null string at the start of a
|
||||
word. </p>
|
||||
|
||||
<p>"\>" matches the null string at the end of a
|
||||
word. </p>
|
||||
|
||||
<p>"\b" matches the null string at either the start or
|
||||
the end of a word. </p>
|
||||
|
||||
<p>"\B" matches a null string within a word. </p>
|
||||
|
||||
<p>The start of the sequence passed to the matching algorithms is
|
||||
considered to be a potential start of a word unless the flag
|
||||
match_not_bow is set. The end of the sequence passed to the
|
||||
matching algorithms is considered to be a potential end of a word
|
||||
unless the flag match_not_eow is set. <br>
|
||||
<br>
|
||||
</p>
|
||||
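<p>Word boundaries are most often used to find whole words only. The following is a sketch under stated assumptions: it uses std::regex (ECMAScript grammar), which provides "\b" but not the GNU "\&lt;" and "\&gt;" operators, and the helper name count_whole_word is hypothetical.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts occurrences of `word` in `text` that are delimited by word
// boundaries on both sides.
// Assumptions: std::regex (ECMAScript grammar) is used as a stand-in,
// and `word` contains only word characters, so it needs no escaping.
std::size_t count_whole_word(const std::string& word, const std::string& text)
{
    const std::regex re("\\b" + word + "\\b");
    const auto first = std::sregex_iterator(text.begin(), text.end(), re);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

<p>In "cat concat cat's category" the word "cat" is counted twice: the standalone "cat" and the one before the apostrophe. The occurrences inside "concat" and "category" are not delimited by word boundaries on both sides.</p>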
|
||||
<p><i>Buffer operators</i> </p>
|
||||
|
||||
<p>The following operators are provided for compatibility with the
|
||||
GNU regular expression library, and Perl regular expressions: </p>
|
||||
|
||||
<p>"\`" matches the start of a buffer. </p>
|
||||
|
||||
<p>"\A" matches the start of a buffer. </p>
|
||||
|
||||
<p>"\'" matches the end of a buffer. </p>
|
||||
|
||||
<p>"\z" matches the end of a buffer. </p>
|
||||
|
||||
<p>"\Z" matches the end of a buffer, or possibly one or
|
||||
more new line characters followed by the end of the buffer. </p>
|
||||
|
||||
<p>A buffer is considered to consist of the whole sequence passed
|
||||
to the matching algorithms, unless the flags match_not_bob or
|
||||
match_not_eob are set. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Escape operator</i> </p>
|
||||
|
||||
<p>The escape character "\" has several meanings. </p>
|
||||
|
||||
<p>Inside a set declaration the escape character is a normal
|
||||
character unless the flag regbase::escape_in_lists is set in
|
||||
which case whatever follows the escape is a literal character
|
||||
regardless of its normal meaning. </p>
|
||||
|
||||
<p>The escape operator may introduce an operator for example:
|
||||
back references, or a word operator. </p>
|
||||
|
||||
<p>The escape operator may make the following character normal,
|
||||
for example "\*" represents a literal "*"
|
||||
rather than the repeat operator. <br>
|
||||
<br>
|
||||
</p>
|
||||
|
||||
<p><i>Single character escape sequences</i> </p>
|
||||
|
||||
<p>The following escape sequences are aliases for single
|
||||
characters: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="33%">Escape sequence </td>
|
||||
<td valign="top" width="33%">Character code </td>
|
||||
<td valign="top" width="33%">Meaning </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\a </td>
|
||||
<td valign="top" width="33%">0x07 </td>
|
||||
<td valign="top" width="33%">Bell character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\f </td>
|
||||
<td valign="top" width="33%">0x0C </td>
|
||||
<td valign="top" width="33%">Form feed. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\n </td>
|
||||
<td valign="top" width="33%">0x0A </td>
|
||||
<td valign="top" width="33%">Newline character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\r </td>
|
||||
<td valign="top" width="33%">0x0D </td>
|
||||
<td valign="top" width="33%">Carriage return. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\t </td>
|
||||
<td valign="top" width="33%">0x09 </td>
|
||||
<td valign="top" width="33%">Tab character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\v </td>
|
||||
<td valign="top" width="33%">0x0B </td>
|
||||
<td valign="top" width="33%">Vertical tab. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\e </td>
|
||||
<td valign="top" width="33%">0x1B </td>
|
||||
<td valign="top" width="33%">ASCII Escape character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\0dd </td>
|
||||
<td valign="top" width="33%">0dd </td>
|
||||
<td valign="top" width="33%">An octal character code,
|
||||
where <i>dd</i> is one or more octal digits. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\xXX </td>
|
||||
<td valign="top" width="33%">0xXX </td>
|
||||
<td valign="top" width="33%">A hexadecimal character
|
||||
code, where XX is one or more hexadecimal digits. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\x{XX} </td>
|
||||
<td valign="top" width="33%">0xXX </td>
|
||||
<td valign="top" width="33%">A hexadecimal character
|
||||
code, where XX is one or more hexadecimal digits,
|
||||
optionally a Unicode character. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td> </td>
|
||||
<td valign="top" width="33%">\cZ </td>
|
||||
<td valign="top" width="33%">Z-@ </td>
|
||||
<td valign="top" width="33%">An ASCII escape sequence
|
||||
control-Z, where Z is any ASCII character greater than or
|
||||
equal to the character code for '@'. </td>
|
||||
<td> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><i>Miscellaneous escape sequences:</i> </p>
|
||||
|
||||
<p>The following are provided mostly for Perl compatibility, but
|
||||
note that there are some differences in the meanings of \l \L \u
|
||||
and \U: <br>
|
||||
</p>
|
||||
|
||||
<table border="0" cellpadding="6" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\w </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:word:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\W </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:word:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\s </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:space:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\S </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:space:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\d </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:digit:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\D </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:digit:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\l </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:lower:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\L </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:lower:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\u </td>
|
||||
<td valign="top" width="45%">Equivalent to [[:upper:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\U </td>
|
||||
<td valign="top" width="45%">Equivalent to [^[:upper:]]. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\C </td>
|
||||
<td valign="top" width="45%">Any single character,
|
||||
equivalent to '.'. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\X </td>
|
||||
<td valign="top" width="45%">Match any Unicode combining
|
||||
character sequence, for example "a\x 0301" (a
|
||||
letter a with an acute). </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\Q </td>
|
||||
<td valign="top" width="45%">The begin quote operator,
|
||||
everything that follows is treated as a literal character
|
||||
until a \E end quote operator is found. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td width="5%"> </td>
|
||||
<td valign="top" width="45%">\E </td>
|
||||
<td valign="top" width="45%">The end quote operator,
|
||||
terminates a sequence begun with \Q. </td>
|
||||
<td width="5%"> </td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
<p><br>
|
||||
</p>
|
||||
|
||||
<p><i>What gets matched?</i> </p>
|
||||
|
||||
<p>The regular expression library will match the first possible
|
||||
matching string; if more than one string starting at a given
|
||||
location can match then it matches the longest possible string,
|
||||
unless the flag match_any is set, in which case the first match
|
||||
encountered is returned. Use of the match_any option can reduce
|
||||
the time taken to find the match - but is only useful if the user
|
||||
is less concerned about what matched - for example it would not
|
||||
be suitable for search and replace operations. In cases where
|
||||
there are multiple possible matches all starting at the same
|
||||
location, and all of the same length, then the match chosen is
|
||||
the one with the longest first sub-expression; if that is the
|
||||
same for two or more matches, then the second sub-expression will
|
||||
be examined and so on. <br>
|
||||
</p>
|
||||
|
||||
<hr>
|
||||
|
||||
<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
|
||||
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>
|
||||
</body>
|
||||
</html>
|
File diff suppressed because it is too large
Load Diff
1016
traits_class_ref.htm
1016
traits_class_ref.htm
File diff suppressed because it is too large
Load Diff