<!DOCTYPE HTML PUBLIC "-//w3c//dtd html 4.0 transitional//en">

<HTML>

<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Template"
CONTENT="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<META NAME="GENERATOR" CONTENT="Mozilla/4.5 [en] (Win98; I) [Netscape]">
<TITLE>Regex++, Appendices</TITLE>
</HEAD>

<BODY BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#800080">
<TABLE BORDER="0" CELLSPACING="0" CELLPADDING="7" WIDTH="100%">
<TR>
<TD VALIGN="TOP" WIDTH="50%"> <H3>
<IMG SRC="../../c++boost.gif" HEIGHT="86" WIDTH="276" ALT="C++ Boost"></H3>
</TD>
<TD VALIGN="TOP" WIDTH="50%"> <CENTER>
<H3> Regex++, Appendices.</H3>
</CENTER>
<CENTER>
<I>(version 3.03, 18 April 2000)</I>
</CENTER>
<PRE><I>Copyright (c) 1998-2000
Dr John Maddock

Permission to use, copy, modify, distribute and sell this software
and its documentation for any purpose is hereby granted without fee,
provided that the above copyright notice appear in all copies and
that both that copyright notice and this permission notice appear
in supporting documentation. Dr John Maddock makes no representations
about the suitability of this software for any purpose.
It is provided "as is" without express or implied warranty.</i></pre>
</td>
</tr>
</table>

<hr>

<h3><a name="implementation"></a>Appendix 1: Implementation notes</h3>

<p>This is the first port of regex++ to the Boost library, and is
based on regex++ 2.x; see changes.txt for a full list of changes
from the previous version. There are no known functionality bugs,
except that POSIX-style equivalence classes are only guaranteed to be
correct if the Win32 localization model is used (the default for
Win32 builds of the library). </p>

<p>There are some aspects of the code that C++ puritans will
consider to be poor style, in particular the use of goto in some
of the algorithms. The code could be cleaned up by changing to a
recursive implementation, although it would probably be slower as
a result. </p>

<p>The performance of the algorithms should be satisfactory in
most cases. For example, the times taken to match the FTP response
expression "^([0-9]+)(\-| |$)(.*)$" against the string
"100- this is a line of ftp response which contains a
message string" are: BSD implementation 450 microseconds,
GNU implementation 271 microseconds, regex++ 127 microseconds (Pentium
P90, Win32 console app under MS Windows 95). </p>
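<p>The sketch below shows what the three sub-expressions of the timing expression capture. It uses <tt>std::regex</tt> from the C++11 standard library as a stand-in, not the regex++ interface, and <tt>FtpReply</tt> and <tt>parse_ftp_reply</tt> are hypothetical names chosen for illustration. Since a hyphen outside a bracket expression needs no escape, "\-" is written as plain "-" here.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper type: the three fields captured by the FTP response
// expression used in the timing test.
struct FtpReply {
    std::string code;       // sub-expression 1: the numeric reply code
    std::string separator;  // sub-expression 2: "-", " " or "" (end of line)
    std::string text;       // sub-expression 3: the rest of the line
};

// Returns true and fills `out` if the whole of `line` matches the expression.
// std::regex (ECMAScript grammar) stands in for regex++ in this sketch.
bool parse_ftp_reply(const std::string& line, FtpReply& out) {
    // Compiled once on first use, then only read by later calls.
    static const std::regex ftp_response("^([0-9]+)(-| |$)(.*)$");
    std::smatch what;
    if (!std::regex_match(line, what, ftp_response))
        return false;
    out.code = what[1].str();
    out.separator = what[2].str();
    out.text = what[3].str();
    return true;
}
```

<p>For the sample string above, <tt>code</tt> is "100", <tt>separator</tt> is "-" and <tt>text</tt> holds the rest of the line, starting with its leading space.</p>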

<p>However, it should be noted that there are some "pathological"
expressions which may require exponential time for matching;
these all involve nested repetition operators. For example,
attempting to match the expression "(a*a)*b" against <i>N</i>
letter a's requires time proportional to <i>2</i><sup><i>N</i></sup>.
These expressions can (almost) always be rewritten in such a way
as to avoid the problem; for example, "(a*a)*b" could be
rewritten as "a*b", which requires only time linearly
proportional to <i>N</i> to solve. In the general case, non-nested
repeat expressions require time proportional to <i>N</i><sup><i>2</i></sup>;
however, if the clauses are mutually exclusive then they can be
matched in linear time. This is the case with "a*b": for each
character the matcher will either match an "a", match a "b", or fail,
whereas with "a*a" the matcher can't tell which branch to take
(the first "a" or the second) and so has to try both. <i>Be careful how you
write your regular expressions and avoid nested repeats if you
can! In this version some previously pathological cases have
been fixed: in particular, searching for expressions which
contain leading repeats and/or leading literal strings should be
much faster than before. Literal strings are now searched for
using the Knuth-Morris-Pratt algorithm (this is used in
preference to the Boyer-Moore algorithm because it allows the
tracking of newline characters).</i> </p>
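<p>The reason KMP suits newline tracking can be seen in a minimal textbook version of the algorithm. This is a sketch, not the library's own search code; <tt>Hit</tt> and <tt>kmp_find_all</tt> are hypothetical names. The text index advances by exactly one character per step and never moves backwards or skips ahead, so a line counter can be updated as each character is consumed. Boyer-Moore gets its speed from jumping over characters it never examines, which would make that bookkeeping harder.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One occurrence of the literal: offset of its first character and the
// 1-based line on which it starts.
struct Hit {
    std::size_t offset;
    std::size_t line;
};

// Knuth-Morris-Pratt search for every (possibly overlapping) occurrence of a
// non-empty literal `pat` in `text`. The text index only moves forward, one
// character per step, so the line count is kept up to date during the scan.
std::vector<Hit> kmp_find_all(const std::string& text, const std::string& pat) {
    std::vector<Hit> hits;
    const std::size_t m = pat.size();
    if (m == 0) return hits;

    // fail[k]: length of the longest proper prefix of pat[0..k] that is
    // also a suffix of pat[0..k].
    std::vector<std::size_t> fail(m, 0);
    for (std::size_t k = 1, len = 0; k < m; ++k) {
        while (len > 0 && pat[k] != pat[len]) len = fail[len - 1];
        if (pat[k] == pat[len]) ++len;
        fail[k] = len;
    }

    // Newlines inside the pattern (before its last character), used to map
    // the line of a match's last character back to the line it starts on.
    std::size_t pat_newlines = 0;
    for (std::size_t k = 0; k + 1 < m; ++k)
        if (pat[k] == '\n') ++pat_newlines;

    std::size_t line = 1;  // line containing text[i]
    for (std::size_t i = 0, q = 0; i < text.size(); ++i) {
        while (q > 0 && text[i] != pat[q]) q = fail[q - 1];
        if (text[i] == pat[q]) ++q;
        if (q == m) {                     // pat ends at text[i]
            hits.push_back({i + 1 - m, line - pat_newlines});
            q = fail[q - 1];              // keep going, allowing overlaps
        }
        if (text[i] == '\n') ++line;      // the next character starts a new line
    }
    return hits;
}
```

<p>Searching "abc\nxabcab\nab" for "ab" reports offsets 0, 5, 8 and 11 on lines 1, 2, 2 and 3.</p>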

<p><i>Some aspects of the POSIX regular expression syntax are
implementation defined:</i> </p>

<ul>
<li>The "leftmost-longest" rule for determining
what matches is ambiguous; this library takes the "obvious"
interpretation: find the leftmost match, then maximize
the length of each sub-expression in turn, with lower-indexed
sub-expressions taking priority over higher-indexed
ones.</li>
<li>The behavior of multi-character collating elements is
ambiguous in the standard; in particular, expressions such
as [a[.ae.]] may have subtle inconsistencies lurking in
them. This implementation matches bracket expressions as
follows: all bracket expressions match a single character
only, unless the expression contains a multi-character
collating element, either on its own or as the endpoint
of a range, in which case the expression may match more
than one character.</li>
<li>Repeated null expressions are repeated only once; they
are treated "as if" they were matched the
maximum number of times allowed by the expression.</li>
<li>The behavior of back references is ambiguous in the
standard; in particular, it is unclear whether expressions
of the form "((ab*)\2)+" should be allowed.
This implementation allows such expressions, and the back
reference matches whatever the last sub-expression match
was. This means that at the end of the match, the back
references may have matched strings different from the
final value of the sub-expression to which they refer.</li>
</ul>

<hr>

<h3><a name="threads"></a>Appendix 2: Thread safety</h3>

<p>Class reg_expression&lt;&gt; and its typedefs regex and wregex
are thread safe, in that compiled regular expressions can safely
be shared between threads. The matching algorithms regex_match,
regex_search, regex_grep, regex_format and regex_merge are all re-entrant
and thread safe. Class match_results is now thread safe, in that
the results of a match can be safely copied from one thread to
another (for example, one thread may find matches and push match_results
instances onto a queue, while another thread pops them off the
other end); otherwise, use a separate instance of match_results
per thread. </p>
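<p>A sketch of the "share the compiled expression, keep match results per thread" pattern, written with C++11 <tt>std::regex</tt> and <tt>std::thread</tt> rather than regex++ (the function name <tt>search_in_parallel</tt> is hypothetical). The standard library guarantees that concurrent calls which only read the same object, such as several searches using one <tt>const std::regex</tt>, do not race. Matched text is handed between threads as plain strings, because a standard match_results object refers into the string that was searched.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <regex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Finds every match of `re` in each string of `inputs`, one thread per input.
// The compiled expression is shared read-only between the threads; each
// thread's std::sregex_iterator owns its own match_results object.
std::vector<std::string> search_in_parallel(const std::regex& re,
                                            const std::vector<std::string>& inputs) {
    std::vector<std::string> found;
    std::mutex found_mutex;  // guards `found`, which every thread appends to
    std::vector<std::thread> workers;
    for (const std::string& input : inputs) {
        workers.emplace_back([&re, &input, &found, &found_mutex] {
            const std::sregex_iterator end;
            for (std::sregex_iterator it(input.begin(), input.end(), re); it != end; ++it) {
                std::string text = it->str();  // copy out of the per-thread results
                std::lock_guard<std::mutex> lock(found_mutex);
                found.push_back(std::move(text));
            }
        });
    }
    for (std::thread& t : workers) t.join();
    // Sort so the result does not depend on thread scheduling.
    std::sort(found.begin(), found.end());
    return found;
}
```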

<p>The POSIX API functions are all re-entrant and thread safe, and
regular expressions compiled with <i>regcomp</i> can also be
shared between threads. </p>

<p>The class RegEx is only thread safe if each thread gets its
own RegEx instance (apartment threading); this is a consequence
of RegEx handling both the compiling and the matching of regular
expressions. </p>

<p>Finally, note that changing the global locale invalidates all
compiled regular expressions; therefore calling <i>setlocale</i>
from one thread while another uses regular expressions <i>will</i>
produce unpredictable results. </p>

<p>There is also a requirement that only one thread is
executing prior to the start of main(). </p>

<hr>

<h3><a name="localisation"></a>Appendix 3: Localization</h3>

<p>Regex++ provides extensive support for run-time
localization. The localization model used can be split into two
parts: front-end and back-end. </p>

<p>Front-end localization deals with everything which the user
sees: error messages, and the regular expression syntax itself.
For example, a French application could change [[:word:]] to [[:mot:]]
and \w to \m. Modifying the front-end locale requires active
support from the developer, who must provide the library with a
message catalogue to load, containing the localized strings.
The front-end locale is affected by the LC_MESSAGES category only. </p>

<p>Back-end localization deals with everything that occurs after
the expression has been parsed; in other words, everything that
the user does not see or interact with directly. It deals with
case conversion, collation, and character class membership. The
back-end locale does not require any intervention from the
developer: the library will acquire all the information it
requires for the current locale from the underlying operating
system or run time library. This means that if the program user
does not interact with regular expressions directly (for example,
if the expressions are embedded in your C++ code) then no
explicit localization is required, as the library will take care
of everything for you. For example, embedding the expression [[:word:]]+
in your code will always match a whole word: if the program is
run on a machine with, say, a Greek locale, then it will
still match a whole word, but in Greek characters rather than
Latin ones. The back-end locale is affected by the LC_CTYPE and LC_COLLATE
categories. </p>

<p>There are three separate localization mechanisms supported by
regex++: </p>

<p><i>Win32 localization model.</i> </p>

<p>This is the default model when the library is compiled under
Win32, and is encapsulated by the traits class <a
href="template_class_ref.htm#regex_char_traits">w32_regex_traits</a>.
When this model is in effect there is a single global locale, as
defined by the user's control panel settings and returned by
GetUserDefaultLCID. All the settings used by regex++ are acquired
directly from the operating system, bypassing the C run time
library. Front-end localization requires a resource dll
containing a string table with the user-defined strings. The
traits class exports the function: </p>

<p>static std::string set_message_catalogue(const std::string&amp;
s); </p>

<p>which needs to be called with a string identifying the name of
the resource dll, <i>before</i> your code compiles any regular
expressions (but not necessarily before you construct any <i>reg_expression</i>
instances): </p>

<p>boost::w32_regex_traits&lt;char&gt;::set_message_catalogue("mydll.dll");
</p>

<p>Note that this API sets the dll name for <i>both</i> the
narrow and wide character specializations of w32_regex_traits. </p>

<p>This model does not currently support thread-specific locales
(via SetThreadLocale under Windows NT). The library provides full
Unicode support under NT; under Windows 9x the library degrades
gracefully: characters 0 to 255 are supported, and the remainder are
treated as "unknown" graphic characters. </p>

<p><i>C localization model.</i> </p>

<p>This is the default model when the library is compiled under
an operating system other than Win32, and is encapsulated by the
traits class <a href="template_class_ref.htm#regex_char_traits"><i>c_regex_traits</i></a>.
Win32 users can force this model to take effect by defining the
pre-processor symbol BOOST_RE_LOCALE_C. When this model is in
effect there is a single global locale, as set by <i>setlocale</i>.
All settings are acquired from your run time library;
consequently Unicode support is dependent upon your run time
library implementation. Front-end localization requires a POSIX
message catalogue. The traits class exports the function: </p>

<p>static std::string set_message_catalogue(const std::string&amp;
s); </p>

<p>which needs to be called with a string identifying the name of
the message catalogue, <i>before</i> your code compiles any
regular expressions (but not necessarily before you construct any
<i>reg_expression</i> instances): </p>

<p>boost::c_regex_traits&lt;char&gt;::set_message_catalogue("mycatalogue");
</p>

<p>Note that this API sets the message catalogue name for <i>both</i> the
narrow and wide character specializations of c_regex_traits. If
your run time library does not support POSIX message catalogues,
then you can either provide your own implementation of &lt;nl_types.h&gt;
or define BOOST_RE_NO_CAT to disable front-end localization via
message catalogues. </p>

<p>Note that calling <i>setlocale</i> invalidates all compiled
regular expressions. Calling <tt>setlocale(LC_ALL, "C")</tt>
will make this library behave equivalently to most traditional
regular expression libraries, including version 1 of this library.
</p>

<p><i>C++ localization model.</i>
</p>

<p>This model is only in effect if the library is built with the
pre-processor symbol BOOST_RE_LOCALE_CPP defined. When this model
is in effect, each instance of reg_expression&lt;&gt; has its own
instance of std::locale, and class reg_expression&lt;&gt; also has a
member function <i>imbue</i> which allows the locale for the
expression to be set on a per-instance basis. Front-end
localization requires a POSIX message catalogue, which will be
loaded via the std::messages facet of the expression's locale.
The traits class exports the function: </p>

<p>static std::string set_message_catalogue(const std::string&amp;
s); </p>

<p>which needs to be called with a string identifying the name of
the message catalogue, <i>before</i> your code compiles any
regular expressions (but not necessarily before you construct any
<i>reg_expression</i> instances): </p>

<p>boost::cpp_regex_traits&lt;char&gt;::set_message_catalogue("mycatalogue");
</p>

<p>Note that calling reg_expression&lt;&gt;::imbue will
invalidate any expression currently compiled in that instance of
reg_expression&lt;&gt;. This model is the one which most closely fits
the ethos of the C++ standard library; however, it is also the model
which will produce the slowest code, and the one which is least well
supported by current standard library implementations. For
example, I have yet to find an implementation of std::locale which
supports either message catalogues or locales other than "C"
or "POSIX". </p>
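<p>The C++ standard library's own regular expression class has a similar per-instance locale: <tt>std::basic_regex</tt> has an <tt>imbue</tt> member, and after calling it the object matches nothing until a new expression is assigned, which parallels the invalidation described above. A minimal sketch using <tt>std::regex</tt> (not regex++; <tt>make_regex_for_locale</tt> is a hypothetical helper):</p>

```cpp
#include <cassert>
#include <locale>
#include <regex>
#include <string>

// Builds an expression whose character classes, case folding and collation
// follow `loc`. With std::basic_regex, imbue() discards any expression already
// compiled into the object, so the locale is set first and the pattern is
// compiled afterwards.
std::regex make_regex_for_locale(const std::locale& loc, const std::string& pattern) {
    std::regex re;
    re.imbue(loc);       // per-instance locale, analogous to reg_expression<>::imbue
    re.assign(pattern);  // compile under the imbued locale
    return re;
}
```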

<p>Finally, note that if you build the library with a non-default
localization model, then the appropriate pre-processor symbol (BOOST_RE_LOCALE_C
or BOOST_RE_LOCALE_CPP) must be defined both when you build the
support library and when you include &lt;boost/regex.hpp&gt; or
&lt;boost/cregex.hpp&gt; in your code. The best way to ensure
this is to add the #define to &lt;boost/re_detail/jm_opt.h&gt;. </p>

<p><i>Providing a message catalogue:</i> </p>

<p>In order to localize the front end of the library, you need to
provide the library with the appropriate message strings,
contained either in a resource dll's string table (Win32 model)
or in a POSIX message catalogue (C or C++ models). In the latter
case the messages must appear in message set zero of the
catalogue. The messages and their IDs are as follows: </p>

<table border="0" cellpadding="6" cellspacing="0" width="100%">
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">Message ID </td>
<td valign="top" width="32%">Meaning </td>
<td valign="top" width="29%">Default value </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">101 </td>
<td valign="top" width="32%">The character used to start a sub-expression. </td>
<td valign="top" width="29%">"(" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">102 </td>
<td valign="top" width="32%">The character used to end a sub-expression declaration. </td>
<td valign="top" width="29%">")" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">103 </td>
<td valign="top" width="32%">The character used to denote an end of line assertion. </td>
<td valign="top" width="29%">"$" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">104 </td>
<td valign="top" width="32%">The character used to denote a start of line assertion. </td>
<td valign="top" width="29%">"^" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">105 </td>
<td valign="top" width="32%">The character used to denote the "match any character" expression. </td>
<td valign="top" width="29%">"." </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">106 </td>
<td valign="top" width="32%">The match zero or more times repetition operator. </td>
<td valign="top" width="29%">"*" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">107 </td>
<td valign="top" width="32%">The match one or more times repetition operator. </td>
<td valign="top" width="29%">"+" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">108 </td>
<td valign="top" width="32%">The match zero or one times repetition operator. </td>
<td valign="top" width="29%">"?" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">109 </td>
<td valign="top" width="32%">The character set opening character. </td>
<td valign="top" width="29%">"[" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">110 </td>
<td valign="top" width="32%">The character set closing character. </td>
<td valign="top" width="29%">"]" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">111 </td>
<td valign="top" width="32%">The alternation operator. </td>
<td valign="top" width="29%">"|" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">112 </td>
<td valign="top" width="32%">The escape character. </td>
<td valign="top" width="29%">"\\" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">113 </td>
<td valign="top" width="32%">The hash character (not currently used). </td>
<td valign="top" width="29%">"#" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">114 </td>
<td valign="top" width="32%">The range operator. </td>
<td valign="top" width="29%">"-" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">115 </td>
<td valign="top" width="32%">The repetition operator opening character. </td>
<td valign="top" width="29%">"{" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">116 </td>
<td valign="top" width="32%">The repetition operator closing character. </td>
<td valign="top" width="29%">"}" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">117 </td>
<td valign="top" width="32%">The digit characters. </td>
<td valign="top" width="29%">"0123456789" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">118 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the word boundary assertion. </td>
<td valign="top" width="29%">"b" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">119 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the non-word boundary assertion. </td>
<td valign="top" width="29%">"B" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">120 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the word-start boundary assertion. </td>
<td valign="top" width="29%">"&lt;" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">121 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the word-end boundary assertion. </td>
<td valign="top" width="29%">"&gt;" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">122 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any word character. </td>
<td valign="top" width="29%">"w" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">123 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents a non-word character. </td>
<td valign="top" width="29%">"W" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">124 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents a start of buffer assertion. </td>
<td valign="top" width="29%">"`A" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">125 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents an end of buffer assertion. </td>
<td valign="top" width="29%">"'z" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">126 </td>
<td valign="top" width="32%">The newline character. </td>
<td valign="top" width="29%">"\n" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">127 </td>
<td valign="top" width="32%">The comma separator. </td>
<td valign="top" width="29%">"," </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">128 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the bell character. </td>
<td valign="top" width="29%">"a" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">129 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the form feed character. </td>
<td valign="top" width="29%">"f" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">130 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the newline character. </td>
<td valign="top" width="29%">"n" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">131 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the carriage return character. </td>
<td valign="top" width="29%">"r" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">132 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the tab character. </td>
<td valign="top" width="29%">"t" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">133 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the vertical tab character. </td>
<td valign="top" width="29%">"v" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">134 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the start of a hexadecimal character constant. </td>
<td valign="top" width="29%">"x" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">135 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the start of an ASCII escape character. </td>
<td valign="top" width="29%">"c" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">136 </td>
<td valign="top" width="32%">The colon character. </td>
<td valign="top" width="29%">":" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">137 </td>
<td valign="top" width="32%">The equals character. </td>
<td valign="top" width="29%">"=" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">138 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the ASCII escape character. </td>
<td valign="top" width="29%">"e" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">139 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any lower case character. </td>
<td valign="top" width="29%">"l" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">140 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any non-lower case character. </td>
<td valign="top" width="29%">"L" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">141 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any upper case character. </td>
<td valign="top" width="29%">"u" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">142 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any non-upper case character. </td>
<td valign="top" width="29%">"U" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">143 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any space character. </td>
<td valign="top" width="29%">"s" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">144 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any non-space character. </td>
<td valign="top" width="29%">"S" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">145 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any digit character. </td>
<td valign="top" width="29%">"d" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">146 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any non-digit character. </td>
<td valign="top" width="29%">"D" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">147 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the end quote operator. </td>
<td valign="top" width="29%">"E" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">148 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the start quote operator. </td>
<td valign="top" width="29%">"Q" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">149 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents a Unicode combining character sequence. </td>
<td valign="top" width="29%">"X" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">150 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents any single character. </td>
<td valign="top" width="29%">"C" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">151 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the end of buffer operator. </td>
<td valign="top" width="29%">"Z" </td>
<td valign="top" width="9%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="21%">152 </td>
<td valign="top" width="32%">The character which, when preceded by an escape character, represents the continuation assertion. </td>
<td valign="top" width="29%">"G" </td>
<td valign="top" width="9%"> </td>
</tr>
</table>
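<p>An application that ships its own catalogue can check which strings will be supplied by reading them back with the POSIX <tt>catgets</tt> function, passing the default values from the table as fallbacks. The sketch below is an illustration under that assumption, not the library's own loader; <tt>syntax_string</tt> is a hypothetical helper.</p>

```cpp
#include <cassert>
#include <nl_types.h>  // POSIX message catalogues: catopen, catgets, catclose
#include <string>

// Returns the string with message ID `id` (see the table above) from message
// set zero of `cat`, or `default_value` if the catalogue failed to open or
// has no such message.
std::string syntax_string(nl_catd cat, int id, const char* default_value) {
    if (cat == (nl_catd)-1)  // catopen() reports failure with this value
        return default_value;
    return catgets(cat, 0, id, default_value);
}
```

<p>With a catalogue that could not be opened, <tt>syntax_string(cat, 101, "(")</tt> simply returns "(", so the defaults from the table are used.</p>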

<p>Custom error messages are loaded as follows: </p>

<table border="0" cellpadding="7" cellspacing="0" width="100%">
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">Message ID </td>
|
||
<td valign="top" width="32%">Error message ID </td>
|
||
<td valign="top" width="31%">Default string </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">201 </td>
|
||
<td valign="top" width="32%">REG_NOMATCH </td>
|
||
<td valign="top" width="31%">"No match" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">202 </td>
|
||
<td valign="top" width="32%">REG_BADPAT </td>
|
||
<td valign="top" width="31%">"Invalid regular
|
||
expression" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">203 </td>
|
||
<td valign="top" width="32%">REG_ECOLLATE </td>
|
||
<td valign="top" width="31%">"Invalid collation
|
||
character" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">204 </td>
|
||
<td valign="top" width="32%">REG_ECTYPE </td>
|
||
<td valign="top" width="31%">"Invalid character
|
||
class name" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">205 </td>
|
||
<td valign="top" width="32%">REG_EESCAPE </td>
|
||
<td valign="top" width="31%">"Trailing backslash"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">206 </td>
|
||
<td valign="top" width="32%">REG_ESUBREG </td>
|
||
<td valign="top" width="31%">"Invalid back reference"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">207 </td>
|
||
<td valign="top" width="32%">REG_EBRACK </td>
|
||
<td valign="top" width="31%">"Unmatched [ or [^"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">208 </td>
|
||
<td valign="top" width="32%">REG_EPAREN </td>
|
||
<td valign="top" width="31%">"Unmatched ( or \\("
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">209 </td>
|
||
<td valign="top" width="32%">REG_EBRACE </td>
|
||
<td valign="top" width="31%">"Unmatched \\{" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">210 </td>
|
||
<td valign="top" width="32%">REG_BADBR </td>
|
||
<td valign="top" width="31%">"Invalid content of
|
||
\\{\\}" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">211 </td>
|
||
<td valign="top" width="32%">REG_ERANGE </td>
|
||
<td valign="top" width="31%">"Invalid range end"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">212 </td>
|
||
<td valign="top" width="32%">REG_ESPACE </td>
|
||
<td valign="top" width="31%">"Memory exhausted"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">213 </td>
|
||
<td valign="top" width="32%">REG_BADRPT </td>
|
||
<td valign="top" width="31%">"Invalid preceding
|
||
regular expression" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">214 </td>
|
||
<td valign="top" width="32%">REG_EEND </td>
|
||
<td valign="top" width="31%">"Premature end of
|
||
regular expression" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">215 </td>
|
||
<td valign="top" width="32%">REG_ESIZE </td>
|
||
<td valign="top" width="31%">"Regular expression too
|
||
big" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">216 </td>
|
||
<td valign="top" width="32%">REG_ERPAREN </td>
|
||
<td valign="top" width="31%">"Unmatched ) or \\)"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">217 </td>
|
||
<td valign="top" width="32%">REG_EMPTY </td>
|
||
<td valign="top" width="31%">"Empty expression"
|
||
</td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
<tr>
|
||
<td valign="top" width="8%"> </td>
|
||
<td valign="top" width="22%">218 </td>
|
||
<td valign="top" width="32%">REG_E_UNKNOWN </td>
|
||
<td valign="top" width="31%">"Unknown error" </td>
|
||
<td valign="top" width="7%"> </td>
|
||
</tr>
|
||
</table>
|
||
|
||
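<p>The REG_* names in the table are the standard POSIX error codes. The following minimal sketch, assuming a POSIX system and its own &lt;regex.h&gt; (not the regex++ implementation), shows where such messages surface: regcomp returns an error code and regerror converts it to text, while regexec reports REG_NOMATCH when there is no match. The helper names are hypothetical, and the message wording comes from the system library rather than from the defaults listed above.</p>

```cpp
#include <cassert>
#include <regex.h>
#include <string>

// Hypothetical helper: compile `pattern` as a POSIX extended regular
// expression and return the text of any error ("" on success).
std::string describe_pattern_error(const char* pattern) {
    regex_t re;
    const int code = regcomp(&re, pattern, REG_EXTENDED);
    if (code == 0) {
        regfree(&re);
        return "";
    }
    // After a failed regcomp the regex_t must not be passed to regfree,
    // but it may be passed to regerror together with the error code.
    char buf[256];
    regerror(code, &re, buf, sizeof buf);  // NUL-terminated, truncated to fit
    return buf;
}

// Hypothetical helper: returns 0 on a match, REG_NOMATCH if there is none,
// or -1 if the pattern itself fails to compile.
int match_status(const char* pattern, const char* text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    const int status = regexec(&re, text, 0, nullptr, 0);
    regfree(&re);
    return status;
}
```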
<p><br>
</p>

<p>Custom character class names are loaded as follows: <br>
</p>

<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">Message ID </td>
<td valign="top" width="32%">Description </td>
<td valign="top" width="31%">Equivalent default class name </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">300 </td>
<td valign="top" width="32%">The character class name for alphanumeric characters. </td>
<td valign="top" width="31%">"alnum" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">301 </td>
<td valign="top" width="32%">The character class name for alphabetic characters. </td>
<td valign="top" width="31%">"alpha" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">302 </td>
<td valign="top" width="32%">The character class name for control characters. </td>
<td valign="top" width="31%">"cntrl" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">303 </td>
<td valign="top" width="32%">The character class name for digit characters. </td>
<td valign="top" width="31%">"digit" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">304 </td>
<td valign="top" width="32%">The character class name for graphical characters. </td>
<td valign="top" width="31%">"graph" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">305 </td>
<td valign="top" width="32%">The character class name for lower case characters. </td>
<td valign="top" width="31%">"lower" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">306 </td>
<td valign="top" width="32%">The character class name for printable characters. </td>
<td valign="top" width="31%">"print" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">307 </td>
<td valign="top" width="32%">The character class name for punctuation characters. </td>
<td valign="top" width="31%">"punct" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">308 </td>
<td valign="top" width="32%">The character class name for space characters. </td>
<td valign="top" width="31%">"space" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">309 </td>
<td valign="top" width="32%">The character class name for upper case characters. </td>
<td valign="top" width="31%">"upper" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">310 </td>
<td valign="top" width="32%">The character class name for hexadecimal digit characters. </td>
<td valign="top" width="31%">"xdigit" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">311 </td>
<td valign="top" width="32%">The character class name for blank characters. </td>
<td valign="top" width="31%">"blank" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">312 </td>
<td valign="top" width="32%">The character class name for word characters. </td>
<td valign="top" width="31%">"word" </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="8%"> </td>
<td valign="top" width="22%">313 </td>
<td valign="top" width="32%">The character class name for Unicode characters. </td>
<td valign="top" width="31%">"unicode" </td>
<td valign="top" width="7%"> </td>
</tr>
</table>

<p><br>
</p>

<p>Finally, custom collating element names are loaded starting
from message id 400, and loading stops at the first id thereafter
whose load fails. Each message has the form "tagname string",
where <i>tagname</i> is the name used inside [[.tagname.]] and
<i>string</i> is the actual text of the collating element. Note
that the value of the collating element [[.zero.]] is used when
converting strings to numbers: if you replace it with another
value, then that value will be used for string parsing. For
example, use the Unicode character 0x0660 for [[.zero.]] if you
want to use Unicode Arabic-Indic digits in your regular
expressions in place of Latin digits. </p>

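<p>The loading rule and the role of [[.zero.]] can be illustrated with a small sketch. It is hypothetical, not the regex++ code: a std::map from message id to text stands in for whatever message catalog the library actually reads, load_collating_names and parse_number are illustrative names, and digits are assumed to be the ten consecutive code points starting at the [[.zero.]] character, which holds for both Latin digits (0x30-0x39) and Arabic-Indic digits (0x0660-0x0669).</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a message catalog: message id -> text.
using Catalog = std::map<int, std::u32string>;

// Load collating element names from id 400 upwards, stopping at the
// first id with no message. Each message has the form "tagname string".
std::map<std::u32string, std::u32string> load_collating_names(const Catalog& cat) {
    std::map<std::u32string, std::u32string> names;
    for (int id = 400;; ++id) {
        auto it = cat.find(id);
        if (it == cat.end()) {
            break;  // the first failed load ends the scan
        }
        const std::u32string& msg = it->second;
        auto space = msg.find(U' ');
        if (space == std::u32string::npos) {
            continue;  // assumption: entries without a space are skipped
        }
        names[msg.substr(0, space)] = msg.substr(space + 1);
    }
    return names;
}

// Parse a non-negative decimal number whose ten digits are the
// consecutive code points starting at `zero` (U'0', U'\u0660', ...).
// Returns -1 if the string is empty or contains a non-digit.
long parse_number(const std::u32string& s, char32_t zero) {
    if (s.empty()) {
        return -1;
    }
    long value = 0;
    for (char32_t c : s) {
        if (c < zero || c > zero + 9) {
            return -1;
        }
        value = value * 10 + static_cast<long>(c - zero);
    }
    return value;
}
```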
<p>Note that the POSIX-defined names for character classes and
collating elements are always available, even if custom names
are defined. In contrast, custom error messages and custom
syntax messages replace the default ones. <br>
</p>

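<p>The two lookup rules in this note behave differently: custom class names are added alongside the POSIX names, while custom messages replace the defaults. A minimal sketch makes the difference concrete; the Traits structure, the class_mask values and the use of plain std::map tables are all hypothetical stand-ins for the real traits class and message catalog.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical bitmask values standing in for character class masks.
enum class_mask : unsigned { cls_none = 0, cls_alpha = 1, cls_digit = 2, cls_space = 4 };

struct Traits {
    std::map<std::string, unsigned> custom_classes;  // e.g. loaded from ids 300-313
    std::map<int, std::string> custom_messages;      // e.g. loaded from ids 201-218

    // POSIX names are always recognised; custom names are tried afterwards,
    // so they add new spellings rather than hiding the standard ones.
    unsigned lookup_class(const std::string& name) const {
        static const std::map<std::string, unsigned> posix = {
            {"alpha", cls_alpha}, {"digit", cls_digit}, {"space", cls_space}};
        auto p = posix.find(name);
        if (p != posix.end()) {
            return p->second;
        }
        auto c = custom_classes.find(name);
        return c != custom_classes.end() ? c->second : cls_none;
    }

    // Error messages: a custom message, if present, replaces the default.
    std::string error_message(int id, const std::string& default_text) const {
        auto c = custom_messages.find(id);
        return c != custom_messages.end() ? c->second : default_text;
    }
};
```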
<hr>

<h3><a name="demos"></a>Appendix 4: Demo Applications</h3>

<p>Three demo applications ship with this library. Each comes
with makefiles for the Borland, Microsoft and gcc compilers; for
any other compiler you will have to create your own makefiles. </p>

<h5>regress.exe </h5>

<p>A regression test application that gives the matching and
searching algorithms a full workout. The presence of this program
is your guarantee that the library behaves as claimed, at least
as far as the tested items are concerned. If you spot anything
that isn't being tested, I'd be glad to hear about it. </p>

<p>Files: <a href="demo/regress/parse.cpp">parse.cpp</a>, <a
href="demo/regress/regress.cpp">regress.cpp</a>, <a
href="demo/regress/tests.cpp">tests.cpp</a>. </p>

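<p>The idea behind regress.exe, a table of expressions, inputs and expected results checked in bulk, can be sketched in a few lines. This is a hypothetical minimal harness using std::regex in place of regex++; it does not follow the format of tests.cpp, and TestCase and run_tests are illustrative names.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One hypothetical test case: should `pattern` be found in `input`,
// and if so, what text should the whole match ($0) be?
struct TestCase {
    std::string pattern;
    std::string input;
    bool expect_match;
    std::string expect_text;
};

// Runs every case and returns the number of failures.
int run_tests(const std::vector<TestCase>& cases) {
    int failures = 0;
    for (const auto& tc : cases) {
        const std::regex re(tc.pattern);
        std::smatch m;
        const bool found = std::regex_search(tc.input, m, re);
        if (found != tc.expect_match || (found && m.str(0) != tc.expect_text)) {
            ++failures;
        }
    }
    return failures;
}
```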
<h5>jgrep.exe </h5>

<p>A simple grep implementation; run it with no command-line
options to see its usage. Look at <a href="src/fileiter.cpp">fileiter.cpp</a>/fileiter.hpp
and the mapfile class for an example of a "smart"
bidirectional iterator that can be used with regex++ or with any
other STL algorithm. </p>

<p>Files: <a href="demo/jgrep/jgrep.cpp">jgrep.cpp</a>, <a
href="demo/jgrep/main.cpp">main.cpp</a>. </p>

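<p>The same property holds for the standard library: std::regex_search accepts any bidirectional iterator range, not only pointers or string iterators. The following minimal sketch assumes std::regex and uses a std::list&lt;char&gt; as a stand-in for a non-contiguous buffer (it is not the mapfile class); first_match is a hypothetical helper.</p>

```cpp
#include <cassert>
#include <list>
#include <regex>
#include <string>

// Search a std::list<char>, whose iterators are bidirectional but not
// random access, and return the first match as a std::string ("" if none).
std::string first_match(const std::list<char>& text, const std::regex& re) {
    using It = std::list<char>::const_iterator;
    std::match_results<It> m;
    if (std::regex_search(text.begin(), text.end(), m, re)) {
        return m.str(0);
    }
    return "";
}
```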
<h5>timer.exe </h5>

<p>A simple interactive expression-matching application. The
results of all matches are timed, allowing programmers to
optimize their regular expressions where performance is critical.
</p>

<p>Files: <a href="demo/timer/regex_timer.cpp">regex_timer.cpp</a>.
<br>
</p>

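<p>A minimal sketch of the kind of measurement timer.exe makes, assuming std::regex and std::chrono in place of the demo's own code (time_search is a hypothetical name): the search is repeated several times and the mean duration is reported, since a single run is usually too short to time reliably.</p>

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>

// Hypothetical helper: run regex_search `iterations` times and return the
// mean time per search in microseconds. `matched` receives the result
// of the last search (false if no search was run).
double time_search(const std::regex& re, const std::string& text,
                   int iterations, bool& matched) {
    using steady = std::chrono::steady_clock;
    matched = false;
    if (iterations <= 0) {
        return 0.0;
    }
    const auto start = steady::now();
    for (int i = 0; i < iterations; ++i) {
        matched = std::regex_search(text, re);
    }
    const std::chrono::duration<double, std::micro> elapsed = steady::now() - start;
    return elapsed.count() / iterations;
}
```

<p>The figures are most useful for comparing two equivalent expressions on the same input; absolute numbers depend on the compiler, library and machine.</p>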
<p>The snippet demos contain the code examples used in the
documentation:</p>

<p><a href="demo/snippets/snip1.cpp">snip1.cpp</a>: ftp-based
regex_match example.</p>

<p><a href="demo/snippets/snip2.cpp">snip2.cpp</a>: regex_search
example: searches a cpp file for class definitions.</p>

<p><a href="demo/snippets/snip3.cpp">snip3.cpp</a>: regex_grep
example 1: searches a cpp file for class definitions.</p>

<p><a href="demo/snippets/snip4.cpp">snip4.cpp</a>: regex_merge
example: converts a C++ file to syntax highlighted HTML.</p>

<p><a href="demo/snippets/snip5.cpp">snip5.cpp</a>: regex_grep
example 2: searches a cpp file for class definitions, using a
global callback function. </p>

<p><a href="demo/snippets/snip6.cpp">snip6.cpp</a>: regex_grep
example 2: searches a cpp file for class definitions, using a
bound member function callback.</p>

<p><a href="demo/snippets/snip7.cpp">snip7.cpp</a>: regex_grep
example 2: searches a cpp file for class definitions, using a C++
Builder closure as a callback.</p>

<p><a href="demo/snippets/snip8.cpp">snip8.cpp</a>: regex_split
example: splits a string into tokens.</p>

<p><a href="demo/snippets/snip9.cpp">snip9.cpp</a>: regex_split
example: extracts linked URLs.</p>

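<p>For readers using the standard library instead of regex++, two of the snippet tasks map onto the C++11 &lt;regex&gt; iterators. In this minimal sketch std::sregex_iterator and std::sregex_token_iterator stand in for regex_grep and regex_split; the patterns and the function names find_class_names and split_tokens are illustrative, not those used in the snippet files.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collect the names that follow a "class" or "struct" keyword, in the
// spirit of snip2/snip3. The pattern is deliberately simple and will
// also pick up forward declarations.
std::vector<std::string> find_class_names(const std::string& source) {
    static const std::regex re(R"(\b(?:class|struct)\s+([A-Za-z_]\w*))");
    std::vector<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), re), end; it != end; ++it) {
        names.push_back((*it)[1].str());  // capture group 1 is the name
    }
    return names;
}

// Split `text` on runs of whitespace, in the spirit of snip8.
// Submatch index -1 yields the text *between* separator matches.
std::vector<std::string> split_tokens(const std::string& text) {
    static const std::regex sep(R"(\s+)");
    std::vector<std::string> tokens;
    for (std::sregex_token_iterator it(text.begin(), text.end(), sep, -1), end; it != end; ++it) {
        if (it->length() > 0) {  // drop the empty field before leading whitespace
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```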
<hr>

<h3><a name="headers"></a>Appendix 5: Header Files</h3>

<p>There are two main headers used by this library: &lt;boost/regex.hpp&gt;
provides access to the entire library, while &lt;boost/cregex.hpp&gt;
provides access only to the high-level class RegEx and the POSIX
API functions. <br>
</p>

<hr>

<h3><a name="redist"></a>Appendix 6: Redistributables</h3>

<p>If you are using Microsoft or Borland C++ and link to a dll
version of the run time library, then you will also link to one
of the dll versions of regex++. While these dlls are
redistributable, there are no "standard" versions, so when
installing on the user's PC you should place them in a directory
private to your application, not in a directory on the PC's
search path. Note that if you link to a static version of your
run time library, then you will also link to a static version of
regex++, and no dlls will need to be distributed. The possible
regex++ dlls are as follows: <br>
</p>

<table border="0" cellpadding="7" cellspacing="0" width="100%">
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%"><b>Development Tool</b> </td>
<td valign="top" width="30%"><b>Run Time Library</b> </td>
<td valign="top" width="30%"><b>Regex++ Dll</b> </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Microsoft Visual C++ 6 </td>
<td valign="top" width="30%">Msvcp60.dll and msvcrt.dll </td>
<td valign="top" width="30%">Mre300l.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Microsoft Visual C++ 6 </td>
<td valign="top" width="30%">Msvcp60d.dll and msvcrtd.dll </td>
<td valign="top" width="30%">Mre300dl.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Borland C++ Builder 4 </td>
<td valign="top" width="30%">Cw3245.dll </td>
<td valign="top" width="30%">bcb4re300l.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Borland C++ Builder 4 </td>
<td valign="top" width="30%">Cw3245mt.dll </td>
<td valign="top" width="30%">bcb4re300lm.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Borland C++ Builder 4 </td>
<td valign="top" width="30%">Cp3245mt.dll and vcl40.bpl </td>
<td valign="top" width="30%">bcb4re300lv.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Borland C++ Builder 5 </td>
<td valign="top" width="30%">cp3250.dll </td>
<td valign="top" width="30%">bcb5re300l.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Borland C++ Builder 5 </td>
<td valign="top" width="30%">cp3250mt.dll </td>
<td valign="top" width="30%">bcb5re300lm.dll </td>
<td valign="top" width="7%"> </td>
</tr>
<tr>
<td valign="top" width="7%"> </td>
<td valign="top" width="27%">Borland C++ Builder 5 </td>
<td valign="top" width="30%">cw3250mt.dll </td>
<td valign="top" width="30%">bcb5re300lv.dll </td>
<td valign="top" width="7%"> </td>
</tr>
</table>

<p>Note: you can disable automatic library selection by defining
the symbol BOOST_RE_NO_LIB when compiling. This is useful if you
want to link statically even though you are using the dll version
of your run time library, or if you need to debug regex++. <br>
</p>

<hr>

<h3><a name="upgrade"></a>Notes for upgraders</h3>

<p>This version of regex++ is the first to be ported to the <a
href="http://www.boost.org">boost</a> project, and as a result it
has a number of changes to comply with the boost coding
guidelines. </p>

<p>Headers have been changed from &lt;header&gt; or &lt;header.h&gt;
to &lt;boost/header.hpp&gt;. </p>

<p>The library namespace has changed from "jm" to
"boost". </p>

<p>The reg_xxx algorithms have been renamed regex_xxx (to improve
naming consistency). </p>

<p>The algorithm query_match has been renamed regex_match, and it
returns true only if the expression matches the whole of the
input string (think input data validation). </p>

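<p>The difference between whole-string matching and searching is easy to see with the standard C++ &lt;regex&gt; library, whose regex_match and regex_search follow the same rule. The sketch below assumes std::regex rather than regex++, and its helper names are hypothetical.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>

// Identifier-like token: a letter or underscore, then letters, digits or underscores.
const std::regex& identifier_pattern() {
    static const std::regex re("[A-Za-z_][A-Za-z0-9_]*");
    return re;
}

// Validation: the *entire* input must be an identifier.
bool is_valid_id(const std::string& s) {
    return std::regex_match(s, identifier_pattern());
}

// Searching: true if an identifier appears *anywhere* in the input.
bool contains_id(const std::string& s) {
    return std::regex_search(s, identifier_pattern());
}
```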
<p><i>Compiling existing code:</i> </p>

<p>The directory libs/regex/old_include contains a set of
headers that make this version of regex++ compatible with
previous ones. Either add this directory to your include path, or
copy these headers to the root directory of your boost
installation. The contents of these headers are deprecated and
undocumented; they exist only to support existing code. For new
projects, use the new header forms. <br>
</p>

<hr>

<h3><a name="furtherInfo"></a>Further Information (Contacts and Acknowledgements)</h3>

<p>The author can be contacted at <a
href="mailto:John_Maddock@compuserve.com">John_Maddock@compuserve.com</a>.
The home page for this library is at <a
href="http://ourworld.compuserve.com/homepages/John_Maddock/regexpp.htm">http://ourworld.compuserve.com/homepages/John_Maddock/regexpp.htm</a>,
and the official boost version can be obtained from <a
href="http://www.boost.org/libraries.htm">www.boost.org/libraries.htm</a>.
</p>

<p>I am indebted to Robert Sedgewick's "Algorithms in C++"
for forcing me to think about algorithms and their performance,
and to the folks at boost for forcing me to <i>think</i>, period.
The following people have all contributed useful comments or
fixes: Dave Abrahams, Mike Allison, Edan Ayal, Jayashree
Balasubramanian, Beman Dawes, Paul Baxter, Edward Diener, Robert
Dunn, Fabio Forno, Rob Gillen, Chris Hecker, Jesse Jones, Jan
Hermelink, Max Leung, Wei-hao Lin, Jens Maurer, Scobie Smith,
Herv&eacute; Poirier, Marc Recht, Alexey Voinov, Jerry Waldorf, Rob
Ward, Lealon Watts and Yuval Yosef. I am also grateful to the
manuals supplied with the Henry Spencer, Perl and GNU regular
expression libraries. Wherever possible I have tried to maintain
compatibility with these libraries and with the POSIX standard;
the code, however, is entirely my own, including any bugs! I can
absolutely guarantee that I will not fix any bugs I don't know
about, so if you have any comments or spot any bugs, please get
in touch. </p>

<p>Useful further information can be found at: </p>

<p>The <a
href="http://www.opengroup.org/onlinepubs/7908799/toc.htm">Open
Unix Specification</a> contains a wealth of useful material,
including the regular expression syntax and specifications for <a
href="http://www.opengroup.org/onlinepubs/7908799/xsh/regex.h.html">&lt;regex.h&gt;</a>
and <a
href="http://www.opengroup.org/onlinepubs/7908799/xsh/nl_types.h.html">&lt;nl_types.h&gt;</a>.
</p>

<p>The <a
href="http://www.cs.purdue.edu/homes/stelo/pattern.html">Pattern
Matching Pointers</a> site is a "must visit" resource
for anyone interested in pattern matching. </p>

<p><a href="http://glimpse.cs.arizona.edu/">Glimpse and Agrep</a>
use a simplified regular expression syntax to achieve faster
search times. </p>

<p><a href="http://glimpse.cs.arizona.edu/udi.html">Udi Manber</a>
and <a href="http://www.dcc.uchile.cl/~rbaeza/">Ricardo Baeza-Yates</a>
both have a selection of useful pattern matching papers available
from their respective web sites. <br>
</p>

<hr>

<p><i>Copyright </i><a href="mailto:John_Maddock@compuserve.com"><i>Dr
John Maddock</i></a><i> 1998-2000 all rights reserved.</i> </p>

</body>

</html>