It is provided "as is" without express or implied warranty.</PRE></I></TD>
</TR>
</TABLE>
<P><HR></P>
<I><H3><A NAME="intro"></A>Introduction</H3>
</I><P>Regular expressions are a form of pattern matching that is often used in text processing; many users will be familiar with the Unix utilities <I>grep</I>, <I>sed</I> and <I>awk</I>, and the programming language <I>perl</I>, each of which makes extensive use of regular expressions. Traditionally C++ users have been limited to the POSIX C APIs for manipulating regular expressions, and while regex++ does provide these APIs, they do not represent the best way to use the library. For example, regex++ can cope with wide character strings, and with search and replace operations (in a manner analogous to either sed or perl), something that traditional C libraries cannot do.</P>
<P>The class <A HREF="template_class_ref.htm#reg_expression">boost::reg_expression</A> is the key class in this library; it represents a "machine readable" regular expression, and is very closely modelled on std::basic_string: think of it as a string plus the actual state machine required by the regular expression algorithms. Like std::basic_string, it is almost always referenced through one of two typedefs: boost::regex for narrow characters (char) and boost::wregex for wide characters (wchar_t).</P>
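<P>As a rough illustration of this string-like design, the sketch below uses the C++ standard library's std::basic_regex, which follows the same pattern of one class template plus a narrow and a wide typedef. It is an analogue chosen for illustration, not regex++ code, and the helper names narrow_groups and wide_groups are hypothetical.</P>

```cpp
#include <regex>
#include <type_traits>

// Standard-library analogue (an assumption for illustration, not regex++ itself):
// std::basic_regex is the class template, std::regex and std::wregex its typedefs.
static_assert(std::is_same<std::regex, std::basic_regex<char> >::value,
              "std::regex is the narrow-character typedef");
static_assert(std::is_same<std::wregex, std::basic_regex<wchar_t> >::value,
              "std::wregex is the wide-character typedef");

// Hypothetical helper: compile a narrow expression and report how many
// marked sub-expressions it contains.
unsigned narrow_groups()
{
    std::regex e("(a)(b)");
    return e.mark_count();
}

// Hypothetical helper: the same for a wide-character expression.
unsigned wide_groups()
{
    std::wregex e(L"(a)(b)(c)");
    return e.mark_count();
}
```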
<P>To see how this library can be used, imagine that we are writing a credit card processing application. Credit card numbers generally come as a string of 16 digits, separated into groups of 4 digits by either a space or a hyphen. Before storing a credit card number in a database (not necessarily something your customers will appreciate!), we may want to verify that the number is in the correct format. To match any digit we could use the regular expression [0-9]; however, ranges of characters like this are actually locale dependent. Instead we should use the POSIX standard form [[:digit:]], or the regex++ and perl shorthand for this, \d (note that many older libraries tended to be hard-coded to the C locale, so this was not an issue for them). That leaves us with the following regular expression to validate credit card number formats:</P>
<P>(\d{4}[- ]){3}\d{4}</P>
<P>Here the parentheses act to group (and mark for future reference) sub-expressions, and the {4} means "repeat exactly 4 times". This is an example of the extended regular expression syntax used by perl, awk and egrep. Regex++ also supports the older "basic" syntax used by sed and grep, but this is generally less useful, unless you already have some basic regular expressions that you need to reuse.</P>
<P>Now let's take that expression and place it in some C++ code to validate the format of a credit card number. Embedded in a string literal, the expression becomes "(\\d{4}[- ]){3}\\d{4}".</P>
<P>Note how we had to add some extra escapes to the expression: remember that the escape is seen once by the C++ compiler before it gets to be seen by the regular expression engine; consequently escapes in regular expressions have to be doubled up when embedding them in C/C++ code.</P>
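<P>The validation step described above might be sketched as follows. This is a minimal sketch that assumes the C++ standard library's std::regex as a stand-in for the regex++ classes; the function name validate_card_format is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the whole string consists of four
// groups of four digits separated by a space or a hyphen.
// std::regex is used here as a stand-in; this is not regex++ code.
bool validate_card_format(const std::string& s)
{
    // Every backslash is doubled: the compiler turns "\\d" into \d,
    // which is what the regular expression engine then sees.
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);   // the entire string must match
}
```

<P>Because regex_match requires the entire input to match, a number with a missing digit or with no separators is rejected by this particular expression.</P>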
<P>Those of you who are familiar with credit card processing will have realised that while the format used above is suitable for human readable card numbers, it does not represent the format required by online credit card systems; these require the number as a string of 16 (or possibly 15) digits, without any intervening spaces. What we need is a means to convert easily between the two formats, and this is where search and replace comes in. Those who are familiar with the utilities <I>sed</I> and <I>perl</I> will already be ahead here; we need two strings: one a regular expression, the other a "<A HREF="format_string.htm">format string</A>" that provides a description of the text to replace the match with. In regex++ this search and replace operation is performed with the algorithm regex_merge; for our credit card example we can write two algorithms like this to provide the format conversions:</P>
<PRE>
<I>// match any format with the regular expression:</I>
<B>return</B> <A HREF="template_class_ref.htm#reg_merge">regex_merge</A>(s, e, human_format, boost::match_default | boost::format_sed);
}</PRE>
<P>Here we've used marked sub-expressions in the regular expression to split out the four parts of the card number as separate fields; the format string then uses the sed-like syntax to replace the matched text with the reformatted version.</P>
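<P>A self-contained sketch of the two conversions is given below. It assumes the standard library's std::regex_replace in place of regex_merge, an expression with four marked groups and optional separators (an assumption made for this sketch), and the standard library's default $N format syntax rather than the sed-like syntax.</P>

```cpp
#include <regex>
#include <string>

// Assumed expression for this sketch: four marked groups of digits
// (the first may hold 3 digits, to allow 15-digit numbers), each
// optionally followed by a space or a hyphen.
static const std::regex& card_expression()
{
    static const std::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
    return e;
}

// Strip the separators: "1234-5678-9012-3456" becomes a plain digit string.
std::string machine_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_expression(), "$1$2$3$4");
}

// Insert hyphens between the four marked groups.
std::string human_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_expression(), "$1-$2-$3-$4");
}
```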
<P>In the examples above, we haven't directly manipulated the results of a regular expression match; in general, however, the result of a match contains a number of sub-expression matches in addition to the overall match. When the library needs to report a regular expression match it does so using an instance of the class <A HREF="template_class_ref.htm#reg_match">match_results</A>; as before, there are typedefs of this class for the two most common cases: boost::cmatch for narrow character strings and boost::wcmatch for wide character strings.</P>
<P>The algorithms <A HREF="template_class_ref.htm#reg_search">regex_search</A> and <A HREF="template_class_ref.htm#reg_grep">regex_grep</A> (i.e. finding all matches in a string) make use of match_results to report what matched.</P>
<P>Note that these algorithms are not restricted to searching regular C-strings: any bidirectional iterator type can be searched, allowing for the possibility of seamlessly searching almost any kind of data. </P>
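<P>To make the role of match_results more concrete, here is a minimal sketch using the standard library equivalents: std::smatch (a match_results over std::string iterators) with std::regex_search, and std::sregex_iterator to visit every match, which is roughly the job the text assigns to regex_grep. The helper names and the expression are assumptions made for the sketch.</P>

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumed expression: four marked groups of four digits, separated by
// a space or a hyphen.
static const std::regex& grouped_card_expression()
{
    static const std::regex e("(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");
    return e;
}

// Hypothetical helper: return the first marked sub-expression of the first
// card number found in the text, or an empty string if there is none.
std::string first_group_of_first_card(const std::string& text)
{
    std::smatch what;   // what[0] is the overall match, what[1..4] the groups
    if (!std::regex_search(text, what, grouped_card_expression()))
        return std::string();
    return what[1].str();
}

// Hypothetical helper: collect the overall match for every card number
// found in the text.
std::vector<std::string> all_cards(const std::string& text)
{
    std::vector<std::string> found;
    std::sregex_iterator it(text.begin(), text.end(), grouped_card_expression());
    const std::sregex_iterator end;
    for (; it != end; ++it)
        found.push_back(it->str());
    return found;
}
```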
<P>For search and replace operations, in addition to the algorithm <A HREF="template_class_ref.htm#reg_merge">regex_merge</A> that we have already seen, the algorithm <A HREF="template_class_ref.htm#reg_format">regex_format</A> takes the result of a match and a format string, and produces a new string by merging the two.</P>
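<P>The idea of merging a match result with a format string can be sketched with the standard library, where the closest counterpart is the format member of std::match_results. This is an analogue under that assumption, not the regex_format algorithm itself, and the helper name is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: find the first card number in the text and return it
// with the separators removed, built from the match result and a format string.
std::string first_card_as_digits(const std::string& text)
{
    static const std::regex e("(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");
    std::smatch what;
    if (!std::regex_search(text, what, e))
        return std::string();
    // $1..$4 refer to the marked sub-expressions of this one match.
    return what.format("$1$2$3$4");
}
```

<P>Unlike a search and replace over the whole input, only the matched text contributes to the result here; the surrounding text is not copied.</P>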
<P>For those who dislike templates, there is a high level wrapper class RegEx that encapsulates the lower level template code. It provides a simplified interface for those who don't need the full power of the library, and supports only narrow characters and the "extended" regular expression syntax. </P>
<P>The <A HREF="posix_ref.htm#posix">POSIX API</A> functions regcomp, regexec, regfree and regerror are available in both narrow character and Unicode versions, and are provided for those who need compatibility with these APIs. </P>
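<P>For comparison, the traditional POSIX calling sequence looks roughly like this. The sketch assumes the system's own regex.h on a POSIX platform rather than the regex++ header, uses [[:digit:]] because \d is not part of POSIX syntax, and the function name is hypothetical.</P>

```cpp
#include <regex.h>   // the system POSIX header (an assumption of this sketch)
#include <string>

// Hypothetical helper: validate the card format with regcomp/regexec/regfree.
bool posix_validate_card_format(const std::string& s)
{
    regex_t re;
    // REG_EXTENDED selects the extended syntax; REG_NOSUB says that we only
    // want a yes/no answer, not the positions of sub-expressions.
    if (regcomp(&re, "^([[:digit:]]{4}[- ]){3}[[:digit:]]{4}$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return false;                        // the expression failed to compile
    const int rc = regexec(&re, s.c_str(), 0, 0, 0);
    regfree(&re);                            // always release the compiled form
    return rc == 0;                          // 0 means a match was found
}
```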
<P>Finally, note that the library now has run-time <A HREF="appendix.htm#localisation">localization</A> support, and recognizes the full POSIX regular expression syntax, including advanced features like multi-character collating elements and equivalence classes. It also provides compatibility with other regular expression libraries, including the GNU and BSD4 regex packages, and to a more limited extent perl 5. </P>
<I><H3><A NAME="Installation"></A>Installation and Configuration Options</H3></I>
<P><EM>[ </EM><I><STRONG>Important</STRONG></I><EM>: If you are upgrading from version 3.04x of this library then you will find a number of changes to the documented header names and library interfaces; existing code should still compile unchanged, however. See </EM><A HREF="appendix.htm#upgrade"><FONT COLOR="#0000ff"><EM>Note for Upgraders</EM></FONT></A><EM>. ]</EM></P>
<P>When you extract the library from its zip file, you must preserve its internal directory structure (for example by using the -d option when extracting). If you didn't do that when extracting, then you'd better stop reading this, delete the files you just extracted, and try again! </P>
<P>Currently the library will automatically detect and configure itself for Borland, Microsoft and gcc compilers only. The library will also detect the HP, SGI, Rogue Wave, or Microsoft STL implementations. If the STL type is detected, then the library will attempt to extract suitable compiler configuration options from the STL used. Otherwise the library will assume that the compiler is fully compliant with the C++ standard, unless various options are defined to disable features not implemented by your compiler. These options are documented in &lt;boost/re_detail/regex_options.hpp&gt;. If you want to add permanent configuration options, add them to that same file, which is provided for this purpose; retaining it lets you keep your configuration options between library versions. </P>
<P>The library will encase all code inside namespace boost. </P>
<P>Unlike some other template libraries, this library consists of a mixture of template code (in the headers) and static code and data (in cpp files). Consequently it is necessary to build the library's support code into a library or archive file before you can use it; instructions for specific platforms are as follows: </P>
<P><B>Borland C++ Builder:</B></P>
<UL>
<LI>Open up a console window and change to the &lt;boost&gt;\libs\regex\lib directory. </LI>
<LI>Select the appropriate makefile (bcb4.mak for C++ Builder 4, bcb5.mak for C++ Builder 5, and bcc55.mak for the 5.5 command line tools). </LI>
<LI>Invoke the makefile (pass the full path to your version of make if you have more than one version installed; the makefile relies on the path to make to obtain your C++ Builder installation directory and tools), for example: </LI></UL>
<PRE>make -fbcb5.mak</PRE>
<P>The build process will build a variety of .lib and .dll files (the exact number depends upon the version of Borland's tools you are using). The .lib and .dll files will be in a sub-directory called bcb4 or bcb5 depending upon the makefile used. To install the libraries into your development system use:</P>
<PRE>make -fbcb5.mak install</PRE>
<P>The library files will be copied to &lt;BCROOT&gt;/lib and the DLLs to &lt;BCROOT&gt;/bin, where &lt;BCROOT&gt; corresponds to the install path of your Borland C++ tools. </P>
<P>You may also remove temporary files created during the build process (excluding lib and dll files) by using:</P>
<PRE>make -fbcb5.mak clean</PRE>
<P>Finally, when you use regex++ it is only necessary to add the &lt;boost&gt; root directory to your list of include directories for that project. It is not necessary to manually add a .lib file to the project; the headers will automatically select the correct .lib file for your build mode and tell the linker to include it. There is one caveat, however: the library cannot tell the difference between VCL and non-VCL enabled builds when building a GUI application from the command line. If you build from the command line with the 5.5 command line tools then you must define the pre-processor symbol _NO_VCL in order to ensure that the correct link libraries are selected; the C++ Builder IDE normally sets this automatically. Hint: users of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg in order to set this option permanently. <BR>
<BR>
</P>
<P><B>Microsoft Visual C++ 6</B></P>
<P>You need version 6 of MSVC to build this library. If you are using VC5 then you may want to look at one of the previous releases of this <A HREF="http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm">library</A>.</P>
<P>Open up a command prompt that has the necessary MSVC environment variables defined (for example by using the batch file Vcvars32.bat installed by the Visual Studio installation), and change to the &lt;boost&gt;\libs\regex\lib directory. </P>
<P>Select the correct makefile - vc6.mak for "vanilla" Visual C++ 6 or vc6-stlport.mak if you are using STLPort.</P>
<P>Invoke the makefile like this:</P>
<PRE>nmake -fvc6.mak</PRE>
<P>You will now have a collection of lib and dll files in a "vc6" subdirectory. To install these into your development system use:</P>
<PRE>nmake -fvc6.mak install</PRE>
<P>The lib files will be copied to your &lt;VC6&gt;\lib directory and the dll files to &lt;VC6&gt;\bin, where &lt;VC6&gt; is the root of your Visual C++ 6 installation.</P>
<P>You can delete all the temporary files created during the build (excluding lib and dll files) using:</P>
<PRE>nmake -fvc6.mak clean</PRE>
<P>Finally, when you use regex++ it is only necessary to add the &lt;boost&gt; root directory to your list of include directories for that project. It is not necessary to manually add a .lib file to the project; the headers will automatically select the correct .lib file for your build mode and tell the linker to include it. </P>
<P><I><STRONG>Important</STRONG></I><EM>: there have been some reports of compiler-optimisation bugs affecting this library; the workaround is to build the library using /Oityb1 rather than /O2, that is, to use all optimisation settings except /Oa. This problem is reported to affect some standard library code as well (in fact I'm not sure if the problem is with the regex code or the underlying standard library), so it's probably worthwhile applying this workaround in normal practice in any case.</EM></P>
<P>Note: if you have replaced the C++ standard library that comes with VC6, then when you build the library you must ensure that the environment variables "INCLUDE" and "LIB" have been updated to reflect the include and library paths for the new library. See vcvars32.bat (part of your Visual Studio installation) for more details. </P>
<P>If you are building with the full STLPort v4, then use the vc6-stlport.mak file provided (The full STLPort libraries appear not to support single-thread static builds). <BR>
<BR>
</P>
<P><B>GCC (2.95)</B></P>
<P>There is a conservative makefile for the g++ compiler. From the command prompt change to the &lt;boost&gt;/libs/regex/lib directory and type: </P>
<PRE>make -fgcc.mak</PRE>
<P>At the end of the build process you should have a gcc sub-directory containing release and debug versions of the library (libboost_regex.a and libboost_regex_debug.a). When you build projects that use regex++, you will need to add the boost install directory to your list of include paths and add &lt;boost&gt;/libs/regex/lib/gcc/libboost_regex.a to your list of library files. </P>
<P>There is also a makefile to build the library as a shared library:</P>
<PRE>make -fgcc-shared.mak</PRE>
<P>which will build libboost_regex.so and libboost_regex_debug.so.</P>
<P>Both of these makefiles support the following environment variables:</P>
<P>CXXFLAGS: extra compiler options - note that this applies to both the debug and release builds.</P>
<P>INCLUDES: additional include directories.</P>
<P>LDFLAGS: additional linker options.</P>
<P>LIBS: additional library files.</P>
<P>For the more adventurous there is a configure script in &lt;boost&gt;/libs/regex; this will enable things like multithreading, wide character and nls support if they are not enabled by default on your platform. When the configure script completes, run one of the makefiles described above.</P>
<P><B>Other compilers:</B></P>
<P>Run configure; this will set up the headers and generate makefiles. From the command prompt change to the &lt;boost&gt;/libs/regex directory and type: </P>
<TT><PRE>./configure
make</PRE>
</TT><P>Other make options include: </P>
<P>make jgrep: builds the jgrep demo. </P>
<P>make test: builds and runs the regression tests. </P>
<P>make timer: builds the timer demo program. </P>
<P>Note that the configure generated makefiles produce only a static library. If you would prefer to build a shared library, there is a generic.mak makefile in the &lt;boost&gt;/libs/regex/lib directory; to use this you will need to set up a number of environment variables first (see the makefile for more details). Finally, if you use one of the following compilers: Kai C++, SGI Irix C++, Compaq Tru64 C++, or Como C++, then you should not need to run the configure script to get the library to build; however, doing so may enable optional features (multithreading support and/or nls support).</P>
<P><B>Troubleshooting:</B></P>
<P>If make fails after running configure, you may need to manually disable some options: configure uses simple tests to determine what features your compiler supports, and does not stress the compiler's internals to the degree that the actual regex++ code can. Other compiler features may be implemented (and therefore detected by configure) but known to be buggy; again, in this case it may be necessary to disable the feature in order to compile regex++ to stable code. The output file from configure is &lt;boost&gt;/boost/re_detail/regex_options.hpp. This file lists all the macros that can be defined to configure regex++, along with a description to illustrate their usage; experiment with changing options in regex_options.hpp one at a time until you achieve the effect you require. If you mail me questions about configure output, be sure to include both regex_options.hpp and config.log with your message. </P>
<P><HR></P>
<P><I>Copyright </I><A HREF="mailto:John_Maddock@compuserve.com"><I>Dr John Maddock</I></A><I> 1998-2001 all rights reserved.</I></P></BODY>