From f414a067b8a214187fffe13710805afaa65b5be0 Mon Sep 17 00:00:00 2001
From: John Maddock
Date: Sat, 3 Mar 2001 11:32:04 +0000
Subject: [PATCH] regex doc updates for partial matches and revised makefiles

[SVN r9394]
---
 index.htm              |    7 +-
 introduction.htm       |  539 ++----
 template_class_ref.htm | 3805 +++++++++++++++++++---------------------
 3 files changed, 1956 insertions(+), 2395 deletions(-)

diff --git a/index.htm b/index.htm
index c2f4b818..37839938 100644
--- a/index.htm
+++ b/index.htm
@@ -75,6 +75,8 @@ It is provided "as is" without express or implied warranty.
  • Algorithm regex_split
  • +
  • Partial + regular expression matches
  • Class RegEx reference
  • @@ -117,7 +119,8 @@ It is provided "as is" without express or implied warranty. regex_split example: split a string into tokens.
  • snip9.cpp: - regex_split example: spit out linked URL's.
  • + regex_split example: spit out linked + URL's.
  • Header Files.
  • @@ -133,6 +136,6 @@ It is provided "as is" without express or implied warranty.

    Copyright Dr -John Maddock 1998-2000 all rights reserved.

    +John Maddock 1998-2001 all rights reserved.

    diff --git a/introduction.htm b/introduction.htm
    index 9c2995da..5b50fc89 100644
    --- a/introduction.htm
    +++ b/introduction.htm
    @@ -1,30 +1,21 @@
    -
    +
    +
    +
    +
    +regex++, Introduction
    +
    +
    +
    +
    -
    -
    -
    -
    -
    -regex++, Introduction
    -
    -
    -
    -
    -

     

    - - - - - + +

    C++ Boost

    -

    Regex++, - Introduction.

    -

    (version 3.04, 18 April 2000)

    -
    Copyright (c) 1998-2000
    +

     

    + + + - -
    +

    C++ Boost

    +

    Regex++, Introduction.

    +

    (version 3.04, 18 April 2000)

    +
    Copyright (c) 1998-2000
     Dr John Maddock
     
     Permission to use, copy, modify, distribute and sell this software
    @@ -33,400 +24,126 @@ provided that the above copyright notice appear in all copies and
     that both that copyright notice and this permission notice appear
     in supporting documentation.  Dr John Maddock makes no representations
     about the suitability of this software for any purpose.  
    -It is provided "as is" without express or implied warranty.
    -
    +It is provided "as is" without express or implied warranty.
    -
    +


    +

    Introduction

    +

    Regular expressions are a form of pattern-matching that is often used in text processing; many users will be familiar with the Unix utilities grep, sed and awk, and the programming language perl, each of which makes extensive use of regular expressions. Traditionally C++ users have been limited to the POSIX C APIs for manipulating regular expressions, and while regex++ does provide these APIs, they do not represent the best way to use the library. For example regex++ can cope with wide character strings, or search and replace operations (in a manner analogous to either sed or perl), something that traditional C libraries cannot do.

    +

    The class boost::reg_expression is the key class in this library; it represents a "machine readable" regular expression, and is very closely modelled on std::basic_string: think of it as a string plus the actual state-machine required by the regular expression algorithms. As with std::basic_string, there are two typedefs that are almost always the means by which this class is referenced:

    +
    namespace boost{
     
    -

    Introduction

    +template <class charT,
    +          class traits = regex_traits<charT>,
    +          class Allocator = std::allocator<charT> >
    +class reg_expression;
    -

    Regular expressions are a form of pattern-matching that are -often used in text processing; many users will be familiar with -the Unix utilities grep, sed and awk, and -the programming language perl, each of which make -extensive use of regular expressions. Traditionally C++ users -have been limited to the POSIX C API's for manipulating regular -expressions, and while regex++ does provide these API's, they do -not represent the best way to use the library. For example regex++ -can cope with wide character strings, or search and replace -operations (in a manner analogous to either sed or perl), -something that traditional C libraries can not do.

    +typedef reg_expression<char> regex;
    +typedef reg_expression<wchar_t> wregex;
    -

    The class boost::reg_expression -is the key class in this library; it represents a "machine -readable" regular expression, and is very closely modelled -on std::basic_string, think of it as a string plus the actual -state-machine required by the regular expression algorithms. Like -std::basic_string there are two typedefs that are almost always -the means by which this class is referenced:

    - -
    namespace boost{
    -
    -template <class charT, 
    -          class traits = regex_traits<charT>, 
    -          class Allocator = std::allocator<charT> >
    -class reg_expression;
    -
    -typedef reg_expression<char> regex;
    -typedef reg_expression<wchar_t> wregex;
    -
    -}
    - -

    To see how this library can be used, imagine that we are -writing a credit card processing application. Credit card numbers -generally come as a string of 16-digits, separated into groups of -4-digits, and separated by either a space or a hyphen. Before -storing a credit card number in a database (not necessarily -something your customers will appreciate!), we may want to verify -that the number is in the correct format. To match any digit we -could use the regular expression [0-9], however ranges of -characters like this are actually locale dependent. Instead we -should use the POSIX standard form [[:digit:]], or the regex++ -and perl shorthand for this \d (note that many older libraries -tended to be hard-coded to the C-locale, consequently this was -not an issue for them). That leaves us with the following regular -expression to validate credit card number formats:

    - -

    (\d{4}[- ]){3}\d{4}

    - -

    Here the parenthesis act to group (and mark for future -reference) sub-expressions, and the {4} means "repeat -exactly 4 times". This is an example of the extended regular -expression syntax used by perl, awk and egrep. Regex++ also -supports the older "basic" syntax used by sed and grep, -but this is generally less useful, unless you already have some -basic regular expressions that you need to reuse.

    - -

    Now lets take that expression and place it in some C++ code to -validate the format of a credit card number:

    - -
    bool validate_card_format(const std::string s)
    +}
    +

    To see how this library can be used, imagine that we are writing a credit card processing application. Credit card numbers generally come as a string of 16 digits, separated into groups of 4 digits by either a space or a hyphen. Before storing a credit card number in a database (not necessarily something your customers will appreciate!), we may want to verify that the number is in the correct format. To match any digit we could use the regular expression [0-9]; however, ranges of characters like this are actually locale dependent. Instead we should use the POSIX standard form [[:digit:]], or the regex++ and perl shorthand for this, \d (note that many older libraries tended to be hard-coded to the C-locale, so this was not an issue for them). That leaves us with the following regular expression to validate credit card number formats:

    +

    (\d{4}[- ]){3}\d{4}

    +

    Here the parentheses act to group (and mark for future reference) sub-expressions, and the {4} means "repeat exactly 4 times". This is an example of the extended regular expression syntax used by perl, awk and egrep. Regex++ also supports the older "basic" syntax used by sed and grep, but this is generally less useful, unless you already have some basic regular expressions that you need to reuse.

    +

    Now let's take that expression and place it in some C++ code to validate the format of a credit card number:

    +
    bool validate_card_format(const std::string s)
     {
    -   static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    -   return regex_match(s, e);
    -}
    +   static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    +   return regex_match(s, e);
    +}
    +

    Note how we had to add some extra escapes to the expression: remember that the escape is seen once by the C++ compiler, before it gets to be seen by the regular expression engine, consequently escapes in regular expressions have to be doubled up when embedding them in C/C++ code.
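As a rough, self-contained illustration of the doubled escapes, the sketch below rewrites the validation function using std::regex from the C++ standard library. That substitution is an assumption made only so that the example compiles without first building regex++; the pattern and the function name come from the text above, and boost::regex may differ in detail.

```cpp
#include <regex>
#include <string>

// Stand-in for the boost::regex version above: std::regex is used here
// only so that the sketch needs no separately built library.
bool validate_card_format(const std::string& s)
{
    // Each backslash is doubled: the C++ compiler turns "\\d" into \d,
    // which is what the regular expression engine finally sees.
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    // regex_match succeeds only if the whole string matches the expression.
    return std::regex_match(s, e);
}
```

With this definition a number written with hyphens or spaces between the groups is accepted, while sixteen digits with no separators are rejected, because the separators are not optional in this expression.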

    +

    Those of you who are familiar with credit card processing will have realised that while the format used above is suitable for human readable card numbers, it does not represent the format required by online credit card systems; these require the number as a string of 16 (or possibly 15) digits, without any intervening spaces. What we need is a means to convert easily between the two formats, and this is where search and replace comes in. Those who are familiar with the utilities sed and perl will already be ahead here; we need two strings - one a regular expression - the other a "format string" that provides a description of the text to replace the match with. In regex++ this search and replace operation is performed with the algorithm regex_merge; for our credit card example we can write two algorithms like this to provide the format conversions:

    +
    +// match any format with the regular expression:
    +const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
    +const std::string machine_format("\\1\\2\\3\\4");
    +const std::string human_format("\\1-\\2-\\3-\\4");
     
    -

    Note how we had to add some extra escapes to the expression: -remember that the escape is seen once by the C++ compiler, before -it gets to be seen by the regular expression engine, consequently -escapes in regular expressions have to be doubled up when -embedding them in C/C++ code.

    - -

    Those of you who are familiar with credit card processing, -will have realised that while the format used above is suitable -for human readable card numbers, it does not represent the format -required by online credit card systems; these require the number -as a string of 16 (or possibly 15) digits, without any -intervening spaces. What we need is a means to convert easily -between the two formats, and this is where search and replace -comes in. Those who are familiar with the utilities sed -and perl will already be ahead here; we need two strings - -one a regular expression - the other a "format string" that provides a -description of the text to replace the match with. In regex++ -this search and replace operation is performed with the algorithm -regex_merge, for our credit card example we can write two -algorithms like this to provide the format conversions:

    - -
    -// match any format with the regular expression:
    -const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
    -const std::string machine_format("\\1\\2\\3\\4");
    -const std::string human_format("\\1-\\2-\\3-\\4");
    -
    -std::string machine_readable_card_number(const std::string s)
    +std::string machine_readable_card_number(const std::string s)
     {
    -   return regex_merge(s, e, machine_format, boost::match_default | boost::format_sed);
    +   return regex_merge(s, e, machine_format, boost::match_default | boost::format_sed);
     }
     
    -std::string human_readable_card_number(const std::string s)
    +std::string human_readable_card_number(const std::string s)
     {
    -   return regex_merge(s, e, human_format, boost::match_default | boost::format_sed);
    -}
    +   return regex_merge(s, e, human_format, boost::match_default | boost::format_sed);
    +}
    +

    Here we've used marked sub-expressions in the regular expression to split out the four parts of the card number as separate fields; the format string then uses the sed-like syntax to replace the matched text with the reformatted version.
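The two conversions can be tried out with the sketch below. It is not the regex++ code itself: it assumes std::regex and std::regex_replace from the C++ standard library as stand-ins for boost::regex and regex_merge, and it anchors the expression with ^ and $ because the default std::regex grammar does not provide \A and \z.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex, and
// std::regex_replace stands in for regex_merge.
static const std::regex e("^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
static const std::string machine_format("\\1\\2\\3\\4");
static const std::string human_format("\\1-\\2-\\3-\\4");

std::string machine_readable_card_number(const std::string& s)
{
    // format_sed selects sed-style format strings, in which \N refers
    // to the N-th marked sub-expression.
    return std::regex_replace(s, e, machine_format,
        std::regex_constants::match_default | std::regex_constants::format_sed);
}

std::string human_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, e, human_format,
        std::regex_constants::match_default | std::regex_constants::format_sed);
}
```

In this sketch a string that does not match the expression is returned unchanged, so a caller that needs to reject malformed numbers would still validate the input first.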

    +

    In the examples above, we haven't directly manipulated the results of a regular expression match; however, in general the result of a match contains a number of sub-expression matches in addition to the overall match. When the library needs to report a regular expression match it does so using an instance of the class match_results. As before, there are typedefs of this class for the two most common cases:

    +
    namespace boost{
    +typedef match_results<const char*> cmatch;
    +typedef match_results<const wchar_t*> wcmatch;
    +}
    +

    The algorithms regex_search and regex_grep (i.e. finding all matches in a string) make use of match_results to report what matched.
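To give an idea of how a match result is inspected, here is a minimal sketch that searches a C-string and reads one sub-expression back out. It assumes the standard library's std::regex_search and std::cmatch as stand-ins for the regex++ equivalents; the function name last_four_digits is invented for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the last group of four digits of the first
// card number found in text, or an empty string if none is found.
std::string last_four_digits(const char* text)
{
    static const std::regex e("(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
    std::cmatch what;   // match results for const char* iterators
    if (std::regex_search(text, what, e))
    {
        // what[0] is the overall match; what[1]..what[4] are the
        // marked sub-expressions, numbered from the left.
        return what[4].str();
    }
    return std::string();
}
```

Unlike the whole-string validation earlier, a search succeeds when the expression matches anywhere in the text, which is why the surrounding characters do not need to be described by the expression.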

    +

    Note that these algorithms are not restricted to searching regular C-strings; any bidirectional iterator type can be searched, allowing for the possibility of seamlessly searching almost any kind of data.

    +

    For search and replace operations in addition to the algorithm regex_merge that we have already seen, the algorithm regex_format takes the result of a match and a format string, and produces a new string by merging the two.

    +

    For those that dislike templates, there is a high level wrapper class RegEx that is an encapsulation of the lower level template code - it provides a simplified interface for those that don't need the full power of the library, and supports only narrow characters, and the "extended" regular expression syntax.

    +

    The POSIX API functions: regcomp, regexec, regfree and regerror, are available in both narrow character and Unicode versions, and are provided for those who need compatibility with these APIs.
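For readers who have not met these functions, the sketch below shows the usual compile, execute, free sequence. It is written against the platform's own POSIX header <regex.h>, which is an assumption: which header regex++ uses to declare its versions is not stated here, and the function name posix_all_digits is invented for the example.

```cpp
#include <regex.h>   // assumption: the platform's POSIX header, not a regex++ header

// Hypothetical helper: returns true if text consists only of decimal digits.
bool posix_all_digits(const char* text)
{
    regex_t re;
    // REG_EXTENDED selects the "extended" syntax; REG_NOSUB says that only
    // a yes/no answer is wanted, not the positions of sub-expressions.
    if (regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;                 // the expression failed to compile
    const int result = regexec(&re, text, 0, nullptr, 0);
    regfree(&re);                     // always release the compiled expression
    return result == 0;               // 0 means a match was found
}
```

The explicit regfree call is the main practical difference from the C++ classes, which release their state automatically when they go out of scope.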

    +

    Finally, note that the library now has run-time localization support, and recognizes the full POSIX regular expression syntax - including advanced features like multi-character collating elements and equivalence classes - as well as providing compatibility with other regular expression libraries including GNU and BSD4 regex packages, and to a more limited extent perl 5.

    +

    Installation and Configuration Options

    +

    [ Important: If you are upgrading from version 3.04x of this library then you will find a number of changes to the documented header names and library interfaces; existing code should still compile unchanged, however - see Note for Upgraders. ]

    +

    When you extract the library from its zip file, you must preserve its internal directory structure (for example by using the -d option when extracting). If you didn't do that when extracting, then you'd better stop reading this, delete the files you just extracted, and try again!

    +

    Currently the library will automatically detect and configure itself for Borland, Microsoft and gcc compilers only. The library will also detect the HP, SGI, Rogue Wave, or Microsoft STL implementations. If the STL type is detected, then the library will attempt to extract suitable compiler configuration options from the STL used. Otherwise the library will assume that the compiler is fully compliant with the C++ standard, unless various options are defined to disable features not implemented by your compiler. These options are documented in <boost/re_detail/regex_options.hpp>; if you want to add permanent configuration options, add them to <boost/re_detail/regex_options.hpp>, which is provided for this purpose - this will allow you to keep your configuration options between library versions by retaining <boost/re_detail/regex_options.hpp>.

    +

    The library will encase all code inside namespace boost.

    +

    Unlike some other template libraries, this library consists of a mixture of template code (in the headers) and static code and data (in cpp files). Consequently it is necessary to build the library's support code into a library or archive file before you can use it; instructions for specific platforms are as follows:

    +

    Borland C++ Builder:

    -

    Here we've used marked sub-expressions in the regular -expression to split out the four parts of the card number as -separate fields, the format string then uses the sed-like syntax -to replace the matched text with the reformatted version.

    + -

    In the examples above, we haven't directly manipulated the -results of a regular expression match, however in general the -result of a match contains a number of sub-expression matches in -addition to the overall match. When the library needs to report a -regular expression match it does so using an instance of the -class match_results, -as before there are typedefs of this class for the two most -common cases:

    - -
    namespace boost{
    -typedef match_results<const char*> cmatch;
    -typedef match_results<const wchar_t*> wcmatch;
    -}
    - -

    The algorithms regex_search -and regex_grep (i.e. -finding all matches in a string) make use of match_results to -report what matched.

    - -

    Note that these algorithms are not restricted to searching -regular C-strings, any bidirectional iterator type can be -searched, allowing for the possibility of seamlessly searching -almost any kind of data.

    - -

    For search and replace operations in addition to the algorithm -regex_merge that -we have already seen, the algorithm regex_format takes -the result of a match and a format string, and produces a new -string by merging the two.

    - -

    For those that dislike templates, there is a high level -wrapper class RegEx that is an encapsulation of the lower level -template code - it provides a simplified interface for those that -don't need the full power of the library, and supports only -narrow characters, and the "extended" regular -expression syntax.

    - -

    The POSIX API functions: -regcomp, regexec, regfree and regerror, are available in both -narrow character and Unicode versions, and are provided for those -who need compatibility with these API's.

    - -

    Finally, note that the library now has run-time localization support, and -recognizes the full POSIX regular expression syntax - including -advanced features like multi-character collating elements and -equivalence classes - as well as providing compatibility with -other regular expression libraries including GNU and BSD4 regex -packages, and to a more limited extent perl 5.

    - -

    Installation and -Configuration Options

    - -

    [ Important: If you are -upgrading from version 3.04x of this library then you will find a -number of changes to the documented header names and library -interfaces, existing code should still compile unchanged however -- see Note -for Upgraders. ]

    - -

    When you extract the library from its zip file, you must -preserve its internal directory structure (for example by using -the -d option when extracting). If you didn't do that when -extracting, then you'd better stop reading this, delete the files -you just extracted, and try again!

    - -

    Currently the library will automatically detect and configure -itself for Borland, Microsoft and gcc compilers only. The library -will also detect the HP, SGI, Rogue Wave, or Microsoft STL -implementations. If the STL type is detected, then the library -will attempt to extract suitable compiler configuration options -from the STL used. Otherwise the library will assume that the -compiler is fully compliant with the C++ standard: unless various -options are defined to depreciate features not implemented by -your compiler. These options are documented in <boost/re_detail/regex_options.hpp>, -if you want to add permanent configuration options add them to -<boost/re_detail/regex_options.hpp> which is provided for -this purpose - this will allow you to keep your configuration -options between library versions by retaining <boost/re_detail/regex_options.hpp>. -

    - -

    The library will encase all code inside namespace boost.

    - -

    Unlike some other template libraries, this library consists of -a mixture of template code (in the headers) and static code and -data (in cpp files). Consequently it is necessary to build the -library's support code into a library or archive file before you -can use it, instructions for specific platforms are as follows:

    - -

    Borland C++ Builder:

    - - - -
    make -fbcb5.mak
    - -

    The build process will build a variety of .lib and .dll files -(the exact number depends upon the version of Borland's tools you -are using) the .lib and dll files will be in a sub-directory -called bcb4 or bcb5 depending upon the makefile used. To install -the libraries into your development system use:

    - -

    make -fbcb5.mak install

    - -

    library files will be copied to <BCROOT>/lib and the dll's -to <BCROOT>/bin, where <BCROOT> corresponds to the -install path of your Borland C++ tools.

    - -

    You may also remove temporary files created during the build -process (excluding lib and dll files) by using:

    - -

    make -fbcb5.mak clean

    - -

    Finally when you use regex++ it is only necessary for you to -add the <boost> root director to your list of include -directories for that project. It is not necessary for you to -manually add a .lib file to the project; the headers will -automatically select the correct .lib file for your build mode -and tell the linker to include it. There is one caveat however: -the library can not tell the difference between VCL and non-VCL -enabled builds when building a GUI application from the command -line, if you build from the command line with the 5.5 command -line tools then you must define the pre-processor symbol _NO_VCL -in order to ensure that the correct link libraries are selected: -the C++ Builder IDE normally sets this automatically. Hint, users -of the 5.5 command line tools may want to add a -D_NO_VCL to bcc32.cfg -in order to set this option permanently.

    - -

    Microsoft Visual C++ 6

    - -

    You need version 6 of MSVC to build this library. If you are -using VC5 then you may want to look at one of the previous -releases of this library -

    - -

    Open up a command prompt, which has the necessary MSVC -environment variables defined (for example by using the batch -file Vcvars32.bat installed by the Visual Studio installation), -and change to the <boost>\libs\regex\lib directory.

    - -

    Select the correct makefile - vc6.mak for "vanilla" -Visual C++ 6 or vc6-stlport.mak if you are using STLPort.

    - -

    Invoke the makefile like this:

    - -

    nmake -fvc6.mak

    - -

    You will now have a collection of lib and dll files in a -"vc6" subdirectory, to install these into your -development system use:

    - -

    nmake -fvc6.mak install

    - -

    The lib files will be copied to your <VC6>\lib directory -and the dll files to <VC6>\bin, where <VC6> is the -root of your Visual C++ 6 installation.

    - -

    You can delete all the temporary files created during the -build (excluding lib and dll files) using:

    - -

    nmake -fvc6.mak clean

    - -

    Finally when you use regex++ it is only necessary for you to -add the <boost> root directory to your list of include -directories for that project. It is not necessary for you to -manually add a .lib file to the project; the headers will -automatically select the correct .lib file for your build mode -and tell the linker to include it.

    - -

    Important: there have been some -reports of compiler-optimisation bugs affecting this library, the -workaround is to build the library using /Oityb1 rather than /O2. -That is to use all optimisation settings except /Oa. This problem -is reported to affect some standard library code as well (in fact -I'm not sure if the problem is with the regex code or the -underlying standard library), so it's probably worthwhile -applying this workaround in normal practice in any case.

    - -

    Note: if you have replaced the C++ standard library that comes -with VC6, then when you build the library you must ensure that -the environment variables "INCLUDE" and "LIB" -have been updated to reflect the include and library paths for -the new library - see vcvars32.bat (part of your Visual Studio -installation) for more details.

    - -

    If you are building with the full STLPort v4, then use the vc6-stlport.mak -file provided (The full STLPort libraries appear not to support -single-thread static builds).

    - -

    GCC(2.95)

    - -

    There is a conservative makefile for the g++ compiler. From -the command prompt change to the <boost>/libs/regex/lib -directory and type:

    - -

    make -fgcc.mak

    - -

    At the end of the build process you should have a gcc sub-directory -containing release and debug versions of the library (libregex++.a -and libregex++debug.a). When you build projects that use regex++, -you will need to add the boost install directory to your list of -include paths and add <boost>/libs/gcc/regex++ to your list -of library files.

    - -

    Otherwise: run configure, this will set up the headers and -generate makefiles, from the command prompt change to the <boost>/libs/regex -directory and type:

    - -
    configure
    -make
    - -

    Other make options include:

    - -

    make jgrep: builds the jgrep demo.

    - -

    make test: builds and runs the regression tests.

    - -

    make timer: builds the timer demo program.

    - -

    Note: gcc2.95.x on Win32 is only supported as cygwin and not -mingw32 (sorry but compiler related bugs prevent this).

    - -

    Other compilers:

    - -

    Run configure, this will set up the headers and generate -makefiles: from the command prompt change to the <boost>/libs/regex -directory and type:

    - -
    configure
    -make
    - -

    Other make options include:

    - -

    make jgrep: builds the jgrep demo.

    - -

    make test: builds and runs the regression tests.

    - -

    make timer: builds the timer demo program.

    - -

    Troubleshooting:

    - -

    If make fails after running configure, you may need to -manually disable some options: configure uses simple tests to -determine what features your compiler supports, it does not -stress the compiler's internals to any degree as the actual regex++ -code can do. Other compiler features may be implemented (and -therefore detected by configure) but known to be buggy, again in -this case it may be necessary to disable the feature in order to -compile regex++ to stable code. The output file from configure is -<boost>/boost/re_detail/regex_options.hpp, this file lists -all the macros that can be defined to configure regex++ along -with a description to illustrate their usage, experiment changing -options in regex_options.hpp one at a time until you achieve the -effect you require. If you mail me questions about configure -output, be sure to include both regex_options.hpp and config.log -with your message.

    - -
    - -

    Copyright Dr -John Maddock 1998-2000 all rights reserved.

    - - +
    make -fbcb5.mak
    +

    The build process will build a variety of .lib and .dll files (the exact number depends upon the version of Borland's tools you are using). The .lib and .dll files will be in a sub-directory called bcb4 or bcb5 depending upon the makefile used. To install the libraries into your development system use:

    +

    make -fbcb5.mak install

    +

    The library files will be copied to <BCROOT>/lib and the DLLs to <BCROOT>/bin, where <BCROOT> corresponds to the install path of your Borland C++ tools.

    +

    You may also remove temporary files created during the build process (excluding lib and dll files) by using:

    +

    make -fbcb5.mak clean

    +

    Finally, when you use regex++ it is only necessary for you to add the <boost> root directory to your list of include directories for that project. It is not necessary for you to manually add a .lib file to the project; the headers will automatically select the correct .lib file for your build mode and tell the linker to include it. There is one caveat, however: the library cannot tell the difference between VCL and non-VCL enabled builds when building a GUI application from the command line. If you build from the command line with the 5.5 command line tools then you must define the pre-processor symbol _NO_VCL in order to ensure that the correct link libraries are selected; the C++ Builder IDE normally sets this automatically. Hint: users of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg in order to set this option permanently.

    +

    Microsoft Visual C++ 6

    +

    You need version 6 of MSVC to build this library. If you are using VC5 then you may want to look at one of the previous releases of this library.

    +

    Open up a command prompt, which has the necessary MSVC environment variables defined (for example by using the batch file Vcvars32.bat installed by the Visual Studio installation), and change to the <boost>\libs\regex\lib directory.

    +

    Select the correct makefile - vc6.mak for "vanilla" Visual C++ 6 or vc6-stlport.mak if you are using STLPort.

    +

    Invoke the makefile like this:

    +

    nmake -fvc6.mak

    +

    You will now have a collection of lib and dll files in a "vc6" subdirectory. To install these into your development system use:

    +

    nmake -fvc6.mak install

    +

    The lib files will be copied to your <VC6>\lib directory and the dll files to <VC6>\bin, where <VC6> is the root of your Visual C++ 6 installation.

    +

    You can delete all the temporary files created during the build (excluding lib and dll files) using:

    +

    nmake -fvc6.mak clean

    +

    Finally when you use regex++ it is only necessary for you to add the <boost> root directory to your list of include directories for that project. It is not necessary for you to manually add a .lib file to the project; the headers will automatically select the correct .lib file for your build mode and tell the linker to include it.

    +

    Important: there have been some reports of compiler-optimisation bugs affecting this library; the workaround is to build the library using /Oityb1 rather than /O2, that is, to use all optimisation settings except /Oa. This problem is reported to affect some standard library code as well (in fact I'm not sure if the problem is with the regex code or the underlying standard library), so it's probably worthwhile applying this workaround in normal practice in any case.

    +

    Note: if you have replaced the C++ standard library that comes with VC6, then when you build the library you must ensure that the environment variables "INCLUDE" and "LIB" have been updated to reflect the include and library paths for the new library - see vcvars32.bat (part of your Visual Studio installation) for more details.

    +

    If you are building with the full STLPort v4, then use the vc6-stlport.mak file provided (The full STLPort libraries appear not to support single-thread static builds).

    +

    GCC(2.95)

    +

    There is a conservative makefile for the g++ compiler. From the command prompt change to the <boost>/libs/regex/lib directory and type:

    +

    make -fgcc.mak

    +

    At the end of the build process you should have a gcc sub-directory containing release and debug versions of the library (libboost_regex.a and libboost_regex_debug.a). When you build projects that use regex++, you will need to add the boost install directory to your list of include paths and add <boost>/libs/gcc/libboost_regex.a to your list of library files.

    +

    There is also a makefile to build the library as a shared library:

    +

    make -fgcc-shared.mak

    +

    which will build libboost_regex.so and libboost_regex_debug.so.

    +

    Both of these makefiles support the following environment variables:

    +

    CXXFLAGS: extra compiler options - note that this applies to both the debug and release builds.

    +

    INCLUDES: additional include directories.

    +

    LDFLAGS: additional linker options.

    +

    LIBS: additional library files.

    +

    For the more adventurous there is a configure script in <boost>/libs/regex; this will enable things like multithreading, wide character and nls support if they are not enabled by default on your platform. When the configure script completes, run one of the makefiles described above.

    +

    Other compilers:

    +

    Run configure; this will set up the headers and generate makefiles. From the command prompt change to the <boost>/libs/regex directory and type:

    +
    ./configure
    +make
    +

    Other make options include:

    +

    make jgrep: builds the jgrep demo.

    +

    make test: builds and runs the regression tests.

    +

    make timer: builds the timer demo program.

    +

    Note that the configure-generated makefiles produce only a static library. If you would prefer to build a shared library, there is a generic.mak makefile in the <boost>/libs/regex/lib directory; to use it you will need to set up a number of environment variables first (see the makefile for more details). Finally, if you use one of the following compilers: Kai C++, SGI Irix C++, Compaq Tru64 C++, or Comeau C++, then you should not need to run the configure script to get the library to build; however, doing so may enable optional features (multithreading support and/or nls support).

    +

    Troubleshooting:

    +

    If make fails after running configure, you may need to manually disable some options. Configure uses simple tests to determine what features your compiler supports; it does not stress the compiler's internals to the degree that the actual regex++ code can. Other compiler features may be implemented (and therefore detected by configure) but known to be buggy; in that case too it may be necessary to disable the feature in order to compile regex++ to stable code. The output file from configure is <boost>/boost/re_detail/regex_options.hpp; this file lists all the macros that can be defined to configure regex++, along with a description to illustrate their usage. Experiment with changing options in regex_options.hpp one at a time until you achieve the effect you require. If you mail me questions about configure output, be sure to include both regex_options.hpp and config.log with your message.

    +


    +

    Copyright Dr John Maddock 1998-2001 all rights reserved.

    +
    diff --git a/template_class_ref.htm b/template_class_ref.htm
    index 01311d0c..8ab5dbb2 100644
    --- a/template_class_ref.htm
    +++ b/template_class_ref.htm
    @@ -1,28 +1,20 @@
    +Regex++, template class and algorithm reference
    -Regex++, template class and algorithm reference

     

    - - - - - + +

    C++ Boost

    -

    Regex++, - Template Class and Algorithm Reference.

    -

    (version 3.04, 18 April 2000)

    -
    Copyright (c) 1998-9
    +

     

    + + + - -
    +

    C++ Boost

    +

    Regex++, Template Class and Algorithm Reference.

    +

    (version 3.04, 18 April 2000)

    +
    Copyright (c) 1998-9
     Dr John Maddock
     
     Permission to use, copy, modify, distribute and sell this software
    @@ -31,59 +23,35 @@ provided that the above copyright notice appear in all copies and
     that both that copyright notice and this permission notice appear
     in supporting documentation.  Dr John Maddock makes no representations
     about the suitability of this software for any purpose.  
    -It is provided "as is" without express or implied warranty.
    -
    +It is provided "as is" without express or implied warranty.
    -
    - -

    class regbase

    - -

    #include <boost/regex.hpp> -

    - -

    Class regbase is the template argument independent base class -for reg_expression, the only public members are the flag_type -enumerated values that determine how regular expressions are -interpreted.

    - -
    class regbase
    +


    +

    class regbase

    +

    #include <boost/regex.hpp>

    +

    Class regbase is the template-argument-independent base class for reg_expression; its only public members are the flag_type enumerated values that determine how regular expressions are interpreted.

    +
    class regbase
     {
    -public:
    -   enum flag_type_
    +public:
    +   enum flag_type_
        {
    -      escape_in_lists = 1,                          // '\\' special inside [...] 
    -      char_classes = escape_in_lists << 1,          // [[:CLASS:]] allowed 
    -      intervals = char_classes << 1,                // {x,y} allowed 
    -      limited_ops = intervals << 1,                 // all of + ? and | are normal characters 
    -      newline_alt = limited_ops << 1,               // \n is the same as | 
    -      bk_plus_qm = newline_alt << 1,                // uses \+ and \? 
    -      bk_braces = bk_plus_qm << 1,                  // uses \{ and \} 
    -      bk_parens = bk_braces << 1,                   // uses \( and \) 
    -      bk_refs = bk_parens << 1,                     // \d allowed 
    -      bk_vbar = bk_refs << 1,                       // uses \| 
    -      use_except = bk_vbar << 1,                    // exception on error 
    -      failbit = use_except << 1,                    // error flag 
    -      literal = failbit << 1,                       // all characters are literals 
    -      icase = literal << 1,                         // characters are matched regardless of case 
    -      nocollate = icase << 1,                       // don't use locale specific collation 
    -
    +      escape_in_lists = 1,                          // '\\' special inside [...] 
    +      char_classes = escape_in_lists << 1,          // [[:CLASS:]] allowed 
    +      intervals = char_classes << 1,                // {x,y} allowed 
    +      limited_ops = intervals << 1,                 // all of + ? and | are normal characters 
    +      newline_alt = limited_ops << 1,               // \n is the same as | 
    +      bk_plus_qm = newline_alt << 1,                // uses \+ and \? 
    +      bk_braces = bk_plus_qm << 1,                  // uses \{ and \} 
    +      bk_parens = bk_braces << 1,                   // uses \( and \) 
    +      bk_refs = bk_parens << 1,                     // \d allowed 
    +      bk_vbar = bk_refs << 1,                       // uses \| 
    +      use_except = bk_vbar << 1,                    // exception on error 
    +      failbit = use_except << 1,                    // error flag 
    +      literal = failbit << 1,                       // all characters are literals 
    +      icase = literal << 1,                         // characters are matched regardless of case 
    +      nocollate = icase << 1,                       // don't use locale specific collation 
    +
           basic = char_classes | intervals | limited_ops | bk_braces | bk_parens | bk_refs,
           extended = char_classes | intervals | bk_refs,
           normal = escape_in_lists | char_classes | intervals | bk_refs | nocollate,
    @@ -94,691 +62,611 @@ color="#000080">// don't use locale specific collation 
           sed = basic,
           perl = normal
        }; 
    -   typedef unsigned int flag_type;
    -};   
    +   typedef unsigned int flag_type;
    +};
    +

     

    +

    The enumerated type regbase::flag_type determines the syntax rules for regular expression compilation. The various flags have the following effects:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    regbase::escape_in_lists

    +

    Allows the use of the escape "\" character in sets of characters, for example [\]] represents the set of characters containing only "]". If this flag is not set then "\" is an ordinary character inside sets.

    +

     

    +

     

    +

    regbase::char_classes

    +

    When this bit is set, character classes [:classname:] are allowed inside character set declarations, for example "[[:word:]]" represents the set of all characters that belong to the character class "word".

    +

     

    +

     

    +

    regbase::intervals

    +

    When this bit is set, repetition intervals are allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter a's.

    +

     

    +

     

    +

    regbase::limited_ops

    +

    When this bit is set all of "+", "?" and "|" are ordinary characters in all situations.

    +

     

    +

     

    +

    regbase::newline_alt

    +

    When this bit is set, then the newline character "\n" has the same effect as the alternation operator "|".

    +

     

    +

     

    +

    regbase::bk_plus_qm

    +

    When this bit is set then "\+" represents the one or more repetition operator and "\?" represents the zero or one repetition operator. When this bit is not set then "+" and "?" are used instead.

    +

     

    +

     

    +

    regbase::bk_braces

    +

    When this bit is set then "\{" and "\}" are used for bounded repetitions and "{" and "}" are normal characters. This is the opposite of the default behaviour.

    +

     

    +

     

    +

    regbase::bk_parens

    +

    When this bit is set then "\(" and "\)" are used to group sub-expressions and "(" and ")" are ordinary characters. This is the opposite of the default behaviour.

    +

     

    +

     

    +

    regbase::bk_refs

    +

    When this bit is set then back references are allowed.

    +

     

    +

     

    +

    regbase::bk_vbar

    +

    When this bit is set then "\|" represents the alternation operator and "|" is an ordinary character. This is the opposite of the default behaviour.

    +

     

    +

     

    +

    regbase::use_except

    +

    When this bit is set then a bad_expression exception will be thrown on error.  Use of this flag is deprecated - reg_expression will always throw on error.

    +

     

    +

     

    +

    regbase::failbit

    +

    This bit is set on error; if regbase::use_except is not set, this bit should be checked to see whether a regular expression is valid before use.

    +

     

    +

     

    +

    regbase::literal

    +

    All characters in the string are treated as literals; there are no special characters or escape sequences.

    +

     

    +

     

    +

    regbase::icase

    +

    All characters in the string are matched regardless of case.

    +

     

    +

     

    +

    regbase::nocollate

    +

    Locale specific collation is disabled when dealing with ranges in character set declarations.  For example, when this bit is set the expression [a-c] matches only the characters a, b and c regardless of locale, whereas when it is not set [a-c] matches any character that collates in the range a to c.

    +

     

    +

     

    +

    regbase::basic

    +

    Equivalent to the POSIX basic regular expression syntax: char_classes | intervals | limited_ops | bk_braces | bk_parens | bk_refs.

    +

     

    +

     

    +

    regbase::extended

    +

    Equivalent to the POSIX extended regular expression syntax: char_classes | intervals | bk_refs.

    +

     

    +

     

    +

    regbase::normal

    +

    This is the default setting, and represents how most people expect the library to behave. Equivalent to the POSIX extended syntax, but with locale specific collation disabled, and escape characters inside set declarations enabled: regbase::escape_in_lists | regbase::char_classes | regbase::intervals | regbase::bk_refs | regbase::nocollate.

    +

     

    +

     

    +

    regbase::emacs

    +

    Provides compatibility with the emacs editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.

    +

     

    +

     

    +

    regbase::awk

    +

    Provides compatibility with the Unix utility Awk: the same as POSIX extended regular expressions, but allows escapes inside bracket-expressions (character sets). Equivalent to extended | escape_in_lists.

    +

     

    +

     

    +

    regbase::grep

    +

    Provides compatibility with the Unix grep utility: the same as POSIX basic regular expressions, but with the newline character equivalent to the alternation operator. Equivalent to basic | newline_alt.

    +

     

    +

     

    +

    regbase::egrep

    +

    Provides compatibility with the Unix egrep utility: the same as POSIX extended regular expressions, but with the newline character equivalent to the alternation operator. Equivalent to extended | newline_alt.

    +

     

    +

     

    +

    regbase::sed

    +

    Provides compatibility with the Unix sed utility: the same as POSIX basic regular expressions.

    +

     

    +

     

    +

    regbase::perl

    +

    Provides compatibility with the Perl programming language: the same as regbase::normal.

    +

     

    -

     

    +
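The flag table above is plain bit arithmetic, which a short sketch may make easier to see. The enum below is a local copy of the documented values placed in a made-up namespace `sketch` so that it compiles without the library; `has_flag` and `normal_icase` are hypothetical helpers that are not part of regex++. The `grep` and `egrep` definitions are taken from the table rows rather than from the header.

```cpp
#include <cassert>

// Local mirror of the documented flag values. This is NOT the library
// header; it only reproduces the same arithmetic for illustration.
namespace sketch {

enum flag_type_ {
    escape_in_lists = 1,
    char_classes    = escape_in_lists << 1,
    intervals       = char_classes << 1,
    limited_ops     = intervals << 1,
    newline_alt     = limited_ops << 1,
    bk_plus_qm      = newline_alt << 1,
    bk_braces       = bk_plus_qm << 1,
    bk_parens       = bk_braces << 1,
    bk_refs         = bk_parens << 1,
    bk_vbar         = bk_refs << 1,
    use_except      = bk_vbar << 1,
    failbit         = use_except << 1,
    literal         = failbit << 1,
    icase           = literal << 1,
    nocollate       = icase << 1,

    // Composite syntaxes are unions of the single-bit options.
    basic    = char_classes | intervals | limited_ops | bk_braces | bk_parens | bk_refs,
    extended = char_classes | intervals | bk_refs,
    normal   = escape_in_lists | char_classes | intervals | bk_refs | nocollate,
    grep     = basic | newline_alt,
    egrep    = extended | newline_alt
};
typedef unsigned int flag_type;

// Hypothetical helper: true when every bit of `bit` is present in `flags`.
inline bool has_flag(flag_type flags, flag_type bit) {
    return (flags & bit) == bit;
}

// Hypothetical helper: the default syntax plus case-insensitive matching.
inline flag_type normal_icase() {
    return static_cast<flag_type>(normal) | static_cast<flag_type>(icase);
}

} // namespace sketch
```

With the real library the same expression, `regbase::normal | regbase::icase`, would be passed wherever a `flag_type` argument is accepted; testing a bit with `&` is how a composite such as `normal` can be checked for an option such as `nocollate`.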


    +

    Exception classes.

    +

    #include <boost/pat_except.hpp>

    +

    An instance of bad_expression is thrown whenever a bad regular expression is encountered.

    +
    namespace boost{
     
    -

    The enumerated type regbase::flag_type determines the -syntax rules for regular expression compilation, the various -flags have the following effects:

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     regbase::escape_in_listsAllows the use of the escape - "\" character in sets of characters, for - example [\]] represents the set of characters containing - only "]". If this flag is not set then "\" - is an ordinary character inside sets. 
     regbase::char_classesWhen this bit is set, - character classes [:classname:] are allowed inside - character set declarations, for example "[[:word:]]" - represents the set of all characters that belong to the - character class "word". 
     regbase:: intervalsWhen this bit is set, - repetition intervals are allowed, for example "a{2,4}" - represents a repeat of between 2 and 4 letter a's. 
     regbase:: limited_opsWhen this bit is set all of - "+", "?" and "|" are - ordinary characters in all situations. 
     regbase:: newline_altWhen this bit is set, then - the newline character "\n" has the same effect - as the alternation operator "|". 
     regbase:: bk_plus_qmWhen this bit is set then - "\+" represents the one or more repetition - operator and "\?" represents the zero or one - repetition operator. When this bit is not set then "+" - and "?" are used instead. 
     regbase:: bk_bracesWhen this bit is set then - "\{" and "\}" are used for bounded - repetitions and "{" and "}" are - normal characters. This is the opposite of default - behavior. 
     regbase:: bk_parensWhen this bit is set then - "\(" and "\)" are used to group sub-expressions - and "(" and ")" are ordinary - characters, this is the opposite of default behaviour. 
     regbase:: bk_refsWhen this bit is set then - back references are allowed. 
     regbase:: bk_vbarWhen this bit is set then - "\|" represents the alternation operator and - "|" is an ordinary character. This is the - opposite of default behaviour. 
     regbase:: use_exceptWhen this bit is set then a bad_expression exception will - be thrown on error.  Use of this flag is deprecated - - reg_expression will always throw on error. 
     regbase:: failbitThis bit is set on error, if - regbase::use_except is not set, then this bit should be - checked to see if a regular expression is valid before - usage. 
     regbase::literalAll characters in the string - are treated as literals, there are no special characters - or escape sequences. 
     regbase::icaseAll characters in the string - are matched regardless of case. 
     regbase::nocollateLocale specific collation is - disabled when dealing with ranges in character set - declarations.  For example when this bit is set the - expression [a-c] would match the characters a, b and c - only regardless of locale, where as when this is not set - , then [a-c] matches any character which collates in the - range a to c. 
     regbase::basicEquivalent to the POSIX - basic regular expression syntax: char_classes | intervals - | limited_ops | bk_braces | bk_parens | bk_refs. 
     Regbase::extendedEquivalent to the POSIX - extended regular expression syntax: char_classes | - intervals | bk_refs. 
     regbase::normalThis is the - default setting, and represents how most people expect - the library to behave. Equivalent to the POSIX extended - syntax, but with locale specific collation disabled, and - escape characters inside set declarations enabled: - regbase::escape_in_lists | regbase::char_classes | - regbase::intervals | regbase::bk_refs | regbase::nocollate. 
     regbase::emacsProvides - compatability with the emacs editor, eqivalent to: bk_braces - | bk_parens | bk_refs | bk_vbar. 
     regbase::awk Provides - compatabilty with the Unix utility Awk, the same as POSIX - extended regular expressions, but allows escapes inside - bracket-expressions (character sets). Equivalent to - extended | escape_in_lists. 
     regbase::grepProvides - compatabilty with the Unix grep utility, the same as - POSIX basic regular expressions, but with the newline - character equivalent to the alternation operator. the - same as basic | newline_alt. 
     regbase::egrepProvides - compatabilty with the Unix egrep utility, the same as - POSIX extended regular expressions, but with the newline - character equivalent to the alternation operator. the - same as extended | newline_alt. 
     regbase::sedProvides - compatabilty with the Unix sed utility, the same as POSIX - basic regular expressions. 
     regbase::perlProvides - compatibility with the perl programming language, the - same as regbase::normal. 
    - -
    - -

    Exception classes.

    - -

    #include <boost/pat_except.hpp> -

    - -

    An instance of bad_expression is thrown whenever a bad -regular expression is encountered.

    - -
    namespace boost{
    -
    -class bad_pattern : public std::runtime_error
    +class bad_pattern : public std::runtime_error
     {
    -public:
    -   explicit bad_pattern(const std::string& s) : std::runtime_error(s){};
    +public:
    +   explicit bad_pattern(const std::string& s) : std::runtime_error(s){};
     };
     
    -class bad_expression : public bad_pattern
    +class bad_expression : public bad_pattern
     {
    -public:
    -   bad_expression(const std::string& s) : bad_pattern(s) {}
    +public:
    +   bad_expression(const std::string& s) : bad_pattern(s) {}
     };
     
     
    -} // namespace boost
    - -

    Footnotes: the class bad_pattern forms the base class -for all pattern-matching exceptions, of which bad_expression -is one. The choice of std::runtime_error as the base class -for bad_pattern is moot, depending upon how the library is -used exceptions may be either logic errors (programmer supplied -expressions) or run time errors (user supplied expressions).
    -

    - -
    - -

    Class reg_expression

    - -

    #include <boost/regex.hpp> -

    - -

    The template class reg_expression encapsulates regular -expression parsing and compilation. The class derives from class regbase and takes three template -parameters:

    - -

    charT: determines the character type, i.e. -either char or wchar_t.

    - -

    traits: determines the behaviour of the -character type, for example whether character matching is case -sensitive or not, and which character class names are recognized. -A default traits class is provided: regex_traits<charT>. -

    - -

    Allocator: the allocator class used to allocate -memory by the class.

    - -

    For ease of use there are two typedefs that define the two -standard reg_expression instances, unless you want to use -custom allocators, you won't need to use anything other than -these:

    - -
    namespace boost{
    -template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT>  >
    -class reg_expression;
    -typedef reg_expression<char> regex;
    -typedef reg_expression<wchar_t> wregex;
    -}
    - -

    The definition of reg_expression follows: it is based -very closely on class basic_string, and fulfils the requirements -for a container of charT.

    - -
    namespace boost{
    -template <class charT, class traits = char_regex_traits<charT>, class Allocator = std::allocator<charT>  >
    -class reg_expression : public regbase
    +} // namespace boost
    +

    Footnotes: the class bad_pattern forms the base class for all pattern-matching exceptions, of which bad_expression is one. The choice of std::runtime_error as the base class for bad_pattern is moot: depending upon how the library is used, exceptions may be either logic errors (programmer-supplied expressions) or run-time errors (user-supplied expressions).
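Because bad_expression derives from bad_pattern, which in turn derives from std::runtime_error, a caller can catch at whichever level suits it. The sketch below re-declares the two classes in a made-up namespace `sketch` so that it compiles on its own. `compile_or_throw` is a hypothetical stand-in for the library's compile step and only checks for balanced parentheses; `classify` and `caught_as_runtime_error` are likewise illustrative names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {

// Local copies of the two classes as declared above, so the example
// builds without the library; real code would include the header instead.
class bad_pattern : public std::runtime_error {
public:
    explicit bad_pattern(const std::string& s) : std::runtime_error(s) {}
};

class bad_expression : public bad_pattern {
public:
    bad_expression(const std::string& s) : bad_pattern(s) {}
};

// Hypothetical stand-in for expression compilation: rejects any pattern
// whose parentheses do not balance.
inline void compile_or_throw(const std::string& pattern) {
    int depth = 0;
    for (std::string::size_type i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '(') ++depth;
        else if (pattern[i] == ')' && --depth < 0) break;
    }
    if (depth != 0)
        throw bad_expression("unbalanced parentheses in: " + pattern);
}

// Handlers are tried in order, so the most derived type is listed first.
inline std::string classify(const std::string& pattern) {
    try {
        compile_or_throw(pattern);
        return "ok";
    } catch (const bad_expression& e) {
        return std::string("bad_expression: ") + e.what();
    } catch (const std::runtime_error& e) {
        return std::string("runtime_error: ") + e.what();
    }
}

// Code that knows nothing about regex++ can still catch the base class.
inline bool caught_as_runtime_error(const std::string& pattern) {
    try {
        compile_or_throw(pattern);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

} // namespace sketch
```

Catching bad_pattern rather than bad_expression would also pick up any other pattern-matching exception derived from the same base.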

    +


    +

    Class reg_expression

    +

    #include <boost/regex.hpp>

    +

    The template class reg_expression encapsulates regular expression parsing and compilation. The class derives from class regbase and takes three template parameters:

    +

    charT: determines the character type, i.e. either char or wchar_t.

    +

    traits: determines the behaviour of the character type, for example whether character matching is case sensitive or not, and which character class names are recognized. A default traits class is provided: regex_traits<charT>.

    +

    Allocator: the allocator class that reg_expression uses to allocate memory.

    +

    For ease of use there are two typedefs that define the two standard reg_expression instances; unless you want to use custom allocators, you won't need to use anything other than these:

    +
    namespace boost{
    +template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT>  >
    +class reg_expression;
    +typedef reg_expression<char> regex;
    +typedef reg_expression<wchar_t> wregex;
    +}
    +

    The definition of reg_expression follows: it is based very closely on class basic_string, and fulfils the requirements for a container of charT.

    +
    namespace boost{
    +template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT>  >
    +class reg_expression : public regbase
     {
    -public: 
    -   // typedefs:  
    -   typedef charT char_type; 
    -   typedef traits traits_type; 
    -   // locale_type 
    -   // placeholder for actual locale type used by the 
    -   // traits class to localise *this. 
    -   typedef typename traits::locale_type locale_type; 
    -   // value_type 
    -   typedef charT value_type; 
    -   // reference, const_reference 
    -   typedef charT& reference; 
    -   typedef const charT& const_reference; 
    -   // iterator, const_iterator 
    -   typedef const charT* const_iterator; 
    -   typedef const_iterator iterator; 
    -   // difference_type 
    -   typedef typename Allocator::difference_type difference_type; 
    -   // size_type 
    -   typedef typename Allocator::size_type size_type; 
    -   // allocator_type 
    -   typedef Allocator allocator_type; 
    -   typedef Allocator alloc_type; 
    -   // flag_type 
    -   typedef jm_uintfast32_t flag_type; 
    -public: 
    -   // constructorsexplicit reg_expression(const Allocator& a = Allocator()); 
    -   explicit reg_expression(const charT* p, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    -   reg_expression(const charT* p1, const charT* p2, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    -   reg_expression(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator()); 
    -   reg_expression(const reg_expression&); 
    -   template <class ST, class SA> 
    -   explicit reg_expression(const std::basic_string<charT, ST, SA>& p, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    -   template <class I> 
    -   reg_expression(I first, I last, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    +public: 
    +   // typedefs:  
    +   typedef charT char_type; 
    +   typedef traits traits_type; 
    +   // locale_type 
    +   // placeholder for actual locale type used by the 
    +   // traits class to localise *this. 
    +   typedef typename traits::locale_type locale_type; 
    +   // value_type 
    +   typedef charT value_type; 
    +   // reference, const_reference 
    +   typedef charT& reference; 
    +   typedef const charT& const_reference; 
    +   // iterator, const_iterator 
    +   typedef const charT* const_iterator; 
    +   typedef const_iterator iterator; 
    +   // difference_type 
    +   typedef typename Allocator::difference_type difference_type; 
    +   // size_type 
    +   typedef typename Allocator::size_type size_type; 
    +   // allocator_type 
    +   typedef Allocator allocator_type; 
    +   typedef Allocator alloc_type; 
    +   // flag_type 
    +   typedef jm_uintfast32_t flag_type; 
    +public: 
    +   // constructors
    +   explicit reg_expression(const Allocator& a = Allocator()); 
    +   explicit reg_expression(const charT* p, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    +   reg_expression(const charT* p1, const charT* p2, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    +   reg_expression(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator()); 
    +   reg_expression(const reg_expression&); 
    +   template <class ST, class SA> 
    +   explicit reg_expression(const std::basic_string<charT, ST, SA>& p, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
    +   template <class I> 
    +   reg_expression(I first, I last, flag_type f = regbase::normal, const Allocator& a = Allocator()); 
        ~reg_expression(); 
    -   reg_expression& operator=(const reg_expression&); 
    -   reg_expression& operator=(const charT* ptr); 
    -   template <class ST, class SA> 
    -   reg_expression& operator=(const std::basic_string<charT, ST, SA>& p); 
    -   // 
    -   // assign: 
    -   reg_expression& assign(const reg_expression& that); 
    -   reg_expression& assign(const charT* ptr, flag_type f = regbase::normal); 
    -   reg_expression& assign(const charT* first, const charT* last, flag_type f = regbase::normal); 
    -   template <class string_traits, class A> 
    +   reg_expression& operator=(const reg_expression&); 
    +   reg_expression& operator=(const charT* ptr); 
    +   template <class ST, class SA> 
    +   reg_expression& operator=(const std::basic_string<charT, ST, SA>& p); 
    +   // 
    +   // assign: 
    +   reg_expression& assign(const reg_expression& that); 
    +   reg_expression& assign(const charT* ptr, flag_type f = regbase::normal); 
    +   reg_expression& assign(const charT* first, const charT* last, flag_type f = regbase::normal); 
    +   template <class string_traits, class A> 
        reg_expression& assign( 
    -       const std::basic_string<charT, string_traits, A>& s, 
    +       const std::basic_string<charT, string_traits, A>& s, 
            flag_type f = regbase::normal); 
    -   template <class iterator> 
    +   template <class iterator> 
        reg_expression& assign(iterator first, 
                               iterator last, 
                               flag_type f = regbase::normal); 
    -   // 
    -   // allocator access: 
    -   Allocator get_allocator()const; 
    -   // 
    -   // locale: 
    -   locale_type imbue(const locale_type& l); 
    -   locale_type getloc()const; 
    -   // 
    -   // flags: 
    -   flag_type getflags()const; 
    -   // 
    -   // str: 
    -   std::basic_string<charT> str()const; 
    -   // 
    -   // begin, end: 
    -   const_iterator begin()const; 
    -   const_iterator end()const; 
    -   // 
    -   // swap: 
    -   void swap(reg_expression&)throw(); 
    -   // 
    -   // size: 
    -   size_type size()const; 
    -   // 
    -   // max_size: 
    -   size_type max_size()const; 
    -   // 
    -   // empty: 
    -   bool empty()const; 
    -   unsigned mark_count()const; 
    -   bool operator==(const reg_expression&)const; 
    -   bool operator<(const reg_expression&)const; 
    +   // 
    +   // allocator access: 
    +   Allocator get_allocator()const; 
    +   // 
    +   // locale: 
    +   locale_type imbue(const locale_type& l); 
    +   locale_type getloc()const; 
    +   // 
    +   // flags: 
    +   flag_type getflags()const; 
    +   // 
    +   // str: 
    +   std::basic_string<charT> str()const; 
    +   // 
    +   // begin, end: 
    +   const_iterator begin()const; 
    +   const_iterator end()const; 
    +   // 
    +   // swap: 
    +   void swap(reg_expression&)throw(); 
    +   // 
    +   // size: 
    +   size_type size()const; 
    +   // 
    +   // max_size: 
    +   size_type max_size()const; 
    +   // 
    +   // empty: 
    +   bool empty()const; 
    +   unsigned mark_count()const; 
    +   bool operator==(const reg_expression&)const; 
    +   bool operator<(const reg_expression&)const; 
     };
    -} // namespace boost 
    +} // namespace boost
    +
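The synopsis above follows a construct, assign and query pattern. The sketch below illustrates that pattern with `std::regex` from the C++ standard library as a stand-in, on the assumption that the reader's toolchain provides `<regex>`. It does not use regex++ itself, and the standard library's flag names (`std::regex::extended`, `std::regex::ECMAScript`) and exception type (`std::regex_error`) differ from the regbase flags and bad_expression described here. The helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when `pattern` compiles under the given syntax flags.
// std::regex reports failure by throwing std::regex_error from assign().
inline bool compiles(const std::string& pattern,
                     std::regex::flag_type f = std::regex::ECMAScript) {
    try {
        std::regex re;          // default-constructed: no expression yet
        re.assign(pattern, f);  // compile the expression
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Number of marked sub-expressions in the pattern.
inline unsigned group_count(const std::string& pattern) {
    std::regex re(pattern);
    return re.mark_count();
}

// Matches the whole of `text` against the interval expression a{2,4}
// compiled with the POSIX extended syntax.
inline bool matches_interval(const std::string& text) {
    std::regex re;
    re.assign("a{2,4}", std::regex::extended);
    return std::regex_match(text, re);
}
```

The shape carries over to reg_expression as declared above: a default-constructed object holds no expression, assign compiles one under the chosen flags, and mark_count reports the number of marked sub-expressions.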

    Class reg_expression has the following public member functions:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    explicit reg_expression(const Allocator& a = Allocator());

    +

     Constructs a default instance of reg_expression without any expression.

    +

     

    +

     

    +

    explicit reg_expression(const charT* p, flag_type f = regbase::normal, const Allocator& a = Allocator());

    +

     Constructs an instance of reg_expression from the expression denoted by the null terminated string p, using the flags f to determine regular expression syntax. See class regbase for allowable flag values.

    +

     

    +

     

    +

    reg_expression(const charT* p1, const charT* p2, flag_type f = regbase::normal, const Allocator& a = Allocator());

    +

     Constructs an instance of reg_expression from the expression denoted by the pair of iterators p1 and p2, using the flags f to determine regular expression syntax. See class regbase for allowable flag values.

    +

     

    +

     

    +

    reg_expression(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator());

    +

     Constructs an instance of reg_expression from the expression denoted by the string p of length len, using the flags f to determine regular expression syntax. See class regbase for allowable flag values.

    +

     

    +

     

    +

    template <class ST, class SA>
    +reg_expression(const std::basic_string<charT, ST, SA>& p, jm_uintfast32_t f = regbase::normal, const Allocator& a = Allocator());

    +

     Constructs an instance of reg_expression from the expression denoted by the string p, using the flags f to determine regular expression syntax. See class regbase for allowable flag values.

    +

    Note - this member may not be available, depending upon your compiler's capabilities.

    +

     

    +

     

    +

    template <class I>
    +reg_expression(I first, I last, flag_type f = regbase::normal, const Allocator& a = Allocator());

    +

     Constructs an instance of reg_expression from the expression denoted by the pair of iterators first and last, using the flags f to determine regular expression syntax. See class regbase for allowable flag values.

    +

     

    +

     

    +

    reg_expression(const reg_expression&);

    +

    Copy constructor - copies an existing regular expression.

    +

     

    +

     

    +

    reg_expression& operator=(const reg_expression&);

    +

    Copies an existing regular expression.

    +

     

    +

     

    +

    reg_expression& operator=(const charT* ptr);

    +

    Equivalent to assign(ptr);

    +

     

    +

     

    +

    template <class ST, class SA>

    +

    reg_expression& operator=(const std::basic_string<charT, ST, SA>& p);

    +

    Equivalent to assign(p);

    +

     

    +

     

    +

    reg_expression& assign(const reg_expression& that);

    +

    Copies the regular expression contained by that. Throws bad_expression if that does not contain a valid expression. Returns *this.

    +

     

    +

     

    +

    reg_expression& assign(const charT* p, flag_type f = regbase::normal);

    +

    Compiles a regular expression from the expression denoted by the null terminated string p, using the flags f to determine regular expression syntax. See class regbase for allowable flag values. Throws bad_expression if p does not contain a valid expression. Returns *this.

    +

     

    +

     

    +

    reg_expression& assign(const charT* first, const charT* last, flag_type f = regbase::normal);

    +

    Compiles a regular expression from the expression denoted by the pair of iterators first-last, using the flags f to determine regular expression syntax. See class regbase for allowable flag values. Throws bad_expression if first-last does not contain a valid expression. Returns *this.

    +

     

    +

     

    +

    template <class string_traits, class A>
    +reg_expression& assign(const std::basic_string<charT, string_traits, A>& s, flag_type f = regbase::normal);

    +

    Compiles a regular expression from the expression denoted by the string s, using the flags f to determine regular expression syntax. See class regbase for allowable flag values. Throws bad_expression if s does not contain a valid expression. Returns *this.

    +

     

    +

     

    +

    template <class iterator>
    +reg_expression& assign(iterator first, iterator last, flag_type f = regbase::normal);

    +

    Compiles a regular expression from the expression denoted by the pair of iterators first-last, using the flags f to determine regular expression syntax. See class regbase for allowable flag values. Throws bad_expression if first-last does not contain a valid expression. Returns *this.

    +

     

    +

     

    +

    Allocator get_allocator()const;

    +

    Returns the allocator used by the expression.

    +

     

    +

     

    +

    locale_type imbue(const locale_type& l);

    +

    Imbues the expression with the specified locale, and invalidates the current expression.

    +

     

    +

     

    +

    locale_type getloc()const;

    +

    Returns the locale used by the expression.

    +

     

    +

     

    +

    flag_type getflags()const;

    +

    Returns the flags used to compile the current expression.

    +

     

    +

     

    +

    std::basic_string<charT> str()const;

    +

    Returns the current expression as a string.

    +

     

    +

     

    +

    const_iterator begin()const;

    +

    Returns a pointer to the first character of the current expression.

    +

     

    +

     

    +

    const_iterator end()const;

    +

    Returns a pointer to the end of the current expression.

    +

     

    +

     

    +

    size_type size()const;

    +

    Returns the length of the current expression.

    +

     

    +

     

    +

    size_type max_size()const;

    +

    Returns the maximum length of a regular expression text.

    +

     

    +

     

    +

    bool empty()const;

    +

    Returns true if the object contains no valid expression.

    +

     

    +

     

    +

    unsigned mark_count()const ;

    +

    Returns the number of sub-expressions in the compiled regular expression. Note that this includes the whole match (subexpression zero), so the value returned is always >= 1.
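The construct-and-assign behaviour described in this table can be sketched as follows. This is only an illustration under stated assumptions: it uses std::regex from the C++ standard library as a stand-in for reg_expression, so std::regex_error takes the place of bad_expression, and the helper names compiles and matches_truncated_pattern are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles.
// std::regex_error stands in for this library's bad_expression.
bool compiles(const std::string& pattern)
{
    try {
        std::regex e;        // default instance, no expression yet
        e.assign(pattern);   // compile; throws on an invalid expression
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical helper: compiles only the first `len` characters of `p`
// (the pointer-plus-length constructor, with the flags spelled out)
// and tests whether the result matches the whole of `input`.
bool matches_truncated_pattern(const char* p, std::size_t len,
                               const std::string& input)
{
    std::regex e(p, len, std::regex::ECMAScript);
    return std::regex_match(input, e);
}
```

With these helpers, compiles("(a") reports failure because the parenthesis is unbalanced, and matches_truncated_pattern("a+b", 2, "aa") compiles just "a+".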

    +

     

    -

    Class reg_expression has the following public member functions: -

     reg_expression(Allocator a = - Allocator()); Constructs a default - instance of reg_expression without any expression. 
     reg_expression(charT* p, unsigned - f = regbase::normal, Allocator a = Allocator()); Constructs an instance - of reg_expression from the expression denoted by the null - terminated string p, using the flags f to - determine regular expression syntax. See class regbase for allowable flag values. 
     reg_expression(charT* p1, - charT* p2, unsigned f = regbase::normal, Allocator - a = Allocator()); Constructs an instance - of reg_expression from the expression denoted by pair of - iterators p1 and p2, using the flags f - to determine regular expression syntax. See class regbase for allowable flag values. 
     reg_expression(charT* p, - size_type len, unsigned f, Allocator a = Allocator()); Constructs an instance - of reg_expression from the expression denoted by the - string p of length len, using the flags f - to determine regular expression syntax. See class regbase for allowable flag values. 
     template <class ST, - class SA>
    - reg_expression(const std::basic_string<charT, - ST, SA>& p, jm_uintfast32_t f = regbase::normal, const - Allocator& a = Allocator());
     Constructs an instance - of reg_expression from the expression denoted by the - string p, using the flags f to determine - regular expression syntax. See class regbase - for allowable flag values.

    Note - this member may not - be available depending upon your compiler capabilities.

    -
     
     template <class I>
    - reg_expression(I first, I last, flag_type f = regbase::normal, - const Allocator& a = Allocator());
     Constructs an instance - of reg_expression from the expression denoted by pair of - iterators p1 and p2, using the flags f - to determine regular expression syntax. See class regbase for allowable flag values. 
     reg_expression(const - reg_expression&);Copy constructor - copies an - existing regular expression. 
     reg_expression& operator=(const - reg_expression&);Copies an existing regular - expression. 
     reg_expression& operator=(const - charT* ptr);Equivalent to assign(ptr); 
     template <class ST, class - SA>

    reg_expression& operator=(const std::basic_string<charT, - ST, SA>& p);

    -
    Equivalent to assign(p); 
     reg_expression& assign(const - reg_expression& that);Copies the regular - expression contained by that, throws bad_expression if that - does not contain a valid expression. Returns *this. 
     reg_expression& assign(const - charT* p, flag_type f = regbase::normal);Compiles a regular - expression from the expression denoted by the null - terminated string p, using the flags f to - determine regular expression syntax. See class regbase for allowable flag values. - Throws bad_expression if p - does not contain a valid expression. Returns *this. 
     reg_expression& assign(const - charT* first, const charT* last, flag_type f = - regbase::normal);Compiles a regular - expression from the expression denoted by the pair of - iterators first-last, using the flags f to - determine regular expression syntax. See class regbase for allowable flag values. - Throws bad_expression if first-last - does not contain a valid expression. Returns *this. 
     template <class - string_traits, class A>
    - reg_expression& assign(const std::basic_string<charT, - string_traits, A>& s, flag_type f = regbase::normal);
    Compiles a regular - expression from the expression denoted by the string s, - using the flags f to determine regular expression - syntax. See class regbase for - allowable flag values. Throws bad_expression - if s does not contain a valid expression. Returns - *this. 
     template <class iterator> -
    - reg_expression& assign(iterator first, iterator last, - flag_type f = regbase::normal);
    Compiles a regular - expression from the expression denoted by the pair of - iterators first-last, using the flags f to - determine regular expression syntax. See class regbase for allowable flag values. - Throws bad_expression if first-last - does not contain a valid expression. Returns *this. 
     Allocator get_allocator()const;Returns the allocator used - by the expression. 
     locale_type imbue(const - locale_type& l);Imbues the expression with - the specified locale, and invalidates the current - expression. 
     locale_type getloc()const;Returns the locale used by - the expression. 
     flag_type getflags()const;Returns the flags used to - compile the current expression. 
     std::basic_string<charT> - str()const;Returns the current - expression as a string. 
     const_iterator begin()const;Returns a pointer to the - first character of the current expression. 
     const_iterator end()const;Returns a pointer to the end - of the current expression. 
     size_type size()const;Returns the length of the - current expression. 
     size_type max_size()const;Returns the maximum length - of a regular expression text. 
     bool empty()const;Returns true if the object - contains no valid expression. 
     unsigned mark_count()const - ;Returns the number of sub-expressions - in the compiled regular expression. Note that this - includes the whole match (subexpression zero), so the - value returned is always >= 1. 
    - -
    - -

    Class regex_traits

    - -

    #include <boost/regex_traits.hpp> -

    - -

    This is a preliminary version of the regular expression -traits class, and is subject to change.

    - -

    The purpose of the traits class is to make it easier to -customise the behaviour of reg_expression and the -associated matching algorithms. Custom traits classes can handle -special character sets or define additional character classes, -for example one could define [[:kanji:]] as the set of all (Unicode) -kanji characters. This library provides three traits classes and -a wrapper class regex_traits, which inherits from one of -these depending upon the default localisation model in use, class -c_regex_traits encapsulates the global C locale, class w32_regex_traits -encapsulates the global Win32 locale (only available on Win32 -systems), and class cpp_regex_traits encapsulates the C++ -locale (only provided if std::locale is supported):

    - -
    template <class charT> class c_regex_traits;
    +


    +

    Class regex_traits

    +

    #include <boost/regex_traits.hpp>

    +

    This is a preliminary version of the regular expression traits class, and is subject to change.

    +

    The purpose of the traits class is to make it easier to customise the behaviour of reg_expression and the associated matching algorithms. Custom traits classes can handle special character sets or define additional character classes; for example, one could define [[:kanji:]] as the set of all (Unicode) kanji characters. This library provides three traits classes and a wrapper class regex_traits, which inherits from one of them depending upon the default localisation model in use: class c_regex_traits encapsulates the global C locale, class w32_regex_traits encapsulates the global Win32 locale (only available on Win32 systems), and class cpp_regex_traits encapsulates the C++ locale (only provided if std::locale is supported):

    +
    template <class charT> class c_regex_traits;
     template<> class c_regex_traits<char> { /*details*/ };
     template<> class c_regex_traits<wchar_t> { /*details*/ };
     
    @@ -790,910 +678,813 @@ template <class charT> class cpp_regex_traits;
     template<> class cpp_regex_traits<char> { /*details*/ };
     template<> class cpp_regex_traits<wchar_t> { /*details*/ };
     
    -template <class charT> class regex_traits : public base_type { /*detailts*/ };
    - -

    Where "base_type" defaults to w32_regex_traits -on Win32 systems, and c_regex_traits otherwise. The -default behaviour can be changed by defining one of BOOST_RE_LOCALE_C -(forces use of c_regex_traits by default), or BOOST_RE_LOCALE_CPP -(forces use of cpp_regex_traits by default). Alternatively -a specific traits class can be passed to the reg_expression -template.

    - -

    The requirements for custom traits classes are documented separately here....
    -

    - -
    - -

    Class match_results

    - -

    #include <boost/regex.hpp> -

    - -

    Regular expressions are different from many simple pattern-matching -algorithms in that as well as finding an overall match they can -also produce sub-expression matches: each sub-expression being -delimited in the pattern by a pair of parenthesis (...). There -has to be some method for reporting sub-expression matches back -to the user: this is achieved this by defining a class match_results -that acts as an indexed collection of sub-expression matches, -each sub-expression match being contained in an object of type sub_match. -

    - -
    // 
    +template <class charT> class regex_traits : public base_type { /*details*/ };
    +

    Where "base_type" defaults to w32_regex_traits on Win32 systems, and c_regex_traits otherwise. The default behaviour can be changed by defining one of BOOST_RE_LOCALE_C (forces use of c_regex_traits by default), or BOOST_RE_LOCALE_CPP (forces use of cpp_regex_traits by default). Alternatively a specific traits class can be passed to the reg_expression template.

    +

    The requirements for custom traits classes are documented separately here....

    +


    +

    Class match_results

    +

    #include <boost/regex.hpp>

    +

    Regular expressions are different from many simple pattern-matching algorithms in that as well as finding an overall match they can also produce sub-expression matches, each sub-expression being delimited in the pattern by a pair of parentheses (...). There has to be some method for reporting sub-expression matches back to the user: this is achieved by defining a class match_results that acts as an indexed collection of sub-expression matches, each sub-expression match being contained in an object of type sub_match.

    +
    // 
     // class sub_match: 
     // denotes one sub-expression match. 
     //         
    -template <class iterator>
    -struct sub_match
    +template <class iterator>
    +struct sub_match
     {
    -   typedef typename std::iterator_traits<iterator>::value_type       value_type;
    -   typedef typename std::iterator_traits<iterator>::difference_type  difference_type;
    -   typedef iterator                                                  iterator_type;
    +   typedef typename std::iterator_traits<iterator>::value_type       value_type;
    +   typedef typename std::iterator_traits<iterator>::difference_type  difference_type;
    +   typedef iterator                                                  iterator_type;
        
        iterator first;
        iterator second;
    -   bool matched;
    +   bool matched;
     
    -   operator std::basic_string<value_type>()const;
    +   operator std::basic_string<value_type>()const;
     
    -   bool operator==(const sub_match& that)const;
    -   bool operator !=(const sub_match& that)const;
    -   difference_type length()const;
    +   bool operator==(const sub_match& that)const;
    +   bool operator !=(const sub_match& that)const;
    +   difference_type length()const;
     };
     
    -// 
    +// 
     // class match_results: 
     // contains an indexed collection of matched sub-expressions. 
     // 
    -template <class iterator, class Allocator = std::allocator<typename std::iterator_traits<iterator>::value_type > > 
    -class match_results 
    -{ 
    -public: 
    -   typedef Allocator                                                 alloc_type; 
    -   typedef typename Allocator::template Rebind<iterator>::size_type  size_type; 
    -   typedef typename std::iterator_traits<iterator>::value_type       char_type; 
    -   typedef sub_match<iterator>                                       value_type; 
    -   typedef typename std::iterator_traits<iterator>::difference_type  difference_type; 
    -   typedef iterator                                                  iterator_type; 
    -   explicit match_results(const Allocator& a = Allocator()); 
    -   match_results(const match_results& m); 
    -   match_results& operator=(const match_results& m); 
    +template <class iterator, class Allocator = std::allocator<typename std::iterator_traits<iterator>::value_type > > 
    +class match_results 
    +{ 
    +public: 
    +   typedef Allocator                                                 alloc_type; 
    +   typedef typename Allocator::template Rebind<iterator>::size_type  size_type; 
    +   typedef typename std::iterator_traits<iterator>::value_type       char_type; 
    +   typedef sub_match<iterator>                                       value_type; 
    +   typedef typename std::iterator_traits<iterator>::difference_type  difference_type; 
    +   typedef iterator                                                  iterator_type; 
    +   explicit match_results(const Allocator& a = Allocator()); 
    +   match_results(const match_results& m); 
    +   match_results& operator=(const match_results& m); 
        ~match_results(); 
    -   size_type size()const; 
    -   const sub_match<iterator>& operator[](int n) const; 
    -   Allocator allocator()const; 
    -   difference_type length(int sub = 0)const; 
    -   difference_type position(unsigned int sub = 0)const; 
    -   unsigned int line()const; 
    -   iterator line_start()const; 
    -   std::basic_string<char_type> str(int sub = 0)const; 
    -   void swap(match_results& that); 
    -   bool operator==(const match_results& that)const; 
    -   bool operator<(const match_results& that)const; 
    -};
    +   size_type size()const;
    +   const sub_match<iterator>& operator[](int n) const;
    +   Allocator allocator()const;
    +   difference_type length(int sub = 0)const;
    +   difference_type position(unsigned int sub = 0)const;
    +   unsigned int line()const;
    +   iterator line_start()const;
    +   std::basic_string<char_type> str(int sub = 0)const;
    +   void swap(match_results& that);
    +   bool operator==(const match_results& that)const;
    +   bool operator<(const match_results& that)const;
    +};
    +typedef match_results<const char*> cmatch;
    +typedef match_results<const wchar_t*> wcmatch;
    +

    Class match_results is used for reporting what matched a regular expression. It is passed to the matching algorithms regex_match and regex_search, and is used by regex_grep to notify the callback function (or function object) what matched. Note that the default allocator parameter has been chosen to match the default allocator parameter to reg_expression. match_results has the following public member functions:

    +

     

    +

    match_results(Allocator a = Allocator());

    +

    Constructs an instance of match_results, using allocator instance a.

    +

     

    +

     

    +

    match_results(const match_results& m);

    +

    Copy constructor.

    +

     

    +

     

    +

    match_results& operator=(const match_results& m);

    +

    Assignment operator.

    +

     

    +

     

    +

    const sub_match<iterator>& operator[](size_type n) const;

    +

    Returns what matched: item 0 represents the whole match, item 1 the first sub-expression, and so on.

    +

     

    +

     

    +

    Allocator& allocator()const;

    +

    Returns the allocator used by the class.

    +

     

    +

     

    +

    difference_type length(unsigned int sub = 0);

    +

    Returns the length of the matched sub-expression; defaults to the length of the whole match. In effect this is equivalent to operator[](sub).second - operator[](sub).first.

    +

     

    +

     

    +

    difference_type position(unsigned int sub = 0);

    +

    Returns the position of the matched sub-expression, defaults to the position of the whole match. The returned value is the position of the match relative to the start of the string.

    +

     

    +

     

    +

    unsigned int line()const;

    +

    Returns the index of the line on which the match occurred, indices start with 1, not zero. Equivalent to the number of newline characters prior to operator[](0).first plus one.

    +

     

    +

     

    +

    iterator line_start()const;

    +

    Returns an iterator denoting the start of the line on which the match occurred.

    +

     

    +

     

    +

    size_type size()const;

    +

    Returns how many sub-expressions are present in the match, including sub-expression zero (the whole match). Returns zero if no matches were found in the search operation.

    +

     

    -
    typedef match_results<const char*> cmatch;
    -typedef match_results<const wchar_t*> wcmatch; 
    +


    +

    The operator[] member function needs further explanation: it returns a const reference to a structure of type sub_match<iterator>, which has the following public members:

    +

     

    +

    typedef typename std::iterator_traits<iterator>::value_type value_type;

    +

    The type pointed to by the iterators.

    +

     

    +

     

    +

    typedef typename std::iterator_traits<iterator>::difference_type difference_type;

    +

    A type that represents the difference between two iterators.

    +

     

    +

     

    +

    typedef iterator iterator_type;

    +

    The iterator type.

    +

     

    +

     

    +

    iterator first

    +

    An iterator denoting the position of the start of the match.

    +

     

    +

     

    +

    iterator second

    +

    An iterator denoting the position of the end of the match.

    +

     

    +

     

    +

    bool matched

    +

    A Boolean value denoting whether this sub-expression participated in the match.

    +

     

    +

     

    +

    difference_type length()const;

    +

    Returns the length of the sub-expression match.

    +

     

    +

     

    +

    operator std::basic_string<value_type> ()const;

    +

    Converts the sub-expression match into an instance of std::basic_string<>. Note that this member may be either absent, or present to a more limited degree depending upon your compiler capabilities.

    +

     

    -

    Class match_results is used for reporting what matched a -regular expression, it is passed to the matching algorithms regex_match and regex_search, -and is used by regex_grep to notify the -callback function (or function object) what matched. Note that -the default allocator parameter has been chosen to match the -default allocator parameter to reg_expression. match_results has -the following public member functions:

    +

    Operator[] takes an integer argument that denotes the sub-expression for which to return information. The argument can take the following special values:

    +

     

    +

    -2

    +

    Returns everything from the end of the match, to the end of the input string, equivalent to $' in perl. If this is a null string, then:

    +

    first == second

    +

    And

    +

    matched == false.

    +

     

    +

     

    +

    -1

    +

    Returns everything from the start of the input string (or the end of the last match if this is a grep operation), to the start of this match. Equivalent to $` in perl. If this is a null string, then:

    +

    first == second

    +

    And

    +

    matched == false.

    +

     

    +

     

    +

    0

    +

    Returns the whole of what matched, equivalent to $& in Perl. The matched member is always true.

    +

     

    +

     

    +

    0 < N < size()

    +

    Returns what matched sub-expression N, if this sub-expression did not participate in the match then 

    +

    matched == false

    +

    otherwise:

    +

    matched == true.

    +

     

    +

     

    +

    N < -2 or N >= size()

    +

    Represents an out-of-range, non-existent sub-expression. Returns a "null" match in which

    +

    first == second

    +

    And

    +

    matched == false.

    +

     

     match_results(Allocator a = - Allocator());Constructs an instance of - match_results, using allocator instance a. 
     match_results(const match_results& - m);Copy constructor. 
     match_results& operator=(const - match_results& m);Assignment operator. 
     const sub_match<iterator>& - operator[](size_type n) const;Returns what matched, item 0 - represents the whole string, item 1 the first sub-expression - and so on. 
     Allocator& allocator()const;Returns the allocator used - by the class. 
     difference_type length(unsigned - int sub = 0);Returns the length of the - matched subexpression, defaults to the length of the - whole match, in effect this is equivalent to operator[](sub).second - - operator[](sub).first. 
     difference_type position(unsigned - int sub = 0);Returns the position of the - matched sub-expression, defaults to the position of the - whole match. The returned value is the position of the - match relative to the start of the string. 
     unsigned int - line()const;Returns the index of the - line on which the match occurred, indices start with 1, - not zero. Equivalent to the number of newline characters - prior to operator[](0).first plus one. 
     iterator line_start()const;Returns an iterator denoting - the start of the line on which the match occurred. 
     size_type size()const;Returns how many sub-expressions - are present in the match, including sub-expression zero (the - whole match). Returns zero if no matches were found in - the search operation. 
    - -


    - -

    The operator[] member function needs further explanation: it -returns a const reference to a structure of type sub_match<iterator>, -which has the following public members:

     typedef typename - std::iterator_traits<iterator>::value_type value_type;The type pointed to by the - iterators. 
     typedef typename - std::iterator_traits<iterator>::difference_type - difference_type;A type that represents the - difference between two iterators. 
     typedef iterator - iterator_type;The iterator type. 
     iterator firstAn iterator denoting the - position of the start of the match. 
     iterator secondAn iterator denoting the - position of the end of the match. 
     bool matchedA Boolean value denoting - whether this sub-expression participated in the match. 
     difference_type length()const;Returns the length of the - sub-expression match. 
     operator std::basic_string<value_type> - ()const;Converts the sub-expression - match into an instance of std::basic_string<>. Note - that this member may be either absent, or present to a - more limited degree depending upon your compiler - capabilities. 
    - -

    Operator[] takes an integer as an argument that denotes the -sub-expression for which to return information, the argument can -take the following special values:

     -2Returns everything from the - end of the match, to the end of the input string, - equivalent to $' in perl. If this is a null string, then: -

    first == second

    -

    And

    -

    matched == false.

    -
     
     -1Returns everything from the - start of the input string (or the end of the last match - if this is a grep operation), to the start of this match. - Equivalent to $` in perl. If this is a null string, then: -

    first == second

    -

    And

    -

    matched == false.

    -
     
     0Returns the whole of what - matched, equivalent to $& in perl. The matched - parameter is always true. 
     0 < N < size()Returns what matched sub-expression - N, if this sub-expression did not participate in the - match then 

    matched == false

    -

    otherwise:

    -

    matched == true.

    -
     
     N < -2 or N >= size()Represents an out-of range - non-existent sub-expression. Returns a "null" - match in which

    first == last

    -

    And

    -

    matched == false.

    -
     
    - -

    Note that as well as being parameterised for an allocator, -match_results<> also takes an iterator type, this allows -any pair of iterators to be searched for a given regular -expression, provided the iterators have at least bi-directional -properties.
    -

    - -
    - -

    Algorithm regex_match

    - -

    #include <boost/regex.hpp> -

    - -

    The algorithm regex _match determines whether a given regular -expression matches a given sequence denoted by a pair of -iterators, the algorithm is defined as follows, note that the -result is true only if the expression matches the whole of the -input sequence, the main use of this function is data input -validation:

    - -
    template <class iterator, class Allocator, class charT, class traits, class Allocator2>
    -bool regex_match(iterator first, 
    +

    Note that as well as being parameterised for an allocator, match_results<> also takes an iterator type. This allows any pair of iterators to be searched for a given regular expression, provided the iterators have at least bi-directional properties.
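The sub_match members and the special index values described above can be illustrated with a short sketch. It assumes std::regex from the C++ standard library as a stand-in for this library: there, prefix() and suffix() play the roles that indices -1 and -2 play here, and std::smatch is a match_results over std::string::const_iterator rather than const char*. The helper names describe and participated are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: describes the first match of `pattern` in `text`
// as "before|match|after", or returns an empty string if nothing matched.
std::string describe(const std::string& text, const std::string& pattern)
{
    std::regex e(pattern);
    std::smatch m;   // match_results over std::string::const_iterator
    if (!std::regex_search(text, m, e))
        return "";
    // prefix() ~ index -1 ($` in Perl), m[0] ~ $&, suffix() ~ index -2 ($')
    return m.prefix().str() + "|" + m[0].str() + "|" + m.suffix().str();
}

// Hypothetical helper: true if sub-expression n took part in the first match.
bool participated(const std::string& text, const std::string& pattern,
                  std::size_t n)
{
    std::regex e(pattern);
    std::smatch m;
    return std::regex_search(text, m, e) && n < m.size() && m[n].matched;
}
```

For the pattern (a)|(b) searched in "b", only the second sub-expression participates, so its matched member is true while the first one's is false.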

    +


    +

    Algorithm regex_match

    +

    #include <boost/regex.hpp>

    +

    The algorithm regex_match determines whether a given regular expression matches a given sequence denoted by a pair of iterators. The result is true only if the expression matches the whole of the input sequence, so the main use of this function is data input validation. The algorithm is defined as follows:

    +
    template <class iterator, class Allocator, class charT, class traits, class Allocator2>
    +bool regex_match(iterator first, 
                      iterator last, 
                      match_results<iterator, Allocator>& m, 
    -                 const reg_expression<charT, traits, Allocator2>& e, 
    -                 unsigned flags = match_default);
    +                 const reg_expression<charT, traits, Allocator2>& e,
    +                 unsigned flags = match_default);
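As a rough illustration of the whole-sequence rule, the sketch below validates an input string. It assumes std::regex_match and std::regex_search from the C++ standard library as stand-ins for the algorithms documented here; the function names is_iso_date and contains_iso_date and the date pattern are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical validator: true only if the WHOLE of `s` has the shape
// YYYY-MM-DD (digits and dashes only; the calendar is not checked).
bool is_iso_date(const std::string& s)
{
    static const std::regex e("[0-9]{4}-[0-9]{2}-[0-9]{2}");
    return std::regex_match(s.begin(), s.end(), e);
}

// For contrast: true if a date-shaped substring occurs anywhere in `s`.
bool contains_iso_date(const std::string& s)
{
    static const std::regex e("[0-9]{4}-[0-9]{2}-[0-9]{2}");
    return std::regex_search(s.begin(), s.end(), e);
}
```

The matching version rejects "on 2001-03-03" because of the leading text, while the searching version accepts it. This is why the matching form suits input validation.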
    +

    The library also defines the following convenience versions, which take either a const charT*, or a const std::basic_string<>& in place of a pair of iterators [note - these versions may not be available, or may be available in a more limited form, depending upon your compiler's capabilities]:

    +
    template <class charT, class Allocator, class traits, class Allocator2>
    +bool regex_match(const charT* str, 
    +                 match_results<const charT*, Allocator>& m, 
    +                 const reg_expression<charT, traits, Allocator2>& e, 
    +                 unsigned flags = match_default);
     
    -

    The library also defines the following convenience versions, -which take either a const charT*, or a const std::basic_string<>& -in place of a pair of iterators [note - these versions may not be -available, or may be available in a more limited form, depending -upon your compilers capabilities]:

    - -
    template <class charT, class Allocator, class traits, class Allocator2>
    -bool regex_match(const charT* str, 
    -                 match_results<const charT*, Allocator>& m, 
    -                 const reg_expression<charT, traits, Allocator2>& e, 
    -                 unsigned flags = match_default)
    -
    -template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
    -bool regex_match(const std::basic_string<charT, ST, SA>& s, 
    -                 match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator>& m, 
    -                 const reg_expression<charT, traits, Allocator2>& e, 
    -                 unsigned flags = match_default);
    - -

    Finally there is a set of convenience versions that simply -return true or false and do not indicate what matched:

    - -
    template <class iterator, class Allocator, class charT, class traits, class Allocator2>
    -bool regex_match(iterator first, 
    +template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
    +bool regex_match(const std::basic_string<charT, ST, SA>& s, 
    +                 match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator>& m, 
    +                 const reg_expression<charT, traits, Allocator2>& e, 
    +                 unsigned flags = match_default);
    +

    Finally there is a set of convenience versions that simply return true or false and do not indicate what matched:

    +
    template <class iterator, class Allocator, class charT, class traits, class Allocator2>
    +bool regex_match(iterator first, 
                      iterator last, 
    -                 const reg_expression<charT, traits, Allocator2>& e, 
    -                 unsigned flags = match_default);
    +                 const reg_expression<charT, traits, Allocator2>& e, 
    +                 unsigned flags = match_default);
     
    -template <class charT, class Allocator, class traits, class Allocator2>
    -bool regex_match(const charT* str, 
    -                 const reg_expression<charT, traits, Allocator2>& e, 
    -                 unsigned flags = match_default)
    +template <class charT, class Allocator, class traits, class Allocator2>
    +bool regex_match(const charT* str, 
    +                 const reg_expression<charT, traits, Allocator2>& e, 
    +                 unsigned flags = match_default)
     
    -template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
    -bool regex_match(const std::basic_string<charT, ST, SA>& s, 
    -                 const reg_expression<charT, traits, Allocator2>& e, 
    -                 unsigned flags = match_default);
    -
    +template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
    +bool regex_match(const std::basic_string<charT, ST, SA>& s, 
    +                 const reg_expression<charT, traits, Allocator2>& e, 
    +                 unsigned flags = match_default);
    +

    The parameters for the main function version are as follows:

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    iterator first

    +

    Denotes the start of the range to be matched.

    +

     

    +

     

    +

    iterator last

    +

    Denotes the end of the range to be matched.

    +

     

    +

     

    +

    match_results<iterator, Allocator>& m

    +

    An instance of match_results in which what matched will be reported. On exit, if a match occurred then m[0] denotes the whole of the string that matched, m[0].first must be equal to first, and m[0].second will be less than or equal to last. m[1] denotes the first subexpression, m[2] the second subexpression, and so on. If no match occurred then m[0].first = m[0].second = last.

    +

     

    +

     

    +

    const reg_expression<charT, traits, Allocator2>& e

    +

    Contains the regular expression to be matched.

    +

     

    +

     

    +

    unsigned flags = match_default

    +

    Determines the semantics used for matching, a combination of one or more match_flags enumerators.

    +

     

    -

    The parameters for the main function version are as follows:

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     iterator firstDenotes the start of the range to be matched. 
     iterator lastDenotes the end of the range - to be matched. 
     match_results<iterator, - Allocator>& mAn instance of match_results - in which what matched will be reported. On exit if a - match occurred then m[0] denotes the whole of the string - that matched, m[0].first must be equal to first, m[0].second - will be less than or equal to last. m[1] denotes the - first subexpression m[2] the second subexpression and so - on. If no match occurred then m[0].first = m[0].second = - last. 
     const reg_expression<charT, - traits, Allocator2>& eContains the regular - expression to be matched. 
     unsigned flags = match_defaultDetermines the semantics - used for matching, a combination of one or more match_flags enumerators.
    - -

    regex_match returns false if no match occurs or true if it -does. A match only occurs if it starts at first and -finishes at last. Example: the following example processes an ftp -response:

    - -
    #include <stdlib.h> 
    +

    regex_match returns false if no match occurs or true if it does. A match only occurs if it starts at first and finishes at last. Example: the following example processes an ftp response:

    +
    #include <cstdlib> 
     #include <boost/regex.hpp> 
     #include <string> 
     #include <iostream> 
     
    -using namespace boost; 
    +using namespace boost; 
     
    -regex expression("([0-9]+)(\\-| |$)(.*)"); 
    +regex expression("([0-9]+)(\\-| |$)(.*)"); 
     
    -// process_ftp: 
    +// process_ftp: 
     // on success returns the ftp response code, and fills 
    -// msg with the ftp response message. 
    -int process_ftp(const char* response, std::string* msg) 
    +// msg with the ftp response message. 
    +int process_ftp(const char* response, std::string* msg) 
     { 
        cmatch what; 
    -   if(regex_match(response, what, expression)) 
    -   { 
    -      // what[0] contains the whole string 
    -      // what[1] contains the response code 
    -      // what[2] contains the separator character 
    -      // what[3] contains the text message. 
    -      if(msg) 
    +   if(regex_match(response, what, expression)) 
    +   { 
    +      // what[0] contains the whole string 
    +      // what[1] contains the response code 
    +      // what[2] contains the separator character 
    +      // what[3] contains the text message. 
    +      if(msg) 
              msg->assign(what[3].first, what[3].second); 
    -      return std::atoi(what[1].first); 
    -   } 
    -   // failure did not match 
    -   if(msg) 
    +      return std::atoi(what[1].first); 
    +   } 
    +   // failure did not match 
    +   if(msg) 
           msg->erase(); 
    -   return -1; 
    -}
    +   return -1; 
    +}
    +
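The whole-sequence requirement is what makes regex_match suitable for validation. The following minimal sketch contrasts it with a search. It assumes the standard library's std::regex_match and std::regex_search as stand-ins for the algorithms documented here, and the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// True only if every character of `input` is consumed by the expression.
bool is_all_digits(const std::string& input)
{
   static const std::regex digits("[0-9]+");
   return std::regex_match(input, digits);
}

// True if the expression matches anywhere inside `input`.
bool contains_digits(const std::string& input)
{
   static const std::regex digits("[0-9]+");
   return std::regex_search(input, digits);
}
```

An input such as "12345x" is rejected by the first helper and accepted by the second, because only the search is allowed to stop before the end of the sequence.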

    The value of the flags parameter passed to the algorithm must be a combination of one or more of the following values:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    match_default

    +

    The default value, indicates that first represents the start of a line, the start of a buffer, and (possibly) the start of a word. Also implies that last represents the end of a line, the end of the buffer and (possibly) the end of a word. Implies that a dot sub-expression "." will match both the newline character and a null.

    +

     

    +

     

    +

    match_not_bol

    +

    When this flag is set then first does not represent the start of a new line.

    +

     

    +

     

    +

    match_not_eol

    +

    When this flag is set then last does not represent the end of a line.

    +

     

    +

     

    +

    match_not_bob

    +

    When this flag is set then first is not the beginning of a buffer.

    +

     

    +

     

    +

    match_not_eob

    +

    When this flag is set then last does not represent the end of a buffer.

    +

     

    +

     

    +

    match_not_bow

    +

    When this flag is set then first can never match the start of a word.

    +

     

    +

     

    +

    match_not_eow

    +

    When this flag is set then last can never match the end of a word.

    +

     

    +

     

    +

    match_not_dot_newline

    +

    When this flag is set then a dot expression "." can not match the newline character.

    +

     

    +

     

    +

    match_not_dot_null

    +

    When this flag is set then a dot expression "." can not match a null character.

    +

     

    +

     

    +

    match_prev_avail

    +

    When this flag is set, then *--first is a valid expression and the flags match_not_bol and match_not_bow have no effect, since the value of the previous character can be used to check these.

    +

     

    +

     

    +

    match_any

    +

    When this flag is set, then the first string matched is returned, rather than the longest possible match. This flag can significantly reduce the time taken to find a match, but what matches is undefined.

    +

     

    +

     

    +

    match_not_null

    +

    When this flag is set, then the expression will never match a null string.

    +

     

    +

     

    +

    match_continuous

    +

    When this flag is set, then during a grep operation, each successive match must start from where the previous match finished.

    +

     

    +

    match_partial

    +

    When this flag is set, the regex algorithms will report partial matches - that is where one or more characters at the end of the text input matched some prefix of the regular expression.

    -

    The value of the flags parameter -passed to the algorithm must be a combination of one or more of -the following values:

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     match_defaultThe default value, indicates - that first represents the start of a line, the - start of a buffer, and (possibly) the start of a word. - Also implies that last represents the end of a - line, the end of the buffer and (possibly) the end of a - word. Implies that a dot sub-expression "." - will match both the newline character and a null. 
     match_not_bolWhen this flag is set then first - does not represent the start of a new line. 
     match_not_eolWhen this flag is set then last - does not represent the end of a line. 
     match_not_bobWhen this flag is set then first - is not the beginning of a buffer. 
     match_not_eobWhen this flag is set then last - does not represent the end of a buffer. 
     match_not_bowWhen this flag is set then first - can never match the start of a word. 
     match_not_eowWhen this flag is set then last - can never match the end of a word. 
     match_not_dot_newlineWhen this flag is set then a - dot expression "." can not match the newline - character. 
     match_not_dot_nullWhen this flag is set then a - dot expression "." can not match a null - character. 
     match_prev_availWhen this flag - is set, then *--first is a valid expression and - the flags match_not_bol and match_not_bow have no effect, - since the value of the previous character can be used to - check these. 
     match_anyWhen this flag - is set, then the first string matched is returned, rather - than the longest possible match. This flag can - significantly reduce the time taken to find a match, but - what matches is undefined. 
     match_not_nullWhen this flag - is set, then the expression will never match a null - string. 
     match_continuousWhen this flags - is set, then during a grep operation, each successive - match must start from where the previous match finished. 
    - -
    - -

    Algorithm regex_search

    - -

     #include <boost/regex.hpp> -

    - -

    The algorithm regex_search will search a range denoted by a -pair of iterators for a given regular expression. The algorithm -uses various heuristics to reduce the search time by only -checking for a match if a match could conceivably start at that -position. The algorithm is defined as follows:

    - -
    template <class iterator, class Allocator, class charT, class traits, class Allocator2>
    -bool regex_search(iterator first, 
    +

     

    +
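The effect of a few of these flags can be sketched as follows. This assumes the similarly named std::regex_constants flags in the standard library, which may differ in detail from the match_flags enumerators described above. The helper found is hypothetical.

```cpp
#include <regex>
#include <string>

namespace flags = std::regex_constants;

// Hypothetical helper: reports whether expression `e` is found in `text`
// under the given match flags.
bool found(const std::string& text, const char* e,
           flags::match_flag_type f = flags::match_default)
{
   return std::regex_search(text, std::regex(e), f);
}
```

For example, found("abc", "^abc") succeeds, but the same call with flags::match_not_bol fails because the start of the range is no longer treated as the start of a line. Likewise flags::match_not_null rejects the empty match that "a*" would otherwise find in an empty string.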


    +

    Algorithm regex_search

    +

     #include <boost/regex.hpp>

    +

    The algorithm regex_search will search a range denoted by a pair of iterators for a given regular expression. The algorithm uses various heuristics to reduce the search time by only checking for a match if a match could conceivably start at that position. The algorithm is defined as follows:

    +
    template <class iterator, class Allocator, class charT, class traits, class Allocator2>
    +bool regex_search(iterator first, 
                     iterator last, 
                     match_results<iterator, Allocator>& m, 
    -                const reg_expression<charT, traits, Allocator2>& e, 
    -                unsigned flags = match_default);
    +                const reg_expression<charT, traits, Allocator2>& e, 
    +                unsigned flags = match_default);
    +

    The library also defines the following convenience versions, which take either a const charT*, or a const std::basic_string<>& in place of a pair of iterators [note - these versions may not be available, or may be available in a more limited form, depending upon your compiler's capabilities]:

    +
    template <class charT, class Allocator, class traits, class Allocator2>
    +bool regex_search(const charT* str, 
    +                match_results<const charT*, Allocator>& m, 
    +                const reg_expression<charT, traits, Allocator2>& e, 
    +                unsigned flags = match_default);
     
    -

    The library also defines the following convenience versions, -which take either a const charT*, or a const std::basic_string<>& -in place of a pair of iterators [note - these versions may not be -available, or may be available in a more limited form, depending -upon your compilers capabilities]:

    +template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
    +bool regex_search(const std::basic_string<charT, ST, SA>& s, 
    +                match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator>& m, 
    +                const reg_expression<charT, traits, Allocator2>& e, 
    +                unsigned flags = match_default);
    +

    The parameters for the main function version are as follows:

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    iterator first

    +

    The starting position of the range to search.

    +

     

    +

     

    +

    iterator last

    +

    The ending position of the range to search.

    +

     

    +

     

    +

    match_results<iterator, Allocator>& m

    +

    An instance of match_results in which what matched will be reported. On exit, if a match occurred then m[0] denotes the whole of the string that matched, and m[0].first and m[0].second will be less than or equal to last. m[1] denotes the first sub-expression, m[2] the second sub-expression, and so on. If no match occurred then m[0].first = m[0].second = last.

    +

     

    +

     

    +

    const reg_expression<charT, traits, Allocator2>& e

    +

    The regular expression to search for.

    +

     

    +

     

    +

    unsigned flags = match_default

    +

    The flags that determine what gets matched, a combination of one or more match_flags enumerators.

    +

     

    -
    template <class charT, class Allocator, class traits, class Allocator2>
    -bool regex_search(const charT* str, 
    -                match_results<const charT*, Allocator>& m, 
    -                const reg_expression<charT, traits, Allocator2>& e, 
    -                unsigned flags = match_default);
    -
    -template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
    -bool regex_search(const std::basic_string<charT, ST, SA>& s, 
    -                match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator>& m, 
    -                const reg_expression<charT, traits, Allocator2>& e, 
    -                unsigned flags = match_default);
    - -

    The parameters for the main function version are as follows:

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     iterator firstThe starting position of the - range to search. 
     iterator lastThe ending position of the - range to search. 
     match_results<iterator, - Allocator>& mAn instance of match_results - in which what matched will be reported. On exit if a - match occurred then m[0] denotes the whole of the string - that matched, m[0].first and m[0].second will be less - than or equal to last. m[1] denotes the first sub-expression - m[2] the second sub-expression and so on. If no match - occurred then m[0].first = m[0].second = last. 
     const reg_expression<charT, - traits, Allocator2>& eThe regular expression to - search for. 
     unsigned flags = match_defaultThe flags that determine - what gets matched, a combination of one or more match_flags enumerators. 
    - -


    - -

    Example: the following example, -takes the contents of a file in the form of a string, and -searches for all the C++ class declarations in the file. The code -will work regardless of the way that std::string is implemented, -for example it could easily be modified to work with the SGI rope -class, which uses a non-contiguous storage strategy.

    - -
    #include <string> 
    +


    +

    Example: the following example takes the contents of a file in the form of a string, and searches for all the C++ class declarations in the file. The code will work regardless of the way that std::string is implemented; for example, it could easily be modified to work with the SGI rope class, which uses a non-contiguous storage strategy.

    +
    #include <string> 
     #include <map> 
     #include <boost/regex.hpp> 
    -
    +
     // purpose: 
     // takes the contents of a file in the form of a string 
     // and searches for all the C++ class definitions, storing 
    -// their locations in a map of strings/int's 
    -typedef std::map<std::string, int, std::less<std::string> > map_type; 
    +// their locations in a map of strings/int's 
    +typedef std::map<std::string, int, std::less<std::string> > map_type; 
     
    -boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 
    -
    -void IndexClasses(map_type& m, const std::string& file) 
    +boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 
    +
    +void IndexClasses(map_type& m, const std::string& file) 
     { 
        std::string::const_iterator start, end; 
        start = file.begin(); 
        end = file.end(); 
           boost::match_results<std::string::const_iterator> what; 
    -   unsigned int flags = boost::match_default; 
    -   while(regex_search(start, end, what, expression, flags)) 
    -   { 
    -      // what[0] contains the whole string 
    -      // what[5] contains the class name. 
    -      // what[6] contains the template specialisation if any. 
    -      // add class name and position to map: 
    -      m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
    +   unsigned int flags = boost::match_default; 
    +   while(regex_search(start, end, what, expression, flags)) 
    +   { 
    +      // what[0] contains the whole string 
    +      // what[5] contains the class name. 
    +      // what[6] contains the template specialisation if any. 
    +      // add class name and position to map: 
    +      m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
                     what[5].first - file.begin(); 
    -      // update search position: 
    -      start = what[0].second; 
    -      // update flags: 
    -      flags |= boost::match_prev_avail; 
    +      // update search position: 
    +      start = what[0].second; 
    +      // update flags: 
    +      flags |= boost::match_prev_avail; 
           flags |= boost::match_not_bob; 
        } 
     }
    - 
    - -
    - -

    Algorithm regex_grep

    - -

    #include <boost/regex.hpp> -

    - -

     Regex_grep allows you to search through an iterator -range and locate all the (non-overlapping) matches with a given -regular expression. The function is declared as:

    - -
    template <class Predicate, class iterator, class charT, class traits, class Allocator>
    -unsigned int regex_grep(Predicate foo, 
    + 
    +
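The loop in the example above (search, record, then resume from what[0].second) can be reduced to a smaller sketch. This one assumes the standard library's std::regex_search and std::regex_constants::match_prev_avail in place of the regex++ names, and number_offsets is a hypothetical function.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of every run of digits in `text`.
std::vector<std::size_t> number_offsets(const std::string& text)
{
   static const std::regex digits("[0-9]+");
   std::vector<std::size_t> offsets;
   std::string::const_iterator start = text.begin(), end = text.end();
   std::smatch what;
   std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
   while (std::regex_search(start, end, what, digits, flags))
   {
      offsets.push_back(static_cast<std::size_t>(what[0].first - text.begin()));
      // Resume where this match finished. The expression cannot match an
      // empty string, so the loop always makes progress.
      start = what[0].second;
      // The character before `start` is now valid to inspect.
      flags |= std::regex_constants::match_prev_avail;
   }
   return offsets;
}
```

An expression that can match an empty string would need an extra step to advance start, otherwise a loop of this shape would never terminate.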


    +

    Algorithm regex_grep

    +

    #include <boost/regex.hpp>

    +

     Regex_grep allows you to search through an iterator range and locate all the (non-overlapping) matches with a given regular expression. The function is declared as:

    +
    template <class Predicate, class iterator, class charT, class traits, class Allocator>
    +unsigned int regex_grep(Predicate foo, 
                             iterator first, 
                             iterator last, 
    -                        const reg_expression<charT, traits, Allocator>& e, 
    -                        unsigned flags = match_default)
    +                        const reg_expression<charT, traits, Allocator>& e, 
    +                        unsigned flags = match_default)
    +

    The library also defines the following convenience versions, which take either a const charT*, or a const std::basic_string<>& in place of a pair of iterators [note - these versions may not be available, or may be available in a more limited form, depending upon your compiler's capabilities]:

    +
    template <class Predicate, class charT, class Allocator, class traits>
    +unsigned int regex_grep(Predicate foo, 
    +              const charT* str, 
    +              const reg_expression<charT, traits, Allocator>& e, 
    +              unsigned flags = match_default);
     
    -

    The library also defines the following convenience versions, -which take either a const charT*, or a const std::basic_string<>& -in place of a pair of iterators [note - these versions may not be -available, or may be available in a more limited form, depending -upon your compilers capabilities]:

    +template <class Predicate, class ST, class SA, class Allocator, class charT, class traits>
    +unsigned int regex_grep(Predicate foo, 
    +              const std::basic_string<charT, ST, SA>& s, 
    +              const reg_expression<charT, traits, Allocator>& e, 
    +              unsigned flags = match_default);
    +

    The parameters for the primary version of regex_grep have the following meanings:

    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    foo

    +

    A predicate function object or function pointer, see below for more information.

    +

     

    +

     

    +

    first

    +

    The start of the range to search.

    +

     

    +

     

    +

    last

    +

    The end of the range to search.

    +

     

    +

     

    +

    e

    +

    The regular expression to search for.

    +

     

    +

     

    +

    flags

    +

    The flags that determine how matching is carried out, a combination of one or more match_flags enumerators.

    +

     

    -
    template <class Predicate, class charT, class Allocator, class traits>
    -unsigned int regex_grep(Predicate foo, 
    -              const charT* str, 
    -              const reg_expression<charT, traits, Allocator>& e, 
    -              unsigned flags = match_default);
    -
    -template <class Predicate, class ST, class SA, class Allocator, class charT, class traits>
    -unsigned int regex_grep(Predicate foo, 
    -              const std::basic_string<charT, ST, SA>& s, 
    -              const reg_expression<charT, traits, Allocator>& e, 
    -              unsigned flags = match_default);
    - -

    The parameters for the primary version of regex_grep have the -following meanings:

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     fooA predicate function object - or function pointer, see below for more information. 
     firstThe start of the range to - search. 
     lastThe end of the range to - search. 
     eThe regular expression to - search for. 
     flagsThe flags that determine how - matching is carried out, one of the match_flags - enumerators. 
    - -

     The algorithm finds all of the non-overlapping matches -of the expression e, for each match it fills a match_results<iterator, Allocator> -structure, which contains information on what matched, and calls -the predicate foo, passing the match_results<iterator, -Allocator> as a single argument. If the predicate returns true, -then the grep operation continues, otherwise it terminates -without searching for further matches. The function returns the -number of matches found.

    - -

    The general form of the predicate is:

    - -
    struct grep_predicate
    +

     The algorithm finds all of the non-overlapping matches of the expression e. For each match it fills a match_results<iterator, Allocator> structure, which contains information on what matched, and calls the predicate foo, passing the match_results<iterator, Allocator> as a single argument. If the predicate returns true, then the grep operation continues; otherwise it terminates without searching for further matches. The function returns the number of matches found.

    +

    The general form of the predicate is:

    +
    struct grep_predicate
     {
    -   bool operator()(const match_results<iterator_type, expression_type::alloc_type>& m);
    -};
    - -

    For example the regular expression "a*b" would find -one match in the string "aaaaab" and two in the string -"aaabb".

    - -

    Remember this algorithm can be used for a lot more than -implementing a version of grep, the predicate can be and do -anything that you want, grep utilities would output the results -to the screen, another program could index a file based on a -regular expression and store a set of bookmarks in a list, or a -text file conversion utility would output to file. The results of -one regex_grep can even be chained into another regex_grep to -create recursive parsers.

    - -

    Example: convert the -example from regex_search to use regex_grep instead: -

    - -
    #include <string> 
    +   bool operator()(const match_results<iterator_type, expression_type::alloc_type>& m);
    +};
    +

    For example the regular expression "a*b" would find one match in the string "aaaaab" and two in the string "aaabb".

    +

    Remember that this algorithm can be used for a lot more than implementing a version of grep: the predicate can be and do anything that you want. A grep utility would output the results to the screen, another program could index a file based on a regular expression and store a set of bookmarks in a list, and a text file conversion utility would output to a file. The results of one regex_grep can even be chained into another regex_grep to create recursive parsers.
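The predicate protocol described above can be approximated with the standard library, as in the sketch below. It is not the regex_grep implementation: it assumes std::sregex_iterator as the underlying search, the names grep_sketch and accept_all are hypothetical, and counting the match on which the predicate returns false is an assumption.

```cpp
#include <regex>
#include <string>

// Calls pred(match) for each non-overlapping match of e in text, stopping
// early if pred returns false. Returns the number of matches visited,
// including the one on which pred returned false (an assumption here).
template <class Predicate>
unsigned int grep_sketch(Predicate pred, const std::string& text, const std::regex& e)
{
   unsigned int count = 0;
   for (std::sregex_iterator i(text.begin(), text.end(), e), last; i != last; ++i)
   {
      ++count;
      if (!pred(*i))
         break;
   }
   return count;
}

// A predicate that accepts every match, so the search runs to completion.
struct accept_all
{
   bool operator()(const std::smatch&) const { return true; }
};
```

With accept_all as the predicate, the expression "a*b" is visited once in "aaaaab" and twice in "aaabb", in line with the counts given above.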

    +

    Example: convert the example from regex_search to use regex_grep instead:

    +
    #include <string> 
     #include <map> 
     #include <boost/regex.hpp> 
     
    -// IndexClasses: 
    +// IndexClasses: 
     // takes the contents of a file in the form of a string 
     // and searches for all the C++ class definitions, storing 
    -// their locations in a map of strings/int's 
    +// their locations in a map of strings/int's 
    +
    +typedef std::map<std::string, int, std::less<std::string> > map_type; 
     
    -typedef std::map<std::string, int, std::less<std::string> > map_type; 
    -
    -boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" 
    -                 "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)" 
    -                 "[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 
    -
    -class IndexClassesPred 
    +boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" 
    +                 "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)" 
    +                 "[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 
    +
    +class IndexClassesPred 
     { 
        map_type& m; 
    -   std::string::const_iterator base; 
    -public: 
    +   std::string::const_iterator base; 
    +public: 
        IndexClassesPred(map_type& a, std::string::const_iterator b) : m(a), base(b) {} 
    -   bool operator()(const match_results<std::string::const_iterator, regex::alloc_type>& what) 
    +   bool operator()(const boost::match_results<std::string::const_iterator, boost::regex::alloc_type>& what) 
        { 
    -      // what[0] contains the whole string 
    -      // what[5] contains the class name. 
    -      // what[6] contains the template specialisation if any. 
    -      // add class name and position to map: 
    -      m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
    +      // what[0] contains the whole string 
    +      // what[5] contains the class name. 
    +      // what[6] contains the template specialisation if any. 
    +      // add class name and position to map: 
    +      m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
                     what[5].first - base; 
    -      return true; 
    +      return true; 
        } 
    -}; 
    -
    -void IndexClasses(map_type& m, const std::string& file) 
    +}; 
    +
    +void IndexClasses(map_type& m, const std::string& file) 
     { 
        std::string::const_iterator start, end; 
        start = file.begin(); 
        end = file.end(); 
        regex_grep(IndexClassesPred(m, start), start, end, expression); 
    -} 
    - -

    Example: Use regex_grep -to call a global callback function:

    - -
    #include <string> 
    +} 
    +

    Example: Use regex_grep to call a global callback function:

    +
    #include <string> 
     #include <map> 
     #include <boost/regex.hpp> 
     
    -// purpose: 
    +// purpose: 
     // takes the contents of a file in the form of a string 
     // and searches for all the C++ class definitions, storing 
    -// their locations in a map of strings/int's 
    +// their locations in a map of strings/int's 
    +
    +typedef std::map<std::string, int, std::less<std::string> > map_type; 
     
    -typedef std::map<std::string, int, std::less<std::string> > map_type; 
    -
    -boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 
    +boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); 
     
     map_type class_index; 
     std::string::const_iterator base; 
     
    -bool grep_callback(const boost::match_results<std::string::const_iterator, boost::regex::alloc_type>& what) 
    +bool grep_callback(const boost::match_results<std::string::const_iterator, boost::regex::alloc_type>& what) 
     { 
    -   // what[0] contains the whole string 
    -   // what[5] contains the class name. 
    -   // what[6] contains the template specialisation if any. 
    -   // add class name and position to map: 
    -   class_index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
    +   // what[0] contains the whole string 
    +   // what[5] contains the class name. 
    +   // what[6] contains the template specialisation if any. 
    +   // add class name and position to map: 
    +   class_index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
                     what[5].first - base; 
    -   return true; 
    +   return true; 
     } 
    -
    -void IndexClasses(const std::string& file) 
    +
    +void IndexClasses(const std::string& file) 
     { 
        std::string::const_iterator start, end; 
        start = file.begin(); 
    @@ -1701,572 +1492,622 @@ void IndexClasses(const std::string& file)
        base = start; 
        regex_grep(grep_callback, start, end, expression, match_default); 
     }
    -  
    - -

    Example: use regex_grep -to call a class member function, use the standard library -adapters std::mem_fun and std::bind1st to convert -the member function into a predicate:

    - -
    #include <string> 
    +  
    +

    Example: use regex_grep to call a class member function; the standard library adapters std::mem_fun and std::bind1st convert the member function into a predicate:

    +
    #include <string> 
     #include <map> 
     #include <boost/regex.hpp> 
    -#include <functional> 
    -
    +#include <functional> 
    +
     // purpose: 
     // takes the contents of a file in the form of a string 
     // and searches for all the C++ class definitions, storing 
     // their locations in a map of strings/int's 
     
    -typedef std::map<std::string, int, std::less<std::string> > map_type; 
    -
    -class class_index 
    +typedef std::map<std::string, int, std::less<std::string> > map_type; 
    +
    +class class_index 
     { 
        boost::regex expression; 
        map_type index; 
        std::string::const_iterator base; 
    -   bool grep_callback(boost::match_results<std::string::const_iterator, boost::regex::alloc_type> what); 
    -public: 
    -   void IndexClasses(const std::string& file); 
    +   bool grep_callback(boost::match_results<std::string::const_iterator, boost::regex::alloc_type> what); 
    +public: 
    +   void IndexClasses(const std::string& file); 
        class_index() 
           : index(), 
    -        expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" 
    -                   "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" 
    -                   "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" 
    -                   "(\\{|:[^;\\{()]*\\{)" 
    -                   ){} 
    -}; 
    -
    -bool class_index::grep_callback(boost::match_results<std::string::const_iterator, boost::regex::alloc_type> what) 
    +        expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" 
    +                   "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" 
    +                   "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" 
    +                   "(\\{|:[^;\\{()]*\\{)" 
    +                   ){} 
    +}; 
    +
    +bool class_index::grep_callback(boost::match_results<std::string::const_iterator, boost::regex::alloc_type> what) 
     { 
    -   // what[0] contains the whole string 
    -   // what[5] contains the class name. 
    -   // what[6] contains the template specialisation if any. 
    -   // add class name and position to map: 
    -   index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
    +   // what[0] contains the whole string 
    +   // what[5] contains the class name. 
    +   // what[6] contains the template specialisation if any. 
    +   // add class name and position to map: 
    +   index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
                    what[5].first - base; 
    -   return true; 
    +   return true; 
     } 
     
    -void class_index::IndexClasses(const std::string& file) 
    +void class_index::IndexClasses(const std::string& file) 
     { 
        std::string::const_iterator start, end; 
        start = file.begin(); 
        end = file.end(); 
        base = start; 
    -   regex_grep(std::bind1st(std::mem_fun(&class_index::grep_callback), this), 
    +   regex_grep(std::bind1st(std::mem_fun(&class_index::grep_callback), this), 
                   start, 
                   end, 
                   expression); 
     } 
    -  
    - -

    Finally, C++ Builder -users can use C++ Builder's closure type as a callback argument:

    - -
    #include <string> 
    +  
    +

    Finally, C++ Builder users can use C++ Builder's closure type as a callback argument:

    +
    #include <string> 
     #include <map> 
     #include <boost/regex.hpp> 
    -#include <functional> 
    -
    +#include <functional> 
    +
     // purpose: 
     // takes the contents of a file in the form of a string 
     // and searches for all the C++ class definitions, storing 
     // their locations in a map of strings/int's 
     
    -typedef std::map<std::string, int, std::less<std::string> > map_type; 
    -class class_index 
    +typedef std::map<std::string, int, std::less<std::string> > map_type; 
    +class class_index 
     { 
        boost::regex expression; 
        map_type index; 
        std::string::const_iterator base; 
    -   typedef boost::match_results<std::string::const_iterator, boost::regex::alloc_type> arg_type; 
    -   bool grep_callback(const arg_type& what); 
    -public: 
    -   typedef bool (__closure* grep_callback_type)(const arg_type&); 
    -   void IndexClasses(const std::string& file); 
    +   typedef boost::match_results<std::string::const_iterator, boost::regex::alloc_type> arg_type; 
    +   bool grep_callback(const arg_type& what); 
    +public: 
    +   typedef bool (__closure* grep_callback_type)(const arg_type&); 
    +   void IndexClasses(const std::string& file); 
        class_index() 
           : index(), 
    -        expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" 
    -                   "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" 
    -                   "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" 
    -                   "(\\{|:[^;\\{()]*\\{)" 
    -                   ){} 
    +        expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" 
    +                   "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" 
    +                   "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" 
    +                   "(\\{|:[^;\\{()]*\\{)" 
    +                   ){} 
     }; 
     
    -bool class_index::grep_callback(const arg_type& what) 
    -{ 
    -   // what[0] contains the whole string    
    -// what[5] contains the class name.    
    -// what[6] contains the template specialisation if any.    
    -// add class name and position to map:    
    +bool class_index::grep_callback(const arg_type& what) 
    +{ 
    +   // what[0] contains the whole string 
    +   // what[5] contains the class name. 
    +   // what[6] contains the template specialisation if any. 
    +   // add class name and position to map: 
        index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = 
                    what[5].first - base; 
    -   return true; 
    +   return true; 
     } 
     
    -void class_index::IndexClasses(const std::string& file) 
    +void class_index::IndexClasses(const std::string& file) 
     { 
        std::string::const_iterator start, end; 
        start = file.begin(); 
        end = file.end(); 
        base = start; 
    -   class_index::grep_callback_type cl = &(this->grep_callback); 
    +   class_index::grep_callback_type cl = &(this->grep_callback); 
        regex_grep(cl, 
                 start, 
                 end, 
                 expression); 
    -} 
    -
    - -
    - -

     Algorithm regex_format

    - -

    #include <boost/regex.hpp> -

    - -

    The algorithm regex_format takes the results of a match and -creates a new string based upon a format string, regex_format -can be used for search and replace operations:

    - -
    template <class OutputIterator, class iterator, class Allocator, class charT>
    +} 
    +


    +

     Algorithm regex_format

    +

    #include <boost/regex.hpp>

    +

    The algorithm regex_format takes the results of a match and creates a new string based upon a format string; it can be used for search and replace operations:

    +
    template <class OutputIterator, class iterator, class Allocator, class charT>
     OutputIterator regex_format(OutputIterator out,
    -                            const match_results<iterator, Allocator>& m,
    -                            const charT* fmt,
    -                            unsigned flags = 0);
    -
    -template <class OutputIterator, class iterator, class Allocator, class charT>
    +                            const match_results<iterator, Allocator>& m,
    +                            const charT* fmt,
    +                            unsigned flags = 0);
    +
    +template <class OutputIterator, class iterator, class Allocator, class charT>
     OutputIterator regex_format(OutputIterator out,
    -                            const match_results<iterator, Allocator>& m,
    -                            const std::basic_string<charT>& fmt,
    -                            unsigned flags = 0);
    - -

    The library also defines the following convenience variation -of regex_format, which returns the result directly as a string, -rather than outputting to an iterator [note - this version may -not be available, or may be available in a more limited form, -depending upon your compilers capabilities]:

    - -
    template <class iterator, class Allocator, class charT>
    +                            const match_results<iterator, Allocator>& m,
    +                            const std::basic_string<charT>& fmt,
    +                            unsigned flags = 0);
    +

    The library also defines the following convenience variation of regex_format, which returns the result directly as a string, rather than outputting to an iterator [note - this version may not be available, or may be available in a more limited form, depending upon your compiler's capabilities]:

    +
    template <class iterator, class Allocator, class charT>
     std::basic_string<charT> regex_format
    -                                 (const match_results<iterator, Allocator>& m, 
    -                                  const charT* fmt,
    -                                  unsigned flags = 0);
    +                                 (const match_results<iterator, Allocator>& m, 
    +                                  const charT* fmt,
    +                                  unsigned flags = 0);
     
    -template <class iterator, class Allocator, class charT>
    +template <class iterator, class Allocator, class charT>
     std::basic_string<charT> regex_format
    -                                 (const match_results<iterator, Allocator>& m, 
    -                                  const std::basic_string<charT>& fmt,
    -                                  unsigned flags = 0);
    +                                 (const match_results<iterator, Allocator>& m,  +                                  const std::basic_string<charT>& fmt, +                                  unsigned flags = 0);
    +

    Parameters to the main version of the function are passed as follows:

    + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    OutputIterator out

    +

    An output iterator; the output string is sent to this iterator. Typically this would be a std::ostream_iterator.

    +

     

    +

     

    +

    const match_results<iterator, Allocator>& m

    +

    An instance of match_results<> obtained from one of the matching algorithms above, and denoting what matched.

    +

     

    +

     

    +

    const charT* fmt

    +

    A format string that determines how the match is transformed into the new string.

    +

     

    +

     

    +

    unsigned flags

    +

    Optional flags which describe how the format string is to be interpreted.

    +

     

    -

    Parameters to the main version of the function are passed as -follows:

    +

    Format flags are defined as follows:

    + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    format_all

    +

    Enables all syntax options (perl-like plus extensions).

    +

     

    +

     

    +

    format_sed

    +

    Allows only a sed-like syntax.

    +

     

    +

     

    +

    format_perl

    +

    Allows only a perl-like syntax.

    +

     

    +

     

    +

    format_no_copy

    +

    Disables copying of unmatched sections to the output string during regex_merge operations.

    +

     

    - - - - - - - - - - - - - - - - - - - - - - - - - -
     OutputIterator outAn output iterator type, the - output string is sent to this iterator. Typically this - would be a std::ostream_iterator. 
     const match_results<iterator, - Allocator>& mAn instance of match_results<> - obtained from one of the matching algorithms above, and - denoting what matched. 
     const charT* fmtA format string that - determines how the match is transformed into the new - string. 
     unsigned flagsOptional flags which - describe how the format string is to be interpreted. 
    - -

    Format flags are defined as follows: -

    - - - - - - - - - - - - - - - - - - - - - - - - - - -
     format_allEnables all syntax options (perl-like - plus extentions). 
     format_sedAllows only a sed-like - syntax. 
     format_perlAllows only a perl-like - syntax. 
     format_no_copyDisables copying of - unmatched sections to the output string during regex_merge operations. 
    - -


    - -

    The format string syntax (and available options) is described -more fully under format -strings.
    -

    - -
    - -

    Algorithm regex_merge

    - -

    #include <boost/regex.hpp> -

    - -

    The algorithm regex_merge is a combination of regex_grep and regex_format. -That is, it greps through the string finding all the matches to -the regular expression, for each match it then calls regex_format -to format the string and sends the result to the output iterator. -Sections of text that do not match are copied to the output -unchanged only if the flags parameter does not have the flag format_no_copy set.

    - -
    template <class OutputIterator, class iterator, class traits, class Allocator, class charT>
    +


    +

    The format string syntax (and available options) is described more fully under format strings.
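    As a rough illustration of how a match is turned into a new string by a format string, the sketch below uses the C++11 standard library rather than the Boost classes documented here: std::match_results::format stands in for regex_format, and the helper name swap_pair is hypothetical. It assumes a C++11 compiler and is not the library's own implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of regex++): rewrites "key=value" as "value:key".
// std::match_results::format is used here as a stand-in for regex_format.
std::string swap_pair(const std::string& text)
{
   static const std::regex e("(\\w+)=(\\w+)");
   std::smatch what;
   if(!std::regex_search(text, what, e))
      return std::string();   // no match, so there is nothing to format
   // $1 and $2 expand to the first and second marked sub-expressions
   return what.format("$2:$1");
}
```

    Calling swap_pair("name=value") yields "value:name"; only the matched text is formatted, so any surrounding input is not copied.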

    +


    +

    Algorithm regex_merge

    +

    #include <boost/regex.hpp>

    +

    The algorithm regex_merge is a combination of regex_grep and regex_format. That is, it greps through the string finding all the matches to the regular expression; for each match it then calls regex_format to format the string and sends the result to the output iterator. Sections of text that do not match are copied to the output unchanged, unless the flags parameter has the flag format_no_copy set.
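    The effect of format_no_copy can be sketched with the C++11 standard library, where std::regex_replace plays a role comparable to regex_merge. This is an illustration under that assumption, not the regex++ API itself; the helper name mark_digits is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of regex++): wraps every run of digits in <>.
// std::regex_replace is used here as a stand-in for regex_merge.
std::string mark_digits(const std::string& text, bool copy_unmatched)
{
   static const std::regex e("[0-9]+");
   std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
   if(!copy_unmatched)
      flags = flags | std::regex_constants::format_no_copy;   // drop unmatched text
   // $& expands to the whole of the current match
   return std::regex_replace(text, e, std::string("<$&>"), flags);
}
```

    With the input "a1b22c", mark_digits(text, true) gives "a<1>b<22>c", while mark_digits(text, false) gives "<1><22>" because the unmatched sections are no longer copied.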

    +
    template <class OutputIterator, class iterator, class traits, class Allocator, class charT>
     OutputIterator regex_merge(OutputIterator out, 
                               iterator first,
                               iterator last,
    -                          const reg_expression<charT, traits, Allocator>& e, 
    -                          const charT* fmt, 
    -                          unsigned int flags = match_default);
    +                          const reg_expression<charT, traits, Allocator>& e, 
    +                          const charT* fmt, 
    +                          unsigned int flags = match_default);
     
    -template <class OutputIterator, class iterator, class traits, class Allocator, class charT>
    +template <class OutputIterator, class iterator, class traits, class Allocator, class charT>
     OutputIterator regex_merge(OutputIterator out, 
                                iterator first,
                                iterator last,
    -                           const reg_expression<charT, traits, Allocator>& e, 
    +                           const reg_expression<charT, traits, Allocator>& e, 
                                std::basic_string<charT>& fmt, 
    -                           unsigned int flags = match_default);
    +                           unsigned int flags = match_default);
    +

    The library also defines the following convenience variation of regex_merge, which returns the result directly as a string, rather than outputting to an iterator [note - this version may not be available, or may be available in a more limited form, depending upon your compiler's capabilities]:

    +
    template <class traits, class Allocator, class charT>
    +std::basic_string<charT> regex_merge(const std::basic_string<charT>& text,
    +                                     const reg_expression<charT, traits, Allocator>& e, 
    +                                     const charT* fmt, 
    +                                     unsigned int flags = match_default);
     
    -

    The library also defines the following convenience variation -of regex_merge, which returns the result directly as a string, -rather than outputting to an iterator [note - this version may -not be available, or may be available in a more limited form, -depending upon your compilers capabilities]:

    - -
    template <class traits, class Allocator, class charT>
    -std::basic_string<charT> regex_merge(const std::basic_string<charT>& text,
    -                                     const reg_expression<charT, traits, Allocator>& e, 
    -                                     const charT* fmt, 
    -                                     unsigned int flags = match_default);
    -
    -template <class traits, class Allocator, class charT>
    -std::basic_string<charT> regex_merge(const std::basic_string<charT>& text,
    -                                     const reg_expression<charT, traits, Allocator>& e, 
    -                                     const std::basic_string<charT>& fmt, 
    -                                     unsigned int flags = match_default);
    - -

    Parameters to the main version of the function are passed as -follows:

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     OutputIterator outAn output iterator type, the - output string is sent to this iterator. Typically this - would be a std::ostream_iterator. 
     iterator firstThe start of the range of - text to grep. 
     iterator lastThe end of the range of text - to grep. 
     const reg_expression<charT, - traits, Allocator>& eThe expression to search for. 
     const charT* fmtThe format string to be - applied to sections of text that match. 
     unsigned int - flags = match_defaultFlags which determine how - the expression is matched - see match_flags, - and how the format string is interpreted - see format_flags. 
    - -

    Example: the following example -takes C/C++ source code as input, and outputs syntax highlighted -HTML code.

    - -
    -#include <fstream>
    -#include <sstream>
    -#include <string>
    -#include <iterator>
    -#include <boost/regex.hpp>
    -#include <fstream>
    -#include <iostream>
    -
    -// purpose:
    -// takes the contents of a file and transform to
    -// syntax highlighted code in html format
    +template <class traits, class Allocator, class charT>
    +std::basic_string<charT> regex_merge(const std::basic_string<charT>& text,
    +                                     const reg_expression<charT, traits, Allocator>& e, 
    +                                     const std::basic_string<charT>& fmt, 
    +                                     unsigned int flags = match_default);
    +

    Parameters to the main version of the function are passed as follows:

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

     

    +

    OutputIterator out

    +

    An output iterator; the output string is sent to this iterator. Typically this would be a std::ostream_iterator.

    +

     

    +

     

    +

    iterator first

    +

    The start of the range of text to grep.

    +

     

    +

     

    +

    iterator last

    +

    The end of the range of text to grep.

    +

     

    +

     

    +

    const reg_expression<charT, traits, Allocator>& e

    +

    The expression to search for.

    +

     

    +

     

    +

    const charT* fmt

    +

    The format string to be applied to sections of text that match.

    +

     

    +

     

    +

    unsigned int flags = match_default

    +

    Flags which determine how the expression is matched - see match_flags, and how the format string is interpreted - see format_flags.

    +

     

    +

    Example: the following example takes C/C++ source code as input, and outputs syntax-highlighted HTML code.

    +
    +#include <fstream>
    +#include <sstream>
    +#include <string>
    +#include <iterator>
    +#include <boost/regex.hpp>
    +#include <fstream>
    +#include <iostream>
    +
    +// purpose:
    +// takes the contents of a file and transform to
    +// syntax highlighted code in html format
    +
     boost::regex e1, e2;
    -extern const char* expression_text;
    -extern const char* format_string;
    -extern const char* pre_expression;
    -extern const char* pre_format;
    -extern const char* header_text;
    -extern const char* footer_text;
    +extern const char* expression_text;
    +extern const char* format_string;
    +extern const char* pre_expression;
    +extern const char* pre_format;
    +extern const char* header_text;
    +extern const char* footer_text;
     
    -void load_file(std::string& s, std::istream& is)
    +void load_file(std::string& s, std::istream& is)
     {
        s.erase();
        s.reserve(is.rdbuf()->in_avail());
    -   char c;
    -   while(is.get(c))
    +   char c;
    +   while(is.get(c))
        {
    -      if(s.capacity() == s.size())
    -         s.reserve(s.capacity() * 3);
    -      s.append(1, c);
    +      if(s.capacity() == s.size())
    +         s.reserve(s.capacity() * 3);
    +      s.append(1, c);
        }
     }
     
    -int main(int argc, const char** argv)
    +int main(int argc, const char** argv)
     {
        e1.set_expression(expression_text);
        e2.set_expression(pre_expression);
    -   for(int i = 1; i < argc; ++i)
    +   for(int i = 1; i < argc; ++i)
        {
    -      std::cout << "Processing file " << argv[i] << std::endl;
    +      std::cout << "Processing file " << argv[i] << std::endl;
           std::ifstream fs(argv[i]);
           std::string in;
           load_file(in, fs);
    -      std::string out_name(std::string(argv[i]) + std::string(".htm"));
    +      std::string out_name(std::string(argv[i]) + std::string(".htm"));
           std::ofstream os(out_name.c_str());
           os << header_text;
    -      // strip '<' and '>' first by outputting to a
    -      // temporary string stream
    -      std::ostringstream t(std::ios::out | std::ios::binary);
    -      std::ostream_iterator<char, char> oi(t);
    +      // strip '<' and '>' first by outputting to a
    +      // temporary string stream
    +      std::ostringstream t(std::ios::out | std::ios::binary);
    +      std::ostream_iterator<char, char> oi(t);
           boost::regex_merge(oi, in.begin(), in.end(), e2, pre_format);
    -      // then output to final output stream
    -      // adding syntax highlighting:
    -      std::string s(t.str());
    -      std::ostream_iterator<char, char> out(os);
    +      // then output to final output stream
    +      // adding syntax highlighting:
    +      std::string s(t.str());
    +      std::ostream_iterator<char, char> out(os);
           boost::regex_merge(out, s.begin(), s.end(), e1, format_string);
           os << footer_text;
        }
    -   return 0;
    +   return 0;
     }
     
    -extern const char* pre_expression = "(<)|(>)|\\r";
    -extern const char* pre_format = "(?1<)(?2>)";
    +extern const char* pre_expression = "(<)|(>)|\\r";
    +extern const char* pre_format = "(?1<)(?2>)";
     
     
    -const char* expression_text = // preprocessor directives: index 1
    -                              "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
    -                              // comment: index 2
    -                              "(//[^\\n]*|/\\*.*?\\*/)|"
    -                              // literals: index 3
    -                              "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
    -                              // string literals: index 4
    -                              "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
    -                              // keywords: index 5
    -                              "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
    -                              "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
    -                              "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
    -                              "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
    -                              "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
    -                              "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
    -                              "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
    -                              "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
    -                              "|using|virtual|void|volatile|wchar_t|while)\\>"
    -                              ;
    +const char* expression_text = // preprocessor directives: index 1
    +                              "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
    +                              // comment: index 2
    +                              "(//[^\\n]*|/\\*.*?\\*/)|"
    +                              // literals: index 3
    +                              "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
    +                              // string literals: index 4
    +                              "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
    +                              // keywords: index 5
    +                              "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
    +                              "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
    +                              "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
    +                              "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
    +                              "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
    +                              "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
    +                              "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
    +                              "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
    +                              "|using|virtual|void|volatile|wchar_t|while)\\>"
    +                              ;
     
    -const char* format_string = "(?1<font color=\"#008040\">$&</font>)"
    -                            "(?2<I><font color=\"#000080\">$&</font></I>)"
    -                            "(?3<font color=\"#0000A0\">$&</font>)"
    -                            "(?4<font color=\"#0000FF\">$&</font>)"
    -                            "(?5<B>$&</B>)";
    +const char* format_string = "(?1<font color=\"#008040\">$&</font>)"
    +                            "(?2<I><font color=\"#000080\">$&</font></I>)"
    +                            "(?3<font color=\"#0000A0\">$&</font>)"
    +                            "(?4<font color=\"#0000FF\">$&</font>)"
    +                            "(?5<B>$&</B>)";
     
    -const char* header_text = "<HTML>\n<HEAD>\n"
    -                          "<TITLE>Auto-generated html formated source</TITLE>\n"
    -                          "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
    -                          "</HEAD>\n"
    -                          "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
    -                          "<P> </P>\n<PRE>";
    +const char* header_text = "<HTML>\n<HEAD>\n"
    +                          "<TITLE>Auto-generated html formated source</TITLE>\n"
    +                          "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
    +                          "</HEAD>\n"
    +                          "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
    +                          "<P> </P>\n<PRE>";
     
    -const char* footer_text = "</PRE>\n</BODY>\n\n";
    -
    -
    - -
    - -

    Algorithm regex_split

    - -

    #include <boost/regex.hpp> -

    - -

    Algorithm regex_split performs a similar operation to the perl -split operation, and comes in three overloaded forms:

    - -
    template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2>
    +const char* footer_text = "</PRE>\n</BODY>\n\n";
    +


    +

    Algorithm regex_split

    +

    #include <boost/regex.hpp>

    +

    Algorithm regex_split performs a similar operation to the perl split operation, and comes in three overloaded forms:

    +
    template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2>
     std::size_t regex_split(OutputIterator out, 
                             std::basic_string<charT, Traits1, Alloc1>& s, 
    -                        const reg_expression<charT, Traits2, Alloc2>& e,
    -                        unsigned flags,
    +                        const reg_expression<charT, Traits2, Alloc2>& e,
    +                        unsigned flags,
                             std::size_t max_split);
     
    -template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2>
    +template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2>
     std::size_t regex_split(OutputIterator out, 
                             std::basic_string<charT, Traits1, Alloc1>& s, 
    -                        const reg_expression<charT, Traits2, Alloc2>& e,
    -                        unsigned flags = match_default);
    +                        const reg_expression<charT, Traits2, Alloc2>& e,
    +                        unsigned flags = match_default);
     
    -template <class OutputIterator, class charT, class Traits1, class Alloc1>
    +template <class OutputIterator, class charT, class Traits1, class Alloc1>
     std::size_t regex_split(OutputIterator out, 
    -                        std::basic_string<charT, Traits1, Alloc1>& s);
    - -

    Each version takes an output-iterator for output, and a string -for input. If the expression contains no marked sub-expressions, -then the algorithm writes one string onto the output-iterator for -each section of input that does not match the expression. If the -expression does contain marked sub-expressions, then each time a -match is found, one string for each marked sub-expression will be -written to the output-iterator. No more than max_split strings -will be written to the output-iterator. Before returning, all the -input processed will be deleted from the string s (if max_split -is not reached then all of s will be deleted). Returns -the number of strings written to the output-iterator. If the -parameter max_split is not specified then it defaults to -UINT_MAX. If no expression is specified, then it defaults to -"\s+", and splitting occurs on whitespace.

    - -

    Example: the following -function will split the input string into a series of tokens, and -remove each token from the string s:

    - -
    unsigned tokenise(std::list<std::string>& l, std::string& s)
    +                        std::basic_string<charT, Traits1, Alloc1>& s);
    +

    Each version takes an output-iterator for output, and a string for input. If the expression contains no marked sub-expressions, then the algorithm writes one string onto the output-iterator for each section of input that does not match the expression. If the expression does contain marked sub-expressions, then each time a match is found, one string for each marked sub-expression will be written to the output-iterator. No more than max_split strings will be written to the output-iterator. Before returning, all the input processed will be deleted from the string s (if max_split is not reached then all of s will be deleted). Returns the number of strings written to the output-iterator. If the parameter max_split is not specified then it defaults to UINT_MAX. If no expression is specified, then it defaults to "\s+", and splitting occurs on whitespace.
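
    The two output modes described above can be illustrated with the standard library alone. This is only a sketch of the semantics, not regex++ code: it assumes std::regex and std::sregex_token_iterator in place of reg_expression and regex_split, the helper names split_unmatched and split_marked are hypothetical, and unlike regex_split it neither erases the processed input from s nor honours max_split.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of regex++: returns the sections of `s`
// that lie between matches of `e` (the "no marked sub-expressions" case).
std::vector<std::string> split_unmatched(const std::string& s, const std::regex& e)
{
   // sub-match index -1 selects the text between matches:
   std::sregex_token_iterator first(s.begin(), s.end(), e, -1), last;
   return std::vector<std::string>(first, last);
}

// Hypothetical helper: returns marked sub-expression 1 of every match
// (the "marked sub-expressions" case, for an expression with one group).
std::vector<std::string> split_marked(const std::string& s, const std::regex& e)
{
   std::sregex_token_iterator first(s.begin(), s.end(), e, 1), last;
   return std::vector<std::string>(first, last);
}

// Small self-check showing both modes.
void split_example()
{
   // splitting on whitespace yields the words between the separators:
   std::vector<std::string> words =
      split_unmatched("one two\tthree", std::regex("\\s+"));
   assert(words.size() == 3 && words[0] == "one" && words[2] == "three");

   // with a marked sub-expression, one string is produced per match:
   std::vector<std::string> urls =
      split_marked("<a href=\"a.htm\">x</a> <a href=\"b.htm\">y</a>",
                   std::regex("href=\"([^\"]*)\""));
   assert(urls.size() == 2 && urls[0] == "a.htm" && urls[1] == "b.htm");
}
```

    Calling split_example() exercises both helpers. The first corresponds to splitting on a separator, the second to extracting one field per match.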

    +

    Example: the following function will split the input string into a series of tokens, and remove each token from the string s:

    +
    unsigned tokenise(std::list<std::string>& l, std::string& s)
     {
    -   return boost::regex_split(std::back_inserter(l), s);
    -}
    - -

    Example: the following -short program will extract all of the URL's from a html file, and -print them out to cout:

    - -
    #include <list>
    +   return boost::regex_split(std::back_inserter(l), s);
    +}
    +

    Example: the following short program will extract all of the URLs from an HTML file, and print them out to cout:

    +
    #include <list>
     #include <fstream>
     #include <iostream>
     #include <boost/regex.hpp>
    -
    -boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
    +
    +boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
                    boost::regbase::normal | boost::regbase::icase);
     
    -void load_file(std::string& s, std::istream& is)
    +void load_file(std::string& s, std::istream& is)
     {
        s.erase();
    -   //
    +   //
        // attempt to grow string buffer to match file size,
    -   // this doesn't always work...
    -   s.reserve(is.rdbuf()->in_avail());
    -   char c;
    -   while(is.get(c))
    +   // this doesn't always work...
    +   s.reserve(is.rdbuf()->in_avail());
    +   char c;
    +   while(is.get(c))
        {
    -      // use logarithmic growth strategy, in case
    -      // in_avail (above) returned zero:
    -      if(s.capacity() == s.size())
    +      // use logarithmic growth strategy, in case
    +      // in_avail (above) returned zero:
    +      if(s.capacity() == s.size())
              s.reserve(s.capacity() * 3);
           s.append(1, c);
        }
     }
     
     
    -int main(int argc, char** argv)
    +int main(int argc, char** argv)
     {
        std::string s;
        std::list<std::string> l;
     
    -   for(int i = 1; i < argc; ++i)
    +   for(int i = 1; i < argc; ++i)
        {
    -      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
    +      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
           s.erase();
           std::ifstream is(argv[i]);
           load_file(s, is);
           boost::regex_split(std::back_inserter(l), s, e);
    -      while(l.size())
    +      while(l.size())
           {
              s = *(l.begin());
              l.pop_front();
              std::cout << s << std::endl;
           }
        }
    -   return 0;
    -}
    +   return 0;
    +}
    +


    +

    Partial Matches

    +

    The match-flag match_partial can be passed to the following algorithms: regex_match, regex_search, and regex_grep. When used it indicates that partial as well as full matches should be found. A partial match is one that matched one or more characters at the end of the text input, but did not match all of the regular expression (although it may have done so had more input been available). Partial matches are typically used when either validating data input (checking each character as it is entered on the keyboard), or when searching texts that are either too long to load into memory (or even into a memory mapped file), or are of indeterminate length (for example the source may be a socket or similar). Partial and full matches can be differentiated as shown in the following table (the variable M represents an instance of match_results<> as filled in by regex_match, regex_search or regex_grep):
    +

    +                Result   M[0].matched   M[0].first                M[0].second
    +No match        False    Undefined      Undefined                 Undefined
    +Partial match   True     False          Start of partial match.   End of partial match (end of text).
    +Full match      True     True           Start of full match.      End of full match.
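
    The rows of the table can be folded into a single three-way test. The following is a minimal sketch under stated assumptions: match_kind, classify, fake_sub and fake_results are hypothetical names, not part of regex++. The self-check uses std::regex, which has no match_partial flag, so the partial-match row is exercised with a small stand-in object instead.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical names: one enumerator per row of the table.
enum class match_kind { none, partial, full };

// `found` is the value returned by the matching algorithm; `m` is any
// match_results-like object whose m[0] exposes a `matched` member.
template <class Results>
match_kind classify(bool found, const Results& m)
{
   if(!found)
      return match_kind::none;   // m[0] is undefined here, so it is not inspected
   return m[0].matched ? match_kind::full : match_kind::partial;
}

// Stand-in for a match_results object, used only to exercise the
// partial-match row without a match_partial implementation.
struct fake_sub { bool matched; };
struct fake_results
{
   fake_sub sub;
   const fake_sub& operator[](int) const { return sub; }
};

// Small self-check covering all three rows.
void classify_example()
{
   std::string text("1234");
   std::smatch m;
   bool found = std::regex_match(text, m, std::regex("\\d{4}"));
   assert(classify(found, m) == match_kind::full);

   found = std::regex_match(text, m, std::regex("\\d{5}"));
   assert(classify(found, m) == match_kind::none);

   fake_results partial = { { false } };
   assert(classify(true, partial) == match_kind::partial);
}
```

    The order of the tests matters: the algorithm's return value has to be checked first, because M[0] is undefined when nothing was found.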

    -
    +

    The following example tests whether the text entered so far could be a valid credit card number. As the user presses a key, the character entered is added to the string being built up, and the string is passed to is_possible_card_number. If this returns true then the text is a complete card number, so the user interface's OK button would be enabled. If it returns false, then the text is not yet a valid card number, but could become one with more input, so the user interface would disable the OK button. Finally, if the procedure throws an exception, the input could never become a valid number, so the character just entered must be discarded and a suitable error indication displayed to the user.

    +
    #include <string>
    +#include <iostream>
    +#include <stdexcept>
    +#include <boost/regex.hpp>
     
    -

    Copyright Dr -John Maddock 1998-2000 all rights reserved.

    -
    -
    +boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
    +
    +bool is_possible_card_number(const std::string& input)
    +{
    +   //
    +   // return false for partial match, true for full match, or throw for
    +   // impossible match based on what we have so far...
    +   boost::match_results<std::string::const_iterator> what;
    +   if(0 == boost::regex_match(input, what, e, boost::match_default | boost::match_partial))
    +   {
    +      // the input so far could not possibly be valid so reject it:
    +      throw std::runtime_error("Invalid data entered - this could not possibly be a valid card number");
    +   }
    +   // OK so far so good, but have we finished?
    +   if(what[0].matched)
    +   {
    +      // excellent, we have a result:
    +      return true;
    +   }
    +   // what we have so far is only a partial match...
    +   return false;
    +}
    +

    In the following example, text input is taken from a stream containing an unknown amount of text; the example simply counts the number of HTML tags encountered in the stream. The text is loaded into a buffer and searched a part at a time. If a partial match is encountered, the partially matched text is copied to the start of the buffer and searched again as the beginning of the next batch of text:

    +
    #include <iostream>
    +#include <fstream>
    +#include <sstream>
    +#include <cstring>
    +#include <string>
    +#include <boost/regex.hpp>
    +
    +// match some kind of html tag:
    +boost::regex e("<[^>]*>");
    +// count how many:
    +unsigned int tags = 0;
    +// saved position of partial match:
    +char* next_pos = 0;
    +
    +bool grep_callback(const boost::match_results<char*>& m)
    +{
    +   if(m[0].matched == false)
    +   {
    +      // save position and return:
    +      next_pos = m[0].first;
    +   }
    +   else
    +      ++tags;
    +   return true;
    +}
    +
    +void search(std::istream& is)
    +{
    +   char buf[4096];
    +   next_pos = buf + sizeof(buf);
    +   bool have_more = true;
    +   while(have_more)
    +   {
    +      // how much do we copy forward from last try:
    +      unsigned leftover = (buf + sizeof(buf)) - next_pos;
    +      // and how much is left to fill:
    +      unsigned size = next_pos - buf;
    +      // copy forward whatever we have left (the two regions may
    +      // overlap, so use memmove rather than memcpy):
    +      std::memmove(buf, next_pos, leftover);
    +      // fill the rest from the stream:
    +      unsigned read = is.readsome(buf + leftover, size);
    +      // check to see if we've run out of text:
    +      have_more = read == size;
    +      // reset next_pos:
    +      next_pos = buf + sizeof(buf);
    +      // and then grep:
    +      boost::regex_grep(grep_callback,
    +                        buf,
    +                        buf + read + leftover,
    +                        e,
    +                        boost::match_default | boost::match_partial);
    +   }
    +}
    +
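
    The carry-forward idea in the example above can also be sketched with the standard library alone. This is an illustration under stated assumptions rather than the regex++ technique itself: std::regex has no match_partial flag, so the unfinished tag at the end of a chunk is found by hand (the first '<' after the last complete match), and the names count_tags, carry and count_tags_example are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: counts the complete tags in `carry + chunk` and
// leaves any unfinished tag in `carry`, ready to prefix the next chunk.
std::size_t count_tags(std::string& carry, const std::string& chunk)
{
   static const std::regex tag("<[^>]*>");
   const std::string text = carry + chunk;
   std::size_t count = 0;
   std::size_t consumed = 0;   // offset just past the last complete tag
   for(std::sregex_iterator i(text.begin(), text.end(), tag), end; i != end; ++i)
   {
      ++count;
      consumed = static_cast<std::size_t>(i->position(0) + i->length(0));
   }
   // a '<' after the last complete tag cannot have a closing '>' yet,
   // so it plays the role of the partial match and is carried forward:
   const std::size_t open = text.find('<', consumed);
   carry = (open == std::string::npos) ? std::string() : text.substr(open);
   return count;
}

// Small self-check: three tags, two of them split across chunk boundaries.
void count_tags_example()
{
   std::string carry;
   std::size_t total = 0;
   const char* chunks[] = { "<a><b", "r>text<", "/p>" };
   for(const char* c : chunks)
      total += count_tags(carry, c);
   assert(total == 3 && carry.empty());
}
```

    Keeping the unfinished text in a std::string avoids the fixed-size buffer arithmetic, at the cost of one copy per chunk.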


    +

    Copyright Dr John Maddock 1998-2001 all rights reserved.

    +