diff --git a/appendix.htm b/appendix.htm
deleted file mode 100644
index ba0b3bdf..00000000
--- a/appendix.htm
+++ /dev/null
@@ -1,1304 +0,0 @@
-
-
-
-
-
-
-Regex++, Appendices
-
-
-
-
-
-
-
-
-
-
- Regex++, Appendices.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
- Appendix 1: Implementation notes
-
-This is the first port of regex++ to the boost library, and is
-based on regex++ 2.x, see changes.txt for a full list of changes
-from the previous version. There are no known functionality bugs
-except that POSIX style equivalence classes are only guaranteed
-correct if the Win32 localization model is used (the default for
-Win32 builds of the library).
-
-There are some aspects of the code that C++ puritans will
-consider to be poor style, in particular the use of goto in some
-of the algorithms. The code could be cleaned up by changing to a
-recursive implementation, although it is likely to be slower in
-that case.
-
-The performance of the algorithms should be satisfactory in
-most cases. For example, the times taken to match the ftp response
-expression "^([0-9]+)(\-| |$)(.*)$" against the string
-"100- this is a line of ftp response which contains a
-message string" are: BSD implementation 450 microseconds,
-GNU implementation 271 microseconds, regex++ 127 microseconds (Pentium
-P90, Win32 console app under MS Windows 95).
-
-However, it should be noted that there are some "pathological"
-expressions which may require exponential time for matching;
-these all involve nested repetition operators. For example,
-attempting to match the expression "(a*a)*b" against N
-letter a's requires time proportional to 2^N.
-These expressions can (almost) always be rewritten in such a way
-as to avoid the problem; for example, "(a*a)*b" could be
-rewritten as "a*b", which requires time only linearly
-proportional to N to solve. In the general case, non-nested
-repeat expressions require time proportional to N^2;
-however, if the clauses are mutually exclusive then they can be
-matched in linear time. This is the case with "a*b":
-for each character the matcher will either match an "a"
-or a "b" or fail, whereas with "a*a" the
-matcher can't tell which branch to take (the first "a"
-or the second) and so has to try both. Be careful how you
-write your regular expressions and avoid nested repeats if you
-can! New to this version, some previously pathological cases have
-been fixed: in particular, searching for expressions which
-contain leading repeats and/or leading literal strings should be
-much faster than before. Literal strings are now searched for
-using the Knuth/Morris/Pratt algorithm (this is used in
-preference to the Boyer/Moore algorithm because it allows the
-tracking of newline characters).
-
-Some aspects of the POSIX regular expression syntax are
-implementation defined:
-
-
- The "leftmost-longest" rule for determining
- what matches is ambiguous; this library takes the "obvious"
- interpretation: find the leftmost match, then maximize
- the length of each sub-expression in turn, with lower
- indexed sub-expressions taking priority over higher
- indexed sub-expressions.
- The behavior of multi-character collating elements is
- ambiguous in the standard, in particular expressions such
- as [a[.ae.]] may have subtle inconsistencies lurking in
- them. This implementation matches bracket expressions as
- follows: all bracket expressions match a single character
- only, unless the expression contains a multi-character
- collating element, either on its own, or as the endpoint
- to a range, in which case the expression may match more
- than one character.
- Repeated null expressions are repeated only once; they
- are treated "as if" they were matched the
- maximum number of times allowed by the expression.
- The behavior of back references is ambiguous in the
- standard, in particular it is unclear whether expressions
- of the form "((ab*)\2)+" should be allowed.
- This implementation allows such expressions and the back
- reference matches whatever the last sub-expression match
- was. This means that at the end of the match, the back
- references may have matched strings different from the
- final value of the sub-expression to which they refer.
-
-
-
-
- Appendix 2: Thread safety
-
-Class reg_expression<> and its typedefs regex and wregex
-are thread safe, in that compiled regular expressions can safely
-be shared between threads. The matching algorithms regex_match,
-regex_search, regex_grep, regex_format and regex_merge are all re-entrant
-and thread safe. Class match_results is now thread safe, in that
-the results of a match can be safely copied from one thread to
-another (for example, one thread may find matches and push
-match_results instances onto a queue, while another thread pops
-them off the other end); otherwise, use a separate instance of
-match_results per thread.
-
-The POSIX API functions are all re-entrant and thread safe,
-regular expressions compiled with regcomp can also be
-shared between threads.
-
-The class RegEx is only thread safe if each thread gets its
-own RegEx instance (apartment threading) - this is a consequence
-of RegEx handling both compiling and matching regular expressions.
-
-
-Finally, note that changing the global locale invalidates all
-compiled regular expressions; therefore calling setlocale
-from one thread while another uses regular expressions will
-produce unpredictable results.
-
-There is also a requirement that there is only one thread
-executing prior to the start of main().
-
-
-
- Appendix 3: Localization
-
- Regex++ provides extensive support for run-time
-localization. The localization model used can be split into two
-parts: front-end and back-end.
-
-Front-end localization deals with everything which the user
-sees - error messages, and the regular expression syntax itself.
-For example, a French application could change [[:word:]] to [[:mot:]]
-and \w to \m. Modifying the front end locale requires active
-support from the developer, by providing the library with a
-message catalogue to load, containing the localized strings.
-Front-end locale is affected by the LC_MESSAGES category only.
-
-Back-end localization deals with everything that occurs after
-the expression has been parsed - in other words everything that
-the user does not see or interact with directly. It deals with
-case conversion, collation, and character class membership. The
-back-end locale does not require any intervention from the
-developer - the library will acquire all the information it
-requires for the current locale from the underlying operating
-system / run time library. This means that if the program user
-does not interact with regular expressions directly (for example,
-if the expressions are embedded in your C++ code), then no
-explicit localization is required, as the library will take care
-of everything for you. For example, embedding the expression [[:word:]]+
-in your code will always match a whole word: if the program is
-run on a machine with, for example, a Greek locale, then it will
-still match a whole word, but in Greek characters rather than
-Latin ones. The back-end locale is affected by the LC_CTYPE and
-LC_COLLATE categories.
-
-There are three separate localization mechanisms supported by
-regex++:
-
-Win32 localization model.
-
-This is the default model when the library is compiled under
-Win32, and is encapsulated by the traits class w32_regex_traits.
-When this model is in effect there is a single global locale as
-defined by the user's control panel settings, and returned by
-GetUserDefaultLCID. All the settings used by regex++ are acquired
-directly from the operating system bypassing the C run time
-library. Front-end localization requires a resource dll,
-containing a string table with the user-defined strings. The
-traits class exports the function:
-
-static std::string set_message_catalogue(const std::string& s);
-
-which needs to be called with a string identifying the name of
-the resource dll, before your code compiles any regular
-expressions (but not necessarily before you construct any reg_expression
-instances):
-
-boost::w32_regex_traits<char>::set_message_catalogue("mydll.dll");
-
-
-Note that this API sets the dll name for both the
-narrow and wide character specializations of w32_regex_traits.
-
-This model does not currently support thread-specific locales
-(via SetThreadLocale under Windows NT). The library provides full
-Unicode support under NT; under Windows 9x the library degrades
-gracefully: characters 0 to 255 are supported, and the remainder are
-treated as "unknown" graphic characters.
-
-C localization model.
-
-This is the default model when the library is compiled under
-an operating system other than Win32, and is encapsulated by the
-traits class c_regex_traits;
-Win32 users can force this model to take effect by defining the
-pre-processor symbol BOOST_REGEX_USE_C_LOCALE. When this model is
-in effect there is a single global locale, as set by setlocale.
-All settings are acquired from your run time library,
-consequently Unicode support is dependent upon your run time
-library implementation. Front end localization requires a POSIX
-message catalogue. The traits class exports the function:
-
-static std::string set_message_catalogue(const std::string& s);
-
-which needs to be called with a string identifying the name of
-the message catalogue, before your code compiles any
-regular expressions (but not necessarily before you construct any
-reg_expression instances):
-
-boost::c_regex_traits<char>::set_message_catalogue("mycatalogue");
-
-
-Note that this API sets the message catalogue name for both the
-narrow and wide character specializations of c_regex_traits. If
-your run time library does not support POSIX message catalogues,
-then you can either provide your own implementation of
-<nl_types.h> or define BOOST_RE_NO_CAT to disable front-end
-localization via message catalogues.
-
-Note that calling setlocale invalidates all compiled
-regular expressions; calling setlocale(LC_ALL, "C")
-will make this library behave equivalently to most traditional
-regular expression libraries, including version 1 of this library.
-
-
-C++ localization model.
-
-
-This model is only in effect if the library is built with the
-pre-processor symbol BOOST_REGEX_USE_CPP_LOCALE defined. When
-this model is in effect each instance of reg_expression<>
-has its own instance of std::locale, class reg_expression<>
-also has a member function imbue which allows the locale
-for the expression to be set on a per-instance basis. Front-end
-localization requires a POSIX message catalogue, which will be
-loaded via the std::messages facet of the expression's locale;
-the traits class exports the function:
-
-static std::string set_message_catalogue(const std::string& s);
-
-which needs to be called with a string identifying the name of
-the message catalogue, before your code compiles any
-regular expressions (but not necessarily before you construct any
-reg_expression instances):
-
-boost::cpp_regex_traits<char>::set_message_catalogue("mycatalogue");
-
-
-Note that calling reg_expression<>::imbue will
-invalidate any expression currently compiled in that instance of
-reg_expression<>. This model is the one which most closely fits
-the ethos of the C++ standard library; however, it is the model
-which will produce the slowest code, and which is the least well
-supported by current standard library implementations. For
-example, I have yet to find an implementation of std::locale which
-supports either message catalogues, or locales other than "C"
-or "POSIX".
-
-Finally note that if you build the library with a non-default
-localization model, then the appropriate pre-processor symbol (BOOST_REGEX_USE_C_LOCALE
-or BOOST_REGEX_USE_CPP_LOCALE) must be defined both when you
-build the support library, and when you include <boost/regex.hpp>
-or <boost/cregex.hpp> in your code. The best way to ensure
-this is to add the #define to <boost/regex/detail/regex_options.hpp>.
-
-
-Providing a message catalogue:
-
-In order to localize the front end of the library, you need to
-provide the library with the appropriate message strings
-contained either in a resource dll's string table (Win32 model),
-or a POSIX message catalogue (C or C++ models). In the latter
-case the messages must appear in message set zero of the
-catalogue. The messages and their IDs are as follows:
-
- Message id  Default value  Meaning
- 101         "("            The character used to start a sub-expression.
- 102         ")"            The character used to end a sub-expression declaration.
- 103         "$"            The character used to denote an end of line assertion.
- 104         "^"            The character used to denote the start of line assertion.
- 105         "."            The character used to denote the "match any character" expression.
- 106         "*"            The match zero or more times repetition operator.
- 107         "+"            The match one or more times repetition operator.
- 108         "?"            The match zero or one times repetition operator.
- 109         "["            The character set opening character.
- 110         "]"            The character set closing character.
- 111         "|"            The alternation operator.
- 112         "\\"           The escape character.
- 113         "#"            The hash character (not currently used).
- 114         "-"            The range operator.
- 115         "{"            The repetition operator opening character.
- 116         "}"            The repetition operator closing character.
- 117         "0123456789"   The digit characters.
- 118         "b"            When preceded by an escape character, represents the word boundary assertion.
- 119         "B"            When preceded by an escape character, represents the non-word boundary assertion.
- 120         "<"            When preceded by an escape character, represents the word-start boundary assertion.
- 121         ">"            When preceded by an escape character, represents the word-end boundary assertion.
- 122         "w"            When preceded by an escape character, represents any word character.
- 123         "W"            When preceded by an escape character, represents a non-word character.
- 124         "`A"           When preceded by an escape character, represents a start of buffer assertion.
- 125         "'z"           When preceded by an escape character, represents an end of buffer assertion.
- 126         "\n"           The newline character.
- 127         ","            The comma separator.
- 128         "a"            When preceded by an escape character, represents the bell character.
- 129         "f"            When preceded by an escape character, represents the form feed character.
- 130         "n"            When preceded by an escape character, represents the newline character.
- 131         "r"            When preceded by an escape character, represents the carriage return character.
- 132         "t"            When preceded by an escape character, represents the tab character.
- 133         "v"            When preceded by an escape character, represents the vertical tab character.
- 134         "x"            When preceded by an escape character, represents the start of a hexadecimal character constant.
- 135         "c"            When preceded by an escape character, represents the start of an ASCII escape character.
- 136         ":"            The colon character.
- 137         "="            The equals character.
- 138         "e"            When preceded by an escape character, represents the ASCII escape character.
- 139         "l"            When preceded by an escape character, represents any lower case character.
- 140         "L"            When preceded by an escape character, represents any non-lower case character.
- 141         "u"            When preceded by an escape character, represents any upper case character.
- 142         "U"            When preceded by an escape character, represents any non-upper case character.
- 143         "s"            When preceded by an escape character, represents any space character.
- 144         "S"            When preceded by an escape character, represents any non-space character.
- 145         "d"            When preceded by an escape character, represents any digit character.
- 146         "D"            When preceded by an escape character, represents any non-digit character.
- 147         "E"            When preceded by an escape character, represents the end quote operator.
- 148         "Q"            When preceded by an escape character, represents the start quote operator.
- 149         "X"            When preceded by an escape character, represents a Unicode combining character sequence.
- 150         "C"            When preceded by an escape character, represents any single character.
- 151         "Z"            When preceded by an escape character, represents the end of buffer operator.
- 152         "G"            When preceded by an escape character, represents the continuation assertion.
- 153         "!"            When preceded by (?, indicates a zero-width negated forward lookahead assertion.
-
-
-Custom error messages are loaded as follows:
-
- Message ID  Error message ID  Default string
- 201         REG_NOMATCH       "No match"
- 202         REG_BADPAT        "Invalid regular expression"
- 203         REG_ECOLLATE      "Invalid collation character"
- 204         REG_ECTYPE        "Invalid character class name"
- 205         REG_EESCAPE       "Trailing backslash"
- 206         REG_ESUBREG       "Invalid back reference"
- 207         REG_EBRACK        "Unmatched [ or [^"
- 208         REG_EPAREN        "Unmatched ( or \\("
- 209         REG_EBRACE        "Unmatched \\{"
- 210         REG_BADBR         "Invalid content of \\{\\}"
- 211         REG_ERANGE        "Invalid range end"
- 212         REG_ESPACE        "Memory exhausted"
- 213         REG_BADRPT        "Invalid preceding regular expression"
- 214         REG_EEND          "Premature end of regular expression"
- 215         REG_ESIZE         "Regular expression too big"
- 216         REG_ERPAREN       "Unmatched ) or \\)"
- 217         REG_EMPTY         "Empty expression"
- 218         REG_E_UNKNOWN     "Unknown error"
-
-
-Custom character class names are loaded as follows:
-
- Message ID  Default class name  Description
- 300         "alnum"             The character class name for alphanumeric characters.
- 301         "alpha"             The character class name for alphabetic characters.
- 302         "cntrl"             The character class name for control characters.
- 303         "digit"             The character class name for digit characters.
- 304         "graph"             The character class name for graphics characters.
- 305         "lower"             The character class name for lower case characters.
- 306         "print"             The character class name for printable characters.
- 307         "punct"             The character class name for punctuation characters.
- 308         "space"             The character class name for space characters.
- 309         "upper"             The character class name for upper case characters.
- 310         "xdigit"            The character class name for hexadecimal characters.
- 311         "blank"             The character class name for blank characters.
- 312         "word"              The character class name for word characters.
- 313         "unicode"           The character class name for Unicode characters.
-
-Finally, custom collating element names are loaded starting
-from message id 400, and terminating when the first load
-thereafter fails. Each message looks something like: "tagname
-string" where tagname is the name used inside [[.tagname.]]
-and string is the actual text of the collating element.
-Note that the value of collating element [[.zero.]] is used for
-the conversion of strings to numbers - if you replace this with
-another value then that will be used for string parsing - for
-example use the Unicode character 0x0660 for [[.zero.]] if you
-want to use Unicode Arabic-Indic digits in your regular
-expressions in place of Latin digits.
-
-Note that the POSIX-defined names for character classes and
-collating elements are always available, even if custom names
-are defined. In contrast, custom error messages and custom
-syntax messages replace the default ones.
-
-
-
- Appendix 4: Example Applications
-
-There are three demo applications that ship with this library.
-They all come with makefiles for the Borland, Microsoft and gcc
-compilers; for other compilers you will have to create your own makefiles.
-
-regress.exe:
-
-A regression test application that gives the matching/searching
-algorithms a full workout. The presence of this program is your
-guarantee that the library will behave as claimed - at least as
-far as those items tested are concerned - if anyone spots
-anything that isn't being tested I'd be glad to hear about it.
-
-Files: parse.cpp, regress.cpp, tests.cpp.
-
-jgrep.exe
-
-A simple grep implementation; run it with no command line options
-to find out its usage. Look at fileiter.cpp/fileiter.hpp
-and the mapfile class to see an example of a "smart"
-bidirectional iterator that can be used with regex++ or any other
-STL algorithm.
-
-Files: jgrep.cpp, main.cpp.
-
-timer.exe
-
-A simple interactive expression matching application; the
-results of all matches are timed, allowing the programmer to
-optimize their regular expressions where performance is critical.
-
-
-Files: regex_timer.cpp.
-
-
-The snippet examples contain the code examples used in the
-documentation:
-
-regex_match_example.cpp:
-ftp based regex_match example.
-
-regex_search_example.cpp:
-regex_search example: searches a cpp file for class definitions.
-
-regex_grep_example_1.cpp:
-regex_grep example 1: searches a cpp file for class definitions.
-
-regex_merge_example.cpp:
-regex_merge example: converts a C++ file to syntax highlighted
-HTML.
-
-regex_grep_example_2.cpp:
-regex_grep example 2: searches a cpp file for class definitions,
-using a global callback function.
-
-regex_grep_example_3.cpp:
-regex_grep example 3: searches a cpp file for class definitions,
-using a bound member function callback.
-
-regex_grep_example_4.cpp:
-regex_grep example 4: searches a cpp file for class definitions,
-using a C++ Builder closure as a callback.
-
-regex_split_example_1.cpp:
-regex_split example: split a string into tokens.
-
-regex_split_example_2.cpp:
-regex_split example: spit out linked URLs.
-
-
-
- Appendix 5: Header Files
-
-There are two main headers used by this library: <boost/regex.hpp>
-provides full access to the entire library, while <boost/cregex.hpp>
-provides access to just the high level class RegEx, and the POSIX
-API functions.
-
-
-
- Appendix 6: Redistributables
-
- If you are using Microsoft or Borland C++ and link to a
-dll version of the run time library, then you will also link to
-one of the dll versions of regex++. While these dll's are
-redistributable, there are no "standard" versions, so
-when installing on the user's PC, you should place these in a
-directory private to your application, and not in the PC's
-directory path. Note that if you link to a static version of your
-run time library, then you will also link to a static version of
-regex++ and no dll's will need to be distributed. The possible
-regex++ dll and library names are computed according to the
-following formula:
-
-
-"boost_regex_"
-+ BOOST_LIB_TOOLSET
-+ "_"
-+ BOOST_LIB_THREAD_OPT
-+ BOOST_LIB_RT_OPT
-+ BOOST_LIB_LINK_OPT
-+ BOOST_LIB_DEBUG_OPT
-
-These are defined as:
-
-BOOST_LIB_TOOLSET: The compiler toolset name (vc6, vc7, bcb5 etc).
-
-BOOST_LIB_THREAD_OPT: "s" for single thread builds,
-"m" for multithread builds.
-
-BOOST_LIB_RT_OPT: "s" for static runtime,
-"d" for dynamic runtime.
-
-BOOST_LIB_LINK_OPT: "s" for static link,
-"i" for dynamic link.
-
-BOOST_LIB_DEBUG_OPT: nothing for release builds,
-"d" for debug builds,
-"dd" for debug-diagnostic builds (_STLP_DEBUG).
-
-Note: you can disable automatic library selection by defining
-the symbol BOOST_REGEX_NO_LIB when compiling, this is useful if
-you want to statically link even though you're using the dll
-version of your run time library, or if you need to debug regex++.
-
-
-
-
- Notes for upgraders
-
-This version of regex++ is the first to be ported to the boost project, and as a result
-has a number of changes to comply with the boost coding
-guidelines.
-
-Headers have been changed from <header> or <header.h>
-to <boost/header.hpp>
-
-The library namespace has changed from "jm", to
-"boost".
-
-The reg_xxx algorithms have been renamed regex_xxx (to improve
-naming consistency).
-
-Algorithm query_match has been renamed regex_match, and only
-returns true if the expression matches the whole of the input
-string (think input data validation).
-
-Compiling existing code:
-
-The directory libs/regex/old_include contains a set of
-headers that make this version of regex++ compatible with
-previous ones: either add this directory to your include path, or
-copy these headers to the root directory of your boost
-installation. The contents of these headers are deprecated and
-undocumented - really these are just here for existing code - for
-new projects use the new header forms.
-
-
-
- Further Information (Contacts and
-Acknowledgements)
-
-The author can be contacted at John_Maddock@compuserve.com;
-the home page for this library is at http://ourworld.compuserve.com/homepages/John_Maddock/regexpp.htm,
-and the official boost version can be obtained from www.boost.org/libraries.htm.
-
-I am indebted to Robert Sedgewick's "Algorithms in C++"
-for forcing me to think about algorithms and their performance,
-and to the folks at boost for forcing me to think, period.
-The following people have all contributed useful comments or
-fixes: Dave Abrahams, Mike Allison, Edan Ayal, Jayashree
-Balasubramanian, Jan Bölsche, Beman Dawes, Paul Baxter, David
-Bergman, David Dennerline, Edward Diener, Peter Dimov, Robert
-Dunn, Fabio Forno, Tobias Gabrielsson, Rob Gillen, Marc Gregoire,
-Chris Hecker, Nick Hodapp, Jesse Jones, Martin Jost, Boris
-Krasnovskiy, Jan Hermelink, Max Leung, Wei-hao Lin, Jens Maurer,
-Richard Peters, Heiko Schmidt, Jason Shirk, Gerald Slacik, Scobie
-Smith, Mike Smyth, Alexander Sokolovsky, Hervé Poirier, Michael
-Raykh, Marc Recht, Scott VanCamp, Bruno Voigt, Alexey Voinov,
-Jerry Waldorf, Rob Ward, Lealon Watts, Thomas Witt and Yuval
-Yosef. I am also grateful to the manuals supplied with the Henry
-Spencer, Perl and GNU regular expression libraries - wherever
-possible I have tried to maintain compatibility with these
-libraries and with the POSIX standard - the code however is
-entirely my own, including any bugs! I can absolutely guarantee
-that I will not fix any bugs I don't know about, so if you have
-any comments or spot any bugs, please get in touch.
-
-Useful further information can be found at:
-
-A short tutorial on regular expressions can
-be found here.
-
-The Open
-Unix Specification contains a wealth of useful material,
-including the regular expression syntax, and specifications for <regex.h>
-and <nl_types.h>.
-
-
-The Pattern
-Matching Pointers site is a "must visit" resource
-for anyone interested in pattern matching.
-
-Glimpse and Agrep
-use a simplified regular expression syntax to achieve faster
-search times.
-
-Udi Manber
-and Ricardo Baeza-Yates
-both have a selection of useful pattern matching papers available
-from their respective web sites.
-
-
-
-Copyright Dr
-John Maddock 1998-2000 all rights reserved.
-
-
diff --git a/doc/Attic/standards.html b/doc/Attic/standards.html
new file mode 100644
index 00000000..35a2e67e
--- /dev/null
+++ b/doc/Attic/standards.html
@@ -0,0 +1,79 @@
+
+
+
+ Boost.Regex: Standards Conformance
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Standards Conformance
+
+
+
+
+
+
+
+
+ C++
+ Boost.regex is intended to conform to the
+ regular expression standardization proposal, which will appear in a
+ future C++ standard technical report (and hopefully in a future version of the
+ standard). Currently there are some differences in how the regular
+ expression traits classes are defined; these will be fixed in a future release.
+ ECMAScript / JavaScript
+ All of the ECMAScript regular expression syntax features are supported, except
+ that:
+ Negated class escapes (\S, \D and \W) are not permitted inside character class
+ definitions ([...]).
+ The escape sequence \u matches any upper case character (the same as
+ [[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for
+ Unicode escape sequences.
+ Perl
+ Almost all Perl features are supported, except for:
+ \N{name}: use [[:name:]] instead.
+ \pP and \PP
+ (?imsx-imsx)
+ (?<=pattern)
+ (?<!pattern)
+ (?{code})
+ (??{code})
+ (?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)
+ These embarrassments / limitations will be removed in due course, mainly
+ dependent upon user demand.
+ POSIX
+ All the POSIX basic and extended regular expression features are supported,
+ except that:
+ No character collating names are recognized except those specified in the POSIX
+ standard for the C locale, unless they are explicitly registered with the
+ traits class.
+ Character equivalence classes ([[=a=]] etc.) are probably buggy except on
+ Win32. Implementing this feature requires knowledge of the format of the
+ string sort keys produced by the system; if you need this, and the default
+ implementation doesn't work on your platform, then you will need to supply a
+ custom traits class.
+
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998-2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/Attic/sub_match.html b/doc/Attic/sub_match.html
new file mode 100644
index 00000000..db995312
--- /dev/null
+++ b/doc/Attic/sub_match.html
@@ -0,0 +1,426 @@
+
+
+
+ Boost.Regex: sub_match
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ sub_match
+
+
+
+
+
+
+
+
+ Synopsis
+ #include <boost/regex.hpp>
+
+ Regular expressions are different from many simple pattern-matching algorithms
+ in that as well as finding an overall match they can also produce
+ sub-expression matches: each sub-expression being delimited in the pattern by a
+ pair of parentheses (...). There has to be some method for reporting
+ sub-expression matches back to the user: this is achieved by defining a
+ class match_results that acts as an
+ indexed collection of sub-expression matches, each sub-expression match being
+ contained in an object of type sub_match.
+
+ Objects of type sub_match may only be obtained by subscripting an object
+ of type match_results.
+
+ When the marked sub-expression denoted by an object of type sub_match<>
+ participated in a regular expression match then member matched evaluates
+ to true, and members first and second denote the
+ range of characters [first,second) which formed that match.
+ Otherwise matched is false, and members first and second
+ contain undefined values.
+ If an object of type sub_match<> represents sub-expression 0
+ - that is to say the whole match - then member matched is always
+ true, unless a partial match was obtained as a result of the flag match_partial
+ being passed to a regular expression algorithm, in which case member matched
+ is false, and members first and second represent the
+ character range that formed the partial match.
+
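+ The following sketch illustrates these rules. It is written against C++11 std::regex rather than Boost.Regex, on the assumption that std::sub_match (which was modelled on this class) behaves the same way for matched, length() and str(). The expression and the helper describe_group are hypothetical. In "(\d+)(?:\.(\d+))?" sub-expression 2 takes part in the match only when a fractional part is present.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical summary of one sub-expression of a match.
struct GroupInfo {
    bool matched;           // did the sub-expression take part in the match?
    std::ptrdiff_t length;  // distance(first, second), or 0 when not matched
    std::string text;       // str(): the matched text, or "" when not matched
};

// Matches the whole of text against "(\d+)(?:\.(\d+))?" and describes
// sub-expression n (0 is the whole match).
GroupInfo describe_group(const std::string& text, std::size_t n) {
    static const std::regex e("(\\d+)(?:\\.(\\d+))?");
    std::smatch what;  // std::match_results<std::string::const_iterator>
    if (!std::regex_match(text, what, e)) {
        return GroupInfo{false, 0, std::string()};
    }
    const std::ssub_match& sub = what[n];  // obtained by subscripting
    return GroupInfo{sub.matched, sub.length(), sub.str()};
}
```

+ For the input "42", describe_group("42", 2) reports matched == false, length 0 and an empty string. For the input "42.7", the same group reports "7".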
+namespace boost{
+
+template <class BidirectionalIterator>
+class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
+{
+public:
+ typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
+ typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
+ typedef BidirectionalIterator iterator;
+
+ bool matched;
+
+ difference_type length()const;
+ operator basic_string<value_type>()const;
+ basic_string<value_type> str()const;
+
+ int compare(const sub_match& s)const;
+ int compare(const basic_string<value_type>& s)const;
+ int compare(const value_type* s)const;
+};
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<typename iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+template <class charT, class traits, class BidirectionalIterator>
+basic_ostream<charT, traits>&
+ operator << (basic_ostream<charT, traits>& os,
+ const sub_match<BidirectionalIterator>& m);
+
+} // namespace boost
+ Description
+
+ sub_match members
+ typedef typename std::iterator_traits<iterator>::value_type value_type;
+ The type pointed to by the iterators.
+ typedef typename std::iterator_traits<iterator>::difference_type difference_type;
+ A type that represents the difference between two iterators.
+ typedef BidirectionalIterator iterator;
+ The iterator type.
+ iterator first
+ An iterator denoting the position of the start of the match.
+ iterator second
+ An iterator denoting the position of the end of the match.
+ bool matched
+ A Boolean value denoting whether this sub-expression participated in the match.
+ difference_type length()const;
+
+
+ Effects: returns (matched ? distance(first, second) : 0).
+ operator basic_string<value_type>()const;
+
+
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
+ basic_string<value_type> str()const;
+
+
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
+ int compare(const sub_match& s)const;
+
+
+ Effects: returns str().compare(s.str()).
+ int compare(const basic_string<value_type>& s)const;
+
+
+ Effects: returns str().compare(s).
+ int compare(const value_type* s)const;
+
+
+ Effects: returns str().compare(s).
+
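+ A short sketch of compare() and the string conversion in use. It is again written against C++11 std::regex, whose std::sub_match provides the same compare(), str() and conversion-to-string members. The key=value line format and the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical line format: key=value, both made of word characters.
static const std::regex key_value("(\\w+)=(\\w+)");

// Sign of key.compare(value): -1, 0 or 1; also 0 when line is not key=value.
int compare_key_to_value(const std::string& line) {
    std::smatch what;
    if (!std::regex_match(line, what, key_value)) return 0;
    int c = what[1].compare(what[2]);  // same as what[1].str().compare(what[2].str())
    return (c > 0) - (c < 0);
}

// The key, obtained through the implicit conversion to std::string.
std::string key_of(const std::string& line) {
    std::smatch what;
    if (!std::regex_match(line, what, key_value)) return std::string();
    std::string key = what[1];  // conversion operator, equivalent to str()
    return key;
}
```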
+ sub_match non-member operators
+ template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) == 0.
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) != 0.
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) < 0.
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) <= 0.
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) >= 0.
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) > 0.
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs == rhs.str().
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs != rhs.str().
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs < rhs.str().
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs > rhs.str().
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs >= rhs.str().
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs <= rhs.str().
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() == rhs.
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() != rhs.
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() < rhs.
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() > rhs.
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() >= rhs.
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() <= rhs.
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns basic_string<value_type>(1, lhs) == rhs.str().
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns basic_string<value_type>(1, lhs) != rhs.str().
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns basic_string<value_type>(1, lhs) < rhs.str().
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns basic_string<value_type>(1, lhs) > rhs.str().
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns basic_string<value_type>(1, lhs) >= rhs.str().
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns basic_string<value_type>(1, lhs) <= rhs.str().
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() == basic_string<value_type>(1, rhs).
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() != basic_string<value_type>(1, rhs).
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() < basic_string<value_type>(1, rhs).
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() > basic_string<value_type>(1, rhs).
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() >= basic_string<value_type>(1, rhs).
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() <= basic_string<value_type>(1, rhs).
+template <class charT, class traits, class BidirectionalIterator>
+basic_ostream<charT, traits>&
+ operator << (basic_ostream<charT, traits>& os,
+ const sub_match<BidirectionalIterator>& m);
+
+
+ Effects: returns (os << m.str()).
+
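+ These non-member operators let a sub_match be compared directly with a C string or a single character, and be written to a stream, without first calling str(). The sketch below uses the equivalent operators of C++11 std::sub_match as a stand-in. The helper describe_op and its input format are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: parses "<word><sign>" such as "add+" and describes it.
std::string describe_op(const std::string& text) {
    static const std::regex e("([a-z]+)(\\+|-)");
    std::smatch what;
    if (!std::regex_match(text, what, e)) return "none";
    std::ostringstream os;
    os << what[1];                          // operator<<: writes what[1].str()
    if (what[2] == '+') {                   // sub_match compared with a character
        os << (what[1] == "add" ? ":plus"   // sub_match compared with a C string
                                : ":other");
    } else {
        os << ":minus";
    }
    return os.str();
}
```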
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998-2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/Attic/syntax.html b/doc/Attic/syntax.html
new file mode 100644
index 00000000..f776cd3c
--- /dev/null
+++ b/doc/Attic/syntax.html
@@ -0,0 +1,773 @@
+
+
+
+ Boost.Regex: Regular Expression Syntax
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Regular Expression Syntax
+
+
+
+
+
+
+
+
+ This section covers the regular expression syntax used by this library. It is
+ a programmer's guide: the actual syntax presented to your program's users will
+ depend upon the flags used during expression compilation.
+
+ Literals
+
+ All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
+ "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
+ a "\". A literal is a character that matches itself, or matches the result of
+ traits_type::translate(), where traits_type is the traits template parameter to
+ class basic_regex.
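+ A minimal illustration of escaping, written with C++11 std::regex (ECMAScript grammar) as a stand-in for boost::regex; the helper names are hypothetical. Inside a C++ string literal the backslash itself must be doubled, so "\\." in source is the two-character expression \. and matches only a literal dot.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "\." is a literal dot, so "1x2" is rejected.
bool looks_like_version(const std::string& s) {
    static const std::regex e("[0-9]+\\.[0-9]+");
    return std::regex_match(s, e);
}

// Without the escape, "." is the wildcard and "1x2" is accepted.
bool loose_version(const std::string& s) {
    static const std::regex e("[0-9]+.[0-9]+");
    return std::regex_match(s, e);
}
```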
+ Wildcard
+
+ The dot character "." matches any single character, with two exceptions: when
+ match_not_dot_null is passed to the matching algorithms, the dot does not match
+ a null character; when match_not_dot_newline is passed to the matching
+ algorithms, the dot does not match a newline character.
+
+ Repeats
+
+ A repeat is an expression that is repeated an arbitrary number of times. An
+ expression followed by "*" can be repeated any number of times, including zero.
+ An expression followed by "+" can be repeated any number of times, but at least
+ once; if the expression is compiled with the flag regex_constants::bk_plus_qm
+ then "+" is an ordinary character and "\+" represents a repeat of once or more.
+ An expression followed by "?" may be repeated zero or one times only; if the
+ expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
+ ordinary character and "\?" represents the repeat zero or once operator. When
+ it is necessary to specify the minimum and maximum number of repeats
+ explicitly, the bounds operator "{}" may be used: thus "a{2}" is the letter "a"
+ repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
+ and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
+ no upper limit. Note that there must be no white-space inside the {}, and there
+ is no upper limit on the values of the lower and upper bounds. When the
+ expression is compiled with the flag regex_constants::bk_braces then "{" and
+ "}" are ordinary characters and "\{" and "\}" are used to delimit bounds
+ instead. All repeat expressions refer to the shortest possible previous
+ sub-expression: for example a single character, a character set, or a
+ sub-expression grouped with "()".
+
+ Examples:
+
+ "ba*" will match all of "b", "ba", "baaa" etc.
+
+ "ba+" will match "ba" or "baaaa" for example but not "b".
+
+ "ba?" will match "b" or "ba".
+
+ "ba{2,4}" will match "baa", "baaa" and "baaaa".
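+ The examples above can be checked with a whole-string match. This sketch uses C++11 std::regex_match in place of boost::regex_match. The helper whole_match is hypothetical, and the Boost-specific flags such as bk_plus_qm are not shown.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if pattern matches the whole of text (ECMAScript grammar).
bool whole_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```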
+
+ Non-greedy repeats
+
+ Whenever the "extended" regular expression syntax is in use (the default) then
+ non-greedy repeats are possible by appending a '?' after the repeat; a
+ non-greedy repeat is one which will match the shortest possible string.
+
+ For example to match html tag pairs one could use something like:
+
+ "<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
+
+ In this case $1 will contain the text between the tag pairs, and will be the
+ shortest possible matching string.
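+ A sketch contrasting the non-greedy and greedy forms, assuming "b" as the tag name and using C++11 std::regex_search as a stand-in for boost::regex_search. The helper tag_body is hypothetical. On input containing two tag pairs, the greedy ".*" runs on to the last closing tag.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns $1 of the first match of the tag-pair expression for tag "b".
std::string tag_body(const std::string& html, bool non_greedy) {
    const std::regex e(non_greedy ? "<\\s*b[^>]*>(.*?)<\\s*/b\\s*>"
                                  : "<\\s*b[^>]*>(.*)<\\s*/b\\s*>");
    std::smatch what;
    if (std::regex_search(html, what, e)) return what[1].str();
    return std::string();
}
```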
+
+ Parentheses
+
+ Parentheses serve two purposes: to group items together into a sub-expression,
+ and to mark what generated the match. For example the expression "(ab)*" would
+ match all of the string "ababab". The matching algorithms
+ regex_match and regex_search
+ each take an instance of match_results
+ that reports what caused the match; on exit from these functions the
+ match_results contains information both on what the whole expression
+ matched and on what each sub-expression matched. In the example above
+ match_results[1] would contain a pair of iterators denoting the final "ab" of
+ the matching string. It is permissible for sub-expressions to match null
+ strings. If a sub-expression takes no part in a match (for example if it is
+ part of an alternative that is not taken) then both of the iterators that are
+ returned for that sub-expression point to the end of the input string, and the matched
+ member for that sub-expression is false. Sub-expressions are indexed
+ from left to right starting from 1; sub-expression 0 is the whole expression.
+
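+ A sketch of both behaviours, using C++11 std::regex and std::smatch as stand-ins for the Boost classes; the helper names are hypothetical. For "(ab)*" against "ababab", sub-expression 1 reports the final repetition, which starts at offset 4. For "(a)|(b)" against "b", sub-expression 1 lies on the alternative that was not taken.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Offset of the last repetition captured by "(ab)*", or -1 if group 1 is unmatched.
long last_repetition_at(const std::string& s) {
    static const std::regex e("(ab)*");
    std::smatch what;
    if (!std::regex_match(s, what, e) || !what[1].matched) return -1;
    return static_cast<long>(what.position(1));
}

// Whether sub-expression 1 of "(a)|(b)" took part in the match.
bool first_group_matched(const std::string& s) {
    static const std::regex e("(a)|(b)");
    std::smatch what;
    return std::regex_match(s, what, e) && what[1].matched;
}
```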
+ Non-Marking Parentheses
+
+ Sometimes you need to group sub-expressions with parentheses, but don't want
+ the parentheses to spit out another marked sub-expression; in this case a
+ non-marking parenthesis (?:expression) can be used. For example the following
+ expression creates no sub-expressions:
+
+ "(?:abc)*"
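+ One way to confirm this is to ask the compiled expression how many marked sub-expressions it has. The sketch below uses mark_count() on C++11 std::regex as a stand-in for boost::regex; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Number of marked sub-expressions in pattern, as reported by mark_count().
std::size_t marked_subexpressions(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```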
+ Forward Lookahead Asserts
+
+ There are two forms of these; one for positive forward lookahead asserts, and
+ one for negative lookahead asserts:
+ "(?=abc)" matches zero characters only if they are followed by the expression
+ "abc".
+ "(?!abc)" matches zero characters only if they are not followed by the
+ expression "abc".
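+ A sketch of both assertion forms, using C++11 std::regex_search as a stand-in for boost::regex_search; the helper names are hypothetical. Because the assertion consumes no characters, "foo(?=bar)" matches only the three characters "foo". The independent sub-expression form described next is Boost/Perl-specific and is not shown.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Offset of the first match of pattern within text, or -1 if there is none.
long first_match_at(const std::string& pattern, const std::string& text) {
    std::smatch what;
    if (!std::regex_search(text, what, std::regex(pattern))) return -1;
    return static_cast<long>(what.position(0));
}

// Length of the first match, or -1; a lookahead adds nothing to it.
long first_match_length(const std::string& pattern, const std::string& text) {
    std::smatch what;
    if (!std::regex_search(text, what, std::regex(pattern))) return -1;
    return static_cast<long>(what.length(0));
}
```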
+ Independent sub-expressions
+ "(?>expression)" matches "expression" as an independent atom (the algorithm
+ will not backtrack into it if a failure occurs later in the expression).
+ Alternatives
+
+ Alternatives occur when the expression can match either one sub-expression or
+ another. Each alternative is separated by a "|", or by a "\|" if the flag
+ regex_constants::bk_vbar is set, or by a newline character if the flag
+ regex_constants::newline_alt is set. Each alternative is the largest possible
+ previous sub-expression; this is the opposite behavior from repetition
+ operators.
+
+ Examples:
+
+ "a(b|c)" could match "ab" or "ac".
+
+ "abc|def" could match "abc" or "def".
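+ Because each alternative extends as far as possible, "abc|def" means "abc" or "def", not "ab(c|d)ef". The sketch below checks this with C++11 std::regex_match standing in for boost::regex_match; the helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if pattern matches the whole of text.
bool matches_exactly(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```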
+
+ Sets
+
+ A set is a collection of characters that matches any single character that is a
+ member of the set. Sets are delimited by "[" and "]" and can contain literals,
+ character ranges, character classes, collating elements and equivalence
+ classes. Set declarations that start with "^" contain the complement of the
+ elements that follow.
+
+ Examples:
+
+ Character literals:
+
+ "[abc]" will match any one of "a", "b", or "c".
+
+ "[^abc]" will match any character other than "a", "b", or "c".
+
+ Character ranges:
+
+ "[a-z]" will match any character in the range "a" to "z".
+
+ "[^A-Z]" will match any character other than those in the range "A" to "Z".
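+ A sketch testing single characters against these sets, using C++11 std::regex as a stand-in. Without a collation flag, std::regex compares range endpoints by character code, which corresponds to the default behaviour described next. The helper in_set is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if the single character c is matched by the set expression set_expr.
bool in_set(const std::string& set_expr, char c) {
    return std::regex_match(std::string(1, c), std::regex(set_expr));
}
```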
+
+ Note that character ranges are highly locale dependent if the flag
+ regex_constants::collate is set: they match any character that collates between
+ the endpoints of the range, so ranges will only behave according to ASCII rules
+ when the default "C" locale is in effect. For example if the library is
+ compiled with the Win32 localization model, then [a-z] will match the ASCII
+ characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
+ 'z'. This locale specific behavior is disabled by default (in perl mode), in
+ which case ranges compare characters by their ASCII character code.
+
+ Character classes are denoted using the syntax "[:classname:]" within a set
+ declaration, for example "[[:space:]]" is the set of all whitespace characters.
+ Character classes are only available if the flag regex_constants::char_classes
+ is set. The available character classes are:
+
+
+
+
+
+
+
+ alnum
+ Any alphanumeric character.
+
+
+
+
+ alpha
+ Any alphabetical character a-z and A-Z. Other
+ characters may also be included depending upon the locale.
+
+
+
+
+ blank
+ Any blank character, either a space or a tab.
+
+
+
+
+ cntrl
+ Any control character.
+
+
+
+
+ digit
+ Any digit 0-9.
+
+
+
+
+ graph
+ Any graphical character.
+
+
+
+
+ lower
+ Any lower case character a-z. Other characters may
+ also be included depending upon the locale.
+
+
+
+
+ print
+ Any printable character.
+
+
+
+
+ punct
+ Any punctuation character.
+
+
+
+
+ space
+ Any whitespace character.
+
+
+
+
+ upper
+ Any upper case character A-Z. Other characters may
+ also be included depending upon the locale.
+
+
+
+
+ xdigit
+ Any hexadecimal digit character, 0-9, a-f and A-F.
+
+
+
+
+ word
+ Any word character - all alphanumeric characters plus
+ the underscore.
+
+
+
+
+ Unicode
+ Any character whose code is greater than 255, this
+ applies to the wide character traits classes only.
+
+
+
+
+ Provided the flag regex_constants::escape_in_lists is set, the following
+ shortcuts can be used in place of the character classes:
+
+ \w in place of [:word:]
+
+ \s in place of [:space:]
+
+ \d in place of [:digit:]
+
+ \l in place of [:lower:]
+
+ \u in place of [:upper:]
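+ A sketch of character classes and the "\d" shortcut inside a set, using C++11 std::regex as a stand-in. The "word" and "Unicode" class names are specific to this library, so "[[:alnum:]_]" is used instead to approximate the word class. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if pattern matches the whole of text.
bool all_of_class(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```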
+
+ Collating elements take the general form [.tagname.] inside a set declaration,
+ where tagname is either a single character, or a name of a collating
+ element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
+ equivalent to [,]. The library supports all the standard POSIX collating
+ element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
+ "nj", "dz", "lj", each in lower, upper and title case variations.
+ Multi-character collating elements can result in the set matching more than one
+ character, for example [[.ae.]] would match two characters, but note that
+ [^[.ae.]] would only match one character.
+
+
+ Equivalence classes take the general form [=tagname=] inside a set declaration,
+ where tagname is either a single character, or a name of a collating
+ element, and match any character that is a member of the same primary
+ equivalence class as the collating element [.tagname.]. An equivalence class is
+ a set of characters that collate the same; a primary equivalence class is a set
+ of characters whose primary sort keys are all the same (for example strings are
+ typically collated by character, then by accent, and then by case; the primary
+ sort key then relates to the character, the secondary to the accent, and
+ the tertiary to the case). If there is no equivalence class corresponding to tagname,
+ then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
+ locale independent method of obtaining the primary sort key for a character,
+ except under Win32. For other operating systems the library will "guess" the
+ primary sort key from the full sort key (obtained from strxfrm), so
+ equivalence classes are probably best considered broken under any operating
+ system other than Win32.
+
+ To include a literal "-" in a set declaration, either make it the first character
+ after the opening "[" or "[^", the endpoint of a range, or a collating element, or,
+ if the flag regex_constants::escape_in_lists is set, precede it with an escape
+ character as in "[\-]". To include a literal "[" or "]" or "^" in a set,
+ make it the endpoint of a range or a collating element, or precede it with an
+ escape character if the flag regex_constants::escape_in_lists is set.
+
+ Line anchors
+
+ An anchor is something that matches the null string at the start or end of a
+ line: "^" matches the null string at the start of a line, "$" matches the null
+ string at the end of a line.
+
+ Back references
+
+ A back reference is a reference to a previous sub-expression that has already
+ been matched; the reference is to what the sub-expression matched, not to the
+ expression itself. A back reference consists of the escape character "\"
+ followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
+ to the second etc. For example the expression "(.*)\1" matches any string that
+ is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
+ reference to a sub-expression that did not participate in any match matches
+ the null string: note that this is different from some other regular expression
+ matchers. Back references are only available if the expression is compiled with
+ the flag regex_constants::bk_refs set.
+
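+ A sketch of the "(.*)\1" example using C++11 std::regex, whose default ECMAScript grammar also supports \1. Note that the backslash must be doubled in a C++ string literal. The helper name is hypothetical. The null-string behaviour for non-participating groups is not exercised here, because it may differ between implementations.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if text consists of some string repeated twice.
bool is_doubled(const std::string& text) {
    static const std::regex e("(.*)\\1");
    return std::regex_match(text, e);
}
```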
+ Characters by code
+
+ This is an extension to the algorithm that is not available in other libraries.
+ It consists of the escape character followed by the digit "0" followed by the
+ octal character code. For example "\023" represents the character whose octal
+ code is 23. Where ambiguity could occur use parentheses to break the expression
+ up: "\0103" represents the character whose octal code is 103, and "(\010)3" represents
+ the character with octal code 10 followed by "3". To match characters by their hexadecimal code,
+ use \x followed by a string of hexadecimal digits, optionally enclosed inside
+ {}, for example \xf0 or \x{aff}; note that the latter example is a Unicode
+ character.
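+ A small sketch of matching by hexadecimal code, written with C++11 std::regex as a stand-in. Its ECMAScript grammar accepts \x only with exactly two hex digits, and it has neither the \0dd octal form nor the \x{...} form described above. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if pattern matches the whole of text.
bool matches_codes(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```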
+ Word operators
+
+ The following operators are provided for compatibility with the GNU regular
+ expression library.
+
+ "\w" matches any single character that is a member of the "word" character
+ class, this is identical to the expression "[[:word:]]".
+
+ "\W" matches any single character that is not a member of the "word" character
+ class, this is identical to the expression "[^[:word:]]".
+
+ "\<" matches the null string at the start of a word.
+
+ "\>" matches the null string at the end of a word.
+
+ "\b" matches the null string at either the start or the end of a word.
+
+ "\B" matches a null string within a word.
+
+ The start of the sequence passed to the matching algorithms is considered to be
+ a potential start of a word unless the flag match_not_bow is set. The end of
+ the sequence passed to the matching algorithms is considered to be a potential
+ end of a word unless the flag match_not_eow is set.
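+ A sketch of "\w", "\W", "\b" and "\B" using C++11 std::regex_search as a stand-in. The GNU-style "\<" and "\>" are not part of its ECMAScript grammar, so they are not shown. The helper name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if pattern matches somewhere within text.
bool found(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```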
+
+ Buffer operators
+
+ The following operators are provided for compatibility with the GNU regular
+ expression library, and Perl regular expressions:
+
+ "\`" matches the start of a buffer.
+
+ "\A" matches the start of the buffer.
+
+ "\'" matches the end of a buffer.
+
+ "\z" matches the end of a buffer.
+
+ "\Z" matches the end of a buffer, or possibly one or more new line characters
+ followed by the end of the buffer.
+
+ A buffer is considered to consist of the whole sequence passed to the matching
+ algorithms, unless the flag match_not_bob or match_not_eob is set.
+
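std::regex has no \z or \Z, so the difference between them is shown with two hypothetical helpers (not Boost.Regex functions). Each one tests the documented rule at a position in a buffer. The sketch assumes '\n' is the only newline character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers restating the documented end-of-buffer rules for a
// position `pos` in `buf`, assuming '\n' is the only newline character.

// \z (and \'): true only at the very end of the buffer.
bool at_end_lower_z(const std::string& buf, std::size_t pos) {
    return pos == buf.size();
}

// \Z: true at the end of the buffer, or where only newlines remain.
bool at_end_upper_z(const std::string& buf, std::size_t pos) {
    if (pos > buf.size()) return false;
    for (std::size_t i = pos; i < buf.size(); ++i)
        if (buf[i] != '\n') return false;   // non-newline text follows
    return true;
}
```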
+ Escape operator
+
+ The escape character "\" has several meanings.
+
+ Inside a set declaration the escape character is a normal character unless the
+ flag regex_constants::escape_in_lists is set, in which case whatever follows the
+ escape is a literal character regardless of its normal meaning.
+
+ The escape operator may introduce an operator, for example a back reference or
+ a word operator.
+
+ The escape operator may make the following character normal, for example "\*"
+ represents a literal "*" rather than the repeat operator.
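A short sketch of escaping an operator, using C++11 std::regex as a stand-in. In C++ source the backslash itself must be doubled, so the pattern text a\*b is written "a\\*b". The helper names are illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern text a\*b: the escaped "*" is a literal asterisk.
bool literal_star(const std::string& s) {
    return std::regex_match(s, std::regex("a\\*b"));
}

// Pattern text a*b: the unescaped "*" is the repeat operator.
bool repeated_a(const std::string& s) {
    return std::regex_match(s, std::regex("a*b"));
}
```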
+
+ Single character escape sequences
+
+ The following escape sequences are aliases for single characters:
+
+
+
+
+
+
+
+ Escape sequence
+
+ Character code
+
+ Meaning
+
+
+
+
+
+ \a
+
+ 0x07
+
+ Bell character.
+
+
+
+
+
+ \f
+
+ 0x0C
+
+ Form feed.
+
+
+
+
+
+ \n
+
+ 0x0A
+
+ Newline character.
+
+
+
+
+
+ \r
+
+ 0x0D
+
+ Carriage return.
+
+
+
+
+
+ \t
+
+ 0x09
+
+ Tab character.
+
+
+
+
+
+ \v
+
+ 0x0B
+
+ Vertical tab.
+
+
+
+
+
+ \e
+
+ 0x1B
+
+ ASCII Escape character.
+
+
+
+
+
+ \0dd
+
+ 0dd
+
+ An octal character code, where dd is one or
+ more octal digits.
+
+
+
+
+
+ \xXX
+
+ 0xXX
+
+ A hexadecimal character code, where XX is one or more
+ hexadecimal digits.
+
+
+
+
+
+ \x{XX}
+
+ 0xXX
+
+ A hexadecimal character code, where XX is one or more
+ hexadecimal digits, optionally a Unicode character.
+
+
+
+
+
+ \cZ
+
+ code of Z minus code of '@'
+
+ An ASCII escape sequence control-Z, where Z is any
+ ASCII character greater than or equal to the character code for '@'.
+
+
+
+
+
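Two rows of the table can be checked in code. The \t row is exercised with C++11 std::regex as a stand-in. The \cZ row is restated arithmetically by control_code, a hypothetical helper that is not part of either library: the code of control-Z is the code of Z minus the code of '@'.

```cpp
#include <cassert>
#include <regex>
#include <string>

// "\t" in the pattern matches the tab character (0x09).
bool is_tab_separated_pair(const std::string& s) {
    return std::regex_match(s, std::regex("\\w+\\t\\w+"));
}

// Hypothetical helper restating the \cZ row: code(Z) - code('@').
// For example \cA -> 0x01, \cZ -> 0x1A and \c[ -> 0x1B (the same as \e).
int control_code(char z) {
    return static_cast<unsigned char>(z) - '@';
}
```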
+ Miscellaneous escape sequences:
+
+ The following are provided mostly for Perl compatibility, but note that there
+ are some differences in the meanings of \l, \L, \u and \U:
+
+
+
+
+
+
+
+ \w
+
+ Equivalent to [[:word:]].
+
+
+
+
+
+ \W
+
+ Equivalent to [^[:word:]].
+
+
+
+
+
+ \s
+
+ Equivalent to [[:space:]].
+
+
+
+
+
+ \S
+
+ Equivalent to [^[:space:]].
+
+
+
+
+
+ \d
+
+ Equivalent to [[:digit:]].
+
+
+
+
+
+ \D
+
+ Equivalent to [^[:digit:]].
+
+
+
+
+
+ \l
+
+ Equivalent to [[:lower:]].
+
+
+
+
+
+ \L
+
+ Equivalent to [^[:lower:]].
+
+
+
+
+
+ \u
+
+ Equivalent to [[:upper:]].
+
+
+
+
+
+ \U
+
+ Equivalent to [^[:upper:]].
+
+
+
+
+
+ \C
+
+ Any single character, equivalent to '.'.
+
+
+
+
+
+ \X
+
+ Match any Unicode combining character sequence, for
+ example "a\x{0301}" (the letter a followed by a combining acute accent).
+
+
+
+
+
+ \Q
+
+ The begin quote operator, everything that follows is
+ treated as a literal character until a \E end quote operator is found.
+
+
+
+
+
+ \E
+
+ The end quote operator, terminates a sequence begun
+ with \Q.
+
+
+
+
+
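Some of these escapes can be spot-checked with C++11 std::regex, used as a stand-in. std::regex accepts both \d and [[:digit:]], so the first pair of functions checks the table's \d equivalence. std::regex uses \u for Unicode escapes rather than [[:upper:]], and it has no \Q...\E. For that reason escape_literal, a hypothetical helper, builds an equivalent pattern by escaping every ECMAScript metacharacter.

```cpp
#include <cassert>
#include <regex>
#include <string>

// \d and [[:digit:]] should accept exactly the same strings.
bool all_digits_escape(const std::string& s) {
    return std::regex_match(s, std::regex("\\d+"));
}
bool all_digits_class(const std::string& s) {
    return std::regex_match(s, std::regex("[[:digit:]]+"));
}

// Hypothetical stand-in for \Q...\E: prefix each ECMAScript metacharacter
// with a backslash so that the whole of `text` is matched literally.
std::string escape_literal(const std::string& text) {
    static const std::string meta = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : text) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```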
+ What gets matched?
+
+
+ When the expression is compiled as a Perl-compatible regex then the matching
+ algorithms will perform a depth first search on the state machine and report
+ the first match found.
+
+ When the expression is compiled as a POSIX-compatible regex then the matching
+ algorithms will match the first possible matching string; if more than one
+ string starting at a given location can match, then it matches the longest
+ possible string, unless the flag match_any is set, in which case the first
+ match encountered is returned. Use of the match_any option can reduce the time
+ taken to find the match, but is only useful if the user is less concerned
+ about what matched; for example it would not be suitable for search and
+ replace operations. In cases where there are multiple possible matches all
+ starting at the same location, and all of the same length, then the match
+ chosen is the one with the longest first sub-expression; if that is the same
+ for two or more matches, then the second sub-expression will be examined, and so
+ on.
+
+ The examples in the following table illustrate the main differences between Perl and
+ POSIX regular expression matching rules:
+
+
+
+
+
+
+ Expression
+
+
+ Text
+
+
+ POSIX leftmost longest match
+
+
+ ECMAScript depth first search match
+
+
+
+
+ a|ab
+
+
+
+ xaby
+
+
+
+
+ "ab"
+
+
+ "a"
+
+
+
+
+ .*([[:alnum:]]+).*
+
+
+ " abc def xyz "
+
+ $0 = " abc def xyz "
+ $1 = "abc"
+
+
+ $0 = " abc def xyz "
+ $1 = "z"
+
+
+
+
+
+ .*(a|xayy)
+
+
+ zzxayyzz
+
+
+ "zzxayy"
+
+ "zzxa"
+
+
+
+ These differences between Perl matching rules, and POSIX matching rules, mean
+ that these two regular expression syntaxes differ not only in the features
+ offered, but also in the form that the state machine takes and/or the
+ algorithms used to traverse the state machine.
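C++11 std::regex lets the same pattern be compiled under either rule set, which gives a quick way to reproduce the first row of the table. std::regex::extended requests POSIX leftmost-longest matching and std::regex::ECMAScript (the default) requests depth-first search. Implementations may differ in how completely they apply the POSIX sub-expression rules, so only the overall match text is checked here. The helper name is illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text` under the
// given grammar, or an empty string if there is no match.
std::string first_match(const std::string& pattern, const std::string& text,
                        std::regex::flag_type grammar) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern, grammar)))
        return m.str(0);
    return std::string();
}
```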
+
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998- 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/Attic/syntax_option_type.html b/doc/Attic/syntax_option_type.html
new file mode 100644
index 00000000..532d6386
--- /dev/null
+++ b/doc/Attic/syntax_option_type.html
@@ -0,0 +1,332 @@
+
+
+
+ Boost.Regex: syntax_option_type
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ syntax_option_type
+
+
+
+
+
+
+
+
+ Synopsis
+ Type syntax_option_type is an implementation defined bitmask type that controls
+ how a regular expression string is to be interpreted. For convenience
+ note that all the constants listed here are also duplicated within the scope
+ of class template basic_regex.
+ namespace std{ namespace regex_constants{
+
+typedef bitmask_type syntax_option_type;
+// these flags are standardized:
+static const syntax_option_type normal;
+static const syntax_option_type icase;
+static const syntax_option_type nosubs;
+static const syntax_option_type optimize;
+static const syntax_option_type collate;
+static const syntax_option_type ECMAScript = normal;
+static const syntax_option_type JavaScript = normal;
+static const syntax_option_type JScript = normal;
+static const syntax_option_type basic;
+static const syntax_option_type extended;
+static const syntax_option_type awk;
+static const syntax_option_type grep;
+static const syntax_option_type egrep;
+static const syntax_option_type sed = basic;
+static const syntax_option_type perl;
+// these are boost.regex specific:
+static const syntax_option_type escape_in_lists;
+static const syntax_option_type char_classes;
+static const syntax_option_type intervals;
+static const syntax_option_type limited_ops;
+static const syntax_option_type newline_alt;
+static const syntax_option_type bk_plus_qm;
+static const syntax_option_type bk_braces;
+static const syntax_option_type bk_parens;
+static const syntax_option_type bk_refs;
+static const syntax_option_type bk_vbar;
+static const syntax_option_type use_except;
+static const syntax_option_type failbit;
+static const syntax_option_type literal;
+static const syntax_option_type nocollate;
+static const syntax_option_type perlex;
+static const syntax_option_type emacs;
+} // namespace regex_constants
+} // namespace std
+ Description
+ The type syntax_option_type is an implementation defined bitmask
+ type (17.3.2.1.2). Setting its elements has the effects listed in the table
+ below; a valid value of type syntax_option_type will always have
+ exactly one of the elements normal, basic, extended, awk, grep, egrep, sed
+ or perl set.
+ Note that for convenience all the constants listed here are duplicated within
+ the scope of class template basic_regex, so you can use any of:
+ boost::regex_constants::constant_name
+ or
+ boost::regex::constant_name
+ or
+ boost::wregex::constant_name
+ in an interchangeable manner.
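C++11 std::regex mirrors this arrangement: its constants exist both in std::regex_constants and in the scope of std::basic_regex. The sketch below uses std::regex as a stand-in for Boost.Regex. It checks that the two spellings agree and shows the icase element in use.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Class-scope and namespace-scope spellings name the same constant.
static_assert(std::regex::icase == std::regex_constants::icase,
              "class-scope and namespace-scope constants agree");

// icase: letters match regardless of case.
bool matches_ignoring_case(const std::string& s) {
    return std::regex_match(s, std::regex("hello", std::regex::icase));
}
```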
+
+
+
+
+ Element
+
+
+ Effect if set
+
+
+
+
+ normal
+
+
+ Specifies that the grammar recognized by the regular expression engine uses its
+ normal semantics: that is the same as that given in the ECMA-262, ECMAScript
+ Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
+ (FWD.1).
+ boost.regex also recognizes most perl-compatible extensions in this mode.
+
+
+
+
+ icase
+
+
+ Specifies that matching of regular expressions against a character container
+ sequence shall be performed without regard to case.
+
+
+
+
+ nosubs
+
+
+ Specifies that when a regular expression is matched against a character
+ container sequence, then no sub-expression matches are to be stored in the
+ supplied match_results structure.
+
+
+
+
+ optimize
+
+
+ Specifies that the regular expression engine should pay more attention to the
+ speed with which regular expressions are matched, and less to the speed with
+ which regular expression objects are constructed. Otherwise it has no
+ detectable effect on the program output. This currently has no effect for
+ boost.regex.
+
+
+
+
+ collate
+
+
+ Specifies that character ranges of the form "[a-b]" should be locale sensitive.
+
+
+
+
+ ECMAScript
+
+
+ The same as normal.
+
+
+
+
+ JavaScript
+
+
+ The same as normal.
+
+
+
+
+ JScript
+
+
+ The same as normal.
+
+
+
+
+ basic
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
+ Portable Operating System Interface (POSIX), Base Definitions and Headers,
+ Section 9, Regular Expressions (FWD.1).
+
+
+
+
+
+ extended
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX extended regular expressions in IEEE Std
+ 1003.1-2001, Portable Operating System Interface (POSIX), Base Definitions and
+ Headers, Section 9, Regular Expressions (FWD.1).
+
+
+
+
+ awk
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
+ Operating System Interface (POSIX), Shells and Utilities, Section 4, awk
+ (FWD.1).
+ That is to say: the same as POSIX extended syntax, but with escape sequences in
+ character classes permitted.
+
+
+
+
+ grep
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
+ Operating System Interface (POSIX), Shells and Utilities, Section 4,
+ Utilities, grep (FWD.1).
+ That is to say, the same as POSIX basic syntax, but with the newline character
+ acting as an alternation character in addition to "|".
+
+
+
+
+ egrep
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX utility grep when given the -E option in IEEE Std
+ 1003.1-2001, Portable Operating System Interface (POSIX), Shells and
+ Utilities, Section 4, Utilities, grep (FWD.1).
+ That is to say, the same as POSIX extended syntax, but with the newline
+ character acting as an alternation character in addition to "|".
+
+
+
+
+ sed
+
+
+ The same as basic.
+
+
+
+
+ perl
+
+
+ The same as normal.
+
+
+
+
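The choice of grammar changes the meaning of the same pattern text. The sketch below uses C++11 std::regex as a stand-in. In POSIX basic syntax "+" is an ordinary character, while in POSIX extended syntax it is the one-or-more repeat operator. The helper name is illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compile `pattern` under `grammar` and test whether it matches all of `text`.
bool match_with(const std::string& pattern, const std::string& text,
                std::regex::flag_type grammar) {
    return std::regex_match(text, std::regex(pattern, grammar));
}
```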
+ The following constants are specific to this particular regular expression
+ implementation and do not appear in the
+ regular expression standardization proposal:
+
+
+
+ regbase::escape_in_lists
+ Allows the use of the escape "\" character in sets of
+ characters, for example [\]] represents the set of characters containing only
+ "]". If this flag is not set then "\" is an ordinary character inside sets.
+
+
+ regbase::char_classes
+ When this bit is set, character classes [:classname:]
+ are allowed inside character set declarations, for example "[[:word:]]"
+ represents the set of all characters that belong to the character class "word".
+
+
+ regbase::intervals
+ When this bit is set, repetition intervals are
+ allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
+ a's.
+
+
+ regbase::limited_ops
+ When this bit is set all of "+", "?" and "|" are
+ ordinary characters in all situations.
+
+
+ regbase::newline_alt
+ When this bit is set, then the newline character "\n"
+ has the same effect as the alternation operator "|".
+
+
+ regbase::bk_plus_qm
+ When this bit is set then "\+" represents the one or
+ more repetition operator and "\?" represents the zero or one repetition
+ operator. When this bit is not set then "+" and "?" are used instead.
+
+
+ regbase::bk_braces
+ When this bit is set then "\{" and "\}" are used for
+ bounded repetitions and "{" and "}" are normal characters. This is the opposite
+ of the default behavior.
+
+
+ regbase::bk_parens
+ When this bit is set then "\(" and "\)" are used to
+ group sub-expressions and "(" and ")" are ordinary characters; this is the
+ opposite of the default behavior.
+
+
+ regbase::bk_refs
+ When this bit is set then back references are
+ allowed.
+
+
+ regbase::bk_vbar
+ When this bit is set then "\|" represents the
+ alternation operator and "|" is an ordinary character. This is the opposite of
+ the default behavior.
+
+
+ regbase::use_except
+ When this bit is set then a bad_expression
+ exception will be thrown on error. Use of this flag is deprecated:
+ basic_regex will always throw on error.
+
+
+ regbase::failbit
+ This bit is set on error. If regbase::use_except is
+ not set, then this bit should be checked to see whether a regular expression is
+ valid before use.
+
+
+ regbase::literal
+ All characters in the string are treated as literals,
+ there are no special characters or escape sequences.
+
+
+ regbase::emacs
+ Provides compatibility with the emacs
+ editor; equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.
+
+
+
+
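Because syntax_option_type is a bitmask type, a composite such as emacs is simply the bitwise OR of its elements. The sketch below illustrates this with hypothetical bit values. The real values are implementation defined; only the flag names and the emacs composition come from the list above.

```cpp
#include <cassert>

// Hypothetical bit values for illustration only; the real constants are
// implementation defined. Names and the emacs composition follow the text.
enum syntax_bits : unsigned {
    bk_braces = 1u << 0,
    bk_parens = 1u << 1,
    bk_refs   = 1u << 2,
    bk_vbar   = 1u << 3,
    intervals = 1u << 4
};

// emacs is documented as equivalent to this combination.
constexpr unsigned emacs = bk_braces | bk_parens | bk_refs | bk_vbar;

// True if the element `bit` is set in the combined value `options`.
constexpr bool has(unsigned options, unsigned bit) {
    return (options & bit) != 0;
}
```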
+ Revised
+
+ 17 May 2003
+
+ © Copyright John Maddock 1998- 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/Attic/thread_safety.html b/doc/Attic/thread_safety.html
new file mode 100644
index 00000000..eeda681d
--- /dev/null
+++ b/doc/Attic/thread_safety.html
@@ -0,0 +1,68 @@
+
+
+
+ Boost.Regex: Thread Safety
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Thread Safety
+
+
+
+
+
+
+
+
+ Class basic_regex<> and its typedefs regex
+ and wregex are thread safe, in that compiled regular expressions can safely be
+ shared between threads. The matching algorithms regex_match,
+ regex_search, regex_grep,
+ regex_format and regex_merge
+ are all re-entrant and thread safe. Class match_results
+ is now thread safe, in that the results of a match can be safely copied from
+ one thread to another (for example one thread may find matches and push
+ match_results instances onto a queue, while another thread pops them off the
+ other end); otherwise, use a separate instance of match_results
+ per thread.
+
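The recommended pattern is one compiled expression shared read-only, with a separate match_results object in each thread. The sketch below shows it using C++11 std::regex and std::thread as stand-ins for the Boost classes. The function name and the counting task are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <thread>
#include <vector>

// Counts the runs of digits in each line, one thread per line.
// The compiled regex is shared read-only; each thread owns its own
// std::smatch and writes only to its own element of `counts`.
std::vector<int> count_numbers_in_parallel(const std::vector<std::string>& lines) {
    const std::regex number("[0-9]+");          // compiled once, shared
    std::vector<int> counts(lines.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        workers.emplace_back([&, i] {
            std::smatch m;                       // per-thread match results
            auto begin = lines[i].cbegin();
            while (std::regex_search(begin, lines[i].cend(), m, number)) {
                ++counts[i];
                begin = m[0].second;             // resume after this match
            }
        });
    }
    for (auto& t : workers) t.join();            // all threads finish here
    return counts;
}
```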
+ The POSIX API functions are all re-entrant and
+ thread safe, regular expressions compiled with regcomp can also be
+ shared between threads.
+
+ The class RegEx is only thread safe if each thread
+ gets its own RegEx instance (apartment threading) - this is a consequence of
+ RegEx handling both compiling and matching regular expressions.
+
+ Finally note that changing the global locale invalidates all compiled regular
+ expressions; therefore calling set_locale from one thread while another
+ uses regular expressions will produce unpredictable results.
+
+
+ There is also a requirement that there is only one thread executing prior to
+ the start of main().
+
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998- 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
diff --git a/doc/Attic/uarrow.gif b/doc/Attic/uarrow.gif
new file mode 100644
index 00000000..6afd20c3
Binary files /dev/null and b/doc/Attic/uarrow.gif differ
diff --git a/doc/standards.html b/doc/standards.html
new file mode 100644
index 00000000..35a2e67e
--- /dev/null
+++ b/doc/standards.html
@@ -0,0 +1,79 @@
+
+
+
+ Boost.Regex: Standards Conformance
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Standards Conformance
+
+
+
+
+
+
+
+
+ C++
+ Boost.regex is intended to conform to the
+ regular expression standardization proposal, which will appear in a
+ future C++ standard technical report (and hopefully in a future version of the
+ standard). Currently there are some differences in how the regular
+ expression traits classes are defined, these will be fixed in a future release.
+ ECMAScript / JavaScript
+ All of the ECMAScript regular expression syntax features are supported, except
+ that:
+ Negated class escapes (\S, \D and \W) are not permitted inside character class
+ definitions ([...]).
+ The escape sequence \u matches any upper case character (the same as
+ [[:upper:]]) rather than a Unicode escape sequence; use \x{DDDD} for
+ Unicode escape sequences.
+ Perl
+ Almost all Perl features are supported, except for:
+ \N{name} Use [[:name:]] instead.
+ \pP and \PP
+ (?imsx-imsx)
+ (?<=pattern)
+ (?<!pattern)
+ (?{code})
+ (??{code})
+ (?(condition)yes-pattern) and (?(condition)yes-pattern|no-pattern)
+ These embarrassments / limitations will be removed in due course, mainly
+ dependent upon user demand.
+ POSIX
+ All the POSIX basic and extended regular expression features are supported,
+ except that:
+ No character collating names are recognized except those specified in the POSIX
+ standard for the C locale, unless they are explicitly registered with the
+ traits class.
+ Character equivalence classes ([[=a=]] etc.) are probably buggy except on
+ Win32. Implementing this feature requires knowledge of the format of the
+ string sort keys produced by the system; if you need this, and the default
+ implementation doesn't work on your platform, then you will need to supply a
+ custom traits class.
+
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998- 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/sub_match.html b/doc/sub_match.html
new file mode 100644
index 00000000..db995312
--- /dev/null
+++ b/doc/sub_match.html
@@ -0,0 +1,426 @@
+
+
+
+ Boost.Regex: sub_match
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ sub_match
+
+
+
+
+
+
+
+
+ Synopsis
+ #include <boost/regex.hpp>
+
+ Regular expressions are different from many simple pattern-matching algorithms
+ in that as well as finding an overall match they can also produce
+ sub-expression matches: each sub-expression being delimited in the pattern by a
+ pair of parentheses (...). There has to be some method for reporting
+ sub-expression matches back to the user: this is achieved by defining a
+ class match_results that acts as an
+ indexed collection of sub-expression matches, each sub-expression match being
+ contained in an object of type sub_match.
+
+ Objects of type sub_match may only be obtained by subscripting an object
+ of type match_results.
+
+ When the marked sub-expression denoted by an object of type sub_match<>
+ participated in a regular expression match, then member matched evaluates
+ to true, and members first and second denote the
+ range of characters [first,second) which formed that match.
+ Otherwise matched is false, and members first and second
+ contain undefined values.
+ If an object of type sub_match<> represents sub-expression 0
+ (that is to say, the whole match), then member matched is always
+ true, unless a partial match was obtained as a result of the flag match_partial
+ being passed to a regular expression algorithm, in which case member matched
+ is false, and members first and second represent the
+ character range that formed the partial match.
+
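The matched / first / second semantics are illustrated below with C++11 std::sub_match, used as a stand-in for the Boost class described here. The pattern and helper name are illustrative assumptions. Because the match results hold iterators into the text, the string passed in must outlive them.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Matches "year-month" with an optional trailing "x"; sub-expression 3
// only participates when the "x" is present. `text` must outlive `m`,
// because `m` stores iterators into it.
bool parse_date(const std::string& text, std::smatch& m) {
    static const std::regex re("(\\d+)-(\\d+)(x)?");
    return std::regex_match(text, m, re);
}
```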
+namespace boost{
+
+template <class BidirectionalIterator>
+class sub_match : public std::pair<BidirectionalIterator, BidirectionalIterator>
+{
+public:
+ typedef typename iterator_traits<BidirectionalIterator>::value_type value_type;
+ typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type;
+ typedef BidirectionalIterator iterator;
+
+ bool matched;
+
+ difference_type length()const;
+ operator basic_string<value_type>()const;
+ basic_string<value_type> str()const;
+
+ int compare(const sub_match& s)const;
+ int compare(const basic_string<value_type>& s)const;
+ int compare(const value_type* s)const;
+};
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+template <class BidirectionalIterator, class traits, class Allocator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const std::basic_string<iterator_traits<BidirectionalIterator>::value_type, traits, Allocator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+template <class charT, class traits, class BidirectionalIterator>
+basic_ostream<charT, traits>&
+ operator << (basic_ostream<charT, traits>& os,
+ const sub_match<BidirectionalIterator>& m);
+
+} // namespace boost
+ Description
+
+ sub_match members
+ typedef typename std::iterator_traits<iterator>::value_type value_type;
+ The type pointed to by the iterators.
+ typedef typename std::iterator_traits<iterator>::difference_type difference_type;
+ A type that represents the difference between two iterators.
+ typedef BidirectionalIterator iterator;
+ The iterator type.
+ iterator first
+ An iterator denoting the position of the start of the match.
+ iterator second
+ An iterator denoting the position of the end of the match.
+ bool matched
+ A Boolean value denoting whether this sub-expression participated in the match.
+ difference_type length()const;
+
+
+ Effects: returns (matched ? distance(first, second) : 0)
.
operator basic_string<value_type>()const;
+
+
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>()).
basic_string<value_type> str()const;
+
+
+ Effects: returns (matched ? basic_string<value_type>(first,
+ second) : basic_string<value_type>())
.
int compare(const sub_match& s)const;
+
+
+ Effects: returns str().compare(s.str())
.
int compare(const basic_string<value_type>& s)const;
+
+
+ Effects: returns str().compare(s)
.
int compare(const value_type* s)const;
+
+
+ Effects: returns str().compare(s)
.
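The compare() members can be exercised with C++11 std::sub_match, used as a stand-in here because its compare() members likewise compare against str(). compare_words is an illustrative helper. It normalizes the result to -1, 0 or 1 and returns 2 when the input is not two space-separated words.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Compares the first word of `text` with the second using
// sub_match::compare; returns -1, 0 or 1, or 2 if `text` is not
// exactly two words separated by one space.
int compare_words(const std::string& text) {
    static const std::regex re("(\\w+) (\\w+)");
    std::smatch m;
    if (!std::regex_match(text, m, re)) return 2;
    const int r = m[1].compare(m[2]);   // same sign as str().compare(...)
    return (r > 0) - (r < 0);
}
```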
+
+ sub_match non-member operators
+ template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) == 0
.
template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) != 0
.
template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) < 0
.
template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) <= 0
.
template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) >= 0
.
template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs.compare(rhs) > 0
.
template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs == rhs.str()
.
template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs != rhs.str()
.
template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs < rhs.str()
.
template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs > rhs.str()
.
template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs >= rhs.str()
.
template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const* lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs <= rhs.str()
.
template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() == rhs
.
template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() != rhs
.
template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() < rhs
.
template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() > rhs
.
template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() >= rhs
.
template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const* rhs);
+
+
+ Effects: returns lhs.str() <= rhs
.
template <class BidirectionalIterator>
+bool operator == (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs == rhs.str()
.
template <class BidirectionalIterator>
+bool operator != (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs != rhs.str()
.
template <class BidirectionalIterator>
+bool operator < (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs < rhs.str()
.
template <class BidirectionalIterator>
+bool operator > (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs > rhs.str()
.
template <class BidirectionalIterator>
+bool operator >= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs >= rhs.str()
.
template <class BidirectionalIterator>
+bool operator <= (typename iterator_traits<BidirectionalIterator>::value_type const& lhs,
+ const sub_match<BidirectionalIterator>& rhs);
+
+
+ Effects: returns lhs <= rhs.str()
.
template <class BidirectionalIterator>
+bool operator == (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() == rhs
.
template <class BidirectionalIterator>
+bool operator != (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() != rhs
.
template <class BidirectionalIterator>
+bool operator < (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() < rhs
.
template <class BidirectionalIterator>
+bool operator > (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() > rhs
.
template <class BidirectionalIterator>
+bool operator >= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() >= rhs
.
template <class BidirectionalIterator>
+bool operator <= (const sub_match<BidirectionalIterator>& lhs,
+ typename iterator_traits<BidirectionalIterator>::value_type const& rhs);
+
+
+ Effects: returns lhs.str() <= rhs
.
template <class charT, class traits, class BidirectionalIterator>
+basic_ostream<charT, traits>&
+ operator << (basic_ostream<charT, traits>& os,
+ const sub_match<BidirectionalIterator>& m);
+
+
+ Effects: returns (os << m.str()).
+
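Because every non-member operator is defined in terms of str(), a sub_match can be compared directly with a C string or streamed without an explicit copy. A minimal sketch, again assuming C++11 std::regex as a stand-in for boost::regex; is_https and host_part are hypothetical helpers.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: compares the scheme sub-expression with a
// const char*, which behaves as if comparing m[1].str() == "https".
bool is_https(const std::string& url)
{
    static const std::regex re("([a-z]+)://.*");
    std::smatch m;
    if (!std::regex_match(url, m, re))
        return false;
    return m[1] == "https";
}

// Hypothetical helper: os << m[1] writes the same text as os << m[1].str().
std::string host_part(const std::string& url)
{
    static const std::regex re("[a-z]+://([^/]+).*");
    std::smatch m;
    std::ostringstream os;
    if (std::regex_match(url, m, re))
        os << m[1];
    return os.str();
}
```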
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998- 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/syntax.html b/doc/syntax.html
new file mode 100644
index 00000000..f776cd3c
--- /dev/null
+++ b/doc/syntax.html
@@ -0,0 +1,773 @@
+
+
+
+ Boost.Regex: Regular Expression Syntax
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Regular Expression Syntax
+
+
+
+
+
+
+
+
+ This section covers the regular expression syntax used by this library. It is
+ a programmer's guide: the actual syntax presented to your program's users will
+ depend upon the flags used during expression compilation.
+
+ Literals
+
+ All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{",
+ "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by
+ a "\". A literal is a character that matches itself, or matches the result of
+ traits_type::translate(), where traits_type is the traits template parameter to
+ class basic_regex.
+ Wildcard
+
+ The dot character "." matches any single character, with two exceptions: when
+ match_not_dot_null is passed to the matching algorithms, the dot does not match
+ a null character, and when match_not_dot_newline is passed to the matching
+ algorithms, the dot does not match a newline character.
+
+ Repeats
+
+ A repeat is an expression that is repeated an arbitrary number of times. An
+ expression followed by "*" can be repeated any number of times, including zero.
+ An expression followed by "+" can be repeated any number of times, but at least
+ once; if the expression is compiled with the flag regex_constants::bk_plus_qm
+ then "+" is an ordinary character and "\+" represents a repeat of once or more.
+ An expression followed by "?" may be repeated zero or one times only; if the
+ expression is compiled with the flag regex_constants::bk_plus_qm then "?" is an
+ ordinary character and "\?" represents the repeat zero or once operator. When
+ it is necessary to specify the minimum and maximum number of repeats
+ explicitly, the bounds operator "{}" may be used: thus "a{2}" is the letter "a"
+ repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2
+ and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with
+ no upper limit. Note that there must be no white-space inside the {}, and there
+ is no upper limit on the values of the lower and upper bounds. When the
+ expression is compiled with the flag regex_constants::bk_braces then "{" and
+ "}" are ordinary characters and "\{" and "\}" are used to delimit bounds
+ instead. All repeat expressions refer to the shortest possible previous
+ sub-expression: a single character, a character set, or a sub-expression
+ grouped with "()", for example.
+
+ Examples:
+
+ "ba*" will match all of "b", "ba", "baaa" etc.
+
+ "ba+" will match "ba" or "baaaa" for example but not "b".
+
+ "ba?" will match "b" or "ba".
+
+ "ba{2,4}" will match "baa", "baaa" and "baaaa".
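The repeat examples can be checked mechanically. This sketch assumes C++11 std::regex as a stand-in for boost::regex (its default ECMAScript grammar treats "*", "+", "?" and "{m,n}" the same way for these patterns); full_match is a hypothetical helper that requires the whole string to match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
// Sketch only: std::regex stands in for boost::regex here.
bool full_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

Because a repeat binds to the shortest previous sub-expression, full_match("ba*", "baba") is false while full_match("(ba)*", "baba") is true.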
+
+ Non-greedy repeats
+
+ Whenever the "extended" regular expression syntax is in use (the default),
+ non-greedy repeats are possible by appending a '?' after the repeat; a
+ non-greedy repeat is one which will match the shortest possible string.
+
+ For example to match html tag pairs one could use something like:
+
+ "<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
+
+ In this case $1 will contain the text between the tag pairs, and will be the
+ shortest possible matching string.
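The difference between the greedy and non-greedy forms of the tag-pair expression shows up as soon as the input contains two pairs. A minimal sketch, assuming C++11 std::regex as a stand-in for boost::regex and using the tag "b" in place of tagname; first_bold is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns $1 for the first <b>...</b> pair, using a
// non-greedy (.*?) or a greedy (.*) repeat for the tag contents.
std::string first_bold(const std::string& html, bool non_greedy)
{
    const std::regex re(non_greedy
        ? "<\\s*b[^>]*>(.*?)<\\s*/b\\s*>"
        : "<\\s*b[^>]*>(.*)<\\s*/b\\s*>");
    std::smatch m;
    return std::regex_search(html, m, re) ? m[1].str() : std::string();
}
```

On "&lt;b&gt;one&lt;/b&gt; and &lt;b&gt;two&lt;/b&gt;" the non-greedy form yields "one", while the greedy form runs on to the last closing tag.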
+
+ Parentheses
+
+ Parentheses serve two purposes: to group items together into a sub-expression,
+ and to mark what generated the match. For example the expression "(ab)*" would
+ match all of the string "ababab". The matching algorithms
+ regex_match and regex_search
+ each take an instance of match_results
+ that reports what caused the match; on exit from these functions the
+ match_results contains information both on what the whole expression
+ matched and on what each sub-expression matched. In the example above
+ match_results[1] would contain a pair of iterators denoting the final "ab" of
+ the matching string. It is permissible for sub-expressions to match null
+ strings. If a sub-expression takes no part in a match (for example, if it is
+ part of an alternative that is not taken) then both of the iterators that are
+ returned for that sub-expression point to the end of the input string, and the matched
+ member for that sub-expression is false. Sub-expressions are indexed
+ from left to right starting from 1; sub-expression 0 is the whole expression.
+
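The "(ab)*" example can be made concrete by asking where sub-expression 1 starts after a successful match. This sketch assumes C++11 std::regex as a stand-in for boost::regex (both report the last iteration of a repeated group); last_ab_offset is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches "(ab)*" against the whole of `text` and
// returns the offset at which sub-expression 1 starts, or -1 if there is
// no match or the group never participated (zero repetitions).
long last_ab_offset(const std::string& text)
{
    static const std::regex re("(ab)*");
    std::smatch m;
    if (!std::regex_match(text, m, re) || !m[1].matched)
        return -1;
    // m[1] covers only the final "ab"; m[0] covers the whole match.
    return static_cast<long>(m[1].first - text.begin());
}
```

For "ababab" the result is 4, the start of the final "ab". The empty string matches with zero repetitions, so group 1 does not participate.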
+ Non-Marking Parentheses
+
+ Sometimes you need to group sub-expressions with parentheses, but don't want
+ the parentheses to produce another marked sub-expression; in this case a
+ non-marking parenthesis (?:expression) can be used. For example the following
+ expression creates no sub-expressions:
+
+ "(?:abc)*"
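One way to confirm that a group is non-marking is to count the marked sub-expressions of the compiled expression. The sketch below assumes C++11 std::regex, whose mark_count() member reports that count; marked is a hypothetical helper.

```cpp
#include <regex>

// Hypothetical helper: number of marked sub-expressions in `pattern`.
// Sketch only: std::regex stands in for boost::regex.
unsigned marked(const char* pattern)
{
    return std::regex(pattern).mark_count();
}
```

"(?:abc)*" reports no marked sub-expressions, while "(abc)*" reports one.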
+ Forward Lookahead Asserts
+
+ There are two forms of these: one for positive forward lookahead asserts, and
+ one for negative lookahead asserts:
+ "(?=abc)" matches the null string only if it is followed by the expression
+ "abc".
+ "(?!abc)" matches the null string only if it is not followed by the
+ expression "abc".
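The two lookahead forms described above can be sketched as follows, assuming C++11 std::regex as a stand-in (its ECMAScript grammar supports both "(?=...)" and "(?!...)"). The helpers are hypothetical. Note that the assert itself consumes no characters, so $0 contains only "foo".

```cpp
#include <regex>
#include <string>

// Hypothetical helper: positive lookahead, "foo" only when followed by "bar".
std::string foo_before_bar(const std::string& text)
{
    static const std::regex re("foo(?=bar)");
    std::smatch m;
    return std::regex_search(text, m, re) ? m[0].str() : std::string();
}

// Hypothetical helper: negative lookahead, true if some "foo" is not
// followed by "bar".
bool has_foo_not_before_bar(const std::string& text)
{
    static const std::regex re("foo(?!bar)");
    return std::regex_search(text, re);
}
```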
+ Independent sub-expressions
+ "(?>expression)" matches "expression" as an independent atom (the algorithm
+ will not backtrack into it if a failure occurs later in the expression).
+ Alternatives
+
+ Alternatives occur when the expression can match either one sub-expression or
+ another. Alternatives are separated by a "|", or by a "\|" if the flag
+ regex_constants::bk_vbar is set, or by a newline character if the flag
+ regex_constants::newline_alt is set. Each alternative is the largest possible
+ previous sub-expression; this is the opposite behavior from repetition
+ operators.
+
+ Examples:
+
+ "a(b|c)" could match "ab" or "ac".
+
+ "abc|def" could match "abc" or "def".
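Because each alternative extends as far as possible, "abc|def" means "abc" or "def", not "ab", then "c" or "d", then "ef". A sketch assuming C++11 std::regex as a stand-in for boost::regex; matches is a hypothetical whole-string helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}
```

To alternate on a single character in the middle, the alternatives must be grouped explicitly, as in "ab(c|d)ef".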
+
+ Sets
+
+ A set matches any single character that is a
+ member of the set. Sets are delimited by "[" and "]" and can contain literals,
+ character ranges, character classes, collating elements and equivalence
+ classes. Set declarations that start with "^" contain the complement of the
+ elements that follow.
+
+ Examples:
+
+ Character literals:
+
+ "[abc]" will match either of "a", "b", or "c".
+
+ "[^abc]" will match any character other than "a", "b", or "c".
+
+ Character ranges:
+
+ "[a-z]" will match any character in the range "a" to "z".
+
+ "[^A-Z]" will match any character other than those in the range "A" to "Z".
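Since a set matches exactly one character, counting its matches in a string counts the characters that belong to the set. A minimal sketch, assuming C++11 std::regex and std::sregex_iterator as stand-ins for their Boost counterparts, with the default (non-collating) ranges; count_matches is a hypothetical helper.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: number of non-overlapping matches of `pattern`
// in `text`; for a single set this is the count of member characters.
std::ptrdiff_t count_matches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```

For "Hello World", "[a-z]" finds the 8 lower case letters, and "[^A-Z]" finds the 9 characters that are not upper case (including the space).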
+
+ Note that character ranges are highly locale dependent if the flag
+ regex_constants::collate is set: they match any character that collates between
+ the endpoints of the range, so ranges will only behave according to ASCII rules
+ when the default "C" locale is in effect. For example if the library is
+ compiled with the Win32 localization model, then [a-z] will match the ASCII
+ characters a-z, and also 'A', 'B' etc, but not 'Z' which collates just after
+ 'z'. This locale-specific behavior is disabled by default (in perl mode), in
+ which case ranges compare characters by their ASCII character code.
+
+ Character classes are denoted using the syntax "[:classname:]" within a set
+ declaration, for example "[[:space:]]" is the set of all whitespace characters.
+ Character classes are only available if the flag regex_constants::char_classes
+ is set. The available character classes are:
+
+
+
+
+
+
+
+ alnum
+ Any alphanumeric character.
+
+
+
+
+ alpha
+ Any alphabetical character a-z and A-Z. Other
+ characters may also be included depending upon the locale.
+
+
+
+
+ blank
+ Any blank character, either a space or a tab.
+
+
+
+
+ cntrl
+ Any control character.
+
+
+
+
+ digit
+ Any digit 0-9.
+
+
+
+
+ graph
+ Any graphical character.
+
+
+
+
+ lower
+ Any lower case character a-z. Other characters may
+ also be included depending upon the locale.
+
+
+
+
+ print
+ Any printable character.
+
+
+
+
+ punct
+ Any punctuation character.
+
+
+
+
+ space
+ Any whitespace character.
+
+
+
+
+ upper
+ Any upper case character A-Z. Other characters may
+ also be included depending upon the locale.
+
+
+
+
+ xdigit
+ Any hexadecimal digit character, 0-9, a-f and A-F.
+
+
+
+
+ word
+ Any word character - all alphanumeric characters plus
+ the underscore.
+
+
+
+
+ Unicode
+ Any character whose code is greater than 255, this
+ applies to the wide character traits classes only.
+
+
+
+
+ There are some shortcuts that can be used in place of the character classes;
+ provided the flag regex_constants::escape_in_lists is set, you can use:
+
+ \w in place of [:word:]
+
+ \s in place of [:space:]
+
+ \d in place of [:digit:]
+
+ \l in place of [:lower:]
+
+ \u in place of [:upper:]
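The equivalence between a named class and its shortcut can be sketched as below. This assumes C++11 std::regex, whose ECMAScript grammar accepts both "[[:digit:]]" and "\d" and always allows escapes such as "\s" inside a set (so it behaves as if escape_in_lists were set). The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers: [[:digit:]] and \d name the same class.
bool all_digits_class(const std::string& s)
{
    return std::regex_match(s, std::regex("[[:digit:]]+"));
}

bool all_digits_escape(const std::string& s)
{
    return std::regex_match(s, std::regex("\\d+"));
}

// A shortcut and a named class can be combined inside one set.
bool digits_and_spaces(const std::string& s)
{
    return std::regex_match(s, std::regex("[\\s[:digit:]]+"));
}
```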
+
+ Collating elements take the general form [.tagname.] inside a set declaration,
+ where tagname is either a single character, or a name of a collating
+ element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is
+ equivalent to [,]. The library supports all the standard POSIX collating
+ element names, and in addition the following digraphs: "ae", "ch", "ll", "ss",
+ "nj", "dz", "lj", each in lower, upper and title case variations.
+ Multi-character collating elements can result in the set matching more than one
+ character, for example [[.ae.]] would match two characters, but note that
+ [^[.ae.]] would only match one character.
+
+
+ Equivalence classes take the general form [=tagname=] inside a set declaration,
+ where tagname is either a single character, or a name of a collating
+ element, and match any character that is a member of the same primary
+ equivalence class as the collating element [.tagname.]. An equivalence class is
+ a set of characters that collate the same; a primary equivalence class is a set
+ of characters whose primary sort keys are all the same (for example, strings are
+ typically collated by character, then by accent, and then by case; the primary
+ sort key then relates to the character, the secondary to the accent, and
+ the tertiary to the case). If there is no equivalence class corresponding to tagname,
+ then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
+ locale independent method of obtaining the primary sort key for a character,
+ except under Win32. For other operating systems the library will "guess" the
+ primary sort key from the full sort key (obtained from strxfrm), so
+ equivalence classes are probably best considered broken under any operating
+ system other than Win32.
+
+ To include a literal "-" in a set declaration, either make it the first character
+ after the opening "[" or "[^", the endpoint of a range, or a collating element, or,
+ if the flag regex_constants::escape_in_lists is set, precede it with an escape
+ character as in "[\-]". To include a literal "[" or "]" or "^" in a set,
+ make it the endpoint of a range or a collating element, or precede it with an
+ escape character if the flag regex_constants::escape_in_lists is set.
+
+ Line anchors
+
+ An anchor is something that matches the null string at the start or end of a
+ line: "^" matches the null string at the start of a line, "$" matches the null
+ string at the end of a line.
+
+ Back references
+
+ A back reference is a reference to a previous sub-expression that has already
+ been matched; the reference is to what the sub-expression matched, not to the
+ expression itself. A back reference consists of the escape character "\"
+ followed by a digit "1" to "9": "\1" refers to the first sub-expression, "\2"
+ to the second, etc. For example the expression "(.*)\1" matches any string that
+ is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back
+ reference to a sub-expression that did not participate in any match matches
+ the null string: note that this is different from some other regular expression
+ matchers. Back references are only available if the expression is compiled with
+ the flag regex_constants::bk_refs set.
+
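The "(.*)\1" example translates directly into a whole-string test. A minimal sketch, assuming C++11 std::regex as a stand-in for boost::regex (back references are enabled by default in its ECMAScript grammar); is_doubled is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is some text immediately repeated,
// because \1 must match exactly what (.*) matched.
bool is_doubled(const std::string& s)
{
    static const std::regex re("(.*)\\1");
    return std::regex_match(s, re);
}
```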
+ Characters by code
+
+ This is an extension to the algorithm that is not available in other libraries:
+ it consists of the escape character followed by the digit "0" followed by the
+ octal character code. For example "\023" represents the character whose octal
+ code is 23. Where ambiguity could occur, use parentheses to break the expression
+ up: "\0103" represents the character whose octal code is 103, and "(\010)3"
+ represents the character with octal code 10 followed by "3". To match characters
+ by their hexadecimal code, use \x followed by a string of hexadecimal digits,
+ optionally enclosed inside {}, for example \xf0 or \x{aff}; notice that the
+ latter example is a Unicode character.
+ Word operators
+
+ The following operators are provided for compatibility with the GNU regular
+ expression library.
+
+ "\w" matches any single character that is a member of the "word" character
+ class, this is identical to the expression "[[:word:]]".
+
+ "\W" matches any single character that is not a member of the "word" character
+ class, this is identical to the expression "[^[:word:]]".
+
+ "\<" matches the null string at the start of a word.
+
+ "\>" matches the null string at the end of a word.
+
+ "\b" matches the null string at either the start or the end of a word.
+
+ "\B" matches a null string within a word.
+
+ The start of the sequence passed to the matching algorithms is considered to be
+ a potential start of a word unless the flag match_not_bow is set. The end of
+ the sequence passed to the matching algorithms is considered to be a potential
+ end of a word unless the flag match_not_eow is set.
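The "\b" and "\B" operators can be sketched with C++11 std::regex as a stand-in. The GNU-style "\&lt;" and "\&gt;" are not part of its ECMAScript grammar, so only "\b" and "\B" are shown. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: "cat" as a whole word only.
bool has_word_cat(const std::string& text)
{
    static const std::regex re("\\bcat\\b");
    return std::regex_search(text, re);
}

// Hypothetical helper: "cat" that does not begin at a word boundary.
bool has_cat_inside_word(const std::string& text)
{
    static const std::regex re("\\Bcat");
    return std::regex_search(text, re);
}
```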
+
+ Buffer operators
+
+ The following operators are provided for compatibility with the GNU regular
+ expression library, and Perl regular expressions:
+
+ "\`" matches the start of a buffer.
+
+ "\A" matches the start of the buffer.
+
+ "\'" matches the end of a buffer.
+
+ "\z" matches the end of a buffer.
+
+ "\Z" matches the end of a buffer, or possibly one or more new line characters
+ followed by the end of the buffer.
+
+ A buffer is considered to consist of the whole sequence passed to the matching
+ algorithms, unless the flags match_not_bob or match_not_eob are set.
+
+ Escape operator
+
+ The escape character "\" has several meanings.
+
+ Inside a set declaration the escape character is a normal character unless the
+ flag regex_constants::escape_in_lists is set in which case whatever follows the
+ escape is a literal character regardless of its normal meaning.
+
+ The escape operator may introduce an operator for example: back references, or
+ a word operator.
+
+ The escape operator may make the following character normal, for example "\*"
+ represents a literal "*" rather than the repeat operator.
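In C++ source the escape character itself has to survive the string literal, which is easy to get wrong. A sketch assuming C++11 std::regex as a stand-in for boost::regex: a raw string literal R"(...)" passes the backslash through unchanged. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: "\*" is a literal star, so this matches only "2*3".
bool literal_star(const std::string& s)
{
    static const std::regex re(R"(2\*3)");   // same as "2\\*3"
    return std::regex_match(s, re);
}

// Hypothetical helper: an unescaped "*" repeats the preceding "2".
bool repeated_two(const std::string& s)
{
    static const std::regex re("2*3");
    return std::regex_match(s, re);
}
```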
+
+ Single character escape sequences
+
+ The following escape sequences are aliases for single characters:
+
+
+
+
+
+
+
+ Escape sequence
+
+ Character code
+
+ Meaning
+
+
+
+
+
+ \a
+
+ 0x07
+
+ Bell character.
+
+
+
+
+
+ \f
+
+ 0x0C
+
+ Form feed.
+
+
+
+
+
+ \n
+
+ 0x0A
+
+ Newline character.
+
+
+
+
+
+ \r
+
+ 0x0D
+
+ Carriage return.
+
+
+
+
+
+ \t
+
+ 0x09
+
+ Tab character.
+
+
+
+
+
+ \v
+
+ 0x0B
+
+ Vertical tab.
+
+
+
+
+
+ \e
+
+ 0x1B
+
+ ASCII Escape character.
+
+
+
+
+
+ \0dd
+
+ 0dd
+
+ An octal character code, where dd is one or
+ more octal digits.
+
+
+
+
+
+ \xXX
+
+ 0xXX
+
+ A hexadecimal character code, where XX is one or more
+ hexadecimal digits.
+
+
+
+
+
+ \x{XX}
+
+ 0xXX
+
+ A hexadecimal character code, where XX is one or more
+ hexadecimal digits, optionally a Unicode character.
+
+
+
+
+
+ \cZ
+
+ Z - '@'
+
+ An ASCII escape sequence control-Z, where Z is any
+ ASCII character greater than or equal to the character code for '@'.
+
+
+
+
+
+ Miscellaneous escape sequences:
+
+ The following are provided mostly for perl compatibility, but note that there
+ are some differences in the meanings of \l \L \u and \U:
+
+
+
+
+
+
+
+ \w
+
+ Equivalent to [[:word:]].
+
+
+
+
+
+ \W
+
+ Equivalent to [^[:word:]].
+
+
+
+
+
+ \s
+
+ Equivalent to [[:space:]].
+
+
+
+
+
+ \S
+
+ Equivalent to [^[:space:]].
+
+
+
+
+
+ \d
+
+ Equivalent to [[:digit:]].
+
+
+
+
+
+ \D
+
+ Equivalent to [^[:digit:]].
+
+
+
+
+
+ \l
+
+ Equivalent to [[:lower:]].
+
+
+
+
+
+ \L
+
+ Equivalent to [^[:lower:]].
+
+
+
+
+
+ \u
+
+ Equivalent to [[:upper:]].
+
+
+
+
+
+ \U
+
+ Equivalent to [^[:upper:]].
+
+
+
+
+
+ \C
+
+ Any single character, equivalent to '.'.
+
+
+
+
+
+ \X
+
+ Match any Unicode combining character sequence, for
+ example "a\x{0301}" (a letter a with an acute).
+
+
+
+
+
+ \Q
+
+ The begin quote operator, everything that follows is
+ treated as a literal character until a \E end quote operator is found.
+
+
+
+
+
+ \E
+
+ The end quote operator, terminates a sequence begun
+ with \Q.
+
+
+
+
+
+ What gets matched?
+
+
+ When the expression is compiled as a Perl-compatible regex, the matching
+ algorithms will perform a depth-first search on the state machine and report
+ the first match found.
+
+ When the expression is compiled as a POSIX-compatible regex, the matching
+ algorithms will match the first possible matching string; if more than one
+ string starting at a given location can match, then it matches the longest
+ possible string, unless the flag match_any is set, in which case the first
+ match encountered is returned. Use of the match_any option can reduce the time
+ taken to find the match, but is only useful if the user is less concerned
+ about what matched; for example, it would not be suitable for search and
+ replace operations. In cases where there are multiple possible matches all
+ starting at the same location, and all of the same length, the match
+ chosen is the one with the longest first sub-expression; if that is the same
+ for two or more matches, then the second sub-expression will be examined, and so
+ on.
+
+ The following examples illustrate the main differences between Perl and
+ POSIX regular expression matching rules:
+
+
+
+
+
+
+ Expression
+
+
+ Text
+
+
+ POSIX leftmost longest match
+
+
+ ECMAScript depth first search match
+
+
+
+
+ a|ab
+
+
+
+ xaby
+
+
+
+
+ "ab"
+
+
+ "a"
+
+
+
+
+ .*([[:alnum:]]+).*
+
+
+ " abc def xyz "
+
+ $0 = " abc def xyz "
+ $1 = "abc"
+
+
+ $0 = " abc def xyz "
+ $1 = "z"
+
+
+
+
+
+ .*(a|xayy)
+
+
+ zzxayyzz
+
+
+ "zzxayy"
+
+ "zzxa"
+
+
+
+ These differences between Perl matching rules, and POSIX matching rules, mean
+ that these two regular expression syntaxes differ not only in the features
+ offered, but also in the form that the state machine takes and/or the
+ algorithms used to traverse the state machine.
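The first row of the table can be reproduced by compiling the same pattern under two grammars. This sketch assumes C++11 std::regex, which offers both an ECMAScript (depth-first) grammar and a POSIX extended (leftmost-longest) grammar; it stands in for Boost.Regex's perl and POSIX modes rather than being that library's API. first_match is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: $0 of the first match of "a|ab" in `text` under the
// given grammar flag.
std::string first_match(const std::string& text,
                        std::regex_constants::syntax_option_type grammar)
{
    const std::regex re("a|ab", grammar);
    std::smatch m;
    return std::regex_search(text, m, re) ? m[0].str() : std::string();
}
```

Both grammars start matching at the same leftmost position; only the choice among matches starting there differs.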
+
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998- 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/syntax_option_type.html b/doc/syntax_option_type.html
new file mode 100644
index 00000000..532d6386
--- /dev/null
+++ b/doc/syntax_option_type.html
@@ -0,0 +1,332 @@
+
+
+
+ Boost.Regex: syntax_option_type
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ syntax_option_type
+
+
+
+
+
+
+
+
+ Synopsis
+ Type syntax_option_type is an implementation defined bitmask type that controls
+ how a regular expression string is to be interpreted. For convenience,
+ note that all the constants listed here are also duplicated within the scope
+ of class template basic_regex.
+ namespace std{ namespace regex_constants{
+
+typedef bitmask_type syntax_option_type;
+// these flags are standardized:
+static const syntax_option_type normal;
+static const syntax_option_type icase;
+static const syntax_option_type nosubs;
+static const syntax_option_type optimize;
+static const syntax_option_type collate;
+static const syntax_option_type ECMAScript = normal;
+static const syntax_option_type JavaScript = normal;
+static const syntax_option_type JScript = normal;
+static const syntax_option_type basic;
+static const syntax_option_type extended;
+static const syntax_option_type awk;
+static const syntax_option_type grep;
+static const syntax_option_type egrep;
+static const syntax_option_type sed = basic;
+static const syntax_option_type perl;
+// these are boost.regex specific:
+static const syntax_option_type escape_in_lists;
+static const syntax_option_type char_classes;
+static const syntax_option_type intervals;
+static const syntax_option_type limited_ops;
+static const syntax_option_type newline_alt;
+static const syntax_option_type bk_plus_qm;
+static const syntax_option_type bk_braces;
+static const syntax_option_type bk_parens;
+static const syntax_option_type bk_refs;
+static const syntax_option_type bk_vbar;
+static const syntax_option_type use_except;
+static const syntax_option_type failbit;
+static const syntax_option_type literal;
+static const syntax_option_type nocollate;
+static const syntax_option_type perlex;
+static const syntax_option_type emacs;
+} // namespace regex_constants
+} // namespace std
+ Description
+ The type syntax_option_type is an implementation defined bitmask
+ type (17.3.2.1.2). Setting its elements has the effects listed in the table
+ below; a valid value of type syntax_option_type will always have
+ exactly one of the elements normal, basic, extended, awk, grep, egrep, sed
+ or perl set.
+ Note that for convenience all the constants listed here are duplicated within
+ the scope of class template basic_regex, so you can use any of:
+ boost::regex_constants::constant_name
+ or
+ boost::regex::constant_name
+ or
+ boost::wregex::constant_name
+ in an interchangeable manner.
+
+
+
+
+ Element
+
+
+ Effect if set
+
+
+
+
+ normal
+
+
+ Specifies that the grammar recognized by the regular expression engine uses its
+ normal semantics: that is the same as that given in the ECMA-262, ECMAScript
+ Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects
+ (FWD.1).
+ boost.regex also recognizes most perl-compatible extensions in this mode.
+
+
+
+
+ icase
+
+
+ Specifies that matching of regular expressions against a character container
+ sequence shall be performed without regard to case.
+
+
+
+
+ nosubs
+
+
+ Specifies that when a regular expression is matched against a character
+ container sequence, then no sub-expression matches are to be stored in the
+ supplied match_results structure.
+
+
+
+
+ optimize
+
+
+ Specifies that the regular expression engine should pay more attention to the
+ speed with which regular expressions are matched, and less to the speed with
+ which regular expression objects are constructed. Otherwise it has no
+ detectable effect on the program output. This currently has no effect for
+ boost.regex.
+
+
+
+
+ collate
+
+
+ Specifies that character ranges of the form "[a-b]" should be locale sensitive.
+
+
+
+
+ ECMAScript
+
+
+ The same as normal.
+
+
+
+
+ JavaScript
+
+
+ The same as normal.
+
+
+
+
+ JScript
+
+
+ The same as normal.
+
+
+
+
+ basic
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001,
+ Portable Operating System Interface (POSIX ), Base Definitions and Headers,
+ Section 9, Regular Expressions (FWD.1).
+
+
+
+
+
+ extended
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX extended regular expressions in IEEE Std
+ 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and
+ Headers, Section 9, Regular Expressions (FWD.1).
+
+
+
+
+ awk
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable
+ Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk
+ (FWD.1).
+ That is to say: the same as POSIX extended syntax, but with escape sequences in
+ character classes permitted.
+
+
+
+
+ grep
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable
+ Operating System Interface (POSIX ), Shells and Utilities, Section 4,
+ Utilities, grep (FWD.1).
+ That is to say, the same as POSIX basic syntax, but with the newline character
+ acting as an alternation character in addition to "|".
+
+
+
+
+ egrep
+
+
+ Specifies that the grammar recognized by the regular expression engine is the
+ same as that used by POSIX utility grep when given the -E option in IEEE Std
+ 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and
+ Utilities, Section 4, Utilities, grep (FWD.1).
+ That is to say, the same as POSIX extended syntax, but with the newline
+ character acting as an alternation character in addition to "|".
+
+
+
+
+ sed
+
+
+ The same as basic.
+
+
+
+
+ perl
+
+
+ The same as normal.
+
+
+
+
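Since syntax_option_type is a bitmask, a grammar element and modifiers such as icase or nosubs are combined with "|". A minimal sketch using C++11 std::regex, which provides standard-named constants such as std::regex::icase and std::regex::nosubs in the scope of the class, as described above for basic_regex. The Boost-specific flags in the next table have no std equivalent and are not shown. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: icase matches without regard to case.
bool matches_icase(const std::string& s)
{
    static const std::regex re("boost", std::regex::icase);
    return std::regex_match(s, re);
}

// Hypothetical helper: one grammar element (ECMAScript) combined with
// nosubs, so the parentheses only group and no sub-expressions are marked.
unsigned marked_with_nosubs()
{
    const std::regex re("(a)(b)", std::regex::ECMAScript | std::regex::nosubs);
    return re.mark_count();
}
```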
+ The following constants are specific to this particular regular expression
+ implementation and do not appear in the
+ regular expression standardization proposal :
+
+
+
+ regbase::escape_in_lists
+ Allows the use of the escape "\" character in sets of
+ characters, for example [\]] represents the set of characters containing only
+ "]". If this flag is not set then "\" is an ordinary character inside sets.
+
+
+ regbase::char_classes
+ When this bit is set, character classes [:classname:]
+ are allowed inside character set declarations, for example "[[:word:]]"
+ represents the set of all characters that belong to the character class "word".
+
+
+ regbase::intervals
+ When this bit is set, repetition intervals are
+ allowed, for example "a{2,4}" represents a repeat of between 2 and 4 letter
+ a's.
+
+
+ regbase::limited_ops
+ When this bit is set, all of "+", "?" and "|" are
+ ordinary characters in all situations.
+
+
+ regbase::newline_alt
+ When this bit is set, then the newline character "\n"
+ has the same effect as the alternation operator "|".
+
+
+ regbase::bk_plus_qm
+ When this bit is set then "\+" represents the one or
+ more repetition operator and "\?" represents the zero or one repetition
+ operator. When this bit is not set then "+" and "?" are used instead.
+
+
+ regbase::bk_braces
+ When this bit is set then "\{" and "\}" are used for
+ bounded repetitions and "{" and "}" are normal characters. This is the opposite
+ of the default behavior.
+
+
+ regbase::bk_parens
+ When this bit is set then "\(" and "\)" are used to
+ group sub-expressions and "(" and ")" are ordinary characters. This is the
+ opposite of the default behavior.
+
+
+ regbase::bk_refs
+ When this bit is set then back references are
+ allowed.
+
+
+ regbase::bk_vbar
+ When this bit is set then "\|" represents the
+ alternation operator and "|" is an ordinary character. This is the opposite of
+ the default behavior.
+
+
+ regbase::use_except
+ When this bit is set then a bad_expression
+ exception will be thrown on error. Use of this flag is deprecated;
+ basic_regex always throws on error.
+
+
+ regbase::failbit
+ This bit is set on error. If regbase::use_except is
+ not set, check this bit to determine whether a regular expression is
+ valid before using it.
+
+
+ regbase::literal
+ All characters in the string are treated as literals,
+ there are no special characters or escape sequences.
+
+
+ regbase::emacs
+ Provides compatibility with the emacs
+ editor, equivalent to: bk_braces | bk_parens | bk_refs | bk_vbar.
+
+
+
+
+ Revised
+
+ 17 May 2003
+
+ © Copyright John Maddock 1998-2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
+
diff --git a/doc/thread_safety.html b/doc/thread_safety.html
new file mode 100644
index 00000000..eeda681d
--- /dev/null
+++ b/doc/thread_safety.html
@@ -0,0 +1,68 @@
+
+
+
+ Boost.Regex: Thread Safety
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Thread Safety
+
+
+
+
+
+
+
+
+ Class basic_regex<> and its typedefs regex
+ and wregex are thread safe, in that compiled regular expressions can safely be
+ shared between threads. The matching algorithms regex_match,
+ regex_search, regex_grep,
+ regex_format and regex_merge
+ are all re-entrant and thread safe. Class match_results
+ is now thread safe, in that the results of a match can be safely copied from
+ one thread to another (for example, one thread may find matches and push
+ match_results instances onto a queue, while another thread pops them off the
+ other end). Otherwise, do not share a single match_results instance between
+ threads; use a separate instance per thread.
+
+ The POSIX API functions are all re-entrant and
+ thread safe; regular expressions compiled with regcomp can also be
+ shared between threads.
+
+ The class RegEx is only thread safe if each thread
+ gets its own RegEx instance (apartment threading) - this is a consequence of
+ RegEx handling both compiling and matching regular expressions.
+
+ Finally, note that changing the global locale invalidates all compiled regular
+ expressions; therefore calling set_locale from one thread while another
+ uses regular expressions will produce unpredictable results.
+
+
+ There is also a requirement that only one thread is executing before
+ the start of main().
+
+ Revised
+
+ 17 May 2003
+
+
+ © Copyright John Maddock 1998-2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty.
+
+
+
diff --git a/doc/uarrow.gif b/doc/uarrow.gif
new file mode 100644
index 00000000..6afd20c3
Binary files /dev/null and b/doc/uarrow.gif differ
diff --git a/doc/vc71-performance.html b/doc/vc71-performance.html
new file mode 100644
index 00000000..2478065d
--- /dev/null
+++ b/doc/vc71-performance.html
@@ -0,0 +1,705 @@
+
+
+
+ Regular Expression Performance Comparison (Visual Studio.NET 2003)
+
+
+
+
+
+
+
+ Regular Expression Performance Comparison
+ The following tables compare these regular expression libraries:
+ GRETA.
+ The Boost regex library.
+ Henry Spencer's regular expression library, provided for comparison as a
+ typical non-backtracking implementation.
+ Philip Hazel's PCRE library.
+ Details
+ Machine: Intel Pentium 4 2.8GHz PC.
+ Compiler: Microsoft Visual C++ version 7.1.
+ C++ Standard Library: Dinkumware standard library version 313.
+ OS: Win32.
+ Boost version: 1.31.0.
+ PCRE version: 3.9.
+ As ever, care should be taken in interpreting the results: only sensible regular
+ expressions (rather than pathological cases) are given, and most are taken from the
+ Boost regex examples or from the Library of
+ Regular Expressions. In addition, some variation in the relative
+ performance of these libraries can be expected on other machines, as memory
+ access and processor caching effects can be quite large for most finite state
+ machine algorithms. In each case the first figure given is the time relative to
+ the fastest library for that test (so a value of 1.0 is as good as it gets), while
+ the second figure, in parentheses, is the actual time taken in seconds.
+ Averages
+ The following are the average relative scores for all the tests: the perfect
+ regular expression library would score 1, in practice anything less than 2
+ is pretty good.
+
+
+ GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
+ 6.90669 | 23.751 | 1.62553 | 1.38213 | 110.973 | 1.69371
+
+
+
+
+ Comparison 1: Long Search
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within a long English language text was measured
+ (mtent12.txt
+ from Project Gutenberg, 19MB).
+
+
+ Expression
+ GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
+
+
+ Twain
+ 19.7 (0.541s) | 85.5 (2.35s) | 3.09 (0.0851s) | 3.09 (0.0851s) | 131 (3.6s) | 1 (0.0275s)
+
+
+ Huck[[:alpha:]]+
+ 11 (0.55s) | 93.4 (4.68s) | 3.4 (0.17s) | 3.35 (0.168s) | 124 (6.19s) | 1 (0.0501s)
+
+
+ [[:alpha:]]+ing
+ 11.3 (6.82s) | 21.3 (12.8s) | 1.83 (1.1s) | 1 (0.601s) | 6.47 (3.89s) | 4.75 (2.85s)
+
+
+ ^[^ ]*?Twain
+ 5.75 (1.15s) | 17.1 (3.43s) | 1 (0.2s) | 1.3 (0.26s) | NA | 3.8 (0.761s)
+
+
+ Tom|Sawyer|Huckleberry|Finn
+ 28.5 (3.1s) | 77.2 (8.4s) | 2.3 (0.251s) | 1 (0.109s) | 191 (20.8s) | 1.77 (0.193s)
+
+
+ (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)
+ 16.2 (4.14s) | 49 (12.5s) | 1.65 (0.42s) | 1 (0.255s) | NA | 2.43 (0.62s)
+
+
+
+
+ Comparison 2: Medium Sized Search
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within a medium sized English language text was
+ measured (the first 50KB of mtent12.txt).
+
+
+ Expression
+ GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
+
+
+ Twain
+ 9.49 (0.00274s) | 40.7 (0.0117s) | 1.54 (0.000445s) | 1.56 (0.00045s) | 13.5 (0.00391s) | 1 (0.000289s)
+
+
+ Huck[[:alpha:]]+
+ 14.3 (0.0027s) | 62.3 (0.0117s) | 2.26 (0.000425s) | 2.29 (0.000431s) | 1.27 (0.000239s) | 1 (0.000188s)
+
+
+ [[:alpha:]]+ing
+ 7.34 (0.0178s) | 13.7 (0.0331s) | 1 (0.00243s) | 1.02 (0.00246s) | 7.36 (0.0178s) | 5.87 (0.0142s)
+
+
+ ^[^ ]*?Twain
+ 8.34 (0.00579s) | 24.8 (0.0172s) | 1.52 (0.00105s) | 1 (0.000694s) | NA | 2.81 (0.00195s)
+
+
+ Tom|Sawyer|Huckleberry|Finn
+ 12.9 (0.00781s) | 35.1 (0.0213s) | 1.67 (0.00102s) | 1 (0.000606s) | 81.5 (0.0494s) | 1.94 (0.00117s)
+
+
+ (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)
+ 15.6 (0.0106s) | 46.6 (0.0319s) | 2.72 (0.00186s) | 1 (0.000684s) | 311 (0.213s) | 1.72 (0.00117s)
+
+
+
+
+ Comparison 3: C++ Code Search
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within the C++ source file
+ boost/crc.hpp was measured.
+
+
+ Expression
+ GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
+
+
+ ^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([
+ ]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{)
+ 8.88 (0.000792s) | 46.4 (0.00414s) | 1.19 (0.000106s) | 1 (8.92e-005s) | 688 (0.0614s) | 3.23 (0.000288s)
+
+
+ (^[
+ ]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\>
+ 1 (0.00571s) | 5.31 (0.0303s) | 2.47 (0.0141s) | 1.92 (0.011s) | NA | 3.29 (0.0188s)
+
+
+ ^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>)
+ 5.78 (0.00172s) | 26.3 (0.00783s) | 1.12 (0.000333s) | 1 (0.000298s) | 128 (0.0382s) | 1.74 (0.000518s)
+
+
+ ^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>)
+ 10.2 (0.00305s) | 28.4 (0.00845s) | 1.12 (0.000333s) | 1 (0.000298s) | 155 (0.0463s) | 1.74 (0.000519s)
+
+
+
+
+ Comparison 4: HTML Document Search
+
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within the html file libs/libraries.htm
+ was measured.
+
+
+ Expression
+ GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
+
+
+ beman|john|dave
+ 11 (0.00297s) | 34.3 (0.00922s) | 1.78 (0.000479s) | 1 (0.000269s) | 55.2 (0.0149s) | 1.85 (0.000499s)
+
+
+ <p>.*?</p>
+ 5.38 (0.00145s) | 21.8 (0.00587s) | 1.02 (0.000274s) | 1 (0.000269s) | NA | 1.05 (0.000283s)
+
+
+ <a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*>
+ 4.51 (0.00207s) | 12.6 (0.00579s) | 1.34 (0.000616s) | 1 (0.000459s) | 343 (0.158s) | 1.09 (0.000499s)
+
+
+ <h[12345678][^>]*>.*?</h[12345678]>
+ 7.39 (0.00143s) | 29.6 (0.00571s) | 1.87 (0.000362s) | 1 (0.000193s) | NA | 1.27 (0.000245s)
+
+
+ <img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*>
+ 6.73 (0.00145s) | 27.3 (0.00587s) | 1.2 (0.000259s) | 1.32 (0.000283s) | 148 (0.0319s) | 1 (0.000215s)
+
+
+ <font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font>
+ 6.93 (0.00153s) | 27 (0.00595s) | 1.22 (0.000269s) | 1.31 (0.000289s) | NA | 1 (0.00022s)
+
+
+
+
+ Comparison 5: Simple Matches
+ For each of the following regular expressions the time taken to match against
+ the text indicated was measured.
+
+
+ Expression
+ Text
+ GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
+
+
+ abc
+ abc
+ 1.31 (2.2e-007s) | 1.94 (3.25e-007s) | 1.26 (2.1e-007s) | 1.24 (2.08e-007s) | 3.03 (5.06e-007s) | 1 (1.67e-007s)
+
+
+ ^([0-9]+)(\-| |$)(.*)$
+ 100- this is a line of ftp response which contains a message string
+ 1.52 (6.88e-007s) | 2.28 (1.03e-006s) | 1.5 (6.78e-007s) | 1.5 (6.78e-007s) | 329 (0.000149s) | 1 (4.53e-007s)
+
+
+ ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}
+ 1234-5678-1234-456
+ 2.04 (1.03e-006s) | 2.83 (1.43e-006s) | 2.12 (1.07e-006s) | 2.04 (1.03e-006s) | 30.8 (1.56e-005s) | 1 (5.05e-007s)
+
+
+ ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
+ john_maddock@compuserve.com
+ 1.48 (1.78e-006s) | 2.1 (2.52e-006s) | 1.35 (1.62e-006s) | 1.32 (1.59e-006s) | 165 (0.000198s) | 1 (1.2e-006s)
+
+
+ ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
+ foo12@foo.edu
+ 1.28 (1.41e-006s) | 1.9 (2.1e-006s) | 1.42 (1.57e-006s) | 1.38 (1.53e-006s) | 107 (0.000119s) | 1 (1.11e-006s)
+
+
+ ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
+ bob.smith@foo.tv
+ 1.29 (1.43e-006s) | 1.9 (2.1e-006s) | 1.42 (1.57e-006s) | 1.38 (1.53e-006s) | 119 (0.000132s) | 1 (1.11e-006s)
+
+
+ ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$
+ EH10 2QQ
+ 1.26 (4.63e-007s) | 1.77 (6.49e-007s) | 1.3 (4.77e-007s) | 1.2 (4.4e-007s) | 9.15 (3.36e-006s) | 1 (3.68e-007s)
+
+
+ ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$
+ G1 1AA
+ 1.06 (4.73e-007s) | 1.59 (7.07e-007s) | 1.05 (4.68e-007s) | 1 (4.44e-007s) | 12.9 (5.73e-006s) | 1.63 (7.26e-007s)
+
+
+ ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$
+ SW1 1ZZ
+ 1.26 (9.17e-007s) | 1.84 (1.34e-006s) | 1.28 (9.26e-007s) | 1.21 (8.78e-007s) | 8.42 (6.11e-006s) | 1 (7.26e-007s)
+
+
+ ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$
+ 4/1/2001
+ 1.57 (9.73e-007s) | 2.28 (1.41e-006s) | 1.25 (7.73e-007s) | 1.26 (7.83e-007s) | 11.2 (6.95e-006s) | 1 (6.21e-007s)
+
+
+ ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$
+ 12/12/2001
+ 1.52 (9.56e-007s) | 2.06 (1.3e-006s) | 1.29 (8.12e-007s) | 1.24 (7.83e-007s) | 12.4 (7.8e-006s) | 1 (6.3e-007s)
+
+
+ ^[-+]?[[:digit:]]*\.?[[:digit:]]*$
+ 123
+ 2.11 (7.35e-007s) | 3.18 (1.11e-006s) | 2.5 (8.7e-007s) | 2.44 (8.5e-007s) | 5.26 (1.83e-006s) | 1 (3.49e-007s)
+
+
+ ^[-+]?[[:digit:]]*\.?[[:digit:]]*$
+ +3.14159
+ 1.31 (4.96e-007s) | 1.92 (7.26e-007s) | 1.26 (4.77e-007s) | 1.2 (4.53e-007s) | 9.71 (3.66e-006s) | 1 (3.77e-007s)
+
+
+ ^[-+]?[[:digit:]]*\.?[[:digit:]]*$
+ -3.14159
+ 1.32 (4.97e-007s) | 1.92 (7.26e-007s) | 1.24 (4.67e-007s) | 1.2 (4.53e-007s) | 9.7 (3.66e-006s) | 1 (3.78e-007s)
+
+
+
+
+
+ Copyright John Maddock April 2003, all rights reserved.
+
+
diff --git a/example/snippets/regex_iterator_example.cpp b/example/snippets/regex_iterator_example.cpp
new file mode 100644
index 00000000..6ec3d85e
--- /dev/null
+++ b/example/snippets/regex_iterator_example.cpp
@@ -0,0 +1,115 @@
+/*
+ *
+ * Copyright (c) 2003
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+ /*
+ * LOCATION: see http://www.boost.org for most recent version.
+ * FILE regex_iterator_example.cpp
+ * VERSION see
+ * DESCRIPTION: regex_iterator example 2: searches a cpp file for class definitions,
+ * using global data.
+ */
+
+#include <string>
+#include <map>
+#include <algorithm>
+#include <fstream>
+#include <iostream>
+#include <boost/regex.hpp>
+
+using namespace std;
+
+// purpose:
+// takes the contents of a file in the form of a string
+// and searches for all the C++ class definitions, storing
+// their locations in a map of strings/int's
+
+typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;
+
+const char* re =
+ // possibly leading whitespace:
+ "^[[:space:]]*"
+ // possible template declaration:
+ "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
+ // class or struct:
+ "(class|struct)[[:space:]]*"
+ // leading declspec macros etc:
+ "("
+ "\\<\\w+\\>"
+ "("
+ "[[:blank:]]*\\([^)]*\\)"
+ ")?"
+ "[[:space:]]*"
+ ")*"
+ // the class name
+ "(\\<\\w*\\>)[[:space:]]*"
+ // template specialisation parameters
+ "(<[^;:{]+>)?[[:space:]]*"
+ // terminate in { or :
+ "(\\{|:[^;\\{()]*\\{)";
+
+
+boost::regex expression(re);
+map_type class_index;
+
+bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
+{
+ // what[0] contains the whole string
+ // what[5] contains the class name.
+ // what[6] contains the template specialisation if any.
+ // add class name and position to map:
+ class_index[what[5].str() + what[6].str()] = what.position(5);
+ return true;
+}
+
+void load_file(std::string& s, std::istream& is)
+{
+ s.erase();
+ s.reserve(is.rdbuf()->in_avail());
+ char c;
+ while(is.get(c))
+ {
+ if(s.capacity() == s.size())
+ s.reserve(s.capacity() * 3);
+ s.append(1, c);
+ }
+}
+
+int main(int argc, const char** argv)
+{
+ std::string text;
+ for(int i = 1; i < argc; ++i)
+ {
+ cout << "Processing file " << argv[i] << endl;
+ std::ifstream fs(argv[i]);
+ load_file(text, fs);
+ // construct our iterators:
+ boost::regex_iterator<std::string::const_iterator> m1(text.begin(), text.end(), expression);
+ boost::regex_iterator<std::string::const_iterator> m2;
+ std::for_each(m1, m2, &regex_callback);
+ // copy results:
+ cout << class_index.size() << " matches found" << endl;
+ map_type::iterator c, d;
+ c = class_index.begin();
+ d = class_index.end();
+ while(c != d)
+ {
+ cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
+ ++c;
+ }
+ class_index.erase(class_index.begin(), class_index.end());
+ }
+ return 0;
+}
+
+
diff --git a/example/snippets/regex_replace_example.cpp b/example/snippets/regex_replace_example.cpp
new file mode 100644
index 00000000..b00345ff
--- /dev/null
+++ b/example/snippets/regex_replace_example.cpp
@@ -0,0 +1,138 @@
+/*
+ *
+ * Copyright (c) 1998-2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+ /*
+ * LOCATION: see http://www.boost.org for most recent version.
+ * FILE regex_replace_example.cpp
+ * VERSION see
+ * DESCRIPTION: regex_replace example:
+ * converts a C++ file to syntax highlighted HTML.
+ */
+
+#include <fstream>
+#include <sstream>
+#include <string>
+#include <iterator>
+#include <iostream>
+#include <boost/regex.hpp>
+
+// purpose:
+// takes the contents of a file and transforms it into
+// syntax highlighted code in html format
+
+boost::regex e1, e2;
+extern const char* expression_text;
+extern const char* format_string;
+extern const char* pre_expression;
+extern const char* pre_format;
+extern const char* header_text;
+extern const char* footer_text;
+
+void load_file(std::string& s, std::istream& is)
+{
+ s.erase();
+ s.reserve(is.rdbuf()->in_avail());
+ char c;
+ while(is.get(c))
+ {
+ if(s.capacity() == s.size())
+ s.reserve(s.capacity() * 3);
+ s.append(1, c);
+ }
+}
+
+int main(int argc, const char** argv)
+{
+ try{
+ e1.assign(expression_text);
+ e2.assign(pre_expression);
+ for(int i = 1; i < argc; ++i)
+ {
+ std::cout << "Processing file " << argv[i] << std::endl;
+ std::ifstream fs(argv[i]);
+ std::string in;
+ load_file(in, fs);
+ std::string out_name = std::string(argv[i]) + std::string(".htm");
+ std::ofstream os(out_name.c_str());
+ os << header_text;
+ // escape '<' and '>' first by outputting to a
+ // temporary string stream
+ std::ostringstream t(std::ios::out | std::ios::binary);
+ std::ostream_iterator<char> oi(t);
+ boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format, boost::match_default | boost::format_all);
+ // then output to final output stream
+ // adding syntax highlighting:
+ std::string s(t.str());
+ std::ostream_iterator<char> out(os);
+ boost::regex_replace(out, s.begin(), s.end(), e1, format_string, boost::match_default | boost::format_all);
+ os << footer_text;
+ }
+ }
+ catch(...)
+ { return -1; }
+ return 0;
+}
+
+extern const char* pre_expression = "(<)|(>)|\\r";
+extern const char* pre_format = "(?1&lt;)(?2&gt;)";
+
+
+const char* expression_text = // preprocessor directives: index 1
+ "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
+ // comment: index 2
+ "(//[^\\n]*|/\\*.*?\\*/)|"
+ // literals: index 3
+ "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
+ // string literals: index 4
+ "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
+ // keywords: index 5
+ "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
+ "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
+ "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
+ "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
+ "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
+ "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
+ "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
+ "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
+ "|using|virtual|void|volatile|wchar_t|while)\\>"
+ ;
+
+const char* format_string = "(?1$& )"
+ "(?2$& )"
+ "(?3$& )"
+ "(?4$& )"
+ "(?5$& )";
+
+const char* header_text = "\n\n"
+ "Auto-generated html formatted source \n"
+ " \n"
+ "\n"
+ "\n"
+ "\n";
+
+const char* footer_text = " \n\n\n";
+
+
+
+
+
+
+
+
+
+
+
diff --git a/example/snippets/regex_token_iterator_example_1.cpp b/example/snippets/regex_token_iterator_example_1.cpp
new file mode 100644
index 00000000..8ba8dcb5
--- /dev/null
+++ b/example/snippets/regex_token_iterator_example_1.cpp
@@ -0,0 +1,75 @@
+/*
+ *
+ * Copyright (c) 2003
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+ /*
+ * LOCATION: see http://www.boost.org for most recent version.
+ * FILE regex_token_iterator_example_1.cpp
+ * VERSION see
+ * DESCRIPTION: regex_token_iterator example: split a string into tokens.
+ */
+
+
+#include <iostream>
+#include <string>
+
+#include <boost/regex.hpp>
+using namespace std;
+
+
+#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
+//
+// problem with std::getline under MSVC6sp3
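+istream& getline(istream& is, std::string& s)
+{
+ s.erase();
+ // read into an int so that end-of-file can be told apart from a character:
+ int c = is.get();
+ while((c != '\n') && (c != char_traits<char>::eof()))
+ {
+ s.append(1, static_cast<char>(c));
+ c = is.get();
+ }
+ return is;
+}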
+#endif
+
+
+int main(int argc, char**)
+{
+ string s;
+ do{
+ if(argc == 1)
+ {
+ cout << "Enter text to split (or \"quit\" to exit): ";
+ getline(cin, s);
+ if(!cin || (s == "quit")) break;
+ }
+ else
+ s = "This is a string of tokens";
+
+ boost::regex re("\\s+");
+ boost::regex_token_iterator<std::string::const_iterator> i(s.begin(), s.end(), re, -1);
+ boost::regex_token_iterator<std::string::const_iterator> j;
+
+ unsigned count = 0;
+ while(i != j)
+ {
+ cout << *i++ << endl;
+ count++;
+ }
+ cout << "There were " << count << " tokens found." << endl;
+
+ }while(argc == 1);
+ return 0;
+}
+
diff --git a/example/snippets/regex_token_iterator_example_2.cpp b/example/snippets/regex_token_iterator_example_2.cpp
new file mode 100644
index 00000000..71b2188b
--- /dev/null
+++ b/example/snippets/regex_token_iterator_example_2.cpp
@@ -0,0 +1,92 @@
+/*
+ *
+ * Copyright (c) 2003
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+ /*
+ * LOCATION: see http://www.boost.org for most recent version.
+ * FILE regex_token_iterator_example_2.cpp
+ * VERSION see
+ * DESCRIPTION: regex_token_iterator example: spit out linked URLs.
+ */
+
+
+#include <fstream>
+#include <iostream>
+#include <string>
+#include <boost/regex.hpp>
+
+boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
+ boost::regex::normal | boost::regbase::icase);
+
+void load_file(std::string& s, std::istream& is)
+{
+ s.erase();
+ //
+ // attempt to grow string buffer to match file size,
+ // this doesn't always work...
+ s.reserve(is.rdbuf()->in_avail());
+ char c;
+ while(is.get(c))
+ {
+ // use logarithmic growth strategy, in case
+ // in_avail (above) returned zero:
+ if(s.capacity() == s.size())
+ s.reserve(s.capacity() * 3);
+ s.append(1, c);
+ }
+}
+
+int main(int argc, char** argv)
+{
+ std::string s;
+ int i;
+ for(i = 1; i < argc; ++i)
+ {
+ std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
+ s.erase();
+ std::ifstream is(argv[i]);
+ load_file(s, is);
+ boost::regex_token_iterator<std::string::const_iterator>
+ i(s.begin(), s.end(), e, 1);
+ boost::regex_token_iterator<std::string::const_iterator> j;
+ while(i != j)
+ {
+ std::cout << *i++ << std::endl;
+ }
+ }
+ //
+ // alternative method:
+ // test the array-literal constructor, and split out the whole
+ // match as well as $1....
+ //
+ for(i = 1; i < argc; ++i)
+ {
+ std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
+ s.erase();
+ std::ifstream is(argv[i]);
+ load_file(s, is);
+ const int subs[] = {1, 0,};
+ boost::regex_token_iterator<std::string::const_iterator>
+ i(s.begin(), s.end(), e, subs);
+ boost::regex_token_iterator<std::string::const_iterator> j;
+ while(i != j)
+ {
+ std::cout << *i++ << std::endl;
+ }
+ }
+
+ return 0;
+}
+
+
diff --git a/faq.htm b/faq.htm
deleted file mode 100644
index fb3795b6..00000000
--- a/faq.htm
+++ /dev/null
@@ -1,205 +0,0 @@
-
-
-
-
-
-
-Regex++ - FAQ
-
-
-
-
-
-
-
-
-
-
- Regex++, FAQ.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-Q. Why does using parenthesis in a
-regular expression change the result of a match?
-
-Parentheses don't only mark; they determine what the best
-match is as well. regex++ tries to follow the POSIX standard
-leftmost longest rule for determining what matched. So if there
-is more than one possible match after considering the whole
-expression, it looks next at the first sub-expression and then
-the second sub-expression and so on. So...
-
-"(0*)([0-9]*)" against "00123" would produce
-$1 = "00"
-$2 = "123"
-
-whereas
-
-"0*([0-9]*)" against "00123" would produce
-$1 = "00123"
-
-If you think about it, had $1 only matched the "123",
-this would be "less good" than the match "00123"
-which is both further to the left and longer. If you want $1 to
-match only the "123" part, then you need to use
-something like:
-
-"0*([1-9][0-9]*)"
-
-as the expression.
-
-Q. Configure says that my compiler is
-unable to merge template instances, what does this mean?
-
-A. When you compile template code, you can end up with the
-same template instances in multiple translation units - this will
-lead to link time errors unless your compiler/linker is smart
-enough to merge these template instances into a single record in
-the executable file. If you see this warning after running
-configure, then you can still link to libregex++.a if:
-
-
- You use only the low-level template classes (reg_expression<>
- match_results<> etc), from a single translation
- unit, and use no other part of regex++.
- You use only the POSIX API functions (regcomp regexec etc),
- and no other part of regex++.
- You use only the high level class RegEx, and no other
- part of regex++.
-
-
-Another option is to create a master include file, which
-#include's all the regex++ source files, and all the source files
-in which you use regex++. You then compile and link this master
-file as a single translation unit.
-
-Q. Configure says that my compiler is
-unable to merge template instances from archive files, what does
-this mean?
-
-A. When you compile template code, you can end up with the
-same template instances in multiple translation units - this will
-lead to link time errors unless your compiler/linker is smart
-enough to merge these template instances into a single record in
-the executable file. Some compilers are able to do this for
-normal .cpp or .o files, but fail if the object file has been
-placed in a library archive. If you see this warning after
-running configure, then you can still link to libregex++.a if:
-
-
- You use only the low-level template classes (reg_expression<>
- match_results<> etc), and use no other part of
- regex++.
- You use only the POSIX API functions (regcomp regexec etc),
- and no other part of regex++.
- You use only the high level class RegEx, and no other
- part of regex++.
-
-
-Another option is to add the regex++ source files directly to
-your project instead of linking to libregex++.a, generally you
-should do this only if you are getting link time errors with
-libregex++.a.
-
-Q. Configure says that my compiler can't
-merge templates containing switch statements, what does this
-mean?
-
-A. Some compilers can't merge templates that contain static
-data - this includes switch statements which implicitly generate
-static data as well as code. Principally this affects the egcs
-compiler - but note gcc 2.81 also suffers from this problem - the
-compiler will compile and link the code - but the code will not
-run because the code and the static data it uses have become
-separated. The default behaviour of regex++ is to try and fix
-this problem by declaring "problem" templates inside
-unnamed namespaces, so that the templates have internal linkage.
-Note that this can result in a great deal of code bloat. If the
-compiler doesn't support namespaces, or if code bloat becomes a
-problem, then follow the guidelines above for placing all the
-templates used in a single translation unit, and edit boost/regex/config.hpp
-so that BOOST_REGEX_NO_TEMPLATE_SWITCH_MERGE is no longer defined.
-
-
-Q. I can't get regex++ to work with
-escape characters, what's going on?
-
-A. If you embed regular expressions in C++ code, then remember
-that escape characters are processed twice: once by the C++
-compiler, and once by the regex++ expression compiler, so to pass
-the regular expression \d+ to regex++, you need to embed "\\d+"
-in your code. Likewise to match a literal backslash you will need
-to embed "\\\\" in your code.
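-A minimal sketch of the double-escaping rule. It is written against C++11 std::regex rather than regex++, on the assumption that it should build with any modern standard library; the string-literal rules are the same either way. The helper names are hypothetical.
```cpp
#include <regex>
#include <string>

// "\\d+" in source becomes the three characters \d+ that the regex compiler sees.
bool is_all_digits(const std::string& s)
{
    static const std::regex digits("\\d+");
    return std::regex_match(s, digits);
}

// Matching one literal backslash needs "\\\\" in an ordinary literal:
// the C++ compiler turns it into \\, which the regex compiler reads as one backslash.
bool is_single_backslash(const std::string& s)
{
    static const std::regex backslash("\\\\");
    return std::regex_match(s, backslash);
}

// A C++11 raw string literal skips the first round of escape processing.
bool is_all_digits_raw(const std::string& s)
{
    static const std::regex digits(R"(\d+)");
    return std::regex_match(s, digits);
}
```
-Raw string literals only exist from C++11 onwards; with older compilers the doubled escapes are the only option.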
-
-Q. Why don't character ranges work
-properly?
-A. The POSIX standard specifies that character range expressions
-are locale sensitive, so for example the expression [A-Z] will
-match any collating element that collates between 'A' and 'Z'.
-That means that for most locales other than "C" or
-"POSIX", [A-Z] would match the single character 't', for
-example, which is not what most people expect, or at least not
-what most people have come to expect from regular expression
-engines. For this reason, the default behaviour of regex++ is to
-turn locale-sensitive collation off by setting the regbase::nocollate
-compile-time flag (this is set by regbase::normal). However, if
-you set a non-default compile-time flag, for example regbase::extended
-or regbase::basic, then locale-dependent collation will be
-enabled. This also applies to the POSIX API functions, which use
-either regbase::extended or regbase::basic internally; in that
-case, use REG_NOCOLLATE in combination with either
-REG_BASIC or REG_EXTENDED when invoking regcomp if you don't want
-locale-sensitive collation. [Note - when regbase::nocollate is in
-effect, the library behaves "as if" the LC_COLLATE
-locale category were always "C", regardless of what it is
-actually set to - end note ].
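-The difference between code-point ranges and collation-based ranges can be sketched with C++11 std::regex, used here as a stand-in for regex++. By default its [A-Z] compares character codes, which is roughly the effect of regbase::nocollate. The std::regex::collate flag asks for locale-sensitive ranges instead. The helper names are hypothetical, and the collation-based result depends on the locale in effect.
```cpp
#include <regex>
#include <string>

// Default std::regex (ECMAScript grammar, no collate flag): [A-Z] is a
// range of character codes, so only the 26 ASCII capitals fall inside it.
bool upper_by_code(const std::string& s)
{
    static const std::regex e("[A-Z]");
    return std::regex_match(s, e);
}

// With std::regex::collate the range endpoints are compared using the
// regex's locale (the global locale by default), so results can vary by locale.
bool upper_by_collation(const std::string& s)
{
    const std::regex e("[A-Z]", std::regex::ECMAScript | std::regex::collate);
    return std::regex_match(s, e);
}

// [[:upper:]] names the character class directly and sidesteps ranges.
bool upper_by_class(const std::string& s)
{
    static const std::regex e("[[:upper:]]");
    return std::regex_match(s, e);
}
```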
-
- Q. Why can't I use the "convenience"
-versions of query_match/reg_search/reg_grep/reg_format/reg_merge?
-
-
-A. These versions may or may not be available depending upon
-the capabilities of your compiler. The rules determining the
-form of these functions are quite complex, and only the
-versions visible to a standards-compliant compiler are given in
-the help. To find out what your compiler supports, run <boost/regex.hpp>
-through your C++ preprocessor, and search the output file for
-the function that you are interested in.
-
-Q. Why are there no throw specifications
-on any of the functions? What exceptions can the library throw?
-
-
-A. Not all compilers support (or honor) throw specifications;
-others support them but with reduced efficiency. Throw
-specifications may be added at a later date as compilers begin to
-handle them better. The library should throw only three types of
-exception: boost::bad_expression can be thrown by reg_expression
-when compiling a regular expression; std::runtime_error can be
-thrown when a call to reg_expression::imbue tries to open a
-message catalogue that doesn't exist, or when a call to RegEx::GrepFiles
-or RegEx::FindFiles tries to open a file that cannot be opened;
-finally, std::bad_alloc can be thrown by just about any of the
-functions in this library.
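-A sketch of handling a bad expression, using C++11 std::regex as a stand-in. There the error is reported as std::regex_error, which derives from std::runtime_error and plays the role that boost::bad_expression plays in regex++. The function name is hypothetical.
```cpp
#include <regex>
#include <string>

// Returns true if `pattern` compiles. std::regex reports a malformed
// expression by throwing std::regex_error (derived from std::runtime_error).
bool compiles(const std::string& pattern)
{
    try {
        std::regex compiled(pattern);  // throws std::regex_error on a bad pattern
        (void)compiled;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```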
-
-
-
-Copyright Dr
-John Maddock 1998-2000 all rights reserved.
-
-
diff --git a/format_string.htm b/format_string.htm
deleted file mode 100644
index 41a33842..00000000
--- a/format_string.htm
+++ /dev/null
@@ -1,243 +0,0 @@
-
-
-
-
-
-
-Regex++, Format String Reference
-
-
-
-
-
-
-
-
-
-
- Regex++, Format
- String Reference.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
- Format String Syntax
-
-Format strings are used by the algorithms regex_format and regex_merge, and are
-used to transform one string into another.
-
-There are three kinds of format string: sed, perl and extended.
-The extended syntax is the default, so it is covered first.
-
-Extended format syntax
-
-In format strings, all characters are treated as literals
-except: ()$\?:
-
-To use any of these as literals, you must prefix them with the
-escape character \.
-
-The following special sequences are recognized:
-
-
-
-Grouping:
-
-Use the parenthesis characters ( and ) to group sub-expressions
-within the format string; use \( and \) to represent a literal '('
-or ')'.
-
-
-
-Sub-expression expansions:
-
-The following perl like expressions expand to a particular
-matched sub-expression:
-
-
-
-
-
- $`
- Expands to all the text from
- the end of the previous match to the start of the current
- match; if there was no previous match in the current
- operation, then to everything from the start of the input
- string to the start of the match.
-
-
-
-
- $'
- Expands to all the text from
- the end of the match to the end of the input string.
-
-
-
-
- $&
- Expands to all of the
- current match.
-
-
-
-
- $0
- Expands to all of the
- current match.
-
-
-
-
- $N
- Expands to the text that
- matched sub-expression N.
-
-
-
-
-
-
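-C++11 std::regex_replace, used here as a stand-in for regex_merge, accepts the same perl-like $`, $&, $' and $N sequences in its default format syntax, so the table above can be tried out as in this sketch (annotate is a hypothetical helper). The ECMAScript rules that std follows only define $1 upwards, so portable std code should use $& rather than $0 for the whole match.
```cpp
#include <regex>
#include <string>

// Replaces every match of `pattern` in `text` with the expansion of `fmt`,
// using std::regex_replace's default (ECMAScript) format syntax.
std::string annotate(const std::string& text, const std::string& pattern,
                     const std::string& fmt)
{
    return std::regex_replace(text, std::regex(pattern), fmt);
}
```
-For example, annotate("abc", "b", "[$`|$&|$']") yields "a[a|b|c]c": the text before the match, the match itself and the text after it.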
-
-Conditional expressions:
-
-Conditional expressions allow two different format strings to
-be selected depending upon whether a sub-expression participated
-in the match or not:
-
-?Ntrue_expression:false_expression
-
-Executes true_expression if sub-expression N
-participated in the match, otherwise executes false_expression.
-
-Example: suppose we search for "(while)|(for)" then
-the format string "?1WHILE:FOR" would output what
-matched, but in upper case.
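-std::regex_replace has no equivalent of the ?N conditional. A sketch of what "?1WHILE:FOR" does therefore has to make the choice in code: for each match of "(while)|(for)", it tests whether sub-expression 1 participated. C++11 std::regex is a stand-in here, and upcase_keywords is a hypothetical name.
```cpp
#include <regex>
#include <string>

// Hand-written equivalent of the format string "?1WHILE:FOR" applied to
// every match of "(while)|(for)"; unmatched text is copied unchanged.
std::string upcase_keywords(const std::string& text)
{
    static const std::regex e("(while)|(for)");
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);           // text before this match
        out += m[1].matched ? "WHILE" : "FOR";  // ?1 true_expression : false_expression
        last = m[0].second;
    }
    out.append(last, text.cend());              // text after the final match
    return out;
}
```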
-
-
-
-Escape sequences:
-
-The following escape sequences are also allowed:
-
-
-
-
-
- \a
- The bell character.
-
-
-
-
- \f
- The form feed character.
-
-
-
-
- \n
- The newline character.
-
-
-
-
- \r
- The carriage return
- character.
-
-
-
-
- \t
- The tab character.
-
-
-
-
- \v
- A vertical tab character.
-
-
-
-
- \x
- A hexadecimal character -
- for example \x0D.
-
-
-
-
- \x{}
- A hexadecimal character code in braces,
- which may be a Unicode code point; for example \x{1A0}.
-
-
-
-
- \cx
- The ASCII control character
- x; for example, \c@ is equivalent to control-@.
-
-
-
-
- \e
- The ASCII escape character.
-
-
-
-
- \dd
- An octal character constant,
- for example \10.
-
-
-
-
-
-
-
-Perl format strings
-
-Perl format strings are the same as the default syntax except
-that the characters ()?: have no special meaning.
-
-Sed format strings
-
-Sed format strings use only the characters \ and & as
-special characters.
-
-\n, where n is a digit, is expanded to the nth sub-expression.
-
-& is expanded to the whole of the match (equivalent to \0).
-
-
-Other escape sequences are expanded as per the default syntax.
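-C++11 std::regex_replace supports the sed flavour through std::regex_constants::format_sed. Here is a sketch that uses it as a stand-in for regex_merge with format_sed (sed_swap is a hypothetical name).
```cpp
#include <regex>
#include <string>

// Sed-style format: \1 and \2 insert sub-expressions, & inserts the whole match.
std::string sed_swap(const std::string& text)
{
    static const std::regex e("(\\w+) (\\w+)");
    return std::regex_replace(text, e, "\\2 \\1 [&]",
                              std::regex_constants::format_sed);
}
```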
-
-
-
-
-
-Copyright Dr
-John Maddock 1998-2000 all rights reserved.
-
-
diff --git a/hl_ref.htm b/hl_ref.htm
deleted file mode 100644
index 44b803a1..00000000
--- a/hl_ref.htm
+++ /dev/null
@@ -1,572 +0,0 @@
-
-
-
-
-
-
-Regex++, RegEx Class Reference
-
-
-
-
-
-
-
-
-
-
- Regex++, RegEx Class
- Reference.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
-Class RegEx
-
-#include <boost/cregex.hpp>
-
-The class RegEx provides a high-level, simplified interface to
-the regular expression library. This class only handles narrow
-character strings, and regular expressions always follow the
-"normal" syntax: that is, the standard
-POSIX extended syntax, but with locale-specific collation
-disabled and with escape characters allowed inside character set
-declarations.
-
-typedef bool (*GrepCallback)(const RegEx& expression);
-typedef bool (*GrepFileCallback)(const char * file, const RegEx& expression);
-typedef bool (*FindFilesCallback)(const char * file);
-
-class RegEx
-{
-public:
- RegEx();
- RegEx(const RegEx& o);
- ~RegEx();
- RegEx(const char * c, bool icase = false);
- explicit RegEx(const std::string& s, bool icase = false);
- RegEx& operator =(const RegEx& o);
- RegEx& operator =(const char * p);
- RegEx& operator =(const std::string& s);
- unsigned int SetExpression(const char * p, bool icase = false);
- unsigned int SetExpression(const std::string& s, bool icase = false);
- std::string Expression() const;
- //
- // now matching operators:
- //
- bool Match(const char * p, unsigned int flags = match_default);
- bool Match(const std::string& s, unsigned int flags = match_default);
- bool Search(const char * p, unsigned int flags = match_default);
- bool Search(const std::string& s, unsigned int flags = match_default);
- unsigned int Grep(GrepCallback cb, const char * p, unsigned int flags = match_default);
- unsigned int Grep(GrepCallback cb, const std::string& s, unsigned int flags = match_default);
- unsigned int Grep(std::vector<std::string>& v, const char * p, unsigned int flags = match_default);
- unsigned int Grep(std::vector<std::string>& v, const std::string& s, unsigned int flags = match_default);
- unsigned int Grep(std::vector<unsigned int>& v, const char * p, unsigned int flags = match_default);
- unsigned int Grep(std::vector<unsigned int>& v, const std::string& s, unsigned int flags = match_default);
- unsigned int GrepFiles(GrepFileCallback cb, const char * files, bool recurse = false, unsigned int flags = match_default);
- unsigned int GrepFiles(GrepFileCallback cb, const std::string& files, bool recurse = false, unsigned int flags = match_default);
- unsigned int FindFiles(FindFilesCallback cb, const char * files, bool recurse = false, unsigned int flags = match_default);
- unsigned int FindFiles(FindFilesCallback cb, const std::string& files, bool recurse = false, unsigned int flags = match_default);
- std::string Merge(const std::string& in, const std::string& fmt, bool copy = true, unsigned int flags = match_default);
- std::string Merge(const char* in, const char* fmt, bool copy = true, unsigned int flags = match_default);
- unsigned Split(std::vector<std::string>& v, std::string& s, unsigned flags = match_default, unsigned max_count = ~0);
- //
- // now operators for returning what matched in more detail:
- //
- unsigned int Position(int i = 0) const;
- unsigned int Length(int i = 0) const;
- bool Matched(int i = 0) const;
- unsigned int Line() const;
- unsigned int Marks() const;
- std::string What(int i) const;
- std::string operator [](int i) const;
-
- static const unsigned int npos;
-};
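-RegEx itself is specific to regex++. The hypothetical helpers below sketch what a few of its core members do, using C++11 std::regex. They assume that std's default ECMAScript grammar is close enough to the "normal" syntax for simple patterns like these.
```cpp
#include <cstddef>
#include <regex>
#include <string>

// Mirrors RegEx(p, icase): icase maps onto std::regex::icase.
std::regex make_expression(const std::string& p, bool icase = false)
{
    return icase ? std::regex(p, std::regex::ECMAScript | std::regex::icase)
                 : std::regex(p);
}

// Mirrors Match(): the expression must match the whole input.
bool match_whole(const std::string& s, const std::regex& e)
{
    return std::regex_match(s, e);
}

// Mirrors Search(): a match anywhere in the input is enough.
bool search_anywhere(const std::string& s, const std::regex& e)
{
    return std::regex_search(s, e);
}

// Mirrors Search() followed by What(i): returns an empty string if there is
// no match, i is out of range, or sub-expression i did not participate.
std::string what_after_search(const std::string& s, const std::regex& e,
                              std::size_t i)
{
    std::smatch m;
    if (!std::regex_search(s, m, e) || i >= m.size())
        return std::string();
    return m.str(i);
}

// Mirrors Marks(): std's mark_count() excludes sub-expression zero,
// whereas Marks() includes it.
unsigned marks(const std::regex& e)
{
    return static_cast<unsigned>(e.mark_count()) + 1;
}
```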
-
-Member functions for class RegEx are defined as follows:
-
-
-
-
-
- RegEx();
- Default constructor,
- constructs an instance of RegEx without any valid
- expression.
-
-
-
-
- RegEx(const
- RegEx& o);
- Copy constructor, all the
- properties of parameter o are copied.
-
-
-
-
- RegEx(const char *
- c, bool icase = false );
- Constructs an instance of
- RegEx, setting the expression to c; if icase
- is true then matching is insensitive to case,
- otherwise it is sensitive to case. Throws bad_expression
- on failure.
-
-
-
-
- RegEx(const std::string&
- s, bool icase = false );
- Constructs an instance of
- RegEx, setting the expression to s; if icase is
- true then matching is insensitive to case,
- otherwise it is sensitive to case. Throws bad_expression
- on failure.
-
-
-
-
- RegEx& operator =(const
- RegEx& o);
- Copy assignment operator.
-
-
-
-
- RegEx& operator =(const
- char * p);
- Assignment operator,
- equivalent to calling SetExpression(p, false).
- Throws bad_expression on failure.
-
-
-
-
- RegEx& operator =(const
- std::string& s);
- Assignment operator,
- equivalent to calling SetExpression(s, false).
- Throws bad_expression on failure.
-
-
-
-
- unsigned int
- SetExpression(const char * p, bool icase = false);
- Sets the current expression
- to p; if icase is true then matching
- is insensitive to case, otherwise it is sensitive to case.
- Throws bad_expression on failure.
-
-
-
-
- unsigned int
- SetExpression(const std::string& s, bool
- icase = false );
- Sets the current expression
- to s; if icase is true then matching
- is insensitive to case, otherwise it is sensitive to case.
- Throws bad_expression on failure.
-
-
-
-
- std::string Expression()const ;
- Returns a copy of the
- current regular expression.
-
-
-
-
- bool Match(const
- char * p, unsigned int flags =
- match_default);
- Attempts to match the
- current expression against the text p using the
- match flags flags (see match flags).
- Returns true if the expression matches the whole
- of the input string.
-
-
-
-
- bool Match(const
- std::string& s, unsigned int flags =
- match_default);
- Attempts to match the
- current expression against the text s using the
- match flags flags (see match flags).
- Returns true if the expression matches the whole
- of the input string.
-
-
-
-
- bool Search(const
- char * p, unsigned int flags =
- match_default);
- Attempts to find a match for
- the current expression somewhere in the text p
- using the match flags flags (see match flags).
- Returns true if the match succeeds.
-
-
-
-
- bool Search(const
- std::string& s, unsigned int flags =
- match_default);
- Attempts to find a match for
- the current expression somewhere in the text s
- using the match flags flags (see match flags).
- Returns true if the match succeeds.
-
-
-
-
- unsigned int
- Grep(GrepCallback cb, const char * p, unsigned
- int flags = match_default);
- Finds all matches of the
- current expression in the text p using the match
- flags flags (see match flags).
- For each match found, calls the call-back function cb
- as cb(*this). If at any stage the call-back function
- returns false, the grep operation terminates;
- otherwise it continues until no further matches are found.
- Returns the number of matches found.
-
-
-
-
-
- unsigned int
- Grep(GrepCallback cb, const std::string& s, unsigned
- int flags = match_default);
- Finds all matches of the
- current expression in the text s using the match
- flags flags (see match flags).
- For each match found, calls the call-back function cb
- as cb(*this). If at any stage the call-back function
- returns false, the grep operation terminates;
- otherwise it continues until no further matches are found.
- Returns the number of matches found.
-
-
-
-
-
- unsigned int
- Grep(std::vector<std::string>& v, const char *
- p, unsigned int flags = match_default);
- Finds all matches of the
- current expression in the text p using the match
- flags flags (see match flags).
- For each match, pushes a copy of what matched onto v.
- Returns the number of matches found.
-
-
-
-
- unsigned int
- Grep(std::vector<std::string>& v, const
- std::string& s, unsigned int flags =
- match_default);
- Finds all matches of the
- current expression in the text s using the match
- flags flags (see match flags).
- For each match, pushes a copy of what matched onto v.
- Returns the number of matches found.
-
-
-
-
- unsigned int
- Grep(std::vector<unsigned int >& v, const
- char * p, unsigned int flags =
- match_default);
- Finds all matches of the
- current expression in the text p using the match
- flags flags (see match flags).
- For each match, pushes the starting index of what matched
- onto v. Returns the number of matches found.
-
-
-
-
- unsigned int
- Grep(std::vector<unsigned int >& v, const
- std::string& s, unsigned int flags =
- match_default);
- Finds all matches of the
- current expression in the text s using the match
- flags flags (see match flags).
- For each match, pushes the starting index of what matched
- onto v. Returns the number of matches found.
-
-
-
-
- unsigned int
- GrepFiles(GrepFileCallback cb, const char *
- files, bool recurse = false , unsigned
- int flags = match_default);
- Finds all matches of the
- current expression in the files files using the
- match flags flags (see match flags).
- For each match, calls the call-back function cb. If
- the call-back returns false, then the algorithm returns
- without considering further matches in the current file,
- or any further files.
- The parameter files can include the wild card
- characters '*' and '?'; if the parameter recurse
- is true, then sub-directories are also searched for matching file
- names.
- Returns the total number of matches found.
- May throw an exception derived from std::runtime_error
- if file I/O fails.
-
-
-
-
-
- unsigned int
- GrepFiles(GrepFileCallback cb, const std::string&
- files, bool recurse = false , unsigned
- int flags = match_default);
- Finds all matches of the
- current expression in the files files using the
- match flags flags (see match flags).
- For each match, calls the call-back function cb. If
- the call-back returns false, then the algorithm returns
- without considering further matches in the current file,
- or any further files.
- The parameter files can include the wild card
- characters '*' and '?'; if the parameter recurse
- is true, then sub-directories are also searched for matching file
- names.
- Returns the total number of matches found.
- May throw an exception derived from std::runtime_error
- if file I/O fails.
-
-
-
-
-
- unsigned int
- FindFiles(FindFilesCallback cb, const char *
- files, bool recurse = false , unsigned
- int flags = match_default);
- Searches files to
- find all those which contain at least one match of the
- current expression using the match flags flags
- (see match
- flags). For each matching file, calls the call-back
- function cb. If the call-back returns false, then
- the algorithm returns without considering any further
- files.
- The parameter files can include the wild card
- characters '*' and '?'; if the parameter recurse
- is true, then sub-directories are also searched for matching file
- names.
- Returns the total number of files found.
- May throw an exception derived from std::runtime_error
- if file I/O fails.
-
-
-
-
-
- unsigned int
- FindFiles(FindFilesCallback cb, const std::string&
- files, bool recurse = false , unsigned
- int flags = match_default);
- Searches files to
- find all those which contain at least one match of the
- current expression using the match flags flags
- (see match
- flags). For each matching file, calls the call-back
- function cb. If the call-back returns false, then
- the algorithm returns without considering any further
- files.
- The parameter files can include the wild card
- characters '*' and '?'; if the parameter recurse
- is true, then sub-directories are also searched for matching file
- names.
- Returns the total number of files found.
- May throw an exception derived from std::runtime_error
- if file I/O fails.
-
-
-
-
-
- std::string Merge(const
- std::string& in, const std::string& fmt, bool
- copy = true , unsigned int flags =
- match_default);
- Performs a search and
- replace operation: searches through the string in
- for all occurrences of the current expression, and for each
- occurrence replaces the match with the format string fmt.
- Uses flags to determine what gets matched, and how
- the format string should be treated. If copy is
- true, then all unmatched sections of input are copied
- unchanged to output; if the flag format_first_only
- is set, then only the first occurrence of the pattern found
- is replaced. Returns the new string. See also format string
- syntax, match
- flags and format flags.
-
-
-
-
- std::string Merge(const
- char* in, const char* fmt, bool copy = true ,
- unsigned int flags = match_default);
- Performs a search and
- replace operation: searches through the string in
- for all occurrences of the current expression, and for each
- occurrence replaces the match with the format string fmt.
- Uses flags to determine what gets matched, and how
- the format string should be treated. If copy is
- true, then all unmatched sections of input are copied
- unchanged to output; if the flag format_first_only
- is set, then only the first occurrence of the pattern found
- is replaced. Returns the new string. See also format string
- syntax, match
- flags and format flags.
-
-
-
-
- unsigned Split(std::vector<std::string>&
- v, std::string& s, unsigned flags =
- match_default, unsigned max_count = ~0);
- Splits the input string s and pushes each piece
- onto the vector v. If the expression contains no marked
- sub-expressions, then one string is output for each
- section of the input that does not match the expression.
- If the expression does contain marked sub-expressions,
- then outputs one string for each marked sub-expression
- each time a match occurs. Outputs no more than max_count
- strings. Before returning, deletes from the input
- string s all of the input that has been processed
- (all of the string if max_count was not reached).
- Returns the number of strings pushed onto the vector.
-
-
-
-
- unsigned int
- Position(int i = 0)const ;
- Returns the position of what
- matched sub-expression i. If i = 0 then
- returns the position of the whole match. Returns RegEx::npos
- if the supplied index is invalid, or if the specified sub-expression
- did not participate in the match.
-
-
-
-
- unsigned int
- Length(int i = 0)const ;
- Returns the length of what
- matched sub-expression i. If i = 0 then
- returns the length of the whole match. Returns RegEx::npos
- if the supplied index is invalid, or if the specified sub-expression
- did not participate in the match.
-
-
-
-
- bool Matched(int i
- = 0)const ;
- Returns true if sub-expression i was
- matched, false otherwise.
-
-
-
-
- unsigned int
- Line()const ;
- Returns the line on which
- the match occurred (line numbers start from 1, not zero); if no
- match occurred, then returns RegEx::npos.
-
-
-
-
- unsigned int Marks()
- const;
- Returns the number of marked
- sub-expressions contained in the expression. Note that
- this includes the whole match (sub-expression zero), so
- the value returned is always >= 1.
-
-
-
-
- std::string What(int
- i)const ;
- Returns a copy of what
- matched sub-expression i. If i = 0 then
- returns a copy of the whole match. Returns an empty string
- if the index is invalid or if the specified sub-expression
- did not participate in a match.
-
-
-
-
- std::string operator [](int
- i)const ;
- Returns What(i). Can
- be used to simplify access to sub-expression matches, and
- make usage more perl-like.
-
-
-
-
-
-
-
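-Under the same assumption (C++11 std::regex as a stand-in, with hypothetical helper names), Grep, Merge, Split and Line can be approximated with std::sregex_iterator, std::regex_replace and std::sregex_token_iterator. This is a sketch of the documented behaviour, not regex++'s implementation.
```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Like Grep(std::vector<std::string>&, ...): a copy of every match.
std::vector<std::string> grep_strings(const std::string& s, const std::regex& e)
{
    std::vector<std::string> v;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it)
        v.push_back(it->str());
    return v;
}

// Like Grep(std::vector<unsigned int>&, ...): the start index of every match.
std::vector<std::size_t> grep_positions(const std::string& s, const std::regex& e)
{
    std::vector<std::size_t> v;
    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it)
        v.push_back(static_cast<std::size_t>(it->position()));
    return v;
}

// Like Merge(in, fmt, copy, flags): copy == false maps onto format_no_copy,
// and first_only maps onto format_first_only.
std::string merge_all(const std::string& in, const std::regex& e,
                      const std::string& fmt, bool copy = true,
                      bool first_only = false)
{
    std::regex_constants::match_flag_type flags = std::regex_constants::format_default;
    if (!copy)
        flags |= std::regex_constants::format_no_copy;
    if (first_only)
        flags |= std::regex_constants::format_first_only;
    return std::regex_replace(in, e, fmt, flags);
}

// Like Split() for an expression with no marked sub-expressions: the pieces
// of input between matches. Unlike Split(), s is not modified and there is
// no max_count limit.
std::vector<std::string> split_on(const std::string& s, const std::regex& e)
{
    std::vector<std::string> v;
    for (std::sregex_token_iterator it(s.begin(), s.end(), e, -1), end;
         it != end; ++it)
        v.push_back(it->str());
    return v;
}

// Like Line(): 1-based line number of the first match; 0 stands in for
// RegEx::npos when nothing matches.
std::size_t line_of_first_match(const std::string& s, const std::regex& e)
{
    std::smatch m;
    if (!std::regex_search(s, m, e))
        return 0;
    std::size_t newlines = 0;
    for (auto it = s.cbegin(); it != m[0].first; ++it)
        if (*it == '\n')
            ++newlines;
    return newlines + 1;
}
```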
-Copyright Dr
-John Maddock 1998-2000 all rights reserved.
-
-
diff --git a/index.htm b/index.htm
deleted file mode 100644
index f313dd7c..00000000
--- a/index.htm
+++ /dev/null
@@ -1,150 +0,0 @@
-
-
-
-
-
-
-
-regex++, Index
-
-
-
-
-
-
-
-
-
-
- Regex++, Index.
- (Version 3.31, 16th Dec 2001)
-
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
-Contents
-
-
-
-
-
-Copyright Dr
-John Maddock 1998-2001 all rights reserved.
-
-
diff --git a/index.html b/index.html
new file mode 100644
index 00000000..a1f01b7b
--- /dev/null
+++ b/index.html
@@ -0,0 +1,9 @@
+
+
+
+
+
+ Automatic redirection failed, please go to doc/index.html .
+
+
+
diff --git a/introduction.htm b/introduction.htm
deleted file mode 100644
index bcac99bb..00000000
--- a/introduction.htm
+++ /dev/null
@@ -1,476 +0,0 @@
-
-
-
-
-
-
-
-regex++, Introduction
-
-
-
-
-
-
-
-
-
-
- Regex++, Introduction.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
-Introduction
-
-Regular expressions are a form of pattern matching that is
-often used in text processing; many users will be familiar with
-the Unix utilities grep, sed and awk, and
-the programming language perl, each of which makes
-extensive use of regular expressions. Traditionally C++ users
-have been limited to the POSIX C APIs for manipulating regular
-expressions, and while regex++ does provide these APIs, they do
-not represent the best way to use the library. For example, regex++
-can cope with wide character strings, or search and replace
-operations (in a manner analogous to either sed or perl),
-something that traditional C libraries cannot do.
-
-The class boost::reg_expression
-is the key class in this library; it represents a "machine
-readable" regular expression, and is very closely modelled
-on std::basic_string, think of it as a string plus the actual
-state-machine required by the regular expression algorithms. Like
-std::basic_string there are two typedefs that are almost always
-the means by which this class is referenced:
-
-namespace boost{
-
-template <class charT,
- class traits = regex_traits<charT>,
- class Allocator = std::allocator<charT> >
-class reg_expression;
-
-typedef reg_expression<char > regex;
-typedef reg_expression<wchar_t> wregex;
-
-}
-
-To see how this library can be used, imagine that we are
-writing a credit card processing application. Credit card numbers
-generally come as a string of 16 digits, separated into groups of
-4 digits, with the groups separated by either a space or a hyphen. Before
-storing a credit card number in a database (not necessarily
-something your customers will appreciate!), we may want to verify
-that the number is in the correct format. To match any digit we
-could use the regular expression [0-9], however ranges of
-characters like this are actually locale dependent. Instead we
-should use the POSIX standard form [[:digit:]], or the regex++
-and perl shorthand for this \d (note that many older libraries
-tended to be hard-coded to the C-locale, consequently this was
-not an issue for them). That leaves us with the following regular
-expression to validate credit card number formats:
-
-(\d{4}[- ]){3}\d{4}
-
-Here the parentheses act to group (and mark for future
-reference) sub-expressions, and the {4} means "repeat
-exactly 4 times". This is an example of the extended regular
-expression syntax used by perl, awk and egrep. Regex++ also
-supports the older "basic" syntax used by sed and grep,
-but this is generally less useful, unless you already have some
-basic regular expressions that you need to reuse.
-
-Now let's take that expression and place it in some C++ code to
-validate the format of a credit card number:
-
-#include <string>
-#include <boost/regex.hpp>
-
-bool validate_card_format(const std::string& s)
-{
- static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
- return regex_match(s, e);
-}
-
-Note how we had to add some extra escapes to the expression:
-remember that the escape is seen once by the C++ compiler, before
-it gets to be seen by the regular expression engine, consequently
-escapes in regular expressions have to be doubled up when
-embedding them in C/C++ code. Also note that all the examples
-assume that your compiler supports Koenig lookup; if yours
-doesn't (for example VC6), then you will have to add some boost::
-prefixes to some of the function calls in the examples.
-
-Those of you who are familiar with credit card processing
-will have realised that, while the format used above is suitable
-for human readable card numbers, it does not represent the format
-required by online credit card systems; these require the number
-as a string of 16 (or possibly 15) digits, without any
-intervening spaces. What we need is a means to convert easily
-between the two formats, and this is where search and replace
-comes in. Those who are familiar with the utilities sed
-and perl will already be ahead here; we need two strings:
-one a regular expression, the other a "format string" that provides a
-description of the text to replace the match with. In regex++
-this search and replace operation is performed with the algorithm
-regex_merge; for our credit card example we can write two
-functions like this to provide the format conversions:
-
-
-#include <string>
-#include <boost/regex.hpp>
-
-// match any format with the regular expression:
-const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
-const std::string machine_format("\\1\\2\\3\\4");
-const std::string human_format("\\1-\\2-\\3-\\4");
-
-std::string machine_readable_card_number(const std::string& s)
-{
- return regex_merge(s, e, machine_format, boost::match_default | boost::format_sed);
-}
-
-std::string human_readable_card_number(const std::string& s)
-{
- return regex_merge(s, e, human_format, boost::match_default | boost::format_sed);
-}
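-For readers without regex++, here is a sketch of the same two conversions using C++11 std::regex_replace, the standard library's counterpart of regex_merge (the function names are hypothetical). std::regex's default ECMAScript grammar has no \A or \z. Without the multiline flag, ^ and $ anchor at the start and end of the whole input, so they are used here instead.
```cpp
#include <regex>
#include <string>

// ^ and $ stand in for \A and \z, which the ECMAScript grammar lacks.
const std::regex card_number("^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

std::string machine_readable(const std::string& s)
{
    return std::regex_replace(s, card_number, "\\1\\2\\3\\4",
                              std::regex_constants::format_sed);
}

std::string human_readable(const std::string& s)
{
    return std::regex_replace(s, card_number, "\\1-\\2-\\3-\\4",
                              std::regex_constants::format_sed);
}
```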
-
-Here we've used marked sub-expressions in the regular
-expression to split out the four parts of the card number as
-separate fields, the format string then uses the sed-like syntax
-to replace the matched text with the reformatted version.
-
-In the examples above, we haven't directly manipulated the
-results of a regular expression match, however in general the
-result of a match contains a number of sub-expression matches in
-addition to the overall match. When the library needs to report a
-regular expression match it does so using an instance of the
-class match_results;
-as before, there are typedefs of this class for the most common
-cases:
-
-namespace boost{
-typedef match_results<const char *> cmatch;
-typedef match_results<const wchar_t *> wcmatch;
-typedef match_results<std::string::const_iterator> smatch;
-typedef match_results<std::wstring::const_iterator> wsmatch;
-}
-
-The algorithms regex_search
-and regex_grep (which finds all matches in a string)
-make use of match_results to
-report what matched.
-
-Note that these algorithms are not restricted to searching
-regular C-strings, any bidirectional iterator type can be
-searched, allowing for the possibility of seamlessly searching
-almost any kind of data.
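-To illustrate searching a non-string sequence, here is a sketch using C++11 std::regex_search over a std::list<char>, whose iterators are bidirectional but not random access. std::regex stands in for regex++ here, and first_number is a hypothetical name.
```cpp
#include <list>
#include <regex>
#include <string>

// Returns the first run of digits in `data`, or an empty string if none.
std::string first_number(const std::list<char>& data)
{
    using iter = std::list<char>::const_iterator;
    static const std::regex e("\\d+");
    std::match_results<iter> m;  // match_results over list iterators
    if (std::regex_search(data.begin(), data.end(), m, e))
        return m.str();
    return std::string();
}
```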
-
-For search and replace operations in addition to the algorithm
-regex_merge that
-we have already seen, the algorithm regex_format takes
-the result of a match and a format string, and produces a new
-string by merging the two.
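-In C++11 the counterpart of regex_format is the member function match_results::format. A minimal sketch, with swap_key_value as a hypothetical name:
```cpp
#include <regex>
#include <string>

// Finds "key=value" and merges the match with the format "$2=$1".
std::string swap_key_value(const std::string& s)
{
    static const std::regex e("(\\w+)=(\\w+)");
    std::smatch m;
    if (!std::regex_search(s, m, e))
        return std::string();
    return m.format("$2=$1");
}
```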
-
-For those that dislike templates, there is a high level
-wrapper class RegEx that is an encapsulation of the lower level
-template code - it provides a simplified interface for those that
-don't need the full power of the library, and supports only
-narrow characters, and the "extended" regular
-expression syntax.
-
-The POSIX API functions
-regcomp, regexec, regfree and regerror are available in both
-narrow character and Unicode versions, and are provided for those
-who need compatibility with these APIs.
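-A minimal sketch of the POSIX calling sequence: compile, execute, free. It assumes a POSIX system and uses the system's own <regex.h>, not the regex++ versions of these functions; posix_matches is a hypothetical name.
```cpp
#include <regex.h>

// Returns true if `text` matches the POSIX extended expression `pattern`.
bool posix_matches(const char* pattern, const char* text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                    // compilation failed; nothing to free
    const int rc = regexec(&re, text, 0, nullptr, 0);
    regfree(&re);                        // release the compiled expression
    return rc == 0;
}
```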
-
-Finally, note that the library now has run-time localization support, and
-recognizes the full POSIX regular expression syntax - including
-advanced features like multi-character collating elements and
-equivalence classes - as well as providing compatibility with
-other regular expression libraries including GNU and BSD4 regex
-packages, and to a more limited extent perl 5.
-
-Installation and Configuration
-Options
-
-[ Important: If you are
-upgrading from the 2.x version of this library, then you will find
-a number of changes to the documented header names and library
-interfaces; existing code should still compile unchanged, however
-(see Note
-for Upgraders). ]
-
-When you extract the library from its zip file, you must
-preserve its internal directory structure (for example by using
-the -d option when extracting). If you didn't do that when
-extracting, then you'd better stop reading this, delete the files
-you just extracted, and try again!
-
-This library should not need configuring before use; most
-popular compilers/standard libraries/platforms are already
-supported "as is". If you do experience configuration
-problems, or just want to test the configuration with your
-compiler, then the process is the same as for all of boost; see
-the configuration library documentation.
-
-All of the library's code is placed inside namespace boost.
-
-Unlike some other template libraries, this library consists of
-a mixture of template code (in the headers) and static code and
-data (in cpp files). Consequently, it is necessary to build the
-library's support code into a library or archive file before you
-can use it. Instructions for specific platforms are as follows:
-
-Borland C++ Builder:
-
-
- Open up a console window and change to the
- <boost>\libs\regex\build directory.
- Select the appropriate makefile (bcb4.mak for C++ Builder
- 4, bcb5.mak for C++ Builder 5, and bcb6.mak for C++
- Builder 6).
- Invoke the makefile (pass the full path to your version
- of make if you have more than one version installed; the
- makefile relies on the path to make to obtain your C++
- Builder installation directory and tools), for example:
-
-
-make -fbcb5.mak
-
-The build process will build a variety of .lib and .dll files
-(the exact number depends upon the version of Borland's tools you
-are using). The .lib and .dll files will be in a sub-directory whose
-name (for example bcb4 or bcb5) depends upon the makefile used. To
-install the libraries into your development system use:
-
-make -fbcb5.mak install
-
-The library files will be copied to <BCROOT>/lib and the
-DLLs to <BCROOT>/bin, where <BCROOT> corresponds to
-the install path of your Borland C++ tools.
-
-You may also remove temporary files created during the build
-process (excluding lib and dll files) by using:
-
-make -fbcb5.mak clean
-
-Finally, when you use regex++ it is only necessary for you to
-add the <boost> root directory to your list of include
-directories for that project. It is not necessary for you to
-manually add a .lib file to the project; the headers will
-automatically select the correct .lib file for your build mode
-and tell the linker to include it. There is one caveat, however:
-the library cannot tell the difference between VCL and non-VCL
-enabled builds when building a GUI application from the command
-line. If you build from the command line with the 5.5 command
-line tools, then you must define the pre-processor symbol _NO_VCL
-to ensure that the correct link libraries are selected (the C++
-Builder IDE normally sets this automatically). Hint: users
-of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg
-in order to set this option permanently.
-
-If you would prefer to link statically to the regex libraries
-even when using the DLL runtime, then define
-BOOST_REGEX_STATIC_LINK; if you want to suppress automatic
-linking altogether (and supply your own custom build of the lib),
-then define BOOST_REGEX_NO_LIB.
-
-If you are building with C++ Builder 6, you will find that
-<boost/regex.hpp> cannot be used in a pre-compiled header
-(the actual problem is in <locale>, which gets included by
-<boost/regex.hpp>). If this causes problems for you, then
-try defining BOOST_NO_STD_LOCALE when building; this will disable
-some features throughout boost, but may save you a lot in compile
-times!
-
-Microsoft Visual C++ 6 and 7
-
-You need version 6 or later of MSVC to build this library. If you are
-using VC5, then you may want to look at one of the previous
-releases of this library.
-
-
-Open up a command prompt, which has the necessary MSVC
-environment variables defined (for example by using the batch
-file Vcvars32.bat installed by the Visual Studio installation),
-and change to the <boost>\libs\regex\build directory.
-
-Select the correct makefile - vc6.mak for "vanilla"
-Visual C++ 6 or vc6-stlport.mak if you are using STLPort.
-
-Invoke the makefile like this:
-
-nmake -fvc6.mak
-
-You will now have a collection of lib and dll files in a
-"vc6" subdirectory, to install these into your
-development system use:
-
-nmake -fvc6.mak install
-
-The lib files will be copied to your <VC6>\lib directory
-and the dll files to <VC6>\bin, where <VC6> is the
-root of your Visual C++ 6 installation.
-
-You can delete all the temporary files created during the
-build (excluding lib and dll files) using:
-
-nmake -fvc6.mak clean
-
-Finally when you use regex++ it is only necessary for you to
-add the <boost> root directory to your list of include
-directories for that project. It is not necessary for you to
-manually add a .lib file to the project; the headers will
-automatically select the correct .lib file for your build mode
-and tell the linker to include it.
-
-Note that if you want to statically link to the regex library
-when using the dynamic C++ runtime, define
-BOOST_REGEX_STATIC_LINK when building your project (this only has
-an effect for release builds). If you want to add the source
-directly to your project then define BOOST_REGEX_NO_LIB to
-disable automatic library selection.
-
-Important: there have been some
-reports of compiler-optimisation bugs affecting this library (particularly
-with VC6 versions prior to service pack 5). The workaround is to
-build the library using /Oityb1 rather than /O2, that is, to use
-all optimisation settings except /Oa. This problem is reported to
-affect some standard library code as well (in fact I'm not sure
-if the problem is with the regex code or the underlying standard
-library), so it's probably worthwhile applying this workaround in
-normal practice in any case.
-
-Note: if you have replaced the C++ standard library that comes
-with VC6, then when you build the library you must ensure that
-the environment variables "INCLUDE" and "LIB"
-have been updated to reflect the include and library paths for
-the new library - see vcvars32.bat (part of your Visual Studio
-installation) for more details. Alternatively if STLPort is in c:/stlport
-then you could use:
-
-nmake INCLUDES="-Ic:/stlport/stlport" XLFLAGS="/LIBPATH:c:/stlport/lib" -fvc6-stlport.mak
-
-If you are building with the full STLPort v4.x, then use the
-vc6-stlport.mak file provided and set the environment variable
-STLPORT_PATH to point to the location of your STLport
-installation (Note that the full STLPort libraries appear not to
-support single-thread static builds).
-
-
-
-GCC (2.95)
-
-There is a conservative makefile for the g++ compiler. From
-the command prompt change to the <boost>/libs/regex/build
-directory and type:
-
-make -fgcc.mak
-
-At the end of the build process you should have a gcc sub-directory
-containing release and debug versions of the library (libboost_regex.a
-and libboost_regex_debug.a). When you build projects that use
-regex++, you will need to add the boost install directory to your
-list of include paths and add <boost>/libs/regex/build/gcc/libboost_regex.a
-to your list of library files.
-
-There is also a makefile to build the library as a shared
-library:
-
-make -fgcc-shared.mak
-
-which will build libboost_regex.so and libboost_regex_debug.so.
-
-Both of these makefiles support the following environment
-variables:
-
-CXXFLAGS: extra compiler options - note that this applies to
-both the debug and release builds.
-
-INCLUDES: additional include directories.
-
-LDFLAGS: additional linker options.
-
-LIBS: additional library files.
-
-For the more adventurous, there is a configure script in
-<boost>/libs/config; see the config
-library documentation.
-
-Sun Workshop 6.1
-
-There is a makefile for the Sun (6.1) compiler (C++ version 3.12).
-From the command prompt change to the <boost>/libs/regex/build
-directory and type:
-
-dmake -f sunpro.mak
-
-At the end of the build process you should have a sunpro sub-directory
-containing single and multithread versions of the library (libboost_regex.a,
-libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so).
-When you build projects that use regex++, you will need to add
-the boost install directory to your list of include paths and add
-<boost>/libs/regex/build/sunpro/ to your library search
-path.
-
-This makefile supports the following environment
-variables:
-
-CXXFLAGS: extra compiler options - note that this applies to
-both the single and multithreaded builds.
-
-INCLUDES: additional include directories.
-
-LDFLAGS: additional linker options.
-
-LIBS: additional library files.
-
-LIBSUFFIX: a suffix to mangle the library name with (defaults
-to nothing).
-
-This makefile does not set any architecture-specific options
-such as -xarch=v9; you can set these by defining the appropriate
-macros, for example:
-
-dmake CXXFLAGS="-xarch=v9" LDFLAGS="-xarch=v9" LIBSUFFIX="_v9" -f sunpro.mak
-
-will build v9 variants of the regex library named
-libboost_regex_v9.a etc.
-
-Other compilers:
-
-There is a generic makefile (generic.mak)
-provided in <boost-root>/libs/regex/build; see that
-makefile for details of the environment variables that need to be set
-before use. Alternatively, you can use the Jam-based build system.
-If you need to configure the library for your platform, then
-refer to the config library
-documentation.
-
-
-
-Copyright Dr
-John Maddock 1998-2001 all rights reserved.
-
-
diff --git a/performance/Jamfile b/performance/Jamfile
new file mode 100644
index 00000000..d3a58ee6
--- /dev/null
+++ b/performance/Jamfile
@@ -0,0 +1,43 @@
+
+subproject libs/regex/performance ;
+
+SOURCES = command_line main time_boost time_greta time_localised_boost time_pcre time_posix time_safe_greta ;
+
+if $(HS_REGEX_PATH)
+{
+ HS_SOURCES = $(HS_REGEX_PATH)/regcomp.c $(HS_REGEX_PATH)/regerror.c $(HS_REGEX_PATH)/regexec.c $(HS_REGEX_PATH)/regfree.c ;
+ POSIX_OPTS = BOOST_HAS_POSIX=1 $(HS_REGEX_PATH) ;
+}
+else if $(USE_POSIX)
+{
+ POSIX_OPTS = BOOST_HAS_POSIX=1 ;
+}
+
+if $(PCRE_PATH)
+{
+ PCRE_SOURCES = $(PCRE_PATH)/chartables.c $(PCRE_PATH)/get.c $(PCRE_PATH)/pcre.c $(PCRE_PATH)/study.c ;
+ PCRE_OPTS = BOOST_HAS_PCRE=1 $(PCRE_PATH) ;
+}
+else if $(USE_PCRE)
+{
+ PCRE_OPTS = BOOST_HAS_PCRE=1 pcre ;
+}
+
+
+exe regex_comparison :
+ $(SOURCES).cpp
+ $(HS_SOURCES)
+ $(PCRE_SOURCES)
+ ../build/boost_regex
+ ../../test/build/boost_prg_exec_monitor
+ :
+ $(BOOST_ROOT)
+ BOOST_REGEX_NO_LIB=1
+ BOOST_REGEX_STATIC_LINK=1
+ $(POSIX_OPTS)
+ $(PCRE_OPTS)
+ ;
+
+
+
+
diff --git a/performance/command_line.cpp b/performance/command_line.cpp
new file mode 100644
index 00000000..b74143c3
--- /dev/null
+++ b/performance/command_line.cpp
@@ -0,0 +1,470 @@
+
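+#include <iostream>
+#include <iomanip>
+#include <fstream>
+#include <sstream>
+#include <string>
+#include <list>
+#include <stdexcept>
+#include <iterator>
+#include <algorithm>
+#include <boost/config.hpp>
+#include <boost/version.hpp>
+#include <boost/regex.hpp>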
+#include "regex_comparison.hpp"
+
+#ifdef BOOST_HAS_PCRE
+#include "pcre.h" // for pcre version number
+#endif
+
+//
+// globals:
+//
+bool time_boost = false;
+bool time_localised_boost = false;
+bool time_greta = false;
+bool time_safe_greta = false;
+bool time_posix = false;
+bool time_pcre = false;
+
+bool test_matches = false;
+bool test_code = false;
+bool test_html = false;
+bool test_short_twain = false;
+bool test_long_twain = false;
+
+
+std::string html_template_file;
+std::string html_out_file;
+std::string html_contents;
+std::list<results> result_list;
+
+// the following let us compute averages:
+double greta_total = 0;
+double safe_greta_total = 0;
+double boost_total = 0;
+double locale_boost_total = 0;
+double posix_total = 0;
+double pcre_total = 0;
+unsigned greta_test_count = 0;
+unsigned safe_greta_test_count = 0;
+unsigned boost_test_count = 0;
+unsigned locale_boost_test_count = 0;
+unsigned posix_test_count = 0;
+unsigned pcre_test_count = 0;
+
+int handle_argument(const std::string& what)
+{
+ if(what == "-b")
+ time_boost = true;
+ else if(what == "-bl")
+ time_localised_boost = true;
+#ifdef BOOST_HAS_GRETA
+ else if(what == "-g")
+ time_greta = true;
+ else if(what == "-gs")
+ time_safe_greta = true;
+#endif
+#ifdef BOOST_HAS_POSIX
+ else if(what == "-posix")
+ time_posix = true;
+#endif
+#ifdef BOOST_HAS_PCRE
+ else if(what == "-pcre")
+ time_pcre = true;
+#endif
+ else if(what == "-all")
+ {
+ time_boost = true;
+ time_localised_boost = true;
+#ifdef BOOST_HAS_GRETA
+ time_greta = true;
+ time_safe_greta = true;
+#endif
+#ifdef BOOST_HAS_POSIX
+ time_posix = true;
+#endif
+#ifdef BOOST_HAS_PCRE
+ time_pcre = true;
+#endif
+ }
+ else if(what == "-test-matches")
+ test_matches = true;
+ else if(what == "-test-code")
+ test_code = true;
+ else if(what == "-test-html")
+ test_html = true;
+ else if(what == "-test-short-twain")
+ test_short_twain = true;
+ else if(what == "-test-long-twain")
+ test_long_twain = true;
+ else if(what == "-test-all")
+ {
+ test_matches = true;
+ test_code = true;
+ test_html = true;
+ test_short_twain = true;
+ test_long_twain = true;
+ }
+ else if((what == "-h") || (what == "--help"))
+ return show_usage();
+ else if((what[0] == '-') || (what[0] == '/'))
+ {
+ std::cerr << "Unknown argument: \"" << what << "\"" << std::endl;
+ return 1;
+ }
+ else if(html_template_file.size() == 0)
+ {
+ html_template_file = what;
+ load_file(html_contents, what.c_str());
+ }
+ else if(html_out_file.size() == 0)
+ html_out_file = what;
+ else
+ {
+ std::cerr << "Unexpected argument: \"" << what << "\"" << std::endl;
+ return 1;
+ }
+ return 0;
+}
+
+int show_usage()
+{
+ std::cout <<
+ "Usage\n"
+ "regex_comparison [-h] [library options] [test options] [html_template html_output_file]\n"
+ " -h Show help\n\n"
+ " library options:\n"
+ " -b Apply tests to boost library\n"
+ " -bl Apply tests to boost library with C++ locale\n"
+#ifdef BOOST_HAS_GRETA
+ " -g Apply tests to GRETA library\n"
+ " -gs Apply tests to GRETA library (in non-recursive mode)\n"
+#endif
+#ifdef BOOST_HAS_POSIX
+ " -posix Apply tests to POSIX library\n"
+#endif
+#ifdef BOOST_HAS_PCRE
+ " -pcre Apply tests to PCRE library\n"
+#endif
+ " -all Apply tests to all libraries\n\n"
+ " test options:\n"
+ " -test-matches Test short matches\n"
+ " -test-code Test c++ code examples\n"
+ " -test-html Test HTML document searches\n"
+ " -test-short-twain Test short searches\n"
+ " -test-long-twain Test long searches\n"
+ " -test-all Test everything\n";
+ return 1;
+}
+
+void load_file(std::string& text, const char* file)
+{
+ std::ifstream is(file);
+ if(!is.good())
+ {
+ std::string msg("Unable to open file: \"");
+ msg.append(file);
+ msg.append("\"");
+ throw std::runtime_error(msg);
+ }
+ is.seekg(0, std::ios_base::end);
+ std::istream::pos_type pos = is.tellg();
+ is.seekg(0, std::ios_base::beg);
+ text.erase();
+ text.reserve(pos);
+ std::istreambuf_iterator<char> it(is);
+ std::copy(it, std::istreambuf_iterator<char>(), std::back_inserter(text));
+}
+
+void print_result(std::ostream& os, double time, double best)
+{
+ static const char* suffixes[] = {"s", "ms", "us", "ns", "ps", };
+
+ if(time < 0)
+ {
+ os << "NA ";
+ return;
+ }
+ double rel = time / best;
+ bool highlight = ((rel > 0) && (rel < 1.1));
+ unsigned suffix = 0;
+ while((time < 1) && (suffix < 4))
+ {
+ time *= 1000;
+ ++suffix;
+ }
+ os << "<td>";
+ if(highlight)
+ os << "<strong>";
+ if(rel <= 1000)
+ os << std::setprecision(3) << rel;
+ else
+ os << (int)rel;
+ os << " (";
+ if(time <= 1000)
+ os << std::setprecision(3) << time;
+ else
+ os << (int)time;
+ os << suffixes[suffix] << ")";
+ if(highlight)
+ os << "</strong>";
+ os << "</td>";
+}
+
+std::string html_quote(const std::string& in)
+{
+ static const boost::regex e("(<)|(>)|(&)|(\")");
+ static const std::string format("(?1&lt;)(?2&gt;)(?3&amp;)(?4&quot;)");
+ return regex_replace(in, e, format, boost::match_default | boost::format_all);
+}
+
+void output_html_results(bool show_description, const std::string& tagname)
+{
+ std::stringstream os;
+ if(result_list.size())
+ {
+ //
+ // start by outputting the table header:
+ //
+ os << "<table>\n";
+ os << "<tr><th>Expression</th>";
+ if(show_description)
+ os << "<th>Text</th>";
+#if defined(BOOST_HAS_GRETA)
+ if(time_greta == true)
+ os << "<th>GRETA</th>";
+ if(time_safe_greta == true)
+ os << "<th>GRETA (non-recursive mode)</th>";
+#endif
+ if(time_boost == true)
+ os << "<th>Boost</th>";
+ if(time_localised_boost == true)
+ os << "<th>Boost + C++ locale</th>";
+#if defined(BOOST_HAS_POSIX)
+ if(time_posix == true)
+ os << "<th>POSIX</th>";
+#endif
+#ifdef BOOST_HAS_PCRE
+ if(time_pcre == true)
+ os << "<th>PCRE</th>";
+#endif
+ os << "</tr>\n";
+
+ //
+ // Now enumerate through all the test results:
+ //
+ std::list<results>::const_iterator first, last;
+ first = result_list.begin();
+ last = result_list.end();
+ while(first != last)
+ {
+ os << "<tr><td><code>" << html_quote(first->expression) << "</code></td>";
+ if(show_description)
+ os << "<td>" << html_quote(first->description) << "</td>";
+#if defined(BOOST_HAS_GRETA)
+ if(time_greta == true)
+ {
+ print_result(os, first->greta_time, first->factor);
+ if(first->greta_time > 0)
+ {
+ greta_total += first->greta_time / first->factor;
+ ++greta_test_count;
+ }
+ }
+ if(time_safe_greta == true)
+ {
+ print_result(os, first->safe_greta_time, first->factor);
+ if(first->safe_greta_time > 0)
+ {
+ safe_greta_total += first->safe_greta_time / first->factor;
+ ++safe_greta_test_count;
+ }
+ }
+#endif
+ if(time_boost == true)
+ {
+ print_result(os, first->boost_time, first->factor);
+ if(first->boost_time > 0)
+ {
+ boost_total += first->boost_time / first->factor;
+ ++boost_test_count;
+ }
+ }
+ if(time_localised_boost == true)
+ {
+ print_result(os, first->localised_boost_time, first->factor);
+ if(first->localised_boost_time > 0)
+ {
+ locale_boost_total += first->localised_boost_time / first->factor;
+ ++locale_boost_test_count;
+ }
+ }
+#if defined(BOOST_HAS_POSIX)
+ if(time_posix == true)
+ {
+ print_result(os, first->posix_time, first->factor);
+ if(first->posix_time > 0)
+ {
+ posix_total += first->posix_time / first->factor;
+ ++posix_test_count;
+ }
+ }
+#endif
+#if defined(BOOST_HAS_PCRE)
+ if(time_pcre == true)
+ {
+ print_result(os, first->pcre_time, first->factor);
+ if(first->pcre_time > 0)
+ {
+ pcre_total += first->pcre_time / first->factor;
+ ++pcre_test_count;
+ }
+ }
+#endif
+ os << "</tr>\n";
+ ++first;
+ }
+ os << "</table>\n";
+ result_list.clear();
+ }
+ else
+ {
+ os << "<p>Results not available...</p>\n";
+ }
+
+ std::string result = os.str();
+
+ std::string::size_type pos = html_contents.find(tagname);
+ if(pos != std::string::npos)
+ {
+ html_contents.replace(pos, tagname.size(), result);
+ }
+}
+
+std::string get_boost_version()
+{
+ std::stringstream os;
+ os << (BOOST_VERSION / 100000) << '.' << ((BOOST_VERSION / 100) % 1000) << '.' << (BOOST_VERSION % 100);
+ return os.str();
+}
+
+std::string get_averages_table()
+{
+ std::stringstream os;
+ //
+ // start by outputting the table header:
+ //
+ os << "<table>\n";
+ os << "<tr>";
+#if defined(BOOST_HAS_GRETA)
+ if(time_greta == true)
+ {
+ os << "<th>GRETA</th>";
+ }
+ if(time_safe_greta == true)
+ {
+ os << "<th>GRETA (non-recursive mode)</th>";
+ }
+
+#endif
+ if(time_boost == true)
+ {
+ os << "<th>Boost</th>";
+ }
+ if(time_localised_boost == true)
+ {
+ os << "<th>Boost + C++ locale</th>";
+ }
+#if defined(BOOST_HAS_POSIX)
+ if(time_posix == true)
+ {
+ os << "<th>POSIX</th>";
+ }
+#endif
+#ifdef BOOST_HAS_PCRE
+ if(time_pcre == true)
+ {
+ os << "<th>PCRE</th>";
+ }
+#endif
+ os << "</tr>\n";
+
+ //
+ // Now enumerate through all averages:
+ //
+ os << "<tr>";
+#if defined(BOOST_HAS_GRETA)
+ if(time_greta == true)
+ os << "<td>" << (greta_total / greta_test_count) << "</td>\n";
+ if(time_safe_greta == true)
+ os << "<td>" << (safe_greta_total / safe_greta_test_count) << "</td>\n";
+#endif
+ if(time_boost == true)
+ os << "<td>" << (boost_total / boost_test_count) << "</td>\n";
+ if(time_localised_boost == true)
+ os << "<td>" << (locale_boost_total / locale_boost_test_count) << "</td>\n";
+#if defined(BOOST_HAS_POSIX)
+ if(time_posix == true)
+ os << "<td>" << (posix_total / posix_test_count) << "</td>\n";
+#endif
+#if defined(BOOST_HAS_PCRE)
+ if(time_pcre == true)
+ os << "<td>" << (pcre_total / pcre_test_count) << "</td>\n";
+#endif
+ os << "</tr>\n";
+ os << "</table>\n";
+ return os.str();
+}
+
+void output_final_html()
+{
+ if(html_out_file.size())
+ {
+ //
+ // start with search and replace ops:
+ //
+ std::string::size_type pos;
+ pos = html_contents.find("%compiler%");
+ if(pos != std::string::npos)
+ {
+ html_contents.replace(pos, 10, BOOST_COMPILER);
+ }
+ pos = html_contents.find("%library%");
+ if(pos != std::string::npos)
+ {
+ html_contents.replace(pos, 9, BOOST_STDLIB);
+ }
+ pos = html_contents.find("%os%");
+ if(pos != std::string::npos)
+ {
+ html_contents.replace(pos, 4, BOOST_PLATFORM);
+ }
+ pos = html_contents.find("%boost%");
+ if(pos != std::string::npos)
+ {
+ html_contents.replace(pos, 7, get_boost_version());
+ }
+ pos = html_contents.find("%pcre%");
+ if(pos != std::string::npos)
+ {
+#ifdef PCRE_MINOR
+ html_contents.replace(pos, 6, BOOST_STRINGIZE(PCRE_MAJOR.PCRE_MINOR));
+#else
+ html_contents.replace(pos, 6, "N/A");
+#endif
+ }
+ pos = html_contents.find("%averages%");
+ if(pos != std::string::npos)
+ {
+ html_contents.replace(pos, 10, get_averages_table());
+ }
+ //
+ // now write the output to file:
+ //
+ std::ofstream os(html_out_file.c_str());
+ os << html_contents;
+ }
+ else
+ {
+ std::cout << html_contents;
+ }
+}
\ No newline at end of file
diff --git a/performance/input.html b/performance/input.html
new file mode 100644
index 00000000..85ca5dba
--- /dev/null
+++ b/performance/input.html
@@ -0,0 +1,70 @@
+
+
+ Regular Expression Performance Comparison
+
+
+
+
+
+
+ Regular Expression Performance Comparison
+
+ The following tables provide comparisons between the following regular
+ expression libraries:
+ GRETA.
+ The Boost regex library.
+ Henry Spencer's regular expression library
+ - this is provided for comparison as a typical non-backtracking implementation.
+ Philip Hazel's PCRE library.
+ Details
+ Machine: Intel Pentium 4 2.8GHz PC.
+ Compiler: %compiler%.
+ C++ Standard Library: %library%.
+ OS: %os%.
+ Boost version: %boost%.
+ PCRE version: %pcre%.
+
+ As ever, care should be taken in interpreting the results: only sensible regular
+ expressions (rather than pathological cases) are given, and most are taken from the
+ Boost regex examples or from the Library of
+ Regular Expressions. In addition, some variation in the relative
+ performance of these libraries can be expected on other machines, as memory
+ access and processor caching effects can be quite large for most finite state
+ machine algorithms.
+ Averages
+ The following are the average relative scores for all the tests: the perfect
+ regular expression library would score 1, in practice anything less than 2
+ is pretty good.
+ %averages%
+ Comparison 1: Long Search
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within a long English language text was measured
+ (mtent12.txt
+ from Project Gutenberg , 19Mb).
+ %long_twain_search%
+ Comparison 2: Medium Sized Search
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within a medium sized English language text was
+ measured (the first 50K from mtent12.txt).
+ %short_twain_search%
+ Comparison 3: C++ Code Search
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within the C++ source file
+ boost/crc.hpp was measured.
+ %code_search%
+
+ Comparison 4: HTML Document Search
+
+ For each of the following regular expressions the time taken to find all
+ occurrences of the expression within the html file libs/libraries.htm
+ was measured.
+ %html_search%
+ Comparison 5: Simple Matches
+
+ For each of the following regular expressions the time taken to match against
+ the text indicated was measured.
+ %short_matches%
+
+ Copyright John Maddock April 2003, all rights reserved.
+
+
diff --git a/performance/main.cpp b/performance/main.cpp
new file mode 100644
index 00000000..96ecbaf8
--- /dev/null
+++ b/performance/main.cpp
@@ -0,0 +1,251 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include <iostream>
+#include <string>
+#include "regex_comparison.hpp"
+
+
+void test_match(const std::string& re, const std::string& text, const std::string& description, bool icase)
+{
+ double time;
+ results r(re, description);
+
+ std::cout << "Testing: \"" << re << "\" against \"" << description << "\"" << std::endl;
+
+#ifdef BOOST_HAS_GRETA
+ if(time_greta == true)
+ {
+ time = g::time_match(re, text, icase);
+ r.greta_time = time;
+ std::cout << "\tGRETA regex: " << time << "s\n";
+ }
+ if(time_safe_greta == true)
+ {
+ time = gs::time_match(re, text, icase);
+ r.safe_greta_time = time;
+ std::cout << "\tSafe GRETA regex: " << time << "s\n";
+ }
+#endif
+ if(time_boost == true)
+ {
+ time = b::time_match(re, text, icase);
+ r.boost_time = time;
+ std::cout << "\tBoost regex: " << time << "s\n";
+ }
+ if(time_localised_boost == true)
+ {
+ time = bl::time_match(re, text, icase);
+ r.localised_boost_time = time;
+ std::cout << "\tBoost regex (C++ locale): " << time << "s\n";
+ }
+#ifdef BOOST_HAS_POSIX
+ if(time_posix == true)
+ {
+ time = posix::time_match(re, text, icase);
+ r.posix_time = time;
+ std::cout << "\tPOSIX regex: " << time << "s\n";
+ }
+#endif
+#ifdef BOOST_HAS_PCRE
+ if(time_pcre == true)
+ {
+ time = pcr::time_match(re, text, icase);
+ r.pcre_time = time;
+ std::cout << "\tPCRE regex: " << time << "s\n";
+ }
+#endif
+ r.finalise();
+ result_list.push_back(r);
+}
+
+void test_find_all(const std::string& re, const std::string& text, const std::string& description, bool icase)
+{
+ std::cout << "Testing: " << re << std::endl;
+
+ double time;
+ results r(re, description);
+
+#ifdef BOOST_HAS_GRETA
+ if(time_greta == true)
+ {
+ time = g::time_find_all(re, text, icase);
+ r.greta_time = time;
+ std::cout << "\tGRETA regex: " << time << "s\n";
+ }
+ if(time_safe_greta == true)
+ {
+ time = gs::time_find_all(re, text, icase);
+ r.safe_greta_time = time;
+ std::cout << "\tSafe GRETA regex: " << time << "s\n";
+ }
+#endif
+ if(time_boost == true)
+ {
+ time = b::time_find_all(re, text, icase);
+ r.boost_time = time;
+ std::cout << "\tBoost regex: " << time << "s\n";
+ }
+ if(time_localised_boost == true)
+ {
+ time = bl::time_find_all(re, text, icase);
+ r.localised_boost_time = time;
+ std::cout << "\tBoost regex (C++ locale): " << time << "s\n";
+ }
+#ifdef BOOST_HAS_POSIX
+ if(time_posix == true)
+ {
+ time = posix::time_find_all(re, text, icase);
+ r.posix_time = time;
+ std::cout << "\tPOSIX regex: " << time << "s\n";
+ }
+#endif
+#ifdef BOOST_HAS_PCRE
+ if(time_pcre == true)
+ {
+ time = pcr::time_find_all(re, text, icase);
+ r.pcre_time = time;
+ std::cout << "\tPCRE regex: " << time << "s\n";
+ }
+#endif
+ r.finalise();
+ result_list.push_back(r);
+}
+
+int cpp_main(int argc, char * argv[])
+{
+ // start by processing the command line args:
+ if(argc < 2)
+ return show_usage();
+ int result = 0;
+ for(int c = 1; c < argc; ++c)
+ {
+ result += handle_argument(argv[c]);
+ }
+ if(result)
+ return result;
+
+ if(test_matches)
+ {
+ // start with a simple test, this is basically a measure of the minimal overhead
+ // involved in calling a regex matcher:
+ test_match("abc", "abc");
+ // these are from the regex docs:
+ test_match("^([0-9]+)(\\-| |$)(.*)$", "100- this is a line of ftp response which contains a message string");
+ test_match("([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4}", "1234-5678-1234-456");
+ // these are from http://www.regxlib.com/
+ test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "john_maddock@compuserve.com");
+ test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "foo12@foo.edu");
+ test_match("^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$", "bob.smith@foo.tv");
+ test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "EH10 2QQ");
+ test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "G1 1AA");
+ test_match("^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$", "SW1 1ZZ");
+ test_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "4/1/2001");
+ test_match("^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$", "12/12/2001");
+ test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "123");
+ test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "+3.14159");
+ test_match("^[-+]?[[:digit:]]*\\.?[[:digit:]]*$", "-3.14159");
+ }
+ output_html_results(true, "%short_matches%");
+
+ std::string file_contents;
+
+ if(test_code)
+ {
+ load_file(file_contents, "../../../boost/crc.hpp");
+
+ const char* highlight_expression = // preprocessor directives: index 1
+ "(^[ \t]*#(?:[^\\\\\\n]|\\\\[^\\n_[:punct:][:alnum:]]*[\\n[:punct:][:word:]])*)|"
+ // comment: index 2
+ "(//[^\\n]*|/\\*.*?\\*/)|"
+ // literals: index 3
+ "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
+ // string literals: index 4
+ "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
+ // keywords: index 5
+ "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
+ "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
+ "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
+ "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
+ "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
+ "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
+ "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
+ "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
+ "|using|virtual|void|volatile|wchar_t|while)\\>"
+ ;
+
+ const char* class_expression = "^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
+ "(class|struct)[[:space:]]*(\\<\\w+\\>([ \t]*\\([^)]*\\))?"
+ "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?"
+ "(\\{|:[^;\\{()]*\\{)";
+
+ const char* include_expression = "^[ \t]*#[ \t]*include[ \t]+(\"[^\"]+\"|<[^>]+>)";
+ const char* boost_include_expression = "^[ \t]*#[ \t]*include[ \t]+(\"boost/[^\"]+\"|<boost/[^>]+>)";
+
+
+ test_find_all(class_expression, file_contents);
+ test_find_all(highlight_expression, file_contents);
+ test_find_all(include_expression, file_contents);
+ test_find_all(boost_include_expression, file_contents);
+ }
+ output_html_results(false, "%code_search%");
+
+ if(test_html)
+ {
+ load_file(file_contents, "../../../libs/libraries.htm");
+ test_find_all("beman|john|dave", file_contents, true);
+ test_find_all("<p>.*?</p>", file_contents, true);
+ test_find_all("<a[^>]+href=(\"[^\"]*\"|[^[:space:]]+)[^>]*>", file_contents, true);
+ test_find_all("<h[12345678][^>]*>.*?</h[12345678]>", file_contents, true);
+ test_find_all("<img[^>]+src=(\"[^\"]*\"|[^[:space:]]+)[^>]*>", file_contents, true);
+ test_find_all("<font[^>]+face=(\"[^\"]*\"|[^[:space:]]+)[^>]*>.*?</font>", file_contents, true);
+ }
+ output_html_results(false, "%html_search%");
+
+ if(test_short_twain)
+ {
+ load_file(file_contents, "short_twain.txt");
+
+ test_find_all("Twain", file_contents);
+ test_find_all("Huck[[:alpha:]]+", file_contents);
+ test_find_all("[[:alpha:]]+ing", file_contents);
+ test_find_all("^[^\n]*?Twain", file_contents);
+ test_find_all("Tom|Sawyer|Huckleberry|Finn", file_contents);
+ test_find_all("(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)", file_contents);
+ }
+ output_html_results(false, "%short_twain_search%");
+
+ if(test_long_twain)
+ {
+ load_file(file_contents, "mtent13.txt");
+
+ test_find_all("Twain", file_contents);
+ test_find_all("Huck[[:alpha:]]+", file_contents);
+ test_find_all("[[:alpha:]]+ing", file_contents);
+ test_find_all("^[^\n]*?Twain", file_contents);
+ test_find_all("Tom|Sawyer|Huckleberry|Finn", file_contents);
+ time_posix = false;
+ test_find_all("(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn)", file_contents);
+ time_posix = true;
+ }
+ output_html_results(false, "%long_twain_search%");
+
+ output_final_html();
+ return 0;
+}
+
diff --git a/performance/regex_comparison.hpp b/performance/regex_comparison.hpp
new file mode 100644
index 00000000..0a695e3b
--- /dev/null
+++ b/performance/regex_comparison.hpp
@@ -0,0 +1,136 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * All rights reserved.
+ * May not be transferred or disclosed to a third party without
+ * prior consent of the author.
+ *
+ */
+
+
+#ifndef REGEX_COMPARISON_HPP
+#define REGEX_COMPARISON_HPP
+
+#include <string>
+#include <list>
+#include <limits>
+
+//
+// globals:
+//
+extern bool time_boost;
+extern bool time_localised_boost;
+extern bool time_greta;
+extern bool time_safe_greta;
+extern bool time_posix;
+extern bool time_pcre;
+
+extern bool test_matches;
+extern bool test_short_twain;
+extern bool test_long_twain;
+extern bool test_code;
+extern bool test_html;
+
+extern std::string html_template_file;
+extern std::string html_out_file;
+extern std::string html_contents;
+
+
+int handle_argument(const std::string& what);
+int show_usage();
+void load_file(std::string& text, const char* file);
+void output_html_results(bool show_description, const std::string& tagname);
+void output_final_html();
+
+
+struct results
+{
+ double boost_time;
+ double localised_boost_time;
+ double greta_time;
+ double safe_greta_time;
+ double posix_time;
+ double pcre_time;
+ double factor;
+ std::string expression;
+ std::string description;
+ results(const std::string& ex, const std::string& desc)
+ : boost_time(-1),
+ localised_boost_time(-1),
+ greta_time(-1),
+ safe_greta_time(-1),
+ posix_time(-1),
+ pcre_time(-1),
+ factor(std::numeric_limits<double>::max()),
+ expression(ex),
+ description(desc)
+ {}
+ void finalise()
+ {
+ if((boost_time >= 0) && (boost_time < factor))
+ factor = boost_time;
+ if((localised_boost_time >= 0) && (localised_boost_time < factor))
+ factor = localised_boost_time;
+ if((greta_time >= 0) && (greta_time < factor))
+ factor = greta_time;
+ if((safe_greta_time >= 0) && (safe_greta_time < factor))
+ factor = safe_greta_time;
+ if((posix_time >= 0) && (posix_time < factor))
+ factor = posix_time;
+ if((pcre_time >= 0) && (pcre_time < factor))
+ factor = pcre_time;
+ }
+};
+
+extern std::list<results> result_list;
+
+
+namespace b {
+// boost tests:
+double time_match(const std::string& re, const std::string& text, bool icase);
+double time_find_all(const std::string& re, const std::string& text, bool icase);
+
+}
+namespace bl {
+// localised boost tests:
+double time_match(const std::string& re, const std::string& text, bool icase);
+double time_find_all(const std::string& re, const std::string& text, bool icase);
+
+}
+namespace pcr {
+// pcre tests:
+double time_match(const std::string& re, const std::string& text, bool icase);
+double time_find_all(const std::string& re, const std::string& text, bool icase);
+
+}
+namespace g {
+// greta tests:
+double time_match(const std::string& re, const std::string& text, bool icase);
+double time_find_all(const std::string& re, const std::string& text, bool icase);
+
+}
+namespace gs {
+// safe greta tests:
+double time_match(const std::string& re, const std::string& text, bool icase);
+double time_find_all(const std::string& re, const std::string& text, bool icase);
+
+}
+namespace posix {
+// posix tests:
+double time_match(const std::string& re, const std::string& text, bool icase);
+double time_find_all(const std::string& re, const std::string& text, bool icase);
+
+}
+void test_match(const std::string& re, const std::string& text, const std::string& description, bool icase = false);
+void test_find_all(const std::string& re, const std::string& text, const std::string& description, bool icase = false);
+inline void test_match(const std::string& re, const std::string& text, bool icase = false)
+{ test_match(re, text, text, icase); }
+inline void test_find_all(const std::string& re, const std::string& text, bool icase = false)
+{ test_find_all(re, text, "", icase); }
+
+
+#define REPEAT_COUNT 10
+
+#endif
diff --git a/performance/time_boost.cpp b/performance/time_boost.cpp
new file mode 100644
index 00000000..9dc3e791
--- /dev/null
+++ b/performance/time_boost.cpp
@@ -0,0 +1,98 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include "regex_comparison.hpp"
+#include <boost/regex.hpp>
+#include <boost/timer.hpp>
+
+namespace b{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ boost::regex e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
+ boost::smatch what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_match(text, what, e);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_match(text, what, e);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+bool dummy_grep_proc(const boost::smatch&)
+{ return true; }
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ boost::regex e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
+ boost::smatch what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_grep(&dummy_grep_proc, text, e);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ if(result >10)
+ return result / iter;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_grep(&dummy_grep_proc, text, e);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+}
diff --git a/performance/time_greta.cpp b/performance/time_greta.cpp
new file mode 100644
index 00000000..f6e4b309
--- /dev/null
+++ b/performance/time_greta.cpp
@@ -0,0 +1,125 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include "regex_comparison.hpp"
+#if defined(BOOST_HAS_GRETA)
+#include <cassert>
+#include <boost/timer.hpp>
+#include "regexpr2.h"
+
+namespace g{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE));
+ regex::match_results what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ assert(e.match(text, what));
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text, what);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text, what);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE));
+ regex::match_results what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text.begin(), text.end(), what);
+ while(what.backref(0).matched)
+ {
+ e.match(what.backref(0).end(), text.end(), what);
+ }
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ if(result > 10)
+ return result / iter;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text.begin(), text.end(), what);
+ while(what.backref(0).matched)
+ {
+ e.match(what.backref(0).end(), text.end(), what);
+ }
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+}
+
+#else
+
+namespace g {
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+
+}
+
+#endif
+
diff --git a/performance/time_localised_boost.cpp b/performance/time_localised_boost.cpp
new file mode 100644
index 00000000..d1aeac89
--- /dev/null
+++ b/performance/time_localised_boost.cpp
@@ -0,0 +1,98 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include "regex_comparison.hpp"
+#include <boost/regex.hpp>
+#include <boost/timer.hpp>
+
+namespace bl{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ boost::reg_expression > e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
+ boost::smatch what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_match(text, what, e);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_match(text, what, e);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+bool dummy_grep_proc(const boost::smatch&)
+{ return true; }
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ boost::reg_expression > e(re, (icase ? boost::regex::perl | boost::regex::icase : boost::regex::perl));
+ boost::smatch what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_grep(&dummy_grep_proc, text, e);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ if(result >10)
+ return result / iter;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ boost::regex_grep(&dummy_grep_proc, text, e);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+}
diff --git a/performance/time_pcre.cpp b/performance/time_pcre.cpp
new file mode 100644
index 00000000..5956b521
--- /dev/null
+++ b/performance/time_pcre.cpp
@@ -0,0 +1,180 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include <cstdlib>
+#include <cfloat>
+#include "regex_comparison.hpp"
+#ifdef BOOST_HAS_PCRE
+#include "pcre.h"
+#include <boost/timer.hpp>
+
+namespace pcr{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ pcre *ppcre;
+ const char *error;
+ int erroffset;
+
+ int what[50];
+
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+
+ if(0 == (ppcre = pcre_compile(re.c_str(), (icase ? PCRE_CASELESS | PCRE_ANCHORED | PCRE_DOTALL | PCRE_MULTILINE : PCRE_ANCHORED | PCRE_DOTALL | PCRE_MULTILINE),
+ &error, &erroffset, NULL)))
+ {
+ free(ppcre);
+ return -1;
+ }
+
+ pcre_extra *pe;
+ pe = pcre_study(ppcre, 0, &error);
+ if(error)
+ {
+ free(ppcre);
+ free(pe);
+ return -1;
+ }
+
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ erroffset = pcre_exec(ppcre, pe, text.c_str(), text.size(), 0, 0, what, sizeof(what)/sizeof(int));
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ free(ppcre);
+ free(pe);
+ return result / iter;
+}
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ pcre *ppcre;
+ const char *error;
+ int erroffset;
+
+ int what[50];
+
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ int exec_result;
+ int matches;
+
+ if(0 == (ppcre = pcre_compile(re.c_str(), (icase ? PCRE_CASELESS | PCRE_DOTALL | PCRE_MULTILINE : PCRE_DOTALL | PCRE_MULTILINE), &error, &erroffset, NULL)))
+ {
+ free(ppcre);
+ return -1;
+ }
+
+ pcre_extra *pe;
+ pe = pcre_study(ppcre, 0, &error);
+ if(error)
+ {
+ free(ppcre);
+ free(pe);
+ return -1;
+ }
+
+ do
+ {
+ int startoff;
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ matches = 0;
+ startoff = 0;
+ exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
+ while(exec_result >= 0)
+ {
+ ++matches;
+ startoff = what[1];
+ exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
+ }
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ if(result >10)
+ return result / iter;
+
+ result = DBL_MAX;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ int startoff;
+ matches = 0;
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ matches = 0;
+ startoff = 0;
+ exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
+ while(exec_result >= 0)
+ {
+ ++matches;
+ startoff = what[1];
+ exec_result = pcre_exec(ppcre, pe, text.c_str(), text.size(), startoff, 0, what, sizeof(what)/sizeof(int));
+ }
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+}
+#else
+
+namespace pcr{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+
+}
+
+#endif
\ No newline at end of file
diff --git a/performance/time_posix.cpp b/performance/time_posix.cpp
new file mode 100644
index 00000000..cd2cec68
--- /dev/null
+++ b/performance/time_posix.cpp
@@ -0,0 +1,143 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include <cstring>
+#include <cfloat>
+#include "regex_comparison.hpp"
+#ifdef BOOST_HAS_POSIX
+#include <boost/timer.hpp>
+#include "regex.h"
+
+namespace posix{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ regex_t e;
+ regmatch_t what[20];
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ if(0 != regcomp(&e, re.c_str(), (icase ? REG_ICASE | REG_EXTENDED : REG_EXTENDED)))
+ return -1;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ regexec(&e, text.c_str(), (e.re_nsub < 20 ? e.re_nsub + 1 : 20), what, 0);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ regexec(&e, text.c_str(), (e.re_nsub < 20 ? e.re_nsub + 1 : 20), what, 0);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ regfree(&e);
+ return result / iter;
+}
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ regex_t e;
+ regmatch_t what[20];
+ memset(what, 0, sizeof(what));
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ int exec_result;
+ int matches;
+ if(0 != regcomp(&e, re.c_str(), (icase ? REG_ICASE | REG_EXTENDED : REG_EXTENDED)))
+ return -1;
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ what[0].rm_so = 0;
+ what[0].rm_eo = text.size();
+ matches = 0;
+ exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
+ while(exec_result == 0)
+ {
+ ++matches;
+ what[0].rm_so = what[0].rm_eo;
+ what[0].rm_eo = text.size();
+ exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
+ }
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ if(result >10)
+ return result / iter;
+
+ result = DBL_MAX;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ what[0].rm_so = 0;
+ what[0].rm_eo = text.size();
+ matches = 0;
+ exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
+ while(exec_result == 0)
+ {
+ ++matches;
+ what[0].rm_so = what[0].rm_eo;
+ what[0].rm_eo = text.size();
+ exec_result = regexec(&e, text.c_str(), 20, what, REG_STARTEND);
+ }
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+}
+#else
+
+namespace posix{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+
+}
+#endif
\ No newline at end of file
diff --git a/performance/time_safe_greta.cpp b/performance/time_safe_greta.cpp
new file mode 100644
index 00000000..6c600bda
--- /dev/null
+++ b/performance/time_safe_greta.cpp
@@ -0,0 +1,127 @@
+/*
+ *
+ * Copyright (c) 2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+#include "regex_comparison.hpp"
+#if defined(BOOST_HAS_GRETA)
+
+#include <cassert>
+#include <boost/timer.hpp>
+#include "regexpr2.h"
+
+namespace gs{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE), regex::MODE_SAFE);
+ regex::match_results what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ assert(e.match(text, what));
+ do
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text, what);
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text, what);
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ regex::rpattern e(re, (icase ? regex::MULTILINE | regex::NORMALIZE | regex::NOCASE : regex::MULTILINE | regex::NORMALIZE), regex::MODE_SAFE);
+ regex::match_results what;
+ boost::timer tim;
+ int iter = 1;
+ int counter, repeats;
+ double result = 0;
+ double run;
+ do
+ {
+ bool r;
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text.begin(), text.end(), what);
+ while(what.backref(0).matched)
+ {
+ e.match(what.backref(0).end(), text.end(), what);
+ }
+ }
+ result = tim.elapsed();
+ iter *= 2;
+ }while(result < 0.5);
+ iter /= 2;
+
+ if(result > 10)
+ return result / iter;
+
+ // repeat test and report least value for consistency:
+ for(repeats = 0; repeats < REPEAT_COUNT; ++repeats)
+ {
+ tim.restart();
+ for(counter = 0; counter < iter; ++counter)
+ {
+ e.match(text.begin(), text.end(), what);
+ while(what.backref(0).matched)
+ {
+ e.match(what.backref(0).end(), text.end(), what);
+ }
+ }
+ run = tim.elapsed();
+ result = std::min(run, result);
+ }
+ return result / iter;
+}
+
+}
+
+#else
+
+namespace gs{
+
+double time_match(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+
+double time_find_all(const std::string& re, const std::string& text, bool icase)
+{
+ return -1;
+}
+
+}
+
+#endif
+
diff --git a/posix_ref.htm b/posix_ref.htm
deleted file mode 100644
index ffe2e677..00000000
--- a/posix_ref.htm
+++ /dev/null
@@ -1,314 +0,0 @@
-
-
-
-
-
-
-Regex++, POSIX API Reference
-
-
-
-
-
-
-
-
-
-
- Regex++, POSIX API
- Reference.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
-POSIX compatibility library
-
-#include <boost/cregex.hpp>
-or:
-#include <boost/regex.h>
-
-The following functions are available for users who need a
-POSIX-compatible C library. They are available in both Unicode
-and narrow-character versions; the standard POSIX API names are
-macros that expand to one version or the other, depending on
-whether UNICODE is defined.
-
-Important: all the symbols defined here are enclosed in
-namespace boost when used in C++ programs, unless you use
-#include <boost/regex.h> instead; in that case the symbols are
-still defined in namespace boost, but are also made available
-in the global namespace.
-
-The functions are defined as:
-
-extern "C" {
-int regcompA(regex_tA*, const char *, int );
-unsigned int regerrorA(int , const regex_tA*, char *, unsigned int );
-int regexecA(const regex_tA*, const char *, unsigned int , regmatch_t*, int );
-void regfreeA(regex_tA*);
-
-int regcompW(regex_tW*, const wchar_t *, int );
-unsigned int regerrorW(int , const regex_tW*, wchar_t *, unsigned int );
-int regexecW(const regex_tW*, const wchar_t *, unsigned int , regmatch_t*, int );
-void regfreeW(regex_tW*);
-
-#ifdef UNICODE
-#define regcomp regcompW
-#define regerror regerrorW
-#define regexec regexecW
-#define regfree regfreeW
-#define regex_t regex_tW
-#else
-#define regcomp regcompA
-#define regerror regerrorA
-#define regexec regexecA
-#define regfree regfreeA
-#define regex_t regex_tA
-#endif
-}
-
-All the functions operate on the structure regex_t, which
-exposes two public members:
-
-unsigned int re_nsub: filled in by regcomp; it indicates
-the number of sub-expressions contained in the
-regular expression.
-
-const TCHAR* re_endp: points to the end of the
-expression to compile when the flag REG_PEND is set.
-
-Footnote: regex_t is actually a #define; it is either
-regex_tA or regex_tW, depending on whether UNICODE is defined.
-Likewise, TCHAR is either char or wchar_t, again depending on
-the macro UNICODE.
-
-regcomp takes a pointer to a regex_t, a pointer
-to the expression to compile and a flags parameter which can be a
-combination of:
-
-
-
-
-
- REG_EXTENDED
- Compiles modern regular
- expressions. Equivalent to regbase::char_classes |
- regbase::intervals | regbase::bk_refs.
-
-
-
-
- REG_BASIC
- Compiles basic (obsolete)
- regular expression syntax. Equivalent to regbase::char_classes
- | regbase::intervals | regbase::limited_ops | regbase::bk_braces
- | regbase::bk_parens | regbase::bk_refs.
-
-
-
-
- REG_NOSPEC
- All characters are ordinary;
- the expression is a literal string.
-
-
-
-
- REG_ICASE
- Compiles for matching that
- ignores character case.
-
-
-
-
- REG_NOSUB
- Has no effect in this
- library.
-
-
-
-
- REG_NEWLINE
- When this flag is set, a dot
- does not match the newline character.
-
-
-
-
- REG_PEND
- When this flag is set, the
- re_endp member of the regex_t structure must point to
- the end of the regular expression to compile.
-
-
-
-
- REG_NOCOLLATE
- When this flag is set,
- locale-dependent collation for character ranges is turned
- off.
-
-
-
-
- REG_ESCAPE_IN_LISTS
- When this flag is set,
- escape sequences are permitted in bracket expressions (character
- sets).
-
-
-
-
- REG_NEWLINE_ALT
- When this flag is set,
- the newline character is equivalent to the alternation
- operator |.
-
-
-
-
- REG_PERL
- A shortcut for perl-like
- behavior: REG_EXTENDED | REG_NOCOLLATE |
- REG_ESCAPE_IN_LISTS
-
-
-
-
- REG_AWK
- A shortcut for awk-like
- behavior: REG_EXTENDED | REG_ESCAPE_IN_LISTS
-
-
-
-
- REG_GREP
- A shortcut for grep-like
- behavior: REG_BASIC | REG_NEWLINE_ALT
-
-
-
-
- REG_EGREP
- A shortcut for egrep-like
- behavior: REG_EXTENDED | REG_NEWLINE_ALT
-
-
-
-
-
-
-
-regerror maps an error code to a human-readable string. It takes
-the following parameters:
-
-
-
-
-
- int code
- The error code.
-
-
-
-
- const regex_t* e
- The regular expression (can
- be null).
-
-
-
-
- char* buf
- The buffer to fill in with
- the error message.
-
-
-
-
- unsigned int buf_size
- The length of buf.
-
-
-
-
-If the error code is OR'ed with REG_ITOA, then the message that
-results is the printable name of the code rather than a message,
-for example "REG_BADPAT". If the code is REG_ATOI, then e
-must not be null and e->re_endp must point to the
-printable name of an error code; the return value is then the
-value of the error code. For any other value of code, the
-return value is the number of characters in the error message; if
-the return value is greater than or equal to buf_size, then
-regerror must be called again with a larger buffer.
-
-regexec finds the first occurrence of expression e
-within string buf. If len is non-zero, then *m
-is filled in with what matched the regular expression: m[0]
-contains what matched the whole expression, m[1] the first sub-expression,
-and so on; see regmatch_t in the header file declaration for
-more details. The eflags parameter can be a combination of:
-
-
-
-
-
-
- REG_NOTBOL
- Parameter buf does
- not represent the start of a line.
-
-
-
-
- REG_NOTEOL
- Parameter buf does
- not terminate at the end of a line.
-
-
-
-
- REG_STARTEND
- The string searched starts
- at buf + pmatch[0].rm_so and ends at buf + pmatch[0].rm_eo.
-
-
-
-
-
-
-
-Finally regfree frees all the memory that was allocated
-by regcomp.
-
-Footnote: this is an abridged reference to the POSIX API
-functions. It is provided for compatibility with other libraries,
-rather than as an API to be used in new code (unless you need access
-from a language other than C++). This version of these functions
-should also coexist with other versions, as the names
-used are macros that expand to the actual function names.
-
-
-
-
-Copyright Dr
-John Maddock 1998-2000 all rights reserved.
-
-
diff --git a/syntax.htm b/syntax.htm
deleted file mode 100644
index 327071e5..00000000
--- a/syntax.htm
+++ /dev/null
@@ -1,742 +0,0 @@
-
-
-
-
-
-
-Regex++, Regular Expression Syntax
-
-
-
-
-
-
-
-
-
-
- Regex++, Regular
- Expression Syntax.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
-Regular expression syntax
-
-This section covers the regular expression syntax used by this
-library. It is a programmer's guide; the actual syntax presented
-to your program's users will depend on the flags used during
-expression compilation.
-
-Literals
-
-All characters are literals except: ".", "|",
-"*", "?", "+", "(",
-")", "{", "}", "[",
-"]", "^", "$" and "\".
-These characters are literals when preceded by a "\". A
-literal is a character that matches itself, or matches the result
-of traits_type::translate(), where traits_type is the traits
-template parameter to class reg_expression.
-
-
-
-Wildcard
-
-The dot character "." matches any single character,
-with two exceptions: when match_not_dot_null is passed to the matching
-algorithms, the dot does not match a null character; when match_not_dot_newline
-is passed to the matching algorithms, the dot does not match
-a newline character.
-
-
-
-Repeats
-
-A repeat is an expression that is repeated an arbitrary number
-of times. An expression followed by "*" can be repeated
-any number of times, including zero. An expression followed by
-"+" can be repeated any number of times, but at least
-once; if the expression is compiled with the flag regbase::bk_plus_qm,
-then "+" is an ordinary character and "\+"
-represents a repeat of once or more. An expression followed by
-"?" may be repeated zero or one times only; if the
-expression is compiled with the flag regbase::bk_plus_qm, then
-"?" is an ordinary character and "\?"
-represents the repeat zero or once operator. When it is necessary
-to specify the minimum and maximum number of repeats explicitly,
-the bounds operator "{}" may be used: thus "a{2}"
-is the letter "a" repeated exactly twice, "a{2,4}"
-represents the letter "a" repeated between 2 and 4
-times, and "a{2,}" represents the letter "a"
-repeated at least twice with no upper limit. Note that there must
-be no white-space inside the {}, and there is no upper limit on
-the values of the lower and upper bounds. When the expression is
-compiled with the flag regbase::bk_braces, then "{" and
-"}" are ordinary characters and "\{" and
-"\}" are used to delimit bounds instead. All repeat
-expressions refer to the shortest possible previous sub-expression:
-a single character, a character set, or a sub-expression grouped
-with "()", for example.
-
-Examples:
-
-"ba*" will match all of "b", "ba",
-"baaa" etc.
-
-"ba+" will match "ba" or "baaaa"
-for example but not "b".
-
-"ba?" will match "b" or "ba".
-
-"ba{2,4}" will match "baa", "baaa"
-and "baaaa".
-
-Non-greedy repeats
-
-Whenever the "extended" regular expression syntax is
-in use (the default), non-greedy repeats are possible by
-appending a '?' after the repeat. A non-greedy repeat
-matches the shortest possible string.
-
-For example to match html tag pairs one could use something
-like:
-
-"<\s*tagname[^>]*>(.*?)<\s*/tagname\s*>"
-
-
-In this case $1 will contain the text between the tag pairs,
-and will be the shortest possible matching string.
-
-
-
-Parentheses
-
-Parentheses serve two purposes: to group items together into a
-sub-expression, and to mark what generated the match. For example,
-the expression "(ab)*" would match all of the string
-"ababab". The matching algorithms regex_match and regex_search each
-take an instance of match_results
-that reports what caused the match. On exit from these functions,
-the match_results
-contains information both on what the whole expression matched
-and on what each sub-expression matched. In the example above,
-match_results[1] would contain a pair of iterators denoting the
-final "ab" of the matching string. It is permissible
-for sub-expressions to match null strings. If a sub-expression
-takes no part in a match (for example, if it is part of an
-alternative that is not taken), then both of the iterators that
-are returned for that sub-expression point to the end of the
-input string, and the matched member for that sub-expression
-is false. Sub-expressions are indexed from left to right
-starting from 1; sub-expression 0 is the whole expression.
-
-Non-Marking Parentheses
-
-Sometimes you need to group sub-expressions with parentheses,
-but don't want the parentheses to create another marked sub-expression;
-in this case a non-marking parenthesis (?:expression) can be used.
-For example, the following expression creates no sub-expressions:
-
-"(?:abc)*"
-
-Forward Lookahead Asserts
-
-There are two forms of these; one for positive forward
-lookahead asserts, and one for negative lookahead asserts:
-
-"(?=abc)" matches the null string only if it is
-followed by the expression "abc".
-
-"(?!abc)" matches the null string only if it is
-not followed by the expression "abc".
-
-Alternatives
-
-Alternatives occur when the expression can match either one
-sub-expression or another. Alternatives are separated by a
-"|", by a "\|" if the flag regbase::bk_vbar
-is set, or by a newline character if the flag regbase::newline_alt
-is set. Each alternative is the largest possible previous sub-expression;
-this is the opposite behaviour from repetition operators.
-
-Examples:
-
-"a(b|c)" could match "ab" or "ac".
-
-
-"abc|def" could match "abc" or "def".
-
-
-
-
-Sets
-
-A set matches any single character that is a member of
-the set. Sets are delimited by
-"[" and "]" and can contain literals,
-character ranges, character classes, collating elements and
-equivalence classes. Set declarations that start with "^"
-contain the complement of the elements that follow.
-
-Examples:
-
-Character literals:
-
-"[abc]" will match any one of "a", "b",
-or "c".
-
-"[^abc]" will match any character other than "a",
-"b", or "c".
-
-Character ranges:
-
-"[a-z]" will match any character in the range "a"
-to "z".
-
-"[^A-Z]" will match any character other than those
-in the range "A" to "Z".
-
-Note that character ranges are highly locale dependent: they
-match any character that collates between the endpoints of the
-range, so ranges will only behave according to ASCII rules when the
-default "C" locale is in effect. For example, if the
-library is compiled with the Win32 localization model, then [a-z]
-will match the ASCII characters a-z, and also 'A', 'B' etc., but
-not 'Z', which collates just after 'z'. This locale-specific
-behaviour can be disabled by specifying regbase::nocollate when
-compiling; this is the default behaviour when using regbase::normal,
-and forces ranges to collate according to ASCII character code.
-Likewise, if you use the POSIX C API functions, then setting
-REG_NOCOLLATE turns off locale-dependent collation.
-
-Character classes are denoted using the syntax "[:classname:]"
-within a set declaration, for example "[[:space:]]" is
-the set of all whitespace characters. Character classes are only
-available if the flag regbase::char_classes is set. The available
-character classes are:
-
-
-
-
-
- alnum
- Any alphanumeric character.
-
-
-
-
- alpha
- Any alphabetical character a-z
- and A-Z. Other characters may also be included depending
- upon the locale.
-
-
-
-
- blank
- Any blank character, either
- a space or a tab.
-
-
-
-
- cntrl
- Any control character.
-
-
-
-
- digit
- Any digit 0-9.
-
-
-
-
- graph
- Any graphical character.
-
-
-
-
- lower
- Any lower case character a-z.
- Other characters may also be included depending upon the
- locale.
-
-
-
-
- print
- Any printable character.
-
-
-
-
- punct
- Any punctuation character.
-
-
-
-
- space
- Any whitespace character.
-
-
-
-
- upper
- Any upper case character A-Z.
- Other characters may also be included depending upon the
- locale.
-
-
-
-
- xdigit
- Any hexadecimal digit
- character, 0-9, a-f and A-F.
-
-
-
-
- word
- Any word character - all
- alphanumeric characters plus the underscore.
-
-
-
-
- unicode
- Any character whose code is
- greater than 255; this applies to the wide character
- traits classes only.
-
-
-
-
-There are some shortcuts that can be used in place of the
-character classes. Provided the flag regbase::escape_in_lists is
-set, you can use:
-
-\w in place of [:word:]
-
-\s in place of [:space:]
-
-\d in place of [:digit:]
-
-\l in place of [:lower:]
-
-\u in place of [:upper:]
-
-
-
-Collating elements take the general form [.tagname.] inside a
-set declaration, where tagname is either a single
-character, or a name of a collating element, for example [[.a.]]
-is equivalent to [a], and [[.comma.]] is equivalent to [,]. The
-library supports all the standard POSIX collating element names,
-and in addition the following digraphs: "ae", "ch",
-"ll", "ss", "nj", "dz",
-"lj", each in lower, upper and title case variations.
-Multi-character collating elements can result in the set matching
-more than one character, for example [[.ae.]] would match two
-characters, but note that [^[.ae.]] would only match one
-character.
-
-
-
-Equivalence classes take the general form [=tagname=] inside a
-set declaration, where tagname is either a single
-character or the name of a collating element, and match any
-character that is a member of the same primary equivalence class
-as the collating element [.tagname.]. An equivalence class is a
-set of characters that collate the same; a primary equivalence
-class is a set of characters whose primary sort keys are all the
-same (for example, strings are typically collated by character,
-then by accent, and then by case; the primary sort key then
-relates to the character, the secondary to the accent, and
-the tertiary to the case). If there is no equivalence class
-corresponding to tagname, then [=tagname=] is exactly the
-same as [.tagname.]. Unfortunately, there is no locale-independent
-method of obtaining the primary sort key for a character, except
-under Win32. For other operating systems the library will "guess"
-the primary sort key from the full sort key (obtained from strxfrm),
-so equivalence classes are probably best considered broken under
-any operating system other than Win32.
-
-
-
-To include a literal "-" in a set declaration, either
-make it the first character after the opening "[" or
-"[^", make it the endpoint of a range or a collating element, or,
-if the flag regbase::escape_in_lists is set, precede it with an
-escape character as in "[\-]". To include a literal
-"[", "]" or "^" in a set,
-make it the endpoint of a range or a collating element, or
-precede it with an escape character if the flag regbase::escape_in_lists
-is set.
-
-
-
-Line anchors
-
-An anchor is something that matches the null string at the
-start or end of a line: "^" matches the null string at
-the start of a line, "$" matches the null string at the
-end of a line.
-
-
-
-Back references
-
-A back reference is a reference to a previous sub-expression
-that has already been matched; the reference is to what the sub-expression
-matched, not to the expression itself. A back reference consists
-of the escape character "\" followed by a digit "1"
-to "9": "\1" refers to the first sub-expression,
-"\2" to the second, etc. For example, the expression
-"(.*)\1" matches any string that is repeated about its
-mid-point, for example "abcabc" or "xyzxyz". A
-back reference to a sub-expression that did not participate in
-any match matches the null string; note that this differs from some
-other regular expression matchers. Back references are only
-available if the expression is compiled with the flag regbase::bk_refs
-set.
-
-
-
-Characters by code
-
-This is an extension to the algorithm that is not available in
-other libraries. It consists of the escape character followed by
-the digit "0" followed by the octal character code. For
-example "\023" represents the character whose octal
-code is 23. Where ambiguity could occur, use parentheses to break
-the expression up: "\0103" represents the character
-whose octal code is 103, and "(\010)3" represents the character
-with octal code 10 followed by "3". To match characters by their
-hexadecimal code, use \x followed by a string of hexadecimal
-digits, optionally enclosed inside {}, for example \xf0 or
-\x{aff}; note that the latter example is a Unicode character.
-
-
-
-Word operators
-
-The following operators are provided for compatibility with
-the GNU regular expression library.
-
-"\w" matches any single character that is a member
-of the "word" character class, this is identical to the
-expression "[[:word:]]".
-
-"\W" matches any single character that is not a
-member of the "word" character class, this is identical
-to the expression "[^[:word:]]".
-
-"\<" matches the null string at the start of a
-word.
-
-"\>" matches the null string at the end of a
-word.
-
-"\b" matches the null string at either the start or
-the end of a word.
-
-"\B" matches a null string within a word.
-
-The start of the sequence passed to the matching algorithms is
-considered to be a potential start of a word unless the flag
-match_not_bow is set. The end of the sequence passed to the
-matching algorithms is considered to be a potential end of a word
-unless the flag match_not_eow is set.
-
-
-
-Buffer operators
-
-The following operators are provided for compatibility with the
-GNU regular expression library and with Perl regular expressions:
-
-"\`" matches the start of a buffer.
-
-"\A" matches the start of the buffer.
-
-"\'" matches the end of a buffer.
-
-"\z" matches the end of a buffer.
-
-"\Z" matches the end of a buffer, or possibly one or
-more new line characters followed by the end of the buffer.
-
-A buffer is considered to consist of the whole sequence passed
-to the matching algorithms, unless the flags match_not_bob or
-match_not_eob are set.
-
-
-
-Escape operator
-
-The escape character "\" has several meanings.
-
-Inside a set declaration the escape character is a normal
-character unless the flag regbase::escape_in_lists is set in
-which case whatever follows the escape is a literal character
-regardless of its normal meaning.
-
-The escape operator may introduce an operator for example:
-back references, or a word operator.
-
-The escape operator may make the following character normal,
-for example "\*" represents a literal "*"
-rather than the repeat operator.
-
-
-
-Single character escape sequences
-
-The following escape sequences are aliases for single
-characters:
-
-
-
-
-
- Escape sequence
- Character code
- Meaning
-
-
-
-
- \a
- 0x07
- Bell character.
-
-
-
-
- \f
- 0x0C
- Form feed.
-
-
-
-
- \n
- 0x0A
- Newline character.
-
-
-
-
- \r
- 0x0D
- Carriage return.
-
-
-
-
- \t
- 0x09
- Tab character.
-
-
-
-
- \v
- 0x0B
- Vertical tab.
-
-
-
-
- \e
- 0x1B
- ASCII Escape character.
-
-
-
-
- \0dd
- 0dd
- An octal character code,
- where dd is one or more octal digits.
-
-
-
-
- \xXX
- 0xXX
- A hexadecimal character
- code, where XX is one or more hexadecimal digits.
-
-
-
-
- \x{XX}
- 0xXX
- A hexadecimal character
- code, where XX is one or more hexadecimal digits,
- possibly a Unicode character.
-
-
-
-
- \cZ
- Z-@
- An ASCII escape sequence
- control-Z, where Z is any ASCII character greater than or
- equal to the character code for '@'.
-
-
-
-
-
-
-
-Miscellaneous escape sequences:
-
-The following are provided mostly for Perl compatibility, but
-note that there are some differences in the meanings of \l, \L, \u
-and \U:
-
-
-
-
-
- \w
- Equivalent to [[:word:]].
-
-
-
-
- \W
- Equivalent to [^[:word:]].
-
-
-
-
- \s
- Equivalent to [[:space:]].
-
-
-
-
- \S
- Equivalent to [^[:space:]].
-
-
-
-
- \d
- Equivalent to [[:digit:]].
-
-
-
-
- \D
- Equivalent to [^[:digit:]].
-
-
-
-
- \l
- Equivalent to [[:lower:]].
-
-
-
-
- \L
- Equivalent to [^[:lower:]].
-
-
-
-
- \u
- Equivalent to [[:upper:]].
-
-
-
-
- \U
- Equivalent to [^[:upper:]].
-
-
-
-
- \C
- Any single character,
- equivalent to '.'.
-
-
-
-
- \X
- Match any Unicode combining
- character sequence, for example "a\x{0301}" (a
- letter a with an acute).
-
-
-
-
- \Q
- The begin quote operator,
- everything that follows is treated as a literal character
- until a \E end quote operator is found.
-
-
-
-
- \E
- The end quote operator,
- terminates a sequence begun with \Q.
-
-
-
-
-
-
-
-What gets matched?
-
-The regular expression library will match the first possible
-matching string, if more than one string starting at a given
-location can match then it matches the longest possible string,
-unless the flag match_any is set, in which case the first match
-encountered is returned. Use of the match_any option can reduce
-the time taken to find the match - but is only useful if the user
-is less concerned about what matched - for example it would not
-be suitable for search and replace operations. In cases where
-their are multiple possible matches all starting at the same
-location, and all of the same length, then the match chosen is
-the one with the longest first sub-expression, if that is the
-same for two or more matches, then the second sub-expression will
-be examined and so on.
-
-
-
-
-Copyright Dr
-John Maddock 1998-2000 all rights reserved.
-
-
diff --git a/template_class_ref.htm b/template_class_ref.htm
deleted file mode 100644
index ccd0d3c9..00000000
--- a/template_class_ref.htm
+++ /dev/null
@@ -1,2479 +0,0 @@
-
-
-
-
-
-
-Regex++, template class and algorithm reference
-
-
-
-
-
-
-
-
-
-
- Regex++ template
- class reference.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
- class regbase
-
-#include <boost/regex.hpp>
-
-
-Class regbase is the template-argument-independent base class
-for reg_expression; its only public members are the flag_type
-enumerated values that determine how regular expressions are
-interpreted.
-
-class regbase
-{
-public :
- enum flag_type_
- {
- escape_in_lists = 1, // '\\' special inside [...]
- char_classes = escape_in_lists << 1, // [[:CLASS:]] allowed
- intervals = char_classes << 1, // {x,y} allowed
- limited_ops = intervals << 1, // all of + ? and | are normal characters
- newline_alt = limited_ops << 1, // \n is the same as |
- bk_plus_qm = newline_alt << 1, // uses \+ and \?
- bk_braces = bk_plus_qm << 1, // uses \{ and \}
- bk_parens = bk_braces << 1, // uses \( and \)
- bk_refs = bk_parens << 1, // \d allowed
- bk_vbar = bk_refs << 1, // uses \|
- use_except = bk_vbar << 1, // exception on error
- failbit = use_except << 1, // error flag
- literal = failbit << 1, // all characters are literals
- icase = literal << 1, // characters are matched regardless of case
- nocollate = icase << 1, // don't use locale specific collation
-
- basic = char_classes | intervals | limited_ops | bk_braces | bk_parens | bk_refs,
- extended = char_classes | intervals | bk_refs,
- normal = escape_in_lists | char_classes | intervals | bk_refs | nocollate,
- emacs = bk_braces | bk_parens | bk_refs | bk_vbar,
- awk = extended | escape_in_lists,
- grep = basic | newline_alt,
- egrep = extended | newline_alt,
- sed = basic,
- perl = normal
- };
- typedef unsigned int flag_type;
-};
-
-
-
-
-
-The enumerated type regbase::flag_type determines the
-syntax rules for regular expression compilation, the various
-flags have the following effects:
-
-
-
-
-
- regbase::escape_in_lists
- Allows the use of the escape
- "\" character in sets of characters, for
- example [\]] represents the set of characters containing
- only "]". If this flag is not set then "\"
- is an ordinary character inside sets.
-
-
-
-
- regbase::char_classes
- When this bit is set,
- character classes [:classname:] are allowed inside
- character set declarations, for example "[[:word:]]"
- represents the set of all characters that belong to the
- character class "word".
-
-
-
-
- regbase::intervals
- When this bit is set,
- repetition intervals are allowed, for example "a{2,4}"
- represents a repeat of between 2 and 4 letter a's.
-
-
-
-
- regbase::limited_ops
- When this bit is set all of
- "+", "?" and "|" are
- ordinary characters in all situations.
-
-
-
-
- regbase::newline_alt
- When this bit is set, then
- the newline character "\n" has the same effect
- as the alternation operator "|".
-
-
-
-
- regbase::bk_plus_qm
- When this bit is set then
- "\+" represents the one or more repetition
- operator and "\?" represents the zero or one
- repetition operator. When this bit is not set then
- "+" and "?" are used instead.
-
-
-
-
- regbase::bk_braces
- When this bit is set then
- "\{" and "\}" are used for bounded
- repetitions and "{" and "}" are
- normal characters. This is the opposite of default
- behavior.
-
-
-
-
- regbase::bk_parens
- When this bit is set then
- "\(" and "\)" are used to group sub-expressions
- and "(" and ")" are ordinary
- characters, this is the opposite of default behaviour.
-
-
-
-
- regbase::bk_refs
- When this bit is set then
- back references are allowed.
-
-
-
-
- regbase::bk_vbar
- When this bit is set then
- "\|" represents the alternation operator and
- "|" is an ordinary character. This is the
- opposite of default behaviour.
-
-
-
-
- regbase::use_except
- When this bit is set then a bad_expression exception will
- be thrown on error. Use of this flag is deprecated
- - reg_expression will always throw on error.
-
-
-
-
- regbase::failbit
- This bit is set on error. If
- regbase::use_except is not set, then this bit should be
- checked to see whether a regular expression is valid before
- it is used.
-
-
-
-
- regbase::literal
- All characters in the string
- are treated as literals, there are no special characters
- or escape sequences.
-
-
-
-
- regbase::icase
- All characters in the string
- are matched regardless of case.
-
-
-
-
- regbase::nocollate
- Locale-specific collation is
- disabled when dealing with ranges in character set
- declarations. For example, when this bit is set the
- expression [a-c] would match the characters a, b and c
- only, regardless of locale, whereas when it is not set,
- [a-c] matches any character which collates in the
- range a to c.
-
-
-
-
- regbase::basic
- Equivalent to the POSIX
- basic regular expression syntax: char_classes | intervals
- | limited_ops | bk_braces | bk_parens | bk_refs.
-
-
-
-
- regbase::extended
- Equivalent to the POSIX
- extended regular expression syntax: char_classes |
- intervals | bk_refs.
-
-
-
-
- regbase::normal
- This is the
- default setting, and represents how most people expect
- the library to behave. Equivalent to the POSIX extended
- syntax, but with locale specific collation disabled, and
- escape characters inside set declarations enabled:
- regbase::escape_in_lists | regbase::char_classes |
- regbase::intervals | regbase::bk_refs | regbase::nocollate.
-
-
-
-
- regbase::emacs
- Provides
- compatibility with the emacs editor; equivalent to:
- bk_braces | bk_parens | bk_refs | bk_vbar.
-
-
-
-
- regbase::awk
- Provides
- compatibility with the Unix utility awk: the same as POSIX
- extended regular expressions, but allows escapes inside
- bracket-expressions (character sets). Equivalent to
- extended | escape_in_lists.
-
-
-
-
- regbase::grep
- Provides
- compatibility with the Unix grep utility: the same as
- POSIX basic regular expressions, but with the newline
- character equivalent to the alternation operator.
- Equivalent to basic | newline_alt.
-
-
-
-
- regbase::egrep
- Provides
- compatibility with the Unix egrep utility: the same as
- POSIX extended regular expressions, but with the newline
- character equivalent to the alternation operator.
- Equivalent to extended | newline_alt.
-
-
-
-
- regbase::sed
- Provides
- compatibility with the Unix sed utility: the same as POSIX
- basic regular expressions.
-
-
-
-
- regbase::perl
- Provides
- compatibility with the Perl programming language: the
- same as regbase::normal.
-
-
-
-
-
-
- Exception classes.
-
-#include <boost/pat_except.hpp>
-
-
-An instance of bad_expression is thrown whenever a bad
-regular expression is encountered.
-
-namespace boost{
-
-class bad_pattern : public std::runtime_error
-{
-public :
- explicit bad_pattern(const std::string& s) : std::runtime_error(s){}
-};
-
-class bad_expression : public bad_pattern
-{
-public :
- bad_expression(const std::string& s) : bad_pattern(s) {}
-};
-
-
-} // namespace boost
-
-Footnote: the class bad_pattern forms the base class
-for all pattern-matching exceptions, of which bad_expression
-is one. The choice of std::runtime_error as the base class
-for bad_pattern is moot: depending upon how the library is
-used, exceptions may be either logic errors (programmer-supplied
-expressions) or run-time errors (user-supplied expressions).
-
-
-
- Class reg_expression
-
-#include <boost/regex.hpp>
-
-
-The template class reg_expression encapsulates regular
-expression parsing and compilation. The class derives from class regbase and takes three template
-parameters:
-
-charT : determines the character type, i.e.
-either char or wchar_t.
-
-traits : determines the behaviour of the
-character type, for example whether character matching is case
-sensitive or not, and which character class names are recognized.
-A default traits class is provided: regex_traits<charT> .
-
-
-Allocator : the allocator class used to allocate
-memory by the class.
-
-For ease of use there are two typedefs that define the two
-standard reg_expression instances; unless you want to use
-custom allocators, you won't need to use anything other than
-these:
-
-namespace boost{
-template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT> >
-class reg_expression;
-typedef reg_expression<char > regex;
-typedef reg_expression<wchar_t> wregex;
-}
-
-The definition of reg_expression follows: it is based
-very closely on class basic_string, and fulfils the requirements
-for a container of charT .
-
-namespace boost{
-template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT> >
-class reg_expression : public regbase
-{
-public :
- // typedefs:
- typedef charT char_type;
- typedef traits traits_type;
- // locale_type
- // placeholder for actual locale type used by the
- // traits class to localise *this.
- typedef typename traits::locale_type locale_type;
- // value_type
- typedef charT value_type;
- // reference, const_reference
- typedef charT& reference;
- typedef const charT& const_reference;
- // iterator, const_iterator
- typedef const charT* const_iterator;
- typedef const_iterator iterator;
- // difference_type
- typedef typename Allocator::difference_type difference_type;
- // size_type
- typedef typename Allocator::size_type size_type;
- // allocator_type
- typedef Allocator allocator_type;
- typedef Allocator alloc_type;
- // flag_type
- typedef boost::int_fast32_t flag_type;
-public :
- // constructors
- explicit reg_expression(const Allocator& a = Allocator());
- explicit reg_expression(const charT* p, flag_type f = regbase::normal, const Allocator& a = Allocator());
- reg_expression(const charT* p1, const charT* p2, flag_type f = regbase::normal, const Allocator& a = Allocator());
- reg_expression(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator());
- reg_expression(const reg_expression&);
- template <class ST, class SA>
- explicit reg_expression(const std::basic_string<charT, ST, SA>& p, flag_type f = regbase::normal, const Allocator& a = Allocator());
- template <class I>
- reg_expression(I first, I last, flag_type f = regbase::normal, const Allocator& a = Allocator());
- ~reg_expression();
- reg_expression& operator =(const reg_expression&);
- reg_expression& operator =(const charT* ptr);
- template <class ST, class SA>
- reg_expression& operator =(const std::basic_string<charT, ST, SA>& p);
- //
- // assign:
- reg_expression& assign(const reg_expression& that);
- reg_expression& assign(const charT* ptr, flag_type f = regbase::normal);
- reg_expression& assign(const charT* first, const charT* last, flag_type f = regbase::normal);
- template <class string_traits, class A>
- reg_expression& assign(
- const std::basic_string<charT, string_traits, A>& s,
- flag_type f = regbase::normal);
- template <class iterator>
- reg_expression& assign(iterator first,
- iterator last,
- flag_type f = regbase::normal);
- //
- // allocator access:
- Allocator get_allocator()const ;
- //
- // locale:
- locale_type imbue(locale_type l);
- locale_type getloc()const ;
- //
- // flags:
- flag_type getflags()const ;
- //
- // str:
- std::basic_string<charT> str()const ;
- //
- // begin, end:
- const_iterator begin()const ;
- const_iterator end()const ;
- //
- // swap:
- void swap(reg_expression&)throw ();
- //
- // size:
- size_type size()const ;
- //
- // max_size:
- size_type max_size()const ;
- //
- // empty:
- bool empty()const ;
- unsigned mark_count()const ;
- bool operator ==(const reg_expression&)const ;
- bool operator <(const reg_expression&)const ;
-};
-} // namespace boost
-
-Class reg_expression has the following public member functions:
-
-
-
-
-
-
- explicit reg_expression(const Allocator& a =
- Allocator());
- Constructs a default
- instance of reg_expression without any expression.
-
-
-
-
- explicit reg_expression(const charT* p, flag_type
- f = regbase::normal, const Allocator& a = Allocator());
- Constructs an instance
- of reg_expression from the expression denoted by the null
- terminated string p, using the flags f to
- determine regular expression syntax. See class regbase for allowable flag values.
-
-
-
-
- reg_expression(const charT* p1,
- const charT* p2, flag_type f = regbase::normal, const Allocator&
- a = Allocator());
- Constructs an instance
- of reg_expression from the expression denoted by the pair of
- pointers p1 and p2, using the flags f
- to determine regular expression syntax. See class regbase for allowable flag values.
-
-
-
-
- reg_expression(const charT* p,
- size_type len, flag_type f, const Allocator& a = Allocator());
- Constructs an instance
- of reg_expression from the expression denoted by the
- string p of length len, using the flags f
- to determine regular expression syntax. See class regbase for allowable flag values.
-
-
-
-
- template <class
- ST, class SA>
- reg_expression(const std::basic_string<charT,
- ST, SA>& p, boost::int_fast32_t f = regbase::normal,
- const Allocator& a = Allocator());
- Constructs an instance
- of reg_expression from the expression denoted by the
- string p, using the flags f to determine
- regular expression syntax. See class regbase
- for allowable flag values. Note - this member may not
- be available depending upon your compiler capabilities.
-
-
-
-
-
- template <class I>
- reg_expression(I first, I last, flag_type f = regbase::normal,
- const Allocator& a = Allocator());
- Constructs an instance
- of reg_expression from the expression denoted by the pair of
- input iterators first and last, using the flags f
- to determine regular expression syntax. See class regbase for allowable flag values.
-
-
-
-
- reg_expression(const
- reg_expression&);
- Copy constructor - copies an
- existing regular expression.
-
-
-
-
- reg_expression& operator =(const
- reg_expression&);
- Copies an existing regular
- expression.
-
-
-
-
- reg_expression& operator =(const
- charT* ptr);
- Equivalent to assign(ptr);
-
-
-
-
- template <class ST, class
- SA> reg_expression& operator=(const std::basic_string<charT,
- ST, SA>& p);
-
- Equivalent to assign(p);
-
-
-
-
- reg_expression& assign(const
- reg_expression& that);
- Copies the regular
- expression contained by that; throws bad_expression if that
- does not contain a valid expression. Returns *this.
-
-
-
-
- reg_expression& assign(const
- charT* p, flag_type f = regbase::normal);
- Compiles a regular
- expression from the expression denoted by the null
- terminated string p, using the flags f to
- determine regular expression syntax. See class regbase for allowable flag values.
- Throws bad_expression if p
- does not contain a valid expression. Returns *this.
-
-
-
-
- reg_expression& assign(const
- charT* first, const charT* last, flag_type f =
- regbase::normal);
- Compiles a regular
- expression from the expression denoted by the range
- [first, last), using the flags f
- to determine regular expression syntax. See class regbase for allowable flag values.
- Throws bad_expression if [first, last)
- does not contain a valid expression. Returns *this.
-
-
-
-
- template <class
- string_traits, class A>
- reg_expression& assign(const std::basic_string<charT,
- string_traits, A>& s, flag_type f = regbase::normal);
- Compiles a regular
- expression from the expression denoted by the string s,
- using the flags f to determine regular expression
- syntax. See class regbase for
- allowable flag values. Throws bad_expression
- if s does not contain a valid expression. Returns
- *this.
-
-
-
-
- template <class
- iterator>
- reg_expression& assign(iterator first, iterator last,
- flag_type f = regbase::normal);
- Compiles a regular
- expression from the expression denoted by the range
- [first, last), using the flags f
- to determine regular expression syntax. See class regbase for allowable flag values.
- Throws bad_expression if [first, last)
- does not contain a valid expression. Returns *this.
-
-
-
-
- Allocator get_allocator()const ;
- Returns the allocator used
- by the expression.
-
-
-
-
- locale_type imbue(const
- locale_type& l);
- Imbues the expression with
- the specified locale, and invalidates the current
- expression. May throw std::runtime_error if the call
- results in an attempt to open a non-existent message
- catalogue.
-
-
-
-
- locale_type getloc()const ;
- Returns the locale used by
- the expression.
-
-
-
-
- flag_type getflags()const ;
- Returns the flags used to
- compile the current expression.
-
-
-
-
- std::basic_string<charT>
- str()const ;
- Returns the current
- expression as a string.
-
-
-
-
- const_iterator begin()const ;
- Returns a pointer to the
- first character of the current expression.
-
-
-
-
- const_iterator end()const ;
- Returns a pointer to the end
- of the current expression.
-
-
-
-
- size_type size()const ;
- Returns the length of the
- current expression.
-
-
-
-
- size_type max_size()const ;
- Returns the maximum length
- of a regular expression text.
-
-
-
-
- bool empty()const ;
- Returns true if the object
- contains no valid expression.
-
-
-
-
- unsigned mark_count()const
- ;
- Returns the number of sub-expressions
- in the compiled regular expression. Note that this
- includes the whole match (subexpression zero), so the
- value returned is always >= 1.
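-The sketch below maps assign() and mark_count() onto C++11 std::regex, a descendant of this library. It illustrates the standard library, not the regex++ API. The helper names try_assign, compile_range and count_subexpressions are hypothetical. Note one off-by-one difference: std::basic_regex::mark_count() excludes sub-expression zero, whereas the regex++ mark_count() described above includes it. In std, match_results::size() is the value that includes it.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` via assign(); report failure instead of
// propagating std::regex_error (std's counterpart of bad_expression).
bool try_assign(std::regex& e, const std::string& pattern)
{
    try {
        e.assign(pattern);   // string overload, default ECMAScript syntax
        return true;
    } catch (const std::regex_error&) {
        return false;        // on failure, e is left unchanged
    }
}

// Compile from a character range [first, last) with explicit syntax flags,
// mirroring assign(first, last, f).
std::regex compile_range(const char* first, const char* last)
{
    std::regex e;
    e.assign(first, last, std::regex::ECMAScript | std::regex::icase);
    return e;
}

// std::regex::mark_count() counts marked sub-expressions only;
// std::smatch::size() also counts sub-expression zero (the whole match).
struct subexpression_counts { std::size_t marks; std::size_t results; };

subexpression_counts count_subexpressions(const std::regex& e, const std::string& text)
{
    std::smatch m;
    std::regex_match(text, m, e);   // if this fails, m.size() is 0
    return { e.mark_count(), m.size() };
}
```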
-
-
-
-
-
-
-Class regex_traits
-
-#include <boost/regex/regex_traits.hpp>
-
-
-This is a preliminary version of the regular expression
-traits class, and is subject to change.
-
-The purpose of the traits class is to make it easier to
-customise the behaviour of reg_expression and the
-associated matching algorithms. Custom traits classes can handle
-special character sets or define additional character classes;
-for example, one could define [[:kanji:]] as the set of all (Unicode)
-kanji characters. This library provides three traits classes, plus
-a wrapper class regex_traits that inherits from one of
-them depending upon the default localisation model in use: class
-c_regex_traits encapsulates the global C locale, class w32_regex_traits
-encapsulates the global Win32 locale (only available on Win32
-systems), and class cpp_regex_traits encapsulates the C++
-locale (only provided if std::locale is supported):
-
-template <class charT> class c_regex_traits;
-template<> class c_regex_traits<char> { /*details*/ };
-template<> class c_regex_traits<wchar_t> { /*details*/ };
-
-template <class charT> class w32_regex_traits;
-template<> class w32_regex_traits<char> { /*details*/ };
-template<> class w32_regex_traits<wchar_t> { /*details*/ };
-
-template <class charT> class cpp_regex_traits;
-template<> class cpp_regex_traits<char> { /*details*/ };
-template<> class cpp_regex_traits<wchar_t> { /*details*/ };
-
-template <class charT> class regex_traits : public base_type { /*details*/ };
-
-Where "base_type" defaults to w32_regex_traits
-on Win32 systems, and c_regex_traits otherwise. The
-default behaviour can be changed by defining one of
-BOOST_REGEX_USE_C_LOCALE (forces use of c_regex_traits by
-default), or BOOST_REGEX_USE_CPP_LOCALE (forces use of cpp_regex_traits
-by default). Alternatively a specific traits class can be passed
-to the reg_expression template.
-
-The requirements for custom traits classes are documented separately.
-
-There is also an example of a custom traits class supplied by Christian Engström;
-see iso8859_1_regex_traits.cpp
-and iso8859_1_regex_traits.hpp,
-and the readme file for more details.
-
-
-
-Class match_results
-
-#include <boost/regex.hpp>
-
-
-Regular expressions are different from many simple pattern-matching
-algorithms in that, as well as finding an overall match, they can
-also produce sub-expression matches, each sub-expression being
-delimited in the pattern by a pair of parentheses (...). There
-has to be some method for reporting sub-expression matches back
-to the user: this is achieved by defining a class match_results
-that acts as an indexed collection of sub-expression matches,
-each sub-expression match being contained in an object of type sub_match.
-
-
-//
-// class sub_match:
-// denotes one sub-expression match.
-//
- template <class iterator>
-struct sub_match
-{
- typedef typename std::iterator_traits<iterator>::value_type value_type;
- typedef typename std::iterator_traits<iterator>::difference_type difference_type;
- typedef iterator iterator_type;
-
- iterator first;
- iterator second;
- bool matched;
-
- operator std::basic_string<value_type>()const ;
-
- bool operator ==(const sub_match& that)const ;
- bool operator !=(const sub_match& that)const ;
- difference_type length()const ;
-};
-
-//
-// class match_results:
-// contains an indexed collection of matched sub-expressions.
-//
- template <class iterator, class Allocator = std::allocator<typename std::iterator_traits<iterator>::value_type > >
-class match_results
-{
-public :
- typedef Allocator alloc_type;
- typedef typename Allocator::template Rebind<iterator>::size_type size_type;
- typedef typename std::iterator_traits<iterator>::value_type char_type;
- typedef sub_match<iterator> value_type;
- typedef typename std::iterator_traits<iterator>::difference_type difference_type;
- typedef iterator iterator_type;
- explicit match_results(const Allocator& a = Allocator());
- match_results(const match_results& m);
- match_results& operator =(const match_results& m);
- ~match_results();
- size_type size()const ;
- const sub_match<iterator>& operator [](int n) const ;
- Allocator allocator()const ;
- difference_type length(int sub = 0)const ;
- difference_type position(unsigned int sub = 0)const ;
- unsigned int line()const ;
- iterator line_start()const ;
- std::basic_string<char_type> str(int sub = 0)const ;
- void swap(match_results& that);
- bool operator ==(const match_results& that)const ;
- bool operator <(const match_results& that)const ;
-};
-typedef match_results<const char *> cmatch;
-typedef match_results<const wchar_t *> wcmatch;
-typedef match_results<std::string::const_iterator> smatch;
-typedef match_results<std::wstring::const_iterator> wsmatch;
-
-Class match_results is used for reporting what matched a
-regular expression. It is passed to the matching algorithms regex_match and regex_search,
-and is used by regex_grep to notify the
-callback function (or function object) what matched. Note that
-the default allocator parameter has been chosen to match the
-default allocator parameter to reg_expression. match_results has
-the following public member functions:
-
-
-
-
-
- match_results(Allocator a =
- Allocator());
- Constructs an instance of
- match_results, using allocator instance a.
-
-
-
-
- match_results(const
- match_results& m);
- Copy constructor.
-
-
-
-
- match_results& operator=(const
- match_results& m);
- Assignment operator.
-
-
-
-
- const
- sub_match<iterator>& operator [](int
- n) const;
- Returns what matched: item 0
- represents the whole match, item 1 the first sub-expression
- and so on.
-
-
-
-
- Allocator allocator()const;
- Returns the allocator used
- by the class.
-
-
-
-
- difference_type length(int
- sub = 0)const;
- Returns the length of the
- matched sub-expression (by default, the length of the
- whole match); in effect this is equivalent to operator[](sub).second
- - operator[](sub).first.
-
-
-
-
- difference_type position(unsigned
- int sub = 0)const;
- Returns the position of the
- matched sub-expression (by default, the position of the
- whole match). The returned value is the position of the
- match relative to the start of the string.
-
-
-
-
- unsigned int
- line()const ;
- Returns the index of the
- line on which the match occurred, indices start with 1,
- not zero. Equivalent to the number of newline characters
- prior to operator[](0).first plus one.
-
-
-
-
- iterator line_start()const;
- Returns an iterator denoting
- the start of the line on which the match occurred.
-
-
-
-
- size_type size()const;
- Returns how many sub-expressions
- are present in the match, including sub-expression zero (the
- whole match). This is the case even if no matches were
- found in the search operation; you must use the return
- value of regex_search / regex_match to determine whether
- any match occurred.
-
-
-
-
-
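-Here is a minimal sketch of length(), position() and str(), assuming C++11 std::smatch, which provides members with the same names. std::match_results has no line() or line_start(). The two helpers below, line_of_match and line_start_of_match, are therefore hypothetical emulations built from the definitions given above.

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Hypothetical emulation of line(): the number of '\n' characters before
// the start of the whole match, plus one.
unsigned line_of_match(const std::string& text, const std::smatch& m)
{
    return 1 + static_cast<unsigned>(std::count(text.begin(), m[0].first, '\n'));
}

// Hypothetical emulation of line_start(): walk back from the start of the
// match to the first character of the line that contains it.
std::string::const_iterator line_start_of_match(const std::string& text, const std::smatch& m)
{
    std::string::const_iterator pos = m[0].first;
    while (pos != text.begin() && *(pos - 1) != '\n')
        --pos;
    return pos;
}
```

In use, m.length(1), m.position(0) and m.str(1) behave as described above for the whole match and its sub-expressions.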
-
-
-The operator[] member function needs further explanation: it
-returns a const reference to a structure of type
-sub_match<iterator>, which has the following public members:
-
-
-
-
-
-
- typedef typename
- std::iterator_traits<iterator>::value_type
- value_type;
- The type pointed to by the
- iterators.
-
-
-
-
- typedef typename
- std::iterator_traits<iterator>::difference_type
- difference_type;
- A type that represents the
- difference between two iterators.
-
-
-
-
- typedef iterator
- iterator_type;
- The iterator type.
-
-
-
-
- iterator first
- An iterator denoting the
- position of the start of the match.
-
-
-
-
- iterator second
- An iterator denoting the
- position of the end of the match.
-
-
-
-
- bool matched
- A Boolean value denoting
- whether this sub-expression participated in the match.
-
-
-
-
- difference_type length()const;
- Returns the length of the
- sub-expression match.
-
-
-
-
- operator std::basic_string<value_type>
- ()const ;
- Converts the sub-expression
- match into an instance of std::basic_string<>. Note
- that this member may be either absent, or present to a
- more limited degree depending upon your compiler
- capabilities.
-
-
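-C++11 std::sub_match keeps the same first, second and matched members. In this sketch (the struct and function names are hypothetical) the expression "(a)|(b)" is matched against "b". Only sub-expression 2 participates, so m[1].matched is false. m[0] is turned into a std::string through sub_match's string conversion operator.

```cpp
#include <regex>
#include <string>

struct alternative_report {
    bool a_matched;      // did sub-expression 1, "(a)", participate?
    bool b_matched;      // did sub-expression 2, "(b)", participate?
    std::string whole;   // m[0] converted to a string
};

alternative_report report_alternatives(const std::string& text)
{
    static const std::regex e("(a)|(b)");
    std::smatch m;
    if (!std::regex_match(text, m, e))
        return { false, false, std::string() };
    std::string whole = m[0];   // implicit conversion, like operator basic_string<>
    return { m[1].matched, m[2].matched, whole };
}
```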
-
-
-Operator[] takes an integer as an argument that denotes the
-sub-expression for which to return information; the argument can
-take the following special values:
-
-
-
-
-
- -2
- Returns everything from the
- end of the match, to the end of the input string,
- equivalent to $' in perl. If this is a null string, then:
- first == second
- And
- matched == false.
-
-
-
-
-
- -1
- Returns everything from the
- start of the input string (or the end of the last match
- if this is a grep operation), to the start of this match.
- Equivalent to $` in perl. If this is a null string, then:
- first == second
- And
- matched == false.
-
-
-
-
-
- 0
- Returns the whole of what
- matched, equivalent to $& in perl. The matched
- parameter is always true.
-
-
-
-
- 0 < N < size()
- Returns what matched sub-expression
- N, if this sub-expression did not participate in the
- match then matched == false
- otherwise:
- matched == true.
-
-
-
-
-
- N < -2 or N >= size()
- Represents an out-of range
- non-existent sub-expression. Returns a "null"
- match in which first == second
- And
- matched == false.
-
-
-
-
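-C++11 std::match_results does not accept the index values -1 and -2. It exposes the same ranges ($` and $') through prefix() and suffix() instead. Indexing at or past size() returns an unmatched sub_match, much like the out-of-range case above. Here is a sketch with hypothetical names:

```cpp
#include <regex>
#include <string>

struct split_result {
    std::string before, match, after;   // $`, $& and $'
    bool out_of_range_matched;          // matched flag of an index past size()
};

split_result split_around_number(const std::string& text)
{
    static const std::regex e("[0-9]+");
    std::smatch m;
    if (!std::regex_search(text, m, e))
        return { text, "", "", false };
    return { m.prefix().str(), m.str(0), m.suffix().str(), m[10].matched };
}
```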
-
-Note that as well as being parameterised for an allocator,
-match_results<> also takes an iterator type; this allows
-any pair of iterators to be searched for a given regular
-expression, provided the iterators have at least bi-directional
-properties.
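-This sketch illustrates the bidirectional-iterator requirement. It assumes C++11 std::regex and uses a hypothetical function name to search a std::list<char>, whose iterators are bidirectional but not random access:

```cpp
#include <list>
#include <regex>
#include <string>

std::string first_word_in_list(const std::list<char>& chars)
{
    using iter = std::list<char>::const_iterator;
    static const std::regex e("[A-Za-z]+");
    std::match_results<iter> m;   // results parameterised on the list's iterator
    if (!std::regex_search(chars.begin(), chars.end(), m, e))
        return std::string();
    return std::string(m[0].first, m[0].second);
}
```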
-
-
-
- Algorithm regex_match
-
-#include <boost/regex.hpp>
-
-
-The algorithm regex_match determines whether a given regular
-expression matches a given sequence denoted by a pair of
-bidirectional-iterators. Note that the result is true only if the
-expression matches the whole of the input sequence; the main use
-of this function is data input validation. The algorithm is
-defined as follows:
-
-template <class iterator, class Allocator, class charT, class traits, class Allocator2>
-bool regex_match(iterator first,
- iterator last,
- match_results<iterator, Allocator>& m,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-The library also defines the following convenience versions,
-which take either a const charT*, or a const std::basic_string<>&
-in place of a pair of iterators [note - these versions may not be
-available, or may be available in a more limited form, depending
-upon your compiler's capabilities]:
-
-template <class charT, class Allocator, class traits, class Allocator2>
-bool regex_match(const charT* str,
- match_results<const charT*, Allocator>& m,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
-bool regex_match(const std::basic_string<charT, ST, SA>& s,
- match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-Finally there is a set of convenience versions that simply
-return true or false and do not indicate what matched:
-
-template <class iterator, class charT, class traits, class Allocator2>
-bool regex_match(iterator first,
- iterator last,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-template <class charT, class traits, class Allocator2>
-bool regex_match(const charT* str,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-template <class ST, class SA, class charT, class traits, class Allocator2>
-bool regex_match(const std::basic_string<charT, ST, SA>& s,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
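-The whole-sequence requirement is what separates regex_match from regex_search. Here is a sketch of the difference, assuming C++11 std::regex, which keeps this distinction. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Validation: the entire input must be digits.
bool is_all_digits(const std::string& s)
{
    static const std::regex digits("[0-9]+");
    return std::regex_match(s, digits);
}

// Searching: a run of digits anywhere in the input is enough.
bool contains_digits(const std::string& s)
{
    static const std::regex digits("[0-9]+");
    return std::regex_search(s, digits);
}
```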
-
-The parameters for the main function version are as follows:
-
-
-
-
-
- iterator first
- Denotes the start of the range to be matched.
-
-
-
-
- iterator last
- Denotes the end of the range
- to be matched.
-
-
-
-
- match_results<iterator,
- Allocator>& m
- An instance of match_results
- in which what matched will be reported. On exit if a
- match occurred then m[0] denotes the whole of the string
- that matched, m[0].first must be equal to first, m[0].second
- will be less than or equal to last. m[1] denotes the
- first sub-expression, m[2] the second sub-expression, and so
- on. If no match occurred then m[0].first = m[0].second =
- last. Note that since the match_results structure
- stores only iterators, and not strings, the iterators/strings
- passed to regex_match must be valid for as long as the
- result is to be used. For that reason never pass
- temporary string objects to regex_match.
-
-
-
-
-
- const
- reg_expression<charT, traits, Allocator2>& e
- Contains the regular
- expression to be matched.
-
-
-
-
- unsigned flags =
- match_default
- Determines the semantics
- used for matching, a combination of one or more match_flags enumerators.
-
-
-
-
-regex_match returns false if no match occurs or true if it
-does. A match only occurs if it starts at first and
-finishes at last. Example: the following example
-processes an ftp response:
-
-#include <stdlib.h>
-#include <boost/regex.hpp>
-#include <string>
-#include <iostream>
-
- using namespace boost;
-
-regex expression("([0-9]+)(\\-| |$)(.*)" );
-
-// process_ftp:
-// on success returns the ftp response code, and fills
-// msg with the ftp response message.
- int process_ftp(const char * response, std::string* msg)
-{
- cmatch what;
- if (regex_match(response, what, expression))
- {
- // what[0] contains the whole string
- // what[1] contains the response code
- // what[2] contains the separator character
- // what[3] contains the text message.
- if (msg)
- msg->assign(what[3].first, what[3].second);
- return std::atoi(what[1].first);
- }
- // failure: did not match
- if (msg)
- msg->erase();
- return -1;
-}
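-For comparison, here is the same function sketched against C++11 std::regex rather than regex++. The name process_ftp_std is hypothetical. The "\\-" escape is replaced by a plain "-", which needs no escaping outside a bracket expression in the ECMAScript grammar.

```cpp
#include <cstdlib>
#include <regex>
#include <string>

int process_ftp_std(const char* response, std::string* msg)
{
    static const std::regex expression("([0-9]+)(-| |$)(.*)");
    std::cmatch what;
    if (std::regex_match(response, what, expression))
    {
        if (msg)
            msg->assign(what[3].first, what[3].second);   // the text message
        return std::atoi(what[1].first);                   // the response code
    }
    if (msg)
        msg->erase();   // no match: clear the message
    return -1;
}
```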
-
- The value of the flags parameter
-passed to the algorithm must be a combination of one or more of
-the following values:
-
-
-
-
-
- match_default
- The default value, indicates
- that first represents the start of a line, the
- start of a buffer, and (possibly) the start of a word.
- Also implies that last represents the end of a
- line, the end of the buffer and (possibly) the end of a
- word. Implies that a dot sub-expression "."
- will match both the newline character and a null.
-
-
-
-
- match_not_bol
- When this flag is set then first
- does not represent the start of a new line.
-
-
-
-
- match_not_eol
- When this flag is set then last
- does not represent the end of a line.
-
-
-
-
- match_not_bob
- When this flag is set then first
- is not the beginning of a buffer.
-
-
-
-
- match_not_eob
- When this flag is set then last
- does not represent the end of a buffer.
-
-
-
-
- match_not_bow
- When this flag is set then first
- can never match the start of a word.
-
-
-
-
- match_not_eow
- When this flag is set then last
- can never match the end of a word.
-
-
-
-
- match_not_dot_newline
- When this flag is set then a
- dot expression "." can not match the newline
- character.
-
-
-
-
- match_not_dot_null
- When this flag is set then a
- dot expression "." can not match a null
- character.
-
-
-
-
- match_prev_avail
- When this flag
- is set, then *--first is a valid expression and
- the flags match_not_bol and match_not_bow have no effect,
- since the value of the previous character can be used to
- check these.
-
-
-
-
- match_any
- When this flag
- is set, then the first string matched is returned, rather
- than the longest possible match. This flag can
- significantly reduce the time taken to find a match, but
- what matches is undefined.
-
-
-
-
- match_not_null
- When this flag
- is set, then the expression will never match a null
- string.
-
-
-
-
- match_continuous
- When this flag
- is set, then during a grep operation, each successive
- match must start from where the previous match finished.
-
-
-
-
- match_partial
- When this flag
- is set, the regex algorithms will report partial matches - that is
- where one or more characters at the end of the text input
- matched some prefix of the regular expression.
-
-
-
-
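-Several of these flags have counterparts with the same names in C++11 std::regex_constants, including match_not_bol and match_continuous. Here is a minimal sketch of those two, assuming std::regex; search_with is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Search `text` for `pattern` under the given std::regex_constants match flags.
bool search_with(const std::string& text, const std::string& pattern,
                 std::regex_constants::match_flag_type flags)
{
    return std::regex_search(text, std::regex(pattern), flags);
}
```

With match_not_bol, "^" no longer matches at the start of the range. With match_continuous, a match must begin exactly at the start of the range.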
-
-
-
-
- Algorithm regex_search
-
-#include <boost/regex.hpp>
-
-
-The algorithm regex_search will search a range denoted by a
-pair of bidirectional-iterators for a given regular expression.
-The algorithm uses various heuristics to reduce the search time
-by only checking for a match if a match could conceivably start
-at that position. The algorithm is defined as follows:
-
-template <class iterator, class Allocator, class charT, class traits, class Allocator2>
-bool regex_search(iterator first,
- iterator last,
- match_results<iterator, Allocator>& m,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-The library also defines the following convenience versions,
-which take either a const charT*, or a const std::basic_string<>&
-in place of a pair of iterators [note - these versions may not be
-available, or may be available in a more limited form, depending
-upon your compiler's capabilities]:
-
-template <class charT, class Allocator, class traits, class Allocator2>
-bool regex_search(const charT* str,
- match_results<const charT*, Allocator>& m,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-template <class ST, class SA, class Allocator, class charT, class traits, class Allocator2>
-bool regex_search(const std::basic_string<charT, ST, SA>& s,
- match_results<typename std::basic_string<charT, ST, SA>::const_iterator, Allocator>& m,
- const reg_expression<charT, traits, Allocator2>& e,
- unsigned flags = match_default);
-
-The parameters for the main function version are as follows:
-
-
-
-
-
- iterator first
- The starting position of the
- range to search.
-
-
-
-
- iterator last
- The ending position of the
- range to search.
-
-
-
-
- match_results<iterator,
- Allocator>& m
- An instance of match_results
- in which what matched will be reported. On exit if a
- match occurred then m[0] denotes the whole of the string
- that matched, m[0].first and m[0].second will be less
- than or equal to last. m[1] denotes the first sub-expression,
- m[2] the second sub-expression, and so on. If no match
- occurred then m[0].first = m[0].second = last. Note
- that since the match_results structure stores only
- iterators, and not strings, the iterators/strings passed
- to regex_search must be valid for as long as the result
- is to be used. For that reason never pass temporary
- string objects to regex_search.
-
-
-
-
-
- const
- reg_expression<charT, traits, Allocator2>& e
- The regular expression to
- search for.
-
-
-
-
- unsigned flags =
- match_default
- The flags that determine
- what gets matched, a combination of one or more match_flags enumerators.
-
-
-
-
-
-
-
-Example: the following example
-takes the contents of a file in the form of a string, and
-searches for all the C++ class declarations in the file. The code
-will work regardless of the way that std::string is implemented,
-for example it could easily be modified to work with the SGI rope
-class, which uses a non-contiguous storage strategy.
-
-#include <string>
-#include <map>
-#include <boost/regex.hpp>
-
-// purpose:
-// takes the contents of a file in the form of a string
-// and searches for all the C++ class definitions, storing
-// their locations in a map of strings/int's
- typedef std::map<std::string, int , std::less<std::string> > map_type;
-
-boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)");
-
-void IndexClasses(map_type& m, const std::string& file)
-{
- std::string::const_iterator start, end;
- start = file.begin();
- end = file.end();
- boost::match_results<std::string::const_iterator> what;
- unsigned int flags = boost::match_default;
- while (regex_search(start, end, what, expression, flags))
- {
- // what[0] contains the whole string
- // what[5] contains the class name.
- // what[6] contains the template specialisation if any.
- // add class name and position to map:
- m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
- what[5].first - file.begin();
- // update search position:
- start = what[0].second;
- // update flags:
- flags |= boost::match_prev_avail;
- flags |= boost::match_not_bob;
- }
-}
-
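-The same incremental search can be sketched with C++11 std::regex. The sketch assumes a simplified task: it collects identifiers rather than class declarations, and all_identifiers is a hypothetical name. std::regex_constants has match_prev_avail but no match_not_bob, so only the former is set after the first match. Because the pattern cannot match an empty string, each iteration advances start.

```cpp
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> all_identifiers(const std::string& text)
{
    static const std::regex word("\\b[A-Za-z_]\\w*\\b");
    std::vector<std::string> found;
    std::string::const_iterator start = text.begin(), end = text.end();
    std::smatch what;
    auto flags = std::regex_constants::match_default;
    while (std::regex_search(start, end, what, word, flags))
    {
        found.push_back(what.str(0));
        start = what[0].second;                             // resume after this match
        flags |= std::regex_constants::match_prev_avail;    // *(start - 1) is now valid
    }
    return found;
}
```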
-
-
-
- Algorithm regex_grep
-
-#include <boost/regex.hpp>
-
-
-regex_grep allows you to search through a bidirectional-iterator
-range and locate all the (non-overlapping) matches with a given
-regular expression. The function is declared as:
-
-template <class Predicate, class iterator, class charT, class traits, class Allocator>
-unsigned int regex_grep(Predicate foo,
- iterator first,
- iterator last,
- const reg_expression<charT, traits, Allocator>& e,
- unsigned flags = match_default);
-
-The library also defines the following convenience versions,
-which take either a const charT*, or a const std::basic_string<>&
-in place of a pair of iterators [note - these versions may not be
-available, or may be available in a more limited form, depending
-upon your compiler's capabilities]:
-
-template <class Predicate, class charT, class Allocator, class traits>
-unsigned int regex_grep(Predicate foo,
- const charT* str,
- const reg_expression<charT, traits, Allocator>& e,
- unsigned flags = match_default);
-
-template <class Predicate, class ST, class SA, class Allocator, class charT, class traits>
-unsigned int regex_grep(Predicate foo,
- const std::basic_string<charT, ST, SA>& s,
- const reg_expression<charT, traits, Allocator>& e,
- unsigned flags = match_default);
-
-The parameters for the primary version of regex_grep have the
-following meanings:
-
-
-
-
-
- foo
- A predicate function object
- or function pointer, see below for more information.
-
-
-
-
- first
- The start of the range to
- search.
-
-
-
-
- last
- The end of the range to
- search.
-
-
-
-
- e
- The regular expression to
- search for.
-
-
-
-
- flags
- The flags that determine how
- matching is carried out, a combination of one or more match_flags
- enumerators.
-
-
-
-
- The algorithm finds all of the non-overlapping matches
-of the expression e. For each match it fills a match_results<iterator, Allocator>
-structure, which contains information on what matched, and calls
-the predicate foo, passing the match_results<iterator,
-Allocator> as a single argument. If the predicate returns
-true, then the grep operation continues; otherwise it terminates
-without searching for further matches. The function returns the
-number of matches found.
-
-The general form of the predicate is:
-
-struct grep_predicate
-{
- bool operator ()(const match_results<iterator_type, expression_type::alloc_type>& m);
-};
-
-For example the regular expression "a*b" would find
-one match in the string "aaaaab" and two in the string
-"aaabb".
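-C++11 std::regex has no regex_grep, but its contract can be sketched on top of std::sregex_iterator. The grep_all helper below is hypothetical. It assumes that a match for which the predicate returns false still counts towards the result. It reproduces the "a*b" counts given above.

```cpp
#include <regex>
#include <string>

// Hypothetical regex_grep-like helper: call pred for each non-overlapping
// match, stop early if pred returns false, return the number of matches visited.
template <class Predicate>
unsigned grep_all(Predicate pred, const std::string& s, const std::regex& e)
{
    unsigned count = 0;
    for (std::sregex_iterator it(s.begin(), s.end(), e), last; it != last; ++it)
    {
        ++count;
        if (!pred(*it))
            break;
    }
    return count;
}
```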
-
-Remember that this algorithm can be used for a lot more than
-implementing a version of grep: the predicate can be and do
-anything that you want. A grep utility would output the results
-to the screen, another program could index a file based on a
-regular expression and store a set of bookmarks in a list, and a
-text file conversion utility would write its output to a file. The results of
-one regex_grep can even be chained into another regex_grep to
-create recursive parsers.
-
-Example:
-convert the example from regex_search to use regex_grep
-instead:
-
-#include <string>
-#include <map>
-#include <boost/regex.hpp>
-
- // IndexClasses:
-// takes the contents of a file in the form of a string
-// and searches for all the C++ class definitions, storing
-// their locations in a map of strings/int's
-
-typedef std::map<std::string, int , std::less<std::string> > map_type;
-
-boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
- "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)"
- "[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)" );
-
-class IndexClassesPred
-{
- map_type& m;
- std::string::const_iterator base;
-public :
- IndexClassesPred(map_type& a, std::string::const_iterator b) : m(a), base(b) {}
- bool operator ()(const boost::match_results<std::string::const_iterator, boost::regex::alloc_type>& what)
- {
- // what[0] contains the whole string
- // what[5] contains the class name.
- // what[6] contains the template specialisation if any.
- // add class name and position to map:
- m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
- what[5].first - base;
- return true ;
- }
-};
-
-void IndexClasses(map_type& m, const std::string& file)
-{
- std::string::const_iterator start, end;
- start = file.begin();
- end = file.end();
- regex_grep(IndexClassesPred(m, start), start, end, expression);
-}
-
-Example:
-use regex_grep to call a global callback function:
-
-#include <string>
-#include <map>
-#include <boost/regex.hpp>
-
- // purpose:
-// takes the contents of a file in the form of a string
-// and searches for all the C++ class definitions, storing
-// their locations in a map of strings/int's
-
-typedef std::map<std::string, int , std::less<std::string> > map_type;
-
-boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)" );
-
-map_type class_index;
-std::string::const_iterator base;
-
-bool grep_callback(const boost::match_results<std::string::const_iterator, boost::regex::alloc_type>& what)
-{
- // what[0] contains the whole string
- // what[5] contains the class name.
- // what[6] contains the template specialisation if any.
- // add class name and position to map:
- class_index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
- what[5].first - base;
- return true ;
-}
-
-void IndexClasses(const std::string& file)
-{
- std::string::const_iterator start, end;
- start = file.begin();
- end = file.end();
- base = start;
- regex_grep(grep_callback, start, end, expression, boost::match_default);
-}
-
-
-Example:
-use regex_grep to call a class member function, using the standard
-library adapters std::mem_fun and std::bind1st to
-convert the member function into a predicate:
-
-#include <string>
-#include <map>
-#include <boost/regex.hpp>
-#include <functional>
-
-// purpose:
-// takes the contents of a file in the form of a string
-// and searches for all the C++ class definitions, storing
-// their locations in a map of strings/int's
-
- typedef std::map<std::string, int , std::less<std::string> > map_type;
-
-class class_index
-{
- boost::regex expression;
- map_type index;
- std::string::const_iterator base;
- bool grep_callback(boost::match_results<std::string::const_iterator, boost::regex::alloc_type> what);
-public :
- void IndexClasses(const std::string& file);
- class_index()
- : index(),
- expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
- "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?"
- "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?"
- "(\\{|:[^;\\{()]*\\{)"
- ){}
-};
-
-bool class_index::grep_callback(boost::match_results<std::string::const_iterator, boost::regex::alloc_type> what)
-{
- // what[0] contains the whole string
- // what[5] contains the class name.
- // what[6] contains the template specialisation if any.
- // add class name and position to map:
- index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
- what[5].first - base;
- return true ;
-}
-
-void class_index::IndexClasses(const std::string& file)
-{
- std::string::const_iterator start, end;
- start = file.begin();
- end = file.end();
- base = start;
- regex_grep(std::bind1st(std::mem_fun(&class_index::grep_callback), this ),
- start,
- end,
- expression);
-}
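-std::mem_fun and std::bind1st were deprecated in C++11 and removed in C++17. On newer compilers, a lambda that captures this can stand in for them. The sketch below uses C++11 std::regex and std::sregex_iterator in place of regex_grep, and the class word_index is hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <regex>
#include <string>

class word_index
{
    std::map<std::string, long> index;
    std::string::const_iterator base;
    void record(const std::smatch& what)   // the member "callback"
    {
        index[what.str(0)] = what[0].first - base;   // later matches overwrite earlier ones
    }
public:
    const std::map<std::string, long>& build(const std::string& text)
    {
        static const std::regex word("[A-Za-z]+");
        base = text.begin();
        std::for_each(std::sregex_iterator(text.begin(), text.end(), word),
                      std::sregex_iterator(),
                      [this](const std::smatch& what) { record(what); });
        return index;
    }
};
```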
-
-
-Finally,
-C++ Builder users can use C++ Builder's closure type as a
-callback argument:
-
-#include <string>
-#include <map>
-#include <boost/regex.hpp>
-#include <functional>
-
-// purpose:
-// takes the contents of a file in the form of a string
-// and searches for all the C++ class definitions, storing
-// their locations in a map of strings/int's
-
- typedef std::map<std::string, int , std::less<std::string> > map_type;
-class class_index
-{
- boost::regex expression;
- map_type index;
- std::string::const_iterator base;
- typedef boost::match_results<std::string::const_iterator, boost::regex::alloc_type> arg_type;
- bool grep_callback(const arg_type& what);
-public :
- typedef bool (__closure * grep_callback_type)(const arg_type&);
- void IndexClasses(const std::string& file);
- class_index()
- : index(),
- expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
- "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?"
- "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?"
- "(\\{|:[^;\\{()]*\\{)"
- ){}
-};
-
-bool class_index::grep_callback(const arg_type& what)
-{
- // what[0] contains the whole string
- // what[5] contains the class name.
- // what[6] contains the template specialisation if any.
- // add class name and position to map:
- index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
- what[5].first - base;
- return true ;
-}
-
-void class_index::IndexClasses(const std::string& file)
-{
- std::string::const_iterator start, end;
- start = file.begin();
- end = file.end();
- base = start;
- class_index::grep_callback_type cl = &(this ->grep_callback);
- regex_grep(cl,
- start,
- end,
- expression);
-}
-
-
-
- Algorithm regex_format
-
-#include <boost/regex.hpp>
-
-
-The algorithm regex_format takes the results of a match and
-creates a new string based upon a format string;
-regex_format can be used for search and replace operations:
-
-template <class OutputIterator, class iterator, class Allocator, class charT>
-OutputIterator regex_format(OutputIterator out,
- const match_results<iterator, Allocator>& m,
- const charT* fmt,
- unsigned flags = 0);
-
-template <class OutputIterator, class iterator, class Allocator, class charT>
-OutputIterator regex_format(OutputIterator out,
- const match_results<iterator, Allocator>& m,
- const std::basic_string<charT>& fmt,
- unsigned flags = 0);
-
-The library also defines the following convenience variation
-of regex_format, which returns the result directly as a string,
-rather than outputting to an iterator [note - this version may
-not be available, or may be available in a more limited form,
-depending upon your compiler's capabilities]:
-
-template <class iterator, class Allocator, class charT>
-std::basic_string<charT> regex_format
- (const match_results<iterator, Allocator>& m,
- const charT* fmt,
- unsigned flags = 0);
-
-template <class iterator, class Allocator, class charT>
-std::basic_string<charT> regex_format
- (const match_results<iterator, Allocator>& m,
- const std::basic_string<charT>& fmt,
- unsigned flags = 0);
-
-Parameters to the main version of the function are passed as
-follows:
-
-
-
-
-
- OutputIterator out
- An output iterator; the
- output string is sent to this iterator. Typically this
- would be a std::ostream_iterator.
-
-
-
-
- const
- match_results<iterator, Allocator>& m
- An instance of
- match_results<> obtained from one of the matching
- algorithms above, and denoting what matched.
-
-
-
-
- const charT* fmt
- A format string that
- determines how the match is transformed into the new
- string.
-
-
-
-
- unsigned flags
- Optional flags which
- describe how the format string is to be interpreted.
-
-
-
-
- Format flags are defined as follows:
-
-
-
-
-
-
- format_all
- Enables all syntax options (perl-like
- plus extensions).
-
-
-
-
- format_sed
- Allows only a sed-like
- syntax.
-
-
-
-
- format_perl
- Allows only a perl-like
- syntax.
-
-
-
-
- format_no_copy
- Disables copying of
- unmatched sections to the output string during regex_merge operations.
-
-
-
-
- format_first_only
- When this flag is set, only the first occurrence will
- be replaced (applies to regex_merge only).
-
-
-
-
-
-
-
-The format string syntax (and available options) is described
-more fully under format
-strings.
-
-
-
- Algorithm regex_merge
-
-#include <boost/regex.hpp>
-
-
-The algorithm regex_merge is a combination of regex_grep and regex_format.
-That is, it greps through the string finding all the matches to
-the regular expression; for each match it calls regex_format to format the match and
-sends the result to the output iterator. Sections of text that do
-not match are copied to the output unchanged, unless the flags
-parameter has the flag format_no_copy
-set. If the flag format_first_only is
-set, then only the first occurrence is replaced rather than all
-occurrences.
-
-template <class OutputIterator, class iterator, class traits, class Allocator, class charT>
-OutputIterator regex_merge(OutputIterator out,
- iterator first,
- iterator last,
- const reg_expression<charT, traits, Allocator>& e,
- const charT* fmt,
- unsigned int flags = match_default);
-
-template <class OutputIterator, class iterator, class traits, class Allocator, class charT>
-OutputIterator regex_merge(OutputIterator out,
- iterator first,
- iterator last,
- const reg_expression<charT, traits, Allocator>& e,
- const std::basic_string<charT>& fmt,
- unsigned int flags = match_default);
-
-The library also defines the following convenience variation
-of regex_merge, which returns the result directly as a string,
-rather than outputting to an iterator [note - this version may
-not be available, or may be available in a more limited form,
-depending upon your compiler's capabilities]:
-
-template <class traits, class Allocator, class charT>
-std::basic_string<charT> regex_merge(const std::basic_string<charT>& text,
- const reg_expression<charT, traits, Allocator>& e,
- const charT* fmt,
- unsigned int flags = match_default);
-
-template <class traits, class Allocator, class charT>
-std::basic_string<charT> regex_merge(const std::basic_string<charT>& text,
- const reg_expression<charT, traits, Allocator>& e,
- const std::basic_string<charT>& fmt,
- unsigned int flags = match_default);
-
-Parameters to the main version of the function are passed as
-follows:
-
-
-
-
-
- OutputIterator out
- An output iterator; the
- output string is sent to this iterator. Typically this
- would be a std::ostream_iterator.
-
-
-
-
- iterator first
- The start of the range of
- text to grep (bidirectional-iterator).
-
-
-
-
- iterator last
- The end of the range of text
- to grep (bidirectional-iterator).
-
-
-
-
- const
- reg_expression<charT, traits, Allocator>& e
- The expression to search for.
-
-
-
-
- const charT* fmt
- The format string to be
- applied to sections of text that match.
-
-
-
-
- unsigned int
- flags = match_default
- Flags which determine how
- the expression is matched - see match_flags,
- and how the format string is interpreted - see format_flags.
-
-
-
-
-Example: the following example takes
-C/C++ source code as input, and outputs syntax highlighted HTML
-code.
-
-
-#include <fstream>
-#include <sstream>
-#include <string>
-#include <iterator>
-#include <boost/regex.hpp>
-#include <iostream>
-
-// purpose:
-// takes the contents of each file named on the command line and
-// transforms it into syntax-highlighted code in HTML format,
-// writing the result to <filename>.htm
-
-boost::regex e1, e2;
-extern const char * expression_text;
-extern const char * format_string;
-extern const char * pre_expression;
-extern const char * pre_format;
-extern const char * header_text;
-extern const char * footer_text;
-
-void load_file(std::string& s, std::istream& is)
-{
- s.erase();
- s.reserve(is.rdbuf()->in_avail());
- char c;
- while (is.get(c))
- {
- if (s.capacity() == s.size())
- s.reserve(s.capacity() * 3);
- s.append(1, c);
- }
-}
-
-int main(int argc, char** argv)
-{
- try{
- e1.assign(expression_text);
- e2.assign(pre_expression);
- for (int i = 1; i < argc; ++i)
- {
- std::cout << "Processing file " << argv[i] << std::endl;
- std::ifstream fs(argv[i]);
- std::string in;
- load_file(in, fs);
- std::string out_name(std::string(argv[i]) + std::string(".htm"));
- std::ofstream os(out_name.c_str());
- os << header_text;
- // escape '<' and '>' for HTML (and drop '\r') first,
- // by outputting to a temporary string stream:
- std::ostringstream t(std::ios::out | std::ios::binary);
- std::ostream_iterator<char, char> oi(t);
- boost::regex_merge(oi, in.begin(), in.end(), e2, pre_format);
- // then output to the final output stream,
- // adding syntax highlighting:
- std::string s(t.str());
- std::ostream_iterator<char, char> out(os);
- boost::regex_merge(out, s.begin(), s.end(), e1, format_string);
- os << footer_text;
- }
- }
- catch (...)
- { return -1; }
- return 0;
-}
-
-extern const char * pre_expression = "(<)|(>)|\\r";
-extern const char * pre_format = "(?1&lt;)(?2&gt;)";
-
-
-const char * expression_text = // preprocessor directives: index 1
- "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
- // comment: index 2
- "(//[^\\n]*|/\\*.*?\\*/)|"
- // literals: index 3
- "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
- // string literals: index 4
- "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
- // keywords: index 5
- "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
- "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
- "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
- "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
- "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
- "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
- "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
- "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
- "|using|virtual|void|volatile|wchar_t|while)\\>"
- ;
-
-const char * format_string = "(?1<font color=\"#008040\">$&</font>)"
- "(?2<I><font color=\"#000080\">$&</font></I>)"
- "(?3<font color=\"#0000A0\">$&</font>)"
- "(?4<font color=\"#0000FF\">$&</font>)"
- "(?5<B>$&</B>)" ;
-
-const char * header_text = "<HTML>\n<HEAD>\n"
- "<TITLE>Auto-generated html formatted source</TITLE>\n"
- "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
- "</HEAD>\n"
- "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
- "<P> </P>\n<PRE>" ;
-
-const char * footer_text = "</PRE>\n</BODY>\n</HTML>\n";
-
-
-
- Algorithm regex_split
-
-#include <boost/regex.hpp>
-
-
-Algorithm regex_split performs an operation similar to Perl's
-split function, and comes in three overloaded forms:
-
-template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2>
-std::size_t regex_split(OutputIterator out,
- std::basic_string<charT, Traits1, Alloc1>& s,
- const reg_expression<charT, Traits2, Alloc2>& e,
- unsigned flags,
- std::size_t max_split);
-
-template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2>
-std::size_t regex_split(OutputIterator out,
- std::basic_string<charT, Traits1, Alloc1>& s,
- const reg_expression<charT, Traits2, Alloc2>& e,
- unsigned flags = match_default);
-
-template <class OutputIterator, class charT, class Traits1, class Alloc1>
-std::size_t regex_split(OutputIterator out,
- std::basic_string<charT, Traits1, Alloc1>& s);
-
-Each version takes an output-iterator for output, and a string
-for input. If the expression contains no marked sub-expressions,
-then the algorithm writes one string onto the output-iterator for
-each section of input that does not match the expression. If the
-expression does contain marked sub-expressions, then each time a
-match is found, one string for each marked sub-expression will be
-written to the output-iterator. No more than max_split strings
-will be written to the output-iterator. Before returning, all the
-input processed will be deleted from the string s (if max_split
-is not reached, then all of s will be deleted). Returns
-the number of strings written to the output-iterator. If the
-parameter max_split is not specified then it defaults to
-UINT_MAX. If no expression is specified, then it defaults to
-"\s+", and splitting occurs on whitespace.
-
-Example:
-the following function will split the input string into a series
-of tokens, and remove each token from the string s:
-
-unsigned tokenise(std::list<std::string>& l, std::string& s)
-{
- return boost::regex_split(std::back_inserter(l), s);
-}
-
-Example:
-the following short program will extract all of the URLs from an
-HTML file, and print them out to cout:
-
-#include <list>
-#include <fstream>
-#include <iostream>
-#include <boost/regex.hpp>
-
-boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"" ,
- boost::regbase::normal | boost::regbase::icase);
-
-void load_file(std::string& s, std::istream& is)
-{
- s.erase();
- //
- // attempt to grow string buffer to match file size,
- // this doesn't always work...
- s.reserve(is.rdbuf()->in_avail());
- char c;
- while (is.get(c))
- {
- // use a geometric growth strategy, in case
- // in_avail (above) returned zero:
- if (s.capacity() == s.size())
- s.reserve(s.capacity() * 3);
- s.append(1, c);
- }
-}
-
-
-int main(int argc, char ** argv)
-{
- std::string s;
- std::list<std::string> l;
-
- for (int i = 1; i < argc; ++i)
- {
- std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
- s.erase();
- std::ifstream is(argv[i]);
- load_file(s, is);
- boost::regex_split(std::back_inserter(l), s, e);
- while (l.size())
- {
- s = *(l.begin());
- l.pop_front();
- std::cout << s << std::endl;
- }
- }
- return 0;
-}
-
-
-
- Partial Matches
-
-The match-flag match_partial
-can be passed to the
-following algorithms: regex_match, regex_search, and regex_grep.
-When used, it indicates that partial as well as full matches
-should be found. A partial match is one that matched one or more
-characters at the end of the text input, but did not match all of
-the regular expression (although it may have done so had more
-input been available). Partial matches are typically used either when
-validating data input (checking each character as it is
-entered on the keyboard), or when searching texts that are either
-too long to load into memory (or even into a memory-mapped file),
-or of indeterminate length (for example, the source may be a
-socket or similar). Partial and full matches can be
-differentiated as shown in the following table (the variable M
-represents an instance of match_results<> as filled in by
-regex_match, regex_search or regex_grep):
-
-
-
-
-
- Return value
- M[0].matched
- M[0].first
- M[0].second
-
-
- No match
- False
- Undefined
- Undefined
- Undefined
-
-
- Partial match
- True
- False
- Start of partial match.
- End of partial match (end of
- text).
-
-
- Full match
- True
- True
- Start of full match.
- End of full match.
-
-
-
-The following example tests
-whether the text could be a valid credit card number. As
-the user presses each key, the character entered is appended to
-the string being built up, and the string is passed to
-is_possible_card_number.
-If this returns true, then the text matches the full card-number pattern,
-so the user interface's OK button would be enabled. If it returns
-false, then this is not yet a valid card number, but could be
-with more input, so the user interface would disable the OK
-button. Finally, if the procedure throws an exception, the input
-could never become a valid number, so the character just entered
-must be discarded and a suitable error indication displayed to
-the user.
-
-#include <string>
-#include <stdexcept>
-#include <boost/regex.hpp>
-
-boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
-
-bool is_possible_card_number(const std::string& input)
-{
- //
- // return false for partial match, true for full match, or throw for
- // impossible match based on what we have so far...
- boost::match_results<std::string::const_iterator> what;
- if(0 == boost::regex_match(input, what, e, boost::match_default | boost::match_partial))
- {
- // the input so far could not possibly be valid so reject it:
- throw std::runtime_error("Invalid data entered - this could not possibly be a valid card number");
- }
- // OK so far so good, but have we finished?
- if(what[0].matched)
- {
- // excellent, we have a result:
- return true;
- }
- // what we have so far is only a partial match...
- return false;
-}
-
-In the following example, text
-input is taken from a stream containing an unknown amount of
-text; this example simply counts the number of HTML tags
-encountered in the stream. The text is loaded into a buffer and
-searched one part at a time; if a partial match was encountered,
-then the partial match gets searched a second time as the start
-of the next batch of text:
-
-#include <iostream>
-#include <fstream>
-#include <cstring>
-#include <string>
-#include <boost/regex.hpp>
-
-// match some kind of html tag:
-boost::regex e("<[^>]*>");
-// count how many:
-unsigned int tags = 0;
-// saved position of partial match:
-char* next_pos = 0;
-
-bool grep_callback(const boost::match_results<char*>& m)
-{
- if(m[0].matched == false)
- {
- // save position and return:
- next_pos = m[0].first;
- }
- else
- ++tags;
- return true;
-}
-
-void search(std::istream& is)
-{
- char buf[4096];
- next_pos = buf + sizeof(buf);
- bool have_more = true;
- while(have_more)
- {
- // how much do we copy forward from last try:
- unsigned leftover = (buf + sizeof(buf)) - next_pos;
- // and how much is left to fill:
- unsigned size = next_pos - buf;
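- // copy forward whatever we have left (the ranges may overlap, so use memmove):
- std::memmove(buf, next_pos, leftover);
- // fill the rest from the stream (readsome() may return 0 even when data remains):
- unsigned read = static_cast<unsigned>(is.read(buf + leftover, size).gcount());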
- // check to see if we've run out of text:
- have_more = read == size;
- // reset next_pos:
- next_pos = buf + sizeof(buf);
- // and then grep:
- boost::regex_grep(grep_callback,
- buf,
- buf + read + leftover,
- e,
- boost::match_default | boost::match_partial);
- }
-}
-
-
-
-Copyright Dr
-John Maddock 1998-2001 all rights reserved.
-
-
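The regex_merge and regex_split sections of the appendix removed above describe the format_no_copy and format_first_only flags and a default split on "\s+". For readers moving code off these algorithms, the C++11 standard <regex> header offers close analogues; it was standardized through TR1 from a design derived from this library. The sketch below assumes a C++11 compiler and uses hypothetical helper names (merge_all, merge_first, merge_no_copy, split_whitespace). By default, std::regex_replace uses ECMAScript format syntax ($&, $1, ...). The syntax-highlighting example relies on Boost's conditional (?1...) format extension, which has no direct equivalent there.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Analogue of regex_merge with default flags: unmatched text is copied
// through, and every match is replaced by the expanded format string
// ($& = whole match, $1 = first marked sub-expression).
std::string merge_all(const std::string& text, const std::regex& e,
                      const std::string& fmt)
{
    return std::regex_replace(text, e, fmt);
}

// Analogue of format_first_only: only the first match is replaced;
// the rest of the text is copied unchanged.
std::string merge_first(const std::string& text, const std::regex& e,
                        const std::string& fmt)
{
    return std::regex_replace(text, e, fmt,
                              std::regex_constants::format_first_only);
}

// Analogue of format_no_copy: unmatched sections are dropped, so only
// the formatted matches reach the output.
std::string merge_no_copy(const std::string& text, const std::regex& e,
                          const std::string& fmt)
{
    return std::regex_replace(text, e, fmt,
                              std::regex_constants::format_no_copy);
}

// Analogue of the default regex_split (split on "\s+"): submatch index -1
// makes the token iterator yield the text between matches.
std::vector<std::string> split_whitespace(const std::string& text)
{
    static const std::regex ws("\\s+");
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), ws, -1), end;
    for (; it != end; ++it)
    {
        // Leading whitespace produces one empty token; skip it.
        if (it->length() != 0)
            out.push_back(it->str());
    }
    return out;
}
```

For example, with the pattern "[0-9]+" and the format "<$&>", merge_first("a1b22c", ...) returns "a<1>b22c" and merge_no_copy returns "<1><22>".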
diff --git a/test/pathology/bad_expression_test.cpp b/test/pathology/bad_expression_test.cpp
new file mode 100644
index 00000000..8a929941
--- /dev/null
+++ b/test/pathology/bad_expression_test.cpp
@@ -0,0 +1,52 @@
+/*
+ *
+ * Copyright (c) 1998-2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+ /*
+ * LOCATION: see http://www.boost.org for most recent version.
+ * FILE: bad_expression_test.cpp
+ * VERSION: see
+ * DESCRIPTION: Test that pathological expressions throw rather than take excessive time.
+ */
+
+#include
+#include
+#include
+
+int test_main( int argc, char* argv[] )
+{
+ std::string bad_text(1024, ' ');
+ std::string good_text(200, ' ');
+ good_text.append("xyz");
+
+ boost::smatch what;
+
+ boost::regex e1("(.+)+xyz");
+
+ BOOST_CHECK(boost::regex_search(good_text, what, e1));
+ BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e1), std::runtime_error);
+ BOOST_CHECK(boost::regex_search(good_text, what, e1));
+
+ BOOST_CHECK(boost::regex_match(good_text, what, e1));
+ BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e1), std::runtime_error);
+ BOOST_CHECK(boost::regex_match(good_text, what, e1));
+
+ boost::regex e2("abc|[[:space:]]+(xyz)?[[:space:]]+xyz");
+
+ BOOST_CHECK(boost::regex_search(good_text, what, e2));
+ BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e2), std::runtime_error);
+ BOOST_CHECK(boost::regex_search(good_text, what, e2));
+
+ return 0;
+}
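The test above relies on the library detecting runaway backtracking on "(.+)+xyz" and throwing std::runtime_error. On the caller's side, the usual remedy is to remove the nested repetition. "(.+)+xyz" and ".+xyz" find a match in exactly the same strings. The flat form, however, leaves a backtracking matcher only one decision per starting position: how many characters ".+" takes. The nested form must also consider every way of splitting those characters among iterations of the inner group. The sketch below uses std::regex rather than Boost.Regex, and the function names are hypothetical. The checks run the nested form only on short inputs so that it finishes quickly.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Nested repetition: a run of n characters can be divided among
// iterations of the inner "+" in 2^(n-1) ways, and a backtracking
// matcher may try every division before reporting that "xyz" is missing.
bool search_nested(const std::string& text)
{
    static const std::regex nested("(.+)+xyz");
    return std::regex_search(text, nested);
}

// Same set of matching strings without nesting: one or more characters,
// then "xyz". A failed search now tries each possible length of ".+"
// once per start position, i.e. roughly quadratic work.
bool search_flat(const std::string& text)
{
    static const std::regex flat(".+xyz");
    return std::regex_search(text, flat);
}
```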
diff --git a/test/pathology/recursion_test.cpp b/test/pathology/recursion_test.cpp
new file mode 100644
index 00000000..1a67eee1
--- /dev/null
+++ b/test/pathology/recursion_test.cpp
@@ -0,0 +1,63 @@
+/*
+ *
+ * Copyright (c) 1998-2002
+ * Dr John Maddock
+ *
+ * Permission to use, copy, modify, distribute and sell this software
+ * and its documentation for any purpose is hereby granted without fee,
+ * provided that the above copyright notice appear in all copies and
+ * that both that copyright notice and this permission notice appear
+ * in supporting documentation. Dr John Maddock makes no representations
+ * about the suitability of this software for any purpose.
+ * It is provided "as is" without express or implied warranty.
+ *
+ */
+
+ /*
+ * LOCATION: see http://www.boost.org for most recent version.
+ * FILE: recursion_test.cpp
+ * VERSION: see
+ * DESCRIPTION: Test for indefinite recursion and/or stack overrun.
+ */
+
+#include
+#include
+#include
+
+int test_main( int argc, char* argv[] )
+{
+ // this regex will recurse twice for each whitespace character matched:
+ boost::regex e("([[:space:]]|.)+");
+
+ std::string bad_text(1024*1024*4, ' ');
+ std::string good_text(200, ' ');
+
+ boost::smatch what;
+
+ //
+ // Over and over: we want to make sure that after a stack error has
+ // been triggered, we can still conduct a good search, and that
+ // subsequent stack failures still do the right thing:
+ //
+ BOOST_CHECK(boost::regex_search(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_search(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_search(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_search(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_search(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_search(good_text, what, e));
+
+ BOOST_CHECK(boost::regex_match(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_match(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_match(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_match(good_text, what, e));
+ BOOST_CHECK_THROW(boost::regex_match(bad_text, what, e), std::runtime_error);
+ BOOST_CHECK(boost::regex_match(good_text, what, e));
+
+ return 0;
+}
\ No newline at end of file
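The recursion test alternates an input that must match with one that must fail by throwing std::runtime_error, to confirm that the matcher recovers after a stack failure. A comparable harness for std::regex might classify each attempt as shown below. This is a sketch under stated assumptions, and the names are hypothetical. std::regex_error does derive from std::runtime_error, which the static_assert checks. Whether a particular std::regex implementation reports deep recursion as error_stack or error_complexity, or simply overflows the stack, is implementation-dependent, so the checks use only small inputs.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>
#include <type_traits>

// A check that expects std::runtime_error also accepts std::regex_error.
static_assert(std::is_base_of<std::runtime_error, std::regex_error>::value,
              "std::regex_error derives from std::runtime_error");

// Hypothetical classification of one search attempt.
enum class outcome { match, no_match, resource_error };

outcome try_search(const std::string& text, const std::regex& e)
{
    try
    {
        return std::regex_search(text, e) ? outcome::match : outcome::no_match;
    }
    catch (const std::regex_error& err)
    {
        // error_stack / error_complexity mean the engine gave up on this
        // input; any other error code is unexpected here, so rethrow it.
        if (err.code() == std::regex_constants::error_stack ||
            err.code() == std::regex_constants::error_complexity)
            return outcome::resource_error;
        throw;
    }
}

// Alternate a good input with a bad one, as the test does, and report
// whether the good input still matches after every bad attempt.
bool good_input_keeps_matching(const std::string& good, const std::string& bad,
                               const std::regex& e, int rounds)
{
    for (int i = 0; i < rounds; ++i)
    {
        if (try_search(good, e) != outcome::match)
            return false;
        (void)try_search(bad, e);  // outcome ignored: only recovery is checked
    }
    return try_search(good, e) == outcome::match;
}
```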
diff --git a/test/regress/v3_tests.txt b/test/regress/v3_tests.txt
new file mode 100644
index 00000000..5ad00e7f
--- /dev/null
+++ b/test/regress/v3_tests.txt
@@ -0,0 +1,908 @@
+;
+;
+; this file contains a script of tests to run through regress.exe
+;
+; comments start with a semicolon and proceed to the end of the line
+;
+; changes to regular expression compile flags start with a "-" as the first
+; non-whitespace character and consist of a list of the printable names
+; of the flags, for example "match_default"
+;
+; Other lines contain a test to perform using the current flag status:
+; the first token contains the expression to compile, the second the string
+; to match it against. If the second string is "!" then the expression should
+; not compile; that is, the first string is an invalid regular expression.
+; This is then followed by a list of integers that specify what should match:
+; each pair represents the starting and ending positions of a subexpression,
+; starting with the zeroth subexpression (the whole match).
+; A value of -1 indicates that the subexpression should not take part in the
+; match at all; if the first value is -1 then no part of the expression should
+; match the string.
+;
+
+- match_default normal REG_EXTENDED
+
+;
+; try some really simple literals:
+a a 0 1
+Z Z 0 1
+Z aaa -1 -1
+Z xxxxZZxxx 4 5
+
+; and some simple brackets:
+(a) zzzaazz 3 4 3 4
+() zzz 0 0 0 0
+() "" 0 0 0 0
+( !
+) !
+(aa !
+aa) !
+a b -1 -1
+\(\) () 0 2
+\(a\) (a) 0 3
+\() !
+(\) !
+p(a)rameter ABCparameterXYZ 3 12 4 5
+[pq](a)rameter ABCparameterXYZ 3 12 4 5
+
+; now try escaped brackets:
+- match_default bk_parens REG_BASIC
+\(a\) zzzaazz 3 4 3 4
+\(\) zzz 0 0 0 0
+\(\) "" 0 0 0 0
+\( !
+\) !
+\(aa !
+aa\) !
+() () 0 2
+(a) (a) 0 3
+(\) !
+\() !
+
+; now move on to "." wildcards
+- match_default normal REG_EXTENDED REG_STARTEND
+. a 0 1
+. \n 0 1
+. \r 0 1
+. \0 0 1
+- match_default normal match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE
+. a 0 1
+. \n -1 -1
+. \r -1 -1
+. \0 0 1
+- match_default normal match_not_dot_null match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE
+. \n -1 -1
+. \r -1 -1
+; this *WILL* produce an error from the POSIX API functions:
+- match_default normal match_not_dot_null match_not_dot_newline REG_EXTENDED REG_STARTEND REG_NEWLINE REG_NO_POSIX_TEST
+. \0 -1 -1
+
+
+;
+; now move on to the repetition ops,
+; starting with operator *
+- match_default normal REG_EXTENDED
+a* b 0 0
+ab* a 0 1
+ab* ab 0 2
+ab* sssabbbbbbsss 3 10
+ab*c* a 0 1
+ab*c* abbb 0 4
+ab*c* accc 0 4
+ab*c* abbcc 0 5
+*a !
+\<* !
+\>* !
+\n* \n\n 0 2
+\** ** 0 2
+\* * 0 1
+
+; now try operator +
+ab+ a -1 -1
+ab+ ab 0 2
+ab+ sssabbbbbbsss 3 10
+ab+c+ a -1 -1
+ab+c+ abbb -1 -1
+ab+c+ accc -1 -1
+ab+c+ abbcc 0 5
++a !
+\<+ !
+\>+ !
+\n+ \n\n 0 2
+\+ + 0 1
+\+ ++ 0 1
+\++ ++ 0 2
+- match_default normal bk_plus_qm REG_EXTENDED REG_NO_POSIX_TEST
++ + 0 1
+\+ !
+a\+ aa 0 2
+
+; now try operator ?
+- match_default normal REG_EXTENDED
+a? b 0 0
+ab? a 0 1
+ab? ab 0 2
+ab? sssabbbbbbsss 3 5
+ab?c? a 0 1
+ab?c? abbb 0 2
+ab?c? accc 0 2
+ab?c? abcc 0 3
+?a !
+\ !
+\>? !
+\n? \n\n 0 1
+\? ? 0 1
+\? ?? 0 1
+\?? ?? 0 1
+- match_default normal bk_plus_qm REG_EXTENDED REG_NO_POSIX_TEST
+? ? 0 1
+\? !
+a\? aa 0 1
+a\? b 0 0
+
+- match_default normal limited_ops
+a? a? 0 2
+a+ a+ 0 2
+a\? a? 0 2
+a\+ a+ 0 2
+
+; now try operator {}
+- match_default normal REG_EXTENDED
+a{2} a -1 -1
+a{2} aa 0 2
+a{2} aaa 0 2
+a{2,} a -1 -1
+a{2,} aa 0 2
+a{2,} aaaaa 0 5
+a{2,4} a -1 -1
+a{2,4} aa 0 2
+a{2,4} aaa 0 3
+a{2,4} aaaa 0 4
+a{2,4} aaaaa 0 4
+; spaces are now allowed inside {}
+"a{ 2 , 4 }" aaaaa 0 4
+a{} !
+"a{ }" !
+a{2 !
+a} !
+\{\} {} 0 2
+
+- match_default normal bk_braces
+a\{2\} a -1 -1
+a\{2\} aa 0 2
+a\{2\} aaa 0 2
+a\{2,\} a -1 -1
+a\{2,\} aa 0 2
+a\{2,\} aaaaa 0 5
+a\{2,4\} a -1 -1
+a\{2,4\} aa 0 2
+a\{2,4\} aaa 0 3
+a\{2,4\} aaaa 0 4
+a\{2,4\} aaaaa 0 4
+"a\{ 2 , 4 \}" aaaaa 0 4
+{} {} 0 2
+
+; now test the alternation operator |
+- match_default normal REG_EXTENDED
+a|b a 0 1
+a|b b 0 1
+a(b|c) ab 0 2 1 2
+a(b|c) ac 0 2 1 2
+a(b|c) ad -1 -1 -1 -1
+|c !
+c| !
+(|) !
+(a|) !
+(|a) !
+a\| a| 0 2
+- match_default normal limited_ops
+a| a| 0 2
+a\| a| 0 2
+| | 0 1
+- match_default normal bk_vbar REG_NO_POSIX_TEST
+a| a| 0 2
+a\|b a 0 1
+a\|b b 0 1
+
+; now test the set operator []
+- match_default normal REG_EXTENDED
+; try some literals first
+[abc] a 0 1
+[abc] b 0 1
+[abc] c 0 1
+[abc] d -1 -1
+[^bcd] a 0 1
+[^bcd] b -1 -1
+[^bcd] d -1 -1
+[^bcd] e 0 1
+a[b]c abc 0 3
+a[ab]c abc 0 3
+a[^ab]c adc 0 3
+a[]b]c a]c 0 3
+a[[b]c a[c 0 3
+a[-b]c a-c 0 3
+a[^]b]c adc 0 3
+a[^-b]c adc 0 3
+a[b-]c a-c 0 3
+a[b !
+a[] !
+
+; then some ranges
+[b-e] a -1 -1
+[b-e] b 0 1
+[b-e] e 0 1
+[b-e] f -1 -1
+[^b-e] a 0 1
+[^b-e] b -1 -1
+[^b-e] e -1 -1
+[^b-e] f 0 1
+a[1-3]c a2c 0 3
+a[3-1]c !
+a[1-3-5]c !
+a[1- !
+
+; and some classes
+a[[:alpha:]]c abc 0 3
+a[[:unknown:]]c !
+a[[: !
+a[[:alpha !
+a[[:alpha:] !
+a[[:alpha,:] !
+a[[:]:]]b !
+a[[:-:]]b !
+a[[:alph:]] !
+a[[:alphabet:]] !
+[[:alnum:]]+ -%@a0X_- 3 6
+[[:alpha:]]+ -%@aX_0- 3 5
+[[:blank:]]+ "a \tb" 1 4
+[[:cntrl:]]+ a\n\tb 1 3
+[[:digit:]]+ a019b 1 4
+[[:graph:]]+ " a%b " 1 4
+[[:lower:]]+ AabC 1 3
+; This test fails with STLPort, disable for now as this is a corner case anyway...
+;[[:print:]]+ "\na b\n" 1 4
+[[:punct:]]+ " %-&\t" 1 4
+[[:space:]]+ "a \n\t\rb" 1 5
+[[:upper:]]+ aBCd 1 3
+[[:xdigit:]]+ p0f3Cx 1 5
+
+; now test flag settings:
+- escape_in_lists REG_NO_POSIX_TEST
+[\n] \n 0 1
+- REG_NO_POSIX_TEST
+[\n] \n -1 -1
+[\n] \\ 0 1
+[[:class:] : 0 1
+[[:class:] [ 0 1
+[[:class:] c 0 1
+
+; line anchors
+- match_default normal REG_EXTENDED
+^ab ab 0 2
+^ab xxabxx -1 -1
+^ab xx\nabzz 3 5
+ab$ ab 0 2
+ab$ abxx -1 -1
+ab$ ab\nzz 0 2
+- match_default match_not_bol match_not_eol normal REG_EXTENDED REG_NOTBOL REG_NOTEOL
+^ab ab -1 -1
+^ab xxabxx -1 -1
+^ab xx\nabzz 3 5
+ab$ ab -1 -1
+ab$ abxx -1 -1
+ab$ ab\nzz 0 2
+
+; back references
+- match_default normal REG_EXTENDED
+a(b)\2c !
+a(b\1)c !
+a(b*)c\1d abbcbbd 0 7 1 3
+a(b*)c\1d abbcbd -1 -1
+a(b*)c\1d abbcbbbd -1 -1
+^(.)\1 abc -1 -1
+a([bc])\1d abcdabbd 4 8 5 6
+; strictly speaking this is at best ambiguous, at worst wrong, but this is what most
+; regex implementations will match.
+a(([bc])\2)*d abbccd 0 6 3 5 3 4
+
+a(([bc])\2)*d abbcbd -1 -1
+a((b)*\2)*d abbbd 0 5 1 4 2 3
+(ab*)[ab]*\1 ababaaa 0 7 0 1
+(a)\1bcd aabcd 0 5 0 1
+(a)\1bc*d aabcd 0 5 0 1
+(a)\1bc*d aabd 0 4 0 1
+(a)\1bc*d aabcccd 0 7 0 1
+(a)\1bc*[ce]d aabcccd 0 7 0 1
+^(a)\1b(c)*cd$ aabcccd 0 7 0 1 4 5
+
+;
+; characters by code:
+- match_default normal REG_EXTENDED REG_STARTEND
+\0101 A 0 1
+\00 \0 0 1
+\0 \0 0 1
+\0172 z 0 1
+
+;
+; word operators:
+\w a 0 1
+\w z 0 1
+\w A 0 1
+\w Z 0 1
+\w _ 0 1
+\w } -1 -1
+\w ` -1 -1
+\w [ -1 -1
+\w @ -1 -1
+; non-word:
+\W a -1 -1
+\W z -1 -1
+\W A -1 -1
+\W Z -1 -1
+\W _ -1 -1
+\W } 0 1
+\W ` 0 1
+\W [ 0 1
+\W @ 0 1
+; word start:
+\ abc 0 3
+abc\> abcd -1 -1
+abc\> abc\n 0 3
+abc\> abc:: 0 3
+; word boundary:
+\babcd " abcd" 2 6
+\bab cab -1 -1
+\bab "\nab" 1 3
+\btag ::tag 2 5
+abc\b abc 0 3
+abc\b abcd -1 -1
+abc\b abc\n 0 3
+abc\b abc:: 0 3
+; within word:
+\B ab 1 1
+a\Bb ab 0 2
+a\B ab 0 1
+a\B a -1 -1
+a\B "a " -1 -1
+
+;
+; buffer operators:
+\`abc abc 0 3
+\`abc \nabc -1 -1
+\`abc " abc" -1 -1
+abc\' abc 0 3
+abc\' abc\n -1 -1
+abc\' "abc " -1 -1
+
+;
+; extra escape sequences:
+\a \a 0 1
+\f \f 0 1
+\n \n 0 1
+\r \r 0 1
+\t \t 0 1
+\v \v 0 1
+
+
+;
+; now follows various complex expressions designed to try and bust the matcher:
+a(((b)))c abc 0 3 1 2 1 2 1 2
+a(b|(c))d abd 0 3 1 2 -1 -1
+a(b|(c))d acd 0 3 1 2 1 2
+a(b*|c)d abbd 0 4 1 3
+; just gotta have one DFA-buster, of course
+a[ab]{20} aaaaabaaaabaaaabaaaab 0 21
+; and an inline expansion in case somebody gets tricky
+a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab] aaaaabaaaabaaaabaaaab 0 21
+; and in case somebody just slips in an NFA...
+a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab](wee|week)(knights|night) aaaaabaaaabaaaabaaaabweeknights 0 31 21 24 24 31
+; one really big one
+1234567890123456789012345678901234567890123456789012345678901234567890 a1234567890123456789012345678901234567890123456789012345678901234567890b 1 71
+; fish for problems as brackets go past 8
+[ab][cd][ef][gh][ij][kl][mn] xacegikmoq 1 8
+[ab][cd][ef][gh][ij][kl][mn][op] xacegikmoq 1 9
+[ab][cd][ef][gh][ij][kl][mn][op][qr] xacegikmoqy 1 10
+[ab][cd][ef][gh][ij][kl][mn][op][q] xacegikmoqy 1 10
+; and as parentheses go past 9:
+(a)(b)(c)(d)(e)(f)(g)(h) zabcdefghi 1 9 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
+(a)(b)(c)(d)(e)(f)(g)(h)(i) zabcdefghij 1 10 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
+(a)(b)(c)(d)(e)(f)(g)(h)(i)(j) zabcdefghijk 1 11 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11
+(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k) zabcdefghijkl 1 12 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12
+(a)d|(b)c abc 1 3 -1 -1 1 2
+"_+((www)|(ftp)|(mailto)):_*" "_wwwnocolon _mailto:" 12 20 13 19 -1 -1 -1 -1 13 19
+
+; subtleties of matching
+a(b)?c\1d acd 0 3 -1 -1
+a(b?c)+d accd 0 4 2 3
+(wee|week)(knights|night) weeknights 0 10 0 3 3 10
+.* abc 0 3
+a(b|(c))d abd 0 3 1 2 -1 -1
+a(b|(c))d acd 0 3 1 2 1 2
+a(b*|c|e)d abbd 0 4 1 3
+a(b*|c|e)d acd 0 3 1 2
+a(b*|c|e)d ad 0 2 1 1
+a(b?)c abc 0 3 1 2
+a(b?)c ac 0 2 1 1
+a(b+)c abc 0 3 1 2
+a(b+)c abbbc 0 5 1 4
+a(b*)c ac 0 2 1 1
+(a|ab)(bc([de]+)f|cde) abcdef 0 6 0 1 1 6 3 5
+a([bc]?)c abc 0 3 1 2
+a([bc]?)c ac 0 2 1 1
+a([bc]+)c abc 0 3 1 2
+a([bc]+)c abcc 0 4 1 3
+a([bc]+)bc abcbc 0 5 1 3
+a(bb+|b)b abb 0 3 1 2
+a(bbb+|bb+|b)b abb 0 3 1 2
+a(bbb+|bb+|b)b abbb 0 4 1 3
+a(bbb+|bb+|b)bb abbb 0 4 1 2
+(.*).* abcdef 0 6 0 6
+(a*)* bc 0 0 0 0
+
+; do we get the right subexpression when it is used more than once?
+a(b|c)*d ad 0 2 -1 -1
+a(b|c)*d abcd 0 4 2 3
+a(b|c)+d abd 0 3 1 2
+a(b|c)+d abcd 0 4 2 3
+a(b|c?)+d ad 0 2 1 1
+a(b|c?)+d abcd 0 4 2 3
+a(b|c){0,0}d ad 0 2 -1 -1
+a(b|c){0,1}d ad 0 2 -1 -1
+a(b|c){0,1}d abd 0 3 1 2
+a(b|c){0,2}d ad 0 2 -1 -1
+a(b|c){0,2}d abcd 0 4 2 3
+a(b|c){0,}d ad 0 2 -1 -1
+a(b|c){0,}d abcd 0 4 2 3
+a(b|c){1,1}d abd 0 3 1 2
+a(b|c){1,2}d abd 0 3 1 2
+a(b|c){1,2}d abcd 0 4 2 3
+a(b|c){1,}d abd 0 3 1 2
+a(b|c){1,}d abcd 0 4 2 3
+a(b|c){2,2}d acbd 0 4 2 3
+a(b|c){2,2}d abcd 0 4 2 3
+a(b|c){2,4}d abcd 0 4 2 3
+a(b|c){2,4}d abcbd 0 5 3 4
+a(b|c){2,4}d abcbcd 0 6 4 5
+a(b|c){2,}d abcd 0 4 2 3
+a(b|c){2,}d abcbd 0 5 3 4
+a(b+|((c)*))+d abd 0 3 1 2 -1 -1 -1 -1
+a(b+|((c)*))+d abcd 0 4 2 3 2 3 2 3
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_NOSPEC literal
+\**?/{} \\**?/{} 0 7
+
+- match_default normal REG_EXTENDED REG_NO_POSIX_TEST ; we disable POSIX testing because it can't handle escapes in sets
+; try to match C++ syntax elements:
+; line comment:
+//[^\n]* "++i //here is a line comment\n" 4 28
+; block comment:
+/\*([^*]|\*+[^*/])*\*+/ "/* here is a block comment */" 0 29 26 27
+/\*([^*]|\*+[^*/])*\*+/ "/**/" 0 4 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/***/" 0 5 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/****/" 0 6 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/*****/" 0 7 -1 -1
+/\*([^*]|\*+[^*/])*\*+/ "/*****/*/" 0 7 -1 -1
+; preprocessor directives:
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol" 0 19 -1 -1
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) #x" 0 25 -1 -1
+^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) \\ \r\n foo();\\\r\n printf(#x);" 0 53 28 42
+; literals:
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFF 0 4 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 35 0 2 0 2 -1 -1 0 2 -1 -1 -1 -1 -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFu 0 5 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFL 0 5 0 4 0 4 -1 -1 4 5 -1 -1 -1 -1
+((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFFFFFFFFFFFFFFFuint64 0 24 0 18 0 18 -1 -1 19 24 19 24 22 24
+; strings:
+'([^\\']|\\.)*' '\\x3A' 0 6 4 5
+'([^\\']|\\.)*' '\\'' 0 4 1 3
+'([^\\']|\\.)*' '\\n' 0 4 1 3
+
+; now try and test some unicode specific characters:
+- match_default normal REG_PERL REG_UNICODE_ONLY
+[[:unicode:]]+ a\0300\0400z 1 3
+[\x10-\xff] \39135\12409 -1 -1
+[\01-\05]{5} \36865\36865\36865\36865\36865 -1 -1
+
+; finally try some case insensitive matches:
+- match_default normal REG_EXTENDED REG_ICASE
+; upper and lower have no meaning here, so they fail; however, these
+; may compile with other libraries...
+;[[:lower:]] !
+;[[:upper:]] !
+0123456789@abcdefghijklmnopqrstuvwxyz\[\\\]\^_`ABCDEFGHIJKLMNOPQRSTUVWXYZ\{\|\} 0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\} 0 72
+
+; known and suspected bugs:
+- match_default normal REG_EXTENDED
+\( ( 0 1
+\) ) 0 1
+\$ $ 0 1
+\^ ^ 0 1
+\. . 0 1
+\* * 0 1
+\+ + 0 1
+\? ? 0 1
+\[ [ 0 1
+\] ] 0 1
+\| | 0 1
+\\ \\ 0 1
+# # 0 1
+\# # 0 1
+a- a- 0 2
+\- - 0 1
+\{ { 0 1
+\} } 0 1
+0 0 0 1
+1 1 0 1
+9 9 0 1
+b b 0 1
+B B 0 1
+< < 0 1
+> > 0 1
+w w 0 1
+W W 0 1
+` ` 0 1
+' ' 0 1
+\n \n 0 1
+, , 0 1
+a a 0 1
+f f 0 1
+n n 0 1
+r r 0 1
+t t 0 1
+v v 0 1
+c c 0 1
+x x 0 1
+: : 0 1
+(\.[[:alnum:]]+){2} "w.a.b " 1 5 3 5
+
+- match_default normal REG_EXTENDED REG_ICASE
+a A 0 1
+A a 0 1
+[abc]+ abcABC 0 6
+[ABC]+ abcABC 0 6
+[a-z]+ abcABC 0 6
+[A-Z]+ abzANZ 0 6
+[a-Z]+ abzABZ 0 6
+[A-z]+ abzABZ 0 6
+[[:lower:]]+ abyzABYZ 0 8
+[[:upper:]]+ abzABZ 0 6
+[[:word:]]+ abcZZZ 0 6
+[[:alpha:]]+ abyzABYZ 0 8
+[[:alnum:]]+ 09abyzABYZ 0 10
+
+; updated tests for version 2:
+- match_default normal REG_EXTENDED
+\x41 A 0 1
+\xff \255 0 1
+\xFF \255 0 1
+- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
+\c@ \0 0 1
+- match_default normal REG_EXTENDED
+\cA \1 0 1
+\cz \58 0 1
+\c= !
+\c? !
+=: =: 0 2
+
+; word start:
+[[:<:]]abcd " abcd" 2 6
+[[:<:]]ab cab -1 -1
+[[:<:]]ab "\nab" 1 3
+[[:<:]]tag ::tag 2 5
+;word end:
+abc[[:>:]] abc 0 3
+abc[[:>:]] abcd -1 -1
+abc[[:>:]] abc\n 0 3
+abc[[:>:]] abc:: 0 3
+
+; collating elements and rewritten set code:
+- match_default normal REG_EXTENDED REG_STARTEND
+[[.zero.]] 0 0 1
+[[.one.]] 1 0 1
+[[.two.]] 2 0 1
+[[.three.]] 3 0 1
+[[.a.]] baa 1 2
+[[.right-curly-bracket.]] } 0 1
+[[.NUL.]] \0 0 1
+[[:<:]z] !
+[a[:>:]] !
+[[=a=]] a 0 1
+[[=right-curly-bracket=]] } 0 1
+- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
+[[.A.]] A 0 1
+[[.A.]] a 0 1
+[[.A.]-b]+ AaBb 0 4
+[A-[.b.]]+ AaBb 0 4
+[[.a.]-B]+ AaBb 0 4
+[a-[.B.]]+ AaBb 0 4
+- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
+[\x61] a 0 1
+[\x61-c]+ abcd 0 3
+[a-\x63]+ abcd 0 3
+- match_default normal REG_EXTENDED REG_STARTEND
+[[.a.]-c]+ abcd 0 3
+[a-[.c.]]+ abcd 0 3
+[[:alpha:]-a] !
+[a-[:alpha:]] !
+
+; try multi-character ligatures:
+[[.ae.]] ae 0 2
+[[.ae.]] aE -1 -1
+[[.AE.]] AE 0 2
+[[.Ae.]] Ae 0 2
+[[.ae.]-b] a -1 -1
+[[.ae.]-b] b 0 1
+[[.ae.]-b] ae 0 2
+[a-[.ae.]] a 0 1
+[a-[.ae.]] b -1 -1
+[a-[.ae.]] ae 0 2
+- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE
+[[.ae.]] AE 0 2
+[[.ae.]] Ae 0 2
+[[.AE.]] Ae 0 2
+[[.Ae.]] aE 0 2
+[[.AE.]-B] a -1 -1
+[[.Ae.]-b] b 0 1
+[[.Ae.]-b] B 0 1
+[[.ae.]-b] AE 0 2
+
+- match_default normal REG_EXTENDED REG_STARTEND
+;extended perl style escape sequences:
+\e \27 0 1
+\x1b \27 0 1
+\x{1b} \27 0 1
+\x{} !
+\x{ !
+\x} !
+\x !
+\x{yy !
+\x{1b !
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_NO_POSIX_TEST
+\l+ ABabcAB 2 5
+[\l]+ ABabcAB 2 5
+[a-\l] !
+[\l-a] !
+[\L] !
+\L+ abABCab 2 5
+\u+ abABCab 2 5
+[\u]+ abABCab 2 5
+[\U] !
+\U+ ABabcAB 2 5
+\d+ ab012ab 2 5
+[\d]+ ab012ab 2 5
+[\D] !
+\D+ 01abc01 2 5
+\s+ "ab ab" 2 5
+[\s]+ "ab ab" 2 5
+[\S] !
+\S+ " abc " 2 5
+- match_default normal REG_EXTENDED REG_STARTEND
+\Qabc !
+\Qabc\E abcd 0 3
+\Qabc\Ed abcde 0 4
+\Q+*?\\E +*?\\ 0 4
+
+\C+ abcde 0 5
+\X+ abcde 0 5
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_UNICODE_ONLY
+\X+ a\768\769 0 3
+\X+ \2309\2307 0 2 ;DEVANAGARI script
+\X+ \2489\2494 0 2 ;BENGALI script
+
+- match_default normal REG_EXTENDED REG_STARTEND
+\Aabc abc 0 3
+\Aabc aabc -1 -1
+abc\z abc 0 3
+abc\z abcd -1 -1
+abc\Z abc\n\n 0 3
+abc\Z abc 0 3
+
+
+\Gabc abc 0 3
+\Gabc dabcd -1 -1
+a\Gbc abc -1 -1
+a\Aab abc -1 -1
+
+;
+; now test grep,
+; basically check all our restart types - line, word, etc
+; checking each one for null and non-null matches.
+;
+- match_default normal REG_EXTENDED REG_STARTEND REG_GREP
+a " a a a aa" 1 2 3 4 5 6 7 8 8 9
+a+b+ "aabaabbb ab" 0 3 3 8 9 11
+a(b*|c|e)d adabbdacd 0 2 2 6 6 9
+a "\na\na\na\naa" 1 2 3 4 5 6 7 8 8 9
+
+^ " \n\n \n\n\n" 0 0 4 4 5 5 8 8 9 9 10 10
+^ab "ab \nab ab\n" 0 2 5 7
+^[^\n]*\n " \n \n\n \n" 0 4 4 7 7 8 8 11
+\ <123><><><>
+[[:digit:]]* 123ab1 <$0> <123><><><1>
+
+; and now escapes:
+a+ "...aaa,,," $x "$x"
+a+ "...aaa,,," \a "\a"
+a+ "...aaa,,," \f "\f"
+a+ "...aaa,,," \n "\n"
+a+ "...aaa,,," \r "\r"
+a+ "...aaa,,," \t "\t"
+a+ "...aaa,,," \v "\v"
+
+a+ "...aaa,,," \x21 "!"
+a+ "...aaa,,," \x{21} "!"
+a+ "...aaa,,," \c@ \0
+a+ "...aaa,,," \e \27
+a+ "...aaa,,," \0101 A
+a+ "...aaa,,," (\0101) A
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_sed format_no_copy
+(a+)(b+) ...aabb,, \0 aabb
+(a+)(b+) ...aabb,, \1 aa
+(a+)(b+) ...aabb,, \2 bb
+(a+)(b+) ...aabb,, & aabb
+(a+)(b+) ...aabb,, $ $
+(a+)(b+) ...aabb,, $1 $1
+(a+)(b+) ...aabb,, ()?: ()?:
+(a+)(b+) ...aabb,, \\ \\
+(a+)(b+) ...aabb,, \& &
+
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_perl format_no_copy
+(a+)(b+) ...aabb,, $0 aabb
+(a+)(b+) ...aabb,, $1 aa
+(a+)(b+) ...aabb,, $2 bb
+(a+)(b+) ...aabb,, $& aabb
+(a+)(b+) ...aabb,, & &
+(a+)(b+) ...aabb,, \0 \0
+(a+)(b+) ...aabb,, ()?: ()?:
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE
+; move to copying unmatched data:
+a+ "...aaa,,," bbb "...bbb,,,"
+a+(b+) "...aaabb,,," $1 "...bb,,,"
+a+(b+) "...aaabb,,,ab*abbb?" $1 "...bb,,,b*bbb?"
+
+(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A)(?2B) "...AB,,,AB*AB?"
+(a+)|(b+) "...aaabb,,,ab*abbb?" ?1A:B "...AB,,,AB*AB?"
+(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A:B)C "...ACBC,,,ACBC*ACBC?"
+(a+)|(b+) "...aaabb,,,ab*abbb?" ?1:B "...B,,,B*B?"
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_MERGE format_first_only
+; move to copying unmatched data, but replace first occurrence only:
+a+ "...aaa,,," bbb "...bbb,,,"
+a+(b+) "...aaabb,,," $1 "...bb,,,"
+a+(b+) "...aaabb,,,ab*abbb?" $1 "...bb,,,ab*abbb?"
+(a+)|(b+) "...aaabb,,,ab*abbb?" (?1A)(?2B) "...Abb,,,ab*abbb?"
+
+;
+; changes to newline handling with 2.11:
+;
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_GREP
+
+^. " \n \r\n " 0 1 3 4 7 8
+.$ " \n \r\n " 1 2 4 5 8 9
+
+- match_default normal REG_EXTENDED REG_STARTEND REG_GREP REG_UNICODE_ONLY
+^. " \8232 \8233 " 0 1 3 4 5 6
+.$ " \8232 \8233 " 1 2 3 4 6 7
+
+;
+; non-greedy repeats added 21/04/00
+- match_default normal REG_EXTENDED
+a** !
+a*? aa 0 0
+a?? aa 0 0
+a++ !
+a+? aa 0 1
+a{1,3}{1} !
+a{1,3}? aaa 0 1
+\w+?w ...ccccccwcccccw 3 10
+\W+\w+?w ...ccccccwcccccw 0 10
+abc|\w+? abd 0 1
+abc|\w+? abcd 0 3
+<\s*tag[^>]*>(.*?)<\s*/tag\s*> " <tag>here is some text</tag> " 1 29 6 23
+<\s*tag[^>]*>(.*?)<\s*/tag\s*> " < tag attr=\"something\">here is some text< /tag > " 1 49 24 41
+
+;
+; non-marking parentheses added 25/04/00
+- match_default normal REG_EXTENDED
+(?:abc)+ xxabcabcxx 2 8
+(?:a+)(b+) xaaabbbx 1 7 4 7
+(a+)(?:b+) xaaabbba 1 7 1 4
+(?:(a+)b+) xaaabbba 1 7 1 4
+(?:a+(b+)) xaaabbba 1 7 4 7
+a+(?#b+)b+ xaaabbba 1 7
+(a)(?:b|$) ab 0 2 0 1
+(a)(?:b|$) a 0 1 0 1
+
+
+;
+; try some partial matches:
+- match_partial match_default normal REG_EXTENDED REG_NO_POSIX_TEST
+(xyz)(.*)abc xyzaaab -1 -1 0 3 3 7
+(xyz)(.*)abc xyz -1 -1 0 3 3 3
+(xyz)(.*)abc xy -1 -1 -1 -1 -1 -1
+
+;
+; forward lookahead asserts added 21/01/02
+- match_default normal REG_EXTENDED REG_NO_POSIX_TEST
+((?:(?!a|b)\w)+)(\w+) " xxxabaxxx " 2 11 2 5 5 11
+
+/\*(?:(?!\*/).)*\*/ " /**/ " 2 6
+/\*(?:(?!\*/).)*\*/ " /***/ " 2 7
+/\*(?:(?!\*/).)*\*/ " /********/ " 2 12
+/\*(?:(?!\*/).)*\*/ " /* comment */ " 2 15
+
+<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)<\s*/\s*a\s*> " here " 1 24 16 20
+<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)<\s*/\s*a\s*> " here< / a > " 1 28 16 20
+
+<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)(?=<\s*/\s*a\s*>) " here " 1 20 16 20
+<\s*a[^>]*>((?:(?!<\s*/\s*a\s*>).)*)(?=<\s*/\s*a\s*>) " here< / a > " 1 20 16 20
+
+; filename matching:
+^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ command.com 0 11
+^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ PRN -1 -1
+^(?!^(?:PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(?:\..+)?$)[^\x00-\x1f\\?*:\"|/]+$ COM2 -1 -1
+
+; password checking:
+^(?=.*\d).{4,8}$ abc3 0 4
+^(?=.*\d).{4,8}$ abc3def4 0 8
+^(?=.*\d).{4,8}$ ab2 -1 -1
+^(?=.*\d).{4,8}$ abcdefg -1 -1
+^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ abc3 -1 -1
+^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ abC3 0 4
+^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{4,8}$ ABCD3 -1 -1
+
+
+
+
+
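The test lines above appear to follow a simple layout, inferred from the data rather than documented here. Each line gives a pattern and the text to search. These are followed by the start and end offsets of the whole match and then those of each marked sub-expression, with -1 -1 meaning "no match"; a lone ! seems to mark a pattern that must be rejected. As a rough cross-check, the sketch below replays a few of these rows through std::regex with its default ECMAScript grammar. std::regex is only a stand-in, not the library under test, so POSIX-specific rows (collating elements, [[:<:]], sed-style formats) are left out. So are rows whose results depend on runs of whitespace. match_offsets is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs std::regex_search (ECMAScript grammar) and
// returns the start/end offsets of the whole match followed by those of
// each marked sub-expression. It uses -1 -1 for "no match" or for a group
// that did not participate, mirroring the result columns of the test lines.
std::vector<long> match_offsets(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) return {-1, -1};

    std::vector<long> offsets;
    for (std::size_t i = 0; i < m.size(); ++i) {
        if (m[i].matched) {
            const long start = static_cast<long>(m.position(i));
            offsets.push_back(start);
            offsets.push_back(start + static_cast<long>(m.length(i)));
        } else {
            offsets.push_back(-1);
            offsets.push_back(-1);
        }
    }
    return offsets;
}
```

For example, match_offsets("(?:a+)(b+)", "xaaabbbx") returns {1, 7, 4, 7}, the same figures as the corresponding non-marking-parenthesis row above.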
diff --git a/traits_class_ref.htm b/traits_class_ref.htm
deleted file mode 100644
index 669f5a87..00000000
--- a/traits_class_ref.htm
+++ /dev/null
@@ -1,1016 +0,0 @@
-
-
-
-
-
-
-
- regex++ traits-class reference
-
-
-
-
-
-
-
-
- Regex++, Traits Class
- Reference.
- Copyright (c) 1998-2001
- Dr John Maddock
- Permission to use, copy, modify,
- distribute and sell this software and its documentation
- for any purpose is hereby granted without fee, provided
- that the above copyright notice appear in all copies and
- that both that copyright notice and this permission
- notice appear in supporting documentation. Dr John
- Maddock makes no representations about the suitability of
- this software for any purpose. It is provided "as is"
- without express or implied warranty.
-
-
-
-
-
-
-This section describes the traits class requirements of the
-reg_expression template class. These requirements are somewhat
-complex (sorry) and subject to change as users ask for new
-features; however, I will try to keep them stable for a while, and
-ideally the requirements should lessen rather than increase.
-
-The reg_expression traits classes encapsulate both the
-properties of a character type, and the properties of the locale
-associated with that type. The associated locale may be defined
-at run-time (via std::locale), or hard-coded into the traits
-class and determined at compile time.
-
-The following example class illustrates the interface required
-by a "typical" traits class for use with class
-reg_expression:
-
-
-class mytraits
-{
- typedef implementation_defined char_type;
- typedef implementation_defined uchar_type;
- typedef implementation_defined size_type;
- typedef implementation_defined string_type;
- typedef implementation_defined locale_type;
- typedef implementation_defined uint32_t;
- struct sentry
- {
- sentry(const mytraits&);
- operator void*() { return this; }
- };
-
- enum char_syntax_type
- {
- syntax_char = 0,
- syntax_open_bracket = 1, // (
- syntax_close_bracket = 2, // )
- syntax_dollar = 3, // $
- syntax_caret = 4, // ^
- syntax_dot = 5, // .
- syntax_star = 6, // *
- syntax_plus = 7, // +
- syntax_question = 8, // ?
- syntax_open_set = 9, // [
- syntax_close_set = 10, // ]
- syntax_or = 11, // |
- syntax_slash = 12, //
- syntax_hash = 13, // #
- syntax_dash = 14, // -
- syntax_open_brace = 15, // {
- syntax_close_brace = 16, // }
- syntax_digit = 17, // 0-9
- syntax_b = 18, // for \b
- syntax_B = 19, // for \B
- syntax_left_word = 20, // for \<
- syntax_right_word = 21, // for \>
- syntax_w = 22, // for \w
- syntax_W = 23, // for \W
- syntax_start_buffer = 24, // for \`
- syntax_end_buffer = 25, // for \'
- syntax_newline = 26, // for newline alt
- syntax_comma = 27, // for {x,y}
-
- syntax_a = 28, // for \a
- syntax_f = 29, // for \f
- syntax_n = 30, // for \n
- syntax_r = 31, // for \r
- syntax_t = 32, // for \t
- syntax_v = 33, // for \v
- syntax_x = 34, // for \xdd
- syntax_c = 35, // for \cx
- syntax_colon = 36, // for [:...:]
- syntax_equal = 37, // for [=...=]
-
- // perl ops:
- syntax_e = 38, // for \e
- syntax_l = 39, // for \l
- syntax_L = 40, // for \L
- syntax_u = 41, // for \u
- syntax_U = 42, // for \U
- syntax_s = 43, // for \s
- syntax_S = 44, // for \S
- syntax_d = 45, // for \d
- syntax_D = 46, // for \D
- syntax_E = 47, // for \Q\E
- syntax_Q = 48, // for \Q\E
- syntax_X = 49, // for \X
- syntax_C = 50, // for \C
- syntax_Z = 51, // for \Z
- syntax_G = 52, // for \G
- syntax_bang = 53, // reserved for future use '!'
- syntax_and = 54, // reserved for future use '&'
- };
-
- enum{
- char_class_none = 0,
- char_class_alpha,
- char_class_cntrl,
- char_class_digit,
- char_class_lower,
- char_class_punct,
- char_class_space,
- char_class_upper,
- char_class_xdigit,
- char_class_blank,
- char_class_unicode,
- char_class_alnum,
- char_class_graph,
- char_class_print,
- char_class_word
- };
-
- static size_t length(const char_type* p);
- unsigned int syntax_type(size_type c)const;
- char_type translate(char_type c, bool icase)const;
- void transform(string_type& out, const string_type& in)const;
- void transform_primary(string_type& out, const string_type& in)const;
- bool is_separator(char_type c)const;
- bool is_combining(char_type)const;
- bool is_class(char_type c, uint32_t f)const;
- int toi(char_type c)const;
- int toi(const char_type*& first, const char_type* last, int radix)const;
- uint32_t lookup_classname(const char_type* first, const char_type* last)const;
- bool lookup_collatename(string_type& buf, const char_type* first, const char_type* last)const;
- locale_type imbue(locale_type l);
- locale_type getloc()const;
- std::string error_string(unsigned id)const;
-
- mytraits();
- ~mytraits();
-};
-
-
-The member types required by a traits class are defined as
-follows:
-
-
-
-
-
- Member
- name
- Description
-
-
-
-
-
- char_type
- The
- character type encapsulated by this traits class; it must be
- a POD type convertible to uchar_type.
-
-
-
-
- uchar_type
-
- The
- unsigned type corresponding to char_type; it must be
- convertible to size_type.
-
-
-
-
- size_type
- An
- unsigned integral type, with at least as much precision
- as uchar_type.
-
-
-
-
- string_type
-
- A type
- that offers the same facilities as std::basic_string<char_type>.
- This is used for collating elements and sort strings; if
- char_type has no locale-dependent collation (it is not a
- "character"), then it could be something
- simpler than std::basic_string.
-
-
-
-
- locale_type
-
- A type
- that encapsulates the locale used by the traits class,
- probably std::locale, but it could be a platform-specific
- type or a dummy type if per-instance locales are not
- supported by the traits class.
-
-
-
-
- uint32_t
- An
- unsigned integral type with at least 32 bits of
- precision, used as a bitmask type for character
- classification.
-
-
-
-
- sentry
- A class or
- struct type which is constructible from an instance of
- the traits class, and is convertible to void*. An
- instance of type sentry will be constructed before
- compiling each regular expression; it provides an
- opportunity to carry out prefix/suffix operations on the
- traits class. For example, a traits class that
- encapsulates the global locale can use this as an
- opportunity to synchronize with the global locale (by
- updating any cached data).
-
-
-
-
-
-
- The following member constants are used to represent the
-locale-independent syntax of a regular expression; the member
-function syntax_type returns one of these values and is
-used to convert a locale-dependent regular expression into a
-locale-independent sequence of tokens.
-
-
-
-
-
- Member
- constant
- English
- language representation
-
-
-
-
- syntax_char
-
- All non-special
- characters.
-
-
-
-
- syntax_open_bracket
-
- (
-
-
-
-
- syntax_close_bracket
-
- )
-
-
-
-
- syntax_dollar
-
- $
-
-
-
-
- syntax_caret
-
- ^
-
-
-
-
- syntax_dot
-
- .
-
-
-
-
- syntax_star
-
- *
-
-
-
-
- syntax_plus
-
- +
-
-
-
-
- syntax_question
-
- ?
-
-
-
-
- syntax_open_set
-
- [
-
-
-
-
- syntax_close_set
-
- ]
-
-
-
-
- syntax_or
-
- |
-
-
-
-
- syntax_slash
-
- \
-
-
-
-
- syntax_hash
-
- #
-
-
-
-
- syntax_dash
-
- -
-
-
-
-
- syntax_open_brace
-
- {
-
-
-
-
- syntax_close_brace
-
- }
-
-
-
-
- syntax_digit
-
- 0123456789
-
-
-
-
-
- syntax_b
-
- b
-
-
-
-
- syntax_B
-
- B
-
-
-
-
- syntax_left_word
-
- <
-
-
-
-
-
- syntax_right_word
-
- >
-
-
-
-
- syntax_w
-
- w
-
-
-
-
- syntax_W
-
- W
-
-
-
-
- syntax_start_buffer
-
- `
-
-
-
-
- syntax_end_buffer
-
- '
-
-
-
-
- syntax_newline
-
- \n
-
-
-
-
- syntax_comma
-
- ,
-
-
-
-
- syntax_a
-
- a
-
-
-
-
- syntax_f
-
- f
-
-
-
-
- syntax_n
-
- n
-
-
-
-
- syntax_r
-
- r
-
-
-
-
- syntax_t
-
- t
-
-
-
-
- syntax_v
-
- v
-
-
-
-
- syntax_x
-
- x
-
-
-
-
- syntax_c
-
- c
-
-
-
-
- syntax_colon
-
- :
-
-
-
-
- syntax_equal
-
- =
-
-
-
-
- syntax_e
-
- e
-
-
-
-
- syntax_l
-
- l
-
-
-
-
- syntax_L
-
- L
-
-
-
-
- syntax_u
-
- u
-
-
-
-
- syntax_U
-
- U
-
-
-
-
- syntax_s
-
- s
-
-
-
-
- syntax_S
-
- S
-
-
-
-
- syntax_d
-
- d
-
-
-
-
- syntax_D
-
- D
-
-
-
-
- syntax_E
-
- E
-
-
-
-
- syntax_Q
-
- Q
-
-
-
-
- syntax_X
-
- X
-
-
-
-
- syntax_C
-
- C
-
-
-
-
- syntax_Z
-
- Z
-
-
-
-
- syntax_G
-
- G
-
-
-
-
- syntax_bang
-
- !
-
-
-
-
- syntax_and
-
- &
-
-
-
-
-
-The following member constants are used to represent
-particular character classifications:
-
-
-
-
-
- Member
- constant
- Description
-
-
-
-
-
- char_class_none
-
- No
- classification, must be zero.
-
-
-
-
- char_class_alpha
-
- All
- alphabetic characters.
-
-
-
-
- char_class_cntrl
-
- All
- control characters.
-
-
-
-
- char_class_digit
-
- All
- decimal digits.
-
-
-
-
- char_class_lower
-
- All lower
- case characters.
-
-
-
-
- char_class_punct
-
- All
- punctuation characters.
-
-
-
-
- char_class_space
-
- All white-space
- characters.
-
-
-
-
- char_class_upper
-
- All upper
- case characters.
-
-
-
-
- char_class_xdigit
-
- All
- hexadecimal digit characters.
-
-
-
-
- char_class_blank
-
- All blank
- characters (space + tab).
-
-
-
-
- char_class_unicode
-
- All
- extended Unicode characters: those that cannot be
- represented as a single narrow character.
-
-
-
-
- char_class_alnum
-
- All alpha-numeric
- characters.
-
-
-
-
- char_class_graph
-
- All
- graphic characters.
-
-
-
-
- char_class_print
-
- All
- printable characters.
-
-
-
-
- char_class_word
-
- All word
- characters (alphanumeric characters + the underscore).
-
-
-
-
-The following member functions are required by all regular
-expression traits classes; those members that are declared here
-as const could be declared static instead if the
-class does not contain instance data:
-
-
-
-
-
- Member
- function
- Description
-
-
-
-
-
- static
- size_t length(const char_type* p);
- Returns
- the length of the null-terminated string p.
-
-
-
-
- unsigned
- int syntax_type(size_type c)const;
- Converts
- an input character into a locale-independent token (one
- of the syntax_xxx member constants). Called when parsing
- the regular expression into a locale-independent parse
- tree. Example: in English-language regular
- expressions, we would use "[[:word:]]" to
- represent the character class of all word characters, and
- "\w" as a shortcut for this. Consequently
- syntax_type('w') returns syntax_w. In French-language
- regular expressions, we would use "[[:mot:]]"
- in place of "[[:word:]]" and hence "\m"
- in place of "\w"; therefore, it is syntax_type('m')
- that returns syntax_w.
-
-
-
-
-
- char_type
- translate(char_type c, bool icase)const;
- Translates
- an input character into a unique identifier that
- represents the equivalence class to which that character
- belongs. If icase is true, then the returned value is
- insensitive to case. [An equivalence class is
- the set of all characters that must be treated as being
- equivalent to each other.]
-
-
-
-
-
- void
- transform(string_type& out, const string_type& in)const;
-
- Transforms
- the string in into a locale-dependent sort key
- and stores the result in out.
-
-
-
-
- void
- transform_primary(string_type& out, const
- string_type& in)const;
- Transforms
- the string in into a locale-dependent primary
- sort key and stores the result in out.
-
-
-
-
- bool
- is_separator(char_type c)const;
- Returns
- true only if c is a line separator.
-
-
-
-
- bool
- is_combining(char_type c)const;
- Returns
- true only if c is a unicode combining character.
-
-
-
-
- bool
- is_class(char_type c, uint32_t f)const;
- Returns
- true only if c is a member of one of the character
- classes represented by the bitmap f.
-
-
-
-
- int toi(char_type
- c)const;
- Converts
- the character c to a decimal integer. [Precondition:
- is_class(c,char_class_digit)==true]
-
-
-
-
-
- int toi(const
- char_type*& first, const char_type* last, int radix)const;
-
- Converts
- the string [first-last) into an integral value using base
- radix. Stops when it finds the first non-digit
- character, and sets first to point to that
- character. [Precondition: is_class(*first,char_class_digit)==true]
-
-
-
-
-
-
- uint32_t
- lookup_classname(const char_type* first, const char_type*
- last)const;
- Returns
- the bitmap representing the character class [first-last),
- or char_class_none if [first-last) is not recognized as a
- character class name.
-
-
-
-
- bool
- lookup_collatename(string_type& buf, const char_type*
- first, const char_type* last)const;
- If the
- sequence [first-last) is the name of a known collating
- element, then stores the collating element in buf and
- returns true; otherwise, it returns false.
-
-
-
-
- locale_type
- imbue(locale_type l);
- Imbues
- the class with the locale l.
-
-
-
-
- locale_type
- getloc()const;
- Returns
- the traits-class locale.
-
-
-
-
- std::string
- error_string(unsigned id)const;
- Returns
- the locale-dependent error-string associated with the
- error number id. The parameter id is one of
- the REG_XXX error codes described by the POSIX standard
- and defined in <boost/cregex.hpp>.
-
-
-
-
- mytraits();
-
- Constructor.
-
-
-
-
-
- ~ mytraits();
-
- Destructor.
-
-
-
-
-
-There is also an example of a custom traits class supplied by Christian Engström;
-see iso8859_1_regex_traits.cpp
-and iso8859_1_regex_traits.hpp.
-This example inherits from c_regex_traits and provides its own
-implementations of two locale-specific functions. This ensures
-that the class gives consistent behaviour (albeit tied to one
-locale) on all platforms. A fuller description by the author is
-available in the readme file.
-
-
-
-
-Copyright Dr
-John Maddock 1998-2001 all rights reserved.
-
-
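The traits requirements above can be illustrated with a small, self-contained class. This sketch is hypothetical. ascii_traits_sketch, mask_type and digit_value are illustrative names, not part of regex++. It implements only a subset of the members (length, syntax_type, translate, is_class, toi and lookup_classname) for plain char, using "C"-locale <cctype> classification. It also assumes the char_class_xxx constants are single-bit flags, since is_class and lookup_classname are described as working with a bitmap and the listing above shows only their names. translate approximates an equivalence class by simple case folding, which is a simplification.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical sketch of a subset of the traits-class requirements for
// plain char, using "C"-locale <cctype> classification. Not regex++ code.
class ascii_traits_sketch {
public:
    typedef char char_type;
    typedef unsigned int size_type;
    typedef std::string string_type;
    typedef std::uint32_t mask_type;  // stands in for the documented uint32_t

    // A few syntax tokens; values copied from the mytraits listing.
    enum char_syntax_type {
        syntax_char = 0, syntax_open_bracket = 1, syntax_close_bracket = 2,
        syntax_star = 6, syntax_slash = 12, syntax_digit = 17, syntax_w = 22
    };

    // Assumption: classification constants are single-bit flags.
    enum : mask_type {
        char_class_none  = 0,
        char_class_alpha = 1u << 0,
        char_class_digit = 1u << 1,
        char_class_space = 1u << 2,
        char_class_upper = 1u << 3,
        char_class_lower = 1u << 4,
        char_class_word  = 1u << 5  // alphanumerics plus '_'
    };

    static std::size_t length(const char_type* p) { return std::strlen(p); }

    // English-language mapping: 'w' (as in "\w") becomes syntax_w.
    unsigned int syntax_type(size_type c) const {
        switch (c) {
            case '(':  return syntax_open_bracket;
            case ')':  return syntax_close_bracket;
            case '*':  return syntax_star;
            case '\\': return syntax_slash;
            case 'w':  return syntax_w;
            default:   return (c >= '0' && c <= '9') ? syntax_digit : syntax_char;
        }
    }

    // Case folding approximates an equivalence-class identifier.
    char_type translate(char_type c, bool icase) const {
        if (!icase) return c;
        return static_cast<char_type>(std::tolower(static_cast<unsigned char>(c)));
    }

    // True if c belongs to any class whose bit is set in f.
    bool is_class(char_type c, mask_type f) const {
        const unsigned char u = static_cast<unsigned char>(c);
        return ((f & char_class_alpha) && std::isalpha(u)) ||
               ((f & char_class_digit) && std::isdigit(u)) ||
               ((f & char_class_space) && std::isspace(u)) ||
               ((f & char_class_upper) && std::isupper(u)) ||
               ((f & char_class_lower) && std::islower(u)) ||
               ((f & char_class_word) && (std::isalnum(u) || c == '_'));
    }

    // Accumulates digits in base radix (up to 16) and leaves first
    // pointing at the first character that is not a valid digit.
    int toi(const char_type*& first, const char_type* last, int radix) const {
        int value = 0;
        while (first != last) {
            const int d = digit_value(*first);
            if (d < 0 || d >= radix) break;
            value = value * radix + d;
            ++first;
        }
        return value;
    }

    // Maps a class name such as "alpha" to its bitmap, else char_class_none.
    mask_type lookup_classname(const char_type* first, const char_type* last) const {
        const string_type name(first, last);
        if (name == "alpha") return char_class_alpha;
        if (name == "digit") return char_class_digit;
        if (name == "space") return char_class_space;
        if (name == "upper") return char_class_upper;
        if (name == "lower") return char_class_lower;
        if (name == "word")  return char_class_word;
        if (name == "alnum") return char_class_alpha | char_class_digit;
        return char_class_none;
    }

private:
    static int digit_value(char_type c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
};
```

For instance, with p pointing at "ff!", toi(p, p + 3, 16) returns 255 and leaves p at the '!'. A class meeting the full interface would also need transform, transform_primary, lookup_collatename, imbue, getloc, error_string and a sentry type. Those are omitted to keep the sketch short.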