From f0f32bdda1792b263b9f0d8192ece72a864cfea6 Mon Sep 17 00:00:00 2001
From: John Maddock #include <boost/pattern_except.hpp> The class Effects: Constructs an object of class Postcondition: Footnotes: the class bad_pattern forms the base class for all
+ pattern-matching exceptions, of which bad_expression is one. The choice
+ of std::runtime_error as the base class for bad_pattern is moot;
+ depending upon how the library is used exceptions may be either logic errors
+ (programmer supplied expressions) or run time errors (user supplied
+ expressions). Revised
+
+ 17 May 2003
+ © Copyright John Maddock 1998-
+
+ 2003
+ Permission to use, copy, modify, distribute and sell this software
+ and its documentation for any purpose is hereby granted without fee, provided
+ that the above copyright notice appear in all copies and that both that
+ copyright notice and this permission notice appear in supporting documentation.
+ Dr John Maddock makes no representations about the suitability of this software
+ for any purpose. It is provided "as is" without express or implied warranty. The template class basic_regex encapsulates regular
+expression parsing and compilation. The class takes three template
+parameters: charT: determines the character type, i.e. either
+char or wchar_t. traits: determines the behavior of the character
+type, for example which character class names are recognized. A
+default traits class is provided:
+regex_traits<charT>. Allocator: the allocator class used to allocate
+memory by the class. For ease of use there are two typedefs that define the two
+standard basic_regex instances, unless you want to use
+custom traits classes or allocators, you won't need to use anything
+other than these: The definition of basic_regex follows: it is based very
+closely on class basic_string, and fulfils the requirements for a
+constant-container of charT. Class basic_regex has the following public member
+functions: The static constant members are provided as synonyms for the
+constants declared in namespace In all Effects: Constructs an object of class Element Value empty() true size() 0 str() basic_string<charT>() Requires: p shall not be a null pointer. Throws: Effects: Constructs an object of class Element Value empty() false size() char_traits<charT>::length(p) str() basic_string<charT>(p) getflags() f mark_count() The number of marked sub-expressions within the expression. Requires: p1 and p2 are not null pointers,
+ Throws: Effects: Constructs an object of class Element Value empty() false size() std::distance(p1,p2) str() basic_string<charT>(p1,p2) getflags() f mark_count() The number of marked sub-expressions within the expression. Requires: p shall not be a null pointer, Throws: Effects: Constructs an object of class Element Value empty() false size() len str() basic_string<charT>(p, len) getflags() f mark_count() The number of marked sub-expressions within the expression. Effects: Constructs an object of class Element Value empty() e.empty() size() e.size() str() e.str() getflags() e.getflags() mark_count() e.mark_count() Throws: Effects: Constructs an object of class Element Value empty() false size() s.size() str() s getflags() f mark_count() The number of marked sub-expressions within the expression. Throws: Effects: Constructs an object of class Element Value empty() false size() distance(first,last) str() basic_string<charT>(first,last) getflags() f mark_count() The number of marked sub-expressions within the expression. Effects: Returns the result of Requires: p shall not be a null pointer. Effects: Returns the result of Effects: Returns the result of Effects: Returns a starting iterator to a sequence of
+characters representing the regular expression. Effects: Returns termination iterator to a sequence of
+characters representing the regular expression. Effects: Returns the length of the sequence of characters
+representing the regular expression. Effects: Returns the maximum length of the sequence of
+characters representing the regular expression. Effects: Returns true if the object does not
+contain a valid regular expression, otherwise false. Effects: Returns the number of marked sub-expressions
+within the regular expression. Effects: Returns Effects: Returns Effects: Returns Throws: Returns: Effects: Assigns the regular expression contained in the
+string s, interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table: Element Value empty() false size() s.size() str() s getflags() f mark_count() The number of marked sub-expressions within the expression. Requires: The type InputIterator corresponds to the Input
+Iterator requirements (24.1.1). Effects: Returns Effects: Returns a copy of the Allocator that was passed
+to the object's constructor. Effects: Returns a copy of the regular expression syntax
+flags that were passed to the object's constructor, or the last
+call to Effects: Returns a copy of the character sequence passed
+to the object's constructor, or the last call to Effects: If Effects: Returns the result of Postcondition: Effects: Returns the result of Effects: Swaps the contents of the two regular
+expressions. Postcondition: Complexity: constant time. Effects: Returns Effects: Returns Effects: Returns Effects: Returns Effects: Returns Effects: Returns Effects: Returns (os << e.str()). Effects: calls
+You shouldn't need to do anything special to configure
+boost.regex for use with your compiler - the boost.config subsystem should already
+take care of it, if you do have problems (or you are using a
+particularly obscure compiler or platform) then boost.config has a configure script. The following macros (see user.hpp) control how
+boost.regex interacts with the user's locale: The following option applies only if BOOST_REGEX_RECURSIVE is
+set. The following options apply only if BOOST_REGEX_NON_RECURSIVE is
+set.
+The author can be contacted at
+john_maddock@compuserve.com, the home page for this library is
+at
+http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm,
+and the official boost version can be obtained from www.boost.org/libraries.htm. I am indebted to Robert Sedgewick's "Algorithms in C++" for
+forcing me to think about algorithms and their performance, and to
+the folks at boost for forcing me to think, period. The
+following people have all contributed useful comments or fixes:
+Dave Abrahams, Mike Allison, Edan Ayal, Jayashree Balasubramanian,
+Jan Bölsche, Beman Dawes, Paul Baxter, David Bergman, David
+Dennerline, Edward Diener, Peter Dimov, Robert Dunn, Fabio Forno,
+Tobias Gabrielsson, Rob Gillen, Marc Gregoire, Chris Hecker, Nick
+Hodapp, Jesse Jones, Martin Jost, Boris Krasnovskiy, Jan Hermelink,
+Max Leung, Wei-hao Lin, Jens Maurer, Richard Peters, Heiko Schmidt,
+Jason Shirk, Gerald Slacik, Scobie Smith, Mike Smyth, Alexander
+Sokolovsky, Hervé Poirier, Michael Raykh, Marc Recht, Scott
+VanCamp, Bruno Voigt, Alexey Voinov, Jerry Waldorf, Rob Ward,
+Lealon Watts, Thomas Witt and Yuval Yosef. I am also grateful to
+the manuals supplied with the Henry Spencer, Perl and GNU regular
+expression libraries - wherever possible I have tried to maintain
+compatibility with these libraries and with the POSIX standard -
+the code however is entirely my own, including any bugs! I can
+absolutely guarantee that I will not fix any bugs I don't know
+about, so if you have any comments or spot any bugs, please get in
+touch. Useful further information can be found at: A short tutorial on regular expressions can be
+found here. The Open Unix
+Specification contains a wealth of useful material, including
+the regular expression syntax, and specifications for
+<regex.h> and
+<nl_types.h>. The Pattern
+Matching Pointers site is a "must visit" resource for anyone
+interested in pattern matching. Glimpse and Agrep,
+use a simplified regular expression syntax to achieve faster search
+times. Udi Manber
+and Ricardo
+Baeza-Yates both have a selection of useful pattern matching
+papers available from their respective web sites.
+There are three demo applications that ship with this library, they all come
+ with makefiles for Borland, Microsoft and gcc compilers, otherwise you will
+ have to create your own makefiles. A regression test application that gives the matching/searching algorithms a
+ full workout. The presence of this program is your guarantee that the library
+ will behave as claimed - at least as far as those items tested are concerned -
+ if anyone spots anything that isn't being tested I'd be glad to hear about it. Files: parse.cpp,
+ regress.cpp, tests.cpp. A simple grep implementation, run with no command line options to find out its
+ usage. Look at fileiter.cpp/fileiter.hpp and
+ the mapfile class to see an example of a "smart" bidirectional iterator that
+ can be used with boost.regex or any other STL algorithm. Files: jgrep.cpp,
+ main.cpp. A simple interactive expression matching application, the results of all
+ matches are timed, allowing the programmer to optimize their regular
+ expressions where performance is critical. Files: regex_timer.cpp. The snippets examples contain the code examples used in the documentation: credit_card_example.cpp:
+ Credit card number formatting code. partial_regex_grep.cpp:
+ Search example using partial matches. partial_regex_match.cpp:
+ regex_match example using partial matches. regex_grep_example_1.cpp:
+ regex_grep example 1: searches a cpp file for class definitions. regex_grep_example_2.cpp:
+ regex_grep example 2: searches a cpp file for class definitions, using a global
+ callback function. regex_grep_example_3.cpp:
+ regex_grep example 3: searches a cpp file for class definitions, using a bound
+ member function callback. regex_grep_example_4.cpp:
+ regex_grep example 4: searches a cpp file for class definitions, using a C++
+ Builder closure as a callback. regex_match_example.cpp:
+ ftp based regex_match example. regex_merge_example.cpp:
+ regex_merge example: converts a C++ file to syntax highlighted HTML. regex_replace_example.cpp:
+ regex_replace example: converts a C++ file to syntax highlighted HTML regex_search_example.cpp:
+ regex_search example: searches a cpp file for class definitions. regex_split_example_1.cpp:
+ regex_split example: split a string into tokens. regex_split_example_2.cpp
+ : regex_split example: spit out linked URL's.
+Q. Why can't I use the "convenience" versions of
+regex_match / regex_search / regex_grep / regex_format /
+regex_merge? A. These versions may or may not be available depending upon the
+capabilities of your compiler, the rules determining the format of
+these functions are quite complex - and only the versions visible
+to a standard compliant compiler are given in the help. To find out
+what your compiler supports, run <boost/regex.hpp> through
+your C++ pre-processor, and search the output file for the function
+that you are interested in. Q. I can't get
+regex++ to work with escape characters, what's going
+on? A. If you embed regular expressions in C++ code, then remember
+that escape characters are processed twice: once by the C++
+compiler, and once by the regex++ expression compiler, so to pass
+the regular expression \d+ to regex++, you need to embed "\\d+" in
+your code. Likewise to match a literal backslash you will need to
+embed "\\\\" in your code. Q. Why does using parentheses in a POSIX
+regular expression change the result of a match? A. For POSIX (extended and basic) regular expressions, but not for
+perl regexes, parentheses don't only mark; they determine what the
+best match is as well. When the expression is compiled as a POSIX
+basic or extended regex then Boost.regex follows the POSIX standard
+leftmost longest rule for determining what matched. So if there is
+more than one possible match after considering the whole
+expression, it looks next at the first sub-expression and then the
+second sub-expression and so on. So... where as If you think about it, had $1 only matched the "123", this would
+be "less good" than the match "00123" which is both further to the
+left and longer. If you want $1 to match only the "123" part, then
+you need to use something like: as the expression. Q. Why don't character ranges work
+properly (POSIX mode only)? Q. Why are there no throw specifications
+on any of the functions? What exceptions can the library
+throw? A. Not all compilers support (or honor) throw specifications,
+others support them but with reduced efficiency. Throw
+specifications may be added at a later date as compilers begin to
+handle this better. The library should throw only three types of
+exception: boost::bad_expression can be thrown by basic_regex when
+compiling a regular expression, std::runtime_error can be thrown
+when a call to basic_regex::imbue tries to open a message catalogue
+that doesn't exist, or when a call to regex_search or regex_match
+results in an "everlasting" search, or when a call to
+RegEx::GrepFiles or RegEx::FindFiles tries to open a file that
+cannot be opened, finally std::bad_alloc can be thrown by just
+about any of the functions in this library.
+Format strings are used by the algorithm regex_merge and by match_results::format, and are used
+to transform one string into another. There are three kinds of format string: sed, Perl and extended;
+the extended syntax is a superset of the others, so it is covered
+first. Extended format syntax In format strings, all characters are treated as literals
+except: ()$\?: To use any of these as literals you must prefix them with the
+escape character \ The following special sequences are recognized: Use the parenthesis characters ( and ) to group sub-expressions
+within the format string, use \( and \) to represent literal '('
+and ')'. The following Perl like expressions expand to a particular
+matched sub-expression: Conditional expressions: Conditional expressions allow two different format strings to be
+selected dependent upon whether a sub-expression participated in
+the match or not: ?Ntrue_expression:false_expression Executes true_expression if sub-expression N participated
+in the match, otherwise executes false_expression. Example: suppose we search for "(while)|(for)" then the format
+string "?1WHILE:FOR" would output what matched, but in upper
+case. The following escape sequences are also allowed: Perl format strings Perl format strings are the same as the default syntax except
+that the characters ()?: have no special meaning. Sed format strings Sed format strings use only the characters \ and & as
+special characters. \n where n is a digit, is expanded to the nth
+sub-expression. & is expanded to the whole of the match (equivalent to
+\0). Other escape sequences are expanded as per the default
+syntax.
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ class bad_expression
+
+
+
+
+
+
+ Synopsis
+ The class bad_expression defines the type of objects thrown as
+ exceptions to report errors during the conversion from a string representing a
+ regular expression to a finite state machine.
+namespace boost{
+
+class bad_pattern : public std::runtime_error
+{
+public:
+ explicit bad_pattern(const std::string& s) : std::runtime_error(s) {}
+};
+
+class bad_expression : public bad_pattern
+{
+public:
+ bad_expression(const std::string& s) : bad_pattern(s) {}
+};
+
+
+} // namespace boost
+
+ Description
+
+bad_expression(const string& what_arg);
+
+ Effects: Constructs an object of class bad_expression.
+ Postcondition: strcmp(what(), what_arg.c_str()) == 0.
+
+
+
+
+
+
+
+
+
+
+Boost.Regex
+
+basic_regex
+
+
+
+
+
+
+
+
+Synopsis
+
+
+#include <boost/regex.hpp>
+
+
+
+namespace boost{
+template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT> >
+class basic_regex;
+typedef basic_regex<char> regex;
+typedef basic_regex<wchar_t> wregex;
+}
+
+
+
+namespace boost{
+
+template <class charT,
+ class traits = regex_traits<charT>,
+ class Allocator = allocator<charT> >
+class basic_regex
+{
+public:
+ // types:
+ typedef charT value_type;
+ typedef implementation defined const_iterator;
+ typedef const_iterator iterator;
+ typedef typename Allocator::reference reference;
+ typedef typename Allocator::const_reference const_reference;
+ typedef typename Allocator::difference_type difference_type;
+ typedef typename Allocator::size_type size_type;
+ typedef Allocator allocator_type;
+ typedef regex_constants::syntax_option_type flag_type;
+ typedef typename traits::locale_type locale_type;
+
+ // constants:
+ static const regex_constants::syntax_option_type normal = regex_constants::normal;
+ static const regex_constants::syntax_option_type icase = regex_constants::icase;
+ static const regex_constants::syntax_option_type nosubs = regex_constants::nosubs;
+ static const regex_constants::syntax_option_type optimize = regex_constants::optimize;
+ static const regex_constants::syntax_option_type collate = regex_constants::collate;
+ static const regex_constants::syntax_option_type ECMAScript = normal;
+ static const regex_constants::syntax_option_type JavaScript = normal;
+ static const regex_constants::syntax_option_type JScript = normal;
+ // these flags are optional, if the functionality is supported
+ // then the flags shall take these names.
+ static const regex_constants::syntax_option_type basic = regex_constants::basic;
+ static const regex_constants::syntax_option_type extended = regex_constants::extended;
+ static const regex_constants::syntax_option_type awk = regex_constants::awk;
+ static const regex_constants::syntax_option_type grep = regex_constants::grep;
+ static const regex_constants::syntax_option_type egrep = regex_constants::egrep;
+ static const regex_constants::syntax_option_type sed = regex_constants::sed;
+ static const regex_constants::syntax_option_type perl = regex_constants::perl;
+
+ // construct/copy/destroy:
+ explicit basic_regex(const Allocator& a = Allocator());
+ explicit basic_regex(const charT* p, flag_type f = regex_constants::normal,
+ const Allocator& a = Allocator());
+ basic_regex(const charT* p1, const charT* p2, flag_type f = regex_constants::normal,
+ const Allocator& a = Allocator());
+ basic_regex(const charT* p, size_type len, flag_type f,
+ const Allocator& a = Allocator());
+ basic_regex(const basic_regex&);
+ template <class ST, class SA>
+ explicit basic_regex(const basic_string<charT, ST, SA>& p,
+ flag_type f = regex_constants::normal,
+ const Allocator& a = Allocator());
+ template <class InputIterator>
+ basic_regex(InputIterator first, InputIterator last,
+ flag_type f = regex_constants::normal,
+ const Allocator& a = Allocator());
+
+ ~basic_regex();
+ basic_regex& operator=(const basic_regex&);
+ basic_regex& operator=(const charT* ptr);
+ template <class ST, class SA>
+ basic_regex& operator=(const basic_string<charT, ST, SA>& p);
+
+ // iterators:
+ const_iterator begin() const;
+ const_iterator end() const;
+ // capacity:
+ size_type size() const;
+ size_type max_size() const;
+ bool empty() const;
+ unsigned mark_count() const;
+
+ //
+ // modifiers:
+ basic_regex& assign(const basic_regex& that);
+ basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal);
+ basic_regex& assign(const charT* first, const charT* last,
+ flag_type f = regex_constants::normal);
+ template <class string_traits, class A>
+ basic_regex& assign(const basic_string<charT, string_traits, A>& s,
+ flag_type f = regex_constants::normal);
+ template <class InputIterator>
+ basic_regex& assign(InputIterator first, InputIterator last,
+ flag_type f = regex_constants::normal);
+
+ // const operations:
+ Allocator get_allocator() const;
+ flag_type getflags() const;
+ basic_string<charT> str() const;
+ int compare(const basic_regex&) const;
+ // locale:
+ locale_type imbue(locale_type loc);
+ locale_type getloc() const;
+ // swap
+ void swap(basic_regex&) throw();
+};
+
+template <class charT, class traits, class Allocator>
+bool operator == (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+template <class charT, class traits, class Allocator>
+bool operator != (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+template <class charT, class traits, class Allocator>
+bool operator < (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+template <class charT, class traits, class Allocator>
+bool operator <= (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+template <class charT, class traits, class Allocator>
+bool operator >= (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+template <class charT, class traits, class Allocator>
+bool operator > (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+template <class charT, class io_traits, class re_traits, class Allocator>
+basic_ostream<charT, io_traits>&
+ operator << (basic_ostream<charT, io_traits>& os,
+ const basic_regex<charT, re_traits, Allocator>& e);
+
+template <class charT, class traits, class Allocator>
+void swap(basic_regex<charT, traits, Allocator>& e1,
+ basic_regex<charT, traits, Allocator>& e2);
+
+typedef basic_regex<char> regex;
+typedef basic_regex<wchar_t> wregex;
+
+} // namespace boost
+
+
+Description
+
+basic_regex constants
+
+
+static const regex_constants::syntax_option_type normal = regex_constants::normal;
+static const regex_constants::syntax_option_type icase = regex_constants::icase;
+static const regex_constants::syntax_option_type nosubs = regex_constants::nosubs;
+static const regex_constants::syntax_option_type optimize = regex_constants::optimize;
+static const regex_constants::syntax_option_type collate = regex_constants::collate;
+static const regex_constants::syntax_option_type ECMAScript = normal;
+static const regex_constants::syntax_option_type JavaScript = normal;
+static const regex_constants::syntax_option_type JScript = normal;
+static const regex_constants::syntax_option_type basic = regex_constants::basic;
+static const regex_constants::syntax_option_type extended = regex_constants::extended;
+static const regex_constants::syntax_option_type awk = regex_constants::awk;
+static const regex_constants::syntax_option_type grep = regex_constants::grep;
+static const regex_constants::syntax_option_type egrep = regex_constants::egrep;
+static const regex_constants::syntax_option_type sed = regex_constants::sed;
+static const regex_constants::syntax_option_type perl = regex_constants::perl;
+
+
+
+The static constant members are provided as synonyms for the constants declared in namespace
+boost::regex_constants; for each constant of type syntax_option_type declared in namespace
+boost::regex_constants then a constant with the same name,
+type and value is declared within the scope of basic_regex.
+
+basic_regex constructors
+
+In all basic_regex constructors, a copy of the Allocator argument is used for any memory allocation
+performed by the constructor or member functions during the
+lifetime of the object.
+basic_regex(const Allocator& a = Allocator());
+
+
+
+
+
+Effects: Constructs an object of class basic_regex. The postconditions of this function are
+indicated in the table:
+
+Element        Value
+empty()        true
+size()         0
+str()          basic_string<charT>()
+
+basic_regex(const charT* p, flag_type f = regex_constants::normal, const Allocator& a = Allocator());
+
+
+
+
+Requires: p shall not be a null pointer.
+Throws: bad_expression if p is not a valid regular expression.
+Effects: Constructs an object of class basic_regex; the object's internal finite state machine is
+constructed from the regular expression contained in the
+null-terminated string p, and interpreted according to the
+option flags specified in f. The postconditions of this function are indicated in
+the table:
+
+Element        Value
+empty()        false
+size()         char_traits<charT>::length(p)
+str()          basic_string<charT>(p)
+getflags()     f
+mark_count()   The number of marked sub-expressions within the expression.
+
+basic_regex(const charT* p1, const charT* p2, flag_type f = regex_constants::normal, const Allocator& a = Allocator());
+
+
+
+
+Requires: p1 and p2 are not null pointers, p1 < p2.
+Throws: bad_expression if [p1,p2) is not a valid regular expression.
+Effects: Constructs an object of class basic_regex; the object's internal finite state machine is
+constructed from the regular expression contained in the sequence
+of characters [p1,p2), and interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+Element        Value
+empty()        false
+size()         std::distance(p1,p2)
+str()          basic_string<charT>(p1,p2)
+getflags()     f
+mark_count()   The number of marked sub-expressions within the expression.
+
+basic_regex(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator());
+
+
+
+
+Requires: p shall not be a null pointer, len < max_size().
+Throws: bad_expression if p is not a valid regular expression.
+Effects: Constructs an object of class basic_regex; the object's internal finite state machine is
+constructed from the regular expression contained in the sequence
+of characters [p, p+len), and interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+Element        Value
+empty()        false
+size()         len
+str()          basic_string<charT>(p, len)
+getflags()     f
+mark_count()   The number of marked sub-expressions within the expression.
+
+basic_regex(const basic_regex& e);
+
+
+
+
+
+Effects: Constructs an object of class basic_regex as a copy of the object e. The
+postconditions of this function are indicated in the table:
+
+Element        Value
+empty()        e.empty()
+size()         e.size()
+str()          e.str()
+getflags()     e.getflags()
+mark_count()   e.mark_count()
+
+template <class ST, class SA>
+basic_regex(const basic_string<charT, ST, SA>& s,
+ flag_type f = regex_constants::normal, const Allocator& a = Allocator());
+
+
+
+
+Throws: bad_expression if s is not a valid regular expression.
+Effects: Constructs an object of class basic_regex; the object's internal finite state machine is
+constructed from the regular expression contained in the string
+s, and interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+Element        Value
+empty()        false
+size()         s.size()
+str()          s
+getflags()     f
+mark_count()   The number of marked sub-expressions within the expression.
+
+template <class ForwardIterator>
+basic_regex(ForwardIterator first, ForwardIterator last,
+ flag_type f = regex_constants::normal, const Allocator& a = Allocator());
+
+
+
+
+Throws: bad_expression if the sequence
+[first, last) is not a valid regular expression.
+Effects: Constructs an object of class basic_regex; the object's internal finite state machine is
+constructed from the regular expression contained in the sequence
+of characters [first, last), and interpreted according to the option flags specified in
+f. The postconditions of this function are indicated in the
+table:
+
+Element        Value
+empty()        false
+size()         distance(first,last)
+str()          basic_string<charT>(first,last)
+getflags()     f
+mark_count()   The number of marked sub-expressions within the expression.
+
+basic_regex& operator=(const basic_regex& e);
+
+
+
+
+Effects: Returns the result of assign(e.str(), e.getflags()).
+basic_regex& operator=(const charT* ptr);
+
+
+
+
+
+Requires: ptr shall not be a null pointer.
+Effects: Returns the result of assign(ptr).
+template <class ST, class SA>
+basic_regex& operator=(const basic_string<charT, ST, SA>& p);
+
+
+
+
+
+Effects: Returns the result of assign(p).
+
+basic_regex iterators
+
+
+const_iterator begin() const;
+
+
+
+
+
+const_iterator end() const;
+
+
+
+
+basic_regex capacity
+
+
+size_type size() const;
+
+
+
+
+
+size_type max_size() const;
+
+
+
+
+
+bool empty() const;
+
+
+
+
+
+unsigned mark_count() const;
+
+
+
+
+basic_regex assign
+
+
+basic_regex& assign(const basic_regex& that);
+
+
+
+
+Effects: Returns assign(that.str(), that.getflags()).
+basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal);
+
+
+
+
+Effects: Returns assign(string_type(ptr), f).
+basic_regex& assign(const charT* first, const charT* last,
+ flag_type f = regex_constants::normal);
+
+
+
+
+Effects: Returns assign(string_type(first, last), f).
+template <class string_traits, class A>
+basic_regex& assign(const basic_string<charT, string_traits, A>& s,
+ flag_type f = regex_constants::normal);
+
+
+
+
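+Throws: bad_expression if s is not a valid regular expression.
+Returns: *this.
+Effects: Assigns the regular expression contained in the
+string s, interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+Element        Value
+empty()        false
+size()         s.size()
+str()          s
+getflags()     f
+mark_count()   The number of marked sub-expressions within the expression.
+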
+template <class InputIterator>
+basic_regex& assign(InputIterator first, InputIterator last,
+ flag_type f = regex_constants::normal);
+
+
+
+
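+Requires: The type InputIterator corresponds to the Input
+Iterator requirements (24.1.1).
+Effects: Returns assign(string_type(first, last), f).
+
+basic_regex constant operations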
+
+
+Allocator get_allocator() const;
+
+
+
+
+
+flag_type getflags() const;
+
+
+
+
+assign.
+basic_string<charT> str() const;
+
+
+
+
+
+assign.
+int compare(const basic_regex& e) const;
+
+
+
+
+Effects: If getflags() == e.getflags() then
+returns str().compare(e.str()), otherwise returns
+getflags() - e.getflags().
+
+basic_regex locale
+
+
+locale_type imbue(locale_type l);
+
+
+
+
+
+Effects: Returns the result of traits_inst.imbue(l) where traits_inst is a
+(default initialized) instance of the template parameter
+traits stored within the object. Calls to imbue invalidate
+any currently contained regular expression.
+Postcondition: empty() == true.
+locale_type getloc() const;
+
+
+
+
+
+Effects: Returns the result of traits_inst.getloc() where traits_inst is a
+(default initialized) instance of the template parameter
+traits stored within the object.
+
+basic_regex swap
+
+
+void swap(basic_regex& e) throw();
+
+
+
+
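+Effects: Swaps the contents of the two regular expressions.
+Postcondition: *this contains the characters
+that were in e, e contains the regular expression
+that was in *this.
+Complexity: constant time.
+
+basic_regex non-member functions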
+
+basic_regex non-member comparison operators
+
+
+template <class charT, class traits, class Allocator>
+bool operator == (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: Returns lhs.compare(rhs) == 0.
+template <class charT, class traits, class Allocator>
+bool operator != (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: Returns lhs.compare(rhs) != 0.
+template <class charT, class traits, class Allocator>
+bool operator < (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: Returns lhs.compare(rhs) < 0.
+template <class charT, class traits, class Allocator>
+bool operator <= (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: Returns lhs.compare(rhs) <= 0.
+template <class charT, class traits, class Allocator>
+bool operator >= (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: Returns lhs.compare(rhs) >= 0.
+template <class charT, class traits, class Allocator>
+bool operator > (const basic_regex<charT, traits, Allocator>& lhs,
+ const basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: Returns lhs.compare(rhs) > 0.
+basic_regex inserter.
+
+
+template <class charT, class io_traits, class re_traits, class Allocator>
+basic_ostream<charT, io_traits>&
+ operator << (basic_ostream<charT, io_traits>& os,
+ const basic_regex<charT, re_traits, Allocator>& e);
+
+
+
+
+basic_regex non-member swap
+
+
+template <class charT, class traits, class Allocator>
+void swap(basic_regex<charT, traits, Allocator>& lhs,
+ basic_regex<charT, traits, Allocator>& rhs);
+
+
+
+
+Effects: calls lhs.swap(rhs).
+
+
+
+
+
+
+
+
+
+
+Boost.Regex
+
+Configuration and setup
+
+
+
+
+
+
+
+
+Contents
+
+
+
+
+Compiler setup.
+
+Locale and traits class selection.
+
+
+
+
+
+
+
+BOOST_REGEX_USE_C_LOCALE
+Forces boost.regex to use the global C locale in its traits
+class support. This is the default behavior on non-Windows
+platforms; MS Windows platforms normally use the Win32 API for
+locale support.
+
+
+
+BOOST_REGEX_USE_CPP_LOCALE
+Forces boost.regex to use std::locale in its default traits
+class; regular expressions can then be imbued with an
+instance-specific locale.
+
+
+BOOST_REGEX_NO_W32
+Tells boost.regex not to use any Win32 APIs even when
+available (implies BOOST_REGEX_USE_C_LOCALE unless
+BOOST_REGEX_USE_CPP_LOCALE is set).
+
+
+
+
+Linkage Options
+
+
+
+
+
+
+
+BOOST_REGEX_DYN_LINK
+For Microsoft and Borland C++ builds, this tells boost.regex
+that it should link to the dll build of the boost.regex. By
+default boost.regex will link to its static library build, even if
+the dynamic C runtime library is in use.
+
+
+BOOST_REGEX_NO_LIB
+For Microsoft and Borland C++ builds, this tells boost.regex
+that it should not automatically select the library to link
+to.
+
+
+
+
+Algorithm Selection
+
+
+
+
+
+
+
+BOOST_REGEX_V3
+Tells boost.regex to use the boost-1.30.0 matching algorithm,
+define only if you need maximum compatibility with previous
+behavior.
+
+
+
+BOOST_REGEX_RECURSIVE
+Tells boost.regex to use a stack-recursive matching
+algorithm. This is generally the fastest option (although
+the difference is very small), but it can cause stack overflow in
+extreme cases. On Win32 this can be handled safely, but this is not
+the case on other platforms.
+
+
+BOOST_REGEX_NON_RECURSIVE
+Tells boost.regex to use a non-stack-recursive matching
+algorithm. This can be slightly slower than the alternative, but is
+always safe no matter how pathological the regular
+expression. This is the default on non-Win32 platforms.
+
+
+
+
+Algorithm Tuning
+
+
+
+
+
+
+BOOST_REGEX_HAS_MS_STACK_GUARD
+Tells boost.regex that Microsoft style __try - __except blocks
+are supported, and can be used to safely trap stack overflow.
+
+
+
+
+
+
+
+
+
+
+BOOST_REGEX_BLOCKSIZE
+In non-recursive mode, boost.regex uses largish blocks of
+memory to act as a stack for the state machine; the larger the
+block size, the fewer allocations will take place.
+This defaults to 4096 bytes, which is large enough to match the
+vast majority of regular expressions without further
+allocations. However, you can choose smaller or larger values
+depending upon your platform's characteristics.
+
+
+
+BOOST_REGEX_MAX_BLOCKS
+Tells boost.regex how many blocks of size BOOST_REGEX_BLOCKSIZE
+it is permitted to use. If this value is exceeded then
+boost.regex will stop trying to find a match and throw a
+std::runtime_error. Defaults to 1024; don't forget to tweak
+this value if you alter BOOST_REGEX_BLOCKSIZE by much.
+
+
+BOOST_REGEX_MAX_CACHE_BLOCKS
+Tells boost.regex how many memory blocks to store in its
+internal cache - memory blocks are taken from this cache rather
+than by calling ::operator new. Generally speaking this can
+be an order of magnitude faster than calling ::operator new each
+time a memory block is required, but has the downside that
+boost.regex can end up caching a large chunk of memory (by default
+up to 16 blocks each of BOOST_REGEX_BLOCKSIZE size). If
+memory is tight then try defining this to 0 (disables all caching),
+or if that is too slow, then a value of 1 or 2 may be
+sufficient. On the other hand, on large multi-processor,
+multi-threaded systems, you may find that a higher value is in
+order.
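To see why BOOST_REGEX_MAX_BLOCKS should be revisited when BOOST_REGEX_BLOCKSIZE changes, it helps to work out the memory ceilings implied by the defaults quoted above. The sketch below only does that arithmetic; the constant and function names are hypothetical and are not part of the library.

```cpp
#include <cstddef>

// Hypothetical names; the values mirror the documented defaults.
constexpr std::size_t default_block_size   = 4096;  // BOOST_REGEX_BLOCKSIZE
constexpr std::size_t default_max_blocks   = 1024;  // BOOST_REGEX_MAX_BLOCKS
constexpr std::size_t default_cache_blocks = 16;    // BOOST_REGEX_MAX_CACHE_BLOCKS

// Upper bound, in bytes, on the state-machine stack for a single match
// attempt before std::runtime_error is thrown.
constexpr std::size_t max_match_bytes(std::size_t block_size, std::size_t max_blocks)
{
    return block_size * max_blocks;
}

// Upper bound, in bytes, on the memory the block cache can hold on to.
constexpr std::size_t max_cache_bytes(std::size_t block_size, std::size_t cache_blocks)
{
    return block_size * cache_blocks;
}
```

With the defaults this gives 4096 * 1024 = 4194304 bytes (4 MiB) per match attempt and 4096 * 16 = 65536 bytes (64 KiB) of cache. Doubling the block size without halving the block count doubles the first ceiling.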
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Boost.Regex
+
+Contacts and Acknowledgements
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Examples
+
+
+
+
+
+
+ regress.exe:
+ jgrep.exe
+ timer.exe
+ Code snippets
+
+
+
+
+
+
+
+
+
+
+
+Boost.Regex
+
+FAQ
+
+
+
+
+
+
+
+
+
+
+"(0*)([0-9]*)" against "00123" would produce
+$1 = "00"
+$2 = "123"
+
+
+
+"0*([0-9]*)" against "00123" would produce
+$1 = "00123"
+
+
+
+"0*([1-9][0-9]*)"
+
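The last expression can be tried out with a short sketch. It uses std::regex from the C++11 standard library as a stand-in for boost::regex, in its default (Perl-like) mode rather than a POSIX leftmost-longest mode; this expression is unambiguous, so the capture should be the same either way. The function name strip_leading_zeros is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by $1, or an empty string
// when the input does not match. std::regex stands in for boost::regex.
std::string strip_leading_zeros(const std::string& digits)
{
    static const std::regex e("0*([1-9][0-9]*)");
    std::smatch what;
    if (std::regex_match(digits, what, e))
        return what[1].str();
    return std::string();
}
```

Because the sub-expression must start with a non-zero digit, the leading zeros can only be consumed by the 0* outside the parentheses, so an input of "00123" leaves $1 holding "123".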
+
+
+ A. The POSIX standard specifies that character range expressions
+are locale sensitive - so for example the expression [A-Z] will
+match any collating element that collates between 'A' and 'Z'. That
+means that for most locales other than "C" or "POSIX", [A-Z] would
+match the single character 't' for example, which is not what most
+people expect - or at least not what most people have come to
+expect from regular expression engines. For this reason, the
+default behaviour of boost.regex (perl mode) is to turn locale
+sensitive collation off by not setting the regex_constants::collate
+compile time flag. However, if you set a non-default compile time
+flag - for example regex_constants::extended or
+regex_constants::basic - then locale dependent collation will be
+enabled. This also applies to the POSIX API functions, which use
+either regex_constants::extended or regex_constants::basic
+internally. [Note - when regex_constants::nocollate is in effect,
+the library behaves "as if" the LC_COLLATE locale category were
+always "C", regardless of what it is actually set to - end
+note].
+
+
+
+
+
+
+
+
+
+
+Boost.Regex
+
+Format String Syntax
+
+
+
+
+
+
+
+
+
+
+ Grouping:
+
+ Sub-expression expansions:
+
+
+
+
+
+
+
+$`
+Expands to all the text from the end
+of the previous match to the start of the current match, if there
+was no previous match in the current operation, then everything
+from the start of the input string to the start of the match.
+
+
+
+
+
+$'
+Expands to all the text from the end
+of the match to the end of the input string.
+
+
+
+
+
+$&
+Expands to all of the current
+match.
+
+
+
+
+
+$0
+Expands to all of the current
+match.
+
+
+
+
+$N
+Expands to the text that matched
+sub-expression N.
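The sub-expression expansions can be illustrated with a small sketch. It uses std::regex_replace from the C++11 standard library as a stand-in, on the assumption that its default format syntax treats $N and $& the same way as described here; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: reorders two words using $1 and $2.
std::string swap_names(const std::string& s)
{
    static const std::regex e("(\\w+) (\\w+)");
    return std::regex_replace(s, e, "$2, $1");
}

// Hypothetical helper: wraps every run of digits in brackets using $&.
std::string bracket_numbers(const std::string& s)
{
    static const std::regex digits("[0-9]+");
    return std::regex_replace(s, digits, "[$&]");
}
```

For example, swap_names("John Smith") yields "Smith, John", and bracket_numbers("a12b345") yields "a[12]b[345]"; text outside the matches is copied through unchanged.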
+
+
+
+
+
+
+
+ Escape sequences:
+
+
+
+
+
+
+
+\a
+The bell character.
+
+
+
+
+
+\f
+The form feed character.
+
+
+
+
+
+\n
+The newline character.
+
+
+
+
+
+\r
+The carriage return character.
+
+
+
+
+
+\t
+The tab character.
+
+
+
+
+
+\v
+A vertical tab character.
+
+
+
+
+
+\x
+A hexadecimal character - for example
+\x0D.
+
+
+
+
+
+\x{}
+A possible Unicode hexadecimal
+character - for example \x{1A0}
+
+
+
+
+
+\cx
+The ASCII control character x, for
+example \c@ is equivalent to control-@.
+
+
+
+
+
+\e
+The ASCII escape character.
+
+
+
+
+\dd
+An octal character constant, for
+example \10.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Boost.Regex
+ Headers
+
+
+
+
There are two main headers used by this library: <boost/regex.hpp> + provides full access to the entire library, while <boost/cregex.hpp> + provides access to just the high level class RegEx, and the POSIX API + functions. +
+There is also a header containing only forward declarations + <boost/regex_fwd.hpp> for use when an interface is dependent upon + boost::basic_regex, but otherwise does not need the full definitions.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/history.html b/doc/Attic/history.html new file mode 100644 index 00000000..17ca695c --- /dev/null +++ b/doc/Attic/history.html @@ -0,0 +1,58 @@ + + + ++
+Boost.Regex
+
+History
+
Boost 1.31.0.
++
+ + + diff --git a/doc/Attic/implementation.html b/doc/Attic/implementation.html new file mode 100644 index 00000000..dfb8811a --- /dev/null +++ b/doc/Attic/implementation.html @@ -0,0 +1,45 @@ + + + ++
+Boost.Regex
+
+Implementation
+
Todo.
++
+ + + diff --git a/doc/Attic/install.html b/doc/Attic/install.html new file mode 100644 index 00000000..f24fb744 --- /dev/null +++ b/doc/Attic/install.html @@ -0,0 +1,237 @@ + + + ++
+Boost.Regex
+
+Installation
+
[ Important: If you are upgrading from the + 2.x version of this library then you will find a number of changes to the + documented header names and library interfaces, existing code should still + compile unchanged however - see + Note for Upgraders. ]
+When you extract the library from its zip file, you must preserve its internal + directory structure (for example by using the -d option when extracting). If + you didn't do that when extracting, then you'd better stop reading this, delete + the files you just extracted, and try again! +
+This library should not need configuring before use; most popular + compilers/standard libraries/platforms are already supported "as is". If you do + experience configuration problems, or just want to test the configuration with + your compiler, then the process is the same as for all of boost; see the + configuration library documentation.
+The library will encase all code inside namespace boost. +
+Unlike some other template libraries, this library consists of a mixture of + template code (in the headers) and static code and data (in cpp files). + Consequently it is necessary to build the library's support code into a library + or archive file before you can use it, instructions for specific platforms are + as follows: +
+ +make -fbcb5.mak+
The build process will build a variety of .lib and .dll files (the exact number depends upon the version of Borland's tools you are using); the .lib and .dll files will be in a sub-directory called bcb4 or bcb5 depending upon the makefile used. To install the libraries into your development system use:
+make -fbcb5.mak install
+library files will be copied to <BCROOT>/lib and the dll's to + <BCROOT>/bin, where <BCROOT> corresponds to the install path of + your Borland C++ tools. +
+You may also remove temporary files created during the build process (excluding + lib and dll files) by using:
+make -fbcb5.mak clean
+Finally when you use regex++ it is only necessary for you to add the <boost> root directory to your list of include directories for that project. It is not necessary for you to manually add a .lib file to the project; the headers will automatically select the correct .lib file for your build mode and tell the linker to include it. There is one caveat however: the library can not tell the difference between VCL and non-VCL enabled builds when building a GUI application from the command line. If you build from the command line with the 5.5 command line tools then you must define the pre-processor symbol _NO_VCL in order to ensure that the correct link libraries are selected; the C++ Builder IDE normally sets this automatically. Hint: users of the 5.5 command line tools may want to add -D_NO_VCL to bcc32.cfg in order to set this option permanently.
+If you would prefer to do a static link to the regex libraries even when using + the dll runtime then define BOOST_REGEX_STATIC_LINK, and if you want to + suppress automatic linking altogether (and supply your own custom build of the + lib) then define BOOST_REGEX_NO_LIB.
+If you are building with C++ Builder 6, you will find that + <boost/regex.hpp> can not be used in a pre-compiled header (the actual + problem is in <locale> which gets included by <boost/regex.hpp>), + if this causes problems for you, then try defining BOOST_NO_STD_LOCALE when + building, this will disable some features throughout boost, but may save you a + lot in compile times!
+ +You need version 6 of MSVC to build this library. If you are using VC5 then you + may want to look at one of the previous releases of this + library +
+Open up a command prompt, which has the necessary MSVC environment variables + defined (for example by using the batch file Vcvars32.bat installed by the + Visual Studio installation), and change to the <boost>\libs\regex\build + directory. +
+Select the correct makefile - vc6.mak for "vanilla" Visual C++ 6 or + vc6-stlport.mak if you are using STLPort.
+Invoke the makefile like this:
+nmake -fvc6.mak
+You will now have a collection of lib and dll files in a "vc6" subdirectory, to + install these into your development system use:
+nmake -fvc6.mak install
+The lib files will be copied to your <VC6>\lib directory and the dll + files to <VC6>\bin, where <VC6> is the root of your Visual C++ 6 + installation.
+You can delete all the temporary files created during the build (excluding lib + and dll files) using:
+nmake -fvc6.mak clean +
+Finally when you use regex++ it is only necessary for you to add the + <boost> root directory to your list of include directories for that + project. It is not necessary for you to manually add a .lib file to the + project; the headers will automatically select the correct .lib file for your + build mode and tell the linker to include it. +
+Note that if you want to statically link to the regex library when using the + dynamic C++ runtime, define BOOST_REGEX_STATIC_LINK when building your project + (this only has an effect for release builds). If you want to add the source + directly to your project then define BOOST_REGEX_NO_LIB to disable automatic + library selection.
+Important: there have been some reports of + compiler-optimization bugs affecting this library, (particularly with VC6 + versions prior to service patch 5) the workaround is to build the library using + /Oityb1 rather than /O2. That is to use all optimization settings except /Oa. + This problem is reported to affect some standard library code as well (in fact + I'm not sure if the problem is with the regex code or the underlying standard + library), so it's probably worthwhile applying this workaround in normal + practice in any case.
+Note: if you have replaced the C++ standard library that comes with VC6, then + when you build the library you must ensure that the environment variables + "INCLUDE" and "LIB" have been updated to reflect the include and library paths + for the new library - see vcvars32.bat (part of your Visual Studio + installation) for more details. Alternatively if STLPort is in c:/stlport then + you could use:
+nmake INCLUDES="-Ic:/stlport/stlport" XLFLAGS="/LIBPATH:c:/stlport/lib" + -fvc6-stlport.mak
+If you are building with the full STLPort v4.x, then use the vc6-stlport.mak
+ file provided and set the environment variable STLPORT_PATH to point to the
+ location of your STLport installation (Note that the full STLPort libraries
+ appear not to support single-thread static builds).
+
+
+
+
+
There is a conservative makefile for the g++ compiler. From the command prompt + change to the <boost>/libs/regex/build directory and type: +
+make -fgcc.mak +
+At the end of the build process you should have a gcc sub-directory containing + release and debug versions of the library (libboost_regex.a and + libboost_regex_debug.a). When you build projects that use regex++, you will + need to add the boost install directory to your list of include paths and add + <boost>/libs/regex/build/gcc/libboost_regex.a to your list of library + files. +
+There is also a makefile to build the library as a shared library:
+make -fgcc-shared.mak
+which will build libboost_regex.so and libboost_regex_debug.so.
+Both of these makefiles support the following environment variables:
+CXXFLAGS: extra compiler options - note that this applies to both the debug and + release builds.
+INCLUDES: additional include directories.
+LDFLAGS: additional linker options.
+LIBS: additional library files.
+For the more adventurous there is a configure script in + <boost>/libs/config; see the config library + documentation.
+ +There is a makefile for the sun (6.1) compiler (C++ version 3.12). From the + command prompt change to the <boost>/libs/regex/build directory and type: +
+dmake -f sunpro.mak +
+At the end of the build process you should have a sunpro sub-directory + containing single and multithread versions of the library (libboost_regex.a, + libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so). When you + build projects that use regex++, you will need to add the boost install + directory to your list of include paths and add + <boost>/libs/regex/build/sunpro/ to your library search path. +
+This makefile supports the following environment variables:
+CXXFLAGS: extra compiler options - note that this applies to both the single + and multithreaded builds.
+INCLUDES: additional include directories.
+LDFLAGS: additional linker options.
+LIBS: additional library files.
+LIBSUFFIX: a suffix to mangle the library name with (defaults to nothing).
+This makefile does not set any architecture specific options like -xarch=v9, + you can set these by defining the appropriate macros, for example:
+dmake CXXFLAGS="-xarch=v9" LDFLAGS="-xarch=v9" LIBSUFFIX="_v9" -f sunpro.mak
+will build v9 variants of the regex library named libboost_regex_v9.a etc.
+There is a generic makefile (generic.mak) provided in <boost-root>/libs/regex/build - see that makefile for details of environment variables that need to be set before use. Alternatively you can use the Jam based build system. If you need to configure the library for your platform, then refer to the config library documentation.
+
+ + + diff --git a/doc/Attic/introduction.html b/doc/Attic/introduction.html new file mode 100644 index 00000000..cd00847a --- /dev/null +++ b/doc/Attic/introduction.html @@ -0,0 +1,176 @@ + + + ++
+Boost.Regex
+
+Introduction
+
Regular expressions are a form of pattern-matching that is often used in text processing; many users will be familiar with the Unix utilities grep, sed and awk, and the programming language Perl, each of which makes extensive use of regular expressions. Traditionally C++ users have been limited to the POSIX C APIs for manipulating regular expressions, and while regex++ does provide these APIs, they do not represent the best way to use the library. For example regex++ can cope with wide character strings, or search and replace operations (in a manner analogous to either sed or Perl), something that traditional C libraries can not do.
+The class boost::basic_regex is the key class in + this library; it represents a "machine readable" regular expression, and is + very closely modeled on std::basic_string, think of it as a string plus the + actual state-machine required by the regular expression algorithms. Like + std::basic_string there are two typedefs that are almost always the means by + which this class is referenced:
+namespace boost{
+
+template <class charT,
+          class traits = regex_traits<charT>,
+          class Allocator = std::allocator<charT> >
+class basic_regex;
+
+typedef basic_regex<char> regex;
+typedef basic_regex<wchar_t> wregex;
+
+}
To see how this library can be used, imagine that we are writing a credit card + processing application. Credit card numbers generally come as a string of + 16-digits, separated into groups of 4-digits, and separated by either a space + or a hyphen. Before storing a credit card number in a database (not necessarily + something your customers will appreciate!), we may want to verify that the + number is in the correct format. To match any digit we could use the regular + expression [0-9], however ranges of characters like this are actually locale + dependent. Instead we should use the POSIX standard form [[:digit:]], or the + regex++ and Perl shorthand for this \d (note that many older libraries tended + to be hard-coded to the C-locale, consequently this was not an issue for them). + That leaves us with the following regular expression to validate credit card + number formats:
+(\d{4}[- ]){3}\d{4}
+Here the parentheses act to group (and mark for future reference) sub-expressions, and the {4} means "repeat exactly 4 times". This is an example of the extended regular expression syntax used by Perl, awk and egrep. Regex++ also supports the older "basic" syntax used by sed and grep, but this is generally less useful, unless you already have some basic regular expressions that you need to reuse.
+Now let's take that expression and place it in some C++ code to validate the + format of a credit card number:
+bool validate_card_format(const std::string s)
+{
+   static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
+   return regex_match(s, e);
+}
Note how we had to add some extra escapes to the expression: remember that the + escape is seen once by the C++ compiler, before it gets to be seen by the + regular expression engine, consequently escapes in regular expressions have to + be doubled up when embedding them in C/C++ code. Also note that all the + examples assume that your compiler supports Koenig lookup, if yours doesn't + (for example VC6), then you will have to add some boost:: prefixes to some of + the function calls in the examples.
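For compilers that support C++11, a raw string literal avoids the doubled escapes altogether. The sketch below is an illustration only: it uses std::regex from the standard library as a stand-in for boost::regex, and the function name validate_card_format_raw is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical variant of the validator above. The raw string literal
// R"(...)" passes backslashes through untouched, so \d is written once.
bool validate_card_format_raw(const std::string& s)
{
    static const std::regex e(R"((\d{4}[- ]){3}\d{4})");
    return std::regex_match(s, e);
}
```

Numbers grouped with hyphens or spaces, such as "1234-5678-9012-3456" or "1234 5678 9012 3456", are accepted, while an ungrouped string of 16 digits is rejected because each of the first three groups must be followed by a separator.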
+Those of you who are familiar with credit card processing, will have realized + that while the format used above is suitable for human readable card numbers, + it does not represent the format required by online credit card systems; these + require the number as a string of 16 (or possibly 15) digits, without any + intervening spaces. What we need is a means to convert easily between the two + formats, and this is where search and replace comes in. Those who are familiar + with the utilities sed and Perl will already be ahead here; we + need two strings - one a regular expression - the other a "format + string" that provides a description of the text to replace the match + with. In regex++ this search and replace operation is performed with the + algorithm regex_replace, for our credit card example we can write two algorithms + like this to provide the format conversions:
+// match any format with the regular expression:
+const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
+const std::string machine_format("\\1\\2\\3\\4");
+const std::string human_format("\\1-\\2-\\3-\\4");
+
+std::string machine_readable_card_number(const std::string s)
+{
+   return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
+}
+
+std::string human_readable_card_number(const std::string s)
+{
+   return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
+}
Here we've used marked sub-expressions in the regular expression to split out + the four parts of the card number as separate fields, the format string then + uses the sed-like syntax to replace the matched text with the reformatted + version.
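To make the round trip concrete, here is a sketch of the same two conversions written against std::regex from the C++11 standard library, used as a stand-in for boost::regex. It differs from the code above in two labeled ways: the anchors are ^ and $ rather than \A and \z, and the format strings use the $N syntax rather than the sed-like \N syntax. The names card_pattern, machine_readable and human_readable are hypothetical.

```cpp
#include <regex>
#include <string>

namespace {
// Four digit groups, each optionally followed by a hyphen or a space.
const std::regex card_pattern(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
}

// Drops the separators: the four captured groups are concatenated.
std::string machine_readable(const std::string& s)
{
    return std::regex_replace(s, card_pattern, "$1$2$3$4");
}

// Re-inserts hyphens between the four captured groups.
std::string human_readable(const std::string& s)
{
    return std::regex_replace(s, card_pattern, "$1-$2-$3-$4");
}
```

Input that does not match the pattern is returned unchanged, so callers that need to reject malformed numbers should validate first.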
+In the examples above, we haven't directly manipulated the results of a regular + expression match, however in general the result of a match contains a number of + sub-expression matches in addition to the overall match. When the library needs + to report a regular expression match it does so using an instance of the class + match_results, as before there are typedefs of this class for the most + common cases: +
+namespace boost{
+typedef match_results<const char*> cmatch;
+typedef match_results<const wchar_t*> wcmatch;
+typedef match_results<std::string::const_iterator> smatch;
+typedef match_results<std::wstring::const_iterator> wsmatch;
+}
The algorithms regex_search and + regex_grep (i.e. finding all matches in a string) make use of + match_results to report what matched.
+Note that these algorithms are not restricted to searching regular C-strings, + any bidirectional iterator type can be searched, allowing for the possibility + of seamlessly searching almost any kind of data. +
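As an illustration of both points (reading sub-expression matches out of match_results, and searching something other than a C-string), the sketch below searches a std::list<char>, which only offers bidirectional iterators. It uses std::regex and std::match_results from the C++11 standard library as stand-ins for their boost counterparts; the function name first_pair is hypothetical.

```cpp
#include <list>
#include <regex>
#include <string>

// Hypothetical helper: finds the first "digits-digits" pair in a list of
// characters and returns the two captured fields joined by a slash.
std::string first_pair(const std::list<char>& text)
{
    typedef std::list<char>::const_iterator iter;
    static const std::regex e("([0-9]+)-([0-9]+)");
    std::match_results<iter> what;   // match_results over list iterators
    if (std::regex_search(text.begin(), text.end(), what, e))
        return what[1].str() + "/" + what[2].str();  // sub-expressions 1 and 2
    return std::string();
}
```

Here what[0] would hold the overall match, while what[1] and what[2] hold the two marked sub-expressions.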
+For search and replace operations in addition to the algorithm + regex_replace that we have already seen, the algorithm + regex_format takes the result of a match and a format string, and + produces a new string by merging the two.
+For those that dislike templates, there is a high level wrapper class RegEx + that is an encapsulation of the lower level template code - it provides a + simplified interface for those that don't need the full power of the library, + and supports only narrow characters, and the "extended" regular expression + syntax. +
+The POSIX API functions regcomp, regexec, regfree and regerror are available in both narrow character and Unicode versions, and are provided for those who need compatibility with these APIs.
+Finally, note that the library now has run-time localization + support, and recognizes the full POSIX regular expression syntax - including + advanced features like multi-character collating elements and equivalence + classes - as well as providing compatibility with other regular expression + libraries including GNU and BSD4 regex packages, and to a more limited extent + Perl 5. +
++
+ + + + + diff --git a/doc/Attic/localisation.html b/doc/Attic/localisation.html new file mode 100644 index 00000000..e4184fd8 --- /dev/null +++ b/doc/Attic/localisation.html @@ -0,0 +1,1032 @@ + + + + +
+Boost.Regex
+
+Localisation
+
Boost.regex provides extensive support for run-time +localization, the localization model used can be split into two +parts: front-end and back-end.
+ +Front-end localization deals with everything which the user sees +- error messages, and the regular expression syntax itself. For +example a French application could change [[:word:]] to [[:mot:]] +and \w to \m. Modifying the front end locale requires active +support from the developer, by providing the library with a message +catalogue to load, containing the localized strings. Front-end +locale is affected by the LC_MESSAGES category only.
+Back-end localization deals with everything that occurs after the expression has been parsed - in other words everything that the user does not see or interact with directly. It deals with case conversion, collation, and character class membership. The back-end locale does not require any intervention from the developer - the library will acquire all the information it requires for the current locale from the underlying operating system / run time library. This means that if the program user does not interact with regular expressions directly - for example if the expressions are embedded in your C++ code - then no explicit localization is required, as the library will take care of everything for you. For example embedding the expression [[:word:]]+ in your code will always match a whole word; if the program is run on a machine with, for example, a Greek locale, then it will still match a whole word, but in Greek characters rather than Latin ones. The back-end locale is affected by the LC_CTYPE and LC_COLLATE categories.
+ +There are three separate localization mechanisms supported by +boost.regex:
+ +This is the default model when the library is compiled under +Win32, and is encapsulated by the traits class w32_regex_traits. +When this model is in effect there is a single global locale as +defined by the user's control panel settings, and returned by +GetUserDefaultLCID. All the settings used by boost.regex are +acquired directly from the operating system bypassing the C run +time library. Front-end localization requires a resource dll, +containing a string table with the user-defined strings. The traits +class exports the function:
+ +static std::string set_message_catalogue(const std::string& +s);
+ +which needs to be called with a string identifying the name of +the resource dll, before your code compiles any regular +expressions (but not necessarily before you construct any +basic_regex instances):
+ ++boost::w32_regex_traits<char>::set_message_catalogue("mydll.dll");
+ +Note that this API sets the dll name for both the narrow +and wide character specializations of w32_regex_traits.
+This model does not currently support thread specific locales (via SetThreadLocale under Windows NT). The library provides full Unicode support under NT; under Windows 9x the library degrades gracefully - characters 0 to 255 are supported, the remainder are treated as "unknown" graphic characters.
+ +This is the default model when the library is compiled under an +operating system other than Win32, and is encapsulated by the +traits class c_regex_traits. Win32 users can force this +model to take effect by defining the pre-processor symbol +BOOST_REGEX_USE_C_LOCALE. When this model is in effect there is a +single global locale, as set by setlocale. All settings are +acquired from your run time library; consequently Unicode support +is dependent upon your run time library implementation. Front end +localization requires a POSIX message catalogue. The traits class +exports the function:
+ +static std::string set_message_catalogue(const std::string& +s);
+ +which needs to be called with a string identifying the name of +the message catalogue, before your code compiles any regular +expressions (but not necessarily before you construct any +basic_regex instances):
+ ++boost::c_regex_traits<char>::set_message_catalogue("mycatalogue");
+ +Note that this API sets the message catalogue name for both the narrow +and wide character specializations of c_regex_traits. If your run +time library does not support POSIX message catalogues, then you +can either provide your own implementation of <nl_types.h> or +define BOOST_RE_NO_CAT to disable front-end localization via +message catalogues.
+ +Note that calling setlocale invalidates all compiled +regular expressions. Calling setlocale(LC_ALL, "C") will +make this library behave equivalently to most traditional regular +expression libraries, including version 1 of this library.
+ +This model is only in effect if the library is built with the +pre-processor symbol BOOST_REGEX_USE_CPP_LOCALE defined. When this +model is in effect each instance of basic_regex<> has its own +instance of std::locale; class basic_regex<> also has a +member function imbue which allows the locale for the +expression to be set on a per-instance basis. Front end +localization requires a POSIX message catalogue, which will be +loaded via the std::messages facet of the expression's locale. The +traits class exports the symbol:
+ +static std::string set_message_catalogue(const std::string& +s);
+ +which needs to be called with a string identifying the name of +the message catalogue, before your code compiles any regular +expressions (but not necessarily before you construct any +basic_regex instances):
+ ++boost::cpp_regex_traits<char>::set_message_catalogue("mycatalogue");
+ +Note that calling basic_regex<>::imbue will invalidate any +expression currently compiled in that instance of +basic_regex<>. This model is the one which most closely fits the +ethos of the C++ standard library; however, it is the model which +will produce the slowest code, and which is the least well +supported by current standard library implementations. For example, +I have yet to find an implementation of std::locale which supports +either message catalogues, or locales other than "C" or +"POSIX".
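As a rough illustration of per-instance locales, the sketch below uses the standard library's std::regex rather than Boost itself; its imbue member also discards any previously compiled expression, which is the behaviour described above. The helper name matches_in_locale is hypothetical, and it is an assumption that a Boost build with BOOST_REGEX_USE_CPP_LOCALE would be used the same way with boost::regex substituted.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: compile `pattern` under `loc`, then test `text`.
// imbue() is called before assign() because imbuing a new locale
// discards whatever expression the object held before.
bool matches_in_locale(const std::string& text,
                       const std::string& pattern,
                       const std::locale& loc)
{
    std::regex re;
    re.imbue(loc);        // set the locale for this instance only
    re.assign(pattern);   // compile the expression under that locale
    return std::regex_match(text, re);
}
```

Calling the helper with std::locale::classic() gives the "C" locale behaviour; any other locale object available on the platform could be passed instead.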
+ +Finally note that if you build the library with a non-default +localization model, then the appropriate pre-processor symbol +(BOOST_REGEX_USE_C_LOCALE or BOOST_REGEX_USE_CPP_LOCALE) must be +defined both when you build the support library, and when you +include <boost/regex.hpp> or <boost/cregex.hpp> in your +code. The best way to ensure this is to add the #define to +<boost/regex/user.hpp>.
+ +In order to localize the front end of the library, you need to
+provide the library with the appropriate message strings contained
+either in a resource dll's string table (Win32 model), or a POSIX
+message catalogue (C or C++ models). In the latter case the
+messages must appear in message set zero of the catalogue. The
+messages and their id's are as follows:
+
+ | Message id | +Meaning | +Default value | ++ |
+ | 101 | +The character used to start a +sub-expression. | +"(" | ++ |
+ | 102 | +The character used to end a +sub-expression declaration. | +")" | ++ |
+ | 103 | +The character used to denote an end of +line assertion. | +"$" | ++ |
+ | 104 | +The character used to denote the start +of line assertion. | +"^" | ++ |
+ | 105 | +The character used to denote the +"match any character expression". | +"." | ++ |
+ | 106 | +The match zero or more times +repetition operator. | +"*" | ++ |
+ | 107 | +The match one or more repetition +operator. | +"+" | ++ |
+ | 108 | +The match zero or one repetition +operator. | +"?" | ++ |
+ | 109 | +The character set opening +character. | +"[" | ++ |
+ | 110 | +The character set closing +character. | +"]" | ++ |
+ | 111 | +The alternation operator. | +"|" | ++ |
+ | 112 | +The escape character. | +"\\" | ++ |
+ | 113 | +The hash character (not currently +used). | +"#" | ++ |
+ | 114 | +The range operator. | +"-" | ++ |
+ | 115 | +The repetition operator opening +character. | +"{" | ++ |
+ | 116 | +The repetition operator closing +character. | +"}" | ++ |
+ | 117 | +The digit characters. | +"0123456789" | ++ |
+ | 118 | +The character which when preceded by +an escape character represents the word boundary assertion. | +"b" | ++ |
+ | 119 | +The character which when preceded by +an escape character represents the non-word boundary +assertion. | +"B" | ++ |
+ | 120 | +The character which when preceded by +an escape character represents the word-start boundary +assertion. | +"<" | ++ |
+ | 121 | +The character which when preceded by +an escape character represents the word-end boundary +assertion. | +">" | ++ |
+ | 122 | +The character which when preceded by +an escape character represents any word character. | +"w" | ++ |
+ | 123 | +The character which when preceded by +an escape character represents a non-word character. | +"W" | ++ |
+ | 124 | +The character which when preceded by +an escape character represents a start of buffer assertion. | +"`A" | ++ |
+ | 125 | +The character which when preceded by +an escape character represents an end of buffer assertion. | +"'z" | ++ |
+ | 126 | +The newline character. | +"\n" | ++ |
+ | 127 | +The comma separator. | +"," | ++ |
+ | 128 | +The character which when preceded by +an escape character represents the bell character. | +"a" | ++ |
+ | 129 | +The character which when preceded by +an escape character represents the form feed character. | +"f" | ++ |
+ | 130 | +The character which when preceded by +an escape character represents the newline character. | +"n" | ++ |
+ | 131 | +The character which when preceded by +an escape character represents the carriage return character. | +"r" | ++ |
+ | 132 | +The character which when preceded by +an escape character represents the tab character. | +"t" | ++ |
+ | 133 | +The character which when preceded by +an escape character represents the vertical tab character. | +"v" | ++ |
+ | 134 | +The character which when preceded by +an escape character represents the start of a hexadecimal character +constant. | +"x" | ++ |
+ | 135 | +The character which when preceded by +an escape character represents the start of an ASCII escape +character. | +"c" | ++ |
+ | 136 | +The colon character. | +":" | ++ |
+ | 137 | +The equals character. | +"=" | ++ |
+ | 138 | +The character which when preceded by +an escape character represents the ASCII escape character. | +"e" | ++ |
+ | 139 | +The character which when preceded by +an escape character represents any lower case character. | +"l" | ++ |
+ | 140 | +The character which when preceded by +an escape character represents any non-lower case character. | +"L" | ++ |
+ | 141 | +The character which when preceded by +an escape character represents any upper case character. | +"u" | ++ |
+ | 142 | +The character which when preceded by +an escape character represents any non-upper case character. | +"U" | ++ |
+ | 143 | +The character which when preceded by +an escape character represents any space character. | +"s" | ++ |
+ | 144 | +The character which when preceded by +an escape character represents any non-space character. | +"S" | ++ |
+ | 145 | +The character which when preceded by +an escape character represents any digit character. | +"d" | ++ |
+ | 146 | +The character which when preceded by +an escape character represents any non-digit character. | +"D" | ++ |
+ | 147 | +The character which when preceded by +an escape character represents the end quote operator. | +"E" | ++ |
+ | 148 | +The character which when preceded by +an escape character represents the start quote operator. | +"Q" | ++ |
+ | 149 | +The character which when preceded by +an escape character represents a Unicode combining character +sequence. | +"X" | ++ |
+ | 150 | +The character which when preceded by +an escape character represents any single character. | +"C" | ++ |
+ | 151 | +The character which when preceded by +an escape character represents end of buffer operator. | +"Z" | ++ |
+ | 152 | +The character which when preceded by +an escape character represents the continuation assertion. | +"G" | ++ |
+ | 153 | +The character which when preceded by (? indicates a zero width +negated forward lookahead assertion. | +"!" | ++ |
Custom error messages are loaded as follows:
+ + + ++ | Message ID | +Error message ID | +Default string | ++ |
+ | 201 | +REG_NOMATCH | +"No match" | ++ |
+ | 202 | +REG_BADPAT | +"Invalid regular expression" | ++ |
+ | 203 | +REG_ECOLLATE | +"Invalid collation character" | ++ |
+ | 204 | +REG_ECTYPE | +"Invalid character class name" | ++ |
+ | 205 | +REG_EESCAPE | +"Trailing backslash" | ++ |
+ | 206 | +REG_ESUBREG | +"Invalid back reference" | ++ |
+ | 207 | +REG_EBRACK | +"Unmatched [ or [^" | ++ |
+ | 208 | +REG_EPAREN | +"Unmatched ( or \\(" | ++ |
+ | 209 | +REG_EBRACE | +"Unmatched \\{" | ++ |
+ | 210 | +REG_BADBR | +"Invalid content of \\{\\}" | ++ |
+ | 211 | +REG_ERANGE | +"Invalid range end" | ++ |
+ | 212 | +REG_ESPACE | +"Memory exhausted" | ++ |
+ | 213 | +REG_BADRPT | +"Invalid preceding regular +expression" | ++ |
+ | 214 | +REG_EEND | +"Premature end of regular +expression" | ++ |
+ | 215 | +REG_ESIZE | +"Regular expression too big" | ++ |
+ | 216 | +REG_ERPAREN | +"Unmatched ) or \\)" | ++ |
+ | 217 | +REG_EMPTY | +"Empty expression" | ++ |
+ | 218 | +REG_E_UNKNOWN | +"Unknown error" | ++ |
Custom character class names are loaded as follows:
+ + + ++ | Message ID | +Description | +Equivalent default class name | ++ |
+ | 300 | +The character class name for +alphanumeric characters. | +"alnum" | ++ |
+ | 301 | +The character class name for +alphabetic characters. | +"alpha" | ++ |
+ | 302 | +The character class name for control +characters. | +"cntrl" | ++ |
+ | 303 | +The character class name for digit +characters. | +"digit" | ++ |
+ | 304 | +The character class name for graphics +characters. | +"graph" | ++ |
+ | 305 | +The character class name for lower +case characters. | +"lower" | ++ |
+ | 306 | +The character class name for printable +characters. | +"print" | ++ |
+ | 307 | +The character class name for +punctuation characters. | +"punct" | ++ |
+ | 308 | +The character class name for space +characters. | +"space" | ++ |
+ | 309 | +The character class name for upper +case characters. | +"upper" | ++ |
+ | 310 | +The character class name for +hexadecimal characters. | +"xdigit" | ++ |
+ | 311 | +The character class name for blank +characters. | +"blank" | ++ |
+ | 312 | +The character class name for word +characters. | +"word" | ++ |
+ | 313 | +The character class name for Unicode +characters. | +"unicode" | ++ |
Finally, custom collating element names are loaded starting from +message id 400, and terminating when the first load thereafter +fails. Each message looks something like: "tagname string" where +tagname is the name used inside [[.tagname.]] and +string is the actual text of the collating element. Note that +the value of collating element [[.zero.]] is used for the +conversion of strings to numbers - if you replace this with another +value then that will be used for string parsing - for example use +the Unicode character 0x0660 for [[.zero.]] if you want to use +Unicode Arabic-Indic digits in your regular expressions in place of +Latin digits.
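To make the "tagname string" layout concrete, here is a minimal sketch that splits such a message at the first space. It is only an illustration of the message format: split_collating_message is a hypothetical helper, and nothing here is taken from the library's own parsing code.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split a catalogue message of the form
// "tagname string" at the first space into (tagname, string).
// The tagname is left empty when the message contains no space.
std::pair<std::string, std::string>
split_collating_message(const std::string& msg)
{
    const std::string::size_type space = msg.find(' ');
    if (space == std::string::npos)
        return std::make_pair(std::string(), msg);
    return std::make_pair(msg.substr(0, space), msg.substr(space + 1));
}
```

Under this reading, a message "zero 0" names the collating element [[.zero.]] and gives it the text "0".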
+ +Note that the POSIX defined names for character classes and +collating elements are always available, even if custom names are +defined. In contrast, custom error messages and custom syntax +messages replace the default ones.
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/Attic/match_flag_type.html b/doc/Attic/match_flag_type.html new file mode 100644 index 00000000..0e89736a --- /dev/null +++ b/doc/Attic/match_flag_type.html @@ -0,0 +1,330 @@ + + + + +
+Boost.Regex
+match_flag_type
The type match_flag_type
is an implementation
+defined bitmask type (17.3.2.1.2) that controls how a regular
+expression is matched against a character sequence.
+namespace std{ namespace regex_constants{ + +typedef bitmask_type match_flag_type; + +static const match_flag_type match_default = 0; +static const match_flag_type match_not_bob; +static const match_flag_type match_not_eob; +static const match_flag_type match_not_bol; +static const match_flag_type match_not_eol; +static const match_flag_type match_not_bow; +static const match_flag_type match_not_eow; +static const match_flag_type match_any; +static const match_flag_type match_not_null; +static const match_flag_type match_continuous; +static const match_flag_type match_partial; +static const match_flag_type match_prev_avail; +static const match_flag_type match_not_dot_newline; +static const match_flag_type match_not_dot_null; + +static const match_flag_type format_default = 0; +static const match_flag_type format_sed; +static const match_flag_type format_perl; +static const match_flag_type format_no_copy; +static const match_flag_type format_first_only; +static const match_flag_type format_all; + +} // namespace regex_constants +} // namespace std ++ +
The type match_flag_type
is an implementation
+defined bitmask type (17.3.2.1.2). When matching a regular
+expression against a sequence of characters [first, last) then
+setting its elements has the effects listed in the table below:
+ | Element | +Effect if set |
+ | match_default | +Specifies that matching of regular expressions proceeds without any modification of the normal rules used in ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1) |
+ | match_not_bob | +Specifies that the expression "\A" should not match against the sub-sequence [first,first). |
+ | match_not_eob | +Specifies that the expressions "\z" and "\Z" should not match against the sub-sequence [last,last). |
+ | match_not_bol | +Specifies that the expression "^" should not be matched against the sub-sequence [first,first). |
+ | match_not_eol | +Specifies that the expression "$" should not be matched against the sub-sequence [last,last). |
+ | match_not_bow | +Specifies that the expression "\b" should not be matched against the sub-sequence [first,first). |
+ | match_not_eow | +Specifies that the expression "\b" should not be matched against the sub-sequence [last,last). |
+ | match_any | +Specifies that if more than one match is possible then any match is an acceptable result. |
+ | match_not_null | +Specifies that the expression can not be matched against an empty sequence. |
+ | match_continuous | +Specifies that the expression must match a sub-sequence that begins at first. |
+ | match_partial | +Specifies that if no match can be found, then it is acceptable to return a match [from, last) where from!=last, if there exists some sequence of characters [from,to) of which [from,last) is a prefix, and which would result in a full match. |
+ | match_prev_avail | +Specifies that --first is a valid iterator position; when this flag is set, the flags match_not_bol and match_not_bow are ignored by the regular expression algorithms and iterators. |
+ | match_not_dot_newline | +Specifies that the expression "." does not match a newline character. |
+ | match_not_dot_null | +Specifies that the expression "." does not match a null character '\0'. |
+ | format_default | +Specifies that when a regular expression match is to be replaced by a new string, the new string is constructed using the rules used by the ECMAScript replace function in ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace (FWD.1). In addition, during search and replace operations all non-overlapping occurrences of the regular expression are located and replaced, and sections of the input that did not match the expression are copied unchanged to the output string. |
+ | format_sed | +Specifies that when a regular expression match is to be replaced by a new string, the new string is constructed using the rules used by the Unix sed utility in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX), Shells and Utilities. |
+ | format_perl | +Specifies that when a regular expression match is to be replaced by a new string, the new string is constructed using an implementation defined superset of the rules used by the ECMAScript replace function in ECMA-262, ECMAScript Language Specification, Chapter 15 part 5.4.11 String.prototype.replace (FWD.1). |
+ | format_all | +Specifies that all syntax extensions are enabled, including conditional (?ddexpression1:expression2) replacements. |
+ | format_no_copy | +When specified during a search and replace operation, sections of the character container sequence being searched that do not match the regular expression are not copied to the output string. |
+ | format_first_only | +When specified during a search and replace operation, only the first occurrence of the regular expression is replaced. |
+
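The effect of a few of these flags can be sketched as follows. This is an illustration only: it uses the standard library's std::regex_constants (the synopsis above is written against namespace std), the helper names are hypothetical, and the Boost-only flags such as match_partial are deliberately left out.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// match_continuous: the match must begin at the first character.
bool starts_with_digits(const std::string& s)
{
    static const std::regex digits("[0-9]+");
    return std::regex_search(s, digits, rc::match_continuous);
}

// match_not_bol: "^" may not match at the start of the range, which is
// useful when [first, last) is a slice taken from the middle of a line.
bool caret_matches_at_start(const std::string& s, bool is_line_start)
{
    static const std::regex leading_a("^a");
    const rc::match_flag_type flags =
        is_line_start ? rc::match_default : rc::match_not_bol;
    return std::regex_search(s, leading_a, flags);
}

// Format flags control how search-and-replace assembles its output.
std::string mask_numbers(const std::string& s, rc::match_flag_type flags)
{
    static const std::regex digits("[0-9]+");
    return std::regex_replace(s, digits, std::string("#"), flags);
}
```

For the input "a1b22c", mask_numbers replaces every number under format_default, only the first under format_first_only, and drops the unmatched text entirely under format_no_copy.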
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/Attic/match_results.html b/doc/Attic/match_results.html new file mode 100644 index 00000000..9acc3afc --- /dev/null +++ b/doc/Attic/match_results.html @@ -0,0 +1,511 @@ + + + + +
+Boost.Regex
+class match_results
#include <boost/regex.hpp>
+ +Regular expressions are different from many simple +pattern-matching algorithms in that as well as finding an overall +match they can also produce sub-expression matches: each +sub-expression being delimited in the pattern by a pair of +parentheses (...). There has to be some method for reporting +sub-expression matches back to the user: this is achieved by +defining a class match_results that acts as an indexed +collection of sub-expression matches, each sub-expression match +being contained in an object of type +sub_match.
+ +Template class match_results denotes a collection of character +sequences representing the result of a regular expression match. +Objects of type match_results are passed to the algorithms regex_match and +regex_search, and are returned by the iterator regex_iterator . Storage for the +collection is allocated and freed as necessary by the member +functions of class match_results.
+ +The template class match_results conforms to the requirements of +a Sequence, as specified in (lib.sequence.reqmts), except that only +operations defined for const-qualified Sequences are supported.
+ +Class template match_results is most commonly used as one of the +typedefs cmatch, wcmatch, smatch, or wsmatch:
+ ++template <class BidirectionalIterator, + class Allocator = allocator<sub_match<BidirectionalIterator> > > +class match_results; + +typedef match_results<const char*> cmatch; +typedef match_results<const wchar_t*> wcmatch; +typedef match_results<string::const_iterator> smatch; +typedef match_results<wstring::const_iterator> wsmatch; + +template <class BidirectionalIterator, + class Allocator = allocator<sub_match<BidirectionalIterator> > > +class match_results +{ +public: + typedef sub_match<BidirectionalIterator> value_type; + typedef const value_type& const_reference; + typedef const_reference reference; + typedef implementation defined const_iterator; + typedef const_iterator iterator; + typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; + typedef typename Allocator::size_type size_type; + typedef Allocator allocator_type; + typedef typename iterator_traits<BidirectionalIterator>::value_type char_type; + typedef basic_string<char_type> string_type; + + // construct/copy/destroy: + explicit match_results(const Allocator& a = Allocator()); + match_results(const match_results& m); + match_results& operator=(const match_results& m); + ~match_results(); + + // size: + size_type size() const; + size_type max_size() const; + bool empty() const; + // element access: + difference_type length(int sub = 0) const; + difference_type position(unsigned int sub = 0) const; + string_type str(int sub = 0) const; + const_reference operator[](int n) const; + + const_reference prefix() const; + + const_reference suffix() const; + const_iterator begin() const; + const_iterator end() const; + // format: + template <class OutputIterator> + OutputIterator format(OutputIterator out, + const string_type& fmt, + match_flag_type flags = format_default) const; + string_type format(const string_type& fmt, + match_flag_type flags = format_default) const; + + allocator_type get_allocator() const; + void swap(match_results& that); +}; + +template <class BidirectionalIterator, class Allocator> +bool operator == (const match_results<BidirectionalIterator, Allocator>& m1, + const match_results<BidirectionalIterator, Allocator>& m2); +template <class BidirectionalIterator, class Allocator> +bool operator != (const match_results<BidirectionalIterator, Allocator>& m1, + const match_results<BidirectionalIterator, Allocator>& m2); + +template <class charT, class traits, class BidirectionalIterator, class Allocator> +basic_ostream<charT, traits>& + operator << (basic_ostream<charT, traits>& os, + const match_results<BidirectionalIterator, Allocator>& m); + +template <class BidirectionalIterator, class Allocator> +void swap(match_results<BidirectionalIterator, Allocator>& m1, + match_results<BidirectionalIterator, Allocator>& m2); ++ +
In all match_results
constructors, a copy of the
+Allocator argument is used for any memory allocation performed by
+the constructor or member functions during the lifetime of the
+object.
+match_results(const Allocator& a = Allocator()); ++ + +
Effects: Constructs an object of class match_results. The +postconditions of this function are indicated in the table:
+ + + +
+ | Element | +Value |
+ | empty() | +true |
+ | size() | +0 |
+ | str() | +basic_string<charT>() |
+
+ +
+match_results(const match_results& m); ++ + +
Effects: Constructs an object of class match_results, as +a copy of m.
+ ++match_results& operator=(const match_results& m); ++ + +
Effects: Assigns m to *this. The postconditions of this +function are indicated in the table:
+ + + +
+ | Element | +Value |
+ | empty() | +m.empty(). |
+ | size() | +m.size(). |
+ | str(n) | +m.str(n) for all integers n < m.size(). |
+ | prefix() | +m.prefix(). |
+ | suffix() | +m.suffix(). |
+ | (*this)[n] | +m[n] for all integers n < m.size(). |
+ | length(n) | +m.length(n) for all integers n < m.size(). |
+ | position(n) | +m.position(n) for all integers n < m.size(). |
+
+size_type size()const; ++ + +
Effects: Returns the number of sub_match elements stored +in *this.
+ ++size_type max_size()const; ++ + +
Effects: Returns the maximum number of sub_match elements +that can be stored in *this.
+ ++bool empty()const; ++ + +
Effects: Returns size() == 0.
+difference_type length(int sub = 0)const; ++ + +
Effects: Returns (*this)[sub].length().
+difference_type position(unsigned int sub = 0)const; ++ + +
Effects: Returns std::distance(prefix().first,
+(*this)[sub].first).
+string_type str(int sub = 0)const; ++ + +
Effects: Returns
+string_type((*this)[sub]).
+const_reference operator[](int n) const; ++ + +
Effects: Returns a reference to the
+sub_match
object representing the character sequence that
+matched marked sub-expression n. If n == 0
then
+returns a reference to a sub_match
object representing
+the character sequence that matched the whole regular
+expression.
+const_reference prefix()const; ++ + +
Effects: Returns a reference to the
+sub_match
object representing the character sequence from
+the start of the string being matched/searched, to the start of the
+match found.
+const_reference suffix()const; ++ + +
Effects: Returns a reference to the
+sub_match
object representing the character sequence from
+the end of the match found to the end of the string being
+matched/searched.
+const_iterator begin()const; ++ + +
Effects: Returns a starting iterator that enumerates over +all the marked sub-expression matches stored in *this.
+ ++const_iterator end()const; ++ + +
Effects: Returns a terminating iterator that enumerates +over all the marked sub-expression matches stored in *this.
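The element-access members described above can be exercised with a short sketch. It is written against the standard library's std::smatch, which exposes the same members, so treat it as an assumption that the Boost typedef behaves identically; the KeyValue struct and find_key_value function are hypothetical names introduced for the illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for the illustration.
struct KeyValue
{
    bool found;
    std::string key, value;      // sub-expressions 1 and 2
    std::string before, after;   // prefix() and suffix()
    std::ptrdiff_t offset;       // position(0)
    std::size_t groups;          // size(): whole match plus marked sub-expressions
};

KeyValue find_key_value(const std::string& text)
{
    static const std::regex re("([a-z]+)=([a-z]+)");
    std::smatch m;   // iterators stored in m refer to `text`
    KeyValue r = { false, "", "", "", "", 0, 0 };
    if (std::regex_search(text, m, re))
    {
        r.found  = true;
        r.key    = m[1].str();        // operator[] returns a sub_match
        r.value  = m.str(2);          // same as m[2].str()
        r.before = m.prefix().str();  // text before the match
        r.after  = m.suffix().str();  // text after the match
        r.offset = m.position(0);     // distance from start of text to m[0]
        r.groups = m.size();
    }
    return r;
}
```

Because the sub_match objects hold iterators into the searched string, that string has to stay alive for as long as the match_results object is inspected; copying the pieces into std::string values, as done here, avoids the issue.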
+ ++template <class OutputIterator> +OutputIterator format(OutputIterator out, + const string_type& fmt, + match_flag_type flags = format_default); ++ + +
Requires: The type OutputIterator conforms to the Output +Iterator requirements (24.1.2).
+ + +Effects: Copies the character sequence [fmt.begin(), +fmt.end()) to OutputIterator out. For each format +specifier or escape sequence in fmt, replaces that sequence +with either the character(s) it represents, or the sequence of +characters within *this to which it refers. The bitmasks specified +in flags determine which format specifiers or escape sequences +are recognized; by default this is the format used by ECMA-262, +ECMAScript Language Specification, Chapter 15 part 5.4.11 +String.prototype.replace.
+ + +Returns: out.
+ ++string_type format(const string_type& fmt, + match_flag_type flags = format_default); ++ + +
Effects: Returns a copy of the string fmt. For +each format specifier or escape sequence in fmt, replaces +that sequence with either the character(s) it represents, or the +sequence of characters within *this to which it refers. The +bitmasks specified in flags determine which format +specifiers or escape sequences are recognized; by default this +is the format used by ECMA-262, ECMAScript Language Specification, +Chapter 15 part 5.4.11 String.prototype.replace.
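A minimal sketch of the string-returning format overload follows, again using std::smatch from the standard library as a stand-in for the Boost typedef. The helper swap_user_and_host and the pattern are assumptions made for the example; only the default (ECMAScript-style) format syntax is shown.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite "user@host" as "host:user" using the
// sub-expressions captured in the match. Returns an empty string when
// the input does not match the pattern.
std::string swap_user_and_host(const std::string& address)
{
    static const std::regex re("([a-z]+)@([a-z]+)");
    std::smatch m;
    if (!std::regex_match(address, m, re))
        return std::string();
    // $1 and $2 refer to the first and second marked sub-expressions.
    return m.format(std::string("$2:$1"));
}
```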
+ ++allocator_type get_allocator()const; ++ + +
Effects: Returns a copy of the Allocator that was passed +to the object's constructor.
+ ++void swap(match_results& that); ++ + +
Effects: Swaps the contents of the two sequences.
+ + +Postcondition: *this contains the sequence +of matched sub-expressions that were in that, and +that contains the sequence of matched sub-expressions that +were in *this.
Complexity: constant time.
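The swap postcondition can be checked with a small sketch, written here with std::smatch under the assumption that the Boost class behaves the same way. The function name swap_moves_match is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true when swapping moves the match from `found` into `other`
// and leaves `found` holding what `other` held (nothing).
bool swap_moves_match(const std::string& text)
{
    static const std::regex re("[0-9]+");
    std::smatch found, other;            // `other` starts out empty
    if (!std::regex_search(text, found, re))
        return false;
    const std::string before = found.str();
    found.swap(other);                   // exchange the two sequences
    return found.empty() && other.str() == before;
}
```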
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/Attic/partial_matches.html b/doc/Attic/partial_matches.html new file mode 100644 index 00000000..3f4d2a53 --- /dev/null +++ b/doc/Attic/partial_matches.html @@ -0,0 +1,185 @@ + + + ++
+Boost.Regex
+Partial Matches
The match-flag match_partial
can
+ be passed to the following algorithms: regex_match,
+ regex_search, and regex_grep.
+ When used it indicates that partial as well as full matches should be found. A
+ partial match is one that matched one or more characters at the end of the text
+ input, but did not match all of the regular expression (although it may have
+ done so had more input been available). Partial matches are typically used when
+ either validating data input (checking each character as it is entered on the
+ keyboard), or when searching texts that are either too long to load into memory
+ (or even into a memory mapped file), or are of indeterminate length (for
+ example the source may be a socket or similar). Partial and full matches can be
+ differentiated as shown in the following table (the variable M represents an
+ instance of match_results<> as filled in
+ by regex_match, regex_search or regex_grep):
+
+
+ | Result | +M[0].matched | +M[0].first | +M[0].second | +
No match | +False | +Undefined | +Undefined | +Undefined | +
Partial match | +True | +False | +Start of partial match. | +End of partial match (end of text). | +
Full match | +True | +True | +Start of full match. | +End of full match. | +
The following example
+ tests to see whether the text could be a valid credit card number. As the user
+ presses a key, the character entered would be added to the string being built
+ up, and passed to is_possible_card_number. If this returns true
+ then the text matches the complete card number pattern, so the user interface's OK button
+ would be enabled. If it returns false, then this is not yet a valid card
+ number, but could be with more input, so the user interface would disable the
+ OK button. Finally, if the procedure throws an exception the input could never
+ become a valid number, so the inputted character must be discarded, and a
+ suitable error indication displayed to the user.
#include <string> +#include <stdexcept> +#include <iostream> +#include <boost/regex.hpp> + +boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})"); + +bool is_possible_card_number(const std::string& input) +{ + // + // return false for partial match, true for full match, or throw for + // impossible match based on what we have so far... + boost::match_results<std::string::const_iterator> what; + if(0 == boost::regex_match(input, what, e, boost::match_default | boost::match_partial)) + { + // the input so far could not possibly be valid so reject it: + throw std::runtime_error("Invalid data entered - this could not possibly be a valid card number"); + } + // OK so far so good, but have we finished? + if(what[0].matched) + { + // excellent, we have a result: + return true; + } + // what we have so far is only a partial match... + return false; +}+
In the following example, + text input is taken from a stream containing an unknown amount of text; this + example simply counts the number of html tags encountered in the stream. The + text is loaded into a buffer and searched a part at a time. If a partial match + is encountered, then the partial match gets searched a second time as the + start of the next batch of text:
+#include <iostream> +#include <fstream> +#include <sstream> +#include <string> +#include <cstring> +#include <boost/regex.hpp> + +// match some kind of html tag: +boost::regex e("<[^>]*>"); +// count how many: +unsigned int tags = 0; +// saved position of partial match: +char* next_pos = 0; + +bool grep_callback(const boost::match_results<char*>& m) +{ + if(m[0].matched == false) + { + // save position and return: + next_pos = m[0].first; + } + else + ++tags; + return true; +} + +void search(std::istream& is) +{ + char buf[4096]; + next_pos = buf + sizeof(buf); + bool have_more = true; + while(have_more) + { + // how much do we copy forward from last try: + unsigned leftover = (buf + sizeof(buf)) - next_pos; + // and how much is left to fill: + unsigned size = next_pos - buf; + // copy forward whatever we have left (the two regions may overlap): + std::memmove(buf, next_pos, leftover); + // fill the rest from the stream: + unsigned read = is.readsome(buf + leftover, size); + // check to see if we've run out of text: + have_more = read == size; + // reset next_pos: + next_pos = buf + sizeof(buf); + // and then grep: + boost::regex_grep(grep_callback, + buf, + buf + read + leftover, + e, + boost::match_default | boost::match_partial); + } +}+
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/performance.html b/doc/Attic/performance.html new file mode 100644 index 00000000..826dd83a --- /dev/null +++ b/doc/Attic/performance.html @@ -0,0 +1,54 @@ + + + ++
+ |
+
+ Boost.Regex+Performance+ |
+
+ |
+
The performance of Boost.regex in both recursive and non-recursive modes should + be broadly comparable to other regular expression libraries: recursive mode is + slightly faster (especially where memory allocation requires thread + synchronisation), but not by much. The following pages compare + Boost.regex with various other regular expression libraries for the following + compilers:
+Visual Studio.Net 2003 (recursive Boost.regex + implementation).
+Gcc 3.2 (cygwin) (non-recursive Boost.regex + implementation).
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/posix_api.html b/doc/Attic/posix_api.html new file mode 100644 index 00000000..fdc3bba3 --- /dev/null +++ b/doc/Attic/posix_api.html @@ -0,0 +1,288 @@ + + + ++
+ |
+
+ Boost.Regex+POSIX API Compatibility Functions+ |
+
+ |
+
#include <boost/cregex.hpp> +or: +#include <boost/regex.h>+
The following functions are available for users who need a POSIX compatible C library. They are available in both Unicode and narrow character versions; the standard POSIX API names are macros that expand to one version or the other depending upon whether UNICODE is defined or not.
+Important: Note that all the symbols defined here are enclosed inside + namespace boost when used in C++ programs, unless you use #include + <boost/regex.h> instead - in which case the symbols are still defined in + namespace boost, but are made available in the global namespace as well.
+The functions are defined as: +
extern "C" {
int regcompA(regex_tA*, const char*, int);
unsigned int regerrorA(int, const regex_tA*, char*, unsigned int);
int regexecA(const regex_tA*, const char*, unsigned int, regmatch_t*, int);
void regfreeA(regex_tA*);

int regcompW(regex_tW*, const wchar_t*, int);
unsigned int regerrorW(int, const regex_tW*, wchar_t*, unsigned int);
int regexecW(const regex_tW*, const wchar_t*, unsigned int, regmatch_t*, int);
void regfreeW(regex_tW*);

#ifdef UNICODE
#define regcomp regcompW
#define regerror regerrorW
#define regexec regexecW
#define regfree regfreeW
#define regex_t regex_tW
#else
#define regcomp regcompA
#define regerror regerrorA
#define regexec regexecA
#define regfree regfreeA
#define regex_t regex_tA
#endif
}
All the functions operate on structure regex_t, which exposes two public + members: +
+unsigned int re_nsub this is filled in by regcomp and indicates + the number of sub-expressions contained in the regular expression. +
+const TCHAR* re_endp points to the end of the expression to compile when + the flag REG_PEND is set. +
+Footnote: regex_t is actually a #define: it is either regex_tA or regex_tW depending upon whether UNICODE is defined or not. Likewise, TCHAR is either char or wchar_t, again depending upon the macro UNICODE.
+regcomp takes a pointer to a regex_t, a pointer to the expression
+ to compile and a flags parameter which can be a combination of:
+
+
+
+
+ | REG_EXTENDED | +Compiles modern regular expressions. Equivalent to + regbase::char_classes | regbase::intervals | regbase::bk_refs. | ++ |
+ | REG_BASIC | +Compiles basic (obsolete) regular expression syntax. + Equivalent to regbase::char_classes | regbase::intervals | regbase::limited_ops + | regbase::bk_braces | regbase::bk_parens | regbase::bk_refs. | ++ |
+ | REG_NOSPEC | +All characters are ordinary, the expression is a + literal string. | ++ |
+ | REG_ICASE | +Compiles for matching that ignores character case. | ++ |
+ | REG_NOSUB | +Has no effect in this library. | ++ |
+ | REG_NEWLINE | +When this flag is set a dot does not match the + newline character. | ++ |
+ | REG_PEND | +When this flag is set the re_endp parameter of the + regex_t structure must point to the end of the regular expression to compile. | ++ |
+ | REG_NOCOLLATE | +When this flag is set then locale dependent collation + for character ranges is turned off. | ++ |
+ | REG_ESCAPE_IN_LISTS | +When this flag is set, escape sequences are permitted in bracket expressions (character sets). | ++ |
+ | REG_NEWLINE_ALT | +When this flag is set then the newline character is + equivalent to the alternation operator |. | ++ |
+ | REG_PERL | +Compiles Perl like regular expressions. | ++ |
+ | REG_AWK | +A shortcut for awk-like behavior: REG_EXTENDED | + REG_ESCAPE_IN_LISTS | ++ |
+ | REG_GREP | +A shortcut for grep like behavior: REG_BASIC | + REG_NEWLINE_ALT | ++ |
+ | REG_EGREP | +A shortcut for egrep like behavior: + REG_EXTENDED | REG_NEWLINE_ALT | ++ |
regerror maps an error code to a human
+ readable string; it takes the following parameters:
+
+
+
+ | int code | +The error code. | ++ |
+ | const regex_t* e | +The regular expression (can be null). | ++ |
+ | char* buf | +The buffer to fill in with the error message. | ++ |
+ | unsigned int buf_size | +The length of buf. | ++ |
If the error code is OR'ed with REG_ITOA then the message that results is the printable name of the code rather than a message, for example "REG_BADPAT". If the code is REG_ATOI then e must not be null and e->re_endp must point to the printable name of an error code; the return value is then the value of the error code. For any other value of code, the return value is the number of characters in the error message; if the return value is greater than or equal to buf_size then regerror will have to be called again with a larger buffer.
+regexec finds the first occurrence of expression e within string buf.
+ If len is non-zero then *m is filled in with what matched the
+ regular expression: m[0] contains what matched the whole expression, m[1]
+ the first sub-expression, and so on; see regmatch_t in the header file
+ declaration for more details. The eflags parameter can be a combination
+ of:
+
+
+
+
+ | REG_NOTBOL | +Parameter buf does not represent the start of + a line. | ++ |
+ | REG_NOTEOL | +Parameter buf does not terminate at the end of + a line. | ++ |
+ | REG_STARTEND | +The string searched starts at buf + m[0].rm_so and ends at buf + m[0].rm_eo. | ++ |
Finally regfree frees all the memory that was allocated by regcomp. +
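The call sequence described above (regcomp, regexec, regerror, regfree) can be sketched as follows. This is a minimal illustration, not code from the library: it assumes the system's POSIX <regex.h> so that it builds without Boost, on the expectation that the compatibility header exposes the same names with the same argument order. The helper names first_group and compile_error are hypothetical.

```cpp
#include <regex.h>   // assumption: system POSIX header, used here in place of <boost/regex.h>
#include <string>

// Hypothetical helper: returns the text matched by the first parenthesised
// sub-expression of `pattern` in `text`, or an empty string when the pattern
// does not compile or does not match.
std::string first_group(const char* pattern, const char* text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return std::string();

    regmatch_t m[2];          // m[0]: whole match, m[1]: first sub-expression
    std::string result;
    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1)
        result.assign(text + m[1].rm_so, text + m[1].rm_eo);

    regfree(&re);             // release whatever regcomp allocated
    return result;
}

// Hypothetical helper: returns the regerror message for a pattern that fails
// to compile, or an empty string when compilation succeeds.
std::string compile_error(const char* pattern)
{
    regex_t re;
    int code = regcomp(&re, pattern, REG_EXTENDED);
    if (code == 0)
    {
        regfree(&re);
        return std::string();
    }
    char buf[256];
    regerror(code, &re, buf, sizeof(buf));
    return buf;
}
```

For example, first_group("([0-9]+)-", "tel 555-1234") yields "555", because m[1] holds the offsets of the first sub-expression within the searched string.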
+Footnote: this is an abridged reference to the POSIX API functions, it is + provided for compatibility with other libraries, rather than an API to be used + in new code (unless you need access from a language other than C++). This + version of these functions should also happily coexist with other versions, as + the names used are macros that expand to the actual function names. +
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/redistributables.html b/doc/Attic/redistributables.html new file mode 100644 index 00000000..884fca7a --- /dev/null +++ b/doc/Attic/redistributables.html @@ -0,0 +1,84 @@ + + + ++
+ |
+
+ Boost.Regex+Redistributables and Library Names+ |
+
+ |
+
If you are using Microsoft or Borland C++ and link to a dll version of the run
+ time library, then you will also link to one of the dll versions of boost.regex.
+ While these dlls are redistributable, there are no "standard" versions, so
+ when installing on the user's PC, you should place these in a directory private
+ to your application, and not in the PC's directory path. Note that if you link
+ to a static version of your run time library, then you will also link to a
+ static version of boost.regex and no dlls will need to be distributed. The
+ possible boost.regex dll and library names are computed according to the following
+ formula:
+
"boost_regex_"
+ + BOOST_LIB_TOOLSET
+ + "_"
+ + BOOST_LIB_THREAD_OPT
+ + BOOST_LIB_RT_OPT
+ + BOOST_LIB_LINK_OPT
+ + BOOST_LIB_DEBUG_OPT
+
+ These are defined as:
+
+ BOOST_LIB_TOOLSET: The compiler toolset name (vc6, vc7, bcb5 etc).
+
+ BOOST_LIB_THREAD_OPT: "s" for single thread builds,
+ "m" for multithread builds.
+
+ BOOST_LIB_RT_OPT: "s" for static runtime,
+ "d" for dynamic runtime.
+
+ BOOST_LIB_LINK_OPT: "s" for static link,
+ "i" for dynamic link.
+
+ BOOST_LIB_DEBUG_OPT: nothing for release builds,
+ "d" for debug builds,
+ "dd" for debug-diagnostic builds (_STLP_DEBUG).
Note: you can disable automatic library selection by defining the symbol BOOST_REGEX_NO_LIB when compiling. This is useful if you want to statically link even though you're using the dll version of your run time library, or if you need to debug boost.regex.
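As an illustration of the naming formula above, the following sketch concatenates the components in the order given. The function regex_lib_name is a hypothetical helper written for this example and is not part of Boost; the letter codes are taken directly from the list of definitions.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): composes a library base name
// from the components listed in the formula.
std::string regex_lib_name(const std::string& toolset,        // e.g. "vc6", "vc7", "bcb5"
                           bool multithread,                  // BOOST_LIB_THREAD_OPT: "m" or "s"
                           bool dynamic_runtime,              // BOOST_LIB_RT_OPT: "d" or "s"
                           bool dynamic_link,                 // BOOST_LIB_LINK_OPT: "i" or "s"
                           const std::string& debug_opt = "") // BOOST_LIB_DEBUG_OPT: "", "d" or "dd"
{
    std::string name = "boost_regex_";
    name += toolset;
    name += "_";
    name += multithread ? "m" : "s";
    name += dynamic_runtime ? "d" : "s";
    name += dynamic_link ? "i" : "s";
    name += debug_opt;
    return name;
}
```

Under these rules a multithreaded release build with the vc7 toolset, a dynamic runtime and dynamic linking gives regex_lib_name("vc7", true, true, true) == "boost_regex_vc7_mdi".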
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/reg_expression.html b/doc/Attic/reg_expression.html new file mode 100644 index 00000000..a1fd6b56 --- /dev/null +++ b/doc/Attic/reg_expression.html @@ -0,0 +1,46 @@ + + + ++
+ |
+
+ Boost.Regex+Class reg_expression (deprecated)+ |
+
+ |
+
The use of class template reg_expression is deprecated: use + basic_regex instead.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/regbase.html b/doc/Attic/regbase.html new file mode 100644 index 00000000..f36ce38a --- /dev/null +++ b/doc/Attic/regbase.html @@ -0,0 +1,91 @@ + + + + +
+ |
+
+Boost.Regex+ +regbase+ |
+
+ |
+
Use of the type boost::regbase
is now deprecated,
+and the type does not form a part of the
+regular expression standardization proposal. This type
+still exists as a base class of boost::basic_regex
,
+and you can still refer to
+boost::regbase::constant_name
in your code, however for
+maximum portability to other std regex implementations you should
+instead use either:
+boost::regex_constants::constant_name ++ +
or
+ ++boost::regex::constant_name ++ +
or
+ ++boost::wregex::constant_name ++ + + +
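The portability point can be illustrated with the C++ standard library's regex implementation, which offers the same two spellings. This sketch assumes a C++11 compiler and uses std::regex rather than Boost so that it builds without the library; with Boost, the names listed above would presumably appear in the same positions. The function matches_ignoring_case is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles `pattern` twice, once with each spelling of
// the case-insensitive flag, and reports whether both accept `text`.
bool matches_ignoring_case(const std::string& text, const std::string& pattern)
{
    // Constant named through the regex_constants namespace:
    std::regex via_namespace(pattern, std::regex_constants::icase);
    // The same constant named through the regex class itself:
    std::regex via_class(pattern, std::regex::icase);

    return std::regex_match(text, via_namespace)
        && std::regex_match(text, via_class);
}
```

Neither spelling mentions regbase, which is why code written this way does not depend on that base class.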
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/Attic/regex.html b/doc/Attic/regex.html new file mode 100644 index 00000000..785caf87 --- /dev/null +++ b/doc/Attic/regex.html @@ -0,0 +1,620 @@ + + + + +
+ |
+
+Boost.Regex+ +class RegEx (deprecated)+ |
+
+ |
+
The high level wrapper class RegEx is now deprecated and does +not form a part of the +regular expression standardization proposal. This type +still exists, and existing code will continue to compile, however +the following documentation is unlikely to be further updated.
+ ++#include <boost/cregex.hpp> ++ +
The class RegEx provides a high level simplified interface to the regular expression library. This class only handles narrow character strings, and regular expressions always follow the "normal" syntax - that is the same as the perl / ECMAScript syntax.
typedef bool (*GrepCallback)(const RegEx& expression);
typedef bool (*GrepFileCallback)(const char* file, const RegEx& expression);
typedef bool (*FindFilesCallback)(const char* file);

class RegEx
{
public:
   RegEx();
   RegEx(const RegEx& o);
   ~RegEx();
   RegEx(const char* c, bool icase = false);
   explicit RegEx(const std::string& s, bool icase = false);
   RegEx& operator=(const RegEx& o);
   RegEx& operator=(const char* p);
   RegEx& operator=(const std::string& s);
   unsigned int SetExpression(const char* p, bool icase = false);
   unsigned int SetExpression(const std::string& s, bool icase = false);
   std::string Expression()const;
   //
   // now matching operators:
   //
   bool Match(const char* p, unsigned int flags = match_default);
   bool Match(const std::string& s, unsigned int flags = match_default);
   bool Search(const char* p, unsigned int flags = match_default);
   bool Search(const std::string& s, unsigned int flags = match_default);
   unsigned int Grep(GrepCallback cb, const char* p, unsigned int flags = match_default);
   unsigned int Grep(GrepCallback cb, const std::string& s, unsigned int flags = match_default);
   unsigned int Grep(std::vector<std::string>& v, const char* p, unsigned int flags = match_default);
   unsigned int Grep(std::vector<std::string>& v, const std::string& s, unsigned int flags = match_default);
   unsigned int Grep(std::vector<unsigned int>& v, const char* p, unsigned int flags = match_default);
   unsigned int Grep(std::vector<unsigned int>& v, const std::string& s, unsigned int flags = match_default);
   unsigned int GrepFiles(GrepFileCallback cb, const char* files, bool recurse = false, unsigned int flags = match_default);
   unsigned int GrepFiles(GrepFileCallback cb, const std::string& files, bool recurse = false, unsigned int flags = match_default);
   unsigned int FindFiles(FindFilesCallback cb, const char* files, bool recurse = false, unsigned int flags = match_default);
   unsigned int FindFiles(FindFilesCallback cb, const std::string& files, bool recurse = false, unsigned int flags = match_default);
   std::string Merge(const std::string& in, const std::string& fmt, bool copy = true, unsigned int flags = match_default);
   std::string Merge(const char* in, const char* fmt, bool copy = true, unsigned int flags = match_default);
   unsigned Split(std::vector<std::string>& v, std::string& s, unsigned flags = match_default, unsigned max_count = ~0);
   //
   // now operators for returning what matched in more detail:
   //
   unsigned int Position(int i = 0)const;
   unsigned int Length(int i = 0)const;
   bool Matched(int i = 0)const;
   unsigned int Line()const;
   unsigned int Marks() const;
   std::string What(int i)const;
   std::string operator[](int i)const ;

   static const unsigned int npos;
};
Member functions for class RegEx are defined as follows:
+
+ | RegEx(); | +Default constructor, constructs an +instance of RegEx without any valid expression. | ++ |
+ | RegEx(const RegEx& o); | +Copy constructor, all the properties +of parameter o are copied. | ++ |
+ | RegEx(const char* c, +bool icase = false); | +Constructs an instance of RegEx, +setting the expression to c, if icase is true +then matching is insensitive to case, otherwise it is sensitive to +case. Throws bad_expression on failure. | ++ |
+ | RegEx(const std::string& s, +bool icase = false); | +Constructs an instance of RegEx, +setting the expression to s, if icase is true +then matching is insensitive to case, otherwise it is sensitive to +case. Throws bad_expression on failure. | ++ |
+ | RegEx& +operator=(const RegEx& o); | +Default assignment operator. | ++ |
+ | RegEx& +operator=(const char* p); | +Assignment operator, equivalent to +calling SetExpression(p, false). Throws +bad_expression on failure. | ++ |
+ | RegEx& +operator=(const std::string& s); | +Assignment operator, equivalent to +calling SetExpression(s, false). Throws +bad_expression on failure. | ++ |
+ | unsigned int SetExpression(const char* p, bool icase = false); | +Sets the current expression to p; if icase is true then matching is insensitive to case, otherwise it is sensitive to case. Throws bad_expression on failure. | ++ |
+ | unsigned int +SetExpression(const std::string& s, bool icase = +false); | +Sets the current expression to +s, if icase is true then matching is insensitive +to case, otherwise it is sensitive to case. Throws +bad_expression on failure. | ++ |
+ | std::string +Expression()const; | +Returns a copy of the current regular +expression. | ++ |
+ | bool Match(const +char* p, unsigned int flags = +match_default); | +Attempts to match the current +expression against the text p using the match flags +flags - see match flags. +Returns true if the expression matches the whole of the +input string. | ++ |
+ | bool Match(const +std::string& s, unsigned int flags = +match_default) ; | +Attempts to match the current +expression against the text s using the match flags +flags - see match flags. +Returns true if the expression matches the whole of the +input string. | ++ |
+ | bool Search(const +char* p, unsigned int flags = +match_default); | +Attempts to find a match for the +current expression somewhere in the text p using the match +flags flags - see match +flags. Returns true if the match succeeds. | ++ |
+ | bool Search(const +std::string& s, unsigned int flags = +match_default) ; | +Attempts to find a match for the +current expression somewhere in the text s using the match +flags flags - see match +flags. Returns true if the match succeeds. | ++ |
+ | unsigned int +Grep(GrepCallback cb, const char* p, unsigned +int flags = match_default); | +Finds all matches of the current
+expression in the text p using the match flags flags
+- see match flags. For each
+match found calls the call-back function cb as: cb(*this);
+
+ If at any stage the call-back function returns false then the +grep operation terminates, otherwise continues until no further +matches are found. Returns the number of matches found. + |
++ |
+ | unsigned int +Grep(GrepCallback cb, const std::string& s, +unsigned int flags = match_default); | +Finds all matches of the current
+expression in the text s using the match flags flags
+- see match flags. For each
+match found calls the call-back function cb as: cb(*this);
+
+ If at any stage the call-back function returns false then the +grep operation terminates, otherwise continues until no further +matches are found. Returns the number of matches found. + |
++ |
+ | unsigned int +Grep(std::vector<std::string>& v, const +char* p, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text p using the match flags flags +- see match flags. For each +match pushes a copy of what matched onto v. Returns the +number of matches found. | ++ |
+ | unsigned int +Grep(std::vector<std::string>& v, const +std::string& s, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text s using the match flags flags +- see match flags. For each +match pushes a copy of what matched onto v. Returns the +number of matches found. | ++ |
+ | unsigned int +Grep(std::vector<unsigned int>& v, const +char* p, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text p using the match flags flags +- see match flags. For each +match pushes the starting index of what matched onto v. +Returns the number of matches found. | ++ |
+ | unsigned int +Grep(std::vector<unsigned int>& v, const +std::string& s, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text s using the match flags flags +- see match flags. For each +match pushes the starting index of what matched onto v. +Returns the number of matches found. | ++ |
+ | unsigned int +GrepFiles(GrepFileCallback cb, const char* files, +bool recurse = false, unsigned int flags = +match_default); | +Finds all matches of the current
+expression in the files files using the match flags
+flags - see match flags. For
+each match calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering further matches in the current file, or any +further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of matches found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | unsigned int +GrepFiles(GrepFileCallback cb, const std::string& files, +bool recurse = false, unsigned int +flags = match_default); | +Finds all matches of the current
+expression in the files files using the match flags
+flags - see match flags. For
+each match calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering further matches in the current file, or any +further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of matches found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | unsigned int +FindFiles(FindFilesCallback cb, const char* files, +bool recurse = false, unsigned int +flags = match_default); | +Searches files to find all
+those which contain at least one match of the current expression
+using the match flags flags - see match flags. For each matching file
+calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering any further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of files found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | unsigned int +FindFiles(FindFilesCallback cb, const std::string& +files, bool recurse = false, unsigned +int flags = match_default); | +Searches files to find all
+those which contain at least one match of the current expression
+using the match flags flags - see match flags. For each matching file
+calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering any further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of files found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | std::string Merge(const std::string& in, const std::string& fmt, bool copy = true, unsigned int flags = match_default); | +Performs a search and replace operation: searches through the string in for all occurrences of the current expression, and replaces each occurrence with the format string fmt. Uses flags to determine what gets matched, and how the format string should be treated. If copy is true then all unmatched sections of input are copied unchanged to output; if the flag format_first_only is set then only the first occurrence of the pattern found is replaced. Returns the new string. See also format string syntax, match flags and format flags. | ++ |
+ | std::string Merge(const char* in, const char* fmt, bool copy = true, unsigned int flags = match_default); | +Performs a search and replace operation: searches through the string in for all occurrences of the current expression, and replaces each occurrence with the format string fmt. Uses flags to determine what gets matched, and how the format string should be treated. If copy is true then all unmatched sections of input are copied unchanged to output; if the flag format_first_only is set then only the first occurrence of the pattern found is replaced. Returns the new string. See also format string syntax, match flags and format flags. | ++ |
+ | unsigned Split(std::vector<std::string>& v, std::string& s, unsigned flags = match_default, unsigned max_count = ~0); | +Splits the input string s into sections and pushes each one onto the vector v. If the expression contains no marked sub-expressions, then one string is output for each section of the input that does not match the expression. If the expression does contain marked sub-expressions, then one string is output for each marked sub-expression each time a match occurs. Outputs no more than max_count strings. Before returning, deletes from the input string s all of the input that has been processed (all of the string if max_count was not reached). Returns the number of strings pushed onto the vector. | ++ |
+ | unsigned int +Position(int i = 0)const; | +Returns the position of what matched +sub-expression i. If i = 0 then returns the position +of the whole match. Returns RegEx::npos if the supplied index is +invalid, or if the specified sub-expression did not participate in +the match. | ++ |
+ | unsigned int +Length(int i = 0)const; | +Returns the length of what matched +sub-expression i. If i = 0 then returns the length of +the whole match. Returns RegEx::npos if the supplied index is +invalid, or if the specified sub-expression did not participate in +the match. | ++ |
+ | bool Matched(int i = +0)const; | +Returns true if sub-expression i was matched, false +otherwise. | ++ |
+ | unsigned int +Line()const; | +Returns the line on which the match +occurred, indexes start from 1 not zero, if no match occurred then +returns RegEx::npos. | ++ |
+ | unsigned int Marks() +const; | +Returns the number of marked +sub-expressions contained in the expression. Note that this +includes the whole match (sub-expression zero), so the value +returned is always >= 1. | ++ |
+ | std::string What(int +i)const; | +Returns a copy of what matched +sub-expression i. If i = 0 then returns a copy of the +whole match. Returns a null string if the index is invalid or if +the specified sub-expression did not participate in a match. | ++ |
+ | std::string operator[](int i)const ; | +Returns What(i);
+
+ Can be used to simplify access to sub-expression matches, and +make usage more perl-like. + |
++ |
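Because class RegEx is deprecated, the behaviour described for Split in the table above (one output string per section of input that does not match, when the expression has no marked sub-expressions) can be approximated with a token iterator. The sketch below is an assumption-laden analogue, not the RegEx implementation: it uses the C++11 standard library's std::sregex_token_iterator, it leaves the input string unmodified, and split_on is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the sections of `input` that lie between
// matches of the regular expression `separator`.
std::vector<std::string> split_on(const std::string& input, const std::string& separator)
{
    std::regex re(separator);
    std::vector<std::string> fields;
    // Sub-match index -1 selects the text between matches rather than the
    // matches themselves.
    std::sregex_token_iterator it(input.begin(), input.end(), re, -1), end;
    for (; it != end; ++it)
        fields.push_back(it->str());
    return fields;
}
```

For instance, split_on("a, b,c", ",\\s*") produces the three strings "a", "b" and "c".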
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/Attic/regex_format.html b/doc/Attic/regex_format.html new file mode 100644 index 00000000..786353e8 --- /dev/null +++ b/doc/Attic/regex_format.html @@ -0,0 +1,213 @@ + + + + +
+ |
+
+Boost.Regex+ +Algorithm regex_format (deprecated)+ |
+
+ |
+
The algorithm regex_format is deprecated; new code should use +match_results::format instead. Existing code will continue to +compile, the following documentation is taken from the previous +version of boost.regex and will not be further updated:
+ ++#include <boost/regex.hpp> ++ +
The algorithm regex_format takes the results of a match and +creates a new string based upon a +format string, regex_format can be used for search and replace +operations:
+ ++template <class OutputIterator, class iterator, class Allocator, class charT> +OutputIterator regex_format(OutputIterator out, + const match_results<iterator, Allocator>& m, + const charT* fmt, + match_flag_type flags = 0); +template <class OutputIterator, class iterator, class Allocator, class charT> +OutputIterator regex_format(OutputIterator out, + const match_results<iterator, Allocator>& m, + const std::basic_string<charT>& fmt, + match_flag_type flags = 0); ++ +
The library also defines the following convenience variation of regex_format, which returns the result directly as a string, rather than outputting to an iterator [note - this version may not be available, or may be available in a more limited form, depending upon your compiler's capabilities]:
+ ++template <class iterator, class Allocator, class charT> +std::basic_string<charT> regex_format + (const match_results<iterator, Allocator>& m, + const charT* fmt, + match_flag_type flags = 0); + +template <class iterator, class Allocator, class charT> +std::basic_string<charT> regex_format + (const match_results<iterator, Allocator>& m, + const std::basic_string<charT>& fmt, + match_flag_type flags = 0); ++ +
Parameters to the main version of the function are passed as +follows:
+ + + ++ | OutputIterator out | +An output iterator type, the output +string is sent to this iterator. Typically this would be a +std::ostream_iterator. | ++ |
+ | const +match_results<iterator, Allocator>& m | +An instance of match_results<> +obtained from one of the matching algorithms above, and denoting +what matched. | ++ |
+ | const charT* fmt | +A format string that determines how +the match is transformed into the new string. | ++ |
+ | unsigned flags | +Optional flags which describe how the +format string is to be interpreted. | ++ |
Format flags are defined as +follows:
+ + + ++ | format_all | +Enables all syntax options (perl-like plus extensions). | ++ |
+ | format_sed | +Allows only a sed-like syntax. | ++ |
+ | format_perl | +Allows only a perl-like syntax. | ++ |
+ | format_no_copy | +Disables copying of unmatched sections +to the output string during +regex_merge operations. | ++ |
+ | format_first_only | +When this flag is set only the first occurrence will be replaced (applies to regex_merge only). | ++ |
The format string syntax (and available options) is described +more fully under format strings +.
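The effect of format_no_copy and format_first_only can be seen in a small search and replace sketch. Since regex_format and regex_merge are deprecated, this example assumes the C++11 standard library instead: std::regex_replace accepts flags with the same names in std::regex_constants. The function replace_digits is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces each run of digits in `input` with "#",
// subject to the supplied format flags.
std::string replace_digits(const std::string& input,
                           std::regex_constants::match_flag_type flags =
                               std::regex_constants::format_default)
{
    static const std::regex digits("[0-9]+");
    return std::regex_replace(input, digits, std::string("#"), flags);
}
```

With the input "a1b22c", the default flags give "a#b#c", format_first_only gives "a#b22c", and format_no_copy drops the unmatched sections to give "##".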
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/Attic/regex_grep.html b/doc/Attic/regex_grep.html new file mode 100644 index 00000000..0c6f1218 --- /dev/null +++ b/doc/Attic/regex_grep.html @@ -0,0 +1,386 @@ + + + +
+ |
+
+ Boost.Regex+Algorithm regex_grep (deprecated)+ |
+
+ |
+
The algorithm regex_grep is deprecated in favor of regex_iterator + which provides a more convenient and standard library friendly interface.
+The following documentation is taken unchanged from the previous boost release, + and will not be updated in future.
++#include <boost/regex.hpp> ++
regex_grep allows you to search through a bidirectional-iterator range and + locate all the (non-overlapping) matches with a given regular expression. The + function is declared as:
++template <class Predicate, class iterator, class charT, class traits, class Allocator> +unsigned int regex_grep(Predicate foo, + iterator first, + iterator last, + const basic_regex<charT, traits, Allocator>& e, + unsigned flags = match_default) ++
The library also defines the following convenience versions, which take either + a const charT*, or a const std::basic_string<>& in place of a pair of + iterators [note - these versions may not be available, or may be available in a + more limited form, depending upon your compiler's capabilities]:
++template <class Predicate, class charT, class Allocator, class traits> +unsigned int regex_grep(Predicate foo, + const charT* str, + const basic_regex<charT, traits, Allocator>& e, + unsigned flags = match_default); + +template <class Predicate, class ST, class SA, class Allocator, class charT, class traits> +unsigned int regex_grep(Predicate foo, + const std::basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator>& e, + unsigned flags = match_default); ++
The parameters for the primary version of regex_grep have the following + meanings:
+ ++ | foo | +A predicate function object or function pointer, see + below for more information. | ++ |
+ | first | +The start of the range to search. | ++ |
+ | last | +The end of the range to search. | ++ |
+ | e | +The regular expression to search for. | ++ |
+ | flags | +The flags that determine how matching is carried out, + one of the match_flags enumerators. | ++ |
The algorithm finds all of the non-overlapping matches of the expression e. For + each match it fills a match_results<iterator, + Allocator> structure, which contains information on what matched, and calls + the predicate foo, passing the match_results<iterator, Allocator> as a + single argument. If the predicate returns true, then the grep operation + continues; otherwise it terminates without searching for further matches. The + function returns the number of matches found.
+The general form of the predicate is:
++struct grep_predicate +{ + bool operator()(const match_results<iterator_type, typename expression_type::alloc_type::template rebind<sub_match<BidirectionalIterator> >::other>& m); +}; ++
Note that in almost every case the allocator parameter can be omitted when + specifying the match_results type; + alternatively, one of the typedefs cmatch, wcmatch, smatch or wsmatch can be + used.
+For example the regular expression "a*b" would find one match in the string + "aaaaab" and two in the string "aaabb".
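The predicate protocol described above can be sketched without the deprecated algorithm. The code below is only an approximation built on the standard library's <regex> and its regex_iterator; the names grep_like, accept_all and count_matches are hypothetical, and counting the match on which the predicate returns false is an assumption rather than something the text above states.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for regex_grep: calls pred for every non-overlapping
// match in [first, last) and stops early when pred returns false.
// Returns the number of matches found (including the one that stopped it).
template <class Predicate, class Iterator>
unsigned int grep_like(Predicate pred, Iterator first, Iterator last,
                       const std::regex& e)
{
    unsigned int count = 0;
    std::regex_iterator<Iterator> it(first, last, e), end;
    for (; it != end; ++it)
    {
        ++count;
        if (!pred(*it))
            break;
    }
    return count;
}

// Predicate that accepts every match, so the search never stops early.
struct accept_all
{
    bool operator()(const std::smatch&) const { return true; }
};

// Counts the matches of expr in text.
unsigned int count_matches(const std::string& text, const std::string& expr)
{
    const std::regex e(expr);
    return grep_like(accept_all(), text.begin(), text.end(), e);
}
```

With this sketch, count_matches("aaaaab", "a*b") yields 1 and count_matches("aaabb", "a*b") yields 2, in line with the example given above.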
+Remember that this algorithm can be used for a lot more than implementing a version + of grep: the predicate can be and do anything that you want. A grep utility + would output the results to the screen, another program could index a file + based on a regular expression and store a set of bookmarks in a list, and a text + file conversion utility would output to file. The results of one regex_grep can + even be chained into another regex_grep to create recursive parsers.
+The algorithm may throw std::runtime_error
if the complexity
+ of matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
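Code that matches untrusted text may want to treat these exceptions as an ordinary failure. The following is a minimal sketch under the assumption that the standard library's <regex> is used in place of Boost (its std::regex_error also derives from std::runtime_error); the wrapper name guarded_search and its return convention are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: returns 1 if expr is found in text, 0 if it is not,
// and -1 if the matcher gave up by throwing std::runtime_error.
int guarded_search(const std::string& text, const std::regex& expr)
{
    try
    {
        return std::regex_search(text, expr) ? 1 : 0;
    }
    catch (const std::runtime_error&)
    {
        // Complexity, stack or memory limits were exceeded while matching.
        return -1;
    }
}
```

Catching by reference to the base class keeps the caller independent of which of the three conditions was hit.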
Example: convert + the example from regex_search to use regex_grep instead:
++#include <string> +#include <map> +#include <boost/regex.hpp> + +// IndexClasses: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's +typedef std::map<std::string, int, std::less<std::string> > map_type; + +const char* re = + // possibly leading whitespace: + "^[[:space:]]*" + // possible template declaration: + "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + // class or struct: + "(class|struct)[[:space:]]*" + // leading declspec macros etc: + "(" + "\\<\\w+\\>" + "(" + "[[:blank:]]*\\([^)]*\\)" + ")?" + "[[:space:]]*" + ")*" + // the class name + "(\\<\\w*\\>)[[:space:]]*" + // template specialisation parameters + "(<[^;:{]+>)?[[:space:]]*" + // terminate in { or : + "(\\{|:[^;\\{()]*\\{)"; + +boost::regex expression(re); +class IndexClassesPred +{ + map_type& m; + std::string::const_iterator base; +public: + IndexClassesPred(map_type& a, std::string::const_iterator b) : m(a), base(b) {} + bool operator()(const boost::smatch& what) + { + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; + } +}; +void IndexClasses(map_type& m, const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + regex_grep(IndexClassesPred(m, start), start, end, expression); +} ++
Example: Use + regex_grep to call a global callback function:
++#include <string> +#include <map> +#include <boost/regex.hpp> + +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's +typedef std::map<std::string, int, std::less<std::string> > map_type; + +const char* re = + // possibly leading whitespace: + "^[[:space:]]*" + // possible template declaration: + "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + // class or struct: + "(class|struct)[[:space:]]*" + // leading declspec macros etc: + "(" + "\\<\\w+\\>" + "(" + "[[:blank:]]*\\([^)]*\\)" + ")?" + "[[:space:]]*" + ")*" + // the class name + "(\\<\\w*\\>)[[:space:]]*" + // template specialisation parameters + "(<[^;:{]+>)?[[:space:]]*" + // terminate in { or : + "(\\{|:[^;\\{()]*\\{)"; + +boost::regex expression(re); +map_type class_index; +std::string::const_iterator base; + +bool grep_callback(const boost::smatch& what) +{ + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + class_index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; +} +void IndexClasses(const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + base = start; + regex_grep(grep_callback, start, end, expression, boost::match_default); +} + ++
Example: use + regex_grep to call a class member function, using the standard library adapters std::mem_fun + and std::bind1st to convert the member function into a predicate:
++#include <string> +#include <map> +#include <boost/regex.hpp> +#include <functional> +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's + +typedef std::map<std::string, int, std::less<std::string> > map_type; +class class_index +{ + boost::regex expression; + map_type index; + std::string::const_iterator base; + bool grep_callback(boost::smatch what); +public: + void IndexClasses(const std::string& file); + class_index() + : index(), + expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" + "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" + "(\\{|:[^;\\{()]*\\{)" + ){} +}; +bool class_index::grep_callback(boost::smatch what) +{ + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; +} + +void class_index::IndexClasses(const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + base = start; + regex_grep(std::bind1st(std::mem_fun(&class_index::grep_callback), this), + start, + end, + expression); +} + ++
Finally, C++ + Builder users can use C++ Builder's closure type as a callback argument:
++#include <string> +#include <map> +#include <boost/regex.hpp> +#include <functional> +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's + +typedef std::map<std::string, int, std::less<std::string> > map_type; +class class_index +{ + boost::regex expression; + map_type index; + std::string::const_iterator base; + typedef boost::smatch arg_type; + bool grep_callback(const arg_type& what); +public: + typedef bool (__closure* grep_callback_type)(const arg_type&); + void IndexClasses(const std::string& file); + class_index() + : index(), + expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" + "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" + "(\\{|:[^;\\{()]*\\{)" + ){} +}; + +bool class_index::grep_callback(const arg_type& what) +{ + // what[0] contains the whole string +// what[5] contains the class name. +// what[6] contains the template specialisation if any. +// add class name and position to map: +index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; +} + +void class_index::IndexClasses(const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + base = start; + class_index::grep_callback_type cl = &(this->grep_callback); + regex_grep(cl, + start, + end, + expression); +} ++ +
Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- + + 2003 +
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_iterator.html b/doc/Attic/regex_iterator.html new file mode 100644 index 00000000..4a24769b --- /dev/null +++ b/doc/Attic/regex_iterator.html @@ -0,0 +1,427 @@ + + + +
+ |
+
+ Boost.Regex+regex_iterator+ |
+
+ |
+
The iterator type regex_iterator will enumerate all of the regular expression + matches found in some sequence: dereferencing a regex_iterator yields a + reference to a match_results object.
++template <class BidirectionalIterator, + class charT = iterator_traits<BidirectionalIterator>::value_type, + class traits = regex_traits<charT>, + class Allocator = allocator<charT> > +class regex_iterator +{ +public: + typedef basic_regex<charT, traits, Allocator> regex_type; + typedef match_results<BidirectionalIterator> value_type; + typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; + typedef const value_type* pointer; + typedef const value_type& reference; + typedef std::forward_iterator_tag iterator_category; + + regex_iterator(); + regex_iterator(BidirectionalIterator a, BidirectionalIterator b, + const regex_type& re, + match_flag_type m = match_default); + regex_iterator(const regex_iterator&); + regex_iterator& operator=(const regex_iterator&); + bool operator==(const regex_iterator&); + bool operator!=(const regex_iterator&); + const value_type& operator*(); + const value_type* operator->(); + regex_iterator& operator++(); + regex_iterator operator++(int); +}; + ++
A regex_iterator is constructed from a pair of iterators, and enumerates all + occurrences of a regular expression within that iterator range.
++regex_iterator(); ++ +
Effects: constructs an end of sequence regex_iterator.
++regex_iterator(BidirectionalIterator a, BidirectionalIterator b, + const regex_type& re, + match_flag_type m = match_default); ++ +
Effects: constructs a regex_iterator that will enumerate all occurrences + of the expression re, within the sequence [a,b), and found + using match flags m. The object re must exist for the + lifetime of the regex_iterator.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
+regex_iterator(const regex_iterator& that); ++ +
Effects: constructs a copy of that
.
Postconditions: *this == that
.
+regex_iterator& operator=(const regex_iterator&); ++ +
Effects: sets *this
equal to that
.
Postconditions: *this == that
.
+bool operator==(const regex_iterator& that); ++ +
Effects: returns true if *this is equal to that.
++bool operator!=(const regex_iterator&); ++ +
Effects: returns !(*this == that)
.
+const value_type& operator*(); ++
Effects: dereferencing a regex_iterator object yields a + const reference to a match_results object, + whose members are set as follows:
+ +
+ Element + |
+
+ Value + |
+
+ (*it).size() + |
+
+ re.mark_count() + |
+
+ (*it).empty() + |
+
+ false + |
+
+ (*it).prefix().first + |
+
+ The end of the last match found, or the start of the underlying sequence if + this is the first match enumerated + |
+
+ (*it).prefix().second + |
+
+ (*it)[0].first + |
+
+ (*it).prefix().matched + |
+
+ (*it).prefix().first != (*it).prefix().second + |
+
+ (*it).suffix().first + |
+
+ (*it)[0].second + |
+
+ (*it).suffix().second + |
+
+ The end of the underlying sequence. + |
+
+ (*it).suffix().matched + |
+
+ (*it).suffix().first != (*it).suffix().second + |
+
+ (*it)[0].first + |
+
+ The start of the sequence of characters that matched the regular expression + |
+
+ (*it)[0].second + |
+
+ The end of the sequence of characters that matched the regular expression + |
+
+ (*it)[0].matched + |
+
+ true + |
+
+ (*it)[n].first + |
+
+ For all integers n < (*it).size(), the start of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ (*it)[n].second + |
+
+ For all integers n < (*it).size(), the end of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ (*it)[n].matched + |
+
+ For all integers n < (*it).size(), true if sub-expression n participated + in the match, false otherwise. + |
+
(*it).position(n) | +For all integers n < (*it).size(), then the + distance from the start of the underlying sequence to the start of + sub-expression match n. | +
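The prefix() and suffix() rows of the table can be made concrete with a short sketch. It assumes the standard library's <regex>, whose regex_iterator follows the same rules, in place of Boost; the helper name unmatched_pieces is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text that lies between successive matches,
// i.e. prefix() of every match followed by suffix() of the last one.
std::vector<std::string> unmatched_pieces(const std::string& text,
                                          const std::regex& e)
{
    std::vector<std::string> pieces;
    std::sregex_iterator it(text.begin(), text.end(), e), end;
    if (it == end)
    {
        pieces.push_back(text);   // no match: everything is unmatched
        return pieces;
    }
    std::string tail;
    for (; it != end; ++it)
    {
        pieces.push_back(it->prefix().str());  // starts at end of previous match
        tail = it->suffix().str();             // runs to end of the sequence
    }
    pieces.push_back(tail);
    return pieces;
}
```

For the text "one, two; three" and the expression "[a-z]+" this gives four pieces: an empty prefix before "one", then ", " and "; ", and an empty suffix after "three".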
+const value_type* operator->(); ++ +
Effects: returns &(*this)
.
+regex_iterator& operator++(); ++
Effects: moves the iterator to the next match in the + underlying sequence, or the end of sequence iterator if none is found. + When the last match found matched a zero length string, then the + regex_iterator will find the next match as follows: if there exists a non-zero + length match that starts at the same location as the last one, then it returns that match; + otherwise it starts looking for the next (possibly zero length) match from one + position to the right of the last match.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Returns: *this
.
+regex_iterator operator++(int); ++ +
Effects: constructs a copy result
of *this
,
+ then calls ++(*this)
.
Returns: result
.
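The rule that operator++ applies after a zero-length match is easiest to see with an expression that can match nothing at all. The sketch below assumes the standard library's <regex>, which specifies the same increment rule, as a stand-in for Boost; match_offsets is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns the start offset of every match of e in text, in the order the
// iterator reports them; zero-length matches are included.
std::vector<std::string::difference_type>
match_offsets(const std::string& text, const std::regex& e)
{
    std::vector<std::string::difference_type> offsets;
    std::sregex_iterator it(text.begin(), text.end(), e), end;
    for (; it != end; ++it)
        offsets.push_back((*it)[0].first - text.begin());
    return offsets;
}
```

For the text "baaab" and the expression "a*", the first match is the empty string at offset 0; the iterator then moves one position to the right and finds the non-empty match "aaa" at offset 1, rather than reporting offset 0 again or looping forever.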
The following example + takes a C++ source file and builds up an index of class names, and the location + of each class in the file.
++#include <string> +#include <map> +#include <fstream> +#include <iostream> +#include <boost/regex.hpp> + +using namespace std; + +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's + +typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type; + +const char* re = + // possibly leading whitespace: + "^[[:space:]]*" + // possible template declaration: + "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + // class or struct: + "(class|struct)[[:space:]]*" + // leading declspec macros etc: + "(" + "\\<\\w+\\>" + "(" + "[[:blank:]]*\\([^)]*\\)" + ")?" + "[[:space:]]*" + ")*" + // the class name + "(\\<\\w*\\>)[[:space:]]*" + // template specialisation parameters + "(<[^;:{]+>)?[[:space:]]*" + // terminate in { or : + "(\\{|:[^;\\{()]*\\{)"; + + +boost::regex expression(re); +map_type class_index; + +bool regex_callback(const boost::match_results<std::string::const_iterator>& what) +{ + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. 
+ // add class name and position to map: + class_index[what[5].str() + what[6].str()] = what.position(5); + return true; +} + +void load_file(std::string& s, std::istream& is) +{ + s.erase(); + s.reserve(is.rdbuf()->in_avail()); + char c; + while(is.get(c)) + { + if(s.capacity() == s.size()) + s.reserve(s.capacity() * 3); + s.append(1, c); + } +} + +int main(int argc, const char** argv) +{ + std::string text; + for(int i = 1; i < argc; ++i) + { + cout << "Processing file " << argv[i] << endl; + std::ifstream fs(argv[i]); + load_file(text, fs); + // construct our iterators: + boost::regex_iterator<std::string::const_iterator> m1(text.begin(), text.end(), expression); + boost::regex_iterator<std::string::const_iterator> m2; + std::for_each(m1, m2, &regex_callback); + // copy results: + cout << class_index.size() << " matches found" << endl; + map_type::iterator c, d; + c = class_index.begin(); + d = class_index.end(); + while(c != d) + { + cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl; + ++c; + } + class_index.erase(class_index.begin(), class_index.end()); + } + return 0; +} ++
Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- + + 2003 +
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_match.html b/doc/Attic/regex_match.html new file mode 100644 index 00000000..1345180b --- /dev/null +++ b/doc/Attic/regex_match.html @@ -0,0 +1,317 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_match+ |
+
+ |
+
#include <boost/regex.hpp>+
The algorithm regex_match determines whether a given regular expression + matches a given sequence denoted by a pair of bidirectional iterators. + The result is true only if the expression matches the whole of the + input sequence, so the main use of this function is data input validation. + The algorithm is defined as follows: +
template <class BidirectionalIterator, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class BidirectionalIterator, class charT, class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class charT, class Allocator, class traits, class Allocator2> +bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class ST, class SA, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class charT, class traits, class Allocator2> +bool regex_match(const charT* str, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class ST, class SA, class charT, class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); ++
template <class BidirectionalIterator, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Requires: Type BidirectionalIterator meets the requirements of a + Bidirectional Iterator (24.1.4).
+Effects: Determines whether there is an exact match between the regular + expression e, and all of the character sequence [first, last), parameter + flags is used to control how the expression + is matched against the character sequence. Returns true if such a match + exists, false otherwise.
+Throws: std::runtime_error
if the complexity of matching the
+ expression against an N character string begins to exceed O(N^2), or
+ if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Postconditions: If the function returns false, then the effect on + parameter m is undefined, otherwise the effects on parameter m are + given in the table:
++
+ Element + + |
+
+ Value + + |
+
+ m.size() + |
+
+ e.mark_count() + |
+
+ m.empty() + |
+
+ false + |
+
+ m.prefix().first + |
+
+ first + |
+
+ m.prefix().second + |
+
+ first + |
+
+ m.prefix().matched + |
+
+ false + |
+
+ m.suffix().first + |
+
+ last + |
+
+ m.suffix().second + |
+
+ last + |
+
+ m.suffix().matched + |
+
+ false + |
+
+ m[0].first + |
+
+ first + |
+
+ m[0].second + |
+
+ last + |
+
+ m[0].matched + |
+
+ true + |
+
+ m[n].first + |
+
+ For all integers n < m.size(), the start of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ m[n].second + |
+
+ For all integers n < m.size(), the end of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ m[n].matched + |
+
+ For all integers n < m.size(), true if sub-expression n participated + in the match, false otherwise. + |
+
+
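The m[n].matched postcondition matters whenever an expression contains optional sub-expressions. The sketch below is written against the standard library's <regex> as an assumed stand-in for Boost, and the function name classify_version and its return codes are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns -1 if text is not a version number at all, 0 if it is "major"
// only, and 1 if the optional ".minor" sub-expression took part in the match.
int classify_version(const std::string& text)
{
    static const std::regex e("([0-9]+)(\\.[0-9]+)?");
    std::smatch m;
    if (!std::regex_match(text, m, e))
        return -1;               // the table above says nothing about m here
    // m[0] always matched; m[2].matched is false when ".minor" was absent.
    return m[2].matched ? 1 : 0;
}
```

Checking matched before reading first and second avoids treating a sub-expression that did not participate as an empty but valid capture.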
template <class BidirectionalIterator, class charT, class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Behaves "as if" by constructing an instance of
+ match_results<
BidirectionalIterator> what
,
+ and then returning the result of regex_match(first, last, what, e, flags)
.
template <class charT, class Allocator, class traits, class Allocator2> +bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(str, str +
+ char_traits<charT>::length(str), m, e, flags)
.
template <class ST, class SA, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(s.begin(), s.end(), m, e,
+ flags)
.
template <class charT, class traits, class Allocator2> +bool regex_match(const charT* str, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(str, str +
+ char_traits<charT>::length(str), e, flags)
.
template <class ST, class SA, class charT, class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(s.begin(), s.end(), e,
+ flags)
.
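Because regex_match succeeds only when the expression matches the whole of the input sequence, it behaves differently from a search for the same expression. The following sketch contrasts the two, assuming the standard library's <regex> in place of Boost; both function names and the date pattern are illustrative only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical validator: accepts a date written as YYYY-MM-DD and nothing else.
bool is_iso_date(const std::string& text)
{
    static const std::regex e("[0-9]{4}-[0-9]{2}-[0-9]{2}");
    return std::regex_match(text, e);   // must match the whole of text
}

// For contrast: true if a date-shaped run occurs anywhere inside text.
bool contains_iso_date(const std::string& text)
{
    static const std::regex e("[0-9]{4}-[0-9]{2}-[0-9]{2}");
    return std::regex_search(text, e);
}
```

The string "Revised 2003-05-17" is rejected by is_iso_date but accepted by contains_iso_date, which is why regex_match is the natural choice for validating input.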
+
The following example + processes an ftp response: +
+#include <cstdlib> +#include <boost/regex.hpp> +#include <string> +#include <iostream> + +using namespace boost; + +regex expression("([0-9]+)(\\-| |$)(.*)"); + +// process_ftp: +// on success returns the ftp response code, and fills +// msg with the ftp response message. +int process_ftp(const char* response, std::string* msg) +{ + cmatch what; + if(regex_match(response, what, expression)) + { + // what[0] contains the whole string + // what[1] contains the response code + // what[2] contains the separator character + // what[3] contains the text message. + if(msg) + msg->assign(what[3].first, what[3].second); + return std::atoi(what[1].first); + } + // failure did not match + if(msg) + msg->erase(); + return -1; +} ++ ++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_merge.html b/doc/Attic/regex_merge.html new file mode 100644 index 00000000..00c35d76 --- /dev/null +++ b/doc/Attic/regex_merge.html @@ -0,0 +1,47 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_merge (deprecated)+ |
+
+ |
+
Algorithm regex_merge has been renamed regex_replace; + existing code will continue to compile, but new code should use + regex_replace instead.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/Attic/regex_replace.html b/doc/Attic/regex_replace.html new file mode 100644 index 00000000..1e13b553 --- /dev/null +++ b/doc/Attic/regex_replace.html @@ -0,0 +1,213 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_replace+ |
+
+ |
+
#include <boost/regex.hpp>+
The algorithm regex_replace searches through a string finding + all the matches to the regular expression: for each match it then calls + match_results::format to format the string and sends the result to the + output iterator. Sections of text that do not match are copied to the output + unchanged only if the flags parameter does not have the flag + format_no_copy set. If the flag format_first_only + is set then only the first occurrence is replaced rather than all + occurrences.
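The effect of the two format flags can be sketched as follows. The code assumes the standard library's <regex>, which provides flags of the same names, rather than Boost itself; the helper mask_digits is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces runs of digits in text with "#", using the given format flags.
std::string mask_digits(const std::string& text,
                        std::regex_constants::match_flag_type flags =
                            std::regex_constants::match_default)
{
    static const std::regex digits("[0-9]+");
    return std::regex_replace(text, digits, "#", flags);
}
```

For the input "a1b22c" the default flags give "a#b#c", format_no_copy gives "##" because the unmatched text is dropped, and format_first_only gives "a#b22c".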
template <class OutputIterator, class BidirectionalIterator, class traits, + class Allocator, class charT> +OutputIterator regex_replace(OutputIterator out, + BidirectionalIterator first, + BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default); + +template <class traits, class Allocator, class charT> +basic_string<charT> regex_replace(const basic_string<charT>& s, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default); + ++
template <class OutputIterator, class BidirectionalIterator, class traits, + class Allocator, class charT> +OutputIterator regex_replace(OutputIterator out, + BidirectionalIterator first, + BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default);+
Effects: Finds all the non-overlapping matches m of type match_results<BidirectionalIterator>
+
that occur within the sequence [first, last). If no such matches are
+ found and !(flags & format_no_copy)
then calls std::copy(first,
+ last, out)
. Otherwise, for each match found, if !(flags &
+ format_no_copy)
calls std::copy(m.prefix().first, m.prefix().second,
+ out)
, and then calls m.format(out, fmt, flags)
. Finally
+ if !(flags & format_no_copy)
calls std::copy(last_m.suffix().first,
+ last_m.suffix().second, out)
where last_m
is a copy of the
+ last match found. If flags & format_first_only
is non-zero
+ then only the first match found is replaced.
Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Returns: out
.
+
template <class traits, class Allocator, class charT> +basic_string<charT> regex_replace(const basic_string<charT>& s, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default);+
Effects: Constructs an object basic_string<charT> result
,
+ calls regex_replace(back_inserter(result), s.begin(), s.end(), e, fmt,
+ flags)
, and then returns result
.
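The relationship between the two overloads can be written out directly. This is a sketch of the "Effects" wording above, not the library's implementation, and it assumes the standard library's <regex> as a stand-in for Boost; replace_all is a hypothetical name.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical re-implementation of the string overload in terms of the
// output-iterator overload: build a result string through back_inserter.
std::string replace_all(const std::string& s, const std::regex& e,
                        const std::string& fmt)
{
    std::string result;
    std::regex_replace(std::back_inserter(result), s.begin(), s.end(), e, fmt);
    return result;
}
```

The output-iterator form is the more general of the two: the same call can write to a std::ostream_iterator or any other output iterator without building an intermediate string.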
+
The following example + takes C/C++ source code as input, and outputs syntax highlighted HTML code.
+ +#include <fstream> +#include <sstream> +#include <string> +#include <iterator> +#include <boost/regex.hpp> +#include <fstream> +#include <iostream> + +// purpose: +// takes the contents of a file and transform to +// syntax highlighted code in html format + +boost::regex e1, e2; +extern const char* expression_text; +extern const char* format_string; +extern const char* pre_expression; +extern const char* pre_format; +extern const char* header_text; +extern const char* footer_text; + +void load_file(std::string& s, std::istream& is) +{ + s.erase(); + s.reserve(is.rdbuf()->in_avail()); + char c; + while(is.get(c)) + { + if(s.capacity() == s.size()) + s.reserve(s.capacity() * 3); + s.append(1, c); + } +} + +int main(int argc, const char** argv) +{ + try{ + e1.assign(expression_text); + e2.assign(pre_expression); + for(int i = 1; i < argc; ++i) + { + std::cout << "Processing file " << argv[i] << std::endl; + std::ifstream fs(argv[i]); + std::string in; + load_file(in, fs); + std::string out_name(std::string(argv[i]) + std::string(".htm")); + std::ofstream os(out_name.c_str()); + os << header_text; + // strip '<' and '>' first by outputting to a + // temporary string stream + std::ostringstream t(std::ios::out | std::ios::binary); + std::ostream_iterator<char, char> oi(t); + boost::regex_replace(oi, in.begin(), in.end(), + e2, pre_format, boost::match_default | boost::format_all); + // then output to final output stream + // adding syntax highlighting: + std::string s(t.str()); + std::ostream_iterator<char, char> out(os); + boost::regex_replace(out, s.begin(), s.end(), + e1, format_string, boost::match_default | boost::format_all); + os << footer_text; + } + } + catch(...) 
+ { return -1; } + return 0; +} + +extern const char* pre_expression = "(<)|(>)|\\r"; +extern const char* pre_format = "(?1&lt;)(?2&gt;)"; + + +const char* expression_text = // preprocessor directives: index 1 + "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|" + // comment: index 2 + "(//[^\\n]*|/\\*.*?\\*/)|" + // literals: index 3 + "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|" + // string literals: index 4 + "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|" + // keywords: index 5 + "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import" + "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall" + "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool" + "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete" + "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto" + "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected" + "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast" + "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned" + "|using|virtual|void|volatile|wchar_t|while)\\>" + ; + +const char* format_string = "(?1<font color=\"#008040\">$&</font>)" + "(?2<I><font color=\"#000080\">$&</font></I>)" + "(?3<font color=\"#0000A0\">$&</font>)" + "(?4<font color=\"#0000FF\">$&</font>)" + "(?5<B>$&</B>)"; + +const char* header_text = "<HTML>\n<HEAD>\n" + "<TITLE>Auto-generated html formatted source</TITLE>\n" + "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n" + "</HEAD>\n" + "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n" + "<P> </P>\n<PRE>"; + +const char* footer_text = "</PRE>\n</BODY>\n\n"; ++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_search.html b/doc/Attic/regex_search.html new file mode 100644 index 00000000..a7fcd9b8 --- /dev/null +++ b/doc/Attic/regex_search.html @@ -0,0 +1,328 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_search+ |
+
+ |
+
#include <boost/regex.hpp>+ +
The algorithm regex_search will search a range denoted by a pair of + bidirectional-iterators for a given regular expression. The algorithm uses + various heuristics to reduce the search time by only checking for a match if a + match could conceivably start at that position. The algorithm is defined as + follows: +
template <class BidirectionalIterator, + class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class ST, class SA, + class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(const basic_string<charT, ST, SA>& s, + match_results< + typename basic_string<charT, ST,SA>::const_iterator, + Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template<class charT, class Allocator, class traits, + class Allocator2> +bool regex_search(const charT* str, + match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class BidirectionalIterator, class Allocator, + class charT, class traits> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default); + +template <class charT, class Allocator, + class traits> +bool regex_search(const charT* str, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default); + +template<class ST, class SA, + class Allocator, class charT, + class traits> +bool regex_search(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default); ++
template <class BidirectionalIterator, class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Requires: Type BidirectionalIterator meets the requirements of a + Bidirectional Iterator (24.1.4).
+Effects: Determines whether there is some sub-sequence within + [first,last) that matches the regular expression e, parameter flags + is used to control how the expression is matched against the character + sequence. Returns true if such a sequence exists, false otherwise.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Postconditions: If the function returns false, then the effect on + parameter m is undefined, otherwise the effects on parameter m are + given in the table:
+
+ Element + |
+
+ Value + + |
+
+ m.size() + |
+
+ e.mark_count() + |
+
+ m.empty() + |
+
+ false + |
+
+ m.prefix().first + |
+
+ first + |
+
+ m.prefix().second + |
+
+ m[0].first + |
+
+ m.prefix().matched + |
+
+ m.prefix().first != m.prefix().second + |
+
+ m.suffix().first + |
+
+ m[0].second + |
+
+ m.suffix().second + |
+
+ last + |
+
+ m.suffix().matched + |
+
+ m.suffix().first != m.suffix().second + |
+
+ m[0].first + |
+
+ The start of the sequence of characters that matched the regular expression + |
+
+ m[0].second + |
+
+ The end of the sequence of characters that matched the regular expression + |
+
+ m[0].matched + |
+
+
|
+
+ m[n].first + |
+
+ For all integers n < m.size(), the start of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ m[n].second + |
+
+ For all integers n < m.size(), the end of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ m[n].matched + |
+
+ For all integers n < m.size(), true if sub-expression n participated + in the match, false otherwise. + |
+
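The table above can be read off a live match. The sketch below is only an illustration, written against std::regex so that it builds without Boost (std::smatch is assumed to expose the same prefix(), suffix() and m[n] members); DateParts and find_date are hypothetical names introduced for the example.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for the example.
struct DateParts {
    bool found;
    std::string before;  // text in m.prefix()
    std::string year;    // text in m[1]
    std::string after;   // text in m.suffix()
};

// Searches `text` for a yyyy-mm-dd date and reports the pieces
// described by the postcondition table.
DateParts find_date(const std::string& text)
{
    static const std::regex date("([0-9]{4})-([0-9]{2})-([0-9]{2})");
    std::smatch m;
    DateParts r{false, "", "", ""};
    if (std::regex_search(text, m, date)) {
        r.found = true;
        // prefix() runs from the start of the input to m[0].first
        r.before = std::string(m.prefix().first, m.prefix().second);
        // m[1] is the first marked sub-expression
        r.year = std::string(m[1].first, m[1].second);
        // suffix() runs from m[0].second to the end of the input
        r.after = std::string(m.suffix().first, m.suffix().second);
    }
    return r;
}
```

When the search fails the match object is not inspected, in line with the statement that the effect on m is undefined in that case.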
template <class charT, class Allocator, class traits, class Allocator2> +bool regex_search(const charT* str, match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(str, str +
+ char_traits<charT>::length(str), m, e, flags)
.
template <class ST, class SA, class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(const basic_string<charT, ST, SA>& s, + match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(s.begin(), s.end(), m,
+ e, flags)
.
template <class BidirectionalIterator, class Allocator, class charT, + class traits> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default);+
Effects: Behaves "as if" by constructing an instance of
+ match_results<
BidirectionalIterator> what
,
+ and then returning the result of regex_search(first, last, what, e, flags)
.
template <class charT, class Allocator, class traits> +bool regex_search(const charT* str, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(str, str +
+ char_traits<charT>::length(str), e, flags)
.
template <class ST, class SA, class Allocator, class charT, + class traits> +bool regex_search(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(s.begin(), s.end(), e,
+ flags)
.
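The three overloads without a match_results argument only answer "is there a match anywhere?". A minimal sketch follows, using std::regex so that it builds without Boost (the corresponding boost::regex_search calls are assumed to look the same); the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Shared expression for the example: one or more decimal digits.
static const std::regex& number_pattern()
{
    static const std::regex re("[0-9]+");
    return re;
}

// Null-terminated string overload.
bool has_number_cstr(const char* text)
{
    return std::regex_search(text, number_pattern());
}

// basic_string overload.
bool has_number_string(const std::string& text)
{
    return std::regex_search(text, number_pattern());
}

// Iterator-pair overload.
bool has_number_range(const std::string& text)
{
    return std::regex_search(text.begin(), text.end(), number_pattern());
}
```

All three forms are expected to agree for the same input, since the string and pointer overloads are specified in terms of the iterator-pair one.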
+
The following example + takes the contents of a file in the form of a string and searches for all the + C++ class declarations in the file. The code will work regardless of the way + that std::string is implemented; for example, it could easily be modified to + work with the SGI rope class, which uses a non-contiguous storage strategy.
+ +#include <string> +#include <map> +#include <boost/regex.hpp> + +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/ints +typedef std::map<std::string, int, std::less<std::string> > map_type; + +boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)"); + +void IndexClasses(map_type& m, const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + boost::match_results<std::string::const_iterator> what; + // use the flag type named in the regex_search signatures above: + boost::match_flag_type flags = boost::match_default; + while(regex_search(start, end, what, expression, flags)) + { + // what[0] contains the whole match + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - file.begin(); + // update search position: + start = what[0].second; + // update flags: + flags |= boost::match_prev_avail; + flags |= boost::match_not_bob; + } +} ++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_split.html b/doc/Attic/regex_split.html new file mode 100644 index 00000000..e1eba954 --- /dev/null +++ b/doc/Attic/regex_split.html @@ -0,0 +1,148 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_split (deprecated)+ |
+
+ |
+
The algorithm regex_split has been deprecated in favor of the iterator + regex_token_iterator which has a more flexible and powerful interface, + as well as following the more usual standard library "pull" rather than "push" + semantics.
+Code which uses regex_split will continue to compile; the following + documentation is taken from the previous boost.regex version:
+#include <boost/regex.hpp>+
Algorithm regex_split performs a similar operation to the perl split operation, + and comes in three overloaded forms: +
+template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2> +std::size_t regex_split(OutputIterator out, + std::basic_string<charT, Traits1, Alloc1>& s, + const basic_regex<charT, Traits2, Alloc2>& e, + unsigned flags, + std::size_t max_split); + +template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2> +std::size_t regex_split(OutputIterator out, + std::basic_string<charT, Traits1, Alloc1>& s, + const basic_regex<charT, Traits2, Alloc2>& e, + unsigned flags = match_default); + +template <class OutputIterator, class charT, class Traits1, class Alloc1> +std::size_t regex_split(OutputIterator out, + std::basic_string<charT, Traits1, Alloc1>& s);+
Effects: Each version of the algorithm takes an + output-iterator for output, and a string for input. If the expression contains + no marked sub-expressions, then the algorithm writes one string onto the + output-iterator for each section of input that does not match the expression. + If the expression does contain marked sub-expressions, then each time a match + is found, one string for each marked sub-expression will be written to the + output-iterator. No more than max_split strings will be written to the + output-iterator. Before returning, all the input processed will be deleted from + the string s (if max_split is not reached then all of s will + be deleted). Returns the number of strings written to the output-iterator. If + the parameter max_split is not specified then it defaults to UINT_MAX. + If no expression is specified, then it defaults to "\s+", and splitting occurs + on whitespace. +
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Example: the + following function will split the input string into a series of tokens, and + remove each token from the string s: +
+unsigned tokenise(std::list<std::string>& l, std::string& s) +{ + return boost::regex_split(std::back_inserter(l), s); +}+
Example: the + following short program will extract all of the URLs from an HTML file, and + print them out to cout: +
+#include <list> +#include <string> +#include <iterator> +#include <fstream> +#include <iostream> +#include <boost/regex.hpp> + +boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", + boost::regbase::normal | boost::regbase::icase); + +void load_file(std::string& s, std::istream& is) +{ + s.erase(); + // + // attempt to grow string buffer to match file size, + // this doesn't always work... + s.reserve(is.rdbuf()->in_avail()); + char c; + while(is.get(c)) + { + // use logarithmic growth strategy, in case + // in_avail (above) returned zero: + if(s.capacity() == s.size()) + s.reserve(s.capacity() * 3); + s.append(1, c); + } +} + + +int main(int argc, char** argv) +{ + std::string s; + std::list<std::string> l; + + for(int i = 1; i < argc; ++i) + { + std::cout << "Finding URLs in " << argv[i] << ":" << std::endl; + s.erase(); + std::ifstream is(argv[i]); + load_file(s, is); + boost::regex_split(std::back_inserter(l), s, e); + while(l.size()) + { + s = *(l.begin()); + l.pop_front(); + std::cout << s << std::endl; + } + } + return 0; +}+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_token_iterator.html b/doc/Attic/regex_token_iterator.html new file mode 100644 index 00000000..03e2e64e --- /dev/null +++ b/doc/Attic/regex_token_iterator.html @@ -0,0 +1,286 @@ + + + ++
+ |
+
+ Boost.Regex+regex_token_iterator+ |
+
+ |
+
The template class regex_token_iterator
is an iterator adapter;
+ that is to say it represents a new view of an existing iterator sequence, by
+ enumerating all the occurrences of a regular expression within that sequence,
+ and presenting one or more new strings for each match found. Each position
+ enumerated by the iterator is a string that represents what matched a
+ particular sub-expression within the regular expression. When class regex_token_iterator
+ is used to enumerate a single sub-expression with index -1, then the iterator
+ performs field splitting: that is to say it enumerates one string for each
+ section of the character container sequence that does not match the regular
+ expression specified.
+template <class BidirectionalIterator, + class charT = iterator_traits<BidirectionalIterator>::value_type, + class traits = regex_traits<charT>, + class Allocator = allocator<charT> > +class regex_token_iterator +{ +public: + typedef basic_regex<charT, traits, Allocator> regex_type; + typedef basic_string<charT> value_type; + typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; + typedef const value_type* pointer; + typedef const value_type& reference; + typedef std::forward_iterator_tag iterator_category; + + regex_token_iterator(); + regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + int submatch = 0, match_flag_type m = match_default); + regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const std::vector<int>& submatches, match_flag_type m = match_default); + template <std::size_t N> + regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const int (&submatches)[N], match_flag_type m = match_default); + regex_token_iterator(const regex_token_iterator&); + regex_token_iterator& operator=(const regex_token_iterator&); + bool operator==(const regex_token_iterator&); + bool operator!=(const regex_token_iterator&); + const value_type& operator*(); + const value_type* operator->(); + regex_token_iterator& operator++(); + regex_token_iterator operator++(int); +}; ++
regex_token_iterator();+
Effects: constructs an end of sequence iterator.
+regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + int submatch = 0, match_flag_type m = match_default);+
Preconditions: !re.empty()
.
Effects: constructs a regex_token_iterator that will enumerate one + string for each regular expression match of the expression re found + within the sequence [a,b), using match flags m. The + string enumerated is the sub-expression submatch for each match + found; if submatch is -1, then enumerates all the text sequences that + did not match the expression re (that is to say, it performs field splitting).
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const std::vector<int>& submatches, match_flag_type m = match_default);+
Preconditions: submatches.size() && !re.empty()
.
Effects: constructs a regex_token_iterator that will enumerate submatches.size() + strings for each regular expression match of the expression re found + within the sequence [a,b), using match flags m. For + each match found one string will be enumerated for each sub-expression + index contained within submatches vector; if submatches[0] + is -1, then the first string enumerated for each match will be all of the text + from end of the last match to the start of the current match, in addition there + will be one extra string enumerated when no more matches can be found: from the + end of the last match found, to the end of the underlying sequence.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
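To make the vector-of-indices constructor concrete, here is a small sketch that enumerates two sub-expressions per match. It uses std::sregex_token_iterator so that it builds without Boost; the constructor is assumed to take the same arguments as the one described above, and key_value_tokens is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: for every "name=digits" pair in `text`, emits the
// name (sub-expression 1) followed by the digits (sub-expression 2).
std::vector<std::string> key_value_tokens(const std::string& text)
{
    static const std::regex pair("([A-Za-z]+)=([0-9]+)");
    const std::vector<int> submatches = {1, 2};  // indices to enumerate per match
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), pair, submatches);
    std::sregex_token_iterator end;              // default-constructed: end of sequence
    for (; it != end; ++it)
        out.push_back(*it);                      // each position converts to a string
    return out;
}
```

With submatches.size() == 2, the iterator visits two strings per match, in the order the indices appear in the vector.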
template <std::size_t N> +regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const int (&submatches)[N], match_flag_type m = match_default);+
Preconditions: !re.empty()
.
Effects: constructs a regex_token_iterator that will + enumerate N strings for each regular expression match of the + expression re found within the sequence [a,b), using match + flags m. For each match found one string will be + enumerated for each sub-expression index contained within the submatches + array; if submatches[0] is -1, then the first string enumerated + for each match will be all of the text from end of the last match to the start + of the current match, in addition there will be one extra string enumerated + when no more matches can be found: from the end of the last match found, to the + end of the underlying sequence.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
regex_token_iterator(const regex_token_iterator& that);+
Effects: constructs a copy of that
.
Postconditions: *this == that
.
regex_token_iterator& operator=(const regex_token_iterator& that);+
Effects: sets *this
to be equal to that
.
Postconditions: *this == that
.
bool operator==(const regex_token_iterator& that);+
+ Effects: returns true if *this is the same position as that.
+bool operator!=(const regex_token_iterator& that);+
+ Effects: returns !(*this == that)
.
const value_type& operator*();+
+ Effects: returns the current string being enumerated.
+const value_type* operator->();+
+ Effects: returns &(*this)
.
regex_token_iterator& operator++();+
+ Effects: Moves on to the next string to be enumerated.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
+ Returns: *this
.
regex_token_iterator operator++(int);+
Effects: constructs a copy result
of *this
,
+ then calls ++(*this)
.
+ Returns: result
.
The following example + takes a string and splits it into a series of tokens:
++#include <iostream> +#include <string> +#include <boost/regex.hpp> + +using namespace std; + +int main(int argc, char**) +{ + string s; + do{ + if(argc == 1) + { + cout << "Enter text to split (or \"quit\" to exit): "; + getline(cin, s); + if(s == "quit") break; + } + else + s = "This is a string of tokens"; + + boost::regex re("\\s+"); + boost::regex_token_iterator<std::string::const_iterator> i(s.begin(), s.end(), re, -1); + boost::regex_token_iterator<std::string::const_iterator> j; + + unsigned count = 0; + while(i != j) + { + cout << *i++ << endl; + count++; + } + cout << "There were " << count << " tokens found." << endl; + + }while(argc == 1); + return 0; +} + ++
The following example + takes an HTML file and outputs a list of all the linked files:
++#include <fstream> +#include <iostream> +#include <iterator> +#include <string> +#include <boost/regex.hpp> + +boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", + boost::regex::normal | boost::regbase::icase); + +void load_file(std::string& s, std::istream& is) +{ + s.erase(); + // + // attempt to grow string buffer to match file size, + // this doesn't always work... + s.reserve(is.rdbuf()->in_avail()); + char c; + while(is.get(c)) + { + // use logarithmic growth strategy, in case + // in_avail (above) returned zero: + if(s.capacity() == s.size()) + s.reserve(s.capacity() * 3); + s.append(1, c); + } +} + +int main(int argc, char** argv) +{ + std::string s; + int i; + for(i = 1; i < argc; ++i) + { + std::cout << "Finding URLs in " << argv[i] << ":" << std::endl; + s.erase(); + std::ifstream is(argv[i]); + load_file(s, is); + boost::regex_token_iterator<std::string::const_iterator> + i(s.begin(), s.end(), e, 1); + boost::regex_token_iterator<std::string::const_iterator> j; + while(i != j) + { + std::cout << *i++ << std::endl; + } + } + // + // alternative method: + // test the array-literal constructor, and split out the whole + // match as well as $1.... + // + for(i = 1; i < argc; ++i) + { + std::cout << "Finding URLs in " << argv[i] << ":" << std::endl; + s.erase(); + std::ifstream is(argv[i]); + load_file(s, is); + const int subs[] = {1, 0,}; + boost::regex_token_iterator<std::string::const_iterator> + i(s.begin(), s.end(), e, subs); + boost::regex_token_iterator<std::string::const_iterator> j; + while(i != j) + { + std::cout << *i++ << std::endl; + } + } + + return 0; +} ++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/Attic/regex_traits.html b/doc/Attic/regex_traits.html new file mode 100644 index 00000000..a359e2e9 --- /dev/null +++ b/doc/Attic/regex_traits.html @@ -0,0 +1,48 @@ + + + ++
+ |
+
+ Boost.Regex+class regex_traits+ |
+
+ |
+
Under construction.
+The current boost.regex traits class design will be migrated to that specified + in the regular + expression standardization proposal.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/bad_expression.html b/doc/bad_expression.html new file mode 100644 index 00000000..c6e6d41f --- /dev/null +++ b/doc/bad_expression.html @@ -0,0 +1,82 @@ + + + +
+ |
+
+ Boost.Regex+class bad_expression+ |
+
+ |
+
#include <boost/pattern_except.hpp>
+The class bad_expression
defines the type of objects thrown as
+ exceptions to report errors during the conversion from a string representing a
+ regular expression to a finite state machine.
+namespace boost{ + +class bad_pattern : public std::runtime_error +{ +public: + explicit bad_pattern(const std::string& s) : std::runtime_error(s){}; +}; + +class bad_expression : public bad_pattern +{ +public: + bad_expression(const std::string& s) : bad_pattern(s) {} +}; + + +} // namespace boost ++
+bad_expression(const string& what_arg); ++
Effects: Constructs an object of class bad_expression
.
Postcondition: strcmp(what(), what_arg.c_str()) == 0
.
Footnotes: the class bad_pattern forms the base class for all + pattern-matching exceptions, of which bad_expression is one. The choice + of std::runtime_error as the base class for bad_pattern is moot; + depending upon how the library is used exceptions may be either logic errors + (programmer supplied expressions) or run time errors (user supplied + expressions).
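The footnote's distinction matters most for user-supplied expressions, where a malformed pattern is a run time condition to be handled rather than a programming error. The sketch below is an assumption-laden illustration: it uses std::regex so that it builds without Boost, and it catches std::runtime_error, which is the documented base of bad_pattern above and is also the base of the standard library's std::regex_error. The name is_valid_pattern is hypothetical.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a regular
// expression, false if the library rejects it by throwing.
bool is_valid_pattern(const std::string& pattern)
{
    try {
        std::regex compiled(pattern);  // throws if the pattern is malformed
        (void)compiled;                // only the side effect of compiling matters here
        return true;
    } catch (const std::runtime_error&) {
        // Catching the common base keeps the handler independent of the
        // concrete exception type thrown by the library in use.
        return false;
    }
}
```

A caller can use this to validate input before storing it, instead of letting the exception escape from a later construction.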
+ +Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- + + 2003 +
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/basic_regex.html b/doc/basic_regex.html new file mode 100644 index 00000000..a8edfca2 --- /dev/null +++ b/doc/basic_regex.html @@ -0,0 +1,1293 @@ + + + +
+ |
+
+Boost.Regex+ +basic_regex+ |
+
+ |
+
+#include <boost/regex.hpp> ++ +
The template class basic_regex encapsulates regular +expression parsing and compilation. The class takes three template +parameters:
+ +charT: determines the character type, i.e. either +char or wchar_t.
+ +traits: determines the behavior of the character +type, for example which character class names are recognized. A +default traits class is provided: +regex_traits<charT>.
+ +Allocator: the allocator class used to allocate +memory by the class.
+ +For ease of use there are two typedefs that define the two +standard basic_regex instances; unless you want to use +custom traits classes or allocators, you won't need to use anything +other than these:
+ ++namespace boost{ +template <class charT, class traits = regex_traits<charT>, class Allocator = std::allocator<charT> > +class basic_regex; +typedef basic_regex<char> regex; +typedef basic_regex<wchar_t> wregex; +} ++ +
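A brief sketch of the two typedefs in use, one for narrow and one for wide characters. It is written with std::regex and std::wregex (the standard-library counterparts of the typedefs above) so that it builds without Boost; the icase flag is assumed to behave like the constant of the same name listed for basic_regex, and both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Narrow-character instance: basic_regex<char>.
bool is_hello_narrow(const std::string& text)
{
    static const std::regex re("hello", std::regex_constants::icase);
    return std::regex_match(text, re);   // whole-string, case-insensitive match
}

// Wide-character instance: basic_regex<wchar_t>.
bool is_hello_wide(const std::wstring& text)
{
    static const std::wregex re(L"hello", std::regex_constants::icase);
    return std::regex_match(text, re);
}
```

Neither function names a traits class or allocator, which is the point of the typedefs: the defaults cover the common case.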
The definition of basic_regex follows: it is based very +closely on class basic_string, and fulfils the requirements for a +constant-container of charT.
+ ++namespace boost{ + +template <class charT, + class traits = regex_traits<charT>, + class Allocator = allocator<charT> > +class basic_regex +{ +public: + // types: + typedef charT value_type; + typedef implementation defined const_iterator; + typedef const_iterator iterator; + typedef typename Allocator::reference reference; + typedef typename Allocator::const_reference const_reference; + typedef typename Allocator::difference_type difference_type; + typedef typename Allocator::size_type size_type; + typedef Allocator allocator_type; + typedef regex_constants::syntax_option_type flag_type; + typedef typename traits::locale_type locale_type; + + // constants: + static const regex_constants::syntax_option_type normal = regex_constants::normal; + static const regex_constants::syntax_option_type icase = regex_constants::icase; + static const regex_constants::syntax_option_type nosubs = regex_constants::nosubs; + static const regex_constants::syntax_option_type optimize = regex_constants::optimize; + static const regex_constants::syntax_option_type collate = regex_constants::collate; + static const regex_constants::syntax_option_type ECMAScript = normal; + static const regex_constants::syntax_option_type JavaScript = normal; + static const regex_constants::syntax_option_type JScript = normal; + // these flags are optional, if the functionality is supported + // then the flags shall take these names. 
+ static const regex_constants::syntax_option_type basic = regex_constants::basic; + static const regex_constants::syntax_option_type extended = regex_constants::extended; + static const regex_constants::syntax_option_type awk = regex_constants::awk; + static const regex_constants::syntax_option_type grep = regex_constants::grep; + static const regex_constants::syntax_option_type egrep = regex_constants::egrep; + static const regex_constants::syntax_option_type sed = regex_constants::sed; + static const regex_constants::syntax_option_type perl = regex_constants::perl; + + // construct/copy/destroy: + explicit basic_regex(const Allocator& a = Allocator()); + explicit basic_regex(const charT* p, flag_type f = regex_constants::normal, + const Allocator& a = Allocator()); + basic_regex(const charT* p1, const charT* p2, flag_type f = regex_constants::normal, + const Allocator& a = Allocator()); + basic_regex(const charT* p, size_type len, flag_type f, + const Allocator& a = Allocator()); + basic_regex(const basic_regex&); + template <class ST, class SA> + explicit basic_regex(const basic_string<charT, ST, SA>& p, + flag_type f = regex_constants::normal, + const Allocator& a = Allocator()); + template <class InputIterator> + basic_regex(InputIterator first, InputIterator last, + flag_type f = regex_constants::normal, + const Allocator& a = Allocator()); + + ~basic_regex(); + basic_regex& operator=(const basic_regex&); + basic_regex& operator=(const charT* ptr); + template <class ST, class SA> + basic_regex& operator=(const basic_string<charT, ST, SA>& p); + + // iterators: + const_iterator begin() const; + const_iterator end() const; + // capacity: + size_type size() const; + size_type max_size() const; + bool empty() const; + unsigned mark_count() const; + + // + // modifiers: + basic_regex& assign(const basic_regex& that); + basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal); + basic_regex& assign(const charT* first, const charT* last, 
+ flag_type f = regex_constants::normal); + template <class string_traits, class A> + basic_regex& assign(const basic_string<charT, string_traits, A>& s, + flag_type f = regex_constants::normal); + template <class InputIterator> + basic_regex& assign(InputIterator first, InputIterator last, + flag_type f = regex_constants::normal); + + // const operations: + Allocator get_allocator() const; + flag_type getflags() const; + basic_string<charT> str() const; + int compare(basic_regex&) const; + // locale: + locale_type imbue(locale_type loc); + locale_type getloc() const; + // swap + void swap(basic_regex&) throw(); +}; + +template <class charT, class traits, class Allocator> +bool operator == (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); +template <class charT, class traits, class Allocator> +bool operator != (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); +template <class charT, class traits, class Allocator> +bool operator < (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); +template <class charT, class traits, class Allocator> +bool operator <= (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); +template <class charT, class traits, class Allocator> +bool operator >= (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); +template <class charT, class traits, class Allocator> +bool operator > (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); + +template <class charT, class io_traits, class re_traits, class Allocator> +basic_ostream<charT, io_traits>& + operator << (basic_ostream<charT, io_traits>& os, + const basic_regex<charT, re_traits, Allocator>& e); + +template <class charT, class traits, class Allocator> +void swap(basic_regex<charT, traits, 
Allocator>& e1, + basic_regex<charT, traits, Allocator>& e2); + +typedef basic_regex<char> regex; +typedef basic_regex<wchar_t> wregex; + +} // namespace boost ++ +
Class basic_regex has the following public member +functions:
+ ++static const regex_constants::syntax_option_type normal = regex_constants::normal; +static const regex_constants::syntax_option_type icase = regex_constants::icase; +static const regex_constants::syntax_option_type nosubs = regex_constants::nosubs; +static const regex_constants::syntax_option_type optimize = regex_constants::optimize; +static const regex_constants::syntax_option_type collate = regex_constants::collate; +static const regex_constants::syntax_option_type ECMAScript = normal; +static const regex_constants::syntax_option_type JavaScript = normal; +static const regex_constants::syntax_option_type JScript = normal; +static const regex_constants::syntax_option_type basic = regex_constants::basic; +static const regex_constants::syntax_option_type extended = regex_constants::extended; +static const regex_constants::syntax_option_type awk = regex_constants::awk; +static const regex_constants::syntax_option_type grep = regex_constants::grep; +static const regex_constants::syntax_option_type egrep = regex_constants::egrep; +static const regex_constants::syntax_option_type sed = regex_constants::sed; +static const regex_constants::syntax_option_type perl = regex_constants::perl; ++ +
The static constant members are provided as synonyms for the
+constants declared in namespace
+boost::regex_constants
; for each constant of type
+syntax_option_type
declared in namespace
+boost::regex_constants
then a constant with the same name,
+type and value is declared within the scope of
+basic_regex
.
In all basic_regex
constructors, a copy of the
+Allocator
argument is used for any memory allocation
+performed by the constructor or member functions during the
+lifetime of the object.
+basic_regex(const Allocator& a = Allocator()); ++ + + +
Effects: Constructs an object of class
+basic_regex
. The postconditions of this function are
+indicated in the table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ true + |
+
+ size() + |
+
+ 0 + |
+
+ str() + |
+
+ basic_string<charT>() + |
+
+ ++ +
+basic_regex(const charT* p, flag_type f = regex_constants::normal, const Allocator& a = Allocator()); ++ + + +
Requires: p shall not be a null pointer.
+ + + +Throws: bad_expression
if p is not a
+valid regular expression.
Effects: Constructs an object of class
+basic_regex
; the object's internal finite state machine is
+constructed from the regular expression contained in the
+null-terminated string p, and interpreted according to the
+option flags specified
+in f. The postconditions of this function are indicated in
+the table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ false + |
+
+ size() + |
+
+ char_traits<charT>::length(p) + |
+
+ str() + |
+
+ basic_string<charT>(p) + |
+
+ getflags() + |
+
+ f + |
+
+ mark_count() + |
+
+ The number of marked sub-expressions within the expression. + |
+
+ ++ +
+basic_regex(const charT* p1, const charT* p2, flag_type f = regex_constants::normal, const Allocator& a = Allocator()); ++ + + +
Requires: p1 and p2 are not null pointers,
+p1 < p2
.
Throws: bad_expression
if [p1,p2) is not a
+valid regular expression.
Effects: Constructs an object of class
+basic_regex
; the object's internal finite state machine is
+constructed from the regular expression contained in the sequence
+of characters [p1,p2), and interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ false + |
+
+ size() + |
+
+ std::distance(p1,p2) + |
+
+ str() + |
+
+ basic_string<charT>(p1,p2) + |
+
+ getflags() + |
+
+ f + |
+
+ mark_count() + |
+
+ The number of marked sub-expressions within the expression. + |
+
+ ++ +
+basic_regex(const charT* p, size_type len, flag_type f, const Allocator& a = Allocator()); ++ + + +
Requires: p shall not be a null pointer, len
+< max_size()
.
Throws: bad_expression
if p is not a
+valid regular expression.
Effects: Constructs an object of class
+basic_regex
; the object's internal finite state machine is
+constructed from the regular expression contained in the sequence
+of characters [p, p+len), and interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ false + |
+
+ size() + |
+
+ len + |
+
+ str() + |
+
+ basic_string<charT>(p, len) + |
+
+ getflags() + |
+
+ f + |
+
+ mark_count() + |
+
+ The number of marked sub-expressions within the expression. + |
+
+ ++ +
+basic_regex(const basic_regex& e); ++ + + +
Effects: Constructs an object of class
+basic_regex
as a copy of the object e. The
+postconditions of this function are indicated in the table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ e.empty() + |
+
+ size() + |
+
+ e.size() + |
+
+ str() + |
+
+ e.str() + |
+
+ getflags() + |
+
+ e.getflags() + |
+
+ mark_count() + |
+
+ e.mark_count() + |
+
+ ++ +
+template <class ST, class SA> +basic_regex(const basic_string<charT, ST, SA>& s, + flag_type f = regex_constants::normal, const Allocator& a = Allocator()); ++ + + +
Throws: bad_expression
if s is not a
+valid regular expression.
Effects: Constructs an object of class
+basic_regex
; the object's internal finite state machine is
+constructed from the regular expression contained in the string
+s, and interpreted according to the option flags specified in f.
+The postconditions of this function are indicated in the table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ false + |
+
+ size() + |
+
+ s.size() + |
+
+ str() + |
+
+ s + |
+
+ getflags() + |
+
+ f + |
+
+ mark_count() + |
+
+ The number of marked sub-expressions within the expression. + |
+
+ ++ +
+template <class ForwardIterator> +basic_regex(ForwardIterator first, ForwardIterator last, + flag_type f = regex_constants::normal, const Allocator& a = Allocator()); ++ + + +
Throws: bad_expression
if the sequence
+[first, last) is not a valid regular expression.
Effects: Constructs an object of class
+basic_regex
; the object's internal finite state machine is
+constructed from the regular expression contained in the sequence
+of characters [first, last), and interpreted according to the option flags specified in
+f. The postconditions of this function are indicated in the
+table:
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ false + |
+
+ size() + |
+
+ distance(first,last) + |
+
+ str() + |
+
+ basic_string<charT>(first,last) + |
+
+ getflags() + |
+
+ f + |
+
+ mark_count() + |
+
+ The number of marked sub-expressions within the expression. + |
+
+ ++ +
+basic_regex& operator=(const basic_regex& e); ++ + + +
Effects: Returns the result of assign(e.str(),
+e.getflags())
.
+basic_regex& operator=(const charT* ptr); ++ + + +
Requires: ptr shall not be a null pointer.
+ + + +Effects: Returns the result of
+assign(ptr)
.
+template <class ST, class SA> +basic_regex& operator=(const basic_string<charT, ST, SA>& p); ++ + + +
Effects: Returns the result of
+assign(p)
.
+const_iterator begin() const; ++ + + +
Effects: Returns a starting iterator to a sequence of +characters representing the regular expression.
+ ++const_iterator end() const; ++ + + +
Effects: Returns a terminating iterator to a sequence of +characters representing the regular expression.
+ ++size_type size() const; ++ + + +
Effects: Returns the length of the sequence of characters +representing the regular expression.
+ ++size_type max_size() const; ++ + + +
Effects: Returns the maximum length of the sequence of +characters representing the regular expression.
+ ++bool empty() const; ++ + + +
Effects: Returns true if the object does not +contain a valid regular expression, otherwise false.
+ ++unsigned mark_count() const; ++ + + +
Effects: Returns the number of marked sub-expressions +within the regular expression.
+ ++basic_regex& assign(const basic_regex& that); ++ + + +
Effects: Returns assign(that.str(),
+that.getflags())
.
+basic_regex& assign(const charT* ptr, flag_type f = regex_constants::normal); ++ + + +
Effects: Returns assign(string_type(ptr),
+f)
.
+basic_regex& assign(const charT* first, const charT* last, + flag_type f = regex_constants::normal); ++ + + +
Effects: Returns assign(string_type(first, last),
+f)
.
+template <class string_traits, class A> +basic_regex& assign(const basic_string<charT, string_traits, A>& s, + flag_type f = regex_constants::normal); ++ + + +
Throws: bad_expression
if s is not a
+valid regular expression.
Returns: *this
.
Effects: Assigns the regular expression contained in the +string s, interpreted according to the option flags specified in f. +The postconditions of this function are indicated in the table:
+ + + +
+
+ Element + |
+
+
+ Value + |
+
+ empty() + |
+
+ false + |
+
+ size() + |
+
+ s.size() + |
+
+ str() + |
+
+ s + |
+
+ getflags() + |
+
+ f + |
+
+ mark_count() + |
+
+ The number of marked sub-expressions within the expression. + |
+
+ ++ +
+template <class InputIterator> +basic_regex& assign(InputIterator first, InputIterator last, + flag_type f = regex_constants::normal); ++ + + +
Requires: The type InputIterator corresponds to the Input +Iterator requirements (24.1.1).
+ + + +Effects: Returns assign(string_type(first, last),
+f)
.
+Allocator get_allocator() const; ++ + + +
Effects: Returns a copy of the Allocator that was passed +to the object's constructor.
+ ++flag_type getflags() const; ++ + + +
Effects: Returns a copy of the regular expression syntax
+flags that were passed to the object's constructor, or the last
+call to assign.
+basic_string<charT> str() const; ++ + + +
Effects: Returns a copy of the character sequence passed
+to the object's constructor, or the last call to
+assign.
+int compare(basic_regex& e)const; ++ + + +
Effects: If getflags() == e.getflags()
then
+returns str().compare(e.str())
, otherwise returns
+getflags() - e.getflags()
.
+locale_type imbue(locale_type l); ++ + + +
Effects: Returns the result of
+traits_inst.imbue(l)
where traits_inst
is a
+(default initialized) instance of the template parameter
+traits
stored within the object. Calls to imbue invalidate
+any currently contained regular expression.
Postcondition: empty() == true
.
+locale_type getloc() const; ++ + + +
Effects: Returns the result of
+traits_inst.getloc()
where traits_inst
is a
+(default initialized) instance of the template parameter
+traits
stored within the object.
+void swap(basic_regex& e) throw(); ++ + + +
Effects: Swaps the contents of the two regular +expressions.
+ + + +Postcondition: *this
 contains the regular expression
+that was in e, e contains the regular expression
+that was in *this
.
Complexity: constant time.
+ ++template <class charT, class traits, class Allocator> +bool operator == (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: Returns lhs.compare(rhs) == 0
.
+template <class charT, class traits, class Allocator> +bool operator != (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: Returns lhs.compare(rhs) != 0
.
+template <class charT, class traits, class Allocator> +bool operator < (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: Returns lhs.compare(rhs) <
+0
.
+template <class charT, class traits, class Allocator> +bool operator <= (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: Returns lhs.compare(rhs) <=
+0
.
+template <class charT, class traits, class Allocator> +bool operator >= (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: Returns lhs.compare(rhs) >=
+0
.
+template <class charT, class traits, class Allocator> +bool operator > (const basic_regex<charT, traits, Allocator>& lhs, + const basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: Returns lhs.compare(rhs) >
+0
.
+template <class charT, class io_traits, class re_traits, class Allocator> +basic_ostream<charT, io_traits>& + operator << (basic_ostream<charT, io_traits>& os, + const basic_regex<charT, re_traits, Allocator>& e); ++ + + +
Effects: Returns (os << e.str()).
+ ++template <class charT, class traits, class Allocator> +void swap(basic_regex<charT, traits, Allocator>& lhs, + basic_regex<charT, traits, Allocator>& rhs); ++ + + +
Effects: calls lhs.swap(rhs)
.
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/configuration.html b/doc/configuration.html new file mode 100644 index 00000000..205a9a83 --- /dev/null +++ b/doc/configuration.html @@ -0,0 +1,232 @@ + + + +
+ |
+
+Boost.Regex+ +Configuration and setup+ |
+
+ |
+
You shouldn't need to do anything special to configure +boost.regex for use with your compiler - the boost.config subsystem should already +take care of it. If you do have problems (or you are using a +particularly obscure compiler or platform), boost.config provides a configure script.
+ +The following macros (see user.hpp) control how +boost.regex interacts with the user's locale:
+ +BOOST_REGEX_USE_C_LOCALE | +Forces boost.regex to use the global C locale in its traits +class support: this is the default behavior on non-windows +platforms, but MS Windows platforms normally use the Win32 API for +locale support. | +
BOOST_REGEX_USE_CPP_LOCALE | +Forces boost.regex to use std::locale in its default traits +class; regular expressions can then be imbued with an +instance-specific locale. | +
BOOST_REGEX_NO_W32 | +Tells boost.regex not to use any Win32 APIs even when +available (implies BOOST_REGEX_USE_C_LOCALE unless +BOOST_REGEX_USE_CPP_LOCALE is set). | +
BOOST_REGEX_DYN_LINK | +For Microsoft and Borland C++ builds, this tells boost.regex +that it should link to the DLL build of boost.regex. By +default boost.regex will link to its static library build, even if +the dynamic C runtime library is in use. | +
BOOST_REGEX_NO_LIB | +For Microsoft and Borland C++ builds, this tells boost.regex +that it should not automatically select the library to link +to. | +
BOOST_REGEX_V3 | +Tells boost.regex to use the boost-1.30.0 matching algorithm, +define only if you need maximum compatibility with previous +behavior. | +
BOOST_REGEX_RECURSIVE | +Tells boost.regex to use a stack-recursive matching +algorithm. This is generally the fastest option (although +there is very little in it), but can cause stack overflow in +extreme cases, on Win32 this can be handled safely, but this is not +the case on other platforms. | +
BOOST_REGEX_NON_RECURSIVE | +Tells boost.regex to use a non-stack recursive matching +algorithm, this can be slightly slower than the alternative, but is +always safe no matter how pathological the regular +expression. This is the default on non-Win32 platforms. | +
The following option applies only if BOOST_REGEX_RECURSIVE is +set.
+ +BOOST_REGEX_HAS_MS_STACK_GUARD | +Tells boost.regex that Microsoft style __try - __except blocks +are supported, and can be used to safely trap stack overflow. | +
The following options apply only if BOOST_REGEX_NON_RECURSIVE is +set.
+ +BOOST_REGEX_BLOCKSIZE | +In non-recursive mode, boost.regex uses largish blocks of +memory to act as a stack for the state machine; the larger the +block size, the fewer allocations will take place. +This defaults to 4096 bytes, which is large enough to match the +vast majority of regular expressions without further +allocations; however, you can choose smaller or larger values +depending upon your platform's characteristics. | +
BOOST_REGEX_MAX_BLOCKS | +Tells boost.regex how many blocks of size BOOST_REGEX_BLOCKSIZE +it is permitted to use. If this value is exceeded then +boost.regex will stop trying to find a match and throw a +std::runtime_error. Defaults to 1024; don't forget to tweak +this value if you alter BOOST_REGEX_BLOCKSIZE by much. | +
BOOST_REGEX_MAX_CACHE_BLOCKS | +Tells boost.regex how many memory blocks to store in its +internal cache - memory blocks are taken from this cache rather +than by calling ::operator new. Generally speaking this can +be an order of magnitude faster than calling ::operator new each +time a memory block is required, but has the downside that +boost.regex can end up caching a large chunk of memory (by default +up to 16 blocks each of BOOST_REGEX_BLOCKSIZE size). If +memory is tight then try defining this to 0 (disables all caching), +or if that is too slow, then a value of 1 or 2 may be +sufficient. On the other hand, on large multi-processor, +multi-threaded systems, you may find that a higher value is in +order. | +
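To see how the three non-recursive settings interact, it can help to work out the memory ceilings they imply. The sketch below is plain arithmetic on the default values quoted in the table; the constant and function names are invented for this illustration and are not defined by boost.regex.

```cpp
#include <cstddef>

// Defaults quoted in the table above, held as ordinary constants so that the
// sketch does not depend on how the library itself was configured.
const std::size_t default_block_size       = 4096; // BOOST_REGEX_BLOCKSIZE
const std::size_t default_max_blocks       = 1024; // BOOST_REGEX_MAX_BLOCKS
const std::size_t default_max_cache_blocks = 16;   // BOOST_REGEX_MAX_CACHE_BLOCKS

// Most memory the state-machine stack may use before the matcher gives up.
std::size_t max_stack_bytes(std::size_t block_size, std::size_t max_blocks)
{
    return block_size * max_blocks;
}

// Most memory that can be held back in the internal block cache.
std::size_t max_cached_bytes(std::size_t block_size, std::size_t cache_blocks)
{
    return block_size * cache_blocks;
}
```

With the defaults this gives a ceiling of 4096 x 1024 = 4194304 bytes (4 MB) for the stack and 4096 x 16 = 65536 bytes (64 KB) for the cache. Doubling BOOST_REGEX_BLOCKSIZE to 8192 while halving BOOST_REGEX_MAX_BLOCKS to 512 leaves the stack ceiling unchanged, which is the kind of adjustment the note about tweaking both values has in mind.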
Revised +17 May 2003 +
+ +© Copyright John +Maddock 1998- +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/contacts.html b/doc/contacts.html new file mode 100644 index 00000000..f459d203 --- /dev/null +++ b/doc/contacts.html @@ -0,0 +1,110 @@ + + + +
+ |
+
+Boost.Regex+ +Contacts and Acknowledgements+ |
+
+ |
+
The author can be contacted at +john_maddock@compuserve.com, the home page for this library is +at +http://ourworld.compuserve.com/homepages/john_maddock/regexpp.htm, +and the official boost version can be obtained from www.boost.org/libraries.htm.
+ +I am indebted to Robert Sedgewick's "Algorithms in C++" for +forcing me to think about algorithms and their performance, and to +the folks at boost for forcing me to think, period. The +following people have all contributed useful comments or fixes: +Dave Abrahams, Mike Allison, Edan Ayal, Jayashree Balasubramanian, +Jan Bölsche, Beman Dawes, Paul Baxter, David Bergman, David +Dennerline, Edward Diener, Peter Dimov, Robert Dunn, Fabio Forno, +Tobias Gabrielsson, Rob Gillen, Marc Gregoire, Chris Hecker, Nick +Hodapp, Jesse Jones, Martin Jost, Boris Krasnovskiy, Jan Hermelink, +Max Leung, Wei-hao Lin, Jens Maurer, Richard Peters, Heiko Schmidt, +Jason Shirk, Gerald Slacik, Scobie Smith, Mike Smyth, Alexander +Sokolovsky, Hervé Poirier, Michael Raykh, Marc Recht, Scott +VanCamp, Bruno Voigt, Alexey Voinov, Jerry Waldorf, Rob Ward, +Lealon Watts, Thomas Witt and Yuval Yosef. I am also grateful to +the manuals supplied with the Henry Spencer, Perl and GNU regular +expression libraries - wherever possible I have tried to maintain +compatibility with these libraries and with the POSIX standard - +the code however is entirely my own, including any bugs! I can +absolutely guarantee that I will not fix any bugs I don't know +about, so if you have any comments or spot any bugs, please get in +touch.
+ +Useful further information can be found at:
+ +A short tutorial on regular expressions can be +found here.
+ +The Open Unix +Specification contains a wealth of useful material, including +the regular expression syntax, and specifications for +<regex.h> and +<nl_types.h>.
+ +The Pattern +Matching Pointers site is a "must visit" resource for anyone +interested in pattern matching.
+ +Glimpse and Agrep, +use a simplified regular expression syntax to achieve faster search +times.
+ +Udi Manber +and Ricardo +Baeza-Yates both have a selection of useful pattern matching +papers available from their respective web sites.
+ + + +Revised +17 May 2003 +
+ +© Copyright John +Maddock 1998- +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + diff --git a/doc/examples.html b/doc/examples.html new file mode 100644 index 00000000..eeb99819 --- /dev/null +++ b/doc/examples.html @@ -0,0 +1,99 @@ + + + +
+ |
+
+ Boost.Regex+Examples+ |
+
+ |
+
There are three demo applications that ship with this library; they all come + with makefiles for Borland, Microsoft and gcc compilers. For other compilers you will + have to create your own makefiles.
+A regression test application that gives the matching/searching algorithms a + full workout. The presence of this program is your guarantee that the library + will behave as claimed - at least as far as those items tested are concerned - + if anyone spots anything that isn't being tested I'd be glad to hear about it.
+Files: parse.cpp, + regress.cpp, tests.cpp.
+A simple grep implementation, run with no command line options to find out its + usage. Look at fileiter.cpp/fileiter.hpp and + the mapfile class to see an example of a "smart" bidirectional iterator that + can be used with boost.regex or any other STL algorithm.
+Files: jgrep.cpp, + main.cpp.
+A simple interactive expression matching application, the results of all + matches are timed, allowing the programmer to optimize their regular + expressions where performance is critical.
+Files: regex_timer.cpp.
+The snippets examples contain the code examples used in the documentation:
+credit_card_example.cpp: + Credit card number formatting code.
+partial_regex_grep.cpp: + Search example using partial matches.
+partial_regex_match.cpp: + regex_match example using partial matches.
+regex_grep_example_1.cpp: + regex_grep example 1: searches a cpp file for class definitions.
+regex_grep_example_2.cpp: + regex_grep example 2: searches a cpp file for class definitions, using a global + callback function.
+regex_grep_example_3.cpp: + regex_grep example 3: searches a cpp file for class definitions, using a bound + member function callback.
+regex_grep_example_4.cpp: + regex_grep example 4: searches a cpp file for class definitions, using a C++ + Builder closure as a callback.
+regex_match_example.cpp: + ftp based regex_match example.
+regex_merge_example.cpp: + regex_merge example: converts a C++ file to syntax highlighted HTML.
+regex_replace_example.cpp: + regex_replace example: converts a C++ file to syntax highlighted HTML
+regex_search_example.cpp: + regex_search example: searches a cpp file for class definitions.
+regex_split_example_1.cpp: + regex_split example: split a string into tokens.
+regex_split_example_2.cpp + : regex_split example: spit out linked URLs.
+ +Revised +17 May 2003 +
+ +© Copyright John +Maddock 1998- +2003 +
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/faq.html b/doc/faq.html new file mode 100644 index 00000000..f398ad92 --- /dev/null +++ b/doc/faq.html @@ -0,0 +1,162 @@ + + + + +
+ |
+
+Boost.Regex+ +FAQ+ |
+
+ |
+
Q. Why can't I use the "convenience" versions of +regex_match / regex_search / regex_grep / regex_format / +regex_merge?
+ +A. These versions may or may not be available depending upon the +capabilities of your compiler, the rules determining the format of +these functions are quite complex - and only the versions visible +to a standard compliant compiler are given in the help. To find out +what your compiler supports, run <boost/regex.hpp> through +your C++ pre-processor, and search the output file for the function +that you are interested in.
+ +Q. I can't get +regex++ to work with escape characters, what's going +on?
+ +A. If you embed regular expressions in C++ code, then remember +that escape characters are processed twice: once by the C++ +compiler, and once by the regex++ expression compiler, so to pass +the regular expression \d+ to regex++, you need to embed "\\d+" in +your code. Likewise to match a literal backslash you will need to +embed "\\\\" in your code.
+ +Q. Why does using parentheses in a POSIX +regular expression change the result of a match?
+ +A. For POSIX (extended and basic) regular expressions, but not for +perl regexes, parentheses don't only mark; they determine what the +best match is as well. When the expression is compiled as a POSIX +basic or extended regex then Boost.regex follows the POSIX standard +leftmost-longest rule for determining what matched. So if there is +more than one possible match after considering the whole +expression, it looks next at the first sub-expression and then the +second sub-expression and so on. So...
+ ++"(0*)([0-9]*)" against "00123" would produce +$1 = "00" +$2 = "123" ++ +
 whereas
+ ++"0*([0-9]*)" against "00123" would produce +$1 = "00123" ++ +
If you think about it, had $1 only matched the "123", this would +be "less good" than the match "00123" which is both further to the +left and longer. If you want $1 to match only the "123" part, then +you need to use something like:
+ ++"0*([1-9][0-9]*)" ++ +
as the expression.
+ +Q. Why don't character ranges work
+properly (POSIX mode only)?
+ A. The POSIX standard specifies that character range expressions
+are locale sensitive - so for example the expression [A-Z] will
+match any collating element that collates between 'A' and 'Z'. That
+means that for most locales other than "C" or "POSIX", [A-Z] would
+match the single character 't' for example, which is not what most
+people expect - or at least not what most people have come to
+expect from regular expression engines. For this reason, the
+default behaviour of boost.regex (perl mode) is to turn locale
+sensitive collation off by not setting the regex_constants::collate
+compile time flag. However if you set a non-default compile time
+flag - for example regex_constants::extended or
+regex_constants::basic, then locale dependent collation will be
+enabled, this also applies to the POSIX API functions which use
+either regex_constants::extended or regex_constants::basic
+internally. [Note - when regex_constants::nocollate is in effect,
+the library behaves "as if" the LC_COLLATE locale category were
+always "C", regardless of what it is actually set to - end
+note].
Q. Why are there no throw specifications +on any of the functions? What exceptions can the library +throw?
+ +A. Not all compilers support (or honor) throw specifications, +others support them but with reduced efficiency. Throw +specifications may be added at a later date as compilers begin to +handle this better. The library should throw only three types of +exception: boost::bad_expression can be thrown by basic_regex when +compiling a regular expression, std::runtime_error can be thrown +when a call to basic_regex::imbue tries to open a message catalogue +that doesn't exist, or when a call to regex_search or regex_match +results in an "everlasting" search, or when a call to +RegEx::GrepFiles or RegEx::FindFiles tries to open a file that +cannot be opened, finally std::bad_alloc can be thrown by just +about any of the functions in this library.
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/format_syntax.html b/doc/format_syntax.html new file mode 100644 index 00000000..c267528d --- /dev/null +++ b/doc/format_syntax.html @@ -0,0 +1,272 @@ + + + + +
+ |
+
+Boost.Regex+ +Format String Syntax+ |
+
+ |
+
Format strings are used by the algorithm regex_merge and by match_results::format, and are used +to transform one string into another.
+ +There are three kinds of format string: sed, Perl and extended. +The extended syntax is a superset of the others, so it is covered +first.
+ +Extended format syntax
+ +In format strings, all characters are treated as literals +except: ()$\?:
+ +To use any of these as literals you must prefix them with the +escape character \
+ +The following special sequences are recognized:
+
+ Grouping:
Use the parenthesis characters ( and ) to group sub-expressions
+within the format string, use \( and \) to represent literal '('
+and ')'.
+
+ Sub-expression expansions:
The following Perl like expressions expand to a particular
+matched sub-expression:
+
+ | $` | +Expands to all the text from the end +of the previous match to the start of the current match, if there +was no previous match in the current operation, then everything +from the start of the input string to the start of the match. | ++ |
+ | $' | +Expands to all the text from the end +of the match to the end of the input string. | ++ |
+ | $& | +Expands to all of the current +match. | ++ |
+ | $0 | +Expands to all of the current +match. | ++ |
+ | $N | +Expands to the text that matched +sub-expression N. | ++ |
Conditional expressions:
+ +Conditional expressions allow two different format strings to be +selected dependent upon whether a sub-expression participated in +the match or not:
+ +?Ntrue_expression:false_expression
+ +Executes true_expression if sub-expression N participated +in the match, otherwise executes false_expression.
+ +Example: suppose we search for "(while)|(for)" then the format
+string "?1WHILE:FOR" would output what matched, but in upper
+case.
+
+ Escape sequences:
The following escape sequences are also allowed:
+
+ | \a | +The bell character. | ++ |
+ | \f | +The form feed character. | ++ |
+ | \n | +The newline character. | ++ |
+ | \r | +The carriage return character. | ++ |
+ | \t | +The tab character. | ++ |
+ | \v | +A vertical tab character. | ++ |
+ | \x | +A hexadecimal character - for example +\x0D. | ++ |
+ | \x{} | +A possible Unicode hexadecimal +character - for example \x{1A0} | ++ |
+ | \cx | +The ASCII escape character x, for +example \c@ is equivalent to escape-@. | ++ |
+ | \e | +The ASCII escape character. | ++ |
+ | \dd | +An octal character constant, for +example \10. | ++ |
Perl format strings
+ +Perl format strings are the same as the default syntax except +that the characters ()?: have no special meaning.
+ +Sed format strings
+ +Sed format strings use only the characters \ and & as +special characters.
+ +\n where n is a digit, is expanded to the nth +sub-expression.
+ +& is expanded to the whole of the match (equivalent to +\0).
+ +Other escape sequences are expanded as per the default +syntax.
+ + + + diff --git a/doc/gcc-performance.html b/doc/gcc-performance.html new file mode 100644 index 00000000..dc0adb53 --- /dev/null +++ b/doc/gcc-performance.html @@ -0,0 +1,539 @@ + + + +The following tables provide comparisons between the following regular + expression libraries:
+ +The GNU regular expression library.
+Philip Hazel's PCRE library.
+Machine: Intel Pentium 4 2.8GHz PC.
+Compiler: GNU C++ version 3.2 20020927 (prerelease).
+C++ Standard Library: GNU libstdc++ version 20020927.
+OS: Cygwin.
+Boost version: 1.31.0.
+PCRE version: 4.1.
+As ever, care should be taken in interpreting the results: only sensible regular + expressions (rather than pathological cases) are given; most are taken from the + Boost regex examples, or from the Library of + Regular Expressions. In addition, some variation in the relative + performance of these libraries can be expected on other machines, as memory + access and processor caching effects can be quite large for most finite state + machine algorithms. In each case the first figure given is the relative time + taken (so a value of 1.0 is as good as it gets), while the second figure is the + actual time taken.
+The following are the average relative scores for all the tests: the perfect + regular expression library would score 1, in practice anything less than 2 + is pretty good.
+Boost | +Boost + C++ locale | +POSIX | +PCRE | +
1.4503 | +1.49124 | +108.372 | +1.56255 | +
For each of the following regular expressions the time taken to find all + occurrences of the expression within a long English language text was measured + (mtent12.txt + from Project Gutenberg, 19Mb).
+Expression | +Boost | +Boost + C++ locale | +POSIX | +PCRE | +
Twain |
+ 3.49 + (0.205s) |
+ 4.09 + (0.24s) |
+ 65.2 + (3.83s) |
+ 1 + (0.0588s) |
+
Huck[[:alpha:]]+ |
+ 3.86 + (0.203s) |
+ 4.52 + (0.238s) |
+ 100 + (5.26s) |
+ 1 + (0.0526s) |
+
[[:alpha:]]+ing |
+ 1.01 + (1.23s) |
+ 1 + (1.22s) |
+ 4.95 + (6.04s) |
+ 4.67 + (5.71s) |
+
^[^ ]*?Twain |
+ 1 + (0.31s) |
+ 1.05 + (0.326s) |
+ NA | +3.32 + (1.03s) |
+
Tom|Sawyer|Huckleberry|Finn |
+ 1.02 + (0.125s) |
+ 1 + (0.123s) |
+ 165 + (20.3s) |
+ 1.08 + (0.133s) |
+
(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
+ 1 + (0.345s) |
+ 1.03 + (0.355s) |
+ NA | +1.71 + (0.59s) |
+
For each of the following regular expressions the time taken to find all + occurrences of the expression within a medium sized English language text was + measured (the first 50K from mtent12.txt).
+Expression | +Boost | +Boost + C++ locale | +POSIX | +PCRE | +
Twain |
+ 1.8 + (0.000519s) |
+ 2.14 + (0.000616s) |
+ 9.08 + (0.00262s) |
+ 1 + (0.000289s) |
+
Huck[[:alpha:]]+ |
+ 3.65 + (0.000499s) |
+ 4.36 + (0.000597s) |
+ 1 + (0.000137s) |
+ 1.43 + (0.000196s) |
+
[[:alpha:]]+ing |
+ 1 + (0.00258s) |
+ 1 + (0.00258s) |
+ 5.28 + (0.0136s) |
+ 5.63 + (0.0145s) |
+
^[^ ]*?Twain |
+ 1 + (0.000929s) |
+ 1.03 + (0.000957s) |
+ NA | +2.82 + (0.00262s) |
+
Tom|Sawyer|Huckleberry|Finn |
+ 1 + (0.000812s) |
+ 1 + (0.000812s) |
+ 60.1 + (0.0488s) |
+ 1.28 + (0.00104s) |
+
(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
+ 1.02 + (0.00178s) |
+ 1 + (0.00174s) |
+ 242 + (0.421s) |
+ 1.3 + (0.00227s) |
+
For each of the following regular expressions the time taken to find all + occurrences of the expression within the C++ source file + boost/crc.hpp was measured.
+Expression | +Boost | +Boost + C++ locale | +POSIX | +PCRE | +
^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([
+ ]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{) |
+ 1.04 + (0.000144s) |
+ 1 + (0.000139s) |
+ 862 + (0.12s) |
+ 4.56 + (0.000636s) |
+
(^[
+ ]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\> |
+ 1 + (0.0139s) |
+ 1.01 + (0.0141s) |
+ NA | +1.55 + (0.0216s) |
+
^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>) |
+ 1.04 + (0.000332s) |
+ 1 + (0.000318s) |
+ 130 + (0.0413s) |
+ 1.72 + (0.000547s) |
+
^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>) |
+ 1.02 + (0.000323s) |
+ 1 + (0.000318s) |
+ 150 + (0.0476s) |
+ 1.72 + (0.000547s) |
+
For each of the following regular expressions the time taken to find all + occurrences of the expression within the html file libs/libraries.htm + was measured.
+Expression | +Boost | +Boost + C++ locale | +POSIX | +PCRE | +
beman|john|dave |
+ 1.03 + (0.000367s) |
+ 1 + (0.000357s) |
+ 47.4 + (0.0169s) |
+ 1.16 + (0.000416s) |
+
<p>.*?</p> |
+ 1.25 + (0.000459s) |
+ 1 + (0.000367s) |
+ NA | +1.03 + (0.000376s) |
+
<a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*> |
+ 1 + (0.000509s) |
+ 1.02 + (0.000518s) |
+ 305 + (0.155s) |
+ 1.1 + (0.000558s) |
+
<h[12345678][^>]*>.*?</h[12345678]> |
+ 1.04 + (0.00025s) |
+ 1 + (0.00024s) |
+ NA | +1.16 + (0.000279s) |
+
<img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*> |
+ 2.22 + (0.000489s) |
+ 1.69 + (0.000372s) |
+ 148 + (0.0326s) |
+ 1 + (0.00022s) |
+
<font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font> |
+ 1.71 + (0.000371s) |
+ 1.75 + (0.000381s) |
+ NA | +1 + (0.000218s) |
+
For each of the following regular expressions the time taken to match against + the text indicated was measured.
+Expression | +Text | +Boost | +Boost + C++ locale | +POSIX | +PCRE | +
abc |
+ abc | +1.36 + (2.15e-07s) |
+ 1.36 + (2.15e-07s) |
+ 2.76 + (4.34e-07s) |
+ 1 + (1.58e-07s) |
+
^([0-9]+)(\-| |$)(.*)$ |
+ 100- this is a line of ftp response which contains a message string | +1.55 + (7.26e-07s) |
+ 1.51 + (7.07e-07s) |
+ 319 + (0.000149s) |
+ 1 + (4.67e-07s) |
+
([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
+ 1234-5678-1234-456 | +1.96 + (9.54e-07s) |
+ 1.96 + (9.54e-07s) |
+ 44.5 + (2.17e-05s) |
+ 1 + (4.87e-07s) |
+
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
+ john_maddock@compuserve.com | +1.22 + (1.51e-06s) |
+ 1.23 + (1.53e-06s) |
+ 162 + (0.000201s) |
+ 1 + (1.24e-06s) |
+
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
+ foo12@foo.edu | +1.28 + (1.47e-06s) |
+ 1.3 + (1.49e-06s) |
+ 104 + (0.00012s) |
+ 1 + (1.15e-06s) |
+
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
+ bob.smith@foo.tv | +1.28 + (1.47e-06s) |
+ 1.3 + (1.49e-06s) |
+ 113 + (0.00013s) |
+ 1 + (1.15e-06s) |
+
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
+ EH10 2QQ | +1.38 + (4.68e-07s) |
+ 1.41 + (4.77e-07s) |
+ 13.5 + (4.59e-06s) |
+ 1 + (3.39e-07s) |
+
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
+ G1 1AA | +1.28 + (4.35e-07s) |
+ 1.25 + (4.25e-07s) |
+ 11.7 + (3.97e-06s) |
+ 1 + (3.39e-07s) |
+
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
+ SW1 1ZZ | +1.32 + (4.53e-07s) |
+ 1.31 + (4.49e-07s) |
+ 12.2 + (4.2e-06s) |
+ 1 + (3.44e-07s) |
+
^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
+ 4/1/2001 | +1.16 + (3.82e-07s) |
+ 1.2 + (3.96e-07s) |
+ 13.9 + (4.59e-06s) |
+ 1 + (3.29e-07s) |
+
^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
+ 12/12/2001 | +1.38 + (4.49e-07s) |
+ 1.38 + (4.49e-07s) |
+ 16 + (5.2e-06s) |
+ 1 + (3.25e-07s) |
+
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
+ 123 | +1.19 + (7.64e-07s) |
+ 1.16 + (7.45e-07s) |
+ 7.51 + (4.81e-06s) |
+ 1 + (6.4e-07s) |
+
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
+ +3.14159 | +1.32 + (8.97e-07s) |
+ 1.31 + (8.88e-07s) |
+ 14 + (9.48e-06s) |
+ 1 + (6.78e-07s) |
+
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
+ -3.14159 | +1.32 + (8.97e-07s) |
+ 1.31 + (8.88e-07s) |
+ 14 + (9.48e-06s) |
+ 1 + (6.78e-07s) |
+
Copyright John Maddock April 2003, all rights reserved.
+ + diff --git a/doc/headers.html b/doc/headers.html new file mode 100644 index 00000000..d0b8283c --- /dev/null +++ b/doc/headers.html @@ -0,0 +1,52 @@ + + + ++
+Boost.Regex
+Headers
There are two main headers used by this library: <boost/regex.hpp> + provides full access to the entire library, while <boost/cregex.hpp> + provides access to just the high level class RegEx, and the POSIX API + functions. +
+There is also a header containing only forward declarations + <boost/regex_fwd.hpp> for use when an interface is dependent upon + boost::basic_regex, but otherwise does not need the full definitions.
++
+ + + diff --git a/doc/history.html b/doc/history.html new file mode 100644 index 00000000..17ca695c --- /dev/null +++ b/doc/history.html @@ -0,0 +1,58 @@ + + + ++
+Boost.Regex
+History
Boost 1.31.0.
++
+ + + diff --git a/doc/implementation.html b/doc/implementation.html new file mode 100644 index 00000000..dfb8811a --- /dev/null +++ b/doc/implementation.html @@ -0,0 +1,45 @@ + + + ++
+Boost.Regex
+Implementation
Todo.
++
+ + + diff --git a/doc/index.html b/doc/index.html new file mode 100644 index 00000000..88196f29 --- /dev/null +++ b/doc/index.html @@ -0,0 +1,127 @@ + + + + +
+Boost.Regex
+Index
+ + + + + diff --git a/doc/install.html b/doc/install.html new file mode 100644 index 00000000..f24fb744 --- /dev/null +++ b/doc/install.html @@ -0,0 +1,237 @@ + + + ++
+Boost.Regex
+Installation
[ Important: If you are upgrading from the + 2.x version of this library then you will find a number of changes to the + documented header names and library interfaces, existing code should still + compile unchanged however - see + Note for Upgraders. ]
+When you extract the library from its zip file, you must preserve its internal + directory structure (for example by using the -d option when extracting). If + you didn't do that when extracting, then you'd better stop reading this, delete + the files you just extracted, and try again! +
+This library should not need configuring before use; most popular + compilers/standard libraries/platforms are already supported "as is". If you do + experience configuration problems, or just want to test the configuration with + your compiler, then the process is the same as for all of boost; see the + configuration library documentation.
+The library will encase all code inside namespace boost. +
+Unlike some other template libraries, this library consists of a mixture of + template code (in the headers) and static code and data (in cpp files). + Consequently it is necessary to build the library's support code into a library + or archive file before you can use it, instructions for specific platforms are + as follows: +
+ +make -fbcb5.mak+
The build process will build a variety of .lib and .dll files (the exact number + depends upon the version of Borland's tools you are using) the .lib and dll + files will be in a sub-directory called bcb4 or bcb5 depending upon the + makefile used. To install the libraries into your development system use:
+make -fbcb5.mak install
+library files will be copied to <BCROOT>/lib and the dll's to + <BCROOT>/bin, where <BCROOT> corresponds to the install path of + your Borland C++ tools. +
+You may also remove temporary files created during the build process (excluding + lib and dll files) by using:
+make -fbcb5.mak clean
+Finally when you use regex++ it is only necessary for you to add the + <boost> root directory to your list of include directories for that + project. It is not necessary for you to manually add a .lib file to the + project; the headers will automatically select the correct .lib file for your + build mode and tell the linker to include it. There is one caveat however: the + library can not tell the difference between VCL and non-VCL enabled builds when + building a GUI application from the command line. If you build from the command + line with the 5.5 command line tools then you must define the pre-processor + symbol _NO_VCL in order to ensure that the correct link libraries are selected: + the C++ Builder IDE normally sets this automatically. Hint: users of the 5.5 + command line tools may want to add a -D_NO_VCL to bcc32.cfg in order to set + this option permanently. +
+If you would prefer to do a static link to the regex libraries even when using + the dll runtime then define BOOST_REGEX_STATIC_LINK, and if you want to + suppress automatic linking altogether (and supply your own custom build of the + lib) then define BOOST_REGEX_NO_LIB.
+If you are building with C++ Builder 6, you will find that + <boost/regex.hpp> can not be used in a pre-compiled header (the actual + problem is in <locale> which gets included by <boost/regex.hpp>), + if this causes problems for you, then try defining BOOST_NO_STD_LOCALE when + building, this will disable some features throughout boost, but may save you a + lot in compile times!
+ +You need version 6 of MSVC to build this library. If you are using VC5 then you + may want to look at one of the previous releases of this + library +
+Open up a command prompt, which has the necessary MSVC environment variables + defined (for example by using the batch file Vcvars32.bat installed by the + Visual Studio installation), and change to the <boost>\libs\regex\build + directory. +
+Select the correct makefile - vc6.mak for "vanilla" Visual C++ 6 or + vc6-stlport.mak if you are using STLPort.
+Invoke the makefile like this:
+nmake -fvc6.mak
+You will now have a collection of lib and dll files in a "vc6" subdirectory, to + install these into your development system use:
+nmake -fvc6.mak install
+The lib files will be copied to your <VC6>\lib directory and the dll + files to <VC6>\bin, where <VC6> is the root of your Visual C++ 6 + installation.
+You can delete all the temporary files created during the build (excluding lib + and dll files) using:
+nmake -fvc6.mak clean +
+Finally when you use regex++ it is only necessary for you to add the + <boost> root directory to your list of include directories for that + project. It is not necessary for you to manually add a .lib file to the + project; the headers will automatically select the correct .lib file for your + build mode and tell the linker to include it. +
+Note that if you want to statically link to the regex library when using the + dynamic C++ runtime, define BOOST_REGEX_STATIC_LINK when building your project + (this only has an effect for release builds). If you want to add the source + directly to your project then define BOOST_REGEX_NO_LIB to disable automatic + library selection.
+Important: there have been some reports of + compiler-optimization bugs affecting this library (particularly with VC6 + versions prior to service pack 5); the workaround is to build the library using + /Oityb1 rather than /O2, that is, to use all optimization settings except /Oa. + This problem is reported to affect some standard library code as well (in fact + I'm not sure if the problem is with the regex code or the underlying standard + library), so it's probably worthwhile applying this workaround in normal + practice in any case.
+Note: if you have replaced the C++ standard library that comes with VC6, then + when you build the library you must ensure that the environment variables + "INCLUDE" and "LIB" have been updated to reflect the include and library paths + for the new library - see vcvars32.bat (part of your Visual Studio + installation) for more details. Alternatively if STLPort is in c:/stlport then + you could use:
+nmake INCLUDES="-Ic:/stlport/stlport" XLFLAGS="/LIBPATH:c:/stlport/lib" + -fvc6-stlport.mak
+If you are building with the full STLPort v4.x, then use the vc6-stlport.mak
+ file provided and set the environment variable STLPORT_PATH to point to the
+ location of your STLport installation (Note that the full STLPort libraries
+ appear not to support single-thread static builds).
+
+
+
+
+
There is a conservative makefile for the g++ compiler. From the command prompt + change to the <boost>/libs/regex/build directory and type: +
+make -fgcc.mak +
+At the end of the build process you should have a gcc sub-directory containing + release and debug versions of the library (libboost_regex.a and + libboost_regex_debug.a). When you build projects that use regex++, you will + need to add the boost install directory to your list of include paths and add + <boost>/libs/regex/build/gcc/libboost_regex.a to your list of library + files. +
+There is also a makefile to build the library as a shared library:
+make -fgcc-shared.mak
+which will build libboost_regex.so and libboost_regex_debug.so.
+Both of these makefiles support the following environment variables:
+CXXFLAGS: extra compiler options - note that this applies to both the debug and + release builds.
+INCLUDES: additional include directories.
+LDFLAGS: additional linker options.
+LIBS: additional library files.
+For the more adventurous there is a configure script in + <boost>/libs/config; see the config library + documentation.
+ +There is a makefile for the sun (6.1) compiler (C++ version 3.12). From the + command prompt change to the <boost>/libs/regex/build directory and type: +
+dmake -f sunpro.mak +
+At the end of the build process you should have a sunpro sub-directory + containing single and multithread versions of the library (libboost_regex.a, + libboost_regex.so, libboost_regex_mt.a and libboost_regex_mt.so). When you + build projects that use regex++, you will need to add the boost install + directory to your list of include paths and add + <boost>/libs/regex/build/sunpro/ to your library search path. +
+This makefile supports the following environment variables:
+CXXFLAGS: extra compiler options - note that this applies to both the single + and multithreaded builds.
+INCLUDES: additional include directories.
+LDFLAGS: additional linker options.
+LIBS: additional library files.
+LIBSUFFIX: a suffix to mangle the library name with (defaults to nothing).
+This makefile does not set any architecture specific options like -xarch=v9, + you can set these by defining the appropriate macros, for example:
+dmake CXXFLAGS="-xarch=v9" LDFLAGS="-xarch=v9" LIBSUFFIX="_v9" -f sunpro.mak
+will build v9 variants of the regex library named libboost_regex_v9.a etc.
+There is a generic makefile (generic.mak) + provided in <boost-root>/libs/regex/build - see that makefile for details + of environment variables that need to be set before use. Alternatively you can + use the Jam based build system. If + you need to configure the library for your platform, then refer to the + config library documentation. +
+
+ + + diff --git a/doc/introduction.html b/doc/introduction.html new file mode 100644 index 00000000..cd00847a --- /dev/null +++ b/doc/introduction.html @@ -0,0 +1,176 @@ + + + ++
+Boost.Regex
+Introduction
Regular expressions are a form of pattern-matching that is often used in text + processing; many users will be familiar with the Unix utilities grep, sed + and awk, and the programming language Perl, each of which makes + extensive use of regular expressions. Traditionally C++ users have been limited + to the POSIX C APIs for manipulating regular expressions, and while regex++ + does provide these APIs, they do not represent the best way to use the + library. For example regex++ can cope with wide character strings, or search + and replace operations (in a manner analogous to either sed or Perl), something + that traditional C libraries can not do.
+The class boost::basic_regex is the key class in + this library; it represents a "machine readable" regular expression, and is + very closely modeled on std::basic_string, think of it as a string plus the + actual state-machine required by the regular expression algorithms. Like + std::basic_string there are two typedefs that are almost always the means by + which this class is referenced:
+namespace boost{
+
+template <class charT,
+          class traits = regex_traits<charT>,
+          class Allocator = std::allocator<charT> >
+class basic_regex;
+
+typedef basic_regex<char> regex;
+typedef basic_regex<wchar_t> wregex;
+
+}
+
To see how this library can be used, imagine that we are writing a credit card + processing application. Credit card numbers generally come as a string of + 16 digits, separated into groups of 4 digits, with the groups separated by either a space + or a hyphen. Before storing a credit card number in a database (not necessarily + something your customers will appreciate!), we may want to verify that the + number is in the correct format. To match any digit we could use the regular + expression [0-9], however ranges of characters like this are actually locale + dependent. Instead we should use the POSIX standard form [[:digit:]], or the + regex++ and Perl shorthand for this \d (note that many older libraries tended + to be hard-coded to the C-locale, consequently this was not an issue for them). + That leaves us with the following regular expression to validate credit card + number formats:
+(\d{4}[- ]){3}\d{4}
+Here the parentheses act to group (and mark for future reference) + sub-expressions, and the {4} means "repeat exactly 4 times". This is an example + of the extended regular expression syntax used by Perl, awk and egrep. Regex++ + also supports the older "basic" syntax used by sed and grep, but this is + generally less useful, unless you already have some basic regular expressions + that you need to reuse.
+Now let's take that expression and place it in some C++ code to validate the + format of a credit card number:
+bool validate_card_format(const std::string& s)
+{
+   // compiled once; escapes are doubled for the C++ compiler
+   static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
+   return regex_match(s, e);
+}
+
Note how we had to add some extra escapes to the expression: remember that the + escape is seen once by the C++ compiler, before it gets to be seen by the + regular expression engine, consequently escapes in regular expressions have to + be doubled up when embedding them in C/C++ code. Also note that all the + examples assume that your compiler supports Koenig lookup, if yours doesn't + (for example VC6), then you will have to add some boost:: prefixes to some of + the function calls in the examples.
+Those of you who are familiar with credit card processing will have realized + that while the format used above is suitable for human readable card numbers, + it does not represent the format required by online credit card systems; these + require the number as a string of 16 (or possibly 15) digits, without any + intervening spaces. What we need is a means to convert easily between the two + formats, and this is where search and replace comes in. Those who are familiar + with the utilities sed and Perl will already be ahead here; we + need two strings - one a regular expression - the other a "format + string" that provides a description of the text to replace the match + with. In regex++ this search and replace operation is performed with the + algorithm regex_replace; for our credit card example we can write two functions + like this to provide the format conversions:
+// match any format with the regular expression:
+const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
+const std::string machine_format("\\1\\2\\3\\4");
+const std::string human_format("\\1-\\2-\\3-\\4");
+
+std::string machine_readable_card_number(const std::string& s)
+{
+   return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
+}
+
+std::string human_readable_card_number(const std::string& s)
+{
+   return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
+}
+
Here we've used marked sub-expressions in the regular expression to split out + the four parts of the card number as separate fields, the format string then + uses the sed-like syntax to replace the matched text with the reformatted + version.
+In the examples above, we haven't directly manipulated the results of a regular + expression match, however in general the result of a match contains a number of + sub-expression matches in addition to the overall match. When the library needs + to report a regular expression match it does so using an instance of the class + match_results, as before there are typedefs of this class for the most + common cases: +
+namespace boost{
+typedef match_results<const char*> cmatch;
+typedef match_results<const wchar_t*> wcmatch;
+typedef match_results<std::string::const_iterator> smatch;
+typedef match_results<std::wstring::const_iterator> wsmatch;
+}
+
The algorithms regex_search and + regex_grep (i.e. finding all matches in a string) make use of + match_results to report what matched.
+Note that these algorithms are not restricted to searching regular C-strings, + any bidirectional iterator type can be searched, allowing for the possibility + of seamlessly searching almost any kind of data. +
+For search and replace operations in addition to the algorithm + regex_replace that we have already seen, the algorithm + regex_format takes the result of a match and a format string, and + produces a new string by merging the two.
+For those that dislike templates, there is a high level wrapper class RegEx + that is an encapsulation of the lower level template code - it provides a + simplified interface for those that don't need the full power of the library, + and supports only narrow characters, and the "extended" regular expression + syntax. +
+The POSIX API functions: regcomp, regexec, regfree + and regerror, are available in both narrow character and Unicode versions, and + are provided for those who need compatibility with these API's. +
+Finally, note that the library now has run-time localization + support, and recognizes the full POSIX regular expression syntax - including + advanced features like multi-character collating elements and equivalence + classes - as well as providing compatibility with other regular expression + libraries including GNU and BSD4 regex packages, and to a more limited extent + Perl 5. +
++
+ + + + + diff --git a/doc/localisation.html b/doc/localisation.html new file mode 100644 index 00000000..e4184fd8 --- /dev/null +++ b/doc/localisation.html @@ -0,0 +1,1032 @@ + + + + +
+ |
+
+Boost.Regex+ +Localisation+ |
+
+ |
+
Boost.regex provides extensive support for run-time +localization. The localization model used can be split into two +parts: front-end and back-end.
+ +Front-end localization deals with everything which the user sees +- error messages, and the regular expression syntax itself. For +example a French application could change [[:word:]] to [[:mot:]] +and \w to \m. Modifying the front end locale requires active +support from the developer, by providing the library with a message +catalogue to load, containing the localized strings. Front-end +locale is affected by the LC_MESSAGES category only.
+ +Back-end localization deals with everything that occurs after +the expression has been parsed - in other words everything that the +user does not see or interact with directly. It deals with case +conversion, collation, and character class membership. The back-end +locale does not require any intervention from the developer - the +library will acquire all the information it requires for the +current locale from the underlying operating system / run time +library. This means that if the program user does not interact with +regular expressions directly - for example if the expressions are +embedded in your C++ code - then no explicit localization is +required, as the library will take care of everything for you. For +example embedding the expression [[:word:]]+ in your code will +always match a whole word; if the program is run on a machine with, +for example, a Greek locale, then it will still match a whole word, +but in Greek characters rather than Latin ones. The back-end locale +is affected by the LC_CTYPE and LC_COLLATE categories.
+ +There are three separate localization mechanisms supported by +boost.regex:
+ +The Win32 localization model is the default model when the library is compiled under +Win32, and is encapsulated by the traits class w32_regex_traits. +When this model is in effect there is a single global locale as +defined by the user's control panel settings, and returned by +GetUserDefaultLCID. All the settings used by boost.regex are +acquired directly from the operating system bypassing the C run +time library. Front-end localization requires a resource dll, +containing a string table with the user-defined strings. The traits +class exports the function:
+ +static std::string set_message_catalogue(const std::string& +s);
+ +which needs to be called with a string identifying the name of +the resource dll, before your code compiles any regular +expressions (but not necessarily before you construct any +basic_regex instances):
+ ++boost::w32_regex_traits<char>::set_message_catalogue("mydll.dll");
+ +Note that this API sets the dll name for both the narrow +and wide character specializations of w32_regex_traits.
+ +This model does not currently support thread specific locales +(via SetThreadLocale under Windows NT). The library provides full +Unicode support under NT; under Windows 9x the library degrades +gracefully - characters 0 to 255 are supported, the remainder are +treated as "unknown" graphic characters.
+ +The C localization model is the default model when the library is compiled under an +operating system other than Win32, and is encapsulated by the +traits class c_regex_traits. Win32 users can force this +model to take effect by defining the pre-processor symbol +BOOST_REGEX_USE_C_LOCALE. When this model is in effect there is a +single global locale, as set by setlocale. All settings are +acquired from your run time library; consequently Unicode support +is dependent upon your run time library implementation. Front end +localization requires a POSIX message catalogue. The traits class +exports the function:
+ +static std::string set_message_catalogue(const std::string& +s);
+ +which needs to be called with a string identifying the name of +the message catalogue, before your code compiles any regular +expressions (but not necessarily before you construct any +basic_regex instances):
+ ++boost::c_regex_traits<char>::set_message_catalogue("mycatalogue");
+ +Note that this API sets the message catalogue name for both the narrow +and wide character specializations of c_regex_traits. If your run +time library does not support POSIX message catalogues, then you +can either provide your own implementation of <nl_types.h> or +define BOOST_RE_NO_CAT to disable front-end localization via +message catalogues.
+ +Note that calling setlocale invalidates all compiled +regular expressions. Calling setlocale(LC_ALL, "C") will +make this library behave equivalently to most traditional regular +expression libraries including version 1 of this library.
+ +The C++ localization model is only in effect if the library is built with the +pre-processor symbol BOOST_REGEX_USE_CPP_LOCALE defined. When this +model is in effect each instance of basic_regex<> has its own +instance of std::locale. Class basic_regex<> also has a +member function imbue which allows the locale for the +expression to be set on a per-instance basis. Front end +localization requires a POSIX message catalogue, which will be +loaded via the std::messages facet of the expression's locale. The +traits class exports the symbol:
+ +static std::string set_message_catalogue(const std::string& +s);
+ +which needs to be called with a string identifying the name of +the message catalogue, before your code compiles any regular +expressions (but not necessarily before you construct any +basic_regex instances):
+ ++boost::cpp_regex_traits<char>::set_message_catalogue("mycatalogue");
+ +Note that calling basic_regex<>::imbue will invalidate any +expression currently compiled in that instance of +basic_regex<>. This model is the one which most closely fits the +ethos of the C++ standard library; however it is the model which +will produce the slowest code, and which is the least well +supported by current standard library implementations. For example +I have yet to find an implementation of std::locale which supports +either message catalogues, or locales other than "C" or +"POSIX".
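The imbue-then-compile order described above can be sketched as follows. This is only an illustration: it uses std::basic_regex from the C++ standard library as a stand-in, since it also has a per-instance imbue member, and the helper names make_regex and is_single_word are hypothetical. A Boost build with BOOST_REGEX_USE_CPP_LOCALE defined is assumed to follow the same order.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: build a regex bound to a specific locale.
// The locale is imbued first, because imbue() discards any pattern
// that the instance had already compiled.
std::regex make_regex(const std::string& pattern, const std::locale& loc)
{
    std::regex re;       // default-constructed, nothing compiled yet
    re.imbue(loc);       // set the per-instance locale
    re.assign(pattern);  // compile the expression under that locale
    return re;
}

// Hypothetical helper: true if the whole of text is alphabetic
// according to the character classification of loc.
bool is_single_word(const std::string& text, const std::locale& loc)
{
    return std::regex_match(text, make_regex("[[:alpha:]]+", loc));
}
```

Calling imbue after the expression has been compiled would require the pattern to be assigned again, which is why the sketch sets the locale first.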
+ +Finally note that if you build the library with a non-default +localization model, then the appropriate pre-processor symbol +(BOOST_REGEX_USE_C_LOCALE or BOOST_REGEX_USE_CPP_LOCALE) must be +defined both when you build the support library, and when you +include <boost/regex.hpp> or <boost/cregex.hpp> in your +code. The best way to ensure this is to add the #define to +<boost/regex/user.hpp>.
+ +In order to localize the front end of the library, you need to
+provide the library with the appropriate message strings contained
+either in a resource dll's string table (Win32 model), or a POSIX
+message catalogue (C or C++ models). In the latter case the
+messages must appear in message set zero of the catalogue. The
+messages and their id's are as follows:
+
+ | Message id | +Meaning | +Default value | ++ |
+ | 101 | +The character used to start a +sub-expression. | +"(" | ++ |
+ | 102 | +The character used to end a +sub-expression declaration. | +")" | ++ |
+ | 103 | +The character used to denote an end of +line assertion. | +"$" | ++ |
+ | 104 | +The character used to denote the start +of line assertion. | +"^" | ++ |
+ | 105 | +The character used to denote the +"match any character expression". | +"." | ++ |
+ | 106 | +The match zero or more times +repetition operator. | +"*" | ++ |
+ | 107 | +The match one or more repetition +operator. | +"+" | ++ |
+ | 108 | +The match zero or one repetition +operator. | +"?" | ++ |
+ | 109 | +The character set opening +character. | +"[" | ++ |
+ | 110 | +The character set closing +character. | +"]" | ++ |
+ | 111 | +The alternation operator. | +"|" | ++ |
+ | 112 | +The escape character. | +"\\" | ++ |
+ | 113 | +The hash character (not currently +used). | +"#" | ++ |
+ | 114 | +The range operator. | +"-" | ++ |
+ | 115 | +The repetition operator opening +character. | +"{" | ++ |
+ | 116 | +The repetition operator closing +character. | +"}" | ++ |
+ | 117 | +The digit characters. | +"0123456789" | ++ |
+ | 118 | +The character which when preceded by +an escape character represents the word boundary assertion. | +"b" | ++ |
+ | 119 | +The character which when preceded by +an escape character represents the non-word boundary +assertion. | +"B" | ++ |
+ | 120 | +The character which when preceded by +an escape character represents the word-start boundary +assertion. | +"<" | ++ |
+ | 121 | +The character which when preceded by +an escape character represents the word-end boundary +assertion. | +">" | ++ |
+ | 122 | +The character which when preceded by +an escape character represents any word character. | +"w" | ++ |
+ | 123 | +The character which when preceded by +an escape character represents a non-word character. | +"W" | ++ |
+ | 124 | +The character which when preceded by +an escape character represents a start of buffer assertion. | +"`A" | ++ |
+ | 125 | +The character which when preceded by +an escape character represents an end of buffer assertion. | +"'z" | ++ |
+ | 126 | +The newline character. | +"\n" | ++ |
+ | 127 | +The comma separator. | +"," | ++ |
+ | 128 | +The character which when preceded by +an escape character represents the bell character. | +"a" | ++ |
+ | 129 | +The character which when preceded by +an escape character represents the form feed character. | +"f" | ++ |
+ | 130 | +The character which when preceded by +an escape character represents the newline character. | +"n" | ++ |
+ | 131 | +The character which when preceded by +an escape character represents the carriage return character. | +"r" | ++ |
+ | 132 | +The character which when preceded by +an escape character represents the tab character. | +"t" | ++ |
+ | 133 | +The character which when preceded by +an escape character represents the vertical tab character. | +"v" | ++ |
+ | 134 | +The character which when preceded by +an escape character represents the start of a hexadecimal character +constant. | +"x" | ++ |
+ | 135 | +The character which when preceded by +an escape character represents the start of an ASCII escape +character. | +"c" | ++ |
+ | 136 | +The colon character. | +":" | ++ |
+ | 137 | +The equals character. | +"=" | ++ |
+ | 138 | +The character which when preceded by +an escape character represents the ASCII escape character. | +"e" | ++ |
+ | 139 | +The character which when preceded by +an escape character represents any lower case character. | +"l" | ++ |
+ | 140 | +The character which when preceded by +an escape character represents any non-lower case character. | +"L" | ++ |
+ | 141 | +The character which when preceded by +an escape character represents any upper case character. | +"u" | ++ |
+ | 142 | +The character which when preceded by +an escape character represents any non-upper case character. | +"U" | ++ |
+ | 143 | +The character which when preceded by +an escape character represents any space character. | +"s" | ++ |
+ | 144 | +The character which when preceded by +an escape character represents any non-space character. | +"S" | ++ |
+ | 145 | +The character which when preceded by +an escape character represents any digit character. | +"d" | ++ |
+ | 146 | +The character which when preceded by +an escape character represents any non-digit character. | +"D" | ++ |
+ | 147 | +The character which when preceded by +an escape character represents the end quote operator. | +"E" | ++ |
+ | 148 | +The character which when preceded by +an escape character represents the start quote operator. | +"Q" | ++ |
+ | 149 | +The character which when preceded by +an escape character represents a Unicode combining character +sequence. | +"X" | ++ |
+ | 150 | +The character which when preceded by +an escape character represents any single character. | +"C" | ++ |
+ | 151 | +The character which when preceded by +an escape character represents end of buffer operator. | +"Z" | ++ |
+ | 152 | +The character which when preceded by +an escape character represents the continuation assertion. | +"G" | ++ |
+ | 153 | +The character which when preceded by (? indicates a zero width +negated forward lookahead assertion. | +"!" | ++ |
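To make the effect of overriding syntax messages concrete, here is a minimal sketch that models a catalogue as an in-memory map from message id to string. It does not use the library's real catalogue loading; the type MessageCatalogue and the functions french_catalogue and lookup_message are hypothetical. Changing "W" to "M" is an assumption made by analogy with the \w to \m example given earlier.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory stand-in for a message catalogue: id -> string.
using MessageCatalogue = std::map<int, std::string>;

// A sketch of a French catalogue that overrides two syntax messages.
MessageCatalogue french_catalogue()
{
    return {
        {122, "m"},  // id 122: word-character escape, default "w"
        {123, "M"},  // id 123: non-word-character escape, default "W" (assumed)
    };
}

// Return the catalogue's string for id, or the built-in default
// when the catalogue does not define that id.
std::string lookup_message(const MessageCatalogue& cat, int id,
                           const std::string& default_value)
{
    const auto it = cat.find(id);
    return it != cat.end() ? it->second : default_value;
}
```

Ids missing from the catalogue keep their default value, so a catalogue only needs to list the messages it changes.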
Custom error messages are loaded as follows:
+ + + ++ | Message ID | +Error message ID | +Default string | ++ |
+ | 201 | +REG_NOMATCH | +"No match" | ++ |
+ | 202 | +REG_BADPAT | +"Invalid regular expression" | ++ |
+ | 203 | +REG_ECOLLATE | +"Invalid collation character" | ++ |
+ | 204 | +REG_ECTYPE | +"Invalid character class name" | ++ |
+ | 205 | +REG_EESCAPE | +"Trailing backslash" | ++ |
+ | 206 | +REG_ESUBREG | +"Invalid back reference" | ++ |
+ | 207 | +REG_EBRACK | +"Unmatched [ or [^" | ++ |
+ | 208 | +REG_EPAREN | +"Unmatched ( or \\(" | ++ |
+ | 209 | +REG_EBRACE | +"Unmatched \\{" | ++ |
+ | 210 | +REG_BADBR | +"Invalid content of \\{\\}" | ++ |
+ | 211 | +REG_ERANGE | +"Invalid range end" | ++ |
+ | 212 | +REG_ESPACE | +"Memory exhausted" | ++ |
+ | 213 | +REG_BADRPT | +"Invalid preceding regular +expression" | ++ |
+ | 214 | +REG_EEND | +"Premature end of regular +expression" | ++ |
+ | 215 | +REG_ESIZE | +"Regular expression too big" | ++ |
+ | 216 | +REG_ERPAREN | +"Unmatched ) or \\)" | ++ |
+ | 217 | +REG_EMPTY | +"Empty expression" | ++ |
+ | 218 | +REG_E_UNKNOWN | +"Unknown error" | ++ |
Custom character class names are loaded as follows:
+ + + ++ | Message ID | +Description | +Equivalent default class name | ++ |
+ | 300 | +The character class name for +alphanumeric characters. | +"alnum" | ++ |
+ | 301 | +The character class name for +alphabetic characters. | +"alpha" | ++ |
+ | 302 | +The character class name for control +characters. | +"cntrl" | ++ |
+ | 303 | +The character class name for digit +characters. | +"digit" | ++ |
+ | 304 | +The character class name for graphics +characters. | +"graph" | ++ |
+ | 305 | +The character class name for lower +case characters. | +"lower" | ++ |
+ | 306 | +The character class name for printable +characters. | +"print" | ++ |
+ | 307 | +The character class name for +punctuation characters. | +"punct" | ++ |
+ | 308 | +The character class name for space +characters. | +"space" | ++ |
+ | 309 | +The character class name for upper +case characters. | +"upper" | ++ |
+ | 310 | +The character class name for +hexadecimal characters. | +"xdigit" | ++ |
+ | 311 | +The character class name for blank +characters. | +"blank" | ++ |
+ | 312 | +The character class name for word +characters. | +"word" | ++ |
+ | 313 | +The character class name for Unicode +characters. | +"unicode" | ++ |
Finally, custom collating element names are loaded starting from +message id 400, and terminating when the first load thereafter +fails. Each message looks something like: "tagname string" where +tagname is the name used inside [[.tagname.]] and +string is the actual text of the collating element. Note that +the value of collating element [[.zero.]] is used for the +conversion of strings to numbers - if you replace this with another +value then that will be used for string parsing - for example use +the Unicode character 0x0660 for [[.zero.]] if you want to use +Unicode Arabic-Indic digits in your regular expressions in place of +Latin digits.
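The "tagname string" layout of a collating element message can be illustrated with a small parsing sketch. The function split_collating_message is hypothetical, and splitting at the first space is an assumption: the text above does not say how the library itself separates the two parts.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split "tagname string" at the first space.
// Returns {tagname, element text}; the element text is empty when
// the message contains no space at all.
std::pair<std::string, std::string>
split_collating_message(const std::string& message)
{
    const std::string::size_type space = message.find(' ');
    if (space == std::string::npos)
        return {message, std::string()};
    return {message.substr(0, space), message.substr(space + 1)};
}
```

For the message "zero 0" this yields the tag name used inside [[.zero.]] and the text "0" that the element stands for.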
+ +Note that the POSIX defined names for character classes and +collating elements are always available - even if custom names are +defined. In contrast, custom error messages and custom syntax +messages replace the default ones.
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/match_flag_type.html b/doc/match_flag_type.html new file mode 100644 index 00000000..0e89736a --- /dev/null +++ b/doc/match_flag_type.html @@ -0,0 +1,330 @@ + + + + +
+ |
+
+Boost.Regex+ +match_flag_type+ |
+
+ |
+
The type match_flag_type
is an implementation
+defined bitmask type (17.3.2.1.2) that controls how a regular
+expression is matched against a character sequence.
+namespace std{ namespace regex_constants{ + +typedef bitmask_type match_flag_type; + +static const match_flag_type match_default = 0; +static const match_flag_type match_not_bob; +static const match_flag_type match_not_eob; +static const match_flag_type match_not_bol; +static const match_flag_type match_not_eol; +static const match_flag_type match_not_bow; +static const match_flag_type match_not_eow; +static const match_flag_type match_any; +static const match_flag_type match_not_null; +static const match_flag_type match_continuous; +static const match_flag_type match_partial; +static const match_flag_type match_prev_avail; +static const match_flag_type match_not_dot_newline; +static const match_flag_type match_not_dot_null; + +static const match_flag_type format_default = 0; +static const match_flag_type format_sed; +static const match_flag_type format_perl; +static const match_flag_type format_no_copy; +static const match_flag_type format_first_only; +static const match_flag_type format_all; + +} // namespace regex_constants +} // namespace std ++ +
The type match_flag_type
is an implementation
+defined bitmask type (17.3.2.1.2). When matching a regular
+expression against a sequence of characters [first, last) then
+setting its elements has the effects listed in the table below:
+ Element + |
+
+ Effect if set + |
+
+ match_default + |
+
+ Specifies that matching of regular expressions proceeds without +any modification of the normal rules used in ECMA-262, ECMAScript +Language Specification, Chapter 15 part 10, RegExp (Regular +Expression) Objects (FWD.1) + |
+
match_not_bob | +Specifies that the expression "\A" +should not match against the sub-sequence [first,first). | +
match_not_eob | +Specifies that the expressions "\z" +and "\Z" should not match against the sub-sequence +[last,last). | +
+ match_not_bol + |
+
+ Specifies that the expression "^" should not be matched against +the sub-sequence [first,first). + |
+
+ match_not_eol + |
+
+ Specifies that the expression "$" should not be matched against +the sub-sequence [last,last). + |
+
+ match_not_bow + |
+
+ Specifies that the expression "\b" should not be matched against +the sub-sequence [first,first). + |
+
+ match_not_eow + |
+
+ Specifies that the expression "\b" should not be matched against +the sub-sequence [last,last). + |
+
+ match_any + |
+
+ Specifies that if more than one match is possible then any match +is an acceptable result. + |
+
+ match_not_null + |
+
+ Specifies that the expression can not be matched against an +empty sequence. + |
+
+ match_continuous + |
+
+ Specifies that the expression must match a sub-sequence that +begins at first. + |
+
+ match_partial + |
+
+ Specifies that if no match can be found, then it is acceptable +to return a match [from, last) where from!=last, if there exists +some sequence of characters [from,to) of which [from,last) is a +prefix, and which would result in a full match. + |
+
+ match_prev_avail + |
+
+ Specifies that |
+
match_not_dot_newline | +Specifies that the expression "." does +not match a newline character. | +
match_not_dot_null | +Specifies that the expression "." does +not match a null character '\0'. | +
+ format_default + |
+
+ Specifies that when a regular expression match is to be replaced +by a new string, that the new string is constructed using the rules +used by the ECMAScript replace function in ECMA-262, ECMAScript +Language Specification, Chapter 15 part 5.4.11 +String.prototype.replace. (FWD.1). In addition during search and +replace operations then all non-overlapping occurrences of the +regular expression are located and replaced, and sections of the +input that did not match the expression, are copied unchanged to +the output string. + |
+
+ format_sed + |
+
+ Specifies that when a regular expression match is to be replaced +by a new string, that the new string is constructed using the rules +used by the Unix sed utility in IEEE Std 1003.1-2001, Portable +Operating System Interface (POSIX), Shells and Utilities. + |
+
+ format_perl + |
+
+ Specifies that when a regular expression match is to be replaced +by a new string, that the new string is constructed using an +implementation defined superset of the rules used by the ECMAScript +replace function in ECMA-262, ECMAScript Language Specification, +Chapter 15 part 5.4.11 String.prototype.replace (FWD.1). + |
+
format_all | +Specifies that all syntax +extensions are enabled, including conditional +(?ddexpression1:expression2) replacements. | +
+ format_no_copy + |
+
+ When specified during a search and replace operation, then +sections of the character container sequence being searched that do +not match the regular expression, are not copied to the output +string. + |
+
+ format_first_only + |
+
+ When specified during a search and replace operation, then only +the first occurrence of the regular expression is replaced. + |
+
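A few of the flags in the table can be exercised with a short sketch. The synopsis above places the constants in std::regex_constants, so the sketch uses the standard library's std::regex; in a Boost build the same names are assumed to be reachable through the boost namespace. The helper functions are hypothetical and exist only for illustration.

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// "^" normally matches at the start of the range;
// match_not_bol suppresses that.
bool caret_matches_at_start(rc::match_flag_type flags)
{
    const std::string text = "abc";
    return std::regex_search(text, std::regex("^a"), flags);
}

// match_continuous requires the match to begin at the first character.
bool finds_ab(const std::string& text, rc::match_flag_type flags)
{
    return std::regex_search(text, std::regex("ab"), flags);
}

// Replace each digit with '#', letting the caller choose the format flags.
std::string mask_digits(const std::string& text, rc::match_flag_type flags)
{
    return std::regex_replace(text, std::regex("[0-9]"), "#", flags);
}
```

Because match_flag_type is a bitmask type, flags are combined with operator|, for example format_no_copy | format_first_only to emit only the first replacement.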
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/match_results.html b/doc/match_results.html new file mode 100644 index 00000000..9acc3afc --- /dev/null +++ b/doc/match_results.html @@ -0,0 +1,511 @@ + + + + +
+ |
+
+Boost.Regex+ +class match_results+ |
+
+ |
+
#include <boost/regex.hpp>
+ +Regular expressions are different from many simple +pattern-matching algorithms in that as well as finding an overall +match they can also produce sub-expression matches: each +sub-expression being delimited in the pattern by a pair of +parentheses (...). There has to be some method for reporting +sub-expression matches back to the user: this is achieved by +defining a class match_results that acts as an indexed +collection of sub-expression matches, each sub-expression match +being contained in an object of type +sub_match .
+ +Template class match_results denotes a collection of character +sequences representing the result of a regular expression match. +Objects of type match_results are passed to the algorithms regex_match and +regex_search, and are returned by the iterator regex_iterator . Storage for the +collection is allocated and freed as necessary by the member +functions of class match_results.
+ +The template class match_results conforms to the requirements of +a Sequence, as specified in (lib.sequence.reqmts), except that only +operations defined for const-qualified Sequences are supported.
+ +Class template match_results is most commonly used as one of the +typedefs cmatch, wcmatch, smatch, or wsmatch:
+ ++template <class BidirectionalIterator, + class Allocator = allocator<sub_match<BidirectionalIterator> > +class match_results; + +typedef match_results<const char*> cmatch; +typedef match_results<const wchar_t*> wcmatch; +typedef match_results<string::const_iterator> smatch; +typedef match_results<wstring::const_iterator> wsmatch; + +template <class BidirectionalIterator, + class Allocator = allocator<sub_match<BidirectionalIterator> > +class match_results +{ +public: + typedef sub_match<BidirectionalIterator> value_type; + typedef const value_type& const_reference; + typedef const_reference reference; + typedef implementation defined const_iterator; + typedef const_iterator iterator; + typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; + typedef typename Allocator::size_type size_type; + typedef Allocator allocator_type; + typedef typename iterator_traits<BidirectionalIterator>::value_type char_type; + typedef basic_string<char_type> string_type; + + // construct/copy/destroy: + explicit match_results(const Allocator& a = Allocator()); + match_results(const match_results& m); + match_results& operator=(const match_results& m); + ~match_results(); + + // size: + size_type size() const; + size_type max_size() const; + bool empty() const; + // element access: + difference_type length(int sub = 0) const; + difference_type position(unsigned int sub = 0) const; + string_type str(int sub = 0) const; + const_reference operator[](int n) const; + + const_reference prefix() const; + + const_reference suffix() const; + const_iterator begin() const; + const_iterator end() const; + // format: + template <class OutputIterator> + OutputIterator format(OutputIterator out, + const string_type& fmt, + match_flag_type flags = format_default) const; + string_type format(const string_type& fmt, + match_flag_type flags = format_default) const; + + allocator_type get_allocator() const; + void swap(match_results& that); +}; + +template <class 
BidirectionalIterator, class Allocator> +bool operator == (const match_results<BidirectionalIterator, Allocator>& m1, + const match_results<BidirectionalIterator, Allocator>& m2); +template <class BidirectionalIterator, class Allocator> +bool operator != (const match_results<BidirectionalIterator, Allocator>& m1, + const match_results<BidirectionalIterator, Allocator>& m2); + +template <class charT, class traits, class BidirectionalIterator, class Allocator> +basic_ostream<charT, traits>& + operator << (basic_ostream<charT, traits>& os, + const match_results<BidirectionalIterator, Allocator>& m); + +template <class BidirectionalIterator, class Allocator> +void swap(match_results<BidirectionalIterator, Allocator>& m1, + match_results<BidirectionalIterator, Allocator>& m2); ++ +
In all match_results
constructors, a copy of the
+Allocator argument is used for any memory allocation performed by
+the constructor or member functions during the lifetime of the
+object.
+match_results(const Allocator& a = Allocator()); ++ + +
Effects: Constructs an object of class match_results. The +postconditions of this function are indicated in the table:
+ + + +
+ Element + |
+
+ Value + |
+
+ empty() + |
+
+ true + |
+
+ size() + |
+
+ 0 + |
+
+ str() + |
+
+ basic_string<charT>() + |
+
+ +
+match_results(const match_results& m); ++ + +
Effects: Constructs an object of class match_results, as +a copy of m.
+ ++match_results& operator=(const match_results& m); ++ + +
Effects: Assigns m to *this. The postconditions of this +function are indicated in the table:
+ + + +
+ Element + |
+
+ Value + |
+
+ empty() + |
+
+ m.empty(). + |
+
+ size() + |
+
+ m.size(). + |
+
+ str(n) + |
+
+ m.str(n) for all integers n < m.size(). + |
+
+ prefix() + |
+
+ m.prefix(). + |
+
+ suffix() + |
+
+ m.suffix(). + |
+
+ (*this)[n] + |
+
+ m[n] for all integers n < m.size(). + |
+
+ length(n) + |
+
+ m.length(n) for all integers n < m.size(). + |
+
+ position(n) + |
+
+ m.position(n) for all integers n < m.size(). + |
+
+size_type size()const; ++ + +
Effects: Returns the number of sub_match elements stored +in *this.
+ ++size_type max_size()const; ++ + +
Effects: Returns the maximum number of sub_match elements +that can be stored in *this.
+ ++bool empty()const; ++ + +
Effects: Returns size() == 0
.
+difference_type length(int sub = 0)const; ++ + +
Effects: Returns (*this)[sub].length()
.
+difference_type position(unsigned int sub = 0)const; ++ + +
Effects: Returns std::distance(prefix().first,
+(*this)[sub].first).
+string_type str(int sub = 0)const; ++ + +
Effects: Returns
+string_type((*this)[sub]).
+const_reference operator[](int n) const; ++ + +
Effects: Returns a reference to the
+sub_match
object representing the character sequence that
+matched marked sub-expression n. If n == 0
then
+returns a reference to a sub_match
object representing
+the character sequence that matched the whole regular
+expression.
+const_reference prefix()const; ++ + +
Effects: Returns a reference to the
+sub_match
object representing the character sequence from
+the start of the string being matched/searched, to the start of the
+match found.
+const_reference suffix()const; ++ + +
Effects: Returns a reference to the
+sub_match
object representing the character sequence from
+the end of the match found to the end of the string being
+matched/searched.
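The element access members described above can be seen together in a short sketch. It uses std::smatch from the standard library as a stand-in, which offers the same members; the struct RangeReport and the function find_range are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record of what a single search reported.
struct RangeReport {
    std::size_t count;      // m.size(): whole match plus marked sub-expressions
    std::string low;        // m.str(1)
    std::string high;       // m.str(2)
    std::ptrdiff_t offset;  // m.position(0)
    std::ptrdiff_t width;   // m.length(0)
    std::string before;     // m.prefix()
    std::string after;      // m.suffix()
};

// Search text for "digits-digits" and copy out what match_results reports.
RangeReport find_range(const std::string& text)
{
    static const std::regex re("([0-9]+)-([0-9]+)");
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return RangeReport{0, "", "", 0, 0, "", ""};
    return RangeReport{m.size(), m.str(1), m.str(2),
                       m.position(0), m.length(0),
                       m.prefix().str(), m.suffix().str()};
}
```

The strings are copied out before the function returns because the sub_match objects hold iterators into text and are only valid while that string is alive.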
+const_iterator begin()const; ++ + +
Effects: Returns a starting iterator that enumerates over +all the marked sub-expression matches stored in *this.
+ ++const_iterator end()const; ++ + +
Effects: Returns a terminating iterator that enumerates +over all the marked sub-expression matches stored in *this.
+ ++template <class OutputIterator> +OutputIterator format(OutputIterator out, + const string_type& fmt, + match_flag_type flags = format_default) const; ++ + +
Requires: The type OutputIterator conforms to the Output +Iterator requirements (24.1.2).
+ + +Effects: Copies the character sequence [fmt.begin(), +fmt.end()) to OutputIterator out. For each format +specifier or escape sequence in fmt, replace that sequence +with either the character(s) it represents, or the sequence of +characters within *this to which it refers. The bitmasks specified +in flags determine what +format specifiers or escape sequences +are recognized; by default this is the format used by ECMA-262, +ECMAScript Language Specification, Chapter 15 part 5.4.11 +String.prototype.replace.
+ + +Returns: out.
+ ++string_type format(const string_type& fmt, + match_flag_type flags = format_default) const; ++ + +
Effects: Returns a copy of the string fmt. For +each format specifier or escape sequence in fmt, replace +that sequence with either the character(s) it represents, or the +sequence of characters within *this to which it refers. The +bitmasks specified in +flags determine what format +specifiers or escape sequences are recognized; by default this +is the format used by ECMA-262, ECMAScript Language Specification, +Chapter 15 part 5.4.11 String.prototype.replace.
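As an illustration of the string-returning overload, the sketch below reorders two marked sub-expressions using the default ECMAScript-style format rules. It relies on std::smatch::format from the standard library as a stand-in for the member described here, and swap_parts is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite "user@host" as "host:user"
// using match_results::format with the default format rules.
std::string swap_parts(const std::string& address)
{
    static const std::regex re("([a-z]+)@([a-z]+)");
    std::smatch m;
    if (!std::regex_match(address, m, re))
        return std::string();  // no match: nothing to format
    return m.format("$2:$1");  // $n refers to marked sub-expression n
}
```

Passing format_sed as the second argument would instead select sed-style specifiers, as described in the match_flag_type table.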
+ ++allocator_type get_allocator()const; ++ + +
Effects: Returns a copy of the Allocator that was passed +to the object's constructor.
+ ++void swap(match_results& that); ++ + +
Effects: Swaps the contents of the two sequences.
+ + +Postcondition: *this
contains the sequence
+of matched sub-expressions that were in that
,
+that
contains the sequence of matched sub-expressions that
+were in *this
.
Complexity: constant time.
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/partial_matches.html b/doc/partial_matches.html new file mode 100644 index 00000000..3f4d2a53 --- /dev/null +++ b/doc/partial_matches.html @@ -0,0 +1,185 @@ + + + ++
+ |
+
+ Boost.Regex+Partial Matches+ |
+
+ |
+
The match-flag match_partial
can
+ be passed to the following algorithms: regex_match,
+ regex_search, and regex_grep.
+ When used it indicates that partial as well as full matches should be found. A
+ partial match is one that matched one or more characters at the end of the text
+ input, but did not match all of the regular expression (although it may have
+ done so had more input been available). Partial matches are typically used when
+ either validating data input (checking each character as it is entered on the
+ keyboard), or when searching texts that are either too long to load into memory
+ (or even into a memory mapped file), or are of indeterminate length (for
+ example the source may be a socket or similar). Partial and full matches can be
+ differentiated as shown in the following table (the variable M represents an
+ instance of match_results<> as filled in
+ by regex_match, regex_search or regex_grep):
+
+
+ | Result | +M[0].matched | +M[0].first | +M[0].second | +
No match | +False | +Undefined | +Undefined | +Undefined | +
Partial match | +True | +False | +Start of partial match. | +End of partial match (end of text). | +
Full match | +True | +True | +Start of full match. | +End of full match. | +
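The table above can be read as a small decision function. The following is only a sketch: the names match_kind and classify are hypothetical and not part of Boost.Regex. A caller would pass in the value returned by regex_match or regex_search (called with match_partial) together with M[0].matched.

```cpp
// Hypothetical helper, not part of Boost.Regex: names the three rows of the
// table above.
enum match_kind { no_match, partial_match, full_match };

// `result` is the value returned by the matching algorithm; `whole_matched`
// is M[0].matched, which is only meaningful when `result` is true.
match_kind classify(bool result, bool whole_matched)
{
    if (!result)
        return no_match;            // M[0] is undefined in this case
    return whole_matched ? full_match : partial_match;
}
```

For a partial match, M[0].first marks where the unfinished match begins, which is the position to carry forward when more input arrives.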
The following example
+ tests whether the text could be a valid credit card number. As the user
+ presses a key, the character entered is added to the string being built
+ up, and the string is passed to is_possible_card_number. If this returns true
+ then the text could be a valid card number, so the user interface's OK button
+ would be enabled. If it returns false, then this is not yet a valid card
+ number, but could be with more input, so the user interface would disable the
+ OK button. Finally, if the procedure throws an exception the input could never
+ become a valid number, so the entered character must be discarded and a
+ suitable error indication displayed to the user.
#include <string>
+#include <iostream>
+#include <stdexcept>
+#include <boost/regex.hpp>
+
+boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
+
+bool is_possible_card_number(const std::string& input)
+{
+   //
+   // return false for partial match, true for full match, or throw for
+   // impossible match based on what we have so far...
+   boost::match_results<std::string::const_iterator> what;
+   if(0 == boost::regex_match(input, what, e, boost::match_default | boost::match_partial))
+   {
+      // the input so far could not possibly be valid so reject it:
+      throw std::runtime_error("Invalid data entered - this could not possibly be a valid card number");
+   }
+   // OK so far so good, but have we finished?
+   if(what[0].matched)
+   {
+      // excellent, we have a result:
+      return true;
+   }
+   // what we have so far is only a partial match...
+   return false;
+}
+
In the following example, + text input is taken from a stream containing an unknown amount of text; this + example simply counts the number of HTML tags encountered in the stream. The + text is loaded into a buffer and searched a part at a time; if a partial match + is encountered at the end of a part, it is searched a second time as the + start of the next batch of text:
+#include <iostream>
+#include <fstream>
+#include <sstream>
+#include <string>
+#include <cstring>
+#include <boost/regex.hpp>
+
+// match some kind of html tag:
+boost::regex e("<[^>]*>");
+// count how many:
+unsigned int tags = 0;
+// saved position of partial match:
+char* next_pos = 0;
+
+bool grep_callback(const boost::match_results<char*>& m)
+{
+   if(m[0].matched == false)
+   {
+      // save position and return:
+      next_pos = m[0].first;
+   }
+   else
+      ++tags;
+   return true;
+}
+
+void search(std::istream& is)
+{
+   char buf[4096];
+   next_pos = buf + sizeof(buf);
+   bool have_more = true;
+   while(have_more)
+   {
+      // how much do we copy forward from last try:
+      unsigned leftover = (buf + sizeof(buf)) - next_pos;
+      // and how much is left to fill:
+      unsigned size = next_pos - buf;
+      // copy forward whatever we have left (the regions may overlap, hence memmove):
+      std::memmove(buf, next_pos, leftover);
+      // fill the rest from the stream (read blocks until size characters or end of input):
+      is.read(buf + leftover, size);
+      unsigned read = static_cast<unsigned>(is.gcount());
+      // check to see if we've run out of text:
+      have_more = read == size;
+      // reset next_pos:
+      next_pos = buf + sizeof(buf);
+      // and then grep:
+      boost::regex_grep(grep_callback,
+                        buf,
+                        buf + read + leftover,
+                        e,
+                        boost::match_default | boost::match_partial);
+   }
+}
+
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + diff --git a/doc/performance.html b/doc/performance.html new file mode 100644 index 00000000..826dd83a --- /dev/null +++ b/doc/performance.html @@ -0,0 +1,54 @@ + + + ++
+ |
+
+ Boost.Regex+Performance+ |
+
+ |
+
The performance of Boost.regex in both recursive and non-recursive modes should + be broadly comparable to other regular expression libraries: recursive mode is + slightly faster (especially where memory allocation requires thread + synchronisation), but not by much. The following pages compare + Boost.regex with various other regular expression libraries for the following + compilers:
+Visual Studio.Net 2003 (recursive Boost.regex + implementation).
+Gcc 3.2 (cygwin) (non-recursive Boost.regex + implementation).
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + diff --git a/doc/posix_api.html b/doc/posix_api.html new file mode 100644 index 00000000..fdc3bba3 --- /dev/null +++ b/doc/posix_api.html @@ -0,0 +1,288 @@ + + + ++
+ |
+
+ Boost.Regex+POSIX API Compatibility Functions+ |
+
+ |
+
#include <boost/cregex.hpp> +or: +#include <boost/regex.h>+
The following functions are available for users who need a POSIX compatible C + library. They are available in both Unicode and narrow character versions; the + standard POSIX API names are macros that expand to one version or the other + depending upon whether UNICODE is defined or not. +
+Important: Note that all the symbols defined here are enclosed inside + namespace boost when used in C++ programs, unless you use #include + <boost/regex.h> instead - in which case the symbols are still defined in + namespace boost, but are made available in the global namespace as well.
+The functions are defined as: +
+extern "C" { +int regcompA(regex_tA*, const char*, int); +unsigned int regerrorA(int, const regex_tA*, char*, unsigned int); +int regexecA(const regex_tA*, const char*, unsigned int, regmatch_t*, int); +void regfreeA(regex_tA*); + +int regcompW(regex_tW*, const wchar_t*, int); +unsigned int regerrorW(int, const regex_tW*, wchar_t*, unsigned int); +int regexecW(const regex_tW*, const wchar_t*, unsigned int, regmatch_t*, int); +void regfreeW(regex_tW*); + +#ifdef UNICODE +#define regcomp regcompW +#define regerror regerrorW +#define regexec regexecW +#define regfree regfreeW +#define regex_t regex_tW +#else +#define regcomp regcompA +#define regerror regerrorA +#define regexec regexecA +#define regfree regfreeA +#define regex_t regex_tA +#endif +}+
All the functions operate on structure regex_t, which exposes two public + members: +
+unsigned int re_nsub this is filled in by regcomp and indicates + the number of sub-expressions contained in the regular expression. +
+const TCHAR* re_endp points to the end of the expression to compile when + the flag REG_PEND is set. +
+Footnote: regex_t is actually a #define - it is either regex_tA or regex_tW + depending upon whether UNICODE is defined or not; TCHAR is likewise either char or + wchar_t, again depending upon the macro UNICODE. +
+regcomp takes a pointer to a regex_t, a pointer to the expression
+ to compile and a flags parameter which can be a combination of:
+
+
+
+
+ | REG_EXTENDED | +Compiles modern regular expressions. Equivalent to + regbase::char_classes | regbase::intervals | regbase::bk_refs. | ++ |
+ | REG_BASIC | +Compiles basic (obsolete) regular expression syntax. + Equivalent to regbase::char_classes | regbase::intervals | regbase::limited_ops + | regbase::bk_braces | regbase::bk_parens | regbase::bk_refs. | ++ |
+ | REG_NOSPEC | +All characters are ordinary, the expression is a + literal string. | ++ |
+ | REG_ICASE | +Compiles for matching that ignores character case. | ++ |
+ | REG_NOSUB | +Has no effect in this library. | ++ |
+ | REG_NEWLINE | +When this flag is set a dot does not match the + newline character. | ++ |
+ | REG_PEND | +When this flag is set the re_endp parameter of the + regex_t structure must point to the end of the regular expression to compile. | ++ |
+ | REG_NOCOLLATE | +When this flag is set then locale dependent collation + for character ranges is turned off. | ++ |
+ | REG_ESCAPE_IN_LISTS + , , , + |
+ When this flag is set, then escape sequences are + permitted in bracket expressions (character sets). | ++ |
+ | REG_NEWLINE_ALT | +When this flag is set then the newline character is + equivalent to the alternation operator |. | ++ |
+ | REG_PERL | +Compiles Perl like regular expressions. | ++ |
+ | REG_AWK | +A shortcut for awk-like behavior: REG_EXTENDED | + REG_ESCAPE_IN_LISTS | ++ |
+ | REG_GREP | +A shortcut for grep like behavior: REG_BASIC | + REG_NEWLINE_ALT | ++ |
+ | REG_EGREP | +A shortcut for egrep like behavior: + REG_EXTENDED | REG_NEWLINE_ALT | ++ |
regerror takes the following parameters, it maps an error code to a human
+ readable string:
+
+
+
+ | int code | +The error code. | ++ |
+ | const regex_t* e | +The regular expression (can be null). | ++ |
+ | char* buf | +The buffer to fill in with the error message. | ++ |
+ | unsigned int buf_size | +The length of buf. | ++ |
If the error code is OR'ed with REG_ITOA then the message that results is the + printable name of the code rather than a message, for example "REG_BADPAT". If + the code is REG_ATOI then e must not be null and e->re_endp must + point to the printable name of an error code; the return value is then the + value of the error code. For any other value of code, the return value + is the number of characters in the error message; if the return value is + greater than or equal to buf_size then regerror will have to be + called again with a larger buffer.
+regexec finds the first occurrence of expression e within string buf.
+ If len is non-zero then *m is filled in with what matched the
+ regular expression: m[0] contains what matched the whole expression, m[1]
+ the first sub-expression, and so on; see regmatch_t in the header file
+ declaration for more details. The eflags parameter can be a combination
+ of:
+
+
+
+
+ | REG_NOTBOL | +Parameter buf does not represent the start of + a line. | ++ |
+ | REG_NOTEOL | +Parameter buf does not terminate at the end of + a line. | ++ |
+ | REG_STARTEND | +The string searched starts at buf + pmatch[0].rm_so + and ends at buf + pmatch[0].rm_eo. | ++ |
Finally regfree frees all the memory that was allocated by regcomp. +
+Footnote: this is an abridged reference to the POSIX API functions, it is + provided for compatibility with other libraries, rather than an API to be used + in new code (unless you need access from a language other than C++). This + version of these functions should also happily coexist with other versions, as + the names used are macros that expand to the actual function names. +
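The compile / execute / free sequence described above can be sketched as follows. This sketch makes two assumptions: it includes the platform's own POSIX <regex.h> rather than a Boost header, so that it builds without Boost installed, and the helper name first_match is hypothetical. The calls take the same arguments as the declarations listed on this page.

```cpp
#include <regex.h>   // the system POSIX header (assumption: a POSIX platform)
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern` in
// `text`, an empty string if there is no match, or "error: ..." if the
// pattern fails to compile.
std::string first_match(const char* pattern, const char* text)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
    {
        // map the error code to a human readable message
        char msg[128];
        regerror(rc, &re, msg, sizeof(msg));
        return std::string("error: ") + msg;
    }
    regmatch_t m[1];                      // m[0] receives the whole match
    std::string result;
    if (regexec(&re, text, 1, m, 0) == 0)
        result.assign(text + m[0].rm_so, text + m[0].rm_eo);
    regfree(&re);                         // release what regcomp allocated
    return result;
}
```

The offsets rm_so and rm_eo are relative to the start of the searched string, which is why the result is built from text plus each offset.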
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + diff --git a/doc/redistributables.html b/doc/redistributables.html new file mode 100644 index 00000000..884fca7a --- /dev/null +++ b/doc/redistributables.html @@ -0,0 +1,84 @@ + + + ++
+ |
+
+ Boost.Regex+Redistributables and Library Names+ |
+
+ |
+
If you are using Microsoft or Borland C++ and link to a dll version of the run
+ time library, then you will also link to one of the dll versions of boost.regex.
+ While these dlls are redistributable, there are no "standard" versions, so
+ when installing on the user's PC, you should place these in a directory private
+ to your application, and not in the PC's directory path. Note that if you link
+ to a static version of your run time library, then you will also link to a
+ static version of boost.regex and no dlls will need to be distributed. The
+ possible boost.regex dll and library names are computed according to the following
+ formula:
+
"boost_regex_"
+ + BOOST_LIB_TOOLSET
+ + "_"
+ + BOOST_LIB_THREAD_OPT
+ + BOOST_LIB_RT_OPT
+ + BOOST_LIB_LINK_OPT
+ + BOOST_LIB_DEBUG_OPT
+
+ These are defined as:
+
+ BOOST_LIB_TOOLSET: The compiler toolset name (vc6, vc7, bcb5 etc).
+
+ BOOST_LIB_THREAD_OPT: "s" for single thread builds,
+ "m" for multithread builds.
+
+ BOOST_LIB_RT_OPT: "s" for static runtime,
+ "d" for dynamic runtime.
+
+ BOOST_LIB_LINK_OPT: "s" for static link,
+ "i" for dynamic link.
+
+ BOOST_LIB_DEBUG_OPT: nothing for release builds,
+ "d" for debug builds,
+ "dd" for debug-diagnostic builds (_STLP_DEBUG).
+ Note: you can disable automatic library selection by defining the symbol + BOOST_REGEX_NO_LIB when compiling, this is useful if you want to statically + link even though you're using the dll version of your run time library, or if + you need to debug boost.regex. +
+
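As an illustration of the naming formula above, the concatenation could be written as a small function. This is only a sketch: the function regex_lib_name, its parameters, and the 0/1/2 encoding of the debug option are hypothetical, and the real selection is performed by the library's own headers.

```cpp
#include <string>

// Hypothetical helper, not part of Boost: assembles a name following the
// formula "boost_regex_" + TOOLSET + "_" + THREAD + RT + LINK + DEBUG.
// debug_level: 0 = release, 1 = debug, 2 = debug-diagnostic (assumed encoding).
std::string regex_lib_name(const std::string& toolset,
                           bool multithread,
                           bool dynamic_runtime,
                           bool dynamic_link,
                           int debug_level)
{
    std::string name = "boost_regex_";
    name += toolset;                       // BOOST_LIB_TOOLSET, e.g. "vc7"
    name += "_";
    name += multithread ? "m" : "s";       // BOOST_LIB_THREAD_OPT
    name += dynamic_runtime ? "d" : "s";   // BOOST_LIB_RT_OPT
    name += dynamic_link ? "i" : "s";      // BOOST_LIB_LINK_OPT
    if (debug_level == 1)
        name += "d";                       // BOOST_LIB_DEBUG_OPT
    else if (debug_level == 2)
        name += "dd";
    return name;
}
```

Under these assumptions, a multithreaded release build with the vc7 toolset, a dynamic runtime and dynamic linking would yield "boost_regex_vc7_mdi".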
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + diff --git a/doc/reg_expression.html b/doc/reg_expression.html new file mode 100644 index 00000000..a1fd6b56 --- /dev/null +++ b/doc/reg_expression.html @@ -0,0 +1,46 @@ + + + ++
+ |
+
+ Boost.Regex+Class reg_expression (deprecated)+ |
+
+ |
+
The use of class template reg_expression is deprecated: use + basic_regex instead.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+ + + diff --git a/doc/regbase.html b/doc/regbase.html new file mode 100644 index 00000000..f36ce38a --- /dev/null +++ b/doc/regbase.html @@ -0,0 +1,91 @@ + + + + +
+ |
+
+Boost.Regex+ +regbase+ |
+
+ |
+
Use of the type boost::regbase
is now deprecated,
+and the type does not form a part of the
+regular expression standardization proposal. This type
+still exists as a base class of boost::basic_regex
,
+and you can still refer to
+boost::regbase::constant_name
in your code, however for
+maximum portability to other std regex implementations you should
+instead use either:
+boost::regex_constants::constant_name ++ +
or
+ ++boost::regex::constant_name ++ +
or
+ ++boost::wregex::constant_name ++ + + +
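The portability point can be illustrated with the C++ standard library's std::regex, used here as a stand-in for Boost so that the sketch builds without it. Both recommended spellings of a constant name the same flag; the function name matches_ignoring_case is hypothetical.

```cpp
#include <regex>
#include <string>

// Stand-in sketch using std::regex rather than Boost: the same constant is
// reachable through the regex_constants namespace and through the regex type.
bool matches_ignoring_case(const std::string& text)
{
    std::regex a("hello", std::regex_constants::icase);  // namespace spelling
    std::regex b("hello", std::regex::icase);            // class-scope spelling
    return std::regex_match(text, a) && std::regex_match(text, b);
}
```

Neither spelling mentions regbase, which is the reason this page gives for preferring them in portable code.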
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ + + + diff --git a/doc/regex.html b/doc/regex.html new file mode 100644 index 00000000..785caf87 --- /dev/null +++ b/doc/regex.html @@ -0,0 +1,620 @@ + + + + +
+ |
+
+Boost.Regex+ +class RegEx (deprecated)+ |
+
+ |
+
The high level wrapper class RegEx is now deprecated and does +not form a part of the +regular expression standardization proposal. This type +still exists, and existing code will continue to compile, however +the following documentation is unlikely to be further updated.
+ ++#include <boost/cregex.hpp> ++ +
The class RegEx provides a high level simplified interface to +the regular expression library. This class only handles narrow +character strings, and regular expressions always follow the +"normal" syntax - that is, the same as the Perl / ECMAScript +syntax.
+ ++typedef bool (*GrepCallback)(const RegEx& expression); +typedef bool (*GrepFileCallback)(const char* file, const RegEx& expression); +typedef bool (*FindFilesCallback)(const char* file); + +class RegEx +{ +public: + RegEx(); + RegEx(const RegEx& o); + ~RegEx(); + RegEx(const char* c, bool icase = false); + explicit RegEx(const std::string& s, bool icase = false); + RegEx& operator=(const RegEx& o); + RegEx& operator=(const char* p); + RegEx& operator=(const std::string& s); + unsigned int SetExpression(const char* p, bool icase = false); + unsigned int SetExpression(const std::string& s, bool icase = false); + std::string Expression()const; + // + // now matching operators: + // + bool Match(const char* p, unsigned int flags = match_default); + bool Match(const std::string& s, unsigned int flags = match_default); + bool Search(const char* p, unsigned int flags = match_default); + bool Search(const std::string& s, unsigned int flags = match_default); + unsigned int Grep(GrepCallback cb, const char* p, unsigned int flags = match_default); + unsigned int Grep(GrepCallback cb, const std::string& s, unsigned int flags = match_default); + unsigned int Grep(std::vector<std::string>& v, const char* p, unsigned int flags = match_default); + unsigned int Grep(std::vector<std::string>& v, const std::string& s, unsigned int flags = match_default); + unsigned int Grep(std::vector<unsigned int>& v, const char* p, unsigned int flags = match_default); + unsigned int Grep(std::vector<unsigned int>& v, const std::string& s, unsigned int flags = match_default); + unsigned int GrepFiles(GrepFileCallback cb, const char* files, bool recurse = false, unsigned int flags = match_default); + unsigned int GrepFiles(GrepFileCallback cb, const std::string& files, bool recurse = false, unsigned int flags = match_default); + unsigned int FindFiles(FindFilesCallback cb, const char* files, bool recurse = false, unsigned int flags = match_default); + unsigned int FindFiles(FindFilesCallback cb, 
const std::string& files, bool recurse = false, unsigned int flags = match_default); + std::string Merge(const std::string& in, const std::string& fmt, bool copy = true, unsigned int flags = match_default); + std::string Merge(const char* in, const char* fmt, bool copy = true, unsigned int flags = match_default); + unsigned Split(std::vector<std::string>& v, std::string& s, unsigned flags = match_default, unsigned max_count = ~0); + // + // now operators for returning what matched in more detail: + // + unsigned int Position(int i = 0)const; + unsigned int Length(int i = 0)const; + bool Matched(int i = 0)const; + unsigned int Line()const; + unsigned int Marks() const; + std::string What(int i)const; + std::string operator[](int i)const ; + + static const unsigned int npos; +}; ++ +
Member functions for class RegEx are defined as follows:
+
+ | RegEx(); | +Default constructor, constructs an +instance of RegEx without any valid expression. | ++ |
+ | RegEx(const RegEx& o); | +Copy constructor, all the properties +of parameter o are copied. | ++ |
+ | RegEx(const char* c, +bool icase = false); | +Constructs an instance of RegEx, +setting the expression to c, if icase is true +then matching is insensitive to case, otherwise it is sensitive to +case. Throws bad_expression on failure. | ++ |
+ | RegEx(const std::string& s, +bool icase = false); | +Constructs an instance of RegEx, +setting the expression to s, if icase is true +then matching is insensitive to case, otherwise it is sensitive to +case. Throws bad_expression on failure. | ++ |
+ | RegEx& +operator=(const RegEx& o); | +Default assignment operator. | ++ |
+ | RegEx& +operator=(const char* p); | +Assignment operator, equivalent to +calling SetExpression(p, false). Throws +bad_expression on failure. | ++ |
+ | RegEx& +operator=(const std::string& s); | +Assignment operator, equivalent to +calling SetExpression(s, false). Throws +bad_expression on failure. | ++ |
+ | unsigned int +SetExpression(const char* p, bool icase = +false); | +Sets the current expression to +p, if icase is true then matching is insensitive +to case, otherwise it is sensitive to case. Throws +bad_expression on failure. | ++ |
+ | unsigned int +SetExpression(const std::string& s, bool icase = +false); | +Sets the current expression to +s, if icase is true then matching is insensitive +to case, otherwise it is sensitive to case. Throws +bad_expression on failure. | ++ |
+ | std::string +Expression()const; | +Returns a copy of the current regular +expression. | ++ |
+ | bool Match(const +char* p, unsigned int flags = +match_default); | +Attempts to match the current +expression against the text p using the match flags +flags - see match flags. +Returns true if the expression matches the whole of the +input string. | ++ |
+ | bool Match(const +std::string& s, unsigned int flags = +match_default) ; | +Attempts to match the current +expression against the text s using the match flags +flags - see match flags. +Returns true if the expression matches the whole of the +input string. | ++ |
+ | bool Search(const +char* p, unsigned int flags = +match_default); | +Attempts to find a match for the +current expression somewhere in the text p using the match +flags flags - see match +flags. Returns true if the match succeeds. | ++ |
+ | bool Search(const +std::string& s, unsigned int flags = +match_default) ; | +Attempts to find a match for the +current expression somewhere in the text s using the match +flags flags - see match +flags. Returns true if the match succeeds. | ++ |
+ | unsigned int +Grep(GrepCallback cb, const char* p, unsigned +int flags = match_default); | +Finds all matches of the current
+expression in the text p using the match flags flags
+- see match flags. For each
+match found calls the call-back function cb as: cb(*this);
+
+ If at any stage the call-back function returns false then the +grep operation terminates, otherwise continues until no further +matches are found. Returns the number of matches found. + |
++ |
+ | unsigned int +Grep(GrepCallback cb, const std::string& s, +unsigned int flags = match_default); | +Finds all matches of the current
+expression in the text s using the match flags flags
+- see match flags. For each
+match found calls the call-back function cb as: cb(*this);
+
+ If at any stage the call-back function returns false then the +grep operation terminates, otherwise continues until no further +matches are found. Returns the number of matches found. + |
++ |
+ | unsigned int +Grep(std::vector<std::string>& v, const +char* p, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text p using the match flags flags +- see match flags. For each +match pushes a copy of what matched onto v. Returns the +number of matches found. | ++ |
+ | unsigned int +Grep(std::vector<std::string>& v, const +std::string& s, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text s using the match flags flags +- see match flags. For each +match pushes a copy of what matched onto v. Returns the +number of matches found. | ++ |
+ | unsigned int +Grep(std::vector<unsigned int>& v, const +char* p, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text p using the match flags flags +- see match flags. For each +match pushes the starting index of what matched onto v. +Returns the number of matches found. | ++ |
+ | unsigned int +Grep(std::vector<unsigned int>& v, const +std::string& s, unsigned int flags = +match_default); | +Finds all matches of the current +expression in the text s using the match flags flags +- see match flags. For each +match pushes the starting index of what matched onto v. +Returns the number of matches found. | ++ |
+ | unsigned int +GrepFiles(GrepFileCallback cb, const char* files, +bool recurse = false, unsigned int flags = +match_default); | +Finds all matches of the current
+expression in the files files using the match flags
+flags - see match flags. For
+each match calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering further matches in the current file, or any +further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of matches found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | unsigned int +GrepFiles(GrepFileCallback cb, const std::string& files, +bool recurse = false, unsigned int +flags = match_default); | +Finds all matches of the current
+expression in the files files using the match flags
+flags - see match flags. For
+each match calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering further matches in the current file, or any +further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of matches found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | unsigned int +FindFiles(FindFilesCallback cb, const char* files, +bool recurse = false, unsigned int +flags = match_default); | +Searches files to find all
+those which contain at least one match of the current expression
+using the match flags flags - see match flags. For each matching file
+calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering any further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of files found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | unsigned int +FindFiles(FindFilesCallback cb, const std::string& +files, bool recurse = false, unsigned +int flags = match_default); | +Searches files to find all
+those which contain at least one match of the current expression
+using the match flags flags - see match flags. For each matching file
+calls the call-back function cb.
+
+ If the call-back returns false then the algorithm returns +without considering any further files. + +The parameter files can include wild card characters '*' +and '?', if the parameter recurse is true then searches +sub-directories for matching file names. + +Returns the total number of files found. + +May throw an exception derived from std::runtime_error if file +io fails. + |
++ |
+ | std::string Merge(const +std::string& in, const std::string& fmt, bool +copy = true, unsigned int flags = +match_default); | +Performs a search and replace +operation: searches through the string in for all +occurrences of the current expression, for each occurrence replaces +the match with the format string fmt. Uses flags to +determine what gets matched, and how the format string should be +treated. If copy is true then all unmatched sections of +input are copied unchanged to output, if the flag +format_first_only is set then only the first occurrence of the +pattern found is replaced. Returns the new string. See also format string syntax, match flags and format flags. | ++ |
+ | std::string Merge(const char* +in, const char* fmt, bool copy = true, +unsigned int flags = match_default); | +Performs a search and replace +operation: searches through the string in for all +occurrences of the current expression, for each occurrence replaces +the match with the format string fmt. Uses flags to +determine what gets matched, and how the format string should be +treated. If copy is true then all unmatched sections of +input are copied unchanged to output, if the flag +format_first_only is set then only the first occurrence of the +pattern found is replaced. Returns the new string. See also format string syntax, match flags and format flags. | ++ |
+ | unsigned +Split(std::vector<std::string>& v, std::string& s, +unsigned flags = match_default, unsigned max_count = +~0); | +Splits the input string into pieces and pushes each piece onto +the vector. If the expression contains no marked sub-expressions, +then one string is output for each section of the input that +does not match the expression. If the expression does contain +marked sub-expressions, then outputs one string for each marked +sub-expression each time a match occurs. Outputs no more than +max_count strings. Before returning, deletes from the input +string s all of the input that has been processed (all of +the string if max_count was not reached). Returns the number +of strings pushed onto the vector. | ++ |
+ | unsigned int +Position(int i = 0)const; | +Returns the position of what matched +sub-expression i. If i = 0 then returns the position +of the whole match. Returns RegEx::npos if the supplied index is +invalid, or if the specified sub-expression did not participate in +the match. | ++ |
+ | unsigned int +Length(int i = 0)const; | +Returns the length of what matched +sub-expression i. If i = 0 then returns the length of +the whole match. Returns RegEx::npos if the supplied index is +invalid, or if the specified sub-expression did not participate in +the match. | ++ |
+ | bool Matched(int i = +0)const; | +Returns true if sub-expression i was matched, false +otherwise. | ++ |
+ | unsigned int +Line()const; | +Returns the line on which the match +occurred, indexes start from 1 not zero, if no match occurred then +returns RegEx::npos. | ++ |
+ | unsigned int Marks() +const; | +Returns the number of marked +sub-expressions contained in the expression. Note that this +includes the whole match (sub-expression zero), so the value +returned is always >= 1. | ++ |
+ | std::string What(int +i)const; | +Returns a copy of what matched +sub-expression i. If i = 0 then returns a copy of the +whole match. Returns a null string if the index is invalid or if +the specified sub-expression did not participate in a match. | ++ |
+ | std::string +operator[](int i)const ; | +Returns What(i);
+
+ Can be used to simplify access to sub-expression matches, and +make usage more perl-like. + |
++ |
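The behaviour described for Split when the expression has no marked sub-expressions (one string for each section of input that does not match) can be approximated with the C++ standard library. The sketch below deliberately uses std::regex as a stand-in and does not call the deprecated RegEx class; split_on is a hypothetical name, and edge cases such as empty leading sections may differ from RegEx::Split.

```cpp
#include <regex>
#include <string>
#include <vector>

// Stand-in sketch using std::regex rather than RegEx::Split: collects every
// section of `s` that lies between matches of `pattern`.
std::vector<std::string> split_on(const std::string& s, const std::string& pattern)
{
    std::regex re(pattern);
    std::vector<std::string> out;
    // submatch index -1 selects the text between matches, not the matches
    std::sregex_token_iterator it(s.begin(), s.end(), re, -1), end;
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Unlike Split as documented above, this sketch leaves the input string untouched and has no max_count limit.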
Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ + + + diff --git a/doc/regex_format.html b/doc/regex_format.html new file mode 100644 index 00000000..786353e8 --- /dev/null +++ b/doc/regex_format.html @@ -0,0 +1,213 @@ + + + + +
+ |
+
+Boost.Regex+ +Algorithm regex_format (deprecated)+ |
+
+ |
+
The algorithm regex_format is deprecated; new code should use +match_results::format instead. Existing code will continue to +compile; the following documentation is taken from the previous +version of boost.regex and will not be further updated:
+ ++#include <boost/regex.hpp> ++ +
The algorithm regex_format takes the results of a match and +creates a new string based upon a +format string. regex_format can be used for search and replace +operations:
+ ++template <class OutputIterator, class iterator, class Allocator, class charT> +OutputIterator regex_format(OutputIterator out, + const match_results<iterator, Allocator>& m, + const charT* fmt, + match_flag_type flags = 0); +template <class OutputIterator, class iterator, class Allocator, class charT> +OutputIterator regex_format(OutputIterator out, + const match_results<iterator, Allocator>& m, + const std::basic_string<charT>& fmt, + match_flag_type flags = 0); ++ +
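Since new code is directed to match_results::format, a short illustration of that member may help. This is a minimal sketch, assuming the C++11 standard library header <regex> (whose match_results follows the same design) rather than Boost.Regex itself; the helper name reformat_date and the date pattern are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds the first "YYYY-MM-DD" date in input and
// returns it rewritten as "DD/MM/YYYY". Returns an empty string when the
// input contains no such date.
std::string reformat_date(const std::string& input)
{
    static const std::regex date_re("(\\d{4})-(\\d{2})-(\\d{2})");
    std::smatch m;
    if (!std::regex_search(input, m, date_re))
        return std::string();
    // $1, $2 and $3 refer to the marked sub-expressions of the match.
    return m.format("$3/$2/$1");
}
```

Only the matched text is formatted here; the unmatched parts of the input are not copied, which is the difference between formatting a single match and a full search and replace.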
The library also defines the following convenience variation of +regex_format, which returns the result directly as a string, rather +than outputting to an iterator [note - this version may not be +available, or may be available in a more limited form, depending +upon your compiler's capabilities]:
+ ++template <class iterator, class Allocator, class charT> +std::basic_string<charT> regex_format + (const match_results<iterator, Allocator>& m, + const charT* fmt, + match_flag_type flags = 0); + +template <class iterator, class Allocator, class charT> +std::basic_string<charT> regex_format + (const match_results<iterator, Allocator>& m, + const std::basic_string<charT>& fmt, + match_flag_type flags = 0); ++ +
Parameters to the main version of the function are passed as +follows:
+ + + ++ | OutputIterator out | +An output iterator type, the output +string is sent to this iterator. Typically this would be a +std::ostream_iterator. | ++ |
+ | const +match_results<iterator, Allocator>& m | +An instance of match_results<> +obtained from one of the matching algorithms above, and denoting +what matched. | ++ |
+ | const charT* fmt | +A format string that determines how +the match is transformed into the new string. | ++ |
+ | match_flag_type flags | +Optional flags which describe how the +format string is to be interpreted. | ++ |
Format flags are defined as +follows:
+ + + ++ | format_all | +Enables all syntax options (perl-like +plus extensions). | ++ |
+ | format_sed | +Allows only a sed-like syntax. | ++ |
+ | format_perl | +Allows only a perl-like syntax. | ++ |
+ | format_no_copy | +Disables copying of unmatched sections +to the output string during +regex_merge operations. | ++ |
+ | format_first_only | +When this flag is set only the first occurrence will be replaced +(applies to regex_merge only). | ++ |
The format string syntax (and available options) is described +more fully under format strings +.
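To make the difference between format syntaxes concrete, the sketch below swaps the two halves of a key=value pair using both a perl-like and a sed-like format string. It assumes the C++11 standard library <regex>, which offers only its default (perl-like, ECMAScript) syntax and format_sed, not format_all or format_perl; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Perl-like format string (the standard library default): $N names
// sub-expression N.
std::string swap_pair_default(const std::string& s)
{
    static const std::regex pair_re("(\\w+)=(\\w+)");
    return std::regex_replace(s, pair_re, "$2=$1");
}

// Sed-like format string: \N names sub-expression N and & names the
// whole match.
std::string swap_pair_sed(const std::string& s)
{
    static const std::regex pair_re("(\\w+)=(\\w+)");
    return std::regex_replace(s, pair_re, "\\2=\\1",
                              std::regex_constants::format_sed);
}
```

Both functions produce the same text; only the spelling of the back-references in the format string changes with the flag.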
+ + + +Revised + +17 May 2003 +
+ +© Copyright John +Maddock 1998- + +2003 +
+ +Permission to use, copy, modify, distribute and +sell this software and its documentation for any purpose is hereby +granted without fee, provided that the above copyright notice +appear in all copies and that both that copyright notice and this +permission notice appear in supporting documentation. Dr John +Maddock makes no representations about the suitability of this +software for any purpose. It is provided "as is" without express or +implied warranty.
+ + + + diff --git a/doc/regex_grep.html b/doc/regex_grep.html new file mode 100644 index 00000000..0c6f1218 --- /dev/null +++ b/doc/regex_grep.html @@ -0,0 +1,386 @@ + + + +
+ |
+
+ Boost.Regex+Algorithm regex_grep (deprecated)+ |
+
+ |
+
The algorithm regex_grep is deprecated in favor of regex_iterator + which provides a more convenient and standard library friendly interface.
+The following documentation is taken unchanged from the previous boost release, + and will not be updated in future.
++#include <boost/regex.hpp> ++
regex_grep allows you to search through a bidirectional-iterator range and + locate all the (non-overlapping) matches with a given regular expression. The + function is declared as:
++template <class Predicate, class iterator, class charT, class traits, class Allocator> +unsigned int regex_grep(Predicate foo, + iterator first, + iterator last, + const basic_regex<charT, traits, Allocator>& e, + unsigned flags = match_default) ++
The library also defines the following convenience versions, which take either + a const charT*, or a const std::basic_string<>& in place of a pair of + iterators [note - these versions may not be available, or may be available in a + more limited form, depending upon your compiler's capabilities]:
++template <class Predicate, class charT, class Allocator, class traits> +unsigned int regex_grep(Predicate foo, + const charT* str, + const basic_regex<charT, traits, Allocator>& e, + unsigned flags = match_default); + +template <class Predicate, class ST, class SA, class Allocator, class charT, class traits> +unsigned int regex_grep(Predicate foo, + const std::basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator>& e, + unsigned flags = match_default); ++
The parameters for the primary version of regex_grep have the following + meanings:
+ ++ | foo | +A predicate function object or function pointer, see + below for more information. | ++ |
+ | first | +The start of the range to search. | ++ |
+ | last | +The end of the range to search. | ++ |
+ | e | +The regular expression to search for. | ++ |
+ | flags | +The flags that determine how matching is carried out, + one of the match_flags enumerators. | ++ |
The algorithm finds all of the non-overlapping matches of the expression e. For + each match it fills a match_results<iterator, + Allocator> structure, which contains information on what matched, and calls + the predicate foo, passing the match_results<iterator, Allocator> as a + single argument. If the predicate returns true, then the grep operation + continues, otherwise it terminates without searching for further matches. The + function returns the number of matches found.
+The general form of the predicate is:
++struct grep_predicate +{ + bool operator()(const match_results<iterator_type, typename expression_type::alloc_type::template rebind<sub_match<BidirectionalIterator> >::other>& m); +}; ++
Note that in almost every case the allocator parameter can be omitted when + specifying the match_results type; + alternatively one of the typedefs cmatch, wcmatch, smatch or wsmatch can be + used.
+For example the regular expression "a*b" would find one match in the string + "aaaaab" and two in the string "aaabb".
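The counting and early-termination behaviour described above can be illustrated without regex_grep itself. The following is a minimal sketch, assuming the C++11 standard library <regex> and its regex_iterator rather than Boost.Regex; grep_like is a hypothetical name and is not part of either library.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for a grep-style algorithm: calls pred for each
// non-overlapping match of e in text, stops as soon as pred returns
// false, and returns the number of matches visited.
template <class Predicate>
unsigned int grep_like(Predicate pred, const std::string& text, const std::regex& e)
{
    unsigned int count = 0;
    const std::sregex_iterator end;  // default-constructed: end of sequence
    for (std::sregex_iterator it(text.begin(), text.end(), e); it != end; ++it)
    {
        ++count;
        if (!pred(*it))
            break;  // the predicate asked to stop searching
    }
    return count;
}
```

With the expression "a*b", a predicate that always returns true visits one match in "aaaaab" and two in "aaabb"; a predicate that returns false stops after the first match.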
+Remember that this algorithm can be used for a lot more than implementing a version + of grep: the predicate can be and do anything that you want. A grep utility + would output the results to the screen, another program could index a file + based on a regular expression and store a set of bookmarks in a list, and a text + file conversion utility would output to a file. The results of one regex_grep can + even be chained into another regex_grep to create recursive parsers.
+The algorithm may throw std::runtime_error
if the complexity
+ of matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Example: convert + the example from regex_search to use regex_grep instead:
++#include <string> +#include <map> +#include <boost/regex.hpp> + +// IndexClasses: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's +typedef std::map<std::string, int, std::less<std::string> > map_type; + +const char* re = + // possibly leading whitespace: + "^[[:space:]]*" + // possible template declaration: + "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + // class or struct: + "(class|struct)[[:space:]]*" + // leading declspec macros etc: + "(" + "\\<\\w+\\>" + "(" + "[[:blank:]]*\\([^)]*\\)" + ")?" + "[[:space:]]*" + ")*" + // the class name + "(\\<\\w*\\>)[[:space:]]*" + // template specialisation parameters + "(<[^;:{]+>)?[[:space:]]*" + // terminate in { or : + "(\\{|:[^;\\{()]*\\{)"; + +boost::regex expression(re); +class IndexClassesPred +{ + map_type& m; + std::string::const_iterator base; +public: + IndexClassesPred(map_type& a, std::string::const_iterator b) : m(a), base(b) {} + bool operator()(const boost::smatch& what) + { + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; + } +}; +void IndexClasses(map_type& m, const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + regex_grep(IndexClassesPred(m, start), start, end, expression); +} ++
Example: Use + regex_grep to call a global callback function:
++#include <string> +#include <map> +#include <boost/regex.hpp> + +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's +typedef std::map<std::string, int, std::less<std::string> > map_type; + +const char* re = + // possibly leading whitespace: + "^[[:space:]]*" + // possible template declaration: + "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + // class or struct: + "(class|struct)[[:space:]]*" + // leading declspec macros etc: + "(" + "\\<\\w+\\>" + "(" + "[[:blank:]]*\\([^)]*\\)" + ")?" + "[[:space:]]*" + ")*" + // the class name + "(\\<\\w*\\>)[[:space:]]*" + // template specialisation parameters + "(<[^;:{]+>)?[[:space:]]*" + // terminate in { or : + "(\\{|:[^;\\{()]*\\{)"; + +boost::regex expression(re); +map_type class_index; +std::string::const_iterator base; + +bool grep_callback(const boost::smatch& what) +{ + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + class_index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; +} +void IndexClasses(const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + base = start; + regex_grep(grep_callback, start, end, expression, match_default); +} + ++
Example: use + regex_grep to call a class member function, use the standard library adapters std::mem_fun + and std::bind1st to convert the member function into a predicate:
++#include <string> +#include <map> +#include <boost/regex.hpp> +#include <functional> +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's + +typedef std::map<std::string, int, std::less<std::string> > map_type; +class class_index +{ + boost::regex expression; + map_type index; + std::string::const_iterator base; + bool grep_callback(boost::smatch what); +public: + void IndexClasses(const std::string& file); + class_index() + : index(), + expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" + "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" + "(\\{|:[^;\\{()]*\\{)" + ){} +}; +bool class_index::grep_callback(boost::smatch what) +{ + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. + // add class name and position to map: + index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; +} + +void class_index::IndexClasses(const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + base = start; + regex_grep(std::bind1st(std::mem_fun(&class_index::grep_callback), this), + start, + end, + expression); +} + ++
Finally, C++ + Builder users can use C++ Builder's closure type as a callback argument:
++#include <string> +#include <map> +#include <boost/regex.hpp> +#include <functional> +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's + +typedef std::map<std::string, int, std::less<std::string> > map_type; +class class_index +{ + boost::regex expression; + map_type index; + std::string::const_iterator base; + typedef boost::smatch arg_type; + bool grep_callback(const arg_type& what); +public: + typedef bool (__closure* grep_callback_type)(const arg_type&); + void IndexClasses(const std::string& file); + class_index() + : index(), + expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + "(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?" + "[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?" + "(\\{|:[^;\\{()]*\\{)" + ){} +}; + +bool class_index::grep_callback(const arg_type& what) +{ + // what[0] contains the whole string +// what[5] contains the class name. +// what[6] contains the template specialisation if any. +// add class name and position to map: +index[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] = + what[5].first - base; + return true; +} + +void class_index::IndexClasses(const std::string& file) +{ + std::string::const_iterator start, end; + start = file.begin(); + end = file.end(); + base = start; + class_index::grep_callback_type cl = &(this->grep_callback); + regex_grep(cl, + start, + end, + expression); +} ++ +
Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- + + 2003 +
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_iterator.html b/doc/regex_iterator.html new file mode 100644 index 00000000..4a24769b --- /dev/null +++ b/doc/regex_iterator.html @@ -0,0 +1,427 @@ + + + +
+ |
+
+ Boost.Regex+regex_iterator+ |
+
+ |
+
The iterator type regex_iterator will enumerate all of the regular expression + matches found in some sequence: dereferencing a regex_iterator yields a + reference to a match_results object.
++template <class BidirectionalIterator, + class charT = iterator_traits<BidirectionalIterator>::value_type, + class traits = regex_traits<charT>, + class Allocator = allocator<charT> > +class regex_iterator +{ +public: + typedef basic_regex<charT, traits, Allocator> regex_type; + typedef match_results<BidirectionalIterator> value_type; + typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; + typedef const value_type* pointer; + typedef const value_type& reference; + typedef std::forward_iterator_tag iterator_category; + + regex_iterator(); + regex_iterator(BidirectionalIterator a, BidirectionalIterator b, + const regex_type& re, + match_flag_type m = match_default); + regex_iterator(const regex_iterator&); + regex_iterator& operator=(const regex_iterator&); + bool operator==(const regex_iterator&); + bool operator!=(const regex_iterator&); + const value_type& operator*(); + const value_type* operator->(); + regex_iterator& operator++(); + regex_iterator operator++(int); +}; + ++
A regex_iterator is constructed from a pair of iterators, and enumerates all + occurrences of a regular expression within that iterator range.
++regex_iterator(); ++ +
Effects: constructs an end of sequence regex_iterator.
++regex_iterator(BidirectionalIterator a, BidirectionalIterator b, + const regex_type& re, + match_flag_type m = match_default); ++ +
Effects: constructs a regex_iterator that will enumerate all occurrences + of the expression re, within the sequence [a,b), and found + using match flags m. The object re must exist for the + lifetime of the regex_iterator.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
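The two constructors described so far are typically used as a pair: one iterator over [a,b) and one default-constructed end-of-sequence iterator. The sketch below shows the pattern, assuming the C++11 standard library std::regex_iterator (via the sregex_iterator typedef), which follows the same design; find_numbers is a hypothetical helper.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of decimal digits in text.
std::vector<std::string> find_numbers(const std::string& text)
{
    // The regex object must outlive every iterator that refers to it,
    // so it is kept in a static rather than passed as a temporary.
    static const std::regex number_re("[0-9]+");

    std::vector<std::string> out;
    std::sregex_iterator it(text.begin(), text.end(), number_re);
    const std::sregex_iterator end;  // end-of-sequence iterator
    for (; it != end; ++it)
        out.push_back(it->str());    // str() is the text of the whole match
    return out;
}
```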
+regex_iterator(const regex_iterator& that); ++ +
Effects: constructs a copy of that
.
Postconditions: *this == that
.
+regex_iterator& operator=(const regex_iterator&); ++ +
Effects: sets *this
equal to that
.
Postconditions: *this == that
.
+bool operator==(const regex_iterator& that); ++ +
Effects: returns true if *this is equal to that.
++bool operator!=(const regex_iterator&); ++ +
Effects: returns !(*this == that)
.
+const value_type& operator*(); ++
Effects: dereferencing a regex_iterator object named it yields a + const reference to a match_results object, + whose members are set as follows:
+ +
+ Element + |
+
+ Value + |
+
+ (*it).size() + |
+
+ re.mark_count() + |
+
+ (*it).empty() + |
+
+ false + |
+
+ (*it).prefix().first + |
+
+ The end of the last match found, or the start of the underlying sequence if + this is the first match enumerated + |
+
+ (*it).prefix().second + |
+
+ (*it)[0].first + |
+
+ (*it).prefix().matched + |
+
+ (*it).prefix().first != (*it).prefix().second + |
+
+ (*it).suffix().first + |
+
+ (*it)[0].second + |
+
+ (*it).suffix().second + |
+
+ The end of the underlying sequence. + |
+
+ (*it).suffix().matched + |
+
+ (*it).suffix().first != (*it).suffix().second + |
+
+ (*it)[0].first + |
+
+ The start of the sequence of characters that matched the regular expression + |
+
+ (*it)[0].second + |
+
+ The end of the sequence of characters that matched the regular expression + |
+
+ (*it)[0].matched + |
+
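+ true if a full match was found, and false if it was a partial match (found as a + result of the match_partial flag being set). + |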
+
+ (*it)[n].first + |
+
+ For all integers n < (*it).size(), the start of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ (*it)[n].second + |
+
+ For all integers n < (*it).size(), the end of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ (*it)[n].matched + |
+
+ For all integers n < (*it).size(), true if sub-expression n participated + in the match, false otherwise. + |
+
(*it).position(n) | +For all integers n < (*it).size(), the + distance from the start of the underlying sequence to the start of + sub-expression match n. | +
+const value_type* operator->(); ++ +
Effects: returns &(*this)
.
+regex_iterator& operator++(); ++
Effects: moves the iterator to the next match in the + underlying sequence, or the end of sequence iterator if none is found. + When the last match found matched a zero length string, then the + regex_iterator will find the next match as follows: if there exists a non-zero + length match that starts at the same location as the last one, then returns it, + otherwise starts looking for the next (possibly zero length) match from one + position to the right of the last match.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Returns: *this
.
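The zero-length rule matters for expressions such as "a*", which can match the empty string at any position; without it, incrementing would never advance. The sketch below records every match with its position so the behaviour can be inspected. It assumes the C++11 standard library std::regex_iterator, which specifies an equivalent rule; all_matches and PositionedMatch are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pair of (position of match, matched text).
typedef std::pair<std::ptrdiff_t, std::string> PositionedMatch;

// Enumerates every match of re in text, including zero-length ones.
std::vector<PositionedMatch> all_matches(const std::string& text, const std::regex& re)
{
    std::vector<PositionedMatch> out;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(text.begin(), text.end(), re); it != end; ++it)
        out.push_back(PositionedMatch(it->position(0), it->str()));
    return out;
}
```

For the input "baac" and the expression "a*", the first match is empty at position 0 (nothing longer starts there), and the following increment moves one position to the right and finds "aa" at position 1.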
+regex_iterator operator++(int); ++ +
Effects: constructs a copy result
of *this
,
+ then calls ++(*this)
.
Returns: result
.
The following example + takes a C++ source file and builds up an index of class names, and the location + of that class in the file.
++#include <string> +#include <map> +#include <fstream> +#include <iostream> +#include <boost/regex.hpp> + +using namespace std; + +// purpose: +// takes the contents of a file in the form of a string +// and searches for all the C++ class definitions, storing +// their locations in a map of strings/int's + +typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type; + +const char* re = + // possibly leading whitespace: + "^[[:space:]]*" + // possible template declaration: + "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?" + // class or struct: + "(class|struct)[[:space:]]*" + // leading declspec macros etc: + "(" + "\\<\\w+\\>" + "(" + "[[:blank:]]*\\([^)]*\\)" + ")?" + "[[:space:]]*" + ")*" + // the class name + "(\\<\\w*\\>)[[:space:]]*" + // template specialisation parameters + "(<[^;:{]+>)?[[:space:]]*" + // terminate in { or : + "(\\{|:[^;\\{()]*\\{)"; + + +boost::regex expression(re); +map_type class_index; + +bool regex_callback(const boost::match_results<std::string::const_iterator>& what) +{ + // what[0] contains the whole string + // what[5] contains the class name. + // what[6] contains the template specialisation if any. 
+ // add class name and position to map: + class_index[what[5].str() + what[6].str()] = what.position(5); + return true; +} + +void load_file(std::string& s, std::istream& is) +{ + s.erase(); + s.reserve(is.rdbuf()->in_avail()); + char c; + while(is.get(c)) + { + if(s.capacity() == s.size()) + s.reserve(s.capacity() * 3); + s.append(1, c); + } +} + +int main(int argc, const char** argv) +{ + std::string text; + for(int i = 1; i < argc; ++i) + { + cout << "Processing file " << argv[i] << endl; + std::ifstream fs(argv[i]); + load_file(text, fs); + // construct our iterators: + boost::regex_iterator<std::string::const_iterator> m1(text.begin(), text.end(), expression); + boost::regex_iterator<std::string::const_iterator> m2; + std::for_each(m1, m2, &regex_callback); + // copy results: + cout << class_index.size() << " matches found" << endl; + map_type::iterator c, d; + c = class_index.begin(); + d = class_index.end(); + while(c != d) + { + cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl; + ++c; + } + class_index.erase(class_index.begin(), class_index.end()); + } + return 0; +} ++
Revised + + 17 May 2003 +
+© Copyright John Maddock 1998- + + 2003 +
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_match.html b/doc/regex_match.html new file mode 100644 index 00000000..1345180b --- /dev/null +++ b/doc/regex_match.html @@ -0,0 +1,317 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_match+ |
+
+ |
+
#include <boost/regex.hpp>+
The algorithm regex_match determines whether a given regular expression + matches a given sequence denoted by a pair of bidirectional-iterators. The + algorithm is defined as follows; note that the result is true only if the + expression matches the whole of the input sequence. The main use of + this function is data input validation. +
template <class BidirectionalIterator, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class BidirectionalIterator, class charT, class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class charT, class Allocator, class traits, class Allocator2> +bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class ST, class SA, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class charT, class traits, class Allocator2> +bool regex_match(const charT* str, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class ST, class SA, class charT, class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); ++
template <class BidirectionalIterator, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Requires: Type BidirectionalIterator meets the requirements of a + Bidirectional Iterator (24.1.4).
+Effects: Determines whether there is an exact match between the regular + expression e, and all of the character sequence [first, last), parameter + flags is used to control how the expression + is matched against the character sequence. Returns true if such a match + exists, false otherwise.
+Throws: std::runtime_error
if the complexity of matching the
+ expression against an N character string begins to exceed O(N^2), or
+ if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Postconditions: If the function returns false, then the effect on + parameter m is undefined, otherwise the effects on parameter m are + given in the table:
++
+ Element + + |
+
+ Value + + |
+
+ m.size() + |
+
+ e.mark_count() + |
+
+ m.empty() + |
+
+ false + |
+
+ m.prefix().first + |
+
+ first + |
+
+ m.prefix().second + |
+
+ first + |
+
+ m.prefix().matched + |
+
+ false + |
+
+ m.suffix().first + |
+
+ last + |
+
+ m.suffix().second + |
+
+ last + |
+
+ m.suffix().matched + |
+
+ false + |
+
+ m[0].first + |
+
+ first + |
+
+ m[0].second + |
+
+ last + |
+
+ m[0].matched + |
+
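+ true if a full match was found, and false if it was a partial match (found as a + result of the match_partial flag being set). + |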
+
+ m[n].first + |
+
+ For all integers n < m.size(), the start of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ m[n].second + |
+
+ For all integers n < m.size(), the end of the sequence that matched + sub-expression n. Alternatively, if sub-expression n did not participate + in the match, then last. + |
+
+ m[n].matched + |
+
+ For all integers n < m.size(), true if sub-expression n participated + in the match, false otherwise. + |
+
+
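The whole-sequence requirement and the postconditions on m can be checked directly. This is a minimal sketch, assuming the C++11 standard library std::regex_match, which has the same whole-input semantics; is_iso_date and whole_input_matched are hypothetical helpers and the date pattern is only an illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical validator: accepts only strings of the form "2003-05-17".
// Leading or trailing characters cause the match to fail.
bool is_iso_date(const std::string& s)
{
    static const std::regex date_re("[0-9]{4}-[0-9]{2}-[0-9]{2}");
    return std::regex_match(s, date_re);
}

// Hypothetical check of the postconditions: after a successful match,
// m[0] spans the whole input and neither prefix nor suffix is matched.
bool whole_input_matched(const std::string& s, const std::regex& e)
{
    std::smatch m;
    if (!std::regex_match(s, m, e))
        return false;
    return m[0].first == s.begin()
        && m[0].second == s.end()
        && !m.prefix().matched
        && !m.suffix().matched;
}
```

A search-style algorithm would accept "on 2003-05-17" because it contains a date; the match algorithm rejects it, which is why it suits input validation.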
template <class BidirectionalIterator, class charT, class traits, class Allocator2> +bool regex_match(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Behaves "as if" by constructing an instance of
+ match_results<
BidirectionalIterator> what
,
+ and then returning the result of regex_match(first, last, what, e, flags)
.
template <class charT, class Allocator, class traits, class Allocator2> +bool regex_match(const charT* str, match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(str, str +
+ char_traits<charT>::length(str), m, e, flags)
.
template <class ST, class SA, class Allocator, class charT, + class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(s.begin(), s.end(), m, e,
+ flags)
.
template <class charT, class traits, class Allocator2> +bool regex_match(const charT* str, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(str, str +
+ char_traits<charT>::length(str), e, flags)
.
template <class ST, class SA, class charT, class traits, class Allocator2> +bool regex_match(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_match(s.begin(), s.end(), e,
+ flags)
.
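Because each convenience overload is defined in terms of the iterator-range version, all of them should report the same result for the same text. The sketch below exercises that equivalence, assuming the C++11 standard library std::regex_match overloads, which are declared in the same pattern; match_all_overloads is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns 1 if every overload reports a match,
// 0 if every overload reports no match, and -1 if they disagree.
int match_all_overloads(const char* str, const std::regex& e)
{
    const std::string s(str);
    const bool from_cstr = std::regex_match(str, e);
    const bool from_ptrs =
        std::regex_match(str, str + std::char_traits<char>::length(str), e);
    const bool from_string = std::regex_match(s, e);
    const bool from_iters = std::regex_match(s.begin(), s.end(), e);

    if (from_cstr != from_ptrs || from_ptrs != from_string || from_string != from_iters)
        return -1;
    return from_cstr ? 1 : 0;
}
```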
+
The following example + processes an ftp response: +
+#include <cstdlib> +#include <boost/regex.hpp> +#include <string> +#include <iostream> + +using namespace boost; + +regex expression("([0-9]+)(\\-| |$)(.*)"); + +// process_ftp: +// on success returns the ftp response code, and fills +// msg with the ftp response message. +int process_ftp(const char* response, std::string* msg) +{ + cmatch what; + if(regex_match(response, what, expression)) + { + // what[0] contains the whole string + // what[1] contains the response code + // what[2] contains the separator character + // what[3] contains the text message. + if(msg) + msg->assign(what[3].first, what[3].second); + return std::atoi(what[1].first); + } + // failure did not match + if(msg) + msg->erase(); + return -1; +} ++ ++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_merge.html b/doc/regex_merge.html new file mode 100644 index 00000000..00c35d76 --- /dev/null +++ b/doc/regex_merge.html @@ -0,0 +1,47 @@ + + + ++
+ |
+
+ Boost.Regex+Algorithm regex_merge (deprecated)+ |
+
+ |
+
Algorithm regex_merge has been renamed regex_replace; + existing code will continue to compile, but new code should use + regex_replace instead.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/doc/regex_replace.html b/doc/regex_replace.html new file mode 100644 index 00000000..1e13b553 --- /dev/null +++ b/doc/regex_replace.html @@ -0,0 +1,213 @@ + + + ++
+ Boost.Regex
+ Algorithm regex_replace
+
#include <boost/regex.hpp>+
The algorithm regex_replace searches through a string finding + all the matches to the regular expression: for each match it then calls + match_results::format to format the string and sends the result to the + output iterator. Sections of text that do not match are copied to the output + unchanged only if the flags parameter does not have the flag + format_no_copy set. If the flag format_first_only + is set then only the first occurrence is replaced rather than all + occurrences.
template <class OutputIterator, class BidirectionalIterator, class traits, + class Allocator, class charT> +OutputIterator regex_replace(OutputIterator out, + BidirectionalIterator first, + BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default); + +template <class traits, class Allocator, class charT> +basic_string<charT> regex_replace(const basic_string<charT>& s, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default); + ++
template <class OutputIterator, class BidirectionalIterator, class traits, + class Allocator, class charT> +OutputIterator regex_replace(OutputIterator out, + BidirectionalIterator first, + BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default);+
Effects: Finds all the non-overlapping matches m of type match_results<BidirectionalIterator> that occur within the sequence [first, last). If no such matches are found and !(flags & format_no_copy) then calls std::copy(first, last, out). Otherwise, for each match found, if !(flags & format_no_copy) calls std::copy(m.prefix().first, m.prefix().second, out), and then calls m.format(out, fmt, flags). Finally, if !(flags & format_no_copy), calls std::copy(last_m.suffix().first, last_m.suffix().second, out) where last_m is a copy of the last match found. If flags & format_first_only is non-zero then only the first match found is replaced.
Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Returns: out
.
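The effect of the two format flags on the iterator overload can be illustrated with a small sketch. It uses std::regex from the C++ standard library as a stand-in for boost::regex (an assumption made so that the snippet builds without Boost; the overload has the same shape), and the helper name replace_matches is hypothetical.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: runs the iterator overload of regex_replace and
// collects whatever is written to the output iterator into a string.
// std::regex stands in for boost::regex here (an assumption of this sketch).
std::string replace_matches(const std::string& text,
                            const std::regex& e,
                            const std::string& fmt,
                            std::regex_constants::match_flag_type flags =
                                std::regex_constants::match_default)
{
    std::string result;
    // Unmatched text is copied to the output unless format_no_copy is set;
    // with format_first_only only the first match is replaced.
    std::regex_replace(std::back_inserter(result),
                       text.begin(), text.end(), e, fmt, flags);
    return result;
}
```

With the expression "[0-9]+" and the format string "#", the input "a1b22c" becomes "a#b#c" by default, "##" with format_no_copy, and "a#b22c" with format_first_only.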
+
template <class traits, class Allocator, class charT> +basic_string<charT> regex_replace(const basic_string<charT>& s, + const basic_regex<charT, traits, Allocator>& e, + const basic_string<charT>& fmt, + match_flag_type flags = match_default);+
Effects: Constructs an object basic_string<charT> result
,
+ calls regex_replace(back_inserter(result), s.begin(), s.end(), e, fmt,
+ flags)
, and then returns result
.
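As a rough illustration of the string overload, the sketch below wraps every number in square brackets, once through the string overload and once by spelling out the steps listed under Effects. It again assumes std::regex as a stand-in for boost::regex; the function names are hypothetical, and "$&" is the same whole-match placeholder used in the format strings of the example further down.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper using the string overload directly.
std::string bracket_numbers(const std::string& s)
{
    static const std::regex number("[0-9]+");
    return std::regex_replace(s, number, std::string("[$&]"));
}

// The same operation written out as the Effects clause describes it:
// construct a result string, then call the iterator overload with
// back_inserter(result).
std::string bracket_numbers_by_hand(const std::string& s)
{
    static const std::regex number("[0-9]+");
    const std::string fmt("[$&]");
    std::string result;
    std::regex_replace(std::back_inserter(result), s.begin(), s.end(), number, fmt);
    return result;
}
```

Both functions turn "a1b22c" into "a[1]b[22]c". The Boost-specific conditional format syntax ("(?1...)" with format_all) used in the longer example is not part of std::regex and is not shown here.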
+
The following example + takes C/C++ source code as input, and outputs syntax highlighted HTML code.
+
+#include <fstream>
+#include <sstream>
+#include <string>
+#include <iterator>
+#include <boost/regex.hpp>
+#include <iostream>
+
+// purpose:
+// takes the contents of a file and transforms it to
+// syntax highlighted code in html format
+
+boost::regex e1, e2;
+extern const char* expression_text;
+extern const char* format_string;
+extern const char* pre_expression;
+extern const char* pre_format;
+extern const char* header_text;
+extern const char* footer_text;
+
+void load_file(std::string& s, std::istream& is)
+{
+   s.erase();
+   s.reserve(is.rdbuf()->in_avail());
+   char c;
+   while(is.get(c))
+   {
+      if(s.capacity() == s.size())
+         s.reserve(s.capacity() * 3);
+      s.append(1, c);
+   }
+}
+
+int main(int argc, const char** argv)
+{
+   try{
+   e1.assign(expression_text);
+   e2.assign(pre_expression);
+   for(int i = 1; i < argc; ++i)
+   {
+      std::cout << "Processing file " << argv[i] << std::endl;
+      std::ifstream fs(argv[i]);
+      std::string in;
+      load_file(in, fs);
+      std::string out_name(std::string(argv[i]) + std::string(".htm"));
+      std::ofstream os(out_name.c_str());
+      os << header_text;
+      // strip '<' and '>' first by outputting to a
+      // temporary string stream
+      std::ostringstream t(std::ios::out | std::ios::binary);
+      std::ostream_iterator<char, char> oi(t);
+      boost::regex_replace(oi, in.begin(), in.end(),
+         e2, pre_format, boost::match_default | boost::format_all);
+      // then output to final output stream
+      // adding syntax highlighting:
+      std::string s(t.str());
+      std::ostream_iterator<char, char> out(os);
+      boost::regex_replace(out, s.begin(), s.end(),
+         e1, format_string, boost::match_default | boost::format_all);
+      os << footer_text;
+   }
+   }
+   catch(...)
+   { return -1; }
+   return 0;
+}
+
+// '<' and '>' become HTML entities, '\r' is dropped:
+const char* pre_expression = "(<)|(>)|\\r";
+const char* pre_format = "(?1&lt;)(?2&gt;)";
+
+
+const char* expression_text = // preprocessor directives: index 1
+                              "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
+                              // comment: index 2
+                              "(//[^\\n]*|/\\*.*?\\*/)|"
+                              // literals: index 3
+                              "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
+                              // string literals: index 4
+                              "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
+                              // keywords: index 5
+                              "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
+                              "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
+                              "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
+                              "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
+                              "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
+                              "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
+                              "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
+                              "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
+                              "|using|virtual|void|volatile|wchar_t|while)\\>"
+                              ;
+
+const char* format_string = "(?1<font color=\"#008040\">$&</font>)"
+                            "(?2<I><font color=\"#000080\">$&</font></I>)"
+                            "(?3<font color=\"#0000A0\">$&</font>)"
+                            "(?4<font color=\"#0000FF\">$&</font>)"
+                            "(?5<B>$&</B>)";
+
+const char* header_text = "<HTML>\n<HEAD>\n"
+                          "<TITLE>Auto-generated html formatted source</TITLE>\n"
+                          "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
+                          "</HEAD>\n"
+                          "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
+                          "<P> </P>\n<PRE>";
+
+const char* footer_text = "</PRE>\n</BODY>\n</HTML>\n";
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_search.html b/doc/regex_search.html new file mode 100644 index 00000000..a7fcd9b8 --- /dev/null +++ b/doc/regex_search.html @@ -0,0 +1,328 @@ + + + ++
+ Boost.Regex
+ Algorithm regex_search
+
#include <boost/regex.hpp>+ +
The algorithm regex_search will search a range denoted by a pair of + bidirectional-iterators for a given regular expression. The algorithm uses + various heuristics to reduce the search time by only checking for a match if a + match could conceivably start at that position. The algorithm is defined as + follows: +
template <class BidirectionalIterator, + class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class ST, class SA, + class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(const basic_string<charT, ST, SA>& s, + match_results< + typename basic_string<charT, ST,SA>::const_iterator, + Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template<class charT, class Allocator, class traits, + class Allocator2> +bool regex_search(const charT* str, + match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default); + +template <class BidirectionalIterator, class Allocator, + class charT, class traits> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default); + +template <class charT, class Allocator, + class traits> +bool regex_search(const charT* str, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default); + +template<class ST, class SA, + class Allocator, class charT, + class traits> +bool regex_search(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default); ++
template <class BidirectionalIterator, class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + match_results<BidirectionalIterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Requires: Type BidirectionalIterator meets the requirements of a + Bidirectional Iterator (24.1.4).
+Effects: Determines whether there is some sub-sequence within + [first,last) that matches the regular expression e; the parameter flags + controls how the expression is matched against the character + sequence. Returns true if such a sequence exists, false otherwise.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Postconditions: If the function returns false, then the effect on + parameter m is undefined, otherwise the effects on parameter m are + given in the table:
+
+ Element | Value
+ m.size() | e.mark_count()
+ m.empty() | false
+ m.prefix().first | first
+ m.prefix().second | m[0].first
+ m.prefix().matched | m.prefix().first != m.prefix().second
+ m.suffix().first | m[0].second
+ m.suffix().second | last
+ m.suffix().matched | m.suffix().first != m.suffix().second
+ m[0].first | The start of the sequence of characters that matched the regular expression
+ m[0].second | The end of the sequence of characters that matched the regular expression
+ m[0].matched |
+ m[n].first | For all integers n < m.size(), the start of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last.
+ m[n].second | For all integers n < m.size(), the end of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last.
+ m[n].matched | For all integers n < m.size(), true if sub-expression n participated in the match, false otherwise.
+
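The prefix, match and suffix entries in the table can be checked with a short sketch. It assumes std::regex and std::smatch as stand-ins for boost::regex and boost::match_results (the standard library gives the same guarantees for these entries); SearchParts, split_at_first_match and postconditions_hold are hypothetical names introduced only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the three pieces a successful search yields.
struct SearchParts
{
    std::string prefix;
    std::string match;
    std::string suffix;
    bool found;
};

// Splits s around the first match of e, using only the iterators that
// the postcondition table describes.
SearchParts split_at_first_match(const std::string& s, const std::regex& e)
{
    SearchParts parts = { "", "", "", false };
    std::smatch m;
    parts.found = std::regex_search(s.begin(), s.end(), m, e);
    if (parts.found)
    {
        parts.prefix.assign(m.prefix().first, m.prefix().second);
        parts.match.assign(m[0].first, m[0].second);
        parts.suffix.assign(m.suffix().first, m.suffix().second);
    }
    return parts;
}

// Returns true when a match exists and the iterator relationships from
// the table hold for it.
bool postconditions_hold(const std::string& s, const std::regex& e)
{
    std::smatch m;
    if (!std::regex_search(s.begin(), s.end(), m, e))
        return false;
    return !m.empty()
        && m.prefix().first == s.begin()
        && m.prefix().second == m[0].first
        && m.suffix().first == m[0].second
        && m.suffix().second == s.end()
        && m.prefix().matched == (m.prefix().first != m.prefix().second)
        && m.suffix().matched == (m.suffix().first != m.suffix().second);
}
```

Searching "abc123def" for "[0-9]+" gives the prefix "abc", the match "123" and the suffix "def".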
template <class charT, class Allocator, class traits, class Allocator2> +bool regex_search(const charT* str, match_results<const charT*, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(str, str +
+ char_traits<charT>::length(str), m, e, flags)
.
template <class ST, class SA, class Allocator, class charT, + class traits, class Allocator2> +bool regex_search(const basic_string<charT, ST, SA>& s, + match_results<typename basic_string<charT, ST, SA>::const_iterator, Allocator>& m, + const basic_regex<charT, traits, Allocator2>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(s.begin(), s.end(), m,
+ e, flags)
.
template <class BidirectionalIterator, class Allocator, class charT, + class traits> +bool regex_search(BidirectionalIterator first, BidirectionalIterator last, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default);+
Effects: Behaves "as if" by constructing an instance of
+ match_results<
BidirectionalIterator> what
,
+ and then returning the result of regex_search(first, last, what, e, flags)
.
template <class charT, class Allocator, class traits> +bool regex_search(const charT* str, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(str, str +
+ char_traits<charT>::length(str), e, flags)
.
template <class ST, class SA, class Allocator, class charT, + class traits> +bool regex_search(const basic_string<charT, ST, SA>& s, + const basic_regex<charT, traits, Allocator>& e, + match_flag_type flags = match_default);+
Effects: Returns the result of regex_search(s.begin(), s.end(), e,
+ flags)
.
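When only a yes/no answer is needed, the overloads without a match_results parameter are enough. The sketch below shows the three forms side by side, assuming std::regex as a stand-in for boost::regex; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Each helper asks the same question ("does the text contain a run of
// digits?") through a different overload that takes no match_results.

bool string_has_number(const std::string& s)
{
    static const std::regex number("[0-9]+");
    return std::regex_search(s, number);                    // basic_string overload
}

bool cstr_has_number(const char* str)
{
    static const std::regex number("[0-9]+");
    return std::regex_search(str, number);                  // const charT* overload
}

bool range_has_number(std::string::const_iterator first,
                      std::string::const_iterator last)
{
    static const std::regex number("[0-9]+");
    return std::regex_search(first, last, number);          // iterator-range overload
}
```

All three report the same result for the same text, since each forwards to the iterator-range search as described in the Effects clauses.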
+
The following example + takes the contents of a file in the form of a string and searches for all the + C++ class declarations in the file. The code will work regardless of the way + that std::string is implemented; for example, it could easily be modified to + work with the SGI rope class, which uses a non-contiguous storage strategy.
+
+#include <string>
+#include <map>
+#include <boost/regex.hpp>
+
+// purpose:
+// takes the contents of a file in the form of a string
+// and searches for all the C++ class definitions, storing
+// their locations in a map of strings/int's
+typedef std::map<std::string, int, std::less<std::string> > map_type;
+
+boost::regex expression("^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\\<\\w+\\>([[:blank:]]*\\([^)]*\\))?[[:space:]]*)*(\\<\\w*\\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\\{|:[^;\\{()]*\\{)");
+
+void IndexClasses(map_type& m, const std::string& file)
+{
+   std::string::const_iterator start, end;
+   start = file.begin();
+   end = file.end();
+   boost::match_results<std::string::const_iterator> what;
+   // the flags must have type match_flag_type, as in the synopsis above
+   boost::match_flag_type flags = boost::match_default;
+   while(boost::regex_search(start, end, what, expression, flags))
+   {
+      // what[0] contains the whole string
+      // what[5] contains the class name.
+      // what[6] contains the template specialisation if any.
+      // add class name and position to map:
+      m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
+            what[5].first - file.begin();
+      // update search position:
+      start = what[0].second;
+      // update flags:
+      flags |= boost::match_prev_avail;
+      flags |= boost::match_not_bob;
+   }
+}
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_split.html b/doc/regex_split.html new file mode 100644 index 00000000..e1eba954 --- /dev/null +++ b/doc/regex_split.html @@ -0,0 +1,148 @@ + + + ++
+ Boost.Regex
+ Algorithm regex_split (deprecated)
+
The algorithm regex_split has been deprecated in favor of the iterator + regex_token_iterator which has a more flexible and powerful interface, + as well as following the more usual standard library "pull" rather than "push" + semantics.
+Code which uses regex_split will continue to compile; the following + documentation is taken from the previous boost.regex version:
+#include <boost/regex.hpp>+
Algorithm regex_split performs a similar operation to the Perl split operation, + and comes in three overloaded forms: +
+template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2> +std::size_t regex_split(OutputIterator out, + std::basic_string<charT, Traits1, Alloc1>& s, + const basic_regex<charT, Traits2, Alloc2>& e, + unsigned flags, + std::size_t max_split); + +template <class OutputIterator, class charT, class Traits1, class Alloc1, class Traits2, class Alloc2> +std::size_t regex_split(OutputIterator out, + std::basic_string<charT, Traits1, Alloc1>& s, + const basic_regex<charT, Traits2, Alloc2>& e, + unsigned flags = match_default); + +template <class OutputIterator, class charT, class Traits1, class Alloc1> +std::size_t regex_split(OutputIterator out, + std::basic_string<charT, Traits1, Alloc1>& s);+
Effects: Each version of the algorithm takes an + output-iterator for output, and a string for input. If the expression contains + no marked sub-expressions, then the algorithm writes one string onto the + output-iterator for each section of input that does not match the expression. + If the expression does contain marked sub-expressions, then each time a match + is found, one string for each marked sub-expression will be written to the + output-iterator. No more than max_split strings will be written to the + output-iterator. Before returning, all the input processed will be deleted from + the string s (if max_split is not reached then all of s will + be deleted). Returns the number of strings written to the output-iterator. If + the parameter max_split is not specified then it defaults to UINT_MAX. + If no expression is specified, then it defaults to "\s+", and splitting occurs + on whitespace. +
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
Example: the + following function will split the input string into a series of tokens, and + remove each token from the string s: +
+unsigned tokenise(std::list<std::string>& l, std::string& s) +{ + return boost::regex_split(std::back_inserter(l), s); +}+
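Since regex_split is deprecated in favor of regex_token_iterator, the tokenise function above could be rewritten in the "pull" style roughly as follows. This is only a sketch: it uses std::sregex_token_iterator as a stand-in for the Boost iterator, and skipping empty fields and clearing s afterwards are choices made here to approximate the description above, not documented guarantees of either interface.

```cpp
#include <cstddef>
#include <list>
#include <regex>
#include <string>

// Pull-style analogue of tokenise(): split s on whitespace, append each
// token to l, then empty s (regex_split removes the input it processed).
std::size_t tokenise(std::list<std::string>& l, std::string& s)
{
    static const std::regex whitespace("\\s+");
    std::size_t count = 0;
    // Sub-expression index -1 selects the text between matches.
    std::sregex_token_iterator it(s.cbegin(), s.cend(), whitespace, -1);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
    {
        if (it->length() == 0)
            continue;           // assumption: drop the empty field before leading whitespace
        l.push_back(it->str());
        ++count;
    }
    s.erase();
    return count;
}
```

The iterators refer into s, so the string is only erased once the loop has finished.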
Example: the + following short program will extract all of the URLs from an HTML file, and + print them out to cout: +
+#include <list>
+#include <string>
+#include <iterator>
+#include <fstream>
+#include <iostream>
+#include <boost/regex.hpp>
+
+boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
+               boost::regbase::normal | boost::regbase::icase);
+
+void load_file(std::string& s, std::istream& is)
+{
+   s.erase();
+   //
+   // attempt to grow string buffer to match file size,
+   // this doesn't always work...
+   s.reserve(is.rdbuf()->in_avail());
+   char c;
+   while(is.get(c))
+   {
+      // use logarithmic growth strategy, in case
+      // in_avail (above) returned zero:
+      if(s.capacity() == s.size())
+         s.reserve(s.capacity() * 3);
+      s.append(1, c);
+   }
+}
+
+
+int main(int argc, char** argv)
+{
+   std::string s;
+   std::list<std::string> l;
+
+   for(int i = 1; i < argc; ++i)
+   {
+      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
+      s.erase();
+      std::ifstream is(argv[i]);
+      load_file(s, is);
+      boost::regex_split(std::back_inserter(l), s, e);
+      while(l.size())
+      {
+         s = *(l.begin());
+         l.pop_front();
+         std::cout << s << std::endl;
+      }
+   }
+   return 0;
+}
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_token_iterator.html b/doc/regex_token_iterator.html new file mode 100644 index 00000000..03e2e64e --- /dev/null +++ b/doc/regex_token_iterator.html @@ -0,0 +1,286 @@ + + + ++
+ Boost.Regex
+ regex_token_iterator
+
The template class regex_token_iterator
is an iterator adapter;
+ that is to say it represents a new view of an existing iterator sequence, by
+ enumerating all the occurrences of a regular expression within that sequence,
+ and presenting one or more new strings for each match found. Each position
+ enumerated by the iterator is a string that represents what matched a
+ particular sub-expression within the regular expression. When class regex_token_iterator
+ is used to enumerate a single sub-expression with index -1, then the iterator
+ performs field splitting: that is to say it enumerates one string for each
+ section of the character container sequence that does not match the regular
+ expression specified.
+template <class BidirectionalIterator, + class charT = iterator_traits<BidirectionalIterator>::value_type, + class traits = regex_traits<charT>, + class Allocator = allocator<charT> > +class regex_token_iterator +{ +public: + typedef basic_regex<charT, traits, Allocator> regex_type; + typedef basic_string<charT> value_type; + typedef typename iterator_traits<BidirectionalIterator>::difference_type difference_type; + typedef const value_type* pointer; + typedef const value_type& reference; + typedef std::forward_iterator_tag iterator_category; + + regex_token_iterator(); + regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + int submatch = 0, match_flag_type m = match_default); + regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const std::vector<int>& submatches, match_flag_type m = match_default); + template <std::size_t N> + regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const int (&submatches)[N], match_flag_type m = match_default); + regex_token_iterator(const regex_token_iterator&); + regex_token_iterator& operator=(const regex_token_iterator&); + bool operator==(const regex_token_iterator&); + bool operator!=(const regex_token_iterator&); + const value_type& operator*(); + const value_type* operator->(); + regex_token_iterator& operator++(); + regex_token_iterator operator++(int); +}; ++
regex_token_iterator();+
Effects: constructs an end of sequence iterator.
+regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + int submatch = 0, match_flag_type m = match_default);+
Preconditions: !re.empty()
.
Effects: constructs a regex_token_iterator that will enumerate one + string for each regular expression match of the expression re found + within the sequence [a,b), using match flags m. The + string enumerated is the sub-expression submatch for each match + found; if submatch is -1, then enumerates all the text sequences that + did not match the expression re (that is, it performs field splitting).
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
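The difference between a non-negative sub-expression index and -1 can be sketched as follows, assuming std::sregex_token_iterator as a stand-in for boost::regex_token_iterator<std::string::const_iterator>. The helper name collect is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gathers every string the iterator enumerates for
// the given sub-expression index (0 = whole match, n = sub-expression n,
// -1 = the text between matches, i.e. field splitting).
// The regex must outlive the iterator, so it is taken by reference.
std::vector<std::string> collect(const std::string& s,
                                 const std::regex& re,
                                 int submatch)
{
    std::vector<std::string> out;
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatch);
    std::sregex_token_iterator end;     // default-constructed: end of sequence
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

For the input "one, two,three" and the expression ",\s*", index -1 yields the fields "one", "two" and "three", while index 0 yields the separators ", " and ",".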
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const std::vector<int>& submatches, match_flag_type m = match_default);+
Preconditions: submatches.size() && !re.empty()
.
Effects: constructs a regex_token_iterator that will enumerate submatches.size() + strings for each regular expression match of the expression re found + within the sequence [a,b), using match flags m. For + each match found one string will be enumerated for each sub-expression + index contained within the submatches vector. If submatches[0] + is -1, then the first string enumerated for each match will be all of the text + from the end of the last match to the start of the current match; in addition, there + will be one extra string enumerated when no more matches can be found: from the + end of the last match found to the end of the underlying sequence.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
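A possible use of the vector constructor is to pull several sub-expressions out of each match in turn. The sketch below assumes std::sregex_token_iterator as a stand-in for the Boost iterator; parse_pairs and the "key=value" input format are hypothetical and chosen only for illustration.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extracts (key, value) pairs from text such as
// "a=1, b=22" by enumerating sub-expressions 1 and 2 of every match.
std::vector<std::pair<std::string, std::string> >
parse_pairs(const std::string& s)
{
    static const std::regex pair_re("(\\w+)=(\\d+)");
    const std::vector<int> submatches = { 1, 2 };

    std::vector<std::pair<std::string, std::string> > out;
    std::sregex_token_iterator it(s.begin(), s.end(), pair_re, submatches);
    std::sregex_token_iterator end;
    while (it != end)
    {
        // Two strings are enumerated per match, in the order given
        // by the submatches vector.
        const std::string key = it->str();
        ++it;
        const std::string value = it->str();
        ++it;
        out.push_back(std::make_pair(key, value));
    }
    return out;
}
```

Because exactly submatches.size() strings are produced for each match, the loop can safely advance the iterator twice per pass.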
template <std::size_t N> +regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re, + const int (&submatches)[N], match_flag_type m = match_default);+
Preconditions: !re.empty().
Effects: constructs a regex_token_iterator that will + enumerate N strings for each regular expression match of the + expression re found within the sequence [a,b), using match + flags m. For each match found one string will be + enumerated for each sub-expression index contained within the submatches + array. If submatches[0] is -1, then the first string enumerated + for each match will be all of the text from the end of the last match to the start + of the current match; in addition, there will be one extra string enumerated + when no more matches can be found: from the end of the last match found to the + end of the underlying sequence.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
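The behavior when submatches[0] is -1 can be seen in a short sketch that splits a string while keeping the separators. As before, std::sregex_token_iterator is assumed as a stand-in for the Boost iterator, and split_keeping_separators is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: enumerates, for every match, the text before it
// (index -1) followed by the match itself (index 0), plus one extra
// string for any text left after the last match.
std::vector<std::string> split_keeping_separators(const std::string& s,
                                                  const std::regex& re)
{
    const int submatches[] = { -1, 0 };   // array-literal constructor, N == 2
    std::vector<std::string> out;
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatches);
    std::sregex_token_iterator end;
    for (; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the expression "[0-9]+", the input "a1b22c" is enumerated as "a", "1", "b", "22", "c"; the final "c" is the extra string produced once no more matches can be found.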
regex_token_iterator(const regex_token_iterator& that);+
Effects: constructs a copy of that
.
Postconditions: *this == that
.
regex_token_iterator& operator=(const regex_token_iterator& that);+
Effects: sets *this
to be equal to that
.
Postconditions: *this == that
.
bool operator==(const regex_token_iterator& that);+
+ Effects: returns true if *this is the same position as that.
+bool operator!=(const regex_token_iterator& that);+
+ Effects: returns !(*this == that)
.
const value_type& operator*();+
+ Effects: returns the current string being enumerated.
+const value_type* operator->();+
+ Effects: returns &(*this)
.
regex_token_iterator& operator++();+
+ Effects: Moves on to the next string to be enumerated.
+Throws: std::runtime_error
if the complexity of
+ matching the expression against an N character string begins to exceed O(N^2),
+ or if the program runs out of stack space while matching the expression (if
+ Boost.regex is configured in recursive mode),
+ or if the matcher exhausts its permitted memory allocation (if Boost.regex is
+ configured in non-recursive mode).
+ Returns: *this
.
regex_token_iterator operator++(int);+
Effects: constructs a copy result of *this, then calls ++(*this).
+ Returns: result.
The following example + takes a string and splits it into a series of tokens:
+
+#include <iostream>
+#include <string>
+#include <boost/regex.hpp>
+
+using namespace std;
+
+int main(int argc, char**)
+{
+   string s;
+   do{
+      if(argc == 1)
+      {
+         cout << "Enter text to split (or \"quit\" to exit): ";
+         getline(cin, s);
+         if(s == "quit") break;
+      }
+      else
+         s = "This is a string of tokens";
+
+      boost::regex re("\\s+");
+      // sub-expression index -1: enumerate the text between matches
+      boost::regex_token_iterator<std::string::const_iterator> i(s.begin(), s.end(), re, -1);
+      boost::regex_token_iterator<std::string::const_iterator> j;
+
+      unsigned count = 0;
+      while(i != j)
+      {
+         cout << *i++ << endl;
+         count++;
+      }
+      cout << "There were " << count << " tokens found." << endl;
+
+   }while(argc == 1);
+   return 0;
+}
+
+
The following example + takes an HTML file and outputs a list of all the linked files:
+
+#include <fstream>
+#include <iostream>
+#include <iterator>
+#include <string>
+#include <boost/regex.hpp>
+
+boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
+               boost::regex::normal | boost::regbase::icase);
+
+void load_file(std::string& s, std::istream& is)
+{
+   s.erase();
+   //
+   // attempt to grow string buffer to match file size,
+   // this doesn't always work...
+   s.reserve(is.rdbuf()->in_avail());
+   char c;
+   while(is.get(c))
+   {
+      // use logarithmic growth strategy, in case
+      // in_avail (above) returned zero:
+      if(s.capacity() == s.size())
+         s.reserve(s.capacity() * 3);
+      s.append(1, c);
+   }
+}
+
+int main(int argc, char** argv)
+{
+   std::string s;
+   int i;
+   for(i = 1; i < argc; ++i)
+   {
+      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
+      s.erase();
+      std::ifstream is(argv[i]);
+      load_file(s, is);
+      // named it/end so that the loop counter i is not shadowed
+      boost::regex_token_iterator<std::string::const_iterator>
+         it(s.begin(), s.end(), e, 1);
+      boost::regex_token_iterator<std::string::const_iterator> end;
+      while(it != end)
+      {
+         std::cout << *it++ << std::endl;
+      }
+   }
+   //
+   // alternative method:
+   // test the array-literal constructor, and split out the whole
+   // match as well as $1....
+   //
+   for(i = 1; i < argc; ++i)
+   {
+      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
+      s.erase();
+      std::ifstream is(argv[i]);
+      load_file(s, is);
+      const int subs[] = {1, 0,};
+      boost::regex_token_iterator<std::string::const_iterator>
+         it(s.begin(), s.end(), e, subs);
+      boost::regex_token_iterator<std::string::const_iterator> end;
+      while(it != end)
+      {
+         std::cout << *it++ << std::endl;
+      }
+   }
+
+   return 0;
+}
+
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + diff --git a/doc/regex_traits.html b/doc/regex_traits.html new file mode 100644 index 00000000..a359e2e9 --- /dev/null +++ b/doc/regex_traits.html @@ -0,0 +1,48 @@ + + + ++
+ Boost.Regex
+ class regex_traits
+
Under construction.
+The current boost.regex traits class design will be migrated to that specified + in the regular + expression standardization proposal.
++
Revised + + 17 May 2003 + +
+© Copyright John Maddock 1998- 2003
+Permission to use, copy, modify, distribute and sell this software + and its documentation for any purpose is hereby granted without fee, provided + that the above copyright notice appear in all copies and that both that + copyright notice and this permission notice appear in supporting documentation. + Dr John Maddock makes no representations about the suitability of this software + for any purpose. It is provided "as is" without express or implied warranty.
+ + + diff --git a/example/Jamfile b/example/Jamfile index f57f7a32..1392b2f8 100644 --- a/example/Jamfile +++ b/example/Jamfile @@ -38,11 +38,14 @@ test-suite regex-examples : [ regex-test-run snippets/regex_grep_example_4.cpp : $(BOOST_ROOT)/boost/rational.hpp ] [ regex-test-run snippets/regex_match_example.cpp : -auto ] [ regex-test-run snippets/regex_merge_example.cpp : $(BOOST_ROOT)/boost/rational.hpp ] +[ regex-test-run snippets/regex_replace_example.cpp : $(BOOST_ROOT)/boost/rational.hpp ] [ regex-test-run snippets/regex_search_example.cpp : $(BOOST_ROOT)/boost/rational.hpp ] [ regex-test-run snippets/regex_split_example_1.cpp : -auto ] -[ regex-test-run snippets/regex_split_example_2.cpp : $(BOOST_ROOT)/libs/regex/index.htm ] +[ regex-test-run snippets/regex_split_example_2.cpp : $(BOOST_ROOT)/libs/regex/doc/index.html ] ; + + diff --git a/example/jgrep/jgrep.cpp b/example/jgrep/jgrep.cpp index 696a18b0..4e4caad2 100644 --- a/example/jgrep/jgrep.cpp +++ b/example/jgrep/jgrep.cpp @@ -19,6 +19,7 @@ */ #include