From 5a2904790698ae60d8a1c01018f0831fa4ea0e81 Mon Sep 17 00:00:00 2001
From: John Maddock
Effects: constructs an end of sequence iterator.
Preconditions:
Preconditions:
Effects: constructs a regex_token_iterator that will enumerate one
string for each regular expression match of the expression re found
within the sequence [a,b), using match flags m. The
@@ -99,7 +100,8 @@ typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_
configured in non-recursive mode).
Preconditions:
Preconditions:
Effects: constructs a regex_token_iterator that will enumerate submatches.size()
strings for each regular expression match of the expression re found
within the sequence [a,b), using match flags m. For
@@ -118,7 +120,8 @@ typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_
Preconditions:
Preconditions:
Effects: constructs a regex_token_iterator that will
enumerate R strings for each regular expression match of the
expression re found within the sequence [a,b), using match
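
The constructors above, and the requirement that re outlive the iterator, can be sketched with the standard library's std::sregex_token_iterator. This assumes the standard class behaves like the Boost one for this case; the helper name collect_tokens is hypothetical and not part of either library.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gathers every token the iterator produces.
// `re` is taken by reference and must outlive the iterator built from it,
// which is why the regex is never constructed as a temporary here.
std::vector<std::string> collect_tokens(const std::string& text,
                                        const std::regex& re,
                                        const std::vector<int>& submatches) {
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), re, submatches);
    const std::sregex_token_iterator end;  // default-constructed: end of sequence
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Called with a named regex for `(\w+)=(\d+)` and submatches {1, 2}, the helper yields submatches.size() strings per match, in order. Passing a temporary regex would leave the iterator referring to a destroyed object, which is the situation the added precondition rules out.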
diff --git a/doc/Attic/syntax.html b/doc/Attic/syntax.html
index a88b2a79..d7e048a8 100644
--- a/doc/Attic/syntax.html
+++ b/doc/Attic/syntax.html
@@ -91,18 +91,18 @@
Parentheses serve two purposes: to group items together into a sub-expression,
and to mark what generated the match. For example the expression "(ab)*" would
match all of the string "ababab". The matching algorithms
- regex_match and regex_search
- each take an instance of match_results
- that reports what caused the match, on exit from these functions the
- match_results contains information both on what the whole expression
- matched and on what each sub-expression matched. In the example above
- match_results[1] would contain a pair of iterators denoting the final "ab" of
- the matching string. It is permissible for sub-expressions to match null
- strings. If a sub-expression takes no part in a match - for example if it is
- part of an alternative that is not taken - then both of the iterators that are
- returned for that sub-expression point to the end of the input string, and the matched
- parameter for that sub-expression is false. Sub-expressions are indexed
- from left to right starting from 1, sub-expression 0 is the whole expression.
+ regex_match and regex_search each take
+ an instance of match_results that reports what
+ caused the match; on exit from these functions the match_results
+ contains information both on what the whole expression matched and on what each
+ sub-expression matched. In the example above match_results[1] would contain a
+ pair of iterators denoting the final "ab" of the matching string. It is
+ permissible for sub-expressions to match null strings. If a sub-expression
+ takes no part in a match - for example if it is part of an alternative that is
+ not taken - then both of the iterators that are returned for that
+ sub-expression point to the end of the input string, and the matched parameter
+ for that sub-expression is false. Sub-expressions are indexed from left
+ to right starting from 1; sub-expression 0 is the whole expression.
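
The sub-expression rules in this paragraph can be checked with a small sketch. It uses std::regex and std::smatch as stand-ins for the Boost types (an assumption; the Boost calls have the same shape), and the helper names submatch_text and submatch_offset are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Text captured by sub-expression `index` after matching all of `text`,
// or an empty string if the match failed or the sub-expression took no part.
std::string submatch_text(const std::string& pattern, const std::string& text,
                          std::size_t index) {
    const std::regex re(pattern);
    std::smatch what;
    if (!std::regex_match(text, what, re) || !what[index].matched) {
        return std::string();
    }
    return what[index].str();
}

// Offset of sub-expression `index` within `text`, or -1 if it did not match.
long submatch_offset(const std::string& pattern, const std::string& text,
                     std::size_t index) {
    const std::regex re(pattern);
    std::smatch what;
    if (!std::regex_match(text, what, re) || !what[index].matched) {
        return -1;
    }
    return static_cast<long>(what.position(index));
}
```

For "(ab)*" against "ababab", index 0 is the whole string and index 1 is the final "ab" at offset 4. For "(a)|(b)" against "b", sub-expression 1 belongs to the alternative that was not taken, so its matched flag is false and the helper reports -1.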
A set is a set of characters that can match any single character that is a
member of the set. Sets are delimited by "[" and "]" and can contain literals,
character ranges, character classes, collating elements and equivalence
- classes. Set declarations that start with "^" contain the compliment of the
+ classes. Set declarations that start with "^" contain the complement of the
elements that follow.
Examples:
@@ -293,7 +293,7 @@
[^[.ae.]] would only match one character.
- Equivalence classes take the general form[=tagname=] inside a set declaration,
+ Equivalence classes take the general form [=tagname=] inside a set declaration,
where tagname is either a single character, or a name of a collating
element, and matches any character that is a member of the same primary
equivalence class as the collating element [.tagname.]. An equivalence class is
@@ -302,7 +302,7 @@
typically collated by character, then by accent, and then by case; the primary
sort key then relates to the character, the secondary to the accentation, and
the tertiary to the case). If there is no equivalence class corresponding to tagname
- , then[=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
+ , then [=tagname=] is exactly the same as [.tagname.]. Unfortunately there is no
locale independent method of obtaining the primary sort key for a character,
except under Win32. For other operating systems the library will "guess" the
primary sort key from the full sort key (obtained from strxfrm), so
@@ -666,106 +666,103 @@
- When the expression is compiled as a Perl-compatible regex then the matching
- algorithms will perform a depth first search on the state machine and report
- the first match found.
Description
@@ -84,7 +84,8 @@ typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
int submatch = 0, match_flag_type m = match_default);
- !re.empty().
+ !re.empty(). Object re shall exist
+ for the lifetime of the iterator constructed from it.
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
const std::vector<int>& submatches, match_flag_type m = match_default);
- submatches.size() && !re.empty().
+ submatches.size() && !re.empty().
+ Object re shall exist for the lifetime of the iterator constructed from it.
template <std::size_t R>
regex_token_iterator(BidirectionalIterator a, BidirectionalIterator b, const regex_type& re,
const int (&submatches)[R], match_flag_type m = match_default);
- !re.empty().
+ !re.empty(). Object re shall exist
+ for the lifetime of the iterator constructed from it.
Non-Marking Parenthesis
@@ -143,7 +143,7 @@
What gets matched?
- When the expression is compiled as a POSIX-compatible regex then the matching
- algorithms will match the first possible matching string, if more than one
- string starting at a given location can match then it matches the longest
- possible string, unless the flag match_any is set, in which case the first
- match encountered is returned. Use of the match_any option can reduce the time
- taken to find the match - but is only useful if the user is less concerned
- about what matched - for example it would not be suitable for search and
- replace operations. In cases where their are multiple possible matches all
- starting at the same location, and all of the same length, then the match
- chosen is the one with the longest first sub-expression, if that is the same
- for two or more matches, then the second sub-expression will be examined and so
- on.
-
- The following table examples illustrate the main differences between Perl and
- POSIX regular expression matching rules:
+ When the expression is compiled as a POSIX-compatible regex then the matching
+ algorithms will match the first possible matching string, if more than one
+ string starting at a given location can match then it matches the longest
+ possible string, unless the flag match_any is set, in which case the first
+ match encountered is returned. Use of the match_any option can reduce the time
+ taken to find the match - but is only useful if the user is less concerned
+ about what matched - for example it would not be suitable for search and
+ replace operations. In cases where there are multiple possible matches all
+ starting at the same location, and all of the same length, then the match
+ chosen is the one with the longest first sub-expression, if that is the same
+ for two or more matches, then the second sub-expression will be examined and so
+ on.
- | Expression | Text | POSIX leftmost longest match | ECMAScript depth first search match |
- | | | | |
- | | | $0 = " abc def xyz " | $0 = " abc def xyz " |
- | | | | |
+ | Expression | Text | POSIX leftmost longest match | ECMAScript depth first search match |
+ | | | | |
+ | | | $0 = " abc def xyz " | $0 = " abc def xyz " |
+ | | | | |
These differences between Perl matching rules, and POSIX matching rules, mean
that these two regular expression syntaxes differ not only in the features
offered, but also in the form that the state machine takes and/or the
- algorithms used to traverse the state machine.
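
The difference between the two matching rules can be illustrated with a minimal sketch. It assumes std::regex, whose extended grammar is expected to follow the POSIX leftmost-longest rule and whose ECMAScript grammar performs the depth first search; the helper first_match and the sample pattern and text are illustrative choices, not taken from the patch.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// in `text` under the given grammar, or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern,
                        std::regex_constants::syntax_option_type grammar) {
    const std::regex re(pattern, grammar);
    std::smatch what;
    if (!std::regex_search(text, what, re)) {
        return std::string();
    }
    return what.str(0);
}
```

Searching "xaby" for "a|ab" with std::regex_constants::ECMAScript stops at the first alternative that succeeds and reports "a". With std::regex_constants::extended both alternatives start at the same position, so the leftmost-longest rule should select "ab".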
-Revised 24 Oct 2003
© Copyright John Maddock 1998- - - 2003
+ 2003 Use, modification and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)