From 91b21e78ffa883f45cb0cc42be9fd59ea642cba1 Mon Sep 17 00:00:00 2001 From: John Maddock Date: Sat, 16 Apr 2005 16:06:45 +0000 Subject: [PATCH] Added better documentation on POSIX-conformance. [SVN r28278] --- doc/Attic/history.html | 3 +++ doc/Attic/syntax_basic.html | 6 ++++-- doc/Attic/syntax_extended.html | 18 ++++++++++------ doc/Attic/syntax_option_type.html | 36 ++++++++++++++++++++++++++----- doc/history.html | 3 +++ doc/syntax_basic.html | 6 ++++-- doc/syntax_extended.html | 18 ++++++++++------ doc/syntax_option_type.html | 36 ++++++++++++++++++++++++++----- 8 files changed, 98 insertions(+), 28 deletions(-) diff --git a/doc/Attic/history.html b/doc/Attic/history.html index 6f3618e5..c3a4907b 100644 --- a/doc/Attic/history.html +++ b/doc/Attic/history.html @@ -29,6 +29,9 @@
  • Completely rewritten expression parsing code, and traits class support; now conforms to the standardization proposal. +
  • + POSIX-extended and POSIX-basic regular expressions now enforce the letter of + the POSIX standard much more closely than before.
  • Added support for (?imsx-imsx) constructs.
  • diff --git a/doc/Attic/syntax_basic.html b/doc/Attic/syntax_basic.html index cce1434b..f781948c 100644 --- a/doc/Attic/syntax_basic.html +++ b/doc/Attic/syntax_basic.html @@ -128,13 +128,16 @@ aaaa point of a range, for example: [[.ae.]-c] matches the character sequence "ae", plus any single character in the rangle "ae"-c, assuming that "ae" is treated as a single collating element in the current locale.

    +

    Collating elements may be used in place of escapes (which are not normally + allowed inside character sets), for example [[.^.]abc] would match any one + of the characters 'abc^'.

    As an extension, a collating element may also be specified via its symbolic name, for example:

    [[.NUL.]]

    matches a NUL character.
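Here is a minimal sketch of a symbolic collating name. It uses the C++11 std::regex library, not Boost.Regex, and assumes that the standard library's regex traits recognise the POSIX name "NUL" (libstdc++ does). The helper name is_single_nul is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is exactly one NUL character.
// "[[.NUL.]]" names the NUL collating element by its symbolic name.
bool is_single_nul(const std::string& s) {
    static const std::regex re("[[.NUL.]]", std::regex::extended);
    return std::regex_match(s, re);
}
```

The pattern text contains no actual NUL byte, which is the practical benefit of the symbolic name.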

    Equivalence classes:

    - An expression of the form [[=col=]], matches any character or collating element + An expression of the form [[=col=]] matches any character or collating element whose primary sort key is the same as that for collating element col, as with collating elements the name col may be a symbolic name.  A primary sort key is one that ignores case, @@ -233,4 +236,3 @@ aaaa - diff --git a/doc/Attic/syntax_extended.html b/doc/Attic/syntax_extended.html index bfba568a..d9253166 100644 --- a/doc/Attic/syntax_extended.html +++ b/doc/Attic/syntax_extended.html @@ -130,10 +130,11 @@ aaaa

    For example [a-c] will match any single character in the range 'a' to 'c'.  By default, for POSIX-Extended regular expressions, a character x is within the range y to z, if it collates within that - range; this results in locale specific behavior.  This behavior can - be turned off by unsetting the collate - option flag - in which case whether a character appears within a range is - determined by comparing the code points of the characters only

    + range; this results in locale-specific behavior.  + This behavior can be turned off by unsetting the + collate option flag, in which case whether a character appears + within a range is determined by comparing the code points of the characters + only.
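The two range tests can be sketched without any regex library. Both helpers below are hypothetical and simplified. They compare single characters, whereas a real engine compares collating elements. The collation version relies on std::collate<char>::compare from <locale>.

```cpp
#include <locale>

// Hypothetical helper: range test by code point (collate option unset).
// Characters are compared as unsigned char so bytes >= 0x80 sort after ASCII.
bool in_range_by_code_point(char c, char lo, char hi) {
    const auto u = static_cast<unsigned char>(c);
    return static_cast<unsigned char>(lo) <= u && u <= static_cast<unsigned char>(hi);
}

// Hypothetical helper: range test by collation order in `loc`
// (the POSIX-extended default, simplified to single characters).
bool in_range_by_collation(char c, char lo, char hi, const std::locale& loc) {
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    // collate::compare returns a negative, zero or positive value, like strcmp.
    auto cmp = [&coll](char a, char b) {
        return coll.compare(&a, &a + 1, &b, &b + 1);
    };
    return cmp(lo, c) <= 0 && cmp(c, hi) <= 0;
}
```

In the classic "C" locale both functions agree, because collation there follows byte values. Under a locale such as en_US.UTF-8 (if installed), collation usually interleaves upper and lower case, so 'B' may fall inside a-c.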

    Negation:

    If the bracket-expression begins with the ^ character, then it matches the complement of the characters it contains, for example [^a-c] matches any @@ -149,13 +150,16 @@ aaaa point of a range, for example: [[.ae.]-c] matches the character sequence "ae", plus any single character in the range "ae"-c, assuming that "ae" is treated as a single collating element in the current locale.

    +

    Collating elements may be used in place of escapes (which are not normally + allowed inside character sets), for example [[.^.]abc] would match any one + of the characters 'abc^'.

    As an extension, a collating element may also be specified via its symbolic name, for example:

    [[.NUL.]]

    matches a NUL character.

    Equivalence classes:

    - An expression of the form [[=col=]], matches any character or collating element + An expression of the form [[=col=]] matches any character or collating element whose primary sort key is the same as that for collating element col, as with colating elements the name col may be a symbolic name.  A primary sort key is one that ignores case, @@ -177,9 +181,9 @@ aaaa

  • The effect of any ordinary character being preceded by an escape is undefined.
  • - An escape inside a character class declaration shall match itself (in other + An escape inside a character class declaration shall match itself: in other words the escape character is not "special" inside a character class - declaration).
  • + declaration; so [\^] will match either a literal '\' or a '^'.
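The rule that '\' is ordinary inside a POSIX-extended list can be checked with std::regex. This is standard C++, not Boost.Regex, and it assumes that the library's std::regex::extended grammar follows POSIX on this point (libstdc++ does). The default ECMAScript grammar is included for contrast. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// POSIX-extended: '\' is ordinary inside a list, so "[\^]" has two
// members, '\' and '^'.
bool matches_extended_list(const std::string& s) {
    static const std::regex re(R"([\^])", std::regex::extended);
    return std::regex_match(s, re);
}

// ECMAScript (std::regex's default grammar): "\^" is an escape, so the
// same list has the single member '^'.
bool matches_ecmascript_list(const std::string& s) {
    static const std::regex re(R"([\^])", std::regex::ECMAScript);
    return std::regex_match(s, re);
}
```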

    However, that's rather restrictive, so the following standard-compatible extensions are also supported by Boost.Regex:

    diff --git a/doc/Attic/syntax_option_type.html b/doc/Attic/syntax_option_type.html index 7ea438d7..e346af32 100644 --- a/doc/Attic/syntax_option_type.html +++ b/doc/Attic/syntax_option_type.html @@ -153,7 +153,7 @@ static const syntax_option_type collate; speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output.  This currently has no effect for - boost.regex.

    + Boost.Regex.

    @@ -250,8 +250,9 @@ static const syntax_option_type collate;

    That is to say: the same as POSIX extended syntax, but with escape sequences in character classes permitted.

    In addition some perl-style escape sequences are supported (actually the awk - syntax requires \a \b \t \v \f \n and \r to be recognised, but other - escape sequences invoke undefined behavior according to the POSIX standard).

    + syntax only requires \a \b \t \v \f \n and \r to be recognised; all other + Perl-style escape sequences invoke undefined behavior according to the POSIX + standard, but are in fact recognised by Boost.Regex).
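Here is a minimal sketch of the awk escapes. It uses std::regex's awk grammar, which is standard C++ rather than Boost.Regex. The pattern recognises \t outside a list and \n inside one. The helper name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper. Under the awk grammar "\t" outside a list means a tab,
// and "\n" inside a list means a newline.
bool is_a_tab_b_newline(const std::string& s) {
    static const std::regex re(R"(a\tb[\n])", std::regex::awk);
    return std::regex_match(s, re);
}
```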

    @@ -297,7 +298,10 @@ static const syntax_option_type collate; collate Yes -

    Specifies that character ranges of the form "[a-b]" should be locale sensitive.

    +

    Specifies that character ranges of the form "[a-b]" should be locale + sensitive.  This bit is on by default for + POSIX-Extended regular expressions, but can be unset to force ranges to be + compared by code point only.
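Passing a collate-style option can be sketched with std::regex, where std::regex::collate plays the analogous role. This is an assumption-labelled analogue in standard C++, not Boost.Regex's own flag. The sketch assumes the program's global locale is still the default "C" locale, in which both versions give the same answers because collation follows code points. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// With std::regex::collate, the range endpoints and the candidate character
// are compared through the regex traits' collation transform for the locale.
bool in_a_to_c_collating(const std::string& s) {
    static const std::regex re("[a-c]", std::regex::extended | std::regex::collate);
    return std::regex_match(s, re);
}

// Without the flag, the range is tested on character values alone.
bool in_a_to_c_code_point(const std::string& s) {
    static const std::regex re("[a-c]", std::regex::extended);
    return std::regex_match(s, re);
}
```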

    @@ -307,6 +311,21 @@ static const syntax_option_type collate; operator |.  Allows newline separated lists to be used as a list of alternatives. + + no_escape_in_lists + No + When set, this makes the escape character ordinary inside lists, so that [\b] + would match either '\' or 'b'. This bit is on by default for + POSIX-Extended regular expressions, but can be unset to force escapes to be + recognised inside lists. + + + no_bk_refs + No + When set, backreferences are disabled.  This bit is + on by default for POSIX-Extended regular expressions, but can be + unset to turn support for backreferences on. +
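This sketch shows what a back-reference is, using a POSIX-basic pattern in std::regex. It is standard C++, not Boost.Regex, so the Boost-specific no_bk_refs flag does not appear. In this syntax \( \) captures a group, and \1 must repeat the captured text exactly. The helper name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` is some lowercase string written twice.
bool is_doubled(const std::string& s) {
    static const std::regex re(R"(\([a-z]*\)\1)", std::regex::basic);
    return std::regex_match(s, re);
}
```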

    Options for POSIX Basic Regular Expressions:

    @@ -415,6 +434,14 @@ static const syntax_option_type collate; No When set then character classes such as [[:alnum:]] are not allowed. + + no_escape_in_lists + No + When set, this makes the escape character ordinary inside lists, so that [\b] + would match either '\' or 'b'. This bit is on by default for + POSIX-basic regular expressions, but can be unset to force escapes to be + recognised inside lists. + no_intervals No @@ -492,4 +519,3 @@ static const syntax_option_type collate; or copy at http://www.boost.org/LICENSE_1_0.txt)

    - diff --git a/doc/history.html b/doc/history.html index 6f3618e5..c3a4907b 100644 --- a/doc/history.html +++ b/doc/history.html @@ -29,6 +29,9 @@
  • Completely rewritten expression parsing code, and traits class support; now conforms to the standardization proposal. +
  • + POSIX-extended and POSIX-basic regular expressions now enforce the letter of + the POSIX standard much more closely than before.
  • Added support for (?imsx-imsx) constructs.
  • diff --git a/doc/syntax_basic.html b/doc/syntax_basic.html index cce1434b..f781948c 100644 --- a/doc/syntax_basic.html +++ b/doc/syntax_basic.html @@ -128,13 +128,16 @@ aaaa point of a range, for example: [[.ae.]-c] matches the character sequence "ae", plus any single character in the rangle "ae"-c, assuming that "ae" is treated as a single collating element in the current locale.

    +

    Collating elements may be used in place of escapes (which are not normally + allowed inside character sets), for example [[.^.]abc] would match any one + of the characters 'abc^'.

    As an extension, a collating element may also be specified via its symbolic name, for example:

    [[.NUL.]]

    matches a NUL character.

    Equivalence classes:

    - An expression of the form [[=col=]], matches any character or collating element + An expression of the form [[=col=]] matches any character or collating element whose primary sort key is the same as that for collating element col, as with collating elements the name col may be a symbolic name.  A primary sort key is one that ignores case, @@ -233,4 +236,3 @@ aaaa - diff --git a/doc/syntax_extended.html b/doc/syntax_extended.html index bfba568a..d9253166 100644 --- a/doc/syntax_extended.html +++ b/doc/syntax_extended.html @@ -130,10 +130,11 @@ aaaa

    For example [a-c] will match any single character in the range 'a' to 'c'.  By default, for POSIX-Extended regular expressions, a character x is within the range y to z, if it collates within that - range; this results in locale specific behavior.  This behavior can - be turned off by unsetting the collate - option flag - in which case whether a character appears within a range is - determined by comparing the code points of the characters only

    + range; this results in locale-specific behavior.  + This behavior can be turned off by unsetting the + collate option flag, in which case whether a character appears + within a range is determined by comparing the code points of the characters + only.

    Negation:

    If the bracket-expression begins with the ^ character, then it matches the complement of the characters it contains, for example [^a-c] matches any @@ -149,13 +150,16 @@ aaaa point of a range, for example: [[.ae.]-c] matches the character sequence "ae", plus any single character in the range "ae"-c, assuming that "ae" is treated as a single collating element in the current locale.

    +

    Collating elements may be used in place of escapes (which are not normally + allowed inside character sets), for example [[.^.]abc] would match either one + of the characters 'abc^'.

    As an extension, a collating element may also be specified via its symbolic name, for example:

    [[.NUL.]]

    matches a NUL character.

    Equivalence classes:

    - An expression of the form [[=col=]], matches any character or collating element + An expression of the form [[=col=]] matches any character or collating element whose primary sort key is the same as that for collating element col, as with colating elements the name col may be a symbolic name.  A primary sort key is one that ignores case, @@ -177,9 +181,9 @@ aaaa

  • The effect of any ordinary character being preceded by an escape is undefined.
  • - An escape inside a character class declaration shall match itself (in other + An escape inside a character class declaration shall match itself: in other words the escape character is not "special" inside a character class - declaration).
  • + declaration; so [\^] will match either a literal '\' or a '^'.
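The same list text can mean different things under different grammars. This sketch uses std::regex rather than Boost.Regex and compiles [\t] as POSIX-extended, POSIX-basic and awk. Only awk reads \t as a tab escape inside the list. The sketch assumes a standard library whose POSIX grammars leave '\' ordinary inside lists (libstdc++ does). The helper name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: compiles the list "[\t]" under `grammar` and
// reports whether it matches the whole of `s`.
bool list_matches(std::regex::flag_type grammar, const std::string& s) {
    const std::regex re(R"([\t])", grammar);
    return std::regex_match(s, re);
}
```

For example, list_matches(std::regex::extended, "t") and list_matches(std::regex::awk, "\t") both hold, but the two grammars do not accept each other's input.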

    However, that's rather restrictive, so the following standard-compatible extensions are also supported by Boost.Regex:

    diff --git a/doc/syntax_option_type.html b/doc/syntax_option_type.html index 7ea438d7..e346af32 100644 --- a/doc/syntax_option_type.html +++ b/doc/syntax_option_type.html @@ -153,7 +153,7 @@ static const syntax_option_type collate; speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output.  This currently has no effect for - boost.regex.

    + Boost.Regex.

    @@ -250,8 +250,9 @@ static const syntax_option_type collate;

    That is to say: the same as POSIX extended syntax, but with escape sequences in character classes permitted.

    In addition some perl-style escape sequences are supported (actually the awk - syntax requires \a \b \t \v \f \n and \r to be recognised, but other - escape sequences invoke undefined behavior according to the POSIX standard).

    + syntax only requires \a \b \t \v \f \n and \r to be recognised; all other + Perl-style escape sequences invoke undefined behavior according to the POSIX + standard, but are in fact recognised by Boost.Regex).

    @@ -297,7 +298,10 @@ static const syntax_option_type collate; collate Yes -

    Specifies that character ranges of the form "[a-b]" should be locale sensitive.

    +

    Specifies that character ranges of the form "[a-b]" should be locale + sensitive.  This bit is on by default for + POSIX-Extended regular expressions, but can be unset to force ranges to be + compared by code point only.

    @@ -307,6 +311,21 @@ static const syntax_option_type collate; operator |.  Allows newline separated lists to be used as a list of alternatives. + + no_escape_in_lists + No + When set, this makes the escape character ordinary inside lists, so that [\b] + would match either '\' or 'b'. This bit is on by default for + POSIX-Extended regular expressions, but can be unset to force escapes to be + recognised inside lists. + + + no_bk_refs + No + When set, backreferences are disabled.  This bit is + on by default for POSIX-Extended regular expressions, but can be + unset to turn support for backreferences on. +
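The no_escape_in_lists option only changes how a '\' inside [ ] is read. The hypothetical helper list_members below sketches that decision for simple lists. It is not Boost.Regex code, and its escapes-recognised branch is deliberately simplified.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical, simplified helper: returns the members of a bracket list
// given its body (the text between '[' and ']'). Ranges, classes, collating
// elements and a leading '^' are not handled.
//
// escape_is_ordinary == true models no_escape_in_lists being set: '\' is
// just another member. When false, this sketch simply lets '\' quote the
// next character; it does not model escape sequences such as \t.
std::set<char> list_members(const std::string& body, bool escape_is_ordinary) {
    std::set<char> members;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (!escape_is_ordinary && body[i] == '\\' && i + 1 < body.size()) {
            ++i;  // drop the escape, keep the character it quotes
        }
        members.insert(body[i]);
    }
    return members;
}
```

With the flag modelled as set, list_members("\\b", true) yields the two members '\' and 'b', as the table describes for [\b].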

    Options for POSIX Basic Regular Expressions:

    @@ -415,6 +434,14 @@ static const syntax_option_type collate; No When set then character classes such as [[:alnum:]] are not allowed. + + no_escape_in_lists + No + When set, this makes the escape character ordinary inside lists, so that [\b] + would match either '\' or 'b'. This bit is on by default for + POSIX-basic regular expressions, but can be unset to force escapes to be + recognised inside lists. + no_intervals No @@ -492,4 +519,3 @@ static const syntax_option_type collate; or copy at http://www.boost.org/LICENSE_1_0.txt)

    -