Added better documentation on POSIX-conformance.

[SVN r28278]
John Maddock
2005-04-16 16:06:45 +00:00
parent 5d1f265345
commit 91b21e78ff
8 changed files with 98 additions and 28 deletions

@ -29,6 +29,9 @@
<LI> <LI>
Completely rewritten expression parsing code, and traits class support; now Completely rewritten expression parsing code, and traits class support; now
conforms to the standardization proposal. conforms to the standardization proposal.
<LI>
POSIX-extended and POSIX-basic regular expressions now enforce the letter of
the POSIX standard much more closely than before.
<LI> <LI>
Added <A href="syntax_perl.html#Perl">support for (?imsx-imsx) constructs</A>. Added <A href="syntax_perl.html#Perl">support for (?imsx-imsx) constructs</A>.
<LI> <LI>

@ -128,13 +128,16 @@ aaaa</PRE>
point of a range, for example: [[.ae.]-c] matches the character sequence "ae", point of a range, for example: [[.ae.]-c] matches the character sequence "ae",
plus any single character in the rangle "ae"-c, assuming that "ae" is treated plus any single character in the rangle "ae"-c, assuming that "ae" is treated
as a single collating element in the current locale.</P> as a single collating element in the current locale.</P>
<P>Collating elements may be used in place of escapes (which are not normally
allowed inside character sets), for example [[.^.]abc] would match either one
of the characters 'abc^'.</P>
<P>As an extension, a collating element may also be specified via its <A href="collating_names.html"> <P>As an extension, a collating element may also be specified via its <A href="collating_names.html">
symbolic name</A>, for example:</P> symbolic name</A>, for example:</P>
<P>[[.NUL.]]</P> <P>[[.NUL.]]</P>
<P>matches a NUL character.</P> <P>matches a NUL character.</P>
<H5>Equivalence classes:</H5> <H5>Equivalence classes:</H5>
<P> <P>
An expression of the form[[=col=]], matches any character or collating element An expression of theform[[=col=]], matches any character or collating element
whose primary sort key is the same as that for collating element <EM>col</EM>, whose primary sort key is the same as that for collating element <EM>col</EM>,
as with collating elements the name <EM>col</EM> may be a <A href="collating_names.html"> as with collating elements the name <EM>col</EM> may be a <A href="collating_names.html">
symbolic name</A>.&nbsp; A primary sort key is one that ignores case, symbolic name</A>.&nbsp; A primary sort key is one that ignores case,
@ -233,4 +236,3 @@ aaaa</PRE>
</I> </I>
</body> </body>
</html> </html>

@ -130,10 +130,11 @@ aaaa</PRE>
<P>For example [a-c] will match any single character in the range 'a' to <P>For example [a-c] will match any single character in the range 'a' to
'c'.&nbsp; By default, for POSIX-Extended regular expressions, a character <EM>x</EM> 'c'.&nbsp; By default, for POSIX-Extended regular expressions, a character <EM>x</EM>
is within the range <EM>y</EM> to <EM>z</EM>, if it collates within that is within the range <EM>y</EM> to <EM>z</EM>, if it collates within that
range;&nbsp;this results in locale specific behavior.&nbsp; This behavior can range;&nbsp;<EM><STRONG>this results in locale specific behavior</STRONG></EM> .&nbsp;
be turned off by unsetting the <EM><A href="syntax_option_type.html#extended">collate</A></EM> This behavior can be turned off by unsetting the <EM><A href="syntax_option_type.html#extended">
option flag - in which case whether a character appears within a range is collate</A></EM> option flag - in which case whether a character appears
determined by comparing the code points of the characters only</P> within a range is determined by comparing the code points of the characters
only.</P>
<H5>Negation:</H5> <H5>Negation:</H5>
<P>If the bracket-expression begins with the ^ character, then it matches the <P>If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example [^a-c] matches any complement of the characters it contains, for example [^a-c] matches any
@ -149,13 +150,16 @@ aaaa</PRE>
point of a range, for example: [[.ae.]-c] matches the character sequence "ae", point of a range, for example: [[.ae.]-c] matches the character sequence "ae",
plus any single character in the range "ae"-c, assuming that "ae" is treated as plus any single character in the range "ae"-c, assuming that "ae" is treated as
a single collating element in the current locale.</P> a single collating element in the current locale.</P>
<P>Collating elements may be used in place of escapes (which are not normally
allowed inside character sets), for example [[.^.]abc] would match either one
of the characters 'abc^'.</P>
<P>As an extension, a collating element may also be specified via its <A href="collating_names.html"> <P>As an extension, a collating element may also be specified via its <A href="collating_names.html">
symbolic name</A>, for example:</P> symbolic name</A>, for example:</P>
<P>[[.NUL.]]</P> <P>[[.NUL.]]</P>
<P>matches a NUL character.</P> <P>matches a NUL character.</P>
<H5>Equivalence classes:</H5> <H5>Equivalence classes:</H5>
<P> <P>
An expression of theform[[=col=]], matches any character or collating element An expression oftheform[[=col=]], matches any character or collating element
whose primary sort key is the same as that for collating element <EM>col</EM>, whose primary sort key is the same as that for collating element <EM>col</EM>,
as with colating elements the name <EM>col</EM> may be a <A href="collating_names.html"> as with colating elements the name <EM>col</EM> may be a <A href="collating_names.html">
symbolic name</A>.&nbsp; A primary sort key is one that ignores case, symbolic name</A>.&nbsp; A primary sort key is one that ignores case,
@ -177,9 +181,9 @@ aaaa</PRE>
<LI> <LI>
The effect of any ordinary character being preceded by an escape is undefined. The effect of any ordinary character being preceded by an escape is undefined.
<LI> <LI>
An escape inside a character class declaration shall match itself (in other An escape inside a character class declaration shall match itself: in other
words the escape character is not "special" inside a character class words the escape character is not "special" inside a character class
declaration).</LI></UL> declaration; so [\^] will match either a literal '\' or a '^'.</LI></UL>
<P>However, that's rather restrictive, so the following standard-compatible <P>However, that's rather restrictive, so the following standard-compatible
extensions are also supported by Boost.Regex:</P> extensions are also supported by Boost.Regex:</P>
<BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px"> <BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">

@ -153,7 +153,7 @@ static const syntax_option_type collate;
speed with which regular expressions are matched, and less to the speed with speed with which regular expressions are matched, and less to the speed with
which regular expression objects are constructed. Otherwise it has no which regular expression objects are constructed. Otherwise it has no
detectable effect on the program output.&nbsp; This currently has no effect for detectable effect on the program output.&nbsp; This currently has no effect for
boost.regex.</P> Boost.Regex.</P>
</TD> </TD>
</TR> </TR>
<TR> <TR>
@ -250,8 +250,9 @@ static const syntax_option_type collate;
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in <P>That is to say: the same as POSIX extended syntax, but with escape sequences in
character classes permitted.</P> character classes permitted.</P>
<P>In addition some perl-style escape sequences are supported (actually the awk <P>In addition some perl-style escape sequences are supported (actually the awk
syntax requires \a \b \t \v \f \n and \r to be recognised,&nbsp;but other syntax only requires \a \b \t \v \f \n and \r to be recognised,&nbsp;all other
escape sequences invoke undefined behavior according to the POSIX standard).</P> Perl-style escape sequences invoke undefined behavior according to the POSIX
standard, but are in fact recognised by Boost.Regex).</P>
</TD> </TD>
</TR> </TR>
</TABLE> </TABLE>
@ -297,7 +298,10 @@ static const syntax_option_type collate;
<TD>collate</TD> <TD>collate</TD>
<TD>Yes</TD> <TD>Yes</TD>
<TD> <TD>
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P> <P>Specifies that character ranges of the form "[a-b]" should be locale
sensitive.&nbsp; <STRONG>This bit is</STRONG> <STRONG>on by default</STRONG> for
POSIX-Extended regular expressions, but can be unset to force ranges to be
compared by code point only.</P>
</TD> </TD>
</TR> </TR>
<TR> <TR>
@ -307,6 +311,21 @@ static const syntax_option_type collate;
operator |.&nbsp; Allows newline separated lists to be used as a list of operator |.&nbsp; Allows newline separated lists to be used as a list of
alternatives.</TD> alternatives.</TD>
</TR> </TR>
<TR>
<TD>no_escape_in_lists</TD>
<TD>No</TD>
<TD>When set this makes the escape character ordinary inside lists, so that [\b]
would match either '\' or 'b'. <STRONG>This bit is on by default</STRONG> for
POSIX-Extended regular expressions, but can be unset to force escapes to be
recognised inside lists.</TD>
</TR>
<TR>
<TD>no_bk_refs</TD>
<TD>No</TD>
<TD>When set then backreferences are disabled.&nbsp; <STRONG>This bit is</STRONG> <STRONG>
on by default</STRONG> for POSIX-Extended regular expressions, but can be
unset to turn support for backreferences on.</TD>
</TR>
</TABLE> </TABLE>
</P> </P>
<H4><A name="basic"></A>Options for POSIX Basic Regular Expressions:</H4> <H4><A name="basic"></A>Options for POSIX Basic Regular Expressions:</H4>
@ -415,6 +434,14 @@ static const syntax_option_type collate;
<TD>No</TD> <TD>No</TD>
<TD>When set then character classes such as [[:alnum:]] are not allowed.</TD> <TD>When set then character classes such as [[:alnum:]] are not allowed.</TD>
</TR> </TR>
<TR>
<TD>no_escape_in_lists</TD>
<TD>No</TD>
<TD>When set this makes the escape character ordinary inside lists, so that [\b]
would match either '\' or 'b'. <STRONG>This bit is on by default</STRONG> for
POSIX-basic regular expressions, but can be unset to force escapes to be
recognised inside lists.</TD>
</TR>
<TR> <TR>
<TD>no_intervals</TD> <TD>no_intervals</TD>
<TD>No</TD> <TD>No</TD>
@ -492,4 +519,3 @@ static const syntax_option_type collate;
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P> or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body> </body>
</html> </html>

@ -29,6 +29,9 @@
<LI> <LI>
Completely rewritten expression parsing code, and traits class support; now Completely rewritten expression parsing code, and traits class support; now
conforms to the standardization proposal. conforms to the standardization proposal.
<LI>
POSIX-extended and POSIX-basic regular expressions now enforce the letter of
the POSIX standard much more closely than before.
<LI> <LI>
Added <A href="syntax_perl.html#Perl">support for (?imsx-imsx) constructs</A>. Added <A href="syntax_perl.html#Perl">support for (?imsx-imsx) constructs</A>.
<LI> <LI>

@ -128,13 +128,16 @@ aaaa</PRE>
point of a range, for example: [[.ae.]-c] matches the character sequence "ae", point of a range, for example: [[.ae.]-c] matches the character sequence "ae",
plus any single character in the rangle "ae"-c, assuming that "ae" is treated plus any single character in the rangle "ae"-c, assuming that "ae" is treated
as a single collating element in the current locale.</P> as a single collating element in the current locale.</P>
<P>Collating elements may be used in place of escapes (which are not normally
allowed inside character sets), for example [[.^.]abc] would match either one
of the characters 'abc^'.</P>
<P>As an extension, a collating element may also be specified via its <A href="collating_names.html"> <P>As an extension, a collating element may also be specified via its <A href="collating_names.html">
symbolic name</A>, for example:</P> symbolic name</A>, for example:</P>
<P>[[.NUL.]]</P> <P>[[.NUL.]]</P>
<P>matches a NUL character.</P> <P>matches a NUL character.</P>
<H5>Equivalence classes:</H5> <H5>Equivalence classes:</H5>
<P> <P>
An expression of the form[[=col=]], matches any character or collating element An expression of theform[[=col=]], matches any character or collating element
whose primary sort key is the same as that for collating element <EM>col</EM>, whose primary sort key is the same as that for collating element <EM>col</EM>,
as with collating elements the name <EM>col</EM> may be a <A href="collating_names.html"> as with collating elements the name <EM>col</EM> may be a <A href="collating_names.html">
symbolic name</A>.&nbsp; A primary sort key is one that ignores case, symbolic name</A>.&nbsp; A primary sort key is one that ignores case,
@ -233,4 +236,3 @@ aaaa</PRE>
</I> </I>
</body> </body>
</html> </html>

@ -130,10 +130,11 @@ aaaa</PRE>
<P>For example [a-c] will match any single character in the range 'a' to <P>For example [a-c] will match any single character in the range 'a' to
'c'.&nbsp; By default, for POSIX-Extended regular expressions, a character <EM>x</EM> 'c'.&nbsp; By default, for POSIX-Extended regular expressions, a character <EM>x</EM>
is within the range <EM>y</EM> to <EM>z</EM>, if it collates within that is within the range <EM>y</EM> to <EM>z</EM>, if it collates within that
range;&nbsp;this results in locale specific behavior.&nbsp; This behavior can range;&nbsp;<EM><STRONG>this results in locale specific behavior</STRONG></EM> .&nbsp;
be turned off by unsetting the <EM><A href="syntax_option_type.html#extended">collate</A></EM> This behavior can be turned off by unsetting the <EM><A href="syntax_option_type.html#extended">
option flag - in which case whether a character appears within a range is collate</A></EM> option flag - in which case whether a character appears
determined by comparing the code points of the characters only</P> within a range is determined by comparing the code points of the characters
only.</P>
<H5>Negation:</H5> <H5>Negation:</H5>
<P>If the bracket-expression begins with the ^ character, then it matches the <P>If the bracket-expression begins with the ^ character, then it matches the
complement of the characters it contains, for example [^a-c] matches any complement of the characters it contains, for example [^a-c] matches any
@ -149,13 +150,16 @@ aaaa</PRE>
point of a range, for example: [[.ae.]-c] matches the character sequence "ae", point of a range, for example: [[.ae.]-c] matches the character sequence "ae",
plus any single character in the range "ae"-c, assuming that "ae" is treated as plus any single character in the range "ae"-c, assuming that "ae" is treated as
a single collating element in the current locale.</P> a single collating element in the current locale.</P>
<P>Collating elements may be used in place of escapes (which are not normally
allowed inside character sets), for example [[.^.]abc] would match either one
of the characters 'abc^'.</P>
<P>As an extension, a collating element may also be specified via its <A href="collating_names.html"> <P>As an extension, a collating element may also be specified via its <A href="collating_names.html">
symbolic name</A>, for example:</P> symbolic name</A>, for example:</P>
<P>[[.NUL.]]</P> <P>[[.NUL.]]</P>
<P>matches a NUL character.</P> <P>matches a NUL character.</P>
<H5>Equivalence classes:</H5> <H5>Equivalence classes:</H5>
<P> <P>
An expression of theform[[=col=]], matches any character or collating element An expression oftheform[[=col=]], matches any character or collating element
whose primary sort key is the same as that for collating element <EM>col</EM>, whose primary sort key is the same as that for collating element <EM>col</EM>,
as with colating elements the name <EM>col</EM> may be a <A href="collating_names.html"> as with colating elements the name <EM>col</EM> may be a <A href="collating_names.html">
symbolic name</A>.&nbsp; A primary sort key is one that ignores case, symbolic name</A>.&nbsp; A primary sort key is one that ignores case,
@ -177,9 +181,9 @@ aaaa</PRE>
<LI> <LI>
The effect of any ordinary character being preceded by an escape is undefined. The effect of any ordinary character being preceded by an escape is undefined.
<LI> <LI>
An escape inside a character class declaration shall match itself (in other An escape inside a character class declaration shall match itself: in other
words the escape character is not "special" inside a character class words the escape character is not "special" inside a character class
declaration).</LI></UL> declaration; so [\^] will match either a literal '\' or a '^'.</LI></UL>
<P>However, that's rather restrictive, so the following standard-compatible <P>However, that's rather restrictive, so the following standard-compatible
extensions are also supported by Boost.Regex:</P> extensions are also supported by Boost.Regex:</P>
<BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px"> <BLOCKQUOTE dir="ltr" style="MARGIN-RIGHT: 0px">

@ -153,7 +153,7 @@ static const syntax_option_type collate;
speed with which regular expressions are matched, and less to the speed with speed with which regular expressions are matched, and less to the speed with
which regular expression objects are constructed. Otherwise it has no which regular expression objects are constructed. Otherwise it has no
detectable effect on the program output.&nbsp; This currently has no effect for detectable effect on the program output.&nbsp; This currently has no effect for
boost.regex.</P> Boost.Regex.</P>
</TD> </TD>
</TR> </TR>
<TR> <TR>
@ -250,8 +250,9 @@ static const syntax_option_type collate;
<P>That is to say: the same as POSIX extended syntax, but with escape sequences in <P>That is to say: the same as POSIX extended syntax, but with escape sequences in
character classes permitted.</P> character classes permitted.</P>
<P>In addition some perl-style escape sequences are supported (actually the awk <P>In addition some perl-style escape sequences are supported (actually the awk
syntax requires \a \b \t \v \f \n and \r to be recognised,&nbsp;but other syntax only requires \a \b \t \v \f \n and \r to be recognised,&nbsp;all other
escape sequences invoke undefined behavior according to the POSIX standard).</P> Perl-style escape sequences invoke undefined behavior according to the POSIX
standard, but are in fact recognised by Boost.Regex).</P>
</TD> </TD>
</TR> </TR>
</TABLE> </TABLE>
@ -297,7 +298,10 @@ static const syntax_option_type collate;
<TD>collate</TD> <TD>collate</TD>
<TD>Yes</TD> <TD>Yes</TD>
<TD> <TD>
<P>Specifies that character ranges of the form "[a-b]" should be locale sensitive.</P> <P>Specifies that character ranges of the form "[a-b]" should be locale
sensitive.&nbsp; <STRONG>This bit is</STRONG> <STRONG>on by default</STRONG> for
POSIX-Extended regular expressions, but can be unset to force ranges to be
compared by code point only.</P>
</TD> </TD>
</TR> </TR>
<TR> <TR>
@ -307,6 +311,21 @@ static const syntax_option_type collate;
operator |.&nbsp; Allows newline separated lists to be used as a list of operator |.&nbsp; Allows newline separated lists to be used as a list of
alternatives.</TD> alternatives.</TD>
</TR> </TR>
<TR>
<TD>no_escape_in_lists</TD>
<TD>No</TD>
<TD>When set this makes the escape character ordinary inside lists, so that [\b]
would match either '\' or 'b'. <STRONG>This bit is on by default</STRONG> for
POSIX-Extended regular expressions, but can be unset to force escapes to be
recognised inside lists.</TD>
</TR>
<TR>
<TD>no_bk_refs</TD>
<TD>No</TD>
<TD>When set then backreferences are disabled.&nbsp; <STRONG>This bit is</STRONG> <STRONG>
on by default</STRONG> for POSIX-Extended regular expressions, but can be
unset to turn support for backreferences on.</TD>
</TR>
</TABLE> </TABLE>
</P> </P>
<H4><A name="basic"></A>Options for POSIX Basic Regular Expressions:</H4> <H4><A name="basic"></A>Options for POSIX Basic Regular Expressions:</H4>
@ -415,6 +434,14 @@ static const syntax_option_type collate;
<TD>No</TD> <TD>No</TD>
<TD>When set then character classes such as [[:alnum:]] are not allowed.</TD> <TD>When set then character classes such as [[:alnum:]] are not allowed.</TD>
</TR> </TR>
<TR>
<TD>no_escape_in_lists</TD>
<TD>No</TD>
<TD>When set this makes the escape character ordinary inside lists, so that [\b]
would match either '\' or 'b'. <STRONG>This bit is on by default</STRONG> for
POSIX-basic regular expressions, but can be unset to force escapes to be
recognised inside lists.</TD>
</TR>
<TR> <TR>
<TD>no_intervals</TD> <TD>no_intervals</TD>
<TD>No</TD> <TD>No</TD>
@ -492,4 +519,3 @@ static const syntax_option_type collate;
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P> or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body> </body>
</html> </html>
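
The rule documented above, that an escape inside a bracket expression is an ordinary character under the POSIX grammars, can be illustrated with a small sketch. It deliberately uses std::regex from the C++ standard library rather than Boost.Regex, so it needs no Boost installation. That is a substitution, not the code touched by this commit: the Boost-specific no_escape_in_lists flag has no std::regex counterpart, and the helper names matches and check_posix_bracket_escape are hypothetical. The behaviour shown is what the common standard library implementations do for the extended grammar and could differ elsewhere.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`
// under the given grammar.
bool matches(const std::string& pattern, const std::string& text,
             std::regex::flag_type grammar) {
    const std::regex re(pattern, grammar);
    return std::regex_match(text, re);
}

// Hypothetical demo of the "escape is ordinary inside a list" rule.
void check_posix_bracket_escape() {
    // POSIX-extended: the backslash in [\^] is an ordinary list member,
    // so the list contains two characters, '\' and '^'.
    assert(matches("[\\^]", "\\", std::regex::extended));
    assert(matches("[\\^]", "^", std::regex::extended));
    assert(!matches("[\\^]", "a", std::regex::extended));

    // Likewise [\b] contains '\' and 'b' rather than a backspace.
    assert(matches("[\\b]", "\\", std::regex::extended));
    assert(matches("[\\b]", "b", std::regex::extended));

    // For contrast, the ECMAScript grammar treats \^ inside a list as an
    // escaped '^', so a lone backslash does not match.
    assert(matches("[\\^]", "^", std::regex::ECMAScript));
    assert(!matches("[\\^]", "\\", std::regex::ECMAScript));
}
```

Calling check_posix_bracket_escape() from a main function exercises every case. With Boost.Regex the same patterns would be compiled with boost::regex::extended, and, per the option table in this diff, unsetting no_escape_in_lists is the documented way to have escapes recognised inside lists again.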