mirror of
https://github.com/boostorg/regex.git
synced 2025-07-15 21:32:18 +02:00
Updated Faq and acknowledgements
[SVN r10006]
2524
appendix.htm
180
faq.htm
@ -1,28 +1,20 @@
|
|||||||
<!DOCTYPE HTML PUBLIC "-//w3c//dtd html 4.0 transitional//en">
|
|
||||||
|
|
||||||
<HTML>
|
<HTML>
|
||||||
|
|
||||||
<HEAD>
|
<HEAD>
|
||||||
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
|
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
|
||||||
<META NAME="Template"
|
<META NAME="Generator" CONTENT="Microsoft Word 97">
|
||||||
CONTENT="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
|
||||||
<META NAME="GENERATOR" CONTENT="Mozilla/4.5 [en] (Win98; I) [Netscape]">
|
|
||||||
<TITLE>Regex++ - FAQ</TITLE>
|
<TITLE>Regex++ - FAQ</TITLE>
|
||||||
|
<META NAME="Template" CONTENT="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
|
||||||
</HEAD>
|
</HEAD>
|
||||||
|
<BODY LINK="#0000ff" VLINK="#800080" BGCOLOR="#ffffff">
|
||||||
|
|
||||||
<BODY BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#800080">
|
<P><!DOCTYPE HTML PUBLIC "-//w3c//dtd html 4.0 transitional//en"> </P>
|
||||||
<TABLE BORDER="0" CELLSPACING="0" CELLPADDING="7" WIDTH="100%">
|
<TABLE CELLSPACING=0 BORDER=0 CELLPADDING=7 WIDTH=624>
|
||||||
<TR>
|
<TR><TD WIDTH="50%" VALIGN="TOP">
|
||||||
<TD VALIGN="TOP" WIDTH="50%"> <H3>
|
<H3><IMG SRC="../../c++boost.gif" WIDTH=276 HEIGHT=86 ALT="C++ Boost"></H3></TD>
|
||||||
<IMG SRC="../../c++boost.gif" HEIGHT="86" WIDTH="276" ALT="C++ Boost"></H3>
|
<TD WIDTH="50%" VALIGN="TOP">
|
||||||
</TD>
|
<H3 ALIGN="CENTER">Regex++, FAQ.</H3>
|
||||||
<TD VALIGN="TOP" WIDTH="50%"> <CENTER>
|
<I><P ALIGN="CENTER">(version 3.10, 18 April 2000)</I> </P>
|
||||||
<H3> Regex++, FAQ.</H3>
|
<I><PRE>Copyright (c) 1998-2000
|
||||||
</CENTER>
|
|
||||||
<CENTER>
|
|
||||||
<I>(version 3.10, 18 April 2000)</I>
|
|
||||||
</CENTER>
|
|
||||||
<PRE><I>Copyright (c) 1998-2000
|
|
||||||
Dr John Maddock
|
Dr John Maddock
|
||||||
|
|
||||||
Permission to use, copy, modify, distribute and sell this software
|
Permission to use, copy, modify, distribute and sell this software
|
||||||
@ -31,111 +23,49 @@ provided that the above copyright notice appear in all copies and
|
|||||||
that both that copyright notice and this permission notice appear
|
that both that copyright notice and this permission notice appear
|
||||||
in supporting documentation. Dr John Maddock makes no representations
|
in supporting documentation. Dr John Maddock makes no representations
|
||||||
about the suitability of this software for any purpose.
|
about the suitability of this software for any purpose.
|
||||||
It is provided "as is" without express or implied warranty.</I></PRE>
|
It is provided "as is" without express or implied warranty.</PRE></I></TD>
|
||||||
|
|
||||||
</TD>
|
|
||||||
</TR>
|
</TR>
|
||||||
</TABLE>
|
</TABLE>
|
||||||
<P><FONT COLOR="#FF0000">Q. Configure says that my compiler is unable to merge
|
|
||||||
template instances, what does this mean?</FONT> </P>
|
|
||||||
<P>A. When you compile template code, you can end up with the same template
|
|
||||||
instances in multiple translation units - this will lead to link time errors
|
|
||||||
unless your compiler/linker is smart enough to merge these template instances
|
|
||||||
into a single record in the executable file. If you see this warning after
|
|
||||||
running configure, then you can still link to libregex++.a if: </P>
|
|
||||||
<OL>
|
|
||||||
<LI> You use only the low-level template classes (reg_expression<>
|
|
||||||
match_results<> etc), from a single translation unit, and use no other part
|
|
||||||
of regex++.</LI>
|
|
||||||
<LI> You use only the POSIX API functions (regcomp regexec etc), and no other
|
|
||||||
part of regex++.</LI>
|
|
||||||
<LI> You use only the high level class RegEx, and no other part of regex++.
|
|
||||||
</LI>
|
|
||||||
</OL>
|
|
||||||
Another option is to create a master include file, which #include's all the
|
|
||||||
regex++ source files, and all the source files in which you use regex++. You
|
|
||||||
then compile and link this master file as a single translation unit. <P><FONT
|
|
||||||
COLOR="#FF0000">Q. Configure says that my compiler is unable to merge template
|
|
||||||
instances from archive files, what does this mean?</FONT> </P>
|
|
||||||
<P>A. When you compile template code, you can end up with the same template
|
|
||||||
instances in multiple translation units - this will lead to link time errors
|
|
||||||
unless your compiler/linker is smart enough to merge these template instances
|
|
||||||
into a single record in the executable file. Some compilers are able to do this
|
|
||||||
for normal .cpp or .o files, but fail if the object file has been placed in a
|
|
||||||
library archive. If you see this warning after running configure, then you can
|
|
||||||
still link to libregex++.a if: </P>
|
|
||||||
<OL>
|
|
||||||
<LI> You use only the low-level template classes (reg_expression<>
|
|
||||||
match_results<> etc), and use no other part of regex++.</LI>
|
|
||||||
<LI> You use only the POSIX API functions (regcomp regexec etc), and no other
|
|
||||||
part of regex++.</LI>
|
|
||||||
<LI> You use only the high level class RegEx, and no other part of regex++.
|
|
||||||
</LI>
|
|
||||||
</OL>
|
|
||||||
Another option is to add the regex++ source files directly to your project
|
|
||||||
instead of linking to libregex++.a, generally you should do this only if you
|
|
||||||
are getting link time errors with libregex++.a. <P><FONT COLOR="#FF0000">Q.
|
|
||||||
Configure says that my compiler can't merge templates containing switch
|
|
||||||
statements, what does this mean?</FONT> </P>
|
|
||||||
<P>A. Some compilers can't merge templates that contain static data - this
|
|
||||||
includes switch statements which implicitly generate static data as well as
|
|
||||||
code. Principally this affects the egcs compiler - but note gcc 2.8.1 also
|
|
||||||
suffers from this problem - the compiler will compile and link the code - but
|
|
||||||
the code will not run because the code and the static data it uses have become
|
|
||||||
separated. The default behaviour of regex++ is to try and fix this problem by
|
|
||||||
declaring "problem" templates inside unnamed namespaces, so that the
|
|
||||||
templates have internal linkage. Note that this can result in a great deal of
|
|
||||||
code bloat. If the compiler doesn't support namespaces, or if code bloat
|
|
||||||
becomes a problem, then follow the guidelines above for placing all the
|
|
||||||
templates used in a single translation unit, and edit jm_opt.h so that
|
|
||||||
BOOST_RE_NO_TEMPLATE_SWITCH_MERGE is no longer defined. </P>
|
|
||||||
<P><FONT COLOR="#FF0000">Q. I can't get regex++ to work with escape characters,
|
|
||||||
what's going on?</FONT> </P>
|
|
||||||
<P>A. If you embed regular expressions in C++ code, then remember that escape
|
|
||||||
characters are processed twice: once by the C++ compiler, and once by the
|
|
||||||
regex++ expression compiler, so to pass the regular expression \d+ to regex++,
|
|
||||||
you need to embed "\\d+" in your code. Likewise to match a literal
|
|
||||||
backslash you will need to embed "\\\\" in your code. </P>
|
|
||||||
<P><FONT COLOR="#FF0000">Q. Why don't character ranges work properly?</FONT>
|
|
||||||
<BR>
|
|
||||||
A. The POSIX standard specifies that character range expressions are locale
|
|
||||||
sensitive - so for example the expression [A-Z] will match any collating
|
|
||||||
element that collates between 'A' and 'Z'. That means that for most locales
|
|
||||||
other than "C" or "POSIX", [A-Z] would match the single
|
|
||||||
character 't' for example, which is not what most people expect - or at least
|
|
||||||
not what most people have come to expect from regular expression engines. For
|
|
||||||
this reason, the default behaviour of regex++ is to turn locale sensitive
|
|
||||||
collation off by setting the regbase::nocollate compile time flag (this is set
|
|
||||||
by regbase::normal). However if you set a non-default compile time flag - for
|
|
||||||
example regbase::extended or regbase::basic, then locale dependent collation
|
|
||||||
will be enabled, this also applies to the POSIX API functions which use either
|
|
||||||
regbase::extended or regbase::basic internally, in the latter case use
|
|
||||||
REG_NOCOLLATE in combination with either REG_BASIC or REG_EXTENDED when
|
|
||||||
invoking regcomp if you don't want locale sensitive collation. <I>[Note - when
|
|
||||||
regbase::nocollate is in effect, the library behaves "as if" the
|
|
||||||
LC_COLLATE locale category were always "C", regardless of what it is
|
|
||||||
actually set to - end note</I>]. </P>
|
|
||||||
<P><FONT COLOR="#FF0000"> Q. Why can't I use the "convenience"
|
|
||||||
versions of query_match/reg_search/reg_grep/reg_format/reg_merge?</FONT> </P>
|
|
||||||
<P>A. These versions may or may not be available depending upon the
|
|
||||||
capabilities of your compiler, the rules determining the format of these
|
|
||||||
functions are quite complex - and only the versions visible to a standard
|
|
||||||
compliant compiler are given in the help. To find out what your compiler
|
|
||||||
supports, run <boost/regex.hpp> through your C++ pre-processor, and
|
|
||||||
search the output file for the function that you are interested in. </P>
|
|
||||||
<P><FONT COLOR="#FF0000">Q. Why are there no throw specifications on any of the
|
|
||||||
functions? What exceptions can the library throw?</FONT> </P>
|
|
||||||
<P>A. Not all compilers support (or honor) throw specifications, others support
|
|
||||||
them but with reduced efficiency. Throw specifications may be added at a later
|
|
||||||
date as compilers begin to handle this better. The library should throw only
|
|
||||||
three types of exception: boost::bad_expression can be thrown by reg_expression
|
|
||||||
when compiling a regular expression; boost::bad_pattern can be thrown by the
|
|
||||||
class sub_match's conversion operators; finally std::bad_alloc can be thrown by
|
|
||||||
just about any of the functions in this library. <BR>
|
|
||||||
</P>
|
|
||||||
<HR>
|
|
||||||
<P><I>Copyright <A HREF="mailto:John_Maddock@compuserve.com">Dr John
|
|
||||||
Maddock</A> 1998-2000 all rights reserved.</I> </P>
|
|
||||||
</BODY>
|
|
||||||
</HTML>
|
|
||||||
|
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. Why does using parentheses in a regular expression change the result of a match?</P>
|
||||||
|
</FONT><P>A. Parentheses do more than mark sub-expressions; they also help determine which match is best. regex++ tries to follow the POSIX leftmost-longest rule for determining what matched: if more than one match is possible after considering the whole expression, it looks next at the first sub-expression, then at the second sub-expression, and so on. So...</P>
|
||||||
|
<PRE>"(0*)([0-9]*)" against "00123" would produce
|
||||||
|
$1 = "00"
|
||||||
|
$2 = "123"</PRE>
|
||||||
|
<P>whereas</P>
|
||||||
|
<PRE>"0*([0-9]*)" against "00123" would produce
|
||||||
|
$1 = "00123"</PRE>
|
||||||
|
<P>If you think about it, had $1 only matched the "123", this would be "less good" than the match "00123" which is both further to the left and longer. If you want $1 to match only the "123" part, then you need to use something like:</P>
|
||||||
|
<PRE>"0*([1-9][0-9]*)"</PRE>
|
||||||
|
<P>as the expression.</P>
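<P>The first and last expressions above can be tried out with a small helper. This is only a sketch: it uses the C++ standard library's std::regex as a stand-in rather than regex++ itself, and the helper name captures is hypothetical. Note that std::regex's default grammar does not apply the POSIX leftmost-longest rule, so the middle expression "0*([0-9]*)" is deliberately left out; its result depends on the matching rule the engine uses.</P>

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of regex++): returns the whole match followed
// by each captured sub-expression, or an empty vector if `pattern` does not
// match all of `text`.
std::vector<std::string> captures(const std::string& pattern, const std::string& text)
{
    std::vector<std::string> out;
    const std::regex re(pattern);  // std::regex default grammar; an assumption, not regex++
    std::smatch m;
    if (std::regex_match(text, m, re)) {
        for (const auto& sub : m)
            out.push_back(sub.str());  // element 0 is the whole match, 1.. are $1, $2, ...
    }
    return out;
}
```

<P>Calling captures("(0*)([0-9]*)", "00123") yields $1 and $2 as separate elements, while captures("0*([1-9][0-9]*)", "00123") leaves only the "123" part in $1.</P>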
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. Configure says that my compiler is unable to merge template instances, what does this mean?</FONT> </P>
|
||||||
|
<P>A. When you compile template code, you can end up with the same template instances in multiple translation units - this will lead to link time errors unless your compiler/linker is smart enough to merge these template instances into a single record in the executable file. If you see this warning after running configure, then you can still link to libregex++.a if: </P>
|
||||||
|
<OL>
|
||||||
|
|
||||||
|
<LI>You use only the low-level template classes (reg_expression<>, match_results<>, etc.) from a single translation unit, and use no other part of regex++.</LI>
|
||||||
|
<LI>You use only the POSIX API functions (regcomp, regexec, etc.), and no other part of regex++.</LI>
|
||||||
|
<LI>You use only the high level class RegEx, and no other part of regex++. </LI></OL>
|
||||||
|
|
||||||
|
<P>Another option is to create a master include file, which #include's all the regex++ source files, and all the source files in which you use regex++. You then compile and link this master file as a single translation unit. </P>
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. Configure says that my compiler is unable to merge template instances from archive files, what does this mean?</FONT> </P>
|
||||||
|
<P>A. When you compile template code, you can end up with the same template instances in multiple translation units - this will lead to link time errors unless your compiler/linker is smart enough to merge these template instances into a single record in the executable file. Some compilers are able to do this for normal .cpp or .o files, but fail if the object file has been placed in a library archive. If you see this warning after running configure, then you can still link to libregex++.a if: </P>
|
||||||
|
<OL>
|
||||||
|
|
||||||
|
<LI>You use only the low-level template classes (reg_expression<>, match_results<>, etc.), and use no other part of regex++.</LI>
|
||||||
|
<LI>You use only the POSIX API functions (regcomp, regexec, etc.), and no other part of regex++.</LI>
|
||||||
|
<LI>You use only the high level class RegEx, and no other part of regex++. </LI></OL>
|
||||||
|
|
||||||
|
<P>Another option is to add the regex++ source files directly to your project instead of linking to libregex++.a; generally you should do this only if you are getting link time errors with libregex++.a. </P>
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. Configure says that my compiler can't merge templates containing switch statements, what does this mean?</FONT> </P>
|
||||||
|
<P>A. Some compilers can't merge templates that contain static data; this includes switch statements, which implicitly generate static data as well as code. Principally this affects the egcs compiler, but note that gcc 2.8.1 also suffers from this problem: the compiler will compile and link the code, but the code will not run because the code and the static data it uses have become separated. The default behaviour of regex++ is to try to fix this problem by declaring "problem" templates inside unnamed namespaces, so that the templates have internal linkage. Note that this can result in a great deal of code bloat. If the compiler doesn't support namespaces, or if code bloat becomes a problem, then follow the guidelines above for placing all the templates used in a single translation unit, and edit jm_opt.h so that BOOST_RE_NO_TEMPLATE_SWITCH_MERGE is no longer defined. </P>
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. I can't get regex++ to work with escape characters, what's going on?</FONT> </P>
|
||||||
|
<P>A. If you embed regular expressions in C++ code, then remember that escape characters are processed twice: once by the C++ compiler, and once by the regex++ expression compiler, so to pass the regular expression \d+ to regex++, you need to embed "\\d+" in your code. Likewise to match a literal backslash you will need to embed "\\\\" in your code. </P>
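<P>The double processing of escapes can be seen in a short sketch. It uses the C++ standard library's std::regex as a stand-in rather than regex++ itself, and the function names is_digits and has_backslash are hypothetical; the point is only how the string literals are written.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` consists of one or more digits.
// The regular expression is \d+ ; the C++ string literal must be "\\d+".
bool is_digits(const std::string& text)
{
    static const std::regex re("\\d+");
    return std::regex_match(text, re);
}

// Hypothetical helper: true if `text` contains a literal backslash.
// The regular expression is \\ ; the C++ string literal must be "\\\\".
bool has_backslash(const std::string& text)
{
    static const std::regex re("\\\\");
    return std::regex_search(text, re);
}
```

<P>In both helpers the C++ compiler removes one level of backslashes from the literal, and the expression compiler then interprets what is left.</P>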
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. Why don't character ranges work properly?</FONT> <BR>
|
||||||
|
A. The POSIX standard specifies that character range expressions are locale sensitive - so for example the expression [A-Z] will match any collating element that collates between 'A' and 'Z'. That means that for most locales other than "C" or "POSIX", [A-Z] would match the single character 't' for example, which is not what most people expect - or at least not what most people have come to expect from regular expression engines. For this reason, the default behaviour of regex++ is to turn locale sensitive collation off by setting the regbase::nocollate compile time flag (this is set by regbase::normal). However, if you set a non-default compile time flag, for example regbase::extended or regbase::basic, then locale dependent collation will be enabled. This also applies to the POSIX API functions, which use either regbase::extended or regbase::basic internally; with those functions, use REG_NOCOLLATE in combination with either REG_BASIC or REG_EXTENDED when invoking regcomp if you don't want locale sensitive collation. <I>[Note - when regbase::nocollate is in effect, the library behaves "as if" the LC_COLLATE locale category were always "C", regardless of what it is actually set to - end note]</I>. </P>
|
||||||
|
<FONT COLOR="#ff0000"><P> Q. Why can't I use the "convenience" versions of query_match/reg_search/reg_grep/reg_format/reg_merge?</FONT> </P>
|
||||||
|
<P>A. These versions may or may not be available, depending upon the capabilities of your compiler. The rules determining the format of these functions are quite complex, and only the versions visible to a standard compliant compiler are given in the help. To find out what your compiler supports, run <boost/regex.hpp> through your C++ pre-processor, and search the output file for the function that you are interested in. </P>
|
||||||
|
<FONT COLOR="#ff0000"><P>Q. Why are there no throw specifications on any of the functions? What exceptions can the library throw?</FONT> </P>
|
||||||
|
<P>A. Not all compilers support (or honor) throw specifications; others support them but with reduced efficiency. Throw specifications may be added at a later date as compilers begin to handle this better. The library should throw only three types of exception: boost::bad_expression can be thrown by reg_expression when compiling a regular expression; boost::bad_pattern can be thrown by the class sub_match's conversion operators; finally, std::bad_alloc can be thrown by just about any of the functions in this library. </P>
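<P>Catching a failure at expression-compile time might look like the sketch below. It is an illustration only: it uses the C++ standard library's std::regex, whose compile-time exception is std::regex_error, as a stand-in for reg_expression and boost::bad_expression, and the function name compiles is hypothetical.</P>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` compiles, false if the expression
// compiler rejects it. std::regex_error is the standard library's analogue,
// assumed here in place of regex++'s boost::bad_expression.
bool compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern);  // construction compiles the expression
        return true;
    } catch (const std::regex_error&) {
        return false;            // malformed expression
    }
    // std::bad_alloc is deliberately not caught here and propagates to the caller.
}
```

<P>An expression with an unmatched opening parenthesis, such as "(", is rejected at construction time, so the check can be made once, before any matching is attempted.</P>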
|
||||||
|
<P><HR></P>
|
||||||
|
<I><P>Copyright </I><A HREF="mailto:John_Maddock@compuserve.com"><I>Dr John Maddock</I></A><I> 1998-2000 all rights reserved.</I> </P></BODY>
|
||||||
|
</HTML>
|
||||||
|