<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org">
<title>Boost.Regex: FAQ</title>
<meta http-equiv="Content-Type" content=
"text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<p></p>

<table id="Table1" cellspacing="1" cellpadding="1" width="100%"
border="0">
<tr>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt=
"C++ Boost" src="../../../c++boost.gif" border="0"></a></h3>
</td>
<td width="353">
<h1 align="center">Boost.Regex</h1>

<h2 align="center">FAQ</h2>
</td>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt=
"Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</tr>
</table>

<br>
<br>

<hr>
<font color="#ff0000"><font color="#ff0000"></font></font>
<p><font color="#ff0000"><font color="#ff0000"><font color=
"#ff0000">Q. Why can't I use the "convenience" versions of
regex_match / regex_search / regex_grep / regex_format /
regex_merge?</font></font></font></p>

<p>A. These versions may or may not be available, depending on the
capabilities of your compiler. The rules that determine the form of
these functions are quite complex, and the documentation lists only
the versions visible to a standard-compliant compiler. To find out
what your compiler supports, run &lt;boost/regex.hpp&gt; through
your C++ preprocessor and search the output file for the function
that you are interested in.</p>

<p><font color="#ff0000"><font color="#ff0000">Q. I can't get
regex++ to work with escape characters. What's going
on?</font></font></p>

<p>A. If you embed regular expressions in C++ code, remember that
escape characters are processed twice: once by the C++ compiler and
once by the regex++ expression compiler. So to pass the regular
expression \d+ to regex++, you need to embed "\\d+" in your code.
Likewise, to match a literal backslash you need to embed "\\\\" in
your code.</p>
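
<p>A minimal sketch of the double-escaping rule described above. It
uses std::regex from the C++11 standard library as a stand-in, on the
assumption that the same calls behave alike with boost::regex. The
helper names is_digits, is_single_backslash and is_digits_raw are
hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `s` consists of one or more digits.
// The C++ compiler turns "\\d+" into the three characters \d+,
// which is what the regex compiler then sees.
bool is_digits(const std::string& s) {
    static const std::regex re("\\d+");
    return std::regex_match(s, re);
}

// Hypothetical helper: true if `s` is exactly one backslash.
// "\\\\" in source becomes \\ in memory; the regex compiler reads
// that as an escaped, literal backslash.
bool is_single_backslash(const std::string& s) {
    static const std::regex re("\\\\");
    return std::regex_match(s, re);
}

// A C++11 raw string literal skips the first round of escaping, so the
// pattern is written exactly as the regex compiler will see it.
bool is_digits_raw(const std::string& s) {
    static const std::regex re(R"(\d+)");
    return std::regex_match(s, re);
}
```

<p>Raw string literals need a C++11 compiler. Where they are
available, they remove one of the two escaping layers and tend to
make longer expressions easier to read.</p>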

<p><font color="#ff0000">Q. Why does using parentheses in a POSIX
regular expression change the result of a match?</font></p>

<p>A. For POSIX (extended and basic) regular expressions, but not for
Perl regexes, parentheses don't only mark sub-expressions; they also
help determine what the best match is. When the expression is
compiled as a POSIX basic or extended regex, Boost.Regex follows the
POSIX standard leftmost-longest rule for determining what matched. So
if there is more than one possible match after considering the whole
expression, it looks next at the first sub-expression, then the
second sub-expression, and so on. For example:</p>

<pre>
"(0*)([0-9]*)" against "00123" would produce
$1 = "00"
$2 = "123"
</pre>

<p>whereas</p>

<pre>
"0*([0-9]*)" against "00123" would produce
$1 = "00123"
</pre>

<p>If you think about it, had $1 only matched the "123", this would
be "less good" than the match "00123", which is both further to the
left and longer. If you want $1 to match only the "123" part, then
you need to use something like:</p>

<pre>
"0*([1-9][0-9]*)"
</pre>

<p>as the expression.</p>
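
<p>The suggested expression can be tried out with a short sketch
like the one below. It uses std::regex with the extended flag as a
stand-in for Boost.Regex in POSIX extended mode, which is an
assumption; the helper name strip_leading_zeros is hypothetical.
Because [1-9] can never match '0', the leading zeros can only be
consumed by 0*, so $1 does not depend on how the matcher breaks
ties.</p>

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by $1 when the whole
// of `input` matches "0*([1-9][0-9]*)", or an empty string otherwise.
std::string strip_leading_zeros(const std::string& input) {
    // std::regex::extended selects POSIX extended syntax.
    static const std::regex re("0*([1-9][0-9]*)", std::regex::extended);
    std::smatch what;
    if (std::regex_match(input, what, re)) {
        return what[1].str();  // $1 cannot start with '0'
    }
    return std::string();      // no match, e.g. "000" or "12a"
}
```

<p>Note that an input made up only of zeros does not match at all,
since the sub-expression requires at least one non-zero digit.</p>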

<p><font color="#ff0000">Q. Why don't character ranges work
properly (POSIX mode only)?</font><br>
A. The POSIX standard specifies that character range expressions
are locale sensitive, so for example the expression [A-Z] will
match any collating element that collates between 'A' and 'Z'. That
means that for most locales other than "C" or "POSIX", [A-Z] would
match the single character 't' for example, which is not what most
people expect, or at least not what most people have come to
expect from regular expression engines. For this reason, the
default behaviour of Boost.Regex (Perl mode) is to turn
locale-sensitive collation off by not setting the
regex_constants::collate compile-time flag. However, if you set a
non-default compile-time flag, for example regex_constants::extended
or regex_constants::basic, then locale-dependent collation will be
enabled. This also applies to the POSIX API functions, which use
either regex_constants::extended or regex_constants::basic
internally. <i>[Note: when regex_constants::nocollate is in effect,
the library behaves "as if" the LC_COLLATE locale category were
always "C", regardless of what it is actually set to. End
note]</i></p>
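
<p>A rough sketch of the two settings, using std::regex as a
stand-in (an assumption). One difference from the behaviour described
above: in std::regex the extended flag does not turn collation on by
itself, so the sketch sets the collate flag explicitly. The helper
names in_range_default and in_range_collated are hypothetical.</p>

```cpp
#include <regex>
#include <string>

// Default (Perl-like) syntax: the collate flag is not set, so [A-Z] is
// compared by character code and never matches lower-case 't'.
bool in_range_default(const std::string& s) {
    static const std::regex re("[A-Z]");
    return std::regex_match(s, re);
}

// POSIX extended syntax with locale-sensitive ranges requested
// explicitly. Whether 't' falls inside [A-Z] here depends on the locale
// in effect when the regex is constructed, so the result is not fixed.
bool in_range_collated(const std::string& s) {
    const std::regex re("[A-Z]",
                        std::regex::extended | std::regex::collate);
    return std::regex_match(s, re);
}
```

<p>If a range has to mean the same thing everywhere, the safer
choice is to leave collation off, or to spell out the intended
characters explicitly.</p>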

<p><font color="#ff0000">Q. Why are there no throw specifications
on any of the functions? What exceptions can the library
throw?</font></p>

<p>A. Not all compilers support (or honor) throw specifications;
others support them but with reduced efficiency. Throw
specifications may be added at a later date as compilers begin to
handle this better. The library should throw only three types of
exception. boost::bad_expression can be thrown by basic_regex when
compiling a regular expression. std::runtime_error can be thrown
when a call to basic_regex::imbue tries to open a message catalogue
that doesn't exist, when a call to regex_search or regex_match
results in an "everlasting" search, or when a call to
RegEx::GrepFiles or RegEx::FindFiles tries to open a file that
cannot be opened. Finally, std::bad_alloc can be thrown by just
about any of the functions in this library.</p>
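
<p>A minimal sketch of guarding expression compilation against a
malformed pattern. It uses std::regex as a stand-in (an assumption);
there the error type is std::regex_error, which derives from
std::runtime_error, whereas code written against Boost.Regex would
catch the boost::bad_expression named above. The helper name compiles
is hypothetical.</p>

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles, false if the
// regex compiler rejects it.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);  // may throw on a malformed expression
        (void)re;                // only the construction matters here
        return true;
    } catch (const std::runtime_error&) {
        // std::regex_error derives from std::runtime_error.
        // std::bad_alloc is deliberately not caught and propagates.
        return false;
    }
}
```

<p>Catching the base class keeps the handler short; catch the more
specific type instead when the error code or position is needed.</p>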

<p></p>

<hr>
<p>Revised 
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
24 Oct 2003 
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
<p><i>&copy; Copyright John Maddock 1998-
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan -->
2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p>
<P><I>Use, modification and distribution are subject to the Boost Software License,
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body>
</html>