<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Boost.Regex: Index</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
</head>
<body>
<P>
<TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
<TR>
<td valign="top" width="300">
<h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
</td>
<TD width="353">
<H1 align="center">Boost.Regex</H1>
<H2 align="center">Unicode Regular Expressions.</H2>
</TD>
<td width="50">
<h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
</td>
</TR>
</TABLE>
</P>
<HR>
<p></p>
<P>There are two ways to use Boost.Regex with Unicode strings:</P>
<H3>Rely on wchar_t</H3>
<P>If your platform's wchar_t type can hold Unicode strings, <EM>and</EM> your
platform's C/C++ runtime correctly handles wide character constants (when
passed to std::iswspace, std::iswlower, etc.), then you can use boost::wregex to
process Unicode. However, this approach has several disadvantages:</P>
<UL>
<LI>
It's not portable: there's no guarantee on the width of wchar_t, or even
whether the runtime treats wide characters as Unicode at all. Most Windows
compilers do so, but many Unix systems do not.</LI>
<LI>
There's no support for Unicode-specific character classes such as [[:Nd:]]
and [[:Po:]].</LI>
<LI>
You can only search strings that are encoded as sequences of wide characters;
it is not possible to search UTF-8, or even UTF-16 on many platforms.</LI></UL>
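<P>As a rough illustration of the two preconditions above, the following sketch
probes them using only the standard library. The function names and the two
sample characters (U+2003 and U+03B1) are this example's own choices and are not
part of Boost.Regex. The second probe depends on the locale in effect when it is
called, so treat its result as a hint rather than a guarantee.</P>

```cpp
#include <cwctype>
#include <limits>

// Hypothetical helper: true when wchar_t can represent every Unicode
// code point up to U+10FFFF. With a 16-bit wchar_t this is false.
constexpr bool wchar_holds_any_code_point()
{
    return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max())
           >= 0x10FFFFul;
}

// Hypothetical helper: true when the runtime, in the current locale,
// classifies two sample non-ASCII characters the way Unicode does.
// The choice of sample characters is an assumption of this sketch.
inline bool runtime_classifies_samples_as_unicode()
{
    const std::wint_t em_space    = 0x2003; // U+2003 EM SPACE
    const std::wint_t small_alpha = 0x03B1; // U+03B1 GREEK SMALL LETTER ALPHA
    return std::iswspace(em_space) != 0 && std::iswlower(small_alpha) != 0;
}
```

<P>If either function returns false, boost::wregex is unlikely to give correct
Unicode results on that platform and locale.</P>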
<H3>Use a Unicode Aware Regular Expression Type.</H3>
<P>If you have the <A href="http://www.ibm.com/software/globalization/icu/">ICU
library</A>, then Boost.Regex can be <A href="install.html#unicode">configured
to make use of it</A> and to provide a distinct regular expression type
(boost::u32regex) that supports both Unicode-specific character properties
and the searching of text encoded in UTF-8, UTF-16, or
UTF-32. See: <A href="icu_strings.html">ICU string class support</A>.</P>
<P>
<HR>
</P>
<P></P>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
04 Jan 2005
<!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
<p><i>&copy; Copyright John Maddock 2005</i></p>
<P><I>Use, modification and distribution are subject to the Boost Software License,
Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
</body>
</html>