<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Unicode and Boost.Regex</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Boost.Regex 7.0.1">
<link rel="up" href="../index.html" title="Boost.Regex 7.0.1">
<link rel="prev" href="intro.html" title="Introduction and Overview">
<link rel="next" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="intro.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.unicode"></a><a class="link" href="unicode.html" title="Unicode and Boost.Regex">Unicode and Boost.Regex</a>
</h2></div></div></div>
<p>
      There are two ways to use Boost.Regex with Unicode strings:
    </p>
<h5>
<a name="boost_regex.unicode.h0"></a>
      <span class="phrase"><a name="boost_regex.unicode.rely_on_wchar_t"></a></span><a class="link" href="unicode.html#boost_regex.unicode.rely_on_wchar_t">Rely
      on wchar_t</a>
    </h5>
<p>
      If your platform's <code class="computeroutput"><span class="keyword">wchar_t</span></code> type
      can hold Unicode strings, and your platform's C/C++ runtime correctly handles
      wide character constants (when passed to <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswspace</span></code>,
      <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswlower</span></code>, etc.), then you can use <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">wregex</span></code>
      to process Unicode. However, there are several disadvantages to this approach:
    </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
          It's not portable: there's no guarantee on the width of <code class="computeroutput"><span class="keyword">wchar_t</span></code>,
          or even that the runtime treats wide characters as Unicode at all. Most
          Windows compilers do so, but many Unix systems do not.
        </li>
<li class="listitem">
          There's no support for Unicode-specific character classes such as <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Nd</span><span class="special">:]]</span></code> and <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Po</span><span class="special">:]]</span></code>.
        </li>
<li class="listitem">
          You can only search strings that are encoded as sequences of wide characters;
          it is not possible to search UTF-8, or even UTF-16 on many platforms.
        </li>
</ul></div>
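<p>
      The portability concern in the first point can be probed at compile time.
      The following is a minimal sketch using only the standard library, not part
      of Boost.Regex; the helper names <code class="computeroutput">wchar_holds_any_code_point</code>
      and <code class="computeroutput">wide_units_for</code> are hypothetical, and the
      second helper assumes that wide strings are UTF-32 when <code class="computeroutput"><span class="keyword">wchar_t</span></code>
      is wide enough and UTF-16 otherwise, which the standard does not guarantee.
    </p>

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a single wchar_t can represent any
// Unicode code point (U+0000 through U+10FFFF).
constexpr bool wchar_holds_any_code_point()
{
    return static_cast<std::uintmax_t>(std::numeric_limits<wchar_t>::max()) >= 0x10FFFFu;
}

// Hypothetical helper: number of wchar_t elements one code point occupies.
// ASSUMPTION: wide strings are UTF-32 when wchar_t is wide enough and
// UTF-16 otherwise; the runtime is free to use some other encoding.
constexpr int wide_units_for(char32_t code_point)
{
    return (!wchar_holds_any_code_point() && code_point > 0xFFFFu) ? 2 : 1;
}
```

<p>
      Where <code class="computeroutput">wide_units_for</code> returns 2, a code point
      outside the Basic Multilingual Plane spans two <code class="computeroutput"><span class="keyword">wchar_t</span></code>
      elements, so a wide-character regular expression would be expected to treat
      it as two separate characters.
    </p>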
<h5>
<a name="boost_regex.unicode.h1"></a>
      <span class="phrase"><a name="boost_regex.unicode.use_a_unicode_aware_regular_expr"></a></span><a class="link" href="unicode.html#boost_regex.unicode.use_a_unicode_aware_regular_expr">Use
      a Unicode-Aware Regular Expression Type</a>
    </h5>
<p>
      If you have the <a href="http://www.ibm.com/software/globalization/icu/" target="_top">ICU
      library</a>, then Boost.Regex provides a distinct regular expression type,
      <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">u32regex</span></code>,
      that supports both Unicode-specific character properties and searching text
      encoded as UTF-8, UTF-16, or UTF-32.
      See: <a class="link" href="ref/non_std_strings/icu.html" title="Working With Unicode and ICU String Types">ICU string class support</a>.
    </p>
</div>
<div class="copyright-footer">Copyright © 1998-2013 John Maddock<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="intro.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>