mirror of
https://github.com/boostorg/algorithm.git
synced 2025-07-29 03:57:17 +02:00
String Algorithm Library: initial commit
[SVN r22431]
This commit is contained in:
275
string/doc/usage.xml
Normal file
275
string/doc/usage.xml
Normal file
@ -0,0 +1,275 @@
|
||||
<?xml version="1.0" encoding="utf-8"?>
|
||||
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
|
||||
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
|
||||
<section id="string_algo.usage" last-revision="$Date$">
|
||||
<title>Usage</title>
|
||||
|
||||
<using-namespace name="boost"/>
|
||||
<using-namespace name="boost::string_algo"/>
|
||||
|
||||
|
||||
<section>
|
||||
<title>First Example</title>
|
||||
|
||||
<para>
|
||||
Using the algorithms is straightforward. Let us have a look at the first example:
|
||||
</para>
|
||||
<programlisting>
|
||||
#include <boost/string_algo.hpp>
|
||||
using namespace std;
|
||||
using namespace boost;
|
||||
namespace sa=boost::string_algo
|
||||
|
||||
// ...
|
||||
|
||||
string str1(" hello world! ");
|
||||
trim( to_upper(str1) ); // str1 == "HELLO WORLD!"
|
||||
|
||||
string str2=ireplace_first_copy(str1,"hello","goodbye"); // str2 == "goodbye WORLD!"
|
||||
</programlisting>
|
||||
<para>
|
||||
This example converts str1 to upper case and trims spaces from the start and the end
|
||||
of the string. str2 is then created as a copy of str1 with "hello" replaced with "goodbye".
|
||||
This example demonstrates several important concepts used in the library:
|
||||
</para>
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para><emphasis role="bold">Container parameters:</emphasis>
|
||||
Unlike the STL algorithms, parameters are not specified only in form
|
||||
of iterators. The STL convention allows for great flexibility,
|
||||
but it has several limitation. It is not possible to <emphasis>stack</emphasis> algorithms together,
|
||||
because a container is passed in two parameters, so it is not possible to use
|
||||
a return value from another algorithm. It is considerably easier to write
|
||||
<code>to_lower(str1)</code>, then <code>to_lower(str1.begin(), str1.end())</code>.
|
||||
</para>
|
||||
<para>
|
||||
The magic of <link linkend="string_algo.container_traits">container_traits</link>
|
||||
provides a uniform way of handling different containers.
|
||||
If there is a need to pass a pair of iterators,
|
||||
<link linkend="string_algo.iterator_range"><code>iterator_range</code></link>
|
||||
can be used to package iterators into a structure with the container interface.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para><emphasis role="bold">Copy vs. Mutable:</emphasis>
|
||||
Many algorithms in the library are performing a transformation of the input.
|
||||
The transformation can be done in-place, mutating the input sequence, or a copy
|
||||
of the transformed input can be created, leaving the input intact. None of
|
||||
these possibilities is superior to the other one and both have different
|
||||
advantages and disadvantages. For this reason, both are provided with the library.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para><emphasis role="bold">Algorithm stacking:</emphasis>
|
||||
Copy versions return a transformed input as a result. Mutable variants return
|
||||
a reference to the input. Thus both versions allow a simple chaining of
|
||||
transformations within one expression (i.e. one can write
|
||||
<code>trim_copy(to_upper_copy(s))</code> as well as <code>trim(to_upper(s))</code>).
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para><emphasis role="bold">Naming:</emphasis>
|
||||
Naming follows the conventions from the Standard C++ Library. If there is a
|
||||
copy and mutable version of the same algorithm, the mutable version has no suffix
|
||||
and the copy version has suffix <emphasis>_copy</emphasis>.
|
||||
Some algorithms have prefix <emphasis>i</emphasis>
|
||||
(e.g. <functionname>ifind_first()</functionname>).
|
||||
This prefix identifies that the algorithm works in a case-insensitive manner.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
<para>
|
||||
To use the library, include the <headername>boost/string_algo.hpp</headername> header.
|
||||
If the regex related functions are needed, include the
|
||||
<headername>boost/string_algo_regex.hpp</headername> header.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Case conversion</title>
|
||||
|
||||
<para>
|
||||
STL has a nice way of converting character case. Unfortunately, it works only
|
||||
for a single character and we want to convert a string,
|
||||
</para>
|
||||
<programlisting>
|
||||
string str1("HeLlO WoRld!");
|
||||
to_upper(str1); // str1=="HELLO WORLD!"
|
||||
</programlisting>
|
||||
<para>
|
||||
<functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of
|
||||
characters in a container using a specified locale.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Predicates and Classification</title>
|
||||
<para>
|
||||
A part of the library deals with string related predicates. Consider this example:
|
||||
</para>
|
||||
<programlisting>
|
||||
bool is_executable( string& filename )
|
||||
{
|
||||
return
|
||||
iends_with(filename, ".exe") ||
|
||||
iends_with(filename, ".com");
|
||||
}
|
||||
|
||||
// ...
|
||||
string str1("command.com");
|
||||
cout
|
||||
<< str1
|
||||
<< is_executable("command.com")? "is": "is not"
|
||||
<< "an executable"
|
||||
<< endl; // prints "command.com is an executable"
|
||||
|
||||
//..
|
||||
char text1[]="hello world!";
|
||||
cout
|
||||
<< text1
|
||||
<< all( text1, is_lower<char>() )? "is": "is not"
|
||||
<< "written in the lower case"
|
||||
<< endl; // prints "hello world! is written in the lower case"
|
||||
</programlisting>
|
||||
<para>
|
||||
The predicates are resolving if a substring is contained in the input string
|
||||
under various conditions. The conditions are if a string starts with the substring,
|
||||
ends with the substring,
|
||||
simply contains the substring or if both strings are equal. See the reference for
|
||||
<headername>boost/string_algo/predicate.hpp</headername> for more details.
|
||||
In addition the algorithm <functionname>all()</functionname> checks
|
||||
all elements of a container to satisfy a condition specified by a predicate.
|
||||
This predicate can be any unary predicate, but the library provides a bunch of
|
||||
useful string-related predicates ready for use.
|
||||
These are located in the <headername>boost/string_algo/classification.hpp</headername> header.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Trimming</title>
|
||||
|
||||
<para>
|
||||
When parsing the input of a user, strings usually have unwanted leading or trailing
|
||||
characters. To get rid of them, we need trim functions:
|
||||
</para>
|
||||
<programlisting>
|
||||
string str1=" hello world! ";
|
||||
string str2=trim_left_copy(str1); // str2 == "hello world! "
|
||||
string str3=trim_right_copy(str2); // str3 == " hello world!"
|
||||
trim(str1); // str1 == "hello world!"
|
||||
|
||||
string phone="00423333444";
|
||||
// remove leading 0 from the phone number
|
||||
sa::trim_left(phone,is_any_of<char>("0")); // phone == "423333444"
|
||||
</programlisting>
|
||||
<para>
|
||||
It is possible to trim the spaces on the right, on the left or on the both sides of a string.
|
||||
And for those cases when there is a need to remove something else than blank space, the
|
||||
<code>string_algo</code> namespace contains generic versions of the trim algorithms. Using these,
|
||||
a user can specify a functor which will select the <emphasis>space</emphasis> to be removed. It is possible to use
|
||||
classification predicates like <functionname>is_digit()</functionname> mentioned in the previous paragraph.
|
||||
See the reference for the <headername>boost/string_algo/trim.hpp</headername>.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Find algorithms</title>
|
||||
|
||||
<para>
|
||||
The library contains a set of find algorithms. Here is an example:
|
||||
</para>
|
||||
<programlisting>
|
||||
char text[]="hello dolly!";
|
||||
iterator_range<char*> result=find_last(text,"ll");
|
||||
|
||||
transform( result.begin(), result.end(), result.begin(), bind2nd(plus<char>(), 1) );
|
||||
// text = "hello dommy!"
|
||||
|
||||
to_upper(result); // text == "hello doMMy!"
|
||||
</programlisting>
|
||||
<para>
|
||||
We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
|
||||
The result is given in the <link linkend="string_algo.iterator_range"><code>iterator_range</code></link>.
|
||||
This range delimits the
|
||||
part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".
|
||||
|
||||
As we can see, input of the <functionname>find_last()</functionname> algorithm can be also
|
||||
char[] because this type is supported by
|
||||
<link linkend="string_algo.container_traits">container_traits</link>.
|
||||
|
||||
Following lines transform the result. Notice, that
|
||||
<link linkend="string_algo.iterator_range"><code>iterator_range</code></link> have familiar
|
||||
<code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Replace Algorithms</title>
|
||||
<para>
|
||||
Find algorithms can be used for searching for a specific part of the sequence. Replace goes one step
|
||||
further. After a matching part is found, it is substituted with something else. The substitution is computed
|
||||
from an original, using some transformation.
|
||||
</para>
|
||||
<programlisting>
|
||||
string str1="Hello Dolly, Hello World!"
|
||||
replace_first(str1, "Dolly", "Jane"); // str1 == "Hello Jane, Hello World!"
|
||||
replace_last(str1, "Hello", "Goodbye"); // str1 == "Hello Jane, Goodbye World!"
|
||||
erase_all(str1, " "); // str1 == "HelloJane,GoodbyeWorld!"
|
||||
erase_head(str1, 6); // str1 == "Jane,GoodbyeWorld!"
|
||||
</programlisting>
|
||||
<para>
|
||||
For the complete list of replace and erase functions see the
|
||||
<link linkend="string_algo.reference">reference</link>.
|
||||
There is a lot of predefined function for common usage, however, the library allows you to
|
||||
define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>replace()</functionname>
|
||||
function which takes two parameters.
|
||||
The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is
|
||||
a <link linkend="string_algo.formatter_concept">Formatter</link> object.
|
||||
The Finder object is a functor which performs the searching for the replacement part. The Formatter object
|
||||
takes the result of the Finder (usually a reference to the found substring) and creates a
|
||||
substitute for it. Replace algorithm puts these two together and makes the desired substitution.
|
||||
</para>
|
||||
</section>
|
||||
<section>
|
||||
<title>Split</title>
|
||||
|
||||
<para>
|
||||
Split algorithms allow one to divide a sequence into parts. Each part represents a
|
||||
<emphasis>token</emphasis> and tokens are separated by <emphasis>separators</emphasis>.
|
||||
One can either search for tokens or search for separators:
|
||||
</para>
|
||||
|
||||
<programlisting>
|
||||
string str1("hello abc-*-ABC-*-aBc goodbye");
|
||||
|
||||
typedef vector< iterator_range<string::iterator> > find_vector_type;
|
||||
|
||||
find_vector_type FindVec; // #1: Search for separators
|
||||
ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }
|
||||
|
||||
typdef vector< string > split_vector_type;
|
||||
|
||||
split_vector_type SplitVec; // #2: Search for tokens
|
||||
split( SplitVec, str1, is_any_of<char>("-*") ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
|
||||
</programlisting>
|
||||
<para>
|
||||
<code>[hello]</code> designates an <code>iterator_range</code> delimiting this substring.
|
||||
</para>
|
||||
<para>
|
||||
The result of a split algorithm is a <emphasis>container of containers</emphasis>. There is only one restriction:
|
||||
The inner container type must be able to hold extracted parts of the input sequence. This example
|
||||
shows the special case where the inner container is an
|
||||
<link linkend="string_algo.iterator_range"><code>iterator_range</code></link>
|
||||
instead of e.g. <code>std::string</code>. This way, a user gets a reference
|
||||
(in the form of iterators) delimiting the parts of the input sequence. Otherwise, a copy of
|
||||
each extracted part is created and added to the outer container.
|
||||
</para>
|
||||
<para>
|
||||
So to recap, there are two basic algorithms: <functionname>find_all()</functionname>
|
||||
returns extracts the parts
|
||||
matching the specification whereas <functionname>split()</functionname> uses the matching
|
||||
parts as delimiters, and extracts the parts in between them.
|
||||
</para>
|
||||
<para>
|
||||
Generalizations of these two algorithms are called <functionname>iter_find()</functionname> and
|
||||
<functionname>iter_split()</functionname>. They take a
|
||||
<link linkend="string_algo.finder_concept">Finder</link> object, as an argument to search for
|
||||
the substring.
|
||||
</para>
|
||||
</section>
|
||||
</section>
|
Reference in New Issue
Block a user