algorithm/string/doc/usage.xml

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
<section id="string_algo.usage" last-revision="$Date$">
    <title>Usage</title>

    <using-namespace name="boost"/>
    <using-namespace name="boost::string_algo"/>


    <section>
        <title>First Example</title>
        
        <para>
            Using the algorithms is straightforward. Let us have a look at the first example:
        </para>
        <programlisting>
    #include &lt;boost/string_algo.hpp&gt;
    using namespace std;
    using namespace boost;
    namespace sa=boost::string_algo
    
    // ...

    string str1(" hello world! ");
    trim( to_upper(str1) );  // str1 == "HELLO WORLD!"

    string str2=ireplace_first_copy(str1,"hello","goodbye"); // str2 == "goodbye WORLD!"
        </programlisting>
        <para>
            This example converts str1 to upper case and trims spaces from the start and the end
            of the string. str2 is then created as a copy of str1 with "hello" replaced with "goodbye".
            This example demonstrates several important concepts used in the library:
        </para>
        <itemizedlist>
            <listitem>
                <para><emphasis role="bold">Container parameters:</emphasis>
                    Unlike the STL algorithms, parameters are not specified only in form
                    of iterators. The STL convention allows for great flexibility,
                    but it has several limitation. It is not possible to <emphasis>stack</emphasis> algorithms together, 
                    because a container is passed in two parameters, so it is not possible to use 
                    a return value from another algorithm. It is considerably easier to write
                    <code>to_lower(str1)</code>, then <code>to_lower(str1.begin(), str1.end())</code>.
                </para>
                <para>
                    The magic of <link linkend="string_algo.container_traits">container_traits</link> 
                    provides a uniform way of handling different containers. 
                    If there is a need to pass a pair of iterators, 
                    <link linkend="string_algo.iterator_range"><code>iterator_range</code></link>
                    can be used to package iterators into a structure with the container interface.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Copy vs. Mutable:</emphasis>
                    Many algorithms in the library are performing a transformation of the input. 
                    The transformation can be done in-place, mutating the input sequence, or a copy 
                    of the transformed input can be created, leaving the input intact. None of 
                    these possibilities is superior to the other one and both have different 
                    advantages and disadvantages. For this reason, both are provided with the library. 
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Algorithm stacking:</emphasis>
                    Copy versions return a transformed input as a result. Mutable variants return 
                    a reference to the input. Thus both versions allow a simple chaining of
                    transformations within one expression (i.e. one can write 
                    <code>trim_copy(to_upper_copy(s))</code> as well as <code>trim(to_upper(s))</code>). 
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Naming:</emphasis>
                    Naming follows the conventions from the Standard C++ Library. If there is a 
                    copy and mutable version of the same algorithm, the mutable version has no suffix 
                    and the copy version has suffix <emphasis>_copy</emphasis>. 
                    Some algorithms have prefix <emphasis>i</emphasis> 
                    (e.g. <functionname>ifind_first()</functionname>).
                    This prefix identifies that the algorithm works in a case-insensitive manner.
                </para>
            </listitem>
        </itemizedlist>
        <para>
            To use the library, include the <headername>boost/string_algo.hpp</headername> header. 
            If the regex related functions are needed, include the 
            <headername>boost/string_algo_regex.hpp</headername> header.
        </para>
    </section>
    <section>
        <title>Case conversion</title>
        
        <para>
            STL has a nice way of converting character case. Unfortunately, it works only
            for a single character and we want to convert a string, 
        </para>
        <programlisting>
    string str1("HeLlO WoRld!");
    to_upper(str1); // str1=="HELLO WORLD!"
        </programlisting>
        <para>
            <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of 
            characters in a container using a specified locale.
        </para>
    </section>
    <section>
        <title>Predicates and Classification</title>
        <para>
            A part of the library deals with string related predicates. Consider this example:
        </para>
        <programlisting>
    bool is_executable( string&amp; filename )
    {
        return 
            iends_with(filename, ".exe") ||
            iends_with(filename, ".com");
    }

    // ...
    string str1("command.com");
    cout 
        &lt;&lt; str1
        &lt;&lt; is_executable("command.com")? "is": "is not" 
        &lt;&lt; "an executable" 
        &lt;&lt; endl; // prints "command.com is an executable"
    
    //..
    char text1[]="hello world!";
    cout 
        &lt;&lt; text1 
        &lt;&lt; all( text1, is_lower&lt;char&gt;() )? "is": "is not"
        &lt;&lt; "written in the lower case" 
        &lt;&lt; endl; // prints "hello world! is written in the lower case"
        </programlisting>
        <para>
            The predicates are resolving if a substring is contained in the input string
            under various conditions. The conditions are if a string starts with the substring, 
            ends with the substring, 
            simply contains the substring or if both strings are equal. See the reference for 
            <headername>boost/string_algo/predicate.hpp</headername> for more details. 
            In addition the algorithm <functionname>all()</functionname> checks
            all elements of a container to satisfy a condition specified by a predicate. 
            This predicate can be any unary predicate, but the library provides a bunch of 
            useful string-related predicates ready for use.
            These are located in the <headername>boost/string_algo/classification.hpp</headername> header.
        </para>
    </section>
    <section>
        <title>Trimming</title>
        
        <para>
            When parsing the input of a user, strings usually have unwanted leading or trailing 
            characters. To get rid of them, we need trim functions:
        </para>
        <programlisting>
    string str1="     hello world!     ";
    string str2=trim_left_copy(str1);   // str2 == "hello world!     "
    string str3=trim_right_copy(str2);  // str3 == "     hello world!"
    trim(str1);                         // str1 == "hello world!"

    string phone="00423333444";
    // remove leading 0 from the phone number
    sa::trim_left(phone,is_any_of&lt;char&gt;("0")); // phone == "423333444"
        </programlisting>
        <para>
            It is possible to trim the spaces on the right, on the left or on the both sides of a string.
            And for those cases when there is a need to remove something else than blank space, the
            <code>string_algo</code> namespace contains generic versions of the trim algorithms. Using these, 
            a user can specify a functor which will select the <emphasis>space</emphasis> to be removed. It is possible to use 
            classification predicates like <functionname>is_digit()</functionname> mentioned in the previous paragraph.
            See the reference for the <headername>boost/string_algo/trim.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Find algorithms</title>
        
        <para>
            The library contains a set of find algorithms. Here is an example:
        </para>
        <programlisting>
    char text[]="hello dolly!";
    iterator_range&lt;char*&gt; result=find_last(text,"ll");

    transform( result.begin(), result.end(), result.begin(), bind2nd(plus&lt;char&gt;(), 1) );
    // text = "hello dommy!"            

    to_upper(result); // text == "hello doMMy!"
        </programlisting>
        <para>
            We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
            The result is given in the <link linkend="string_algo.iterator_range"><code>iterator_range</code></link>. 
            This range delimits the
            part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".
            
            As we can see, input of the <functionname>find_last()</functionname> algorithm can be also 
            char[] because this type is supported by 
            <link linkend="string_algo.container_traits">container_traits</link>.

            Following lines transform the result. Notice, that 
            <link linkend="string_algo.iterator_range"><code>iterator_range</code></link> have familiar 
            <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
        </para>
    </section>
    <section>
        <title>Replace Algorithms</title>
        <para>
            Find algorithms can be used for searching for a specific part of the sequence. Replace goes one step
            further. After a matching part is found, it is substituted with something else. The substitution is computed
            from an original, using some transformation. 
        </para>
        <programlisting>
    string str1="Hello  Dolly,   Hello World!"
    replace_first(str1, "Dolly", "Jane");      // str1 == "Hello  Jane,   Hello World!"
    replace_last(str1, "Hello", "Goodbye");    // str1 == "Hello  Jane,   Goodbye World!"
    erase_all(str1, " ");                      // str1 == "HelloJane,GoodbyeWorld!"
    erase_head(str1, 6);                       // str1 == "Jane,GoodbyeWorld!"
        </programlisting>
        <para>
            For the complete list of replace and erase functions see the 
            <link linkend="string_algo.reference">reference</link>.
            There is a lot of predefined function for common usage, however, the library allows you to 
            define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>replace()</functionname> 
            function which takes two parameters.
            The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is 
            a <link linkend="string_algo.formatter_concept">Formatter</link> object. 
            The Finder object is a functor which performs the searching for the replacement part. The Formatter object
            takes the result of the Finder (usually a reference to the found substring) and creates a 
            substitute for it. Replace algorithm puts these two together and makes the desired substitution. 
        </para>
    </section>
    <section>
        <title>Split</title>

        <para>
            Split algorithms allow one to divide a sequence into parts. Each part represents a 
            <emphasis>token</emphasis> and  tokens are separated by <emphasis>separators</emphasis>. 
            One can either search for tokens or search for separators:
        </para>

        <programlisting>
    string str1("hello abc-*-ABC-*-aBc goodbye");

    typedef vector&lt; iterator_range&lt;string::iterator&gt; &gt; find_vector_type;
    
    find_vector_type FindVec; // #1: Search for separators
    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }

    typdef vector&lt; string &gt; split_vector_type;
    
    split_vector_type SplitVec; // #2: Search for tokens
    split( SplitVec, str1, is_any_of&lt;char&gt;("-*") ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
        </programlisting>
        <para>
            <code>[hello]</code> designates an <code>iterator_range</code> delimiting this substring.                       
        </para>
        <para>
            The result of a split algorithm is a <emphasis>container of containers</emphasis>. There is only one restriction:
            The inner container type must be able to hold extracted parts of the input sequence. This example 
            shows the special case where the inner container is an 
            <link linkend="string_algo.iterator_range"><code>iterator_range</code></link> 
            instead of e.g. <code>std::string</code>. This way, a user gets a reference 
            (in the form of iterators) delimiting the parts of the input sequence. Otherwise, a copy of 
            each extracted part is created and added to the outer container.
        </para>
        <para>
            So to recap, there are two basic algorithms: <functionname>find_all()</functionname> 
            returns extracts the parts
            matching the specification whereas <functionname>split()</functionname> uses the matching 
            parts as delimiters, and extracts the parts in between them. 
        </para>
        <para>
            Generalizations of these two algorithms are called <functionname>iter_find()</functionname> and 
            <functionname>iter_split()</functionname>. They take a 
            <link linkend="string_algo.finder_concept">Finder</link> object, as an argument to search for 
            the substring. 
        </para>     
    </section>
</section>
String Algorithm Library: initial commit [SVN r22431] 2004-03-04 22:12:19 +00:00			`<?xml version="1.0" encoding="utf-8"?>`
			`<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"`
			`"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">`
			`<section id="string_algo.usage" last-revision="$Date$">`
			`<title>Usage</title>`

			`<using-namespace name="boost"/>`
			`<using-namespace name="boost::string_algo"/>`


			`<section>`
			`<title>First Example</title>`

			`<para>`
			`Using the algorithms is straightforward. Let us have a look at the first example:`
			`</para>`
			`<programlisting>`
			`#include <boost/string_algo.hpp>`
			`using namespace std;`
			`using namespace boost;`
			`namespace sa=boost::string_algo`

			`// ...`

			`string str1(" hello world! ");`
			`trim( to_upper(str1) ); // str1 == "HELLO WORLD!"`

			`string str2=ireplace_first_copy(str1,"hello","goodbye"); // str2 == "goodbye WORLD!"`
			`</programlisting>`
			`<para>`
			`This example converts str1 to upper case and trims spaces from the start and the end`
			`of the string. str2 is then created as a copy of str1 with "hello" replaced with "goodbye".`
			`This example demonstrates several important concepts used in the library:`
			`</para>`
			`<itemizedlist>`
			`<listitem>`
			`<para><emphasis role="bold">Container parameters:</emphasis>`
			`Unlike the STL algorithms, parameters are not specified only in form`
			`of iterators. The STL convention allows for great flexibility,`
			`but it has several limitation. It is not possible to <emphasis>stack</emphasis> algorithms together,`
			`because a container is passed in two parameters, so it is not possible to use`
			`a return value from another algorithm. It is considerably easier to write`
			`<code>to_lower(str1)</code>, then <code>to_lower(str1.begin(), str1.end())</code>.`
			`</para>`
			`<para>`
			`The magic of <link linkend="string_algo.container_traits">container_traits</link>`
			`provides a uniform way of handling different containers.`
			`If there is a need to pass a pair of iterators,`
			`<link linkend="string_algo.iterator_range"><code>iterator_range</code></link>`
			`can be used to package iterators into a structure with the container interface.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para><emphasis role="bold">Copy vs. Mutable:</emphasis>`
			`Many algorithms in the library are performing a transformation of the input.`
			`The transformation can be done in-place, mutating the input sequence, or a copy`
			`of the transformed input can be created, leaving the input intact. None of`
			`these possibilities is superior to the other one and both have different`
			`advantages and disadvantages. For this reason, both are provided with the library.`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para><emphasis role="bold">Algorithm stacking:</emphasis>`
			`Copy versions return a transformed input as a result. Mutable variants return`
			`a reference to the input. Thus both versions allow a simple chaining of`
			`transformations within one expression (i.e. one can write`
			`<code>trim_copy(to_upper_copy(s))</code> as well as <code>trim(to_upper(s))</code>).`
			`</para>`
			`</listitem>`
			`<listitem>`
			`<para><emphasis role="bold">Naming:</emphasis>`
			`Naming follows the conventions from the Standard C++ Library. If there is a`
			`copy and mutable version of the same algorithm, the mutable version has no suffix`
			`and the copy version has suffix <emphasis>_copy</emphasis>.`
			`Some algorithms have prefix <emphasis>i</emphasis>`
			`(e.g. <functionname>ifind_first()</functionname>).`
			`This prefix identifies that the algorithm works in a case-insensitive manner.`
			`</para>`
			`</listitem>`
			`</itemizedlist>`
			`<para>`
			`To use the library, include the <headername>boost/string_algo.hpp</headername> header.`
			`If the regex related functions are needed, include the`
			`<headername>boost/string_algo_regex.hpp</headername> header.`
			`</para>`
			`</section>`
			`<section>`
			`<title>Case conversion</title>`

			`<para>`
			`STL has a nice way of converting character case. Unfortunately, it works only`
			`for a single character and we want to convert a string,`
			`</para>`
			`<programlisting>`
			`string str1("HeLlO WoRld!");`
			`to_upper(str1); // str1=="HELLO WORLD!"`
			`</programlisting>`
			`<para>`
			`<functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of`
			`characters in a container using a specified locale.`
			`</para>`
			`</section>`
			`<section>`
			`<title>Predicates and Classification</title>`
			`<para>`
			`A part of the library deals with string related predicates. Consider this example:`
			`</para>`
			`<programlisting>`
			`bool is_executable( string& filename )`
			`{`
			`return`
			`iends_with(filename, ".exe") \|\|`
			`iends_with(filename, ".com");`
			`}`

			`// ...`
			`string str1("command.com");`
			`cout`
			`<< str1`
			`<< is_executable("command.com")? "is": "is not"`
			`<< "an executable"`
			`<< endl; // prints "command.com is an executable"`

			`//..`
			`char text1[]="hello world!";`
			`cout`
			`<< text1`
			`<< all( text1, is_lower<char>() )? "is": "is not"`
			`<< "written in the lower case"`
			`<< endl; // prints "hello world! is written in the lower case"`
			`</programlisting>`
			`<para>`
			`The predicates are resolving if a substring is contained in the input string`
			`under various conditions. The conditions are if a string starts with the substring,`
			`ends with the substring,`
			`simply contains the substring or if both strings are equal. See the reference for`
			`<headername>boost/string_algo/predicate.hpp</headername> for more details.`
			`In addition the algorithm <functionname>all()</functionname> checks`
			`all elements of a container to satisfy a condition specified by a predicate.`
			`This predicate can be any unary predicate, but the library provides a bunch of`
			`useful string-related predicates ready for use.`
			`These are located in the <headername>boost/string_algo/classification.hpp</headername> header.`
			`</para>`
			`</section>`
			`<section>`
			`<title>Trimming</title>`

			`<para>`
			`When parsing the input of a user, strings usually have unwanted leading or trailing`
			`characters. To get rid of them, we need trim functions:`
			`</para>`
			`<programlisting>`
			`string str1=" hello world! ";`
			`string str2=trim_left_copy(str1); // str2 == "hello world! "`
			`string str3=trim_right_copy(str2); // str3 == " hello world!"`
			`trim(str1); // str1 == "hello world!"`

			`string phone="00423333444";`
			`// remove leading 0 from the phone number`
			`sa::trim_left(phone,is_any_of<char>("0")); // phone == "423333444"`
			`</programlisting>`
			`<para>`
			`It is possible to trim the spaces on the right, on the left or on the both sides of a string.`
			`And for those cases when there is a need to remove something else than blank space, the`
			`<code>string_algo</code> namespace contains generic versions of the trim algorithms. Using these,`
			`a user can specify a functor which will select the <emphasis>space</emphasis> to be removed. It is possible to use`
			`classification predicates like <functionname>is_digit()</functionname> mentioned in the previous paragraph.`
			`See the reference for the <headername>boost/string_algo/trim.hpp</headername>.`
			`</para>`
			`</section>`
			`<section>`
			`<title>Find algorithms</title>`

			`<para>`
			`The library contains a set of find algorithms. Here is an example:`
			`</para>`
			`<programlisting>`
			`char text[]="hello dolly!";`
			`iterator_range<char*> result=find_last(text,"ll");`

			`transform( result.begin(), result.end(), result.begin(), bind2nd(plus<char>(), 1) );`
			`// text = "hello dommy!"`

			`to_upper(result); // text == "hello doMMy!"`
			`</programlisting>`
			`<para>`
			`We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".`
			`The result is given in the <link linkend="string_algo.iterator_range"><code>iterator_range</code></link>.`
			`This range delimits the`
			`part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".`

			`As we can see, input of the <functionname>find_last()</functionname> algorithm can be also`
			`char[] because this type is supported by`
			`<link linkend="string_algo.container_traits">container_traits</link>.`

			`Following lines transform the result. Notice, that`
			`<link linkend="string_algo.iterator_range"><code>iterator_range</code></link> have familiar`
			`<code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.`
			`</para>`
			`</section>`
			`<section>`
			`<title>Replace Algorithms</title>`
			`<para>`
			`Find algorithms can be used for searching for a specific part of the sequence. Replace goes one step`
			`further. After a matching part is found, it is substituted with something else. The substitution is computed`
			`from an original, using some transformation.`
			`</para>`
			`<programlisting>`
			`string str1="Hello Dolly, Hello World!"`
			`replace_first(str1, "Dolly", "Jane"); // str1 == "Hello Jane, Hello World!"`
			`replace_last(str1, "Hello", "Goodbye"); // str1 == "Hello Jane, Goodbye World!"`
			`erase_all(str1, " "); // str1 == "HelloJane,GoodbyeWorld!"`
			`erase_head(str1, 6); // str1 == "Jane,GoodbyeWorld!"`
			`</programlisting>`
			`<para>`
			`For the complete list of replace and erase functions see the`
			`<link linkend="string_algo.reference">reference</link>.`
			`There is a lot of predefined function for common usage, however, the library allows you to`
			`define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>replace()</functionname>`
			`function which takes two parameters.`
			`The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is`
			`a <link linkend="string_algo.formatter_concept">Formatter</link> object.`
			`The Finder object is a functor which performs the searching for the replacement part. The Formatter object`
			`takes the result of the Finder (usually a reference to the found substring) and creates a`
			`substitute for it. Replace algorithm puts these two together and makes the desired substitution.`
			`</para>`
			`</section>`
			`<section>`
			`<title>Split</title>`

			`<para>`
			`Split algorithms allow one to divide a sequence into parts. Each part represents a`
			`<emphasis>token</emphasis> and tokens are separated by <emphasis>separators</emphasis>.`
			`One can either search for tokens or search for separators:`
			`</para>`

			`<programlisting>`
			`string str1("hello abc--ABC--aBc goodbye");`

			`typedef vector< iterator_range<string::iterator> > find_vector_type;`

			`find_vector_type FindVec; // #1: Search for separators`
			`ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }`

			`typdef vector< string > split_vector_type;`

			`split_vector_type SplitVec; // #2: Search for tokens`
			`split( SplitVec, str1, is_any_of<char>("-*") ); // SplitVec == { "hello abc","ABC","aBc goodbye" }`
			`</programlisting>`
			`<para>`
			`<code>[hello]</code> designates an <code>iterator_range</code> delimiting this substring.`
			`</para>`
			`<para>`
			`The result of a split algorithm is a <emphasis>container of containers</emphasis>. There is only one restriction:`
			`The inner container type must be able to hold extracted parts of the input sequence. This example`
			`shows the special case where the inner container is an`
			`<link linkend="string_algo.iterator_range"><code>iterator_range</code></link>`
			`instead of e.g. <code>std::string</code>. This way, a user gets a reference`
			`(in the form of iterators) delimiting the parts of the input sequence. Otherwise, a copy of`
			`each extracted part is created and added to the outer container.`
			`</para>`
			`<para>`
			`So to recap, there are two basic algorithms: <functionname>find_all()</functionname>`
			`returns extracts the parts`
			`matching the specification whereas <functionname>split()</functionname> uses the matching`
			`parts as delimiters, and extracts the parts in between them.`
			`</para>`
			`<para>`
			`Generalizations of these two algorithms are called <functionname>iter_find()</functionname> and`
			`<functionname>iter_split()</functionname>. They take a`
			`<link linkend="string_algo.finder_concept">Finder</link> object, as an argument to search for`
			`the substring.`
			`</para>`
			`</section>`
			`</section>`