String Algorithm Library: initial commit

[SVN r22431]
2026-07-10 02:11:08 +02:00 · 2004-03-04 22:12:19 +00:00
parent 3d9b6badea
commit 1deef19d55
73 changed files with 12359 additions and 0 deletions
@@ -0,0 +1,275 @@
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
+"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
+<section id="string_algo.usage" last-revision="$Date$">
+    <title>Usage</title>
+
+    <using-namespace name="boost"/>
+    <using-namespace name="boost::string_algo"/>
+
+
+    <section>
+        <title>First Example</title>
+        
+        <para>
+            Using the algorithms is straightforward. Let us have a look at the first example:
+        </para>
+        <programlisting>
+    #include &lt;boost/string_algo.hpp&gt;
+    using namespace std;
+    using namespace boost;
+    namespace sa=boost::string_algo
+    
+    // ...
+
+    string str1(" hello world! ");
+    trim( to_upper(str1) );  // str1 == "HELLO WORLD!"
+
+    string str2=ireplace_first_copy(str1,"hello","goodbye"); // str2 == "goodbye WORLD!"
+        </programlisting>
+        <para>
+            This example converts str1 to upper case and trims spaces from the start and the end
+            of the string. str2 is then created as a copy of str1 with "hello" replaced with "goodbye".
+            This example demonstrates several important concepts used in the library:
+        </para>
+        <itemizedlist>
+            <listitem>
+                <para><emphasis role="bold">Container parameters:</emphasis>
+                    Unlike the STL algorithms, parameters are not specified only in form
+                    of iterators. The STL convention allows for great flexibility,
+                    but it has several limitation. It is not possible to <emphasis>stack</emphasis> algorithms together, 
+                    because a container is passed in two parameters, so it is not possible to use 
+                    a return value from another algorithm. It is considerably easier to write
+                    <code>to_lower(str1)</code>, then <code>to_lower(str1.begin(), str1.end())</code>.
+                </para>
+                <para>
+                    The magic of <link linkend="string_algo.container_traits">container_traits</link> 
+                    provides a uniform way of handling different containers. 
+                    If there is a need to pass a pair of iterators, 
+                    <link linkend="string_algo.iterator_range"><code>iterator_range</code></link>
+                    can be used to package iterators into a structure with the container interface.
+                </para>
+            </listitem>
+            <listitem>
+                <para><emphasis role="bold">Copy vs. Mutable:</emphasis>
+                    Many algorithms in the library are performing a transformation of the input. 
+                    The transformation can be done in-place, mutating the input sequence, or a copy 
+                    of the transformed input can be created, leaving the input intact. None of 
+                    these possibilities is superior to the other one and both have different 
+                    advantages and disadvantages. For this reason, both are provided with the library. 
+                </para>
+            </listitem>
+            <listitem>
+                <para><emphasis role="bold">Algorithm stacking:</emphasis>
+                    Copy versions return a transformed input as a result. Mutable variants return 
+                    a reference to the input. Thus both versions allow a simple chaining of
+                    transformations within one expression (i.e. one can write 
+                    <code>trim_copy(to_upper_copy(s))</code> as well as <code>trim(to_upper(s))</code>). 
+                </para>
+            </listitem>
+            <listitem>
+                <para><emphasis role="bold">Naming:</emphasis>
+                    Naming follows the conventions from the Standard C++ Library. If there is a 
+                    copy and mutable version of the same algorithm, the mutable version has no suffix 
+                    and the copy version has suffix <emphasis>_copy</emphasis>. 
+                    Some algorithms have prefix <emphasis>i</emphasis> 
+                    (e.g. <functionname>ifind_first()</functionname>).
+                    This prefix identifies that the algorithm works in a case-insensitive manner.
+                </para>
+            </listitem>
+        </itemizedlist>
+        <para>
+            To use the library, include the <headername>boost/string_algo.hpp</headername> header. 
+            If the regex related functions are needed, include the 
+            <headername>boost/string_algo_regex.hpp</headername> header.
+        </para>
+    </section>
+    <section>
+        <title>Case conversion</title>
+        
+        <para>
+            STL has a nice way of converting character case. Unfortunately, it works only
+            for a single character and we want to convert a string, 
+        </para>
+        <programlisting>
+    string str1("HeLlO WoRld!");
+    to_upper(str1); // str1=="HELLO WORLD!"
+        </programlisting>
+        <para>
+            <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of 
+            characters in a container using a specified locale.
+        </para>
+    </section>
+    <section>
+        <title>Predicates and Classification</title>
+        <para>
+            A part of the library deals with string related predicates. Consider this example:
+        </para>
+        <programlisting>
+    bool is_executable( string&amp; filename )
+    {
+        return 
+            iends_with(filename, ".exe") ||
+            iends_with(filename, ".com");
+    }
+
+    // ...
+    string str1("command.com");
+    cout 
+        &lt;&lt; str1
+        &lt;&lt; is_executable("command.com")? "is": "is not" 
+        &lt;&lt; "an executable" 
+        &lt;&lt; endl; // prints "command.com is an executable"
+    
+    //..
+    char text1[]="hello world!";
+    cout 
+        &lt;&lt; text1 
+        &lt;&lt; all( text1, is_lower&lt;char&gt;() )? "is": "is not"
+        &lt;&lt; "written in the lower case" 
+        &lt;&lt; endl; // prints "hello world! is written in the lower case"
+        </programlisting>
+        <para>
+            The predicates are resolving if a substring is contained in the input string
+            under various conditions. The conditions are if a string starts with the substring, 
+            ends with the substring, 
+            simply contains the substring or if both strings are equal. See the reference for 
+            <headername>boost/string_algo/predicate.hpp</headername> for more details. 
+            In addition the algorithm <functionname>all()</functionname> checks
+            all elements of a container to satisfy a condition specified by a predicate. 
+            This predicate can be any unary predicate, but the library provides a bunch of 
+            useful string-related predicates ready for use.
+            These are located in the <headername>boost/string_algo/classification.hpp</headername> header.
+        </para>
+    </section>
+    <section>
+        <title>Trimming</title>
+        
+        <para>
+            When parsing the input of a user, strings usually have unwanted leading or trailing 
+            characters. To get rid of them, we need trim functions:
+        </para>
+        <programlisting>
+    string str1="     hello world!     ";
+    string str2=trim_left_copy(str1);   // str2 == "hello world!     "
+    string str3=trim_right_copy(str2);  // str3 == "     hello world!"
+    trim(str1);                         // str1 == "hello world!"
+
+    string phone="00423333444";
+    // remove leading 0 from the phone number
+    sa::trim_left(phone,is_any_of&lt;char&gt;("0")); // phone == "423333444"
+        </programlisting>
+        <para>
+            It is possible to trim the spaces on the right, on the left or on the both sides of a string.
+            And for those cases when there is a need to remove something else than blank space, the
+            <code>string_algo</code> namespace contains generic versions of the trim algorithms. Using these, 
+            a user can specify a functor which will select the <emphasis>space</emphasis> to be removed. It is possible to use 
+            classification predicates like <functionname>is_digit()</functionname> mentioned in the previous paragraph.
+            See the reference for the <headername>boost/string_algo/trim.hpp</headername>.
+        </para>
+    </section>
+    <section>
+        <title>Find algorithms</title>
+        
+        <para>
+            The library contains a set of find algorithms. Here is an example:
+        </para>
+        <programlisting>
+    char text[]="hello dolly!";
+    iterator_range&lt;char*&gt; result=find_last(text,"ll");
+
+    transform( result.begin(), result.end(), result.begin(), bind2nd(plus&lt;char&gt;(), 1) );
+    // text = "hello dommy!"            
+
+    to_upper(result); // text == "hello doMMy!"
+        </programlisting>
+        <para>
+            We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
+            The result is given in the <link linkend="string_algo.iterator_range"><code>iterator_range</code></link>. 
+            This range delimits the
+            part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".
+            
+            As we can see, input of the <functionname>find_last()</functionname> algorithm can be also 
+            char[] because this type is supported by 
+            <link linkend="string_algo.container_traits">container_traits</link>.
+
+            Following lines transform the result. Notice, that 
+            <link linkend="string_algo.iterator_range"><code>iterator_range</code></link> have familiar 
+            <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
+        </para>
+    </section>
+    <section>
+        <title>Replace Algorithms</title>
+        <para>
+            Find algorithms can be used for searching for a specific part of the sequence. Replace goes one step
+            further. After a matching part is found, it is substituted with something else. The substitution is computed
+            from an original, using some transformation. 
+        </para>
+        <programlisting>
+    string str1="Hello  Dolly,   Hello World!"
+    replace_first(str1, "Dolly", "Jane");      // str1 == "Hello  Jane,   Hello World!"
+    replace_last(str1, "Hello", "Goodbye");    // str1 == "Hello  Jane,   Goodbye World!"
+    erase_all(str1, " ");                      // str1 == "HelloJane,GoodbyeWorld!"
+    erase_head(str1, 6);                       // str1 == "Jane,GoodbyeWorld!"
+        </programlisting>
+        <para>
+            For the complete list of replace and erase functions see the 
+            <link linkend="string_algo.reference">reference</link>.
+            There is a lot of predefined function for common usage, however, the library allows you to 
+            define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>replace()</functionname> 
+            function which takes two parameters.
+            The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is 
+            a <link linkend="string_algo.formatter_concept">Formatter</link> object. 
+            The Finder object is a functor which performs the searching for the replacement part. The Formatter object
+            takes the result of the Finder (usually a reference to the found substring) and creates a 
+            substitute for it. Replace algorithm puts these two together and makes the desired substitution. 
+        </para>
+    </section>
+    <section>
+        <title>Split</title>
+
+        <para>
+            Split algorithms allow one to divide a sequence into parts. Each part represents a 
+            <emphasis>token</emphasis> and  tokens are separated by <emphasis>separators</emphasis>. 
+            One can either search for tokens or search for separators:
+        </para>
+
+        <programlisting>
+    string str1("hello abc-*-ABC-*-aBc goodbye");
+
+    typedef vector&lt; iterator_range&lt;string::iterator&gt; &gt; find_vector_type;
+    
+    find_vector_type FindVec; // #1: Search for separators
+    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }
+
+    typdef vector&lt; string &gt; split_vector_type;
+    
+    split_vector_type SplitVec; // #2: Search for tokens
+    split( SplitVec, str1, is_any_of&lt;char&gt;("-*") ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
+        </programlisting>
+        <para>
+            <code>[hello]</code> designates an <code>iterator_range</code> delimiting this substring.                       
+        </para>
+        <para>
+            The result of a split algorithm is a <emphasis>container of containers</emphasis>. There is only one restriction:
+            The inner container type must be able to hold extracted parts of the input sequence. This example 
+            shows the special case where the inner container is an 
+            <link linkend="string_algo.iterator_range"><code>iterator_range</code></link> 
+            instead of e.g. <code>std::string</code>. This way, a user gets a reference 
+            (in the form of iterators) delimiting the parts of the input sequence. Otherwise, a copy of 
+            each extracted part is created and added to the outer container.
+        </para>
+        <para>
+            So to recap, there are two basic algorithms: <functionname>find_all()</functionname> 
+            returns extracts the parts
+            matching the specification whereas <functionname>split()</functionname> uses the matching 
+            parts as delimiters, and extracts the parts in between them. 
+        </para>
+        <para>
+            Generalizations of these two algorithms are called <functionname>iter_find()</functionname> and 
+            <functionname>iter_split()</functionname>. They take a 
+            <link linkend="string_algo.finder_concept">Finder</link> object, as an argument to search for 
+            the substring. 
+        </para>     
+    </section>
+</section>