mirror of
				https://github.com/boostorg/algorithm.git
				synced 2025-11-04 01:31:39 +01:00 
			
		
		
		
	
		
			
	
	
		
			282 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			XML
		
	
	
	
	
	
		
		
			
		
	
	
			282 lines
		
	
	
		
			15 KiB
		
	
	
	
		
			XML
		
	
	
	
	
	
| 
								 | 
							
								<?xml version="1.0" encoding="utf-8"?>
							 | 
						||
| 
								 | 
							
								<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
							 | 
						||
| 
								 | 
							
								"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
							 | 
						||
| 
								 | 
							
								<section id="string_algo.design" last-revision="$Date$">
							 | 
						||
| 
								 | 
							
								    <title>Design Topics</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								    <using-namespace name="boost"/>
							 | 
						||
| 
								 | 
							
								    <using-namespace name="boost::string_algo"/>
							 | 
						||
| 
								 | 
							
								    
							 | 
						||
| 
								 | 
							
								    <section id="string_algo.iterator_range">
							 | 
						||
| 
								 | 
							
								        <title><code>iterator_range</code> class</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            An <classname>iterator_range</classname> is an encapsulation of a pair of iterators that
							 | 
						||
| 
								 | 
							
								            delimit a sequence (or, a range). This concept is widely used by 
							 | 
						||
| 
								 | 
							
								            sequence manipulating algorithms. Although being so useful, there no direct support 
							 | 
						||
| 
								 | 
							
								            for it in the standard library (The closest thing is that some algorithms return a pair of iterators). 
							 | 
						||
| 
								 | 
							
								            Instead all STL algorithms have two distinct parameters for beginning and end of a range. This design 
							 | 
						||
| 
								 | 
							
								            is natural for implementation of generic algorithms, but it forbids to work with a range as a single value. 
							 | 
						||
| 
								 | 
							
								        </para> 
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            It is possible to encapsulate a range in <code>std::pair<></code>, but
							 | 
						||
| 
								 | 
							
								            the <code>std::pair<></code> is a too generic encapsulation, so it is not best match for a range.
							 | 
						||
| 
								 | 
							
								            For instance, it does not enforce that begin and end iterators are of the same type.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Naturally the range concept is heavily used also in this library. During the development of
							 | 
						||
| 
								 | 
							
								            the library, it was discovered, that there is a need for a reasonable encapsulation for it.
							 | 
						||
| 
								 | 
							
								            A core part of the library deals with substring searching algorithms. Any such an algorithm,
							 | 
						||
| 
								 | 
							
								            returns a range delimiting the result of the search. <code>std::pair<></code> was considered as 
							 | 
						||
| 
								 | 
							
								            unsuitable. Therefore the <code>iterator_range</code> was defined.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            The intention of the <code>iterator_range</code> class is to manage a range as a single value and provide 
							 | 
						||
| 
								 | 
							
								            a basic interface for common operations. Its interface is similar to that of container. 
							 | 
						||
| 
								 | 
							
								            In addition of <code>begin()</code>
							 | 
						||
| 
								 | 
							
								            and <code>end()</code> accessors, it has member functions for checking if the range is empty,
							 | 
						||
| 
								 | 
							
								            or to determine the size of the range. It has also a set of member typedefs that extract
							 | 
						||
| 
								 | 
							
								            type information from the encapsulated iterators. As such, the interface is compatible with 
							 | 
						||
| 
								 | 
							
								            the <link linkend="string_algo.container_traits">container traits</link> requirements so
							 | 
						||
| 
								 | 
							
								            it is possible to use this class as a parameter to many algorithms in this library.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								    </section>
							 | 
						||
| 
								 | 
							
								        
							 | 
						||
| 
								 | 
							
								    <section id="string_algo.container_traits">
							 | 
						||
| 
								 | 
							
								        <title>Container Traits</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Container traits provide uniform access to different types of containers. 
							 | 
						||
| 
								 | 
							
								            This functionality allows to write generic algorithms which work with several 
							 | 
						||
| 
								 | 
							
								            different kinds of containers. For this library it means, that, for instance,
							 | 
						||
| 
								 | 
							
								            many algorithms work with <code>std::string</code> as well as with <code>char[]</code>.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            The following container types are supported:
							 | 
						||
| 
								 | 
							
								            <itemizedlist>
							 | 
						||
| 
								 | 
							
								                <listitem>
							 | 
						||
| 
								 | 
							
								                    Standard containers
							 | 
						||
| 
								 | 
							
								                </listitem>
							 | 
						||
| 
								 | 
							
								                <listitem>
							 | 
						||
| 
								 | 
							
								                    Built-in arrays (like int[])
							 | 
						||
| 
								 | 
							
								                </listitem>
							 | 
						||
| 
								 | 
							
								                <listitem>
							 | 
						||
| 
								 | 
							
								                    Null terminated strings (this includes char[],wchar_t[],char*, and wchar_t*)
							 | 
						||
| 
								 | 
							
								                </listitem>
							 | 
						||
| 
								 | 
							
								                <listitem>
							 | 
						||
| 
								 | 
							
								                    std::pair<iterator,iterator>
							 | 
						||
| 
								 | 
							
								                </listitem>
							 | 
						||
| 
								 | 
							
								            </itemizedlist>
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Container traits support a subset of container concept (Std §23.1). This subset 
							 | 
						||
| 
								 | 
							
								            can be described as an input container concept, e.g. a container with an immutable content. 
							 | 
						||
| 
								 | 
							
								            Its definition can be found in the header <headername>boost/string_algo/container_traits.hpp</headername>.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            In the table C denotes a container and c is an object of C. 
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <table>
							 | 
						||
| 
								 | 
							
								            <title>Container Traits</title>
							 | 
						||
| 
								 | 
							
								            <tgroup cols="3" align="left">
							 | 
						||
| 
								 | 
							
								                <thead>
							 | 
						||
| 
								 | 
							
								                    <row>   
							 | 
						||
| 
								 | 
							
								                        <entry>Name</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>Standard container equivalent</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>Description</entry>
							 | 
						||
| 
								 | 
							
								                    </row>Maeterlinck
							 | 
						||
| 
								 | 
							
								                </thead>
							 | 
						||
| 
								 | 
							
								                <tbody>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>container_value_type<C></classname>::type</entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>C::value_type</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>Type of contained values</entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>container_difference_type<C></classname>::type</entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>C::difference_type</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>difference type of the container</entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>container_iterator<C></classname>::type</entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>C::iterator</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>iterator type of the container</entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>container_const_iterator<C></classname>::type</entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>C::const_iterator</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>const_iterator type of the container</entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>container_result_iterator<C></classname>::type</entry>
							 | 
						||
| 
								 | 
							
								                        <entry></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            result_iterator type of the container. This type maps to <code>C::iterator</code>
							 | 
						||
| 
								 | 
							
								                            for mutable container and <code>C::const_iterator</code> for const containers.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><functionname>begin(c)</functionname></entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>c.begin()</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Gets the iterator pointing to the start of the container.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><functionname>end(c)</functionname></entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>c.end()</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Gets the iterator pointing to the end of the container.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><functionname>size(c)</functionname></entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>c.size()</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Gets the size of the container.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><functionname>empty(c)</functionname></entry>
							 | 
						||
| 
								 | 
							
								                        <entry><code>c.empty()</code></entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Checks if the container is empty.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                </tbody>
							 | 
						||
| 
								 | 
							
								            </tgroup>
							 | 
						||
| 
								 | 
							
								        </table>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            The container traits are only a temporary part of this library. There is a plan for a separate submission
							 | 
						||
| 
								 | 
							
								            of a container_traits library to Boost. Once it gets accepted, String Algorithm Library will be adopted to 
							 | 
						||
| 
								 | 
							
								            use it and the internal implementation will be deprecated.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								    
							 | 
						||
| 
								 | 
							
								    </section>
							 | 
						||
| 
								 | 
							
								    <section id="string_algo.sequence_traits">
							 | 
						||
| 
								 | 
							
								        <title>Sequence Traits</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
							 | 
						||
| 
								 | 
							
								            they provide, rather in the inner details of the class and the way how it performs 
							 | 
						||
| 
								 | 
							
								            various operation. The problem is that it is not possible to infer this difference from the 
							 | 
						||
| 
								 | 
							
								            definitions of classes without some special mechanism.
							 | 
						||
| 
								 | 
							
								            However some algorithms can run significantly faster with the knowledge of the properties
							 | 
						||
| 
								 | 
							
								            of a particular container.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Sequence traits allows one to specify additional properties of a sequence container (see Std.§32.2).
							 | 
						||
| 
								 | 
							
								            These properties are then used by algorithms to select optimized handling for some operations.
							 | 
						||
| 
								 | 
							
								            The sequence traits are declared in the header 
							 | 
						||
| 
								 | 
							
								            <headername>boost/string_algo/sequence_traits.hpp</headername>.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            In the table C denotes a container and c is an object of C.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <table>
							 | 
						||
| 
								 | 
							
								            <title>Sequence Traits</title>
							 | 
						||
| 
								 | 
							
								            <tgroup cols="2" align="left">
							 | 
						||
| 
								 | 
							
								                <thead>
							 | 
						||
| 
								 | 
							
								                    <row>   
							 | 
						||
| 
								 | 
							
								                        <entry>Trait</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>Description</entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                </thead>
							 | 
						||
| 
								 | 
							
								                <tbody>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>sequence_has_native_replace<C></classname>::value</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>Specifies that the sequence has std::string like replace method</entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>sequence_has_stable_iterators<C></classname>::value</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Specifies that the sequence has stable iterators. It means,
							 | 
						||
| 
								 | 
							
								                            that operations like <code>insert</code>/<code>erase</code>/<code>replace</code> 
							 | 
						||
| 
								 | 
							
								                            do not invalidate iterators.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>sequence_has_const_time_insert<C></classname>::value</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Specifies that the insert method of the sequence has 
							 | 
						||
| 
								 | 
							
								                            constant time complexity.
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    <row>
							 | 
						||
| 
								 | 
							
								                        <entry><classname>sequence_has_const_time_erase<C></classname>::value</entry>
							 | 
						||
| 
								 | 
							
								                        <entry>
							 | 
						||
| 
								 | 
							
								                            Specifies that the erase method of the sequence has constant time complexity
							 | 
						||
| 
								 | 
							
								                        </entry>
							 | 
						||
| 
								 | 
							
								                    </row>
							 | 
						||
| 
								 | 
							
								                    </tbody>
							 | 
						||
| 
								 | 
							
								            </tgroup>
							 | 
						||
| 
								 | 
							
								        </table>
							 | 
						||
| 
								 | 
							
								        
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Current implementation contains specializations for std::list<T> and
							 | 
						||
| 
								 | 
							
								            std::basic_string<T> from the standard library and SGI's std::rope<T> and std::slist<T>.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								    </section>
							 | 
						||
| 
								 | 
							
								    <section id="string_algo.find">
							 | 
						||
| 
								 | 
							
								        <title>Find Algorithms</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Find algorithms have similar functionality to <code>std::search()</code> algorithm. They provide a different 
							 | 
						||
| 
								 | 
							
								            interface which is more suitable for common string operations. 
							 | 
						||
| 
								 | 
							
								            Instead of returning just the start of matching subsequence they return a range which is necessary 
							 | 
						||
| 
								 | 
							
								            when the length of the matching subsequence is not known beforehand. 
							 | 
						||
| 
								 | 
							
								            This feature also allows a partitioning of  the input sequence into three 
							 | 
						||
| 
								 | 
							
								            parts: a prefix, a substring and a suffix. 
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Another difference is an addition of various searching methods besides find_first, including find_regex. 
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            It the library, find algorithms are implemented in terms of 
							 | 
						||
| 
								 | 
							
								            <link linkend="string_algo.finder_concept">Finders</link>. Finders are used also by other facilities 
							 | 
						||
| 
								 | 
							
								            (replace,split).
							 | 
						||
| 
								 | 
							
								            For convenience, there are also function wrappers for these finders to simplify find operations.
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Currently the library contains only naive implementation of find algorithms with complexity 
							 | 
						||
| 
								 | 
							
								            O(n * m) where n is the size of the input sequence and m is the size of the search sequence. 
							 | 
						||
| 
								 | 
							
								            There are algorithms with complexity O(n), but for smaller sequence a constant overhead is 
							 | 
						||
| 
								 | 
							
								            rather big. For small m << n (m magnitued smaller than n) the current implementation 
							 | 
						||
| 
								 | 
							
								            provides acceptable efficiency. 
							 | 
						||
| 
								 | 
							
								            Even the C++ standard defines the required complexity for search algorithm as O(n * m). 
							 | 
						||
| 
								 | 
							
								            It is possible that a future version of library will also contain algorithms with linear 
							 | 
						||
| 
								 | 
							
								            complexity as an option
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								    </section>
							 | 
						||
| 
								 | 
							
								    <section id="string_algo.replace">
							 | 
						||
| 
								 | 
							
								        <title>Replace Algorithms</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            The implementation of replace algorithms follows the layered structure of the library. The 
							 | 
						||
| 
								 | 
							
								            lower layer implements generic substitution of a range in the input sequence. 
							 | 
						||
| 
								 | 
							
								            This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a 
							 | 
						||
| 
								 | 
							
								            <link linkend="string_algo.formatter_concept">Formatter</link> object as an input. These two 
							 | 
						||
| 
								 | 
							
								            functors define what to replace and what to replace it with. The upper layer functions 
							 | 
						||
| 
								 | 
							
								            are just wrapping calls to the lower layer. Finders are shared with the find and split facility. 
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            As usual, the implementation of the lower layer is designed to work with a generic sequence while
							 | 
						||
| 
								 | 
							
								            taking an advantage of specific features if possible 
							 | 
						||
| 
								 | 
							
								            (by using <link linkend="string_algo.sequence_traits">Sequence traits</link>)
							 | 
						||
| 
								 | 
							
								        </para>         
							 | 
						||
| 
								 | 
							
								    </section>
							 | 
						||
| 
								 | 
							
								    <section id="string_algo.split">
							 | 
						||
| 
								 | 
							
								        <title>Split Algorithms</title>
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								        <para>
							 | 
						||
| 
								 | 
							
								            Split algorithms are a logical extension of <link linkend="string_algo.find">find facility</link>.
							 | 
						||
| 
								 | 
							
								            Instead of searching for one match, the whole input is searched. The result of the search is then used 
							 | 
						||
| 
								 | 
							
								            to partition the input. It depends on the algorithms which parts are returned as the result of
							 | 
						||
| 
								 | 
							
								            split operations. It can be the matching parts (<functionname>find_all()</functionname>) of the parts in
							 | 
						||
| 
								 | 
							
								            between (<functionname>split()</functionname>). 
							 | 
						||
| 
								 | 
							
								        </para>
							 | 
						||
| 
								 | 
							
								    </section>
							 | 
						||
| 
								 | 
							
								</section>
							 |