Files
boost_algorithm/string/doc/design.xml
2004-03-04 22:12:19 +00:00

282 lines
15 KiB
XML

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
<section id="string_algo.design" last-revision="$Date$">
<title>Design Topics</title>
<using-namespace name="boost"/>
<using-namespace name="boost::string_algo"/>
<section id="string_algo.iterator_range">
<title><code>iterator_range</code> class</title>
<para>
An <classname>iterator_range</classname> is an encapsulation of a pair of iterators that
delimit a sequence (or, a range). This concept is widely used by
sequence manipulating algorithms. Although being so useful, there no direct support
for it in the standard library (The closest thing is that some algorithms return a pair of iterators).
Instead all STL algorithms have two distinct parameters for beginning and end of a range. This design
is natural for implementation of generic algorithms, but it forbids to work with a range as a single value.
</para>
<para>
It is possible to encapsulate a range in <code>std::pair&lt;&gt;</code>, but
the <code>std::pair&lt;&gt;</code> is a too generic encapsulation, so it is not best match for a range.
For instance, it does not enforce that begin and end iterators are of the same type.
</para>
<para>
Naturally the range concept is heavily used also in this library. During the development of
the library, it was discovered, that there is a need for a reasonable encapsulation for it.
A core part of the library deals with substring searching algorithms. Any such an algorithm,
returns a range delimiting the result of the search. <code>std::pair&lt;&gt;</code> was considered as
unsuitable. Therefore the <code>iterator_range</code> was defined.
</para>
<para>
The intention of the <code>iterator_range</code> class is to manage a range as a single value and provide
a basic interface for common operations. Its interface is similar to that of container.
In addition of <code>begin()</code>
and <code>end()</code> accessors, it has member functions for checking if the range is empty,
or to determine the size of the range. It has also a set of member typedefs that extract
type information from the encapsulated iterators. As such, the interface is compatible with
the <link linkend="string_algo.container_traits">container traits</link> requirements so
it is possible to use this class as a parameter to many algorithms in this library.
</para>
</section>
<section id="string_algo.container_traits">
<title>Container Traits</title>
<para>
Container traits provide uniform access to different types of containers.
This functionality allows to write generic algorithms which work with several
different kinds of containers. For this library it means, that, for instance,
many algorithms work with <code>std::string</code> as well as with <code>char[]</code>.
</para>
<para>
The following container types are supported:
<itemizedlist>
<listitem>
Standard containers
</listitem>
<listitem>
Built-in arrays (like int[])
</listitem>
<listitem>
Null terminated strings (this includes char[],wchar_t[],char*, and wchar_t*)
</listitem>
<listitem>
std::pair&lt;iterator,iterator&gt;
</listitem>
</itemizedlist>
</para>
<para>
Container traits support a subset of container concept (Std &sect;23.1). This subset
can be described as an input container concept, e.g. a container with an immutable content.
Its definition can be found in the header <headername>boost/string_algo/container_traits.hpp</headername>.
</para>
<para>
In the table C denotes a container and c is an object of C.
</para>
<table>
<title>Container Traits</title>
<tgroup cols="3" align="left">
<thead>
<row>
<entry>Name</entry>
<entry>Standard container equivalent</entry>
<entry>Description</entry>
</row>Maeterlinck
</thead>
<tbody>
<row>
<entry><classname>container_value_type&lt;C&gt;</classname>::type</entry>
<entry><code>C::value_type</code></entry>
<entry>Type of contained values</entry>
</row>
<row>
<entry><classname>container_difference_type&lt;C&gt;</classname>::type</entry>
<entry><code>C::difference_type</code></entry>
<entry>difference type of the container</entry>
</row>
<row>
<entry><classname>container_iterator&lt;C&gt;</classname>::type</entry>
<entry><code>C::iterator</code></entry>
<entry>iterator type of the container</entry>
</row>
<row>
<entry><classname>container_const_iterator&lt;C&gt;</classname>::type</entry>
<entry><code>C::const_iterator</code></entry>
<entry>const_iterator type of the container</entry>
</row>
<row>
<entry><classname>container_result_iterator&lt;C&gt;</classname>::type</entry>
<entry></entry>
<entry>
result_iterator type of the container. This type maps to <code>C::iterator</code>
for mutable container and <code>C::const_iterator</code> for const containers.
</entry>
</row>
<row>
<entry><functionname>begin(c)</functionname></entry>
<entry><code>c.begin()</code></entry>
<entry>
Gets the iterator pointing to the start of the container.
</entry>
</row>
<row>
<entry><functionname>end(c)</functionname></entry>
<entry><code>c.end()</code></entry>
<entry>
Gets the iterator pointing to the end of the container.
</entry>
</row>
<row>
<entry><functionname>size(c)</functionname></entry>
<entry><code>c.size()</code></entry>
<entry>
Gets the size of the container.
</entry>
</row>
<row>
<entry><functionname>empty(c)</functionname></entry>
<entry><code>c.empty()</code></entry>
<entry>
Checks if the container is empty.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The container traits are only a temporary part of this library. There is a plan for a separate submission
of a container_traits library to Boost. Once it gets accepted, String Algorithm Library will be adopted to
use it and the internal implementation will be deprecated.
</para>
</section>
<section id="string_algo.sequence_traits">
<title>Sequence Traits</title>
<para>
Major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
they provide, rather in the inner details of the class and the way how it performs
various operation. The problem is that it is not possible to infer this difference from the
definitions of classes without some special mechanism.
However some algorithms can run significantly faster with the knowledge of the properties
of a particular container.
</para>
<para>
Sequence traits allows one to specify additional properties of a sequence container (see Std.&sect;32.2).
These properties are then used by algorithms to select optimized handling for some operations.
The sequence traits are declared in the header
<headername>boost/string_algo/sequence_traits.hpp</headername>.
</para>
<para>
In the table C denotes a container and c is an object of C.
</para>
<table>
<title>Sequence Traits</title>
<tgroup cols="2" align="left">
<thead>
<row>
<entry>Trait</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><classname>sequence_has_native_replace&lt;C&gt;</classname>::value</entry>
<entry>Specifies that the sequence has std::string like replace method</entry>
</row>
<row>
<entry><classname>sequence_has_stable_iterators&lt;C&gt;</classname>::value</entry>
<entry>
Specifies that the sequence has stable iterators. It means,
that operations like <code>insert</code>/<code>erase</code>/<code>replace</code>
do not invalidate iterators.
</entry>
</row>
<row>
<entry><classname>sequence_has_const_time_insert&lt;C&gt;</classname>::value</entry>
<entry>
Specifies that the insert method of the sequence has
constant time complexity.
</entry>
</row>
<row>
<entry><classname>sequence_has_const_time_erase&lt;C&gt;</classname>::value</entry>
<entry>
Specifies that the erase method of the sequence has constant time complexity
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
Current implementation contains specializations for std::list&lt;T&gt; and
std::basic_string&lt;T&gt; from the standard library and SGI's std::rope&lt;T&gt; and std::slist&lt;T&gt;.
</para>
</section>
<section id="string_algo.find">
<title>Find Algorithms</title>
<para>
Find algorithms have similar functionality to <code>std::search()</code> algorithm. They provide a different
interface which is more suitable for common string operations.
Instead of returning just the start of matching subsequence they return a range which is necessary
when the length of the matching subsequence is not known beforehand.
This feature also allows a partitioning of the input sequence into three
parts: a prefix, a substring and a suffix.
</para>
<para>
Another difference is an addition of various searching methods besides find_first, including find_regex.
</para>
<para>
It the library, find algorithms are implemented in terms of
<link linkend="string_algo.finder_concept">Finders</link>. Finders are used also by other facilities
(replace,split).
For convenience, there are also function wrappers for these finders to simplify find operations.
</para>
<para>
Currently the library contains only naive implementation of find algorithms with complexity
O(n * m) where n is the size of the input sequence and m is the size of the search sequence.
There are algorithms with complexity O(n), but for smaller sequence a constant overhead is
rather big. For small m &lt;&lt; n (m magnitued smaller than n) the current implementation
provides acceptable efficiency.
Even the C++ standard defines the required complexity for search algorithm as O(n * m).
It is possible that a future version of library will also contain algorithms with linear
complexity as an option
</para>
</section>
<section id="string_algo.replace">
<title>Replace Algorithms</title>
<para>
The implementation of replace algorithms follows the layered structure of the library. The
lower layer implements generic substitution of a range in the input sequence.
This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a
<link linkend="string_algo.formatter_concept">Formatter</link> object as an input. These two
functors define what to replace and what to replace it with. The upper layer functions
are just wrapping calls to the lower layer. Finders are shared with the find and split facility.
</para>
<para>
As usual, the implementation of the lower layer is designed to work with a generic sequence while
taking an advantage of specific features if possible
(by using <link linkend="string_algo.sequence_traits">Sequence traits</link>)
</para>
</section>
<section id="string_algo.split">
<title>Split Algorithms</title>
<para>
Split algorithms are a logical extension of <link linkend="string_algo.find">find facility</link>.
Instead of searching for one match, the whole input is searched. The result of the search is then used
to partition the input. It depends on the algorithms which parts are returned as the result of
split operations. It can be the matching parts (<functionname>find_all()</functionname>) of the parts in
between (<functionname>split()</functionname>).
</para>
</section>
</section>