forked from boostorg/algorithm
282 lines
15 KiB
XML
282 lines
15 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
|
|
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
|
|
<section id="string_algo.design" last-revision="$Date$">
|
|
<title>Design Topics</title>
|
|
|
|
<using-namespace name="boost"/>
|
|
<using-namespace name="boost::string_algo"/>
|
|
|
|
<section id="string_algo.iterator_range">
|
|
<title><code>iterator_range</code> class</title>
|
|
|
|
<para>
|
|
An <classname>iterator_range</classname> is an encapsulation of a pair of iterators that
|
|
delimit a sequence (or, a range). This concept is widely used by
|
|
sequence manipulating algorithms. Although being so useful, there no direct support
|
|
for it in the standard library (The closest thing is that some algorithms return a pair of iterators).
|
|
Instead all STL algorithms have two distinct parameters for beginning and end of a range. This design
|
|
is natural for implementation of generic algorithms, but it forbids to work with a range as a single value.
|
|
</para>
|
|
<para>
|
|
It is possible to encapsulate a range in <code>std::pair<></code>, but
|
|
the <code>std::pair<></code> is a too generic encapsulation, so it is not best match for a range.
|
|
For instance, it does not enforce that begin and end iterators are of the same type.
|
|
</para>
|
|
<para>
|
|
Naturally the range concept is heavily used also in this library. During the development of
|
|
the library, it was discovered, that there is a need for a reasonable encapsulation for it.
|
|
A core part of the library deals with substring searching algorithms. Any such an algorithm,
|
|
returns a range delimiting the result of the search. <code>std::pair<></code> was considered as
|
|
unsuitable. Therefore the <code>iterator_range</code> was defined.
|
|
</para>
|
|
<para>
|
|
The intention of the <code>iterator_range</code> class is to manage a range as a single value and provide
|
|
a basic interface for common operations. Its interface is similar to that of container.
|
|
In addition of <code>begin()</code>
|
|
and <code>end()</code> accessors, it has member functions for checking if the range is empty,
|
|
or to determine the size of the range. It has also a set of member typedefs that extract
|
|
type information from the encapsulated iterators. As such, the interface is compatible with
|
|
the <link linkend="string_algo.container_traits">container traits</link> requirements so
|
|
it is possible to use this class as a parameter to many algorithms in this library.
|
|
</para>
|
|
</section>
|
|
|
|
<section id="string_algo.container_traits">
|
|
<title>Container Traits</title>
|
|
|
|
<para>
|
|
Container traits provide uniform access to different types of containers.
|
|
This functionality allows to write generic algorithms which work with several
|
|
different kinds of containers. For this library it means, that, for instance,
|
|
many algorithms work with <code>std::string</code> as well as with <code>char[]</code>.
|
|
</para>
|
|
<para>
|
|
The following container types are supported:
|
|
<itemizedlist>
|
|
<listitem>
|
|
Standard containers
|
|
</listitem>
|
|
<listitem>
|
|
Built-in arrays (like int[])
|
|
</listitem>
|
|
<listitem>
|
|
Null terminated strings (this includes char[],wchar_t[],char*, and wchar_t*)
|
|
</listitem>
|
|
<listitem>
|
|
std::pair<iterator,iterator>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
<para>
|
|
Container traits support a subset of container concept (Std §23.1). This subset
|
|
can be described as an input container concept, e.g. a container with an immutable content.
|
|
Its definition can be found in the header <headername>boost/string_algo/container_traits.hpp</headername>.
|
|
</para>
|
|
<para>
|
|
In the table C denotes a container and c is an object of C.
|
|
</para>
|
|
<table>
|
|
<title>Container Traits</title>
|
|
<tgroup cols="3" align="left">
|
|
<thead>
|
|
<row>
|
|
<entry>Name</entry>
|
|
<entry>Standard container equivalent</entry>
|
|
<entry>Description</entry>
|
|
</row>Maeterlinck
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><classname>container_value_type<C></classname>::type</entry>
|
|
<entry><code>C::value_type</code></entry>
|
|
<entry>Type of contained values</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>container_difference_type<C></classname>::type</entry>
|
|
<entry><code>C::difference_type</code></entry>
|
|
<entry>difference type of the container</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>container_iterator<C></classname>::type</entry>
|
|
<entry><code>C::iterator</code></entry>
|
|
<entry>iterator type of the container</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>container_const_iterator<C></classname>::type</entry>
|
|
<entry><code>C::const_iterator</code></entry>
|
|
<entry>const_iterator type of the container</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>container_result_iterator<C></classname>::type</entry>
|
|
<entry></entry>
|
|
<entry>
|
|
result_iterator type of the container. This type maps to <code>C::iterator</code>
|
|
for mutable container and <code>C::const_iterator</code> for const containers.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><functionname>begin(c)</functionname></entry>
|
|
<entry><code>c.begin()</code></entry>
|
|
<entry>
|
|
Gets the iterator pointing to the start of the container.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><functionname>end(c)</functionname></entry>
|
|
<entry><code>c.end()</code></entry>
|
|
<entry>
|
|
Gets the iterator pointing to the end of the container.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><functionname>size(c)</functionname></entry>
|
|
<entry><code>c.size()</code></entry>
|
|
<entry>
|
|
Gets the size of the container.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><functionname>empty(c)</functionname></entry>
|
|
<entry><code>c.empty()</code></entry>
|
|
<entry>
|
|
Checks if the container is empty.
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
The container traits are only a temporary part of this library. There is a plan for a separate submission
|
|
of a container_traits library to Boost. Once it gets accepted, String Algorithm Library will be adopted to
|
|
use it and the internal implementation will be deprecated.
|
|
</para>
|
|
|
|
</section>
|
|
<section id="string_algo.sequence_traits">
|
|
<title>Sequence Traits</title>
|
|
|
|
<para>
|
|
Major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
|
|
they provide, rather in the inner details of the class and the way how it performs
|
|
various operation. The problem is that it is not possible to infer this difference from the
|
|
definitions of classes without some special mechanism.
|
|
However some algorithms can run significantly faster with the knowledge of the properties
|
|
of a particular container.
|
|
</para>
|
|
<para>
|
|
Sequence traits allows one to specify additional properties of a sequence container (see Std.§32.2).
|
|
These properties are then used by algorithms to select optimized handling for some operations.
|
|
The sequence traits are declared in the header
|
|
<headername>boost/string_algo/sequence_traits.hpp</headername>.
|
|
</para>
|
|
|
|
<para>
|
|
In the table C denotes a container and c is an object of C.
|
|
</para>
|
|
<table>
|
|
<title>Sequence Traits</title>
|
|
<tgroup cols="2" align="left">
|
|
<thead>
|
|
<row>
|
|
<entry>Trait</entry>
|
|
<entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><classname>sequence_has_native_replace<C></classname>::value</entry>
|
|
<entry>Specifies that the sequence has std::string like replace method</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>sequence_has_stable_iterators<C></classname>::value</entry>
|
|
<entry>
|
|
Specifies that the sequence has stable iterators. It means,
|
|
that operations like <code>insert</code>/<code>erase</code>/<code>replace</code>
|
|
do not invalidate iterators.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>sequence_has_const_time_insert<C></classname>::value</entry>
|
|
<entry>
|
|
Specifies that the insert method of the sequence has
|
|
constant time complexity.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry><classname>sequence_has_const_time_erase<C></classname>::value</entry>
|
|
<entry>
|
|
Specifies that the erase method of the sequence has constant time complexity
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
Current implementation contains specializations for std::list<T> and
|
|
std::basic_string<T> from the standard library and SGI's std::rope<T> and std::slist<T>.
|
|
</para>
|
|
</section>
|
|
<section id="string_algo.find">
|
|
<title>Find Algorithms</title>
|
|
|
|
<para>
|
|
Find algorithms have similar functionality to <code>std::search()</code> algorithm. They provide a different
|
|
interface which is more suitable for common string operations.
|
|
Instead of returning just the start of matching subsequence they return a range which is necessary
|
|
when the length of the matching subsequence is not known beforehand.
|
|
This feature also allows a partitioning of the input sequence into three
|
|
parts: a prefix, a substring and a suffix.
|
|
</para>
|
|
<para>
|
|
Another difference is an addition of various searching methods besides find_first, including find_regex.
|
|
</para>
|
|
<para>
|
|
It the library, find algorithms are implemented in terms of
|
|
<link linkend="string_algo.finder_concept">Finders</link>. Finders are used also by other facilities
|
|
(replace,split).
|
|
For convenience, there are also function wrappers for these finders to simplify find operations.
|
|
</para>
|
|
<para>
|
|
Currently the library contains only naive implementation of find algorithms with complexity
|
|
O(n * m) where n is the size of the input sequence and m is the size of the search sequence.
|
|
There are algorithms with complexity O(n), but for smaller sequence a constant overhead is
|
|
rather big. For small m << n (m magnitued smaller than n) the current implementation
|
|
provides acceptable efficiency.
|
|
Even the C++ standard defines the required complexity for search algorithm as O(n * m).
|
|
It is possible that a future version of library will also contain algorithms with linear
|
|
complexity as an option
|
|
</para>
|
|
</section>
|
|
<section id="string_algo.replace">
|
|
<title>Replace Algorithms</title>
|
|
|
|
<para>
|
|
The implementation of replace algorithms follows the layered structure of the library. The
|
|
lower layer implements generic substitution of a range in the input sequence.
|
|
This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a
|
|
<link linkend="string_algo.formatter_concept">Formatter</link> object as an input. These two
|
|
functors define what to replace and what to replace it with. The upper layer functions
|
|
are just wrapping calls to the lower layer. Finders are shared with the find and split facility.
|
|
</para>
|
|
<para>
|
|
As usual, the implementation of the lower layer is designed to work with a generic sequence while
|
|
taking an advantage of specific features if possible
|
|
(by using <link linkend="string_algo.sequence_traits">Sequence traits</link>)
|
|
</para>
|
|
</section>
|
|
<section id="string_algo.split">
|
|
<title>Split Algorithms</title>
|
|
|
|
<para>
|
|
Split algorithms are a logical extension of <link linkend="string_algo.find">find facility</link>.
|
|
Instead of searching for one match, the whole input is searched. The result of the search is then used
|
|
to partition the input. It depends on the algorithms which parts are returned as the result of
|
|
split operations. It can be the matching parts (<functionname>find_all()</functionname>) of the parts in
|
|
between (<functionname>split()</functionname>).
|
|
</para>
|
|
</section>
|
|
</section>
|