<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
<section id="string_algo.design" last-revision="$Date$">
<title>Design Topics</title>

<using-namespace name="boost"/>
<using-namespace name="boost::algorithm"/>

<section id="string_algo.string">
<title>String Representation</title>

<para>
As the name suggests, this library works mainly with strings. However, in the context of this library,
a string is not restricted to any particular implementation (like <code>std::basic_string</code>);
rather, it is a concept. This allows the algorithms in this library to be reused for any string type
that satisfies the given requirements.
</para>
<para>
<emphasis role="bold">Definition:</emphasis> A string is a
<ulink url="../../libs/utility/Collection.html">collection</ulink> of characters accessible in a sequential,
ordered fashion. A character is any value type with "cheap" copying and assignment.
</para>
<para>
The first requirement of a string type is that it must be accessible using
<link linkend="string_algo.collection_traits">collection traits</link>. This facility allows the
elements inside the string to be accessed in a uniform, iterator-based fashion.
This facility actually imposes weaker requirements than the collection concept. It implements an
<ulink url="../../libs/algorithm/string/doc/external_concepts.html">external</ulink> collection interface.
This is sufficient for our library.
</para>
<para>
The second requirement defines the way in which the characters are stored in the string. Algorithms in
this library work under the assumption that copying a character is cheaper than allocating extra
storage to cache results. This is a natural assumption for common character types. Algorithms will
work even if this requirement is not satisfied, but at the cost of degraded performance.
</para>
<para>
In addition, some algorithms have additional requirements on the string type. In particular, some
algorithms need to create a new string of the given type. In this case, the type is required to
satisfy the sequence (Std §23.1.1) requirements.
</para>
<para>
In the reference and also in the code, the requirement on the string type is designated by the name of
the template argument. <code>CollectionT</code> means that the basic collection requirements must hold.
<code>SequenceT</code> designates the extended sequence requirements.
</para>
</section>

<section id="string_algo.iterator_range">
<title><code>iterator_range</code> class</title>

<para>
An <classname>iterator_range</classname> is an encapsulation of a pair of iterators that
delimit a sequence (or, a range). This concept is widely used by
sequence-manipulating algorithms. Although it is so useful, there is no direct support
for it in the standard library (the closest thing is that some algorithms return a pair of iterators).
Instead, all STL algorithms have two distinct parameters for the beginning and the end of a range. This design
is natural for the implementation of generic algorithms, but it makes it impossible to work with a range as a single value.
</para>
<para>
It is possible to encapsulate a range in <code>std::pair&lt;&gt;</code>, but
<code>std::pair&lt;&gt;</code> is too generic an encapsulation, so it is not the best match for a range.
For instance, it does not enforce that the begin and end iterators are of the same type.
</para>
<para>
Naturally, the range concept is heavily used in this library as well. During the development of
the library, it was discovered that there is a need for a reasonable encapsulation of it.
A core part of the library deals with substring searching algorithms. Every such algorithm
returns a range delimiting the result of the search. <code>std::pair&lt;&gt;</code> was considered
unsuitable. Therefore, <code>iterator_range</code> was defined.
</para>
<para>
The intention of the <code>iterator_range</code> class is to manage a range as a single value and to provide
a basic interface for common operations. Its interface is similar to that of a collection.
In addition to the <code>begin()</code>
and <code>end()</code> accessors, it has member functions for checking whether the range is empty
and for determining the size of the range. It also has a set of member typedefs that extract
type information from the encapsulated iterators. As such, the interface is compatible with
the <link linkend="string_algo.collection_traits">collection traits</link> requirements, so
it is possible to use this class as a parameter to many algorithms in this library.
</para>
<para>
<classname>iterator_range</classname> will be moved to the Boost.Range library in future
releases. The internal version will be deprecated then.
</para>
</section>

<section id="string_algo.collection_traits">
<title>Collection Traits</title>

<para>
Collection traits provide uniform access to different types of
<ulink url="../../libs/utility/Collection.html">collections</ulink>.
This functionality allows one to write generic algorithms which work with several
different kinds of collections. For this library it means that, for instance,
many algorithms work with <code>std::string</code> as well as with <code>char[]</code>.
This facility implements the
<ulink url="../../libs/algorithm/string/doc/external_concepts.html">external</ulink> collection
concept.
</para>
<para>
The following collection types are supported:
<itemizedlist>
<listitem>
Standard containers
</listitem>
<listitem>
Built-in arrays (like <code>int[]</code>)
</listitem>
<listitem>
Null-terminated strings (this includes <code>char[]</code>, <code>wchar_t[]</code>, <code>char*</code>, and <code>wchar_t*</code>)
</listitem>
<listitem>
<code>std::pair&lt;iterator,iterator&gt;</code>
</listitem>
</itemizedlist>
</para>
<para>
Collection traits support a subset of the container concept (Std §23.1). This subset
can be described as an input container concept, i.e. a container with immutable content.
Its definition can be found in the header <headername>boost/algorithm/string/collection_traits.hpp</headername>.
</para>
<para>
In the table, <code>C</code> denotes a container and <code>c</code> is an object of <code>C</code>.
</para>
<table>
<title>Collection Traits</title>
<tgroup cols="3" align="left">
<thead>
<row>
<entry>Name</entry>
<entry>Standard collection equivalent</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><classname>value_type_of&lt;C&gt;</classname>::type</entry>
<entry><code>C::value_type</code></entry>
<entry>Type of contained values</entry>
</row>
<row>
<entry><classname>difference_type_of&lt;C&gt;</classname>::type</entry>
<entry><code>C::difference_type</code></entry>
<entry>Difference type of the collection</entry>
</row>
<row>
<entry><classname>iterator_of&lt;C&gt;</classname>::type</entry>
<entry><code>C::iterator</code></entry>
<entry>Iterator type of the collection</entry>
</row>
<row>
<entry><classname>const_iterator_of&lt;C&gt;</classname>::type</entry>
<entry><code>C::const_iterator</code></entry>
<entry>Const iterator type of the collection</entry>
</row>
<row>
<entry><classname>result_iterator_of&lt;C&gt;</classname>::type</entry>
<entry></entry>
<entry>
Result iterator type of the collection. This type maps to <code>C::iterator</code>
for a mutable collection and to <code>C::const_iterator</code> for a const collection.
</entry>
</row>
<row>
<entry><functionname>begin(c)</functionname></entry>
<entry><code>c.begin()</code></entry>
<entry>
Gets the iterator pointing to the start of the collection.
</entry>
</row>
<row>
<entry><functionname>end(c)</functionname></entry>
<entry><code>c.end()</code></entry>
<entry>
Gets the past-the-end iterator of the collection.
</entry>
</row>
<row>
<entry><functionname>size(c)</functionname></entry>
<entry><code>c.size()</code></entry>
<entry>
Gets the size of the collection.
</entry>
</row>
<row>
<entry><functionname>empty(c)</functionname></entry>
<entry><code>c.empty()</code></entry>
<entry>
Checks whether the collection is empty.
</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
The collection traits are only a temporary part of this library. They will be replaced in future
releases by the Boost.Range library. Use of the internal implementation will be deprecated then.
</para>
</section>

<section id="string_algo.sequence_traits">
<title>Sequence Traits</title>

<para>
The major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
they provide, but rather in the inner details of the classes and the way they perform
various operations. The problem is that it is not possible to infer this difference from the
definitions of the classes without some special mechanism.
However, some algorithms can run significantly faster with knowledge of the properties
of a particular container.
</para>
<para>
Sequence traits allow one to specify additional properties of a sequence container (see Std §23.1.1).
These properties are then used by algorithms to select optimized handling for some operations.
The sequence traits are declared in the header
<headername>boost/algorithm/string/sequence_traits.hpp</headername>.
</para>
<para>
In the table, <code>C</code> denotes a container and <code>c</code> is an object of <code>C</code>.
</para>
<table>
<title>Sequence Traits</title>
<tgroup cols="2" align="left">
<thead>
<row>
<entry>Trait</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><classname>has_native_replace&lt;C&gt;</classname>::value</entry>
<entry>Specifies that the sequence has a <code>std::string</code>-like <code>replace</code> method</entry>
</row>
<row>
<entry><classname>has_stable_iterators&lt;C&gt;</classname>::value</entry>
<entry>
Specifies that the sequence has stable iterators. This means
that operations like <code>insert</code>/<code>erase</code>/<code>replace</code>
do not invalidate iterators.
</entry>
</row>
<row>
<entry><classname>has_const_time_insert&lt;C&gt;</classname>::value</entry>
<entry>
Specifies that the insert method of the sequence has
constant time complexity.
</entry>
</row>
<row>
<entry><classname>has_const_time_erase&lt;C&gt;</classname>::value</entry>
<entry>
Specifies that the erase method of the sequence has constant time complexity.
</entry>
</row>
</tbody>
</tgroup>
</table>

<para>
The current implementation contains specializations for <code>std::list&lt;T&gt;</code> and
<code>std::basic_string&lt;T&gt;</code> from the standard library and SGI's <code>std::rope&lt;T&gt;</code> and <code>std::slist&lt;T&gt;</code>.
</para>
</section>

<section id="string_algo.find">
<title>Find Algorithms</title>

<para>
Find algorithms have functionality similar to the <code>std::search()</code> algorithm. They provide a different
interface which is more suitable for common string operations.
Instead of returning just the start of the matching subsequence, they return a range, which is necessary
when the length of the matching subsequence is not known beforehand.
This feature also allows the input sequence to be partitioned into three
parts: a prefix, a substring and a suffix.
</para>
<para>
Another difference is the addition of various searching methods besides <code>find_first</code>, including <code>find_regex</code>.
</para>
<para>
In the library, find algorithms are implemented in terms of
<link linkend="string_algo.finder_concept">Finders</link>. Finders are also used by other facilities
(replace, split).
For convenience, there are also function wrappers for these finders to simplify find operations.
</para>
<para>
Currently, the library contains only a naive implementation of find algorithms, with complexity
O(n * m), where n is the size of the input sequence and m is the size of the search sequence.
There are algorithms with linear complexity, but for smaller sequences their constant overhead is
rather big. For m &lt;&lt; n (m smaller than n by orders of magnitude) the current implementation
provides acceptable efficiency.
Even the C++ standard specifies the required complexity of the search algorithm as O(n * m).
It is possible that a future version of the library will also contain algorithms with linear
complexity as an option.
</para>
</section>

<section id="string_algo.replace">
<title>Replace Algorithms</title>

<para>
The implementation of replace algorithms follows the layered structure of the library. The
lower layer implements generic substitution of a range in the input sequence.
This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a
<link linkend="string_algo.formatter_concept">Formatter</link> object as input. These two
functors define what to replace and what to replace it with. The upper layer functions
are just wrappers around calls to the lower layer. Finders are shared with the find and split facilities.
</para>
<para>
As usual, the implementation of the lower layer is designed to work with a generic sequence while
taking advantage of specific features if possible
(by using <link linkend="string_algo.sequence_traits">Sequence traits</link>).
</para>
</section>

<section id="string_algo.split">
<title>Find Iterators &amp; Split Algorithms</title>

<para>
Find iterators are a logical extension of the <link linkend="string_algo.find">find facility</link>.
Instead of searching for one match, the whole input can be searched iteratively for multiple matches.
The result of the search is then used to partition the input. Which parts are returned as the result
depends on the algorithm. They can be the matching parts (<classname>find_iterator</classname>) or the parts in
between (<classname>split_iterator</classname>).
</para>
<para>
In addition, split algorithms like <functionname>find_all()</functionname> and <functionname>split()</functionname>
simplify common operations. They use a find iterator to search the whole input and copy the
matching parts (<functionname>find_all()</functionname>) or the parts in between (<functionname>split()</functionname>)
into the supplied container.
</para>
</section>

<section id="string_algo.exception">
<title>Exception Safety</title>

<para>
The library provides certain exception safety guarantees under the following assumptions:
<orderedlist numeration="arabic">
<listitem>
<para>
All types that are used as template arguments or passed as arguments to the
facilities in this library provide the <emphasis>basic exception guarantee</emphasis>.
</para>
</listitem>
<listitem>
<para>
If the types mentioned in the first assumption can provide the
<emphasis>strong exception guarantee</emphasis> for their const operations, some algorithms
can provide stronger guarantees.
</para>
</listitem>
</orderedlist>
</para>
<para>
Unless stated otherwise, all facilities and algorithms in this library provide the <emphasis>basic exception guarantee</emphasis>.
</para>
</section>

</section>