mirror of
https://github.com/boostorg/unordered.git
synced 2025-07-31 11:57:15 +02:00
Trying to make the unordered documentation a little better.
[SVN r4407]
doc/buckets.qbk | 148 lines (new file)
@@ -0,0 +1,148 @@
[/ Copyright 2006-2007 Daniel James.
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration; in practice containers
will have more buckets).

[$../../libs/unordered/doc/diagrams/buckets.png]

To decide which bucket to place an element in, the container applies the hash
function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but it is referred to as the
key so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`, which has a much greater range of values than the
number of buckets, so the container applies another transformation to that
value to choose a bucket to place the element in.
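
The exact transformation is left to the implementation. As a rough sketch of
the idea only (not necessarily the mapping this library uses), the reduction
could be as simple as a modulo by the bucket count:

    #include <boost/functional/hash.hpp>
    #include <cstddef>
    #include <string>

    // Illustrative only: one possible way of reducing a hash value to a bucket
    // index. The transformation actually used is implementation-defined.
    std::size_t pick_bucket(std::string const& key, std::size_t bucket_count)
    {
        boost::hash<std::string> hasher;
        std::size_t hash_value = hasher(key); // anywhere in the range of std::size_t
        return hash_value % bucket_count;     // reduced to [0, bucket_count)
    }
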
To find an element later, the container just applies the same process to the
element's key to discover which bucket it is in. If the hash function has
worked well the elements will be evenly distributed amongst the buckets, so
only a small number of elements needs to be examined.

You can see in the diagram that `A` & `D` have been placed in the same bucket.
This means that when looking for one of these elements, or for another element
that would be placed in the same bucket, up to 2 comparisons have to be made,
making searching slower. This is known as a collision. To keep things fast we
try to keep collisions to a minimum.

[table Methods for Accessing Buckets
    [[Method] [Description]]

    [
        [``size_type bucket_count() const``]
        [The number of buckets.]
    ]
    [
        [``size_type max_bucket_count() const``]
        [An upper bound on the number of buckets.]
    ]
    [
        [``size_type bucket_size(size_type n) const``]
        [The number of elements in bucket `n`.]
    ]
    [
        [``size_type bucket(key_type const& k) const``]
        [The index of the bucket which would contain `k`.]
    ]
    [
        [``
local_iterator begin(size_type n);
local_iterator end(size_type n);
const_local_iterator begin(size_type n) const;
const_local_iterator end(size_type n) const;
        ``]
        [Return begin and end iterators for bucket `n`.]
    ]
]
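
For example, the bucket interface can be used to see how the elements of a
container have been distributed (a small sketch; the bucket count, indices and
sizes you see will vary between implementations):

    #include <boost/unordered_set.hpp>
    #include <cstddef>
    #include <iostream>
    #include <string>

    int main()
    {
        boost::unordered_set<std::string> x;
        x.insert("A"); x.insert("B"); x.insert("C");
        x.insert("D"); x.insert("E");

        std::cout << "bucket count: " << x.bucket_count() << "\n";

        // Which bucket would "A" be placed in, and how many elements share it?
        std::size_t n = x.bucket("A");
        std::cout << "bucket " << n << " holds " << x.bucket_size(n)
                  << " element(s):\n";

        // Iterate over just that one bucket using the local iterators.
        boost::unordered_set<std::string>::const_local_iterator it, end;
        for (it = x.begin(n), end = x.end(n); it != end; ++it)
            std::cout << "  " << *it << "\n";
    }
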
[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number
of elements in the buckets will increase, causing performance to degrade. To
combat this, the containers increase the bucket count as elements are inserted.

The standard gives you two ways to influence the bucket count: you can specify
the minimum number of buckets in the constructor, and later on you can request
a change by calling `rehash`.
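
For example (a sketch; the container is free to round the requested count up
to a value it prefers):

    // Ask for at least 1024 buckets when the container is constructed.
    boost::unordered_set<int> x(1024);

    // ... later, request a larger number of buckets explicitly:
    x.rehash(4096);
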
The other method is the `max_load_factor` member function. The 'load factor'
is the average number of elements per bucket, and `max_load_factor` can be used
to give a /hint/ of a value that the load factor should be kept below. The
draft standard doesn't actually require the container to pay much attention
to this value. The only time the load factor is /required/ to be less than the
maximum is following a call to `rehash`. But most implementations will probably
try to keep the load factor below the maximum, and set the maximum load factor
to the same as, or close to, your hint - unless your hint is unreasonably small.

It is not specified anywhere how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to reach the maximum. In practice
this typically means that `insert` will only change the number of buckets when
an insertion pushes the load factor up to the maximum.

In a similar manner to using `reserve` for `vector`s, it can be a good idea
to call `rehash` before inserting a large number of elements. This gets
the expensive rehashing out of the way and lets you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the number of buckets, not the number of
elements, which is why the new size is divided by the maximum load factor. The
`+ 1` is required because the container is allowed to resize when the load
factor is equal to the maximum load factor.]
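
Putting that together, a sketch of the pattern (a fragment; `load_words` is a
hypothetical function returning the new elements):

    std::vector<std::string> new_elements = load_words(); // hypothetical data source
    boost::unordered_set<std::string> x;

    // Make room for the current elements plus the new ones up front, so the
    // insertions below shouldn't trigger any further rehashing.
    x.rehash((x.size() + new_elements.size()) / x.max_load_factor() + 1);

    for (std::size_t i = 0; i < new_elements.size(); ++i)
        x.insert(new_elements[i]);
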
[table Methods for Controlling Bucket Size
    [[Method] [Description]]

    [
        [``float load_factor() const``]
        [The average number of elements per bucket.]
    ]
    [
        [``float max_load_factor() const``]
        [Returns the current maximum load factor.]
    ]
    [
        [``float max_load_factor(float z)``]
        [Changes the container's maximum load factor, using `z` as a hint.]
    ]
    [
        [``void rehash(size_type n)``]
        [Changes the number of buckets so that there are at least `n` buckets,
        and so that the load factor is less than the maximum load factor.]
    ]
]
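
For example, to trade memory for fewer collisions you could lower the maximum
load factor before filling the container (remembering that the value you pass
is only a hint):

    boost::unordered_map<std::string, int> x;
    x.max_load_factor(0.5f); // hint: aim for roughly two buckets per element

    // ... insert elements ...

    std::cout << "load factor:     " << x.load_factor() << "\n";
    std::cout << "max load factor: " << x.max_load_factor() << "\n";
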
[/ I'm not at all happy with this section. So I've commented it out.]

[/ h2 Rehash Techniques]

[/If the container has a load factor much smaller than the maximum, `rehash`
might decrease the number of buckets, reducing the memory usage. This isn't
guaranteed by the standard but this implementation will do it.

If you want to stop the table from ever rehashing due to an insert, you can
set the maximum load factor to infinity (or perhaps a load factor that it'll
never reach - say `x.max_size()`). As you can only give a 'hint' for the
maximum load factor, this isn't guaranteed to work. But again, it'll work in
this implementation. (TODO: If an unordered container with an infinite load
factor is copied, bad things could happen. So maybe this advice should be
removed. Or maybe the implementation should cope with that.)

If you do this and want to make the container rehash, `rehash` will still work.
But be careful that you only ever call it with a sufficient number of buckets
- otherwise it's very likely that the container will decrease the bucket
count to an overly small amount.]

[endsect]
doc/intro.qbk | 113 lines (new file)
@@ -0,0 +1,113 @@
[/ Copyright 2006-2007 Daniel James.
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[def __tr1__
    [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2009.pdf
    C++ Standard Library Technical Report]]
[def __boost-tr1__
    [@http://www.boost.org/doc/html/boost_tr1.html
    Boost.TR1]]
[def __draft__
    [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2009.pdf
    Working Draft of the C++ Standard]]
[def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table
    hash table]]
[def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function
    hash function]]

[section:intro Introduction]

For accessing data based on key lookup, the C++ standard library offers `std::set`,
`std::map`, `std::multiset` and `std::multimap`. These are generally
implemented using balanced binary trees, so that lookup time has
logarithmic complexity. That is generally okay, but in many cases a
__hash-table__ can perform better, as accessing data has constant complexity
on average. The worst case complexity is linear, but that occurs rarely and,
with some care, can be avoided.

Also, the existing containers require a 'less than' comparison object
to order their elements. For some data types this is impossible to implement
or isn't practical. In contrast, a hash table only needs an equality function
and a hash function for the key.
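
For example, a type with no natural ordering can still be used as a key, as
long as it can be compared for equality and hashed (a sketch using a made-up
`point` type; the free function `hash_value` is picked up by `boost::hash` via
argument-dependent lookup):

    #include <boost/functional/hash.hpp>
    #include <cstddef>

    struct point
    {
        int x, y;
    };

    bool operator==(point const& a, point const& b)
    {
        return a.x == b.x && a.y == b.y;
    }

    std::size_t hash_value(point const& p)
    {
        std::size_t seed = 0;
        boost::hash_combine(seed, p.x);
        boost::hash_combine(seed, p.y);
        return seed;
    }

    // point has no 'less than', but it can still be stored in the unordered
    // containers described below.
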
So the __tr1__ introduced the unordered associative containers, which are
implemented using hash tables, and they have now been added to the __draft__.

This library supplies a standards-compliant implementation of those containers,
and is proposed for addition to Boost. If accepted, they should also be added
to __boost-tr1__.

`unordered_set` and `unordered_multiset` are defined in the header
<[headerref boost/unordered_set.hpp]>

    namespace boost {
        template <
            class Key,
            class Hash = boost::hash<Key>,
            class Pred = std::equal_to<Key>,
            class Alloc = std::allocator<Key> >
        class ``[classref boost::unordered_set unordered_set]``;

        template <
            class Key,
            class Hash = boost::hash<Key>,
            class Pred = std::equal_to<Key>,
            class Alloc = std::allocator<Key> >
        class ``[classref boost::unordered_multiset unordered_multiset]``;
    }
`unordered_map` and `unordered_multimap` are defined in the header
<[headerref boost/unordered_map.hpp]>

    namespace boost {
        template <
            class Key, class T,
            class Hash = boost::hash<Key>,
            class Pred = std::equal_to<Key>,
            class Alloc = std::allocator<Key> >
        class ``[classref boost::unordered_map unordered_map]``;

        template <
            class Key, class T,
            class Hash = boost::hash<Key>,
            class Pred = std::equal_to<Key>,
            class Alloc = std::allocator<Key> >
        class ``[classref boost::unordered_multimap unordered_multimap]``;
    }

If you are using Boost.TR1, these classes will be available from the standard
headers `<unordered_set>` and `<unordered_map>`, in the `std::tr1` namespace.
The containers are used in a similar manner to the normal associative
containers:

    #include <``[headerref boost/unordered_map.hpp]``>
    #include <cassert>
    #include <string>

    int main()
    {
        boost::unordered_map<std::string, int> x;
        x["one"] = 1;
        x["two"] = 2;
        x["three"] = 3;

        assert(x["one"] == 1);
        assert(x["missing"] == 0);
    }

But since the elements aren't ordered, the output of:

    typedef boost::unordered_map<std::string, int> map;
    BOOST_FOREACH(map::value_type i, x) {
        std::cout << i.first << "," << i.second << "\n";
    }

can be in any order. For example, it might be:

    two,2
    one,1
    three,3
    missing,0
There are other differences, which will be detailed later.

[endsect]