[/ Copyright 2006-2007 Daniel James.
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration; containers will typically
have more buckets).

[$../../libs/unordered/doc/diagrams/buckets.png]

In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.

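The two steps described above, hashing the key and then reducing the hash value
to a bucket index, can be sketched as follows. This is only an illustration:
`toy_bucket_index` and `toy_bucket_for` are hypothetical names, `std::hash` stands
in for `Hash`, and the modulo reduction is an assumption. The transformation a
real container applies is an implementation detail and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: reduce a hash value to a bucket index with modulo.
// This is an assumed reduction; real containers may use something else.
inline std::size_t toy_bucket_index(std::size_t hash_value,
                                    std::size_t bucket_count)
{
    assert(bucket_count > 0);
    return hash_value % bucket_count;
}

// Hypothetical helper: apply both steps, hash the key and then reduce it.
inline std::size_t toy_bucket_for(std::string const& key,
                                  std::size_t bucket_count)
{
    std::hash<std::string> hasher; // stand-in for the container's Hash
    return toy_bucket_index(hasher(key), bucket_count);
}
```

Calling `toy_bucket_for("A", 7)` twice gives the same index both times, which is
what lets a later lookup go straight to the right bucket.
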
If the container later needs to find an element, it just has to apply the same
process to the element's key to discover which bucket it is in. If the hash
function has worked well, the elements will be evenly distributed amongst the
buckets, so it will only have to examine a small number of elements.

You can see in the diagram that `A` and `D` have been placed in the same bucket.
This means that when looking for these elements, or another element that would
be placed in the same bucket, up to 2 comparisons have to be made, making
searching slower. This is known as a collision. To keep things fast we try to
keep these to a minimum.

[table Methods for Accessing Buckets
    [[Method] [Description]]

    [
        [``size_type bucket_count() const``]
        [The number of buckets.]
    ]
    [
        [``size_type max_bucket_count() const``]
        [An upper bound on the number of buckets.]
    ]
    [
        [``size_type bucket_size(size_type n) const``]
        [The number of elements in bucket `n`.]
    ]
    [
        [``size_type bucket(key_type const& k) const``]
        [Returns the index of the bucket which would contain `k`.]
    ]
    [
        [``
            local_iterator begin(size_type n);
            local_iterator end(size_type n);
            const_local_iterator begin(size_type n) const;
            const_local_iterator end(size_type n) const;
        ``]
        [Return begin and end iterators for bucket `n`.]
    ]
]

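As a rough sketch of how the bucket interface fits together, the helpers below
walk the buckets of a set. They use `std::unordered_set<int>` as a stand-in, on
the assumption that it offers the same bucket interface as the containers
described here. `total_bucket_sizes` and `found_in_own_bucket` are hypothetical
names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: sum bucket_size(n) over every bucket.
// The total should match the container's size().
inline std::size_t total_bucket_sizes(std::unordered_set<int> const& set)
{
    std::size_t total = 0;
    for (std::size_t n = 0; n < set.bucket_count(); ++n)
        total += set.bucket_size(n);
    return total;
}

// Hypothetical helper: search for key only in the bucket that
// bucket(key) reports, using the local iterators for that bucket.
inline bool found_in_own_bucket(std::unordered_set<int> const& set, int key)
{
    std::size_t const n = set.bucket(key);
    assert(n < set.bucket_count());
    for (auto it = set.begin(n); it != set.end(n); ++it)
        if (*it == key) return true;
    return false;
}
```

For a key that is not in the set, `bucket` still returns the index of the bucket
that would contain it, so `found_in_own_bucket` simply finds nothing there.
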
[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number
of elements in the buckets will increase, causing performance to degrade. To
combat this, the containers increase the bucket count as elements are inserted.

The standard gives you two methods to influence the bucket count. The first is
to specify the minimum number of buckets, either in the constructor or later by
calling `rehash`.

The other method is the `max_load_factor` member function. The 'load factor'
is the average number of elements per bucket; `max_load_factor` can be used
to give a /hint/ of a value that the load factor should be kept below. The
draft standard doesn't actually require the container to pay much attention
to this value. The only time the load factor is /required/ to be less than the
maximum is following a call to `rehash`. But most implementations will probably
try to keep the load factor below the maximum, and set the maximum load factor
to something the same as or near to your hint, unless your hint is unreasonably
small.

It is not specified anywhere how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to reach the maximum. This will
typically mean that `insert` will only change the number of buckets when an
insertion causes this.

In a similar manner to using `reserve` for `vector`s, it can be a good idea
to call `rehash` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the number of buckets, not the number of
elements, which is why the new size is divided by the maximum load factor. The
`+ 1` guarantees there is no invalidation; without it, reallocation could occur
if the number of buckets exactly divides the target size, since the container is
allowed to rehash when the load factor is equal to the maximum load factor.]

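The `rehash` formula can be wrapped in a small helper, sketched below.
`buckets_needed` and `prepare_for_insert` are hypothetical names, and
`std::unordered_set<int>` is used as a stand-in on the assumption that it
behaves like the containers described here.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: the bucket count to request before inserting n
// more elements, following the formula (size + n) / max_load_factor + 1.
inline std::size_t buckets_needed(std::size_t current_size, std::size_t n,
                                  float max_load_factor)
{
    assert(max_load_factor > 0);
    return static_cast<std::size_t>((current_size + n) / max_load_factor) + 1;
}

// Hypothetical helper: rehash up front so that the following n insertions
// should not need to change the bucket count.
inline void prepare_for_insert(std::unordered_set<int>& x, std::size_t n)
{
    x.rehash(buckets_needed(x.size(), n, x.max_load_factor()));
}
```

After `prepare_for_insert(x, n)`, comparing `x.bucket_count()` before and after
the insertions is a simple way to check that no rehash took place.
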
[table Methods for Controlling Bucket Size
    [[Method] [Description]]

    [
        [``float load_factor() const``]
        [The average number of elements per bucket.]
    ]
    [
        [``float max_load_factor() const``]
        [Returns the current maximum load factor.]
    ]
    [
        [``void max_load_factor(float z)``]
        [Changes the container's maximum load factor, using `z` as a hint.]
    ]
    [
        [``void rehash(size_type n)``]
        [Changes the number of buckets so that there are at least `n` buckets, and
        so that the load factor is less than the maximum load factor.]
    ]

]

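The load factor functions can be exercised together, as in the sketch below. It
again uses `std::unordered_set<int>` as a stand-in, and `computed_load_factor`
and `tighten_and_rehash` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: recompute the load factor by hand as the average
// number of elements per bucket.
inline float computed_load_factor(std::unordered_set<int> const& x)
{
    assert(x.bucket_count() > 0);
    return static_cast<float>(x.size()) / static_cast<float>(x.bucket_count());
}

// Hypothetical helper: give a new maximum load factor hint z, then ask
// for at least n buckets.
inline void tighten_and_rehash(std::unordered_set<int>& x, float z,
                               std::size_t n)
{
    x.max_load_factor(z);
    x.rehash(n);
}
```

After `tighten_and_rehash(x, 0.5f, 4)` on a set of 8 elements, `x.bucket_count()`
is at least 4 and `x.load_factor()` does not exceed `x.max_load_factor()`.
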
[endsect]