forked from boostorg/unordered
uploaded current status
@ -5,7 +5,7 @@
|
|||||||
= The Data Structure
|
= The Data Structure
|
||||||
|
|
||||||
The containers are made up of a number of 'buckets', each of which can contain
|
The containers are made up of a number of 'buckets', each of which can contain
|
||||||
any number of elements. For example, the following diagram shows an <<unordered_set,unordered_set>> with 7 buckets containing 5 elements, `A`,
|
any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
|
||||||
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
|
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
|
||||||
have more buckets).
|
have more buckets).
|
||||||
|
|
||||||
@ -31,20 +31,34 @@ equality predicates in the next section>>.
|
|||||||
|
|
||||||
You can see in the diagram that `A` & `D` have been placed in the same bucket.
|
You can see in the diagram that `A` & `D` have been placed in the same bucket.
|
||||||
When looking for elements in this bucket up to 2 comparisons are made, making
|
When looking for elements in this bucket up to 2 comparisons are made, making
|
||||||
the search slower. This is known as a collision. To keep things fast we try to
|
the search slower. This is known as a *collision*. To keep things fast we try to
|
||||||
keep collisions to a minimum.
|
keep collisions to a minimum.
|
||||||
|
|
||||||
|
If instead of `boost::unordered_set` we had used <<unordered_flat_set,`boost::unordered_flat_set`>>, the
|
||||||
|
diagram would look as follows:
|
||||||
|
|
||||||
|
image::buckets oa.png[]
|
||||||
|
|
||||||
|
In open-addressing containers, buckets can hold at most one element; if a collision happens
|
||||||
|
(as is the case of `D` in the example), the element uses some other available bucket in
|
||||||
|
the vicinity of the original position. Given this simpler scenario, Boost.Unordered
|
||||||
|
open-addressing containers offer a very limited API for accessing buckets.
|
||||||
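The probing idea can be made concrete with a deliberately simplified sketch: a fixed-size open-addressing set of `int` keys that resolves collisions by linear probing, i.e. by trying the next slot. It is a toy written for this illustration (the class `toy_flat_set` and its members are hypothetical). It is not Boost.Unordered's actual probing scheme, and it supports neither erasure nor growth. It only shows that every slot holds at most one element and that a colliding key settles in a nearby free slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy open-addressing set of ints using linear probing.
// Fixed capacity, no erasure, no rehashing: for illustration only.
class toy_flat_set {
public:
    explicit toy_flat_set(std::size_t slots) : keys_(slots), used_(slots, false) {
        assert(slots > 0);
    }

    // The "original position" of a key: its home slot.
    std::size_t home(int key) const {
        return static_cast<std::size_t>(static_cast<unsigned>(key)) % keys_.size();
    }

    // Inserts key and returns the slot it occupies (or already occupied);
    // returns -1 if every slot is taken by other keys.
    long insert(int key) {
        const std::size_t n = keys_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (home(key) + i) % n;  // on collision, try the next slot
            if (!used_[pos]) {
                used_[pos] = true;
                keys_[pos] = key;
                return static_cast<long>(pos);
            }
            if (keys_[pos] == key) return static_cast<long>(pos);
        }
        return -1;
    }

    bool contains(int key) const {
        const std::size_t n = keys_.size();
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t pos = (home(key) + i) % n;
            if (!used_[pos]) return false;  // an empty slot ends the probe sequence
            if (keys_[pos] == key) return true;
        }
        return false;
    }

private:
    std::vector<int> keys_;
    std::vector<bool> used_;  // used_[i] is true when slot i holds a key
};
```

With 7 slots, `3` goes to its home slot 3, and `10` (whose home slot is also 3) lands in slot 4, much like `D` in the diagram.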
|
|
||||||
[caption=, title='Table {counter:table-counter}. Methods for Accessing Buckets']
|
[caption=, title='Table {counter:table-counter}. Methods for Accessing Buckets']
|
||||||
[cols="1,.^1", frame=all, grid=rows]
|
[cols="1,.^1", frame=all, grid=rows]
|
||||||
|===
|
|===
|
||||||
|Method |Description
|
2+^h| *All containers*
|
||||||
|
h|*Method* h|*Description*
|
||||||
|
|
||||||
|`size_type bucket_count() const`
|
|`size_type bucket_count() const`
|
||||||
|The number of buckets.
|
|The number of buckets.
|
||||||
|
|
||||||
|
2+^h| *Closed-addressing containers only* +
|
||||||
|
`boost::unordered_[multi]set`, `boost::unordered_[multi]map`
|
||||||
|
h|*Method* h|*Description*
|
||||||
|
|
||||||
|`size_type max_bucket_count() const`
|
|`size_type max_bucket_count() const`
|
||||||
|An upper bound on the number of buckets.
|
|An upper bound on the number of buckets.
|
||||||
|
|
||||||
|`size_type bucket_size(size_type n) const`
|
|`size_type bucket_size(size_type n) const`
|
||||||
|The number of elements in bucket `n`.
|
|The number of elements in bucket `n`.
|
||||||
|
|
||||||
@ -69,14 +83,14 @@ keep collisions to a minimum.
|
|||||||
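Because the closed-addressing containers follow the standard bucket interface, the bucket methods listed above can be tried out with `std::unordered_set` as a stand-in. In the sketch below, `sum_of_bucket_sizes` and `elements_in_shared_buckets` are our own helper names, not library functions. The bucket sizes always add up to `size()`, and buckets holding more than one element are exactly where collisions occurred. Which keys collide is implementation-defined, so no particular layout is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Sums bucket_size(b) over all buckets; this always equals s.size().
template <class Set>
std::size_t sum_of_bucket_sizes(const Set& s) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) total += s.bucket_size(b);
    return total;
}

// Counts the elements that share their bucket with at least one other element.
template <class Set>
std::size_t elements_in_shared_buckets(const Set& s) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        if (s.bucket_size(b) > 1) total += s.bucket_size(b);
    }
    return total;
}
```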
== Controlling the number of buckets
|
== Controlling the number of buckets
|
||||||
|
|
||||||
As more elements are added to an unordered associative container, the number
|
As more elements are added to an unordered associative container, the number
|
||||||
of elements in the buckets will increase causing performance to degrade.
|
of collisions will increase causing performance to degrade.
|
||||||
To combat this the containers increase the bucket count as elements are inserted.
|
To combat this the containers increase the bucket count as elements are inserted.
|
||||||
You can also tell the container to change the bucket count (if required) by
|
You can also tell the container to change the bucket count (if required) by
|
||||||
calling `rehash`.
|
calling `rehash`.
|
||||||
|
|
||||||
The standard leaves a lot of freedom to the implementer to decide how the
|
The standard leaves a lot of freedom to the implementer to decide how the
|
||||||
number of buckets is chosen, but it does make some requirements based on the
|
number of buckets is chosen, but it does make some requirements based on the
|
||||||
container's 'load factor', the average number of elements per bucket.
|
container's 'load factor', the number of elements divided by the number of buckets.
|
||||||
Containers also have a 'maximum load factor' which they should try to keep the
|
Containers also have a 'maximum load factor' which they should try to keep the
|
||||||
load factor below.
|
load factor below.
|
||||||
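The definition of the load factor and the postconditions of `rehash` can be checked with a standard container, here `std::unordered_map`, whose members have the same names and meaning. This is a sketch: the actual bucket counts are implementation-defined, so only the guaranteed relationships are asserted. `manual_load_factor` and `filled_then_rehashed` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>

// Load factor computed by hand: number of elements divided by number of buckets.
template <class Map>
float manual_load_factor(const Map& m) {
    return static_cast<float>(m.size()) / static_cast<float>(m.bucket_count());
}

// Inserts n keys, then requests at least `buckets` buckets.
std::unordered_map<int, int> filled_then_rehashed(int n, std::size_t buckets) {
    std::unordered_map<int, int> m;
    for (int i = 0; i < n; ++i) m[i] = i * i;
    // Postconditions: bucket_count() >= buckets and load_factor() <= max_load_factor().
    m.rehash(buckets);
    return m;
}
```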
|
|
||||||
@ -97,7 +111,8 @@ or close to the hint - unless your hint is unreasonably small or large.
|
|||||||
[caption=, title='Table {counter:table-counter}. Methods for Controlling Bucket Size']
|
[caption=, title='Table {counter:table-counter}. Methods for Controlling Bucket Size']
|
||||||
[cols="1,.^1", frame=all, grid=rows]
|
[cols="1,.^1", frame=all, grid=rows]
|
||||||
|===
|
|===
|
||||||
|Method |Description
|
2+^h| *All containers*
|
||||||
|
h|*Method* h|*Description*
|
||||||
|
|
||||||
|`X(size_type n)`
|
|`X(size_type n)`
|
||||||
|Construct an empty container with at least `n` buckets (`X` is the container type).
|
|Construct an empty container with at least `n` buckets (`X` is the container type).
|
||||||
@ -112,22 +127,45 @@ or close to the hint - unless your hint is unreasonably small or large.
|
|||||||
|Returns the current maximum load factor.
|
|Returns the current maximum load factor.
|
||||||
|
|
||||||
|`float max_load_factor(float z)`
|
|`float max_load_factor(float z)`
|
||||||
|Changes the container's maximum load factor, using `z` as a hint.
|
|Changes the container's maximum load factor, using `z` as a hint. +
|
||||||
|
**Open-addressing containers:** this function does nothing: users are not allowed to change the maximum load factor.
|
||||||
|
|
||||||
|`void rehash(size_type n)`
|
|`void rehash(size_type n)`
|
||||||
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor does not exceed the maximum load factor.
|
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor does not exceed the maximum load factor.
|
||||||
|
|
||||||
|
2+^h| *Open-addressing containers only* +
|
||||||
|
`boost::unordered_flat_set`, `boost::unordered_flat_map`
|
||||||
|
h|*Method* h|*Description*
|
||||||
|
|
||||||
|
|`size_type max_load() const`
|
||||||
|
|Returns the maximum number of allowed elements in the container before rehash.
|
||||||
|
|
||||||
|===
|
|===
|
||||||
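The constructor hint and the `max_load_factor` setter from the table can be exercised the same way for a closed-addressing container, again with `std::unordered_set` as a stand-in. The function `make_sparse_set` is hypothetical. Because `max_load_factor(z)` is only a hint, the sketch follows it with `rehash(0)`, whose postcondition guarantees that the load factor no longer exceeds the (possibly adjusted) maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Builds a set that starts with at least `initial_buckets` buckets, fills it
// with n keys and then asks for a lower maximum load factor `z`.
std::unordered_set<int> make_sparse_set(std::size_t initial_buckets, float z, int n) {
    std::unordered_set<int> s(initial_buckets);  // X(size_type n): at least n buckets
    for (int i = 0; i < n; ++i) s.insert(i);
    s.max_load_factor(z);  // a hint: the container may adjust the value
    s.rehash(0);           // re-establishes load_factor() <= max_load_factor()
    return s;
}
```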
|
|
||||||
|
A note on `max_load` for open-addressing containers: the maximum load will naturally decrease when
|
||||||
|
new insertions are performed, but _won't_ increase at the same rate when erasing: for instance,
|
||||||
|
adding 1,000 elements to a <<unordered_flat_map,`boost::unordered_flat_map`>> and then
|
||||||
|
erasing those 1,000 elements will typically reduce the maximum load by around 160 rather
|
||||||
|
than restoring it to its original value. This is done internally by Boost.Unordered in order
|
||||||
|
to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
|
||||||
|
The maximum load will be reset to its theoretical maximum
|
||||||
|
(`max_load_factor() * bucket_count()`) right after `rehash`.
|
||||||
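For closed-addressing containers, the usual tool for planning rehash-free insertions is `reserve(n)`, which behaves like `rehash(ceil(n / max_load_factor()))`. Below is a minimal sketch with `std::unordered_set` standing in; the function name `insert_after_reserve` is hypothetical, and the flat containers' `max_load()` is not used here. After reserving room for `n` elements, inserting `n` elements should leave the bucket count unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Reserves room for n elements, inserts them, and reports whether the bucket
// count stayed the same throughout (i.e. no rehash happened during the loop).
bool insert_after_reserve(int n) {
    std::unordered_set<int> s;
    s.reserve(static_cast<std::size_t>(n));
    const std::size_t buckets_before = s.bucket_count();
    for (int i = 0; i < n; ++i) s.insert(i);
    return s.bucket_count() == buckets_before && s.size() == static_cast<std::size_t>(n);
}
```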
|
|
||||||
== Iterator Invalidation
|
== Iterator Invalidation
|
||||||
|
|
||||||
It is not specified how member functions other than `rehash` and `reserve` affect
|
It is not specified how member functions other than `rehash` and `reserve` affect
|
||||||
the bucket count, although `insert` is only allowed to invalidate iterators
|
the bucket count, although `insert` can only invalidate iterators
|
||||||
when the insertion causes the load factor to be greater than or equal to the
|
when the insertion causes the container's load to be greater than the maximum allowed.
|
||||||
maximum load factor. For most implementations this means that `insert` will only
|
For most implementations this means that `insert` will only
|
||||||
change the number of buckets when this happens. While iterators can be
|
change the number of buckets when this happens. Iterators can be
|
||||||
invalidated by calls to `insert`, `rehash` and `reserve`, pointers and references to the
|
invalidated by calls to `insert`, `rehash` and `reserve`.
|
||||||
container's elements are never invalidated.
|
|
||||||
|
As for pointers and references,
|
||||||
|
they are never invalidated for closed-addressing containers (`boost::unordered_[multi]set`, `boost::unordered_[multi]map`),
|
||||||
|
but they are invalidated when rehashing occurs for open-addressing
|
||||||
|
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
|
||||||
|
these containers store elements directly into their holding buckets, so
|
||||||
|
when allocating a new bucket array the elements must be transferred by means of move construction.
|
||||||
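The closed-addressing guarantee can be observed with `std::unordered_map`, which makes the same promise: a pointer to an element stays valid across a forced rehash, while iterators obtained before the rehash may not. The function name `address_survives_rehash` is ours. No equivalent check is attempted for the flat containers, where, as explained above, elements are moved into a new bucket array and old pointers must not be used.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Takes the address of one element, forces a rehash and checks that the same
// address still refers to the same element. Only the pointer is reused:
// iterators obtained before the rehash must not be used afterwards.
bool address_survives_rehash() {
    std::unordered_map<int, std::string> m;
    for (int i = 0; i < 10; ++i) m.emplace(i, std::to_string(i));

    std::string* p = &m.at(7);
    m.rehash(m.bucket_count() * 16);  // grow the bucket array well beyond its current size

    return p == &m.at(7) && *p == "7";
}
```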
|
|
||||||
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
|
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
|
||||||
to call `reserve` before inserting a large number of elements. This will get
|
to call `reserve` before inserting a large number of elements. This will get
|
||||||
|
@ -25,19 +25,22 @@
|
|||||||
|No equivalent. Since the elements aren't ordered `lower_bound` and `upper_bound` would be meaningless.
|
|No equivalent. Since the elements aren't ordered `lower_bound` and `upper_bound` would be meaningless.
|
||||||
|
|
||||||
|`equal_range(k)` returns an empty range at the position that `k` would be inserted if `k` isn't present in the container.
|
|`equal_range(k)` returns an empty range at the position that `k` would be inserted if `k` isn't present in the container.
|
||||||
|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple places. To find out the bucket that `k` would be inserted into, use `bucket(k)`. But remember that an insert can cause the container to rehash, meaning that the element can be inserted into a different bucket.
|
|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple places. +
|
||||||
|
**Closed-addressing containers:** To find out the bucket that `k` would be inserted into, use `bucket(k)`. But remember that an insert can cause the container to rehash, meaning that the element can be inserted into a different bucket.
|
||||||
|
|
||||||
|`iterator`, `const_iterator` are of the bidirectional category.
|
|`iterator`, `const_iterator` are of the bidirectional category.
|
||||||
|`iterator`, `const_iterator` are of at least the forward category.
|
|`iterator`, `const_iterator` are of at least the forward category.
|
||||||
|
|
||||||
|Iterators, pointers and references to the container's elements are never invalidated.
|
|Iterators, pointers and references to the container's elements are never invalidated.
|
||||||
|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. Pointers and references to the container's elements are never invalidated.
|
|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
|
||||||
|
**Closed-addressing containers:** Pointers and references to the container's elements are never invalidated. +
|
||||||
|
**Open-addressing containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.
|
||||||
|
|
||||||
|Iterators iterate through the container in the order defined by the comparison object.
|
|Iterators iterate through the container in the order defined by the comparison object.
|
||||||
|Iterators iterate through the container in an arbitrary order, which can change as elements are inserted, although equivalent elements are always adjacent.
|
|Iterators iterate through the container in an arbitrary order, which can change as elements are inserted, although equivalent elements are always adjacent.
|
||||||
|
|
||||||
|No equivalent
|
|No equivalent
|
||||||
|Local iterators can be used to iterate through individual buckets. (The iteration order of local iterators is not required to correspond to that of iterators.)
|
|**Closed-addressing containers:** Local iterators can be used to iterate through individual buckets. (The iteration order of local iterators is not required to correspond to that of iterators.)
|
||||||
|
|
||||||
|Can be compared using the `==`, `!=`, `<`, `\<=`, `>`, `>=` operators.
|
|Can be compared using the `==`, `!=`, `<`, `\<=`, `>`, `>=` operators.
|
||||||
|Can be compared using the `==` and `!=` operators.
|
|Can be compared using the `==` and `!=` operators.
|
||||||
@ -45,9 +48,6 @@
|
|||||||
|
|
|
|
||||||
|When inserting with a hint, implementations are permitted to ignore the hint.
|
|When inserting with a hint, implementations are permitted to ignore the hint.
|
||||||
|
|
||||||
|`erase` never throws an exception
|
|
||||||
|The containers' hash or predicate function can throw exceptions from `erase`.
|
|
||||||
|
|
||||||
|===
|
|===
|
||||||
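The `equal_range` row of the table above can be illustrated with `std::unordered_multimap`, whose contract matches the closed-addressing Boost containers on this point. For an absent key the result is `{end(), end()}`, while `bucket(k)` still reports the bucket where `k` would currently go (a snapshot only, since a later insertion may rehash). The helper `missing_key_range_is_at_end` is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <unordered_map>

// True if equal_range on `key` yields the empty range {end(), end()}.
bool missing_key_range_is_at_end(const std::unordered_multimap<std::string, int>& m,
                                 const std::string& key) {
    const auto range = m.equal_range(key);
    return range.first == m.end() && range.second == m.end();
}
```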
|
|
||||||
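Likewise, the local-iterator row can be sketched with `std::unordered_set`. Walking each bucket `n` through `begin(n)` and `end(n)` visits every element exactly once, and each element visited reports `n` as its bucket. The function `walk_buckets` is a hypothetical helper. The order in which it visits elements need not match the container's ordinary iteration order.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Walks every bucket with local iterators and returns the number of elements seen.
// Sets `consistent` to false if an element reports a bucket other than the one
// it was found in.
std::size_t walk_buckets(const std::unordered_set<int>& s, bool& consistent) {
    std::size_t seen = 0;
    consistent = true;
    for (std::size_t n = 0; n < s.bucket_count(); ++n) {
        for (auto it = s.begin(n); it != s.end(n); ++it) {
            if (s.bucket(*it) != n) consistent = false;
            ++seen;
        }
    }
    return seen;
}
```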
---
|
---
|
||||||
|
@ -5,13 +5,15 @@
|
|||||||
|
|
||||||
:cpp: C++
|
:cpp: C++
|
||||||
|
|
||||||
|
== Closed-addressing containers: unordered_[multi]set, unordered_[multi]map
|
||||||
|
|
||||||
The intent of Boost.Unordered is to implement a close (but imperfect)
|
The intent of Boost.Unordered is to implement a close (but imperfect)
|
||||||
implementation of the {cpp}17 standard, that will work with {cpp}98 upwards.
|
implementation of the {cpp}17 standard, that will work with {cpp}98 upwards.
|
||||||
The wide compatibility does mean some compromises have to be made.
|
The wide compatibility does mean some compromises have to be made.
|
||||||
With a compiler and library that fully support {cpp}11, the differences should
|
With a compiler and library that fully support {cpp}11, the differences should
|
||||||
be minor.
|
be minor.
|
||||||
|
|
||||||
== Move emulation
|
=== Move emulation
|
||||||
|
|
||||||
Support for move semantics is implemented using Boost.Move. If rvalue
|
Support for move semantics is implemented using Boost.Move. If rvalue
|
||||||
references are available it will use them, but if not it uses a close,
|
references are available it will use them, but if not it uses a close,
|
||||||
@ -23,7 +25,7 @@ but imperfect emulation. On such compilers:
|
|||||||
* The containers themselves are not movable.
|
* The containers themselves are not movable.
|
||||||
* Argument forwarding is not perfect.
|
* Argument forwarding is not perfect.
|
||||||
|
|
||||||
== Use of allocators
|
=== Use of allocators
|
||||||
|
|
||||||
{cpp}11 introduced a new allocator system. It's backwards compatible due to
|
{cpp}11 introduced a new allocator system. It's backwards compatible due to
|
||||||
the lax requirements for allocators in the old standard, but might need
|
the lax requirements for allocators in the old standard, but might need
|
||||||
@ -56,7 +58,7 @@ Due to imperfect move emulation, some assignments might check
|
|||||||
`propagate_on_container_copy_assignment` on some compilers and
|
`propagate_on_container_copy_assignment` on some compilers and
|
||||||
`propagate_on_container_move_assignment` on others.
|
`propagate_on_container_move_assignment` on others.
|
||||||
|
|
||||||
== Construction/Destruction using allocators
|
=== Construction/Destruction using allocators
|
||||||
|
|
||||||
The following support is required for full use of {cpp}11 style
|
The following support is required for full use of {cpp}11 style
|
||||||
construction/destruction:
|
construction/destruction:
|
||||||
@ -76,7 +78,7 @@ constructing a `std::pair` using `boost::tuple` (see <<compliance_pairs,below>>)
|
|||||||
When support is not available `allocator_traits::construct` and
|
When support is not available `allocator_traits::construct` and
|
||||||
`allocator_traits::destroy` are never called.
|
`allocator_traits::destroy` are never called.
|
||||||
|
|
||||||
== Pointer Traits
|
=== Pointer Traits
|
||||||
|
|
||||||
`pointer_traits` aren't used. Instead, pointer types are obtained from
|
`pointer_traits` aren't used. Instead, pointer types are obtained from
|
||||||
rebound allocators; this can cause problems if the allocator can't be
|
rebound allocators; this can cause problems if the allocator can't be
|
||||||
@ -84,7 +86,7 @@ used with incomplete types. If `const_pointer` is not defined in the
|
|||||||
allocator, `boost::pointer_to_other<pointer, const value_type>::type`
|
allocator, `boost::pointer_to_other<pointer, const value_type>::type`
|
||||||
is used to obtain a const pointer.
|
is used to obtain a const pointer.
|
||||||
|
|
||||||
== Pairs
|
=== Pairs
|
||||||
|
|
||||||
Since the containers use `std::pair` they're limited to the version
|
Since the containers use `std::pair` they're limited to the version
|
||||||
from the current standard library. But since {cpp}11 ``std::pair``'s
|
from the current standard library. But since {cpp}11 ``std::pair``'s
|
||||||
@ -105,7 +107,7 @@ Older drafts of the standard also supported variadic constructors
|
|||||||
for `std::pair`, where the first argument would be used for the
|
for `std::pair`, where the first argument would be used for the
|
||||||
first part of the pair, and the remaining for the second part.
|
first part of the pair, and the remaining for the second part.
|
||||||
|
|
||||||
== Miscellaneous
|
=== Miscellaneous
|
||||||
|
|
||||||
When swapping, `Pred` and `Hash` are not currently swapped by calling
|
When swapping, `Pred` and `Hash` are not currently swapped by calling
|
||||||
`swap`; their copy constructors are used. As a consequence when swapping
|
`swap`; their copy constructors are used. As a consequence when swapping
|
||||||
@ -114,3 +116,28 @@ an exception may be thrown from their copy constructor.
|
|||||||
Variadic constructor arguments for `emplace` are only used when both
|
Variadic constructor arguments for `emplace` are only used when both
|
||||||
rvalue references and variadic template parameters are available.
|
rvalue references and variadic template parameters are available.
|
||||||
Otherwise `emplace` can only take up to 10 constructor arguments.
|
Otherwise `emplace` can only take up to 10 constructor arguments.
|
||||||
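On compilers with rvalue references and variadic templates, `emplace` forwards any number of arguments to the element's constructor. The sketch below shows the idea with `std::unordered_map` and `std::piecewise_construct`, building the mapped `std::string` in place as `std::string(3, 'x')`. Only the standard container is used, and `emplace_piecewise` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

// Emplaces a key/value pair whose value is constructed in place as
// std::string(count, ch); returns true if a new element was inserted.
bool emplace_piecewise(std::unordered_map<int, std::string>& m, int key,
                       std::size_t count, char ch) {
    auto result = m.emplace(std::piecewise_construct,
                            std::forward_as_tuple(key),
                            std::forward_as_tuple(count, ch));
    return result.second;
}
```

If the key is already present, the map keeps its existing element unchanged.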
|
|
||||||
|
== Open-addressing containers: unordered_flat_set, unordered_flat_map
|
||||||
|
|
||||||
|
The C++ standard does not currently provide any open-addressing container
|
||||||
|
specification to adhere to, so `boost::unordered_flat_set` and
|
||||||
|
`boost::unordered_flat_map` take inspiration from `std::unordered_set` and
|
||||||
|
`std::unordered_map`, respectively, and depart from their interface where
|
||||||
|
convenient or as dictated by their internal data structure, which is
|
||||||
|
radically different from that imposed by the standard (closed addressing, node based).
|
||||||
|
|
||||||
|
`unordered_flat_set` and `unordered_flat_map` only work with reasonably
|
||||||
|
compliant C++11 (or later) compilers. Language-level features such as move semantics
|
||||||
|
and variadic template parameters are then not emulated.
|
||||||
|
`unordered_flat_set` and `unordered_flat_map` are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^].
|
||||||
|
|
||||||
|
The main differences with C++ unordered associative containers are:
|
||||||
|
|
||||||
|
* `value_type` must be move-constructible.
|
||||||
|
* Pointer stability is not kept under rehashing.
|
||||||
|
* `begin()` is not constant-time.
|
||||||
|
* `erase(iterator)` returns `void` instead of an iterator to the following element.
|
||||||
|
* There is no API for bucket handling (except `bucket_count`) or node extraction/insertion.
|
||||||
|
* The maximum load factor of the container is managed internally and can't be set by the user. The maximum load,
|
||||||
|
exposed through the public function `max_load`, does not grow back at the same rate as elements are erased.
|
||||||
|
|
||||||
|
@ -11,4 +11,8 @@ Copyright (C) 2005-2008 Daniel James
|
|||||||
|
|
||||||
Copyright (C) 2022 Christian Mazakas
|
Copyright (C) 2022 Christian Mazakas
|
||||||
|
|
||||||
|
Copyright (C) 2022 Joaquín M López Muñoz
|
||||||
|
|
||||||
|
Copyright (C) 2022 Peter Dimov
|
||||||
|
|
||||||
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
||||||
|
@ -29,14 +29,14 @@ struct hash_is_avalanching;
|
|||||||
|
|
||||||
A hash function is said to have the _avalanching property_ if small changes in the input translate to
|
A hash function is said to have the _avalanching property_ if small changes in the input translate to
|
||||||
large changes in the returned hash code —ideally, flipping one bit in the representation of
|
large changes in the returned hash code —ideally, flipping one bit in the representation of
|
||||||
the input value results in each bit of the hash code flipping with probability 50%. This property is
|
the input value results in each bit of the hash code flipping with probability 50%. Approaching
|
||||||
critical for the proper behavior of open-addressing hash containers.
|
this property is critical for the proper behavior of open-addressing hash containers.
|
||||||
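The avalanching property can be estimated empirically by flipping each input bit in turn and counting how many output bits change. The sketch compares an identity hash, which does not avalanche at all, with MurmurHash3's 64-bit finalizer `fmix64`. Here `fmix64` serves only as a well-known example of a mixing function; it is not the mixer Boost.Unordered uses. For a well-avalanching 64-bit function, the average should be close to 32 changed bits.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// MurmurHash3 64-bit finalizer, used here as an example of a mixing function.
std::uint64_t fmix64(std::uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

// A hash that returns its input unchanged: flipping one input bit flips one output bit.
std::uint64_t identity_hash(std::uint64_t k) { return k; }

// Average number of output bits that change when a single input bit is flipped,
// over inputs 0..samples-1 and all 64 input bit positions.
template <class Hash>
double average_flipped_bits(Hash h, std::uint64_t samples) {
    std::uint64_t total = 0;
    for (std::uint64_t x = 0; x < samples; ++x) {
        const std::uint64_t hx = h(x);
        for (int bit = 0; bit < 64; ++bit) {
            const std::uint64_t diff = hx ^ h(x ^ (std::uint64_t(1) << bit));
            total += std::bitset<64>(diff).count();
        }
    }
    return static_cast<double>(total) / static_cast<double>(samples * 64);
}
```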
|
|
||||||
`hash_is_avalanching<Hash>` derives from `std::true_type` if `Hash::is_avalanching` is a valid type,
|
`hash_is_avalanching<Hash>::value` is `true` if `Hash::is_avalanching` is a valid type,
|
||||||
and derives from `std::false_type` otherwise.
|
and `false` otherwise.
|
||||||
Users can then declare a hash function `Hash` as avalanching either by embedding an `is_avalanching` typedef
|
Users can then declare a hash function `Hash` as avalanching either by embedding an `is_avalanching` typedef
|
||||||
into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to derive from
|
into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to a class with
|
||||||
`std::true_type`.
|
an embedded compile-time constant `value` set to `true`.
|
||||||
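The detection rule described above can be reproduced in standard C++11. The trait here is a stand-alone reimplementation under a hypothetical name (`avalanching_sketch`), written only to show the mechanism. It is not Boost's own header, and real code should rely on the library's `hash_is_avalanching`. Only the presence of the typedef counts; whether the wrapped hash actually avalanches is the user's responsibility.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

template <class...> struct make_void { typedef void type; };

// Primary template: by default a hash function is not considered avalanching.
template <class Hash, class = void>
struct avalanching_sketch : std::false_type {};

// Selected when Hash::is_avalanching names a valid type (which type does not matter).
template <class Hash>
struct avalanching_sketch<Hash, typename make_void<typename Hash::is_avalanching>::type>
    : std::true_type {};

// Hypothetical user hash that opts in by embedding an is_avalanching typedef.
struct my_string_hash {
    typedef void is_avalanching;
    std::size_t operator()(const std::string& s) const { return std::hash<std::string>()(s); }
};
```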
|
|
||||||
xref:unordered_flat_set[`boost::unordered_flat_set`] and xref:unordered_flat_map[`boost::unordered_flat_map`]
|
xref:unordered_flat_set[`boost::unordered_flat_set`] and xref:unordered_flat_map[`boost::unordered_flat_map`]
|
||||||
use the provided hash function `Hash` as-is if `hash_is_avalanching<Hash>::value` is `true`; otherwise, they
|
use the provided hash function `Hash` as-is if `hash_is_avalanching<Hash>::value` is `true`; otherwise, they
|
||||||
|
@ -18,12 +18,12 @@ or isn't practical. In contrast, a hash table only needs an equality function
|
|||||||
and a hash function for the key.
|
and a hash function for the key.
|
||||||
|
|
||||||
With this in mind, unordered associative containers were added to the {cpp}
|
With this in mind, unordered associative containers were added to the {cpp}
|
||||||
standard. This is an implementation of the containers described in {cpp}11,
|
standard. Boost.Unordered provides an implementation of the containers described in {cpp}11,
|
||||||
with some <<compliance,deviations from the standard>> in
|
with some <<compliance,deviations from the standard>> in
|
||||||
order to work with non-{cpp}11 compilers and libraries.
|
order to work with non-{cpp}11 compilers and libraries.
|
||||||
|
|
||||||
`unordered_set` and `unordered_multiset` are defined in the header
|
`unordered_set` and `unordered_multiset` are defined in the header
|
||||||
`<boost/unordered_set.hpp>`
|
`<boost/unordered/unordered_set.hpp>`
|
||||||
[source,c++]
|
[source,c++]
|
||||||
----
|
----
|
||||||
namespace boost {
|
namespace boost {
|
||||||
@ -44,7 +44,7 @@ namespace boost {
|
|||||||
----
|
----
|
||||||
|
|
||||||
`unordered_map` and `unordered_multimap` are defined in the header
|
`unordered_map` and `unordered_multimap` are defined in the header
|
||||||
`<boost/unordered_map.hpp>`
|
`<boost/unordered/unordered_map.hpp>`
|
||||||
|
|
||||||
[source,c++]
|
[source,c++]
|
||||||
----
|
----
|
||||||
@ -65,10 +65,51 @@ namespace boost {
|
|||||||
}
|
}
|
||||||
----
|
----
|
||||||
|
|
||||||
When using Boost.TR1, these classes are included from `<unordered_set>` and
|
These containers, and all other implementations of standard unordered associative
|
||||||
`<unordered_map>`, with the classes added to the `std::tr1` namespace.
|
containers, use an approach to their internal data structure design called
|
||||||
|
*closed addressing*. Starting in Boost 1.81, Boost.Unordered also provides containers
|
||||||
|
`boost::unordered_flat_set` and `boost::unordered_flat_map`, which use a
|
||||||
|
different data structure strategy commonly known as *open addressing* and depart in
|
||||||
|
a small number of ways from the standard so as to offer much better performance
|
||||||
|
in exchange (more than 2 times faster in typical scenarios):
|
||||||
|
|
||||||
The containers are used in a similar manner to the normal associative
|
|
||||||
|
[source,c++]
|
||||||
|
----
|
||||||
|
// #include <boost/unordered/unordered_flat_set.hpp>
|
||||||
|
//
|
||||||
|
// Note: no multiset version
|
||||||
|
|
||||||
|
namespace boost {
|
||||||
|
template <
|
||||||
|
class Key,
|
||||||
|
class Hash = boost::hash<Key>,
|
||||||
|
class Pred = std::equal_to<Key>,
|
||||||
|
class Alloc = std::allocator<Key> >
|
||||||
|
class unordered_flat_set;
|
||||||
|
}
|
||||||
|
----
|
||||||
|
|
||||||
|
[source,c++]
|
||||||
|
----
|
||||||
|
// #include <boost/unordered/unordered_flat_map.hpp>
|
||||||
|
//
|
||||||
|
// Note: no multimap version
|
||||||
|
|
||||||
|
namespace boost {
|
||||||
|
template <
|
||||||
|
class Key, class Mapped,
|
||||||
|
class Hash = boost::hash<Key>,
|
||||||
|
class Pred = std::equal_to<Key>,
|
||||||
|
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||||
|
class unordered_flat_map;
|
||||||
|
}
|
||||||
|
----
|
||||||
|
|
||||||
|
`boost::unordered_flat_set` and `boost::unordered_flat_map` require a
|
||||||
|
reasonably compliant C++11 compiler.
|
||||||
|
|
||||||
|
Boost.Unordered containers are used in a similar manner to the normal associative
|
||||||
containers:
|
containers:
|
||||||
|
|
||||||
[source,cpp]
|
[source,cpp]
|
||||||
@ -87,7 +128,7 @@ But since the elements aren't ordered, the output of:
|
|||||||
|
|
||||||
[source,c++]
|
[source,c++]
|
||||||
----
|
----
|
||||||
BOOST_FOREACH(map::value_type i, x) {
|
for(const map::value_type& i: x) {
|
||||||
std::cout<<i.first<<","<<i.second<<"\n";
|
std::cout<<i.first<<","<<i.second<<"\n";
|
||||||
}
|
}
|
||||||
----
|
----
|
||||||
|
@ -4,15 +4,17 @@
|
|||||||
|
|
||||||
= Implementation Rationale
|
= Implementation Rationale
|
||||||
|
|
||||||
The intent of this library is to implement the unordered
|
== boost::unordered_[multi]set and boost::unordered_[multi]map
|
||||||
containers in the standard, so the interface was fixed. But there are
|
|
||||||
|
These containers adhere to the standard requirements for unordered associative
|
||||||
|
containers, so the interface was fixed. But there are
|
||||||
still some implementation decisions to make. The priorities are
|
still some implementation decisions to make. The priorities are
|
||||||
conformance to the standard and portability.
|
conformance to the standard and portability.
|
||||||
|
|
||||||
The http://en.wikipedia.org/wiki/Hash_table[Wikipedia article on hash tables^]
|
The http://en.wikipedia.org/wiki/Hash_table[Wikipedia article on hash tables^]
|
||||||
has a good summary of the implementation issues for hash tables in general.
|
has a good summary of the implementation issues for hash tables in general.
|
||||||
|
|
||||||
== Data Structure
|
=== Data Structure
|
||||||
|
|
||||||
By specifying an interface for accessing the buckets of the container the
|
By specifying an interface for accessing the buckets of the container the
|
||||||
standard pretty much requires that the hash table uses chained addressing.
|
standard pretty much requires that the hash table uses chained addressing.
|
||||||
@ -37,7 +39,7 @@ bucket but there are some serious problems with this:
|
|||||||
|
|
||||||
So chained addressing is used.
|
So chained addressing is used.
|
||||||
|
|
||||||
== Number of Buckets
|
=== Number of Buckets
|
||||||
|
|
||||||
There are two popular methods for choosing the number of buckets in a hash
|
There are two popular methods for choosing the number of buckets in a hash
|
||||||
table. One is to have a prime number of buckets, another is to use a power
|
table. One is to have a prime number of buckets, another is to use a power
|
||||||
@ -70,3 +72,44 @@ distribution.
|
|||||||
Since release 1.80.0, prime numbers are chosen for the number of buckets in
|
Since release 1.80.0, prime numbers are chosen for the number of buckets in
|
||||||
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
|
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
|
||||||
the result of the user's hash function as was used for release 1.79.0.
|
the result of the user's hash function as was used for release 1.79.0.
|
||||||
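The difference between the two policies can be shown with a small worked example. The bucket counts 1024 and 1031 (a prime) and the key pattern are illustrative assumptions. Plain `%` stands in for the more sophisticated modulo arithmetic the library actually uses. Hash values that differ only above bit 20 all fall into the same bucket when a power-of-two count keeps only the low bits, whereas a prime modulus spreads them out. Mixing the hash first, as in release 1.79.0, is the other way around this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Bucket index for a power-of-two bucket count: only the low bits are kept.
std::size_t pow2_bucket(std::uint64_t hash, std::size_t buckets) {
    return static_cast<std::size_t>(hash & (buckets - 1));
}

// Bucket index for a prime bucket count: every bit of the hash contributes.
std::size_t prime_bucket(std::uint64_t hash, std::size_t buckets) {
    return static_cast<std::size_t>(hash % buckets);
}

// Counts the distinct buckets hit by the hash values i << 20, for i = 0..n-1.
template <class BucketFn>
std::size_t distinct_buckets(BucketFn f, std::size_t buckets, int n) {
    std::set<std::size_t> used;
    for (int i = 0; i < n; ++i) used.insert(f(static_cast<std::uint64_t>(i) << 20, buckets));
    return used.size();
}
```

With 1024 buckets all 16 values map to bucket 0. Modulo 1031 they map to `49 * i` (since 2^20 mod 1031 = 49), giving 16 distinct buckets.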
|
|
||||||
|
== boost::unordered_flat_set and boost::unordered_flat_map
|
||||||
|
|
||||||
|
The C++ standard specification of unordered associative containers imposes
|
||||||
|
severe limitations on permissible implementations, the most important being
|
||||||
|
that closed addressing is implicitly assumed. Slightly relaxing this specification
|
||||||
|
opens up the possibility of providing container variations taking full
|
||||||
|
advantage of open-addressing techniques.
|
||||||
|
|
||||||
|
The design of `boost::unordered_flat_set` and `boost::unordered_flat_map` has been
|
||||||
|
guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
|
||||||
|
We discuss here the most relevant principles.
|
||||||
|
|
||||||
|
=== Hash function
|
||||||
|
|
||||||
|
Given its rich functionality and cross-platform interoperability,
|
||||||
|
`boost::hash` remains the default hash function of `boost::unordered_flat_set` and `boost::unordered_flat_map`.
|
||||||
|
As it happens, `boost::hash` for integral and other basic types does not provide
|
||||||
|
the good statistical properties required by open addressing; to cope with this,
|
||||||
|
we implement a post-mixing stage:
|
||||||
|
|
||||||
|
* 64-bit architectures: we use the `xmx` function defined in
|
||||||
|
Jon Maiga's http://jonkagstrom.com/bit-mixer-construction/index.html[The construct of a bit mixer^].
|
||||||
|
* 32-bit architectures: the mixer used was selected from a set generated with https://github.com/skeeto/hash-prospector[Hash Function Prospector^]
|
||||||
|
as the best overall performer in our internal benchmarks. Score assigned by Hash Prospector is 333.7934929677524.
|
||||||
|
|
||||||
|
When using a hash function directly suitable for open addressing, post-mixing can be opted out of via a dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
|
||||||
|
`boost::hash` specializations for string types are marked as avalanching.
|
||||||
|
|
||||||
|
=== Platform interoperability
|
||||||
|
|
||||||
|
The observable behavior of `boost::unordered_flat_set` and `boost::unordered_flat_map` is deterministically
|
||||||
|
identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided
|
||||||
|
hash function and equality predicate are also interoperable
|
||||||
|
—this includes elements being ordered in exactly the same way for the same sequence of
|
||||||
|
operations.
|
||||||
|
|
||||||
|
Although the implementation internally uses SIMD technologies, such as https://en.wikipedia.org/wiki/SSE2[SSE2^]
|
||||||
|
and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[Neon^], when available,
|
||||||
|
this does not affect interoperability. For instance, the behavior is the same
|
||||||
|
for Visual Studio on an Intel CPU with SSE2 in x64 and for GCC on an IBM s390x without any supported SIMD technology.
|
||||||
|