mirror of https://github.com/boostorg/unordered.git
synced 2025-07-30 03:17:15 +02:00

Commit: refactored to modernize and improve flow
@@ -13,9 +13,10 @@
include::unordered/intro.adoc[]
include::unordered/buckets.adoc[]
include::unordered/hash_equality.adoc[]
include::unordered/regular.adoc[]
include::unordered/concurrent.adoc[]
include::unordered/compliance.adoc[]
include::unordered/structures.adoc[]
include::unordered/benchmarks.adoc[]
include::unordered/rationale.adoc[]
include::unordered/ref.adoc[]
@@ -2,9 +2,9 @@
:idprefix: buckets_
:imagesdir: ../diagrams

= Basics of Hash Tables

The containers are made up of a number of _buckets_, each of which can contain
any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
have more buckets).

@@ -12,8 +12,7 @@ have more buckets).
image::buckets.png[]

In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for sets the key is the whole element, but is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
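Conceptually, the key-to-bucket mapping can be pictured as follows. This is a minimal sketch only: `bucket_for` is a hypothetical helper, and the containers' real transformation is more elaborate than a plain modulo.

[source,c++]
----
#include <boost/container_hash/hash.hpp>
#include <cstddef>
#include <string>

// Hypothetical helper illustrating the two steps: hash the key, then
// reduce the std::size_t hash value to a bucket index.
std::size_t bucket_for(const std::string& key, std::size_t bucket_count)
{
  std::size_t h = boost::hash<std::string>()(key); // Hash applied to the key
  return h % bucket_count;                         // map onto the bucket range
}
----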
@@ -80,7 +79,7 @@ h|*Method* h|*Description*
|===

== Controlling the Number of Buckets

As more elements are added to an unordered associative container, the number
of collisions will increase causing performance to degrade.
@@ -90,8 +89,8 @@ calling `rehash`.
The standard leaves a lot of freedom to the implementer to decide how the
number of buckets is chosen, but it does make some requirements based on the
container's _load factor_, the number of elements divided by the number of buckets.
Containers also have a _maximum load factor_, below which they should try to keep the
load factor.
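In terms of the container interface, the quantities above can be read directly (a minimal sketch):

[source,c++]
----
#include <boost/unordered_set.hpp>
#include <iostream>

int main()
{
  boost::unordered_set<int> s = {1, 2, 3, 4, 5};
  // load factor == number of elements divided by the number of buckets
  std::cout << s.load_factor() << " == "
            << float(s.size()) / s.bucket_count() << "\n";
  // the container tries to keep load_factor() below max_load_factor()
  std::cout << s.max_load_factor() << "\n";
}
----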

You can't control the bucket count directly but there are two ways to
@@ -133,9 +132,10 @@ h|*Method* h|*Description*
|`void rehash(size_type n)`
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor is less than the maximum load factor.

2+^h| *Open-addressing and concurrent containers only* +
`boost::unordered_flat_set`, `boost::unordered_flat_map` +
`boost::unordered_node_set`, `boost::unordered_node_map` +
`boost::concurrent_flat_map`

h|*Method* h|*Description*

|`size_type max_load() const`
@@ -143,7 +143,7 @@ h|*Method* h|*Description*
|===

A note on `max_load` for open-addressing and concurrent containers: the maximum load will be
(`max_load_factor() * bucket_count()`) right after `rehash` or on container creation, but may
slightly decrease when erasing elements in high-load situations. For instance, if we
have a <<unordered_flat_map,`boost::unordered_flat_map`>> with `size()` almost

@@ -151,216 +151,4 @@ at `max_load()` level and then erase 1,000 elements, `max_load()` may decrease by a
few dozen elements. This is done internally by Boost.Unordered in order
to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
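For example, a bulk insertion can be planned like this (a minimal sketch; as noted above, the exact `max_load()` value is implementation-dependent):

[source,c++]
----
#include <boost/unordered/unordered_flat_map.hpp>
#include <cstddef>

int main()
{
  boost::unordered_flat_map<int, int> m;
  m.reserve(1000000);                // allocate the bucket array up front
  std::size_t budget = m.max_load(); // elements insertable without rehashing
  for (std::size_t i = 0; i < budget; ++i)
    m.emplace(int(i), 0);            // stays rehash-free
}
----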
@@ -6,8 +6,9 @@
:github-pr-url: https://github.com/boostorg/unordered/pull
:cpp: C++

== Release 1.83.0 - Major update

* Added `boost::concurrent_flat_map`, a fast, thread-safe hashmap based on open addressing.
* Sped up iteration of open-addressing containers.

== Release 1.82.0 - Major update
@@ -5,7 +5,7 @@

:cpp: C++

== Closed-addressing Containers

`unordered_[multi]set` and `unordered_[multi]map` are intended to provide a conformant
implementation of the {cpp}20 standard that will work with {cpp}98 upwards.

@@ -13,7 +13,7 @@ This wide compatibility does mean some compromises have to be made.
With a compiler and library that fully support {cpp}11, the differences should
be minor.

=== Move Emulation

Support for move semantics is implemented using Boost.Move. If rvalue
references are available it will use them, but if not it uses a close,

@@ -25,7 +25,7 @@ but imperfect emulation. On such compilers:
* The containers themselves are not movable.
* Argument forwarding is not perfect.

=== Use of Allocators

{cpp}11 introduced a new allocator system. It's backwards compatible due to
the lax requirements for allocators in the old standard, but might need

@@ -58,7 +58,7 @@ Due to imperfect move emulation, some assignments might check
`propagate_on_container_copy_assignment` on some compilers and
`propagate_on_container_move_assignment` on others.

=== Construction/Destruction Using Allocators

The following support is required for full use of {cpp}11 style
construction/destruction:

@@ -117,7 +117,7 @@ Variadic constructor arguments for `emplace` are only used when both
rvalue references and variadic template parameters are available.
Otherwise `emplace` can only take up to 10 constructor arguments.

== Open-addressing Containers

The C++ standard does not currently provide any open-addressing container
specification to adhere to, so `boost::unordered_flat_set`/`unordered_node_set` and

@@ -144,7 +144,7 @@ The main differences with C++ unordered associative containers are:
** Pointer stability is not kept under rehashing.
** There is no API for node extraction/insertion.

== Concurrent Containers

There is currently no specification in the C++ standard for this or any other concurrent
data structure. `boost::concurrent_flat_map` takes the same template parameters as `std::unordered_map`
@@ -1,8 +1,9 @@
[#concurrent]
= Concurrent Containers

:idprefix: concurrent_

Boost.Unordered currently provides just one concurrent container named `boost::concurrent_flat_map`.
`boost::concurrent_flat_map` is a hash table that allows concurrent write/read access from
different threads without having to implement any synchronization mechanism on the user's side.

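For example (a minimal sketch; several threads may run `work` concurrently on the same map without any external locking):

[source,c++]
----
#include <boost/unordered/concurrent_flat_map.hpp>
#include <string>

void work(boost::concurrent_flat_map<std::string, int>& m)
{
  m.emplace("alpha", 1);                            // thread-safe insertion
  m.visit("alpha", [](auto& kv) { ++kv.second; });  // thread-safe element access
}
----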
@@ -131,7 +132,7 @@ by using `cvisit` overloads (for instance, `insert_or_cvisit`) and may result
in higher parallelization. Consult the xref:#concurrent_flat_map[reference]
for a complete list of available operations.

== Whole-table Visitation

In the absence of iterators, `boost::concurrent_flat_map` provides `visit_all`
as an alternative way to process all the elements in the map:
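A minimal sketch of such whole-table processing:

[source,c++]
----
#include <boost/unordered/concurrent_flat_map.hpp>
#include <string>

void double_all(boost::concurrent_flat_map<std::string, int>& m)
{
  // visit_all processes every element; cvisit_all provides read-only access
  m.visit_all([](auto& kv) { kv.second *= 2; });
}
----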
@@ -168,7 +169,7 @@ may be inserted, modified or erased by other threads during visitation. It is
advisable not to assume too much about the exact global state of a `boost::concurrent_flat_map`
at any point in your program.

== Blocking Operations

``boost::concurrent_flat_map``s can be copied, assigned, cleared and merged just like any
Boost.Unordered container. Unlike most other operations, these are _blocking_,

@@ -177,5 +178,5 @@ clear or merge operation is in progress. Blocking is taken care of automatically
and the user need not take any special precaution, but overall performance may be affected.

Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve`
or during insertion when the table's load hits `max_load()`. As with non-concurrent containers,
reserving space in advance of bulk insertions will generally speed up the process.
@@ -4,146 +4,22 @@
:idprefix: intro_
:cpp: C++

link:https://en.wikipedia.org/wiki/Hash_table[Hash tables^] are extremely popular
computer data structures and can be found in one form or another in virtually any programming
language. Whereas other associative structures such as rb-trees (used in {cpp} by `std::set` and `std::map`)
have logarithmic-time complexity for insertion and lookup, hash tables, if configured properly,
perform these operations in constant time on average, and are generally much faster.

{cpp} introduced __unordered associative containers__ `std::unordered_set`, `std::unordered_map`,
`std::unordered_multiset` and `std::unordered_multimap` in {cpp}11, but research on hash tables
hasn't stopped since: advances in CPU architectures such as
more powerful caches, link:https://en.wikipedia.org/wiki/Single_instruction,_multiple_data[SIMD] operations
and increasingly available link:https://en.wikipedia.org/wiki/Multi-core_processor[multicore processors]
open up possibilities for improved hash-based data structures and new use cases that
are simply beyond reach of unordered associative containers as specified in 2011.

Boost.Unordered offers a catalog of hash containers with different standards compliance levels,
performance characteristics and intended usage scenarios:

[caption=, title='Table {counter:table-counter}. Boost.Unordered containers']
[cols="1,1,.^1", frame=all, grid=rows]

@@ -165,44 +41,49 @@ These are all the containers provided by Boost.Unordered:
^| `boost::unordered_flat_set` +
`boost::unordered_flat_map`

^.^h|*Concurrent*
^|
^| `boost::concurrent_flat_map`

|===

* **Closed-addressing containers** are fully compliant with the C++ specification
for unordered associative containers and feature one of the fastest implementations
in the market within the technical constraints imposed by the required standard interface.
* **Open-addressing containers** rely on much faster data structures and algorithms
(more than 2 times faster in typical scenarios) while slightly diverging from the standard
interface to accommodate the implementation.
There are two variants: **flat** (the fastest) and **node-based**, which
provide pointer stability under rehashing at the expense of being slower.
* Finally, `boost::concurrent_flat_map` (the only **concurrent container** provided
at present) is a hashmap designed and implemented to be used in high-performance
multithreaded scenarios. Its interface is radically different from that of regular C++ containers.

All sets and maps in Boost.Unordered are instantiated similarly to
`std::unordered_set` and `std::unordered_map`, respectively:

[source,c++]
----
namespace boost {
  template <
    class Key,
    class Hash = boost::hash<Key>,
    class Pred = std::equal_to<Key>,
    class Alloc = std::allocator<Key> >
  class unordered_set;
  // same for unordered_multiset, unordered_flat_set, unordered_node_set

  template <
    class Key, class Mapped,
    class Hash = boost::hash<Key>,
    class Pred = std::equal_to<Key>,
    class Alloc = std::allocator<std::pair<Key const, Mapped> > >
  class unordered_map;
  // same for unordered_multimap, unordered_flat_map, unordered_node_map
  // and concurrent_flat_map
}
----

To store an object in an unordered associative container requires both a
key equality function and a hash function. The default function objects in
the standard containers support a few basic types including integer types,

@@ -213,16 +94,3 @@ you have to extend Boost.Hash to support the type or use
your own custom equality predicates and hash functions. See the
<<hash_equality,Equality Predicates and Hash Functions>> section
for more details.
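For instance, a user-defined type becomes usable as a key once it has an equality operator and Boost.Hash support, the latter typically added through an ADL-visible `hash_value` overload (a minimal sketch):

[source,c++]
----
#include <boost/container_hash/hash.hpp>
#include <boost/unordered_set.hpp>
#include <cstddef>

struct point
{
  int x, y;
};

bool operator==(point const& a, point const& b) { return a.x == b.x && a.y == b.y; }

// found by boost::hash<point> via argument-dependent lookup
std::size_t hash_value(point const& p)
{
  std::size_t seed = 0;
  boost::hash_combine(seed, p.x);
  boost::hash_combine(seed, p.y);
  return seed;
}

boost::unordered_set<point> points; // uses boost::hash<point> and operator==
----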
@@ -4,7 +4,7 @@

= Implementation Rationale

== Closed-addressing Containers

`boost::unordered_[multi]set` and `boost::unordered_[multi]map`
adhere to the standard requirements for unordered associative

@@ -74,7 +74,7 @@ Since release 1.80.0, prime numbers are chosen for the number of buckets in
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
the result of the user's hash function as was used for release 1.79.0.

== Open-addressing Containers

The C++ standard specification of unordered associative containers imposes
severe limitations on permissible implementations, the most important being

@@ -86,7 +86,7 @@ The design of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unord
guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
We discuss here the most relevant principles.

=== Hash Function

Given its rich functionality and cross-platform interoperability,
`boost::hash` remains the default hash function of open-addressing containers.

@@ -105,7 +105,7 @@ whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from https://arxiv.org/ab
When using a hash function directly suitable for open addressing, post-mixing can be opted out via a dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
`boost::hash` specializations for string types are marked as avalanching.
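For example, a user-provided hash function can advertise that its output is already well mixed by exposing an `is_avalanching` member type, which the `hash_is_avalanching` trait detects (a minimal sketch; the hash body shown is a placeholder, not a recommended function):

[source,c++]
----
#include <cstddef>
#include <string_view>

struct my_string_hash
{
  using is_avalanching = void; // detected by hash_is_avalanching: no post-mixing applied

  std::size_t operator()(std::string_view s) const noexcept
  {
    std::size_t h = 0;               // placeholder body, for illustration only
    for (unsigned char c: s) h = h * 131 + c;
    return h;
  }
};
----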

=== Platform Interoperability

The observable behavior of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` is deterministically
identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided

@@ -118,7 +118,7 @@ and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[N
this does not affect interoperability. For instance, the behavior is the same
for Visual Studio on an x64-mode Intel CPU with SSE2 and for GCC on an IBM s390x without any supported SIMD technology.

== Concurrent Containers

The same data structure used by Boost.Unordered open-addressing containers has been chosen
also as the foundation of `boost::concurrent_flat_map`:

@@ -132,7 +132,7 @@ lookup that are lock-free up to the last step of actual element comparison.
of all elements between `boost::concurrent_flat_map` and `boost::unordered_flat_map`.
(This feature has not been implemented yet.)

=== Hash Function and Platform Interoperability

`boost::concurrent_flat_map` makes the same decisions and provides the same guarantees
as Boost.Unordered open-addressing containers with regards to
@@ -1,8 +1,99 @@
[#regular]
= Regular Containers

:idprefix: regular_

Boost.Unordered closed-addressing containers (`boost::unordered_set`, `boost::unordered_map`,
`boost::unordered_multiset` and `boost::unordered_multimap`) are fully conformant with the
C++ specification for unordered associative containers, so for those who know how to use
`std::unordered_set`, `std::unordered_map`, etc., their homonyms in Boost.Unordered are
drop-in replacements. The interface of open-addressing containers (`boost::unordered_node_set`,
`boost::unordered_node_map`, `boost::unordered_flat_set` and `boost::unordered_flat_map`)
is very similar, but they present some minor differences listed in the dedicated
xref:#compliance_open_addressing_containers[standard compliance section].

For readers without previous experience with hash containers but familiar
with normal associative containers (`std::set`, `std::map`,
`std::multiset` and `std::multimap`), Boost.Unordered containers are used in a similar manner:

[source,cpp]
----
typedef boost::unordered_map<std::string, int> map;
map x;
x["one"] = 1;
x["two"] = 2;
x["three"] = 3;

assert(x.at("one") == 1);
assert(x.find("missing") == x.end());
----

But since the elements aren't ordered, the output of:

[source,c++]
----
for(const map::value_type& i: x) {
  std::cout<<i.first<<","<<i.second<<"\n";
}
----

can be in any order. For example, it might be:

[source]
----
two,2
one,1
three,3
----

There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.

== Iterator Invalidation

It is not specified how member functions other than `rehash` and `reserve` affect
the bucket count, although `insert` can only invalidate iterators
when the insertion causes the container's load to be greater than the maximum allowed.
For most implementations this means that `insert` will only
change the number of buckets when this happens. Iterators can be
invalidated by calls to `insert`, `rehash` and `reserve`.

As for pointers and references,
they are never invalidated for node-based containers
(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`),
but they will be when rehashing occurs for
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
these containers store elements directly into their holding buckets, so
when allocating a new bucket array the elements must be transferred by means of move construction.
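A minimal sketch contrasting the two behaviors (the `assert` reflects the node-based guarantee described above):

[source,c++]
----
#include <boost/unordered/unordered_flat_map.hpp>
#include <boost/unordered/unordered_node_map.hpp>
#include <cassert>

int main()
{
  boost::unordered_node_map<int, int> nm;
  int* p = &nm[1];
  nm.reserve(100000);        // may rehash, but node-based elements are not moved
  assert(p == &nm.at(1));    // pointers and references remain valid

  boost::unordered_flat_map<int, int> fm;
  int* q = &fm[1];
  fm.reserve(100000);        // rehashing moves elements into a new bucket array
  (void)q;                   // q must not be dereferenced after a rehash
}
----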

In a similar manner to using `reserve` for ``vector``s, it can be a good idea
to call `reserve` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

```
x.reserve(n);
```

Note:: `reserve(n)` reserves space for at least `n` elements, allocating enough buckets
so as to not exceed the maximum load factor.
+
Because the maximum load factor is defined as the number of elements divided by the total
number of available buckets, this function is logically equivalent to:
+
```
x.rehash(std::ceil(n / x.max_load_factor()))
```
+
See the <<unordered_map_rehash,reference for more details>> on the `rehash` function.

[#comparison]

:idprefix: comparison_

== Comparison with Associative Containers

[caption=, title='Table {counter:table-counter} Interface differences']
[cols="1,1", frame=all, grid=rows]

@@ -32,7 +123,7 @@
|`iterator`, `const_iterator` are of at least the forward category.

|Iterators, pointers and references to the container's elements are never invalidated.
|<<regular_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
**Node-based containers:** Pointers and references to the container's elements are never invalidated. +
**Flat containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.

doc/unordered/structures.adoc (new file, 179 lines)
@@ -0,0 +1,179 @@
[#structures]
= Data Structures

:idprefix: structures_

== Closed-addressing Containers

++++
<style>
  .imageblock > .title {
    text-align: inherit;
  }
</style>
++++

Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below:

[#img-bucket-groups,.text-center]
.A simple bucket group approach
image::bucket-groups.png[align=center]

An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iteration of the entire container is oftentimes slow using this layout as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`.

Canonical standard implementations will wind up looking like the diagram below:

[.text-center]
.The canonical standard approach
image::singly-linked.png[align=center,link=../diagrams/singly-linked.png,window=_blank]

It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here].

This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:

```c++
auto const idx = get_bucket_idx(hash_function(key));
node* p = buckets[idx]; // first load
node* n = p->next; // second load
if (n && is_in_bucket(n, idx)) {
  value_type const& v = *n; // third load
  // ...
}
```

With a simple bucket group layout, this is all that must be done:

```c++
auto const idx = get_bucket_idx(hash_function(key));
node* n = buckets[idx]; // first load
if (n) {
  value_type const& v = *n; // second load
  // ...
}
```

In practice, the extra indirection can have a dramatic performance impact on common operations such as `insert`, `find` and `erase`. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:

[#img-fca-layout]
.The new layout used by Boost
image::fca.png[align=center]

Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity) which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets meaning that for all common implementations, there's only 4 bits of space overhead per bucket introduced by the bucket groups.
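In outline, a bucket group can be pictured like this (a simplified sketch; the actual member names and layout in Boost.Unordered differ):

[source,c++]
----
#include <climits>
#include <cstddef>

struct node; // element node, as in the snippets above

struct bucket_group
{
  static constexpr std::size_t N = sizeof(std::size_t) * CHAR_BIT; // buckets viewed per group

  node**        buckets;  // fixed-width view into the bucket array
  std::size_t   bitmask;  // bit i set <=> buckets[i] is occupied
  bucket_group* next;     // doubly-linked list threading together
  bucket_group* prev;     // the non-empty groups only
};
----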

A more detailed description of Boost.Unordered's closed-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html[external article].
For more information on implementation rationale, read the
xref:#rationale_closed_addressing_containers[corresponding section].

== Open-addressing Containers

The diagram shows the basic internal layout of `boost::unordered_flat_map`/`unordered_node_map` and
`boost::unordered_flat_set`/`unordered_node_set`.

[#img-foa-layout]
.Open-addressing layout used by Boost.Unordered.
image::foa.png[align=center]

As with all open-addressing containers, elements (or pointers to the element nodes in the case of
`boost::unordered_node_map` and `boost::unordered_node_set`) are stored directly in the bucket array.
This array is logically divided into 2^_n_^ _groups_ of 15 elements each.
In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^
16-byte words.

[#img-foa-metadata]
.Breakdown of a metadata word.
image::foa-metadata.png[align=center]

A metadata word is divided into 15 _h_~_i_~ bytes (one for each associated
bucket), and an _overflow byte_ (_ofw_ in the diagram). The value of _h_~_i_~ is:

- 0 if the corresponding bucket is empty.
- 1 to encode a special empty bucket called a _sentinel_, which is used internally to
stop iteration when the container has been fully traversed.
- If the bucket is occupied, a _reduced hash value_ obtained from the hash value of
the element.

When looking for an element with hash value _h_, SIMD technologies such as
https://en.wikipedia.org/wiki/SSE2[SSE2] and
https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)[Neon] allow us
to very quickly inspect the full metadata word and look for the reduced value of _h_ among all the
15 buckets with just a handful of CPU instructions: non-matching buckets can be
readily discarded, and those whose reduced hash value matches need be inspected via full
comparison with the corresponding element. If the looked-for element is not present,
the overflow byte is inspected:

- If the bit in the position _h_ mod 8 is zero, lookup terminates (and the
element is not present).
- If the bit is set to 1 (the group has been _overflowed_), further groups are
checked using https://en.wikipedia.org/wiki/Quadratic_probing[_quadratic probing_], and
the process is repeated.
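The following sketch emulates the matching step without SIMD (illustrative only; the reduced-hash encoding and the actual code in Boost.Unordered are more involved):

[source,c++]
----
#include <cstddef>
#include <cstdint>

// One metadata word covering a group of 15 buckets (illustrative layout)
struct group_metadata
{
  std::uint8_t h[15];    // 0 = empty, 1 = sentinel, otherwise a reduced hash value
  std::uint8_t overflow; // overflow byte, one bit per hash residue mod 8
};

// Scalar emulation of the SIMD match: bit i of the result is set
// when bucket i holds a matching reduced hash value.
std::uint16_t match(group_metadata const& m, std::uint8_t reduced)
{
  std::uint16_t mask = 0;
  for (int i = 0; i < 15; ++i)
    if (m.h[i] == reduced) mask |= std::uint16_t(1u << i);
  return mask; // each set bit is a candidate for full element comparison
}

// After an unsuccessful match, decides whether further groups must be probed
bool overflowed(group_metadata const& m, std::size_t hash)
{
  return (m.overflow >> (hash % 8)) & 1u;
}
----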

Insertion is algorithmically similar: empty buckets are located using SIMD,
and when going past a full group its corresponding overflow bit is set to 1.

In architectures without SIMD support, the logical layout stays the same, but the metadata
word is codified using a technique we call _bit interleaving_: this layout allows us
to emulate SIMD with reasonably good performance using only standard arithmetic and
logical operations.

[#img-foa-metadata-interleaving]
.Bit-interleaved metadata word.
image::foa-metadata-interleaving.png[align=center]

A more detailed description of Boost.Unordered's open-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
For more information on implementation rationale, read the
xref:#rationale_open_addresing_containers[corresponding section].

== Concurrent Containers

`boost::concurrent_flat_map` uses the basic
xref:#structures_open_addressing_containers[open-addressing layout] described above
augmented with synchronization mechanisms.

[#img-cfoa-layout]
.Concurrent open-addressing layout used by Boost.Unordered.
image::cfoa.png[align=center]

Two levels of synchronization are used:

* Container level: A read-write mutex is used to control access from any operation
to the container. Typically, such access is in read mode (that is, concurrent) even
for modifying operations, so for most practical purposes there is no thread
contention at this level. Access is only in write mode (blocking) when rehashing or
performing container-wide operations such as swapping or assignment.
* Group level: Each 15-slot group is equipped with an 8-byte word containing:
** A read-write spinlock for synchronized access to any element in the group.
** An atomic _insertion counter_ used for optimistic insertion as described
below.

By using atomic operations to access the group metadata, lookup is (group-level)
lock-free up to the point where an actual comparison needs to be done with an element
that has been previously SIMD-matched: only then is the group's spinlock used.

Insertion uses the following _optimistic algorithm_:

* The value of the insertion counter for the initial group in the probe
sequence is locally recorded (let's call this value `c0`).
* Lookup is as described above. If lookup finds no equivalent element,
search for an available slot for insertion successively locks/unlocks
each group in the probing sequence.
* When an available slot is located, it is preemptively occupied (its
reduced hash value is set) and the insertion counter is atomically
incremented: if no other thread has incremented the counter during the
whole operation (which is checked by comparing with `c0`), then we're
good to go and complete the insertion, otherwise we roll back and start
over.

This algorithm has very low contention both at the lookup and actual
insertion phases in exchange for the possibility that computations have
to be started over if some other thread interferes in the process by
performing a successful insertion beginning at the same group. In
practice, the start-over frequency is extremely small, measured in the range
of parts per million for some of our benchmarks.
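The key ingredient is the per-group insertion counter check; a toy model of just that step is sketched below (illustrative only, not the actual implementation):

[source,c++]
----
#include <atomic>
#include <cstdint>

struct group
{
  std::atomic<std::uint32_t> insertion_counter{0};
};

// c0 is the counter value recorded at the start of the insertion attempt.
// Returns true if the insertion can be committed, false if another thread
// completed an insertion at this group meanwhile and we must start over.
bool try_commit(group& g0, std::uint32_t c0)
{
  return g0.insertion_counter.fetch_add(1, std::memory_order_acq_rel) == c0;
}
----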

For more information on implementation rationale, read the
xref:#rationale_concurrent_containers[corresponding section].