forked from boostorg/unordered
refactored to modernize and improve flow
@ -13,9 +13,10 @@
|
||||
include::unordered/intro.adoc[]
|
||||
include::unordered/buckets.adoc[]
|
||||
include::unordered/hash_equality.adoc[]
|
||||
include::unordered/comparison.adoc[]
|
||||
include::unordered/concurrent_flat_map_intro.adoc[]
|
||||
include::unordered/regular.adoc[]
|
||||
include::unordered/concurrent.adoc[]
|
||||
include::unordered/compliance.adoc[]
|
||||
include::unordered/structures.adoc[]
|
||||
include::unordered/benchmarks.adoc[]
|
||||
include::unordered/rationale.adoc[]
|
||||
include::unordered/ref.adoc[]
|
||||
|
@ -2,9 +2,9 @@
|
||||
:idprefix: buckets_
|
||||
:imagesdir: ../diagrams
|
||||
|
||||
= The Data Structure
|
||||
= Basics of Hash Tables
|
||||
|
||||
The containers are made up of a number of 'buckets', each of which can contain
|
||||
The containers are made up of a number of _buckets_, each of which can contain
|
||||
any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
|
||||
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
|
||||
have more buckets).
|
||||
@ -12,8 +12,7 @@ have more buckets).
|
||||
image::buckets.png[]
|
||||
|
||||
In order to decide which bucket to place an element in, the container applies
|
||||
the hash function, `Hash`, to the element's key (for `unordered_set` and
|
||||
`unordered_multiset` the key is the whole element, but is referred to as the key
|
||||
the hash function, `Hash`, to the element's key (for sets the key is the whole element, but is referred to as the key
|
||||
so that the same terminology can be used for sets and maps). This returns a
|
||||
value of type `std::size_t`. `std::size_t` has a much greater range of values
|
||||
than the number of buckets, so the container applies another transformation to
|
||||
@ -80,7 +79,7 @@ h|*Method* h|*Description*
|
||||
|
||||
|===
|
||||
|
||||
== Controlling the number of buckets
|
||||
== Controlling the Number of Buckets
|
||||
|
||||
As more elements are added to an unordered associative container, the number
|
||||
of collisions will increase, causing performance to degrade.
|
||||
@ -90,8 +89,8 @@ calling `rehash`.
|
||||
|
||||
The standard leaves a lot of freedom to the implementer to decide how the
|
||||
number of buckets is chosen, but it does make some requirements based on the
|
||||
container's 'load factor', the number of elements divided by the number of buckets.
|
||||
Containers also have a 'maximum load factor' which they should try to keep the
|
||||
container's _load factor_, the number of elements divided by the number of buckets.
|
||||
Containers also have a _maximum load factor_ which they should try to keep the
|
||||
load factor below.
|
||||
|
||||
You can't control the bucket count directly but there are two ways to
|
||||
@ -133,9 +132,10 @@ h|*Method* h|*Description*
|
||||
|`void rehash(size_type n)`
|
||||
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor is less than the maximum load factor.
|
||||
|
||||
2+^h| *Open-addressing containers only* +
|
||||
2+^h| *Open-addressing and concurrent containers only* +
|
||||
`boost::unordered_flat_set`, `boost::unordered_flat_map` +
|
||||
`boost::unordered_node_set`, `boost::unordered_node_map` +
|
||||
`boost::concurrent_flat_map`
|
||||
h|*Method* h|*Description*
|
||||
|
||||
|`size_type max_load() const`
|
||||
@ -143,7 +143,7 @@ h|*Method* h|*Description*
|
||||
|
||||
|===
|
||||
|
||||
A note on `max_load` for open-addressing containers: the maximum load will be
|
||||
A note on `max_load` for open-addressing and concurrent containers: the maximum load will be
|
||||
(`max_load_factor() * bucket_count()`) right after `rehash` or on container creation, but may
|
||||
slightly decrease when erasing elements in high-load situations. For instance, if we
|
||||
have a <<unordered_flat_map,`boost::unordered_flat_map`>> with `size()` almost
|
||||
@ -151,216 +151,4 @@ at `max_load()` level and then erase 1,000 elements, `max_load()` may decrease b
|
||||
few dozen elements. This is done internally by Boost.Unordered in order
|
||||
to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
|
||||
|
||||
== Iterator Invalidation
|
||||
|
||||
It is not specified how member functions other than `rehash` and `reserve` affect
|
||||
the bucket count, although `insert` can only invalidate iterators
|
||||
when the insertion causes the container's load to be greater than the maximum allowed.
|
||||
For most implementations this means that `insert` will only
|
||||
change the number of buckets when this happens. Iterators can be
|
||||
invalidated by calls to `insert`, `rehash` and `reserve`.
|
||||
|
||||
As for pointers and references,
|
||||
they are never invalidated for node-based containers
|
||||
(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`),
|
||||
but they are invalidated when rehashing occurs for
|
||||
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
|
||||
these containers store elements directly into their holding buckets, so
|
||||
when allocating a new bucket array the elements must be transferred by means of move construction.
|
||||
|
||||
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
|
||||
to call `reserve` before inserting a large number of elements. This will get
|
||||
the expensive rehashing out of the way and let you store iterators, safe in
|
||||
the knowledge that they won't be invalidated. If you are inserting `n`
|
||||
elements into container `x`, you could first call:
|
||||
|
||||
```
|
||||
x.reserve(n);
|
||||
```
|
||||
|
||||
Note:: `reserve(n)` reserves space for at least `n` elements, allocating enough buckets
|
||||
so as to not exceed the maximum load factor.
|
||||
+
|
||||
Because the maximum load factor is defined as the number of elements divided by the total
|
||||
number of available buckets, this function is logically equivalent to:
|
||||
+
|
||||
```
|
||||
x.rehash(std::ceil(n / x.max_load_factor()))
|
||||
```
|
||||
+
|
||||
See the <<unordered_map_rehash,reference for more details>> on the `rehash` function.
|
||||
|
||||
== Fast Closed Addressing Implementation
|
||||
|
||||
++++
|
||||
<style>
|
||||
.imageblock > .title {
|
||||
text-align: inherit;
|
||||
}
|
||||
</style>
|
||||
++++
|
||||
|
||||
Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below:
|
||||
|
||||
[#img-bucket-groups,.text-center]
|
||||
.A simple bucket group approach
|
||||
image::bucket-groups.png[align=center]
|
||||
|
||||
An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iteration of the entire container is often slow using this layout as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`.
|
||||
|
||||
Canonical standard implementations will wind up looking like the diagram below:
|
||||
|
||||
[.text-center]
|
||||
.The canonical standard approach
|
||||
image::singly-linked.png[align=center,link=../diagrams/singly-linked.png,window=_blank]
|
||||
|
||||
It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here].
|
||||
|
||||
This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:
|
||||
|
||||
```c++
|
||||
auto const idx = get_bucket_idx(hash_function(key));
|
||||
node* p = buckets[idx]; // first load
|
||||
node* n = p->next; // second load
|
||||
if (n && is_in_bucket(n, idx)) {
|
||||
value_type const& v = *n; // third load
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
With a simple bucket group layout, this is all that must be done:
|
||||
```c++
|
||||
auto const idx = get_bucket_idx(hash_function(key));
|
||||
node* n = buckets[idx]; // first load
|
||||
if (n) {
|
||||
value_type const& v = *n; // second load
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
In practice, the extra indirection can have a dramatic performance impact on common operations such as `insert`, `find` and `erase`. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:
|
||||
|
||||
[#img-fca-layout]
|
||||
.The new layout used by Boost
|
||||
image::fca.png[align=center]
|
||||
|
||||
Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity) which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets meaning that for all common implementations, there's only 4 bits of space overhead per bucket introduced by the bucket groups.
|
||||
|
||||
A more detailed description of Boost.Unordered's closed-addressing implementation is
|
||||
given in an
|
||||
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html[external article].
|
||||
For more information on implementation rationale, read the
|
||||
xref:#rationale_closed_addressing_containers[corresponding section].
|
||||
|
||||
== Open Addressing Implementation
|
||||
|
||||
The diagram shows the basic internal layout of `boost::unordered_flat_map`/`unordered_node_map` and
|
||||
`boost::unordered_flat_set`/`unordered_node_set`.
|
||||
|
||||
|
||||
[#img-foa-layout]
|
||||
.Open-addressing layout used by Boost.Unordered.
|
||||
image::foa.png[align=center]
|
||||
|
||||
As with all open-addressing containers, elements (or pointers to the element nodes in the case of
|
||||
`boost::unordered_node_map` and `boost::unordered_node_set`) are stored directly in the bucket array.
|
||||
This array is logically divided into 2^_n_^ _groups_ of 15 elements each.
|
||||
In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^
|
||||
16-byte words.
|
||||
|
||||
[#img-foa-metadata]
|
||||
.Breakdown of a metadata word.
|
||||
image::foa-metadata.png[align=center]
|
||||
|
||||
A metadata word is divided into 15 _h_~_i_~ bytes (one for each associated
|
||||
bucket), and an _overflow byte_ (_ofw_ in the diagram). The value of _h_~_i_~ is:
|
||||
|
||||
- 0 if the corresponding bucket is empty.
|
||||
- 1 to encode a special empty bucket called a _sentinel_, which is used internally to
|
||||
stop iteration when the container has been fully traversed.
|
||||
- If the bucket is occupied, a _reduced hash value_ obtained from the hash value of
|
||||
the element.
|
||||
|
||||
When looking for an element with hash value _h_, SIMD technologies such as
|
||||
https://en.wikipedia.org/wiki/SSE2[SSE2] and
|
||||
https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)[Neon] allow us
|
||||
to very quickly inspect the full metadata word and look for the reduced value of _h_ among all the
|
||||
15 buckets with just a handful of CPU instructions: non-matching buckets can be
|
||||
readily discarded, and those whose reduced hash value matches need be inspected via full
|
||||
comparison with the corresponding element. If the looked-for element is not present,
|
||||
the overflow byte is inspected:
|
||||
|
||||
- If the bit at position _h_ mod 8 is zero, lookup terminates (and the
|
||||
element is not present).
|
||||
- If the bit is set to 1 (the group has been _overflowed_), further groups are
|
||||
checked using https://en.wikipedia.org/wiki/Quadratic_probing[_quadratic probing_], and
|
||||
the process is repeated.
|
||||
|
||||
Insertion is algorithmically similar: empty buckets are located using SIMD,
|
||||
and when going past a full group its corresponding overflow bit is set to 1.
|
||||
|
||||
In architectures without SIMD support, the logical layout stays the same, but the metadata
|
||||
word is codified using a technique we call _bit interleaving_: this layout allows us
|
||||
to emulate SIMD with reasonably good performance using only standard arithmetic and
|
||||
logical operations.
|
||||
|
||||
[#img-foa-metadata-interleaving]
|
||||
.Bit-interleaved metadata word.
|
||||
image::foa-metadata-interleaving.png[align=center]
|
||||
|
||||
A more detailed description of Boost.Unordered's open-addressing implementation is
|
||||
given in an
|
||||
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
|
||||
For more information on implementation rationale, read the
|
||||
xref:#rationale_open_addresing_containers[corresponding section].
|
||||
|
||||
== Concurrent Open Addressing Implementation
|
||||
|
||||
`boost::concurrent_flat_map` uses the basic
|
||||
xref::#buckets_open_addressing_implementation[open-addressing layout] described above
|
||||
augmented with synchronization mechanisms.
|
||||
|
||||
|
||||
[#img-cfoa-layout]
|
||||
.Concurrent open-addressing layout used by Boost.Unordered.
|
||||
image::cfoa.png[align=center]
|
||||
|
||||
Two levels of synchronization are used:
|
||||
|
||||
* Container level: A read-write mutex is used to control access from any operation
|
||||
to the container. Typically, such access is in read mode (that is, concurrent) even
|
||||
for modifying operations, so for most practical purposes there is no thread
|
||||
contention at this level. Access is only in write mode (blocking) when rehashing or
|
||||
performing container-wide operations such as swapping or assignment.
|
||||
* Group level: Each 15-slot group is equipped with an 8-byte word containing:
|
||||
** A read-write spinlock for synchronized access to any element in the group.
|
||||
** An atomic _insertion counter_ used for optimistic insertion as described
|
||||
below.
|
||||
|
||||
By using atomic operations to access the group metadata, lookup is (group-level)
|
||||
lock-free up to the point where an actual comparison needs to be done with an element
|
||||
that has been previously SIMD-matched: only then is the group's spinlock used.
|
||||
|
||||
Insertion uses the following _optimistic algorithm_:
|
||||
|
||||
* The value of the insertion counter for the initial group in the probe
|
||||
sequence is locally recorded (let's call this value `c0`).
|
||||
* Lookup is as described above. If lookup finds no equivalent element,
|
||||
search for an available slot for insertion successively locks/unlocks
|
||||
each group in the probing sequence.
|
||||
* When an available slot is located, it is preemptively occupied (its
|
||||
reduced hash value is set) and the insertion counter is atomically
|
||||
incremented: if no other thread has incremented the counter during the
|
||||
whole operation (which is checked by comparing with `c0`), then we're
|
||||
good to go and complete the insertion, otherwise we roll back and start
|
||||
over.
|
||||
|
||||
This algorithm has very low contention both at the lookup and actual
|
||||
insertion phases in exchange for the possibility that computations have
|
||||
to be started over if some other thread interferes in the process by
|
||||
performing a successful insertion beginning at the same group. In
|
||||
practice, the start-over frequency is extremely small, measured in the range
|
||||
of parts per million for some of our benchmarks.
|
||||
|
||||
For more information on implementation rationale, read the
|
||||
xref:#rationale_concurrent_hashmap[corresponding section].
|
||||
|
@ -6,8 +6,9 @@
|
||||
:github-pr-url: https://github.com/boostorg/unordered/pull
|
||||
:cpp: C++
|
||||
|
||||
== Release 1.83.0
|
||||
== Release 1.83.0 - Major update
|
||||
|
||||
* Added `boost::concurrent_flat_map`, a fast, thread-safe hashmap based on open addressing.
|
||||
* Sped up iteration of open-addressing containers.
|
||||
|
||||
== Release 1.82.0 - Major update
|
||||
|
@ -5,7 +5,7 @@
|
||||
|
||||
:cpp: C++
|
||||
|
||||
== Closed-addressing containers
|
||||
== Closed-addressing Containers
|
||||
|
||||
`unordered_[multi]set` and `unordered_[multi]map` are intended to provide a conformant
|
||||
implementation of the {cpp}20 standard that will work with {cpp}98 upwards.
|
||||
@ -13,7 +13,7 @@ This wide compatibility does mean some compromises have to be made.
|
||||
With a compiler and library that fully support {cpp}11, the differences should
|
||||
be minor.
|
||||
|
||||
=== Move emulation
|
||||
=== Move Emulation
|
||||
|
||||
Support for move semantics is implemented using Boost.Move. If rvalue
|
||||
references are available it will use them, but if not it uses a close,
|
||||
@ -25,7 +25,7 @@ but imperfect emulation. On such compilers:
|
||||
* The containers themselves are not movable.
|
||||
* Argument forwarding is not perfect.
|
||||
|
||||
=== Use of allocators
|
||||
=== Use of Allocators
|
||||
|
||||
{cpp}11 introduced a new allocator system. It's backwards compatible due to
|
||||
the lax requirements for allocators in the old standard, but might need
|
||||
@ -58,7 +58,7 @@ Due to imperfect move emulation, some assignments might check
|
||||
`propagate_on_container_copy_assignment` on some compilers and
|
||||
`propagate_on_container_move_assignment` on others.
|
||||
|
||||
=== Construction/Destruction using allocators
|
||||
=== Construction/Destruction Using Allocators
|
||||
|
||||
The following support is required for full use of {cpp}11 style
|
||||
construction/destruction:
|
||||
@ -117,7 +117,7 @@ Variadic constructor arguments for `emplace` are only used when both
|
||||
rvalue references and variadic template parameters are available.
|
||||
Otherwise `emplace` can only take up to 10 constructor arguments.
|
||||
|
||||
== Open-addressing containers
|
||||
== Open-addressing Containers
|
||||
|
||||
The C++ standard does not currently provide any open-addressing container
|
||||
specification to adhere to, so `boost::unordered_flat_set`/`unordered_node_set` and
|
||||
@ -144,7 +144,7 @@ The main differences with C++ unordered associative containers are:
|
||||
** Pointer stability is not kept under rehashing.
|
||||
** There is no API for node extraction/insertion.
|
||||
|
||||
== Concurrent Hashmap
|
||||
== Concurrent Containers
|
||||
|
||||
There is currently no specification in the C++ standard for this or any other concurrent
|
||||
data structure. `boost::concurrent_flat_map` takes the same template parameters as `std::unordered_map`
|
||||
|
@ -1,8 +1,9 @@
|
||||
[#concurrent_flat_map_intro]
|
||||
= An introduction to boost::concurrent_flat_map
|
||||
[#concurrent]
|
||||
= Concurrent Containers
|
||||
|
||||
:idprefix: concurrent_flat_map_intro_
|
||||
:idprefix: concurrent_
|
||||
|
||||
Boost.Unordered currently provides just one concurrent container named `boost::concurrent_flat_map`.
|
||||
`boost::concurrent_flat_map` is a hash table that allows concurrent write/read access from
|
||||
different threads without having to implement any synchronization mechanism on the user's side.
|
||||
|
||||
@ -131,7 +132,7 @@ by using `cvisit` overloads (for instance, `insert_or_cvisit`) and may result
|
||||
in higher parallelization. Consult the xref:#concurrent_flat_map[reference]
|
||||
for a complete list of available operations.
|
||||
|
||||
== Whole-table visitation
|
||||
== Whole-table Visitation
|
||||
|
||||
In the absence of iterators, `boost::concurrent_flat_map` provides `visit_all`
|
||||
as an alternative way to process all the elements in the map:
|
||||
@ -168,7 +169,7 @@ may be inserted, modified or erased by other threads during visitation. It is
|
||||
advisable not to assume too much about the exact global state of a `boost::concurrent_flat_map`
|
||||
at any point in your program.
|
||||
|
||||
== Blocking operations
|
||||
== Blocking Operations
|
||||
|
||||
``boost::concurrent_flat_map``s can be copied, assigned, cleared and merged just like any
|
||||
Boost.Unordered container. Unlike most other operations, these are _blocking_,
|
||||
@ -177,5 +178,5 @@ clear or merge operation is in progress. Blocking is taken care of automatically
|
||||
and the user need not take any special precaution, but overall performance may be affected.
|
||||
|
||||
Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve`
|
||||
or during insertion when the table's load hits `max_load()`. As with non-concurrent hashmaps,
|
||||
or during insertion when the table's load hits `max_load()`. As with non-concurrent containers,
|
||||
reserving space in advance of bulk insertions will generally speed up the process.
|
@ -4,146 +4,22 @@
|
||||
:idprefix: intro_
|
||||
:cpp: C++
|
||||
|
||||
For accessing data based on key lookup, the {cpp} standard library offers `std::set`,
|
||||
`std::map`, `std::multiset` and `std::multimap`. These are generally
|
||||
implemented using balanced binary trees so that lookup time has
|
||||
logarithmic complexity. That is generally okay, but in many cases a
|
||||
link:https://en.wikipedia.org/wiki/Hash_table[hash table^] can perform better, as accessing data has constant complexity,
|
||||
on average. The worst case complexity is linear, but that occurs rarely and
|
||||
with some care, can be avoided.
|
||||
link:https://en.wikipedia.org/wiki/Hash_table[Hash tables^] are extremely popular
|
||||
computer data structures and can be found in one form or another in virtually any programming
|
||||
language. Whereas other associative structures such as rb-trees (used in {cpp} by `std::set` and `std::map`)
|
||||
have logarithmic-time complexity for insertion and lookup, hash tables, if configured properly,
|
||||
perform these operations in constant time on average, and are generally much faster.
|
||||
|
||||
Also, the existing containers require a 'less than' comparison object
|
||||
to order their elements. For some data types this is impossible to implement
|
||||
or isn't practical. In contrast, a hash table only needs an equality function
|
||||
and a hash function for the key.
|
||||
{cpp} introduced __unordered associative containers__ `std::unordered_set`, `std::unordered_map`,
|
||||
`std::unordered_multiset` and `std::unordered_multimap` in {cpp}11, but research on hash tables
|
||||
hasn't stopped since: advances in CPU architectures such as
|
||||
more powerful caches, link:https://en.wikipedia.org/wiki/Single_instruction,_multiple_data[SIMD] operations
|
||||
and increasingly available link:https://en.wikipedia.org/wiki/Multi-core_processor[multicore processors]
|
||||
open up possibilities for improved hash-based data structures and new use cases that
|
||||
are simply beyond reach of unordered associative containers as specified in 2011.
|
||||
|
||||
With this in mind, unordered associative containers were added to the {cpp}
|
||||
standard. Boost.Unordered provides an implementation of the containers described in {cpp}11,
|
||||
with some <<compliance,deviations from the standard>> in
|
||||
order to work with non-{cpp}11 compilers and libraries.
|
||||
|
||||
`unordered_set` and `unordered_multiset` are defined in the header
|
||||
`<boost/unordered/unordered_set.hpp>`
|
||||
[source,c++]
|
||||
----
|
||||
namespace boost {
|
||||
template <
|
||||
class Key,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<Key> >
|
||||
class unordered_set;
|
||||
|
||||
template<
|
||||
class Key,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<Key> >
|
||||
class unordered_multiset;
|
||||
}
|
||||
----
|
||||
|
||||
`unordered_map` and `unordered_multimap` are defined in the header
|
||||
`<boost/unordered/unordered_map.hpp>`
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
namespace boost {
|
||||
template <
|
||||
class Key, class Mapped,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||
class unordered_map;
|
||||
|
||||
template<
|
||||
class Key, class Mapped,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||
class unordered_multimap;
|
||||
}
|
||||
----
|
||||
|
||||
These containers, and all other implementations of standard unordered associative
|
||||
containers, use an approach to its internal data structure design called
|
||||
*closed addressing*. Starting in Boost 1.81, Boost.Unordered also provides containers
|
||||
`boost::unordered_flat_set` and `boost::unordered_flat_map`, which use a
|
||||
different data structure strategy commonly known as *open addressing* and depart in
|
||||
a small number of ways from the standard so as to offer much better performance
|
||||
in exchange (more than 2 times faster in typical scenarios):
|
||||
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
// #include <boost/unordered/unordered_flat_set.hpp>
|
||||
//
|
||||
// Note: no multiset version
|
||||
|
||||
namespace boost {
|
||||
template <
|
||||
class Key,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<Key> >
|
||||
class unordered_flat_set;
|
||||
}
|
||||
----
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
// #include <boost/unordered/unordered_flat_map.hpp>
|
||||
//
|
||||
// Note: no multimap version
|
||||
|
||||
namespace boost {
|
||||
template <
|
||||
class Key, class Mapped,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||
class unordered_flat_map;
|
||||
}
|
||||
----
|
||||
|
||||
Starting in Boost 1.82, the containers `boost::unordered_node_set` and `boost::unordered_node_map`
|
||||
are introduced: they use open addressing like `boost::unordered_flat_set` and `boost::unordered_flat_map`,
|
||||
but internally store element _nodes_, like `boost::unordered_set` and `boost::unordered_map`,
|
||||
which provide stability of pointers and references to the elements:
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
// #include <boost/unordered/unordered_node_set.hpp>
|
||||
//
|
||||
// Note: no multiset version
|
||||
|
||||
namespace boost {
|
||||
template <
|
||||
class Key,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<Key> >
|
||||
class unordered_node_set;
|
||||
}
|
||||
----
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
// #include <boost/unordered/unordered_node_map.hpp>
|
||||
//
|
||||
// Note: no multimap version
|
||||
|
||||
namespace boost {
|
||||
template <
|
||||
class Key, class Mapped,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||
class unordered_node_map;
|
||||
}
|
||||
----
|
||||
|
||||
These are all the containers provided by Boost.Unordered:
|
||||
Boost.Unordered offers a catalog of hash containers with different standards compliance levels,
|
||||
performance characteristics and intended usage scenarios:
|
||||
|
||||
[caption=, title='Table {counter:table-counter}. Boost.Unordered containers']
|
||||
[cols="1,1,.^1", frame=all, grid=rows]
|
||||
@ -165,44 +41,49 @@ These are all the containers provided by Boost.Unordered:
|
||||
^| `boost::unordered_flat_set` +
|
||||
`boost::unordered_flat_map`
|
||||
|
||||
^.^h|*Concurrent*
|
||||
^|
|
||||
^| `boost::concurrent_flat_map`
|
||||
|
||||
|===
|
||||
|
||||
Closed-addressing containers are pass:[C++]98-compatible. Open-addressing containers require a
|
||||
reasonably compliant pass:[C++]11 compiler.
|
||||
* **Closed-addressing containers** are fully compliant with the C++ specification
|
||||
for unordered associative containers and feature one of the fastest implementations
|
||||
on the market within the technical constraints imposed by the required standard interface.
|
||||
* **Open-addressing containers** rely on much faster data structures and algorithms
|
||||
(more than 2 times faster in typical scenarios) while slightly diverging from the standard
|
||||
interface to accommodate the implementation.
|
||||
There are two variants: **flat** (the fastest) and **node-based**, which
|
||||
provide pointer stability under rehashing at the expense of being slower.
|
||||
* Finally, `boost::concurrent_flat_map` (the only **concurrent container** provided
|
||||
at present) is a hashmap designed and implemented to be used in high-performance
|
||||
multithreaded scenarios. Its interface is radically different from that of regular C++ containers.
|
||||
|
||||
Boost.Unordered containers are used in a similar manner to the normal associative
|
||||
containers:
|
||||
|
||||
[source,cpp]
|
||||
----
|
||||
typedef boost::unordered_map<std::string, int> map;
|
||||
map x;
|
||||
x["one"] = 1;
|
||||
x["two"] = 2;
|
||||
x["three"] = 3;
|
||||
|
||||
assert(x.at("one") == 1);
|
||||
assert(x.find("missing") == x.end());
|
||||
----
|
||||
|
||||
But since the elements aren't ordered, the output of:
|
||||
All sets and maps in Boost.Unordered are instantiated similarly to
|
||||
`std::unordered_set` and `std::unordered_map`, respectively:
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
for(const map::value_type& i: x) {
|
||||
std::cout<<i.first<<","<<i.second<<"\n";
|
||||
}
----
|
||||
namespace boost {
|
||||
template <
|
||||
class Key,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<Key> >
|
||||
class unordered_set;
|
||||
// same for unordered_multiset, unordered_flat_set, unordered_node_set
|
||||
|
||||
template <
|
||||
class Key, class Mapped,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||
class unordered_map;
|
||||
// same for unordered_multimap, unordered_flat_map, unordered_node_map
|
||||
// and concurrent_flat_map
|
||||
}
|
||||
----
|
||||
|
||||
can be in any order. For example, it might be:
|
||||
|
||||
[source]
|
||||
----
|
||||
two,2
|
||||
one,1
|
||||
three,3
|
||||
----
|
||||
|
||||
To store an object in an unordered associative container requires both a
key equality function and a hash function. The default function objects in
the standard containers support a few basic types including integer types,
@ -213,16 +94,3 @@ you have to extend Boost.Hash to support the type or use
your own custom equality predicates and hash functions. See the
<<hash_equality,Equality Predicates and Hash Functions>> section
for more details.

There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.
== A concurrent hashmap

Starting in Boost 1.83, Boost.Unordered provides `boost::concurrent_flat_map`,
a thread-safe hash table for high-performance multithreaded scenarios. Although
it shares its internal data structure and most of its algorithms with Boost.Unordered's
open-addressing `boost::unordered_flat_map`, ``boost::concurrent_flat_map``'s API departs significantly
from that of C++ unordered associative containers in order to make the table suitable for
concurrent usage. Consult the xref:#concurrent_flat_map_intro[dedicated tutorial]
for more information.

@ -4,7 +4,7 @@

= Implementation Rationale

== Closed-addressing Containers

`boost::unordered_[multi]set` and `boost::unordered_[multi]map`
adhere to the standard requirements for unordered associative

@ -74,7 +74,7 @@ Since release 1.80.0, prime numbers are chosen for the number of buckets in
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
the result of the user's hash function as was used for release 1.79.0.

[#rationale_open_addresing_containers]
== Open-addressing Containers

The C++ standard specification of unordered associative containers imposes
severe limitations on permissible implementations, the most important being

@ -86,7 +86,7 @@ The design of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unord
guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
We discuss here the most relevant principles.

=== Hash Function

Given its rich functionality and cross-platform interoperability,
`boost::hash` remains the default hash function of open-addressing containers.

@ -105,7 +105,7 @@ whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from https://arxiv.org/ab
When using a hash function directly suitable for open addressing, post-mixing can be opted out of via the dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
`boost::hash` specializations for string types are marked as avalanching.

=== Platform Interoperability

The observable behavior of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` is deterministically
identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided

@ -118,7 +118,7 @@ and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[N
this does not affect interoperability. For instance, the behavior is the same
for Visual Studio on an x64-mode Intel CPU with SSE2 and for GCC on an IBM s390x without any supported SIMD technology.

== Concurrent Containers

The same data structure used by Boost.Unordered open-addressing containers has been chosen
also as the foundation of `boost::concurrent_flat_map`:

@ -132,7 +132,7 @@ lookup that are lock-free up to the last step of actual element comparison.
of all elements between `boost::concurrent_flat_map` and `boost::unordered_flat_map`.
(This feature has not been implemented yet.)

=== Hash Function and Platform Interoperability

`boost::concurrent_flat_map` makes the same decisions and provides the same guarantees
as Boost.Unordered open-addressing containers with regard to

@ -1,8 +1,99 @@

[#regular]
= Regular Containers

:idprefix: regular_

Boost.Unordered closed-addressing containers (`boost::unordered_set`, `boost::unordered_map`,
`boost::unordered_multiset` and `boost::unordered_multimap`) are fully conformant with the
C++ specification for unordered associative containers, so for those who know how to use
`std::unordered_set`, `std::unordered_map`, etc., their homonyms in Boost.Unordered are
drop-in replacements. The interface of the open-addressing containers (`boost::unordered_node_set`,
`boost::unordered_node_map`, `boost::unordered_flat_set` and `boost::unordered_flat_map`)
is very similar, but they present some minor differences, listed in the dedicated
xref:#compliance_open_addressing_containers[standard compliance section].

For readers without previous experience with hash containers but familiar
with normal associative containers (`std::set`, `std::map`,
`std::multiset` and `std::multimap`), Boost.Unordered containers are used in a similar manner:

[source,cpp]
----
typedef boost::unordered_map<std::string, int> map;
map x;
x["one"] = 1;
x["two"] = 2;
x["three"] = 3;

assert(x.at("one") == 1);
assert(x.find("missing") == x.end());
----

But since the elements aren't ordered, the output of:

[source,c++]
----
for(const map::value_type& i: x) {
  std::cout << i.first << "," << i.second << "\n";
}
----

can be in any order. For example, it might be:

[source]
----
two,2
one,1
three,3
----

There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.

== Iterator Invalidation

It is not specified how member functions other than `rehash` and `reserve` affect
the bucket count, although `insert` can only invalidate iterators
when the insertion causes the container's load to be greater than the maximum allowed.
For most implementations this means that `insert` will only
change the number of buckets when this happens. Iterators can be
invalidated by calls to `insert`, `rehash` and `reserve`.

As for pointers and references,
they are never invalidated for node-based containers
(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`),
but they will be when rehashing occurs for
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
these containers store elements directly in their holding buckets, so
when a new bucket array is allocated the elements must be transferred by means of move construction.

In a similar manner to using `reserve` for ``vector``s, it can be a good idea
to call `reserve` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

```
x.reserve(n);
```

Note:: `reserve(n)` reserves space for at least `n` elements, allocating enough buckets
so as to not exceed the maximum load factor.
+
Because the maximum load factor is defined as the number of elements divided by the total
number of available buckets, this function is logically equivalent to:
+
```
x.rehash(std::ceil(n / x.max_load_factor()))
```
+
See the <<unordered_map_rehash,reference for more details>> on the `rehash` function.

[#comparison]
:idprefix: comparison_

== Comparison with Associative Containers

[caption=, title='Table {counter:table-counter} Interface differences']
[cols="1,1", frame=all, grid=rows]

@ -32,7 +123,7 @@

|`iterator`, `const_iterator` are of at least the forward category.

|Iterators, pointers and references to the container's elements are never invalidated.
|<<regular_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
**Node-based containers:** Pointers and references to the container's elements are never invalidated. +
**Flat containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.

179 doc/unordered/structures.adoc Normal file
@ -0,0 +1,179 @@

[#structures]
= Data Structures

:idprefix: structures_

== Closed-addressing Containers

++++
<style>
  .imageblock > .title {
    text-align: inherit;
  }
</style>
++++

Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below:

[#img-bucket-groups,.text-center]
.A simple bucket group approach
image::bucket-groups.png[align=center]

An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iteration of the entire container is often slow using this layout as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`.

Canonical standard implementations will wind up looking like the diagram below:

[.text-center]
.The canonical standard approach
image::singly-linked.png[align=center,link=../diagrams/singly-linked.png,window=_blank]

It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here].

This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:

```c++
auto const idx = get_bucket_idx(hash_function(key));
node* p = buckets[idx];      // first load
node* n = p->next;           // second load
if (n && is_in_bucket(n, idx)) {
  value_type const& v = *n;  // third load
  // ...
}
```

With a simple bucket group layout, this is all that must be done:

```c++
auto const idx = get_bucket_idx(hash_function(key));
node* n = buckets[idx];      // first load
if (n) {
  value_type const& v = *n;  // second load
  // ...
}
```

In practice, the extra indirection can have a dramatic performance impact on common operations such as `insert`, `find` and `erase`. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:

[#img-fca-layout]
.The new layout used by Boost
image::fca.png[align=center]

Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity), which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets, meaning that for all common implementations there's only 4 bits of space overhead per bucket introduced by the bucket groups.

A more detailed description of Boost.Unordered's closed-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html[external article].
For more information on implementation rationale, read the
xref:#rationale_closed_addressing_containers[corresponding section].

== Open-addressing Containers

The diagram shows the basic internal layout of `boost::unordered_flat_map`/`unordered_node_map` and
`boost::unordered_flat_set`/`unordered_node_set`.

[#img-foa-layout]
.Open-addressing layout used by Boost.Unordered.
image::foa.png[align=center]

As with all open-addressing containers, elements (or pointers to the element nodes in the case of
`boost::unordered_node_map` and `boost::unordered_node_set`) are stored directly in the bucket array.
This array is logically divided into 2^_n_^ _groups_ of 15 elements each.
In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^
16-byte words.

[#img-foa-metadata]
.Breakdown of a metadata word.
image::foa-metadata.png[align=center]

A metadata word is divided into 15 _h_~_i_~ bytes (one for each associated
bucket), and an _overflow byte_ (_ofw_ in the diagram). The value of _h_~_i_~ is:

- 0 if the corresponding bucket is empty.
- 1 to encode a special empty bucket called a _sentinel_, which is used internally to
stop iteration when the container has been fully traversed.
- If the bucket is occupied, a _reduced hash value_ obtained from the hash value of
the element.

When looking for an element with hash value _h_, SIMD technologies such as
https://en.wikipedia.org/wiki/SSE2[SSE2] and
https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)[Neon] allow us
to very quickly inspect the full metadata word and look for the reduced value of _h_ among all the
15 buckets with just a handful of CPU instructions: non-matching buckets can be
readily discarded, and those whose reduced hash value matches need be inspected via full
comparison with the corresponding element. If the looked-for element is not present,
the overflow byte is inspected:

- If the bit in position _h_ mod 8 is zero, lookup terminates (and the
element is not present).
- If the bit is set to 1 (the group has been _overflowed_), further groups are
checked using https://en.wikipedia.org/wiki/Quadratic_probing[_quadratic probing_], and
the process is repeated.

Insertion is algorithmically similar: empty buckets are located using SIMD,
and when going past a full group its corresponding overflow bit is set to 1.

In architectures without SIMD support, the logical layout stays the same, but the metadata
word is codified using a technique we call _bit interleaving_: this layout allows us
to emulate SIMD with reasonably good performance using only standard arithmetic and
logical operations.

[#img-foa-metadata-interleaving]
.Bit-interleaved metadata word.
image::foa-metadata-interleaving.png[align=center]

A more detailed description of Boost.Unordered's open-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
For more information on implementation rationale, read the
xref:#rationale_open_addresing_containers[corresponding section].

== Concurrent Containers

`boost::concurrent_flat_map` uses the basic
xref:#structures_open_addressing_containers[open-addressing layout] described above
augmented with synchronization mechanisms.

[#img-cfoa-layout]
.Concurrent open-addressing layout used by Boost.Unordered.
image::cfoa.png[align=center]

Two levels of synchronization are used:

* Container level: A read-write mutex is used to control access from any operation
to the container. Typically, such access is in read mode (that is, concurrent) even
for modifying operations, so for most practical purposes there is no thread
contention at this level. Access is only in write mode (blocking) when rehashing or
performing container-wide operations such as swapping or assignment.
* Group level: Each 15-slot group is equipped with an 8-byte word containing:
** A read-write spinlock for synchronized access to any element in the group.
** An atomic _insertion counter_ used for optimistic insertion as described
below.

By using atomic operations to access the group metadata, lookup is (group-level)
lock-free up to the point where an actual comparison needs to be done with an element
that has been previously SIMD-matched: only then is the group's spinlock used.

Insertion uses the following _optimistic algorithm_:

* The value of the insertion counter for the initial group in the probe
sequence is locally recorded (let's call this value `c0`).
* Lookup is as described above. If lookup finds no equivalent element,
the search for an available slot for insertion successively locks/unlocks
each group in the probing sequence.
* When an available slot is located, it is preemptively occupied (its
reduced hash value is set) and the insertion counter is atomically
incremented: if no other thread has incremented the counter during the
whole operation (which is checked by comparing with `c0`), then we're
good to go and complete the insertion; otherwise we roll back and start
over.

This algorithm has very low contention both at the lookup and actual
insertion phases in exchange for the possibility that computations have
to be started over if some other thread interferes in the process by
performing a successful insertion beginning at the same group. In
practice, the start-over frequency is extremely small, measured in the range
of parts per million for some of our benchmarks.

For more information on implementation rationale, read the
xref:#rationale_concurrent_containers[corresponding section].