refactored to modernize and improve flow

This commit is contained in:
joaquintides
2023-05-18 20:18:58 +02:00
parent ff10b287e2
commit 3d640ac032
9 changed files with 354 additions and 425 deletions

View File

@ -13,9 +13,10 @@
include::unordered/intro.adoc[]
include::unordered/buckets.adoc[]
include::unordered/hash_equality.adoc[]
include::unordered/comparison.adoc[]
include::unordered/concurrent_flat_map_intro.adoc[]
include::unordered/regular.adoc[]
include::unordered/concurrent.adoc[]
include::unordered/compliance.adoc[]
include::unordered/structures.adoc[]
include::unordered/benchmarks.adoc[]
include::unordered/rationale.adoc[]
include::unordered/ref.adoc[]

View File

@ -2,9 +2,9 @@
:idprefix: buckets_
:imagesdir: ../diagrams
= The Data Structure
= Basics of Hash Tables
The containers are made up of a number of 'buckets', each of which can contain
The containers are made up of a number of _buckets_, each of which can contain
any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
have more buckets).
@ -12,8 +12,7 @@ have more buckets).
image::buckets.png[]
In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but is referred to as the key
the hash function, `Hash`, to the element's key (for sets the key is the whole element, but is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
@ -80,7 +79,7 @@ h|*Method* h|*Description*
|===
== Controlling the number of buckets
== Controlling the Number of Buckets
As more elements are added to an unordered associative container, the number
of collisions will increase causing performance to degrade.
@ -90,8 +89,8 @@ calling `rehash`.
The standard leaves a lot of freedom to the implementer to decide how the
number of buckets is chosen, but it does make some requirements based on the
container's 'load factor', the number of elements divided by the number of buckets.
Containers also have a 'maximum load factor' which they should try to keep the
container's _load factor_, the number of elements divided by the number of buckets.
Containers also have a _maximum load factor_ which they should try to keep the
load factor below.
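As a rough illustration of these definitions, the following sketch computes the load factor by hand and checks it against the maximum. It uses `std::unordered_set` as a stand-in (an assumption made so the example needs only the standard library); `filled_set` and `computed_load_factor` are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Load factor as defined above: number of elements divided by number of buckets.
// Returns 0 for a container that has not allocated any bucket yet.
inline float computed_load_factor(const std::unordered_set<int>& s) {
  if (s.bucket_count() == 0) return 0.0f;
  return static_cast<float>(s.size()) / static_cast<float>(s.bucket_count());
}

// Inserts the integers 0..n-1 and returns the resulting container.
inline std::unordered_set<int> filled_set(int n) {
  std::unordered_set<int> s;
  for (int i = 0; i < n; ++i) s.insert(i);
  // The container is expected to keep the load factor at or below the maximum.
  assert(s.load_factor() <= s.max_load_factor());
  return s;
}
```

Calling `filled_set(100)` and passing the result to `computed_load_factor` should give a value no greater than `max_load_factor()`, since the container adds buckets as it grows.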
You can't control the bucket count directly but there are two ways to
@ -133,9 +132,10 @@ h|*Method* h|*Description*
|`void rehash(size_type n)`
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor is less than the maximum load factor.
2+^h| *Open-addressing containers only* +
2+^h| *Open-addressing and concurrent containers only* +
`boost::unordered_flat_set`, `boost::unordered_flat_map` +
`boost::unordered_node_set`, `boost::unordered_node_map` +
`boost::concurrent_flat_map`
h|*Method* h|*Description*
|`size_type max_load() const`
@ -143,7 +143,7 @@ h|*Method* h|*Description*
|===
A note on `max_load` for open-addressing containers: the maximum load will be
A note on `max_load` for open-addressing and concurrent containers: the maximum load will be
(`max_load_factor() * bucket_count()`) right after `rehash` or on container creation, but may
slightly decrease when erasing elements in high-load situations. For instance, if we
have a <<unordered_flat_map,`boost::unordered_flat_map`>> with `size()` almost
@ -151,216 +151,4 @@ at `max_load()` level and then erase 1,000 elements, `max_load()` may decrease b
few dozen elements. This is done internally by Boost.Unordered in order
to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
== Iterator Invalidation
It is not specified how member functions other than `rehash` and `reserve` affect
the bucket count, although `insert` can only invalidate iterators
when the insertion causes the container's load to be greater than the maximum allowed.
For most implementations this means that `insert` will only
change the number of buckets when this happens. Iterators can be
invalidated by calls to `insert`, `rehash` and `reserve`.
As for pointers and references,
they are never invalidated for node-based containers
(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`),
but they will when rehashing occurs for
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
these containers store elements directly into their holding buckets, so
when allocating a new bucket array the elements must be transferred by means of move construction.
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
to call `reserve` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:
```
x.reserve(n);
```
Note:: `reserve(n)` reserves space for at least `n` elements, allocating enough buckets
so as to not exceed the maximum load factor.
+
Because the maximum load factor is defined as the number of elements divided by the total
number of available buckets, this function is logically equivalent to:
+
```
x.rehash(std::ceil(n / x.max_load_factor()))
```
+
See the <<unordered_map_rehash,reference for more details>> on the `rehash` function.
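The effect of reserving in advance can be sketched as follows. This is a minimal example that uses `std::unordered_map` as a stand-in for the node-based containers (an assumption, so that only the standard library is needed); `insert_without_rehash` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>

// Reserves room for n elements, inserts n elements, and reports whether
// the bucket count stayed the same (that is, no rehash took place).
inline bool insert_without_rehash(std::size_t n) {
  std::unordered_map<std::size_t, int> x;
  x.reserve(n);
  const std::size_t buckets = x.bucket_count();

  // reserve(n) behaves like rehash(ceil(n / max_load_factor())),
  // so at least that many buckets must be available afterwards.
  assert(buckets >= std::ceil(n / x.max_load_factor()));

  for (std::size_t i = 0; i < n; ++i) x.emplace(i, 0);
  return x.bucket_count() == buckets;
}
```

If `insert_without_rehash(1000)` returns `true`, iterators obtained during the insertions were not invalidated by a rehash.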
== Fast Closed Addressing Implementation
++++
<style>
.imageblock > .title {
text-align: inherit;
}
</style>
++++
Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below:
[#img-bucket-groups,.text-center]
.A simple bucket group approach
image::bucket-groups.png[align=center]
An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iteration of the entire container is often slow using this layout as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`.
Canonical standard implementations will wind up looking like the diagram below:
[.text-center]
.The canonical standard approach
image::singly-linked.png[align=center,link=../diagrams/singly-linked.png,window=_blank]
It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here].
This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:
```c++
auto const idx = get_bucket_idx(hash_function(key));
node* p = buckets[idx]; // first load
node* n = p->next; // second load
if (n && is_in_bucket(n, idx)) {
value_type const& v = *n; // third load
// ...
}
```
With a simple bucket group layout, this is all that must be done:
```c++
auto const idx = get_bucket_idx(hash_function(key));
node* n = buckets[idx]; // first load
if (n) {
value_type const& v = *n; // second load
// ...
}
```
In practice, the extra indirection can have a dramatic performance impact to common operations such as `insert`, `find` and `erase`. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:
[#img-fca-layout]
.The new layout used by Boost
image::fca.png[align=center]
Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity) which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets meaning that for all common implementations, there's only 4 bits of space overhead per bucket introduced by the bucket groups.
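The occupancy bitmask and the overhead figure quoted above can be illustrated with a short sketch. The names below (`occupied_buckets`, `first_occupied` and the constants) are hypothetical and do not correspond to Boost.Unordered's internal code; the example only models the bitmask of a single group.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Number of buckets viewed by one group: one bit of the bitmask per bucket.
constexpr std::size_t buckets_per_group = sizeof(std::size_t) * CHAR_BIT;

// A group is 4 words in size, so its footprint in bits is:
constexpr std::size_t group_size_in_bits = 4 * sizeof(std::size_t) * CHAR_BIT;

// Space overhead per bucket introduced by the groups.
constexpr std::size_t overhead_bits_per_bucket =
    group_size_in_bits / buckets_per_group;
static_assert(overhead_bits_per_bucket == 4, "4 bits of overhead per bucket");

// Returns the positions (lowest first) of the buckets marked as occupied.
inline std::vector<std::size_t> occupied_buckets(std::size_t bitmask) {
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; bitmask != 0; ++i, bitmask >>= 1) {
    if (bitmask & 1u) positions.push_back(i);
  }
  return positions;
}

// Returns the position of the first occupied bucket of a non-empty group.
inline std::size_t first_occupied(std::size_t bitmask) {
  assert(bitmask != 0);  // empty groups are not linked, so they are never visited
  std::size_t i = 0;
  while (!(bitmask & 1u)) {
    bitmask >>= 1;
    ++i;
  }
  return i;
}
```

Only buckets whose bit is set are visited, which is why empty buckets add no work to container-wide iteration in this model.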
A more detailed description of Boost.Unordered's closed-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html[external article].
For more information on implementation rationale, read the
xref:#rationale_closed_addressing_containers[corresponding section].
== Open Addressing Implementation
The diagram shows the basic internal layout of `boost::unordered_flat_map`/`unordered_node_map` and
`boost::unordered_flat_set`/`unordered_node_set`.
[#img-foa-layout]
.Open-addressing layout used by Boost.Unordered.
image::foa.png[align=center]
As with all open-addressing containers, elements (or pointers to the element nodes in the case of
`boost::unordered_node_map` and `boost::unordered_node_set`) are stored directly in the bucket array.
This array is logically divided into 2^_n_^ _groups_ of 15 elements each.
In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^
16-byte words.
[#img-foa-metadata]
.Breakdown of a metadata word.
image::foa-metadata.png[align=center]
A metadata word is divided into 15 _h_~_i_~ bytes (one for each associated
bucket), and an _overflow byte_ (_ofw_ in the diagram). The value of _h_~_i_~ is:
- 0 if the corresponding bucket is empty.
- 1 to encode a special empty bucket called a _sentinel_, which is used internally to
stop iteration when the container has been fully traversed.
- If the bucket is occupied, a _reduced hash value_ obtained from the hash value of
the element.
When looking for an element with hash value _h_, SIMD technologies such as
https://en.wikipedia.org/wiki/SSE2[SSE2] and
https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)[Neon] allow us
to very quickly inspect the full metadata word and look for the reduced value of _h_ among all the
15 buckets with just a handful of CPU instructions: non-matching buckets can be
readily discarded, and those whose reduced hash value matches need be inspected via full
comparison with the corresponding element. If the looked-for element is not present,
the overflow byte is inspected:
- If the bit in the position _h_ mod 8 is zero, lookup terminates (and the
element is not present).
- If the bit is set to 1 (the group has been _overflowed_), further groups are
checked using https://en.wikipedia.org/wiki/Quadratic_probing[_quadratic probing_], and
the process is repeated.
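The lookup steps above can be modelled without SIMD in a few lines. This is a scalar sketch only: the type `metadata_word_sketch` and all function names are hypothetical, and `reduced_hash` is a placeholder reduction chosen for illustration (it merely avoids the reserved values 0 and 1), not the one used by Boost.Unordered.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 16-byte metadata word: 15 reduced hash bytes plus the overflow byte.
struct metadata_word_sketch {
  std::array<std::uint8_t, 15> h{};  // 0 = empty, 1 = sentinel, else reduced hash
  std::uint8_t ofw = 0;              // overflow byte
};

// Placeholder reduction (an assumption): maps any hash value into [2, 255].
inline std::uint8_t reduced_hash(std::size_t hash) {
  return static_cast<std::uint8_t>(2 + hash % 254);
}

// Marks slot i as occupied by an element with the given hash value.
inline void set_slot(metadata_word_sketch& m, std::size_t i, std::size_t hash) {
  assert(i < m.h.size());
  m.h[i] = reduced_hash(hash);
}

// Returns a bitmask where bit i is set if slot i has a matching reduced hash.
// Matching slots still need a full comparison with the stored element.
inline unsigned match(const metadata_word_sketch& m, std::size_t hash) {
  const std::uint8_t r = reduced_hash(hash);
  unsigned mask = 0;
  for (std::size_t i = 0; i < m.h.size(); ++i) {
    if (m.h[i] == r) mask |= 1u << i;
  }
  return mask;
}

// True if the overflow bit at position (hash mod 8) is set, in which case
// lookup must continue with the next group in the probe sequence.
inline bool is_overflowed_for(const metadata_word_sketch& m, std::size_t hash) {
  return ((m.ofw >> (hash % 8)) & 1u) != 0;
}

// Sets the overflow bit at position (hash mod 8).
inline void mark_overflow(metadata_word_sketch& m, std::size_t hash) {
  m.ofw = static_cast<std::uint8_t>(m.ofw | (1u << (hash % 8)));
}
```

Two different hash values can share a reduced value (here, 42 and 296), which is why a match in the metadata word only selects candidates for full comparison.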
Insertion is algorithmically similar: empty buckets are located using SIMD,
and when going past a full group its corresponding overflow bit is set to 1.
In architectures without SIMD support, the logical layout stays the same, but the metadata
word is codified using a technique we call _bit interleaving_: this layout allows us
to emulate SIMD with reasonably good performance using only standard arithmetic and
logical operations.
[#img-foa-metadata-interleaving]
.Bit-interleaved metadata word.
image::foa-metadata-interleaving.png[align=center]
A more detailed description of Boost.Unordered's open-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
For more information on implementation rationale, read the
xref:#rationale_open_addresing_containers[corresponding section].
== Concurrent Open Addressing Implementation
`boost::concurrent_flat_map` uses the basic
xref:#buckets_open_addressing_implementation[open-addressing layout] described above
augmented with synchronization mechanisms.
[#img-cfoa-layout]
.Concurrent open-addressing layout used by Boost.Unordered.
image::cfoa.png[align=center]
Two levels of synchronization are used:
* Container level: A read-write mutex is used to control access from any operation
to the container. Typically, such access is in read mode (that is, concurrent) even
for modifying operations, so for most practical purposes there is no thread
contention at this level. Access is only in write mode (blocking) when rehashing or
performing container-wide operations such as swapping or assignment.
* Group level: Each 15-slot group is equipped with an 8-byte word containing:
** A read-write spinlock for synchronized access to any element in the group.
** An atomic _insertion counter_ used for optimistic insertion as described
below.
By using atomic operations to access the group metadata, lookup is (group-level)
lock-free up to the point where an actual comparison needs to be done with an element
that has been previously SIMD-matched: only then is the group's spinlock used.
Insertion uses the following _optimistic algorithm_:
* The value of the insertion counter for the initial group in the probe
sequence is locally recorded (let's call this value `c0`).
* Lookup is as described above. If lookup finds no equivalent element,
search for an available slot for insertion successively locks/unlocks
each group in the probing sequence.
* When an available slot is located, it is preemptively occupied (its
reduced hash value is set) and the insertion counter is atomically
incremented: if no other thread has incremented the counter during the
whole operation (which is checked by comparing with `c0`), then we're
good to go and complete the insertion, otherwise we roll back and start
over.
This algorithm has very low contention both at the lookup and actual
insertion phases in exchange for the possibility that computations have
to be started over if some other thread interferes in the process by
performing a successful insertion beginning at the same group. In
practice, the start-over frequency is extremely small, measured in the range
of parts per million for some of our benchmarks.
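The optimistic algorithm can be sketched on a heavily simplified model with a single group. Everything here is an assumption made for illustration: `group_sketch`, `insert_result` and `optimistic_insert` are hypothetical names, a `std::mutex` stands in for the read-write spinlock, probing across groups is omitted, and the lookup phase simply takes the lock instead of being lock-free.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>

// Hypothetical model of one 15-slot group with its synchronization word.
struct group_sketch {
  std::atomic<std::uint32_t> insertion_counter{0};
  std::mutex lock;  // stand-in for the group's read-write spinlock
  std::array<std::optional<int>, 15> slots{};
};

enum class insert_result { inserted, already_present, group_full };

inline insert_result optimistic_insert(group_sketch& g, int key) {
  for (;;) {
    // Step 1: record the insertion counter of the initial group (c0).
    const std::uint32_t c0 =
        g.insertion_counter.load(std::memory_order_acquire);

    // Step 2: lookup. If an equivalent element exists, nothing is inserted.
    {
      std::lock_guard<std::mutex> guard(g.lock);
      for (const auto& s : g.slots) {
        if (s && *s == key) return insert_result::already_present;
      }
    }

    // Step 3: search for an available slot under the group lock.
    std::lock_guard<std::mutex> guard(g.lock);
    std::optional<int>* free_slot = nullptr;
    for (auto& s : g.slots) {
      if (!s) { free_slot = &s; break; }
    }
    if (!free_slot) return insert_result::group_full;

    // Step 4: occupy the slot preemptively, then increment the counter.
    *free_slot = key;
    if (g.insertion_counter.fetch_add(1, std::memory_order_acq_rel) == c0) {
      return insert_result::inserted;  // no other insertion interfered
    }

    // Another thread incremented the counter since c0 was read:
    // roll back and start over.
    assert(free_slot->has_value());
    free_slot->reset();
  }
}
```

In this model, if two threads race to insert the same key, the second one to reach the counter observes a value different from its `c0`, rolls back, and finds the key during its repeated lookup.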
For more information on implementation rationale, read the
xref:#rationale_concurrent_hashmap[corresponding section].

View File

@ -6,8 +6,9 @@
:github-pr-url: https://github.com/boostorg/unordered/pull
:cpp: C++
== Release 1.83.0
== Release 1.83.0 - Major update
* Added `boost::concurrent_flat_map`, a fast, thread-safe hashmap based on open addressing.
* Sped up iteration of open-addressing containers.
== Release 1.82.0 - Major update

View File

@ -5,7 +5,7 @@
:cpp: C++
== Closed-addressing containers
== Closed-addressing Containers
`unordered_[multi]set` and `unordered_[multi]map` are intended to provide a conformant
implementation of the {cpp}20 standard that will work with {cpp}98 upwards.
@ -13,7 +13,7 @@ This wide compatibility does mean some compromises have to be made.
With a compiler and library that fully support {cpp}11, the differences should
be minor.
=== Move emulation
=== Move Emulation
Support for move semantics is implemented using Boost.Move. If rvalue
references are available it will use them, but if not it uses a close,
@ -25,7 +25,7 @@ but imperfect emulation. On such compilers:
* The containers themselves are not movable.
* Argument forwarding is not perfect.
=== Use of allocators
=== Use of Allocators
{cpp}11 introduced a new allocator system. It's backwards compatible due to
the lax requirements for allocators in the old standard, but might need
@ -58,7 +58,7 @@ Due to imperfect move emulation, some assignments might check
`propagate_on_container_copy_assignment` on some compilers and
`propagate_on_container_move_assignment` on others.
=== Construction/Destruction using allocators
=== Construction/Destruction Using Allocators
The following support is required for full use of {cpp}11 style
construction/destruction:
@ -117,7 +117,7 @@ Variadic constructor arguments for `emplace` are only used when both
rvalue references and variadic template parameters are available.
Otherwise `emplace` can only take up to 10 constructor arguments.
== Open-addressing containers
== Open-addressing Containers
The C++ standard does not currently provide any open-addressing container
specification to adhere to, so `boost::unordered_flat_set`/`unordered_node_set` and
@ -144,7 +144,7 @@ The main differences with C++ unordered associative containers are:
** Pointer stability is not kept under rehashing.
** There is no API for node extraction/insertion.
== Concurrent Hashmap
== Concurrent Containers
There is currently no specification in the C++ standard for this or any other concurrent
data structure. `boost::concurrent_flat_map` takes the same template parameters as `std::unordered_map`

View File

@ -1,8 +1,9 @@
[#concurrent_flat_map_intro]
= An introduction to boost::concurrent_flat_map
[#concurrent]
= Concurrent Containers
:idprefix: concurrent_flat_map_intro_
:idprefix: concurrent_
Boost.Unordered currently provides just one concurrent container named `boost::concurrent_flat_map`.
`boost::concurrent_flat_map` is a hash table that allows concurrent write/read access from
different threads without having to implement any synchronization mechanism on the user's side.
@ -131,7 +132,7 @@ by using `cvisit` overloads (for instance, `insert_or_cvisit`) and may result
in higher parallelization. Consult the xref:#concurrent_flat_map[reference]
for a complete list of available operations.
== Whole-table visitation
== Whole-table Visitation
In the absence of iterators, `boost::concurrent_flat_map` provides `visit_all`
as an alternative way to process all the elements in the map:
@ -168,7 +169,7 @@ may be inserted, modified or erased by other threads during visitation. It is
advisable not to assume too much about the exact global state of a `boost::concurrent_flat_map`
at any point in your program.
== Blocking operations
== Blocking Operations
``boost::concurrent_flat_map``s can be copied, assigned, cleared and merged just like any
Boost.Unordered container. Unlike most other operations, these are _blocking_,
@ -177,5 +178,5 @@ clear or merge operation is in progress. Blocking is taken care of automatically
and the user need not take any special precaution, but overall performance may be affected.
Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve`
or during insertion when the table's load hits `max_load()`. As with non-concurrent hashmaps,
or during insertion when the table's load hits `max_load()`. As with non-concurrent containers,
reserving space in advance of bulk insertions will generally speed up the process.

View File

@ -4,146 +4,22 @@
:idprefix: intro_
:cpp: C++
For accessing data based on key lookup, the {cpp} standard library offers `std::set`,
`std::map`, `std::multiset` and `std::multimap`. These are generally
implemented using balanced binary trees so that lookup time has
logarithmic complexity. That is generally okay, but in many cases a
link:https://en.wikipedia.org/wiki/Hash_table[hash table^] can perform better, as accessing data has constant complexity,
on average. The worst case complexity is linear, but that occurs rarely and
with some care, can be avoided.
link:https://en.wikipedia.org/wiki/Hash_table[Hash tables^] are extremely popular
computer data structures and can be found under one form or another in virtually any programming
language. Whereas other associative structures such as rb-trees (used in {cpp} by `std::set` and `std::map`)
have logarithmic-time complexity for insertion and lookup, hash tables, if configured properly,
perform these operations in constant time on average, and are generally much faster.
Also, the existing containers require a 'less than' comparison object
to order their elements. For some data types this is impossible to implement
or isn't practical. In contrast, a hash table only needs an equality function
and a hash function for the key.
{cpp} introduced __unordered associative containers__ `std::unordered_set`, `std::unordered_map`,
`std::unordered_multiset` and `std::unordered_multimap` in {cpp}11, but research on hash tables
hasn't stopped since: advances in CPU architectures such as
more powerful caches, link:https://en.wikipedia.org/wiki/Single_instruction,_multiple_data[SIMD] operations
and increasingly available link:https://en.wikipedia.org/wiki/Multi-core_processor[multicore processors]
open up possibilities for improved hash-based data structures and new use cases that
are simply beyond reach of unordered associative containers as specified in 2011.
With this in mind, unordered associative containers were added to the {cpp}
standard. Boost.Unordered provides an implementation of the containers described in {cpp}11,
with some <<compliance,deviations from the standard>> in
order to work with non-{cpp}11 compilers and libraries.
`unordered_set` and `unordered_multiset` are defined in the header
`<boost/unordered/unordered_set.hpp>`
[source,c++]
----
namespace boost {
template <
class Key,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
class unordered_set;
template<
class Key,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
class unordered_multiset;
}
----
`unordered_map` and `unordered_multimap` are defined in the header
`<boost/unordered/unordered_map.hpp>`
[source,c++]
----
namespace boost {
template <
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_map;
template<
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_multimap;
}
----
These containers, and all other implementations of standard unordered associative
containers, use an approach to its internal data structure design called
*closed addressing*. Starting in Boost 1.81, Boost.Unordered also provides containers
`boost::unordered_flat_set` and `boost::unordered_flat_map`, which use a
different data structure strategy commonly known as *open addressing* and depart in
a small number of ways from the standard so as to offer much better performance
in exchange (more than 2 times faster in typical scenarios):
[source,c++]
----
// #include <boost/unordered/unordered_flat_set.hpp>
//
// Note: no multiset version
namespace boost {
template <
class Key,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
class unordered_flat_set;
}
----
[source,c++]
----
// #include <boost/unordered/unordered_flat_map.hpp>
//
// Note: no multimap version
namespace boost {
template <
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_flat_map;
}
----
Starting in Boost 1.82, the containers `boost::unordered_node_set` and `boost::unordered_node_map`
are introduced: they use open addressing like `boost::unordered_flat_set` and `boost::unordered_flat_map`,
but internally store element _nodes_, like `boost::unordered_set` and `boost::unordered_map`,
which provide stability of pointers and references to the elements:
[source,c++]
----
// #include <boost/unordered/unordered_node_set.hpp>
//
// Note: no multiset version
namespace boost {
template <
class Key,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
class unordered_node_set;
}
----
[source,c++]
----
// #include <boost/unordered/unordered_node_map.hpp>
//
// Note: no multimap version
namespace boost {
template <
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_node_map;
}
----
These are all the containers provided by Boost.Unordered:
Boost.Unordered offers a catalog of hash containers with different standards compliance levels,
performance and intended usage scenarios:
[caption=, title='Table {counter:table-counter}. Boost.Unordered containers']
[cols="1,1,.^1", frame=all, grid=rows]
@ -165,44 +41,49 @@ These are all the containers provided by Boost.Unordered:
^| `boost::unordered_flat_set` +
`boost::unordered_flat_map`
^.^h|*Concurrent*
^|
^| `boost::concurrent_flat_map`
|===
Closed-addressing containers are pass:[C++]98-compatible. Open-addressing containers require a
reasonably compliant pass:[C++]11 compiler.
* **Closed-addressing containers** are fully compliant with the C++ specification
for unordered associative containers and feature one of the fastest implementations
in the market within the technical constraints imposed by the required standard interface.
* **Open-addressing containers** rely on much faster data structures and algorithms
(more than 2 times faster in typical scenarios) while slightly diverging from the standard
interface to accommodate the implementation.
There are two variants: **flat** (the fastest) and **node-based**, which
provide pointer stability under rehashing at the expense of being slower.
* Finally, `boost::concurrent_flat_map` (the only **concurrent container** provided
at present) is a hashmap designed and implemented to be used in high-performance
multithreaded scenarios. Its interface is radically different from that of regular C++ containers.
Boost.Unordered containers are used in a similar manner to the normal associative
containers:
[source,cpp]
----
typedef boost::unordered_map<std::string, int> map;
map x;
x["one"] = 1;
x["two"] = 2;
x["three"] = 3;
assert(x.at("one") == 1);
assert(x.find("missing") == x.end());
----
But since the elements aren't ordered, the output of:
All sets and maps in Boost.Unordered are instantiated similarly to
`std::unordered_set` and `std::unordered_map`, respectively:
[source,c++]
----
for(const map::value_type& i: x) {
std::cout<<i.first<<","<<i.second<<"\n";
----
namespace boost {
template <
class Key,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
class unordered_set;
// same for unordered_multiset, unordered_flat_set, unordered_node_set
template <
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_map;
// same for unordered_multimap, unordered_flat_map, unordered_node_map
// and concurrent_flat_map
}
----
can be in any order. For example, it might be:
[source]
----
two,2
one,1
three,3
----
To store an object in an unordered associative container requires both a
key equality function and a hash function. The default function objects in
the standard containers support a few basic types including integer types,
@ -213,16 +94,3 @@ you have to extend Boost.Hash to support the type or use
your own custom equality predicates and hash functions. See the
<<hash_equality,Equality Predicates and Hash Functions>> section
for more details.
There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.
== A concurrent hashmap
Starting in Boost 1.83, Boost.Unordered provides `boost::concurrent_flat_map`,
a thread-safe hash table for high performance multithreaded scenarios. Although
it shares the internal data structure and most of the algorithms with Boost.Unordered
open-addressing `boost::unordered_flat_map`, ``boost::concurrent_flat_map``'s API departs significantly
from that of C++ unordered associative containers to make this table suitable for
concurrent usage. Consult the xref:#concurrent_flat_map_intro[dedicated tutorial]
for more information.

View File

@ -4,7 +4,7 @@
= Implementation Rationale
== Closed-addressing containers
== Closed-addressing Containers
`boost::unordered_[multi]set` and `boost::unordered_[multi]map`
adhere to the standard requirements for unordered associative
@ -74,7 +74,7 @@ Since release 1.80.0, prime numbers are chosen for the number of buckets in
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
the result of the user's hash function as was used for release 1.79.0.
== Open-addresing containers
== Open-addresing Containers
The C++ standard specification of unordered associative containers imposes
severe limitations on permissible implementations, the most important being
@ -86,7 +86,7 @@ The design of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unord
guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
We discuss here the most relevant principles.
=== Hash function
=== Hash Function
Given its rich functionality and cross-platform interoperability,
`boost::hash` remains the default hash function of open-addressing containers.
@ -105,7 +105,7 @@ whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from https://arxiv.org/ab
When using a hash function directly suitable for open addressing, post-mixing can be opted out of via a dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
`boost::hash` specializations for string types are marked as avalanching.
=== Platform interoperability
=== Platform Interoperability
The observable behavior of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` is deterministically
identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided
@ -118,7 +118,7 @@ and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[N
this does not affect interoperability. For instance, the behavior is the same
for Visual Studio on an x64-mode Intel CPU with SSE2 and for GCC on an IBM s390x without any supported SIMD technology.
== Concurrent Hashmap
== Concurrent Containers
The same data structure used by Boost.Unordered open-addressing containers has been chosen
also as the foundation of `boost::concurrent_flat_map`:
@ -132,7 +132,7 @@ lookup that are lock-free up to the last step of actual element comparison.
of all elements between `boost::concurrent_flat_map` and `boost::unordered_flat_map`.
(This feature has not been implemented yet.)
=== Hash function and platform interoperability
=== Hash Function and Platform Interoperability
`boost::concurrent_flat_map` makes the same decisions and provides the same guarantees
as Boost.Unordered open-addressing containers with regards to

View File

@ -1,8 +1,99 @@
[#regular]
= Regular Containers
:idprefix: regular_
Boost.Unordered closed-addressing containers (`boost::unordered_set`, `boost::unordered_map`,
`boost::unordered_multiset` and `boost::unordered_multimap`) are fully conformant with the
C++ specification for unordered associative containers, so for those who know how to use
`std::unordered_set`, `std::unordered_map`, etc., their homonyms in Boost.Unordered are
drop-in replacements. The interface of open-addressing containers (`boost::unordered_node_set`,
`boost::unordered_node_map`, `boost::unordered_flat_set` and `boost::unordered_flat_map`)
is very similar, but they present some minor differences listed in the dedicated
xref:#compliance_open_addressing_containers[standard compliance section].
For readers without previous experience with hash containers but familiar
with normal associative containers (`std::set`, `std::map`,
`std::multiset` and `std::multimap`), Boost.Unordered containers are used in a similar manner:
[source,cpp]
----
typedef boost::unordered_map<std::string, int> map;
map x;
x["one"] = 1;
x["two"] = 2;
x["three"] = 3;
assert(x.at("one") == 1);
assert(x.find("missing") == x.end());
----
But since the elements aren't ordered, the output of:
[source,c++]
----
for(const map::value_type& i: x) {
std::cout<<i.first<<","<<i.second<<"\n";
}
----
can be in any order. For example, it might be:
[source]
----
two,2
one,1
three,3
----
There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.
== Iterator Invalidation
It is not specified how member functions other than `rehash` and `reserve` affect
the bucket count, although `insert` can only invalidate iterators
when the insertion causes the container's load to be greater than the maximum allowed.
For most implementations this means that `insert` will only
change the number of buckets when this happens. Iterators can be
invalidated by calls to `insert`, `rehash` and `reserve`.
As for pointers and references,
they are never invalidated for node-based containers
(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`),
but they are invalidated when rehashing occurs for
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
these containers store elements directly into their holding buckets, so
when allocating a new bucket array the elements must be transferred by means of move construction.
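As a rough illustration of the pointer guarantee for node-based containers, the sketch below forces a rehash and checks that the address of an element does not change. It uses `std::unordered_map` as a stand-in so that only the standard library is needed (an assumption made for this example; the closed-addressing Boost containers are described above as drop-in replacements). The helper name `address_survives_rehash` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns true if the address of an element is
// unchanged after the container has been forced to rehash.
// std::unordered_map stands in for a node-based Boost.Unordered container.
inline bool address_survives_rehash() {
    std::unordered_map<std::string, int> x;
    x["one"] = 1;

    const int* before = &x.at("one");            // pointer into the element's node
    const std::size_t old_buckets = x.bucket_count();

    x.rehash(old_buckets * 8 + 1);                // ask for a larger bucket array
    assert(x.bucket_count() > old_buckets);       // a new bucket array is in use

    const int* after = &x.at("one");
    return before == after && *after == 1;        // the node itself did not move
}
```

With a flat container the same comparison would not be expected to hold, because elements are move-constructed into the new bucket array.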
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
to call `reserve` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:
```
x.reserve(n);
```
Note:: `reserve(n)` reserves space for at least `n` elements, allocating enough buckets
so as to not exceed the maximum load factor.
+
Because the maximum load factor is an upper bound on the number of elements divided by the total
number of available buckets, this function is logically equivalent to:
+
```
x.rehash(std::ceil(n / x.max_load_factor()))
```
+
See the <<unordered_map_rehash,reference for more details>> on the `rehash` function.
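The effect of reserving in advance can be checked with a small sketch like the one below. It again uses `std::unordered_map` as a stand-in (an assumption made to keep the example standard-library only), and the helper name `reserve_prevents_rehash` is hypothetical. On mainstream standard library implementations the bucket count is expected to stay fixed while the `n` reserved elements are inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: inserts n distinct keys after reserve(n) and reports
// whether the bucket count stayed the same during all the insertions.
inline bool reserve_prevents_rehash(std::size_t n) {
    std::unordered_map<std::size_t, int> x;
    x.reserve(n);                                  // pay for rehashing up front
    const std::size_t buckets = x.bucket_count();

    for (std::size_t i = 0; i < n; ++i) {
        x.emplace(i, 0);
        if (x.bucket_count() != buckets) {
            return false;                          // a rehash happened mid-way
        }
    }
    assert(x.size() == n);                         // all keys were distinct
    return true;
}
```

Calling `reserve_prevents_rehash(1000)` is expected to return `true`; without the call to `reserve`, the bucket count would typically change several times during the loop.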
[#comparison]
:idprefix: comparison_
= Comparison with Associative Containers
== Comparison with Associative Containers
[caption=, title='Table {counter:table-counter} Interface differences']
[cols="1,1", frame=all, grid=rows]
@ -32,7 +123,7 @@
|`iterator`, `const_iterator` are of at least the forward category.
|Iterators, pointers and references to the container's elements are never invalidated.
|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
|<<regular_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
**Node-based containers:** Pointers and references to the container's elements are never invalidated. +
**Flat containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.


@ -0,0 +1,179 @@
[#structures]
= Data Structures
:idprefix: structures_
== Closed-addressing Containers
++++
<style>
.imageblock > .title {
text-align: inherit;
}
</style>
++++
Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below:
[#img-bucket-groups,.text-center]
.A simple bucket group approach
image::bucket-groups.png[align=center]
An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iteration of the entire container is often slow using this layout as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`.
Canonical standard implementations will wind up looking like the diagram below:
[.text-center]
.The canonical standard approach
image::singly-linked.png[align=center,link=../diagrams/singly-linked.png,window=_blank]
It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here].
This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:
```c++
auto const idx = get_bucket_idx(hash_function(key));
node* p = buckets[idx]; // first load
node* n = p->next; // second load
if (n && is_in_bucket(n, idx)) {
value_type const& v = *n; // third load
// ...
}
```
With a simple bucket group layout, this is all that must be done:
```c++
auto const idx = get_bucket_idx(hash_function(key));
node* n = buckets[idx]; // first load
if (n) {
value_type const& v = *n; // second load
// ...
}
```
In practice, the extra indirection can have a dramatic performance impact on common operations such as `insert`, `find` and `erase`. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:
[#img-fca-layout]
.The new layout used by Boost
image::fca.png[align=center]
Thus container-wide iteration is turned into traversing the non-empty bucket groups (moving from one group to the next is a constant-time operation), which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets, meaning that for all common implementations there are only 4 bits of space overhead per bucket introduced by the bucket groups.
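The space arithmetic and the role of the bitmask can be sketched as follows. This is an illustrative model, not the library's actual definition: the field names, the assumption that the fourth word is a pointer to the viewed slice of the bucket array, and the helper `next_occupied` are all hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

struct bucket;  // opaque in this sketch, only pointed to

// Hypothetical four-word bucket group. The text only names the bitmask and
// the two list pointers; the `buckets` field is an assumption.
struct bucket_group {
    bucket*       buckets;  // assumed: start of the viewed slice of the bucket array
    std::size_t   bitmask;  // bit i set <=> bucket i of the slice is non-empty
    bucket_group* next;     // doubly-linked list of non-empty groups
    bucket_group* prev;
};

// Number of buckets viewed by one group.
constexpr std::size_t buckets_per_group = sizeof(std::size_t) * CHAR_BIT;

// Space overhead in bits per bucket: four words spread over the viewed buckets.
constexpr std::size_t overhead_bits_per_bucket =
    4 * sizeof(std::size_t) * CHAR_BIT / buckets_per_group;

// Index of the first occupied bucket at position >= from, or
// buckets_per_group if there is none. Portable loop for clarity.
inline std::size_t next_occupied(std::size_t bitmask, std::size_t from) {
    assert(from <= buckets_per_group);
    for (std::size_t i = from; i < buckets_per_group; ++i) {
        if ((bitmask >> i) & 1u) {
            return i;
        }
    }
    return buckets_per_group;
}
```

Because the word size cancels out, `overhead_bits_per_bucket` evaluates to 4 regardless of whether `std::size_t` is 32 or 64 bits wide. A production implementation would likely replace the loop with a count-trailing-zeros instruction.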
A more detailed description of Boost.Unordered's closed-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html[external article].
For more information on implementation rationale, read the
xref:#rationale_closed_addressing_containers[corresponding section].
== Open-addressing Containers
The diagram shows the basic internal layout of `boost::unordered_flat_map`/`unordered_node_map` and
`boost::unordered_flat_set`/`unordered_node_set`.
[#img-foa-layout]
.Open-addressing layout used by Boost.Unordered.
image::foa.png[align=center]
As with all open-addressing containers, elements (or pointers to the element nodes in the case of
`boost::unordered_node_map` and `boost::unordered_node_set`) are stored directly in the bucket array.
This array is logically divided into 2^_n_^ _groups_ of 15 elements each.
In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^
16-byte words.
[#img-foa-metadata]
.Breakdown of a metadata word.
image::foa-metadata.png[align=center]
A metadata word is divided into 15 _h_~_i_~ bytes (one for each associated
bucket), and an _overflow byte_ (_ofw_ in the diagram). The value of _h_~_i_~ is:
- 0 if the corresponding bucket is empty.
- 1 to encode a special empty bucket called a _sentinel_, which is used internally to
stop iteration when the container has been fully traversed.
- If the bucket is occupied, a _reduced hash value_ obtained from the hash value of
the element.
When looking for an element with hash value _h_, SIMD technologies such as
https://en.wikipedia.org/wiki/SSE2[SSE2] and
https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)[Neon] allow us
to very quickly inspect the full metadata word and look for the reduced value of _h_ among all the
15 buckets with just a handful of CPU instructions: non-matching buckets can be
readily discarded, and those whose reduced hash value matches need be inspected via full
comparison with the corresponding element. If the looked-for element is not present,
the overflow byte is inspected:
- If the bit in the position _h_ mod 8 is zero, lookup terminates (and the
element is not present).
- If the bit is set to 1 (the group has been _overflowed_), further groups are
checked using https://en.wikipedia.org/wiki/Quadratic_probing[_quadratic probing_], and
the process is repeated.
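The lookup steps above can be modeled without SIMD by scanning the metadata word byte by byte. The sketch below is only a scalar illustration under stated assumptions: the reduction function `reduced_hash` is hypothetical (it merely keeps the values 0 and 1 reserved, as the text requires), and placing the overflow byte at index 15 is an assumption about the layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// 15 h_i bytes followed by the overflow byte (position assumed for this sketch).
using metadata_word = std::array<std::uint8_t, 16>;

// Hypothetical reduction: maps a hash value to 2..255 so that
// 0 (empty bucket) and 1 (sentinel) remain reserved.
inline std::uint8_t reduced_hash(std::size_t h) {
    const std::size_t r = h % 254 + 2;
    assert(r >= 2 && r <= 255);
    return static_cast<std::uint8_t>(r);
}

// Returns a 15-bit mask with bit i set when bucket i holds the reduced hash
// of h. Only those buckets need a full comparison with the stored element.
inline unsigned match(const metadata_word& m, std::size_t h) {
    const std::uint8_t r = reduced_hash(h);
    unsigned mask = 0;
    for (unsigned i = 0; i < 15; ++i) {
        if (m[i] == r) {
            mask |= 1u << i;
        }
    }
    return mask;
}

// True if the overflow bit at position h mod 8 is set, meaning the probe
// sequence must continue with further groups.
inline bool overflowed(const metadata_word& m, std::size_t h) {
    return ((m[15] >> (h % 8)) & 1u) != 0;
}
```

If `match` yields no bucket whose element compares equal and `overflowed` returns `false`, the lookup can stop at this group. The SIMD versions perform the equivalent of the loop in `match` with a few instructions over the whole 16-byte word.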
Insertion is algorithmically similar: empty buckets are located using SIMD,
and when going past a full group its corresponding overflow bit is set to 1.
On architectures without SIMD support, the logical layout stays the same, but the metadata
word is encoded using a technique we call _bit interleaving_: this layout allows us
to emulate SIMD with reasonably good performance using only standard arithmetic and
logical operations.
[#img-foa-metadata-interleaving]
.Bit-interleaved metadata word.
image::foa-metadata-interleaving.png[align=center]
A more detailed description of Boost.Unordered's open-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
For more information on implementation rationale, read the
xref:#rationale_open_addresing_containers[corresponding section].
== Concurrent Containers
`boost::concurrent_flat_map` uses the basic
xref:#structures_open_addressing_containers[open-addressing layout] described above
augmented with synchronization mechanisms.
[#img-cfoa-layout]
.Concurrent open-addressing layout used by Boost.Unordered.
image::cfoa.png[align=center]
Two levels of synchronization are used:
* Container level: A read-write mutex is used to control access from any operation
to the container. Typically, such access is in read mode (that is, concurrent) even
for modifying operations, so for most practical purposes there is no thread
contention at this level. Access is only in write mode (blocking) when rehashing or
performing container-wide operations such as swapping or assignment.
* Group level: Each 15-slot group is equipped with an 8-byte word containing:
** A read-write spinlock for synchronized access to any element in the group.
** An atomic _insertion counter_ used for optimistic insertion as described
below.
By using atomic operations to access the group metadata, lookup is (group-level)
lock-free up to the point where an actual comparison needs to be done with an element
that has been previously SIMD-matched: only then is the group's spinlock used.
Insertion uses the following _optimistic algorithm_:
* The value of the insertion counter for the initial group in the probe
sequence is locally recorded (let's call this value `c0`).
* Lookup is as described above. If lookup finds no equivalent element,
search for an available slot for insertion successively locks/unlocks
each group in the probing sequence.
* When an available slot is located, it is preemptively occupied (its
reduced hash value is set) and the insertion counter is atomically
incremented: if no other thread has incremented the counter during the
whole operation (which is checked by comparing with `c0`), then we're
good to go and complete the insertion, otherwise we roll back and start
over.
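The three steps above can be sketched with a deliberately simplified, single-group model. Everything in it is hypothetical and chosen for illustration: `toy_group`, `contains` and `optimistic_insert` are not library names, a `std::mutex` stands in for the group's read-write spinlock, a boolean array stands in for the reduced hash bytes, and there is no probing beyond the one group.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Toy model of a single 15-slot group (all names are illustrative).
struct toy_group {
    std::atomic<std::uint32_t> insert_counter{0};
    std::mutex                 lock;    // stand-in for the read-write spinlock
    std::array<bool, 15>       used{};  // stand-in for the reduced hash bytes
    std::array<int, 15>        keys{};
};

// Simplified lookup: locks the group and compares every occupied slot.
inline bool contains(toy_group& g, int key) {
    std::lock_guard<std::mutex> guard(g.lock);
    for (std::size_t i = 0; i < 15; ++i) {
        if (g.used[i] && g.keys[i] == key) return true;
    }
    return false;
}

// Returns true if key was inserted; false if it was already present or,
// in this toy model without probing, if the group is full.
inline bool optimistic_insert(toy_group& g, int key) {
    for (;;) {
        // Step 1: record the insertion counter of the initial group.
        const std::uint32_t c0 = g.insert_counter.load();

        // Step 2: look the key up; stop if an equivalent element exists.
        if (contains(g, key)) return false;

        // Lock the group while searching for an available slot.
        std::lock_guard<std::mutex> guard(g.lock);
        std::size_t i = 0;
        while (i < 15 && g.used[i]) ++i;
        if (i == 15) return false;
        assert(!g.used[i]);

        // Step 3: preemptively occupy the slot, then bump the counter.
        g.used[i] = true;
        g.keys[i] = key;
        if (g.insert_counter.fetch_add(1) == c0) {
            return true;   // nobody else inserted in the meantime
        }
        g.used[i] = false; // another thread interfered: roll back, start over
    }
}
```

In this model two threads racing to insert the same key both pass the lookup, but only the first one sees the counter still equal to `c0`; the second rolls back, retries, and then finds the key during its new lookup.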
This algorithm has very low contention both at the lookup and actual
insertion phases in exchange for the possibility that computations have
to be started over if some other thread interferes in the process by
performing a successful insertion beginning at the same group. In
practice, the start-over frequency is extremely small, measured in the range
of parts per million for some of our benchmarks.
For more information on implementation rationale, read the
xref:#rationale_concurrent_containers[corresponding section].