added tutorial on boost::concurrent_flat_map

This commit is contained in:
joaquintides
2023-05-08 18:37:36 +02:00
parent 02197674f4
commit ba25041fc8
3 changed files with 191 additions and 0 deletions

@@ -14,6 +14,7 @@ include::unordered/intro.adoc[]
include::unordered/buckets.adoc[]
include::unordered/hash_equality.adoc[]
include::unordered/comparison.adoc[]
include::unordered/concurrent_flat_map_intro.adoc[]
include::unordered/compliance.adoc[]
include::unordered/benchmarks.adoc[]
include::unordered/rationale.adoc[]

@@ -0,0 +1,180 @@
[#concurrent_flat_map_intro]
= An introduction to boost::concurrent_flat_map
:idprefix: concurrent_flat_map_intro_
`boost::concurrent_flat_map` is a hash table that allows concurrent write/read access from
different threads without having to implement any synchronization mechanism on the user's side.
[source,c++]
----
std::vector<int> input;
boost::concurrent_flat_map<int,int> m;
...

// process input in parallel
const int num_threads = 8;
std::vector<std::jthread> threads;
std::size_t chunk = input.size() / num_threads; // how many elements per thread

for (int i = 0; i < num_threads; ++i) {
  threads.emplace_back([&, i] {
    // calculate the portion of input this thread takes care of
    std::size_t start = i * chunk;
    std::size_t end = (i == num_threads - 1) ? input.size() : (i + 1) * chunk;

    for (std::size_t n = start; n < end; ++n) {
      m.emplace(input[n], calculation(input[n]));
    }
  });
}
----
In the example above, threads access `m` without synchronization, just as we'd do in a
single-threaded scenario. In an ideal setting, if a given workload is distributed among
_N_ threads, execution is _N_ times faster than with one thread. This limit is never
attained in practice due to synchronization overheads and _contention_ (one thread
waiting for another to leave a locked portion of the map), but `boost::concurrent_flat_map`
is designed to perform with very little overhead and typically achieves _linear scaling_
(that is, performance is proportional to the number of threads, up to the number of
logical cores in the CPU).
== Visitation-based API
The first thing a new user of `boost::concurrent_flat_map` will notice is that this
class _does not provide iterators_ (which makes it technically
not a https://en.cppreference.com/w/cpp/named_req/Container[Container^]
in the C++ standard sense). The reason for this is that iterators are inherently
thread-unsafe. Consider this hypothetical code:
[source,c++]
----
auto it = m.find(k); // A: get an iterator pointing to the element with key k
if (it != m.end()) {
  some_function(*it); // B: use the value of the element
}
----
In a multithreaded scenario, the iterator `it` may be invalid at point B if some other
thread issues an `m.erase(k)` operation between A and B. There are designs that
can remedy this by making iterators lock the element they point to, but this
approach lends itself to high contention and can easily produce deadlocks in a program.
`operator[]` has similar concurrency issues, and is not provided by
`boost::concurrent_flat_map` either. Instead, element access is done through
so-called _visitation functions_:
[source,c++]
----
m.visit(k, [](const auto& x) { // x is the element with key k (if it exists)
  some_function(x);            // use it
});
----
The visitation function passed by the user (in this case, a lambda function)
is executed internally by `boost::concurrent_flat_map` in
a thread-safe manner, so it can access the element without worries about other
threads interfering in the process. On the other hand, a
visitation function can _not_ access the container itself:
[source,c++]
----
m.visit(k, [&](const auto& x) {
  some_function(x, m.size()); // forbidden: m can't be accessed inside visitation
});
----
Access to a different container is allowed, though:
[source,c++]
----
m.visit(k, [&](const auto& x) {
  if (some_function(x)) {
    m2.insert(x); // OK, m2 is a different boost::concurrent_flat_map
  }
});
----
But, in general, visitation functions should be as lightweight as possible to
reduce contention and increase parallelization. In some cases, moving heavy work
outside of visitation may be beneficial:
[source,c++]
----
std::optional<value_type> o;
bool found = m.visit(k, [&](const auto& x) {
  o = x;
});
if (found) {
  some_heavy_duty_function(*o);
}
----
Visitation is pervasive in the API provided by `boost::concurrent_flat_map`, and
many classical operations have visitation-enabled variations:
[source,c++]
----
m.insert_or_visit(x, [](auto& y) {
  // if insertion failed because of an equivalent element y,
  // do something with it, for instance:
  ++y.second; // increment the mapped part of the element
});
----
Note that in this last example the visitation function could actually _modify_
the element: as a general rule, operations on a `boost::concurrent_flat_map` `m`
will grant visitation functions const/non-const access to the element depending on whether
`m` is const/non-const. Const access can always be explicitly requested
by using `cvisit` overloads (for instance, `insert_or_cvisit`) and may result
in higher parallelization. Consult the xref:#concurrent_flat_map[reference]
for a complete list of available operations.
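As a sketch of explicitly requested const access, `insert_or_cvisit` passes the existing
element as const even on a non-const map (`log_existing` is a hypothetical helper, not
part of the library):
[source,c++]
----
m.insert_or_cvisit(x, [](const auto& y) {
  log_existing(y.second); // y is const: reading is fine, modifying won't compile
});
----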
== Whole-table visitation
In the absence of iterators, `boost::concurrent_flat_map` provides `visit_all`
as an alternative way to process all the elements in the map:
[source,c++]
----
m.visit_all([](auto& x) {
  x.second = 0; // reset the mapped part of the element
});
----
With C++17 compilers that implement the standard parallel algorithms, whole-table
visitation can be parallelized:
[source,c++]
----
m.visit_all(std::execution::par, [](auto& x) { // run in parallel
  x.second = 0; // reset the mapped part of the element
});
----
There is another whole-table visitation operation, `erase_if`:
[source,c++]
----
m.erase_if([](auto& x) {
  return x.second == 0; // erase the elements whose mapped value is zero
});
----
`erase_if` can also be parallelized, as sketched below. Note that, in order to increase
efficiency, these operations do not block the table during execution: this implies that elements
may be inserted, modified or erased by other threads during visitation. It is
advisable not to assume too much about the exact global state of a `boost::concurrent_flat_map`
at any point in your program.
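For instance, a parallel version of the `erase_if` example above (again assuming C++17
parallel algorithm support and the `<execution>` header):
[source,c++]
----
m.erase_if(std::execution::par, [](auto& x) {
  return x.second == 0; // erase zero-valued elements, processing in parallel
});
----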
== Blocking operations
``boost::concurrent_flat_map``s can be copied, assigned, cleared and merged just like any
Boost.Unordered container. Unlike most other operations, these are _blocking_,
that is, all other threads are prevented from accessing the tables involved while a copy, assignment,
clear or merge operation is in progress. Blocking is taken care of automatically by the library
and the user need not take any special precaution, but overall performance may be affected.
Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve`
or during insertion when the table's load hits `max_load()`. As with non-concurrent hashmaps,
reserving space in advance of bulk insertions will generally speed up the process.
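For instance, a minimal sketch reusing `input` and `calculation` from the first example:
[source,c++]
----
boost::concurrent_flat_map<int, int> m;
m.reserve(input.size()); // one blocking rehash up front rather than
                         // several as the table grows during insertion
for (int x: input) {
  m.emplace(x, calculation(x));
}
----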

@@ -216,3 +216,13 @@ for more details.
There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.
== A concurrent hashmap
Starting in Boost 1.83, Boost.Unordered provides `boost::concurrent_flat_map`,
a thread-safe hash table for high-performance multithreaded scenarios. Although
it shares its internal data structure and most of its algorithms with Boost.Unordered's
open-addressing container `boost::unordered_flat_map`, ``boost::concurrent_flat_map``'s API departs significantly
from that of the C++ unordered associative containers to make this table suitable for
concurrent usage. Consult the xref:#concurrent_flat_map_intro[dedicated tutorial]
for more information.