Revert "update documentation to use antora"

This reverts commit 3c452f93c5.
Christian Mazakas
2024-12-31 12:11:06 -08:00
parent 3c452f93c5
commit 40cf55240b
167 changed files with 0 additions and 24741 deletions
-725
@@ -1,725 +0,0 @@
[#benchmarks]
:idprefix: benchmarks_
= Benchmarks
== boost::unordered_[multi]set
All benchmarks were created using `unordered_set<unsigned int>` (non-duplicate) and `unordered_multiset<unsigned int>` (duplicate). The source code can be https://github.com/boostorg/boost_unordered_benchmarks/tree/boost_unordered_set[found here^].
The insertion benchmarks insert `n` random values, where `n` is between 10,000 and 3 million. For the duplicated benchmarks, the same random values are repeated an average of 5 times.
The erasure benchmarks erase all `n` elements randomly until the container is empty. Erasure by key uses `erase(const key_type&)` to remove entire groups of equivalent elements in each operation.
The successful lookup benchmarks are done by looking up all `n` values, in their original insertion order.
The unsuccessful lookup benchmarks look up `n` randomly generated integers produced with a different seed value.
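The setup described above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked benchmark repository: the helper names (`make_values`, `insert_all`, `count_hits`, `erase_all_by_key`, `seconds`) are hypothetical, the sketch is written against the standard container interface so that it compiles without Boost, and it assumes each value is repeated exactly `repeats` times rather than 5 times on average.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Generate n pseudo-random values from a fixed seed, each repeated `repeats`
// times, in shuffled order. A different seed yields the probe keys for the
// unsuccessful-lookup run.
std::vector<unsigned int> make_values(std::size_t n, unsigned int seed,
                                      std::size_t repeats = 1) {
    std::mt19937 rng(seed);
    std::vector<unsigned int> values;
    values.reserve(n * repeats);
    for (std::size_t i = 0; i < n; ++i) {
        unsigned int v = static_cast<unsigned int>(rng());
        values.insert(values.end(), repeats, v);
    }
    std::shuffle(values.begin(), values.end(), rng);
    return values;
}

// Insert every value. Optionally raise the max load factor and call reserve
// first, mirroring the "max load factor 5" and "prior reserve" variants.
template <class Set>
void insert_all(Set& s, const std::vector<unsigned int>& values,
                float max_load = 1.0f, bool reserve_first = false) {
    s.max_load_factor(max_load);
    if (reserve_first) s.reserve(values.size());
    for (unsigned int v : values) s.insert(v);
}

// Count how many probe keys are found in the container.
template <class Set>
std::size_t count_hits(const Set& s, const std::vector<unsigned int>& probes) {
    std::size_t hits = 0;
    for (unsigned int k : probes) hits += (s.find(k) != s.end()) ? 1 : 0;
    return hits;
}

// Erase by key. For a multiset, each call removes the whole group of
// equivalent elements. Precondition: `keys` covers every stored element.
template <class Set>
std::size_t erase_all_by_key(Set& s, const std::vector<unsigned int>& keys) {
    std::size_t erased = 0;
    for (unsigned int k : keys) erased += s.erase(k);
    assert(s.empty());
    return erased;
}

// Wall-clock time, in seconds, taken by one call to f.
template <class F>
double seconds(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

A run would typically wrap each phase in `seconds(...)`, for example `seconds([&] { insert_all(s, values, 5.0f, true); })`, and repeat it for increasing `n`. Swapping `std::unordered_multiset<unsigned int>` for `boost::unordered_multiset<unsigned int>` should only require changing the container type, since the sketch uses only members common to both.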
=== GCC 12 + libstdc++-v3, x64
==== Insertion
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/gcc/running insertion.xlsx.practice.png[width=250,link=_images/benchmarks-set/gcc/running insertion.xlsx.practice.png,window=_blank]
|image::benchmarks-set/gcc/running insertion.xlsx.practice non-unique.png[width=250,link=_images/benchmarks-set/gcc/running insertion.xlsx.practice non-unique.png,window=_blank]
|image::benchmarks-set/gcc/running insertion.xlsx.practice non-unique 5.png[width=250,link=_images/benchmarks-set/gcc/running insertion.xlsx.practice non-unique 5.png,window=_blank]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/gcc/running insertion.xlsx.practice norehash.png[width=250,link= _images/benchmarks-set/gcc/running insertion.xlsx.practice norehash.png,window=_blank]
|image::benchmarks-set/gcc/running insertion.xlsx.practice norehash non-unique.png[width=250,link= _images/benchmarks-set/gcc/running insertion.xlsx.practice norehash non-unique.png,window=_blank]
|image::benchmarks-set/gcc/running insertion.xlsx.practice norehash non-unique 5.png[width=250,link= _images/benchmarks-set/gcc/running insertion.xlsx.practice norehash non-unique 5.png,window=_blank]
h|non-duplicate elements, +
prior `reserve`
h|duplicate elements, +
prior `reserve`
h|duplicate elements, +
max load factor 5, +
prior `reserve`
|===
==== Erasure
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/gcc/scattered erasure.xlsx.practice.png[width=250,link= _images/benchmarks-set/gcc/scattered erasure.xlsx.practice.png,window=_blank]
|image::benchmarks-set/gcc/scattered erasure.xlsx.practice non-unique.png[width=250,link= _images/benchmarks-set/gcc/scattered erasure.xlsx.practice non-unique.png,window=_blank]
|image::benchmarks-set/gcc/scattered erasure.xlsx.practice non-unique 5.png[width=250,link= _images/benchmarks-set/gcc/scattered erasure.xlsx.practice non-unique 5.png,window=_blank]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|
|image::benchmarks-set/gcc/scattered erasure by key.xlsx.practice non-unique.png[width=250,link= _images/benchmarks-set/gcc/scattered erasure by key.xlsx.practice non-unique.png,window=_blank]
|image::benchmarks-set/gcc/scattered erasure by key.xlsx.practice non-unique 5.png[width=250,link= _images/benchmarks-set/gcc/scattered erasure by key.xlsx.practice non-unique 5.png,window=_blank]
|
h|by key, duplicate elements
h|by key, duplicate elements, +
max load factor 5
|===
==== Successful lookup
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/gcc/scattered successful looukp.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/gcc/scattered successful looukp.xlsx.practice.png]
|image::benchmarks-set/gcc/scattered successful looukp.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/gcc/scattered successful looukp.xlsx.practice non-unique.png]
|image::benchmarks-set/gcc/scattered successful looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/gcc/scattered successful looukp.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
==== Unsuccessful lookup
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/gcc/scattered unsuccessful looukp.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/gcc/scattered unsuccessful looukp.xlsx.practice.png]
|image::benchmarks-set/gcc/scattered unsuccessful looukp.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/gcc/scattered unsuccessful looukp.xlsx.practice non-unique.png]
|image::benchmarks-set/gcc/scattered unsuccessful looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/gcc/scattered unsuccessful looukp.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
=== Clang 15 + libc++, x64
==== Insertion
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/clang_libcpp/running insertion.xlsx.practice.png[width=250, window=_blank,link= _images/benchmarks-set/clang_libcpp/running insertion.xlsx.practice.png]
|image::benchmarks-set/clang_libcpp/running insertion.xlsx.practice non-unique.png[width=250, window=_blank,link= _images/benchmarks-set/clang_libcpp/running insertion.xlsx.practice non-unique.png]
|image::benchmarks-set/clang_libcpp/running insertion.xlsx.practice non-unique 5.png[width=250, window=_blank,link= _images/benchmarks-set/clang_libcpp/running insertion.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/clang_libcpp/running insertion.xlsx.practice norehash.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/running insertion.xlsx.practice norehash.png]
|image::benchmarks-set/clang_libcpp/running insertion.xlsx.practice norehash non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/running insertion.xlsx.practice norehash non-unique.png]
|image::benchmarks-set/clang_libcpp/running insertion.xlsx.practice norehash non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/running insertion.xlsx.practice norehash non-unique 5.png]
h|non-duplicate elements, +
prior `reserve`
h|duplicate elements, +
prior `reserve`
h|duplicate elements, +
max load factor 5, +
prior `reserve`
|===
==== Erasure
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/clang_libcpp/scattered erasure.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered erasure.xlsx.practice.png]
|image::benchmarks-set/clang_libcpp/scattered erasure.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered erasure.xlsx.practice non-unique.png]
|image::benchmarks-set/clang_libcpp/scattered erasure.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered erasure.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|
|image::benchmarks-set/clang_libcpp/scattered erasure by key.xlsx.practice non-unique.png[width=250,link= _images/benchmarks-set/clang_libcpp/scattered erasure by key.xlsx.practice non-unique.png,window=_blank]
|image::benchmarks-set/clang_libcpp/scattered erasure by key.xlsx.practice non-unique 5.png[width=250,link= _images/benchmarks-set/clang_libcpp/scattered erasure by key.xlsx.practice non-unique 5.png,window=_blank]
|
h|by key, duplicate elements
h|by key, duplicate elements, +
max load factor 5
|===
==== Successful lookup
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/clang_libcpp/scattered successful looukp.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered successful looukp.xlsx.practice.png]
|image::benchmarks-set/clang_libcpp/scattered successful looukp.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered successful looukp.xlsx.practice non-unique.png]
|image::benchmarks-set/clang_libcpp/scattered successful looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered successful looukp.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
==== Unsuccessful lookup
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/clang_libcpp/scattered unsuccessful looukp.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered unsuccessful looukp.xlsx.practice.png]
|image::benchmarks-set/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique.png]
|image::benchmarks-set/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
=== Visual Studio 2022 + Dinkumware, x64
==== Insertion
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/vs/running insertion.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/vs/running insertion.xlsx.practice.png]
|image::benchmarks-set/vs/running insertion.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/vs/running insertion.xlsx.practice non-unique.png]
|image::benchmarks-set/vs/running insertion.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/vs/running insertion.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/vs/running insertion.xlsx.practice norehash.png[width=250,window=_blank,link= _images/benchmarks-set/vs/running insertion.xlsx.practice norehash.png]
|image::benchmarks-set/vs/running insertion.xlsx.practice norehash non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/vs/running insertion.xlsx.practice norehash non-unique.png]
|image::benchmarks-set/vs/running insertion.xlsx.practice norehash non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/vs/running insertion.xlsx.practice norehash non-unique 5.png]
h|non-duplicate elements, +
prior `reserve`
h|duplicate elements, +
prior `reserve`
h|duplicate elements, +
max load factor 5, +
prior `reserve`
|===
==== Erasure
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/vs/scattered erasure.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered erasure.xlsx.practice.png]
|image::benchmarks-set/vs/scattered erasure.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered erasure.xlsx.practice non-unique.png]
|image::benchmarks-set/vs/scattered erasure.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered erasure.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|
|image::benchmarks-set/vs/scattered erasure by key.xlsx.practice non-unique.png[width=250,link= _images/benchmarks-set/vs/scattered erasure by key.xlsx.practice non-unique.png,window=_blank]
|image::benchmarks-set/vs/scattered erasure by key.xlsx.practice non-unique 5.png[width=250,link= _images/benchmarks-set/vs/scattered erasure by key.xlsx.practice non-unique 5.png,window=_blank]
|
h|by key, duplicate elements
h|by key, duplicate elements, +
max load factor 5
|===
==== Successful lookup
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/vs/scattered successful looukp.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered successful looukp.xlsx.practice.png]
|image::benchmarks-set/vs/scattered successful looukp.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered successful looukp.xlsx.practice non-unique.png]
|image::benchmarks-set/vs/scattered successful looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered successful looukp.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
==== Unsuccessful lookup
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-set/vs/scattered unsuccessful looukp.xlsx.practice.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered unsuccessful looukp.xlsx.practice.png]
|image::benchmarks-set/vs/scattered unsuccessful looukp.xlsx.practice non-unique.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered unsuccessful looukp.xlsx.practice non-unique.png]
|image::benchmarks-set/vs/scattered unsuccessful looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link= _images/benchmarks-set/vs/scattered unsuccessful looukp.xlsx.practice non-unique 5.png]
h|non-duplicate elements
h|duplicate elements
h|duplicate elements, +
max load factor 5
|===
== boost::unordered_(flat|node)_map
All benchmarks were created using:
* `https://abseil.io/docs/cpp/guides/container[absl::flat_hash_map^]<uint64_t, uint64_t>`
* `boost::unordered_map<uint64_t, uint64_t>`
* `boost::unordered_flat_map<uint64_t, uint64_t>`
* `boost::unordered_node_map<uint64_t, uint64_t>`
The source code can be https://github.com/boostorg/boost_unordered_benchmarks/tree/boost_unordered_flat_map[found here^].
The insertion benchmarks insert `n` random values, where `n` is between 10,000 and 10 million.
The erasure benchmarks traverse the `n` elements and erase those with an odd key (50% on average).
The successful lookup benchmarks are done by looking up all `n` values, in their original insertion order.
The unsuccessful lookup benchmarks look up `n` randomly generated integers produced with a different seed value.
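The erasure pass described above (one traversal, erasing elements with an odd key) can be sketched as below. This is an illustrative stand-in, not the benchmark's own code: `erase_odd_keys` is a hypothetical name, and the loop is written against `std::unordered_map`, whose `erase(iterator)` returns the iterator to the next element. Whether the same loop compiles unchanged with every Boost container in the list is an assumption this sketch does not verify.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Traverse the container once and erase every element whose key is odd.
// Returns the number of erased elements.
template <class Map>
std::size_t erase_odd_keys(Map& m) {
    std::size_t erased = 0;
    for (auto it = m.begin(); it != m.end();) {
        if (it->first % 2 != 0) {
            it = m.erase(it);  // continue from the element after the erased one
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

With uniformly random 64-bit keys, about half of the elements have an odd key, which is where the "50% on average" figure comes from.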
=== GCC 12, x64
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/gcc-x64/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x64/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/gcc-x64/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x64/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/gcc-x64/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x64/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/gcc-x64/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x64/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
=== Clang 15, x64
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/clang-x64/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x64/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/clang-x64/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x64/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/clang-x64/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x64/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/clang-x64/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x64/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
=== Visual Studio 2022, x64
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/vs-x64/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x64/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/vs-x64/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x64/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/vs-x64/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x64/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/vs-x64/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x64/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
=== Clang 12, ARM64
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/clang-arm64/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-arm64/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/clang-arm64/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-arm64/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/clang-arm64/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-arm64/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/clang-arm64/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-arm64/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
=== GCC 12, x86
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/gcc-x86/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x86/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/gcc-x86/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x86/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/gcc-x86/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x86/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/gcc-x86/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/gcc-x86/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
=== Clang 15, x86
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/clang-x86/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x86/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/clang-x86/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x86/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/clang-x86/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x86/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/clang-x86/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/clang-x86/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
=== Visual Studio 2022, x86
[caption=]
[cols="4*^.^a", frame=all, grid=all]
|===
|image::benchmarks-flat_map/vs-x86/Running insertion.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x86/Running insertion.xlsx.plot.png]
|image::benchmarks-flat_map/vs-x86/Running erasure.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x86/Running erasure.xlsx.plot.png]
|image::benchmarks-flat_map/vs-x86/Scattered successful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x86/Scattered successful looukp.xlsx.plot.png]
|image::benchmarks-flat_map/vs-x86/Scattered unsuccessful looukp.xlsx.plot.png[width=250,window=_blank,link= _images/benchmarks-flat_map/vs-x86/Scattered unsuccessful looukp.xlsx.plot.png]
h|running insertion
h|running erasure
h|successful lookup
h|unsuccessful lookup
|===
== boost::concurrent_(flat|node)_map
All benchmarks were created using:
* `https://spec.oneapi.io/versions/latest/elements/oneTBB/source/containers/concurrent_hash_map_cls.html[oneapi::tbb::concurrent_hash_map^]<int, int>`
* `https://github.com/greg7mdp/gtl/blob/main/docs/phmap.md[gtl::parallel_flat_hash_map^]<int, int>` with 64 submaps
* `boost::concurrent_flat_map<int, int>`
* `boost::concurrent_node_map<int, int>`
The source code can be https://github.com/boostorg/boost_unordered_benchmarks/tree/boost_concurrent_flat_map[found here^].
The benchmarks exercise a number of threads _T_ (between 1 and 16) concurrently performing operations
randomly chosen among **update**, **successful lookup** and **unsuccessful lookup**. The keys used in the
operations follow a https://en.wikipedia.org/wiki/Zipf%27s_law#Formal_definition[Zipf distribution^]
with different _skew_ parameters: the higher the skew, the more concentrated the keys are in the lower values
of the covered range.
`boost::concurrent_flat_map` and `boost::concurrent_node_map` are exercised using both regular and xref:#concurrent_bulk_visitation[bulk visitation]:
in the latter case, lookup keys are buffered in a local array and then processed at
once each time the buffer reaches xref:#concurrent_flat_map_constants[`bulk_visit_size`].
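To make the effect of the skew parameter concrete, the following sketch draws keys from a Zipf-like distribution using only the standard library. It is an assumption-laden illustration rather than the generator used by the benchmarks: `make_zipf` and `low_key_share` are hypothetical names, and the weight of key `k` is taken to be `1 / (k + 1)^skew` over a finite key range.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Build a sampler over keys 0..n-1 in which key k has weight 1 / (k+1)^skew.
std::discrete_distribution<int> make_zipf(std::size_t n, double skew) {
    assert(n > 0);
    std::vector<double> weights(n);
    for (std::size_t k = 0; k < n; ++k)
        weights[k] = 1.0 / std::pow(static_cast<double>(k + 1), skew);
    return std::discrete_distribution<int>(weights.begin(), weights.end());
}

// Fraction of `samples` draws that land among the `low` smallest keys.
double low_key_share(std::size_t n, double skew, std::size_t low,
                     int samples, unsigned int seed) {
    auto zipf = make_zipf(n, skew);
    std::mt19937 rng(seed);
    int hits = 0;
    for (int i = 0; i < samples; ++i)
        if (static_cast<std::size_t>(zipf(rng)) < low) ++hits;
    return static_cast<double>(hits) / samples;
}
```

Comparing `low_key_share(n, 0.01, ...)` with `low_key_share(n, 0.99, ...)` shows the behaviour described above: with a skew near 0 the draws are close to uniform, while a skew near 1 sends a much larger share of operations to the lowest keys, which increases contention on the same elements.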
=== GCC 12, x64
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x64/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
=== Clang 15, x64
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x64/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
=== Visual Studio 2022, x64
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x64/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
=== Clang 12, ARM64
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-arm64/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
=== GCC 12, x86
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/gcc-x86/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
=== Clang 15, x86
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/clang-x86/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
=== Visual Studio 2022, x86
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.500k, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.500k, 0.01.png]
|image::benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.500k, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.500k, 0.5.png]
|image::benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.500k, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.500k, 0.99.png]
h|500k updates, 4.5M lookups +
skew=0.01
h|500k updates, 4.5M lookups +
skew=0.5
h|500k updates, 4.5M lookups +
skew=0.99
|===
[caption=]
[cols="3*^.^a", frame=all, grid=all]
|===
|image::benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.5M, 0.01.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.5M, 0.01.png]
|image::benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.5M, 0.5.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.5M, 0.5.png]
|image::benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.5M, 0.99.png[width=250,window=_blank,link= _images/benchmarks-concurrent_map/vs-x86/Parallel workload.xlsx.5M, 0.99.png]
h|5M updates, 45M lookups +
skew=0.01
h|5M updates, 45M lookups +
skew=0.5
h|5M updates, 45M lookups +
skew=0.99
|===
-12
View File
@@ -1,12 +0,0 @@
[#bibliography]
:idprefix: bibliography_
= Bibliography
* _C/C++ Users Journal_. February, 2006. Pete Becker. http://www.ddj.com/cpp/184402066[STL and TR1: Part III - Unordered containers^]. +
An introduction to the standard unordered containers.
* _Wikipedia_. https://en.wikipedia.org/wiki/Hash_table[Hash table^]. +
An introduction to hash table implementations. Discusses the differences between closed-addressing and open-addressing approaches.
* Peter Dimov, 2022. https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
-147
View File
@@ -1,147 +0,0 @@
[#buckets]
:idprefix: buckets_
= Basics of Hash Tables
The containers are made up of a number of _buckets_, each of which can contain
any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
have more buckets).
image::buckets.png[]
In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for sets the key is the whole element, but is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.
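
As a rough sketch of this two-step mapping, the following helper hashes a key and then reduces the result to a bucket index. The name `bucket_for` and the use of a plain modulo for the second step are assumptions made for illustration only; the reduction actually used by Boost.Unordered depends on the release and the container.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not part of Boost.Unordered): maps a key to a
// bucket index in two steps.
template <class Key, class Hash = std::hash<Key>>
std::size_t bucket_for(const Key& key, std::size_t bucket_count, Hash hash = Hash())
{
    // Step 1: the hash function yields a value spanning the whole
    // std::size_t range.
    std::size_t h = hash(key);
    // Step 2: reduce that value to [0, bucket_count). Modulo is an
    // assumption chosen for simplicity; bucket_count must be non-zero.
    return h % bucket_count;
}
```

With 7 buckets, as in the diagram, `bucket_for(std::string("A"), 7)` always yields the same index in the range 0 to 6 for a given hash function.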
Retrieving the elements for a given key is simple. The same process is applied
to the key to find the correct bucket. Then the key is compared with the
elements in the bucket to find any elements that match (using the equality
predicate `Pred`). If the hash function has worked well the elements will be
evenly distributed amongst the buckets so only a small number of elements will
need to be examined.
There is <<hash_equality, more information on hash functions and
equality predicates in the next section>>.
You can see in the diagram that `A` & `D` have been placed in the same bucket.
When looking for elements in this bucket up to 2 comparisons are made, making
the search slower. This is known as a *collision*. To keep things fast we try to
keep collisions to a minimum.
If instead of `boost::unordered_set` we had used <<unordered_flat_set,`boost::unordered_flat_set`>>, the
diagram would look as follows:
image::buckets-oa.png[]
In open-addressing containers, buckets can hold at most one element; if a collision happens
(as is the case for `D` in the example), the element uses some other available bucket in
the vicinity of the original position. Given this simpler scenario, Boost.Unordered
open-addressing containers offer a very limited API for accessing buckets.
[caption=, title='Table {counter:table-counter}. Methods for Accessing Buckets']
[cols="1,.^1", frame=all, grid=rows]
|===
2+^h| *All containers*
h|*Method* h|*Description*
|`size_type bucket_count() const`
|The number of buckets.
2+^h| *Closed-addressing containers only*
h|*Method* h|*Description*
|`size_type max_bucket_count() const`
|An upper bound on the number of buckets.
|`size_type bucket_size(size_type n) const`
|The number of elements in bucket `n`.
|`size_type bucket(key_type const& k) const`
|Returns the index of the bucket which would contain `k`.
|`local_iterator begin(size_type n)`
1.6+|Return begin and end iterators for bucket `n`.
|`local_iterator end(size_type n)`
|`const_local_iterator begin(size_type n) const`
|`const_local_iterator end(size_type n) const`
|`const_local_iterator cbegin(size_type n) const`
|`const_local_iterator cend(size_type n) const`
|===
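
The closed-addressing methods in the table above can be combined as in the following sketch, which walks a single bucket with local iterators. It uses `std::unordered_set` as a stand-in so that it compiles without Boost; the assumption is that `boost::unordered_set` can be substituted unchanged, since it offers the same member functions. The helper name `elements_sharing_bucket` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: counts the elements stored in the bucket that
// would contain `key`. The set must have at least one bucket.
std::size_t elements_sharing_bucket(
    const std::unordered_set<std::string>& s, const std::string& key)
{
    // Index of the bucket that would contain `key`.
    std::size_t n = s.bucket(key);
    std::size_t count = 0;
    // Local iterators traverse only the elements of bucket `n`.
    for (auto it = s.cbegin(n); it != s.cend(n); ++it) {
        ++count;
    }
    // The result is the same number that s.bucket_size(n) reports.
    return count;
}
```

A result greater than 1 for a key that is present means that key shares its bucket with other elements, which is the collision situation described earlier.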
== Controlling the Number of Buckets
As more elements are added to an unordered associative container, the number
of collisions will increase causing performance to degrade.
To combat this the containers increase the bucket count as elements are inserted.
You can also tell the container to change the bucket count (if required) by
calling `rehash`.
The standard leaves a lot of freedom to the implementer to decide how the
number of buckets is chosen, but it does make some requirements based on the
container's _load factor_, the number of elements divided by the number of buckets.
Containers also have a _maximum load factor_ which they should try to keep the
load factor below.
You can't control the bucket count directly but there are two ways to
influence it:
* Specify the minimum number of buckets when constructing a container or when calling `rehash`.
* Suggest a maximum load factor by calling `max_load_factor`.
`max_load_factor` doesn't let you set the maximum load factor yourself; it just
lets you give a _hint_. And even then, the standard doesn't actually
require the container to pay much attention to this value. The only time the
load factor is _required_ to be less than the maximum is following a call to
`rehash`. But most implementations will try to keep the load factor
below the maximum load factor, and set the maximum load factor to be the same as
or close to the hint - unless your hint is unreasonably small or large.
[caption=, title='Table {counter:table-counter}. Methods for Controlling Bucket Size']
[cols="1,.^1", frame=all, grid=rows]
|===
2+^h| *All containers*
h|*Method* h|*Description*
|`X(size_type n)`
|Construct an empty container with at least `n` buckets (`X` is the container type).
|`X(InputIterator i, InputIterator j, size_type n)`
|Construct an empty container with at least `n` buckets and insert elements from the range `[i, j)` (`X` is the container type).
|`float load_factor() const`
|The average number of elements per bucket.
|`float max_load_factor() const`
|Returns the current maximum load factor.
|`float max_load_factor(float z)`
|Changes the container's maximum load factor, using `z` as a hint. +
**Open-addressing and concurrent containers:** this function does nothing: users are not allowed to change the maximum load factor.
|`void rehash(size_type n)`
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor is less than the maximum load factor.
2+^h| *Open-addressing and concurrent containers only*
h|*Method* h|*Description*
|`size_type max_load() const`
|Returns the maximum number of allowed elements in the container before rehash.
|===
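
As an illustration of the methods in the table above, the following sketch sizes a container up front so that a known number of insertions should not trigger a rehash. It uses `std::unordered_set` as a stand-in for a closed-addressing container so that it compiles without Boost; the helper name `make_presized` is hypothetical.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: returns an empty set with enough buckets to hold
// `n` elements without exceeding the maximum load factor.
std::unordered_set<int> make_presized(std::size_t n, float max_lf)
{
    std::unordered_set<int> s;
    // Only a hint: read the value back instead of assuming it was taken
    // verbatim.
    s.max_load_factor(max_lf);
    // Request enough buckets that n / bucket_count() stays below the
    // maximum load factor.
    s.rehash(static_cast<std::size_t>(n / s.max_load_factor()) + 1);
    return s;
}
```

After `make_presized(1000, 1.0f)`, `bucket_count()` is at least 1000, and inserting 1000 distinct values keeps `load_factor()` at or below `max_load_factor()`.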
A note on `max_load` for open-addressing and concurrent containers: the maximum load will be
(`max_load_factor() * bucket_count()`) right after `rehash` or on container creation, but may
slightly decrease when erasing elements in high-load situations. For instance, if we
have a <<unordered_flat_map,`boost::unordered_flat_map`>> with `size()` almost
at `max_load()` level and then erase 1,000 elements, `max_load()` may decrease by around a
few dozen elements. This is done internally by Boost.Unordered in order
to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
-458
View File
@@ -1,458 +0,0 @@
[#changes]
= Change Log
:idprefix: changes_
:svn-ticket-url: https://svn.boost.org/trac/boost/ticket
:github-pr-url: https://github.com/boostorg/unordered/pull
:cpp: C++
== Release 1.87.0 - Major update
* Added concurrent, node-based containers `boost::concurrent_node_map` and `boost::concurrent_node_set`.
* Added `insert_and_visit(x, f1, f2)` and similar operations to concurrent containers, which
allow for visitation of an element right after insertion (by contrast, `insert_or_visit(x, f)` only
visits the element if insertion did _not_ take place).
* Made visitation exclusive-locked within certain
`boost::concurrent_flat_set` operations to allow for safe mutable modification of elements
({github-pr-url}/265[PR#265^]).
* In Visual Studio Natvis, supported any container with an allocator that uses fancy pointers. This applies to any fancy pointer type, as long as the proper Natvis customization point "Intrinsic" functions are written for the fancy pointer type.
* Added GDB pretty-printers for all containers and iterators. For a container with an allocator that uses fancy pointers, these only work if the proper pretty-printer is written for the fancy pointer type itself.
* Fixed `std::initializer_list` assignment issues for open-addressing containers
({github-pr-url}/277[PR#277^]).
* Allowed non-copyable callables to be passed to the `std::initializer_list` overloads of `insert_{and|or}_[c]visit` for concurrent containers, by internally passing a `std::reference_wrapper` of the callable to the iterator-pair overloads.
== Release 1.86.0
* Added container `pmr` aliases when header `<memory_resource>` is available. The alias `boost::unordered::pmr::[container]` refers to `boost::unordered::[container]` with a `std::pmr::polymorphic_allocator` allocator type.
* Equipped open-addressing and concurrent containers to internally calculate and provide statistical metrics affected by the quality of the hash function. This functionality is enabled by the global macro `BOOST_UNORDERED_ENABLE_STATS`.
* Avalanching hash functions must now be marked via an `is_avalanching` typedef with an embedded `value` constant set to `true` (typically, defining `is_avalanching` as `std::true_type`). `using is_avalanching = void` is deprecated but allowed for backwards compatibility.
* Added Visual Studio Natvis framework custom visualizations for containers and iterators. This works for all containers with an allocator using raw pointers. In this release, containers and iterators are not supported if their allocator uses fancy pointers. This may be addressed in later releases.
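
The `is_avalanching` marking described in the list above might look as follows. This is only a sketch: the name `mixing_hash` is hypothetical, and the bit mixer is the splitmix64 finalizer, chosen here as an assumed example of a function with good avalanching behaviour rather than anything prescribed by the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical hash function object for 64-bit integer keys.
struct mixing_hash
{
    // Marks the hash as avalanching: a typedef with an embedded `value`
    // constant set to true.
    using is_avalanching = std::true_type;

    std::size_t operator()(std::uint64_t x) const noexcept
    {
        // splitmix64 finalizer (an assumption of this sketch).
        x ^= x >> 30;
        x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27;
        x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return static_cast<std::size_t>(x);
    }
};
```

Such a type would be passed as the `Hash` template argument of a container, for example `boost::unordered_flat_set<std::uint64_t, mixing_hash>`.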
== Release 1.85.0
* Optimized `emplace()` for a `value_type` or `init_type` (if applicable) argument to bypass creating an intermediate object. The argument is already the same type as the would-be intermediate object.
* Optimized `emplace()` for `k,v` arguments on map containers to delay constructing the object until it is certain that an element should be inserted. This optimization happens when the map's `key_type` is move constructible or when the `k` argument is a `key_type`.
* Fixed support for allocators with `explicit` copy constructors ({github-pr-url}/234[PR#234^]).
* Fixed bug in the `const` version of `unordered_multimap::find(k, hash, eq)` ({github-pr-url}/238[PR#238^]).
== Release 1.84.0 - Major update
* Added `boost::concurrent_flat_set`.
* Added `[c]visit_while` operations to concurrent containers,
with serial and parallel variants.
* Added efficient move construction of `boost::unordered_flat_(map|set)` from
`boost::concurrent_flat_(map|set)` and vice versa.
* Added bulk visitation to concurrent containers for increased lookup performance.
* Added debug-mode mechanisms for detecting illegal reentrancies into
a concurrent container from user code.
* Added Boost.Serialization support to all containers and their (non-local) iterator types.
* Added support for fancy pointers to open-addressing and concurrent containers.
This enables scenarios like the use of Boost.Interprocess allocators to construct containers in shared memory.
* Fixed bug in member of pointer operator for local iterators of closed-addressing
containers ({github-pr-url}/221[PR#221^], credit goes to GitHub user vslashg for finding
and fixing this issue).
* Starting with this release, `boost::unordered_[multi]set` and `boost::unordered_[multi]map`
only work with C++11 onwards.
== Release 1.83.0 - Major update
* Added `boost::concurrent_flat_map`, a fast, thread-safe hashmap based on open addressing.
* Sped up iteration of open-addressing containers.
* In open-addressing containers, `erase(iterator)`, which previously returned nothing, now
returns a proxy object convertible to an iterator to the next element.
This enables the typical `it = c.erase(it)` idiom without incurring any performance penalty
when the returned proxy is not used.
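
The `it = c.erase(it)` idiom mentioned in the list above is sketched below as a generic helper. The helper name `erase_negative` is hypothetical. The code is written against any map type whose `erase(iterator)` result converts to an iterator, so the assumption is that it works both with `std::unordered_map` and with the open-addressing maps, where the returned proxy is converted on assignment.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: erases every element whose mapped value is
// negative and returns how many elements were removed.
template <class Map>
std::size_t erase_negative(Map& m)
{
    std::size_t erased = 0;
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->second < 0) {
            // erase returns the position following the erased element
            // (or something convertible to it), so iteration can go on.
            it = m.erase(it);
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```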
== Release 1.82.0 - Major update
* {cpp}03 support is planned for deprecation. Boost 1.84.0 will no longer support
{cpp}03 mode and {cpp}11 will become the new minimum for using the library.
* Added node-based, open-addressing containers
`boost::unordered_node_map` and `boost::unordered_node_set`.
* Extended heterogeneous lookup to more member functions as specified in
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2363r5.html[P2363].
* Replaced the previous post-mixing process for open-addressing containers with
a new algorithm based on extended multiplication by a constant.
* Fixed bug in internal emplace() impl where stack-local types were not properly
constructed using the Allocator of the container which breaks uses-allocator
construction.
== Release 1.81.0 - Major update
* Added fast containers `boost::unordered_flat_map` and `boost::unordered_flat_set`
based on open addressing.
* Added CTAD deduction guides for all containers.
* Added missing constructors as specified in https://cplusplus.github.io/LWG/issue2713[LWG issue 2713].
== Release 1.80.0 - Major update
* Refactor internal implementation to be dramatically faster
* Allow `final` Hasher and KeyEqual objects
* Update documentation, adding benchmark graphs and notes on the new internal
data structures
== Release 1.79.0
* Improved {cpp}20 support:
** All containers have been updated to support
heterogeneous `count`, `equal_range` and `find`.
** All containers now implement the member function `contains`.
** `erase_if` has been implemented for all containers.
* Improved {cpp}23 support:
** All containers have been updated to support
heterogeneous `erase` and `extract`.
* Changed behavior of `reserve` to eagerly
allocate ({github-pr-url}/59[PR#59^]).
* Various warning fixes in the test suite.
* Update code to internally use `boost::allocator_traits`.
* Switch to Fibonacci hashing.
* Update documentation to be written in AsciiDoc instead of QuickBook.
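
For readers unfamiliar with the Fibonacci hashing mentioned in the list above, the general technique can be sketched as follows. This is a textbook formulation under stated assumptions (64-bit hash values, a power-of-two bucket count, and the hypothetical name `fibonacci_index`), not the library's actual code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: maps a 64-bit hash value to one of 2^bits
// positions. Precondition: 1 <= bits <= 63.
inline std::size_t fibonacci_index(std::uint64_t hash, unsigned bits)
{
    // 2^64 divided by the golden ratio, as an odd integer.
    const std::uint64_t k = 11400714819323198485ULL;
    // Multiplication wraps modulo 2^64; the top `bits` bits of the
    // product select the position.
    return static_cast<std::size_t>((hash * k) >> (64 - bits));
}
```

Taking the high bits of the product means every bit of the input can influence the result, which is the usual motivation for preferring this scheme over masking the low bits.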
== Release 1.67.0
* Improved {cpp}17 support:
** Add template deduction guides from the standard.
** Use a simple implementation of `optional` in node handles, so
that they're closer to the standard.
** Add missing `noexcept` specifications to `swap`, `operator=`
and node handles, and change the implementation to match.
Using `std::allocator_traits::is_always_equal`, or our own
implementation when not available, and
`boost::is_nothrow_swappable` in the implementation.
* Improved {cpp}20 support:
** Use `boost::to_address`, which has the proposed {cpp}20 semantics,
rather than the old custom implementation.
* Add `element_type` to iterators, so that `std::pointer_traits`
will work.
* Use `std::piecewise_construct` on recent versions of Visual {cpp},
and other uses of the Dinkumware standard library,
now using Boost.Predef to check compiler and library versions.
* Use `std::iterator_traits` rather than the boost iterator traits
in order to remove dependency on Boost.Iterator.
* Remove iterators' inheritance from `std::iterator`, which is
deprecated in {cpp}17, thanks to Daniela Engert
({github-pr-url}/7[PR#7^]).
* Stop using `BOOST_DEDUCED_TYPENAME`.
* Update some Boost include paths.
* Rename some internal methods, and variables.
* Various testing improvements.
* Miscellaneous internal changes.
== Release 1.66.0
* Simpler move construction implementation.
* Documentation fixes ({github-pr-url}/6[GitHub #6^]).
== Release 1.65.0
* Add deprecated attributes to `quick_erase` and `erase_return_void`.
I really will remove them in a future version this time.
* Small standards compliance fixes:
** `noexcept` specs for `swap` free functions.
** Add missing `insert(P&&)` methods.
== Release 1.64.0
* Initial support for new {cpp}17 member functions:
`insert_or_assign` and `try_emplace` in `unordered_map`.
* Initial support for `merge` and `extract`.
Does not include transferring nodes between
`unordered_map` and `unordered_multimap` or between `unordered_set` and
`unordered_multiset` yet. That will hopefully be in the next version of
Boost.
== Release 1.63.0
* Check hint iterator in `insert`/`emplace_hint`.
* Fix some warnings, mostly in the tests.
* Manually write out `emplace_args` for small numbers of arguments -
should make template error messages a little more bearable.
* Remove superfluous use of `boost::forward` in emplace arguments,
which fixes emplacing string literals in old versions of Visual {cpp}.
* Fix an exception safety issue in assignment. If bucket allocation
throws an exception, it can overwrite the hash and equality functions while
leaving the existing elements in place. This would mean that the function
objects wouldn't match the container elements, so elements might be in the
wrong bucket and equivalent elements would be incorrectly handled.
* Various reference documentation improvements.
* Better allocator support ({svn-ticket-url}/12459[#12459^]).
* Make the no argument constructors implicit.
* Implement missing allocator aware constructors.
* Fix assigning the hash/key equality functions for empty containers.
* Remove unary/binary_function from the examples in the documentation.
They are removed in {cpp}17.
* Support 10 constructor arguments in emplace. It was meant to support up to 10
arguments, but an off by one error in the preprocessor code meant it only
supported up to 9.
== Release 1.62.0
* Remove use of deprecated `boost::iterator`.
* Remove `BOOST_NO_STD_DISTANCE` workaround.
* Remove `BOOST_UNORDERED_DEPRECATED_EQUALITY` warning.
* Simpler implementation of assignment, fixes an exception safety issue
for `unordered_multiset` and `unordered_multimap`. Might be a little slower.
* Stop using return value SFINAE which some older compilers have issues
with.
== Release 1.58.0
* Remove unnecessary template parameter from const iterators.
* Rename private `iterator` typedef in some iterator classes, as it
confuses some traits classes.
* Fix move assignment with stateful, propagate_on_container_move_assign
allocators ({svn-ticket-url}/10777[#10777^]).
* Fix rare exception safety issue in move assignment.
* Fix potential overflow when calculating number of buckets to allocate
({github-pr-url}/4[GitHub #4^]).
== Release 1.57.0
* Fix the `pointer` typedef in iterators ({svn-ticket-url}/10672[#10672^]).
* Fix Coverity warning
({github-pr-url}/2[GitHub #2^]).
== Release 1.56.0
* Fix some shadowed variable warnings ({svn-ticket-url}/9377[#9377^]).
* Fix allocator use in documentation ({svn-ticket-url}/9719[#9719^]).
* Always use prime number of buckets for integers. Fixes performance
regression when inserting consecutive integers, although makes other
uses slower ({svn-ticket-url}/9282[#9282^]).
* Only construct elements using allocators, as specified in {cpp}11 standard.
== Release 1.55.0
* Avoid some warnings ({svn-ticket-url}/8851[#8851^], {svn-ticket-url}/8874[#8874^]).
* Avoid exposing some detail functions via ADL on the iterators.
* Follow the standard by only using the allocators' construct and destroy
methods to construct and destroy stored elements. Don't use them for internal
data like pointers.
== Release 1.54.0
* Mark methods specified in standard as `noexcept`. More to come in the next
release.
* If the hash function and equality predicate are known to both have nothrow
move assignment or construction then use them.
== Release 1.53.0
* Remove support for the old pre-standard variadic pair constructors, and
equality implementation. Both have been deprecated since Boost 1.48.
* Remove use of deprecated config macros.
* More internal implementation changes, including a much simpler
implementation of `erase`.
== Release 1.52.0
* Faster assign, which assigns to existing nodes where possible, rather than
creating entirely new nodes and copy constructing.
* Fixed bug in `erase_range` ({svn-ticket-url}/7471[#7471^]).
* Reverted some of the internal changes to how nodes are created, especially
for {cpp}11 compilers. 'construct' and 'destroy' should work a little better
for {cpp}11 allocators.
* Simplified the implementation a bit. Hopefully more robust.
== Release 1.51.0
* Fix construction/destruction issue when using a {cpp}11 compiler with a
{cpp}03 allocator ({svn-ticket-url}/7100[#7100^]).
* Remove a `try..catch` to support compiling without exceptions.
* Adjust SFINAE use to try to support g++ 3.4 ({svn-ticket-url}/7175[#7175^]).
* Updated to use the new config macros.
== Release 1.50.0
* Fix equality for `unordered_multiset` and `unordered_multimap`.
* {svn-ticket-url}/6857[Ticket 6857^]:
Implement `reserve`.
* {svn-ticket-url}/6771[Ticket 6771^]:
Avoid gcc's `-Wfloat-equal` warning.
* {svn-ticket-url}/6784[Ticket 6784^]:
Fix some Sun specific code.
* {svn-ticket-url}/6190[Ticket 6190^]:
Avoid gcc's `-Wshadow` warning.
* {svn-ticket-url}/6905[Ticket 6905^]:
Make namespaces in macros compatible with `bcp` custom namespaces.
Fixed by Luke Elliott.
* Remove some of the smaller prime number of buckets, as they may make
collisions quite probable (e.g. multiples of 5 are very common because
we used base 10).
* On old versions of Visual {cpp}, use the container library's implementation
of `allocator_traits`, as it's more likely to work.
* On machines with 64 bit std::size_t, use power of 2 buckets, with Thomas
Wang's hash function to pick which one to use, as modulus is very slow
for 64 bit values.
* Some internal changes.
== Release 1.49.0
* Fix warning due to accidental odd assignment.
* Slightly better error messages.
== Release 1.48.0 - Major update
This is a major change: the library has been converted to use Boost.Move's move
emulation, and is more compliant with the {cpp}11 standard. See the
xref:compliance.adoc[compliance section] for details.
The container now meets {cpp}11's complexity requirements, but to do so
uses a little more memory. This means that `quick_erase` and
`erase_return_void` are no longer required, they'll be removed in a
future version.
{cpp}11 support has resulted in some breaking changes:
* Equality comparison has been changed to the {cpp}11 specification.
In a container with equivalent keys, elements in a group with equal
keys used to have to be in the same order to be considered equal,
now they can be a permutation of each other. To use the old
behavior define the macro `BOOST_UNORDERED_DEPRECATED_EQUALITY`.
* The behaviour of swap is different when the two containers to be
swapped have unequal allocators. It used to allocate new nodes using
the appropriate allocators; it now swaps the allocators if
the allocator has a member structure `propagate_on_container_swap`,
such that `propagate_on_container_swap::value` is true.
* Allocator's `construct` and `destroy` functions are called with raw
pointers, rather than the allocator's `pointer` type.
* `emplace` used to emulate the variadic pair constructors that
appeared in early {cpp}0x drafts. Since they were removed it no
longer does so. It does emulate the new `piecewise_construct`
pair constructors - only you need to use
`boost::piecewise_construct`. To use the old emulation of
the variadic constructors define
`BOOST_UNORDERED_DEPRECATED_PAIR_CONSTRUCT`.
== Release 1.45.0
* Fix a bug when inserting into an `unordered_map` or `unordered_set` using
iterators which return `value_type` by copy.
== Release 1.43.0
* {svn-ticket-url}/3966[Ticket 3966^]:
`erase_return_void` is now `quick_erase`, which is the
http://home.roadrunner.com/~hinnant/issue_review/lwg-active.html#579[
current forerunner for resolving the slow erase by iterator^], although
there's a strong possibility that this may change in the future. The old
method name remains for backwards compatibility but is considered deprecated
and will be removed in a future release.
* Use Boost.Exception.
* Stop using deprecated `BOOST_HAS_*` macros.
== Release 1.42.0
* Support instantiating the containers with incomplete value types.
* Reduced the number of warnings (mostly in tests).
* Improved codegear compatibility.
* {svn-ticket-url}/3693[Ticket 3693^]:
Add `erase_return_void` as a temporary workaround for the current
`erase` which can be inefficient because it has to find the next
element to return an iterator.
* Add templated find overload for compatible keys.
* {svn-ticket-url}/3773[Ticket 3773^]:
Add missing `std` qualifier to `ptrdiff_t`.
* Some code formatting changes to fit almost all lines into 80 characters.
== Release 1.41.0 - Major update
* The original version made heavy use of macros to sidestep some of the older
compilers' poor template support. But since I no longer support those
compilers and the macro use was starting to become a maintenance burden it
has been rewritten to use templates instead of macros for the implementation
classes.
* The container object is now smaller thanks to using `boost::compressed_pair`
for EBO and a slightly different function buffer - now using a bool instead
of a member pointer.
* Buckets are allocated lazily which means that constructing an empty container
will not allocate any memory.
== Release 1.40.0
* {svn-ticket-url}/2975[Ticket 2975^]:
Store the prime list as a preprocessor sequence - so that it will always get
the length right if it changes again in the future.
* {svn-ticket-url}/1978[Ticket 1978^]:
Implement `emplace` for all compilers.
* {svn-ticket-url}/2908[Ticket 2908^],
{svn-ticket-url}/3096[Ticket 3096^]:
Some workarounds for old versions of borland, including adding explicit
destructors to all containers.
* {svn-ticket-url}/3082[Ticket 3082^]:
Disable incorrect Visual {cpp} warnings.
* Better configuration for {cpp}0x features when the headers aren't available.
* Create fewer buckets by default.
== Release 1.39.0
* {svn-ticket-url}/2756[Ticket 2756^]: Avoid a warning
on Visual {cpp} 2009.
* Some other minor internal changes to the implementation, tests and
documentation.
* Avoid an unnecessary copy in `operator[]`.
* {svn-ticket-url}/2975[Ticket 2975^]: Fix length of
prime number list.
== Release 1.38.0
* Use link:../../../core/swap.html[`boost::swap`^].
* {svn-ticket-url}/2237[Ticket 2237^]:
Document that the equality and inequality operators are undefined for two
objects if their equality predicates aren't equivalent. Thanks to Daniel
Krügler.
* {svn-ticket-url}/1710[Ticket 1710^]:
Use a larger prime number list. Thanks to Thorsten Ottosen and Hervé
Brönnimann.
* Use
link:../../../type_traits/index.html[aligned storage^] to store the types.
This changes the way the allocator is used to construct nodes. It used to
construct the node with two calls to the allocator's `construct`
method - once for the pointers and once for the value. It now constructs
the node with a single call to construct and then constructs the value using
in place construction.
* Add support for {cpp}0x initializer lists where they're available (currently
only g++ 4.4 in {cpp}0x mode).
== Release 1.37.0
* Rename overload of `emplace` with hint, to `emplace_hint` as specified in
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2691.pdf[n2691^].
* Provide forwarding headers at `<boost/unordered/unordered_map_fwd.hpp>` and
`<boost/unordered/unordered_set_fwd.hpp>`.
* Move all the implementation inside `boost/unordered`, to assist
modularization and hopefully make it easier to track Release subversion.
== Release 1.36.0
First official release.
* Rearrange the internals.
* Move semantics - full support when rvalue references are available, emulated
using a cut down version of the Adobe move library when they are not.
* Emplace support when rvalue references and variadic template are available.
* More efficient node allocation when rvalue references and variadic template
are available.
* Added equality operators.
== Boost 1.35.0 Add-on - 31st March 2008
Unofficial release uploaded to vault, to be used with Boost 1.35.0. Incorporated
many of the suggestions from the review.
* Improved portability thanks to Boost regression testing.
* Fix lots of typos, and clearer text in the documentation.
* Fix floating point to `std::size_t` conversion when calculating sizes from
the max load factor, and use `double` in the calculation for greater accuracy.
* Fix some errors in the examples.
== Review Version
Initial review version, for the review conducted from 7th December 2007 to
16th December 2007.
-150
View File
@@ -1,150 +0,0 @@
[#compliance]
= Standard Compliance
:idprefix: compliance_
:cpp: C++
== Closed-addressing Containers
`boost::unordered_[multi]set` and `boost::unordered_[multi]map` provide a conformant
implementation for {cpp}11 (or later) compilers of the latest standard revision of
{cpp} unordered associative containers, with very minor deviations as noted.
The containers are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^]
and support https://en.cppreference.com/w/cpp/named_req/Allocator#Fancy_pointers[fancy pointers^].
=== Deduction Guides
Deduction guides for
https://en.cppreference.com/w/cpp/language/class_template_argument_deduction[class template argument deduction (CTAD)^]
are only available on {cpp}17 (or later) compilers.
=== Piecewise Pair Emplacement
In accordance with the standard specification,
`boost::unordered_[multi]map::emplace` supports piecewise pair construction:
[source,c++]
----
boost::unordered_multimap<std::string, std::complex<double>> x;
x.emplace(
std::piecewise_construct,
std::make_tuple("key"), std::make_tuple(1, 2));
----
Additionally, the same
functionality is provided via non-standard `boost::unordered::piecewise_construct`
and Boost.Tuple:
[source,c++]
----
x.emplace(
boost::unordered::piecewise_construct,
boost::make_tuple("key"), boost::make_tuple(1, 2));
----
This feature has been retained for backwards compatibility with
previous versions of Boost.Unordered: users are encouraged to
update their code to use `std::piecewise_construct` and
``std::tuple``s instead.
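
A self-contained version of the recommended standard form might look like the sketch below. It uses `std::unordered_multimap` as a stand-in so that it compiles without Boost, on the assumption that `boost::unordered_multimap` accepts the same call; the alias `complex_multimap` and the helper `emplace_complex` are hypothetical names.

```cpp
#include <complex>
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>

using complex_multimap =
    std::unordered_multimap<std::string, std::complex<double>>;

// Hypothetical helper: inserts (key, re + im*i), constructing the key and
// the mapped value in place from their respective argument tuples.
inline void emplace_complex(
    complex_multimap& m, const char* key, double re, double im)
{
    m.emplace(
        std::piecewise_construct,
        std::forward_as_tuple(key),      // arguments for std::string
        std::forward_as_tuple(re, im));  // arguments for std::complex<double>
}
```

`std::forward_as_tuple` is used here instead of `std::make_tuple` so that the arguments are forwarded by reference rather than copied into the tuple first.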
=== Swap
When swapping, `Pred` and `Hash` are not currently swapped by calling
`swap`; their copy constructors are used instead. As a consequence, an
exception may be thrown from their copy constructors when swapping.
== Open-addressing Containers
The C++ standard does not currently provide any open-addressing container
specification to adhere to, so `boost::unordered_flat_set`/`unordered_node_set` and
`boost::unordered_flat_map`/`unordered_node_map` take inspiration from `std::unordered_set` and
`std::unordered_map`, respectively, and depart from their interface where
convenient or as dictated by their internal data structure, which is
radically different from that imposed by the standard (closed addressing).
Open-addressing containers provided by Boost.Unordered only work with reasonably
compliant C++11 (or later) compilers. Language-level features such as move semantics
and variadic template parameters are then not emulated.
The containers are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^]
and support https://en.cppreference.com/w/cpp/named_req/Allocator#Fancy_pointers[fancy pointers^].
The main differences with C++ unordered associative containers are:
* In general:
** `begin()` is not constant-time.
** `erase(iterator)` does not return an iterator to the following element, but
a proxy object that converts to that iterator if requested; this avoids
a potentially costly iterator increment operation when not needed.
** There is no API for bucket handling (except `bucket_count`).
** The maximum load factor of the container is managed internally and can't be set by the user. The maximum load,
exposed through the public function `max_load`, may decrease on erasure under high-load conditions.
* Flat containers (`boost::unordered_flat_set` and `boost::unordered_flat_map`):
** `value_type` must be move-constructible.
** Pointer stability is not kept under rehashing.
** There is no API for node extraction/insertion.
== Concurrent Containers
There is currently no specification in the C++ standard for this or any other type of concurrent
data structure. The APIs of `boost::concurrent_flat_set`/`boost::concurrent_node_set` and
`boost::concurrent_flat_map`/`boost::concurrent_node_map`
are modelled after `boost::unordered_flat_set` and `boost::unordered_flat_map`, respectively,
with the crucial difference that iterators are not provided
due to their inherent problems in concurrent scenarios (high contention, prone to deadlocking).
As a result, Boost.Unordered concurrent containers are technically not models of
https://en.cppreference.com/w/cpp/named_req/Container[Container^], although
they meet all the requirements of https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^]
containers (including
https://en.cppreference.com/w/cpp/named_req/Allocator#Fancy_pointers[fancy pointer^] support)
except those implying iterators.
In a non-concurrent unordered container, iterators serve two main purposes:
* Access to an element previously located via lookup.
* Container traversal.
In place of iterators, Boost.Unordered concurrent containers use _internal visitation_
facilities as a thread-safe substitute. Classical operations returning an iterator to an
element already existing in the container, like for instance:
[source,c++]
----
iterator find(const key_type& k);
std::pair<iterator, bool> insert(const value_type& obj);
----
are transformed to accept a _visitation function_ that is passed such element:
[source,c++]
----
template<class F> size_t visit(const key_type& k, F f);
template<class F> bool insert_or_visit(const value_type& obj, F f);
----
(In the second case, `f` is only invoked if there's an element equivalent
to `obj` in the table, not if insertion is successful.) Container traversal
is served by:
[source,c++]
----
template<class F> size_t visit_all(F f);
----
of which there are parallelized versions in C++17 compilers with parallel
algorithm support. In general, the interface of concurrent containers
is derived from that of their non-concurrent counterparts by a fairly straightforward
process of replacing iterators with visitation where applicable. If for
regular maps `iterator` and `const_iterator` provide mutable and const access to elements,
respectively, here visitation is granted mutable or const access depending on
the constness of the member function used (there are also `*cvisit` overloads for
explicit const visitation); in the case of `boost::concurrent_flat_set`/`boost::concurrent_node_set`, visitation is always const.
One notable operation not provided by `boost::concurrent_flat_map`/`boost::concurrent_node_map`
is `operator[]`/`at`, which can be
replaced, if in a more convoluted manner, by
xref:#concurrent_flat_map_try_emplace_or_cvisit[`try_emplace_or_visit`].
//-
-320
View File
@@ -1,320 +0,0 @@
[#concurrent]
= Concurrent Containers
:idprefix: concurrent_
Boost.Unordered provides `boost::concurrent_node_set`, `boost::concurrent_node_map`,
`boost::concurrent_flat_set` and `boost::concurrent_flat_map`,
hash tables that allow concurrent write/read access from
different threads without having to implement any synchronization mechanism on the user's side.
[source,c++]
----
std::vector<int> input;
boost::concurrent_flat_map<int,int> m;
...
// process input in parallel
const int num_threads = 8;
std::vector<std::jthread> threads;
std::size_t chunk = input.size() / num_threads; // how many elements per thread
for (int i = 0; i < num_threads; ++i) {
threads.emplace_back([&,i] {
// calculate the portion of input this thread takes care of
std::size_t start = i * chunk;
std::size_t end = (i == num_threads - 1)? input.size(): (i + 1) * chunk;
for (std::size_t n = start; n < end; ++n) {
m.emplace(input[n], calculation(input[n]));
}
});
}
----
In the example above, threads access `m` without synchronization, just as we'd do in a
single-threaded scenario. In an ideal setting, if a given workload is distributed among
_N_ threads, execution is _N_ times faster than with one thread —this limit is
never attained in practice due to synchronization overheads and _contention_ (one thread
waiting for another to leave a locked portion of the map), but Boost.Unordered concurrent containers
are designed to perform with very little overhead and typically achieve _linear scaling_
(that is, performance is proportional to the number of threads up to the number of
logical cores in the CPU).
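The index arithmetic used above to split the input among threads can be isolated in a small helper. This is only a sketch: `chunk_range` is a hypothetical name, not part of Boost.Unordered. As in the example, the last thread also takes the remainder when the input size is not a multiple of the thread count.

```cpp
#include <cstddef>
#include <utility>

// Returns the half-open range [first, last) of input positions handled by
// thread i out of num_threads. The last thread absorbs the remainder.
inline std::pair<std::size_t, std::size_t>
chunk_range(std::size_t size, std::size_t num_threads, std::size_t i)
{
  std::size_t chunk = size / num_threads;  // elements per thread
  std::size_t first = i * chunk;
  std::size_t last = (i == num_threads - 1) ? size : (i + 1) * chunk;
  return {first, last};
}
```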
== Visitation-based API
The first thing a new user of Boost.Unordered concurrent containers
will notice is that these classes _do not provide iterators_ (which makes them technically
not https://en.cppreference.com/w/cpp/named_req/Container[Containers^]
in the C++ standard sense). The reason for this is that iterators are inherently
thread-unsafe. Consider this hypothetical code:
[source,c++]
----
auto it = m.find(k); // A: get an iterator pointing to the element with key k
if (it != m.end() ) {
some_function(*it); // B: use the value of the element
}
----
In a multithreaded scenario, the iterator `it` may be invalid at point B if some other
thread issues an `m.erase(k)` operation between A and B. There are designs that
can remedy this by making iterators lock the element they point to, but this
approach lends itself to high contention and can easily produce deadlocks in a program.
`operator[]` has similar concurrency issues, and is not provided by
`boost::concurrent_flat_map`/`boost::concurrent_node_map` either. Instead, element access is done through
so-called _visitation functions_:
[source,c++]
----
m.visit(k, [](const auto& x) { // x is the element with key k (if it exists)
some_function(x); // use it
});
----
The visitation function passed by the user (in this case, a lambda function)
is executed internally by Boost.Unordered in
a thread-safe manner, so it can access the element without worrying about other
threads interfering in the process.
On the other hand, a visitation function can _not_ access the container itself:
[source,c++]
----
m.visit(k, [&](const auto& x) {
some_function(x, m.size()); // forbidden: m can't be accessed inside visitation
});
----
Access to a different container is allowed, though:
[source,c++]
----
m.visit(k, [&](const auto& x) {
if (some_function(x)) {
m2.insert(x); // OK, m2 is a different boost::concurrent_flat_map
}
});
----
But, in general, visitation functions should be as lightweight as possible to
reduce contention and increase parallelization. In some cases, moving heavy work
outside of visitation may be beneficial:
[source,c++]
----
std::optional<value_type> o;
bool found = m.visit(k, [&](const auto& x) {
o.emplace(x); // copy-construct: a map's value_type has a const key and is not assignable
});
if (found) {
some_heavy_duty_function(*o);
}
----
Visitation is prominent in the API provided by concurrent containers, and
many classical operations have visitation-enabled variations:
[source,c++]
----
m.insert_or_visit(x, [](auto& y) {
// if insertion failed because of an equivalent element y,
// do something with it, for instance:
++y.second; // increment the mapped part of the element
});
----
Note that in this last example the visitation function could actually _modify_
the element: as a general rule, operations on a concurrent map `m`
will grant visitation functions const/non-const access to the element depending on whether
`m` is const/non-const. Const access can always be explicitly requested
by using `cvisit` overloads (for instance, `insert_or_cvisit`) and may result
in higher parallelization. For concurrent sets, on the other hand,
visitation is always const access.
Although expected to be used much less frequently, concurrent containers
also provide insertion operations where an element can be visited right after
element creation (in addition to the usual visitation when an equivalent
element already exists):
[source,c++]
----
m.insert_and_cvisit(x,
[](const auto& y) {
std::cout<< "(" << y.first << ", " << y.second <<") inserted\n";
},
[](const auto& y) {
std::cout<< "(" << y.first << ", " << y.second << ") already exists\n";
});
----
Consult the references of
xref:#concurrent_node_set[`boost::concurrent_node_set`],
xref:#concurrent_node_map[`boost::concurrent_node_map`],
xref:#concurrent_flat_set[`boost::concurrent_flat_set`] and
xref:#concurrent_flat_map[`boost::concurrent_flat_map`]
for the complete list of visitation-enabled operations.
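To build intuition for why visitation is thread-safe where iterators are not, here is a deliberately simplified toy model. It is not how Boost.Unordered is implemented: the class name `toy_concurrent_map` and its single global mutex are assumptions made purely for illustration, and the real containers are designed for far lower contention. The point is that the element is only ever touched inside the user-supplied function, while the lock is held.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <utility>

template<class Key, class T>
class toy_concurrent_map
{
public:
  // Calls f on the element with key k, if any, while holding the lock.
  // Returns the number of elements visited (0 or 1).
  template<class F>
  std::size_t visit(const Key& k, F f)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    auto it = map_.find(k);
    if (it == map_.end()) return 0;
    f(*it);
    return 1;
  }

  // Inserts (k, v) if k is absent; otherwise calls f on the existing element.
  // Returns true if insertion took place.
  template<class F>
  bool try_emplace_or_visit(const Key& k, const T& v, F f)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    auto res = map_.emplace(k, v);
    if (!res.second) f(*res.first);
    return res.second;
  }

private:
  std::mutex mtx_;                  // one lock for the whole table (toy only)
  std::unordered_map<Key, T> map_;
};
```

No iterator ever escapes the locked region, so another thread cannot erase the element between lookup and use. The toy also hints at why a visitation function must not call back into the same container: here that would try to re-lock a mutex already held by the calling thread, which typically deadlocks.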
== Whole-Table Visitation
In the absence of iterators, `visit_all` is provided
as an alternative way to process all the elements in the container:
[source,c++]
----
m.visit_all([](auto& x) {
x.second = 0; // reset the mapped part of the element
});
----
In C++17 compilers implementing standard parallel algorithms, whole-table
visitation can be parallelized:
[source,c++]
----
m.visit_all(std::execution::par, [](auto& x) { // run in parallel
x.second = 0; // reset the mapped part of the element
});
----
Traversal can be interrupted midway:
[source,c++]
----
// finds the key to a given (unique) value
int key = 0;
int value = ...;
bool found = !m.visit_while([&](const auto& x) {
if(x.second == value) {
key = x.first;
return false; // finish
}
else {
return true; // keep on visiting
}
});
if(found) { ... }
----
There is one last whole-table visitation operation, `erase_if`:
[source,c++]
----
m.erase_if([](auto& x) {
return x.second == 0; // erase the elements whose mapped value is zero
});
----
`visit_while` and `erase_if` can also be parallelized. Note that, in order to increase efficiency,
whole-table visitation operations do not block the table during execution: this implies that elements
may be inserted, modified or erased by other threads during visitation. It is
advisable not to assume too much about the exact global state of a concurrent container
at any point in your program.
== Bulk visitation
Suppose you have a `std::array` of keys you want to look up in a concurrent map:
[source,c++]
----
std::array<int, N> keys;
...
for(const auto& key: keys) {
m.visit(key, [](auto& x) { ++x.second; });
}
----
_Bulk visitation_ allows us to pass all the keys in one operation:
[source,c++]
----
m.visit(keys.begin(), keys.end(), [](auto& x) { ++x.second; });
----
This functionality is not provided for mere syntactic convenience, though: by processing all the
keys at once, some internal optimizations can be applied that increase
performance over the regular, one-at-a-time case (consult the
xref:#benchmarks_boostconcurrent_flat_map[benchmarks]). In fact, it may be beneficial
to buffer incoming keys so that they can be bulk visited in chunks:
[source,c++]
----
static constexpr auto bulk_visit_size = boost::concurrent_flat_map<int,int>::bulk_visit_size;
std::array<int, bulk_visit_size> buffer;
std::size_t i=0;
while(...) { // processing loop
...
buffer[i++] = k;
if(i == bulk_visit_size) {
m.visit(buffer.begin(), buffer.end(), [](auto& x) { ++x.second; });
i = 0;
}
...
}
// flush remaining keys
m.visit(buffer.begin(), buffer.begin() + i, [](auto& x) { ++x.second; });
----
There's a latency/throughput tradeoff here: it will take longer for incoming keys to
be processed (since they are buffered), but the number of processed keys per second
is higher. `bulk_visit_size` is the recommended chunk size —smaller buffers
may yield worse performance.
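The buffering pattern above can be wrapped in a small reusable helper. The sketch below is independent of Boost.Unordered: `key_batcher` is a hypothetical name, and the flush callback stands in for a bulk call such as `m.visit(first, last, f)`. In real code the capacity `N` would presumably be the container's `bulk_visit_size`.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Accumulates keys and hands them to `flush(first, last)` in chunks of N.
template<class Key, std::size_t N, class Flush>
class key_batcher
{
public:
  explicit key_batcher(Flush flush) : flush_(std::move(flush)) {}

  // Buffers k; flushes automatically once N keys have accumulated.
  void push(const Key& k)
  {
    buffer_[size_++] = k;
    if (size_ == N) flush_now();
  }

  // Flushes any remaining keys; call once after the processing loop.
  void flush_now()
  {
    if (size_ != 0) {
      flush_(buffer_.data(), buffer_.data() + size_);
      size_ = 0;
    }
  }

private:
  std::array<Key, N> buffer_{};
  std::size_t size_ = 0;
  Flush flush_;
};
```

The explicit `flush_now()` at the end mirrors the "flush remaining keys" step of the loop above; forgetting it would silently drop up to `N - 1` keys.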
== Blocking Operations
Concurrent containers can be copied, assigned, cleared and merged just like any other
Boost.Unordered container. Unlike most other operations, these are _blocking_,
that is, all other threads are prevented from accessing the tables involved while a copy, assignment,
clear or merge operation is in progress. Blocking is taken care of automatically by the library
and the user need not take any special precaution, but overall performance may be affected.
Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve`
or during insertion when the table's load hits `max_load()`. As with non-concurrent containers,
reserving space in advance of bulk insertions will generally speed up the process.
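The effect of reserving in advance can be observed with any hash container that exposes `bucket_count`. The sketch below uses `std::unordered_map` as a stand-in (an assumption made to keep the example free of dependencies), and `count_rehashes` is a hypothetical helper. The same reserve-then-insert pattern applies to the concurrent containers, where each avoided rehash is also an avoided blocking operation.

```cpp
#include <cstddef>
#include <unordered_map>

// Inserts n elements and returns how many times the bucket array changed
// size (that is, how many rehashes took place) along the way.
inline std::size_t count_rehashes(std::size_t n, bool reserve_first)
{
  std::unordered_map<std::size_t, std::size_t> m;
  if (reserve_first) m.reserve(n);

  std::size_t rehashes = 0;
  std::size_t buckets = m.bucket_count();
  for (std::size_t i = 0; i < n; ++i) {
    m.emplace(i, i);
    if (m.bucket_count() != buckets) {  // the table was rehashed
      buckets = m.bucket_count();
      ++rehashes;
    }
  }
  return rehashes;
}
```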
== Interoperability with non-concurrent containers
As open-addressing and concurrent containers are based on the same internal data structure,
they can be efficiently move-constructed from their non-concurrent counterpart, and vice versa.
[caption=, title='Table {counter:table-counter}. Concurrent/non-concurrent interoperability']
[cols="1,1", frame=all, grid=all]
|===
^|`boost::concurrent_node_set`
^|`boost::unordered_node_set`
^|`boost::concurrent_node_map`
^|`boost::unordered_node_map`
^|`boost::concurrent_flat_set`
^|`boost::unordered_flat_set`
^|`boost::concurrent_flat_map`
^|`boost::unordered_flat_map`
|===
This interoperability comes in handy in multistage scenarios where parts of the data processing happen
in parallel whereas other steps are non-concurrent (or non-modifying). In the following example,
we want to construct a histogram from a huge input vector of words:
the population phase can be done in parallel with `boost::concurrent_flat_map` and results
then transferred to the final container.
[source,c++]
----
std::vector<std::string> words = ...;
// Insert words in parallel
boost::concurrent_flat_map<std::string_view, std::size_t> m0;
std::for_each(
std::execution::par, words.begin(), words.end(),
[&](const auto& word) {
m0.try_emplace_or_visit(word, 1, [](auto& x) { ++x.second; });
});
// Transfer to a regular unordered_flat_map
boost::unordered_flat_map m=std::move(m0);
----
-20
View File
@@ -1,20 +0,0 @@
[#copyright]
= Copyright and License
:idprefix: copyright_
*Daniel James*
Copyright (C) 2003, 2004 Jeremy B. Maitin-Shepard
Copyright (C) 2005-2008 Daniel James
Copyright (C) 2022-2023 Christian Mazakas
Copyright (C) 2022-2024 Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz
Copyright (C) 2022-2023 Peter Dimov
Copyright (C) 2024 Braden Ganetsky
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
-90
View File
@@ -1,90 +0,0 @@
[#debuggability]
:idprefix: debuggability_
= Debuggability
== Visual Studio Natvis
All containers and iterators have custom visualizations in the Natvis framework.
=== Using in your project
To visualize Boost.Unordered containers in the Natvis framework in your project, simply add the file link:https://github.com/boostorg/unordered/blob/develop/extra/boost_unordered.natvis[/extra/boost_unordered.natvis] to your Visual Studio project as an "Existing Item".
=== Visualization structure
The visualizations mirror those for the standard unordered containers. A container has a maximum of 100 elements displayed at once. Each set element has its item name listed as `[i]`, where `i` is the index in the display, starting at `0`. Each map element has its item name listed as `[\{key-display}]` by default. For example, if the first element is the pair `("abc", 1)`, the item name will be `["abc"]`. This behaviour can be overridden by using the view "ShowElementsByIndex", which switches the map display behaviour to name the elements by index. This same view name is used in the standard unordered containers.
By default, the closed-addressing containers will show the `[hash_function]` and `[key_eq]`, the `[spare_hash_function]` and `[spare_key_eq]` if applicable, the `[allocator]`, and the elements. Using the view "detailed" adds the `[bucket_count]` and `[max_load_factor]`. Conversely, using the view "simple" shows only the elements, with no other items present.
By default, the open-addressing containers will show the `[hash_function]`, `[key_eq]`, `[allocator]`, and the elements. Using the view "simple" shows only the elements, with no other items present. Both the SIMD and the non-SIMD implementations are viewable through the Natvis framework.
Iterators are displayed similarly to their standard counterparts. An iterator is displayed as though it were the element that it points to. An end iterator is simply displayed as `{ end iterator }`.
=== Fancy pointers
The container visualizations also work if you are using fancy pointers in your allocator, such as `boost::interprocess::offset_ptr`. While this is rare, Boost.Unordered has natvis customization points to support any type of fancy pointer. `boost::interprocess::offset_ptr` has support already defined in the Boost.Interprocess library, and you can add support to your own type by following the instructions contained in a comment near the end of the file link:https://github.com/boostorg/unordered/blob/develop/extra/boost_unordered.natvis[/extra/boost_unordered.natvis].
== GDB Pretty-Printers
All containers and iterators have a custom GDB pretty-printer.
=== Using in your project
When using pretty-printers, you must always enable pretty-printing as shown below. This is typically a one-time setup.
```plaintext
(gdb) set print pretty on
```
By default, if you compile into an ELF binary format, your binary will contain the Boost.Unordered pretty-printers. To use the embedded pretty-printers, ensure you allow auto-loading as shown below. This must be done every time you load GDB, unless you add the command to a ".gdbinit" file.
```plaintext
(gdb) add-auto-load-safe-path [/path/to/executable]
```
You can choose to compile your binary _without_ embedding the pretty-printers by defining `BOOST_ALL_NO_EMBEDDED_GDB_SCRIPTS`, which disables the embedded GDB pretty-printers for all Boost libraries that have this feature.
You can load the pretty-printers externally from the non-embedded Python script. Add the script, link:https://github.com/boostorg/unordered/blob/develop/extra/boost_unordered_printers.py[/extra/boost_unordered_printers.py], using the `source` command as shown below.
```plaintext
(gdb) source [/path/to/boost]/libs/unordered/extra/boost_unordered_printers.py
```
=== Visualization structure
The visualizations mirror the standard unordered containers. The map containers display an association from key to mapped value. The set containers display an association from index to value. An iterator is either displayed with its item, or as an end iterator. Here is what may be shown for an example `boost::unordered_map`, an example `boost::unordered_set`, and their respective begin and end iterators.
```plaintext
(gdb) print example_unordered_map
$1 = boost::unordered_map with 3 elements = {["C"] = "c", ["B"] = "b", ["A"] = "a"}
(gdb) print example_unordered_map_begin
$2 = iterator = { {first = "C", second = "c"} }
(gdb) print example_unordered_map_end
$3 = iterator = { end iterator }
(gdb) print example_unordered_set
$4 = boost::unordered_set with 3 elements = {[0] = "c", [1] = "b", [2] = "a"}
(gdb) print example_unordered_set_begin
$5 = iterator = { "c" }
(gdb) print example_unordered_set_end
$6 = iterator = { end iterator }
```
The other containers are identical other than replacing "`boost::unordered_{map|set}`" with the appropriate template name when displaying the container itself. Note that each sub-element (i.e. the key, the mapped value, or the value) is displayed based on its own printing settings which may include its own pretty-printer.
Both the SIMD and the non-SIMD implementations are viewable through the GDB pretty-printers.
For open-addressing containers where xref:#hash_quality_container_statistics[container statistics] are enabled, you can obtain these statistics by calling `get_stats()` on the container, from within GDB. This is overridden in GDB as an link:https://sourceware.org/gdb/current/onlinedocs/gdb.html/Xmethod-API.html[xmethod], so it will not invoke any C++ synchronization code. See the following printout as an example for the expected format.
```plaintext
(gdb) print example_flat_map.get_stats()
$1 = [stats] = {[insertion] = {[count] = 5, [probe_length] = {avg = 1.0, var = 0.0, dev = 0.0}},
[successful_lookup] = {[count] = 0, [probe_length] = {avg = 0.0, var = 0.0, dev = 0.0},
[num_comparisons] = {avg = 0.0, var = 0.0, dev = 0.0}}, [unsuccessful_lookup] = {[count] = 5,
[probe_length] = {avg = 1.0, var = 0.0, dev = 0.0},
[num_comparisons] = {avg = 0.0, var = 0.0, dev = 0.0}}}
```
=== Fancy pointers
The pretty-printers also work if you are using fancy pointers in your allocator, such as `boost::interprocess::offset_ptr`. While this is rare, Boost.Unordered has GDB pretty-printer customization points to support any type of fancy pointer. `boost::interprocess::offset_ptr` has support already defined in the Boost.Interprocess library, and you can add support to your own type by following the instructions contained in a comment near the end of the file link:https://github.com/boostorg/unordered/blob/develop/extra/boost_unordered_printers.py[/extra/boost_unordered_printers.py].
-149
View File
@@ -1,149 +0,0 @@
[#hash_equality]
:idprefix: hash_equality_
= Equality Predicates and Hash Functions
While the associative containers use an ordering relation to specify how the
elements are stored, the unordered associative containers use an equality
predicate and a hash function. For example, <<unordered_map,boost::unordered_map>>
is declared as:
```cpp
template <
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_map;
```
The hash function comes first as you might want to change the hash function
but not the equality predicate. For example, if you wanted to use the
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function#FNV-1a_hash[FNV-1a hash^] you could write:
```cpp
boost::unordered_map<std::string, int, hash::fnv_1a>
dictionary;
```
There is an link:../../examples/fnv1.hpp[implementation of FNV-1a^] in the examples directory.
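For reference, FNV-1a itself is only a few lines long. The following is an independent 64-bit sketch, not the code shipped in the examples directory; the names `fnv1a_64` and `fnv1a_hash` are made up for illustration, while the offset basis and prime are the published FNV constants.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the bytes of s.
inline std::uint64_t fnv1a_64(const std::string& s)
{
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis (64-bit)
  for (unsigned char c : s) {
    h ^= c;                                   // xor in the next byte...
    h *= 1099511628211ULL;                    // ...then multiply by the FNV prime
  }
  return h;
}

// Function object usable as a container's Hash template argument.
// On platforms with a 32-bit std::size_t the result is truncated.
struct fnv1a_hash
{
  std::size_t operator()(const std::string& s) const
  {
    return static_cast<std::size_t>(fnv1a_64(s));
  }
};
```

A struct like `fnv1a_hash` could then be passed as the `Hash` argument in the same position as `hash::fnv_1a` above.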
If you wish to use a different equality function, you will also need to use a matching hash function. For example, to implement a case-insensitive dictionary you need to define a case-insensitive equality predicate and hash function:
```cpp
struct iequal_to
{
bool operator()(std::string const& x,
std::string const& y) const
{
return boost::algorithm::iequals(x, y, std::locale());
}
};
struct ihash
{
std::size_t operator()(std::string const& x) const
{
std::size_t seed = 0;
std::locale locale;
for(std::string::const_iterator it = x.begin();
it != x.end(); ++it)
{
boost::hash_combine(seed, std::toupper(*it, locale));
}
return seed;
}
};
```
These can then be used in a case-insensitive dictionary:
```cpp
boost::unordered_map<std::string, int, ihash, iequal_to>
idictionary;
```
This is a simplified version of the example at
link:../../examples/case_insensitive.hpp[/libs/unordered/examples/case_insensitive.hpp^] which supports other locales and string types.
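"Matching" here means that any two keys the predicate considers equal must produce the same hash value. The dependency-free sketch below illustrates this with `std::unordered_map`, whose `Hash` and `Pred` parameters play the same role. It assumes ASCII-only folding for brevity, and the names `fold`, `ascii_iequal_to`, `ascii_ihash` and `idictionary_type` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Folds a character to upper case (ASCII only in the default "C" locale).
inline char fold(char c)
{
  return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

struct ascii_iequal_to
{
  bool operator()(const std::string& x, const std::string& y) const
  {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
      if (fold(x[i]) != fold(y[i])) return false;
    }
    return true;
  }
};

struct ascii_ihash
{
  std::size_t operator()(const std::string& x) const
  {
    // Hash the folded form so that keys equal under ascii_iequal_to
    // always hash to the same value.
    std::string folded;
    for (char c : x) folded.push_back(fold(c));
    return std::hash<std::string>()(folded);
  }
};

using idictionary_type =
  std::unordered_map<std::string, int, ascii_ihash, ascii_iequal_to>;
```

Had `ascii_ihash` hashed the raw string instead, `"Hello"` and `"HELLO"` would usually land in different buckets and the predicate would never get a chance to compare them.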
CAUTION: Be careful when using the equality (`==`) operator with custom equality
predicates, especially if you're using a function pointer. If you compare two
containers with different equality predicates then the result is undefined.
For most stateless function objects this is impossible - since you can only
compare objects with the same equality predicate you know the equality
predicates must be equal. But if you're using function pointers or a stateful
equality predicate (e.g. `boost::function`) then you can get into trouble.
== Custom Types
Similarly, a custom hash function can be used for custom types:
```cpp
struct point {
int x;
int y;
};
bool operator==(point const& p1, point const& p2)
{
return p1.x == p2.x && p1.y == p2.y;
}
struct point_hash
{
std::size_t operator()(point const& p) const
{
std::size_t seed = 0;
boost::hash_combine(seed, p.x);
boost::hash_combine(seed, p.y);
return seed;
}
};
boost::unordered_multiset<point, point_hash> points;
```
Since the default hash function is link:../../../container_hash/index.html[Boost.Hash^],
we can extend it to support the type so that the hash function doesn't need to be explicitly given:
```cpp
struct point {
int x;
int y;
};
bool operator==(point const& p1, point const& p2)
{
return p1.x == p2.x && p1.y == p2.y;
}
std::size_t hash_value(point const& p) {
std::size_t seed = 0;
boost::hash_combine(seed, p.x);
boost::hash_combine(seed, p.y);
return seed;
}
// Now the default function objects work.
boost::unordered_multiset<point> points;
```
See the link:../../../container_hash/index.html[Boost.Hash documentation^] for more detail on how to
do this. Remember that it relies on extensions to the standard, so it
won't work with other implementations of the unordered associative containers;
for those, you'll need to explicitly use Boost.Hash.
[caption=, title='Table {counter:table-counter}. Methods for accessing the hash and equality functions']
[cols="1,.^1", frame=all, grid=rows]
|===
|Method |Description
|`hasher hash_function() const`
|Returns the container's hash function.
|`key_equal key_eq() const`
|Returns the container's key equality function.
|===
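These accessors are useful when code outside the container has to hash or compare keys exactly as the container does. A small sketch follows; it is written against the generic container interface (`hasher`, `key_equal`, `key_type`) and can be tried with `std::unordered_set`, which provides the same two member functions. `same_hash_and_key` is a hypothetical helper.

```cpp
#include <string>
#include <unordered_set>

// Returns true if a and b are indistinguishable to container c, i.e. they hash
// to the same value and compare equal under c's own function objects.
template<class Container>
bool same_hash_and_key(const Container& c,
                       const typename Container::key_type& a,
                       const typename Container::key_type& b)
{
  typename Container::hasher h = c.hash_function();
  typename Container::key_equal eq = c.key_eq();
  return h(a) == h(b) && eq(a, b);
}
```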
-145
View File
@@ -1,145 +0,0 @@
[#hash_quality]
= Hash Quality
:idprefix: hash_quality_
In order to work properly, hash tables require that the supplied hash function
be of __good quality__, roughly meaning that it uses its `std::size_t` output
space as uniformly as possible, much like a random number generator would do
—except, of course, that the value of a hash function is not random but strictly determined
by its input argument.
Closed-addressing containers in Boost.Unordered are fairly robust against
hash functions with less-than-ideal quality, but open-addressing and concurrent
containers are much more sensitive to this factor, and their performance can
degrade dramatically if the hash function is not appropriate. In general, if
you're using functions provided by or generated with link:../../../container_hash/index.html[Boost.Hash^],
the quality will be adequate, but you have to be careful when using alternative
hash algorithms.
The rest of this section applies only to open-addressing and concurrent containers.
== Hash Post-mixing and the Avalanching Property
Even if your supplied hash function does not conform to the uniform behavior
required by open addressing, chances are that
the performance of Boost.Unordered containers will be acceptable, because the library
executes an internal __post-mixing__ step that improves the statistical
properties of the calculated hash values. This comes with an extra computational
cost; if you'd like to opt out of post-mixing, annotate your hash function as
follows:
[source,c++]
----
struct my_string_hash_function
{
using is_avalanching = std::true_type; // instruct Boost.Unordered to not use post-mixing
std::size_t operator()(const std::string& x) const
{
...
}
};
----
By setting the
xref:#hash_traits_hash_is_avalanching[hash_is_avalanching] trait, we inform Boost.Unordered
that `my_string_hash_function` is of sufficient quality to be used directly without
any post-mixing safety net. This comes at the risk of degraded performance in the
cases where the hash function is not as well-behaved as we've declared.
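As an informal illustration of what avalanching means, one can flip a single input bit and count how many output bits change; a well-mixed 64-bit hash changes about half of them on average. The sketch below is not part of Boost.Unordered and is not its post-mixing step: `identity_hash`, `mixed_hash` (which uses the widely published SplitMix64 finalizer constants) and `average_flipped_bits` are names chosen here for demonstration only.

```cpp
#include <bitset>
#include <cstdint>

// Poor avalanching: each input bit affects exactly one output bit.
inline std::uint64_t identity_hash(std::uint64_t x) { return x; }

// Bit mixer based on the SplitMix64 finalizer.
inline std::uint64_t mixed_hash(std::uint64_t x)
{
  x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27; x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

// Average number of output bits that change when one input bit is flipped,
// taken over all 64 bit positions of a handful of sample inputs.
template<class Hash>
double average_flipped_bits(Hash h)
{
  const std::uint64_t samples[] = {0, 1, 42, 0x123456789abcdef0ULL, ~0ULL};
  double total = 0;
  int trials = 0;
  for (std::uint64_t x : samples) {
    for (int bit = 0; bit < 64; ++bit) {
      std::uint64_t diff = h(x) ^ h(x ^ (std::uint64_t(1) << bit));
      total += static_cast<double>(std::bitset<64>(diff).count());
      ++trials;
    }
  }
  return total / trials;
}
```

`average_flipped_bits(identity_hash)` is exactly 1, whereas the mixer should land in the neighbourhood of 32. Only a hash function behaving like the latter is a reasonable candidate for the `is_avalanching` annotation.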
== Container Statistics
If we globally define the macro `BOOST_UNORDERED_ENABLE_STATS`, open-addressing and
concurrent containers will calculate some internal statistics directly correlated to the
quality of the hash function:
[source,c++]
----
#define BOOST_UNORDERED_ENABLE_STATS
#include <boost/unordered/unordered_flat_map.hpp>
...
int main()
{
boost::unordered_flat_map<std::string, int, my_string_hash> m;
... // use m
auto stats = m.get_stats();
... // inspect stats
}
----
The `stats` object provides the following information:
[source,subs=+quotes]
----
stats
.insertion // *Insertion operations*
.count // Number of operations
.probe_length // Probe length per operation
.average
.variance
.deviation
.successful_lookup // *Lookup operations (element found)*
.count // Number of operations
.probe_length // Probe length per operation
.average
.variance
.deviation
.num_comparisons // Elements compared per operation
.average
.variance
.deviation
.unsuccessful_lookup // *Lookup operations (element not found)*
.count // Number of operations
.probe_length // Probe length per operation
.average
.variance
.deviation
.num_comparisons // Elements compared per operation
.average
.variance
.deviation
----
Statistics for three internal operations are maintained: insertions (without considering
the previous lookup to determine that the key is not present yet), successful lookups,
and unsuccessful lookups (including those issued internally when inserting elements).
_Probe length_ is the number of
xref:#structures_open_addressing_containers[bucket groups] accessed per operation.
If the hash function behaves properly:
* Average probe lengths should be close to 1.0.
* The average number of comparisons per successful lookup should be close to 1.0 (that is,
just the element found is checked).
* The average number of comparisons per unsuccessful lookup should be close to 0.0.
An link:../../benchmark/string_stats.cpp[example^] is provided that displays container
statistics for `boost::hash<std::string>`, an implementation of the
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function#FNV-1a_hash[FNV-1a hash^]
and two ill-behaved custom hash functions that have been incorrectly marked as avalanching:
[listing]
----
boost::unordered_flat_map: 319 ms
insertion: probe length 1.08771
successful lookup: probe length 1.06206, num comparisons 1.02121
unsuccessful lookup: probe length 1.12301, num comparisons 0.0388251
boost::unordered_flat_map, FNV-1a: 301 ms
insertion: probe length 1.09567
successful lookup: probe length 1.06202, num comparisons 1.0227
unsuccessful lookup: probe length 1.12195, num comparisons 0.040527
boost::unordered_flat_map, slightly_bad_hash: 654 ms
insertion: probe length 1.03443
successful lookup: probe length 1.04137, num comparisons 6.22152
unsuccessful lookup: probe length 1.29334, num comparisons 11.0335
boost::unordered_flat_map, bad_hash: 12216 ms
insertion: probe length 699.218
successful lookup: probe length 590.183, num comparisons 43.4886
unsuccessful lookup: probe length 1361.65, num comparisons 75.238
----
-12
View File
@@ -1,12 +0,0 @@
= Boost.Unordered
:toc: left
:toclevels: 3
:idprefix:
:docinfo: private-footer
:source-highlighter: rouge
:source-language: c++
:nofooter:
:sectlinks:
:leveloffset: +1
-100
View File
@@ -1,100 +0,0 @@
[#intro]
= Introduction
:idprefix: intro_
:cpp: C++
link:https://en.wikipedia.org/wiki/Hash_table[Hash tables^] are extremely popular
computer data structures and can be found under one form or another in virtually any programming
language. Whereas other associative structures such as rb-trees (used in {cpp} by `std::set` and `std::map`)
have logarithmic-time complexity for insertion and lookup, hash tables, if configured properly,
perform these operations in constant time on average, and are generally much faster.
{cpp} introduced __unordered associative containers__ `std::unordered_set`, `std::unordered_map`,
`std::unordered_multiset` and `std::unordered_multimap` in {cpp}11, but research on hash tables
hasn't stopped since: advances in CPU architectures such as
more powerful caches, link:https://en.wikipedia.org/wiki/Single_instruction,_multiple_data[SIMD] operations
and increasingly available link:https://en.wikipedia.org/wiki/Multi-core_processor[multicore processors]
open up possibilities for improved hash-based data structures and new use cases that
are simply beyond reach of unordered associative containers as specified in 2011.
Boost.Unordered offers a catalog of hash containers with different standards compliance levels,
performances and intended usage scenarios:
[caption=, title='Table {counter:table-counter}. Boost.Unordered containers']
[cols="1,1,.^1", frame=all, grid=all]
|===
^h|
^h|*Node-based*
^h|*Flat*
^.^h|*Closed addressing*
^m|
boost::unordered_set +
boost::unordered_map +
boost::unordered_multiset +
boost::unordered_multimap
^|
^.^h|*Open addressing*
^m| boost::unordered_node_set +
boost::unordered_node_map
^m| boost::unordered_flat_set +
boost::unordered_flat_map
^.^h|*Concurrent*
^| `boost::concurrent_node_set` +
`boost::concurrent_node_map`
^| `boost::concurrent_flat_set` +
`boost::concurrent_flat_map`
|===
* **Closed-addressing containers** are fully compliant with the C++ specification
for unordered associative containers and feature one of the fastest implementations
in the market within the technical constraints imposed by the required standard interface.
* **Open-addressing containers** rely on much faster data structures and algorithms
(more than 2 times faster in typical scenarios) while slightly diverging from the standard
interface to accommodate the implementation.
There are two variants: **flat** (the fastest) and **node-based**, which
provide pointer stability under rehashing at the expense of being slower.
* Finally, **concurrent containers** are designed and implemented to be used in high-performance
multithreaded scenarios. Their interface is radically different from that of regular C++ containers.
Flat and node-based variants are provided.
All sets and maps in Boost.Unordered are instantiated similarly to
`std::unordered_set` and `std::unordered_map`, respectively:
[source,c++]
----
namespace boost {
template <
class Key,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key> >
class unordered_set;
// same for unordered_multiset, unordered_flat_set, unordered_node_set,
// concurrent_flat_set and concurrent_node_set
template <
class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
class unordered_map;
// same for unordered_multimap, unordered_flat_map, unordered_node_map,
// concurrent_flat_map and concurrent_node_map
}
----
Storing an object in an unordered associative container requires both a
key equality function and a hash function. The default function objects in
the standard containers support a few basic types including integer types,
floating point types, pointer types, and the standard strings. Since
Boost.Unordered uses link:../../../container_hash/index.html[boost::hash^] it also supports some other types,
including standard containers. To use any types not supported by these methods
you have to extend Boost.Hash to support the type or use
your own custom equality predicates and hash functions. See the
<<hash_equality,Equality Predicates and Hash Functions>> section
for more details.
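As a minimal sketch of the last option, the following defines a hash function and an equality predicate for a user-defined type. `point`, `point_hash` and `point_equal` are hypothetical names chosen for illustration, and `std::unordered_set` is used only to keep the snippet free of dependencies; as the synopsis above shows, `boost::unordered_set` takes `Hash` and `Pred` in the same positions.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical user-defined key type.
struct point {
  int x;
  int y;
};

// Equality predicate: two points are equal if both coordinates match.
struct point_equal {
  bool operator()(const point& a, const point& b) const {
    return a.x == b.x && a.y == b.y;
  }
};

// Hash function: must return the same value for points that compare equal.
struct point_hash {
  std::size_t operator()(const point& p) const {
    std::size_t seed = std::hash<int>()(p.x);
    // One common way of combining two hash values; other combiners would do.
    seed ^= std::hash<int>()(p.y) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
  }
};

// The function objects are passed as the second and third template arguments.
using point_set = std::unordered_set<point, point_hash, point_equal>;
```

The only hard requirement linking the two function objects is that keys considered equal by the predicate produce the same hash value.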
@@ -1,143 +0,0 @@
[#rationale]
:idprefix: rationale_
= Implementation Rationale
== Closed-addressing Containers
`boost::unordered_[multi]set` and `boost::unordered_[multi]map`
adhere to the standard requirements for unordered associative
containers, so the interface was fixed. But there are
still some implementation decisions to make. The priorities are
conformance to the standard and portability.
The http://en.wikipedia.org/wiki/Hash_table[Wikipedia article on hash tables^]
has a good summary of the implementation issues for hash tables in general.
=== Data Structure
By specifying an interface for accessing the buckets of the container the
standard pretty much requires that the hash table uses closed addressing.
It would be conceivable to write a hash table that uses another method. For
example, it could use open addressing, and use the lookup chain to act as a
bucket but there are some serious problems with this:
* The standard requires that pointers to elements aren't invalidated, so
the elements can't be stored in one array, but will need a layer of
indirection instead - losing the efficiency and most of the memory gain,
the main advantages of open addressing.
* Local iterators would be very inefficient and may not be able to
meet the complexity requirements.
* There are also the restrictions on when iterators can be invalidated. Since
open addressing degrades badly when there are a high number of collisions the
restrictions could prevent a rehash when it's really needed. The maximum load
factor could be set to a fairly low value to work around this - but the
standard requires that it is initially set to 1.0.
* And since the standard is written with an eye towards closed
addressing, users will be surprised if the performance doesn't reflect that.
So closed addressing is used.
=== Number of Buckets
There are two popular methods for choosing the number of buckets in a hash
table. One is to have a prime number of buckets, another is to use a power
of 2.
Using a prime number of buckets, and choosing a bucket by using the modulus
of the hash function's result will usually give a good result. The downside
is that the required modulus operation is fairly expensive. This is what the
containers used to do in most cases.
Using a power of 2 allows for much quicker selection of the bucket to use,
but at the expense of losing the upper bits of the hash value. For some
specially designed hash functions it is possible to do this and still get a
good result but as the containers can take arbitrary hash functions this can't
be relied on.
To avoid this a transformation could be applied to the hash function, for an
example see
http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm[Thomas Wang's article on integer hash functions^].
Unfortunately, a transformation like Wang's requires knowledge of the number
of bits in the hash value, so it was only used when `size_t` was 64 bits wide.
Since release 1.79.0, https://en.wikipedia.org/wiki/Hash_function#Fibonacci_hashing[Fibonacci hashing]
is used instead. With this implementation, the bucket number is determined
by using `(h * m) >> (w - k)`, where `h` is the hash value, `m` is `2^w` divided
by the golden ratio, `w` is the word size (32 or 64), and `2^k` is the
number of buckets. This provides a good compromise between speed and
distribution.
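The formula can be sketched for `w` = 64 as follows. This is an illustration of the arithmetic only, not the code used by the library; `fibonacci_bucket` and `fib_multiplier` are hypothetical names, and the multiplier is the commonly quoted integer part of 2^64 divided by the golden ratio.

```cpp
#include <cstdint>

// Integer part of 2^64 / golden ratio (assumed constant for w = 64).
constexpr std::uint64_t fib_multiplier = 0x9E3779B97F4A7C15ull;

// Maps the hash value h to one of 2^k buckets.
// Precondition: 1 <= k <= 63 (a shift by 64 would be undefined).
constexpr std::uint64_t fibonacci_bucket(std::uint64_t h, unsigned k) {
  // The multiplication wraps modulo 2^64; the top k bits select the bucket.
  return (h * fib_multiplier) >> (64 - k);
}
```

Because the result is taken from the upper bits of the product, every bit of `h` can influence the chosen bucket, which is what a plain power-of-2 mask on the lower bits fails to provide.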
Since release 1.80.0, prime numbers are chosen for the number of buckets in
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
the result of the user's hash function as was used for release 1.79.0.
== Open-addressing Containers
The C++ standard specification of unordered associative containers imposes
severe limitations on permissible implementations, the most important being
that closed addressing is implicitly assumed. Slightly relaxing this specification
opens up the possibility of providing container variations taking full
advantage of open-addressing techniques.
The design of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` has been
guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
We discuss here the most relevant principles.
=== Hash Function
Given its rich functionality and cross-platform interoperability,
`boost::hash` remains the default hash function of open-addressing containers.
As it happens, `boost::hash` for integral and other basic types does not possess
the statistical properties required by open addressing; to cope with this,
we implement a post-mixing stage:
{nbsp}{nbsp}{nbsp}{nbsp} _a_ <- _h_ *mulx* _C_, +
{nbsp}{nbsp}{nbsp}{nbsp} _h_ <- *high*(_a_) *xor* *low*(_a_),
where *mulx* is an _extended multiplication_ (128 bits in 64-bit architectures, 64 bits in 32-bit environments),
and *high* and *low* are the upper and lower halves of an extended word, respectively.
In 64-bit architectures, _C_ is the integer part of 2^64^&#8725;https://en.wikipedia.org/wiki/Golden_ratio[_&phi;_],
whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from https://arxiv.org/abs/2001.05304[Steele and Vigna (2021)^].
When using a hash function directly suitable for open addressing, post-mixing can be opted out of via a dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
`boost::hash` specializations for string types are marked as avalanching.
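The post-mixing stage described above can be sketched for the 32-bit case, where the extended multiplication fits in a 64-bit integer. This is a reading of the formula, not the library's source; `post_mix32` is a hypothetical name. The 64-bit case is analogous but needs a 128-bit product.

```cpp
#include <cstdint>

// 32-bit post-mixing: a <- h mulx C, h <- high(a) xor low(a).
inline std::uint32_t post_mix32(std::uint32_t h) {
  // Extended multiplication: 32 x 32 -> 64 bits, using the constant C
  // quoted in the text for 32-bit environments.
  const std::uint64_t a = static_cast<std::uint64_t>(h) * 0xE817FB2Du;
  const std::uint32_t high = static_cast<std::uint32_t>(a >> 32);
  const std::uint32_t low = static_cast<std::uint32_t>(a);
  return high ^ low;
}
```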
=== Platform Interoperability
The observable behavior of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` is deterministically
identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided
hash function and equality predicate are also interoperable
&#8212;this includes elements being ordered in exactly the same way for the same sequence of
operations.
Although the implementation internally uses SIMD technologies, such as https://en.wikipedia.org/wiki/SSE2[SSE2^]
and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[Neon^], when available,
this does not affect interoperability. For instance, the behavior is the same
for Visual Studio on an x64-mode Intel CPU with SSE2 and for GCC on an IBM s390x without any supported SIMD technology.
== Concurrent Containers
The same data structure used by Boost.Unordered open-addressing containers has been chosen
also as the foundation of `boost::concurrent_flat_set`/`boost::concurrent_node_set` and
`boost::concurrent_flat_map`/`boost::concurrent_node_map`:
* Open-addressing is faster than closed-addressing alternatives, both in non-concurrent and
concurrent scenarios.
* Open-addressing layouts are eminently suitable for concurrent access and modification
with minimal locking. In particular, the metadata array can be used for implementations of
lookup that are lock-free up to the last step of actual element comparison.
* Layout compatibility with Boost.Unordered flat containers allows for
xref:#concurrent_interoperability_with_non_concurrent_containers[fast transfer]
of all elements between a concurrent container and its non-concurrent counterpart,
and vice versa.
=== Hash Function and Platform Interoperability
Concurrent containers make the same decisions and provide the same guarantees
as Boost.Unordered open-addressing containers with regard to
xref:#rationale_hash_function[hash function defaults] and
xref:#rationale_platform_interoperability[platform interoperability].
@@ -1,17 +0,0 @@
[#reference]
= Reference
* xref:reference/unordered_map.adoc[unordered_map]
* xref:reference/unordered_multimap.adoc[unordered_multimap]
* xref:reference/unordered_set.adoc[unordered_set]
* xref:reference/unordered_multiset.adoc[unordered_multiset]
* xref:reference/hash_traits.adoc[hash_traits]
* xref:reference/stats.adoc[stats]
* xref:reference/unordered_flat_map.adoc[unordered_flat_map]
* xref:reference/unordered_flat_set.adoc[unordered_flat_set]
* xref:reference/unordered_node_map.adoc[unordered_node_map]
* xref:reference/unordered_node_set.adoc[unordered_node_set]
* xref:reference/concurrent_flat_map.adoc[concurrent_flat_map]
* xref:reference/concurrent_flat_set.adoc[concurrent_flat_set]
* xref:reference/concurrent_node_map.adoc[concurrent_node_map]
* xref:reference/concurrent_node_set.adoc[concurrent_node_set]
@@ -1,320 +0,0 @@
[#concurrent]
= Concurrent Containers
:idprefix: concurrent_
Boost.Unordered provides `boost::concurrent_node_set`, `boost::concurrent_node_map`,
`boost::concurrent_flat_set` and `boost::concurrent_flat_map`,
hash tables that allow concurrent write/read access from
different threads without having to implement any synchronization mechanism on the user's side.
[source,c++]
----
std::vector<int> input;
boost::concurrent_flat_map<int,int> m;
...
// process input in parallel
const int num_threads = 8;
std::vector<std::jthread> threads;
std::size_t chunk = input.size() / num_threads; // how many elements per thread
for (int i = 0; i < num_threads; ++i) {
threads.emplace_back([&,i] {
// calculate the portion of input this thread takes care of
std::size_t start = i * chunk;
std::size_t end = (i == num_threads - 1)? input.size(): (i + 1) * chunk;
for (std::size_t n = start; n < end; ++n) {
m.emplace(input[n], calculation(input[n]));
}
});
}
----
In the example above, threads access `m` without synchronization, just as we'd do in a
single-threaded scenario. In an ideal setting, if a given workload is distributed among
_N_ threads, execution is _N_ times faster than with one thread —this limit is
never attained in practice due to synchronization overheads and _contention_ (one thread
waiting for another to leave a locked portion of the map), but Boost.Unordered concurrent containers
are designed to perform with very little overhead and typically achieve _linear scaling_
(that is, performance is proportional to the number of threads up to the number of
logical cores in the CPU).
== Visitation-based API
The first thing a new user of Boost.Unordered concurrent containers
will notice is that these classes _do not provide iterators_ (which makes them technically
not https://en.cppreference.com/w/cpp/named_req/Container[Containers^]
in the C++ standard sense). The reason for this is that iterators are inherently
thread-unsafe. Consider this hypothetical code:
[source,c++]
----
auto it = m.find(k); // A: get an iterator pointing to the element with key k
if (it != m.end() ) {
some_function(*it); // B: use the value of the element
}
----
In a multithreaded scenario, the iterator `it` may be invalid at point B if some other
thread issues an `m.erase(k)` operation between A and B. There are designs that
can remedy this by making iterators lock the element they point to, but this
approach lends itself to high contention and can easily produce deadlocks in a program.
`operator[]` has similar concurrency issues, and is not provided by
`boost::concurrent_flat_map`/`boost::concurrent_node_map` either. Instead, element access is done through
so-called _visitation functions_:
[source,c++]
----
m.visit(k, [](const auto& x) { // x is the element with key k (if it exists)
some_function(x); // use it
});
----
The visitation function passed by the user (in this case, a lambda function)
is executed internally by Boost.Unordered in
a thread-safe manner, so it can access the element without worrying about other
threads interfering in the process.
On the other hand, a visitation function can _not_ access the container itself:
[source,c++]
----
m.visit(k, [&](const auto& x) {
some_function(x, m.size()); // forbidden: m can't be accessed inside visitation
});
----
Access to a different container is allowed, though:
[source,c++]
----
m.visit(k, [&](const auto& x) {
if (some_function(x)) {
m2.insert(x); // OK, m2 is a different boost::concurrent_flat_map
}
});
----
But, in general, visitation functions should be as lightweight as possible to
reduce contention and increase parallelization. In some cases, moving heavy work
outside of visitation may be beneficial:
[source,c++]
----
std::optional<value_type> o;
bool found = m.visit(k, [&](const auto& x) {
o = x;
});
if (found) {
some_heavy_duty_function(*o);
}
----
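To make these rules concrete, here is a deliberately simplified model of visitation built only on the standard library. It is not how Boost.Unordered is implemented (the real containers lock at a much finer granularity); `locked_map` is a hypothetical class that guards a whole `std::unordered_map` with a single mutex.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical, coarse-grained stand-in for a concurrent map.
template <class Key, class T>
class locked_map {
public:
  // Inserts (k, v) if k is not present; returns whether insertion happened.
  bool emplace(const Key& k, const T& v) {
    std::lock_guard<std::mutex> lock(mtx_);
    return map_.emplace(k, v).second;
  }

  // Calls f on the element with key k while the lock is held.
  // Returns the number of elements visited (0 or 1).
  template <class F>
  std::size_t visit(const Key& k, F f) {
    std::lock_guard<std::mutex> lock(mtx_);
    auto it = map_.find(k);
    if (it == map_.end()) return 0;
    f(*it);  // no other thread can erase or modify the element here
    return 1;
  }

  // Erases the element with key k; returns the number of elements erased.
  std::size_t erase(const Key& k) {
    std::lock_guard<std::mutex> lock(mtx_);
    return map_.erase(k);
  }

private:
  std::mutex mtx_;
  std::unordered_map<Key, T> map_;
};
```

In this model, calling a member of the same `locked_map` from inside `f` would try to lock `mtx_` a second time, which is one way to see why visitation functions must not access the container they are visiting, and why short visitation functions reduce contention.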
Visitation is prominent in the API provided by concurrent containers, and
many classical operations have visitation-enabled variations:
[source,c++]
----
m.insert_or_visit(x, [](auto& y) {
// if insertion failed because of an equivalent element y,
// do something with it, for instance:
++y.second; // increment the mapped part of the element
});
----
Note that in this last example the visitation function could actually _modify_
the element: as a general rule, operations on a concurrent map `m`
will grant visitation functions const/non-const access to the element depending on whether
`m` is const/non-const. Const access can always be explicitly requested
by using `cvisit` overloads (for instance, `insert_or_cvisit`) and may result
in higher parallelization. For concurrent sets, on the other hand,
visitation is always const access.
Although expected to be used much less frequently, concurrent containers
also provide insertion operations where an element can be visited right after
element creation (in addition to the usual visitation when an equivalent
element already exists):
[source,c++]
----
m.insert_and_cvisit(x,
[](const auto& y) {
std::cout<< "(" << y.first << ", " << y.second <<") inserted\n";
},
[](const auto& y) {
std::cout<< "(" << y.first << ", " << y.second << ") already exists\n";
});
----
Consult the references of
xref:#concurrent_node_set[`boost::concurrent_node_set`],
xref:#concurrent_node_map[`boost::concurrent_node_map`],
xref:#concurrent_flat_set[`boost::concurrent_flat_set`] and
xref:#concurrent_flat_map[`boost::concurrent_flat_map`]
for the complete list of visitation-enabled operations.
== Whole-Table Visitation
In the absence of iterators, `visit_all` is provided
as an alternative way to process all the elements in the container:
[source,c++]
----
m.visit_all([](auto& x) {
x.second = 0; // reset the mapped part of the element
});
----
In C++17 compilers implementing standard parallel algorithms, whole-table
visitation can be parallelized:
[source,c++]
----
m.visit_all(std::execution::par, [](auto& x) { // run in parallel
x.second = 0; // reset the mapped part of the element
});
----
Traversal can be interrupted midway:
[source,c++]
----
// finds the key to a given (unique) value
int key = 0;
int value = ...;
bool found = !m.visit_while([&](const auto& x) {
if(x.second == value) {
key = x.first;
return false; // finish
}
else {
return true; // keep on visiting
}
});
if(found) { ... }
----
There is one last whole-table visitation operation, `erase_if`:
[source,c++]
----
m.erase_if([](auto& x) {
return x.second == 0; // erase the elements whose mapped value is zero
});
----
`visit_while` and `erase_if` can also be parallelized. Note that, in order to increase efficiency,
whole-table visitation operations do not block the table during execution: this implies that elements
may be inserted, modified or erased by other threads during visitation. It is
advisable not to assume too much about the exact global state of a concurrent container
at any point in your program.
== Bulk visitation
Suppose you have an `std::array` of keys you want to look up in a concurrent map:
[source,c++]
----
std::array<int, N> keys;
...
for(const auto& key: keys) {
m.visit(key, [](auto& x) { ++x.second; });
}
----
_Bulk visitation_ allows us to pass all the keys in one operation:
[source,c++]
----
m.visit(keys.begin(), keys.end(), [](auto& x) { ++x.second; });
----
This functionality is not provided for mere syntactic convenience, though: by processing all the
keys at once, some internal optimizations can be applied that increase
performance over the regular, one-at-a-time case (consult the
xref:#benchmarks_boostconcurrent_flat_map[benchmarks]). In fact, it may be beneficial
to buffer incoming keys so that they can be bulk visited in chunks:
[source,c++]
----
static constexpr auto bulk_visit_size = boost::concurrent_flat_map<int,int>::bulk_visit_size;
std::array<int, bulk_visit_size> buffer;
std::size_t i=0;
while(...) { // processing loop
...
buffer[i++] = k;
if(i == bulk_visit_size) {
m.visit(buffer.begin(), buffer.end(), [](auto& x) { ++x.second; });
i = 0;
}
...
}
// flush remaining keys
m.visit(buffer.begin(), buffer.begin() + i, [](auto& x) { ++x.second; });
----
There's a latency/throughput tradeoff here: it will take longer for incoming keys to
be processed (since they are buffered), but the number of processed keys per second
is higher. `bulk_visit_size` is the recommended chunk size —smaller buffers
may yield worse performance.
== Blocking Operations
Concurrent containers can be copied, assigned, cleared and merged just like any other
Boost.Unordered container. Unlike most other operations, these are _blocking_,
that is, all other threads are prevented from accessing the tables involved while a copy, assignment,
clear or merge operation is in progress. Blocking is taken care of automatically by the library
and the user need not take any special precaution, but overall performance may be affected.
Another blocking operation is _rehashing_, which happens explicitly via `rehash`/`reserve`
or during insertion when the table's load hits `max_load()`. As with non-concurrent containers,
reserving space in advance of bulk insertions will generally speed up the process.
== Interoperability with non-concurrent containers
As open-addressing and concurrent containers are based on the same internal data structure,
they can be efficiently move-constructed from their non-concurrent counterpart, and vice versa.
[caption=, title='Table {counter:table-counter}. Concurrent/non-concurrent interoperability']
[cols="1,1", frame=all, grid=all]
|===
^|`boost::concurrent_node_set`
^|`boost::unordered_node_set`
^|`boost::concurrent_node_map`
^|`boost::unordered_node_map`
^|`boost::concurrent_flat_set`
^|`boost::unordered_flat_set`
^|`boost::concurrent_flat_map`
^|`boost::unordered_flat_map`
|===
This interoperability comes in handy in multistage scenarios where parts of the data processing happen
in parallel whereas other steps are non-concurrent (or non-modifying). In the following example,
we want to construct a histogram from a huge input vector of words:
the population phase can be done in parallel with `boost::concurrent_flat_map` and results
then transferred to the final container.
[source,c++]
----
std::vector<std::string> words = ...;
// Insert words in parallel
boost::concurrent_flat_map<std::string_view, std::size_t> m0;
std::for_each(
std::execution::par, words.begin(), words.end(),
[&](const auto& word) {
m0.try_emplace_or_visit(word, 1, [](auto& x) { ++x.second; });
});
// Transfer to a regular unordered_flat_map
boost::unordered_flat_map m=std::move(m0);
----
File diff suppressed because it is too large Load Diff
@@ -1,51 +0,0 @@
[#hash_traits]
== Hash traits
:idprefix: hash_traits_
=== Synopsis
[listing,subs="+macros,+quotes"]
-----
// #include <boost/unordered/hash_traits.hpp>
namespace boost {
namespace unordered {
template<typename Hash>
struct xref:#hash_traits_hash_is_avalanching[hash_is_avalanching];
} // namespace unordered
} // namespace boost
-----
---
=== hash_is_avalanching
```c++
template<typename Hash>
struct hash_is_avalanching;
```
A hash function is said to have the _avalanching property_ if small changes in the input translate to
large changes in the returned hash code &#8212;ideally, flipping one bit in the representation of
the input value results in each bit of the hash code flipping with probability 50%. Approaching
this property is critical for the proper behavior of open-addressing hash containers.
`hash_is_avalanching<Hash>::value` is:
* `false` if `Hash::is_avalanching` is not present,
* `Hash::is_avalanching::value` if this is present and convertible at compile time to a `bool`,
* `true` if `Hash::is_avalanching` is `void` (this usage is deprecated),
* ill-formed otherwise.
Users can then declare a hash function `Hash` as avalanching either by embedding an appropriate `is_avalanching` typedef
into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to a class with
an embedded compile-time constant `value` set to `true`.
Open-addressing and concurrent containers
use the provided hash function `Hash` as-is if `hash_is_avalanching<Hash>::value` is `true`; otherwise, they
implement a bit-mixing post-processing stage to increase the quality of hashing at the expense of
extra computational cost.
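As a sketch of the first option, the hash function below embeds an `is_avalanching` typedef. `mixed_int_hash` is a hypothetical name; its body is the 64-bit finalizer from MurmurHash3, which is generally regarded as having good avalanching behavior, but whether a given function is good enough for your keys is an assumption you should validate yourself.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical hash function for 64-bit integer keys.
struct mixed_int_hash {
  // Declares the hash as avalanching, so that open-addressing and
  // concurrent containers use its result as-is, without post-mixing.
  using is_avalanching = std::true_type;

  std::size_t operator()(std::uint64_t x) const noexcept {
    // MurmurHash3 64-bit finalizer.
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdull;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ull;
    x ^= x >> 33;
    return static_cast<std::size_t>(x);
  }
};
```

Such a type would then be passed as the `Hash` template argument of the container, for instance as the third argument of `boost::unordered_flat_map`.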
---
@@ -1,71 +0,0 @@
[#stats]
== Statistics
:idprefix: stats_
Open-addressing and concurrent containers can be configured to keep running statistics
of some internal operations affected by the quality of the supplied hash function.
=== Synopsis
[listing,subs="+macros,+quotes"]
-----
struct xref:#stats_stats_summary_type[__stats-summary-type__]
{
double average;
double variance;
double deviation;
};
struct xref:#stats_insertion_stats_type[__insertion-stats-type__]
{
std::size_t count;
xref:#stats_stats_summary_type[__stats-summary-type__] probe_length;
};
struct xref:stats_lookup_stats_type[__lookup-stats-type__]
{
std::size_t count;
xref:#stats_stats_summary_type[__stats-summary-type__] probe_length;
xref:#stats_stats_summary_type[__stats-summary-type__] num_comparisons;
};
struct xref:stats_stats_type[__stats-type__]
{
xref:#stats_insertion_stats_type[__insertion-stats-type__] insertion;
xref:stats_lookup_stats_type[__lookup-stats-type__] successful_lookup,
unsuccessful_lookup;
};
-----
==== __stats-summary-type__
Provides the average value, variance and standard deviation of a sequence of numerical values.
==== __insertion-stats-type__
Provides the number of insertion operations performed by a container and
statistics on the associated __probe length__ (number of
xref:#structures_open_addressing_containers[bucket groups] accessed per operation).
==== __lookup-stats-type__
For successful (element found) or unsuccessful (not found) lookup,
provides the number of operations performed by a container and
statistics on the associated __probe length__ (number of
xref:#structures_open_addressing_containers[bucket groups] accessed)
and number of element comparisons per operation.
==== __stats-type__
Provides statistics on insertion, successful and unsuccessful lookups performed by a container.
If the supplied hash function has good quality, then:
* Average probe lengths should be close to 1.0.
* For successful lookups, the average number of element comparisons should be close to 1.0.
* For unsuccessful lookups, the average number of element comparisons should be close to 0.0.
These statistics can be used to determine if a given hash function
can be marked as xref:hash_traits_hash_is_avalanching[__avalanching__].
---
File diff suppressed because it is too large Load Diff
@@ -1,203 +0,0 @@
[#regular]
= Regular Containers
:idprefix: regular_
Boost.Unordered closed-addressing containers (`boost::unordered_set`, `boost::unordered_map`,
`boost::unordered_multiset` and `boost::unordered_multimap`) are fully conformant with the
C++ specification for unordered associative containers, so for those who know how to use
`std::unordered_set`, `std::unordered_map`, etc., their homonyms in Boost.Unordered are
drop-in replacements. The interface of open-addressing containers (`boost::unordered_node_set`,
`boost::unordered_node_map`, `boost::unordered_flat_set` and `boost::unordered_flat_map`)
is very similar, but they present some minor differences listed in the dedicated
xref:#compliance_open_addressing_containers[standard compliance section].
For readers without previous experience with hash containers but familiar
with normal associative containers (`std::set`, `std::map`,
`std::multiset` and `std::multimap`), Boost.Unordered containers are used in a similar manner:
[source,cpp]
----
typedef boost::unordered_map<std::string, int> map;
map x;
x["one"] = 1;
x["two"] = 2;
x["three"] = 3;
assert(x.at("one") == 1);
assert(x.find("missing") == x.end());
----
But since the elements aren't ordered, the output of:
[source,c++]
----
for(const map::value_type& i: x) {
std::cout<<i.first<<","<<i.second<<"\n";
}
----
can be in any order. For example, it might be:
[source]
----
two,2
one,1
three,3
----
There are other differences, which are listed in the
<<comparison,Comparison with Associative Containers>> section.
== Iterator Invalidation
It is not specified how member functions other than `rehash` and `reserve` affect
the bucket count, although `insert` can only invalidate iterators
when the insertion causes the container's load to be greater than the maximum allowed.
For most implementations this means that `insert` will only
change the number of buckets when this happens. Iterators can be
invalidated by calls to `insert`, `rehash` and `reserve`.
As for pointers and references,
they are never invalidated for node-based containers
(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`),
but they will be when rehashing occurs for
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
these containers store elements directly into their holding buckets, so
when allocating a new bucket array the elements must be transferred by means of move construction.
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
to call `reserve` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:
```
x.reserve(n);
```
Note:: `reserve(n)` reserves space for at least `n` elements, allocating enough buckets
so as to not exceed the maximum load factor.
+
Because the maximum load factor is defined as the number of elements divided by the total
number of available buckets, this function is logically equivalent to:
+
```
x.rehash(std::ceil(n / x.max_load_factor()))
```
+
See the <<unordered_map_rehash,reference for more details>> on the `rehash` function.
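The effect of `reserve` can be checked with a small experiment. The sketch below uses `std::unordered_map` as a stand-in, an assumption made to keep the snippet dependency-free (the closed-addressing Boost containers are described above as drop-in replacements); `buckets_stable_after_reserve` is a hypothetical helper name.

```cpp
#include <cmath>
#include <cstddef>
#include <unordered_map>

// Returns true if, after reserve(n), inserting n distinct keys leaves the
// bucket count unchanged (no rehash) and the bucket count is at least
// ceil(n / max_load_factor()).
inline bool buckets_stable_after_reserve(std::size_t n) {
  std::unordered_map<std::size_t, int> x;
  x.reserve(n);
  const std::size_t buckets = x.bucket_count();
  for (std::size_t i = 0; i < n; ++i) {
    x.emplace(i, 0);
  }
  return x.bucket_count() == buckets &&
         buckets >= std::ceil(n / x.max_load_factor());
}
```

The standard only promises that iterators stay valid in this situation; an unchanged bucket count is what mainstream implementations do in practice.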
[#comparison]
:idprefix: comparison_
== Comparison with Associative Containers
[caption=, title='Table {counter:table-counter}. Interface differences']
[cols="1,1", frame=all, grid=rows]
|===
|Associative Containers |Unordered Associative Containers
|Parameterized by an ordering relation `Compare`
|Parameterized by a function object `Hash` and an equivalence relation `Pred`
|Keys can be compared using `key_compare` which is accessed by member function `key_comp()`, values can be compared using `value_compare` which is accessed by member function `value_comp()`.
|Keys can be hashed using `hasher` which is accessed by member function `hash_function()`, and checked for equality using `key_equal` which is accessed by member function `key_eq()`. There is no function object for comparing or hashing values.
|Constructors have optional extra parameters for the comparison object.
|Constructors have optional extra parameters for the initial minimum number of buckets, a hash function and an equality object.
|Keys `k1`, `k2` are considered equivalent if `!Compare(k1, k2) && !Compare(k2, k1)`.
|Keys `k1`, `k2` are considered equivalent if `Pred(k1, k2)`
|Member functions `lower_bound(k)` and `upper_bound(k)`
|No equivalent. Since the elements aren't ordered `lower_bound` and `upper_bound` would be meaningless.
|`equal_range(k)` returns an empty range at the position that `k` would be inserted if `k` isn't present in the container.
|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple places. +
**Closed-addressing containers:** To find out the bucket that `k` would be inserted into use `bucket(k)`. But remember that an insert can cause the container to rehash - meaning that the element can be inserted into a different bucket.
|`iterator`, `const_iterator` are of the bidirectional category.
|`iterator`, `const_iterator` are of at least the forward category.
|Iterators, pointers and references to the container's elements are never invalidated.
|<<regular_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
**Node-based containers:** Pointers and references to the container's elements are never invalidated. +
**Flat containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.
|Iterators iterate through the container in the order defined by the comparison object.
|Iterators iterate through the container in an arbitrary order, that can change as elements are inserted, although equivalent elements are always adjacent.
|No equivalent
|**Closed-addressing containers:** Local iterators can be used to iterate through individual buckets. (The order of local iterators and iterators aren't required to have any correspondence.)
|Can be compared using the `==`, `!=`, `<`, `\<=`, `>`, `>=` operators.
|Can be compared using the `==` and `!=` operators.
|
|When inserting with a hint, implementations are permitted to ignore the hint.
|===
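The `equal_range` row can be checked with a minimal sketch. It uses `std::set` and `std::unordered_set` as stand-ins (an assumption, so that the example needs nothing beyond the standard library), and the helper function names are hypothetical.

```c++
#include <cassert>
#include <set>
#include <unordered_set>

// Ordered container: for a missing key, equal_range yields an empty range
// positioned where the key would be inserted (here, just before 40).
bool ordered_missing_key_is_positioned() {
    std::set<int> s{10, 20, 40};
    auto r = s.equal_range(30);
    return r.first == r.second && r.first != s.end() && *r.first == 40;
}

// Unordered container: for a missing key, both ends of the range are end().
bool unordered_missing_key_is_at_end() {
    std::unordered_set<int> u{10, 20, 40};
    auto r = u.equal_range(30);
    return r.first == u.end() && r.second == u.end();
}
```

Both helpers are expected to return `true`; the ordered result carries position information that the unordered one cannot provide.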
---
[caption=, title='Table {counter:table-counter} Complexity Guarantees']
[cols="1,1,1", frame=all, grid=rows]
|===
|Operation |Associative Containers |Unordered Associative Containers
|Construction of empty container
|constant
|O(_n_) where _n_ is the minimum number of buckets.
|Construction of container from a range of _N_ elements
|O(_N log N_), O(_N_) if the range is sorted with `value_comp()`
|Average case O(_N_), worst case O(_N^2^_)
|Insert a single element
|logarithmic
|Average case constant, worst case linear
|Insert a single element with a hint
|Amortized constant if the element is inserted right before the hint (right after the hint prior to C++11), logarithmic otherwise
|Average case constant, worst case linear (i.e. the same as a normal insert).
|Inserting a range of _N_ elements
|O(_N_ log(`size()` + _N_))
|Average case O(_N_), worst case O(_N_ * `size()`)
|Erase by key, `k`
|O(log(`size()`) + `count(k)`)
|Average case: O(`count(k)`), Worst case: O(`size()`)
|Erase a single element by iterator
|Amortized constant
|Average case: O(1), Worst case: O(`size()`)
|Erase a range of _N_ elements
|O(log(`size()`) + _N_)
|Average case: O(_N_), Worst case: O(`size()`)
|Clearing the container
|O(`size()`)
|O(`size()`)
|Find
|logarithmic
|Average case: O(1), Worst case: O(`size()`)
|Count
|O(log(`size()`) + `count(k)`)
|Average case: O(1), Worst case: O(`size()`)
|`equal_range(k)`
|logarithmic
|Average case: O(`count(k)`), Worst case: O(`size()`)
|`lower_bound`,`upper_bound`
|logarithmic
|n/a
|===
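As a small illustration of the "Erase by key" row, the sketch below removes a whole group of equivalent elements in one call and checks that the number erased equals `count(k)`. It uses the standard multiset containers as stand-ins, and `erase_by_key` is a hypothetical helper, not part of any library.

```c++
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_set>

// Erases every element equivalent to k and reports how many were removed.
// Works for both ordered and unordered associative containers.
template <class Container>
std::size_t erase_by_key(Container& c, typename Container::key_type const& k) {
    std::size_t const expected = c.count(k); // size of the equivalent-key group
    std::size_t const erased = c.erase(k);   // removes the whole group
    assert(erased == expected);
    return erased;
}
```

The work done by `erase(k)` grows with the size of the group, which is why `count(k)` appears in both complexity columns.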
-180
View File
@@ -1,180 +0,0 @@
[#structures]
= Data Structures
:idprefix: structures_
== Closed-addressing Containers
++++
<style>
.imageblock > .title {
text-align: inherit;
}
</style>
++++
Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below:
[#img-bucket-groups,.text-center]
.A simple bucket group approach
image::bucket-groups.png[align=center]
An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straightforward. Unfortunately, iterating over the entire container is often slow with this layout, as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`.
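The iteration cost can be made concrete with a toy model. This is only a sketch under stated assumptions: `simple_table`, `walk_stats` and `walk` are hypothetical names, and `std::forward_list` stands in for the per-bucket linked lists.

```c++
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Toy model: an array of buckets, each owning its own singly-linked list.
struct simple_table {
    std::vector<std::forward_list<unsigned>> buckets;
    explicit simple_table(std::size_t n) : buckets(n) {}
    void insert(unsigned v) { buckets[v % buckets.size()].push_front(v); }
};

struct walk_stats {
    std::size_t buckets_examined;
    std::size_t elements_visited;
};

// Full traversal: every bucket has to be examined, occupied or not.
walk_stats walk(simple_table const& t) {
    walk_stats s{0, 0};
    for (auto const& b : t.buckets) {
        ++s.buckets_examined;
        for (unsigned v : b) {
            (void)v;
            ++s.elements_visited;
        }
    }
    return s;
}
```

With 1024 buckets and 3 elements, `walk` examines 1024 buckets to visit 3 elements, which is the `O(bucket_count() + size())` behaviour described above.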
Canonical standard implementations will wind up looking like the diagram below:
[.text-center]
.The canonical standard approach
image::singly-linked.png[align=center,link=_images/singly-linked.png,window=_blank]
It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here].
This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done:
```c++
auto const idx = get_bucket_idx(hash_function(key));
node* p = buckets[idx]; // first load
node* n = p->next; // second load
if (n && is_in_bucket(n, idx)) {
value_type const& v = *n; // third load
// ...
}
```
With a simple bucket group layout, this is all that must be done:
```c++
auto const idx = get_bucket_idx(hash_function(key));
node* n = buckets[idx]; // first load
if (n) {
value_type const& v = *n; // second load
// ...
}
```
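The two fragments above rely on helpers that are not shown. To make the "bucket points to the node _before_ its first element" scheme concrete, here is a self-contained toy version of the canonical layout. It is a sketch, not the code of any standard library: `chained_set`, its members and the identity hash are all assumptions made for illustration.

```c++
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// All nodes live in one global singly-linked list.
struct node {
    node* next;
    std::size_t hash;
    int value;
};

class chained_set {
public:
    explicit chained_set(std::size_t bucket_count) : buckets_(bucket_count, nullptr) {}
    chained_set(chained_set const&) = delete; // buckets_ may point at head_
    chained_set& operator=(chained_set const&) = delete;

    void insert(int value) {
        if (find(value)) return;
        std::size_t const h = hash(value);
        std::size_t const idx = h % buckets_.size();
        storage_.push_back(node{nullptr, h, value}); // deque keeps addresses stable
        node* n = &storage_.back();
        if (node* before = buckets_[idx]) {
            // Non-empty bucket: splice right after the "before" node.
            n->next = before->next;
            before->next = n;
        } else {
            // Empty bucket: put the node at the front of the global list and
            // redirect the bucket that used to start there.
            n->next = head_.next;
            if (n->next) buckets_[n->next->hash % buckets_.size()] = n;
            head_.next = n;
            buckets_[idx] = &head_;
        }
    }

    node const* find(int value) const {
        std::size_t const idx = hash(value) % buckets_.size();
        node const* p = buckets_[idx];                 // first load
        if (!p) return nullptr;
        for (node const* n = p->next;                  // second load
             n && n->hash % buckets_.size() == idx; n = n->next)
            if (n->value == value) return n;           // third load
        return nullptr;
    }

    // Whole-container iteration never touches the bucket array.
    std::size_t count_by_iteration() const {
        std::size_t c = 0;
        for (node const* n = head_.next; n; n = n->next) ++c;
        return c;
    }

private:
    static std::size_t hash(int v) { return static_cast<std::size_t>(v); }
    node head_{nullptr, 0, 0}; // sentinel placed before the first element
    std::vector<node*> buckets_;
    std::deque<node> storage_;
};
```

Iteration follows `head_.next` only, so it is `O(size())`, while `find` pays for the extra hop through the "before" node.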
In practice, the extra indirection can have a dramatic performance impact on common operations such as `insert`, `find` and `erase`. But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below:
[#img-fca-layout]
.The new layout used by Boost
image::fca.png[align=center]
Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity) which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets, meaning that for all common implementations there are only 4 bits of space overhead per bucket introduced by the bucket groups.
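The space arithmetic can be reproduced with a minimal sketch. The struct below is an assumption modelled on the description (a bitmask and two list pointers); the text does not name the fourth word, so a pointer to the first viewed bucket is assumed here, and every identifier is hypothetical.

```c++
#include <cassert>
#include <climits>
#include <cstddef>

struct bucket; // opaque for this sketch

struct bucket_group {
    bucket* buckets;      // assumed: first bucket viewed by this group
    std::size_t bitmask;  // bit i set <=> bucket i of the group is occupied
    bucket_group* next;   // doubly-linked list of non-empty groups
    bucket_group* prev;
};

// Number of buckets viewed by one group.
constexpr std::size_t group_width = sizeof(std::size_t) * CHAR_BIT;

// Space overhead, in bits, that the groups add per bucket.
constexpr std::size_t overhead_bits_per_bucket() {
    return sizeof(bucket_group) * CHAR_BIT / group_width;
}

// Index of the first occupied bucket at or after pos, or group_width if none.
// A portable loop is used; real code would likely use a bit-scan instruction.
std::size_t next_occupied(std::size_t mask, std::size_t pos) {
    for (std::size_t i = pos; i < group_width; ++i)
        if (mask & (std::size_t(1) << i)) return i;
    return group_width;
}
```

On platforms where pointers and `std::size_t` have the same size, the group is 4 words wide and views one word's worth of buckets, which gives 4 bits per bucket.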
A more detailed description of Boost.Unordered's closed-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/06/advancing-state-of-art-for.html[external article].
For more information on implementation rationale, read the
xref:rationale.adoc#rationale_open_addresing_containers[corresponding section].
== Open-addressing Containers
The diagram shows the basic internal layout of `boost::unordered_flat_set`/`unordered_node_set` and
`boost::unordered_flat_map`/`unordered_node_map`.
[#img-foa-layout]
.Open-addressing layout used by Boost.Unordered.
image::foa.png[align=center]
As with all open-addressing containers, elements (or pointers to the element nodes in the case of
`boost::unordered_node_set` and `boost::unordered_node_map`) are stored directly in the bucket array.
This array is logically divided into 2^_n_^ _groups_ of 15 elements each.
In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^
16-byte words.
[#img-foa-metadata]
.Breakdown of a metadata word.
image::foa-metadata.png[align=center]
A metadata word is divided into 15 _h_~_i_~ bytes (one for each associated
bucket), and an _overflow byte_ (_ofw_ in the diagram). The value of _h_~_i_~ is:
- 0 if the corresponding bucket is empty.
- 1 to encode a special empty bucket called a _sentinel_, which is used internally to
stop iteration when the container has been fully traversed.
- If the bucket is occupied, a _reduced hash value_ obtained from the hash value of
the element.
When looking for an element with hash value _h_, SIMD technologies such as
https://en.wikipedia.org/wiki/SSE2[SSE2] and
https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(Neon)[Neon] allow us
to very quickly inspect the full metadata word and look for the reduced value of _h_ among all the
15 buckets with just a handful of CPU instructions: non-matching buckets can be
readily discarded, and those whose reduced hash value matches need be inspected via full
comparison with the corresponding element. If the looked-for element is not present,
the overflow byte is inspected:
- If the bit in the position _h_ mod 8 is zero, lookup terminates (and the
element is not present).
- If the bit is set to 1 (the group has been _overflowed_), further groups are
checked using https://en.wikipedia.org/wiki/Quadratic_probing[_quadratic probing_], and
the process is repeated.
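The metadata checks described above can be sketched in scalar code. This is an illustration only: the byte-by-byte loop stands in for the SIMD comparison, `reduced_hash` is a hypothetical reduction (not the mapping the library actually uses), and all names are assumptions.

```c++
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct metadata_word {
    std::array<std::uint8_t, 15> h{}; // 0: empty, 1: sentinel, else reduced hash
    std::uint8_t ofw = 0;             // overflow byte
};
static_assert(sizeof(metadata_word) == 16, "one 16-byte word per group");

// Hypothetical reduction that avoids the reserved values 0 and 1.
std::uint8_t reduced_hash(std::size_t hash) {
    return static_cast<std::uint8_t>(2 + hash % 254);
}

// Bit i of the result is set if slot i holds the reduced hash value.
// Candidates still need a full comparison with the stored element.
unsigned match(metadata_word const& m, std::size_t hash) {
    std::uint8_t const r = reduced_hash(hash);
    unsigned mask = 0;
    for (std::size_t i = 0; i < m.h.size(); ++i)
        if (m.h[i] == r) mask |= 1u << i;
    return mask;
}

// Lookup may stop at this group if the bit at position (hash mod 8) is zero.
bool is_overflowed(metadata_word const& m, std::size_t hash) {
    return ((m.ofw >> (hash % 8)) & 1u) != 0;
}

// Set when an insertion goes past this (full) group.
void mark_overflow(metadata_word& m, std::size_t hash) {
    m.ofw |= static_cast<std::uint8_t>(1u << (hash % 8));
}
```

An empty `match` result combined with a clear overflow bit is enough to conclude that the element is absent without touching any element storage.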
Insertion is algorithmically similar: empty buckets are located using SIMD,
and when going past a full group its corresponding overflow bit is set to 1.
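The order in which groups are visited can be sketched as follows. The text only states that quadratic probing is used; the triangular-number variant below is an assumption chosen because it visits every group exactly once when the number of groups is a power of two. `probe_sequence` is a hypothetical helper.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Lists the groups visited when probing starts at `start`.
// n_groups must be a power of two (2^n, as in the layout described above).
std::vector<std::size_t> probe_sequence(std::size_t start, std::size_t n_groups) {
    std::vector<std::size_t> seq;
    std::size_t const mask = n_groups - 1;
    std::size_t pos = start & mask;
    for (std::size_t step = 1; step <= n_groups; ++step) {
        seq.push_back(pos);
        pos = (pos + step) & mask; // offsets 1, 3, 6, 10, ... from the start
    }
    return seq;
}
```

For 8 groups and a start position of 3, the sequence is 3, 4, 6, 1, 5, 2, 0, 7, so every group is eventually reached.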
In architectures without SIMD support, the logical layout stays the same, but the metadata
word is codified using a technique we call _bit interleaving_: this layout allows us
to emulate SIMD with reasonably good performance using only standard arithmetic and
logical operations.
[#img-foa-metadata-interleaving]
.Bit-interleaved metadata word.
image::foa-metadata-interleaving.png[align=center]
A more detailed description of Boost.Unordered's open-addressing implementation is
given in an
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
For more information on implementation rationale, read the
xref:#rationale_open_addresing_containers[corresponding section].
== Concurrent Containers
`boost::concurrent_flat_set`/`boost::concurrent_node_set` and
`boost::concurrent_flat_map`/`boost::concurrent_node_map` use the basic
xref:#structures_open_addressing_containers[open-addressing layout] described above
augmented with synchronization mechanisms.
[#img-cfoa-layout]
.Concurrent open-addressing layout used by Boost.Unordered.
image::cfoa.png[align=center]
Two levels of synchronization are used:
* Container level: A read-write mutex is used to control access from any operation
to the container. Typically, such access is in read mode (that is, concurrent) even
for modifying operations, so for most practical purposes there is no thread
contention at this level. Access is only in write mode (blocking) when rehashing or
performing container-wide operations such as swapping or assignment.
* Group level: Each 15-slot group is equipped with an 8-byte word containing:
** A read-write spinlock for synchronized access to any element in the group.
** An atomic _insertion counter_ used for optimistic insertion as described
below.
By using atomic operations to access the group metadata, lookup is (group-level)
lock-free up to the point where an actual comparison needs to be done with an element
that has been previously SIMD-matched: only then is the group's spinlock used.
Insertion uses the following _optimistic algorithm_:
* The value of the insertion counter for the initial group in the probe
sequence is locally recorded (let's call this value `c0`).
* Lookup is as described above. If lookup finds no equivalent element,
search for an available slot for insertion successively locks/unlocks
each group in the probing sequence.
* When an available slot is located, it is preemptively occupied (its
reduced hash value is set) and the insertion counter is atomically
incremented: if no other thread has incremented the counter during the
whole operation (which is checked by comparing with `c0`), then we're
good to go and complete the insertion, otherwise we roll back and start
over.
This algorithm has very low contention both at the lookup and actual
insertion phases in exchange for the possibility that computations have
to be started over if some other thread interferes in the process by
performing a successful insertion beginning at the same group. In
practice, the start-over frequency is extremely small, measured in the range
of parts per million for some of our benchmarks.
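The commit-or-retry step of the optimistic algorithm can be sketched with a single atomic counter. This is a simplified model under stated assumptions: the group spinlock and the slot handling are left out, the counter is modelled as a standalone 32-bit atomic, and `group_sync`, `try_commit` and `optimistic_insert` are hypothetical names.

```c++
#include <atomic>
#include <cassert>
#include <cstdint>

struct group_sync {
    std::atomic<std::uint32_t> insert_counter{0};
};

// Atomically increments the counter; the commit succeeds only if nobody
// else incremented it since c0 was recorded.
bool try_commit(group_sync& g, std::uint32_t c0) {
    return g.insert_counter.fetch_add(1, std::memory_order_acq_rel) == c0;
}

// reserve(): lookup plus preemptive occupation of a free slot.
// rollback(): undo the reservation before starting over.
// Returns the number of attempts that were needed.
template <class Reserve, class Rollback>
unsigned optimistic_insert(group_sync& g, Reserve reserve, Rollback rollback) {
    unsigned attempts = 0;
    for (;;) {
        ++attempts;
        std::uint32_t const c0 = g.insert_counter.load(std::memory_order_acquire);
        reserve();
        if (try_commit(g, c0)) return attempts; // no interference: complete
        rollback();                             // interference: start over
    }
}
```

If another thread's insertion bumps the counter between the initial load and the commit, the attempt is rolled back and repeated; otherwise the insertion completes on the first try.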
For more information on implementation rationale, read the
xref:#rationale_concurrent_containers[corresponding section].