From 1dd085daaa6afa5232661b75b35a8b159bfa5f60 Mon Sep 17 00:00:00 2001 From: joaquintides Date: Mon, 13 Feb 2023 13:29:52 +0100 Subject: [PATCH] added unordered_node_[map|set] containers to the tutorial --- doc/unordered/buckets.adoc | 15 ++++---- doc/unordered/comparison.adoc | 4 +-- doc/unordered/compliance.adoc | 30 ++++++++-------- doc/unordered/intro.adoc | 65 +++++++++++++++++++++++++++++++++-- doc/unordered/rationale.adoc | 18 +++++----- 5 files changed, 98 insertions(+), 34 deletions(-) diff --git a/doc/unordered/buckets.adoc b/doc/unordered/buckets.adoc index de6239ca..494958ec 100644 --- a/doc/unordered/buckets.adoc +++ b/doc/unordered/buckets.adoc @@ -134,7 +134,8 @@ h|*Method* h|*Description* |Changes the number of buckets so that there at least `n` buckets, and so that the load factor is less than the maximum load factor. 2+^h| *Open-addressing containers only* + -`boost::unordered_flat_set`, `boost::unordered_flat_map` +`boost::unordered_flat_set`, `boost::unordered_flat_map` + +`boost::unordered_node_set`, `boost::unordered_node_map` + h|*Method* h|*Description* |`size_type max_load() const` @@ -160,8 +161,9 @@ change the number of buckets when this happens. Iterators can be invalidated by calls to `insert`, `rehash` and `reserve`. As for pointers and references, -they are never invalidated for closed-addressing containers (`boost::unordered_[multi]set`, `boost::unordered_[multi]map`), -but they will when rehashing occurs for open-addressing +they are never invalidated for node-based containers +(`boost::unordered_[multi]set`, `boost::unordered_[multi]map`, `boost::unordered_node_set`, `boost::unordered_node_map`), +but they will when rehashing occurs for `boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because these containers store elements directly into their holding buckets, so when allocating a new bucket array the elements must be transferred by means of move construction. 
@@ -252,15 +254,16 @@ xref:#rationale_boostunordered_multiset_and_boostunordered_multimap[correspondin == Open Addressing Implementation -The diagram shows the basic internal layout of `boost::unordered_flat_map` and -`boost:unordered_flat_set`. +The diagram shows the basic internal layout of `boost::unordered_flat_map`/`unordered_node_map` and +`boost:unordered_flat_set`/`unordered_node_set`. [#img-foa-layout] .Open-addressing layout used by Boost.Unordered. image::foa.png[align=center] -As with all open-addressing containers, elements are stored directly in the bucket array. +As with all open-addressing containers, elements (or element nodes in the case of +`boost::unordered_node_map` and `boost::unordered_node_set`) are stored directly in the bucket array. This array is logically divided into 2^_n_^ _groups_ of 15 elements each. In addition to the bucket array, there is an associated _metadata array_ with 2^_n_^ 16-byte words. diff --git a/doc/unordered/comparison.adoc b/doc/unordered/comparison.adoc index 8d8564e5..1d5dd97b 100644 --- a/doc/unordered/comparison.adoc +++ b/doc/unordered/comparison.adoc @@ -33,8 +33,8 @@ |Iterators, pointers and references to the container's elements are never invalidated. |<>. + -**Closed-addressing containers:** Pointers and references to the container's elements are never invalidated. + -**Open-addressing containers:** Pointers and references to the container's elements are invalidated when rehashing occurs. +**Node-based containers:** Pointers and references to the container's elements are never invalidated. + +**Flat containers:** Pointers and references to the container's elements are invalidated when rehashing occurs. |Iterators iterate through the container in the order defined by the comparison object. |Iterators iterate through the container in an arbitrary order, that can change as elements are inserted, although equivalent elements are always adjacent. 
diff --git a/doc/unordered/compliance.adoc b/doc/unordered/compliance.adoc index 16e869db..2fe27626 100644 --- a/doc/unordered/compliance.adoc +++ b/doc/unordered/compliance.adoc @@ -117,27 +117,29 @@ Variadic constructor arguments for `emplace` are only used when both rvalue references and variadic template parameters are available. Otherwise `emplace` can only take up to 10 constructors arguments. -== Open-addressing containers: unordered_flat_set, unordered_flat_map +== Open-addressing containers: unordered_flat_set/unordered_node_set, unordered_flat_map/unordered_node_map The C++ standard does not currently provide any open-addressing container -specification to adhere to, so `boost::unordered_flat_set` and -`boost::unordered_flat_map` take inspiration from `std::unordered_set` and +specification to adhere to, so `boost::unordered_flat_set`/`unordered_node_set` and +`boost::unordered_flat_map`/`unordered_node_map` take inspiration from `std::unordered_set` and `std::unordered_map`, respectively, and depart from their interface where convenient or as dictated by their internal data structure, which is -radically different from that imposed by the standard (closed addressing, node based). +radically different from that imposed by the standard (closed addressing). -`unordered_flat_set` and `unordered_flat_map` only work with reasonably +Open-addressing containers provided by Boost.Unordered only work with reasonably compliant C++11 (or later) compilers. Language-level features such as move semantics and variadic template parameters are then not emulated. -`unordered_flat_set` and `unordered_flat_map` are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^]. +The containers are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^]. The main differences with C++ unordered associative containers are: -* `value_type` must be move-constructible. -* Pointer stability is not kept under rehashing. 
-* `begin()` is not constant-time. -* `erase(iterator)` returns `void` instead of an iterator to the following element. -* There is no API for bucket handling (except `bucket_count`) or node extraction/insertion. -* The maximum load factor of the container is managed internally and can't be set by the user. The maximum load, -exposed through the public function `max_load`, may decrease on erasure under high-load conditions. - +* In general: + ** `begin()` is not constant-time. + ** `erase(iterator)` returns `void` instead of an iterator to the following element. + ** There is no API for bucket handling (except `bucket_count`). + ** The maximum load factor of the container is managed internally and can't be set by the user. The maximum load, + exposed through the public function `max_load`, may decrease on erasure under high-load conditions. +* Flat containers (`boost::unordered_flat_set` and `boost::unordered_flat_map`): + ** `value_type` must be move-constructible. + ** Pointer stability is not kept under rehashing. + ** There is no API for node extraction/insertion diff --git a/doc/unordered/intro.adoc b/doc/unordered/intro.adoc index f015d586..a809958f 100644 --- a/doc/unordered/intro.adoc +++ b/doc/unordered/intro.adoc @@ -106,8 +106,69 @@ namespace boost { } ---- -`boost::unordered_flat_set` and `boost::unordered_flat_map` require a -reasonably compliant C++11 compiler. 
+Starting in Boost 1.82, the containers `boost::unordered_node_set` and `boost::unordered_node_map` +are introduced: they use open addressing like `boost::unordered_flat_set` and `boost::unordered_flat_map`, +but internally store element _nodes_, like `boost::unordered_set` and `boost::unordered_map`, +which provide stability of pointers and references to the elements: + +[source,c++] +---- +// #include <boost/unordered/unordered_node_set.hpp> +// +// Note: no multiset version + +namespace boost { + template < + class Key, + class Hash = boost::hash<Key>, + class Pred = std::equal_to<Key>, + class Alloc = std::allocator<Key> > + class unordered_node_set; +} +---- + +[source,c++] +---- +// #include <boost/unordered/unordered_node_map.hpp> +// +// Note: no multimap version + +namespace boost { + template < + class Key, class Mapped, + class Hash = boost::hash<Key>, + class Pred = std::equal_to<Key>, + class Alloc = std::allocator<std::pair<const Key, Mapped> > > + class unordered_node_map; +} +---- + +These are all the containers provided by Boost.Unordered: + +[caption=, title='Table {counter:table-counter}. Boost.Unordered containers'] +[cols="1,1,.^1", frame=all, grid=rows] +|=== +^h| +^h|*Node-based* +^h|*Flat* + +^.^h|*Closed addressing* +^| `boost::unordered_set` + +`boost::unordered_map` + +`boost::unordered_multiset` + +`boost::unordered_multimap` +^| + +^.^h|*Open addressing* +^| `boost::unordered_node_set` + +`boost::unordered_node_map` +^| `boost::unordered_flat_set` + +`boost::unordered_flat_map` + +|=== + +Closed-addressing containers are pass:[C++]98-compatible. Open-addressing containers require a +reasonably compliant pass:[C++]11 compiler. Boost.Unordered containers are used in a similar manner to the normal associative containers: diff --git a/doc/unordered/rationale.adoc b/doc/unordered/rationale.adoc index 0c5aca53..4485ab78 100644 --- a/doc/unordered/rationale.adoc +++ b/doc/unordered/rationale.adoc @@ -64,8 +64,8 @@ of bits in the hash value, so it was only used when `size_t` was 64 bit. 
Since release 1.79.0, https://en.wikipedia.org/wiki/Hash_function#Fibonacci_hashing[Fibonacci hashing] is used instead. With this implementation, the bucket number is determined -by using `(h * m) >> (w - k)`, where `h` is the hash value, `m` is the golden -ratio multiplied by `2^w`, `w` is the word size (32 or 64), and `2^k` is the +by using `(h * m) >> (w - k)`, where `h` is the hash value, `m` is `2^w` divided +by the golden ratio, `w` is the word size (32 or 64), and `2^k` is the number of buckets. This provides a good compromise between speed and distribution. @@ -73,7 +73,7 @@ Since release 1.80.0, prime numbers are chosen for the number of buckets in tandem with sophisticated modulo arithmetic. This removes the need for "mixing" the result of the user's hash function as was used for release 1.79.0. -== boost::unordered_flat_set and boost::unordered_flat_map +== boost::unordered_flat_set/unordered_node_set and boost::unordered_flat_map/unordered_node_map The C++ standard specification of unordered associative containers impose severe limitations on permissible implementations, the most important being @@ -81,14 +81,14 @@ that closed addressing is implicitly assumed. Slightly relaxing this specificati opens up the possibility of providing container variations taking full advantage of open-addressing techniques. -The design of `boost::unordered_flat_set` and `boost::unordered_flat_map` has been +The design of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` has been guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^]. We discuss here the most relevant principles. === Hash function Given its rich functionality and cross-platform interoperability, -`boost::hash` remains the default hash function of `boost::unordered_flat_set` and `boost::unordered_flat_map`. +`boost::hash` remains the default hash function of open-addressing containers. 
As it happens, `boost::hash` for integral and other basic types does not possess the statistical properties required by open addressing; to cope with this, we implement a post-mixing stage: @@ -98,17 +98,15 @@ we implement a post-mixing stage: where *mulx* is an _extended multiplication_ (128 bits in 64-bit architectures, 64 bits in 32-bit environments), and *high* and *low* are the upper and lower halves of an extended word, respectively. -In 64-bit architectures, _C_ is the integer part of -(1 − https://en.wikipedia.org/wiki/Golden_ratio[_φ_])·2^64^, -whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from -https://arxiv.org/abs/2001.05304[Steele and Vigna (2021)^]. +In 64-bit architectures, _C_ is the integer part of 2^64^∕https://en.wikipedia.org/wiki/Golden_ratio[_φ_], +whereas in 32 bits _C_ = 0xE817FB2Du has been obtained from https://arxiv.org/abs/2001.05304[Steele and Vigna (2021)^]. When using a hash function directly suitable for open addressing, post-mixing can be opted out by via a dedicated <>trait. `boost::hash` specializations for string types are marked as avalanching. === Platform interoperability -The observable behavior of `boost::unordered_flat_set` and `boost::unordered_flat_map` is deterministically +The observable behavior of `boost::unordered_flat_set`/`unordered_node_set` and `boost::unordered_flat_map`/`unordered_node_map` is deterministically identical across different compilers as long as their ``std::size_type``s are the same size and the user-provided hash function and equality predicate are also interoperable —this includes elements being ordered in exactly the same way for the same sequence of