uploaded current status

2022-10-30 19:16:43 +01:00
parent 90f2f0f67d
commit 2068cf8d5b
7 changed files with 196 additions and 43 deletions
--- a/doc/unordered/buckets.adoc
+++ b/doc/unordered/buckets.adoc
@@ -5,7 +5,7 @@
 = The Data Structure
 The containers are made up of a number of 'buckets', each of which can contain
-any number of elements. For example, the following diagram shows an <<unordered_set,unordered_set>> with 7 buckets containing 5 elements, `A`,
+any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
 `B`, `C`, `D` and `E` (this is just for illustration, containers will typically
 have more buckets).
@@ -31,20 +31,34 @@ equality predicates in the next section>>.
 You can see in the diagram that `A` & `D` have been placed in the same bucket.
 When looking for elements in this bucket up to 2 comparisons are made, making
-the search slower. This is known as a collision. To keep things fast we try to
+the search slower. This is known as a *collision*. To keep things fast we try to
 keep collisions to a minimum.
 If instead of `boost::unordered_set` we had used <<unordered_flat_set,`boost::unordered_flat_set`>>, the
 diagram would look as follows:
 image::buckets oa.png[]
 In open-addressing containers, buckets can hold at most one element; if a collision happens
 (as is the case for `D` in the example), the element uses some other available bucket in
 the vicinity of the original position. Given this simpler scenario, Boost.Unordered
 open-addressing containers offer a very limited API for accessing buckets.
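 As a minimal sketch of the bucket interface that closed-addressing containers expose, the helpers below
 walk the buckets of a container. `std::unordered_set` is used as a stand-in so that the snippet needs only the standard library,
 and the helper names `count_via_buckets` and `found_in_own_bucket` are hypothetical, introduced only for this illustration.
 [source,c++]
 ----
 #include <cstddef>
 #include <unordered_set>

 // Sums bucket_size(n) over all buckets. Every element lives in exactly one
 // bucket, so the total equals size().
 template <class Set>
 std::size_t count_via_buckets(const Set& s) {
     std::size_t total = 0;
     for (std::size_t n = 0; n < s.bucket_count(); ++n) {
         total += s.bucket_size(n);
     }
     return total;
 }

 // Walks the bucket that key k maps to using local iterators and reports
 // whether k is stored there.
 template <class Set, class Key>
 bool found_in_own_bucket(const Set& s, const Key& k) {
     const std::size_t n = s.bucket(k);
     for (auto it = s.begin(n); it != s.end(n); ++it) {
         if (*it == k) return true;
     }
     return false;
 }
 ----
 For a set holding `A` to `E`, `count_via_buckets` returns 5 whatever the bucket count is, and
 `found_in_own_bucket` needs more than one comparison only when the bucket holds a collision.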
 [caption=, title='Table {counter:table-counter}. Methods for Accessing Buckets']
 [cols="1,.^1", frame=all, grid=rows]
 |===
-|Method |Description
+2+^h| *All containers*
 h|*Method* h|*Description*
 |`size_type bucket_count() const` 
 |The number of buckets.
 2+^h| *Closed-addressing containers only* +
 `boost::unordered_[multi]set`, `boost::unordered_[multi]map` 
 h|*Method* h|*Description*
 |`size_type max_bucket_count() const` 
 |An upper bound on the number of buckets.
 |`size_type bucket_size(size_type n) const` 
 |The number of elements in bucket `n`.
@@ -69,14 +83,14 @@ keep collisions to a minimum.
 == Controlling the number of buckets
 As more elements are added to an unordered associative container, the number
-of elements in the buckets will increase causing performance to degrade.
+of collisions will increase, causing performance to degrade.
 To combat this the containers increase the bucket count as elements are inserted.
 You can also tell the container to change the bucket count (if required) by
 calling `rehash`.
 The standard leaves a lot of freedom to the implementer to decide how the
 number of buckets is chosen, but it does make some requirements based on the
-container's 'load factor', the average number of elements per bucket.
+container's 'load factor', the number of elements divided by the number of buckets.
 Containers also have a 'maximum load factor' which they should try to keep the
 load factor below.
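 The relationship between size, bucket count and load factor can be checked with a short sketch.
 It assumes `std::unordered_set` as a stand-in for the closed-addressing containers; `computed_load_factor` and
 `rehash_to_at_least` are hypothetical helper names, not part of any library.
 [source,c++]
 ----
 #include <cstddef>
 #include <unordered_set>

 // Load factor as defined in the text: number of elements divided by the
 // number of buckets.
 template <class Container>
 float computed_load_factor(const Container& c) {
     return static_cast<float>(c.size()) / static_cast<float>(c.bucket_count());
 }

 // Asks the container for at least n buckets and returns the bucket count
 // it actually settled on (which may be larger than n).
 template <class Container>
 std::size_t rehash_to_at_least(Container& c, std::size_t n) {
     c.rehash(n);
     return c.bucket_count();
 }
 ----
 After `rehash_to_at_least(s, 100)` the container has 100 buckets or more, and its load factor does not exceed
 `max_load_factor()`.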
@@ -97,7 +111,8 @@ or close to the hint - unless your hint is unreasonably small or large.
 [caption=, title='Table {counter:table-counter}. Methods for Controlling Bucket Size']
 [cols="1,.^1", frame=all, grid=rows]
 |===
-|Method |Description
+2+^h| *All containers*
 h|*Method* h|*Description*
 |`X(size_type n)` 
 |Construct an empty container with at least `n` buckets (`X` is the container type).
@@ -112,22 +127,45 @@ or close to the hint - unless your hint is unreasonably small or large.
 |Returns the current maximum load factor.
 |`float max_load_factor(float z)`
-|Changes the container's maximum load factor, using `z` as a hint.
+|Changes the container's maximum load factor, using `z` as a hint. +
 **Open-addressing containers:** this function does nothing: users are not allowed to change the maximum load factor.
 |`void rehash(size_type n)`
 |Changes the number of buckets so that there are at least `n` buckets, and so that the load factor is less than the maximum load factor.
 2+^h| *Open-addressing containers only* +
 `boost::unordered_flat_set`, `boost::unordered_flat_map` 
 h|*Method* h|*Description*
 |`size_type max_load() const`
 |Returns the maximum number of allowed elements in the container before rehash.
 |===
 A note on `max_load` for open-addressing containers: the maximum load will naturally decrease when
 new insertions are performed, but _won't_ increase at the same rate when erasing: for instance,
 adding 1,000 elements to a <<unordered_flat_map,`boost::unordered_flat_map`>> and then
 erasing those 1,000 elements will typically reduce the maximum load by around 160 rather
 than restoring it to its original value. This is done internally by Boost.Unordered in order
 to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
 The maximum load will be reset to its theoretical maximum
 (`max_load_factor() * bucket_count()`) right after `rehash`.
 == Iterator Invalidation
 It is not specified how member functions other than `rehash` and `reserve` affect
-the bucket count, although `insert` is only allowed to invalidate iterators
+the bucket count, although `insert` can only invalidate iterators
-when the insertion causes the load factor to be greater than or equal to the
+when the insertion causes the container's load to be greater than the maximum allowed.
-maximum load factor. For most implementations this means that `insert` will only
+For most implementations this means that `insert` will only
-change the number of buckets when this happens. While iterators can be
+change the number of buckets when this happens. Iterators can be
-invalidated by calls to `insert`, `rehash` and `reserve`, pointers and references to the
+invalidated by calls to `insert`, `rehash` and `reserve`.
-container's elements are never invalidated.
+
 As for pointers and references,
 they are never invalidated for closed-addressing containers (`boost::unordered_[multi]set`, `boost::unordered_[multi]map`),
 but they are invalidated when rehashing occurs for open-addressing
 `boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
 these containers store elements directly into their holding buckets, so
 when allocating a new bucket array the elements must be transferred by means of move construction.
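 The pointer stability of node-based containers can be illustrated with the sketch below. It uses `std::unordered_map`
 as a stand-in for the closed-addressing containers; `address_survives_rehash` is a hypothetical name chosen for this example.
 The same check is not guaranteed to pass for `boost::unordered_flat_map`, for the reasons given above.
 [source,c++]
 ----
 #include <string>
 #include <unordered_map>

 // Takes the address of a mapped value, forces a rehash, and checks that the
 // element is still at the same address afterwards.
 inline bool address_survives_rehash() {
     std::unordered_map<int, std::string> m;
     m.emplace(1, "one");
     const std::string* before = &m.at(1);
     m.rehash(m.bucket_count() * 4 + 1);  // request a larger bucket array
     const std::string* after = &m.at(1);
     return before == after && *before == "one";
 }
 ----
 Iterators obtained before the call to `rehash` must still be considered invalid; only pointers and references survive.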
 In a similar manner to using `reserve` for ``vector``s, it can be a good idea
 to call `reserve` before inserting a large number of elements. This will get
--- a/doc/unordered/comparison.adoc
+++ b/doc/unordered/comparison.adoc
@@ -25,19 +25,22 @@
 |No equivalent. Since the elements aren't ordered `lower_bound` and `upper_bound` would be meaningless.
 |`equal_range(k)` returns an empty range at the position that `k` would be inserted if `k` isn't present in the container.
-|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple place. To find out the bucket that `k` would be inserted into use `bucket(k)`. But remember that an insert can cause the container to rehash - meaning that the element can be inserted into a different bucket.
+|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple places. +
 **Closed-addressing containers:** To find out the bucket that `k` would be inserted into use `bucket(k)`. But remember that an insert can cause the container to rehash - meaning that the element can be inserted into a different bucket.
 |`iterator`, `const_iterator` are of the bidirectional category.
 |`iterator`, `const_iterator` are of at least the forward category.
 |Iterators, pointers and references to the container's elements are never invalidated.
-|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. Pointers and references to the container's elements are never invalidated.
+|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
 **Closed-addressing containers:** Pointers and references to the container's elements are never invalidated. +
 **Open-addressing containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.
 |Iterators iterate through the container in the order defined by the comparison object.
 |Iterators iterate through the container in an arbitrary order, that can change as elements are inserted, although equivalent elements are always adjacent.
 |No equivalent
-|Local iterators can be used to iterate through individual buckets. (The order of local iterators and iterators aren't required to have any correspondence.)
+|**Closed-addressing containers:** Local iterators can be used to iterate through individual buckets. (The order of local iterators and iterators aren't required to have any correspondence.)
 |Can be compared using the `==`, `!=`, `<`, `\<=`, `>`, `>=` operators.
 |Can be compared using the `==` and `!=` operators.
@@ -45,9 +48,6 @@
 |
 |When inserting with a hint, implementations are permitted to ignore the hint.
 |`erase` never throws an exception
 |The containers' hash or predicate function can throw exceptions from `erase`.
 |===
 ---
--- a/doc/unordered/compliance.adoc
+++ b/doc/unordered/compliance.adoc
@@ -5,13 +5,15 @@
 :cpp: C++
 == Closed-addressing containers: unordered_[multi]set, unordered_[multi]map
 The intent of Boost.Unordered is to provide a close (but imperfect)
 implementation of the {cpp}17 standard that will work with {cpp}98 upwards.
 The wide compatibility does mean some compromises have to be made.
 With a compiler and library that fully support {cpp}11, the differences should
 be minor.
-== Move emulation
+=== Move emulation
 Support for move semantics is implemented using Boost.Move. If rvalue
 references are available it will use them, but if not it uses a close,
@@ -23,7 +25,7 @@ but imperfect emulation. On such compilers:
 * The containers themselves are not movable.
 * Argument forwarding is not perfect.
-== Use of allocators
+=== Use of allocators
 {cpp}11 introduced a new allocator system. It's backwards compatible due to
 the lax requirements for allocators in the old standard, but might need
@@ -56,7 +58,7 @@ Due to imperfect move emulation, some assignments might check
 `propagate_on_container_copy_assignment` on some compilers and
 `propagate_on_container_move_assignment` on others.
-== Construction/Destruction using allocators
+=== Construction/Destruction using allocators
 The following support is required for full use of {cpp}11 style
 construction/destruction:
@@ -76,7 +78,7 @@ constructing a `std::pair` using `boost::tuple` (see <<compliance_pairs,below>>)
 When support is not available `allocator_traits::construct` and
 `allocator_traits::destroy` are never called.
-== Pointer Traits
+=== Pointer Traits
 `pointer_traits` aren't used. Instead, pointer types are obtained from
 rebound allocators, this can cause problems if the allocator can't be
@@ -84,7 +86,7 @@ used with incomplete types. If `const_pointer` is not defined in the
 allocator, `boost::pointer_to_other<pointer, const value_type>::type`
 is used to obtain a const pointer.
-== Pairs
+=== Pairs
 Since the containers use `std::pair` they're limited to the version
 from the current standard library. But since {cpp}11 ``std::pair``'s
@@ -105,7 +107,7 @@ Older drafts of the standard also supported variadic constructors
 for `std::pair`, where the first argument would be used for the
 first part of the pair, and the remaining for the second part.
-== Miscellaneous
+=== Miscellaneous
 When swapping, `Pred` and `Hash` are not currently swapped by calling
 `swap`, their copy constructors are used. As a consequence when swapping
@@ -114,3 +116,28 @@ an exception may be thrown from their copy constructor.
 Variadic constructor arguments for `emplace` are only used when both
 rvalue references and variadic template parameters are available.
 Otherwise `emplace` can only take up to 10 constructor arguments.
 == Open-addressing containers: unordered_flat_set, unordered_flat_map
 The C++ standard does not currently provide any open-addressing container
 specification to adhere to, so `boost::unordered_flat_set` and
 `boost::unordered_flat_map` take inspiration from `std::unordered_set` and
 `std::unordered_map`, respectively, and depart from their interface where
 convenient or as dictated by their internal data structure, which is
 radically different from that imposed by the standard (closed addressing, node based).
 `unordered_flat_set` and `unordered_flat_map` only work with reasonably
 compliant C++11 (or later) compilers. Language-level features such as move semantics
 and variadic template parameters are then not emulated. 
 `unordered_flat_set` and `unordered_flat_map` are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^].
 The main differences with C++ unordered associative containers are:
 * `value_type` must be move-constructible.
 * Pointer stability is not kept under rehashing.
 * `begin()` is not constant-time.
 * `erase(iterator)` returns `void` instead of an iterator to the following element.
 * There is no API for bucket handling (except `bucket_count`) or node extraction/insertion.
 * The maximum load factor of the container is managed internally and can't be set by the user. The maximum load,
 exposed through the public function `max_load`, is not guaranteed to return to its previous value when elements are erased.
--- a/doc/unordered/copyright.adoc
+++ b/doc/unordered/copyright.adoc
@@ -11,4 +11,8 @@ Copyright (C) 2005-2008 Daniel James
 Copyright (C) 2022 Christian Mazakas
 Copyright (C) 2022 Joaqu&iacute;n M L&oacute;pez Mu&ntilde;oz
 Copyright (C) 2022 Peter Dimov
 Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
--- a/doc/unordered/hash_traits.adoc
+++ b/doc/unordered/hash_traits.adoc
@@ -29,14 +29,14 @@ struct hash_is_avalanching;
 A hash function is said to have the _avalanching property_ if small changes in the input translate to
 large changes in the returned hash code &#8212;ideally, flipping one bit in the representation of
-the input value results in each bit of the hash code flipping with probability 50%. This property is
+the input value results in each bit of the hash code flipping with probability 50%. Approaching
-critical for the proper behavior of open-addressing hash containers.
+this property is critical for the proper behavior of open-addressing hash containers.
-`hash_is_avalanching<Hash>` derives from `std::true_type` if `Hash::is_avalanching` is a valid type,
+`hash_is_avalanching<Hash>::value` is `true` if `Hash::is_avalanching` is a valid type,
-and derives from `std::false_type` otherwise.
+and `false` otherwise.
 Users can then declare a hash function `Hash` as avalanching either by embedding an `is_avalanching` typedef
-into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to derive from
+into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to a class with
-`std::true_type`.
+an embedded compile-time constant `value` set to `true`.
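 The opt-in mechanism can be sketched as follows. `is_avalanching_sketch` is a hypothetical stand-in written for this
 illustration and is not the library's implementation of `hash_is_avalanching`; `opted_in_hash` and `plain_hash` are likewise
 made-up names, and the mixing constants in `opted_in_hash` are illustrative only.
 [source,c++]
 ----
 #include <cstddef>
 #include <cstdint>
 #include <type_traits>

 // C++11-compatible helper that maps any well-formed type list to void.
 template <class...> struct make_void { using type = void; };

 // Hypothetical detection trait: value is true when Hash::is_avalanching
 // names a type, false otherwise.
 template <class Hash, class = void>
 struct is_avalanching_sketch : std::false_type {};

 template <class Hash>
 struct is_avalanching_sketch<
     Hash, typename make_void<typename Hash::is_avalanching>::type>
     : std::true_type {};

 // A user hash that declares itself avalanching by embedding the typedef.
 struct opted_in_hash {
     using is_avalanching = void;
     std::size_t operator()(int x) const noexcept {
         std::uint64_t z = static_cast<std::uint64_t>(x);
         z ^= z >> 23;
         z *= 0xff51afd7ed558ccdULL;  // illustrative multiplier
         z ^= z >> 23;
         return static_cast<std::size_t>(z);
     }
 };

 // A user hash without the typedef; the trait reports false for it.
 struct plain_hash {
     std::size_t operator()(int x) const noexcept {
         return static_cast<std::size_t>(x);
     }
 };
 ----
 With these definitions `is_avalanching_sketch<opted_in_hash>::value` is `true` and
 `is_avalanching_sketch<plain_hash>::value` is `false`.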
 xref:unordered_flat_set[`boost::unordered_flat_set`] and xref:unordered_flat_map[`boost::unordered_flat_map`]
 use the provided hash function `Hash` as-is if `hash_is_avalanching<Hash>::value` is `true`; otherwise, they
--- a/doc/unordered/intro.adoc
+++ b/doc/unordered/intro.adoc
@@ -18,12 +18,12 @@ or isn't practical. In contrast, a hash table only needs an equality function
 and a hash function for the key.
 With this in mind, unordered associative containers were added to the {cpp}
-standard. This is an implementation of the containers described in {cpp}11,
+standard. Boost.Unordered provides an implementation of the containers described in {cpp}11,
 with some <<compliance,deviations from the standard>> in
 order to work with non-{cpp}11 compilers and libraries.
 `unordered_set` and `unordered_multiset` are defined in the header
-`<boost/unordered_set.hpp>`
+`<boost/unordered/unordered_set.hpp>`
 [source,c++]
 ----  
 namespace boost {
@@ -44,7 +44,7 @@ namespace boost {
 ----
 `unordered_map` and `unordered_multimap` are defined in the header
-`<boost/unordered_map.hpp>`
+`<boost/unordered/unordered_map.hpp>`
 [source,c++]
 ----
@@ -65,10 +65,51 @@ namespace boost {
 }
 ----
-When using Boost.TR1, these classes are included from `<unordered_set>` and
+These containers, and all other implementations of standard unordered associative
-`<unordered_map>`, with the classes added to the `std::tr1` namespace.
+containers, use an approach to their internal data structure design called
 *closed addressing*. Starting in Boost 1.81, Boost.Unordered also provides containers
 `boost::unordered_flat_set` and `boost::unordered_flat_map`, which use a
 different data structure strategy commonly known as *open addressing* and depart in
 a small number of ways from the standard so as to offer much better performance
 in exchange (more than 2 times faster in typical scenarios):
-The containers are used in a similar manner to the normal associative
+
 [source,c++]
 ----
 // #include <boost/unordered/unordered_flat_set.hpp>
 //
 // Note: no multiset version
 namespace boost {
    template <
        class Key,
        class Hash = boost::hash<Key>,
        class Pred = std::equal_to<Key>,
        class Alloc = std::allocator<Key> >
    class unordered_flat_set;
 }
 ----
 [source,c++]
 ----
 // #include <boost/unordered/unordered_flat_map.hpp>
 //
 // Note: no multimap version
 namespace boost {
    template <
        class Key, class Mapped,
        class Hash = boost::hash<Key>,
        class Pred = std::equal_to<Key>,
        class Alloc = std::allocator<std::pair<Key const, Mapped> > >
    class unordered_flat_map;
 }
 ----
 `boost::unordered_flat_set` and `boost::unordered_flat_map` require a
 reasonably compliant C++11 compiler.
 Boost.Unordered containers are used in a similar manner to the normal associative
 containers:
 [source,cpp]
@@ -87,7 +128,7 @@ But since the elements aren't ordered, the output of:
 [source,c++]
 ----
-BOOST_FOREACH(map::value_type i, x) {
+for(const map::value_type& i: x) {
    std::cout<<i.first<<","<<i.second<<"\n";
 }
 ----
--- a/doc/unordered/rationale.adoc
+++ b/doc/unordered/rationale.adoc
@@ -4,15 +4,17 @@
 = Implementation Rationale
-The intent of this library is to implement the unordered
+== boost::unordered_[multi]set and boost::unordered_[multi]map
-containers in the standard, so the interface was fixed. But there are
+
 These containers adhere to the standard requirements for unordered associative
 containers, so the interface was fixed. But there are
 still some implementation decisions to make. The priorities are
 conformance to the standard and portability.
 The http://en.wikipedia.org/wiki/Hash_table[Wikipedia article on hash tables^]
 has a good summary of the implementation issues for hash tables in general.
-== Data Structure
+=== Data Structure
 By specifying an interface for accessing the buckets of the container the
 standard pretty much requires that the hash table uses chained addressing.
@@ -37,7 +39,7 @@ bucket but there are some serious problems with this:
 So chained addressing is used.
-== Number of Buckets
+=== Number of Buckets
 There are two popular methods for choosing the number of buckets in a hash
 table. One is to have a prime number of buckets, another is to use a power
@@ -70,3 +72,44 @@ distribution.
 Since release 1.80.0, prime numbers are chosen for the number of buckets in
 tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
 the result of the user's hash function as was used for release 1.79.0.
 == boost::unordered_flat_set and boost::unordered_flat_map
 The C++ standard specification of unordered associative containers imposes
 severe limitations on permissible implementations, the most important being
 that closed addressing is implicitly assumed. Slightly relaxing this specification
 opens up the possibility of providing container variations taking full
 advantage of open-addressing techniques.
 The design of `boost::unordered_flat_set` and `boost::unordered_flat_map` has been
 guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
 We discuss here the most relevant principles.
 === Hash function
 Given its rich functionality and cross-platform interoperability,
 `boost::hash` remains the default hash function of `boost::unordered_flat_set` and `boost::unordered_flat_map`.
 As it happens, `boost::hash` for integral and other basic types does not provide
 the good statistical properties required by open addressing; to cope with this,
 we implement a post-mixing stage:
 *  64-bit architectures: we use the `xmx` function defined in
 Jon Maiga's http://jonkagstrom.com/bit-mixer-construction/index.html[The construct of a bit mixer^].
 *  32-bit architectures: the mixer used was selected from a set generated with https://github.com/skeeto/hash-prospector[Hash Function Prospector^]
 as the best overall performer in our internal benchmarks. The score assigned by Hash Function Prospector is 333.7934929677524.
 When using a hash function directly suitable for open addressing, post-mixing can be opted out of via the dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
 `boost::hash` specializations for string types are marked as avalanching.
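 To give an idea of what a post-mixing stage does, here is a sketch of a xorshift-multiply-xorshift mixer for 64-bit values.
 The shift amount and the multiplier are assumptions made for this illustration and are not claimed to be the values used by
 Boost.Unordered; `mix_sketch` and `bit_difference` are hypothetical names.
 [source,c++]
 ----
 #include <cstdint>

 // Illustrative mixer: fold high bits into low bits, multiply by an odd
 // constant to spread them, then fold again.
 inline std::uint64_t mix_sketch(std::uint64_t x) noexcept {
     x ^= x >> 23;
     x *= 0xff51afd7ed558ccdULL;  // assumed constant, for illustration only
     x ^= x >> 23;
     return x;
 }

 // Counts the bits in which two 64-bit values differ.
 inline int bit_difference(std::uint64_t a, std::uint64_t b) noexcept {
     std::uint64_t d = a ^ b;
     int n = 0;
     while (d != 0) {
         d &= d - 1;  // clear the lowest set bit
         ++n;
     }
     return n;
 }
 ----
 The inputs 1 and 2 differ in two bits, whereas `mix_sketch(1)` and `mix_sketch(2)` differ in many more, which is the
 kind of spreading that open addressing relies on.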
 === Platform interoperability
 The observable behavior of `boost::unordered_flat_set` and `boost::unordered_flat_map` is deterministically
 identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided
 hash function and equality predicate are also interoperable
 &#8212;this includes elements being ordered in exactly the same way for the same sequence of
 operations.
 Although the implementation internally uses SIMD technologies, such as https://en.wikipedia.org/wiki/SSE2[SSE2^]
 and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[Neon^], when available,
 this does not affect interoperability. For instance, the behavior is the same
 for Visual Studio on an Intel CPU with SSE2 in x64 and for GCC on an IBM s390x without any supported SIMD technology.