forked from boostorg/unordered
uploaded current status
@ -5,7 +5,7 @@
|
||||
= The Data Structure
|
||||
|
||||
The containers are made up of a number of 'buckets', each of which can contain
|
||||
any number of elements. For example, the following diagram shows an <<unordered_set,unordered_set>> with 7 buckets containing 5 elements, `A`,
|
||||
any number of elements. For example, the following diagram shows a <<unordered_set,`boost::unordered_set`>> with 7 buckets containing 5 elements, `A`,
|
||||
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
|
||||
have more buckets).
|
||||
|
||||
@ -31,20 +31,34 @@ equality predicates in the next section>>.
|
||||
|
||||
You can see in the diagram that `A` & `D` have been placed in the same bucket.
|
||||
When looking for elements in this bucket up to 2 comparisons are made, making
|
||||
the search slower. This is known as a collision. To keep things fast we try to
|
||||
the search slower. This is known as a *collision*. To keep things fast we try to
|
||||
keep collisions to a minimum.
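
As a rough illustration of the collision idea, the sketch below counts elements that share a bucket with another element. It uses `std::unordered_set` as a stand-in, on the assumption that the closed-addressing Boost containers expose the same `bucket_count` and `bucket_size` members; `colliding_elements` and `constant_hash` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Counts the elements that live in a bucket holding more than one element.
// Works with any container offering bucket_count() and bucket_size(n).
template <class Set>
std::size_t colliding_elements(const Set& s) {
    std::size_t n = 0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        const std::size_t in_bucket = s.bucket_size(b);
        if (in_bucket > 1) n += in_bucket;  // every element here needs extra comparisons
    }
    return n;
}

// A deliberately bad hash (illustration only): every key gets the same
// hash value, so every key ends up in the same bucket.
struct constant_hash {
    std::size_t operator()(char) const noexcept { return 0; }
};
```

With `constant_hash`, all elements collide regardless of the bucket count, which is the worst case the containers try to avoid.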
|
||||
|
||||
If instead of `boost::unordered_set` we had used <<unordered_flat_set,`boost::unordered_flat_set`>>, the
|
||||
diagram would look as follows:
|
||||
|
||||
image::buckets-oa.png[]
|
||||
|
||||
In open-addressing containers, buckets can hold at most one element; if a collision happens
|
||||
(as is the case for `D` in the example), the element uses some other available bucket in
|
||||
the vicinity of the original position. Given this simpler scenario, Boost.Unordered
|
||||
open-addressing containers offer a very limited API for accessing buckets.
|
||||
|
||||
[caption=, title='Table {counter:table-counter}. Methods for Accessing Buckets']
|
||||
[cols="1,.^1", frame=all, grid=rows]
|
||||
|===
|
||||
|Method |Description
|
||||
2+^h| *All containers*
|
||||
h|*Method* h|*Description*
|
||||
|
||||
|`size_type bucket_count() const`
|
||||
|The number of buckets.
|
||||
|
||||
2+^h| *Closed-addressing containers only* +
|
||||
`boost::unordered_[multi]set`, `boost::unordered_[multi]map`
|
||||
h|*Method* h|*Description*
|
||||
|
||||
|`size_type max_bucket_count() const`
|
||||
|An upper bound on the number of buckets.
|
||||
|
||||
|`size_type bucket_size(size_type n) const`
|
||||
|The number of elements in bucket `n`.
|
||||
|
||||
@ -69,14 +83,14 @@ keep collisions to a minimum.
|
||||
== Controlling the number of buckets
|
||||
|
||||
As more elements are added to an unordered associative container, the number
|
||||
of elements in the buckets will increase causing performance to degrade.
|
||||
of collisions will increase causing performance to degrade.
|
||||
To combat this the containers increase the bucket count as elements are inserted.
|
||||
You can also tell the container to change the bucket count (if required) by
|
||||
calling `rehash`.
|
||||
|
||||
The standard leaves a lot of freedom to the implementer to decide how the
|
||||
number of buckets is chosen, but it does make some requirements based on the
|
||||
container's 'load factor', the average number of elements per bucket.
|
||||
container's 'load factor', the number of elements divided by the number of buckets.
|
||||
Containers also have a 'maximum load factor' which they should try to keep the
|
||||
load factor below.
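
The relationship between load factor, bucket count and `rehash` can be checked with a short sketch. It uses `std::unordered_set` as a stand-in for the closed-addressing containers; `manual_load_factor` and `load_stays_bounded` are hypothetical helpers, and the check that the load factor never exceeds the maximum reflects how common implementations behave rather than a formal guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Load factor computed by hand: number of elements divided by number of buckets.
template <class Set>
float manual_load_factor(const Set& s) {
    return s.bucket_count() == 0
        ? 0.0f
        : static_cast<float>(s.size()) / static_cast<float>(s.bucket_count());
}

// Inserts 0..n-1 and reports whether the load factor stayed at or below
// the maximum load factor after every insertion.
inline bool load_stays_bounded(std::size_t n) {
    std::unordered_set<std::size_t> s;
    for (std::size_t i = 0; i < n; ++i) {
        s.insert(i);
        if (s.load_factor() > s.max_load_factor()) return false;
    }
    return true;
}
```

Calling `s.rehash(n)` on such a container leaves it with at least `n` buckets, which lowers the load factor without changing the elements.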
|
||||
|
||||
@ -97,7 +111,8 @@ or close to the hint - unless your hint is unreasonably small or large.
|
||||
[caption=, title='Table {counter:table-counter}. Methods for Controlling Bucket Size']
|
||||
[cols="1,.^1", frame=all, grid=rows]
|
||||
|===
|
||||
|Method |Description
|
||||
2+^h| *All containers*
|
||||
h|*Method* h|*Description*
|
||||
|
||||
|`X(size_type n)`
|
||||
|Construct an empty container with at least `n` buckets (`X` is the container type).
|
||||
@ -112,22 +127,45 @@ or close to the hint - unless your hint is unreasonably small or large.
|
||||
|Returns the current maximum load factor.
|
||||
|
||||
|`float max_load_factor(float z)`
|
||||
|Changes the container's maximum load factor, using `z` as a hint.
|
||||
|Changes the container's maximum load factor, using `z` as a hint. +
|
||||
**Open-addressing containers:** this function does nothing: users are not allowed to change the maximum load factor.
|
||||
|
||||
|`void rehash(size_type n)`
|
||||
|Changes the number of buckets so that there are at least `n` buckets, and so that the load factor is less than the maximum load factor.
|
||||
|
||||
2+^h| *Open-addressing containers only* +
|
||||
`boost::unordered_flat_set`, `boost::unordered_flat_map`
|
||||
h|*Method* h|*Description*
|
||||
|
||||
|`size_type max_load() const`
|
||||
|Returns the maximum number of allowed elements in the container before rehash.
|
||||
|
||||
|===
|
||||
|
||||
A note on `max_load` for open-addressing containers: the maximum load will naturally decrease when
|
||||
new insertions are performed, but _won't_ increase at the same rate when erasing: for instance,
|
||||
adding 1,000 elements to a <<unordered_flat_map,`boost::unordered_flat_map`>> and then
|
||||
erasing those 1,000 elements will typically reduce the maximum load by around 160 rather
|
||||
than restoring it to its original value. This is done internally by Boost.Unordered in order
|
||||
to keep its performance stable, and must be taken into account when planning for rehash-free insertions.
|
||||
The maximum load will be reset to its theoretical maximum
|
||||
(`max_load_factor() * bucket_count()`) right after `rehash`.
|
||||
|
||||
== Iterator Invalidation
|
||||
|
||||
It is not specified how member functions other than `rehash` and `reserve` affect
|
||||
the bucket count, although `insert` is only allowed to invalidate iterators
|
||||
when the insertion causes the load factor to be greater than or equal to the
|
||||
maximum load factor. For most implementations this means that `insert` will only
|
||||
change the number of buckets when this happens. While iterators can be
|
||||
invalidated by calls to `insert`, `rehash` and `reserve`, pointers and references to the
|
||||
container's elements are never invalidated.
|
||||
the bucket count, although `insert` can only invalidate iterators
|
||||
when the insertion causes the container's load to be greater than the maximum allowed.
|
||||
For most implementations this means that `insert` will only
|
||||
change the number of buckets when this happens. Iterators can be
|
||||
invalidated by calls to `insert`, `rehash` and `reserve`.
|
||||
|
||||
As for pointers and references,
|
||||
they are never invalidated for closed-addressing containers (`boost::unordered_[multi]set`, `boost::unordered_[multi]map`),
|
||||
but they are invalidated when rehashing occurs for the open-addressing containers
|
||||
`boost::unordered_flat_set` and `boost::unordered_flat_map`: this is because
|
||||
these containers store elements directly into their holding buckets, so
|
||||
when allocating a new bucket array the elements must be transferred by means of move construction.
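
The pointer stability of node-based (closed-addressing) containers can be observed with the sketch below. It uses `std::unordered_set` as a stand-in, and `pointer_survives_rehash` is a hypothetical helper. The same check is not valid for the open-addressing containers, where elements are moved on rehash.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Returns true if a pointer taken before a forced rehash still designates
// the same element afterwards.
inline bool pointer_survives_rehash() {
    std::unordered_set<std::string> s;
    const std::string* p = &*s.insert("stable").first;

    // Force a larger bucket array. Iterators obtained earlier are now invalid,
    // but the node holding the element is relinked, not moved.
    s.rehash(s.bucket_count() * 4 + 1024);

    return p == &*s.find("stable") && *p == "stable";
}
```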
|
||||
|
||||
In a similar manner to using `reserve` for ``vector``s, it can be a good idea
|
||||
to call `reserve` before inserting a large number of elements. This will get
|
||||
|
@ -25,19 +25,22 @@
|
||||
|No equivalent. Since the elements aren't ordered `lower_bound` and `upper_bound` would be meaningless.
|
||||
|
||||
|`equal_range(k)` returns an empty range at the position that `k` would be inserted if `k` isn't present in the container.
|
||||
|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple places. To find out the bucket that `k` would be inserted into use `bucket(k)`. But remember that an insert can cause the container to rehash - meaning that the element can be inserted into a different bucket.
|
||||
|`equal_range(k)` returns a range at the end of the container if `k` isn't present in the container. It can't return a positioned range as `k` could be inserted into multiple places. +
|
||||
**Closed-addressing containers:** To find out the bucket that `k` would be inserted into use `bucket(k)`. But remember that an insert can cause the container to rehash - meaning that the element can be inserted into a different bucket.
|
||||
|
||||
|`iterator`, `const_iterator` are of the bidirectional category.
|
||||
|`iterator`, `const_iterator` are of at least the forward category.
|
||||
|
||||
|Iterators, pointers and references to the container's elements are never invalidated.
|
||||
|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. Pointers and references to the container's elements are never invalidated.
|
||||
|<<buckets_iterator_invalidation,Iterators can be invalidated by calls to insert or rehash>>. +
|
||||
**Closed-addressing containers:** Pointers and references to the container's elements are never invalidated. +
|
||||
**Open-addressing containers:** Pointers and references to the container's elements are invalidated when rehashing occurs.
|
||||
|
||||
|Iterators iterate through the container in the order defined by the comparison object.
|
||||
|Iterators iterate through the container in an arbitrary order, that can change as elements are inserted, although equivalent elements are always adjacent.
|
||||
|
||||
|No equivalent
|
||||
|Local iterators can be used to iterate through individual buckets. (The order of local iterators and iterators aren't required to have any correspondence.)
|
||||
|**Closed-addressing containers:** Local iterators can be used to iterate through individual buckets. (The order of local iterators and iterators aren't required to have any correspondence.)
|
||||
|
||||
|Can be compared using the `==`, `!=`, `<`, `\<=`, `>`, `>=` operators.
|
||||
|Can be compared using the `==` and `!=` operators.
|
||||
@ -45,9 +48,6 @@
|
||||
|
|
||||
|When inserting with a hint, implementations are permitted to ignore the hint.
|
||||
|
||||
|`erase` never throws an exception
|
||||
|The containers' hash or predicate function can throw exceptions from `erase`.
|
||||
|
||||
|===
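
Some of the rows above can be exercised with a short sketch. It uses `std::unordered_multiset` as a stand-in for the unordered containers; the three helper functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <unordered_set>

// For a key that is absent, equal_range yields an empty range at end().
inline bool absent_key_gives_end_range(const std::unordered_multiset<int>& s, int k) {
    auto r = s.equal_range(k);
    return r.first == s.end() && r.second == s.end();
}

// Equivalent elements are adjacent, so the range length is the multiplicity.
inline std::size_t multiplicity(const std::unordered_multiset<int>& s, int k) {
    auto r = s.equal_range(k);
    return static_cast<std::size_t>(std::distance(r.first, r.second));
}

// Only == and != are available; there is no ordering comparison.
inline bool same_elements(const std::unordered_multiset<int>& a,
                          const std::unordered_multiset<int>& b) {
    return a == b;
}
```

Equality does not depend on iteration order, so two containers built from the same elements in a different sequence compare equal.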
|
||||
|
||||
---
|
||||
|
@ -5,13 +5,15 @@
|
||||
|
||||
:cpp: C++
|
||||
|
||||
== Closed-addressing containers: unordered_[multi]set, unordered_[multi]map
|
||||
|
||||
The intent of Boost.Unordered is to provide a close (but imperfect)
|
||||
implementation of the {cpp}17 standard, that will work with {cpp}98 upwards.
|
||||
The wide compatibility does mean some compromises have to be made.
|
||||
With a compiler and library that fully support {cpp}11, the differences should
|
||||
be minor.
|
||||
|
||||
== Move emulation
|
||||
=== Move emulation
|
||||
|
||||
Support for move semantics is implemented using Boost.Move. If rvalue
|
||||
references are available it will use them, but if not it uses a close,
|
||||
@ -23,7 +25,7 @@ but imperfect emulation. On such compilers:
|
||||
* The containers themselves are not movable.
|
||||
* Argument forwarding is not perfect.
|
||||
|
||||
== Use of allocators
|
||||
=== Use of allocators
|
||||
|
||||
{cpp}11 introduced a new allocator system. It's backwards compatible due to
|
||||
the lax requirements for allocators in the old standard, but might need
|
||||
@ -56,7 +58,7 @@ Due to imperfect move emulation, some assignments might check
|
||||
`propagate_on_container_copy_assignment` on some compilers and
|
||||
`propagate_on_container_move_assignment` on others.
|
||||
|
||||
== Construction/Destruction using allocators
|
||||
=== Construction/Destruction using allocators
|
||||
|
||||
The following support is required for full use of {cpp}11 style
|
||||
construction/destruction:
|
||||
@ -76,7 +78,7 @@ constructing a `std::pair` using `boost::tuple` (see <<compliance_pairs,below>>)
|
||||
When support is not available `allocator_traits::construct` and
|
||||
`allocator_traits::destroy` are never called.
|
||||
|
||||
== Pointer Traits
|
||||
=== Pointer Traits
|
||||
|
||||
`pointer_traits` aren't used. Instead, pointer types are obtained from
|
||||
rebound allocators, this can cause problems if the allocator can't be
|
||||
@ -84,7 +86,7 @@ used with incomplete types. If `const_pointer` is not defined in the
|
||||
allocator, `boost::pointer_to_other<pointer, const value_type>::type`
|
||||
is used to obtain a const pointer.
|
||||
|
||||
== Pairs
|
||||
=== Pairs
|
||||
|
||||
Since the containers use `std::pair` they're limited to the version
|
||||
from the current standard library. But since {cpp}11 ``std::pair``'s
|
||||
@ -105,7 +107,7 @@ Older drafts of the standard also supported variadic constructors
|
||||
for `std::pair`, where the first argument would be used for the
|
||||
first part of the pair, and the remaining for the second part.
|
||||
|
||||
== Miscellaneous
|
||||
=== Miscellaneous
|
||||
|
||||
When swapping, `Pred` and `Hash` are not currently swapped by calling
|
||||
`swap`, their copy constructors are used. As a consequence when swapping
|
||||
@ -114,3 +116,28 @@ an exception may be thrown from their copy constructor.
|
||||
Variadic constructor arguments for `emplace` are only used when both
|
||||
rvalue references and variadic template parameters are available.
|
||||
Otherwise `emplace` can only take up to 10 constructor arguments.
|
||||
|
||||
== Open-addressing containers: unordered_flat_set, unordered_flat_map
|
||||
|
||||
The C++ standard does not currently provide any open-addressing container
|
||||
specification to adhere to, so `boost::unordered_flat_set` and
|
||||
`boost::unordered_flat_map` take inspiration from `std::unordered_set` and
|
||||
`std::unordered_map`, respectively, and depart from their interface where
|
||||
convenient or as dictated by their internal data structure, which is
|
||||
radically different from that imposed by the standard (closed addressing, node based).
|
||||
|
||||
`unordered_flat_set` and `unordered_flat_map` only work with reasonably
|
||||
compliant C++11 (or later) compilers. Language-level features such as move semantics
|
||||
and variadic template parameters are then not emulated.
|
||||
`unordered_flat_set` and `unordered_flat_map` are fully https://en.cppreference.com/w/cpp/named_req/AllocatorAwareContainer[AllocatorAware^].
|
||||
|
||||
The main differences with C++ unordered associative containers are:
|
||||
|
||||
* `value_type` must be move-constructible.
|
||||
* Pointer stability is not kept under rehashing.
|
||||
* `begin()` is not constant-time.
|
||||
* `erase(iterator)` returns `void` instead of an iterator to the following element.
|
||||
* There is no API for bucket handling (except `bucket_count`) or node extraction/insertion.
|
||||
* The maximum load factor of the container is managed internally and can't be set by the user. The maximum load,
|
||||
exposed through the public function `max_load`, may not increase monotonically with the number of erasures.
|
||||
|
||||
|
@ -11,4 +11,8 @@ Copyright (C) 2005-2008 Daniel James
|
||||
|
||||
Copyright (C) 2022 Christian Mazakas
|
||||
|
||||
Copyright (C) 2022 Joaquín M López Muñoz
|
||||
|
||||
Copyright (C) 2022 Peter Dimov
|
||||
|
||||
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
|
||||
|
@ -29,14 +29,14 @@ struct hash_is_avalanching;
|
||||
|
||||
A hash function is said to have the _avalanching property_ if small changes in the input translate to
|
||||
large changes in the returned hash code —ideally, flipping one bit in the representation of
|
||||
the input value results in each bit of the hash code flipping with probability 50%. This property is
|
||||
critical for the proper behavior of open-addressing hash containers.
|
||||
the input value results in each bit of the hash code flipping with probability 50%. Approaching
|
||||
this property is critical for the proper behavior of open-addressing hash containers.
|
||||
|
||||
`hash_is_avalanching<Hash>` derives from `std::true_type` if `Hash::is_avalanching` is a valid type,
|
||||
and derives from `std::false_type` otherwise.
|
||||
`hash_is_avalanching<Hash>::value` is `true` if `Hash::is_avalanching` is a valid type,
|
||||
and `false` otherwise.
|
||||
Users can then declare a hash function `Hash` as avalanching either by embedding an `is_avalanching` typedef
|
||||
into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to derive from
|
||||
`std::true_type`.
|
||||
into the definition of `Hash`, or directly by specializing `hash_is_avalanching<Hash>` to a class with
|
||||
an embedded compile-time constant `value` set to `true`.
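
The two ways of marking a hash function as avalanching can be sketched with a locally defined trait. The `sketch` namespace below is a hypothetical re-creation of the detection logic described above, not the library's own code, and `marked_hash`, `plain_hash` and `external_hash` are made-up hash types whose bodies are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {

// C++11-compatible equivalent of std::void_t.
template <class...> struct make_void { using type = void; };
template <class... Ts> using void_t = typename make_void<Ts...>::type;

// value is false unless Hash::is_avalanching names a valid type.
template <class Hash, class = void>
struct hash_is_avalanching : std::false_type {};

template <class Hash>
struct hash_is_avalanching<Hash, void_t<typename Hash::is_avalanching>>
    : std::true_type {};

}  // namespace sketch

// Option 1: embed an is_avalanching typedef in the hash function.
struct marked_hash {
    using is_avalanching = void;
    std::size_t operator()(int x) const noexcept { return static_cast<std::size_t>(x); }
};

// No marker: treated as non-avalanching.
struct plain_hash {
    std::size_t operator()(int x) const noexcept { return static_cast<std::size_t>(x); }
};

// Option 2: specialize the trait to a class with a compile-time constant `value`.
struct external_hash {
    std::size_t operator()(int x) const noexcept { return static_cast<std::size_t>(x); }
};

namespace sketch {
template <>
struct hash_is_avalanching<external_hash> {
    static constexpr bool value = true;
};
}  // namespace sketch
```

The second option is useful when the hash type comes from code that cannot be modified.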
|
||||
|
||||
xref:unordered_flat_set[`boost::unordered_flat_set`] and xref:unordered_flat_map[`boost::unordered_flat_map`]
|
||||
use the provided hash function `Hash` as-is if `hash_is_avalanching<Hash>::value` is `true`; otherwise, they
|
||||
|
@ -18,12 +18,12 @@ or isn't practical. In contrast, a hash table only needs an equality function
|
||||
and a hash function for the key.
|
||||
|
||||
With this in mind, unordered associative containers were added to the {cpp}
|
||||
standard. This is an implementation of the containers described in {cpp}11,
|
||||
standard. Boost.Unordered provides an implementation of the containers described in {cpp}11,
|
||||
with some <<compliance,deviations from the standard>> in
|
||||
order to work with non-{cpp}11 compilers and libraries.
|
||||
|
||||
`unordered_set` and `unordered_multiset` are defined in the header
|
||||
`<boost/unordered_set.hpp>`
|
||||
`<boost/unordered/unordered_set.hpp>`
|
||||
[source,c++]
|
||||
----
|
||||
namespace boost {
|
||||
@ -44,7 +44,7 @@ namespace boost {
|
||||
----
|
||||
|
||||
`unordered_map` and `unordered_multimap` are defined in the header
|
||||
`<boost/unordered_map.hpp>`
|
||||
`<boost/unordered/unordered_map.hpp>`
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
@ -65,10 +65,51 @@ namespace boost {
|
||||
}
|
||||
----
|
||||
|
||||
When using Boost.TR1, these classes are included from `<unordered_set>` and
|
||||
`<unordered_map>`, with the classes added to the `std::tr1` namespace.
|
||||
These containers, and all other implementations of standard unordered associative
|
||||
containers, use an approach to their internal data structure design called
|
||||
*closed addressing*. Starting in Boost 1.81, Boost.Unordered also provides containers
|
||||
`boost::unordered_flat_set` and `boost::unordered_flat_map`, which use a
|
||||
different data structure strategy commonly known as *open addressing* and depart in
|
||||
a small number of ways from the standard so as to offer much better performance
|
||||
in exchange (more than 2 times faster in typical scenarios):
|
||||
|
||||
The containers are used in a similar manner to the normal associative
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
// #include <boost/unordered/unordered_flat_set.hpp>
|
||||
//
|
||||
// Note: no multiset version
|
||||
|
||||
namespace boost {
|
||||
template <
|
||||
class Key,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<Key> >
|
||||
class unordered_flat_set;
|
||||
}
|
||||
----
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
// #include <boost/unordered/unordered_flat_map.hpp>
|
||||
//
|
||||
// Note: no multimap version
|
||||
|
||||
namespace boost {
|
||||
template <
|
||||
class Key, class Mapped,
|
||||
class Hash = boost::hash<Key>,
|
||||
class Pred = std::equal_to<Key>,
|
||||
class Alloc = std::allocator<std::pair<Key const, Mapped> > >
|
||||
class unordered_flat_map;
|
||||
}
|
||||
----
|
||||
|
||||
`boost::unordered_flat_set` and `boost::unordered_flat_map` require a
|
||||
reasonably compliant C++11 compiler.
|
||||
|
||||
Boost.Unordered containers are used in a similar manner to the normal associative
|
||||
containers:
|
||||
|
||||
[source,cpp]
|
||||
@ -87,7 +128,7 @@ But since the elements aren't ordered, the output of:
|
||||
|
||||
[source,c++]
|
||||
----
|
||||
BOOST_FOREACH(map::value_type i, x) {
|
||||
for(const map::value_type& i: x) {
|
||||
std::cout<<i.first<<","<<i.second<<"\n";
|
||||
}
|
||||
----
|
||||
|
@ -4,15 +4,17 @@
|
||||
|
||||
= Implementation Rationale
|
||||
|
||||
The intent of this library is to implement the unordered
|
||||
containers in the standard, so the interface was fixed. But there are
|
||||
== boost::unordered_[multi]set and boost::unordered_[multi]map
|
||||
|
||||
These containers adhere to the standard requirements for unordered associative
|
||||
containers, so the interface was fixed. But there are
|
||||
still some implementation decisions to make. The priorities are
|
||||
conformance to the standard and portability.
|
||||
|
||||
The http://en.wikipedia.org/wiki/Hash_table[Wikipedia article on hash tables^]
|
||||
has a good summary of the implementation issues for hash tables in general.
|
||||
|
||||
== Data Structure
|
||||
=== Data Structure
|
||||
|
||||
By specifying an interface for accessing the buckets of the container the
|
||||
standard pretty much requires that the hash table uses chained addressing.
|
||||
@ -37,7 +39,7 @@ bucket but there are some serious problems with this:
|
||||
|
||||
So chained addressing is used.
|
||||
|
||||
== Number of Buckets
|
||||
=== Number of Buckets
|
||||
|
||||
There are two popular methods for choosing the number of buckets in a hash
|
||||
table. One is to have a prime number of buckets, another is to use a power
|
||||
@ -70,3 +72,44 @@ distribution.
|
||||
Since release 1.80.0, prime numbers are chosen for the number of buckets in
|
||||
tandem with sophisticated modulo arithmetic. This removes the need for "mixing"
|
||||
the result of the user's hash function as was used for release 1.79.0.
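
To see why a prime bucket count tolerates weak hash functions better than a power of two, consider the sketch below. It uses plain `%` and a bit mask; it is not the optimized modulo arithmetic mentioned above, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Position by remainder: every bit of the hash influences the result.
inline std::size_t bucket_by_modulo(std::size_t hash, std::size_t prime_count) {
    return hash % prime_count;
}

// Position by mask: only the low bits of the hash are used.
inline std::size_t bucket_by_mask(std::size_t hash, std::size_t pow2_count) {
    assert(pow2_count != 0 && (pow2_count & (pow2_count - 1)) == 0);
    return hash & (pow2_count - 1);
}

// Number of distinct buckets hit by the hashes 0, stride, 2*stride, ...
// (n values in total), using either the mask or the modulo strategy.
inline std::size_t distinct_buckets(bool use_mask, std::size_t bucket_count,
                                    std::size_t stride, std::size_t n) {
    std::set<std::size_t> seen;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t h = i * stride;
        seen.insert(use_mask ? bucket_by_mask(h, bucket_count)
                             : bucket_by_modulo(h, bucket_count));
    }
    return seen.size();
}
```

With hash values that are all multiples of 16, a 16-bucket masked table puts every element in one bucket, while a 17-bucket table spreads them out.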
|
||||
|
||||
== boost::unordered_flat_set and boost::unordered_flat_map
|
||||
|
||||
The C++ standard specification of unordered associative containers imposes
|
||||
severe limitations on permissible implementations, the most important being
|
||||
that closed addressing is implicitly assumed. Slightly relaxing this specification
|
||||
opens up the possibility of providing container variations taking full
|
||||
advantage of open-addressing techniques.
|
||||
|
||||
The design of `boost::unordered_flat_set` and `boost::unordered_flat_map` has been
|
||||
guided by Peter Dimov's https://pdimov.github.io/articles/unordered_dev_plan.html[Development Plan for Boost.Unordered^].
|
||||
We discuss here the most relevant principles.
|
||||
|
||||
=== Hash function
|
||||
|
||||
Given its rich functionality and cross-platform interoperability,
|
||||
`boost::hash` remains the default hash function of `boost::unordered_flat_set` and `boost::unordered_flat_map`.
|
||||
As it happens, `boost::hash` for integral and other basic types does not provide
|
||||
the good statistical properties required by open addressing; to cope with this,
|
||||
we implement a post-mixing stage:
|
||||
|
||||
* 64-bit architectures: we use the `xmx` function defined in
|
||||
Jon Maiga's http://jonkagstrom.com/bit-mixer-construction/index.html[The construct of a bit mixer^].
|
||||
* 32-bit architectures: the mixer used was selected from a set generated with https://github.com/skeeto/hash-prospector[Hash Function Prospector^]
|
||||
as the best overall performer in our internal benchmarks. The score assigned by Hash Function Prospector is 333.7934929677524.
|
||||
|
||||
When using a hash function directly suitable for open addressing, post-mixing can be opted out of via a dedicated <<hash_traits_hash_is_avalanching,`hash_is_avalanching`>> trait.
|
||||
`boost::hash` specializations for string types are marked as avalanching.
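
A post-mixing stage of the xor-shift, multiply, xor-shift kind can be sketched as follows. The shift amounts and the multiplier are illustrative assumptions and should not be read as the exact constants used by Boost.Unordered; `xmx_like_mix`, `identity_hash` and `post_mixed_hash` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// xor-shift, multiply, xor-shift: each step is invertible, so distinct
// inputs always produce distinct outputs.
inline std::uint64_t xmx_like_mix(std::uint64_t x) noexcept {
    x ^= x >> 23;
    x *= 0xff51afd7ed558ccdULL;  // odd multiplier (illustrative choice)
    x ^= x >> 23;
    return x;
}

// A weak hash of the kind commonly provided for integral types.
struct identity_hash {
    std::uint64_t operator()(std::uint64_t x) const noexcept { return x; }
};

// Wraps a hash function and applies the mixer to its result.
template <class Hash>
struct post_mixed_hash {
    Hash h;
    std::uint64_t operator()(std::uint64_t x) const noexcept {
        return xmx_like_mix(h(x));
    }
};
```

Small consecutive inputs, which differ only in their low bits under `identity_hash`, come out of the wrapper with their high bits affected as well.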
|
||||
|
||||
=== Platform interoperability
|
||||
|
||||
The observable behavior of `boost::unordered_flat_set` and `boost::unordered_flat_map` is deterministically
|
||||
identical across different compilers as long as their ``std::size_t``s are the same size and the user-provided
|
||||
hash function and equality predicate are also interoperable
|
||||
—this includes elements being ordered in exactly the same way for the same sequence of
|
||||
operations.
|
||||
|
||||
Although the implementation internally uses SIMD technologies, such as https://en.wikipedia.org/wiki/SSE2[SSE2^]
|
||||
and https://en.wikipedia.org/wiki/ARM_architecture_family#Advanced_SIMD_(NEON)[Neon^], when available,
|
||||
this does not affect interoperability. For instance, the behavior is the same
|
||||
for Visual Studio on an Intel CPU with SSE2 in x64 and for GCC on an IBM s390x without any supported SIMD technology.
|
||||
|