diff --git a/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..75389666 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice non-unique.png new file mode 100644 index 00000000..b40a5f47 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash non-unique 5.png b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash non-unique 5.png new file mode 100644 index 00000000..9aedb625 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash non-unique.png b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash non-unique.png new file mode 100644 index 00000000..3843b6de Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash non-unique.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash.png b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash.png new file mode 100644 index 00000000..7b74a09b Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice norehash.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice.png b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice.png new file mode 100644 index 
00000000..d7333431 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/running insertion.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..bf0bd7d2 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice non-unique.png new file mode 100644 index 00000000..dfa05ea8 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice.png b/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice.png new file mode 100644 index 00000000..510cff26 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered erasure.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..f1696a5c Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice non-unique.png new file mode 100644 index 00000000..cdd6094e Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered successful 
looukp.xlsx.practice.png b/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice.png new file mode 100644 index 00000000..6c9bdb1c Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered successful looukp.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..b15214ed Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique.png new file mode 100644 index 00000000..34a7a226 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice.png b/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice.png new file mode 100644 index 00000000..b595ba59 Binary files /dev/null and b/doc/diagrams/benchmarks/clang_libcpp/scattered unsuccessful looukp.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..d23363c5 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice non-unique.png new file mode 100644 index 00000000..9fe81db6 Binary files /dev/null and 
b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash non-unique 5.png b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash non-unique 5.png new file mode 100644 index 00000000..805832f8 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash non-unique.png b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash non-unique.png new file mode 100644 index 00000000..0804d890 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash non-unique.png differ diff --git a/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash.png b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash.png new file mode 100644 index 00000000..69009334 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice norehash.png differ diff --git a/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice.png b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice.png new file mode 100644 index 00000000..64dce253 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/running insertion.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..2c916c5a Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice non-unique.png new file mode 100644 index 00000000..33a719ef Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered 
erasure.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice.png b/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice.png new file mode 100644 index 00000000..e7109cc3 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered erasure.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..0c0d0442 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice non-unique.png new file mode 100644 index 00000000..2404d59a Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice.png b/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice.png new file mode 100644 index 00000000..aa2f2670 Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered successful looukp.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..7baa248e Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice non-unique.png new file mode 100644 index 00000000..1ada54ff Binary files /dev/null and 
b/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice.png b/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice.png new file mode 100644 index 00000000..bd2402de Binary files /dev/null and b/doc/diagrams/benchmarks/gcc/scattered unsuccessful looukp.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..b32e8cf2 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice non-unique.png new file mode 100644 index 00000000..c2df113c Binary files /dev/null and b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash non-unique 5.png b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash non-unique 5.png new file mode 100644 index 00000000..4f70d8b0 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash non-unique.png b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash non-unique.png new file mode 100644 index 00000000..f8a1710f Binary files /dev/null and b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash non-unique.png differ diff --git a/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash.png b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash.png new file mode 100644 index 00000000..f9407d3a Binary files /dev/null and 
b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice norehash.png differ diff --git a/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice.png b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice.png new file mode 100644 index 00000000..05b6ef77 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/running insertion.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..73d1703c Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice non-unique.png new file mode 100644 index 00000000..475e570f Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice.png b/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice.png new file mode 100644 index 00000000..2f2bc13c Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered erasure.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..df655dec Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice non-unique.png new file mode 100644 index 00000000..e9ae4f25 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice non-unique.png 
differ diff --git a/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice.png b/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice.png new file mode 100644 index 00000000..60e85ba5 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered successful looukp.xlsx.practice.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice non-unique 5.png b/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice non-unique 5.png new file mode 100644 index 00000000..6273e2bb Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice non-unique 5.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice non-unique.png b/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice non-unique.png new file mode 100644 index 00000000..2724d755 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice non-unique.png differ diff --git a/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice.png b/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice.png new file mode 100644 index 00000000..31be4e90 Binary files /dev/null and b/doc/diagrams/benchmarks/vs/scattered unsuccessful looukp.xlsx.practice.png differ diff --git a/doc/diagrams/bucket-groups.png b/doc/diagrams/bucket-groups.png new file mode 100644 index 00000000..d9c5e96d Binary files /dev/null and b/doc/diagrams/bucket-groups.png differ diff --git a/doc/diagrams/fca.png b/doc/diagrams/fca.png new file mode 100644 index 00000000..d1ecb63c Binary files /dev/null and b/doc/diagrams/fca.png differ diff --git a/doc/diagrams/singly-linked.png b/doc/diagrams/singly-linked.png new file mode 100644 index 00000000..ae3cf61a Binary files /dev/null and b/doc/diagrams/singly-linked.png differ diff --git a/doc/roadmap.md b/doc/roadmap.md new file mode 100644 index 00000000..88267547 --- 
/dev/null +++ b/doc/roadmap.md @@ -0,0 +1,188 @@ +# Refactoring Roadmap + +[Proof of concept](https://github.com/joaquintides/fca_unordered) implementation for a fast closed-addressing implementation. + +## Plan of Refactoring + +* remove `ptr_node` and `ptr_bucket` +* see if the code can survive a lack of the `extra_node` or maybe we hard-code it in +* implement bucket groups as they are in `fca` but don't use them directly yet, add alongside the `buckets_` data member in `struct table` +* try to remove `bucket_info_` from the node structure (breaks all call-sites that use `get_bucket()` and dependents) +* make sure `fca` can successfully handle multi-variants at this stage + supports mutable iterators for `map`/`multimap` +* do a hard-break: + * update code to no longer use one single linked list across all buckets (each bucket contains its own unique list) + * integrate the `bucket_group` structure into the `table` (update iterator call-sites to include `bucket_iterator`s) + +Blockers: +* how to handle `multi` variants with new `fca` prototype + +## Implementation Differences + +### Unordered + +### Node Type + +Bullet Points: +* reify node type into a single one +* come up with implementation for multi- variants +* code that touches `get_bucket()` and `*_in_group()` member functions may need updating + +There are two node types in Unordered, `struct node` and `struct ptr_node`, and the node type is selected conditionally based on the Allocator's pointer type: +```c++ +template +struct pick_node2 +{ + typedef boost::unordered::detail::node node; + // ... +}; + +template +struct pick_node2*, + boost::unordered::detail::ptr_bucket*> +{ + typedef boost::unordered::detail::ptr_node node; + // ... 
+}; + +template struct pick_node +{ + typedef typename boost::remove_const::type nonconst; + + typedef boost::unordered::detail::allocator_traits< + typename boost::unordered::detail::rebind_wrap >::type> + tentative_node_traits; + + typedef boost::unordered::detail::allocator_traits< + typename boost::unordered::detail::rebind_wrap::type> + tentative_bucket_traits; + + typedef pick_node2 + pick; + + typedef typename pick::node node; + typedef typename pick::bucket bucket; + typedef typename pick::link_pointer link_pointer; +}; +``` + +The node types are identical in terms of interface and the only difference is that `node` is chosen when the Allocator uses fancy pointers and `ptr_node` is chosen when the Allocator's pointer type is `T*`. + +Nodes in Unordered store `bucket_info_`: +```cpp +template +struct node : boost::unordered::detail::value_base +{ + link_pointer next_; + std::size_t bucket_info_; + node() : next_(), bucket_info_(0) {} + // ... +}; +``` + +`bucket_info_` maps each node back to its corresponding bucket via the member function: +```cpp +std::size_t get_bucket() const +{ + return bucket_info_ & ((std::size_t)-1 >> 1); +} +``` + +`bucket_info_` is also used to demarcate the start of equivalent nodes in the containers via: +```cpp +// Note that nodes start out as the first in their group, as `bucket_info_` defaults to 0. +std::size_t is_first_in_group() const +{ return !(bucket_info_ & ~((std::size_t)-1 >> 1)); } + +void set_first_in_group() +{ bucket_info_ = bucket_info_ & ((std::size_t)-1 >> 1); } + +void reset_first_in_group() +{ bucket_info_ = bucket_info_ | ~((std::size_t)-1 >> 1); } +``` + +A goal of refactoring is to simply have one node type: +```cpp +template +struct node { + node *next; + T value; +}; +``` +that is used unconditionally. This also requires updating the code that touches `bucket_info_` along with the code that touches the `*_in_group()` member functions.
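The bit layout behind `get_bucket()` and the `*_in_group()` functions can be looked at in isolation. The following is a minimal sketch, not the Boost.Unordered node itself: `bucket_info_sketch`, `set_bucket()`, `index_mask` and `not_first_flag` are names invented here for illustration, and only the masking expressions are taken from the snippets above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a node's `bucket_info_` member; it models only
// the bit layout described above, not the real node type.
struct bucket_info_sketch
{
  // Every bit except the most significant one: stores the bucket index.
  static constexpr std::size_t index_mask = static_cast<std::size_t>(-1) >> 1;
  // Only the most significant bit: set when the node is NOT first in its group.
  static constexpr std::size_t not_first_flag = ~index_mask;

  // Defaults to 0, so a fresh node counts as first in its group.
  std::size_t bucket_info_ = 0;

  std::size_t get_bucket() const { return bucket_info_ & index_mask; }

  // Hypothetical setter: stores the index while preserving the flag bit.
  void set_bucket(std::size_t idx)
  {
    assert((idx & not_first_flag) == 0); // the index must fit below the flag bit
    bucket_info_ = (bucket_info_ & not_first_flag) | idx;
  }

  bool is_first_in_group() const { return (bucket_info_ & not_first_flag) == 0; }
  void set_first_in_group() { bucket_info_ &= index_mask; }
  void reset_first_in_group() { bucket_info_ |= not_first_flag; }
};
```

Because the flag and the index share one word, any code that reads or writes either of them depends on `bucket_info_`, which is why removing the member touches both families of call-sites.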
+ +### Bucket Type + +Bullet points: +* reify bucket structure into a single one +* figure out how to add `bucket_group`s to the table struct + +Buckets are similar to nodes in that there are two variations: `template struct bucket` and `struct ptr_bucket`. + +The buckets exist to contain a pointer to a node; however, they also contain an `enum { extra_node = true };` or `enum { extra_node = false };` to determine whether or not the code should explicitly allocate a default-constructed node whose address is assigned as the dummy node at the end of the bucket array. + +`extra_node` is used in the creation and deletion of the bucket array but it is not clear what its intended purpose is. + +### Iterators + +Iterators are currently templated on the type of Node they store. Because `fca` constructs iterators with two arguments, all the call-sites that instantiate iterators will need to be updated, but this is a straightforward mechanical change. + +Iterators are selected, as of now, via the `detail::map` and `detail::set` class templates. + +For example, for `unordered_map`, `iterator` is defined as: +```cpp +typedef boost::unordered::detail::map types; +typedef typename types::table table; +typedef typename table::iterator iterator; +``` + +The iterator is a member typedef of the `table` which is `types::table`. Examining `types` (aka `detail::map<...>`), we see: +```cpp +template +struct map { + // ... + typedef boost::unordered::detail::table table; + // ... +}; +``` + +Examining the `detail::table` struct, we see: +```cpp +template +struct table { + // ... + typedef typename Types::iterator iterator; + // ... +}; +``` + +Collapsing all of this, we see that our iterator types are defined here: +```cpp +template +struct map +{ + // ... 
+ typedef boost::unordered::detail::pick_node pick; + typedef typename pick::node node; + + typedef boost::unordered::iterator_detail::iterator iterator; + typedef boost::unordered::iterator_detail::c_iterator c_iterator; + typedef boost::unordered::iterator_detail::l_iterator l_iterator; + typedef boost::unordered::iterator_detail::cl_iterator + cl_iterator; + // ... +}; +``` + +This is similarly designed for `detail::set`: +```cpp +typedef boost::unordered::iterator_detail::c_iterator iterator; +typedef boost::unordered::iterator_detail::c_iterator c_iterator; +typedef boost::unordered::iterator_detail::cl_iterator l_iterator; +typedef boost::unordered::iterator_detail::cl_iterator + cl_iterator; +``` + +The only difference here is that `set::iterator` is always a `c_iterator`, a `const_iterator` type. diff --git a/doc/unordered/buckets.adoc b/doc/unordered/buckets.adoc index 28cca9b9..fcaf516c 100644 --- a/doc/unordered/buckets.adoc +++ b/doc/unordered/buckets.adoc @@ -1,5 +1,6 @@ [#buckets] :idprefix: buckets_ +:imagesdir: ../diagrams = The Data Structure @@ -8,7 +9,7 @@ any number of elements. For example, the following diagram shows an <> on the `rehash` function. +== Fast Closed Addressing Implementation + +++++ + +++++ + +Boost.Unordered sports one of the fastest implementations of closed addressing, also commonly known as https://en.wikipedia.org/wiki/Hash_table#Separate_chaining[separate chaining]. An example figure representing the data structure is below: + +[#img-bucket-groups,.text-center] +.A simple bucket group approach +image::bucket-groups.png[align=center] + +An array of "buckets" is allocated and each bucket in turn points to its own individual linked list. This makes meeting the standard requirements of bucket iteration straight-forward. 
Unfortunately, iteration of the entire container is often slow using this layout as each bucket must be examined for occupancy, yielding a time complexity of `O(bucket_count() + size())` when the standard requires complexity to be `O(size())`. + +Canonical standard implementations will wind up looking like the diagram below: + +[.text-center] +.The canonical standard approach +image::singly-linked.png[align=center,link=../diagrams/singly-linked.png,window=_blank] + +It's worth noting that this approach is only used by pass:[libc++] and pass:[libstdc++]; the MSVC Dinkumware implementation uses a different one. A more detailed analysis of the standard containers can be found http://bannalia.blogspot.com/2013/10/implementation-of-c-unordered.html[here]. + +This unusually laid out data structure is chosen to make iteration of the entire container efficient by inter-connecting all of the nodes into a singly-linked list. One might also notice that buckets point to the node _before_ the start of the bucket's elements. This is done so that removing elements from the list can be done efficiently without introducing the need for a doubly-linked list. Unfortunately, this data structure introduces a guaranteed extra indirection. For example, to access the first element of a bucket, something like this must be done: + +```c++ +auto const idx = get_bucket_idx(hash_function(key)); +node* p = buckets[idx]; // first load +node* n = p->next; // second load +if (n && is_in_bucket(n, idx)) { + value_type const& v = *n; // third load + // ... +} +``` + +With a simple bucket group layout, this is all that must be done: +```c++ +auto const idx = get_bucket_idx(hash_function(key)); +node* n = buckets[idx]; // first load +if (n) { + value_type const& v = *n; // second load + // ... +} +``` + +In practice, the extra indirection can have a dramatic performance impact on common operations such as `insert`, `find` and `erase`. 
But to keep iteration of the container fast, Boost.Unordered introduces a novel data structure, a "bucket group". A bucket group is a fixed-width view of a subsection of the buckets array. It contains a bitmask (a `std::size_t`) which it uses to track occupancy of buckets and contains two pointers so that it can form a doubly-linked list with non-empty groups. An example diagram is below: + +[#img-fca-layout] +.The new layout used by Boost +image::fca.png[align=center] + +Thus container-wide iteration is turned into traversing the non-empty bucket groups (an operation with constant time complexity) which reduces the time complexity back to `O(size())`. In total, a bucket group is only 4 words in size and it views `sizeof(std::size_t) * CHAR_BIT` buckets, meaning that for all common implementations, there are only 4 bits of space overhead per bucket introduced by the bucket groups. + +For more information on implementation rationale, read the <>. + += Benchmarks + +All benchmarks were created using `unordered_set` (non-duplicate) and `unordered_multiset` (duplicate). The source code can be https://github.com/joaquintides/boost_unordered_benchmark[found here]. + +The insertion benchmarks insert `n` random values, where `n` is between 10,000 and 3 million. For the duplicated benchmarks, the same random values are repeated an average of 5 times. + +The erasure benchmarks erase all `n` elements in random order until the container is empty. + +The successful lookup benchmarks are done by looking up all `n` values, in their original insertion order. + +The unsuccessful lookup benchmarks look up `n` randomly generated integers produced with a different seed value. 
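The benchmark scenarios described above can be sketched roughly as follows. This is an illustrative outline only, not the code from the linked benchmark repository: `make_values`, `count_hits`, `erase_all` and `seconds` are hypothetical helpers, and the choice of `std::mt19937_64` and `std::chrono::steady_clock` is an assumption.

```c++
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical helper: `n` pseudo-random values from a generator seeded with `seed`.
// The same seed always yields the same sequence; a different seed gives the
// keys used for the unsuccessful lookup scenario.
std::vector<std::uint64_t> make_values(std::size_t n, std::uint64_t seed)
{
  std::mt19937_64 rng(seed);
  std::vector<std::uint64_t> values(n);
  for (auto& v : values) v = rng();
  return values;
}

// Hypothetical helper: looks up every key in order and returns how many were found.
std::size_t count_hits(std::unordered_set<std::uint64_t> const& s,
                       std::vector<std::uint64_t> const& keys)
{
  std::size_t hits = 0;
  for (auto k : keys) hits += s.count(k);
  return hits;
}

// Hypothetical helper: erases the given values in shuffled order. It assumes
// `s` holds nothing but these values, so it must end up empty.
void erase_all(std::unordered_set<std::uint64_t>& s,
               std::vector<std::uint64_t> values, std::mt19937_64& rng)
{
  std::shuffle(values.begin(), values.end(), rng);
  for (auto v : values) s.erase(v);
  assert(s.empty());
}

// Hypothetical helper: wall-clock seconds spent running `f`.
template <class F>
double seconds(F f)
{
  auto const start = std::chrono::steady_clock::now();
  f();
  std::chrono::duration<double> const d = std::chrono::steady_clock::now() - start;
  return d.count();
}
```

A run would time each phase separately, for example `seconds([&] { count_hits(s, values); })` for the successful lookup and the same call with `make_values(n, other_seed)` for the unsuccessful one.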
+ +== GCC 11 + libstdc++-v3 + +=== Insertion + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/gcc/running insertion.xlsx.practice.png[width=250,link=../diagrams/benchmarks/gcc/running insertion.xlsx.practice.png,window=_blank] +|image::benchmarks/gcc/running%20insertion.xlsx.practice non-unique.png[width=250,link=../diagrams/benchmarks/gcc/running%20insertion.xlsx.practice non-unique.png,window=_blank] +|image::benchmarks/gcc/running%20insertion.xlsx.practice non-unique 5.png[width=250,link=../diagrams/benchmarks/gcc/running%20insertion.xlsx.practice non-unique 5.png,window=_blank] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements + +max load factor 5 +|=== + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/gcc/running%20insertion.xlsx.practice norehash.png[width=250,link=../diagrams/benchmarks/gcc/running%20insertion.xlsx.practice norehash.png,window=_blank] +|image::benchmarks/gcc/running%20insertion.xlsx.practice norehash non-unique.png[width=250,link=../diagrams/benchmarks/gcc/running%20insertion.xlsx.practice norehash non-unique.png,window=_blank] +|image::benchmarks/gcc/running%20insertion.xlsx.practice norehash non-unique 5.png[width=250,link=../diagrams/benchmarks/gcc/running%20insertion.xlsx.practice norehash non-unique 5.png,window=_blank] + +h|non-duplicate elements, + +prior `reserve` +h|duplicate elements, + +prior `reserve` +h|duplicate elements, + +max load factor 5, + +prior `reserve` + +|=== + +=== Erasure + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/gcc/scattered%20erasure.xlsx.practice.png[width=250,link=../diagrams/benchmarks/gcc/scattered%20erasure.xlsx.practice.png,window=_blank] +|image::benchmarks/gcc/scattered%20erasure.xlsx.practice non-unique.png[width=250,link=../diagrams/benchmarks/gcc/scattered%20erasure.xlsx.practice non-unique.png,window=_blank] +|image::benchmarks/gcc/scattered%20erasure.xlsx.practice non-unique 
5.png[width=250,link=../diagrams/benchmarks/gcc/scattered%20erasure.xlsx.practice non-unique 5.png,window=_blank] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements + +max load factor 5 +|=== + +=== Successful Lookup + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/gcc/scattered%20successful%20looukp.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/gcc/scattered%20successful%20looukp.xlsx.practice.png] +|image::benchmarks/gcc/scattered%20successful%20looukp.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/gcc/scattered%20successful%20looukp.xlsx.practice non-unique.png] +|image::benchmarks/gcc/scattered%20successful%20looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/gcc/scattered%20successful%20looukp.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +=== Unsuccessful lookup + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/gcc/scattered%20unsuccessful%20looukp.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/gcc/scattered%20unsuccessful%20looukp.xlsx.practice.png] +|image::benchmarks/gcc/scattered%20unsuccessful%20looukp.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/gcc/scattered%20unsuccessful%20looukp.xlsx.practice non-unique.png] +|image::benchmarks/gcc/scattered%20unsuccessful%20looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/gcc/scattered%20unsuccessful%20looukp.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +== Clang 12 + libc++ + +=== Insertion + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/clang_libcpp/running%20insertion.xlsx.practice.png[width=250, 
window=_blank,link=../diagrams/benchmarks/clang_libcpp/running%20insertion.xlsx.practice.png] +|image::benchmarks/clang_libcpp/running%20insertion.xlsx.practice non-unique.png[width=250, window=_blank,link=../diagrams/benchmarks/clang_libcpp/running%20insertion.xlsx.practice non-unique.png] +|image::benchmarks/clang_libcpp/running%20insertion.xlsx.practice non-unique 5.png[width=250, window=_blank,link=../diagrams/benchmarks/clang_libcpp/running%20insertion.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/clang_libcpp/running%20insertion.xlsx.practice norehash.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/running%20insertion.xlsx.practice norehash.png] +|image::benchmarks/clang_libcpp/running%20insertion.xlsx.practice norehash non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/running%20insertion.xlsx.practice norehash non-unique.png] +|image::benchmarks/clang_libcpp/running%20insertion.xlsx.practice norehash non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/running%20insertion.xlsx.practice norehash non-unique 5.png] + +h|non-duplicate elements, + +prior `reserve` +h|duplicate elements, + +prior `reserve` +h|duplicate elements, + +max load factor 5, + +prior `reserve` + +|=== + +=== Erasure + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/clang_libcpp/scattered%20erasure.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20erasure.xlsx.practice.png] +|image::benchmarks/clang_libcpp/scattered%20erasure.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20erasure.xlsx.practice non-unique.png] +|image::benchmarks/clang_libcpp/scattered%20erasure.xlsx.practice non-unique 
5.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20erasure.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +=== Successful lookup + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/clang_libcpp/scattered%20successful%20looukp.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20successful%20looukp.xlsx.practice.png] +|image::benchmarks/clang_libcpp/scattered%20successful%20looukp.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20successful%20looukp.xlsx.practice non-unique.png] +|image::benchmarks/clang_libcpp/scattered%20successful%20looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20successful%20looukp.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +=== Unsuccessful lookup + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/clang_libcpp/scattered%20unsuccessful%20looukp.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20unsuccessful%20looukp.xlsx.practice.png] +|image::benchmarks/clang_libcpp/scattered%20unsuccessful%20looukp.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20unsuccessful%20looukp.xlsx.practice non-unique.png] +|image::benchmarks/clang_libcpp/scattered%20unsuccessful%20looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/clang_libcpp/scattered%20unsuccessful%20looukp.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +== Visual Studio 2019 + Dinkumware + +=== Insertion + +[caption=] +[cols="3*^.^a", frame=all, 
grid=all] +|=== + +|image::benchmarks/vs/running%20insertion.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/running%20insertion.xlsx.practice.png] +|image::benchmarks/vs/running%20insertion.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/running%20insertion.xlsx.practice non-unique.png] +|image::benchmarks/vs/running%20insertion.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/running%20insertion.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/vs/running%20insertion.xlsx.practice norehash.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/running%20insertion.xlsx.practice norehash.png] +|image::benchmarks/vs/running%20insertion.xlsx.practice norehash non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/running%20insertion.xlsx.practice norehash non-unique.png] +|image::benchmarks/vs/running%20insertion.xlsx.practice norehash non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/running%20insertion.xlsx.practice norehash non-unique 5.png] + +h|non-duplicate elements, + +prior `reserve` +h|duplicate elements, + +prior `reserve` +h|duplicate elements, + +max load factor 5, + +prior `reserve` + +|=== + +=== Erasure + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/vs/scattered%20erasure.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/scattered%20erasure.xlsx.practice.png] +|image::benchmarks/vs/scattered%20erasure.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/scattered%20erasure.xlsx.practice non-unique.png] +|image::benchmarks/vs/scattered%20erasure.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/scattered%20erasure.xlsx.practice 
non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +=== Successful lookup + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/vs/scattered%20successful%20looukp.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/benchmarks/vs/scattered%20successful%20looukp.xlsx.practice.png] +|image::benchmarks/vs/scattered%20successful%20looukp.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/benchmarks/vs/scattered%20successful%20looukp.xlsx.practice non-unique.png] +|image::benchmarks/vs/scattered%20successful%20looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/benchmarks/vs/scattered%20successful%20looukp.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== + +=== Unsuccessful lookup + +[caption=] +[cols="3*^.^a", frame=all, grid=all] +|=== + +|image::benchmarks/vs/scattered%20unsuccessful%20looukp.xlsx.practice.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/scattered%20unsuccessful%20looukp.xlsx.practice.png] +|image::benchmarks/vs/scattered%20unsuccessful%20looukp.xlsx.practice non-unique.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/scattered%20unsuccessful%20looukp.xlsx.practice non-unique.png] +|image::benchmarks/vs/scattered%20unsuccessful%20looukp.xlsx.practice non-unique 5.png[width=250,window=_blank,link=../diagrams/benchmarks/vs/scattered%20unsuccessful%20looukp.xlsx.practice non-unique 5.png] + +h|non-duplicate elements +h|duplicate elements +h|duplicate elements, + +max load factor 5 + +|=== diff --git a/doc/unordered/changes.adoc b/doc/unordered/changes.adoc index 79a28a51..f930e680 100644 --- a/doc/unordered/changes.adoc +++ b/doc/unordered/changes.adoc @@ -6,6 +6,12 @@ :github-pr-url: https://github.com/boostorg/unordered/pull :cpp: C++ +== Release 1.80.0 + +* Refactor 
internal implementation to be dramatically faster +* Fix long-standing bug where `final`-qualified Hasher and KeyEqual couldn't be + used + == Release 1.79.0 * Improved {cpp}20 support: diff --git a/doc/unordered/rationale.adoc b/doc/unordered/rationale.adoc index fe4a7707..43617a66 100644 --- a/doc/unordered/rationale.adoc +++ b/doc/unordered/rationale.adoc @@ -66,3 +66,7 @@ by using `(h * m) >> (w - k)`, where `h` is the hash value, `m` is the golden ratio multiplied by `2^w`, `w` is the word size (32 or 64), and `2^k` is the number of buckets. This provides a good compromise between speed and distribution. + +Since release 1.80.0, prime numbers are chosen for the number of buckets in +tandem with sophisticated modulo arithmetic. This removes the need for "mixing" +the result of the user's hash function as was used for release 1.79.0.
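
The two bucket-mapping strategies contrasted in the rationale hunk above can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: the function names `fibonacci_bucket` and `prime_bucket` are hypothetical, the constant is the commonly used 64-bit value derived from the golden ratio, and a plain `%` stands in for the "sophisticated modulo arithmetic", which the text does not describe in detail.

```cpp
#include <cstddef>
#include <cstdint>

// Release 1.79.0 style, per the rationale text: (h * m) >> (w - k),
// with w = 64 and 2^k buckets. Requires 1 <= k <= 63.
// Assumption: m is the usual 64-bit golden-ratio constant; the exact
// value used by the library is not given in the text.
inline std::size_t fibonacci_bucket(std::uint64_t h, unsigned k) {
    const std::uint64_t m = 0x9E3779B97F4A7C15ull;
    // The multiplication wraps modulo 2^64; the top k bits select the bucket.
    return static_cast<std::size_t>((h * m) >> (64 - k));
}

// Release 1.80.0 style, per the rationale text: a prime number of buckets.
// Assumption: plain '%' is a stand-in for the faster modulo arithmetic the
// text alludes to. Requires prime_bucket_count > 0.
inline std::size_t prime_bucket(std::uint64_t h, std::uint64_t prime_bucket_count) {
    return static_cast<std::size_t>(h % prime_bucket_count);
}
```

With a power-of-two bucket count, the multiplication acts as the "mixing" step that spreads the hash bits before the top `k` bits are taken. With a prime bucket count the remainder depends on all bits of `h`, which is why the text says no separate mixing of the user's hash result is needed.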