unordered/doc/roadmap.md

# Refactoring Roadmap

[Proof of concept](https://github.com/joaquintides/fca_unordered) implementation for a fast closed-addressing implementation.

## Plan of Refactoring

* remove `ptr_node` and `ptr_bucket`
* see if the code can survive a lack of the `extra_node` or maybe we hard-code it in
* implement bucket groups as they are in `fca` but don't use them directly yet, add alongside the `buckets_` data member in `struct table`
* try to remove `bucket_info_` from the node structure (breaks all call-sites that use `get_bucket()` and dependents)
* make sure `fca` can successfully handle multi-variants at this stage + supports mutable iterators for `map`/`multimap`
* do a hard-break:
    * update code to no longer use one single linked list across all buckets (each bucket contains its own unique list)
    * integrate the `bucket_group<Node>` structure into the `table` (update iterator call-sites to include `bucket_iterator`s)

Blockers:
* how to handle `multi` variants with new `fca` prototype

## Implementation Differences

### Unordered

### Node Type

Bullet Points:
* reify node type into a single one
* come up with implementation for multi- variants
* code that touches `get_bucket()` and `*_in_group()` member functions may need updating

There are two node types in Unordered, `struct node` and `struct ptr_node`, and the node type is selected conditionally based on the Allocator's pointer type:
```c++
template <typename A, typename T, typename NodePtr, typename BucketPtr>
struct pick_node2
{
  typedef boost::unordered::detail::node<A, T> node;
  // ...
};

template <typename A, typename T>
struct pick_node2<A, T, boost::unordered::detail::ptr_node<T>*,
  boost::unordered::detail::ptr_bucket*>
{ 
  typedef boost::unordered::detail::ptr_node<T> node;
  // ...
};

template <typename A, typename T> struct pick_node
{
  typedef typename boost::remove_const<T>::type nonconst;

  typedef boost::unordered::detail::allocator_traits<
    typename boost::unordered::detail::rebind_wrap<A,
      boost::unordered::detail::ptr_node<nonconst> >::type>
    tentative_node_traits;

  typedef boost::unordered::detail::allocator_traits<
    typename boost::unordered::detail::rebind_wrap<A,
      boost::unordered::detail::ptr_bucket>::type>
    tentative_bucket_traits;

  typedef pick_node2<A, nonconst, typename tentative_node_traits::pointer,
    typename tentative_bucket_traits::pointer>
    pick;

  typedef typename pick::node node;
  typedef typename pick::bucket bucket;
  typedef typename pick::link_pointer link_pointer;
};
```

The node types are identical in terms of interface and the only difference is that `node` is chosen when the Allocator uses fancy pointers and `ptr_node` is chosen when the Allocator's pointer type is `T*`.

Nodes in Unorderd store `bucket_info_`:
```cpp
template <typename A, typename T>
struct node : boost::unordered::detail::value_base<T>
{
  link_pointer next_;
  std::size_t bucket_info_;
  node() : next_(), bucket_info_(0) {}
  // ...
};
```

`bucket_info_` maps each node back to its corresponding bucket via the member function:
```cpp
std::size_t get_bucket() const
{
  return bucket_info_ & ((std::size_t)-1 >> 1);
}
```

`bucket_info_` is also used to demarcate the start of equivalent nodes in the containers via:
```cpp
// Note that nodes start out as the first in their group, as `bucket_info_` defaults to 0.
std::size_t is_first_in_group() const
{ return !(bucket_info_ & ~((std::size_t)-1 >> 1)); }

void set_first_in_group()
{ bucket_info_ = bucket_info_ & ((std::size_t)-1 >> 1); }

void reset_first_in_group()
{ bucket_info_ = bucket_info_ | ~((std::size_t)-1 >> 1); }
```

A goal of refactoring is to simply have one node type:
```cpp
template<class T>
struct node {
  node *next;
  T    value;
};
```
that is used unconditionally. This also requires updating the code that touches the `bucket_info_` along with the code that that touches the `*_in_group()` member functions.

### Bucket Type

Bullet points:
* reify bucket structure into a single one
* figure out how to add `bucket_group`s to the table struct

Buckets are similar to nodes in that there are two variations: `template<class NodePointer> struct bucket` and `struct ptr_bucket`.

The buckets exist to contain a pointer to a node, however they contain an `enum { extra_node = true };` or `enum { extra_node = false }` to determine whether or not the code should explicitly allocate a default constructed node whose address assigned as the dummy node at the end of the bucket array.

`extra_node` is used in the creation and deletion of the bucket array but it is not inherently clear what its intended purpose is.

### Iterators

Iterators are currently templated on the type of Node they store. Because `fca` constructs iterators with two arguments, all the call-sites that instantiate iterators will need to be updated but this a straight-forward mechanical change.

Iterators are selected, as of now, via the `detail::map` and `detail::set` class templates.

For example, for `unordered_map`, `iterator` is defined as:
```cpp
typedef boost::unordered::detail::map<A, K, T, H, P> types;
typedef typename types::table table;
typedef typename table::iterator iterator;
```

The iterator is a member typedef of the `table` which is `types::table`. Examining `types` (aka `detail::map<...>`), we see:
```cpp
template <typename A, typename K, typename M, typename H, typename P>
struct map {
  // ...
  typedef boost::unordered::detail::table<types> table;
  // ...
};
```

Examining the `detail::table<types>` struct, we see:
```cpp
template <typename Types>
struct table {
  // ...
  typedef typename Types::iterator iterator;
  // ...
}
```

Collapsing all of this, we see that our iterator types are defined here:
```cpp
template <typename A, typename K, typename M, typename H, typename P>
struct map
{
  // ...
  typedef boost::unordered::detail::pick_node<A, value_type> pick;
  typedef typename pick::node node;

  typedef boost::unordered::iterator_detail::iterator<node> iterator;
  typedef boost::unordered::iterator_detail::c_iterator<node> c_iterator;
  typedef boost::unordered::iterator_detail::l_iterator<node> l_iterator;
  typedef boost::unordered::iterator_detail::cl_iterator<node>
    cl_iterator;
  // ...
};
```

This is similarly designed for `detail::set`:
```cpp
typedef boost::unordered::iterator_detail::c_iterator<node> iterator;
typedef boost::unordered::iterator_detail::c_iterator<node> c_iterator;
typedef boost::unordered::iterator_detail::cl_iterator<node> l_iterator;
typedef boost::unordered::iterator_detail::cl_iterator<node>
  cl_iterator;
```

The only difference here is that `set::iterator` is always a `c_iterator`, a `const_iterator` type.
Update documentation for new FCA implementation 2022-06-01 11:45:56 -07:00			`# Refactoring Roadmap`

			`[Proof of concept](https://github.com/joaquintides/fca_unordered) implementation for a fast closed-addressing implementation.`

			`## Plan of Refactoring`

			* remove `ptr_node` and `ptr_bucket`
			* see if the code can survive a lack of the `extra_node` or maybe we hard-code it in
			* implement bucket groups as they are in `fca` but don't use them directly yet, add alongside the `buckets_` data member in `struct table`
			* try to remove `bucket_info_` from the node structure (breaks all call-sites that use `get_bucket()` and dependents)
			* make sure `fca` can successfully handle multi-variants at this stage + supports mutable iterators for `map`/`multimap`
			`* do a hard-break:`
			`* update code to no longer use one single linked list across all buckets (each bucket contains its own unique list)`
			* integrate the `bucket_group<Node>` structure into the `table` (update iterator call-sites to include `bucket_iterator`s)

			`Blockers:`
			* how to handle `multi` variants with new `fca` prototype

			`## Implementation Differences`

			`### Unordered`

			`### Node Type`

			`Bullet Points:`
			`* reify node type into a single one`
			`* come up with implementation for multi- variants`
			* code that touches `get_bucket()` and `*_in_group()` member functions may need updating

			There are two node types in Unordered, `struct node` and `struct ptr_node`, and the node type is selected conditionally based on the Allocator's pointer type:
			```c++
			`template <typename A, typename T, typename NodePtr, typename BucketPtr>`
			`struct pick_node2`
			`{`
			`typedef boost::unordered::detail::node<A, T> node;`
			`// ...`
			`};`

			`template <typename A, typename T>`
			`struct pick_node2<A, T, boost::unordered::detail::ptr_node<T>*,`
			`boost::unordered::detail::ptr_bucket*>`
			`{`
			`typedef boost::unordered::detail::ptr_node<T> node;`
			`// ...`
			`};`

			`template <typename A, typename T> struct pick_node`
			`{`
			`typedef typename boost::remove_const<T>::type nonconst;`

			`typedef boost::unordered::detail::allocator_traits<`
			`typename boost::unordered::detail::rebind_wrap<A,`
			`boost::unordered::detail::ptr_node<nonconst> >::type>`
			`tentative_node_traits;`

			`typedef boost::unordered::detail::allocator_traits<`
			`typename boost::unordered::detail::rebind_wrap<A,`
			`boost::unordered::detail::ptr_bucket>::type>`
			`tentative_bucket_traits;`

			`typedef pick_node2<A, nonconst, typename tentative_node_traits::pointer,`
			`typename tentative_bucket_traits::pointer>`
			`pick;`

			`typedef typename pick::node node;`
			`typedef typename pick::bucket bucket;`
			`typedef typename pick::link_pointer link_pointer;`
			`};`
			```

			The node types are identical in terms of interface and the only difference is that `node` is chosen when the Allocator uses fancy pointers and `ptr_node` is chosen when the Allocator's pointer type is `T*`.

			Nodes in Unorderd store `bucket_info_`:
			```cpp
			`template <typename A, typename T>`
			`struct node : boost::unordered::detail::value_base<T>`
			`{`
			`link_pointer next_;`
			`std::size_t bucket_info_;`
			`node() : next_(), bucket_info_(0) {}`
			`// ...`
			`};`
			```

			`bucket_info_` maps each node back to its corresponding bucket via the member function:
			```cpp
			`std::size_t get_bucket() const`
			`{`
			`return bucket_info_ & ((std::size_t)-1 >> 1);`
			`}`
			```

			`bucket_info_` is also used to demarcate the start of equivalent nodes in the containers via:
			```cpp
			// Note that nodes start out as the first in their group, as `bucket_info_` defaults to 0.
			`std::size_t is_first_in_group() const`
			`{ return !(bucket_info_ & ~((std::size_t)-1 >> 1)); }`

			`void set_first_in_group()`
			`{ bucket_info_ = bucket_info_ & ((std::size_t)-1 >> 1); }`

			`void reset_first_in_group()`
			`{ bucket_info_ = bucket_info_ \| ~((std::size_t)-1 >> 1); }`
			```

			`A goal of refactoring is to simply have one node type:`
			```cpp
			`template<class T>`
			`struct node {`
			`node *next;`
			`T value;`
			`};`
			```
			that is used unconditionally. This also requires updating the code that touches the `bucket_info_` along with the code that that touches the `*_in_group()` member functions.

			`### Bucket Type`

			`Bullet points:`
			`* reify bucket structure into a single one`
			* figure out how to add `bucket_group`s to the table struct

			Buckets are similar to nodes in that there are two variations: `template<class NodePointer> struct bucket` and `struct ptr_bucket`.

			The buckets exist to contain a pointer to a node, however they contain an `enum { extra_node = true };` or `enum { extra_node = false }` to determine whether or not the code should explicitly allocate a default constructed node whose address assigned as the dummy node at the end of the bucket array.

			`extra_node` is used in the creation and deletion of the bucket array but it is not inherently clear what its intended purpose is.

			`### Iterators`

			Iterators are currently templated on the type of Node they store. Because `fca` constructs iterators with two arguments, all the call-sites that instantiate iterators will need to be updated but this a straight-forward mechanical change.

			Iterators are selected, as of now, via the `detail::map` and `detail::set` class templates.

			For example, for `unordered_map`, `iterator` is defined as:
			```cpp
			`typedef boost::unordered::detail::map<A, K, T, H, P> types;`
			`typedef typename types::table table;`
			`typedef typename table::iterator iterator;`
			```

			The iterator is a member typedef of the `table` which is `types::table`. Examining `types` (aka `detail::map<...>`), we see:
			```cpp
			`template <typename A, typename K, typename M, typename H, typename P>`
			`struct map {`
			`// ...`
			`typedef boost::unordered::detail::table<types> table;`
			`// ...`
			`};`
			```

			Examining the `detail::table<types>` struct, we see:
			```cpp
			`template <typename Types>`
			`struct table {`
			`// ...`
			`typedef typename Types::iterator iterator;`
			`// ...`
			`}`
			```

			`Collapsing all of this, we see that our iterator types are defined here:`
			```cpp
			`template <typename A, typename K, typename M, typename H, typename P>`
			`struct map`
			`{`
			`// ...`
			`typedef boost::unordered::detail::pick_node<A, value_type> pick;`
			`typedef typename pick::node node;`

			`typedef boost::unordered::iterator_detail::iterator<node> iterator;`
			`typedef boost::unordered::iterator_detail::c_iterator<node> c_iterator;`
			`typedef boost::unordered::iterator_detail::l_iterator<node> l_iterator;`
			`typedef boost::unordered::iterator_detail::cl_iterator<node>`
			`cl_iterator;`
			`// ...`
			`};`
			```

			This is similarly designed for `detail::set`:
			```cpp
			`typedef boost::unordered::iterator_detail::c_iterator<node> iterator;`
			`typedef boost::unordered::iterator_detail::c_iterator<node> c_iterator;`
			`typedef boost::unordered::iterator_detail::cl_iterator<node> l_iterator;`
			`typedef boost::unordered::iterator_detail::cl_iterator<node>`
			`cl_iterator;`
			```

			The only difference here is that `set::iterator` is always a `c_iterator`, a `const_iterator` type.