added implementation description for cfoa

This commit is contained in:
joaquintides
2023-05-13 19:29:41 +02:00
parent 69ee0039e0
commit 48f703132e
2 changed files with 48 additions and 0 deletions

BIN doc/diagrams/cfoa.png Normal file (9.2 KiB)

@@ -313,3 +313,51 @@ given in an
https://bannalia.blogspot.com/2022/11/inside-boostunorderedflatmap.html[external article].
For more information on implementation rationale, read the
xref:#rationale_open_addresing_containers[corresponding section].

== Concurrent Open Addressing Implementation

`boost::concurrent_flat_map` uses the basic
xref:#buckets_open_addressing_implementation[open-addressing layout] described above,
augmented with synchronization mechanisms.

[#img-cfoa-layout]
.Concurrent open-addressing layout used by Boost.Unordered.
image::cfoa.png[align=center]

Two levels of synchronization are used:

* Container level: A read-write mutex is used to control access from any operation
to the container. Typically, such access is in read mode (that is, concurrent) even
for modifying operations, so for most practical purposes there is no thread
contention at this level. Access is only in write mode (blocking) when rehashing or
performing container-wide operations such as swapping or assignment.
* Group level: Each 15-slot group is equipped with an 8-byte word,
sketched in code after this list, containing:
** A read-write spinlock for synchronized access to any element in the group.
** An atomic _insertion counter_ used for optimistic insertion as described
below.
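
As a rough illustration, such a group access word might be modeled as
follows. The struct name, bit layout and memory orderings are assumptions
for exposition only, not Boost.Unordered's actual definitions:

[source,c++]
----
#include <atomic>
#include <cstdint>

// Hypothetical 8-byte group access word: a 32-bit read-write spinlock
// plus a 32-bit atomic insertion counter. Illustrative only.
struct group_access
{
  static constexpr std::uint32_t writer_bit = 0x80000000u;

  std::atomic<std::uint32_t> mutex{0};          // bit 31: writer,
                                                // bits 0..30: reader count
  std::atomic<std::uint32_t> insert_counter{0}; // optimistic insertion counter

  void lock_shared()
  {
    for(;;){
      std::uint32_t m = mutex.load(std::memory_order_relaxed);
      if(!(m & writer_bit) &&
         mutex.compare_exchange_weak(m, m + 1, std::memory_order_acquire))
        return;
      // spin; a real implementation would pause/yield here
    }
  }

  void unlock_shared() { mutex.fetch_sub(1, std::memory_order_release); }

  void lock()
  {
    for(;;){
      std::uint32_t expected = 0; // no readers, no writer
      if(mutex.compare_exchange_weak(
           expected, writer_bit, std::memory_order_acquire))
        return;
    }
  }

  void unlock() { mutex.store(0, std::memory_order_release); }
};

static_assert(sizeof(group_access) == 8, "two 32-bit words, 8 bytes total");
----
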
By using atomic operations to access the group metadata, lookup is (group-level)
lock-free up to the point where an actual comparison needs to be done with an element
that has been previously SIMD-matched: only then is the group's spinlock used.
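
The following sketch illustrates this lookup flow for a single group;
`Group`, `simd_match`, `is_occupied`, `key_at` and the `access` member are
hypothetical stand-ins for the real internals:

[source,c++]
----
#include <bit>
#include <cstdint>

// Hypothetical per-group lookup: metadata is read atomically, and the
// group spinlock is only taken to compare a SIMD-matched element.
template<typename Group, typename Key, typename Pred>
bool find_in_group(Group& g, const Key& k, std::uint8_t reduced_hash, Pred pred)
{
  // Metadata is inspected with atomic reads only: no lock taken yet.
  std::uint32_t mask = g.simd_match(reduced_hash); // bitmask of candidate slots
  while(mask){
    int n = std::countr_zero(mask); // position of next candidate slot
    // Only to compare the actual element is the group's spinlock
    // acquired, in shared (read) mode.
    g.access.lock_shared();
    bool found = g.is_occupied(n) && pred(k, g.key_at(n));
    g.access.unlock_shared();
    if(found) return true;
    mask &= mask - 1; // clear lowest set bit, move to next candidate
  }
  return false;
}
----
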
Insertion uses the following _optimistic algorithm_ (sketched in code after
the steps):

* The value of the insertion counter for the initial group in the probe
sequence is locally recorded (let's call this value `c0`).
* Lookup proceeds as described above. If it finds no equivalent element,
the search for an available slot for insertion successively locks/unlocks
each group in the probing sequence.
* When an available slot is located, it is preemptively occupied (its
reduced hash value is set) and the insertion counter is atomically
incremented: if no other thread has incremented the counter during the
whole operation (which is checked by comparing with `c0`), the insertion
is completed; otherwise, we roll back and start over.
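
Putting the steps together, a sketch of the loop might look as follows;
`Map` and its probe/group/slot operations are hypothetical stand-ins, not
the library's actual interface:

[source,c++]
----
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical optimistic insertion loop over stand-in internals.
template<typename Map, typename Value>
void optimistic_insert(Map& m, const Value& v)
{
  std::size_t hash = m.hash(v);
  for(;;){ // restart here whenever the counter check fails
    auto& g0 = m.group_for(hash); // initial group in the probe sequence
    std::uint32_t c0 =
      g0.access.insert_counter.load(std::memory_order_acquire);

    if(m.find(hash, v)) return; // an equivalent element already exists

    // Walk the probe sequence locking/unlocking each group in turn.
    for(auto pos = m.probe_begin(hash);; pos = m.probe_next(pos)){
      auto& g = m.group_at(pos);
      g.access.lock();
      int n = g.match_available(); // index of a free slot, or -1
      if(n >= 0){
        g.set_reduced_hash(n, hash); // preemptively occupy the slot
        std::uint32_t c = g0.access.insert_counter.fetch_add(
          1, std::memory_order_acq_rel);
        if(c == c0){ // nobody interfered: commit the insertion
          g.construct_element(n, v);
          g.access.unlock();
          return;
        }
        g.reset_slot(n); // another thread got in the way: roll back
        g.access.unlock();
        break;           // ...and start over
      }
      g.access.unlock();
    }
  }
}
----
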
This algorithm has very low contention at both the lookup and
insertion phases in exchange for the possibility that computations have
to be started over if another thread interferes in the process by
performing a successful insertion beginning at the same group. In
practice, the start-over frequency is extremely small, measured in the range
of parts per million in some of our benchmarks.