[/ Copyright 2006-2007 Daniel James.
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)]

[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an
[classref boost::unordered_set unordered_set] with 7 buckets containing 5
elements, `A`, `B`, `C`, `D` and `E` (this is just for illustration,
containers will typically have more buckets).

[$../../libs/unordered/doc/diagrams/buckets.png]

In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but it is referred to as the
key so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.

If at a later date the container wants to find an element, it just has to apply
the same process to the element's key to discover which bucket it is in. If the
hash function has worked well the elements will be evenly distributed amongst
the buckets, so the container will only have to examine a small number of
elements.

You can see in the diagram that `A` & `D` have been placed in the same bucket.
This means that when looking for these elements, or another element that would
be placed in the same bucket, up to 2 comparisons have to be made, making
searching slower. This is known as a collision. To keep things fast we try to
keep collisions to a minimum.

[table Methods for Accessing Buckets
    [[Method] [Description]]

    [
        [``size_type bucket_count() const``]
        [The number of buckets.]
    ]
    [
        [``size_type max_bucket_count() const``]
        [An upper bound on the number of buckets.]
    ]
    [
        [``size_type bucket_size(size_type n) const``]
        [The number of elements in bucket `n`.]
    ]
    [
        [``size_type bucket(key_type const& k) const``]
        [Returns the index of the bucket which would contain `k`.]
    ]
    [
        [``
local_iterator begin(size_type n);
local_iterator end(size_type n);
const_local_iterator begin(size_type n) const;
const_local_iterator end(size_type n) const;
        ``]
        [Return begin and end iterators for bucket `n`.]
    ]
]
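For illustration only, the following sketch (the container `words` and its
contents are hypothetical) uses these members to find out which bucket a key
would be placed in, and which elements currently share that bucket:

    #include <boost/unordered_set.hpp>
    #include <iostream>
    #include <string>

    int main()
    {
        boost::unordered_set<std::string> words;
        words.insert("A"); words.insert("B"); words.insert("C");
        words.insert("D"); words.insert("E");

        std::cout << "bucket count: " << words.bucket_count() << "\n";

        // Find the bucket that "A" would be placed in, then walk that
        // bucket with local iterators to see which elements share it.
        std::size_t n = words.bucket("A");
        std::cout << "bucket " << n << " holds " << words.bucket_size(n)
                  << " element(s):\n";
        for (boost::unordered_set<std::string>::const_local_iterator
                it = words.begin(n), end = words.end(n); it != end; ++it)
        {
            std::cout << "  " << *it << "\n";
        }
    }

The exact bucket indices and counts depend on the implementation's hash
function and bucket count, so the output will vary.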
[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number of
elements in each bucket will increase, causing performance to get worse. To
combat this the containers increase the bucket count as elements are inserted.

The standard gives you two ways to influence the bucket count. First, you can
specify the minimum number of buckets in the constructor and, later, by calling
`rehash`.

The other method is the `max_load_factor` member function. The 'load factor' is
the average number of elements per bucket; `max_load_factor` can be used to
give a /hint/ of a value that the load factor should be kept below. The draft
standard doesn't actually require the container to pay much attention to this
value. The only time the load factor is /required/ to be less than the maximum
is following a call to `rehash`. But most implementations will probably try to
keep the number of elements below the maximum load factor, and set the maximum
load factor to something the same as or near to your hint - unless your hint is
unreasonably small.

It is not specified anywhere how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to reach the maximum. This will
typically mean that `insert` only changes the number of buckets when an
insertion causes that to happen.

In a similar manner to using `reserve` for `vector`s, it can be a good idea to
call `rehash` before inserting a large number of elements. This gets the
expensive rehashing out of the way and lets you store iterators, safe in the
knowledge that they won't be invalidated. If you are inserting `n` elements
into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the number of buckets, not the number of
elements, which is why the new size is divided by the maximum load factor. The
`+ 1` guarantees there is no invalidation; without it, reallocation could occur
if the new size is exactly divisible by the maximum load factor, since the
container is allowed to rehash when the load factor is equal to the maximum
load factor.]

[table Methods for Controlling Bucket Size
    [[Method] [Description]]

    [
        [``float load_factor() const``]
        [The average number of elements per bucket.]
    ]
    [
        [``float max_load_factor() const``]
        [Returns the current maximum load factor.]
    ]
    [
        [``float max_load_factor(float z)``]
        [Changes the container's maximum load factor, using `z` as a hint.]
    ]
    [
        [``void rehash(size_type n)``]
        [Changes the number of buckets so that there are at least `n` buckets,
        and so that the load factor is less than the maximum load factor.]
    ]
]

[endsect]