[/ Copyright 2006-2007 Daniel James.
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[section:buckets The Data Structure]

The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration; in practice containers
will have more buckets).

[$../../libs/unordered/doc/diagrams/buckets.png]

In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but it is referred to as the
key so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.

When the container later needs to find an element, it just has to apply the
same process to the element's key to discover which bucket it is in. If the
hash function has worked well, the elements will be evenly distributed amongst
the buckets, so only a small number of elements need to be examined.

You can see in the diagram that `A` & `D` have been placed in the same bucket.
This means that when looking for these elements, or another element that would
be placed in the same bucket, up to two comparisons have to be made, making
searching slower. This is known as a collision. To keep things fast we try to
keep collisions to a minimum.

[table Methods for Accessing Buckets
    [[Method] [Description]]

    [
        [``size_type bucket_count() const``]
        [The number of buckets.]
    ]
    [
        [``size_type max_bucket_count() const``]
        [An upper bound on the number of buckets.]
    ]
    [
        [``size_type bucket_size(size_type n) const``]
        [The number of elements in bucket `n`.]
    ]
    [
        [``size_type bucket(key_type const& k) const``]
        [Returns the index of the bucket which would contain `k`.]
    ]
    [
        [``
            local_iterator begin(size_type n);
            local_iterator end(size_type n);
            const_local_iterator begin(size_type n) const;
            const_local_iterator end(size_type n) const;
        ``]
        [Return begin and end iterators for bucket `n`.]
    ]
]

[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number
of elements in the buckets will increase, causing performance to get worse. To
combat this the containers increase the bucket count as elements are inserted.

The standard gives you two methods to influence the bucket count. First, you
can specify the minimum number of buckets in the constructor and, later, by
calling `rehash`.

The other method is the `max_load_factor` member function. The 'load factor'
is the average number of elements per bucket, and `max_load_factor` can be used
to give a /hint/ of a value that the load factor should be kept below. The
draft standard doesn't actually require the container to pay much attention
to this value. The only time the load factor is /required/ to be less than the
maximum is following a call to `rehash`. But most implementations will probably
try to keep the number of elements below the max load factor, and set the
maximum load factor to a value the same as, or near to, your hint - unless your
hint is unreasonably small.

It is not specified anywhere how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to reach the maximum. This will
typically mean that `insert` only changes the number of buckets when such an
insertion occurs.

In a similar manner to using `reserve` for `vector`s, it can be a good idea
to call `rehash` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);

[blurb Note: `rehash`'s argument is the number of buckets, not the number of
elements, which is why the new size is divided by the maximum load factor. The
`+ 1` is required because the container is allowed to resize when the load
factor is equal to the maximum load factor.]

[table Methods for Controlling Bucket Size
    [[Method] [Description]]

    [
        [``float load_factor() const``]
        [The average number of elements per bucket.]
    ]
    [
        [``float max_load_factor() const``]
        [Returns the current maximum load factor.]
    ]
    [
        [``float max_load_factor(float z)``]
        [Changes the container's maximum load factor, using `z` as a hint.]
    ]
    [
        [``void rehash(size_type n)``]
        [Changes the number of buckets so that there are at least `n` buckets,
        and so that the load factor is less than the maximum load factor.]
    ]
]

[/ I'm not at all happy with this section. So I've commented it out.]

[/ h2 Rehash Techniques]

[/If the container has a load factor much smaller than the maximum, `rehash`
might decrease the number of buckets, reducing the memory usage. This isn't
guaranteed by the standard but this implementation will do it.

If you want to stop the table from ever rehashing due to an insert, you can
set the maximum load factor to infinity (or perhaps a load factor that it'll
never reach - say `x.max_size()`). As you can only give a 'hint' for the
maximum load factor, this isn't guaranteed to work. But again, it'll work in
this implementation. (TODO: If an unordered container with infinite load
factor is copied, bad things could happen. So maybe this advice should be
removed. Or maybe the implementation should cope with that.)

If you do this and want to make the container rehash, `rehash` will still work.
But be careful that you only ever call it with a sufficient number of buckets
- otherwise it's very likely that the container will decrease the bucket
count to an overly small amount.]

[endsect]