[/ Copyright 2006-2007 Daniel James.
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]

[section:buckets The Data Structure]
					
						
The containers are made up of a number of 'buckets', each of which can contain
any number of elements. For example, the following diagram shows an [classref
boost::unordered_set unordered_set] with 7 buckets containing 5 elements, `A`,
`B`, `C`, `D` and `E` (this is just for illustration, containers will typically
have more buckets).

[$../../libs/unordered/doc/diagrams/buckets.png]
					
						
In order to decide which bucket to place an element in, the container applies
the hash function, `Hash`, to the element's key (for `unordered_set` and
`unordered_multiset` the key is the whole element, but is referred to as the key
so that the same terminology can be used for sets and maps). This returns a
value of type `std::size_t`. `std::size_t` has a much greater range of values
than the number of buckets, so the container applies another transformation to
that value to choose a bucket to place the element in.

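As a rough sketch of the two steps described above, the following hashes a key
and then reduces the result to a bucket index. The reduction used here, a plain
modulo, is only an assumption for illustration; the transformation a real
container applies is an implementation detail. `std::hash` stands in for the
`Hash` parameter, and `naive_bucket_index` is a hypothetical helper, not part
of the library.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a key to a bucket index in two steps.
// Step 1 applies the hash function, giving a std::size_t.
// Step 2 reduces that value to the range [0, bucket_count).
// Plain modulo is an assumption; real containers may reduce differently.
// bucket_count must be greater than zero.
std::size_t naive_bucket_index(std::string const& key, std::size_t bucket_count)
{
    std::size_t const hash_value = std::hash<std::string>()(key);
    return hash_value % bucket_count;
}
```

Because both steps are deterministic, the same key always maps to the same
index for a given bucket count, which is what makes later lookups possible.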
					
						
If the container later needs to find an element, it just has to apply the same
process to the element's key to discover which bucket it is in. If the hash
function has worked well, the elements will be evenly distributed among the
buckets, so it will only have to examine a small number of elements.
					
						
You can see in the diagram that `A` and `D` have been placed in the same bucket.
This means that when looking for these elements, or another element that would
be placed in the same bucket, up to 2 comparisons have to be made, making
searching slower. This is known as a collision. To keep things fast we try to
keep these to a minimum.

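To make the cost of collisions concrete, the sketch below uses a deliberately
poor hash function that returns the same value for every key. It is written
against `std::unordered_set`, assumed here as a stand-in with the same bucket
interface; `constant_hash`, `largest_bucket` and
`collisions_with_constant_hash` are hypothetical names used only for this
illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Deliberately poor hash function: every key gets the same hash value,
// so every element collides with every other element.
struct constant_hash
{
    std::size_t operator()(std::string const&) const { return 0; }
};

// Returns the size of the fullest bucket, which bounds the number of
// comparisons a single lookup may need.
template <class Set>
std::size_t largest_bucket(Set const& s)
{
    std::size_t largest = 0;
    for (std::size_t i = 0; i < s.bucket_count(); ++i)
        if (s.bucket_size(i) > largest)
            largest = s.bucket_size(i);
    return largest;
}

// Inserts the five elements from the diagram using the poor hash and
// reports how many of them ended up sharing the fullest bucket.
std::size_t collisions_with_constant_hash()
{
    std::unordered_set<std::string, constant_hash> s;
    s.insert("A");
    s.insert("B");
    s.insert("C");
    s.insert("D");
    s.insert("E");
    return largest_bucket(s);
}
```

With a hash like this a lookup may have to compare against every element,
however many buckets the container has.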
					
						
[table Methods for Accessing Buckets
    [[Method] [Description]]

    [
        [``size_type bucket_count() const``]
        [The number of buckets.]
    ]
    [
        [``size_type max_bucket_count() const``]
        [An upper bound on the number of buckets.]
    ]
    [
        [``size_type bucket_size(size_type n) const``]
        [The number of elements in bucket `n`.]
    ]
    [
        [``size_type bucket(key_type const& k) const``]
        [Returns the index of the bucket which would contain `k`.]
    ]
    [
        [``
            local_iterator begin(size_type n);
            local_iterator end(size_type n);
            const_local_iterator begin(size_type n) const;
            const_local_iterator end(size_type n) const;
        ``]
        [Return begin and end iterators for bucket `n`.]
    ]
]

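The accessors in the table can be combined to inspect a single bucket. The
following is a minimal sketch, assuming `std::unordered_set` as a stand-in
with the same interface and a C++11 compiler; `bucket_contents` is a
hypothetical helper introduced for this example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Collects every element stored in the bucket that `key` maps to,
// whether or not `key` itself is in the container.
// Assumes `s` has at least one bucket, which holds once it contains
// an element.
std::vector<std::string> bucket_contents(
    std::unordered_set<std::string> const& s, std::string const& key)
{
    std::size_t const n = s.bucket(key);   // index of the bucket for key
    std::vector<std::string> result;
    // begin(n) and end(n) iterate over bucket n only.
    for (auto it = s.begin(n); it != s.end(n); ++it)
        result.push_back(*it);
    return result;
}
```

The number of elements returned always equals `bucket_size(bucket(key))`.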
					
						
[h2 Controlling the number of buckets]

As more elements are added to an unordered associative container, the number
of elements in the buckets will increase, causing performance to degrade. To
combat this, the containers increase the bucket count as elements are inserted.
					
						
The standard gives you two methods to influence the bucket count. The first is
to specify the minimum number of buckets, either in the constructor or later by
calling `rehash`.

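Both ways of requesting a minimum number of buckets can be sketched as
follows. `std::unordered_set` is assumed as a stand-in with the same
interface, and `bucket_counts` is a hypothetical helper. The container is free
to pick more buckets than requested, so only a lower bound can be relied on.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>

// Returns the bucket count right after construction with a minimum of
// `initial` buckets, and again after asking for at least `later` buckets.
std::pair<std::size_t, std::size_t> bucket_counts(
    std::size_t initial, std::size_t later)
{
    std::unordered_set<int> s(initial);    // minimum bucket count
    std::size_t const after_construction = s.bucket_count();
    s.rehash(later);                       // new minimum bucket count
    return std::make_pair(after_construction, s.bucket_count());
}
```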
					
						
The other method is the `max_load_factor` member function. The 'load factor'
is the average number of elements per bucket; `max_load_factor` can be used
to give a /hint/ of a value that the load factor should be kept below. The
draft standard doesn't actually require the container to pay much attention
to this value. The only time the load factor is /required/ to be less than the
maximum is following a call to `rehash`. But most implementations will probably
try to keep the load factor below the maximum, and set the maximum load factor
to something the same as or near to your hint, unless your hint is unreasonably
small.

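The relationship between the hint and the load factor after a `rehash` can be
checked with a short sketch. It assumes `std::unordered_set` as a stand-in
with the same interface; `load_factor_within_maximum` is a hypothetical
helper. The comparison reads the maximum back from the container instead of
reusing the hint, since the container may not adopt the hint exactly.

```cpp
#include <unordered_set>

// Gives the container a maximum load factor hint, inserts
// `element_count` distinct values, rehashes, and reports whether the
// load factor is within the container's maximum afterwards.
bool load_factor_within_maximum(float hint, int element_count)
{
    std::unordered_set<int> s;
    s.max_load_factor(hint);
    for (int i = 0; i < element_count; ++i)
        s.insert(i);
    // rehash(0) asks for no particular minimum, but the container must
    // still leave enough buckets for the current number of elements.
    s.rehash(0);
    return s.load_factor() <= s.max_load_factor();
}
```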
					
						
It is not specified anywhere how member functions other than `rehash` affect
the bucket count, although `insert` is only allowed to invalidate iterators
when the insertion causes the load factor to reach the maximum. This will
typically mean that `insert` will only change the number of buckets when an
insertion causes this.
					
						
In a similar manner to using `reserve` for `vector`s, it can be a good idea
to call `rehash` before inserting a large number of elements. This will get
the expensive rehashing out of the way and let you store iterators, safe in
the knowledge that they won't be invalidated. If you are inserting `n`
elements into container `x`, you could first call:

    x.rehash((x.size() + n) / x.max_load_factor() + 1);
					
						
[blurb Note: `rehash`'s argument is the number of buckets, not the number of
elements, which is why the new size is divided by the maximum load factor. The
`+ 1` guarantees there is no invalidation; without it, reallocation could occur
if the number of buckets exactly divides the target size, since the container is
allowed to rehash when the load factor is equal to the maximum load factor.]

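As a worked example of the formula: for an empty container with a maximum load
factor of 1.0 and `n` equal to 1000, the call becomes `x.rehash(1001)`. The
sketch below wraps the formula in a helper and checks that the bucket count
stays fixed during the subsequent insertions. `std::unordered_set` is assumed
as a stand-in with the same interface, and `reserve_for` and
`inserts_without_rehash` are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: makes room for `n` more elements using the
// formula from the text. The division is done in floating point and
// truncated before the + 1 is added.
template <class Container>
void reserve_for(Container& x, std::size_t n)
{
    x.rehash(
        static_cast<std::size_t>((x.size() + n) / x.max_load_factor()) + 1);
}

// Reserves room for n elements, inserts them, and returns true if the
// bucket count never changed, i.e. no rehash happened along the way.
bool inserts_without_rehash(std::size_t n)
{
    std::unordered_set<std::size_t> x;
    reserve_for(x, n);
    std::size_t const buckets = x.bucket_count();
    for (std::size_t i = 0; i < n; ++i) {
        x.insert(i);
        if (x.bucket_count() != buckets)
            return false;
    }
    return true;
}
```

An unchanged bucket count is used here as a simple observable sign that
iterators obtained during the insertions were not invalidated by a rehash.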
					
						
[table Methods for Controlling Bucket Size
    [[Method] [Description]]

    [
        [``float load_factor() const``]
        [The average number of elements per bucket.]
    ]
    [
        [``float max_load_factor() const``]
        [Returns the current maximum load factor.]
    ]
    [
        [``float max_load_factor(float z)``]
        [Changes the container's maximum load factor, using `z` as a hint.]
    ]
    [
        [``void rehash(size_type n)``]
        [Changes the number of buckets so that there are at least `n` buckets, and
        so that the load factor is less than the maximum load factor.]
    ]

]
					
						
[endsect]