mirror of
https://github.com/boostorg/functional.git
synced 2025-08-02 05:54:29 +02:00
Hash: Note about the quality of the hash function.
In response to the thread starting at: http://lists.boost.org/Archives/boost/2011/10/186476.php [SVN r74856]
This commit is contained in:
@@ -14,11 +14,16 @@
|
|||||||
]
|
]
|
||||||
]
|
]
|
||||||
|
|
||||||
|
[def __issues__
|
||||||
|
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf
|
||||||
|
Library Extension Technical Report Issues List]]
|
||||||
|
|
||||||
[include:hash intro.qbk]
|
[include:hash intro.qbk]
|
||||||
[include:hash tutorial.qbk]
|
[include:hash tutorial.qbk]
|
||||||
[include:hash portability.qbk]
|
[include:hash portability.qbk]
|
||||||
[include:hash disable.qbk]
|
[include:hash disable.qbk]
|
||||||
[include:hash changes.qbk]
|
[include:hash changes.qbk]
|
||||||
|
[include:hash rationale.qbk]
|
||||||
[xinclude ref.xml]
|
[xinclude ref.xml]
|
||||||
[include:hash links.qbk]
|
[include:hash links.qbk]
|
||||||
[include:hash thanks.qbk]
|
[include:hash thanks.qbk]
|
||||||
|
@@ -18,9 +18,6 @@
|
|||||||
[def __multi-index-short__ [@boost:/libs/multi_index/doc/index.html
|
[def __multi-index-short__ [@boost:/libs/multi_index/doc/index.html
|
||||||
Boost.MultiIndex]]
|
Boost.MultiIndex]]
|
||||||
[def __bimap__ [@boost:/libs/bimap/index.html Boost.Bimap]]
|
[def __bimap__ [@boost:/libs/bimap/index.html Boost.Bimap]]
|
||||||
[def __issues__
|
|
||||||
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf
|
|
||||||
Library Extension Technical Report Issues List]]
|
|
||||||
[def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function hash function]]
|
[def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function hash function]]
|
||||||
[def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table hash table]]
|
[def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table hash table]]
|
||||||
|
|
||||||
@@ -44,5 +41,12 @@ __issues__ (page 63), this adds support for:
|
|||||||
* the standard containers.
|
* the standard containers.
|
||||||
* extending [classref boost::hash] for custom types.
|
* extending [classref boost::hash] for custom types.
|
||||||
|
|
||||||
|
[note
|
||||||
|
This hash function is designed to be used in containers based on
|
||||||
|
the STL and is not suitable as a general purpose hash function.
|
||||||
|
For more details see the [link hash.rationale rationale].
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
[endsect]
|
[endsect]
|
||||||
|
|
||||||
|
55
hash/doc/rationale.qbk
Normal file
55
hash/doc/rationale.qbk
Normal file
@@ -0,0 +1,55 @@
|
|||||||
|
|
||||||
|
[/ Copyright 2011 Daniel James.
|
||||||
|
/ Distributed under the Boost Software License, Version 1.0. (See accompanying
|
||||||
|
/ file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ]
|
||||||
|
|
||||||
|
[section:rationale Rationale]
|
||||||
|
|
||||||
|
The rationale for the design can be found in the original design
|
||||||
|
[footnote issue 6.18 of the __issues__ (page 63)], but an issue that
|
||||||
|
occasionally comes up is the quality of the hash function, so that
|
||||||
|
demands some more attention.
|
||||||
|
|
||||||
|
Many hash functions strive to have little correlation between the input
|
||||||
|
and output values. They attempt to uniformally distribute the output
|
||||||
|
values for very similar inputs. This hash function makes no such
|
||||||
|
attempt. In fact, for integers, the result of the hash function is often
|
||||||
|
just the input value. So similar but different input values will result
|
||||||
|
in similar but different output values.
|
||||||
|
|
||||||
|
This means that it is not appropriate as a general hash function. For
|
||||||
|
example, a hash table may discard bits from the hash function resulting
|
||||||
|
in likely collisions, or might have poor collision resolution when hash
|
||||||
|
values are clustered together. In such cases this hash function will
|
||||||
|
preform poorly.
|
||||||
|
|
||||||
|
So why not implement a higher quality hash function? Well, the standard
|
||||||
|
makes no such guarantee, it just requires that the hashes of two
|
||||||
|
different values are unlikely to collide. So containers or algorithms
|
||||||
|
designed to work with the standard hash function will have to be
|
||||||
|
implemented to work well when the hash function's output is correlated
|
||||||
|
to its input. Since they are paying that cost it would be wasteful to
|
||||||
|
expand the effort to make a higher quality hash function.
|
||||||
|
|
||||||
|
If you do need a higher quality hash function, there are several options
|
||||||
|
available. One is to use a second hash on the output of this hash
|
||||||
|
function, such as [@http://www.concentric.net/~ttwang/tech/inthash.htm
|
||||||
|
Thomas Wang's hash function]. But for many types this might not work as
|
||||||
|
well as a hash algorithm tailored for the input.
|
||||||
|
|
||||||
|
For strings that are several fast, high quality hash functions
|
||||||
|
available, such as:
|
||||||
|
|
||||||
|
* [@http://burtleburtle.net/bob/hash/index.html Bob Jenkins' hash
|
||||||
|
functions]
|
||||||
|
* [@http://www.azillionmonkeys.com/qed/hash.html Paul Hsieh's hash
|
||||||
|
functions]
|
||||||
|
* [@http://code.google.com/p/cityhash/ Google's CityHash]
|
||||||
|
* [@http://code.google.com/p/smhasher/ MurmurHash3]
|
||||||
|
|
||||||
|
These may also be appropriate for hashing a binary representation of
|
||||||
|
your data - providing that all equal values have an equal
|
||||||
|
representation, which is not always the case (e.g. for floating point
|
||||||
|
values).
|
||||||
|
|
||||||
|
[endsect]
|
Reference in New Issue
Block a user