mirror of
https://github.com/boostorg/functional.git
Hash: A few edits to the new rationale.
[SVN r74963]
@@ -14,8 +14,8 @@ Many hash functions strive to have little correlation between the input
 and output values. They attempt to uniformally distribute the output
 values for very similar inputs. This hash function makes no such
 attempt. In fact, for integers, the result of the hash function is often
-just the input value. So similar but different input values will result
-in similar but different output values.
+just the input value. So similar but different input values will often
+result in similar but different output values.
 
 This means that it is not appropriate as a general hash function. For
 example, a hash table may discard bits from the hash function resulting
@@ -25,28 +25,23 @@ preform poorly.
 
 So why not implement a higher quality hash function? Well, the standard
 makes no such guarantee, it just requires that the hashes of two
-different values are unlikely to collide. So containers or algorithms
+different values are unlikely to collide. Containers or algorithms
 designed to work with the standard hash function will have to be
 implemented to work well when the hash function's output is correlated
-to its input. Since they are paying that cost it would be wasteful to
-expand the effort to make a higher quality hash function.
+to its input. Since they are paying that cost a higher quality hash function
+would be wasteful.
 
-If you do need a higher quality hash function, there are several options
+For other use cases, if you do need a higher quality hash function,
+there are several options
 available. One is to use a second hash on the output of this hash
 function, such as [@http://www.concentric.net/~ttwang/tech/inthash.htm
-Thomas Wang's hash function]. But for many types this might not work as
+Thomas Wang's hash function]. This this may not work as
 well as a hash algorithm tailored for the input.
 
 For strings that are several fast, high quality hash functions
-available, such as:
-
-* [@http://burtleburtle.net/bob/hash/index.html Bob Jenkins' hash
-functions]
-* [@http://www.azillionmonkeys.com/qed/hash.html Paul Hsieh's hash
-functions]
-* [@http://code.google.com/p/cityhash/ Google's CityHash]
-* [@http://code.google.com/p/smhasher/ MurmurHash3]
-
+available (for example [@http://code.google.com/p/smhasher/ MurmurHash3]
+and [@http://code.google.com/p/cityhash/ Google's CityHash]),
+although they tend to be more machine specific.
 These may also be appropriate for hashing a binary representation of
 your data - providing that all equal values have an equal
 representation, which is not always the case (e.g. for floating point
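Note on the rationale text in this commit: the point about a hash table discarding bits, and the suggestion to apply a second hash to the output, can be illustrated with a rough sketch. Everything here is an assumption made for illustration and is not code from this repository: `identity_hash` stands in for a hash that returns integers unchanged, `bucket_of` is a hypothetical table policy that keeps only the low bits, and `mix64` uses the 64-bit finalizer from MurmurHash3 as the second hash rather than Thomas Wang's function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Stand-in (assumption) for a hash that returns an integer unchanged.
inline std::uint64_t identity_hash(std::uint64_t x) { return x; }

// Second hash applied to the first hash's output: the 64-bit finalizer
// from MurmurHash3, chosen here only as a well-known bit mixer.
inline std::uint64_t mix64(std::uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// Hypothetical table policy: keep only the low bits of the hash.
// bucket_count must be a power of two.
inline std::size_t bucket_of(std::uint64_t hash, std::size_t bucket_count) {
    assert(bucket_count != 0 && (bucket_count & (bucket_count - 1)) == 0);
    return static_cast<std::size_t>(hash & (bucket_count - 1));
}

// Count how many distinct buckets the keys 0, stride, 2*stride, ... occupy,
// with or without the second hash.
inline std::size_t buckets_used(bool remix, std::uint64_t stride,
                                std::size_t key_count,
                                std::size_t bucket_count) {
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < key_count; ++i) {
        std::uint64_t h = identity_hash(i * stride);
        if (remix) {
            h = mix64(h);
        }
        used.insert(bucket_of(h, bucket_count));
    }
    return used.size();
}
```

With 256 buckets and keys that are multiples of 256, `buckets_used(false, 256, 1000, 256)` reports a single bucket, because the low eight bits of every key are zero. Passing `true` spreads the same keys over many buckets, at the cost of extra work per lookup, which is the trade-off the rationale describes.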