Hash: A few edits to the new rationale.

[SVN r74963]
Daniel James
2011-10-16 10:32:12 +00:00
parent 60bcdbd1be
commit 2c4f692c1e


@@ -14,8 +14,8 @@ Many hash functions strive to have little correlation between the input
 and output values. They attempt to uniformally distribute the output
 values for very similar inputs. This hash function makes no such
 attempt. In fact, for integers, the result of the hash function is often
-just the input value. So similar but different input values will result
-in similar but different output values.
+just the input value. So similar but different input values will often
+result in similar but different output values.
 
 This means that it is not appropriate as a general hash function. For
 example, a hash table may discard bits from the hash function resulting
@@ -25,28 +25,23 @@ preform poorly.
 
 So why not implement a higher quality hash function? Well, the standard
 makes no such guarantee, it just requires that the hashes of two
-different values are unlikely to collide. So containers or algorithms
+different values are unlikely to collide. Containers or algorithms
 designed to work with the standard hash function will have to be
 implemented to work well when the hash function's output is correlated
-to its input. Since they are paying that cost it would be wasteful to
-expand the effort to make a higher quality hash function.
+to its input. Since they are paying that cost a higher quality hash function
+would be wasteful.
 
-If you do need a higher quality hash function, there are several options
+For other use cases, if you do need a higher quality hash function,
+there are several options
 available. One is to use a second hash on the output of this hash
 function, such as [@http://www.concentric.net/~ttwang/tech/inthash.htm
-Thomas Wang's hash function]. But for many types this might not work as
+Thomas Wang's hash function]. This this may not work as
 well as a hash algorithm tailored for the input.
 
 For strings that are several fast, high quality hash functions
-available, such as:
-
-* [@http://burtleburtle.net/bob/hash/index.html Bob Jenkins' hash
-functions]
-* [@http://www.azillionmonkeys.com/qed/hash.html Paul Hsieh's hash
-functions]
-* [@http://code.google.com/p/cityhash/ Google's CityHash]
-* [@http://code.google.com/p/smhasher/ MurmurHash3]
-
+available (for example [@http://code.google.com/p/smhasher/ MurmurHash3]
+and [@http://code.google.com/p/cityhash/ Google's CityHash]),
+although they tend to be more machine specific.
 These may also be appropriate for hashing a binary representation of
 your data - providing that all equal values have an equal
 representation, which is not always the case (e.g. for floating point
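
Note on the rationale text in the hunks above: the following is a minimal sketch, not part of the commit, of why a hash table that discards bits can perform poorly with an integer hash that returns its input, and how a second hash applied to the output can help. Every name here (`identity_hash`, `mix`, `bucket_of`, `buckets_used`, `multiples_of_16`) is hypothetical. `identity_hash` is only a stand-in for a hash whose result is the input value, and `mix` is the splitmix64 finalizer, chosen for brevity; it is not Thomas Wang's function.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a hash function whose result for an integer
// is just the input value.
inline std::uint64_t identity_hash(std::uint64_t key) { return key; }

// Second hash applied to the output of the first. This is the splitmix64
// finalizer, an assumption of this sketch (not Thomas Wang's function).
inline std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// A table that keeps only the low bits of the hash and discards the rest.
// bucket_count is assumed to be a power of two.
inline std::size_t bucket_of(std::uint64_t hash, std::size_t bucket_count) {
    return static_cast<std::size_t>(hash & (bucket_count - 1));
}

// Counts how many distinct buckets a set of keys lands in under a given hash.
template <class Hash>
std::size_t buckets_used(const std::vector<std::uint64_t>& keys,
                         std::size_t bucket_count, Hash hash) {
    std::set<std::size_t> used;
    for (std::uint64_t key : keys) {
        used.insert(bucket_of(hash(key), bucket_count));
    }
    return used.size();
}

// Keys 0, 16, 32, ...: similar inputs that differ only above the low 4 bits.
inline std::vector<std::uint64_t> multiples_of_16(std::size_t count) {
    std::vector<std::uint64_t> keys;
    for (std::size_t i = 0; i < count; ++i) {
        keys.push_back(static_cast<std::uint64_t>(i) * 16);
    }
    return keys;
}
```

With 16 buckets, `buckets_used(multiples_of_16(64), 16, identity_hash)` reports a single bucket, because every key has the same low four bits. Passing a callable that computes `mix(identity_hash(k))` instead should spread the same keys over more buckets. A container designed for the standard hash function would need to handle the first case itself, which is the cost the rationale refers to.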