mirror of
https://github.com/boostorg/container_hash.git
synced 2026-04-28 18:02:14 +02:00
46 lines
2.2 KiB
Plaintext
46 lines
2.2 KiB
Plaintext
////
|
|
Copyright 2005-2008 Daniel James
|
|
Copyright 2022 Christian Mazakas
|
|
Copyright 2022 Peter Dimov
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
https://www.boost.org/LICENSE_1_0.txt
|
|
////
|
|
|
|
[#notes]
|
|
= Design and Implementation Notes
|
|
:idprefix: notes_
|
|
|
|
== Quality of the Hash Function
|
|
|
|
Many hash functions strive to have little correlation between the input and
|
|
output values. They attempt to uniformally distribute the output values for
|
|
very similar inputs. This hash function makes no such attempt. In fact, for
|
|
integers, the result of the hash function is often just the input value. So
|
|
similar but different input values will often result in similar but different
|
|
output values. This means that it is not appropriate as a general hash
|
|
function. For example, a hash table may discard bits from the hash function
|
|
resulting in likely collisions, or might have poor collision resolution when
|
|
hash values are clustered together. In such cases this hash function will
|
|
perform poorly.
|
|
|
|
But the standard has no such requirement for the hash function, it just
|
|
requires that the hashes of two different values are unlikely to collide.
|
|
Containers or algorithms designed to work with the standard hash function will
|
|
have to be implemented to work well when the hash function's output is
|
|
correlated to its input. Since they are paying that cost a higher quality hash
|
|
function would be wasteful.
|
|
|
|
For other use cases, if you do need a higher quality hash function, then
|
|
neither the standard hash function or `boost::hash` are appropriate. There are
|
|
several options available. One is to use a second hash on the output of this
|
|
hash function, such as
|
|
http://web.archive.org/web/20121102023700/http://www.concentric.net/~Ttwang/tech/inthash.htm[Thomas Wang's hash function].
|
|
This this may not work as well as a hash algorithm tailored for the input.
|
|
|
|
For strings there are several fast, high quality hash functions available
|
|
(for example http://code.google.com/p/smhasher/[MurmurHash3] and
|
|
http://code.google.com/p/cityhash/[Google's CityHash]), although they tend to
|
|
be more machine specific. These may also be appropriate for hashing a binary
|
|
representation of your data - providing that all equal values have an equal
|
|
representation, which is not always the case (e.g. for floating point values).
|