diff --git a/hash/doc/hash.qbk b/hash/doc/hash.qbk index b734fb4..fe428f3 100644 --- a/hash/doc/hash.qbk +++ b/hash/doc/hash.qbk @@ -14,11 +14,16 @@ ] ] +[def __issues__ + [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf + Library Extension Technical Report Issues List]] + [include:hash intro.qbk] [include:hash tutorial.qbk] [include:hash portability.qbk] [include:hash disable.qbk] [include:hash changes.qbk] +[include:hash rationale.qbk] [xinclude ref.xml] [include:hash links.qbk] [include:hash thanks.qbk] diff --git a/hash/doc/intro.qbk b/hash/doc/intro.qbk index b027719..076e997 100644 --- a/hash/doc/intro.qbk +++ b/hash/doc/intro.qbk @@ -18,9 +18,6 @@ [def __multi-index-short__ [@boost:/libs/multi_index/doc/index.html Boost.MultiIndex]] [def __bimap__ [@boost:/libs/bimap/index.html Boost.Bimap]] -[def __issues__ - [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf - Library Extension Technical Report Issues List]] [def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function hash function]] [def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table hash table]] @@ -44,5 +41,12 @@ __issues__ (page 63), this adds support for: * the standard containers. * extending [classref boost::hash] for custom types. +[note +This hash function is designed to be used in containers based on +the STL and is not suitable as a general purpose hash function. +For more details see the [link hash.rationale rationale]. +] + + [endsect] diff --git a/hash/doc/portability.qbk b/hash/doc/portability.qbk index fabb298..a65bc19 100644 --- a/hash/doc/portability.qbk +++ b/hash/doc/portability.qbk @@ -90,16 +90,4 @@ boost namespace: Full code for this example is at [@boost:/libs/functional/hash/examples/portable.cpp /libs/functional/hash/examples/portable.cpp]. -[h2 Other Issues] - -On Visual C++ versions 6.5 and 7.0, `hash_value` isn't overloaded for built in -arrays. 
__boost_hash__, [funcref boost::hash_combine] and [funcref boost::hash_range] all use a workaround to -support built in arrays so this shouldn't be a problem in most cases. - -On Visual C++ versions 6.5 and 7.0, function pointers aren't currently supported. - -When using GCC on Solaris, `boost::hash_value(long double)` treats -`long double`s as `double`s - so the hash function doesn't take into account the -full range of values. - [endsect] diff --git a/hash/doc/rationale.qbk b/hash/doc/rationale.qbk new file mode 100644 index 0000000..f01605b --- /dev/null +++ b/hash/doc/rationale.qbk @@ -0,0 +1,50 @@ + +[/ Copyright 2011 Daniel James. + / Distributed under the Boost Software License, Version 1.0. (See accompanying + / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) ] + +[section:rationale Rationale] + +The rationale for the design can be found in the original design +[footnote issue 6.18 of the __issues__ (page 63)], but one issue that +occasionally comes up, the quality of the hash function, deserves +some more attention. + +Many hash functions strive to have little correlation between the input +and output values. They attempt to uniformly distribute the output +values for very similar inputs. This hash function makes no such +attempt. In fact, for integers, the result of the hash function is often +just the input value. So similar but different input values will often +result in similar but different output values. + +This means that it is not appropriate as a general purpose hash function. For +example, a hash table may discard bits from the hash value, making +collisions likely, or might have poor collision resolution when hash +values are clustered together. In such cases this hash function will +perform poorly. + +So why not implement a higher quality hash function? The standard +makes no such guarantee; it only requires that the hashes of two +different values are unlikely to collide.
Containers or algorithms +designed to work with the standard hash function will have to be +implemented to work well when the hash function's output is correlated +to its input. Since they already pay that cost, a higher quality hash +function would be wasteful. + +For other use cases, if you do need a higher quality hash function, +there are several options +available. One is to use a second hash on the output of this hash +function, such as [@http://www.concentric.net/~ttwang/tech/inthash.htm +Thomas Wang's hash function]. This may not work as +well as a hash algorithm tailored for the input. + +For strings there are several fast, high quality hash functions +available (for example [@http://code.google.com/p/smhasher/ MurmurHash3] +and [@http://code.google.com/p/cityhash/ Google's CityHash]), +although they tend to be more machine specific. +These may also be appropriate for hashing a binary representation of +your data, provided that all equal values have an equal +representation, which is not always the case (e.g. for floating point +values, `0.0` and `-0.0` compare equal but differ in their bits). + +[endsect] \ No newline at end of file