forked from boostorg/container_hash
Compare commits
8 Commits
feature/re
...
feature/ha
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
85f9f8a97a | ||
|
|
e92eae9eb2 | ||
|
|
8a1335458a | ||
|
|
f7e537d1a1 | ||
|
|
607b73f1e0 | ||
|
|
789261c68c | ||
|
|
29ee19ee7f | ||
|
|
8bb7d43646 |
20
README.md
Normal file
20
README.md
Normal file
@@ -0,0 +1,20 @@
|
||||
# Boost.ContainerHash
|
||||
|
||||
The Boost.ContainerHash library, part of [Boost C++ Libraries](https://boost.org),
|
||||
provides `boost::hash`, an enhanced implementation of the
|
||||
[hash function](https://en.wikipedia.org/wiki/Hash_function) object specified
|
||||
by C++11 as `std::hash`, and several support facilities (`hash_combine`,
|
||||
`hash_range`, `hash_unordered_range`).
|
||||
|
||||
`boost::hash` supports most standard types and some user-defined types out of
|
||||
the box, and is extensible; it's possible for a user-defined type `X` to make
|
||||
iself hashable via `boost::hash<X>` by defining an appropriate overload of the
|
||||
function `hash_value`.
|
||||
|
||||
See [the documentation of the library](https://www.boost.org/libs/container_hash)
|
||||
for more information.
|
||||
|
||||
## License
|
||||
|
||||
Distributed under the
|
||||
[Boost Software License, Version 1.0](http://boost.org/LICENSE_1_0.txt).
|
||||
@@ -10,6 +10,26 @@ https://www.boost.org/LICENSE_1_0.txt
|
||||
= Design and Implementation Notes
|
||||
:idprefix: notes_
|
||||
|
||||
== Quality of the Hash Function
|
||||
|
||||
Many hash functions strive to have little correlation between the input and
|
||||
output values. They attempt to uniformally distribute the output values for
|
||||
very similar inputs. This hash function makes no such attempt. In fact, for
|
||||
integers, the result of the hash function is often just the input value. So
|
||||
similar but different input values will often result in similar but different
|
||||
output values. This means that it is not appropriate as a general hash
|
||||
function. For example, a hash table may discard bits from the hash function
|
||||
resulting in likely collisions, or might have poor collision resolution when
|
||||
hash values are clustered together. In such cases this hash function will
|
||||
perform poorly.
|
||||
|
||||
But the standard has no such requirement for the hash function, it just
|
||||
requires that the hashes of two different values are unlikely to collide.
|
||||
Containers or algorithms designed to work with the standard hash function will
|
||||
have to be implemented to work well when the hash function's output is
|
||||
correlated to its input. Since they are paying that cost a higher quality hash
|
||||
function would be wasteful.
|
||||
|
||||
== The hash_value Customization Point
|
||||
|
||||
The way one customizes the standard `std::hash` function object for user
|
||||
@@ -154,22 +174,46 @@ With this improved `hash_combine`, `boost::hash` for strings now passes the
|
||||
https://github.com/aappleby/smhasher[SMHasher test suite] by Austin Appleby
|
||||
(for a 64 bit `size_t`).
|
||||
|
||||
== Quality of the Hash Function
|
||||
== hash_range
|
||||
|
||||
Many hash functions strive to have little correlation between the input and
|
||||
output values. They attempt to uniformally distribute the output values for
|
||||
very similar inputs. This hash function makes no such attempt. In fact, for
|
||||
integers, the result of the hash function is often just the input value. So
|
||||
similar but different input values will often result in similar but different
|
||||
output values. This means that it is not appropriate as a general hash
|
||||
function. For example, a hash table may discard bits from the hash function
|
||||
resulting in likely collisions, or might have poor collision resolution when
|
||||
hash values are clustered together. In such cases this hash function will
|
||||
perform poorly.
|
||||
The traditional implementation of `hash_range(seed, first, last)` has been
|
||||
|
||||
But the standard has no such requirement for the hash function, it just
|
||||
requires that the hashes of two different values are unlikely to collide.
|
||||
Containers or algorithms designed to work with the standard hash function will
|
||||
have to be implemented to work well when the hash function's output is
|
||||
correlated to its input. Since they are paying that cost a higher quality hash
|
||||
function would be wasteful.
|
||||
[source]
|
||||
----
|
||||
for( ; first != last; ++first )
|
||||
{
|
||||
boost::hash_combine<typename std::iterator_traits<It>::value_type>( seed, *first );
|
||||
}
|
||||
----
|
||||
|
||||
(the explicit template parameter is needed to support iterators with proxy
|
||||
return types such as `std::vector<bool>::iterator`.)
|
||||
|
||||
This is logical, consistent and straightforward. In the common case where
|
||||
`typename std::iterator_traits<It>::value_type` is `char` -- which it is
|
||||
in the common case of `boost::hash<std::string>` -- this however leaves a
|
||||
lot of performance on the table, because processing each `char` individually
|
||||
is much less efficient than processing several in bulk.
|
||||
|
||||
In Boost 1.81, `hash_range` was changed to process elements of type `char`,
|
||||
`signed char`, `unsigned char`, `std::byte`, or `char8_t`, four of a time.
|
||||
A `uint32_t` is composed from `first[0]` to `first[3]`, and that `uint32_t`
|
||||
is fed to `hash_combine`.
|
||||
|
||||
In principle, when `size_t` is 64 bit, we could have used `uint64_t` instead.
|
||||
We do not, because this allows producing an arbitrary hash value by choosing
|
||||
the input bytes appropriately (because `hash_combine` is reversible.)
|
||||
|
||||
Allowing control only over 32 bits of the full 64 bit `size_t` value makes
|
||||
these "chosen plaintext attacks" harder.
|
||||
|
||||
This is not as harmful to performance as it first appears, because the
|
||||
input to `hash<string>` (e.g. the key in an unordered container) is often
|
||||
short (9 to 13 bytes in some typical scenarios.)
|
||||
|
||||
Note that `hash_range` has also traditionally guaranteed that the same element
|
||||
sequence yields the same hash value regardless of the iterator type. This
|
||||
property remains valid after the changes to `char` range hashing. `hash_range`,
|
||||
applied to the `char` sequence `{ 'a', 'b', 'c' }`, results in the same value
|
||||
whether the sequence comes from `char[3]`, `std::string`, `std::deque<char>`,
|
||||
or `std::list<char>`.
|
||||
|
||||
@@ -225,7 +225,7 @@ Effects: ::
|
||||
+
|
||||
--
|
||||
When `typename std::iterator_traits<It>::value_type` is not `char`, `signed char`,
|
||||
`unsigned char`,
|
||||
`unsigned char`, `std::byte`, or `char8_t`,
|
||||
|
||||
[source]
|
||||
----
|
||||
|
||||
@@ -1,6 +1,13 @@
|
||||
////
|
||||
Copyright 2005-2008 Daniel James
|
||||
Copyright 2022 Christian Mazakas
|
||||
Copyright 2022 Peter Dimov
|
||||
Distributed under the Boost Software License, Version 1.0.
|
||||
https://www.boost.org/LICENSE_1_0.txt
|
||||
////
|
||||
|
||||
[#thanks]
|
||||
= Acknowledgements
|
||||
|
||||
:idprefix: thanks_
|
||||
|
||||
This library is based on the design by Peter Dimov. During the initial development Joaquín M López Muñoz made many useful suggestions and contributed fixes.
|
||||
@@ -12,3 +19,5 @@ The implementation of the hash function for pointers is based on suggestions mad
|
||||
Some useful improvements to the floating point hash algorithm were suggested by Daniel Krügler.
|
||||
|
||||
The original implementation came from Jeremy B. Maitin-Shepard's hash table library, although this is a complete rewrite.
|
||||
|
||||
The documentation was converted from Quickbook to AsciiDoc by Christian Mazakas.
|
||||
|
||||
@@ -27,6 +27,14 @@ template<> struct is_char_type<char>: public boost::true_type {};
|
||||
template<> struct is_char_type<signed char>: public boost::true_type {};
|
||||
template<> struct is_char_type<unsigned char>: public boost::true_type {};
|
||||
|
||||
#if defined(__cpp_char8_t) && __cpp_char8_t >= 201811L
|
||||
template<> struct is_char_type<char8_t>: public boost::true_type {};
|
||||
#endif
|
||||
|
||||
#if defined(__cpp_lib_byte) && __cpp_lib_byte >= 201603L
|
||||
template<> struct is_char_type<std::byte>: public boost::true_type {};
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
||||
template<class It>
|
||||
|
||||
@@ -176,12 +176,11 @@ namespace boost
|
||||
{
|
||||
template<class T,
|
||||
std::size_t Bits = sizeof(T) * CHAR_BIT,
|
||||
int Digits = std::numeric_limits<T>::digits,
|
||||
std::size_t size_t_bits = sizeof(std::size_t) * CHAR_BIT>
|
||||
int Digits = std::numeric_limits<T>::digits>
|
||||
struct hash_float_impl;
|
||||
|
||||
// float
|
||||
template<class T, int Digits, std::size_t size_t_bits> struct hash_float_impl<T, 32, Digits, size_t_bits>
|
||||
template<class T, int Digits> struct hash_float_impl<T, 32, Digits>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
@@ -193,35 +192,19 @@ namespace boost
|
||||
};
|
||||
|
||||
// double
|
||||
template<class T, int Digits> struct hash_float_impl<T, 64, Digits, 64>
|
||||
template<class T, int Digits> struct hash_float_impl<T, 64, Digits>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
boost::uint64_t w;
|
||||
std::memcpy( &w, &v, sizeof( v ) );
|
||||
|
||||
return w;
|
||||
}
|
||||
};
|
||||
|
||||
template<class T, int Digits> struct hash_float_impl<T, 64, Digits, 32>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
boost::uint32_t w[ 2 ];
|
||||
std::memcpy( &w, &v, sizeof( v ) );
|
||||
|
||||
std::size_t seed = 0;
|
||||
|
||||
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
|
||||
|
||||
return seed;
|
||||
return hash_value( w );
|
||||
}
|
||||
};
|
||||
|
||||
// 80 bit long double in 12 bytes
|
||||
template<class T> struct hash_float_impl<T, 96, 64, 64>
|
||||
template<class T> struct hash_float_impl<T, 96, 64>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
@@ -230,32 +213,15 @@ namespace boost
|
||||
|
||||
std::size_t seed = 0;
|
||||
|
||||
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
|
||||
|
||||
return seed;
|
||||
}
|
||||
};
|
||||
|
||||
template<class T> struct hash_float_impl<T, 96, 64, 32>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
boost::uint32_t w[ 3 ] = {};
|
||||
std::memcpy( &w, &v, 80 / CHAR_BIT );
|
||||
|
||||
std::size_t seed = 0;
|
||||
|
||||
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[2] ) + hash_detail::hash_mix( seed );
|
||||
seed = hash_value( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = hash_value( w[1] ) + hash_detail::hash_mix( seed );
|
||||
|
||||
return seed;
|
||||
}
|
||||
};
|
||||
|
||||
// 80 bit long double in 16 bytes
|
||||
template<class T> struct hash_float_impl<T, 128, 64, 64>
|
||||
template<class T> struct hash_float_impl<T, 128, 64>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
@@ -264,32 +230,15 @@ namespace boost
|
||||
|
||||
std::size_t seed = 0;
|
||||
|
||||
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
|
||||
|
||||
return seed;
|
||||
}
|
||||
};
|
||||
|
||||
template<class T> struct hash_float_impl<T, 128, 64, 32>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
boost::uint32_t w[ 3 ] = {};
|
||||
std::memcpy( &w, &v, 80 / CHAR_BIT );
|
||||
|
||||
std::size_t seed = 0;
|
||||
|
||||
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
|
||||
seed = static_cast<std::size_t>( w[2] ) + hash_detail::hash_mix( seed );
|
||||
seed = hash_value( w[0] ) + hash_detail::hash_mix( seed );
|
||||
seed = hash_value( w[1] ) + hash_detail::hash_mix( seed );
|
||||
|
||||
return seed;
|
||||
}
|
||||
};
|
||||
|
||||
// 128 bit long double
|
||||
template<class T, int Digits, std::size_t size_t_bits> struct hash_float_impl<T, 128, Digits, size_t_bits>
|
||||
template<class T, int Digits> struct hash_float_impl<T, 128, Digits>
|
||||
{
|
||||
static std::size_t fn( T v )
|
||||
{
|
||||
@@ -630,6 +579,4 @@ namespace boost
|
||||
#endif
|
||||
}
|
||||
|
||||
#undef BOOST_FUNCTIONAL_HASH_ROTL32
|
||||
|
||||
#endif // #ifndef BOOST_FUNCTIONAL_HASH_HASH_HPP
|
||||
|
||||
@@ -48,5 +48,13 @@ int main()
|
||||
test<float>();
|
||||
test<double>();
|
||||
|
||||
#if defined(__cpp_char8_t) && __cpp_char8_t >= 201811L
|
||||
test<char8_t>();
|
||||
#endif
|
||||
|
||||
#if defined(__cpp_lib_byte) && __cpp_lib_byte >= 201603L
|
||||
test<std::byte>();
|
||||
#endif
|
||||
|
||||
return boost::report_errors();
|
||||
}
|
||||
|
||||
@@ -171,16 +171,16 @@ int main()
|
||||
|
||||
#if SIZE_MAX == 4294967295U
|
||||
|
||||
BOOST_TEST_EQ( hv(1.0), 1072693248U );
|
||||
BOOST_TEST_EQ( hv(-1.0), 3220176896U );
|
||||
BOOST_TEST_EQ( hv(3.14), 3972386992U );
|
||||
BOOST_TEST_EQ( hv(-3.14), 1824903344U );
|
||||
BOOST_TEST_EQ( hv(1e-308), 2213556530U );
|
||||
BOOST_TEST_EQ( hv(-1e-308), 66072882U );
|
||||
BOOST_TEST_EQ( hv(1e+308), 2623678890U );
|
||||
BOOST_TEST_EQ( hv(-1e+308), 476195242U );
|
||||
BOOST_TEST_EQ( hv(std::numeric_limits<double>::infinity()), 2146435072U );
|
||||
BOOST_TEST_EQ( hv(-std::numeric_limits<double>::infinity()), 4293918720U );
|
||||
BOOST_TEST_EQ( hv(1.0), 2619008688U );
|
||||
BOOST_TEST_EQ( hv(-1.0), 146497060U );
|
||||
BOOST_TEST_EQ( hv(3.14), 101651732U );
|
||||
BOOST_TEST_EQ( hv(-3.14), 210858151U );
|
||||
BOOST_TEST_EQ( hv(1e-308), 3911789313U );
|
||||
BOOST_TEST_EQ( hv(-1e-308), 1812507313U );
|
||||
BOOST_TEST_EQ( hv(1e+308), 987802568U );
|
||||
BOOST_TEST_EQ( hv(-1e+308), 1639042439U );
|
||||
BOOST_TEST_EQ( hv(std::numeric_limits<double>::infinity()), 3227645345U );
|
||||
BOOST_TEST_EQ( hv(-std::numeric_limits<double>::infinity()), 2247339177U );
|
||||
|
||||
#else
|
||||
|
||||
@@ -207,21 +207,23 @@ int main()
|
||||
|
||||
if( ldbits == 64 )
|
||||
{
|
||||
BOOST_TEST_EQ( hv(1.0L), 1072693248U );
|
||||
BOOST_TEST_EQ( hv(-1.0L), 3220176896U );
|
||||
BOOST_TEST_EQ( hv(3.14L), 3972386992U );
|
||||
BOOST_TEST_EQ( hv(-3.14L), 1824903344U );
|
||||
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), 2146435072U );
|
||||
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), 4293918720U );
|
||||
BOOST_TEST_EQ( hv(1.0L), hv(1.0) );
|
||||
BOOST_TEST_EQ( hv(-1.0L), hv(-1.0) );
|
||||
BOOST_TEST_EQ( hv(3.14L), hv(3.14) );
|
||||
BOOST_TEST_EQ( hv(-3.14L), hv(-3.14) );
|
||||
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), hv(std::numeric_limits<double>::infinity()) );
|
||||
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), hv(-std::numeric_limits<double>::infinity()) );
|
||||
}
|
||||
else
|
||||
{
|
||||
BOOST_TEST_EQ( hv(1.0L), 3770520689U );
|
||||
BOOST_TEST_EQ( hv(-1.0L), 3770553457U );
|
||||
BOOST_TEST_EQ( hv(3.14L), 1150018772U );
|
||||
BOOST_TEST_EQ( hv(-3.14L), 1150051540U );
|
||||
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), 3770537073U );
|
||||
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), 3770569841U );
|
||||
// ldbits == 96
|
||||
|
||||
BOOST_TEST_EQ( hv(1.0L), 3632050780U );
|
||||
BOOST_TEST_EQ( hv(-1.0L), 3632083548U );
|
||||
BOOST_TEST_EQ( hv(3.14L), 1742026549U );
|
||||
BOOST_TEST_EQ( hv(-3.14L), 1742059317U );
|
||||
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), 3632067164U );
|
||||
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), 3632099932U );
|
||||
}
|
||||
|
||||
#else
|
||||
@@ -246,9 +248,7 @@ int main()
|
||||
}
|
||||
else
|
||||
{
|
||||
// ldbits == 128 && std::numeric_limits<long double>::digits == 113
|
||||
// under ARM64 and S390x, but the values differ presumably because of
|
||||
// __FLOAT_WORD_ORDER__
|
||||
// ldbits == 128 && std::numeric_limits<long double>::digits == 113
|
||||
|
||||
BOOST_TEST_EQ( hv(1.0L), 4611404543450677248ULL );
|
||||
BOOST_TEST_EQ( hv(-1.0L), 13834776580305453056ULL );
|
||||
@@ -344,10 +344,10 @@ int main()
|
||||
|
||||
#if SIZE_MAX == 4294967295U
|
||||
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(+1.0, 0.0)), 1072693248U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(-1.0, 0.0)), 3220176896U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(0.0, +1.0)), 2619008688U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(0.0, -1.0)), 146497060U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(+1.0, 0.0)), 2619008688U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(-1.0, 0.0)), 146497060U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(0.0, +1.0)), 22395692U );
|
||||
BOOST_TEST_EQ( hv(std::complex<double>(0.0, -1.0)), 1449221192U );
|
||||
|
||||
#else
|
||||
|
||||
|
||||
Reference in New Issue
Block a user