Compare commits

...

8 Commits

Author SHA1 Message Date
Peter Dimov
85f9f8a97a Update documentation 2022-09-20 21:20:31 +03:00
Peter Dimov
e92eae9eb2 Treat char8_t and std::byte as char types in hash_range 2022-09-20 21:17:41 +03:00
Peter Dimov
8a1335458a Update Acknowledgements section 2022-09-20 21:09:17 +03:00
Peter Dimov
f7e537d1a1 Update Notes section 2022-09-20 21:06:09 +03:00
Peter Dimov
607b73f1e0 Add README.md 2022-09-20 16:08:10 +03:00
Peter Dimov
789261c68c Update 32 bit float reference values (for GCC) 2022-09-20 15:23:13 +03:00
Peter Dimov
29ee19ee7f Update 32 bit float reference values (for MSVC) 2022-09-20 15:08:21 +03:00
Peter Dimov
8bb7d43646 Simplify hash_value for floating point 2022-09-20 15:01:45 +03:00
8 changed files with 148 additions and 112 deletions

20
README.md Normal file
View File

@@ -0,0 +1,20 @@
# Boost.ContainerHash
The Boost.ContainerHash library, part of [Boost C++ Libraries](https://boost.org),
provides `boost::hash`, an enhanced implementation of the
[hash function](https://en.wikipedia.org/wiki/Hash_function) object specified
by C++11 as `std::hash`, and several support facilities (`hash_combine`,
`hash_range`, `hash_unordered_range`).
`boost::hash` supports most standard types and some user-defined types out of
the box, and is extensible; it's possible for a user-defined type `X` to make
iself hashable via `boost::hash<X>` by defining an appropriate overload of the
function `hash_value`.
See [the documentation of the library](https://www.boost.org/libs/container_hash)
for more information.
## License
Distributed under the
[Boost Software License, Version 1.0](http://boost.org/LICENSE_1_0.txt).

View File

@@ -10,6 +10,26 @@ https://www.boost.org/LICENSE_1_0.txt
= Design and Implementation Notes
:idprefix: notes_
== Quality of the Hash Function
Many hash functions strive to have little correlation between the input and
output values. They attempt to uniformally distribute the output values for
very similar inputs. This hash function makes no such attempt. In fact, for
integers, the result of the hash function is often just the input value. So
similar but different input values will often result in similar but different
output values. This means that it is not appropriate as a general hash
function. For example, a hash table may discard bits from the hash function
resulting in likely collisions, or might have poor collision resolution when
hash values are clustered together. In such cases this hash function will
perform poorly.
But the standard has no such requirement for the hash function, it just
requires that the hashes of two different values are unlikely to collide.
Containers or algorithms designed to work with the standard hash function will
have to be implemented to work well when the hash function's output is
correlated to its input. Since they are paying that cost a higher quality hash
function would be wasteful.
== The hash_value Customization Point
The way one customizes the standard `std::hash` function object for user
@@ -154,22 +174,46 @@ With this improved `hash_combine`, `boost::hash` for strings now passes the
https://github.com/aappleby/smhasher[SMHasher test suite] by Austin Appleby
(for a 64 bit `size_t`).
== Quality of the Hash Function
== hash_range
Many hash functions strive to have little correlation between the input and
output values. They attempt to uniformally distribute the output values for
very similar inputs. This hash function makes no such attempt. In fact, for
integers, the result of the hash function is often just the input value. So
similar but different input values will often result in similar but different
output values. This means that it is not appropriate as a general hash
function. For example, a hash table may discard bits from the hash function
resulting in likely collisions, or might have poor collision resolution when
hash values are clustered together. In such cases this hash function will
perform poorly.
The traditional implementation of `hash_range(seed, first, last)` has been
But the standard has no such requirement for the hash function, it just
requires that the hashes of two different values are unlikely to collide.
Containers or algorithms designed to work with the standard hash function will
have to be implemented to work well when the hash function's output is
correlated to its input. Since they are paying that cost a higher quality hash
function would be wasteful.
[source]
----
for( ; first != last; ++first )
{
boost::hash_combine<typename std::iterator_traits<It>::value_type>( seed, *first );
}
----
(the explicit template parameter is needed to support iterators with proxy
return types such as `std::vector<bool>::iterator`.)
This is logical, consistent and straightforward. In the common case where
`typename std::iterator_traits<It>::value_type` is `char` -- which it is
in the common case of `boost::hash<std::string>` -- this however leaves a
lot of performance on the table, because processing each `char` individually
is much less efficient than processing several in bulk.
In Boost 1.81, `hash_range` was changed to process elements of type `char`,
`signed char`, `unsigned char`, `std::byte`, or `char8_t`, four of a time.
A `uint32_t` is composed from `first[0]` to `first[3]`, and that `uint32_t`
is fed to `hash_combine`.
In principle, when `size_t` is 64 bit, we could have used `uint64_t` instead.
We do not, because this allows producing an arbitrary hash value by choosing
the input bytes appropriately (because `hash_combine` is reversible.)
Allowing control only over 32 bits of the full 64 bit `size_t` value makes
these "chosen plaintext attacks" harder.
This is not as harmful to performance as it first appears, because the
input to `hash<string>` (e.g. the key in an unordered container) is often
short (9 to 13 bytes in some typical scenarios.)
Note that `hash_range` has also traditionally guaranteed that the same element
sequence yields the same hash value regardless of the iterator type. This
property remains valid after the changes to `char` range hashing. `hash_range`,
applied to the `char` sequence `{ 'a', 'b', 'c' }`, results in the same value
whether the sequence comes from `char[3]`, `std::string`, `std::deque<char>`,
or `std::list<char>`.

View File

@@ -225,7 +225,7 @@ Effects: ::
+
--
When `typename std::iterator_traits<It>::value_type` is not `char`, `signed char`,
`unsigned char`,
`unsigned char`, `std::byte`, or `char8_t`,
[source]
----

View File

@@ -1,6 +1,13 @@
////
Copyright 2005-2008 Daniel James
Copyright 2022 Christian Mazakas
Copyright 2022 Peter Dimov
Distributed under the Boost Software License, Version 1.0.
https://www.boost.org/LICENSE_1_0.txt
////
[#thanks]
= Acknowledgements
:idprefix: thanks_
This library is based on the design by Peter Dimov. During the initial development Joaquín M López Muñoz made many useful suggestions and contributed fixes.
@@ -12,3 +19,5 @@ The implementation of the hash function for pointers is based on suggestions mad
Some useful improvements to the floating point hash algorithm were suggested by Daniel Krügler.
The original implementation came from Jeremy B. Maitin-Shepard's hash table library, although this is a complete rewrite.
The documentation was converted from Quickbook to AsciiDoc by Christian Mazakas.

View File

@@ -27,6 +27,14 @@ template<> struct is_char_type<char>: public boost::true_type {};
template<> struct is_char_type<signed char>: public boost::true_type {};
template<> struct is_char_type<unsigned char>: public boost::true_type {};
#if defined(__cpp_char8_t) && __cpp_char8_t >= 201811L
template<> struct is_char_type<char8_t>: public boost::true_type {};
#endif
#if defined(__cpp_lib_byte) && __cpp_lib_byte >= 201603L
template<> struct is_char_type<std::byte>: public boost::true_type {};
#endif
#endif
template<class It>

View File

@@ -176,12 +176,11 @@ namespace boost
{
template<class T,
std::size_t Bits = sizeof(T) * CHAR_BIT,
int Digits = std::numeric_limits<T>::digits,
std::size_t size_t_bits = sizeof(std::size_t) * CHAR_BIT>
int Digits = std::numeric_limits<T>::digits>
struct hash_float_impl;
// float
template<class T, int Digits, std::size_t size_t_bits> struct hash_float_impl<T, 32, Digits, size_t_bits>
template<class T, int Digits> struct hash_float_impl<T, 32, Digits>
{
static std::size_t fn( T v )
{
@@ -193,35 +192,19 @@ namespace boost
};
// double
template<class T, int Digits> struct hash_float_impl<T, 64, Digits, 64>
template<class T, int Digits> struct hash_float_impl<T, 64, Digits>
{
static std::size_t fn( T v )
{
boost::uint64_t w;
std::memcpy( &w, &v, sizeof( v ) );
return w;
}
};
template<class T, int Digits> struct hash_float_impl<T, 64, Digits, 32>
{
static std::size_t fn( T v )
{
boost::uint32_t w[ 2 ];
std::memcpy( &w, &v, sizeof( v ) );
std::size_t seed = 0;
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
return seed;
return hash_value( w );
}
};
// 80 bit long double in 12 bytes
template<class T> struct hash_float_impl<T, 96, 64, 64>
template<class T> struct hash_float_impl<T, 96, 64>
{
static std::size_t fn( T v )
{
@@ -230,32 +213,15 @@ namespace boost
std::size_t seed = 0;
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
return seed;
}
};
template<class T> struct hash_float_impl<T, 96, 64, 32>
{
static std::size_t fn( T v )
{
boost::uint32_t w[ 3 ] = {};
std::memcpy( &w, &v, 80 / CHAR_BIT );
std::size_t seed = 0;
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[2] ) + hash_detail::hash_mix( seed );
seed = hash_value( w[0] ) + hash_detail::hash_mix( seed );
seed = hash_value( w[1] ) + hash_detail::hash_mix( seed );
return seed;
}
};
// 80 bit long double in 16 bytes
template<class T> struct hash_float_impl<T, 128, 64, 64>
template<class T> struct hash_float_impl<T, 128, 64>
{
static std::size_t fn( T v )
{
@@ -264,32 +230,15 @@ namespace boost
std::size_t seed = 0;
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
return seed;
}
};
template<class T> struct hash_float_impl<T, 128, 64, 32>
{
static std::size_t fn( T v )
{
boost::uint32_t w[ 3 ] = {};
std::memcpy( &w, &v, 80 / CHAR_BIT );
std::size_t seed = 0;
seed = static_cast<std::size_t>( w[0] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[1] ) + hash_detail::hash_mix( seed );
seed = static_cast<std::size_t>( w[2] ) + hash_detail::hash_mix( seed );
seed = hash_value( w[0] ) + hash_detail::hash_mix( seed );
seed = hash_value( w[1] ) + hash_detail::hash_mix( seed );
return seed;
}
};
// 128 bit long double
template<class T, int Digits, std::size_t size_t_bits> struct hash_float_impl<T, 128, Digits, size_t_bits>
template<class T, int Digits> struct hash_float_impl<T, 128, Digits>
{
static std::size_t fn( T v )
{
@@ -630,6 +579,4 @@ namespace boost
#endif
}
#undef BOOST_FUNCTIONAL_HASH_ROTL32
#endif // #ifndef BOOST_FUNCTIONAL_HASH_HASH_HPP

View File

@@ -48,5 +48,13 @@ int main()
test<float>();
test<double>();
#if defined(__cpp_char8_t) && __cpp_char8_t >= 201811L
test<char8_t>();
#endif
#if defined(__cpp_lib_byte) && __cpp_lib_byte >= 201603L
test<std::byte>();
#endif
return boost::report_errors();
}

View File

@@ -171,16 +171,16 @@ int main()
#if SIZE_MAX == 4294967295U
BOOST_TEST_EQ( hv(1.0), 1072693248U );
BOOST_TEST_EQ( hv(-1.0), 3220176896U );
BOOST_TEST_EQ( hv(3.14), 3972386992U );
BOOST_TEST_EQ( hv(-3.14), 1824903344U );
BOOST_TEST_EQ( hv(1e-308), 2213556530U );
BOOST_TEST_EQ( hv(-1e-308), 66072882U );
BOOST_TEST_EQ( hv(1e+308), 2623678890U );
BOOST_TEST_EQ( hv(-1e+308), 476195242U );
BOOST_TEST_EQ( hv(std::numeric_limits<double>::infinity()), 2146435072U );
BOOST_TEST_EQ( hv(-std::numeric_limits<double>::infinity()), 4293918720U );
BOOST_TEST_EQ( hv(1.0), 2619008688U );
BOOST_TEST_EQ( hv(-1.0), 146497060U );
BOOST_TEST_EQ( hv(3.14), 101651732U );
BOOST_TEST_EQ( hv(-3.14), 210858151U );
BOOST_TEST_EQ( hv(1e-308), 3911789313U );
BOOST_TEST_EQ( hv(-1e-308), 1812507313U );
BOOST_TEST_EQ( hv(1e+308), 987802568U );
BOOST_TEST_EQ( hv(-1e+308), 1639042439U );
BOOST_TEST_EQ( hv(std::numeric_limits<double>::infinity()), 3227645345U );
BOOST_TEST_EQ( hv(-std::numeric_limits<double>::infinity()), 2247339177U );
#else
@@ -207,21 +207,23 @@ int main()
if( ldbits == 64 )
{
BOOST_TEST_EQ( hv(1.0L), 1072693248U );
BOOST_TEST_EQ( hv(-1.0L), 3220176896U );
BOOST_TEST_EQ( hv(3.14L), 3972386992U );
BOOST_TEST_EQ( hv(-3.14L), 1824903344U );
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), 2146435072U );
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), 4293918720U );
BOOST_TEST_EQ( hv(1.0L), hv(1.0) );
BOOST_TEST_EQ( hv(-1.0L), hv(-1.0) );
BOOST_TEST_EQ( hv(3.14L), hv(3.14) );
BOOST_TEST_EQ( hv(-3.14L), hv(-3.14) );
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), hv(std::numeric_limits<double>::infinity()) );
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), hv(-std::numeric_limits<double>::infinity()) );
}
else
{
BOOST_TEST_EQ( hv(1.0L), 3770520689U );
BOOST_TEST_EQ( hv(-1.0L), 3770553457U );
BOOST_TEST_EQ( hv(3.14L), 1150018772U );
BOOST_TEST_EQ( hv(-3.14L), 1150051540U );
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), 3770537073U );
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), 3770569841U );
// ldbits == 96
BOOST_TEST_EQ( hv(1.0L), 3632050780U );
BOOST_TEST_EQ( hv(-1.0L), 3632083548U );
BOOST_TEST_EQ( hv(3.14L), 1742026549U );
BOOST_TEST_EQ( hv(-3.14L), 1742059317U );
BOOST_TEST_EQ( hv(std::numeric_limits<long double>::infinity()), 3632067164U );
BOOST_TEST_EQ( hv(-std::numeric_limits<long double>::infinity()), 3632099932U );
}
#else
@@ -246,9 +248,7 @@ int main()
}
else
{
// ldbits == 128 && std::numeric_limits<long double>::digits == 113
// under ARM64 and S390x, but the values differ presumably because of
// __FLOAT_WORD_ORDER__
// ldbits == 128 && std::numeric_limits<long double>::digits == 113
BOOST_TEST_EQ( hv(1.0L), 4611404543450677248ULL );
BOOST_TEST_EQ( hv(-1.0L), 13834776580305453056ULL );
@@ -344,10 +344,10 @@ int main()
#if SIZE_MAX == 4294967295U
BOOST_TEST_EQ( hv(std::complex<double>(+1.0, 0.0)), 1072693248U );
BOOST_TEST_EQ( hv(std::complex<double>(-1.0, 0.0)), 3220176896U );
BOOST_TEST_EQ( hv(std::complex<double>(0.0, +1.0)), 2619008688U );
BOOST_TEST_EQ( hv(std::complex<double>(0.0, -1.0)), 146497060U );
BOOST_TEST_EQ( hv(std::complex<double>(+1.0, 0.0)), 2619008688U );
BOOST_TEST_EQ( hv(std::complex<double>(-1.0, 0.0)), 146497060U );
BOOST_TEST_EQ( hv(std::complex<double>(0.0, +1.0)), 22395692U );
BOOST_TEST_EQ( hv(std::complex<double>(0.0, -1.0)), 1449221192U );
#else