Split the hash documentation into several files.

[SVN r32973]
This commit is contained in:
Daniel James
2006-02-16 23:10:26 +00:00
parent de07bf2d69
commit be3a039e88
7 changed files with 831 additions and 843 deletions

View File

@@ -14,846 +14,9 @@
]
]
[def __note__ [$images/note.png]]
[def __alert__ [$images/alert.png]]
[def __tip__ [$images/tip.png]]
[def __boost_hash [classref boost::hash]]
[def __hash_value [funcref boost::hash_value hash_value]]
[def __hash_combine [funcref boost::hash_combine]]
[def __hash_range [funcref boost::hash_range]]
[section:intro Introduction]
[def __tr1__ [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf
C++ Standard Library Technical Report]]
[def __tr1-short__ [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf
Technical Report]]
[def __multi-index__ [@../../libs/multi_index/doc/index.html
Boost Multi-Index Containers Library]]
[def __multi-index-short__ [@../../libs/multi_index/doc/index.html
Boost.MultiIndex]]
[def __issues__ [@http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf
Library Extension Technical Report Issues List]]
[def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function hash function]]
[def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table hash table]]
__boost_hash is an implementation of the __hash-function__ object
specified by the __tr1-short__. It is intended for use as the default hash function
for unordered associative containers, and the __multi-index__'s hash indexes.
As it is compliant with the __tr1-short__, it will work with:
* integers
* floats
* pointers
* strings
It also implements the extension proposed by Peter Dimov in issue 6.18 of the
__issues__, this adds support for:
* arrays
* `std::pair`
* the standard containers.
* extending __boost_hash for custom types.
[endsect]
[section:tutorial Tutorial]
When using a hash index with __multi-index-short__, you don't need to do
anything to use __boost_hash as it uses it by default.
To find out how to use a user-defined type, read the
[link hash.custom section on extending boost::hash for a custom data type].
If your standard library supplies its own implementation of the unordered
associative containers and you wish to use
__boost_hash, just use an extra template parameter:
std::unordered_multiset<std::vector<int>, __boost_hash<int> >
set_of_ints;
std::unordered_set<std::pair<int, int>, __boost_hash<std::pair<int, int> >
set_of_pairs;
std::unordered_map<int, std::string, __boost_hash<int> > map_int_to_string;
To use __boost_hash directly, create an instance and call it as a function:
#include <boost/functional/hash.hpp>
int main()
{
__boost_hash<std::string> string_hash;
std::size_t h = string_hash("Hash me");
}
For an example of generic use, here is a function to generate a vector
containing the hashes of the elements of a container:
template <class Container>
std::vector<std::size_t> get_hashes(Container const& x)
{
std::vector<std::size_t> hashes;
std::transform(x.begin(), x.end(), std::insert_iterator(hashes),
__boost_hash<typename Container::value_type>());
return hashes;
}
[endsect]
[section:custom Extending boost::hash for a custom data type]
__boost_hash is implemented by calling the function __hash_value.
The namespace isn't specified so that it can detect overloads via argument
dependant lookup. So if there is a free function `hash_value` in the same
namespace as a custom type, it will get called.
If you have a structure `library::book`, where each `book` is uniquely
defined by it's member `id`:
namespace library
{
struct book
{
int id;
std::string author;
std::string title;
// ....
};
bool operator==(book const& a, book const& b)
{
return a.id == b.id;
}
}
Then all you would need to do is write the function `library::hash_value`:
namespace library
{
std::size_t hash_value(book const& b)
{
__boost_hash<int> hasher;
return hasher(b.id);
}
}
And you can now use __boost_hash with book:
library::book knife(3458, "Zane Grey", "The Hash Knife Outfit");
library::book dandelion(1354, "Paul J. Shanley",
"Hash & Dandelion Greens");
__boost_hash<library::book> book_hasher;
std::size_t knife_hash_value = book_hasher(knife);
// If std::unordered_set is available:
std::unordered_set<library::book, __boost_hash<library::book> > books;
books.insert(knife);
books.insert(library::book(2443, "Lindgren, Torgny", "Hash"));
books.insert(library::book(1953, "Snyder, Bernadette M.",
"Heavenly Hash: A Tasty Mix of a Mother's Meditations"));
assert(books.find(knife) != books.end());
assert(books.find(dandelion) == books.end());
The full example can be found in:
[@../../libs/functional/hash/examples/books.cpp /libs/functional/hash/examples/books.hpp]
and
[@../../libs/functional/hash/examples/books.cpp /libs/functional/hash/examples/books.cpp].
[blurb
When writing a hash function, first look at how the equality function works.
Objects that are equal must generate the same hash value.
When objects are not equal the should generate different hash values.
In this object equality was based just on the id, if it was based
on the objects name and author the hash function should take them into account
(how to do this is discussed in the next section).
]
[endsect]
[section:combine Combining hash values]
Say you have a point class, representing a two dimensional location:
class point
{
int x;
int y;
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
bool operator==(point const& other) const
{
return x == other.x && y == other.y;
}
};
and you wish to use it as the key for an `unordered_map`. You need to
customise the hash for this structure. To do this we need to combine
the hash values for `x` and `y`. The function
__hash_combine is supplied for this purpose:
class point
{
...
friend std::size_t hash_value(point const& p)
{
std::size_t seed = 0;
__hash_combine(seed, p.x);
__hash_combine(seed, p.y);
return seed;
}
...
};
Calls to hash_combine incrementally build the hash from the different members
of point, it can be repeatedly called for any number of elements. It calls
__hash_value on the supplied element, and combines it with the seed.
Full code for this example is at
[@../../libs/functional/hash/examples/point.cpp /libs/functional/hash/examples/point.cpp].
[blurb
'''
When using __hash_combine the order of the
calls matters.
<programlisting>
std::size_t seed = 0;
boost::hash_combine(seed, 1);
boost::hash_combine(seed, 2);
</programlisting>
results in a different seed to:
<programlisting>
std::size_t seed = 0;
boost::hash_combine(seed, 2);
boost::hash_combine(seed, 1);
</programlisting>
If you are calculating a hash value for data where the order of the data
doesn't matter in comparisons (e.g. a set) you will have to ensure that the
data is always supplied in the same order.
'''
]
To calculate the hash of an iterator range you can use __hash_range:
std::vector<std::string> some_strings;
std::size_t hash = __hash_range(some_strings.begin(), some_strings.end());
[endsect]
[section:portability Portability]
__boost_hash is written to be as portable as possible, but unfortunately, several
older compilers don't support argument dependent lookup (ADL) - the mechanism
used for customization. On those compilers custom overloads for hash_value
need to be declared in the boost namespace.
On a strictly standards compliant compiler, an overload defined in the
boost namespace won't be found when __boost_hash is instantiated,
so for these compilers the overload should only be declared in the same
namespace as the class.
Let's say we have a simple custom type:
namespace foo
{
template <class T>
class custom_type
{
T value;
public:
custom_type(T x) : value(x) {}
friend std::size_t hash_value(custom_type x)
{
__boost_hash<int> hasher;
return hasher(x.value);
}
};
}
On a compliant compiler, when `hash_value` is called for this type,
it will look at the namespace inside the type and find `hash_value`
but on a compiler which doesn't support ADL `hash_value` won't be found.
To make things worse, some compilers which do support ADL won't find
a friend class defined inside the class.
So first move the member function out of the class:
namespace foo
{
template <class T>
class custom_type
{
T value;
public:
custom_type(T x) : value(x) {}
std::size_t hash(custom_type x)
{
__boost_hash<T> hasher;
return hasher(value);
}
};
template <class T>
inline std::size_t hash_value(custom_type<T> x)
{
return x.hash();
}
}
Unfortunately, I couldn't declare hash_value as a friend, as some compilers
don't support template friends, so instead I declared a member function to
calculate the hash, can called it from hash_value.
For compilers which don't support ADL, hash_value needs to be defined in the
boost namespace:
#ifdef BOOST_NO_ARGUMENT_DEPENDENT_LOOKUP
namespace boost
#else
namespace foo
#endif
{
template <class T>
std::size_t hash_value(foo::custom_type<T> x)
{
return x.hash();
}
}
Full code for this example is at
[@../../libs/functional/hash/examples/portable.cpp /libs/functional/hash/examples/portable.cpp].
[h2 Other Issues]
On Visual C++ versions 6.5 and 7.0, `hash_value` isn't overloaded for built in
arrays. __boost_hash, __hash_combine and __hash_range all use a workaround to
support built in arrays so this shouldn't be a problem in most cases.
On Visual C++ versions 6.5 and 7.0, function pointers aren't currently supported.
When using GCC on Solaris, `boost::hash_value(long double)` treats
`long double`s as doubles - so the hash function doesn't take into account the
full range of values.
[endsect]
[/ Quickbook insists on putting a paragraph around the escaped code, which
messes up the library section, so I wrap it in another section. This is no good,
but there you go. ]
[section:reference_ Reference]
'''
<library-reference>
<section id="hash.reference.specification">
<para>For the full specification, see section 6.3 of the
<ulink url="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf">C++ Standard Library Technical Report</ulink>
and issue 6.18 of the
<ulink url="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf">Library Extension Technical Report Issues List</ulink>.
</para>
</section>
<header name="boost/functional/hash.hpp">
<para>
Defines <code><classname>boost::hash</classname></code>,
and helper functions.
</para>
<namespace name="boost">
<!--
boost::hash
-->
<struct name="hash">
<template>
<template-type-parameter name="T"/>
</template>
<inherit access="public">
<classname>std::unary_function&lt;T, std::size_t&gt;</classname>
</inherit>
<purpose>An STL compliant hash function object.</purpose>
<method name="operator()" cv="const">
<type>std::size_t</type>
<parameter name="val">
<paramtype>T const&amp;</paramtype>
</parameter>
<returns><para>
<programlisting><functionname>hash_value</functionname>(val)</programlisting>
</para></returns>
<notes><para>
The call to <code><functionname>hash_value</functionname></code>
is unqualified, so that custom overloads can be
found via argument dependent lookup.
</para></notes>
<throws><para>
Only throws if
<code><functionname>hash_value</functionname>(T)</code> throws.
</para></throws>
</method>
</struct>
<!--
boost::hash_combine
-->
<function name="hash_combine">
<template>
<template-type-parameter name="T"/>
</template>
<type>void</type>
<parameter name="seed"><paramtype>size_t &amp;</paramtype></parameter>
<parameter name="v"><paramtype>T const &amp;</paramtype></parameter>
<purpose>
Called repeatedly to incrementally create a hash value from
several variables.
</purpose>
<effects><programlisting>seed ^= <functionname>hash_value</functionname>(v) + 0x9e3779b9 + (seed &lt;&lt; 6) + (seed &gt;&gt; 2);</programlisting></effects>
<notes>
<para><functionname>hash_value</functionname> is called without
qualification, so that overloads can be found via ADL.</para>
<para>This is an extension to TR1</para>
</notes>
<throws>
Only throws if <functionname>hash_value</functionname>(T) throws.
Strong exception safety, as long as <functionname>hash_value</functionname>(T)
also has strong exception safety.
</throws>
</function>
<!--
boost::hash_range
-->
<overloaded-function name="hash_range">
<signature>
<template>
<template-type-parameter name="It"/>
</template>
<type>std::size_t</type>
<parameter name="first"><paramtype>It</paramtype></parameter>
<parameter name="last"><paramtype>It</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="It"/>
</template>
<type>void</type>
<parameter name="seed"><paramtype>std::size_t&amp;</paramtype></parameter>
<parameter name="first"><paramtype>It</paramtype></parameter>
<parameter name="last"><paramtype>It</paramtype></parameter>
</signature>
<purpose>
Calculate the combined hash value of the elements of an iterator
range.
</purpose>
<effects>
<para>For the two argument overload:
<programlisting>
size_t seed = 0;
for(; first != last; ++first)
{
<functionname>hash_combine</functionname>(seed, *first);
}
return seed;
</programlisting>
</para>For the three arguments overload:
<programlisting>
for(; first != last; ++first)
{
<functionname>hash_combine</functionname>(seed, *first);
}
</programlisting>
<para>
</para>
</effects>
<notes>
<para>
<code>hash_range</code> is sensitive to the order of the elements
so it wouldn't be appropriate to use this with an unordered
container.
</para>
<para>This is an extension to TR1</para>
</notes>
<throws><para>
Only throws if <code><functionname>hash_value</functionname>(std::iterator_traits&lt;It&gt;::value_type)</code>
throws. <code>hash_range(std::size_t&amp;, It, It)</code> has basic exception safety as long as
<code><functionname>hash_value</functionname>(std::iterator_traits&lt;It&gt;::value_type)</code>
has basic exception safety.
</para></throws>
</overloaded-function>
<!--
boost::hash_value - integers
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for integers.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>int</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>unsigned int</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>long</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>unsigned long</paramtype></parameter>
</signature>
<returns>
<code>val</code>
</returns>
</overloaded-function>
<!--
boost::hash_value - floating point
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for floating point values.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>float</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>double</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>long double</paramtype></parameter>
</signature>
<returns>
<para>
An unspecified value, except that equal arguments shall yield the same
result
</para>
</returns>
</overloaded-function>
<!--
boost::hash_value - pointers
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for pointers.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<template><template-type-parameter name="T"/></template>
<type>std::size_t</type>
<parameter name="val"><paramtype>T* const&amp;</paramtype></parameter>
</signature>
<returns>
<para>
An unspecified value, except that equal arguments shall yield the same
result
</para>
</returns>
</overloaded-function>
<!--
boost::hash_value - arrays
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for built in arrays.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<template>
<template-type-parameter name="T"/>
<template-nontype-parameter name="N"><type>unsigned</type></template-nontype-parameter>
</template>
<type>std::size_t</type>
<parameter><paramtype>T (&amp;val)[N]</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="T"/>
<template-nontype-parameter name="N"><type>unsigned</type></template-nontype-parameter>
</template>
<type>std::size_t</type>
<parameter><paramtype>const T (&amp;val)[N]</paramtype></parameter>
</signature>
<returns><code>hash_range(val, val+N)</code></returns>
</overloaded-function>
<!--
boost::hash_value - string
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for <code>std::basic_string</code>.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<template>
<template-type-parameter name="Ch"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val">
<paramtype>std::basic_string&lt;Ch, std::char_traits&lt;Ch&gt;, A&gt; const&amp;</paramtype>
</parameter>
</signature>
<returns><code>hash_range(val.begin(), val.end())</code></returns>
</overloaded-function>
<!--
boost::hash_value - pairs
-->
<overloaded-function name="hash_value">
<signature>
<template>
<template-type-parameter name="A"/>
<template-type-parameter name="B"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::pair&lt;A, B&gt; const &amp;</paramtype></parameter>
</signature>
<effects><programlisting>
size_t seed = 0;
<functionname>hash_combine</functionname>(seed, val.first);
<functionname>hash_combine</functionname>(seed, val.second);
return seed;
</programlisting></effects>
<throws>
Only throws if <code><functionname>hash_value</functionname>(A)</code>
or <code><functionname>hash_value</functionname>(B)</code> throws.
</throws>
<notes><para>This is an extension to TR1</para></notes>
</overloaded-function>
<!--
boost::hash_value - containers
-->
<overloaded-function name="hash_value">
<signature>
<template>
<template-type-parameter name="T"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::vector&lt;T, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="T"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::list&lt;T, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="T"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::deque&lt;T, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::set&lt;K, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::multiset&lt;K, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="T"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::map&lt;K, T, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="T"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::multimap&lt;K, T, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<returns>
<code><functionname>hash_range</functionname>(val.begin(), val.end());</code>
</returns>
<throws>
Only throws if <code><functionname>hash_value</functionname>(T)</code> throws.
</throws>
<notes><para>This is an extension to TR1</para></notes>
</overloaded-function>
</namespace>
</header>
</library-reference>
'''
[/ Remove this comment and everything goes wrong! ;) ]
[endsect]
[section:links Links]
[*A Proposal to Add Hash Tables to the Standard Library]\n
[@http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2003/n1456.html]\n
The hash table proposal explains much of the design. The hash function object
is discussed in Section D.
[*The C++ Standard Library Technical Report.]\n
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf]\n
Contains the hash function specification in section 6.3.2.
[*Library Extension Technical Report Issues List.]\n
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf]\n
The library implements the extension described in Issue 6.18, pages 63-67.
[*Methods for Identifying Versioned and Plagiarised Documents]\n
Timothy C. Hoad, Justin Zobel\n
[@http://www.cs.rmit.edu.au/~jz/fulltext/jasist-tch.pdf]\n
Contains the hash function that __hash_combine is based on.
[endsect]
[section:acknowledgements Acknowledgements]
This library is based on the design by Peter Dimov. During the inital development
Joaquín M López Muñoz made many useful suggestions and contributed fixes.
The review was managed by Thorsten Ottosen, and the library reviewed by:
David Abrahams, Alberto Barbati, Topher Cooper, Caleb Epstein, Dave Harris,
Chris Jefferson, Bronek Kozicki, John Maddock, Tobias Swinger,Jaap Suter
and Rob Stewart.
The implementation of the hash function for pointers is based on suggestions
made by Alberto Barbati and Dave Harris. Dave Harris also suggested an
important improvement to __hash_combine that was taken up.
The original implementation came from Jeremy B. Maitin-Shepard's hash table
library, although this is a complete rewrite.
[endsect]
[include:hash intro.qbk]
[include:hash tutorial.qbk]
[include:hash portability.qbk]
[xinclude ref.xml]
[include:hash links.qbk]
[include:hash thanks.qbk]

36
hash/doc/intro.qbk Normal file
View File

@@ -0,0 +1,36 @@
[section:intro Introduction]
[def __tr1__ [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf
C++ Standard Library Technical Report]]
[def __tr1-short__ [@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf
Technical Report]]
[def __multi-index__ [@../../libs/multi_index/doc/index.html
Boost Multi-Index Containers Library]]
[def __multi-index-short__ [@../../libs/multi_index/doc/index.html
Boost.MultiIndex]]
[def __issues__ [@http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf
Library Extension Technical Report Issues List]]
[def __hash-function__ [@http://en.wikipedia.org/wiki/Hash_function hash function]]
[def __hash-table__ [@http://en.wikipedia.org/wiki/Hash_table hash table]]
[classref boost::hash] is an implementation of the __hash-function__ object
specified by the __tr1-short__. It is intended for use as the default hash function
for unordered associative containers, and the __multi-index__'s hash indexes.
As it is compliant with the __tr1-short__, it will work with:
* integers
* floats
* pointers
* strings
It also implements the extension proposed by Peter Dimov in issue 6.18 of the
__issues__, this adds support for:
* arrays
* `std::pair`
* the standard containers.
* extending [classref boost::hash] for custom types.
[endsect]

22
hash/doc/links.qbk Normal file
View File

@@ -0,0 +1,22 @@
[section:links Links]
[*A Proposal to Add Hash Tables to the Standard Library]\n
[@http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2003/n1456.html]\n
The hash table proposal explains much of the design. The hash function object
is discussed in Section D.
[*The C++ Standard Library Technical Report.]\n
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf]\n
Contains the hash function specification in section 6.3.2.
[*Library Extension Technical Report Issues List.]\n
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1837.pdf]\n
The library implements the extension described in Issue 6.18, pages 63-67.
[*Methods for Identifying Versioned and Plagiarised Documents]\n
Timothy C. Hoad, Justin Zobel\n
[@http://www.cs.rmit.edu.au/~jz/fulltext/jasist-tch.pdf]\n
Contains the hash function that [funcref boost::hash_combine] is based on.
[endsect]

98
hash/doc/portability.qbk Normal file
View File

@@ -0,0 +1,98 @@
[section:portability Portability]
[classref boost::hash] is written to be as portable as possible, but unfortunately, several
older compilers don't support argument dependent lookup (ADL) - the mechanism
used for customization. On those compilers custom overloads for hash_value
need to be declared in the boost namespace.
On a strictly standards compliant compiler, an overload defined in the
boost namespace won't be found when [classref boost::hash] is instantiated,
so for these compilers the overload should only be declared in the same
namespace as the class.
Let's say we have a simple custom type:
namespace foo
{
template <class T>
class custom_type
{
T value;
public:
custom_type(T x) : value(x) {}
friend std::size_t hash_value(custom_type x)
{
[classref boost::hash]<int> hasher;
return hasher(x.value);
}
};
}
On a compliant compiler, when `hash_value` is called for this type,
it will look at the namespace inside the type and find `hash_value`
but on a compiler which doesn't support ADL `hash_value` won't be found.
To make things worse, some compilers which do support ADL won't find
a friend class defined inside the class.
So first move the member function out of the class:
namespace foo
{
template <class T>
class custom_type
{
T value;
public:
custom_type(T x) : value(x) {}
std::size_t hash(custom_type x)
{
[classref boost::hash]<T> hasher;
return hasher(value);
}
};
template <class T>
inline std::size_t hash_value(custom_type<T> x)
{
return x.hash();
}
}
Unfortunately, I couldn't declare hash_value as a friend, as some compilers
don't support template friends, so instead I declared a member function to
calculate the hash, can called it from hash_value.
For compilers which don't support ADL, hash_value needs to be defined in the
boost namespace:
#ifdef BOOST_NO_ARGUMENT_DEPENDENT_LOOKUP
namespace boost
#else
namespace foo
#endif
{
template <class T>
std::size_t hash_value(foo::custom_type<T> x)
{
return x.hash();
}
}
Full code for this example is at
[@../../libs/functional/hash/examples/portable.cpp /libs/functional/hash/examples/portable.cpp].
[h2 Other Issues]
On Visual C++ versions 6.5 and 7.0, `hash_value` isn't overloaded for built in
arrays. [classref boost::hash], [funcref boost::hash_combine] and [funcref boost::hash_range] all use a workaround to
support built in arrays so this shouldn't be a problem in most cases.
On Visual C++ versions 6.5 and 7.0, function pointers aren't currently supported.
When using GCC on Solaris, `boost::hash_value(long double)` treats
`long double`s as doubles - so the hash function doesn't take into account the
full range of values.
[endsect]

459
hash/doc/ref.xml Normal file
View File

@@ -0,0 +1,459 @@
<library-reference>
<section id="hash.reference.specification">
<para>For the full specification, see section 6.3 of the
<ulink url="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf">C++ Standard Library Technical Report</ulink>
and issue 6.18 of the
<ulink url="http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2005/n1756.pdf">Library Extension Technical Report Issues List</ulink>.
</para>
</section>
<header name="boost/functional/hash.hpp">
<para>
Defines <code><classname>boost::hash</classname></code>,
and helper functions.
</para>
<namespace name="boost">
<!--
boost::hash
-->
<struct name="hash">
<template>
<template-type-parameter name="T"/>
</template>
<inherit access="public">
<classname>std::unary_function&lt;T, std::size_t&gt;</classname>
</inherit>
<purpose>An STL compliant hash function object.</purpose>
<method name="operator()" cv="const">
<type>std::size_t</type>
<parameter name="val">
<paramtype>T const&amp;</paramtype>
</parameter>
<returns><para>
<programlisting><functionname>hash_value</functionname>(val)</programlisting>
</para></returns>
<notes><para>
The call to <code><functionname>hash_value</functionname></code>
is unqualified, so that custom overloads can be
found via argument dependent lookup.
</para></notes>
<throws><para>
Only throws if
<code><functionname>hash_value</functionname>(T)</code> throws.
</para></throws>
</method>
</struct>
<!--
boost::hash_combine
-->
<function name="hash_combine">
<template>
<template-type-parameter name="T"/>
</template>
<type>void</type>
<parameter name="seed"><paramtype>size_t &amp;</paramtype></parameter>
<parameter name="v"><paramtype>T const &amp;</paramtype></parameter>
<purpose>
Called repeatedly to incrementally create a hash value from
several variables.
</purpose>
<effects><programlisting>seed ^= <functionname>hash_value</functionname>(v) + 0x9e3779b9 + (seed &lt;&lt; 6) + (seed &gt;&gt; 2);</programlisting></effects>
<notes>
<para><functionname>hash_value</functionname> is called without
qualification, so that overloads can be found via ADL.</para>
<para>This is an extension to TR1</para>
</notes>
<throws>
Only throws if <functionname>hash_value</functionname>(T) throws.
Strong exception safety, as long as <functionname>hash_value</functionname>(T)
also has strong exception safety.
</throws>
</function>
<!--
boost::hash_range
-->
<overloaded-function name="hash_range">
<signature>
<template>
<template-type-parameter name="It"/>
</template>
<type>std::size_t</type>
<parameter name="first"><paramtype>It</paramtype></parameter>
<parameter name="last"><paramtype>It</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="It"/>
</template>
<type>void</type>
<parameter name="seed"><paramtype>std::size_t&amp;</paramtype></parameter>
<parameter name="first"><paramtype>It</paramtype></parameter>
<parameter name="last"><paramtype>It</paramtype></parameter>
</signature>
<purpose>
Calculate the combined hash value of the elements of an iterator
range.
</purpose>
<effects>
<para>For the two argument overload:
<programlisting>
size_t seed = 0;
for(; first != last; ++first)
{
<functionname>hash_combine</functionname>(seed, *first);
}
return seed;
</programlisting>
</para>For the three arguments overload:
<programlisting>
for(; first != last; ++first)
{
<functionname>hash_combine</functionname>(seed, *first);
}
</programlisting>
<para>
</para>
</effects>
<notes>
<para>
<code>hash_range</code> is sensitive to the order of the elements
so it wouldn't be appropriate to use this with an unordered
container.
</para>
<para>This is an extension to TR1</para>
</notes>
<throws><para>
Only throws if <code><functionname>hash_value</functionname>(std::iterator_traits&lt;It&gt;::value_type)</code>
throws. <code>hash_range(std::size_t&amp;, It, It)</code> has basic exception safety as long as
<code><functionname>hash_value</functionname>(std::iterator_traits&lt;It&gt;::value_type)</code>
has basic exception safety.
</para></throws>
</overloaded-function>
<!--
boost::hash_value - integers
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for integers.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>int</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>unsigned int</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>long</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>unsigned long</paramtype></parameter>
</signature>
<returns>
<code>val</code>
</returns>
</overloaded-function>
<!--
boost::hash_value - floating point
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for floating point values.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>float</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>double</paramtype></parameter>
</signature>
<signature>
<type>std::size_t</type>
<parameter name="val"><paramtype>long double</paramtype></parameter>
</signature>
<returns>
<para>
An unspecified value, except that equal arguments shall yield the same
result
</para>
</returns>
</overloaded-function>
<!--
boost::hash_value - pointers
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for pointers.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<template><template-type-parameter name="T"/></template>
<type>std::size_t</type>
<parameter name="val"><paramtype>T* const&amp;</paramtype></parameter>
</signature>
<returns>
<para>
An unspecified value, except that equal arguments shall yield the same
result
</para>
</returns>
</overloaded-function>
<!--
boost::hash_value - arrays
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for built in arrays.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<template>
<template-type-parameter name="T"/>
<template-nontype-parameter name="N"><type>unsigned</type></template-nontype-parameter>
</template>
<type>std::size_t</type>
<parameter><paramtype>T (&amp;val)[N]</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="T"/>
<template-nontype-parameter name="N"><type>unsigned</type></template-nontype-parameter>
</template>
<type>std::size_t</type>
<parameter><paramtype>const T (&amp;val)[N]</paramtype></parameter>
</signature>
<returns><code>hash_range(val, val+N)</code></returns>
</overloaded-function>
<!--
boost::hash_value - string
-->
<overloaded-function name="hash_value">
<purpose>
Implementation of a hash function for <code>std::basic_string</code>.
</purpose>
<description><para>
Generally shouldn't be called directly by users, instead they should use
<classname>boost::hash</classname>, <functionname>boost::hash_range</functionname>
or <functionname>boost::hash_combine</functionname> which
call hash_value without namespace qualification so that overloads
for custom types are found via ADL.
</para></description>
<notes>
<para>Overloads for other types supplied in other headers.</para>
<para>This is an extension to TR1</para>
</notes>
<signature>
<template>
<template-type-parameter name="Ch"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val">
<paramtype>std::basic_string&lt;Ch, std::char_traits&lt;Ch&gt;, A&gt; const&amp;</paramtype>
</parameter>
</signature>
<returns><code>hash_range(val.begin(), val.end())</code></returns>
</overloaded-function>
<!--
boost::hash_value - pairs
-->
<overloaded-function name="hash_value">
<signature>
<template>
<template-type-parameter name="A"/>
<template-type-parameter name="B"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::pair&lt;A, B&gt; const &amp;</paramtype></parameter>
</signature>
<effects><programlisting>
size_t seed = 0;
<functionname>hash_combine</functionname>(seed, val.first);
<functionname>hash_combine</functionname>(seed, val.second);
return seed;
</programlisting></effects>
<throws>
Only throws if <code><functionname>hash_value</functionname>(A)</code>
or <code><functionname>hash_value</functionname>(B)</code> throws.
</throws>
<notes><para>This is an extension to TR1</para></notes>
</overloaded-function>
<!--
boost::hash_value - containers
-->
<overloaded-function name="hash_value">
<signature>
<template>
<template-type-parameter name="T"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::vector&lt;T, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="T"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::list&lt;T, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="T"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::deque&lt;T, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::set&lt;K, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::multiset&lt;K, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="T"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::map&lt;K, T, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<signature>
<template>
<template-type-parameter name="K"/>
<template-type-parameter name="T"/>
<template-type-parameter name="C"/>
<template-type-parameter name="A"/>
</template>
<type>std::size_t</type>
<parameter name="val"><paramtype>std::multimap&lt;K, T, C, A&gt; const &amp;</paramtype></parameter>
</signature>
<returns>
<code><functionname>hash_range</functionname>(val.begin(), val.end());</code>
</returns>
<throws>
Only throws if <code><functionname>hash_value</functionname>(T)</code> throws.
</throws>
<notes><para>This is an extension to TR1</para></notes>
</overloaded-function>
</namespace>
</header>
</library-reference>

18
hash/doc/thanks.qbk Normal file
View File

@@ -0,0 +1,18 @@
[section:acknowledgements Acknowledgements]
This library is based on the design by Peter Dimov. During the inital development
Joaquín M López Muñoz made many useful suggestions and contributed fixes.
The review was managed by Thorsten Ottosen, and the library reviewed by:
David Abrahams, Alberto Barbati, Topher Cooper, Caleb Epstein, Dave Harris,
Chris Jefferson, Bronek Kozicki, John Maddock, Tobias Swinger,Jaap Suter
and Rob Stewart.
The implementation of the hash function for pointers is based on suggestions
made by Alberto Barbati and Dave Harris. Dave Harris also suggested an
important improvement to [funcref boost::hash_combine] that was taken up.
The original implementation came from Jeremy B. Maitin-Shepard's hash table
library, although this is a complete rewrite.
[endsect]

192
hash/doc/tutorial.qbk Normal file
View File

@@ -0,0 +1,192 @@
[section:tutorial Tutorial]
When using a hash index with __multi-index-short__, you don't need to do
anything to use [classref boost::hash] as it uses it by default.
To find out how to use a user-defined type, read the
[link hash.custom section on extending boost::hash for a custom data type].
If your standard library supplies its own implementation of the unordered
associative containers and you wish to use
[classref boost::hash], just use an extra template parameter:
std::unordered_multiset<std::vector<int>, ``[classref boost::hash]``<int> >
set_of_ints;
std::unordered_set<std::pair<int, int>, ``[classref boost::hash]``<std::pair<int, int> >
set_of_pairs;
std::unordered_map<int, std::string, ``[classref boost::hash]``<int> > map_int_to_string;
To use [classref boost::hash] directly, create an instance and call it as a function:
#include <boost/functional/hash.hpp>
int main()
{
``[classref boost::hash]``<std::string> string_hash;
std::size_t h = string_hash("Hash me");
}
For an example of generic use, here is a function to generate a vector
containing the hashes of the elements of a container:
template <class Container>
std::vector<std::size_t> get_hashes(Container const& x)
{
std::vector<std::size_t> hashes;
std::transform(x.begin(), x.end(), std::insert_iterator(hashes),
``[classref boost::hash]``<typename Container::value_type>());
return hashes;
}
[endsect]
[section:custom Extending boost::hash for a custom data type]
[classref boost::hash] is implemented by calling the function [funcref boost::hash_value hash_value].
The namespace isn't specified so that it can detect overloads via argument
dependant lookup. So if there is a free function `hash_value` in the same
namespace as a custom type, it will get called.
If you have a structure `library::book`, where each `book` is uniquely
defined by it's member `id`:
namespace library
{
struct book
{
int id;
std::string author;
std::string title;
// ....
};
bool operator==(book const& a, book const& b)
{
return a.id == b.id;
}
}
Then all you would need to do is write the function `library::hash_value`:
namespace library
{
std::size_t hash_value(book const& b)
{
``[classref boost::hash]``<int> hasher;
return hasher(b.id);
}
}
And you can now use [classref boost::hash] with book:
library::book knife(3458, "Zane Grey", "The Hash Knife Outfit");
library::book dandelion(1354, "Paul J. Shanley",
"Hash & Dandelion Greens");
``[classref boost::hash]``<library::book> book_hasher;
std::size_t knife_hash_value = book_hasher(knife);
// If std::unordered_set is available:
std::unordered_set<library::book, ``[classref boost::hash]``<library::book> > books;
books.insert(knife);
books.insert(library::book(2443, "Lindgren, Torgny", "Hash"));
books.insert(library::book(1953, "Snyder, Bernadette M.",
"Heavenly Hash: A Tasty Mix of a Mother's Meditations"));
assert(books.find(knife) != books.end());
assert(books.find(dandelion) == books.end());
The full example can be found in:
[@../../libs/functional/hash/examples/books.cpp /libs/functional/hash/examples/books.hpp]
and
[@../../libs/functional/hash/examples/books.cpp /libs/functional/hash/examples/books.cpp].
[blurb
When writing a hash function, first look at how the equality function works.
Objects that are equal must generate the same hash value.
When objects are not equal the should generate different hash values.
In this object equality was based just on the id, if it was based
on the objects name and author the hash function should take them into account
(how to do this is discussed in the next section).
]
[endsect]
[section:combine Combining hash values]
Say you have a point class, representing a two dimensional location:
class point
{
int x;
int y;
public:
point() : x(0), y(0) {}
point(int x, int y) : x(x), y(y) {}
bool operator==(point const& other) const
{
return x == other.x && y == other.y;
}
};
and you wish to use it as the key for an `unordered_map`. You need to
customise the hash for this structure. To do this we need to combine
the hash values for `x` and `y`. The function
[funcref boost::hash_combine] is supplied for this purpose:
class point
{
...
friend std::size_t hash_value(point const& p)
{
std::size_t seed = 0;
[funcref boost::hash_combine](seed, p.x);
[funcref boost::hash_combine](seed, p.y);
return seed;
}
...
};
Calls to hash_combine incrementally build the hash from the different members
of point, it can be repeatedly called for any number of elements. It calls
[funcref boost::hash_value hash_value] on the supplied element, and combines it with the seed.
Full code for this example is at
[@../../libs/functional/hash/examples/point.cpp /libs/functional/hash/examples/point.cpp].
[blurb
'''
When using [funcref boost::hash_combine] the order of the
calls matters.
<programlisting>
std::size_t seed = 0;
boost::hash_combine(seed, 1);
boost::hash_combine(seed, 2);
</programlisting>
results in a different seed to:
<programlisting>
std::size_t seed = 0;
boost::hash_combine(seed, 2);
boost::hash_combine(seed, 1);
</programlisting>
If you are calculating a hash value for data where the order of the data
doesn't matter in comparisons (e.g. a set) you will have to ensure that the
data is always supplied in the same order.
'''
]
To calculate the hash of an iterator range you can use [funcref boost::hash_range]:
std::vector<std::string> some_strings;
std::size_t hash = [funcref boost::hash_range](some_strings.begin(), some_strings.end());
[endsect]