diff --git a/doc/article/simple_cxx11_metaprogramming_2.adoc b/doc/article/simple_cxx11_metaprogramming_2.adoc new file mode 100644 index 0000000..c176818 --- /dev/null +++ b/doc/article/simple_cxx11_metaprogramming_2.adoc @@ -0,0 +1,981 @@ +//// +Copyright 2015-2017 Peter Dimov + +Distributed under the Boost Software License, Version 1.0. + +See accompanying file LICENSE_1_0.txt or copy at +http://www.boost.org/LICENSE_1_0.txt +//// + +# Simple {cpp}11 metaprogramming, part 2 +Peter Dimov +2015-06-20 + +[.lead] +__Efficient algorithms for membership testing, random access, and retrieval by +key__ + +NOTE: Being late to the metaprogramming party, I make no claim of having +invented the techniques in this article. A quick look at the implementations +of, for example, Louis Dionne's https://github.com/ldionne/mpl11[mpl11] and +Eric Niebler's https://github.com/ericniebler/meta[meta], shows that most of +these tricks are already known. Dave Abrahams +https://github.com/dabrahams/mpl11[has experimented] along these lines in 2012. +The original inventor of the multiple inheritance trick and the `void*` +arguments trick is probably Richard Smith, who has posted +https://llvm.org/bugs/attachment.cgi?id=8825[two] +https://llvm.org/bugs/attachment.cgi?id=8838[examples] in response to +https://llvm.org/bugs/show_bug.cgi?id=13263[a Clang bug report]. + +## Vectors, sets, and maps + +<>, I outlined a style of +metaprogramming that operated on type lists -- variadic class templates: +``` +template struct mp_list {}; +``` +Classic Lisp uses lists as its only data structure, but operating on a list is +usually linear in the number of its elements. + +In addition to `list`, the STL has `vector`, `set`, and `map`. `vector` +supports random access by index; `set` has efficient test for membership; `map` +associates keys with values and has efficient lookup based on key. + +Instead of introducing separate data structure such as `mp_vector`, `mp_set`, +`mp_map`, we'll keep our data in a list form, and attempt to provide efficient +algorithms for random access, membership testing, and lookup. + +## mp_contains + +Let's starts with sets. A set is just a list with unique elements. To obtain a +set from an arbitrary list, we'll need an algorithm that removes the +duplicates. Let's call it `mp_unique`: +[subs=+quotes] +``` +// mp_if + +template struct mp_if_c_impl; + +template struct mp_if_c_impl +{ + using type = T; +}; + +template struct mp_if_c_impl +{ + using type = E; +}; + +template + using mp_if_c = typename mp_if_c_impl::type; + +template + using mp_if = typename mp_if_c_impl::type; + +// mp_unique + +template struct mp_unique_impl; + +template using mp_unique = typename mp_unique_impl::type; + +template class L> struct mp_unique_impl> +{ + using type = L<>; +}; + +template class L, class T1, class... T> + struct mp_unique_impl> +{ + using _rest = mp_unique>; + using type = mp_if<**mp_contains**<_rest, T1>, _rest, mp_push_front<_rest, T1>>; +}; +``` +For membership testing, we've introduced an algorithm `mp_contains` that +returns `true` when `L` contains `V`. The straightforward recursive +implementation of `mp_contains` is: +``` +template struct mp_contains_impl; + +template using mp_contains = typename mp_contains_impl::type; + +template class L, class V> struct mp_contains_impl, V> +{ + using type = std::false_type; +}; + +template class L, class... T, class V> + struct mp_contains_impl, V> +{ + using type = std::true_type; +}; + +template class L, class T1, class... T, class V> + struct mp_contains_impl, V>: mp_contains_impl, V> +{ +}; +``` +Note that `mp_unique` makes `N` calls to `mp_contains`, where `N` is the +length of the list `L`. This means that `mp_contains` needs to be as fast as +possible, which the above implementation, well, isn't. + +Here are the compile times in seconds for invoking `mp_unique` on a list with +`N` (distinct) elements: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, recursive |2.1 |DNF |||||| + +|clang$$++$$ 3.5.1, recursive |0.9 |4.5 |13.2 |30.2 |DNF ||| + +|g$$++$$ 4.9.2, recursive |0.7 |3.6 |10.4 |23.2 |DNF ||| +|=== +(Tests done under Windows/Cygwin. All compilers are 32 bit. No optimizations. +DNF stands for "did not finish", which usually means that the compiler ran out +of heap space or crashed.) + +We clearly need a better alternative. + +I ended the previous article with an implementation of `mp_contains` that +relied on `mp_count`, which in turn relied on `mp_plus`. Let's see how it +fares: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, mp_count/mp_plus |1.1 |9.8 |50.5 |DNF |||| + +|clang$$++$$ 3.5.1, mp_count/mp_plus |0.5 |1.4 |3.1 |6.1 |DNF ||| + +|g$$++$$ 4.9.2, mp_count/mp_plus |0.5 |1.3 |2.9 |5.8 |9.7 |15.6 |22.4 |32.3 +|=== +Not _that_ bad, at least if your compiler happens to be `g$$++$$`. Still, there +ought to be room for improvement here. + +To do better, we have to somehow leverage the language features, such as pack +expansion, to do more of the work for us. For inspiration, let's turn to +section 14.5.3 paragraph 4 of the {cpp}11 standard, which explains that pack +expansions can occur in the following contexts: + +* **In a function parameter pack (8.3.5); the pattern is the + __parameter-declaration__ without the ellipsis.** +* In a template parameter pack that is a pack expansion (14.1): +* **In an __initializer-list__ (8.5); the pattern is an + __initializer-clause__.** +* **In a __base-specifier-list__ (Clause 10); the pattern is a + __base-specifier__.** +* In a __mem-initializer-list__ (12.6.2); the pattern is a + __mem-initializer__. +* In a __template-argument-list__ (14.3); the pattern is a + __template-argument__. +* In a __dynamic-exception-specification__ (15.4); the pattern is a + __type-id__. +* In an __attribute-list__ (7.6.1); the pattern is an __attribute__. +* In an __alignment-specifier__ (7.6.2); the pattern is the + __alignment-specifier__ without the ellipsis. +* In a __capture-list__ (5.1.2); the pattern is a __capture__. +* In a `sizeof$$...$$` expression (5.3.3); the pattern is an __identifier__. + +The **emphasis** is mine and indicates possible leads. + +Our first option is to expand the parameter pack into arguments for a function +call. Since we're interested in operations that occur at compile time, calling +a function may not appear useful; but {cpp}11 functions can be `constexpr`, and +`constexpr` function "calls" do occur at compile time. + +Recall our `mp_count`: +``` +template struct mp_count_impl; + +template class L, class... T, class V> + struct mp_count_impl, V> +{ + using type = mp_plus...>; +}; + +template using mp_count = typename mp_count_impl::type; +``` +Instead of using the template alias `mp_plus` to sum the `is_same` expressions, +we can use a `constexpr` function: +``` +constexpr std::size_t cx_plus() +{ + return 0; +} + +template constexpr std::size_t cx_plus(T1 t1, T... t) +{ + return t1 + cx_plus(t...); +} + +// mp_size_t + +template using mp_size_t = std::integral_constant; + +// mp_count + +template struct mp_count_impl; + +template class L, class... T, class V> + struct mp_count_impl, V> +{ + using type = mp_size_t::value...)>; +}; + +template using mp_count = typename mp_count_impl::type; +``` +with the following results: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|clang$$++$$ 3.5.1, mp_count/cx_plus |0.4 |1.1 |2.5 |5.0 |DNF ||| + +|g$$++$$ 4.9.2, mp_count/cx_plus |0.4 |0.9 |1.7 |2.9 |4.7 |6.7 |9.2 |11.8 +|=== +We've improved the times, but lost VC$$++$$ 2013 due to its not implementing +`constexpr`. + +Let's try pack expansion into an __initializer-list__. Instead of passing the +`is_same` expressions to a function, we can build a constant array out of them, +then sum the array with a `constexpr` function: +``` +constexpr std::size_t cx_plus2(bool const * first, bool const * last) +{ + return first == last? 0: *first + cx_plus2(first + 1, last); +} + +// mp_count + +template struct mp_count_impl; + +template class L, class... T, class V> + struct mp_count_impl, V> +{ + static constexpr bool _v[] = { std::is_same::value... }; + using type = mp_size_t; +}; + +template using mp_count = typename mp_count_impl::type; +``` +This is a neat trick, but is it fast? +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|clang$$++$$ 3.5.1, mp_count/cx_plus2 |0.4 |0.9 |1.8 |DNF |||| + +|g$$++$$ 4.9.2, mp_count/cx_plus2 |0.4 |0.9 |1.9 |3.4 |5.4 |7.8 |11.0 |14.7 +|=== +That's a bit disappointing. Let's see what can we do with expanding a parameter +pack into a base-specifier-list. We would be able to define a class that +derives from every element of the pack: +``` +struct U: T... {}; +``` +We can then use `std::is_base_of` to test whether a type `V` is a base of +`U`, that is, whether it's one of the elements of the parameter pack. Which is +exactly what we need. + +Arbitrary types such as `void`, `int`, or `void(int)` can't be used as base +classes, but we'll wrap the types in an empty class template, which we'll call +`mp_identity`. +``` +template struct mp_identity +{ + using type = T; +}; + +template struct mp_contains_impl; + +template using mp_contains = typename mp_contains_impl::type; + +template class L, class... T, class V> + struct mp_contains_impl, V> +{ + struct U: mp_identity... {}; + using type = std::is_base_of, U>; +}; +``` +Performance? +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, is_base_of |0.3 |0.6 |1.3 |2.5 |DNF ||| + +|clang$$++$$ 3.5.1, is_base_of |0.3 |0.4 |0.6 |0.8 |DNF ||| + +|g$$++$$ 4.9.2, is_base_of |0.3 |0.4 |0.6 |0.9 |1.3 |1.7 |2.3 |3.0 +|=== +This implementation is a clear winner. + +In fairness, we ought to note that the first four implementations of +`mp_contains` do not rely on the list elements being unique. This makes +`mp_contains` an algorithm that supports arbitrary lists, not just sets. + +The `is_base_of` implementation, however, does not support lists that contain +duplicates, because it's not possible to inherit directly from the same type +twice. So it does not implement the general `mp_contains`, but something that +should probably be named `mp_set_contains`. + +We can avoid the "no duplicates" requirement by modifying the implementation to +inherit from `mp_identity` indirectly, via an intermediate base class: +[subs=+macros] +``` +// indirect_inherit + +template struct inherit_second: T {}; + +template struct indirect_inherit_impl; + +template class L, class... T, std::size_t... J> + struct indirect_inherit_impl, http://en.cppreference.com/w/cpp/utility/integer_sequence[integer_sequence]>: + inherit_second>... {}; + +template using indirect_inherit = + indirect_inherit_impl::value>>; + +// mp_contains + +template struct mp_contains_impl +{ + using U = indirect_inherit; + using type = std::is_base_of, U>; +}; + +template using mp_contains = typename mp_contains_impl::type; +``` +This, however, pretty much nullifies the spectacular performance gains we've +observed with the original `is_base_of`-based implementation: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, recursive |2.1 |DNF |||||| + +|VC$$++$$ 2013, mp_count/mp_plus |1.1 |9.8 |50.5 |DNF |||| + +|VC$$++$$ 2013, is_base_of |0.3 |0.6 |1.3 |2.5 |DNF ||| + +|VC$$++$$ 2013, is_base_of/indirect |1.0 |9.3 |49.5 |153.8 |DNF ||| +|=== +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|clang$$++$$ 3.5.1, recursive |0.9 |4.5 |13.2 |30.2 |DNF ||| + +|clang$$++$$ 3.5.1, mp_count/mp_plus |0.5 |1.4 |3.1 |6.1 |DNF ||| + +|clang$$++$$ 3.5.1, mp_count/cx_plus |0.4 |1.1 |2.5 |5.0 |DNF ||| + +|clang$$++$$ 3.5.1, mp_count/cx_plus2 |0.4 |0.9 |1.8 |DNF |||| + +|clang$$++$$ 3.5.1, is_base_of |0.3 |0.4 |0.6 |0.8 |DNF ||| + +|clang$$++$$ 3.5.1, is_base_of/indirect |0.4 |0.9 |1.6 |2.5 |DNF ||| +|=== +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|g$$++$$ 4.9.2, recursive |0.7 |3.6 |10.4 |23.2 |DNF ||| + +|g$$++$$ 4.9.2, mp_count/mp_plus |0.5 |1.3 |2.9 |5.8 |9.7 |15.6 |22.4 |32.3 + +|g$$++$$ 4.9.2, mp_count/cx_plus |0.4 |0.9 |1.7 |2.9 |4.7 |6.7 |9.2 |11.8 + +|g$$++$$ 4.9.2, mp_count/cx_plus2 |0.4 |0.9 |1.9 |3.4 |5.4 |7.8 |11.0 |14.7 + +|g$$++$$ 4.9.2, is_base_of |0.3 |0.4 |0.6 |0.9 |1.3 |1.7 |2.3 |3.0 + +|g$$++$$ 4.9.2, is_base_of/indirect |0.5 |1.1 |2.3 |4.0 |6.6 |9.8 |13.6 |18.2 +|=== + +## mp_map_find + +A map, in the STL sense, is a data structure that associates keys with values +and can efficiently retrieve, given a key, its associated value. For our +purposes, a map will be any list of lists for which the inner lists have at +least one element, the key; the rest of the elements we'll consider to be the +associated value. For example, the list +``` +[[A, B], [C, D, E], [F], [G, H]] +``` +is a map with keys `A`, `C`, `F`, and `G`, with associated values `[B]`, +`[D, E]`, `[]`, and `[H]`, respectively. We'll require unique keys, for reasons +that'll become evident later. + +I'll show two other examples of maps, this time using real {cpp} code: +``` +using Map = mp_list, mp_list, mp_list>; +``` +``` +using Map2 = std::tuple, std::pair>; +``` +The Lisp name of the algorithm that performs retrieval based on key is `ASSOC`, +but I'll call it `mp_map_find`. `mp_map_find` returns the element of `M` +whose first element is `K`. For example, `mp_map_find` would return +`std::pair`. If there's no such key, it returns `void`. + +There's almost no need to implement and benchmark the recursive version of +`mp_map_find` -- we can be pretty sure it will perform horribly. Still, +``` +template struct mp_map_find_impl; + +template using mp_map_find = typename mp_map_find_impl::type; + +template class M, class K> struct mp_map_find_impl, K> +{ + using type = void; +}; + +template class M, class T1, class... T, class K> + struct mp_map_find_impl, K> +{ + using type = mp_if, K>, T1, mp_map_find, K>>; +}; +``` +The compile time, in seconds, for `N` lookups into a map of size `N`, is as +follows: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, recursive |38.2 |DNF |||||| + +|clang$$++$$ 3.5.1, recursive |2.5 |13.7 |DNF ||||| + +|g$$++$$ 4.9.2, recursive |1.9 |10.2 |28.8 |DNF |||| +|=== +I told you there was no point. + +But, I hear some of you say, you're evaluating the else branch even if the +condition is true, and that's horribly inefficient! + +Well, this would only improve the performance by a factor of approximately two +on average, and only if the element is present, but fine, let's try it. The +element happens to be present in the benchmark, so let's see. +``` +// mp_eval_if + +template class E, class... A> + struct mp_eval_if_c_impl; + +template class E, class... A> + struct mp_eval_if_c_impl +{ + using type = T; +}; + +template class E, class... A> + struct mp_eval_if_c_impl +{ + using type = E; +}; + +template class E, class... A> + using mp_eval_if_c = typename mp_eval_if_c_impl::type; + +template class E, class... A> + using mp_eval_if = typename mp_eval_if_c_impl::type; + +// mp_map_find + +template struct mp_map_find_impl; + +template using mp_map_find = typename mp_map_find_impl::type; + +template class M, class K> struct mp_map_find_impl, K> +{ + using type = void; +}; + +template class M, class T1, class... T, class K> + struct mp_map_find_impl, K> +{ + using type = mp_eval_if, K>, T1, mp_map_find, M, K>; +}; +``` +There you go: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, recursive |15.6 |DNF |||||| + +|clang$$++$$ 3.5.1, recursive |1.8 |9.5 |DNF ||||| + +|g$$++$$ 4.9.2, recursive |1.4 |7.0 |19.7 |DNF |||| +|=== +I told you there was no point. + +Point or no, to establish that the recursive implementation is inefficient is +not the same as to come up with an efficient one. There are two things that +make the `mp_contains` techniques inapplicable to our present case: first, +`mp_contains` only had to return true or false, whereas `mp_map_find` returns a +type, and second, in `mp_contains` we knew the exact type of the element for +which we were looking, whereas here, we only know its `mp_front`. + +Fortunately, there does exist a language feature that can solve both: {cpp} can +deduce the template parameters of base classes when passed a derived class. In +this example, +``` +struct K1 {}; +struct V1 {}; + +struct X: std::pair {}; + +template void f(std::pair const & p); + +int main() +{ + f(X()); +} +``` +the call to `f(X())` deduces `A` as `K1` and `B` as `V1`. If we have more than +one `std::pair` base class, we can fix `A` to be `K1`: +``` +struct K1 {}; +struct V1 {}; + +struct K2 {}; +struct V2 {}; + +struct X: std::pair, std::pair {}; + +template void f(std::pair const & p); + +int main() +{ + f(X()); +} +``` +and `B` will be deduced as `V1`. + +We can retrieve the results of the deduction by returning the type we want: +``` +template std::pair f(std::pair const & p); +``` +and then using `decltype(f(X()))` to obtain this return type. + +What if `X` doesn't have a base of type `std::pair`? The deduction will +fail and we'll get an error that `f(X())` cannot be called. To avoid it, we can +add an overload of `f` that takes anything and returns `void`. But in this +case, what will happen if `X` has two bases of the form that match the first +`f` overload, such as for example `std::pair` and `std::pair`? + +The deduction will fail, the second overload will again be chosen and we'll get +`void`. This is why we require maps to have unique keys. + +Here's an implementation of `mp_map_find` based on this technique: +``` +template struct mp_map_find_impl; + +template + using mp_map_find = typename mp_map_find_impl::type; + +template class M, class... T, class K> + struct mp_map_find_impl, K> +{ + struct U: mp_identity... {}; + + template class L, class... U> + static mp_identity> + f( mp_identity>* ); + + static mp_identity f( ... ); + + using V = decltype( f((U*)0) ); + + using type = typename V::type; +}; +``` +and its corresponding compile times: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, deduction |0.3 |0.7 |1.8 |3.6 |6.4 |10.4 |16.2 |DNF + +|clang$$++$$ 3.5.1, deduction |0.3 |0.4 |0.6 |0.9 |1.2 |1.6 |2.2 |2.7 + +|g$$++$$ 4.9.2, deduction |0.3 |0.5 |0.9 |1.6 |2.3 |3.4 |4.7 |6.3 +|=== +This looks ready to ship. + +The implementation contains one inefficiency though. If we evaluate +`mp_map_find`, then `mp_map_find`, the two nested `U` types are +the same as they only depend on `M`, but the compiler doesn't know that and +will instantiate each one separately. We should move this type outside +`mp_map_find_impl` so that it can be reused: +[subs=+quotes] +``` +template struct **mp_inherit**: T... {}; + +template struct mp_map_find_impl; + +template + using mp_map_find = typename mp_map_find_impl::type; + +template class M, class... T, class K> + struct mp_map_find_impl, K> +{ + **using U = mp_inherit...>;** + + template class L, class... U> + static mp_identity> + f( mp_identity>* ); + + static mp_identity f( ... ); + + using V = decltype( f((U*)0) ); + + using type = typename V::type; +}; +``` +(This same optimization, by the way, applies to our `is_base_of` implementation +of `mp_contains`.) + +The improvement in compile times on our benchmark is measurable: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, deduction+mp_inherit |0.3 |0.6 |1.4 |2.6 |4.5 |7.1 |10.7 |DNF + +|clang$$++$$ 3.5.1, deduction+mp_inherit |0.3 |0.4 |0.6 |0.8 |1.0 |1.4 |1.8 |2.2 + +|g$$++$$ 4.9.2, deduction+mp_inherit |0.3 |0.4 |0.6 |0.9 |1.3 |1.8 |2.3 |2.9 +|=== + +## mp_at + +With sets and maps covered, it's time to tackle vectors. Vectors for us are +just lists, to which we'll need to add the ability to efficiently access an +element based on its index. The customary name for this accessor is +`mp_at`, where `L` is a list and `I` is an `integral_constant` that +represents the index. We'll also follow the Boost.MPL convention and add +`mp_at_c`, where `I` is the index of type `size_t`. + +The recursive implementation of `mp_at` is: +``` +template struct mp_at_c_impl; + +template using mp_at_c = typename mp_at_c_impl::type; + +template using mp_at = typename mp_at_c_impl::type; + +template class L, class T1, class... T> + struct mp_at_c_impl, 0> +{ + using type = T1; +}; + +template class L, class T1, class... T, std::size_t I> + struct mp_at_c_impl, I> +{ + using type = mp_at_c, I-1>; +}; +``` +and the compile times for making `N` calls to `mp_at` with a list of size `N` +as the first argument are: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, recursive |3.6 |DNF |||||| + +|clang$$++$$ 3.5.1, recursive |1.0 |5.1 |15.3 |DNF |||| + +|g$$++$$ 4.9.2, recursive |0.9 |4.7 |14.2 |32.4 |DNF ||| +|=== +To improve upon this apalling result, we'll again exploit pack expansion into a +function call, but in a novel way. Let's suppose that we need to access the +fourth element (`I = 3`). We'll generate the function signature +``` +template W f( void*, void*, void*, W*, ... ); +``` +and then, given a list `L`, we'll evaluate the +expression +``` +decltype( f( (T1*)0, (T2*)0, (T3*)0, (T4*)0, (T5*)0, (T6*)0, (T7*)0 ) ) +``` +The three `void*` parameters will eat the first three elements, `W` will be +deduced as the fourth, and the ellipsis will take care of the rest. + +A working implementation based on this technique is shown below: +``` +// mp_repeat_c + +template struct mp_repeat_c_impl +{ + using _l1 = typename mp_repeat_c_impl::type; + using _l2 = typename mp_repeat_c_impl::type; + + using type = mp_append<_l1, _l1, _l2>; +}; + +template struct mp_repeat_c_impl<0, T...> +{ + using type = mp_list<>; +}; + +template struct mp_repeat_c_impl<1, T...> +{ + using type = mp_list; +}; + +template using mp_repeat_c = + typename mp_repeat_c_impl::type; + +// mp_at + +template struct mp_at_c_impl; + +template class L, class... T, + template class L2, class... U> + struct mp_at_c_impl, L2> +{ + template static W f( U*..., W*, ... ); + + using R = decltype( f( (mp_identity*)0 ... ) ); + + using type = typename R::type; +}; + +template using mp_at_c = + typename mp_at_c_impl>::type; + +template using mp_at = mp_at_c; +``` +and the compile times in the following table show it to be good enough for most +practical purposes. +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, void* |0.4 |1.1 |2.4 |4.7 |DNF ||| + +|clang$$++$$ 3.5.1, void* |0.4 |0.7 |1.2 |1.9 |2.7 |3.8 |5.0 |6.6 + +|g$$++$$ 4.9.2, void* |0.3 |0.5 |0.9 |1.3 |2.1 |3.0 |4.2 |5.5 +|=== +Are we done with `mp_at`, then? + +Let's try something else -- transform the input list `[T1, T2, T3]` into a map +`[[0, T1], [1, T2], [2, T3]]`, then use `mp_map_find` for the lookup: +[subs=+macros] +``` +// mp_map_from_list + +template struct mp_map_from_list_impl; + +template class L, class... T, std::size_t... J> + struct mp_map_from_list_impl, http://en.cppreference.com/w/cpp/utility/integer_sequence[integer_sequence]> +{ + using type = mp_list, T>...>; +}; + +template using mp_map_from_list = typename mp_map_from_list_impl::value>>::type; + +// mp_at + +template struct mp_at_c_impl +{ + using map = mp_map_from_list; + using type = mp_second>>; +}; + +template using mp_at_c = typename mp_at_c_impl::type; + +template using mp_at = typename mp_at_c_impl::type; +``` +At first sight, this looks ridiculous, but metaprogramming has its own rules. +Let's measure: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, map |0.3 |0.7 |1.5 |2.9 |5.0 |7.8 |11.9 |DNF + +|clang$$++$$ 3.5.1, map |0.3 |0.4 |0.6 |0.8 |1.1 |1.5 |1.8 |2.3 + +|g$$++$$ 4.9.2, map |0.3 |0.4 |0.7 |1.0 |1.4 |1.9 |2.5 |3.2 +|=== +Surprise, this is the best implementation. + +## mp_drop + +It turned out that we didn't need the `void*` trick for `mp_at`, but I'll show +an example where we do: `mp_drop`. `mp_drop` returns the list `L` without +its first `N` elements; or, in other words, it drops the first `N` elements +(presumably on the cutting room floor) and returns what's left. + +To implement `mp_drop`, we just need to change +``` +template W f( void*, void*, void*, W*, ... ); +``` +from above to return the rest of the elements, rather than just one: +``` +template mp_list f( void*, void*, void*, W*... ); +``` +Adding the necessary `mp_identity` seasoning produces the following working +implementation: +``` +template struct mp_drop_c_impl; + +template class L, class... T, + template class L2, class... U> + struct mp_drop_c_impl, L2> +{ + template static mp_identity> f( U*..., mp_identity*... ); + + using R = decltype( f( (mp_identity*)0 ... ) ); + + using type = typename R::type; +}; + +template using mp_drop_c = + typename mp_drop_c_impl>::type; + +template using mp_drop = mp_drop_c; +``` +I'll skip the recursive implementation and the performance comparison for this +one. We can pretty much tell who's going to win, and by how much. + +## mp_find_index + +The final algorithm that I'll bring to your attention is `mp_find_index`. +`mp_find_index` returns an integral constant of type `size_t` with a +value that is the index of the first occurence of `V` in `L`. If `V` is not in +`L`, the return value is the size of `L`. + +We'll start with the recursive implementation, as usual: +``` +template struct mp_find_index_impl; + +template using mp_find_index = typename mp_find_index_impl::type; + +template class L, class V> struct mp_find_index_impl, V> +{ + using type = mp_size_t<0>; +}; + +template class L, class... T, class V> + struct mp_find_index_impl, V> +{ + using type = mp_size_t<0>; +}; + +template class L, class T1, class... T, class V> + struct mp_find_index_impl, V> +{ + using type = mp_size_t<1 + mp_find_index, V>::value>; +}; +``` +and will continue with the compile times for `N` calls to `mp_find_index` on a +list with `N` elements, as usual: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, recursive |3.5 |DNF |||||| + +|clang$$++$$ 3.5.1, recursive |1.1 |5.5 |DNF ||||| + +|g$$++$$ 4.9.2, recursive |0.8 |4.6 |13.6 |DNF |||| +|=== +What can we do here? + +Let's go back to `mp_contains` and look at the "mp_count/cx_plus2" +implementation which we rejected in favor of the inheritance-based one. It +built a `constexpr` array of booleans and summed them in a `constexpr` +function. We can do the same here, except instead of summing the array, we can +find the index of the first true value: +``` +template struct mp_find_index_impl; + +template using mp_find_index = typename mp_find_index_impl::type; + +template class L, class V> struct mp_find_index_impl, V> +{ + using type = mp_size_t<0>; +}; + +constexpr std::size_t cx_find_index( bool const * first, bool const * last ) +{ + return first == last || *first? 0: 1 + cx_find_index( first + 1, last ); +} + +template class L, class... T, class V> + struct mp_find_index_impl, V> +{ + static constexpr bool _v[] = { std::is_same::value... }; + + using type = mp_size_t< cx_find_index( _v, _v + sizeof...(T) ) >; +}; +``` +The performance of this version is: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|clang$$++$$ 3.5.1, constexpr |0.5 |1.3 |2.9 |DNF |||| + +|g$$++$$ 4.9.2, constexpr |0.5 |1.4 |3.1 |5.5 |8.7 |13.0 |18.0 |DNF +|=== +which, while not ideal, is significantly better than our recursive punching +bag. But if our compiler of choice is VC$$++$$ 2013, we can't use `constexpr`. + +We may attempt an implementation along the same lines, but with the `constexpr` +function replaced with ordinary metaprogramming: +``` +template struct mp_find_index_impl; + +template using mp_find_index = typename mp_find_index_impl::type; + +template class L, class V> struct mp_find_index_impl, V> +{ + using type = mp_size_t<0>; +}; + +template struct find_index_impl_; + +template<> struct find_index_impl_<> +{ + static const std::size_t value = 0; +}; + +template struct find_index_impl_ +{ + static const std::size_t value = B1? 0: 1 + find_index_impl_::value; +}; + +template + struct find_index_impl_ +{ + static const std::size_t value = B1? 0: B2? 1: B3? 2: B4? 3: B5? 4: + B6? 5: B7? 6: B8? 7: B9? 8: B10? 9: 10 + find_index_impl_::value; +}; + +template class L, class... T, class V> + struct mp_find_index_impl, V> +{ + using type = mp_size_t::value...>::value>; +}; +``` +This is still recursive, so we don't expect miracles, but it wouldn't hurt to +measure: +|=== +||N=100 |N=200 |N=300 |N=400 |N=500 |N=600 |N=700 |N=800 + +|VC$$++$$ 2013, bool... |4.7 |94.5 |488.3 |XFA |||| + +|clang$$++$$ 3.5.1, bool... |0.6 |2.2 |5.8 |12.0 |21.7 |35.2 |DNF | + +|g$$++$$ 4.9.2, bool... |0.6 |2.4 |6.5 |13.2 |23.8 |39.1 |59.0 |DNF +|=== +(where XFA stands for "experimenter fell asleep".) + +This is an interesting tradeoff for VC$$++$$ 2013 and Clang. On the one hand, +this implementation is slower; on the other, it doesn't crash the compiler as +easily. Which to prefer is a matter of taste and of stern evaluation of one's +needs to manipulate type lists of length 300. + +Note that once we have `mp_drop` and `mp_find_index`, we can derive the +`mp_find` algorithm, which returns the suffix of `L` starting with the +first occurence of `V`, if any, and an empty list otherwise, by using +`mp_drop>`. + +## Conclusion + +In this article, I have shown efficient algorithms that allow us to treat type +lists as sets, maps and vectors, demonstrating various {cpp}11 implementation +techniques in the process.