diff --git a/doc/index.html b/doc/index.html
index 3f5ce30..85a338f 100644
--- a/doc/index.html
+++ b/doc/index.html
@@ -198,8 +198,7 @@ application concerns.
-big_int32_t x;
+big_int32_t x;
 ... read into x from a file ...
@@ -241,8 +240,7 @@ generate exactly the same code for both.
-big_int32_t x;
+big_int32_t x;
 ... read into x from a file ...
@@ -253,8 +251,7 @@ for (int32_t i = 0; i < 1000000; ++i)
-int32_t x;
+int32_t x;
 ... read into x from a file ...
@@ -290,121 +287,91 @@ stores, multiple instructions are required.
 These tests were run against release builds on a circa 2012 4-core little endian X64 Intel Core i5-3570K CPU @ 3.40GHz under Windows 7.
-See speed_test.cpp,
-speed_test_functions.hpp,
-speed_test_functions.cpp, and
-Jamfile.v2 for the actual code and build. The timed functions are in a separate
-compilation unit to prevent being optimized away.
-
-Because the timings are anomalous, particularly for those high-lighted below
-in yellow, the generated code from the GNU compiler was studied in detail.
-Exactly the same code is being generated for by-value conversion functions,
-in-place conversion functions, and the endian types. Exactly the same code is
-being generated whether intrinsics are used or not for 32 and 64-bit tests.
+See loop_time_test.cpp and
+Jamfile.v2 for the actual code and build
+setup. (For GCC 4.7, there are no 16-bit intrinsics, so they are emulated by using 32-bit intrinsics.)
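To illustrate the parenthetical note about GCC 4.7, a 16-bit byte swap can be built on top of a 32-bit one. The following is only a sketch: `swap32` and `swap16_via_32` are hypothetical names introduced here, and this is not necessarily how the library emulates the missing 16-bit intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the four bytes of a 32-bit value. On GCC-compatible compilers
// the __builtin_bswap32 intrinsic is used; elsewhere, plain shifts and masks.
inline std::uint32_t swap32(std::uint32_t x)
{
#if defined(__GNUC__)
    return __builtin_bswap32(x);
#else
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

// Hypothetical helper: a 16-bit swap expressed with the 32-bit swap.
// The value is first moved into the upper half of a 32-bit word, so that
// after the 32-bit reversal the two swapped bytes sit in the low 16 bits.
inline std::uint16_t swap16_via_32(std::uint16_t x)
{
    return static_cast<std::uint16_t>(
        swap32(static_cast<std::uint32_t>(x) << 16));
}
```

For example, `swap16_via_32(0x1234)` shifts the input to `0x12340000`, reverses it to `0x00003412`, and returns the low half, `0x3412`.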
-
-Conclusions
-The decision to use endian types or endian conversion functions should be
- made based on application use cases, not assumptions about generated code
- efficiency. Modern optimizers generate the same code for either approach,
- and whether or not intrinsics are available.
-
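The two approaches being timed, an endian type versus an endian conversion function, can be sketched without the library as follows. Every name here (`load_big32`, `store_big32`, `big_int32_sketch`, and the two `accumulate_*` functions) is a hypothetical stand-in, not the library's API, and the loop count is an assumption chosen to keep the sum within `int32_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical conversion functions: big endian bytes <-> native value.
inline std::int32_t load_big32(const unsigned char* p)
{
    std::uint32_t u = (static_cast<std::uint32_t>(p[0]) << 24)
                    | (static_cast<std::uint32_t>(p[1]) << 16)
                    | (static_cast<std::uint32_t>(p[2]) << 8)
                    |  static_cast<std::uint32_t>(p[3]);
    return static_cast<std::int32_t>(u);
}

inline void store_big32(unsigned char* p, std::int32_t v)
{
    std::uint32_t u = static_cast<std::uint32_t>(v);
    p[0] = static_cast<unsigned char>(u >> 24);
    p[1] = static_cast<unsigned char>(u >> 16);
    p[2] = static_cast<unsigned char>(u >> 8);
    p[3] = static_cast<unsigned char>(u);
}

// Hypothetical stand-in for an endian type: the value is always held in
// big endian byte order and converted on every read and write.
class big_int32_sketch
{
public:
    big_int32_sketch(std::int32_t v = 0) { store_big32(bytes_, v); }
    big_int32_sketch& operator=(std::int32_t v)
    {
        store_big32(bytes_, v);
        return *this;
    }
    operator std::int32_t() const { return load_big32(bytes_); }
    const unsigned char* data() const { return bytes_; }
private:
    unsigned char bytes_[4];
};

// Approach 1: operate on the endian type directly.
// Each iteration reads, adds, and writes back through the conversions.
inline void accumulate_with_type(big_int32_sketch& x, std::int32_t n)
{
    for (std::int32_t i = 0; i < n; ++i)
        x = x + i;
}

// Approach 2: convert once, work on a native integer, convert back once.
inline void accumulate_with_functions(unsigned char* stored, std::int32_t n)
{
    std::int32_t x = load_big32(stored);
    for (std::int32_t i = 0; i < n; ++i)
        x += i;
    store_big32(stored, x);
}
```

Both functions leave the same four bytes in storage; they differ only in where the conversions are written in the source, which is the distinction the timing columns measure.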
GNU g++ version 4.7.0
Iterations: 1,000,000,000, Intrinsics: __builtin_bswap16, etc.
GNU C++ version 4.7.0
Iterations: 1000000000, Intrinsics: __builtin_bswap16, etc.
Test Case | -int arg |
-int value(arg) |
-int in place(arg) |
-Endian arg |
+Endian type |
+Endian conversion function |
16-bit aligned big endian | 2.71 s | -2.42 s | 2.42 s | 2.68 s | ||
16-bit aligned little endian | 2.42 s | -2.40 s | 2.68 s | 2.45 s | ||
32-bit aligned big endian | 2.68 s | -2.70 s | 2.70 s | 2.68 s | ||
32-bit aligned little endian | 2.68 s | -2.68 s | 2.65 s | 2.68 s | ||
64-bit aligned big endian | 2.96 s | -2.95 s | -2.95 s | -2.95 s | ||
64-bit aligned little endian | 2.42 s | -2.40 s | 2.70 s | 2.42 s | ||
Iterations: 1,000,000,000, Intrinsics: no byte swap intrinsics
16-bit aligned big endian | 1.37 s | 0.81 s | ||||
16-bit aligned little endian | 0.83 s | 0.81 s | ||||
16-bit unaligned big endian | 1.09 s | 0.83 s | ||||
16-bit unaligned little endian | 1.09 s | 0.81 s | ||||
32-bit aligned big endian | 0.98 s | 0.27 s | ||||
32-bit aligned little endian | 0.28 s | 0.27 s | ||||
32-bit unaligned big endian | 3.82 s | 0.27 s | ||||
32-bit unaligned little endian | 3.82 s | 0.27 s | ||||
64-bit aligned big endian | 1.65 s | 0.41 s | ||||
64-bit aligned little endian | 0.41 s | 0.41 s | ||||
64-bit unaligned big endian | 17.53 s | 0.41 s | ||||
64-bit unaligned little endian | 17.52 s | 0.41 s | ||||
Iterations: 1000000000, Intrinsics: no byte swap intrinsics
Test Case | -int arg |
-int value(arg) |
-int in place(arg) |
-Endian arg |
+Endian type |
+Endian conversion function |
16-bit aligned big endian | 2.71 s | -2.42 s | 2.42 s | 2.68 s | ||
16-bit aligned little endian | 2.42 s | -2.40 s | 2.68 s | 2.42 s | ||
32-bit aligned big endian | 2.68 s | -2.70 s | 2.67 s | 2.70 s | ||
32-bit aligned little endian | 2.68 s | -2.67 s | 2.70 s | 2.67 s | ||
64-bit aligned big endian | 2.96 s | -2.95 s | -2.95 s | -2.93 s | ||
64-bit aligned little endian | 2.42 s | -2.42 s | 2.67 s | 2.40 s | ||
16-bit aligned big endian | 1.95 s | 0.81 s | ||||
16-bit aligned little endian | 0.83 s | 0.81 s | ||||
16-bit unaligned big endian | 1.19 s | 0.81 s | ||||
16-bit unaligned little endian | 1.20 s | 0.81 s | ||||
32-bit aligned big endian | 0.97 s | 0.28 s | ||||
32-bit aligned little endian | 0.27 s | 0.28 s | ||||
32-bit unaligned big endian | 4.10 s | 0.27 s | ||||
32-bit unaligned little endian | 4.10 s | 0.27 s | ||||
64-bit aligned big endian | 1.64 s | 0.42 s | ||||
64-bit aligned little endian | 0.41 s | 0.41 s | ||||
64-bit unaligned big endian | 17.52 s | 0.42 s | ||||
64-bit unaligned little endian | 17.52 s | 0.41 s |
Microsoft Visual C++ version 11.0
Iterations: 1,000,000,000, Intrinsics: cstdlib _byteswap_ushort, etc.
Iterations: 1000000000, Intrinsics: cstdlib _byteswap_ushort, etc.
Test Case | -int arg |
-int value(arg) |
-int in place(arg) |
-Endian arg |
+Endian type |
+Endian conversion function |
16-bit aligned big endian | 1.90 s | -1.87 s | 1.89 s | 1.87 s | ||
16-bit aligned little endian | 1.89 s | -1.87 s | 1.89 s | 1.87 s | ||
32-bit aligned big endian | 1.89 s | -1.87 s | 1.89 s | 1.87 s | ||
32-bit aligned little endian | 1.89 s | -1.87 s | 1.87 s | 1.89 s | ||
64-bit aligned big endian | 1.87 s | -1.89 s | 1.87 s | 1.89 s | ||
64-bit aligned little endian | 1.87 s | -1.87 s | 1.87 s | 1.89 s | ||
Iterations: 1,000,000,000, Intrinsics: no byte swap intrinsics
16-bit aligned big endian | 2.18 s | 0.83 s | ||||
16-bit aligned little endian | 0.81 s | 0.83 s | ||||
16-bit unaligned big endian | 1.64 s | 0.83 s | ||||
16-bit unaligned little endian | 1.64 s | 0.83 s | ||||
32-bit aligned big endian | 0.83 s | 0.81 s | ||||
32-bit aligned little endian | 0.83 s | 0.81 s | ||||
32-bit unaligned big endian | 3.01 s | 0.83 s | ||||
32-bit unaligned little endian | 3.01 s | 0.81 s | ||||
64-bit aligned big endian | 1.09 s | 1.05 s | ||||
64-bit aligned little endian | 0.83 s | 1.03 s | ||||
64-bit unaligned big endian | 12.64 s | 1.01 s | ||||
64-bit unaligned little endian | 8.41 s | 0.83 s | ||||
Iterations: 1000000000, Intrinsics: no byte swap intrinsics
Test Case | -int arg |
-int value(arg) |
-int in place(arg) |
-Endian arg |
+Endian type |
+Endian conversion function |
16-bit aligned big endian | 1.90 s | -1.89 s | 1.87 s | 1.87 s | ||
16-bit aligned little endian | 1.89 s | -1.87 s | 1.89 s | 1.87 s | ||
32-bit aligned big endian | 1.89 s | -1.87 s | 1.87 s | 1.89 s | ||
32-bit aligned little endian | 1.87 s | -1.89 s | 1.87 s | 1.89 s | ||
64-bit aligned big endian | 2.32 s | -2.46 s | -2.45 s | -2.34 s | ||
64-bit aligned little endian | 1.87 s | -1.87 s | 1.89 s | 1.87 s | ||
16-bit aligned big endian | 0.84 s | 0.81 s | ||||
16-bit aligned little endian | 0.83 s | 0.81 s | ||||
16-bit unaligned big endian | 1.65 s | 0.81 s | ||||
16-bit unaligned little endian | 1.65 s | 0.83 s | ||||
32-bit aligned big endian | 3.46 s | 0.83 s | ||||
32-bit aligned little endian | 0.81 s | 0.83 s | ||||
32-bit unaligned big endian | 3.01 s | 0.81 s | ||||
32-bit unaligned little endian | 3.01 s | 0.81 s | ||||
64-bit aligned big endian | 10.50 s | 0.83 s | ||||
64-bit aligned little endian | 0.83 s | 0.97 s | ||||
64-bit unaligned big endian | 12.62 s | 0.81 s | ||||
64-bit unaligned little endian | 8.42 s | 0.81 s |
 Last revised:
-26 May, 2013
+28 May, 2013
 © Copyright Beman Dawes, 2011, 2013
 Distributed under the Boost Software License, Version 1.0. See www.boost.org/LICENSE_1_0.txt