Convert Endian documentation to asciidoc

2019-03-24 10:28:32 -04:00
parent 6b65a4b6ed
commit 0427a95459
5 changed files with 508 additions and 0 deletions
--- a/doc/.gitignore
+++ b/doc/.gitignore
@@ -0,0 +1,2 @@
+/html/
+/pdf/
--- a/doc/Jamfile
+++ b/doc/Jamfile
@@ -0,0 +1,29 @@
+# Copyright 2019 Glen Joseph Fernandes
+# (glenjofe@gmail.com)
+#
+# Distributed under the Boost Software License, Version 1.0.
+# (http://www.boost.org/LICENSE_1_0.txt)
+
+project doc/endian ;
+
+import asciidoctor ;
+
+html endian.html : endian.adoc ;
+
+install html_ : endian.html : <location>html ;
+
+pdf endian.pdf : endian.adoc ;
+
+explicit endian.pdf ;
+
+install pdf_ : endian.pdf : <location>pdf ;
+
+explicit pdf_ ;
+
+alias boostdoc ;
+
+explicit boostdoc ;
+
+alias boostrelease : html_ ;
+
+explicit boostrelease ;
--- a/doc/endian-docinfo-footer.html
+++ b/doc/endian-docinfo-footer.html
@@ -0,0 +1,4 @@
+<style>
+*:not(pre)>code { background: none; color: #600000; }
+table tr.even, table tr.alt, table tr:nth-of-type(even) { background: none; }
+</style>
--- a/doc/endian.adoc
+++ b/doc/endian.adoc
@@ -0,0 +1,30 @@
+////
+Copyright 2019 Glen Joseph Fernandes
+(glenjofe@gmail.com)
+
+Distributed under the Boost Software License, Version 1.0.
+(http://www.boost.org/LICENSE_1_0.txt)
+////
+
+# Boost.Endian: The Boost Endian Library
+Beman Dawes
+:toc: left
+:toclevels: 2
+:idprefix:
+:listing-caption: Code Example
+:docinfo: private-footer
+
+:leveloffset: +1
+
+include::endian/overview.adoc[]
+
+:leveloffset: -1
+
+[appendix]
+## Copyright and License
+
+This documentation is
+
+* Copyright 2011-2016 Beman Dawes
+
+and is distributed under the http://www.boost.org/LICENSE_1_0.txt[Boost Software License, Version 1.0].
--- a/doc/endian/overview.adoc
+++ b/doc/endian/overview.adoc
@@ -0,0 +1,443 @@
+////
+Copyright 2011-2016 Beman Dawes
+
+Distributed under the Boost Software License, Version 1.0.
+(http://www.boost.org/LICENSE_1_0.txt)
+////
+
+[#overview]
+# Overview
+
+## Abstract
+
+Boost.Endian provides facilities to manipulate the
+<<overview_endianness,endianness>> of integers and user-defined types.
+
+* Three approaches to endianness are supported. Each has a long history of
+successful use, and each approach has use cases where it is preferred over the
+other approaches.
+* Primary uses:
+** Data portability. The Endian library supports binary data exchange, via
+either external media or network transmission, regardless of platform
+endianness.
+** Program portability. POSIX-based and Windows-based operating systems
+traditionally supply libraries with non-portable functions to perform endian
+conversion. There are at least four incompatible sets of functions in common
+use. The Endian library is portable across all {cpp} platforms.
+* Secondary use: Minimizing data size via sizes and/or alignments not supported
+by the standard {cpp} integer types.
+
+[#overview_endianness]
+## Introduction to endianness
+
+Consider the following code:
+
+```
+int16_t i = 0x0102;
+FILE * file = fopen("test.bin", "wb"); // binary file!
+fwrite(&amp;i, sizeof(int16_t), 1, file);
+fclose(file);
+```
+
+On OS X, Linux, or Windows systems with an Intel CPU, a hex dump of the
+"test.bin" output file produces:
+
+```
+0201
+```
+
+On OS X systems with a PowerPC CPU, or Solaris systems with a SPARC CPU, a hex
+dump of the "test.bin" output file produces:
+
+```
+0102
+```
+
+What's happening here is that Intel CPUs order the bytes of an integer with the
+least-significant byte first, while SPARC CPUs place the most-significant byte
+first. Some CPUs, such as the PowerPC, allow the operating system to choose
+which ordering applies.
+
+Most-significant-byte-first ordering is traditionally called "big endian"
+ordering and least-significant-byte-first is traditionally called
+"little-endian" ordering. The names are derived from
+http://en.wikipedia.org/wiki/Jonathan_Swift[Jonathan Swift]'s satirical novel
+_http://en.wikipedia.org/wiki/Gulliver's_Travels[Gulliver's Travels]_, where
+rival kingdoms opened their soft-boiled eggs at different ends.
+
+See Wikipedia's http://en.wikipedia.org/wiki/Endianness[Endianness] article for
+an extensive discussion of endianness.
+
+Programmers can usually ignore endianness, except when reading a core  dump on
+little-endian systems. But programmers  have to deal with endianness when
+exchanging binary integers and binary floating point values between computer
+systems with differing endianness, whether by physical file transfer or over a
+network. And programmers may also want to use the library when minimizing either
+internal or external data sizes is advantageous.
+
+[#overview_introduction]
+## Introduction to the Boost.Endian library
+
+Boost.Endian provides three different approaches to dealing with endianness. All
+three approaches support integers and user-define types (UDTs).
+
+Each approach has a long history of successful use, and each approach has use
+cases where it is preferred to the other approaches.
+
+<<conversion,Endian conversion functions>>::
+The application uses the built-in integer types to hold values, and calls the
+provided conversion functions to convert byte ordering as needed. Both mutating
+and non-mutating conversions are supplied, and each comes in unconditional and
+conditional variants.
+
+<<buffers, Endian buffer types>>::
+The application uses the provided endian buffer types to hold values, and
+explicitly converts to and from the built-in integer types. Buffer sizes of 8,
+16, 24, 32, 40, 48, 56, and 64 bits (i.e. 1, 2, 3, 4, 5, 6, 7, and 8 bytes) are
+provided. Unaligned integer buffer types are provided for all sizes, and aligned
+buffer types are provided for 16, 32, and 64-bit sizes. The provided specific
+types are typedefs for a generic class template that may be used directly for
+less common use cases.
+
+<<arithmetic.html, Endian arithmetic types>>::
+The application uses the provided endian arithmetic types, which supply the same
+operations as the built-in {cpp} arithmetic types. All conversions are implicit.
+Arithmetic sizes of 8, 16, 24, 32, 40, 48, 56, and 64 bits (i.e. 1, 2, 3, 4, 5,
+6, 7, and 8 bytes) are provided. Unaligned integer types are provided for all
+sizes and aligned arithmetic types are provided for 16, 32, and 64-bit sizes.
+The provided specific types are typedefs for a generic class template that may
+be used directly in generic code of for less common use cases.
+
+Boost Endian is a header-only library. {cpp}11 features affecting interfaces,
+such as `noexcept`, are  used only if available. See
+<<overview_cpp03_support,{cpp}03 support for {cpp}11 features>> for details.
+
+## Choosing between conversion functions, buffer types, and arithmetic types
+
+This section has been moved to its own <<choosing,Choosing the Approach>> page.
+
+[#overview_intrinsic]
+## Built-in support for Intrinsics
+
+Most compilers, including GCC, Clang, and Visual {cpp}, supply  built-in support
+for byte swapping intrinsics. The Endian library uses these intrinsics when
+available since they may result in smaller and faster generated code,
+particularly for optimized builds.
+
+Defining the macro `BOOST_ENDIAN_NO_INTRINSICS` will suppress use of the
+intrinsics. This is useful when a compiler has no intrinsic support or fails to
+locate the appropriate header, perhaps because it is an older release or has
+very limited supporting libraries.
+
+The macro `BOOST_ENDIAN_INTRINSIC_MSG` is defined as either
+`"no byte swap intrinsics"` or a string describing the particular set of
+intrinsics being used. This is useful for eliminating missing intrinsics as a
+source of performance issues.
+
+## Performance
+
+Consider this problem:
+
+### Example 1
+Add 100 to a big endian value in a file, then write the result to a file
+[%header,cols=2*]
+|===
+|Endian arithmetic type approach |Endian conversion function approach
+a|
+----
+big_int32_at x;
+
+... read into x from a file ...
+
+x += 100;
+
+... write x to a file ...
+----
+a|
+----
+int32_t x;
+
+... read into x from a file ...
+
+big_to_native_inplace(x);
+x += 100;
+native_to_big_inplace(x);
+
+... write x to a file ...
+----
+|===
+
+*There will be no performance difference between the two approaches in optimized
+builds, regardless of the native endianness of the machine.* That's because
+optimizing compilers will generate exactly the same code for each. That
+conclusion was confirmed by studying the generated assembly code for GCC and
+Visual {cpp}. Furthermore, time spent doing I/O will determine the speed of this
+application.
+
+Now consider a slightly different problem:
+
+### Example 2
+Add a million values to a big endian value in a file, then write the result to a
+file
+[%header,cols=2*]
+|===
+|Endian arithmetic type approach |Endian conversion function approach
+a|
+----
+big_int32_at x;
+
+... read into x from a file ...
+
+for (int32_t i = 0; i < 1000000; ++i)
+  x += i;
+
+... write x to a file ...
+----
+a|
+----
+int32_t x;
+
+... read into x from a file ...
+
+big_to_native_inplace(x);
+
+for (int32_t i = 0; i < 1000000; ++i)
+  x += i;
+
+native_to_big_inplace(x);
+
+... write x to a file ...
+----
+|===
+
+With the Endian arithmetic approach, on little endian platforms an implicit
+conversion from and then back to big endian is done inside the loop. With the
+Endian conversion function approach, the user has ensured the conversions are
+done outside the loop, so the code may run more quickly on little endian
+platforms.
+
+### Timings
+
+These tests were run against release builds on a circa 2012 4-core little endian
+X64 Intel Core i5-3570K CPU @ 3.40GHz under Windows 7.
+
+CAUTION: The Windows CPU timer has very high granularity. Repeated runs of the
+same tests often yield considerably different results.
+
+See `test/loop_time_test.cpp` for the actual code and `benchmark/Jamfile.v2` for
+the build setup.
+
+#### GNU C++ version 4.8.2 on Linux virtual machine
+Iterations: 10'000'000'000, Intrinsics: __builtin_bswap16, etc.
+[%header,cols=3*]
+|===
+|Test Case |Endian arithmetic type |Endian conversion function
+|16-bit aligned big endian |8.46 s |5.28 s
+|16-bit aligned little endian |5.28 s |5.22 s
+|32-bit aligned big endian |8.40 s |2.11 s
+|32-bit aligned little endian |2.11 s |2.10 s
+|64-bit aligned big endian |14.02 s |3.10 s
+|64-bit aligned little endian |3.00 s |3.03 s
+|===
+
+#### Microsoft Visual C++ version 14.0
+Iterations: 10'000'000'000, Intrinsics: cstdlib _byteswap_ushort, etc.
+[%header,cols=3*]
+|===
+|Test Case |Endian arithmetic type |Endian conversion function
+|16-bit aligned big endian |8.27 s |5.26 s
+|16-bit aligned little endian |5.29 s |5.32 s
+|32-bit aligned big endian |8.36 s |5.24 s
+|32-bit aligned little endian |5.24 s |5.24 s
+|64-bit aligned big endian |13.65 s |3.34 s
+|64-bit aligned little endian |3.35 s |2.73 s
+|===
+
+## Overall FAQ
+
+Is the implementation header only?::
+Yes.
+
+Are {cpp}03 compilers supported?::
+Yes.
+
+Does the implementation use compiler intrinsic built-in byte swapping?::
+Yes, if available. See <<overview_intrinsic,Intrinsic built-in support>>.
+
+Why bother with endianness?::
+Binary data portability is the primary use case.
+
+Does endianness have any uses outside of portable binary file or network I/O formats?::
+Using the unaligned integer types with a size tailored to the application's
+needs is a minor secondary use that saves internal or external memory space. For
+example, using `big_int40_buf_t` or `big_int40_t` in a large array saves a lot
+of space compared to one of the 64-bit types.
+
+Why bother with binary I/O? Why not just use {cpp} Standard Library stream inserters and extractors?::
+* Data interchange formats often specify binary integer data. Binary integer
+data is smaller and therefore I/O is faster and file sizes are smaller. Transfer
+between systems is less expensive.
+* Furthermore, binary integer data is of fixed size, and so fixed-size disk
+records are possible without padding, easing sorting and allowing random access.
+* Disadvantages, such as the inability to use text utilities on the resulting
+files, limit usefulness to applications where the binary I/O advantages are
+paramount.
+
+Which is better, big-endian or little-endian?::
+Big-endian tends to be preferred in a networking environment and is a bit more
+of an industry standard, but little-endian may be preferred for applications
+that run primarily on x86, x86-64, and other little-endian CPU's. The
+http://en.wikipedia.org/wiki/Endian[Wikipedia] article gives more pros and cons.
+
+Why are only big and little native endianness supported?::
+These are the only endian schemes that have any practical value today. PDP-11
+and the other middle endian approaches are interesting  curiosities but have no
+relevance for today's {cpp} developers. The same is true for architectures that
+allow runtime endianness switching. The
+<<conversion_native_order_specification,specification for native ordering>> has
+been carefully crafted to allow support for such orderings in the future, should
+the need arise. Thanks to Howard Hinnant for suggesting this.
+
+Why do both the buffer and arithmetic types exist?::
+Conversions in the buffer types are explicit. Conversions in the arithmetic
+types are implicit. This fundamental difference is a deliberate design feature
+that would be lost if the inheritance hierarchy were collapsed.
+The original design provided only arithmetic types. Buffer types were requested
+during formal review by those wishing total control over when conversion occurs.
+They also felt that buffer types would be less likely to be misused by
+maintenance programmers not familiar with the implications of performing a lot
+of integer operations on the endian arithmetic integer types.
+
+What is gained by using the buffer types rather than always just using the arithmetic types?::
+Assurance that hidden conversions are not performed. This is of overriding
+importance to users concerned about achieving the ultimate in terms of speed.
+"Always just using the arithmetic types" is fine for other users. When the
+ultimate in speed needs to be ensured, the arithmetic types can be used in the
+same design patterns or idioms that would be used for buffer types, resulting in
+the same code being generated for either types.
+
+What are the limitations of integer support?::
+Tests have only been performed on machines that  use two's complement
+arithmetic. The Endian conversion functions only support 16, 32, and 64-bit
+aligned integers. The endian types only support 8, 16, 24, 32, 40, 48, 56, and
+64-bit unaligned integers, and 8, 16, 32, and 64-bit aligned integers.
+
+Why is there no floating point support?::
+An attempt was made to support four-byte ``float``s and eight-byte
+``double``s, limited to
+http://en.wikipedia.org/wiki/IEEE_floating_point[IEEE 754] (also known as
+ISO/IEC/IEEE 60559) floating point and further limited to systems where floating
+point endianness does not differ from integer endianness. Even with those
+limitations, support for floating point types was not reliable and was removed.
+For example, simply reversing the endianness of a floating point number can
+result in a signaling-NAN. For all practical purposes, binary serialization and
+endianness for integers are one and the same problem. That is not true for
+floating point numbers, so binary serialization interfaces and formats for
+floating point does not fit well in an endian-based library.
+
+
+## Release history
+
+### Changes requested by formal review
+
+The library was reworked from top to bottom to accommodate changes requested
+during the formal review. See <<appendix_mini_review_topics,Mini-Review>>
+page for details.
+
+### Other changes since formal review
+
+* Header `boost/endian/endian.hpp` has been renamed to
+`boost/endian/arithmetic.hpp`. Headers
+`boost/endian/conversion.hpp` and `boost/endian/buffers.hpp` have been added.
+Infrastructure file names were changed accordingly.
+* The endian arithmetic type aliases have been renamed, using a naming pattern
+that is consistent for both integer and floating point, and a consistent set of
+aliases supplied for the endian buffer types.
+* The unaligned-type alias names still have the `_t` suffix, but the
+aligned-type alias names now have an `_at` suffix.
+* `endian_reverse()` overloads for `int8_t` and `uint8_t` have been added for
+improved generality. (Pierre Talbot)
+* Overloads of `endian_reverse_inplace()` have been replaced with a single
+`endian_reverse_inplace()` template. (Pierre Talbot)
+* For X86 and X64 architectures, which permit unaligned loads and stores,
+unaligned little endian buffer and arithmetic types use regular loads and
+stores when the size is exact. This makes unaligned little endian buffer and
+arithmetic types significantly more efficient on these architectures. (Jeremy
+Maitin-Shepard)
+* {cpp}11 features affecting interfaces, such as `noexcept`, are now used.
+{cpp}03 compilers are still supported.
+* Acknowledgements have been updated.
+
+## Compatibility with interim releases
+
+Prior to the official Boost release, class template `endian_arithmetic` has been
+used for a decade or more with the same functionality but under the name
+`endian`. Other names also changed in the official release. If the macro
+`BOOST_ENDIAN_DEPRECATED_NAMES` is defined, those old now deprecated names are
+still supported. However, the class template `endian` name is only provided for
+compilers supporting {cpp}11 template aliases. For {cpp}03 compilers, the name
+will have to be changed to `endian_arithmetic`.
+
+To support backward header compatibility, deprecated header
+`boost/endian/endian.hpp` forwards to `boost/endian/arithmetic.hpp`. It requires
+`BOOST_ENDIAN_DEPRECATED_NAMES` be defined. It should only be used while
+transitioning to the official Boost release of the library as it will be removed
+in some future release.
+
+## {cpp}03 support for {cpp}11 features
+
+[%header,cols=2*]
+|===
+|{cpp}11 Feature
+|Action with {cpp}03 Compilers
+|Scoped enums
+|Uses header
+http://www.boost.org/libs/core/doc/html/core/scoped_enum.html[boost/core/scoped_enum.hpp]
+to emulate {cpp}11 scoped enums.
+|`noexcept`
+|Uses `BOOST_NOEXCEPT` macro, which is defined as null for compilers not
+supporting this {cpp}11 feature.
+|{cpp}11 PODs
+(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2342.htm[N2342])
+|Takes advantage of {cpp}03 compilers that relax {cpp}03 POD rules, but see
+Limitations <<buffers_limitations,here>> and <<arithmetic_limitations,here>>.
+Also see macros for explicit POD control <<buffers_compilation,here>> and
+<<arithmetic_compilation,here>>
+|===
+
+## Future directions
+
+Standardization.::
+The plan is to submit Boost.Endian to the {cpp} standards committee for possible
+inclusion in a Technical Specification or the {cpp} standard itself.
+
+Specializations for `numeric_limits`.::
+Roger Leigh requested that all `boost::endian` types provide `numeric_limits`
+specializations.
+See https://github.com/boostorg/endian/issues/4[GitHub issue 4].
+
+Character buffer support.::
+Peter Dimov pointed out during the mini-review that getting and setting basic
+arithmetic types (or `<cstdint>` equivalents) from/to an offset into an array of
+unsigned char is a common need. See
+http://lists.boost.org/Archives/boost/2015/01/219574.php[Boost.Endian
+mini-review posting].
+
+Out-of-range detection.::
+Peter Dimov pointed suggested during the mini-review that throwing an exception
+on buffer values being out-of-range might be desirable. See the end of
+http://lists.boost.org/Archives/boost/2015/01/219659.php[this posting] and
+subsequent replies.
+
+## Acknowledgements
+
+Comments and suggestions were received from Adder, Benaka Moorthi, Christopher
+Kohlhoff, Cliff Green, Daniel James, Dave Handley, Gennaro Proto, Giovanni Piero
+Deretta, Gordon Woodhull, dizzy, Hartmut Kaiser, Howard Hinnant, Jason Newton,
+Jeff Flinn, Jeremy Maitin-Shepard, John Filo, John Maddock, Kim Barrett, Marsh
+Ray, Martin Bonner, Mathias Gaunard, Matias Capeletto, Neil Mayhew, Nevin Liber,
+Olaf van der Spek, Paul Bristow, Peter Dimov, Pierre Talbot, Phil Endecott,
+Philip Bennefall, Pyry Jahkola, Rene Rivera, Robert Stewart, Roger Leigh, Roland
+Schwarz, Scott McMurray, Sebastian Redl, Tim Blechmann, Tim Moore, tymofey,
+Tomas Puverle, Vincente Botet, Yuval Ronen and Vitaly Budovsk. Apologies if
+anyone has been missed.