diff --git a/doc/endian.adoc b/doc/endian.adoc index 8f09fd2..3f48d5a 100644 --- a/doc/endian.adoc +++ b/doc/endian.adoc @@ -20,6 +20,8 @@ include::endian/overview.adoc[] include::endian/conversion.adoc[] +include::endian/arithmetic.adoc[] + include::endian/choosing_approach.adoc[] include::endian/mini_review_topics.adoc[] diff --git a/doc/endian/arithmetic.adoc b/doc/endian/arithmetic.adoc new file mode 100644 index 0000000..42d7eea --- /dev/null +++ b/doc/endian/arithmetic.adoc @@ -0,0 +1,561 @@ +//// +Copyright 2011-2016 Beman Dawes + +Distributed under the Boost Software License, Version 1.0. +(http://www.boost.org/LICENSE_1_0.txt) +//// + +[#arithmetic] +# Endian Arithmetic Types + +## Introduction + +Header `boost/endian/arithmetic.hpp` provides integer binary types with +control over byte order, value type, size, and alignment. Typedefs provide +easy-to-use names for common configurations. + +These types provide portable byte-holders for integer data, independent of +particular computer architectures. Use cases almost always involve I/O, either +via files or network connections. Although data portability is the primary +motivation, these integer byte-holders may also be used to reduce memory use, +file size, or network activity since they provide binary integer sizes not +otherwise available. + +Such integer byte-holder types are traditionally called *endian* types. See the +http://en.wikipedia.org/wiki/Endian[Wikipedia] for a full exploration of +*endianness*, including definitions of *big endian* and *little endian*. + +Boost endian integers provide the same full set of {cpp} assignment, arithmetic, +and relational operators as {cpp} standard integral types, with the standard +semantics. + +Unary arithmetic operators are `+`, `-`, `~`, `!`, plus both prefix and postfix +`--` and `++`. Binary arithmetic operators are `+`, `+=`, `-`, `-=`, `\*`, +``*=``, `/`, `/=`, `&`, `&=`, `|`, `|=`, `^`, `^=`, `<<`, `<\<=`, `>>`, and +`>>=`. 
Binary relational operators are `==`, `!=`, `<`, `<=`, `>`, and `>=`.
+
+Implicit conversion to the underlying value type is provided. An implicit
+constructor converting from the underlying value type is provided.
+
+## Example
+The `endian_example.cpp` program writes a binary file containing four-byte,
+big-endian and little-endian integers:
+
+```
+#include <iostream>
+#include <cstdio>
+#include <boost/endian/arithmetic.hpp>
+#include <boost/static_assert.hpp>
+
+using namespace boost::endian;
+
+namespace
+{
+  // This is an extract from a very widely used GIS file format.
+  // Why the designer decided to mix big and little endians in
+  // the same file is not known. But this is a real-world format
+  // and users wishing to write low level code manipulating these
+  // files have to deal with the mixed endianness.
+
+  struct header
+  {
+    big_int32_t     file_code;
+    big_int32_t     file_length;
+    little_int32_t  version;
+    little_int32_t  shape_type;
+  };
+
+  const char* filename = "test.dat";
+}
+
+int main(int, char* [])
+{
+  header h;
+
+  BOOST_STATIC_ASSERT(sizeof(h) == 16U);  // reality check
+
+  h.file_code   = 0x01020304;
+  h.file_length = sizeof(header);
+  h.version     = 1;
+  h.shape_type  = 0x01020304;
+
+  // Low-level I/O such as POSIX read/write or
+  // fread/fwrite is sometimes used for binary file operations
+  // when ultimate efficiency is important. Such I/O is often
+  // performed in some C++ wrapper class, but to drive home the
+  // point that endian integers are often used in fairly
+  // low-level code that does bulk I/O operations,
+  // fopen/fwrite is used for I/O in this example.
+
+  std::FILE* fi = std::fopen(filename, "wb");  // MUST BE BINARY
+
+  if (!fi)
+  {
+    std::cout << "could not open " << filename << '\n';
+    return 1;
+  }
+
+  if (std::fwrite(&h, sizeof(header), 1, fi) != 1)
+  {
+    std::cout << "write failure for " << filename << '\n';
+    std::fclose(fi);  // do not leak the stream on the error path
+    return 1;
+  }
+
+  std::fclose(fi);
+
+  std::cout << "created file " << filename << '\n';
+
+  return 0;
+}
+```
+
+After compiling and executing `endian_example.cpp`, a hex dump of `test.dat`
+shows:
+
+```
+01020304 00000010 01000000 04030201
+```
+
+Notice that the first two 32-bit integers are big endian while the second two
+are little endian, even though the machine this was compiled and run on was
+little endian.
+
+## Limitations
+
+Requires `<climits>`, `CHAR_BIT == 8`. If `CHAR_BIT` is some other value,
+compilation will result in an `#error`. This restriction is in place because the
+design, implementation, testing, and documentation have only considered issues
+related to 8-bit bytes, and there have been no real-world use cases presented
+for other sizes.
+
+In {cpp}03, `endian_arithmetic` does not meet the requirements for POD types
+because it has constructors, private data members, and a base class. This means
+that common use cases rely on unspecified behavior, in that the {cpp}
+Standard does not guarantee memory layout for non-POD types. This has not been a
+problem in practice since all known {cpp} compilers lay out memory as if
+`endian_arithmetic` were a POD type. In {cpp}11, it is possible to specify the
+default constructor as trivial, and private data members and base classes no
+longer disqualify a type from being a POD type. Thus under {cpp}11,
+`endian_arithmetic` no longer relies on unspecified behavior.
+
+## Feature set
+
+* Big endian | little endian | native endian byte ordering
+* Signed | unsigned
+* Unaligned | aligned
+* 1-8 byte (unaligned) | 1, 2, 4, 8 byte (aligned)
+* Choice of value type
+
+## Enums and typedefs
+
+Two scoped enums are provided:
+
+```
+enum class order {big, little, native};
+
+enum class align {no, yes};
+```
+
+One class template is provided:
+
+```
+template
+class endian_arithmetic;
+```
+
+Typedefs, such as `big_int32_t`, provide convenient naming conventions for
+common use cases:
+
+[%header,cols=5*]
+|===
+|Name |Alignment |Endianness |Sign |Sizes in bits (n)
+|big_intn_t |no |big |signed |8,16,24,32,40,48,56,64
+|big_uintn_t |no |big |unsigned |8,16,24,32,40,48,56,64
+|little_intn_t |no |little |signed |8,16,24,32,40,48,56,64
+|little_uintn_t |no |little |unsigned |8,16,24,32,40,48,56,64
+|native_intn_t |no |native |signed |8,16,24,32,40,48,56,64
+|native_uintn_t |no |native |unsigned |8,16,24,32,40,48,56,64
+|big_intn_at |yes |big |signed |8,16,32,64
+|big_uintn_at |yes |big |unsigned |8,16,32,64
+|little_intn_at |yes |little |signed |8,16,32,64
+|little_uintn_at |yes |little |unsigned |8,16,32,64
+|===
+
+The unaligned types do not cause compilers to insert padding bytes in classes
+and structs. This is an important characteristic that can be exploited to
+minimize wasted space in memory, files, and network transmissions.
+
+CAUTION: Code that uses aligned types is possibly non-portable because
+alignment requirements vary between hardware architectures and because
+alignment may be affected by compiler switches or pragmas. For example,
+alignment of a 64-bit integer may be to a 32-bit boundary on a 32-bit machine.
+Furthermore, aligned types are only available on architectures with 8, 16, 32,
+and 64-bit integer types.
+
+TIP: Prefer unaligned arithmetic types.
+
+TIP: Protect yourself against alignment ills.
For example:
+
+```
+static_assert(sizeof(containing_struct) == 12, "sizeof(containing_struct) is wrong");
+```
+
+NOTE: One-byte arithmetic types have identical layout on all platforms, so they
+never actually reverse endianness. They are provided to enable generic code,
+and to improve code readability and searchability.
+
+## Class template `endian_arithmetic`
+
+An `endian_arithmetic` is an integer byte-holder with user-specified
+endianness, value type, size, and alignment. The usual operations on
+arithmetic types are supplied.
+
+### Synopsis
+
+```
+#include
+#include
+
+namespace boost
+{
+  namespace endian
+  {
+    // C++11 features emulated if not available
+
+    enum class align {no, yes};
+
+    template
+    class endian_arithmetic
+      : public endian_buffer
+    {
+    public:
+      typedef T value_type;
+
+      // if BOOST_ENDIAN_FORCE_PODNESS is defined && C++11 PODs are not
+      // available then these two constructors will not be present
+      endian_arithmetic() noexcept = default;
+      endian_arithmetic(T v) noexcept;
+
+      endian_arithmetic& operator=(T v) noexcept;
+      operator value_type() const noexcept;
+      value_type value() const noexcept; // for exposition; see endian_buffer
+      const char* data() const noexcept; // for exposition; see endian_buffer
+
+      // arithmetic operations
+      // note that additional operations are provided by the value_type
+      value_type operator+(const endian& x) noexcept;
+      endian& operator+=(endian& x, value_type y) noexcept;
+      endian& operator-=(endian& x, value_type y) noexcept;
+      endian& operator*=(endian& x, value_type y) noexcept;
+      endian& operator/=(endian& x, value_type y) noexcept;
+      endian& operator%=(endian& x, value_type y) noexcept;
+      endian& operator&=(endian& x, value_type y) noexcept;
+      endian& operator|=(endian& x, value_type y) noexcept;
+      endian& operator^=(endian& x, value_type y) noexcept;
+      endian& operator<<=(endian& x, value_type y) noexcept;
+      endian& operator>>=(endian& x, value_type y) noexcept;
+      value_type operator<<(const endian& x,
value_type y) noexcept; + value_type operator>>(const endian& x, value_type y) noexcept; + endian& operator++(endian& x) noexcept; + endian& operator--(endian& x) noexcept; + endian operator++(endian& x, int) noexcept; + endian operator--(endian& x, int) noexcept; + + // Stream inserter + template + friend std::basic_ostream& + operator<<(std::basic_ostream& os, const T& x); + + // Stream extractor + template + friend std::basic_istream& + operator>>(std::basic_istream& is, T& x); + }; + + // typedefs + + // unaligned big endian signed integer types + typedef endian big_int8_t; + typedef endian big_int16_t; + typedef endian big_int24_t; + typedef endian big_int32_t; + typedef endian big_int40_t; + typedef endian big_int48_t; + typedef endian big_int56_t; + typedef endian big_int64_t; + + // unaligned big endian unsigned integer types + typedef endian big_uint8_t; + typedef endian big_uint16_t; + typedef endian big_uint24_t; + typedef endian big_uint32_t; + typedef endian big_uint40_t; + typedef endian big_uint48_t; + typedef endian big_uint56_t; + typedef endian big_uint64_t; + + // unaligned little endian signed integer types + typedef endian little_int8_t; + typedef endian little_int16_t; + typedef endian little_int24_t; + typedef endian little_int32_t; + typedef endian little_int40_t; + typedef endian little_int48_t; + typedef endian little_int56_t; + typedef endian little_int64_t; + + // unaligned little endian unsigned integer types + typedef endian little_uint8_t; + typedef endian little_uint16_t; + typedef endian little_uint24_t; + typedef endian little_uint32_t; + typedef endian little_uint40_t; + typedef endian little_uint48_t; + typedef endian little_uint56_t; + typedef endian little_uint64_t; + + // unaligned native endian signed integer types + typedef implementation-defined_int8_t native_int8_t; + typedef implementation-defined_int16_t native_int16_t; + typedef implementation-defined_int24_t native_int24_t; + typedef implementation-defined_int32_t 
native_int32_t; + typedef implementation-defined_int40_t native_int40_t; + typedef implementation-defined_int48_t native_int48_t; + typedef implementation-defined_int56_t native_int56_t; + typedef implementation-defined_int64_t native_int64_t; + + // unaligned native endian unsigned integer types + typedef implementation-defined_uint8_t native_uint8_t; + typedef implementation-defined_uint16_t native_uint16_t; + typedef implementation-defined_uint24_t native_uint24_t; + typedef implementation-defined_uint32_t native_uint32_t; + typedef implementation-defined_uint40_t native_uint40_t; + typedef implementation-defined_uint48_t native_uint48_t; + typedef implementation-defined_uint56_t native_uint56_t; + typedef implementation-defined_uint64_t native_uint64_t; + + // aligned big endian signed integer types + typedef endian big_int8_at; + typedef endian big_int16_at; + typedef endian big_int32_at; + typedef endian big_int64_at; + + // aligned big endian unsigned integer types + typedef endian big_uint8_at; + typedef endian big_uint16_at; + typedef endian big_uint32_at; + typedef endian big_uint64_at; + + // aligned little endian signed integer types + typedef endian little_int8_at; + typedef endian little_int16_at; + typedef endian little_int32_at; + typedef endian little_int64_at; + + // aligned little endian unsigned integer types + typedef endian little_uint8_at; + typedef endian little_uint16_at; + typedef endian little_uint32_at; + typedef endian little_uint64_at; + + // aligned native endian typedefs are not provided because + // types are superior for that use case + + } // namespace endian +} // namespace boost +``` + +The `implementation-defined` text above is either `big` or `little` according +to the endianness of the platform. + +### Members + +``` +endian() = default; // {cpp}03: endian(){} +``` +[horizontal] +Effects:: Constructs an uninitialized object of type +`endian_arithmetic`. 
+
+```
+endian(T v);
+```
+[horizontal]
+Effects:: Constructs an object of type `endian_arithmetic`.
+Postcondition:: `x == v`, where `x` is the constructed object.
+
+```
+endian& operator=(T v);
+```
+[horizontal]
+Postcondition:: `x == v`, where `x` is the object assigned to.
+Returns:: `*this`.
+
+```
+operator T() const;
+```
+[horizontal]
+Returns:: The current value stored in `*this`, converted to `value_type`.
+
+```
+const char* data() const;
+```
+[horizontal]
+Returns:: A pointer to the first byte of the endian binary value stored in
+`*this`.
+
+### Other operators
+
+Other operators on endian objects are forwarded to the equivalent operator on
+`value_type`.
+
+### Stream inserter
+
+```
+template
+friend std::basic_ostream&
+  operator<<(std::basic_ostream& os, const T& x);
+```
+[horizontal]
+Returns:: `os << +x`.
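The unary `+` in `os << +x` matters for one-byte value types. The following is a minimal sketch of why the promotion is useful, written against the standard library only rather than Boost.Endian. The helper names `format_raw` and `format_promoted` are hypothetical, and the character output assumes the common case where `std::uint8_t` is an alias for `unsigned char`.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: streams the value as-is. For char-sized types the
// stream prints a character, not a number.
template <class T>
std::string format_raw(T x)
{
    std::ostringstream os;
    os << x;
    return os.str();
}

// Hypothetical helper: streams +x. Unary plus applies integral promotion,
// so char-sized values are printed as numbers.
template <class T>
std::string format_promoted(T x)
{
    std::ostringstream os;
    os << +x;
    return os.str();
}

// Small self-check: 65 is the code for 'A' in ASCII-compatible encodings.
inline void check_formatting()
{
    assert(format_raw(std::uint8_t(65)) == "A");
    assert(format_promoted(std::uint8_t(65)) == "65");
}
```

For value types wider than one byte the two helpers produce the same text, so the promotion only changes the output for the 8-bit case.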

+
+### Stream extractor
+
+```
+template
+friend std::basic_istream&
+  operator>>(std::basic_istream& is, T& x);
+```
+[horizontal]
+Effects:: As if:
+```
+T i;
+if (is >> i)
+  x = i;
+```
+[horizontal]
+Returns:: `is`.
+
+## FAQ
+
+See the <> FAQ for a library-wide FAQ.
+
+### Why not just use Boost.Serialization?
+
+Serialization involves a conversion for every object involved in I/O. Endian
+integers require no conversion or copying. They are already in the desired
+format for binary I/O. Thus they can be read or written in bulk.
+
+### Are endian types PODs?
+
+Yes for {cpp}11. No for {cpp}03, although several
+<> are available to force PODness in all cases.
+
+### What are the implications of endian integer types not being PODs with {cpp}03 compilers?
+
+They can't be used in unions. Also, compilers aren't required to align or lay
+out storage in portable ways, although this potential problem hasn't prevented
+use of Boost.Endian with real compilers.
+
+### What good is native endianness?
+
+It provides alignment and size guarantees not available from the built-in
+types. It eases generic programming.
+
+### Why bother with the aligned endian types?
+
+Aligned integer operations may be faster (as much as 10 to 20 times faster)
+if the endianness and alignment of the type matches the endianness and
+alignment requirements of the machine. The code, however, will be somewhat less
+portable than with the unaligned types.
+
+### Why provide the arithmetic operations?
+
+Providing a full set of operations reduces program clutter and makes code
+both easier to write and to read. Consider incrementing a variable in a record.
+It is very convenient to write:
+```
+++record.foo;
+```
+Rather than:
+```
+int temp(record.foo);
+++temp;
+record.foo = temp;
+```
+
+## Design considerations for Boost.Endian types
+
+* Must be suitable for I/O - in other words, must be memcpyable.
+* Must provide exactly the size and internal byte ordering specified.
+* Must work correctly when the internal integer representation has more bits
+than the sum of the bits in the external byte representation. Sign extension
+must work correctly when the internal integer representation type has more
+bits than the sum of the bits in the external bytes. For example, using
+a 64-bit integer internally to represent 40-bit (5 byte) numbers must work for
+both positive and negative values.
+* Must work correctly (including using the same defined external
+representation) regardless of whether a compiler treats char as signed or
+unsigned.
+* Unaligned types must not cause compilers to insert padding bytes.
+* The implementation should supply optimizations with great care. Experience
+has shown that optimizations of endian integers often become pessimizations
+when changing machines or compilers. Pessimizations can also happen when
+changing compiler switches, compiler versions, or CPU models of the same
+architecture.
+
+## Experience
+
+Classes with similar functionality have been independently developed by
+several Boost programmers and used very successfully in high-value, high-use
+applications for many years. These independently developed endian libraries
+often evolved from C libraries that were also widely used. Endian types have
+proven useful across a wide range of computer architectures and
+applications.
+
+## Motivating use cases
+
+Neil Mayhew writes: "I can also provide a meaningful use-case for this
+library: reading TrueType font files from disk and processing the contents. The
+data format has fixed endianness (big) and has unaligned values in various
+places. Using Boost.Endian simplifies and cleans the code wonderfully."
+
+## {cpp}11
+
+The availability of the {cpp}11
+http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2346.htm[Defaulted
+Functions] feature is detected automatically, and will be used if present to
+ensure that objects of `class endian_arithmetic` are trivial, and thus PODs.
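The effect of a defaulted default constructor on triviality can be checked with the standard type traits. The sketch below is only an illustration under stated assumptions: `big32_holder` and `legacy_holder` are hypothetical stand-ins, not the Boost.Endian classes, and the `sizeof` check reflects what mainstream compilers do rather than a guarantee of the {cpp} Standard.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical byte-holder with a C++11 defaulted default constructor.
class big32_holder
{
public:
    big32_holder() noexcept = default;          // defaulted, so still trivial
    big32_holder(std::uint32_t v) noexcept      // an extra constructor does not
    {                                           // affect triviality
        bytes_[0] = static_cast<unsigned char>(v >> 24);
        bytes_[1] = static_cast<unsigned char>(v >> 16);
        bytes_[2] = static_cast<unsigned char>(v >> 8);
        bytes_[3] = static_cast<unsigned char>(v);
    }

    // Reassembles the value from its big-endian bytes.
    std::uint32_t value() const noexcept
    {
        return (std::uint32_t(bytes_[0]) << 24) | (std::uint32_t(bytes_[1]) << 16)
             | (std::uint32_t(bytes_[2]) << 8)  |  std::uint32_t(bytes_[3]);
    }

private:
    unsigned char bytes_[4];
};

// Hypothetical C++03-style equivalent: the user-provided default constructor
// makes the type non-trivial even though the layout is the same.
class legacy_holder
{
public:
    legacy_holder() {}
    const unsigned char* data() const { return bytes_; }

private:
    unsigned char bytes_[4];
};

static_assert(std::is_trivially_default_constructible<big32_holder>::value,
              "defaulted constructor should be trivial");
static_assert(std::is_trivially_copyable<big32_holder>::value,
              "byte-holder should be memcpy-safe");
static_assert(std::is_standard_layout<big32_holder>::value,
              "byte-holder should be standard-layout");
static_assert(!std::is_trivially_default_constructible<legacy_holder>::value,
              "user-provided constructor is not trivial");
static_assert(sizeof(big32_holder) == 4, "no padding expected");

// Small self-check of the byte-order round trip.
inline void check_round_trip()
{
    assert(big32_holder(0x01020304u).value() == 0x01020304u);
}
```

A type that is trivially default constructible, trivially copyable, and standard-layout is a POD in the {cpp}11 sense, which is the property the defaulted constructor preserves here.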
+
+## Compilation
+Boost.Endian is implemented entirely within headers, with no need to link to any
+Boost object libraries.
+
+Several macros allow user control over features:
+
+* `BOOST_ENDIAN_NO_CTORS` causes `class endian_arithmetic` to have no
+constructors. The intended use is for compiling user code that must be portable
+between compilers regardless of {cpp}11
+http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2346.htm[Defaulted
+Functions] support. Use of constructors will always fail.
+* `BOOST_ENDIAN_FORCE_PODNESS` causes `BOOST_ENDIAN_NO_CTORS` to be defined if
+the compiler does not support {cpp}11
+http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2346.htm[Defaulted
+Functions]. This ensures that objects of `class endian_arithmetic` are PODs,
+and so can be used in {cpp}03 unions. In {cpp}11, `class endian_arithmetic`
+objects are PODs, even though they have constructors, so can always be used in
+unions.
+
+## Acknowledgements
+
+Original design developed by Darin Adler based on classes developed by Mark
+Borgerding. Four original class templates combined into a single
+`endian_arithmetic` class template by Beman Dawes, who put the library together,
+provided documentation, added the typedefs, and also added the
+`unrolled_byte_loops` sign partial specialization to correctly extend the sign
+when the cover integer size differs from the endian representation size.
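Sign extension for a cover integer wider than the stored bytes can be sketched without Boost. The following is a minimal illustration, assuming a 40-bit big-endian two's-complement value held in five bytes and widened to 64 bits. `load_big_int40` is a hypothetical helper written for this example; it is not the library's `unrolled_byte_loops` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Boost.Endian): reads five big-endian bytes
// as a signed 40-bit two's-complement value and widens it to 64 bits.
inline std::int64_t load_big_int40(const unsigned char* p)
{
    std::uint64_t u = 0;
    for (int i = 0; i < 5; ++i)
        u = (u << 8) | p[i];                 // most significant byte first

    const std::uint64_t sign_bit = std::uint64_t(1) << 39;
    // XOR then subtract: non-negative values are unchanged, and values with
    // bit 39 set map to the corresponding negative numbers.
    return static_cast<std::int64_t>(u ^ sign_bit)
         - static_cast<std::int64_t>(sign_bit);
}

// Small self-check covering a positive, a negative, and the lowest value.
inline void check_load_big_int40()
{
    const unsigned char one[5]       = {0x00, 0x00, 0x00, 0x00, 0x01};
    const unsigned char minus_one[5] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    const unsigned char lowest[5]    = {0x80, 0x00, 0x00, 0x00, 0x00};

    assert(load_big_int40(one) == 1);
    assert(load_big_int40(minus_one) == -1);
    assert(load_big_int40(lowest) == -(std::int64_t(1) << 39));
}
```

Working on `unsigned char` bytes and an unsigned accumulator keeps the result independent of whether plain `char` is signed, which is one of the design considerations listed earlier.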