diff --git a/choosing_approach.html b/choosing_approach.html new file mode 100644 index 0000000..effb79f --- /dev/null +++ b/choosing_approach.html @@ -0,0 +1,412 @@ + + +
+ + + +
+
+![]() |
+ + Choosing the Approach | +
+ Endian Home + Conversion Functions + Arithmetic Types + Buffer Types + Choosing Approach | +
Deciding which is the best endianness approach (conversion functions, buffer +types, or arithmetic types) for a particular application involves complex +engineering trade-offs. It is hard to assess those trade-offs without some +understanding of the different interfaces, so you might want to read the +conversion functions, +buffer types, and arithmetic types pages +before diving into this page.
+ +The best approach to endianness for a particular application depends on the interaction between +the application's needs and the characteristics of each of the three approaches.
+ +Recommendation: If you are new to endianness, uncertain, or don't want to invest +the time to +study +engineering trade-offs, use endian arithmetic types. They are safe, easy +to use, and easy to maintain. Use the + +anticipating need design pattern locally around performance hot spots +like lengthy loops, if needed.
+ +A dealing with endianness usually implies a program portability or a data +portability requirement, and often both. That means real programs dealing with +endianness are usually complex, so the examples shown here would really be +written as multiple functions spread across multiple translation units. They +would involve interfaces that can not be altered as they are supplied by +third-parties or the standard library.
+ +The characteristics that differentiate the three approaches to endianness are the endianness +invariants, conversion explicitness, arithmetic operations, sizes available, and +alignment requirements.
+ ++ ++ +Endian conversion functions use objects of the ordinary C++ arithmetic +types like
+ +int
orunsigned short
to hold values. That +breaks the implicit invariant that the C++ language rules apply. The usual +language rules only apply if the endianness of the object is currently set to the native endianness for the platform. That can +make it very hard to reason about logic flow, and result in difficult to +find bugs.For example:
+ +++struct data_t // big endian +{ + int32_t v1; // description ... + int32_t v2; // description ... + ... additional character data members (i.e. non-endian) + int32_t v3; // description ... +}; + +data_t data; + +read(data); +big_to_native_inplace(data.v1); +big_to_native_inplace(data.v2); + +... + +++v1; +third_party::func(data.v2); + +... + +native_to_big_inplace(data.v1); +native_to_big_inplace(data.v2); +write(data); ++The programmer didn't bother to convert
+data.v3
to native + endianness because that member isn't used. A later maintainer needs to pass +data.v3
to the third-party function, so addsthird_party::func(data.v3);
+ somewhere deep in the code. This causes a silent failure because the usual + invariant that an object of typeint32_t
holds a value as + described by the C++ core language does not apply.Endian buffer and arithmetic types hold values internally as arrays of +characters with an invariant that the endianness of the array never changes. +That makes these types easier to use and programs easier to maintain.
+Here is the same example, using an endian arithmetic type:
+++ +struct data_t +{ + big_int32_t v1; // description ... + big_int32_t v2; // description ... + ... additional character data members (i.e. non-endian) + big_int32_t v3; // description ... +}; + +data_t data; + +read(data); + +... + +++v1; +third_party::func(data.v2); + +... + +write(data); ++A later maintainer can add
+third_party::func(data.v3)
and it + will just-work.
+ ++ +Endian conversion functions and buffer types never perform +implicit conversions. This gives users explicit control of when conversion +occurs, and may help avoid unnecessary conversions.
+ +Endian arithmetic types perform conversion implicitly. That makes +these types very easy to use, but can result in unnecessary conversions. Failure +to hoist conversions out of inner loops can bring a performance penalty.
+ +
+ ++ +Endian conversion functions do not supply arithmetic +operations, but this is not a concern since this approach uses ordinary C++ +arithmetic types to hold values.
+ +Endian buffer types do not supply arithmetic operations. Although this +approach avoids unnecessary conversions, it can result in the introduction of +additional variables and confuse maintenance programmers.
+ +Endian arithmetic types do supply arithmetic operations. They +are very easy to use if lots of arithmetic is involved.
+ +
+ ++ +Endianness conversion functions only support 1, 2, 4, and 8 byte +integers. That's sufficient for many applications.
+ +Endian buffer and arithmetic types support 1, 2, 3, 4, 5, 6, 7, and 8 +byte integers. For an application where memory use or I/O speed is the limiting +factor, using sizes tailored to application needs can be useful.
+ +
+ ++ +Endianness conversion functions only support aligned integer and +floating-point types. That's sufficient for most applications.
+ +Endian buffer and arithmetic types support both aligned and unaligned +integer and floating-point types. Unaligned types are rarely needed, but when +needed they are often very useful and workarounds are painful. For example,
+ +++ +Non-portable code like this:
++struct S { + uint16_t a; // big endian + uint32_t b; // big endian +} __attribute__ ((packed));+Can be replaced with portable code like this:
+++struct S { + big_uint16_ut a; + big_uint32_ut b; +};+
Applications often traffic in endian data as records or packets containing +multiple endian data elements. For simplicity, we will just call them records.
+ +If desired endianness differs from native endianness, a conversion has to be +performed. When should that conversion occur? Three design patterns have +evolved.
+ +This pattern defers conversion to the point in the code where the data +element is actually used.
+ +This pattern is appropriate when which endian element is actually used varies +greatly according to record content or other circumstances
+ +This pattern performs conversion to native endianness in anticipation of use, +such as immediately after reading records. If needed, conversion to the output +endianness is performed after all possible needs have passed, such as just +before writing records.
+ +One implementation of this pattern is to create a proxy record with +endianness converted to native in a read function, and expose only that proxy to +the rest of the implementation. If a write function, if needed, handles the +conversion from native to the desired output endianness.
+ +This pattern is appropriate when all endian elements in a record are +typically used regardless of record content or other circumstances
+ +This pattern in general defers conversion but for specific local needs does +anticipatory conversion. Although particularly appropriate when coupled with the endian buffer +or arithmetic types, it also works well with the conversion functions.
+ +Example:
+ +++ +struct data_t +{ + big_int32_t v1; + big_int32_t v2; + big_int32_t v3; +}; + +data_t data; + +read(data); + +... +++v1; +... + +int32_t v3_temp = data.v3; // hoist conversion out of loop + +for (int32_t i = 0; i < large-number; ++i) +{ + ... lengthy computation that accesses v3_temp many times ... +} +data.v3 = v3_temp; + +write(data); ++
In general the above pseudo-code leaves conversion up to the endian
+arithmetic type big_int32_t
. But to avoid conversion inside the
+loop, a temporary is created before the loop is entered, and then used to set
+the new value of data.v3
after the loop is complete.
+ ++ +Question: Won't the compiler's optimizer hoist the conversion out +of the loop anyhow?
+ +Answer: VC++ 2015 Preview, and probably others, does not, even for +a toy test program. Although the savings is small (two register
+ ++bswap
instructions), the cost might +be significant if the loop is repeated enough times. On the other hand, the +program may be so dominated by I/O time that even a lengthy loop will be +immaterial.
An existing codebase runs on big endian systems. It does not +currently deal with endianness. The codebase needs to be modified so it can run +on little endian systems under various operating systems. To ease +transition and protect value of existing files, external data will continue to +be maintained as big endian.
+ +The endian
+arithmetic approach is recommended to meet these needs. A relatively small
+number of header files dealing with binary I/O layouts need to change types. For
+example,
+short
or int16_t
would change to big_int16_t
. No
+changes are required for .cpp
files.
An existing codebase runs on little-endian Linux systems. It already +deals with endianness via +Linux provided +functions. Because of a business merger, the codebase has to be quickly +modified for Windows and possibly other operating systems, while still +supporting Linux. The codebase is reliable and the programmers are all +well-aware of endian issues.
+ +These factors all argue for an endian conversion
+approach that just mechanically changes the calls to htobe32
,
+etc. to boost::endian::native_to_big
, etc. and replaces <endian.h>
+with <boost/endian/conversion.hpp>
.
A new, complex, multi-threaded application is to be developed that must run +on little endian machines, but do big endian network I/O. The developers believe +computational speed for endian variable is critical but have seen numerous bugs +result from inability to reason about endian conversion state. They are also +worried that future maintenance changes could inadvertently introduce a lot of +slow conversions if full-blown endian arithmetic types are used.
+ +The endian buffers approach is made-to-order for +this use case.
+ +A new, complex, multi-threaded application is to be developed that must run +on little endian machines, but do big endian network I/O. The developers believe +computational speed for endian variables is not critical but have seen +numerous bugs result from inability to reason about endian conversion state. +They are also concerned about ease-of-use both during development and long-term +maintenance.
+ +Removing concern about conversion speed and adding concern about ease-of-use +tips the balance strongly in favor the endian +arithmetic approach.
+ +Last revised: +19 January, 2015
+© Copyright Beman Dawes, 2011, 2013, 2014
+Distributed under the Boost Software License, Version 1.0. See +www.boost.org/ LICENSE_1_0.txt
+ ++ + + + \ No newline at end of file