Endian Integers
Contents
Introduction
Hello endian world
Limitations
Feature set
Typedefs
    Comment on naming
Class template endian
    Synopsis
    Members
FAQ
Binary I/O warnings and cautions
Example
Design
Experience
C++0x
Compilation
Acknowledgements
Headers
<boost/integer/endian.hpp>
<boost/integer/endian_binary_stream.hpp>
<boost/binary_stream.hpp>

Introduction

Header <boost/integer/endian.hpp> provides integer-like byte-holder binary types with explicit control over byte order, value type, size, and alignment. Typedefs provide easy-to-use names for common configurations.

These types provide portable byte-holders for integer data, independent of particular computer architectures. Use cases almost always involve I/O, either via files or network connections. Although data portability is the primary motivation, these integer byte-holders may also be used to reduce memory use, file size, or network activity since they provide binary integer sizes not otherwise available.
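
To make the byte-holder idea concrete, here is a minimal sketch, independent of the library, of a 24-bit big-endian signed value held in exactly three bytes. The names big24_bytes, store_big24, and load_big24 are hypothetical and are not part of Boost.Endian; the sketch assumes 8-bit bytes and a two's complement representation.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 24-bit big-endian byte-holder:
// three bytes, most significant byte first.
struct big24_bytes
{
  unsigned char b[3];
};

// Store the low 24 bits of v, most significant byte first.
inline big24_bytes store_big24(std::int_least32_t v)
{
  const std::uint_least32_t u = static_cast<std::uint_least32_t>(v);
  big24_bytes r;
  r.b[0] = static_cast<unsigned char>((u >> 16) & 0xFF);
  r.b[1] = static_cast<unsigned char>((u >> 8) & 0xFF);
  r.b[2] = static_cast<unsigned char>(u & 0xFF);
  return r;
}

// Reassemble the value and extend the sign from bit 23.
inline std::int_least32_t load_big24(const big24_bytes & x)
{
  std::int_least32_t v = (static_cast<std::int_least32_t>(x.b[0]) << 16)
                       | (static_cast<std::int_least32_t>(x.b[1]) << 8)
                       |  static_cast<std::int_least32_t>(x.b[2]);
  if (v & 0x800000)   // sign bit of the 24-bit representation is set
    v -= 0x1000000;   // subtract 2^24 to recover the negative value
  return v;
}
```

The holder occupies three bytes regardless of the host CPU's byte order, which is what makes it suitable for external data.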

Such integer byte-holder types are traditionally called endian types. See the Wikipedia article on endianness for a full exploration, including definitions of big endian and little endian.

Boost endian integers provide the same full set of C++ assignment, arithmetic, and relational operators as C++ standard integral types, with the standard semantics.

Unary arithmetic operators are +, -, ~, !, prefix and postfix -- and ++. Binary arithmetic operators are +, +=, -, -=, *, *=, /, /=, %, %=, &, &=, |, |=, ^, ^=, <<, <<=, >>, >>=. Binary relational operators are ==, !=, <, <=, >, >=.

Automatic conversion is provided to the underlying integer value type.

Header <boost/integer/endian_binary_stream.hpp> provides operators <= and >= for unformatted binary (as opposed to formatted character) stream insertion and extraction of endian types.

Header <boost/binary_stream.hpp> provides operators <= and >= for unformatted binary (as opposed to formatted character) stream insertion and extraction of built-in and std::string types.

Hello endian world

#include <boost/integer/endian.hpp>
#include <boost/integer/endian_binary_stream.hpp>
#include <boost/binary_stream.hpp>
#include <iostream>

using namespace boost;
using namespace boost::integer;

int main()
{
  int_least32_t v = 0x31323334L;  // = ASCII { '1', '2', '3', '4' }
                                  // value chosen to work on text stream
  big32_t    b(v);
  little32_t l(v);

  std::cout << "Hello, endian world!\n\n";

  std::cout << v << ' ' << b << ' ' << l << '\n';
  std::cout <= v <= ' ' <= b <= ' ' <= l <= '\n';
}

On a little-endian CPU, this program outputs:

Hello, endian world!

825373492 825373492 825373492
4321 1234 4321
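
The second output line shows the raw bytes in storage order. The following sketch, which does not use the library, reproduces that ordering for the value 0x31323334. The helper names big_endian_bytes and little_endian_bytes are hypothetical, and an ASCII execution character set is assumed.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: the four bytes of v in big-endian storage order
// (most significant byte first).
inline std::string big_endian_bytes(std::uint_least32_t v)
{
  std::string s;
  for (int shift = 24; shift >= 0; shift -= 8)
    s += static_cast<char>((v >> shift) & 0xFF);
  return s;
}

// Hypothetical helper: the four bytes of v in little-endian storage order
// (least significant byte first).
inline std::string little_endian_bytes(std::uint_least32_t v)
{
  std::string s;
  for (int shift = 0; shift <= 24; shift += 8)
    s += static_cast<char>((v >> shift) & 0xFF);
  return s;
}
```

With v = 0x31323334, big_endian_bytes yields "1234" and little_endian_bytes yields "4321". The native variable v is written in the CPU's own order, which is why it matches the little-endian result on a little-endian CPU.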

Limitations

Requires <climits> CHAR_BIT == 8. If CHAR_BIT is some other value, compilation will result in an #error. This restriction is in place because the design, implementation, testing, and documentation have only considered issues related to 8-bit bytes, and there have been no real-world use cases presented for other sizes.

In C++03, endian does not meet the requirements for POD types because it has constructors, private data members, and a base class. This means that common use cases are relying on unspecified behavior in that the C++ Standard does not guarantee memory layout for non-POD types. This has not been a problem in practice since all known C++ compilers do lay out memory as if endian were a POD type. In C++0x, it will be possible to specify the default constructor as trivial, and private data members and base classes will no longer disqualify a type from being a POD. Thus under C++0x, endian will no longer be relying on unspecified behavior.
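
The difference can be checked with the standard type traits on a compiler that supports them. This is a sketch using hypothetical stand-in classes (empty_base, holder03, holder11), not the library's own endian class.

```cpp
#include <type_traits>

struct empty_base {};  // hypothetical stand-in for an empty operators base

// C++03 style: the user-provided default constructor makes the class
// non-trivial, and therefore not a POD.
class holder03 : empty_base
{
public:
  holder03() {}
  explicit holder03(unsigned char fill)
  {
    for (int i = 0; i < 4; ++i) bytes_[i] = fill;
  }
private:
  unsigned char bytes_[4];
};

// C++0x style: the defaulted default constructor is trivial; private data
// and an empty base class do not prevent standard layout.
class holder11 : empty_base
{
public:
  holder11() = default;
  explicit holder11(unsigned char fill)
  {
    for (int i = 0; i < 4; ++i) bytes_[i] = fill;
  }
private:
  unsigned char bytes_[4];
};

static_assert(!std::is_trivial<holder03>::value,
              "user-provided default constructor is not trivial");
static_assert(std::is_trivial<holder11>::value,
              "defaulted default constructor is trivial");
static_assert(std::is_standard_layout<holder11>::value,
              "private members and an empty base are permitted");
```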

Feature set

Typedefs

One class template is provided:

template <endianness::enum_t E, typename T, std::size_t n_bits,
  alignment::enum_t A = alignment::unaligned>
class endian;

Sixty typedefs, such as big32_t, provide convenient naming conventions for common use cases:

Name                Endianness  Sign      Sizes in bits (n)        Alignment
bign_t              big         signed    8,16,24,32,40,48,56,64   unaligned
ubign_t             big         unsigned  8,16,24,32,40,48,56,64   unaligned
littlen_t           little      signed    8,16,24,32,40,48,56,64   unaligned
ulittlen_t          little      unsigned  8,16,24,32,40,48,56,64   unaligned
nativen_t           native      signed    8,16,24,32,40,48,56,64   unaligned
unativen_t          native      unsigned  8,16,24,32,40,48,56,64   unaligned
aligned_bign_t      big         signed    16,32,64                 aligned
aligned_ubign_t     big         unsigned  16,32,64                 aligned
aligned_littlen_t   little      signed    16,32,64                 aligned
aligned_ulittlen_t  little      unsigned  16,32,64                 aligned

The unaligned types do not cause compilers to insert padding bytes in classes and structs. This is an important characteristic that can be exploited to minimize wasted space in memory, files, and network transmissions.
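
The effect on record size can be illustrated without the library. In this sketch, raw byte arrays stand in for unaligned endian types; the struct names packed_record and native_record are hypothetical.

```cpp
#include <cstdint>

// Byte arrays have an alignment requirement of 1, so mainstream compilers
// insert no padding between or after the members.
struct packed_record
{
  unsigned char tag[1];     // stand-in for big8_t
  unsigned char length[4];  // stand-in for big32_t
  unsigned char id[3];      // stand-in for big24_t
};

// The same fields using built-in integers. Compilers typically insert
// padding after tag so that length is suitably aligned, and no 24-bit
// built-in type is available for id.
struct native_record
{
  std::int8_t  tag;
  std::int32_t length;
  std::int32_t id;
};
```

On typical platforms sizeof(packed_record) is 8, while sizeof(native_record) is 12.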

Warning: Code that uses aligned types is inherently non-portable because alignment requirements vary between hardware architectures and because alignment may be affected by compiler switches or pragmas. Furthermore, aligned types are only available on architectures with 16, 32, and 64-bit integer types.

Note: One-byte big-endian, little-endian, and native-endian types provide identical functionality. All three names are provided to improve code readability and searchability.

Comment on naming

When first exposed to endian types, programmers often fit them into a mental model based on the <cstdint> types. Using that model, it is natural to expect a 56-bit big-endian signed integer to be named int_big56_t rather than big56_t.

As experience using these types grows, the realization creeps in that they are lousy arithmetic integers - they are really byte holders that for convenience support arithmetic operations - and that for use in internal interfaces or anything more than trivial arithmetic computations it is far better to convert values of these endian types to traditional integer types.

That seems to lead to formation of a new mental model specific to endian byte-holder types. In that model, the endianness is the key feature, and the integer aspect is downplayed. Once that mental transition is made, a name like big56_t is a good reflection of the mental model.

Class template endian

An endian is an integer byte-holder with user-specified endianness, value type, size, and alignment. The usual operations on integers are supplied.

Synopsis

namespace boost
{
  namespace integer
  {
     
    enum class endianness { big, little, native };  // scoped enum emulated on C++03
    enum class alignment  { unaligned, aligned };   // scoped enum emulated on C++03

    template <endianness E, typename T, std::size_t n_bits,
      alignment A = alignment::unaligned>
    class endian : integer_cover_operators< endian<E, T, n_bits, A>, T >
    {
    public:
      typedef T value_type;
      endian() = default;       // = default replaced by {} on C++03
      explicit endian(T v);
      endian & operator=(T v);
      operator T() const;
    };

    // unaligned big endian signed integer types
    typedef endian< endianness::big, int_least8_t, 8 >   big8_t;
    typedef endian< endianness::big, int_least16_t, 16 > big16_t;
    typedef endian< endianness::big, int_least32_t, 24 > big24_t;
    typedef endian< endianness::big, int_least32_t, 32 > big32_t;
    typedef endian< endianness::big, int_least64_t, 40 > big40_t;
    typedef endian< endianness::big, int_least64_t, 48 > big48_t;
    typedef endian< endianness::big, int_least64_t, 56 > big56_t;
    typedef endian< endianness::big, int_least64_t, 64 > big64_t;

    // unaligned big endian unsigned integer types
    typedef endian< endianness::big, uint_least8_t, 8 >   ubig8_t;
    typedef endian< endianness::big, uint_least16_t, 16 > ubig16_t;
    typedef endian< endianness::big, uint_least32_t, 24 > ubig24_t;
    typedef endian< endianness::big, uint_least32_t, 32 > ubig32_t;
    typedef endian< endianness::big, uint_least64_t, 40 > ubig40_t;
    typedef endian< endianness::big, uint_least64_t, 48 > ubig48_t;
    typedef endian< endianness::big, uint_least64_t, 56 > ubig56_t;
    typedef endian< endianness::big, uint_least64_t, 64 > ubig64_t;

    // unaligned little endian signed integer types
    typedef endian< endianness::little, int_least8_t, 8 >   little8_t;
    typedef endian< endianness::little, int_least16_t, 16 > little16_t;
    typedef endian< endianness::little, int_least32_t, 24 > little24_t;
    typedef endian< endianness::little, int_least32_t, 32 > little32_t;
    typedef endian< endianness::little, int_least64_t, 40 > little40_t;
    typedef endian< endianness::little, int_least64_t, 48 > little48_t;
    typedef endian< endianness::little, int_least64_t, 56 > little56_t;
    typedef endian< endianness::little, int_least64_t, 64 > little64_t;

    // unaligned little endian unsigned integer types
    typedef endian< endianness::little, uint_least8_t, 8 >   ulittle8_t;
    typedef endian< endianness::little, uint_least16_t, 16 > ulittle16_t;
    typedef endian< endianness::little, uint_least32_t, 24 > ulittle24_t;
    typedef endian< endianness::little, uint_least32_t, 32 > ulittle32_t;
    typedef endian< endianness::little, uint_least64_t, 40 > ulittle40_t;
    typedef endian< endianness::little, uint_least64_t, 48 > ulittle48_t;
    typedef endian< endianness::little, uint_least64_t, 56 > ulittle56_t;
    typedef endian< endianness::little, uint_least64_t, 64 > ulittle64_t;

    // unaligned native endian signed integer types
    typedef endian< endianness::native, int_least8_t, 8 >   native8_t;
    typedef endian< endianness::native, int_least16_t, 16 > native16_t;
    typedef endian< endianness::native, int_least32_t, 24 > native24_t;
    typedef endian< endianness::native, int_least32_t, 32 > native32_t;
    typedef endian< endianness::native, int_least64_t, 40 > native40_t;
    typedef endian< endianness::native, int_least64_t, 48 > native48_t;
    typedef endian< endianness::native, int_least64_t, 56 > native56_t;
    typedef endian< endianness::native, int_least64_t, 64 > native64_t;

    // unaligned native endian unsigned integer types
    typedef endian< endianness::native, uint_least8_t, 8 >   unative8_t;
    typedef endian< endianness::native, uint_least16_t, 16 > unative16_t;
    typedef endian< endianness::native, uint_least32_t, 24 > unative24_t;
    typedef endian< endianness::native, uint_least32_t, 32 > unative32_t;
    typedef endian< endianness::native, uint_least64_t, 40 > unative40_t;
    typedef endian< endianness::native, uint_least64_t, 48 > unative48_t;
    typedef endian< endianness::native, uint_least64_t, 56 > unative56_t;
    typedef endian< endianness::native, uint_least64_t, 64 > unative64_t;

    // These types only present if platform has exact size integers:

    // aligned big endian signed integer types
    typedef endian< endianness::big, int16_t, 16, alignment::aligned >  aligned_big16_t;
    typedef endian< endianness::big, int32_t, 32, alignment::aligned >  aligned_big32_t;
    typedef endian< endianness::big, int64_t, 64, alignment::aligned >  aligned_big64_t;

    // aligned big endian unsigned integer types
    typedef endian< endianness::big, uint16_t, 16, alignment::aligned > aligned_ubig16_t;
    typedef endian< endianness::big, uint32_t, 32, alignment::aligned > aligned_ubig32_t;
    typedef endian< endianness::big, uint64_t, 64, alignment::aligned > aligned_ubig64_t;

    // aligned little endian signed integer types
    typedef endian< endianness::little, int16_t, 16, alignment::aligned > aligned_little16_t;
    typedef endian< endianness::little, int32_t, 32, alignment::aligned > aligned_little32_t;
    typedef endian< endianness::little, int64_t, 64, alignment::aligned > aligned_little64_t;

    // aligned little endian unsigned integer types
    typedef endian< endianness::little, uint16_t, 16, alignment::aligned > aligned_ulittle16_t;
    typedef endian< endianness::little, uint32_t, 32, alignment::aligned > aligned_ulittle32_t;
    typedef endian< endianness::little, uint64_t, 64, alignment::aligned > aligned_ulittle64_t;


    // aligned native endian typedefs are not provided because
    // <cstdint> types are superior for this use case

  } // namespace integer
} // namespace boost

Members

endian() = default;  // C++03: endian(){}

Effects: Constructs an object of type endian<E, T, n_bits, A>.

explicit endian(T v);

Effects: Constructs an object of type endian<E, T, n_bits, A>.

Postcondition: x == v, where x is the constructed object.

endian & operator=(T v);

Postcondition: x == v, where x is *this.

Returns: *this.

operator T() const;

Returns: The current value stored in *this, converted to value_type.

Other operators

Other operators on endian objects are forwarded to the equivalent operator on value_type.
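
The member semantics and the forwarding of operators can be sketched with a simplified class. The class sketch_big32 below is hypothetical and is not the library's implementation: it fixes the size at 32 bits, forwards only += and prefix ++, zero-initializes in its default constructor as a choice of this sketch, and assumes a two's complement representation.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for a 32-bit big-endian byte-holder.
class sketch_big32
{
public:
  typedef std::int_least32_t value_type;

  sketch_big32() : bytes_() {}
  explicit sketch_big32(value_type v) { store(v); }
  sketch_big32 & operator=(value_type v) { store(v); return *this; }

  // Conversion reassembles the bytes, most significant byte first.
  operator value_type() const
  {
    std::uint_least32_t u = 0;
    for (int i = 0; i < 4; ++i)
      u = (u << 8) | bytes_[i];
    return static_cast<value_type>(u);
  }

  // Forwarding: convert to value_type, operate there, store the result.
  sketch_big32 & operator+=(value_type rhs)
  {
    store(static_cast<value_type>(*this) + rhs);
    return *this;
  }
  sketch_big32 & operator++() { return *this += 1; }

  // Access to the stored bytes, for inspection only.
  const unsigned char * data() const { return bytes_; }

private:
  void store(value_type v)
  {
    std::uint_least32_t u = static_cast<std::uint_least32_t>(v);
    for (int i = 3; i >= 0; --i)
    {
      bytes_[i] = static_cast<unsigned char>(u & 0xFF);
      u >>= 8;
    }
  }
  unsigned char bytes_[4];
};
```

Every arithmetic operation pays for a load and a store of the byte representation, which is why converting to a traditional integer type is preferable for anything beyond trivial arithmetic.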

FAQ

Why bother with endian types? External data portability and both speed and space efficiency. Availability of additional binary integer sizes and alignments is important in some applications.

Why not just use Boost.Serialization? Serialization involves a conversion for every object involved in I/O. Endian objects require no conversion or copying. They are already in the desired format for binary I/O. Thus they can be read or written in bulk.

Why bother with binary I/O? Why not just use C++ Standard Library stream inserters and extractors? Using binary rather than character representations can be more space efficient, with a side benefit of faster I/O. CPU time is minimized because conversions to and from string are eliminated. Furthermore, binary integers are fixed size, and so fixed-size disk records are possible, easing sorting and allowing direct access. Disadvantages, such as the inability to use text utilities on the resulting files, limit usefulness to applications where the binary I/O advantages are paramount.

Do these types have any uses outside of I/O? Probably not, except for native endianness which can be used for fine grained control over size and alignment.

Is there a performance hit when doing arithmetic using these types? Yes, for sure, compared to arithmetic operations on native integer types. However, these types are usually faster, and sometimes much faster, for I/O compared to stream inserters and extractors, or to serialization.

Are endian types POD's? Yes for C++0x. No for C++03, although several macros are available to force PODness in all cases.

What are the implications of endian types not being POD's in C++03? They can't be used in unions. In theory, compilers aren't required to align or lay out storage in portable ways, although this problem has never been observed in a real compiler.

Which is better, big-endian or little-endian? Big-endian tends to be a bit more of an industry standard, but little-endian may be preferred for applications that run primarily on x86 (Intel/AMD) and other little-endian CPU's. The Wikipedia article gives more pros and cons.

What good is native endianness? It provides alignment and size guarantees not available from the built-in types. It eases generic programming.

Why bother with the aligned endian types? Aligned integer operations may be faster (20 times, in one measurement) if the endianness and alignment of the type matches the endianness and alignment requirements of the machine. On common CPU architectures, that optimization is only available for aligned types. That allows I/O of maximally efficient types on an application's primary platform, yet produces data files that are portable to all platforms. The code, however, is likely to be more fragile and less portable than with the unaligned types.

These types are really just byte-holders. Why provide the arithmetic operations at all? Providing a full set of operations reduces program clutter and makes code both easier to write and to read. Consider incrementing a variable in a record. It is very convenient to write:

    ++record.foo;

Rather than:

    int temp( record.foo);
    ++temp;
    record.foo = temp;

Why do binary stream insertion and extraction use operators <= and >= rather than <<= and >>=? <<= and >>= associate right-to-left, which is the opposite of << and >>, so would be very confusing and error prone. <= and >= associate left-to-right.
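
The left-to-right chaining can be sketched with a stand-alone overload. The namespace sketch and its operator<= below are hypothetical illustrations handling only char; they are not the library's inserters.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace sketch
{
  // Hypothetical binary inserter: writes the single byte of v unformatted.
  inline std::ostream & operator<=(std::ostream & os, char v)
  {
    return os.write(&v, sizeof v);
  }

  // Because <= associates left-to-right, the statement below parses as
  // ((os <= 'a') <= 'b') <= 'c', so bytes are written in source order.
  // With <<= the parse would be os <<= ('a' <<= ('b' <<= 'c')).
  inline std::string chain()
  {
    std::ostringstream os(std::ios_base::out | std::ios_base::binary);
    os <= 'a' <= 'b' <= 'c';
    return os.str();
  }
}
```

Calling sketch::chain() returns the three bytes in the order written.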

Binary I/O warnings and cautions

Warning: Use only on streams opened with filemode std::ios_base::binary. Thus unformatted binary I/O should not be used with the standard streams (cout, cin, etc.) since they are opened in text mode. Use on text streams may produce incorrect results, such as insertion of unwanted characters or premature end-of-file. For example, on Windows 0x0A would become 0x0D, 0x0A.
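
As a sketch of the safe pattern, the following helper opens a file in binary mode for both writing and reading, so that byte values such as 0x0A, 0x0D, and 0x1A pass through unchanged. The function name binary_round_trip and the file name used with it are hypothetical.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: writes bytes to a file opened in binary mode,
// reads them back in binary mode, removes the file, and returns what
// was read.
inline std::string binary_round_trip(const char * filename,
                                     const std::string & bytes)
{
  {
    std::ofstream out(filename, std::ios_base::out | std::ios_base::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
  } // out is closed here

  std::string result;
  {
    std::ifstream in(filename, std::ios_base::in | std::ios_base::binary);
    char c;
    while (in.get(c))
      result += c;
  } // in is closed here

  std::remove(filename);
  return result;
}
```

Without std::ios_base::binary, some platforms would translate or truncate such data.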

Caution: When mixing formatted (i.e. operator << or >>) and unformatted (i.e. operator <= or >=) stream I/O, be aware that << and >> take precedence over <= and >=. Use parentheses to force correct order of evaluation. For example:

my_stream << foo <= bar;    // no parentheses needed
(my_stream <= foo) << bar;  // parentheses required 

As a practical matter, it may be easier and safer to never mix the character and binary insertion or extraction operators in the same statement.

Example

The endian_example.cpp program writes a binary file containing four-byte big-endian and little-endian integers:

#include <iostream>
#include <cassert>
#include <cstdio>
#include <boost/integer/endian.hpp>

using namespace boost::integer;

namespace 
{
  // This is an extract from a very widely used GIS file format. I have no idea
  // why a designer would mix big and little endians in the same file - but
  // this is a real-world format and users wishing to write low level code
  // manipulating these files have to deal with the mixed endianness.

  struct header
  {
    big32_t     file_code;
    big32_t     file_length;
    little32_t  version;
    little32_t  shape_type;
  };

  const char * filename = "test.dat";
}

int main()
{
  assert( sizeof( header ) == 16 );  // requirement for interoperability

  header h;

  h.file_code   = 0x04030201;
  h.file_length = sizeof( header );
  h.version     = -1;
  h.shape_type  = 0x04030201;

  // Low-level I/O such as POSIX read/write or <cstdio> fread/fwrite is sometimes
  // used for binary file operations when ultimate efficiency is important.
  // Such I/O is often performed in some C++ wrapper class, but to drive home the
  // point that endian integers are often used in fairly low-level code that
  // does bulk I/O operations, <cstdio> fopen/fwrite is used for I/O in this example.

  std::FILE * fi;

  if ( !(fi = std::fopen( filename, "wb" )) )  // MUST BE BINARY
  {
    std::cout << "could not open " << filename << '\n';
    return 1;
  }

  if ( std::fwrite( &h, sizeof( header ), 1, fi ) != 1 ) 
  {
    std::cout << "write failure for " << filename << '\n';
    return 1;
  }

  std::fclose( fi );

  std::cout << "created file " << filename << '\n';
  return 0;
}

After compiling and executing endian_example.cpp, a hex dump of test.dat shows:

0403 0201 0000 0010 ffff ffff 0102 0304
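
The dump can be decoded by hand, field by field. The sketch below does so without the library; the names load_big32, load_little32, and dump are hypothetical, and the array simply repeats the sixteen bytes shown above.

```cpp
#include <cstdint>

// Hypothetical decoder: four bytes at p, most significant byte first.
inline std::uint32_t load_big32(const unsigned char * p)
{
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
       | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Hypothetical decoder: four bytes at p, least significant byte first.
inline std::uint32_t load_little32(const unsigned char * p)
{
  return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16)
       | (std::uint32_t(p[1]) << 8)  |  std::uint32_t(p[0]);
}

// The sixteen bytes of test.dat as shown in the hex dump.
static const unsigned char dump[16] =
{
  0x04, 0x03, 0x02, 0x01,   // file_code,   big-endian
  0x00, 0x00, 0x00, 0x10,   // file_length, big-endian
  0xff, 0xff, 0xff, 0xff,   // version,     little-endian
  0x01, 0x02, 0x03, 0x04    // shape_type,  little-endian
};
```

Decoding gives file_code 0x04030201, file_length 16, version 0xFFFFFFFF (that is, -1 as a signed 32-bit value), and shape_type 0x04030201, matching the values assigned in the program.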

Design considerations for Boost.Endian

Experience

Classes with similar functionality have been independently developed by several Boost programmers and used very successfully in high-value, high-use applications for many years. These independently developed endian libraries often evolved from C libraries that were also widely used. Endian integers have proven widely useful across a wide range of computer architectures and applications.

C++0x

The availability of the C++0x Defaulted Functions feature is detected automatically, and will be used if present to ensure that objects of class endian are trivial, and thus POD's.

Compilation

Boost.Endian is implemented entirely within headers, with no need to link to any Boost object libraries.

Several macros allow user control over features:

Acknowledgements

Original design developed by Darin Adler based on classes developed by Mark Borgerding. Four original class templates combined into a single endian class template by Beman Dawes, who put the library together, provided documentation, and added the typedefs. He also added the unrolled_byte_loops sign partial specialization to correctly extend the sign when cover integer size differs from endian representation size.

Comments and suggestions were received from Benaka Moorthi, Christopher Kohlhoff, Cliff Green, Gennaro Proto, Jeff Flinn, John Maddock, Kim Barrett, Marsh Ray, Martin Bonner, Matias Capeletto, Rene Rivera, Scott McMurray, Sebastian Redl, Tomas Puverle, and Yuval Ronen.


Last revised: 19 March, 2009

© Copyright Beman Dawes, 2006-2009

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at www.boost.org/LICENSE_1_0.txt)