How to Read and Write Binary Data for Your Custom File Formats

In my previous article, Create Custom Binary File Formats for Your Game's Data, I covered the topic of using custom binary file formats to store game assets and resources. In this short tutorial we will take a quick look at how to actually read and write binary data.

Note: This tutorial uses pseudo-code to demonstrate how to read and write binary data, but the code can easily be translated to any programming language that supports basic file I/O operations.

Bitwise Operators

If this is all unfamiliar territory for you, you will notice a few strange operators being used in the code, specifically the &, |, ^, ~, << and >> operators. These are standard bitwise operators, available in most programming languages, which are used for manipulating binary values.
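
As a quick illustration of what these operators do, here is a minimal Python sketch that packs two bytes into a single 16-bit value and then unpacks them again. The variable names are made up for this example; the same shift-and-mask pattern is what the read and write functions later in this tutorial rely on.

```python
# Pack two bytes into one 16-bit value, then unpack them again.
high = 0x10
low = 0x20

# << moves the high byte into bits 8-15; | merges in the low byte.
packed = (high << 8) | low

# >> moves bits 8-15 back down; & 0xFF keeps only the lowest 8 bits.
unpacked_high = (packed >> 8) & 0xFF
unpacked_low = packed & 0xFF

print(hex(packed))         # 0x1020
print(hex(unpacked_high))  # 0x10
print(hex(unpacked_low))   # 0x20
```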

Endianness and Streams

Before we can read and write binary data successfully, there are two important concepts that we need to understand: endianness and streams.

Endianness dictates the order in which the bytes of a multiple-byte value are stored within a file or within a chunk of memory. For example, a 16-bit value of 0x1020 can be stored either as 0x10 followed by 0x20 (big-endian) or as 0x20 followed by 0x10 (little-endian).
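
To make the two byte orders concrete, this short Python sketch stores the value 0x1020 both ways using the built-in int.to_bytes() and int.from_bytes() methods. It is only meant to illustrate the idea; the pseudo-code in this tutorial handles endianness by hand instead.

```python
value = 0x1020

# The same 16-bit value, laid out in both byte orders.
big = value.to_bytes(2, byteorder="big")
little = value.to_bytes(2, byteorder="little")

print(big.hex())     # 1020
print(little.hex())  # 2010

# Reading the bytes back with the matching byte order recovers the value.
from_big = int.from_bytes(big, byteorder="big")
from_little = int.from_bytes(little, byteorder="little")
print(from_big == from_little == value)  # True
```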

Streams are array-like objects that contain a sequence of bytes (or bits in some cases). Binary data is read from and written to these streams. Most programming languages provide an implementation of binary streams in one form or another; some are more convoluted than others, but they all essentially do the same thing.
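
As one example of a ready-made stream, Python's standard library provides io.BytesIO, an in-memory byte stream that tracks its own read position. The sketch below is just an illustration of the concept; the rest of this tutorial treats the stream as a plain array of bytes and tracks the position itself.

```python
import io

# An in-memory stream holding three bytes.
stream = io.BytesIO(b"\x10\x20\x30")

first = stream.read(1)    # read one byte; the position advances to 1
position = stream.tell()  # the current position within the stream
rest = stream.read()      # read everything that is left

print(first.hex())  # 10
print(position)     # 1
print(rest.hex())   # 2030
```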

Reading Binary Data

Let's start by defining some properties in our code. Ideally these should all be private properties:


__stream   // The array-like object containing the bytes
__endian   // The endianness of the data within the stream
__length   // The number of bytes in the stream
__position // The position of the next byte to read from the stream

Here is an example of what a basic class constructor might look like:


class DataInput( stream, endian ) {
  __stream   = stream
  __endian   = endian
  __length   = stream.length
  __position = 0
}

The following functions will read unsigned integers from the stream:


// Reads an unsigned 8-bit integer
function readU8() {
  // Throw an exception if there are no more bytes available to read
  if( __position >= __length ) {
    throw new Exception( "..." )
  }
  // Return the byte value and increase the __position property
  return __stream[ __position ++ ]
}

// Reads an unsigned 16-bit integer
function readU16() {
  value = 0
  // Endianness needs to be handled for multiple-byte values
  if( __endian == BIG_ENDIAN ) {
    value |= readU8() << 8
    value |= readU8() << 0
  } else {
    // LITTLE_ENDIAN
    value |= readU8() << 0
    value |= readU8() << 8
  }
  return value
}

// Reads an unsigned 24-bit integer
function readU24() {
  value = 0
  if( __endian == BIG_ENDIAN ) {
    value |= readU8() << 16
    value |= readU8() << 8
    value |= readU8() << 0
  } else {
    value |= readU8() << 0
    value |= readU8() << 8
    value |= readU8() << 16
  }
  return value
}

// Reads an unsigned 32-bit integer
function readU32() {
  value = 0
  if( __endian == BIG_ENDIAN ) {
    value |= readU8() << 24
    value |= readU8() << 16
    value |= readU8() << 8
    value |= readU8() << 0
  } else {
    value |= readU8() << 0
    value |= readU8() << 8
    value |= readU8() << 16
    value |= readU8() << 24
  }
  return value
}
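
For readers who would like to see the pseudo-code above in a real language, here is one possible Python translation. It is a sketch under a few assumptions: the stream is a bytes object, the BIG_ENDIAN and LITTLE_ENDIAN constants are simple strings chosen for this example, and a single hypothetical read_uint() helper stands in for readU16(), readU24() and readU32() by looping over the shift amounts.

```python
# Assumed constants; the pseudo-code leaves their values undefined.
BIG_ENDIAN = "big"
LITTLE_ENDIAN = "little"


class DataInput:
    def __init__(self, stream, endian):
        self._stream = stream       # array-like object containing the bytes
        self._endian = endian       # endianness of the data in the stream
        self._length = len(stream)  # number of bytes in the stream
        self._position = 0          # position of the next byte to read

    def read_u8(self):
        """Reads an unsigned 8-bit integer."""
        if self._position >= self._length:
            raise EOFError("no more bytes available to read")
        value = self._stream[self._position]
        self._position += 1
        return value

    def read_uint(self, byte_count):
        """Reads an unsigned integer that is byte_count bytes wide."""
        shifts = range(0, byte_count * 8, 8)  # 0, 8, 16, ...
        if self._endian == BIG_ENDIAN:
            # Big-endian: the first byte read is the most significant one.
            shifts = reversed(shifts)
        value = 0
        for shift in shifts:
            value |= self.read_u8() << shift
        return value


data = bytes([0x10, 0x20, 0x30, 0x40])
print(hex(DataInput(data, BIG_ENDIAN).read_uint(2)))     # 0x1020
print(hex(DataInput(data, LITTLE_ENDIAN).read_uint(2)))  # 0x2010
print(hex(DataInput(data, BIG_ENDIAN).read_uint(4)))     # 0x10203040
```

Collapsing the three multi-byte readers into one loop is purely a design choice for brevity; separate functions, as in the pseudo-code, work just as well.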

These functions will read signed integers from the stream:


// Reads a signed 8-bit integer
function readS8() {
  // Read the unsigned value
  value = readU8()
  // Check if the first (most significant) bit indicates a negative value
  if( value >> 7 == 1 ) {
    // Use "Two's complement" to convert the value
    value = ~( value ^ 0xFF )
  }
  return value
}

// Reads a signed 16-bit integer
function readS16() {
  value = readU16()
  if( value >> 15 == 1 ) {
    value = ~( value ^ 0xFFFF )
  }
  return value
}

// Reads a signed 24-bit integer
function readS24() {
  value = readU24()
  if( value >> 23 == 1 ) {
    value = ~( value ^ 0xFFFFFF )
  }
  return value
}

// Reads a signed 32-bit integer
function readS32() {
  value = readU32()
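  // Note: this check assumes integers wider than 32 bits. In languages where
  // bitwise operations work on signed 32-bit integers, readU32() can already
  // return a negative value.
  if( value >> 31 == 1 ) {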
    value = ~( value ^ 0xFFFFFFFF )
  }
  return value
}
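
The sign conversion is easier to check with concrete numbers. The Python sketch below shows two equivalent ways of doing it: to_signed_xor() mirrors the expression used in the pseudo-code above, and to_signed() subtracts 2 to the power of the bit width instead. Both function names are made up for this example, and both assume the input fits within the given number of bits.

```python
def to_signed_xor(value, bits):
    """Mirrors the pseudo-code: ~(value ^ mask) when the sign bit is set."""
    mask = (1 << bits) - 1
    if value >> (bits - 1) == 1:
        value = ~(value ^ mask)
    return value


def to_signed(value, bits):
    """Alternative: subtract 2**bits when the sign bit is set."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


print(to_signed_xor(0xFF, 8))    # -1
print(to_signed_xor(0x80, 8))    # -128
print(to_signed_xor(0x7F, 8))    # 127
print(to_signed(0xFFFE, 16))     # -2
print(to_signed(0x800000, 24))   # -8388608
```

Python integers have arbitrary precision, so the sign bit check behaves the same at every width here, including 32 bits.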

Writing Binary Data

Let's start by defining some properties in our code. (These are more or less the same as the properties we defined for reading binary data.) Ideally these should all be private properties:


__stream   // The array-like object that will contain the bytes
__endian   // The endianness of the data within the stream
__position // The position of the next byte to write to the stream

Here is an example of what a basic class constructor might look like:


class DataOutput( stream, endian ) {
  __stream   = stream
  __endian   = endian
  __position = 0
}

The following functions will write unsigned integers to the stream:


// Writes an unsigned 8-bit integer
function writeU8( value ) {
  // Ensures the value is unsigned and within an 8-bit range
  value &= 0xFF
  // Add the value to the stream and increase the __position property.
  __stream[ __position ++ ] = value
}

// Writes an unsigned 16-bit integer
function writeU16( value ) {
  value &= 0xFFFF
  // Endianness needs to be handled for multiple-byte values
  if( __endian == BIG_ENDIAN ) {
    writeU8( value >> 8 )
    writeU8( value >> 0 )
  } else {
    // LITTLE_ENDIAN
    writeU8( value >> 0 )
    writeU8( value >> 8 )
  }
}

// Writes an unsigned 24-bit integer
function writeU24( value ) {
  value &= 0xFFFFFF
  if( __endian == BIG_ENDIAN ) {
    writeU8( value >> 16 )
    writeU8( value >> 8  )
    writeU8( value >> 0  )
  } else {
    writeU8( value >> 0  )
    writeU8( value >> 8  )
    writeU8( value >> 16 )
  }
}

// Writes an unsigned 32-bit integer
function writeU32( value ) {
  value &= 0xFFFFFFFF
  if( __endian == BIG_ENDIAN ) {
    writeU8( value >> 24 )
    writeU8( value >> 16 )
    writeU8( value >> 8  )
    writeU8( value >> 0  )
  } else {
    writeU8( value >> 0  )
    writeU8( value >> 8  )
    writeU8( value >> 16 )
    writeU8( value >> 24 )
  }
}
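
Here is one way the writing functions might look in Python. As before, this is a sketch built on assumptions: the stream is a bytearray, the endianness constants are strings chosen for this example, and a hypothetical write_uint() helper replaces writeU16(), writeU24() and writeU32(). One difference from the pseudo-code is worth noting: a bytearray does not grow when you assign past its end, so write_u8() appends in that case.

```python
# Assumed constants; the pseudo-code leaves their values undefined.
BIG_ENDIAN = "big"
LITTLE_ENDIAN = "little"


class DataOutput:
    def __init__(self, stream, endian):
        self._stream = stream  # a bytearray that will contain the bytes
        self._endian = endian  # endianness of the data in the stream
        self._position = 0     # position of the next byte to write

    def write_u8(self, value):
        """Writes an unsigned 8-bit integer."""
        value &= 0xFF  # keep only the lowest 8 bits
        if self._position < len(self._stream):
            self._stream[self._position] = value  # overwrite existing byte
        else:
            self._stream.append(value)            # grow the stream
        self._position += 1

    def write_uint(self, value, byte_count):
        """Writes an unsigned integer that is byte_count bytes wide."""
        shifts = range(0, byte_count * 8, 8)  # 0, 8, 16, ...
        if self._endian == BIG_ENDIAN:
            # Big-endian: the most significant byte is written first.
            shifts = reversed(shifts)
        for shift in shifts:
            self.write_u8(value >> shift)


big = bytearray()
DataOutput(big, BIG_ENDIAN).write_uint(0x1020, 2)
print(big.hex())  # 1020

little = bytearray()
DataOutput(little, LITTLE_ENDIAN).write_uint(0x1020, 2)
print(little.hex())  # 2010
```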

And, again, these functions will write signed integers to the stream. (The functions are actually aliases of the writeU*() functions, but they provide API consistency with the readS*() functions.)


// Writes a signed 8-bit value
function writeS8( value ) {
  writeU8( value )
}

// Writes a signed 16-bit value
function writeS16( value ) {
  writeU16( value )
}

// Writes a signed 24-bit value
function writeS24( value ) {
  writeU24( value )
}

// Writes a signed 32-bit value
function writeS32( value ) {
  writeU32( value )
}

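Note: These aliases work because the masking in the writeU*() functions keeps only the low bits of the value, which, in languages that represent integers using two's complement, is exactly the bit pattern of a negative number. Each byte in the stream therefore always holds a value in the range 0 to 255; the conversion back to a signed value is done when the data is read from a stream.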
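
A small Python sketch can confirm this for a single byte. It masks a negative number the way writeU8() does, compares the result with the byte produced by Python's built-in signed conversion, and then restores the original value the way readS8() does. The variable names are made up for this example.

```python
value = -2

# Masking, as writeU8() does, leaves the two's complement bit pattern.
stored = value & 0xFF
print(stored)  # 254

# Python's own signed conversion produces the same byte.
reference = value.to_bytes(1, byteorder="big", signed=True)
print(reference[0])  # 254

# Reading it back as a signed value, as readS8() does, restores -2.
restored = ~(stored ^ 0xFF) if stored >> 7 == 1 else stored
print(restored)  # -2
```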

Conclusion

My goal with this short tutorial was to complement my previous article on creating binary files for your game's data with some examples of how to do the actual reading and writing. I hope it's achieved that; if there's more you'd like to know about the topic, please speak up in the comments!