
How to Read and Write Binary Data for Your Custom File Formats


In my previous article, Create Custom Binary File Formats for Your Game's Data, I covered the topic of using custom binary file formats to store game assets and resources. In this short tutorial we will take a quick look at how to actually read and write binary data.

Note: This tutorial uses pseudo-code to demonstrate how to read and write binary data, but the code can easily be translated to any programming language that supports basic file I/O operations.


Bitwise Operators

If this is all unfamiliar territory for you, you will notice a few strange operators being used in the code, specifically the &, |, ^, ~, << and >> operators. These are standard bitwise operators, available in most programming languages, which are used for manipulating binary values.
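
As a quick illustration, here is a minimal sketch of how these operators can split a 16-bit value into two bytes and put it back together. It is written in Python purely as an example language (the tutorial itself stays language-neutral), and the variable names are hypothetical.

```python
value = 0x1020

# >> shifts the bits to the right; & masks off everything but the low 8 bits
high = (value >> 8) & 0xFF   # 0x10
low = value & 0xFF           # 0x20

# << shifts the bits to the left; | merges the two bytes into one value
rebuilt = (high << 8) | low

print(hex(high), hex(low), hex(rebuilt))  # 0x10 0x20 0x1020
```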


Endianness and Streams

Before we can read and write binary data successfully, there are two important concepts that we need to understand: endianness and streams.

Endianness dictates the order of the bytes that make up a multiple-byte value within a file or within a chunk of memory. For example, if we had a 16-bit value of 0x1020, that value can either be stored as 0x10 followed by 0x20 (big-endian) or 0x20 followed by 0x10 (little-endian).
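
The 0x1020 example can be checked with a short sketch. This one assumes Python and uses its built-in int.to_bytes() and int.from_bytes() methods; it is only meant to show the two byte orders side by side.

```python
value = 0x1020

# The same 16-bit value stored in both byte orders
big = value.to_bytes(2, byteorder="big")
little = value.to_bytes(2, byteorder="little")

print(big.hex())     # 1020
print(little.hex())  # 2010

# Reading the bytes back with the wrong byte order gives a different number
wrong = int.from_bytes(big, byteorder="little")
print(hex(wrong))    # 0x2010
```

This is why a file format needs to state its endianness explicitly: the bytes alone do not tell a reader which order was used.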

Streams are array-like objects that contain a sequence of bytes (or bits in some cases). Binary data is read from and written to these streams. Most programming languages provide an implementation of binary streams in one form or another; some are more convoluted than others, but they all essentially do the same thing.
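
As one example of such an implementation, Python's standard library offers io.BytesIO, an in-memory binary stream. The sketch below is only an illustration of the general idea (a sequence of bytes plus a current position), not the stream type the rest of this tutorial assumes.

```python
import io

# Reading: the stream tracks its own position as bytes are consumed
stream = io.BytesIO(bytes([0x10, 0x20, 0x30]))
first = stream.read(1)     # the first byte
position = stream.tell()   # the position has advanced to 1
rest = stream.read()       # everything that is left

# Writing: bytes are appended at the current position
output = io.BytesIO()
output.write(bytes([0xAB]))
output.write(bytes([0xCD]))
written = output.getvalue()

print(first.hex(), position, rest.hex(), written.hex())  # 10 1 2030 abcd
```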


Reading Binary Data

Let's start by defining some properties in our code. Ideally these should all be private properties:

__stream   // The array-like object containing the bytes
__endian   // The endianness of the data within the stream
__length   // The number of bytes in the stream
__position // The position of the next byte to read from the stream

Here is an example of what a basic class constructor might look like:

class DataInput( stream, endian ) {
  __stream   = stream
  __endian   = endian
  __length   = stream.length
  __position = 0
}

The following functions will read unsigned integers from the stream:

// Reads an unsigned 8-bit integer
function readU8() {
  // Throw an exception if there are no more bytes available to read
  if( __position >= __length ) {
    throw new Exception( "..." )
  }
  // Return the byte value and increase the __position property
  return __stream[ __position ++ ]
}

// Reads an unsigned 16-bit integer
function readU16() {
  value = 0
  // Endianness needs to be handled for multiple-byte values
  if( __endian == BIG_ENDIAN ) {
    value |= readU8() << 8
    value |= readU8() << 0
  } else {
    // LITTLE_ENDIAN
    value |= readU8() << 0
    value |= readU8() << 8
  }
  return value
}

// Reads an unsigned 24-bit integer
function readU24() {
  value = 0
  if( __endian == BIG_ENDIAN ) {
    value |= readU8() << 16
    value |= readU8() << 8
    value |= readU8() << 0
  } else {
    value |= readU8() << 0
    value |= readU8() << 8
    value |= readU8() << 16
  }
  return value
}

// Reads an unsigned 32-bit integer
function readU32() {
  value = 0
  if( __endian == BIG_ENDIAN ) {
    value |= readU8() << 24
    value |= readU8() << 16
    value |= readU8() << 8
    value |= readU8() << 0
  } else {
    value |= readU8() << 0
    value |= readU8() << 8
    value |= readU8() << 16
    value |= readU8() << 24
  }
  return value
}
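
To show how this pseudo-code might translate to a real language, here is a minimal sketch in Python. The names (DataInput, read_u8, _read_unsigned and so on) are hypothetical Python spellings of the pseudo-code, and the sketch uses one shared loop for the multiple-byte reads instead of unrolling the shifts, so treat it as one possible translation and not a definitive implementation.

```python
BIG_ENDIAN = "big"
LITTLE_ENDIAN = "little"


class DataInput:
    """Hypothetical Python counterpart of the DataInput pseudo-code."""

    def __init__(self, stream, endian):
        self._stream = stream       # any indexable sequence of byte values
        self._endian = endian
        self._length = len(stream)
        self._position = 0

    def read_u8(self):
        # Raise an exception if there are no more bytes available to read
        if self._position >= self._length:
            raise EOFError("no more bytes available to read")
        value = self._stream[self._position]
        self._position += 1
        return value

    def _read_unsigned(self, byte_count):
        # Read the bytes in stream order, then arrange them most significant first
        raw = [self.read_u8() for _ in range(byte_count)]
        if self._endian == LITTLE_ENDIAN:
            raw.reverse()
        value = 0
        for byte in raw:
            value = (value << 8) | byte
        return value

    def read_u16(self):
        return self._read_unsigned(2)

    def read_u24(self):
        return self._read_unsigned(3)

    def read_u32(self):
        return self._read_unsigned(4)


data = bytes([0x10, 0x20, 0x30, 0x40])
big_value = DataInput(data, BIG_ENDIAN).read_u32()
little_value = DataInput(data, LITTLE_ENDIAN).read_u32()
print(hex(big_value), hex(little_value))  # 0x10203040 0x40302010
```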

These functions will read signed integers from the stream:

// Reads a signed 8-bit integer
function readS8() {
  // Read the unsigned value
  value = readU8()
  // Check if the first (most significant) bit indicates a negative value
  if( value >> 7 == 1 ) {
    // Use "Two's complement" to convert the value
    value = ~( value ^ 0xFF )
  }
  return value
}

// Reads a signed 16-bit integer
function readS16() {
  value = readU16()
  if( value >> 15 == 1 ) {
    value = ~( value ^ 0xFFFF )
  }
  return value
}

// Reads a signed 24-bit integer
function readS24() {
  value = readU24()
  if( value >> 23 == 1 ) {
    value = ~( value ^ 0xFFFFFF )
  }
  return value
}

// Reads a signed 32-bit integer
function readS32() {
  value = readU32()
  if( value >> 31 == 1 ) {
    value = ~( value ^ 0xFFFFFFFF )
  }
  return value
}
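
The sign conversion can be tried out in isolation. The sketch below assumes Python, and to_signed is a hypothetical helper that applies the same expression as the pseudo-code to any bit width. It relies on Python's integers having unlimited precision; in a language with fixed-width 32-bit integers the 32-bit case may need different handling.

```python
import struct


def to_signed(value, bits):
    """Reinterpret an unsigned integer of the given width as two's complement."""
    mask = (1 << bits) - 1
    # Check if the most significant bit indicates a negative value
    if value >> (bits - 1) == 1:
        # Same expression as the pseudo-code, e.g. ~( value ^ 0xFF ) for 8 bits
        value = ~(value ^ mask)
    return value


print(to_signed(0xFF, 8))     # -1
print(to_signed(0x80, 8))     # -128
print(to_signed(0x7F, 8))     # 127
print(to_signed(0xFFFE, 16))  # -2

# Cross-check against the standard library's signed 32-bit unpacking
raw = bytes([0xFF, 0xFF, 0xFF, 0x9C])
unsigned = int.from_bytes(raw, "big")
expected = struct.unpack(">i", raw)[0]
print(to_signed(unsigned, 32), expected)  # -100 -100
```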

Writing Binary Data

Let's start by defining some properties in our code. (These are more or less the same as the properties we defined for reading binary data.) Ideally these should all be private properties:

__stream   // The array-like object that will contain the bytes
__endian   // The endianness of the data within the stream
__position // The position of the next byte to write to the stream

Here is an example of what a basic class constructor might look like:

class DataOutput( stream, endian ) {
  __stream   = stream
  __endian   = endian
  __position = 0
}

The following functions will write unsigned integers to the stream:

// Writes an unsigned 8-bit integer
function writeU8( value ) {
  // Ensures the value is unsigned and within an 8-bit range
  value &= 0xFF
  // Add the value to the stream and increase the __position property
  __stream[ __position ++ ] = value
}

// Writes an unsigned 16-bit integer
function writeU16( value ) {
  value &= 0xFFFF
  // Endianness needs to be handled for multiple-byte values
  if( __endian == BIG_ENDIAN ) {
    writeU8( value >> 8 )
    writeU8( value >> 0 )
  } else {
    // LITTLE_ENDIAN
    writeU8( value >> 0 )
    writeU8( value >> 8 )
  }
}

// Writes an unsigned 24-bit integer
function writeU24( value ) {
  value &= 0xFFFFFF
  if( __endian == BIG_ENDIAN ) {
    writeU8( value >> 16 )
    writeU8( value >> 8  )
    writeU8( value >> 0  )
  } else {
    writeU8( value >> 0  )
    writeU8( value >> 8  )
    writeU8( value >> 16 )
  }
}

// Writes an unsigned 32-bit integer
function writeU32( value ) {
  value &= 0xFFFFFFFF
  if( __endian == BIG_ENDIAN ) {
    writeU8( value >> 24 )
    writeU8( value >> 16 )
    writeU8( value >> 8  )
    writeU8( value >> 0  )
  } else {
    writeU8( value >> 0  )
    writeU8( value >> 8  )
    writeU8( value >> 16 )
    writeU8( value >> 24 )
  }
}
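
Here is a minimal sketch of how the writing side might look in Python. As before, the names (DataOutput, write_u8, _write_unsigned and so on) are hypothetical, the stream is assumed to be a bytearray, and the multiple-byte writes share one loop instead of unrolled shifts.

```python
BIG_ENDIAN = "big"
LITTLE_ENDIAN = "little"


class DataOutput:
    """Hypothetical Python counterpart of the DataOutput pseudo-code."""

    def __init__(self, stream, endian):
        self._stream = stream   # assumed to be a bytearray that can grow
        self._endian = endian
        self._position = 0

    def write_u8(self, value):
        # Ensure the value is unsigned and within an 8-bit range
        value &= 0xFF
        if self._position < len(self._stream):
            self._stream[self._position] = value
        else:
            self._stream.append(value)
        self._position += 1

    def _write_unsigned(self, value, byte_count):
        # Shift amounts, least significant byte first
        shifts = [8 * i for i in range(byte_count)]
        if self._endian == BIG_ENDIAN:
            shifts.reverse()
        for shift in shifts:
            self.write_u8(value >> shift)

    def write_u16(self, value):
        self._write_unsigned(value, 2)

    def write_u24(self, value):
        self._write_unsigned(value, 3)

    def write_u32(self, value):
        self._write_unsigned(value, 4)


big = bytearray()
DataOutput(big, BIG_ENDIAN).write_u32(0x10203040)

little = bytearray()
DataOutput(little, LITTLE_ENDIAN).write_u32(0x10203040)

print(big.hex(), little.hex())  # 10203040 40302010
```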

And, again, these functions will write signed integers to the stream. (The functions are actually aliases of the writeU*() functions, but they provide API consistency with the readS*() functions.)

// Writes a signed 8-bit value
function writeS8( value ) {
  writeU8( value )
}

// Writes a signed 16-bit value
function writeS16( value ) {
  writeU16( value )
}

// Writes a signed 24-bit value
function writeS24( value ) {
  writeU24( value )
}

// Writes a signed 32-bit value
function writeS32( value ) {
  writeU32( value )
}

Note: These aliases work because a byte has no inherent sign; it simply holds a bit pattern with a value in the range 0 to 255. The masking in the writeU*() functions leaves the two's complement bit pattern of a negative number intact, and the conversion back to a signed value is done when the data is read from a stream.
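
A short round trip makes this concrete. The sketch below assumes Python and its built-in int.to_bytes() and int.from_bytes() methods; it writes -2 as a 16-bit value and then reads the same two bytes back both ways.

```python
value = -2

# Masking keeps the low 16 bits, which form the two's complement pattern
pattern = value & 0xFFFF
print(hex(pattern))  # 0xfffe

# The stored bytes are the same whether we think of them as signed or unsigned
stored = pattern.to_bytes(2, "big")
print(stored.hex())  # fffe

# The interpretation is chosen when the bytes are read back
as_unsigned = int.from_bytes(stored, "big")
as_signed = int.from_bytes(stored, "big", signed=True)
print(as_unsigned, as_signed)  # 65534 -2
```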


Conclusion

My goal with this short tutorial was to complement my previous article on creating binary files for your game's data with some examples of how to do the actual reading and writing. I hope it's achieved that; if there's more you'd like to know about the topic, please speak up in the comments!