Testing Data-Intensive Code With Go, Part 1

Overview

Many non-trivial systems are also data-intensive or data-driven. Testing the data-intensive parts of such systems is very different from testing code-intensive systems. First, there may be a lot of sophistication in the data layer itself, such as hybrid data stores, caching, backup, and redundancy.

All this machinery has nothing to do with the application itself, but it still has to be tested. Second, the code may be very generic, and in order to test it, you need to generate data that is structured in a certain way. In this series of five tutorials, I will address all these aspects, explore several strategies for designing testable data-intensive systems with Go, and dive into specific examples.

In part one, I'll go over the design of an abstract data layer that enables proper testing, how to do error handling in the data layer, how to mock data access code, and how to test against an abstract data layer.

Testing Against a Data Layer

Dealing with real data stores and their intricacies is complicated and unrelated to the business logic. The concept of a data layer allows you to expose a neat interface to your data and hide the gory details of exactly how the data is stored and how to access it. I'll use a sample application called "Songify" for personal music management to illustrate the concepts with real code.

Designing an Abstract Data Layer

Let's review the personal music management domain (users can add songs and label them) and consider what data we need to store and how to access it. The objects in our domain are users, songs, and labels. There are two categories of operations that you want to perform on any data: queries (read-only) and state changes (create, update, delete). Here is a basic interface for the data layer:

package abstract_data_layer

import "time"

type Song struct {
	Url         string
	Name        string
	Description string
}

type Label struct {
	Name string
}

type User struct {
	Name         string
	Email        string
	RegisteredAt time.Time
	LastLogin    time.Time
}

type DataLayer interface {
	// Queries (read-only)
	GetUsers() ([]User, error)
	GetUserByEmail(email string) (User, error)
	GetLabels() ([]Label, error)
	GetSongs() ([]Song, error)
	GetSongsByUser(user User) ([]Song, error)
	GetSongsByLabel(label string) ([]Song, error)

	// State changing operations
	CreateUser(user User) error
	ChangeUserName(user User, name string) error
	AddLabel(label string) error
	AddSong(user User, song Song, labels []Label) error
}

Note that the purpose of this domain model is to present a simple yet not completely trivial data layer to demonstrate the testing aspects. Obviously, in a real application there will be more objects like albums, genres, artists, and much more information about each song. If push comes to shove, you can always store arbitrary information about a song in its description and attach as many labels as you want.

In practice, you may want to divide your data layer into multiple interfaces. Some of the structs may have more attributes, and the methods may require more arguments (e.g. all the GetXXX() methods will probably require some paging arguments). You may need other data access interfaces and methods for maintenance operations like bulk loading, backups, and migrations. It sometimes makes sense to expose an asynchronous data access interface instead of, or in addition to, the synchronous interface.

What did we gain from this abstract data layer?

One-stop shop for data access operations.
Clear view of the data management requirements of our applications in domain terms.
Ability to change the concrete data layer implementation at will.
Ability to develop the domain/business logic layer early against the interface before the concrete data layer is complete or stable.
Last but not least, the ability to mock the data layer for fast and flexible testing of the domain/business logic.

Errors and Error Handling in the Data Layer

The data may be stored in multiple distributed data stores, on multiple clusters across different geographical locations in a combination of on-premise data centers and the cloud.

There will be failures, and those failures need to be handled. Ideally, the concrete data layer takes care of the error handling logic (retries, timeouts, notification of catastrophic failures), and the domain logic code just gets back either the data or a generic error when the data is unreachable.

In some cases, the domain logic may want more granular access to the data and select a fallback strategy in certain situations (e.g. only partial data is available because part of the cluster is inaccessible, or the data is stale because the cache wasn't refreshed). Those aspects have implications for the design of your data layer and for its testing.

As far as testing goes, the abstract data layer should define its own errors. Map every error that comes from the concrete data stores to one of those error types (or to a very generic error), so that neither the domain logic nor its tests depend on the errors of a particular store.

Mocking Data Access Code

Let's mock our data layer. The purpose of the mock is to replace the real data layer during tests. That requires the mock data layer to expose the same interface and to respond to each call in a sequence of method calls with a canned (or calculated) response.

In addition, it's useful to keep track of how many times each method was called. I will not demonstrate it here, but it is even possible to keep track of the order of calls to different methods and which arguments were passed to each method to ensure a certain chain of calls.
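The mock in this tutorial doesn't record call order or arguments, but a minimal sketch of how that could look is below. The Call and CallRecorder types, the CalledInOrder method, and recordingMock are hypothetical; each mock method would call Record first with its own name and arguments.

```go
package main

import "fmt"

type Song struct{ Name string }

// Call records a single method invocation on a mock.
type Call struct {
	Method string
	Args   []interface{}
}

// CallRecorder can be embedded in a mock to log every call in order.
type CallRecorder struct {
	Calls []Call
}

// Record appends one call; mock methods invoke it first thing.
func (r *CallRecorder) Record(method string, args ...interface{}) {
	r.Calls = append(r.Calls, Call{Method: method, Args: args})
}

// CalledInOrder reports whether exactly these methods were called, in this order.
func (r *CallRecorder) CalledInOrder(methods ...string) bool {
	if len(r.Calls) != len(methods) {
		return false
	}
	for i, c := range r.Calls {
		if c.Method != methods[i] {
			return false
		}
	}
	return true
}

// recordingMock shows the embedding; only one method is sketched.
type recordingMock struct {
	CallRecorder
}

func (m *recordingMock) GetSongsByLabel(label string) ([]Song, error) {
	m.Record("GetSongsByLabel", label)
	return nil, nil
}

func main() {
	m := &recordingMock{}
	m.GetSongsByLabel("salsa")
	fmt.Println(m.CalledInOrder("GetSongsByLabel")) // true
	fmt.Println(m.Calls[0].Args[0])                  // salsa
}
```

A test can then assert the exact chain of calls, for example that GetSongsByUser() ran before AddSong(), and inspect the arguments each call received.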

Here is the mock data layer struct.

package concrete_data_layer

import (
	. "abstract_data_layer"
)

// Indices into MockDataLayer.Indices, one per operation.
// ERRORS tracks the position in the shared Errors slice and must stay last.
const (
	GET_USERS = iota
	GET_USER_BY_EMAIL
	GET_LABELS
	GET_SONGS
	GET_SONGS_BY_USER
	GET_SONGS_BY_LABEL
	CREATE_USER
	CHANGE_USER_NAME
	ADD_LABEL
	ADD_SONG
	ERRORS
)

type MockDataLayer struct {
	Errors                   []error
	GetUsersResponses        [][]User
	GetUserByEmailResponses  []User
	GetLabelsResponses       [][]Label
	GetSongsResponses        [][]Song
	GetSongsByUserResponses  [][]Song
	GetSongsByLabelResponses [][]Song
	Indices                  []int
}

// NewMockDataLayer returns a mock with one zeroed index per operation.
func NewMockDataLayer() MockDataLayer {
	return MockDataLayer{Indices: make([]int, ERRORS+1)}
}

The const statement lists all the supported operations and the errors. Each operation has its own index in the Indices slice. The index for each operation represents how many times the corresponding method was called, as well as which canned response it should return next. The ERRORS index tracks the position in the shared Errors slice.

For each method that has a return value in addition to an error, there is a slice of responses. When the mock method is called, it returns the response at this method's current index, along with the next error from the shared Errors slice (if any). Methods that return only an error don't need an XXXResponses slice.

Note that the Errors are shared by all methods. That means that if you want to test a sequence of calls, you'll need to inject the correct number of errors in the correct order. An alternative design would store, for each response, a pair consisting of the return value and the error. The NewMockDataLayer() function returns a new mock data layer struct with all indices initialized to zero.

Here is the implementation of the GetUsers() method, which illustrates these concepts.

func (m *MockDataLayer) GetUsers() (users []User, err error) {
	i := m.Indices[GET_USERS]
	users = m.GetUsersResponses[i]
	if len(m.Errors) > 0 {
		err = m.Errors[m.Indices[ERRORS]]
		m.Indices[ERRORS]++
	}
	m.Indices[GET_USERS]++
	return
}

The first line gets the current index of the GET_USERS operation (will be 0 initially).

The second line gets the response for the current index.

The third through fifth lines assign the error of the current index if the Errors field was populated and increment the errors index. When testing the happy path, the error will be nil. To make it easier to use, you can just avoid initializing the Errors field and then every method will return nil for the error.

The next line increments the index, so the next call will get the proper response.

The last line just returns. The named return values for users and err are already populated (or nil by default for err).

Here is another method, GetLabels(), which follows the same pattern. The only difference is which index is used and what collection of canned responses is used.

func (m *MockDataLayer) GetLabels() (labels []Label, err error) {
	i := m.Indices[GET_LABELS]
	labels = m.GetLabelsResponses[i]
	if len(m.Errors) > 0 {
		err = m.Errors[m.Indices[ERRORS]]
		m.Indices[ERRORS]++
	}
	m.Indices[GET_LABELS]++
	return
}

This is a prime example of a use case where generics can save a lot of boilerplate code; Go has supported generics since version 1.18. It's also possible to use reflection to the same effect, but a full treatment of either approach is outside the scope of this tutorial. The main takeaway here is that the mock data layer can follow a general-purpose pattern and support any testing scenario, as you'll see soon.

How about some methods that just return an error? Check out the CreateUser() method. It is even simpler because it only deals with errors and doesn't need to manage the canned responses.

func (m *MockDataLayer) CreateUser(user User) (err error) {
	if len(m.Errors) > 0 {
		err = m.Errors[m.Indices[ERRORS]]
		m.Indices[ERRORS]++
	}
	// Track how many times CreateUser() was called
	m.Indices[CREATE_USER]++
	return
}

The remaining methods of the mock, including the AddSong() method used by the tests below, follow one of these two patterns. This mock data layer is just an example of what it takes to mock an interface and provide some useful services to tests. You can come up with your own mock implementation or use an existing mock library such as GoMock.

I personally find mock frameworks easy to implement and prefer to roll my own (often generating them automatically) because I spend most of my development time writing tests and mocking dependencies. YMMV.

Testing Against an Abstract Data Layer

Now that we have a mock data layer, let's write some tests against it. Note that we are not testing the data layer itself here; we will test it with other methods later in this series. The purpose here is to test the logic of the code that depends on the abstract data layer.

For example, suppose a user wants to add a song, but we have a quota of 100 songs per user. The expected behavior is that if the user has fewer than 100 songs and the added song is new, it will be added. If the song already exists then it returns a "Duplicate song" error. If the user already has 100 songs then it returns a "Song quota exceeded" error.

Let's write a test for these test cases using our mock data layer. This is a white-box test, meaning you need to know which methods of the data layer the code under test is going to call and in which order so you can populate the mock responses and errors properly. So the test-first approach is not ideal here. Let's write the code first.

Here is the SongManager struct. It depends only on the abstract data layer. That will enable you to pass it an implementation of a real data layer in production, but a mock data layer during testing.

The SongManager itself is completely agnostic to the concrete implementation of the DataLayer interface. The SongManager struct also accepts a user, which it stores. Presumably, each active user has its own SongManager instance, and users can only add songs for themselves. The NewSongManager() function ensures the input DataLayer interface is not nil.

package song_manager

import (
	"errors"

	. "abstract_data_layer"
)

const (
	MAX_SONGS_PER_USER = 100
)

type SongManager struct {
	user User
	dal  DataLayer
}

func NewSongManager(user User, dal DataLayer) (*SongManager, error) {
	if dal == nil {
		return nil, errors.New("DataLayer can't be nil")
	}
	return &SongManager{user, dal}, nil
}
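One Go subtlety about this check: dal == nil is true only for a nil interface value. A nil pointer of a concrete type, such as a *MockDataLayer that was never initialized, produces a non-nil interface and slips through. The sketch below demonstrates this with a hypothetical fakeDataLayer type and a one-method stand-in for DataLayer.

```go
package main

import "fmt"

// DataLayer is a one-method stand-in for the real interface.
type DataLayer interface {
	GetLabelNames() ([]string, error)
}

// fakeDataLayer is a hypothetical implementation with a pointer receiver.
type fakeDataLayer struct{}

func (f *fakeDataLayer) GetLabelNames() ([]string, error) { return nil, nil }

// isNilDataLayer performs the same comparison as NewSongManager().
func isNilDataLayer(dal DataLayer) bool {
	return dal == nil
}

func main() {
	var ptr *fakeDataLayer          // a nil pointer
	fmt.Println(isNilDataLayer(nil)) // true
	fmt.Println(isNilDataLayer(ptr)) // false: the interface holds (*fakeDataLayer)(nil)
}
```

In the tests below this is not an issue, because they always pass the address of a real MockDataLayer value.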

Let's implement an AddSong() method. The method calls the data layer's GetSongsByUser() first, and then it goes through several checks. If everything is OK, it calls the data layer's AddSong() method and returns the result.

func (lm *SongManager) AddSong(newSong Song, labels []Label) error {
	songs, err := lm.dal.GetSongsByUser(lm.user)
	if err != nil {
		// Propagate data layer failures to the caller
		return err
	}

	// Check if song is a duplicate
	for _, song := range songs {
		if song.Url == newSong.Url {
			return errors.New("Duplicate song")
		}
	}

	// Check if user has reached the max number of songs
	if len(songs) >= MAX_SONGS_PER_USER {
		return errors.New("Song quota exceeded")
	}

	return lm.dal.AddSong(lm.user, newSong, labels)
}

Looking at this code, you can see that there are two other test cases we neglected: the calls to the data layer's GetSongsByUser() and AddSong() methods might fail for other reasons. Now, with the implementation of SongManager.AddSong() in front of us, we can write a comprehensive set of tests that covers all the use cases. Let's start with the happy path. The TestAddSong_Success() test function creates a user named Gigi and a mock data layer.

It populates the GetSongsByUserResponses field with a slice that contains a single empty slice, so when the SongManager calls GetSongsByUser(), the mock returns an empty slice and a nil error. There is no need to do anything for the call to the mock data layer's AddSong() method, which returns a nil error by default. The test just verifies that no error was returned from the SongManager's AddSong() method.

package song_manager

import (
	"errors"
	"testing"

	. "abstract_data_layer"
	. "concrete_data_layer"
)

// testSong is shared by all the tests in this file
var testSong = Song{
	Url:  "https://www.youtube.com/watch?v=MlW7T0SUH0E",
	Name: "Chacarron",
}

func TestAddSong_Success(t *testing.T) {
	u := User{Name: "Gigi", Email: "gg@gg.com"}
	mock := NewMockDataLayer()
	// Prepare mock responses: the user has no songs yet
	mock.GetSongsByUserResponses = [][]Song{{}}

	lm, err := NewSongManager(u, &mock)
	if err != nil {
		t.Fatalf("NewSongManager() failed: %v", err)
	}
	err = lm.AddSong(testSong, nil)
	if err != nil {
		t.Errorf("AddSong() failed: %v", err)
	}
}

$ go test
PASS
ok  	song_manager	0.006s

Testing error conditions is super easy too. You have full control over what the data layer returns from the calls to GetSongsByUser() and AddSong(). Here is a test that verifies that adding a duplicate song returns the proper error message.

func TestAddSong_Duplicate(t *testing.T) {
	u := User{Name: "Gigi", Email: "gg@gg.com"}

	mock := NewMockDataLayer()
	// Prepare mock responses: the user already has testSong
	mock.GetSongsByUserResponses = [][]Song{{testSong}}

	lm, err := NewSongManager(u, &mock)
	if err != nil {
		t.Fatalf("NewSongManager() failed: %v", err)
	}

	err = lm.AddSong(testSong, nil)
	if err == nil {
		t.Fatal("AddSong() should have failed")
	}

	if err.Error() != "Duplicate song" {
		t.Error("AddSong() wrong error: " + err.Error())
	}
}

The following two tests verify that the correct error is returned when the data layer itself fails. In the first case, the data layer's GetSongsByUser() returns an error.

func TestAddSong_DataLayerFailure_1(t *testing.T) {
	u := User{Name: "Gigi", Email: "gg@gg.com"}

	mock := NewMockDataLayer()
	// Prepare mock responses: GetSongsByUser() fails
	mock.GetSongsByUserResponses = [][]Song{{}}
	e := errors.New("GetSongsByUser() failure")
	mock.Errors = []error{e}

	lm, err := NewSongManager(u, &mock)
	if err != nil {
		t.Fatalf("NewSongManager() failed: %v", err)
	}

	err = lm.AddSong(testSong, nil)
	if err == nil {
		t.Fatal("AddSong() should have failed")
	}

	if err.Error() != "GetSongsByUser() failure" {
		t.Error("AddSong() wrong error: " + err.Error())
	}
}

In the second case, the data layer's AddSong() method returns an error. Since the first call to GetSongsByUser() should succeed, the mock.Errors slice contains two items: nil for the first call and the error for the second call.

func TestAddSong_DataLayerFailure_2(t *testing.T) {
	u := User{Name: "Gigi", Email: "gg@gg.com"}

	mock := NewMockDataLayer()
	// Prepare mock responses: GetSongsByUser() succeeds, AddSong() fails
	mock.GetSongsByUserResponses = [][]Song{{}}
	e := errors.New("AddSong() failure")
	mock.Errors = []error{nil, e}

	lm, err := NewSongManager(u, &mock)
	if err != nil {
		t.Fatalf("NewSongManager() failed: %v", err)
	}

	err = lm.AddSong(testSong, nil)
	if err == nil {
		t.Fatal("AddSong() should have failed")
	}

	if err.Error() != "AddSong() failure" {
		t.Error("AddSong() wrong error: " + err.Error())
	}
}
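None of the tests above exercises the "Song quota exceeded" branch. The sketch below covers it in a self-contained program: songStore, stubStore, addSong (a copy of the checks in SongManager.AddSong()), and the makeSongs helper are hypothetical stand-ins so the example runs on its own. It checks both sides of the boundary: 99 existing songs still allow an add, while 100 do not.

```go
package main

import (
	"errors"
	"fmt"
)

const MAX_SONGS_PER_USER = 100

type Song struct {
	Url  string
	Name string
}

// songStore is a trimmed-down stand-in for the two DataLayer methods
// that SongManager.AddSong() calls.
type songStore interface {
	GetSongsByUser(user string) ([]Song, error)
	AddSong(user string, song Song) error
}

// stubStore returns a fixed list of songs and records whether AddSong ran.
type stubStore struct {
	songs     []Song
	addCalled bool
}

func (s *stubStore) GetSongsByUser(user string) ([]Song, error) {
	return s.songs, nil
}

func (s *stubStore) AddSong(user string, song Song) error {
	s.addCalled = true
	return nil
}

// addSong mirrors the checks in SongManager.AddSong().
func addSong(store songStore, user string, newSong Song) error {
	songs, err := store.GetSongsByUser(user)
	if err != nil {
		return err
	}
	for _, song := range songs {
		if song.Url == newSong.Url {
			return errors.New("Duplicate song")
		}
	}
	if len(songs) >= MAX_SONGS_PER_USER {
		return errors.New("Song quota exceeded")
	}
	return store.AddSong(user, newSong)
}

// makeSongs generates n songs with distinct URLs.
func makeSongs(n int) []Song {
	songs := make([]Song, n)
	for i := range songs {
		songs[i] = Song{
			Url:  fmt.Sprintf("https://example.com/song/%d", i),
			Name: fmt.Sprintf("Song %d", i),
		}
	}
	return songs
}

func main() {
	full := &stubStore{songs: makeSongs(MAX_SONGS_PER_USER)}
	err := addSong(full, "gg@gg.com", Song{Url: "https://example.com/new"})
	fmt.Println(err, full.addCalled) // Song quota exceeded false
}
```

Inside the song_manager package, the equivalent test would set mock.GetSongsByUserResponses to a slice holding MAX_SONGS_PER_USER distinct songs and expect the "Song quota exceeded" error from lm.AddSong().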

Conclusion

In this tutorial, we introduced the concept of an abstract data layer. Then, using the personal music management domain, we demonstrated how to design a data layer, build a mock data layer, and use the mock data layer to test the application.

In part two, we will focus on testing using a real in-memory data layer. Stay tuned.