The Repository Design Pattern

The Repository Design Pattern, defined by Eric Evens in his Domain Driven Design book, is one of the most useful and most widely applicable design patterns ever invented. Any application has to work with persistence and with some kind of list of items. These can be users, products, networks, disks, or whatever your application is about. If you have a blog for example, you have to deal with lists of blog posts and lists of comments. The problem that all of these list management logics have in common is how to connect business logic, factories and persistence.

The Factory Design Pattern

As we mentioned in the introductory paragraph, a Repository will connect Factories with Gateways (persistence). These are also design patterns and if you are not familiar with them, this paragraph will shed some light on the subject.

A factory is a simple design pattern that defines a convenient way to create objects. It is a class or set of classes responsible for creating the objects our business logic needs. A factory traditionally has a method called "make()" and it will know how to take all the information needed to build an object and do the object building itself and return a ready-to-use object to the business logic.

Here is a little bit more on the Factory Pattern in an older Nettuts+ tutorial: A Beginner's Guide to Design Patterns. If you prefer a deeper view on the Factory Pattern, check out the first design pattern in the Agile Design Patterns course we have on Tuts+.

The Gateway Pattern

Also known as "Table Data Gateway" is a simple pattern that offers the connection between the business logic and the database itself. Its main responsibility is to do the queries on the database and provide the retrieved data in a data structure typical for the programming language (like an array in PHP). This data is then usually filtered and modified in the PHP code so that we can obtain the information and variables needed to crate our objects. This information must then be passed to the Factories.

The Gateway Design Pattern is explained and exemplified in quite great detail in a Nettuts+ tutorial about

The Comment

We will start with a test that will force us to think about what our Comment object should contain.

class RepositoryTest extends PHPUnit_Framework_TestCase {

	function testACommentHasAllItsComposingParts() {
		$postId = 1;
		$commentAuthor = "Joe";
		$commentAuthorEmail = "joe@gmail.com";
		$commentSubject = "Joe Has an Opinion about the Repository Pattern";
		$commentBody = "I think it is a good idea to use the Repository Pattern to persist and retrieve objects.";

		$comment = new Comment($postId, $commentAuthor, $commentAuthorEmail, $commentSubject, $commentBody);
	}

}

At first glance, a Comment will just be a data object. It may not have any functionality, but that is up to the context of our application to decide. For this example just assume it is a simple data object. Constructed with a set of variables.

1	class Comment {
2
3	}

Just by creating an empty class and requiring it in the test makes it pass.

require_once '../Comment.php';

class RepositoryTest extends PHPUnit_Framework_TestCase {

[ ... ]

}

But that's far from perfect. Our test does not test anything yet. Let's force ourselves to write all the getters on the Comment class.

function testACommentsHasAllItsComposingParts() {
	$postId = 1;
	$commentAuthor = "Joe";
	$commentAuthorEmail = "joe@gmail.com";
	$commentSubject = "Joe Has an Opinion about the Repository Pattern";
	$commentBody = "I think it is a good idea to use the Repository Pattern to persist and retrieve objects.";

	$comment = new Comment($postId, $commentAuthor, $commentAuthorEmail, $commentSubject, $commentBody);

	$this->assertEquals($postId, $comment->getPostId());
	$this->assertEquals($commentAuthor, $comment->getAuthor());
	$this->assertEquals($commentAuthorEmail, $comment->getAuthorEmail());
	$this->assertEquals($commentSubject, $comment->getSubject());
	$this->assertEquals($commentBody, $comment->getBody());
}

To control the length of the tutorial, I wrote all the assertions at once and we will implement them at once as well. In real life, take them one by one.


class Comment {

	private $postId;
	private $author;
	private $authorEmail;
	private $subject;
	private $body;

	function __construct($postId, $author, $authorEmail, $subject, $body) {
		$this->postId = $postId;
		$this->author = $author;
		$this->authorEmail = $authorEmail;
		$this->subject = $subject;
		$this->body = $body;
	}

	public function getPostId() {
		return $this->postId;
	}

	public function getAuthor() {
		return $this->author;
	}

	public function getAuthorEmail() {
		return $this->authorEmail;
	}

	public function getSubject() {
		return $this->subject;
	}

	public function getBody() {
		return $this->body;
	}

}

Except for the list of private variables, the rest of the code was generated by my IDE, NetBeans, so testing the auto generated code may be a little bit of overhead some times. If you are not writing these lines by yourself, feel free to do them directly and don't bother with tests for setters and constructors. Nevertheless, the test helped us better expose our ideas and better document what our Comment class will contain.

We can also consider these test methods and test classes as our "Client" classes from the schemas.

Our Gateway to Persistence

To keep this example as simple as possible, we will implement only an InMemoryPersistence so that we do not complicate our existence with file systems or databases.

require_once '../InMemoryPersistence.php';

class InMemoryPersistenceTest extends PHPUnit_Framework_TestCase {

	function testItCanPerisistAndRetrieveASingleDataArray() {
		$data = array('data');

		$persistence = new InMemoryPersistence();
		$persistence->persist($data);

		$this->assertEquals($data, $persistence->retrieve(0));
	}

}

As usual, we start with the simplest test that could possibly fail and also force us to write some code. This test creates a new InMemoryPersistence object and tries to persist and retrieve an array called data.

require_once __DIR__ . '/Persistence.php';

class InMemoryPersistence implements Persistence {

	private $data = array();

	function persist($data) {
		$this->data = $data;
	}

	function retrieve($id) {
		return $this->data;
	}

}

The simplest code to make it pass is just to keep the incoming $data in a private variable and return it in the retrieve method. The code as it is right now does not care about the sent in $id variable. It is the simplest thing that could possibly make the test pass. We also took the liberty to introduce and implement an interface called Persistence.

interface Persistence {

	function persist($data);
	function retrieve($ids);

}

This interface defines the two methods any Gateway needs to implement. Persist and retrieve. As you probably already guessed, our Gateway is our InMemoryPersistence class and our physical persistence is the private variable holding our data in the memory. But let's get back to the implementation of this in memory persistence.

function testItCanPerisistSeveralElementsAndRetrieveAnyOfThem() {
	$data1 = array('data1');
	$data2 = array('data2');

	$persistence = new InMemoryPersistence();
	$persistence->persist($data1);
	$persistence->persist($data2);

	$this->assertEquals($data1, $persistence->retrieve(0));
	$this->assertEquals($data2, $persistence->retrieve(1));
}

We added another test. In this one we persist two different data arrays. We expect to be able to retrieve each of them individually.

require_once __DIR__ . '/Persistence.php';

class InMemoryPersistence implements Persistence {
	private $data = array();


	function persist($data) {
		$this->data[] = $data;
	}

	function retrieve($id) {
		return $this->data[$id];
	}
}

The test forced us to slightly alter our code. We now need to add data to our array, not just replace it with the one sent in to persists(). We also need to consider the $id parameter and return the element at that index.

This is enough for our InMemoryPersistence. If needed, we can modify it later.

Our Factory

We have a Client (our tests), a persistence with a Gateway, and Comment objects to persists. The next missing thing is our Factory.

We started our coding with a RepositoryTest file. This test, however, actually created a Comment object. Now we need to create tests to verify if our Factory will be able to create Comment objects. It seems like we had an error in judgment and our test is more likely a test for our upcoming Factory than for our Repository. We can move it into another test file, CommentFactoryTest.

require_once '../Comment.php';

class CommentFactoryTest extends PHPUnit_Framework_TestCase {

	function testACommentsHasAllItsComposingParts() {
		$postId = 1;
		$commentAuthor = "Joe";
		$commentAuthorEmail = "joe@gmail.com";
		$commentSubject = "Joe Has an Opinion about the Repository Pattern";
		$commentBody = "I think it is a good idea to use the Repository Pattern to persist and retrieve objects.";

		$comment = new Comment($postId, $commentAuthor, $commentAuthorEmail, $commentSubject, $commentBody);

		$this->assertEquals($postId, $comment->getPostId());
		$this->assertEquals($commentAuthor, $comment->getAuthor());
		$this->assertEquals($commentAuthorEmail, $comment->getAuthorEmail());
		$this->assertEquals($commentSubject, $comment->getSubject());
		$this->assertEquals($commentBody, $comment->getBody());
	}
}

Now, this test obviously passes. And while it is a correct test, we should consider modifying it. We want to create a Factory object, pass in an array and ask it to create a Comment for us.

require_once '../CommentFactory.php';

class CommentFactoryTest extends PHPUnit_Framework_TestCase {

	function testACommentsHasAllItsComposingParts() {
		$postId = 1;
		$commentAuthor = "Joe";
		$commentAuthorEmail = "joe@gmail.com";
		$commentSubject = "Joe Has an Opinion about the Repository Pattern";
		$commentBody = "I think it is a good idea to use the Repository Pattern to persist and retrieve objects.";

		$commentData = array($postId, $commentAuthor, $commentAuthorEmail, $commentSubject, $commentBody);

		$comment = (new CommentFactory())->make($commentData);

		$this->assertEquals($postId, $comment->getPostId());
		$this->assertEquals($commentAuthor, $comment->getAuthor());
		$this->assertEquals($commentAuthorEmail, $comment->getAuthorEmail());
		$this->assertEquals($commentSubject, $comment->getSubject());
		$this->assertEquals($commentBody, $comment->getBody());
	}
}

We should never name our classes based on the design pattern they implement, but Factory and Repository represent more than just the design pattern itself. I personally have nothing against including these two words in our class's names. However I still strongly recommend and respect the concept of not naming our classes after the design patterns we use for the rest of the patterns.

This test is just slightly different from the previous one, but it fails. It tries to create a CommentFactory object. That class does not exist yet. We also try to call a make() method on it with an array containing all the information of a comment as an array. This method is defined in the Factory interface.

1	interface Factory {
2	function make($data);
3	}

This is a very common Factory interface. It defined the only required method for a factory, the method that actually creates the objects we want.

require_once __DIR__ . '/Factory.php';
require_once __DIR__ . '/Comment.php';

class CommentFactory implements Factory {

	function make($components) {
		return new Comment($components[0], $components[1], $components[2], $components[3], $components[4]);
	}

}

And CommentFactory implements the Factory interface successfully by taking the $components parameter in its make() method, creates and returns a new Comment object with the information from there.

We will keep our persistence and object creation logic as simple as possible. We can, for this tutorial, safely ignore any error handling, validation and exception throwing. We will stop here with the persistence and object creation implementation.

Using a Repository to Persist Comments

As we've seen above, we can use a Repository in two ways. To retrieve information from persistence and also to persist information on the persistence layer. Using TDD it is, most of the time, easier to start with the saving (persisting) part of the logic and then use that existing implementation to test data retrieval.

require_once '../../../vendor/autoload.php';
require_once '../CommentRepository.php';
require_once '../CommentFactory.php';

class RepositoryTest extends PHPUnit_Framework_TestCase {

	protected function tearDown() {
		\Mockery::close();
	}

	function testItCallsThePersistenceWhenAddingAComment() {

		$persistanceGateway = \Mockery::mock('Persistence');
		$commentRepository = new CommentRepository($persistanceGateway);

		$commentData = array(1, 'x', 'x', 'x', 'x');
		$comment = (new CommentFactory())->make($commentData);

		$persistanceGateway->shouldReceive('persist')->once()->with($commentData);

		$commentRepository->add($comment);
	}

}

We use Mockery to mock our persistence and inject that mocked object to the Repository. Then we call add() on the repository. This method has a parameter of type Comment. We expect the persistence to be called with an array of data similar to $commentData.

require_once __DIR__ . '/InMemoryPersistence.php';

class CommentRepository {

	private $persistence;

	function __construct(Persistence $persistence = null) {
		$this->persistence = $persistence ? : new InMemoryPersistence();
	}

	function add(Comment $comment) {
		$this->persistence->persist(array(
			$comment->getPostId(),
			$comment->getAuthor(),
			$comment->getAuthorEmail(),
			$comment->getSubject(),
			$comment->getBody()
		));
	}

}

As you can see, the add() method is quite smart. It encapsulates the knowledge about how to transform a PHP object into a plain array usable by the persistence. Remember, our persistence gateway usually is a general object for all of our data. It can and will persist all the data of our application, so sending to it objects would make it do too much: both conversion and effective persistence.

When you have an InMemoryPersistence class like we do, it is very fast. We can use it as an alternative to mocking the gateway.

function testAPersistedCommentCanBeRetrievedFromTheGateway() {

	$persistanceGateway = new InMemoryPersistence();
	$commentRepository = new CommentRepository($persistanceGateway);

	$commentData = array(1, 'x', 'x', 'x', 'x');
	$comment = (new CommentFactory())->make($commentData);

	$commentRepository->add($comment);

	$this->assertEquals($commentData, $persistanceGateway->retrieve(0));
}

Of course if you do not have an in-memory implementation of your persistence, mocking is the only reasonable way to go. Otherwise your test will be just too slow to be practical.

function testItCanAddMultipleCommentsAtOnce() {

	$persistanceGateway = \Mockery::mock('Persistence');
	$commentRepository = new CommentRepository($persistanceGateway);

	$commentData1 = array(1, 'x', 'x', 'x', 'x');
	$comment1 = (new CommentFactory())->make($commentData1);
	$commentData2 = array(2, 'y', 'y', 'y', 'y');
	$comment2 = (new CommentFactory())->make($commentData2);

	$persistanceGateway->shouldReceive('persist')->once()->with($commentData1);
	$persistanceGateway->shouldReceive('persist')->once()->with($commentData2);

	$commentRepository->add(array($comment1, $comment2));
}

Our next logical step is to implement a way to add several comments at once. Your project may not require this functionality and it is not something required by the pattern. In fact, the Repository Pattern only says that it will provide a custom query and persistence language for our business logic. So if our bushiness logic feels the need of adding several comments at once, the Repository is the place where the logic should reside.

function add($commentData) {
	if (is_array($commentData))
		foreach ($commentData as $comment)
			$this->persistence->persist(array(
				$comment->getPostId(),
				$comment->getAuthor(),
				$comment->getAuthorEmail(),
				$comment->getSubject(),
				$comment->getBody()
			));
	else
		$this->persistence->persist(array(
			$commentData->getPostId(),
			$commentData->getAuthor(),
			$commentData->getAuthorEmail(),
			$commentData->getSubject(),
			$commentData->getBody()
		));
}

And the simplest way to make the test pass is to just verify if the parameter we are getting is an array or not. If it is an array, we will cycle through each element and call the persistence with the array we generate from one single Comment object. And while this code is syntactically correct and makes the test pass, it introduces a slight duplication that we can get rid of quite easily.

function add($commentData) {
	if (is_array($commentData))
		foreach ($commentData as $comment)
			$this->addOne($comment);
	else
		$this->addOne($commentData);
}

private function addOne(Comment $comment) {
	$this->persistence->persist(array(
		$comment->getPostId(),
		$comment->getAuthor(),
		$comment->getAuthorEmail(),
		$comment->getSubject(),
		$comment->getBody()
	));
}

When all the tests are green, it is alway time for refactoring before we continue with the next failing test. And we did just that with the add() method. We extracted the addition of a single comment into a private method and called it from two different places in our public add() method. This not only reduced duplication but also opened the possibility of making the addOne() method public and letting the business logic decide if it wants to add one or several comments at a time. This would lead to a different implementation of our Repository, with an addOne() and another addMany() methods. It would be a perfectly legitimate implementation of the Repository Pattern.

Retrieving Comments With Our Repository

A Repository provides a custom query language for the business logic. So the names and functionalities of the query methods of a Repository is hugely up to the requirements of the business logic. You build your repository as you build your business logic, as you need another custom query method. However, there are at least one or two methods that you will find on almost any Repository.

function testItCanFindAllComments() {
	$repository = new CommentRepository();

	$commentData1 = array(1, 'x', 'x', 'x', 'x');
	$comment1 = (new CommentFactory())->make($commentData1);
	$commentData2 = array(2, 'y', 'y', 'y', 'y');
	$comment2 = (new CommentFactory())->make($commentData2);

	$repository->add($comment1);
	$repository->add($comment2);

	$this->assertEquals(array($comment1, $comment2), $repository->findAll());
}

The first such method is called findAll(). This should return all the objects the repository is responsible for, in our case Comments. The test is simple, we add a comment, then another one, and finally we want to call findAll() and get a list containing both comments. This is however not accomplish-able with our InMemoryPersistence as it is at this point. A small update is required.

1	function retrieveAll() {
2	return $this->data;
3	}

That's it. We added a retrieveAll() method which just returns the whole $data array from the class. Simple and effective. It's time to implement findAll() on the CommentRepository now.

function findAll() {
	$allCommentsData = $this->persistence->retrieveAll();
	$comments = array();
	foreach ($allCommentsData as $commentData)
		$comments[] = $this->commentFactory->make($commentData);
	return $comments;
}

findAll() will call the retrieveAll() method on our persistence. That method provides a raw array of data. findAll() will cycle through each element and use the data as necessary to be passed to the Factory. The factory will provide one Comment a time. An array with these comments will be built and returned at the end of findAll(). Simple and effective.

Another common method you will find on repositories is to search for a specific object or group of objects based on their characteristic key. For example, all of our comments are connected to a blog post by a $postId internal variable. I can imagine that in our blog's business logic we would almost always want to find all the comments related to a blog post when that blog post is displayed. So a method called findByPostId($id) sounds reasonable to me.

function testItCanFindCommentsByBlogPostId() {
	$repository = new CommentRepository();

	$commentData1 = array(1, 'x', 'x', 'x', 'x');
	$comment1 = (new CommentFactory())->make($commentData1);
	$commentData2 = array(1, 'y', 'y', 'y', 'y');
	$comment2 = (new CommentFactory())->make($commentData2);
	$commentData3 = array(3, 'y', 'y', 'y', 'y');
	$comment3 = (new CommentFactory())->make($commentData3);

	$repository->add(array($comment1, $comment2));
	$repository->add($comment3);

	$this->assertEquals(array($comment1, $comment2), $repository->findByPostId(1));
}

We just create three simple comments. The first two have the same $postId = 1, the third one has $postID = 3. We add all of them to the repository and then we expect an array with the first two ones when we do a findByPostId() for the $postId = 1.

function findByPostId($postId) {
	return array_filter($this->findAll(), function ($comment) use ($postId){
		return $comment->getPostId() == $postId;
	});
}

The implementation couldn't be simpler. We find all the comments using our already implemented findAll() method and we filter the array. We have no way to ask the persistence to do the filtering for us, so we will do it here. The code will query each Comment object and compare its $postId with the one we sent in as parameter. Great. The test passes. But I feel we missed something.

function testItCanFindCommentsByBlogPostId() {
	$repository = new CommentRepository();

	$commentData1 = array(1, 'x', 'x', 'x', 'x');
	$comment1 = (new CommentFactory())->make($commentData1);
	$commentData2 = array(1, 'y', 'y', 'y', 'y');
	$comment2 = (new CommentFactory())->make($commentData2);
	$commentData3 = array(3, 'y', 'y', 'y', 'y');
	$comment3 = (new CommentFactory())->make($commentData3);

	$repository->add(array($comment1, $comment2));
	$repository->add($comment3);

	$this->assertEquals(array($comment1, $comment2), $repository->findByPostId(1));
	$this->assertEquals(array($comment3), $repository->findByPostId(3));
}

Adding a second assertion to obtain the third comment with the findByPostID() method reveals our mistake. Whenever you can easily test extra paths or cases, like in our case with a simple extra assertion, you should. These simple extra assertions or test methods can reveal hidden problems. Like in our case, array_filter() does not reindex the resulting array. And while we have an array with the correct elements, the indexes are messed up.

1) RepositoryTest::testItCanFindCommentsByBlogPostId
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => Comment Object (...)
+    2 => Comment Object (...)
 )

Now, you may consider this a shortcoming of PHPUnit or a shortcoming of your business logic. I tend to be rigorous with array indexes because I burned my hands with them a few times. So we should consider the error a problem with our logic in the CommentRepository.

function findByPostId($postId) {
	return array_values(
		array_filter($this->findAll(), function ($comment) use ($postId) {
			return $comment->getPostId() == $postId;
		})
	);
}

Yep. That simple. We just run the result through array_values() before returning it. It will nicely reindex our array. Mission accomplished.

Final Thoughts

And that's mission accomplished for our Repository also. We have a class usable by any other business logic class which offers an easy way to persist and retrieve objects. It also decouples the business logic from the factories and data persistence gateways. It reduced logic duplication and significantly simplifies the persistence and retrieval operations for our comments.

Remember, this design pattern can be used for all types of lists and as you start using it, you will see its usefulness. Basically, whenever you have to work with several objects of the same type, you should consider introducing a Repository for them. Repositories are specialized by object type and not general. So for a blog application, you may have distinct repositories for blog posts, for comments, for users, for user configurations, for themes, for designs, for or anything you may have multiple instances of.

And before concluding this, a Repository may have its own list of objects and it may do a local caching of objects. If an object can not be found in the local list, we retrieve it from the persistence, otherwise we serve it from our list. If used with caching, a Repository can be successfully combined with the Singleton Design Pattern.

As usual, thank you for your time and I sincerely hope I taught you something new today.

1
2	class Comment {
3
4	private $postId;
5	private $author;
6	private $authorEmail;
7	private $subject;
8	private $body;
9
10	function __construct($postId, $author, $authorEmail, $subject, $body) {
11	$this->postId = $postId;
12	$this->author = $author;
13	$this->authorEmail = $authorEmail;
14	$this->subject = $subject;
15	$this->body = $body;
16	}
17
18	public function getPostId() {
19	return $this->postId;
20	}
21
22	public function getAuthor() {
23	return $this->author;
24	}
25
26	public function getAuthorEmail() {
27	return $this->authorEmail;
28	}
29
30	public function getSubject() {
31	return $this->subject;
32	}
33
34	public function getBody() {
35	return $this->body;
36	}
37
38	}

1	require_once '../../../vendor/autoload.php';
2	require_once '../CommentRepository.php';
3	require_once '../CommentFactory.php';
4
5	class RepositoryTest extends PHPUnit_Framework_TestCase {
6
7	protected function tearDown() {
8	\Mockery::close();
9	}
10
11	function testItCallsThePersistenceWhenAddingAComment() {
12
13	$persistanceGateway = \Mockery::mock('Persistence');
14	$commentRepository = new CommentRepository($persistanceGateway);
15
16	$commentData = array(1, 'x', 'x', 'x', 'x');
17	$comment = (new CommentFactory())->make($commentData);
18
19	$persistanceGateway->shouldReceive('persist')->once()->with($commentData);
20
21	$commentRepository->add($comment);
22	}
23
24	}