New! Unlimited audio, video & web asset downloads! Unlimited audio, video & web assets! From \$16.50/m

Difficulty:BeginnerLength:LongLanguages:
This post is part of a series called Data Structures Succinctly: Part 1.
Sorting Algorithms

The first data structure we will be looking at is the linked list, and with good reason. Besides being a nearly ubiquitous structure used in everything from operating systems to video games, it is also a building block with which many other data structures can be created.

Overview

In a very general sense, the purpose of a linked list is to provide a consistent mechanism to store and access an arbitrary amount of data. As its name implies, it does this by linking the data together into a list.

Before we dive into what this means, let’s start by reviewing how data is stored in an array.

As the figure shows, array data is stored as a single contiguously allocated chunk of memory that is logically segmented. The data stored in the array is placed in one of these segments and referenced via its location, or index, in the array.

This is a good way to store data. Most programming languages make it very easy to allocate arrays and operate on their contents. Contiguous data storage provides performance benefits (namely data locality), iterating over the data is simple, and the data can be accessed directly by index (random access) in constant time.

There are times, however, when an array is not the ideal solution.

Consider a program with the following requirements:

1. Read an unknown number of integers from an input source (`NextValue` method) until the number 0xFFFF is encountered.
2. Pass all of the integers that have been read (in a single call) to the `ProcessItems` method.

Since the requirements indicate that multiple values need to be passed to the `ProcessItems` method in a single call, one obvious solution would involve using an array of integers. For example:

This solution has several problems, but the most glaring is seen when more than 20 values are read. As the program is now, the values from 21 to n are simply ignored. This could be mitigated by allocating more than 20 values—perhaps 200 or 2000. Maybe the size could be configured by the user, or perhaps if the array became full a larger array could be allocated and all of the existing data copied into it. Ultimately these solutions create complexity and waste memory.

What we need is a collection that allows us to add an arbitrary number of integer values and then enumerate over those integers in the order that they were added. The collection should not have a fixed maximum size and random access indexing is not necessary. What we need is a linked list.

Before we go on and learn how the linked list data structure is designed and implemented, let’s preview what our ultimate solution might look like.

Notice that all of the problems with the array solution no longer exist. There are no longer any issues with the array not being large enough or allocating more than is necessary.

You should also notice that this solution informs some of the design decisions we will be making later, namely that the `LinkedList` class accepts a generic type argument and implements the `IEnumerable` interface.

The Node

At the core of the linked list data structure is the Node class. A node is a container that provides the ability to both store data and connect to other nodes.

In its simplest form, a Node class that contains integers could look like this:

With this we can now create a very primitive linked list. In the following example we will allocate three nodes (first, middle, and last) and then link them together into a list.

We now have a linked list that starts with the node `first` and ends with the node `last`. The `Next` property for the last node points to null which is the end-of-list indicator. Given this list, we can perform some basic operations. For example, the value of each node’s `Data` property:

The `PrintList` method works by iterating over each node in the list, printing the value of the current node, and then moving on to the node pointed to by the `Next` property.

Now that we have an understanding of what a linked list node might look like, let’s look at the actual `LinkedListNode` class.

Before implementing our `LinkedList` class, we need to think about what we’d like to be able to do with the list.

Earlier we saw that the collection needs to support strongly typed data so we know we want to create a generic interface.

Since we’re using the .NET framework to implement the list, it makes sense that we would want this class to be able to act like the other built-in collection types. The easiest way to do this is to implement the `ICollection<T>` interface. Notice I choose `ICollection<T>` and not `IList<T>`. This is because the `IList<T>` interface adds the ability to access values by index. While direct indexing is generally useful, it cannot be efficiently implemented in a linked list.

With these requirements in mind we can create a basic class stub, and then through the rest of the section we can fill in these methods.

 Behavior Adds the provided value to the end of the linked list. Performance O(1)

1. Allocate the new `LinkedListNode` instance.
2. Find the last node of the existing list.
3. Point the `Next` property of the last node to the new node.

The key is to know which node is the last node in the list. There are two ways we can know this. The first way is to keep track of the first node (the “head” node) and walk the list until we have found the last node. This approach does not require that we keep track of the last node, which saves one reference worth of memory (whatever your platform pointer size is), but does require that we perform a traversal of the list every time a node is added. This would make `Add` an O(n) operation.

The second approach requires that we keep track of the last node (the “tail” node) in the list and when we add the new node we simply access our stored reference directly. This is an O(1) algorithm and therefore the preferred approach.

The first thing we need to do is add two private fields to the `LinkedList` class: references to the first (head) and last (tail) nodes.

Next we need to add the method that performs the three steps.

First, it allocates the new `LinkedListNode` instance. Next, it checks whether the list is empty. If the list is empty, the new node is added simply by assigning the `_head` and `_tail` references to the new node. The new node is now both the first and last node in the list. If the list is not empty, the node is added to the end of the list and the `_tail` reference is updated to point to the new end of the list.

The `Count` property is incremented when a node is added to ensure the `ICollection<T>`. `Count` property returns the accurate value.

Remove

 Behavior Removes the first node in the list whose value equals the provided value. The method returns true if a value was removed. Otherwise it returns false. Performance O(n)

Before talking about the `Remove` algorithm, let’s take a look at what it is trying to accomplish. In the following figure, there are four nodes in a list. We want to remove the node with the value three.

When the removal is done, the list will be modified such that the `Next` property on the node with the value two points to the node with the value four.

The basic algorithm for node removal is:

1. Find the node to remove.
2. Update the Next property of the node that precedes the node being removed to point to the node that follows the node being removed.

As always, the devil is in the details. There are a few cases we need to be thinking about when removing a node:

• The list might be empty, or the value we are trying to remove might not be in the list. In this case the list would remain unchanged.
• The node being removed might be the only node in the list. In this case we simply set the `_head` and `_tail` fields to `null`.
• The node to remove might be the first node. In this case there is no preceding node, so instead we need to update the `_head` field to point to the new head node.
• The node might be in the middle of the list.
• The node might be the last node in the list. In this case we update the `_tail` field to reference the penultimate node in the list and set its `Next` property to `null`.

The `Count` property is decremented when a node is removed to ensure the `ICollection<T>`. `Count` property returns the accurate value.

Contains

 Behavior Returns a Boolean that indicates whether the provided value exists within the linked list. Performance O(n)

The `Contains` method is quite simple. It looks at every node in the list, from first to last, and returns true as soon as a node matching the parameter is found. If the end of the list is reached and the node is not found, the method returns `false`.

GetEnumerator

 Behavior Returns an IEnumerator instance that allows enumerating the linked list values from first to last. Performance Returning the enumerator instance is an O(1) operation. Enumerating every item is an O(n) operation.

`GetEnumerator` is implemented by enumerating the list from the first to last node and uses the C# `yield` keyword to return the current node’s value to the caller.

Notice that the LinkedList implements the iteration behavior in the `IEnumerable<T>` version of the GetEnumerator method and defers to this behavior in the IEnumerable version.

Clear

 Behavior Removes all the items from the list. Performance O(1)

The `Clear` method simply sets the `_head` and `_tail` fields to null to clear the list. Because .NET is a garbage collected language, the nodes do not need to be explicitly removed. It is the responsibility of the caller, not the linked list, to ensure that if the nodes contain `IDisposable` references they are properly disposed of.

CopyTo

 Behavior Copies the contents of the linked list from start to finish into the provided array, starting at the specified array index. Performance O(n)

The `CopyTo` method simply iterates over the list items and uses simple assignment to copy the items to the array. It is the caller’s responsibility to ensure that the target array contains the appropriate free space to accommodate all the items in the list.

Count

 Behavior Returns an integer indicating the number of items currently in the list. When the list is empty, the value returned is 0. Performance O(1)

`Count` is simply an automatically implemented property with a public getter and private setter. The real behavior happens in the `Add`, `Remove`, and `Clear` methods.

 Behavior Returns false if the list is not read-only. Performance O(1)

The LinkedList class we just created is known as a singly, linked list. This means that there exists only a single, unidirectional link between a node and the next node in the list. There is a common variation of the linked list which allows the caller to access the list from both ends. This variation is known as a doubly linked list.

To create a doubly linked list we will need to first modify our LinkedListNode class to have a new property named Previous. Previous will act like Next, only it will point to the previous node in the list.

The following sections will only describe the changes between the singly linked list and the new doubly linked list.

Node Class

The only change that will be made in the `LinkedListNode` class is the addition of a new property named `Previous` which points to the previous `LinkedListNode` in the linked list, or returns `null` if it is the first node in the list.

While the singly linked list only added nodes to the end of the list, the doubly linked list will allow adding nodes to the start and end of the list using `AddFirst` and `AddLast`, respectively. The `ICollection<T>`. `Add` method will defer to the `AddLast` method to retain compatibility with the singly linked `List` class.

 Behavior Adds the provided value to the front of the list. Performance O(1)

When adding a node to the front of the list, the actions are very similar to adding to a singly linked list.

1. Set the `Next` property of the new node to the old head node.
2. Set the `Previous` property of the old head node to the new node.
3. Update the `_tail` field (if necessary) and increment `Count`.

 Behavior Adds the provided value to the end of the list. Performance O(1)

Adding a node to the end of the list is even easier than adding one to the start.

The new node is simply appended to the end of the list, updating the state of `_tail` and `_head` as appropriate, and `Count` is incremented.

And as mentioned earlier, `ICollection<T>`.Add will now simply call `AddLast`.

Remove

Like `Add`, the `Remove` method will be extended to support removing nodes from the start or end of the list. The `ICollection<T>`.Remove method will continue to remove items from the start with the only change being to update the appropriate `Previous` property.

RemoveFirst

 Behavior Removes the first value from the list. If the list is empty, no action is taken. Returns true if a value was removed. Otherwise it returns false. Performance O(1)

`RemoveFirst` updates the list by setting the linked list’s `head` property to the second node in the list and updating its `Previous` property to `null`. This removes all references to the previous head node, removing it from the list. If the list contained only a singleton, or was empty, the list will be empty (the `head` and `tail` properties will be `null`).

RemoveLast

 Behavior Removes the last node from the list. If the list is empty, no action is performed. Returns true if a value was removed. Otherwise, it returns false. Performance O(1)

`RemoveLast` works by setting the list's `tail` property to be the node preceding the current tail node. This removes the last node from the list. If the list was empty or had only one node, when the method returns the `head` and `tail` properties, they will both be `null`.

Remove

 Behavior Removes the first node in the list whose value equals the provided value. The method returns true if a value was removed. Otherwise it returns false. Performance O(n)

The `ICollection<T>`. `Remove` method is nearly identical to the singly linked version except that the `Previous` property is now updated during the remove operation. To avoid repeated code, the method calls `RemoveFirst` when it is determined that the node being removed is the first node in the list.

But Why?

We can add nodes to the front and end of the list—so what? Why do we care? As it stands right now, the doubly linked `List` class is no more powerful than the singly linked list. But with just one minor modification, we can open up all kinds of possible behaviors. By exposing the `head` and `tail` properties as read-only public properties, the linked list consumer will be able to implement all sorts of new behaviors.

With this simple change we can enumerate the list manually, which allows us to perform reverse (tail-to-head) enumeration and search.

For example, the following code sample shows how to use the list's `Tail` and `Previous` properties to enumerate the list in reverse and perform some processing on each node.

Additionally, the doubly linked `List` class allows us to easily create the `Deque` class, which is itself a building block for other classes. We will discuss this class later in another section.

Next Up

This completes the second part about linked lists. Next up, we'll move on to the array list.