3.3 Multiple Axes and Smoothing

Whenever one or two axes aren’t enough, you can always use more. In this lesson, I’ll show you how to use multiple axes to compare different data sets.

Hi, and welcome back to Learn Data Visualization with D3.js. In this lesson, I want to take a few minutes to talk about axes and labels. We've already talked a lot about scales, and they go hand in hand with axes and labels. You could say that scales provide the mathematical model, while axes and labels are there to visualize it. Axes are normally only used when you have an orthogonal chart type, like a line, area, or bar chart, or even a 3D scatter plot. There's only one other type of chart that has this visual representation, which is a gauge. But the same mathematical principles apply in the background.
In D3.js, you can have left, bottom, right, and top axes. Having two axes for one direction is useful if you want to draw multiple datasets with different units at the same time to compare them. A perfect example would be to plot temperature and humidity, which have different units: degrees Celsius or Fahrenheit, depending on where you live, and usually relative humidity in percent. In the lesson about loading data, you've already seen the dataset I'm going to use in my example here. It's weather data from Vienna.
Let's start from the beginning. I need an SVG element with an ID of chart, a viewBox attribute, as well as width and height. Then, it's on to the JavaScript. First, let's set some basic variables, like always. I'm going to fetch the chart element and set a margin object. This time, I have to increase the bottom and right margins a little bit, since there will be content there as well. Then, we also have to get the width and height of the chart.
Now, let's load the data with the dsv function. The delimiter is a semicolon, and the data is at weather-data-vie.csv. I have to transform the data to be processable, which we can do in the callback function, returning the new data. This is the same as in the lesson about data loading. In the then callback, I'm going to create a time scale and set the range to the width of the document.
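The row transformation mentioned above could look roughly like the following sketch. The lesson does not show the file's columns here, so the names date, temperature, and humidity are assumptions, as is the idea that humidity is stored in percent and converted to a fraction between 0 and 1. In D3, a function like this would typically be handed to the dsv loader as the row callback.

```javascript
// Hypothetical row converter: a DSV parser hands over every value as a string.
// The column names (date, temperature, humidity) are assumptions for illustration.
function convertRow(row) {
  return {
    date: new Date(row.date),              // ISO timestamp string -> Date object
    temperature: Number(row.temperature),  // numeric temperature
    humidity: Number(row.humidity) / 100   // assumed percent -> fraction in [0, 1]
  };
}

// Made-up sample row, not taken from the actual weather file.
const sample = { date: "2016-01-01T13:00:00Z", temperature: "-2.5", humidity: "87" };
const converted = convertRow(sample);
console.log(converted.temperature, converted.humidity); // -2.5 0.87
```

Converting humidity to a fraction is what would allow the humidity scale to work without an explicit domain.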
Then, it needs a domain, which is from the first to the last date in our time series data. For the humidity and temperature, we need two scales. The first one is going to be for the humidity, and just gets a range. If I omit the domain, it will map from zero to one, which is what we want, as it represents the humidity from 0 to 100%. For the second scale, y2, I'm going to use the same range, but the domain will be the minimum and maximum temperature, as we also have negative values.
Now, some boilerplate again. I'm going to add a group element that moves the whole chart to the correct bounds. Then, it's time for the axes. First, the horizontal time axis. We're going to add a class of axis for styling, and a transform to move it to the bottom. Now, it's time to use the bottom axis generator. I'm going to add a time format that shows just the month and year, as I know that the ticks will always be on the first of the month. Finally, I'm going to move and rotate the resulting tick labels to be at an angle, which just looks nicer.
Now, it's time for the left axis. The procedure is similar, but we're going to have ticks every 10%. I'm also going to add a label that gets rotated and moved to the left of the axis. It reads "humidity in percent". The right axis is almost exactly the same, so I'm just going to copy it. I just have to change the axis generator to axisRight and the dy attribute to be positive, as well as the axis class and label text.
Finally, we need some lines. As in our previous lesson, we're using the complete dataset at once using the datum function. I'm also adding some classes for coloring later, and then the d attribute to draw the line with D3. For the x-value, we are using the x scale function with the date, and for the y-value, y1, for the humidity. I'm going to do exactly the same for the temperature as well, just changing the scale and the value, and also the class.
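To make the idea of two vertical scales sharing one range more concrete, here is a small plain JavaScript stand-in for a linear scale. It only illustrates the math behind the mapping and is not D3's implementation; the helper linearScale, the chart height, and the sample temperatures are all assumptions made for this sketch.

```javascript
// Simplified stand-in for a linear scale: maps a value from a domain to a range.
// Illustration of the math only, not D3's actual implementation.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

const height = 400; // assumed inner chart height in pixels

// Humidity scale: domain 0..1, range inverted because SVG y grows downward.
const y1 = linearScale([0, 1], [height, 0]);

// Temperature scale: same range, domain from minimum to maximum temperature.
const temperatures = [-10, 0, 12.5, 30]; // made-up sample values
const y2 = linearScale(
  [Math.min(...temperatures), Math.max(...temperatures)],
  [height, 0]
);

console.log(y1(0.5)); // 200
console.log(y2(-10)); // 400
console.log(y2(30));  // 0
```

Both scales end up on the same pixel range, which is why the two lines can share one chart area while each axis shows its own unit.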
Finally, let's make the graph responsive, as it's easy to do, by setting the width to 100% and the height to null. Before we have a look at the result, let's add some basic styling. Lines don't have a fill color, and the temperature line should be red. Likewise, humidity shall be blue. For the axis text, I want a base fill color of black, but temperature should be red again, and humidity blue.
Now, let's have a look at our result. The dataset is a whole year with hourly measurements, and it can vary quite a bit, especially the humidity. To make it less noisy, we could apply some smoothing, and we're going to do exactly that. This is something you will have to do on your own, but it's not that hard.
I'm going to create a convolute function to average the values with a kernel. The kernel is basically a list of weights that defines how much the values contribute to the overall value. Depending on the kernel size, we need to determine the center first. Then, we need a quick clamping function to clamp some value to some upper and lower bounds. This is important for our start and end values, as they don't have values before or after them. The bounds are going to be 0 and the last index of the data array.
Now comes the meat of the convolution. I'm going to map over all the data to calculate the weighted averages. Since we still need to have at least a date value, I'm going to merge the original object and add a smooth value that contains the new data point. It's important that you put the original object last, because otherwise the value will be placed in the same object instead of a new one. To calculate the average, I'm going to use reduce on the kernel. Its callback gets an accumulator, which is the running value, a reference to the weight, and the running index k. The return value of this reduce callback will be the accumulated value plus the weight multiplied by the value we get from our accessor.
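The convolution described above can be sketched in a few lines of plain JavaScript. This is one possible reading of the spoken description, not necessarily the exact code from the video; the property name smoothed and the use of Object.assign for merging are assumptions.

```javascript
// Clamp a value to the inclusive range [min, max].
const clamp = (value, min, max) => Math.max(min, Math.min(max, value));

// Smooth `data` with the weights in `kernel`.
// `accessor` picks the number to average from each data point.
function convolute(data, kernel, accessor) {
  const center = Math.floor(kernel.length / 2);
  const lastIndex = data.length - 1;

  return data.map((d, i) => {
    // Weighted sum of the neighbors around index i.
    const smoothed = kernel.reduce((acc, weight, k) => {
      // Neighbor index, clamped so the edges reuse the first/last value.
      const index = clamp(i + k - center, 0, lastIndex);
      return acc + weight * accessor(data[index]);
    }, 0);

    // New object first, original last: the original is copied, not mutated.
    return Object.assign({ smoothed }, d);
  });
}

// Made-up sample data with a single spike in the middle.
const points = [{ value: 0 }, { value: 10 }, { value: 0 }];
const result = convolute(points, [0.25, 0.5, 0.25], d => d.value);
console.log(result.map(d => d.smoothed)); // [ 2.5, 5, 2.5 ]
```

The spike of 10 is spread across its neighbors, and the original objects in points stay untouched.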
The accessor is used to specify which data value we return, temperature or humidity, much like the callbacks from D3. The data point you pass on will be determined by using the clamp function, offsetting by i, the index of the data point, and then using k and the center to get the points around it.
Okay, now we need to create a suitable kernel. There's a list of kernel functions on Wikipedia (the link is in the lesson notes), each of which distributes weights differently. One of the most commonly used is the Gaussian kernel, and that's also the one I'm going to use. So let's create a new array with, let's say, 201 items and fill it with an initial value of 0 for every item. I'm very aggressive with the values to achieve a higher degree of smoothing. In your case, it might vary how many values you have to use. Now, we can map over it and create the new array. I don't need the actual value, but I need the index and a reference to the array itself. Kernels go from -1 to +1, well, except for the Gaussian one, which goes from -3 to +3. This is what the u value is all about. I'm going to calculate this using the index and the overall array length, then we can use the formula from Wikipedia: 1 divided by the square root of 2 pi, multiplied by e to the power of minus one-half times u squared. Since our kernel needs to have a sum of 1, I also have to scale it down by dividing the values by the overall sum in another map statement.
Now, we can get the smoothed data for humidity and temperature. Finally, let's add two more lines by copying the parts from above and changing them to use the smoothed data. I'm also going to add an opacity of 0.1 to move the original values to the background. Now, this looks much nicer. It is only a moving average of the actual measurements, but it's now clearer what the temperatures and especially the humidity levels were over the year.
Let's do a little lesson recap.
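For reference, the Gaussian kernel described in this lesson could be built along these lines. The function name gaussianKernel and the exact way the index is mapped to u between -3 and +3 are assumptions for this sketch; the transcript only says that u is calculated from the index and the array length.

```javascript
// Build a normalized Gaussian kernel with `size` weights (size > 1 assumed).
function gaussianKernel(size) {
  const raw = new Array(size).fill(0).map((_, i, arr) => {
    // Map the index 0..size-1 to u in [-3, 3] (assumed mapping).
    const u = (i / (arr.length - 1)) * 6 - 3;
    // Gaussian function: 1 / sqrt(2 * pi) * e^(-u^2 / 2)
    return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
  });

  // Normalize so that all weights add up to 1.
  const sum = raw.reduce((acc, weight) => acc + weight, 0);
  return raw.map(weight => weight / sum);
}

const kernel = gaussianKernel(201);
console.log(kernel.length); // 201
```

A larger kernel averages over more neighboring measurements and so smooths more aggressively; with hourly data, 201 weights cover a window of a bit more than eight days.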
Multiple axes can be useful if you want to compare related datasets that have different units. Smoothing can be done with a kernel; the most popular one is the Gaussian kernel. When smoothing, you create a weighted average of the values around the original to eliminate spikes. The kernel defines how much each value contributes to the total. In the next lesson, we are going to look at one of the classic chart types: area charts. See you there.
