iOS SDK Augmented Reality: Video Processing



Welcome to the latest installment in our premium series on augmenting reality with the iOS SDK! In today's tutorial, I will be teaching you how to process and analyze live video streaming from the device camera in order to enhance our view of the world with helpful overlay information.

Where We Left Off. . .

In the first post in this series, I introduced you to the AVFoundation framework and we did all the groundwork necessary to begin displaying the camera feed within our app. If you haven't already, be sure to check out part 1!

Today's Demo App

The term "Augmented Reality" has become a buzz phrase in recent years as smartphones have become powerful enough to place this technology into our pockets. Unfortunately, all the publicity surrounding the term has generated a lot of confusion about what Augmented Reality actually is and how it can be used to enhance our interaction with the world. To clarify, the goal of an Augmented Reality application should be to take a user's existing view or perception of the world and enhance that perception by providing additional information or perspectives that aren't naturally apparent. One of the earliest and most practical implementations of Augmented Reality is the "First Down" highlight commonly seen when watching American Football games on television. The subtle nature of this overlay is exactly what makes it a perfect use of AR. The user isn't distracted by the tech, but their perspective is naturally enhanced.

The demo Augmented Reality application that we'll be building today is thematically similar to football's "first and 10" overlay. We'll be processing each video frame from the camera and calculating the average RGB color of that frame. We'll then display the result as an RGB value, a hex value, and a color swatch overlay. This is a small, simple enhancement to the view, but it should serve our purpose of demonstrating how to access the camera video stream and process it with a custom algorithm. So, let's get started!

Step 1: Add Project Resources

In order to complete this tutorial, we're going to need a few more frameworks.

Following the same steps described in Step 1 in the previous tutorial in this series, import these frameworks:

  • Core Video Framework

    As the name implies, this framework is used for video processing and is primarily responsible for providing video buffering support. The main data type that we are interested in from this framework is CVImageBufferRef, which will be used to access the buffered picture data from our video stream. We'll also use several CoreVideo functions, including CVPixelBufferGetBytesPerRow() and CVPixelBufferLockBaseAddress(). Any time you see "CV" at the start of a data type or function name, you'll know that it came from CoreVideo.

  • Core Media Framework

    Core Media provides the low-level support upon which the AV Foundation framework (added in the last tutorial) is built. We're really just interested in the CMSampleBufferRef data type and the CMSampleBufferGetImageBuffer() function from this framework.

  • Quartz Core Framework

    Quartz Core is responsible for much of the animation that you see when using an iOS device. We only need this in our project for one reason: CADisplayLink. More on this later on in the tutorial.

In addition to these frameworks, we're also going to need the UIColor Utilities project to add a handy category to the UIColor object. To get this code, you can either visit the project page on GitHub or just look for the files UIColor-Expanded.h and UIColor-Expanded.m in the source code for this tutorial. Either way, you will need to add both of those files to your Xcode project, and then import the category in ARDemoViewController.m with #import "UIColor-Expanded.h".

Step 2: Implement AVCapture Delegate

When we left off in the last tutorial, we had implemented a preview layer displaying what the device camera was able to see, but we didn't actually have a way to access and process the frame data. The first step in doing so is to implement the AV Foundation delegate method -captureOutput:didOutputSampleBuffer:fromConnection:. This delegate method will provide us with access to a CMSampleBufferRef of the image data for the current video frame. We'll need to massage that data into a useful form, but this will be a good start.

In ARDemoViewController.h, you need to declare conformance to the AVCaptureVideoDataOutputSampleBufferDelegate protocol:

Now add the delegate method in ARDemoViewController.m:

The last step we need to take is to modify our video stream code from the last tutorial to designate the current view controller as the delegate:

On lines 5 - 8 above, we're configuring additional settings for the video output generated by our AVCaptureSession. Specifically, we're specifying that we want to receive each pixel buffer formatted as kCVPixelFormatType_32BGRA, which basically means that each pixel will be returned as a 32-bit value ordered as Blue, Green, Red, and Alpha channels. This is a bit different from 32-bit RGBA, which is most commonly used, but a slight change in the channel ordering won't slow us down. :)

Next, on lines 9 - 10, we create a dispatch queue that the sample buffer will be processed within. A queue is necessary to provide us with enough time to actually process one frame before receiving the next. While this does theoretically mean that our camera view is only ever capable of seeing the past, the delay shouldn't be noticeable at all if you are processing each frame efficiently.

With the above code in place, the ARDemoViewController should begin receiving delegate calls with a CMSampleBuffer of each video frame!
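To make the BGRA channel ordering mentioned above concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). The Pixel struct and both function names are made up for illustration and are not part of Core Video. The sketch only assumes that each pixel arrives as four consecutive bytes in blue, green, red, alpha order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pixel with named channels, independent of its order in memory.
   (Hypothetical helper type, not a Core Video type.) */
typedef struct {
    uint8_t red;
    uint8_t green;
    uint8_t blue;
    uint8_t alpha;
} Pixel;

/* Read one pixel from four bytes stored in BGRA order, the layout
   requested with kCVPixelFormatType_32BGRA. */
Pixel pixel_from_bgra(const uint8_t *bytes)
{
    assert(bytes != NULL);

    Pixel p;
    p.blue  = bytes[0];
    p.green = bytes[1];
    p.red   = bytes[2];
    p.alpha = bytes[3];
    return p;
}

/* Write the same pixel out in the more familiar RGBA order. */
void pixel_to_rgba(Pixel p, uint8_t *out)
{
    assert(out != NULL);

    out[0] = p.red;
    out[1] = p.green;
    out[2] = p.blue;
    out[3] = p.alpha;
}
```

Nothing is lost or gained by the different ordering. The only thing that changes is which index we read for each channel.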

Step 3: Convert the Sample Buffer

Now that we have a CMSampleBuffer, we need to convert it to a format that we can more easily process. Because we're working with Core Video, the format of our choice will be CVImageBufferRef. Add the following line to the delegate method:

Okay, now it's time to start actually processing this pixel buffer. For this tutorial, we're building an Augmented Reality app that can look at any scene or picture and tell us what the Hex and RGB color average of the frame is. Let's create a new method called findColorAverage: just to accomplish that task.

Add the method declaration in ARDemoViewController.h:

And then add an implementation stub in ARDemoViewController.m:

Finally, call the new method at the end of the delegate implementation:

Step 4: Iterate Over the Pixel Buffer

For each video frame that we receive, we want to calculate the average of all pixel values contained in the image. The first step in doing this is to iterate over each pixel contained in the CVImageBufferRef. While the algorithm we will use in the rest of this tutorial is specific to our application, this particular step, iterating over the frame pixels, is a very common task in many different Augmented Reality applications of this type.

The following code will iterate over each pixel in the frame:

On line 3 above, we call CVPixelBufferLockBaseAddress on our pixel buffer to prevent changes from occurring to the data while it is being processed.

On lines 5 - 7, we get some basic meta information about the pixel buffer, namely the height and width of the buffer and the number of bytes stored in each row.

On line 9, a char pointer is declared to refer to a pixel. In C, the char data type holds a single byte of data. A signed char can hold any integer in the range -128 to 127 (whether a plain char is signed is left up to the compiler). If that byte is "unsigned", it can hold an integer value between 0 and 255. Why is this important? Because each of the RGB values we want to access falls within the range 0 - 255, meaning it requires a single byte to store. You'll also recall that we configured our video output to return a 32-bit value in BGRA format. Because each byte is equal to 8 bits, this should make a lot more sense now: 8 (B) + 8 (G) + 8 (R) + 8 (A) = 32. To summarize: we're using a char pointer to refer to our pixel data because each BGRA channel value occupies one byte.
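As a quick aside, the signed versus unsigned distinction can be sketched in a few lines of plain C. The function and constant names here are invented for illustration. The second function assumes a two's complement platform, which covers every iOS device.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Four 8-bit channels make up one 32-bit BGRA pixel. */
enum {
    CHANNELS_PER_PIXEL = 4,
    BITS_PER_PIXEL = CHANNELS_PER_PIXEL * CHAR_BIT
};

/* Read the byte at `address` as a color channel in the range 0..255. */
int channel_value(const void *address)
{
    assert(address != NULL);
    return *(const unsigned char *)address;
}

/* Read the very same byte through a signed char pointer instead.
   Anything above 127 comes back as a negative number, which is
   not what we want for a color channel. */
int signed_value(const void *address)
{
    assert(address != NULL);
    return *(const signed char *)address;
}
```

For a byte holding 200, channel_value returns 200 while signed_value returns -56, so an unsigned type is the safer choice for channel data.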

On line 11 we use a pointer to reference the starting address of the pixel buffer in memory. A char pointer is used here for the same reason it is used for our pixel variable. The pixel buffer is just a series of repeating BGRA values stored row after row (each row may end with a few bytes of padding, which is why we asked for the number of bytes per row instead of assuming it), so by setting the rowBase variable to the first memory address in the buffer, we'll be able to start looping over all the values next.

Lines 13 - 14 form a nested loop that will iterate over every pixel value in the pixel buffer.

On line 16 we actually point the pixel variable at the beginning memory address of the current BGRA sequence. From here we'll be able to reference each byte in the sequence.

Finally, on line 21, we unlock the pixel buffer after completing our processing.
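The walkthrough above can be sketched in plain C against an ordinary byte array, which stands in for the locked pixel buffer so the loop can be compiled and tried away from the device. This is only an illustration under that assumption, not the tutorial's original listing. The names for_each_pixel, PixelVisitor, and count_opaque are hypothetical, and on the device the base address, width, height, and bytes per row would come from the CVPixelBuffer functions between the lock and unlock calls.

```c
#include <assert.h>
#include <stddef.h>

enum { BGRA_BYTES_PER_PIXEL = 4 }; /* B, G, R, A */

/* Called once per pixel; `pixel` points at that pixel's blue byte. */
typedef void (*PixelVisitor)(const unsigned char *pixel, void *context);

/* Walk every pixel of a BGRA buffer, one row at a time. */
void for_each_pixel(const unsigned char *baseAddress,
                    size_t width, size_t height, size_t bytesPerRow,
                    PixelVisitor visit, void *context)
{
    assert(baseAddress != NULL && visit != NULL);
    assert(bytesPerRow >= width * BGRA_BYTES_PER_PIXEL);

    const unsigned char *rowBase = baseAddress;
    for (size_t row = 0; row < height; row++) {
        for (size_t column = 0; column < width; column++) {
            /* Point at the first byte of the current BGRA sequence. */
            const unsigned char *pixel =
                rowBase + column * BGRA_BYTES_PER_PIXEL;
            visit(pixel, context);
        }
        /* Advance by the full row stride so any padding bytes at the
           end of the row are skipped rather than read as pixels. */
        rowBase += bytesPerRow;
    }
}

/* Example visitor: counts the pixels whose alpha byte is fully opaque.
   `context` must point at a size_t counter. */
void count_opaque(const unsigned char *pixel, void *context)
{
    if (pixel[3] == 255) {
        (*(size_t *)context)++;
    }
}
```

Advancing rowBase by bytesPerRow, rather than by width multiplied by four, is the detail that keeps the loop correct when rows are padded.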

Step 5: Find Frame Color Average

Iterating over the pixel buffer won't do us much good unless we make use of the information. For our project we're looking to return a single color that represents the average of all pixels in the frame. Start by adding a currentColor variable to the .h file:

Be sure to also synthesize this value:

Next, modify the findColorAverage: method like so:

You can see that we've started off our changes by adding the variables red_sum, green_sum, blue_sum, alpha_sum, and count. Calculating an average for pixel values is done the same way that you would calculate an average for anything else. So our RGBA sum variables will hold the total sum of each value as we iterate, and the count variable will hold the total pixel count, incrementing each time through the loop.

Pixel assignment actually occurs on lines 24 - 26. Because our pixel variable is just a pointer to a particular byte in memory, we're able to access subsequent memory addresses just like you might expect to be able to with an array. Note that the BGRA ordering is what you would expect for index values: B = 0, G = 1, R = 2, A = 3. We won't actually use the alpha value for anything useful in this tutorial, but I've included it here for the sake of completeness.

After our nested loop has finished iterating through the buffer, it's time to set the color result generated as the current color. This is just elementary math. The sum of each RGB value is divided by the total number of pixels in the image to generate the average. Because the UIColor method call expects float values between 0.0 and 1.0 and we've been dealing with integers between 0 and 255, we divide again by 255 to get the equivalent value as a float.
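The averaging math described above can be sketched in plain C as follows. As before, an ordinary byte array stands in for the locked pixel buffer, and the AverageColor struct and average_color_bgra function are illustrative names rather than anything from the tutorial's source. In the app, the four resulting floats are what would be handed to UIColor.

```c
#include <assert.h>
#include <stddef.h>

/* Averages expressed as floats between 0.0 and 1.0. */
typedef struct {
    float red;
    float green;
    float blue;
    float alpha;
} AverageColor;

AverageColor average_color_bgra(const unsigned char *baseAddress,
                                size_t width, size_t height,
                                size_t bytesPerRow)
{
    assert(baseAddress != NULL);
    assert(width > 0 && height > 0);   /* avoids dividing by zero */
    assert(bytesPerRow >= width * 4);

    /* Wide integer sums: a full frame holds far more pixels than
       would fit in a small type once multiplied by 255. */
    unsigned long long blueSum = 0, greenSum = 0, redSum = 0, alphaSum = 0;
    unsigned long long count = 0;

    const unsigned char *rowBase = baseAddress;
    for (size_t row = 0; row < height; row++) {
        for (size_t column = 0; column < width; column++) {
            const unsigned char *pixel = rowBase + column * 4;
            blueSum  += pixel[0];
            greenSum += pixel[1];
            redSum   += pixel[2];
            alphaSum += pixel[3];
            count++;
        }
        rowBase += bytesPerRow;
    }

    /* sum / count gives the 0..255 average; dividing by 255 again
       rescales it to the 0.0..1.0 range. */
    AverageColor average;
    average.red   = (float)redSum   / (float)count / 255.0f;
    average.green = (float)greenSum / (float)count / 255.0f;
    average.blue  = (float)blueSum  / (float)count / 255.0f;
    average.alpha = (float)alphaSum / (float)count / 255.0f;
    return average;
}
```

For example, a frame that is half pure red and half black averages out to a red component of 0.5 with green and blue at 0.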

At this point, each frame from the camera is being processed and currentColor holds the average of all colors from each frame! Pretty cool, but so far we haven't really augmented anything. The user has no way to benefit from this information until we provide an overlay with the data. We'll do that next.

Step 6: Add Interface Overlays

To help users make sense of the information we've calculated, we're going to create an overlay with three objects: a UILabel to hold the hex representation of the color, a UILabel to hold the RGB representation of the color, and a UIView to actually display the color calculated.

Open the ARDemoViewController.xib file and add the two labels. Set the background color for each to black and the font color to white. This will ensure that it stands out on any background. Then set the font to something like Helvetica and increase the size to around 28. We want the text to be easily visible. Connect these labels to IBOutlets in ARDemoViewController.h, making sure they are also synthesized in ARDemoViewController.m (Interface Builder can now do this for you with drag-and-drop). Name one label hexLabel and the other one rgbLabel.

While still in Interface Builder, drag-and-drop a UIView onto the main view and adjust it to be of the size and in the position of your choosing. Connect the view as an IBOutlet and name it colorSwatch.

After you've completed this step, your XIB file should look something like this:

Each object that you added should also be connected via IBOutlets to ARDemoViewController.

The last thing to do is to make sure that these objects are visible after we add the preview layer to the screen. To do this, add the following lines to -viewDidLoad in ARDemoViewController:

Step 7: Create a CADisplayLink

Because the findColorAverage: function needs to execute as quickly as possible so that late frames are not dropped from the queue, doing any interface work in that function is not advisable. Instead, findColorAverage: simply calculates the average color of the frame and saves it for later use. We can now set up a second function that will actually do something with the currentColor value, and we could even place that processing on a separate thread if needed. For this project, we just want to update the interface overlay about 4 times per second. We could use an NSTimer for this purpose, but I prefer to use a CADisplayLink whenever possible because it fires in sync with the display's refresh, which makes its timing more consistent than NSTimer's.

In the ARDemoViewController implementation file, add the following to the -viewDidLoad method:

CADisplayLink fires an update each time the display refreshes, which is 60 times per second on a 60 Hz display, so by setting the frame interval to 15, we'll initiate a call to the selector updateColorDisplay 4 times each second.
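The arithmetic behind that number is simple enough to write down. This is a tiny plain C sketch with made-up function names, assuming the 60 Hz refresh rate discussed above.

```c
#include <assert.h>

/* How many callbacks per second a display link delivers when it
   fires once every `frameInterval` screen refreshes. */
int callbacks_per_second(int refreshRate, int frameInterval)
{
    assert(refreshRate > 0 && frameInterval > 0);
    return refreshRate / frameInterval;
}

/* The inverse: which frame interval gives roughly the desired
   number of updates per second. */
int frame_interval_for(int refreshRate, int desiredUpdatesPerSecond)
{
    assert(refreshRate > 0 && desiredUpdatesPerSecond > 0);
    return refreshRate / desiredUpdatesPerSecond;
}
```

With a refresh rate of 60 and a frame interval of 15 this gives 4 callbacks per second, and asking for 4 updates per second gives back a frame interval of 15.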

To prevent this from causing a run-time error, let's go ahead and add a selector stub:

Step 8: Update Interface Values

We're now ready to actually augment our display with quasi-useful information about the world around us! Add the following lines of code to updateColorDisplay:

What we're doing above is really pretty straightforward. Because currentColor is stored as a UIColor object, we can just set the backgroundColor property of colorSwatch to it directly. For both labels, we just set up a custom NSString format and make use of the UIColor-Expanded category to easily access both the hex representation of the color and the RGB values.
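If you're curious what goes into strings like these, the conversion can be sketched in plain C. To be clear, this is not how the UIColor-Expanded category is implemented. It is a stand-in that assumes color components between 0.0 and 1.0, and the function names and exact label formats are my own choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a 0.0..1.0 component to 0..255, rounding to nearest. */
static int component_to_byte(float component)
{
    if (component < 0.0f) component = 0.0f;
    if (component > 1.0f) component = 1.0f;
    return (int)(component * 255.0f + 0.5f);
}

/* Write a hex string such as "#FF0080" into `out`.
   Returns the number of characters written, excluding the terminator. */
int format_hex_label(float red, float green, float blue,
                     char *out, size_t outSize)
{
    assert(out != NULL && outSize >= 8);
    return snprintf(out, outSize, "#%02X%02X%02X",
                    component_to_byte(red),
                    component_to_byte(green),
                    component_to_byte(blue));
}

/* Write an RGB string such as "R: 255 G: 0 B: 128" into `out`.
   Returns the number of characters written, excluding the terminator. */
int format_rgb_label(float red, float green, float blue,
                     char *out, size_t outSize)
{
    assert(out != NULL && outSize >= 21);
    return snprintf(out, outSize, "R: %d G: %d B: %d",
                    component_to_byte(red),
                    component_to_byte(green),
                    component_to_byte(blue));
}
```

The %02X format pads each channel to two uppercase hex digits, so the hex label is always seven characters long.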

Step 9: Testing the Application

In order to test your work, I've included 3 simple HTML files in the "test" folder of the project download. Each file has a solid background (red, green, or blue). If you open one of these pages so that it fills your computer monitor and point the iPhone app at the screen, you should see the appropriate color pop up on the AR display.

Wrap Up

Congratulations! You've built your first real Augmented Reality application!

While we can certainly debate the value of the information we're augmenting our view of the world with, I personally find this project far more interesting than most of the "location-aware" augmented reality apps currently on the market, including the seminal "Tubes" application. Why? Because when I'm in the city looking for a subway, I don't want a "circle-of-the-earth" arrow pointing me through buildings towards my goal. Instead, it is infinitely more practical to simply use directions from Google Maps. For this reason, every location-aware Augmented Reality application I've come across is really little more than a novelty project. While the project we built today is also somewhat of a novelty project, my hope is that it's been an entertaining example of how to begin making your own Augmented Reality applications that can actually add meaningful information to the world around us.

If you found this tutorial helpful, let me know on Twitter: @markhammonds. I'd love to see what you come up with!

Where to Go From Here. . .

As I'm sure you've guessed, so far we've really just touched the surface of what Augmented Reality applications can do. The next step in your education will largely depend on what you would like to achieve with an AR application.

If you're interested in location-aware Augmented Reality applications, you should really be thinking about using an open-source AR toolkit. While it is certainly possible for you to code your own AR toolkit, doing so is a very complex and advanced task. I would encourage you not to re-invent the wheel, and to look at these projects instead:

A few additional Augmented Reality solutions and projects that support marker-based implementations include:

If you're specifically interested in image processing like we demonstrated in this tutorial and would like to get into more advanced feature and object recognition, good starting points include:

Next Time?

As I hope you can see from the content of this tutorial, Augmented Reality is an incredibly broad topic with lots of different avenues to pursue, from image processing and object recognition to location finding and even 3D gaming.

We are certainly interested in covering all of the above on Mobiletuts+, but we want to make sure our content is relevant to what readers would like to see. Let us know what aspect of Augmented Reality you would like to see more tutorials on by either leaving a comment on Mobiletuts+, posting to our Facebook group wall, or messaging me directly on Twitter: @markhammonds.

Until next time. . .thanks for reading!