
2.3 Core ML With Live Video

In this lesson, I'll show you how to extend our image recognition system to use data from a live video source by using Apple's ARKit framework.



Hi, and welcome back to Image Recognition on iOS 11 With Core ML. In this lesson, we are going to do image recognition using live video. I'm going to jump right into the course project this time, as we covered all the basics in the last lesson. The only difference is what we supply as data to the request handler.

Let's start off in Interface Builder. I'm going to use an ARKit scene view here, since it's the most likely application and provides easy access to the camera. Let's constrain it to the image view; my plan is to show or hide them depending on which one is active. I also need a bar button item, let's make it a play button, and add some spacing between the buttons, so one is on the left and the other on the right.

In the view controller, I first have to import ARKit and then add an outlet for the scene view. At the bottom of the class, I'm also adding an IBAction named toggleCamera that takes care of starting and stopping the ARKit session. It's simple enough that I'm going to write it right now. First, we check whether the image view is hidden; it serves as a handy state indicator. If it is hidden, we should show it, hide the scene view, and pause the AR session on the scene view. In the else block, we can hide the image view, show the scene view, and start the AR session with a standard world-tracking configuration. If you want to learn more about how this works, check out my course on ARKit here on Tuts+.

Okay, it's time to hook up the outlet and action in Interface Builder. Before I forget, let's also add the NSCameraUsageDescription key to the Info.plist, since we need that to use the phone's camera.

You have seen how long it takes to classify a static image, so doing classification on every frame is not practical. Instead, I'm going to add a tap gesture recognizer in viewDidLoad that starts classifying the current image after the user taps on the screen. Since we are using this function in a selector, we have to prefix it with @objc.
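The controller pieces described so far can be sketched roughly as follows. This is a minimal sketch, not the course's exact code: the outlet names (imageView, sceneView), the action name, and the classifyCurrentFrame selector are assumptions based on the narration.

```swift
import UIKit
import ARKit

class ViewController: UIViewController {
    // Assumed outlet names, hooked up in Interface Builder.
    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Classify only when the user taps, not on every frame.
        let tap = UITapGestureRecognizer(target: self,
                                         action: #selector(classifyCurrentFrame))
        sceneView.addGestureRecognizer(tap)
    }

    // Toggles between the static image view and the live AR camera feed.
    @IBAction func toggleCamera(_ sender: Any) {
        if imageView.isHidden {
            // Camera is active: switch back to the static image and pause AR.
            imageView.isHidden = false
            sceneView.isHidden = true
            sceneView.session.pause()
        } else {
            // Start the live feed with a standard world-tracking configuration.
            imageView.isHidden = true
            sceneView.isHidden = false
            sceneView.session.run(ARWorldTrackingConfiguration())
        }
    }

    // Prefixed with @objc because it is used as a selector target.
    @objc func classifyCurrentFrame() {
        // Classification code goes here.
    }
}
```

Remember that running the AR session will fail silently without the NSCameraUsageDescription entry in Info.plist.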
Here, we can just copy the code from the performClassification method. Instead of a CIImage, we are going to use a pixel buffer, with a fixed mirrored orientation and no options. To get the pixel buffer, we can ask the scene view's session for the current frame and grab its captured image. Since this can be nil, let's wrap it in a guard clause. All that's left to do is to use the VGG model for the request, and we're off to the races. It's also a good idea to update the label; I'm using it to indicate that classification is in progress.

Let me build and run, and try classifying some objects, like this keyboard. Well, it doesn't like it at the beginning, but in the end it is fairly confident that it has recognized the keyboard.

This is image recognition with live video in a nutshell. To recap, you can use ARKit and its scene view to easily capture the current frame. The Vision framework can do much more with ARKit, like detecting faces, which you can then track. The next lesson is a bonus that doesn't have anything to do with image recognition, but it also benefits from the additions to Core ML: natural language processing. See you there.
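The tap handler described above might look something like this. It's a sketch under assumptions: the classificationLabel outlet and the .upMirrored orientation are guesses based on the narration (pick whichever orientation matches your capture setup), and VGG16 stands for whatever Core ML model is compiled into your project.

```swift
import ARKit
import Vision

extension ViewController {
    @objc func classifyCurrentFrame() {
        // Ask the scene view's session for the current frame and grab its
        // captured image. This can be nil before the session has produced
        // a frame, hence the guard clause.
        guard let pixelBuffer = sceneView.session.currentFrame?.capturedImage else {
            return
        }

        // Update the label to show that classification is in progress.
        classificationLabel.text = "Classifying…"  // hypothetical label outlet

        // Wrap the compiled Core ML model (VGG16 here) for use with Vision.
        guard let model = try? VNCoreMLModel(for: VGG16().model) else { return }
        let request = VNCoreMLRequest(model: model) { request, _ in
            guard let best = (request.results as? [VNClassificationObservation])?.first
            else { return }
            DispatchQueue.main.async {
                self.classificationLabel.text =
                    "\(best.identifier): \(best.confidence)"
            }
        }

        // A pixel buffer instead of a CIImage, with a fixed mirrored
        // orientation and no options.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .upMirrored,
                                            options: [:])
        DispatchQueue.global(qos: .userInitiated).async {
            try? handler.perform([request])
        }
    }
}
```

Performing the request off the main queue keeps the UI responsive, since classification can take a noticeable amount of time even for a single frame.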
