2.2 Using the Vision Framework for Image Recognition
Core ML is a base framework that other frameworks and libraries use for more specific work. In this lesson, you’ll learn about Vision, Apple’s framework for image recognition.
1. Introduction (2 lessons, 10:35)
2. Image Recognition With Core ML (3 lessons, 23:32)
3. Bonus: Natural Language Processing With Core ML (1 lesson, 05:58)
4. Conclusion (1 lesson, 01:03)
Hi, and welcome back to Image Recognition on iOS with Core ML. In this lesson, we are going to use a neural network to classify image data with the help of the Vision framework. Vision is Apple's high-performance image analysis toolkit for feature and face detection, classification, and so on.

Let's look at what Vision provides to connect with Core ML. First of all, there is the VNCoreMLRequest class, which is used to run a Core ML model on image data. There are other request classes for detecting barcodes, text, and other features. VNCoreMLRequest takes a Core ML model, which you pass into the initializer by instantiating the autogenerated class and getting the model property from it. Since classification tasks take some time, the initializer also has a completion handler parameter that returns the results of the process. There is also an option for how to handle cropping and scaling of images that don't fit into the target frame: you can center and crop, aspect fill, or aspect fit.

There are two classes you can use to process these requests: VNImageRequestHandler and VNSequenceRequestHandler. The latter is used if you want to handle a sequence of images, like video. The request handler, whichever you choose, has a perform function that takes an array of requests. That's right, you can run multiple requests at once, for instance multiple classifiers, and combine the results. After processing the request, the results will normally contain a confidence score and the accompanying category in a VNClassificationObservation object.

So, let's implement image classification in the course project. First of all, I'm going to need a toolbar. We can reuse the bar button item that's on it and change it to the camera icon. It's a little bit misleading, but UI design is not the focus of this course. I also need an image view that will display the image for visual feedback. In my example, I'm going to use not one but two classification models to compare them. One is going to be a five-megabyte model called SqueezeNet.
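As a rough sketch of how the Vision pieces described so far fit together, the following function runs several Core ML classifiers on one image through a single request handler. The function name classify and its parameters are made up for illustration and are not taken from the course project. It also reads the results after perform returns instead of using a completion handler, which is a simplification.

```swift
import CoreML
import Vision

/// Hypothetical helper: runs every model in `models` on `image` with one
/// request handler and returns the observations per model, in the same order.
func classify(_ image: CGImage, using models: [MLModel]) throws -> [[VNClassificationObservation]] {
    // One VNCoreMLRequest per model; each model is wrapped in a VNCoreMLModel.
    let requests = try models.map { model -> VNCoreMLRequest in
        let request = VNCoreMLRequest(model: try VNCoreMLModel(for: model))
        // How Vision fits the image to the model's input size.
        request.imageCropAndScaleOption = .centerCrop
        return request
    }

    // A single handler can perform an array of requests on the same image.
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform(requests)

    // perform(_:) is synchronous, so the results are available here.
    return requests.map { ($0.results as? [VNClassificationObservation]) ?? [] }
}
```

Each observation carries a confidence value and an identifier, so the caller can compare what the different classifiers say about the same picture.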
The other one is a 550-megabyte model called VGG16. To display the results, I'm going to add a label for each. The label gets a semi-transparent light background so it is visible on the screen, and its text is center-aligned. I can copy it for the second classifier.

Let's switch to the view controller and add the outlets. I'm calling one vggClassificationLabel and the other squeezeClassificationLabel. I'm also going to add an outlet for the image view. Finally, we need an action to handle the button being tapped. I'm going to call that takePicture. Then we just hook everything up in Interface Builder, and we are ready to implement our classification.

But before that, I'm going to start by adding an image picker. To show it, we create an image picker controller, set the view controller as its delegate, and define the source type as photo library. Then we can present it. In an extension, I'm going to add the UIImagePickerControllerDelegate protocol, as well as the UINavigationControllerDelegate one, and add the imagePickerController(_:didFinishPickingMediaWithInfo:) callback function. Within that function, I'm going to dismiss the picker first of all. Then I can fetch the image from the info dictionary with the UIImagePickerControllerOriginalImage key and force-cast it to a UIImage. As a final feedback step, I'm displaying the image in the image view.

Now the fun part starts. We are going to call a function called updateClassifications to get the new calculations from the neural networks. In this function, I am first going to change the labels to "Classifying", so we can visually see when the process starts and finishes. The updateClassifications function mainly acts as a dispatcher to perform the classification for each model. To improve the classification results, we need to pass the image orientation along. There is an imageOrientation property on UIImage, but unfortunately, we can't just use that.
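The image picker steps described above could look roughly like this. It is a minimal sketch, not the course project's code: it assumes a recent SDK where the info dictionary is keyed by UIImagePickerController.InfoKey (the lesson uses the older UIImagePickerControllerOriginalImage constant), it uses a conditional cast instead of the force cast mentioned in the lesson, and updateClassifications is left as an empty placeholder.

```swift
import UIKit

class ViewController: UIViewController {
    @IBOutlet weak var imageView: UIImageView!

    // Connected to the toolbar button in Interface Builder.
    @IBAction func takePicture(_ sender: Any) {
        let picker = UIImagePickerController()
        picker.delegate = self
        picker.sourceType = .photoLibrary
        present(picker, animated: true)
    }

    // Placeholder: the classification itself is implemented later in the lesson.
    func updateClassifications(for image: UIImage) {
    }
}

extension ViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        // Dismiss the picker first, then read the picked image.
        picker.dismiss(animated: true)
        guard let image = info[.originalImage] as? UIImage else { return }

        // Show the image for visual feedback, then classify it.
        imageView.image = image
        updateClassifications(for: image)
    }
}
```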
VNImageRequestHandler expects a CGImagePropertyOrientation enum. There is no built-in converter from the UIImage orientation, but we can add that simply enough with an extension. Since Vision doesn't accept UIImages, we need to convert them to CIImages. There is a convenience initializer for this, and I can put that into a guard clause. I'm going to create a new Swift file and paste in a code snippet that converts a UIImage orientation to the corresponding one in Core Graphics.

Back in the view controller, I'm going to call a second function twice, once per model. This function also receives the image orientation and the model, or rather the Core ML request, which I'm going to store in a property.

Now is a good time to add the two models to the project. Since one is so big, I haven't included them in the repository, but you can download them from Apple's website using the links provided in the lesson notes. Just make sure that you check "Copy items if needed" when adding the files.

The request properties will be lazily initialized. Since almost everything can throw here, I'm going to add a do-catch block around everything. To handle the model, we need a wrapper from the Vision framework called VNCoreMLModel, and we pass the model into it. Then we need the request, which takes the wrapped model and a completion handler. The handler provides the request and an optional error. Within it, I'm going to call displayClassification, passing the request, the error, and the label. Since I am referring to self here, I need to use weak self to avoid retain cycles. Then I can set the image crop and scale option. I'm choosing scale fill, and then I return the request. I don't want to deal with errors right now, so I'm just calling fatalError. I can copy and paste this code for the SqueezeNet model property and just swap out the model and label.

Let's go back down to our processing function. Classification can take some time, so we have to do this on a separate thread. That's why I'm using DispatchQueue here.
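Here is one way the orientation converter and the request setup described above might be written. The extension mirrors the usual case-by-case mapping and uses the current UIImage.Orientation name (it was UIImageOrientation in older Swift versions). The makeClassificationRequest helper is a hypothetical stand-in for the lazy properties in the lesson, taking the model as a parameter so the sketch does not depend on the generated model classes.

```swift
import UIKit
import CoreML
import Vision

extension CGImagePropertyOrientation {
    /// Converts a UIKit orientation to the matching Core Graphics one.
    /// The raw values of the two enums differ, so each case is mapped by name.
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .upMirrored: self = .upMirrored
        case .down: self = .down
        case .downMirrored: self = .downMirrored
        case .left: self = .left
        case .leftMirrored: self = .leftMirrored
        case .right: self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

/// Hypothetical helper: wraps `model` for Vision and builds a request that
/// reports back through `completion`.
func makeClassificationRequest(for model: MLModel,
                               completion: @escaping (VNRequest, Error?) -> Void) -> VNCoreMLRequest {
    do {
        let visionModel = try VNCoreMLModel(for: model)
        let request = VNCoreMLRequest(model: visionModel, completionHandler: completion)
        request.imageCropAndScaleOption = .scaleFill
        return request
    } catch {
        // As in the lesson, errors are not handled gracefully here.
        fatalError("Failed to load Vision model: \(error)")
    }
}
```

In a view controller, a lazy property could call this helper with the model property of a generated class, capturing self weakly in the completion closure before updating a label.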
I want the results quickly, which is why I set the user-initiated quality of service for the queue. Within the block, we are going to initialize a VNImageRequestHandler with the image and the orientation. Then, within another do-catch block, we can try to perform the request on the handler, passing in the request within an array.

All that's missing now is the display function, so let's add that. Since we are interacting with the UI, we need to do this on the main queue, so another dispatch async is required. First of all, we check whether we have results. If we don't, we can simply show "Couldn't classify" and return. Otherwise, we get an array of VNClassificationObservation objects back. I am only going to take the top two. Next, we check if it's empty. If so, we display "Nothing found". Otherwise, we can create some descriptions with a map, returning a formatted string in the closure that contains the confidence and the category name, which is in the identifier property. Finally, let's set the label text to this classification data.

And now it's time to try it out. I have added a bus and a stealth plane to the image library by simply browsing to each picture in Safari and saving it on the device. You can see that there is quite a difference between the two classification algorithms. The smaller one is faster, but sometimes produces odd results, which becomes obvious when we classify the plane.

To recap, the Vision framework is used for image analysis and provides an interface for interacting with Core ML classifications. You can combine multiple requests in one handler and combine the results. Run image classification in the background, as it takes some time. Different classifiers have different accuracy, so try to find a good trade-off between size and quality.

In our next lesson, I'm going to replace the static images with live video and do classification on the fly. See ya there.
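For reference, the background processing and the display step from this lesson can be sketched as follows. The names performClassification and formatClassifications and the exact label strings are assumptions for illustration; only displayClassification is named in the lesson. The formatting is split into a small pure function so it can be checked without running a model.

```swift
import UIKit
import Vision

/// Hypothetical helper: runs `request` on a background queue.
func performClassification(of image: CIImage,
                           orientation: CGImagePropertyOrientation,
                           with request: VNCoreMLRequest) {
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(ciImage: image, orientation: orientation)
        do {
            try handler.perform([request])
        } catch {
            print("Classification failed: \(error.localizedDescription)")
        }
    }
}

/// Formats the first `limit` results as "(confidence) identifier", one per line.
func formatClassifications(_ results: [(confidence: Float, identifier: String)],
                           limit: Int = 2) -> String {
    let top = results.prefix(limit)
    if top.isEmpty { return "Nothing found." }
    return top.map { result in
        let score = String(format: "%.2f", result.confidence)
        return "(\(score)) \(result.identifier)"
    }.joined(separator: "\n")
}

/// Shows the classification results in `label`; UI work happens on the main queue.
func displayClassification(request: VNRequest, error: Error?, in label: UILabel) {
    DispatchQueue.main.async {
        guard let results = request.results else {
            label.text = "Couldn't classify."
            return
        }
        let observations = results.compactMap { $0 as? VNClassificationObservation }
        label.text = formatClassifications(
            observations.map { (confidence: $0.confidence, identifier: $0.identifier) }
        )
    }
}
```

Keeping the string formatting separate from the Vision types is a design choice of this sketch, not something the lesson does.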