How to Use the Google Cloud Vision API in Android Apps

Computer vision is considered an AI-complete problem. In other words, solving it would be equivalent to creating a program that's as smart as humans. Needless to say, such a program is yet to be created. However, if you've ever used apps like Google Goggles or Google Photos—or watched the segment on Google Lens in the keynote of Google I/O 2017—you probably realize that computer vision has become very powerful.

Through a REST-based API called Cloud Vision API, Google shares its vision-related technologies with all developers. By using the API, you can add features such as face detection, emotion detection, and optical character recognition to your Android apps with relatively little code. In this tutorial, I'll show you how.

Prerequisites

To be able to follow this tutorial, you must have:

a Google Cloud Platform account
a project on the Google Cloud console
the latest version of Android Studio
and a device that runs Android 4.4 or higher

If some of the above requirements sound unfamiliar to you, I suggest you read the following introductory tutorial about the Google Cloud Machine Learning platform:

How to Use Google Cloud Machine Learning Services for Android, by Ashraff Hathibelagal (27 Apr 2017)

1. Enabling the Cloud Vision API

You can use the Cloud Vision API in your Android app only after you've enabled it in the Google Cloud console and acquired a valid API key. So start by logging in to the console and navigating to API Manager > Library > Vision API (newer versions of the console label the API Manager section APIs & Services). In the page that opens, press the Enable button.

If you've already generated an API key for your Cloud console project, you can skip to the next step because you will be able to reuse it with the Cloud Vision API. Otherwise, open the Credentials tab and select Create Credentials > API key.

In the dialog that pops up, you will see your API key.

2. Adding Dependencies

Like most other APIs offered by Google, the Cloud Vision API can be accessed using the Google API Client library. To use the library in your Android Studio project, add the following dependencies in the app module's build.gradle file. Recent versions of the Android Gradle plugin use the implementation configuration; if your project still uses a plugin older than 3.0, write compile instead:

implementation 'com.google.api-client:google-api-client-android:1.22.0'
implementation 'com.google.apis:google-api-services-vision:v1-rev357-1.22.0'
implementation 'com.google.code.findbugs:jsr305:2.0.1'

Furthermore, to simplify file I/O operations, I suggest you also add a dependency for the Apache Commons IO library.

implementation 'commons-io:commons-io:2.5'

Because the Google API Client can work only if your app has the INTERNET permission, make sure the following line is present in your project's manifest file:

<uses-permission android:name="android.permission.INTERNET"/>

3. Configuring the API Client

You must configure the Google API client before you use it to interact with the Cloud Vision API. Doing so primarily involves specifying the API key, the HTTP transport, and the JSON factory it should use. As you might expect, the HTTP transport will be responsible for communicating with Google's servers, and the JSON factory will, among other things, be responsible for converting the JSON-based results the API generates into Java objects.

For modern Android apps, Google recommends that you use the NetHttpTransport class as the HTTP transport and the AndroidJsonFactory class as the JSON factory.

The Vision class represents the Google API Client for Cloud Vision. Although it is possible to create an instance of the class using its constructor, doing so using the Vision.Builder class instead is easier and more flexible.

While using the Vision.Builder class, you must remember to call the setVisionRequestInitializer() method to specify your API key. The following code shows you how:

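Vision.Builder visionBuilder = new Vision.Builder(
        new NetHttpTransport(),
        new AndroidJsonFactory(),
        null); // no HttpRequestInitializer is needed for API key access

// Attach the API key to every request the client makes
visionBuilder.setVisionRequestInitializer(
        new VisionRequestInitializer("YOUR_API_KEY"));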

Once the Vision.Builder instance is ready, you can call its build() method to generate a new Vision instance you can use throughout your app.

Vision vision = visionBuilder.build();

At this point, you have everything you need to start using the Cloud Vision API.
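
If you are curious about what the client does on the wire, the following plain-Java sketch builds the URL of the REST endpoint that annotation requests are sent to, with the API key passed as the key query parameter. The names VisionEndpointSketch and annotateUri are hypothetical and are not part of the Google API Client library, which takes care of all this for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

/** Illustrative only: shows the REST endpoint the client library wraps. */
final class VisionEndpointSketch {
    // Public v1 endpoint for image annotation requests
    static final String ANNOTATE_URL =
            "https://vision.googleapis.com/v1/images:annotate";

    /** Appends the API key as the "key" query parameter. */
    static URI annotateUri(String apiKey) {
        try {
            String encodedKey = URLEncoder.encode(apiKey, "UTF-8");
            return URI.create(ANNOTATE_URL + "?key=" + encodedKey);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(annotateUri("YOUR_API_KEY"));
    }
}
```

Seeing the URL makes it clear why the INTERNET permission and a valid API key are both required before any request can succeed.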

4. Detecting and Analyzing Faces

Detecting faces in photographs is a very common requirement in computer vision-related applications. With the Cloud Vision API, you can create a highly accurate face detector that can also identify emotions, lighting conditions, and face landmarks.

For the sake of demonstration, we'll be running face detection on the following photo, which features the crew of Apollo 9:

I suggest you download a high-resolution version of the photo from Wikimedia Commons and place it in your project's res/raw folder.

Step 1: Encode the Photo

The Cloud Vision API expects its input image to be encoded as a Base64 string that's placed inside an Image object. Before you generate such an object, however, you must convert the photo you downloaded, which is currently a raw image resource, into a byte array. You can quickly do so by opening its input stream using the openRawResource() method of the Resources class and passing it to the toByteArray() method of the IOUtils class.

Because file I/O and network operations should not be run on the UI thread, make sure you switch to a background thread before opening the input stream. The following code shows you how:

// Run on a background thread
AsyncTask.execute(new Runnable() {
    @Override
    public void run() {
        try {
            // Convert photo to byte array
            InputStream inputStream =
                    getResources().openRawResource(R.raw.photo);
            byte[] photoData = IOUtils.toByteArray(inputStream);
            inputStream.close();

            // More code here
        } catch (IOException e) {
            // toByteArray() and close() can throw IOException
            e.printStackTrace();
        }
    }
});
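
AsyncTask has been deprecated since API level 30, so you may prefer a plain executor. The following is a minimal sketch in plain Java, not Android-specific code: BackgroundReadSketch, readFully, and readInBackground are hypothetical names, and the stream passed in stands in for the one returned by openRawResource(). It blocks on the result only to keep the demonstration short; in an app you would hand the bytes to the next step on the background thread instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BackgroundReadSketch {
    /** Copies a stream into a byte array, much like IOUtils.toByteArray(). */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    /** Reads the stream on a background thread and waits for the bytes. */
    static byte[] readInBackground(final InputStream in) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<byte[]> pending = executor.submit(() -> {
                // try-with-resources closes the stream even if reading fails
                try (InputStream stream = in) {
                    return readFully(stream);
                }
            });
            return pending.get(); // blocking here is for demonstration only
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while reading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Reading failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] data = readInBackground(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(data.length + " bytes read");
    }
}
```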

You can now create an Image object by calling its default constructor. To add the byte array to it as a Base64 string, all you need to do is pass the array to its encodeContent() method.

Image inputImage = new Image();
inputImage.encodeContent(photoData);
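
To get a feel for what Base64 encoding does to the bytes, here is a small plain-Java sketch that uses the JDK's java.util.Base64 class as a stand-in. Base64Sketch and encode are hypothetical names, and the exact Base64 variant used internally by encodeContent() may differ from the standard one shown here, so treat this purely as an illustration.

```java
import java.util.Base64;

final class Base64Sketch {
    /** Encodes raw bytes as a standard Base64 string. */
    static String encode(byte[] photoData) {
        return Base64.getEncoder().encodeToString(photoData);
    }

    public static void main(String[] args) {
        // Three bytes become four Base64 characters
        System.out.println(encode(new byte[] {1, 2, 3})); // prints "AQID"
    }
}
```

Because every three bytes turn into four characters, the encoded photo is roughly a third larger than the original file, which is worth keeping in mind when you upload high-resolution images.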

Step 2: Make a Request

Because the Cloud Vision API offers several different features, you must explicitly specify the feature you are interested in while making a request to it. To do so, you must create a Feature object and call its setType() method. The following code shows you how to create a Feature object for face detection only:

Feature desiredFeature = new Feature();
desiredFeature.setType("FACE_DETECTION");

Using the Image and the Feature objects, you can now compose an AnnotateImageRequest instance.

AnnotateImageRequest request = new AnnotateImageRequest();
request.setImage(inputImage);
request.setFeatures(Arrays.asList(desiredFeature));

Note that an AnnotateImageRequest object must always belong to a BatchAnnotateImagesRequest object because the Cloud Vision API is designed to process multiple images at once. To initialize a BatchAnnotateImagesRequest instance containing a single AnnotateImageRequest object, you can use the Arrays.asList() utility method.

BatchAnnotateImagesRequest batchRequest =
        new BatchAnnotateImagesRequest();

batchRequest.setRequests(Arrays.asList(request));

To actually make the face detection request, you must call the execute() method of an Annotate object that's initialized using the BatchAnnotateImagesRequest object you just created. To generate such an object, you must call the annotate() method offered by the Google API Client for Cloud Vision. Here's how:

BatchAnnotateImagesResponse batchResponse =
        vision.images().annotate(batchRequest).execute();
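
The nesting of the Image, Feature, AnnotateImageRequest, and BatchAnnotateImagesRequest objects mirrors the JSON body the REST API expects. The plain-Java sketch below assembles that body by hand for a single image. AnnotateRequestJsonSketch and toJson are hypothetical names; in the app, the JSON factory you configured earlier does this serialization for you.

```java
import java.util.Arrays;
import java.util.List;

final class AnnotateRequestJsonSketch {
    /**
     * Builds the body of a batch request containing one image.
     * No escaping is done, which is safe only because Base64 text and
     * feature type names contain no quotes or backslashes.
     */
    static String toJson(String base64Image, List<String> featureTypes) {
        StringBuilder features = new StringBuilder();
        for (int i = 0; i < featureTypes.size(); i++) {
            if (i > 0) {
                features.append(',');
            }
            features.append("{\"type\":\"")
                    .append(featureTypes.get(i))
                    .append("\"}");
        }
        return "{\"requests\":[{\"image\":{\"content\":\"" + base64Image + "\"},"
                + "\"features\":[" + features + "]}]}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("AQID", Arrays.asList("FACE_DETECTION")));
    }
}
```

The outer requests array is what makes the call a batch: each element pairs one image with the list of features to run on it.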

Step 3: Use the Response

Once the request has been processed, you get a BatchAnnotateImagesResponse object containing the response of the API. For a face detection request, the response contains a FaceAnnotation object for each face the API has detected. You can get a list of all FaceAnnotation objects using the getFaceAnnotations() method.

List<FaceAnnotation> faces = batchResponse.getResponses()
        .get(0).getFaceAnnotations();

A FaceAnnotation object contains a lot of useful information about a face, such as its location, its angle, and the emotion it is expressing. As of version 1, the API can only detect the following emotions: joy, sorrow, anger, and surprise.

To keep this tutorial short, let us now simply display the following information in a Toast:

The count of the faces
The likelihood that they are expressing joy

You can get the count of the faces by calling the size() method of the List containing the FaceAnnotation objects. To get the likelihood of a face expressing joy, you can call the getJoyLikelihood() method of the associated FaceAnnotation object. It returns the likelihood as a string: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.

Note that because a simple Toast can only display a single string, you'll have to concatenate all the above details. Additionally, a Toast can only be displayed from the UI thread, so make sure you show it inside a Runnable that you pass to the runOnUiThread() method. The following code shows you how:

// Count faces
int numberOfFaces = faces.size();

// Get joy likelihood for each face
String likelihoods = "";
for(int i=0; i<numberOfFaces; i++) {
    likelihoods += "\n It is " +
                faces.get(i).getJoyLikelihood() + 
                " that face " + i + " is happy";
}

// Concatenate everything
final String message =
        "This photo has " + numberOfFaces + " faces" + likelihoods;

// Display toast on UI thread
runOnUiThread(new Runnable() {
    @Override
    public void run() {
        Toast.makeText(getApplicationContext(),
                message, Toast.LENGTH_LONG).show();
    }
});
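
If you would rather make a decision based on the likelihoods than just display them, you can rank the likelihood strings. The following plain-Java sketch counts the faces that are at least LIKELY to be joyful. LikelihoodSketch, isAtLeast, and countHappy are hypothetical names, and the input list stands in for the values you would collect by calling getJoyLikelihood() on each face.

```java
import java.util.Arrays;
import java.util.List;

final class LikelihoodSketch {
    // Likelihood values in increasing order of confidence
    private static final List<String> ORDER = Arrays.asList(
            "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY",
            "POSSIBLE", "LIKELY", "VERY_LIKELY");

    /** Returns true if likelihood ranks at or above threshold. */
    static boolean isAtLeast(String likelihood, String threshold) {
        // An unrecognized likelihood has index -1, so it never passes
        // a recognized threshold
        return ORDER.indexOf(likelihood) >= ORDER.indexOf(threshold);
    }

    /** Counts the faces whose joy likelihood is LIKELY or VERY_LIKELY. */
    static int countHappy(List<String> joyLikelihoods) {
        int count = 0;
        for (String likelihood : joyLikelihoods) {
            if (isAtLeast(likelihood, "LIKELY")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("VERY_LIKELY", "POSSIBLE", "LIKELY");
        System.out.println(countHappy(sample) + " happy faces"); // prints "2 happy faces"
    }
}
```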

You can now go ahead and run the app to see the following result:

5. Reading Text

The process of extracting strings from photos of text is called optical character recognition, or OCR for short. The Cloud Vision API allows you to easily create an optical character reader that can handle photos of both printed and handwritten text. What's more, the reader you create will have no trouble reading angled text or text that's overlaid on a colorful picture.

The API offers two different features for OCR:

TEXT_DETECTION, for reading small amounts of text, such as that present on signboards or book covers
and DOCUMENT_TEXT_DETECTION, for reading large amounts of text, such as that present on the pages of a novel

The steps you need to follow in order to make an OCR request are identical to the steps you followed to make a face detection request, except for how you initialize the Feature object. For OCR, you must set its type to either TEXT_DETECTION or DOCUMENT_TEXT_DETECTION. For now, let's go with the former.

Feature desiredFeature = new Feature();
desiredFeature.setType("TEXT_DETECTION");

You will, of course, also have to place a photo containing text inside your project's res/raw folder. If you don't have such a photo, you can use this one, which shows a street sign:

You can download a high-resolution version of the above photo from Wikimedia Commons.

In order to start processing the results of an OCR operation, after you obtain the BatchAnnotateImagesResponse object, you must call the getFullTextAnnotation() method to get a TextAnnotation object containing all the extracted text.

final TextAnnotation text = batchResponse.getResponses()
        .get(0).getFullTextAnnotation();

You can then call the getText() method of the TextAnnotation object to actually get a reference to a string containing the extracted text.

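The following code shows you how to display the extracted text using a Toast. As with the face detection example, it must run on the UI thread, so place it inside a Runnable that you pass to the runOnUiThread() method: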

Toast.makeText(getApplicationContext(),
        text.getText(), Toast.LENGTH_LONG).show();
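
The extracted text usually arrives as a single string with line breaks in it, and my assumption is that there may be no text annotation at all when the photo contains no readable text. The plain-Java sketch below shows one defensive way to turn such a string into a list of lines. OcrTextSketch and nonBlankLines are hypothetical names, and the argument stands in for the value returned by getText(), or null if no annotation is available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OcrTextSketch {
    /** Splits extracted text into trimmed, non-empty lines; null gives an empty list. */
    static List<String> nonBlankLines(String extractedText) {
        List<String> lines = new ArrayList<>();
        if (extractedText == null) {
            return lines; // nothing was recognized
        }
        for (String line : extractedText.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(nonBlankLines("STOP\n\n Main St \n")); // prints "[STOP, Main St]"
    }
}
```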

If you run your app now, you should see something like this:

Conclusion

In this tutorial, you learned how to use the Cloud Vision API to add face detection, emotion detection, and optical character recognition capabilities to your Android apps. These capabilities can help your apps offer more intuitive and smarter user interfaces.

It's worth mentioning that there's one important feature that's missing in the Cloud Vision API: face recognition. In its current form, the API can only detect faces, not identify them.

To learn more about the API, you can refer to the official documentation.

And meanwhile, check out some of our other tutorials on adding machine learning to your Android apps!