1. Code
  2. Android SDK

How to Use the Google Cloud Vision API in Android Apps

Computer vision is considered an AI-complete problem. In other words, solving it would be equivalent to creating a program that's as smart as humans. Needless to say, such a program is yet to be created. However, if you've ever used apps like Google Goggles or Google Photos—or watched the segment on Google Lens in the keynote of Google I/O 2017—you probably realize that computer vision has become very powerful.

Through a REST-based API called Cloud Vision API, Google shares its revolutionary vision-related technologies with all developers. By using the API, you can effortlessly add impressive features such as face detection, emotion detection, and optical character recognition to your Android apps. In this tutorial, I'll show you how.


To be able to follow this tutorial, you must have:

If some of the above requirements sound unfamiliar to you, I suggest you read the following introductory tutorial about the Google Cloud Machine Learning platform:

1. Enabling the Cloud Vision API

You can use the Cloud Vision API in your Android app only after you've enabled it in the Google Cloud console and acquired a valid API key. So start by logging in to the console and navigating to API Manager > Library > Vision API. In the page that opens, simply press the Enable button.

Enable Cloud Vision API

If you've already generated an API key for your Cloud console project, you can skip to the next step because you will be able to reuse it with the Cloud Vision API. Otherwise, open the Credentials tab and select Create Credentials > API key.

Create API key

In the dialog that pops up, you will see your API key.

2. Adding Dependencies

Like most other APIs offered by Google, the Cloud Vision API can be accessed using the Google API Client library. To use the library in your Android Studio project, add the following compile dependencies in the app module's build.gradle file:

Furthermore, to simplify file I/O operations, I suggest you also add a compile dependency for the Apache Commons IO library.

Because the Google API Client can work only if your app has the INTERNET permission, make sure the following line is present in your project's manifest file:

3. Configuring the API Client

You must configure the Google API client before you use it to interact with the Cloud Vision API. Doing so primarily involves specifying the API key, the HTTP transport, and the JSON factory it should use. As you might expect, the HTTP transport will be responsible for communicating with Google's servers, and the JSON factory will, among other things, be responsible for converting the JSON-based results the API generates into Java objects. 

For modern Android apps, Google recommends that you use the NetHttpTransport class as the HTTP transport and the AndroidJsonFactory class as the JSON factory.

The Vision class represents the Google API Client for Cloud Vision. Although it is possible to create an instance of the class using its constructor, doing so using the Vision.Builder class instead is easier and more flexible.

While using the Vision.Builder class, you must remember to call the setVisionRequestInitializer() method to specify your API key. The following code shows you how:

Once the Vision.Builder instance is ready, you can call its build() method to generate a new Vision instance you can use throughout your app.

At this point, you have everything you need to start using the Cloud Vision API.

4. Detecting and Analyzing Faces

Detecting faces in photographs is a very common requirement in computer vision-related applications. With the Cloud Vision API, you can create a highly accurate face detector that can also identify emotions, lighting conditions, and face landmarks.

For the sake of demonstration, we'll be running face detection on the following photo, which features the crew of Apollo 9:

Sample photo for face detection

I suggest you download a high-resolution version of the photo from Wikimedia Commons and place it in your project's res/raw folder.

Step 1: Encode the Photo

The Cloud Vision API expects its input image to be encoded as a Base64 string that's placed inside an Image object. Before you generate such an object, however, you must convert the photo you downloaded, which is currently a raw image resource, into a byte array. You can quickly do so by opening its input stream using the openRawResource() method of the Resources class and passing it to the toByteArray() method of the IOUtils class.

Because file I/O operations should not be run on the UI thread, make sure you spawn a new thread before opening the input stream. The following code shows you how:

You can now create an Image object by calling its default constructor. To add the byte array to it as a Base64 string, all you need to do is pass the array to its encodeContent() method.

Step 2: Make a Request

Because the Cloud Vision API offers several different features, you must explicitly specify the feature you are interested in while making a request to it. To do so, you must create a Feature object and call its setType() method. The following code shows you how to create a Feature object for face detection only:

Using the Image and the Feature objects, you can now compose an AnnotateImageRequest instance.

Note that an AnnotateImageRequest object must always belong to a BatchAnnotateImagesRequest object because the Cloud Vision API is designed to process multiple images at once. To initialize a BatchAnnotateImagesRequest instance containing a single AnnotateImageRequest object, you can use the Arrays.asList() utility method.

To actually make the face detection request, you must call the execute() method of an Annotate object that's initialized using the BatchAnnotateImagesRequest object you just created. To generate such an object, you must call the annotate() method offered by the Google API Client for Cloud Vision. Here's how:

Step 3: Use the Response

Once the request has been processed, you get a BatchAnnotateImagesResponse object containing the response of the API. For a face detection request, the response contains a FaceAnnotation object for each face the API has detected. You can get a list of all FaceAnnotation objects using the getFaceAnnotations() method.

A FaceAnnotation object contains a lot of useful information about a face, such as its location, its angle, and the emotion it is expressing. As of version 1, the API can only detect the following emotions: joy, sorrow, anger, and surprise.

To keep this tutorial short, let us now simply display the following information in a Toast:

  • The count of the faces
  • The likelihood that they are expressing joy

You can, of course, get the count of the faces by calling the size() method of the List containing the FaceAnnotation objects. To get the likelihood of a face expressing joy, you can call the intuitively named getJoyLikelihood() method of the associated FaceAnnotation object. 

Note that because a simple Toast can only display a single string, you'll have to concatenate all the above details. Additionally, a Toast can only be displayed from the UI thread, so make sure you call it after calling the runOnUiThread() method. The following code shows you how:

You can now go ahead and run the app to see the following result:

Face detection results

5. Reading Text

The process of extracting strings from photos of text is called optical character recognition, or OCR for short. The Cloud Vision API allows you to easily create an optical character reader that can handle photos of both printed and handwritten text. What's more, the reader you create will have no trouble reading angled text or text that's overlaid on a colorful picture.

The API offers two different features for OCR:

  • TEXT_DETECTION, for reading small amounts of text, such as that present on signboards or book covers
  • and DOCUMENT_TEXT_DETECTION, for reading large amounts of text, such as that present on the pages of a novel

The steps you need to follow in order to make an OCR request are identical to the steps you followed to make a face detection request, except for how you initialize the Feature object. For OCR, you must set its type to either TEXT_DETECTION or DOCUMENT_TEXT_DETECTION. For now, let's go with the former.

You will, of course, also have to place a photo containing text inside your project's res/raw folder. If you don't have such a photo, you can use this one, which shows a street sign:

Sample photo for text detection

You can download a high-resolution version of the above photo from Wikimedia Commons.

In order to start processing the results of an OCR operation, after you obtain the BatchAnnotateImagesResponse object, you must call the getFullTextAnnotation() method to get a TextAnnotation object containing all the extracted text.

You can then call the getText() method of the TextAnnotation object to actually get a reference to a string containing the extracted text.

The following code shows you how to display the extracted text using a Toast:

If you run your app now, you should see something like this:

Text detection results


In this tutorial you learned how to use the Cloud Vision API to add face detection, emotion detection, and optical character recognition capabilities to your Android apps. I'm sure you'll agree with me when I say that these new capabilities will allow your apps to offer more intuitive and smarter user interfaces.

It's worth mentioning that there's one important feature that's missing in the Cloud Vision API: face recognition. In its current form, the API can only detect faces, not identify them.

To learn more about the API, you can refer to the official documentation.

And meanwhile, check out some of our other tutorials on adding computer learning to your Android apps!

Looking for something to help kick start your next project?
Envato Market has a range of items for sale to help get you started.