1.2 Create a Voice-Controlled Android App
In this lesson, we start by creating a new app with a very minimal user interface. Then we’ll add speech synthesis and speech recognition capabilities. Finally, we’ll add the code that lets the app understand and respond to the user’s voice commands.
[SOUND] Hello, and welcome back to this Coffee Break Course. Let us start by creating a new Android Studio project. I'm going to call this MyVoiceApp. Voice capabilities have been a part of the Android platform for a long time, so we can choose to support older SDKs. For now, I'll choose API level 15, to make sure that this app runs on a majority of Android devices. Here, choose Basic Activity, which has a floating action button. Let's go with the defaults and press Finish.

The users will be pressing the floating action button to start the speech recognition process. So let us now add a mic icon to it. To do so, first create a new vector asset by going to the Vector Asset Studio. Here, choose the AV category and select this mic icon. It's called ic_mic_black. Press OK, Next, and Finish.

The floating action button is defined inside activity_main.xml, so open it now. Scroll down to the bottom and change the value of the srcCompat attribute to @drawable/ic_mic_black. Next, use the tint attribute to change its color to white. Now you can see that we have an error here, and it's because we are using a vector asset. If you hover your mouse over the error, Android Studio will tell you how to fix it. As you can see, all we need to do is change the default config slightly. So open the build.gradle file of the app module. Inside the defaultConfig block, set the value of vectorDrawables.useSupportLibrary to true. And the error should be gone now.

Let's start working on the actual functionality of this app. Open MainActivity.java. Speech synthesis is slightly easier than speech recognition. So let me now show you how to use Android's text-to-speech engine. First, define a TextToSpeech instance as a member variable of this class. I'll call it myTTS. We'll initialize the TTS instance inside a method called initializeTextToSpeech. To initialize the TTS, you can use its constructor. In addition to a context, it needs an OnInitListener, so create one now.
Inside its onInit method, we must check if the Android device has any TTS engines installed. To do so, call the getEngines method and check if the size of the list it returns is zero. If this is true, it means no engines are installed. And that is a problem, because our app won't be usable on such a device. So create a toast message using the makeText method. Give a meaningful error message. And then call finish to exit the app.

Otherwise, we can start using the TTS engine. Before doing so, make sure that you set its language using the setLanguage method. I'll set the language to US English. To let the user know that the TTS engine is ready, let us make it say something now. So call a new method called speak and pass a string to it. I'm gonna say, hello, I'm ready.

Now create the speak method. The TTS engine has its own speak method. However, its signature varies depending on the API level. If the API level is 21 or higher, it expects four arguments. So if you call myTTS.speak here, you'll see the four arguments. The first is the text it must read out. For the second argument, pass TextToSpeech.QUEUE_FLUSH. For the remaining two arguments you can pass null, because they aren't necessary for this app. If the API level is less than 21, we must call a deprecated version of speak, which takes three arguments. The first two arguments stay the same. And even here the last argument can be null.

At this point we can run our app to check if the TTS works. Before we do that, however, let's make sure that we release the TTS engine when the user closes our app. So override the onPause method and call the shutdown method of the engine. If you run the app now, you should be able to hear your app talking to you. >> Hello, I am ready. >> So that's all there is to speech synthesis.

Let us now move on to speech recognition. To be able to recognize speech, we're gonna need a SpeechRecognizer object. I'll call it mySpeechRecognizer.
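The API-level branch inside the speak wrapper described above can be sketched in plain Java. This is only an illustration under stated assumptions: SpeechEngine is a hypothetical stand-in that mimics the shape of the two speak overloads of android.speech.tts.TextToSpeech so the example compiles without the Android SDK, and Speaker and sdkInt are names invented here. In the real app you would call myTTS directly and compare Build.VERSION.SDK_INT against 21.

```java
import java.util.HashMap;

// Hypothetical stand-in for android.speech.tts.TextToSpeech (not the real class).
interface SpeechEngine {
    int QUEUE_FLUSH = 0; // same value as TextToSpeech.QUEUE_FLUSH

    // Shape of the API 21+ overload: text, queue mode, params, utterance id.
    // Object stands in for android.os.Bundle here.
    void speak(CharSequence text, int queueMode, Object params, String utteranceId);

    // Shape of the deprecated pre-21 overload: text, queue mode, params.
    void speak(String text, int queueMode, HashMap<String, String> params);
}

// Hypothetical wrapper that picks the overload matching the device's API level.
final class Speaker {
    private final SpeechEngine engine;
    private final int sdkInt; // on Android this would be Build.VERSION.SDK_INT

    Speaker(SpeechEngine engine, int sdkInt) {
        this.engine = engine;
        this.sdkInt = sdkInt;
    }

    void speak(String message) {
        if (sdkInt >= 21) {
            // Four-argument version; the last two arguments are not needed here.
            engine.speak(message, SpeechEngine.QUEUE_FLUSH, null, null);
        } else {
            // Deprecated three-argument version for older devices.
            engine.speak(message, SpeechEngine.QUEUE_FLUSH, null);
        }
    }
}
```

Keeping the version check in one wrapper means the rest of the app can call speak with just a string and never deal with the two signatures.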
To initialize the speech recognizer, call a new method called initializeSpeechRecognizer and create it. The first thing we need to do here is check if speech recognition is available on the device. If it is available, initialize the SpeechRecognizer object by calling the createSpeechRecognizer method. Now this needs to be associated with a RecognitionListener object, so create one now.

There are a lot of methods in this interface, but we are interested in only one, and that is the onResults method. This method is called when the speech recognition is complete and the results of the recognition are available as a list of strings. So inside this method, the results are available in the bundle as an array list extra. The name of the extra is RESULTS_RECOGNITION. The recognizer returns multiple results for the same speech. Each result will have a confidence score associated with it. By selecting the first item of the list, we can pick the result with the highest score.

We are now passing the result to a method called processResult. As you might have guessed, here's where we will be doing something meaningful with the result. In a real application, this method can be very complex, depending on how many voice commands you want to handle and how many variations those voice commands can have. For now, however, we're gonna keep it quite simple. First, convert the command to lower case to simplify the string processing. Now, we are gonna handle exactly three commands. The first one will be, what is your name? Next will be, what is the time? And the last one shall be, open a browser.

All we need to do now is search for patterns in the command. So let us first check if the command has the substring what. If it is present, let us check if it has the string your name. In this case, we can call the speak method and generate a voice reply. You can of course type in anything you want here. If you find the string time, we're gonna have to generate a voice reply containing the time.
Doing so is easy. Just create a new Date instance, and convert it into a time string using the formatDateTime method of the DateUtils class. And now call the speak method again to tell the time. For the last command, check if the substring open is present. If true, check if the substring browser is present.

Now, you may be wondering why we are searching for individual words instead of complete commands. Well, you don't have to do it this way, but you must understand that the user might not say the exact command we want. For example, instead of saying exactly, open the browser, the user could say, hey Coco, please open up browser for me. So that would be a problem if you search for exact sentences. By looking only for the words we're interested in, we don't have to bother with everything else the user says. Opening a browser is easy. Just create a new intent with the ACTION_VIEW action and pass a URI to it. So I'll pass tutsplus.com.

Our speech recognizer is ready. However, it still doesn't know when it should start listening. In this app, it should start listening only when the user taps on the floating action button. So inside the onClick listener of the button, create a new intent and initialize it with the ACTION_RECOGNIZE_SPEECH action. This intent needs a few parameters in the form of extras. The first extra is gonna be the language model. Usually, the free-form model is good enough. You can control the maximum number of results the speech recognizer generates by using the EXTRA_MAX_RESULTS extra. I'll say one here, because we are handling only the first result with the highest confidence score. And now call the startListening method of the recognizer and pass the intent to it.

We can't use the microphone of an Android device without asking the user. So open the manifest file and add a uses-permission tag for the RECORD_AUDIO permission. Our app is ready. Press the run button to start using it.
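The keyword matching that processResult performs can be sketched in plain Java, without any Android classes. This is a minimal illustration, not the code shown in the video: CommandMatcher, Action, and match are hypothetical names, and the sketch returns which command was recognized instead of speaking or launching an intent.

```java
import java.util.Locale;

// Hypothetical helper that mirrors the substring checks described for processResult.
final class CommandMatcher {
    // The three commands handled in the lesson, plus a fallback.
    enum Action { SAY_NAME, SAY_TIME, OPEN_BROWSER, UNKNOWN }

    static Action match(String command) {
        // Lower-casing first keeps the substring checks simple.
        String text = command.toLowerCase(Locale.ROOT);

        if (text.contains("what")) {
            if (text.contains("your name")) {
                return Action.SAY_NAME;
            }
            if (text.contains("time")) {
                return Action.SAY_TIME;
            }
        }
        // Look only for the words we care about and ignore the rest.
        if (text.contains("open") && text.contains("browser")) {
            return Action.OPEN_BROWSER;
        }
        return Action.UNKNOWN;
    }
}
```

With this sketch, CommandMatcher.match("Hey Coco, please open up browser for me") returns Action.OPEN_BROWSER even though the sentence is not an exact command. Plain substring checks are loose, though: a word such as "sometimes" also contains "time", so a real app may want stricter matching.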
I would also suggest that you use a real device instead of an emulator for best results. >> Hello, I am ready. [SOUND] >> What is your name? [SOUND] >> My name is Coco. [SOUND] >> Okay, what is the time now? [SOUND] >> The time now is 9:04 PM. [SOUND] >> Please open a browser for me. [SOUND] You now know how to add voice capabilities to an Android app. Thanks for completing the course. I'm Usher Fatibelego and I hope to see you again soon.