
Let Me Hear Your Browser Talk: Using the Speech Synthesis API

Difficulty: Intermediate. Length: Medium.

In 1968, 2001: A Space Odyssey was released. It famously featured HAL 9000, a supercomputer capable of a huge number of things: facial recognition, playing chess and even lip-reading. But the one thing that stuck in audiences' minds, and influenced every piece of science fiction since, was HAL’s ability to talk.

These days, a computer speaking a piece of text given to it is commonplace. However, only recently have you been able to do this directly in a web browser. That’s what I’m going to show you in this tutorial.

The Speech Synthesis API allows you to use JavaScript to take a piece of text and output it to your speakers as speech. As with all new APIs, it’s not implemented in all browsers, so check caniuse.com for current support. At the time of writing, it is supported in Chrome and Safari, both on desktop and mobile.

Why Won't You Talk to Me?

You’ll be surprised at how simple it is to get your browser to start talking to you. To begin, either create a new HTML file with a script tag inside, or pop open your browser’s JavaScript console. Then write the following line inside it.
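A minimal sketch of that line follows. In a browser console the single statement `new SpeechSynthesisUtterance()` is all you need; here it is wrapped in a small helper, `createUtterance` (a name made up for this example), whose constructor parameter defaults to the browser's global. That arrangement is an assumption of mine, there only so the sketch can also be exercised outside a browser.

```javascript
// Create an empty utterance: the "envelope" that will hold our instructions.
// `createUtterance` is a hypothetical helper; in a browser console the single
// line `new SpeechSynthesisUtterance()` does the same job.
function createUtterance(Utterance = globalThis.SpeechSynthesisUtterance) {
  return new Utterance();
}

// Only runs where the Speech Synthesis API is available.
if (typeof SpeechSynthesisUtterance !== 'undefined') {
  const utterance = createUtterance();
  console.log(utterance);
}
```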

What we’re doing here is creating a new instance of a synthesized utterance. Think of this as a little envelope that contains instructions telling the browser what it should say and how to say it.

First, we’ve got to think of something extremely important for our browser to say.
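Something along these lines would do it. The message is a placeholder of my own choosing and `setMessage` is a hypothetical helper; the part that matters is assigning a string to the utterance's `text` property.

```javascript
// Store the words to be spoken on the utterance's `text` property.
// `setMessage` is a hypothetical helper name used only in this sketch.
function setMessage(utterance, message) {
  utterance.text = message;
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof SpeechSynthesisUtterance !== 'undefined') {
  // The message below is a placeholder, not the original article's wording.
  const utterance = setMessage(new SpeechSynthesisUtterance(), 'Hello from your browser.');
  console.log(utterance.text);
}
```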

Then we’re ready to go. Let’s give our message to the browser’s speech synthesizer and tell it to speak. Remember to turn up the volume on your computer beforehand.
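Putting the pieces together might look like the sketch below. `speechSynthesis.speak()` is the call that hands the utterance to the synthesizer; the `say` helper, its injectable parameters, and the placeholder message are assumptions made for illustration.

```javascript
// Build an utterance, fill in its text, and hand it to the synthesizer.
// `say` is a hypothetical helper; `synth` and `Utterance` default to the
// browser globals so the function can be tried with stand-ins elsewhere.
function say(message,
             synth = globalThis.speechSynthesis,
             Utterance = globalThis.SpeechSynthesisUtterance) {
  const utterance = new Utterance();
  utterance.text = message;
  synth.speak(utterance); // adds the utterance to the browser's speech queue
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  say('Hello from your browser.'); // placeholder message
}
```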

Wow, a talking computer. As easy as that. 

Changing Voices

Now, if that wasn’t impressive enough, the Speech Synthesis API gives us a whole bunch of attributes on our “utterance” that we can tweak to change what it sounds like. The most notable of these changes the “person” speaking. Your operating system comes with a variety of built-in voices to choose from, plus your browser throws in a few extra ones for good measure. Let’s see what voices we have available to us.
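One way to peek at the list is sketched below. `speechSynthesis.getVoices()` is the real API call; the `listVoiceNames` helper and the "name (language)" output format are my own choices for readability.

```javascript
// getVoices() returns an array of voice objects, each with (among other
// properties) a `name` and a `lang`. `listVoiceNames` is a hypothetical helper.
function listVoiceNames(synth = globalThis.speechSynthesis) {
  return synth.getVoices().map((voice) => `${voice.name} (${voice.lang})`);
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  console.log(listVoiceNames());
}
```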

If you don't see any output then you may need to run this function again. Chrome has a strange bug where you have to request the list of voices twice before it initialises correctly. To overcome this, do the following.
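A sketch of that workaround: ask for the voices once, wait a moment, then ask again. The one-second delay and the `withVoices` helper name are assumptions on my part, not values the API prescribes.

```javascript
// Request the voices, then request them again after a short delay.
// `withVoices` is a hypothetical helper; the 1000 ms default is a guess
// at "long enough", not a documented figure.
function withVoices(callback, synth = globalThis.speechSynthesis, delayMs = 1000) {
  synth.getVoices(); // first request: may come back empty in Chrome
  setTimeout(() => {
    callback(synth.getVoices()); // second request, once voices have had time to load
  }, delayMs);
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  withVoices((voices) => console.log(voices));
}
```

Browsers also fire a `voiceschanged` event on `speechSynthesis` when the list is ready, which may be a sturdier trigger than a fixed delay.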

The number of voices varies from operating system to operating system, but on OS X I’ve got 74! More characters than an episode of The Simpsons. Let’s try one out.

As you can probably see, speechSynthesis.getVoices() returns an array. We could simply set the voice by doing:
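In sketch form, that is just an array lookup assigned to the utterance's `voice` property. The `useVoiceNumber` helper and the sample sentence are made up for this example; the index 11 comes from my own machine and will point at a different voice (or none) on yours.

```javascript
// Pick a voice by its position in the array and attach it to the utterance.
// `useVoiceNumber` is a hypothetical helper name.
function useVoiceNumber(utterance, voices, index) {
  utterance.voice = voices[index];
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  const utterance = new SpeechSynthesisUtterance('Hello, my name is Agnes.'); // placeholder text
  useVoiceNumber(utterance, speechSynthesis.getVoices(), 11);
  console.log(utterance.voice);
}
```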

This would tell the browser to use “voice 11”, which in my case is “Agnes”. Poor Agnes, reduced to a number. A nicer way to do this, and to treat Agnes as a real human being, would be to look her up by name with the ECMAScript 6 method findIndex, which is supported in browsers that also support the Speech Synthesis API, so we’re all good.
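A lookup by name could be sketched like this. `Array.prototype.findIndex` and the voice's `name` property are real; `findVoiceIndex` is a helper name invented here, and whether a voice called "Agnes" exists depends on your operating system.

```javascript
// Return the position of the voice with the given name, or -1 if there
// is no such voice. `findVoiceIndex` is a hypothetical helper.
function findVoiceIndex(voices, name) {
  return voices.findIndex((voice) => voice.name === name);
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  console.log(findVoiceIndex(speechSynthesis.getVoices(), 'Agnes'));
}
```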

Now that we’ve got the index of the voices array that Agnes’s voice is in, we can set that voice to be used by our utterance.
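Here is one way that might look end to end. The `speakAs` helper and the thank-you message are assumptions for illustration; the sketch also falls back to the default voice when the name isn't found, since `findIndex` returns -1 in that case.

```javascript
// Look a voice up by name, attach it to a new utterance, and speak.
// `speakAs` is a hypothetical helper; `synth` and `Utterance` default to
// the browser globals.
function speakAs(name, message,
                 synth = globalThis.speechSynthesis,
                 Utterance = globalThis.SpeechSynthesisUtterance) {
  const voices = synth.getVoices();
  const index = voices.findIndex((voice) => voice.name === name);
  const utterance = new Utterance();
  utterance.text = message;
  if (index !== -1) {
    utterance.voice = voices[index]; // otherwise the default voice is used
  }
  synth.speak(utterance);
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  speakAs('Agnes', 'Thank you, Agnes.'); // placeholder message
}
```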

No probs, Agnes. You scared me half to death with that loud voice of yours, though. Let’s turn you down a bit.

Volume, Rate, and Pitch

Luckily, all we need to do to quieten the voice is to say:
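In sketch form, assuming the utterance we have been working with (the `quieten` helper name is made up for this example):

```javascript
// Halve the volume. `volume` accepts values from 0 (silent) to 1 (loudest).
// `quieten` is a hypothetical helper; the assignment is the important part.
function quieten(utterance) {
  utterance.volume = 0.5;
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof SpeechSynthesisUtterance !== 'undefined') {
  const utterance = quieten(new SpeechSynthesisUtterance('Is this quiet enough?')); // placeholder text
  console.log(utterance.volume);
}
```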

This sets the volume of Agnes’s voice to half of what it originally was, 0 being silent and 1 being the loudest. The parameters we can tweak don’t end there, however. Is the voice you’ve chosen speaking too slowly or too quickly? You can change the rate at which the voice reads out your piece of text by using the rate attribute.
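For example, something like the following, where 0.8 is the value that slows the voice by a fifth and `slowDown` is a hypothetical helper name:

```javascript
// Slow the voice down by a fifth of the default rate of 1.
// `slowDown` is a hypothetical helper; the assignment is the important part.
function slowDown(utterance) {
  utterance.rate = 0.8;
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof SpeechSynthesisUtterance !== 'undefined') {
  const utterance = slowDown(new SpeechSynthesisUtterance('Taking my time.')); // placeholder text
  console.log(utterance.rate);
}
```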

The default rate at which a voice speaks is 1. So here we’re slowing it down by a fifth. The slowest rate you can specify is 0.1, while the fastest is 10. Voices also have their own rate limits, so even if you set a rate to 10, it may not speak 10 times as fast as the default rate.

Another interesting parameter you can alter is pitch. Want Agnes to sound like Barry White? Pitch is where it’s at.
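A sketch of a deeper voice follows. The value 0.5 is my own pick for illustration rather than a recommended setting, and `deepen` is a hypothetical helper name.

```javascript
// Lower the pitch for a deeper voice. The value 0.5 is an arbitrary
// illustrative choice. `deepen` is a hypothetical helper.
function deepen(utterance) {
  utterance.pitch = 0.5;
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof SpeechSynthesisUtterance !== 'undefined') {
  const utterance = deepen(new SpeechSynthesisUtterance('How low can you go?')); // placeholder text
  console.log(utterance.pitch);
}
```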

Here the lowest pitch you can set is 0, while to make your voice sound like it has just inhaled a helium-filled chipmunk, set the pitch to 2, the highest it can go.

Events

Ok, let's have some fun now. The Speech Synthesis API has a few different events that we can play with. These events (start, end, pause, and resume, amongst others) allow us to call a function when the event happens. By listening to the end event, we can call a function that starts another voice speaking, thus providing the illusion of a conversation.

Let's set up two different voices, and give each one a sentence to say. Remember, all your code should be in the setTimeout function to make sure all possible voices have loaded.
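The setup might be sketched as follows. The voice names "Agnes" and "Albert" are the ones on my machine and may not exist on yours; the two sentences, the `makeSpeaker` helper, and the one-second delay are all assumptions made for illustration.

```javascript
// Build an utterance with a given message, spoken by the named voice if
// it exists. `makeSpeaker` is a hypothetical helper.
function makeSpeaker(voices, name, message,
                     Utterance = globalThis.SpeechSynthesisUtterance) {
  const utterance = new Utterance();
  utterance.text = message;
  const voice = voices.find((candidate) => candidate.name === name);
  if (voice) {
    utterance.voice = voice; // otherwise the default voice is used
  }
  return utterance;
}

// Only runs where the Speech Synthesis API is available.
if (typeof speechSynthesis !== 'undefined') {
  speechSynthesis.getVoices(); // first request, as in the workaround
  setTimeout(() => {
    const voices = speechSynthesis.getVoices();
    const agnes = makeSpeaker(voices, 'Agnes', 'Hello, Albert.');  // placeholder sentence
    const albert = makeSpeaker(voices, 'Albert', 'Hello, Agnes.'); // placeholder sentence
    console.log(agnes, albert);
  }, 1000);
}
```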

Before we start Agnes speaking, we set up Albert's reply inside Agnes's onend handler, like so. This means that when Agnes stops speaking, Albert will start.
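A sketch of that wiring, using the utterance's `onend` handler. The `replyWhenDone` helper name and its injectable `synth` parameter are my own additions.

```javascript
// When the first utterance finishes, start the reply.
// `replyWhenDone` is a hypothetical helper; `synth` defaults to the
// browser's speechSynthesis object.
function replyWhenDone(first, reply, synth = globalThis.speechSynthesis) {
  first.onend = () => {
    synth.speak(reply); // fires once the first utterance has been spoken
  };
  return first;
}

// Usage, with the two utterances for Agnes and Albert:
//   replyWhenDone(agnes, albert);
```

Nothing is spoken yet at this point; the handler only runs after the first utterance has been started and has finished.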

Looks good. Ready to hear an in-depth conversation? Start Agnes off in the usual way.

Amazing. Your browser is now talking to itself. Skynet has become self-aware.
