Generate Video Captions with Soundbooth and Flash

It seems as though we are inundated with video on the web. In many respects this is not an unexpected phenomenon, but video is undergoing a rather quiet shift in perception. For the past several years, video has been regarded as content taking up space on the web page. A couple of years back, it slowly dawned on a few people that even though video is indeed content, it is the video’s data that could have huge potential.

In this tutorial I am going to show you one example of how this works. We are going to take the audio track out of a video file, transcribe it and then use that transcription to create video captions in Flash.

To follow this tutorial from start to finish you will need to have Soundbooth CS4, Dreamweaver CS4 and Flash CS4 installed on your computer. If you don’t have Soundbooth, I have included a copy of the transcription - ZombieTranscript.txt - in the Examples folder. If you don’t have Dreamweaver, I have included a copy of ZombiesTT.xml that you can use in Flash.

Starting with the Audio

Before we begin let’s get really clear on the terminology we will be using. A transcription takes the spoken word and turns it into the printed word. A caption displays a piece of that transcription in a Flash video project to make a video accessible.

You will see how all of this works in a couple of minutes, but you need to clearly understand that we are in the early days of this process. You simply can’t expect to magically turn the audio track into captions in Flash with a couple of mouse clicks. That day is not here yet, but compared to the conniptions and contortions we had to go through a few years ago to make this happen, today’s process is a "walk in the park."

I would like to thank Stephanie Skendaris, a student in my College’s Journalism Program, for permission to use this clip. She has a bright future in front of her and if you want to check out her work, she appears regularly at which is a lab for the Journalism and Radio programs.

Let’s get started!

Step 1: Launch Soundbooth CS4

When the app launches select Window > Workspace > Edit Audio to Video. This selection adds a video window to the workspace.

Step 2: Preview

Preview the video by clicking the Play button at the bottom of the interface. While the video is playing, listen to the audio because you will be transcribing it next.

Step 3: Open the Metadata Panel

If you can’t see it on the left side of the interface, select Window > Metadata or press Command-7 (Mac) or Ctrl-7 (PC). Soundbooth’s transcription feature is built into this panel.

Step 4: Transcribe

Click the Transcribe ... button at the bottom of the Metadata panel. This will open the Speech Transcription Options panel.

Step 5: Select a Language

In this case select English - Canadian because the school is located in Toronto. Choose a Quality option from the Quality pop down. In this case select High (slower), which takes more time to complete but will have fewer errors. Don’t select Identify Speakers because there is only one speaker in this clip.

Step 6: Start the Transcription Process

Click OK to start the transcription process. A Progress Bar, giving you an idea of how much time this will take, appears.

Step 7: Edit the Transcription

Here is the correct transcription:

It's not a typical autumn afternoon in the city. Today, the dead take to the streets for Toronto's seventh annual Zombie Walk. This year’s route took the zombies on a winding trek through downtown Toronto, from Trinity Bellwoods Park all the way up to the Bloor Cinema. The walk ended here at the Bloor Cinema where the undead are now watching a collection of zombie-themed films supplied by their friends at the "After Dark" Film Festival. Organizers say this has been one of the most successful ones ever. I'm Stephanie Skendaris for

Compare this with what Soundbooth produced. Obviously there are quite a few mistakes, even though you requested high quality.

The interesting aspect of this transcription is that it is tied to the timeline. Click the Play button and watch the transcription as the playhead moves: the word under the playhead is highlighted in blue, and the word itself is "bracketed" on the timeline. In this manner you can identify where the errors are.

To edit a word in the transcription, for example to add a period after the word "city", simply double-click the word and make the change. If words are missing or extraneous, they can be added or deleted by right-clicking (PC) or Control-clicking (Mac) and making your choice in the resulting pop down menu.

To move from word to word in the transcript, use the Tab key. To move from character to character in a word, use the Left and Right arrow keys.

Step 8: Decision Making Time

Make a decision based upon how you plan to use the transcription. There are two routes to go:

  • Use the text of the transcript with the FLVPlaybackCaptioning component in Flash; or,
  • Convert the transcript into an XML document that can be embedded, as Cue Points, into an FLV file. From there you can use ActionScript’s CUE_POINT constant that is part of the MetadataEvent class to display the captions in a dynamic text box.

The downside of the first option is extra weight in the SWF, because you need to use both the FLVPlayback component and the FLVPlaybackCaptioning component. The downside of the second approach is that the resulting XML file will be embedded into the FLV file and can’t be changed.

The upside of the first approach is that it is "ActionScript-free" and allows you to change the XML and have those changes instantly reflected in the SWF. The upside of the second approach is a relatively minuscule SWF.

In the interests of time, I am going to take the path of least resistance and go the component route. One further reason is that the XML used by the component meets the W3C Timed Text standard. Another is that the XML required is far more elegant and a ton easier to use and follow than the XML created by Soundbooth.

Still, some of you may be interested in how a transcript becomes XML and CuePoints in an FLV.

Creating XML and Soundbooth markers

Though we are not going to go this route in this exercise, I raised the issue and should follow through. I am not a huge fan of this technique, but it is a great way of getting precise Event Cue Points into an FLV and then using those cue points in Flash. Here’s how:

Step 9: Export

Select File > Export > Speech Transcription.

Step 10: Save

Name the file and save it to the Exercise folder.

Step 11: Import Markers

Select File > Import > Markers.

Step 12: XML

Navigate to the XML file, select it and click Open. The markers will appear on the Soundbooth timeline. This step is how the XML transcript is pulled back into Soundbooth and used to create Event Cue Points in an FLV.

Step 13: Save As

Select File > Save As. When the Save As dialog box opens, select FLV/f4v from the Format pop down and rename the file. Click Save. This will launch the Adobe Media Encoder.

Step 14: FLV

Choose FLV as the format. Before you do that though, take a look at the Cue Points area under the preview. Notice how they are now embedded into the file.

At this point you could choose to encode the video and then use the cue points embedded into the FLV to either create captions or make the video searchable. If you want to explore this topic further, Marcus Geduld and Richard Harrington cover it quite nicely here. Click the link at the bottom of the page to download the chapter from their book "After Effects for Flash, Flash for After Effects".
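For readers who do take the cue point route, here is a minimal ActionScript 3 sketch of the second option from Step 8. It is not code from this tutorial's download. It assumes an FLVPlayback component with the hypothetical instance name myVideo, a dynamic text field with the hypothetical instance name txt, and that the caption text ends up in each cue point's name, which depends on how the markers were exported.

```actionscript
import fl.video.MetadataEvent;

// myVideo: hypothetical instance name of an FLVPlayback component on the stage.
// txt: hypothetical instance name of a dynamic text field that shows the caption.
myVideo.addEventListener(MetadataEvent.CUE_POINT, onCuePoint);

function onCuePoint(evt:MetadataEvent):void {
    // evt.info describes the cue point that was just reached:
    // its name, time (in seconds), type and any parameters.
    trace(evt.info.time, evt.info.type, evt.info.name);

    // Assumption: the cue point name holds the text to display.
    txt.text = evt.info.name;
}
```

The trace() call is there so you can inspect what your embedded cue points actually contain before deciding which property to display.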

Creating Captioned Video in Flash

Prior to the release of Flash CS3, captioning video was right up there with beating yourself in the head with a brick. It wasn’t easy and was to be avoided whenever possible. Still, one of the impediments to the otherwise rapid adoption of Flash Video was accessibility, which was being increasingly demanded by governments around the world. The introduction of the FLVPlaybackCaptioning component in Flash CS3 addressed that issue. Here’s how:

Step 15: XML

Enter the following XML code into a Text editor or Dreamweaver CS4:
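The ZombiesTT.xml file included with the download contains the full listing. As a minimal sketch of the shape of such a file, assuming the W3C 2006 Timed Text namespace that the Flash captioning component reads, it looks something like the following. The caption text comes from the transcript above, but the timings and the style are illustrative placeholders, not the values used in the download.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the begin, dur and end values below are placeholders,
     not timings measured from the clip. -->
<tt xml:lang="en"
    xmlns="http://www.w3.org/2006/04/ttaf1"
    xmlns:tts="http://www.w3.org/2006/04/ttaf1#styling">
  <head>
    <styling>
      <!-- Hypothetical style: white, centered Arial text. -->
      <style id="1" tts:color="#FFFFFF" tts:fontFamily="Arial" tts:textAlign="center"/>
    </styling>
  </head>
  <body>
    <div xml:lang="en">
      <!-- begin plus dur: appears at 0.5 seconds and stays up for 3 seconds. -->
      <p begin="00:00:00.50" dur="00:00:03.00" style="1">It's not a typical autumn afternoon in the city.</p>
      <!-- begin plus end: appears at 8 seconds and is removed at 14 seconds. -->
      <p begin="00:00:08.00" end="00:00:14.00" style="1">This year's route took the zombies on a winding trek through downtown Toronto.</p>
      <!-- begin only: stays up until the next caption appears or the video ends. -->
      <p begin="00:00:26.00" style="1">Organizers say this has been one of the most successful ones ever.</p>
    </div>
  </body>
</tt>
```

Each p element is one caption, and the style attribute points back at the id of a style defined in the head.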

When you finish, save the document as an .xml document.

What you have just entered is the XML format - TimedText - used by the component. If you open the ZombiesSB.xml document produced by Soundbooth you will notice there is a huge difference between them. If you are going to use the component you must use the TimedText format.

As you can see, you can set the styling for each caption, and each caption has a start and an end point. These numbers were obtained from Soundbooth but you can use QuickTime, the Adobe Media Encoder or any other software that shows you the time code.

Each caption you write must contain a begin attribute, which determines when the caption appears. The dur attribute tells Flash how long the caption will remain on the screen. If you use end instead, that tells Flash the time when the caption is to disappear. If you omit both, the caption will remain on screen until the next caption appears or the video ends.

Step 16: ActionScript

Open a new Flash ActionScript 3.0 document.

Step 17: Save

Save the .fla file to the same folder as the .xml file and the .f4v file included in the download.

Step 18: Layers

Add two more layers to the Flash timeline. Name the three layers Video, Caption and Text.

Step 19: FLVPlayback Component

Add an FLVPlayback component to the Video layer. Open the Component Inspector and set the Content path for the component to Zombies.f4v and select the SkinUnderPlaySeekCaption.swf in the Skin parameter. The component will spring out to the size of the flv - 873 by 480. Resize the stage to 873 by 600 to accommodate the skin and the captions.

Step 20: FLVPlaybackCaptioning Component

Add the FLVPlaybackCaptioning component to the Caption layer. You can put the component anywhere on the stage but I prefer to stick it out of the way on the pasteboard. It just needs to be in the movie for it to work.

Step 21: The Component Inspector

Select the Captioning component and open the Component Inspector. You only need to do two things here: tell the component where to put the captions and where the XML file is located. Set the captionTargetName parameter to txt and the source to ZombiesTT.xml.

If you leave the captionTargetName parameter at the default, auto, the captions will appear over the video.

Step 22: Dynamic Text

Add a Dynamic Text Box to the bottom of the stage and give it the instance name of txt. This is where the caption will appear. A safe bet for the text in the text box is to use Arial or _sans.

Step 23: Test the Movie


We've covered a lot of ground in this exercise. I have shown you how to turn the audio track of a video into a transcription in Soundbooth CS4. You also saw how that transcription was changed into audio markers in Soundbooth and how those audio markers are subsequently turned into event cue points embedded into an FLV file. The interesting aspect of all of this is the entire process can be completed without leaving Soundbooth.

I also showed you how to write a TimedText XML file and how to use that file to provide captions for video playing in the FLVPlayback component. I hope you enjoyed it!

Note: In preparing this tutorial I noticed that when I opened the TimedText XML in Dreamweaver CS4, the app would crash. When the same thing happened in a class I was teaching at my College, I got hold of Adobe to see if this was an issue. It is. In this tutorial I use an XML document that contains styling information. Dreamweaver CS4 gives up trying to read it as CSS and crashes. I have been told by Adobe they were unaware of this, that indeed it is a bug and that it will be fixed in the CS5 release of Dreamweaver. This is not a "deal killer". Though I show the XML in Dreamweaver CS4 (prior to its crash), you can still use any text editor to write the XML file.
