
HTML5 Audio and Video: What You Must Know

This post is part of a series called HTML5 and You.

In promotion of what I consider to be the best HTML5 book currently available on the market, Remy Sharp and Bruce Lawson agreed to donate a chapter of Introducing HTML5 to our readers, which details the ins and outs of working with HTML5 video and audio. This article has since been updated with newer information on browser support.

A LONG TIME AGO, in a galaxy that feels a very long way away, multimedia on the web was limited to tinkling MIDI tunes and animated GIFs. As bandwidth got faster and compression technologies improved, MP3 music supplanted MIDI, and real video began to gain ground. All sorts of proprietary players battled it out (RealPlayer, Windows Media, and so on) until one emerged as the victor in 2005: Adobe Flash, largely because of the ubiquity of its plugin and the fact that it was the delivery mechanism of choice for YouTube. However, Flash was insecure, buggy, and not well integrated with the web, so HTML5, an open standard, eventually replaced Flash for most websites.

In this article, we will talk about the audio and video elements of HTML5.

Native Multimedia: Why, What, and How?

In 2007, Anne van Kesteren wrote to the Working Group:

“Opera has some internal experimental builds with an implementation of a <video> element. The element exposes a simple API (for the moment) much like the Audio() object: play(), pause(), stop(). The idea is that it works like <object> except that it has special <video> semantics much like <img> has image semantics.”

While the API has grown in complexity, the element van Kesteren announced is now implemented in all major browsers, including Internet Explorer.

An obvious companion to a <video> element is an <audio> element; they share many similar features, so in this chapter we'll discuss them together and only note the differences.

<video>: Why Do You Need a <video> Element?

Previously, if developers wanted to include video in a web page, they had to make use of the <object> element, which is a generic container for “foreign objects.” Due to browser inconsistencies, they would also need to use the previously invalid <embed> element and duplicate many parameters. This resulted in code that looked much like this:

<embed> is finally standardized in HTML5; it was never part of any previous flavor of (X)HTML.

This code is ugly and ungainly. Worse than that is the fact that the browser has to pass the video off to a third-party plugin; hope that the user has the correct version of that plugin (or has the rights to download and install it, or the knowledge of how to); and then hope that the plugin is keyboard accessible—along with all the other unknowns involved in handing the content to a third-party application.

Plugins can also be a significant cause of browser instability and can create worry in less technical users when they are prompted to download and install newer versions.

Whenever you include a plugin in your pages, you’re reserving a certain drawing area that the browser delegates to the plugin. As far as the browser is concerned, the plugin’s area remains a black box—the browser does not process or interpret anything that is happening there.

Normally, this is not a problem, but issues can arise when your layout overlaps the plugin’s drawing area. Imagine, for example, a site that contains a movie but also has JavaScript or CSS-based dropdown menus that need to unfold over the movie. By default, the plugin’s drawing area sits on top of the web page, meaning that these menus will strangely appear behind the movie.

Problems and quirks can also arise if your page has dynamic layout changes. If the dimensions of the plugin’s drawing area are resized, this can sometimes have unforeseen effects—a movie playing in the plugin may not resize, but instead simply be cropped or display extra white space. HTML5 provides a standardized way to play video directly in the browser, with no plugins required.

One of the major advantages of the HTML5 video element is that, finally, video is a full-fledged citizen on the web. It’s no longer shunted off to the hinterland of <object> or the non-validating <embed> element.

So now, <video> elements can be styled with CSS; they can be resized on hover using CSS transitions, for example. They can be tweaked and redisplayed onto <canvas> with JavaScript. Best of all, the innate hackability that open web standards provide is opened up. Previously, all your video data was locked away; your bits were trapped in a box. With HTML5 multimedia, your bits are free to be manipulated however you want.

As long as the HTTP endpoint serves a streamable resource, you can just point the <video> or <audio> element at it to stream the content.

Anatomy of the Video Element

At its simplest, including video on a page in HTML5 merely requires this code:

The .ogv file extension is used here to point to an Ogg Theora video.

Similar to <object>, you can put fallback markup between the tags for older web browsers that do not support native video. You should at least supply a link to the video so users can download it to their hard drives and watch it later in the operating system’s media player.

HTML5 video in a modern browser and fallback content in a legacy browser.

However, this example won’t actually do anything just yet. All you can see here is the first frame of the movie. That’s because you haven’t told the video to play, and you haven’t told the browser to provide any controls for playing or pausing the video.


autoplay

You can tell the browser to play the video or audio automatically, but you almost certainly shouldn’t, as many users (and particularly those using assistive technology, such as a screen reader) will find it highly intrusive. Users on mobile devices probably won’t want you using their bandwidth without them explicitly asking for the video. Nevertheless, here’s how you do it:

Because autoplaying video is so annoying, some browsers, such as Safari on iOS, require user interaction before a video starts playing, so autoplay is not always reliable.


controls

Providing controls is approximately 764 percent better than autoplaying your video. You can use some simple JavaScript to write your own (more on that later) or you can tell the browser to provide them automatically:

Naturally, these differ between browsers, in the same way that form controls do, for example, but you’ll find nothing too surprising. There’s a play/pause toggle, a seek bar, and volume control.

Because these controls are integrated with the rest of the HTML, you can reach them with the Tab key, which makes them much more accessible than plugins like Flash ever were.

If the <audio> element has the controls attribute, you’ll see them on the page. Without the attribute, nothing is rendered visually on the page at all, but the element is, of course, still there in the DOM and fully controllable via JavaScript and the new APIs.


poster

The poster attribute points to an image that the browser will use while the video is downloading, or until the user tells the video to play. (This attribute is not applicable to <audio>.) It removes the need for additional tricks like displaying an image and then removing it via JavaScript when the video is started.

If you don’t use the poster attribute, the browser shows the first frame of the movie, which may not be the representative image you want to show.

height, width

These attributes tell the browser the size in pixels of the video. (They are not applicable to <audio>.) If you leave them out, the browser uses the intrinsic width of the video resource, if that is available. Otherwise it is the intrinsic width of the poster frame, if that is available. Otherwise it is 300 pixels. Setting these attributes is recommended, because it prevents content from shifting as the video loads.

If you specify one value but not the other, the browser adjusts the size of the unspecified dimension to preserve the video’s aspect ratio.

If you set both width and height to an aspect ratio that doesn’t match that of the video, the video is not stretched to those dimensions but is rendered “letterboxed” inside the video element of your specified size, retaining its aspect ratio.
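The sizing rules above boil down to a little arithmetic. Here is a minimal sketch of them, not a model of any real browser’s layout code: computeVideoBox and letterbox are hypothetical helper names, sizes are plain { width, height } objects in CSS pixels, and the 150-pixel fallback height is our assumption, taken from the 300×150 default size HTML gives media elements when nothing else is known.

```javascript
// Hypothetical helpers that restate the width/height rules in code.
// All sizes are { width, height } objects in CSS pixels.

// Work out the element's box from the width/height attributes (either may be
// undefined), the video's intrinsic size, and the poster's size (null if unknown).
function computeVideoBox(attrs, videoSize, posterSize) {
  // Intrinsic video size first, then the poster, then the 300x150 default.
  const source = videoSize || posterSize || { width: 300, height: 150 };
  const ratio = source.width / source.height;
  const hasWidth = attrs.width !== undefined;
  const hasHeight = attrs.height !== undefined;

  if (hasWidth && hasHeight) {
    return { width: attrs.width, height: attrs.height };
  }
  if (hasWidth) {
    // Only width given: derive the height from the aspect ratio.
    return { width: attrs.width, height: Math.round(attrs.width / ratio) };
  }
  if (hasHeight) {
    // Only height given: derive the width from the aspect ratio.
    return { width: Math.round(attrs.height * ratio), height: attrs.height };
  }
  return { width: source.width, height: source.height };
}

// When the box and the video disagree on aspect ratio, scale the video to fit
// without stretching and centre it, leaving letterbox bars around it.
function letterbox(box, videoSize) {
  const scale = Math.min(box.width / videoSize.width, box.height / videoSize.height);
  const width = videoSize.width * scale;
  const height = videoSize.height * scale;
  return {
    width,
    height,
    offsetX: (box.width - width) / 2, // width of each side bar
    offsetY: (box.height - height) / 2, // height of the bars above and below
  };
}
```

For example, computeVideoBox({ width: 640 }, { width: 1280, height: 720 }, null) gives a 640×360 box, and letterbox({ width: 640, height: 480 }, { width: 1280, height: 720 }) draws that same video at 640×360 with 60-pixel bars above and below.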


loop

The loop attribute is another Boolean attribute. As you would imagine, it loops the media playback.


preload

Maybe you’re pretty sure that the user wants to activate the media (they’ve drilled down to it from some navigation, for example, or it’s the only reason to be on the page), but you don’t want to use autoplay. If so, you can suggest that the browser preload the video so that it begins buffering when the page loads in the expectation that the user will activate the controls.

There are three spec-defined states of the preload attribute. If you just say preload, the user agent can decide what to do. A mobile browser may, for example, default to not preloading until explicitly told to do so by the user.

preload=auto

A suggestion to the browser that it should begin downloading the entire file. Note that we say “suggestion.” The browser may ignore this, perhaps because it detected a very slow connection or because a mobile browser has a “Never preload media” setting to save the user’s bandwidth.

preload=none

This state suggests to the browser that it shouldn’t preload the resource until the user activates the controls.

preload=metadata

This state suggests to the browser that it should just prefetch metadata (dimensions, first frame, track list, duration, and so on) but that it shouldn’t download anything further until the user activates the controls. Most modern browsers default to preload=metadata.
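To make the keyword rules concrete, here is a sketch of how an attribute value maps to one of the three states, using a hypothetical preloadState helper. It reflects our reading of the spec: an empty value (a bare preload) means auto, keywords are case-insensitive, and a missing or unrecognized value falls back to a default the browser chooses, for which the spec suggests metadata. In a real page you would normally just read video.preload.

```javascript
// Hypothetical helper: map a preload attribute value to one of the three
// states. `attr` is the attribute's string value, or null if it is absent.
// `fallback` stands in for the browser's own default ("metadata" is what the
// spec suggests, but browsers may choose differently).
function preloadState(attr, fallback = "metadata") {
  if (attr === null) {
    return fallback; // No preload attribute at all.
  }
  const value = attr.toLowerCase(); // Keywords match case-insensitively.
  if (value === "" || value === "auto") {
    return "auto"; // A bare `preload` means auto.
  }
  if (value === "none" || value === "metadata") {
    return value;
  }
  return fallback; // Unrecognized value, e.g. preload="meta".
}
```

Note that preload="meta" is not a keyword at all; it only ends up as metadata if that happens to be the browser’s fallback.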


src

As on an <img>, this attribute points to the file to be displayed. However, because not all browsers can play the same formats, in production environments you need to have more than one source file. We’ll cover this in the next section. Using a single source file with the src attribute is only really useful for rapid prototyping or for intranet sites where you know the user’s browser and which codecs it supports.

Codecs: The Horror, the Horror

Currently, the best-supported format is H.264 video in an MP4 file. It has nearly universal browser support, with 96% of users supporting it according to CanIUse. However, H.264 is proprietary, and it compresses less efficiently than more modern codecs.

Luckily, there are now royalty-free codecs that compress better. Using the WebM container format, which is supported almost universally, you can use codecs like VP9 and AV1. VP9 is widely supported and compresses better than H.264. AV1 compresses even better than VP9, but because it is newer, it is not yet universally supported; at the time of this update, it was not supported in Safari.

For the best experience for the largest group of people, use VP9 in WebM, and keep H.264 in MP4 as a fallback. You can also add AV1 video for browsers that support it.

Multiple <source> Elements

To do this, you need to encode your multimedia twice: once as Theora and once as H.264 in the case of video, and in both Vorbis and MP3 for audio. You can also adapt the code below to more modern codecs, such as VP9 in WebM.

Then, you tie these separate versions of the file to the media element. Instead of using the single src attribute, you nest separate <source> elements for each encoding with appropriate type attributes inside the <audio> or <video> element and let the browser download the format that it can display.

Note that in this case we do not provide a src attribute in the media element itself:

Line 1 tells the browser that a video is to be inserted and to give it default controls.

Line 2 offers an Ogg Theora video and uses the type attribute to tell the browser what kind of container format is used (by giving the file’s MIME type) and what codec was used for the encoding of the video and the audio stream. We could also offer a WebM video here as a high-quality royalty-free option. Notice that we used quotation marks around these parameters. If you omit the type attribute, the browser downloads a small part of each file before it figures out whether it is supported, which wastes bandwidth and could delay the media playing.

The content between the tags is fallback content only for browsers that do not support the <video> element at all. A browser that understands HTML5 video but can’t play any of the formats that your code points to will not display the “fallback” content between the tags. This has bitten me on the bottom a few times. Sadly, there is no video record of that.

Line 3 offers an H.264 video. The codec strings for H.264 and AAC are more complicated than those for Ogg because there are several profiles for H.264 and AAC. Higher profiles require more CPU to decode, but they are better compressed and take less bandwidth.

Inside the <video> element is our fallback message, including links to both formats, for browsers that can’t natively handle either video type but are probably running on an operating system that can play one of the formats, so the user can download the file and watch it in a media player outside the browser.
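To make the selection process concrete, here is a rough model of how a browser walks the <source> list, written as a hypothetical selectSource function. It simplifies the real resource selection algorithm, under these assumptions: canPlayType stands in for the media element’s canPlayType() method (which returns "", "maybe", or "probably"), and probe stands in for the browser actually fetching the start of a file to see whether it can decode it. What it illustrates is the cost of a missing type attribute: a typed source the browser can’t play is skipped without any download, whereas an untyped one has to be fetched to find out.

```javascript
// Rough model of <source> selection. Each candidate is { src, type? }.
// canPlayType(type) returns "", "maybe" or "probably";
// probe(src) returns true if the fetched file turned out to be playable.
function selectSource(sources, canPlayType, probe) {
  for (const source of sources) {
    if (source.type && canPlayType(source.type) === "") {
      continue; // Rejected from the type alone: nothing downloaded.
    }
    // Either the type looked playable or there was no type, so the
    // browser has to fetch part of the file to find out.
    if (probe(source.src)) {
      return source.src;
    }
  }
  // Nothing playable. The fallback content between the tags is NOT shown
  // in this case, because the browser does support <video>.
  return null;
}
```

In a browser that can’t play VP9, for instance, a list of a typed WebM source, an untyped Ogg source, and a typed MP4 source would skip the WebM file on its type alone, partly download the Ogg file before rejecting it, and then pick the MP4. In a real page, video.currentSrc tells you which source won.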

OK, so that’s native HTML5 video for all users of modern browsers. What about users of legacy browsers—including Internet Explorer 8 and older?

Video for Legacy Browsers

Almost every browser supports HTML5 video now, so only do this if you have to support very old browsers such as IE 8.

Older browsers can’t play native video or audio, bless them. But if you’re prepared to rely on plugins, you can ensure that users of older browsers can still experience your content in a way that is no worse than they currently get.

Remember that the contents of the <video> element can contain markup, like the text and links in the previous example? Because the MP4 file type can also be played by the Flash Player plugin, you can use the same MP4 movie with Flash as a fallback for Internet Explorer 8 and older versions of other browsers.

The code for this is as hideous as you’d expect for a transitional hack, but it works everywhere a Flash Player is installed. You can see this nifty technique in an article called “Video for Everybody!” by its inventor, Kroc Camen.

Alternatively, you could host the fallback content on a video hosting site and embed a link to that between the tags of a video element:

You can use the html5media library to hijack the <video> element and automatically add the necessary fallback by adding one line of JavaScript in the head of your page.

Encoding Royalty-Free Video and Audio

Ideally, you should start the conversion from the source format itself, rather than recompressing an already compressed version. Double compression can seriously reduce the quality of the final output.

On the audio side of things, the open-source audio editing software Audacity has built-in support for most modern codecs. For video conversion, there are a few good choices. For WebM, see www.webmproject.org/tools/ for a growing list of tools.

To convert video files to a different format, you can use a tool like VLC.

Converting files between formats can also be automated and handled server-side. For instance, in a CMS environment, you may not be able to control the format in which authors upload their files, so you may want to do compression at the server end. The open-source ffmpeg library can be installed on a server to bring industrial-strength conversions of uploaded files (maybe you’re starting your own YouTube-killer?).

If you’re worried about storage space and you’re happy to share your media files (audio and video) under one of the various CC licenses, have a look at the Internet Archive, which will convert and host them for you. Just create a password and upload, and then use a <video> element on your page but link to the source file on their servers.

Sending Differently Compressed Videos to Handheld Devices

Video files tend to be large, and sending very high-quality video can be wasteful if sent to handheld devices where the small screen sizes make high quality unnecessary. There’s no point in sending high-definition video meant for a widescreen monitor to a handheld device screen. Compressing a video down to a size appropriate for a small screen can save a lot of bandwidth, making your server and—most importantly—your mobile users happy.

HTML5 allows you to use the media attribute on the source element, which queries the browser about the screen size (or number of colors, aspect ratio, and so on) so that you can send different files optimized for different screen sizes.

We use min-device-width rather than min-width to cater to devices that have a viewport into the content—that is, every full-featured smartphone browser—as this gives us the width of the viewport display.

This functionality and syntax is borrowed from the CSS Media Queries specification but is part of the markup, as we’re switching source files depending on device characteristics. In the following example, the browser is “asked” whether it has a min-device-width of 800px (that is, does it have a wide screen?). If it does, it receives hi-res.ogv; if not, it is sent lo-res.ogv:

Also note that you should still use the type attribute with codecs parameters and fallback content previously discussed. We’ve just omitted those for clarity.
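If you ever need to make the same decision in script, for example when setting video.src yourself, the logic is a first-match search. This is a hypothetical pickByMedia helper; the matches callback is an assumption that keeps the sketch runnable anywhere, and in a browser you could pass (query) => window.matchMedia(query).matches.

```javascript
// Return the src of the first candidate whose media query matches.
// A candidate without a media query always matches, so it works as the
// catch-all at the end of the list (like lo-res.ogv in the example above).
function pickByMedia(sources, matches) {
  const hit = sources.find((source) => !source.media || matches(source.media));
  return hit ? hit.src : null;
}

const candidates = [
  { src: "hi-res.ogv", media: "(min-device-width: 800px)" },
  { src: "lo-res.ogv" },
];
```

As with the markup version, order matters: the most specific query goes first, and the unconditional source goes last.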

Rolling Custom Controls

One truly spiffing aspect of the media element, and therefore the audio and video elements, is that the JavaScript API is super easy. The APIs for both audio and video descend from the same media API, so they’re nearly identical. The only difference between these elements is that the video element has height, width, and poster attributes. The events, the methods, and all other attributes are the same. With that in mind, we’ll stick with the sexier media element, the <video> element, for our JavaScript discussion.

As you saw at the start of this chapter, Anne van Kesteren’s announcement mentioned simple methods such as play() and pause(). The media element has play(), pause() (there’s no stop method: simply pause and move to the start), load(), and canPlayType(). In fact, these are the core methods on the media element; nearly everything else is events and attributes.

Using JavaScript and the new media API, you can create and manage your own video player controls. In our example, we walk you through some of the ways to control the video element and create a simple set of controls. Our example won’t blow your mind—it isn’t nearly as sexy as the video element itself (and is a little contrived!)—but you’ll get a good idea of what’s possible through scripting. The best bit is that the UI will be all CSS and HTML. So if you want to style it your own way, it’s easy with just a bit of web standards knowledge—no need to edit an external Flash player or similar.

Our hand-rolled basic video player controls will have a play/pause toggle button and allow the user to scrub along the timeline of the video to skip to a specific section.

Custom video controls

Our starting point will be a video with native controls enabled. We’ll then use JavaScript to strip the native controls and add our own, so that if JavaScript is disabled, the user still has a way to control the video as we intended:

Play, Pause, and Toggling Playback

Next, we want to be able to play and pause the video from a custom control. We’ve included a button element to which we’ll bind a click handler that does the play/pause work. Throughout my code examples, when I refer to the play variable, it will refer to the button element:

We’re using &#x25BA;, a numeric character reference for a geometric shape that looks like a play button. Once the button is clicked, we’ll start the video and switch the value to two pipes using &#x2590;, which looks (a little) like a pause.

Using XML Entities

For simplicity, I’ve included the button element as markup, but as we’re progressively enhancing our video controls, all of these additional elements (for play, pause, scrubbing, and so on) should be generated by the JavaScript.

In the play/pause toggle, we have a number of things to do:

  • If the video is currently paused, start playing it; if the video has finished, first reset the current time to 0, that is, move the playhead back to the start of the video.
  • Change the toggle button’s value to show that the next time the user clicks, it will toggle from pause to play or play to pause.
  • Finally, we play (or pause) the video:
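Put together, those three steps might look something like the following sketch. The names are assumptions for illustration: video is the video element, button is the play/pause button, and togglePlayback is a hypothetical function name. It uses the same ► (&#x25BA;) and ▐ (&#x2590;) characters as the markup. Because play() returns a promise in current browsers, the sketch also puts the play label back if playback is refused.

```javascript
const PLAY_LABEL = "\u25BA";  // ► same character as &#x25BA;
const PAUSE_LABEL = "\u2590"; // ▐ same character as &#x2590;

// Toggle playback and keep the button label in step, with the script
// tracking the state itself (the approach whose weakness is discussed next).
function togglePlayback(video, button) {
  if (video.ended) {
    video.currentTime = 0; // Finished: move the playhead back to the start.
  }
  if (video.paused || video.ended) {
    button.textContent = PAUSE_LABEL; // Next click will pause.
    const attempt = video.play();
    if (attempt && typeof attempt.catch === "function") {
      // play() returns a promise in modern browsers; if playback is
      // refused, put the play label back.
      attempt.catch(() => { button.textContent = PLAY_LABEL; });
    }
  } else {
    button.textContent = PLAY_LABEL; // Next click will play.
    video.pause();
  }
}

// Wiring it up in a page:
// button.addEventListener("click", () => togglePlayback(video, button));
```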

The problem with this logic is that we’re relying entirely on our own script to determine the state of the play/pause button. What if the user was able to pause or play the video via the native video element controls somehow (some browsers allow the user to right-click and select to play and pause the video)? Also, when the video comes to the end, the play/pause button would still show a pause icon. Ultimately, we need our controls to always relate to the state of the video.

Eventful Media Elements

The media elements fire a broad range of events: when playback starts, when a video has finished loading, if the volume has changed, and so on. So, getting back to our custom play/pause button, we strip the part of the script that deals with changing its visible label:

In the simplified code, if the video has ended, we reset it, then toggle the playback based on its current state. The label on the control itself is updated by separate (anonymous) functions we’ve hooked straight into the event handlers on our video element:

Now, whenever the video is played, paused, or has reached the end, the function associated with the relevant event is fired, making sure that our control shows the right label.
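Here is a sketch of that event-driven arrangement. As before in spirit, video is the video element and button is the play/pause button; bindPlayToggle is a hypothetical name. The click handler only changes the playback state, and the label is set from the play, pause, and ended events, so it stays correct whatever started or stopped playback.

```javascript
// Click handler changes playback; media events drive the label.
function bindPlayToggle(video, button) {
  button.addEventListener("click", () => {
    if (video.ended) {
      video.currentTime = 0; // Restart a finished video.
    }
    if (video.paused || video.ended) {
      const attempt = video.play();
      if (attempt && typeof attempt.catch === "function") {
        // Ignore refused playback; the label only changes when a play event fires.
        attempt.catch(() => {});
      }
    } else {
      video.pause();
    }
  });

  // These fire however playback changed: our button, the native
  // controls, a context menu, or another script.
  video.addEventListener("play", () => { button.textContent = "\u2590"; });
  video.addEventListener("pause", () => { button.textContent = "\u25BA"; });
  video.addEventListener("ended", () => { button.textContent = "\u25BA"; });
}
```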

Now that we’re handling playing and pausing, we want to show the user how much of the video has downloaded and therefore how much is playable. This would be the amount of buffered video available. We also want to catch the event that says how much video has been played, so we can move our visual slider to the appropriate location to show how far through the video we are. Finally, and most importantly, we need to capture the event that says the video is ready to be played, that is, there’s enough video data to start watching.

seekable content

Monitoring Download Progress

The media element has a “progress” event, which fires periodically while the media is being fetched, potentially before the media has been processed. When this event fires, we can read the video.seekable object, which has a length property and start() and end() methods. We can update our seek bar using the following code (where the buffer variable is the element that shows how much of the video we can seek and has been downloaded):

The code binds to the progress event, and when it fires, it gets the percentage of video that can be played back compared to the length of the video. Note that the keyword this refers to the video element, as that’s the context in which the updateSeekable function will be executed, and the duration attribute is the length of the media in seconds.

However, there’s sometimes a subtle issue in Firefox’s video element that causes the video.seekable.end() value not to be the same as the duration. Or rather, once the media is fully downloaded and processed, the final duration doesn’t match the video.seekable.end() value. To work around this issue, we can also listen for the durationchange event using the same updateSeekable function. This way, if the duration does change after the last progress event, the durationchange event fires, and our buffer element will have the correct width:
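The following sketch is a variation on that approach rather than a transcription of the original code: it reads video.buffered instead of video.seekable. In current browsers, when the server supports range requests, seekable usually covers the whole duration as soon as the metadata arrives, so buffered tracks download progress more closely; swap seekable back in if you want to follow the text exactly. bufferedPercent and bindBufferBar are hypothetical names, and buffer is assumed to be the bar element from the text. Using the end of the last buffered range is an approximation that ignores any gaps left by seeking.

```javascript
// Percentage of the media downloaded so far, estimated from the end of the
// last buffered range. `media` is a video/audio element or a look-alike.
function bufferedPercent(media) {
  const ranges = media.buffered;
  const duration = media.duration;
  if (!ranges || ranges.length === 0 || !Number.isFinite(duration) || duration <= 0) {
    return 0; // Nothing buffered yet, or duration unknown (NaN) or unbounded (Infinity).
  }
  const end = ranges.end(ranges.length - 1);
  return Math.min(100, (end / duration) * 100);
}

// Keep the `buffer` bar in sync. A regular function (not an arrow function)
// is used so that `this` is the video element, as in the text.
function bindBufferBar(video, buffer) {
  function updateBuffer() {
    buffer.style.width = bufferedPercent(this) + "%";
  }
  video.addEventListener("progress", updateBuffer);
  video.addEventListener("durationchange", updateBuffer);
}
```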

When the Media File Is Ready to Play

When your browser first encounters the video (or audio) element on a page, the media file isn’t ready to be played just yet. The browser needs to download and then decode the video (or audio) so it can be played. Once that’s complete, the media element will fire the canplay event. Typically, this is the time you would initialize your controls and remove any “loading” indicator. So our code to initialize the controls would typically look like this:

Nothing terribly exciting there. The control initialization enables the play/pause toggle button and resets the playhead in the seek bar.

The events to do with loading fire in the following order: loadstart, durationchange, loadedmetadata, loadeddata, progress, canplay, canplaythrough.

However, sometimes this event won’t fire right away (or at least not when you’re expecting it to). Sometimes, the video suspends its download because the browser is trying to avoid downloading too much on your behalf. That can be a headache if you’re expecting the canplay event, which won’t fire unless you give the media element a bit of a kicking.

So instead, we’ve started listening for the loadeddata event. This says that some data has been loaded, though not necessarily all of it. This means that the metadata is available (height, width, duration, and so on) along with some media content, but not all of it. By allowing the user to start playing the video at the point at which loadeddata has fired, we force browsers like Firefox to go from a suspended state to downloading the rest of the media content, allowing them to play the whole video. So, in fact, the correct point in the event cycle to enable the user interface is the loadeddata event:
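One practical wrinkle: if your script runs after the data has already loaded, you will never see the loadeddata event. A minimal sketch that covers both cases, assuming a hypothetical whenDataLoaded helper and your own initControls callback, checks readyState first. The value 2 is HAVE_CURRENT_DATA, which browsers also expose as HTMLMediaElement.HAVE_CURRENT_DATA.

```javascript
// readyState value at which loadeddata has fired: data for the current
// playback position is available.
const HAVE_CURRENT_DATA = 2;

// Run initControls once there is enough data to enable the UI, whether
// or not loadeddata has already fired by the time this runs.
function whenDataLoaded(video, initControls) {
  if (video.readyState >= HAVE_CURRENT_DATA) {
    initControls(); // Too late for the event; initialize straight away.
  } else {
    video.addEventListener("loadeddata", initControls, { once: true });
  }
}

// Example wiring, with `play` and `seekBar` standing in for your controls:
// whenDataLoaded(video, () => { play.disabled = false; seekBar.value = 0; });
```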

Preloading Metadata

The preload attribute, covered earlier, allows developers to tell browsers to download only the header information about the media element, which includes the metadata. If you use preload=metadata, it stands to reason that you should listen for the loadedmetadata event rather than the loadeddata event to initialize the duration and slider controls of the media.
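A sketch of that loadedmetadata approach, assuming hypothetical formatTime and initDurationControls helpers, a range input as the slider, and any element with textContent as the duration label. The check for a non-finite duration matters because duration is NaN before the metadata arrives and Infinity for live streams.

```javascript
// Format a time in seconds as m:ss, or h:mm:ss for an hour or more.
function formatTime(totalSeconds) {
  if (!Number.isFinite(totalSeconds)) {
    return "--:--"; // NaN before metadata arrives, Infinity for live streams.
  }
  const whole = Math.floor(totalSeconds);
  const hours = Math.floor(whole / 3600);
  const minutes = Math.floor((whole % 3600) / 60);
  const seconds = String(whole % 60).padStart(2, "0");
  if (hours > 0) {
    return `${hours}:${String(minutes).padStart(2, "0")}:${seconds}`;
  }
  return `${minutes}:${seconds}`;
}

// Set up the slider range and duration label once metadata is available.
// `slider` is assumed to be an <input type="range">, `label` any element.
function initDurationControls(video, slider, label) {
  function onMetadata() {
    slider.max = Number.isFinite(video.duration) ? String(video.duration) : "0";
    slider.value = "0";
    label.textContent = formatTime(video.duration);
  }
  const HAVE_METADATA = 1; // readyState once loadedmetadata has fired
  if (video.readyState >= HAVE_METADATA) {
    onMetadata();
  } else {
    video.addEventListener("loadedmetadata", onMetadata, { once: true });
  }
}
```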

Fast Forward, Slow Motion, and Reverse

The spec provides an attribute, playbackRate. By default, the assumed playbackRate is 1, meaning normal playback at the intrinsic speed of the media file. Increasing this attribute speeds up the playback; decreasing it slows it down. Negative values indicate that the video will play in reverse.

When this was first written, only WebKit-based browsers supported playbackRate, and reverse playback with negative values is still not supported in every browser, so if you need to support fast forward and rewind, you can hack around this by programmatically changing currentTime:

As you can see from the previous example, if playbackRate is supported, you can set positive and negative numbers to control the direction of playback. In addition to being able to rewind and fast forward using the playbackRate, you can also use a fraction to play the media back in slow motion using video.playbackRate = 0.5, which plays at half the normal rate.
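Here is one way the fallback could look. It is a sketch, not the original example: startScrub and stepTime are hypothetical names, the 100 ms interval is an arbitrary choice, and the assumption is that a browser either lacks playbackRate, rejects an unsupported rate by throwing (as some do for negative values), or quietly ignores it. Positive native rates only have a visible effect while the video is playing, so call play() as well if needed.

```javascript
// Next playhead position for the manual fallback, clamped to [0, duration].
function stepTime(currentTime, rate, intervalMs, duration) {
  const next = currentTime + (rate * intervalMs) / 1000;
  const upper = Number.isFinite(duration) ? duration : Infinity;
  return Math.min(Math.max(next, 0), upper);
}

// Try native playbackRate first. If the property is missing, or the browser
// throws or ignores the value, pause and move currentTime on a timer instead.
// Returns a function that stops scrubbing.
function startScrub(video, rate, intervalMs = 100) {
  if (typeof video.playbackRate === "number") {
    try {
      video.playbackRate = rate;
      if (video.playbackRate === rate) {
        return () => { video.playbackRate = 1; }; // Native path worked.
      }
    } catch (err) {
      // Unsupported value: fall through to the manual fallback.
    }
  }
  video.pause();
  const timer = setInterval(() => {
    video.currentTime = stepTime(video.currentTime, rate, intervalMs, video.duration);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

For example, const stop = startScrub(video, -2); rewinds at roughly twice normal speed, natively where possible, until you call stop().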

Multimedia Accessibility

We’ve talked about the keyboard accessibility of the video element, but what about transcripts and captions for multimedia? After all, there is no alt attribute for video or audio as there is for <img>. The fallback content between the tags is only meant for browsers that can’t cope with native video, not for people whose browsers can display the media but can’t see or hear it due to disability or situation (for example, in a noisy environment or needing to conserve bandwidth).

The theory of HTML5 multimedia accessibility is excellent. The original author should make a subtitle file and put it in the container Ogg or MP4 file along with the multimedia files, and the browser will offer a user interface whereby the user can get those captions or subtitles. Even if the video is “embedded” on 1,000 different sites (simply by using an external URL as the source of the video/audio element), those sites get the subtitling information for free, so we get “write once, read everywhere” accessibility.

In practice, the way to do this is with a <track> element that points to a separate WebVTT file.

Put the <track> element inside the <video> element, and give each track a srclang attribute if you provide versions in several languages.
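WebVTT is a plain-text format: a WEBVTT header line, then blank-line-separated cues, each with a start --> end timing line in hours:minutes:seconds.milliseconds followed by the caption text. The sketch below generates such a file from a list of cues; toWebVTT and vttTimestamp are hypothetical helpers, and it assumes cue text contains no blank lines and no "-->" sequences, since either would break the cue. You would serve the result as text/vtt and reference it with something like <track kind="captions" src="captions-en.vtt" srclang="en" label="English">, where the file name is made up.

```javascript
// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function vttTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const pad = (n, width) => String(n).padStart(width, "0");
  const hours = Math.floor(totalMs / 3600000);
  const minutes = Math.floor((totalMs % 3600000) / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(secs, 2)}.${pad(ms, 3)}`;
}

// Build the text of a WebVTT file from { start, end, text } cues (seconds).
// Assumes cue text has no blank lines and no "-->", which would break a cue.
function toWebVTT(cues) {
  const blocks = cues.map(
    (cue) => `${vttTimestamp(cue.start)} --> ${vttTimestamp(cue.end)}\n${cue.text}`
  );
  return ["WEBVTT", ...blocks].join("\n\n") + "\n";
}
```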

The data-* Attributes (Custom Data Attributes)

HTML5 allows custom data-* attributes on any element. These can be used to pass information to the page’s own scripts.

Previously, to store custom data in your markup, authors would do something annoying like using classes: <input class="spaceship shields-5 lives-3 energy-75">. Then your script would need to waste time grabbing these class names, such as shields-5, splitting them at a delimiter (a hyphen in this example) to extract the value. In his book, PPK on JavaScript (New Riders, ISBN 0321423305), Peter Paul Koch explains how to do this and why he elected to use custom attributes in some HTML4 pages, making the JavaScript leaner and easier to write but also making the page technically invalid. As it’s much easier to use data-shields=5 for passing name/value pairs to scripts, HTML5 legitimizes and standardizes this useful, real-world practice.

We’re using data-begin and data-end; they could just as legitimately be data-start and data-finish, or (in a different genre of video) data-ooh-matron and data-slapandtickle. Like choosing class or id names, you should pick a name that matches the semantics.

Custom data attributes are only meant for passing information to the site’s own scripts, for which there are no more appropriate attributes or elements.

The spec says, “These attributes are not intended for use by software that is independent of the site that uses the attributes,” so they are not meant for passing information to crawlers or third-party parsers. That’s a job for microformats, microdata, or RDFa.

When the data-* attributes are fully supported in a browser, JavaScript can access the properties using element.dataset.foo (where the data-foo attribute contains the value). Support can be emulated using JavaScript by extending the HTMLElement object, although that typically isn’t possible in the IE9 alpha release and below. Otherwise, scripts can access the values via the getAttribute and setAttribute methods. The advantage of the dataset property over setAttribute is that it can be enumerated; also, when dataset is fully implemented, setting a dataset property automatically sets the corresponding content attribute on the element, giving you a shorthand syntax for setting custom data.

For more information, see the spec.
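The name mapping behind dataset is mechanical: drop the data- prefix, then turn each hyphen followed by a lowercase letter into that letter in uppercase, so data-ooh-matron becomes dataset.oohMatron. Here is a sketch of that rule, with hypothetical datasetKey and readDataset helpers working on a plain name/value map rather than a real element.

```javascript
// Turn a data-* attribute name into its dataset property name:
// strip "data-", then each "-x" (x a lowercase letter) becomes "X".
function datasetKey(attributeName) {
  if (!attributeName.startsWith("data-")) {
    return null; // Not a custom data attribute.
  }
  return attributeName
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

// Build a dataset-like object from a plain { name: value } attribute map.
function readDataset(attributes) {
  const result = {};
  for (const [name, value] of Object.entries(attributes)) {
    const key = datasetKey(name);
    if (key !== null) {
      result[key] = value;
    }
  }
  return result;
}
```

In a browser, a chapter link marked up with data-begin="30" could then seek the video with video.currentTime = Number(link.dataset.begin), where link is a hypothetical element; the values are always strings, hence the Number() conversion.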


You’ve seen how HTML5 gives you the first credible alternative to third-party plugins. When this chapter was written, incompatible codec support made it harder than using plugins to simply embed video in a page and have it work cross-browser.

On the plus side, because video and audio are now regular elements natively supported by the browser (rather than a “black box” plugin) and offer a powerful API, they’re extremely easy to control via JavaScript. With nothing more than a bit of web standards knowledge, developers can easily build their own custom controls or do all sorts of crazy video manipulation with only a few lines of code. As a safety net for browsers that can’t cope, we recommend that you also add links to download your video files outside the <video> element.

There are already a number of ready-made scripts available that allow you to easily leverage the HTML5 synergies in your own pages, without having to do all the coding yourself. The Kaltura player is an open-source video player that works in all browsers. jPlayer is a very liberally licensed jQuery audio player that degrades to Flash in legacy browsers, can be styled with CSS, and can be extended to allow playlists.

Accessing video with JavaScript is more than writing new players. In the next chapter, you’ll learn how to manipulate native media elements for some truly amazing effects. Or at least, our heads bouncing around the screen—and who could conceive of anything more amazing than that?

Buy the Book

This article was excerpted from Introducing HTML5 by Bruce Lawson and Remy Sharp. Copyright ©2011. Used with permission of Pearson Education, Inc. and New Riders.
