Sounds Like DVD
Posted Aug 11, 2003

Producing DVD's unique combination of sound and vision is quite straightforward on the video side, but audio is another story. Before you find yourself up to your ears in audio formats, channel configurations, and audio streams, what do you need to know to make your project sound like DVD?

September 2002 | Authoring may be the aspect of production that sets DVD apart from its forebears such as CD and laserdisc, but in the making of many DVD titles, asset preparation actually turns out to be a bigger, more time-consuming job. You not only have to create or obtain all the individual media elements that are specified on the project's asset list, but also convert those elements into the form needed to integrate them into a DVD.

While video may be the focus of the DVD-Video format, the basic requirements for preparation of DVD-compliant video assets are actually quite straightforward. In contrast, the subject of the audio that accompanies DVD-Video content can be a bit more complicated because it covers a greater variety of supported formats, multiple channel configurations, and multiple audio streams per video program. To clarify the issues involved, we'll first look at the DVD-Video format's audio support, and then at the production processes typically used in audio preparation. The use of audio in DVD-Audio content, meanwhile, is a distinct story unto itself, one that we'll save for another day.

Supported Audio Formats

Before we look at the specifics of audio-for-DVD production, let's review both the possible uses of audio in DVD-Video and the specific requirements for source assets. The first thing to know is that the DVD-Video specification allows each movie—either a video clip or a series of still images (a slideshow or stillshow)—to be accompanied by up to eight independent mono, stereo, or surround sound audio streams. DVD-Video players are required to support seamless switching between these streams (using the player's remote control) during playback.

The most popular uses for these multiple audio streams are to provide different language versions of a soundtrack, or commentary tracks by the director, actors, or special effects crew. They may also be used to deliver the same soundtrack in different audio formats. Or there may be some combination of all the above. It might make sense, for instance, for a title released for North American markets to feature a 5.1-channel soundtrack in English, a stereo English commentary track, and also stereo soundtracks in Spanish and French.

A variety of formats are available for audio streams. To ensure that every player will support some form of audio playback from every disc, the specification mandates audio format requirements related to both players and discs. For NTSC, players must support playback of both Linear PCM and Dolby Digital (sometimes referred to as AC-3), and at least one of the audio streams accompanying every movie (video or stills) must be in one of those two formats. For PAL, players must support not only PCM and Dolby Digital, but also MPEG audio (MPEG-1 Layer II and MPEG-2). Similarly, on a PAL disc, at least one audio stream per movie must be Linear PCM, Dolby Digital, or MPEG.

As for additional audio streams in a movie, if any, they may be in any of the required formats or in an optional format, which is a format that players are allowed, but not required, to support. These optional formats include DTS (Digital Theater Systems) and SDDS (Sony Dynamic Digital Sound). Of the two, DTS has been more successful in obtaining support from consumer electronics manufacturers, some of whom include DTS decoders in their DVD players or A/V processors.
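The format rules described above can be expressed as a small checker. This is only a sketch of the rules as this article states them, not a substitute for the specification; the function name, the format labels, and the decision to treat an empty stream list as acceptable are assumptions made for illustration.

```python
# Formats every player must decode, per TV system, as described above.
MANDATORY = {
    "NTSC": {"PCM", "Dolby Digital"},
    "PAL": {"PCM", "Dolby Digital", "MPEG"},
}
# Every format named in the article, mandatory or optional.
KNOWN = {"PCM", "Dolby Digital", "MPEG", "DTS", "SDDS"}
MAX_STREAMS = 8  # audio streams per movie


def check_streams(tv_system, formats):
    """Return a list of problems; an empty list means no rule was broken."""
    problems = []
    if len(formats) > MAX_STREAMS:
        problems.append(f"{len(formats)} streams exceeds the limit of {MAX_STREAMS}")
    unknown = sorted(set(formats) - KNOWN)
    if unknown:
        problems.append(f"unrecognized format(s): {', '.join(unknown)}")
    # Assumption: a movie with no audio at all is not flagged here.
    if formats and not set(formats) & MANDATORY[tv_system]:
        problems.append(f"no stream is in a format mandatory for {tv_system}")
    return problems


# A 5.1 soundtrack, a commentary, and an optional DTS stream on an NTSC disc.
print(check_streams("NTSC", ["Dolby Digital", "Dolby Digital", "DTS"]))
# A DTS-only movie breaks the "at least one mandatory format" rule.
print(check_streams("NTSC", ["DTS"]))
```

The second call shows why DTS titles typically carry a Dolby Digital or PCM stream as well: the optional format cannot be the only one on the movie.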

PCM Possibilities
Linear PCM is uncompressed digital audio like that on an audio CD. In DVD-Video, however, both 48kHz and 96kHz sample rates are supported, as are 16-, 20-, and 24-bit word lengths. Theoretically, this means that the PCM audio on a DVD-Video may be much higher resolution than on a CD.

In addition to higher resolution, PCM may be included in discrete multichannel format, with up to eight channels used in a given stream. However, an overall limit of 6.144Mbps on all audio streams places practical limits on the number of channels that may be used at a given resolution (sample rate and word length).
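To see how the ceiling trades channels against resolution, the arithmetic can be sketched as follows. The calculation assumes a single PCM stream has the whole 6.144Mbps to itself and ignores any packet overhead; the function names are illustrative.

```python
AUDIO_LIMIT_BPS = 6_144_000  # 6.144Mbps ceiling cited above
MAX_PCM_CHANNELS = 8         # channel limit cited above


def pcm_bitrate(channels, sample_rate_hz, word_length_bits):
    """Uncompressed bit-rate in bits per second."""
    return channels * sample_rate_hz * word_length_bits


def max_channels(sample_rate_hz, word_length_bits):
    """Most channels that fit under the ceiling at this resolution."""
    per_channel = pcm_bitrate(1, sample_rate_hz, word_length_bits)
    return min(MAX_PCM_CHANNELS, AUDIO_LIMIT_BPS // per_channel)


for rate in (48_000, 96_000):
    for bits in (16, 20, 24):
        n = max_channels(rate, bits)
        mbps = pcm_bitrate(n, rate, bits) / 1_000_000
        print(f"{rate // 1000}kHz/{bits}-bit: up to {n} channels ({mbps:.3f}Mbps)")
```

Under these assumptions, eight channels are only possible at 16-bit/48kHz, while 24-bit/96kHz leaves room for just two.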

Unfortunately, while it is possible to create a DVD-Video soundtrack with high-resolution audio, it is not possible to guarantee that the viewer will actually hear the sound at full resolution. Many DVD-Video players decimate 96kHz audio to 48kHz prior to D/A conversion, and some players also truncate 20- and 24-bit samples to 16 bits. In addition, the majority of DVD-Video players on the market provide only two outputs for discrete audio channels.
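What truncation costs can be estimated with a little arithmetic. The sketch below drops the low eight bits of a 24-bit sample, as a truncating player would, and uses the textbook rule of thumb of roughly 6.02 dB per bit plus 1.76 dB for an ideal converter. That formula is a general approximation, not a figure from the DVD specification, and the function names are illustrative.

```python
def truncate_24_to_16(sample):
    """Drop the 8 least significant bits of a signed 24-bit integer sample."""
    return sample >> 8


def theoretical_dynamic_range_db(bits):
    """Rule-of-thumb dynamic range of an ideal converter with this word length."""
    return 6.02 * bits + 1.76


print(truncate_24_to_16(8_388_607))   # largest 24-bit value
print(truncate_24_to_16(-8_388_608))  # smallest 24-bit value
for bits in (16, 20, 24):
    print(f"{bits}-bit: about {theoretical_dynamic_range_db(bits):.1f} dB")
```

Any detail that lived only in the discarded bits is simply gone, which is why a 24-bit master does not guarantee 24-bit playback.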

Compression and Bit-rates
Unlike the situation for discrete multichannel sound using PCM, the DVD-Video format provides solid support for multichannel sound in data-compressed audio formats. Dolby Digital, for instance, delivers up to 5.1 channels, meaning five full-range channels (left front, center, right front, left rear, right rear) and a limited-bandwidth low-frequency-effects (LFE) channel (the ".1" channel) that is frequently thought of as the subwoofer channel.

Dolby Digital is the most popular choice for surround sound streams not only because of its mandatory support in players, but also because it yields acceptable quality while requiring relatively little bandwidth. Bandwidth is a major factor influencing the choice of audio format for a given program because DVD-Video's total available bandwidth for audio, video, and subpictures combined is limited to 9.8Mbps. As more of that bandwidth is allocated to audio, less of it is available for video.

Audio streams in linear PCM format use at least 768kbps per channel (16-bit/48kHz), making 1.536Mbps for each stereo stream. That adds up quickly when the title design calls for multiple streams. While total audio bandwidth is limited to 6.144Mbps, most DVD developers never come close to that limit. Instead they choose to devote the bulk of the available bit-rate to achieving the best possible video encoding. For programs with multiple streams, that usually means using data-compressed audio formats such as Dolby Digital rather than PCM.

To reduce the amount of data needed to represent the audio signal, Dolby Digital and the optional audio formats use perceptual-coding techniques for both stereo and surround signals. Compression algorithms are employed to discard sound from areas of the frequency spectrum that contain limited signal energy. At the same time, noise-shaping techniques are used to remove audio information in frequency bands where the missing sound is least likely to be noticed by the human ear.

In concert with these perceptual coding techniques, compressed formats supporting surround use a cross-correlation scheme to avoid redundant storing of information that is common between channels. The systems also use "fold down" mechanisms that mix a surround stream to stereo on-the-fly according to predetermined instructions. This means that even when a soundtrack is stored in a surround format it can be played back in a two-channel presentation environment.
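The fold-down idea can be sketched for a single 5.1 sample frame. The -3dB mix levels for center and surrounds, the omission of the LFE channel, and the scaling that keeps the result within full scale are assumptions chosen for illustration; in a real stream the mix levels are the "predetermined instructions" carried with the audio, and the function name is hypothetical.

```python
import math

MINUS_3_DB = math.sqrt(0.5)  # about 0.707


def fold_down(frame, center_mix=MINUS_3_DB, surround_mix=MINUS_3_DB):
    """Mix one 5.1 frame (floats, full scale = 1.0) down to a stereo pair.

    frame is a dict with keys L, C, R, Ls, Rs, LFE. The LFE channel is
    left out of the mix (an assumption of this sketch).
    """
    # Scale so that all channels at full scale still sum to 1.0 at most.
    scale = 1.0 / (1.0 + center_mix + surround_mix)
    left = frame["L"] + center_mix * frame["C"] + surround_mix * frame["Ls"]
    right = frame["R"] + center_mix * frame["C"] + surround_mix * frame["Rs"]
    return left * scale, right * scale


dialog_only = {"L": 0.0, "C": 0.8, "R": 0.0, "Ls": 0.0, "Rs": 0.0, "LFE": 0.0}
print(fold_down(dialog_only))  # center dialog lands equally in both outputs
```

Because the center channel is shared between the two outputs, dialog stays anchored in the middle of the stereo image after the fold-down.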

Using data compression in stereo and surround formats allows a substantial reduction in the bandwidth required for audio. In Dolby Digital format, for instance, audio streams use 192kbps for stereo and 384-448kbps for surround. DTS uses much less compression, and therefore requires greater bandwidth. At 1.536Mbps for a 5.1-channel stream, DTS delivers surround sound in the same bandwidth required for two-channel sound in 16-bit/48kHz PCM.
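A quick budget shows what these figures mean for the video encode. The sketch below uses the bit-rates quoted in this article, takes the upper 448kbps figure for Dolby Digital 5.1, and ignores subpictures and multiplexing overhead, so the results are rough upper bounds rather than authoring targets. The stream labels are hypothetical.

```python
TOTAL_KBPS = 9_800        # combined audio, video, and subpicture limit cited above
AUDIO_LIMIT_KBPS = 6_144  # total audio limit cited above

STREAM_KBPS = {
    "pcm_stereo": 1_536,  # 16-bit/48kHz
    "dd_stereo": 192,
    "dd_5.1": 448,        # upper end of the 384-448kbps range
    "dts_5.1": 1_536,
}


def video_budget_kbps(streams):
    """Bit-rate left for video after the listed audio streams are allocated."""
    audio = sum(STREAM_KBPS[name] for name in streams)
    if audio > AUDIO_LIMIT_KBPS:
        raise ValueError(f"audio needs {audio}kbps, over the {AUDIO_LIMIT_KBPS}kbps limit")
    return TOTAL_KBPS - audio


# English 5.1 plus stereo commentary, Spanish, and French, all Dolby Digital.
print(video_budget_kbps(["dd_5.1", "dd_stereo", "dd_stereo", "dd_stereo"]))
# The same four streams as stereo PCM.
print(video_budget_kbps(["pcm_stereo"] * 4))
```

With Dolby Digital, the four-stream title leaves nearly 8.8Mbps for video; with four stereo PCM streams, less than 3.7Mbps remains.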

One more point about Dolby Digital and surround sound: it's easy to get confused between Dolby Surround and Dolby Digital 5.1. Dolby Surround is an audio format in which four channels (left, center, right, and surround) are "matrixed" (encoded) in the analog domain into a two-channel "Left-total, Right-total" (Lt/Rt) signal. This two-channel signal can be played as a stereo soundtrack if a Dolby Pro Logic decoder (standalone or in a receiver) is not available, but when played through a decoder the Lt/Rt signal is decoded to its original four channels. On DVD, the Lt/Rt channels of a Dolby Surround soundtrack may be stored as a two-channel stream in either PCM or Dolby Digital 2.0. This is not the same thing, however, as a Dolby Digital 5.1 surround soundtrack.
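The matrixing idea can be illustrated with a deliberately simplified model. The sketch below omits the phase shift and band-limiting that a real Dolby Surround encoder applies to the surround channel, as well as the steering logic of a Pro Logic decoder, so it shows only the basic sum-and-difference principle. The -3dB coefficients and function names are assumptions.

```python
import math

G = math.sqrt(0.5)  # -3dB, about 0.707


def matrix_encode(left, center, right, surround):
    """Fold four channels into a two-channel Lt/Rt pair (simplified)."""
    lt = left + G * center + G * surround
    rt = right + G * center - G * surround  # surround goes in with opposite polarity
    return lt, rt


def passive_decode(lt, rt):
    """Recover four outputs using sums and differences only (no steering)."""
    return {"L": lt, "C": G * (lt + rt), "R": rt, "S": G * (lt - rt)}


lt, rt = matrix_encode(0.0, 1.0, 0.0, 0.0)  # center-only dialog
print(passive_decode(lt, rt))
```

In this model a center-only signal decodes to full level in C and nothing in S, but it also leaks into L and R, since they are simply Lt and Rt. Reducing that kind of crosstalk is the job of the decoder's steering logic.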

Quality Throughout
Now that we understand the kinds of audio that DVD-Video allows, we can start thinking about how that audio is prepared. Depending on both the type of project and the budget, there is a huge variation in the degree of production required. A presentation for a big product roll-out, a DVD kiosk for point-of-sale or museum installation, or a feature film DVD might all require extensive acquisition, editing, and mixing of dialog, music, and effects. On the other hand, an in-house training DVD that will never be seen by the public can probably be handled much more simply.

Regardless of the complexity of the project, however, one rule always holds true: preserve maximum quality at each step in the production chain. Always an important consideration, quality is particularly crucial when the audio is going to be data-compressed with an algorithm such as Dolby Digital. The higher the level of extraneous noise—whether constant (hiss, hum, buzz, ambience, etc.) or incidental (clicks, pops, rumble, etc.)—the harder it is for the encoding algorithm to make good choices when it comes to distinguishing between the information in the signal that needs to be preserved and the "unnecessary" sound elements that can be discarded to reduce bit-rate requirements.

In the acquisition stage, preserving quality starts with choosing an appropriate setting (a studio or a very quiet room) in which to record narration. You'll want to use professional-quality microphones, and be aware of issues such as clothing rustle (with clip-on or "lavaliere" mics), hum or crackle (check the mic cable), and pops or rumble (you may need a windscreen). It's also very important to keep the recording level strong but without overloads that can cause distortion. An audio compressor/limiter can help smooth out levels by bringing soft passages up while holding loud sounds in check, but too much limiting will make the sound lifeless.
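How a compressor narrows the gap between loud and soft passages can be seen from its static level curve. The sketch below models only that curve, with no attack or release behavior, and the threshold, ratio, and make-up gain values are arbitrary examples rather than recommended settings.

```python
def compressed_level_db(level_db, threshold_db=-18.0, ratio=3.0, makeup_db=0.0):
    """Output level in dB for a given input level in dB (static curve only)."""
    if level_db > threshold_db:
        # Above the threshold, each `ratio` dB of input yields 1 dB of output.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


for level in (-30.0, -18.0, -6.0, 0.0):
    out = compressed_level_db(level, makeup_db=4.0)
    print(f"in {level:6.1f} dB -> out {out:6.1f} dB")
```

With these example settings, a 30 dB spread between the softest and loudest inputs shrinks to 18 dB at the output. Push the ratio much higher and the dynamics flatten out, which is the "lifeless" result the text warns about.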

If the recording is wild (not synced to picture), it's generally convenient to record to DAT, or to the hard drive of a digital audio workstation (DAW). If neither of those is available, then the hi-fi tracks of a VHS VCR can make respectable recordings. If you need the audio synced to picture, and you don't have a way to record timecode on a separate audio system, then you'll be recording to the audio tracks of the camcorder or video deck. This can be fine if your video acquisition format supports digital audio at good resolution (e.g., 48kHz/16- or 20-bit), but avoid recording on the analog longitudinal (non-hi-fi) audio tracks of video formats such as VHS or 3/4-inch U-Matic.

Stages of Post
Once recorded, sound may go through several stages of post-production before it is actually encoded for DVD. Once again, the specifics depend on the origin of the sound elements and the type of DVD program being created. The first order of business, if necessary, is restoration, which involves cleaning up any crackle, clicks, pops, hiss, or other undesirable artifacts. This may be accomplished with dedicated pass-through rackmount devices such as those made by CEDAR, or with computer-hosted approaches such as CEDAR for Windows or NoNoise from Sonic Solutions. If the audio for a given picture sequence is available from more than one source, it may be advisable to restore the best parts of each available soundtrack and then edit the sections together to create a complete program.

If the soundtrack uses any elements (effects, music, or rerecorded dialog or narration) that were not cut along with the picture, then those elements will be "conformed" (edited to sync with the video master) once the picture edit has been finalized. This can be accomplished on any DAW that supports synchronized video/audio playback, such as Steinberg Nuendo or Digidesign Pro Tools.

In some cases, movie soundtracks that have previously been released in formats other than DVD may need to be reconformed because there have been edits made to picture elements (e.g., cuts for release in specific venues such as airlines or broadcast) that result in the original sound elements being out of sync when they are pulled for remixing. The potential for sync problems is compounded by DVD's ability to deliver multiple audio tracks in different languages; if a film was released in multiple languages, it may well have been edited differently for each foreign market, resulting in soundtracks for each language that won't all sync to the picture of a single version. These issues may arise only rarely outside the realm of feature film DVDs, but it's nonetheless advisable as a general rule to check sync throughout before a soundtrack is considered ready for encoding.

Once elements are edited and conformed to picture, they're ready for mixing, the blending and balancing of multiple sound elements (dialog, effects, music) into a complete soundtrack. Of course, mixing won't be necessary if there is only a single audio source (e.g., the audio track from camcorder footage) or the sound was previously mixed for release of the content in another form. However, even the soundtrack of previously released program material may be remixed from source elements to create surround sound. If source elements (stems) are not available, but surround is nonetheless considered important, the content owner or title producer may decide to create a 5.1 mix through selective panning of a mono or stereo mix across a multichannel soundfield.

Even when a program's soundtrack is a finished work that requires no further mixing, the audio preparation process will frequently involve mastering, which means adjusting the dynamics and frequency content (EQ) of a mixed soundtrack for best playback in the target release medium (in this case, DVD). In particular, because DVD is an interactive format that allows the viewer to navigate freely from section to section, it's important to address the consistency of levels between all sections of the program. A complex DVD title may contain dozens of video segments and menus, each with audio that may have been created at different times; level discrepancies would mean that viewers must adjust the volume control every time they jump to a different part of the disc. Instead, these volume adjustments should be addressed before the audio is encoded. At the same time, overall levels must be adjusted to optimize the soundtrack for encoding: peaks are set near (but not over) maximum, and audio compression (as distinct from data compression) may be used to gently boost quieter passages for improved audibility.
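One simple way to spot level discrepancies between segments is to measure each one and compute the gain needed to reach a common target. The sketch below uses plain RMS level as a crude stand-in for perceived loudness and caps the gain so that peaks never exceed full scale; the -20dBFS target and the function names are assumptions, not values from any standard.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0) in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def peak_dbfs(samples):
    """Highest absolute sample value in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def matching_gain_db(samples, target_rms_dbfs=-20.0):
    """Gain that moves a segment toward the target without pushing peaks over 0dBFS."""
    if peak_dbfs(samples) == float("-inf"):
        return 0.0  # silent segment: nothing to adjust
    wanted = target_rms_dbfs - rms_dbfs(samples)
    headroom = -peak_dbfs(samples)
    return min(wanted, headroom)


menu_loop = [0.5, -0.5, 0.5, -0.5]
feature = [0.1, -0.1, 0.1, -0.1]
for name, seg in (("menu", menu_loop), ("feature", feature)):
    print(f"{name}: apply {matching_gain_db(seg):+.1f} dB")
```

In this example the menu audio needs to come down by about 14 dB to sit at the same level as the feature, the kind of jump that would otherwise send viewers reaching for the remote.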

After these post-production operations, which are always performed while the audio is still in PCM, the soundtrack is ready for encoding if Dolby Digital or another non-PCM format will be used. The soundtrack may be transcoded from existing audio files, or it may be captured and encoded as it is played back, either from the audio tracks of a videotape or (as is the norm with 5.1-channel soundtracks) from a separate timecode-striped 8-channel tape in the MDM (Modular Digital Multitrack) format (often referred to as "Tascam DA-88").

Once encoded, the audio undergoes a thorough check. Proper synchronization with encoded video must be confirmed, as well as the absence of any artifacts (glitches or dropouts) introduced during capture or transcoding. If the audio is in surround, the audio operator will also need to compare decoded surround playback with the decoded downmix to be sure there is no phase cancellation or other problem. The material will also be checked at the various dynamic compression settings (not to be confused with data compression) supported by Dolby Digital, which allow the listener to set a consistently comfortable dialog level. A decoder such as Dolby's new DP564 will facilitate these types of professional listening evaluations.
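Listening remains the real test, but a rough numerical check for phase problems is to measure how two channels correlate before they are summed in a downmix. The sketch below computes a normalized zero-lag correlation: values near +1 mean the channels reinforce each other when mixed, and values near -1 mean they will largely cancel. The function name and the 1kHz test tone are illustrative.

```python
import math


def correlation(a, b):
    """Normalized zero-lag correlation of two equal-length channels (-1 to +1)."""
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0:
        return 0.0  # at least one channel is silent
    return sum(x * y for x, y in zip(a, b)) / energy


# Ten cycles of a 1kHz tone at 48kHz (48 samples per cycle).
tone = [math.sin(2 * math.pi * k / 48) for k in range(480)]
inverted = [-s for s in tone]

print(correlation(tone, tone))      # in phase
print(correlation(tone, inverted))  # polarity flipped: cancels when summed
```

A strongly negative reading between channels that get summed in the downmix, such as a front channel and its matching surround, is a cue to audition that passage carefully in stereo.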

With careful attention to quality throughout production and post, playback of the encoded audio should reveal a soundtrack that both complements and supports the video it accompanies.

Companies Mentioned In This Article

CEDAR Audio USA 43 Deerfield Road, Portland, ME 04101; 207/828 0024; www.cedaraudio.com
Digidesign, Inc. 2001 Junipero Serra Boulevard, Daly City, CA 94014; 650/731.6300; www.digidesign.com
Dolby Laboratories, Inc. 100 Potrero Avenue, San Francisco, CA 94103; 415/558-0200; www.dolby.com
Sonic Solutions 101 Rowland Way, Novato, CA 94945; 888/766-4968; www.sonic.com
Steinberg Media Technologies AG Neuer Hoeltigbaum 22-32, Hamburg, Germany 22143; 49 40 210 35-0; www.nuendo.com