Cracking DVD Encoding
Posted Mar 3, 2004

Part of what makes today's consumer-friendly DVD authoring tools so accessible is their success in shielding users from encoding and other techie tasks. But while ignorance may be bliss, aspiring DVD authors can't do competent commercial work without a working understanding of the encoding process. What's really happening behind the scenes, and what do you need to know about encoding to produce top-quality results?

Today's entry-level DVD creation tools enable users with little or no expertise in video encoding to make pleasing, playable DVDs. They may be audio producers expanding into DVD, 3D animators building DVD into their workflow, video editors with backgrounds in tape or other uncompressed media, corporate trainers migrating analog content, or novices and hobbyists with no technical expertise at all. But they all can become DVD authors, after a fashion, with little or no knowledge of the work going on behind the scenes. The embedded intelligence of today's tools renders encoding—the process that makes video DVD-compatible—virtually invisible.

However inviting this invisibility may be, it's anything but informative. The paradox of entry-level authoring tools is that anyone can get involved in encoding video for DVD, but to advance to greater levels of customization (let alone the professional look that some argue comes only with sophisticated hardware encoders), users must understand the technology and take some control of the process.

With entry-level DVD applications, it's easy to breeze through encoding with little thought or attention. Encoding happens automatically, because most tools have wizards and presets that save users from even the most rudimentary parameter management. Consequently, the complexities of the process stay hidden inside the software.

For those aspiring to work at a professional level, it is necessary to understand the "ins and outs" of the encoding process to achieve advanced levels of video quality. So let's take a closer look at what is going on in your software that you might need to know for a studio-caliber film, but don't (necessarily) have to know to capture and record your daughter's piano recital.

Encoding and Compression
Video captures motion as a sequence of still pictures. During compression the pictures are compared, and the elements that do not change over time (redundancies) are stored more compactly. Compression builds on two fundamental processes of digitization: sampling and quantization. The sampling process separates the image into isolated picture elements called pixels. These pixels exist as signal points in 2D space. Each is assigned a specific digital value drawn from a finite range of bits and bytes. This process of reducing the continuous range of possible pixel values to discrete numeric values is called quantization. Once the pixels are systematically converted to numeric values, a digital video file is created.
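
As a rough illustration of quantization, the sketch below maps continuous brightness levels between 0.0 and 1.0 onto 8-bit integer codes. The function name, the 8-bit depth, and the sample values are assumptions chosen for the example, not details of any particular encoder.

```python
def quantize(level: float, bits: int = 8) -> int:
    """Map a continuous level in [0.0, 1.0] to one of 2**bits integer codes."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    max_code = (1 << bits) - 1          # 255 for 8 bits
    return int(level * max_code + 0.5)  # round to the nearest code

# A hypothetical row of sampled brightness values, one per pixel.
sampled_row = [0.0, 0.25, 0.5, 1.0]
codes = [quantize(v) for v in sampled_row]
print(codes)
```

Fewer bits per sample means fewer available codes, so more distinct input levels collapse onto the same stored value.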

Because DVD capacity is fixed, the longer the video, the more compression it usually has to undergo. There is an inverse relationship between the level of quality and amount of compression. As a video undergoes more compression, quality will decrease, if only nominally in some cases, because of the video information removed. The closeness of the original video to the recreated video depends upon the amount of compression it undergoes and the way that compression is applied. The efficiency of the compression process is contingent upon the relationship between quality and the given bit rate. Encoders with better compression efficiency can deliver higher quality at lower bit rates, although much of this is dependent on the quality of the video source, and the degree of redundancy inherent to the content itself.

The Importance of the Algorithm
The calculations of the encoding process utilize numeric notations that stand for the individual characteristics of an image. As previously mentioned, the sampling process forms a collection of the numeric values that symbolize every pixel within the frame sequence. According to Richard Diercks, principal of RADCO Media and author of Aquaria, the first commercially available DVD-18 title, these sequences of numbers support the entire encoding process. "Each encoder is driven by a series of algorithms that compress/convert/transcode the digital video stream into an MPEG-2-compliant format in what amounts to a plain old computer file," Diercks says.

Encoding takes into account the process of describing the changes between pictures. The numeric values of the digital video file serve to flag such differences between frames, and the unchanged elements are referred to as redundancies. "In simple terms, video compression algorithms try to take advantage of spatial and temporal similarities within video frames," says William Chien, director of product development at Pinnacle Systems.

Encoding is the process of converting uncompressed video to compressed digital video using various standards, or codecs (compression/decompression algorithms) such as MPEG, Real Video, Windows Media, and QuickTime, depending on the playback medium, the platform that will be used to deliver the content, and the bandwidth available. The process of encoding includes describing the changes between each picture of a frame sequence. Encoding has the greatest effect on final video quality, as it is the process that determines the amount of compression a video stream undergoes.

Leonardo Chiariglione and Hiroshi Yasuda created the Moving Picture Experts Group (MPEG) in 1988 with the objective of standardizing video and audio for CDs. The standard method for DVD-Video encoding is MPEG-2, chosen for the high bit-rate video and efficient compression that it specifies.

Complications of Compression

Ease of compression depends on the type of source material. The more similarities within the frames of a video, the easier that video is to compress. Furthermore, video streams with less action and detail are easier to compress than those that are action-packed.

A common example used to describe this assertion is the "talking-head sequence." Typically, if the action of a scene is limited to the facial movements of a subject, the pixels are relatively unchanged from one frame to the next. This degree of redundancy can make an enormous difference in the degree of compression possible and add considerable ease to DVD bit-budgeting, as well as simplifying the encoder's task. According to Richard Diercks, "A talking head in front of a soft-focus background can be compressed 200:1 without any loss of image quality." Only the pixels around the mouth and eyes are expected to change with each facial expression, as the background remains the same.
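
To make the talking-head idea concrete, here is a minimal sketch that measures how much of a frame changes from one picture to the next. It is a deliberate simplification: MPEG-2 encoders work on blocks of pixels with motion-compensated prediction rather than counting individual pixels, and the tiny frames and the function name are invented for the example.

```python
def changed_fraction(prev_frame, next_frame, threshold=0):
    """Return the fraction of pixels whose value differs by more than threshold."""
    total = 0
    changed = 0
    for prev_row, next_row in zip(prev_frame, next_frame):
        for a, b in zip(prev_row, next_row):
            total += 1
            if abs(a - b) > threshold:
                changed += 1
    return changed / total if total else 0.0

# Two hypothetical 3x4 grayscale frames: only the two center pixels
# (think "mouth and eyes") change, while the background stays the same.
frame_a = [
    [10, 10, 10, 10],
    [10, 50, 60, 10],
    [10, 10, 10, 10],
]
frame_b = [
    [10, 10, 10, 10],
    [10, 55, 40, 10],
    [10, 10, 10, 10],
]
fraction = changed_fraction(frame_a, frame_b)
print(f"{fraction:.0%} of pixels changed")
```

The lower this fraction, the more of the frame an encoder can describe as "same as before," which is where the large compression ratios for static scenes come from.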

DVD authors can take advantage of scenes with little motion in several ways. First, redundancies make it possible to estimate where an object will move to in the following frame, while preserving all the details remaining constant from the previous frame. "High-compression codecs rely on predicting the future and holding onto the past," says Curtis Palmer, senior vice president and chief technologist of the Media Software Division at Sony Pictures Digital. Such situations cry out for careful management of the encoding process to exploit them effectively. Authors must be familiar with common video stream conditions and characteristics in order to encode flexibly and astutely to coax the highest quality level possible out of the lowest bit rates.

Award-winning DVD author and AlphaDVD managing partner Ralph LaBarge views the compression process as a "tradeoff between quality and quantity." The average bit rate and the amount of compression depend upon the amount and length of content, as well as the output medium used.

The average bit rates of DVDs encoded under the MPEG-2 standard vary based on the type of video. According to LaBarge, "The challenge in DVD authoring is to maximize the quality of video and audio content on a DVD disc while keeping the maximum combined data rate less than 10.08 million bits per second (Mbps). Ideally, the content should all be able to fit onto the DVD" in whatever delivery format is chosen. This could mean DVD-R, DVD+R, or replicated DVD-5, which max out at 4.7GB; or an 8.5GB replicated DVD-9. While dual-layer DVD±R discs promising 8.5GB capacity are expected to become available in 2004, authors working with recordable media must work within those 4.7GB constraints for now.

Authors creating discs for distribution on replicated media and outputting their content to digital linear tape (DLT) for mastering can take full advantage of DVD-9's 8.5GB capacity. For all practical purposes, consumers, home video hobbyists, prosumers, corporate and educational users, commercial videographers, and just about anyone not expecting to distribute their DVD creations in mass quantities will be working with DVD±R, which means 4.7GB is the upper limit for single-disc projects, regardless of how much content they have.

In the absence of user-customized compression, most entry-level DVD tools encode video at three levels, typically designated "High, Medium, and Low" or "Good, Better, and Best." These euphemisms usually correspond to specific fixed video bit rates. Typically, if the user chooses "Low" or "Good," the software will encode the video at a bit rate around 4Mbps. A "Medium" or "Better" bit rate is usually 6Mbps, and a "High" or "Best" bit rate is 8Mbps. The most popular entry-level DVD authoring tool, Sonic's MyDVD, automatically encodes all video at 8Mbps, which limits you to 60-70 minutes per disc, unless you import it in MPEG-2 encoded at a lower bit rate and don't add any effects or transitions (which mandate re-encoding) once you've got it in MyDVD.
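
The link between a fixed bit rate and running time is simple arithmetic, sketched below. The assumptions are loud ones: the disc is treated as exactly 4.7 billion bytes, the default audio rate is a stereo Dolby Digital figure, and menus, subtitles, and multiplexing overhead are ignored, so real discs will hold somewhat less.

```python
DISC_BYTES = 4.7e9  # nominal single-layer capacity (decimal gigabytes)

def minutes_on_disc(video_mbps, audio_mbps=0.192, disc_bytes=DISC_BYTES):
    """Estimate playing time for a constant video + audio bit rate."""
    total_bps = (video_mbps + audio_mbps) * 1_000_000
    seconds = disc_bytes * 8 / total_bps
    return seconds / 60

# Preset names and rates follow the typical figures quoted above.
presets = {"Low": 4.0, "Medium": 6.0, "High": 8.0}
estimates = {name: minutes_on_disc(rate) for name, rate in presets.items()}

for name, minutes in estimates.items():
    print(f"{name}: about {minutes:.0f} minutes")
```

Under these assumptions the 8Mbps preset comes out at about 76 minutes. Swapping in uncompressed PCM audio at 1.5Mbps, as in `minutes_on_disc(8.0, audio_mbps=1.5)`, brings the estimate down to about 66 minutes, which suggests how audio format and overhead can account for a lower practical figure.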

"In general, the tools used to put home movies on recordable DVD discs will use a fixed bit rate for video of 8.5Mbps for a one-hour movie, or 6Mbps for a 90-minute movie," says LaBarge. All other things (like redundancies) being equal, the general rule of thumb is that lower-quality video sources will withstand less compression than higher-grade source material. So most entry-level tools will do their best to push users into the 6-9Mbps range for home video encoding. Hollywood movies, by contrast, have an average bit rate of around 3-5Mbps.

These figures are only estimates and are not set in stone. Some entry-level tools offer custom settings and involve the user more in the process. Pinnacle Studio 9, a consumer-oriented video editor that integrates DVD authoring into the timeline, assumes a somewhat better video background and offers users a range of options, including a "max-out" option that calculates the maximum bit rate possible according to the amount of video and other content in a given DVD project. Helpfully, Studio also calculates a quality percentage, based on the differential between the maximum bit rate for, say, a 97-minute project and a "100%" quality 60-minute DVD.

CBR vs. VBR

There are two modes of DVD encoding: constant bit rate (CBR) and variable bit rate (VBR). When encoding in constant bit rate mode, the level of compression difficulty or motion within the video stream is irrelevant, because the same bit rate is used throughout the entire process. So a static talking-head segment of a given video project will be compressed at the same bit rate as dynamic, high-motion scenes, which does a disservice to both and makes inefficient use of the disc's overall bit budget. Most home videos are encoded in CBR because it is quick and less complicated. (Most entry-level tools don't even include VBR encoders.) The quality of the final product is compromised as a result. According to DVD author Richard Diercks, "No professional video should be encoded in constant bit rate (CBR), even if bandwidth is high. Our experience is that variable bit rate (VBR) always looks better."

In variable bit rate encoding, the bit rate is adjusted according to the compression difficulty of the frame sequence. When an encoder encounters an inactive frame sequence, it will not allocate as many bits to the scene. "This allows the encoder to produce difficult-to-compress sequences like football scenes at higher data rates, say 5.5Mbps, which translates to higher quality video," writes EMedia contributing editor Jan Ozer in his Peachpit Press book, Sonic MyDVD 5 for Windows Visual QuickStart Guide. For serious authors, VBR is a must-have feature.
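
The difference between the two modes can be sketched as a toy allocation problem. Here each scene gets a bit rate proportional to a made-up "complexity" score, scaled so the whole program still averages the same rate a CBR encode would use. The scores, scene lengths, and function name are assumptions for illustration; real encoders judge difficulty in far more sophisticated ways.

```python
def vbr_rates(scenes, average_mbps):
    """Give each scene a rate proportional to its complexity.

    scenes: list of (duration_seconds, complexity) pairs.
    The duration-weighted average of the result equals average_mbps.
    """
    total_time = sum(duration for duration, _ in scenes)
    weighted = sum(duration * complexity for duration, complexity in scenes)
    if total_time <= 0 or weighted <= 0:
        raise ValueError("scenes need positive durations and complexity")
    scale = average_mbps * total_time / weighted
    return [complexity * scale for _, complexity in scenes]

# Hypothetical program: talking head, sports footage, mixed material.
scenes = [(600, 1.0), (300, 2.0), (900, 1.5)]
rates = vbr_rates(scenes, average_mbps=5.0)

# Total size in megabits is the same as 30 minutes of CBR at 5Mbps.
total_megabits = sum(d * r for (d, _), r in zip(scenes, rates))
print([round(r, 2) for r in rates], total_megabits)
```

The disc space used is identical in both cases; VBR simply moves bits away from the easy scene and toward the hard one.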

MPEG-2 video is encoded in two ways: VBR and CBR. Some tools offer only CBR, and some offer both. For example, Sonic's professional authoring tools for Windows, Scenarist 3 and DVD Producer, incorporate both VBR and CBR modes of MPEG-2 encoding. Both Sonic products can be used with Sonic's SD series hardware encoders. Industry-standard for Hollywood and other high-end DVD creation scenarios (along with Sonic's Mac OS Creator), Scenarist 3 and DVD Producer are among the few authoring tools to include a hardware encoder.

Ulead's DVD Workshop 2, Pinnacle's Impression, and Apple's DVD Studio Pro 2 are pro authoring tools that utilize both CBR and VBR software encoding. Most entry-level software solutions such as Sonic MyDVD and Apple iDVD 3 operate strictly in CBR mode. Prosumer-class tools cost more and have steeper learning curves than MyDVD and iDVD for both authoring and encoding functions where greater customization and user control are desired. But the payoff for commercial-level work is evident in the output.

Managing the Bit Budget
Managing a DVD's bit budget means mediating between the intended length of the video and the corresponding level of quality possible within the capacity constraints of the disc. For the high level of bit-budget management possible with VBR encoding, one must determine the bit rate for each segment of the source video before encoding. It is important to figure out ahead of time how much information is going to fit on a disc. The amount of disc space taken up by the video can be calculated by multiplying the data rate by the targeted playing time. DVD author Bruce Nazarian's website provides members (anyone who enters a valid email address) with a Bit-Budget Calculator spreadsheet, which can prove an invaluable tool for allotting DVD capacity as you assemble the jigsaw puzzle of media assets. A similar spreadsheet is found on the DVD-Video/ROM disc tipped into Jim Taylor's definitive book on all things DVD, DVD Demystified.
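
A bare-bones version of the bit-budget arithmetic looks like the sketch below. It is not a reconstruction of either spreadsheet mentioned above; the 5 percent reserve for menus and overhead, the default audio rate, and the function names are assumptions to adjust for a real project.

```python
def video_size_gb(video_mbps, minutes):
    """Disc space used by video: data rate multiplied by playing time."""
    bits = video_mbps * 1e6 * minutes * 60
    return bits / 8 / 1e9

def video_bitrate_mbps(disc_gb, minutes, audio_mbps=0.192, reserve_fraction=0.05):
    """Average video rate that fits once audio and a safety reserve are set aside."""
    usable_bits = disc_gb * 1e9 * 8 * (1 - reserve_fraction)
    total_mbps = usable_bits / (minutes * 60) / 1e6
    return total_mbps - audio_mbps

# How much space does 90 minutes at 6Mbps take?
size = video_size_gb(6.0, 90)

# What average video rate fits a 97-minute project on a 4.7GB disc?
budget = video_bitrate_mbps(4.7, 97)
print(f"{size:.2f}GB, {budget:.2f}Mbps")
```

Running the calculation in both directions is the point: start from the rate to check the fit, or start from the running time to find the rate you can afford.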

Depending on the type of tool used, it is possible to limit the decision-making to the bit rate, as other factors are either rendered as constants or distributed according to presets. "For those new to encoding, Vegas and our other tools provide pre-made templates that divvy up the bits between audio and video based on typical cases," says Sony Pictures' Palmer. "This way, you will only need to decide what the final bit rate needs to be and choose the closest template."

In order to encode with greater efficiency and sophistication, one must understand the factors that determine the quality of audio and video when combined on a disc. One example given by Palmer is the difference in the type of audio being allocated. If the audio is mostly music it will require more bits; if mostly voice, it will require fewer bits. For professional DVD authors working post-post (that is, entering the project after shooting and editing, with a complete set of assets), the audio end of the process becomes much simpler, according to Richard Diercks. "Audio is a fixed bit stream," Diercks says. "There's little you can do about it. You look at your audio requirements as chiseled in granite and allocate video from there. You set your audio and play with the video for quality and fit."

There are different types of audio streams available, each differing in the amount of bandwidth they will occupy. "The only time audio becomes a big issue is when you have multiple audio tracks for the same video," says Diercks. Pulse code modulation (PCM), the uncompressed audio standard used for music CDs, occupies the most space at 1.5Mbps, and is therefore rarely used in DVD authoring. Dolby Digital, on the other hand, is the audio specification most commonly used for encoding audio in DVD-Video. Typical applications of Dolby Digital require 0.192Mbps, or 0.448Mbps for 5.1-surround, which means higher quality in significantly less space than PCM. The MPEG-Audio option offered in many consumer tools is a nice space-saver, but is too low-quality to be considered for professional work.
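
Following Diercks's advice to fix the audio first and allocate video from there, the sketch below subtracts a set of audio tracks from the 10.08Mbps combined ceiling quoted earlier to see what peak rate remains for video. The 9.8Mbps cap on the video stream alone is not from this article; it is the commonly cited DVD-Video limit and is included here as an assumption, as are the track labels.

```python
MAX_COMBINED_MBPS = 10.08  # combined ceiling quoted in the article
MAX_VIDEO_MBPS = 9.8       # assumption: commonly cited cap for video alone

# Audio rates as given in the text above.
AUDIO_MBPS = {
    "pcm_stereo": 1.5,
    "dolby_stereo": 0.192,
    "dolby_5_1": 0.448,
}

def video_headroom_mbps(tracks):
    """Peak video rate left after the chosen audio tracks are accounted for."""
    audio_total = sum(AUDIO_MBPS[name] for name in tracks)
    return min(MAX_VIDEO_MBPS, MAX_COMBINED_MBPS - audio_total)

single_pcm = video_headroom_mbps(["pcm_stereo"])
multi_track = video_headroom_mbps(["dolby_5_1", "dolby_5_1", "dolby_stereo"])
print(round(single_pcm, 3), round(multi_track, 3))
```

With these figures, a single PCM track costs more video headroom than two 5.1 tracks and a stereo track combined, which is consistent with PCM being rare on DVD.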

Software vs. Hardware Encoding
The use of software tools for encoding burdens processors with a heavy workload that they're often too slow or too distracted by tertiary tasks to handle quickly or reliably. Hyperthreaded and multiprocessor systems based on Intel (the 3+GHz Pentium 4 and Xeon) and IBM (Apple's G5) chips are a godsend in this area, since they make it possible to isolate processor-intensive tasks like encoding and rendering and leave the rest of the multitask-oriented PC (or Mac) to do its other work. But few DVD authoring tools are optimized to take advantage of these new processors (and virtually no entry-level solutions), so they're still fighting for the undivided attention of resource-addled machines. Consequently, stability and speed are the most desired characteristics of software tools.

Hardware encoders, which operate virtually or entirely independently of the workstation's processor, are built first and foremost for quality. Professional DVD authors use high-end hardware encoders, which sell for over $20,000 and support multi-pass VBR, segment-based re-encoding, pre-filters, and batch operation modes.

Hardware encoders' efficiency and flexibility are reflected in their high cost. "They don't hiccup, they don't lock up, they remember where they were, and many back up as they move along," says Diercks. And he doesn't mince words about the alternatives: "Software encoders should not be used by any professionals." Sony's Vizaro and Sonic's SD encoder series are examples of hardware tools that are paired with high-end software and used by the top authors in the field.

Sonic's SD Series encoders are PCI cards that plug directly into a PC and lift the entire encoding burden from the host machine's processor. These and other professional-level VBR encoders work in two modes, one-pass and two-pass. In one-pass encoding, frames are scanned and encoded simultaneously: compression takes place as the encoder reads through the file. In two-pass encoding, the encoder makes two trips through the frames for greater encoding and compression precision.

The first pass is for the purposes of file characterization. The encoder pinpoints scenes that are multifaceted and full of detail or action, which are consequently difficult to compress. Then in a second pass, the encoder variably deals out the bit budget funds over the sequential frames to produce the best quality video.
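
The two passes can be sketched in miniature as follows. The first function scores each segment by how much its frames change; the second shares out a fixed budget by score while holding every segment under a peak ceiling. This is a toy model under stated assumptions (equal-length segments, frames as flat lists of pixel values, invented function names and numbers), not a description of how any commercial encoder works.

```python
def first_pass(segments):
    """Pass 1: score each segment by average frame-to-frame pixel change."""
    scores = []
    for frames in segments:
        diffs = [
            sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
            for prev, cur in zip(frames, frames[1:])
        ]
        mean_diff = sum(diffs) / len(diffs) if diffs else 0.0
        scores.append(max(1.0, mean_diff))  # floor so static segments get bits
    return scores

def second_pass(scores, average_mbps, ceiling_mbps):
    """Pass 2: split the budget by score, capping segments at the ceiling."""
    if average_mbps > ceiling_mbps:
        raise ValueError("average rate cannot exceed the ceiling")
    rates = [0.0] * len(scores)
    remaining = average_mbps * len(scores)  # budget for equal-length segments
    open_idx = list(range(len(scores)))
    while open_idx:
        total = sum(scores[i] for i in open_idx)
        capped = [i for i in open_idx
                  if remaining * scores[i] / total > ceiling_mbps]
        if not capped:
            for i in open_idx:
                rates[i] = remaining * scores[i] / total
            break
        for i in capped:  # pin at the ceiling, hand the excess to the others
            rates[i] = ceiling_mbps
            remaining -= ceiling_mbps
        open_idx = [i for i in open_idx if i not in capped]
    return rates

# Three hypothetical segments: static, gentle motion, heavy motion.
segments = [
    [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]],
    [[10, 10, 10, 10], [12, 12, 12, 12], [14, 14, 14, 14]],
    [[0, 0, 0, 0], [20, 20, 20, 20], [0, 0, 0, 0]],
]
scores = first_pass(segments)
rates = second_pass(scores, average_mbps=5.0, ceiling_mbps=9.0)
print(scores, rates)
```

Because the whole file has been characterized before any bits are spent, the second pass can plan for the difficult segment instead of discovering it too late, which a one-pass encoder cannot do.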

Although one-pass requires less time, the flexibility of two-pass encoding usually yields a final product that better preserves the quality of the source video. Two-pass encoding is the standard for professional DVD authoring, and users aspiring to that level of authoring should absorb it into their post-production workflow.

Secrets of Software Encoding
Hardware encoding is simply not an option for many users, no matter how high they intend to climb up the pro DVD ladder, so in choosing between speed and quality, users go through a bargaining process in which they must compromise either their time or the final product. As Sony's Curtis Palmer says, deviation from the original footage is inevitable, as "quality loss begins at the lens and the microphone." It is not possible to achieve both top quality and speed simultaneously during the encoding process. Furthermore, as a video undergoes compression, its level of quality decreases.

Most professional tools have an embedded MPEG-2 encoder so that authors are able to control the compression parameters directly. Professional users concerned with top quality should therefore not encode for speed. "Quality is more important than speed, specifically quality at low average bit rates," says AlphaDVD's LaBarge. One-pass CBR mode generally produces the fastest results but not the highest quality. Most consumer tools, by contrast, do not allow control over most compression parameters and aim for the fastest encoding times at the highest bit rate. But approaching these tools with some foreknowledge of the process, and of how the user can manipulate it, can go a long way toward improving results with tools that offer some encoding flexibility.

Sony Pictures' Media Software offerings, including Movie Studio and Vegas, aim to accommodate the needs of beginners and professionals, respectively. "Encoding of audio and video can be a complex endeavor," Sony's Palmer says. "But that doesn't mean you have to understand the entire process to use our tools. However, as you learn more about the encoding process, our professional tools provide all the options necessary to take full control over each of the file formats and codecs."

Pinnacle Systems' goal in fashioning their video software line, according to William Chien, is to tailor their products to video editors and embed a portion of the compression process within their software. "In an advanced product it is appropriate to have advanced options, but the product should still do the right thing for the professional editor who is not a compression expert," says Chien. In Studio 9, Pinnacle's consumer editing/authoring tool, "technical bits" are taken care of automatically, while more video editing knowledge is required and rewarded in their professional software, Liquid Edition.

One example of an interactive feature in software encoding tools is the choice of encoding quality levels. This is one of the few common user manipulations during the encoding process. Sony's Vegas tool offers the following four quality levels of encoding: Draft, Preview, Good, and Best. Generally speaking, each level negotiates the tradeoff between fidelity, time, and quality. "Draft is fastest with low quality, while Best is slowest with high quality," says Curtis Palmer. "Preview is intended for the best real-time previewing performance while maintaining reasonable quality. Good is the default final render quality and is designed to produce great results relatively quickly for most situations."

Each adjustment for higher quality is made at the expense of time. The higher the quality level chosen, the longer the encoding process will take to complete. There's more at stake here than bit rate; it follows logically that, all other things being equal, it takes the same amount of time to encode 30 minutes of video at 8Mbps as at 6Mbps. What else is going on there?

Let's take a closer look at what each of the options in Vegas means to the video output, as well as the processing differences that correspond specifically to each chosen level. In Sony Vegas, the levels of encoding quality vary according to the following factors: scaling, field handling, field rendering, and frame rate re-sampling. The "Best" level uses bi-cubic integration scaling, while both "Good" and "Preview" undergo bi-linear scaling. The type of scaling used in "Draft" is point sample. The setting-dependent field handling and rendering functions only operate at the "Best" and "Good" quality levels, which enables higher acuity output and accounts for the longer rendering time. The frame rate re-sample function, which is switch-dependent, is also exclusive to these top two levels.
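
To give a feel for why the scaling method affects both quality and render time, the sketch below compares point sampling with linear interpolation on a single row of pixels. It is a one-dimensional simplification written for this article, not Vegas code: bilinear scaling applies the same idea in both dimensions, and bi-cubic scaling draws on a wider neighborhood of pixels at greater computational cost.

```python
def resample_point(row, new_len):
    """Point sampling: copy the nearest source pixel, no blending."""
    scale = len(row) / new_len
    last = len(row) - 1
    return [row[min(last, int((i + 0.5) * scale))] for i in range(new_len)]

def resample_linear(row, new_len):
    """Linear interpolation: blend the two nearest source pixels."""
    if new_len == 1 or len(row) == 1:
        return [row[0]] * new_len
    step = (len(row) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(row) - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out

# Stretch a hard black-to-white edge from 2 pixels to 4.
row = [0, 100]
point = resample_point(row, 4)
linear = resample_linear(row, 4)
print(point, [round(v, 1) for v in linear])
```

Point sampling is the cheapest option and keeps a blocky edge, while interpolation does more arithmetic per pixel and produces a smoother ramp.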

Pinnacle offers the same encoder, called SW, in both its professional and consumer products. It uses the one-pass method of VBR encoding. The SW codec boasts "greater speed and greater flexibility for rendering video to a MPEG-2 format for output of a file for DVD," according to Chien. With this, the just-released Studio 9 features smart rendering, which automatically renders edited sections of video and stitches together a stream. Nonetheless it is possible for users to make the choice between encoding for speed or quality.

"In our consumer products, a subset of the encoding parameters is presented," Chien says. "Usually only bit rate and resolution can be adjusted." A mode called "Fast" or "Draft" is optimized specifically for speed, while a high-quality mode option is also available if users choose to focus on quality. Users are able to choose from three possible encoding levels: "Good, Better and Best." These options differ according to "speed, quality, and file size," according to Chien. The "Good" option uses the MPEG-1 standard at a resolution of 352x240, and produces VCD-compatible video at a bit rate of 1.1Mbps. The "Better" level uses the MPEG-2 standard at a resolution of 480x480, and produces SVCD-compatible video at a bit rate of 2.5Mbps. Finally, the "Best" setting uses the MPEG-2 standard at a resolution of 720x480, and produces DVD-compatible video at a bit rate of 6Mbps.

Speaking Encode
The bottom line is that there are very efficient software solutions available to consumers who are content with remaining DVD hobbyists. However, any user who aspires to advance to the various professional levels of DVD authoring, or any media professional who intends to make their DVD skillset the equal of their expertise in audio or video editing, must have the mind-power to take advantage of the options that professional tools provide.

As Sony's Curtis Palmer says, "If you really want to push the limits of a particular codec, it will require experimentation and experience to get the best results."

Companies Mentioned in this Article
Alpha DVD
Apple Computer, Inc.
Gnome Media
Pinnacle Systems, Inc.
Sonic Solutions
Ulead Systems, Inc.
Sony Pictures