Codec
Codec (enCOder/DECoder): software or hardware that encodes and decodes audio and video data streams.
The purpose of codecs is to reduce the size of digital audio samples and video frames by compressing (encoding) them, in order to speed up transmission and save storage space.
Lossy or Lossless
The goal of all codec designers is to maintain audio and video quality while compressing the binary data further.
Most codecs are LOSSY, in order to get a reasonably small file size. There are LOSSLESS codecs as well, but for most purposes the almost imperceptible increase in quality is not worth the considerable increase in data size. The main exception is if the data will undergo more processing in the future, in which case the repeated lossy encoding would damage the eventual quality too much.
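The difference between the two families can be illustrated with a minimal Python sketch. It assumes the standard-library zlib module as a stand-in lossless coder and a hypothetical lossy_encode function that simply discards the low bits of each 8-bit sample. Neither is a real audio codec, and the sample data is made up for illustration.

```python
import zlib

def lossy_encode(samples: bytes, keep_bits: int = 4) -> bytes:
    """Toy lossy step (illustrative only): zero the low bits of every 8-bit sample."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(s & mask for s in samples)

# A made-up block of 8-bit samples standing in for audio data.
original = bytes((i * 7 + (i % 5)) % 256 for i in range(1024))

# Lossless: decoding returns exactly the original bytes.
restored = zlib.decompress(zlib.compress(original))

# Lossy: the discarded low bits cannot be recovered afterwards.
degraded = lossy_encode(original)
max_error = max(abs(a - b) for a, b in zip(original, degraded))
```

With keep_bits=4 the per-sample error is bounded by the four discarded bits, which is the kind of controlled loss a lossy codec trades for a smaller file.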
Examples of Lossy file formats: AAC (Advanced Audio Coding), MP3, Vorbis (filename extension .OGG), lossy Windows Media Audio (filename extension .WMA)...
Examples of Lossless file formats: Waveform Audio File Format (.WAV), Audio Interchange File Format (.aif), Apple Lossless (filename extension .m4a), FLAC, Monkey's Audio (filename extension .APE), Shorten, TTA, lossless Windows Media Audio (filename extension .WMA), WavPack.
Containers
A container format is a computer file format that can hold various types of data, each compressed with a standardized codec. The container identifies the different data types and interleaves them (the interleave setting determines how often the audio and video streams are "synchronized").
Simpler container formats can contain different types of audio codecs, while more advanced (flexible) container formats can support audio, video, subtitles, chapters, and metadata (tags) - along with the synchronization information needed to play back the various streams together.
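Interleaving can be sketched in a few lines of Python. This is a simplified model rather than the layout of any real container: the Packet type, the interleave function, and the timestamps are all hypothetical, and each stream is assumed to be already sorted by presentation time.

```python
import heapq
from typing import NamedTuple

class Packet(NamedTuple):
    timestamp_ms: int   # presentation time used for synchronization
    stream: str         # e.g. "video" or "audio"
    payload: bytes      # codec-compressed data, opaque to the container

def interleave(*streams):
    """Merge per-stream packet lists (each sorted by time) into one sequence."""
    return list(heapq.merge(*streams, key=lambda p: p.timestamp_ms))

video = [Packet(t, "video", b"v") for t in (0, 40, 80)]      # 25 frames per second
audio = [Packet(t, "audio", b"a") for t in (0, 20, 40, 60)]  # 20 ms audio packets

muxed = interleave(video, audio)
timestamps = [p.timestamp_ms for p in muxed]
```

Because the packets of both streams are stored in time order, a player reading the file sequentially always finds the audio and video it needs for the same moment close together.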
Examples of containers: WAV (RIFF file format) is a simple audio container, while AVI (the standard Microsoft Windows container), Matroska (MKV, a video and audio container format), ASF (standard container for Microsoft WMA and WMV), MOV (standard QuickTime container)... are flexible (more advanced) containers that can hold many types of audio and video, as well as other media.
Specialised Codecs
SPEECH CODECS are designed to deal with the characteristics of voice, while AUDIO CODECS are developed for music. The difference between speech and audio codecs is that speech codecs look for speech patterns in order to compress the data further.
Codecs may also be able to transcode from one digital format to another; for example, from PCM (.wav or .aif) audio to MP3 audio.
Some of the Most Popular Codecs
VIDEO CODECS
- AVI: movies (Microsoft)
- Cinepak: movies (SuperMac Technologies)
- VC-1: SMPTE 421M
- H.261: videoconferencing (ITU)
- H.263: videoconferencing (ITU)
- H.264: videoconferencing (ITU)
- Indeo: movies (Intel)
- MPEG-1: movies (Moving Pictures Experts Group)
- MPEG-2: movies (Moving Pictures Experts Group)
- MPEG-4: movies (Moving Pictures Experts Group)
- RM, RV: movies and streaming (RealNetworks)
- Sorenson: movies (Sorenson Media)
- WMV: movies and streaming (Microsoft)
AUDIO CODECS
- WAVE: Music (Microsoft)
- AAC: music with digital rights (DRM)
- ACELP.live: music (VoiceAge)
- AIFF: music (Macintosh)
- AU: music (Sun)
- MP3: music (Fraunhofer IIS)
- Ogg Vorbis: music (open source standard)
- RA, RAM: music (RealNetworks streaming)
- WMA: music (Microsoft)
SPEECH CODECS
- µ-Law PCM: telephone circuit (U.S.)
- ACELP.net: general speech (VoiceAge)
- ACELP.wide: high quality (VoiceAge)
- A-Law PCM: telephone circuit (Europe)
- AMR-NB: GSM, 3GPP (ETSI narrowband)
- AMR-WB: GSM, 3GPP (ETSI wideband)
- DV Audio: MiniDV, audio
- G.711: audio/videoconferencing (ITU)
- G.722: audio/videoconferencing (ITU)
- G.723.1: VoIP, audio/videoconferencing (ITU)
- G.728: audio/videoconferencing (ITU)
- G.729: audio/videoconferencing (ITU)
- GSM 06.10: GSM, cellphone (unknown)
VC-1 and H.264 represent a logical technological evolution in video compression compared to MPEG-2. Both of these codecs are generally able to achieve superior quality over MPEG-2 at comparable bit rates.
H.264 (MPEG-4 Part 10 or AVC)
H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is a next-generation video compression format. Also known as MPEG-4 AVC, it is currently one of the most commonly used formats for the recording, compression, and distribution of high definition video.
H.264/MPEG-4 AVC is a block-oriented motion-compensation-based codec standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG). It was the product of a partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 - MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content.
Apple has officially adopted H.264 as the format for QuickTime. It is also one of the formats chosen to be supported by both high definition DVD standards, and is destined to be the future standard format for Blu-ray. Even AVCHD, the consumer format for camcorders and Blu-ray recorders offered by the same companies behind Blu-ray, uses H.264 as its main video format.
VC-1
VC-1 is a video coding standard developed by Microsoft. It began as Windows Media Video 9. It is prevalent in ASF files downloaded from the internet. It is also supposed to be used on HD-DVDs.
Most commonly, VC-1 data is found inside Microsoft ASF files, identified with the FourCC 'WMV3' for VC-1 simple and main profile and the FourCC 'WVC1' for advanced profile. Note that the FourCC 'WMV9' may not actually exist in the wild, but the acronym gained prominence anyway because this video codec was introduced as part of the Windows Media 9 tool suite. VC-1 video will probably also be encapsulated in other types of containers and stream formats, such as MPEG for HD-DVD transport.
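As a small illustration, the FourCC-to-profile mapping described above could be expressed as a lookup table. The table contents come from the text; the names VC1_FOURCC_PROFILES and vc1_profile_from_fourcc are hypothetical and not part of any real library.

```python
# Mapping taken from the description above; all names here are illustrative.
VC1_FOURCC_PROFILES = {
    "WMV3": "simple/main",
    "WVC1": "advanced",
}

def vc1_profile_from_fourcc(fourcc: str):
    """Return the VC-1 profile family for a FourCC, or None if it is not listed."""
    return VC1_FOURCC_PROFILES.get(fourcc.upper())
```

A lookup for 'WMV9' returns None, consistent with the note that this FourCC may not exist in practice.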
Overview of VC-1
The VC-1 codec is designed to achieve state-of-the-art compressed video quality at bit rates that may range from very low to very high. The codec can easily handle 1920 pixel × 1080 pixel presentation at 6 to 30 megabits per second (Mbps) for high-definition video. VC-1 is capable of higher resolutions such as 2048 pixels × 1536 pixels for digital cinema, and of a maximum bit rate of 135 Mbps. An example of very low bit rate video would be 160 pixel × 120 pixel presentation at 10 kilobits per second (Kbps) for modem applications.
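To put these bit rates in perspective, the following sketch estimates the uncompressed bit rate of 1920 × 1080 video and the compression ratio implied by the 6 to 30 Mbps range. The frame rate (24 frames per second) and pixel depth (12 bits per pixel, as for 8-bit 4:2:0 video) are assumptions not stated in the text, and the function names are hypothetical.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel=12):
    """Uncompressed bit rate in bits per second (12 bpp assumes 8-bit 4:2:0)."""
    return width * height * bits_per_pixel * fps

def compression_ratio(width, height, fps, coded_bps, bits_per_pixel=12):
    """How many times smaller the coded stream is than the raw stream."""
    return raw_bit_rate(width, height, fps, bits_per_pixel) / coded_bps

hd_raw = raw_bit_rate(1920, 1080, 24)                        # 597_196_800 bits/s
ratio_at_6 = compression_ratio(1920, 1080, 24, 6_000_000)    # roughly 99.5 : 1
ratio_at_30 = compression_ratio(1920, 1080, 24, 30_000_000)  # roughly 19.9 : 1
```

Under these assumptions, high-definition delivery at 6 to 30 Mbps corresponds to reducing the raw data by a factor of roughly 20 to 100.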
The basic functionality of VC-1 involves a block-based motion compensation and spatial transform scheme similar to that used in other video compression standards since MPEG-1 and H.261. However, VC-1 includes a number of innovations and optimizations that make it distinct from the basic compression scheme, resulting in excellent quality and efficiency. VC-1 Advanced Profile is also transport and container independent. This provides even greater flexibility for device manufacturers and content services.
Innovations
VC-1 includes a number of innovations that enable it to produce high quality content. This section provides brief descriptions of some of these features.
Adaptive Block Size Transform
Traditionally, 8 × 8 transforms have been used for image and video coding. However, there is evidence to suggest that 4 × 4 transforms can reduce ringing artifacts at edges and discontinuities. VC-1 is capable of coding an 8 × 8 block using either an 8 × 8 transform, two 8 × 4 transforms, two 4 × 8 transforms, or four 4 × 4 transforms. This feature enables coding that takes advantage of the different transform sizes as needed for optimal image quality.
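The four ways of splitting an 8 × 8 block can be enumerated with a short sketch. The mode labels (written here as width x height) and the partition_block helper are hypothetical; the sketch only shows how the sub-blocks tile the block, not the transform itself or how an encoder chooses between modes.

```python
# Transform sizes as (width, height); the labels are a convention of this sketch.
TRANSFORM_MODES = {
    "8x8": (8, 8),
    "8x4": (8, 4),
    "4x8": (4, 8),
    "4x4": (4, 4),
}

def partition_block(tw, th, size=8):
    """Return (x, y, w, h) rectangles of size tw x th that tile a size x size block."""
    return [(x, y, tw, th)
            for y in range(0, size, th)
            for x in range(0, size, tw)]

partitions = {name: partition_block(w, h) for name, (w, h) in TRANSFORM_MODES.items()}
counts = {name: len(rects) for name, rects in partitions.items()}
```

Each mode yields one, two, two, or four sub-blocks respectively, and in every case the sub-blocks cover all 64 pixels of the block exactly once.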
16-Bit Transforms
In order to minimize the computational complexity of the decoder, VC-1 uses 16-bit transforms. This also has the advantage of easy implementation on the large amount of digital signal processing (DSP) hardware built with 16-bit processors. Among the constraints put on VC-1 transforms is the requirement that the 16-bit values used produce results that can fit in 16 bits. The constraints on transforms ensure that decoding is as efficient as possible on a wide range of devices.
Motion Compensation
Motion compensation is the process of generating a prediction of a video frame by displacing the reference frame. Typically, the prediction is formed for a block (an 8 × 8 pixel tile) or a macroblock (a 16 × 16 pixel tile) of data. The displacement of data due to motion is defined by a motion vector, which captures the shift along both the x- and y-axes.
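A minimal sketch of this idea, for whole-pixel motion vectors only, is shown below. The predict_block function and the test frame are hypothetical, and clamping coordinates at the frame edges is an assumption of the sketch rather than a description of any particular codec.

```python
def predict_block(reference, x, y, mv, size=8):
    """Copy a size x size block from `reference`, displaced by mv = (dx, dy).

    (x, y) is the top-left corner of the block being predicted.
    Coordinates outside the frame are clamped to the nearest edge pixel.
    """
    height, width = len(reference), len(reference[0])
    dx, dy = mv
    block = []
    for row in range(y + dy, y + dy + size):
        r = min(max(row, 0), height - 1)
        block.append([reference[r][min(max(col, 0), width - 1)]
                      for col in range(x + dx, x + dx + size)])
    return block

# A made-up 16 x 16 reference frame with a distinct value at each position.
reference = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]

# Predict the block at (4, 4) from data shifted 2 pixels right and 1 pixel up.
pred = predict_block(reference, x=4, y=4, mv=(2, -1))
```

The encoder then only needs to store the motion vector and the difference between the actual block and this prediction.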
The efficiency of the codec is affected by the size of the predicted block, the granularity of sub-pixel data that can be captured, and the type of filter used for generating sub-pixel predictors. VC-1 uses 16 × 16 blocks for prediction, with the ability to generate mixed frames of 16 × 16 and 8 × 8 blocks. The finest granularity of sub-pixel information supported by VC-1 is 1/4 pixel. Two sets of filters are used by VC-1 for motion compensation. The first is an approximate bicubic filter with four taps. The second is a bilinear filter with two taps.
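The two-tap bilinear case can be sketched in one dimension as a weighted average of the two nearest pixels, with positions expressed in quarter-pixel units. This is a generic illustration: the bilinear_sample function is hypothetical, and the rounding convention is an assumption, since the exact filter arithmetic of VC-1 is defined by the standard and not reproduced here.

```python
def bilinear_sample(row, pos_quarter):
    """Sample a 1-D row of pixels at position pos_quarter / 4 using two taps."""
    i, frac = divmod(pos_quarter, 4)           # integer pixel and quarter-pixel offset
    left = row[i]
    right = row[min(i + 1, len(row) - 1)]      # repeat the last pixel at the edge
    # Weights are (4 - frac)/4 and frac/4; adding 2 rounds to the nearest integer.
    return ((4 - frac) * left + frac * right + 2) // 4

row = [10, 20, 30]
samples = [bilinear_sample(row, p) for p in range(0, 5)]
```

Positions 0 and 4 fall on whole pixels and return them unchanged, while positions 1 to 3 return values in between.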
VC-1 combines the motion vector settings defined by the block size, sub-pixel granularity, and filter type into modes. The result is four motion compensation modes that suit a range of different situations. This classification of settings into modes also helps keep decoder implementations compact.
Loop Filtering
VC-1 uses an in-loop deblocking filter that attempts to remove block-boundary discontinuities introduced by quantization errors in interpolated frames. These discontinuities can cause visible artifacts in the decompressed video frames and can impact the quality of the frame as a predictor for future interpolated frames.
The loop filter takes into account the adaptive block size transforms. The filter is also optimized to reduce the number of operations required.
Interlace Coding
Interlaced video content is widely used in television broadcasting. When encoding interlaced content, the VC-1 codec can take advantage of the characteristics of interlaced frames to improve compression. This is achieved by using data from both fields for motion-compensated prediction in interpolated frames.
Advanced B Frame Coding
A bi-directional or B frame is a frame that is interpolated from data both in previous and subsequent frames. B frames are distinct from I frames (also called key frames), which are encoded without reference to other frames. B frames are also distinct from P frames, which are interpolated from previous frames only. VC-1 includes several optimizations that make B frames more efficient.
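The benefit of predicting from both directions can be seen in a toy example. The sketch below assumes equal weighting of the two references and no motion, which are simplifications; the function names are hypothetical and this is not the VC-1 B frame algorithm.

```python
def predict_p(prev_block):
    """P-style prediction: reuse the co-located block from the previous frame."""
    return [row[:] for row in prev_block]

def predict_b(prev_block, next_block):
    """B-style prediction: average co-located pixels of previous and next frames."""
    return [[(p + n + 1) // 2 for p, n in zip(prow, nrow)]
            for prow, nrow in zip(prev_block, next_block)]

def residual(actual, predicted):
    """Difference that would still have to be encoded after prediction."""
    return [[a - p for a, p in zip(arow, prow)]
            for arow, prow in zip(actual, predicted)]

# A region that brightens steadily across three frames (made-up values).
prev_block = [[100, 100], [100, 100]]
actual     = [[110, 110], [110, 110]]
next_block = [[120, 120], [120, 120]]

res_p = residual(actual, predict_p(prev_block))
res_b = residual(actual, predict_b(prev_block, next_block))
```

In this example the bidirectional prediction matches the middle frame exactly, so its residual is zero, while prediction from the previous frame alone leaves a residual of 10 per pixel.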
Fading Compensation
Due to the nature of compression that uses motion compensation, encoding of video frames that contain fades to or from black is very inefficient. With a uniform fade, every macroblock needs adjustments to luminance. VC-1 includes fading compensation, which detects fades and uses alternate methods to adjust luminance. This feature improves compression efficiency for sequences with fading and other global illumination changes.
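One common way to model a fade is as a linear change in luminance, current ≈ scale × reference + shift, applied to the whole frame. The sketch below estimates those two parameters by least squares. It is a generic illustration under that assumption; the text does not specify how VC-1 detects fades or signals the adjustment, and the function names are hypothetical.

```python
def estimate_fade(reference, current):
    """Least-squares fit of current ~= scale * reference + shift over luma samples."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_c = sum(current) / n
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_r == 0:                      # flat reference: only a shift can be estimated
        return 1.0, mean_c - mean_r
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, current))
    scale = cov / var_r
    return scale, mean_c - scale * mean_r

def apply_fade(reference, scale, shift):
    """Adjust the reference luminance before it is used for prediction."""
    return [scale * r + shift for r in reference]

# Made-up luma samples: the current frame is the reference faded to half brightness.
reference = [16, 64, 128, 200]
current = [16, 40, 72, 108]

scale, shift = estimate_fade(reference, current)
compensated = apply_fade(reference, scale, shift)
```

Once the reference has been adjusted with two frame-level parameters, the prediction matches the faded frame and individual macroblocks no longer need their own luminance corrections.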
Differential Quantization
Differential quantization, or dquant, is an encoding method in which multiple quantization steps are used within a single frame. Rather than quantize the entire frame with a single quantization level, macroblocks are identified within the frame that might benefit from lower quantization levels and a greater number of preserved AC coefficients. Such macroblocks are then encoded at lower quantization levels than the one used for the remaining macroblocks in the frame. The simplest and typically most efficient form of differential quantization involves only two quantizer levels (bi-level dquant), but VC-1 supports multiple levels, too.
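Bi-level dquant can be sketched as choosing between two quantizer step sizes per macroblock. The step sizes, the coefficient values, and the set of "important" macroblocks below are all made up for illustration, and the function names are hypothetical; how an encoder actually selects the macroblocks is not covered here.

```python
def quantize(coeffs, step):
    """Divide each coefficient by the step size, truncating toward zero."""
    return [(abs(c) // step) * (1 if c >= 0 else -1) for c in coeffs]

def bilevel_dquant(macroblocks, important, frame_step=16, fine_step=4):
    """Quantize macroblocks listed in `important` with the finer step."""
    result = []
    for index, coeffs in enumerate(macroblocks):
        step = fine_step if index in important else frame_step
        result.append((step, quantize(coeffs, step)))
    return result

# Two macroblocks with identical made-up coefficients; only the second is "important".
macroblocks = [[120, -37, 9, 3], [120, -37, 9, 3]]
coded = bilevel_dquant(macroblocks, important={1})
nonzero = [sum(1 for level in levels if level != 0) for _, levels in coded]
```

The macroblock coded with the finer step keeps three non-zero coefficients instead of two, at the cost of larger values to encode.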
Profiles and Levels
VC-1 contains a number of profile and level combinations that support the encoding of many types of video. The profile determines the codec features that are available, and thereby determines the required decoder complexity (mathematical intensity). The following table lists VC-1 profiles and levels.