Most image and video compressors exploit the statistical (and also perceptual1) correlation between the \(\text {RGB}\) color components2 of the pixels, using a color transform.
Color transforms are pixel-wise operators. As a result, each pixel is represented by (usually) three new coefficients3 that express the same4 information, but in a different color domain.
Most color transforms are designed to split the color information of a pixel into luminance (luma) and chrominance (chroma). The luma is basically the low-frequency5 information of the pixel, and the chroma (logically) the high-frequency information.
For example, in JPEG and H.264/AVC the color information of each pixel is transformed from the \(\text {RGB}\) color space to the \(\text {YCrCb}\) color space, and in JPEG XR, the \(\text {YCoCg}\) color space is used. In these luma-based color spaces, \(\text {Y}\) represents the luma (coefficient) of the pixel. The other two coefficients form the chroma. Note that the chrominance of a pixel is determined by two chromas.
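To make the luma/chroma split concrete, the following sketch computes the luma and the two chromas of a single pixel using the full-range \(\text {YCrCb}\) equations popularized by JPEG/JFIF; the exact constants and offsets vary between standards, and the function name RGB_to_YCrCb is invented for this example.

```python
import numpy as np

def RGB_to_YCrCb(pixel):
    """Full-range YCrCb (JFIF-style) transform of one (R, G, B) pixel.
    Y carries the luma; Cr and Cb carry the two chromas."""
    R, G, B = np.asarray(pixel, dtype=float)
    Y  =  0.299    * R + 0.587    * G + 0.114    * B
    Cr =  0.5      * R - 0.418688 * G - 0.081312 * B + 128
    Cb = -0.168736 * R - 0.331264 * G + 0.5      * B + 128
    return np.array([Y, Cr, Cb])

print(RGB_to_YCrCb((255, 0, 0)))      # a pure red pixel: high Cr, Cb below 128
print(RGB_to_YCrCb((128, 128, 128)))  # a gray pixel: both chromas stay at 128
```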
Apart from the terms component and coefficient, we will use the word channel to refer to the components with the same index taken from all the pixels of an image (or video), and subband to denote the coefficients with the same index generated after transforming all the pixels of an image (or video). For example, the \(\mathbf {R}\) channel of a color image corresponds to the monochromatic image6 generated by the \(\text {R}\) component of all the pixels of the image, and the \(\mathbf {Y}\) subband of a transformed (\(\text {RGB}\)) image corresponds to the \(\text {Y}\) coefficients of all the pixels (also a “monochromatic” image).
In image and video coding, most color transforms map 3 channels (\(\text {RGB}\)) into 3 subbands.
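As a sketch of this terminology (using the floating-point \(\text {YCoCg}\) equations and a random image as stand-in data; the helper name from_RGB is not taken from the notebooks), the code below maps the 3 channels of an image into 3 subbands and then extracts the \(\mathbf {R}\) channel and the \(\mathbf {Y}\) subband, both of them monochromatic images.

```python
import numpy as np

def from_RGB(img):
    """Pixel-wise YCoCg analysis transform: 3 channels in, 3 subbands out."""
    R, G, B = [img[..., i].astype(float) for i in range(3)]
    Y  =  R / 4 + G / 2 + B / 4   # luma
    Co =  R / 2         - B / 2   # first chroma
    Cg = -R / 4 + G / 2 - B / 4   # second chroma
    return np.stack([Y, Co, Cg], axis=-1)

img = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)  # stand-in image
R_channel = img[..., 0]            # the R channel: a monochromatic image
Y_subband = from_RGB(img)[..., 0]  # the Y subband: also a "monochromatic" image
print(R_channel.shape, Y_subband.shape)  # (480, 640) (480, 640)
```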
Color transforms applied to natural visual information generally have two key advantages:
If the color transform is orthogonal or biorthogonal, that is, if the luma and the chromas are independent, the quantization noise generated in the subbands is additive [1]. Therefore, from a pure RD point of view, the quantization step size of each subband should be selected so that the same RD slope is reached in all the subbands (see the notebook Scalar Quantization of RGB images). Notice that this implies computing the RD curves.
Thus, taking a generic luma/chroma transform \(\text {YUV}\), we would expect that \begin {equation} \lambda ^{\text {Y}} \approx \lambda ^{\text {U}} \approx \lambda ^{\text {V}} \label {eq:optimal_lambda} \end {equation} for a given quantization step size \(\Delta \), and therefore the RDO [2] can be ignored (the same \(\Delta \) can be used in all the subbands). In the notebook Scalar Quantization of RGB images we can explore (at least visually) the degree of compliance with Eq. \eqref{eq:optimal_lambda}.
See the notebooks Removing RGB redundancy with the DCT, Removing RGB redundancy with the \(\text {YCoCg}\) transform, and Removing RGB redundancy with the \(\text {YCrCb}\) transform.
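The following sketch outlines how these RD curves and slopes could be estimated: each subband is quantized with several step sizes, the rate is approximated by a zero-order entropy estimate, the distortion by the MSE, and \(\lambda \) by the slope between consecutive RD points. The entropy-based rate, the synthetic subbands, and all function names are simplifying assumptions of this example, not the procedure followed in the notebook.

```python
import numpy as np

def quantize(x, step):
    """Uniform (mid-tread) scalar quantization followed by reconstruction."""
    return np.round(x / step) * step

def rate(x):
    """Crude rate estimate (bits/sample): zero-order entropy of the samples."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def RD_curve(subband, steps):
    """One (rate, distortion) point per quantization step size."""
    points = []
    for step in steps:
        y = quantize(subband, step)
        points.append((rate(y), np.mean((subband - y) ** 2)))
    return points

def slopes(points):
    """lambda ~ -dD/dR between consecutive RD points."""
    return [(D0 - D1) / (R1 - R0) for (R0, D0), (R1, D1) in zip(points, points[1:])]

# Stand-in subbands (in practice, the Y, U and V subbands of a real image).
subbands = {"Y": np.random.normal(128, 40, 10000),
            "U": np.random.normal(0, 10, 10000),
            "V": np.random.normal(0, 10, 10000)}
steps = [1, 2, 4, 8, 16, 32]
for name, subband in subbands.items():
    print(name, [round(s, 1) for s in slopes(RD_curve(subband, steps))])
```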
SQ (Scalar Quantization) [3, 7] would be an optimal solution only if the image colors were uniformly distributed within the RGB cube. However, the typical color distribution in natural images is anything but uniform: some regions of the color space are densely populated, while many representable colors are completely absent. In this case, depending on the quantization step size [4], SQ can be suboptimal because the colors that are actually used may not be sampled with sufficient density, while at the same time the encoding system spends representation levels on colors that do not appear in the image at all [1].
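A small sketch of this argument (the clustered synthetic image and the plain uniform quantizer are assumptions of this example): with a step size \(\Delta \), SQ partitions the RGB cube into \((256/\Delta )^3\) cells, while an image with a natural-like (non-uniform) color distribution occupies only a small fraction of them.

```python
import numpy as np

def SQ(img, step):
    """Uniform scalar quantization of each RGB component, mid-point reconstruction."""
    return (img.astype(int) // step) * step + step // 2

# Stand-in for a natural image: colors clustered around a few dominant tones.
rng = np.random.default_rng(0)
centers = np.array([[30, 60, 20], [200, 180, 150], [90, 110, 200]])
pixels = centers[rng.integers(0, 3, 65536)] + rng.normal(0, 10, (65536, 3))
img = np.clip(pixels, 0, 255).astype(np.uint8).reshape(256, 256, 3)

step = 16
y = SQ(img, step)
cells_available = (256 // step) ** 3                    # cells SQ can represent
cells_used = len(np.unique(y.reshape(-1, 3), axis=0))   # cells actually occupied
print(cells_available, cells_used)  # e.g. 4096 available, only a small fraction used
```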
On the other hand, VQ (Vector Quantization) [6, 7] applied to the color domain does not treat the individual \(\text {RGB}\) components separately, as scalar quantization does; instead, each color vector \({\mathbf C}_i = (\text {R}_i, \text {G}_i, \text {B}_i)\) used in the image is treated as a minimal structure. VQ determines a code-book of \(K\) code-vectors (centroids) that minimizes the distortion between the original image and the reconstructed one. Notice that the code-book must be known by the decoder to find a reconstruction.
See the notebooks Vector Quantization (in the color domain) of a RGB image and Vector Quantization (in the 2D domain) of a color (RGB) image.
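A minimal sketch of color-domain VQ (using scikit-learn's KMeans to build the code-book, which is this example's choice rather than the tool used in the notebooks): every \((\text {R}, \text {G}, \text {B})\) vector is replaced by the nearest of \(K\) code-vectors, and the code-book itself must be transmitted to the decoder.

```python
import numpy as np
from sklearn.cluster import KMeans

def VQ_color(img, K):
    """Vector quantization in the color domain: each (R, G, B) vector is
    replaced by the nearest of K code-vectors (centroids)."""
    pixels = img.reshape(-1, 3).astype(float)
    kmeans = KMeans(n_clusters=K, n_init=4, random_state=0).fit(pixels)
    codebook = kmeans.cluster_centers_    # must also be known by the decoder
    indexes = kmeans.predict(pixels)      # one code-vector index per pixel
    reconstruction = codebook[indexes].reshape(img.shape).astype(np.uint8)
    return reconstruction, codebook

img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
reconstruction, codebook = VQ_color(img, K=16)
print(codebook.shape)  # (16, 3)
```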
[1] W. Burger and M.J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java. Springer, 2016.
[2] V. González-Ruiz. Information Theory.
[3] V. González-Ruiz. Scalar Quantization.
[4] V. González-Ruiz. Signal Quantization.
[5] V. González-Ruiz. Transform Coding.
[6] V. González-Ruiz. Vector Quantization.
[7] K. Sayood. Introduction to Data Compression (Slides). Morgan Kaufmann, 2017.
1This will be explained later in this course.
2A component of a pixel in the \(\text {RGB}\) domain refers to one of the \(\text {R}\) (red), \(\text {G}\) (green) or \(\text {B}\) (blue) coordinates in the 3D \(\text {RGB}\) color space.
3Most transforms, including the color ones, analyze the signal from a frequency perspective, generating the so-called coefficients, whose index in the transform domain is related to a different frequency of the signal.
4In general, the color transforms can be considered lossless, although this is only true if fixed-point arithmetic is used.
5It is worth understanding that the frequency concept in the color transform domain is not related to the frequency concept in the original pixel domain. For example, the \(\text {R}\) component of a pixel represents the amount of red in the pixel, and in the visible spectrum we are referring to frequencies that are lower than those represented by the \(\text {G}\) and \(\text {B}\) components. However, in a color-transformed domain, the luma measures the brightness level of the pixel, and we cannot find a band of frequencies in the visible spectrum that represents such information, because we are using a different representation domain.
6That can be considered as a single-channel/mono-component/scalar image.
7In general, the information provided by the signals.
8For the same bit-rate.
9Notice again that we will study this effect in a later session.