Perceptual Coding

Vicente González Ruiz - Depto Informática - UAL

November 3, 2024

Contents

 1 What is perceptual coding?
 2 ToND varies with the luma intensity
 3 ToND varies with the spatial frequency
 4 Visual masking of the quantization noise
 5 Loop filters
 6 Luma redundancy
 7 Chroma redundancy
 8 Resources
 9 To-Do
 10 References

1 What is perceptual coding?

So far, we have focused on minimizing the Lagrangian [6] \begin {equation} J = R + \lambda D, \label {eq:RD} \end {equation} where \(R\) is the data rate, and \(D\) is an additive1 distortion metric, such as the RMSE, the PSNR, or the SSIM index [7]. However, the way in which human beings perceive distortion generally differs from what these metrics express. This chapter introduces some of the most common ways of exploiting the properties of human visual perception during encoding.

Notice that if, according to the requirements of the encoding process, \(D\) is below a Threshold of Noticeable Distortion (ToND), the RDO process described by Eq. \eqref{eq:RD} boils down to selecting the option with the smallest \(R\).
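This selection rule can be sketched as follows (the function and the option format are illustrative, not taken from any codec API):

```python
def select_option(options, tond, lam):
    """Each option is a (rate, distortion) pair.

    If any option's distortion is below the ToND, pick the one with the
    smallest rate among those; otherwise fall back to minimizing the
    Lagrangian J = R + lambda*D."""
    below = [o for o in options if o[1] < tond]
    if below:
        return min(below, key=lambda o: o[0])              # smallest R
    return min(options, key=lambda o: o[0] + lam * o[1])   # minimum J

# Three hypothetical coding options: (rate, distortion).
options = [(100, 0.5), (80, 1.2), (60, 3.0)]
print(select_option(options, tond=2.0, lam=10.0))  # picks (80, 1.2)
```

With a ToND of 2.0, the first two options are perceptually transparent, so the cheaper one (rate 80) wins regardless of its larger distortion.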

2 ToND varies with the luma intensity

The Weber-Fechner law states that the minimum perceivable visual stimulus difference increases with background luminance2 [5], up to a point beyond which it decreases. Therefore, the distortion generated by the lossy coding of an image is perceived less in areas with very high and very low intensity values. For this reason, one of the most widely used quantizers is the deadzone quantizer, which also, in general, replaces signal noise (for example, electronic noise) with quantization noise where the SNR of the signal is smallest (around 0).
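A minimal sketch of a deadzone quantizer (the step size and the mid-bin reconstruction rule are illustrative choices):

```python
import numpy as np

def deadzone_quantize(x, step):
    """Deadzone quantizer: the zero bin is twice as wide as the others,
    so small values (where the SNR is lowest) are mapped to 0."""
    return np.sign(x) * (np.abs(x) // step)

def deadzone_dequantize(q, step):
    # Reconstruct at the center of each nonzero bin; the zero bin
    # reconstructs to 0.
    return np.sign(q) * (np.abs(q) + 0.5) * step * (q != 0)

x = np.array([-7.0, -2.0, 0.5, 3.0, 9.0])
q = deadzone_quantize(x, step=4.0)
y = deadzone_dequantize(q, step=4.0)
```

Note that the inputs -2.0, 0.5 and 3.0 all fall inside the deadzone \([-4, 4)\) and are reconstructed as 0, removing low-amplitude noise.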

3 ToND varies with the spatial frequency

The HVS can be modeled as a low-pass filter whose cutoff frequency depends on the spatial frequency content of the image which, in turn, depends on the distance between the observer and the display.
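A commonly used analytic model of this low-pass behavior is the Mannos-Sakrison contrast sensitivity function (the coefficients below are the original published fit; other fits exist): \begin {equation} A(f) = 2.6\,(0.0192 + 0.114 f)\,e^{-(0.114 f)^{1.1}}, \end {equation} where \(f\) is the spatial frequency in cycles per degree of visual angle. Sensitivity peaks at around 8 cycles/degree and decays rapidly beyond it, which is why high-frequency coefficients can be quantized more coarsely.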

Some DCT-based image and video coding standards, such as JPEG and H.264, define quantization matrices designed for perceptual coding [2]. These matrices specify a different quantization step size for each \(8\times 8\)-DCT coefficient; their values were found through subjective studies of the impact of quantizing each coefficient on the ToND. In the case of H.264, such matrices can change between images [5].
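As a concrete example, the well-known luminance quantization table suggested in Annex K of the JPEG standard can be applied to an \(8\times 8\) block of DCT coefficients (the quantize_block helper and its quality_scale parameter are illustrative):

```python
import numpy as np

# Luminance quantization table suggested in Annex K of the JPEG
# standard: larger steps at higher spatial frequencies (bottom-right).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

def quantize_block(dct_block, quality_scale=1.0):
    """Perceptually quantize an 8x8 block of DCT coefficients: each
    coefficient uses its own step size from the table."""
    return np.round(dct_block / (Q_LUMA * quality_scale)).astype(int)
```

A coefficient of equal magnitude is represented much more coarsely at high frequencies (step 99-121) than near the DC term (step 10-16), matching the ToND study behind the table.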

In the case of JPEG 2000, each subband uses a different quantization step size [4]. However, note that these values depend on the selected DWT filter.

4 Visual masking of the quantization noise

Quantization noise is generated by the quantizer and produces different coding artifacts3, which are hardly perceived when the (area of the) encoded image is textured [8]. This masking effect can be exploited up to the point at which the ToND is reached.

Another important aspect of our perception is its directionality, which leads the HVS to be more sensitive to distortions added to horizontal and vertical frequencies than to diagonal frequencies [5].

Finally, the rationale behind temporal masking is that the HVS sensitivity to coding artifacts is lower in areas with very high motion activity.

In video, modeling temporal masking is more challenging because the spatio-temporal sensitivity function of the HVS is not separable, i.e., it depends on both the spatial4 and temporal frequencies [5]. However, sources of distortion such as mosquito noise are hardly perceived in video because this type of noise is temporally uncorrelated.

5 Loop filters

Loop filters are used in motion-compensated video codecs to improve visual quality (and also the encoding RD performance). For example, H.264/AVC uses (usually directional5) deblocking filters in the encoding loop to smooth the transitions between blocks when the boundaries between them become perceptible. Loop filters significantly improve the perceived quality of the video in “flat” areas, where blocking is more easily appreciated.
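A toy, non-normative sketch of the idea (the real H.264/AVC filter adapts its strength to the boundary strength and the local activity; all names and parameters here are illustrative):

```python
import numpy as np

def deblock_rows(img, block=8, strength=0.25):
    """Toy deblocking: low-pass the pair of pixels straddling every
    vertical block boundary by cross-blending them."""
    out = img.astype(float).copy()
    for b in range(block, img.shape[1], block):
        left, right = out[:, b - 1].copy(), out[:, b].copy()
        out[:, b - 1] = (1 - strength) * left + strength * right
        out[:, b] = (1 - strength) * right + strength * left
    return out

# A flat image with a sharp 0 -> 8 jump exactly at a block boundary.
img = np.zeros((1, 16))
img[:, 8:] = 8.0
smoothed = deblock_rows(img)
```

After filtering, the step at the boundary becomes 0, 2, 6, 8 instead of 0, 0, 8, 8, which is far less visible in flat areas.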

6 Luma redundancy

The HVS can perceive only a finite number of different intensities (luma). This number depends on the dynamic range of the pixels, but, in general, we are unable to distinguish more than 64 intensity values [3].
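A minimal sketch of exploiting this redundancy by requantizing an 8-bit luma plane to 64 uniformly spaced levels (the reconstruction at bin centers is an illustrative choice):

```python
import numpy as np

def requantize_luma(y, levels=64):
    """Map 8-bit luma samples to `levels` uniformly spaced
    intensities, reconstructing at the center of each bin."""
    step = 256 // levels  # 4 for 64 levels
    return (y // step) * step + step // 2

y = np.arange(256, dtype=np.uint8)     # all 256 possible intensities
coarse = requantize_luma(y)            # only 64 distinct values remain
```

The quantized plane needs only 6 bits per sample instead of 8, and, for typical 8-bit content, the difference is hardly noticeable.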

7 Chroma redundancy

Humans do not perceive (spatial) detail in chrominance as well as in luminance [1]. For this reason, the chroma can be downsampled to 1/4 of the original sampling rate without noticeable distortion. This feature is exploited by most lossy image and video encoding algorithms.
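A minimal sketch of 4:2:0-style chroma subsampling by \(2\times 2\) averaging (averaging is one common choice; real codecs may use other downsampling filters):

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane, keeping 1/4 of the
    original samples (as in 4:2:0 sampling)."""
    h, w = chroma.shape
    c = chroma[:h - h % 2, :w - w % 2].astype(float)  # even-size crop
    return (c[0::2, 0::2] + c[0::2, 1::2] +
            c[1::2, 0::2] + c[1::2, 1::2]) / 4

cb = np.arange(16).reshape(4, 4)  # a toy 4x4 chroma plane
small = subsample_420(cb)         # 2x2 result, 1/4 of the samples
```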

8 Resources

  1. Spectral (color) redundancy.

9 To-Do

  1. Modify the VCF compression pipeline to take advantage of the chroma redundancy. Use different quantization step sizes for each color subband. Complexity 2.
  2. The image codec 2D-DCT.py can use quantization matrices which increase the compression ratio without increasing the perceived distortion, but these matrices are only defined for blocks of \(8\times 8\). Using an image resizing technique, use such matrices for applying perceptual coding for blocks of any size in \(\{2\times 2, 4\times 4, \cdots , 2^n\times 2^n\}\). Complexity 4.
  3. In the case of the 2D-DWT, we can exploit the lower sensitivity of the HVS to diagonal frequencies. This means that we can increase the quantization step size of the HH subbands (compared to the others) without noticeably increasing the perceived distortion. Complexity 4.
  4. The local entropy of the motion vectors can be a good estimation of the motion complexity in a video sequence. In a new motion compensated video codec, adapt the quantization step size to the local entropy, trying to increase the compression ratios without increasing the perceived distortion. Complexity 10.

10 References

[1]   W. Burger and M.J. Burge. Digital Image Processing: An Algorithmic Introduction Using Java. Springer, 2016.

[2]   FERDA Ernawan and SITI HADIATI Nugraini. The optimal quantization matrices for JPEG image compression from psychovisual threshold. Journal of Theoretical and Applied Information Technology, 70(3):566–572, 2014.

[3]   V. González-Ruiz. Visual Redundancy.

[4]   Feng Liu, Eze Ahanonu, Michael W Marcellin, Yuzhang Lin, Amit Ashok, and Ali Bilgin. Visibility of quantization errors in reversible JPEG2000. Signal Processing: Image Communication, 84:115812, 2020.

[5]   M. Naccari and M. Mrak. Perceptually optimized video compression. In Academic Press Library in Signal Processing, volume 5, pages 155–196. Elsevier, 2014.

[6]   G.J. Sullivan and T. Wiegand. Rate-distortion optimization for video compression. IEEE signal processing magazine, 15(6):74–90, 1998.

[7]   Z. Wang, A.C. Bovik, H.R Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.

[8]   H.R. Wu and K.R. Rao. Digital video image quality and perceptual coding. CRC press, 2017.

1The total distortion of two (or more) sources of distortion is the sum of the distortions of these two (or more) sources.

2In general, this is not true for the chroma.

3“Random” noise, blocking, ringing, etc.

4Which in turn depends on the distance between the user and the display.

5Anisotropic.