Temporal Transforms

Vicente González Ruiz - Depto Informática - UAL

November 3, 2024

Contents

 1 Temporal correlation
 2 Motion Compensation (MC)
 3 Motion Estimation (ME)
 4 GOF-ing
 5 Block-based MC and RDO
 6 Frame types
 7 Resources
 8 To-Do
 9 References

1 Temporal correlation

In general, neighboring frames (or images) in (video) sequences exhibit a high degree of temporal correlation that can be exploited to significantly improve the RD curves. This correlation generates a temporal redundancy that can be removed using a (temporal) transform.

A temporal transform takes two or more frames¹ as input and outputs at least one residual (frame) whose pixels have a higher dynamic range but, in general, also a lower entropy (see the spatial transform theory).
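As a minimal illustration, assuming two synthetic 8-bit frames related only by small noise (no motion at all), the simplest temporal transform is the plain difference between neighbor frames. The helper `entropy` is a hypothetical name that estimates the first-order entropy from the histogram, so both properties mentioned above can be checked: the residual needs a wider, signed range, yet its entropy is much lower.

```python
import numpy as np

def entropy(x: np.ndarray) -> float:
    """Empirical first-order entropy, in bits per sample."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)

# Two synthetic 8-bit frames (assumption): frame1 is frame0 plus small noise,
# which mimics a highly correlated pair of neighbor frames without motion.
frame0 = rng.integers(0, 256, size=(64, 64)).astype(np.int16)
noise = rng.integers(-2, 3, size=(64, 64))
frame1 = np.clip(frame0 + noise, 0, 255).astype(np.int16)

# Simplest temporal transform: subtract the previous frame.
# A signed type is needed because the residual can take any value in
# [-255, 255], one more bit of dynamic range than the 8-bit input [0, 255].
residual = frame1 - frame0

# The transform is invertible: frame0 + residual recovers frame1.
print(f"H(frame1)   = {entropy(frame1):.2f} bits/pixel")
print(f"H(residual) = {entropy(residual):.2f} bits/pixel")
```

Real sequences contain motion, so a plain difference is usually far less effective than in this idealized case; that is what MC addresses.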

2 Motion Compensation (MC)

Most video coding standards use Motion Compensation (MC) to generate the residual frames [2]. MC exploits the temporal correlation and reduces the entropy of the residuals². Basically, MC consists of subtracting from each original frame a prediction (frame) built with information that must also be available³ at the decoder. Notice that, after MC, the number of residual pixels is equal to the number of pixels in the compensated frame.⁴
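A minimal sketch of block-based MC, assuming the motion vectors are already known (one vector per 8x8 block, an illustrative layout) and that no quantization is applied. `predict` is a hypothetical helper, not the API of any codec. Because the decoder can rebuild exactly the same prediction from data it already has, adding that prediction to the residual recovers the original frame, and the residual has as many pixels as the compensated frame.

```python
import numpy as np

def predict(reference: np.ndarray, mvs: np.ndarray, block: int = 8) -> np.ndarray:
    """Build a prediction frame by copying, for every block, the reference
    block displaced by that block's motion vector (dy, dx).
    Displaced positions are clamped so they stay inside the reference."""
    h, w = reference.shape
    assert h % block == 0 and w % block == 0, "frame size must be a multiple of block"
    pred = np.empty_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = mvs[by // block, bx // block]
            y = int(np.clip(by + dy, 0, h - block))
            x = int(np.clip(bx + dx, 0, w - block))
            pred[by:by + block, bx:bx + block] = reference[y:y + block, x:x + block]
    return pred

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.int16)
# Global motion: the content moves 2 rows down and 3 columns right.
current = np.roll(reference, shift=(2, 3), axis=(0, 1))

# One motion vector per 8x8 block; here every block points 2 up and 3 left.
mvs = np.zeros((8, 8, 2), dtype=int)
mvs[...] = (-2, -3)

# Encoder: residual = original frame - prediction.
residual = current - predict(reference, mvs)
# Decoder: it builds the same prediction and adds it to the residual.
reconstructed = residual + predict(reference, mvs)

print("fraction of zero residuals:", np.mean(residual == 0))
```

Only the blocks on the top and left borders keep nonzero residuals here, because their displaced positions fall outside the reference and are clamped.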

3 Motion Estimation (ME)

To compensate for the motion, we first need to estimate⁵ it using Motion Estimation (ME) techniques [3]. Using the motion fields generated by the motion estimator, both the encoder and the decoder generate the predictions, which the encoder subtracts from the predicted images and the decoder adds to the decoded residuals [2]. However, notice that in most video coding standards ME is performed only by the encoder because it is a costly operation, and for this reason the motion vector fields must be transmitted to the decoder. This follows the idea of “compress once, decompress many”.
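A minimal full-search block-matching sketch, assuming 8x8 blocks, a ±4 pixel search range and the SAD (sum of absolute differences) as the matching criterion; all of these are illustrative choices, and `full_search_me` is a hypothetical name. Only the encoder runs this search; the decoder just receives `mvs` and applies the compensation.

```python
import numpy as np

def full_search_me(current, reference, block=8, search=4):
    """Full-search block matching: for each block of `current`, find the
    displacement (dy, dx) within +-search that minimizes the SAD against
    `reference`. Returns one motion vector per block."""
    h, w = current.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cur = current[by:by + block, bx:bx + block].astype(np.int32)
            best_sad, best = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the reference
                    ref = reference[y:y + block, x:x + block].astype(np.int32)
                    sad = int(np.abs(cur - ref).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            mvs[by // block, bx // block] = best
    return mvs

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
# The content moves 1 row down and 2 columns right.
current = np.roll(reference, shift=(1, 2), axis=(0, 1))
mvs = full_search_me(current, reference)
print(mvs[1, 1])  # an interior block: should point 1 up and 2 left
```

The number of candidate positions per block grows as (2·search+1)², which illustrates why ME is usually kept at the encoder side.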

4 GOF-ing

The RD performance of ME/MC depends on the amount of temporal redundancy in the sequence. If that amount is low, it can be more RD-efficient to interrupt the (ME/)MC process. The set of consecutive frames in which MC is active is usually known as a GOF⁶ (Group of Frames). Notice that (from an RD perspective) the length of the GOFs is variable, and therefore the GOF partition should be an adaptive process controlled by an RDO algorithm.
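A simplified sketch of an adaptive GOF partition. It assumes zero-motion prediction (no ME) and uses the first-order entropy as a stand-in for the rate, ignoring distortion; a real RDO algorithm would compare Lagrangian costs after ME/MC. A new GOF starts whenever predicting a frame from the previous one does not lower the estimated rate. `entropy`, `gof_partition` and `jitter` are hypothetical helpers.

```python
import numpy as np

def entropy(x: np.ndarray) -> float:
    """Empirical first-order entropy, in bits per sample."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gof_partition(frames: list[np.ndarray]) -> list[int]:
    """Return the indices of the frames that start a new GOF.
    A frame starts a GOF when predicting it from the previous frame
    (zero motion, as a simplification) does not reduce the entropy."""
    starts = [0]
    for k in range(1, len(frames)):
        intra = entropy(frames[k])
        inter = entropy(frames[k].astype(np.int16) - frames[k - 1].astype(np.int16))
        if inter >= intra:  # MC does not pay off: interrupt it
            starts.append(k)
    return starts

rng = np.random.default_rng(1)
scene_a = rng.integers(0, 256, size=(64, 64))
scene_b = rng.integers(0, 256, size=(64, 64))

def jitter(f):
    """Simulate a highly correlated next frame by adding small noise."""
    return np.clip(f + rng.integers(-2, 3, size=f.shape), 0, 255)

# Frames 0-2 show one scene; a scene cut happens at frame 3.
frames = [scene_a, jitter(scene_a), jitter(scene_a), scene_b, jitter(scene_b)]
print(gof_partition(frames))  # expected: [0, 3]
```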

However, in some contexts⁷ it may be necessary to use a fixed GOF partition [2]. For example, if we want to let users fast-forward or rewind through the sequence, we need to set a maximum GOF size. Another reason to use a maximum GOF size is to limit the propagation of decoding errors (for example, when some data has not been received during a streaming session). When a new GOF starts, the propagation of such errors stops.
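The effect of a maximum GOF size can be sketched with a few index computations, assuming an IPP... structure inside each GOF (each frame predicted from the previous one) and a GOF boundary every `max_gof` frames. All function names are hypothetical. Both the work needed after a seek and the reach of a decoding error are bounded by `max_gof`.

```python
def gof_starts(n_frames: int, max_gof: int) -> list[int]:
    """Fixed GOF partition: a new GOF (an I-frame) every `max_gof` frames."""
    return list(range(0, n_frames, max_gof))

def frames_to_decode(target: int, max_gof: int) -> list[int]:
    """Frames the decoder must decode to show `target` after a seek,
    assuming an IPP... GOF where each frame is predicted from the previous one."""
    start = (target // max_gof) * max_gof
    return list(range(start, target + 1))

def damaged_frames(lost: int, n_frames: int, max_gof: int) -> list[int]:
    """Frames affected when frame `lost` cannot be decoded correctly:
    the error propagates until the next GOF starts."""
    end = min((lost // max_gof + 1) * max_gof, n_frames)
    return list(range(lost, end))

print(gof_starts(20, 8))          # [0, 8, 16]
print(frames_to_decode(11, 8))    # [8, 9, 10, 11]
print(damaged_frames(10, 20, 8))  # [10, 11, 12, 13, 14, 15]
```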

5 Block-based MC and RDO

The MC schemes used in most video coding standards compensate blocks of pixels [3]. In this context, depending on the block decision mode implemented in the RDO procedure⁸, blocks can be of different types: I (intra), P (predicted), B (bidirectionally predicted) and S (skipped) [2]. An I-block is used when we do not find enough temporal correlation between frames and, from an RD perspective, it is more advantageous to use intra-coding. When we find one or more reference blocks that provide a good prediction, we use predictive coding. Notice that the number of reference blocks can be higher than two, a number that is also controlled by RDO.
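A minimal sketch of an RDO block-type decision, assuming a Lagrangian cost J = D + λR, a uniform quantizer, a crude rate estimate instead of a real entropy coder, and made-up side-information costs. `coded_cost`, `SIDE_BITS` and `choose_block_type` are hypothetical names. For brevity it considers at most two references per block, and the motion-compensated predictions are supposed to come from a previous ME step.

```python
import numpy as np

def coded_cost(residual: np.ndarray, q_step: float) -> tuple[float, float]:
    """Quantize `residual` with a uniform step and return (distortion, rate).
    The rate is a crude proxy (assumption): 1 bit per zero coefficient and
    2*log2(1+|q|)+1 bits per nonzero quantized coefficient q."""
    q = np.round(residual / q_step)
    distortion = float(((residual - q * q_step) ** 2).sum())
    mag = np.abs(q)
    rate = float(np.where(mag == 0, 1.0, 2 * np.log2(1 + mag) + 1).sum())
    return distortion, rate

# Hypothetical side-information costs in bits: 2 bits for the mode, plus
# 8 bits per motion vector (or 8 bits for the intra DC value).
SIDE_BITS = {"I": 2 + 8, "P": 2 + 8, "B": 2 + 16, "S": 2}

def choose_block_type(block, fwd, bwd, lam=10.0, q_step=8.0):
    """Return the block type minimizing J = D + lam*R, plus all the costs.
    `fwd` and `bwd` are the motion-compensated predictions from a past and
    a future reference, assumed to be already found by ME."""
    block, fwd, bwd = (np.asarray(a, dtype=float) for a in (block, fwd, bwd))
    candidates = {
        "I": np.full_like(block, np.round(block.mean())),  # intra: DC prediction
        "P": fwd,                                           # one reference
        "B": (fwd + bwd) / 2,                               # two references
    }
    costs = {}
    for mode, prediction in candidates.items():
        d, r = coded_cost(block - prediction, q_step)
        costs[mode] = d + lam * (r + SIDE_BITS[mode])
    # S: copy the forward prediction; neither residual nor vector is sent.
    costs["S"] = float(((block - fwd) ** 2).sum()) + lam * SIDE_BITS["S"]
    return min(costs, key=costs.get), costs

rng = np.random.default_rng(3)
blk = rng.integers(0, 256, size=(8, 8)).astype(float)
print(choose_block_type(blk, blk, blk)[0])            # perfect match: S
print(choose_block_type(blk, blk + 20, blk - 20)[0])  # average of refs: B
```

A larger λ weights the rate more and pushes the decision toward cheap modes such as S, while a smaller λ favors the modes with the lowest distortion.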

6 Frame types

Depending on the types of blocks used in the frames, we have different types of frames: I, P, and B [2]. For example, in the intra-coding mode (III...) all the frames are I-type, which resets the propagation of decoding errors at every frame. In Motion Compensated Temporal Filtering (MCTF) [1], the frames are I or B.
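A small sketch of how a frame type could be derived from its block types, assuming that a frame with at least one B-block is a B-frame, a frame with P- or S-blocks but no B-blocks is a P-frame, and a frame with only I-blocks is an I-frame. S-blocks are treated as predicted because a skipped block is copied from a reference. `frame_type` is a hypothetical helper.

```python
def frame_type(block_types: list[str]) -> str:
    """Classify a frame from the types (I, P, B, S) of its blocks."""
    kinds = set(block_types)
    if "B" in kinds:
        return "B"       # at least one bidirectionally predicted block
    if kinds & {"P", "S"}:
        return "P"       # predicted (or skipped) blocks need a reference
    return "I"           # only intra blocks: decodable on its own

print(frame_type(["I", "I", "I"]))  # I
print(frame_type(["I", "P", "S"]))  # P
print(frame_type(["P", "B", "S"]))  # B
```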

7 Resources

  1. Full search block-based ME (Motion Estimation).
  2. Full search dense (1x1) ME.
  3. Farnebäck’s motion estimation.
  4. Introducing the Low-delay (IPP...) Mode.
  5. Multi-Resolution Video Coding (MRVC).

8 To-Do

  1. Create a video codec for processing a sequence of images using an III... scheme, using the 2D-DCT to remove the spatial redundancy. Use MPNG.py and 2D-DCT.py as reference. Complexity 5.
  2. Create a video codec similar to the previous one, but using an IPP... scheme without ME (assuming that all the motion vectors are zero). RDO should be used to determine the block type. Complexity 7.
  3. Create a video codec similar to the previous one, but using ME (and MC controlled by RDO). Complexity 10.
  4. Create a video codec similar to the previous one, but using an IBB... scheme. Complexity 15.

9 References

[1]   V. González-Ruiz. Motion Compensated Temporal Filtering (MCTF).

[2]   V. González-Ruiz. Motion Compensation.

[3]   V. González-Ruiz. Motion Estimation.

¹ With pixels or coefficients, depending on the domain in which the frame is currently represented.

² The better the prediction, the lower the entropy of the residuals.

³ So that the process is reversible.

⁴ At least when we compensate in the image domain.

⁵ In most situations, determining the true motion of the objects in a real scene is an ill-posed problem, because it cannot be recovered from a sequence of 2D images alone. The situation is different when at least two cameras are used.

⁶ Some standards use GOP (Group Of Pictures) instead.

⁷ Specifically, constant bit-rate encodings.

⁸ Obviously, the part of the RDO procedure that controls the block type.