Multi-Resolution Video Coding (MRVC)

Vicente González Ruiz

December 28, 2024

Contents

 1 Transform coding and compression
 2 The 2D-DWT (Discrete Wavelet Transform)
  2.1 Implementation of the 3-component (usually color) DWT step
 3 Multi-level DWT structures
 4 Scalability provided by the DWT
  4.1 Example: A 1-iteration DWT
  4.2 Example: A 2-iteration DWT (\(2\times \mathtt {DWT}\))
 5 Motion DWT (MDWT)
 6 Scalability provided by MDWT
  6.1 Example: 1-iteration MDWT (\(\mathtt {MDWT}(N=5)\))
  6.2 Example: 2-iteration MDWT (\(2\times \mathtt {MDWT}(N=5)\))
  6.3 Example: Interpolating (a sequence) with the DWT
 7 Video transform alternatives
 8 DWT structures and Laplacian Pyramids
 9 Motion Compensated Orthogonal Laplacian Pyramid (MCOLP)
  9.1 The MCOLP butterfly
  9.2 The MCOLP transform
  9.3 Example: \(\mathtt {MRVC}(T=1, N=5)\)
  9.4 Example: \(\mathtt {MRVC}(T=2, N=5)\)
  9.5 Simplification of the notation
  9.6 Example: \(\mathtt {MRVC}(T=4, N=17)\)
  9.7 Scalability provided by MCOLP
  9.8 Example: 3 iterations of \(\mathtt {MRVC}(T=2, N=5)\)
 10 Adaptive motion compensation based on the energy of the estimated prediction error
 11 Lossy compression: quantization
  11.1 Example: Effects caused by the total attenuation of the motion compensated (MCed) \(H\) subbands
  11.2 Example: Effects caused by the quantization of MCed \(H\) subbands
  11.3 Example: Quantization of all \(H\) subbands (MCed and intra)
  11.4 Example: Quantization of all subbands

Abstract

MRVC is a transform that inputs a video (a sequence of frames of pixels) and outputs a decomposition (a sequence of spatio-temporal subbands of coefficients). It consists of two stages:

  1. MDWT (Motion Discrete Wavelet Transform), which applies the 1-level 2D-DWT to every video frame, resulting in a representation with \(2\) SRLs (Spatial Resolution Levels).
  2. MCOLP (Motion Compensated Orthogonal Laplacian Pyramid), which estimates the motion in the lower SRL and compensates the motion in the high frequencies of the higher SRL, using an Orthogonal Laplacian Pyramid structure that has a critically sampled representation in the 2D-DWT domain.

MRVC can be applied recursively to the lower SRL, producing a multiresolution representation of the video while compensating the motion of the sequence. The output decomposition concentrates most of the energy in a small number of subbands (usually, those representing the low frequencies), which enables the lossy compression of the video.

1 Transform coding and compression

Video data contain high amounts of redundancy, both spatial and temporal. For this reason, most video encoders compress an input sequence of images1, basically, in two stages: (1) a transform in the time and space domains that produces a collection of uncorrelated subbands, and (2) an entropy encoding stage which removes the statistical redundancy that may still remain after the decorrelation (transform). Usually, the resulting coefficients have three interesting features:

  1. Usually, a smaller entropy than the original pixels. This helps to increase the compression ratio (see the sketch after this list).
  2. A multiresolution representation (in space and time) of the visual information. This provides the possibility of decoding the sequence of images at a smaller resolution. This feature is known as spatio-temporal scalability in video.
  3. Usually, most of the signal energy (and therefore, generally, most of the information of interest to humans) is represented by a small number of subbands. This helps to improve the compression ratio and to prioritize the data in progressive transmission scenarios.
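As an illustration of the first feature, the following minimal sketch (assuming only that NumPy and PyWavelets are installed; the synthetic image is a stand-in for a real frame) compares the zero-order entropy of the pixels of a smooth image with that of its DWT coefficients:

import numpy as np
import pywt

def entropy(x, bins=256):
    # Zero-order (memoryless) entropy in bits/sample.
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A synthetic smooth image (a random walk along the rows) stands in
# for a real frame.
rng = np.random.default_rng(0)
image = np.cumsum(rng.normal(size=(256, 256)), axis=1)

LL, (LH, HL, HH) = pywt.dwt2(image, "db5", mode="periodization")
coeffs = np.concatenate([s.ravel() for s in (LL, LH, HL, HH)])

print("entropy of the pixels       =", entropy(image))
print("entropy of the coefficients =", entropy(coeffs))

For this kind of (smooth) input, the coefficients usually exhibit a smaller entropy because most of the detail coefficients concentrate around zero.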

2 The 2D-DWT (Discrete Wavelet Transform)

The 2D-DWT (2-Dimensional Discrete Wavelet Transform)2 is a digital transform that, applied to an image, accumulates the energy of the image in a small number of subbands and obtains a multiresolution (generally dyadic) representation of such image. The subbands form a decomposition3 \(\{LL, LH, HL, HH\}\), where \(L\) stands for 1D low-pass analysis filtering and \(H\) for 1D high-pass analysis filtering. Thus, for example, \(LH\) is the result of applying the \(L\) filter to the rows and downsampling4 the output by a factor of 2, and then applying the \(H\) filter to the columns of the previous decomposition and, again, downsampling by a factor of 2. The subbands are populated by coefficients representing different spatial areas, depending on the localization of the coefficient and the resolution level. Notice that it is possible to compute the (2D-)DWT using 1D filters. Therefore, the DWT is separable.

The downsampling operation generates a critically sampled structure, where the number of output DWT coefficients is equal to the number of input pixels. This is possible because, ideally, the \(L\) filter is designed to reject the frequencies that the \(H\) filter passes, and vice versa (it is said that the filters are complementary, or that they form a perfect-reconstruction filter bank). In practice, there is always some aliasing (overlapping between the responses of the filters in the frequency domain), but this does not affect5 the number of output coefficients. In terms of Linear Algebra, the analysis \(L\) and \(H\) filters, and the synthesis \(L^{-1}\) and \(H^{-1}\) filters (the filters that reconstruct the pixels from the coefficients generated by \(L\) and \(H\)), would be orthogonal, that is, the basis functions that they convolve with the signals are completely independent.

For the sake of simplicity, we will denote the 1-iteration DWT spatial decomposition as \(\{L, H\}\) where \(L=LL\) and \(H=\{LH, HL, HH\}\). See also DWT vs LPT.

2.1 Implementation of the 3-component (usually color) DWT step

DWT.py (see Fig. 1) implements the forward (DWT) and backward (iDWT) 1-iteration (step) DWT for color images.

import numpy as np 
import pywt 
import distortion 
import image_3 
#import L_DWT as L 
#import H_DWT as H 
#import debug 
import information 
 
 
import logging 
logger = logging.getLogger(__name__) 
#logging.basicConfig(format="[%(filename)s:%(lineno)s %(funcName)s()] %(message)s") 
#logger.setLevel(logging.CRITICAL) 
#logger.setLevel(logging.ERROR) 
logger.setLevel(logging.WARNING) 
#logger.setLevel(logging.INFO) 
#logger.setLevel(logging.DEBUG) 
 
#_wavelet = pywt.Wavelet("haar") 
#_wavelet = pywt.Wavelet("db1") 
_wavelet = pywt.Wavelet("db5") 
#_wavelet = pywt.Wavelet("db20") 
#_wavelet = pywt.Wavelet("bior3.5") 
#_wavelet = pywt.Wavelet("bior3.7") 
#_wavelet = pywt.Wavelet("bior6.8") 
#_wavelet = pywt.Wavelet("rbio6.8") 
 
# Number of levels of the DWT 
#N_levels = config.n_levels 
_N_levels = 5 
 
# Signal extension mode 
#_extension_mode = "symmetric" # default 
#_extension_mode = "constant" 
#_extension_mode = "reflect" 
#_extension_mode = "periodic" 
#_extension_mode = "smooth" 
#_extension_mode = "antisymmetric" 
#_extension_mode = "antireflect" 
_extension_mode = "periodization" # Generates the inimal number of coeffs 
#_extension_mode = config.dwt_extension_mode 
 
logger.info(f"Wavelet={_wavelet}") 
logger.info(f"DWT extension mode={_extension_mode}") 
 
def analyze_step(color_image, wavelet=_wavelet): 
    '''Color 1-level forward 2D-DWT. 
 
    Parameters 
    ---------- 
    color_image : [row, column, component] np.ndarray 
        Color image to be analyzed. 
    wavelet : pywt.Wavelet 
        Wavelet name. 
 
    Returns 
    ------- 
    A 1-level color decomposition : tuple 
        Subbands (LL, (LH, HL, HH)). 
 
    ''' 
    logger.info(f"wavelet={wavelet}") 
    N_comps = color_image.shape[2] 
    decomposition_by_component = [None]*N_comps 
    # A color decomposition 
    for c in range(N_comps): 
        decomposition_by_component[c] = pywt.dwt2(data=color_image[:,:,c], wavelet=wavelet, mode=_extension_mode) 
    assert decomposition_by_component[0][0].shape == decomposition_by_component[0][1][0].shape 
    N_rows_subband, N_columns_subband = decomposition_by_component[0][0].shape # All subbands have the same shape 
    LL = np.empty(shape=(N_rows_subband, N_columns_subband, N_comps), dtype=np.float64) 
    LH = np.empty(shape=(N_rows_subband, N_columns_subband, N_comps), dtype=np.float64) 
    HL = np.empty(shape=(N_rows_subband, N_columns_subband, N_comps), dtype=np.float64) 
    HH = np.empty(shape=(N_rows_subband, N_columns_subband, N_comps), dtype=np.float64) 
    for c in range(N_comps): 
        LL[:,:,c] = decomposition_by_component[c][0][:,:] 
        LH[:,:,c] = decomposition_by_component[c][1][0][:,:] 
        HL[:,:,c] = decomposition_by_component[c][1][1][:,:] 
        HH[:,:,c] = decomposition_by_component[c][1][2][:,:] 
    return (LL, (LH, HL, HH)) # Note: to be consistent, a list (not a 
                              # tuple) should be returned; that way we 
                              # would have a 1-level decomposition 
 
def synthesize_step(LL, H, wavelet=_wavelet): 
    '''Color 1-level backward 2D-DWT. 
 
    Parameters 
    ---------- 
    LL : np.ndarray 
        Low-pass subband. 
    H : tuple 
        High-pass subbands (LH, HL, HH). 
 
    Returns 
    ------- 
    The synthesized color image : np.ndarray. 
 
    ''' 
    # (The listing was truncated at this point; the body below is 
    # completed from context and mirrors analyze_step, inverting each 
    # component with pywt.idwt2.) 
    LH, HL, HH = H 
    N_comps = LL.shape[2] 
    channels = [] 
    for c in range(N_comps): 
        channels.append(pywt.idwt2((LL[:,:,c], (LH[:,:,c], HL[:,:,c], HH[:,:,c])), wavelet=wavelet, mode=_extension_mode)) 
    return np.stack(channels, axis=2)
Figure 1: Implementation of the multicomponent 1-iteration DWT (forward and backward). More information about the implementation can be found at PyWavelets.

3 Multi-level DWT structures

An \(N\)-iteration6 dyadic DWT is the result of a recursive application of the 1-iteration DWT to the \(L\) subband, \(N\) times, producing an \((N+1)\)-level dyadic structure which is able to provide \(N+1\) Spatial Resolution Levels (SRLs). For example, in Fig. 2 the DWT has been applied 2 times, generating 3 SRLs: (1) \(L^2\), (2) \(L^1=\mathtt {iDWT}(L^2, H^2)\), and (3) \(L^0=\mathtt {iDWT}(L^1, H^1)\), where \(\mathtt {iDWT}\) stands for inverse DWT.

    +------+------+-------------+ 0         +------+------+-------------+
    |      |      |             |           |      |      |             |
    | LL^2 | LH^2 |             |           | L^2  |      |             |
    |      |      |             |           |      |      |             |
    +------+------+     LH^1    |           +------+      |             |
    |      |      |             |           |             |             |
    | HL^2 | HH^2 |             |           |        H^2  |             |
    |      |      |             | Y/2-1     |             |             |
    +------+------+-------------+       =   +-------------+             |
    |             |             |           |                           |
    |             |             |           |                           |
    |             |             |           |                           |
    |     HL^1    |     HH^1    |           |                    H^1    |
    |             |             |           |                           |
    |             |             |           |                           |
    |             |             | Y-1       |                           |
    +-------------+-------------+           +---------------------------+
    0           X/2-1         X-1

Figure 2: Decomposition generated by \(\mathtt {DWT}(\mathtt {iters}=2)\) using the standard notation (left) and the compact notation (right).
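As a quick check of this structure, the following sketch (assuming NumPy and PyWavelets; the zero image is only a placeholder for the shapes) computes the 2-iteration DWT of Fig. 2 with pywt.wavedec2, which applies the 1-iteration DWT recursively to the \(L\) subband:

import numpy as np
import pywt

image = np.zeros((512, 512))  # a dummy Y=512 x X=512 (gray) image

# level=2 <=> a 2-iteration (3-SRL) dyadic decomposition.
L2, H2, H1 = pywt.wavedec2(image, "db5", mode="periodization", level=2)

print(L2.shape)     # (128, 128): the L^2 subband
print(H2[0].shape)  # (128, 128): LH^2 (H2 also holds HL^2 and HH^2)
print(H1[0].shape)  # (256, 256): LH^1 (H1 also holds HL^1 and HH^1)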

4 Scalability provided by the DWT

Depending on the number of subband levels (pyramid levels in the OLP), we can reconstruct an image at different spatial scales (resolutions). For example, if an image has been transformed using the DWT with \(2\) iterations (see Fig. 2), we get the following examples (a Python sketch of the multi-scale reconstruction closes this section).

4.1 Example: A 1-iteration DWT

An image is transformed, generating four subbands \(LL\), \(LH\), \(HL\) and \(HH\).

# You must be in the 'src' directory.

# Copy an image (the output directory must be the same as the input one).
rm /tmp/*.png
cp ../sequences/stockholm/000.png /tmp

# 1-iteration 2D DWT.
python3 -O DWT.py -p /tmp/ -i 000

# Visualize the subbands.
display -normalize /tmp/LL000.png # <- LL^1
display -normalize /tmp/LH000.png # <- LH^1
display -normalize /tmp/HL000.png # <- HL^1
display -normalize /tmp/HH000.png # <- HH^1

# 1-iteration 2D iDWT.
python3 -O DWT.py -b -p /tmp/ -i 000

# Visualize the reconstruction.
python3 ../tools/substract_offset.py -i /tmp/000.png -o /tmp/1.png
display /tmp/1.png

# Show differences (PyWavelets uses floating point arithmetic).
../tools/show_differences.sh -1 /tmp/000.png -2 ../sequences/stockholm/000.png -o /tmp/diffs.png
display /tmp/diffs.png

4.2 Example: A 2-iteration DWT (\(2\times \mathtt {DWT}\))

# You must be in the 'src' directory.

# Copy an image (the output directory must be the same as the input one).
rm /tmp/*.png
cp ../sequences/stockholm/000.png /tmp

# First iteration.
python3 -O DWT.py -p /tmp/ -i 000

# Second iteration.
python3 -O DWT.py -p /tmp/LL -i 000

# Visualize the subbands.
ls -l /tmp/*.png
display -normalize /tmp/LL000.png # <- LL^1
display -normalize /tmp/LH000.png # <- LH^1
display -normalize /tmp/HL000.png # <- HL^1
display -normalize /tmp/HH000.png # <- HH^1
display -normalize /tmp/LLLL000.png # <- LL^2
display -normalize /tmp/LLLH000.png # <- LH^2
display -normalize /tmp/LLHL000.png # <- HL^2
display -normalize /tmp/LLHH000.png # <- HH^2

# Reverse second iteration.
python3 -O DWT.py -b -p /tmp/LL -i 000
display -normalize /tmp/LL000.png

# Reverse first iteration.
python3 -O DWT.py -b -p /tmp/ -i 000
python3 ../tools/substract_offset.py -i /tmp/000.png -o /tmp/1.png
display /tmp/1.png

# Show differences.
../tools/show_differences.sh -1 /tmp/000.png -2 ../sequences/stockholm/000.png -o /tmp/diffs.png
display /tmp/diffs.png
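The multi-scale reconstruction of this section can also be sketched directly in Python (a minimal sketch assuming NumPy and PyWavelets, independent of the DWT.py tool used above):

import numpy as np
import pywt

image = np.random.default_rng(0).normal(size=(512, 512))  # a dummy image

# 2-iteration analysis (3 SRLs, as in Fig. 2).
L2, H2, H1 = pywt.wavedec2(image, "db5", mode="periodization", level=2)

# Scale 2 is L^2 itself; each iDWT step doubles the resolution.
L1 = pywt.idwt2((L2, H2), "db5", mode="periodization")  # scale 1
L0 = pywt.idwt2((L1, H1), "db5", mode="periodization")  # scale 0

assert np.allclose(L0, image)  # perfect reconstruction (up to float error)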

5 Motion DWT (MDWT)

The 2D-DWT can be applied to a sequence of frames (images) by simply transforming each one independently. This is done, for example, in the Motion JPEG2000 video compression standard. Notice that only the spatial redundancy is exploited in MDWT (all the temporal redundancy still remains in the video). The decomposition structure generated by MDWT is shown in Fig. 3. The 1-iteration MDWT inputs a sequence of frames \(\{F_i\}\) and outputs a sequence of decompositions \(\{\{F_i\}.L, \{F_i\}.H\}\), also called a sequence of spatial subbands. In all this discussion, \(i\) is an integer number starting at \(0\) (\(F_0\) is always the first frame of the video). Notice that \(\{F_i\}.L\) represents the scale \(1\) of the sequence (a version of the original video \(\{F_i\}\) but with a resolution \(Y/2\times X/2\)), and that \(\{F_i\}.L^0 = \{F_i\} = \mathtt {iMDWT}(\{F_i\}.L,\{F_i\}.H )\). See Fig. 4 for more details.

Figure 3: Subbands produced by the MDWT.
class MDWT: 
 
    def __init__(self, wavelet = "bior3.5"): 
        self.dwt = DWT(wavelet) 
 
    def forward(self, prefix="/tmp/", N=5): 
        '''1-iteration Motion 2D DWT of a sequence of images. 
 
        Compute the forward 2D-DWT of each image of the sequence 
        placed at <prefix>. 
 
        Input 
        ----- 
 
            prefix: the sequence of images to be transformed. 
            N: number of images. 
 
        Output 
        ------ 
 
            (disk): the sequence of decompositions. 
 
        ''' 
        for i in range(N): 
            img = image.read(prefix, "{:03d}".format(i)) 
            pyr = self.dwt.forward(img) 
            decomposition.write(pyr, prefix, "{:03d}".format(i)) 
 
    def backward(self, prefix="/tmp/", N=5): 
        '''1-iteration Inverse Motion 2D DWT of a sequence of decompositions. 
 
        Compute the inverse 2D-DWT of each decomposition of the 
        sequence of decompositions placed at <prefix>. 
 
        Input: 
        ----- 
 
            prefix: the sequence of decompositions to be transformed. 
            N: the number of decompositions. 
 
        Output: 
        ------ 
 
            (disk): the sequence of images. 
 
        ''' 
 
        for i in range(N): 
            pyr = decomposition.read(prefix, "{:03d}".format(i)) 
            img = self.dwt.backward(pyr) 
            image.write(img, prefix, "{:03d}".format(i))


Figure 4: Implementation of the MDWT.

6 Scalability provided by MDWT

MDWT sequences (of (subband) decompositions) are scalable in space and in time. Spatial scalability is a direct consequence of the DWT (the number of SRLs can be controlled by the number of iterations of the DWT). On the other hand, it is trivial to observe that MDWT provides full temporal scalability (we can access the images randomly) because each image of the input sequence is transformed independently.

6.1 Example: 1-iteration MDWT (\(\mathtt {MDWT}(N=5)\))

# You must be in the 'src' directory.
rm /tmp/*.png

# Create the output directory for the 1st-level decompositions.
yes | cp ../sequences/stockholm/*.png /tmp

# 2D 1-iteration MDWT.
python3 -O MDWT.py -p /tmp/ -N 5

# Visualize the subbands.
for i in /tmp/LL00?.png; do convert -normalize $i $i.norm; done
animate /tmp/LL*.norm
for i in /tmp/LH00?.png; do convert -normalize $i $i.norm; done
animate /tmp/LH*.norm
for i in /tmp/HL00?.png; do convert -normalize $i $i.norm; done
animate /tmp/HL*.norm
for i in /tmp/HH00?.png; do convert -normalize $i $i.norm; done
animate /tmp/HH*.norm

# 1-iteration iMDWT.
python3 -O MDWT.py -b -p /tmp/ -N 5

# Visualization of the reconstruction.
for i in /tmp/???.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done
animate /tmp/???.png.png

# Visualization of the residue.
for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done
animate /tmp/diff*.png

6.2 Example: 2-iteration MDWT (\(2\times \mathtt {MDWT}(N=5)\))

# You must be in the 'src' directory.
rm /tmp/*.png

# Create the output directory for the 1st-level decompositions.
yes | cp ../sequences/stockholm/*.png /tmp

# 2-iterations MDWT.
python3 -O MDWT.py -p /tmp/ -N 5
python3 -O MDWT.py -p /tmp/LL -N 5

# 2-iterations iMDWT.
python3 -O MDWT.py -b -p /tmp/LL -N 5
python3 -O MDWT.py -b -p /tmp/ -N 5

# Visualization of the reconstruction.
for i in /tmp/???.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done
animate /tmp/???.png.png

# Visualization of the residue.
for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done
animate /tmp/diff*.png

6.3 Example: Interpolating (a sequence) with the DWT

If we apply an inverse transform to the original frames7, supposing that they are the low-frequency subbands of the input decomposition, and that the high-frequency subbands are zero, we can interpolate each frame of the original sequence by a factor of \(2\) in each direction.
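In terms of code, the idea can be sketched as follows (a minimal sketch assuming NumPy and PyWavelets, independent of the MDWT.py tool used below):

import numpy as np
import pywt

frame = np.random.default_rng(0).normal(size=(288, 352))  # a dummy frame
zero = np.zeros_like(frame)

# Use the frame as the L subband and zeros as the H subbands.
interpolated = pywt.idwt2((frame, (zero, zero, zero)), "db5", mode="periodization")

print(frame.shape, "->", interpolated.shape)  # (288, 352) -> (576, 704)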

# You must be in the 'src' directory.
rm /tmp/*.png

# Copy the current images as LL subbands.
rm /tmp/*.png
for i in {0..4}; do
  image_number=$(printf "%03d" $i)
  cp ../sequences/stockholm/$image_number.png /tmp/LL$image_number.png
done

# Apply the inverse DWT.
python3 -O MDWT.py -b -p /tmp/ -N 5

# Visualization of the reconstruction.
for i in /tmp/???.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done
animate /tmp/???.png.png

# Notice that the dynamic range of the sequence could change (the
# images could appear brighter or darker). This is a consequence of the
# relative gain of the LL subband with respect to the rest of subbands.
rm /tmp/*.norm
for i in /tmp/???.png; do convert -normalize $i $i.norm; done
animate /tmp/*.png.norm

7 Video transform alternatives

To remove both spatial and temporal redundancy, two different alternatives are available:

  1. In a “t+2D” transform, the video is first analyzed (transformed) along the time domain and next along the space domain.
  2. A “2D+t” transform does just the opposite.

Each choice has a number of pros and cons. For example, in a “t+2D” transform we can directly apply any frame predictor based on Motion Estimation (ME) because the input is a normal video. On the contrary, if we implement a “2D+t” transform, the input to the motion estimator is a sequence of decompositions. The overwhelming majority of DWTs are not shift invariant, which basically means that exactly the same object placed in two different frames in different positions will generate different wavelet coefficients, even if the displacement is an integer number of pixels and, therefore, the pixels of the object in both frames are identical (see the sketch after the following list). Therefore, motion estimators that compare coefficient values will not work accurately in the decomposition domain. On the other hand, if we want to provide true spatial scalability (processing only those spatial resolutions (scales) necessary to get a spatially scaled version of our video), a “t+2D” transformed video presents several drawbacks:

  1. The perfect reconstruction of the original frames is only possible if the “t” stage is identical at both the encoder and the decoder. Therefore, if the encoder applies “t” at the full resolution (scale 0), the decoder must apply “t” also at full resolution, even if the resolution of the reconstructed frames is smaller. This could be unfeasible for receivers with low computational resources.
  2. Perfect reconstruction can be sacrificed by using a quantized version of the motion information, but the rate/distortion (R/D) tradeoff will be worse. In this case, the decoder would also be wasting motion information.
  3. This last problem (the waste of motion information) could be avoided if the motion data were encoded in a progressive representation (suitable for working at a different spatial resolution of the frames). Unfortunately, progressive representations usually need more data than non-progressive ones. Moreover, it is not trivial to pack the motion data and the texture data in a single code-stream to provide all the possible types of scalability (temporal, spatial, and quality).
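The shift variance mentioned before the list can be verified with a minimal sketch (assuming NumPy and PyWavelets; a 1D signal suffices because the DWT is separable):

import numpy as np
import pywt

x = np.random.default_rng(0).normal(size=64)

_, H0 = pywt.dwt(x, "db5", mode="periodization")              # no shift
_, H1 = pywt.dwt(np.roll(x, 1), "db5", mode="periodization")  # odd shift
_, H2 = pywt.dwt(np.roll(x, 2), "db5", mode="periodization")  # even shift

# An even shift of the signal only shifts the coefficients ...
print(np.allclose(np.roll(H0, 1), H2))  # True
# ... but an odd shift produces coefficients that, in general, do not
# match any circular shift of the originals.
print(any(np.allclose(np.roll(H0, k), H1) for k in range(len(H0))))  # False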

8 DWT structures and Laplacian Pyramids

As it will be seen later, the shift variance of the wavelet coefficients is a problem when we want to perform comparisons in the DWT domain. As can be expected, this problem can be mitigated if the downsampling operation is removed, as happens in the Overcomplete DWT and in the Laplacian Pyramids (LP). Fortunately, there is an equivalence between DWT subbands and a family of Laplacian Pyramids that we will call Orthogonal LPs (OLPs), and it is quite straightforward to convert a decomposition representation into a pyramid representation and vice versa when the \(L\) and \(H\) filters used for both the DWT and the LP are orthogonal. If the filters used for generating the LP were not orthogonal, then it would not be possible to travel from the LP domain to the DWT domain without losing information.
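This conversion can be sketched as follows (a minimal sketch assuming NumPy and PyWavelets; the same trick is used later by the MCOLP code of Fig. 5):

import numpy as np
import pywt

image = np.random.default_rng(0).normal(size=(256, 256))  # a dummy image
L, H = pywt.dwt2(image, "db5", mode="periodization")

zero_L = np.zeros_like(L)
zero_H = (zero_L, zero_L, zero_L)

# Each DWT subband is taken back to the full (non-decimated)
# resolution by inverting the DWT with the complementary subbands
# set to zero.
OLP_L = pywt.idwt2((L, zero_H), "db5", mode="periodization")
OLP_H = pywt.idwt2((zero_L, H), "db5", mode="periodization")

# By linearity both full-resolution planes add up to the image and,
# with orthogonal filters, no information is lost in the round trip.
print(np.allclose(OLP_L + OLP_H, image))  # True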

9 Motion Compensated Orthogonal Laplacian Pyramid (MCOLP)

MCOLP is a “2D+t” transform. The “2D” stage is a 1-iteration MDWT, and the “t” stage is basically a \(T\)-iteration 1D Motion Compensated (MC) DWT. The MDWT removes the spatial redundancy of the video, and the temporal DWT removes the temporal redundancy between \(H\) subbands. The number of iterations \(T\) determines the GOP size \begin {equation} G=2^T. \label {eq:GOP_size} \end {equation}

We can think of MCOLP as an MCTF (Motion Compensated Temporal Filtering) applied to a given spatial resolution level and, because the motion is estimated using the previous lower resolution level (which is available at the decoder), it is not necessary to send the motion information in the code-stream. Notice also that this process can be used for any number of spatial resolution levels, simply by using MCOLP iteratively on the low spatial resolution level (resulting from the previous iteration of MRVC, which applies first an MDWT).

A peculiar feature of MCOLP is that the predict step used in the “t”-DWT is performed in the OLP domain, which is shift invariant because the coefficients have not been downsampled (as happens in the DWT domain).

9.1 The MCOLP butterfly

The butterfly inputs three decompositions \(a=\{a.L, a.H\}\), \(b=\{b.L, b.H\}\) and \(c=\{c.L, c.H\}\), and outputs a residue subband \(\tilde {b}.H\), which replaces \(b.H\) in the original \(b\) decomposition. Therefore, after the use of the butterfly, we get \(a\), \(\tilde {b}=\{b.L, \tilde {b}.H\}\) (notice that only the high-frequency information of \(b\) has been compensated) and \(c\). This replacement is fully reversible because the forward transform uses only the information that the inverse transform will have access to.

Notice that the pyramid domain (which is near-invariant to pixel displacements) has been used to estimate and compensate the \(H\) subbands. This means that the motion has been estimated using integer (not sub-) pixel accuracy, and that the motion has been compensated also with integer pixel accuracy in the OLP domain, which implies \(1/2\)-pixel accuracy in the DWT domain.

After the MC, a DWT is applied to \(\tilde {b}.H\) in order to get again a critically sampled version of these subbands. After this, the \(b.L\) subband is discarded (in fact, replaced by the original subband). The MC operation can generate information in this subband, because the MC can produce low-frequency information that could not be represented by the \(H\) subband. However, hopefully the energy of this \(L\) subband is going to be small in most cases and, in any case, this is not an unsolvable problem because the decoder is going to replicate exactly this behavior.

class MCOLP: 
 
    def __init__(self, shape, wavelet = "bior3.5"): 
        self.zero_L = np.zeros(shape, np.float64) 
        self.zero_H = (np.zeros(shape, np.float64), np.zeros(shape, np.float64), np.zeros(shape, np.float64)) 
        self.dwt = DWT(wavelet) 
 
    def __forward_butterfly(self, aL, aH, bL, bH, cL, cH): 
        '''Forward MCOLP butterfly. 
 
        Input: 
        ----- 
 
        aL, aH, bL, bH, cL, cH: array[y, x, component], the decomposition of 
        the images a, b and c. 
 
        Output: 
        ------ 
 
        residue_bH: array[y, x, component], the residue of the 
        high-frequency information of the image b. 
 
        ''' 
 
        AL = self.dwt.backward((aL, self.zero_H)) 
        BL = self.dwt.backward((bL, self.zero_H)) 
        CL = self.dwt.backward((cL, self.zero_H)) 
        AH = self.dwt.backward((self.zero_L, aH)) 
        BH = self.dwt.backward((self.zero_L, bH)) 
        CH = self.dwt.backward((self.zero_L, cH)) 
        prediction_BH  = predictor.generate_prediction(AL, BL, CL, AH, CH) 
        residue_BH = BH - prediction_BH 
        residue_bH = self.dwt.forward(residue_BH) 
        return residue_bH[1] 
 
    def __backward_butterfly(self, aL, aH, bL, residue_bH, cL, cH): 
        '''Backward MCOLP butterfly. 
 
        Input: 
        ----- 
 
        aL, aH, bL, residue_bH, cL, cH: array[y, x, component], the 
        decompositions of the images a, \tilde{b} and c. 
 
        Output: 
        ------ 
 
        bH: array[y, x, component], the high-frequencies of the image 
        b. 
 
        ''' 
 
        AL = self.dwt.backward((aL, self.zero_H)) 
        BL = self.dwt.backward((bL, self.zero_H)) 
        CL = self.dwt.backward((cL, self.zero_H)) 
        AH = self.dwt.backward((self.zero_L, aH)) 
        residue_BH = self.dwt.backward((self.zero_L, residue_bH)) 
        CH = self.dwt.backward((self.zero_L, cH)) 
        prediction_BH  = predictor.generate_prediction(AL, BL, CL, AH, CH) 
        BH = residue_BH + prediction_BH 
        bH = self.dwt.forward(BH) 
        return bH[1]
 
def generate_prediction(aL, bL, cL, AH, CH): 
    flow_AL_BL = motion_estimation(aL, bL) 
    flow_CL_BL = motion_estimation(cL, bL) 
    flow_AL_BL = DWT.backward((flow_AL_BL, [None, None, None])) 
    flow_CL_BL = DWT.backward((flow_CL_BL, [None, None, None]))
Figure 5: Forward and inverse MCOLP butterflies, and computation of the prediction frame.

9.2 The MCOLP transform

The \(T\)-iteration MCOLP (see Fig. 6) is the result of applying the MCOLP butterfly to all the frames at \(T\) different temporal subsamplings (or scales). At iteration \(1\leq t\leq T\) of MCOLP, the indexes of the involved frames are multiples of \(2^{t-1}\): the references \(a\) and \(c\) are multiples of \(2^t\), and the compensated frame \(b\) is an odd multiple of \(2^{t-1}\).

     def forward(self, prefix = "/tmp/", N=5, T=2): 
        '''Forward MCOLP. 
 
        Compute a MC 1D-DWT, estimating in the L subbands and 
        compensating in the H subbands of the Orthogonal Laplacian 
        Pyramid. The input video (as a sequence of 1-iteration 
        decompositions) must be stored on disk in the directory 
        <prefix>, and the output (as a sequence of 1-iteration MC 
        decompositions) will be generated in the same directory. 
 
        Input 
        ----- 
 
            prefix : str 
 
                Localization of the input/output images. Example: 
                "/tmp/". 
 
             N : int 
 
                Number of decompositions to process. 
 
             T : int 
 
                Number of iterations of the MCOLP (temporal scales). 
                Controls the GOP size. 
 
                  T | GOP_size 
                ----+---------- 
                  0 |        1 
                  1 |        2 
                  2 |        4 
                  3 |        8 
                  4 |       16 
                  5 |       32 
                  : |        : 
 
        Returns 
        ------- 
 
            The output motion compensated decompositions. 
 
        ''' 
        x = 2 
        for t in range(T): # Temporal scale 
            #print("a={}".format(0), end=' ') 
            i = 0 
            aL, aH = decomposition.read(prefix, "{:03d}".format(0)) 
            while i < (N//x): 
                #print("b={} c={}".format(x*i+x//2, x*i+x)) 
                bL, bH = decomposition.read(prefix, "{:03d}".format(x*i+x//2)) 
                cL, cH = decomposition.read(prefix, "{:03d}".format(x*i+x)) 
                residue_bH = self.__forward_butterfly(aL, aH, bL, bH, cL, cH) 
                decomposition.writeH(residue_bH, prefix, "{:03d}".format(x*i+x//2)) 
                aL, aH = cL, cH 
                #print("a={}".format(x*i+x), end=' ') 
                i += 1 
            x *= 2 
            #print('\n') 
 
    def backward(self, prefix = "/tmp/", N=5, T=2): 
        '''Backward MCOLP. 
 
        Compute the inverse MC 1D-DWT. The input sequence of 
        1-iteration MC decompositions must be stored on disk in the 
        directory <prefix>, and the output (as a sequence of 
        1-iteration decompositions) will be generated in the same 
        directory. 
 
        Input 
        ----- 
 
            prefix : str 
 
                Localization of the input/output images. Example: 
                "/tmp/". 
 
             N : int 
 
                Number of decompositions to process. 
 
             T : int 
 
                Number of iterations of the MCOLP (temporal scales). 
 
        Returns 
        ------- 
 
            The sequence of 1-iteration decompositions. 
 
        ''' 
        x = 2**T 
        for t in range(T): # Temporal scale 
            i = 0 
            aL, aH = decomposition.read(prefix, "{:03d}".format(0)) 
            while i < (N//x): 
                bL, residue_bH = decomposition.read(prefix, "{:03d}".format(x*i+x//2)) 
                cL, cH = decomposition.read(prefix, "{:03d}".format(x*i+x)) 
                bH = self.__backward_butterfly(aL, aH, bL, residue_bH, cL, cH) 
                decomposition.writeH(bH, prefix, "{:03d}".format(x*i+x//2)) 
                aL, aH = cL, cH 
                i += 1 
            x //=2
Figure 6: Implementation of the MCOLP.

Notice that, if the parameter \(N=G+1\) (where \(G\) is the GOP size, see Eq. 1), all the \(H\) subbands are compensated, except those of the first and the last frame of the sequence. For example, with \(T=2\) (\(G=4\)) and \(N=5\), the \(H\) subbands of frames 1, 2 and 3 are compensated, while those of frames 0 and 4 remain intra.

9.3 Example: \(\mathtt {MRVC}(T=1, N=5)\)

In this example (see Fig. 7), a sequence of five frames is first transformed with \(\mathtt {MDWT}(N=5)\), producing 2 SRLs per frame. Next, the \(H\) subbands of the odd frames (frames 1 and 3) are motion compensated using \(\mathtt {MCOLP}(T=1, N=5)\).

predictor=1
iterations=1

# You must be in the 'src' directory to run this.
rm -f /tmp/*.png
cp ../sequences/stockholm/*.png /tmp

# 1-iteration MDWT.
python3 -O MDWT.py -p /tmp/

# Show the length of the subbands.
for i in /tmp/LL???.png; do ls -l $i; done
for i in /tmp/LH???.png; do ls -l $i; done
for i in /tmp/HL???.png; do ls -l $i; done
for i in /tmp/HH???.png; do ls -l $i; done

# 1-iteration MCOLP.
python3 -O MCOLP.py -P $predictor -p /tmp/ -T $iterations

# Show the length of the subbands.
for i in /tmp/LL???.png; do ls -l $i; done
for i in /tmp/LH???.png; do ls -l $i; done
for i in /tmp/HL???.png; do ls -l $i; done
for i in /tmp/HH???.png; do ls -l $i; done

# Have any of them changed in length? Remember that a change in length
# implies a change in content.

# Let's recover the original sequence ...
rm /tmp/???.png

# 1-iteration iMCOLP.
python3 -O MCOLP.py -P $predictor -b -p /tmp/ -T $iterations

# 1-iteration iMDWT.
python3 -O MDWT.py -b -p /tmp/

# Visualization of the reconstruction.
for i in /tmp/???.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done
animate /tmp/???.png.png

# Visualization of the residue.
for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done
animate /tmp/diff*.png

Figure 7: Subbands generated after running \(\mathtt {MRVC}(T=1, N=5)\): (1) subband \(\{F_i\}.L\), that provides spatial resolution level \(1\) of the sequence, (2) subband \(\{F_{2i}\}.H\), which provides spatial resolution level \(0\) for the even frames (temporal resolution level \(1\) at full spatial resolution), and (3) subband \(\{{\tilde F}_{2i+1}\}.H\), that provides spatial resolution level \(0\) for the odd frames (temporal resolution level \(0\) at full spatial resolution, considering that \(\{F_{2i}\}.H\) has been already decoded).

9.4 Example: \(\mathtt {MRVC}(T=2, N=5)\)

This example is similar to the previous one, but now the number of iterations of MCOLP is \(T=2\) and, therefore, the second iteration of MCOLP also compensates the \(H\) subband of frame 2. This example matches exactly the one described in Fig. 8. Notice that the GOP size is \(4\) (\(3\) TRLs) and the number of SRLs is \(2\).

predictor=1

# You must be in the 'src' directory.
rm /tmp/*.png
cp ../sequences/stockholm/*.png /tmp

# 1-iteration MDWT.
python3 -O MDWT.py -p /tmp/

# Save a copy of the MDWT subbands (for comparing later).
rm -rf /tmp/tmp
mkdir /tmp/tmp
cp /tmp/*.png /tmp/tmp

# 1-iteration MCOLP.
python3 -O MCOLP.py -P $predictor -p /tmp/ -T 1

# Is the content of the MCOLP subbands different from the MDWT's output?
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/LL$ii.png /tmp/tmp/LL$ii.png;echo; done
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/LH$ii.png /tmp/tmp/LH$ii.png;echo; done
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/HL$ii.png /tmp/tmp/HL$ii.png;echo; done
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/HH$ii.png /tmp/tmp/HH$ii.png;echo; done

rm /tmp/*.png
cp ../sequences/stockholm/*.png /tmp

# 1-iteration MDWT.
python3 -O MDWT.py -p /tmp/

# Save a copy of the MDWT subbands (for comparing later).
rm -rf /tmp/tmp
mkdir /tmp/tmp
cp /tmp/*.png /tmp/tmp

# 2-iterations MCOLP.
python3 -O MCOLP.py -P $predictor -p /tmp/ -T 2

# Is the content of the MCOLP subbands different from the MDWT's output?
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/LL$ii.png /tmp/tmp/LL$ii.png;echo; done
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/LH$ii.png /tmp/tmp/LH$ii.png;echo; done
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/HL$ii.png /tmp/tmp/HL$ii.png;echo; done
for i in {0..4}; do ii=$(printf "%03d" $i); ls -l /tmp/HH$ii.png /tmp/tmp/HH$ii.png;echo; done

# Comparing LH visually.
for i in {0..4}; do ii=$(printf "%03d" $i); convert -normalize /tmp/tmp/LH$ii.png /tmp/tmp/MDWT_LH$ii.png; done
for i in {0..4}; do ii=$(printf "%03d" $i); convert -normalize /tmp/LH$ii.png /tmp/MCOLP_LH$ii.png; done
animate /tmp/tmp/MDWT_LH00?.png &
animate /tmp/MCOLP_LH00?.png &

# Comparing LH002 numerically.
echo MDWT > /tmp/1
printf "%61s\n" MCOLP > /tmp/2
python3 ../tools/show_statistics.py -i /tmp/tmp/LH002.png >> /tmp/1
python3 ../tools/show_statistics.py -i /tmp/LH002.png >> /tmp/2
paste /tmp/1 /tmp/2

# Comparing HL visually.
for i in {0..4}; do ii=$(printf "%03d" $i); convert -normalize /tmp/tmp/HL$ii.png /tmp/tmp/MDWT_HL$ii.png; done
for i in {0..4}; do ii=$(printf "%03d" $i); convert -normalize /tmp/HL$ii.png /tmp/MCOLP_HL$ii.png; done
animate /tmp/tmp/MDWT_HL00?.png &
animate /tmp/MCOLP_HL00?.png &

# Comparing HL002 numerically.
echo MDWT > /tmp/1
printf "%61s\n" MCOLP > /tmp/2
python3 ../tools/show_statistics.py -i /tmp/tmp/HL002.png >> /tmp/1
python3 ../tools/show_statistics.py -i /tmp/HL002.png >> /tmp/2
paste /tmp/1 /tmp/2

# Comparing HH visually.
for i in {0..4}; do ii=$(printf "%03d" $i); convert -normalize /tmp/tmp/HH$ii.png /tmp/tmp/MDWT_HH$ii.png; done
for i in {0..4}; do ii=$(printf "%03d" $i); convert -normalize /tmp/HH$ii.png /tmp/MCOLP_HH$ii.png; done
animate /tmp/tmp/MDWT_HH00?.png &
animate /tmp/MCOLP_HH00?.png &

# Comparing HH002 numerically.
echo MDWT > /tmp/1
printf "%61s\n" MCOLP > /tmp/2
python3 ../tools/show_statistics.py -i /tmp/tmp/HH002.png >> /tmp/1
python3 ../tools/show_statistics.py -i /tmp/HH002.png >> /tmp/2
paste /tmp/1 /tmp/2

# Now we recover the original video.

# 2-iterations iMCOLP.
python3 -O MCOLP.py -P $predictor -b -p /tmp/ -T 2

# 1-iteration iMDWT
python3 -O MDWT.py -b -p /tmp/

# Visualization of the reconstruction.
for i in /tmp/???.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done
animate /tmp/???.png.png

# Visualization of the residue.
for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done
animate /tmp/diff*.png

python3 ../tools/show_statistics.py -i /tmp/diff_002.png

Figure 8: Subbands generated after running \(\mathtt {MRVC}(T=2, N=5)\): (1) subband \(\{F_i\}.L\), that provides spatial resolution level \(1\) of the sequence, (2) \(\{F_{2^2i}\}.H\), which provides spatial resolution level \(0\) for frames with number \(2^2i\) (temporal resolution level \(2\) at full spatial resolution), (3) subband \(\{{\tilde F}_{2^2i+2}\}.H\), that provides spatial resolution level \(0\) for the frames with number \(2^2i+2\) (which, considering also \(\{F_{2^2i}\}.H\), provides temporal resolution level \(1\) at full spatial resolution), and finally (4) subband \(\{{\tilde F}_{2i+1}\}.H\), which provides spatial resolution level \(0\) for odd frames (temporal resolution level \(0\) at full spatial resolution, considering the previously decoded temporal resolution levels).

9.5 Simplification of the notation

For the sake of simplifying the notation, we will represent \(\{F_t\}.L^s\) by \(L_t^s\), and \(\{F_t\}.H^s\) by \(H_t^s\). Thus, for example, the subbands shown in the Fig. 8 can be represented as: (1) \(L\) (that is a version of the sequence with spatial resolution \(X/2\times Y/2\)), (2) \(H_{2^2i}\) (made up of intra-coded high-frequency spatial information of the frames with numbers \(2^2i\)), (3) \(\tilde {H}_{2^2i+2}\) (with the high-frequency spatio-temporal prediction error resulting from decorrelating even frames with distance \(2^2\)), and (4) \(\tilde {H}_{2i+1}\) (high-frequency spatio-temporal prediction error resulting from decorrelating frames with distance \(2\)).

9.6 Example: \(\mathtt {MRVC}(T=4, N=17)\)

An example with \(G=2^4=16\), i.e. \(5\) TRLs and \(2\) SRLs. The subbands generated are:

  1. \(L\), which by itself allows reconstructing all the frames at half resolution.
  2. \(H_{16i+8}\), which, with \(L\), provides full resolution for frames with number \(16i+8\) (the rest of the frames have half resolution).
  3. \(H_{8i+4}\), which with the previous subbands provides full resolution also for frames with number \(8i+4\).
  4. \(H_{4i+2}\), which also provides full resolution for frames \(4i+2\).
  5. \(H_{2i+1}\), which completes the reconstruction of the original sequence.

The temporal decorrelation pattern followed by \(\mathtt {MCOLP}(T=4, N=17)\) is shown in Fig. 9 (notice that hexadecimal notation has been used for indexing the frames).

GOP0                     GOP1
---- ------------------------------------------------
  F0                      F8                      F10 -> H_{8i }  (= TRL3)
              F4                      Fc              -> H_{8i+4} (+ TRL3 = TRL2)
        F2          F6          Fa          Fe        -> H_{4i+2} (+ TRL2 = TRL1)
     F1    F3    F5    F7    F9    Fb    Fd    Ff     -> H_{2i+1} (+ TRL1 = TRL0)

Figure 9: Motion compensation pattern followed by \(\mathtt {MCOLP}(T=4, N=17)\). The temporal resolution levels (TRLs) consider only the frames that have been reconstructed at the original spatial resolution.
# You must be in the 'src' directory.
rm /tmp/*.png
video_file=/tmp/akiyo_cif.y4m
if test -f $video_file; then
  echo akiyo exists
else
  wget https://media.xiph.org/video/derf/y4m/akiyo_cif.y4m -O /tmp/akiyo_cif.y4m
fi
mplayer /tmp/akiyo_cif.y4m
ffmpeg -i /tmp/akiyo_cif.y4m -vframes 17 -start_number 0 /tmp/%03d.png

# 1-iteration MDWT.
python3 -O MDWT.py -p /tmp/ -N 17

# Save a copy of the MDWT subbands (for comparing later).
rm -rf /tmp/tmp
mkdir /tmp/tmp
cp /tmp/LL*.png /tmp/tmp
cp /tmp/LH*.png /tmp/tmp
cp /tmp/HL*.png /tmp/tmp
cp /tmp/HH*.png /tmp/tmp

# 4-iterations MCOLP.
python3 -O MCOLP.py -p /tmp/ -T 4 -N 17

ls -l /tmp/LH*.png > /tmp/1
ls -l /tmp/tmp/LH*.png > /tmp/2
paste /tmp/1 /tmp/2

ls -l /tmp/HL*.png > /tmp/1
ls -l /tmp/tmp/HL*.png > /tmp/2
paste /tmp/1 /tmp/2

ls -l /tmp/HH*.png > /tmp/1
ls -l /tmp/tmp/HH*.png > /tmp/2
paste /tmp/1 /tmp/2

# Now we recover the original video.

# 4-iterations iMCOLP.
python3 -O MCOLP.py -b -p /tmp/ -T 4 -N 17

# 1-iteration iMDWT
python3 -O MDWT.py -b -p /tmp/ -N 17

# Visualization of the reconstruction.
for i in /tmp/???.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done
animate /tmp/???.png.png

# Visualization of the residue. Note that the original akiyo frames in
# /tmp have been overwritten by the reconstruction, so they are
# re-extracted first to serve as the reference.
mkdir -p /tmp/orig
ffmpeg -i /tmp/akiyo_cif.y4m -vframes 17 -start_number 0 /tmp/orig/%03d.png
for i in {0..16}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 /tmp/orig/$ii.png -o /tmp/diff_$ii.png; done
animate /tmp/diff*.png

python3 ../tools/show_statistics.py -i /tmp/diff_002.png

9.7 Scalability provided by MCOLP

MCOLP preserves the dyadic spatial decomposition generated by MDWT and, therefore, MCOLP provides the same spatial scalability as MDWT.

Unfortunately, as with the rest of the video encoders based on MC, MCOLP reduces the temporal scalability provided by MDWT, allowing (in the case of MCOLP) only a dyadic access to the frames.
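For example, the following sketch enumerates which frames of a GOP are reachable at each TRL for \(T=4\) (compare with Fig. 9):

# Dyadic temporal access: at TRL t, only the frames whose index is a
# multiple of 2**t are reconstructed at full spatial resolution.
G = 2**4  # GOP size for T=4 (Eq. 1)
for t in range(4 + 1):
    frames = [i for i in range(G + 1) if i % 2**t == 0]
    print(f"TRL {t}: frames {frames}")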

9.8 Example: 3 iterations of \(\mathtt {MRVC}(T=2, N=5)\)

predictor=1
iterations=2

# You must be in the 'src' directory.
rm /tmp/*.png
cp ../sequences/stockholm/* /tmp

python3 -O MDWT.py  -p /tmp/
python3 -O MCOLP.py -p /tmp/     -P $predictor -T $iterations
python3 -O MDWT.py  -p /tmp/LL
python3 -O MCOLP.py -p /tmp/LL   -P $predictor -T $iterations
python3 -O MDWT.py  -p /tmp/LLLL
python3 -O MCOLP.py -p /tmp/LLLL -P $predictor -T $iterations

rm /tmp/00?.png

python3 -O MCOLP.py -p /tmp/LLLL -P $predictor -b -T $iterations
python3 -O MDWT.py  -p /tmp/LLLL               -b
python3 -O MCOLP.py -p /tmp/LL   -P $predictor -b -T $iterations
python3 -O MDWT.py  -p /tmp/LL                 -b
python3 -O MCOLP.py -p /tmp/     -P $predictor -b -T $iterations
python3 -O MDWT.py  -p /tmp/                   -b

for i in /tmp/00?.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done; animate /tmp/00?.png.png

for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done
animate /tmp/diff*.png

In this case, \(G=4\), generating \(3\) TRLs (see Fig. 10) and \(4\) SRLs (see Fig. 11).


TRLs:

GOP0     GOP1
---- ------------
  F0           F4  -> H_{4i} (= TRL2)
         F2        -> H_{4i+2} (+ TRL2 = TRL1)
      F1    F3     -> H_{2i+1} (+ TRL1 = TRL0)

Figure 10: Motion compensation pattern followed by \(\mathtt {MCOLP}(T=2, N=5)\). The TRLs consider only the frames that have been reconstructed at the original spatial resolution.

+---+---+-------+---------------+
| 3 | 2 |   1   |       0       |
+---+   |       |               |
|       |       |               |
+-------+       |               |
|               |               |
|               |               |
|               |               |
+---------------+               |
|                               |
|                               |
|                               |
|                               |
|                               |
|                               |
|                               |
|                               |
|                               |
|                               |
+-------------------------------+

Figure 11: SRLs generated by \(3\) iterations of \(\mathtt {MRVC}\). \(0\) corresponds to the original resolution of the frames.

10 Adaptive motion compensation based on the energy of the estimated prediction error

As can be seen in the MCOLP butterfly (see Fig. 5), both predictions \([b_a.H]\) and \([b_c.H]\) have the same weight in the prediction \begin {equation} [\hat {b.H}] = \frac {[b_a.H] + [b_c.H]}{2}. \end {equation} This simple computation, which can be very effective for example in video dissolve transitions, can also be counterproductive when objects appear only in one of the reference frames. For this reason, in this section an improved predictor is proposed, based on the estimated prediction error (see \([\tilde {b}.H]\) in Fig. 5) generated during the ME/MC. To estimate this error, we will suppose that the prediction error generated for the high frequencies of a frame is correlated with the prediction error generated for the low frequencies of that frame.

Let \([b_a.L]\) be the prediction generated for the subband \([b.L]\) using \([a.L]\) as reference and \([a.L]\rightarrow [b.L]\) as motion, and let \([b_c.L]\) be the prediction generated for \([b.L]\) using \([c.L]\) as reference and \([c.L]\rightarrow [b.L]\) as motion. We now compute the prediction errors \begin {equation} \begin {array}{l} {[e_a.L]} = [b.L] - [b_a.L]\\ {[e_c.L]} = [b.L] - [b_c.L]. \end {array} \end {equation}

We define similarity matrices as \begin {equation} \begin {array}{l} {[s_a.L]} = \frac {1}{1+{|[e_a.L]|}}\\ {[s_c.L]} = \frac {1}{1+{|[e_c.L]|}}. \end {array} \label {eq:weighted_prediction} \end {equation} Notice that if, for example, the error \([e_a.L]_{x,y}=0\), the similarity is \([s_a.L]_{x,y}=1\) (the maximum similarity). On the other hand, if the error is high, the similarity tends to be low (but never \(0\)).

With this information, which can be recovered by the decoder, the improved prediction is defined as \begin {equation} [\hat {b.H}] = \frac {[b_a.H][s_a.L]+[s_c.L][b_c.H]}{[s_a.L]+[s_c.L]}. \end {equation} Fig. 12 shows an implementation of this equation.

Notice that if, for example, \([s_a.L]_{x,y}=1\) and \([s_c.L]_{x,y}\approx 0\), then \([\hat {b.H}]_{x,y} \approx [b_a.H]_{x,y}\), and vice versa. If \([s_a.L]_{x,y}=[s_c.L]_{x,y}\), then \([\hat {b.H}]_{x,y}=\frac {[b_a.H]_{x,y}+[b_c.H]_{x,y}}{2}\) (even if both similarities are small).
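A tiny numeric check of these limiting cases (NumPy; the values are arbitrary and chosen only for illustration):

import numpy as np

b_aH, b_cH = np.array([10.0]), np.array([-10.0])
s_aL, s_cL = np.array([1.0]), np.array([0.01])  # a is much more similar

prediction = (b_aH*s_aL + b_cH*s_cL)/(s_aL + s_cL)
print(prediction)  # [9.80...]: the prediction follows [b_a.H]

s_cL = np.array([1.0])  # equal similarities
prediction = (b_aH*s_aL + b_cH*s_cL)/(s_aL + s_cL)
print(prediction)  # [0.]: the plain average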

def generate_prediction(AL, BL, CL, AH, CH): 
    flow_AL_BL = motion_estimation(AL, BL) 
    flow_CL_BL = motion_estimation(CL, BL) 
    BAH = estimate_frame(AH, flow_AL_BL)  # [b_a.H] 
    BCH = estimate_frame(CH, flow_CL_BL)  # [b_c.H] 
    BAL = estimate_frame(AL, flow_AL_BL)  # [b_a.L] 
    BCL = estimate_frame(CL, flow_CL_BL)  # [b_c.L] 
    EAL = BL - BAL  # [e_a.L] 
    ECL = BL - BCL  # [e_c.L] 
    SAL = 1/(1+abs(EAL))  # [s_a.L] 
    SCL = 1/(1+abs(ECL))  # [s_c.L] 
    return (BAH*SAL+BCH*SCL)/(SAL+SCL)  # the weighted average prediction
Figure 12: Computation of the weighted average prediction.

11 Lossy compression: quantization

Quantization removes information from signals. We will use it in MCOLP to achieve higher compression ratios, but at the expense of losing visual information (ideally, the information less relevant to the Human Visual System (HVS) should be removed first).

When the signal cannot be recovered perfectly, we can use a distortion metric to measure how much error has been generated.
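As an illustration, here is a minimal sketch (assuming NumPy) of uniform scalar quantization with step \(q\), together with the MSE distortion that it introduces (the actual ../tools/quantize.py tool may differ in details such as rounding or offsets):

import numpy as np

def quantize_dequantize(x, q_step):
    # Uniform scalar quantization followed by dequantization.
    return np.round(x/q_step)*q_step

rng = np.random.default_rng(0)
subband = rng.normal(scale=16, size=(64, 64))  # a dummy H subband

for q_step in (1, 8, 32, 128):
    y = quantize_dequantize(subband, q_step)
    mse = np.mean((subband - y)**2)
    print(f"q_step={q_step:4d}  MSE={mse:10.4f}")

Larger quantization steps remove more information, yielding a higher MSE but also a higher compression ratio after entropy coding.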

11.1 Example: Effects caused by the total attenuation of the motion compensated (MCed) \(H\) subbands

predictor=2

# You must be in the 'src' directory.
rm /tmp/*.png
cp ../sequences/stockholm/*.png /tmp


# 1-iteration MDWT.
python3 -O MDWT.py -p /tmp/

# 1D 1-iteration MCOLP.
python3 -O MCOLP.py -P $predictor -p /tmp/

# Let's delete the motion compensated subbands and reconstruct the sequence ...
rm -f /tmp/LH001.png
rm -f /tmp/HL001.png
rm -f /tmp/HH001.png
rm -f /tmp/LH002.png
rm -f /tmp/HL002.png
rm -f /tmp/HH002.png
rm -f /tmp/LH003.png
rm -f /tmp/HL003.png
rm -f /tmp/HH003.png

# 1D 1-iteration iMCOLP.
python3 -O MCOLP.py -P $predictor -b -p /tmp/

# 1-iteration iMDWT.
python3 -O MDWT.py -b -p /tmp/

# Visualization of the reconstruction.
for i in /tmp/00?.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done; animate /tmp/00?.png.png

# Visualization of the differences with the original sequence.
rm -f /tmp/diff*.png; for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done; animate /tmp/diff*.png

# MSE per frame
for i in {0..4}; do ii=$(printf "%03d" $i); MSE=$(python3 -O ../tools/MSE.py -x ../sequences/stockholm/$ii.png -y /tmp/$ii.png); printf "MSE(%s)=%s\n" $ii $MSE; done

# PSNR per frame
for i in {0..4}; do ii=$(printf "%03d" $i); PSNR=$(python3 -O ../tools/PSNR.py -x ../sequences/stockholm/$ii.png -y /tmp/$ii.png); printf "PSNR(%s)=%s\n" $ii $PSNR; done

11.2 Example: Effects caused by the quantization of MCed \(H\) subbands

q_step=32
predictor=2

# You must be in the 'src' directory.
rm /tmp/*.png
cp ../sequences/stockholm/* /tmp

python3 -O MDWT.py -p /tmp/
python3 -O MCOLP.py -P $predictor -p /tmp/

#python3 ../tools/show_statistics.py -i /tmp/LH001.png
python3 ../tools/quantize.py -i /tmp/LH001.png -o /tmp/LH001.png -q $q_step
#python3 ../tools/show_statistics.py -i /tmp/LH001.png
python3 ../tools/quantize.py -i /tmp/HL001.png -o /tmp/HL001.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH001.png -o /tmp/HH001.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LH002.png -o /tmp/LH002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL002.png -o /tmp/HL002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH002.png -o /tmp/HH002.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LH003.png -o /tmp/LH003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL003.png -o /tmp/HL003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH003.png -o /tmp/HH003.png -q $q_step

python3 -O MCOLP.py -P $predictor -p /tmp/ -b
python3 -O MDWT.py -p /tmp/ -b

for i in /tmp/00?.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done; animate /tmp/00?.png.png

rm -f /tmp/diff*.png; for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done; animate /tmp/diff*.png

for i in {0..4}; do ii=$(printf "%03d" $i); MSE=$(python3 -O ../tools/MSE.py -x ../sequences/stockholm/$ii.png -y /tmp/$ii.png); printf "MSE(%s)=%s\n" $ii $MSE; done

11.3 Example: Quantization of all \(H\) subbands (MCed and intra)

q_step=32
predictor=2

# You must be in the 'src' directory.
rm /tmp/*.png
cp ../sequences/stockholm/* /tmp

python3 -O MDWT.py -p /tmp/
python3 -O MCOLP.py -P $predictor -p /tmp/

python3 ../tools/quantize.py -i /tmp/LH000.png -o /tmp/LH000.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL000.png -o /tmp/HL000.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH000.png -o /tmp/HH000.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LH001.png -o /tmp/LH001.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL001.png -o /tmp/HL001.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH001.png -o /tmp/HH001.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LH002.png -o /tmp/LH002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL002.png -o /tmp/HL002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH002.png -o /tmp/HH002.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LH003.png -o /tmp/LH003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL003.png -o /tmp/HL003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH003.png -o /tmp/HH003.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LH004.png -o /tmp/LH004.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL004.png -o /tmp/HL004.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH004.png -o /tmp/HH004.png -q $q_step

python3 -O MCOLP.py -P $predictor -p /tmp/ -b
python3 -O MDWT.py -p /tmp/ -b

for i in /tmp/00?.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done; animate /tmp/00?.png.png

rm -f /tmp/diff*.png; for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done; animate /tmp/diff*.png

for i in {0..4}; do ii=$(printf "%03d" $i); MSE=$(python3 -O ../tools/MSE.py -x ../sequences/stockholm/$ii.png -y /tmp/$ii.png); printf "MSE(%s)=%s\n" $ii $MSE; done

11.4 Example: Quantization of all subbands

q_step=128
predictor=2

# You must be in the 'src' directory.
rm /tmp/*.png
cp ../sequences/stockholm/* /tmp

python3 -O MDWT.py -p /tmp/
python3 -O MCOLP.py -P $predictor -p /tmp/

python3 ../tools/quantize.py -i /tmp/LL000.png -o /tmp/LL000.png -q $q_step
python3 ../tools/quantize.py -i /tmp/LH000.png -o /tmp/LH000.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL000.png -o /tmp/HL000.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH000.png -o /tmp/HH000.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LL001.png -o /tmp/LL001.png -q $q_step
python3 ../tools/quantize.py -i /tmp/LH001.png -o /tmp/LH001.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL001.png -o /tmp/HL001.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH001.png -o /tmp/HH001.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LL002.png -o /tmp/LL002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/LH002.png -o /tmp/LH002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL002.png -o /tmp/HL002.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH002.png -o /tmp/HH002.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LL003.png -o /tmp/LL003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/LH003.png -o /tmp/LH003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL003.png -o /tmp/HL003.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH003.png -o /tmp/HH003.png -q $q_step

python3 ../tools/quantize.py -i /tmp/LL004.png -o /tmp/LL004.png -q $q_step
python3 ../tools/quantize.py -i /tmp/LH004.png -o /tmp/LH004.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HL004.png -o /tmp/HL004.png -q $q_step
python3 ../tools/quantize.py -i /tmp/HH004.png -o /tmp/HH004.png -q $q_step

python3 -O MCOLP.py -P $predictor -p /tmp/ -b
python3 -O MDWT.py -p /tmp/ -b

for i in /tmp/00?.png; do python3 ../tools/substract_offset.py -i $i -o $i.png; done; animate /tmp/00?.png.png

rm -f /tmp/diff*.png; for i in {0..4}; do ii=$(printf "%03d" $i); bash ../tools/show_differences.sh -1 /tmp/$ii.png -2 ../sequences/stockholm/$ii.png -o /tmp/diff_$ii.png; done; animate /tmp/diff*.png

for i in {0..4}; do ii=$(printf "%03d" $i); MSE=$(python3 -O ../tools/MSE.py -x ../sequences/stockholm/$ii.png -y /tmp/$ii.png); printf "MSE(%s)=%s\n" $ii $MSE; done

1Also called frames, which from a pure mathematical point of view are matrices of pixels.

2In this document, we will use the term DWT to refer to the 2D-DWT.

3The structure formed by the subbands is also called a decomposition.

4This operation is also called “decimation” and “subsampling”.

5It affects the independence of the coefficients in terms of their energy contribution to the reconstruction of the signal.

6Notice that PyWavelets uses the term “levels” to refer to the number of iterations of the 1-level DWT.

7In video coding, images are commonly referred to as frames.