Research

Multihypothesis Motion-Compensated Prediction
with Forward-Adaptive Hypothesis Switching

Multihypothesis motion-compensating predictors combine several motion-compensated signals to predict the current frame signal; more than one motion-compensated signal, or hypothesis, is selected for transmission. Multiframe motion-compensated prediction is a further concept for efficient video compression and an example of forward-adaptive hypothesis switching: one motion-compensated signal is selected from multiple reference frames for transmission.

This work extends the theory of multihypothesis motion-compensated prediction to forward-adaptive hypothesis switching. Assume that we combine N hypotheses. Each hypothesis used for the combination is selected from a set of M motion-compensated signals. We study the influence of the hypothesis set size M on both the accuracy of motion compensation with forward-adaptive hypothesis switching and the efficiency of multihypothesis motion-compensated prediction. In both cases, we examine the noise-free limiting case, that is, we neglect signal components that are not predictable by motion compensation. Selecting one hypothesis from a set of M motion-compensated signals, that is, switching among M hypotheses, reduces the displacement error variance by a factor of M when we assume statistically independent displacement errors. Integrating forward-adaptive hypothesis switching into multihypothesis motion-compensated prediction, that is, allowing a combination of switched hypotheses, increases the gain of multihypothesis motion-compensated prediction over the single-hypothesis case as the hypothesis set size M grows. (Article)

1. Forward-Adaptive Hypothesis Switching

Multiframe motion-compensated prediction improves the efficiency of video compression. The technique extends motion-compensated prediction such that several previously decoded frames can be used as references. This is achieved by permitting a variable reference picture selection for each block, where each reference picture is a previously decoded frame. The encoder has to select one reference frame per motion vector for transmission. In the following, choosing one signal from a set of motion-compensated signals is called forward-adaptive hypothesis switching (left figure). Each hypothesis used by multihypothesis motion-compensated prediction (right figure) is generated by forward-adaptive hypothesis switching.
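The selection step described above can be sketched as follows. This is a minimal illustration, not the encoder used for the results: the helper names `sum_squared_error` and `switch_hypothesis`, the flattened block representation, and the use of the sum of squared errors as the selection criterion are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Block = Sequence[float]  # a block flattened to a 1-D sequence of samples


def sum_squared_error(a: Block, b: Block) -> float:
    """Sum of squared sample differences between two equally sized blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def switch_hypothesis(current: Block, candidates: List[Block]) -> Tuple[int, Block]:
    """Pick the one candidate (out of M) that best matches the current block.

    The returned index stands for the side information that the encoder
    would transmit, which is what makes the switching forward-adaptive.
    """
    best = min(range(len(candidates)),
               key=lambda m: sum_squared_error(current, candidates[m]))
    return best, candidates[best]


# Hypothetical data: one current block and M = 3 motion-compensated candidates,
# each taken from a different reference frame.
current = [10.0, 12.0, 14.0, 16.0]
candidates = [
    [9.0, 9.0, 9.0, 9.0],
    [10.0, 12.0, 15.0, 16.0],
    [20.0, 20.0, 20.0, 20.0],
]
index, hypothesis = switch_hypothesis(current, candidates)
print(index, hypothesis)
```

A practical encoder would also account for the rate needed to signal the chosen reference frame and motion vector; that term is omitted here for brevity.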

2. Minimizing the Radial Displacement Error

How does hypothesis switching improve the accuracy of motion-compensated prediction? Let us assume that the components of the displacement error for each hypothesis are i.i.d. Gaussian. The Euclidean distance to the zero displacement error vector defines the radial displacement error for each hypothesis. The hypothesis with minimum radial displacement error is used to predict the signal.

We exploit the fact that the radial displacement error is Rayleigh distributed. We assume that the variances of the radial displacement errors are identical for all M hypotheses. Switching among M independent Rayleigh-distributed radial displacement errors again results in a Rayleigh-distributed radial displacement error, with a variance reduced by a factor of M.
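The reduction by a factor of M can be checked with a short calculation. The notation is introduced here for illustration only: r_m is the radial displacement error of hypothesis m, and sigma^2 is the variance of each Gaussian displacement error component. For independent hypotheses, the probability that the minimum exceeds x is the product of the individual tail probabilities:

```latex
\begin{align}
\Pr\{r_m > x\} &= \exp\!\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \ge 0,\\
\Pr\Bigl\{\min_{1 \le m \le M} r_m > x\Bigr\}
  &= \prod_{m=1}^{M} \Pr\{r_m > x\}
   = \exp\!\left(-\frac{x^2}{2\,\sigma^2 / M}\right).
\end{align}
```

The last expression is again the tail of a Rayleigh distribution, now with parameter sigma^2 / M in place of sigma^2. Since the variance of a Rayleigh variable is proportional to this parameter, it shrinks by the same factor M.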

3. Equivalent Predictor

The previous result suggests defining an equivalent motion-compensating predictor for switched prediction. This predictor uses just one hypothesis, but the variance of its displacement error is smaller by a factor of M.

The equivalent predictor with reduced displacement error variance represents the more accurate motion compensation achieved by switched prediction. Consequently, forward-adaptive hypothesis switching lowers the energy of the motion-compensated prediction error.
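The equivalence can be illustrated with a small Monte Carlo experiment. The sketch below compares the mean squared radial displacement error of switching among M hypotheses with that of a single hypothesis whose component variance is reduced by M. The function names, the seed, and the number of trials are arbitrary choices for the example, and the mean squared radial error is used as a convenient proxy for the displacement error variance.

```python
import math
import random


def radial_error(sigma, rng):
    """Radial displacement error for i.i.d. zero-mean Gaussian components."""
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def mean_square(values):
    """Sample mean of the squared values."""
    return sum(v * v for v in values) / len(values)


def simulate(M, sigma=1.0, trials=20000, seed=0):
    """Return the mean squared radial error of (switched, equivalent) predictor."""
    rng = random.Random(seed)
    # Switched prediction: keep the smallest of M independent radial errors.
    switched = [min(radial_error(sigma, rng) for _ in range(M))
                for _ in range(trials)]
    # Equivalent predictor: one hypothesis with component variance sigma^2 / M.
    equivalent = [radial_error(sigma / math.sqrt(M), rng)
                  for _ in range(trials)]
    return mean_square(switched), mean_square(equivalent)


switched_ms, equivalent_ms = simulate(M=4)
print(switched_ms, equivalent_ms)  # both should be near 2 * sigma**2 / M
```

For sigma = 1 and M = 4, both estimates should lie close to 2 * sigma^2 / M = 0.5, up to sampling noise.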

4. Results

The left figure depicts average bit-rates obtained with H.263 for the sequence Mobile & Calendar at 34 dB PSNR over the number of reference frames M used by multiframe motion-compensated prediction. We observe that the linear combination of hypotheses (MHP) is more efficient when the hypotheses are obtained by multiframe motion compensation.

The right figure depicts the theoretical rate difference over the number of reference hypotheses M for multihypothesis motion-compensated prediction with forward-adaptive hypothesis switching. The switched hypotheses are just averaged and no residual noise is assumed for simplicity. We assume half-pel accurate motion compensation. We observe that doubling the number of reference hypotheses decreases the bit-rate for motion-compensated prediction by 0.5 bit/sample and for multihypothesis motion-compensated prediction by 1 bit/sample. The gain going from N=1 to N=2 is the largest, independent of the number of reference hypotheses M. In addition, this gain increases for a larger number of available motion-compensated signals M.
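One way to read these slopes is the following back-of-the-envelope sketch, which is not the derivation used in the article. Assume the high-rate relation between rate difference and prediction error variance, and assume that in the noise-free limit the prediction error variance sigma_e^2 scales with the k-th power of the displacement error variance sigma_Delta^2 / M of the switched hypotheses. The exponent k is an assumption, chosen to be consistent with the stated slopes: k = 1 for motion-compensated prediction and k = 2 for the two-hypothesis case.

```latex
\[
\sigma_e^2(M) \propto \left(\frac{\sigma_\Delta^2}{M}\right)^{k}
\quad\Longrightarrow\quad
\Delta R \approx \frac{1}{2}\log_2\frac{\sigma_e^2(2M)}{\sigma_e^2(M)}
         = \frac{1}{2}\log_2\left(\frac{1}{2}\right)^{k}
         = -\frac{k}{2}\ \text{bit/sample}.
\]
```

Under these assumptions, doubling M gives -0.5 bit/sample for k = 1 and -1 bit/sample for k = 2, matching the two slopes quoted above.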



Multihypothesis Motion Estimation

Multihypothesis motion-compensating predictors combine several motion-compensated signals to predict the current frame of a video signal. We apply the wide-sense stationary theory of multihypothesis motion compensation for hybrid video codecs to multihypothesis motion estimation. The power spectrum of the prediction error is related to the multi-dimensional displacement error probability density function of N hypotheses. We study the influence of the displacement error correlation on the efficiency of multihypothesis motion compensation. Reducing the displacement error correlation between the hypotheses decreases the variance of the multihypothesis prediction error. (Article)

1. Optimal Multihypothesis Motion Estimation

We focus on the relationship between the prediction error variance and the displacement error correlation coefficient.

The figure depicts the dependency of the normalized prediction error variance on the displacement error correlation coefficient within its valid range. The dependency is plotted for N = 2, 4, 8, ... and integer-pel accurate motion compensation. The correlation coefficient of the frame signal is 0.93. The reference is the prediction error variance of the single-hypothesis predictor. We observe that a decreasing correlation coefficient lowers the prediction error variance.

Jointly estimated hypotheses have the property that their displacement errors are maximally negatively correlated. Hypotheses with negatively correlated displacement errors improve the performance of multihypothesis motion compensation.
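The notions of a valid range and of maximally negative correlation can be made concrete with a simplified model. Assume, for illustration only, that all N displacement errors have the same variance and that every pair shares the same correlation coefficient rho. The covariance matrix is then positive semidefinite only for -1/(N-1) <= rho <= 1, and the variance of the averaged displacement error vanishes at the lower bound. The function names below are hypothetical, and the averaged displacement error is only a proxy, not the prediction error variance plotted in the figure.

```python
def min_correlation(n: int) -> float:
    """Smallest valid pairwise correlation for n equally correlated errors."""
    if n < 2:
        raise ValueError("need at least two hypotheses")
    return -1.0 / (n - 1)


def mean_error_variance(n: int, rho: float, sigma2: float = 1.0) -> float:
    """Variance of the average of n errors with variance sigma2 and
    common pairwise correlation coefficient rho."""
    if not (min_correlation(n) - 1e-12 <= rho <= 1.0):
        raise ValueError("rho outside the valid range")
    return sigma2 * (1.0 + (n - 1) * rho) / n


for n in (2, 4, 8):
    rho_min = min_correlation(n)
    print(n, rho_min,
          mean_error_variance(n, 0.0),       # uncorrelated errors
          mean_error_variance(n, rho_min))   # maximally negative correlation
```

In this simplified model, uncorrelated errors reduce the variance of the average by a factor of N, whereas maximally negatively correlated errors cancel completely to first order.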

For the wide-sense stationary model, we observe that jointly optimal motion estimation improves the prediction performance and reduces the prediction error variance by up to 12 dB per accuracy refinement step (right figure), compared to 6 dB per accuracy refinement step for uncorrelated displacement errors (left figure). Consequently, the gain of multihypothesis motion-compensated prediction with jointly optimal motion estimation over motion-compensated prediction increases as the accuracy of each hypothesis improves.

2. Hypotheses with Additive Noise

In order to consider signal components that cannot be modeled by motion compensation, statistically independent noise is added to each motion-compensated signal. In addition, an optimum Wiener filter is applied to all hypotheses.

The right figure depicts the prediction error variance for multihypothesis MCP over the displacement inaccuracy for both optimized displacement error correlation and statistically independent displacement errors. The residual noise level is chosen to be -30 dB. For half-pel accurate motion compensation and 2 hypotheses, we gain almost 4 dB in prediction error variance for optimized displacement error correlation over statistically independent displacement errors.



Multihypothesis Motion-Compensated Prediction for Video Coding

Multihypothesis motion-compensated prediction extends traditional motion-compensated prediction used in video coding schemes. Known algorithms for block-based multihypothesis motion-compensated prediction are, for example, overlapped block motion compensation (OBMC) and bidirectionally predicted frames (B-frames).

1. Multihypothesis Motion-Compensated Prediction

Block-based multihypothesis motion-compensated prediction is based on the idea that a combination of several prediction hypotheses is better suited to model complex temporal correlation. In practice, this is accomplished by selecting and combining several blocks from many previously decoded frames. The figures show results for the sequence Foreman. (Article)

2. Video Coding with H.263 and Multihypothesis Prediction

The coding mode for block-based multihypothesis MCP is derived from the INTER block coding mode of ITU-T Recommendation H.263. In contrast to the INTER mode, an additional block-based predictor is incorporated into the coding scheme: predictors P1 and P2 provide two blocks, that is, two hypotheses, which are linearly combined with scalar weights h1 and h2 (see left figure).
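The linear combination of the two hypotheses can be sketched as follows. The function name `combine_hypotheses`, the flattened block representation, and the default weights h1 = h2 = 0.5 (plain averaging) are assumptions made for the example; the text above only states that scalar weights h1 and h2 are used.

```python
def combine_hypotheses(block1, block2, h1=0.5, h2=0.5):
    """Linearly combine two equally sized hypothesis blocks, sample by sample."""
    if len(block1) != len(block2):
        raise ValueError("hypothesis blocks must have the same size")
    return [h1 * a + h2 * b for a, b in zip(block1, block2)]


# Hypothetical sample values delivered by predictors P1 and P2.
p1_block = [100.0, 104.0, 108.0]
p2_block = [104.0, 104.0, 100.0]
prediction = combine_hypotheses(p1_block, p2_block)
print(prediction)
```

The prediction error that is subsequently coded is the difference between the current block and this combined prediction.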

Since a block can be encoded with either a 1-hypothesis (INTER) or a 2-hypothesis (INTER2H) type, it is necessary to decide which strategy is best for each individual block. Aiming at a rate-distortion efficient video codec, we apply a rate-distortion optimal decision.
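A common way to realize such a decision is to minimize a Lagrangian cost J = D + lambda * R over the available types. The sketch below assumes this formulation; the function name `choose_mode`, the multiplier values, and the distortion and rate figures are hypothetical and only illustrate the mechanism.

```python
def choose_mode(candidates, lagrange_multiplier):
    """Return the mode with the smallest cost J = D + lambda * R.

    candidates maps a mode name to a (distortion, rate_in_bits) pair.
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return distortion + lagrange_multiplier * rate

    return min(candidates, key=cost)


# Hypothetical measurements for one block: the second hypothesis lowers the
# distortion but costs extra bits for the additional motion vector.
block_candidates = {
    "INTER": (400.0, 30),
    "INTER2H": (250.0, 48),
}
print(choose_mode(block_candidates, lagrange_multiplier=5.0))
print(choose_mode(block_candidates, lagrange_multiplier=10.0))
```

With the smaller multiplier the 2-hypothesis type wins (cost 490 versus 550), while with the larger multiplier, which corresponds to a lower target rate, the additional side information outweighs the distortion reduction and INTER is selected (cost 700 versus 730).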

In a first experiment, we evaluate the coding efficiency of a 2-hypothesis prediction type (INTER2H). To accomplish this, we compare two coding schemes. The first scheme allows the types INTRA and INTER for each macroblock; the second scheme allows INTRA, INTER, and INTER2H. According to the right figure, the additional INTER2H type yields coding gains of up to 1 dB for the sequence Foreman, which corresponds to rate reductions of up to 17%. Surprisingly, the INTER4V type (four motion vectors per macroblock) performs almost equally efficiently within the observed rate interval.

Experiments show that multihypothesis MCP is not merely a competitor to MCP with variable block size: it can also be applied to blocks of variable size, and doing so improves the overall coding gain. The new architecture combines the multihypothesis idea with the method of variable-size blocks and yields coding gains of up to 0.9 dB for the sequence Foreman, as depicted in the left figure. The right figure shows coding gains of up to 1.5 dB for the sequence Mobile & Calendar.

The given architecture for 2-hypothesis prediction has two properties. First, the coding gain decreases with decreasing rate. Second, dynamic sequences are coded more efficiently than sequences containing little motion. (Article)



Copyright Markus Flierl, July 15, 2004