Learning-Based Multi-Frame Video Quality Enhancement
This paper was presented by Junchao Tong, Xilin Wu, Dandan Ding, Zheng Zhu, and Zoe Liu, “Learning-Based Multi-Frame Video Quality Enhancement,” in the Proceedings of the IEEE International Conference on Image Processing (ICIP), September 22-25, 2019 in Taipei, Taiwan.
The convolution neural network (CNN) has shown its great success in video quality enhancement. Existing methods mainly conduct enhancement tasks in the spatial domain, exploring the pixel correlations within one frame. Taking advantage of the similarity across successive frames, this paper develops a learning-based multi-frame approach, with an aim to explore the greatest potential for video quality enhancement leveraging the temporal correlation. First, we apply a learning-based optical flow to compensate for the temporal motion across neighboring frames. Afterward, a deep CNN network, which is structured in an early-fusion manner, is designed to discover the joint spatial-temporal correlations within a video. To ensure the generality of our CNN model, we further propose a robust training strategy. One high- quality frame and one moderate-quality frame are paired to enhance the remaining low-quality frames in between, which considers a trade-off between frame distances and various frame quality. Experimental results demonstrate that our method outperforms state-of-the-art work in objective quality. The code and model of our approach are published in Github (https://github.com/IVC-Projects/LMVE).
In this paper, we present a novel approach, namely LMVE, to jointly leverage the spatial-temporal correlations among frames for better enhancement on compressed video. LMVE categorizes different frames within one video to three quality levels, and utilizes those high-quality and moderate-quality frames to enhance the low-quality ones in between. FlowNet is first adopted to obtain the optical flow between adjacent frames in order to generate compensated frames. Afterwards, the compensated frames are fed into an early-fusion CNN network, in conjunction with the original low-quality frames. A training strategy is finally applied to obtain robust CNN models. Experimental results demonstrate that LMVE obtains a consistent superior result and outperforms prior work by 0.23 dB in PSNR on average.