Dandan Ding

Professor Dandan Ding is a faculty member of the Digital Media & Interaction Research Center (DMI) in the Department of Information Science and Engineering at Hangzhou Normal University. Her research interests include artificial intelligence-based image and video processing, video coding algorithm design and optimization, embedded system architecture, and SoC design.

Dandan is an active participant in video coding standardization. In 2007, she was heavily involved in the standardization of MPEG Reconfigurable Video Coding (RVC), and in 2011 she received the MPEG appreciation prize for her leadership in MPEG activities. Dandan has published 50 papers and filed 10 patent applications.

In addition to her technical advisory work with Visionular, Dandan is active in developing new video coding algorithms for next-generation video coding standards such as AV1 and AV2. On the strength of this work, she was invited to the Alliance for Open Media symposium in 2019 to present her research on the AV1 codec. In 2018 and 2019, Dandan received research grants from Google’s Chrome University Relationship Program (CURP).

Select Projects:

  • “Research on Video Coding Algorithm and Global Optimization Using Deep Neural Network”, Zhejiang Provincial Natural Science Foundation of China, LY20F010013, 2020-2022. Role: Principal Investigator.
  • “Simulation of virtual character expression and visual presentation”, a subproject of the National Key R&D Plan, 2017-2021. Role: Principal Investigator.
  • “Low model complexity in-loop restoration with CNNs for AV2 codec”, Google’s Chrome University Relationship Program (CURP), 2019-2020. Role: Principal Investigator.
  • “4K up-conversion technology for 1080P videos”, Migu Digital Media Co., Ltd., 2019-2020. Role: Principal Investigator.
  • “Deep Neural Network Based Frame Reconstruction For Optimized Video Coding – An AV2 Approach”, Google’s Chrome University Relationship Program (CURP), 2018-2019. Role: Principal Investigator.
  • “High complexity and high parallelism SoC system for Ultra-HD video coding”, Zhejiang Provincial Natural Science Foundation of China, LQ15F010001, 2015-2017. Role: Principal Investigator.

Papers Published with Visionular:

Dandan Ding, Wenyu Wang, Junchao Tong, Xinbo Gao, Zoe Liu, and Yong Fang, “Bi-Prediction Based Video Quality Enhancement via Learning”, IEEE Transactions on Cybernetics, June 17, 2020.

Synopsis:

Convolutional neural network (CNN)-based video quality enhancement generally employs optical flow for pixel-wise motion estimation and compensation, then uses the motion-compensated frames to jointly explore the spatiotemporal correlation across frames and facilitate the enhancement. This method, called the optical-flow-based method (OPT), usually achieves high accuracy at the expense of high computational complexity. In this article, we develop a new framework, referred to as bi-prediction-based multi-frame video enhancement (PMVE), to achieve a one-pass enhancement procedure. PMVE designs two networks, the prediction network (Pred-net) and the frame-fusion network (FF-net), to implement the two steps of synthesis and fusion, respectively. Specifically, the Pred-net leverages frame pairs to synthesize so-called virtual frames (VFs) for the low-quality frames (LFs) through bi-prediction. Afterward, the slow-fusion FF-net takes the VFs as input and extracts the correlation between the VFs and the related LFs to obtain an enhanced version of those LFs. Such a framework allows PMVE to leverage the cross-correlation between successive frames for enhancement, and hence to achieve high accuracy. Meanwhile, PMVE avoids explicit motion estimation and compensation, greatly reducing complexity compared to OPT. The experimental results demonstrate that the peak signal-to-noise ratio (PSNR) performance of PMVE is fully on par with that of OPT while its computational complexity is only 1% of OPT's. Compared with other state-of-the-art methods in the literature, PMVE is also confirmed to achieve superior performance in both objective and visual quality at a reasonable complexity level. For instance, PMVE can surpass its best counterpart method by up to 0.42 dB in PSNR.
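To make the division of labor concrete, below is a minimal PyTorch sketch of the two-network layout described above, simplified to one virtual frame per low-quality frame and single-channel (luma) input. The class names, layer counts, and channel widths are illustrative assumptions, not the published architecture.

    import torch
    import torch.nn as nn

    def conv_relu(in_ch, out_ch):
        return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                             nn.ReLU(inplace=True))

    class PredNet(nn.Module):
        """Bi-prediction: synthesize a virtual frame (VF) from the two
        higher-quality frames bracketing a low-quality frame (LF)."""
        def __init__(self, ch=64):
            super().__init__()
            self.net = nn.Sequential(conv_relu(2, ch), conv_relu(ch, ch),
                                     nn.Conv2d(ch, 1, 3, padding=1))

        def forward(self, prev_hf, next_hf):
            return self.net(torch.cat([prev_hf, next_hf], dim=1))

    class FFNet(nn.Module):
        """Frame fusion: combine the VF with the LF and predict an
        enhancement residual, with no explicit motion estimation."""
        def __init__(self, ch=64):
            super().__init__()
            self.net = nn.Sequential(conv_relu(2, ch), conv_relu(ch, ch),
                                     nn.Conv2d(ch, 1, 3, padding=1))

        def forward(self, vf, lf):
            return lf + self.net(torch.cat([vf, lf], dim=1))

    # One-pass enhancement of an LF bracketed by two higher-quality frames.
    pred_net, ff_net = PredNet(), FFNet()
    prev_hf, lf, next_hf = (torch.rand(1, 1, 64, 64) for _ in range(3))
    vf = pred_net(prev_hf, next_hf)
    enhanced = ff_net(vf, lf)

Because the VF already carries the motion information of its two anchor frames, the fusion step can stay purely convolutional, which is where the roughly 100x complexity saving over the optical-flow pipeline comes from.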


Dandan Ding, Lingyi Kong, Guangyao Chen, Zoe Liu, and Yong Fang, “A Switchable Deep Learning Approach for In-loop Filtering in Video Coding”, IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), August 15, 2019.

Synopsis:

Deep learning provides great potential for in-loop filtering to improve both coding efficiency and subjective quality in video coding. State-of-the-art work focuses on network structure design and employs a single powerful network to solve all problems. In contrast, this paper proposes a deep learning-based systematic approach that includes an effective Convolutional Neural Network (CNN) structure, a hierarchical training strategy, and a video codec-oriented switchable mechanism. First, we propose a novel CNN structure, the Squeeze-and-Excitation Filtering CNN (SEFCNN), as an optional in-loop filter. To capture the non-linear interaction between channels, SEFCNN comprises two subnets, a Feature EXtracting (FEX) subnet and a Feature ENhancing (FEN) subnet. Then, we develop a hierarchical model training strategy to adapt the two subnets to different coding scenarios: for high-rate videos with small artifacts, we train a single global model using FEX alone for all frame types, whereas for low-rate videos with large artifacts, different models are trained using both FEX and FEN for different frame types. Finally, we propose an adaptive enhancing mechanism that switches between the CNN-based and conventional methods, selectively applying the CNN model to some frames, or to some regions within a frame. Experimental results show that the proposed scheme outperforms state-of-the-art work in coding efficiency, while the computational complexity is acceptable after GPU acceleration.
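Since squeeze-and-excitation is the named building block, a short sketch may help. The PyTorch code below shows a standard SE channel-attention block of the kind SEFCNN builds on, together with an illustrative encoder-side switch between the CNN filter and a conventional filter. The reduction ratio, the names, and the MSE-based decision rule are assumptions for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn

    class SEBlock(nn.Module):
        """Squeeze-and-excitation: global-pool each channel ('squeeze'),
        then reweight channels with a small gating MLP ('excitation')."""
        def __init__(self, ch, reduction=4):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.gate = nn.Sequential(
                nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
                nn.Linear(ch // reduction, ch), nn.Sigmoid())

        def forward(self, x):
            n, c, _, _ = x.shape
            w = self.gate(self.pool(x).view(n, c)).view(n, c, 1, 1)
            return x * w  # non-linear channel-wise recalibration

    def switchable_filter(recon, original, cnn_filter, conventional_filter):
        """Encoder-side switch: keep whichever filtered output is closer
        to the original, signaling the choice with a one-bit flag."""
        cnn_out, conv_out = cnn_filter(recon), conventional_filter(recon)
        use_cnn = torch.mean((cnn_out - original) ** 2) \
                  < torch.mean((conv_out - original) ** 2)
        return (cnn_out, 1) if use_cnn else (conv_out, 0)

Applied per frame or per region, the same distortion comparison lets the codec fall back to its conventional in-loop filter wherever the CNN does not help.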


Junchao Tong, Xilin Wu, Dandan Ding, Zheng Zhu, and Zoe Liu, “Learning-Based Multi-Frame Video Quality Enhancement,” in the Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, September 22-25, 2019.

Synopsis:

The convolutional neural network (CNN) has shown great success in video quality enhancement. Existing methods mainly conduct enhancement in the spatial domain, exploiting pixel correlations within a single frame. Taking advantage of the similarity across successive frames, this paper develops a learning-based multi-frame approach that aims to exploit the temporal correlation for video quality enhancement. First, we apply a learning-based optical flow to compensate for the temporal motion across neighboring frames. Afterward, a deep CNN, structured in an early-fusion manner, is designed to discover the joint spatial-temporal correlations within a video. To ensure the generality of our CNN model, we further propose a robust training strategy: one high-quality frame and one moderate-quality frame are paired to enhance the low-quality frames between them, which balances frame distance against variation in frame quality. Experimental results demonstrate that our method outperforms state-of-the-art work in objective quality. The code and model of our approach are published on GitHub (https://github.com/IVC-Projects/LMVE).
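As a rough illustration of the two-stage pipeline (flow-based motion compensation, then early fusion), the PyTorch sketch below backward-warps two reference frames toward the target using given optical flows and concatenates all three frames at the network input. In the paper the flows come from a learned estimator; here they are plain inputs, and the network shape is an assumption.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def warp(frame, flow):
        """Backward-warp an (N,1,H,W) frame by a per-pixel flow (N,2,H,W), in pixels."""
        n, _, h, w = frame.shape
        ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                                torch.arange(w, dtype=torch.float32), indexing="ij")
        base = torch.stack([xs, ys]).to(frame.device)      # (2,H,W), (x,y) order
        coords = base.unsqueeze(0) + flow                  # absolute sample positions
        gx = 2.0 * coords[:, 0] / (w - 1) - 1.0            # normalize to [-1, 1]
        gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        return F.grid_sample(frame, torch.stack([gx, gy], dim=-1), align_corners=True)

    class EarlyFusionNet(nn.Module):
        """Early fusion: stack the motion-compensated references with the target
        at the input so the CNN sees the joint spatial-temporal neighborhood."""
        def __init__(self, ch=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 1, 3, padding=1))

        def forward(self, target, ref0, ref1, flow0, flow1):
            x = torch.cat([warp(ref0, flow0), target, warp(ref1, flow1)], dim=1)
            return target + self.net(x)  # enhancement residual

The high-quality/moderate-quality pairing described above would then supply ref0 and ref1 during training, with the low-quality frames in between serving as targets.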