Visionular Codec Core Optimizations
This is part three of a three-part series that explains in detail the Intelligent Optimization technology that is foundational to Visionular’s video encoder quality and bitrate efficiency advantages. Part two covers our ML algorithms and image processing modules.
The Visionular Intelligent Optimization system contains two critical components: AI-powered image-processing controls and a highly optimized core codec. In part 1 we introduced a few of the possible approaches to optimizing a video encoding recipe for the actual needs of the content, as opposed to applying one fixed recipe to every asset. Part 2 gave a thorough explanation of the most visible ML algorithms and image processing modules.
In this part of the white paper, we discuss an aspect of content-adaptive encoding (CAE) whose importance should not be underestimated and to which Visionular devotes continuous effort: codec core optimization. We perform this essential task for all mainstream video codec standards, namely H.264/AVC, H.265/HEVC, and AV1.
We balance our development efforts equally between performance and compression efficiency. Combined with our intelligent technologies, this balance is the reason Visionular encoders are recognized for both high performance and high efficiency, and are able to deliver unmatched visual quality regardless of the application or use case.
To illustrate our performance advantage, Figure 1 shows Aurora1 (Visionular’s AV1 encoder) operating faster than x265, with x265 run in its ‘slower’ speed mode so that visual quality is matched.
For this comparison, Aurora1 encodes at 2.16x the speed of x265 ‘slower’ while achieving a compression gain of 24.03%. This is a remarkable achievement considering that the AV1 video standard is many times more complex than HEVC and that x265 has had more than seven years of active development, as opposed to just a few short years for AV1.
Figure 2 compares Aurora1 operating in its ‘veryfast’ mode with rav1e (s10), an open-source AV1 encoder initiated by Mozilla.
For this test, Aurora1 is 2.68x faster than Rav1e while achieving a BD-rate gain of 38.98% in PSNR, and a gain of 45.23% in SSIM. This feat demonstrates our commitment to continuous improvement and performance optimization of the codec itself.
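A BD-rate figure summarizes the average bitrate difference between two encoders at equal quality. The Python below is a minimal sketch of how such a number is commonly computed, not Visionular's measurement tooling. It interpolates log-bitrate as a function of quality for each encoder and averages the gap over the shared quality range. The function names and the sample rate/quality points are hypothetical, and the sketch uses a polynomial that passes through every measured point (a cubic for the usual four points).

```python
import math


def _interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_over(xs, ys, lo, hi, steps=200):
    """Average of the interpolating polynomial on [lo, hi] (composite Simpson rule)."""
    h = (hi - lo) / steps
    acc = _interpolate(xs, ys, lo) + _interpolate(xs, ys, hi)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        acc += weight * _interpolate(xs, ys, lo + k * h)
    return (acc * h / 3) / (hi - lo)


def bd_rate(anchor, test):
    """Average bitrate change (%) of `test` relative to `anchor` at equal quality.

    Each argument is a list of (bitrate, quality) pairs with distinct quality
    values. A negative result means `test` needs fewer bits than `anchor`.
    """
    q_a = [q for _, q in anchor]
    q_t = [q for _, q in test]
    log_r_a = [math.log10(r) for r, _ in anchor]
    log_r_t = [math.log10(r) for r, _ in test]
    # Only the quality range covered by both encoders can be compared.
    lo = max(min(q_a), min(q_t))
    hi = min(max(q_a), max(q_t))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = _mean_over(q_t, log_r_t, lo, hi) - _mean_over(q_a, log_r_a, lo, hi)
    return (10 ** diff - 1) * 100


# Hypothetical rate/quality points (kbps, PSNR in dB), for illustration only.
anchor_points = [(1000, 34.0), (2000, 37.0), (4000, 40.0), (8000, 43.0)]
test_points = [(500, 34.0), (1000, 37.0), (2000, 40.0), (4000, 43.0)]
print(round(bd_rate(anchor_points, test_points), 2))
```

In this made-up example the test encoder needs half the bitrate at every quality point, so the result is -50%, which would be reported as a 50% BD-rate gain. The same function works with SSIM or VMAF as the quality axis.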
It’s important to note that we do not achieve our quality and efficiency gains through image processing alone, helpful as those modules are. As impressive as these results are, we will not stop innovating, which is why Visionular customers can be confident in their decision to deploy our technology and solutions today and in the future.
Many video services and platforms use the x264 implementation of H.264 because it is free, easy to obtain, and backed by a strong open-source community. However, this means that users of x264 depend on an unknown development cycle and a roadmap that may not line up with their specific needs.
Figure 3 compares our H.264 encoder, Aurora4, with x264 at its ‘medium’ speed preset. With encoding speed aligned, subjective quality optimization features disabled, and no extra parameter tuning, Aurora4 achieves a 10% coding efficiency advantage on the objective metrics PSNR, SSIM, and VMAF.
Audio Encoding Bitrate Adaptation
For Visionular video encoders, the target audio encoding bitrate is adapted according to the audio content category and its complexity. The process is designed to select the best-fit bitrate, adopt optimal coding algorithms, and achieve the largest possible bitrate saving while maintaining the highest quality.
We specify different quality levels according to the needs of the audio content, the required output quality, and the available bitrate budget. At each quality level, a number of bitrates are defined, and quality fluctuation is controlled within this narrow range so that audio quality stays steady while the maximum bitrate saving is achieved.
Figure 4 shows how Visionular’s audio bitrate optimization process adaptively selects the bitrate range for each quality level, implementing a rate control strategy similar to video CRF, but for audio. This technique is referred to as Audio CRF mode, or ACRF.
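The idea of a small set of bitrates per quality level can be sketched as a lookup. This is only an illustration under stated assumptions, not the ACRF implementation: the level names, the bitrate values, the `complexity` score, and the function name are all hypothetical.

```python
# Hypothetical bitrate ranges (kbps) per quality level; not Visionular's values.
BITRATE_LADDERS = {
    "low": [24, 32, 40],
    "medium": [48, 64, 80],
    "high": [96, 112, 128],
}


def select_audio_bitrate(quality_level, complexity):
    """Pick a bitrate inside the narrow range defined for a quality level.

    `complexity` is an assumed content-complexity score in [0.0, 1.0].
    Simpler content maps to the low end of the range to save bits, and
    more complex content maps to the high end to hold quality steady.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0.0 and 1.0")
    ladder = BITRATE_LADDERS[quality_level]
    index = min(int(complexity * len(ladder)), len(ladder) - 1)
    return ladder[index]
```

For example, `select_audio_bitrate("medium", 0.0)` returns 48 and `select_audio_bitrate("medium", 1.0)` returns 80, so the output never leaves the range assigned to the chosen quality level.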
The resulting bitrate savings, compared with other state-of-the-art audio encoding solutions, are presented in Figure 5. The quality measure is PESQ (Perceptual Evaluation of Speech Quality) LQO (Listening Quality Objective), which is commonly used in speech and audio analysis and evaluation. The curve of PESQ LQO score against bitrate is also shown.
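PESQ tools typically produce a raw score that is then mapped onto the LQO scale. A minimal sketch of that mapping follows, assuming the logistic function standardized in ITU-T P.862.1. The source does not state which mapping was used for Figure 5, and the function name is hypothetical.

```python
import math


def pesq_to_mos_lqo(raw_pesq):
    """Map a raw PESQ score (roughly -0.5 to 4.5) to MOS-LQO per ITU-T P.862.1."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))
```

The mapping is monotonic and compresses the raw range into roughly 1.02 to 4.55, which is why LQO curves flatten near the top of the scale.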
For bandwidth-constrained users, it is difficult to provide consistently high audio quality. Once the bitrate is reduced beyond a certain point, audio quality degrades dramatically and fluctuates widely. Our low-bitrate audio encoding solution helps maintain the best possible audio quality while gracefully decreasing the encoding bitrate to fit bandwidth constraints.
Subjective and Objective Quality Evaluation
We use VMAF as our preferred objective quality metric and MOS (mean opinion score) subjective scoring to get a proper view of how humans will perceive the video. In this article, we have mentioned the use of VMAF to guide and assess our video optimization algorithms. For example, we evaluate the efficacy of our AI pre-processing algorithms through their VMAF BD-rate performance.
However, it’s important to recognize that, as widely used as objective metrics are, they do not correlate perfectly with human perception and should never completely replace subjective evaluation. We use MOS subjective scoring as the final arbiter to guide the iteration and development of our AI quality optimization algorithms. For the development of sharpening algorithms and of detail enhancement for face contours and hair regions, MOS scores are essential, since VMAF does not fully represent the enhancement achieved with each algorithm iteration.
We have adopted a five-level MOS scale that considers aspects such as blockiness, noise level, spikiness, edges and fine textures, and contrast. To support this process, we have designed a proprietary MOS scoring platform on which 20 subjects can conduct their subjective evaluations concurrently.
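A MOS is the arithmetic mean of the individual ratings, and it is usually reported with a confidence interval. The sketch below shows that calculation for one clip. It does not describe our scoring platform: the function name is hypothetical, and the 95% interval uses a normal approximation, which is an assumption (a t-quantile is more exact for a panel of 20).

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% confidence half-width) for ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation (z = 1.96); assumed here for simplicity.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width
```

With 20 subjects, ten of whom rate a clip 4 and ten of whom rate it 5, the MOS is 4.5. A wide interval is a signal that the panel disagrees and that the clip may deserve a closer look.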
Figure 6 presents our MOS scoring workflow for both video and audio subjective evaluation. For assessing intelligent transcoding of audio signals, we combine MOS subjective evaluation with PESQ objective scoring. We divide audio quality into five levels, corresponding to PESQ scores of 5 (excellent), 4 (good), 3 (medium), 2 (poor), and 1 (very poor).
Figure 7 shows how, by combining MOS subjective assessment with VMAF objective quality assessment, we monitor video quality as the human eye perceives it in order to guide our continuous encoding algorithm and codec optimization and development efforts.