17th February 2019, 03:11   #4
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
I ran and added muvsfunc SSIM and GMSD to the first test series linked to above:

Quote:
Originally Posted by WorBry
Crowd Run 1080/50p encoded to x264 with crf 0 to 30:





Interesting that the libvmaf, ffmpeg and muvsfunc SSIM implementations give rather different results:



According to the documentation, the ffmpeg filter does apply the original SSIM algorithm, but to improve speed it uses the standard approximation of overlapped 8x8 block sums rather than the original Gaussian weights:

https://github.com/FFmpeg/FFmpeg/blo...lter/vf_ssim.c

As described in the original SSIM paper, one problem with the moving 8 x 8 block computation is that the resulting SSIM index map often exhibits undesirable blocking artifacts. By modifying the local statistics with Gaussian weights the "quality maps exhibit a locally isotropic property", which I take to mean that it smooths out blocking artifacts in the quality map.

Page 605 - http://www.compression.ru/video/qual...asure/ssim.pdf
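
As a rough illustration of the two windowing choices discussed above, here is a minimal pure-Python sketch that computes a mean SSIM with either equal 8x8 weights or an 11x11 Gaussian window (sigma 1.5, as in the paper). It is not the ffmpeg or muvsfunc code: the function names are made up for this example, the window slides one pixel at a time, and only positions where the window fits entirely inside the image are scored. Those are all assumptions of the sketch.

```python
import math
import random

def gaussian_window(size=11, sigma=1.5):
    """Normalized 2-D Gaussian weights (size x size)."""
    c = (size - 1) / 2.0
    w = [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def uniform_window(size=8):
    """Equal weights: every pixel in the block counts the same."""
    return [[1.0 / (size * size)] * size for _ in range(size)]

def ssim_map(x, y, window, peak=255.0, k1=0.01, k2=0.03):
    """SSIM index at every position where the window fits inside the image."""
    n = len(window)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    out = []
    for r in range(len(x) - n + 1):
        line = []
        for c in range(len(x[0]) - n + 1):
            mx = my = sxx = syy = sxy = 0.0
            for i in range(n):
                for j in range(n):
                    w = window[i][j]
                    a, b = x[r + i][c + j], y[r + i][c + j]
                    mx += w * a
                    my += w * b
                    sxx += w * a * a
                    syy += w * b * b
                    sxy += w * a * b
            # Weighted variances and covariance from the weighted moments.
            vx, vy, cov = sxx - mx * mx, syy - my * my, sxy - mx * my
            line.append(((2 * mx * my + c1) * (2 * cov + c2)) /
                        ((mx * mx + my * my + c1) * (vx + vy + c2)))
        out.append(line)
    return out

def mean_ssim(x, y, window):
    """Average of the SSIM map, i.e. the single quoted score."""
    m = ssim_map(x, y, window)
    return sum(map(sum, m)) / (len(m) * len(m[0]))

if __name__ == "__main__":
    random.seed(1)
    ref = [[random.randint(0, 255) for _ in range(24)] for _ in range(24)]
    # Synthetic distortion: additive noise clipped to the 8-bit range.
    dist = [[min(255, max(0, p + random.randint(-20, 20))) for p in row]
            for row in ref]
    print("uniform 8x8   :", mean_ssim(ref, dist, uniform_window()))
    print("gaussian 11x11:", mean_ssim(ref, dist, gaussian_window()))
```

The two windows weight the same pixels differently, so the same pair of frames generally gets two different mean scores. The sketch only makes that mechanism concrete; it does not say which way the difference goes on real video.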

The muvsfunc SSIM function does apply Gaussian filtering, with a default standard deviation of 1.5 as per the original recommendation:

https://github.com/WolframRhodium/mu...er/muvsfunc.py

Would that explain why the muvsfunc SSIM metric gives higher scores in this test series?

Another factor might be whether preliminary downsampling is applied, as is recommended in the 'Suggested Usage':

https://ece.uwaterloo.ca/~z70wang/research/ssim/

There's no mention of downsampling in the ffmpeg documentation. The muvsfunc SSIM filter does apply downsampling by default:

Code:
downsample: (bool) Whether to average the clips over local 2x2 window and downsample by a factor of 2 before calculation. Default is True.
And apparently VMAF "includes an empirical downsampling process, as described at the Suggested Usage" in its elementary SSIM derivation:

https://github.com/Netflix/vmaf/issues/22
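
For reference, the 2x2 averaging and factor-of-2 downsampling described in the muvsfunc docstring quoted above can be sketched as follows. This is a minimal illustration, not the muvsfunc or libvmaf code: the function name is hypothetical, it uses non-overlapping 2x2 blocks, and it drops an odd trailing row or column. The real implementations may align and handle edges differently.

```python
def downsample_2x2(img):
    """Average each non-overlapping 2x2 block, halving width and height.

    An odd trailing row or column is dropped (an assumption of this sketch).
    """
    rows = len(img) // 2 * 2
    cols = len(img[0]) // 2 * 2
    return [[(img[r][c] + img[r][c + 1] +
              img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

if __name__ == "__main__":
    # A pixel-level checkerboard is pure high-frequency detail ...
    checker = [[255 if (r + c) % 2 else 0 for c in range(4)] for r in range(4)]
    # ... and becomes a flat mid-grey image after the 2x2 average.
    print(downsample_2x2(checker))
```

Because the averaging removes the finest pixel-level differences before SSIM is computed, it seems plausible that a downsampled comparison scores higher than a full-resolution one for the same encode.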

So why are the libvmaf SSIM scores even higher, if both follow the original code and apply the downsampling process?

The muvsfunc SSIM description does state, though, that it uses a Gaussian kernel of a different size from the one in the original MATLAB code.

Code:
Note that the size of gaussian kernel is different from the one in MATLAB.
Could that explain the difference?
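
One way to get a feel for how much the kernel size alone could matter is to check how much of a sigma 1.5 Gaussian's weight falls inside a window of a given width. The sketch below does that for a 1-D kernel. The 11-tap case matches the 11 x 11 window of the original paper; the other sizes are purely illustrative, since the size muvsfunc actually uses is not stated here. The function names are made up for this example.

```python
import math

def gaussian_taps(size, sigma=1.5):
    """Normalized 1-D Gaussian kernel with `size` taps."""
    c = (size - 1) / 2.0
    w = [math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    total = sum(w)
    return [v / total for v in w]

def captured_mass(size, sigma=1.5, reference_size=101):
    """Fraction of a nearly untruncated discrete Gaussian's weight that
    lies inside the central `size` taps (odd `size` expected)."""
    c = (reference_size - 1) // 2
    raw = [math.exp(-((i - c) ** 2) / (2 * sigma ** 2))
           for i in range(reference_size)]
    lo = c - size // 2
    return sum(raw[lo:lo + size]) / sum(raw)

if __name__ == "__main__":
    for size in (5, 7, 9, 11):
        print(size, "taps capture", captured_mass(size))
```

If the truncated kernel already holds nearly all of the weight, then a different kernel size changes the local statistics only slightly. That is a back-of-envelope check, not a measurement of the actual implementations.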

I have to say this leaves me in a quandary about which SSIM implementation to use when comparing the inherent quality efficiency of different video formats, especially at high bitrates. One example is comparing 'visually lossless' intermediate codecs, where the interest is not in perceptual quality under certain viewing conditions but in preservation of structural fidelity. The ffmpeg SSIM metric gives a much wider spread of values, which makes it easier to judge, with some confidence, that one video is of higher quality than another based on the difference of isolated scores (the last graph presented below shows that well), but is it valid as a quotable SSIM score?

Is there a valid case for omitting the downsampling step in the muvsfunc SSIM metric under such conditions?

Anyhow, I also ran muvsfunc GMSD and SSIM on the parallel x265 series:



Including the ffmpeg SSIM results made the graph too 'busy', so here they are separately:



GMSD gave consistently higher scores for x265 over the same bitrate range. The SSIM metrics did also, but by a narrower margin at the higher bitrates:



GMSD looks like it could be very useful. I've yet to test muvsfunc SSIM and GMSD on the 2160/50p Crowd Run x264 and x265 series.

If anyone's interested here's the original GMSD paper:

https://arxiv.org/pdf/1308.3052.pdf

Edit: Came across this article that quotes from an article by RealNetworks CTO Reza Rassool:

Quote:
“if a video service operator were to encode video to achieve a VMAF score of about 93 then they would be confident of optimally serving the vast majority of their audience with content that is either indistinguishable from original or with noticeable but not annoying distortion.” So a 93 VMAF score is about the same as .95 for SSIM
https://streaminglearningcenter.com/...e-ratings.html

In the above tests, at a VMAF score of 93 the corresponding muvsfunc SSIM scores were around 0.96 - 0.97 and the ffmpeg SSIM scores were around 0.925 - 0.93. The libvmaf component SSIM scores, however, were way up at around 0.993, which surely suggests there's something more going on.
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 18th February 2019 at 16:33.