Old 13th February 2019, 20:31   #1  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,187
GMSD and SSIM Quality Metrics

I've been running some quality metric tests with VMAF and FFMPEG SSIM and PSNR.

https://forum.doom9.org/showthread.p...70#post1864770

While I still have the test files I'd like to see how the SSIM and GMSD metrics (in muvsfunc) compare.

The function descriptions state:

SSIM - 'The mean SSIM (MSSIM) index value of the distorted image will be stored as frame property 'PlaneSSIM' in the output clip'.

GMSD - 'The distortion degree of the distorted image will be stored as frame property 'PlaneGMSD' in the output clip. The value of GMSD reflects the range of distortion severities in an image.'

But I'm not clear how to access these results.
__________________
Nostalgia's not what it used to be
Old 13th February 2019, 20:57   #2  |  Link
zorr
Registered User
 
Join Date: Mar 2018
Posts: 213
Quote:
Originally Posted by WorBry View Post
While I still have the test files I'd like to see how the SSIM and GMSD metrics (in muvsfunc) compare.

The function descriptions state:

SSIM - 'The mean SSIM (MSSIM) index value of the distorted image will be stored as frame property 'PlaneSSIM' in the output clip'.

GMSD - 'The distortion degree of the distorted image will be stored as frame property 'PlaneGMSD' in the output clip. The value of GMSD reflects the range of distortion severities in an image.'

But I'm not clear how to access these results.
The easiest way probably is to use zoptilib.py which is part of the Zopti optimizer. But you don't need to run Zopti at all, just your script which calls zoptilib.

Here's a simple example:

Code:
import vapoursynth as vs
from zoptilib import Zopti

core = vs.core

# read input video
orig = core.ffms2.Source(source=r'source.avi')

# initialize output file and chosen metrics
zopti = Zopti('results.txt', metrics=['ssim', 'gmsd'])

# ... process the video ...
# placeholder: replace this with your own processing, e.g. some_process(orig)
alternate = orig

# measure similarity of original and alternate videos, save results to output file
zopti.run(orig, alternate)
The output file will contain the frame number and the chosen metrics separated by ';', and on the last line the sum of the per-frame values. The file is written once all of the video frames have been processed (on the last frame).
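
As a rough sketch of how the results file could be read back for comparison or plotting: the snippet below assumes each per-frame line looks like 'frame;value;value' in the order the metrics were requested, and that the final line holds the sums. That exact layout is an assumption (it is not checked against zoptilib here), the helper names parse_zopti_results and per_metric_mean are hypothetical, and the sample text is invented placeholder data, not measured scores.

```python
from statistics import mean


def parse_zopti_results(text, metrics):
    """Split a results file into per-frame rows and the raw summary line.

    Assumed layout (not confirmed against zoptilib): 'frame;m1;m2' per frame,
    with the sums on the last non-empty line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    *frame_lines, summary_line = lines
    frames = {}
    for ln in frame_lines:
        # Drop empty fields so a trailing ';' does not break float().
        fields = [f for f in ln.split(';') if f.strip()]
        frames[int(fields[0])] = dict(zip(metrics, map(float, fields[1:])))
    return frames, summary_line


def per_metric_mean(frames, metrics):
    """Average each metric over all parsed frames."""
    return {m: mean(row[m] for row in frames.values()) for m in metrics}


# Placeholder text in the assumed layout (values are invented for illustration).
sample = "0;0.98;0.02\n1;0.96;0.04\n1.94;0.06\n"
frames, summary = parse_zopti_results(sample, ['ssim', 'gmsd'])
print(per_metric_mean(frames, ['ssim', 'gmsd']))
```

With real data you would pass in the contents of results.txt after the script has finished, since the file only appears once the last frame has been processed.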

The latest version of zoptilib is currently available here.

There's also the MDSI metric, but for that you need to upgrade to the latest muvsfunc version manually.

Last edited by zorr; 13th February 2019 at 21:56.
Old 13th February 2019, 22:57   #3  |  Link
WorBry
Yes, that works. Thanks.
Old 17th February 2019, 02:11   #4  |  Link
WorBry
I ran and added muvsfunc SSIM and GMSD to the first test series linked to above:

Quote:
Originally Posted by WorBry View Post
Crowd Run 1080/50p encoded to x264 with crf 0 to 30:





Interesting that the libvmaf, ffmpeg and muvsfunc SSIM implementations give rather different results:



According to the documentation, the ffmpeg filter does apply the original SSIM algorithm, but to improve speed it uses the standard approximation of overlapped 8x8 block sums rather than the original gaussian weights:

https://github.com/FFmpeg/FFmpeg/blo...lter/vf_ssim.c

As described in the original SSIM paper, one problem with the moving 8 x 8 block computation is that the resulting SSIM index map often exhibits undesirable blocking artifacts. By modifying the local statistics with gaussian weights the "quality maps exhibit a locally isotropic property" - which I take to mean it smooths out blocking artifacts in the quality map.

Page 605 - http://www.compression.ru/video/qual...asure/ssim.pdf

The muvsfunc SSIM function does apply gaussian filtering, with a default standard deviation of 1.5 as per the original recommendation:

https://github.com/WolframRhodium/mu...er/muvsfunc.py
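
To make the block-versus-gaussian distinction concrete, here is a minimal single-window sketch of the SSIM formula with pluggable weights. It is not the code of ffmpeg, libvmaf or muvsfunc: the function names are hypothetical, the 11x11 window with sigma 1.5 and the constants K1 = 0.01, K2 = 0.03 are the values from the original paper, and the uniform case simply gives every pixel of an 8x8 block the same weight. The test patches are synthetic.

```python
import math


def uniform_weights(size=8):
    """Equal weight for every pixel of a size x size block."""
    w = 1.0 / (size * size)
    return [[w] * size for _ in range(size)]


def gaussian_weights(size=11, sigma=1.5):
    """Circularly symmetric gaussian weights, normalised to sum to 1."""
    c = (size - 1) / 2.0
    raw = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def ssim_window(a, b, weights, peak=255.0, k1=0.01, k2=0.03):
    """SSIM of two patches over the area covered by the window weights."""
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    cells = [(w, a[y][x], b[y][x])
             for y, row in enumerate(weights) for x, w in enumerate(row)]
    mu_a = sum(w * pa for w, pa, _ in cells)
    mu_b = sum(w * pb for w, _, pb in cells)
    var_a = sum(w * (pa - mu_a) ** 2 for w, pa, _ in cells)
    var_b = sum(w * (pb - mu_b) ** 2 for w, _, pb in cells)
    cov = sum(w * (pa - mu_a) * (pb - mu_b) for w, pa, pb in cells)
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))


# Synthetic patches: a smooth gradient and the same gradient with a
# checkerboard offset of +/-8 added.
patch = [[float(16 * x + y) for x in range(11)] for y in range(11)]
distorted = [[v + (8.0 if (x + y) % 2 else -8.0) for x, v in enumerate(row)]
             for y, row in enumerate(patch)]
print(ssim_window(patch, distorted, gaussian_weights()))
print(ssim_window(patch, distorted, uniform_weights(8)))
```

A full implementation slides this window over the whole frame and averages the resulting map, so window shape, window size and any prior downsampling all feed into the final mean score.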

Would that explain why the muvsfunc SSIM metric gives higher scores in this test series ?

Another factor might be whether preliminary downsampling is applied, as is recommended in the 'Suggested Usage':

https://ece.uwaterloo.ca/~z70wang/research/ssim/

There's no mention of downscaling in the FFMPEG documentation. The muvsfunc SSIM filter does apply downscaling by default:

Code:
downsample: (bool) Whether to average the clips over local 2x2 window and downsample by a factor of 2 before calculation. Default is True.
And apparently VMAF "includes an empirical downsampling process, as described at the Suggested Usage" in its elementary SSIM derivation:

https://github.com/Netflix/vmaf/issues/22

So why are the libvmaf SSIM scores even higher, if both are following the original code and applying the down-sampling process ?

The muvsfunc SSIM description does state, though, that it uses a different size of gaussian kernel from the one in the original MATLAB code.

Code:
Note that the size of gaussian kernel is different from the one in MATLAB.
Could that explain the difference?

I have to say this leaves me in a quandary about which SSIM implementation to use when comparing the inherent quality efficiency of different video formats, especially at high bitrates - for example, when comparing 'visually lossless' intermediate codecs, where the interest is not in perceptual quality under certain viewing conditions but in preservation of structural fidelity. The ffmpeg SSIM metric gives a much wider spread of values which makes it easier to judge, with some confidence, that one video is of higher quality than another based on the difference of isolated scores (the last graph presented below shows that well), but is it valid as a quotable SSIM score ?

Is there a valid case for omitting the down-sampling step in the muvsfunc SSIM metric under such conditions ?

Anyhow, I also ran muvsfunc GMSD and SSIM on the parallel x265 series:



Including the ffmpeg SSIM results made the graph too 'busy', so here they are separately:



GMSD gave consistently higher scores for x265 over the same bitrate range. The SSIM metrics did also, but by a narrower margin at the higher bitrates:



GMSD looks like it could be very useful. I've yet to test muvsfunc SSIM and GMSD on the 2160/50p Crowd Run x264 and x265 series.

If anyone's interested here's the original GMSD paper:

https://arxiv.org/pdf/1308.3052.pdf

Edit: Came across this article that quotes from an article by RealNetworks CTO Reza Rassool:

Quote:
“if a video service operator were to encode video to achieve a VMAF score of about 93 then they would be confident of optimally serving the vast majority of their audience with content that is either indistinguishable from original or with noticeable but not annoying distortion.” So a 93 VMAF score is about the same as .95 for SSIM
https://streaminglearningcenter.com/...e-ratings.html

In the above tests - at a VMAF score of 93 the corresponding muvsfunc SSIM scores were around 0.96 - 0.97 and the ffmpeg SSIM scores were around 0.925 - 0.93. The libvmaf component SSIM scores however were way up at around 0.993, which surely suggests there's something more going on.

Last edited by WorBry; 18th February 2019 at 15:33.
Old 17th February 2019, 16:40   #5  |  Link
WorBry
I'd like to test muvsfunc SSIM with 'downsample=False' to see how the results compare. How can I change that setting so as to get the results through Zoptilib ?

Quote:
Originally Posted by zorr View Post
Code:
from zoptilib import Zopti

# read input video
orig = core.ffms2.Source(source=r'source.avi')

# initialize output file and chosen metrics 
zopti = Zopti('results.txt', metrics=['ssim', 'gmsd'])

#   ... process the video ...
# alternate = some_process(orig)

# measure similarity of original and alternate videos, save results to output file
zopti.run(orig, alternate)

Last edited by WorBry; 17th February 2019 at 16:43.
Old 17th February 2019, 18:45   #6  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 770
I made an update https://github.com/theChaosCoder/zoptilib
You can use it like this
Code:
zopti = Zopti(output_file, metrics=['ssim', 'mdsi'])
zopti.addParams('ssim', dict(downsample=False, show_map=False))
zopti.addParams('mdsi', dict(down_scale=1))
__________________
Search and denoise
Old 17th February 2019, 19:02   #7  |  Link
WorBry
Brilliant. Haven't got around to looking at MDSI yet. It's RGB only though, isn't it ?
Old 17th February 2019, 20:06   #8  |  Link
ChaosKing
Yes. Also use the latest version https://raw.githubusercontent.com/Wo...er/muvsfunc.py
Old 18th February 2019, 17:39   #9  |  Link
WorBry
Quote:
Originally Posted by WorBry View Post
I'd like to test muvsfunc SSIM with 'downsample=False' to see how the results compare.
I've done that with the x264 test series:





Clearly removing the downsampling step has a profound effect, producing a much wider spread of scores - even more so than ffmpeg SSIM - although up at around CRF=2 (400 - 425 Mbps), the scores start to approach those obtained with downsampling applied, and lossless is still reported as such.



As described in the 'Suggested Usage' for the original (Matlab) SSIM code, the purpose of the downsampling is to compensate for viewing the image at a typical distance from the screen:

Quote:
The above (ssim_index.m) is a single scale version of the SSIM indexing measure, which is most effective if used at the appropriate scale. The precisely “right” scale depends on both the image resolution and the viewing distance and is usually difficult to be obtained. In practice, we suggest to use the following empirical formula to determine the scale for images viewed from a typical distance (say 3~5 times of the image height or width): 1) Let F = max(1, round(N/256)), where N is the number of pixels in image height (or width); 2) Average local F by F pixels and then downsample the image by a factor of F; and 3) apply the ssim_index.m program. For example, for an 512 by 512 image, F = max(1, round(512/256)) = 2, so the image should be averaged within a 2 by 2 window and downsampled by a factor of 2 before applying ssim_index.m.
http://www.cns.nyu.edu/~lcv/ssim/

In other words it is a perceptual quality modifier. Whether it's valid to remove that step when using SSIM to compare video images for structural differences that exceed visual acuity (i.e. independent of perceived quality) I'm still not sure.
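
The empirical scale rule quoted above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, N is taken to be the frame height, and averaging non-overlapping F x F blocks is a simplification of step 2 rather than a copy of the reference MATLAB code. One detail worth noting is that Python's built-in round() rounds halves to the nearest even number, so floor(x + 0.5) is used to round halves up instead.

```python
import math


def ssim_scale_factor(n_pixels):
    """F = max(1, round(N / 256)), with halves rounded up."""
    return max(1, math.floor(n_pixels / 256 + 0.5))


def box_downsample(plane, f):
    """Average non-overlapping f x f blocks of a 2-D list of pixel values."""
    height = len(plane) // f * f
    width = len(plane[0]) // f * f
    return [[sum(plane[y + dy][x + dx]
                 for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(0, width, f)]
            for y in range(0, height, f)]


# Scale factors the rule gives for a few frame heights.
for n in (512, 1080, 2160):
    print(n, ssim_scale_factor(n))
```

Under these assumptions a 1080-line frame gives F = 4 and a 2160-line frame gives F = 8, whereas the muvsfunc default described earlier is a fixed 2x2 average, so the two would not be measuring at the same scale.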

Last edited by WorBry; 18th February 2019 at 17:53.
Old 18th February 2019, 18:58   #10  |  Link
WorBry
Thought it might be interesting to see how the AVISynth SSIM filter compares also. This plugin has a rather nebulous history going back to the original implementation by LeFungus in 2003:

https://forum.doom9.org/showthread.php?t=61128

His last update was version 0.24, although the results log still reports it as 0.23.

It appears the plugin then received further fixes and modifications made by others, but in the absence of associated documentation, it is not clear exactly what changes were made.

This thread attempted to make sense of it:

https://forum.doom9.org/showthread.p...03#post1089303

I decided to test both the original (as assumed) 'LeFungus' 0.24 version and the 0.25.1 version posted by Mitsubishi in that thread. They produced identical results:



Wow, very different from the other SSIM implementations, with the CRF=30 x264 encode scoring way down at 33 (0.33).
Yet, according to LeFungus, it was developed from the original code.

I suspect this stems from the 'luma masking' parameter that was given as an option (Default: True) in the original (LeFungus) 0.24 plugin. In 0.25.1 that option is not accessible, as such (returns an error), but since 0.25.1 produced identical results, it's reasonable to assume that 'Luma Masking' was being applied.

Possibly it equates with the luminance normalization filtering that is applied in the original SSIM algorithm ? In the AVISynth plugin it is applied as a weighting:

Quote:
This filter is designed to compute an SSIM value by two methods, the original one, and a "enhanced" one that weight these results by lumimasking........In the csv file, when lumimasking is activated, both SSIM values and its weight is written.
https://avisynth.org.ru/docs/english...lters/ssim.htm

Unfortunately, I only recorded the final aggregate SSIM score reported in the log file and didn't generate the csv file that lists the individual frame scores and weightings. I'll maybe re-run some tests to see what difference the weightings made. But really, these results, and the vagaries surrounding this plugin, don't exactly instill confidence in its use.

Last edited by WorBry; 18th February 2019 at 19:15.
Old 18th February 2019, 19:14   #11  |  Link
ChaosKing
So basically every ssim implementation gave different results... which one can we trust (more)?
Old 18th February 2019, 19:51   #12  |  Link
WorBry
Quite so ! Granted the tests were conducted with just one source clip (CrowdRun) - although a good one at that - high quality/complexity/motion/hard to compress.

The results as they stand leave me more inclined to use muvsfunc SSIM as a 'definitive' implementation of the original code. Would be nice if there were an AVISynth(+) implementation of the muvsfunc SSIM filter.

Still don't understand though why the libvmaf-derived SSIM figures are so much higher. Is it down to a difference in Gaussian kernel size, or are the reported elementary SSIM scores being further weighted by the VMAF 'model' in some way (before the final VMAF calculation, that is) ?

As for MDSI - results to follow.

Last edited by WorBry; 20th February 2019 at 18:48.
Old 18th February 2019, 20:39   #13  |  Link
zorr
I looked at sources of muvsfunc SSIM and Avisynth's SSIM (v0.25.1 by Mitsubishi).

The Avisynth version is not doing the gaussian kernel at all - it's implemented using summed area tables. That's a faster but lower quality way to calculate it, the MSU Quality measurement tool page has an example of the difference.

Also muvsfunc returns SSIM calculated on one plane only, by default the luma. Avisynth SSIM has a plane argument which defaults to 0 and then it returns a weighted sum of the luma and chroma channels:
(0.8 * Y) + (0.1*(U+V))
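
As a small sketch of what that weighting does to the reported number (the function name is hypothetical and the per-plane values in the example are placeholders, not measured scores):

```python
def combined_ssim(y, u, v, luma_weight=0.8, chroma_weight=0.1):
    """Weighted sum of per-plane SSIM values: 0.8 * Y + 0.1 * (U + V)."""
    return luma_weight * y + chroma_weight * (u + v)


# Placeholder per-plane values: luma degraded, chroma untouched.
print(combined_ssim(0.90, 1.0, 1.0))
```

Because 20% of the combined score comes from chroma, a combined figure is not directly comparable with a luma-only figure from another implementation, even when the per-plane calculation is otherwise identical.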

And yes, Avisynth has the lumimask but it's disabled in the code. Muvsfunc has the variables k1 and k2, but at least they default to same values as the ones used in Avisynth version.

So there are quite a few ways to make the implementations differ, I guess there are similar small differences between the other implementations.

My opinion is that the default muvsfunc SSIM downsampling is not useful when comparing the quality of different script settings (like in the Zopti optimizer).
Old 18th February 2019, 21:08   #14  |  Link
WorBry
Thanks for the insights. That explains a lot.

Quote:
Originally Posted by zorr View Post
Also muvsfunc returns SSIM calculated on one plane only, by default the luma.
So presumably libvmaf is doing the same ?

Quote:
Originally Posted by zorr View Post
Avisynth SSIM has a plane argument which defaults to 0 and then it returns a weighted sum of the luma and chroma channels:
(0.8 * Y) + (0.1*(U+V))
Is that how ffmpeg calculates an aggregate 'All' SSIM score also ? Even if it's not applying Gaussian weights, obtaining individual scores for the Luma and U, V channels can be useful in assessing whether losses are occurring in the chroma only - for examining chroma subsampling efficiencies etc.

Quote:
Originally Posted by zorr View Post
My opinion is that the default muvsfunc SSIM downsampling is not useful when comparing the quality of different script settings (like in the Zopti optimizer).
Yes, I don't see there's anything to be gained in that context.
Old 18th February 2019, 22:38   #15  |  Link
WorBry
Question:

For conducting these tests I've been using VirtualDub2 to run the VS scripts and generate the result files.

I'd like to change to KingChaos's Portable (Flatpack) version in future. The changelog for the next update (2019-02-xx) promises to include:

Quote:
- Add VFW "install" script, so that VDub and co can read vpy files
https://forum.doom9.org/showthread.php?t=175529

Meanwhile, for running MDSI (and possibly Butteraugli) scripts, how do I vspipe the RGB24 output to ffmpeg as a null operation, purely to generate the results files ?

I suppose I could use VSEditor > Preview in place of VirtualDub2 but I can't see how to stop the playback looping when it comes to the end of the clip.

Last edited by WorBry; 18th February 2019 at 22:42.
Old 18th February 2019, 22:48   #16  |  Link
ChaosKing
Quote:
Originally Posted by WorBry View Post
I'd like to change to KingChaos's Portable (Flatpack) version in future. The changelog for the next update (2019-02-xx) promises to include:
I think I'm trapped in an alternate reality.
You can use the reg file for now https://forum.doom9.org/showthread.p...51#post1864051


Quote:
Originally Posted by WorBry View Post
I suppose I could use VSEditor > Preview in place of VirtualDub2 but I can't see how to stop the playback looping when it comes to the end of the clip.
Use the Benchmark (F7) instead.
Old 18th February 2019, 23:28   #17  |  Link
WorBry
Quote:
Originally Posted by ChaosKing View Post
I think I'm trapped in an alternate reality
Really, what's the weather like there ?

Quote:
Originally Posted by ChaosKing View Post
Use the Benchmark (F7) instead.
That's the one. Thanks.

Quote:
Originally Posted by ChaosKing View Post
Hadn't seen your other post about the reg edit. So is that basically what the 'VFW "Install" Script' will be doing ?

Last edited by WorBry; 18th February 2019 at 23:39.
Old 19th February 2019, 02:53   #18  |  Link
WorBry
Quote:
Originally Posted by zorr View Post
I looked at sources of.....Avisynth's SSIM (v0.25.1 by Mitsubishi).

...And yes, Avisynth has the lumimask but it's disabled in the code.
That's odd - I went back to check and re-run some of the tests with v0.24 and v0.25.1. The results I posted above were definitely with Lumimask=True applied in v0.24, and v0.25.1 gave the same results. You can see both the 'original' and weighted ('enhanced') SSIM scores displayed on the output frames as the script is played through VDub2 and the per-frame scores are listed in separate columns in the generated csv file. The text file however only gives the 'global' weighted score.

The Lumimask parameter may be disabled as an option in v0.25.1, but it's definitely being applied.

That said, setting Lumimask=False in v0.24 didn't radically change the results. I only ran a couple of tests:
Code:
            
            Lumimask=True     Lumimask=False
CRF0        100               100
CRF1        98.50             98.36
CRF12       87.17             86.64
CRF30       33.80             34.09
Old 19th February 2019, 13:06   #19  |  Link
ChaosKing
Quote:
Originally Posted by WorBry View Post
Really, what's the weather like there ?
Cloudy with a Chance of Meatballs

Quote:
Originally Posted by WorBry View Post
Hadn't seen your other post about the reg edit. So is that basically what the 'VFW "Install" Script' will be doing ?
Yes, it will add it to the registry with the correct path.
Old 19th February 2019, 16:13   #20  |  Link
WorBry
Quote:
Originally Posted by WorBry View Post
As for MDSI - results to follow.
The MDSI and GMSD results for the x264 and x265 series:





Interesting that the MDSI scores show a fairly linear relation with bitrate plotted as base 2 log. Encoding x264 at any fractional CRF value <1 of course defaults to lossless High444Predictive. Interesting also that the difference between the (bitrate matched) x264 and x265 score plots is fairly constant down to around 24 Mbps.
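
One way to put a number on "fairly linear" would be a least-squares fit of score against log2(bitrate). The sketch below is only that: the function name is hypothetical, and the demo points are generated from a known straight line purely to exercise the code, they are not the test results.

```python
import math


def fit_score_vs_log2_bitrate(bitrates, scores):
    """Least-squares line score = slope * log2(bitrate) + intercept, plus R^2."""
    xs = [math.log2(b) for b in bitrates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of a simple linear fit; a constant score series counts as a perfect fit.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Synthetic check only: points generated from score = 0.5 - 0.05 * log2(bitrate).
demo_bitrates = [2, 4, 8, 16, 32]
demo_scores = [0.5 - 0.05 * math.log2(b) for b in demo_bitrates]
print(fit_score_vs_log2_bitrate(demo_bitrates, demo_scores))
```

Fitting the x264 and x265 series separately and comparing the slopes and intercepts would also show whether the gap between the two plots really stays constant over the bitrate range.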

Those MDSI results were with downscale applied i.e.

Code:
zopti.addParams('mdsi', dict(down_scale=2))
With downscale turned off (the default)...

Code:
zopti.addParams('mdsi', dict(down_scale=1))
....the scores are lower and lose the linear relationship at the higher bitrates.



For those interested, here's the original paper for the MDSI (Mean Deviation Similarity Index) metric.

https://arxiv.org/pdf/1608.07433.pdf

This metric pools combined image gradient (sensitive to structural distortions) and chromaticity similarity maps.

Last edited by WorBry; 19th February 2019 at 16:41.