Doom9's Forum > Capturing and Editing Video > VapourSynth
Old 1st February 2019, 22:25   #41  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
I've been testing the VapourSynth VMAF (r3) plugin with a high quality 1080/50p source (CrowdRun, lossless x264 8bit 420 Intra) encoded to x264 over a range of CRF values.

When testing the source file against itself (as a control), which should be lossless, I was surprised to find that the VMAF score is not 100.



Is this normal ?

Script:

Code:
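import vapoursynth as vs

core = vs.core  # vs.get_core() is deprecated in newer VapourSynth releases

# Load the lossless source; the same clip is used as reference and as test input (control)
clip = core.ffms2.Source(source=r'X:/CrowdRun_x264_lossless.mp4')

# model=0 is the default vmaf_v0.6.1 model; SSIM, MS-SSIM and PSNR are logged as well
result = core.vmaf.VMAF(clip, clip, ssim=True, ms_ssim=True, psnr=True, model=0, log_path=r'X:/VMAF_r3.log')
result.set_output()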
Also, the log reports VMAF version="1.3.11", not 1.3.13.

Quote:
Originally Posted by HolyWu View Post
Update r3.
Update libvmaf to v1.3.13, which includes performance improvement.
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 1st February 2019 at 22:28.
Old 1st February 2019, 22:29   #42  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 526
https://github.com/Netflix/vmaf/blob...is-there-a-bug

Quote:
A: VMAF does not guarantee that you get a perfect score in this case, but you should get a score close enough. Similar things would happen to other machine learning-based predictors (another example is VQM-VFD).
__________________
Search and denoise
Old 1st February 2019, 22:32   #43  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Thanks.
Old 2nd February 2019, 20:16   #44  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Quote:
Originally Posted by WorBry View Post

When testing the source file against itself (as a control), which should be lossless, I was surprised to find that the VMAF score is not 100.



Is this normal ?

Quote:
Originally Posted by ChaosKing View Post
Actually, looking at the per-frame scores in that same log, it is just the VMAF score for the first frame that skews the aggregate result, and it looks like it's the motion2 metric (which measures temporal difference) score of 0 that is responsible for that. All of the remaining 499 frames have a VMAF score of 100.




Perhaps there should be an option to exclude the first frame from the aggregate scores?
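Until such an option exists, the adjusted aggregate can be worked out from the XML log itself. Below is a minimal sketch, assuming the log keeps one `<frame>` element per frame with `frameNum` and `vmaf` attributes (as in the plugin's log) and that the aggregate is a simple mean. The function name `aggregate_vmaf` and the sample scores are made up for illustration; they are not real measurements.

```python
import xml.etree.ElementTree as ET


def aggregate_vmaf(log_xml, skip_first=False):
    """Mean of the per-frame VMAF scores, optionally ignoring frame 0."""
    root = ET.fromstring(log_xml)
    # Sort by frame number so "first frame" really means frameNum 0.
    frames = sorted(root.iter("frame"), key=lambda f: int(f.get("frameNum")))
    scores = [float(f.get("vmaf")) for f in frames]
    if skip_first:
        scores = scores[1:]
    if not scores:
        raise ValueError("no frames left to average")
    return sum(scores) / len(scores)


# Hypothetical three-frame log: only the first frame (motion2 = 0) scores below 100.
SAMPLE_LOG = """<VMAF version="1.3.11">
  <frames>
    <frame frameNum="0" motion2="0" vmaf="97.4" />
    <frame frameNum="1" motion2="8.4" vmaf="100" />
    <frame frameNum="2" motion2="8.1" vmaf="100" />
  </frames>
</VMAF>"""

print(aggregate_vmaf(SAMPLE_LOG))                   # all frames
print(aggregate_vmaf(SAMPLE_LOG, skip_first=True))  # first frame excluded
```

For a real log, read the file written to `log_path` and pass its contents in place of `SAMPLE_LOG`.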

Last edited by WorBry; 2nd February 2019 at 20:28.
Old 5th February 2019, 05:46   #45  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Quote:
Originally Posted by WorBry View Post
I've been testing the VapourSynth VMAF (r3) plugin with a high quality 1080/50p source (CrowdRun, lossless x264 8bit 420 Intra) encoded to x264 over a range of CRF values.
Interesting results, having not tested VMAF before.

Here I encoded the CrowdRun 1080/50p 'master' to x264 over CRF 0 - 30. This was using the default vmaf_v0.6.1.pkl model (i.e. Predict Quality on a 1080p HDTV screen at distance 3x the screen height). The VMAF, SSIM and MS-SSIM scores are the aggregate values. The 'classic' SSIM tests were run on Zeranoe ffmpeg win64-static nightly build (20190131).





Big difference in the libvmaf SSIM and ffmpeg SSIM scores. Apparently, the vmaf SSIM implementation "includes an empirical downsampling process, as described at the Suggested Usage section of https://ece.uwaterloo.ca/~z70wang/research/ssim/", whereas the FFMPEG implementation does not have this step:

https://github.com/Netflix/vmaf/issues/22
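To get a feel for how much that downsampling step matters, here is a rough sketch of the rule from the Suggested Usage section: F = max(1, round(N/256)), then average F x F blocks and decimate. Taking N as the smaller of width and height is an assumption, the helper names are made up, and libvmaf's actual low-pass filter may differ from a plain box average.

```python
def ssim_downsample_factor(width, height):
    """Suggested-usage factor F = max(1, round(N / 256)), N = smaller dimension."""
    n = min(width, height)
    return max(1, int(n / 256 + 0.5))  # round half up


def box_downsample(plane, factor):
    """Average factor x factor blocks of a 2-D list of pixel values."""
    rows = len(plane) // factor
    cols = len(plane[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                plane[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


print(ssim_downsample_factor(1920, 1080))  # factor for the 1080p series
print(ssim_downsample_factor(3840, 2160))  # factor for the 2160p series
print(box_downsample([[0, 2], [4, 6]], 2))
```

If the rule is applied this way, SSIM is effectively computed on a much smaller averaged picture, which would plausibly hide fine detail loss that the full-resolution ffmpeg SSIM still sees.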

As for the VMAF metric itself; well, I can appreciate its value in the context of 'perceptual quality'. In this example it effectively declares the x264 transcodes to be visually lossless from CRF 0 to around CRF 16, whereas the ffmpeg-SSIM scores show a progressive decline over the entire CRF/bitrate range.

And here I ran a parallel series encoded to x265 for comparison.



Clearly VMAF judges x265 to have significantly higher perceptual quality than x264 at the lower bitrate range and more so than revealed by SSIM.

That said, I think 'classic' (ffmpeg) SSIM is still a useful tool for analyzing fine differences at the pixel peeping level and beyond visual acuity, and (by virtue of the differential Y, U and V scores) for determining whether the luma and/or chroma are affected.

I did record the libvmaf and ffmpeg PSNR scores also, but they are not as interesting.

@HolyWu, btw, thanks for the plugin.

Last edited by WorBry; 6th February 2019 at 03:33.
Old 5th February 2019, 07:25   #46  |  Link
Boulder
Pig on the wing
 
Boulder's Avatar
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,494
Has anyone else noticed that the VMAF scores in some cases tend to be "too perfect" to measure?

https://forum.doom9.org/showthread.p...21#post1864721
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 5th February 2019, 23:09   #47  |  Link
lansing
Registered User
 
Join Date: Sep 2006
Posts: 1,051
Quote:
Originally Posted by WorBry View Post

Clearly VMAF judges x265 to have significantly higher perceptual quality than x264 at the lower bitrate range and more so than revealed by SSIM.
Good comparison to show that x265 really has no advantage over x264 on 1080p material if we're going for transparent encoding.

Now we'll just have to wait for people with high-end computers to do the 4K comparison.
Old 6th February 2019, 00:41   #48  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
I started off testing at the original (Crowd Run) 2160/50p resolution but could see I would be in for a long haul.
Old 6th February 2019, 13:58   #49  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 526
Quote:
Originally Posted by lansing View Post
I couldn't update it through vsrepo?
Updates via vsrepo will never be available immediately. In addition, the new version was not recognized by the update script, so it had to be done by hand. After that, Myrsloik needs to upload the newly compiled repo file to his site. There are many steps, as you can see.

But it's available now
Old 7th February 2019, 07:52   #50  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Quote:
Originally Posted by WorBry View Post
Actually, looking at the per-frame scores in that same log, it is just the VMAF score for the first frame that skews the aggregate result, and it looks like it's the motion2 metric (which measures temporal difference) score of 0 that is responsible for that. All of the remaining 499 frames have a VMAF score of 100.




Perhaps there should be an option to exclude the first frame from the aggregate scores?
Then again, that's not always the case. Here the original 2160/50p Crowd Run (8bit 420, y4m) reference clip was encoded to x264 CRF=0 (i.e. lossless, which switches to qp 0 and the High 4:4:4 Predictive profile), and the two clips were compared with the VMAF r3 plugin using model=1:

Code:
<VMAF version="1.3.11">
<params model="" scaledWidth="3840" scaledHeight="2160" subsample="1" num_bootstrap_models="0" bootstrap_model_list_str="" />
<fyi numOfFrames="500" aggregateVMAF="100" aggregatePSNR="60" aggregateSSIM="1" aggregateMS_SSIM="1"......
....
<frame frameNum="0" adm2="1" motion2="0" ms_ssim="1" psnr="60" ssim="1" vif_scale0="1" vif_scale1="0.999999" vif_scale2="0.999999" vif_scale3="0.999998" vmaf="100" />
<frame frameNum="1" adm2="1" motion2="8.42311" ms_ssim="1" psnr="60" ssim="1" vif_scale0="1" vif_scale1="0.999999" vif_scale2="0.999999" vif_scale3="0.999998" vmaf="100" />
So I guess you have to let it do its thing and take the scores as they come.

Last edited by WorBry; 7th February 2019 at 08:14.
Old 11th February 2019, 05:20   #51  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Quote:
Originally Posted by lansing View Post
Good comparison to show that x265 really has no advantage over x264 on 1080p material if we're going for transparent encoding.

Now we'll just have to wait for people with high-end computers to do the 4K comparison.
Done. I ran a parallel test series using the original 2160/50p Crowd Run clip (8bit 420 y4m) as source and reference for the metric tests. For the VMAF tests I used the VapourSynth (r3) plugin with the vmaf_4k_v0.6.1.pkl model (model=1), which aims to "predict the subjective quality of video displayed on a 4KTV and viewed from the distance of 1.5 times the height of the display device (1.5H)".

The x264 results:





Interesting that the shape of the ffmpeg-SSIM vs bitrate plot is quite different from that in the 1080/50p series, and the differential between the VMAF and ffmpeg-SSIM scores is larger. The x265 encodes show the same behaviour:



Again the VMAF scores deem that x265 has higher perceptual quality over the lower bitrate range.

As to whether there is an advantage over x264 for 'transparent' encoding; well, I looked more closely at the point at which the VMAF plots hit the maximum score of 100.



For x264 it was at CRF=8 (1296 Mbps) and for x265 at CRF=10 (976 Mbps). So on that basis it could be concluded that x265 is significantly more efficient. That said, if you look at the per-frame VMAF scores, it is clear that the first frame skews the outcome somewhat.

Taking the x264 series first; going from CRF=9 to 16, all frames bar the first frame in each test scored VMAF=100. And in the x265 series also, going from CRF 11 to 17 only the first frame scored less than VMAF=100:



So, if the aggregate VMAF scores are calculated with the first frame excluded (simple average across the remaining 499 frames), CRF=16 (392 Mbps) becomes the point at which VMAF=100 is reached in the x264 series, and CRF=17 (306 Mbps) in the x265 series:



Makes quite a difference. x265 still has the edge on bit savings, but not by as much. I don't have time to calculate 'adjusted' aggregate VMAF scores for the other (lower bitrate) CRF data points. In the 1080/50p series an aggregate VMAF=100 score was never attained for precisely the same reason - the VMAF score of the first frame skewed the aggregate score.

Here are the ffmpeg-SSIM scores obtained in the 2160/50p series at these 'significant' CRF points though:



Edit:
Quote:
Originally Posted by WorBry View Post
Interesting that the shape of ffmpeg-SSIM vs bitrate plot is quite different to that in the 1080/50p series and the differential between the VMAF and ffmpeg-SSIM scores is larger...
I'll maybe see how the 2160/50p and 1080/50p series compare when plotted against bits/pixel.
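The thread does not spell out how bits/pixel is derived, so the sketch below shows the two common normalizations side by side: bitrate divided by the pixels in one frame, and bitrate divided by the pixels delivered per second. Both the helper name `bits_per_pixel` and the reading of Mbps as 10^6 bits/s are assumptions.

```python
def bits_per_pixel(bitrate_mbps, width, height, fps=None):
    """Normalise a bitrate by frame size, and by frame rate if fps is given."""
    bits_per_second = bitrate_mbps * 1_000_000  # assumes Mbps = 10^6 bits/s
    pixels = width * height
    if fps is not None:
        pixels *= fps  # pixels per second instead of pixels per frame
    return bits_per_second / pixels


# x264 CRF=16 figure quoted above: 392 Mbps at 3840x2160, 50 fps.
print(round(bits_per_pixel(392, 3840, 2160), 2))          # per pixel of frame area
print(round(bits_per_pixel(392, 3840, 2160, fps=50), 3))  # per pixel per frame
```

Either form puts the 2160p and 1080p series on a common axis; they differ only by the constant factor of the frame rate when both series run at 50 fps.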

Last edited by WorBry; 11th February 2019 at 06:28.
Old 11th February 2019, 08:00   #52  |  Link
lansing
Registered User
 
Join Date: Sep 2006
Posts: 1,051
Quote:
Originally Posted by WorBry View Post
x265 still has the edge on bit savings, but not by as much.
I would say there's no advantage at all. The difference is like 0.03 between the two scores at crf 17. I thought it would be like a 5 or 6 point difference...
Old 11th February 2019, 17:49   #53  |  Link
VS_Fan
Registered User
 
Join Date: Jan 2016
Posts: 79
Quote:
Originally Posted by WorBry View Post
So, if the aggregate VMAF scores are calculated with the first frame excluded (simple average across the remaining 499 frames), CRF=16 (392 Mbps) becomes the point at which VMAF=100 is reached in the x264 series, and CRF=17 (306 Mbps) in the x265 series ... x265 still has the edge on bit savings, but not by as much.
Considering your criteria to obtain a 'transparent' encode, with x265 you are reducing the required bitrate from 392 (with x264) to 306 Mbps; you get 28% savings in storage space and/or streaming bandwidth, that's a significant amount!
Old 11th February 2019, 18:04   #54  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 526
You save 22%, not 28%, which is still very good.
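Both figures come out of the same two bitrates; they just use different denominators. A quick sketch with the 392 and 306 Mbps values quoted above (the function names are only for illustration):

```python
def percent_saving(reference_mbps, candidate_mbps):
    """Bitrate saved by the candidate, relative to the reference."""
    return 100.0 * (reference_mbps - candidate_mbps) / reference_mbps


def percent_overhead(reference_mbps, candidate_mbps):
    """Extra bitrate the reference needs, relative to the candidate."""
    return 100.0 * (reference_mbps - candidate_mbps) / candidate_mbps


x264_mbps, x265_mbps = 392, 306
print(f"x265 saves {percent_saving(x264_mbps, x265_mbps):.1f}% relative to x264")
print(f"x264 needs {percent_overhead(x264_mbps, x265_mbps):.1f}% more than x265")
```

The saving in storage or bandwidth is the first number (difference divided by the x264 bitrate); the 28% figure is how much more x264 needs than x265.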
Old 11th February 2019, 18:25   #55  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
For better (more valid ?) interpretation of these fine differences at high bitrates I'm thinking now it might have been prudent to run these tests with the 'Confidence Interval' (ci) option.

That said, I see that the log generates ci95_high, ci95_low and stddev values for the individual frames but does not derive aggregate values as I was expecting - according to the VDK documentation, the command line tool reports aggregate values:

https://github.com/Netflix/vmaf/blob...nf_interval.md

So is this a limitation of the libvmaf implementation ?
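In the meantime, aggregate values can be approximated from the per-frame entries. This is only a sketch: it assumes the per-frame attributes carry the names mentioned above (`ci95_low`, `ci95_high`, `stddev`) and that a plain mean over frames is an acceptable aggregate, which may not match exactly how the command line tool pools them. The sample values are invented.

```python
import xml.etree.ElementTree as ET


def mean_frame_attributes(log_xml, names):
    """Average the named per-frame attributes over all <frame> elements."""
    root = ET.fromstring(log_xml)
    frames = list(root.iter("frame"))
    if not frames:
        raise ValueError("log contains no <frame> elements")
    return {
        name: sum(float(f.get(name)) for f in frames) / len(frames)
        for name in names
    }


# Hypothetical two-frame log with confidence-interval attributes.
SAMPLE_LOG = """<VMAF version="1.3.11">
  <frames>
    <frame frameNum="0" vmaf="98.0" ci95_low="97.0" ci95_high="99.0" stddev="0.5" />
    <frame frameNum="1" vmaf="100" ci95_low="99.0" ci95_high="100" stddev="0.3" />
  </frames>
</VMAF>"""

print(mean_frame_attributes(SAMPLE_LOG, ["vmaf", "ci95_low", "ci95_high", "stddev"]))
```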

Last edited by WorBry; 11th February 2019 at 21:38.
Old 12th February 2019, 07:17   #56  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Quote:
Originally Posted by WorBry View Post
I'll maybe see how the 2160/50p and 1080/50p series compare when plotted against bits/pixel.
The VMAF results:



Perhaps not surprising, given that the vmaf_4k_v0.6.1 model predicts the subjective quality of video displayed on a 4KTV and viewed from a distance of 1.5 times the height of the screen, whereas the vmaf_v0.6.1 model predicts the subjective quality of video displayed on a 1080p HDTV screen at a distance of 3 times the screen height.

Quote:
Originally Posted by WorBry View Post
In the 1080/50p series an aggregate VMAF=100 score was never attained for precisely the same reason - the VMAF score of the first frame skewed the aggregate score.


As seen there, the maximum VMAF score achieved in the 1080/50p series was 99.947, with the lossless (CRF=0) x264 encode.

What intrigues me more are the FFMPEG-SSIM results:



It's reasonable to assume that down-scaling of the original 2160/50p Crowd Run clip for the 1080/50p tests incurred some loss of fidelity in the 1080/50p source (and reference) clip, making it more 'compressible'. But why is the differential between the bit-matched 1080p and 2160p SSIM scores so much larger at 32-64 bits/pixel than it is down at around 6-8 bits/pixel ?

Last edited by WorBry; 13th February 2019 at 05:55.
Old 12th February 2019, 08:26   #57  |  Link
HolyWu
Registered User
 
HolyWu's Avatar
 
Join Date: Aug 2006
Location: Taiwan
Posts: 521
Quote:
Originally Posted by WorBry View Post
So is this a limitation of the libvmaf implementation ?
Yes. I just sent a pull request to improve that.
Old 12th February 2019, 17:47   #58  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Great. Thanks.

https://github.com/Netflix/vmaf/pull/304

Last edited by WorBry; 12th February 2019 at 19:50.
Old 12th February 2019, 23:42   #59  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,009
Quote:
Originally Posted by WorBry View Post
I did record the libvmaf and ffmpeg PSNR scores also, but they are not as interesting.
Actually they are quite interesting.

PSNR scores obtained with the VapourSynth VMAF filter:



60 is the maximum score, achieved only with the lossless x264 CRF=0 encodes.

And the equivalent ffmpeg PSNR results:



I excluded the CRF=0 encode results because ffmpeg-PSNR reports lossless as Infinity (Inf).
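The two behaviours are easy to reproduce from the usual PSNR definition, 10*log10(peak^2 / MSE). The sketch below is not either tool's actual code: the 60 dB cap is simply the maximum observed above, and treating it as a clamp on the result is an assumption.

```python
import math


def psnr_db(mse, peak=255.0, cap=None):
    """PSNR in dB for 8-bit video; identical frames give infinity unless capped."""
    if mse <= 0:
        value = math.inf  # zero error: the ratio is unbounded
    else:
        value = 10.0 * math.log10(peak * peak / mse)
    return value if cap is None else min(value, cap)


print(psnr_db(0.0))            # uncapped, like the Inf reported for lossless
print(psnr_db(0.0, cap=60.0))  # capped, like the maximum score of 60
print(round(psnr_db(1.0), 2))  # an MSE of 1 on 8-bit data
```

A cap keeps lossless results plottable alongside the rest of the series, which is why the CRF=0 points can stay in one chart but have to be dropped from the other.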

The libvmaf PSNR scores are in general a little lower than the ffmpeg PSNR scores but show the same overall pattern. The 1080/50p series encodes gave higher bit-matched SSIM scores than the 2160/50p series at the higher bit/pixel range but at the lower end (< 24bits/pixel), that is reversed. Interesting also that the libvmaf PSNR metric gives wider separation of the x264 and x265 scores in the higher bit/pixel domain.

Last edited by WorBry; 13th February 2019 at 05:48.
All times are GMT +1. The time now is 07:58.

