Old 4th March 2019, 21:51   #81  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
Quote:
Originally Posted by WorBry View Post
First thing to note is that the VMAF v4 scores are lower than the scores I obtained previously (with the exact same x264 encodes) with v3. The same default pool=1 (harmonic mean) setting was applied in both cases, so I can only assume this reflects changes in the VMAF model itself.
I see what the issue is now. When I ran the first series of tests with v3 I left it set to CI=False because it did not generate the aggregate CI scores (only the per-frame CI scores). Now that aggregate CI scores are available in v4 I set CI=True, which switches from the 'vmaf_v0.6.1.pkl' model to 'vmaf_b_v0.6.3.pkl'. Testing the x264 series again with v4 and CI=False, the aggregate VMAF scores are exactly the same as those obtained previously with v3.

So actually the issue is that CI=True (model vmaf_b_v0.6.3.pkl) is giving lower aggregate VMAF scores than CI=False (model vmaf_v0.6.1.pkl) ! Why is that ? Surely they should be giving the same VMAF score ?

Edit: There was no mix-up of the 'model' folders, btw - when I updated to v4 I replaced both the VMAF.dll and the 'model' folder that came with it.

The above graphs amended accordingly:





__________________
Nostalgia's not what it used to be

Last edited by WorBry; 5th March 2019 at 02:41.
Old 5th March 2019, 03:21   #82  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 538
Quote:
Originally Posted by WorBry View Post
So actually the issue is that CI=True (model vmaf_b_v0.6.3.pkl) is giving lower aggregate VMAF scores than CI=False (model vmaf_v0.6.1.pkl) ! Why is that ? Surely they should be giving the same VMAF score ?
I don't remember seeing any official document stating that the non-CI model and the CI model should give the same VMAF score.
Old 5th March 2019, 05:40   #83  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
I guess the VMAF Confidence Interval doc does explain why there are differences:

https://github.com/Netflix/vmaf/blob...nf_interval.md

Note that the CI=False VMAF scores are within or at the limits of the CI95_High interval.

Edit: btw, testing the parallel series of x265 encodes with v4 CI=True gives exactly the same pattern.
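
For anyone following along, here is a minimal sketch of how a bootstrapped model can report a different aggregate score plus a confidence interval. It assumes the reported score is the mean of the predictions from the bootstrapped models and that the CI95 limits are the 2.5th and 97.5th percentiles of those predictions; that is my reading of the idea, not something confirmed in this thread. All numbers and function names are made up for illustration.

```python
from statistics import fmean


def percentile(sorted_vals, q):
    """Linear-interpolated percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_summary(predictions):
    """Return (mean, ci95_low, ci95_high) over per-model predictions."""
    vals = sorted(predictions)
    return fmean(vals), percentile(vals, 2.5), percentile(vals, 97.5)


# Hypothetical predictions from ten bootstrapped models for one clip.
preds = [91.2, 92.8, 90.5, 93.1, 91.9, 92.4, 90.9, 92.0, 91.5, 93.4]
bagging, ci_low, ci_high = bootstrap_summary(preds)

# Hypothetical score from a separately trained single (non-CI) model.
single_model_score = 93.2

print(f"bagging={bagging:.2f} CI95=[{ci_low:.2f}, {ci_high:.2f}]")
print("single-model score inside CI:", ci_low <= single_model_score <= ci_high)
```

With these made-up numbers the single-model score sits above the bootstrapped mean but still inside the interval, which is the same kind of pattern as the CI=False scores landing near the CI95_High limit.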
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 5th March 2019 at 19:35.
Old 6th March 2019, 02:52   #84  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 538
Quote:
Originally Posted by WorBry View Post
I guess the VMAF Confidence Interval doc does explain why there are differences:
You are right. See #316.
Old 6th March 2019, 03:34   #85  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
Thanks for raising the issue. I have a better understanding of what's going on now.
__________________
Nostalgia's not what it used to be
Old 11th March 2019, 10:59   #86  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 95
Ran some control tests (same file compared against itself) via ffmpeg libvmaf and VS libvmaf to test whether both implementations return the same data.

since libvmaf only supports up to yuv444p10le, higher quality formats need to be down-converted - ffmpeg does that automatically.

4 sources used for the control tests: RGB48, yuv444p12le, yuv444p10le, yuv444p

VMAF SDK 1.3.14 - Model 0.6.1 - pool: mean
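
Since the pooling method changes the aggregate score, here is a rough sketch of the difference between 'mean' pooling (used here) and the harmonic mean pooling mentioned earlier in the thread. It uses the textbook harmonic mean from the Python standard library; libvmaf's own pooling may differ in detail, and the per-frame scores are hypothetical.

```python
from statistics import fmean, harmonic_mean

# Hypothetical per-frame VMAF scores; one bad frame among good ones.
frame_scores = [98.0, 97.5, 96.0, 60.0, 97.0]

arithmetic = fmean(frame_scores)        # "pool: mean"
harmonic = harmonic_mean(frame_scores)  # textbook harmonic mean

print(f"mean:          {arithmetic:.4f}")
print(f"harmonic mean: {harmonic:.4f}")
```

The harmonic mean is pulled down more by the low-scoring frame, so aggregate scores are only comparable when the same pooling is used on both sides.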

Code:
EXR RGB48          VMAF       Note
ffmpeg             98.2549    converts internally to yuv444p10le
VS (1)             97.747     converted to yuv444p10le via FMTC
VS (2)             97.7475    converted to yuv444p10le via resize.bicubic

MP4 yuv444p12le    VMAF       Note
ffmpeg             98.1216    converts internally to yuv444p10le
VS (1)             97.7106    converted to yuv444p10le via FMTC
VS (2)             97.7101    converted to yuv444p10le via resize.bicubic

MP4 yuv444p10le    VMAF       Note
ffmpeg             98.1044    no conversion needed
VS                 97.7144    no conversion needed

MP4 yuv444p        VMAF       Note
ffmpeg             97.7363    no conversion needed
VS                 97.7363    no conversion needed

While it can be expected that the 16-bit and 12-bit sources will not return the same VMAF scores (ffmpeg's internal down-conversion may not match the chosen VS conversion method), it is interesting that the VMAF scores match only with the 8-bit src.

The VMAF scores of the 10-bit src (although no conversion being done) still differ.

what is the reason for that ?
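
To put numbers on the gap, a small sketch that computes ffmpeg minus VS for each source. The scores are copied from the table above; the variable names are just for illustration.

```python
# Scores copied from the table above: source -> (ffmpeg, [VapourSynth results])
results = {
    "EXR RGB48":       (98.2549, [97.747, 97.7475]),
    "MP4 yuv444p12le": (98.1216, [97.7106, 97.7101]),
    "MP4 yuv444p10le": (98.1044, [97.7144]),
    "MP4 yuv444p":     (97.7363, [97.7363]),
}

# Difference ffmpeg - VS, rounded to the 4 decimals the scores were reported with.
deltas = {
    src: [round(ffmpeg - vs, 4) for vs in vs_scores]
    for src, (ffmpeg, vs_scores) in results.items()
}

for src, d in deltas.items():
    print(f"{src:<16} ffmpeg - VS = {d}")
```

The difference is zero only for the 8-bit source; every source above 8-bit shows ffmpeg scoring roughly 0.4 to 0.5 higher.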
Old 11th March 2019, 16:32   #87  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 538
Quote:
Originally Posted by Iron_Mike View Post
The VMAF scores of the 10-bit src (although no conversion being done) still differ.

what is the reason for that ?
The FFmpeg filter doesn't normalize 10-bit input to 8-bit for the calculation the way Netflix does, hence the inconsistency.
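
As a rough illustration of what normalizing 10-bit to 8-bit can mean, the sketch below contrasts two possibilities: rescaling to the 8-bit range in floating point (fractions are kept) versus an integer down-conversion that drops the two low bits. Which of these matches Netflix's code is not stated in this thread, so treat both helper functions as hypothetical.

```python
def normalize_10bit_float(sample: int) -> float:
    """Scale a 10-bit sample (0..1023) to the 8-bit range, keeping fractions."""
    return sample / 4.0


def truncate_10bit_int(sample: int) -> int:
    """Integer down-conversion: drop the two least significant bits."""
    return sample >> 2


# Five neighbouring 10-bit code values around mid-grey.
for s in (512, 513, 514, 515, 516):
    print(s, normalize_10bit_float(s), truncate_10bit_int(s))
```

With the floating-point rescale the neighbouring 10-bit values stay distinct; with integer truncation 512 to 515 all collapse to 128.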
Old 11th March 2019, 22:14   #88  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 95
Quote:
Originally Posted by HolyWu View Post
Because FFmpeg filter doesn't normalize 10 bit to 8 bit like what Netflix does for calculation, hence the inconsistency.
Which is the better approach?

If a 10-bit ref/src is provided, up-converting an encoded/distorted clip to 10-bit does not lose precision, but down-converting a 10-bit ref/src to 8-bit and then comparing it to the inferior 8-bit encode loses precision/accuracy...

NF does up-res a lower res encoded clip before comparing to the higher res ref/src (same logic), so this seems odd...

do you have a link to where they state that they downsample the master to 8bit ?

Thanks.
Old 13th March 2019, 04:48   #89  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
Finally got around to re-testing the Crowd Run 2160/50p x264 series that I kept from earlier tests with v3:

https://forum.doom9.org/showthread.p...16#post1865316

So this was testing with VapourSynth VMAF v4 in Model=1 mode which uses vmaf_4k_v0.6.1 by default (CI=False) and vmaf_4k_rb_v0.6.2 when set to CI=True.

Now in this case CI=False and CI=True produced the exact same aggregate VMAF scores, which came as a surprise:



Now how is that ? The Confidence Interval doc doesn't mention 4K models specifically but I would assume the 'rb' in 'vmaf_4k_rb_v0.6.2' means 'residue bootstrapping', in which case why is residue bootstrapping used to derive CI scores for 4K video, whereas the CI model for HD/SD (vmaf_b_v0.6.3) uses plain bootstrapping ? All rather confusing.
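
To keep the four cases straight, here is the model selection as described in this thread, written as a lookup. The file names are as given in the posts; using model=0 for the default HD/SD model is my assumption (the thread only names Model=1 for 4K), and the function is just an illustration, not the plugin's code.

```python
from typing import Dict, Tuple

# (model, ci) -> model name as written in this thread.
# model=0 for the default HD/SD model is an assumption.
MODEL_FILES: Dict[Tuple[int, bool], str] = {
    (0, False): "vmaf_v0.6.1.pkl",
    (0, True):  "vmaf_b_v0.6.3.pkl",
    (1, False): "vmaf_4k_v0.6.1",
    (1, True):  "vmaf_4k_rb_v0.6.2",
}


def model_file(model: int, ci: bool) -> str:
    """Return the model name used for a given model/CI combination."""
    return MODEL_FILES[(model, ci)]


for (model, ci), name in MODEL_FILES.items():
    print(f"model={model} CI={ci!s:<5} -> {name}")
```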
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 13th March 2019 at 14:59.
Old 16th March 2019, 04:12   #90  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 538
Quote:
Originally Posted by Iron_Mike View Post
do you have a link to where they state that they downsample the master to 8bit ?
Netflix doesn't explicitly mention that in the documentation. It's simply done this way in their source code.


Quote:
Originally Posted by WorBry View Post
Now in this case CI=False and CI=True produced the exact same aggregate VMAF scores, which came as a surprise:
I think the non-bootstrapping 4K model should have been named v0.6.2 rather than v0.6.1, as the model was released after VMAF algorithm v0.6.2. And the VMAF score won't be different between residue bootstrapping and plain bootstrapping. Only the CI-related scores will be affected.
Old 16th March 2019, 05:02   #91  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
Quote:
Originally Posted by HolyWu View Post
... And the VMAF score won't be different between residue bootstrapping and plain bootstrapping. Only the CI-related scores will be affected.
OK, but still - why in the 4K (2160/50p) tests does CI=True (vmaf_4k_rb_v0.6.2) give exactly the same aggregate VMAF scores as CI=False (vmaf_4k_v0.6.1), when in the 1080/50p tests CI=True (vmaf_b_v0.6.3.pkl) gave consistently lower aggregate VMAF scores than CI=False (vmaf_v0.6.1.pkl) ?
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 16th March 2019 at 05:10.
Old 16th March 2019, 05:22   #92  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 538
Quote:
Originally Posted by WorBry View Post
OK, but still - why in the 4K (2160/50p) tests does CI=True (vmaf_4k_rb_v0.6.2) give exactly the same aggregate VMAF scores as CI=False (vmaf_4k_v0.6.1), when in the 1080/50p tests CI=True (vmaf_b_v0.6.3.pkl) gave consistently lower aggregate VMAF scores than CI=False (vmaf_v0.6.1.pkl) ?
If vmaf_4k_v0.6.1 was actually trained in the same underlying environment as vmaf_4k_rb_v0.6.2, then they are expected to give the same VMAF scores. vmaf_v0.6.1.pkl was trained in a different underlying environment from vmaf_b_v0.6.3.pkl, hence they don't give the same VMAF scores.
Old 16th March 2019, 06:03   #93  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
So why don't they update the 'classic' non-bootstrapping HD/SD model, trained in the same environment as vmaf_b_v0.6.3, so that CI=False and CI=True produce the same aggregate VMAF scores as well? Surely it's important to have consistent outcomes ?
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 16th March 2019 at 06:05.
Old 16th March 2019, 06:11   #94  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 538
Quote:
Originally Posted by WorBry View Post
So why don't they update the 'classic' non-bootstrapping HD/SD model, trained in the same environment as vmaf_b_v0.6.3, so that CI=False and CI=True produce the same aggregate VMAF scores as well? Surely it's important to have consistent outcomes ?
Can't answer this as I'm not the developer of Netflix/VMAF.
Old 16th March 2019, 06:38   #95  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,134
Fair enough
__________________
Nostalgia's not what it used to be