Old 4th March 2019, 21:51   #81  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
Quote:
Originally Posted by WorBry View Post
First thing to note is that the VMAF v4 scores are lower than the scores I obtained previously (with the exact same x264 encodes) with v3. The same default pool=1 (harmonic mean) setting was applied in both cases, so I can only assume this reflects changes in the VMAF model itself.
I see what the issue is now. When I ran the first series of tests with v3 I left it set to CI=False because v3 did not generate the aggregate CI scores (only the per-frame CI scores). Now that aggregate CI scores are available in v4 I set CI=True, which switches the model from 'vmaf_v0.6.1.pkl' to 'vmaf_b_v0.6.3.pkl'. Testing the x264 series again with v4 and CI=False, the aggregate VMAF scores are exactly the same as those obtained previously with v3.

So actually the issue is that CI=True (model vmaf_b_v0.6.3.pkl) is giving lower aggregate VMAF scores than CI=False (model vmaf_v0.6.1.pkl) ! Why is that ? Surely they should be giving the same VMAF score ?

Edit: There was no mix-up of the 'model' folders , btw - when I updated to v4 I replaced the VMAF.dll and 'model folder' that came with it.
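
For reference on the pool setting mentioned in the quote, here is a rough sketch of how arithmetic-mean and harmonic-mean pooling of per-frame scores differ. The +1 offset in the harmonic mean is my understanding of how libvmaf pools, so treat it as an assumption, and the per-frame scores are made up purely for illustration.

```python
from statistics import fmean


def pool_mean(scores):
    """Arithmetic mean of per-frame scores."""
    return fmean(scores)


def pool_harmonic_mean(scores):
    """Harmonic mean of per-frame scores.

    Assumption: a +1 offset is applied before inverting (and removed
    afterwards) so that a frame scoring 0 does not divide by zero.
    """
    return 1.0 / fmean([1.0 / (s + 1.0) for s in scores]) - 1.0


# Made-up per-frame scores, for illustration only.
frames = [95.0, 96.0, 60.0, 97.0]
print("mean:", pool_mean(frames))
print("harmonic mean:", pool_harmonic_mean(frames))
```

The harmonic mean weights low-scoring frames more heavily, so it never exceeds the arithmetic mean for the same frames.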

The above graphs amended accordingly:





__________________
Nostalgia's not what it used to be

Last edited by WorBry; 5th March 2019 at 02:41.
WorBry is offline  
Old 5th March 2019, 03:21   #82  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by WorBry View Post
So actually the issue is that CI=True (model vmaf_b_v0.6.3.pkl) is giving lower aggregate VMAF scores than CI=False (model vmaf_v0.6.1.pkl) ! Why is that ? Surely they should be giving the same VMAF score ?
I don't remember seeing any official document mentioning that the VMAF score between non-CI model and CI model should be the same
HolyWu is offline  
Old 5th March 2019, 05:40   #83  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
I guess the VMAF Confidence Interval doc does explain why there are differences:

https://github.com/Netflix/vmaf/blob...nf_interval.md

Note that the CI=False VMAF scores are within or at the limits of the CI95_High interval.

Edit: btw, testing the parallel series of x265 encodes with v4 CI=True gives exactly the same pattern.
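
As I read the doc, the CI model is a set of bootstrap-trained models: the reported score is the mean of their individual predictions, and the 95% interval comes from the 2.5th and 97.5th percentiles of those predictions. A minimal sketch under that reading follows. The function names and the prediction values are hypothetical and nothing here calls the actual VMAF code.

```python
from statistics import fmean


def percentile(values, q):
    """Linear-interpolation percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bootstrap_summary(predictions):
    """Summarize the predictions of the individual bootstrap models."""
    return {
        "bagging": fmean(predictions),               # score reported with CI enabled
        "ci95_low": percentile(predictions, 2.5),
        "ci95_high": percentile(predictions, 97.5),
    }


def within_ci(score, summary):
    """True if a score (e.g. from the non-CI model) lies inside the 95% interval."""
    return summary["ci95_low"] <= score <= summary["ci95_high"]


# Made-up predictions from five hypothetical bootstrap models.
summary = bootstrap_summary([80.0, 81.0, 82.0, 83.0, 84.0])
print(summary, within_ci(83.5, summary))
```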

Last edited by WorBry; 5th March 2019 at 19:35.
WorBry is offline  
Old 6th March 2019, 02:52   #84  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by WorBry View Post
I guess the VMAF Confidence Interval doc does explain why there are differences:
You are right. See #316.
HolyWu is offline  
Old 6th March 2019, 03:34   #85  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
Thanks for raising the issue. I have a better understanding of what's going on now.
WorBry is offline  
Old 11th March 2019, 10:59   #86  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
ran some control tests (same file compared against itself) via ffmpeg libvmaf and VS libvmaf to check whether both implementations return the same data.

since libvmaf only supports up to yuv444p10le, higher quality formats need to be down-converted - ffmpeg does that automatically.

4 sources used for the control tests: RGB48, yuv444p12le, yuv444p10le, yuv444p

VMAF SDK 1.3.14 - Model 0.6.1 - pool: mean

Code:
EXR RGB48          VMAF      Note
ffmpeg             98.2549   converts internally to yuv444p10le
VS (1)             97.747    converted to yuv444p10le via FMTC
VS (2)             97.7475   converted to yuv444p10le via resize.bicubic

MP4 yuv444p12le    VMAF      Note
ffmpeg             98.1216   converts internally to yuv444p10le
VS (1)             97.7106   converted to yuv444p10le via FMTC
VS (2)             97.7101   converted to yuv444p10le via resize.bicubic

MP4 yuv444p10le    VMAF      Note
ffmpeg             98.1044   no conversion needed
VS                 97.7144   no conversion needed

MP4 yuv444p        VMAF      Note
ffmpeg             97.7363   no conversion needed
VS                 97.7363   no conversion needed

While it can be expected that the 16bit and 12bit sources will not return the same VMAF scores (ffmpeg internal down-conversion may not match the chosen VS conversion method), it is interesting to see that only w/ the 8-bit src the VMAF scores match.

The VMAF scores of the 10-bit src (although no conversion being done) still differ.

what is the reason for that ?
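
For anyone wanting to repeat the ffmpeg side of a control test like this, below is a sketch that only builds the command line (it does not run it). The file names are placeholders. I have only used the `log_fmt` and `log_path` options of the libvmaf filter here; option names differ between FFmpeg builds, so check `ffmpeg -h filter=libvmaf` on your build before relying on it.

```python
import shlex


def libvmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an ffmpeg command that scores `distorted` against `reference`.

    The libvmaf filter takes the distorted clip as its first input and the
    reference as its second. Nothing is encoded: output goes to the null muxer.
    """
    filtergraph = f"[0:v][1:v]libvmaf=log_fmt=json:log_path={log_path}"
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", filtergraph,
        "-f", "null", "-",
    ]


# Control test: the same (placeholder) file is used as distorted and reference.
cmd = libvmaf_command("clip_yuv444p10le.mp4", "clip_yuv444p10le.mp4")
print(shlex.join(cmd))
```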
Iron_Mike is offline  
Old 11th March 2019, 16:32   #87  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by Iron_Mike View Post
The VMAF scores of the 10-bit src (although no conversion being done) still differ.

what is the reason for that ?
The FFmpeg filter doesn't normalize 10-bit input to the 8-bit range the way Netflix's code does for the calculation, hence the inconsistency.
HolyWu is offline  
Old 11th March 2019, 22:14   #88  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by HolyWu View Post
The FFmpeg filter doesn't normalize 10-bit input to the 8-bit range the way Netflix's code does for the calculation, hence the inconsistency.
which is the better approach if a 10-bit ref/src is provided: up-converting an encoded/distorted clip to 10-bit does not lose precision, but down-converting a 10-bit ref/src to 8-bit and then comparing it to the inferior 8-bit encode loses precision/accuracy...

NF does up-res a lower res encoded clip before comparing to the higher res ref/src (same logic), so this seems odd...

do you have a link to where they state that they downsample the master to 8bit ?

Thanks.
Iron_Mike is offline  
Old 13th March 2019, 04:48   #89  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
Finally got around to re-testing the Crowd Run 2160/50p x264 series that I kept from earlier tests with v3:

https://forum.doom9.org/showthread.p...16#post1865316

So this was testing with VapourSynth VMAF v4 in Model=1 mode which uses vmaf_4k_v0.6.1 by default (CI=False) and vmaf_4k_rb_v0.6.2 when set to CI=True.
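
To keep the four combinations straight, here is the model selection as described in this thread written out as a lookup table. This is only a summary sketch, not the plugin's code. The index 0 for the default HD/SD model is my assumption (the thread only names Model=1 for 4K), and the file names are copied as written in the posts.

```python
# (model, ci) -> model file, as reported in this thread.
# Assumption: model=0 is the default HD/SD model; model=1 is the 4K model.
MODEL_FILES = {
    (0, False): "vmaf_v0.6.1.pkl",
    (0, True): "vmaf_b_v0.6.3.pkl",
    (1, False): "vmaf_4k_v0.6.1",
    (1, True): "vmaf_4k_rb_v0.6.2",
}


def model_file(model, ci):
    """Return the model file reported for a given model index and CI setting."""
    try:
        return MODEL_FILES[(model, bool(ci))]
    except KeyError:
        raise ValueError(f"unknown model index: {model}") from None


print(model_file(1, False), model_file(1, True))
```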

Now in this case CI=False and CI=True produced the exact same aggregate VMAF scores, which came as a surprise:



Now how is that ? The Confidence Interval doc doesn't mention 4K models specifically but I would assume the 'rb' in 'vmaf_4k_rb_v0.6.2' means 'residue bootstrapping', in which case why is residue bootstrapping used to derive CI scores for 4K video, whereas the CI model for HD/SD (vmaf_b_v0.6.3) uses plain bootstrapping ? All rather confusing.

Last edited by WorBry; 13th March 2019 at 14:59.
WorBry is offline  
Old 16th March 2019, 04:12   #90  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by Iron_Mike View Post
do you have a link to where they state that they downsample the master to 8bit ?
Netflix doesn't explicitly mention that in the documentation. It's simply done this way in their source code.


Quote:
Originally Posted by WorBry View Post
Now in this case CI=False and CI=True produced the exact same aggregate VMAF scores, which came as a surprise:
I think the non-bootstrapping 4K model should have been named v0.6.2 rather than v0.6.1, as the model was released after VMAF algorithm v0.6.2. And the VMAF score won't be different between residue bootstrapping and plain bootstrapping. Only the CI-related scores will be affected.
HolyWu is offline  
Old 16th March 2019, 05:02   #91  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
Quote:
Originally Posted by HolyWu View Post
... And the VMAF score won't be different between residue bootstrapping and plain bootstrapping. Only the CI-related scores will be affected.
OK, but still - why in the 4K (2160/50p) tests does CI=True (vmaf_4k_rb_v0.6.2) give exactly the same aggregate VMAF scores as CI=False (vmaf_4k_v0.6.1), when in the 1080/50p tests CI=True (vmaf_b_v0.6.3.pkl) gave consistently lower aggregate VMAF scores than CI=False (vmaf_v0.6.1.pkl) ?

Last edited by WorBry; 16th March 2019 at 05:10.
WorBry is offline  
Old 16th March 2019, 05:22   #92  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by WorBry View Post
OK, but still - why in the 4K (2160/50p) tests does CI=True (vmaf_4k_rb_v0.6.2) give exactly the same aggregate VMAF scores as CI=False (vmaf_4k_v0.6.1), when in the 1080/50p tests CI=True (vmaf_b_v0.6.3.pkl) gave consistently lower aggregate VMAF scores than CI=False (vmaf_v0.6.1.pkl) ?
If vmaf_4k_v0.6.1 was actually trained with the same underlying environment as vmaf_4k_rb_v0.6.2, they are expected to give the same VMAF scores. vmaf_v0.6.1.pkl was trained with a different underlying environment from vmaf_b_v0.6.3.pkl, hence they don't give the same VMAF scores.
HolyWu is offline  
Old 16th March 2019, 06:03   #93  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
So why don't they update the 'classic' non-bootstrapping HD/SD model, trained in the same environment as vmaf_b_v0.6.3, so that CI=False and CI=True produce the same aggregate VMAF scores as well? Surely it's important to have consistent outcomes ?

Last edited by WorBry; 16th March 2019 at 06:05.
WorBry is offline  
Old 16th March 2019, 06:11   #94  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by WorBry View Post
So why don't they update the 'classic' non-bootstrapping HD/SD model, trained in the same environment as vmaf_b_v0.6.3, so that CI=False and CI=True produce the same aggregate VMAF scores as well? Surely it's important to have consistent outcomes ?
Can't answer this as I'm not the developer of Netflix/VMAF.
HolyWu is offline  
Old 16th March 2019, 06:38   #95  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,197
Fair enough
WorBry is offline  
Old 30th March 2019, 02:35   #96  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
already posted this in another thread, but thought I'd post VMAF results here as well

from a 16-bit EXR source (15 secs, 360 frames), made nine (9) x265 encodes, all CRF 10, in these formats (using Wolfberry ffmpeg build): yuv420p, yuv422p, yuv444p, yuv420p10le, yuv422p10le, yuv444p10le, yuv420p12le, yuv422p12le, yuv444p12le

VS VMAF results (sources were down-converted to yuv444p10, if higher, since that is the highest input format supported)



FFMPEG VMAF results (internally converts to yuv444p10, if higher source)




as you can see VMAF score indication is the same in both, but the SSIM/MS-SSIM differ... now besides that FFMPEG has that odd dip (scoring 8-bit higher than 10/12-bit), the VS VMAF results are almost flat...

so does VS VMAF internally convert everything to 8-bit (although it supports up to yuv444p10 input format) ?

Last edited by Iron_Mike; 30th March 2019 at 02:51.
Iron_Mike is offline  
Old 30th March 2019, 03:29   #97  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 5,345
Quote:
Originally Posted by Iron_Mike View Post

so does VS VMAF internally convert everything to 8-bit (although it supports up to yuv444p10 input format) ?

That's what HolyWu said, above - as per Netflix's source code

Quote:
Originally Posted by HolyWu View Post
Netflix doesn't explicitly mention that in the documentation. It's simply done this way in their source code.
And a difference is that ffmpeg's vmaf implementation does not
poisondeathray is offline  
Old 30th March 2019, 03:33   #98  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Quote:
Originally Posted by Iron_Mike View Post
as you can see VMAF score indication is the same in both, but the SSIM/MS-SSIM differ... now besides that FFMPEG has that odd dip (scoring 8-bit higher than 10/12-bit), the VS VMAF results are almost flat...

so does VS VMAF internally convert everything to 8-bit (although it supports up to yuv444p10 input format) ?
Yes. vmafossexec (the CLI of libvmaf) also does this normalization for 10-bit input. The normalized values are stored in floating-point, hence you needn't worry about precision loss. If you enable PSNR calculation in both VS libvmaf and FFmpeg libvmaf as well, you'll probably see a bigger difference.
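
A small sketch of why storing the normalized values as floats avoids losing precision. It assumes the normalization simply divides a 10-bit sample by 4 to bring it into the 8-bit range, which is my reading of the source rather than something documented. The function names are made up for the example.

```python
def normalize_to_8bit_range(sample, bits):
    """Scale an integer sample of the given bit depth into the 8-bit range, as a float."""
    return sample / float(1 << (bits - 8))


def truncate_to_8bit(sample, bits):
    """Integer down-conversion by dropping the low bits (for comparison only)."""
    return sample >> (bits - 8)


# Every 10-bit code value stays distinct after float normalization ...
floats = {normalize_to_8bit_range(v, 10) for v in range(1024)}
# ... whereas integer truncation collapses four code values into one.
ints = {truncate_to_8bit(v, 10) for v in range(1024)}
print(len(floats), len(ints))
```

So the scores are computed on an 8-bit scale, but the extra two bits of the 10-bit input survive as the fractional part.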
HolyWu is offline  
Old 30th March 2019, 07:49   #99  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
@HolyWu:

since everything gets down-converted to 8-bit internally, why are you guys not making 8-bit input mandatory in VS VMAF ?

I mean the 10-bit input support is pointless, and VS VMAF already requires same format for ref/dist, so the user is already required to convert in most cases before calling VS VMAF...


and btw, I mentioned this in the other thread:

when I use yuv444p (8-bit) as input format (coming from RGB48le) in VS VMAF compared to using yuv444p10 (10-bit), the range of VMAF/SSIM/MS-SSIM values is compressed (closer together)... since everything gets converted to 8-bit internally anyways, the range of values should pretty much be the same... unless the result of the filters I use to down-convert to 8-bit is substantially different to what VMAF uses internally... (I use fmtc or vs.resize)

alongside the other VMAF results, this is the result if I use 8-bit input w/ VS VMAF:


VS VMAF results (sources were down-converted to yuv444p)


Last edited by Iron_Mike; 30th March 2019 at 08:44.
Iron_Mike is offline  
Old 3rd April 2019, 15:39   #100  |  Link
HolyWu
Registered User
 
Join Date: Aug 2006
Location: Taiwan
Posts: 392
Update r5.
  • Accept clips of any planar format with integer sample type of 8-16 bit depth, except RGB. Note that libvmaf only uses luma plane for calculating scores.
  • Remove parameter psnr.
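
As a rough illustration of the first point, the acceptance rule could be written as the check below. This is a hypothetical sketch and not the plugin's actual code: the `ClipFormat` class and its field values are invented for the example, and every format is assumed to be planar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipFormat:
    """Hypothetical stand-in for a clip's format description."""
    color_family: str     # e.g. "YUV", "GRAY" or "RGB"
    sample_type: str      # "integer" or "float"
    bits_per_sample: int


def accepted_by_r5(fmt):
    """Mirror the r5 note: integer samples of 8-16 bits, any family except RGB."""
    return (
        fmt.color_family != "RGB"
        and fmt.sample_type == "integer"
        and 8 <= fmt.bits_per_sample <= 16
    )


print(accepted_by_r5(ClipFormat("YUV", "integer", 12)))
print(accepted_by_r5(ClipFormat("RGB", "integer", 16)))
```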
HolyWu is offline  