Old 1st March 2019, 15:19   #61  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Quote:
Originally Posted by Iron_Mike View Post
could you elaborate on the first two points of this update ?

does that mean we have to scale both inputs to 8bit before calculating VMAF, or does VMAF do it automatically ?

.....Also, how do I use a "stricter linear frame request" ?
These are internal improvements that were made in update r2 - you don't need to do anything.

Quote:
Originally Posted by Iron_Mike View Post
When I compare a 12bit 444 (yuv444p12le) CRF 10 x265 encode to the 16 bit exr ref footage I get a 96.4 VMAF score... little bit low considering the tests that WorBry has done...
Bear in mind though that all of those tests were done on a single source. Could be any number of factors weighing in there. What scores do you get if you test the ref and x265 clips against themselves ?
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 1st March 2019 at 15:57.
Old 1st March 2019, 16:43   #62  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 4,477
Quote:
Originally Posted by Iron_Mike View Post

When I compare a 12bit 444 (yuv444p12le) CRF 10 x265 encode to the 16 bit exr ref footage I get a 96.4 VMAF score... little bit low considering the tests that WorBry has done...

I'm not sure how valid that test would be. EXR is usually sRGB linear , and 16bit half float .

For any metric you usually need a common ground to compare. This means same pixel format (same colorspace, same bit depth, same chroma subsampling) . Otherwise you introduce other variables that are not controlled for. e.g. if one run uses one algorithm to scale (e.g. bicubic vs. bilinear, vs...) , or another dithers down using one algorithm, but another does not... or if you convert to RGB using different matrix, etc... there are many factors that invalidate your testing
Old 1st March 2019, 22:12   #63  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Not to mention the potential for frame shifts/misalignment when using different decoders for the reference and test clips, although the filter will report an error if the number of frames is different.

Also needs to be appreciated that the VMAF models are 'trained' for predicting perceptual quality at streaming bitrates primarily. Came across this quote from Netflix:

Quote:
"VMAF has been trained using encodes spanning from CRF 22 @ 1080 (highest quality) to CRF 28 @ 240 (lowest quality). The former is mapped to score 100 and the latter is mapped to score 20. Anything in between is mapped in the middle (for example, SD encode at 480 is typically mapped to 40 ~ 70)."
https://streaminglearningcenter.com/...flix-vmaf.html

I would assume a similar focus was applied in training the 4K model.

So at CRF 10 you are well into uncharted territory.

Personally, I'd be more inclined to look at other metrics available for VapourSynth that are (maybe) better attuned for VQA in the visually lossless domain - GMSD, MDSI and yes, SSIM....Butteraugli, possibly. My own journey of discovery in that vein continues:

https://forum.doom9.org/showthread.php?t=176101
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 1st March 2019 at 22:19.
Old 1st March 2019, 22:42   #64  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by WorBry View Post
Bear in mind though that all of those tests were done on a single source. Could be any number of factors weighing in there. What scores do you get if you test the ref and x265 clips against themselves ?
I ran all test w/ the ffmpeg libvmaf filter, but I assume since it uses the exact same VMAF models the result would be the same...

16bit EXR ref clip is 400 frames - control test to itself via 0.6.1 results in 98.2

x265 12bit 444 encode (from EXR), control tested to itself via 0.6.1 results in 98.08

x265 12bit 444 encode (from EXR) tested against its source (16 bit EXR) via 0.6.1 results in 96.4


interesting that the clip you tested had 399 out of 400 frames a perfect 100 in the control test...
Old 1st March 2019, 22:51   #65  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by poisondeathray View Post
I'm not sure how valid that test would be. EXR is usually sRGB linear , and 16bit half float .

For any metric you usually need a common ground to compare. This means same pixel format (same colorspace, same bit depth, same chroma subsampling) . Otherwise you introduce other variables that are not controlled for. e.g. if one run uses one algorithm to scale (e.g. bicubic vs. bilinear, vs...) , or another dithers down using one algorithm, but another does not... or if you convert to RGB using different matrix, etc... there are many factors that invalidate your testing
EXR contains whatever you put into it (has nothing to do with sRGB) - there are no assumptions here: this is the master of the movie in HD in lossless 16 bit EXR, I took 400 frames as a test sequence

for streaming the movie was encoded via ffmpeg and x265 from EXR (rgb48le) to x265 12bit 444 (yuv444p12le) - final result looks very good, we're using VMAF to compare various encodes (presets/CRF/etc) against each other... exact same as NF does it w/ VMAF... the master one delivers to NF is obviously also not 8bit 420, they encode from that (high quality) master for NF streaming...

so the "common ground" you state is the same movie in the same resolution, which is the only thing that NF states in their VMAF instructions...

the whole reason for comparison is different output bit depth w/ different output chroma subsampling, on top of different encoding settings, so I do not understand your point...

Last edited by Iron_Mike; 1st March 2019 at 22:53.
Old 1st March 2019, 22:59   #66  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by WorBry View Post
Not to mention the potential for frame shifts/misalignment when using different decoders for the reference and test clips, although the filter will report an error if the number of frames is different.
not sure what "different encoders" here relates to ?

ffmpeg reads an EXR frame and then reads a frame from the mp4 encode (that was done from the EXR via ffmpeg) and then passes the decoded frames to VMAF for comparison...

is there a setting to avoid frame shifts/misalignment ?


Quote:
Originally Posted by WorBry View Post
So at CRF 10 you are well into uncharted territory.
you yourself validated the test clip up to CRF 0 and results on the charts make sense... your encodes went up to VMAF 100... not sure I understand your concern for CRF 10 ?

Thanks

Last edited by Iron_Mike; 1st March 2019 at 23:07.
Old 1st March 2019, 23:09   #67  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Quote:
Originally Posted by Iron_Mike View Post
..interesting that the clip you tested had 399 out of 400 frames a perfect 100 in the control test...
For that particular source, yes...actually it was 499 out of 500 frames that scored 100 - it was just that first frame with the motion2 score of 0 that skewed the aggregate score. But I've yet to test other sources.

Go through the list of per-frame VMAF scores from your 'self' tests and you'll be able to identify which frames are skewing the aggregate score.
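
A minimal sketch of that per-frame check, assuming the scores have already been pulled out of the log into a plain Python list (the parsing step is left out because the log layout depends on the tool and the log format chosen). The function name, the threshold of 99.0 and the sample scores are all illustrative assumptions, not anything defined by VMAF:

```python
def frames_below(scores, threshold=99.0):
    """Return (frame_index, score) pairs for frames scoring below threshold."""
    return [(i, s) for i, s in enumerate(scores) if s < threshold]


# Hypothetical per-frame scores: a low first frame, all other frames at 100.
scores = [95.0] + [100.0] * 499

outliers = frames_below(scores)
for index, score in outliers:
    # These are the frames pulling the aggregate score down.
    print(f"frame {index}: {score}")
```

Lowering or raising the threshold changes how many frames get flagged; for a control test where most frames sit in the 97-99 range, a lower threshold would be needed to isolate the real outliers.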
__________________
Nostalgia's not what it used to be
Old 1st March 2019, 23:29   #68  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 4,477
Quote:
Originally Posted by Iron_Mike View Post

for streaming the movie was encoded via ffmpeg and x265 from EXR (rgb48le) to x265 12bit 444 (yuv444p12le) - final result looks very good, we're using VMAF to compare various encodes (presets/CRF/etc) against each other... exact same as NF does it w/ VMAF... the master one delivers to NF is obviously also not 8bit 420, they encode from that (high quality) master for NF streaming...

so the "common ground" you state is the same movie in the same resolution, which is the only thing that NF states in their VMAF instructions...

the whole reason for comparison is different output bit depth w/ different output chroma subsampling, on top of different encoding settings, so I do not understand your point...


You are posting in the vapoursynth vmaf thread. Only certain pixel formats are supported.

https://github.com/HomeOfVapourSynth...ourSynth-VMAF/
Quote:
Clips to calculate VMAF score. Only YUV420P8, YUV422P8, YUV444P8, YUV420P10, YUV422P10, and YUV444P10 are supported.
Those are the common pixel formats supported by vmaf.VMAF .
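
As a rough illustration of what that restriction means in practice, here is a small pre-flight check in plain Python. The format names are copied from the README excerpt above; the function itself is hypothetical, and the requirement that both clips share one format is an assumption in line with the "common ground" advice rather than something quoted from the README:

```python
# Format names copied from the README excerpt quoted above.
SUPPORTED_FORMATS = {
    "YUV420P8", "YUV422P8", "YUV444P8",
    "YUV420P10", "YUV422P10", "YUV444P10",
}


def check_clip_formats(reference_format, distorted_format):
    """Raise ValueError unless both format names are supported and identical."""
    for name in (reference_format, distorted_format):
        if name not in SUPPORTED_FORMATS:
            raise ValueError(f"{name} is not in the supported list")
    if reference_format != distorted_format:
        # Assumption: the two clips are expected to share one format.
        raise ValueError(
            f"format mismatch: {reference_format} vs {distorted_format}")
    return True
```

Under this check a 12-bit 4:4:4 clip (YUV444P12) would be rejected and would have to be converted first, and how that conversion is done is exactly one of the variables that needs to be kept the same for both clips.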


My point is to strive to be more scientific: to eliminate all those confounding variables in a controlled environment. How you perform the various conversions will affect the results that are calculated.

But now it's clear you're using ffmpeg vmaf. Did you look at the ffmpeg log to see what other conversions were occurring ? There might be other stuff going on behind your back


Quote:
Originally Posted by Iron_Mike View Post
ffmpeg reads an EXR frame and then reads a frame from the mp4 encode (that was done from the EXR via ffmpeg) and then passes the decoded frames to VMAF for comparison...

is there a setting to avoid frame shifts/misalignment ?
Sometimes ffmpeg can "mix" up frames, less often with I-frame formats. But if your x265 encode used long GOP, there is a higher chance of a mixup than if it used I-frames only. EXR sequence will be I-frame only

For vapoursynth , the source filter can be indexed, which is a more robust method for frame accuracy. For ffmpeg you can reset the PTS, which might help
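
One way the PTS reset could look, sketched as a Python helper that only builds the ffmpeg argument list (it does not run anything). The file names are hypothetical, the input order (encode first, reference second) is how I understand the libvmaf filter to expect it, and an image sequence input would additionally need its own frame rate option, which is not handled here:

```python
def build_vmaf_command(distorted_path, reference_path):
    """Build an ffmpeg argv list that resets PTS on both inputs before libvmaf."""
    filter_graph = (
        "[0:v]setpts=PTS-STARTPTS[dist];"   # restart timestamps of the encode
        "[1:v]setpts=PTS-STARTPTS[ref];"    # restart timestamps of the reference
        "[dist][ref]libvmaf"
    )
    return [
        "ffmpeg",
        "-i", distorted_path,   # first input: the encode under test
        "-i", reference_path,   # second input: the reference
        "-lavfi", filter_graph,
        "-f", "null", "-",      # discard the output; only the filter log matters
    ]


# Hypothetical file names, for illustration only.
print(" ".join(build_vmaf_command("encode.mp4", "reference.mov")))
```

Resetting the timestamps only removes a start-time offset between the two streams; it does not guarantee frame accuracy, so checking the per-frame scores for a suspicious drop is still worthwhile.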


Quote:
you yourself validated the test clip up to CRF 0 and results on the charts make sense... your encodes went up to VMAF 100... not sure I understand your concern for CRF 10 ?
Look at the results WorBry has been posting . They all plateau off below crf 18 or so. crf16 has the "same quality" as crf 10 or crf 1 if you blindly believe VMAF. ie. Everything looks "the same" to VMAF at higher bitrate ranges. ie. It's not a useful metric for distinguishing between higher quality streams - only for streaming lowish bitrate delivery ranges
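
To make the plateau idea concrete, here is a small sketch that finds the highest CRF whose score is still within a chosen tolerance of the best score in a series. The function name, the tolerance and the numbers are all made up for illustration; they are not results from this thread:

```python
def plateau_start(scores_by_crf, tolerance=0.5):
    """Return the highest CRF whose score is within tolerance of the best score."""
    best = max(scores_by_crf.values())
    within = [crf for crf, score in scores_by_crf.items()
              if best - score <= tolerance]
    return max(within)


# Made-up numbers for illustration only.
example = {0: 99.9, 10: 99.8, 16: 99.7, 18: 99.5, 22: 97.0, 28: 90.0}
print(plateau_start(example))  # prints 18
```

Everything at or below the returned CRF is indistinguishable to the metric at that tolerance, which is the sense in which the metric stops being useful for ranking the higher quality encodes.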

Last edited by poisondeathray; 1st March 2019 at 23:34.
Old 1st March 2019, 23:30   #69  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by WorBry View Post
Go through the list of per-frame VMAF scores from your 'self' tests and you'll be able to identify which frames are skewing the aggregate score.
I did that, only a few frames out of the 400 are VMAF 100 - all others have scores in the 97-99 range (in the EXR control test), hence my question...

This all seems normal considering that NF themselves state that in their FAQs but when I saw you get a perfect 100 in 499/500 frames I thought maybe they've updated their model to make control tests perform close to 100...
Old 1st March 2019, 23:40   #70  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Quote:
Originally Posted by Iron_Mike View Post
not sure what "different encoders" here relates to ?
I said 'decoders'.
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 2nd March 2019 at 04:16.
Old 1st March 2019, 23:42   #71  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by poisondeathray View Post
Those are the common pixel formats supported by vmaf.VMAF.
is this for the VS flavor or for VMAF in general... where did you get this list ?

Quote:
Originally Posted by poisondeathray View Post
But now it's clear you're using ffmpeg vmaf. Did you look at the ffmpeg log to see what other conversions were occurring ? There might be other stuff going on behind your back
since I wasn't sure whether the bitrate was an issue (for the VMAF calculation), I converted the main and ref clips to yuv444p (8bit) before passing them into ffmpeg libvmaf (by specifying the -pix_fmt)... ffmpeg VMAF will tell the format it uses to compare in the console output, for my main/ref clips (16bit/12bit) it defaults to yuv444p10le, but once u pass clips in as 8bit it uses that format...

the VMAF score whether using original bit depth, 12 bit, 10 bit or 8 bit for the main/ref clips was always ~ 96.x (real test, not control)

Quote:
Originally Posted by poisondeathray View Post
Sometimes ffmpeg can "mix" up frames, less often with I-frame formats. But if your x265 encode used long GOP, there is a higher chance of a mixup than if it used I-frames only. EXR sequence will be I-frame only
GOP size on the encode is a fixed 48 frames, fps is 24

Quote:
Originally Posted by poisondeathray View Post
Look at the results WorBry has been posting . They all plateau off below crf 18 or so. crf16 has the same quality as crf 10 or crf 1 if you blindly believe VMAF. ie. Everything looks "the same" to VMAF at higher bitrate ranges. ie. It's not a useful metric for distinguishing higher quality - only for streaming lowish bitrate delivery ranges
well, or in other words:
those results could easily be interpreted that from a certain CRF on, the encode is perceptually identical, which is the whole point of VMAF...

their samples are based on humans reporting perceived quality differences...

Last edited by Iron_Mike; 1st March 2019 at 23:46.
Old 1st March 2019, 23:48   #72  |  Link
ChaosKing
Registered User
 
Join Date: Dec 2005
Location: Germany
Posts: 1,440
Quote:
Originally Posted by Iron_Mike View Post
is this for the VS flavor or for VMAF in general... where did you get this list ?
Read the Readme: https://github.com/HomeOfVapourSynth...th-VMAF/#usage
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth
VapourSynth Portable FATPACK || VapourSynth Database || https://github.com/avisynth-repository
Old 1st March 2019, 23:50   #73  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 4,477
Quote:
Originally Posted by Iron_Mike View Post
is this for the VS flavor or for VMAF in general... where did you get this list ?
vapoursynth vmaf,

https://github.com/HomeOfVapourSynth...ourSynth-VMAF/


Quote:
since I wasn't sure whether the bitrate was an issue (for the VMAF calculation), I converted the main and ref clips to yuv444p (8bit) before passing them into ffmpeg libvmaf (by specifying the -pix_fmt)... ffmpeg VMAF will tell the format it uses to compare in the console output, for my main/ref clips (16bit/12bit) it defaults to yuv444p10le, but once u pass clips in as 8bit it uses that format...

the VMAF score whether using original bit depth, 12 bit, 10 bit or 8 bit for the main/ref clips was always ~ 96.x (real test, not control)
VMAF is probably less picky about those sorts of things, but it makes a significant difference on other metrics. There are a bunch of uncontrolled variables and operations there that can cause wildly different results with other metrics - how it's scaled, dithering algo, etc...





Quote:
well, or in other words:
those results could easily be interpreted that from a certain CRF on, the encode is perceptually identical, which is the whole point of VMAF...

their samples are based on humans reporting perceived quality differences...
Yes , that's a good way of phrasing it

I personally haven't used VMAF enough to be comfortable with it yet

I personally don't find that particularly useful. I guess it might be good enough for "joe public" , they might not be able to tell the difference. But you can bet people that deal frequently with encoding, codecs, compression ; ie. people that post here - they can tell the difference between say, a crf 10 vs. crf 18 encode.

Maybe a conspiracy theory, but it's almost like a Netflix scheme trying to justify their low delivery bitrate practices

Last edited by poisondeathray; 2nd March 2019 at 00:02.
Old 2nd March 2019, 01:18   #74  |  Link
Iron_Mike
Registered User
 
Join Date: Jul 2010
Posts: 132
Quote:
Originally Posted by poisondeathray View Post
I personally don't find that particularly useful. I guess it might be good enough for "joe public" , they might not be able to tell the difference. But you can bet people that deal frequently with encoding, codecs, compression ; ie. people that post here - they can tell the difference between say, a crf 10 vs. crf 18 encode.
well "joe public" ultimately watches the content... I can tell the diff between CRF 10 and CRF 18 (frame per frame pixel peeping), but I also evaluate content on fully calibrated screens...

the problem w/ "scientific metrics" is that they often do not relate much to the HVS (Human Visual System), which is the only thing that matters when humans watch the streamed content...

VMAF attempts to address that with their sample data... the questions always are whether enough people were sampled, what kind of people (gender/age/race/ethnicity - diff between European and Asian samples etc) and whether the sample procedure was done as well as possible...

Quote:
Originally Posted by poisondeathray View Post
Maybe a conspiracy theory, but it's almost like a Netflix scheme trying to justify their low delivery bitrate practices
hah ! probably the reason to start the project..
Old 2nd March 2019, 02:11   #75  |  Link
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 4,477
Quote:
Originally Posted by Iron_Mike View Post
well "joe public" ultimately watches the content... I can tell the diff between CRF 10 and CRF 18 (frame per frame pixel peeping), but I also evaluate content on fully calibrated screens...
It's an assumption . The audience might be videophiles, or doom9ers, or you might be doing these tests for yourself

Quote:
the problem w/ "scientific metrics" is that they often do not relate much to the HVS (Human Visual System), which is the only thing that matters when humans watch the streamed content...

Yes, pros/cons to every measure , but there are other HVS modelled metrics.


Quote:
VMAF attempts to address that with their sample data... the questions always are whether enough people were sampled, what kind of people (gender/age/race/ethnicity - diff between European and Asian samples etc) and whether the sample procedure was done as well as possible...
It's just that the RD curve characteristics limit VMAF's usefulness in some situations , since it's trained on higher CRF ranges.

So another way to phrase it is that the data set is not valid at higher bitrates. You cannot apply VMAF at higher bitrates because it was trained at CRF 22-28
Old 2nd March 2019, 02:18   #76  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Quote:
Originally Posted by Iron_Mike View Post
VMAF attempts to address that with their sample data... the questions always are whether enough people were sampled, what kind of people (gender/age/race/ethnicity - diff between European and Asian samples etc) and whether the sample procedure was done as well as possible...
And what biases were introduced into the model by the choice of video codecs used in the subjective DMOS testing. Hmmm

https://forum.doom9.org/showthread.p...37#post1867137

Makes me wonder.

https://www.reddit.com/r/netflix/com...for_hd_titles/
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 2nd March 2019 at 02:31.
Old 2nd March 2019, 04:18   #77  |  Link
HolyWu
Registered User
 
 
Join Date: Aug 2006
Location: Taiwan
Posts: 757
Update r4.
  • Update libvmaf to v1.3.14, which reports aggregate CI scores and fixes the empty model name in the log.
Old 2nd March 2019, 05:55   #78  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Cool. Should be interesting to see what statistical significance VMAF gives to those superfine score differences seen at the very high bitrates that I brought attention to earlier, which now, in light of the present discussion, I wish I hadn't

https://forum.doom9.org/showthread.p...24#post1865424

Seeing that comment from Netflix changed my perspective somewhat:

Quote:
"VMAF has been trained using encodes spanning from CRF 22 @ 1080 (highest quality) to CRF 28 @ 240 (lowest quality). The former is mapped to score 100 and the latter is mapped to score 20. Anything in between is mapped in the middle (for example, SD encode at 480 is typically mapped to 40 ~ 70)."
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 2nd March 2019 at 07:51.
Old 4th March 2019, 07:08   #79  |  Link
WorBry
Registered User
 
Join Date: Jan 2004
Location: Here, there and everywhere
Posts: 1,189
Quote:
Originally Posted by WorBry View Post
Cool. Should be interesting to see what statistical significance VMAF gives to those superfine score differences seen at the very high bitrates that I brought attention to earlier....
I've tested the v4 update with the Crowd Run 1080/50p x264 (CRF 0 - 30) encodes I retained from the first tests with v3:

https://forum.doom9.org/showthread.p...70#post1864770

Here are the VMAF results, together with the aggregate 95% confidence interval (CI95_Low and CI95_High) scores i.e. the aggregate derived from the individual frame confidence intervals. I didn't generate the component SSIM, MS-SSIM and PSNR scores.





First thing to note is that the VMAF v4 scores are lower than the scores I obtained previously (with the exact same x264 encodes) with v3. The same default pool=1 (harmonic mean) setting was applied in both cases, so I can only assume this reflects changes in the VMAF model itself.

And homed in on the higher bitrate range.



As noted in the v3 test series, the VMAF score for the lossless x264 CRF=0 encode (99.9954) didn't quite reach 100, and for the same reason - the component motion2=0 score for the first frame skewed an otherwise perfect 100 score for the other 499 frames.
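
To show how a single low frame pulls down an otherwise perfect aggregate, here is a small pooling sketch. The harmonic form with a +1 offset, 1/mean(1/(x+1)) - 1, is how I understand libvmaf's harmonic-mean pooling to work, so treat that formula as an assumption; the first-frame score of 95.0 is hypothetical and not a value from these tests:

```python
def arithmetic_pool(scores):
    """Plain arithmetic mean of the per-frame scores."""
    return sum(scores) / len(scores)


def harmonic_pool(scores):
    """Harmonic mean with a +1 offset so a score of 0 cannot divide by zero."""
    return len(scores) / sum(1.0 / (s + 1.0) for s in scores) - 1.0


# Hypothetical clip: one low first frame, 499 frames at a perfect 100.
scores = [95.0] + [100.0] * 499

print("arithmetic:", arithmetic_pool(scores))
print("harmonic:  ", harmonic_pool(scores))
```

Both pooled values end up just short of 100, and the harmonic pool is never higher than the arithmetic one, so it weights a low frame slightly more heavily.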

I've yet to test the parallel x265 series with VMAF v4 but looking at the aggregate CI scores obtained with the x264 files I think I can confidently say that what minor differences were seen at the high bitrates in the first test series are not statistically significant. Seems odd though that the CI95_Low intervals for the CRF 22 - 30 encodes are actually smaller than those of CRF 20 despite being beyond the scope of the trained vmaf_v0.6.1.pkl model. Would have thought they would be larger. I suppose it depends on the content and quality of the source/reference video also.
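
The kind of comparison meant here can be sketched as a simple interval overlap test. This is only a rough heuristic (overlapping 95% intervals suggest, but do not prove, that a difference is not significant), and the interval values below are hypothetical rather than taken from the charts:

```python
def intervals_overlap(low_a, high_a, low_b, high_b):
    """True if the closed intervals [low_a, high_a] and [low_b, high_b] share a point."""
    return low_a <= high_b and low_b <= high_a


# Hypothetical (CI95_Low, CI95_High) pairs for two encodes.
crf_10 = (99.1, 99.9)
crf_16 = (98.8, 99.7)
crf_28 = (88.0, 91.0)

print(intervals_overlap(*crf_10, *crf_16))  # overlapping intervals
print(intervals_overlap(*crf_10, *crf_28))  # clearly separated intervals
```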
__________________
Nostalgia's not what it used to be

Last edited by WorBry; 4th March 2019 at 17:42.
Old 4th March 2019, 07:48   #80  |  Link
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 789
I did my VMAF test on BPG files.
VMAF is already embedded in the SVT encoder, not as a JSON file tester.
Pictures for I-frames score better because they have a larger size at the same QP values for different encoders. So much for the topic of photos.
Ma should add the VMAF metric to the x265 codec.
http://forum.doom9.org/showthread.ph...19#post1867419

Last edited by Jamaika; 4th March 2019 at 07:50.