Old 18th April 2019, 18:53   #1601  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by IgorC View Post
Xiph update on one year of AV1 (slides from NAB 2019) [PDF]
https://people.xiph.org/~negge/NAB2019.pdf
There certainly has been a ton of progress in the last year!

Although I am again frustrated by the lack of a real apples-to-apples subjective quality comparison. The only one given was libaom versus HEVC HM for ultra low latency (Slide 38). I don't know that the HM is even optimized for low latency; libaom has a lot more rate control than the typical reference encoder.

And even then, we can see that while Y-PSNR showed a bitrate increase of 5%, subjective MOS testing showed a decrease of 4%. Metrics are not closely coupled!

The VVC JEM, conversely, showed a 32% decrease for Y-PSNR and 30% decrease for MOS; much better correlated.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 18th April 2019, 19:09   #1602  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by SmilingWolf View Post
Cmdlines:
x264 --preset veryslow --tune ssim --crf 16 -o test.x264.crf16.264 orig.i420.y4m
x265 --preset veryslow --tune ssim --crf 16 -o test.x265.crf16.hevc orig.i420.y4m
Why --tune ssim if targeting VMAF?

We know that --tune ssim looks subjectively worse than --tune film in x264 and not using --tune at all in x265.

Quote:
vpxenc --codec=vp9 --frame-parallel=0 --tile-columns=1 --auto-alt-ref=6 --good --cpu-used=0 --tune=psnr --passes=2 --threads=2 --end-usage=q --cq-level=20 --test-decode=fatal --ivf -o test.vp9.cq20.ivf orig.i420.y4m
And why a different tune, PSNR, here?

Quote:
SvtAv1EncApp.exe -i orig.i420.yuv -b test.svtav1.cq20.ivf -w 1280 -h 720 -q 20 -enc-mode 3 -fps-num 24000 -fps-denom 1001 -intra-period 23
aomenc --frame-parallel=0 --tile-columns=1 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --threads=2 --row-mt=1 --end-usage=q --cq-level=20 --test-decode=fatal -o test.av1.cq20.webm orig.i420.y4m
VMAF: model used: vmaf_v0.6.1, pooling: harmonic_mean
Also PSNR.

It seems like the same tuning should be used across all encoders! Although tuning for a given metric and then comparing with that metric is more a test of mathematical correctness of rate control than something that says much about viewer experience.
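For anyone unfamiliar with the "pooling: harmonic_mean" note in the quoted VMAF line, here is a rough sketch of what harmonic-mean pooling of per-frame scores does compared with a plain average. The +1 offset and the per-frame numbers are assumptions for illustration, not taken from the test above or confirmed against the VMAF tooling.

```python
from statistics import mean


def harmonic_mean_pool(frame_scores, offset=1.0):
    """Pool per-frame scores with a harmonic mean.

    The offset shifts scores away from zero so a frame scoring 0 cannot
    cause a division by zero. The value 1.0 is an assumption.
    """
    reciprocals = [1.0 / (s + offset) for s in frame_scores]
    return 1.0 / mean(reciprocals) - offset


# Hypothetical per-frame scores: three good frames and one bad one.
scores = [95.0, 94.0, 96.0, 40.0]

print("arithmetic mean:", round(mean(scores), 2))
print("harmonic mean:  ", round(harmonic_mean_pool(scores), 2))
```

The harmonic mean is never higher than the arithmetic mean, so a few bad frames pull the pooled score down more than they would with simple averaging.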

We've seen data that shows libaom underperforms HM and particularly the VVC JEM in subjective metrics versus objective metrics. I'm guessing because libaom has baked in a lot of VMAF-tuned optimizations.

The gold standard for AV1's current competitiveness would be a double-blind comparison of subjective quality at the same total encoding time.

I guess I'm unsure what exactly the goal of these particular tests is, or how they are expected to be fruitfully applied.

Double-blind testing is a whole lot of work, but inescapably necessary at this point in the codec universe. Things are going to be crazy over the next few years with H.264, HEVC, and AV1 today and with VVC, EVC, and AV2 on the horizon. VMAF is going to need a data set with subjective tests of the "flavor" of artifacts each produces to be able to make good inter-codec quality comparisons.

It'd be nice to know the relative encoding times as well.
Old 18th April 2019, 19:29   #1603  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
--tune ssim on both x264 and x265 gave the best objective metrics scores for both PSNR-HVS-M and MS-SSIM (more than --tune psnr even when measuring PSNR-HVS-M).
VMAF has not been tested because it was added to the encoding and scoring pipeline after I carried out the tune tests.

--tune psnr in the libvpx and libaom cmdlines is there as more of a way to make it explicit.
You can consider the whole "--tune" thing in those encoders as either a joke or a misnomer: instead of turning some knobs like they do on the x26X encoders, they set the RDO metric used during encoding.
To add insult to injury, in libaom out of 4 tunes (psnr, ssim, cdef-dist, daala-dist) 2 of them are usable only in single-threaded builds and give terrible results, and ssim is not even implemented, leaving --tune psnr as the only available one.
And it's not a question of setting it or leaving it alone, --tune psnr is the default and there is no way to change or unset it. Whatever you do, whether you know or not, if you encode with libaom you're using --tune psnr.

Relative encoding times are unavailable (or, rather, unreliable) because the machine is under varying load: I use it while the encodes are running, and have set the pipeline to leave me at least a couple of free cores at all times.

Last edited by SmilingWolf; 18th April 2019 at 19:34.
Old 19th April 2019, 00:18   #1604  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Short decoding speed test on 10 year old Intel T3400 (2C2T laptop CPU, SSSE3, no SSE4)(both zeranoe's ffmpeg 20190417-8a3ed5a-win64-static, dav1d 20190410-44d0de4 -threads 4 -tilethreads 2), Chimera 720p 8 bit:
libaom: 749.241 (12 fps)
dav1d: 293.281 (30 fps)
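To put the two results side by side, a quick ratio calculation. The unit of the first number is not stated in the post; the sketch assumes it is elapsed time in seconds.

```python
# Figures as posted: (first number, average fps).
# Assumption: the first number is elapsed time in seconds.
results = {
    "libaom": (749.241, 12),
    "dav1d": (293.281, 30),
}

# How many times longer libaom took than dav1d.
time_ratio = results["libaom"][0] / results["dav1d"][0]

# How many times more frames per second dav1d decoded.
fps_ratio = results["dav1d"][1] / results["libaom"][1]

print(f"time ratio: {time_ratio:.2f}x, fps ratio: {fps_ratio:.2f}x")
```

Both ratios land at roughly 2.5x in dav1d's favour on this machine and clip.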
Old 19th April 2019, 00:37   #1605  |  Link
Nintendo Maniac 64
Registered User
 
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
Quote:
Originally Posted by sneaker_ger View Post
Short decoding speed test:
libaom: 749.241 (12 fps)
dav1d: 293.281 (30 fps)
OK, just how are you going about benchmarking this?

Last time I inquired about this, the best way was pretty much just trial and error by using something like mkvtoolnix to set a given frame rate and then use madvr's OSD to see if there were any dropped frames while playing it back.



Quote:
Originally Posted by sneaker_ger View Post
10 year old Intel T3400 (2C2T laptop CPU, SSSE3, no SSE4)
Kind of odd that the CPU was released in late 2008 when it uses the same architecture (Merom) as the original Conroe/Merom (65nm) Core 2 Duo from 2006 (though being a Pentium it has less L2 cache).

Weirder yet considering that the Wolfdale/Penryn 45nm Intel CPUs were available by then, and Nehalem was even available on desktop.
__________________
____HTPC____  | __Desktop PC__
2.93GHz Xeon x3470 (4c/8t Nehalem) | 4.5GHz 1.24v dual-core Haswell G3258
Radeon HD5870  | Intel iGPU      
2x2GB+2x1GB DDR3-1333 | 4x4GB DDR3-1600       
Old 19th April 2019, 00:39   #1606  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Quote:
Originally Posted by Nintendo Maniac 64 View Post
OK, just how are you going about benchmarking this?

Last time I inquired about this, the best way was pretty much just trial and error by using something like mkvtoolnix to set a given frame rate and then use madvr's OSD to see if there were any dropped frames while playing it back.
This is just using ffmpeg -benchmark and the fps values are averages. I didn't test for framedrops during difficult scenes.
Old 19th April 2019, 02:45   #1607  |  Link
soresu
Registered User
 
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
Hmmm, the commit here on the libaom experimental branch has the title "Add comparison between cnn and cdef/restoration."

I wonder if this means they are targeting an ML tool to replace CDEF, which wouldn't surprise me considering how Tim Terriberry mentioned CDEF being evaluated for a more efficient replacement during the latter stages of AV1 development.
Old 19th April 2019, 05:36   #1608  |  Link
Nintendo Maniac 64
Registered User
 
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
Quote:
Originally Posted by sneaker_ger View Post
This is just using ffmpeg -benchmark
I'll be honest, I'm actually completely unfamiliar with using ffmpeg...I am at least familiar with how to use command line, but I've no idea what to actually input to get ffmpeg's benchmark argument to actually function.

Could you perhaps share the exact entire command you used? From there I should be able to figure out how to get things going over here.


(software is a bit of a weak point for me - hardware is much more of my specialty)
Old 19th April 2019, 08:55   #1609  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
Quote:
Originally Posted by Nintendo Maniac 64 View Post
Could you perhaps share the exact entire command you used? From there I should be able to figure out how to get things going over here.
If you want to benchmark solely decoding, something like this:

ffmpeg -benchmark -i file.mp4 -f null -

Fill in the filename, of course, but don't move its position in the command line.

If you want to benchmark DirectShow on Windows, a far better option than your madVR hack is GraphStudioNext: View -> Performance Test lets you specify a file and a decoder, and it will run only that decoder, with no rendering involved.
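If you would rather script the ffmpeg run than read the numbers off the console, a minimal sketch follows. It assumes an ffmpeg build that prints utime, stime and rtime on a single "bench:" line to stderr; older builds may print less. The function names and the sample line are made up for illustration.

```python
import re
import subprocess

# Matches the timing summary that -benchmark prints at the end of a run.
BENCH_RE = re.compile(r"bench: utime=([\d.]+)s stime=([\d.]+)s rtime=([\d.]+)s")


def parse_bench(stderr_text):
    """Return (utime, stime, rtime) in seconds, or None if no bench line is found."""
    match = BENCH_RE.search(stderr_text)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def decode_benchmark(path):
    """Run the decode-only command from this post and return the parsed timings."""
    cmd = ["ffmpeg", "-benchmark", "-i", path, "-f", "null", "-"]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return parse_bench(proc.stderr)


# Parsing demo on a made-up line, so no ffmpeg install is needed to try it.
sample = "bench: utime=12.345s stime=0.250s rtime=6.500s"
print(parse_bench(sample))
```

Calling decode_benchmark("file.mp4") would do the real run; dividing the clip's frame count by rtime gives an average fps figure.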
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 19th April 2019 at 08:58.
Old 19th April 2019, 10:46   #1610  |  Link
hajj_3
Registered User
 
Join Date: Mar 2004
Posts: 1,120
DAV1D decoder v0.2.2 has been released, here are the changes:

- Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase. The impact is important on SSSE3, SSE4 and AVX-2 cpus
- SSSE3 optimizations for all blocks size in itx
- SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
- Speed improvements on CDEF for SSE4 CPUs
- NEON optimizations for SGR and loop filter
- Minor crashes, improvements and build changes

Last edited by hajj_3; 19th April 2019 at 14:01.
Old 19th April 2019, 12:48   #1611  |  Link
dapperdan
Registered User
 
Join Date: Aug 2009
Posts: 201
Quote:
Originally Posted by benwaggoner View Post
Double-blind testing is a whole lot of work, but inescapably necessary at this point in the codec universe. Things are going to be crazy over the next few years with H.264, HEVC, and AV1 today and with VVC, EVC, and AV2 on the horizon. VMAF is going to need a data set with subjective tests of the "flavor" of artifacts each produces to be able to make good inter-codec quality comparisons.

It'd be nice to know the relative encoding times as well.
MSU did some subjective testing with their subjectify.us platform for their recent HEVC tests. Interestingly VP9 improved more than x265 when you compare SSIM to the subjective scores, though two other HEVC encoders beat both.

I think Netflix did a talk about how to use machine learning to reduce the number of comparisons that the real humans needed to do, making this kind of thing more efficient.
Old 19th April 2019, 20:26   #1612  |  Link
Nintendo Maniac 64
Registered User
 
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
Quote:
Originally Posted by nevcairiel View Post
ffmpeg -benchmark -i file.mp4 -f null -
Yep that's exactly what I needed, and things are working now!

...except that the ffmpeg build I used seems to use the AOMedia AV1 decoder rather than dav1d. So now the question is where are you getting your ffmpeg builds so that they actually use dav1d?
Old 19th April 2019, 20:44   #1613  |  Link
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by Nintendo Maniac 64 View Post
Yep that's exactly what I needed, and things are working now!

...except that the ffmpeg build I used seems to use the AOMedia AV1 decoder rather than dav1d. So now the question is where are you getting your ffmpeg builds so that they actually use dav1d?
They probably use both, but prefer aom. To use dav1d, try -c:v libdav1d before -i.
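The option order matters here: -c:v placed before -i selects the decoder for that input. A small sketch that builds the benchmark command with an optional decoder name; the helper name is hypothetical, and whether libdav1d is available depends on how the ffmpeg build was configured.

```python
def build_decode_cmd(path, decoder=None):
    """Build a decode-only ffmpeg benchmark command as an argument list.

    When a decoder is given, -c:v goes before -i so that it applies to
    the input (decoding) rather than to the output.
    """
    cmd = ["ffmpeg", "-benchmark"]
    if decoder:
        cmd += ["-c:v", decoder]
    cmd += ["-i", path, "-f", "null", "-"]
    return cmd


# Default decoder chosen by ffmpeg, then dav1d forced explicitly.
print(" ".join(build_decode_cmd("file.mp4")))
print(" ".join(build_decode_cmd("file.mp4", decoder="libdav1d")))
```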
Old 20th April 2019, 13:14   #1614  |  Link
foxyshadis
ангел смерти
 
Join Date: Nov 2004
Location: Lost
Posts: 9,558
I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.
Old 20th April 2019, 14:33   #1615  |  Link
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Quote:
Originally Posted by benwaggoner View Post
VMAF isn't a still image metric. Has anyone run a correlation for VMAF against subjective testing for still images?
Here you go, based on the TID2013 dataset:
Code:
Actual profile:
Spearman:            | Kendall:
PSNRHA         0.938 | PSNRHA         0.787
PSNRHMA        0.934 | PSNRHMA        0.777
PSNRHVS        0.926 | PSNRHVS        0.766
PSNRHVSM       0.917 | PSNRHVSM       0.749
FSIMc          0.915 | FSIMc          0.742
FSIM           0.911 | FSIM           0.736
WSNR           0.897 | WSNR           0.718
MSSIM          0.887 | MSSIM          0.697
VSNR           0.882 | VSNR           0.690
VMAF_v0.6.1    0.863 | VMAF_v0.6.1    0.675
VMAF_rb_v0.6.3 0.862 | VMAF_rb_v0.6.3 0.674
NQM            0.857 | NQM            0.666
PSNR           0.825 | PSNR           0.624
VIFP           0.815 | VIFP           0.621
PSNRc          0.803 | PSNRc          0.596
SSIM           0.788 | SSIM           0.577

Simple profile:
Spearman:            | Kendall:
PSNRHA         0.953 | PSNRHA         0.818
PSNRHVS        0.951 | PSNRHVS        0.809
FSIM           0.949 | FSIM           0.795
FSIMc          0.947 | FSIMc          0.792
PSNRHVSM       0.938 | PSNRHMA        0.785
PSNRHMA        0.937 | PSNRHVSM       0.780
WSNR           0.933 | WSNR           0.772
PSNR           0.913 | PSNR           0.745
VSNR           0.912 | VSNR           0.731
MSSIM          0.905 | MSSIM          0.720
VIFP           0.897 | VIFP           0.714
VMAF_rb_v0.6.3 0.891 | VMAF_rb_v0.6.3 0.698
VMAF_v0.6.1    0.889 | VMAF_v0.6.1    0.696
PSNRc          0.876 | PSNRc          0.689
NQM            0.875 | NQM            0.681
SSIM           0.837 | SSIM           0.628

Full profile:
Spearman:            | Kendall:
FSIMc          0.851 | FSIMc          0.666
PSNRHA         0.819 | PSNRHA         0.643
PSNRHMA        0.813 | PSNRHMA        0.631
FSIM           0.801 | FSIM           0.629
MSSIM          0.787 | MSSIM          0.607
VMAF_rb_v0.6.3 0.749 | VMAF_rb_v0.6.3 0.564
VMAF_v0.6.1    0.748 | VMAF_v0.6.1    0.563
PSNRc          0.687 | VSNR           0.508
VSNR           0.681 | PSNRHVS        0.507
PSNRHVS        0.654 | PSNRc          0.496
PSNR           0.640 | PSNRHVSM       0.481
SSIM           0.637 | PSNR           0.470
NQM            0.635 | NQM            0.466
PSNRHVSM       0.625 | SSIM           0.463
VIFP           0.608 | VIFP           0.456
WSNR           0.580 | WSNR           0.446
All bitmap images have been converted to raw full range YUV444P with ffmpeg and then measured with the vmafossexec program.
Code:
ffmpeg.exe -i i01_01_1.bmp -vf "scale=flags=accurate_rnd+bitexact+full_chroma_int+full_chroma_inp,format=yuvj444p" i01_01_1.bmp.yuv
vmafossexec.exe yuv444p 512 384 reference_images/i01.bmp.yuv distorted_images/i01_01_1.bmp.yuv model/vmaf_v0.6.1.pkl
vmafossexec.exe yuv444p 512 384 reference_images/i01.bmp.yuv distorted_images/i01_01_1.bmp.yuv model/vmaf_rb_v0.6.3/vmaf_rb_v0.6.3.pkl --ci
I'm also attaching the raw scores, for completeness' sake.
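For reference, this is roughly how the two rank correlations in the tables can be computed from a list of metric scores and the matching subjective scores. It is a dependency-free sketch, not the script that produced the tables; the Kendall variant (tau-a) and the five data points are assumptions for illustration.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up example: subjective scores and a metric that swaps one pair.
mos = [1, 2, 3, 4, 5]
metric = [10, 20, 15, 40, 50]
print(spearman(mos, metric), kendall_tau_a(mos, metric))
```

Both measures look only at ordering, which is why a metric on a completely different scale from the subjective scores can still correlate well.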

A note on how to read the numbers:
from the paper I get the following: a SROCC of 0.95 is considered excellent, 0.90 is good, and 0.85 is barely acceptable.
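Applying that reading guide to a few of the Spearman values from the tables above, as a small sketch. Treating each quoted figure as the lower bound of a band is my interpretation, and the label for anything under 0.85 is made up.

```python
def srocc_label(srocc):
    """Map a Spearman correlation onto the bands quoted from the paper.

    Assumption: each quoted value is treated as the lower bound of its band.
    """
    if srocc >= 0.95:
        return "excellent"
    if srocc >= 0.90:
        return "good"
    if srocc >= 0.85:
        return "barely acceptable"
    return "below barely acceptable"


# Spearman values copied from the three profiles in the tables above.
vmaf_v061 = {"Actual": 0.863, "Simple": 0.889, "Full": 0.748}
psnrha = {"Actual": 0.938, "Simple": 0.953, "Full": 0.819}

for name, scores in (("VMAF_v0.6.1", vmaf_v061), ("PSNRHA", psnrha)):
    for profile, value in scores.items():
        print(f"{name:12s} {profile:7s} {value:.3f} {srocc_label(value)}")
```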
Attached Files
File Type: txt VMAF_v0.6.1.txt (29.2 KB, 49 views)
File Type: txt VMAF_rb_v0.6.3.txt (29.2 KB, 41 views)

Last edited by SmilingWolf; 20th April 2019 at 19:09.
Old 20th April 2019, 16:32   #1616  |  Link
bstrobl
Registered User
 
Join Date: Jun 2016
Posts: 55
Quote:
Originally Posted by foxyshadis View Post
I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.
Seems sensible, I would welcome a couple more threads.
Old 20th April 2019, 17:59   #1617  |  Link
TomV
VP Eng, Kaleidescape
 
Join Date: Jan 2018
Location: Mt View, CA
Posts: 51
Quote:
Originally Posted by foxyshadis View Post
I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.
Makes sense. Implementations should have separate threads from the main standardization effort and aomenc. AOM/AV1 news, legal discussions, etc. can be separate threads.
Old 20th April 2019, 18:31   #1618  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by foxyshadis View Post
I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.
Too much effort, too many double posts, too much separated information and too much overhead in general.

Probably a separation of AV1 encoding and AV1 decoding would be more than enough for AV1 codec.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 21st April 2019, 13:40   #1619  |  Link
Audionut
Registered User
 
Join Date: Nov 2003
Posts: 1,281
Quote:
Originally Posted by NikosD View Post
Too much effort, too many double posts, too much separated information and too much overhead in general.
Agreed. Not busy enough yet. You can come back after a couple of days and still might only have a full page to read.
__________________
http://www.7-zip.org/
Old 21st April 2019, 14:42   #1620  |  Link
dapperdan
Registered User
 
Join Date: Aug 2009
Posts: 201
VMAF isn't designed for still images, but they do provide the tools to create your own VMAF for specific use cases (e.g. anime on a phone screen, or video game content), so it surprises me that no one has taken the framework and applied it to still images yet.

It should in theory be able to fuse the results of those other still image tests and create something even better aligned with human reported scores than any one alone. Presumably not Netflix's main use case but you'd think they deliver enough still images to make it worthwhile since they already have the skills.