Alliance for Open Media codecs - Page 81

benwaggoner · 18th April 2019, 18:53

Quote:

Originally Posted by IgorC

Xiph update on one year of AV1 (slides from NAB 2019) [PDF]
https://people.xiph.org/~negge/NAB2019.pdf

There certainly has been a ton of progress in the last year!

Although I am again frustrated by the lack of a real apples-to-apples subjective quality comparison. The only one given was libaom versus HEVC HM for ultra low latency (Slide 38). I don't know that the HM is even optimized for low latency; libaom has a lot more rate control than the typical reference encoder.

And even then, we can see that while Y-PSNR showed a bitrate increase of 5%, subjective MOS testing showed a 4% decrease. Metrics are not closely coupled!

The VVC JEM, conversely, showed a 32% decrease for Y-PSNR and 30% decrease for MOS; much better correlated.
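Percent-bitrate figures like the ones on slide 38 are usually Bjøntegaard-delta (BD) rates. For each encoder you fit log-bitrate as a cubic function of quality (Y-PSNR, or MOS), then average the gap between the two curves over the quality range both cover. The slides don't say exactly how their numbers were computed, so treat this as a minimal, dependency-free sketch of the classic method under that assumption, not a reproduction of their results.

```python
import math

def _solve(a, b):
    """Solve a square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def _cubic_fit(ts, ys):
    """Least-squares cubic y ~ c0 + c1*t + c2*t^2 + c3*t^3 via the normal equations."""
    a = [[sum(t ** (i + j) for t in ts) for j in range(4)] for i in range(4)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(4)]
    return _solve(a, b)

def bd_rate(anchor, test):
    """Bjontegaard-delta rate of `test` relative to `anchor`.

    Each argument is a list of at least four (bitrate, quality) points with
    distinct quality values. A negative result means `test` needs fewer bits
    than `anchor` for the same quality over the range both curves cover.
    """
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")

    def fit(points):
        # Rescale quality so the overlap maps onto [0, 1]; keeps the fit well conditioned.
        ts = [(q - lo) / (hi - lo) for _, q in points]
        return _cubic_fit(ts, [math.log(r) for r, _ in points])

    ca, ct = fit(anchor), fit(test)
    # Average log-rate gap = integral over t in [0, 1] of the fitted difference;
    # the integral of t^k over [0, 1] is 1 / (k + 1).
    avg_log_diff = sum((c_t - c_a) / (k + 1) for k, (c_a, c_t) in enumerate(zip(ca, ct)))
    return math.exp(avg_log_diff) - 1.0
```

Running the same function once with Y-PSNR as the quality axis and once with MOS is how two numbers like "+5%" and "-4%" can come out of the same encodes: only the quality values change, not the bitrates.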

benwaggoner · 18th April 2019, 19:09

Quote:

Originally Posted by SmilingWolf

Cmdlines:
x264 --preset veryslow --tune ssim --crf 16 -o test.x264.crf16.264 orig.i420.y4m
x265 --preset veryslow --tune ssim --crf 16 -o test.x265.crf16.hevc orig.i420.y4m

Why --tune ssim if targeting VMAF?

We know that --tune ssim looks subjectively worse than --tune film in x264 and not using --tune at all in x265.

Quote:

vpxenc --codec=vp9 --frame-parallel=0 --tile-columns=1 --auto-alt-ref=6 --good --cpu-used=0 --tune=psnr --passes=2 --threads=2 --end-usage=q --cq-level=20 --test-decode=fatal --ivf -o test.vp9.cq20.ivf orig.i420.y4m

And why a different tune, PSNR, here?

Quote:

SvtAv1EncApp.exe -i orig.i420.yuv -b test.svtav1.cq20.ivf -w 1280 -h 720 -q 20 -enc-mode 3 -fps-num 24000 -fps-denom 1001 -intra-period 23
aomenc --frame-parallel=0 --tile-columns=1 --auto-alt-ref=1 --cpu-used=4 --tune=psnr --passes=2 --threads=2 --row-mt=1 --end-usage=q --cq-level=20 --test-decode=fatal -o test.av1.cq20.webm orig.i420.y4m
VMAF: model used: vmaf_v0.6.1, pooling: harmonic_mean

Also PSNR.

It seems like the same tuning should be used across all encoders! Although tuning for a given metric and then comparing with that metric is more a test of mathematical correctness of rate control than something that says much about viewer experience.
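The quoted VMAF settings pool per-frame scores with a harmonic mean, which pulls the clip score down harder for a few bad frames than an arithmetic mean does. A minimal sketch of the difference; the +1 offset (to avoid dividing by zero on a frame scoring 0) mirrors what I understand VMAF's own tooling to do, but treat that detail as an assumption, and the frame scores are made up.

```python
def mean_pool(scores):
    """Plain arithmetic mean of per-frame scores."""
    return sum(scores) / len(scores)

def harmonic_mean_pool(scores):
    """Harmonic-mean pooling of per-frame scores.

    Assumption: each score is offset by +1 before inverting, then the offset is
    removed again, so a frame scoring exactly 0 does not divide by zero.
    """
    if not scores:
        raise ValueError("need at least one frame score")
    inv_mean = sum(1.0 / (s + 1.0) for s in scores) / len(scores)
    return 1.0 / inv_mean - 1.0

# Hypothetical per-frame scores: three clean frames and one badly damaged frame.
frames = [95.0, 94.0, 96.0, 40.0]
print(round(mean_pool(frames), 2), round(harmonic_mean_pool(frames), 2))
```

For a clip with uniform quality the two pooling methods agree; they only diverge when quality fluctuates, which is exactly when the choice matters for cross-encoder comparisons.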

We've seen data that shows libaom underperforms HM and particularly the VVC JEM in subjective metrics versus objective metrics. I'm guessing that's because libaom has baked in a lot of VMAF-tuned optimizations.

The gold standard for AV1's current competitiveness would be a double-blind comparison of subjective quality at the same total encoding time.

I guess I'm unsure what exactly the goal of these particular tests is, or how the results are expected to be fruitfully applied.

Double-blind testing is a whole lot of work, but inescapably necessary at this point in the codec universe. Things are going to be crazy over the next few years with H.264, HEVC, and AV1 today and with VVC, EVC, and AV2 on the horizon. VMAF is going to need a data set with subjective tests of the "flavor" of artifacts each produces to be able to make good inter-codec quality comparisons.
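The bookkeeping part of a double-blind A/B comparison is simple: randomize the clip order and which encoder appears on which side, and keep the key away from both viewer and operator until the votes are in. A minimal sketch with hypothetical clip and encoder names; it covers only the blinding, not the rest of test design (viewing conditions, rating scale, statistics).

```python
import random

def blind_session(clips, encoders, seed):
    """Return a randomized clip order and a hidden key.

    key[clip] is [left_encoder, right_encoder]; store it somewhere the person
    running the session cannot see, so the test stays double-blind.
    """
    rng = random.Random(seed)
    order = list(clips)
    rng.shuffle(order)
    key = {clip: rng.sample(list(encoders), 2) for clip in order}
    return order, key

def unblind(votes, key):
    """votes maps clip -> 'left' or 'right'; returns preference counts per encoder."""
    tally = {}
    for clip, side in votes.items():
        if side not in ("left", "right"):
            raise ValueError(f"bad vote for {clip}: {side}")
        enc = key[clip][0 if side == "left" else 1]
        tally[enc] = tally.get(enc, 0) + 1
    return tally
```

Fixing the seed makes a session reproducible for auditing; a fresh seed per viewer keeps side bias from lining up with one encoder.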

It'd be nice to know the relative encoding times as well.

SmilingWolf · 18th April 2019, 19:29

--tune ssim on both x264 and x265 gave the best objective metrics scores for both PSNR-HVS-M and MS-SSIM (more than --tune psnr even when measuring PSNR-HVS-M).
VMAF has not been tested because it has been added to the encoding and scoring pipeline later than when I carried out the tune tests.

--tune psnr in the libvpx and libaom cmdlines is there as more of a way to make it explicit.
You can consider the whole "--tune" thing in those encoders as either a joke or a misnomer: instead of turning some knobs like they do on the x26X encoders, they set the RDO metric used during encoding.
To add insult to injury, in libaom out of 4 tunes (psnr, ssim, cdef-dist, daala-dist) 2 of them are usable only in single-threaded builds and give terrible results, and ssim is not even implemented, leaving --tune psnr as the only available one.
And it's not a question of setting it or leaving it alone, --tune psnr is the default and there is no way to change or unset it. Whatever you do, whether you know or not, if you encode with libaom you're using --tune psnr.
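What "setting the RDO metric" means in practice: for each block the encoder compares candidate coding choices by a rate-distortion cost J = D + λ·R, and the tune decides how D is measured. A toy sketch, assuming SSE as D (the PSNR-style choice) and made-up candidate names and numbers; real encoders estimate D and R per block and derive λ from the quantizer.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str          # hypothetical mode label
    distortion: float  # D: here SSE against the source block (PSNR-style tuning)
    rate_bits: float   # R: estimated bits needed to code this choice

def pick_mode(candidates, lam):
    """Rate-distortion optimization: choose the candidate minimizing J = D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate_bits)

cands = [
    Candidate("intra_dc", distortion=1200.0, rate_bits=40.0),
    Candidate("inter_newmv", distortion=500.0, rate_bits=120.0),
    Candidate("skip", distortion=2500.0, rate_bits=2.0),
]
for lam in (1.0, 30.0, 100.0):
    print(lam, pick_mode(cands, lam).name)
```

Swapping the tune would swap how `distortion` is computed (e.g. an SSIM-based measure instead of SSE), while the selection rule itself stays the same; that is why it behaves so differently from x264/x265's --tune, which adjusts psychovisual and other knobs.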

Relative encoding times are unavailable (or, rather, unreliable): I use the machine while the encodes are running, so it is under varying load, and I have set the pipeline to leave me at least a couple of free cores at all times.

sneaker_ger · 19th April 2019, 00:18

Short decoding speed test on 10 year old Intel T3400 (2C2T laptop CPU, SSSE3, no SSE4)(both zeranoe's ffmpeg 20190417-8a3ed5a-win64-static, dav1d 20190410-44d0de4 -threads 4 -tilethreads 2), Chimera 720p 8 bit:
libaom: 749.241 (12 fps)
dav1d: 293.281 (30 fps)
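The two times translate directly into a relative speedup. A one-line check, assuming both figures are seconds for the same clip as reported by ffmpeg -benchmark (as described a couple of posts later):

```python
libaom_s = 749.241  # libaom decode time from the post above
dav1d_s = 293.281   # dav1d decode time from the post above

# Same clip, so the time ratio is the speed ratio.
speedup = libaom_s / dav1d_s
print(f"dav1d decoded the clip {speedup:.2f}x faster than libaom")
```

That ratio is consistent with the rounded fps figures (30 / 12 = 2.5).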

Nintendo Maniac 64 · 19th April 2019, 00:37

Quote:

Originally Posted by sneaker_ger

Short decoding speed test:
libaom: 749.241 (12 fps)
dav1d: 293.281 (30 fps)

OK, just how are you going about benchmarking this?

Last time I inquired about this, the best way was pretty much just trial and error by using something like mkvtoolnix to set a given frame rate and then use madvr's OSD to see if there were any dropped frames while playing it back.

Quote:

Originally Posted by sneaker_ger

10 year old Intel T3400 (2C2T laptop CPU, SSSE3, no SSE4)

Kind of odd that the CPU was released in late 2008 when it uses the same architecture (Merom) as the original Conroe/Merom (65nm) Core 2 Duo from 2006 (though being a Pentium it has less L2 cache).

Weirder yet considering that the Wolfdale/Penryn 45nm Intel CPUs were available by then, and Nehalem was even available on desktop.

sneaker_ger · 19th April 2019, 00:39

Quote:

Originally Posted by Nintendo Maniac 64

OK, just how are you going about benchmarking this?

Last time I inquired about this, the best way was pretty much just trial and error by using something like mkvtoolnix to set a given frame rate and then use madvr's OSD to see if there were any dropped frames while playing it back.

This is just using ffmpeg -benchmark and the fps values are averages. I didn't test for framedrops during difficult scenes.
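Turning ffmpeg -benchmark output into an average fps figure is a matter of dividing the decoded frame count by the elapsed time. A minimal sketch, assuming the summary line has the shape `bench: utime=12.500s stime=0.250s rtime=4.000s` (the format in builds I have seen; check yours). rtime is wall-clock time, while utime is user CPU time summed across all threads, so the two can differ a lot for multithreaded decoders. The frame count is passed in rather than parsed.

```python
import re

# Summary line printed by ffmpeg -benchmark when the run finishes (assumed format).
BENCH_RE = re.compile(r"bench: utime=([\d.]+)s stime=([\d.]+)s rtime=([\d.]+)s")

def parse_bench(stderr_text):
    """Return (utime, stime, rtime) in seconds from ffmpeg's -benchmark summary."""
    m = BENCH_RE.search(stderr_text)
    if m is None:
        raise ValueError("no 'bench:' line found; was -benchmark passed?")
    return tuple(float(g) for g in m.groups())

def average_fps(frame_count, elapsed_s):
    """Average speed over the whole run; it says nothing about drops in hard scenes."""
    return frame_count / elapsed_s
```

As the post notes, an average like this can hide short stretches where decoding falls below real time.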

soresu · 19th April 2019, 02:45

Hmmm, the commit here on the libaom experimental branch has the title "Add comparison between cnn and cdef/restoration."

I wonder if this means they are targeting an ML tool to replace CDEF, which wouldn't surprise me, considering Tim Terriberry mentioned that CDEF was being evaluated for a more efficient replacement during the latter stages of AV1 development.

Nintendo Maniac 64 · 19th April 2019, 05:36

Quote:

Originally Posted by sneaker_ger

This is just using ffmpeg -benchmark

I'll be honest, I'm actually completely unfamiliar with using ffmpeg...I am at least familiar with how to use command line, but I've no idea what to actually input to get ffmpeg's benchmark argument to actually function.

Could you perhaps share the exact entire command you used? From there I should be able to figure out how to get things going over here.

(software is a bit of a weak point for me - hardware is much more of my specialty)

nevcairiel · 19th April 2019, 08:55

Quote:

Originally Posted by Nintendo Maniac 64

Could you perhaps share the exact entire command you used? From there I should be able to figure out how to get things going over here.

If you want to benchmark solely decoding, something like this:

ffmpeg -benchmark -i file.mp4 -f null -

Fill in the filename, of course, but don't move its position in the command line.

If you want to benchmark DirectShow on Windows, a far better option than your madVR hack is GraphStudioNext, which has View -> Performance Test. It lets you specify a file and a decoder, and it runs only that decoder, without any rendering involved.

hajj_3 · 19th April 2019, 10:46

DAV1D decoder v0.2.2 has been released, here are the changes:

- Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase. The impact is important on SSSE3, SSE4 and AVX-2 cpus
- SSSE3 optimizations for all block sizes in itx
- SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
- Speed improvements on CDEF for SSE4 CPUs
- NEON optimizations for SGR and loop filter
- Minor crash fixes, improvements and build changes

dapperdan · 19th April 2019, 12:48

Quote:

Originally Posted by benwaggoner

Double-blind testing is a whole lot of work, but inescapably necessary at this point in the codec universe. Things are going to be crazy over the next few years with H.264, HEVC, and AV1 today and with VVC, EVC, and AV2 on the horizon. VMAF is going to need a data set with subjective tests of the "flavor" of artifacts each produces to be able to make good inter-codec quality comparisons.

It'd be nice to know the relative encoding times as well.

MSU did some subjective testing with their subjectify.us platform for their recent HEVC tests. Interestingly VP9 improved more than x265 when you compare SSIM to the subjective scores, though two other HEVC encoders beat both.

I think Netflix did a talk about how to use machine learning to reduce the number of comparisons that the real humans needed to do, making this kind of thing more efficient.

Nintendo Maniac 64 · 19th April 2019, 20:26

Quote:

Originally Posted by nevcairiel

ffmpeg -benchmark -i file.mp4 -f null -

Yep that's exactly what I needed, and things are working now!

...except that the ffmpeg build I used seems to use the AOMedia AV1 decoder rather than dav1d. So now the question is where are you getting your ffmpeg builds so that they actually use dav1d?

Beelzebubu · 19th April 2019, 20:44

Quote:

Originally Posted by Nintendo Maniac 64

Yep that's exactly what I needed, and things are working now!

...except that the ffmpeg build I used seems to use the AOMedia AV1 decoder rather than dav1d. So now the question is where are you getting your ffmpeg builds so that they actually use dav1d?

They probably use both, but prefer aom. To use dav1d, try -c:v libdav1d before -i.
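If you want to script the comparison, the only subtle part is argument order: an option like -c:v applies to the next -i, so it has to come before the input to select the decoder. A small sketch that builds and runs the command from this exchange, assuming your ffmpeg build includes libdav1d (the helper names are hypothetical):

```python
import subprocess

def decode_bench_cmd(path, decoder=None):
    """Build an ffmpeg command that decodes `path` and discards the frames.

    -c:v is placed before -i so it selects the decoder for this input;
    placed after -i it would apply to the (null) output instead.
    """
    cmd = ["ffmpeg", "-benchmark"]
    if decoder:
        cmd += ["-c:v", decoder]
    cmd += ["-i", path, "-f", "null", "-"]
    return cmd

def run_bench(cmd):
    """Run the command and return ffmpeg's log, which it writes to stderr."""
    return subprocess.run(cmd, capture_output=True, text=True).stderr
```

For example, `run_bench(decode_bench_cmd("file.mp4", "libdav1d"))` and `run_bench(decode_bench_cmd("file.mp4", "libaom-av1"))` give directly comparable runs on the same file.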

foxyshadis · 20th April 2019, 13:14

I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.

SmilingWolf · 20th April 2019, 14:33

Quote:

Originally Posted by benwaggoner

VMAF isn't a still image metric. Has anyone run a correlation for VMAF against subjective testing for still images?

Here you go, based on the TID2013 dataset:

Code:

Actual profile:
Spearman:            | Kendall:
PSNRHA         0.938 | PSNRHA         0.787
PSNRHMA        0.934 | PSNRHMA        0.777
PSNRHVS        0.926 | PSNRHVS        0.766
PSNRHVSM       0.917 | PSNRHVSM       0.749
FSIMc          0.915 | FSIMc          0.742
FSIM           0.911 | FSIM           0.736
WSNR           0.897 | WSNR           0.718
MSSIM          0.887 | MSSIM          0.697
VSNR           0.882 | VSNR           0.690
VMAF_v0.6.1    0.863 | VMAF_v0.6.1    0.675
VMAF_rb_v0.6.3 0.862 | VMAF_rb_v0.6.3 0.674
NQM            0.857 | NQM            0.666
PSNR           0.825 | PSNR           0.624
VIFP           0.815 | VIFP           0.621
PSNRc          0.803 | PSNRc          0.596
SSIM           0.788 | SSIM           0.577

Simple profile:
Spearman:            | Kendall:
PSNRHA         0.953 | PSNRHA         0.818
PSNRHVS        0.951 | PSNRHVS        0.809
FSIM           0.949 | FSIM           0.795
FSIMc          0.947 | FSIMc          0.792
PSNRHVSM       0.938 | PSNRHMA        0.785
PSNRHMA        0.937 | PSNRHVSM       0.780
WSNR           0.933 | WSNR           0.772
PSNR           0.913 | PSNR           0.745
VSNR           0.912 | VSNR           0.731
MSSIM          0.905 | MSSIM          0.720
VIFP           0.897 | VIFP           0.714
VMAF_rb_v0.6.3 0.891 | VMAF_rb_v0.6.3 0.698
VMAF_v0.6.1    0.889 | VMAF_v0.6.1    0.696
PSNRc          0.876 | PSNRc          0.689
NQM            0.875 | NQM            0.681
SSIM           0.837 | SSIM           0.628

Full profile:
Spearman:            | Kendall:
FSIMc          0.851 | FSIMc          0.666
PSNRHA         0.819 | PSNRHA         0.643
PSNRHMA        0.813 | PSNRHMA        0.631
FSIM           0.801 | FSIM           0.629
MSSIM          0.787 | MSSIM          0.607
VMAF_rb_v0.6.3 0.749 | VMAF_rb_v0.6.3 0.564
VMAF_v0.6.1    0.748 | VMAF_v0.6.1    0.563
PSNRc          0.687 | VSNR           0.508
VSNR           0.681 | PSNRHVS        0.507
PSNRHVS        0.654 | PSNRc          0.496
PSNR           0.640 | PSNRHVSM       0.481
SSIM           0.637 | PSNR           0.470
NQM            0.635 | NQM            0.466
PSNRHVSM       0.625 | SSIM           0.463
VIFP           0.608 | VIFP           0.456
WSNR           0.580 | WSNR           0.446

All bitmap images have been converted to raw full range YUV444P with ffmpeg and then measured with the vmafossexec program.

Code:

ffmpeg.exe -i i01_01_1.bmp -vf "scale=flags=accurate_rnd+bitexact+full_chroma_int+full_chroma_inp,format=yuvj444p" i01_01_1.bmp.yuv
vmafossexec.exe yuv444p 512 384 reference_images/i01.bmp.yuv distorted_images/i01_01_1.bmp.yuv model/vmaf_v0.6.1.pkl
vmafossexec.exe yuv444p 512 384 reference_images/i01.bmp.yuv distorted_images/i01_01_1.bmp.yuv model/vmaf_rb_v0.6.3/vmaf_rb_v0.6.3.pkl --ci
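Raw .yuv files carry no header, which is why vmafossexec (and SvtAv1EncApp earlier in the thread) has to be told the pixel format and dimensions explicitly. A small helper for sanity-checking a converted file's size; it assumes 8-bit planar data and handles only the two layouts that appear in this thread.

```python
def raw_frame_bytes(width, height, pix_fmt, bytes_per_sample=1):
    """Size in bytes of one planar raw frame (8-bit samples by default)."""
    luma = width * height
    if pix_fmt == "yuv420p":
        # Two chroma planes, each subsampled by 2 horizontally and vertically.
        chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    elif pix_fmt in ("yuv444p", "yuvj444p"):
        # Two full-resolution chroma planes (yuvj only flags full range).
        chroma = 2 * luma
    else:
        raise ValueError(f"unsupported pix_fmt: {pix_fmt}")
    return (luma + chroma) * bytes_per_sample
```

For the 512x384 TID2013 images above, each yuv444p file should be 589,824 bytes; for the 1280x720 i420 clip used in the encoder tests, each frame is 1,382,400 bytes.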

I'm also attaching the raw scores, for completeness sake.

A note on how to read the numbers: according to the paper, an SROCC of 0.95 is considered excellent, 0.90 is good, and 0.85 is barely acceptable.
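Both columns are rank correlations between a metric's scores and the subjective scores: Spearman's ρ (SROCC) is the Pearson correlation of the ranks, and Kendall's τ compares concordant and discordant pairs. A dependency-free sketch for reproducing numbers like these from raw scores; I'm assuming the tie-corrected τ-b variant, since the post doesn't say which Kendall variant produced the table.

```python
import math

def _average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(x, y):
    """SROCC: Pearson correlation of the tie-averaged ranks."""
    return _pearson(_average_ranks(x), _average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b in O(n^2): (concordant - discordant), corrected for ties."""
    conc = disc = tie_x = tie_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from every term
            if dx == 0:
                tie_x += 1
            elif dy == 0:
                tie_y += 1
            elif (dx > 0) == (dy > 0):
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt((conc + disc + tie_x) * (conc + disc + tie_y))
```

Usage would be `spearman(metric_scores, mos)` per metric, with the two lists aligned image by image; note that Kendall values are typically lower than Spearman values for the same data, as in the table.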

bstrobl · 20th April 2019, 16:32

Quote:

Originally Posted by foxyshadis

I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.

Seems sensible, I would welcome a couple more threads.

TomV · 20th April 2019, 17:59

Quote:

Originally Posted by foxyshadis

I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.

Makes sense. Implementations should have separate threads from the main standardization effort and aomenc. AOM/AV1 news, legal discussions, etc. can be separate threads.

NikosD · 20th April 2019, 18:31

Quote:

Originally Posted by foxyshadis

I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.

Too much effort, too many double posts, too much separated information and too much overhead in general.

Probably a separation of AV1 encoding and AV1 decoding would be more than enough for AV1 codec.

Audionut · 21st April 2019, 13:40

Quote:

Originally Posted by NikosD

Too much effort, too many double posts, too much separated information and too much overhead in general.

Agreed. Not busy enough yet. You can come back after a couple of days and still might only have a full page to read.

dapperdan · 21st April 2019, 14:42

VMAF isn't designed for still images, but they do provide the tools to create your own VMAF for specific use cases (e.g. anime on a phone screen, or video game content), so it surprises me that no one has taken the framework and applied it to still images yet.

It should in theory be able to fuse the results of those other still image tests and create something even better aligned with human reported scores than any one alone. Presumably not Netflix's main use case but you'd think they deliver enough still images to make it worthwhile since they already have the skills.
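VMAF itself fuses its elementary features with a trained support-vector regressor. A much simpler stand-in that still shows the fusion idea is an ordinary least-squares fit of MOS on several metric scores. This is a swapped-in technique, not how VMAF is trained, and the function names and data are hypothetical.

```python
def fit_linear_fusion(features, mos):
    """Least-squares weights w such that mos ~ w0 + w1*f1 + ... + wk*fk.

    features: one tuple of metric scores per image; mos: matching subjective scores.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y, as an augmented matrix.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, mos))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        p = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (m[r][k] - sum(m[r][c] * w[c] for c in range(r + 1, k))) / m[r][r]
    return w

def predict(w, f):
    """Fused quality estimate for one image's metric scores."""
    return w[0] + sum(wi * fi for wi, fi in zip(w[1:], f))
```

With real data you would fit on part of a dataset such as TID2013 and check SROCC on the held-out images; a linear model cannot capture the non-linear saturation that an SVR (or similar) can.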
