18th April 2019, 18:53 | #1601 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
|
Quote:
Although I am again frustrated by the lack of a real apples-to-apples subjective quality comparison. The only one given was libaom versus HEVC HM for ultra low latency (Slide 38). I don't know that the HM is even optimized for low latency; libaom has a lot more rate control than the typical reference encoder. And even then, we can see that while Y-PSNR showed a bitrate increase of 5%, subjective MOS testing showed a decrease of 4%. Metrics are not closely coupled! The VVC JEM, conversely, showed a 32% decrease for Y-PSNR and 30% decrease for MOS; much better correlated. |
|
18th April 2019, 19:09 | #1602 | Link | |||
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
|
Quote:
We know that --tune ssim looks subjectively worse than --tune film in x264 and not using --tune at all in x265. Quote:
Quote:
It seems like the same tuning should be used across all encoders! Although tuning for a given metric and then comparing with that metric is more a test of mathematical correctness of rate control than something that says much about viewer experience. We've seen data that shows libaom underperforms HM and particularly the VVC JEM in subjective metrics versus objective metrics. I'm guessing because libaom has baked in a lot of VMAF-tuned optimizations. The gold standard for AV1's current competitiveness would be a double-blind comparison of subjective quality at the same total encoding time. I guess I'm unsure what exactly the goal of these particular tests is, or how they are expected to be fruitfully applied. Double-blind testing is a whole lot of work, but inescapably necessary at this point in the codec universe. Things are going to be crazy over the next few years with H.264, HEVC, and AV1 today and with VVC, EVC, and AV2 on the horizon. VMAF is going to need a data set with subjective tests of the "flavor" of artifacts each produces to be able to make good inter-codec quality comparisons. It'd be nice to know the relative encoding times as well. |
|||
18th April 2019, 19:29 | #1603 | Link |
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
|
--tune ssim on both x264 and x265 gave the best objective metrics scores for both PSNR-HVS-M and MS-SSIM (more than --tune psnr even when measuring PSNR-HVS-M).
VMAF has not been tested because it was added to the encoding and scoring pipeline after I carried out the tune tests. --tune psnr in the libvpx and libaom cmdlines is there as more of a way to make it explicit. You can consider the whole "--tune" thing in those encoders as either a joke or a misnomer: instead of turning some knobs like they do on the x26X encoders, they set the RDO metric used during encoding. To add insult to injury, in libaom out of 4 tunes (psnr, ssim, cdef-dist, daala-dist) 2 of them are usable only in single-threaded builds and give terrible results, and ssim is not even implemented, leaving --tune psnr as the only available one. And it's not a question of setting it or leaving it alone, --tune psnr is the default and there is no way to change or unset it. Whatever you do, whether you know or not, if you encode with libaom you're using --tune psnr. Relative encoding times are unavailable (or, rather, unreliable) because the machine comes under various loads: I use it while the encodes are running, and have set the pipeline to leave me at least a couple of free cores at all times. Last edited by SmilingWolf; 18th April 2019 at 19:34. |
19th April 2019, 00:18 | #1604 | Link |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
Short decoding speed test on a 10-year-old Intel T3400 (2C2T laptop CPU, SSSE3, no SSE4) (both zeranoe's ffmpeg 20190417-8a3ed5a-win64-static, dav1d 20190410-44d0de4 -threads 4 -tilethreads 2), Chimera 720p 8 bit:
libaom: 749.241 (12 fps)
dav1d: 293.281 (30 fps) |
19th April 2019, 00:37 | #1605 | Link | |
Registered User
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
|
Quote:
Last time I inquired about this, the best way was pretty much just trial and error by using something like mkvtoolnix to set a given frame rate and then using madVR's OSD to see if there were any dropped frames while playing it back. Kind of odd that that CPU was released in late 2008 when it uses the same architecture (Merom) as the original Conroe/Merom (65nm) Core 2 Duo from 2006 (though being a Pentium it has less L2 cache). Weirder yet considering that the Wolfdale/Penryn 45nm Intel CPUs were available by then, and Nehalem was even available on desktop.
__________________
____HTPC____ | __Desktop PC__
2.93GHz Xeon x3470 (4c/8t Nehalem) | 4.5GHz 1.24v dual-core Haswell G3258 Radeon HD5870 | Intel iGPU 2x2GB+2x1GB DDR3-1333 | 4x4GB DDR3-1600 |
|
19th April 2019, 00:39 | #1606 | Link | |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
Quote:
|
|
19th April 2019, 02:45 | #1607 | Link |
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
|
Hmmm, the commit here on the libaom experimental branch has the title "Add comparison between cnn and cdef/restoration."
I wonder if this means they are targeting an ML tool to replace CDEF, which wouldn't surprise me considering how Tim Terriberry mentioned CDEF being evaluated for a more efficient replacement during the latter stages of AV1 development. |
19th April 2019, 05:36 | #1608 | Link |
Registered User
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
|
I'll be honest, I'm actually completely unfamiliar with using ffmpeg...I am at least familiar with how to use command line, but I've no idea what to actually input to get ffmpeg's benchmark argument to actually function.
Could you perhaps share the exact entire command you used? From there I should be able to figure out how to get things going over here. (software is a bit of a weak point for me - hardware is much more of my specialty)
19th April 2019, 08:55 | #1609 | Link | |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
|
Quote:
Code:
ffmpeg -benchmark -i file.mp4 -f null -
Fill in the filename, of course, but don't move its position in the command line. If you want to benchmark DirectShow on Windows, a far better option than your madVR hack is to use GraphStudioNext, which has View -> Performance Test, which lets you specify a file and a decoder, and it'll run only that decoder, without rendering involved.
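For anyone who wants to collect these numbers across several files, here is a minimal Python sketch that runs the command above and pulls the timings out of the log. It assumes the summary line looks like `bench: utime=...s stime=...s rtime=...s`, which is what recent ffmpeg builds print; older builds may print only `utime`, so the extra fields are treated as optional. The function names `parse_bench`, `run_benchmark` and `decode_fps` are made up for this example.

```python
import re
import subprocess

# Assumed shape of ffmpeg's "-benchmark" summary line. Only utime is required;
# stime and rtime are optional because not every build prints them.
BENCH_RE = re.compile(
    r"bench:\s+utime=([\d.]+)s"
    r"(?:\s+stime=([\d.]+)s)?"
    r"(?:\s+rtime=([\d.]+)s)?"
)


def parse_bench(log_text):
    """Return (utime, stime, rtime) in seconds; fields not present are None."""
    match = BENCH_RE.search(log_text)
    if match is None:
        raise ValueError("no 'bench:' timing line found in the log")
    return tuple(float(g) if g is not None else None for g in match.groups())


def run_benchmark(path):
    """Decode `path` to the null muxer and return the parsed timings."""
    result = subprocess.run(
        ["ffmpeg", "-benchmark", "-i", path, "-f", "null", "-"],
        capture_output=True,
        text=True,
    )
    # ffmpeg writes its log to stderr; stdout is appended just in case.
    return parse_bench(result.stderr + result.stdout)


def decode_fps(frame_count, seconds):
    """Average decoding speed in frames per second."""
    return frame_count / seconds
```

Calling `run_benchmark("file.mp4")` needs ffmpeg on the PATH; `parse_bench` can also be fed a log you saved by hand. Which of the three times to divide the frame count by depends on what you want to measure (wall-clock time versus CPU time summed over threads).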
__________________
LAV Filters - open source ffmpeg based media splitter and decoders Last edited by nevcairiel; 19th April 2019 at 08:58. |
|
19th April 2019, 10:46 | #1610 | Link |
Registered User
Join Date: Mar 2004
Posts: 1,125
|
DAV1D decoder v0.2.2 has been released, here are the changes:
- Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase. The impact is important on SSSE3, SSE4 and AVX-2 cpus
- SSSE3 optimizations for all blocks size in itx
- SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
- Speed improvements on CDEF for SSE4 CPUs
- NEON optimizations for SGR and loop filter
- Minor crashes, improvements and build changes
Last edited by hajj_3; 19th April 2019 at 14:01. |
19th April 2019, 12:48 | #1611 | Link | |
Registered User
Join Date: Aug 2009
Posts: 201
|
Quote:
I think Netflix did a talk about how to use machine learning to reduce the number of comparisons that the real humans needed to do, making this kind of thing more efficient. |
|
19th April 2019, 20:26 | #1612 | Link |
Registered User
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
|
Yep that's exactly what I needed, and things are working now!
...except that the ffmpeg build I used seems to use the AOMedia AV1 decoder rather than dav1d. So now the question is where are you getting your ffmpeg builds so that they actually use dav1d?
19th April 2019, 20:44 | #1613 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
|
Quote:
|
|
20th April 2019, 13:14 | #1614 | Link |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
I'd like to solicit opinions on splitting this thread up, especially into aom, rav1e, dav1d, still image (avif) news, as well as solicitations to get the best quality command lines. I'd like to create a separate AV1 forum entirely at this point, but one megathread does not a forum make.
|
20th April 2019, 14:33 | #1615 | Link | |
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
|
Quote:
Code:
Actual profile:
Spearman:               | Kendall:
PSNRHA          0.938   | PSNRHA          0.787
PSNRHMA         0.934   | PSNRHMA         0.777
PSNRHVS         0.926   | PSNRHVS         0.766
PSNRHVSM        0.917   | PSNRHVSM        0.749
FSIMc           0.915   | FSIMc           0.742
FSIM            0.911   | FSIM            0.736
WSNR            0.897   | WSNR            0.718
MSSIM           0.887   | MSSIM           0.697
VSNR            0.882   | VSNR            0.690
VMAF_v0.6.1     0.863   | VMAF_v0.6.1     0.675
VMAF_rb_v0.6.3  0.862   | VMAF_rb_v0.6.3  0.674
NQM             0.857   | NQM             0.666
PSNR            0.825   | PSNR            0.624
VIFP            0.815   | VIFP            0.621
PSNRc           0.803   | PSNRc           0.596
SSIM            0.788   | SSIM            0.577

Simple profile:
Spearman:               | Kendall:
PSNRHA          0.953   | PSNRHA          0.818
PSNRHVS         0.951   | PSNRHVS         0.809
FSIM            0.949   | FSIM            0.795
FSIMc           0.947   | FSIMc           0.792
PSNRHVSM        0.938   | PSNRHMA         0.785
PSNRHMA         0.937   | PSNRHVSM        0.780
WSNR            0.933   | WSNR            0.772
PSNR            0.913   | PSNR            0.745
VSNR            0.912   | VSNR            0.731
MSSIM           0.905   | MSSIM           0.720
VIFP            0.897   | VIFP            0.714
VMAF_rb_v0.6.3  0.891   | VMAF_rb_v0.6.3  0.698
VMAF_v0.6.1     0.889   | VMAF_v0.6.1     0.696
PSNRc           0.876   | PSNRc           0.689
NQM             0.875   | NQM             0.681
SSIM            0.837   | SSIM            0.628

Full profile:
Spearman:               | Kendall:
FSIMc           0.851   | FSIMc           0.666
PSNRHA          0.819   | PSNRHA          0.643
PSNRHMA         0.813   | PSNRHMA         0.631
FSIM            0.801   | FSIM            0.629
MSSIM           0.787   | MSSIM           0.607
VMAF_rb_v0.6.3  0.749   | VMAF_rb_v0.6.3  0.564
VMAF_v0.6.1     0.748   | VMAF_v0.6.1     0.563
PSNRc           0.687   | VSNR            0.508
VSNR            0.681   | PSNRHVS         0.507
PSNRHVS         0.654   | PSNRc           0.496
PSNR            0.640   | PSNRHVSM        0.481
SSIM            0.637   | PSNR            0.470
NQM             0.635   | NQM             0.466
PSNRHVSM        0.625   | SSIM            0.463
VIFP            0.608   | VIFP            0.456
WSNR            0.580   | WSNR            0.446
Code:
ffmpeg.exe -i i01_01_1.bmp -vf "scale=flags=accurate_rnd+bitexact+full_chroma_int+full_chroma_inp,format=yuvj444p" i01_01_1.bmp.yuv
vmafossexec.exe yuv444p 512 384 reference_images/i01.bmp.yuv distorted_images/i01_01_1.bmp.yuv model/vmaf_v0.6.1.pkl
vmafossexec.exe yuv444p 512 384 reference_images/i01.bmp.yuv distorted_images/i01_01_1.bmp.yuv model/vmaf_rb_v0.6.3/vmaf_rb_v0.6.3.pkl --ci
A note on how to read the numbers: from the paper I get the following: a SROCC of 0.95 is considered excellent, 0.90 is good, and 0.85 is barely acceptable. Last edited by SmilingWolf; 20th April 2019 at 19:09. |
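For readers unfamiliar with the two columns above: Spearman (SROCC) and Kendall are rank correlations between a metric's scores and the human opinion scores, so they only care whether the metric orders the images the same way viewers do. The post does not say how the figures were computed, so the following is only an illustrative pure-Python sketch with made-up data. The lists `metric_scores` and `mos` are hypothetical, and the tie handling (average ranks for Spearman, tau-a for Kendall) is an assumption that may differ from what was actually used.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either list count as neither.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical example: five images, one objective score and one MOS each.
metric_scores = [30.1, 32.5, 35.0, 28.4, 33.3]
mos = [3.0, 4.1, 3.8, 2.7, 4.5]
print(spearman(metric_scores, mos), kendall_tau_a(metric_scores, mos))
```

Kendall values come out lower than Spearman values on the same data, which is why the two columns should be compared within themselves rather than against each other.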
|
20th April 2019, 16:32 | #1616 | Link | |
Registered User
Join Date: Jun 2016
Posts: 55
|
Quote:
|
|
20th April 2019, 17:59 | #1617 | Link | |
VP Eng, Kaleidescape
Join Date: Jan 2018
Location: Mt View, CA
Posts: 51
|
Quote:
|
|
20th April 2019, 18:31 | #1618 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Separating AV1 encoding from AV1 decoding would probably be more than enough for the AV1 codec.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
|
21st April 2019, 13:40 | #1619 | Link |
Registered User
Join Date: Nov 2003
Posts: 1,281
|
Agreed. Not busy enough yet. You can come back after a couple of days and still might only have a full page to read.
__________________
http://www.7-zip.org/ |
21st April 2019, 14:42 | #1620 | Link |
Registered User
Join Date: Aug 2009
Posts: 201
|
VMAF isn't designed for still images, but they do provide the tools to create your own VMAF for specific use cases (e.g. anime on a phone screen, or video game content) so it surprises me that no one has taken the framework and applied it to still images yet.
It should in theory be able to fuse the results of those other still image tests and create something even better aligned with human reported scores than any one alone. Presumably not Netflix's main use case but you'd think they deliver enough still images to make it worthwhile since they already have the skills. |