Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
13th December 2018, 14:28 | #1301 | Link |
Registered User
Join Date: Apr 2018
Posts: 63
|
They make compression efficiency worse. The current implementation splits the frame into equal parts, and most of the time you get a split right in the center of the picture, where most of the action takes place.
dav1d demonstrates pretty well that frame parallel decoding works fairly well, other encoders managed to get perfect frame parallel encoding done, so it just seems a lazy solution until libaom gets row_mt running well. |
13th December 2018, 16:34 | #1302 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,783
|
New uploads: (MSYS2; MinGW32: GCC 7.4.0 / MinGW64: GCC 8.2.1)
AOM v1.0.0-1030-g7ac3eb1bb
New parameters:
Code:
    --enable-dual-filter=<arg>   Enable dual filter (0: false, 1: true (default))
    --enable-order-hint=<arg>    Enable order hint (0: false, 1: true (default))
    --enable-dist-wtd-comp=<arg  Enable distance-weighted compound (0: false, 1: true (default))
    --enable-masked-comp=<arg>   Enable masked (wedge/diff-wtd) compound (0: false, 1: true (default))
    --enable-interintra-comp=<a  Enable interintra compound (0: false, 1: true (default))
    --enable-diff-wtd-comp=<arg  Enable difference-weighted compound (0: false, 1: true (default))
    --enable-interinter-wedge=<  Enable interinter wedge compound (0: false, 1: true (default))
    --enable-interintra-wedge=<  Enable interintra wedge compound (0: false, 1: true (default))
    --enable-global-motion=<arg  Enable global motion (0: false, 1: true (default))
    --enable-warped-motion=<arg  Enable local warped motion (0: false, 1: true (default))
    --enable-obmc=<arg>          Enable OBMC (0: false, 1: true (default))
dav1d 0.1.0 (e5bca59 / 2018-12-13) |
13th December 2018, 16:52 | #1303 | Link |
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
|
In x265, WPP hurts efficiency too. Should we stop using it?
The clip used is the F.Y.C one I described some pages ago
Code:
    Cmdlines:
    x265 --preset veryslow --tune ssim --crf 20 -F 1 --no-wpp -o test.x265.crf20.1F.00WPP.hevc orig.i420.y4m
    x265 --preset veryslow --tune ssim --crf 20 -F 1 -o test.x265.crf20.1F.12WPP.hevc orig.i420.y4m

    Sizes:
    test.x265.crf20.1F.00WPP.hevc: 5566953
    test.x265.crf20.1F.12WPP.hevc: 5612446 (+0.81%)

    PSNR-HVS-M:
    test.x265.crf20.1F.00WPP.hevc: 42.9368
    test.x265.crf20.1F.12WPP.hevc: 42.9299 (-0.02%)

    MS-SSIM:
    test.x265.crf20.1F.00WPP.hevc: 26.3172
    test.x265.crf20.1F.12WPP.hevc: 26.3112 (-0.02%)

I have already measured it: http://forum.doom9.org/showthread.ph...39#post1856939. That's -0.75% space efficiency with 0.0X% loss in quality. It's even comparable to x265's WPP!

On the other hand, libaom's --frame-parallel=1 exhibits a 6% overhead. Just so that we're clear, libaom's --frame-parallel has got nothing to do with libdav1d's decoding option with the similar name, which doesn't depend on any optional characteristic of the bitstream.

You can already have row-mt WITH tiles which should work decently. Maybe combine it with chunked encoding for better overall performance. So again, no excuses to not use tiles.
Last edited by SmilingWolf; 13th December 2018 at 17:10. |
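The percentages in the x265 WPP comparison above can be re-derived from the raw figures in the post. This is a small sketch, not part of the original measurement; the helper name `pct_change` is made up for illustration. The size overhead comes out at about 0.817%, so the +0.81% in the post looks truncated rather than rounded.

```python
def pct_change(baseline, value):
    """Relative change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0

# Figures copied from the post: first value is --no-wpp, second is WPP enabled.
measurements = {
    "size (bytes)": (5566953, 5612446),
    "PSNR-HVS-M": (42.9368, 42.9299),
    "MS-SSIM": (26.3172, 26.3112),
}

for name, (no_wpp, wpp) in measurements.items():
    print(f"{name}: {pct_change(no_wpp, wpp):+.3f}%")
```

The same helper works for any pair of encodes, for example a tiled versus untiled libaom run.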
13th December 2018, 16:57 | #1304 | Link | |
Registered User
Join Date: Jul 2018
Posts: 80
|
Quote:
Now they will begin optimizing for SSSE3 and SSE4.1, which all CPUs have. They don't know yet whether it will work fine with just these two extensions...
1) Keep in mind that this codec will not be mainstream until there are some HW encoders (6 months to 1 more year for big companies to have a custom HW encoder).
2) If a Celeron can't decode 1080p with dav1d, the player has two options: serve another codec, or serve the same codec at a lower resolution. If you are big enough, like YouTube, you can serve H.264 to that hardware and still save bandwidth with the majority. Or it's just fine to serve AV1 720p video to Celerons, and more to the others (they shouldn't be too picky).
For sure, 8K video will only be served with AV1 on YouTube, just as VP9 is used for >1080p today.
Last edited by marcomsousa; 13th December 2018 at 17:08. |
|
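The two fallback options described in the post above (another codec, or the same codec at a lower resolution) can be written down as a tiny selection rule. This is only a sketch: `pick_rendition`, its parameters and the codec labels are hypothetical and do not describe how YouTube or any real player decides.

```python
def pick_rendition(max_av1_height, wanted_height, prefer_same_codec=True):
    """Choose (codec, height) for a device.

    max_av1_height: tallest AV1 stream the device decodes in real time
                    (0 means AV1 is not usable at all). Hypothetical input.
    wanted_height:  the resolution the viewer asked for.
    """
    if max_av1_height >= wanted_height:
        return ("av1", wanted_height)      # device is fast enough
    if prefer_same_codec and max_av1_height > 0:
        return ("av1", max_av1_height)     # option: same codec, lower resolution
    return ("h264", wanted_height)         # option: serve another codec

# A Celeron-class device assumed to manage AV1 only up to 720p:
print(pick_rendition(720, 1080))
print(pick_rendition(720, 1080, prefer_same_codec=False))
```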
13th December 2018, 18:18 | #1305 | Link | ||
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
|
Quote:
For encoding, the speedup is slightly better than for tile parallelism, and the quality loss per added thread is less than for tile threading. For example, in my experiments, frame-multithreading in Eve/VP9 costs 0.0% BDRATE loss for a 1.8x speedup going from 1 to 2 threads, but tile threading only gives a 1.7x speedup and has a BDRATE quality loss of around 0.5%. This pattern holds for more threads, and tends to be true across multiple codecs and encoders.

Now, obviously, libaom/vpx have no frame-threaded encoding, so there is not much to be said there. But in x264, my experience from many years ago is that they switched from slice to frame multi-threaded encoding for the same reason: better scaling *and* less BDRATE quality loss. So far, so good.

OK, next, decoding. This is trickier. For ffh264, for example, we classically found that frame-multithreaded decoding gives a higher speedup than slice-multi-threaded decoding per added thread. Given this pattern of frame multithreading scaling better *and* having less quality loss than within-frame alternatives in a variety of codecs, you'd expect everything to be good, right? Well, not exactly. It holds true, but only to some extent.

The problem in decoding of vp9/av1 is that frames depend on the entropy output of the previous frame. For h264/5, cabac state resets in each frame, but this is not true for vp9/av1. So, for frame-multithreading, you need to split decoding into 2 passes, and pass 1 of the next dependent frame can only start when the previous frame has finished its pass 1 and started its pass 2. So, vp9/av1 *decoding* scales less well than h264 *decoding* when using frame multithreading. Fortunately, the system load doesn't go up either, so really what it means is that you need more threads to fully saturate a system. It's even better if you combine frame and tile threading, like what dav1d does.

Wait, you're asking now, what about that statement that frame parallelism is bad in libvpx? Well, it's not what you think it is. --frame-parallel in vpxenc has nothing to do with frame multi-threading in the encoder. It's a header bit that removes the entropy dependency I just talked about. So now, it scales better when using frame multi-threading, which is why this bit is called the "frame parallelism" bit, but it also costs you all backwards entropy, incurring >1% BDRATE quality loss.

However, there is no reason to do this. Hardware is not allowed to support higher resolutions with vs. without this feature, and there is no software decoder that implements frame multithreading with but not without entropy dependencies disabled. And if entropy dependencies are present, you can saturate system load anyway by simply using more threads. So the whole thing is kind of silly. Why give up quality for no gain whatsoever?
Quote:
|
||
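The two-pass frame threading described in the post above can be illustrated with a toy scheduling model. It is a sketch under strong assumptions and not how dav1d or any real decoder is implemented: every frame costs the same, a frame occupies one thread for pass 1 plus pass 2, pixel reference dependencies in pass 2 are ignored, and the only constraint modelled is that pass 1 of a frame cannot start before pass 1 of the previous frame has finished. The name `makespan` and all numbers are made up.

```python
import heapq

def makespan(num_frames, pass1, pass2, threads, entropy_dependency):
    """Total time to decode `num_frames` frames in the toy model."""
    free_at = [0.0] * threads          # time at which each thread becomes idle
    heapq.heapify(free_at)
    prev_pass1_end = 0.0
    finish = 0.0
    for _ in range(num_frames):
        start = heapq.heappop(free_at)             # earliest idle thread
        if entropy_dependency:
            # pass 1 needs the entropy state produced by the previous frame's pass 1
            start = max(start, prev_pass1_end)
        prev_pass1_end = start + pass1
        end = prev_pass1_end + pass2
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

# 8 frames, pass 1 = 1 time unit, pass 2 = 3 time units
for threads in (2, 4, 8):
    with_dep = makespan(8, 1, 3, threads, entropy_dependency=True)
    without = makespan(8, 1, 3, threads, entropy_dependency=False)
    print(threads, with_dep, without)
```

In this model threads sit idle while waiting for the previous frame's pass 1, which is one way to picture why more threads are needed to keep the machine busy when the entropy dependency is present.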
13th December 2018, 18:29 | #1306 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
|
Quote:
Last edited by Beelzebubu; 13th December 2018 at 18:31. |
|
13th December 2018, 18:48 | #1307 | Link | ||
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
|
Quote:
Quote:
|
||
13th December 2018, 19:50 | #1308 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
|
Quote:
And that's why you see weird things where using 256 instead of 128 threads (I think this is 32/16 frame threads x 8 tile threads) on a 32 core leads to pretty significant speedups (like this). |
|
13th December 2018, 20:00 | #1309 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
|
Quote:
And huh, I can just imagine the tears of people trying to implement low-cost HW decoders for this. I can see how interframe entropy could provide a percent or two of compression efficiency, though. I would rather have per-frame entropy and no slice requirement if I had a choice. |
|
13th December 2018, 20:05 | #1310 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
|
Quote:
|
|
13th December 2018, 21:16 | #1311 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,770
|
Quote:
Limiting entropy state reference to reference frames/tiles would be a lot more robust, but of reduced value. A bunch of non-ref b frames referencing the same frames probably have a lot more in common with each other than any of them do with the ref-B/P/I frames they reference... Random access would also be slowed by interframe entropy coding; it's essentially adding another layer of reference dependencies. Entropy is easier to decode, but getting to an arbitrary frame in a long GOP could require decoding the entropy state of a lot more frames than it would with a traditional IbBbP with intra-frame entropy only. With 8 b-frames, getting to an arbitrary frame of H.264/HEVC requires decoding about 1/8th of the frames between the IDR and the target frame. Seems like it could be a lot worse in AV1, if I am understanding correctly. |
|
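The "about 1/8th" estimate in the post above can be checked with a simplified count. The sketch assumes a plain structure in which every (b_frames + 1)-th frame is a P frame and all b frames in between are non-reference; this is simpler than the IbBbP pyramid mentioned in the post. The second function models the worst case the post worries about, where every frame needs the entropy state of the frame before it; that is a hypothetical chain for comparison, not a statement about how AV1 actually selects its entropy contexts. Both function names are made up.

```python
def frames_to_decode_traditional(target, b_frames):
    """Frames to decode to show frame `target` (0 = IDR), per-frame entropy."""
    mini_gop = b_frames + 1
    anchors_before = target // mini_gop + 1   # IDR plus the P frames up to the target
    if target % mini_gop == 0:
        return anchors_before                 # the target is itself an anchor
    return anchors_before + 1 + 1             # next anchor + the b frame itself

def entropy_states_if_chained(target):
    """Hypothetical worst case: each frame needs the previous frame's entropy state."""
    return target + 1

target = 904
needed = frames_to_decode_traditional(target, b_frames=8)
print(needed, needed / (target + 1))          # fraction of frames actually decoded
print(entropy_states_if_chained(target))
```

With 8 b frames the fraction tends towards 1/9, which is in line with the rough figure quoted in the post.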
13th December 2018, 21:20 | #1312 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
|
Quote:
|
|
14th December 2018, 00:31 | #1313 | Link | |
Registered User
Join Date: Jan 2007
Posts: 729
|
Why do you think the bestest encoders haven't? Enlightened ones have stopped using frame threading.
Quote:
For illustration, look at the difference in Windows 10 versus Windows 7 usage as shown by general browsing-based statistics sources and by Steam. The former show ~45% for W10, while Steam gives it over 60%. Last edited by mandarinka; 14th December 2018 at 00:35. |
|
14th December 2018, 08:35 | #1314 | Link | |
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
|
Quote:
My point was that there is no point in not using either frame threading, WPP (for x265) or tiling (for libaom) when the overhead is not only so low, but even very similar between the two. Yet I have never seen WPP get the same amount of flak that tiling gets, especially considering tile-threading in libdav1d can contribute up to +108% of the decoding performance on its own: https://docs.google.com/spreadsheets...gid=1238661928 |
|
14th December 2018, 12:34 | #1315 | Link |
Registered User
Join Date: Sep 2002
Location: France
Posts: 432
|
Tiles will cause a coding efficiency loss, even if negligible in the big picture. But it is not such a boon either, except for encoders with particular limits, or software decoders. Same for WPP, which really is more a software decoder thing. Contrary to dav1d, your regular HEVC software decoder does not exploit the combined "threadability" of frames and tiles/WPP.
Last edited by Kurosu; 14th December 2018 at 12:39. |
14th December 2018, 12:39 | #1316 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
In the long run, features that allow faster software decoding are really just wasted coding efficiency. When a codec goes mainstream, you'll have a full stack of hardware decoders, which usually don't care that much about these things.
On top of that, if you look at frame threading numbers, the advantage from tile threading shrinks extremely rapidly. Comparing its speed advantage without frame threading is really only a very limited picture.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders Last edited by nevcairiel; 14th December 2018 at 12:43. |
14th December 2018, 13:17 | #1317 | Link | |
I am maddo saientisto!
Join Date: Aug 2018
Posts: 95
|
Quote:
I still think that we can care about removing tiling from a libaom encoding workflow whenever the hardware goes mainstream and makes 4K decodable even on budget CPUs like v0lt's Pentium G5600, which should take 2-3 years (?), but I'm ok with the above. Hopefully in the same time rav1e will get proper psy-RD and frame-parallel encoding, too, so we won't have to care about it anyway.
My main gripe in the whole tiling debate is that "inappropriate" encoding settings exclude a lot of low-to-mid-tier systems from early adoption (i.e. right about now). In my early tests libdav1d could scale much better on my processor if combined with tiling rather than simply increasing the frame-threads above a certain threshold. Hard to justify a 4MB difference in 1GB of video when said video can't be decoded in real time at all.
Still, the spreadsheet I quoted makes me think I should run the numbers again for dav1d. It has been a couple of months after all.
Last edited by SmilingWolf; 14th December 2018 at 13:35. |
|
14th December 2018, 18:37 | #1318 | Link |
Registered User
Join Date: Nov 2010
Posts: 15
|
"Intel: AV1 support not yet in Gen11 Graphics, but coming soon after"
https://www.reddit.com/r/AV1/comment..._graphics_but/ Meaning late 2020, if Intel as usual introduces new CPU generations late in the year? Since these introductions have often only been paper launches, large-scale availability will only occur in 2021? |
14th December 2018, 19:45 | #1319 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
That's about the time frame most here would expect hardware support. Maybe in 2020, or thereabouts.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
14th December 2018, 21:36 | #1320 | Link |
Registered User
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
|
But let's be honest here - with AMD finally being a viable alternative again, who is really buying Intel for their graphics capabilities?
__________________
____HTPC____ | __Desktop PC__
2.93GHz Xeon x3470 (4c/8t Nehalem) | 4.5GHz 1.24v dual-core Haswell G3258
Radeon HD5870 | Intel iGPU
2x2GB+2x1GB DDR3-1333 | 4x4GB DDR3-1600 |
|
|