Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

#121
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
Massive speed-up for 10 bpc content on AVX2 CPUs landed in dav1d master recently:
https://code.videolan.org/videolan/d..._requests/1195
There is also a separate commit for AVX2 mc.emu_edge on 10 bpc content:
https://code.videolan.org/videolan/d..._requests/1196
According to the merge request, the work for the main (huge) commit was sponsored by Facebook and Netflix. It will be part of the dav1d 0.9 release, which is likely to land pretty soon. That release will also include numerous NEON asm functions for film grain synthesis on 8 bpc content, plus the beginnings of the same for 10+ bpc content. It will make most 4K 10 bpc content pretty playable on many 8-core AVX2-capable CPUs, so even those who bought AMD Renoir- and Cezanne-based NUCs should manage pretty well without the AV1 ASIC coming with the Van Gogh and Rembrandt APUs onward.
#122
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
dav1d 0.9 (Golden Eagle) was officially released:
https://code.videolan.org/videolan/d...releases/0.9.0
#123
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
Quote:
#124
Registered User
Join Date: Jan 2019
Location: Canada
Posts: 581
Quote:
The changes also increased rav1e's encoding speed by 3x (with AVX2), since they share much of the ASM.
__________________
LG C2 OLED | GitHub Projects

Last edited by quietvoid; 17th May 2021 at 23:09.
#125
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 111
Quote:
In most software-decoder players, I would expect the film grain to be applied on the GPU directly. dav1d contains an example (dav1dplay) that demonstrates how to do this (using libplacebo), and you will get some speed-up from this. To emulate this with the dav1d binary, use --filmgrain=0 (or in ffmpeg: -filmgrain 0). In gav1, you'd use --post_filter_mask 0xf. This is especially important because 10-bit film grain has no Neon SIMD optimizations yet (8-bit Neon/SSSE3/AVX2 and 10-bit AVX2 are present). So keep this in mind when running comparisons.
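For a like-for-like CPU comparison, the same clip can be decoded twice: once with grain synthesis and once with it disabled via the `--filmgrain=0` option mentioned above. The sketch below assembles the two dav1d command lines and times them. It assumes a `dav1d` binary on `PATH`, a hypothetical input file `input.ivf`, and a dav1d build that ships the `null` muxer so no output file has to be written; the helper names `dav1d_argv` and `time_decode` are illustrative, not part of dav1d.

```python
import os
import shutil
import subprocess
import time


def dav1d_argv(input_path, film_grain=True):
    """Build a dav1d command line that decodes input_path and discards the frames.

    Assumes the dav1d CLI's "null" muxer, so nothing is written to disk.
    With film_grain=False, --filmgrain=0 skips grain synthesis on the CPU,
    roughly what a player that applies grain on the GPU would do.
    """
    argv = ["dav1d", "-i", input_path, "--muxer", "null"]
    if not film_grain:
        argv.append("--filmgrain=0")
    return argv


def time_decode(argv):
    """Run one decode and return wall-clock seconds; raises if dav1d fails."""
    start = time.perf_counter()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    clip = "input.ivf"  # hypothetical 10-bit test clip
    if shutil.which("dav1d") is None or not os.path.exists(clip):
        print("dav1d or the test clip is missing; nothing to time")
    else:
        with_grain = time_decode(dav1d_argv(clip, film_grain=True))
        without_grain = time_decode(dav1d_argv(clip, film_grain=False))
        print(f"with grain: {with_grain:.2f}s, without grain: {without_grain:.2f}s")
```

The gap between the two timings gives a rough idea of how much CPU time goes to grain synthesis on that machine; when comparing against another decoder, use the grain-disabled run for both.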
#126
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 7,187
dav1d 0.9.0-0 (g8636b4f / 2021-05-16) (MSYS2 / MinGW, GCC 10.3.0)
I guess the "patches after release" increment is wrong...

#127
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
Quote:
There are also the beginnings of 10 bpc NEON optimisations, added just before 0.9, so I would expect a 0.9.1/0.9.2 to cover it all before too long, since it was only a couple of months from the first NEON 8 bpc film grain patch to the last.

Last edited by soresu; 19th May 2021 at 07:08.

#128
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
Quote:
That's for 10-bit specifically, correct?

#129
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
Quote:
Quote:
#130
Registered User
Join Date: Mar 2004
Posts: 1,175
Dav1d 0.9.1 changelog:
- 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge), prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener, sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
- Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12 arm32
- Fixes for filmgrain on ARM
- itx 4x4 for SSE4
- Misc improvements on SSE2, SSE4

#131
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
Quote:
Any updates on net 10-bit decode performance improvements?

#132
Registered User
Join Date: Jun 2019
Posts: 21
dav1d 0.9.1: a ton of asm
Quote:
Quote:
#134
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 111
Quote:
(Reason: things like coefficient decoding are basically identical between 8-bit and 10-bit, but things like prediction are slower because they require twice the memory. Therefore, the overall slowdown depends on the ratio between things that require double the memory (like prediction) and things that don't (like coefficient decoding). Because overall memory usage is cumulative across threads, you'll see a slightly larger drop-off with more threads.)
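The reasoning above can be made concrete with two small calculations. The first assumes that samples above 8 bits are held in 16-bit words, so a decoded 10-bit frame takes twice the bytes of an 8-bit one. The second is an Amdahl-style estimate: if a fraction `p` of the 8-bit decode time goes to memory-heavy work (prediction and similar) that becomes `k` times slower at 10 bits, while the rest (coefficient decoding and similar) is unchanged, the overall slowdown is `(1 - p) + p * k`. The function names and the values of `p` and `k` used here are hypothetical placeholders, not dav1d measurements.

```python
def frame_bytes(width, height, bits_per_component, chroma="420"):
    """Bytes for one decoded frame.

    Assumes samples above 8 bits are stored in 16-bit words, so 10- and
    12-bit frames are twice the size of 8-bit ones.
    """
    bytes_per_sample = 1 if bits_per_component <= 8 else 2
    luma = width * height
    if chroma == "420":
        # Two chroma planes, each subsampled by 2 horizontally and vertically.
        chroma_samples = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    elif chroma == "444":
        chroma_samples = 2 * luma
    else:
        raise ValueError("this sketch only handles 4:2:0 and 4:4:4")
    return (luma + chroma_samples) * bytes_per_sample


def estimated_slowdown(p_memory_bound, k_memory_factor):
    """Amdahl-style estimate of 10-bit vs 8-bit decode time.

    p_memory_bound: share of 8-bit time spent in memory-heavy work (e.g. prediction).
    k_memory_factor: how much slower that work gets at 10 bits.
    The remaining share (e.g. coefficient decoding) is taken as unchanged.
    """
    if not 0.0 <= p_memory_bound <= 1.0:
        raise ValueError("p_memory_bound must be in [0, 1]")
    return (1.0 - p_memory_bound) + p_memory_bound * k_memory_factor


if __name__ == "__main__":
    print(frame_bytes(3840, 2160, 8))   # 12441600
    print(frame_bytes(3840, 2160, 10))  # 24883200
    # Hypothetical inputs: half the 8-bit time is memory-heavy, and that half runs 1.5x slower.
    print(estimated_slowdown(0.5, 1.5))  # 1.25
```

With several frames decoded in parallel, each thread's working set adds up, so the doubled per-frame footprint is multiplied by the number of frames in flight, which fits the larger drop-off at higher thread counts.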
#135
Registered User
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 196
Quote:
As well as that, they just landed what I think is the last of the 10-bit film grain asm (gen_grain) for ARM64 NEON. All told, 1.0.0 should have pretty much every significant SIMD path fairly well optimised for 8 and 10 bpc content, minus AVX-512.

#136
Registered User
Join Date: Aug 2015
Posts: 321
dav1d 0.9.2
Quote:
#137
Registered User
Join Date: Mar 2004
Posts: 1,175
Changes for 1.0.0 'Peregrine falcon':
-------------------------------------
1.0.0 is a major release of dav1d, adding important features and bug fixes.

It notably changes, in an important way, the way threading works, by adding an automatic thread management.

It also adds support for AVX-512 acceleration, and adds speedups to existing x86 code (from SSE2 to AVX2).

1.0.0 adds new grain API to ease acceleration on the GPU.

Finally, 1.0.0 fixes numerous small bugs that were reported since the beginning of the project to have a proper release.

#138
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,375
1.0.0 has not been released yet. Keep your pants on
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

#139
Registered User
Join Date: Jun 2019
Posts: 21
dav1d 1.0.0 'Peregrine falcon'
dav1d 1.0.0 was released today. (Tag)
Quote:
#140
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
Quote:
Originally Posted by Spyros
dav1d 1.0.0 was released today. (Tag)

Do we know how much speedup AVX-512 provided? We've not seen it be particularly useful for encoder performance, so it'd be interesting if it helps more on the decode side.