Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
#141 | Link |
Registered User
Join Date: Aug 2015
Posts: 321
|
8-bit video: SSE4.1 vs AVX2 vs AVX-512 (on 8C/16T Rocket Lake) - https://code.videolan.org/videolan/d..._requests/1301
Last edited by lvqcl; 23rd March 2022 at 19:46. |
#142 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 111
|
Quote:
Extreme example of the latter: 8-bit film grain is more than 3x as fast with AVX512 compared to AVX2.
Last edited by Beelzebubu; 23rd March 2022 at 19:04. |
#143 | Link |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
|
Wow, those are some very impressive speedups with AVX512! The new instructions are making at least as much of a difference as the "AVX2, but 2x wider" instructions.
Of course, Ice Lake CPUs don't have that much market share yet, but these kinds of speedups are quite promising in the long term for software decoding. |
#145 | Link | |
Registered User
Join Date: Jun 2019
Posts: 21
|
dav1d 1.1.0 'Arctic Peregrine Falcon'
dav1d 1.1.0 was released yesterday. (Tag)
Quote:
#146 | Link |
Registered User
Join Date: Mar 2004
Posts: 1,175
|
Changes for 1.2.0 'Arctic Peregrine Falcon':
-------------------------------------------
- Improvements on attachments of props and T.35 entries on output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx
#147 | Link |
Registered User
Join Date: Mar 2004
Posts: 1,175
|
Changes for 1.2.1 'Arctic Peregrine Falcon':
-------------------------------------------
- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc buildsystems, CI and headers fixes
#148 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 491
|
Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------
1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.
- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- new API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function
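A minimal sketch of how a caller might interpret the value returned by the new dav1d_version_api() function. It assumes the result is a packed integer of the form 0x00XXYYZZ (major, minor, patch); that layout should be checked against the dav1d.h you build with. The helper name and the sample value are hypothetical, and the snippet does not call dav1d itself.

```python
from typing import Tuple


def unpack_api_version(packed: int) -> Tuple[int, int, int]:
    """Split a packed 0x00XXYYZZ integer into (major, minor, patch).

    The 0x00XXYYZZ layout is an assumption about dav1d_version_api();
    confirm it in the dav1d.h header before relying on it.
    """
    major = (packed >> 16) & 0xFF  # XX: bits 16..23
    minor = (packed >> 8) & 0xFF   # YY: bits 8..15
    patch = packed & 0xFF          # ZZ: bits 0..7
    return major, minor, patch


# Hypothetical packed value for illustration, not tied to a real release.
sample = 0x070102
print(unpack_api_version(sample))
```

Because this release also lists an ABI break in several header structures, a check like this at startup is one way an application could refuse to run against a library whose major API version it was not built for.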
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else. |
#149 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 491
|
dav1d v1.3.0-3-g47107e3
Built on October 05, 2023, GCC 13.2.0
https://code.videolan.org/videolan/dav1d
DL : dav1d v1.3.0-3-g47107e3
Last edited by Barough; 5th October 2023 at 21:35. |
#150 | Link |
Registered User
Join Date: Mar 2004
Posts: 1,175
|
Changes for 1.4.0 'Road Runner':
------------------------------------------------------
1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations
- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bit depth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes
#152 | Link |
Registered User
Join Date: Mar 2004
Posts: 1,175
|
v1.4.1 'Road Runner':
--------------------------------
- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations
#153 | Link |
Registered User
Join Date: Aug 2009
Posts: 202
|
dav1d pushed as part of a Google update going out to Android 12+
https://twitter.com/videolan/status/1781025929659392360
Apps will still use the Google-developed alternative libgav1 unless they opt in, though. |
#154 | Link | |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 111
|
Quote:
"Apps need to opt into dav1d to benefit for now yet soon it will become the default av1 software decoder. " |
#155 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,973
|
Quote:
They may leverage dav1d source code, but with their own tweaks and compile. |
#157 | Link | |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,375
|
Quote:
Obviously they compile their own. As does Google for Android. And Microsoft for Windows.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
#158 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 7,187
|
dav1d 1.4.1-66-g3623543 (MSYS2; MinGW32 / MinGW64: GCC 14.1.0)
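Build strings such as 1.4.1-66-g3623543 look like `git describe` output: the nearest tag, the number of commits since that tag, and an abbreviated commit hash prefixed with "g". That reading is an assumption, since the post does not say how the string was generated. A small sketch that splits such a string into its parts (the class and function names are made up for illustration):

```python
import re
from typing import NamedTuple, Optional


class BuildVersion(NamedTuple):
    tag: str                    # nearest release tag, e.g. "1.4.1"
    commits_since_tag: int      # 0 when the build is exactly on the tag
    abbrev_hash: Optional[str]  # abbreviated commit hash, None on a tag


# Assumed layout: TAG[-N-gHASH], with an optional leading "v" on the tag.
_DESCRIBE = re.compile(
    r"^(?P<tag>v?\d+(?:\.\d+)*)(?:-(?P<n>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def parse_describe(text: str) -> BuildVersion:
    """Parse a git-describe-style version string."""
    match = _DESCRIBE.match(text.strip())
    if match is None:
        raise ValueError(f"not a describe-style version: {text!r}")
    commits = int(match.group("n")) if match.group("n") else 0
    return BuildVersion(match.group("tag"), commits, match.group("sha"))


print(parse_describe("1.4.1-66-g3623543"))
```

Read this way, the build above would sit 66 commits past the 1.4.1 tag, which is why it can already contain changes that only appear in a later release's changelog.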
#159 | Link |
Registered User
Join Date: Mar 2004
Posts: 1,175
|
Changes for 1.4.2 'Road Runner':
--------------------------------
1.4.2 is a small release of dav1d, improving notably ARM, AVX-512 and PowerPC
- AVX2 optimizations for 8-tap and new variants for 6-tap
- AVX-512 optimizations for 8-tap and new variants for 6-tap
- Improve entropy decoding on ARM64
- New ARM64 optimizations for convolutions based on DotProd extension
- New ARM64 optimizations for convolutions based on i8mm extension
- New ARM64 optimizations for subpel and prep filters for i8mm
- Misc improvements on existing ARM64 optimizations, notably for put/prep
- New PowerPC9 optimizations for loopfilter
- Support for macOS kperf API for benchmarking
#160 | Link |
Registered User
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 111
|
The 6-tap optimizations for AVX2/AVX-512 were inspired by an earlier patch-set (provided by someone from Arm) doing the same on Arm platforms. On both Arm (included in the previous release already) and x86 (in this release), on affected sequences (particularly those encoded using faster presets in encoders, which is what you'd find on YouTube etc.) this can provide a ~10% overall performance improvement. Pretty spectacular at this stage of dav1d's life cycle.
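One way to see why a dedicated 6-tap path can pay off, as a rough scalar sketch rather than dav1d's actual SIMD code: if the two outer coefficients of an 8-tap filter are zero, a 6-tap filter over the inner samples produces identical output with two fewer multiply-adds per output sample. The coefficients and the sample row below are illustrative assumptions, not values taken from the dav1d source.

```python
from typing import List, Sequence


def fir(samples: Sequence[int], taps: Sequence[int]) -> List[int]:
    """Apply a FIR filter at every position where all taps fit."""
    width = len(taps)
    return [
        sum(tap * samples[start + k] for k, tap in enumerate(taps))
        for start in range(len(samples) - width + 1)
    ]


# Illustrative 8-tap kernel whose outer coefficients are zero (assumed values).
TAPS_8 = [0, 2, -14, 76, 76, -14, 2, 0]
# The same kernel with the two zero outer taps dropped.
TAPS_6 = TAPS_8[1:-1]

# Made-up row of pixel values.
row = [10, 12, 15, 20, 40, 80, 120, 130, 128, 125, 90, 60]

out_8tap = fir(row, TAPS_8)
# The 6-tap window starts one sample later, so trim one sample from each end.
out_6tap = fir(row[1:-1], TAPS_6)

print(out_8tap == out_6tap)
print(len(TAPS_8), "vs", len(TAPS_6), "multiply-adds per output sample")
```

The sketch only shows the arithmetic equivalence; how much time is saved in practice depends on the vector instructions used and on how often a stream selects filters of this shape.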