Old 3rd May 2025, 18:05   #9761  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,556
I am about to buy an AMD 99503D, which supports AVX-512 very well. I wanted to prepare a command line with --asm avx512, but fail to see avx512 as a valid string in the documentation any more. Are there still versions of x265 that support this?
__________________
Gorgeous, delicious, deculture!
Old 3rd May 2025, 18:36   #9762  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 7,223
Support for AVX512 certainly still exists; I just read it in the sources. But it will not be auto-detected, so add the parameter explicitly. If it is not supported by either your hardware or your OS, or has been removed from the CLI without my knowledge, x265 will tell you when you execute it.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 3rd May 2025, 18:47   #9763  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,556
Thank you.
Old 4th May 2025, 01:43   #9764  |  Link
TR-7970X
Registered User
 
Join Date: Jan 2025
Posts: 65
Quote:
Originally Posted by asarian View Post
Thank you.
I also have a 9950X3D, and I have --asm avx512 in my x265 command line.

The documentation needs to be updated.
__________________
Main Systems:-
Threadripper 7970X on Asus Pro WS TRX50-Sage WiFi
Ryzen 9 9950X3D on MSI Carbon X670E
Ryzen 9 7950X on Gigabyte Aorus Elite B650
Intel 13900KF on MSI Tomahawk B660
Old 4th May 2025, 08:23   #9765  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,556
Quote:
Originally Posted by TR-7970X View Post
I also have a 9950X3D, and I have --asm avx512 in my x265 command line.
Good to hear.

Can you tell me something about the performance boost using AVX512 gives? There are many charts on the 99050X3D, and even on them using x265, but never with AVX512 being used. I feel it will make quite a difference.
Old 4th May 2025, 10:52   #9766  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 7,223
In general: In the past, the difference between AVX2 and AVX512 used to be marginal, and AVX512 has a higher risk of thermal throttling.

I can't tell you current results though, sorry. Try to search this thread for AVX512, I believe there have been reports a few months ago.

Like here
Old 4th May 2025, 11:06   #9767  |  Link
TR-7970X
Registered User
 
Join Date: Jan 2025
Posts: 65
Quote:
Originally Posted by asarian View Post
Good to hear.

Can you tell me something about the performance boost using AVX512 gives? There are many charts on the 99050X3D, and even on them using x265, but never with AVX512 being used. I feel it will make quite a difference.
I can't give you any info on performance, as I haven't been doing much recently, and I guess I just use it because I can.

I never noticed any thermal throttling with the 7950X, and I would suggest that the 9950X3D & the 7970X won't either, but I do have pretty good cooling.

But one thing I have noticed is 2 major typos in your posts when referencing the 9950X3D!
Old 4th May 2025, 12:39   #9768  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,911
On an i9-11900K I found a speed gain of a few percent using --asm avx512, limited by some of Intel's safety features (limiting to base clock)
and by the available cooling of a 125 W CPU + 115 W GPU notebook system (although with plenty of air intake and exhaust, and 2x 5000 rpm fans).

On a real desktop AMD 7950X/9950X I would expect considerable speed gains in the range of >25%, so well worth it.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..."
Old 4th May 2025, 15:25   #9769  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 521
Unfortunately they are nowhere near 25% (compared to AVX2).
Old 4th May 2025, 16:54   #9770  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,556
Quote:
Originally Posted by LigH View Post
In general: In the past, the difference between AVX2 and AVX512 used to be marginal, and AVX512 has a higher risk of thermal throttling.
That's the beauty of the 9950X3D! Allegedly it includes an extremely efficient implementation of AVX512, where the CPU doesn't reduce clock speed for it and only consumes a few watts extra. So this should go endlessly better on the 9950X3D.
Old 4th May 2025, 17:44   #9771  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 446
Quote:
Originally Posted by Z2697 View Post
Unfortunately they are nowhere near 25% (compared to AVX2).
Yeah, there are not that many situations in an HEVC encoder where one needs 512-bit-wide registers.
__________________
My github...
Old 4th May 2025, 18:13   #9772  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,911
Quote:
Yeah, there are not that many situations in an HEVC encoder where one needs 512-bit-wide registers.
True. I should have gone down to 15% in my estimate.
What percentage speed advantage for AVX512 over AVX2 do 7950X/9950X owners see in the described scenario?
Old 4th May 2025, 19:25   #9773  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 521
1%-1.5% and 5%-8% respectively, at least for the settings and hardware I tested with.
(There's a range because lower quality / bitrate settings somehow get more speedup.)
Old 4th May 2025, 20:43   #9774  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 370
Quote:
Originally Posted by Emulgator View Post
True. I should have gone down to 15% in my estimate.
What percentage speed advantage for AVX512 over AVX2 do 7950X/9950X owners see in the described scenario?
Quote:
Originally Posted by Z2697 View Post
1%-1.5% and 5%-8% respectively, at least for the settings and hardware I tested with.
(There's a range because lower quality / bitrate settings somehow get more speedup.)
Sounds about right. I've seen about 5-8% with Xeon (Sapphire Rapids), and only 1-3% with the 7000-series (Threadripper Pro). With the AVX512 change in Zen 5, I assume it should be about the same performance gain as on the Xeon.
Old 8th May 2025, 23:14   #9775  |  Link
_DLS_
Registered User
 
Join Date: May 2021
Posts: 1
Quote:
Originally Posted by asarian View Post
I am about to buy an AMD 99503D, which supports AVX-512 very well. I wanted to prepare a command line with --asm avx512, but fail to see avx512 as a valid string in the documentation any more. Are there still versions of x265 that support this?
In my tests with the 9950X3D, the speed bump with --asm avx512 can be between 16% and 36%, depending on the other settings.

on 2160p:

CTU 32, ref 4, subme 4, rd 4, rect, no-amp, aq-mode 2, tu-intra-depth 3, tu-inter-depth 3, max-merge 5, crf 20 => +15.5%
CTU 64, ref 5, subme 5, rd 4, rect, no-amp, aq-mode 2, tu-intra-depth 4, tu-inter-depth 4, max-merge 5, crf 18 => +36%

Temps are manageable on air cooling.
Old 11th May 2025, 22:35   #9776  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 521
Quote:
Originally Posted by _DLS_ View Post
In my tests with the 9950X3D, the speed bump with --asm avx512 can be between 16% and 36%, depending on the other settings.

on 2160p:

CTU 32, ref 4, subme 4, rd 4, rect, no-amp, aq-mode 2, tu-intra-depth 3, tu-inter-depth 3, max-merge 5, crf 20 => +15.5%
CTU 64, ref 5, subme 5, rd 4, rect, no-amp, aq-mode 2, tu-intra-depth 4, tu-inter-depth 4, max-merge 5, crf 18 => +36%

Temps are manageable on air cooling.
Kinda too good to be true...
Old 11th May 2025, 22:50   #9777  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 521
x265 will produce a non-conforming bitstream when a very sudden change in a chroma channel happens and triggers weighted prediction.

https://bitbucket.org/multicoreware/...roma_offset_lx
(Finally remembered my bitbucket account)

Related issue (4 years ago, when I didn't know the root cause): https://bitbucket.org/multicoreware/x265_git/issues/582

Many hardware decoders will fail to decode such a frame, resulting in corrupted output.

HM will abort due to an assertion failure.

Code:
int pred = (128 - ((128 * wp[plane].inputWeight) >> (wp[plane].log2WeightDenom)));
int deltaChroma = (wp[plane].inputOffset - pred);
WRITE_SVLC(deltaChroma, "delta_chroma_offset_lX");
TL;DR: the deltaChroma in the code I referenced above should be in the range of [-512, 511], but there’s no check on the value of pred. For example, if pred is a fairly “large” negative number, deltaChroma can exceed the range.
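
To make the arithmetic concrete, here is a minimal standalone sketch that mirrors the two lines of the snippet above. It is not x265 code: the function and constant names are hypothetical, the [-512, 511] bounds are the ones quoted in this post, and the example weight/denominator values are made up purely to show how pred can go strongly negative. Weights are assumed non-negative so the right shift is well defined.

```cpp
// Standalone illustration only; names are hypothetical, not from x265.
// Bounds for delta_chroma_offset_lX as quoted in the post.
constexpr int kDeltaChromaMin = -512;
constexpr int kDeltaChromaMax = 511;

// Same arithmetic as the quoted snippet:
//   pred        = 128 - ((128 * weight) >> log2WeightDenom)
//   deltaChroma = offset - pred
// weight is assumed to be non-negative here.
constexpr int deltaChromaOffset(int weight, int log2WeightDenom, int offset)
{
    return offset - (128 - ((128 * weight) >> log2WeightDenom));
}

// True when the value fits the range quoted above.
constexpr bool inRange(int deltaChroma)
{
    return deltaChroma >= kDeltaChromaMin && deltaChroma <= kDeltaChromaMax;
}
```

With a unity weight (weight 64, denominator 6, offset 0), pred is 0 and deltaChromaOffset(64, 6, 0) gives 0, which is fine. With a made-up weight of 127 and a denominator of 3, pred becomes 128 - 2032 = -1904, so deltaChromaOffset(127, 3, 0) gives 1904 and inRange() reports false. A check of this kind before the WRITE_SVLC call would at least detect the case; what the encoder should do when it triggers is a separate question.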

Last edited by Z2697; Yesterday at 23:11.
Old Yesterday, 16:43   #9778  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,014
Quote:
Originally Posted by LigH View Post
In general: In the past, the difference between AVX2 and AVX512 used to be marginal, and AVX512 has a higher risk of thermal throttling.
More to the point: "the difference between AVX2 and AVX512 used to be marginal, BECAUSE of AVX512 thermal throttling."

The per-clock throughput improvements were solid, but the throttling-induced reductions in instructions per second nearly cancelled those out. An implementation that maintains IPS with the IPC gains would be quite impressive.
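
As a rough illustration of that trade-off: instructions per second is instructions per clock times clock rate, so the net effect is the product of the two ratios. The sketch below uses hypothetical helper names, and the numbers in the note after it are invented for the example, not measurements from any CPU.

```cpp
// Illustration only; helper names are hypothetical.
// IPS = IPC x clock, so the net throughput ratio is the product
// of the IPC ratio and the clock ratio (both relative to AVX2 = 1.0).
constexpr double netThroughputRatio(double ipcRatio, double clockRatio)
{
    return ipcRatio * clockRatio;
}

// Converts a ratio such as 1.02 into a percentage change such as +2.0.
constexpr double percentChange(double ratio)
{
    return (ratio - 1.0) * 100.0;
}
```

For example, an assumed 20% IPC gain combined with a clock dropping to 85% gives netThroughputRatio(1.20, 0.85), roughly 1.02, so only about +2% overall. The same assumed IPC gain with no clock drop, netThroughputRatio(1.20, 1.00), keeps the full +20%.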
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old Yesterday, 16:47   #9779  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,014
Quote:
Originally Posted by Z2697 View Post
TL;DR: the deltaChroma in the code I referenced above should be in the range of [-512, 511], but there’s no check on the value of pred. For example, if pred is a fairly “large” negative number, deltaChroma can exceed the range.
Good catch!
Old Yesterday, 17:08   #9780  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,556
Quote:
Originally Posted by _DLS_ View Post
In my tests with the 9950X3D, the speed bump with --asm avx512 can be between 16% and 36%, depending on the other settings.

on 2160p:

CTU 32, ref 4, subme 4, rd 4, rect, no-amp, aq-mode 2, tu-intra-depth 3, tu-inter-depth 3, max-merge 5, crf 20 => +15.5%
CTU 64, ref 5, subme 5, rd 4, rect, no-amp, aq-mode 2, tu-intra-depth 4, tu-inter-depth 4, max-merge 5, crf 18 => +36%

Temps are manageable on air cooling.
Now we're talking!

I heard +37% elsewhere (but didn't mention that yet, as I was looking for objective feedback). So, it now seems that number was quite accurate!
All times are GMT +1. The time now is 07:01.

