Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)
Old 17th October 2022, 13:55   #8641  |  Link
LeXXuz
21 years and counting...
 
Join Date: Oct 2002
Location: Germany
Posts: 716
Is AVX512 really that much of a holy grail some people do claim - regarding performance?
Old 17th October 2022, 14:05   #8642  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 7,175
No. That's why it is not enabled automatically, even on CPUs that support it. It is a heat risk with only a small performance gain over AVX2.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 18th October 2022, 06:34   #8643  |  Link
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 847
Quote:
Originally Posted by FranceBB View Post
For what it's worth, I can only test with my very own CPU and all I can say is that it works with my Intel Xeon Gold 6238R, however you have to enable it as it's disabled by default. I probably have a screenshot somewhere.
How can AVX2 be enabled?
I turn on the basic SIMD in x265:
#define ARCH_X86_64
#define HAVE_SSE3
#define HAVE_SSSE3
#define HAVE_SSE4
The rest (AVX2 and AVX512) are probably on by default.
ARCH_X86_64 && cpuflag(avx512)
Old 18th October 2022, 19:14   #8644  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,956
Quote:
Originally Posted by Jamaika View Post
How can AVX2 enable?
AVX2 on down are automatically enabled if available. AVX512 is the special case, as it is more likely to reduce performance than increase it.
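If you want to see what the CPU itself advertises before deciding whether to force AVX512 on, a rough sketch like the one below can help. It assumes Linux, where /proc/cpuinfo has a "flags" line, and the function names are made up for illustration. It is not how x265 does its own detection, and avx512f is only the foundation subset, so treat a positive result as a hint rather than a guarantee.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_summary(flags):
    """Report whether AVX2 and AVX-512 Foundation are advertised by the CPU."""
    return {"avx2": "avx2" in flags, "avx512": "avx512f" in flags}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only; an assumption of this sketch
    if cpuinfo.exists():
        print(simd_summary(parse_cpu_flags(cpuinfo.read_text())))
    else:
        print("No /proc/cpuinfo here; use another tool to read the CPU flags.")
```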
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 19th October 2022, 10:14   #8645  |  Link
A1
shortest name
 
Join Date: Sep 2022
Posts: 12
With the current x265 3.5 encoder, does using --ctu 64 still cause textures in 64x64 blocks to be weakened?
Old 19th October 2022, 17:01   #8646  |  Link
vpupkind
Registered User
 
Join Date: Jul 2007
Posts: 63
Quote:
Originally Posted by benwaggoner View Post
AVX2 on down are automatically enabled if available. AVX512 is the special case, as it is more likely to reduce performance than increase it.
I saw an improvement of at most 5% in motion estimation (umh) back in the Cascade Lake days; I haven't tested on Ice Lake, though. The problem used to be thermals -- a 900MHz frequency penalty due to use of AVX-512.
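As a back-of-the-envelope check, the trade-off can be modelled by assuming encoder throughput scales linearly with core clock, which is a simplification. The 3.0 GHz base clock below is a made-up figure for illustration; only the 5% gain and the 900 MHz penalty come from the figures above, and the model generously treats the 5% as if it applied to the whole encode rather than just motion estimation.

```python
def net_speedup(simd_gain, base_clock_mhz, penalty_mhz):
    """Overall speed ratio versus AVX2 under a linear clock-scaling assumption.

    simd_gain: fractional per-clock gain from AVX-512 (0.05 means 5%).
    """
    clock_ratio = (base_clock_mhz - penalty_mhz) / base_clock_mhz
    return (1.0 + simd_gain) * clock_ratio


def break_even_gain(base_clock_mhz, penalty_mhz):
    """Per-clock gain needed just to cancel out the frequency penalty."""
    return base_clock_mhz / (base_clock_mhz - penalty_mhz) - 1.0


if __name__ == "__main__":
    base = 3000.0  # hypothetical AVX2 all-core clock in MHz, not a measurement
    print(f"net ratio: {net_speedup(0.05, base, 900.0):.3f}")
    print(f"break-even gain: {break_even_gain(base, 900.0):.1%}")
```

With those assumed numbers the ratio comes out at about 0.735, so a 5% per-clock gain is nowhere near enough to pay for a 900 MHz drop from 3.0 GHz; the per-clock gain would have to be roughly 43% just to break even.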
Old 19th October 2022, 22:08   #8647  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,180
Quote:
Originally Posted by Jamaika View Post
How?
ARCH_X86_64 && cpuflag(avx512)
You can enable any instruction set with the --asm option, like:

Code:
--asm avx512
The same goes if you want to limit it to a specific set. For example, if you want to compare the speed of AVX512 against AVX, you can use

Code:
--asm avx
and so on and of course there's

Code:
--no-asm
for plain C.
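Put together, full command lines for that kind of comparison might be assembled like this. It's only a sketch: the file names are placeholders and the helper function is invented for illustration, while --asm, --no-asm, --input and --output are the x265 CLI options.

```python
import shlex


def x265_command(infile, outfile, asm=None):
    """Build an x265 argument list.

    asm: None leaves x265's auto-detection alone, "none" selects plain C
    via --no-asm, any other string is passed to --asm (e.g. "avx512").
    """
    cmd = ["x265", "--input", infile, "--output", outfile]
    if asm == "none":
        cmd.append("--no-asm")
    elif asm is not None:
        cmd += ["--asm", asm]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with a real source and target.
    for level in ("avx512", "avx", "none"):
        print(shlex.join(x265_command("input.y4m", f"out_{level}.hevc", asm=level)))
```

Keep in mind that --asm overrides x265's own CPU detection, so asking for a set the CPU doesn't have is on you.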




Quote:
Originally Posted by benwaggoner View Post
AVX512 is the special case, as it is more likely to reduce performance than increase it.
...on consumer hardware.
Things are of course different in perfectly cooled down server rooms.

Quote:
Originally Posted by vpupkind View Post
The problem used to be thermals -- a 900MHz frequency penalty due to use of AVX-512.
Correct. Temperatures skyrocket and make the CPU throttle, with clocks dropping to the point where the speed gain is nullified.
This of course doesn't apply to server rooms where temperature and humidity are perfectly controlled and CPUs can keep a high enough clock under load, as happens with my encodes.

This is the situation while encoding an MJPEG2000 4:4:4 12bit HDR PQ IMF with x265 to create a consumer H.265 file (with AVX-512 enabled):

[screenshot of the CPU clocks during the encode]

As you can see, the clock fluctuates a bit, but since the server room is kept really cold, the CPU keeps running steadily under sustained load. In this case AVX-512 really makes sense, which is why it's enabled in all my workflows.

If it weren't for the wide user-driven community and the open source nature of the project, I would argue AVX512 should be enabled by default just like the other instruction sets, because for companies it makes a whole lot of sense...
Old 19th October 2022, 22:58   #8648  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 352
Quote:
Originally Posted by FranceBB View Post
...on consumer hardware.
Things are of course different in perfectly cooled down server rooms.
? Not sure what you are on about. For example, all the Skylake-SP Xeons I've run AVX-512 code on downclocked a lot more than my "consumer" Tiger Lake laptop (and yes, the servers were of course in a proper server environment). This topic is a lot more complex than whether you run the system in a cool room or not, and I'm pretty sure that most of the downclocking I've seen on Xeons was due to power limits, not temperature. AVX-512 behavior has been tweaked a lot between models/platforms.

2.6 GHz looks rather good for a 28-core Xeon under AVX-512 load. Mind sharing the frequency when you run x265 without it, and the performance numbers? I also have some systems with Cascade Lake Refresh Xeons, and I lost 5-10% when using AVX-512.

Edit: You pipe AVS to x265, right? How CPU-intensive is that script? How much of the load does the x265 process account for?

Last edited by excellentswordfight; 19th October 2022 at 23:33.
Old 20th October 2022, 00:40   #8649  |  Link
Emulgator
Big Bit Savings Now !
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,841
Quote:
Is AVX512 really that much of a holy grail some people do claim - regarding performance?
It's not something to hold your breath for, or to spend more on a CPU for, as LigH says.
Here on an i9-11900K notebook ("14nm", 70nm gate pitch, 125W, 5,3GHz single core turbo, 8 core steady 4,5GHz)
I remember seeing something like a +10 to +20% fps gain in x265 with AVX512 enabled.
Sustained AVX512 CPU clock was above 4GHz; downclocking was avoided with 2 fans running at a full 4800rpm.
Nice to have, and I will stay with it, but not worth a big fuss.
Will have to repeat a test encode and note down my comparison.
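For noting down such a comparison, a tiny helper along these lines could do the arithmetic. This is just a sketch with invented function names, and the fps values in it are placeholders that show the output format, not measurements from any machine. It would make sense to record the sustained clock next to each result as well.

```python
def percent_change(baseline_fps, test_fps):
    """Relative change of test_fps versus baseline_fps, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (test_fps / baseline_fps - 1.0) * 100.0


def comparison_rows(results, baseline="avx2"):
    """Format 'label fps change' rows from a {label: fps} mapping."""
    base = results[baseline]
    rows = []
    for label, fps in results.items():
        rows.append(f"{label:<8}{fps:>8.2f} fps{percent_change(base, fps):>+8.1f}%")
    return rows


if __name__ == "__main__":
    # Hypothetical numbers purely to show the output format, not measurements.
    for row in comparison_rows({"avx2": 10.0, "avx512": 11.5}):
        print(row)
```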

Now in 2022 a "5nm" (in reality 51nm gate pitch) CPU (AMD) will be the better investment, I guess.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..."
Old 20th October 2022, 07:29   #8650  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,956
So, I finally picked up a 16" Apple MacBook Pro with a M1 Pro processor today.

What's the best way to get a well-optimized x265 binary to run on this hardware?
Old 20th October 2022, 15:49   #8651  |  Link
vpupkind
Registered User
 
Join Date: Jul 2007
Posts: 63
Quote:
Originally Posted by FranceBB View Post

...on consumer hardware.
Things are of course different in perfectly cooled down server rooms.

Correct. Temperatures skyrocket and make the CPU throttle, with clocks dropping to the point where the speed gain is nullified.
This of course doesn't apply to server rooms where temperature and humidity are perfectly controlled and CPUs can keep a high enough clock under load, as happens with my encodes.
I actually ran this on a bunch of 18-core Skylake and Cascade Lake Gold server CPUs. The problem is the internal CPU throttling. We locked it in the P0 state (no throttling overall), but use of AVX-512 still automagically reduced the whole CPU's frequency. Cascade Lake reduced the impact to a subset of cores physically close to the one using AVX-512.
Old 20th October 2022, 16:56   #8652  |  Link
Ritsuka
Registered User
 
Join Date: Mar 2007
Posts: 103
Quote:
Originally Posted by benwaggoner View Post
So, I finally picked up a 16" Apple MacBook Pro with a M1 Pro processor today.

What's the best way to get a well-optimized x265 binary to run on this hardware?
Unfortunately I think you will have to compile it yourself, and maybe apply a couple of patches from https://github.com/HandBrake/HandBra...r/contrib/x265
Old 20th October 2022, 17:32   #8653  |  Link
qyot27
...?
 
Join Date: Nov 2005
Location: Florida
Posts: 1,458
Quote:
Originally Posted by benwaggoner View Post
So, I finally picked up a 16" Apple MacBook Pro with a M1 Pro processor today.

What's the best way to get a well-optimized x265 binary to run on this hardware?
I would have said 'MacPorts or Homebrew', except that it seems neither of them applies the NEON acceleration patch for x265 that Apple submitted to HandBrake. And I don't know if the contents of said patch would otherwise be superseded by whatever NEON stuff has been committed upstream (if any has). I've not tried the patch on my M1 Mac Mini, so I can't say what the difference in performance is (I also don't remember if I bothered building x265 there, either).

There is this build script that handles several things - including the Apple patch - for building FFmpeg: https://github.com/Vargol/ffmpeg-apple-arm64-build. As an aside, the 'avisynth' branch on that repo confuses me, because it's up-to-date with the master branch and has no additional changes, and the master branch doesn't have it enabled, even though that's fully possible.

Quote:
Originally Posted by vpupkind View Post
I actually ran this on a bunch of 18-core Skylake and Cascade Lake Gold server CPUs. The problem is the internal CPU throttling. We locked it in the P0 state (no throttling overall), but use of AVX-512 still automagically reduced the whole CPU's frequency. Cascade Lake reduced the impact to a subset of cores physically close to the one using AVX-512.
To expand on this,
https://en.wikipedia.org/wiki/AVX-512#Performance
https://en.wikipedia.org/wiki/Advanc...s#Downclocking

AVX downclocking was present as actual modes in several generations, based on the width of the executed instructions. To wit, GCC and Clang prefer a vector width of 256 when using AVX-512, which would largely sidestep the issue. From the snippets I've read on the topic, this also seems to be the way Zen4 implements AVX-512 in hardware.

Skylake had three levels, Ice Lake had only two. But as of Rocket Lake, those explicit downclocking modes are gone. AVX-512 will not downclock on modern generations just because 512-wide vectors get used, but only because doing so may or may not hit standard thermal or power limits, same as any other intensive process.
Old 20th October 2022, 19:43   #8654  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,956
Quote:
Originally Posted by qyot27 View Post
I would have said 'MacPorts or Homebrew', except that it seems neither of them applies the NEON acceleration patch for x265 that Apple submitted to HandBrake. And I don't know if the contents of said patch would otherwise be superseded by whatever NEON stuff has been committed upstream (if any has). I've not tried the patch on my M1 Mac Mini, so I can't say what the difference in performance is (I also don't remember if I bothered building x265 there, either).

There is this build script that handles several things - including the Apple patch - for building FFmpeg: https://github.com/Vargol/ffmpeg-apple-arm64-build. As an aside, the 'avisynth' branch on that repo confuses me, because it's up-to-date with the master branch and has no additional changes, and the master branch doesn't have it enabled, even though that's fully possible.
And that worked just great on the first try! A lot easier than using autobuildsuite on Windows 10.

I'd still like to get a separate x265 binary not in ffmpeg so I can use identical syntax across platforms, but this is certainly enough for perf testing.
Old 20th October 2022, 20:12   #8655  |  Link
Ritsuka
Registered User
 
Join Date: Mar 2007
Posts: 103
x265 master branch already contains the Apple intrinsics patches, plus a lot of additional Neon optimizations provided by Amazon. No need to look for weird forks or branches. But there are still a couple of patches in the HandBrake repository that will make it run better.
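For a standalone binary from master, a plain CMake build could be scripted roughly as below. This is a sketch under assumptions: git, a compiler and CMake 3.13 or newer are installed, the clone URL is inferred from the project's Bitbucket page, the directory names are arbitrary, and none of the HandBrake patches are applied. It only prints the commands unless dry_run is switched off.

```python
import subprocess


def build_steps(repo_url, checkout_dir, build_dir):
    """Return the commands for a plain out-of-tree CMake build of the x265 CLI."""
    return [
        ["git", "clone", repo_url, checkout_dir],
        # x265 keeps its CMakeLists.txt in the "source" subdirectory.
        ["cmake", "-S", f"{checkout_dir}/source", "-B", build_dir,
         "-DCMAKE_BUILD_TYPE=Release"],
        ["cmake", "--build", build_dir],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; only execute them when dry_run is False."""
    for step in steps:
        print(" ".join(step))
        if not dry_run:
            subprocess.run(step, check=True)


if __name__ == "__main__":
    run_steps(build_steps(
        "https://bitbucket.org/multicoreware/x265_git.git",  # assumed clone URL
        "x265_git",
        "x265_build",  # hypothetical build directory name
    ))
```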
Old 21st October 2022, 09:25   #8656  |  Link
LeXXuz
21 years and counting...
 
Join Date: Oct 2002
Location: Germany
Posts: 716
Pardon my ignorance, but why does AVX512 produce so much more heat than AVX2 mode in x265?
Old 22nd October 2022, 00:17   #8657  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,956
Wow, x265 got more commits nine hours ago than it got the rest of 2022!

https://bitbucket.org/multicoreware/x265_git/commits/
Old 22nd October 2022, 00:28   #8658  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,956
This is an interesting new command-line option that was just added:

--[no-]mctf Enable GOP based temporal filter.
Old 22nd October 2022, 15:53   #8659  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 7,175
Quote:
Originally Posted by benwaggoner View Post
Wow, x265 got more commits nine hours ago than it got the rest of 2022!

https://bitbucket.org/multicoreware/x265_git/commits/
I hope that fixed some of the issues I had to complain about. The last set of commits before that badly broke compilation or multilib linking in MSYS2 with GCC 12.2.
_

No, there is no fix yet; MABS does not build x265 anymore.

Last edited by LigH; 22nd October 2022 at 16:42.
Old 22nd October 2022, 16:54   #8660  |  Link
quietvoid
Registered User
 
Join Date: Jan 2019
Location: Canada
Posts: 575
The new SBRC patch forgot to free the memory used by the edge detection buffers, so it probably leaks when using --sbrc without AQ mode 4.
It's essentially the same as a 2 year old patch for auto-AQ varying by frame average brightness and edge density.
__________________
LG C2 OLED | GitHub Projects

Last edited by quietvoid; 22nd October 2022 at 17:28.