Old 9th June 2025, 19:39   #1  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
x265 benchmark on a 7975WX Threadripper

Hello, I've finally finished building my rig and ran some tests.
CPU: 7975WX (AVX512 & HT enabled in the BIOS) => 32 physical cores (64 logical).
Motherboard: ASUS Pro WS WRX90E-SAGE SE
Memory: CORSAIR CMA128GX5M8B5600C40 (Ver 5.43.01), 8x 16GB

I've set up a Windows 10/Windows 11 dual boot.
The PC is totally offline with a lot of services disabled, so it's doing "almost" nothing else and "almost" no time is wasted.

Used x265 4.1.0.104, built with LLVM 20.1.4.
Zen4 is built with: -Ofast -march=znver4
Zen4s is built with: -Ofast -msse2 -mavx -mavx2 -mfma -mtune=znver4
(So Zen4s is tuned for Zen4 but built without using AVX512 instructions.)

I've run a lot of different tests on a small 128-frame 4K 10-bit file.
The command line used is (average bitrate is 40000):
Code:
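REM Positional arguments: 1 = clip base name, 2 = average bitrate, 3 = MaxCLL,
REM 4 = mastering display string, 5 = output path prefix, 6 = tuning,
REM 7 = qpfile name, 8 = source path prefix.
SET E_SRC=%8%1.avs
SET E_DST=%5%1.hevc
SET CHAPTERS=%8%7
SET STAT_FILE=%8%1.stats
REM One log file per x265 invocation.
SET LOG_FILE_1=%8%1_log_1.txt
SET LOG_FILE_2=%8%1_log_2.txt
SET LOG_FILE_3=%8%1_log_3.txt
SET BITRATE=%2
REM TUNING is set but not used by the command lines below.
SET TUNING=%6
SET MCLL=%3
SET MDISPLAY=%4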

x265_x64 --asm avx512 --preset slower --vbv-maxrate 90000 --vbv-bufsize 70000 --bitrate %BITRATE% --stats %STAT_FILE% --level 5.1 --profile main10 --high-tier --level-idc 51 --hist-scenecut --fades --aq-mode 4 --aq-auto 6 --weightb --rc-lookahead 72 --tskip --tskip-fast --no-rect --me hex --subme 2 --b-intra --no-sao --deblock -1,-1 --psy-rd 2.5 --psy-rdoq 4 --multi-pass-opt-analysis --multi-pass-opt-distortion --video-signal-type-preset BT2100_PQ_YCC -D 10 --max-cll %MCLL% --master-display %MDISPLAY% --hdr10-opt --qpfile %CHAPTERS% --input %E_SRC% --pass 1 -o NUL 2> %LOG_FILE_1%
x265_x64 --asm avx512 --preset slower --vbv-maxrate 90000 --vbv-bufsize 70000 --bitrate %BITRATE% --stats %STAT_FILE% --level 5.1 --profile main10 --high-tier --level-idc 51 --hist-scenecut --fades --aq-mode 4 --aq-auto 6 --weightb --rc-lookahead 72 --tskip --tskip-fast --rect --no-amp --me umh --subme 3 --b-intra --no-sao --deblock -1,-1 --psy-rd 2.5 --psy-rdoq 4 --multi-pass-opt-analysis --multi-pass-opt-distortion --video-signal-type-preset BT2100_PQ_YCC -D 10 --max-cll %MCLL% --master-display %MDISPLAY% --hdr10-opt --qpfile %CHAPTERS% --input %E_SRC% --pass 3 -o NUL 2> %LOG_FILE_2%
x265_x64 --asm avx512 --preset slower --vbv-maxrate 90000 --vbv-bufsize 70000 --bitrate %BITRATE% --stats %STAT_FILE% --level 5.1 --profile main10 --high-tier --level-idc 51 --hist-scenecut --fades --aq-mode 4 --aq-auto 6 --weightb --rc-lookahead 72 --tskip --tskip-fast --rect --amp --b-intra --no-sao --deblock -1,-1 --psy-rd 2.5 --psy-rdoq 4 --scenecut-aware-qp 3 --multi-pass-opt-analysis --multi-pass-opt-distortion --video-signal-type-preset BT2100_PQ_YCC -D 10 --max-cll %MCLL% --master-display %MDISPLAY% --hdr10-opt --qpfile %CHAPTERS% --input %E_SRC% --pass 2 -o %E_DST% 2> %LOG_FILE_3%
Results are sometimes... unexpected.
Of course, every encode uses the same source file.

Zen4
Windows 10
Pass 1: encoded 128 frames in 45.09s (2.84 fps)
Pass 2: encoded 128 frames in 73.77s (1.74 fps)
Pass 3: encoded 128 frames in 62.94s (2.03 fps)
Windows 11
Pass 1: encoded 128 frames in 43.15s (2.97 fps)
Pass 2: encoded 128 frames in 111.11s (1.15 fps)
Pass 3: encoded 128 frames in 93.05s (1.38 fps)

Zen4s
Windows 10
Pass 1: encoded 128 frames in 44.51s (2.88 fps)
Pass 2: encoded 128 frames in 73.16s (1.75 fps)
Pass 3: encoded 128 frames in 62.09s (2.06 fps)
Windows 11
Pass 1: encoded 128 frames in 43.24s (2.96 fps)
Pass 2: encoded 128 frames in 111.24s (1.15 fps)
Pass 3: encoded 128 frames in 93.54s (1.37 fps)

Results:
- Zen4 & Zen4s have (almost) identical results.
- The first pass is a little faster on Windows 11, but Windows 11 is significantly slower on Pass 2 & 3!
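
To make the Windows 10 vs Windows 11 comparison above easier to reproduce, here is a minimal Python sketch that parses the x265 summary lines and computes the per-pass time ratio. The regular expression and the function names are my own assumptions for illustration; only the timings are taken from the Zen4 results above.

```python
import re

# Matches x265 summary lines such as "encoded 128 frames in 45.09s (2.84 fps)".
SUMMARY = re.compile(r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\)")


def parse_summary(line):
    """Return (frames, seconds, fps) from one summary line, or None if absent."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


def time_ratios(base_lines, other_lines):
    """Per-pass ratio other_time / base_time (above 1.0 means 'other' is slower)."""
    return [
        parse_summary(other)[1] / parse_summary(base)[1]
        for base, other in zip(base_lines, other_lines)
    ]


# Zen4 build timings copied from the results above.
win10 = [
    "Pass 1: encoded 128 frames in 45.09s (2.84 fps)",
    "Pass 2: encoded 128 frames in 73.77s (1.74 fps)",
    "Pass 3: encoded 128 frames in 62.94s (2.03 fps)",
]
win11 = [
    "Pass 1: encoded 128 frames in 43.15s (2.97 fps)",
    "Pass 2: encoded 128 frames in 111.11s (1.15 fps)",
    "Pass 3: encoded 128 frames in 93.05s (1.38 fps)",
]

for number, ratio in enumerate(time_ratios(win10, win11), start=1):
    print(f"Pass {number}: Windows 11 takes {ratio:.2f}x the Windows 10 time")
```

The same two helpers can be pointed at the log files written by the batch script, one line per pass.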

Zen4s without the --asm AVX512.
Windows 10
Pass 1: encoded 128 frames in 48.72s (2.63 fps) [AVX512 +9.5%]
Pass 2: encoded 128 frames in 107.07s (1.20 fps) [AVX512 +45.8%]
Pass 3: encoded 128 frames in 96.89s (1.32 fps) [AVX512 +56.1%]
Windows 11
Pass 1: encoded 128 frames in 54.68s (2.34 fps) [AVX512 +26.5%]
Pass 2: encoded 128 frames in 111.77s (1.15 fps) [AVX512 +0.0%]
Pass 3: encoded 128 frames in 94.69s (1.35 fps) [AVX512 +1.5%]

Results:
On Windows 10 the difference is large, but on Windows 11...
It's as if Windows 11 is not using AVX512 on Pass 2 & Pass 3.
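
The bracketed AVX512 figures above appear to be the fps with AVX512 divided by the fps without it, expressed as a percentage gain. A small Python sketch under that assumption (the function name is hypothetical; the fps values are the Zen4s ones from this post):

```python
def gain_percent(fps_with, fps_without):
    """Relative speed gain of the faster configuration, in percent."""
    return (fps_with / fps_without - 1.0) * 100.0


# Zen4s build: fps per pass with --asm avx512 and without it.
runs = {
    "Windows 10": ([2.88, 1.75, 2.06], [2.63, 1.20, 1.32]),
    "Windows 11": ([2.96, 1.15, 1.37], [2.34, 1.15, 1.35]),
}

for system, (with_avx512, without_avx512) in runs.items():
    for number, (fast, slow) in enumerate(zip(with_avx512, without_avx512), start=1):
        print(f"{system} pass {number}: AVX512 {gain_percent(fast, slow):+.1f}%")
```

Since the logged fps values only have two decimals, gains of a percent or two are at the limit of what this calculation can resolve.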

I noticed in the Task Manager that even though x265 creates 64 threads (it's reported in the log), the total CPU usage was under 50%.
So I tried adding --pools 32, while keeping --asm avx512.

Zen4s
Windows 10
Pass 1: encoded 128 frames in 44.25s (2.89 fps)
Pass 2: encoded 128 frames in 92.07s (1.39 fps)
Pass 3: encoded 128 frames in 76.46s (1.67 fps)

Results: A little slower (expected), but not by much.

So... I said to myself: as I have a lot of memory, starting 2 encodes at the same time with --pools 32 might be interesting.
The encodes are made from 2 identical files on 2 different HDDs.
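
For anyone wanting to repeat the two-encodes-at-once idea, here is a rough Python sketch of a launcher. It is not the script used here: the helper names and the file paths are hypothetical, and the command line is reduced to the options relevant to this test (--asm avx512 and --pools 32); the remaining options from the batch file would be passed through `extra`.

```python
import subprocess


def build_cmd(exe, src, dst, pools=32, extra=()):
    """Build one x265 command line limited to `pools` worker threads."""
    return [exe, "--asm", "avx512", "--pools", str(pools), *extra,
            "--input", src, "-o", dst]


def run_parallel(commands):
    """Start every command at once, then wait for all of them to finish."""
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


# Hypothetical paths: one source per physical disk, as in the test described above.
jobs = [
    build_cmd("x265_x64", r"D:\test\clip.avs", r"D:\test\clip.hevc"),
    build_cmd("x265_x64", r"E:\test\clip.avs", r"E:\test\clip.hevc"),
]
for job in jobs:
    print(" ".join(job))

# To actually launch both encodes: exit_codes = run_parallel(jobs)
```

Starting both processes before waiting on either is what makes the encodes overlap; waiting on each one right after starting it would run them one after the other.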

From the first test:
1 file at full speed (64 threads):
Windows 10: 181.80s => 2 encodes take 363.60s
Windows 11: 248.02s => 2 encodes take 496.04s

Now, 2 encodes run in parallel, with --pools 32 & --asm avx512.
Windows 10
File 1:
Pass 1: encoded 128 frames in 50.92s (2.51 fps)
Pass 2: encoded 128 frames in 114.36s (1.12 fps)
Pass 3: encoded 128 frames in 94.96s (1.35 fps)
=> Total of 260.24s
File 2:
Pass 1: encoded 128 frames in 51.39s (2.49 fps)
Pass 2: encoded 128 frames in 84.04s (1.52 fps)
Pass 3: encoded 128 frames in 68.29s (1.87 fps)
=> Total of 203.72s
=> 2 files encoded in 260.24s instead of 363.60s => -28%.
But at some point one encode got slower; the load was not balanced between the two files.
Windows 11
File 1:
Pass 1: encoded 128 frames in 49.73s (2.57 fps)
Pass 2: encoded 128 frames in 116.22s (1.10 fps)
Pass 3: encoded 128 frames in 96.31s (1.33 fps)
=> Total of 262.26s
File 2:
Pass 1: encoded 128 frames in 49.66s (2.58 fps)
Pass 2: encoded 128 frames in 115.14s (1.11 fps)
Pass 3: encoded 128 frames in 96.03s (1.33 fps)
=> Total of 260.83s
=> 2 files encoded in 262.26s instead of 496.04s => -47%.
The % gain is better than on Windows 10 and the load is balanced, but the end result is nevertheless the same as with Windows 10.
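
The saving figures above compare the wall time of the parallel run (the slower of the two files) with twice the single-encode total. A short Python sketch of that arithmetic, with a hypothetical function name and the per-pass times from this post:

```python
def parallel_saving(per_file_pass_times, single_total):
    """Return (wall_time, sequential_time, change_percent) for a parallel run.

    Wall time is the total of the slowest file; sequential time assumes the
    same number of files encoded one after the other at full speed.
    """
    wall = max(sum(passes) for passes in per_file_pass_times)
    sequential = len(per_file_pass_times) * single_total
    return wall, sequential, (wall / sequential - 1.0) * 100.0


win10 = parallel_saving([[50.92, 114.36, 94.96], [51.39, 84.04, 68.29]], 181.80)
win11 = parallel_saving([[49.73, 116.22, 96.31], [49.66, 115.14, 96.03]], 248.02)

for system, (wall, sequential, change) in (("Windows 10", win10), ("Windows 11", win11)):
    print(f"{system}: {wall:.2f}s instead of {sequential:.2f}s => {change:+.0f}%")
```

Using the maximum rather than the average of the two files matters on Windows 10 here, where one file finished much earlier than the other.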

If this slowdown of Pass 2 & Pass 3 between Windows 10 & Windows 11 could be explained and solved, my guess is that Windows 11 would be better than Windows 10, but for now, it's not the case.
__________________
My github.
Old 10th June 2025, 06:27   #2  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 576
VBV is non-deterministic.
Auto-AQ is non-deterministic.
The test results are subject to a high margin of error.
Old 10th June 2025, 08:42   #3  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
Non-deterministic just means that running the exact same encode twice will not produce the exact same output file; it doesn't mean the encoding time will change drastically. As I don't care about the output file, only the encoding time, I don't think this remark is relevant, and I don't agree with it.

But... as I also think the test results are relevant, this evening when I'm back home I'll run the exact same test 4 times (on both Windows 10 & Windows 11) and see whether there is a significant difference in encoding time between runs.
If there is, I was wrong; if not, you were wrong.
__________________
My github.

Last edited by jpsdr; 10th June 2025 at 08:46.
Old 10th June 2025, 08:56   #4  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 463
Quote:
Originally Posted by jpsdr
Non-deterministic just means that running the exact same encode twice will not produce the exact same output file; it doesn't mean the encoding time will change drastically.
Given your very short input file and x265's bitrate control implementation I am not so sure about this.
__________________
My github...
Old 10th June 2025, 12:05   #5  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
When I'm back home I'll see, from launching the same test several times, whether the times change.
If not, it means the tests are relevant. If they do, I'll redo the same test but duplicate the clip 10 times in the avs script, creating 1280 frames and making the encoding time 10 to 20 minutes... and redo some of the tests (in that case, probably not so many).
I must say, to save time, I hope the result will be that the time doesn't change between tests...
__________________
My github.
Old 10th June 2025, 18:46   #6  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
Back home; here are the results of the encoding time consistency test.

Windows 10
#1
Pass 1: encoded 128 frames in 44.16s (2.90 fps)
Pass 2: encoded 128 frames in 72.95s (1.75 fps)
Pass 3: encoded 128 frames in 61.92s (2.07 fps)
#2
Pass 1: encoded 128 frames in 44.20s (2.90 fps)
Pass 2: encoded 128 frames in 72.52s (1.76 fps)
Pass 3: encoded 128 frames in 62.15s (2.06 fps)
#3
Pass 1: encoded 128 frames in 44.25s (2.89 fps)
Pass 2: encoded 128 frames in 72.82s (1.76 fps)
Pass 3: encoded 128 frames in 62.12s (2.06 fps)
#4
Pass 1: encoded 128 frames in 44.43s (2.88 fps)
Pass 2: encoded 128 frames in 72.64s (1.76 fps)
Pass 3: encoded 128 frames in 62.10s (2.06 fps)
Results
Pass 1: varies from 44.16s to 44.43s => 0.6%
Pass 2: varies from 72.52s to 72.95s => 0.6%
Pass 3: varies from 61.92s to 62.15s => 0.4%

Windows 11
#1
Pass 1: encoded 128 frames in 43.13s (2.97 fps)
Pass 2: encoded 128 frames in 111.58s (1.15 fps)
Pass 3: encoded 128 frames in 93.35s (1.37 fps)
#2
Pass 1: encoded 128 frames in 44.20s (2.90 fps)
Pass 2: encoded 128 frames in 111.47s (1.15 fps)
Pass 3: encoded 128 frames in 93.70s (1.37 fps)
#3
Pass 1: encoded 128 frames in 43.11s (2.97 fps)
Pass 2: encoded 128 frames in 111.42s (1.15 fps)
Pass 3: encoded 128 frames in 93.64s (1.37 fps)
#4
Pass 1: encoded 128 frames in 43.10s (2.97 fps)
Pass 2: encoded 128 frames in 111.52s (1.15 fps)
Pass 3: encoded 128 frames in 93.88s (1.36 fps)
Results
Pass 1: varies from 43.10s to 44.20s => 2.6%
Pass 2: varies from 111.42s to 111.58s => 0.1%
Pass 3: varies from 93.35s to 93.88s => 0.6%

Obviously my results are a lot of things, but NOT subject to a high margin of error!
This confirms my statement in post #3.
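
The variation figures in this post are the gap between the slowest and fastest of the four runs, relative to the fastest. A minimal Python sketch of that calculation (the function name is my own; the times are the ones listed in this post):

```python
def spread_percent(times):
    """Gap between the slowest and fastest run, relative to the fastest, in percent."""
    return (max(times) - min(times)) / min(times) * 100.0


# Four runs per pass, as listed in this post.
runs = {
    "Windows 10": {
        "Pass 1": [44.16, 44.20, 44.25, 44.43],
        "Pass 2": [72.95, 72.52, 72.82, 72.64],
        "Pass 3": [61.92, 62.15, 62.12, 62.10],
    },
    "Windows 11": {
        "Pass 1": [43.13, 44.20, 43.11, 43.10],
        "Pass 2": [111.58, 111.47, 111.42, 111.52],
        "Pass 3": [93.35, 93.70, 93.64, 93.88],
    },
}

for system, passes in runs.items():
    for name, times in passes.items():
        print(f"{system} {name}: {min(times):.2f}s to {max(times):.2f}s "
              f"=> {spread_percent(times):.1f}%")
```

With only four samples per pass, this min-to-max spread is a rough bound on run-to-run noise rather than a proper statistical estimate.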

Nevertheless, I'll try just the 32 threads/2 simultaneous encodes case, looping my small file 10 times in the avs script (so 1280 frames), only once on each of Windows 10 and Windows 11, to see if the CPU load balance is better on a larger file.
__________________
My github.

Last edited by jpsdr; 10th June 2025 at 18:53.
Old 10th June 2025, 19:16   #7  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 576
Then, maybe the hypervisor? Windows 11 is very stubborn about getting that enabled, along with other things... for safety (not sure about that).
I mean, it's really strange to me. I think Windows 11 should still be very similar to Windows 10 (down in the kernel); without the online bloat running, how can it perform so differently?
Old 10th June 2025, 21:47   #8  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
@Z2697
What's odd is that Windows 11 is a little faster on Pass 1, but a lot slower only on Pass 2 & Pass 3. And according to the speed test without AVX512, it looks like AVX512 is disabled on Windows 11 just for Pass 2 & Pass 3. But it makes no sense... Why would AVX512 be disabled on Pass 2 & Pass 3 on Windows 11 and not on Windows 10...

=================================

Otherwise, here is the test of a 1280-frame file, AVX512, 32 threads, 2 files encoded at the same time.

Windows 10
File 1
Pass 1: encoded 1280 frames in 489.71s (2.61 fps)
Pass 2: encoded 1280 frames in 881.18s (1.45 fps)
Pass 3: encoded 1280 frames in 709.66s (1.80 fps)
=> Total of 2080.55s
File 2
Pass 1: encoded 1280 frames in 490.34s (2.61 fps)
Pass 2: encoded 1280 frames in 883.15s (1.45 fps)
Pass 3: encoded 1280 frames in 709.87s (1.80 fps)
=> Total of 2083.36s
Result: As I suspected, for this specific test the small file is not suitable for checking the CPU load balance; it's too short. With a bigger file, the result shows an excellent CPU balance, with almost identical times for each file, giving a total of 2083.36s for encoding 2 files.
If I make a quick computation based on the speed of a single encode:
Pass 1 -> 2.89fps => 442.91s
Pass 2 -> 1.75fps => 731.43s
Pass 3 -> 2.06fps => 621.36s
=> Total of 1795.70s -> 3591.40s for 2 files
Encoding time : -42%
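
The quick computation above extrapolates the single-encode time for 1280 frames from the fps measured on the 128-frame clip, then compares two sequential encodes with the parallel wall time. A Python sketch of the same steps (function and variable names are assumptions for illustration):

```python
def estimated_seconds(frames, fps):
    """Encoding time for `frames` frames at a constant `fps`."""
    return frames / fps


FRAMES = 1280
single_fps = [2.89, 1.75, 2.06]   # single encode, 64 threads, Windows 10
parallel_wall = 2083.36           # slower of the two parallel encodes

single_total = sum(estimated_seconds(FRAMES, fps) for fps in single_fps)
sequential_total = 2 * single_total
change = (parallel_wall / sequential_total - 1.0) * 100.0

print(f"Estimated single encode: {single_total:.2f}s")
print(f"Two sequential encodes:  {sequential_total:.2f}s")
print(f"Parallel run:            {parallel_wall:.2f}s ({change:+.0f}%)")
```

The estimate assumes the fps measured on the short clip holds for the longer one, which is reasonable here because the long clip is the same 128 frames looped 10 times.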

Windows 11
File 1
Pass 1: encoded 1280 frames in 437.46s (2.93 fps)
Pass 2: encoded 1280 frames in 1065.31s (1.20 fps)
Pass 3: encoded 1280 frames in 938.72s (1.36 fps)
=> Total of 2441.49s
File 2
Pass 1: encoded 1280 frames in 578.49s (2.21 fps)
Pass 2: encoded 1280 frames in 1028.20s (1.24 fps)
Pass 3: encoded 1280 frames in 897.66s (1.43 fps)
=> Total of 2504.35s
Result: CPU balance is good, but not as good as on Windows 10, and the total time is longer.

Winner : Windows 10
At least for now...

For the record, I have a 'tuned' install of Windows 11 with a lot of crap disabled by default; I also disabled a lot of things I'm not using, like the firewall, Defender and a lot of network services, as the PC is totally offline.
Also for the record, I ran "Windows Update" on both of them when I installed the OSes last weekend, before taking them totally offline, so they are, normally, "up to date".

@Z2697
I've checked on my Windows 11: Hyper-V is totally disabled in Programs and Features.
And also on my Windows 10, after checking.
__________________
My github.

Last edited by jpsdr; 10th June 2025 at 21:58.
Old 11th June 2025, 08:44   #9  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
I just thought this morning of a test i'll do this evening when back home.
I've tested with an LLVM build, i'll test with a GCC build.
Shouldn't change things, in theory, but at this point...
__________________
My github.
Old 11th June 2025, 16:18   #10  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 576
Some security settings will enable the hypervisor regardless of the checkboxes in the features dialog.
In fact, I can't find a sane way to disable the hypervisor in Windows 11 24H2 (of course, disabling SVM in the BIOS does the trick).
You can run msinfo32 to check if the hypervisor is running (it will say a hypervisor is detected in the bottom row).

Although the virtualization should be pretty efficient, there may still be some edge cases.
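
As a scriptable alternative to looking at msinfo32, the check can be sketched in Python by scanning the text output of the `systeminfo` command. This is an assumption on my part, not something tested in this thread: it relies on an English-language Windows printing "A hypervisor has been detected" in the "Hyper-V Requirements" line, and the function names are hypothetical.

```python
import subprocess

# Phrase assumed to appear in English-language systeminfo output
# when Windows itself is running on top of a hypervisor.
MARKER = "a hypervisor has been detected"


def hypervisor_reported(systeminfo_text):
    """True if the given systeminfo output mentions a detected hypervisor."""
    return MARKER in systeminfo_text.lower()


def query_systeminfo():
    """Run systeminfo (Windows only) and return its text output."""
    result = subprocess.run(["systeminfo"], capture_output=True, text=True, check=True)
    return result.stdout


# On the test machine: print(hypervisor_reported(query_systeminfo()))
```

On a localized Windows the phrase will differ, so the marker would need adjusting; msinfo32 remains the reference check.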

Last edited by Z2697; 11th June 2025 at 16:24.
Old 11th June 2025, 19:00   #11  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
So...
I've deactivated SVM in the BIOS.
I've followed all the guides to disable Hyper-V in Windows.
Result: encode speed is a little faster, but no change in the fact that Pass 2 & 3 are a lot slower on Windows 11 than on Windows 10...

Also, no speed difference between LLVM and GCC builds.

Also, if you know how to permanently disable Defender in Windows 11, I'll take it!!!
I've been able, it seems, to do it under Windows 10, but Windows 11 is...
__________________
My github.
Old 11th June 2025, 19:22   #12  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
For now, the only clue i see is this result:
Windows 11
Pass 1: encoded 128 frames in 54.68s (2.34 fps) [AVX512 +26.5%]
Pass 2: encoded 128 frames in 111.77s (1.15 fps) [AVX512 +0.0%]
Pass 3: encoded 128 frames in 94.69s (1.35 fps) [AVX512 +1.5%]

I don't know how it could be possible, but "Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth".
So, however improbable (but not impossible) it may be, the only conclusion I have for now is that something in the code prevents the use of the AVX512 path specifically under Windows 11 and not Windows 10, linked to one of the settings I use in Pass 2 and Pass 3.

I don't know if people from MulticoreWare read this forum and could possibly provide their insights.
__________________
My github.

Last edited by jpsdr; 11th June 2025 at 19:27.
Old 12th June 2025, 08:56   #13  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
I have a little time, so i post results with Hyper-V deactivated:

Windows 10
Pass 1: encoded 128 frames in 43.33s (2.95 fps) => +4.0%
Pass 2: encoded 128 frames in 70.98s (1.80 fps) => +3.9%
Pass 3: encoded 128 frames in 60.47s (2.12 fps) => +4.1%

The speed increase is very stable; small, but every little bit helps. The value is just high enough not to be considered "noise".

Windows 11
Pass 1: encoded 128 frames in 42.84s (2.99 fps) => +0.7%
Pass 2: encoded 128 frames in 110.45s (1.16 fps) => +0.6%
Pass 3: encoded 128 frames in 92.55s (1.38 fps) => +0.5%

Speed increase is also very stable, but... so small that it can be "noise".
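
One way to put the "noise" question in numbers is to compare each gain with the run-to-run spread measured in post #6. The Python sketch below does that under two assumptions of mine: the baseline times are the Zen4 timings from post #1 (they reproduce the percentages above to within rounding), and the min-to-max spread from post #6 is used as the noise level. The function names are hypothetical.

```python
def time_gain_percent(before_s, after_s):
    """Speed gain in percent when the time drops from before_s to after_s."""
    return (before_s / after_s - 1.0) * 100.0


def compare_to_noise(before, after, noise_percent):
    """Return (gain, above_noise) per pass, where noise is the measured spread."""
    rows = []
    for old, new, noise in zip(before, after, noise_percent):
        gain = time_gain_percent(old, new)
        rows.append((round(gain, 2), gain > noise))
    return rows


# Baselines from post #1 (Zen4 build), Hyper-V-off timings from this post,
# spreads from post #6.
win10 = compare_to_noise([45.09, 73.77, 62.94], [43.33, 70.98, 60.47], [0.6, 0.6, 0.4])
win11 = compare_to_noise([43.15, 111.11, 93.05], [42.84, 110.45, 92.55], [2.6, 0.1, 0.6])

for system, rows in (("Windows 10", win10), ("Windows 11", win11)):
    for number, (gain, above) in enumerate(rows, start=1):
        verdict = "above" if above else "within"
        print(f"{system} pass {number}: +{gain}% ({verdict} the measured spread)")
```

This is only a rough comparison, since both the gains and the spreads come from a handful of runs.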
__________________
My github.

Last edited by jpsdr; 12th June 2025 at 13:34.
Old 12th June 2025, 13:56   #14  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
@Z2697
Haven't tried them yet, but these look interesting:
https://winaerotweaker.com/
https://github.com/ionuttbara/windows-defender-remover
Also found (still not tested) this :
https://github.com/TairikuOokami/Win...%20Disable.bat
__________________
My github.

Last edited by jpsdr; 12th June 2025 at 14:28.
Old Yesterday, 11:29   #15  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
GCC mcf version is supposed to have a Windows 10 optimised threading model, so, i've tested GCC mcf build vs LLVM build.

AVX512, 64 threads.

Windows 10 - LLVM
Pass 1: encoded 128 frames in 43.30s (2.96 fps)
Pass 2: encoded 128 frames in 70.82s (1.81 fps)
Pass 3: encoded 128 frames in 60.53s (2.11 fps)
Windows 10 - GCC mcf
Pass 1: encoded 128 frames in 44.20s (2.90 fps) => -2.0%
Pass 2: encoded 128 frames in 72.88s (1.76 fps) => -2.8%
Pass 3: encoded 128 frames in 61.88s (2.07 fps) => -2.2%

LLVM wins.

Windows 11 - LLVM
Pass 1: encoded 128 frames in 42.80s (2.99 fps)
Pass 2: encoded 128 frames in 111.23s (1.15 fps)
Pass 3: encoded 128 frames in 93.47s (1.37 fps)
Windows 11 - GCC mcf
Pass 1: encoded 128 frames in 43.97s (2.91 fps) => -2.7%
Pass 2: encoded 128 frames in 112.88s (1.13 fps) => -1.5%
Pass 3: encoded 128 frames in 95.16s (1.35 fps) => -1.8%

LLVM wins.

Around 2% is not a big deal, but as I said, every little bit helps.
__________________
My github.
Old Yesterday, 15:15   #16  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Finland
Posts: 5,822
You'll want to try znver2 instead of znver4 and enable AVX512 separately if that's possible. Znver3 and znver4 are both broken in LLVM and actually produce slower binaries than znver2.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old Yesterday, 16:13   #17  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 576
Quote:
Originally Posted by Boulder
You'll want to try znver2 instead of znver4 and enable AVX512 separately if that's possible. Znver3 and znver4 are both broken in LLVM and actually produce slower binaries than znver2.
While that's true, the CPU arch flags in the compiler make almost no difference in x265.
Old Yesterday, 16:26   #18  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 576
Can you try Linux?
Old Yesterday, 20:24   #19  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,450
No, can't try Linux.

I'll try znver2 with the AVX512 compile options enabled when I have time, but even if broken, LLVM is still faster than GCC.
__________________
My github.

Last edited by jpsdr; Yesterday at 20:26.
Old Today, 00:43   #20  |  Link
RanmaCanada
Registered User
 
Join Date: May 2009
Posts: 347
I am sure many would like to see you run the advanced benchmark sagittare created..

https://forum.doom9.org/showthread.php?t=185855