Old 13th March 2020, 20:01   #7441  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Posts: 157
Hi all,

I'm trying to optimize encoding speed on my Ryzen 3990X (64 cores, SMT disabled) under 64-bit Windows 10.

When encoding a UHD file at 23.976 fps, I only reach 55-58% CPU usage.

I tried adding the "--numa-pools=64" setting, but there's no change.

I'm currently using the 3.2+38-fdbd4e4 build (VS 2019 / AVX2).

Do you have any idea how to speed up encoding?

Thank you.
Old 13th March 2020, 20:17   #7442  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
1) That's not the right switch. You'd want --pools 64, but x265 should generate 64 threads automatically if the OS sees 64 logical processors.

2) It depends where the bottleneck is. Your source could be unable to feed frames to the encoder fast enough.

3) Split the encode into pieces on a scene change and run them simultaneously.
Old 13th March 2020, 20:35   #7443  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,810
Disabling SMT is an ultra stupid idea because you lose a lot of performance! Up to 40%. The only good idea is to split the video into chunks and encode them simultaneously.
Old 13th March 2020, 20:48   #7444  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
Quote:
Originally Posted by Atak_Snajpera View Post
Disabling SMT is an ultra stupid idea because you lose a lot of performance! Up to 40%. The only good idea is to split the video into chunks and encode them simultaneously.
Windows 10 generally doesn't support more than 64 logical cores per NUMA node, so disabling SMT is the best way to get the most performance out of a 3990 under Windows 10 unless he uses Windows 10 Enterprise which can use all 128. Using 64 real cores is better than 64 logical cores across 32 real cores.
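The arithmetic behind that last sentence can be sketched as follows. This is a simplified model, assuming Windows places at most 64 logical processors in one processor group, that the processors are split evenly between groups, and that a single x265 instance stays inside one group. The function name is hypothetical and is not part of x265 or any Windows API.

```python
import math

MAX_LOGICAL_PER_GROUP = 64  # Windows processor group size limit


def cores_reachable(physical_cores: int, smt_enabled: bool) -> int:
    """Physical cores a single-group process can use (simplified model)."""
    threads_per_core = 2 if smt_enabled else 1
    logical = physical_cores * threads_per_core
    groups = math.ceil(logical / MAX_LOGICAL_PER_GROUP)
    logical_per_group = logical // groups  # assumption: even split across groups
    return logical_per_group // threads_per_core


print(cores_reachable(64, smt_enabled=True))   # 32
print(cores_reachable(64, smt_enabled=False))  # 64
```

Under these assumptions, a 64-core part with SMT enabled leaves one instance with 64 logical processors backed by 32 physical cores, while disabling SMT gives the same instance all 64 physical cores.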
Old 13th March 2020, 21:08   #7445  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Posts: 157
Exactly, Stereodude.

I use Windows 10 LTSC, so I get better performance with SMT disabled.
Old 14th March 2020, 00:55   #7446  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Posts: 157
Interesting details:

I get slightly better performance on Linux (CentOS 8.1), but CPU usage is still far from 100%.

However, I ran this x265 benchmark: http://www.xin.at/x265/index-en.php

It took 00:40:09.644 on Linux and 01:14:24.446 on Windows.
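To put the two times in proportion, here is a small sketch that converts them to seconds and compares them. It reads the Linux figure as 00:40:09.644 (hours:minutes:seconds.milliseconds); the helper name is just for illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert an hh:mm:ss.ms string to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


linux = to_seconds("00:40:09.644")
windows = to_seconds("01:14:24.446")
ratio = windows / linux

print(f"Windows took {ratio:.2f}x as long as Linux")  # Windows took 1.85x as long as Linux
```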

Last edited by Lucius Snow; 14th March 2020 at 01:02.
Old 14th March 2020, 03:15   #7447  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,810
Quote:
Originally Posted by Stereodude View Post
Windows 10 generally doesn't support more than 64 logical cores per NUMA node, so disabling SMT is the best way to get the most performance out of a 3990 under Windows 10 unless he uses Windows 10 Enterprise which can use all 128. Using 64 real cores is better than 64 logical cores across 32 real cores.
For video encoding, the limit of 64 logical CPUs per CPU group is not a problem. You just have to run additional x265 instances. Problem solved! Disabling SMT is a bizarre workaround for low CPU usage by the video encoder.
Old 14th March 2020, 09:41   #7448  |  Link
MeteorRain
結城有紀
 
Join Date: Dec 2003
Location: NJ; OR; Shanghai
Posts: 894
Quote:
Originally Posted by Atak_Snajpera View Post
Up to 40%.
Do you have benchmarks to support that? I thought we'd lose at most 20% performance by not using SMT but that's just my wild guess. Would love to see actual numbers showing the difference.
Old 14th March 2020, 14:12   #7449  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
What is an exception code of 0xc0000005 in x265?
Code:
Faulting application name: x265.exe, version: 3.3.0.1, time stamp: 0x5e4bfd9a
Faulting module name: x265.exe, version: 3.3.0.1, time stamp: 0x5e4bfd9a
Exception code: 0xc0000005
Fault offset: 0x00000000004fb4d2
Faulting process id: 0x23a4
Faulting application start time: 0x01d5f9aa83090eef
Faulting application path: C:\HDTV Tools\x265\x265.exe
Faulting module path: C:\HDTV Tools\x265\x265.exe
Report Id: 921cdb5a-b329-4dac-b876-d60ff8a06c73
I made a few changes to my x265 command line and have since gotten the same exception code twice in 10 hours, mid-encode, on two different encode segments (out of 8 simultaneous encodes). I had never seen it in many days of simultaneous encoding on the same system with the same version of x265. Frankly, I've never seen this error from any version of x265, on any system, with any combination of command line switches.

I changed from this:
Code:
START "Enc #6" /NORMAL /NODE 0 /AFFINITY 00000F00 "x265.exe" --pools 4 -F 1 --crf 16.0 -p veryslow --no-sao --aq-mode 1 --aq-strength 1.15 --vbv-maxrate 25000 --vbv-bufsize 25000 --level 5.0 --keyint 120 --open-gop -D 10 --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 --qpfile 6.chp -o "out_6.265" "in_6.avs"
to this:
Code:
START "Enc #6" /NORMAL /NODE 0 /AFFINITY 00000F00 "x265.exe" --pools 4 -F 1 --crf 16.0 -p veryslow --aq-strength 1.15 --vbv-maxrate 25000 --vbv-bufsize 25000 --level 5.0 --keyint 120 --open-gop -D 10 --colorprim "bt709" --transfer "bt709" --colormatrix "bt709" --sar 1:1 --qpfile 6.chp -o "out_6.265" "in_6.avs"
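The two command lines above are long enough that the change is easy to miss. A minimal sketch for spotting it: tokenize both and compare the token counts. Only the leading part of each argument list is copied here, on the assumption that the remainder is identical in both.

```python
import shlex
from collections import Counter

# Leading part of each x265 argument list; the rest is assumed identical.
old = "--pools 4 -F 1 --crf 16.0 -p veryslow --no-sao --aq-mode 1 --aq-strength 1.15"
new = "--pools 4 -F 1 --crf 16.0 -p veryslow --aq-strength 1.15"

old_tokens = Counter(shlex.split(old))
new_tokens = Counter(shlex.split(new))

removed = old_tokens - new_tokens  # tokens only in the old command line
added = new_tokens - old_tokens    # tokens only in the new command line

print("removed:", sorted(removed.elements()))  # removed: ['--aq-mode', '--no-sao', '1']
print("added:", sorted(added.elements()))      # added: []
```

In other words, the only change is that `--no-sao` and `--aq-mode 1` were dropped.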
The video has the following properties:
Code:
AVSMeter 2.9.8 (x64), 2012-2020, (c) Groucho2004
AviSynth+ 3.4 (r2925, master, x86_64) (3.4.0.0)

Number of frames:                30565
Length (hh:mm:ss.ms):     00:21:14.815
Frame width:                      1920
Frame height:                     1080
Framerate:                      23.976 (24000/1001)
Colorspace:                  YUV420P10
I restarted the first of the two segments that crashed and it has made it past the prior crash point.

I'm using HolyWu's build if that matters.

Last edited by Stereodude; 14th March 2020 at 14:42.
Old 14th March 2020, 14:20   #7450  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
Quote:
Originally Posted by Atak_Snajpera View Post
For video encoding, the limit of 64 logical CPUs per CPU group is not a problem. You just have to run additional x265 instances. Problem solved! Disabling SMT is a bizarre workaround for low CPU usage by the video encoder.
*sigh*
Disabling SMT is how you optimize the performance of an application that will only run on a single NUMA node with a processor like a 3990. The FPS of his single encode with SMT disabled from his single x265 instance is significantly higher than it would be if SMT was enabled. I'd estimate 70-80%.

Is there an alternative workaround for maximizing x265 performance by running multiple simultaneous encodes? Yes.

Last edited by Stereodude; 14th March 2020 at 14:22.
Old 14th March 2020, 18:36   #7451  |  Link
Lucius Snow
Registered User
 
Join Date: Oct 2003
Posts: 157
I can't split the video source file so I don't think running multiple instances would help :\
Old 14th March 2020, 20:37   #7452  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
Quote:
Originally Posted by Lucius Snow View Post
I can't split the video source file so I don't think running multiple instances would help :\
What file format is the video source? You don't have to split the source video file. You only have to split the encoding of the source video file.

By the way, you could just encode two different video source files at the same time.
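One way to split the encoding without touching the source file is to give each x265 instance its own frame range via the `--seek` and `--frames` options. The sketch below only builds the command lines. The frame count, cut points, and file names are hypothetical, all other encoder settings are omitted, and joining the resulting streams afterwards is not shown.

```python
def chunk_commands(total_frames, cut_points, source="in.avs"):
    """Build one x265 command per segment between consecutive cut points."""
    cuts = sorted(cut_points)
    starts = [0] + cuts
    ends = cuts + [total_frames]
    commands = []
    for index, (start, end) in enumerate(zip(starts, ends), 1):
        commands.append([
            "x265",
            "--seek", str(start),         # first frame of this segment
            "--frames", str(end - start),  # number of frames to encode
            "--input", source,
            "-o", f"out_{index}.265",
        ])
    return commands


# Hypothetical 1000-frame source with scene changes at frames 250 and 600.
cmds = chunk_commands(1000, [250, 600])
for cmd in cmds:
    print(" ".join(cmd))
```

Each command reads the same source, so the instances can run at the same time on different frame ranges.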

Last edited by Stereodude; 14th March 2020 at 22:32.
Old 15th March 2020, 08:06   #7453  |  Link
LazyNcoder
Registered User
 
Join Date: Feb 2015
Posts: 33
Hey guys, is there any good, free tool to extract HDR10+ metadata as JSON and reuse it with x265?
I used quietvoid's hdr10plus_parser tool, but it gives me this error:
Quote:
Reading parsed dynamic metadata... thread 'main' panicked at 'assertion failed: `(left == right)`
left: `10`,
right: `9`', src/hdr10plus.rs:324:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Tried it on several sources. Any other tool around here?
Old 15th March 2020, 17:30   #7454  |  Link
quietvoid
Registered User
 
Join Date: Jan 2019
Location: Canada
Posts: 574
Quote:
Originally Posted by LazyNcoder View Post
Hey guys, is there any good, free tool to extract HDR10+ metadata as JSON and reuse it with x265?
I used quietvoid's hdr10plus_parser tool, but it gives me this error:


Tried it on several sources. Any other tool around here?
Make sure to update to 0.2.7. If that doesn't work either, I'll need more info.
Old 15th March 2020, 18:15   #7455  |  Link
LazyNcoder
Registered User
 
Join Date: Feb 2015
Posts: 33
Quote:
Originally Posted by quietvoid View Post
Make sure to update to 0.2.7. If that doesn't work either, I'll need more info.
Yes, I tried it with 0.2.5, 0.2.6, and 0.2.7; the result was the same every time.
What info do you need? Let me know. I wanted to PM you about it but seems like it's not available.
Old 15th March 2020, 18:34   #7456  |  Link
quietvoid
Registered User
 
Join Date: Jan 2019
Location: Canada
Posts: 574
Quote:
Originally Posted by LazyNcoder View Post
Yes, I tried it with 0.2.5, 0.2.6, and 0.2.7; the result was the same every time.
What info do you need? Let me know. I wanted to PM you about it but seems like it's not available.
I just need a sample (or title) of what you're trying to parse; it's hard to test everything. You can open an issue on GitHub or PM me here.
Old 16th March 2020, 10:42   #7457  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,779
@Stereodude: I used to believe that the more threads are working on the same video, the smaller the scope of each thread gets, the less efficient the search for redundant areas will be, which will limit quality. Is that no concern for you?
Old 16th March 2020, 13:28   #7458  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
Quote:
Originally Posted by LigH View Post
@Stereodude: I used to believe that the more threads are working on the same video, the smaller the scope of each thread gets, the less efficient the search for redundant areas will be, which will limit quality. Is that no concern for you?
Why do you say that?

I've been using --pools 4 -F 1 for my encodes. Mostly because in prior testing I saw a noticeable quality improvement limiting the simultaneous frames to 1. I limit the pools to 4 mostly because when I set frames to 1 it only uses about 4 threads worth of CPU.
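With each encode limited to about 4 threads, several encodes can be pinned to separate blocks of 4 CPUs, as the `START ... /AFFINITY 00000F00` command lines earlier in the thread do. Here is a minimal sketch of how such hex masks can be derived. The job numbering and layout are hypothetical, and the x265 arguments are abbreviated.

```python
def affinity_mask(first_cpu: int, cpu_count: int) -> str:
    """Hex mask with cpu_count consecutive bits set, starting at first_cpu."""
    mask = ((1 << cpu_count) - 1) << first_cpu
    return f"{mask:08X}"


# Hypothetical layout: four encodes, each on its own block of 4 CPUs.
for job in range(4):
    mask = affinity_mask(first_cpu=job * 4, cpu_count=4)
    print(f'START "Enc #{job + 1}" /NORMAL /NODE 0 /AFFINITY {mask} "x265.exe" --pools 4 -F 1 ...')
```

For example, `affinity_mask(8, 4)` yields `00000F00`, which selects CPUs 8 through 11.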

I'm going to retest whether -F 1 is still necessary with the latest builds, since they seem to have considerable image quality improvements that are leading me to rather different conclusions about my preferred settings vs. the last time I tested, over a year ago. I find AQ2 w/ SAO left enabled has a very pleasing look now, whereas I previously thought it was terrible. Maybe it's just me...

Last edited by Stereodude; 16th March 2020 at 13:32.
Old 16th March 2020, 16:29   #7459  |  Link
fauxreaper
Registered User
 
Join Date: Oct 2014
Posts: 23
With fewer frame-threads and no-wpp, the encode has better quality and uses less bitrate. In my tests, disabling WPP has a bigger effect on quality and bitrate than decreasing frame-threads.
Old 16th March 2020, 16:40   #7460  |  Link
Stereodude
Registered User
 
Join Date: Dec 2002
Location: Region 0
Posts: 1,436
Where can I get another 64-bit Windows build of 3.3+1-f94b0d32737d? I already have a HolyWu build made with Clang 9.0.0 and want to compare a behavior I see to a different build of the same x265 version.