Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.
23rd November 2019, 16:25 | #1 | Link |
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 363
|
X265 slow encoding
Hi, I've been doing a few encodes lately on my lab encoder and I'm getting abysmal encoding speeds, which seems connected to ffmpeg+x265 not using all the CPU capacity:
HW: Dual Intel Silver 4118 (20 cores + 20 HT cores), RHEL 7.7
Source file: a 422 HQ ProRes file
ffmpeg: the latest git pull from yesterday
Result: encoding at 1 fps, speed 0.04x
The machine has a load of 2 (40 cores..??); almost all CPU cores are idle.
Code:
./ffmpeg -loglevel verbose -i file_p25.mov -strict -1 -vf format=yuv420p10 -codec:v libx265 -x265-params keyint=100:min-keyint=100:no-open-gop=1 -level 4.1 -preset veryslow -crf 16 -profile:v main10 -y xtemp3_P6slow_nolimit_max.ts
Anyone know what's going on? |
23rd November 2019, 16:32 | #2 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,815
|
Add --ctu 16 or just use RipBot264 in distributed encoding mode
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
24th November 2019, 00:50 | #4 | Link | |
Registered User
Join Date: Apr 2018
Posts: 61
|
Quote:
|
|
19th December 2019, 08:40 | #5 | Link | |
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 363
|
Quote:
Code:
./ffmpeg -loglevel verbose -i ARCHIVE.mov -strict -1 -vf format=yuv420p10 -codec:v libx265 -x265-params keyint=100:min-keyint=100:no-open-gop=1:pmode=1 -level 4.1 -preset veryslow -crf 16 -profile:v main10 -y test-veryslow.ts
With and without PMODE (given the syntax is correct) I'm getting 0.2x (ca. 14% system usage).
From the log:
x265 [info]: HEVC encoder version 3.2+2-82a66ce12955
x265 [info]: build info [Linux][GCC 6.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 64 threads
x265 [info]: Thread pool created using 64 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 5 / wpp(17 rows)+pmode
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 57 / 4 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 100 / 100 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 40 / 8 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-16.0 / 0.60
x265 [info]: tools: rect amp rd=6 psy-rd=2.00 rdoq=2 psy-rdoq=1.00 rskip
x265 [info]: tools: signhide tmvp b-intra strong-intra-smoothing deblock sao
Last edited by TEB; 19th December 2019 at 09:04. |
|
19th December 2019, 13:51 | #6 | Link |
Registered User
Join Date: Oct 2014
Posts: 476
|
Have you tried not using ffmpeg? ffmpeg is useful because it can do anything, but it doesn't do anything well.
Using MeGUI on Windows (calling the x265 binary) I have no problem hitting 100% load on my 8-threaded i7 4790 at --preset veryslow on a 1080p Blu-ray. Though interestingly I barely exceed 1 fps in 10 bit. Have processors come so far that my 3.8 GHz 4/8 Haswell can nearly be equaled by two cores of a 3 GHz Xeon? Passmark shows that Xeon as being reasonably behind my i7 single-threaded (70% of the speed).
Last edited by kuchikirukia; 19th December 2019 at 14:11. |
21st December 2019, 10:43 | #7 | Link | |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
Quote:
|
|
21st December 2019, 20:53 | #8 | Link | |
Lost my old account :(
Join Date: Jul 2017
Posts: 324
|
Quote:
And tbh veryslow is literally very slow, to the point where it's almost unusable (especially after the latest preset changes). I would say that 'slower' is the lowest "usable" preset atm.
For reference, this is what I get with a Xeon Gold 6126 (12C/24T):
--preset veryslow: 0.8 fps (25-40% utilization)
--preset slower --ctu 32 --merange 26: 3 fps (100% utilization)
That is almost a 4x speed increase.
Edit: Also keep in mind that the source has a large effect on speed; you cannot do a direct comparison without using the same files.
Last edited by excellentswordfight; 21st December 2019 at 21:06. |
|
22nd December 2019, 08:17 | #9 | Link | |
Registered User
Join Date: Oct 2014
Posts: 476
|
Quote:
While it doesn't look like he's going to see anything close to a 4x speedup if he fixes his threading issue, a reasonable gain may still turn out to be the difference between running one to two encodes on each CPU and running five on each. |
|
22nd December 2019, 09:43 | #10 | Link | |
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 363
|
Quote:
Mind explaining what --ctu 32 and --merange 26 mean? |
|
22nd December 2019, 09:44 | #11 | Link | |
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 363
|
Quote:
|
|
22nd December 2019, 17:29 | #12 | Link | |
Lost my old account :(
Join Date: Jul 2017
Posts: 324
|
--ctu specifies the maximum CU size. The default value (64) is rather large and is mostly beneficial for high-res (UHD) material, and it has a large effect on parallelism at lower resolutions. It can be reduced for greater parallelism without any big effect on compression. I usually leave it at 64 for 1080p and go for 32 at 720p and below, but if you want to use more threads while still running a single encode, this is one of the key parameters.
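To put rough numbers on the parallelism point: with wavefront parallel processing (WPP) each CTU row is a unit of work, so the row count is an upper bound on how many rows of one frame can be worked on at once. A minimal sketch, assuming a 1080-line source (the thread does not state the resolution, but the 17 and 34 rows reported in the logs are consistent with it); `wpp_rows` is a hypothetical helper, not part of x265:

```shell
#!/bin/sh
# Hypothetical helper: number of CTU rows for a given frame height and CTU size.
# rows = ceil(height / ctu), computed with integer arithmetic.
wpp_rows() {
    height=$1
    ctu=$2
    echo $(( (height + ctu - 1) / ctu ))
}

# Assumed 1080-line source; 64, 32 and 16 are the CTU sizes mentioned in this thread.
for ctu in 64 32 16; do
    echo "ctu=$ctu -> $(wpp_rows 1080 "$ctu") rows"
done
```

Halving the CTU size doubles the number of rows, which is why it gives the thread pool more to do on a single encode.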
--merange sets the motion search range; the default value (57) is calculated based on the default CTU value of 64. The doc explains it rather well: Quote:
Last edited by excellentswordfight; 22nd December 2019 at 17:44. |
|
23rd December 2019, 05:30 | #13 | Link | ||
Registered User
Join Date: Oct 2014
Posts: 476
|
Quote:
Quote:
While the veryslow preset doesn't scale out to the 40 threads of your system, it should be able to do 8, and my guess as to why you can't hit that would be an issue with your ffmpeg build. |
||
23rd December 2019, 13:59 | #14 | Link | |
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 363
|
Quote:
Code:
./ffmpeg -loglevel verbose -i ARCHIVE.mov -strict -1 -vf format=yuv420p10 -codec:v libx265 -x265-params keyint=100:min-keyint=100:no-open-gop=1:pmode=1:ctu=32:merange:26 -level 4.1 -preset veryslow -crf 16 -profile:v main10 -y test-veryslow.ts
Code:
x265 [info]: HEVC encoder version 3.2+2-82a66ce12955
x265 [info]: build info [Linux][GCC 6.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 64 threads
x265 [info]: Thread pool created using 64 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 5 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 57 / 4 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 40 / 8 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-16.0 / 0.60
x265 [info]: tools: rect amp rd=6 psy-rd=2.00 rdoq=2 psy-rdoq=1.00 rskip
x265 [info]: tools: signhide tmvp b-intra strong-intra-smoothing deblock sao
[mpegts @ 0x793f4c0] service 1 using PCR in pid=256, pcr_period=83ms
[mpegts @ 0x793f4c0] muxrate VBR, sdt every 500 ms, pat/pmt every 100 ms |
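The log in this post still shows the defaults (max CU size 64, ME range 57, keyframe min/max 23/250, no pmode), which suggests that none of the -x265-params string was applied. One visible difference from the commands that did take effect is `merange:26`, where `merange=26` was presumably intended. A small sketch of a pre-flight check, assuming every entry is meant to be a colon-separated `key=value` pair as in the other commands in this thread; `check_x265_params` is a hypothetical helper, not an ffmpeg or x265 feature:

```shell
#!/bin/sh
# Hypothetical helper: flag any colon-separated entry that is not key=value.
check_x265_params() {
    params=$1
    status=0
    old_ifs=$IFS
    IFS=:                      # split the parameter string on colons
    for token in $params; do
        case $token in
            ?*=?*) ;;          # looks like key=value, accept
            *) echo "suspicious entry: '$token'" >&2; status=1 ;;
        esac
    done
    IFS=$old_ifs
    return $status
}

# The string from this post: "merange" and "26" are both reported.
check_x265_params "keyint=100:min-keyint=100:no-open-gop=1:pmode=1:ctu=32:merange:26" \
    || echo "fix the parameter string before encoding"
```

Comparing the x265 [info] lines against the requested values, as done above, remains the more reliable check that the settings actually reached the encoder.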
|
23rd December 2019, 23:13 | #15 | Link | |
Angel of Night
Join Date: Nov 2004
Location: Tangled in the silks
Posts: 9,559
|
Quote:
|
|
24th December 2019, 09:51 | #16 | Link | ||
Lost my old account :(
Join Date: Jul 2017
Posts: 324
|
Quote:
Quote:
Last edited by excellentswordfight; 24th December 2019 at 09:53. |
||
24th December 2019, 14:23 | #18 | Link |
Registered User
Join Date: Feb 2003
Location: Palmcoast of Norway
Posts: 363
|
UPDATE:
Code:
x265 [info]: HEVC encoder version 3.2+2-82a66ce12955
x265 [info]: build info [Linux][GCC 6.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 64 threads
x265 [info]: Thread pool created using 64 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 5 / wpp(34 rows)+pmode
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 26 / 4 / 5
x265 [info]: Keyframe min / max / scenecut / bias: 100 / 100 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 40 / 8 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-16.0 / 0.60
x265 [info]: tools: rect amp rd=6 psy-rd=2.00 rdoq=2 psy-rdoq=1.00 rskip
x265 [info]: tools: signhide tmvp b-intra strong-intra-smoothing deblock sao
Code:
./ffmpeg -loglevel verbose -i ARCHIVE.mov -strict -1 -vf format=yuv420p10 -codec:v libx265 -x265-params keyint=100:min-keyint=100:no-open-gop=1:pmode=1:ctu=32:merange=26 -level 4.1 -preset veryslow -crf 16 -profile:v main10 -y test-veryslow.ts
FPS encoding: 7 fps
A load of 21 on a 128-core CPU is a tad low.
Any more tips to improve it without moving to lower-quality presets?
TEST1: I tested the medium preset for the fun of it, but I still had a load of about 25 and ca. 44 fps. So in other words, higher framerate, but the load isn't all that great.
TEST2: I spawned 4 encoding instances like the one above in veryslow mode and I got a load of ca. 97.
Last edited by TEB; 24th December 2019 at 14:36. |
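TEST2 suggests that several concurrent encodes fill the machine better than a single one. A minimal sketch of launching a batch in parallel, assuming the inputs are independent files that should all get the settings from the command in this post; `encode_batch`, the `ENCODER` variable and the output naming are hypothetical and only for illustration:

```shell
#!/bin/sh
# Encoder binary; override (e.g. ENCODER=echo) for a dry run that only prints arguments.
ENCODER=${ENCODER:-./ffmpeg}

# x265 settings copied from the command in this post.
X265_PARAMS="keyint=100:min-keyint=100:no-open-gop=1:pmode=1:ctu=32:merange=26"

# Hypothetical helper: start one background encode per input file, then wait for all.
encode_batch() {
    for src in "$@"; do
        out="${src%.*}_x265.ts"   # assumed output naming: input name minus extension
        "$ENCODER" -i "$src" -strict -1 -vf format=yuv420p10 \
            -codec:v libx265 -x265-params "$X265_PARAMS" \
            -level 4.1 -preset veryslow -crf 16 -profile:v main10 \
            -y "$out" &
    done
    wait   # block until every background encode has exited
}

# Example: four instances, as in TEST2 (file names are placeholders).
# encode_batch a.mov b.mov c.mov d.mov
```

This trades latency per file for total throughput, so it only helps when there is more than one file to encode.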