22nd July 2017, 17:26 | #5461 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 480
|
x265 v2.5+4-01a981f509ea (GCC 7.1.0, 32 & 64-bit 8/10/12bit Multilib Windows Binaries)
x265 [info]: HEVC encoder version 2.5+4-01a981f509ea
x265 [info]: build info [Windows][GCC 7.1.0][32/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default |
24th July 2017, 09:51 | #5463 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
|
x265_2.5+6-d11482e5fedb (merge with stable)
fixes two memory leaks (threading, HDR10+), improves encoder reconfiguration, and allows forced output flushing: Code:
--force-flush <integer>
    Force the encoder to flush frames. Default 0
    0 - flush the encoder only when all the input pictures are over.
    1 - flush all the frames even when the input is not over. Slicetype decision may change with this option.
    2 - flush the slicetype decided frames only. |
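For anyone scripting encodes, the option could be passed roughly like this. It is a minimal sketch only: `build_x265_cmd` is a hypothetical helper and the file names are placeholders. Only `--force-flush` and its three values come from the description above; `--input` and `--output` are the usual x265 CLI options.

```python
def build_x265_cmd(infile, outfile, force_flush=0):
    """Build an x265 argument list that includes --force-flush.

    Hypothetical helper for illustration; 0, 1 and 2 are the only
    values listed in the option description.
    """
    if force_flush not in (0, 1, 2):
        raise ValueError("--force-flush expects 0, 1 or 2")
    return [
        "x265",
        "--input", infile,
        "--force-flush", str(force_flush),
        "--output", outfile,
    ]


if __name__ == "__main__":
    # Placeholder file names; prints the command instead of running it.
    print(" ".join(build_x265_cmd("input.y4m", "output.hevc", force_flush=1)))
```

The list form can be handed to `subprocess.run` unchanged, which avoids shell quoting problems with odd file names.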
24th July 2017, 14:28 | #5465 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
I'm really sorry to tell you that, according to reviews and specs, there is no real AVX512 support in your CPU (?)
Intel fused the two AVX2 FMA units into one AVX512 FMA unit, so it is comparable to the AVX2 support in the Zen core, which only has 128-bit AVX units. Also, don't forget that AVX512 clocks are a lot lower than AVX2 clocks, and sometimes even lower than the base clock. AVX512 optimizations for x264/x265 will also bring minimal performance gains. A Threadripper CPU with a lot of cores and a very high clock could be far more interesting.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
24th July 2017, 14:43 | #5466 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
The slightly more expensive Threadripper 1920X (12C/24T) @ 4 GHz will probably destroy the i7-7820X in video encoding. It is odd that you didn't want to wait two weeks for the Threadrippers. Intel's price-to-performance ratio is now very bad.
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
24th July 2017, 14:55 | #5467 | Link | |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
|
Quote:
So I would not be surprised if x265 enabled AVX512 instructions only on CPUs where their execution is a benefit, for the same reason. |
|
25th July 2017, 09:57 | #5468 | Link | |
Registered User
Join Date: Dec 2014
Posts: 240
|
Quote:
Skylake-X should support these AVX512 instructions (in bold):
AVX-512-F: F for Foundation
AVX-512-BW: 512-bit Byte and Word support
AVX-512-CD: Conflict Detect (loop vectorization with possible conflicts)
AVX-512-DQ: More instructions for double/quad math operations
AVX-512-ER: Exponential and Reciprocal
AVX-512-IFMA: Integer Fused Multiply Add with 52-bit precision
AVX-512-PF: Prefetch Instructions
AVX-512-VBMI: Vector Byte Manipulation Instructions
AVX-512-VL: Foundation plus <512-bit vector length support
AVX-512-4VNNIW: Vector Neural Network Instructions Word (variable precision)
AVX-512-4FMAPS: Fused Multiply Accumulation Packed Single precision
Last edited by jlpsvk; 25th July 2017 at 10:01. |
|
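To see which of these subsets a given machine actually reports, something like the following could be used on Linux. It is a rough sketch: `avx512_subsets` is a hypothetical helper, and it assumes the kernel exposes the features in the `flags` line of `/proc/cpuinfo` under names starting with `avx512` (for example `avx512f`, `avx512bw`). It does not work on Windows.

```python
def avx512_subsets(cpuinfo_text):
    """Return the sorted AVX-512 flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        # The first "flags" line is enough; all cores normally report the same set.
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            return sorted(f for f in flags if f.startswith("avx512"))
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            found = avx512_subsets(fh.read())
    except OSError:
        found = []  # not Linux, or the file is not readable
    print(found if found else "no AVX-512 flags reported")
```

Note that this only shows which instructions are supported, not how fast the CPU executes them.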
25th July 2017, 11:37 | #5469 | Link | |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
Quote:
1) AVX-512 instructions "generate" much more heat. Hence the negative AVX offset introduced by Intel.
2) The speed-up in x265 will most likely be much lower than that of SSEx.x vs AVX2. Do not expect miracles in practice.
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |
|
25th July 2017, 11:44 | #5470 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
It's not the support of the instructions that matters but the implementation. That's what I told you, and that's exactly what Ligh told you. Your Skylake-X supports AVX512, but not in a fast way, because Intel enables a real AVX512 FMA unit only on parts with 10 cores or more. Your CPU has a half-speed implementation or slower. But your reply probably shows us why you chose to buy that CPU in the first place.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
|
25th July 2017, 12:06 | #5471 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
x264 already got some AVX512 improvements (although it's not complete yet, I've been told). You can already use it today to judge the improvements. On a 7900X it does result in a real improvement, but as NikosD said, the 7900X has a second, separate full 512-bit unit, which the 7800X and 7820X do not have.
The only "downside" of AVX512 is that the CPUs clock down when it's in use due to the heat generation; however, Skylake can change its clock much faster than previous platforms, so at least it won't be terrible. x265 already experienced issues with downclocks when they first worked on AVX2, which also downclocks on server CPUs, so hopefully they'll account for that and only use it when there is a real and tangible improvement to be had.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
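The trade-off between per-clock gains and downclocking can be put into numbers. This is a back-of-the-envelope sketch under loud assumptions: `net_speedup` is a hypothetical helper, it assumes throughput scales linearly with clock and that the whole encode runs at the reduced clock, and the clocks and gain below are made-up placeholders, not measurements.

```python
def net_speedup(per_clock_gain, normal_clock_ghz, avx512_clock_ghz):
    """Overall speedup once the AVX-512 clock drop is taken into account.

    per_clock_gain: throughput ratio at equal clocks (1.10 means +10%).
    Assumes performance scales linearly with clock frequency.
    """
    return per_clock_gain * avx512_clock_ghz / normal_clock_ghz


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    ratio = net_speedup(per_clock_gain=1.10, normal_clock_ghz=4.0, avx512_clock_ghz=3.6)
    print(f"net speedup: {ratio:.2f}x")
```

With these placeholder figures, a 10% per-clock gain is cancelled out by a 10% clock drop, which is why enabling the wider instructions only where they give a tangible gain makes sense.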
25th July 2017, 12:14 | #5472 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Do we know how big the real difference is between x264 with AVX512 on a 7900X and the AVX2 version on the same CPU?
I'm pretty sure that the 16C/32T Threadripper, at the same price as the 7900X, will eat Skylake-X for breakfast on x264, even though it only has a fast 128-bit AVX FMA implementation and no AVX512, of course.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
25th July 2017, 12:24 | #5473 | Link |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
x264 results are not that impressive.
Code:
2017-06-19 13:55:46 < BugMaster|work> I mean overall speed up vs no AVX512 on same CPU
2017-06-19 13:56:09 < Gramner> 5-10% vs avx2 on veryfast
2017-06-19 13:59:00 < BugMaster|work> and for veryslow it similar or should be faster?
2017-06-19 14:00:41 < Gramner> it goes down to +-0 at veryslow currently. |
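To put the quoted percentages into perspective, they can be converted into encode time. A small sketch: `encode_minutes` is a hypothetical helper, and the 60-minute baseline is an arbitrary placeholder. Only the 5%, 10% and +-0 figures come from the log above.

```python
def encode_minutes(baseline_minutes, speedup_percent):
    """Encode time after a given speedup (10 means 10% more frames per second)."""
    return baseline_minutes / (1 + speedup_percent / 100)


if __name__ == "__main__":
    baseline = 60.0  # placeholder: a one-hour encode without AVX512
    for pct in (0, 5, 10):
        print(f"+{pct}%: {encode_minutes(baseline, pct):.1f} min")
```

A 10% speedup shortens the encode by a bit less than 10%, because time scales with 1 / (1 + speedup).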
25th July 2017, 15:07 | #5474 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
10% overall is pretty good from some improved SIMD functions. But like I said, it's not done yet. There are more functions to optimize. Gramner didn't say when he will get to those.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders Last edited by nevcairiel; 25th July 2017 at 15:11. |
25th July 2017, 15:21 | #5475 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
NOT OVERALL! Do not bend the facts! 10% max is only in the veryfast preset. If you have a 10+ core CPU, then you most likely aim for the veryslow preset for max quality. I doubt that you can get more than a few percent extra speedup in those slow modes.
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper Last edited by Atak_Snajpera; 25th July 2017 at 15:28. |
25th July 2017, 15:31 | #5476 | Link | |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
|
Quote:
It's overall 10% faster in that preset, and that's still a significant speedup. These presets are still quite useful for live encoding for streaming, when the really slow ones are still too slow for realtime (and gaming at the same time, for example).
__________________
LAV Filters - open source ffmpeg based media splitter and decoders Last edited by nevcairiel; 25th July 2017 at 15:33. |
|
25th July 2017, 15:33 | #5477 | Link | ||
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
It clearly says between 5 and 10%. So on average you get less than 10%. An extra ~7% in the useless veryfast preset is just a placebo for me.
Quote:
Quote:
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper Last edited by Atak_Snajpera; 25th July 2017 at 16:06. |
||
25th July 2017, 15:44 | #5478 | Link | |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
Quote:
But doom9 folks tend to go for "veryslow or go home!"... Last edited by sneaker_ger; 25th July 2017 at 15:46. |
|
25th July 2017, 16:10 | #5479 | Link | |
Registered User
Join Date: Aug 2006
Posts: 2,229
|
Quote:
Anyway, it's highly probable that Threadripper will still outperform Skylake-X even without AVX2 functions, and it certainly beats it on price. Considering that it costs less and is faster, that 5 to 10 percent for 'free', as you put it, in effect costs whatever the performance vs outlay cost difference is, percentage-wise. So definitely NOT a free 'advantage'! |
|
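The "not free" argument boils down to performance per unit of money. A rough sketch of that comparison: `perf_per_dollar` is a hypothetical helper, and every fps and price figure below is an invented placeholder, not a benchmark result or a real list price.

```python
def perf_per_dollar(frames_per_second, price):
    """Encoding throughput bought per unit of currency."""
    return frames_per_second / price


if __name__ == "__main__":
    # Placeholder numbers only; substitute real measurements and prices.
    cpu_a = perf_per_dollar(frames_per_second=20.0 * 1.10, price=1000.0)  # with a 10% AVX512 gain
    cpu_b = perf_per_dollar(frames_per_second=24.0, price=800.0)          # no AVX512
    print(f"CPU A: {cpu_a:.4f} fps per dollar")
    print(f"CPU B: {cpu_b:.4f} fps per dollar")
    print("better value:", "A" if cpu_a > cpu_b else "B")
```

If the cheaper CPU is also faster, no SIMD bonus on the more expensive one changes the ranking unless it outweighs both gaps.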
25th July 2017, 16:18 | #5480 | Link |
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
|
Also, guys, do not forget that Intel's "NOT GLUED CORES TOGETHER" technology has serious problems with base clock as you add more and more cores.
The Intel® Xeon® Platinum 8153 Processor (16c/32t) has a base clock of only 2 GHz!
http://ark.intel.com/products/series...ble-Processors
A Threadripper 1950X @ 4 GHz will destroy Skylake-X even without the almighty AVX-512.
__________________
Windows 7 Image Updater - SkyLake\KabyLake\CoffeLake\Ryzen Threadripper |