Old 22nd July 2017, 17:26   #5461  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.5+4-01a981f509ea (GCC 7.1.0, 32 & 64-bit 8/10/12bit Multilib Windows Binaries)

x265 [info]: HEVC encoder version 2.5+4-01a981f509ea
x265 [info]: build info [Windows][GCC 7.1.0][32/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2


Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
Old 24th July 2017, 09:07   #5462  |  Link
Midzuki
Unavailable
 
Join Date: Mar 2009
Location: offline
Posts: 1,480
x265.exe 2.5+6-d11482e5fedb

https://forum.videohelp.com/threads/...=1#post2492072
Old 24th July 2017, 09:51   #5463  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
x265_2.5+6-d11482e5fedb (merge with stable)

fixes two memory leaks (threading, HDR10+), improves encoder reconfiguration, and allows forced output flushing:

Code:
   --force-flush <integer>       Force the encoder to flush frames. Default 0
                                 0 - flush the encoder only when all the input pictures are over.
                                 1 - flush all the frames even when the input is not over. Slicetype decision may change with this option.
                                 2 - flush the slicetype decided frames only.
I guess this is mainly interesting for scenarios with changing parameters where a quick response is required?
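For anyone scripting encodes, here is a minimal Python sketch of how the new option could be passed on the command line. Only the three `--force-flush` values come from the help text above; the file names are placeholders, `--input` and `--output` are the usual x265 CLI options, and the helper name `x265_flush_args` is made up for this example.

```python
# Meaning of each --force-flush value, copied from the help text quoted above.
FORCE_FLUSH_MODES = {
    0: "flush the encoder only when all the input pictures are over",
    1: "flush all the frames even when the input is not over",
    2: "flush the slicetype decided frames only",
}


def x265_flush_args(input_path, output_path, force_flush=0):
    """Return an x265 argument list that sets --force-flush.

    The paths are placeholders; only the flag values are taken from the post.
    """
    if force_flush not in FORCE_FLUSH_MODES:
        raise ValueError(f"--force-flush must be 0, 1 or 2, got {force_flush!r}")
    return [
        "x265",
        "--input", input_path,
        "--force-flush", str(force_flush),
        "--output", output_path,
    ]


# Print the command instead of running it, so the sketch works without x265 installed.
print(" ".join(x265_flush_args("input.y4m", "output.hevc", force_flush=1)))
```

The list form can be handed to `subprocess.run` as-is once the paths point at real files.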
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 24th July 2017, 11:51   #5464  |  Link
jlpsvk
Registered User
 
Join Date: Dec 2014
Posts: 240
Any news about AVX-512 instruction support? Tomorrow my lovely new i7-7820X will arrive, so a testing volunteer is here.
Old 24th July 2017, 14:28   #5465  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
I'm really sorry to inform you that there is no real AVX512 support inside your CPU, according to reviews and specs (?)

Intel fused the two AVX2 FMA units into one AVX512 FMA unit, so it's like AVX2 support on the Zen core, which has only 128-bit AVX units.

Also, don't forget that AVX512 clocks are a lot lower than AVX2 clocks, and sometimes even lower than the base clock.

Also, the performance gain from AVX512 optimizations in x264/x265 will be minimal.

A Threadripper CPU with a lot of cores and a very high clock could be far more interesting.
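The "fused units" argument can be put into numbers. The sketch below only does the lane arithmetic and takes the unit counts described in this post at face value; it is not a statement about what any particular CPU model really contains, and the function name is made up for the example.

```python
def peak_fp32_flops_per_cycle(vector_bits, fma_units):
    """Theoretical single-precision FLOPs per cycle per core.

    Each FMA counts as two operations (multiply and add) on every 32-bit lane.
    """
    lanes = vector_bits // 32
    return lanes * 2 * fma_units


two_avx2_units = peak_fp32_flops_per_cycle(256, 2)    # two 256-bit FMA units
one_avx512_unit = peak_fp32_flops_per_cycle(512, 1)   # the same hardware fused into one 512-bit unit
two_avx512_units = peak_fp32_flops_per_cycle(512, 2)  # a second, dedicated 512-bit unit added

print(two_avx2_units, one_avx512_unit, two_avx512_units)
```

Under these assumptions the fused unit has the same peak as the two AVX2 units, so any lower AVX512 clock turns "equal per cycle" into "slower per second". Only the second 512-bit unit doubles the per-cycle peak.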
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 24th July 2017, 14:43   #5466  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
Quote:
Originally Posted by jlpsvk View Post
Any news about AVX-512 instruction support? Tomorrow my lovely new i7-7820X will arrive, so a testing volunteer is here.
A slightly more expensive Threadripper 1920X (12C/24T) at 4 GHz will probably destroy the i7-7820X in video encoding. It is odd that you didn't want to wait two weeks for the Threadrippers. Intel's price-to-performance ratio is very bad right now.
Old 24th July 2017, 14:55   #5467  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
Quote:
Originally Posted by NikosD View Post
Intel fused the two AVX2 FMA units into one AVX512 FMA unit, so it's like AVX2 support on the Zen core, which has only 128-bit AVX units.

Also, don't forget that AVX512 clocks are a lot lower than AVX2 clocks, and sometimes even lower than the base clock.
That reminds me of the behaviour of AMD Phenom II CPUs, which are more or less capable of executing SSE3 instructions, but x264 and x265 refuse to enable them because their implementation is so slow (and possibly even incomplete?); thus the fastest instruction set for those old engines is "SSE2Fast".

So I would not be surprised if x265 enabled AVX512 instructions only on CPUs where their execution is a benefit, for the same reason.
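To see what the encoder actually decided to enable on a given machine, the "using cpu capabilities" line from its log can be checked. Here is a small Python sketch that parses the line posted in #5461. The function name is made up, and the assumption that an AVX512-capable build would list a token starting with "AVX512" is mine, not something confirmed in this thread.

```python
LOG_LINE = ("x265 [info]: using cpu capabilities: "
            "MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2")


def enabled_capabilities(line):
    """Return the capability tokens listed after 'using cpu capabilities:'."""
    _, marker, tail = line.partition("using cpu capabilities:")
    if not marker:
        return []  # not a capabilities line
    return tail.split()


caps = enabled_capabilities(LOG_LINE)
has_avx2 = "AVX2" in caps
# Assumption: an AVX512-enabled run would print a token beginning with "AVX512".
has_avx512 = any(token.upper().startswith("AVX512") for token in caps)

print(caps)
print("AVX2 enabled:", has_avx2, "| AVX512 enabled:", has_avx512)
```

This reports what the encoder chose to use, which is exactly the distinction made above between "the CPU can execute it" and "the encoder enables it".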
Old 25th July 2017, 09:57   #5468  |  Link
jlpsvk
Registered User
 
Join Date: Dec 2014
Posts: 240
Quote:
Originally Posted by LigH View Post
That reminds me of the behaviour of AMD Phenom II CPUs, which are more or less capable of executing SSE3 instructions, but x264 and x265 refuse to enable them because their implementation is so slow (and possibly even incomplete?); thus the fastest instruction set for those old engines is "SSE2Fast".

So I would not be surprised if x265 enabled AVX512 instructions only on CPUs where their execution is a benefit, for the same reason.
That's why I chose the i7-7820X.

Skylake-X should support these AVX512 instructions (the first four in the list, plus AVX-512-VL):
AVX-512-F: F for Foundation
AVX-512-BW: Byte and Word support for 512-bit vectors
AVX-512-CD: Conflict Detect (loop vectorization with possible conflicts)
AVX-512-DQ: More instructions for double/quad math operations

AVX-512-ER: Exponential and Reciprocal
AVX-512-IFMA: Integer Fused Multiply Add with 52-bit precision
AVX-512-PF: Prefetch Instructions
AVX-512-VBMI: Vector Byte Manipulation Instructions
AVX-512-VL: Foundation plus <512-bit vector length support
AVX-512-4VNNIW: Vector Neural Network Instructions Word (variable precision)
AVX-512-4FMAPS: Fused Multiply Accumulation Packed Single precision
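A quick way to check which of these subsets a CPU reports is to look at its feature flags. The Python sketch below works on a flags string in the style of the `flags` line in Linux's `/proc/cpuinfo`, where the subsets appear as `avx512f`, `avx512cd`, `avx512bw`, `avx512dq`, `avx512vl` and so on. The sample string and the helper names are made up for illustration; the "expected" tuple is the set Skylake-X is documented to support.

```python
# Subsets Skylake-X is expected to report, using Linux /proc/cpuinfo flag names.
SKYLAKE_X_SUBSETS = ("avx512f", "avx512cd", "avx512bw", "avx512dq", "avx512vl")


def avx512_subsets(flags_line):
    """Return every avx512* flag found in a space-separated flags string."""
    return sorted(flag for flag in flags_line.split() if flag.startswith("avx512"))


def missing_subsets(flags_line, wanted=SKYLAKE_X_SUBSETS):
    """Return the wanted subsets that the flags string does not contain."""
    present = set(flags_line.split())
    return [name for name in wanted if name not in present]


# Hypothetical, shortened flags line; a real one is much longer.
sample_flags = "fpu sse2 ssse3 avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl"

print(avx512_subsets(sample_flags))
print("missing:", missing_subsets(sample_flags))
```

On a real Linux box the string would come from the `flags` line of `/proc/cpuinfo`. As the replies below point out, a flag being present says nothing about how fast the instructions execute.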

Last edited by jlpsvk; 25th July 2017 at 10:01.
Old 25th July 2017, 11:37   #5469  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
Quote:
Originally Posted by jlpsvk View Post
That's why I chose the i7-7820X.

Skylake-X should support these AVX512 instructions (the first four in the list, plus AVX-512-VL):
AVX-512-F: F for Foundation
AVX-512-BW: Byte and Word support for 512-bit vectors
AVX-512-CD: Conflict Detect (loop vectorization with possible conflicts)
AVX-512-DQ: More instructions for double/quad math operations

AVX-512-ER: Exponential and Reciprocal
AVX-512-IFMA: Integer Fused Multiply Add with 52-bit precision
AVX-512-PF: Prefetch Instructions
AVX-512-VBMI: Vector Byte Manipulation Instructions
AVX-512-VL: Foundation plus <512-bit vector length support
AVX-512-4VNNIW: Vector Neural Network Instructions Word (variable precision)
AVX-512-4FMAPS: Fused Multiply Accumulation Packed Single precision
Did you know that...
1) AVX-512 instructions "generate" much more heat. Hence the negative AVX offset introduced by Intel.
2) The speed-up in x265 will most likely be much lower than the one from SSEx.x to AVX2.

Do not expect miracles in practice.
Old 25th July 2017, 11:44   #5470  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by jlpsvk View Post
That's why I chose the i7-7820X.

Skylake-X should support these AVX512 instructions (the first four in the list, plus AVX-512-VL):
Sorry to bother you again, but you clearly didn't understand my reply, and certainly not LigH's either.

It's not the support of instructions that matters but the implementation.
That's what I told you and that's exactly the same thing that LigH told you.

Your Skylake-X supports AVX512, but not in a fast way, because Intel enables a real AVX512 FMA unit only on parts with 10 cores and above.

Your CPU has a half speed implementation or slower.

But, probably your reply shows us why you chose to buy that CPU in the first place.
Old 25th July 2017, 12:06   #5471  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
x264 has already got some AVX512 improvements (although it's not complete yet, I've been told). You can already use it today to judge the improvements. On a 7900X it does result in a real improvement, but as NikosD said, the 7900X has a second, separate full 512-bit unit, which the 7800X and 7820X do not have.

The only "downside" of AVX512 is that the CPUs clock down when it's in use due to the heat generation; however, Skylake can change its clock much faster than previous platforms, so at least it won't be terrible. x265 already experienced issues with downclocks when they first worked on AVX2, which also downclocks on server CPUs, so hopefully they'll account for that and only use it when there is a real and tangible improvement to be had.
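The trade-off between SIMD gain and downclock can be sketched with a very simple model: assume throughput scales linearly with clock, and that the SIMD gain is measured at equal clocks. Both are simplifications, the clock values below are hypothetical, and the function names are made up for this example.

```python
def net_speedup(simd_gain, clock_avx2_ghz, clock_avx512_ghz):
    """Net speedup of the AVX512 path over the AVX2 path after the downclock.

    simd_gain is the per-clock gain as a fraction, e.g. 0.10 for 10%.
    """
    return (1.0 + simd_gain) * (clock_avx512_ghz / clock_avx2_ghz) - 1.0


def break_even_gain(clock_avx2_ghz, clock_avx512_ghz):
    """Per-clock SIMD gain needed just to cancel out the downclock."""
    return clock_avx2_ghz / clock_avx512_ghz - 1.0


# Hypothetical clocks: 4.0 GHz under AVX2 load, 3.6 GHz under AVX512 load.
print(f"break-even gain: {break_even_gain(4.0, 3.6):.1%}")
print(f"net result of a 10% per-clock gain: {net_speedup(0.10, 4.0, 3.6):+.1%}")
```

With these placeholder clocks a 10% per-clock gain is not enough to come out ahead, which is the reason for hoping the encoder enables the new code paths only where the net result is positive.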
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 25th July 2017, 12:14   #5472  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Do we know how big the real difference is between x264 with AVX512 and x264 with AVX2 on the same 7900X?

I'm pretty sure that a 16C/32T Threadripper at the same price as the 7900X will eat Skylake-X for breakfast on x264, even though it has only a fast 128-bit AVX FMA implementation and of course no AVX512.
Old 25th July 2017, 12:24   #5473  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
x264 results are not that impressive.
Code:
2017-06-19 13:55:46 < BugMaster|work> I mean overall speed up vs no AVX512 on same CPU
2017-06-19 13:56:09 < Gramner> 5-10% vs avx2 on veryfast
2017-06-19 13:59:00 < BugMaster|work> and for veryslow it similar or should be faster?
2017-06-19 14:00:41 < Gramner> it goes down to +-0 at veryslow currently.
Old 25th July 2017, 15:07   #5474  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
10% overall is pretty good from some improved SIMD functions. But like I said, it's not done yet. There are more functions to optimize. He didn't say when he will get to those.

Last edited by nevcairiel; 25th July 2017 at 15:11.
Old 25th July 2017, 15:21   #5475  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
Quote:
Originally Posted by nevcairiel View Post
10% overall is pretty good from some improved SIMD functions. But like I said, it's not done yet. There are more functions to optimize. He didn't say when he will get to those.
NOT OVERALL! Do not bend facts! 10% max is only in the veryfast preset. If you have a 10+ core CPU then you most likely aim for the veryslow preset for max quality. I doubt that you can get more than a few percent extra speedup in those slow modes.

Last edited by Atak_Snajpera; 25th July 2017 at 15:28.
Old 25th July 2017, 15:31   #5476  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Quote:
Originally Posted by Atak_Snajpera View Post
NOT OVERALL! Do not bend facts! 10% max is only in the veryfast preset.
I never stated the opposite; clearly anyone is capable of reading one post up, so keep your pants on.

It's overall 10% faster in that preset, and that's still a significant speedup. These presets are still quite useful for live encoding for streaming, when the really slow ones are still too slow for realtime (and gaming at the same time, for example).

Last edited by nevcairiel; 25th July 2017 at 15:33.
Old 25th July 2017, 15:33   #5477  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
It clearly says between 5 and 10%. So on average you get less than 10%. An extra ~7% in the useless veryfast preset is just a placebo for me.
Quote:
2017-06-19 13:56:09 < Gramner> 5-10% vs avx2 on veryfast
Quote:
These presets are still quite useful for live encoding for streaming, when the really slow ones are still too slow for realtime (and gaming at the same time, for example).
Who on earth buys a very expensive X299 AVX-512 CPU for streaming games? For streaming, a cheap Ryzen 7 1700 at 3.8 GHz is enough.
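To put the 5-10% figure into encode time: a throughput gain of s shortens the wall-clock time by 1 - 1/(1 + s), which is always a bit less than s itself. A small Python sketch of that arithmetic follows. Only the 5% and 10% bounds come from the quoted IRC log; the 7.5% midpoint and the function name are added for illustration.

```python
def time_saved_fraction(speedup):
    """Fraction of encode time saved when throughput rises by `speedup`.

    speedup is a fraction, e.g. 0.10 for a 10% higher frame rate.
    """
    return 1.0 - 1.0 / (1.0 + speedup)


# Lower bound, midpoint and upper bound of the quoted 5-10% range.
for speedup in (0.05, 0.075, 0.10):
    saved = time_saved_fraction(speedup)
    print(f"{speedup:.1%} faster -> {saved:.2%} less encode time")
```

So even the best case of the quoted range saves about 9% of the encode time, and at veryslow, where the gain is reported as roughly zero, it saves nothing.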

Last edited by Atak_Snajpera; 25th July 2017 at 16:06.
Old 25th July 2017, 15:44   #5478  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Quote:
Originally Posted by nevcairiel View Post
10% overall is pretty good from some improved SIMD functions. But like I said, it's not done yet. There are more functions to optimize. He didn't say when he will get to those.
I don't disagree. I actually expected you to answer when I wrote my post. 5% to 10% for free is nothing to turn up your nose at.
But doom9 folks tend to go for "veryslow or go home!"...

Last edited by sneaker_ger; 25th July 2017 at 15:46.
Old 25th July 2017, 16:10   #5479  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
Quote:
Originally Posted by sneaker_ger View Post
I don't disagree. I actually expected you to answer when I wrote my post. 5% to 10% for free is nothing to turn up your nose at.
But doom9 folks tend to go for "veryslow or go home!"...
It's not 5 to 10 percent for free, you paid extra for the CPU to get that extra speed. Not only that, the extra speed is only applicable to the faster, slower processors thusly. No doubt any improvement for some functions is offset by the associated downclock. It's why a virtualised GPU based onboard SPU (Supplementary Processing Unit) replacing a large part of the ALU and FPU functions could possibly be of advantage here. Sounds pretty good for something I just made up, right?

Anyways, it's highly probable that Threadripper will still outperform Skylake-X even without AVX2 functions, and it certainly beats it on price. Considering it costs less and is faster, that 5 to 10 percent for 'free', as you put it, in effect costs whatever the performance vs outlay cost difference is percentage-wise. So definitely NOT a free 'advantage'!
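The "performance vs outlay" comparison is just frames per second per unit of money. The sketch below shows the calculation only: every number in it is a placeholder, not a benchmark result or a real price, and the function name is made up.

```python
def value_ratio(fps_a, price_a, fps_b, price_b):
    """How much performance per unit of money CPU A delivers relative to CPU B.

    A result below 1.0 means A gives less performance for the money than B.
    """
    return (fps_a / price_a) / (fps_b / price_b)


# Placeholder figures for illustration only (NOT measured or quoted anywhere):
base_fps_a = 10.0   # CPU A without the extra SIMD gain
simd_gain_a = 0.10  # best case of the 5-10% range discussed above
price_a = 1000.0
fps_b = 12.0        # CPU B, assumed faster thanks to more cores
price_b = 800.0

ratio = value_ratio(base_fps_a * (1.0 + simd_gain_a), price_a, fps_b, price_b)
print(f"CPU A delivers {ratio:.2f}x the performance per unit of money of CPU B")
```

Plugging in real benchmark numbers and street prices is what would actually settle the argument; the formula only makes explicit what "free" has to be measured against.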
Old 25th July 2017, 16:18   #5480  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
Also, do not forget, guys, that Intel's "NOT GLUED CORES TOGETHER" technology has serious problems with base clock as you add more and more cores.
The Intel® Xeon® Platinum 8153 processor (16c/32t) has a base clock of only 2 GHz!
http://ark.intel.com/products/series...ble-Processors

A Threadripper 1950X at 4 GHz will destroy Skylake-X even without the almighty AVX-512.