Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

10th October 2018, 20:20 #6401
LigH
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
@katzenjoghurt:

For the core functions of the x265 encoder (especially those which are most often called in tight loops), some of the following sentences may be true, depending on performance gain, development progress in different bit depths, etc.:
  • There is basic C/C++ code. Which instruction set it uses depends on the compiler options. If your CPU supports that instruction set, x265 can use this code. If not, it will crash due to unsupported instructions.
  • There is hand-optimized assembler code with MMX/SSE2 optimization. If your CPU supports it, x265 can use this code. If not, x265 should use simpler code.
  • There is hand-optimized assembler code with SSSE3/SSE4 optimization. If your CPU supports it, x265 can use this code. If not, x265 should use simpler code.
  • There is hand-optimized assembler code with AVX optimization. If your CPU supports it, x265 can use this code. If not, x265 should use simpler code.
  • There is hand-optimized assembler code with AVX2 optimization. If your CPU supports it, x265 can use this code automatically; only the AVX512 code paths have to be enabled explicitly.
Well, SSE2 is part of the x86-64 baseline, so it is the minimum sensible instruction set for any x86-64 CPU today. But even there, small differences exist. I remember the Athlon 64 (AMD K8) family being the threshold for an SSE2 implementation that is considered relatively "fast", but although it supports SSE3 according to its specs, x264 and x265 refuse to use SSE3 on it.

Usually all these code variants are present in a binary of (lib)x265 unless they were excluded during compilation (e.g. you may disable all assembler code paths, but why would you want that?).
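As a rough illustration of the tiers listed above, here is a minimal Python model of runtime code-path selection. It is a sketch under stated assumptions, not x265's code: the tier and flag names are hypothetical, and it collapses everything into one global tier, whereas the real encoder picks per routine (not every routine has every variant). It encodes the behavior settled later in the thread: AVX2 is picked automatically, AVX512 only with `--asm avx512`, and `--no-asm` leaves only the C/C++ fallback.

```python
# Hypothetical model of x265-style runtime dispatch; tier and flag names
# are illustrative only, not x265 identifiers.
TIERS = [
    ("sse2", {"mmx2", "sse2"}),
    ("sse4", {"ssse3", "sse4.1"}),
    ("avx", {"avx"}),
    ("avx2", {"avx2"}),
    ("avx512", {"avx512"}),
]


def select_tier(cpu_flags, allow_asm=True, allow_avx512=False):
    """Return the fastest tier the CPU supports, or "c" for the C/C++ fallback."""
    best = "c"  # plain C/C++ code is always available
    if not allow_asm:  # models --no-asm
        return best
    needed = set()
    for name, extra in TIERS:
        needed |= extra  # each tier also relies on everything below it
        if name == "avx512" and not allow_avx512:
            break  # opt-in only, like --asm avx512
        if not needed <= cpu_flags:
            break  # CPU lacks this tier, so all higher tiers are out too
        best = name
    return best


phenom_ii = {"mmx2", "sse2", "lzcnt"}
skylake_x = {"mmx2", "sse2", "ssse3", "sse4.1", "sse4.2", "avx", "avx2", "avx512"}

print(select_tier(phenom_ii))                     # sse2
print(select_tier(skylake_x))                     # avx2
print(select_tier(skylake_x, allow_avx512=True))  # avx512
print(select_tier(skylake_x, allow_asm=False))    # c
```

The same binary gives different answers on different CPUs, which is the point of runtime detection: nothing about the executable changes, only the flags it finds.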
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 10th October 2018 at 20:40.
10th October 2018, 22:35 #6402
katzenjoghurt
Registered User
Join Date: Feb 2007
Posts: 128
Quote:
Originally Posted by Forteen88
Np. There are 2.9 builds there under "x265 binaries for Win64/32 — stable branch"!

OMG! You are right.
I ignored the right table as I always pick my versions from the left side.
Thanks again!

Quote:
Originally Posted by LigH
Usually all these code variants are present in a binary of (lib)x265, except they are excluded during compilation (e.g. you may disable all assembler code paths; but why would you want that?).

Thanks, LigH!
So I understand this as: unless a build states otherwise, all optimizations are enabled?
I was just looking for some way to verify that.
Some context: what led me to the question was a current version of StaxRip which contained an x265.exe and an "x265 AVX2" zip file, and I couldn't tell whether the zip was just an oversight or whether the "normal" x265.exe is a version without AVX2 optimizations.
I will just replace it and be done with it... though I was wondering how to make sure an unknown x265.exe is indeed the "right" version for my machine.

Last edited by katzenjoghurt; 10th October 2018 at 22:47.
10th October 2018, 23:40 #6403
LigH
German doom9/Gleitz SuMo
Just a hint:

Code:
x265.exe --no-asm --version
x265 [info]: HEVC encoder version 2.9+1-169e76b6bbcc
x265 [info]: build info [Windows][GCC 8.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: none!
All assembler optimizations are forbidden, so only the basic C/C++ code is used.

But if all assembler optimizations are enabled (and all of them are usually linked into the encoder), that only means they are available in case your CPU supports them, which is detected at runtime. It doesn't mean all of them are used on all hardware. Unless you limit them with the --asm mask parameter, x265 detects what your CPU supports at startup and selects the fastest code paths your specific CPU supports.

Code:
x265.exe --version
x265 [info]: HEVC encoder version 2.9+1-169e76b6bbcc
x265 [info]: build info [Windows][GCC 8.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT
This x265.exe contains code paths for time-critical assembler routines for MMX+SSE2, SSSE3+SSE4, AVX, and even AVX2. But it runs on an AMD Phenom-II, so it is limited to MMX+SSE2 by the CPU auto-detection.

The very same x265.exe can use AVX or even AVX2 code if you copy it onto a PC with a CPU that supports AVX or even AVX2 and run it there.
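For katzenjoghurt's question of how to verify an unknown x265.exe, the `using cpu capabilities` line is the thing to look at. The Python sketch below runs `--version` and turns that line into a set. The executable name and the assumption that it is reachable on your PATH are placeholders, and the parser relies only on the log format shown in this thread.

```python
import re
import subprocess

# Matches the line x265 prints for --version, e.g.
# "x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT"
CAPS_RE = re.compile(r"using cpu capabilities:\s*(.*)")


def parse_caps(log_text):
    """Return the set of capability names found in x265 log output."""
    for line in log_text.splitlines():
        match = CAPS_RE.search(line)
        if match:
            caps = match.group(1).strip()
            # x265 prints "none!" when assembler code is disabled (--no-asm)
            return set() if caps == "none!" else set(caps.split())
    raise ValueError("no 'using cpu capabilities' line found")


def query_x265(exe="x265", extra_args=()):
    """Run `exe [extra_args] --version` and return the capabilities it reports.

    Both output streams are searched, since the [info] lines may go to either.
    """
    proc = subprocess.run([exe, *extra_args, "--version"],
                          capture_output=True, text=True)
    return parse_caps(proc.stderr + "\n" + proc.stdout)


sample = """x265 [info]: HEVC encoder version 2.9+1-169e76b6bbcc
x265 [info]: build info [Windows][GCC 8.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT"""
print(sorted(parse_caps(sample)))  # ['LZCNT', 'MMX2', 'SSE2Fast']
```

Something like `"AVX2" in query_x265("x265.exe")` then tells you whether that build uses AVX2 on this machine. A missing AVX2 entry can mean either that the CPU lacks it or that the build excludes it, so the check is only conclusive on a CPU known to support AVX2.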

{EDIT}If your CPU even supports AVX512 and you insist on using AVX512 instructions, you need to enable them with the additional parameter --asm avx512 on your command line, because it is a bit risky and does not always provide better performance, especially not when your CPU gets temperature throttled. And it will crash if your CPU does not support AVX512.{/EDIT}

So what are these special executables provided on a few sites? In addition to the multiple code paths for time-critical assembler routines, the non-critical C/C++ routines also get compiled for a modern instruction set, which limits their compatibility: these builds will not even start on older CPUs.

If I used an x265.exe that was built with C/C++ compiler optimizations for AVX (which is probably what you read about for special binaries), it would crash right at startup on an AMD Athlon/Phenom, which doesn't support AVX, because it would already use AVX instructions during initialization, before the encoding even starts. But this is not a time-critical part: there is no serious need to speed up code which runs only once or a few times. (Jedi mind powers) "This is not the build you are looking for."

Rather generic builds, like mine or Barough's or Midzuki's, are fine for a wide range of PCs; builds for x86-64 are probably optimized at least for SSE2 in the code generated by the C/C++ compiler, which is the minimal instruction set supported by all AMD64-compatible CPUs. And the selection of the highly optimized assembler routines for the really time-critical parts is done by the encoder at runtime.

Last edited by LigH; 11th October 2018 at 08:17.
11th October 2018, 01:19 #6404
benwaggoner
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by LigH
If your CPU even supports AVX2, and you insist in using AVX2 instructions, then you need to enable it with an additional parameter --asm avx2 in your command line because it is a bit risky and does not always provide better performance, especially not when your CPU gets temperature throttled. And it will crash if your CPU does not support AVX2.

I thought that AVX2 would be used automatically, but AVX512 would only be activated via --asm (which appears to be undocumented in x265.readthedocs.io)
Do I have that wrong?

AVX-512 is only useful with slower UHD resolutions, so it makes sense for it to require an opt in.

Quote:
Rather generic builds, like mine or Barough's or Midzuki's, are fine for a large range of PC's; builds for x86-64 are probably optimized at least for SSE2 in the code generated by the C/C++ compiler, which is the minimal widely supported instruction set of AMD64 compatible CPU's. And the selection of highly optimized assembler routines for the really time-critical parts is done in the encoder at runtime.

Do we have any ballpark sense for how much platform-specific compilation can help encoding performance? I've heard some speculation about ~5% but that was a while ago before the current-gen AMD and Intel processors were out.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
11th October 2018, 08:16 #6405
LigH
German doom9/Gleitz SuMo
Oops, my mistake ... yes, AVX2 is automatic, only AVX512 is manual.
11th October 2018, 11:55 #6406
Atak_Snajpera
RipBot264 author
Join Date: May 2006
Location: Poland
Posts: 7,806
Quote:
AVX-512 is only useful with slower UHD resolutions, so it makes sense for it to require an opt in.

I would like to see how useful AVX-512 is on a 28-core Xeon. I'm expecting a negative speed-up.
12th October 2018, 01:15 #6407
benwaggoner
Moderator
Quote:
Originally Posted by Atak_Snajpera
I would like to see how useful is AVX-512 on 28 core xeon I'm expecting negative speed-up

If you aren't doing Main10 UHD with a slower+ preset, you are almost certainly correct.

That said, an updated microarchitecture could potentially make AVX-512 more generally useful. AVX2 became a lot more useful with (IIRC) Skylake's microarchitectural change, which reduced thermal throttling during AVX2 work and really improved throughput.
12th October 2018, 01:57 #6408
qyot27
...?
Join Date: Nov 2005
Location: Florida
Posts: 1,419
Quote:
Originally Posted by benwaggoner
That said, an updated microarchitecture could potentially make AVX-512 be more generally useful. AVX2 became a lot more useful with (IIRC) Skylake's microarchitectural change which reduced thermal throttling doing AVX2, really improving throughput.

So, expect a reasonably-mature AVX-512 ca. Sapphire Rapids?
12th October 2018, 03:47 #6409
benwaggoner
Moderator
Quote:
Originally Posted by qyot27
So, expect a reasonably-mature AVX-512 ca. Sapphire Rapids?

Plausibly. But x265 may be the most CPU-stressful real-world software there is, hitting the cores, caches, and SIMD units super hard at once. Hopefully Intel is benchmarking x265 during development!

It is hard to predict the optimal performance tuning of a given CPU without actually having it, as theoretical improvements don’t always work as expected.

I’m curious if anyone has benchmarked performance improvements from arch-specific and profile-driven builds.

x265 is pretty stressful for compilers, too.
12th October 2018, 04:04 #6410
alex1399
Registered User
Join Date: Jun 2018
Posts: 56
x265 reported an error that the file "F:\x265" was not found (or perhaps that the file could not be opened, if I recall correctly) when I used --analysis-reuse-file F:\x265 during the second encoding pass.

Great, now I can't reproduce this error anymore.
It has been five days, and Zeranoe still hasn't released an ffmpeg build with x265 2.9.

Last edited by alex1399; 12th October 2018 at 04:45.
12th October 2018, 09:58 #6411
Atak_Snajpera
RipBot264 author
Quote:
Hopefully Intel is benchmarking x265 during development!

Nah... They will ask Principled Technologies to do "proper" benchmarks. They are very good at disabling cores before testing...

Last edited by Atak_Snajpera; 12th October 2018 at 10:01.
12th October 2018, 10:22 #6412
LigH
German doom9/Gleitz SuMo
Fresh build by media-autobuild_suite, GPL v3, Zeranoe-like selection.

Quote:
ffmpeg version N-92161-gf6d48b618a Copyright (c) 2000-2018 the FFmpeg developers
built with gcc {7.3.0|8.2.0} (Rev3, Built by MSYS2 project)
configuration: --disable-autodetect --enable-amf --enable-bzlib --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-iconv --enable-lzma --enable-nvenc --enable-zlib --enable-sdl2 --disable-debug --enable-ffnvcodec --enable-nvdec --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-fontconfig --enable-libass --enable-libbluray --enable-libfreetype --enable-libmfx --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libwavpack --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libshine --enable-gpl --enable-avisynth --enable-libxvid --enable-libaom --enable-version3 --enable-mbedtls --extra-cflags=-DLIBTWOLAME_STATIC --extra-libs=-lstdc++ --extra-cflags=-DLIBXML_STATIC --extra-libs=-liconv
libavutil 56. 19.101 / 56. 19.101
libavcodec 58. 33.100 / 58. 33.100
libavformat 58. 18.104 / 58. 18.104
libavdevice 58. 4.105 / 58. 4.105
libavfilter 7. 33.101 / 7. 33.101
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
12th October 2018, 18:56 #6413
Forteen88
Herr
Join Date: Apr 2009
Location: North Europe
Posts: 556
x265 2.9+2 released now!
http://www.msystem.waw.pl/x265/
12th October 2018, 21:59 #6414
LigH
German doom9/Gleitz SuMo
^ ffmpeg contains v2.9+2.
14th October 2018, 15:10 #6415
DotJun
Registered User
Join Date: Aug 2014
Posts: 28
Is there a downside to enabling AVX512 on an Intel X-series chip?


14th October 2018, 17:12 #6416
LigH
German doom9/Gleitz SuMo
As far as I remember from previous discussions...

Most of all: Temperature throttling. AVX512 can be a heavy burden.

Furthermore, switching the CPU into and out of AVX modes can be quite time consuming, which has to be considered in the optimization efforts, and it can make it less efficient for lower resolutions.

Please try reading back through this thread; I believe a discussion about AVX512 and AMD Ryzen was even split off from this generic x265 encoder thread.
15th October 2018, 04:51 #6417
StvG
Registered User
Join Date: Jul 2018
Posts: 447
Tested binaries can be downloaded from here.
input.mkv - hevc (Main 10), yuv420p10le(tv), 3840x1606
AVX2 clock speed = AVX512 clock speed

Code:
ffmpeg -i input.mkv -f yuv4mpegpipe -strict -1 - | .\resources\x265-10b.exe --y4m - --ctu 32 -o .\OUTPUT.mkv

x265 [info]: HEVC encoder version 2.8+74-fd517ae68f93
x265 [info]: build info [Windows][GCC 8.2.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

encoded 498 frames in 51.34s (9.70 fps), 5044.61 kb/s, Avg QP:31.42

x265 [info]: HEVC encoder version 2.8+74-fd517ae68f93
x265 [info]: build info [Windows][MSVC 1915][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

encoded 498 frames in 51.80s (9.61 fps), 5044.61 kb/s, Avg QP:31.42
Code:
ffmpeg -i input.mkv -f yuv4mpegpipe -strict -1 - | .\resources\x265-10b.exe --y4m - --ctu 32 -o .\OUTPUT.mkv

x265 [info]: HEVC encoder version 2.9+2-7e978ed93d60
x265 [info]: build info [Windows][GCC 8.2.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

encoded 498 frames in 52.82s (9.43 fps), 5044.61 kb/s, Avg QP:31.42

x265 [info]: HEVC encoder version 2.9+2-7e978ed93d60
x265 [info]: build info [Windows][MSVC 1915][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

encoded 498 frames in 51.55s (9.66 fps), 5044.61 kb/s, Avg QP:31.42
Code:
ffmpeg -i input.mkv -f yuv4mpegpipe -strict -1 - | .\resources\x265-10b.exe --y4m - --ctu 32 -o .\OUTPUT.mkv

VS 2017 Generic compilation ("none")

encoded 498 frames in 51.49s (9.67 fps), 5044.61 kb/s, Avg QP:31.42

VS 2017 AVX2 compilation ("AVX2")

encoded 498 frames in 52.27s (9.53 fps), 5044.61 kb/s, Avg QP:31.42
Code:
ffmpeg -i input.mkv -f yuv4mpegpipe -strict -1 - | .\resources\x265-10b.exe --y4m - --ctu 32 (--asm avx512) -o .\OUTPUT.mkv

x265 [info]: HEVC encoder version 2.9+2-7e978ed93d60
x265 [info]: build info [Windows][MSVC 1915][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

encoded 498 frames in 52.05s (9.57 fps), 5044.61 kb/s, Avg QP:31.42

x265 [info]: HEVC encoder version 2.9+2-7e978ed93d60
x265 [info]: build info [Windows][MSVC 1915][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512

encoded 498 frames in 50.79s (9.80 fps), 5044.61 kb/s, Avg QP:31.42
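To put these numbers next to benwaggoner's ~5% question, a small Python calculation helps. The times are copied from the logs above; treating the two "2.9+2 MSVC" AVX2 runs as the same binary is an assumption based on their identical version and build lines. Recomputed fps values match what x265 printed to within rounding, since the input times are themselves rounded.

```python
FRAMES = 498  # frames per encode in the test above

# Wall-clock seconds copied from the logs above.
runs = {
    "2.8+74 GCC": 51.34,
    "2.8+74 MSVC": 51.80,
    "2.9+2 GCC": 52.82,
    "2.9+2 MSVC": 51.55,
    "VS2017 generic": 51.49,
    "VS2017 AVX2 compile": 52.27,
    "2.9+2 MSVC, 2nd run": 52.05,
    "2.9+2 MSVC --asm avx512": 50.79,
}


def fps(seconds, frames=FRAMES):
    """Average encoding speed in frames per second."""
    return frames / seconds


def speedup_pct(baseline_s, candidate_s):
    """Throughput gain of candidate over baseline in percent (positive = faster)."""
    return (baseline_s / candidate_s - 1.0) * 100.0


for name, secs in runs.items():
    print(f"{name:24s} {fps(secs):5.2f} fps")

comparisons = [
    ("AVX2 compile vs generic", "VS2017 generic", "VS2017 AVX2 compile"),
    ("--asm avx512 vs AVX2", "2.9+2 MSVC, 2nd run", "2.9+2 MSVC --asm avx512"),
    ("same build, two runs", "2.9+2 MSVC", "2.9+2 MSVC, 2nd run"),
]
for label, base, cand in comparisons:
    print(f"{label}: {speedup_pct(runs[base], runs[cand]):+.1f}%")
```

With these inputs the AVX2-targeted compile comes out about 1.5% slower than the generic one and `--asm avx512` about 2.5% faster, while the two apparently identical runs already differ by about 1%, so differences of this size are close to run-to-run noise.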
15th October 2018, 07:21 #6418
DotJun
Registered User
I tried a short 4K test clip with the slower preset, with AVX512 enabled and disabled. FPS went up from 0.84 to 1.37 when I enabled AVX512.

The encoded clip looks good; no obvious errors, that is. File size is roughly the same, but clip length and CRF might have something to do with the tiny difference between the two.

64-bit x265 on an Intel i7-7820X. Temperatures are roughly the same as with AVX512 disabled. Load is mostly at 100% on all cores, with an occasional dip down to 87% every minute or so.
15th October 2018, 07:30 #6419
LigH
German doom9/Gleitz SuMo
So it appears to be efficient on your specific CPU model.
15th October 2018, 12:16 #6420
Atak_Snajpera
RipBot264 author
Quote:
Originally Posted by DotJun
I tried a short test clip with avx512 enabled and disabled on a 4K source using the slower preset. FPS went up to 1.37 from 0.84 when I enabled 512.

Encoded clip looks good, no obvious errors that is. File size is roughly the same, but clip length and crf might have something to do with the tiny difference between the two.

64bit x265 on an intel 7820x. Temps are roughly equal to when 512 is disabled. Load is mostly at 100% on all cores with the occasional dip down to 87% every minute or so.

You should encode a whole movie (130k frames) instead of an ultra-short clip with a few hundred frames.
The longer you encode, the more heat your CPU produces, and hence the more aggressively the AVX negative offset kicks in.
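The point about test length is easy to quantify. The Python sketch below uses DotJun's short-clip rates and the 130k-frame movie length from the post; it assumes those rates would hold for the whole movie, which is exactly the assumption thermal throttling may break.

```python
FULL_MOVIE_FRAMES = 130_000  # whole-movie length quoted above


def hours_to_encode(frames, fps):
    """Wall-clock hours to encode `frames` at a constant `fps`."""
    return frames / fps / 3600.0


rates = {"AVX512 disabled": 0.84, "AVX512 enabled": 1.37}  # DotJun's short-clip fps
for label, fps in rates.items():
    print(f"{label}: {fps:.2f} fps -> {hours_to_encode(FULL_MOVIE_FRAMES, fps):.1f} h")

gain = (rates["AVX512 enabled"] / rates["AVX512 disabled"] - 1.0) * 100.0
print(f"short-clip speed-up: {gain:.0f}%")
```

That is roughly 43 hours versus 26 hours of sustained full load, compared with a few minutes for a clip of a few hundred frames, which is why a short test says little about long-term thermal behavior.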
All times are GMT +1.