Old 20th February 2019, 09:03   #6741  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 5,822
Nope, he already answered that he simply uses media-autobuild_suite (see: https://forum.doom9.org/showthread.p...45#post1866145), so no profiling.
__________________
Hybrid here in the forum, homepage
Old 20th February 2019, 09:31   #6742  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,531
A build log could be useful; maybe something is missing. I'd expect the difference to be much bigger if assembly were not being used at all, though.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 20th February 2019, 20:24   #6743  |  Link
poller
Registered User
 
Join Date: Sep 2018
Posts: 10
I tried hard with GCC again.

Times in seconds; lower is better.

Code:
137.0 no assembly
 47.0 (default)
 45.5 (PGO build) -mtune=ivybridge (default here is -O3 which makes 1st pass PGO .exe crash, thus no better speed i guess)
 44.5 (PGO build) -mtune=ivybridge -O2
 43.9 (PGO build) -mtune=ivybridge -funroll-loops -finline-functions -ftree-loop-vectorize -O2
 39.5 LigH
So I get a little improvement with all that fiddling, but I am still far away from LigH's GCC builds.

Giving up here; I have no ideas left.
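
To put the timings in the table above into perspective, here is a small sketch that expresses each one relative to LigH's build. The numbers are copied from the table; the shortened labels and the helper name `pct_slower` are made up for illustration.

```python
# Times in seconds, copied from the table above (lower is better).
timings = {
    "no assembly": 137.0,
    "default": 47.0,
    "PGO -mtune=ivybridge (-O3)": 45.5,
    "PGO -mtune=ivybridge -O2": 44.5,
    "PGO -O2 + unroll/inline/vectorize": 43.9,
    "LigH": 39.5,
}


def pct_slower(t, reference):
    """How much longer t takes than reference, in percent."""
    return (t / reference - 1.0) * 100.0


for name, secs in timings.items():
    print(f"{name:36s} {secs:6.1f} s  "
          f"{pct_slower(secs, timings['LigH']):+6.1f}% vs LigH")
```

By this measure the best PGO build still takes about 11% longer than LigH's build, while disabling assembly roughly triples the time.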

Last edited by poller; 20th February 2019 at 20:48.
Old 20th February 2019, 23:14   #6744  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,829
OK, I forgot a few little details I edited a long time ago, while testing some compilation issues with a faulty compiler version. One leftover line is:

export CXXFLAGS="-march=pentium4 -mtune=generic"

for the 32-bit compilation (which is still quite generic, just a sensible minimum). That might bring a little advantage. For the 64-bit compilation, CXXFLAGS is empty.

Furthermore, for the 32-bit compilation, assembly is disabled for 10 and 12 bit precision cores, but enabled for the 8 bit core.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 20th February 2019 at 23:17.
Old 21st February 2019, 09:39   #6745  |  Link
WhatZit
Registered User
 
Join Date: Aug 2016
Posts: 59
Quote:
Originally Posted by Wolfberry View Post
Supply --svt in the command line to use the SVT-HEVC encoder.
Never expected that one!

From http://x265.org/x265-svt-hevc-house/:

Quote:
With changeset a41325fc854f, the x265 library can invoke the SVT-HEVC library for encoding through the --svt option. We have mapped presets and command-line options supported by the x265 application into the equivalent options of SVT-HEVC, and have added a few specific options that are available only when the SVT-HEVC library is invoked. This page in our documentation describes the steps to build, and invoke the SVT-HEVC library in more detail.

Our reason for this integration was to enable our users to evaluate additional relative trade-offs between performance and compression efficiency while working behind the familiar API of the x265 library. In the long term, we plan to leverage this integration to further improve x265’s ability to handle real-time and low turn-around scenarios in pure software; this is the space that SVT-HEVC was focused on. In parallel, we will continue to innovate on our flagship presets that are used in offline encoding where x265 dominates. You can expect to see these changes in the coming releases of x265, increasing the reach of open-source for video compression!
Am I being cynical to suggest that Multicoreware couldn't achieve such speed optimisations on their own, so they formed this "synergy"?
Old 21st February 2019, 09:43   #6746  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,712
Personally, I think it's stupid to incorporate another encoder into the x265 "frontend". If one wanted to use different encoders, one would use, say, ffmpeg, or just use them directly. x265 should be x265, and nothing else. But oh well. Probably some business interest overriding common sense.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 21st February 2019, 10:27   #6747  |  Link
shinchiro
Registered User
 
Join Date: Feb 2012
Posts: 46
Lol, good to know I'm not the only one who thinks x265's decision to include another encoder inside itself is stupid. Well, if there's money involved here, I'm not even surprised
Old 21st February 2019, 20:32   #6748  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,837
Quote:
Originally Posted by shinchiro View Post
Lol, good to know I'm not the only one who thinks x265's decision to include another encoder inside itself is stupid. Well, if there's money involved here, I'm not even surprised
From the link, it sounds like the big plan is to start incorporating use of certain SVT-HEVC features/tools within x265. Having a highly accelerated coarse motion search mode could help. Kind of like the OpenCL/CUDA experiments with x264 a while ago.

x265 has a TON of features where it can take input from a first pass and then refine it. Some of those don't require the stream be made with x265, and a few work with H.264 sources IIRC.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 22nd February 2019, 08:57   #6749  |  Link
Forteen88
Herr
 
Join Date: Apr 2009
Location: North Europe
Posts: 360
I just wanted to say that I did a little x265 speed-test, one compile vs another,
x265-3.0_Au+7-cb3e172_vs2017-AVX2 (msystem) vs x265-v3.0_Au+7-cb3e172a5f51-SVT-win64 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit].

I encoded a 44-second cartoon animation, 00096.m2ts, with these settings:
x265.exe --crf 18 --preset veryslow --output-depth 10 --rdoq-level 0 --psy-rdoq 0 --aq-mode 1 --aq-strength 0.4 --qcomp 0.65 --bframes 16 --rc-lookahead 48 --ref 6 --min-keyint 24 --keyint 240 --frame-threads 1 --colormatrix bt709 --deblock -2:-2 --no-sao --psy-rd 0.4 --tskip --tskip-fast --tu-inter 4 --tu-intra 4 --frames 1066


x265-3.0_Au+7-cb3e172_vs2017-AVX2 (msystem) Duration: 00:53:41
x265-v3.0_Au+7-cb3e172a5f51-SVT-win64 [ICC 1900][MSVC 1916 Multilib][SVT][64 bit] Duration: 00:53:32

Not a big difference in speed, considering I have an Intel Core i5-5200U CPU (I thought that the ICC 1900-compile would be much faster).

EDIT: By "much faster", I meant faster than this encode turned out to be, something like 10% faster than the non-ICC compile.
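
For reference, the two durations above can be turned into a percentage and a frame rate with a few lines. This is only a sketch: the durations and the frame count (`--frames 1066`) come from the post, while the shortened labels and the helper `to_seconds` are made up here.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' duration string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


FRAMES = 1066  # from --frames 1066 in the command line above

durations = {
    "vs2017-AVX2 (msystem)": "00:53:41",
    "ICC 1900 / MSVC 1916 (SVT)": "00:53:32",
}

secs = {name: to_seconds(d) for name, d in durations.items()}
for name, s in secs.items():
    print(f"{name:28s} {s} s  {FRAMES / s:.3f} fps")

slow = secs["vs2017-AVX2 (msystem)"]
fast = secs["ICC 1900 / MSVC 1916 (SVT)"]
print(f"difference: {slow - fast} s ({(slow - fast) / slow * 100:.2f}%)")
```

The gap works out to 9 seconds, roughly 0.3%, which is well short of the 10% mentioned in the EDIT.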

Last edited by Forteen88; 25th February 2019 at 10:17. Reason: clarification
Old 22nd February 2019, 11:13   #6750  |  Link
Selur
Registered User
 
Join Date: Oct 2001
Location: Germany
Posts: 5,822
Quote:
I thought that the ICC 1900-compile would be much faster
To be frank, I would have been surprised if using a different compiler had much of an impact...
Old 22nd February 2019, 14:29   #6751  |  Link
poller
Registered User
 
Join Date: Sep 2018
Posts: 10
Quote:
Originally Posted by LigH View Post
OK, I forgot a few little details I edited a long time ago, while testing some compilation issues with a faulty compiler version. One leftover line is:

export CXXFLAGS="-march=pentium4 -mtune=generic"

for the 32-bit compilation (which is still quite generic, just a sensible minimum). That might bring a little advantage. For the 64-bit compilation, CXXFLAGS is empty.

Furthermore, for the 32-bit compilation, assembly is disabled for 10 and 12 bit precision cores, but enabled for the 8 bit core.
Well, here not even -march=corei7 helped much.
Assembly needs to be disabled for x86 high bit depth; it does not compile when enabled.


Quote:
Not a big difference in speed, considering I have an Intel Core i5-5200U CPU (I thought that the ICC 1900-compile would be much faster).
Same here. Actually, all the x64 builds (from the net) I tested are pretty much on the same level, my own builds and the ICC compile included.

But I do see differences in the x86 builds. Honestly though, not many people will use those anyway.

Last edited by poller; 22nd February 2019 at 14:32.
Old 22nd February 2019, 16:51   #6752  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,829
One more build to compare, with two variants:

x265 3.0_Au+7-cb3e172a5f51 MABS compiled with media-autobuild_suite only (EXE only, no DLL)

x265 3.0_Au+7-cb3e172a5f51 compiled with custom build scripts to obtain libx265.dll too, running in interactive MinGW32 / MinGW64 shells
Old 22nd February 2019, 21:05   #6753  |  Link
poller
Registered User
 
Join Date: Sep 2018
Posts: 10
Nice, a small test:

x265_3.0_RC+14-46b84ff665fd
20.5 seconds
Code:
cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao 
/ ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / qp-adaptation-range=1.00
x265_3.0_Au+7-cb3e172a5f51
20.5 seconds
Code:
cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao 
/ ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / qp-adaptation-range=1.00
x265_3.0_Au+7-cb3e172a5f51_MABS
23.3 seconds
Code:
cpuid=1049583 / frame-threads=3 / numa-pools=8 / wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao 
/ ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / no-svt / qp-adaptation-range=1.00
my own build
22.6 seconds
Code:
cpuid=1049583 / frame-threads=3 /                wpp / no-pmode / no-pme / no-psnr / no-ssim / log-level=2 / input-csp=1 / input-res=352x288 / interlace=0 / total-frames=2101 / level-idc=0 / high-tier=1 / uhd-bd=0 / ref=3 / no-allow-non-conformance / no-repeat-headers / annexb / no-aud / no-hrd / info / hash=0 / no-temporal-layers / open-gop / min-keyint=25 / keyint=250 / gop-lookahead=0 / bframes=4 / b-adapt=0 / b-pyramid / bframe-bias=0 / rc-lookahead=15 / lookahead-slices=0 / scenecut=40 / radl=0 / no-splice / no-intra-refresh / ctu=64 / min-cu-size=8 / no-rect / no-amp / max-tu-size=32 / tu-inter-depth=1 / tu-intra-depth=1 / limit-tu=0 / rdoq-level=0 / dynamic-rd=0.00 / no-ssim-rd / signhide / no-tskip / nr-intra=0 / nr-inter=0 / no-constrained-intra / strong-intra-smoothing / max-merge=2 / limit-refs=3 / no-limit-modes / me=1 / subme=2 / merange=57 / temporal-mvp / weightp / no-weightb / no-analyze-src-pics / deblock=0:0 / sao / no-sao-non-deblock / rd=2 / no-early-skip / rskip / fast-intra / no-tskip-fast / no-cu-lossless / no-b-intra / no-splitrd-skip / rdpenalty=0 / psy-rd=2.00 / psy-rdoq=0.00 / no-rd-refine / no-lossless / cbqpoffs=0 / crqpoffs=0 / rc=crf / crf=21.0 / qcomp=0.60 / qpstep=4 / stats-write=0 / stats-read=0 / ipratio=1.40 / pbratio=1.30 / aq-mode=2 / aq-strength=1.00 / cutree / zone-count=0 / no-strict-cbr / qg-size=32 / no-rc-grain / qpmax=69 / qpmin=0 / no-const-vbv / sar=255 / sar-width / : / sar-height=128:117 / overscan=0 / videoformat=5 / range=0 / colorprim=2 / transfer=2 / colormatrix=2 / chromaloc=0 / display-window=0 / max-cll=0,0 / min-luma=0 / max-luma=255 / log2-max-poc-lsb=8 / vui-timing-info / vui-hrd-info / slices=1 / no-opt-qp-pps / no-opt-ref-list-length-pps / no-multi-pass-opt-rps / scenecut-bias=0.05 / no-opt-cu-delta-qp / no-aq-motion / no-hdr / no-hdr-opt / no-dhdr10-opt / no-idr-recovery-sei / analysis-reuse-level=5 / scale-factor=0 / refine-intra=0 / refine-inter=0 / refine-mv=0 / refine-ctu-distortion=0 / no-limit-sao 
/ ctu-info=0 / no-lowpass-dct / refine-analysis-type=0 / copy-pic=1 / max-ausize-factor=1.0 / no-dynamic-refine / no-single-sei / no-hevc-aq / qp-adaptation-range=1.00

The MABS build has one additional setting (numa-pools=8), but that did not affect the performance.

This was tested on an i7-3770K.
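
Spotting the differing settings by eye in dumps this long is tedious, so here is a small sketch of how two such settings strings could be compared. The function names `parse_info` and `diff_info` are made up for this example, and the two strings are shortened excerpts of the dumps above, not the full text.

```python
def parse_info(info):
    """Split an x265 settings string ('a=1 / no-b / c') into a dict."""
    settings = {}
    for token in info.split("/"):
        token = token.strip()
        if not token:
            continue
        key, sep, value = token.partition("=")
        # Flags without '=' (e.g. 'wpp', 'no-svt') are stored as True.
        settings[key] = value if sep else True
    return settings


def diff_info(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ."""
    pa, pb = parse_info(a), parse_info(b)
    return {k: (pa.get(k), pb.get(k))
            for k in sorted(pa.keys() | pb.keys())
            if pa.get(k) != pb.get(k)}


# Shortened excerpts of the dumps above (assumption: the rest is identical).
rc14 = ("cpuid=1049583 / frame-threads=3 /                wpp / "
        "no-hevc-aq / qp-adaptation-range=1.00")
mabs = ("cpuid=1049583 / frame-threads=3 / numa-pools=8 / wpp / "
        "no-hevc-aq / no-svt / qp-adaptation-range=1.00")

for key, (old, new) in diff_info(rc14, mabs).items():
    print(f"{key}: {old} -> {new}")
```

On these excerpts it reports only numa-pools and no-svt, matching what can be seen in the dumps by eye.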
Old 22nd February 2019, 22:51   #6754  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,829
What you may not find here are default GNU C/C++ compiler options.

Please note that the MABS scripts may set up some specific CFLAGS and CXXFLAGS (e.g. -O2 or -O3?). The interactive MinGW consoles should not, so GCC / G++ defaults may apply, except for the 32-bit build, where I explicitly set CXXFLAGS with pretty generic options suitable for 32-bit code on any AMD64-capable CPU as a minimum (see above).

I have no clue what I may be doing "right".
Old 22nd February 2019, 23:32   #6755  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,837
Quote:
Originally Posted by Selur View Post
To be frank, I would have been surprised if using a different compiler had much of an impact...
It seems we've seen compilers make about a 10% difference from slowest to fastest. Which is kinda surprising to me given all the hand-tuned assembly that doesn't get compiled.
Old 22nd February 2019, 23:38   #6756  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,829
With this amount, the only reason I could imagine is memory alignment...
Old 23rd February 2019, 00:17   #6757  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Germany
Posts: 556
Since everyone was concerned about x64 platforms and nobody used x86, I tested it on a real x86 platform running Windows Server 2003 x86 with PAE and 16 GB of RAM.
The CPU is an old, dusty Intel Xeon (4 cores / 8 threads) running at 2.60 GHz with instruction sets up to SSE4.2:

4/N.A) - x265 3.0_Au+7 - MABS compiled by LigH with media-autobuild_suite only (EXE only, no DLL)

It didn't even start. It refused to start due to missing kernel calls: GetNumaNodeProcessorMaskEx, InitializeConditionVariable, SetThreadGroupAffinity, SleepConditionVariableCS, WakeAllConditionVariable
No luck on Windows Server 2003, so it won't run on XP and its derivatives either.

3) - x265 3.0_Au+7 - compiled by LigH with custom build scripts to obtain libx265.dll too, running in interactive MinGW32 / MinGW64 shells

3.7fps/3.9fps

2) - x265 3.0_Au+7 - compiled with GCC9 (Preview) target SSE4.2

4.2fps/4.3fps

1) - x265 3.0_Au+7 - compiled with GCC8 target SSE4.2

4.7fps/4.8fps

Very basic, low-complexity command line:
x265.exe --y4m - --dither --preset medium --level 5.0 --tune fastdecode --no-high-tier --ref 2 --rc-lookahead 3 -b 2 --profile main10 --bitrate 25000 --deblock -4:-4 --min-luma 64 --max-luma 940 --chromaloc 2 --range limited --videoformat component --colorprim bt709 --transfer bt709 --colormatrix bt709 --overscan show --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3 --vbv-maxrate 25000 --vbv-bufsize 25000 --asm=sse4.2 --wpp -o "\\VBOXSVR\Share_Windows_Linux\raw_video.hevc"

Lossless 16bit SD (UHD SDR downscaled) footage.


Anyway, I don't think the comparison is fair, because LigH targeted pentium4, which means only instruction sets up to SSE2 are targeted.
In other words, I'm comparing SSE4.2 vs SSE2, and it's pretty clear that SSE4.2 has an advantage over SSE2.
As for GCC9, it seems that they changed something in the way -mtune behaves, or maybe they changed something else; either way, it produces an SSE4.2 build that is slower than the GCC8 SSE4.2 one.
It would be interesting to find out how ICC targeting SSE4.2 behaves on old Intel x86 systems (if Intel Parallel Studio can produce a Windows Server 2003 compatible binary).
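
To quantify the gap between the three builds that did run, here is a rough sketch. The fps figures are copied from the results above; averaging the two numbers given for each build is an assumption made only for this illustration, and the labels are shortened.

```python
# fps figures copied from the results above. Each build lists two numbers;
# averaging them is an assumption made only for this sketch.
results = {
    "LigH custom scripts (pentium4 target)": (3.7, 3.9),
    "GCC9 preview, SSE4.2": (4.2, 4.3),
    "GCC8, SSE4.2": (4.7, 4.8),
}


def mean_fps(pair):
    """Average of the fps values reported for one build."""
    return sum(pair) / len(pair)


best = max(mean_fps(p) for p in results.values())
relative = {name: mean_fps(p) / best * 100.0 for name, p in results.items()}

for name, pct in relative.items():
    print(f"{name:40s} {mean_fps(results[name]):.2f} fps  "
          f"{pct:5.1f}% of fastest")
```

Under that assumption the GCC9 build reaches roughly 90% of the GCC8 speed and the pentium4-targeted build about 80%.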
__________________
Broadcast Encoder
LinkedIn

Last edited by FranceBB; 27th February 2019 at 04:36.
Old 24th February 2019, 10:09   #6758  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,531
Has anyone else noticed how the bitrate shown during the encoding phase and the final bitrate sometimes differ from each other quite a lot? Yesterday I happened to be watching a 70000-frame encode finish, and during the last frames the average bitrate was ~6800 kbps. When the encode finished, the final bitrate was suddenly over 7100 kbps.
Old 24th February 2019, 11:42   #6759  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 101
@FranceBB I made an x86 test binary targeting SSE4.2 (8 bit only).

You can test whether it works. I also compiled ffmpeg with libvmaf, and you can test that as well.
__________________
Monochrome Anomaly

Last edited by Wolfberry; 27th February 2019 at 08:48.
Old 25th February 2019, 03:28   #6760  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Germany
Posts: 556
Quote:
Originally Posted by Wolfberry View Post
@FranceBB I made an x86 test binary targeting SSE4.2 (8 bit only).

You can test whether it works. I also compiled ffmpeg with libvmaf, and you can test that as well.
It's weird.
Dependency Walker didn't find any issue with the installer, but when I try to run it, it says that it's not a valid x86 application.
Are you sure that you targeted SSE4.2? Out of curiosity, can you try one with SSE4.1 and one without any assembly optimisation?
The CPU is fully capable of handling SSE4.2, so I don't understand.
The OS is Windows Server 2003 x86 with PAE enabled and 16 GB of RAM.