x264 development - Page 105

sneaker_ger · 11th June 2017, 19:36

Hmm, yes. So "only" "old" decoders gonna break. Personally, I don't think the compression gain is worth it (and it also comes at a speed cost).

Since the changelog mentions some mixed lossy/lossless mode: is that something new/yet to come?

MasterNobody · 11th June 2017, 19:49

First of all, it is not decided yet when "Remove compatibility workarounds" will be pushed (but it will probably be after avcodec is able to decode them). And yes, avcodec will need to check for old x264 versions to decode old streams (there could be problems if someone removed the x264 SEI). Same for 4:4:4 decoding.

P.S. Imho default (i.e. without SEI) decoding in avcodec should be according to specs. But that is debatable.

sneaker_ger · 11th June 2017, 19:53

For default settings, what is the difference between --partitions p8x8,b8x8,i4x4 and --no-8x8dct? What should we enable to match current lossless (and 4:4:4 lossy?) behavior?

MasterNobody · 11th June 2017, 20:03

Quote:

Originally Posted by sneaker_ger

For default settings, what is the difference between --partitions p8x8,b8x8,i4x4 and --no-8x8dct? What should we enable to match current lossless (and 4:4:4 lossy?) behavior?

1) --partitions only influences inter-frame analysis.
2) Inter-macroblocks can also use the 8x8dct transform.
To be compatible with current decoders you will need --no-8x8dct (but it wouldn't exactly match current behavior).

sneaker_ger · 11th June 2017, 20:05

Quote:

Originally Posted by MasterNobody

but it wouldn't exactly match current behavior

Why not?

MasterNobody · 11th June 2017, 20:17

Quote:

Originally Posted by sneaker_ger

Why not?

1) Because currently 8x8dct is allowed in inter-macroblocks of lossless encoding and is only disabled for intra-blocks (i8x8 is disabled in intra/inter frames).
2) 4:4:4 encoding is currently out of spec with cabac+8x8dct, and you wouldn't be able to return to the out-of-spec behavior.

sneaker_ger · 11th June 2017, 20:21

I see. Thx.

Quote:

Originally Posted by MasterNobody

And yes, avcodec will need to check for old x264 versions to decode old streams (there could be problems if someone removed the x264 SEI).

On the other hand, 2014 ~ 2017 ffmpeg will break if you don't remove the SEI. Fun times.

LoRd_MuldeR · 11th June 2017, 20:29

IMO it is much better to break old libavcodec/ffmpeg once (and disable the compatibility workarounds in new libavcodec/ffmpeg), instead of continuing to produce out-of-spec streams for ever and ever.

What x264 currently produces in "lossless" mode probably has never been working with any H.264 decoders, except for libavcodec/ffmpeg...

MasterNobody · 11th June 2017, 20:52

Quote:

Originally Posted by LoRd_MuldeR

What x264 currently produces in "lossless" mode probably has never been working with any H.264 decoders, except for libavcodec/ffmpeg...

Wrong. Lossless currently produces correct streams; the out-of-spec feature was disabled 3 years ago (commit).
4:4:4+cabac+8x8dct is out of spec now, so I would recommend that anyone encoding 4:4:4 with cabac disable 8x8dct.

LoRd_MuldeR · 11th June 2017, 21:03

Quote:

Originally Posted by MasterNobody

Wrong. Lossless currently produces correct streams; the out-of-spec feature was disabled 3 years ago (commit).
4:4:4+cabac+8x8dct is out of spec now, so I would recommend that anyone encoding 4:4:4 with cabac disable 8x8dct.

Thanks for clarification.

Anyway, I think it's safe to assume that disabling the "out of spec" features in lossless mode costs some compression efficiency. So it's preferable to finally have it fixed and re-enabled.

sneaker_ger · 11th June 2017, 21:41

Efficiency loss in lossless mode is 1%, if that.

asarian · 19th December 2018, 13:14

(Maybe I should post this here instead)

Hmm, just tried the latest x264, with 10-bit encoding:

x264 [warning]: OpenCL: not compiled with OpenCL support, disabling

That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work using 10bit encodings?

LoRd_MuldeR · 19th December 2018, 13:23

Quote:

Originally Posted by asarian

That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work using 10bit encodings?

Let's have a look at x264 code (current master):

Code:

int validate_parameters( x264_t *h, int b_open )
{
        [...]

#if !HAVE_OPENCL
        x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
        h->param.b_opencl = 0;
#elif BIT_DEPTH > 8
        x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
        h->param.b_opencl = 0;
#else
        if( h->param.i_width < 32 || h->param.i_height < 32 )
        {
            x264_log( h, X264_LOG_WARNING, "OpenCL: frame size is too small, disabling opencl\n" );
            h->param.b_opencl = 0;
        }
#endif

        [...]
}

So, no OpenCL support for bit-depths greater than 8-Bit, or for very small frames.

I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, so OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double precision) math is 24 to 32 times slower than FP32 (single precision) math on Kepler/Maxwell GPUs (details).

asarian · 19th December 2018, 13:30

^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error msg for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL ('#if !HAVE_OPENCL'), which is the one I got, right?

asarian · 19th December 2018, 13:37

Quote:

Originally Posted by LoRd_MuldeR

So, no OpenCL support for bit-depths greater than 8-Bit, or for very small frames.

I'd assume that's either because nobody ever bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, so OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). Or a combination of both reasons.

Sorry, I had missed that part of your post. Good explanation. Makes sense. Thanks.

LoRd_MuldeR · 19th December 2018, 14:02

Quote:

Originally Posted by asarian

^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error msg for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL ('#if !HAVE_OPENCL'), which is the one I got, right?

Pre-processor macros like BIT_DEPTH or HAVE_OPENCL are set at compile-time, not run-time.

Also, since the "8/10 bits unification", the exact same source code files are compiled twice: once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding.

Of course, pre-processor macros will be set differently for "8-Bit" and "10-Bit" compilation, so the generated machine code will actually be different for the "8-Bit" and "10-Bit" paths.
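As a single-file analogy for "same source, compiled once per bit depth", the sketch below instantiates one function body twice through a macro. This is only an illustration under stated assumptions: the names DEFINE_CLIP_PIXEL, clip_pixel_8 and clip_pixel_10 are hypothetical and are not x264 identifiers, and a real build would rather compile the same .c file twice with different -D flags.

```c
#include <stdint.h>

/* One "template" body, instantiated once per bit depth.
 * Hypothetical names, used for illustration only. */
#define DEFINE_CLIP_PIXEL( depth, type )                            \
    static type clip_pixel_##depth( int v )                         \
    {                                                               \
        /* Largest sample value representable at this bit depth. */ \
        const int pixel_max = (1 << (depth)) - 1;                   \
        return (type)( v < 0 ? 0 : v > pixel_max ? pixel_max : v ); \
    }

DEFINE_CLIP_PIXEL( 8, uint8_t )   /* "8-Bit" path: samples fit in uint8_t   */
DEFINE_CLIP_PIXEL( 10, uint16_t ) /* "10-Bit" path: samples need uint16_t   */
```

The source text of the function is identical for both instances, yet the generated code differs in both the clamp limit and the sample type.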

Now, it would seem that HAVE_OPENCL simply was not defined when the "10-Bit" version was compiled – which makes some sense considering that we know beforehand that OpenCL is for 8-Bit only.

(The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation)

[UPDATE]

Indeed, HAVE_OPENCL is not simply defined as "0" or "1". It is actually defined as "(BIT_DEPTH == 8)", when building x264 with OpenCL support enabled; would probably be defined to "0" otherwise.
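A minimal sketch of why a 10-bit build ends up in the first branch. This is not x264's build code: OPENCL_BUILD and opencl_status() are hypothetical names, and the default of BIT_DEPTH=10 merely stands in for what the build system would pass; only BIT_DEPTH, HAVE_OPENCL and the two warning texts come from the excerpt quoted earlier.

```c
/* Hypothetical stand-ins for per-compilation flags (e.g. -DBIT_DEPTH=10). */
#ifndef BIT_DEPTH
#define BIT_DEPTH 10
#endif
#ifndef OPENCL_BUILD
#define OPENCL_BUILD 1
#endif

/* With OpenCL enabled, HAVE_OPENCL is the expression (BIT_DEPTH == 8). */
#if OPENCL_BUILD
#define HAVE_OPENCL (BIT_DEPTH == 8)
#else
#define HAVE_OPENCL 0
#endif

/* Same branch order as the quoted validate_parameters() excerpt. */
static const char *opencl_status( void )
{
#if !HAVE_OPENCL
    /* Taken when BIT_DEPTH != 8, because (BIT_DEPTH == 8) evaluates to 0. */
    return "OpenCL: not compiled with OpenCL support, disabling";
#elif BIT_DEPTH > 8
    /* Unreachable under this definition of HAVE_OPENCL. */
    return "OpenCL lookahead does not support high bit depth, disabling opencl";
#else
    return "OpenCL: available";
#endif
}
```

Under these assumptions the preprocessor evaluates !(10 == 8) as true, so the "not compiled" message is selected and the BIT_DEPTH > 8 branch is never reached.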

So, it may actually be preferable to change the code to:

Code:

#if !HAVE_OPENCL
#if BIT_DEPTH > 8
        x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
#else
        x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
#endif
        h->param.b_opencl = 0;
#else
       [...]

(But that's nitpicking, I suppose)
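To see the effect of the reordered checks in isolation, here is a small compilable sketch. It assumes HAVE_OPENCL is (BIT_DEPTH == 8) in an OpenCL-enabled build, as described above; OPENCL_BUILD and opencl_disable_reason() are hypothetical names, and the default BIT_DEPTH=10 only stands in for a build flag.

```c
#include <stddef.h>

/* Hypothetical stand-ins for per-compilation flags. */
#ifndef BIT_DEPTH
#define BIT_DEPTH 10
#endif
#ifndef OPENCL_BUILD
#define OPENCL_BUILD 1
#endif

#if OPENCL_BUILD
#define HAVE_OPENCL (BIT_DEPTH == 8)
#else
#define HAVE_OPENCL 0
#endif

/* Returns the warning that would be logged, or NULL if OpenCL stays enabled. */
static const char *opencl_disable_reason( void )
{
#if !HAVE_OPENCL
#if BIT_DEPTH > 8
    /* High bit depth is checked first inside the "no OpenCL" branch. */
    return "OpenCL lookahead does not support high bit depth, disabling opencl";
#else
    return "OpenCL: not compiled with OpenCL support, disabling";
#endif
#else
    return NULL;
#endif
}
```

With the nested check, a 10-bit build reports the bit-depth reason, while an 8-bit build without OpenCL would still report the "not compiled" reason.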

asarian · 19th December 2018, 14:12

Quote:

Originally Posted by LoRd_MuldeR

Pre-processor macros like BIT_DEPTH or HAVE_OPENCL are set at compile-time, not run-time.

Doh on me!

Quote:

Also, since the "8/10 bits unification", the exact same source code files are compiled twice: once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding.

Of course, pre-processor macros will be set differently for "8-Bit" and "10-Bit" compilation, so the generated machine code will actually be different for "8-Bit" and "10-Bit".

Now, it would seem that HAVE_OPENCL simply was not defined when the "10-Bit" version was compiled – which makes some sense, because we know beforehand that OpenCL is for 8-Bit only.

(The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation)

As usual, thanks for the deep insight.

hydra3333 · 20th December 2018, 02:00

FranceBB · 20th December 2018, 08:24

Quote:

Originally Posted by LoRd_MuldeR

I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, so OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double precision) math is 24 to 32 times slower than FP32 (single precision) math on Kepler/Maxwell GPUs (details).

Very interesting article indeed.
NVIDIA performs better in FP16 (half precision) and FP32 (single precision), but worse in FP64 (double precision) on consumer-grade GPUs, because there aren't as many FP64-capable units as FP32 ones, while AMD consumer-grade GPUs have better FP64 performance due to more FP64-capable units at the expense of the FP32 and FP16 ones.
However, at the enterprise level, NVIDIA has better performance in both FP32 and FP64 than AMD.
An interesting thing is that NVIDIA GPUs have FP32-capable units that are also FP16-capable, therefore not wasting space on dedicated FP16 units.
The White Paper says on page 12: "One new capability that has been added [...] is the ability to process both 16-bit and 32-bit precision instructions and data, as described later in this paper. FP16 operation throughput is up to twice FP32 operation throughput". And on page 14: "Using FP16 computation improves performance up to 2x compared to FP32 arithmetic, and similarly FP16 data transfers take less time than FP32 or FP64 transfers."

LigH · 20th December 2018, 10:12

Just a side note ... the mentioned precision may be sufficient for video processing; but there are applications which would gain a significant speed boost from GPGPU parallelization if it just had the precision they require (like astronomical multi-body simulations, see Universe Sandbox forums: PhysX had to be rejected, OpenCL is only partially used).
