Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
11th June 2017, 19:36 | #2081 | Link |
Registered User
Join Date: Dec 2002
Posts: 5,565
|
Hmm, yes. So "only" "old" decoders are going to break. Personally, I don't think the compression gain is worth it (and it also comes at a speed cost).
Since the changelog mentions a mixed lossy/lossless mode: is that something new or yet to come?
Last edited by sneaker_ger; 11th June 2017 at 19:51. Reason: I didn't think it through. |
11th June 2017, 19:49 | #2082 | Link |
Registered User
Join Date: Jul 2007
Posts: 552
|
First of all, it has not been decided yet when "Remove compatibility workarounds" will be pushed (but it would probably be after avcodec is able to decode them). And yes, avcodec will need to check for an old x264 version to decode old streams (there could be problems if someone removed the x264 SEI). The same goes for 4:4:4 decoding.
P.S. IMHO, default decoding in avcodec (i.e. without the SEI) should follow the spec. But that is debatable.
Last edited by MasterNobody; 11th June 2017 at 19:51. |
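To make the version check above more concrete, here is a minimal sketch of how a decoder could read the x264 build number from the version SEI and decide whether a legacy workaround applies. It assumes the SEI text starts with "x264 - core <build>" (which is what x264 writes) and that the caller has already skipped the 16-byte UUID of the user-data SEI. The function names and the `first_fixed_build` threshold are hypothetical; this is not libavcodec's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the x264 "core" build number found in the text part of the
 * version SEI, or -1 if the SEI is missing or not recognised. */
static int parse_x264_build(const char *sei_text)
{
    int build = 0;

    if (sei_text == NULL)
        return -1; /* SEI was stripped from the stream */
    if (sscanf(sei_text, "x264 - core %d", &build) == 1 && build > 0)
        return build;
    return -1;
}

/* first_fixed_build is a hypothetical parameter: the first x264 build
 * that writes spec-compliant streams. */
static int needs_legacy_workaround(const char *sei_text, int first_fixed_build)
{
    int build;

    assert(first_fixed_build > 0);
    build = parse_x264_build(sei_text);
    if (build < 0)
        return 0; /* no SEI: decode according to the spec */
    return build < first_fixed_build;
}
```

With this policy, a stream whose SEI was removed is always decoded according to the spec, which is exactly the case where old out-of-spec streams would break.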
11th June 2017, 20:03 | #2084 | Link | |
Registered User
Join Date: Jul 2007
Posts: 552
|
Quote:
2) Inter-macroblocks can also use the 8x8dct transform. To be compatible with current decoders you would need --no-8x8dct (but the result wouldn't exactly match current behavior). |
|
11th June 2017, 20:17 | #2086 | Link |
Registered User
Join Date: Jul 2007
Posts: 552
|
1) Because 8x8dct is currently allowed in inter-macroblocks in lossless encoding and is only disabled for intra-blocks (i8x8 is disabled in both intra and inter frames).
2) 4:4:4 encoding is currently out of spec with cabac+8x8dct, and you wouldn't be able to return to the out-of-spec behavior. |
11th June 2017, 20:29 | #2088 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
IMO it is much better to break old libavcodec/ffmpeg once (and disable the compatibility workarounds in new libavcodec/ffmpeg), instead of continuing to produce out-of-spec streams for ever and ever.
What x264 currently produces in "lossless" mode has probably never worked with any H.264 decoder except libavcodec/ffmpeg...
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 11th June 2017 at 20:32. |
11th June 2017, 20:52 | #2089 | Link | |
Registered User
Join Date: Jul 2007
Posts: 552
|
Quote:
4:4:4+cabac+8x8dct is out of spec now, so I would recommend that anyone encoding 4:4:4 with cabac disable 8x8dct. |
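The recommendation above can be written down as a small settings check. This is only a sketch: the `enc_opts` struct and its field names are hypothetical and are not x264's API. It simply encodes the rule "4:4:4 together with cabac and 8x8dct is the problematic combination in the x264 builds discussed here, so turn 8x8dct off".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option set; field names are illustrative only. */
typedef struct {
    bool chroma_444; /* encoding in 4:4:4 */
    bool cabac;      /* CABAC entropy coding enabled */
    bool dct8x8;     /* 8x8dct transform enabled */
} enc_opts;

/* True for the combination that is reported as out of spec in this thread. */
static bool is_out_of_spec_combo(const enc_opts *o)
{
    assert(o != NULL);
    return o->chroma_444 && o->cabac && o->dct8x8;
}

/* Applies the recommendation: keep 4:4:4 and cabac, drop 8x8dct
 * (the command-line equivalent would be --no-8x8dct). */
static enc_opts apply_recommendation(enc_opts o)
{
    if (is_out_of_spec_combo(&o))
        o.dct8x8 = false;
    return o;
}
```

Settings that are not 4:4:4, or that do not use cabac, pass through unchanged.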
|
11th June 2017, 21:03 | #2090 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
Anyway, I think it's safe to assume that disabling the "out of spec" features in lossless mode costs some compression efficiency. So it's preferable to finally have it fixed and re-enabled.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ |
|
19th December 2018, 13:14 | #2092 | Link |
Registered User
Join Date: May 2005
Posts: 1,462
|
(Maybe I should post this here instead)
Hmm, just tried the latest x264 with 10-bit encoding:
x264 [warning]: OpenCL: not compiled with OpenCL support, disabling
That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work with 10-bit encodings?
__________________
Gorgeous, delicious, deculture! |
19th December 2018, 13:23 | #2093 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
Code:
int validate_parameters( x264_t *h, int b_open )
{
    [...]
#if !HAVE_OPENCL
    x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
    h->param.b_opencl = 0;
#elif BIT_DEPTH > 8
    x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
    h->param.b_opencl = 0;
#else
    if( h->param.i_width < 32 || h->param.i_height < 32 )
    {
        x264_log( h, X264_LOG_WARNING, "OpenCL: frame size is too small, disabling opencl\n" );
        h->param.b_opencl = 0;
    }
#endif
    [...]
}

I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, so OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double precision) math is 24 to 32 times slower than FP32 (single precision) math on Kepler/Maxwell GPUs.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 19th December 2018 at 13:39. |
|
19th December 2018, 13:30 | #2094 | Link |
Registered User
Join Date: May 2005
Posts: 1,462
|
^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error msg for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL ('#if !HAVE_OPENCL'), which is the one I got, right?
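One way to see why the first message wins is to mirror the preprocessor chain as an ordinary first-match function. This is only a runtime sketch: the enum and function names are made up for illustration, and the real x264 code makes this decision at compile time, not at run time.

```c
#include <assert.h>

/* Which branch of the #if / #elif / #else chain would be taken. */
enum opencl_check {
    OPENCL_CHECK_OK,
    OPENCL_CHECK_NOT_COMPILED,   /* "#if !HAVE_OPENCL" branch */
    OPENCL_CHECK_HIGH_BIT_DEPTH, /* "#elif BIT_DEPTH > 8" branch */
    OPENCL_CHECK_FRAME_TOO_SMALL /* "#else" branch with the size test */
};

static enum opencl_check check_opencl(int have_opencl, int bit_depth,
                                      int width, int height)
{
    assert(bit_depth >= 8);
    if (!have_opencl)            /* tested first, so it hides the next check */
        return OPENCL_CHECK_NOT_COMPILED;
    else if (bit_depth > 8)
        return OPENCL_CHECK_HIGH_BIT_DEPTH;
    else if (width < 32 || height < 32)
        return OPENCL_CHECK_FRAME_TOO_SMALL;
    return OPENCL_CHECK_OK;
}
```

Calling `check_opencl(0, 10, 1920, 1080)` yields `OPENCL_CHECK_NOT_COMPILED`: as long as HAVE_OPENCL evaluates to 0 for the 10-bit build, the bit-depth branch is never reached.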
__________________
Gorgeous, delicious, deculture! |
19th December 2018, 13:37 | #2095 | Link | |
Registered User
Join Date: May 2005
Posts: 1,462
|
Quote:
__________________
Gorgeous, delicious, deculture! |
|
19th December 2018, 14:02 | #2096 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
Also, since the "8/10 bits unification", the exact same source files are compiled twice: once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding. Of course, the pre-processor macros are set differently for the "8-Bit" and "10-Bit" compilation, so the generated machine code will actually differ between the "8-Bit" and "10-Bit" paths.
Now, it would seem that HAVE_OPENCL simply was not defined at the time the "10-Bit" version was compiled, which makes some sense considering that we know beforehand that OpenCL is for 8-Bit only. (The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation.)
[UPDATE] Indeed, HAVE_OPENCL is not simply defined as "0" or "1". It is actually defined as "(BIT_DEPTH == 8)" when building x264 with OpenCL support enabled; it would probably be defined as "0" otherwise. So, it may actually be preferable to change the code to:
Code:
#if !HAVE_OPENCL
#if BIT_DEPTH > 8
    x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
#else
    x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
#endif
    h->param.b_opencl = 0;
#else
[...]
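For anyone who wants to try this out in isolation, here is a small self-contained sketch of the same idea. It assumes, as described above, that HAVE_OPENCL expands to "(BIT_DEPTH == 8)" when OpenCL support is built in. `OPENCL_ENABLED` is a hypothetical stand-in for the configure result, and the enum and function names are made up for illustration; none of this is x264 code.

```c
#include <assert.h>

#ifndef OPENCL_ENABLED
#define OPENCL_ENABLED 1  /* assumption: OpenCL support was enabled at configure time */
#endif
#ifndef BIT_DEPTH
#define BIT_DEPTH 10      /* assumption: this translation unit is the "10-Bit" build */
#endif

#if OPENCL_ENABLED
#define HAVE_OPENCL (BIT_DEPTH == 8)
#else
#define HAVE_OPENCL 0
#endif

enum disable_reason {
    REASON_NONE,          /* OpenCL stays enabled */
    REASON_NOT_COMPILED,  /* built without OpenCL support */
    REASON_HIGH_BIT_DEPTH /* built with OpenCL, but this is the high bit-depth path */
};

/* Same nesting as the proposed change: the bit-depth test sits inside
 * the "!HAVE_OPENCL" branch, so a 10-bit build reports the real reason. */
static enum disable_reason opencl_disable_reason(void)
{
    assert(BIT_DEPTH == 8 || BIT_DEPTH == 10); /* this sketch only models 8 and 10 bit */
#if !HAVE_OPENCL
#if BIT_DEPTH > 8
    return REASON_HIGH_BIT_DEPTH;
#else
    return REASON_NOT_COMPILED;
#endif
#else
    return REASON_NONE;
#endif
}
```

Compiled as-is, the function returns `REASON_HIGH_BIT_DEPTH`. Building with `-DBIT_DEPTH=8` gives `REASON_NONE`, and `-DBIT_DEPTH=8 -DOPENCL_ENABLED=0` gives `REASON_NOT_COMPILED`.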
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 19th December 2018 at 15:08. |
|
19th December 2018, 14:12 | #2097 | Link | ||
Registered User
Join Date: May 2005
Posts: 1,462
|
Quote:
Quote:
__________________
Gorgeous, delicious, deculture! |
||
20th December 2018, 08:24 | #2099 | Link | |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
|
Quote:
NVIDIA performs better in half-precision (FP16) and single-precision (FP32) floating point, but worse in double precision (FP64) on consumer-grade GPUs, because there aren't as many FP64-capable units as FP32 ones, while AMD consumer-grade GPUs have better FP64 performance due to more FP64-capable units, at the expense of the FP32 and FP16 ones. However, at the enterprise level, NVIDIA has better performance than AMD in both single precision (FP32) and double precision (FP64). An interesting thing is that NVIDIA GPUs have FP32-capable units that are also FP16-capable, therefore not wasting space on dedicated FP16 units. The white paper says on page 12: "One new capability that has been added [...] is the ability to process both 16-bit and 32-bit precision instructions and data, as described later in this paper. FP16 operation throughput is up to twice FP32 operation throughput". Page 14: "Using FP16 computation improves performance up to 2x compared to FP32 arithmetic, and similarly FP16 data transfers take less time than FP32 or FP64 transfers." |
|
20th December 2018, 10:12 | #2100 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,783
|
Just a side note ... the mentioned precision may be sufficient for video processing; but there are applications which would gain a significant speed boost from GPGPU parallelization if it only offered the precision they require (like astronomical multi-body simulations; see the Universe Sandbox forums: PhysX had to be rejected, OpenCL is only partially used).
|