30th April 2018, 18:22   #38
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,823
Quote:
Originally Posted by mkver View Post
I somehow forgot about this thread, but now that I see hello_hello's results I see a system behind it: if ffmpeg downmixes to 32-bit float, it uses one set of coefficients; if it downmixes to integer PCM, it uses the other set. The most direct way to test this is to skip the aac and libmp3lame encoders and use -c:a pcm_s16le, pcm_s24le or pcm_f32le instead. Doing so confirms the preceding statement. The aac encoder probably accepts only float input (or prefers it), so the auto-inserted resampler converts everything to float, whereas libmp3lame seems to accept everything, so downmixing in that case only changes the number of channels, not the sample format.
But does anybody have a definitive answer why ffmpeg uses different coefficients for float and non-float? In both cases the ratios coincide: 1/0.707107 and 0.414214/0.292893 are both very good approximations for 2^(0.5) which is probably the value that they are supposed to be. Probably it is because one doesn't need to care about clipping that much if one uses float output.
Personally, I think the way ffmpeg changes the coefficients is somewhat strange. I don't think it's unreasonable to expect the same command line would always produce the same result (aside from the output format).
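
For anyone who wants to check the arithmetic in the quoted post, here is a small Python sketch. The four coefficient values are the ones quoted above; the names set_a and set_b are just labels I made up, and the last part is only my guess at how the two sets relate (the second set looks like the first divided by 1 + 2 x 0.707107, i.e. scaled so that one channel at the first coefficient plus two channels at the second sum to 1.0). That interpretation is an assumption, not something taken from the ffmpeg source.

```python
import math

# Coefficient pairs as quoted in the post above.
# The labels set_a / set_b are illustrative only.
set_a = (1.0, 0.707107)
set_b = (0.414214, 0.292893)

# Ratio between the two coefficients inside each set.
ratio_a = set_a[0] / set_a[1]
ratio_b = set_b[0] / set_b[1]
sqrt2 = math.sqrt(2)

print(f"ratio A = {ratio_a:.6f}")
print(f"ratio B = {ratio_b:.6f}")
print(f"sqrt(2) = {sqrt2:.6f}")

# ASSUMPTION: set_b is set_a divided by the sum of one channel at the
# first coefficient and two channels at the second, so that the worst-case
# sum is 1.0. This is a guess that happens to fit the numbers.
norm = set_a[0] + 2 * set_a[1]
scaled = tuple(c / norm for c in set_a)

print(f"set A / {norm:.6f} = {scaled[0]:.6f}, {scaled[1]:.6f}")
```

If the guess is right, the scaled values should match the second set to about six decimal places, which would fit the idea that one set is scaled down to avoid clipping and the other is not.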

The coefficients might relate to the ability of the output format/codec to encode peaks above 0dB. I can't speak for ffmpeg's AAC encoder, but QAAC doesn't clip peaks when the input is float. I vaguely remember someone in another forum testing it and QAAC coped with peaks well over 0dB.

I'm not sure about the LAME encoder. The stand-alone version accepts 32 bit float but converts it to 24 bit integer before encoding, so peaks greater than 0dB would be clipped.
ffmpeg's built-in LAME encoder seems to accept 32 bit float though. I've compared it to the stand-alone LAME, and ffmpeg's LAME encodes peaks above 0dB, at least according to a true peak scan of the encoded audio. Whether those peaks are encoded accurately might be a different story.
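
To illustrate why a float to integer conversion would lose peaks above 0dB, here is a rough Python sketch. It is not LAME's or ffmpeg's actual code; float_to_int24 and int24_to_float are hypothetical helpers that assume the usual convention of float full scale being 1.0 and out-of-range values being clamped to the integer limits.

```python
import math

INT24_MAX = 2**23 - 1   # largest 24 bit signed value
INT24_MIN = -2**23      # smallest 24 bit signed value


def float_to_int24(sample):
    """Scale a float sample (full scale = 1.0) to 24 bit, clamping overshoots."""
    scaled = round(sample * 2**23)
    return max(INT24_MIN, min(INT24_MAX, scaled))


def int24_to_float(value):
    """Map a 24 bit integer sample back to float (full scale = 1.0)."""
    return value / 2**23


def to_db(sample):
    """Level of a non-zero sample relative to full scale, in dB."""
    return 20 * math.log10(abs(sample))


peak = 1.5  # a float peak roughly 3.5 dB over full scale
restored = int24_to_float(float_to_int24(peak))

print(f"float peak:    {peak} ({to_db(peak):+.2f} dB)")
print(f"after 24 bit:  {restored:.7f} ({to_db(restored):+.2f} dB)")
```

A sample inside the normal range, such as 0.5, survives the round trip unchanged, while anything over 1.0 ends up just under full scale. A float path has no such ceiling, which would explain an encoder fed float being able to keep peaks above 0dB.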

Last edited by hello_hello; 30th April 2018 at 19:07.