9th February 2021, 21:14 | #1 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
Increase max_volume with AmplifyDB increases not enough
I am trying to amplify a sound track and bring its max to around 0 dB.
For this, I analyzed the track with FFmpeg, which gave this:
Code:
[Parsed_volumedetect_0 @ 000001dd62a2d700] mean_volume: -15.2 dB
[Parsed_volumedetect_0 @ 000001dd62a2d700] max_volume: -0.7 dB
I then added this to the script:
Code:
AmplifyDb(0.5)
Analyzing the result gave this:
Code:
[Parsed_volumedetect_0 @ 000001bd83c8d700] mean_volume: -14.7 dB
[Parsed_volumedetect_0 @ 000001bd83c8d700] max_volume: -0.5 dB
How can I then get it to around max_volume = 0 without randomly trying values?
Thanks for your help.
Last edited by Roemer; 9th February 2021 at 21:15. Reason: Typos |
10th February 2021, 00:18 | #2 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,020
|
AmplifyDB will just blindly amplify everything, so although it can bring the peak to 0, it raises everything by the same amount and can cause severe clipping.
Let me explain. If you have some scenes in which the audio is at -20dB and some others in which it is at -10dB, and you make your calculations to bring the -10dB part to 0dB, AmplifyDb will not bring the -20dB part to 0 (it will increase it, but it won't reach zero) and only the -10dB part will reach 0dB. On the other hand, if you make your calculations for the -20dB part, then both will reach 0, but the -10dB one will be clipped: it will try to map some values "above" 0, which can't be done, so they'll just be cut off and you'll end up with pretty bad distortion.
What you're looking for is a loudness correction which increases the short-term loudness only when needed and by the right amount. I'm going to suggest FFmpeg this time instead of Avisynth, and specifically the loudnorm filter:
Code:
ffmpeg.exe -i "source.mxf" -c:a pcm_s24le -af loudnorm=I=-0:LRA=1 -ar 48000 -y "out.wav"
Code:
ffmpeg.exe -i "source.mxf" -c:a pcm_s24le -af loudnorm=I=-24:LRA=6 -ar 48000 -y "out.wav"
Last edited by FranceBB; 10th February 2021 at 00:21. |
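A rough sketch of the -20dB / -10dB example above, in Python rather than Avisynth. The function name `amplify_db` and the idea of tracking a single peak level per scene are assumptions made for illustration; it only models a fixed gain with a hard ceiling at 0 dBFS, nothing more.

```python
def amplify_db(level_db, gain_db):
    """Apply a fixed gain to a peak level given in dBFS.

    Returns (output_level_db, clipped_by_db). Anything that would end up
    above 0 dBFS is cut off, which is the clipping described in the post.
    """
    raised = level_db + gain_db
    clipped_by = max(0.0, raised)      # how far the peak overshoots 0 dBFS
    return min(raised, 0.0), clipped_by


scenes = {"quiet scene": -20.0, "loud scene": -10.0}

for gain in (10.0, 20.0):              # gain calculated for the loud / quiet part
    for name, level in scenes.items():
        out, over = amplify_db(level, gain)
        note = f"clipped by {over:.1f} dB" if over > 0 else "no clipping"
        print(f"gain +{gain:.0f} dB: {name} {level:.0f} dB -> {out:.0f} dB ({note})")
```

With the gain calculated for the loud scene, the quiet scene stays below 0; with the gain calculated for the quiet scene, the loud scene overshoots and gets cut off.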
10th February 2021, 00:21 | #3 | Link |
Registered User
Join Date: Mar 2011
Posts: 4,885
|
Normalise instead.
http://avisynth.nl/index.php/Normalize
It requires two passes, the first one to find the peak level, so a script will take longer to open. If I normalize, I only add it to the script when I'm ready to convert the audio to another format.
Edit: I saw AmplifyDB in the opening post and assumed you were using Avisynth for the actual volume adjustment, but maybe not.
http://avisynth.nl/index.php/Amplify
Does ffmpeg have a normalise option?
Last edited by hello_hello; 10th February 2021 at 00:39. |
10th February 2021, 00:32 | #4 | Link | |
Registered User
Join Date: Mar 2011
Posts: 4,885
|
Quote:
max -0.7dB + 0.5dB = -0.2dB
The difference between that and the -0.5dB result after adjustment may be due to whether ffmpeg does true peak scanning or not (possibly not by default), but the mean volume result indicates it's been adjusted up as a whole by 0.5dB.
-0.7dB is almost nothing below 0dB anyway, so I wouldn't bother adjusting it. If the peak was 10dB below maximum, then maybe....
If Roemer can pick a 0.5dB volume change he might have bionic ears.
Last edited by hello_hello; 10th February 2021 at 00:43. |
|
10th February 2021, 00:39 | #5 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
The track is a final mixed track, so the difference between its parts is fine and must stay as it is; I just want to increase the total volume of the entire track. So I definitely do not want normalization, as that would break the difference between the parts of the track.
To go into a bit more detail: I have over 60 tracks and they range from max -0.1 to max -6 (and mean -17 to mean -28). So if I play them, I can hear volume differences between tracks (again: inside a track, the difference is fine) and I would like to align the volumes of all tracks. So I thought the easiest would be to bring each max_volume to around -0.5 dB by just adding AmplifydB(Abs(trackMax)-0.5), which should fix most of the issues. |
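For what it's worth, the gain in that formula is just "target peak minus measured peak". A small Python sketch of that arithmetic follows; the function name `gain_for_peak` and the track names are made up for illustration, and only the two extreme peak values (-0.1 and -6) come from the post.

```python
def gain_for_peak(track_max_db, target_db=-0.5):
    """Linear gain in dB that moves a measured peak to the target peak.

    For a negative track_max_db this equals Abs(trackMax) - 0.5 from the
    post; written as a difference it also works for peaks above 0 dB.
    """
    return target_db - track_max_db


# Hypothetical track names; the peaks are the two extremes mentioned above.
measured_peaks = {"track_a": -0.1, "track_b": -6.0}

for name, peak in measured_peaks.items():
    print(f"{name}: AmplifydB({gain_for_peak(peak):+.1f})")
```

A track that already peaks above the target gets a negative gain, i.e. it is turned down slightly.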
10th February 2021, 00:54 | #6 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
Aaaah wait, I misread Avisynth's Normalize. By normalization I usually understand that the track itself is normalized, with its peaks being made quieter and the lower parts being made louder, but it seems Avisynth's Normalize does peak normalization, so it just lowers/raises everything so that the max peak matches the wanted value. So I could just set it to Normalize(.95) for around -0.5 dB max peak or Normalize(0.9) for around -1 dB max peak, is that correct for Avisynth? If so, that would be exactly what I need!
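The relation between a linear factor like 0.95 and a level in dB is 20·log10(factor). A minimal Python sketch of the conversion in both directions, assuming the Normalize argument is a plain amplitude factor where 1.0 means full scale; the helper names are invented here.

```python
import math


def factor_to_db(factor):
    """Convert a linear amplitude factor (1.0 = full scale) to dB."""
    return 20.0 * math.log10(factor)


def db_to_factor(level_db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


for factor in (0.95, 0.9):
    print(f"factor {factor} is about {factor_to_db(factor):.2f} dB")

for level in (-0.5, -1.0):
    print(f"{level} dB is a factor of about {db_to_factor(level):.3f}")
```

So 0.95 and 0.9 land close to -0.5 dB and -1 dB, a little above in both cases.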
|
10th February 2021, 01:08 | #7 | Link |
Registered User
Join Date: Mar 2011
Posts: 4,885
|
Mean -17 to mean -28 is a fair difference, but the peak volume has much less to do with how loud something sounds. Two tracks can have the same average volume but one may have a single, much louder peak than the other. If you adjust them to make the peak volume the same (normalize), the average volume will be much different.
You could try an EBU R128 scan. foobar2000 uses that scanning method for its ReplayGain scan. There's a portable fb2k version here that's almost ready to go for that sort of thing.
https://forum.videohelp.com/threads/...io-encoding%29
I think ffmpeg can use that scanning method too, but I've never done it myself.
The official volume for ReplayGain is 89dB, which is a useless way to describe it, as it's a sound pressure level, but it works out to be -18dB in human-speak. The EBU R128 standard volume is -23dB, so you'd load a bunch of files into a playlist, run a ReplayGain scan, save the info to tags, then use that info to adjust the volume while converting. You'd adjust the volume to ReplayGain volume minus 5dB, for standard soundtrack volume.
The version of foobar2000 I linked to above displays volume in human-speak. There are a couple of tabs dedicated to displaying the volume of files, something like this. Don't be put off by the fact it's an audio player. I convert audio almost exclusively with fb2k.
(Edit: this pic is from another forum post where I was trying to show the poster the average volume they were trying to achieve was too high, hence the peaks above 0dB).
If you're downmixing to stereo it's a two-step process, as you have to downmix to a temporary wave file, scan it, then convert while adjusting the volume.
I'm pretty sure the version I uploaded also includes the plugin allowing foobar2000 to open Avisynth scripts. If you have problems with it crashing (it happens now and then), add KillVideo() to the end of the script while working with the audio, if the script includes video. To run a ReplayGain scan and save the info to tags, you'd have to convert the script output to wave files or flac etc. first.
Last edited by hello_hello; 10th February 2021 at 02:02. |
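To put numbers on the "same average, different peak" point, here is a toy Python sketch. The two sample lists are synthetic and chosen so that both have an RMS level of -20 dB; plain RMS stands in for "average volume" here, which is an assumption and not what an R128 scan measures.

```python
import math


def rms_db(samples):
    """Average (RMS) level of a list of samples in dB."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def peak_db(samples):
    """Highest absolute sample value in dB."""
    return 20.0 * math.log10(max(abs(s) for s in samples))


def peak_normalize(samples, target_db=0.0):
    """Scale all samples by one factor so the peak lands on target_db."""
    factor = 10.0 ** ((target_db - peak_db(samples)) / 20.0)
    return [s * factor for s in samples]


# "steady": every sample at 0.1. "spiky": one sample at 0.9, the rest
# slightly lower so that the overall RMS is still 0.1 (-20 dB).
steady = [0.1, -0.1] * 500
v = math.sqrt((1000 * 0.01 - 0.9 ** 2) / 999)
spiky = [0.9] + [v if i % 2 else -v for i in range(999)]

for name, track in (("steady", steady), ("spiky", spiky)):
    after = peak_normalize(track)
    print(f"{name}: mean {rms_db(track):.1f} dB, peak {peak_db(track):.1f} dB"
          f" -> mean after peak normalizing {rms_db(after):.1f} dB")
```

Both tracks start at the same mean level, but after peak normalizing the steady one ends up far louder than the spiky one.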
10th February 2021, 01:12 | #8 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
Thanks for your answer. Yes, -17 to -28 is a fair difference, but that is because some tracks just have long, almost silent parts. I really just want the max peaks to be the same for all tracks and the rest to be just blindly increased/decreased accordingly for each track. So I think Normalize is exactly the right thing.
|
10th February 2021, 01:22 | #9 | Link | |
Registered User
Join Date: Mar 2011
Posts: 4,885
|
Quote:
The R128 scanning method doesn't include any silence in the average volume, or any volume more than a certain distance below the average. I can't remember the exact details.
I don't know what ffmpeg's volumedetect does, but as a general rule, R128 scanning is much better than peak normalising, especially for unrelated files where the difference between peak and average volume can change dramatically. After all, that's what it was specifically designed for.
Last edited by hello_hello; 10th February 2021 at 19:47. |
|
10th February 2021, 01:32 | #10 | Link |
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,099
|
Oh boy, you really need to attend an audio 101 course...
First of all you need to understand the difference between Peak Normalization and Loudness Normalization.
Peak Normalization raises the level just to the point where the highest peaks reach 0 dB (or a specified lower peak level, which is advisable, maybe -1 dB). It does not apply any dynamic range compression.
Loudness Normalization tries to bring different tracks to the same perceived loudness. The peak levels do not contribute to the perceived loudness at all. You can Peak Normalize the tracks of an album so they all have identical peak levels, but the perceived loudness will still be very different.
Loudness Normalization uses a (not so scientific) approach which involves psychoacoustics; it requires user tests to determine what actually contributes to the perceived loudness of an audio track. This typically is a 2-pass process. The first pass analyzes the source and determines a value for the perceived loudness. Then you need a value for your target loudness, and finally you apply a linear gain change to achieve your target loudness. This process also does not apply any Dynamic Range Compression.
I won't bother you with historical anecdotes about ReplayGain (which started all this). Today's standard for Loudness Normalization is the BS.1770 standard (adopted by the Europeans in the EBU R128 standard).
The problem with Loudness Normalization is that there is no protection against clipping. If your analysis pass tells you to increase the loudness by 10 dB to reach your target loudness, but the source had peaks at -5 dB, then your result will have heavy clipping by 5 dB. For digital audio this is absolutely unacceptable. There are only two workarounds:
First, you could ignore what the analysis pass told you and only increase the loudness by 5 dB. This will avoid clipping, but you won't achieve an even perceived loudness. This sounds undesirable, but for many sources this method works pretty well.
The other option is to apply Dynamic Range Compression. This should mainly limit the peaks and also bring up the very quiet parts, so the target loudness can be reached without introducing clipping.
FranceBB is in broadcasting, and the standard for him is to use EBU R128 with a target loudness of -24 LUFS. Outside of broadcasting I recommend using a higher target loudness of -18 LUFS, which is the equivalent of the old ReplayGain target. He also recommends a target Loudness Range of only 6, which is awfully low. I use these ffmpeg parameters:
loudnorm=I=-18:TP=-1.5:LRA=14
Loudnorm first tries to reach the target loudness with a linear gain change. But if this is not possible it will use a dynamic method (see it as a sophisticated Automatic Gain Control), and it produces very good sounding results.
I tried to keep this as simple as possible, I hope it was not too complicated...
Cheers
manolito
//EDIT//
Oops, I forgot to mention that there is a nice GUI for LoudNorm by Muxson called WinLoud. Much easier to use than ffmpeg...
Last edited by manolito; 10th February 2021 at 01:50. |
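The clipping problem and the first workaround come down to two subtractions. A minimal Python sketch under stated assumptions: the function name `loudness_gain` is invented, and the measured loudness of -28 LUFS is a made-up value chosen so that the wanted gain is the 10 dB from the example above.

```python
def loudness_gain(measured_lufs, target_lufs, peak_db, peak_ceiling_db=0.0):
    """Linear gain for loudness normalization with a simple peak cap.

    Returns (wanted_gain, applied_gain, clipping_if_uncapped), all in dB.
    wanted_gain is what the analysis pass asks for; applied_gain is
    limited to the headroom left below the peak ceiling (workaround one).
    """
    wanted = target_lufs - measured_lufs
    headroom = peak_ceiling_db - peak_db
    applied = min(wanted, headroom)
    would_clip_by = max(0.0, wanted - headroom)
    return wanted, applied, would_clip_by


wanted, applied, clip = loudness_gain(measured_lufs=-28.0, target_lufs=-18.0,
                                      peak_db=-5.0)
print(f"wanted {wanted:+.1f} dB, applied {applied:+.1f} dB, "
      f"uncapped gain would clip by {clip:.1f} dB")
```

Capping the gain keeps the result free of clipping at the price of ending up quieter than the target, which is the trade-off described in the post.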
10th February 2021, 01:47 | #11 | Link |
Registered User
Join Date: Feb 2002
Location: California
Posts: 2,706
|
I really appreciate the opportunity to take your course. I knew most of this, but not in such a concise, structured way.
Thankfully, my instincts using Sound Forge and iZotope have been pretty close to what "101" says I should have done, so I don't think I've ruined too much over the years. |
10th February 2021, 08:36 | #12 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
Many thanks for that quick introduction! That helps a lot in understanding.
So now it gets even more complicated. A quick summary of my problem:
I am working on a personal dubbing project of a TV show. For that I use various sources for audio/video in different languages. The show has a fixed intro/outro, so I use the video and audio from one good source for all episodes, as in some files the intro/outro can be partially missing. After that, I use the best available video/audio in the languages I want. I then create an Avisynth script to match the audio from the different languages so that it is in sync with the best available video, by adding or removing a few frames (including audio) here and there (mostly inside black frames / scene changes).
So now, regarding audio loudness, I have the following issues:
1. The intro/outro has a different loudness than the actual episode
2. Each language has a different loudness
3. Each episode has a different loudness
What could be a good approach for this problem? I now thought about:
1. Extracting the intro/outro into separate files and running ffmpeg's loudnorm on them to get them to a desired level
2. Extracting the episode for each language into separate files and running ffmpeg's loudnorm on them to get to the same level as the intro/outro
3. Merging the intro + episode + outro together
This adds some extra steps to my current Avisynth scripts, as currently I have one Avisynth file which does everything and has a flag which defines which language it should use in the "output", so I can just run this script twice: once with the 1st language and the video, once with the 2nd language, ignoring the video, and then mux the video and audio from the 1st and the audio from the 2nd encode together. But I guess that won't be possible anymore. I guess I need to do the intro/outro once separately, then in the Avisynth file just do as before but only with the episode content, and in the end concatenate the (mkv) files (is there any downside to just concatenating them instead of re-encoding them together?) and then mux them together. |
10th February 2021, 13:30 | #13 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
After some more investigation, I now split the intro and the episode into separate avisynth scripts and I have a main script that combines them together.
I also have some variables to amplify the intro/outro and the episode in each language. This is a simple Amplify on the streams in Avisynth.
I now run an ffmpeg ebur128 analysis for each episode and check the Integrated loudness (I), Range (LRA) and Peak. I then play around with the amplify values to get to an I of around -18 and a peak never above -0.1.
This way, I have to encode the audio only once and can encode the whole video (intro + episode + outro) at once directly from Avisynth. Also this way, the amplification is linear, so there is no dynamic range compression, as I want. The only downside is that I need to play around a bit with the amplify values to get to the desired loudness.
I hope that all makes sense.
Thank you all for your input.
Last edited by Roemer; 10th February 2021 at 13:33. |
10th February 2021, 16:08 | #15 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,656
|
Being an audio engineer for the better part of my life, I am used to fine-matching everything by hand.
Adding to the good advice given here, I would suggest giving some one-click tools a shot: one is LordMulder's DynamicAudioNormaliser tool, the other is Levelator. Both did very well for the occasional quick and not-so-dirty fix.
While suggesting these, I concur with the advice to stay away from 0 dB FS. Any postprocessing needs a bit of headroom, and one would gain nothing by an amplification in the range of +1dB.
If loudness gain is really needed: after re-EQing and manual envelope trimming one may use
For music: 4-band RMS compression, max. 1:1.5, with proven hand-matched time constants (like 50ms attack, 150ms release). Set the makeup gain so that GR works at -3..-6dB; this will do the least harm.
Speech: different, of course. I would still work on that by hand more before using any automated compression, and then might even go to LordMulder's tool, which uses a longer lookahead, see above.
Learning from the restrictions of limited-range 8-bit YUV video: limits are there for a good reason. Use them up, and any digital postprocessing (oversampling, antialiasing) painfully shows what happens: clipping, a loss of information which was sitting there nicely until somebody decided to use up all the available values.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain) "Data reduction ? Yep, Sir. We're that issue working on. Synce invntoin uf lingöage..." Last edited by Emulgator; 10th February 2021 at 16:16. |
10th February 2021, 16:39 | #16 | Link | |
Registered User
Join Date: Sep 2003
Location: Berlin, Germany
Posts: 3,099
|
Quote:
You can find an interesting blog post by the author of LoudNorm here:
http://k.ylo.ph/2016/04/04/loudnorm.html
Some hints:
If the result is meant for broadcasting on TV, you had better stick to the EBU recommendation of -23 LUFS or the ATSC value of -24 LUFS.
For LoudNorm use 2-pass mode.
For peak analysis use True Peak instead of Sample Peak. |
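As a rough idea of what 2-pass mode involves: the first pass is run with print_format=json added to the loudnorm options, and the values it reports are fed back in as measured_* options on the second pass. The Python sketch below only builds the second-pass filter string. The helper name is invented, the numbers in `first_pass` are hypothetical and not from a real scan, and the targets default to the I=-18:TP=-1.5:LRA=14 settings quoted earlier in the thread.

```python
def second_pass_filter(measured, target_i=-18.0, target_tp=-1.5, target_lra=14.0):
    """Build the loudnorm filter string for the second pass.

    `measured` holds the values reported by the first pass (run with
    print_format=json). linear=true asks for a plain gain change where
    the targets allow it.
    """
    return (
        f"loudnorm=I={target_i}:TP={target_tp}:LRA={target_lra}"
        f":measured_I={measured['input_i']}"
        f":measured_TP={measured['input_tp']}"
        f":measured_LRA={measured['input_lra']}"
        f":measured_thresh={measured['input_thresh']}"
        f":offset={measured['target_offset']}"
        ":linear=true"
    )


# Hypothetical first-pass values, only here to show the shape of the data.
first_pass = {
    "input_i": "-27.2",
    "input_tp": "-4.1",
    "input_lra": "9.8",
    "input_thresh": "-37.6",
    "target_offset": "0.3",
}

print(second_pass_filter(first_pass))
```

The resulting string would go after -af on the second ffmpeg run.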
|
10th February 2021, 16:41 | #17 | Link |
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,656
|
Ah, true.
10th February 2021, 16:55 | #18 | Link |
Registered User
Join Date: Oct 2020
Posts: 20
|
But thanks for the apps anyway, they might come in handy sometime.
I wrote a small method that tries different amplify values, recalculates the ebur128 each time, and automatically finds the amplify value that gets as near as possible (in 0.1 steps) to the target volume while staying below a maximum peak. There are of course some outliers that have a very high max peak (it was even +5 in the worst case!), so for those to get below the peak threshold, the resulting volume also drops below the target, but the worst case I found so far is -20 instead of -18, so I think that should still be fine and barely distinguishable from -18. |