Question about framerate conversion and audio adjustment

Chainmax · 21st March 2007, 00:30

I'm about to try and convert an old movie that came in one of my games. The file needs a small framerate conversion, so AssumeFPS will be used. Now, I have two options to adjust the audio accordingly:

* TimeStretch(tempo=(100.0*y)/x).AssumeFPS(y)
* AssumeFPS(y,true) + Resampling with SSRC

where x is the original framerate and y is the final framerate. I was wondering what advantages and disadvantages each method has compared to the other.

wonkey_monkey · 21st March 2007, 10:52

AssumeFPS does a simple resample of the audio, which means its pitch will shift by the same amount as the framerate (like the 4% PAL speedup which results in slightly higher voices). TimeStretch tries to be smart about it (my understanding is that it chops the audio into chunks, pulls the chunks apart or squishes them together, and does a short fade between the chunks) and keeps the same pitch, but the audio will suffer from artefacts - clicks, pops, and warbles.

David
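The chunk-and-fade idea described above can be sketched in a few lines of Python. This is only a naive overlap-add illustration and is not the algorithm TimeStretch actually uses; the function name `ola_time_stretch`, the chunk size, and the use of a ratio (1.0 = unchanged) instead of a percentage for `tempo` are all assumptions made for the example.

```python
import math

def ola_time_stretch(samples, tempo, chunk=512):
    """Naive overlap-add stretch: tempo > 1 shortens the audio, tempo < 1 lengthens it."""
    hop_out = chunk // 2            # chunks are laid down with 50% overlap
    hop_in = hop_out * tempo        # chunks are read closer together or further apart
    # Periodic Hann window: two copies offset by chunk/2 sum to 1 (the "short fade").
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / chunk) for i in range(chunk)]
    if len(samples) < chunk:
        return []
    n_chunks = int((len(samples) - chunk) / hop_in) + 1
    out = [0.0] * (hop_out * (n_chunks - 1) + chunk)
    for k in range(n_chunks):
        src = int(round(k * hop_in))    # where this chunk is read from
        dst = k * hop_out               # where this chunk is written to
        for i in range(chunk):
            out[dst + i] += samples[src + i] * window[i]
    return out

# One second of a 440 Hz tone at 11025 Hz, shortened by a quarter.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 11025.0) for n in range(11025)]
shorter = ola_time_stretch(tone, tempo=1.25)
print(len(tone), len(shorter))
```

Because the chunks are re-laid without regard to the waveform's phase, a sketch like this produces exactly the kind of warbles mentioned above; real implementations search for a good splice point before fading.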

tebasuna51 · 21st March 2007, 12:26

If you don't need to recompress the video, you can use the following for the video:
VirtualDub -> Video -> Direct Stream copy
VirtualDub -> Video -> Frame Rate ... Change to Y frames per second

And for audio:
TimeStretch(tempo=(100.0*y)/x)
SoundOut()

TimeStretch(tempo) preserves the pitch and is the recommended method, though of course it is not perfect. If the audio track is more important than the video (a concert, for example), it is better to convert the video track (inserting or deleting frames) and leave the audio untouched.

The equivalent AviSynth methods (when you need to recompress the video) are AssumeFPS(y) for VirtualDub-Change and ChangeFPS(y) for VirtualDub-Convert.

Chainmax · 21st March 2007, 14:28

davidhorman: that's more or less what IanB told me a long time ago, thanks for the confirmation. Do you think the artifacting should be noticeable on a ~-0.093% or a ~0.0067% adjustment?

tebasuna51: the video will need heavy processing, and the framerate conversion will be from 14.999fps to either 14.985fps or 15fps, which is why I prefer AssumeFPS to other smarter methods. Therefore, AssumeFPS should be included.
About SoundOut, the audio file will need further processing as well (samplerate change, conversion to stereo, enhancement). Would SoundOut still be appropriate in that case?
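For reference, the percentages quoted in this post follow directly from the tempo formula tempo=(100.0*y)/x given in the first post. A minimal Python sketch of that arithmetic (the function name `timestretch_tempo` is made up for the example):

```python
def timestretch_tempo(x_fps, y_fps):
    """Tempo percentage as in TimeStretch(tempo=(100.0*y)/x)."""
    return 100.0 * y_fps / x_fps

for target in (14.985, 15.0):
    tempo = timestretch_tempo(14.999, target)
    # The adjustment is how far the tempo sits from 100%.
    print(f"14.999 -> {target}: tempo = {tempo:.4f}%, adjustment = {tempo - 100.0:+.4f}%")
```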

wonkey_monkey · 21st March 2007, 14:33

Quote:

Originally Posted by Chainmax

Do you think the artifacting should be noticeable on a ~-0.093% or a ~0.0067% adjustment?

Wow, if that's the adjustment I'd just use sync_audio. No-one's going to notice the pitch shift and you'll be guaranteed no artefacts - although with the right parameters I doubt you'd notice the artefacts from timestretch either.

How long does the video run?

David

tebasuna51 · 22nd March 2007, 04:37

Quote:

Originally Posted by Chainmax

About SoundOut, the audio file will need further processing as well (samplerate change, conversion to stereo, enhancement). Would SoundOut still be appropriate in that case?

Of course, you can use any audio internal filter or plugin and finish with SoundOut

Chainmax · 22nd March 2007, 05:02

davidhorman: yup, it's a change from 14.999fps to either 14.985fps or 15fps. About sync_audio, do you mean using the AssumeFPS switch? Because that way, the samplerate will be very slightly changed and SSRC will probably not be able to resample it to 48000Hz. By the way, the video is 20min in length. Why do you ask?

tebasuna51: actually, all those enhancements will be done externally. Especially the stereo enhancement which uses a VST plugin.

IanB · 22nd March 2007, 05:42

For 16 bit audio samples ResampleAudio() will be indistinguishable from SSRC(). It also has no input/output ratio restriction, and you will avoid a ConvertAudioToFloat/ConvertAudioTo16bit pair.

And the question about length relates to the total amount of error we are talking about. 14.999 -> 15.0 over 20 minutes is 80 milliseconds ==> don't change it! (It's a lot less than 200ms)

Also 14.999 -> 14.985 over 20 minutes is 1116 milliseconds ==> Do change it!

At 15 fps each frame is 66.6 milliseconds.
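The drift figures can be checked with a short Python sketch. It assumes the audio is left untouched while the same frames are simply replayed at the new rate; `drift_ms` is a name invented for the example.

```python
def drift_ms(src_fps, dst_fps, minutes):
    """Audio/video drift in milliseconds when only the video rate is changed."""
    seconds = minutes * 60.0
    frames = seconds * src_fps           # frame count is unchanged by AssumeFPS
    new_duration = frames / dst_fps      # same frames, new playback rate
    return abs(new_duration - seconds) * 1000.0

print(f"14.999 -> 15.000: {drift_ms(14.999, 15.0, 20):.1f} ms")
print(f"14.999 -> 14.985: {drift_ms(14.999, 14.985, 20):.1f} ms")
print(f"one frame at 15 fps: {1000.0 / 15.0:.1f} ms")
```

The first case comes out at 80 ms, a little over one frame; the second comes out at a little over 1.1 seconds, in the same range as the figure above.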

Chainmax · 22nd March 2007, 21:14

The source is only 8bit. So, what you're recommending me to use is AssumeFPS(15,true).ResampleAudio(11025) then? Also, why do you say a 1.3ms error over 20mins is too much? I might try to change the framerate sometime, and when trying to go to 30fps results will probably be better when starting from 15fps than when starting from 14.999fps, right?

[edit]ResampleAudio automatically changes the depth to 16bit. Can I tell it to leave the depth as it was?

wonkey_monkey · 22nd March 2007, 22:10

I think Ian's suggesting you don't resample the audio at all, but I disagree with his figures - Ian, I think you used seconds instead of minutes!

Converting from 14.999fps to 15.000fps will result in a drift of 80ms (just over one frame - you could get away with not resampling).

Converting from 14.999fps to 14.985fps (which seems an odd thing to do, but that's what you said above!) will result in a drift of over one second - something you won't get away with.

David

foxyshadis · 23rd March 2007, 06:44

14.985 is 1/2 ntsc, an option if you were going to put it on VCD or DVD but sort of pointless otherwise.

Whatever you do, make sure the very first thing you do is convert to 16 or 24 bit. You never ever want to work with audio in anything less than 16. It's a fast track to even more severe quality loss, and it's useless because all dct-based codecs are infinite-precision internally, and lower bitdepth input makes their output worse, not better. Unless you're just going to save it back to ADPCM or something lame like that.

In fact, you should convert to 16 bit, load it into audacity, and use its noise reduction to get rid of some of the quantization noise (and any microphone noise).

(This applies just as much to video, which is why I'm always banging the 16-bit drum, but the human eyes are so much less sensitive to quantization noise than the ears that it doesn't matter to most people.)

I agree that either method will give you decent quality, with such a tiny sampling difference, although I'll admit I've never tried SSRC on 8-bit audio; maybe it fails miserably there.

Chainmax · 23rd March 2007, 15:31

foxyshadis: I actually plan to convert the video to 1280x960 and might eventually do framerate conversion up to HDTV standards. The reason I'm torn between 14.985fps and 15fps is that I'd like to upconvert to the most commonly used HDTV framerate and I don't know whether it's 29.97/59.94fps or 30/60fps.
As for the audio part, once the framerate is adjusted I intend to convert it to stereo, then use SSRC to upsample to 48000Hz and 16bit and finally do some enhancement in Nero Wave Editor. That's why I want to keep it at 8bit initially.

foxyshadis · 23rd March 2007, 16:01

Then why not ConvertAudioTo16Bit().AssumeFPS(14.985,true).SSRC(48000)? Why bother resampling twice (resampleaudio/timestretch + ssrc)?

Performing any dsp operations on 8 bit audio will increase quantization noise quite audibly. Even performing more than a few on 16-bit will do it, but not so bad. Actually, try it both ways, so you can hear for yourself.

30/60p is very rare, even though it's allowed by the HD specs. 29.97/59.94 is basically all you'll find in the NTSC world, since there's still so much legacy equipment.

Are you going to use one of the pseudo-stereo matrixing methods? I'm curious how it'll be done.

IanB · 24th March 2007, 02:29

@davidhorman,

Oops, yes I forgot the times 60 (I have fixed my post)

@foxyshadis,

SSRC only works with float samples, so converting to 16 bit first is a waste of time, because it is going to convert to float next step anyway.

@Chainmax,

Yes I am saying wear the 80 millisecond error in preference to getting other problems. The 1116 milliseconds you will have to fix.

If you are going to try to convert the FPS with MVTools or other, then start with the 14.999. Failing that I would just use ChangeFPS and avoid all the trouble.

Chainmax · 24th March 2007, 18:18

Quote:

Originally Posted by foxyshadis

Then why not ConvertAudioTo16Bit().AssumeFPS(14.985,true).SSRC(48000)? Why bother resampling twice (resampleaudio/timestretch + ssrc)?

Because I want to let SSRC itself take care of the sampling rate and depth upconversion with very specific parameters:

Code:

ssrc.exe --rate 48000 --twopass --dither 1 --bits 24 --pdf 1 input.wav output.wav

Quote:

Originally Posted by foxyshadis

30/60p is very rare, even though it's allowed by the HD specs. 29.97/59.94 is basically all you'll find in the NTSC world, since there's still so much legacy equipment.

Are you going to use one of the pseudo-stereo matrixing methods? I'm curious how it'll be done.

Are you sure the float framerates are more widespread than the integer ones?
As for the stereo enhancement, it will be done using a VST plugin. If you're interested in hearing the results, I started a thread at the HydrogenAudio forums asking for advice. Once a few samples are posted for people to give me feedback, I can send you the link to it if you want to participate.

Quote:

Originally Posted by IanB

...
@Chainmax,

Yes I am saying wear the 80 millisecond error in preference to getting other problems. The 1116 milliseconds you will have to fix.

If you are going to try to convert the FPS with MVTools or other, then start with the 14.999. Failing that I would just use ChangeFPS and avoid all the trouble.

I'd rather have as little error as possible. Wouldn't AssumeFPS(x,true) followed by resampling back to 11025 guarantee that?
Regarding framerate conversion: it's only a possibility, but if it's done, wouldn't the tools have less of an issue multiplying the framerate by 2 than by 2.0001 or 1.9981?

foxyshadis · 25th March 2007, 12:59

It'd be nice if avisynth exposed more of these parameters, although I think dither=1 is default. Maybe, maybe not. Anyway, run it through assumefps, do not use timestretch or resampleaudio, and output to wav for ssrc. Since assumefps isn't dsp, it just sets the samplerate, there's no 8-bit degradation. It'll now be 11033 Hz or 10989 Hz or something like that, but ssrc should be able to deal with it without artifacts from two separate resamplings.
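To put numbers on the relabelled sample rate, here is a rough Python sketch. It assumes an 11025 Hz source (the rate mentioned elsewhere in this thread) and assumes that the new rate is the old one scaled by y/x and rounded to a whole number of Hz; the function name `synced_sample_rate` is made up for the example.

```python
def synced_sample_rate(rate_hz, x_fps, y_fps):
    """Sample rate after AssumeFPS(y, true), assuming scale-by-y/x and rounding."""
    return int(round(rate_hz * y_fps / x_fps))

for target in (15.0, 14.985):
    print(f"14.999 -> {target}: {synced_sample_rate(11025, 14.999, target)} Hz")
```

Either way the result is a non-standard rate a few Hz away from 11025, which is what the external resampler then has to accept as input.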

Have you verified that ssrc takes 8-bit input? It should upsample properly, but if not you might need to convert to 24/32 in avisynth anyway.

And yes, the "old-fashioned" framerates are absolutely prevalent. If you look at the decrypting forum or the various EVOB threads, ntsc movie content is still being encoded at 23.976. This is almost certainly because they're still using the same old workflows as in the past, along with backward compatibility with large numbers of older HD sets that could only show 59.94i. Undoubtedly, this will fade in time, and as long as you respect the other HD standards (HD-DVD, BD, DVB) you'll be able to make your content compatible with modern equipment, whether you use an integral framerate or not.

525/60 · 25th March 2007, 16:04

Quote:

ChangeFPS changes the frame rate by deleting or duplicating frames.

Why not leave the audio alone and let ChangeFPS drop the frames needed to bring it up to the right framerate? The change is so small that very few frames would be dropped. Is the video running in slow motion or did you need to use a more standard framerate?

ChangeFPS(15000,1001)
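As a rough check of how few frames that would drop, the Python sketch below compares frame counts for a clip of the length mentioned earlier (20 minutes). It assumes the source really is 14.999 fps and that ChangeFPS keeps the duration fixed while discarding frames; the helper name `frames_dropped` is invented for the example.

```python
from fractions import Fraction

def frames_dropped(src_fps, dst_fps, minutes):
    """Frames discarded when the duration is kept and the rate is lowered."""
    seconds = minutes * 60
    return float(seconds * src_fps - seconds * dst_fps)

half_ntsc = Fraction(15000, 1001)        # the rate given to ChangeFPS(15000,1001)
source = Fraction(14999, 1000)           # 14.999 fps, as stated in the thread

print(f"target rate: {float(half_ntsc):.6f} fps")
print(f"frames dropped over 20 minutes: {frames_dropped(source, half_ntsc, 20):.1f}")
```

That is on the order of 17 frames out of roughly 18000, or about one dropped frame every 70 seconds.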

Chainmax · 26th March 2007, 23:07

Well, I tried feeding the file as 8bit, 16bit, 24bit and 32bit into SSRC v1.30 and (as expected) it choked because of the weird 11026Hz sample rate. I'll go ask at HA if they can recommend something at least as good as SSRC with these specific CLI settings.
