SelectEvery(5,0,2) keeps 2 of every 5 frames and automatically adjusts the video frame rate from 30 fps to 12 fps (30 × 2/5) to match, so there shouldn't be any audio desync.
Assuming the AABBB pattern is fixed throughout the whole video, you can use neuron2's suggestion as is. If the pattern changes partway through the source, one way out is to use TDecimate(cycleR=3,cycle=5) or the slightly more accurate SelectEven().TDecimate(cycleR=1,cycle=5). Both alternatives also adjust the frame rate automatically.
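For reference, here is roughly how the three options would sit in a script. This is only a sketch: the file name and AviSource are placeholders for whatever source filter you already use, and the TDecimate lines assume the TIVTC plugin is loaded (in the autoload folder or via LoadPlugin). Use only one of the three options at a time.

```avisynth
# Hypothetical source; assumed to be 30 fps with AABBB duplicates.
AviSource("input.avi")

# Option 1: fixed AABBB pattern.
# Keep frames 0 and 2 of every 5 (one A and one B).
SelectEvery(5, 0, 2)

# Option 2: pattern changes through the source.
# Let TDecimate drop the 3 most duplicate-like frames in every cycle of 5.
# TDecimate(cycleR=3, cycle=5)

# Option 3: halve to 15 fps first, then drop the 1 remaining duplicate in every 5.
# SelectEven().TDecimate(cycleR=1, cycle=5)
```

All three keep 2 of every 5 original frames, so each ends up at 12 fps from a 30 fps source.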
All of this assumes you're okay with 12 fps output. If you want motion-interpolated output at 24, 30, 60 or any other frame rate, you'll have to get rid of all the duplicates first (using any of the three methods mentioned above) and only then run the motion interpolation.
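As an illustration of that order of operations, here is a minimal sketch that decimates first and interpolates second. Nothing above names an interpolator, so MVTools2's MFlowFps is my assumption here. The file name, pel=2 and the 24 fps target are also just example values. MVTools2 expects YUV input, so you may need something like ConvertToYV12() after the source filter.

```avisynth
# Hypothetical source; assumed 30 fps with a fixed AABBB pattern.
AviSource("input.avi")

# Step 1: remove the duplicates (any of the three methods works here).
SelectEvery(5, 0, 2)    # now 12 fps, unique frames only

# Step 2: motion interpolation with MVTools2 (assumed choice of plugin).
super    = MSuper(pel=2)
backward = MAnalyse(super, isb=true)
forward  = MAnalyse(super, isb=false)
MFlowFps(super, backward, forward, num=24, den=1)    # 12 fps -> 24 fps
```

Change num and den for other targets, e.g. num=60, den=1 for 60 fps. If the interpolation runs before the duplicates are removed, the repeated frames have no motion between them and the result will still stutter.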