Old 28th May 2016, 18:34   #1  |  Link
przemoc
Registered User
 
Join Date: Sep 2010
Posts: 2
MP4 delay - start_pts vs media time from edit list table entry in moov.trak.edts.elst

Hi!

I'd like to understand some MP4 and AAC-related stuff and ffmpeg behavior regarding it.

I'm transcoding 14.5 seconds of footage (50 fps; 696000 audio samples at 48 kHz) from huffyuv+pcm_s16le in MKV to h264+aac in MP4, using the latest stable ffmpeg 3.0.1 (Zeranoe's Win64 static build).

Code:
ffmpeg -i 30-notes-huffyuv.mkv ^
-pix_fmt:v yuv420p ^
-c:v libx264 -profile:v high -preset:v fast ^
-sc_threshold:v 0 -g:v 25 -bf:v 2 -crf:v 18 ^
-c:a aac -profile:a aac_low -b:a 384k -cutoff:a 22000 ^
30-notes.mp4
When we look at ffprobe's -show_streams output:
Code:
[STREAM]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=High
codec_type=video
codec_time_base=1/100
codec_tag_string=avc1
codec_tag=0x31637661
width=1920
height=1080
coded_width=1920
coded_height=1088
has_b_frames=2
sample_aspect_ratio=1:1
display_aspect_ratio=16:9
pix_fmt=yuv420p
level=42
color_range=N/A
color_space=unknown
color_transfer=unknown
color_primaries=unknown
chroma_location=left
timecode=N/A
refs=4
is_avc=true
nal_length_size=4
id=N/A
r_frame_rate=50/1
avg_frame_rate=50/1
time_base=1/12800
start_pts=0
start_time=0.000000
duration_ts=185600
duration=14.500000
bit_rate=15634211
max_bit_rate=N/A
bits_per_raw_sample=8
nb_frames=725
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=1
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
TAG:language=und
TAG:handler_name=VideoHandler
[/STREAM]
[STREAM]
index=1
codec_name=aac
codec_long_name=AAC (Advanced Audio Coding)
profile=LC
codec_type=audio
codec_time_base=1/48000
codec_tag_string=mp4a
codec_tag=0x6134706d
sample_fmt=fltp
sample_rate=48000
channels=2
channel_layout=stereo
bits_per_sample=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/48000
start_pts=-1024
start_time=-0.021333
duration_ts=697024
duration=14.521333
bit_rate=235170
max_bit_rate=384000
bits_per_raw_sample=N/A
nb_frames=681
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=1
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
TAG:language=und
TAG:handler_name=SoundHandler
[/STREAM]
we can see that:
  • video:
    start_pts=0
    start_time=0.000000
  • audio:
    start_pts=-1024 (accommodating AAC priming - ffmpeg's native aac encoder delay is 1024 samples long)
    start_time=-0.021333 (1024/48000)
which is all fine.
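As a quick sanity check, the start_time and duration values reported by ffprobe follow from the corresponding tick counts multiplied by time_base. A minimal Python sketch of that arithmetic (the helper name pts_to_seconds is made up for illustration; it is not an ffmpeg API):

```python
from fractions import Fraction

def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    """Convert a timestamp in time_base ticks to seconds."""
    return float(pts * time_base)

# Values copied from the ffprobe output above.
video_tb = Fraction(1, 12800)
audio_tb = Fraction(1, 48000)

print(f"{pts_to_seconds(0, video_tb):.6f}")        # video start_time: 0.000000
print(f"{pts_to_seconds(185600, video_tb):.6f}")   # video duration:   14.500000
print(f"{pts_to_seconds(-1024, audio_tb):.6f}")    # audio start_time: -0.021333
print(f"{pts_to_seconds(697024, audio_tb):.6f}")   # audio duration:   14.521333
```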
But if we look into the internals of this MP4 via Elecard Video Format Analyzer or via a dump from mp4box (I used the latest stable 0.6.1 Win64 build):
Code:
mp4box -std -diso 30-notes.mp4 | egrep -v "\<(CompositionOffsetEntry|SyncSampleEntry|SampleToChunkEntry|SampleSizeEntry|ChunkEntry)\>" 
<?xml version="1.0" encoding="UTF-8"?>
<!--MP4Box dump trace-->
<IsoMediaFile Name="30-notes.mp4">
<FileTypeBox MajorBrand="isom" MinorVersion="512">
<BoxInfo Size="32" Type="ftyp"/>
<BrandEntry AlternateBrand="isom"/>
<BrandEntry AlternateBrand="iso2"/>
<BrandEntry AlternateBrand="avc1"/>
<BrandEntry AlternateBrand="mp41"/>
</FileTypeBox>
<FreeSpaceBox size="0">
<BoxInfo Size="8" Type="free"/>
</FreeSpaceBox>
<MediaDataBox dataSize="28763883">
<BoxInfo Size="28763891" Type="mdat"/>
</MediaDataBox>
<MovieBox>
<BoxInfo Size="18880" Type="moov"/>
<MovieHeaderBox CreationTime="0" ModificationTime="0" TimeScale="1000" Duration="14522" NextTrackID="3">
<BoxInfo Size="108" Type="mvhd"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</MovieHeaderBox>
<TrackBox>
<BoxInfo Size="12723" Type="trak"/>
<TrackHeaderBox CreationTime="0" ModificationTime="0" TrackID="1" Duration="14500" Width="1920.00" Height="1080.00">
<Matrix m11="0x00010000" m12="0x00000000" m13="0x00000000" 								m21="0x00000000" m22="0x00010000" m23="0x00000000" 								m31="0x00000000" m32="0x00000000" m33="0x40000000"/><BoxInfo Size="92" Type="tkhd"/>
<FullBoxInfo Version="0" Flags="0x3"/>
</TrackHeaderBox>
<EditBox>
<BoxInfo Size="36" Type="edts"/>
<EditListBox EntryCount="1">
<BoxInfo Size="28" Type="elst"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<EditListEntry Duration="14500" MediaTime="512" MediaRate="1"/>
</EditListBox>
</EditBox>
<MediaBox>
<BoxInfo Size="12587" Type="mdia"/>
<MediaHeaderBox CreationTime="0" ModificationTime="0" TimeScale="12800" Duration="185600" LanguageCode="und">
<BoxInfo Size="32" Type="mdhd"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</MediaHeaderBox>
<HandlerBox Type="vide" Name="VideoHandler" reserved1="0" reserved2="data:application/octet-string,000000000000000000000000">
<BoxInfo Size="45" Type="hdlr"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</HandlerBox>
<MediaInformationBox>
<BoxInfo Size="12502" Type="minf"/>
<VideoMediaHeaderBox>
<BoxInfo Size="20" Type="vmhd"/>
<FullBoxInfo Version="0" Flags="0x1"/>
</VideoMediaHeaderBox>
<DataInformationBox><BoxInfo Size="36" Type="dinf"/>
<DataReferenceBox>
<BoxInfo Size="28" Type="dref"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<URLDataEntryBox>
<!--Data is contained in the movie file-->
<BoxInfo Size="12" Type="url "/>
<FullBoxInfo Version="0" Flags="0x1"/>
</URLDataEntryBox>
</DataReferenceBox>
</DataInformationBox>
<SampleTableBox>
<BoxInfo Size="12438" Type="stbl"/>
<SampleDescriptionBox>
<BoxInfo Size="154" Type="stsd"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<AVCSampleEntryBox DataReferenceIndex="1" Width="1920" Height="1080" XDPI="4718592" YDPI="4718592" BitDepth="24">
<BoxInfo Size="138" Type="avc1"/>
<AVCConfigurationBox>
<AVCDecoderConfigurationRecord configurationVersion="1" AVCProfileIndication="100" profile_compatibility="0" AVCLevelIndication="42" nal_unit_size="4" chroma_format="0" luma_bit_depth="0" chroma_bit_depth="0">
<SequenceParameterSet size="27" content="data:application/octet-string,6764002AACD940780227E5C044000003000400000301903C60C658"/>
<PictureParameterSet size="6" content="data:application/octet-string,68EAE08CB22C"/>
</AVCDecoderConfigurationRecord>
<BoxInfo Size="52" Type="avcC"/>
</AVCConfigurationBox>
</AVCSampleEntryBox>
</SampleDescriptionBox>
<TimeToSampleBox EntryCount="1">
<BoxInfo Size="24" Type="stts"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<TimeToSampleEntry SampleDelta="256" SampleCount="725"/>
<!-- counted 725 samples in STTS entries -->
</TimeToSampleBox>
<CompositionOffsetBox EntryCount="665">
<BoxInfo Size="5336" Type="ctts"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<!-- counted 725 samples in CTTS entries -->
</CompositionOffsetBox>
<SyncSampleBox EntryCount="29">
<BoxInfo Size="132" Type="stss"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</SyncSampleBox>
<SampleToChunkBox EntryCount="93">
<BoxInfo Size="1132" Type="stsc"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<!-- counted 724 samples in STSC entries (could be less than sample count) -->
</SampleToChunkBox>
<SampleSizeBox SampleCount="725">
<BoxInfo Size="2920" Type="stsz"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</SampleSizeBox>
<ChunkOffsetBox EntryCount="679">
<BoxInfo Size="2732" Type="stco"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</ChunkOffsetBox>
</SampleTableBox>
</MediaInformationBox>
</MediaBox>
</TrackBox>
<TrackBox>
<BoxInfo Size="5943" Type="trak"/>
<TrackHeaderBox CreationTime="0" ModificationTime="0" TrackID="2" Duration="14522" AlternateGroupID="1" Volume="1.00">
<BoxInfo Size="92" Type="tkhd"/>
<FullBoxInfo Version="0" Flags="0x3"/>
</TrackHeaderBox>
<EditBox>
<BoxInfo Size="36" Type="edts"/>
<EditListBox EntryCount="1">
<BoxInfo Size="28" Type="elst"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<EditListEntry Duration="14500" MediaTime="1024" MediaRate="1"/>
</EditListBox>
</EditBox>
<MediaBox>
<BoxInfo Size="5807" Type="mdia"/>
<MediaHeaderBox CreationTime="0" ModificationTime="0" TimeScale="48000" Duration="697024" LanguageCode="und">
<BoxInfo Size="32" Type="mdhd"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</MediaHeaderBox>
<HandlerBox Type="soun" Name="SoundHandler" reserved1="0" reserved2="data:application/octet-string,000000000000000000000000">
<BoxInfo Size="45" Type="hdlr"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</HandlerBox>
<MediaInformationBox>
<BoxInfo Size="5722" Type="minf"/>
<SoundMediaHeaderBox>
<BoxInfo Size="16" Type="smhd"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</SoundMediaHeaderBox>
<DataInformationBox><BoxInfo Size="36" Type="dinf"/>
<DataReferenceBox>
<BoxInfo Size="28" Type="dref"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<URLDataEntryBox>
<!--Data is contained in the movie file-->
<BoxInfo Size="12" Type="url "/>
<FullBoxInfo Version="0" Flags="0x1"/>
</URLDataEntryBox>
</DataReferenceBox>
</DataInformationBox>
<SampleTableBox>
<BoxInfo Size="5662" Type="stbl"/>
<SampleDescriptionBox>
<BoxInfo Size="106" Type="stsd"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<MPEGAudioSampleDescriptionBox DataReferenceIndex="1" SampleRate="48000" Channels="2" BitsPerSample="16">
<BoxInfo Size="90" Type="mp4a"/>
<MPEG4ESDescriptorBox>
<BoxInfo Size="54" Type="esds"/>
<FullBoxInfo Version="0" Flags="0x0"/>
 <ES_Descriptor ES_ID="es2" binaryID="2" >
  <decConfigDescr>
   <DecoderConfigDescriptor objectTypeIndication="64" streamType="5" maxBitrate="384000" avgBitrate="235170" >
    <decSpecificInfo>
     <DecoderSpecificInfo type="auto" src="data:application/octet-string,%11%90%56%E5%00" />
    </decSpecificInfo>
   </DecoderConfigDescriptor>
  </decConfigDescr>
  <slConfigDescr>
   <SLConfigDescriptor >
    <predefined value="2" />
    <custom >
    </custom>
   </SLConfigDescriptor>
  </slConfigDescr>
 </ES_Descriptor>
</MPEG4ESDescriptorBox>
</MPEGAudioSampleDescriptionBox>
</SampleDescriptionBox>
<TimeToSampleBox EntryCount="2">
<BoxInfo Size="32" Type="stts"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<TimeToSampleEntry SampleDelta="1024" SampleCount="680"/>
<TimeToSampleEntry SampleDelta="704" SampleCount="1"/>
<!-- counted 681 samples in STTS entries -->
</TimeToSampleBox>
<SampleToChunkBox EntryCount="2">
<BoxInfo Size="40" Type="stsc"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<!-- counted 681 samples in STSC entries (could be less than sample count) -->
</SampleToChunkBox>
<SampleSizeBox SampleCount="681">
<BoxInfo Size="2744" Type="stsz"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</SampleSizeBox>
<ChunkOffsetBox EntryCount="679">
<BoxInfo Size="2732" Type="stco"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</ChunkOffsetBox>
</SampleTableBox>
</MediaInformationBox>
</MediaBox>
</TrackBox>
<UserDataBox>
<BoxInfo Size="98" Type="udta"/>
<MetaBox>
<BoxInfo Size="90" Type="meta"/>
<FullBoxInfo Version="0" Flags="0x0"/>
<HandlerBox Type="mdir" Name="" reserved1="0" reserved2="data:application/octet-string,6170706C0000000000000000">
<BoxInfo Size="33" Type="hdlr"/>
<FullBoxInfo Version="0" Flags="0x0"/>
</HandlerBox>
<ItemListBox>
<BoxInfo Size="45" Type="ilst"/>
<ToolBox value="Lavf57.25.100" >
<FullBoxInfo Version="0" Flags="0x1"/>
<BoxInfo Size="37" Type=".too"/>
</ToolBox>
</ItemListBox>
</MetaBox>
</UserDataBox>
</MovieBox>
</IsoMediaFile>
then we can see that:
  • video:
    moov.trak.edts.elst[0].media_time = 512
    moov.trak.mdia.mdhd.timescale=12800
    moov.trak.mdia.mdhd.duration=185600 (i.e. 14.5 secs)
  • audio:
    moov.trak.edts.elst[0].media_time = 1024
    moov.trak.mdia.mdhd.timescale=48000
    moov.trak.mdia.mdhd.duration=697024 (i.e. 14.5 secs + 1024 samples)
This confuses me, because it suggests that video playback starts at tick 512, i.e. 512/12800 = 0.04 s (40 ms) into the video stream, yet the encoded video stream is not longer by that amount and ffprobe clearly shows start_pts = 0.

1. What am I missing here? Am I looking at media time incorrectly, i.e. does it have some other meaning than I think it has?

I also have some bonus questions:

2. Isn't an AAC encoder required to produce full access units (typically having 1024 samples)? 697024/1024 = 680.6875 is not an integer.

3. I know that padding info (for start and end) can be stored within the ITUNSMPB tag, but ffmpeg is not using that, adhering (I hope so) to ISO only, so where is this tail padding stored? Or is moov.trak.mdia.mdhd.duration allowed to be lower than the real media duration (which would be divisible by 1024)?

4. If ffmpeg is using the ISO way of delaying AAC audio (instead of the iTunes way), then shouldn't it also add a sample group (sgpd) with roll distance set to -1, as an edit list (elst) alone is not enough for signaling encoder delay?

Ok, that's all for my first post on doom9.
Old 30th May 2016, 22:19   #2  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,263
Quote:
Originally Posted by przemoc View Post
This confuses me, because it suggests that video playback starts at tick 512, i.e. 512/12800 = 0.04 s (40 ms) into the video stream, yet the encoded video stream is not longer by that amount and ffprobe clearly shows start_pts = 0.
I don't know exactly what ffprobe's start_pts is. But moov.trak.edts.elst[0].media_time = 512 does not necessarily mean that the first video sample is actually delayed or skipped. I can't tell because your log is shortened, but it may simply be there to "counteract" a CTS of 512: 512 - 512 = 0. No delay, everything is fine. The edit list maps from the media timeline to the actual presentation timeline.
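To make that mapping concrete, here is a rough Python sketch for the simple case of a single edit at rate 1 with no empty edit in front of it. The first video CTS of 512 is an assumption (two frames of reordering at 256 ticks each would give that value); the posted dump filters out the CompositionOffsetEntry lines, so it cannot be confirmed from it. The function name is made up for illustration.

```python
from fractions import Fraction

def to_presentation(cts: int, media_time: int, timescale: int) -> Fraction:
    """Map a composition time on the media timeline to presentation time
    in seconds, for one edit at rate 1 with no preceding empty edit."""
    return Fraction(cts - media_time, timescale)

# ASSUMPTION: first video sample has CTS 512 (2 frames * 256 ticks);
# not visible in the shortened dump.
ASSUMED_FIRST_VIDEO_CTS = 2 * 256

video = to_presentation(ASSUMED_FIRST_VIDEO_CTS, media_time=512, timescale=12800)
audio = to_presentation(0, media_time=1024, timescale=48000)

print(float(video))            # 0.0
print(f"{float(audio):.6f}")   # -0.021333
```

If the assumption holds, both results would be consistent with the start_time values ffprobe reports (0.000000 for video, -0.021333 for audio).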


Quote:
Originally Posted by przemoc View Post
2. Isn't AAC encoder required to produce full access units (typically having 1024 samples)? 697024/1024 = 680.6875 is not an integer.
Good question. I'm not sure. It would make sense, but is it an actual requirement? Maybe it would make more sense with a complete log.
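For what it's worth, the stts box in the posted dump lists 680 samples with a delta of 1024 plus one final sample with a delta of 704. One possible reading is that the file holds 681 access units and only the container duration of the last one was shortened. A small Python sketch of that arithmetic, assuming each access unit decodes to 1024 samples:

```python
# ASSUMPTION: every AAC-LC access unit decodes to 1024 samples.
AAC_FRAME = 1024

# (SampleCount, SampleDelta) pairs from the audio track's stts box.
stts = [(680, 1024), (1, 704)]

frames = sum(count for count, _ in stts)
media_duration = sum(count * delta for count, delta in stts)
decoded_samples = frames * AAC_FRAME

print(frames, media_duration, decoded_samples, decoded_samples - media_duration)
# 681 697024 697344 320
```

Under that assumption the 697024 in mdhd is the sum of the stts durations rather than a count of decoded samples, which would explain why it is not a multiple of 1024.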

Quote:
Originally Posted by przemoc View Post
3. I know that padding info (for start and end) can be stored within the ITUNSMPB tag, but ffmpeg is not using that, adhering (I hope so) to ISO only, so where is this tail padding stored? Or is moov.trak.mdia.mdhd.duration allowed to be lower than the real media duration (which would be divisible by 1024)?
It should be implied by edit list duration.
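As a rough illustration with the numbers from the dump: the elst duration is expressed in the movie timescale (mvhd TimeScale=1000), while media_time is in the media timescale (48000). A Python sketch of the window the edit selects (variable names are made up for illustration):

```python
from fractions import Fraction

MOVIE_TIMESCALE = 1000    # mvhd TimeScale
MEDIA_TIMESCALE = 48000   # audio mdhd TimeScale

elst_duration = 14500     # in movie timescale units
elst_media_time = 1024    # in media timescale units
mdhd_duration = 697024    # in media timescale units

# Length of the edit converted to audio samples.
presented = int(Fraction(elst_duration, MOVIE_TIMESCALE) * MEDIA_TIMESCALE)
window_end = elst_media_time + presented

print(presented, window_end, window_end == mdhd_duration)
# 696000 697024 True
```

So the edit covers media samples 1024 up to 697024: the 1024 priming samples are skipped at the start, and anything decoded past the end of the window would not be presented.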

Quote:
Originally Posted by przemoc View Post
4. If ffmpeg is using ISO way of delaying AAC audio (instead of iTunes way), then shouldn't it also add sample group (sgpd) with roll distance set to -1, as edit list (elst) is not enough for signaling encoder delay?
Correct.
Old 1st June 2016, 00:38   #3  |  Link
przemoc
Registered User
 
Join Date: Sep 2010
Posts: 2
Thank you for your reply, sneaker_ger!

Full dump:
http://paste.przemoc.net/doom9/mp4-d...notes_info.xml