Which atoms need to be altered if the offset of the sound data may shift?

NCrusher74 · 9th September 2020, 04:02

The backstory: I'm working on a pure-Swift library for editing metadata and chaptering in mp4 audio files. I know there are other libraries out there that do this, but for my purposes (creating a suite of tools for comprehensive audiobook library management) they don't do what I need and I don't have the knowledge of C++ and/or Objective-C that I need to alter them so that they will.

I've only been learning Swift for about a year, and before that, the last coding I did was in high school in the early '90s (the language was BASIC). While AVFoundation's handling of metadata is ample for my purposes, I couldn't work out how to get chaptering working. After months of trying other tools that either wouldn't do what I needed, or would do it but couldn't be used from Swift without knowledge of other languages that I lacked, I decided I was spinning my wheels and it would be better to create my own library that I could then share with others who might find themselves in a similar situation.

I've been successful... up to a point. I have a library that works if the audio data is stored in consecutive chunks. The sample files I'd worked with while writing the code all had the chapter title data either at the beginning or at the end of the audio data, in either the same mdat atom or a separate one. For those files it was a simple matter to update the offsets in the stco atom.
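For the simple layout described above, where all the chapter title data sits before or after the audio, the update amounts to adding one constant delta to every chunk offset that lies past the edited region. Below is a minimal Swift sketch, assuming the stco (32-bit) or co64 (64-bit) entries have already been parsed into an array of `UInt64`. The name `shiftChunkOffsets` is hypothetical. Because chunk offsets are absolute file positions, the same shift would apply to every track's offset table, not only the audio track's.

```swift
/// Hypothetical helper: shifts every chunk offset at or after `editPosition`
/// (the first byte following the resized region) by `delta` bytes.
/// `delta` may be negative. Offsets before the edited region are untouched.
/// Note: when writing back to a 32-bit stco, each result must still fit in UInt32.
func shiftChunkOffsets(_ offsets: [UInt64],
                       atOrAfter editPosition: UInt64,
                       by delta: Int64) -> [UInt64] {
    return offsets.map { (offset: UInt64) -> UInt64 in
        guard offset >= editPosition else { return offset }
        // Go through Int64 so that a negative delta works.
        return UInt64(Int64(offset) + delta)
    }
}

// Example: a 40-byte title chunk at offset 1000 grows by 12 bytes,
// so the audio chunks after it (originally at 1040 and 5040) move 12 bytes later.
let shifted = shiftChunkOffsets([1_000, 1_040, 5_040], atOrAfter: 1_040, by: 12)
print(shifted) // [1000, 1052, 5052]
```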

But when I tried my library on a file that was chaptered using Chapter & Verse (which is widely used for creating m4b files for Librivox, the most comprehensive source of free audiobooks for works in the public domain) I discovered that some audiobook files are stored with the chapter title data interspersed with the audio data.

(Strangely, Chapter & Verse is reportedly built on MP4v2, yet some of my other sample files were created with different MP4v2-based apps, and those apps stored the audio data and the chapter title data in uninterrupted, consecutive chunks. I'm not sure why C&V is different in that regard.)

This makes it significantly more challenging to update the offsets if the length of the chapter title strings changes (say, someone prefers "Chapter 03" to "Chapter Three", or wants to add or remove a chapter subtitle that's included in the string). Instead of every offset changing by the same amount, the chunks that make up one chapter's audio data may shift by a different amount than the chunks that make up another chapter's audio data.
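When title samples are interspersed with the audio, each chunk moves by the total size change of all the edited regions that come before it in the file, so the shift is cumulative rather than constant. Here is a sketch of that bookkeeping in Swift, assuming each resized title sample is described by its original file position plus its old and new lengths. `ResizedRegion` and `remapChunkOffsets` are hypothetical names, not part of any existing library.

```swift
/// Hypothetical description of one edited region: the bytes
/// [start, start + oldLength) are replaced by newLength bytes.
struct ResizedRegion {
    let start: UInt64
    let oldLength: UInt64
    let newLength: UInt64
    var delta: Int64 { Int64(newLength) - Int64(oldLength) }
}

/// Each chunk moves by the sum of the deltas of every region that ends
/// at or before the chunk's original offset. A resized chunk itself only
/// moves by the regions in front of it. Offsets from any track can be
/// passed in, since they are all absolute file positions.
func remapChunkOffsets(_ offsets: [UInt64], regions: [ResizedRegion]) -> [UInt64] {
    return offsets.map { (offset: UInt64) -> UInt64 in
        var shift: Int64 = 0
        for region in regions where region.start + region.oldLength <= offset {
            shift += region.delta
        }
        return UInt64(Int64(offset) + shift)
    }
}

let regions = [
    ResizedRegion(start: 0, oldLength: 10, newLength: 15),   // title 1 grows by 5
    ResizedRegion(start: 110, oldLength: 10, newLength: 7),  // title 2 shrinks by 3
]
// File order: title 1 (0), audio (10), title 2 (110), audio (120).
print(remapChunkOffsets([0, 10, 110, 120], regions: regions)) // [0, 15, 115, 122]
```

In the example, the first audio chunk moves 5 bytes later, while the second moves only 2, because it sits after both edits.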

It would be difficult to calculate these shifts accurately (at least at my current skill level and understanding of the atom structure), especially with so many other variables at play. Really, the most straightforward approach would be to restructure how the data is stored, so that the chapter title data sits in uninterrupted, consecutive chunks and the audio data does too.
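Restructuring into consecutive runs means the new offsets no longer depend on the old ones: each chunk's offset is simply the running total of the sizes of the chunks written before it. A minimal sketch, assuming the chunk sizes are already known and listed in the same order as the track's stco/co64 entries. `packedOffsets` is a hypothetical helper, and the payload start of 48 in the example is an arbitrary value chosen for illustration.

```swift
/// Hypothetical helper: returns the offset each chunk gets when the chunks
/// are written back to back, in the order given, starting at `payloadStart`
/// (for example, the first byte after the mdat header).
func packedOffsets(chunkSizes: [UInt64], startingAt payloadStart: UInt64) -> [UInt64] {
    var offsets: [UInt64] = []
    offsets.reserveCapacity(chunkSizes.count)
    var cursor = payloadStart
    for size in chunkSizes {
        offsets.append(cursor)   // this chunk starts where the previous one ended
        cursor += size
    }
    return offsets
}

// Example: three audio chunks, then two chapter-title chunks placed after them.
let audioSizes: [UInt64] = [100, 200, 50]
let titleSizes: [UInt64] = [10, 12]
let mdatPayloadStart: UInt64 = 48   // hypothetical position
let audioOffsets = packedOffsets(chunkSizes: audioSizes, startingAt: mdatPayloadStart)
let titleOffsets = packedOffsets(chunkSizes: titleSizes,
                                 startingAt: mdatPayloadStart + audioSizes.reduce(0, +))
print(audioOffsets) // [48, 148, 348]
print(titleOffsets) // [398, 408]
```

Keeping each track's chunks in their original stco order means the stsc and stsz tables for that track still describe the same chunks.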

But I'm missing something, and I'm not sure what.

Using the data in the stsc atom, I'm able to get the byte count of each chunk by adding together the byte counts of the samples it contains. Combined with the offsets in the co64/stco atom, that lets me isolate each chunk's data, recombine everything into a single consecutive block of chunks, and edit the offsets. However, some apps (most notably Apple Books) will only play the first 23 seconds or so of the audio afterward, out of a 17-and-a-half-minute file. Others (for example, Fission) will play all of it. Some, like VLC, will report that the file is only 23 seconds long but will play the whole thing anyway.
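The chunk-size calculation described above can be sketched like this, assuming the stsc entries and the stsz sample sizes have already been parsed (the type and function names are hypothetical). stsc is run-length encoded: each entry applies from its `firstChunk` up to the chunk before the next entry's `firstChunk`, and the last entry runs through the final chunk. stsz either lists one size per sample or, when its sample-size field is non-zero, gives a single size shared by every sample.

```swift
/// One entry of the stsc (sample-to-chunk) table. Chunk numbers are 1-based.
struct SampleToChunkEntry {
    let firstChunk: UInt32
    let samplesPerChunk: UInt32
    let sampleDescriptionIndex: UInt32
}

/// Expands the run-length-encoded stsc table into one sample count per chunk.
/// `chunkCount` is the number of entries in the track's stco/co64 table.
func expandSamplesPerChunk(_ stsc: [SampleToChunkEntry], chunkCount: Int) -> [Int] {
    var result: [Int] = []
    result.reserveCapacity(chunkCount)
    for (i, entry) in stsc.enumerated() {
        let first = Int(entry.firstChunk)
        // A run ends just before the next entry's first chunk, or at the last chunk.
        let last = i + 1 < stsc.count ? Int(stsc[i + 1].firstChunk) - 1 : chunkCount
        if first <= last {
            for _ in first...last { result.append(Int(entry.samplesPerChunk)) }
        }
    }
    return result
}

/// Sums stsz sample sizes chunk by chunk. A non-zero `constantSampleSize`
/// (the stsz sample_size field) means every sample has that size and the
/// per-sample table is empty.
func chunkByteSizes(samplesPerChunk counts: [Int],
                    constantSampleSize: UInt32,
                    sampleSizes: [UInt32]) -> [UInt64] {
    var sizes: [UInt64] = []
    var sampleIndex = 0
    for count in counts {
        var total: UInt64 = 0
        for _ in 0..<count {
            let size = constantSampleSize != 0 ? constantSampleSize : sampleSizes[sampleIndex]
            total += UInt64(size)
            sampleIndex += 1
        }
        sizes.append(total)
    }
    return sizes
}

let stsc = [
    SampleToChunkEntry(firstChunk: 1, samplesPerChunk: 3, sampleDescriptionIndex: 1),
    SampleToChunkEntry(firstChunk: 3, samplesPerChunk: 2, sampleDescriptionIndex: 1),
]
let counts = expandSamplesPerChunk(stsc, chunkCount: 4)
print(counts) // [3, 3, 2, 2]
let sizes = chunkByteSizes(samplesPerChunk: counts, constantSampleSize: 0,
                           sampleSizes: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(sizes) // [6, 15, 15, 19]
```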

I figure I can't have broken things too badly if the audio plays at all, even if it doesn't play in its entirety. I suspect that if I mapped out the duration of the part that does play, the point where playback stops would turn out to be where it transitions to a new chunk, perhaps? The sample sizes should not have changed, nor the number of samples per chunk. All that should have changed is the offsets.
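One way to test the assumption that only the offsets changed is to check the rewritten layout for self-consistency: every chunk of every track should fall inside the mdat payload, and no two chunks should overlap. This is a sketch under those assumptions (`ChunkPlacement` and `layoutProblems` are hypothetical names). Passing the check doesn't prove the file is correct, but a failure would point at the relocation step rather than at some other atom.

```swift
/// Hypothetical record of where a chunk ends up after the rewrite.
struct ChunkPlacement {
    let offset: UInt64
    let size: UInt64
    var end: UInt64 { offset + size }
}

/// Returns a description of each problem found (1-based chunk numbers,
/// matching stsc numbering). An empty array means no problems were found.
func layoutProblems(_ chunks: [ChunkPlacement], mdatPayload: Range<UInt64>) -> [String] {
    var problems: [String] = []
    // Every chunk must lie entirely inside the mdat payload.
    for (i, chunk) in chunks.enumerated() where chunk.offset < mdatPayload.lowerBound || chunk.end > mdatPayload.upperBound {
        problems.append("chunk \(i + 1) [\(chunk.offset), \(chunk.end)) lies outside mdat")
    }
    // Sorted by offset, each chunk must end at or before the next one starts.
    let order = chunks.indices.sorted { chunks[$0].offset < chunks[$1].offset }
    for (a, b) in zip(order, order.dropFirst()) where chunks[a].end > chunks[b].offset {
        problems.append("chunk \(a + 1) overlaps chunk \(b + 1)")
    }
    return problems
}

let placements = [
    ChunkPlacement(offset: 48, size: 100),
    ChunkPlacement(offset: 140, size: 50),  // starts 8 bytes too early
]
print(layoutProblems(placements, mdatPayload: 48..<198)) // ["chunk 1 overlaps chunk 2"]
```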

Is there an atom I'm missing that may be causing this issue, or an error I may be making in how I'm approaching it?
