Old 30th July 2011, 21:00   #1  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
NeroAACEnc Multi-Channel Efficiency Puzzle

I have been using neroaacenc to transcode AC3 files (after first decoding with avisynth/NicAudio) to AAC format for years, so I have a bit of experience with the format and its typical efficiency, but the following I do not understand. I typically use neroaacenc quality settings in the 0.3 to 0.4 range.

1. AAC seems to be substantially more efficient than AC3. Stereo AC3 files typically compress to stereo AAC files about half the size.

2. For most material, I also downmix 5.1 AC3s to stereo AACs. That of course makes the size difference even larger. A typical stereo AAC is only about one third the size of the source 5.1 AC3.

3. For high-def video, where the video bit rate is so high anyway, I'm seriously considering stopping downmixing and taking the hit on larger resulting 5.1 AACs.

The problem is: 5.1 AACs are only slightly (~10%) smaller than the source 5.1 AC3s. This seems completely inconsistent with the much higher efficiency observed when encoding stereo AC3s to stereo AACs (a reduction in size of about 50%, see above).

Before you respond along the lines of "You idiot! Of course 5.1 AACs must be larger than stereo AACs! They carry additional channels and therefore more information!" please let me reassure you that I understand that.

What puzzles me is the difference between AAC's efficiency in 5.1 (where it hardly beats AC3) and in stereo (where it achieves a 50% or so improvement). In fact, I would have thought that the additional surround channels would, for most scenes in most content, be very quiet or highly redundant with the stereo channels or (for the subwoofer channel) only cover very low frequencies (meaning little data), so adding them to a well-designed compressor which takes advantage of all this redundancy would mean a relatively small size hit.

Instead, it is almost as if all the channels were encoded independently and encoding 5.1 channels takes just two or three times as much space as encoding 2 channels in AAC.

Can anybody explain this to me? Is it a general feature of AAC or just of neroaacenc? Or perhaps typical stereo AC3s are encoded at an excessively high bit rate, which AAC can squeeze out easily, but typical surround AC3s are not? Or perhaps the quality setting in neroaacenc just gives too many bits to multichannel audio--rather than achieving the same perceived sound quality as it would at stereo with the same -q setting? Or something else?
Old 30th July 2011, 21:14   #2  |  Link
mariush
Registered User
 
Join Date: Dec 2008
Posts: 589
Hmmm....

AAC compression is lossy. Depending on the quality factor, it filters the signal and removes the lows and highs so that as much information as possible can be retained at that bitrate.

For the main stereo signal, where there's not much bass or treble in the first place, the channels have a lot of redundancy, so for that quality preset the encoder may be able to retain a lot at a small bitrate: let's say 96-112 kbps on average, compared to 192 kbps for AC3 stereo.
The back speaker signal should be about the same, with some redundancy and less bass or treble.
But the bass and center speakers are paired in the last "stereo" pair and they have nothing in common. The encoder probably uses A LOT of bitrate to retain the bass (for the quality preset you use) and an average amount of bitrate for the center speaker, but as there are no similarities between the channels, it all adds up. I'd guess that instead of 96-112 kbps, you probably average 160-224 kbps for that pair at that quality preset.
So what you gain on the front and back channels you lose on the bass and center channels.
Old 30th July 2011, 23:18   #3  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
You need to compare apples with apples.

You don't say what the original AC3 files are. Most commercial AC3 encodings are very generous bit-rate wise, so there is lots of fat for neroaacenc to trim, hence a high apparent efficiency in your transcode.

I think what you are actually measuring is how efficiently compressed the original AC3s were. If the original bit-rate was over-generous then you get a big win. If the original bit-rate was a little squeezed then you gain less.
Old 31st July 2011, 00:38   #4  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Thanks for your responses and insights, mariush and IanB.

The AC3 files I'm compressing are overwhelmingly extracted 5.1 audio streams from my DVD collection or captured broadcasts. A typical bit-rate is 384 kbps.

As for an apples-to-apples comparison, the closest thing I can come to is taking a single 5.1 AC3 file of, e.g., a typical 2 hour movie. If I downmix to two channels with NicAudio as part of the AC3 -> WAV step, the resulting file with "neroaacenc -lc -q 0.40" will typically be about 35% the size of the original AC3. If I do not downmix, but use otherwise identical settings and materials, the resulting AAC file will be about 90% the size of the original AC3.
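To put those percentages into bit rates, here is a rough back-of-the-envelope sketch in Python. It assumes the typical 384 kbps source rate and the approximate 35% and 90% size ratios quoted above, and that file size scales linearly with average bit rate for a fixed duration. The function name is made up for illustration.

```python
def implied_bitrate_kbps(source_kbps: float, size_ratio: float) -> float:
    """Average bit rate of a transcode whose file is size_ratio times the source size.

    The duration is unchanged, so size scales linearly with average bit rate.
    """
    return source_kbps * size_ratio


source_kbps = 384.0  # typical 5.1 AC3 rate quoted above

stereo_kbps = implied_bitrate_kbps(source_kbps, 0.35)    # downmixed to 2 channels
surround_kbps = implied_bitrate_kbps(source_kbps, 0.90)  # kept as 5.1

print(f"stereo AAC: ~{stereo_kbps:.0f} kbps")
print(f"5.1 AAC:    ~{surround_kbps:.0f} kbps")
print(f"5.1 / stereo: {surround_kbps / stereo_kbps:.2f}x")
```

Under those assumptions the 5.1 encode costs roughly 2.6 times as much as the stereo one, which is the gap in question.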

Obviously these percentages will vary from source to source, but they are remarkably consistent, within +-5%, across the dozens of transcodes for which I've made the comparison. This consistent differential still seems excessively large.
Old 31st July 2011, 11:17   #5  |  Link
shon3i
BluRay Maniac
 
Join Date: Dec 2005
Posts: 2,419
@CarlEdman, well, you are comparing pure VBR with CBR. If you want a smaller size, then encode with ABR/CBR in AAC. For example, 160 kbps HE-AAC gives a good balance for 5.1 encoding and will be similar to 448 kbps AC3 and 1536 kbps DTS.
__________________
ChapterGen - manipulate with chapters in various i/o formats, with CLI support
Official website or Doom9 thread
Old 31st July 2011, 14:07   #6  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
@shon3i I know that I can always reduce the size of the output at the expense of quality by tuning down the '-q' setting.

But for almost all purposes, I very much prefer to target a quality level I experience as transparent, rather than a fixed file size or bit rate for either audio or video. All my encodes end up on my 10 TByte NAS and I really do not care whether any particular encode of a 2 hour movie is twice as large as that of another 2 hour movie. As long as the encodes are transparent and do not waste space, I'm happy.

For some purposes (e.g., archiving to a small fixed-size storage medium or limited-bandwidth distribution), I can understand why people still use CBR/ABR modes for audio and video. But unless your application falls into one of those specialized categories, there is really no good reason not to go VBR for both and at least three good reasons to: (1) your encoding will be simplified, typically single-step rather than two-step; (2) you avoid the risk of an unacceptably low-quality encode; (3) you avoid wasting space.

But, in my experience, most people still using CBR/ABR encoding have no good reason. They just have always done it that way--for example, because there once was a good reason or because VBR was not available yet--and are too intellectually lazy to switch to VBR.
Old 31st July 2011, 15:41   #7  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Very interesting topic you brought up. I'm by no means an expert on audio topics, but I made a very small test:

I created three AAC files from a 90-second stereo music file using neroaacenc 1.5.4.
1.) just stereo, i.e. two channels
2.) a 5.1 file, with the stereo on front left and front right channels, all other channels (center, LFE, surround) digital silence
3.) a 5.1 file, with the stereo on front and surround left and front and surround right, other channels (center, LFE) digital silence

File sizes:
1.) 1.46 MB
2.) 1.54 MB
3.) 2.94 MB

Sample 3 has almost exactly double the size of sample 1. It seems that neroaacenc doesn't do any "coupling" between front and surround channels, so that would confirm your assumption. I have no idea why sample 2 has more "overhead" from the silent channels, than sample 3.
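For reference, the ratios behind that observation can be worked out with a few lines of Python. This is only arithmetic on the three file sizes reported above; the dictionary labels are shorthand for the three samples.

```python
# File sizes in MB as reported in the test above.
sizes_mb = {
    "stereo": 1.46,
    "5.1, front only": 1.54,
    "5.1, front + surround": 2.94,
}

baseline = sizes_mb["stereo"]

# Size of each sample relative to the plain stereo encode.
ratios = {label: size / baseline for label, size in sizes_mb.items()}

for label, ratio in ratios.items():
    overhead = (ratio - 1.0) * 100.0
    print(f"{label:24s} {sizes_mb[label]:.2f} MB  {ratio:.2f}x  ({overhead:+.0f}%)")
```

The four silent channels in sample 2 cost about 5%, while duplicating the stereo pair into the surrounds in sample 3 costs about 101%, i.e. essentially a second full stereo encode.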
Old 31st July 2011, 15:53   #8  |  Link
shon3i
BluRay Maniac
 
Join Date: Dec 2005
Posts: 2,419
Quote:
Originally Posted by sneaker_ger
It seems that neroaacenc doesn't do any "coupling" between front and surround channels, so that would confirm your assumption. I have no idea why sample 2 has more "overhead" from the silent channels, than sample 3.
I noticed that from the first day Nero became free, and these days I always get better results with CT AAC.
Old 31st July 2011, 17:47   #9  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Thanks to sneaker_ger for confirming one of my suspicions!

Quote:
Originally Posted by shon3i View Post
I noticed that from the first day Nero became free, and these days I always get better results with CT AAC.
That is very interesting. I always thought NeroAACenc was the best widely available AAC encoder. Was I wrong? Is CT AAC cleverer and available? Or is something else I should switch to?

Speed is not an important factor for me; audio encode time is always trivial compared to video encode time.
Old 31st July 2011, 17:52   #10  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Can someone confirm that the quality remains constant? Maybe the bitrate allocation changes with increasing channel count, i.e. increased quality for constant "q"? I tried to rule that out with sample 2 and sample 3 seems to indicate that this is not the case, but I fail at listening tests, so I cannot make 100% sure.
Old 31st July 2011, 19:09   #11  |  Link
shon3i
BluRay Maniac
 
Join Date: Dec 2005
Posts: 2,419
Quote:
Originally Posted by CarlEdman View Post
Thanks to sneaker_ger for confirming one of my suspicions!



That is very interesting. I always thought NeroAACenc was the best widely available AAC encoder. Was I wrong? Is CT AAC cleverer and available? Or is something else I should switch to?

Speed is not an important factor for me; audio encode time is always trivial compared to video encode time.
My statement applies to multichannel encoding, e.g. 5.1 or 7.1. CT AAC has only ABR rate control, but compared to Nero it generally gives more respect to all channels. You think Nero is the best because there are tests which prove that, but only for stereo and usually at low bitrates, e.g. 48-64 kbps; nobody tests multichannel encoding. IIRC iTunes (qtaac) AAC currently holds first place in the latest AAC test (in stereo), and it has TrueVBR rate control, which should be better for your needs. I haven't tested it against CT or Nero in multichannel.
Old 31st July 2011, 19:47   #12  |  Link
b66pak
Registered User
 
Join Date: Aug 2008
Location: The Land Of Dracula (Romania - EU)
Posts: 934
Quote:
Originally Posted by sneaker_ger View Post
Very interesting topic you brought up. I'm by no means an expert on audio topics, but I made a very small test:

I created three AAC files from a 90-second stereo music file using neroaacenc 1.5.4.
1.) just stereo, i.e. two channels
2.) a 5.1 file, with the stereo on front left and front right channels, all other channels (center, LFE, surround) digital silence
3.) a 5.1 file, with the stereo on front and surround left and front and surround right, other channels (center, LFE) digital silence

File sizes:
1.) 1.46 MB
2.) 1.54 MB
3.) 2.94 MB

Sample 3 has almost exactly double the size of sample 1. It seems that neroaacenc doesn't do any "coupling" between front and surround channels, so that would confirm your assumption. I have no idea why sample 2 has more "overhead" from the silent channels, than sample 3.
Same test using 300 seconds of music encoded with qaac (QT 7.6.9), tvbr 127 and best quality:

1.) 13,160,646 bytes
2.) 13,355,252 bytes
3.) 26,357,495 bytes
__________________
if you ask a question and somebody give you the correct answer don't forget to leave a "thank you" note...
Visit The Land Of Dracula (Romania - EU)!

Last edited by b66pak; 2nd August 2011 at 19:21.
Old 1st August 2011, 14:56   #13  |  Link
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,821
Quote:
Originally Posted by CarlEdman View Post
For high-def video, where the video bit rate is so high anyway, I'm seriously considering stopping downmixing and taking the hit on larger resulting 5.1 AACs.
Personally, (and I store video in much the same way as you) I stopped taking any quality hit at all and just keep the original 5.1 AC3. I only ever convert DTS to AAC.

Not that it answers your question, but I tried for myself just out of interest. I started with a DTS audio track from a BluRay.
1 hour, 44 minutes long, 1.1GB.

Encoding to AC3 using AFTEN.
5.1 448Kb/s, 335MB
5.1 384Kb/s, 287MB
Stereo 192Kb/s, 144MB

Encoding to AAC using neroaacenc

q0.5, 5.1, 311MB
q0.5, Stereo, 125MB

q0.4, 5.1, 230MB
q0.4, Stereo, 91MB

q0.3, 5.1, 155MB
q0.3, Stereo, 62MB

I always use q0.5 when encoding AAC, so it's obvious why I just keep the AC3 audio. The AAC encode doesn't always end up larger than the 384k AC3 at q0.5; sometimes it's a little smaller. I guess it depends on the contents of the audio track, in which case the size of the q0.4 and q0.3 encodes would also be reduced.

All the above was converted using foobar2000 and its own 5.1 to stereo plugin where appropriate.

A stereo 192k AC3 file is about half the size of a 5.1ch 384k AC3, but I assume it's using a fixed bitrate for each channel and I assume the front channels must get more bits?
For AAC, as it's VBR the difference between the size of the stereo encode and the 5.1 encode is probably proportional to the amount of activity in the other four channels.

Still, in this case, compared with 5.1ch 384k AC3, 5.1ch q0.4 offered about a 20% reduction in file size, while q0.3 reduced the file size by about 46%.
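Those percentages, and the average bit rates they imply, can be checked with a short Python sketch. The sizes and the duration come from the figures above; the helper names are invented, and treating "MB" as 2**20 bytes is an assumption, since the post doesn't say which unit foobar2000 or the file manager reported.

```python
DURATION_S = (1 * 60 + 44) * 60  # 1 hour 44 minutes, as stated above
AC3_384_MB = 287                 # size of the 5.1 384 kb/s AC3 encode

# neroaacenc 5.1 results from the list above: -q value -> size in MB
aac_51_mb = {0.5: 311, 0.4: 230, 0.3: 155}


def reduction_percent(size_mb: float, reference_mb: float) -> float:
    """Size reduction relative to the reference; negative means the file grew."""
    return (1.0 - size_mb / reference_mb) * 100.0


def average_kbps(size_mb: float, duration_s: int) -> float:
    """Average bit rate, ASSUMING 'MB' means 2**20 bytes (not confirmed in the post)."""
    return size_mb * 2**20 * 8 / duration_s / 1000.0


for q, size in aac_51_mb.items():
    print(f"q{q}: {reduction_percent(size, AC3_384_MB):.1f}% smaller than the AC3, "
          f"~{average_kbps(size, DURATION_S):.0f} kbps average")
```

As a sanity check on the unit assumption, the 287MB AC3 file works out to roughly 386 kbps this way, which is close to its nominal 384 kb/s.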

Last edited by hello_hello; 2nd August 2011 at 09:13.
Old 1st August 2011, 16:24   #14  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
Quote:
Originally Posted by hello_hello View Post
Personally, (and I store video in much the same way as you) I stopped taking any quality hit at all and just keep the original 5.1 AC3. I only ever convert DTS to AAC.
That is a very good thought and I say that not just because I've had the same idea, at least where the video bit rate is huge anyway (like much HD content) or I really care about the sound quality (like musical or opera films).

What has stopped me from doing this is that I use mp4 as a container format and, while I believe that AC3-in-MP4 was approved a while ago, I worry about hardware player compatibility. I'm sure VLC wouldn't have a problem, but what about the networked video streamers I keep around the house? Or even worse, given how resistant Apple can be to accepting industry standard updates, what about the various iPads/iPhones on which my wife and kids love to carry around and watch movies?

So does anybody have a feeling for how widely AC3-in-MP4 has actually been accepted?
Old 1st August 2011, 16:43   #15  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Quote:
Originally Posted by CarlEdman View Post
So does anybody have a feeling for how widely AC3-in-MP4 has actually been accepted?
It has not (from my subjective feeling).

Another problem that often gets forgotten: while many devices support aac-in-mp4, most of them force a downmix to stereo, because of licensing issues (IIRC). And you cannot bitstream AAC.
So if you're going for multi-channel home cinema I'd rather use Matroska with AC3 or DTS.
Old 2nd August 2011, 00:08   #16  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
No question, Quality constrained, Variable Bitrate encoding gives the best available audio quality for the total space used. And all the posts in this thread keep confirming this.

For Quality constrained, Variable Bitrate encoding the rules are simple: use as many or as few bits as required to encode this audio while maintaining a psycho-acoustic quality of the specified value. If a given 1 second of audio needs 1024kb/s then use 1024 kilobits. If another 1 second can be faithfully compressed at 4kb/s then just use 4 kilobits. This is what -q <value> encoding with AAC is all about.

With CBR encoding the rules are very different. Here the goal is to make sure that the sustained bitrate is maintained within the declared buffer size. The rules are imposed for some bandwidth limited element in the distribution path, maybe a DVB channel allocation, maybe the minimum transfer speed of domestic DVD players, maybe the speed of your internet connection. If a given 1 second of audio has to be squeezed to q=0.01 to fit into a 64kb/s stream then the sound gets mashed, no choice. If another 1 second can be faithfully compressed at q=1.0 and still fit into the 64kb/s stream then we have good luck and the stream possibly gets some padding bits added.

When recompressing a CBR constrained stream as Quality constrained you need to consider the psycho-acoustic quality of each second of audio. If the CBR rate control function deemed that a section of audio needs q=0.2, then that audio is mashed forever; the VBR encoder happily and easily encodes the simplified audio, effectively at q=0.2. However, if the CBR rate control function deemed that a section of audio can be encoded at q=0.7 but the VBR encoder has been told to do q=0.3, the quality of that section will be reduced to q=0.3. So what you get is a variable quality stream with a maximum q value as specified but lower values where the CBR rate control needed to squeeze harder. So the comparisons are a little unfair.
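A toy model of that argument, in Python. It assumes that the quality of each second can be summarised by a single q-like number and that a second-generation encode never exceeds the quality of its source. Real encoders and psycho-acoustic models are far messier, and the per-second values and the function name below are invented for illustration.

```python
def transcode_quality(source_q: list[float], target_q: float) -> list[float]:
    """Per-second quality after re-encoding a CBR stream at a fixed VBR quality.

    Each second ends up at the lower of what the CBR encoder left behind
    and what the VBR encoder was asked to deliver.
    """
    return [min(q, target_q) for q in source_q]


# Invented per-second quality levels that the CBR rate control settled on.
cbr_source = [0.7, 0.2, 0.5, 0.9, 0.25]

result = transcode_quality(cbr_source, target_q=0.3)
print(result)  # [0.3, 0.2, 0.3, 0.3, 0.25]
```

The seconds that the CBR encoder had to squeeze below the target stay below it, and everything else is capped at the target.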

But as I said earlier commercial AC3 streams are very generous there is lots of fat to trim.


Also interesting observation that nero doesn't find good correlation in 5.1 streams. It's a very difficult problem because the ambience sound is delayed from the initial sound so you need to correlate across a wide time, maybe wider than duration of a compressed audio frame.
Old 2nd August 2011, 09:12   #17  |  Link
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,821
Just out of curiosity, after I encoded the DTS file in my previous post to AC3 and AAC, I encoded the AC3 files to AAC too. I wanted to see if going from AC3 to AAC would produce drastic file size differences compared with DTS to AAC. It didn't, which is why I didn't mention it in my previous post. The file sizes were a little different though. Pretty much all the AC3 to AAC encodes resulted in a larger file than the DTS to AAC encodes. Not by much; I think the maximum increase was 20MB, and I may even be remembering that incorrectly. They did all increase by a little though.
Old 2nd August 2011, 09:21   #18  |  Link
hello_hello
Registered User
 
Join Date: Mar 2011
Posts: 4,821
Quote:
Originally Posted by CarlEdman View Post
What has stopped me from doing this is that I use mp4 as a container format and while I believe that AC3-in-MP4 was approved a while ago, I worry about hardware player compatibility.
I can't really answer the questions as I rarely use MP4 and I don't have any hardware compatibility issues to worry about. Well if I do have them, at least when it comes to phones and ipads, I still don't worry about it.
Old 5th August 2011, 23:47   #19  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Another element to my discussion above is the different psycho-acoustic models used by the various lossy compressions. From what hello_hello seems to be saying, it seems that encoding to AC3 first makes the subsequent AAC compression less effective. This would seem to indicate that the two psycho-acoustic models are fighting each other. But remember that DTS is also a lossy compression, just not by very much. It is hard to make genuine comparisons. Sources from a DVD or BD will all have been individually compressed from an original lossless master.

So in posts above we have :-
  • Master -> DTS -> AAC
  • Master -> AC3 -> AAC
  • Master -> DTS -> AC3 -> AAC
What we need is
  • Master -> AAC
Also we do not know the DTS or AC3 encoders or the settings used to create the commercial media.
Old 6th August 2011, 19:11   #20  |  Link
CarlEdman
Registered User
 
Join Date: Jan 2008
Posts: 185
@IanB

Agreed with everything, except for referring to the master as "lossless." The original audio and video signals are in the analogue domain with near-infinite band-width. Any recording, even a digital "master" which doesn't do anything but separately record every pixel for every frame and every audio amplitude for every channel at a high sample rate, is lossy. Both the digital audio and video stream will have both sample rate losses (i.e., sample rate for audio, pixel resolution and frame rate for video) and quantization losses (i.e., the bit-depth of the recorded audio and video).

Now these sampling losses and quantization losses will hopefully have been chosen so that they are outside of the limits of human perception. But in principle, that is all that "lossy" compression does too--throwing away material that is below the level of human perception. All that so-called "lossy" compression does is try to be cleverer about human perception than "lossless" recording which only does simple sampling and quantization.
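To make the quantization point concrete, here is a small Python sketch that measures how much error plain rounding to a fixed bit depth adds to a full-scale sine wave. It is only an illustration under simple assumptions (no dither, a 997 Hz test tone at 48 kHz, one second of samples), and the function names are made up.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest level of a signed `bits`-bit grid."""
    scale = 2 ** (bits - 1) - 1
    return round(sample * scale) / scale


def snr_db(bits: int, n: int = 48_000, freq_hz: float = 997.0,
           rate_hz: float = 48_000.0) -> float:
    """Signal-to-noise ratio of a full-scale sine after quantization alone."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2.0 * math.pi * freq_hz * i / rate_hz)
        err = quantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10.0 * math.log10(signal_power / noise_power)


for bits in (8, 16, 24):
    print(f"{bits:2d}-bit: SNR ~ {snr_db(bits):.1f} dB")
```

The error never reaches zero; each extra bit only pushes it down by about 6 dB, which is the sense in which even a "lossless" master is a choice about which losses are acceptable.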

But there is no principled distinction between the process by which a master is recorded and the later compression stages.

Sorry for the rant. I expect you were aware of that, and the distinction between naive so-called "lossless" recording and clever lossy compression, while only one of degree and cleverness of algorithm, is nevertheless useful for some practical purposes. I just get annoyed at people who fetishize losslessness--everything is lossy and we just have to learn to make the best (i.e., least perceptible) of it.