Old 3rd March 2022, 01:04   #1  |  Link
BlueSwordM
Encoder tuning Part 4: A 2nd generation guide to aomenc-av1, institutional knowledge

So, this is a follow-up to the 2nd part guide regarding aomenc-av1, which can be found here:

https://old.reddit.com/r/AV1/comment...cav1libaomav1/

While that guide still holds up for the most part at first glance,
I've learned a lot about the pseudo-reference AV1 encoder, its options, its intricacies, and best of all, its shortcomings.

It means I now understand a lot more about the options themselves: what they do, how to take advantage of them, and when to actually use them.
I've also learned how to get around their downsides through some clever option choices, and even through a custom WIP build that addresses aomenc-av1's greatest weakness:
a surprising lack of deep psycho-visual optimizations (intra-only coding has a nice number of them, but barely any have video coding versions).

Before I begin, I have to add that this is not comprehensive documentation. A simple Reddit forum post is far too small for such a massive endeavour, so a separate post will be done with an entry on a dedicated Wiki of some sort to explain what each and every option does in detail, including speed-features and their explanations.

Now, to get on to the main subject of the post itself: the 2nd generation tuning guide for aomenc-av1!

Encoder speed preset

The encoder preset itself:
Code:
--cpu-used=X
For VOD purposes, this ranges from 0 (abominably slow) to 6 (decently fast) in the good preset.
For realtime purposes like streaming, the RT presets range from 5 to 10, with 5 being the slowest RT preset and 10 being the fastest.

For reference, the default is 0. Not exactly optimal...

My general recommendation for choosing what preset to utilize is based on speed, usability and quality.
In that context, all realtime presets are off the table because of their low single-instance speed/quality ratio, at least until aomenc gets frame-threading merged into the mainline build; in that sense, you are better off using SVT-AV1 right now.

Otherwise, my general recommendation is in the middle: CPU-2 being the lowest preset I'd recommend actually using, CPU-3 being a good middle ground in general since it keeps most of the juicy features on.

CPU-4 is good for those wanting faster encoding than CPU-3 while not losing much. CPU-5 is where tradeoffs start getting a bit more severe because of pruning and the disabling of features (particularly loop restoration filtering).
CPU-6 is the fastest I'd go utilizing aomenc. Any faster today, and going with SVT-AV1 is a better tradeoff.

General recommendations: `--cpu-used=2` for slow encoding, `--cpu-used=3` as the middle ground, and `--cpu-used=5` as the fast option.

Keyframe refresh intervals

Code:
--kf-max-dist=240 --kf-min-dist=12
--kf-max-dist dictates the maximum distance between statically placed keyframes (as in, keyframes not placed by the scene-detection algorithms).
For seeking purposes in most content, the standard recommendation is 10 seconds worth of frames, with 300 frames usually being the upper limit to keep good seeking performance.

So, my recommendations would be 240 frames for 24FPS, 250 frames for 25FPS, and 300 frames for >=30FPS content.

As for kf-min-dist, it is the minimum number of frames that must pass before a keyframe can be placed. This is mainly a safeguard in case the scene-detection fails to insert intra-refreshes, or fails to detect flashes and places unnecessary keyframes all over the place.
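
As a rough illustration of the 10-second rule above, here is a minimal Python sketch. The function name and its parameters are made up for this example; only the two aomenc flags and the 240/250/300 values come from the guide.

```python
def keyframe_flags(fps: float, seconds: float = 10.0, cap: int = 300,
                   min_dist: int = 12) -> list[str]:
    """Build the keyframe flags for about `seconds` worth of frames.

    The 300-frame cap and the min distance of 12 mirror the guide's values;
    the helper itself is hypothetical and not part of aomenc.
    """
    max_dist = min(round(fps * seconds), cap)
    return [f"--kf-max-dist={max_dist}", f"--kf-min-dist={min_dist}"]


for fps in (24, 25, 30, 60):
    print(fps, " ".join(keyframe_flags(fps)))
```

Fractional rates such as 23.976 round to the nearest whole frame count, so they land on the same value as their integer neighbours.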

Threading options

Code:
--threads=cpu-threads --sb-size=64
for <=1080p content.
Code:
--threads=cpu-threads --sb-size=64 --tile-columns=1
for even higher encoder side threading and some decoder side tile threading.

Code:
--threads=cpu-threads --sb-size=64 --tile-columns=2 --tile-rows=1
if you need best threading for decoding purposes, particularly at higher resolutions.

Code:
--threads=cpu-threads --tile-columns=2 --tile-rows=1
for >1080p resolutions

Code:
--threads=2 --sb-size=64
+ thread pinning if you use chunked encoding to give yourself better thread scaling.

Now, threading in aomenc. What an interesting subject.
Aomenc has access to these threading parameters:

- Row threading
- Tile threading
- Smaller task threading
- Frame-threading (experimental, so will not be tackled in this guide)

The AV1 standard has access to 2 superblock (SB) sizes: 64x64 and 128x128, the latter allowing for the usage of larger partitions at higher resolutions. Not very useful at standard HD resolutions (<=1080p), but it does exist for a good reason.

In aomenc, the default behavior is to dynamically choose between 64x64 and 128x128 superblocks. This is good, as very large static SBs and partitions might prove detrimental to speed and perceptual quality to a small extent. Another side effect of using larger SBs is that row threading gets less effective.

To balance it out, tile threading can be used, but as I've tested personally, the penalty for using static 64x64 SBs is lower than even adding just one additional tile column. So if you worry a bit about encoder side threading, tell the encoder to use 64x64 SBs before adding tiles.

The main reason to add tiles would be to boost random access performance for the decoder, as frame threads are much higher latency than tile threads. Adding tiles boosts seeking performance.
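
To make the choice between the option sets above a bit more concrete, here is a small Python sketch. It assumes frame height is a good enough proxy for "<=1080p", and the function and parameter names are invented for illustration; the flags themselves are the ones listed in this section.

```python
def threading_flags(height: int, threads: int,
                    decoder_tiles: bool = False) -> list[str]:
    """Pick threading flags following the guide's recommendations.

    height        -- frame height in pixels (assumed proxy for resolution)
    threads       -- number of CPU threads to hand to the encoder
    decoder_tiles -- hypothetical switch: also add tiles at <=1080p
                     when decoder-side threading matters
    """
    flags = [f"--threads={threads}"]
    if height <= 1080:
        # Static 64x64 superblocks cost less than adding a tile column.
        flags.append("--sb-size=64")
        if decoder_tiles:
            flags += ["--tile-columns=2", "--tile-rows=1"]
    else:
        # Above 1080p the guide leaves SB size alone and adds tiles.
        flags += ["--tile-columns=2", "--tile-rows=1"]
    return flags


print(" ".join(threading_flags(1080, 8)))
print(" ".join(threading_flags(2160, 16)))
```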

Finally, tiles still follow the power of 2 rules. Therefore, `--tile-columns=1` = 2¹ = 2 tile columns.
The total number of tiles is dictated by: # of tile columns * # of tile rows = total number of tiles.
Thus, --tile-columns=2 --tile-rows=1 = 2² columns x 2¹ rows = 4x2 tiles = 8 tiles.
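
The power-of-2 arithmetic can be checked with a few lines of Python. The helper name is made up for this sketch; it only restates the formula given above.

```python
def tile_count(tile_columns: int, tile_rows: int = 0) -> tuple[int, int, int]:
    """Return (columns, rows, total tiles) for the given flag values.

    Both flag values are exponents of 2, so the actual counts are
    2**tile_columns and 2**tile_rows.
    """
    cols = 1 << tile_columns
    rows = 1 << tile_rows
    return cols, rows, cols * rows


print(tile_count(1))     # (2, 1, 2)
print(tile_count(2, 1))  # (4, 2, 8)
```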

Rate control

Code:
--end-usage=q --cq-level=24
In aomenc, you have access to multiple rate control options.

The Q rate control mode is basically a modulated quantizer depending on spatial adaptive quantization, temporal-rdo, spatio-temporal AQ(deltaq-mode=1,2) and motion in general. Basically, its closest equivalent is CRF, so use it if you target maximum quality encodes without a bitrate limit.

CQ is Constrained Quality, meaning it's similar to Q, except it can't go as high in terms of quality because of the bitrate constraint and other stuff. This is not recommended unless you have very specific requirements.

VBR and CBR are Variable and Constant Bitrate respectively. Unless you have a very recent aomenc build with the bitrate accuracy compiler flag enabled, I wouldn't recommend using them if you're trying to target a certain ratio of quality-bitrate.

As for cq-level, it is basically how you choose your base quality level/modulated quantizer. 24 is usually a good target for encoding at a decent quality. 20 is usually a good target for higher quality encoding, and 18 is where high quality encoding starts. 30 is where the threshold for low-mid quality starts and where aomenc-av1 really starts to pull away in front in quality/bitrate vs other encoders.

35-40 is where Youtube quality can be achieved without using more exotic settings. Anything higher is where the low quality threshold starts.

Note that these guidelines are all for 8-bit SDR live-action/animation sources. Very high motion and high contrast sources like video games have different requirements entirely, and that's not even mentioning native 10-bit HDR sources with larger color gamuts; for video games, I usually recommend raising the Q level by 10-15 above the usual recommendations to achieve similar bitrates compared to easier content. As for HDR sources, keep reading.
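
The cq-level guidelines can be summed up in a short Python sketch. The tier names, the helper functions, and the 0-63 bound check are assumptions made for this example; the numeric targets and the +10 to +15 offset for video games are the ones given above.

```python
# Guide values for 8-bit SDR live-action/animation (tier names are made up).
CQ_GUIDE = {"high": 18, "higher": 20, "decent": 24, "low-mid": 30}

CQ_MAX = 63  # assumed upper bound of --cq-level


def q_mode_flags(cq_level: int) -> list[str]:
    """Build the Q-mode rate control flags for a given cq-level."""
    if not 0 <= cq_level <= CQ_MAX:
        raise ValueError(f"cq-level out of range: {cq_level}")
    return ["--end-usage=q", f"--cq-level={cq_level}"]


def game_cq_range(base: int) -> tuple[int, int]:
    """Shift a base cq-level by +10..+15 for video game content."""
    return min(base + 10, CQ_MAX), min(base + 15, CQ_MAX)


print(" ".join(q_mode_flags(CQ_GUIDE["decent"])))
print(game_cq_range(CQ_GUIDE["decent"]))  # (34, 39)
```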

Bit-depth and chroma subsampling

Code:
--bit-depth=10
and whatever the source chroma subsampling is.

In AV1, you have access to 8-bit coding and 16-bit coding.
That leaves you with these bit-depths that the AV1 standard allows: 8-bit, 10-bit, and 12-bit.

I **always** recommend encoding in **10-bit**, even from an 8-bit source, particularly if your source is 4:2:0 chroma subsampled, limited range YCbCr. That covers most video sources currently found on the Internet.

Not only does encoding in 10-bit allow the encoder to process everything in 16-bit buffers(getting higher coding efficiency due to considerably less truncating/rounding off), but the much higher color depth allowed by 10-bit coding and output allows for a more perceptually efficient output, **particularly in darker shades where differences are more easily noticeable by the human eye and where dithering is more prominent.**
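
To give a feel for the extra precision, the following Python sketch counts the luma code values available in limited range. It assumes the usual 16-235 limited-range luma bounds at 8 bits, scaled up for higher bit-depths; those bounds are not stated in this guide, and the helper name is made up.

```python
def limited_range_luma(bit_depth: int) -> tuple[int, int, int]:
    """Return (black level, white level, number of code values).

    Assumes limited-range luma of 16..235 at 8 bits, shifted left for
    higher bit-depths.
    """
    shift = bit_depth - 8
    low, high = 16 << shift, 235 << shift
    return low, high, high - low + 1


print(limited_range_luma(8))   # (16, 235, 220)
print(limited_range_luma(10))  # (64, 940, 877)
```

Under that assumption, 10-bit gives roughly four times as many steps between black and white, which is where the smoother dark gradients come from.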

Also, since 8-bit YCbCr <> 8-bit RGB coding is not lossless, unlike other transforms like YCoCg and XYB, 10-bit YCbCr allows for lossless RGB conversion to your screen.

As for other high bit-depth sources, keeping the same bit-depth is what is most optimal, especially if you value general HW decoder compatibility.

The same thing applies with chroma subsampling: unless you must support widespread HW decoders, keep the same chroma subsampling parameters as the source.

Encoding passes and lookahead

Code:
--lag-in-frames=48
(--passes=2 in aomenc is default, so no need to specify it).

2-pass was extremely important in vpxenc-vp9, as not only was it the only way for the encoder to utilize scene-detection, but it also allowed for the placement of alternate reference frames. Not doing that seriously cripples the encoder in what it can do. It also disables other stuff, but this also applies to aomenc-av1, so let's move on to the AV1 encoder again.

In aomenc-av1, 2-pass allows for these things in particular:
- More advanced scene detection when the lookahead buffer is high enough.
- Partition recoding: the encoder itself can decide whether or not to redo partition selection based on the preset and other conditions, resulting in better partition selection.
- Better auto-alt-ref placement through the encoded stream.

It also does some more advanced things, so I'd advise keeping it on if you can.

So yeah, always use 2-pass if you can. Luckily, it's set by default in the standalone encoder, so you don't need to do anything if you utilize a utility like nmkoder or av1an.

As for lookahead, it is controlled through a parameter that's called --lag-in-frames.

More lookahead in the form of lag-in-frames in aomenc gives you

- Better rate control.

- Better temporal-rdo.

- Better frame-placement.

- Generally more effective motion preservation due to a combination of previous and other factors.

In default aomenc, the range of lag-in-frames is 0-48, with the default being 35.
I always recommend setting it to 48, as it increases efficiency nicely without any significant penalties other than higher memory consumption.

Another effect of lag-in-frames is the kind of scene detection the encoder decides to choose.

0-18: No scene-detection.

19-32: Scene detection mode 1 is active(due to limited future frame prediction)

33 and higher: Scene detection mode 2 is active, since the large number of future references lets more information be gathered and allows for the highest level of scene detection present in aomenc.
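
The thresholds above can be written down as a small lookup. This is only a Python restatement of the ranges listed in this guide, with a made-up function name; it does not query the encoder.

```python
def scene_detection_mode(lag_in_frames: int) -> int:
    """Map --lag-in-frames to the scene detection mode described above.

    0 means no scene detection; 1 and 2 are the two modes from the guide.
    """
    if not 0 <= lag_in_frames <= 48:
        raise ValueError("lag-in-frames must be within 0-48")
    if lag_in_frames <= 18:
        return 0
    if lag_in_frames <= 32:
        return 1
    return 2


for lag in (0, 19, 35, 48):
    print(lag, scene_detection_mode(lag))
```

With the default of 35 and the recommended 48, both end up in mode 2.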

Temporal filtering

Code:
--arnr-strength=2 --arnr-maxframes=3
for medium fidelity live-action.

Code:
--arnr-strength=1 --arnr-maxframes=3
for higher fidelity live-action. This will keep the temporal filtering on at low strength unless it decides it doesn't need it.

Code:
--arnr-strength=0
for animation.

Contrary to what I and many others believed, the arnr-maxframes=X parameter sadly does not affect the maximum number of alternate references in the encoder's search space.

So, the settings written above affect temporal filtering, and nothing else. Interestingly enough, temporal filtering isn't exclusive to AV1 encoders: it can be found in other encoders for other standards and can even be found in some HW encoders, but that's a discussion for another day.

That means `--arnr-strength=X` affects the strength of the filtering itself.
Higher = stronger = fewer details/artifacts pass through at the same quantizer.

I am of the philosophy that less is more, and if you want more filtering, you want to use external filtering, which has way more dials to turn to tweak the output. However, the filtering within the encoder is simple, decently effective, and decently tied to the encoding process (which can cause some problems, however...): it lowers the filtering strength if your chosen quantizer is low enough. Of course, the adjustment itself isn't very high (1), so I prefer setting it lower myself.

As for arnr-maxframes, the trick is pretty simple: a lower number of frames gets you higher visual consistency, as with all spatio-temporal filtering, while a bigger filtering window gets you potentially higher quality filtering at the cost of a higher chance of temporal artifacts. I prefer a low number of frames to be used for temporal filtering for a more consistent look.

Animation is low variance by default, so there is no need to have temporal filtering on at all.
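
For scripting purposes, the three temporal filtering recommendations can be kept in a small table. A minimal Python sketch, where the content-type keys and the helper name are invented for this example and the flag values are the ones recommended above:

```python
# Content-type keys are made up; flag values come from this section.
ARNR_PRESETS = {
    "live-action-medium": ["--arnr-strength=2", "--arnr-maxframes=3"],
    "live-action-high": ["--arnr-strength=1", "--arnr-maxframes=3"],
    "animation": ["--arnr-strength=0"],
}


def arnr_flags(content: str) -> list[str]:
    """Return a copy of the temporal filtering flags for a content type."""
    try:
        return list(ARNR_PRESETS[content])
    except KeyError:
        raise ValueError(f"unknown content type: {content}") from None


print(" ".join(arnr_flags("animation")))
```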

Spatial and spatio-temporal adaptive quantization

Code:
--aq-mode=1 --deltaq-mode=1
for low-mid fidelity encoding.

Code:
--aq-mode=1 --deltaq-mode=0
for higher fidelity and grainy encoding.

Code:
--aq-mode=1 --deltaq-mode=0 --enable-tpl-model=0
if you want the most stable grain possible, not the best one. You can also disable adaptive quantization for even more stable quantizer utilization, but at this time with default aomenc, I do not recommend doing that.

At very low bitrates, you can disable adaptive quantization entirely.

In aomenc, you have access to 3 spatial aq-modes:
  • aq-mode=1 is a variance based aq-mode, giving more bits to low variance blocks within SBs.
  • aq-mode=2 is a complexity based aq-mode, setting an AC bias(IE, high frequency varied pattern) to give more bits where high frequency detail is located.
  • aq-mode=3 is based on cyclic refresh AQ, giving more bits to moving spots within a mostly very static frame, such as in a video conference.

I pretty much always recommend aq-mode=1, since encoders are usually not very good at giving bits to low variance spots, and aomenc is no exception to that (in fact, I'd argue it's not very good at it in the 1st place). It would be nice if aq-mode=1 also had an AC bias like in x264/x265's aq-modes, but that's a topic for another day.

As for the spatio-temporal deltaq-mode=X options (1 and 2; 3 and 4 are currently meant for AVIF/all-intra), they do some rather interesting things.

deltaq-mode=1 is spatio-temporal adaptive quantization based on objective metrics, working in tandem with temporal RDO (tpl-model) to get nice coding gains by deciding costs between inter and intra coding modes alongside temporal optimizations. Works well at low-mid bitrates, but at higher fidelity levels and especially grainy stuff, it can be a detriment to fidelity.

deltaq-mode=2 is supposed to be the perceptual version of this, but not only does it not work well currently, it also comes with a large speed penalty even at CPU-2/3, so I do not recommend using it at all as of March 2022.
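
Putting the adaptive quantization recommendations from this section side by side, here is a short Python sketch. The target names and the function are hypothetical; the flag combinations are the three listed at the start of the section.

```python
# Target names are made up; flag sets come from this section.
AQ_PRESETS = {
    "low-mid-fidelity": ["--aq-mode=1", "--deltaq-mode=1"],
    "high-fidelity-or-grainy": ["--aq-mode=1", "--deltaq-mode=0"],
    "most-stable-grain": ["--aq-mode=1", "--deltaq-mode=0",
                          "--enable-tpl-model=0"],
}


def aq_flags(target: str) -> str:
    """Return the adaptive quantization flags as one command-line string."""
    return " ".join(AQ_PRESETS[target])


for name in AQ_PRESETS:
    print(f"{name}: {aq_flags(name)}")
```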