#1 | Link |
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 22
The rav1e development thread: working on rav1e in a different, more open manner
Hello everyone. I know I don't post much on doom9, but no worries: from now on, I will be posting and, most importantly, interacting more often.
The main reason I decided to write this post is that I've been (slowly) working on rav1e, as well as contributing to related projects like ssimulacra2_rs (a Rust version of the ssimulacra2 psy image metric), and learning a lot of things from early September all the way to now.

One of those things is how complex video encoder development truly is: a solid programming and mathematics background is essential to actually implementing the ideas you want to integrate into an encoder, and good scientific analysis paired with a solid understanding of how the HVS (human visual system) works is critical to good results in terms of speed, coding efficiency, and visual performance.

Another important factor is how truly important feedback loops are to the entire development process: that is why metrics exist, right? They act as a feedback loop into how encoder development should be driven, and where it should be driven. Whether objectively or subjectively driven, video encoder improvement depends on how varied and good those feedback loops are. If they are poor, you will get suboptimal results, or worst of all, you'll get the right answers to the wrong questions about how to improve the encoder in question.

Something that is often ignored in tightly knit video encoder development groups is how helpful user-driven/community-driven feedback can truly be: working in a tightly knit group often results in shared opinions becoming the norm, and critical consensus is never truly attained. In that sense, you start developing tunnel vision about what needs to become better once you've picked the low-hanging fruit.

How does that relate to the AV1 video encoder development scene? Well, most of the actual encoder development in aomenc, SVT-AV1, and other tools is done somewhat privately by AOM members, with opinions from outside entities taken into account very little or disregarded entirely. rav1e is a lot better in this regard, but because of a lack of funding, its development has slowed down somewhat. The lack of public, open discussion on something like a forum means most of it happens on slightly closed-off (the #daala IRC channel) or even proprietary platforms (the AV1 Discord server). Both are suboptimal ways to discuss development and news, since they're closed off from public view and information moves by word of mouth, hiding it from a large portion of viewers, readers, developers, enthusiasts, encoder users, and even search engines, making for low visibility.

As such, I've decided to lead (somewhat independently lmao) a different, much more open approach, which I believe will be the strongest for AV1 encoder development: on public forum platforms like Reddit, the doom9 forums, and a few others, I'll start taking feedback and ideas for various improvements, take suggestions on what features to add and improve in rav1e, respond to general questions, involve more industry folks in the subject, and have people from different standard toolset teams chip in for more general improvements.
Most importantly, video encoder feedback can and will be taken more seriously: every complaint or criticism that comes with a proper explanation of what is bad and what can be changed is invaluable advice. I believe this change in the development feedback loop is necessary, as it fully complements the beginning of widespread use of much more heavily psy-driven development models and a more varied understanding in that regard. In that sense, having many outside eyes and hands on a video encoder project is critical to its success, and can even revive and kick off a program that was thought unable to improve on its implementation weaknesses.

With all of that in mind, what do I believe rav1e's end goal is? In a somewhat generic manner, it is to be the AV1 encoder equivalent of x264, but only from a general usage point of view. In the context of a video encoder, it is more nuanced than that.
Thank you all for reading up to this point. All of you, and the people who've helped me get here, are why I've decided to attempt such a large mindset change over the last few years. A few of them come to mind: the rav1e and av1an developers, the dav1d folks, some ffmpeg devs, AV1 Discord server members, and the JPEG XL folks; there are sadly too many good people to mention them all.

Now that I've said everything on my mind up to this point, I'd like to start the conversation with one question: what do you suggest we do to improve rav1e and the way we do development, and what suggestions do you have in general? I'd love to hear from all of you guys and gals around the world, with your different perspectives and views. If any of you would like to submit concrete improvements backed up with comments and, optimally, with numbers, please do so. This is the kind of feedback that is the most appreciated. Thank you all, and have a good day.

All of the above text and opinions are my own, only shared by some other people, and do not represent the thoughts of rav1e's development team as a whole.

Last edited by BlueSwordM; 26th November 2022 at 07:48.
#2 | Link |
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 22
One simple way to improve rav1e at all quality levels is simply to improve deblocking: while the strength selection algorithm is already close to optimal, the distortion metric used is plain SSE, which is not influenced in any way by the more advanced psy RDO metrics used elsewhere in rav1e.
That means it blurs blocks more than necessary on anything grainy in my testing, and unlike aomenc (unless I'm misreading the deblock.rs source), it doesn't actually include a bias to counteract that even to a small degree. It should be relatively easy to lower deblocking strength somewhat in the very short term, and in the mid term, to implement frequency-weighted SSE to make the deblocking filter application a lot more effective; see the sketch below. The weighting could be derived either from the non-flatness metric already in use, or by using tx_domain_dist in an interesting way to map out the block frequencies. Long term, it would even be possible to integrate the current psycho-visual RDO metric into it; that would require some new piping and tuning, but it would allow reuse of data that is already used for activity masking.

Last edited by BlueSwordM; 24th November 2022 at 06:23. Reason: Better formatting and clarification
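Below is a minimal sketch of what that frequency-weighted SSE could look like, using plain block variance as the non-flatness proxy. This is illustrative Rust, not rav1e's actual deblock.rs code; the function names and the `strength_bias` knob are made up for the example:

```rust
/// Illustrative sketch only: weight deblock distortion by block activity
/// so that smoothing grainy blocks registers as real distortion and the
/// strength search backs off. Not rav1e's actual deblock.rs code.

/// Cheap non-flatness proxy: population variance of the block's pixels.
fn block_variance(block: &[u8]) -> f64 {
    if block.is_empty() {
        return 0.0;
    }
    let n = block.len() as f64;
    let mean = block.iter().map(|&p| p as f64).sum::<f64>() / n;
    block.iter().map(|&p| (p as f64 - mean).powi(2)).sum::<f64>() / n
}

/// SSE between the unfiltered source and the deblocked block, scaled up
/// for textured blocks. `strength_bias` (hypothetical) plays the role of
/// the small anti-blur bias aomenc applies.
fn weighted_sse(src: &[u8], filtered: &[u8], strength_bias: f64) -> f64 {
    let sse: f64 = src
        .iter()
        .zip(filtered)
        .map(|(&a, &b)| (a as f64 - b as f64).powi(2))
        .sum();
    // Flat blocks keep weight ~1.0; grainy blocks cost more to blur.
    let weight = 1.0 + strength_bias * (block_variance(src) / 256.0).min(1.0);
    sse * weight
}
```

The same shape would work with tx_domain_dist-derived weights instead of variance; the only essential part is that the weight grows with block activity.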
#5 | Link |
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 22
One other interesting thing I've noticed is how, in most encoders, the more advanced RDO decisions aren't available to stages like quantization, motion estimation, and temporal RDO.
I can obviously understand not having more complex RDO for ME, as SSD/SATD RD decisions are already demanding enough as is, so adding a more complex metric would be very computationally expensive. However, x264 and x265 jump out in particular as not having their psy-rd implementations active in their TPL-RDO. rav1e actually does have it available there, and when you disable the coding tools that currently tend to blur even at higher bitrates (deblocking and SGR; CDEF is a bit different), disabling TPL-RDO doesn't actually improve image quality much, and tends to make it worse, even at the high end of content and bitrates (Foodmarket). That's an interesting observation I hadn't made before.
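Since SATD keeps coming up as the workhorse cost for ME and TPL-style lookahead, here is a scalar reference of 4x4 SATD to show why it is cheap yet frequency-aware: two passes of a 4-point Hadamard butterfly, then a sum of absolute values. This is an illustration of the general technique (real encoders use wider blocks and SIMD), not code lifted from rav1e or x264:

```rust
/// One 4-point Hadamard butterfly (output order is permuted, which is
/// harmless here because SATD only sums absolute values).
fn hadamard4(v: [i32; 4]) -> [i32; 4] {
    let a = v[0] + v[1];
    let b = v[0] - v[1];
    let c = v[2] + v[3];
    let d = v[2] - v[3];
    [a + c, b + d, a - c, b - d]
}

/// 4x4 SATD between a source and a prediction block: adds/subtracts only,
/// no multiplies, yet the result reflects the residual's frequency content.
fn satd4x4(src: &[u8; 16], pred: &[u8; 16]) -> u32 {
    // Residual block.
    let mut m = [[0i32; 4]; 4];
    for r in 0..4 {
        for c in 0..4 {
            m[r][c] = src[r * 4 + c] as i32 - pred[r * 4 + c] as i32;
        }
    }
    // Horizontal, then vertical, 4-point Hadamard transforms.
    for r in 0..4 {
        m[r] = hadamard4(m[r]);
    }
    for c in 0..4 {
        let col = hadamard4([m[0][c], m[1][c], m[2][c], m[3][c]]);
        for r in 0..4 {
            m[r][c] = col[r];
        }
    }
    // Sum of absolute transformed differences; /2 is the usual
    // x264-style normalization convention.
    m.iter().flatten().map(|&x| x.unsigned_abs()).sum::<u32>() / 2
}
```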
#6 | Link |
Derek Prestegard IRL
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,978
Quote:
__________________
These are all my personal statements, not those of my employer :)
#7 | Link |
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 22
Thank you for the compliment, Blue_Misfit. And yeah, aomenc's and SVT-AV1's codebases make it hard for someone like me to implement more psy-driven RDO across all encoding stages.
In other news, I think I've figured out how to implement psy-driven RDO quantization: in-block psy-RDO and a psy trellis/hybrid deadzone quant. A sketch of the deadzone idea follows.
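To illustrate the deadzone half of that: a plain deadzone quantizer rounds coefficient magnitudes down with an offset f < 0.5, which is exactly what kills small high-frequency (texture/grain) coefficients, and a psy bias can narrow the deadzone as scan position increases so that energy survives. The sketch below is hypothetical; the `psy_strength` knob and the linear frequency ramp are my assumptions, not rav1e's quantizer:

```rust
/// Hypothetical psy-biased deadzone quantizer sketch (not rav1e's code).
/// `coeffs` are transform coefficients in scan order (low -> high freq),
/// `q` is the quantizer step (assumed > 0), `psy_strength` is in [0, 1].
fn psy_deadzone_quantize(coeffs: &[i32], q: i32, psy_strength: f64) -> Vec<i32> {
    let n = coeffs.len().max(1) as f64;
    coeffs
        .iter()
        .enumerate()
        .map(|(i, &c)| {
            // Baseline deadzone offset; ~1/3 is a common inter default
            // (f = 0.5 would be plain round-to-nearest, i.e. no deadzone).
            let base_f = 1.0 / 3.0;
            // Ramp the offset toward 0.5 with scan position: later (higher
            // frequency) coefficients get a narrower deadzone, so small
            // texture energy is less likely to be zeroed out.
            let f = base_f + psy_strength * (0.5 - base_f) * (i as f64 / n);
            let level = ((c.abs() as f64 / q as f64) + f).floor() as i32;
            c.signum() * level
        })
        .collect()
}
```

A psy trellis would go further and make the keep-or-zero decision per coefficient against a rate-distortion cost with an energy-preservation term, but the deadzone shaping above is the cheap end of the same idea.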
#9 | Link |
Registered User
Join Date: Dec 2021
Location: Canada
Posts: 22
Indeed.
Anyway, the main problem with these specific encoder pipelines is that they rely on low-complexity metrics to stay feasible without HW acceleration, and to keep HW acceleration cheap. For the quantization stages and motion estimation (as well as TPL-RDO, depending on how complex its implementation is), fast metrics are a necessity. That means SAD only, and SATD to a lesser extent, even when RDO for those stages is active, which makes consistent psycho-visual targeting suboptimal.

Using frequency-weighted metrics in the traditional way, with a DCT transform to get the frequency information, is fine compute-wise for occasional decisions, but too slow for stages that are repeated a lot, like quantization. I've finally found a way to fix this: use a specific filter to extract the frequency information from the blocks without a transform, giving us all the benefits without the compute cost! This will allow me to do things in a somewhat psycho-visually weighted manner while preserving the speed of a very simple metric like SAD, getting us nice gains even at high speed presets, and maybe even speedups for those presets! A sketch of the general idea follows.

For those interested in how I discovered this, I present to you a glorious Daala paper, where the Xiph folks managed to do something very smart: https://people.xiph.org/~tterribe/da...a-icip2017.pdf
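To make that concrete, here is the shape of the idea as I understand it: a cheap high-pass filter (each pixel minus its rounded 4-neighbour average) stands in for transform-domain energy at a per-pixel cost comparable to SAD. The actual filter in the Daala paper differs; this sketch is only my illustration of the approach, not code from rav1e or Daala:

```rust
/// Mean absolute high-pass response of a plane: a transform-free proxy
/// for high-frequency (texture/grain) energy. Illustrative only.
fn highpass_activity(plane: &[u8], w: usize, h: usize) -> f64 {
    // The 4-neighbour filter needs at least a 3x3 plane.
    if w < 3 || h < 3 || plane.len() < w * h {
        return 0.0;
    }
    let mut acc: u64 = 0;
    for y in 1..h - 1 {
        for x in 1..w - 1 {
            let p = plane[y * w + x] as i32;
            // Rounded 4-neighbour average; (p - avg) is a high-pass tap
            // built from adds and shifts only, no transform.
            let avg = (plane[(y - 1) * w + x] as i32
                + plane[(y + 1) * w + x] as i32
                + plane[y * w + x - 1] as i32
                + plane[y * w + x + 1] as i32
                + 2)
                >> 2;
            acc += (p - avg).unsigned_abs() as u64;
        }
    }
    acc as f64 / ((w - 2) * (h - 2)) as f64
}
```

Something of this shape could weight SAD per block during ME or quantization without measurably changing the speed profile.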
#10 | Link |
hlg-tools Maintainer
Join Date: Feb 2008
Posts: 367
Greetings. I am a video encoding layperson and do not currently possess the math skills required to implement what I want to see; I have implemented an FFT but barely know what an MDCT is.
From a user's perspective, I would very much like to see AV1's capability for representing noise better exploited. I am, of course, referring to film grain synthesis. At present, SVT-AV1 seems to model grain in a somewhat okay manner, but on playback, the reproduced grain level is far below the original. This is true even at the max setting. In other words, SVT-AV1 *seems* to identify grain correctly, but doesn't set the intensity high enough. The other possibility is that libdav1d is decoding it incorrectly. Either way, really good noise emulation would be invaluable for greatly reducing the bitrate of noisy movies (looking at you, Zack Snyder) while keeping similar visual quality.
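For context on where the intensity could be getting lost: at playback, the decoder scales pre-generated grain by a piecewise-linear function of the underlying pixel value (the scaling points signalled in the film grain parameters), then shifts the product down. A simplified 8-bit luma sketch, with field names abbreviated from the AV1 film grain syntax:

```rust
/// Piecewise-linear scaling lookup over (luma, scaling) points, which are
/// assumed strictly increasing in luma and clamped at both ends.
fn grain_scale(points: &[(u8, u8)], luma: u8) -> i32 {
    match points.iter().position(|&(v, _)| v > luma) {
        Some(0) => points[0].1 as i32,
        None => points.last().map_or(0, |&(_, s)| s as i32),
        Some(i) => {
            let (v0, s0) = points[i - 1];
            let (v1, s1) = points[i];
            // Linear interpolation between the two surrounding points.
            s0 as i32
                + (s1 as i32 - s0 as i32) * (luma as i32 - v0 as i32)
                    / (v1 as i32 - v0 as i32)
        }
    }
}

/// Apply one grain sample to a luma pixel; `scaling_shift` is 8..=11 in
/// the 8-bit case, with larger values attenuating the grain further.
fn apply_grain(luma: u8, grain: i32, points: &[(u8, u8)], scaling_shift: u8) -> u8 {
    let noise = (grain_scale(points, luma) * grain) >> scaling_shift;
    (luma as i32 + noise).clamp(0, 255) as u8
}
```

So if the encoder estimates scaling points that are too low, or picks too large a scaling_shift, the rendered grain comes out systematically weaker than the source grain, which would match what I'm seeing.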
#11 | Link |
Registered User
Join Date: Feb 2021
Location: Germany
Posts: 15
Quote: