13th December 2018, 18:18   #1311
Beelzebubu
Registered User
 
Join Date: Feb 2003
Location: New York, NY (USA)
Posts: 109
Quote:
Originally Posted by SmilingWolf View Post
Ronald Bultje commented on frame parallelism being a bad thing for VP9, so not much of a surprise it was turned off by default in AV1/libaom
No, that's a misinterpretation. Frame parallelism is great.

For encoding, the speedup is slightly better than for tile parallelism, and the quality loss per added thread is less than for tile threading. For example, in my experiments, frame-multithreading in Eve/VP9 incurs 0.0% BDRATE loss for a 1.8x speedup going from 1 to 2 threads, whereas tile threading only gives a 1.7x speedup and has a BDRATE quality loss of around 0.5%. This pattern holds for more threads, and tends to be true across multiple codecs and encoders. Now, obviously, libaom/vpx have no frame-threaded encoding, so not much to be said there. But in x264, my experience from many years ago is that they switched from slice to frame multi-threaded encoding for the same reason: better scaling *and* less BDRATE quality loss. So far, so good.

OK, next, decoding. This is trickier. For ffh264, for example, we classically found that frame-multithreaded decoding gives a higher speedup than slice-multi-threaded decoding per added thread. Given this pattern of frame multithreading scaling better *and* having less quality loss than within-frame alternatives in a variety of codecs, you'd expect everything to be good, right? Well, not exactly. It holds true, but only to some extent.

The problem in decoding of vp9/av1 is that frames depend on entropy output of the previous frame. For h264/5, cabac state resets in each frame, but this is not true for vp9/av1. So, for frame-multithreading, you need to split decoding in 2 passes, and pass 1 of the next dependent frame can only start when the previous frame has finished its pass 1 and started its pass 2. So, vp9/av1 *decoding* scales less well than h264 *decoding* when using frame multithreading. Fortunately, the system load doesn't go up either, so really what it means is that you need more threads to fully saturate a system. It's even better if you combine frame and tile threading, like what dav1d does.
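
To make the pass 1 / pass 2 dependency concrete, here is a rough toy model in Python. It is only a sketch under my own simplifying assumptions, not how any real decoder schedules work: every frame costs the same, each thread decodes one whole frame (pass 1, then pass 2), frames are handed out round-robin, and pixel-level reference dependencies between frames are ignored. The function name and parameters are made up for illustration.

```python
def decode_makespan(num_frames, threads, pass1, pass2, entropy_dependency):
    """Toy estimate of total decode time with frame multithreading.

    pass1 = entropy decoding time per frame, pass2 = reconstruction time
    per frame (both hypothetical, in arbitrary units). If
    entropy_dependency is True, pass 1 of frame i may not start before
    pass 1 of frame i-1 has finished (the vp9/av1 situation described
    above). If False, frames are independent at the entropy level
    (the per-frame-reset situation).
    """
    thread_free = [0.0] * threads   # time at which each thread is idle again
    prev_pass1_done = 0.0           # when the previous frame finished pass 1
    makespan = 0.0
    for i in range(num_frames):
        t = i % threads             # assumption: simple round-robin assignment
        start = thread_free[t]
        if entropy_dependency:
            # Wait for the previous frame's entropy output.
            start = max(start, prev_pass1_done)
        pass1_done = start + pass1
        finish = pass1_done + pass2
        prev_pass1_done = pass1_done
        thread_free[t] = finish
        makespan = max(makespan, finish)
    return makespan


if __name__ == "__main__":
    for n_threads in (1, 2, 4):
        dep = decode_makespan(8, n_threads, 2, 2, True)
        indep = decode_makespan(8, n_threads, 2, 2, False)
        print(n_threads, "threads:", dep, "with dependency,", indep, "without")
```

With these made-up numbers (8 frames, pass 1 and pass 2 each taking 2 units), the model gives 32 units on one thread either way, 18 vs. 16 on two threads, and 18 vs. 8 on four threads. The chain of pass 1s is serial, so the extra threads mostly sit idle waiting rather than burning CPU. The exact figures mean nothing; real frames differ in cost and the split between the two passes depends on the content.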

Wait, you're asking now, what about that statement that frame parallelism is bad in libvpx? Well, it's not what you think it is. --frame-parallel in vpxenc has nothing to do with frame multi-threading in the encoder. It's a header bit that removes the entropy dependency I just talked about. So now, decoding scales better when using frame multi-threading, which is why this bit is called the "frame parallelism" bit, but it also costs you all backwards entropy adaptation, incurring >1% BDRATE quality loss. However, there is no reason to do this. Hardware is not allowed to support higher resolutions with this feature than without it, and there is no software decoder that implements frame multithreading only when entropy dependencies are disabled. And if entropy dependencies are present, you can saturate system load anyway by simply using more threads. So the whole thing is kind of silly. Why give up quality for no gain whatsoever?

Quote:
Originally Posted by mandarinka View Post
I can't tell how correct it is, but this was an interesting read: https://codecs.multimedia.cx/2018/12...cal-about-av1/
Author is a former libav/ffmpeg developer if you don't remember his name.
Kostya Shishkov.