Does VP9 have B-frames or P-frames?

kidmany2001 · 21st April 2014, 10:56

Hi everybody,
I have read the official WebM (VP9) site.
I learned that the VP9 encoder lets you set the interval between I-frames.
But it doesn't say what types of frames sit between the I-frames.
I asked an expert. He told me there is no coding structure like HEVC's hierarchical-B.

I wonder, in VP9 encoding,
what kinds of frames lie between two consecutive I-frames?

Are they just normal reference frames? Or does VP9 still use B-frames and P-frames?
Can I encode a sequence as IBBBBB, as in HEVC's random access test condition?

I am quite confused. Can anyone explain?

thanks.

nevcairiel · 21st April 2014, 11:10

VP9 uses only P-frames, it does not have B-frames.

However, in addition, it supports a few additional features to manage reference frames, called AltRef and Golden frames.

kidmany2001 · 21st April 2014, 12:51

Thanks for the answer.

Is there any documentation about VP9's P-frames?

I would like to read about them.

mandarinka · 21st April 2014, 12:58

Check this thread for a description, mainly the "Reference frame management" and "Inter prediction" sections: http://forum.doom9.org/showthread.ph...86#post1647086

From that it seems to me that altref and golden frames are just names now and the real scheme works differently this time (closer to H.264's multiple referencing scheme, but with some complications). And there is something, obviously named differently, that mimics B-frames (compound prediction).

Quote:

Originally Posted by pieter3d

However, VP9 does support “compound prediction”, which really is just another word for bi-prediction where there are two motion vectors for each block and the two resulting prediction samples are averaged together. In order to avoid patents on bi-prediction, compound prediction is only enabled in frames that are marked as not-displayable. A frame like this is never output for display, but may be used for reference later. In fact, a later frame may consist of nothing but 64x64 blocks with no residuals and 0,0 motion vectors that point to this non-displayed frame, effectively causing it to be output later using very little data.

So maybe the answer is "yes, it has B-frames, but don't tell anybody, we want to pretend it is not borrowing from MPEG at all"? Well, that might be a bit on the provocative side, but that's how it looks to me.
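
For readers who want to see what "averaged together" means in the quoted description, here is a minimal sketch in Python. It is not taken from libvpx: the function name `compound_predict`, the flat-list block layout, and the round-to-nearest integer average are assumptions made for illustration only.

```python
def compound_predict(pred_a, pred_b):
    """Average two motion-compensated prediction blocks sample by sample.

    pred_a and pred_b are flat lists of integer sample values, each fetched
    with its own motion vector from its own reference frame (hypothetical
    layout, chosen only to keep the example short).
    """
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction blocks must have the same size")
    # Integer average with rounding to nearest (assumed rounding rule).
    return [(a + b + 1) >> 1 for a, b in zip(pred_a, pred_b)]


# One block predicted from a past reference, one from a future reference.
block_from_past = [100, 102, 98, 96]
block_from_future = [104, 101, 98, 99]
print(compound_predict(block_from_past, block_from_future))
```

The residual is then coded against this averaged block, which is the same idea as bi-prediction in a B-slice.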

nevcairiel · 21st April 2014, 13:01

At the very least it doesn't have frame-reordering typically caused by B-Frames, because they wanted to keep that complexity out. Instead it has ref frames which are not displayed and discarded by the decoder.. oh well.

benwaggoner · 21st April 2014, 20:23

Quote:

Originally Posted by nevcairiel

At the very least it doesn't have frame-reordering typically caused by B-Frames, because they wanted to keep that complexity out. Instead it has ref frames which are not displayed and discarded by the decoder.. oh well.

Do they have any encoder that's doing all the stuff?

One of the great features of B-frames is that they are skippable frames when seeking during random access, which makes things a lot faster. If there was an encoder that could make "skippable" frames and flag them in a way that a decoder could know what isn't needed to be decoded, that could make for much better random access with VP9.

Given that multithreaded decode of each VP9 frame doesn't look that feasible, having to decode each frame serially would be seriously painful. Imagine trying to get to frame 235 in a 240 frame GOP using an ARM software decoder...

kidmany2001 · 23rd April 2014, 13:15

Thanks for the responses.
Before I jump to a conclusion on this issue,
I need to clear up some questions about the temporal prediction structure of VP9.

The questions:
1. The typical test scenarios in HEVC are low delay P, low delay B and random access.
Can we emulate those (inter prediction) scenarios of HEVC as closely as possible in VP9?
Also, which parameters should be set to control the temporal prediction structure?
(I think the parameters lag-in-frames, arnr-maxframes and arnr-strength
are the ones to look at.
Does anyone have information on these?)

2. We already know VP9's compound mode is similar to B-frames.
Does VP9 still have the concept of a P-frame?
What exactly is the temporal prediction structure of VP9? Normal? Hierarchical?

3. Can we set the temporal prediction of VP9 to encode IBBBBBB or IPPPPPPP specifically?
kf-max-dist controls the I-frame part. But can we choose whether a frame is a P-frame or a B-frame?

The parameter guide does a poor job of explaining the implementation.

More detailed explanations are welcome.

I hope more discussion will help clear up the doubts.

thanks
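
As a rough starting point for question 1, the sketch below builds two `vpxenc` command lines in Python. The flags themselves (`--codec`, `--kf-max-dist`, `--lag-in-frames`, `--auto-alt-ref`, `--arnr-maxframes`, `--arnr-strength`) are existing vpxenc options, but the helper name `vpxenc_args`, the chosen values, and the mapping onto HEVC's "low delay" and "random access" conditions are assumptions, not an official equivalence. A Y4M input file is assumed so that no resolution flags are needed.

```python
def vpxenc_args(src, dst, structure, keyframe_interval=120):
    """Return a vpxenc argument list approximating an HEVC-style test condition.

    structure: "low-delay" or "random-access" (hypothetical labels).
    """
    args = ["vpxenc", "--codec=vp9", f"--kf-max-dist={keyframe_interval}"]
    if structure == "low-delay":
        # No lookahead and no alt-ref frames: every frame predicts from the past.
        args += ["--lag-in-frames=0", "--auto-alt-ref=0"]
    elif structure == "random-access":
        # Lookahead plus hidden alt-ref frames, so frames can predict from
        # a future reference. The values are illustrative, not tuned.
        args += [
            "--lag-in-frames=25",
            "--auto-alt-ref=1",
            "--arnr-maxframes=7",
            "--arnr-strength=5",
        ]
    else:
        raise ValueError(f"unknown structure: {structure}")
    args += ["-o", dst, src]
    return args


print(" ".join(vpxenc_args("input.y4m", "low_delay.webm", "low-delay")))
print(" ".join(vpxenc_args("input.y4m", "random_access.webm", "random-access")))
```

Note that none of these options selects a per-frame type such as "P" or "B" directly; they only influence whether the encoder is allowed to build future references.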

pieter3d · 2nd May 2014, 16:52

With the flexible 8-entry reference pool, you can set up B-pyramids for trick mode / fast forwarding. It all depends on your structure. And with hidden frames you can emulate the reordering behavior pretty closely. Compound, aka bipred, is enabled at the frame level, using a header flag. One of the requirements is that one or two of the 3 reference frames (last, gold, altref) are marked as future frames, meaning that their content lies in the future.

Note that in H.264 and HEVC, frames are not really marked P or B; the type can change per slice. Also, B slices can still code blocks that are P-like, and there are no constraints on where the two reference frames are in time for bipred blocks.
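
To make the reference pool idea concrete, here is a small Python model of it. It is a sketch under assumptions, not decoder code: the class and field names (`FrameHeader`, `ReferencePool`, `refresh_flags`, `ref_slots`, `sign_bias`) are invented for the example, and the compound check only encodes the rule stated above, that one or two of the three references must be flagged as future frames.

```python
from dataclasses import dataclass

NUM_SLOTS = 8
REF_NAMES = ("last", "golden", "altref")


@dataclass
class FrameHeader:
    frame_id: int                 # decode-order id, used only for illustration
    show_frame: bool              # False marks a hidden (non-displayed) frame
    refresh_flags: int            # bit i set -> this frame overwrites slot i
    ref_slots: tuple = (0, 0, 0)  # pool slots used as last/golden/altref
    sign_bias: tuple = (0, 0, 0)  # 1 marks a reference as a "future" frame


class ReferencePool:
    def __init__(self):
        self.slots = [None] * NUM_SLOTS

    def active_refs(self, hdr):
        """Map last/golden/altref to the frame ids currently in the pool."""
        return {name: self.slots[i] for name, i in zip(REF_NAMES, hdr.ref_slots)}

    def compound_allowed(self, hdr):
        """True when one or two of the three references are future frames."""
        return len(set(hdr.sign_bias)) > 1

    def store(self, hdr):
        """Write the decoded frame into every slot selected by refresh_flags."""
        for i in range(NUM_SLOTS):
            if (hdr.refresh_flags >> i) & 1:
                self.slots[i] = hdr.frame_id


pool = ReferencePool()
pool.store(FrameHeader(0, True, 0xFF))    # keyframe fills all slots
pool.store(FrameHeader(4, False, 0b100))  # hidden future frame goes to slot 2
inter = FrameHeader(1, True, 0b001, ref_slots=(0, 1, 2), sign_bias=(0, 0, 1))
print(pool.active_refs(inter), pool.compound_allowed(inter))
```

In this toy sequence the hidden frame 4 is decoded before frame 1 but never shown at that point, which is how the reordering of a B-frame structure is emulated without reordering output.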

benwaggoner · 2nd May 2014, 18:40

Quote:

Originally Posted by pieter3d

With the flexible 8-entry reference pool, you can set up B-pyramids for trick mode / fast forwarding. It all depends on your structure. And with hidden frames you can emulate the reordering behavior pretty closely. Compound, aka bipred, is enabled at the frame level, using a header flag. One of the requirements is that one or two of the 3 reference frames (last, gold, altref) are marked as future frames, meaning that their content lies in the future.

That sounds promising. Is there some sort of frame header declaration about this stuff that a decoder could use to easily determine what frames are skippable?

Quote:

Note that in H.264 and HEVC, frames are not really marked P or B; the type can change per slice. Also, B slices can still code blocks that are P-like, and there are no constraints on where the two reference frames are in time for bipred blocks.

I'm assuming each frame is a single slice and no tiling for H.264 and HEVC for typical VOD use cases (random access doesn't matter so much in live). Would not having mixed slice types in a single frame have much of a potential impact on compression efficiency? Certainly with H.264 we got out of the habit of using slicing outside of Blu-ray and very low latency real-time encoding due to the compression efficiency hit.

WPP in HEVC is a far superior replacement to slices for multithreaded decoding, but I haven't really thought through how slices might be useful for other scenarios in HEVC.

pieter3d · 2nd May 2014, 19:07

Regarding hierarchical structures: I think you just have to buffer a handful of compressed frames, decode their uncompressed headers (very quick) and infer the reference structure. Then you can know what frames can be dropped out. As of today there isn't any kind of metadata to indicate this (e.g. temporal layer id), but something may yet be added to the container.

Slices, and especially dependent slices + WPP, are very useful for low latency video transmission (conferencing). This way you can transmit a picture row by row, using full slice encapsulation, the NAL unit.
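
The "buffer a few frames and infer the reference structure" idea can be sketched like this in Python. The header fields are passed in as plain dicts with hypothetical keys `refresh_flags` and `ref_slots`; actually parsing them out of the bitstream is not shown. The sketch also assumes that reference-slot usage is the only dependency between frames and ignores any entropy-context state carried from frame to frame, which a real player would have to check before dropping anything.

```python
def find_droppable(headers, num_slots=8):
    """Return decode-order indices of frames that can be skipped.

    A frame counts as droppable if no later frame in the buffered window
    references it and it is no longer held in any reference slot at the
    end of the window (so frames outside the window cannot need it).
    """
    slots = [None] * num_slots
    referenced = set()
    for idx, hdr in enumerate(headers):
        # Every slot this frame predicts from keeps its occupant alive.
        for s in hdr["ref_slots"]:
            if slots[s] is not None:
                referenced.add(slots[s])
        # Then the frame overwrites the slots selected by its refresh flags.
        for s in range(num_slots):
            if (hdr["refresh_flags"] >> s) & 1:
                slots[s] = idx
    still_live = {f for f in slots if f is not None}
    return [i for i in range(len(headers))
            if i not in referenced and i not in still_live]


window = [
    {"refresh_flags": 0xFF, "ref_slots": ()},          # 0: keyframe
    {"refresh_flags": 0b001, "ref_slots": (0, 1, 2)},  # 1: updates slot 0
    {"refresh_flags": 0b000, "ref_slots": (0, 1, 2)},  # 2: refreshes nothing
    {"refresh_flags": 0b001, "ref_slots": (0, 1, 2)},  # 3: updates slot 0
    {"refresh_flags": 0b000, "ref_slots": (0, 1, 2)},  # 4: refreshes nothing
]
print(find_droppable(window))
```

Here frames 2 and 4 refresh no slot, so nothing can depend on them and a seeking decoder could skip them, much like non-reference B-frames.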

benwaggoner · 2nd May 2014, 20:54

Quote:

Originally Posted by pieter3d

Regarding hierarchical structures: I think you just have to buffer a handful of compressed frames, decode their uncompressed headers (very quick) and infer the reference structure. Then you can know what frames can be dropped out. As of today there isn't any kind of metadata to indicate this (e.g. temporal layer id), but something may yet be added to the container.

That could work, although it'd be a pain for streaming scenarios where you'd like to know what you could do with frames that haven't downloaded yet. A real moof would be nice.

Quote:

Slices, and especially dependent slices + WPP, are very useful for low latency video transmission (conferencing). This way you can transmit a picture row by row, using full slice encapsulation, the NAL unit.

Yes, absolutely. But that's not a random access scenario anyway. The VPx codecs have always been less disadvantaged for videoconferencing type scenarios.

I feel blessed that I can mainly focus on non-realtime file-to-file encoding and VOD delivery these days. So many other problems don't apply so I can focus more on the (somehow still infinite) number of problems still remaining...
