Google VP9 "Next Generation Open Video" information posted - Page 46

dapperdan · 30th August 2016, 13:32

Quote:

Originally Posted by mzso

This is weird... So it's worse below and above 1080p, but better or equivalent at 1080p?

They only tested 480, 720 and 1080. So it sounds like x265 does better overall, but mostly on the lower end.

dapperdan · 30th August 2016, 13:42

Quote:

Originally Posted by NikosD

Edge already supports it.

But it is a huge mystery to me how on Earth the company (Google) that develops both Chrome and VP9 hasn't yet managed to make HW acceleration feasible in its own browser, which forces the VP9 codec on YouTube, while Microsoft, its opponent, has already done this in its own browser, which doesn't force the use of VP9.

Crazy.

It's boringly sane actually.

VP9 only helps YouTube save bandwidth if lots of people use it. Restricting it to those with hardware acceleration would basically kill it and make the whole thing pointless, so they need to do at least software decode, with hardware decode as a bonus for low-end and mobile devices. Each time they have to choose which to spend developer time on, software decode is going to have lots of benefits: it will help so many more people that it'll basically pay for itself many times over.

They run the numbers based on users' bandwidth vs CPU vs visual quality vs user engagement to see whether it's working or not. They initially restricted VP9 so it would not show on XP devices because these stats showed that people on those devices gave up in disgust. If they're still showing it to you and you have a bad experience with it, then presumably their data shows 3 or 4 people (maybe those with modern computers on bad connections) are having a better experience and watching more YouTube ads as a result.

Meanwhile, Microsoft has always been wary about exposing itself to patents or licence fees, so it is more interested in farming that out to 3rd parties and doesn't really care that much if most Edge users don't get VP9 support.

NikosD · 30th August 2016, 13:52

Interesting point of view

Motenai Yoda · 30th August 2016, 16:34

From Intel's 7th-generation Core slides:

Quote:

Intel Video Codec Support

                      Kaby Lake      Skylake        Broadwell
H.264 Decode          Hardware       Hardware       Hardware
HEVC Main Decode      Hardware       Hardware       Hybrid
HEVC Main10 Decode    Hardware       Hybrid         No
VP9 8-Bit Decode      Hardware       Hybrid         Hybrid
VP9 10-Bit Decode     Hardware       No             No

H.264 Encode          FF & PG-Mode   FF & PG-Mode   PG-Mode
HEVC Main Encode      FF & PG-Mode   PG-Mode        No
HEVC Main10 Encode    FF & PG-Mode   No             No
VP9 8-Bit Encode      FF & PG-Mode   No             No
VP9 10-Bit Encode     No             No             No

x265_Project · 31st August 2016, 20:12

Netflix's presentation of their study, "A large scale video codec comparison of x264, x265 and libvpx for practical VOD applications", can be watched on YouTube here... https://youtu.be/wi1BefrfTos?t=1h25s

Naturally, if you don't use --tune ssim with x265, results are worse when measured using SSIM. When they avoided --tune psnr and --tune ssim their own visual quality metric showed that x265 delivered the highest efficiency.

Motenai Yoda · 1st September 2016, 01:17

Quote:

Originally Posted by x265_Project

Naturally, if you don't use --tune ssim with x265, results are worse when measured using SSIM. When they avoided --tune psnr and --tune ssim their own visual quality metric showed that x265 delivered the highest efficiency.

Well, but they also avoided tuning for SSIM or PSNR with VP9 and x264 in those tests, as the 1/3 PSNR slide mentions a "PSNR-tuned configuration", where VP9 gives a 43.5% bitrate reduction over x264 vs 43.4% for x265 (still, there is a 2.5% gain for x265 vs VP9 on average).
Slide 2/3, based on MS-SSIM, is indeed visual-quality tuned, and there VP9 performs a bit better.

so the point is: will tuning codecs for ssim give more reliable results on ms-ssim or vmaf metrics?
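To make the slide arithmetic concrete, here is a minimal sketch of how two savings measured against the same anchor (x264) translate into a direct codec-to-codec difference. The function name is hypothetical, and it assumes both percentages refer to the same quality level on the same content. Averages of per-clip results need not compose this way, which is one possible reason the slide can also show a gain for x265 over VP9.

```python
def relative_bitrate_change(saving_a: float, saving_b: float) -> float:
    """Percent change in bitrate of codec B relative to codec A.

    saving_a and saving_b are each codec's bitrate reduction (in percent)
    over a common anchor at equal quality. A positive result means codec B
    needs more bitrate than codec A.
    """
    if not (saving_a < 100 and saving_b < 100):
        raise ValueError("savings must be below 100 percent")
    remaining_a = 1.0 - saving_a / 100.0  # fraction of anchor bitrate A still needs
    remaining_b = 1.0 - saving_b / 100.0  # fraction of anchor bitrate B still needs
    return (remaining_b / remaining_a - 1.0) * 100.0


if __name__ == "__main__":
    # Figures quoted from the PSNR slide: VP9 43.5%, x265 43.4% (both vs x264).
    diff = relative_bitrate_change(43.5, 43.4)
    print(f"x265 vs VP9: {diff:+.2f}% bitrate")
```

On these two figures alone the codecs differ by well under one percent, so the PSNR slide is effectively a tie.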

dapperdan · 1st September 2016, 18:07

They talk about a paper with more details of their results, is that available yet?

2nd September 2016, 16:28

Quote:

Originally Posted by dapperdan

They talk about a paper with more details of their results, is that available yet?

Not yet. Keep your eye on http://techblog.netflix.com/

2nd September 2016, 17:14

Quote:

Originally Posted by Motenai Yoda

Well, but they also avoided tuning for SSIM or PSNR with VP9 and x264 in those tests, as the 1/3 PSNR slide mentions a "PSNR-tuned configuration", where VP9 gives a 43.5% bitrate reduction over x264 vs 43.4% for x265 (still, there is a 2.5% gain for x265 vs VP9 on average).
Slide 2/3, based on MS-SSIM, is indeed visual-quality tuned, and there VP9 performs a bit better.

so the point is: will tuning codecs for ssim give more reliable results on ms-ssim or vmaf metrics?

It's well known that PSNR and SSIM are somewhat crude quality metrics that don't correlate perfectly with human quality evaluations. Over the years that x264 was optimized, the development team noticed that certain algorithms would deliver better subjective (human) visual quality, but worse objective quality scores (PSNR or SSIM). Knowing that the main goal of a video encoder is to deliver the best subjective visual quality, they optimized for the best experience as judged by humans. To enable PSNR or SSIM driven evaluations to be done more fairly, the x264 team came up with "tunings" that turned off all algorithms that affected these measurements negatively. Of course, no one would ever use --tune PSNR or --tune SSIM for production encoding, as these encodes will always deliver inferior visual quality when evaluated by real people.

As Netflix explained, they did 2 sets of test encodes with x264 and x265. One set was tuned for PSNR (using --tune psnr), and one set was done without this tuning. PSNR tuned encodes are the only valid encodes to use when you use PSNR as a test metric.
So, the first result they showed was a valid comparison...

The second results slide they showed was not a valid comparison (you need to use --tune ssim if you want to compare x264 or x265 using SSIM).

Similarly, if you're doing subjective visual quality tests, or using an advanced metric like Netflix's VMAF, which correlates more closely with subjective visual quality assessments, you should not use the --tune PSNR encodes. The third results they showed are valid, as they used the visual quality tuned encodes to compare using their VMAF metric.
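The rule of thumb above can be sketched as a small helper that picks the tuning to match the metric being measured. This is only an illustration: the function name, the metric labels, and the veryslow preset and bitrate arguments are assumptions for the example, not the settings Netflix used. It builds the command line without running anything.

```python
from typing import List, Optional

# Which --tune value to pass for each evaluation method, following the
# reasoning in this post. The keys are illustrative labels, not encoder options.
TUNE_FOR_METRIC = {
    "psnr": "psnr",        # PSNR measurement -> --tune psnr
    "ssim": "ssim",        # SSIM measurement -> --tune ssim
    "vmaf": None,          # perceptual metric -> leave psy optimizations on
    "subjective": None,    # human viewing -> leave psy optimizations on
}


def encoder_command(encoder: str, metric: str, source: str,
                    output: str, bitrate_kbps: int) -> List[str]:
    """Build an x264/x265 command line tuned for the given evaluation metric."""
    if encoder not in ("x264", "x265"):
        raise ValueError(f"unsupported encoder: {encoder}")
    key = metric.lower()
    if key not in TUNE_FOR_METRIC:
        raise ValueError(f"no tuning rule for metric: {metric}")

    cmd = [encoder, "--preset", "veryslow", "--bitrate", str(bitrate_kbps)]
    tune: Optional[str] = TUNE_FOR_METRIC[key]
    if tune is not None:
        cmd += ["--tune", tune]
    cmd += ["--output", output, source]
    return cmd


if __name__ == "__main__":
    print(" ".join(encoder_command("x265", "psnr", "in.y4m", "out.hevc", 3000)))
    print(" ".join(encoder_command("x265", "vmaf", "in.y4m", "out.hevc", 3000)))
```

MS-SSIM is deliberately left out of the table, since whether --tune ssim is the right match for it is exactly what is being debated in this thread.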

Of course, no objective (computer calculated) quality metric is as good as humans watching and evaluating video. There are many aspects of a video experience that are hard to measure with mathematical formulas - like motion accuracy, or the degree of noticeable compression artifacts.
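For reference, PSNR itself is just a logarithmic ratio of the peak signal power to the mean squared error, which is why it says nothing about where or how the errors appear. A minimal sketch over flat lists of 8-bit sample values (the function name and the pure-Python approach are choices made for this example; real tools work per plane on whole frames):

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], distorted: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ref = [100, 100, 100, 100]
    # Same total squared error, placed differently: PSNR cannot tell them apart.
    spread = [101, 99, 101, 99]
    concentrated = [102, 100, 100, 100]
    print(psnr(ref, spread), psnr(ref, concentrated))
```

Both distorted signals score the same, even though a viewer might well notice one concentrated error more than a slight uniform shift.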

Netflix's study is one of the most valid and comprehensive studies ever conducted, as they used real production codecs on a very large and representative sample of high quality real production content (unlike some codec comparisons that used previously compressed content, or used only reference encoders). This study definitely furthers our understanding of the performance that is possible from VP9 and HEVC under real-world conditions. Netflix has an amazingly talented R&D team, and it's awesome that they are willing to publish their research for everyone's benefit.

2nd September 2016, 17:32

And now, straight from Netflix... http://www.streamingmedia.com/Articl...ticleID=113346
Jan Ozer writes...

Quote:

I asked Netflix which set of results they felt was most significant. Their response was, “We believe that VMAF results will have the best correlation to user perception of quality. We use this metric, and sanity-check against other metrics (PSNR, SSIM, VIF, etc.) internally.”

Jamaika · 2nd September 2016, 19:58

Thanks for the explanation. It's nice that someone has developed the VMAF metric, but I can't make use of it.
According to this chart, with --tune psnr there is no predictable result for the ratio of the maximum signal power to the power of the noise distorting the signal (i.e. PSNR).
Results for Daala.

I wonder what my predicted score would be for a 2-pass veryslow encode?

mandarinka · 3rd September 2016, 14:53

Quote:

Originally Posted by x265_Project

Hmm, could VP9's win in that MS-SSIM test have been caused entirely by psy optimizations distorting the result?

IIRC, libvpx doesn't have as extensive a set of psyRDO/adaptive quantization tools as x264 developed (and x265 then adapted/extended). Since psy measurably hurts metrics, in this test x265 might have been handicapped while libvpx was not, "thanks" to the latter's lack of meaningful psy (somebody correct me if it has Psy RDO now)?

3rd September 2016, 20:23

Quote:

Originally Posted by mandarinka

Hmm, could VP9's win in that MS-SSIM test have been caused entirely by psy optimizations distorting the result?

IIRC, libvpx doesn't have as extensive a set of psyRDO/adaptive quantization tools as x264 developed (and x265 then adapted/extended). Since psy measurably hurts metrics, in this test x265 might have been handicapped while libvpx was not, "thanks" to the latter's lack of meaningful psy (somebody correct me if it has Psy RDO now)?

Yes... it's well known that if you want to measure x264 or x265's SSIM scores, you need to use --tune SSIM for your encodes. If you don't (as with this example), x264 and x265 will be using algorithms that improve visual quality, but produce lower SSIM scores.

How many of you remember this blog post?
http://web.archive.org/web/201007230...edia.cx/?p=458

Even when you use --tune PSNR or --tune SSIM, and then compare encoders using PSNR or SSIM, you won't have a fully reliable comparison.

x264 and x265 were not designed to achieve the highest efficiency when measured with PSNR or SSIM. They're designed to achieve the highest visual quality with any content at any chosen bit rate. We could have much higher PSNR or SSIM scores if we simply optimized everything for these objective metrics. But that wouldn't be the best encoder for producing video for actual human beings to watch.

Netflix understands the limitations of PSNR and SSIM, which is why they've invested a lot of time and energy, working with experts at the University of Southern California, into developing a better objective quality measurement tool - Video Multimethod Assessment Fusion (VMAF). VMAF correlates to subjective (human) visual quality assessments much more closely than PSNR, SSIM, or other available objective metrics. VMAF results are a much more reliable predictor of actual human subjective visual quality evaluations.

LigH · 3rd September 2016, 21:08

Then I hope the VMAF specs are publicly available to make an AviSynth plugin implementable...

But let's discuss that elsewhere.

mandarinka · 3rd September 2016, 21:24

Yeah, I'm aware of the SSIM/PSNR and x264 business. I was encoding back then ('08/'09), when VAQ/PsyRDO landed and changed the encoding landscape like nothing had in years.

I'm thinking that they probably didn't mean to test the same thing as with plain SSIM when they ran the MS-SSIM metric. Maybe they aimed to use it to gauge visual quality, like with VMAF... be it a good idea or not. I'm not sure how similar MS-SSIM is to SSIM proper. Xiph people use it IIRC, probably thinking it correlates better with visual quality than SSIM.

But x265 is still going to have a bigger delta between its metrics-tuned and psy-tuned results than libvpx is, I think, since libvpx lacks psyRDO. So to the degree MS-SSIM is similar to SSIM, it is probably disadvantaging x265. Hard to know for sure though, I never tried it.

3rd September 2016, 23:21

Quote:

Originally Posted by LigH

Then I hope the VMAF specs are publicly available to make an AviSynth plugin implementable...

But let's discuss that elsewhere.

They open sourced it under the Apache 2.0 license...
https://github.com/Netflix/vmaf
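If you would rather not build a plugin, one route is ffmpeg's libvmaf filter. This is a sketch only, and it assumes an ffmpeg build with that filter compiled in, which nobody in this thread has confirmed having. The helper name and file names are made up for the example; it just assembles the command and does not run it.

```python
from typing import List


def vmaf_command(distorted: str, reference: str) -> List[str]:
    """Build an ffmpeg command that scores `distorted` against `reference`.

    Assumes an ffmpeg binary built with the libvmaf filter. The filter takes
    the distorted clip as its first input and the reference as its second;
    output is discarded with the null muxer, so the score is read from the
    ffmpeg log.
    """
    return [
        "ffmpeg",
        "-i", distorted,   # first input: the encode being judged
        "-i", reference,   # second input: the pristine source
        "-lavfi", "libvmaf",
        "-f", "null", "-",
    ]


if __name__ == "__main__":
    print(" ".join(vmaf_command("encode.mkv", "source.y4m")))
```

Both inputs need the same resolution and frame count for the score to mean anything, so scale the encode back to the source size first if it was downscaled.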

Jamaika · 4th September 2016, 07:23

Quote:

Originally Posted by mandarinka

Hmm, could VP9's win in that MS-SSIM test have been caused entirely by psy optimizations distorting the result?

IIRC, libvpx doesn't have as extensive a set of psyRDO/adaptive quantization tools as x264 developed (and x265 then adapted/extended). Since psy measurably hurts metrics, in this test x265 might have been handicapped while libvpx was not, "thanks" to the latter's lack of meaningful psy (somebody correct me if it has Psy RDO now)?

The libvpx and libaom encoders have PSNR and SSIM tunings.

Code:

--tune=<arg>                Material to favor
                              psnr, ssim
--psnr                      Show PSNR in status line

The rest, I think, no one cares about.
More interesting to me is how Netflix applied the metric to the Daala codec. They probably wrote an additional function and included it, since Daala doesn't have such functions. By deduction, this indicates that VMAF can be used with all codecs.
It is a pity that Netflix didn't give the predicted score charts of PSNR / SSIM / VMAF vs DMOS for the x265 or VPX codecs.

Probably so as not to annoy the developers.
Results for Daala.

Quote:

Originally Posted by mandarinka

I'm thinking that they probably didn't mean to test the same thing as with plain SSIM when they ran the MS-SSIM metric. Maybe they aimed to use it to gauge visual quality, like with VMAF... be it a good idea or not. I'm not sure how similar MS-SSIM is to SSIM proper. Xiph people use it IIRC, probably thinking it correlates better with visual quality than SSIM.

The exceptions in the results are considerable for Daala.

Chroma from Luma is a PSNR penalty in Daala
Despite lending visual improvements, Chroma from Luma is a PSNR penalty in Daala. This is not particularly surprising, as neither PSNR nor any of the other common objective quality measures used in video coding represent color perception well, but it is especially interesting as the similar lack of PSNR performance in HEVC testing would certainly have contributed to it being dropped from the final HEVC standard. Perhaps the joint working group might have overlooked that if the spatial version of Chroma from Luma had not also been an apparent performance penalty. I speculate here mainly because our frequency domain implementation is substantially faster than classic intra prediction, not slower.

herbert · 5th September 2016, 23:27

VideoLAN Dev Days 2016: Update on VPX
Recap on recent VPX developments.

https://www.youtube.com/watch?v=peS2I14w8ow

VideoLAN Dev Days 2016: A VP9 Encoder
Information on ffvp9 until about 6:45, followed by a talk about 'Eve', an alternative VP9 encoder.

https://www.youtube.com/watch?v=t_z52-CBut0

Selur · 10th September 2016, 06:30

Just wondering: Since av1, vp8 and vp9 seem to be based on the same framework can they be compiled into a single binary? (vpxenc can include vp8, vp9, vp10, so I was wondering if av1 could be added there too)

Quikee · 10th September 2016, 20:32

Quote:

Originally Posted by Selur

Just wondering: Since av1, vp8 and vp9 seem to be based on the same framework can they be compiled into a single binary? (vpxenc can include vp8, vp9, vp10, so I was wondering if av1 could be added there too)

No, they removed everything that won't be used in av1 and renamed everything vp8, vp9, vpx to aom or av1.
