Old 13th June 2018, 21:16   #701  |  Link
MoSal
Registered User
 
Join Date: Jun 2013
Posts: 94
Quote:
I bought an LG smart TV 1.5 years ago. This year, after an update to the YouTube application, it started supporting VP9 (stats for nerds reports it) and plays YouTube 4K without a single drop.
I tested a cheap locally-assembled TV with Chinese parts the other day. VP9 4K@60fps is supported out of the box. Opus was the codec that wasn't supported.

No one will forget AV1, not even the no-name chip manufacturers. Here's hoping that, from now on, they will not forget Opus either.
__________________
saldl: a command-line downloader optimized for speed and early preview.
Old 14th June 2018, 03:13   #702  |  Link
amichaelt
Guest
 
Posts: n/a
Quote:
Originally Posted by IgorC View Post
It's not like VP9 isn't supported by any smart TVs.

I bought an LG smart TV 1.5 years ago. This year, after an update to the YouTube application, it started supporting VP9 (stats for nerds reports it) and plays YouTube 4K without a single drop.
But it's not supported by everything, whereas anything that can do 4K with a Netflix app has to have HEVC support.
Old 15th June 2018, 09:46   #703  |  Link
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,986
Quote:
anything that can do 4K with a Netflix app has to have HEVC support.
Exactly
Old 22nd June 2018, 10:24   #704  |  Link
blurred
Registered User
 
Join Date: Jul 2016
Posts: 14
Interesting discussion regarding the choice of entropy coder for AV1: https://encode.ru/threads/1890-Bench...ll=1#post56945

The Daala range coder, which uses 16 multiplications per symbol, has won over rANS, which uses 1 multiplication per symbol and has ~7x faster implementations: https://sites.google.com/site/powturbo/entropy-coder

Does anybody know why the slower and more costly one was chosen?

P.S. This nibble-adaptive rANS is used, for example, in the recent open-source Dropbox DivANS: https://blogs.dropbox.com/tech/2018/...r-with-divans/

Last edited by blurred; 22nd June 2018 at 10:40. Reason: Dropbox DivANS
Old 22nd June 2018, 11:43   #705  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
Often the choice is for simpler hardware implementations, since that's really the future, not software. I'm also not convinced a generic benchmark can fully represent the performance characteristics of an actual codec.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 22nd June 2018 at 11:47.
Old 22nd June 2018, 12:13   #706  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 95
If I remember correctly, rANS does some things in reverse order, along with other things that raise the cost of implementing it in hardware. In other words, implementing the 16 multipliers of the Daala range coder in silicon is cheaper than implementing the memory that rANS needs. It also appears that rANS can increase latency, especially at low rates, because of the buffers. Most of these problems are being tackled in a new generation of ANS coders, but those are not going to be ready in time for a possible implementation in AV1.

Also remember: faster/cheaper in software is not the same as faster/cheaper in hardware.
Old 22nd June 2018, 17:03   #707  |  Link
blurred
Registered User
 
Join Date: Jul 2016
Posts: 14
But don't 16 multiplications cost more energy than one, paid in the energy consumption and battery life of our devices?
Old 23rd June 2018, 00:55   #708  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 95
That only applies to software implementations. In hardware you don't think in terms of the number of instructions but in the number of gates/transistors, and the mapping is not that simple: sometimes you can implement a function with 9 multiplications in the same number of gates that it takes to implement 2 multiplications (this is an example from a real case). Memory also has a cost in gates and power, and you need to evaluate whether it is more efficient to spend the gates on memory or on processing. The result of that evaluation is what determined the use of the Daala range coder, as AV1 was designed with hardware implementation in mind because it is critical for mobile phones.

Last edited by Phanton_13; 23rd June 2018 at 00:58.
Old 24th June 2018, 12:32   #709  |  Link
blurred
Registered User
 
Join Date: Jul 2016
Posts: 14
Quote:
Originally Posted by Phanton_13 View Post
(...)you can implement a function with 9 multiplications in the same number of gates that it takes to implement 2 multiplications(...)
Looks like you are referring to serial execution, which might require a 16x frequency increase here (?)
And hardware decoding requires replacing current hardware. In the meantime (~5 years) decoding will be done in software, where being 7x slower seems a huge sacrifice.
Additionally, Google is still fighting tooth and nail for this ANS patent ( https://arstechnica.com/tech-policy/...public-domain/ ). If it is not intended for AV1, will it prevent others from using ANS in video compression?
Old 24th June 2018, 15:21   #710  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 95
Quote:
Originally Posted by blurred View Post
Looks like you are referring to serial execution, which might require a 16x frequency increase here (?)
No, I am referring to reimplementing the function. In that case a 50 MHz FPGA implementation of the full ASIC was able to match a Core 2 Duo at 2 GHz running the same functionality in software, and in silicon the ASIC ran at 1 GHz.

Hardware design is very different from software development. For example, in the Daala range coder most multiplications are constant*value, and in that case the hardware does not always need a real multiplier. When the constant is 2, for instance, there are several variants: for unsigned values it is only a bit shift, and in hardware it is even cheaper because you only reroute the data; for signed values you use an adder or a modified shifter. For other constants there is usually an alternative, faster implementation than a full multiplier. Also, most of the time you don't need a full multiplication unit, because you implement only what you need: you can build a 12-bit multiplier instead of a 16-bit one if your values always fit in 12 bits, or implement only the lower bits of a 16-bit multiplication and ignore anything above 16 bits...
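
To make the constant*value point concrete, here is a minimal Python sketch. It is not taken from Daala or from any real hardware design, and the function names mul_by_constant and mul_low_bits are made up for illustration. It models a multiplication by a known constant as shifted copies of the input added together, and a narrow multiplier as a product truncated to its low bits.

```python
def mul_by_constant(value: int, constant: int) -> int:
    """Multiply by a known non-negative constant using only shifts and adds.

    Each set bit of the constant contributes one shifted copy of the value.
    In hardware a fixed shift is just wiring, so only the adds cost gates.
    """
    result = 0
    shift = 0
    while constant:
        if constant & 1:
            result += value << shift
        constant >>= 1
        shift += 1
    return result


def mul_low_bits(a: int, b: int, width: int = 16) -> int:
    """Keep only the low `width` bits of a product, as a narrow multiplier would."""
    mask = (1 << width) - 1
    return (a * b) & mask


print(mul_by_constant(37, 2))   # a single shifted copy, same as 37 << 1
print(mul_by_constant(37, 10))  # two shifted copies: (37 << 1) + (37 << 3)
print(mul_low_bits(300, 300))   # the product no longer fits in 16 bits, high bits are dropped
```

A constant with few set bits needs few adders, which is one reason a count of "multiplications" does not translate directly into a gate count.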

In hardware design the frequency is a value derived from data propagation (delay, timing), and what you want is results: if an implementation runs at a lower frequency but produces the result faster, you go for it.

Quote:
Originally Posted by blurred View Post
And hardware decoding requires replacing current hardware. In the meantime (~5 years) decoding will be done in software, where being 7x slower seems a huge sacrifice.
That is true, but it is more like 2-3 years for hardware to start appearing, and in this case it could be reduced to 1 year thanks to the various hardware designers and manufacturers in AOM.

Quote:
Originally Posted by blurred View Post
Additionally, Google is still fighting tooth and nail for this ANS patent ( https://arstechnica.com/tech-policy/...public-domain/ ). If it is not intended for AV1, will it prevent others from using ANS in video compression?
No. Having it refused can actually be good: it would be detrimental if it were approved later for another entity, but the earlier refusal could then be used to call the patent office and that later approval into question and invalidate it. Moreover, this also demonstrates the dysfunction in both the patent system and the legal teams in companies.
Old 24th June 2018, 16:17   #711  |  Link
blurred
Registered User
 
Join Date: Jul 2016
Posts: 14
Quote:
Originally Posted by Phanton_13 View Post
(...)in the Daala range coder most multiplications are constant*value(...)
If I understand correctly, there are 16 multiplications because the "maximal alphabet size" is 16: the coder needs to multiply the "range size" by the CDF value for all 16 symbols.
In contrast, rANS needs to multiply by only one value (p[s] = CDF[s+1]-CDF[s]), where s is the currently decoded symbol.

The CDF changes with data type (context) and can be adapted, so these are definitely not constant values.
In hardware you can build 16 parallel multipliers to avoid increasing the frequency, but that would need 16x more gates and, most importantly, consume 16x more energy.
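
The multiplication counts being compared can be sketched as follows. This is a simplified Python model under stated assumptions, not the AV1 or Daala code: the names range_decode_symbol and rans_decode_symbol are hypothetical, the probabilities use an assumed 15-bit precision, renormalization is left out, and the range-coder search is a plain linear scan.

```python
PROB_BITS = 15            # assumed probability precision
TOTAL = 1 << PROB_BITS

# Uniform 16-symbol alphabet: CDF[s] .. CDF[s + 1] is the slice owned by symbol s.
CDF = [s * TOTAL // 16 for s in range(17)]


def range_decode_symbol(dif, rng, cdf):
    """Range-coder style search: scale CDF entries by the current range.

    `dif` is the position of the code value inside the range (0 <= dif < rng).
    Returns (symbol, multiplications used).
    """
    mults = 0
    for s in range(len(cdf) - 1):
        upper = (rng * cdf[s + 1]) >> PROB_BITS  # one multiplication per CDF entry
        mults += 1
        if dif < upper:
            return s, mults
    raise ValueError("dif must be smaller than rng")


def rans_decode_symbol(x, cdf):
    """rANS style step: find the symbol by lookup, then multiply once.

    Returns (symbol, next state, multiplications used).
    """
    slot = x & (TOTAL - 1)
    s = next(i for i in range(len(cdf) - 1) if cdf[i] <= slot < cdf[i + 1])
    freq = cdf[s + 1] - cdf[s]  # p[s] = CDF[s+1] - CDF[s]
    x_next = freq * (x >> PROB_BITS) + slot - cdf[s]  # the single multiplication
    return s, x_next, 1


print(range_decode_symbol(39999, 40000, CDF))  # last symbol: every entry gets scaled
print(range_decode_symbol(0, 40000, CDF))      # first symbol: the search stops at once
print(rans_decode_symbol((5 << PROB_BITS) | (CDF[15] + 3), CDF))
```

In this model the range-coder multiplications are independent of each other, so they could run side by side, while the rANS step trades them for a symbol lookup from the low bits of the state.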
Old 24th June 2018, 19:45   #712  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 95
You are partly right, but you are forgetting one thing: do those 16 parallel multipliers consume more energy than the extra memory needed by rANS? Hardware is inherently parallel, so can the update be done in parallel with another task during the decoding process? There is also the possibility of optimizing those 16 parallel multiplications, since one operand is common to all of them. One thing that helps with hardware is to think of it not as a computer program but as a data flow between operands.
Old 24th June 2018, 21:24   #713  |  Link
blurred
Registered User
 
Join Date: Jul 2016
Posts: 14
That additional buffer (for rANS) is only needed in the encoder, which for video compression is usually an order of magnitude more costly anyway and, for YouTube or Netflix video for example, runs only once per thousands or millions of views (decodings).
And a video compressor seems to require huge flexible buffers for various kinds of modelling/prediction. Is sharing a few kilobytes with the entropy coder really a non-negligible cost?
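
Why rANS needs that encoder-side buffer can be shown with a toy model. The sketch below is an illustration under assumptions rather than any production coder: it uses a Python big integer as the state so that no renormalization is needed, a made-up 4-symbol CDF, and hypothetical function names. The encoder has to walk the symbols in reverse so that the decoder gets them back in forward order.

```python
PROB_BITS = 4
TOTAL = 1 << PROB_BITS
CDF = [0, 8, 12, 14, 16]  # made-up model: 4 symbols with frequencies 8, 4, 2, 2


def rans_encode(symbols, x=1):
    """Fold the symbols into one integer state, last symbol first.

    Going through reversed(symbols) is why the whole sequence must be
    buffered before encoding can start.
    """
    for s in reversed(symbols):
        freq = CDF[s + 1] - CDF[s]
        x = (x // freq) * TOTAL + CDF[s] + (x % freq)
    return x


def rans_decode(x, count):
    """Pop `count` symbols off the state; they come out in original order."""
    out = []
    for _ in range(count):
        slot = x & (TOTAL - 1)
        s = next(i for i in range(len(CDF) - 1) if CDF[i] <= slot < CDF[i + 1])
        x = (CDF[s + 1] - CDF[s]) * (x >> PROB_BITS) + slot - CDF[s]
        out.append(s)
    return out


message = [0, 1, 0, 3, 2, 0, 0, 1]
state = rans_encode(message)
print(state)
print(rans_decode(state, len(message)) == message)
```

The decoder side needs no such buffer, since it simply consumes the state front to back.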

Quote:
Originally Posted by Phanton_13 View Post
Also there is the possibility of optimizing those 16 parallel multiplications, since one operand is common to all of them.
Interesting. Indeed, the range varies, but it is the same for all 16 multiplications.
Thinking about multiplication as shifts and additions, the cheap shifting part can indeed be shared, but it doesn't seem simple to get a systematic optimization for the separate additions. Do you maybe know of a paper showing how to optimize it?
Old 25th June 2018, 21:13   #714  |  Link
Quikee
Registered User
 
Join Date: Jan 2006
Posts: 41
AV1 freeze

AV1 1.0.0 code tag

Also, the spec no longer has draft status.

No official announcement yet.
Old 26th June 2018, 00:12   #715  |  Link
GTPVHD
Registered User
 
Join Date: Mar 2008
Posts: 175
https://aomediacodec.github.io/av1-spec/

Still says Draft Document here.
Old 26th June 2018, 02:03   #716  |  Link
TD-Linux
Registered User
 
Join Date: Aug 2015
Posts: 34
Quote:
Originally Posted by blurred View Post
The Daala range coder, which uses 16 multiplications per symbol, has won over rANS, which uses 1 multiplication per symbol and has ~7x faster implementations: https://sites.google.com/site/powturbo/entropy-coder

Does anybody know why the slower and more costly one was chosen?
Firstly, the AV1 range coder only uses 1 multiplication per CDF entry; the 16 is the "worst case" (keep in mind that they can be done in parallel, e.g. with SIMD, so it's actually better to use more rather than fewer, as the multiply is the cheapest part in software). Secondly, the difference was nowhere near 7x when we benchmarked the two: rANS was faster, but by a factor of about 2. However, the requirement to buffer and reverse the symbols was unfortunately insurmountable.

Also keep in mind that AV1 adjusts the probabilities on a per-symbol basis. The entropy coder CDFs are designed to make adapting the probabilities very fast (with only adds and shifts). This puts some constraints on the design that don't exist in the linked benchmark (which uses fixed probabilities as far as I can tell).
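
As an illustration of what adapting a CDF with only adds and shifts can look like, here is a minimal Python sketch. It is not the exact AV1 update rule: the CDF layout, the fixed rate of 5 and the function name adapt_cdf are assumptions made for this example, and a real codec also has to keep every probability above zero.

```python
TOTAL = 1 << 15  # probabilities are scaled so that the last CDF entry equals TOTAL


def adapt_cdf(cdf, symbol, rate=5):
    """Nudge the CDF toward the symbol that was just coded.

    cdf[i] holds P(symbol <= i) scaled by TOTAL, so cdf[-1] == TOTAL.
    Entries at or above `symbol` move up toward TOTAL and entries below it
    move down toward 0, using only a shift plus an add or subtract.
    """
    for i in range(len(cdf) - 1):  # the last entry always stays at TOTAL
        if i >= symbol:
            cdf[i] += (TOTAL - cdf[i]) >> rate
        else:
            cdf[i] -= cdf[i] >> rate
    return cdf


cdf = [8192, 16384, 24576, 32768]  # four equally likely symbols
adapt_cdf(cdf, symbol=2)
print(cdf)
print(cdf[2] - cdf[1])  # scaled probability of symbol 2 after one update
```

Each entry is updated independently of the others, so in this sketch the whole update needs no multiplication or division, and a larger rate gives slower adaptation.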

Last edited by TD-Linux; 26th June 2018 at 02:06.
Old 26th June 2018, 04:50   #717  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,738
Quote:
Originally Posted by nevcairiel View Post
Often the choice is for simpler hardware implementations, since that's really the future, not software. I'm also not convinced a generic benchmark can fully represent the performance characteristics of an actual codec.
I think you can remain happily convinced that a generic benchmark will NOT "represent the performance characteristics of an actual codec".

There is so much clever stuff that gets done, even in decoders. And there are so many different kinds of parallelization, SIMD, ASIC, et cetera available. And a surprising number of decoders don't implement basic stuff like skipping non-reference frames when seeking, because the system layer and the decoder layers are not coupled tightly enough.

AV1 is way better designed for parallelized HW decoders than VP9 was, which was pretty painfully serialized compared to HEVC, with software decoders pretty dependent on fast single-core performance.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 26th June 2018, 07:37   #718  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,746
@ GTPVHD: Then the GitHub site may have outdated content?

MABS also retrieves sources from GoogleSource, and had to disable a TESTS flag to continue compiling 2 days ago.
__

P.S.: New upload:

AOM v1.0.0-6-gce8f4811b (yes, v1.0.0+)
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 26th June 2018 at 09:21.
Old 26th June 2018, 11:17   #719  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 95
Quote:
Originally Posted by blurred View Post
do you maybe know of a paper showing how to optimize it?
Unfortunately not for the general case. While searching for one, though, I found a paper, "DAALA_EC in AV1", that has some data for hardware implementations:

Daala_ec decoder: 54k gates, performance 1 symbol per clock, decoding time 1 clock.
Daala_ec encoder: 9k gates, performance 1 symbol per clock, encoding time 1 clock.
ANS decoder: 49k gates, performance 1 symbol per clock, decoding time 1 clock.
ANS encoder: 25k gates, performance 1 symbol every 2 clocks, encoding time 2 clocks.

For reference, the VP9 G2 hardware codec has 2.60M gates (2160p@30fps content playback: ~250 MHz).

Basically, ANS does not decode faster than the Daala range coder once implemented in hardware, and it is even slower at encoding. A speed difference in software implementations not carrying over to hardware implementations is common enough to call it the norm. Another thing is that the hardware people at ARM/AMD/Intel/Nvidia appear to have had a big hand in the decision to use the Daala range coder.

Also, rANS is quite recent and highly optimized, and it uses 32/64-bit arithmetic and SIMD instructions, while the Daala range coder uses only 16-bit arithmetic. And you can fit between 2 and 4 single-clock 16-bit multipliers in the same number of gates as one single-clock 32-bit multiplier.

Last edited by Phanton_13; 26th June 2018 at 12:48.
Old 27th June 2018, 08:11   #720  |  Link
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,986
Congrats to the AOM team for hitting 1.0! It's only a few months late.

Good stuff though, looking forward to the encoder maturing. It's always great to see more options.