26th June 2018, 11:17   #724
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 95
Quote:
Originally Posted by blurred
Do you know of any paper showing how to optimize it?
Unfortunately, not for the general case. While searching for one, though, I found a paper, "DAALA_EC in AV1", that has some data on hardware implementations:

Daala_ec decoder: 54k gates, performance 1 symbol per clock, decoding time 1 clock.
Daala_ec encoder: 9k gates, performance 1 symbol per clock, encoding time 1 clock.
ANS decoder: 49k gates, performance 1 symbol per clock, decoding time 1 clock.
ANS encoder: 25k gates, performance 1 symbol every 2 clocks, encoding time 2 clocks.

For reference, the VP9 G2 hardware codec has 2.60M gates (2160p@30fps content playback: ~250 MHz).
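To put those numbers side by side, here is a small sketch that works out what share of the VP9 G2 gate budget each entropy coder would take and its peak symbol rate. The gate counts and clocks-per-symbol come from the figures above; running the coder at the ~250 MHz playback clock is my own assumption for illustration, not something the paper states.

```python
# Figures quoted above from the "DAALA_EC in AV1" paper.
# Each entry: (gate count, clocks needed per symbol).
ENTROPY_CODERS = {
    "Daala_ec decoder": (54_000, 1),
    "Daala_ec encoder": (9_000, 1),
    "ANS decoder": (49_000, 1),
    "ANS encoder": (25_000, 2),
}

VP9_G2_GATES = 2_600_000  # VP9 G2 hardware codec total, quoted above
CLOCK_HZ = 250_000_000    # ASSUMPTION: coder clocked at the ~250 MHz 2160p@30fps figure


def summarize(name):
    """Return (share of VP9 G2 gate budget, peak symbols per second)."""
    gates, clocks_per_symbol = ENTROPY_CODERS[name]
    share = gates / VP9_G2_GATES
    symbols_per_s = CLOCK_HZ / clocks_per_symbol
    return share, symbols_per_s


for name in ENTROPY_CODERS:
    share, rate = summarize(name)
    print(f"{name:17s} {share:6.2%} of VP9 G2, {rate / 1e6:5.0f} Msymbols/s")
```

Either way, every coder listed is only around 2% of the VP9 G2 gate count or less, and under this assumption the ANS encoder peaks at half the symbol rate of the others (125 vs 250 Msymbols/s).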

Basically, once implemented in hardware, ANS does not decode any faster than the Daala range coder, and it is even slower at encoding. Speed differences between software implementations failing to carry over to hardware implementations is common enough to call it the norm. Another thing: it appears that the hardware people at ARM/AMD/Intel/Nvidia had a strong hand in the decision to use the Daala range coder.

Also, rANS is quite recent and highly optimised, plus it uses 32/64-bit arithmetic and SIMD instructions, while the Daala range coder uses only 16-bit arithmetic. And you can build between 2 and 4 single-clock 16-bit multipliers with the same number of gates as one single-clock 32-bit multiplier.
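A rough way to see why narrower arithmetic is so much cheaper in gates: in a textbook unsigned n x n array multiplier there are n*n AND gates for the partial products plus n*(n-1) adders (n*(n-2) full adders and n half adders), so the size grows with the square of the operand width. The sketch below uses that simple model only; real single-clock multipliers (Booth encoding, Wallace trees, synthesis tricks) will give different exact counts, and the "2 to 4" range above is the post's own estimate.

```python
def array_multiplier_cells(n):
    """Rough cell count for an unsigned n x n array multiplier:
    n*n AND gates (partial products) + n*(n-1) adders
    (n*(n-2) full adders and n half adders)."""
    and_gates = n * n
    adders = n * (n - 1)
    return and_gates + adders


cells_16 = array_multiplier_cells(16)  # 256 AND gates + 240 adders = 496
cells_32 = array_multiplier_cells(32)  # 1024 AND gates + 992 adders = 2016
ratio = cells_32 / cells_16
print(f"32-bit / 16-bit multiplier cells: {ratio:.2f}")  # prints 4.06
```

Since both terms scale as n squared, doubling the width costs about 4x under this model, which matches the upper end of the estimate above.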

Last edited by Phanton_13; 26th June 2018 at 12:48.