You are partly right, but you are forgetting one thing: do those 16 parallel multipliers really consume more energy than the extra memory needed in rANS? Hardware is inherently parallel, so can the update be done in parallel with another task during the decoding process? There is also room to optimize those 16 parallel multiplications, since one operand is common to all of them. One thing that helps with hardware is to think of it not as a computer program but as a data flow between operands.
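To make the common-operand point concrete, here is a minimal sketch of one possible way to share work between the 16 multiplications. It is only an illustration, not necessarily the optimization meant above. The small multiples of the common operand `x` are computed once and reused by every product, so each product needs only shifts and adds. The function names and the 8-bit coefficient width are assumptions chosen for the example.

```python
def shared_multiples(x):
    """Precompute d * x for every 4-bit digit d, once for all 16 products.

    In hardware this table would be built from a handful of adders
    (3x = 2x + x, 5x = 4x + x, ...); plain multiplication is used here
    only to keep the software model short.
    """
    return [d * x for d in range(16)]


def multiply_with_table(table, coeff, coeff_bits=8):
    """Form coeff * x from the shared table using only shifts and adds.

    coeff_bits is an assumed coefficient width (a multiple of 4).
    """
    acc = 0
    for shift in range(0, coeff_bits, 4):
        digit = (coeff >> shift) & 0xF   # next 4-bit digit of the coefficient
        acc += table[digit] << shift     # shifted copy of a shared multiple
    return acc


def sixteen_products(x, coeffs, coeff_bits=8):
    """Compute x * c for each coefficient c, sharing one table across all."""
    table = shared_multiples(x)          # built once, used by every product
    return [multiply_with_table(table, c, coeff_bits) for c in coeffs]
```

In a data-flow view, `shared_multiples` is a single block feeding all 16 product blocks, which run side by side. Whether this actually saves energy compared with 16 independent multipliers depends on the operand widths and the target technology.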