Doom9's Forum > Video Encoding > New and alternative video codecs

Old 26th June 2018, 02:03   #721  |  Link
TD-Linux
Registered User
 
Join Date: Aug 2015
Posts: 32
Quote:
Originally Posted by blurred View Post
The Daala range coder, using 16 multiplications per symbol, has won over rANS, which uses 1 multiplication per symbol and has ~7x faster implementations: https://sites.google.com/site/powturbo/entropy-coder

Does anybody know why the slower and more costly one was chosen?
Firstly, the AV1 range coder only uses 1 multiplication per CDF entry, the 16 is the "worst case" (keep in mind that they can be done in parallel, e.g. with SIMD, so it's actually better to use more than less as the multiply is the cheapest part in software). Secondly, the difference is nowhere near 7x when we benchmarked the two - rANS was faster, but by a factor of about 2. However, the requirement to buffer and reverse the symbols was unfortunately insurmountable.
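
To make the "1 multiplication per CDF entry" point concrete, here is a rough Python sketch of how a multi-symbol range decoder can locate a symbol. It is not the libaom code: the names (`decode_symbol`, `PROB_BITS`), the 15-bit scale and the plain, non-inverted CDF layout are assumptions picked for readability. Each CDF entry costs one multiply and one compare, and the iterations do not depend on each other, which is what would let SIMD lanes or parallel hardware multipliers evaluate them at once.

```python
PROB_BITS = 15  # assumed CDF precision: probabilities are scaled to 2**15


def decode_symbol(value, rng, cdf):
    """Return the symbol whose sub-interval of [0, rng) contains value.

    cdf[i] is the cumulative probability of symbols 0..i scaled to
    2**PROB_BITS, so the last entry is always 2**PROB_BITS.
    """
    symbol = 0
    for c in cdf[:-1]:
        threshold = (rng * c) >> PROB_BITS  # one multiply per CDF entry
        symbol += value >= threshold        # one compare per CDF entry
    return symbol


# Four equally likely symbols; with rng = 1000 the thresholds are 250, 500, 750.
uniform4 = [8192, 16384, 24576, 32768]
print(decode_symbol(600, 1000, uniform4))  # prints 2
```

Because every threshold is computed from the same `rng`, the loop body has no carried dependency apart from the final count.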

Also keep in mind that AV1 adjusts the probabilities on a per-symbol basis. The entropy coder CDFs are designed to make adapting the probabilities very fast (with only adds and shifts). This puts some constraints on the design that don't exist in the linked benchmark (which uses fixed probabilities as far as I can tell).
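
A minimal sketch of what per-symbol adaptation "with only adds and shifts" can look like, in Python. This is a generic illustration under assumptions, not the AV1 or Daala update rule: `adapt_cdf`, the `rate` parameter and the non-inverted CDF layout are hypothetical, and a real coder also has to keep every symbol's probability above zero, which this sketch ignores.

```python
PROB_BITS = 15
TOTAL = 1 << PROB_BITS  # 32768, the value of the last CDF entry


def adapt_cdf(cdf, symbol, rate=5):
    """Nudge the CDF towards the symbol that was just coded.

    cdf[i] is the cumulative probability of symbols 0..i scaled to TOTAL.
    Entries at or above the coded symbol move towards TOTAL, entries below
    it move towards 0; each move is one shift plus one add or subtract.
    """
    for i in range(len(cdf) - 1):  # the last entry stays at TOTAL
        if i >= symbol:
            cdf[i] += (TOTAL - cdf[i]) >> rate
        else:
            cdf[i] -= cdf[i] >> rate
    return cdf


cdf = adapt_cdf([8192, 16384, 24576, 32768], symbol=2)
print(cdf)  # [7936, 15872, 24832, 32768]
```

After the update, symbol 2 owns 24832 - 15872 = 8960 of the 32768 range instead of 8192, so it has become slightly more probable without any multiply or divide.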

Last edited by TD-Linux; 26th June 2018 at 02:06.
Old 26th June 2018, 04:50   #722  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,874
Quote:
Originally Posted by nevcairiel View Post
Often the choice is for simpler hardware implementations, since that's really the future, not software. I'm also not convinced a generic benchmark can fully represent the performance characteristics of an actual codec.
I think you can remain happily convinced that a generic benchmark will NOT "represent the performance characteristics of an actual codec".

There is so much clever that gets done, even in decoders. And there are so many different kinds of parallelization, SIMD, ASIC, etcetera available. And surprising numbers of decoders don't implement basic stuff like skipping non-reference frames when doing seeking, due to the system layer and the decoder layers not being tightly coupled enough.

AV1 is way better designed for parallelized HW decoders than VP9 was, which was pretty painfully serialized compared to HEVC, with software decoders pretty dependent on fast single-core performance.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 26th June 2018, 07:37   #723  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,851
@ GTPVHD: Then the github site may have outdated content?

MABS also retrieves sources from GoogleSource. And had to disable a TESTS flag to continue compiling, 2 days ago.
__

P.S.: New upload:

AOM v1.0.0-6-gce8f4811b (yes, v1.0.0+)
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 26th June 2018 at 09:21.
Old 26th June 2018, 11:17   #724  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 93
Quote:
Originally Posted by blurred View Post
do you maybe know some paper showing how to optimize it?
Regrettably not for the general case. While searching for one, though, I found a paper, "DAALA_EC in AV1", that has some data for hardware implementations:

Daala_ec decoder: 54k gates, performance 1 symbol per clock, decoding time 1 clock.
Daala_ec encoder: 9k gates, performance 1 symbol per clock, encoding time 1 clock.
ANS decoder: 49k gates, performance 1 symbol per clock, decoding time 1 clock.
ANS encoder: 25k gates, performance 1 symbol every 2 clocks, encoding time 2 clocks.

For reference, the VP9 G2 hardware codec has 2.60M gates (2160p@30fps content playback: ~250 MHz).

Basically, ANS is no faster than the Daala range coder at decoding once implemented in hardware, and it is even slower at encoding. Speed differences between software implementations failing to carry over to hardware implementations is common enough to call it the norm. Another thing is that the hardware guys at ARM/AMD/Intel/Nvidia appear to have had a good hand in the decision to use the Daala range coder.

Also, rANS is quite recent and highly optimised, plus it uses 32/64-bit arithmetic and SIMD instructions, while the Daala range coder uses only 16-bit arithmetic. And you can fit between 2 and 4 single-clock 16-bit multipliers in the same number of gates as one single-clock 32-bit multiplier.
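
As a back-of-the-envelope check on the "between 2 and 4" figure, here is a small Python calculation. It assumes a plain array multiplier whose gate count grows with the number of 1-bit partial products (width x width); that model is a simplification and is not data from the paper, and `partial_product_bits` is just an illustrative helper.

```python
def partial_product_bits(a_bits, b_bits):
    """Number of 1-bit partial products in an a_bits x b_bits array multiplier."""
    return a_bits * b_bits


wide = partial_product_bits(32, 32)    # 1024
narrow = partial_product_bits(16, 16)  # 256
print(wide // narrow)                  # prints 4
```

Under this model four 16-bit multipliers fit in the area of one 32-bit multiplier. Real designs use other structures that shift the ratio, which would be consistent with quoting a range rather than a flat 4.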

Last edited by Phanton_13; 26th June 2018 at 12:48.
Old 27th June 2018, 08:11   #725  |  Link
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,507
Congrats to the AOM team for hitting 1.0! It's only a few months late

Good stuff tho, looking forward to the encoder maturing. It's always great to see more options.
Old 27th June 2018, 11:26   #726  |  Link
wiak
Registered User
 
Join Date: Jul 2003
Location: somewhere north
Posts: 260
Quote:
Originally Posted by MoSal View Post
I tested a cheap locally-assembled TV with Chinese parts the other day. VP9 4K@60fps is supported out of the box. Opus was the codec that's not supported.

No one will forget AV1, not even the no name chip manufacturers. Here is hope, from now on, they will not forget Opus either.
LG OLED tellies don't support Opus either, sooo heh
__________________
Woah! Ninja?! http://nwgat.ninja/ (AV1 Overview)
"Not available in your region" has now been redefined as "Go Pirate, you filthy scum" Nwgat
Old 27th June 2018, 11:34   #727  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 5,851
Due to general interest:

My AOM builds are a result of jb-alvarado's media-autobuild_suite, which has building ffmpeg as main purpose, but also offers several more features.

During the configuration, among several other options, I enabled building of separate executables (not only libraries used in ffmpeg), and building of AOM.
Old 27th June 2018, 17:18   #728  |  Link
Tommy Carrot
Registered User
 
Join Date: Mar 2002
Posts: 852
I've done a few tests with the 1.0 build (thanks LigH!). I compared it to the 0.1.0-9043 build from April. I mainly tested --cpu-used 0 to 2, because above that the quality isn't really better than current-gen codecs, while it's still horrendously slow. Overall, the encoding speed has improved to around twice as fast at each speed setting, while the quality is very similar (both filesize and metrics). In some cases there are definite visual improvements, but in most cases it has more or less the same quality.

This encoder still needs a lot of work; it's still too slow even for short tests, not to mention everyday use. The quality is fairly impressive, definitely better than x265 or VP9, but IMO it falls short of XVC and VVC.
Old 27th June 2018, 18:51   #729  |  Link
iwod
Registered User
 
Join Date: Apr 2002
Posts: 749
Quote:
Originally Posted by Tommy Carrot View Post
I've done a few tests with the 1.0 build (thanks LigH!). I compared it to the 0.1.0-9043 build from April. I mainly tested --cpu-used 0 to 2, because above that the quality isn't really better than current-gen codecs, while it's still horrendously slow. Overall, the encoding speed has improved to around twice as fast at each speed setting, while the quality is very similar (both filesize and metrics). In some cases there are definite visual improvements, but in most cases it has more or less the same quality.

This encoder still needs a lot of work; it's still too slow even for short tests, not to mention everyday use. The quality is fairly impressive, definitely better than x265 or VP9, but IMO it falls short of XVC and VVC.
Is there a VVC encoder already? Or are you implying XVC?
Old 27th June 2018, 18:55   #730  |  Link
Tommy Carrot
Registered User
 
Join Date: Mar 2002
Posts: 852
Quote:
Originally Posted by iwod View Post
Is there a VVC encoder already? Or are you implying XVC?
Well, the JVET encoder posted in the VVC thread. It should be more or less the same, just an earlier version.
Old 27th June 2018, 20:41   #731  |  Link
blurred
Registered User
 
Join Date: Jul 2016
Posts: 14
Response by author of the benchmark from https://encode.ru/threads/1890-Bench...ll=1#post57093
Quote:
Quote:
Firstly, the AV1 range coder only uses 1 multiplication per CDF entry, the 16 is the "worst case" (keep in mind that they can be done in parallel, e.g. with SIMD, so it's actually better to use more than less as the multiply is the cheapest part in software).
For SSE2 decoding in AV1 you need 4 SIMD multiplications (_mm_mullo_epi32) + 4 comparisons (_mm_cmpgt_epi32) + combining (after _mm_movemask_ps) 4 SSE2 registers.
It is unlikely that this will be faster than scalar decoding.
For AV1 hardware implementations, you need 16 32x32 multipliers, otherwise parallel multiplications are not possible.
Also, 16 comparisons + other operations are additionally required.
Quote:
Secondly, the difference is nowhere near 7x when we benchmarked the two - rANS was faster, but by a factor of about 2.
For this benchmark and current implementations, rANS decoding is SEVEN times faster than AV1.
On ARM the scalar version is 5 times faster.
The AV1 nibble entropy coder is even slower than a bitwise range coder.
Quote:
However, the requirement to buffer and reverse the symbols was unfortunately insurmountable.
This is only required in encoding which is usually done in software.
This irrelevant argument is always used in their discussions.
The benchmark shows that TurboANXN, even with reverse encoding, is more than 4 times faster than the current AOMedia AV1 encoder.
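
For readers wondering where the buffer-and-reverse requirement comes from: an rANS state behaves like a stack, so the last symbol encoded is the first one decoded. Below is a toy Python sketch of that property. It uses one unbounded integer instead of a renormalised 32-bit state and a fixed frequency table instead of adaptive probabilities; all names (`FREQ`, `CUM`, `rans_encode`, `rans_decode`) are made up for illustration, and none of it is taken from TurboANXN or from the AV1 experiments.

```python
FREQ = {"a": 2, "b": 1, "c": 1}  # assumed static symbol frequencies
CUM = {"a": 0, "b": 2, "c": 3}   # cumulative frequency at which each symbol starts
TOTAL = 4                        # sum of all frequencies


def rans_encode(symbols):
    """Push symbols onto the state; the last symbol pushed is the first decoded."""
    x = 0
    # The encoder walks the message backwards, so the whole message (or
    # block) has to be buffered before the first symbol can be coded.
    for s in reversed(symbols):
        x = (x // FREQ[s]) * TOTAL + CUM[s] + (x % FREQ[s])
    return x


def rans_decode(x, count):
    """Pop count symbols off the state, in forward message order."""
    out = []
    for _ in range(count):
        slot = x % TOTAL
        s = next(k for k in FREQ if CUM[k] <= slot < CUM[k] + FREQ[k])
        out.append(s)
        x = FREQ[s] * (x // TOTAL) + slot - CUM[s]
    return out


message = list("abac")
state = rans_encode(message)
print(state)                                        # prints 44
print(rans_decode(state, len(message)) == message)  # prints True
```

The decoder needs no buffering at all, which matches the point above that the reversal cost falls only on the encoder.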
Quote:
Also keep in mind that AV1 adjusts the probabilities on a per-symbol basis.
The entropy coder CDFs are designed to make adapting the probabilities very fast (with only adds and shifts).
This puts some constraints on the design that don't exist in the linked benchmark (which uses fixed probabilities as far as I can tell).
The benchmark is using adaptive probabilities.
Quote:
There is so much clever that gets done, even in decoders.
And there are so many different kinds of parallelization, SIMD, ASIC, etcetera available.
And surprising numbers of decoders don't implement basic stuff like skipping non-reference frames when doing seeking, due to the system layer and the decoder layers not being tightly coupled enough.
This is independent of entropy coding. Here we are comparing the AV1 entropy coder against rANS, and they are interchangeable.
Quote:
Also, rANS is quite recent and highly optimised, plus it uses 32/64-bit arithmetic and SIMD instructions, while the Daala range coder uses only 16-bit arithmetic.
And you can fit between 2 and 4 single-clock 16-bit multipliers in the same number of gates as one single-clock 32-bit multiplier.
According to the AV1 source code, 32-bit operations are used. rANS is 32-bit only.

I think the decision against rANS is politically motivated (not-invented-here syndrome).
Otherwise, why not simply leave the (now removed) rANS version in the repository for comparison?
Hardware comparisons (complexity, energy consumption, costs, ...) are only possible after implementing optimized versions of both.

Note that we are considering only adaptive rANS here. Do not confuse this with block-based ANS as used in zstd, lzfse, lzturbo...
Old 28th June 2018, 18:22   #732  |  Link
Phanton_13
Registered User
 
Join Date: May 2002
Posts: 93
Quote:
Originally Posted by blurred View Post
Response by author of the benchmark...
According to the AV1 source code, 32-bit operations are used. rANS is 32-bit only.
They are actually doing 16-bit arithmetic using 32-bit operations, because modern processors are faster with 32-bit operations than with 16-bit ones. Also, various presentations and documents indicate that the Daala range coder uses 15x16 -> 31-bit multiplications.
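
A quick Python check of the 15x16 -> 31-bit figure, treating both operands as unsigned values of the stated widths (`product_bits` is a helper made up for this check):

```python
def product_bits(a_bits, b_bits):
    """Bits needed for the largest product of an a_bits-wide and a b_bits-wide unsigned value."""
    largest = ((1 << a_bits) - 1) * ((1 << b_bits) - 1)
    return largest.bit_length()


print(product_bits(15, 16))  # prints 31
print(product_bits(16, 16))  # prints 32
```

So a 15-bit by 16-bit product always fits in a 32-bit register with a bit to spare, which is consistent with source code that uses 32-bit types while the underlying multiply is narrower.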


Quote:
Originally Posted by blurred View Post
I think the decision against rANS is politically motivated (not-invented-here syndrome).
Otherwise, why not simply leave the (now removed) rANS version in the repository for comparison?
The political motivation can also be viewed in reverse: had it been included, one could say that was for political reasons... An inventor can always claim, and most of the time does, that the reason others don't use their invention is politically motivated. Other times something can be included just to make someone happy (this has happened in HEVC and x264).

Also, the patent situation, not only with Google but also with others doing the same, dirties its chances of inclusion.

For me, instead of whining about it not being included, the right thing is to keep perfecting it so that it can be included in AV2.

Last edited by Phanton_13; 28th June 2018 at 18:40.
Old 29th June 2018, 06:06   #733  |  Link
foxyshadis
ангел смерти
 
Join Date: Nov 2004
Location: Lost
Posts: 9,413
Quote:
Originally Posted by blurred View Post
Response by author of the benchmark from https://encode.ru/threads/1890-Bench...ll=1#post57093
Quote:
For SSE2 decoding in AV1 you need ....
I'm sorry, tell us again about how this codec isn't designed for your 20-year-old Pentium 4. That has nothing to do with optimizability under AVX/AVX2 or Altivec, which are the only instruction sets that matter today.
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
Old 29th June 2018, 08:40   #734  |  Link
mzso
Registered User
 
Join Date: Oct 2009
Posts: 838
AV1 development is becoming disappointing. With the promise of a February delivery for the final bitstream format, I expected YouTube to be providing AV1 streams by now (at least for new/popular videos).
Old 29th June 2018, 14:05   #735  |  Link
wiak
Registered User
 
Join Date: Jul 2003
Location: somewhere north
Posts: 260
Quote:
Originally Posted by mzso View Post
AV1 development is becoming disappointing. With the promise of a February delivery for the final bitstream format, I expected YouTube to be providing AV1 streams by now (at least for new/popular videos).
they shot themselves in the foot during NAB, when they so-called released the codec

but currently most tools will not encode, case in point ffmpeg has strict mode on and doesn't even do webm, aomenc does webm, the encoding and decoding parts are too slow to be usable even on a modern ryzen 8-core

at least with the 1.0.x series we can finally decode stuff encoded in older builds

still useless for anything other than tinkering

and this is from a user perspective

a proper roadmap with set dates on when stuff is getting implemented like faster encoding, multi-threading, browser support?

am convinced that aomedia runs on valve time https://developer.valvesoftware.com/wiki/Valve_Time

anyway, will check back in 3 months' time, peace out (but i guess they are still more than a year off)
Old 29th June 2018, 14:59   #736  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,736
Quote:
Originally Posted by wiak View Post
but currently most tools will not encode, case in point ffmpeg has strict mode on and doesn't even do webm, aomenc does webm
The container bindings are not finalized yet, which is why tools don't really create those yet. But both MP4 and MKV/WebM bindings are being worked on right now, and once those are final, expect at least FFmpeg to pick them up too.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 30th June 2018, 01:30   #737  |  Link
MoSal
Registered User
 
Join Date: Jun 2013
Posts: 80
Quote:
Originally Posted by wiak View Post
the encoding and decoding parts are too slow to be usable even on a modern ryzen 8-core
No kidding. I tested with --cpu-used=8 --tile-columns=4 expecting acceptable speed and awful quality.

The opposite turned out to be true. The speed is still slow. And the quality wasn't bad. It wasn't very good either, but it still beats a tuned x265-slower profile with the specific sample I tested (1Mbps / 1080p / 30fps).

Code:
aomenc -o t.webm t.y4m -t 4 --target-bitrate=256 --enable-qm=1 \
--aq-mode=1 --film-grain-test=1 --cpu-used=8 --tile-columns=4
Code:
ffmpeg -i t.y4m -c hevc -crf 38 -preset slower -x265-params \
sao=0:deblock=-2,-2:psy-rdoq=5:qcomp=.75:ipratio=1.25:pbratio=1.18 t.mkv
The quality of --film-grain-test=1 is impressive. Better than no test, but --film-grain-test=2 adds too much grain.

On the decoding side, ffav1 should be available soon-ish.
__________________
saldl: a command-line downloader optimized for speed and early preview.

Last edited by MoSal; 30th June 2018 at 04:47.
Old 5th July 2018, 15:38   #738  |  Link
paul97
Registered User
 
Join Date: Mar 2018
Posts: 3
Has AOMedia already started optimizing AV1 (especially its speed) after the bitstream was frozen on the 25th of June?
Old 5th July 2018, 18:06   #739  |  Link
iwod
Registered User
 
Join Date: Apr 2002
Posts: 749
Quote:
Originally Posted by paul97 View Post
Has AOMedia already started optimizing AV1 (especially its speed) after the bitstream was frozen on the 25th of June?
Well you can be assured they have plans. But it will take time, counted in months, not days or weeks.
Old 5th July 2018, 22:07   #740  |  Link
Mjpeg
Registered User
 
Join Date: Jun 2018
Posts: 7
Article: Constrained Directional Enhancement Filter

Great Chris Montgomery article on CDEF

https://hacks.mozilla.org/2018/06/av...cement-filter/

I'm thrilled to see a royalty-free option here, so I'll try to be patient as they speed up the encoder.