12th December 2018, 12:02   #1281
Gravitator
Registered User
 
Join Date: May 2014
Posts: 292

ffmpeg-4.2-92396-g55e021f39b

- libaom 1.0.0-902-g03d8ebedc
- libdav1d 58fc516
Quote:
ffmpeg -hide_banner -t 10 -c:v libaom-av1 -i 1.mp4 -benchmark -f null - (43 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -i 1.mp4 -benchmark -f null - (52 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 1 -tilethreads 2 -i 1.mp4 -benchmark -f null - (61 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 2 -tilethreads 2 -i 1.mp4 -benchmark -f null - (65 fps)

ffmpeg-4.2-92681-0e833f6

- libaom 1.0.0-1028-78e6b2c
- libdav1d 0.1.0 73067e5
Quote:
ffmpeg -hide_banner -t 10 -c:v libaom-av1 -i 1.mp4 -benchmark -f null - (45 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -i 1.mp4 -benchmark -f null - (51 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 1 -tilethreads 2 -i 1.mp4 -benchmark -f null - (58 fps)
ffmpeg -hide_banner -t 10 -c:v libdav1d -threads 2 -tilethreads 2 -i 1.mp4 -benchmark -f null - (63 fps)
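To make the two runs easier to compare, here is a minimal Python sketch that expresses each libdav1d result as a multiple of the libaom result from the same build. The fps figures are copied from the quoted runs above; the dictionary layout and the function name are only illustrative.

```python
# Decode speeds (fps) copied from the two benchmark runs quoted above.
# The labels and the data layout are illustrative, not part of ffmpeg.
RESULTS = {
    "libaom 1.0.0-902 / libdav1d 58fc516": {
        "libaom-av1": 43,
        "libdav1d": 52,
        "libdav1d -threads 1 -tilethreads 2": 61,
        "libdav1d -threads 2 -tilethreads 2": 65,
    },
    "libaom 1.0.0-1028 / libdav1d 0.1.0": {
        "libaom-av1": 45,
        "libdav1d": 51,
        "libdav1d -threads 1 -tilethreads 2": 58,
        "libdav1d -threads 2 -tilethreads 2": 63,
    },
}


def relative_speed(runs, baseline="libaom-av1"):
    """Return each run's fps divided by the baseline run's fps."""
    base = runs[baseline]
    return {name: fps / base for name, fps in runs.items() if name != baseline}


if __name__ == "__main__":
    for build, runs in RESULTS.items():
        print(build)
        for name, ratio in relative_speed(runs).items():
            print(f"  {name}: {ratio:.2f}x libaom")
```

On these figures the best libdav1d setting comes out at roughly 1.5x libaom for the first build and 1.4x for the second.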

Last edited by Gravitator; 12th December 2018 at 12:08.
12th December 2018, 12:50   #1282
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Quote:
Originally Posted by v0lt View Post
They claim that dav1d is always faster than libaom. They say that there are problems only in single-threaded mode. This lie breaks.
Actually, it says that it will soon be faster than other decoders on all platforms. "Soon", not now.

If you don't have AVX2, the decoder is still bottlenecked quite heavily, and it also won't thread as nicely because, for example, reference frames take too long to decode. The SSSE3 work is still at an early stage: if you look at the ticket linked above, only a small part of the assembly has been covered in SSSE3 so far.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
12th December 2018, 14:39   #1283
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
The SSSE3 code base is fundamental because it's the first instruction set supported by all Core 2 Duo and later CPUs (not Pentium 4), and it's also very useful for decoding (at least for previous codecs like H.264/H.265).

But I don't know if they want to go back to even older instruction sets and CPUs like SSE2.

We'll see.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
12th December 2018, 15:15   #1284
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Shame for AMD users. K10 (like Phenom II) and similar don't have SSSE3. Produced up to 2012. Of course it will be years until AV1 is de-facto required (if ever) so by then...
12th December 2018, 15:24   #1285
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Quote:
Originally Posted by NikosD View Post
But I don't know if they want to go back to even older instruction sets and CPUs like SSE2.
https://code.videolan.org/videolan/d...207#note_24056
12th December 2018, 15:30   #1286
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Quote:
Originally Posted by sneaker_ger View Post
Shame for AMD users. K10 (like Phenom II) and similar don't have SSSE3. Produced up to 2012. Of course it will be years until AV1 is de-facto required (if ever) so by then...
The market share of non-SSSE3 desktop CPUs is so small that no one is really going to bother with that, particularly because many of those CPUs are oftentimes going to be too slow for any real use anyway.
And in all honesty, if you bought a K10 in 2012 or anywhere near to that, you just did it wrong, even on the low-end market.

Intel introduced SSSE3 all the way back in 2006, after all. It's hardly "new" even in 2012.

Ultimately it's up to the developers how they want to spend their time, but as mentioned in the ticket linked above, pure SSE2 is often a lot more painful to write than code using SSSE3 enhancements.

Last edited by nevcairiel; 12th December 2018 at 15:38.
12th December 2018, 15:38   #1287
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by SmilingWolf View Post
Thank you.

So, SSSE3 is the minimum.

A bit of a pity for AMD CPUs.
12th December 2018, 15:39   #1288
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Steam HW Survey says 3% don't have SSSE3, only 0.01% don't have SSE3.

https://store.steampowered.com/hwsurvey
12th December 2018, 15:43   #1289
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
SSE3 (without the third S) is mostly useless for video. It's primarily floating-point.
For video, which needs integer instructions, you only have a few meaningful steps (everything left out, like SSE3, AVX1, etc., is mostly floating-point or otherwise unrelated):

- MMX
- SSE2
- SSSE3
- SSE4.1
- AVX2
- AVX512

Obviously no one cares about MMX anymore. SSE4.1 is only useful in special cases. And obviously AVX512 is not rolled out, and perhaps not even understood, widely enough yet; maybe in a few years.
So, by and large, that leaves SSE2, SSSE3 and AVX2. The difference between SSE2 and SSSE3 is not gigantic (same 128-bit registers, after all); SSSE3 only adds a bunch of new instructions, but some of those are really useful and make code much simpler and easier to write.
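As a rough illustration of the ladder above, here is a small Python sketch that maps a set of CPU feature flags to the highest of these integer-SIMD steps. It assumes the flag spellings Linux reports in /proc/cpuinfo, and it uses avx512bw as a stand-in for "AVX512", since AVX-512 is split across several flags. The function names are made up for the example.

```python
# Integer-SIMD steps from the list above, oldest first.
# Flag names follow the spelling Linux uses in /proc/cpuinfo (assumption:
# other platforms report CPU features differently).
TIERS = [
    ("MMX", "mmx"),
    ("SSE2", "sse2"),
    ("SSSE3", "ssse3"),
    ("SSE4.1", "sse4_1"),
    ("AVX2", "avx2"),
    # AVX-512 is split into several flags; avx512bw (byte/word integer
    # operations) is used here as a stand-in for "AVX512".
    ("AVX512", "avx512bw"),
]


def best_integer_simd(flags):
    """Return the newest step whose flag is present, or None."""
    best = None
    for name, flag in TIERS:
        if flag in flags:
            best = name
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the first CPU listed (Linux only)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(best_integer_simd(read_cpu_flags()))
```

A K10-class CPU, which reports SSE3 (spelled pni on Linux) but not SSSE3, would land on SSE2 here.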

Last edited by nevcairiel; 12th December 2018 at 15:48.
12th December 2018, 16:33   #1290
clsid
*****
 
Join Date: Feb 2005
Posts: 5,642
The optimizations in dav1d are currently almost all 8-bit only, so for 10-bit content libaom may still be faster.

Development pace in dav1d is pretty high, so we will have a fast decoder long before there is actual widespread AV1 content (beyond the current demo files and a few YouTube videos).
12th December 2018, 16:34   #1291
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
Quote:
Originally Posted by MoSal View Post
Can you try -threads 8 -tilethreads 1?
I tested again, on an i5-3570K.
Code:
libaom-av1 - max 14 fps
libdav1d - max 7.2 fps
libdav1d -threads 4 -tilethreads 4 - max 9.7 fps
libdav1d -threads 8 -tilethreads 1 - max 10 fps
Quote:
Originally Posted by nevcairiel View Post
Actually, it says that it will soon be faster than other decoders on all platforms. "Soon", not now.
I carefully read their "press releases". I did not see them mention slow speed without AVX2, but they know about it perfectly well. This is the second time this has happened. Are the "press releases" written for sponsors?
I'm waiting for dav1d to be faster on my processor. I want to see truthful information, not PR.
12th December 2018, 16:53   #1292
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Quote:
Originally Posted by v0lt View Post
I carefully read their "press releases". I did not see them mention slow speed without AVX2, but they know about it perfectly well. This is the second time this has happened. Are the "press releases" written for sponsors?
I'm waiting for dav1d to be faster on my processor. I want to see truthful information, not PR.
No you didn't.
The blogpost links twice to this previous one for detailed perf reports: http://www.jbkempf.com/blog/post/201...-first-release
Quote:
Today, dav1d is very fast on AVX2 processors, which should cover a bit more than 50% of the CPUs used on the desktop. We wrote 95% of the code needed for AVX2, but there is still a bit more achievable.

We're readying the SSE and the ARM optimizations, to do the same. They will be very fast too, in the next weeks.
It's clearly stated that dav1d is the fastest on AVX2.
Then the same post you claim to have read very carefully states that work on SSSE3 has only just begun.
Since the Pentium G5600 only supports extensions up to SSE4.2, it's clear you'll have to wait some more.

Spare the rage and read some more.

Last edited by SmilingWolf; 12th December 2018 at 17:00.
12th December 2018, 17:18   #1293
v0lt
Registered User
 
Join Date: Dec 2008
Posts: 1,959
Quote:
Originally Posted by SmilingWolf View Post
No you didn't.
The blogpost links twice to this previous one for detailed perf reports: http://www.jbkempf.com/blog/post/201...-first-release
Please quote the text where it is written that dav1d without AVX2 will run slower.
This information I could find only in the discussion of beta testing.
12th December 2018, 17:29   #1294
SmilingWolf
I am maddo saientisto!
 
Join Date: Aug 2018
Posts: 95
Quote:
Originally Posted by v0lt View Post
Please quote the text where it is written that dav1d without AVX2 will run slower.
This information I could find only in the discussion of beta testing.
A certain extension provides a speedup. It follows that without said extension, things will be slower.
You have the wunderbar vector extensions: you have the speedup these provide.
You can't use the vector extensions: you're going to run on C code, which is gonna be slower. That is the reason these multimedia extensions exist in the first place.
Doesn't really take a degree to understand.
I got it, everyone around here got it, it seems you're the only one left out. Wonder where the problem lies?
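The fallback described above is the usual runtime-dispatch pattern: a decoder carries a plain C routine plus hand-written SIMD versions and picks the fastest one the CPU supports. Here is a toy Python sketch of that selection logic. All function names are hypothetical and do not reflect dav1d's actual code, which does this in C and assembly.

```python
# Toy model of runtime dispatch. All names are hypothetical; real decoders
# do this with function pointers in C and hand-written assembly.

def idct_c(block):
    return "c"        # portable fallback, always available


def idct_ssse3(block):
    return "ssse3"    # stands in for a 128-bit SIMD version


def idct_avx2(block):
    return "avx2"     # stands in for a 256-bit SIMD version


# Ordered fastest first: (required CPU flag, implementation).
CANDIDATES = [("avx2", idct_avx2), ("ssse3", idct_ssse3)]


def select_idct(cpu_flags, available=CANDIDATES):
    """Return the first implementation whose flag the CPU has, else C."""
    for flag, impl in available:
        if flag in cpu_flags:
            return impl
    return idct_c


if __name__ == "__main__":
    # If only an AVX2 version has been written so far, a CPU without AVX2
    # gets the C routine even though it supports SSSE3.
    impl = select_idct({"ssse3", "sse4_2"}, available=[("avx2", idct_avx2)])
    print(impl.__name__)
```

In this model, a CPU with SSSE3 but no AVX2 only gets faster once an SSSE3 entry is added to the table.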
12th December 2018, 18:06   #1295
easyfab
Registered User
 
Join Date: Jan 2002
Posts: 332
And if you want the latest info on SIMD, you should look here:

AVX2 https://code.videolan.org/videolan/dav1d/issues/78
SSSE3 https://code.videolan.org/videolan/dav1d/issues/216
ARM / NEON https://code.videolan.org/videolan/dav1d/issues/215

As you can see, AVX2 is pretty much done, but only a little is done for the others. And that's only for 8-bit, if I'm correct.
12th December 2018, 19:40   #1296
Nintendo Maniac 64
Registered User
 
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
Quote:
Originally Posted by nevcairiel View Post
many of those CPUs are often times going to be too slow for any real use anyway.
And in all honesty, if you bought a K10 in 2012 or anywhere near to that, you just did it wrong, even on the low-end market.
Keep in mind that even the Llano 1st gen APUs lacked SSSE3 due to their K10-derived CPU architecture.


As someone with both a Phenom II x4 and a Core 2 Quad (actually a Phenom II x2 unlocked to x4 and a quad Wolfdale Xeon), I find that the latter has pretty sub-par multicore scaling in video workloads. Yes, it's faster than a Core 2 Duo, but not quite at the level you'd expect, as I showed in my post two pages back (if Wolfdale had the same scaling from 2c/2t to 4c/4t as Nehalem, then 4c/4t Wolfdale would've only needed ~2.4GHz, not 2.7GHz).

This commonly results in the Phenom actually performing similarly to, if not better than, the Core 2 Quad on a per-GHz basis, assuming the tested code isn't heavily relying on SSSE3 or SSE4.1 (as is obviously the case currently with AV1 decoding). The Phenom also not only tended to have higher stock clocks but even came in 6-core variants.


Similarly, I've also previously documented that the Phenom II is faster than Core 2 Quad clock-for-clock in SVP video interpolation (which is a task that loves "moar cores!" and SMT threads).
__________________
____HTPC____  | __Desktop PC__
2.93GHz Xeon x3470 (4c/8t Nehalem) | 4.5GHz 1.24v dual-core Haswell G3258
Radeon HD5870  | Intel iGPU      
2x2GB+2x1GB DDR3-1333 | 4x4GB DDR3-1600       
12th December 2018, 20:06   #1297
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
Quote:
Originally Posted by easyfab View Post
And if you want the latest info on SIMD, you should look here:

AVX2 https://code.videolan.org/videolan/dav1d/issues/78
SSSE3 https://code.videolan.org/videolan/dav1d/issues/216
ARM / NEON https://code.videolan.org/videolan/dav1d/issues/215

As you can see, AVX2 is pretty much done, but only a little is done for the others. And that's only for 8-bit, if I'm correct.
Do we have numbers for the installed base of AVX2 capable PCs? They've been in all new mainstream systems for several years now. I'd guess it's >50% already.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
12th December 2018, 21:22   #1298
Nintendo Maniac 64
Registered User
 
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 447
Quote:
Originally Posted by benwaggoner View Post
Do we have numbers for the installed base of AVX2 capable PCs? They've been in all new mainstream systems for several years now.
I realize I sound like a broken record at this point, but the newest Pentiums and Celerons still do not support AVX, and this even applies to the models that use the full-fat Sky/Kaby/Coffeelake cores (though with smaller cache size) such as the ever-popular 2c/4t Pentium G4560 and its direct successor the G5400.

(and again, going forward the Athlon 200GE is a wiser choice of CPU, but that's only been on the market for a couple months now)
13th December 2018, 12:08   #1299
mzso
Registered User
 
Join Date: Oct 2009
Posts: 930
Hi!

On the decoder side, are dav1d and libaom the only two options? I see Firefox has a dav1d option, which doesn't work too well, because it freezes on the Bitmovin demo. (I guess the other is libaom.) The default decoder now plays the video completely smoothly on my computer.

PS:
By the way, can I download these streams?
The player is pretty trashy, the quality always resets and doesn't want to change unless I seek.
13th December 2018, 12:22   #1300
mzso
Registered User
 
Join Date: Oct 2009
Posts: 930
Quote:
Originally Posted by utack View Post
Even worse that it seems to be about tiles.
These should have never been implemented in a new codec the first place, it is just a lazy workaround because libaom sucks at frame parallel encoding and decoding.
rav1e and dav1d won't need them
Why shouldn't we like tiled encoding?