Old 11th October 2018, 16:57   #1121  |  Link
mandarinka
Registered User
 
Join Date: Jan 2007
Posts: 674
Quote:
Originally Posted by benwaggoner View Post
Can a modern system do that for HEVC? HEVC decode is going to be more inherently parallelizable due to WPP. And I'm not aware of any software decoders that can do a realtime 2160p60 HEVC on any hardware I've looked at.
I think it was recently mentioned here that FFmpeg doesn't do WPP simultaneously in addition to frame threading. I'm also aware that it doesn't scale very well; maybe this is the reason. (Doesn't OpenHEVC support doing this?)
But since Nevcariel says it works on some PCs, I guess it is a matter of CPU and RAM bandwidth, and perhaps single-thread per-core performance. Many slower cores might not cut it due to bandwidth/scaling issues, but fewer cores at 4.0-4.5 GHz, like those Kaby Lake/Coffee Lake chips, could?

FFmpeg's HEVC decoder isn't thoroughly optimised yet: some intrinsics optimizations from OpenHEVC are missing (LAV Video decoder has them, though), and there is probably other low-hanging fruit too. There just wasn't motivation on the side of devs it seems (preferences for the google/royalty-free formats etc).

Last edited by mandarinka; 11th October 2018 at 17:10.
Old 11th October 2018, 16:57   #1122  |  Link
Clare
Registered User
 
Join Date: Apr 2016
Posts: 58
Quote:
Originally Posted by marcomsousa View Post
Did you forget --threads=8?

Code:
aomenc -v -w 1920 -h 1080 --cpu-used=0 --target-bitrate=1500 --threads=8 --profile=0 --aq-mode=0 --lag-in-frames=25 --auto-alt-ref=1 --tile-columns=4 --row-mt=1 -o test15.webm test1.y4m
This uses all CPU cores.

Tune it to your number of logical cores.
Ooops, thanks, now it's working.
Code:
aomenc --threads=8 --cpu-used=4 --tile-columns=4 --row-mt=1 --passes=2 --pass=2 --bit-depth=10 --input-bit-depth=10 --end-usage=q --cq-level=28 --fpf=Chimera_DCI4k2398p_HDR_P3PQ.log -o Chimera_DCI4k2398p_HDR_P3PQ.ivf Chimera_DCI4k2398p_HDR_P3PQ.y4m
Edit: it burst onto all cores for 30 seconds but went back to one core afterwards.
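
As a rough illustration of the "tune it to your logical cores" advice, here is a minimal Python sketch that fills in --threads from the machine's logical core count. It only uses flags that already appear in the commands above; the function name and the file names are hypothetical, and this is not a recommended set of encoder settings.

```python
import os
import shlex


def build_aomenc_command(input_path, output_path, threads=None):
    """Return an aomenc argument list with --threads set to the logical core count."""
    if threads is None:
        # os.cpu_count() reports logical cores; it may return None, so fall back to 1.
        threads = os.cpu_count() or 1
    return [
        "aomenc",
        f"--threads={threads}",
        "--cpu-used=4",
        "--tile-columns=4",
        "--row-mt=1",
        "-o", output_path,
        input_path,
    ]


# Hypothetical file names, for illustration only.
cmd = build_aomenc_command("input.y4m", "output.ivf")
print(shlex.join(cmd))
```

Passing threads=8 explicitly reproduces the --threads=8 setting used in the posts above.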

Last edited by Clare; 11th October 2018 at 17:02.
Old 11th October 2018, 17:20   #1123  |  Link
easyfab
Registered User
 
Join Date: Jan 2002
Posts: 322
For --row-mt=1 I think you need to wait until https://aomedia-review.googlesource.com/c/aom/+/72801 is merged.

Last edited by easyfab; 11th October 2018 at 17:22.
Old 11th October 2018, 17:46   #1124  |  Link
Clare
Registered User
 
Join Date: Apr 2016
Posts: 58
Quote:
Originally Posted by easyfab View Post
For --row-mt=1 I think you need to wait until https://aomedia-review.googlesource.com/c/aom/+/72801 is merged.
I already patched my build with this. It seems that as soon as the first frame is finished rendering, it drops back to one core.
Old 11th October 2018, 20:51   #1125  |  Link
Nintendo Maniac 64
Registered User
 
Join Date: Nov 2009
Location: Northeast Ohio
Posts: 338
Quote:
Originally Posted by benwaggoner View Post
Slowing down 60p to 30p will result in a stream that may be easier to decode than the same content natively captured at 30p. This is because twice as much motion happens between 30p frames, so there's more prediction and motion vectors per frame to process. Also, the bitrate will drop by half in a 60-30 conversion, when real-world a 30p might be 70-80% the bitrate of a 60p for the same spatial quality (since twice as much change per frame is being captured).
Yep indeed, this is why I specifically use YouTube's 30fps encodes when dealing with a frame rate that's less than 60fps as it's better to under-estimate performance and then be pleasantly surprised to find out that real performance is better.
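
The bitrate arithmetic in the quoted post can be sketched as follows. The 10 Mbps starting figure is a made-up example; the 70-80% range is the estimate from the quote, not a measured value.

```python
def slowed_down_bitrate(bitrate_60p_mbps):
    """Bitrate after playing a 60p stream at 30 fps: same bits over twice the duration."""
    return bitrate_60p_mbps / 2


def native_30p_bitrate_range(bitrate_60p_mbps, low=0.70, high=0.80):
    """Estimated bitrate range for the same content captured natively at 30p.

    The 70-80% factors come from the quoted post and are rough estimates.
    """
    return bitrate_60p_mbps * low, bitrate_60p_mbps * high


bitrate_60p = 10.0  # hypothetical 60p bitrate in Mbps
slowed = slowed_down_bitrate(bitrate_60p)
native_low, native_high = native_30p_bitrate_range(bitrate_60p)
print(f"60p slowed to 30p: {slowed:.1f} Mbps")
print(f"native 30p estimate: {native_low:.1f} to {native_high:.1f} Mbps")
```

Under these assumptions the slowed-down stream carries noticeably fewer bits per second than a native 30p encode would, which is one reason it can be easier to decode.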
Old 12th October 2018, 01:23   #1126  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,579
Quote:
Originally Posted by mandarinka View Post
I think it was recently mentioned here that FFmpeg doesn't do WPP simultaneously in addition to frame threading
That is going to be way more dependent on the encoder used than ffmpeg itself. x265 uses WPP AND frame-threads by default if you have multiple cores, and I doubt ffmpeg would disable that. Turning off WPP silently would be a big problem, as WPP also has a significant impact on decoders, particularly with multithreaded software decode.

It's an ongoing challenge with all encoders to make sure that the right flags are allowed by the products that incorporate them. Making sure that commonly used tools like ffmpeg integrate libaom (or a superior alternative) well is pretty darn important, as that's what lots of reviewers and evaluators will use.

Getting good, actionable documentation into ffmpeg and in general is also important. Listing options without explaining why one might want to use them, and their pros and cons, isn't really documentation. x265.readthedocs.io is the gold standard here, and I don't even have a runner-up.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Instant Video

My Compression Book

Amazon Instant Video is hiring! PM me if you're interested.
Old 12th October 2018, 01:42   #1127  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,579
Quote:
Originally Posted by Nintendo Maniac 64 View Post
Yep indeed, this is why I specifically use YouTube's 30fps encodes when dealing with a frame rate that's less than 60fps as it's better to under-estimate performance and then be pleasantly surprised to find out that real performance is better.
Yeah, your approach is the best I can think of until it becomes more feasible to personally encode test content that makes use of a realistic array of AV1 features.

Testing fast encodes risks skipping features, making decoding simpler for a SW decoder than real-world competitive quality AV1 encodes would be. Some examples from past codecs of features that impact decoder performance and that faster encoder modes simplify or turn off for encoder performance include:
  • Weighted prediction
  • In-loop deblocking or SAO
  • Number of reference frames
  • Number of B-frames
Old 12th October 2018, 12:15   #1128  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 424
Quote:
Originally Posted by mandarinka View Post
There just wasn't motivation on the side of devs it seems (preferences for the google/royalty-free formats etc).
I'd rather say that most capable devs had enough of (big) corps freeloading and/or are paid to do something else. In a sense, it is actually a good thing that they learnt such a lesson, as it indicates a maturing and more professional community. And if AoM is ready to play fair in that regard, then it's "not-AoM" loss.

As for HEVC, the mentioned WPP+frame threading could be available today. All in all, there's likely 30% speed-up left.

Source: said capable devs.

Last edited by Kurosu; 12th October 2018 at 12:37.
Old 12th October 2018, 16:30   #1129  |  Link
Monarc
Registered User
 
Join Date: Jun 2002
Posts: 32
Quote:
Originally Posted by marcomsousa View Post
Code:
ffmpeg -benchmark -i Stream2_AV1_4K_22.7mbps.webm -f null -
Video: wrapped_avframe, yuv420p, 3840x2160 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
frame= 3604 fps= 16 q=-0.0 Lsize=N/A time=00:02:24.16 bitrate=N/A speed=0.622x
bench: utime=1069.000s stime=12.891s rtime=231.970s
bench: maxrss=856880kB
Since the video is 25 fps, the benchmark shows that my PC can only decode at 15-16 fps (speed=0.622x) with this 22.7 Mbps video.

CPU: Intel Core i7-8550U
Decoder: ffmpeg-20181007-0a41a8b-win64 - libaom-av1 1.0.0-691-gbb8157b89
On my Intel(R) Core(TM) i5-3550 CPU @ 3.30GHz:

ffmpeg -benchmark -i Stream2_AV1_4K_22.7mbps.webm -f null -

ffmpeg version n4.0.2
[libaom-av1 @ 0x55e28fe6f180] 1.0.0-759-g90a15f4f28

frame= 3604 fps=6.2 q=-0.0 Lsize=N/A time=00:02:24.16 bitrate=N/A speed=0.246x
video:1886kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=579.607s
bench: maxrss=602748kB



ffmpeg -threads 4 -benchmark -i Stream2_AV1_4K_22.7mbps.webm -f null -

frame= 3604 fps= 16 q=-0.0 Lsize=N/A time=00:02:24.16 bitrate=N/A speed=0.643x
video:1886kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
bench: utime=656.087s
bench: maxrss=737080kB
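
For anyone wanting to compare these runs, the relation between the benchmark numbers and realtime speed can be sketched in Python. The log text is copied from the quoted i7-8550U run; the function names are hypothetical, and the "cores busy" figure is just (utime + stime) / rtime, a rough estimate rather than anything ffmpeg reports itself.

```python
import re

# Benchmark lines copied from the quoted i7-8550U run above.
LOG = """frame= 3604 fps= 16 q=-0.0 Lsize=N/A time=00:02:24.16 bitrate=N/A speed=0.622x
bench: utime=1069.000s stime=12.891s rtime=231.970s"""


def parse_bench(log_text):
    """Extract the frame count and the utime/stime/rtime values (in seconds)."""
    frames = int(re.search(r"frame=\s*(\d+)", log_text).group(1))
    times = {
        key: float(value)
        for key, value in re.findall(r"(utime|stime|rtime)=([\d.]+)s", log_text)
    }
    return frames, times


def summarize(log_text, source_fps):
    """Return (decode fps, speed relative to realtime, average cores kept busy)."""
    frames, times = parse_bench(log_text)
    decode_fps = frames / times["rtime"]
    speed = decode_fps / source_fps
    cores_busy = (times["utime"] + times["stime"]) / times["rtime"]
    return decode_fps, speed, cores_busy


fps, speed, cores = summarize(LOG, source_fps=25.0)
print(f"decode fps: {fps:.2f}, speed: {speed:.3f}x, cores busy: {cores:.2f}")
```

The computed speed lands close to the speed=0.622x that ffmpeg printed, and the same ratio explains the i5-3550 figures: 6.2 fps and 16 fps against a 25 fps source give roughly 0.25x and 0.64x.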
Old 13th October 2018, 00:26   #1130  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,579
Quote:
Originally Posted by Kurosu View Post
I'd rather say that most capable devs had enough of (big) corps freeloading and/or are paid to do something else. In a sense, it is actually a good thing that they learnt such a lesson, as it indicates a maturing and more professional community. And if AoM is ready to play fair in that regard, then it's "not-AoM" loss.
Also, multithreaded decode speed isn't THAT important a feature for ffmpeg, which is mainly for transcoding. Lots of operations that include decode are going to be encoder-bound. And the other operations likely have work to do on unused cores. Multithreaded decode uses more memory and sometimes CPU, so an export-bound operation may actually be a little faster with a single-threaded decoder.

You are right that lots of ffmpeg and other open-source encoder/tool development is corporate funded, and so a lot of its development is driven by what companies want enough to pay for. And HEVC source is still pretty uncommon outside of some high end contribution streams from live events.

Quote:
As for HEVC, the mentioned WPP+frame threading could be available today. All in all, there's likely 30% speed-up left.

Source: said capable devs.
That's reasonable. Decoders hit the point of asymptotic speed increases a lot sooner than encoders as there is only one "right" answer. AV1 is going to have >>30% decoder perf headroom still. I am curious how the greater variety of tools will impact optimal decoder performance in AV1 versus HEVC. AV1 is MUCH better designed for both HW and multithreaded SW decoders than VP9 and earlier, which had limitations in the loop filter and frame signaling that made them almost single-threaded. Which was generally fine for a PC web browser of the era, but had some real limitations for battery operated devices or those with lots of slower cores.

We'll probably have a good sense of the fundamental real-world decoder perf differences between HEVC and AV1 in 2019.
Old 13th October 2018, 06:30   #1131  |  Link
Kurosu
Registered User
 
Join Date: Sep 2002
Location: France
Posts: 424
Quote:
Originally Posted by benwaggoner View Post
Also, multithreaded decode speed isn't THAT important a feature for ffmpeg, which is mainly for transcoding
True, the decoding speed increase is no longer useful for a lot of companies. Security is often more important.

Quote:
AV1 is MUCH better designed for both HW and multithreaded SW decoders than VP9 and earlier, which had limitations in the loop filter and frame signaling that made them almost single-threaded.
One of those bottlenecks in thread scaling, what I'd call entropic state sharing across frames, is still there in AV1. This is to me something that was decided irrespective of SW decoding. Another sign that indeed the interest in it is only transitory.

Quote:
We'll probably have a good sense of the fundamental real-world decoder perf differences between HEVC and AV1 in 2019.
And in 2020, hopefully, new products will all have HW decoding of AV1, so that would then no longer matter.
Old 13th October 2018, 16:29   #1132  |  Link
Mjpeg
Registered User
 
Join Date: Jun 2018
Posts: 2
Bitmovin encoder speedup

New article from Streaming Media:
http://www.streamingmedia.com/Articl...ts-127956.aspx

On the second page there is an intriguing slide from Bitmovin that with a pure software encoder version of libaom, they attained a speedup for cpu-used=0 from 1000x of VP9 in the early days down to 40x today with improvements ongoing. The claim from AOM was always that there was lots of room for optimization work once the spec was settled and ... well maybe that's happening. The proof will be when those speedups are in mainline FFmpeg and show up in tests on this very thread!

The slide also quotes software decode of 3.4x of VP9 single-threaded.
Old 13th October 2018, 18:27   #1133  |  Link
utack
Registered User
 
Join Date: Apr 2018
Posts: 3
Quote:
Originally Posted by Mjpeg View Post
Regarding the "poor results for AV1".
The BBC paper used single pass encode
Quote:
The parameters “--passes=1” and “--lag-in-frames=0” were set to run AV1 in single pass mode
without the possibility of looking ahead in the video sequence before encoding
The Harmonic paper used a fixed GOP of less than a single second of video material
Quote:
--min-gf-interval=16 --max-gf-interval=16
The Ateme paper does not mention any configuration for the encoder

That is basically three results that are only good for academic publication counts and for the gutter in the real world.
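
To put a number on the "less than a single second" remark about --min-gf-interval=16 --max-gf-interval=16, here is a small Python sketch. The frame rates are assumptions chosen for illustration; the frame rate of the Harmonic test material is not stated in this thread.

```python
def gf_interval_seconds(interval_frames, fps):
    """Duration in seconds covered by a fixed interval of frames at a given frame rate."""
    return interval_frames / fps


# Assumed common frame rates; the actual test material may differ.
FRAME_RATES = (24000 / 1001, 25.0, 30.0, 60.0)

for fps in FRAME_RATES:
    seconds = gf_interval_seconds(16, fps)
    print(f"16 frames at {fps:.3f} fps = {seconds:.3f} s")
```

At every one of these frame rates, 16 frames span well under one second.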


On another note, there is now a 5-minute 720p clip with super diverse content on YouTube where the VP9 and AV1 versions have basically the same file size:
https://www.youtube.com/watch?v=WaWnLiffxJ4
You can do some direct quality comparison.

Last edited by utack; 13th October 2018 at 22:35.
Old Yesterday, 08:25   #1134  |  Link
marcomsousa
Registered User
 
Join Date: Jul 2018
Posts: 27
Quote:
Originally Posted by Clare View Post
I already patched my build with this. It seems that as soon as the first frame is finished rendering, it drops back to one core.

Another fix was merged
Quote:
Fix allocation of workers for enc row-mt

Workers allocated for row based multi-threading of encoder are
now evaluated as minimum of num of threads and total number of
sb rows in the frame to be encoded.

Change-Id: I07501f43514f1ee45dd6637fe56411432930396c
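
The rule described in that commit message can be sketched in Python as below. This is only an illustration of "minimum of number of threads and number of superblock rows", not libaom's actual code; the function names are hypothetical, and the 2160-line height is inferred from the DCI 4K file name earlier in the thread. AV1 superblocks are either 64x64 or 128x128.

```python
def superblock_rows(frame_height, sb_size=64):
    """Number of superblock rows needed to cover the frame height (ceiling division)."""
    return (frame_height + sb_size - 1) // sb_size


def row_mt_workers(num_threads, frame_height, sb_size=64):
    """Workers for row-based multi-threading: min(threads, superblock rows)."""
    return min(num_threads, superblock_rows(frame_height, sb_size))


# Assumed frame heights and thread counts, for illustration only.
for threads, height, sb in ((8, 2160, 64), (8, 1080, 64), (32, 1080, 128)):
    workers = row_mt_workers(threads, height, sb)
    print(f"threads={threads} height={height} sb={sb} -> workers={workers}")
```

With --threads=8 the thread count is the limiting factor for both 1080p and 2160p input; the superblock-row cap only starts to matter with many threads on smaller frames.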
__________________
AV1 win64 VS2017 builds
Last build here | History
I also open source the build scripts at Github: here
Old Yesterday, 09:41   #1135  |  Link
dapperdan
Registered User
 
Join Date: Aug 2009
Posts: 151
Quote:
Originally Posted by Mjpeg View Post
New article from Streaming Media:
http://www.streamingmedia.com/Articl...ts-127956.aspx

On the second page there is an intriguing slide from Bitmovin that with a pure software encoder version of libaom, they attained a speedup for cpu-used=0 from 1000x of VP9 in the early days down to 40x today with improvements ongoing.
Might be worth noting that the slide is from a Google employee; the Bitmovin employee was just in the audience.

I'm intrigued to hear more about the AI strategies they are using and wonder how portable those are once the "trick" has been revealed by the AI.
Old Yesterday, 11:29   #1136  |  Link
Tommy Carrot
Registered User
 
Join Date: Mar 2002
Posts: 840
Quote:
Originally Posted by Mjpeg View Post
On the second page there is an intriguing slide from Bitmovin that with a pure software encoder version of libaom, they attained a speedup for cpu-used=0 from 1000x of VP9 in the early days down to 40x today with improvements ongoing.
The speed improvements are nice, but many of those optimizations came at the cost of the quality. I'd estimate the compression efficiency dropped about 2-3% since the earlier versions.
Old Yesterday, 19:55   #1137  |  Link
soresu
Registered User
 
Join Date: May 2005
Location: Swansea, Wales, UK
Posts: 49
Personally, considering that the BBC is directly involved in the development of VVC, and given that they joined the AV1 party relatively late in the development cycle, I'm inclined to find their so-called research analysis somewhat suspect.

The Streaming Media article mentions their credibility on the issue, coming from a member of AOMedia, without taking into account just how late they joined.

At the very least it seems a conflict of interest to post such a negative paper so early in the post-standard development of AV1, even more so considering that VVC is 2 years away from even being standardised.

Does anyone here have any idea what level of patents/proposals the BBC has in the VVC working group?

Edit: Sorry if this is in the wrong thread; it concerns both AV1 and VVC, so I wasn't sure which to drop it in.

Last edited by soresu; Yesterday at 19:59.
Old Today, 16:53   #1138  |  Link
Clare
Registered User
 
Join Date: Apr 2016
Posts: 58
AV1 Image File Format (AVIF) https://people.xiph.org/~negge/AVIF2018.pdf

Mile-High Video Workshop videos http://mile-high.video/files/mhv2018/

with topics such as:
  1. Video Encoding and HEVC
  2. Into the Depths: The Technical Details behind AV1
  3. AV1 vs. HEVC: Perceptual Evaluation of Video Encoders
  4. Codec Comparison from TCO and Compression Efficiency Perspective
  5. VVC The Next-Generation Video Standard of the Joint Video Experts Team
  6. Pushing Encoding Quality and Speed with x265
  7. Massively Parallel Encoding
Old Today, 19:32   #1139  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,579
Quote:
Originally Posted by Tommy Carrot View Post
The speed improvements are nice, but many of those optimizations came at the cost of the quality. I'd estimate the compression efficiency dropped about 2-3% since the earlier versions.
A 2.5x perf improvement with only a 2-3% efficiency tradeoff is a pretty good optimization option. That's ballpark to a x26? preset ratio in perf and quality.

And things have to get LOTS faster to do the psychovisual and rate control tuning to get AV1 into something that can be practically compared to other encoders/bitstreams. Quality won't matter within an order of magnitude of the current speeds. It's not like anyone is actually delivering in volume 1080p encoded with --preset placebo either!

Quality @ Bitrate @ Perf!

All times are GMT +1. The time now is 22:30.

