Old 27th January 2015, 22:56   #241  |  Link
wizziwig
Registered User
 
Join Date: Jan 2007
Posts: 5
Quote:
Originally Posted by nevcairiel View Post
Too bad their drivers tend to suck, causing crashes in things that NVIDIA just works on =)
Besides, on H.264 Maxwell has been equally fast as Intels decoder now.
Have you tested the refresh rate accuracy for 23.976 and 59.940 fps on the 960? That seems to be an area where Intel has the edge over everyone else. On Haswell, it just works out-of-the-box without having to mess with custom out-of-spec timings that may not work on all displays.
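To put numbers on why refresh rate accuracy matters here, the following is a minimal Python sketch. It assumes the nominal rates are exactly 24000/1001 and 60000/1001 fps, and that the player simply repeats or drops a frame once the video and display clocks have drifted apart by one full frame. The function name is made up for illustration; real players may resample or reclock instead.

```python
from fractions import Fraction

FILM_FPS = Fraction(24000, 1001)   # the "23.976" rate
NTSC_FPS = Fraction(60000, 1001)   # the "59.940" rate


def seconds_between_glitches(video_fps: Fraction, display_hz: Fraction) -> float:
    """Seconds until video and display drift apart by one full frame."""
    drift_per_second = abs(display_hz - video_fps)  # frames of drift per second
    if drift_per_second == 0:
        return float("inf")  # perfectly matched clocks never glitch
    return float(1 / drift_per_second)


if __name__ == "__main__":
    # A display running at exactly 24.000 Hz instead of 24000/1001 Hz.
    print(seconds_between_glitches(FILM_FPS, Fraction(24)))
    # A display running at exactly 60.000 Hz instead of 60000/1001 Hz.
    print(seconds_between_glitches(NTSC_FPS, Fraction(60)))
```

Under these assumptions a display at exactly 24.000 Hz repeats a frame every 1001/24 seconds, roughly every 41.7 seconds, which is why a refresh rate close to 24000/1001 Hz is worth having out of the box.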
Old 29th January 2015, 14:56   #242  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
I've been working on 10-bit decoding, and while it mostly works, it seems unlikely to work in DXVA2 native mode anytime soon, as EVR either crashes or refuses to render anything when it receives P010.

The good news is that as a separate side-project I've been working on reducing the CPU usage of Copy-Back, and assuming a direct decoder -> output chain (no software deint, no format conversion, clean nv12 decode and nv12 output, or p010 decode and p010 output), I could reduce the CPU usage on my NVIDIA system by up to 50%, and additionally the benchmarked performance is now 99% the same as DXVA2-Native benchmarks. These gains may be a bit smaller on Intel or AMD GPUs, but they will still be quite noticeable.

I may also add a P010 decode -> NV12 output path using these optimizations, as that may be a common use-case for now when viewing 10-bit content on many renderers.
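
For readers unfamiliar with the two formats, here is a minimal Python sketch of what a P010 -> NV12 conversion does to the data. It is not LAV's implementation. It assumes a tightly packed buffer with no pitch padding, relies on P010 storing each 10-bit sample in the upper bits of a little-endian 16-bit word, and simply truncates to the top 8 bits with no dithering. The helper names are made up for illustration.

```python
import struct


def pack_p010(samples_10bit):
    """Pack 10-bit sample values into P010-style 16-bit little-endian words."""
    # P010 keeps the 10 significant bits in the top of each 16-bit word.
    words = [value << 6 for value in samples_10bit]
    return struct.pack("<%dH" % len(words), *words)


def p010_to_nv12(p010: bytes) -> bytes:
    """Truncate every 16-bit P010 sample to its top 8 bits.

    P010 and NV12 share the same plane order (Y plane, then interleaved UV),
    so for a tightly packed buffer the conversion is purely per-sample.
    """
    if len(p010) % 2:
        raise ValueError("P010 data must contain whole 16-bit samples")
    # In a little-endian word the high byte is the second byte, so taking
    # every odd byte is equivalent to (sample >> 8).
    return p010[1::2]


if __name__ == "__main__":
    # 10-bit black, mid grey, white and maximum code values.
    nv12 = p010_to_nv12(pack_p010([64, 512, 940, 1023]))
    print(list(nv12))
```

Plain truncation is the cheapest option; a production path would more likely use SIMD and might add dithering to hide banding.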
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 29th January 2015, 15:38   #243  |  Link
Qaq
AV heretic
 
Join Date: Nov 2009
Posts: 422
Quote:
Originally Posted by nevcairiel View Post
The good news is that as a separate side-project I've been working on reducing the CPU usage of Copy-Back, ... I could reduce the CPU usage on my NVIDIA system by up to 50%, and additionally the benchmarked performance is now 99% the same as DXVA2-Native benchmarks. These gains may be a bit smaller on Intel or AMD GPUs, but they will still be quite noticeable.
Impressive! That is what I was asking for. I feel like I'm ready to watch some high bitrate telecined movies now with my Athlon 64 X2 6000+.
Old 29th January 2015, 15:43   #244  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
The Athlon 64 is probably still rather slow for CB, as it doesn't have SSE4.1, which introduced the optimized instructions for copying from GPU memory to system memory. It can make a world of difference.
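
The SSE4.1 instruction referred to here is presumably the streaming load MOVNTDQA, which cannot be expressed in Python. The sketch below only shows the data movement that copy-back performs: a locked surface usually has a pitch (bytes per row in memory) larger than the visible row, so rows are copied one at a time into a tightly packed system-memory buffer. The function name and the pitch value in the usage example are made up for illustration.

```python
def copy_back_plane(src: bytes, pitch: int, row_bytes: int, rows: int) -> bytes:
    """Copy a pitched plane into a tightly packed buffer, row by row."""
    if rows <= 0 or row_bytes <= 0:
        return b""
    if row_bytes > pitch:
        raise ValueError("visible row cannot be wider than the pitch")
    if len(src) < pitch * (rows - 1) + row_bytes:
        raise ValueError("source buffer is too small for the given geometry")

    out = bytearray(row_bytes * rows)
    view = memoryview(src)
    for y in range(rows):
        # Skip the padding at the end of every source row.
        out[y * row_bytes:(y + 1) * row_bytes] = view[y * pitch:y * pitch + row_bytes]
    return bytes(out)


if __name__ == "__main__":
    # Two rows of three visible bytes each, stored with a pitch of four.
    padded = bytes([1, 2, 3, 0, 4, 5, 6, 0])
    print(list(copy_back_plane(padded, pitch=4, row_bytes=3, rows=2)))
```

On real hardware the source is write-combined memory, where ordinary loads are very slow; that is the cost the streaming-load instructions are meant to avoid.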
Old 29th January 2015, 15:54   #245  |  Link
Qaq
AV heretic
 
Join Date: Nov 2009
Posts: 422
Oh no.... we'll see.
Old 29th January 2015, 15:58   #246  |  Link
vivan
/人 ◕ ‿‿ ◕ 人\
 
Join Date: May 2011
Location: Russia
Posts: 643
Quote:
Originally Posted by nevcairiel View Post
I've been working on 10-bit decoding, and while it mostly works, it doesn't seem to be likely that it'll work in DXVA2 native mode anytime soon, as EVR just either crashes or refuses to render anything when receiving P010.
What about madVR? It should support DXVA2-native.
Old 29th January 2015, 16:14   #247  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
Quote:
Originally Posted by vivan View Post
What about madVR? It should support DXVA2-native.
I won't bother with the complexity of 10-bit through DXVA2-Native just for madVR, which works much better with CB anyway (or even does CB itself!), not to mention that madVR probably doesn't accept P010 through DXVA2-Native right now either.
Old 29th January 2015, 20:10   #248  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by nevcairiel View Post
I've been working on 10-bit decoding, and while it mostly works, it doesn't seem to be likely that it'll work in DXVA2 native mode anytime soon, as EVR just either crashes or refuses to render anything when receiving P010.
So, it seems that DXVA2 Native and 10-bit HEVC decoding are incompatible.

What about EVR-CP?
Any difference?

Could Intel or AMD do any tricks in the driver to make DXVA2 Native and 10-bit compatible?
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 29th January 2015, 20:56   #249  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,923
Quote:
Originally Posted by NikosD View Post
So, it seems that DXVA2 Native and 10bit HEVC decoding are incompatible.

What about EVR-CP ?
Any difference ?

Could Intel or AMD do any tricks with the driver, in order for DXVA2 Native and 10bit to be compatible ?
The problem is that current renderers can't handle P010 in general.
Maybe the current or next version of madVR will handle P010 DXVA input correctly.
EVR can't even handle P010 from software decoding...
Old 29th January 2015, 21:45   #250  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Then Microsoft should build a better renderer for Windows 10 in order to use 10bit.

I don't use madVR.
Old 30th January 2015, 02:08   #251  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,923
To display 10-bit video in 10-bit, you need a DirectX 10+ fullscreen surface too. A lot needs to be changed.
Old 30th January 2015, 03:25   #252  |  Link
mhourousha
Registered User
 
Join Date: Mar 2013
Posts: 31
Quote:
Originally Posted by NikosD View Post
So, it seems that DXVA2 Native and 10bit HEVC decoding are incompatible.
What about EVR-CP ?
Any difference ?
Could Intel or AMD do any tricks with the driver, in order for DXVA2 Native and 10bit to be compatible ?
GPUs can't handle any YUV420 format (NV12, P010, P016, etc.) directly before D3D11.1.
Therefore, if you want to avoid CPU copy-back, you need to convert YUV420 to RGB (StretchRect or VideoProcessBlt) in the DXVA pass. In the render pass, you then either convert it back to YUV420 and apply an upsampling and color-space mapping algorithm, or use the RGB surface directly.
The problem is that most GPUs can't do the P010/P016 -> RGB conversion under D3D9Ex, so if the decoder outputs a P010/P016 surface, it can't be converted to a D3D-compatible format on the GPU.
D3D11.1 can read YUV420 formats directly (mapped to 2 SRVs, one for Y and the other for UV), so there is no need for the YUV -> RGB conversion in the DXVA pass. But it needs a DX11 decoder (a decoder that uses a DX11 device instead of D3D9Ex). A D3D9Ex device can't share a YUV420 DXVA surface as DXGI_FORMAT_NV12 or DXGI_FORMAT_P010/DXGI_FORMAT_P016.
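
The "2 SRVs" mapping can be pictured with a small Python sketch of the plane layout. It assumes a tightly packed surface (real surfaces have a driver-chosen pitch) and uses the single-channel and two-channel UNORM view formats that the DXGI documentation lists for luma and chroma views of these formats. The class and function names are made up for illustration.

```python
from dataclasses import dataclass

BYTES_PER_SAMPLE = {"NV12": 1, "P010": 2, "P016": 2}


@dataclass(frozen=True)
class PlaneView:
    name: str         # "Y" or "UV"
    offset: int       # byte offset from the start of the surface
    width: int        # view width in texels
    height: int       # view height in texels
    view_format: str  # format a shader resource view of this plane would use


def plane_views(fmt: str, width: int, height: int) -> list:
    """Describe the luma and chroma views of a tightly packed 4:2:0 surface."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 surfaces need even dimensions")
    bps = BYTES_PER_SAMPLE[fmt]
    if bps == 1:
        luma_fmt, chroma_fmt = "DXGI_FORMAT_R8_UNORM", "DXGI_FORMAT_R8G8_UNORM"
    else:
        luma_fmt, chroma_fmt = "DXGI_FORMAT_R16_UNORM", "DXGI_FORMAT_R16G16_UNORM"
    luma_bytes = width * height * bps
    return [
        PlaneView("Y", 0, width, height, luma_fmt),
        # One UV texel covers a 2x2 block of luma samples.
        PlaneView("UV", luma_bytes, width // 2, height // 2, chroma_fmt),
    ]


def surface_bytes(fmt: str, width: int, height: int) -> int:
    """Total size: a full-size luma plane plus a half-size interleaved UV plane."""
    return width * height * BYTES_PER_SAMPLE[fmt] * 3 // 2


if __name__ == "__main__":
    for view in plane_views("P010", 3840, 2160):
        print(view)
    print(surface_bytes("P010", 3840, 2160))
```

A shader then samples Y from the first view and UV from the second and performs the YUV -> RGB conversion itself, which is why no separate conversion in the DXVA pass is needed.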
Old 30th January 2015, 06:57   #253  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Microsoft's MFT decoders are all D3D11, but I think LAV Video is D3D9.
Old 30th January 2015, 10:29   #254  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
DirectShow EVR is D3D9 only; it doesn't offer a D3D11 interface to DXVA2. I have no experience with Media Foundation, so I can't say whether that's any different.
It's possible that MF EVR is a lot better and already supports everything we need, but other than Windows Media Player, nothing uses MF.

Microsoft is unfortunately not going to care about DirectShow in Windows 10 anymore; they haven't really cared about it for a long time now.
Old 30th January 2015, 18:53   #255  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,923
I guess they want to "force" developers to switch to MF.
At least they're starting to add some small reasons to do so.
Old 1st February 2015, 09:25   #256  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
On my GTX 770 (PCI-E 3.0 x8) & i5-3570K @4.4Ghz, the new LAV dxva2cb-Direct mode only has a minor performance benefit with the 4K HEVC hybrid decoder, but it does reduce CPU usage somewhat. dxva2native continues to be much faster with the HEVC 4K hybrid decoder on this GPU.

3.Ducks-2160p@50fps-4Mbps
DXVAChecker x64 - Playback Benchmark (scaled to 1280x720)
DXVA2 Native: 187.350 fps | CPU Usage 28% | GPU Load ~93%
DXVA2 Copyback: 66.561 fps | CPU Usage 21% | GPU Load ~52%
DXVA2 Copyback Direct: 73.253 fps | CPU Usage 15% | GPU Load ~52%
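
For a quick comparison, the figures above can be turned into ratios with a few lines of Python. The numbers are copied from the list; the dictionary layout and function name are made up for illustration.

```python
# (fps, CPU usage in percent), as reported in the benchmark above.
RESULTS = {
    "DXVA2 Native": (187.350, 28),
    "DXVA2 Copyback": (66.561, 21),
    "DXVA2 Copyback Direct": (73.253, 15),
}


def relative_fps(mode: str, baseline: str = "DXVA2 Copyback") -> float:
    """Throughput of `mode` as a multiple of the baseline mode."""
    return RESULTS[mode][0] / RESULTS[baseline][0]


if __name__ == "__main__":
    for mode, (fps, cpu) in RESULTS.items():
        print(f"{mode}: {relative_fps(mode):.2f}x the copyback fps, {cpu}% CPU")
```

On these figures the direct mode is about 1.10x the old copy-back throughput while CPU usage falls from 21% to 15%, and native mode remains about 2.81x faster.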

Last edited by cyberbeing; 1st February 2015 at 12:23.
Old 1st February 2015, 11:06   #257  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
The hybrid decoder is CPU bound, which is why it behaves quite differently. Additionally, it probably has to upload a lot more data to the GPU itself, which I then download again, and upload again, which would explain why it's generally not ideal for a CB workflow.
On H.264, or using my 960 with HEVC/HEVC 10, the speed difference is practically non-existent with minimal CPU usage (< 2%, down from 5-6%) now.

This mode does, however, allow using CB for HEVC 4K Main10 files on the 960 without any significant performance impact, while DXVA2-Native for 10-bit just doesn't work with the software we have.
Last edited by nevcairiel; 1st February 2015 at 11:09.
Old 1st February 2015, 12:23   #258  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
I figured as much, which is why I've always seen the hybrid decoder as nothing more than a crutch. If copyback-direct greatly improves performance using the modern ASIC decoder on your GTX 960, that's all that really matters to me. What sort of decoding performance discrepancy were you seeing between 'normal dxva2 copyback' and 'dxva native' on your GTX 960 prior to these changes?

The ASIC decoder on this GTX 770 seems too slow for copyback to become the bottleneck on high-bitrate 4K H264. Testing a 100Mbps clip, the biggest difference was lower CPU usage:

DXVAChecker x64 - Playback Benchmark (scaled to 1280x720)
37.342fps | 0% CPU (Native)
37.206fps | 3% CPU (Copyback-Direct)
37.198fps | 5% CPU (Copyback)


On something like the 10Mbps 1080p H264 baseline clip used by WinSAT, again it mainly saved CPU time:

DXVAChecker x64 - Playback Benchmark (scaled to 1280x720)
175.522 fps | 0% CPU (Native)
166.549 fps | 5% CPU (Copyback-Direct)
163.770 fps | 8% CPU (Copyback)


That said, I can't complain about lower CPU usage with slightly higher performance even if DXVA2 Native retains an edge.

Last edited by cyberbeing; 1st February 2015 at 12:29.
Old 2nd February 2015, 06:26   #259  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by nevcairiel View Post
The hybrid decoder is CPU bound, which is why it behaves quite differently.
Using a Core i7-4790 with an iGPU HD 4600, it's definitely GPU bound.

Quote:
Originally Posted by nevcairiel View Post
This mode does allow however using CB for HEVC 4K Main10 files on the 960 without any significant performance impact, while DXVA2-Native for 10-bit just doesn't work with the software we have.
Can you post some figures in Playback Performance mode for 10-bit 4K HEVC in DXVA Copy-Back?

For example the ASTRA clip or any other demanding clip which mimics UHD Bluray.
Old 2nd February 2015, 16:02   #260  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,346
Here are the numbers for the Astra clip, using DXVA2 Copy-Back in direct mode. In the meantime I implemented Direct Mode for P010 out, and a direct mode for P010 decode -> NV12 out (for renderers that don't do 10-bit).

Decode, Direct P010 Out: 127 fps, 3% CPU
Playback (1280x720), Direct P010 Out: --- EVR doesn't do P010 out.

Decode, Direct NV12 Out: 126 fps, 3% CPU
Playback (1280x720), Direct NV12 Out: 116 fps, 6% CPU

And for giggles without the new Direct Mode:
Decode, No Direct P010 Out: 73 fps, 6% CPU

A significant performance loss due to the extra copying being done right there.
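
To get a feel for how much data those frame rates imply, here is a rough Python estimate of the copy-back traffic. It assumes the clip is 3840x2160 (the post only says 4K) and that each frame crosses from GPU to system memory exactly once as a tightly packed 4:2:0 surface. The function name is made up for illustration.

```python
def copy_bandwidth_gb_s(width: int, height: int, bytes_per_sample: int, fps: float) -> float:
    """Approximate GPU -> system memory traffic in GB/s for a 4:2:0 surface."""
    # Full-size luma plane plus a half-size interleaved chroma plane.
    frame_bytes = width * height * bytes_per_sample * 3 // 2
    return frame_bytes * fps / 1e9


if __name__ == "__main__":
    # P010 uses 2 bytes per sample; frame rates are taken from the figures above.
    print(copy_bandwidth_gb_s(3840, 2160, 2, 127))  # direct P010 out
    print(copy_bandwidth_gb_s(3840, 2160, 2, 73))   # without direct mode
```

Under these assumptions a single P010 frame is about 24.9 MB, so 127 fps corresponds to roughly 3.2 GB/s of readback. Any additional pass over the frame in system memory adds comparable traffic, which fits the drop to 73 fps when the extra copy is present.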