27th January 2015, 22:56 | #241 | Link |
Registered User
Join Date: Jan 2007
Posts: 5
|
Have you tested the refresh rate accuracy for 23.976 and 59.940 fps on the 960? That seems to be an area where Intel has the edge over everyone else. On Haswell, it just works out-of-the-box without having to mess with custom out-of-spec timings that may not work on all displays.
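For context on why this accuracy matters, here is a rough Python sketch of how quickly a small refresh rate error turns into a dropped or repeated frame. The measured rates in the loop are hypothetical illustration values, not results from any GPU, and the function name is made up for this example.

```python
from fractions import Fraction

def seconds_between_glitches(video_fps, display_hz):
    """Seconds until display and video drift apart by one whole frame."""
    drift = abs(Fraction(display_hz) - Fraction(video_fps))
    if drift == 0:
        return float("inf")
    return float(1 / drift)

NTSC_FILM = Fraction(24000, 1001)  # the exact rate behind "23.976"

# Hypothetical measured refresh rates, for illustration only.
for measured in ("23.976", "23.972", "24.000"):
    gap = seconds_between_glitches(NTSC_FILM, Fraction(measured))
    print(f"{measured} Hz -> one dropped/repeated frame every {gap:.0f} s")
```

The closer the display clock is to 24000/1001 Hz, the longer playback runs before the renderer has to drop or repeat a frame.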
|
29th January 2015, 14:56 | #242 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
I've been working on 10-bit decoding, and while it mostly works, it doesn't seem to be likely that it'll work in DXVA2 native mode anytime soon, as EVR just either crashes or refuses to render anything when receiving P010.
The good news is that as a separate side-project I've been working on reducing the CPU usage of Copy-Back, and assuming a direct decoder -> output chain (no software deint, no format conversion, clean nv12 decode and nv12 output, or p010 decode and p010 output), I could reduce the CPU usage on my NVIDIA system by up to 50%, and additionally the benchmarked performance is now 99% the same as DXVA2-Native benchmarks. These gains may be a bit smaller on Intel or AMD GPUs, but they will still be quite noticeable. I may also add a P010 decode -> NV12 output path using these optimizations, as that may be a common use-case for now when viewing 10-bit content on many renderers.
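As a rough illustration of what a P010 decode -> NV12 output path has to do per sample, here is a minimal Python sketch. It assumes tightly packed buffers (no row padding) and plain truncation to 8 bits without rounding or dithering; it is not the LAV Video code, and `p010_to_nv12` is a hypothetical helper name.

```python
import struct

def p010_to_nv12(p010: bytes) -> bytes:
    """Convert a P010 buffer to NV12 by keeping the top 8 bits of each sample.

    P010 stores each 10-bit sample in the upper bits of a little-endian
    16-bit word, and uses the same plane order as NV12 (Y plane, then
    interleaved UV), so a per-sample shift covers both planes.
    """
    if len(p010) % 2:
        raise ValueError("P010 data must be a whole number of 16-bit words")
    count = len(p010) // 2
    words = struct.unpack(f"<{count}H", p010)
    return bytes(w >> 8 for w in words)

# Three 10-bit samples (64, 512, 1023) stored the P010 way, shifted left by 6.
sample = struct.pack("<3H", 64 << 6, 512 << 6, 1023 << 6)
print(list(p010_to_nv12(sample)))
```

A real implementation would do this with SIMD and honor the surface pitch; the sketch only shows the bit layout.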
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
29th January 2015, 15:38 | #243 | Link | |
AV heretic
Join Date: Nov 2009
Posts: 422
|
Quote:
|
|
29th January 2015, 15:43 | #244 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
The Athlon 64 is probably still rather slow for CB, as it doesn't have SSE4.1, which introduced the optimized instructions for copying from GPU memory to system memory. It can make a world of difference.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
29th January 2015, 16:14 | #247 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
I won't bother with the complexity of 10-bit through DXVA2-Native for madVR only, which works much better with CB anyway (or even does CB itself!), not to mention that it probably doesn't accept it right now either.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
29th January 2015, 20:10 | #248 | Link | |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
What about EVR-CP? Any difference? Could Intel or AMD do any tricks in the driver to make DXVA2 Native and 10-bit compatible?
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
|
29th January 2015, 20:56 | #249 | Link | |
Registered User
Join Date: Oct 2012
Posts: 7,926
|
Quote:
Maybe the current or next version of madVR will handle P010 DXVA input correctly. EVR can't even handle P010 from software decoding... |
|
29th January 2015, 21:45 | #250 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Then Microsoft should build a better renderer for Windows 10 in order to use 10bit.
I don't use madVR.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
30th January 2015, 03:25 | #252 | Link | |
Registered User
Join Date: Mar 2013
Posts: 31
|
Quote:
Therefore, if you want to avoid CPU copy-back, you need to convert YUV420 to RGB (StretchRect or VideoProcessBlt) in the DXVA pass. In the render pass you can then convert it back to YUV420 and apply your upsampling and color-space mapping algorithms, or use the RGB surface directly. The problem is that most GPUs can't do P010/P016 -> RGB conversion under D3D9Ex, so if the decoder outputs a P010/P016 surface, the GPU can't convert it to a D3D-compatible format. D3D11.1 can read YUV420 formats directly (mapped to two SRVs, one for Y and the other for UV), so there is no need to do the YUV -> RGB conversion in the DXVA pass. But that needs a DX11 decoder (a decoder that uses a DX11 device instead of D3D9Ex). A D3D9Ex device can't share a YUV420 DXVA surface as DXGI_FORMAT_NV12 or DXGI_FORMAT_P010/DXGI_FORMAT_P016. |
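To make the "two SRVs" point concrete, here is a small Python sketch of where the Y plane and the interleaved UV plane sit inside one NV12 or P010 surface. It assumes no row padding, which real driver surfaces usually have, and the function name is made up for illustration.

```python
def yuv420_biplanar_layout(width, height, bytes_per_sample=1):
    """Return (offset, size) in bytes for the Y and interleaved UV planes.

    bytes_per_sample is 1 for NV12 and 2 for P010/P016.
    Assumes tightly packed rows (pitch == width * bytes_per_sample).
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 surfaces need even dimensions")
    y_size = width * height * bytes_per_sample
    # Chroma is subsampled 2x in both directions, but U and V are
    # interleaved, so each chroma row is as wide as a luma row.
    uv_size = width * (height // 2) * bytes_per_sample
    return {"Y": (0, y_size), "UV": (y_size, uv_size)}

print(yuv420_biplanar_layout(3840, 2160))      # NV12
print(yuv420_biplanar_layout(3840, 2160, 2))   # P010
```

Each view then covers one of these two regions: a single-channel view for Y and a two-channel view for UV.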
|
30th January 2015, 06:57 | #253 | Link |
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Microsoft's MFT decoders are all D3D11, but I think LAV Video is D3D9.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
30th January 2015, 10:29 | #254 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
DirectShow EVR is D3D9 only; it doesn't offer a D3D11 interface to DXVA2. I have no experience with MediaFoundation to say if that's any different.
It's possible that MF EVR is a lot better and supports all we need already, but other than Windows Media Player, nothing uses MF. Microsoft is unfortunately not going to care about DirectShow in Windows 10 anymore; they haven't really cared about it for a long time now.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
1st February 2015, 09:25 | #256 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
On my GTX 770 (PCI-E 3.0 x8) & i5-3570K @4.4Ghz, the new LAV dxva2cb-Direct mode only has a minor performance benefit with the 4K HEVC hybrid decoder, but it does reduce CPU usage somewhat. dxva2native continues to be much faster with the HEVC 4K hybrid decoder on this GPU.
3.Ducks-2160p@50fps-4Mbps
DXVAChecker x64 - Playback Benchmark (scaled to 1280x720)
DXVA2 Native: 187.350 fps | CPU Usage 28% | GPU Load ~93%
DXVA2 Copyback: 66.561 fps | CPU Usage 21% | GPU Load ~52%
DXVA2 Copyback Direct: 73.253 fps | CPU Usage 15% | GPU Load ~52%
Last edited by cyberbeing; 1st February 2015 at 12:23. |
1st February 2015, 11:06 | #257 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
The hybrid decoder is CPU bound, which is why it behaves quite differently. Additionally, it probably has to upload a lot more data to the GPU itself, which I then download again and upload again, which would explain why it's generally not ideal for a CB workflow.
On H.264, or using my 960 with HEVC/HEVC 10, the speed difference is practically non-existent, with minimal CPU usage (< 2%, down from 5-6%) now. However, this mode does allow using CB for HEVC 4K Main10 files on the 960 without any significant performance impact, while DXVA2-Native for 10-bit just doesn't work with the software we have.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Last edited by nevcairiel; 1st February 2015 at 11:09. |
1st February 2015, 12:23 | #258 | Link |
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
|
I figured as much, which is why I've always seen the hybrid decoder as nothing more than a crutch. If copyback-direct greatly improves performance using the modern ASIC decoder on your GTX 960, that's all that really matters to me. What sort of decoding performance discrepancy were you seeing between 'normal dxva2 copyback' and 'dxva native' on your GTX 960 prior to these changes?
The ASIC decoder on this GTX 770 seems too slow to bottleneck copyback on high-bitrate 4K H264. Testing a 100Mbps clip, the biggest difference was lower CPU usage:
DXVAChecker x64 - Playback Benchmark (scaled to 1280x720)
37.342 fps | 0% CPU (Native)
37.206 fps | 3% CPU (Copyback-Direct)
37.198 fps | 5% CPU (Copyback)
On something like 10Mbps 1080p H264 baseline clip used by WinSAT, again it mainly saved CPU time:
DXVAChecker x64 - Playback Benchmark (scaled to 1280x720)
175.522 fps | 0% CPU (Native)
166.549 fps | 5% CPU (Copyback-Direct)
163.770 fps | 8% CPU (Copyback)
That said, I can't complain about lower CPU usage with slightly higher performance even if DXVA2 Native retains an edge.
Last edited by cyberbeing; 1st February 2015 at 12:29. |
2nd February 2015, 06:26 | #259 | Link | ||
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
|
Quote:
Quote:
For example the ASTRA clip or any other demanding clip which mimics UHD Bluray.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1) HEVC decoding benchmarks H.264 DXVA Benchmarks for all |
||
2nd February 2015, 16:02 | #260 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,347
|
On the Astra clip, using DXVA2 Copy-Back in direct mode. In the meantime I implemented Direct Mode for P010 out, and a direct mode for P010 decode -> NV12 out (for renderers that don't do 10-bit)
Decode, Direct P010 Out: 127 fps, 3% CPU
Playback (1280x720), Direct P010 Out: --- EVR doesn't do P010 out.
Decode, Direct NV12 Out: 126 fps, 3% CPU
Playback (1280x720), Direct NV12 Out: 116 fps, 6% CPU
And for giggles without the new Direct Mode:
Decode, No Direct P010 Out: 73 fps, 6% CPU
A significant performance loss due to the extra copying being done right there.
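To put the P010 decode figures above in relative terms, a quick Python calculation; the helper name is arbitrary and the inputs are just the numbers quoted in this post.

```python
def relative_change(new, old):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0

# P010 decode on the Astra clip, with and without Direct Mode.
direct_fps, plain_fps = 127, 73
direct_cpu, plain_cpu = 3, 6

print(f"throughput: {relative_change(direct_fps, plain_fps):+.0f}%")
print(f"CPU usage:  {relative_change(direct_cpu, plain_cpu):+.0f}%")
```

This is a single clip on a single system, so the percentages are only indicative.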
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
|
|