Old 6th July 2019, 17:59   #1  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 5,644
impact of copyback operations for video playback

this thread is here to discuss and analyse the impact of copyback decoders and other copyback operations used for video playback.

just copy-paste your findings from the madVR thread in here. let's get started.
Old 8th July 2019, 11:33   #2  |  Link
chros
Registered User
 
Join Date: Mar 2002
Posts: 1,337
Quote:
Originally Posted by clsid View Post
Native is obviously more efficient than copyback. But the performance impact, while noticeable, isn't so huge that native becomes a necessity.
Quote:
Originally Posted by el Filou View Post
did you test if disabling black bar detection changes anything?
Quote:
Originally Posted by chros View Post
I don't remember. But for me the only advantage of using copyback would be to use black bar detection + cropping to save performance, and that's not the case. Otherwise I don't mind the full image processing, and it makes writing profile rules easier.
My last report about this, using a GTX 1060 6GB (underclocked, max frequency 1544 MHz):
- first 3 minutes of Shazam 23p 4k HDR BD remux (~75GB, video bitrate 76.7 Mb/s) on a 4K screen
- external srt subtitle is used (MPC-BE internal sub filter)
- LAV filters
- madvr:
-- hdr passthrough
-- only chroma upscaling is applied: NGU Sharp High
-- dithering: Error Diffusion 2
-- no "trade quality for performance" option is checked
-- full screen window mode
-- 10 bit output if possible

GPU usage results (checked with nvidiainspector):
Code:
- dxva2 native:			76% - 80%
- dxva2 copy-back, - crop:	83% - 87%
- dxva2 copy-back, + crop:	83% - 91%
- d3d11 native:			73% - 77%
- d3d11 copy-back, - crop:	85% - 88%
- d3d11 copy-back, + crop:	87% - 95%
That's the ~10% difference on my system. The closest performer is dxva2 native, but with its obvious flaws.
Interestingly enough, cropping (with the copy-back modes) increases GPU usage instead of reducing it (it uses the same profile, so the result is valid); the difference becomes ~15%!

I'd be curious to see your results/graphs with a similar test case, guys, including your system specs (mine are in my signature).
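
For anyone comparing against their own numbers, here is a rough Python sketch that reduces the ranges in the table above to midpoints and to a difference against native, in percentage points. Taking the midpoint of each range is an assumption made purely for illustration, and the names `usage`, `midpoint` and `delta_vs_native` are made up for this sketch.

```python
# GPU usage ranges (percent) copied from the table above.
usage = {
    ("dxva2", "native"): (76, 80),
    ("dxva2", "copy-back, - crop"): (83, 87),
    ("dxva2", "copy-back, + crop"): (83, 91),
    ("d3d11", "native"): (73, 77),
    ("d3d11", "copy-back, - crop"): (85, 88),
    ("d3d11", "copy-back, + crop"): (87, 95),
}


def midpoint(rng):
    """Midpoint of a (low, high) usage range. Assumption: the midpoint is representative."""
    low, high = rng
    return (low + high) / 2


def delta_vs_native(api, mode):
    """Difference in percentage points between a copy-back mode and native for one API."""
    return midpoint(usage[(api, mode)]) - midpoint(usage[(api, "native")])


for (api, mode), rng in usage.items():
    if mode != "native":
        print(f"{api} {mode}: +{delta_vs_native(api, mode):.1f} points vs native")
```

With these figures the d3d11 deltas come out at 11.5 and 16 points and the dxva2 ones at 7 and 9, which is roughly in line with the ~10% and ~15% estimates.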
__________________
Ryzen 5 2600,Asus Prime b450-Plus,16GB,MSI GTX 1060 Gaming X 6GB(v385.28),Win10 LTSB 1607,MPC-BEx64+LAV+MadVR,Yamaha RX-A870,LG OLED65B8(2160p@23/24/25/29/30/50/59/60Hz)

Last edited by chros; 8th July 2019 at 11:40.
Old 8th July 2019, 12:10   #3  |  Link
chros
Registered User
 
Join Date: Mar 2002
Posts: 1,337
Quote:
Originally Posted by el Filou View Post
The real limitations of copyback decoding only start to become a problem with 10-bit 4K, because it takes up 8 times the bandwidth of 8-bit FHD. With lower resolutions, using copyback or native doesn't have an impact on which madVR settings I am able to use. With 4K 10-bit it does.
Quote:
Originally Posted by tp4tissue View Post
is that on full bitrate 4k remux ?
Quote:
Originally Posted by huhn View Post
it's a broadcast sample, and we are talking about decoded frames here; they always have the same size, whether the source is 10 mbit or 125 mbit.
Somehow it does matter at some point in the rendering process. So, to see the biggest impact, we can:
- use a 4k hdr bt.2020 10bit 23p remux (HDR passthrough is fine)
- use a 4k monitor/tv
- set "12 bit" in nvidia CP
- set "10 bit or above" in madVR
- and use the highest setting you can with madvr chroma scaling (e.g. NGU Sharp @ High)

All these settings are real-life examples, not demo material settings.

@huhn, do you have a 4k display? If you don't have 4k remux files then I think guys posted some samples in the tonemapping topic.
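
The "8 times the bandwidth" figure quoted above can be reproduced with a few lines of Python. This is only a sketch: it assumes the decoder outputs NV12 for 8-bit and P010 for 10-bit (P010 stores each 10-bit sample in a 16-bit word), and it ignores any row padding or alignment the driver may add. `frame_bytes` is a made-up helper name.

```python
def frame_bytes(width: int, height: int, bytes_per_sample: int) -> int:
    """Bytes in one 4:2:0 frame: full-resolution luma plus two quarter-resolution chroma planes."""
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bytes_per_sample


# Assumption: NV12 uses 1 byte per sample, P010 uses 2 bytes per sample.
fhd_8bit = frame_bytes(1920, 1080, 1)
uhd_10bit = frame_bytes(3840, 2160, 2)

print(f"FHD 8-bit (NV12):  {fhd_8bit / 1e6:.2f} MB per frame")
print(f"UHD 10-bit (P010): {uhd_10bit / 1e6:.2f} MB per frame")
print(f"ratio: {uhd_10bit / fhd_8bit:.1f}x")
```

Four times the pixels and twice the bytes per sample give the factor of 8, regardless of the bitrate of the source file.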

Last edited by chros; 8th July 2019 at 12:13.
Old 8th July 2019, 17:07   #4  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 5,644
i use these: https://kodi.wiki/view/Samples. the broadcast sample is a real-world file.

and yes, even for me something is happening that should not be happening.

10 bit UHD 59p is only 2.5 times the copyback work of 10 bit UHD 23p, but i get far, far more than 2.5 times the CPU usage.

while i have a 4K TV, you don't need one; testing with DSR is fine too.
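
A quick check of the 2.5x figure, as a Python sketch. It assumes "23p" and "59p" mean 24000/1001 and 60000/1001 fps, and that each decoded frame is copied back exactly once as 4:2:0 P010 (2 bytes per sample, no padding). `copy_rate_mb_s` is just a name made up here.

```python
# Assumption: UHD 4:2:0 P010 frame, 1.5 samples per pixel at 2 bytes each = 3 bytes per pixel.
P010_UHD_FRAME_BYTES = 3840 * 2160 * 3


def copy_rate_mb_s(fps: float, frame_size: int = P010_UHD_FRAME_BYTES) -> float:
    """Data copied back per second, in MB/s, if every frame is copied once."""
    return fps * frame_size / 1e6


rate_23p = copy_rate_mb_s(24000 / 1001)
rate_59p = copy_rate_mb_s(60000 / 1001)

print(f"23p: {rate_23p:.0f} MB/s")
print(f"59p: {rate_59p:.0f} MB/s")
print(f"ratio: {rate_59p / rate_23p:.2f}x")
```

Under these assumptions the raw copy volume goes from roughly 600 MB/s to roughly 1500 MB/s, so any increase in CPU load beyond 2.5x would have to come from something other than the amount of data.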
Old 10th July 2019, 10:03   #5  |  Link
chros
Registered User
 
Join Date: Mar 2002
Posts: 1,337
Quote:
Originally Posted by el Filou View Post
4. Just out of curiosity I underclocked the CPU to 2100 MHz (FSB 200), to be able to test more different RAM speeds:

(Jellyfish 10-bit DXVA Checker decode):

RAM @ 400: 131,6 fps (native 268,4), CPU 76, GPU 1589, video 989, bus 13
RAM @ 533: 139,6 fps (native 275,7), CPU 70, GPU 1642, video 909, bus 14
RAM @ 666: 148,8 fps (native 281,1), CPU 67, GPU 1642, video 798, bus 15
RAM @ 800: 146,0 fps (native 281,8), CPU 68, GPU 1428, video 766, bus 15

for reference, CPU @ 3500 & RAM @ 800: 210,7 fps, CPU 60, GPU 1797, video 996, bus 22

With same RAM speed but 66% faster CPU, 40-45% more fps.
With same (slow) CPU speed but 66% faster RAM, 13% more fps.
Quote:
Originally Posted by nevcairiel View Post
70% CPU usage on Copy-Back is not a typical result, really. On NVIDIA or Intel you should see extremely low CPU usage, if you have a relatively recent CPU, since both of those will use the DMA engines to copy the image, which does not result in high CPU usage.
AMD, especially on older generations, has been notoriously bad with copy-back, and I would not recommend using it there, or using it as a testing reference for any meaning beyond those cards specifically.
I just tried it here with different CPU speeds (DDR4 RAM at 3200 MHz, CL16). In short: there's no change.

I made my undervolted 6C12T Ryzen 5 2600 CPU lazy and capped its speed at 2.8GHz with a Windows power plan (to keep it from running hot in a fanless system), and it mainly runs around its lowest clocks, 1.5GHz - 1.7GHz, until some heavy task kicks in (e.g. encoding).
I switched to the Ryzen Balanced plan (which makes the CPU as snappy as it can be, ~3.4GHz), but there wasn't any change in copyback performance here.
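
For reference, the scaling percentages in el Filou's quoted numbers above can be rechecked with a small Python sketch. The figures are copied from the quote; `pct_gain` is a made-up helper, and comparing 400 vs 666 for RAM and 2100 vs 3500 for CPU is my reading of which pairs the "66%" refers to.

```python
def pct_gain(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


# CPU scaling at RAM @ 800: 2100 MHz -> 3500 MHz, copyback fps 146.0 -> 210.7
cpu_clock_gain = pct_gain(3500, 2100)
cpu_fps_gain = pct_gain(210.7, 146.0)

# RAM scaling at CPU @ 2100: RAM 400 -> 666, copyback fps 131.6 -> 148.8
ram_clock_gain = pct_gain(666, 400)
ram_fps_gain = pct_gain(148.8, 131.6)

print(f"CPU: +{cpu_clock_gain:.0f}% clock -> +{cpu_fps_gain:.0f}% fps")
print(f"RAM: +{ram_clock_gain:.0f}% clock -> +{ram_fps_gain:.0f}% fps")
```

On that system the copyback frame rate follows the CPU clock much more closely than the RAM clock, which is the opposite of what I see here.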
Old Yesterday, 21:51   #6  |  Link
el Filou
Registered User
 
Join Date: Oct 2016
Posts: 466
Quote:
Originally Posted by el Filou View Post
CPU 3500 / RAM 666: DXVA Checker decode 63,0 fps; madVR 439 dropped frames, avg 50,16 ms, max 78,17 ms

CPU 3500 / RAM 800: DXVA Checker 66,8 fps; madVR 315 dropped frames, avg 45,68 ms, max 63,78 ms
So I pushed my Core 2 Duo a bit more for fun, and on that same UHD BD 75-second test clip I now have:

CPU 3500 / RAM 800 'aggressive' timings: DXVA Checker 68,3 fps; madVR 283 dropped frames, avg 45,92 ms, max 58,61 ms

CPU 3780 / RAM 864: DXVA Checker 73,4 fps; madVR 165 dropped frames, avg 43,92 ms, max 55,13 ms

CPU still at 85+ % in all cases.
I'd be curious what an old AMD CPU of the same era would get with its integrated memory controller. It has to be the platform.

Edit: based on this old post by nevcairiel, Core 2 Duo has SSE4.1 so should be fine.
Quote:
Originally Posted by nevcairiel View Post
The Athlon 64 is probably still rather slow for CB, as it doesn't have SSE4.1, which introduced the optimized instructions for copying from GPU memory to system memory. It can make a world of difference.
My results are closer to what nevcairiel measured for the 'non-direct' copyback method:
Quote:
Originally Posted by nevcairiel View Post
Decode, Direct P010 Out: 127 fps, 3% CPU
Decode, Direct NV12 Out: 126 fps, 3% CPU
And for giggles without the new Direct Mode:
Decode, No Direct P010 Out: 73 fps, 6% CPU
I've gone back and read the LAV thread from that post: https://forum.doom9.org/showthread.php?t=171219&page=15 and apparently performance is heavily dependent on memory speed even with SSE4.1. Maybe even modern systems with slower memory can see an impact?
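
To put the DXVA Checker numbers above next to memory speeds, here is a rough Python sketch that converts decode fps into the copy throughput it implies. It assumes the UHD BD clip decodes to 3840x2160 4:2:0 P010 frames (2 bytes per sample, no padding) and that each frame is copied once; `implied_mb_s` and the labels are made up for this sketch.

```python
# Assumption: one UHD P010 frame = 3840 * 2160 pixels * 3 bytes per pixel.
FRAME_MB = 3840 * 2160 * 3 / 1e6

# DXVA Checker copyback decode fps from the posts above.
results = {
    "CPU 3500 / RAM 666": 63.0,
    "CPU 3500 / RAM 800": 66.8,
    "CPU 3500 / RAM 800 aggressive": 68.3,
    "CPU 3780 / RAM 864": 73.4,
}


def implied_mb_s(fps: float) -> float:
    """Copy throughput implied by a decode frame rate, in MB/s."""
    return fps * FRAME_MB


for label, fps in results.items():
    print(f"{label}: {fps:.1f} fps -> {implied_mb_s(fps):.0f} MB/s")
```

Under these assumptions the measured rates work out to somewhere between about 1.5 and 1.9 GB/s of copied frame data.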
__________________
HTPC: W10 1809, E7400, 1050 Ti, DVB-C, Denon 2310, Panasonic GT60 | Desktop: W10 1809, 4690K, HD 7870, Dell U2713HM | MediaPortal 1/MPC-HC, LAV Filters, ReClock, madVR

Last edited by el Filou; Today at 00:42.
All times are GMT +1. The time now is 01:30.

