Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
6th July 2019, 17:59 | #1 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,926
|
impact of copyback operations for video playback
this thread is here to discuss and analyse the impact of copyback decoders or other copyback operations used for video playback.
just copy-paste your findings from the madVR thread in here. let's get started.
8th July 2019, 11:33 | #2 | Link | |||
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Quote:
Quote:
Quote:
- first 3 minutes of Shazam 23p 4k HDR BD remux (~75GB, video bitrate 76.7 Mb/s) on a 4K screen
- external srt subtitle is used (MPC-BE internal sub filter)
- LAV filters
- madvr:
-- hdr passthrough
-- only chroma upscaling is applied: NGU Sharp High
-- dithering: Error Diffusion 2
-- no trade quality option is checked
-- full screen window mode
-- 10 bit output if possible

CPU usage results.

GPU usage results (checked with nvidiainspector):
Code:
- software decoding, - crop: 74% - 82%
- software decoding, + crop: 75% - 85%
- cuvid copy-back, - crop: 82% - 90%
- cuvid copy-back, + crop: 84% - 95%
- dxva2 native: 76% - 80%
- dxva2 copy-back, - crop: 83% - 87%
- dxva2 copy-back, + crop: 83% - 91%
- d3d11 native: 73% - 77%
- d3d11 copy-back, - crop: 85% - 88%
- d3d11 copy-back, + crop: 87% - 95%

Interestingly enough, cropping (with copy-back modes) increases GPU usage instead of reducing it (it uses the same profile, so the result is valid); the diff becomes ~15%!
So, in summary, the fastest mode is: d3d11 native.
Note about d3d11-native: no black-bar detection / deinterlacing / BD menus (using jRiver) is available in this mode.
I'll be curious about your results/graphs with a similar test case, guys, including your system (mine is in my signature).
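To compare the modes at a glance, the GPU usage ranges above can be ranked with a few lines of Python. This is only a sketch: using the midpoint of each range as a single score is an assumption made here for illustration, not something the measurements themselves provide.

```python
# GPU usage ranges (percent) copied from the list above.
gpu_usage = {
    "software decoding, - crop": (74, 82),
    "software decoding, + crop": (75, 85),
    "cuvid copy-back, - crop": (82, 90),
    "cuvid copy-back, + crop": (84, 95),
    "dxva2 native": (76, 80),
    "dxva2 copy-back, - crop": (83, 87),
    "dxva2 copy-back, + crop": (83, 91),
    "d3d11 native": (73, 77),
    "d3d11 copy-back, - crop": (85, 88),
    "d3d11 copy-back, + crop": (87, 95),
}


def midpoint(usage_range):
    """Midpoint of a (low, high) range; an assumed summary statistic."""
    low, high = usage_range
    return (low + high) / 2


# Sort modes from lowest to highest midpoint GPU usage.
ranked = sorted(gpu_usage, key=lambda mode: midpoint(gpu_usage[mode]))

for mode in ranked:
    low, high = gpu_usage[mode]
    print(f"{mode:28s} {low}% - {high}%  (mid {midpoint(gpu_usage[mode]):.1f}%)")
```

With this scoring, d3d11 native comes out lowest and d3d11 copy-back with cropping highest, which matches the summary above.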
__________________
Ryzen 5 2600,Asus Prime b450-Plus,16GB,MSI GTX 1060 Gaming X 6GB(v398.18),Win10 LTSC 1809,MPC-BEx64+LAV+MadVR,Yamaha RX-A870,LG OLED77G2(2160p@23/24/25/29/30/50/59/60Hz) | madvr config Last edited by chros; 9th September 2021 at 20:10. |
|||
8th July 2019, 12:10 | #3 | Link | ||
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Quote:
Quote:
- use a 4k hdr bt.2020 10bit 23p remux (HDR passthrough is fine)
- use a 4k monitor/tv
- set "12 bit" in nvidia CP
- set "10 bit or above" in madVR
- and use the highest setting you can with madvr chroma scaling (e.g. NGU Sharp @ High)

All these settings are real life examples, not some demo material settings.
@huhn, do you have a 4k display? If you don't have 4k remux files then I think guys posted some samples in the tonemapping topic.
Last edited by chros; 8th July 2019 at 12:13.
8th July 2019, 17:07 | #4 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,926
|
i use these: https://kodi.wiki/view/Samples
broadcast is a real world file
and yes even for me something is happening which should not be happening. 10 bit UHD 59p is only 2.5 times the copyback work of 10 bit UHD 23p but i get far far more CPU usage than 2.5 times. while i have a 4K TV, you don't need one; testing with DSR is fine too.
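The 2.5x figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes 10-bit 4:2:0 frames stored in 16-bit containers (a P010-like layout) and exact frame rates of 24000/1001 and 60000/1001; both are assumptions, and only a single copy direction is counted.

```python
def frame_bytes(width, height, bytes_per_sample=2):
    """Size of one 4:2:0 frame: 1.5 samples per pixel (full-res luma plus
    quarter-res Cb and Cr), each stored in bytes_per_sample bytes (assumed 2)."""
    return width * height * 3 // 2 * bytes_per_sample


def copy_rate_mb_s(width, height, fps):
    """Data that has to be copied per second, in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes(width, height) * fps / 1e6


uhd_23p = copy_rate_mb_s(3840, 2160, 24000 / 1001)
uhd_59p = copy_rate_mb_s(3840, 2160, 60000 / 1001)

print(f"UHD 23p: {uhd_23p:.1f} MB/s")
print(f"UHD 59p: {uhd_59p:.1f} MB/s")
print(f"ratio:   {uhd_59p / uhd_23p:.2f}")
```

Under these assumptions one frame is about 24.9 MB, so 23p needs roughly 0.6 GB/s and 59p roughly 1.5 GB/s, a ratio of 2.5. CPU usage growing much faster than that points at something other than the raw amount of data.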
10th July 2019, 10:03 | #5 | Link | ||
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Quote:
Quote:
I made my undervolted 6C12T Ryzen 5 2600 CPU lazy and capped its speed at 2.8GHz with a Windows Power Plan (so it doesn't get hot in a fanless system), and it mainly runs around its lowest clocks, 1.5GHz - 1.7GHz, until some heavy task kicks in (e.g. encoding). I switched to the Ryzen Balanced plan (that makes the CPU as snappy as it can be, ~3.4GHz), but there wasn't any change in copyback performance here.
16th July 2019, 21:51 | #6 | Link | |||
Registered User
Join Date: Oct 2016
Posts: 896
|
Quote:
CPU 3500 / RAM 800 'aggressive' timings: DXVA Checker 68,3 fps; madVR 283 dropped frames, avg 45,92 ms, max 58,61 ms
CPU 3780 / RAM 864: DXVA Checker 73,4 fps; madVR 165 dropped frames, avg 43,92 ms, max 55,13 ms
CPU still at 85+ % in all cases.
I'd be curious what an old AMD CPU of the same era would get with its integrated memory controller. It has to be the platform.
Edit: based on this old post by nevcairiel, Core 2 Duo has SSE4.1 so should be fine.
Quote:
Quote:
__________________
HTPC: Windows 10 22H2, MediaPortal 1, LAV Filters/ReClock/madVR. DVB-C TV, Panasonic GT60, Denon 2310, Core 2 Duo E7400 oc'd, GeForce 1050 Ti 536.40 Last edited by el Filou; 17th July 2019 at 00:42. |
|||
17th July 2019, 12:22 | #8 | Link |
Registered User
Join Date: Oct 2016
Posts: 896
|
Mine does (Wolfdale).
I guess you need at least good DDR3 for 4K 10-bit copyback.
19th July 2019, 10:26 | #9 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,926
|
i'm testing my zen 2 right now. finally some bad results.
i get 15 % CPU load on a 3700X. something very odd is happening here.
1060 3700X 3200 mhz ram.
UHD 60p NGU AA mid chroma SSim d1 100 to 1080p
d3d11 copyback CPU 15 %, GPU 19 ms
d3d11 native CPU 11 %, GPU 14 ms
DXVA copyback CPU 12 %, GPU 19 ms
1 core is pretty much totally loaded with and without copyback. if an 8 core CPU is loaded like this, my 2 core intel shouldn't be able to do this while idling...
edit: the CPU load seems to be a part of madVR... can you guys please retest with mpcVR just for the CPU load: https://github.com/Aleksoid1978/VideoRenderer/releases
all you need to do is install and load it as an external filter.
Last edited by huhn; 19th July 2019 at 10:49.
19th July 2019, 14:39 | #10 | Link | ||
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Congrats, enjoy your new system!
Quote:
Quote:
Which madvr version do you use? latest stable or HDR2SDR test?
19th July 2019, 15:49 | #12 | Link |
Registered User
Join Date: Aug 2008
Posts: 343
|
Important for INTEL iGPU users.
For as long as I can remember, enabling the dx11 option in the LAV Filters decoder caused a slowdown in video decoding; performance was noticeably much worse than dxva native, especially at the highest resolutions. That was a few versions of Win10 and Intel drivers back. Now, I don't know why, but enabling dx11 automatic makes normal playback possible even with 8k videos. It does not drop any frames with nightly 2.1 MPC video decoder. It drops some frames with madVR but it's not so noticeable. madVR consumes a lot of shared memory (Intel iGPU video RAM), which is almost filled at 8k (I see it in the process manager). And finally, video playback crawls with the EVR Custom presenter. This renderer does not convert HDR>>SDR tones, so it's the least featured among the possible renderers now. I finally need to switch to another renderer for daily use.
Take note, enabling dxva2 CB makes video slow. It must be some dx11 compatibility chain between decoder and renderer that gives dx11 playback better performance. I didn't watch CPU usage; it's not so important for me right now.
Last edited by littleD; 31st July 2019 at 20:17.
19th July 2019, 17:04 | #13 | Link | |
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Quote:
I understand if you don't want to change it, I'm just saying that even a small amount of workload can result in high clock speeds using the Ryzen Balanced Plan.
20th July 2019, 13:30 | #15 | Link | |
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Quote:
2160p_59fps_hevc-LG_2_DEMO_4K_L_N_06_Slam Dunk.mkv
d3d11 native (madVR / mpcbeVR): CPU MPC-BE: ~1.8 % / ~1.8 %
d3d11 copyback - no zoom control/black bar detection (madVR / mpcbeVR): CPU MPC-BE: ~5.8 % / ~3.8 %
d3d11 copyback + zoom control and black bar detection (only madVR): CPU MPC-BE: ~9.2 %
Load is equally distributed between threads (3-4 threads having load <20% out of 12).
Edit: I added mpc-be video renderer as well. The diff between madvr and mpcbeVR is 2% using d3d11 copyback, while there's no difference using d3d11 native.
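The differences can be spelled out as overhead relative to d3d11 native on the same renderer. This is just a small sketch over the numbers above; the dictionary keys are shortened labels made up for the example.

```python
# MPC-BE process CPU load in percent, copied from the results above.
cpu_load = {
    ("d3d11 native", "madVR"): 1.8,
    ("d3d11 native", "mpcbeVR"): 1.8,
    ("d3d11 copyback", "madVR"): 5.8,
    ("d3d11 copyback", "mpcbeVR"): 3.8,
    ("d3d11 copyback + black bar detection", "madVR"): 9.2,
}


def overhead(mode, renderer):
    """Extra CPU load in percentage points versus d3d11 native on the same renderer."""
    baseline = cpu_load[("d3d11 native", renderer)]
    return round(cpu_load[(mode, renderer)] - baseline, 1)


for (mode, renderer) in cpu_load:
    print(f"{mode:38s} {renderer:8s} +{overhead(mode, renderer):.1f} points")
```

So copyback adds about 2 points with mpcbeVR and about 4 with madVR, and zoom control plus black bar detection adds roughly another 3.4 on top with madVR.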
Last edited by chros; 1st August 2019 at 11:31.
21st July 2019, 04:40 | #16 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,926
|
mpcVR d3d9
d3d11 native 5 % at very low frequency
DXVA copyback 3 % at even lower clocks...
DXVA native 0.3 % at idle

mpcVR d3d11
d3d11 native 2% at idle 300mhz
DXVA copyback 3 % at up to 1300 usually much lower
DXVA native 1.0 % at idle

and here again the madVR numbers. they are lower than the old numbers.
d3d11 copyback CPU 13 %, GPU 19 ms
d3d11 native CPU 9 %, GPU 14 ms
DXVA copyback CPU 12 %, GPU 19 ms

pressing control+v in a browser has higher load than a copyback operation with mpcVR
i have PCIe 3.0 which is currently not a given with ryzen 3000
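For scale, here is a rough sketch of how much of a PCIe link one UHD 60p copyback stream would occupy in theory. The assumptions are not taken from the measurements above: an x16 link, 10-bit 4:2:0 frames in 16-bit containers (P010-like), and only the GPU to system RAM direction counted. The per-lane rates are the nominal spec figures, so real throughput will be lower.

```python
# Nominal per-lane transfer rate (GT/s) and line-code efficiency per PCIe generation.
PCIE = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def link_gb_s(gen, lanes=16):
    """Theoretical bandwidth per direction in GB/s, before protocol overhead."""
    gt_s, efficiency = PCIE[gen]
    return gt_s * efficiency / 8 * lanes


# One UHD frame, 4:2:0, 2 bytes per sample (assumed), at 60000/1001 fps.
copyback_gb_s = 3840 * 2160 * 3 // 2 * 2 * (60000 / 1001) / 1e9

for gen in PCIE:
    share = copyback_gb_s / link_gb_s(gen) * 100
    print(f"PCIe {gen} x16: {link_gb_s(gen):.2f} GB/s, copyback uses {share:.1f}%")
```

With these assumptions the stream needs about 1.5 GB/s, which is under a fifth of a PCIe 2.0 x16 link and under a tenth of a PCIe 3.0 x16 link, so raw link bandwidth alone looks unlikely to be the limit.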
3rd August 2019, 13:32 | #17 | Link |
Registered User
Join Date: Mar 2002
Posts: 2,323
|
I added CUVID results to the above post: as @nevcairiel suggested, it performs similarly to other copy-back methods.
20th September 2019, 21:11 | #18 | Link |
Registered User
Join Date: Oct 2012
Posts: 7,926
|
system 1 specs:
i7 3770k
16GB Kingston HyperX blu. DDR3-1600 DIMM CL10 Dual Kit
ASRock Z77Pro3 Intel Z77
1060
the system is not assembled anymore. it was used most of the time with 4 dimms from 2 totally different kits, but it stopped booting with them about 5 months ago. ram support was superb on this platform.
the missing ram was 8GB (2x 4096MB) TeamGroup Elite DDR3-1333 DIMM CL9-9-9-24 Dual Kit, OC to 1600 mhz

system 2 specs:
r7 3700x
msi x570a pro
32GB Corsair Vengeance LPX schwarz DDR4-3200 DIMM CL16 Dual Kit
currently 1060

system 3 specs:
i3 4130
ASRock B85 Pro4
2*8GB (1x 8192MB) Crucial Ballistix Sport DDR3-1600 DIMM CL9-9-9-24 Single, mostly run at 1333 mhz because i didn't care.
currently 960

system 3 up to this date doesn't really care about copyback operation up to UHD 23p; at 60p it shows real differences.
21st September 2019, 20:46 | #19 | Link | ||
Registered User
Join Date: Mar 2002
Posts: 2,323
|
Quote:
Quote:
What's more interesting is that it has a big impact on the "normal" GPU operations for whatever reason. I added software decoding results to the above post: as @nevcairiel suggested, it performs better than other copy-back methods, almost like dxva2 native!
Last edited by chros; 21st September 2019 at 20:52.
24th September 2019, 11:45 | #20 | Link | |
Registered User
Join Date: Oct 2016
Posts: 896
|
Quote:
System: PCIe 2.0 x16, Core 2 FSB 333 DDR2-800, 1050 Ti.
Settings: 2160 to 1080, SSIM2D100, scale chroma separately, HDR processing with no trade quality and with highlights recovery. (Edit: GPU clock is at maximum boost in both cases)

native / copyback (which skipped 110 frames in just 30 seconds):
33.61 / 51.32
0.53 / 1.79 Jinc Image Downscaling - Convert to Linear Light
10.04 / 10.83 Jinc Downscaling
0.45 / 3.79 SSIM RT
0.41 / 0.90 SSIM Final
0.77 / 1.48 SSIM AR
0.83 / 1.69 HDR Blur Dif
6.84 / 8.45 HDR Blur
0.66 / 0.88 HDR Frequency Split
0.47 / 0.64 Chroma Scaling - Shift X
0.47 / 1.03 Chroma Upscaling - ConvertToRGB
1.20 / 1.72 HDR Tone Map
0.47 / 1.49 HDR Blur Dif
4.46 / 6.45 HDR Blur
0.42 / 1.23 HDR Join
0.50 / 1.05 Image Scaling X
0.16 / 0.62 Image Scaling Y
0.09 / 0.35 HDR Compare
0.09 / 0.18 HDR Blur Dif
0.67 / 1.25 HDR Blur
0.42 / 0.70 HDR Gamut Map
0.57 / 2.16 HDR Final

I don't know enough about GPUs to form an opinion, and the fact that it drops frames may render this comparison invalid, but what's interesting is that the long Jinc Downscaling step, which by itself takes 1/3 of the rendering time, sees a really small difference. Maybe a scheduling issue in the memory controller between the video decoder, the compute units, and the bus interface?
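To see which steps suffer most, the native and copyback timings can be compared as ratios. The sketch below uses only a handful of the steps listed above, and it reads the first, unlabeled pair (33.61 / 51.32) as the overall rendering time, which is an assumption.

```python
# (step, native, copyback) timings copied from the list above; only a few steps shown.
steps = [
    ("Jinc Image Downscaling - Convert to Linear Light", 0.53, 1.79),
    ("Jinc Downscaling", 10.04, 10.83),
    ("SSIM RT", 0.45, 3.79),
    ("HDR Tone Map", 1.20, 1.72),
    ("HDR Final", 0.57, 2.16),
]
# Assumed to be the overall rendering time (the pair has no label in the list).
total_native, total_copyback = 33.61, 51.32

# Largest relative slowdown first.
by_slowdown = sorted(steps, key=lambda s: s[2] / s[1], reverse=True)

for name, native, copyback in by_slowdown:
    print(f"{name:50s} x{copyback / native:5.2f}  (+{copyback - native:.2f})")
print(f"{'total':50s} x{total_copyback / total_native:5.2f}")
```

Among these steps, SSIM RT slows down by more than 8x while Jinc Downscaling, the longest step, barely changes (about 1.08x), against roughly 1.5x for the total.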
Last edited by el Filou; 24th September 2019 at 11:55.