Old 21st September 2019, 20:46   #19  |  Link
chros
Registered User
 
 
Join Date: Mar 2002
Posts: 2,323
Quote:
Originally Posted by nevcairiel
The problem with CopyBack is also that it has to copy the image twice: once from the GPU to system memory, and then back from system memory to the GPU. On some GPUs the download step is also rather slow (AMD historically had trouble there; no clue how recent hardware has changed). It can also stress system RAM, especially on mainstream systems with dual-channel memory.

I did a quick test on my system (which isn't a good example, since it has fast quad-channel RAM and everything else high-end as well, but regardless) with a random 4K 10-bit test clip I had at hand:
With DXVAChecker and naive EVR playback testing

DXVA2-Native, ~380 FPS
DXVA2-CopyBack, ~104 FPS
Software Decoding, ~196 FPS

The native test is close to what the hardware decoder can achieve; it was at ~95% usage most of the time. CopyBack definitely takes quite a toll at 4K. Interestingly, at 1080p the overhead from CopyBack is generally minimal.
Software decoding is also interesting. Granted, you need a CPU that can actually decode this fast, and it was decode-limited at this point, but uploading the image alone is not yet bottlenecking the decoder.

Since I could upload at 196 FPS at least (and probably more), I did another test: DXVA2-CopyBack, decode only, which means it only downloads the image from the GPU but does not re-upload it. That yielded ~232 FPS.
Clearly the combined download and upload creates the real bottleneck ... somewhere. It's not entirely clear where. The software upload path in the renderer can clearly handle more than ~104 FPS, and so can the download path in LAV Video. PCIe is full-duplex, which means it should be capable of sending and receiving at the same time. System memory is more complex in that regard, but my quad-channel memory should have plenty of bandwidth to accommodate this.
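For a sense of scale, the frame rates quoted above can be turned into approximate transfer rates. This is only a back-of-the-envelope sketch: it assumes 3840x2160 frames stored as P010 (4:2:0 chroma, 10-bit samples in 16-bit words), which the post does not state, and it ignores surface padding and any extra copies inside the driver or renderer.

```python
# Rough bandwidth estimate for the frame rates quoted above.
# Assumptions (not stated in the post): 3840x2160 frames in P010 layout,
# i.e. 4:2:0 chroma with 16-bit storage per sample, no row padding.

WIDTH, HEIGHT = 3840, 2160
BYTES_PER_SAMPLE = 2      # 10-bit samples stored in 16-bit words (P010)
SAMPLES_PER_PIXEL = 1.5   # 4:2:0: one luma sample plus half a chroma pair


def frame_bytes(width=WIDTH, height=HEIGHT):
    """Size of one decoded frame in bytes under the assumptions above."""
    return int(width * height * SAMPLES_PER_PIXEL * BYTES_PER_SAMPLE)


def gb_per_second(fps, copies=1):
    """Traffic in GB/s (10^9 bytes) when each frame is transferred `copies` times."""
    return frame_bytes() * fps * copies / 1e9


if __name__ == "__main__":
    print(f"frame size: {frame_bytes() / 1e6:.1f} MB")
    for label, fps, copies in [
        ("CopyBack, decode only (download)", 232, 1),
        ("Software decoding (upload)", 196, 1),
        ("CopyBack playback (download + upload)", 104, 2),
    ]:
        print(f"{label}: {gb_per_second(fps, copies):.2f} GB/s")
```

Under these assumptions a frame is about 24.9 MB, and all three cases land at roughly 4.9 to 5.8 GB/s of total traffic, so none of them obviously exceeds what the others already sustain.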

What I don't know is whether the EVR used in this example uploads the video on a different thread, or on the same thread that LAV Video uses to deliver the image. The latter might explain why it slows down so much, since it would be doing two things on the same thread. madVR, at least, uses a separate thread for uploading, so it wouldn't be affected by that.
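The same-thread idea can be checked against the numbers with a toy throughput model. This is a sketch, not a description of how EVR or LAV Video actually schedule work: it assumes the two stages cost exactly what the separate tests measured (~232 FPS for decode plus download, ~196 FPS as a lower bound for upload) and that nothing else is shared.

```python
def serial_fps(*stage_fps):
    """Throughput when all stages run back to back on one thread.

    Per-frame time is the sum of the per-stage times.
    """
    return 1.0 / sum(1.0 / fps for fps in stage_fps)


def pipelined_fps(*stage_fps):
    """Throughput when each stage has its own thread: the slowest stage wins."""
    return min(stage_fps)


if __name__ == "__main__":
    download_fps = 232  # DXVA2-CopyBack, decode only (from the post)
    upload_fps = 196    # software decoding result, a lower bound for upload
    print(f"same thread:      {serial_fps(download_fps, upload_fps):.1f} FPS")
    print(f"separate threads: {pipelined_fps(download_fps, upload_fps):.1f} FPS")
```

With those inputs the serial model gives about 106 FPS, close to the measured ~104 FPS, while the pipelined model gives 196 FPS. That is consistent with the single-thread explanation but does not prove it, since 196 FPS is only a lower bound on the upload rate and a faster upload would push the serial estimate higher.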
Quote:
Originally Posted by littleD
Not sure if I'm adding anything new, but computers have pretty much always been designed with data flowing in one direction. For PC games to reach high FPS, the system needs fast CPU>GPU memory transfer. The reverse direction was always a few times slower because no applications needed it. That started to matter in the GPGPU era, even before OpenCL, because general-purpose processing needs to re-upload data many times. Since then the GPU>CPU path has steadily improved, but it is still slower.
RAM speed doesn't have much impact on CPU>GPU transfer, since that direction is native to the PC architecture. Upload to the GPU, decode (texture/image) and render is the typical pattern for a PC game. Quad-channel memory might have an advantage in software decoding, with its many CPU<>RAM transfers, so RAM speed might matter there.
In your example, downloading the image alone (decode only) looks fast anyway. Re-uploading (playback) runs contrary to the PC design, so even with fast RAM the low speed may be a hardware or software limitation (DXVA design?). There are some small tools for benchmarking PCIe GPU>CPU transfer.
Interesting to see that native is ~3.6x faster than CopyBack and software decoding is ~1.9x faster.
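Those ratios follow directly from the FPS figures quoted above. A minimal sketch of the arithmetic (the dictionary keys and the `speedup_over` helper are just illustrative names):

```python
# FPS figures from nevcairiel's quoted test (4K 10-bit clip, EVR playback).
RESULTS_FPS = {
    "DXVA2-Native": 380,
    "DXVA2-CopyBack": 104,
    "Software Decoding": 196,
}


def speedup_over(baseline, results=RESULTS_FPS):
    """Return each mode's FPS divided by the baseline mode's FPS."""
    base = results[baseline]
    return {name: fps / base for name, fps in results.items() if name != baseline}


if __name__ == "__main__":
    for name, ratio in speedup_over("DXVA2-CopyBack").items():
        print(f"{name}: {ratio:.2f}x CopyBack")
```

This gives about 3.65x for native and 1.88x for software decoding, in line with the rounded figures.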

What's more interesting is that it has a big impact on "normal" GPU operations, for whatever reason.

I added software decoding results to the above post: as @nevcairiel suggested, it performs better than other copy-back methods, almost like dxva2 native!
__________________
Ryzen 5 2600,Asus Prime b450-Plus,16GB,MSI GTX 1060 Gaming X 6GB(v398.18),Win10 LTSC 1809,MPC-BEx64+LAV+MadVR,Yamaha RX-A870,LG OLED77G2(2160p@23/24/25/29/30/50/59/60Hz) | madvr config

Last edited by chros; 21st September 2019 at 20:52.