Old 24th September 2014, 20:03   #21  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by huhn View Post
CUVID and DXVA2 use the same decoder so CUVID is like DXVA2 copyback at least with new cards.
currently i don't see any benefit with using CUVID over DXVA2 copyback or native
From nevcairiel's words, I was under the impression that Nvidia's hybrid HEVC decoder was a CUVID-only feature, like MPEG4-ASP.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 24th September 2014, 20:10   #22  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Who ever said MPEG4-ASP is a CUVID-only feature? It's just that no one ever bothered to implement it in DXVA2, because it's quite a bit of work for practically no return value.
With CUVID, it just comes for free, since it's all handled in the driver; not much code is needed at all.

HEVC is obviously implemented in DXVA2, and as such it also works on NVIDIA of course, and not only through CUVID.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 24th September 2014 at 20:33.
Old 24th September 2014, 20:40   #23  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
OK I'll test it and I'll post the results.
Old 25th September 2014, 00:39   #24  |  Link
sheppaul
Registered User
 
Join Date: Sep 2004
Posts: 146
There is an MPEG4-ASP DXVA decoder in PotPlayer though.

I believed the code was from LAV.

Last edited by sheppaul; 25th September 2014 at 00:55.
Old 25th September 2014, 01:17   #25  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,903
Quote:
Originally Posted by sheppaul View Post
There is an MPEG4-ASP DXVA decoder in PotPlayer though.

I believed the code was from LAV.
Why do you think so? There is no DXVA decoder for ASP in LAV Filters, only CUVID.
Old 25th September 2014, 03:22   #26  |  Link
edison
Registered User
 
Join Date: Dec 2005
Posts: 106
The x64 LAV Video does a good job here, but there is another problem: we still do not have a mature playback suite. For example, there is no x64 DTS-HD software audio decoder so far.
So we still need to wait for a real HW decoder that supports HEVC at 4K 60p or higher (GM1XX does not have a full HW HEVC decoder, but GM2XX does).
Old 25th September 2014, 04:19   #27  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,903
Quote:
Originally Posted by edison View Post
The x64 LAV Video does a good job here, but there is another problem: we still do not have a mature playback suite. For example, there is no x64 DTS-HD software audio decoder so far.
So we still need to wait for a real HW decoder that supports HEVC at 4K 60p or higher (GM1XX does not have a full HW HEVC decoder, but GM2XX does).
The new Nvidia 980/970 don't have a full HW decoder for HEVC. They only have a HW HEVC encoder (NVENC), which is used for ShadowPlay. Of course ShadowPlay doesn't use HEVC encoding in the current version, but the card can do it.

you can read about this here.

http://www.anandtech.com/show/8526/n...x-980-review/5
Old 25th September 2014, 10:12   #28  |  Link
sheppaul
Registered User
 
Join Date: Sep 2004
Posts: 146
Quote:
Originally Posted by huhn View Post
Why do you think so? There is no DXVA decoder for ASP in LAV Filters, only CUVID.
There is a checkbox for that, though it is disabled by default.

I didn't know that it is not implemented.

BTW, the hybrid HEVC decoder is pretty disappointing. Is there any room to improve?

Last edited by sheppaul; 25th September 2014 at 10:41.
Old 25th September 2014, 13:38   #29  |  Link
videoh
Useful n00b
 
Join Date: Jul 2014
Posts: 1,667
Quote:
Originally Posted by sheppaul View Post
BTW, the hybrid HEVC decoder is pretty disappointing.
In what way do you find it to be disappointing?
Old 25th September 2014, 13:39   #30  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by sheppaul View Post
BTW, the hybrid HEVC decoder is pretty disappointing. Is there any room to improve?
Going by cyberbeing's results with an Nvidia GTX 770, the hybrid decoder in DXVA native mode (DXVAn) can be a lot faster than software decoding on my Core i7-4790 with 2160p clips.

Look at the Ducks sample.
It's more than two times faster.
Old 25th September 2014, 16:52   #31  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Benchmarking video decoding on a laptop with Nvidia's Optimus technology proved a difficult task.

First of all, I didn't find a way to disable it in BIOS, so I had to live with it.

Starting with Intel's HD 4600 iGPU inside the Core i5-4200M, I saw that even with DXVA native decoding the HD 4600 never reached its max clock, despite maximum utilization.

Most of the time the clock sat at 850MHz at 100% usage for 1080p and 2160p clips, against a max clock of 1150MHz.

Also, the CPU utilization was too high for DXVA native mode: ~40% for 1080p clips and 33% - 57% for 2160p, and the CPU clock went up to its max of 2.5GHz many times during benchmarking.

It looks like a system that pairs an iGPU with a discrete Nvidia GPU through Optimus can't fully utilize DXVA native mode on both GPUs.

Of course DXVA copy-back was even slower.

In the Control Panel I chose max performance for the iGPU.
I also tried running the laptop on battery and plugged in.

Nothing changed.



But with Nvidia's 740M GPU the problems were a lot bigger.

Starting with the 340.52 driver, the clocks went up to 980MHz (max 1060MHz) but with low GPU utilization.
Then the video dropped to 0fps and stopped working.

Unfortunately that happened with almost all the clips I tried and in all modes (DXVA copy-back, NVCUVID).

I tried both LAV x86 and x64 with no success.

I updated to the latest driver, 344.11, but still had issues.

The GPU clocks went even lower, to 140MHz - 230MHz (!), with very little GPU usage, but at least most of the clips, though not all, were fully decoded without sudden stops.
Some still have incompatibilities.

So there are still compatibility problems with the 740M, even with the latest drivers and LAV nightly 0.62.46.

The results were extremely low for the clips that decoded fine, and some didn't reach the end, so I decided not to include any results.

I don't know whether the problem lies with Optimus, the driver, or LAV Video.

I will try again after a while, to see if anything has changed.

Last edited by NikosD; 26th September 2014 at 17:14.
Old 25th September 2014, 17:10   #32  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,903
The max GPU clock is heat-controlled for both Nvidia and Intel.

And this is a laptop; they are not known for efficient heat management.
Old 25th September 2014, 19:15   #33  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by huhn View Post
The max GPU clock is heat-controlled for both Nvidia and Intel.

And this is a laptop; they are not known for efficient heat management.
I was way below the thermal throttling point of both GPUs.

Especially for Nvidia, the issue is certainly not there.


Quote:
Originally Posted by cyberbeing View Post
i5-3570K @4.4Ghz, 16GB DDR3 1866Mhz 8-9-9-24
NVIDIA GTX 770 2GB, PCI-E 3.0 x8
Win7 SP1 x64
LAV Filters git-33f0b3 (2014-09-21)
DXVA Checker 3.1.2

OK you have a:
Core i5-3570K @4.4Ghz (Ivy) 4cores/4threads (No HT) - Win7 SP1 x64 - 16GB - DDR3 1866MHz CL9 - LAV Filters git-33f0b3 (2014-09-21) - DXVA Checker 3.1.2

I have a:
Core i7-4790@3.8GHz (Haswell) 4cores/8threads (HT On) - Win 8.1 Pro x64 - 8GB - DDR3 1600MHz CL9 - LAV filters git 52 (25-09-2014) - DXVA Checker 3.1.2 (DXVA Decoding)

Your processor runs at a 16% higher frequency, but it has only 4 threads vs 8 threads and it's an older architecture (Ivy vs Haswell).


Results:


1.Beauty-2160p@30fps-12.3Mbps


LAV x64 i5-3570K 97/107/111 CPU Utilization: 95% (Threads=12) (Windows 7) (4.4GHz)

LAV x64 i5-3570K 89/103/109 CPU Utilization: 91% (Threads=12) (Windows 8.1) (4.4GHz)

LAV x64 i7-3770K 85/101/105 CPU Utilization: 79% (Threads=16) (HT on) (Windows 7) (4.0GHz)

LAV x64 i7-4790 82/95/99 CPU Utilisation: 78% (Threads=16) (HT on) (Windows 8.1) (3.8GHz)

LAV x64 i7-3770K 88/93/100 CPU Utilization: 95% (Threads=16) (HT off) (Windows 7) (4.0GHz)

LAV x64 i7-3770K 80/93/99 CPU Utilization: 70% (Thread=Auto) (HT on) (Windows 7) (4.0GHz)

LAV x64 i7-3770K 77/92/97 CPU Utilization: 94% (Threads=12) (HT off) (Windows 7) (4.0GHz)

LAV x64 i7-4790 78/90/95 CPU Utilisation: 73% (Threads=Auto) (HT on) (Windows 8.1) (3.8GHz)

LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto) (Windows 7) (4.4GHz)

LAV x64 i7-4790 69/81/87 CPU Utilisation: 90% (Threads=16) (HT off) (Windows 8.1) (3.8GHz)

LAV x64 i7-4790 69/81/85 CPU Utilisation: 90% (Threads=12) (HT off) (Windows 8.1) (3.8GHz)

LAV x64 i7-3770K 61/71/76 CPU Utilization: 71% (Threads=Auto) (HT off) (Windows 7) (4.0GHz)

LAV x64 i7-4790 61/68/72 CPU Utilisation: 90% (Threads=Auto) (HT off) (Windows 8.1) (3.8GHz)





2.Fitness-2160p@30fps-8Mbps

LAV x64 i5-3570K (Null) 109/132/142 Threads=12
LAV x64 i7-4790 (Null) 99/120/148 CPU Utilisation: 92% Threads=16
LAV x64 i7-4790 (Null) 97/113/136 CPU Utilisation: 85% Threads=Auto



3.Ducks-2160p@50fps-4Mbps

LAV x64 i5-3570K (Null) 128/140/144 Threads=12
LAV x64 i7-4790 (Null) 109/122/127 CPU Utilisation: 92% Threads=16
LAV x64 i7-4790 (Null) 110/120/122 CPU Utilisation: 91% Threads=Auto



Astra-UHD@50fps-18Mbps (10bit)


LAV x64 i5-3570K 71/86/108 CPU Utilization: 94% (Threads=12) (Windows 7) (4.4GHz)

LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto) (Windows 7) (4.4GHz)

LAV x64 i5-3570K 73/81/86 CPU Utilization: 69% (Threads=Auto) (Windows 8.1) (4.4GHz)

LAV x64 i7-4790 61/72/85 CPU Utilisation: 80% (Threads=16) (HT on) (Windows 8.1) (3.8GHz)

LAV x64 i7-4790 59/69/81 CPU Utilisation: 76% (Threads=Auto) (HT on) (Windows 8.1) (3.8GHz)

LAV x64 i5-3570K 55/69/95 CPU Utilization: 75% (Threads=Auto) (Windows 7) (4.4GHz)

LAV x64 i7-4790 51/64/77 CPU Utilisation: 93% (Threads=12) (HT off) (Windows 8.1) (3.8GHz)

LAV x64 i7-4790 47/63/77 CPU Utilisation: 92% (Threads=16) (HT off) (Windows 8.1) (3.8GHz)

LAV x64 i7-4790 47/55/76 CPU Utilisation: 79% (Threads=Auto) (HT off) (Windows 8.1) (3.8GHz)



UHD_ENT_Transformer_Quad@24fps-51Mbps (10bit)

LAV x64 i5-3570K(Null-P010) 45/59/102
LAV x64 i7-4790 (Null) 45/58/85 CPU Utilisation: 96% Threads=16
LAV x64 i7-4790 (Null-P010) 45/57/80 CPU Utilisation: 94% Threads=Auto


Can you explain at least the last result to me: how can a Core i7-4790 at 96% CPU utilization (all 8 threads decoding) have lower decoding performance than a Core i5-3570K with a 16% higher frequency?

I find it unbelievable.
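
For reference, the comparison behind that question can be written out as a small Python sketch. It uses only the Transformer averages and the clock speeds quoted in this post, and it assumes both CPUs actually held those clocks for the whole run, which nobody has verified here. The names `results` and `fps_per_ghz` are made up for illustration.

```python
# Average fps on the Transformer sample and the clocks quoted above.
# Assumption: both CPUs really ran at these clocks during the benchmark.
results = {
    "i5-3570K": {"avg_fps": 59, "ghz": 4.4},  # Threads=12
    "i7-4790": {"avg_fps": 58, "ghz": 3.8},   # Threads=16
}


def fps_per_ghz(entry):
    """Average decoded frames per second for each GHz of clock."""
    return entry["avg_fps"] / entry["ghz"]


clock_ratio = results["i5-3570K"]["ghz"] / results["i7-4790"]["ghz"]
fps_ratio = results["i5-3570K"]["avg_fps"] / results["i7-4790"]["avg_fps"]

for name, entry in results.items():
    print(f"{name}: {fps_per_ghz(entry):.1f} avg fps per GHz")
print(f"clock ratio {clock_ratio:.3f}, avg fps ratio {fps_ratio:.3f}")
```

Read this way, the two averages are almost identical while the clocks differ by about 16%, so the gap is in absolute fps only, not in fps per GHz.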

I used the DXVA decoding option of DXVA Checker v3.1.2 with the Null renderer, and the latest (today's) LAV Filters 0.62.52.

Can you try that tool with that LAV Filters version and report the performance together with the CPU utilization?

Thanks!

Last edited by NikosD; 26th September 2014 at 21:28.
Old 25th September 2014, 19:56   #34  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
All my previous results were with LAV Video set to Threads=12, since, as you also noticed, CPU utilization and the resulting performance are lower than expected on some HEVC samples with LAV set to Threads=Auto. I'd assume this is the discrepancy you are seeing.
Old 25th September 2014, 20:46   #35  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by cyberbeing View Post
All my previous results were with LAV Video set to Threads=12, since, as you also noticed, CPU utilization and the resulting performance are lower than expected on some HEVC samples with LAV set to Threads=Auto. I'd assume this is the discrepancy you are seeing.
OK.

I tried first putting Threads=12, but it was equal or a little slower than Auto for my CPU.

So I went all the way up to Threads=16.

I edited my previous post to show the results of Threads=16

There is an increase, but I still can't match your numbers.

I still find it extremely weird.

Do you have a secret?

Did you use GraphStudioNext to benchmark with the Null renderer?

I can't figure it out, unless it's the different LAV filters version.

I can't think of anything else.
Old 25th September 2014, 21:41   #36  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
I'm not doing anything special:
DXVAChecker 3.1.2 -> Decoder tab -> Drag/Drop Video -> Click Arrow -> Benchmark -> DXVA decoding

Possible explanations include threading differences between Win7 and Win8, HT vs no-HT, architecture jumps not always being superior in all metrics, and Intel CPUs sometimes unbottlenecking themselves when overclocked. Also, since you have a laptop, it's possible your CPU is power/thermal throttling and not using maximum TurboBoost when benchmarking. If you download RealTemp TI, it should show you the actual clock speed at any given time.


It looks like the LAV HEVC threading problem I was referring to only occurs on the following two samples:

3570K@4.4Ghz
LAV Filters git-1d591 (Betaking build)


1.Beauty-2160p@30fps-12.3Mbps

LAV x64 i5-3570K 97/107/111 CPU Utilization: 95% (Threads=12)
LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto)


Astra-UHD@50fps-18Mbps (10bit)

LAV x64 i5-3570K 71/86/108 CPU Utilization: 94% (Threads=12)
LAV x64 i5-3570K 55/69/95 CPU Utilization: 75% (Threads=Auto)

The other samples had good utilization (>92%) with Threads=Auto, and only ~1% lower performance vs Threads=12. Not so for the two samples above.
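
As a rough cross-check, the size of the drop on those two samples can be computed from the min/avg/max triples listed above. This is only a sketch: the dictionary layout and the helper name `avg_drop_percent` are made up for illustration, and it compares the average (middle) fps value only.

```python
# min/avg/max fps for the two affected samples, copied from the results above.
samples = {
    "Beauty": {"threads_12": (97, 107, 111), "auto": (76, 82, 86)},
    "Astra": {"threads_12": (71, 86, 108), "auto": (55, 69, 95)},
}


def avg_drop_percent(fixed, auto):
    """Percentage by which the Auto average fps trails the Threads=12 average."""
    return (fixed[1] - auto[1]) / fixed[1] * 100


drops = {
    name: avg_drop_percent(run["threads_12"], run["auto"])
    for name, run in samples.items()
}

for name, drop in drops.items():
    print(f"{name}: Threads=Auto average is {drop:.1f}% below Threads=12")
```

On these numbers the Auto average is roughly a fifth to a quarter lower, compared with the ~1% seen on the other samples.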

Last edited by cyberbeing; 25th September 2014 at 21:52.
Old 25th September 2014, 22:06   #37  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,903
@cyberbeing
He uses processing at 1280x720.

My i7-3770K at 4.0GHz

gets this with 62.52 and processing at 1280x720

Beauty-2160p@30fps

12 threads CPU 64 83 90 / 60 % cpu usage
Old 25th September 2014, 22:27   #38  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
Quote:
Originally Posted by huhn View Post
@cyberbeing
He uses processing at 1280x720.
Above NikosD was testing Null, which implies no scaling.

My results using 'DXVA Processing' and 1280x720 scaling with actual renderer output, explicitly state so. There is not much difference in performance either way.

Since you both have CPUs with Hyperthreading, try disabling HT and see if your results improve. And you huhn, since you have a 3770k, could try matching my 4.4Ghz overclock. If you are only seeing 60% CPU Utilization on your 3770K that is extremely strange...were you also using Win8 and not Win7 like I am? I'll try booting into Win8.1 later and see if there is any difference.

Also, the LAV Filters nightly builds I've been using for testing are those from Betaking: here. Which builds have you guys been using?

Edit (Win7 vs Win8.1 Threading):
Windows 8.1 does seem to have marginally lower CPU Utilization and benchmark performance compared to Windows 7, but not significant enough to explain what NikosD or huhn are seeing.

1.Beauty-2160p@30fps-12.3Mbps
LAV x64 i5-3570K 97/107/111 CPU Utilization: 95% (Threads=12) (Windows 7)
LAV x64 i5-3570K 89/103/109 CPU Utilization: 91% (Threads=12) (Windows 8.1)

LAV x64 i5-3570K 76/82/86 CPU Utilization: 72% (Threads=Auto) (Windows 7)
LAV x64 i5-3570K 73/81/86 CPU Utilization: 69% (Threads=Auto) (Windows 8.1)

Edit2 (Betaking 2014-09-25 vs K-Lite 10.76 LAV nightly build):
No difference in performance.

Last edited by cyberbeing; 26th September 2014 at 01:20.
Old 26th September 2014, 00:58   #39  |  Link
huhn
Registered User
 
Join Date: Oct 2012
Posts: 7,903
I used Windows 7.

LAV Filters is 0.62.0.52; I took it from KLCP.

4.4GHz results in a blue screen after reboot. My 3770K has pretty bad quality silicon, so I stay at 4.0.
My RAM was at 1333 for some reason; it is now at the normal 1600.
The system hadn't been rebooted for some time. It looks a lot better with 1600 and a rebooted system:

Beauty_3840x2160_120fps_420_8bit_HEVC_MP4

80 93 99 / 70% threads auto (should be 12 with 8 thread system)
85 101 105 / 79% threads 16

i'm pretty sure an i7 will look pretty good with 24-32 threads.

all used Null rendering

edit:
non HT

61 71 76 / 71% threads auto (should be 6)
77 92 97 / 94% threads 12
88 93 100 / 95% threads 16


It looks like using 3 times as many threads as the CPU has helps HEVC decoding.

12 threads on a 4-core CPU without HT and 12 threads on a 4-core CPU with HT have about the same speed, but the min FPS is a lot higher with HT.
At 16 threads it looks like HT gives a decent speed boost, whereas without HT there is no real difference except in min FPS.
I guess HT does a good job with a lot of threads too, but this will take "a lot" of RAM. I guess even with 32 threads this shouldn't be a real problem for people with 8-thread CPUs.
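
To put the thread-count observation in numbers, here is a small Python sketch over the Beauty results from this post. The Auto rows use the thread counts guessed above (12 with HT, 6 without); that is an assumption, not something LAV reports. The tuple layout and the helper name `threads_per_cpu` are made up for illustration.

```python
# (logical CPUs, decoder threads, min fps, avg fps, max fps) from this post.
runs = [
    (8, 12, 80, 93, 99),    # HT on, Threads=Auto (assumed to mean 12)
    (8, 16, 85, 101, 105),  # HT on, Threads=16
    (4, 6, 61, 71, 76),     # HT off, Threads=Auto (assumed to mean 6)
    (4, 12, 77, 92, 97),    # HT off, Threads=12
    (4, 16, 88, 93, 100),   # HT off, Threads=16
]


def threads_per_cpu(run):
    """Decoder threads divided by the number of logical CPUs."""
    logical_cpus, decoder_threads = run[0], run[1]
    return decoder_threads / logical_cpus


for run in sorted(runs, key=threads_per_cpu):
    print(f"{threads_per_cpu(run):.1f} threads per logical CPU "
          f"({run[0]} CPUs): avg {run[3]} fps, min {run[2]} fps")
```

With so few data points this only illustrates the trend; it does not show where the gains stop.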

Last edited by huhn; 26th September 2014 at 01:18. Reason: formatting
Old 26th September 2014, 08:02   #40  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Some clips are just not encoded in a way that would be beneficial to multi-threading, and while you can try throwing more and more threads at it to improve the CPU utilization, it's usually a bad solution.
For such clips it may be useful to combine both frame and slice multi-threading, so that big I frames can use slice threading (or WPP) to speed up their decoding, since everything hinges on them being ready as a reference for the others. I've seen preliminary patches for this, but it's never been finished.
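
For anyone who wants to see how a given clip responds to each threading mode on its own, a rough sketch with the plain ffmpeg command line might look like this. It is not how LAV Video is configured; it only builds ffmpeg commands using the generic `-threads` and `-thread_type` decoder options plus `-benchmark`, and the file name is a placeholder. Whether a particular build's HEVC decoder honours slice threading for a given clip is not guaranteed.

```python
import shlex


def ffmpeg_decode_benchmark(path, threads, thread_type):
    """Build an ffmpeg command that decodes `path` and discards the output."""
    if thread_type not in ("frame", "slice"):
        raise ValueError("thread_type must be 'frame' or 'slice'")
    return [
        "ffmpeg", "-hide_banner", "-benchmark",
        "-threads", str(threads),      # decoder thread count (input option)
        "-thread_type", thread_type,   # frame or slice multi-threading
        "-i", path,
        "-an", "-f", "null", "-",      # skip audio, write to the null muxer
    ]


# "sample_2160p.mkv" is a hypothetical file name.
for mode in ("frame", "slice"):
    print(shlex.join(ffmpeg_decode_benchmark("sample_2160p.mkv", 12, mode)))
```

Running the two printed commands and comparing the reported times gives a first idea of whether a clip is limited by frame-level or slice-level parallelism.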