Old 19th September 2011, 14:14   #61  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Sure, you can share them so the MSDK devs become aware of these issues and fix them in the driver. That is the goal of such a collaboration: improving by sharing problems so the whole ecosystem benefits, from ISVs and vendors to consumers in the end.
__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is to Late !

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 19th September 2011 at 14:17.
Old 19th September 2011, 14:23   #62  |  Link
BetaBoy
CoreCodec Founder
 
Join Date: Oct 2001
Location: San Francisco
Posts: 1,421
Quote:
Originally Posted by egur View Post
New and improved version. Zip files contain documentation, please read.
egur... thank you for this and your continued work on it.
__________________
Dan "BetaBoy" Marlin
Ubiquitous Multimedia Technologies and Developer Tools

http://corecodec.com
Old 19th September 2011, 14:28   #63  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Btw Egur, this is also something that interests me: http://software.intel.com/en-us/foru...86355&o=a&s=lr
The documentation says at least one display needs to be connected, so in theory it should work with a discrete card installed and connected, as long as another monitor is also connected to the IGPU (or maybe a dongle is enough to make Windows, the driver, and therefore the MSDK believe a monitor is connected). I could well imagine that you would suddenly get an answer back that no longer says "unsupported", though maybe Intel has now decided to remove it completely after the Lucid Logix partnership.

I'm especially interested in how to use the DSP encoder in such a scenario without needing 3rd-party software for framebuffer copying like Lucid Logix. If a dongle is enough, that would be perfect. I haven't tried it yet; I'm still testing the full capabilities of Intel's GT1 on its own, especially power consumption, but obviously I keep a backup of every SDK and driver to check whether something changed dramatically or was removed on purpose.

I'm a little sad that my mainboard manufacturer didn't decide to give this capability to its customers for free, especially early adopters. Intel did, so in the end they gave their users something for free in return for the chipset disaster. (If it turns out that Quicksync can be used on a multi-GPU system without 3rd-party software, then it would have been just a clever marketing step for both Intel and Lucid Logix.) I wish other vendors had gone the same way, but they made it a feature for higher-class SKUs.

So I'm really interested in the answer you're going to get.
Last edited by CruNcher; 19th September 2011 at 15:00.
Old 20th September 2011, 07:30   #64  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by CruNcher View Post
I'm a little sad that my mainboard manufacturer didn't decide to give this capability...
FYI, I had to update my BIOS to get Virtu working on my Intel DH67GD motherboard. New BIOS had other enhancements like a much smarter fan control and fast boot (1-2s POST).

I think that with Virtu you can actually use both GPUs, but you'll need 2 processes. Each process will use a different GPU (add one of them to Virtu's app list), and data needs to be copied through shared memory (a memory-mapped file). This is a somewhat complex setup and I don't have the resources to explore it. At least I've proven that copying the data from the Intel GPU isn't too bad. In the latest benchmark, copying all frames of a 243-frame 1920x816 clip took 110ms according to VTune Amplifier 2011.
I'll report in this thread if there's anything new on the matter.
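
A minimal sketch of the memory-mapped-file hand-off described above, assuming one process writes decoded frames and the other reads them. The header layout, the function names and the file name are hypothetical, and both sides run in a single process here just to keep the example runnable. No locking is shown; a real two-process setup would need an event or mutex around each frame.

```python
import mmap
import os
import struct
import tempfile
from typing import Tuple

# Hypothetical header layout: frame number, payload size (little-endian uint32).
HEADER = struct.Struct("<II")


def create_shared_file(path: str, capacity: int) -> None:
    """Pre-size the backing file so both sides can map the same region."""
    with open(path, "wb") as f:
        f.truncate(HEADER.size + capacity)


def write_frame(path: str, frame_no: int, payload: bytes) -> None:
    """Producer side: copy one frame into the mapping, then its header."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        if HEADER.size + len(payload) > len(m):
            raise ValueError("frame larger than the shared buffer")
        m[HEADER.size:HEADER.size + len(payload)] = payload
        m[:HEADER.size] = HEADER.pack(frame_no, len(payload))


def read_frame(path: str) -> Tuple[int, bytes]:
    """Consumer side: map the same file and copy the frame back out."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        frame_no, size = HEADER.unpack(m[:HEADER.size])
        return frame_no, bytes(m[HEADER.size:HEADER.size + size])


def demo() -> Tuple[int, bytes]:
    """Round-trip one small dummy frame through a temporary mapped file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "frames.bin")
        create_shared_file(path, capacity=4096)
        write_frame(path, 7, b"\x10" * 512)
        return read_frame(path)
```

For the 1920x816 clip mentioned above, a buffer sized for one NV12 frame (an assumption about the pixel format) would be 1920 * 816 * 1.5 = 2,350,080 bytes.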
__________________
Eric Gur,
Processor Application Engineer for Overclocking and CPU technologies
Intel QuickSync Decoder author
Intel Corp.
Old 20th September 2011, 07:31   #65  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by BetaBoy View Post
egur... thank you for this and your continued work on it.
10x, I appreciate it.
Old 20th September 2011, 13:14   #66  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Egur, I expected that marketing answer and I'm not happy with it at all.
Old 20th September 2011, 13:26   #67  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by CruNcher View Post
Egur, I expected that marketing answer and I'm not happy with it at all.
What marketing answer?
Old 20th September 2011, 15:41   #68  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Quote:
Originally Posted by egur View Post
What marketing answer?
Quote:
Eric,

This feature is only supported for systems with switchable graphics. More details can be found here:

http://www.intel.com/support/graphics/sb/CS-031103.htm

Details about how to set this up can be very system specific. We're hoping to add some clarifications to the documentation in the future.

Regards,

Jeff
That one
Old 22nd September 2011, 09:52   #69  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
performance for 0.13+

I've run a few VTune sessions to optimize my code. The new version (0.14) will be slightly faster than 0.13.
Test platform:
* Windows 7, 64 bit
* Core i7 2840 @2.4GHz (45W)
* MPC-HC (current version)
* A 10s clip. H264/AVC1, 1920x816, 243 frames

VTune showed that the latest sse4_memcpy took 112ms for the entire clip. That's less than 0.5ms per frame (almost 1080p).
CPU usage was in the low single digits, ~5%.
My DLL's code contributed 1/50 of that 5%.
More importantly, the CPU frequency dropped to 800MHz, the lowest frequency SNB-mobile will go to, and stayed there for the entire clip. This is about 1/3 of the stock frequency and ~1/4 of max turbo.
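
The per-frame figure above can be checked with a few lines of arithmetic. This is only a sketch: the function name is made up, and the 1.5 bytes per pixel assumes NV12 output, which the post does not state.

```python
def copy_stats(total_copy_ms, frames, width, height, bytes_per_pixel=1.5):
    """Return (ms per frame, bytes per frame, GB/s) for a frame-copy benchmark.

    bytes_per_pixel=1.5 assumes NV12 (8-bit 4:2:0); that is an assumption.
    """
    per_frame_ms = total_copy_ms / frames
    frame_bytes = int(width * height * bytes_per_pixel)
    gb_per_s = frame_bytes * frames / (total_copy_ms / 1000.0) / 1e9
    return per_frame_ms, frame_bytes, gb_per_s


# Numbers from the post: 112 ms for 243 frames of 1920x816.
per_frame_ms, frame_bytes, gb_per_s = copy_stats(112.0, 243, 1920, 816)
print(f"{per_frame_ms:.2f} ms/frame, {frame_bytes} bytes/frame, {gb_per_s:.1f} GB/s")

# 800 MHz relative to the 2.4 GHz stock clock listed in the test platform.
stock_ratio = 0.8 / 2.4
```

That works out to about 0.46 ms per frame and, under the NV12 assumption, roughly 5 GB/s of copy throughput.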
Old 22nd September 2011, 12:42   #70  |  Link
Eliminateur
Registered User
 
Join Date: Jan 2010
Posts: 75
I'm really looking forward to seeing your decoder implemented in MPC-HC (if it's possible at all), since right now DXVA decoding is broken for SNB on MPC-HC.
Old 22nd September 2011, 12:48   #71  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by Eliminateur View Post
I'm really looking forward to seeing your decoder implemented in MPC-HC (if it's possible at all), since right now DXVA decoding is broken for SNB on MPC-HC.
It works with MPC-HC (32/64 bit).
MPC-HC is my only test platform for 64 bit BTW.
Using EVR in MPC-HC is very solid, except for several VC1 clips which are under investigation and which only libwmv9 can play properly.
It's still a work in progress, but things are quite stable and I'd appreciate more testers.
In MPC-HC, just uncheck the internal filters for MPEG2/H264/VC1 in the "Options -> Internal Filters" dialog. Add ffdshow to the external filter list, configure it to use IntelQuickSync, and you're set to go.
The latest version is always available on the 1st page.

Comments are welcome.

Next release will come as an FFDshow installer like the standard builds.
Old 22nd September 2011, 13:00   #72  |  Link
Eliminateur
Registered User
 
Join Date: Jan 2010
Posts: 75
What I meant was working as in "integrated" into the internal filters, not as part of a separate ffdshow installation.
When I get a new Pentium Gxxx machine built here in the shop, I'll test whether it works with that series.
Old 22nd September 2011, 13:04   #73  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by Eliminateur View Post
What I meant was working as in "integrated" into the internal filters, not as part of a separate ffdshow installation.
It's on my TODO list.
Old 22nd September 2011, 13:23   #74  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Yep, it got amazingly fast now, and CPU overhead is now in the range of LAV CUVID. For Yoon Yoon it was a dramatic improvement, from heavy utilization at the beginning to 18% and now only 7-8%, which is pretty good (for non-DXVA).
It would now even make sense to try it in a Quicksync-based framework and see how it does there. Including the encoder inside ffdshow also looks like a good idea.
Yeah, the decoding issue with the MC.ts bitstream is still a problem. It's funny that Nvidia also had problems with this in the early days of their API. I wonder why this bitstream type was overlooked by Nvidia then and by Intel now.
I might also have found another H.264 issue, but I have to isolate it first. It happened in a pretty normal playback scenario.

I also tested Intel's PP system, but it's fairly weak: denoise and sharpening are pretty basic implementations currently.

Also, deinterlacing and IVTC only work efficiently on EVR. With EVR-CP, MBAFF deinterlacing and IVTC are currently failing, though that's not a big deal, as shader-based PP is usable on both and with Aero on, tearing is history anyway. (On my weak 6-EU GT1 it still works pretty reliably, and I still have clock headroom for higher-res input.)

PS: I see that a dummy works as expected from the documentation. Very nice, no need for the Lucid solution, though you have no reference for which P-states the GT1 is using. I wonder how much power it draws in this headless mode. I guess as much as with a real display, maybe a little less, depending on how the DSP is woven together with the rest of the GPU and CPU and on the efficiency of the power management Intel implemented.

Now I'm slowly getting to the point where I can do a complete framework test between Nvidia and Intel.
Last edited by CruNcher; 22nd September 2011 at 14:08.
Old 22nd September 2011, 14:34   #75  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Nice to see the performance improvements, that'll surely make it much more usable in the future.
Luckily SSE4.1 is available since Penryn, so any recent Intel iGPU will be able to use it.

Looking forward to working on integrating it in LAV Video when i'm done integrating CUVID properly (and maybe wmv9, depending on what i decide to do first).

PS:
Regarding "integrating into MPC-HC": the MPC-HC integrated decoders are outdated overall. The only useful thing they offer is the DXVA decoder, which works better than ffdshow's (which is based on the same code, but was never truly maintained).
I've always aimed to replace those decoders with an equally simple and easy-to-use, yet modern, decoder, which is exactly what my LAV Audio & Video decoders provide.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 22nd September 2011 at 15:15.
Old 22nd September 2011, 15:48   #76  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by CruNcher View Post
Yep, it got amazingly fast now, and CPU overhead is now in the range of LAV CUVID. For Yoon Yoon it was a dramatic improvement, from heavy utilization at the beginning to 18% and now only 7-8%, which is pretty good (for non-DXVA).
We need a method for measuring CPU usage. ffdshow-quicksync CPU usage (your example) went down from 18%@3.1GHz to 7-8%@0.8GHz.
Maybe a normalized formula is needed:

Code:
NormalizedCpuUsage = CpuUsage * NumPhysicalCores * Freq
Regarding power: SNB reduces the voltage at 800MHz to about 0.7V-0.75V. In turbo it's ~1.2V. That's a major power drop.
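
A small sketch of the normalized formula applied to the two measurements above. The core count of 4 is an assumption, 7.5% stands in for "7-8%", and the function names are made up. The power helper uses the common CMOS approximation that dynamic power scales with V^2 * f and ignores leakage, so it is only a rough indication; the 3.2GHz turbo clock is also an assumption (about 4x the 800MHz floor).

```python
def normalized_cpu_usage(cpu_usage, physical_cores, freq_ghz):
    """NormalizedCpuUsage = CpuUsage * NumPhysicalCores * Freq (core-GHz)."""
    return cpu_usage * physical_cores * freq_ghz


def relative_dynamic_power(voltage, freq_ghz):
    """Dynamic power up to a constant factor, using P ~ V^2 * f (no leakage)."""
    return voltage ** 2 * freq_ghz


CORES = 4  # assumption: quad-core part

before = normalized_cpu_usage(0.18, CORES, 3.1)   # 18% at 3.1 GHz
after = normalized_cpu_usage(0.075, CORES, 0.8)   # ~7.5% at 0.8 GHz
improvement = before / after

# 0.75 V at 800 MHz versus ~1.2 V at an assumed 3.2 GHz turbo clock.
power_ratio = relative_dynamic_power(0.75, 0.8) / relative_dynamic_power(1.2, 3.2)

print(f"normalized load: {before:.3f} -> {after:.3f} core-GHz ({improvement:.1f}x)")
print(f"relative dynamic power at 800 MHz vs turbo: {power_ratio:.2f}")
```

With these inputs the normalized load drops by about 9x, much more than the raw 18% to 7-8% suggests, and the dynamic power at 800MHz comes out at roughly a tenth of the turbo figure.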

Quote:
Originally Posted by CruNcher View Post
It would now even make sense to try it in a Quicksync-based framework and see how it does there. Including the encoder inside ffdshow also looks like a good idea.
I think it's a good idea too, but not feasible in the short term. BTW, there are encoders in the MSDK, but I know nothing about them.

Quote:
Originally Posted by CruNcher View Post
Yeah, the decoding issue with the MC.ts bitstream is still a problem. It's funny that Nvidia also had problems with this in the early days of their API. I wonder why this bitstream type was overlooked by Nvidia then and by Intel now.
I gave the MC.ts clip to the MSDK team to check out.
There’s something wrong with it. My AMD Radeon 6950 DXVA crashes on it, libavcodec doesn’t work. Only WMV9 works well.

Quote:
Originally Posted by CruNcher View Post
I might also have found another H.264 issue, but I have to isolate it first. It happened in a pretty normal playback scenario.
I'm aware of the following bug (not root-caused yet):
Open an MKV/AVC1 clip in MPC-HC with EVR-CP as the renderer --> crash.
But... if you open another file first and then open the crashing clip, it will not crash!
It also doesn't crash with normal EVR. Very strange and very repeatable. The crash is within ffdshow.ax but before my constructor is called. In ZoomPlayer it never happened (no EVR-CP).

Quote:
Originally Posted by CruNcher View Post
I also tested Intel's PP system, but it's fairly weak: denoise and sharpening are pretty basic implementations currently.
I get more detail using EVR on the IGP than on my AMD card. I guess it's a matter of taste.
Please post images for comparison. Also, the IGP scaling is much better (I designed it).

Quote:
Originally Posted by CruNcher View Post
Also Deinterlacing and IVTC work only Efficient on EVR with EVR-CP MBAFF Deinterlacing and IVTC are failing currently,…
Please explain. I don’t fully understand.

Quote:
Originally Posted by CruNcher View Post
PS: I see that a dummy works as expected from the documentation. Very nice, no need for the Lucid solution, though you have no reference for which P-states the GT1 is using. I wonder how much power it draws in this headless mode. I guess as much as with a real display, maybe a little less, depending on how the DSP is woven together with the rest of the GPU and CPU and on the efficiency of the power management Intel implemented.
P-states should be high (my guess): a lot of memory traffic, but the EUs should be idle. They don't do much.

Quote:
Originally Posted by CruNcher View Post
Now I'm slowly getting to the point where I can do a complete framework test between Nvidia and Intel.
Excellent – this would be good for everyone. Someone needs to replace HQV with something more professional.

Quote:
Originally Posted by nevcairiel View Post
Nice to see the performance improvements, that'll surely make it much more usable in the future.
Luckily SSE4.1 is available since Penryn, so any recent Intel iGPU will be able to use it.
It works on Penryn (I have a Penryn laptop, a T400 ThinkPad), but poorly. The HW isn't the same as SNB...

Quote:
Originally Posted by nevcairiel View Post
Looking forward to working on integrating it in LAV Video when i'm done integrating CUVID properly (and maybe wmv9, depending on what i decide to do first).
Excellent! I’m gathering requirements now. Please send them to me.

Quote:
Originally Posted by nevcairiel View Post
PS:
Regarding "integrating into MPC-HC": the MPC-HC integrated decoders are outdated overall. The only useful thing they offer is the DXVA decoder, which works better than ffdshow's (which is based on the same code, but was never truly maintained).
I've always aimed to replace those decoders with an equally simple and easy-to-use, yet modern, decoder, which is exactly what my LAV Audio & Video decoders provide.
I don't have the bandwidth to create a standalone DirectShow decoder. The MPC-HC devs will have to do it themselves, I guess. What I can do is create a standalone decoder with a C++ interface that's not dependent on anything. This is, BTW, very close to where I am now. I'm missing the interface requirements to make integration a smooth process (<1 week).
Old 22nd September 2011, 16:18   #77  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Quote:
Originally Posted by egur View Post
I gave the MC.ts clip to the MSDK team to check out.
There’s something wrong with it. My AMD Radeon 6950 DXVA crashes on it, libavcodec doesn’t work. Only WMV9 works well.
It's interlaced VC-1, so of course libavcodec won't work. Interlaced VC-1 is not supported at all, sadly.
As far as i am aware, CUVID decoders on NVIDIA work fine with it, though. Sadly i don't have a copy of that file to check it out, and it appears no-one ever linked it publicly in this thread, or i was too blind to find it.

Quote:
Originally Posted by egur View Post
Excellent! I’m gathering requirements now. Please send them to me.
I'll get back to you on that. I don't really have a set of requirements defined, as most of the time as a developer of these components i just have to adapt to the APIs i have, be it CUDA/CUVID, the WMV9 decoder, DXVA2 or the Intel MSDK.

All i really need is some API at which i can throw compressed frames, and it somehow gives me back the decoded frames, including all necessary metadata.
Then again, there is timestamp handling, which will never work out of the box, so defining requirements for that is non-trivial. H264 and MPEG2 are easy, VC-1 is hard.

I'll think some about that.
If anything, your code will be a great template to build upon.
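
To make the "throw compressed frames in, get decoded frames plus metadata back" requirement a bit more concrete, here is a rough interface sketch. It is written in Python only to keep it short; none of the names come from ffdshow, LAV or the MSDK, and the stub does no real decoding. It only shows the shape of the calls and one way timestamps could be put back into presentation order.

```python
import heapq
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DecodedFrame:
    """One output picture plus the metadata a renderer would need."""
    pts: int                 # presentation timestamp, in caller-defined units
    data: bytes              # pixel data (left opaque here)
    width: int
    height: int
    interlaced: bool = False
    top_field_first: bool = True


class FrameDecoder(ABC):
    """Hypothetical minimal decoder interface."""

    @abstractmethod
    def decode(self, packet: bytes, pts: int) -> List[DecodedFrame]:
        """Feed one compressed frame; return zero or more finished frames."""

    @abstractmethod
    def flush(self) -> List[DecodedFrame]:
        """Drain whatever is still buffered at end of stream."""


class ReorderingStubDecoder(FrameDecoder):
    """Stand-in that 'decodes' nothing and only reorders frames by timestamp."""

    def __init__(self, width: int, height: int, reorder_depth: int = 2):
        self._width, self._height, self._depth = width, height, reorder_depth
        self._pending: List[Tuple[int, int, DecodedFrame]] = []
        self._seq = 0  # tie-breaker so frames themselves are never compared

    def decode(self, packet: bytes, pts: int) -> List[DecodedFrame]:
        frame = DecodedFrame(pts, packet, self._width, self._height)
        heapq.heappush(self._pending, (pts, self._seq, frame))
        self._seq += 1
        out = []
        while len(self._pending) > self._depth:
            out.append(heapq.heappop(self._pending)[2])
        return out

    def flush(self) -> List[DecodedFrame]:
        return [heapq.heappop(self._pending)[2] for _ in range(len(self._pending))]


def decode_all(decoder: FrameDecoder, packets: List[Tuple[bytes, int]]) -> List[int]:
    """Run packets through a decoder and return the output timestamps."""
    frames: List[DecodedFrame] = []
    for data, pts in packets:
        frames.extend(decoder.decode(data, pts))
    frames.extend(decoder.flush())
    return [f.pts for f in frames]
```

A fixed reorder depth is the simplest possible policy and is only meant as a placeholder; as noted above, real timestamp handling, especially for VC-1, is considerably harder.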
Last edited by nevcairiel; 22nd September 2011 at 16:26.
Old 22nd September 2011, 16:29   #78  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Quote:
I get more detail using EVR on the IGP than on my AMD card. I guess it's a matter of taste.
Please post images for comparison. Also, the IGP scaling is much better (I designed it).
Nice. Yes, I'm going to do some, including the PP stages and the differences from what is basically available to consumers, plus a pretty basic comparison of things like SimHD (which IMHO is entirely overrated, at least in its current incarnation) to basic shader PP.

This H.264 problem is not directly related to ffdshow-quicksync but to another popular 3rd-party component that uses the decoder via DXVA, though I'm still checking this.
Last edited by CruNcher; 22nd September 2011 at 16:35.
Old 22nd September 2011, 16:45   #79  |  Link
egur
QuickSync Decoder author
 
Join Date: Apr 2011
Location: Atlit, Israel
Posts: 916
Quote:
Originally Posted by nevcairiel View Post
Sadly i don't have a copy of that file to check it out, and it appears no-one ever linked it publicly in this thread, ...
For MC.ts & CD.ts see CruNcher's post with links:
http://forum.doom9.org/showthread.ph...99#post1526099

CD.ts plays fine and it's interlaced VC1. CruNcher said in that post that MC.ts is field interlaced and CD.ts is frame interlaced. Something is completely screwed up with the MC clip; I don't know what yet. It was sent to the MSDK guys for a solution.

Update
wmv9 (from ffdshow 3978) - reports clip as progressive (wrong). No block artifacts. No deinterlacing.
Intel decoder - clip is interlaced (TFF). EVR deinterlaces OK. Strong block artifacts in decoder at the macro block level. No idea why.
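
For readers unfamiliar with the terms, a toy sketch of how a frame relates to its two fields and what TFF means. Rows are plain Python lists and all function names are made up; this illustrates the picture structure only and says nothing about what actually goes wrong in MC.ts. As I understand the VC-1 modes, a field-interlaced stream codes each field as its own picture, while a frame-interlaced stream codes both fields together in one frame.

```python
from typing import List, Sequence, Tuple

Row = List[int]


def split_fields(frame: Sequence[Row]) -> Tuple[List[Row], List[Row]]:
    """Split a frame into top field (even rows) and bottom field (odd rows)."""
    top = [list(row) for row in frame[0::2]]
    bottom = [list(row) for row in frame[1::2]]
    return top, bottom


def weave(top: Sequence[Row], bottom: Sequence[Row]) -> List[Row]:
    """Interleave two fields back into one frame (inverse of split_fields)."""
    if not 0 <= len(top) - len(bottom) <= 1:
        raise ValueError("fields do not belong to the same frame")
    frame: List[Row] = []
    for i, row in enumerate(top):
        frame.append(list(row))
        if i < len(bottom):
            frame.append(list(bottom[i]))
    return frame


def display_order(top, bottom, top_field_first: bool = True):
    """TFF means the top field is shown first in time; BFF is the reverse."""
    return [top, bottom] if top_field_first else [bottom, top]


frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
top_field, bottom_field = split_fields(frame)
```

A deinterlacer needs the field order flag because the two fields were captured at different moments; weaving alone only restores the spatial layout.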
Last edited by egur; 22nd September 2011 at 17:51.
Old 22nd September 2011, 16:48   #80  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Field interlacing is rather rare for VC-1, however i've run across another clip that uses this just a short while ago - but i've never seen it before that.

Edit:
I can confirm that MC.ts plays fine with my CUVID decoder.
Last edited by nevcairiel; 22nd September 2011 at 16:52.