Old 25th May 2010, 15:50   #1  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
x264 OpenCL

I was thinking of adding OpenCL support to x264. I've googled a bit and it seems that people would be interested, except that not many are willing to start on such a big project. I don't know GCC compiler directives well enough to start converting it to OpenCL.
I could possibly work on the C code; however, it's a bit confusing to go through. If someone were to add an OpenCL folder under the common folder, extend the configure script to allow compiling with OpenCL support, and put the C code of the important files there (such as mc.c and encode.c; basically the C versions of all the assembly files), I could implement OpenCL from there.
Old 25th May 2010, 16:44   #2  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,922
What do you mean by "OpenCL support"?
Old 25th May 2010, 17:04   #3  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
Quote:
Originally Posted by neuron2 View Post
What do you mean by "OpenCL support"?
OpenCL hardware acceleration, essentially GPU acceleration.
Old 25th May 2010, 18:12   #4  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
Basically, OpenCL would replace the assembly part. OpenCL allows massively parallel operations to run on the GPU, the CPU, or any other supported hardware, and the code would be optimized accordingly.
Old 25th May 2010, 18:44   #5  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,196
Quote:
Originally Posted by swg View Post
Basically, OpenCL would replace the assembly part. OpenCL allows massively parallel operations to run on the GPU, the CPU, or any other supported hardware, and the code would be optimized accordingly.
It has been explained a dozen times why the naive idea that you only need to throw your existing code on the GPU to get a massive speed-up is more than wrong.

Summary: Writing code that actually runs fast on the GPU isn't trivial at all. Inventing new algorithms that are suitable for the GPU is even harder. Switching from the CPU to the GPU often requires finding completely new solutions for your problems, which needs a whole lot of work! And in some cases the problem is inherently sequential and thus will never run (efficiently) on the GPU. Last but not least, moving only small parts of a piece of software to the GPU isn't reasonable (speed-wise), because transferring data between the host memory and the GPU memory has a huge delay...

(The fact that all the "GPU encoders" available on the market only reach fast encoding speeds by sacrificing quality shows that GPUs aren't that great for H.264 encoding.)
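
A back-of-the-envelope way to see the point about moving only small parts to the GPU: the sketch below assumes a simple serial model in which the CPU part, the GPU part and the transfer run one after another with no overlap. The function name and all numbers are made up for illustration; they are not measurements of x264 or of any GPU.

```python
def offload_speedup(cpu_time, gpu_speedup, offloaded_fraction, transfer_time):
    """Overall speedup when part of the per-frame work moves to the GPU.

    cpu_time           -- CPU-only time per frame, in seconds
    gpu_speedup        -- how many times faster the GPU runs the offloaded part
    offloaded_fraction -- share of cpu_time that is moved to the GPU (0..1)
    transfer_time      -- host<->GPU copy and launch overhead per frame, in seconds
    """
    remaining_cpu = cpu_time * (1.0 - offloaded_fraction)
    gpu_part = cpu_time * offloaded_fraction / gpu_speedup
    return cpu_time / (remaining_cpu + gpu_part + transfer_time)


# Hypothetical 10 ms frame, GPU 10x faster on the offloaded part, 2 ms overhead.
small = offload_speedup(0.010, 10.0, 0.1, 0.002)  # below 1.0: slower than CPU only
large = offload_speedup(0.010, 10.0, 0.9, 0.002)  # about 2.56
```

With these assumed numbers, offloading 10% of the work makes the whole thing slower, while offloading 90% gives a gain that is still far below the 10x the GPU delivers on its own part.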

__________________
There was of course no way of knowing whether you were being watched at any given moment.
How often, or on what system, the Thought Police plugged in on any individual wire was guesswork.



Last edited by LoRd_MuldeR; 25th May 2010 at 20:14.
Old 25th May 2010, 19:35   #6  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
Quote:
Originally Posted by LoRd_MuldeR View Post
It has been explained a dozen times why the naive idea that you only need to throw your existing code on the GPU to get a massive speed-up is more than wrong.

Summary: Writing code that actually runs fast on the GPU isn't trivial at all. Inventing new algorithms that are suitable for the GPU is even harder. Switching from the CPU to the GPU often requires finding completely new solutions for your problems, which needs a whole lot of work! And in some cases the problem is inherently sequential and thus will never run (efficiently) on the GPU. Last but not least, moving only small parts of a piece of software to the GPU isn't reasonable (speed-wise), because transferring data between the host memory and the GPU memory has a huge delay...

(The fact that all the "GPU encoders" available on the market only reach fast encoding speeds by sacrificing quality shows that GPUs aren't that great for H.264 encoding.)

Yes, I'm aware of that; I was just offering my help in trying to achieve it. If you're not interested, that's fine, I'll just do it as a side project. Just don't bash. I just read these comments on http://x264dev.multimedia.cx/?p=332:
# Ashish Says:
March 25th, 2010 at 3:42 am

What about Motion Estimation using CUDA ? I see much activity 6 months ago. But, not much since then, is the project already complete ?

I have had past CUDA experience and would like to work on this, but I have minimal Video experience. I got some head start from Anton of Nvidia, and I am now exploring wiki pages regarding video compression. I hope I get selected.
# onitake Says:
March 26th, 2010 at 8:26 am

Ashish: gpu offloading would be much appreciated.
could you consider working with opencl instead of cuda though? it’s available in nvidia and amd drivers on both windows and linux now, and also on osx. cuda is constrained to nvidia hardware

Last edited by swg; 25th May 2010 at 19:36. Reason: Added URL
Old 25th May 2010, 19:43   #7  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,196
Quote:
Originally Posted by swg View Post
Yes I'm aware of that, I was just offering my help in trying to achieve it.
What you wrote just didn't sound like that

Quote:
Originally Posted by swg View Post
If you're not interested, that's fine, I'll just do it as a side project.
I am interested. But be warned that it won't be easy. Various companies have tried to implement a GPU encoder for H.264, and so far their results have all been more than disappointing.

But if you really think you can do it better, then you should talk to the developers at irc://irc.freenode.net/x264dev

Quote:
Originally Posted by swg View Post
Just don't bash.
I only summarized the facts. If you think that is bashing, then you have a rather bizarre definition of bashing.

Don't ask, if you don't want to hear the answer...

Quote:
Originally Posted by swg View Post
What about Motion Estimation using CUDA ?
There already is a GSoC 2010 task scheduled for GPU Motion Estimation:
http://wiki.videolan.org/SoC_x264_20...n_Estimation_2

Not sure whether that task has already been assigned to somebody. If not, you may want to volunteer...
Last edited by LoRd_MuldeR; 25th May 2010 at 20:00.
Old 25th May 2010, 19:56   #8  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
Quote:
Originally Posted by LoRd_MuldeR View Post



I only summarized the facts. If you think that is bashing, then you have a rather bizarre definition of bashing.

Don't ask, if you don't want to hear the answer...

There already is a GSoC project for that...
Sorry, I'm just currently being bashed on that channel for having suggested it. That statement was aimed at the people bashing the idea there. Basically, I believe it would help for high-definition pictures, while for low definition there would be little to no performance gain because of the latency from main memory to GPU memory.
Old 25th May 2010, 19:59   #9  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,922
Why wouldn't latency affect HD also?

Do you have any data to support your beliefs?
Old 25th May 2010, 20:44   #10  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,234
Does motion estimation only work from one frame to the next, or can it be effective over several frames? Say you had a graphic at the bottom of the screen, and in the next frame it moves up. That, I know, is already done with ME. If that graphic gets hidden and exposed again, I know that gets covered as well, as long as it's within a certain number of frames. What I am referring to is the case where the graphic gets covered, and when it is uncovered it is now, say, to the left and up a bit. Wouldn't that require the motion estimation (say in UMH mode) to be calculated from the first frame to the second, the first to the third, the first to the fourth and so on? Since the changes for each frame are then known, the only ME that would need to be applied between the second frame and third frame, the second frame and fourth frame, etc. would be the differences between the first and second frame, since the other information is already known. Over time, for that group of frames, you have a large known set of motion estimations, making each successive frame quicker and easier to process.

That's probably already done? And if not, is it a stupid idea or actually possible? I'm guessing it would be very CPU intensive if not already done, and it's a possible scenario where GPU assistance could be handy? Good for animation especially?
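
To make the question concrete, here is a minimal sketch of block matching against several earlier frames. It uses an exhaustive search with sum of absolute differences (SAD) as the cost, which is a simplification and not the UMH search mentioned above; the function names and the list-of-lists frame layout are assumptions for illustration, not x264's actual code. H.264 itself does allow a block to reference any of several previously coded frames.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustively try every vector in the window; return (cost, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def search_references(cur, refs, bx, by, size, search_range):
    """Search each earlier frame and keep the cheapest match: (cost, ref_index, dx, dy)."""
    best = None
    for idx, ref in enumerate(refs):
        cost, dx, dy = full_search(cur, ref, bx, by, size, search_range)
        if best is None or cost < best[0]:
            best = (cost, idx, dx, dy)
    return best
```

If the graphic is hidden in the most recent frame but visible in an older one, `search_references` picks the older frame, with a vector pointing at wherever the graphic was there. Every block and every candidate vector is independent of the others, which is the kind of work that maps well to a GPU.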

Last edited by burfadel; 25th May 2010 at 20:46.
Old 25th May 2010, 20:49   #11  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
1. Latency != bandwidth. The latency I'm concerned about is not only the hardware latency but also the task scheduler latency in starting the job. Even for a fully uncompressed HD image at 1920 x 1080 (1 byte per band per pixel, 3 bands per pixel, about 6.2 MB), the transfer from main memory to GPU memory takes less than 1/1000 of a second on a PCIe x16 v2.x bus at 8 GB/s. The fixed latency is the same for HD and low definition, so the extra transfer overhead for HD is minimal, while the payoff from offloading would be greater.
2. I don't have any data to back up my beliefs.
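
The arithmetic in point 1 can be checked with a few lines. This is only a sketch: it takes the 8 GB/s figure as 8e9 bytes per second, treats the fixed start-up latency as a separate parameter, and uses 720 x 480 as a stand-in for "low def", which is an assumption since no low-definition resolution was given.

```python
def frame_transfer_ms(width, height, bytes_per_pixel, bandwidth_bytes_per_s,
                      fixed_latency_s=0.0):
    """Time to copy one uncompressed frame to the GPU, in milliseconds."""
    frame_bytes = width * height * bytes_per_pixel
    return (fixed_latency_s + frame_bytes / bandwidth_bytes_per_s) * 1000.0


hd = frame_transfer_ms(1920, 1080, 3, 8e9)  # about 0.78 ms for 6,220,800 bytes
sd = frame_transfer_ms(720, 480, 3, 8e9)    # about 0.13 ms for 1,036,800 bytes
```

Whatever fixed latency is added through `fixed_latency_s` raises both results by the same amount, so it weighs proportionally more on the smaller frame.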

From the type of questions you two are asking me and the responses I'm getting on IRC, I get the point that the developers are not too interested in this. I am interested in the Motion Estimation task at http://wiki.videolan.org/SoC_x264_2010#Projects; however, I am going to take more time to fully explore the x264 code before committing myself to it. I've only been studying it for 2 weeks.
Old 25th May 2010, 21:20   #12  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,688
Reports from a tester in #x264dev are that the device latency is more like 1 microsecond.
Old 25th May 2010, 22:07   #13  |  Link
swg
Registered User
 
Join Date: Apr 2010
Posts: 7
Interesting, I would have thought it to be higher. Oh well, I'll look at the motion estimation problem as well as continue to look at the rest of the code.
swg is offline   Reply With Quote
Old 26th May 2010, 17:17   #14  |  Link
MfA
Registered User
 
Join Date: Mar 2002
Posts: 1,075
Whatever happened to me-prepass? A true-motion motion estimation pre-pass, with the main code only taking a pick between true motion and predicted MV and then doing local RD optimization, could be easily decoupled from the main codec.
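
A rough sketch of what "pick between two candidates, then refine locally" could look like. It uses plain SAD as a stand-in for the real rate-distortion cost, and the function names, the candidate format and the one-pixel refinement pattern are all assumptions for illustration, not the me-prepass code or anything in x264.

```python
def block_cost(cur, ref, bx, by, mv, size):
    """SAD of the block at (bx, by) against ref shifted by mv; inf if out of frame."""
    dx, dy = mv
    height, width = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + size > width or by + dy + size > height:
        return float("inf")
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def pick_and_refine(cur, ref, bx, by, size, prepass_mv, predicted_mv):
    """Choose the cheaper of two candidate vectors, then refine it one pixel at a time."""
    def cost(mv):
        return block_cost(cur, ref, bx, by, mv, size)

    best = min((prepass_mv, predicted_mv), key=cost)
    improved = True
    while improved:
        improved = False
        for ddx, ddy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (best[0] + ddx, best[1] + ddy)
            if cost(cand) < cost(best):
                best, improved = cand, True
    return best, cost(best)
```

In this arrangement the pre-pass that produces `prepass_mv` can run anywhere, including on a GPU, and the main encoder only needs its output vectors.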
Tags
gpu acceleration, opencl, video encoding, x264
