Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
4th August 2010, 22:20 | #42 | Link | |
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Quote:
Dark Shikari had already stated that we need to build the whole encoder from scratch. So I think it'd be best if we wait for H.265 and build x265 from the ground up to harness the GPU. Corrections are welcome, as always.
__________________
AviDemux Windows Builds |
|
4th August 2010, 22:26 | #43 | Link | |
Registered User
Join Date: Dec 2008
Posts: 589
|
Quote:
SSDs are also getting more common and cheaper: a 40-80 GB SSD now costs $100-150 and can easily sustain 100-150 MB/s writes. Dumping 512 MB of data to RAM and then to a drive in less than 10 seconds should be doable (in theory it's doable even on a regular hard drive; mine manage 60-70 MB/s easily, though probably not while you read from the same drive at the same time). And, of course, this is without RAID. But remember, I was talking about uploading 512 MB of data to the video card and then dumping the results of the processing. That doesn't necessarily mean 512 MB of results; it could easily be just 40-50 MB. Of course, if processing the data takes less time than uploading it to and downloading it from the card, it's not worth it.
neuron2: I never claimed to be an expert. I'm not; I'm barely able to code websites and do occasional conversions. I'm just writing down my thoughts so others can explain why it won't work or wouldn't be feasible, and I'll learn something from it, won't I? After all, this is a forum, and that's the definition of a forum: a place where people can discuss things. |
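As a rough check of the figures above, here is a minimal sketch of the arithmetic. It assumes the sustained write rates quoted in the post hold for the whole transfer and ignores seeks and concurrent reads; the function name is only illustrative.

```python
# Back-of-the-envelope write times for the figures quoted in the post.
# Assumption: sizes are in MB and rates in MB/s, so they divide directly.

def write_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to write size_mb at a sustained rate, ignoring seeks and read contention."""
    return size_mb / rate_mb_per_s

ssd_slow = write_seconds(512, 100)     # 5.12 s at the low end of the SSD range
ssd_fast = write_seconds(512, 150)     # roughly 3.4 s at the high end
hdd_slow = write_seconds(512, 60)      # roughly 8.5 s on the slower hard drive figure
hdd_fast = write_seconds(512, 70)      # roughly 7.3 s on the faster hard drive figure
results_only = write_seconds(50, 100)  # 0.5 s if only 50 MB of results are written
```

All four full-size cases come in under the 10 seconds mentioned, but the hard drive figures leave little headroom once reads compete for the same disk.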
|
4th August 2010, 22:35 | #44 | Link | ||
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Quote:
Quote:
Corrections are welcome
||
4th August 2010, 22:38 | #45 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,251
|
Quote:
Porting software to CUDA/OpenCL isn't simple at all. Getting non-trivial software running on the GPU will be a tough task, not to mention all the work needed to optimize it for speed. There is also absolutely no guarantee that your software will run any faster (more efficiently) on the GPU than it does on the CPU. It may or may not work. If your problem isn't highly parallel, it won't fit the GPU. But even if your problem is highly parallel in theory, you still have to come up with a smart parallel algorithm that works on the real hardware. See also: http://forum.doom9.org/showpost.php?...&postcount=192 Also, this example shows how complex it is to optimize something as simple as a "parallel reduction" on CUDA: http://developer.download.nvidia.com.../reduction.pdf Quote:
You can't "move" a single DSP function to the GPU (even if it is a LOT faster there), because the delay of the CPU -> GPU -> CPU data transfer would nullify the speed-up. Instead, you must "move" (read: re-implement) complete algorithms on the GPU, so there are enough "calculations per data transfer" to justify the transfer delay. (Furthermore, we have no indication that H.265 will be any easier or harder to implement on a GPU.)
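The "calculations per data transfer" argument can be put into a tiny cost model. This is only a sketch under stated assumptions: the function name, the fixed bus bandwidth, and every number in the two examples are hypothetical and chosen purely for illustration, not measured on any hardware.

```python
def offload_pays_off(cpu_s: float, gpu_s: float, bytes_moved: float,
                     bus_bytes_per_s: float, latency_s: float = 0.0) -> bool:
    """Return True if GPU compute time plus the round-trip transfer beats the CPU time."""
    transfer_s = latency_s + bytes_moved / bus_bytes_per_s
    return gpu_s + transfer_s < cpu_s

# Hypothetical case 1: one small DSP function, 10x faster on the GPU,
# but 16 MB must cross the bus (8 MB each way) at an assumed 4 GB/s.
single_function = offload_pays_off(cpu_s=0.001, gpu_s=0.0001,
                                   bytes_moved=16e6, bus_bytes_per_s=4e9)

# Hypothetical case 2: a complete algorithm on the same data, so far more
# work is done per byte transferred.
whole_algorithm = offload_pays_off(cpu_s=0.100, gpu_s=0.010,
                                   bytes_moved=16e6, bus_bytes_per_s=4e9)
```

With these made-up numbers the transfer alone costs 4 ms, which swamps the 0.9 ms saved in the first case but is easily amortized in the second.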
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 4th August 2010 at 22:46. |
||
4th August 2010, 22:46 | #46 | Link | |||
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Hi Mulder, long time no see
Quote:
Quote:
Quote:
|||
4th August 2010, 22:46 | #47 | Link | ||
Guest
Join Date: Jan 2002
Posts: 21,901
|
Quote:
Quote:
Carry on! |
||
4th August 2010, 22:51 | #48 | Link | |
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Quote:
|
4th August 2010, 22:54 | #49 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,251
|
Quote:
With all those "big" companies working on GPU-accelerated H.264 encoders, and still not one of them able to compete with x264 in a proper "quality per speed" comparison, there are only two possible conclusions: either all those companies are completely incompetent, or GPUs aren't as suitable for video encoding as the GPU vendors try to make us believe. Decide for yourself.
Last edited by LoRd_MuldeR; 4th August 2010 at 22:56. |
|
4th August 2010, 23:01 | #50 | Link | |
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Quote:
Last edited by TheImperial2004; 18th August 2010 at 04:40. |
|
4th August 2010, 23:05 | #52 | Link | |
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Quote:
Sorry, I'll edit my post.
Last edited by TheImperial2004; 18th August 2010 at 04:41. |
|
5th August 2010, 17:28 | #53 | Link | |
Registered User
Join Date: Oct 2006
Posts: 150
|
Quote:
One thing you must understand is that GPUs and CPUs are completely different architectures. GPUs are only good at some tasks, and in those tasks they really excel because they are BUILT that way. On snow, your 60 mph snowmobile is always gonna outperform your 180 mph sports car, because snowmobiles are built to run on snow. The MHz in these figures have little relevance, as GPUs use huge pipelines and more than a thousand shader processors to run their tasks. Trying to compare that with a 6-core CPU is foolishness. Even if you completely ignore architectural differences, merely a few instruction sets for accelerating certain tasks can have a huge impact. The 19-fold increase in encryption performance of the 6-core Intel over the older 4-core model is mainly due to the newly implemented AES-accelerating instruction set. So even within the same architecture, clock speed is not always the most decisive factor in performance. |
|
5th August 2010, 19:07 | #54 | Link | |
C# Addict
Join Date: Oct 2008
Location: Saudi Arabia
Posts: 114
|
Quote:
Also, we know, to a certain extent, that the CUDA API is difficult to code for, let alone to port existing code to. That's what we've heard from experts. How about OpenCL? Has anyone experimented with it?
Last edited by TheImperial2004; 5th August 2010 at 19:09. |
|
6th August 2010, 02:16 | #55 | Link | |
Registered User
Join Date: Apr 2009
Posts: 478
|
Quote:
|
|
6th August 2010, 02:57 | #56 | Link |
Guest
Join Date: Jan 2002
Posts: 21,901
|
It's not difficult. I wrote an NV12 to RGB24 conversion (with configurable 601/709 coefficients) plus host transfer in two days. And that was starting with very little knowledge of CUDA. The code is so simple that I'm embarrassed that it took me that long (although a lot of that time was working out the correct YUV->RGB equations and optimizing the implementation).
So speak for yourself! Last edited by Guest; 6th August 2010 at 05:47. |
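For readers unfamiliar with the conversion mentioned above, here is a minimal CPU-side sketch of the arithmetic in Python. It is not the poster's CUDA code. It assumes limited-range ("video range") 8-bit input, a tightly packed NV12 frame (stride equal to width, even dimensions), and the standard BT.601 and BT.709 luma weights; the function names are illustrative only.

```python
# Luma weights (Kr, Kb) for the two matrices the post mentions.
COEFFS = {"601": (0.299, 0.114), "709": (0.2126, 0.0722)}

def clamp8(x: float) -> int:
    """Scale a normalized value to 0..255, then round and clamp."""
    return max(0, min(255, round(x * 255.0)))

def yuv_to_rgb(y8: int, cb8: int, cr8: int, matrix: str = "601"):
    """Convert one limited-range Y'CbCr sample to an 8-bit (R, G, B) tuple."""
    kr, kb = COEFFS[matrix]
    kg = 1.0 - kr - kb
    # Limited range: Y in 16..235, Cb/Cr in 16..240 centred on 128.
    y = (y8 - 16) / 219.0
    cb = (cb8 - 128) / 224.0
    cr = (cr8 - 128) / 224.0
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return clamp8(r), clamp8(g), clamp8(b)

def nv12_to_rgb24(frame: bytes, width: int, height: int, matrix: str = "601") -> bytes:
    """Convert a tightly packed NV12 frame to packed RGB24 (assumes even width/height)."""
    uv_base = width * height  # the interleaved Cb/Cr plane follows the Y plane
    out = bytearray(width * height * 3)
    for row in range(height):
        for col in range(width):
            y8 = frame[row * width + col]
            # Each 2x2 block of pixels shares one Cb/Cr pair.
            uv = uv_base + (row // 2) * width + (col // 2) * 2
            rgb = yuv_to_rgb(y8, frame[uv], frame[uv + 1], matrix)
            o = (row * width + col) * 3
            out[o:o + 3] = bytes(rgb)
    return bytes(out)
```

On a GPU each output pixel would typically be handled by its own thread, which is why this kind of per-pixel conversion ports so easily compared with a full encoder.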
6th August 2010, 04:59 | #57 | Link | |
Registered User
Join Date: Dec 2001
Posts: 145
|
Quote:
It's the algorithms that can be hard to make efficient (not porting per se). "Anything" can be made to run on a GPU, but whether it reaps any benefits is a different story (even if it runs faster on the GPU). |
|
6th August 2010, 07:25 | #58 | Link | |
Registered User
Join Date: Oct 2006
Posts: 150
|
Quote:
The fact is that x264 is already highly optimized for CPUs, and that is because programmers have been optimizing compilers and routines for CPUs for decades. GPGPU computing, by contrast, is very new, and frankly very few people have any sort of expertise in optimizing code for GPU processing; the lack of documentation and established experiments doesn't help. The SIMD extensions in current CPUs are tailored to accelerate media processing, while GPUs don't offer such specific optimization capabilities yet. Using x264 on a CPU is like putting a nicely fitted dress on a not-too-gorgeous girl: the dress makes her look like a princess. Putting x264 on a GPU right now is like putting an overly large dress on a prettier girl: although this girl is more beautiful, she is still going to look laughable. CUDA is not so complex, per se. It just involves doing a lot of other things before you can get a result. It's like wrapping your hand around your neck and back before you put the candy in your hand into your mouth, when you could just do it directly. But those added procedures are simply the facts of GPU computing right now. |
|
6th August 2010, 13:35 | #59 | Link | |
Guest
Join Date: Jan 2002
Posts: 21,901
|
Quote:
|
|
6th August 2010, 15:36 | #60 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,251
|
Quote:
As soon as you do something slightly more complex and try to do it in a way that runs "fast" on CUDA, things get much uglier, especially if you need to store intermediate data in "shared" memory but shared memory is too small. All the "memory access pattern" issues are also very complex: you need to take care which threads (of a block) run in the same warp and which memory addresses (banks) they access. Again I want to point to this example: http://developer.download.nvidia.com.../reduction.pdf (And remember, all they implement is a simple vector reduction! At the end they have a bunch of code that really isn't trivial to understand, while in plain C this would be ~3 lines of code ^^)
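To make the contrast concrete, here is a small Python sketch that puts the serial sum next to a sequential simulation of a tree-style reduction. It is not the code from the linked NVIDIA document, and it does not model warps, shared-memory banks, or synchronization; the function names are hypothetical, and the inner loop merely stands in for what would be parallel threads.

```python
def serial_sum(values):
    """The 'plain' version: one loop, one accumulator."""
    total = 0
    for v in values:
        total += v
    return total

def tree_reduce(values):
    """Simulate a tree reduction; returns (sum, number of parallel steps)."""
    data = list(values)
    n = 1
    while n < len(data):
        n *= 2
    data.extend([0] * (n - len(data)))  # pad to a power of two with zeros
    stride = n // 2
    steps = 0
    while stride > 0:
        # On a GPU every i would be a separate thread, followed by a barrier.
        # Reads come from the upper half and writes go to the lower half,
        # so the iterations of this inner loop are independent.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
        steps += 1
    return data[0], steps

total, steps = tree_reduce(range(1024))  # 10 halving steps for 1024 elements
```

Even this toy version needs padding, stride bookkeeping, and a careful read/write split, and it still ignores everything that makes the real CUDA version hard to optimize.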
Last edited by LoRd_MuldeR; 6th August 2010 at 15:42. |
|
Tags |
encoder, gpu, h.264 |