It's not difficult. I wrote an NV12-to-RGB24 conversion (with configurable BT.601/BT.709 coefficients) plus host transfer in two days, and that was starting with very little knowledge of CUDA. The code is so simple that I'm embarrassed it took me that long (although a lot of that time went into working out the correct YUV->RGB equations and optimizing the implementation).
So speak for yourself!
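
For reference, here is a minimal sketch of the per-pixel math an NV12-to-RGB24 conversion with switchable 601/709 coefficients involves. It is a plain C++ CPU reference, not the CUDA code described above. In a kernel, the body of the inner loop would roughly be one thread's work. All names (`YuvCoeffs`, `kBt601`, `kBt709`, `nv12_to_rgb24`) are made up for illustration. It also assumes limited-range ("video range") input, even width and height, and a row stride equal to the width, none of which the post specifies.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Luma weights for red and blue; the green weight is 1 - kr - kb.
// (Hypothetical type and constants, for illustration only.)
struct YuvCoeffs { float kr, kb; };
constexpr YuvCoeffs kBt601{0.299f, 0.114f};
constexpr YuvCoeffs kBt709{0.2126f, 0.0722f};

// Clamp a normalized [0,1] value and round it to an 8-bit channel.
static std::uint8_t to_byte(float v) {
    v = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(std::lround(v * 255.0f));
}

// nv12: width*height luma bytes, then width*height/2 bytes of interleaved
// U,V samples at half resolution in both directions.
// Assumes limited-range input, even dimensions, and stride == width.
std::vector<std::uint8_t> nv12_to_rgb24(const std::uint8_t* nv12,
                                        int width, int height, YuvCoeffs c) {
    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    const std::uint8_t* y_plane = nv12;
    const std::uint8_t* uv_plane = nv12 + w * h;
    const float kg = 1.0f - c.kr - c.kb;

    std::vector<std::uint8_t> rgb(w * h * 3);
    for (std::size_t row = 0; row < h; ++row) {
        for (std::size_t col = 0; col < w; ++col) {
            // Each 2x2 block of pixels shares one (U, V) pair.
            const std::size_t uv = (row / 2) * w + (col / 2) * 2;

            // Undo the limited-range offsets and scales.
            const float y  = (y_plane[row * w + col] - 16.0f) / 219.0f;
            const float cb = (uv_plane[uv]     - 128.0f) / 224.0f;
            const float cr = (uv_plane[uv + 1] - 128.0f) / 224.0f;

            // Invert Y = kr*R + kg*G + kb*B, Cb = (B-Y)/(2(1-kb)), Cr = (R-Y)/(2(1-kr)).
            const float r = y + 2.0f * (1.0f - c.kr) * cr;
            const float b = y + 2.0f * (1.0f - c.kb) * cb;
            const float g = (y - c.kr * r - c.kb * b) / kg;

            std::uint8_t* out = &rgb[(row * w + col) * 3];
            out[0] = to_byte(r);
            out[1] = to_byte(g);
            out[2] = to_byte(b);
        }
    }
    return rgb;
}
```

Calling `nv12_to_rgb24(frame, width, height, kBt709)` returns a tightly packed RGB buffer; passing `kBt601` instead switches the matrix. A real implementation would usually precompute the coefficients once rather than per pixel.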