Doom9's Forum > Capturing and Editing Video > Avisynth Development

28th July 2018, 03:12   #1
videoh
Registered User
Join Date: Jul 2014
Posts: 709
DG HDR10 CUDA filters

The DG HDR10 CUDA-accelerated filters for Avisynth+ (pinterf) are now performing excellently and are ready for serious use.

DGHDRtoSDR: Converts HDR10 PQ to 8-bit SDR YV12 or 10-bit SDR stored in YUV420P16.

DGPQtoHLG: Converts HDR10 PQ to HDR10 HLG.
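The thread doesn't describe how DGPQtoHLG works internally. As a rough, hypothetical sketch of what a PQ-to-HLG conversion involves, following the BT.2100 transcoding approach: decode PQ to display light, undo the HLG system gamma (1.2, assuming a 1000 cd/m² reference display), then apply the HLG OETF. All function names are illustrative, values above the assumed peak are simply clipped, and Y'CbCr handling is omitted; a real converter may do all of this differently.

```python
import math

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

# BT.2100 HLG OETF constants
A = 0.17883277
B = 1 - 4 * A
C = 0.5 - A * math.log(4 * A)


def pq_eotf(e):
    """PQ signal in [0, 1] -> display luminance in cd/m^2."""
    p = min(max(e, 0.0), 1.0) ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def pq_inverse_eotf(nits):
    """Display luminance in cd/m^2 -> PQ signal in [0, 1]."""
    y = (min(max(nits, 0.0), 10000.0) / 10000.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def hlg_oetf(e):
    """Normalised scene light in [0, 1] -> HLG signal in [0, 1]."""
    return math.sqrt(3 * e) if e <= 1 / 12 else A * math.log(12 * e - B) + C


def pq_to_hlg(r, g, b, peak=1000.0, gamma=1.2):
    """PQ R'G'B' (BT.2020) -> HLG R'G'B', assuming a `peak`-nit HLG reference display."""
    fd = [pq_eotf(v) for v in (r, g, b)]                   # display light, nits
    yd = 0.2627 * fd[0] + 0.6780 * fd[1] + 0.0593 * fd[2]  # BT.2020 luminance
    if yd <= 0.0:
        return (0.0, 0.0, 0.0)
    # Inverse HLG OOTF: Es = (Fd / peak) * (Yd / peak) ** ((1 - gamma) / gamma)
    scale = (yd / peak) ** ((1 - gamma) / gamma) / peak
    # Scene light above 1.0 (brighter than the assumed peak) is clipped
    return tuple(hlg_oetf(min(v * scale, 1.0)) for v in fd)
```

With these assumptions a PQ-encoded 1000-nit white comes out at an HLG signal of about 1.0, and PQ black stays at 0.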

http://rationalqm.us/hdr

Feedback and feature requests will be happily considered and appreciated.

Last edited by videoh; Yesterday at 17:07.
28th July 2018, 17:39   #2
FranceBB
Broadcast Encoder
Join Date: Nov 2013
Location: Germany
Posts: 556
Quote:
Originally Posted by videoh View Post

DGPQtoHLG: Converts HDR10 PQ to HDR10 HLG.

http://rationalqm.us/hdr/DGPQtoHLG_1.0.rar
This is the one I needed a month ago in this topic: https://forum.doom9.org/showthread.php?t=175546

Back then, I solved it with DaVinci Resolve, but I'm glad to finally have PQ to HLG available in Avisynth+ and CUDA-accelerated.
I understand that it works in native YUV420P16 (the format used by Avisynth+), but would you consider adding a 16bit stacked or 16bit interleaved option to make it available to us poor plain Avisynth users?

Thank you in advance,
I appreciate it.
__________________
Broadcast Encoder
LinkedIn
28th July 2018, 18:35   #3
videoh
Registered User
Join Date: Jul 2014
Posts: 709
Sure, I can do that. But as I have never worked with stacked or interleaved formats, can you give me a link to the format specs, please? Or just explain it here. Thank you, FranceBB!

Last edited by videoh; 28th July 2018 at 18:56.
28th July 2018, 19:27   #4
kolak
Registered User
Join Date: Nov 2004
Location: UK
Posts: 2,368
VapourSynth version, please.

Any details about the methods used for the conversions?
28th July 2018, 20:01   #5
videoh
Registered User
Join Date: Jul 2014
Posts: 709
Quote:
Originally Posted by kolak View Post
Vapoursynth version please
OK, coming. Thank you for the suggestion, kolak.

Quote:
Any details about used methods for conversions?
10-bit YCbCr -> float R'G'B' PQ
R'G'B' PQ -> linear RGB
tonemapping
2020 -> 709 gamut mapping
linear RGB -> gamma R'G'B'
gamma R'G'B' -> YCbCr 8-bit or 10-bit
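The six steps above can be sketched for a single pixel in Python. This is a hypothetical illustration, not DGHDRtoSDR's code: it assumes 10-bit limited-range BT.2020 input, uses the SMPTE ST 2084 PQ curve, the commonly published BT.2020-to-BT.709 linear-light matrix (rounded to four decimals), and the BT.709 camera curve for the gamma step. The tonemapping curve is a stand-in (an extended Reinhard with an assumed 1000-nit peak and 100-nit SDR white), because the post doesn't say which operator is used; applying it per channel, as here, can shift saturation.

```python
# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

# Linear-light BT.2020 -> BT.709 primaries (rounded, commonly published values)
BT2020_TO_709 = (
    (1.6605, -0.5876, -0.0728),
    (-0.1246, 1.1329, -0.0083),
    (-0.0182, -0.1006, 1.1187),
)


def clamp01(v):
    return min(max(v, 0.0), 1.0)


def ycbcr10_to_rgb_pq(y, cb, cr):
    """Step 1: 10-bit limited-range BT.2020 Y'CbCr -> float R'G'B' (still PQ-encoded)."""
    kr, kb = 0.2627, 0.0593
    kg = 1.0 - kr - kb
    yn, cbn, crn = (y - 64) / 876, (cb - 512) / 896, (cr - 512) / 896
    r = yn + 2 * (1 - kr) * crn
    b = yn + 2 * (1 - kb) * cbn
    g = (yn - kr * r - kb * b) / kg
    return clamp01(r), clamp01(g), clamp01(b)


def pq_eotf(e):
    """Step 2: PQ signal in [0, 1] -> linear light in cd/m^2."""
    p = e ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def tonemap(nits, peak=1000.0, sdr_white=100.0):
    """Step 3 (stand-in curve): extended Reinhard mapping [0, peak] nits onto [0, 1]."""
    x, w = nits / sdr_white, peak / sdr_white
    return min(x * (1 + x / (w * w)) / (1 + x), 1.0)


def oetf_709(l):
    """Step 5: BT.709 curve, linear [0, 1] -> gamma-encoded value."""
    return 4.5 * l if l < 0.018 else 1.099 * l ** 0.45 - 0.099


def rgb_to_ycbcr8_709(r, g, b):
    """Step 6: BT.709 R'G'B' -> 8-bit limited-range Y'CbCr."""
    kr, kb = 0.2126, 0.0722
    y = kr * r + (1 - kr - kb) * g + kb * b
    cb = (b - y) / (2 * (1 - kb))
    cr = (r - y) / (2 * (1 - kr))
    return round(16 + 219 * y), round(128 + 224 * cb), round(128 + 224 * cr)


def hdr10_pixel_to_sdr8(y, cb, cr, peak=1000.0):
    rgb_pq = ycbcr10_to_rgb_pq(y, cb, cr)
    lin = [tonemap(pq_eotf(v), peak) for v in rgb_pq]                       # steps 2-3
    mapped = [sum(m * v for m, v in zip(row, lin)) for row in BT2020_TO_709]  # step 4
    return rgb_to_ycbcr8_709(*(oetf_709(clamp01(v)) for v in mapped))        # steps 5-6
```

Black (Y=64, Cb=Cr=512) maps to (16, 128, 128), and full-scale PQ white lands at or near 8-bit reference white. A 10-bit output would only change the scaling in the last step.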

I'm still reading through the voluminous madVR threads, so when done with that I'll be able to say "it's like madVR xyz mode". Let's just say I am not averse to making heuristic tradeoffs for simplicity and CUDA performance. Sometimes the "scientific" approach misses some crucial science. For example, the perceptual thresholds require a certain amount of dwell time; with video, stuff happens too fast for simple science to be universally applicable, IMHO. People do not freeze frame and zoom in when watching videos. The key scenes I have seen in the madVR threads appear to be handled quite reasonably both by madVR and DGHDRtoSDR. Perfection is not an applicable concept here and obsessing about single scenes that flash by for one or two seconds just gets one stuck in quicksand, again IMHO.

Last edited by videoh; 28th July 2018 at 20:38.
28th July 2018, 20:12   #6
Cary Knoop
Join Date: Feb 2017
Location: Newark CA, USA
Posts: 156
Are you going to share the source code?
28th July 2018, 20:18   #7
videoh
Registered User
Join Date: Jul 2014
Posts: 709
Quote:
Originally Posted by Cary Knoop View Post
Are you going to share the source code?
I'm thinking about it, Cary. I don't mind sharing the SDR conversion details, but if I give away the whole CUDA filtering infrastructure (with embedded PTX code, my own helper tools, etc.) I lose my competitive edge.

I'm working on an idea called CUDASynth, which will pipeline filters on the GPU, thereby avoiding very large frame copies GPU<->CPU. Maybe after that is done I'll release everything.

Last edited by videoh; 28th July 2018 at 20:34.
28th July 2018, 20:41   #8
Cary Knoop
Join Date: Feb 2017
Location: Newark CA, USA
Posts: 156
Quote:
Originally Posted by videoh View Post
I'm working on an idea called CUDASynth, which will pipeline filters on the GPU, thereby avoiding very large frame copies GPU<->CPU.
All I can say is: Go for it!
28th July 2018, 20:48   #9
videoh
Registered User
Join Date: Jul 2014
Posts: 709
I will! It's more of an API/scripting language design issue than a technical one. If we have 5 filters processing a 3840x2160 frame, we shouldn't need to transfer that full frame back to the CPU and then back up to the GPU for each filter (GPU bandwidth is the bottleneck for these large frames). Once up to the GPU at the start of the chain and once down to the CPU at the end would greatly improve performance of the chain. How to specify that and make it work with existing Avisynth(+)...that's where the creative work comes in. Also, script processing can branch so that should be allowed for.
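To put rough numbers on the bandwidth argument, here is a back-of-envelope sketch. It assumes YUV420P16 frames (2 bytes per sample) and that, without pipelining, every filter uploads its input and downloads its output once per frame; both are simplifying assumptions, not measurements of any real filter.

```python
def frame_bytes(width, height, bytes_per_sample=2):
    """One 4:2:0 frame: a full-size luma plane plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


def transfers_per_frame(num_filters, pipelined):
    """Host<->GPU copies of one frame: up+down per filter, or up+down once for the chain."""
    return 2 if pipelined else 2 * num_filters


def bytes_per_frame(width, height, num_filters, pipelined):
    return transfers_per_frame(num_filters, pipelined) * frame_bytes(width, height)


if __name__ == "__main__":
    for pipelined in (False, True):
        moved = bytes_per_frame(3840, 2160, 5, pipelined)
        print(f"pipelined={pipelined}: {moved / 1e6:.1f} MB per frame, "
              f"{moved * 24 / 1e9:.2f} GB/s at 24 fps")
```

For five filters on 3840x2160 this works out to about 248.8 MB of copies per frame without pipelining versus 49.8 MB with it, i.e. roughly 5.97 GB/s versus 1.19 GB/s at 24 fps.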
28th July 2018, 21:11   #10
real.finder
Registered User
Join Date: Jan 2012
Location: Mesopotamia
Posts: 1,136
About CUDA, there is already something like this: https://forum.doom9.org/showthread.p...81#post1820381

I don't know if a CLSynth or something similar is possible, since not everyone has NVIDIA.
__________________
My Avisynth Stuff
28th July 2018, 21:15   #11
kolak
Registered User
Join Date: Nov 2004
Location: UK
Posts: 2,368
Quote:
Originally Posted by videoh View Post
...
Let's just say I am not averse to making heuristic tradeoffs for simplicity and CUDA performance.
...
Perfection is not an applicable concept here and obsessing about single scenes that flash by for one or two seconds just gets one stuck in quicksand, again IMHO.
I don't know much about it, but what about a per-scene/2-pass process for the best possible results?
I think this is what the Technicolor R&D people seem to do.
28th July 2018, 22:43   #12
Sparktank
47.952fps@71.928Hz
Join Date: Mar 2011
Posts: 883
Great work! Now I just need to upgrade my card.
Then my disc drive, then get a decryptor.

Might have to get a new mobo/CPU, if DeUHD requires it.
Though, I think it just needs a disc drive upgrade?
__________________
Win10 (x64) build 17134 | GPU Caps Viewer 1.40.1.0
NVIDIA GeForce GT 1030 (GP108) 2047MB/GDDR5 | (R417.22)
NTSC | DVD: R1 | BD: A
28th July 2018, 23:17   #13
FranceBB
Broadcast Encoder
Join Date: Nov 2013
Location: Germany
Posts: 556
Quote:
Originally Posted by videoh View Post
Sure, I can do that. But as I have never worked with stacked or interleaved can you give me a link to the format specs, please? Or just explain it here. Thank you, FranceBB!
Sure.
Basically, Avisynth internally works in 8bit only.
In order to achieve high bit depth, two workarounds were invented: 16bit stacked and 16bit interleaved.
In 16bit stacked, the picture is made of two parts: one containing the highest 8 bits (MSB) of each pixel, stacked on top of another containing the lowest 8 bits (LSB).
MSB stands for "Most Significant Bits" and LSB stands for "Least Significant Bits".
16bit stacked is also called "Double-Height".
16bit interleaved works the same way, but the MSBs and LSBs are interleaved horizontally, which is why it is also called "Double-Width".
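The two layouts can be sketched in Python on a single plane stored as a list of rows of 16-bit integers. The function names are illustrative, and the interleaved byte order (LSB first, then MSB, i.e. little-endian) is an assumption here; the description above only says the halves are interleaved horizontally.

```python
def to_stacked(plane):
    """16-bit plane -> double-height 8-bit plane: all MSB rows on top, all LSB rows below."""
    msb = [[v >> 8 for v in row] for row in plane]
    lsb = [[v & 0xFF for v in row] for row in plane]
    return msb + lsb


def from_stacked(stacked):
    """Inverse of to_stacked: recombine the top (MSB) and bottom (LSB) halves."""
    h = len(stacked) // 2
    return [[(m << 8) | l for m, l in zip(mrow, lrow)]
            for mrow, lrow in zip(stacked[:h], stacked[h:])]


def to_interleaved(plane):
    """16-bit plane -> double-width 8-bit plane, assuming LSB-then-MSB order per sample."""
    return [[byte for v in row for byte in (v & 0xFF, v >> 8)] for row in plane]


def from_interleaved(inter):
    """Inverse of to_interleaved."""
    return [[row[i] | (row[i + 1] << 8) for i in range(0, len(row), 2)] for row in inter]


def from_8bit(plane8):
    """Promote an 8-bit plane to 16-bit by shifting left 8 bits; the LSB byte is always 0."""
    return [[v << 8 for v in row] for row in plane8]
```

For example, the sample 0x1234 becomes 0x12 in the top (MSB) half and 0x34 in the bottom (LSB) half of the stacked picture, and the byte pair 0x34, 0x12 side by side in the interleaved one. An 8-bit clip promoted with `from_8bit` ends up with an all-zero LSB half.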

An 8bit source clip converted to 16bit stacked will have a null (all-zero) LSB.
To convert an 8bit source to 16bit stacked, all you have to do is:

Code:
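# 8-bit YV12 source: 1280x720 HD colour bars
source=ColorBarsHD(width=1280, height=720, pixel_type="YV24").ConvertToYV12()

# Same size and length as the input, with every Y/U/V byte set to 0: the "null" LSB half
function null_lsb(source)
{
    BlankClip(source, pixel_type="YV12", color_yuv=0)
}

# MSB half (the bars) on top, null LSB half below
StackVertical(source, source.null_lsb)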
The clip will be 1280x1440, with the upper part showing the bars and the lower part showing a solid green color (all-zero YUV, i.e. the null LSB, displays as green).

Once you are in 16bit stacked, you have to work with the MSB and the LSB.
In order to work with them, you first have to "get" them, like so:

Code:
source=ColorBarsHD(width=1280, height=720, pixel_type="YV24").ConvertToYV12()

function null_lsb(source)
{
    BlankClip(source, pixel_type="YV12", color_yuv=0)
}

# Build the 16bit stacked clip exactly as in the first example
source16bitstacked=StackVertical(source, source.null_lsb)

# Upper half: most significant bits
msb_w = source16bitstacked.Width()
msb_h = source16bitstacked.Height() / 2
msb = source16bitstacked.Crop(0, 0, msb_w, msb_h)

# Lower half: least significant bits
lsb_w = source16bitstacked.Width()
lsb_h = source16bitstacked.Height() / 2
lsb = source16bitstacked.Crop(0, lsb_h, lsb_w, lsb_h)

# Work on msb and lsb here, then stack them back together
StackVertical(msb, lsb)
This second example produces exactly the same picture as the first one (1280x1440), but it shows you how to get the MSB and LSB independently and work with them.

The most complete plugin that works in 16bit stacked is "Dither Tools".
http://avisynth.nl/index.php/Dither_tools

With Dither Tools you can replicate the example above (converting an 8bit clip to 16bit stacked) by simply typing:

Code:
Dither_convert_8_to_16()
There are many things you can do using Dither Tools.

As to 16bit interleaved, it's not as widely used as 16bit stacked, but there are indeed pipes and filters that support one or the other. HDRTools is the 16bit interleaved equivalent of Dither Tools.
You can easily go from 16bit stacked to 16bit interleaved and vice-versa.
In avisynth+ there's also a function that allows you to go from 16bit stacked to real 16bit and vice-versa.
Unfortunately, I've never used Avisynth+, so I don't know how it works internally.
Anyway, the idea would be to either integrate the Avisynth+ conversion into your plugin or to make it work directly in 16bit stacked (I don't suggest interleaved as it's not widely used).

If you need further explanation, I'm here.

Last edited by FranceBB; 28th July 2018 at 23:26.
28th July 2018, 23:31   #14
videoh
Registered User
Join Date: Jul 2014
Posts: 709
Quote:
Originally Posted by real.finder View Post
about cuda, there are already something like this https://forum.doom9.org/showthread.p...81#post1820381
There is no real documentation that says what it does, so I can't tell if it is implementing my idea, i.e., pipelines on the GPU to eliminate frame transfers between filters.

Quote:
don't know if CLSynth or whatever is possible since not all people has nvidia
It wouldn't be a modification to Avisynth but rather just a way to signal to filters that their input is already up there on the GPU, etc. I'm an nVidia fanboy, so I don't care about people without nVidia cards.
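One way to picture that signalling idea is a frame handle that records where its data currently lives, so a GPU filter uploads only when its input is still on the host and the chain downloads once at the end. Everything here (`Frame`, `gpu_filter`, `run_chain`, the `STATS` counter) is hypothetical and only models copy counts; it is not the Avisynth API or CUDASynth's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List

# Counts host<->GPU copies so the effect of residency tracking is visible.
STATS = {"uploads": 0, "downloads": 0}


@dataclass
class Frame:
    data: List[int]
    on_gpu: bool = False  # the "signal": is the payload already resident on the GPU?


def upload(frame: Frame) -> Frame:
    STATS["uploads"] += 1
    return Frame(frame.data, on_gpu=True)


def download(frame: Frame) -> Frame:
    STATS["downloads"] += 1
    return Frame(frame.data, on_gpu=False)


def gpu_filter(kernel: Callable[[List[int]], List[int]]) -> Callable[[Frame], Frame]:
    """Wrap a per-frame kernel; upload only if the input is not already on the GPU."""
    def run(frame: Frame) -> Frame:
        if not frame.on_gpu:
            frame = upload(frame)
        return Frame(kernel(frame.data), on_gpu=True)  # result stays on the GPU
    return run


def run_chain(frame: Frame, filters) -> Frame:
    """Apply the filters in order, then bring the result back to the host once."""
    for f in filters:
        frame = f(frame)
    return download(frame) if frame.on_gpu else frame


if __name__ == "__main__":
    brighten = gpu_filter(lambda px: [v + 1 for v in px])
    out = run_chain(Frame([0, 0, 0]), [brighten] * 5)
    print(out.data, STATS)  # five filters, but only one upload and one download
```

The filters themselves stay unaware of each other; only the residency flag travels with the frame, which is what lets existing scripts keep their shape.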

Last edited by videoh; 28th July 2018 at 23:37.
28th July 2018, 23:34   #15
videoh
Registered User
Join Date: Jul 2014
Posts: 709
Quote:
Originally Posted by kolak View Post
I don't know much about it, but what about per scene/2 pass process for best possible results?
I think this is what Technicolor R&D people seems to do.
I had never heard of 2-pass SDR conversion, but I do know about madVR's rolling-average light measurements. It's possible to do something like that, but I usually wait for problematic streams to motivate changes, so if you have any that seem to need it, please let me know.
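For reference, a rolling average of per-frame light levels can be as simple as a fixed-size window over recent frames. This is a hypothetical sketch: the class name, the window length, and the choice of average (rather than peak) luminance are assumptions, and it does not describe madVR's actual algorithm.

```python
from collections import deque


class RollingLight:
    """Running mean of per-frame average luminance (nits) over the last `window` frames."""

    def __init__(self, window=48):
        self.values = deque(maxlen=window)  # oldest entries fall out automatically

    def update(self, frame_avg_nits):
        """Add the newest frame's measurement and return the smoothed value."""
        self.values.append(frame_avg_nits)
        return sum(self.values) / len(self.values)

    def reset(self):
        """Drop the history, e.g. at a detected scene cut, so old scenes don't bleed in."""
        self.values.clear()
```

The smoothed value would then set the tonemapping target for the current frame, so the target drifts gradually instead of jumping from frame to frame; resetting at scene cuts is one possible refinement.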

Last edited by videoh; 12th December 2018 at 03:54.
28th July 2018, 23:36   #16
videoh
Registered User
Join Date: Jul 2014
Posts: 709
Quote:
Originally Posted by FranceBB View Post
Sure.
Basically, Avisynth internally works in 8bit only.
...
If you need further explanation, I'm here.
Thank you for the detailed explanation. If I get stuck I'll post again.
28th July 2018, 23:55   #17
kolak
Registered User
Join Date: Nov 2004
Location: UK
Posts: 2,368
Quote:
Originally Posted by videoh View Post
I never heard of 2-pass SDR conversion, but do know about madVR's rolling average light measurements. It's possible to do something like that but I usually wait for a problematic stream(s) to motivate changes, so if you have any that seem to need it, please let me know.
I think the theory is that a single set of settings doesn't work well for a whole (e.g. 2h) movie, so Technicolor/Dolby seem to do things per scene with a 2-pass process: analyse first, then convert with settings based on the results of the analysis stage.
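The two-pass idea can be sketched as: pass 1 measures each scene, pass 2 converts every frame using its own scene's statistics instead of one setting for the whole movie. This is a hypothetical illustration, not Technicolor's or Dolby's method; the function names are illustrative, scene boundaries are assumed to be known already (scene-cut detection is out of scope), and the per-scene statistic is simply the brightest per-frame peak in nits.

```python
def scene_bounds(scene_starts, num_frames):
    """(start, end) frame ranges, assuming scene_starts is sorted and begins at 0."""
    edges = list(scene_starts) + [num_frames]
    return list(zip(edges, edges[1:]))


def analyse(frame_peaks, scene_starts):
    """Pass 1: the brightest per-frame peak (nits) within each scene."""
    return [max(frame_peaks[a:b]) for a, b in scene_bounds(scene_starts, len(frame_peaks))]


def per_frame_targets(frame_peaks, scene_starts):
    """Input to pass 2: every frame gets its scene's peak as its tonemapping peak."""
    peaks = analyse(frame_peaks, scene_starts)
    targets = []
    for peak, (a, b) in zip(peaks, scene_bounds(scene_starts, len(frame_peaks))):
        targets.extend([peak] * (b - a))
    return targets
```

With per-frame peaks `[100, 400, 300, 1000, 800, 50]` and scenes starting at frames 0 and 3, the first scene is converted against a 400-nit peak and the second against 1000 nits, whereas a single-pass, whole-movie setting would use 1000 nits everywhere.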

Once we get the VapourSynth version, I will try to compare it to other/pro solutions.
29th July 2018, 02:09   #18
real.finder
Registered User
Join Date: Jan 2012
Location: Mesopotamia
Posts: 1,136
Quote:
Originally Posted by ObenS View Post
Nvidia GPUs support OpenCL. CLSynth would be nice though.
Yes, I already know. I said that to point out that CUDA is not for everyone, unlike OpenCL, which works even without a GPU at all (with just the CPU) on many CPUs, see here.

There are also many people who use servers or RDP accounts on shared servers for encoding and other things, and most of the time those servers don't have NVIDIA, or don't even have a GPU at all! Not to mention laptop users, who most of the time don't have NVIDIA either. In the desktop world I think there are many NVIDIA users, maybe even the majority, but nowadays many people don't use desktops, so making the filter work only with CUDA (NVIDIA) means most people won't be able to use it.

Edit: where did ObenS's post go?

Last edited by real.finder; 29th July 2018 at 02:15.
29th July 2018, 03:36   #19
Dion
Registered User
Join Date: Oct 2004
Posts: 62
Quote:
Originally Posted by FranceBB View Post
but would you consider introducing a 16bit stacked or 16bit interleaved option to make it available for our poor plain Avisynth users?
Avisynth+ supports higher bit depth output natively. It seems silly to ask him to support outdated Avisynth builds.
29th July 2018, 05:00   #20
FranceBB
Broadcast Encoder
Join Date: Nov 2013
Location: Germany
Posts: 556
@Dion: many people are still using Avisynth, which is not outdated; it is simply developed in a different branch with a different approach. I hope that proper high bit depth support will be introduced in Avisynth one day, so we won't have to use 16bit stacked/interleaved anymore. Besides, many people are still using Avisynth 2.6.1 alpha and are waiting for the beta and the stable release. It has been in alpha for years now and it's about time to move to beta, but... you know... Avisynth development takes ages, so...

Anyway, end of OT.