MaskTools2 - pfmod - Page 11

burfadel · 15th September 2017, 23:21

Open source means you can use it, as long as you don't charge for borrowed code and you acknowledge the source. Maybe some of the legacy support code can be removed, particularly if it is getting in the way. The 64 bit AviSynth shouldn't need any of the 32 bit compatibility code, since you can't use those filters with it anyway. Maybe all the compatibility code that is still valid but not useful for most modern filters can be moved to a plugin; there is no point limiting AviSynth with needless constraints.

real.finder · 16th October 2017, 14:57

Quote:

Originally Posted by real.finder

hi pinterf

check this https://forum.doom9.org/showthread.php?t=174752

and what about adding mt_merge parameter for mpeg2 vs mpeg1 in 420 with luma=true?

any news? LSFmod port depends on that

burfadel · 9th November 2017, 05:54

It seems that maybe the scaling feature doesn't work as intended. I was using mt_lut with range of 256, with bit depth 12, and the result was as if there was no scaling (all black). If I convert back to 8 bit first, run it, and then convert back to 12 bits it works properly.

pinterf · 9th November 2017, 08:47

Quote:

Originally Posted by burfadel

It seems that maybe the scaling feature doesn't work as intended. I was using mt_lut with range of 256, with bit depth 12, and the result was as if there was no scaling (all black). If I convert back to 8 bit first, run it, and then convert back to 12 bits it works properly.

What was your expression string? Constants inside expressions are scaled only when you specify scaleb or scalef for them.

real.finder · 15th November 2017, 18:43

so, since the expr is added to avs+, are you going to make mt_lut* use it with some option or it's not possible?

pinterf · 15th November 2017, 20:49

I was just thinking about it while riding home. Sure, but it won't be the default behaviour.
It has a performance penalty. Scaling the inputs to a common range requires a floating-point multiplication within the expression, right after reading the source pixels, unless the common bit depth is the same as the source clip's bit depth. I suppose, knowing that this behaviour was requested to ease the conversion of old scripts, that this common bit depth is the 8 bit scale, 0-255. So for 16 bit input clips the multiplier is 1/256. For 8 bit input there is no performance loss in this scenario.
A second conversion occurs before storing the result back.

Another ambiguity is whether the source is limited-range YUV or full scale. Limited range can be scaled nicely with the bit-shift method, but that method gives wrong results on a full-scale source.
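
To make the bit-shift remark concrete, here is a minimal Python sketch of the two ways an 8 bit value can be brought to a higher bit depth, plus the 1/256 multiplier mentioned above. It is not masktools or Expr code, and the function names are made up for illustration. The shift maps limited-range levels exactly, but it sends 255 to 65280 rather than 65535, which is the wrong result on a full-scale source.

```python
def scale_by_shift(value_8bit: int, bits: int) -> int:
    """Limited-range style scaling: multiply by 2^(bits-8) via a bit shift."""
    return value_8bit << (bits - 8)


def scale_full_range(value_8bit: int, bits: int) -> float:
    """Full-scale stretch: map 0..255 onto 0..(2^bits - 1)."""
    return value_8bit * ((1 << bits) - 1) / 255


def to_8bit_scale(pixel: int, bits: int) -> float:
    """Bring a source pixel down to the common 8 bit scale (x 1/256 for 16 bit)."""
    return pixel / (1 << (bits - 8))


if __name__ == "__main__":
    print(scale_by_shift(235, 16))    # 60160
    print(scale_by_shift(255, 16))    # 65280
    print(scale_full_range(255, 16))  # 65535.0
    print(to_8bit_scale(32768, 16))   # 128.0
```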

Another thing:

In Expr not all operators/functions are implemented, there are masktools-only syntax elements. Do you know scripts that are using these operators? Modulo, sin, cos, all kinds of rounding?

edcrfv94 · 15th November 2017, 21:39

You can try this first, mt_lut at 16bit still faster than Expr 10% speed.

Code:

Function kf_expr_x(clip clip1, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lut(clip1, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_expr_xy(clip clip1, clip clip2, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lutxy(clip1, clip2, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, clip2, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, clip2, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, clip2, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_expr_xyz(clip clip1, clip clip2, clip clip3, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lutxyz(clip1, clip2, clip3, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, clip2, clip3, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, clip2, clip3, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, clip2, clip3, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_expr_xyza(clip clip1, clip clip2, clip clip3, clip clip4, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lutxyza(clip1, clip2, clip3, clip4, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, clip2, clip3, clip4, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, clip2, clip3, clip4, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, clip2, clip3, clip4, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_GetCSP(clip c)
{
	try {
		csp = c.kf_GetCSP_avsPlus()
	} catch (error_msg) {
		csp = c.kf_GetCSP_avs()
	}
	return csp
}


Function kf_GetCSP_avs(clip c)
{
	return c.IsPlanar ? c.IsYV12 ? "YV12" :
	\                   c.IsYV16 ? "YV16" :
	\                   c.IsYV24 ? "YV24" : c.kf_GetCSP_Y8_YV411() :
	\      c.IsYUY2   ? "YUY2"   :
	\      c.IsRGB32  ? "RGB32"  :
	\      c.IsRGB24  ? "RGB24"  : "Unknown"

	Function kf_GetCSP_Y8_YV411(clip c) {
	    try {
		c.UtoY
		csp = "YV411"
	    } catch (error_msg) {
		csp = "Y8"
	    }
	    return csp
	}
}

Function kf_GetCSP_avsPlus(clip c)
{
	return c.Is420   ? "YV12"  :
	\      c.IsY     ? "Y8"    :
	\      c.Is422   ? "YV16"  :
	\      c.Is444   ? "YV24"  : 
	\      c.IsYUVA  ? "YUVA"  :
	\      c.IsYV411 ? "YV411" :
	\      c.IsYUY2  ? "YUY2"  :
	\      c.IsRGB32      ? "RGB32" :
	\      c.IsRGB24      ? "RGB24" :
	\      c.IsPlanarRGB  ? "RGB"   :
	\      c.IsPlanarRGBA ? "RGBA"  :  
	\      c.IsPackedRGB  ? "RGBIL" : "Unknown"
}

real.finder · 15th November 2017, 21:54

Quote:

Originally Posted by pinterf

In Expr not all operators/functions are implemented, there are masktools-only syntax elements. Do you know scripts that are using these operators? Modulo, sin, cos, all kinds of rounding?

well, I can't count all the scripts; besides those in the wiki there are many that are not listed there, and they outnumber the ones in the wiki

so, to be safe, the useexpr parameter should be:

useexpr="internal" or "none" (default)

so anyone updating a function that uses mt_lut* can make it faster, where possible, by setting it to "internal", and "internal" would automatically fall back to "none" if classic avs or an old avs+ is used

apart from that, if someone updates clexpr in the future, it could become another option alongside "internal" and "none"

real.finder · 15th November 2017, 22:01

Quote:

Originally Posted by edcrfv94

You can try this first, mt_lut at 16bit still faster than Expr 10% speed.

didn't try it, but what about RAM usage?

edcrfv94 · 15th November 2017, 22:08

Quote:

Originally Posted by real.finder

didn't try it, but what about RAM usage?

Almost no difference; mt_lut at 16 bit uses 1 MB more RAM than Expr.

pinterf · 16th November 2017, 09:43

Quote:

Originally Posted by edcrfv94

You can try this first, mt_lut at 16bit still faster than Expr 10% speed.

I think it depends on the expression itself. What expression string did you use for comparison? (and that 10% means that lut is faster by 10% or lut needs only 10% time of Expr?)

And I'd like to ask you (or someone) with AVX2, could you please compare the performance of lut/lutxy with Expr on your machines (for a basic expression like "x x +" for lut and "x y -" for lut_xy, and a more complex one?) with optAvx2=False and optAVX2=true. For 8, 10 and 16 bits. (10 bit lut_xy has not that much memory overhead)

Thanks.

edcrfv94 · 16th November 2017, 11:39

Quote:

Originally Posted by pinterf

I think it depends on the expression itself. What expression string did you use for comparison? (and that 10% means that lut is faster by 10% or lut needs only 10% time of Expr?)

And I'd like to ask you (or someone) with AVX2, could you please compare the performance of lut/lutxy with Expr on your machines (for a basic expression like "x x +" for lut and "x y -" for lut_xy, and a more complex one?) with optAvx2=False and optAVX2=true. For 8, 10 and 16 bits. (10 bit lut_xy has not that much memory overhead)

Thanks.

The 3770K does not support AVX2, it only has AVX. (When my i9 PC arrives I can test again; that may take a month.)

mt_lut is 10%+ faster than Expr from 8 bit to 16 bit with a complex expression.
mt_lutxy is faster than Expr at 8 bit; otherwise Expr is much faster.

I7 3770k 4.2g

16bit Expr Y8: 201fps 49% cpu 42Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.Expr("x x +")
	ConvertToStacked().DitherPost(mode=6, ampo=1)

16bit mt_lut Y8: 200fps 45% cpu 43Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.mt_lut("x x +", y=3, u=1, v=1)
	ConvertToStacked().DitherPost(mode=6, ampo=1)

16bit Expr Y8: 155fps 34% cpu 43Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.Expr("x 111 + 3 * 100 - 2 / 2 ^ 0.02 * 4 ^ 11 +")
	ConvertToStacked().DitherPost(mode=6, ampo=1)

16bit mt_lut Y8: 201fps 47% cpu 43Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.mt_lut("x 111 + 3 * 100 - 2 / 2 ^ 0.02 * 4 ^ 11 +", y=3, u=1, v=1)
	ConvertToStacked().DitherPost(mode=6, ampo=1)

16bit Expr Y8: 159fps 34% cpu 50Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y -")
	ConvertToStacked().DitherPost(mode=6, ampo=1)

16bit mt_lutxy Y8: 36fps 16% cpu 50Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y -", y=3, u=1, v=1)
	ConvertToStacked().DitherPost(mode=6, ampo=1)

8bit Expr Y8: 708fps 12% cpu 35Mib Memory

Code:

	
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y -")
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

8bit mt_lutxy Y8: 636fps 12% cpu 35Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y -", y=3, u=1, v=1)
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

8bit Expr Y8: 276fps 12% cpu 35Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?")
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

8bit mt_lutxy Y8: 614fps 12% cpu 35Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?", y=3, u=1, v=1)
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

10bit Expr Y8: 220fps 12% cpu 47Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=10)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?")
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

10bit mt_lutxy Y8: 225fps 12% cpu 48Mib Memory

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=10)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?", y=3, u=1, v=1)
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

pinterf · 16th November 2017, 13:58

Quote:

Originally Posted by edcrfv94

The 3770K does not support AVX2, it only has AVX. (When my i9 PC arrives I can test again; that may take a month.)

Many thanks!
Can you check again these two numbers? I feel that the difference is too big between them.
"x y -"
16bit Expr Y8: 159fps 34% cpu 50Mib Memory
8bit Expr Y8: 708fps 12% cpu 35Mib Memory

edit: the difference occurred because of the final stacked + DitherPost conversion

This script eliminates the conversion overhead before the lut/expr (and no conversion occurs after, avsmeter can report the real comparison of the methods)

Code:

SetMemoryMax(3000)

bits=16 # set 8, 10, .. 16
lut=true # true:lut, false:expr
clipcount=1 # 1: clip/lut, 2: two clips/lutxy
simpleexpr=false # choose simple (true) or a more complex expression

format="YUV420P"+String(bits)
colorbars(width=1920, height=1080, pixel_type=format)
ConvertToY()
trim(0, 5000)
p1 = last
p2 = p1.Invert("Y").trim(0,-1).loop(p1.framecount()) # all cached, no speed penalty

expr_1d_simple  = "x x +"
expr_1d_complex = "x 111 + 3 * 100 - 2 / 2 ^ 0.02 * 4 ^ 11 +"
expr_2d_simple  = "x y -"
expr_2d_complex = "x y < x x y - 0.8 * - x x y - 0.9 * - ?"

expr = clipcount==1 ? (simpleexpr ? expr_1d_simple : expr_1d_complex) : (simpleexpr ? expr_2d_simple : expr_2d_complex)

if(clipcount==1) {
  result = lut ? mt_lut(p1, expr, y=3, u=1, v=1) : Expr(p1, expr)
} else {
  result = lut ? mt_lutxy(p1, p2, expr, y=3, u=1, v=1) : Expr(p1, p2, expr)
}
result

My results for avs+ x86 r2544 and masktools2 2.2.10:

Code:

i7-3770 @ 3.40 GHz (No AVX2)
results in [fps] reported by AvsMeter

Bits: 8         Lut/Expr 
Simple,  1 Clip  790/700
Complex, 1 Clip  796/306
Simple,  2 Clips 540/684
Complex, 2 Clips 540/219

Bits: 16         Lut/Expr 
Simple,  1 Clip  655/713
Complex, 1 Clip  664/300
Simple,  2 Clips  41.2/684 # masktools2 16 bit lutxy is not lookup but realtime calc
Complex, 2 Clips  6.45/224 # masktools2 16 bit lutxy is not lookup but realtime calc
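
A rough size estimate suggests why a two-clip lookup table stops being practical at 16 bit. The Python sketch below assumes one table entry per combination of input pixel values, stored as 1 byte at 8 bit and 2 bytes above that; the actual masktools2 memory layout may differ, so treat the numbers as an order-of-magnitude illustration only.

```python
def lut_entries(bits: int, clips: int) -> int:
    """Number of table entries: one per combination of input pixel values."""
    return (1 << bits) ** clips


def lut_bytes(bits: int, clips: int) -> int:
    """Table size, assuming 1 byte per entry at 8 bit and 2 bytes above (an assumption)."""
    bytes_per_entry = 1 if bits <= 8 else 2
    return lut_entries(bits, clips) * bytes_per_entry


if __name__ == "__main__":
    for bits in (8, 10, 16):
        for clips in (1, 2):
            size = lut_bytes(bits, clips)
            print(f"{bits:2d} bit, {clips} clip(s): {size:,} bytes")
```

Under these assumptions a 10 bit two-clip table is 2 MiB, while a 16 bit one would be 8 GiB, which fits the remark that 10 bit lutxy has little memory overhead and that 16 bit lutxy calculates in real time instead.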

real.finder · 16th November 2017, 17:20

the top is Expr, then Expr with optAvx2=False, then mt_lutxy, using this https://ark.intel.com/products/75459...up-to-2_60-GHz

Code:

colorbars()
converttoyv12
d=last
Expr(last,d,"x y -")
#Expr(last,d,"x y -",optAvx2=False)
#mt_lutxy(last,d,"x y -")

pinterf · 16th November 2017, 19:30

What OS? Not much difference for this kind of expression with or w/o avx2

real.finder · 16th November 2017, 19:43

Quote:

Originally Posted by pinterf

What OS? Not much difference for this kind of expression with or w/o avx2

win7 64 sp1, not my pc btw

edcrfv94 · 30th November 2017, 09:33

Code:

kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)

expr: Stack unbalanced at end of expression. Need to have exactly one value on the stack to return.

Code:

kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)

mt_lut: output is garbage

Code:

kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)
kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)

Works fine.

Maybe "- Expr optimization: eliminate ^1 +0 -0 *1 /1 " causes the problem?

Code:

	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12").killaudio().assumefps(25, 1)
	
	#ConvertToY8()
	trim(0, 5000)
	#Limiter()
	#InvertNeg()
	#VToY()
	
	src = last
	src_f = src.RemoveGrain(11, 11, 11)
	
	kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)
	#kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)
	#kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)
	#kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)

Function kf_limit_dif8_128_expr_test(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{
	smooth   = Default(smooth, True    )
	thr      = Default(thr,    1.0     )
	elast    = Default(elast,  smooth ? 3.0 : 128./thr)
	darkthr  = Default(darkthr,thr     )
	Y        = Default(Y,      3       )
	U        = Default(U,      3       )
	V        = Default(V,      3       )
	
	Y        = min(Y,     4)
	U        = min(U,     4)
	V        = min(V,     4)
	Yt       = Y == 3
	Ut       = U == 3
	Vt       = V == 3
	Y31      = Yt ? 3 : 1
	U31      = Ut ? 3 : 1
	V31      = Vt ? 3 : 1
	
	thr      = max(min(    thr, 128.0), 0.0)
	darkthr  = max(min(darkthr, 128.0), 0.0)
	elast    = max(elast, 1.0)
	mode     = thr == 0 && darkthr == 0 ? 4 : thr == 128 && darkthr == 128 ? 2 : 3
	smooth   = elast==1 ? False : smooth
	
	diffstr  = " x range_half - "
	elaststr = " "+string(elast)+" "
	
	thrstr   = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
	alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
	betastr  = thrstr+elaststr+" * "
	sexpr    = smooth   ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * range_half + "
	\                   : thrstr+diffstr+diffstr" abs / * range_half + "
	expr     = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= range_half "+sexpr+" ? ? "
	
	thrstrc  = " "+string(thr)+" scalef "
	alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
	betastrc = thrstrc+elaststr+" * "
	sexprc   = smooth   ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * range_half + "
	\                   : thrstrc+diffstr+diffstr" abs / * range_half + "
	exprc    = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= range_half "+sexprc+" ? ? "
	
	# diff   = filtered - original
	# alpha  = 1 / (thr * (elast - 1))
	# beta   = elast * thr
	# When smooth=True  :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + alpha * diff * (beta - abs(diff))
	# When smooth=False :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + thr * (diff / abs(diff))
	
	diff     = mt_makediff(filtered, original, y=Y31, u=U31, v=V31)
	ldiff    = expr(diff, expr, exprc, exprc)
	merged   = mt_adddiff(original, ldiff, y=Y31, u=U31, v=V31)
	merged   = Y==2 || U==2 || V==2 || Y==4 || U==4 || V==4 ? mt_lutxyz(filtered, original, merged, Y=Y==3?5:Y, U=U==3?5:U, V=V==3?5:V) : merged
	
	return   mode == 4 ? original
	\      : mode == 2 ? filtered
	\      :             merged
}
	
Function kf_limit_dif8_128_mt_test(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{
	smooth   = Default(smooth, True    )
	thr      = Default(thr,    1.0     )
	elast    = Default(elast,  smooth ? 3.0 : 128./thr)
	darkthr  = Default(darkthr,thr     )
	Y        = Default(Y,      3       )
	U        = Default(U,      3       )
	V        = Default(V,      3       )
	
	Y        = min(Y,     4)
	U        = min(U,     4)
	V        = min(V,     4)
	Yt       = Y == 3
	Ut       = U == 3
	Vt       = V == 3
	Y31      = Yt ? 3 : 1
	U31      = Ut ? 3 : 1
	V31      = Vt ? 3 : 1
	
	thr      = max(min(    thr, 128.0), 0.0)
	darkthr  = max(min(darkthr, 128.0), 0.0)
	elast    = max(elast, 1.0)
	mode     = thr == 0 && darkthr == 0 ? 4 : thr == 128 && darkthr == 128 ? 2 : 3
	smooth   = elast==1 ? False : smooth
	
	diffstr  = " x range_half - "
	elaststr = " "+string(elast)+" "
	
	thrstr   = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
	alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
	betastr  = thrstr+elaststr+" * "
	sexpr    = smooth   ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * range_half + "
	\                   : thrstr+diffstr+diffstr" abs / * range_half + "
	expr     = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= range_half "+sexpr+" ? ? "
	
	thrstrc  = " "+string(thr)+" scalef "
	alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
	betastrc = thrstrc+elaststr+" * "
	sexprc   = smooth   ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * range_half + "
	\                   : thrstrc+diffstr+diffstr" abs / * range_half + "
	exprc    = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= range_half "+sexprc+" ? ? "
	
	# diff   = filtered - original
	# alpha  = 1 / (thr * (elast - 1))
	# beta   = elast * thr
	# When smooth=True  :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + alpha * diff * (beta - abs(diff))
	# When smooth=False :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + thr * (diff / abs(diff))
	
	diff     = mt_makediff(filtered, original, y=Y31, u=U31, v=V31)
	ldiff    = mt_lut(diff, yexpr=expr, uexpr=exprc, vexpr=exprc, y=Y31, u=U31, v=V31)
	merged   = mt_adddiff(original, ldiff, y=Y31, u=U31, v=V31)
	merged   = Y==2 || U==2 || V==2 || Y==4 || U==4 || V==4 ? mt_lutxyz(filtered, original, merged, Y=Y==3?5:Y, U=U==3?5:U, V=V==3?5:V) : merged
	
	return   mode == 4 ? original
	\      : mode == 2 ? filtered
	\      :             merged
}

pinterf · 30th November 2017, 10:38

Missing + sign?
I can see
diffstr+diffstr"
instead of
diffstr+diffstr+"

Edit: elast=1 -> smooth=False, different expressions, syntax error was in the smooth=false branch
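
The "Stack unbalanced" message reported earlier is what one would expect from a depth check on a reverse Polish expression. The Python sketch below shows the idea with a small, hand-picked operator table; it is not Expr's actual parser, and the function names are hypothetical. Any token that is not a listed operator is treated as a value (a clip variable or a constant).

```python
# Stack effect per operator: (values popped, values pushed).
OPERATORS = {
    "+": (2, 1), "-": (2, 1), "*": (2, 1), "/": (2, 1), "^": (2, 1),
    "<": (2, 1), ">": (2, 1), "<=": (2, 1), ">=": (2, 1),
    "abs": (1, 1),
    "?": (3, 1),
}


def final_stack_depth(expr: str) -> int:
    """Return how many values remain on the stack after the expression."""
    depth = 0
    for token in expr.split():
        pops, pushes = OPERATORS.get(token, (0, 1))  # anything else pushes a value
        if depth < pops:
            raise ValueError(f"stack underflow at {token!r}")
        depth += pushes - pops
    return depth


def is_balanced(expr: str) -> bool:
    """A valid expression leaves exactly one value: the result."""
    return final_stack_depth(expr) == 1


if __name__ == "__main__":
    print(is_balanced("x y < x x y - 0.8 * - x x y - 0.9 * - ?"))  # True
    print(final_stack_depth("x x y -"))                            # 2
```

Running a check like this on the generated string before passing it to a filter would catch a dropped operand or operator in string concatenation early.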

edcrfv94 · 30th November 2017, 11:03

Quote:

Originally Posted by pinterf

Missing + sign?
I can see
diffstr+diffstr"
instead of
diffstr+diffstr+"

Edit: elast=1 -> smooth=False, different expressions, syntax error was in the smooth=false branch

You are right, thanks! By the way, does mt_lut give no error message for stack mistakes?

real.finder · 6th December 2017, 21:47

any news on scaling things for both expr and lut?
