Old 15th September 2017, 23:21   #201  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
Open source means you can use it, as long as you don't charge for borrowed code and you acknowledge the source. Maybe some of the legacy support code can be removed, particularly if it is getting in the way. The 64-bit Avisynth shouldn't need any of the 32-bit compatibility code, since you can't use those filters anyway. Maybe all the compatibility code that is still valid but not useful for most modern filters could be moved to a plugin; there is no point limiting Avisynth with unnecessary constraints.
Old 16th October 2017, 14:57   #202  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
Quote:
Originally Posted by real.finder View Post
hi pinterf

check this https://forum.doom9.org/showthread.php?t=174752

and what about adding mt_merge parameter for mpeg2 vs mpeg1 in 420 with luma=true?
any news? LSFmod port depends on that
__________________
See My Avisynth Stuff
Old 9th November 2017, 05:54   #203  |  Link
burfadel
Registered User
 
Join Date: Aug 2006
Posts: 2,229
It seems that maybe the scaling feature doesn't work as intended. I was using mt_lut with range of 256, with bit depth 12, and the result was as if there was no scaling (all black). If I convert back to 8 bit first, run it, and then convert back to 12 bits it works properly.
Old 9th November 2017, 08:47   #204  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by burfadel View Post
It seems that maybe the scaling feature doesn't work as intended. I was using mt_lut with range of 256, with bit depth 12, and the result was as if there was no scaling (all black). If I convert back to 8 bit first, run it, and then convert back to 12 bits it works properly.
What was your expression string? Constants inside expressions are scaled only when you specify scaleb or scalef for them.
Old 15th November 2017, 18:43   #205  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
So, since Expr has been added to avs+, are you going to make mt_lut* use it via some option, or is that not possible?
Old 15th November 2017, 20:49   #206  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
I was just thinking about it while riding home. Sure, but it won't be the default behaviour.
It has a performance penalty. Scaling inputs to a common range requires a floating-point multiplication within the expression, right after reading the source pixels, unless the common bit depth is the same as the source clip's bit depth. I suppose (knowing that this behaviour was requested to make converting old scripts easy) that this common bit depth is the 8-bit scale, 0-255. So for 16-bit input clips the multiplier is 1/256. For 8-bit input there is no performance loss in this scenario.
A second conversion occurs before storing the result back.

Another ambiguity is whether the source is limited-range YUV or full scale. Limited range can be scaled nicely with the bit-shift method, but that method gives wrong results on a full-scale source.
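
To make the scaling arithmetic concrete, here is a small sketch in Python (not AviSynth). The helper names are made up for illustration and are not part of masktools or Expr. It compares the bit-shift method with a full-scale stretch when mapping 8-bit values to a higher bit depth, and shows the 1/256 multiplier for the way back from 16 bit.

```python
# Illustration only: helper names are hypothetical, not masktools or Expr API.

def scale_by_shift(value_8bit: int, bits: int) -> int:
    """Bit-shift scaling: multiply by 2^(bits - 8)."""
    return value_8bit << (bits - 8)

def scale_full_range(value_8bit: int, bits: int) -> float:
    """Full-scale stretch: map 0..255 onto 0..(2^bits - 1)."""
    return value_8bit * ((1 << bits) - 1) / 255

def to_8bit_scale(value: int, bits: int) -> float:
    """Bring a pixel value back to the 8-bit scale (multiplier 1/256 at 16 bit)."""
    return value / (1 << (bits - 8))

# Limited-range levels keep their meaning under a plain shift.
print(scale_by_shift(16, 16), scale_by_shift(235, 16))      # 4096 60160

# Full-range white: the shift stops short of the 16-bit maximum.
print(scale_by_shift(255, 16), scale_full_range(255, 16))   # 65280 65535.0

# A 16-bit pixel expressed on the 8-bit scale.
print(to_8bit_scale(60160, 16))                              # 235.0
```

The shift maps 255 to 65280 rather than 65535, which is the mismatch a full-scale source would run into.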

Another thing.

In Expr not all operators/functions are implemented, there are masktools-only syntax elements. Do you know scripts that are using these operators? Modulo, sin, cos, all kinds of rounding?
Old 15th November 2017, 21:39   #207  |  Link
edcrfv94
Registered User
 
Join Date: Apr 2015
Posts: 84
You can try this first, mt_lut at 16bit still faster than Expr 10% speed.

Code:
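# Wrappers kf_expr_x / _xy / _xyz / _xyza: for 8-bit clips they call the
# matching mt_lut* filter, for all other bit depths they call Expr.
# For planes whose mode (Y, U, V, A) is not 3, the expression is replaced by "".
Function kf_expr_x(clip clip1, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")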
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lut(clip1, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_expr_xy(clip clip1, clip clip2, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lutxy(clip1, clip2, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, clip2, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, clip2, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, clip2, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_expr_xyz(clip clip1, clip clip2, clip clip3, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lutxyz(clip1, clip2, clip3, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, clip2, clip3, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, clip2, clip3, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, clip2, clip3, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_expr_xyza(clip clip1, clip clip2, clip clip3, clip clip4, string "expr", string "yExpr", string "uExpr", string "vExpr", string "aExpr", int "Y", int "U", int "V", int "A", bool "sse2", bool "avx2", bool "optSSE2", bool "optSingleMode", bool "optAvx2")
{
	sCSP = clip1.kf_GetCSP()
	IsY8 = sCSP == "Y8"
	IsRGBA = sCSP == "RGBA"
	
	sBit = clip1.BitsPerComponent()
	use_mt_expr = (sBit == 8)
	
	yExpr = Default(yExpr,   expr)
	uExpr = Default(uExpr,  yExpr)
	vExpr = Default(vExpr,  yExpr)
	aExpr = Default(aExpr,  yExpr)
	
	optSSE2 = Default(optSSE2,   sse2)
	optAvx2 = Default(optAvx2,   avx2)
	
	Y = Default(Y, 3)
	U = Default(U, 1)
	V = Default(V, 1)
	A = Default(A, 1)
	
	yExpr = (Y == 3) ? yExpr : ""
	uExpr = (U == 3) ? uExpr : ""
	vExpr = (V == 3) ? vExpr : ""
	aExpr = (A == 3) ? aExpr : ""
	
	out = use_mt_expr ? mt_lutxyza(clip1, clip2, clip3, clip4, expr=expr, yExpr=yExpr, uExpr=uExpr, vExpr=vExpr, aExpr=aExpr, Y=Y, U=U, V=V, A=A, sse2=sse2, avx2=avx2)
	\   : IsY8        ? Expr(clip1, clip2, clip3, clip4, yExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   : !IsRGBA     ? Expr(clip1, clip2, clip3, clip4, yExpr, uExpr, vExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	\   :               Expr(clip1, clip2, clip3, clip4, yExpr, uExpr, vExpr, aExpr, optSSE2=optSSE2, optSingleMode=optSingleMode, optAvx2=optAvx2)
	
	return out
}

Function kf_GetCSP(clip c)
{
	try {
		csp = c.kf_GetCSP_avsPlus()
	} catch (error_msg) {
		csp = c.kf_GetCSP_avs()
	}
	return csp
}


Function kf_GetCSP_avs(clip c)
{
	return c.IsPlanar ? c.IsYV12 ? "YV12" :
	\                   c.IsYV16 ? "YV16" :
	\                   c.IsYV24 ? "YV24" : c.kf_GetCSP_Y8_YV411() :
	\      c.IsYUY2   ? "YUY2"   :
	\      c.IsRGB32  ? "RGB32"  :
	\      c.IsRGB24  ? "RGB24"  : "Unknown"

	Function kf_GetCSP_Y8_YV411(clip c) {
	    try {
		c.UtoY
		csp = "YV411"
	    } catch (error_msg) {
		csp = "Y8"
	    }
	    return csp
	}
}

Function kf_GetCSP_avsPlus(clip c)
{
	return c.Is420   ? "YV12"  :
	\      c.IsY     ? "Y8"    :
	\      c.Is422   ? "YV16"  :
	\      c.Is444   ? "YV24"  : 
	\      c.IsYUVA  ? "YUVA"  :
	\      c.IsYV411 ? "YV411" :
	\      c.IsYUY2  ? "YUY2"  :
	\      c.IsRGB32      ? "RGB32" :
	\      c.IsRGB24      ? "RGB24" :
	\      c.IsPlanarRGB  ? "RGB"   :
	\      c.IsPlanarRGBA ? "RGBA"  :  
	\      c.IsPackedRGB  ? "RGBIL" : "Unknown"
}
Old 15th November 2017, 21:54   #208  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
Quote:
Originally Posted by pinterf View Post
In Expr not all operators/functions are implemented, there are masktools-only syntax elements. Do you know scripts that are using these operators? Modulo, sin, cos, all kinds of rounding?
well, I can't count all the scripts; besides those in the wiki there are many that are not listed there, and they outnumber the ones in the wiki

so to be safe, the useexpr parameter should be:

useexpr="internal" or "none" (default)

so anyone updating a function that uses mt_lut* can make it faster, where possible, by setting it to "internal", and "internal" will automatically fall back to "none" if normal avs or an old avs+ is used

apart from that, if someone updates clexpr in the future, it could become another option alongside "internal" and "none"
Old 15th November 2017, 22:01   #209  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
Quote:
Originally Posted by edcrfv94 View Post
You can try this first, mt_lut at 16bit still faster than Expr 10% speed.
didn't try it but what about RAM usage ?
Old 15th November 2017, 22:08   #210  |  Link
edcrfv94
Registered User
 
Join Date: Apr 2015
Posts: 84
Quote:
Originally Posted by real.finder View Post
didn't try it but what about RAM usage ?
Almost no difference: mt_lut at 16 bit uses about 1 MB more RAM than Expr.
Old 16th November 2017, 09:43   #211  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by edcrfv94 View Post
You can try this first, mt_lut at 16bit still faster than Expr 10% speed.
I think it depends on the expression itself. What expression string did you use for comparison? (and that 10% means that lut is faster by 10% or lut needs only 10% time of Expr?)

And I'd like to ask you (or someone) with AVX2, could you please compare the performance of lut/lutxy with Expr on your machines (for a basic expression like "x x +" for lut and "x y -" for lut_xy, and a more complex one?) with optAvx2=False and optAVX2=true. For 8, 10 and 16 bits. (10 bit lut_xy has not that much memory overhead)

Thanks.
Old 16th November 2017, 11:39   #212  |  Link
edcrfv94
Registered User
 
Join Date: Apr 2015
Posts: 84
Quote:
Originally Posted by pinterf View Post
I think it depends on the expression itself. What expression string did you use for comparison? (and that 10% means that lut is faster by 10% or lut needs only 10% time of Expr?)

And I'd like to ask you (or someone) with AVX2, could you please compare the performance of lut/lutxy with Expr on your machines (for a basic expression like "x x +" for lut and "x y -" for lut_xy, and a more complex one?) with optAvx2=False and optAVX2=true. For 8, 10 and 16 bits. (10 bit lut_xy has not that much memory overhead)

Thanks.
The 3770K does not support AVX2, only AVX. (When my i9 PC arrives I can test again; that may take a month.)

With complex expressions, mt_lut is 10%+ faster than Expr at 8 to 16 bit.
mt_lutxy is faster than Expr at 8 bit; at other bit depths Expr is much faster.

i7-3770K @ 4.2 GHz

16bit Expr Y8: 201fps 49% cpu 42Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.Expr("x x +")
	ConvertToStacked().DitherPost(mode=6, ampo=1)
16bit mt_lut Y8: 200fps 45% cpu 43Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.mt_lut("x x +", y=3, u=1, v=1)
	ConvertToStacked().DitherPost(mode=6, ampo=1)
16bit Expr Y8: 155fps 34% cpu 43Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.Expr("x 111 + 3 * 100 - 2 / 2 ^ 0.02 * 4 ^ 11 +")
	ConvertToStacked().DitherPost(mode=6, ampo=1)
16bit mt_lut Y8: 201fps 47% cpu 43Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	last.mt_lut("x 111 + 3 * 100 - 2 / 2 ^ 0.02 * 4 ^ 11 +", y=3, u=1, v=1)
	ConvertToStacked().DitherPost(mode=6, ampo=1)
16bit Expr Y8: 159fps 34% cpu 50Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y -")
	ConvertToStacked().DitherPost(mode=6, ampo=1)
16bit mt_lutxy Y8: 36fps 16% cpu 50Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y -", y=3, u=1, v=1)
	ConvertToStacked().DitherPost(mode=6, ampo=1)
8bit Expr Y8: 708fps 12% cpu 35Mib Memory
Code:
	
SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y -")
	#ConvertToStacked().DitherPost(mode=6, ampo=1)
8bit mt_lutxy Y8: 636fps 12% cpu 35Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y -", y=3, u=1, v=1)
	#ConvertToStacked().DitherPost(mode=6, ampo=1)
8bit Expr Y8: 276fps 12% cpu 35Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?")
	#ConvertToStacked().DitherPost(mode=6, ampo=1)
8bit mt_lutxy Y8: 614fps 12% cpu 35Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	#ConvertBits(bits=16)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?", y=3, u=1, v=1)
	#ConvertToStacked().DitherPost(mode=6, ampo=1)
10bit Expr Y8: 220fps 12% cpu 47Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=10)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	Expr(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?")
	#ConvertToStacked().DitherPost(mode=6, ampo=1)
10bit mt_lutxy Y8: 225fps 12% cpu 48Mib Memory
Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12")
	
	ConvertToY8()
	trim(0, 5000)
	
	ConvertBits(bits=10)
	
	p1 = last
	p2 = p1.Invert("Y")
	
	mt_lutxy(p1, p2, "x y < x x y - 0.8 * - x x y - 0.9 * - ?", y=3, u=1, v=1)
	#ConvertToStacked().DitherPost(mode=6, ampo=1)

Last edited by edcrfv94; 16th November 2017 at 11:48.
Old 16th November 2017, 13:58   #213  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Quote:
Originally Posted by edcrfv94 View Post
The 3770K does not support AVX2, only AVX. (When my i9 PC arrives I can test again; that may take a month.)
Many thanks!
Can you check these two numbers again? The difference between them seems too big to me.
"x y -"
16bit Expr Y8: 159fps 34% cpu 50Mib Memory
8bit Expr Y8: 708fps 12% cpu 35Mib Memory

Edit: the difference occurred because of the final stacked + DitherPost conversion.

This script eliminates the conversion overhead before the lut/Expr step (no conversion occurs afterwards either, so AVSMeter can report a real comparison of the methods):
Code:
SetMemoryMax(3000)

bits=16 # set 8, 10, .. 16
lut=true # true:lut, false:expr
clipcount=1 # 1: clip/lut, 2: two clips/lutxy
simpleexpr=false # choose simple (true) or a more complex expression

format="YUV420P"+String(bits)
colorbars(width=1920, height=1080, pixel_type=format)
ConvertToY()
trim(0, 5000)
p1 = last
p2 = p1.Invert("Y").trim(0,-1).loop(p1.framecount()) # all cached, no speed penalty

expr_1d_simple  = "x x +"
expr_1d_complex = "x 111 + 3 * 100 - 2 / 2 ^ 0.02 * 4 ^ 11 +"
expr_2d_simple  = "x y -"
expr_2d_complex = "x y < x x y - 0.8 * - x x y - 0.9 * - ?"

expr = clipcount==1 ? (simpleexpr ? expr_1d_simple : expr_1d_complex) : (simpleexpr ? expr_2d_simple : expr_2d_complex)

if(clipcount==1) {
  result = lut ? mt_lut(p1, expr, y=3, u=1, v=1) : Expr(p1, expr)
} else {
  result = lut ? mt_lutxy(p1, p2, expr, y=3, u=1, v=1) : Expr(p1, p2, expr)
}
result
My results for avs+ x86 r2544 and masktools2 2.2.10:
Code:
i7-3770 @ 3.40 GHz (No AVX2)
results in [fps] reported by AvsMeter

Bits: 8         Lut/Expr 
Simple,  1 Clip  790/700
Complex, 1 Clip  796/306
Simple,  2 Clips 540/684
Complex, 2 Clips 540/219

Bits: 16         Lut/Expr 
Simple,  1 Clip  655/713
Complex, 1 Clip  664/300
Simple,  2 Clips  41.2/684 # masktools2 16 bit lutxy is not lookup but realtime calc
Complex, 2 Clips  6.45/224 # masktools2 16 bit lutxy is not lookup but realtime calc
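
A back-of-the-envelope sketch, in Python, of why a two-clip lookup table stops being practical at 16 bit. It assumes one table entry per (x, y) pair, with 1 byte per entry at 8 bit and 2 bytes per entry above that. How masktools2 actually lays out its tables is not stated in this thread, so treat the numbers as an estimate.

```python
# Illustration only: table layout and entry sizes are assumptions.

def lutxy_entries(bits: int) -> int:
    """Number of (x, y) combinations for two clips of the given bit depth."""
    return (1 << bits) ** 2

def lutxy_table_bytes(bits: int) -> int:
    """Table size, assuming 1 byte per entry at 8 bit and 2 bytes above."""
    bytes_per_entry = 1 if bits == 8 else 2
    return lutxy_entries(bits) * bytes_per_entry

for bits in (8, 10, 16):
    size = lutxy_table_bytes(bits)
    print(f"{bits:2d} bit: {size:>13,d} bytes ({size / 2**20:g} MiB)")
```

Under these assumptions the table is 64 KiB at 8 bit, 2 MiB at 10 bit and 8 GiB at 16 bit, which fits the remarks that 10-bit lut_xy has little memory overhead and that 16-bit lutxy is calculated in real time instead of being looked up.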

Last edited by pinterf; 16th November 2017 at 16:42.
Old 16th November 2017, 17:20   #214  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
the top is Expr, then Expr with optAvx2=False, then mt_lutxy, using this https://ark.intel.com/products/75459...up-to-2_60-GHz



Code:
colorbars()
converttoyv12
d=last
Expr(last,d,"x y -")
#Expr(last,d,"x y -",optAvx2=False)
#mt_lutxy(last,d,"x y -")

Last edited by real.finder; 16th November 2017 at 17:25.
Old 16th November 2017, 19:30   #215  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
What OS? Not much difference for this kind of expression with or w/o avx2
Old 16th November 2017, 19:43   #216  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
Quote:
Originally Posted by pinterf View Post
What OS? Not much difference for this kind of expression with or w/o avx2
win7 64 sp1, not my pc btw

Last edited by real.finder; 16th November 2017 at 19:46.
Old 30th November 2017, 09:33   #217  |  Link
edcrfv94
Registered User
 
Join Date: Apr 2015
Posts: 84
Code:
kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)
expr: Stack unbalanced at end of expression. Need to have exactly one value on the stack to return.

Code:
kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)
mt_lut: Output is garbage

Code:
kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)
kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)
Work fine.

Maybe the "- Expr optimization: eliminate ^1 +0 -0 *1 /1" change causes the problem?

Code:
	SetMemoryMax(3000)
	
	colorbars(width=1920, height=1080, pixel_type="yv12").killaudio().assumefps(25, 1)
	
	#ConvertToY8()
	trim(0, 5000)
	#Limiter()
	#InvertNeg()
	#VToY()
	
	src = last
	src_f = src.RemoveGrain(11, 11, 11)
	
	kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)
	#kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.0, y=3, u=3, v=3)
	#kf_limit_dif8_128_expr_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)
	#kf_limit_dif8_128_mt_test(src_f, src, thr=1.0, elast=1.01, y=3, u=3, v=3)

Function kf_limit_dif8_128_expr_test(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{
	smooth   = Default(smooth, True    )
	thr      = Default(thr,    1.0     )
	elast    = Default(elast,  smooth ? 3.0 : 128./thr)
	darkthr  = Default(darkthr,thr     )
	Y        = Default(Y,      3       )
	U        = Default(U,      3       )
	V        = Default(V,      3       )
	
	Y        = min(Y,     4)
	U        = min(U,     4)
	V        = min(V,     4)
	Yt       = Y == 3
	Ut       = U == 3
	Vt       = V == 3
	Y31      = Yt ? 3 : 1
	U31      = Ut ? 3 : 1
	V31      = Vt ? 3 : 1
	
	thr      = max(min(    thr, 128.0), 0.0)
	darkthr  = max(min(darkthr, 128.0), 0.0)
	elast    = max(elast, 1.0)
	mode     = thr == 0 && darkthr == 0 ? 4 : thr == 128 && darkthr == 128 ? 2 : 3
	smooth   = elast==1 ? False : smooth
	
	diffstr  = " x range_half - "
	elaststr = " "+string(elast)+" "
	
	thrstr   = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
	alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
	betastr  = thrstr+elaststr+" * "
	sexpr    = smooth   ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * range_half + "
	\                   : thrstr+diffstr+diffstr" abs / * range_half + "
	expr     = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= range_half "+sexpr+" ? ? "
	
	thrstrc  = " "+string(thr)+" scalef "
	alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
	betastrc = thrstrc+elaststr+" * "
	sexprc   = smooth   ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * range_half + "
	\                   : thrstrc+diffstr+diffstr" abs / * range_half + "
	exprc    = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= range_half "+sexprc+" ? ? "
	
	# diff   = filtered - original
	# alpha  = 1 / (thr * (elast - 1))
	# beta   = elast * thr
	# When smooth=True  :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + alpha * diff * (beta - abs(diff))
	# When smooth=False :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + thr * (diff / abs(diff))
	
	diff     = mt_makediff(filtered, original, y=Y31, u=U31, v=V31)
	ldiff    = expr(diff, expr, exprc, exprc)
	merged   = mt_adddiff(original, ldiff, y=Y31, u=U31, v=V31)
	merged   = Y==2 || U==2 || V==2 || Y==4 || U==4 || V==4 ? mt_lutxyz(filtered, original, merged, Y=Y==3?5:Y, U=U==3?5:U, V=V==3?5:V) : merged
	
	return   mode == 4 ? original
	\      : mode == 2 ? filtered
	\      :             merged
}
	
Function kf_limit_dif8_128_mt_test(clip filtered, clip original, bool "smooth", float "thr", float "elast", float "darkthr", int "Y", int "U", int "V")
{
	smooth   = Default(smooth, True    )
	thr      = Default(thr,    1.0     )
	elast    = Default(elast,  smooth ? 3.0 : 128./thr)
	darkthr  = Default(darkthr,thr     )
	Y        = Default(Y,      3       )
	U        = Default(U,      3       )
	V        = Default(V,      3       )
	
	Y        = min(Y,     4)
	U        = min(U,     4)
	V        = min(V,     4)
	Yt       = Y == 3
	Ut       = U == 3
	Vt       = V == 3
	Y31      = Yt ? 3 : 1
	U31      = Ut ? 3 : 1
	V31      = Vt ? 3 : 1
	
	thr      = max(min(    thr, 128.0), 0.0)
	darkthr  = max(min(darkthr, 128.0), 0.0)
	elast    = max(elast, 1.0)
	mode     = thr == 0 && darkthr == 0 ? 4 : thr == 128 && darkthr == 128 ? 2 : 3
	smooth   = elast==1 ? False : smooth
	
	diffstr  = " x range_half - "
	elaststr = " "+string(elast)+" "
	
	thrstr   = diffstr+" 0 > "+string(darkthr)+" scalef "+string(thr)+" scalef ? "
	alphastr = elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstr+" * / ? "
	betastr  = thrstr+elaststr+" * "
	sexpr    = smooth   ? alphastr+diffstr+" * "+betastr+diffstr+" abs - * range_half + "
	\                   : thrstr+diffstr+diffstr" abs / * range_half + "
	expr     = diffstr+" abs "+thrstr+" <= x "+diffstr+" abs "+betastr+" >= range_half "+sexpr+" ? ? "
	
	thrstrc  = " "+string(thr)+" scalef "
	alphastrc= elaststr+" 1 <= 0 1 "+elaststr+" 1 - "+thrstrc+" * / ? "
	betastrc = thrstrc+elaststr+" * "
	sexprc   = smooth   ? alphastrc+diffstr+" * "+betastrc+diffstr+" abs - * range_half + "
	\                   : thrstrc+diffstr+diffstr" abs / * range_half + "
	exprc    = diffstr+" abs "+thrstrc+" <= x "+diffstr+" abs "+betastrc+" >= range_half "+sexprc+" ? ? "
	
	# diff   = filtered - original
	# alpha  = 1 / (thr * (elast - 1))
	# beta   = elast * thr
	# When smooth=True  :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + alpha * diff * (beta - abs(diff))
	# When smooth=False :
	# output = diff <= thr  ? filtered : \
	#          diff >= beta ? original : \
	#                         original + thr * (diff / abs(diff))
	
	diff     = mt_makediff(filtered, original, y=Y31, u=U31, v=V31)
	ldiff    = mt_lut(diff, yexpr=expr, uexpr=exprc, vexpr=exprc, y=Y31, u=U31, v=V31)
	merged   = mt_adddiff(original, ldiff, y=Y31, u=U31, v=V31)
	merged   = Y==2 || U==2 || V==2 || Y==4 || U==4 || V==4 ? mt_lutxyz(filtered, original, merged, Y=Y==3?5:Y, U=U==3?5:U, V=V==3?5:V) : merged
	
	return   mode == 4 ? original
	\      : mode == 2 ? filtered
	\      :             merged
}
Old 30th November 2017, 10:38   #218  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,314
Missing + sign?
I can see
diffstr+diffstr"
instead of
diffstr+diffstr+"

Edit: elast=1 -> smooth=False, different expressions, syntax error was in the smooth=false branch
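
The Expr error quoted earlier in the thread ("Need to have exactly one value on the stack to return") is a stack-balance check on the RPN string. As a rough illustration in Python, the sketch below tracks stack depth token by token. The arity table is an assumption covering only tokens seen in this thread; it is not the real Expr or masktools2 parser.

```python
# Illustration only: a tiny stack-depth checker for RPN expression strings.
# The arities below are assumptions for the tokens used in this thread.
ARITY = {
    "+": 2, "-": 2, "*": 2, "/": 2, "^": 2,
    "<": 2, ">": 2, "<=": 2, ">=": 2,
    "abs": 1, "scalef": 1, "scaleb": 1,
    "?": 3,
}

def final_stack_depth(expr: str) -> int:
    """Return how many values are left on the stack after evaluating expr."""
    depth = 0
    for token in expr.split():
        needed = ARITY.get(token, 0)  # unknown tokens count as operands (x, y, constants)
        if depth < needed:
            raise ValueError(f"stack underflow at token {token!r}")
        depth = depth - needed + 1    # pop the inputs, push one result
    return depth

def is_balanced(expr: str) -> bool:
    """A valid expression leaves exactly one value on the stack."""
    try:
        return final_stack_depth(expr) == 1
    except ValueError:
        return False

print(is_balanced("x y < x x y - 0.8 * - x x y - 0.9 * - ?"))  # True
print(is_balanced("x range_half - abs x"))                      # False
```

Running the generated expression string through a check like this before passing it to a filter would flag a fragment dropped during string concatenation.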
Old 30th November 2017, 11:03   #219  |  Link
edcrfv94
Registered User
 
Join Date: Apr 2015
Posts: 84
Quote:
Originally Posted by pinterf View Post
Missing + sign?
I can see
diffstr+diffstr"
instead of
diffstr+diffstr+"

Edit: elast=1 -> smooth=False, different expressions, syntax error was in the smooth=false branch
You are right, thanks! Does mt_lut give no error message for stack mistakes?
Old 6th December 2017, 21:47   #220  |  Link
real.finder
Registered User
 
Join Date: Jan 2012
Location: Mesopotamia
Posts: 2,587
any news on scaling things for both expr and lut?