Old 26th March 2025, 19:05   #261  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
Nothing specific; I just want to keep AVS compatibility for people who still use it with my plugins.
You're talking to someone who is still on Windows 7 (with Open-Shell set to the Windows XP interface) because I hate the new interface, and even with Open-Shell the Windows 10 interface is not to my liking.
Unfortunately, I'll have no choice for the new rig I'm building...
__________________
My github.
Old 2nd April 2025, 08:44   #262  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
I will not implement the new optimized SSE/AVX2 code, but I will implement the rest (or at least try to): the new coefficient calculation and the chroma handling.

I've pushed the first step (no build made yet). The new coefficient calculation is implemented, but that's all for now; the chroma handling is still the same as before.
I've only run a few tests. If anyone is interested in building and testing, go ahead!
Old 3rd April 2025, 19:14   #263  |  Link
Jamaika
Registered User
 
Join Date: Jul 2015
Posts: 872
Quote:
Originally Posted by jpsdr View Post
I've pushed the first step (no build made yet). The new coefficient calculation is implemented, but that's all for now; the chroma handling is still the same as before.
I've only run a few tests. If anyone is interested in building and testing, go ahead!
Code:
resample_functions.h:121:32: error: 'memset' was not declared in this scope
  121 |         if (bits_per_pixel<32) memset(pixel_coefficient,0,sizeof(short)*target_size*filter_size);
      |                                ^~~~~~
I'm curious. Will all the aligned functions be cut out in the end, as in AviSynth?
Code:
	switch(data->f_process)
	{
		//case 1 : ptrClass->ResamplerLumaAlignedMT(MT_DataGF);
			//break;
		case 2 : ptrClass->ResamplerLumaUnalignedMT(MT_DataGF);
			break;
		//case 3 : ptrClass->ResamplerUChromaAlignedMT(MT_DataGF);
			//break;
		case 4 : ptrClass->ResamplerUChromaUnalignedMT(MT_DataGF);
			break;
		//case 5 : ptrClass->ResamplerVChromaAlignedMT(MT_DataGF);
			//break;
		case 6 : ptrClass->ResamplerVChromaUnalignedMT(MT_DataGF);
			break;
		//case 7 : ptrClass->ResamplerLumaAlignedMT2(MT_DataGF);
			//break;
		case 8 : ptrClass->ResamplerLumaUnalignedMT2(MT_DataGF);
			break;
		//case 9 : ptrClass->ResamplerLumaAlignedMT3(MT_DataGF);
			//break;
		case 10 : ptrClass->ResamplerLumaUnalignedMT3(MT_DataGF);
			break;
		//case 11 : ptrClass->ResamplerLumaAlignedMT4(MT_DataGF);
			//break;
		case 12 : ptrClass->ResamplerLumaUnalignedMT4(MT_DataGF);
			break;
		default : ;
	}
I am interested in the output parameter color range tv {range=2}. When is it active?

Last edited by Jamaika; 3rd April 2025 at 20:29.
Old 4th April 2025, 18:10   #264  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
I have no issue building with either Visual Studio 2010 or 2019, so the memset issue is odd...
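For what it's worth, this kind of error usually just means that the header declaring memset was never included in that translation unit: MSVC tends to pull it in indirectly through other headers, while GCC often does not. A minimal sketch, assuming the fix is simply an explicit include at the top of resample_functions.h (the helper name clear_coefficients is hypothetical and only mirrors the failing line):

```cpp
#include <cstring>   // declares std::memset
#include <cstddef>   // std::size_t

// Hypothetical helper mirroring the failing line: zero a block of
// 16-bit coefficients holding target_size * filter_size entries.
void clear_coefficients(short *pixel_coefficient,
                        std::size_t target_size, std::size_t filter_size)
{
    std::memset(pixel_coefficient, 0, sizeof(short) * target_size * filter_size);
}
```

With the include in place, the same source should build with both compiler families.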

As I said, there will be no change to the core resample function in my code. My code is:
Code:
void FilteredResizeV::StaticThreadpoolV(void *ptr)
{
	Public_MT_Data_Thread *data=(Public_MT_Data_Thread *)ptr;
	FilteredResizeV *ptrClass=(FilteredResizeV *)data->pClass;
	MT_Data_Info_ResampleMT *MT_DataGF=((MT_Data_Info_ResampleMT *)data->pData)+data->thread_Id;
	
	switch(data->f_process)
	{
		case 1 : ptrClass->ResamplerLumaAlignedMT(MT_DataGF);
			break;
		case 2 : ptrClass->ResamplerLumaUnalignedMT(MT_DataGF);
			break;
		case 3 : ptrClass->ResamplerUChromaAlignedMT(MT_DataGF);
			break;
		case 4 : ptrClass->ResamplerUChromaUnalignedMT(MT_DataGF);
			break;
		case 5 : ptrClass->ResamplerVChromaAlignedMT(MT_DataGF);
			break;
		case 6 : ptrClass->ResamplerVChromaUnalignedMT(MT_DataGF);
			break;
		case 7 : ptrClass->ResamplerLumaAlignedMT2(MT_DataGF);
			break;
		case 8 : ptrClass->ResamplerLumaUnalignedMT2(MT_DataGF);
			break;			
		case 9 : ptrClass->ResamplerLumaAlignedMT3(MT_DataGF);
			break;
		case 10 : ptrClass->ResamplerLumaUnalignedMT3(MT_DataGF);
			break;			
		case 11 : ptrClass->ResamplerLumaAlignedMT4(MT_DataGF);
			break;
		case 12 : ptrClass->ResamplerLumaUnalignedMT4(MT_DataGF);
			break;			
		default : ;
	}
}
and will not change.
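How f_process gets its value is not shown in this thread. As a rough sketch only, assuming the aligned variants require 16-byte aligned pointers and pitches (the helper names and the 16-byte figure are assumptions, not taken from the plugin), the choice between case 1 and case 2 could look like this:

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical helper: true when both the start address and the row pitch
// are multiples of `alignment` bytes (16 is an assumption for SSE loads).
inline bool is_aligned(const void *ptr, std::ptrdiff_t pitch,
                       std::size_t alignment = 16)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(ptr);
    const std::size_t abs_pitch =
        static_cast<std::size_t>(pitch < 0 ? -pitch : pitch);
    return (addr % alignment == 0) && (abs_pitch % alignment == 0);
}

// Map to the luma case numbers of the switch above:
// 1 = ResamplerLumaAlignedMT, 2 = ResamplerLumaUnalignedMT.
inline int select_luma_process(const void *src, std::ptrdiff_t src_pitch,
                               const void *dst, std::ptrdiff_t dst_pitch)
{
    return (is_aligned(src, src_pitch) && is_aligned(dst, dst_pitch)) ? 1 : 2;
}
```

Keeping both variants means the unaligned path is only a fallback for frames whose buffers do not meet the alignment requirement.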

If no range is specified (default mode auto), tv is active for all YUV formats.
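To make the rule concrete, here is a small sketch. Only "auto is the default" and "tv is active for all YUV formats" come from the answer above; the enum and function names are hypothetical, and treating non-YUV input as "not tv" in auto mode is an assumption. The limits are the usual limited-range values (16..235 luma, 16..240 chroma at 8 bits), scaled by a left shift for higher integer bit depths.

```cpp
#include <algorithm>

// Hypothetical names, for illustration only.
enum class OutputRange { Auto, Tv };

struct Limits { int luma_min, luma_max, chroma_min, chroma_max; };

// Limited ("TV") range limits for integer YUV at the given bit depth.
inline Limits tv_limits(int bits_per_pixel)
{
    const int shift = bits_per_pixel - 8;
    return { 16 << shift, 235 << shift, 16 << shift, 240 << shift };
}

// Auto resolves to tv for YUV; the non-YUV result is an assumption.
inline bool tv_active(OutputRange requested, bool is_yuv)
{
    if (requested == OutputRange::Tv) return true;
    return is_yuv;
}

// Clamp a luma sample to the limited range.
inline int clamp_luma_tv(int value, int bits_per_pixel)
{
    const Limits l = tv_limits(bits_per_pixel);
    return std::min(std::max(value, l.luma_min), l.luma_max);
}
```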
Old 4th April 2025, 21:16   #265  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
Still no build for now, but I've pushed the final step of the resampler update, which adds the keep_center and placement parameters.
A very first quick test seems to work.
Old 6th April 2025, 11:35   #266  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
... It seems that I have an issue with AVX2 and AVS+, but not with AVX2 and AVS...
Argh...!
My standard PC with Visual Studio doesn't have AVX2, and my default is AVS, so I didn't see it in my very quick test...
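A quick way to confirm which machines can exercise the AVX2 path at all is a runtime check. This is a generic sketch, not the detection code used by the plugin or by AviSynth; the function name cpu_has_avx2 is made up for the example.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#include <immintrin.h>
#endif

// Returns true when the CPU reports AVX2 (and, on MSVC, when the OS
// also saves the YMM register state).
inline bool cpu_has_avx2()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, 0);
    if (regs[0] < 7) return false;                 // CPUID leaf 7 missing
    __cpuid(regs, 1);
    const bool osxsave = (regs[2] & (1 << 27)) != 0;
    const bool avx     = (regs[2] & (1 << 28)) != 0;
    if (!osxsave || !avx) return false;
    if ((_xgetbv(0) & 0x6) != 0x6) return false;   // XMM and YMM state enabled
    __cpuidex(regs, 7, 0);
    return (regs[1] & (1 << 5)) != 0;              // EBX bit 5 = AVX2
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;                                  // non-x86 build
#endif
}
```

On a machine where this returns false, the AVX2 code path is never taken, so a bug there stays invisible in a quick test.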
Old 7th April 2025, 21:56   #267  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,318
Up to Visual Studio 2017, there is good (full) integration of Intel SDE. So if you use VS2015 (?) or VS2017 (and maybe some older versions?), you can download and install Intel SDE and get full AVX2 (and AVX512) emulation on just about any Intel (?) CPU. Simply select the SDE debugger in the IDE.
Old 7th April 2025, 21:59   #268  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
Pushed a new version. I made more tests, and I think everything is good now.
Builds coming soon.
Old 9th April 2025, 18:09   #269  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
New version, see first post.
Old 10th April 2025, 14:44   #270  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,318
With ver 2.5.1, edge processing looks good enough. There is only a very small difference from the internal AVS+ resize:
Code:
Loadplugin("ResampleMT.dll")


Function Diff(clip src1, clip src2)
{
  return Subtract(src1.ConvertBits(8),src2.ConvertBits(8)).Levels(120, 1, 255-120, 0, 255, coring=false)
}

BlankClip(100, 200, 100, color=$7F7F7F, pixel_type="YV12")

AddBorders(2, 2, 2, 2, r=2, param1=8)

pad=50

std=LanczosResize(width*2, height*2, taps=16).Subtitle("AVS+ Std 2xLanczosResize taps=16", align=5)

mt=LanczosResizeMT(width*2, height*2, taps=16).Subtitle("ResampleMT  2xLanczosResize taps=16", align=5)

d1 = Diff(mt,std)
d2 = Diff(mt,std)

StackHorizontal(StackVertical(std, mt), Stackvertical(d1, d2))
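To show why even tiny differences become visible in the Diff() output, here is a sketch of the gamma = 1 case of the mapping that Levels(120, 1, 255-120, 0, 255, coring=false) applies to 8-bit values. The function name levels_8bit is hypothetical and the rounding is an assumption; AviSynth's internal arithmetic may differ in detail.

```cpp
#include <algorithm>
#include <cmath>

// Linear (gamma = 1) Levels mapping: stretch [in_low, in_high] onto
// [out_low, out_high] and clamp to the 8-bit range.
inline int levels_8bit(int in, int in_low = 120, int in_high = 135,
                       int out_low = 0, int out_high = 255)
{
    const double t = static_cast<double>(in - in_low) / (in_high - in_low);
    const double out = t * (out_high - out_low) + out_low;
    const double clamped = std::min(std::max(out, 0.0), 255.0);
    return static_cast<int>(std::lround(clamped));
}
```

With these parameters a 15-value input window is stretched over the full 0..255 range, so each code value of difference around mid-gray becomes 17 output steps.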
Old 10th April 2025, 18:07   #271  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
... In theory, there shouldn't be any difference...
I'll make some quick tests, but for now I will not spend more time on this.
Old 10th April 2025, 21:45   #272  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,318
The SIMD resampling functions are still different. I use an E7500 CPU, which predates AVX, so it runs some SSE SIMD code path for the computation.

There is no difference only with SetMaxCPU("none").
Old 14th April 2025, 16:09   #273  |  Link
pinterf
Registered User
 
Join Date: Jan 2014
Posts: 2,472
A small note on why the changes shouldn't be adopted immediately: The resampler code (SSEx, AVX2) is being modified slightly. The 8-bit and 10-16 bit cases differed only in a few lines, so their code was mostly duplicated. I've unified them and put them into templates. Additionally, the plain C code is being reorganized to help compiler auto-vectorization when built on non-Intel systems. Neither of these changes has been committed to the live system yet, as cleanup is needed on my side. Since there is no release date push on me, it may be finished in some weeks.
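To illustrate the kind of unification described above, here is a plain C++ sketch of a vertical resampling row templated on the pixel type, so that 8-bit and 10-16 bit share one body. This is not the AviSynth+ code: the function name, the parameter list and the fixed-point precision passed as fp_bits are all assumptions made for the example.

```cpp
#include <cstdint>
#include <cstddef>
#include <algorithm>

// One output row of a vertical resampler. src points at the first
// contributing row; src_stride is given in pixels, not bytes.
template <typename pixel_t>
void resample_row_v(pixel_t *dst, const pixel_t *src, std::ptrdiff_t src_stride,
                    const short *coeff, int filter_size, int width,
                    int bits_per_pixel, int fp_bits)
{
    const int max_val = (1 << bits_per_pixel) - 1;
    const int rounder = 1 << (fp_bits - 1);       // round to nearest
    for (int x = 0; x < width; ++x) {
        int acc = rounder;
        for (int k = 0; k < filter_size; ++k)
            acc += static_cast<int>(src[x + k * src_stride]) * coeff[k];
        acc >>= fp_bits;                          // back from fixed point
        dst[x] = static_cast<pixel_t>(std::min(std::max(acc, 0), max_val));
    }
}
```

The only per-format differences left are the pixel type and the clamp limit, which is what makes a single template practical.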
Old 14th April 2025, 16:45   #274  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,433
Thanks for this information.

Anyway, I've just updated the chroma position and the coefficient/border calculation.

I didn't update the resampler code (SSEx, AVX2), and honestly I don't intend to; it would be too much work. There is already optimized code (which is good enough for me and compatible with both AVS and AVS+), and I've implemented what really matters to me: chroma and coefficients.
Old 14th April 2025, 18:58   #275  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 1,318
Quote:
Originally Posted by pinterf View Post
A small note on why the changes shouldn't be adopted immediately: The resampler code (SSEx, AVX2) is being modified slightly.
I see you currently use a dual 256-bit SIMD dataword load + processing + store for better performance at
https://github.com/AviSynth/AviSynth...e_avx2.cpp#L84

But have you tested a larger 'workunit size' for AVX2 (also on AVX512 chips)? For example, four 256-bit SIMD datawords in 'parallel'? If some chips have more dispatch ports, or wider ports (like 512-bit), they may be able to dispatch more operations per cycle. Or is this something to test in the future? Also, is an AVX512 version expected?

For some of the instructions used, such as _mm256_add_epi32, the throughput is 3 results per cycle even on chips that are already very old.

The latency of _mm256_madd_epi16 is 5 cycles, so it outputs 2 results after a 5-cycle delay, but it can output 2 results per cycle on each following cycle if the sources are ready to compute. So to hide the startup latency it may also be better to provide more than 2 source datasets to compute.
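The idea can be sketched without intrinsics: a scalar dot product with four independent accumulators, so that consecutive multiply-adds do not wait on each other. This is only an analogue with a made-up function name; in an AVX2 version each accumulator would be a __m256i fed by _mm256_madd_epi16 and summed with _mm256_add_epi32, and whether 4 beats 2 on a given chip has to be measured.

```cpp
#include <cstdint>
#include <cstddef>

// Dot product of 16-bit samples and coefficients using four independent
// 32-bit accumulators to shorten the dependency chain.
inline std::int32_t dot_4acc(const std::int16_t *a, const std::int16_t *b,
                             std::size_t n)
{
    std::int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)            // leftover elements
        acc0 += a[i] * b[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```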

We also have an example of a larger 'workunit size' for AVX2 processing in the vsTTempSmooth plugin, which loads and processes 4x256 bits of data per loop iteration: https://github.com/Asd-g/AviSynth-vs...h_AVX2.cpp#L71 As I remember, this gave some performance boost over 1 and 2 256-bit datasets.

Last edited by DTL; 16th April 2025 at 20:35.