Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.
#261
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,433
Nothing specific; I just want to keep AVS compatibility for people who are still using it along with my plugins.
You're talking to someone who's still on Windows 7 (with Openshell and the Windows XP interface), because I hate the new interface, and even with Openshell the Windows 10 interface is not to my liking. But unfortunately, I'll have no choice for the new rig I'm building...
__________________
My github.
#262
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,433
I will not implement the new optimized SSE/AVX2 code, but I will implement the rest, the new coefficient calculation and the chroma handling (or at least I'll try).
I've pushed the first step of it (but made no build). The new coefficient calculation is implemented, but that's all for now; the chroma is still the same as before. I've made only a few tests. If some people are also interested in testing and building, go ahead!
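For readers following along, a rough idea of what a resizer's coefficient calculation involves: for each output pixel, evaluate the kernel at the nearby source positions, fold taps that fall outside the image onto the edge pixel, normalize, and quantize to fixed point. The sketch below is a minimal illustration under assumptions (Lanczos kernel, pixel-center mapping, 14-bit fixed-point weights, edge clamping); `lanczos` and `lanczos_weights` are hypothetical names and this is not the ResampleMT or AviSynth+ code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Lanczos kernel: sinc(x) * sinc(x / taps) for |x| < taps, 0 elsewhere.
double lanczos(double x, int taps) {
    const double pi = 3.14159265358979323846;
    x = std::fabs(x);
    if (x < 1e-9) return 1.0;
    if (x >= taps) return 0.0;
    const double px = pi * x;
    return taps * std::sin(px) * std::sin(px / taps) / (px * px);
}

// Hypothetical helper: fixed-point weights for one output pixel when resizing
// src_w -> dst_w. Returns weights for source pixels first_src, first_src+1, ...
// Taps outside [0, src_w) are folded onto the nearest edge pixel (clamping).
std::vector<int> lanczos_weights(int dst_x, int src_w, int dst_w, int taps,
                                 int frac_bits, int& first_src) {
    assert(src_w > 0 && dst_w > 0 && taps > 0);
    const double ratio = double(src_w) / dst_w;
    const double scale = std::max(1.0, ratio);          // widen the kernel when downscaling
    const double support = taps * scale;
    const double center = (dst_x + 0.5) * ratio - 0.5;  // source coordinate of the output pixel
    const int first = int(std::floor(center - support)) + 1;
    const int last = int(std::floor(center + support));

    first_src = std::clamp(first, 0, src_w - 1);
    const int last_src = std::clamp(last, 0, src_w - 1);
    std::vector<double> w(last_src - first_src + 1, 0.0);
    double total = 0.0;
    for (int i = first; i <= last; ++i) {
        const double v = lanczos((i - center) / scale, taps);
        w[std::clamp(i, 0, src_w - 1) - first_src] += v;  // edge clamping
        total += v;
    }

    // Normalize, quantize, and push the rounding error into the largest weight
    // so the integer weights sum to exactly 1 << frac_bits.
    const int one = 1 << frac_bits;
    std::vector<int> q(w.size());
    int sum = 0;
    for (std::size_t k = 0; k < w.size(); ++k) {
        q[k] = int(std::lround(w[k] / total * one));
        sum += q[k];
    }
    auto biggest = std::max_element(q.begin(), q.end());
    *biggest += one - sum;
    return q;
}
```

Folding out-of-range taps onto the edge pixel is only one way to treat borders (mirroring is another); nothing here says which one ResampleMT uses.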
#263
Registered User
Join Date: Jul 2015
Posts: 872
Quote:
Code:
resample_functions.h:121:32: error: 'memset' was not declared in this scope
  121 | if (bits_per_pixel<32) memset(pixel_coefficient,0,sizeof(short)*target_size*filter_size);
      |                        ^~~~~~
Code:
switch(data->f_process)
{
    //case 1 : ptrClass->ResamplerLumaAlignedMT(MT_DataGF);
    //break;
    case 2 : ptrClass->ResamplerLumaUnalignedMT(MT_DataGF);
    break;
    //case 3 : ptrClass->ResamplerUChromaAlignedMT(MT_DataGF);
    //break;
    case 4 : ptrClass->ResamplerUChromaUnalignedMT(MT_DataGF);
    break;
    //case 5 : ptrClass->ResamplerVChromaAlignedMT(MT_DataGF);
    //break;
    case 6 : ptrClass->ResamplerVChromaUnalignedMT(MT_DataGF);
    break;
    //case 7 : ptrClass->ResamplerLumaAlignedMT2(MT_DataGF);
    //break;
    case 8 : ptrClass->ResamplerLumaUnalignedMT2(MT_DataGF);
    break;
    //case 9 : ptrClass->ResamplerLumaAlignedMT3(MT_DataGF);
    //break;
    case 10 : ptrClass->ResamplerLumaUnalignedMT3(MT_DataGF);
    break;
    //case 11 : ptrClass->ResamplerLumaAlignedMT4(MT_DataGF);
    //break;
    case 12 : ptrClass->ResamplerLumaUnalignedMT4(MT_DataGF);
    break;
    default : ;
}
Last edited by Jamaika; 3rd April 2025 at 20:29.
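The GCC error in this post most likely points to a missing include rather than a code problem: `memset` is declared in `<cstring>` (or `<string.h>`, which guarantees the unqualified `::memset` form the header uses). A plausible reason the Visual Studio builds work is that MSVC's standard headers happen to pull it in indirectly, but that is an assumption. A minimal sketch of the line in question with the header included explicitly; `clear_coefficients` is a hypothetical wrapper, not a function from resample_functions.h.

```cpp
#include <cassert>
#include <cstring>  // declares std::memset (and, in practice, ::memset)

// Hypothetical wrapper around the statement at resample_functions.h:121:
// zero the 16-bit coefficient table unless the clip is 32-bit (float).
void clear_coefficients(short* pixel_coefficient, int target_size,
                        int filter_size, int bits_per_pixel) {
    assert(target_size >= 0 && filter_size >= 0);
    if (bits_per_pixel < 32)
        std::memset(pixel_coefficient, 0,
                    sizeof(short) * target_size * filter_size);
}
```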
#264
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,433
I have no issue building with either Visual Studio 2010 or 2019; the memset issue is odd...
As I said, there will be no change in my code for the core resampling functions. My code is:
Code:
void FilteredResizeV::StaticThreadpoolV(void *ptr)
{
    Public_MT_Data_Thread *data=(Public_MT_Data_Thread *)ptr;
    FilteredResizeV *ptrClass=(FilteredResizeV *)data->pClass;
    MT_Data_Info_ResampleMT *MT_DataGF=((MT_Data_Info_ResampleMT *)data->pData)+data->thread_Id;

    switch(data->f_process)
    {
        case 1 : ptrClass->ResamplerLumaAlignedMT(MT_DataGF); break;
        case 2 : ptrClass->ResamplerLumaUnalignedMT(MT_DataGF); break;
        case 3 : ptrClass->ResamplerUChromaAlignedMT(MT_DataGF); break;
        case 4 : ptrClass->ResamplerUChromaUnalignedMT(MT_DataGF); break;
        case 5 : ptrClass->ResamplerVChromaAlignedMT(MT_DataGF); break;
        case 6 : ptrClass->ResamplerVChromaUnalignedMT(MT_DataGF); break;
        case 7 : ptrClass->ResamplerLumaAlignedMT2(MT_DataGF); break;
        case 8 : ptrClass->ResamplerLumaUnalignedMT2(MT_DataGF); break;
        case 9 : ptrClass->ResamplerLumaAlignedMT3(MT_DataGF); break;
        case 10 : ptrClass->ResamplerLumaUnalignedMT3(MT_DataGF); break;
        case 11 : ptrClass->ResamplerLumaAlignedMT4(MT_DataGF); break;
        case 12 : ptrClass->ResamplerLumaUnalignedMT4(MT_DataGF); break;
        default : ;
    }
}
If no range is specified (default mode auto), the tv range is used for all YUV formats.
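The `StaticThreadpoolV` function above follows a common pattern: a thread pool can only call a plain function taking a `void*`, so a static member unpacks the object pointer and this thread's slice of the work, then a job id selects the member function to run. A stripped-down, self-contained sketch of that pattern; `WorkSlice`, `ThreadPacket`, `Resizer` and their members are hypothetical stand-ins, not ResampleMT's types.

```cpp
#include <cassert>

// Hypothetical per-thread slice of work (stands in for MT_Data_Info_ResampleMT).
struct WorkSlice {
    int row_begin, row_end;
    int rows_done;
    int last_job;
};

// Hypothetical packet handed to the pool (stands in for Public_MT_Data_Thread).
struct ThreadPacket {
    void* pClass;   // the filter object
    void* pData;    // array of WorkSlice, one per thread
    int thread_Id;  // index of this thread's slice
    int f_process;  // job id selecting what to run
};

class Resizer {
public:
    // Static trampoline: a pool callback cannot be a non-static member function,
    // so the object and the per-thread data travel through a void*.
    static void StaticThreadpool(void* ptr) {
        ThreadPacket* data = static_cast<ThreadPacket*>(ptr);
        Resizer* self = static_cast<Resizer*>(data->pClass);
        WorkSlice* slice = static_cast<WorkSlice*>(data->pData) + data->thread_Id;
        switch (data->f_process) {
            case 1: self->ProcessLuma(slice); break;
            case 2: self->ProcessChroma(slice); break;
            default: break;  // unknown job id: nothing to do
        }
    }

private:
    // Placeholders for the real resamplers: record what ran and on how many rows.
    // Each call writes only to its own slice, so workers need no locking.
    void ProcessLuma(WorkSlice* s) { s->rows_done = s->row_end - s->row_begin; s->last_job = 1; }
    void ProcessChroma(WorkSlice* s) { s->rows_done = s->row_end - s->row_begin; s->last_job = 2; }
};
```

In a real pool each `ThreadPacket` would be queued to a different worker; because every worker touches only its own `WorkSlice`, the dispatch itself needs no synchronization.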
#266
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,433
... It seems that I have an issue with AVX2 and AVS+, but not with AVX2 and AVS...
Argh! As my standard PC with Visual Studio doesn't have AVX2, and I test with AVS by default, I didn't see it in a very quick test...
#267
Registered User
Join Date: Jul 2018
Posts: 1,318
Up to Visual Studio 2017 there is good (full) integration of Intel SDE. So if you use VS2015 (?) or VS2017 (and maybe some older versions?), you can download and install Intel SDE and get full AVX2 (and AVX-512) emulation on just about any Intel (?) CPU. Simply select the SDE debugger in the IDE.
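When testing under SDE (or on a second machine), it helps to confirm which instruction set a runtime dispatcher will actually see. A minimal GCC/Clang sketch using `__builtin_cpu_supports`, which queries CPUID at run time. SDE is understood to emulate CPUID for the selected target CPU, so under emulation this should report the emulated features, but treat that as an assumption. `best_simd_level` is a hypothetical name; inside an AviSynth plugin the host's `env->GetCPUFlags()` would normally be used instead, and this sketch simply reports "c" on compilers without the builtin.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: name of the widest SIMD level this CPU (or emulator)
// reports, checked in the order an SSE2/SSE4.1/AVX2/AVX-512 dispatcher might use.
std::string best_simd_level() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // required only before constructors run; harmless otherwise
    if (__builtin_cpu_supports("avx512f")) return "avx512f";
    if (__builtin_cpu_supports("avx2")) return "avx2";
    if (__builtin_cpu_supports("sse4.1")) return "sse4.1";
    if (__builtin_cpu_supports("sse2")) return "sse2";
#endif
    return "c";  // plain C path (also used on compilers without the builtin)
}
```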
#270
Registered User
Join Date: Jul 2018
Posts: 1,318
With ver 2.5.1, edge processing looks good enough. There is only a very small difference from the internal AVS+ resizer:
Code:
Loadplugin("ResampleMT.dll")

Function Diff(clip src1, clip src2)
{
    return Subtract(src1.ConvertBits(8),src2.ConvertBits(8)).Levels(120, 1, 255-120, 0, 255, coring=false)
}

BlankClip(100, 200, 100, color=$7F7F7F, pixel_type="YV12")
AddBorders(2, 2, 2, 2, r=2, param1=8)
pad=50
std=LanczosResize(width*2, height*2, taps=16).Subtitle("AVS+ Std 2xLanczosResize taps=16", align=5)
mt=LanczosResizeMT(width*2, height*2, taps=16).Subtitle("ResampleMT 2xLanczosResize taps=16", align=5)
d1 = Diff(mt,std)
d2 = Diff(mt,std)
StackHorizontal(StackVertical(std, mt), Stackvertical(d1, d2))
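In the `Diff` helper of the script above, `Levels(120, 1, 255-120, 0, 255, coring=false)` maps the narrow input band 120..135 onto 0..255, so each input level of difference becomes 255 / 15 = 17 output levels; that is what makes the small ResampleMT/AVS+ differences visible. A sketch of that linear mapping, assuming gamma = 1 and round-to-nearest (AviSynth's exact rounding may differ); `levels_linear` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Linear Levels (gamma = 1, no coring): stretch [in_lo, in_hi] to [out_lo, out_hi],
// clamping inputs outside the band.
int levels_linear(int v, int in_lo, int in_hi, int out_lo, int out_hi) {
    assert(in_hi > in_lo);
    const int c = std::clamp(v, in_lo, in_hi);
    const double t = double(c - in_lo) / (in_hi - in_lo);
    return out_lo + int(std::lround(t * (out_hi - out_lo)));
}
```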
#273
Registered User
Join Date: Jan 2014
Posts: 2,472
A small note on why the changes shouldn't be adopted immediately: the resampler code (SSEx, AVX2) is being modified slightly. The 8-bit and 10-16 bit cases differed only in a few lines, so their code was mostly duplicated. I've unified them and put them into templates. Additionally, the plain C code is being reorganized to help compiler auto-vectorization when built for non-Intel systems. Neither of these changes has been committed to the live system yet, as cleanup is needed on my side. Since there is no release-date pressure on me, it may be finished in a few weeks.
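The unification described here (one template instead of near-duplicate 8-bit and 10-16 bit code, and a plain C loop shaped so the compiler can vectorize it) can be sketched roughly as follows. This is an illustration under assumptions (14-bit fixed-point coefficients, a vertical pass over one output row, 64-bit accumulators for simplicity), not the AviSynth+ code; `resample_row_v` and `kFracBits` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kFracBits = 14;  // assumed fixed-point precision of the weights

// Hypothetical vertical resampler for one output row. One template serves
// 8-bit (uint8_t) and 10-16 bit (uint16_t) clips; only the clamp limit differs.
template <typename pixel_t>
void resample_row_v(const pixel_t* const* src_rows,  // filter_size input rows
                    const int16_t* coeffs,           // weights summing to 1 << kFracBits
                    int filter_size, int width, int bits_per_pixel,
                    pixel_t* dst) {
    assert(filter_size > 0 && width >= 0);
    assert(bits_per_pixel >= 8 && bits_per_pixel <= 16);
    // 64-bit accumulators, initialized to the rounding constant, avoid overflow
    // for 16-bit input with large kernel lobes.
    std::vector<int64_t> acc(width, int64_t(1) << (kFracBits - 1));

    // Tap by tap: each pass is a multiply-add over one contiguous row with no
    // dependency between x iterations, a loop shape compilers can usually vectorize.
    for (int k = 0; k < filter_size; ++k) {
        const pixel_t* row = src_rows[k];
        const int64_t c = coeffs[k];
        for (int x = 0; x < width; ++x)
            acc[x] += c * row[x];
    }

    // Scale back, then clamp the overshoot/undershoot from negative lobes.
    // (>> on a negative value is an arithmetic shift on mainstream compilers,
    // and is defined that way since C++20.)
    const int64_t max_val = (int64_t(1) << bits_per_pixel) - 1;
    for (int x = 0; x < width; ++x)
        dst[x] = static_cast<pixel_t>(std::clamp<int64_t>(acc[x] >> kFracBits, 0, max_val));
}
```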
#274
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,433
Thanks for this information.
Anyway, I've just updated the chroma position and the coefficient/border calculation. I didn't update the resampler code (SSEx, AVX2), and honestly don't intend to; it would be too much work. There is already optimized code (which is good enough for me and compatible with both AVS and AVS+), and I've implemented what really matters to me: chroma and coefficients.
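As an example of what a chroma-position correction involves: with MPEG-2 style (left-sited) chroma, as in typical 4:2:0 video, chroma samples sit on the even luma columns rather than halfway between them, so a horizontal chroma resize needs an extra source offset compared with the center-aligned mapping used for luma. Working in chroma-pixel units with r = src_width / dst_width, mapping the left-sited position through the luma scale gives an offset of 0.25 * (1 - r). A small sketch of that derivation, assuming horizontal subsampling by 2 and the pixel-center convention; it is an illustration, not necessarily the formula ResampleMT uses, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Source chroma coordinate for destination chroma sample j with the usual
// center-aligned mapping (same formula as for luma).
double centered_src(double j, double r) { return (j + 0.5) * r - 0.5; }

// Source chroma coordinate for left-sited chroma (horizontal subsampling by 2):
// chroma sample i sits at luma coordinate 2*i + 0.5 (pixel centers at +0.5).
// Map dst chroma j -> dst luma -> src luma -> src chroma index.
double left_sited_src(double j, double r) {
    const double dst_luma = 2.0 * j + 0.5;
    const double src_luma = dst_luma * r;  // luma scale with image edges aligned
    return (src_luma - 0.5) / 2.0;
}

// Extra offset to add to the centered mapping for left-sited chroma.
double left_chroma_offset(double r) { return 0.25 * (1.0 - r); }
```

For a 2x upscale (r = 0.5) the correction is +0.125 chroma pixel, and at r = 1 it vanishes, which is why a wrong chroma position only shows up when actually resizing.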
#275
Registered User
Join Date: Jul 2018
Posts: 1,318
Quote:
https://github.com/AviSynth/AviSynth...e_avx2.cpp#L84
But have you tested a larger 'workunit size' for AVX2 (also on AVX-512 chips)? Like 4 256-bit SIMD datawords in 'parallel'? It is expected that if some chips have more dispatch ports, or wider ports (like 512-bit), they may dispatch more operations per cycle. Or is that subject to testing in the future? Also, is an AVX-512 version expected?
For some of the instructions used, like _mm256_add_epi32, the throughput is 3 results per cycle even on already very old chips. The latency of _mm256_madd_epi16 is 5 cycles, so it outputs 2 results after a 5-cycle delay, but can then output 2 results per cycle on each following cycle if the sources are ready to compute. So to hide the startup latency it may also be better to provide more than 2 source datasets to compute.
We also have an example of a larger 'workunit size' for AVX2 processing in the vsTTempSmooth plugin: it loads and processes 4x256-bit data per loop spin: https://github.com/Asd-g/AviSynth-vs...h_AVX2.cpp#L71
As I remember, this gave some performance boost over 1 and 2 256-bit datasets.
Last edited by DTL; 16th April 2025 at 20:35.
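The latency argument can be made concrete with a small dot-product kernel: with a single accumulator every `_mm256_add_epi32` has to wait for the previous one, while four independent accumulators (4 x 256-bit per loop spin) give the core several madd/add chains to overlap. A minimal GCC/Clang sketch, assuming int16 inputs whose products and sums fit in 32 bits; it is not taken from AviSynth+ or vsTTempSmooth, `dot_i16` and its helpers are hypothetical names, and any speed-up should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_AVX2_PATH 1
#endif

// Plain reference: sum of a[i] * b[i] over n int16 values, accumulated in 32 bits.
int32_t dot_i16_ref(const int16_t* a, const int16_t* b, std::size_t n) {
    int32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += int32_t(a[i]) * b[i];
    return sum;
}

#ifdef HAVE_X86_AVX2_PATH
// AVX2 version with four independent accumulators (64 int16 per loop spin).
// Each _mm256_madd_epi16 multiplies 16 int16 pairs and adds adjacent products
// into 8 int32 lanes; the four accumulators form four separate dependency chains.
__attribute__((target("avx2")))
int32_t dot_i16_avx2_x4(const int16_t* a, const int16_t* b, std::size_t n) {
    __m256i acc0 = _mm256_setzero_si256();
    __m256i acc1 = acc0, acc2 = acc0, acc3 = acc0;
    std::size_t i = 0;
    for (; i + 64 <= n; i += 64) {
        const __m256i* pa = reinterpret_cast<const __m256i*>(a + i);
        const __m256i* pb = reinterpret_cast<const __m256i*>(b + i);
        acc0 = _mm256_add_epi32(acc0, _mm256_madd_epi16(_mm256_loadu_si256(pa + 0), _mm256_loadu_si256(pb + 0)));
        acc1 = _mm256_add_epi32(acc1, _mm256_madd_epi16(_mm256_loadu_si256(pa + 1), _mm256_loadu_si256(pb + 1)));
        acc2 = _mm256_add_epi32(acc2, _mm256_madd_epi16(_mm256_loadu_si256(pa + 2), _mm256_loadu_si256(pb + 2)));
        acc3 = _mm256_add_epi32(acc3, _mm256_madd_epi16(_mm256_loadu_si256(pa + 3), _mm256_loadu_si256(pb + 3)));
    }
    // Combine the four accumulators, then reduce the 8 lanes to one scalar.
    __m256i acc = _mm256_add_epi32(_mm256_add_epi32(acc0, acc1), _mm256_add_epi32(acc2, acc3));
    alignas(32) int32_t lanes[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    int32_t sum = 0;
    for (int k = 0; k < 8; ++k) sum += lanes[k];
    // Scalar tail for the remaining n % 64 elements.
    for (; i < n; ++i) sum += int32_t(a[i]) * b[i];
    return sum;
}
#endif

// Runtime dispatch: use the AVX2 path only if the CPU reports AVX2.
int32_t dot_i16(const int16_t* a, const int16_t* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
#ifdef HAVE_X86_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) return dot_i16_avx2_x4(a, b, n);
#endif
    return dot_i16_ref(a, b, n);
}
```

Going from one to four accumulators costs only a few extra registers and a final reduction; whether 4 x 256-bit beats 2 x 256-bit on a given chip is exactly the kind of thing that needs benchmarking.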