Meanwhile I accidentally noticed a bug that showed up on the right side of a clip as random colored blocks and lines.
It appeared only in 32-bit float test clips, and only with a specific sequence of resizes.
Encode this script and look at the right side of the bottom-right clip (32-bit float):
Code:
x=ColorBarsHD.ConvertToYUV444().Trim(0,100)
Function Resize(clip c)
{
Return c.Spline64Resize(2802,1501).\
Spline16Resize(904,487).\
BilinearResize(402,500).\
Spline36Resize(200,489).\
LanczosResize(400,300).ConvertBits(8)
}
a8 = x.ConvertBits(8).Resize()
a10 = x.ConvertBits(10).Resize()
a16 = x.ConvertBits(16).Resize()
a32 = x.ConvertBits(32).Resize()
Stackvertical(StackHorizontal(a8,a10),StackHorizontal(a16,a32))
Unfortunately the output was perfect when I encoded float-only clips. The ugly part of the reproduction was that there had to be an 8-16 bit clip in the script as well (I usually run the tests for 8, 10, 16 and 32 bits, then stack them together in 8 bits to see any difference). Evidently those clips had to leave bit patterns in memory that a 32-bit float clip would not.
Narrowing the problem down, it turned out that the SIMD code, which handles pixels in units of 4 or 8, ran into garbage at the right side, where, depending on the clip width (modulo 4 or 8), not every pixel of the last unit is a visible pixel.
During resizing, the unused pixels are masked out with a zero multiplier. The existing 8-16 bit code worked fine like that, but it was not enough for 32-bit float pixels. When the leftover bits of such a pixel happen to form a NaN (Not a Number), multiplying it by 0 still results in NaN.
Thus the affected pixels in the resized clip turned into garbage.
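To illustrate why the zero multiplier is safe for integers but not for floats, here is a minimal Python sketch. It is not the avs+ SIMD code; the function names, the 4-pixel unit and the coefficients are made up for the demonstration, and zeroing the padding lanes before the multiply is only one possible remedy, not necessarily the one used in the actual fix.

```python
import math

def masked_dot(pixels, coeffs):
    # Weighted sum over one 4-pixel unit; unused lanes get coefficient 0.
    return sum(p * c for p, c in zip(pixels, coeffs))

def masked_dot_safe(pixels, coeffs, valid):
    # Hypothetical remedy: overwrite lanes past the visible width with 0.0
    # before multiplying, so leftover NaN bits can never reach the sum.
    cleaned = [p if i < valid else 0.0 for i, p in enumerate(pixels)]
    return masked_dot(cleaned, coeffs)

coeffs = [0.25, 0.5, 0.25, 0.0]          # last lane is masked out

int_unit = [10, 20, 30, 12345]           # last lane holds integer garbage
float_unit = [0.1, 0.2, 0.3, math.nan]   # last lane holds garbage that decodes as NaN

print(masked_dot(int_unit, coeffs))           # garbage * 0 is 0, result is fine
print(masked_dot(float_unit, coeffs))         # NaN * 0 is NaN, result is ruined
print(masked_dot_safe(float_unit, coeffs, 3)) # padding cleared first, result is fine
```

The same happens with an infinity in the padding lane, since Inf * 0 is NaN as well.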
So I had to fix the float resizer code. Uhh, it was old, one of my early attempts.
Since the 10-16 bit resizer parts were affected as well, they had to be touched, too.
In the end the whole 10-16 bit and float resizer code got rewritten. Unfortunately I could not foresee how much time the bug chasing would take; I am sure I could have ported the zimg resizers into avs+ with less effort.
Speaking of zimg:
Since I had to benchmark the new code to see whether it is any better than the one in r2580, I included the z_XXX resizers as well.
https://forum.doom9.org/showthread.php?t=173986
Results are interesting.
Look at the 400x2800 -> 900x400 case (16 bit).
Resizing always happens in two passes: horizontal first and then vertical, or vertical first and then horizontal. It depends.
In Avisynth, probably for quality reasons, there is a strategy: "// ensure that the intermediate area is maximal"
The Avs resizer gave 103 fps, while zimg reached 402 fps. What?
Then it became clear that Avs chose the 400x2800->900x2800->900x400 sequence, while zimg chose 400x2800->400x400->900x400.
When I split the resize command into two resizes (first V, then H), avisynth+ gave a quite comparable result of 427 fps.
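A rough way to see why the pass order matters so much for this case is to count how many samples each order has to write. The Python sketch below does only that; the function name and the cost model are my own assumptions for illustration, and the model deliberately ignores filter tap counts, memory layout and SIMD details.

```python
def pass_order_costs(src_w, src_h, dst_w, dst_h):
    """Compare the two pass orders of a separable resize.

    Cost is simply the number of samples each pass writes
    (an assumption of this sketch; filter taps are ignored).
    """
    final = dst_w * dst_h
    orders = {
        "H then V": (dst_w, src_h),  # width resized first, height untouched
        "V then H": (src_w, dst_h),  # height resized first, width untouched
    }
    result = {}
    for name, (mid_w, mid_h) in orders.items():
        result[name] = {
            "intermediate": (mid_w, mid_h),
            "samples_written": mid_w * mid_h + final,
        }
    return result

costs = pass_order_costs(400, 2800, 900, 400)
for name, info in costs.items():
    print(name, info["intermediate"], info["samples_written"])
```

For 400x2800 -> 900x400 the "maximal intermediate area" rule picks H then V, which writes about 5.5 times as many samples as V then H in this model. That points in the same direction as the measured 103.9 vs 427.2 fps (roughly 4x), though the model is far too crude to predict the actual numbers.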
The first three columns (AVX2, SSE4, noSSE4) contain the results of the new resizers. The code was run on an i7-7700 with Avs+ x64. I built special avs+ versions for the test that ignore the AVX2..SSE4.1 CPU flags.
EDIT: this benchmark data compares a current "under construction" version of avs+ with the z-lib resizer I had access to (r1a, from 2016).
Both have faster variants since then.
Code:
#32bit float AVX2 SSE4 noSSE4 v2580 Zimg
#400x2800 -> 900x400: 103 64.3 64.3 65.9 Lanczos
#1920x1080-> 1280x720 143.1 83.7 83.2 95.7 158.4 Spline64
#16 bit AVX2 SSE4 noSSE4 v2580 Zimg
#400x2800 -> 900x400: 103.9 84 75.8 56.2 402.6 Lanczos **see comment
#400x2800 -> 400x400->
# 900x400: 427.2 315.3 289.9 243.4 405.1 Lanczos
#1920x1080-> 1280x720 152.2 125.0 118.9 80.6 129 Spline64
#1920x1080-> 1280x720 160.6 134.3 129.4 85.9 134.5 Lanczos
#1920x1080-> 1280x1080 240.2 190.6 185.2 113.2 191.8 Lanczos H
#1920x1080-> 1920x720 335.5 320.8 281 272.2 203.0 Lanczos V
#10 bit AVX2 SSE4 noSSE4 v2580 Zimg
#400x2800 -> 900x400: 105.4 77.5 53.5 Lanczos
#1920x1080-> 1280x720 155.2 146.6 120.4 78.9 133 Spline64
#1920x1080-> 1280x720 163.6 156.2 Lanczos
#8 AVX2 SSE4 noSSE4 Old Zimg
#400x2800 -> 900x400: 93.8 93.4 96.2 93.8 402 **see comment
#1920x1080-> 1280x720 104.7 105.1 106.0 105.5 120.2
# ** Avisynth - unlike zimg - always orders H/V resizers for max intermediate area!
(The 8-bit resizer code is untouched by me; there is no AVX2 option there, but I included its measurements anyway.)
Now it has to be decided whether the slower H/V or V/H ordering strategy should be kept. Is the difference really visible, and when?