I looked at sources of muvsfunc SSIM and Avisynth's SSIM (v0.25.1 by Mitsubishi).
The Avisynth version is not doing the gaussian kernel at all - it's implemented using summed area tables. That's a faster but lower quality way to calculate it, the
MSU Quality measurement tool page has an example of the difference.
Also muvsfunc returns SSIM calculated on one plane only, by default the luma. Avisynth SSIM has a plane argument which defaults to 0 and then it returns a weighted sum of the luma and chroma channels:
(0.8 * Y) + (0.1*(U+V))
And yes, Avisynth has the lumimask but it's disabled in the code. Muvsfunc has the variables k1 and k2, but at least they default to same values as the ones used in Avisynth version.
So there are quite a few ways to make the implementations differ, I guess there are similar small differences between the other implementations.
My opinion is that the default muvsfunc SSIM downsampling is not useful when comparing the quality of different script settings (like in the Zopti optimizer).