Code:
--enable-sse2 enable SSE/SSE2 optimizations
--enable-avx enable AVX optimizations
--enable-avx2 enable AVX2 optimizations
--enable-avx512 enable AVX512 optimizations
--enable-avx-128-fma enable AVX128/FMA optimizations
--enable-kcvi enable Knights Corner vector instructions optimizations
--enable-altivec enable Altivec optimizations
--enable-vsx enable IBM VSX optimizations
--enable-neon enable ARM NEON optimizations
--enable-generic-simd128 enable generic (gcc) 128-bit SIMD optimizations
--enable-generic-simd256 enable generic (gcc) 256-bit SIMD optimizations
Above are some of the flags you can pass to FFTW's configure script.
The SIMD builds have SSE2/AVX/AVX2 enabled as well, but I am not sure it is worth it.
AFAIK, generic-simd128/256 are some kind of generic AVX(2); I am not sure how generic they really are.
The FFTW release notes say:
Quote:
enabling them all at the same time is a bad idea, because it increases the planning time for minimal gain
|
And the more code paths you enable, the fatter the DLLs will be.
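As a rough sketch of keeping the number of enabled code paths small, the helper below assembles a configure command line from only the SIMD sets you name. The function name build_configure_cmd is hypothetical and not part of FFTW; the flag names are taken from the list above, and which subset is actually worth enabling is an assumption you would have to benchmark yourself.

```shell
# Hypothetical helper (not part of FFTW): build a configure command line
# that enables only the SIMD sets passed as arguments.
build_configure_cmd() {
    cmd="./configure"
    for simd in "$@"; do
        # Each argument maps to one --enable-<name> flag from the list above.
        cmd="$cmd --enable-$simd"
    done
    printf '%s\n' "$cmd"
}

# Example: an x86 build limited to three code paths.
build_configure_cmd sse2 avx avx2
```

This only prints the command, so you can check it before running it in the FFTW source directory.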
Quote:
Originally Posted by HolyWu
I especially generate codelets of typical sizes 4, 16, 32 and 64 so now it's at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms.
|
Future builds will have these codelets generated as well.