Doom9's Forum - View Single Post

Wolfberry · 9th November 2018, 13:08

Code:

  --enable-sse2             enable SSE/SSE2 optimizations
  --enable-avx              enable AVX optimizations
  --enable-avx2             enable AVX2 optimizations
  --enable-avx512           enable AVX512 optimizations
  --enable-avx-128-fma      enable AVX128/FMA optimizations
  --enable-kcvi             enable Knights Corner vector instructions optimizations
  --enable-altivec          enable Altivec optimizations
  --enable-vsx              enable IBM VSX optimizations
  --enable-neon             enable ARM NEON optimizations
  --enable-generic-simd128  enable generic (gcc) 128-bit SIMD optimizations
  --enable-generic-simd256  enable generic (gcc) 256-bit SIMD optimizations

Above are some of the flags that you can use during configuration.
The SIMD builds also have SSE2/AVX/AVX2 enabled, but I am not sure if it is worth it.
AFAIK, generic-simd128/256 are some kind of generic AVX(2), but I am not sure how generic they are.

The FFTW release notes say:

Quote:

enabling them all at the same time is a bad idea, because it increases the planning time for minimal gain

And the more code paths you enable, the fatter the DLLs will be.
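
For illustration, here is a rough sketch of a configure call that enables only a few of the SIMD paths instead of all of them. The three SIMD flags are taken from the list above. The choice of these three, the use of --enable-float (single precision) and --enable-shared, and the assumption that you run this from the top of an unpacked FFTW source tree are mine, not necessarily what the actual builds use.

```shell
# Sketch only: enable a small set of SIMD paths rather than every one.
# Assumption: this is run from the top of an unpacked FFTW source tree.
SIMD_FLAGS="--enable-sse2 --enable-avx --enable-avx2"

# Assumption: single-precision shared library; adjust to your own needs.
CONFIGURE_CMD="./configure --enable-float --enable-shared $SIMD_FLAGS"

# Print the command so it can be checked before running it.
echo "$CONFIGURE_CMD"

# Uncomment to actually configure and build:
# $CONFIGURE_CMD && make
```

Dropping a flag from SIMD_FLAGS removes that code path from the build, which should shorten planning and shrink the resulting DLL, at the cost of not using that instruction set on CPUs that support it.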

Quote:

Originally Posted by HolyWu

I especially generate codelets of typical sizes 4, 16, 32 and 64 so now it's at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms.

The future builds will have these codelets generated as well.