Old 9th November 2018, 13:08   #20  |  Link
Wolfberry
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
Code:
  --enable-sse2             enable SSE/SSE2 optimizations
  --enable-avx              enable AVX optimizations
  --enable-avx2             enable AVX2 optimizations
  --enable-avx512           enable AVX512 optimizations
  --enable-avx-128-fma      enable AVX128/FMA optimizations
  --enable-kcvi             enable Knights Corner vector instructions optimizations
  --enable-altivec          enable Altivec optimizations
  --enable-vsx              enable IBM VSX optimizations
  --enable-neon             enable ARM NEON optimizations
  --enable-generic-simd128  enable generic (gcc) 128-bit SIMD optimizations
  --enable-generic-simd256  enable generic (gcc) 256-bit SIMD optimizations
Above are some of the flags you can use during configuration.
The SIMD builds also have SSE2/AVX/AVX2 enabled, but I am not sure whether it is worth it.
AFAIK, generic-simd128/256 are some kind of generic AVX(2) built on gcc's vector support (per the help text above); I am not sure how generic they really are.

The FFTW release notes say:
Quote:
enabling them all at the same time is a bad idea, because it increases the planning time for minimal gain
And the more code paths you enable, the larger the DLLs will be.
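To make the trade-off concrete, here is a rough sketch of configuring with only a subset of the SIMD paths instead of all of them. The choice of SSE2 + AVX + AVX2 is just an assumption for a typical x86-64 target, not a recommendation from the FFTW docs, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: pick a subset of the SIMD flags from the help text above.
# The subset (SSE2 + AVX + AVX2) is an assumed choice for x86-64 builds.
SIMD_FLAGS="--enable-sse2 --enable-avx --enable-avx2"

# Build the full command line and print it so it can be reviewed first.
CONFIGURE_CMD="./configure $SIMD_FLAGS"
echo "$CONFIGURE_CMD"

# Only run configure when this is executed inside an FFTW source tree.
if [ -x ./configure ]; then
    ./configure $SIMD_FLAGS
fi
```

Adding or dropping a code path is then a one-line change to SIMD_FLAGS, which makes it easy to compare planning time and DLL size between builds.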
Quote:
Originally Posted by HolyWu View Post
I especially generate codelets of typical sizes 4, 16, 32 and 64 so now it's at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms.
The future builds will have these codelets generated as well.
__________________
Monochrome Anomaly

Last edited by Wolfberry; 9th November 2018 at 13:12.