Quote:
Originally Posted by HolyWu
Interesting. I had compiled FFTW with GCC 7.2.0 as well, but only tested it with FFTW's benchmark program (benchf.exe) and DFTTest, since I didn't use DCT mode in MVTools. After investigating, I found that MVTools dislikes ICC's O3 optimization for some unknown reason; changing to O2 gives better performance. I also discovered that by default FFTW only generates efficient codelets of size 8 for DCT/IDCT transforms. I specifically generated codelets for the typical sizes 4, 16, 32 and 64, so now it's at least 50% faster than before when blksize is one of them. DFTTest and FFT3DFilter are unaffected since they use real DFT transforms. The 7z file on GitHub has been updated.
Nice, the new fftw DLL is almost twice as fast with the script I posted above.