28th March 2017, 15:46 | #1
47.952fps@71.928Hz
Join Date: Mar 2011
Posts: 940
Newer FFTW DLLs for Windows?
Edit 3: Currently active links
Old: I notice on the FFTW site that the current Windows builds are for 3.3.5 (http://fftw.org/install/windows.html). Since then, there have been a few updates for the 'stable' 3.3.6 version: http://fftw.org/release-notes.html
media-autobuild_suite doesn't include FFTW building, and the last time I tried to compile something myself, I spent 2 days trying to get everything together and I don't even remember whether I finished. EDIT: Can someone please compile and upload? The official site might update once they reach 3.3.7. I'm pretty sure the version currently on my system is one of the deprecated versions.
__________________
Win10 (x64) build 19041 NVIDIA GeForce GTX 1060 3GB (GP106) 3071MB/GDDR5 | (r435_95-4) NTSC | DVD: R1 | BD: A AMD Ryzen 5 2600 @3.4GHz (6c/12th, I'm on AVX2 now!)
Last edited by Sparktank; 23rd April 2020 at 22:54.
21st December 2017, 22:31 | #3
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊
Last edited by LoRd_MuldeR; 21st December 2017 at 22:34.
22nd December 2017, 22:16 | #4
Anime addict
Join Date: Feb 2009
Location: Spain
Posts: 673
From official mirror: ftp://ftp.fftw.org/pub/fftw/
__________________
Intel i7-6700K + Noctua NH-D15 + Z170A XPower G. Titanium + Kingston HyperX Savage DDR4 2x8GB + Radeon RX580 8GB DDR5 + ADATA SX8200 Pro 1 TB + Antec EDG750 80 Plus Gold Mod + Corsair 780T Graphite
23rd December 2017, 23:45 | #6
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
I built libfftw3f-3.dll with gcc 4.9.3 (SSE2, x86) and was surprised to see it perform better than the ICC version that comes with dfttest.
Tested on i5 2500K @4GHz. Script for testing: Code:
# Test clip: 50 frames of 1280x720 YV12 color bars at 50 fps
colorbars(width = 1280, height = 720, pixel_type = "yv12").killaudio().assumefps(50, 1).trim(0, 49)
RemoveNoise()

function RemoveNoise(clip video, int "threshold")
{
    last = video
    sc = MSuper(hpad = 16, vpad = 16)
    # dct = 1: compare blocks in the DCT (frequency) domain; MVTools uses the FFTW library for this
    backward_vector = MAnalyse(sc, isb = true, delta = 1, blksize = 16, overlap = 4, truemotion = false, sadx264 = 4, dct = 1)
    forward_vector = MAnalyse(sc, isb = false, delta = 1, blksize = 16, overlap = 4, truemotion = false, sadx264 = 4, dct = 1)
    # Temporal denoising from one previous and one next frame (last is passed implicitly)
    MDegrain1(sc, backward_vector, forward_vector, thSAD = 300)
    return last
}
Code:
Frames processed: 50 (0 - 49)
FPS (min | max | average): 0.742 | 1.529 | 0.760
Memory usage (phys | virt): 51 | 47 MiB
Thread count: 9
CPU usage (average): 25%
Code:
Frames processed: 50 (0 - 49)
FPS (min | max | average): 0.899 | 1.806 | 0.921 (+21%)
Memory usage (phys | virt): 49 | 45 MiB
Thread count: 9
CPU usage (average): 25%
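The +21% is the ratio of the two average FPS values. Here is a minimal Python sketch of that arithmetic. It assumes the first result is the ICC build that ships with dfttest and the second is the gcc 4.9.3 build. The function names are made up for illustration.

```python
def speedup_percent(base_fps: float, new_fps: float) -> float:
    """Relative speed gain of new_fps over base_fps, in percent."""
    return (new_fps / base_fps - 1.0) * 100.0


def seconds_for_frames(frames: int, fps: float) -> float:
    """Wall-clock time needed to process `frames` at an average of `fps`."""
    return frames / fps


icc_avg, gcc_avg = 0.760, 0.921  # average FPS of the two runs (assumed ICC, then gcc)
print(f"speedup: {speedup_percent(icc_avg, gcc_avg):+.0f}%")
print(f"50 frames: {seconds_for_frames(50, icc_avg):.1f} s vs "
      f"{seconds_for_frames(50, gcc_avg):.1f} s")
```

The same function applied to the minimum FPS (0.742 vs. 0.899) gives a very similar gain, so the improvement is not just an averaging effect.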
__________________
Groucho's Avisynth Stuff
23rd December 2017, 23:48 | #7
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
Thank you, brother Marx! Better in speed and memory usage, superb!
Edit: Maybe I missed something, but in which part of the script is dfttest or fft3dfilter called?
Last edited by GMJCZP; 23rd December 2017 at 23:52.
24th December 2017, 00:03 | #8
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
However, with dfttest the gcc build is quite a bit slower, so I don't recommend it for dfttest. With fft3dfilter it's about the same speed.
__________________
Groucho's Avisynth Stuff
2nd January 2018, 13:37 | #9
Join Date: Mar 2006
Location: Barcelona
Posts: 5,034
__________________
Groucho's Avisynth Stuff
2nd May 2018, 19:31 | #11
Unavailable
Join Date: Mar 2009
Location: offline
Posts: 1,480
22nd October 2018, 09:13 | #13
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,795
THX for the builds.
The o2 build is always faster on my Ryzen 1700 (by about 0.5 fps), and for some reason the AVX build is the slowest. Tested with dfttest.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database
5th November 2018, 09:12 | #15
Registered User
Join Date: Mar 2003
Location: Germany
Posts: 215
@Wolfberry:
I tested the performance of your x64 compilations with dfttest on a mobile Haswell i5. The one named "simd128+256" was around 10% faster than the official 3.3.5 build, and all the others were about 5% slower. However, using fft3dfilter with "simd128+256" produces an access violation. As far as I know, Haswell supports all instruction sets up to and including 256-bit AVX2. Did anyone manage to get fft3dfilter working with this build, and if so, on what system?
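Both percentages are relative to the official 3.3.5 build, so you can turn them into a direct comparison between the builds. Here is a minimal Python sketch. It assumes "faster" and "slower" refer to throughput in fps. The function name is hypothetical.

```python
def relative_gain(a_vs_base: float, b_vs_base: float) -> float:
    """Percent by which build A outpaces build B, given each build's
    throughput as a fraction of the same baseline (1.0 == baseline)."""
    return (a_vs_base / b_vs_base - 1.0) * 100.0


simd128_256 = 1.10   # ~10% faster than the official 3.3.5 build
other_builds = 0.95  # ~5% slower than the official 3.3.5 build
print(f"simd128+256 vs. other builds: {relative_gain(simd128_256, other_builds):+.1f}%")
```

The gap comes out at roughly 15.8%, not exactly 10% + 5%, because the ratios multiply rather than add.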
8th November 2018, 11:24 | #16
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
@ErazorTT
Confirmed.
__________________
Monochrome Anomaly
8th November 2018, 11:39 | #17
Registered User
Join Date: Mar 2003
Location: Germany
Posts: 215
Great, I will test the performance with your new builds when I’m back home.
Can you show the differences between the flags you used for your new compilations and those for the old simd128+256 build?
9th November 2018, 09:05 | #18
Registered User
Join Date: Mar 2003
Location: Germany
Posts: 215
OK, so all builds apart from simd256 work with fft3dfilter. However, the old simd128+256 build appears to have been a wee bit faster with dfttest than all the new builds.
Out of curiosity: what do you actually mean by simd128 or simd256, as opposed to sse2/avx/avx2? After all, SSE2 is 128-bit SIMD, and AVX/AVX2 add 256-bit SIMD instructions on top of SSE2.
Last edited by ErazorTT; 9th November 2018 at 09:13.
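For the single-precision library (libfftw3f), the register width maps directly to how many floats one instruction handles. Here is a small Python sketch of that arithmetic. It covers only the widths mentioned in this thread, and the table and function names are illustrative.

```python
FLOAT32_BITS = 32  # libfftw3f works on single-precision floats

# Register width in bits for the instruction sets discussed in this thread.
SIMD_WIDTH_BITS = {
    "sse2": 128,
    "avx": 256,
    "avx2": 256,
    "generic-simd128": 128,
    "generic-simd256": 256,
}


def float_lanes(isa: str) -> int:
    """Number of float32 values that fit in one register of the given ISA."""
    return SIMD_WIDTH_BITS[isa] // FLOAT32_BITS


for isa in SIMD_WIDTH_BITS:
    print(f"{isa:>16}: {float_lanes(isa)} floats per register")
```

By width alone, generic-simd128 is in the same class as SSE2 and generic-simd256 is in the same class as AVX/AVX2. The difference lies in how the code is generated, not in the register size.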
9th November 2018, 13:08 | #19
Helenium(Easter)
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 99
Code:
--enable-sse2            enable SSE/SSE2 optimizations
--enable-avx             enable AVX optimizations
--enable-avx2            enable AVX2 optimizations
--enable-avx512          enable AVX512 optimizations
--enable-avx-128-fma     enable AVX128/FMA optimizations
--enable-kcvi            enable Knights Corner vector instructions optimizations
--enable-altivec         enable Altivec optimizations
--enable-vsx             enable IBM VSX optimizations
--enable-neon            enable ARM NEON optimizations
--enable-generic-simd128 enable generic (gcc) 128-bit SIMD optimizations
--enable-generic-simd256 enable generic (gcc) 256-bit SIMD optimizations
The SIMD builds also enabled SSE2/AVX/AVX2, but I am not sure whether that is worth it. AFAIK, generic-simd128/256 is some kind of generic AVX(2), but I'm not sure how generic it is. The FFTW release notes say:
Quote:
Future builds will have these codelets generated as well.
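Each build variant is just a different set of the configure switches listed above, so the flag differences can be written down as lists. Here is a hypothetical Python sketch. `--enable-float` (single precision, i.e. libfftw3f) and `--enable-shared` are standard FFTW configure options. Which switches each named build actually used is an assumption based only on this post, not the real build script.

```python
import shlex

# Common to every variant: single precision (libfftw3f) built as a shared library (DLL).
BASE_FLAGS = ["--enable-float", "--enable-shared"]

# Per the post, the SIMD builds also enabled SSE2/AVX/AVX2.
X86_FLAGS = ["--enable-sse2", "--enable-avx", "--enable-avx2"]

# Hypothetical mapping from the build names used in this thread to configure switches.
VARIANTS = {
    "sse2": ["--enable-sse2"],
    "avx": ["--enable-sse2", "--enable-avx"],
    "avx2": X86_FLAGS,
    "simd128": X86_FLAGS + ["--enable-generic-simd128"],
    "simd256": X86_FLAGS + ["--enable-generic-simd256"],
    "simd128+256": X86_FLAGS + ["--enable-generic-simd128", "--enable-generic-simd256"],
}


def configure_args(variant: str, host: str = "") -> list[str]:
    """Build the ./configure argument list for one variant.

    `host` is an optional autoconf cross-compilation triplet,
    e.g. "x86_64-w64-mingw32" for 64-bit Windows DLLs built with MinGW-w64.
    """
    args = ["./configure", *BASE_FLAGS, *VARIANTS[variant]]
    if host:
        args.append(f"--host={host}")
    return args


for name in VARIANTS:
    print(f"{name:>12}: {shlex.join(configure_args(name, host='x86_64-w64-mingw32'))}")
```

With the flags in one table, comparing two builds becomes a matter of diffing two lists.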
__________________
Monochrome Anomaly
Last edited by Wolfberry; 9th November 2018 at 13:12.
9th November 2018, 18:46 | #20
Registered User
Join Date: Mar 2003
Location: Germany
Posts: 215
So the generic options rely on the compiler's vectorization and optimization.
Have you tried increasing the alignment using --with-incoming-stack-boundary, as suggested here? https://forum.doom9.org/showthread.p...80#post1857180
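As far as I know, FFTW's configure forwards `--with-incoming-stack-boundary=X` to gcc as `-mincoming-stack-boundary=X`. This makes gcc assume the stack is aligned to 2^X bytes on function entry; 32-bit Windows only guarantees 4 bytes, i.e. X=2. Here is a small Python sketch of the arithmetic, comparing that assumption with the 16-byte (SSE2) and 32-byte (AVX/AVX2) vector sizes. The names are made up for illustration, and this thread does not establish whether misalignment is actually behind the fft3dfilter access violation.

```python
# Size in bytes of one full vector register, per instruction set.
VECTOR_BYTES = {"sse2": 16, "avx": 32, "avx2": 32}


def boundary_bytes(x: int) -> int:
    """Stack alignment (in bytes) assumed for --with-incoming-stack-boundary=X."""
    return 1 << x  # 2**X


def covers(x: int, isa: str) -> bool:
    """True if a 2**X-byte aligned stack is already aligned enough for
    aligned full-width vector loads/stores of the given instruction set.
    When False, the compiler has to realign the stack itself before
    keeping such vectors on it."""
    return boundary_bytes(x) >= VECTOR_BYTES[isa]


for x in (2, 4, 5):
    ok = [isa for isa in VECTOR_BYTES if covers(x, isa)]
    print(f"X={x}: {boundary_bytes(x):2d}-byte stack, sufficient for {ok or 'none'}")
```

The output illustrates why 256-bit code is the sensitive case: a 16-byte aligned stack (X=4) is enough for SSE2 but not for aligned AVX/AVX2 vectors.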