28th August 2021, 14:06 | #1 | Link |
Registered User
Join Date: Jul 2015
Posts: 708
|
Questions about assembler
I don't know how to build codecs with nasm and gcc.
Does "assembler" mean SIMD (AVX, AVX2, AVX3)? I know that assembler can be used with 64-bit files. Do you need a computer with an AVX2 CPU for the codecs? What functions should gcc and nasm have? Is nasm the best assembler? Maybe there is something better and newer. Maybe a third program is needed to merge the others? How do I check whether assembler is included and works with the codec? |
28th August 2021, 15:53 | #2 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
No. Not necessarily. Assembler means you are writing code in assembly language, instead of using a "high level" language, such as C, C++, Rust, etc.
If you write code in assembly language for the x86 or x64 platform, then you may use SIMD instructions (MMX, SSE/2/3/4, AVX/2, etc.), but you don't have to. Using SIMD instructions limits the CPUs that your code will run on.
Note: Usually ~99% of the code of an application or library is written in a "high level" language. Only the "critical" functions are written as assembler code, for optimization purposes.
If code written in assembly language uses AVX2 instructions, then yes, that code requires a CPU which supports (at least) the AVX2 instruction set extension. Otherwise it would crash with an "illegal instruction" exception.
But: Usually, developers create multiple versions of the assembly code targeting different types of CPU. Then, at runtime, the "best" version of the code for the particular CPU can be selected. For example, the same function can be implemented as "plain C" (runs on all CPUs and serves as a baseline), as AVX-optimized assembly code (runs on CPUs with AVX support) and as AVX2-optimized assembly code (runs on CPUs with AVX2 support).
At runtime, the application can check the capabilities of the CPU that it is running on, using the CPUID instruction, and then select the implementation that matches the actual CPU. Of course, all that does not happen "automatically". The programmer has to implement it that way!
Quote:
By looking at the source code? Also, applications and libraries often provide diagnostic output at runtime, which shows whether they were built with assembly code enabled and, if so, which specific assembly code optimizations are actually in use. x264 is a good example of that:
Code:
x264 [info]: using cpu capabilities MMX MMXEXT SSE SSE2
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 28th August 2021 at 16:33. |
|
28th August 2021, 18:30 | #6 | Link | ||||
Registered User
Join Date: Jul 2015
Posts: 708
|
Quote:
AVX2 covers everything from MMX and SSE2/3/4 up to AVX2, but does that mean I will be able to use SSE2 alone?
Quote:
Assembly files for SSE2 or AVX/AVX2/AVX3 can be freely added to a gcc build that uses the same option (-msse2), but are these dead functions, or do they contribute anything? It doesn't have rare comumic.
Quote:
Quote:
For x265 I get the message: none. At first I thought I had an old computer, but I downloaded a codec built by another user, and there the assembler reports that it works. Gives preset functions to programs: I wonder what I am doing wrong.
Code:
g++.exe -std=gnu++11 -ggdb3 -flto -O3 -fPIC -DWINVER=0x0602 -D_WIN32_WINNT=0x0602 -DEXPORT_C_API=0 -DX265_NS=x265_12bit -DX86_64=1 -DX265_VERSION=3.5+13 -DHIGH_BIT_DEPTH=1 -DX265_DEPTH=12 -DNX265_ARCH_X86=1 -DENABLE_HDR10_PLUS=1 -DNENABLE_LIBVMAF=1 -DENABLE_ASSEMBLY=1 -DHAVE_STRTOK_R -c ... -o ...
nasm.exe -f win64 -O3 -DARCH_X86_64=1 -DBIT_DEPTH=12 -DHIGH_BIT_DEPTH=1 -DX265_NS=x265_12bit -DPIC=1 -DSUFFIX=o -Xgnu ... -o ...
Should the nasm output format on Windows be win64 or elf64 for gcc? Or maybe gcc 12.0.0 just has bugs and doesn't work. Sorry for my English.
Last edited by Jamaika; 28th August 2021 at 18:59. |
||||
28th August 2021, 18:35 | #7 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Most applications or libraries that do contain optimized assembly code will check for the required assembler tool (e.g. nasm) when you run the provided ./configure script.
Usually the ./configure script will simply error out when the required assembler tool is missing – unless you explicitly disable assembly code by passing the --disable-asm option (or whatever it is called).
Sometimes the ./configure script "silently" disables the assembly code, if the required assembler tool wasn't found. In that case you should see whether assembly code is enabled or not from the final output of the ./configure script, e.g.:
Code:
./configure
[...]
platform: X86
shared: yes
static: yes
asm: yes <--- !!!
Last edited by LoRd_MuldeR; 28th August 2021 at 18:53. |
28th August 2021, 18:45 | #8 | Link | |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
If "optimized" assembly code uses any of those instruction set extensions, then that code will run only on CPUs which support the specific instruction set extension.
So, if the assembly code uses any SSE2 instructions, it requires a CPU that supports (at least) SSE2. If the assembly code uses any AVX instructions, it requires a CPU that supports (at least) AVX. If the assembly code uses any AVX2 instructions, it requires a CPU that supports (at least) AVX2. And so on.
Of course, SSE2 and AVX (or AVX2) instructions can be "mixed" in the assembly code, but then a CPU with support for SSE2 and AVX (or AVX2) is required to run that code!
(Note: If a CPU supports AVX, then support for SSE2 is implied, but certainly not the other way around! Also, AVX2 support implies AVX support, but again not the other way around.)
As said before, an application or library may contain multiple versions of the same assembly code. For example, one version that uses SSE2 only and another version that uses AVX. This allows the "SSE2 only" assembly code to be used on CPUs that do support SSE2 but not AVX. And the "AVX" assembly code can be used on CPUs that support AVX.
But, to be clear, this kind of "runtime CPU dispatching" does not happen automatically; it needs to be implemented!
Last edited by LoRd_MuldeR; 28th August 2021 at 19:00. |
|
28th August 2021, 19:06 | #9 | Link | |
Registered User
Join Date: Jul 2015
Posts: 708
|
Quote:
Should I use the gcc options -flto or -ffast-math, or are they not necessary? Which -g option should be used: -g0, -ggdb or -gdwarf?
Last edited by Jamaika; 28th August 2021 at 19:10. |
|
28th August 2021, 20:59 | #11 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
It is the architecture that all Intel and AMD processors from the last ~15 years use. Binaries built for the "x86-64" architecture will run on 64-Bit Windows (Win64), as long as we are talking about the 64-Bit Windows for Intel/AMD "x86-64" processors.
Note: There now is a version of Windows for "arm64" processors too, but that is something different!
Quote:
https://en.wikipedia.org/wiki/Interp...l_optimization
Regarding the "-ffast-math" option: This enables some "unsafe" optimizations for math functions. Specifically, it drops strict compliance with the IEEE rules/specifications in order to yield even faster code. This may break some applications!
Last edited by LoRd_MuldeR; 28th August 2021 at 21:30. |
||
28th August 2021, 21:08 | #12 | Link | |
Registered User
Join Date: Jul 2015
Posts: 708
|
Quote:
|
|
28th August 2021, 21:16 | #13 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Well, LTO may give a nice speed-up for some applications, but may not have any noteworthy effect for others. You really have to test it out. I think it's something worth trying.
But the "-ffast-math" option is something you should use with care. Even though it may give an extra speed-up, if you do not exactly understand the consequences (for your particular application), then it is better not to use it.
Last edited by LoRd_MuldeR; 28th August 2021 at 21:32. |
29th August 2021, 13:00 | #14 | Link |
Registered User
Join Date: Jul 2015
Posts: 708
|
Searching for a bug.
I read a bit. For libjpeg-turbo, cmake recommends building with debugging information ("-g"). In nasm the equivalent is also "-g". I used "-g0". My mistake.
"Level 0 produces no debug information at all. Thus, -g0 negates -g." The default level is 2.
Then I ran a test with -ggdb3. The assembler test for x265 failed on SSE2 or AVX2.
For svt-av1, cmake recommends "-gdwarf". That is an option under Unix. The problem is that "-gdwarf" in gcc defaults to gdwarf32 for the nasm elf32/64 formats. Trying the "-gdwarf" option did not work either.
For me, it's a big deal to combine nasm with gcc 12.0.0. The suggestion to permute the program options is like throwing a ball against a fence. Nasm doesn't have a lot of options. The programs either work or don't.
libjpeg-turbo Code:
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -D__x86_64__=1 -DPIC=1 %%f -Xgnu -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DWITH_SIMD=1 -DUSE_WINDOWS_MESSAGEBOX=1 -DBITS_IN_JSAMPLE=8 -DINLINE="inline __attribute__((always_inline))" -DLOCAL(type)="static type" -c %%f -o %%~nf.o
Code:
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -DARCH_X86_64=1 -DPIC=1 -Dprivate_prefix=dav1d -Xgnu %%f -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DARCH_X86_64=1 -DHAVE_ASM=1 -D__USE_MINGW_ANSI_STDIO=1 -DUNICODE=1 -D_UNICODE=1 -DCONFIG_8BPC -DCONFIG_16BPC -DBITDEPTH=8 -DHAVE_ALIGNED_MALLOC=1 -c %%f -o %%~nf.o
Code:
nasm_2.15.05.exe -f win64 -g -F cv8 -O3 -DWIN64=1 -DARCH_X86_64=1 -DPIC=1 -Xgnu %%f -o %%~nf.o
gcc_12.0.0.exe -std=gnu11 -g -O3 -fPIC -DARCH_X86_64=1 -c %%f -o %%~nf.o
Code:
nasm_2.15.05.exe -fwin32 -g -F cv8 -O3 -DX265_ARCH_X86=1 -DARCH_X86_64=0 -DBIT_DEPTH=8 -DHIGH_BIT_DEPTH=0 -DX265_NS=x265 -DPIC=1 -DSUFFIX=o -Xgnu %%f -o %%~nf.o
g++_12.0.0.exe -std=gnu++11 -g -O3 -fPIC -DX86_64=0 -DWINVER=0x0602 -D_WIN32_WINNT=0x0602 -DEXPORT_C_API=1 -DX265_NS=x265 -DLINKED_10BIT=1 -DLINKED_12BIT=1 -DX265_VERSION=3.5+13 -DHIGH_BIT_DEPTH=0 -DX265_DEPTH=8 -DENABLE_HDR10_PLUS=1 -DHAVE_STRTOK_R -DENABLE_ASSEMBLY=1 -c %%f -o %%~nf.o
Last edited by Jamaika; 31st August 2021 at 06:27. |
31st August 2021, 06:01 | #15 | Link |
Registered User
Join Date: Jul 2015
Posts: 708
|
Building libheif with assembler. Assume that we do not use cmake, but we want to test new functions.
After hours of guessing, I find the libjpeg-turbo, dav1d and x265 assembler code a pain to compile.
Defects:
Very large gcc output files when only the {-g -O3} options are used.
The user can only use -fPIC under 64bit if the assembly files contain the appropriate functions.
The remaining options {-ffast-math -flto -ftree-vectorize} aren't included. There will be differences between the nasm and gcc files, and the codec will not work properly.
x265 works for me only in 32bit, when we turn off the avx2 and avx3 functions, which only work in 64bit.
When I use gcc in C++ x86_64 32bit {m32} then gcc doesn't allow this to work with the additional {mingw.thread} software. There are bugs.
Moreover, the x265 assembler has bugs and doesn't work in 64bit after corrections, which it reports.
libjpeg and dav1d only work in x86_64 32bit and 64bit.
The user should use the {mavx2} option because the assembler code is mmx - sse2 - avx2. However, the user may not accept {mavx2}, and then it runs only under SSE2. So for photo codecs the user should not use assembler.
With libavif it should be easier: there is no C++ and no x265.
AV1 doesn't have 64bit assembler yet.
https://www.sendspace.com/file/qmalk7
Last edited by Jamaika; 31st August 2021 at 06:36. |
31st August 2021, 07:41 | #16 | Link |
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,348
|
All these projects already come with build systems that compile the C code and the assembly files for you. Why are you manually trying to compile them when someone already took all the effort to make it easy to compile with just a few commands?
__________________
LAV Filters - open source ffmpeg based media splitter and decoders |
31st August 2021, 08:25 | #17 | Link |
Registered User
Join Date: Jul 2015
Posts: 708
|
Why can't I? Is inquiry prohibited?
Secondly, I wanted to investigate the latest dav1d assembler fixes that are not included in libheif. I know this upsets some.
https://github.com/videolan/dav1d
"When I use gcc in C++ x86_64 32bit {m32} then gcc doesn't allow this to work with the additional {mingw.thread} software."
My mistake. The 64bit gcc that I downloaded did not contain the 32bit files. |
1st September 2021, 07:57 | #18 | Link |
Registered User
Join Date: Jul 2015
Posts: 708
|
Assembler in GCC for AVX / AVX2 / AVX3 processors. I have an old computer, so this topic didn't interest me. However, this change awaits me.
Amateur observations. How is it that AVX2 can open files in C11 from C++11 but not in C++11 from C11? How can I observe it? Help options aren't displayed in C++11. What am I doing wrong?
I read on forums that combining C11 and C++11 is troublesome. The advice is to use "march=native". It doesn't work for me. I thought that I had an old computer that can't open the next-generation AVX2 codecs. Today I compiled libavif in C11. The codecs work in SSE2. So, does the user need a special file extraction for SSE2 and AVX2? What should be added to C++11 to make it work? Despite the effort, I was unable to run libheif or libwebp2 in C++11 with AVX2.
Examples: https://www.sendspace.com/filegroup/...g60Ouqaacpaf5Q
Last edited by Jamaika; 1st September 2021 at 09:19. |
|
|