12th January 2023, 23:16 | #142 | Link |
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
|
Since everything you run through AWS is virtualized (unless you specifically require bare metal), it's hard to know, but I believe they're running an NVIDIA Tesla M6 8GB GDDR5.
|
17th January 2023, 18:52 | #143 | Link | |
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,750
|
Quote:
That said, a c6a.4xlarge (AMD EPYC 7R13) may well offer the same performance software-only at a lower per-hour instance cost. |
|
10th February 2023, 11:36 | #144 | Link | ||
Registered User
Join Date: Jul 2018
Posts: 1,041
|
Quote:
MPEG codecs at more than 8 bits benefit more from AVX512 because they compute on data twice as wide: 16-bit source and output samples with 32-bit intermediates, compared with 8-bit source and output and (possibly) 16-bit intermediates for 8-bit codecs. The median performance boost from AVX512 over AVX2 is about 40% in the 'processing kernels' of an MPEG encoder.
But 10-bit x264 never seems to have been of interest for mainstream use, and there were no developer resources either to design wider AVX512 computation for the existing x264 architecture (SIMD 'workunit' size of 512 bytes max, the AVX2 register file) or to design the much more complex separate branch with a 4x larger 'workunit' size for the 2048-byte register file of the AVX512 x64 environment. That AVX512 branch would also only have run faster on the AVX512 hardware that was rare among end users in previous years. And no rich investor has put up a grant to support an AVX512 redesign of x264 (for use in some possible commercial products).
When a company starts a project on AVX512 hardware, it can invest in professional programmers to design a software solution for the target platform and use all the benefits of the AVX512 environment: splitting the problem into 'large workunits' of up to 2048 bytes for the best processing performance on an AVX512 platform, and aligning workunit addresses to the 64-byte cache line size for the fastest transfers to and from the AVX512 register file and dispatch ports.
Freeware open-source developers doing multiplatform development typically write C reference solutions, rely on compiler vectorization where possible, and may limit the 'workunit size' of an algorithm to that of the smaller platforms more widely used by end users, because that works faster on chips with a small register file (fewer reloads from cache). So in practice, when open-source developers design and profile a computing algorithm for best performance on cheap old platforms with a small register file (128 or 256 bytes for SSE(2) and AVX(2)), they actually optimize the algorithm to run only on platforms with a small register file.
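The register file figures quoted above can be checked with a little arithmetic. This is only a rough sketch: the register counts and widths are the architectural values for x86-64, while the function names and the idea of treating the whole register file as one 'workunit' are illustrative assumptions taken from the wording of the post, not anything from the x264 sources.

```python
# Register-file arithmetic for x86-64 SIMD extensions.
# 'Workunit' is the thread's informal term for the amount of data that
# fits in the vector registers at once; it is not an x264 identifier.
SIMD_X64 = {
    # name: (number of vector registers, register width in bytes)
    "SSE2": (16, 16),    # xmm0-xmm15, 128-bit
    "AVX2": (16, 32),    # ymm0-ymm15, 256-bit
    "AVX512": (32, 64),  # zmm0-zmm31, 512-bit
}

CACHE_LINE = 64  # bytes, the cache line size quoted in the post


def register_file_bytes(isa):
    """Total size of the vector register file in bytes."""
    count, width = SIMD_X64[isa]
    return count * width


def samples_per_register(isa, bytes_per_sample):
    """How many samples of the given size fit in one vector register."""
    _, width = SIMD_X64[isa]
    return width // bytes_per_sample


def align_up(addr, alignment=CACHE_LINE):
    """Round an address up to the next multiple of a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    for isa in SIMD_X64:
        print(isa,
              register_file_bytes(isa),
              samples_per_register(isa, 1),   # 8-bit samples
              samples_per_register(isa, 2))   # >8-bit samples stored as 16-bit
```

With these numbers AVX2 gives 16 x 32 = 512 bytes and AVX512 gives 32 x 64 = 2048 bytes, which matches the 4x ratio mentioned in the post; a 16-bit sample halves the number of samples per register on any of them.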
And the resources of the formerly expensive AVX512 platforms are left underused. Full optimization for AVX512 includes using the 2x wider execution ports (and some faster instructions), 2x wider data word transfers, and the 4x larger register file to increase the processed 'workunit' size. Changing the 'workunit' size may require a significant redesign of the software (like processing 4 blocks in a single pass instead of 1 block on AVX2, and so on). AVX512 programs are significantly more complex to design and debug. Without an AVX512 chip it is possible to develop with the Intel SDE software emulator, but it cannot provide correct profiling of performance. Maybe with the progress of AI like ChatGPT or others we will see some progress in using new compute platforms to solve old tasks like x264, because it looks like the resources of open-source freeware programmers are dying out fast with the ending of current civilization. Quote:
The Intel C compiler can use multi-file interprocedural optimization (it does not work every time and may require fixing the sources). It may visibly help the performance of a complex program with many small processing functions (and many C source files), though as the complexity of the program increases, the probability of a successful whole-program multi-file IPO decreases. But you can still try to enable it for separate projects of a solution. Sometimes you need several compile runs before one of them finally ends with a successful multi-file IPO. It is also possibly the best compiler for making AVX512-targeted builds (and Intel builds with individual optimizations for many Intel chip families). Last edited by DTL; 10th February 2023 at 12:24. |
||
10th February 2023, 17:51 | #145 | Link | |||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
|
Quote:
Quote:
I'm sure that there are other companies using x264 to encode either AVC Intra files or XAVC files using the Sony and Panasonic profiles for linear playout beside us, so if they could chip in and support the development, that would be greatly appreciated. Quote:
Why? Dunno. |
|||
10th February 2023, 18:57 | #146 | Link |
Registered User
Join Date: Jul 2018
Posts: 1,041
|
"professionally AVC Intra and XAVC Intra are two very common use cases for both FULL HD and UHD workflows (like Intra Class 300 and 480) and there AVX512 would definitely speed things up a lot"
Intra may already be very simple and too fast - so the total workflow may benefit too little if someone makes intra faster. The slowest things probably happen in inter-frame encoding, where motion search is required. Also, as far as I can see, 10-bit x264 builds for AVC-Intra are almost impossible to find today. Maybe they exist somewhere in special builds of ffmpeg. The 8-bit 'classic' x264 throws an error on YV12 Avisynth input - Code:
at -I 1 --avcintra-class=100
x264 [error]: 8-bit AVC-Intra is not widely compatible
x264 [error]: 10-bit x264 is required to encode AVC-Intra
x264 [error]: x264_encoder_open failed
"I'm sure that there are other companies using x264 to encode either AVC Intra files or XAVC files using the Sony and Panasonic profiles for linear playout beside us, so if they could chip in and support the development, that would be greatly appreciated."
If a commercial company uses freeware for production, it may be too poor a company to pay even its existing poor workers, and have no funds to pay professional programmers to make open-source software better. If you have some funds, you could try posting a contract offer on software developer job sites with an exact task to pay for - like making an x264 build run 2x faster on a specified AVX512 chip with your required arguments and provided sample footage. Maybe such jobs exist and have already been solved, but the result was not offered as a free open-source commit to the standard project.
Also, current business models are about making money for business owners, not about reducing the workload of hired workers for the same pay. So if a worker wants to encode faster and spend less working time, it is up to the worker alone to read the Intel docs for the chip and make the software run faster. The business owner will not pay for it. Also, as typical employment contracts state, all enhancements a worker makes to software belong to the business owner. So it may be against the worker's contract to share any enhancements made to x264 during time paid for by the business owner. Last edited by DTL; 10th February 2023 at 19:43. |
11th February 2023, 00:22 | #147 | Link | |||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
|
Quote:
Quote:
Keep in mind that the command line switch is a bit like the Blu-ray switch: it will make life easier for you, but you can also use the options and do it manually. https://forum.doom9.org/showthread.php?t=182715 Quote:
About the last point, it's not true, and this forum demonstrates it: we have people from all over the world, working at several different companies, contributing to open source projects. Our Ben here works at Amazon, for instance, and they contributed to x265 with the NEON assembly optimization for ARM among other things; Steinar works at NRK and contributed the Sony and Panasonic flavours we talked about to x264; Kieran works at Open Broadcast Systems and made x262, etc. (I could go on and on), so companies do contribute to open source encoders. |
|||
11th February 2023, 14:42 | #148 | Link | |
Registered User
Join Date: Jul 2018
Posts: 1,041
|
Quote:
Later I hope to move this function call to multi-block SIMD processing with AVX2/AVX512 to keep full quality.
Updated: fixed the build to include processing of the 4:2:2 and 10-bit formats required for AVC-Intra, with the _x9 macroblock compare skipped. It looks like these functions still do not exist at all as a single-call _x9 implementation, and for > 4:2:0 and/or > 8-bit processing the longer loop that checks each predictor separately is always performed. On an i5-11600 CPU this build, with simulated AVC-Intra 100 settings for a Full HD frame (from the https://forum.doom9.org/showthread.p...82#post1940382 post), runs at about 22.9 fps (the jpsdr build marked 'winthreads' runs at about 20 fps).
More profiling of I-frame-only, high-bitrate, CAVLC-only encoding shows that x264 does not appear to be optimized for such production at all. AVC-Intra classes above 50 do not allow CABAC (which has some asm optimizations), so x264 can only run with CAVLC compression. CAVLC has only a C implementation and almost no asm optimizations. And it is not about math computation but mostly about shuffling fairly small byte streams of random length, so SIMD units are unlikely to help a lot here. It is mostly a memory-bound task, at least at first look at the CAVLC compressor.
It also makes much more visible x264's tendency to perform worse at lower compression ratios. If you remove the hard fixed bitrate from AVC-Intra and allow x264 to run with CRF rate control and VBR, it runs significantly faster. With IBP encoding this is also visible, but much less so. So for I-frame-only encoding at high fixed bitrates, performance looks to be limited by the rate control logic that keeps the required very high bitrate stable. If you disable CBR and allow a low CRF, the analysis runs several times faster (about 80 fps vs 20 fps). Maybe for professional usage you can find another MJPEG-like intra-frame encoder with a much better implementation of fixed CBR output. Last edited by DTL; 13th February 2023 at 09:24. |
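To illustrate why variable-length entropy coding is hard to spread across SIMD lanes, here is a minimal sketch of a bit writer. Note the substitution: it uses the unsigned Exp-Golomb code (the ue(v) code that H.264 uses for many syntax elements) as a stand-in, not x264's actual CAVLC coefficient tables, and the class and function names are hypothetical. The point is only that each code's starting bit offset depends on the lengths of all codes written before it.

```python
def exp_golomb_ue(value):
    """Return (code, length_in_bits) for the unsigned Exp-Golomb code ue(v)."""
    code = value + 1
    # leading zeros + the binary digits of (value + 1)
    length = 2 * code.bit_length() - 1
    return code, length


class BitWriter:
    """Toy bit writer: appends variable-length codes MSB-first."""

    def __init__(self):
        self.acc = 0     # all bits written so far, kept as one integer
        self.nbits = 0   # running bit position in the stream

    def put(self, code, length):
        self.acc = (self.acc << length) | code
        self.nbits += length

    def to_bytes(self):
        pad = (-self.nbits) % 8  # zero-pad up to a whole byte
        return (self.acc << pad).to_bytes((self.nbits + pad) // 8, "big")


def encode(values):
    """Encode values and record the bit offset at which each code starts."""
    writer = BitWriter()
    offsets = []
    for v in values:
        offsets.append(writer.nbits)  # depends on every earlier code length
        writer.put(*exp_golomb_ue(v))
    return writer.to_bytes(), offsets


if __name__ == "__main__":
    data, offsets = encode([0, 1, 2, 3])
    print(data.hex(), offsets)
```

Because the offsets form a running sum of data-dependent lengths, the writes cannot simply be handed to independent vector lanes, which fits the observation above that this stage is more about shuffling bits than arithmetic.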
|
20th February 2024, 16:23 | #150 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
|
Most probably it does not contain all the encoder variants specialized for the various internal bit depths (8 + 10 bit). The higher bit depths especially require a lot more effort and code to handle more than 1 byte per video component in assembler.
New MABS compile: x264 0.164.3179 12426f5 Last edited by LigH; 21st February 2024 at 23:30. |
20th February 2024, 18:58 | #152 | Link |
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 480
|
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else. |
21st February 2024, 00:36 | #153 | Link |
Registered User
Join Date: Sep 2007
Location: Italy
Posts: 25
|
I see now that there is no lavf support in the win64 version. There is in the win32 version.
Code:
x264 core:164 r3179 12426f5
Syntax: x264 [options] -o outfile infile
Infile can be raw (in which case resolution is required),
  or YUV4MPEG (*.y4m),
  or Avisynth if compiled with support (yes).
  or libav* formats if compiled with lavf support (no) or ffms support (no).
Last edited by blob2500; 21st February 2024 at 00:42. |
21st February 2024, 21:35 | #154 | Link |
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,752
|
From which source? My MABS build supports both LAVF and FFMS in both bitnesses, also L-SMASH MP4 output.
|
22nd February 2024, 03:06 | #155 | Link |
Registered User
Join Date: Sep 2007
Location: Italy
Posts: 25
|
I was still referring to the latest 'official' build (r3179, 3MB).
https://artifacts.videolan.org/x264/release-win64/
Up to r3173 (win64) there was support for lavf: Code:
x264 core:164 r3173 4815cca
Syntax: x264 [options] -o outfile infile
Infile can be raw (in which case resolution is required),
  or YUV4MPEG (*.y4m),
  or Avisynth if compiled with support (yes).
  or libav* formats if compiled with lavf support (yes) or ffms support (no).
Thanks for your builds with full lavf+ffms support. I've been using them for a long time. Last edited by blob2500; 22nd February 2024 at 03:12. |
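For anyone comparing builds, the support flags in that banner can be read out mechanically. The following is a small sketch that assumes the banner wording stays exactly as quoted in the two posts above; the function name, the pattern table and the sample constants are made up for illustration and are not part of x264.

```python
import re

# Banner text as quoted in the posts above (official win64 builds).
BANNER_R3179 = (
    "x264 core:164 r3179 12426f5\n"
    "Syntax: x264 [options] -o outfile infile\n"
    "Infile can be raw (in which case resolution is required),\n"
    "  or YUV4MPEG (*.y4m),\n"
    "  or Avisynth if compiled with support (yes).\n"
    "  or libav* formats if compiled with lavf support (no)"
    " or ffms support (no).\n"
)
BANNER_R3173 = BANNER_R3179.replace("r3179 12426f5", "r3173 4815cca").replace(
    "lavf support (no)", "lavf support (yes)"
)

# One regular expression per input type mentioned in the banner.
PATTERNS = {
    "avisynth": r"Avisynth if compiled with support \((yes|no)\)",
    "lavf": r"lavf support \((yes|no)\)",
    "ffms": r"ffms support \((yes|no)\)",
}


def parse_input_support(banner):
    """Map each input type to True/False, or None if it is not mentioned."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, banner)
        result[name] = None if match is None else match.group(1) == "yes"
    return result


if __name__ == "__main__":
    for label, banner in (("r3173", BANNER_R3173), ("r3179", BANNER_R3179)):
        print(label, parse_input_support(banner))
```

Fed with the two banners from this thread, it reports lavf as available in r3173 and missing in r3179, with Avisynth present and ffms absent in both.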