x265 HEVC Encoder - Page 429

Selur · 24th July 2022, 16:13

Yup, using MABS too when building Windows tools for Hybrid.

BuccoBruce · 27th July 2022, 16:22

Quote:

Originally Posted by benwaggoner

If anyone can give me an overview of the best way to go from a source directory to a Windows 64-bit binary, I'll owe you one! I don't mind going the GCC route if that's easier.

Quote:

Originally Posted by Boulder

I don't know if it helps, but I just cloned the repo and edited the make-solutions.bat in the build\vc15-x86_64 folder to this:

cmake -G "Visual Studio 16 2019" ..\..\source && cmake-gui ..\..\source

Then ran make-solutions in the console, and it proceeds as described in the wiki. You might need to point to the path where NASM is and enable assembly in the configure step (assembly is disabled if CMake cannot find NASM). The resulting .sln file can be opened in Visual Studio 2019.

Just a note about MABS: it builds with GCC, and only GCC, unless you tell it to build with clang. Not an issue for most, and probably preferable if you're going to link the libraries against anything else built with GCC. I still prefer it for building non-free ffmpeg (with ffmpeg's native AAC, MP2, MP3, Opus, and Vorbis implementations disabled outright) to make an MPV build that can handle USAC and that de/encodes Opus using libopus. Building all those dependencies separately with MSVC myself would take a few years off my life, and take forever.

For what it's worth, I've found on some machines that VS builds of x265.exe (and x264, aom-enc, and SVT-AV1, to name a few) perform a bit better in some cases, though only negligibly. It does seem to add up on Intel CPUs without AVX2, though. VS also lets you use profile-guided optimization (PGO) if you choose to and, in my case, disable things like Spectre slowdow...I mean mitigations, but only because I don't know how to pass that through to GCC.

I can confirm Boulder's edit worked for compiling under VS2019. I add -A x64 out of habit, so cmake -G "Visual Studio 16 2019" -A x64 ..\..\source && cmake-gui ..\..\source. You would presumably edit it to read "Visual Studio 17 2022" if you're using 2022.

Make sure you start an "x64 Native Tools Command Prompt for VS 2019" to run things from, and make sure NASM is in your PATH, or you'll end up without optimized assembly code.
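A small sketch of the same setup as a script (hypothetical helper, not part of x265 or its build files): it assembles the edited cmake call with the generator name for VS 2019 or 2022 and warns if NASM isn't on PATH. It only builds the argument list; you would still run it, and then cmake-gui, from the x64 Native Tools prompt.

```python
import shutil

# Generator names CMake uses for these Visual Studio releases.
VS_GENERATORS = {
    2019: "Visual Studio 16 2019",
    2022: "Visual Studio 17 2022",
}


def cmake_configure_args(vs_year: int, source_dir: str = r"..\..\source") -> list[str]:
    """Argv for the cmake half of the edited make-solutions.bat (hypothetical helper)."""
    generator = VS_GENERATORS[vs_year]
    # -A x64 selects the 64-bit platform explicitly.
    return ["cmake", "-G", generator, "-A", "x64", source_dir]


def nasm_available() -> bool:
    """True if nasm is on PATH; without it CMake builds x265 without assembly."""
    return shutil.which("nasm") is not None


if __name__ == "__main__":
    print(" ".join(f'"{a}"' if " " in a else a for a in cmake_configure_args(2022)))
    if not nasm_available():
        print("warning: nasm not found on PATH, assembly will be disabled")
```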

BuccoBruce · 27th July 2022, 16:49

TL;DR Is there some magic bullet for muxing an HEVC elementary stream with Open GOP using mp4box and getting Media Foundation to decode it properly?

I'm running into some weird issues decoding Open GOP HEVC in an MP4 container with Media Foundation. Muxing with ffmpeg -movflags faststart+negative_cts_offsets seems to work fine most of the time. It complains about missing timestamps in the raw .hevc stream and outputs VFR, so -i has to be preceded by e.g. -r 60000/1001 to get around that. I also have to use -loglevel error -stats, or it quickly fills the console, ad infinitum, with:

Quote:

[mp4 @ 000001da260000c0] Timestamps are unset in a packet for stream 0.
This is deprecated and will stop working in the future.
Fix your code to set the timestamps properly
[mp4 @ 000001da260000c0] pts has no valueB time=00:00:00.00 bitrate=N/A speed= 0x
Last message repeated 103 times
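For reference, the remux described above as an argument list (a sketch; the helper name and file names are made up, and the flags are the ones quoted in this post). Note that -r has to come before -i so that it applies to the raw input rather than the output:

```python
def ffmpeg_remux_args(src: str, dst: str, fps: str = "60000/1001") -> list[str]:
    """Argv for remuxing a raw .hevc stream into MP4 (hypothetical helper)."""
    return [
        "ffmpeg",
        "-loglevel", "error", "-stats",  # hide the per-packet warnings, keep progress
        "-r", fps,                       # before -i: sets the raw input's frame rate
        "-i", src,
        "-c", "copy",                    # remux only, no re-encode
        "-movflags", "faststart+negative_cts_offsets",
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(ffmpeg_remux_args("video.hevc", "video.mp4")))
```

To actually run it, pass the list to subprocess.run(..., check=True).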

- 5760x2880, 5408x2704, 4800x2400, 4096x2048, 3840x1920, 3000x1500
- Level 6 Main at most, lower for smaller resolutions
- Issue persists even with L5/Main 3000x1500 50 fps video, or 2160x2160 30 fps
- 8/10 bit doesn't matter
- GOP length doesn't seem to matter; tried 60/600 (10-second rule), 30/300 (half), 25/250 (default)
- Ref/B-frame count doesn't seem to matter; all within the limits of Level 6, or well below anyway
- Thought it might be VBV-limited CRF acting up and overflowing the DPB with the default Level 6 VBV, so I tried lowering the VBV, and a lower-bitrate ABR with a much longer RC lookahead; the issue persisted
- CRF encodes ended up being fine anyway, since the issue seems limited to MP4Box + Media Foundation...
- Disabling Open GOP magically fixes it most of the time.
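A rough way to sanity-check which level each of these resolutions needs. This is a sketch assuming the Main-tier limits from the H.265 Annex A level tables as I remember them; the function is hypothetical and ignores DPB size and the other constraints x265 itself checks.

```python
import math

# Main-tier limits per HEVC level (H.265 Annex A, as assumed here):
# (level, MaxLumaPs samples, MaxLumaSr samples/s, MaxBR kbit/s, MaxCPB kbit)
HEVC_LEVELS = [
    ("4", 2_228_224, 66_846_720, 12_000, 12_000),
    ("4.1", 2_228_224, 133_693_440, 20_000, 20_000),
    ("5", 8_912_896, 267_386_880, 25_000, 25_000),
    ("5.1", 8_912_896, 534_773_760, 40_000, 40_000),
    ("5.2", 8_912_896, 1_069_547_520, 60_000, 60_000),
    ("6", 35_651_584, 1_069_547_520, 60_000, 60_000),
    ("6.1", 35_651_584, 2_139_095_040, 120_000, 120_000),
    ("6.2", 35_651_584, 4_278_190_080, 240_000, 240_000),
]


def min_level(width, height, fps, vbv_maxrate=None, vbv_bufsize=None):
    """Lowest Main-tier level whose size, sample-rate and VBV limits all fit."""
    luma_ps = width * height
    luma_sr = luma_ps * fps
    for name, max_ps, max_sr, max_br, max_cpb in HEVC_LEVELS:
        max_dim = math.sqrt(8 * max_ps)  # neither width nor height may exceed this
        if luma_ps > max_ps or max(width, height) > max_dim:
            continue
        if luma_sr > max_sr:
            continue
        if vbv_maxrate is not None and vbv_maxrate > max_br:
            continue
        if vbv_bufsize is not None and vbv_bufsize > max_cpb:
            continue
        return name
    return None  # beyond Main-tier Level 6.2


print(min_level(3000, 1500, 50))
print(min_level(5760, 2880, 60000 / 1001, vbv_maxrate=40000, vbv_bufsize=30000))
```

With these tables, 3000x1500 at 50 fps and 2160x2160 at 30 fps land in Level 5, while 5760x2880 at 59.94 fps needs Level 6, and --vbv-maxrate 40000 with --vbv-bufsize 30000 stays within Level 6 Main.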

Is it some kind of IDR signaling issue? Disabling Open GOP alone seemingly resolves all of the issues, but I'd like to use Open GOP since these are static camera shots. Is it something really dumb like -inter 500 being too small? I guess that would make me really dumb. It's starting to look more like a GPAC/MP4Box issue, or more likely super-duper Dunning-Kruger PEBKAC, but I don't know enough about HEVC bitstream output and signaling (PPS/SPS/VUI) to know any better, so I wasted my time messing with encoder parameters.

MP4Box output is mostly unplayable: it doesn't seek, and it plays choppily, almost like what you'd expect when the decoder drops a temporal enhancement layer and plays back at half the frame rate. Using --forcesync with MP4Box didn't help either. I tried MP4Box with an added --negctts, but that just outputs "Arg negctts set but not used" in the console, whereas using negative_cts_offsets in ffmpeg seems to fix things?! Either way, the issue only seems to occur with Media Foundation playback. Using an MKV and/or decoding with LAV works fine, as does software decoding, MPV, or even ffplay.

---

What about the x265 bitstream options, could any of them help? Based on the documentation, --repeat-headers seems only useful for seeking within the elementary stream before muxing it. --aud? --eos? --hrd? I thought --idr-recovery-sei might help, but enabling it along with --repeat-headers seemingly made things worse. MP4Box's output when muxing a stream made with those two options makes Avidemux crash immediately, and ffmpeg-based playback (MPV) has serious trouble with the file too. This seems to happen regardless of the MP4Box parameters I tried: -inter 0 to force a flat MP4, the default -inter 500, and with and without --forcesync for both. I'm pretty sure I tried --nosei, but that would just throw away the extra data I asked x265 to write, and then I'd have to waste time remembering how to re-signal BT.709/limited range.

Remuxing any of that MP4Box output with ffmpeg results in a file that is entirely unplayable in anything: it skips back and forth randomly, you get intermittently decoded blocks, etc. It's also the only result I could technically post a screenshot of, since it's NSFW video...it's VR pr0n, alright?

I can mux directly from the raw .hevc stream with ffmpeg when using those two x265 options, but only with no other parameters: just forcing the frame rate with -r 60000/1001 to prevent erroneous VFR output, and -c copy. I haven't tried -movflags faststart, and negative_cts_offsets seemingly breaks these files too. I also have yet to try putting ffmpeg's output back through MP4Box. I'm streaming these from a NAS and would prefer to have the moov atom at the beginning of the file, so a working flat MP4 is only "half fixed".

What's even weirder is that I can take a working MP4 from elsewhere with the same resolution, frame rate, and bitrate, and seemingly identical x265 settings visible in the SEI, and remux it all I want with MP4Box without breaking playback under Media Foundation. The only difference is that the x265 version tag reads 0.0; these working files also seem to have been muxed with ffmpeg (Lavf58.12.100), or even encoded directly with it using -c:v libx265. Looking at that file, the only options passed to x265 appear to be --bitrate 30000 --output-depth 10 --colormatrix=2 --colorprim=2 --transfer=2 --videoformat=5. Everything else is --preset medium defaults.

I've just been using --preset medium with some slower options selectively enabled, plus some I thought would help lower the bitrate back when I thought that was the issue.

Code:

--bitrate 20000 --output-depth 10 --level-idc 6 --no-high-tier
--rect --amp --tskip --tskip-fast --b-intra --limit-modes
--vbv-bufsize 30000 --vbv-maxrate 40000
--analyze-src-pics --rc-lookahead 120 --min-keyint 60 --keyint 600
--fades --video-signal-type-preset BT709_YCC
--opt-qp-pps --opt-ref-list-length-pps --opt-cu-delta-qp
--limit-sao --selective-sao 1 --sao-non-deblock

Plus either --pools +,- or --pools -,+ on a dual-socket system, and I've obviously tried with and without --repeat-headers --idr-recovery-sei. Adding or removing any of --b-intra --fades --analyze-src-pics --opt-qp-pps --opt-ref-list-length-pps --opt-cu-delta-qp didn't make a difference either; I'm just including the command with the most options for completeness.

Boulder · 29th July 2022, 12:23

A note for MABS users who build for Zen2/3: add -march=znver2 or -march=znver3 in custom_profile in the local64\etc directory. It gives a slightly better performance for those chips, I think I found it 3-4% better when I tested it on my 3900X.

LigH · 30th July 2022, 20:06

New upload: x265 3.5+39-a599806d3

[Windows][GCC 12.1.0][32/32XP/64 bit] 8bit+10bit+12bit

LeXXuz · 30th July 2022, 20:42

I was wondering if someone could build me a Zen3 optimized and Zen2 optimized Windows version for my 5950x and 3950x CPUs. That would be much appreciated.

RanmaCanada · 1st August 2022, 15:34

Quote:

Originally Posted by LeXXuz

I was wondering if someone could build me a Zen3 optimized and Zen2 optimized Windows version for my 5950x and 3950x CPUs. That would be much appreciated.

Pretty sure DJATOM has the best. Yes it's an older build, but x265 has been in maintenance mode for well over a year now.

LeXXuz · 1st August 2022, 19:20

Quote:

Originally Posted by RanmaCanada

Pretty sure DJATOM has the best. Yes it's an older build, but x265 has been in maintenance mode for well over a year now.

I know, that's why I'd like a current build for comparison.

benwaggoner · 1st August 2022, 19:23

Quote:

Originally Posted by BuccoBruce

TL;DR Is there some magic bullet for muxing an HEVC elementary stream with Open GOP using mp4box and getting Media Foundation to decode it properly?

I've had some .hevc files that don't play properly when muxed in mp4box, but do when muxed in ffmpeg. ffmpeg complains enormously about missing PTS data, but seems to fix it fine.

They were all Closed GOP, though, so potentially unrelated to your issue.

BuccoBruce · 1st August 2022, 22:19

Quote:

Originally Posted by benwaggoner

I've had some .hevc files that don't play properly when muxed in mp4box, but do when muxed in ffmpeg. ffmpeg complains enormously about missing PTS data, but seems to fix it fine.

They were all Closed GOP, though, so potentially unrelated to your issue.

Might still be related, I re-encoded so many files I might have forgotten if something other than Open GOP was also causing it. Guess I'll stick to ffmpeg for HEVC and just use mp4box for AVC+HLS.

LeXXuz · 2nd August 2022, 06:13

Is SAO still an issue for high-quality encodes with current builds, or can it safely be enabled now?

microchip8 · 2nd August 2022, 07:23

Quote:

Originally Posted by LeXXuz

Is SAO still an issue for high-quality encodes with current builds, or can it safely be enabled now?

it's still an issue

LeXXuz · 24th August 2022, 11:58

I'm tinkering around with my profiles to gain more speed out of my encodes. The significant rise in electricity cost here in Germany made that decision necessary.

I have a question regarding the --limit-refs parameter. There is a huge speed difference between mode 1 and mode 3, and I was told mode 1 is the better choice for quality, so I also tested mode 2, which none of the presets seem to use by default.

I got a decent performance increase with mode 2 over mode 1 and tested this with quite a few examples. I can't say I've seen any notable difference in quality so far.

I read the docs about the different modes, but in all honesty I don't really understand what's written there and how that may affect quality.

I always do high bitrate encodes with the "slower" preset as a base and CRF values of 18 or even below. Is there any good reason NOT to use mode 2 over 1 for better performance?
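As I read the x265 CLI docs, --limit-refs is a bitmask rather than a quality scale: 1 limits the references searched at a depth to those chosen by its four sub-CUs, 2 restricts rectangular/AMP partitions to the references picked by the 2Nx2N search, and 3 enables both. A tiny decoder for that (the bit values are assumed from the docs; the helper is hypothetical):

```python
# Bit values as described for --limit-refs in the x265 docs (assumed here).
LIMIT_DEPTH = 1  # prune refs at a depth to those chosen by its four sub-CUs
LIMIT_CU = 2     # rect/AMP partitions reuse only refs picked by the 2Nx2N search


def limit_refs_modes(mode: int) -> list[str]:
    """Decode a --limit-refs value (0-3) into the pruning heuristics it enables."""
    if not 0 <= mode <= 3:
        raise ValueError("--limit-refs takes 0, 1, 2 or 3")
    enabled = []
    if mode & LIMIT_DEPTH:
        enabled.append("depth")
    if mode & LIMIT_CU:
        enabled.append("cu")
    return enabled


for m in range(4):
    print(m, limit_refs_modes(m) or ["none"])
```

So mode 2 isn't a midpoint between 1 and 3; it prunes a different search, which is why its speed/quality trade-off has to be measured on your own content.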

benwaggoner · 24th August 2022, 18:42

Quote:

Originally Posted by LeXXuz

I'm tinkering around with my profiles to gain more speed out of my encodes. The significant rise in electricity cost here in Germany made that decision necessary.

I have a question regarding the --limit-refs parameter. There is a huge speed difference between mode 1 and mode 3, and I was told mode 1 is the better choice for quality, so I also tested mode 2, which none of the presets seem to use by default.

I got a decent performance increase with mode 2 over mode 1 and tested this with quite a few examples. I can't say I've seen any notable difference in quality so far.

I read the docs about the different modes, but in all honesty I don't really understand what's written there and how that may affect quality.

I always do high bitrate encodes with the "slower" preset as a base and CRF values of 18 or even below. Is there any good reason NOT to use mode 2 over 1 for better performance?

To test more subtle features like this, I strongly recommend using a 2-pass --bitrate encode instead of CRF. It's hard to disentangle impacts on quality when bitrate is also varying. 1-pass CBR can also work, and is faster.
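A sketch of such an A/B setup (hypothetical helper and file names; --pass, --stats, --bitrate, --input and --output are standard x265 options). Each variant gets its own stats file and the same target bitrate, so only the option under test differs:

```python
def two_pass_args(src, dst, kbps, preset="slower", stats="x265_2pass.log", extra=()):
    """Argv lists for a 2-pass ABR encode (hypothetical helper)."""
    common = ["x265", "--preset", preset, "--bitrate", str(kbps),
              "--stats", stats, *extra]
    return [
        common + ["--pass", "1", "--input", src, "--output", dst],
        common + ["--pass", "2", "--input", src, "--output", dst],
    ]


# Hypothetical comparison: same target bitrate, only --limit-refs differs.
for mode in ("1", "2"):
    for argv in two_pass_args("clip.y4m", f"limitrefs{mode}.hevc", 8000,
                              stats=f"limitrefs{mode}.log",
                              extra=("--limit-refs", mode)):
        print(" ".join(argv))
```

Pass 1 must finish before pass 2 starts, e.g. by running the lists in order with subprocess.run(argv, check=True).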

benwaggoner · 24th August 2022, 19:03

Quote:

Originally Posted by LeXXuz

I'm tinkering around with my profiles to gain more speed out of my encodes. The significant rise in electricity cost here in Germany made that decision necessary.

If you're looking for ways to reduce joules/pixel, --frame-threads 1 can really help. The overhead of frame threading can really reduce power efficiency, and doesn't always have that big of a speed boost depending on how many cores you have and the resolution you're encoding at.

If you use SAO, --selective-sao 2 saves a bit without material quality impact.

If you can share your current command line, we might have other suggestions.

In general, the --preset options are pretty well tuned for a typical range of content and scenarios as of x265 3.0. They don't include any features added in 3.1 or later, which is why no preset uses --selective-sao, --rskip 2, etc., even though those really should be the defaults.

LeXXuz · 24th August 2022, 23:07

Thank you for those suggestions.

Right now I'm re-encoding 1080p content.

I use these settings:

Code:

--preset slower --crf 17.00 --qpfile "E:\WORK\chp.qpf"
 --repeat-headers --input-depth 16 --output-depth 10 --dither 
--ctu 32 --limit-refs 2 --psy-rdoq 5 --selective-sao 0 --no-sao 
--colorprim bt709 --transfer bt709 --colormatrix bt709

CPUs used are Ryzen 5950x and 3950x.

benwaggoner · 25th August 2022, 01:49

Quote:

Originally Posted by LeXXuz

Thank you for those suggestions.

Right now I recode 1080p content

I use these settings:

Code:

--preset slower --crf 17.00 --qpfile "E:\WORK\chp.qpf"
 --repeat-headers --input-depth 16 --output-depth 10 --dither 
--ctu 32 --limit-refs 2 --psy-rdoq 5 --selective-sao 0 --no-sao 
--colorprim bt709 --transfer bt709 --colormatrix bt709

CPUs used are Ryzen 5950x and 3950x.

--preset slower is already one of the better-balanced presets. Switching parameters from their slower values to the ones slow uses will speed things up, but all of them have quality impacts too.

There's no point to using --selective-sao if you're already using --no-sao.

I always like to set --profile and --level-idc so I'll get warnings if I violate the requirements. In your case that looks like --profile main10 --level-idc 4.0 or 4.1.

Using --psy-rdoq 5 without raising --psy-rd as well is an uncommon configuration, but should work.

I'd use --rskip 2 to replace the default --rskip 1 because it's a better quality mode. I've not directly compared the speed. Higher --rskip-edge-threshold values are faster, but can reduce quality. I tend to use 2-3 in my stuff, but I'm more biased towards quality/efficiency than your use case.

What CPU are you running on?

The biggest thing to improve pixels/joule without any quality loss would be --frame-threads 1. Lower values can actually improve quality.

You can learn a lot from doing a --csv-log-level 2 and looking at the frame level data. For example, if there aren't a lot of TUs smaller than 8x8 you could reduce --tu-intra-depth and --tu-inter-depth by 1. Recursing all the way down is mostly helpful with content that has sharp details, like text and cel animation.

If you have a lot of RAM, increasing --rc-lookahead can improve quality quite a lot when VBV-limited, without much negative speed impact.
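Applied to LeXXuz's command line, those suggestions could look like this. It's a sketch: the helper is hypothetical, and --level-idc 4.1 is an assumption that covers 1080p up to 60 fps by luma sample rate (4.0 is enough at 30 fps or below). It drops --selective-sao when --no-sao is present and appends the other options unless they're already set:

```python
import shlex

# Hypothetical defaults reflecting the suggestions above; tweak to taste.
SUGGESTED = (
    ("--profile", "main10"),
    ("--level-idc", "4.1"),    # 4.0 also fits 1080p at 30 fps or below
    ("--rskip", "2"),
    ("--frame-threads", "1"),  # better pixels/joule, but slower wall-clock
)


def apply_suggestions(cmd: str) -> list[str]:
    """Return the x265 argument list with the suggested changes applied."""
    args = shlex.split(cmd)
    no_sao = "--no-sao" in args
    out = []
    i = 0
    while i < len(args):
        if no_sao and args[i] == "--selective-sao":
            i += 2  # redundant with --no-sao: drop the flag and its value
            continue
        out.append(args[i])
        i += 1
    for flag, value in SUGGESTED:
        if flag not in out:  # keep any value the user already set
            out += [flag, value]
    return out


current = (r'--preset slower --crf 17.00 --qpfile "E:\WORK\chp.qpf" '
           r'--repeat-headers --input-depth 16 --output-depth 10 --dither '
           r'--ctu 32 --limit-refs 2 --psy-rdoq 5 --selective-sao 0 --no-sao '
           r'--colorprim bt709 --transfer bt709 --colormatrix bt709')
print(" ".join(apply_suggestions(current)))
```

--frame-threads 1 is the one to benchmark first, since it trades wall-clock speed for efficiency.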

LeXXuz · 25th August 2022, 09:38

Quote:

Originally Posted by benwaggoner

There's no point to using --selective-sao if you're already using --no-sao.

I was uncertain if I have to set it to 0 as well when I don't want to have SAO at all. I will remove that parameter.

Quote:

Originally Posted by benwaggoner

I always like to set --profile and --level-idc so I'll get warnings if I violate the requirements. In your case that looks like --profile main10 --level-idc 4.0 or 4.1.

Again, I was unsure if I should let x265 decide on its own or put these in manually. Never thought about the violation warnings though which is a very good point. Will add these again.

Quote:

Originally Posted by benwaggoner

Using --psy-rdoq 5 without raising --psy as well is an uncommon configuration, but should work.

Well, that's a longer story, and it's the subtlest approach I have at the moment to fight banding with the quite clean source material I have. The slower preset already uses --psy-rd 2. Raising that any higher added too much static noise in flat areas for my taste.
It's barely visible on 4k, but visible on 1080p and almost terrible on SD.
Without raising at least --psy-rdoq a little, x265 tends to produce banding in certain flat areas. And sadly my living room TV is very susceptible to that and tends to intensify even the slightest banding compared to my other TVs. So this is somewhat a personal compromise.

Quote:

Originally Posted by benwaggoner

I'd use --rskip 2 to replace the default --rskip 1 because it's a better quality mode. I've not directly compared the speed. Higher --rskip-edge-threshold values are faster, but can reduce quality. I tend to use 2-3 in my stuff, but I'm more biased towards quality/efficiency than your use case.

I'll add --rskip 2 to my script.

Quote:

Originally Posted by benwaggoner

What CPU are you running on?

AMD Ryzen 5950x and 3950x. Both with 16 cores/32 threads

Quote:

Originally Posted by benwaggoner

The biggest thing to improve pixels/joule without any quality loss would be --frame-threads 1. Lower values can actually improve quality.

Doesn't that decrease speed a lot as it reduces parallel processing? Or am I mistaken here?

Quote:

Originally Posted by benwaggoner

If you have a lot of RAM, increasing --rc-lookahead can improve quality when VBV-limited quite a lot without much negative speed impact.

The machines have 64GB. I think the default is 40 for the --slower preset? How much should I raise that?

Thanks again for your valued input Ben.

vpupkind · 25th August 2022, 20:25

--rc-lookahead: at least one second's worth of frames.
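That rule of thumb as arithmetic. It's a sketch: the preset default of 40 for slower is taken from LeXXuz's post above, and the 250-frame cap and "no benefit past the max keyframe interval" note come from the x265 docs as I recall them.

```python
import math

X265_MAX_LOOKAHEAD = 250  # upper bound for --rc-lookahead per the x265 docs


def rc_lookahead(fps, seconds=1.0, preset_default=40, keyint=None):
    """Frames of lookahead covering `seconds` of video (hypothetical helper)."""
    frames = max(preset_default, math.ceil(fps * seconds))
    if keyint is not None:
        frames = min(frames, keyint)  # looking past keyint is not helpful
    return min(frames, X265_MAX_LOOKAHEAD)


print(rc_lookahead(24000 / 1001))             # 23.976 fps: preset default wins
print(rc_lookahead(60000 / 1001, seconds=2))  # 59.94 fps, two seconds
```

At 59.94 fps, two seconds works out to the --rc-lookahead 120 used earlier in the thread.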

Immaculate · 25th August 2022, 23:13

Quote:

Originally Posted by benwaggoner

--preset slower is already one of the better-balanced presets. Switching parameters from their slower values to the ones slow uses will speed things up, but all of them have quality impacts too.

There's no point to using --selective-sao if you're already using --no-sao.

I always like to set --profile and --level-idc so I'll get warnings if I violate the requirements. In your case that looks like --profile main10 --level-idc 4.0 or 4.1.

Using --psy-rdoq 5 without raising --psy-rd as well is an uncommon configuration, but should work.

I'd use --rskip 2 to replace the default --rskip 1 because it's a better quality mode. I've not directly compared the speed. Higher --rskip-edge-threshold values are faster, but can reduce quality. I tend to use 2-3 in my stuff, but I'm more biased towards quality/efficiency than your use case.

What CPU are you running on?

The biggest thing to improve pixels/joule without any quality loss would be --frame-threads 1. Lower values can actually improve quality.

You can learn a lot from doing a --csv-log-level 2 and looking at the frame level data. For example, if there aren't a lot of TUs smaller than 8x8 you could reduce --tu-intra-depth and --tu-inter-depth by 1. Recursing all the way down is mostly helpful with content that has sharp details, like text and cel animation.

If you have a lot of RAM, increasing --rc-lookahead can improve quality quite a lot when VBV-limited, without much negative speed impact.

Thanks for the tips. --rskip 2 seems to improve grain "motion" quite a bit in some cases.

It's a shame that you have to fiddle with x265 to get acceptable quality when a simple --tune film --preset veryslow produces good results with x264. Of course, clean material isn't an issue; it's just that x264 looks better with noise/grain out of the box.
