Old 16th January 2025, 00:29   #9701  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,955
Quote:
Originally Posted by higher View Post
It's likely that Zen 5 is not much of an advance in x265 encoding compared to other workloads.
It has full native AVX512 support, which could make using that flag improve performance more, and in more scenarios.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 16th January 2025, 00:30   #9702  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,955
Quote:
Originally Posted by cubicibo View Post
VFR can be signaled both on the entire stream or at a CWS level. But specifying the actual picture output-presentation delay is overly complicated here.

Anyway, it does not matter for the current problem. pic_struct should be used with CFR. But frame entry time in decoder must be adapted with respect to the last pic struct instruction. I can't find any such code in x265, so VBV conformance must be way off.
Back in the VC-1 days we handled this by having the frame in the bitstream, containing just the frame_repeat tag. Didn't require any VFR, time shifts, etcetera.
Old 16th January 2025, 04:46   #9703  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by benwaggoner View Post
It has full native AVX512 support, which could make using that flag improve performance more, and in more scenarios.
If my test is anything to be believed, it's around 5% to 8% "free performance" (more or less), depending on the target quality somehow.
(5% with CRF-14, 6.3% with CRF-18, 6.6% with CRF-22 and 8% with CRF-26, the 4 quality targets I often use to draw an RD curve (there's no difference in the RD curve in this case, of course))
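
For reference, a gain like the ones above is just the relative FPS difference between two runs of the same encode. A minimal sketch, assuming the encoder's reported average FPS is the metric; the function name and the FPS values are hypothetical and are not measurements from this thread:

```python
def speedup_percent(fps_baseline: float, fps_candidate: float) -> float:
    """Relative speed gain of the candidate run over the baseline, in percent."""
    if fps_baseline <= 0:
        raise ValueError("baseline FPS must be positive")
    return (fps_candidate - fps_baseline) * 100.0 / fps_baseline


# Hypothetical FPS readings (illustration only): baseline vs. AVX512-enabled run.
gain = speedup_percent(10.0, 10.8)
print(f"{gain:.1f}% faster")
```

Averaging several runs per CRF value before taking the ratio should make a few-percent difference easier to trust.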

Last edited by Z2697; 16th January 2025 at 05:46.
Old 16th January 2025, 16:09   #9704  |  Link
tormento
Acid fr0g
 
 
Join Date: May 2002
Location: Italy
Posts: 2,871
Benchmarking with different builds, I've noticed that Patman86's x265-4.1+79+12-81640d428 ICC standard/AVX/AVX2 builds are identical except for 1-2 bytes.

It seems a bit strange to me and I've opened an issue on GitHub.

Please be aware of that in the meantime.
__________________
@turment on Telegram
Old 16th January 2025, 17:53   #9705  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,955
Quote:
Originally Posted by Z2697 View Post
If my test is anything to be believed, it's around 5% to 8% "free performance" (more or less), depending on the target quality somehow.
(5% with CRF-14, 6.3% with CRF-18, 6.6% with CRF-22 and 8% with CRF-26, the 4 quality targets I often use to draw an RD curve (there's no difference in the RD curve in this case, of course))
Hmm. Perhaps higher QP makes for fewer early exits so doing more in parallel helps? Can you share the command line? I imagine stuff like TU size could have different impact.
Old 17th January 2025, 04:05   #9706  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by tormento View Post
Benchmarking with different builds, I've noticed that Patman86's x265-4.1+79+12-81640d428 ICC standard/AVX/AVX2 builds are identical except for 1-2 bytes.

It seems a bit strange to me and I've opened an issue on GitHub.

Please be aware of that in the meantime.
You should compare with both builds set to --no-info, and do a proper byte-by-byte comparison instead of just judging from the size.

Never mind, you were talking about the executable. I thought you were talking about the encoded HEVC stream.
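
A byte-by-byte comparison works the same way for executables and for encoded streams. A minimal sketch, assuming two local files; the function name is hypothetical, and it only counts mismatches within the common length of the two files:

```python
from pathlib import Path


def count_differing_bytes(path_a, path_b, chunk_size=1 << 20):
    """Return (size_a, size_b, mismatches), where mismatches counts the byte
    positions that differ within the common length of the two files."""
    size_a = Path(path_a).stat().st_size
    size_b = Path(path_b).stat().st_size
    mismatches = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if not block_a or not block_b:
                break  # reached the end of the shorter file
            # zip stops at the shorter block, so only overlapping bytes are compared
            mismatches += sum(x != y for x, y in zip(block_a, block_b))
    return size_a, size_b, mismatches
```

Two files are identical only if the sizes match and the mismatch count is 0. A handful of differing bytes in otherwise equal-sized binaries would be consistent with the "1-2 bytes" observation above.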

Last edited by Z2697; 17th January 2025 at 04:08.
Old 17th January 2025, 04:23   #9707  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by benwaggoner View Post
Hmm. Perhaps higher QP makes for fewer early exits so doing more in parallel helps? Can you share the command line? I imagine stuff like TU size could have different impact.
Code:
--preset slow
--rd 6
--ctu 32
--no-rect
--no-sao
--no-strong-intra-smoothing
--no-open-gop
--b-intra
--weightb
--aq-mode 1
--aq-strength 0.8
--qcomp 0.7
--pbratio 1.2
--bframes 3
--cbqpoffs -2
--crqpoffs -2
--deblock -3,-3
--rc-lookahead 80
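
To rerun the same test at each CRF value, the option list above can be turned into a full command line. A minimal sketch, assuming the standalone x265 CLI with its `--input`, `--output` and `--crf` options; the helper name and the file names are hypothetical, and the CRF values are the four mentioned earlier in the thread:

```python
import shlex

# Options copied from the list above. None marks a bare flag without a value.
X265_OPTIONS = {
    "preset": "slow",
    "rd": "6",
    "ctu": "32",
    "no-rect": None,
    "no-sao": None,
    "no-strong-intra-smoothing": None,
    "no-open-gop": None,
    "b-intra": None,
    "weightb": None,
    "aq-mode": "1",
    "aq-strength": "0.8",
    "qcomp": "0.7",
    "pbratio": "1.2",
    "bframes": "3",
    "cbqpoffs": "-2",
    "crqpoffs": "-2",
    "deblock": "-3,-3",
    "rc-lookahead": "80",
}


def build_x265_args(options, crf, input_path, output_path):
    """Assemble an argument list for one test encode at the given CRF."""
    args = ["x265"]
    for name, value in options.items():
        args.append(f"--{name}")
        if value is not None:
            args.append(value)
    args += ["--crf", str(crf), "--input", input_path, "--output", output_path]
    return args


# Hypothetical file names; one command per quality target.
for crf in (14, 18, 22, 26):
    print(shlex.join(build_x265_args(X265_OPTIONS, crf, "source.y4m", f"crf{crf}.hevc")))
```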
Old 17th January 2025, 19:51   #9708  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,955
Quote:
Originally Posted by Z2697 View Post
Code:
--preset slow
--rd 6
--ctu 32
--no-rect
--no-sao
--no-strong-intra-smoothing
--no-open-gop
--b-intra
--weightb
--aq-mode 1
--aq-strength 0.8
--qcomp 0.7
--pbratio 1.2
--bframes 3
--cbqpoffs -2
--crqpoffs -2
--deblock -3,-3
--rc-lookahead 80
Huh. What are you trying to encode/optimize for with those settings? And this is with 1080p --crf, right?

Nothing pops out as impacting performance in particular. However, as CABAC is single threaded per frame, just reducing the bitrate itself will improve performance on many systems.
Old 17th January 2025, 20:43   #9709  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by benwaggoner View Post
Huh. What are you trying to encode/optimize for with those settings? And this is with 1080p --crf, right?

Nothing pops out as impacting performance in particular. However, as CABAC is single threaded per frame, just reducing the bitrate itself will improve performance on many systems.
I don't usually optimize x265 parameters for every different source based on its "style" or characteristics, unless the image quality of the source is very distinctive.
These are just... "generic" settings.
I choose my encoding parameters based on performance (FPS & RD), and the "dark biased AQ" based on... what I see fit. I frankensteined AQ 1 and dark-bias into a new mode in my mod build, yes.
(AQ is not really about RD performance, of course. The FPS performance also doesn't seem to be affected much by AQ, except for the edge-based modes.)

However, the major portion of the things I encode is anime. When I do non-test encoding I use slower parameters. (e.g. hme)

And yes, the test is 1080p CRF.

Last edited by Z2697; 18th January 2025 at 16:47.
Old 19th January 2025, 16:30   #9710  |  Link
Sagittaire
Testeur de codecs
 
 
Join Date: May 2003
Location: France
Posts: 2,530
Quote:
Originally Posted by higher View Post
It's likely that Zen 5 is not much of an advance in x265 encoding compared to other workloads.

TPU uses x265 with preset slow at 4K resolution. It fully saturates my 5900X and I'm guessing it fully saturates a 9900X as well. Yet, the 9900X is only 25% faster than the 5900X while the 9950X is 27% faster than the 5950X in the TPU benchmark and it sounds about right.

Power consumption is on a different level though. The 9900X consumes around 170W fully loaded while an M4 Pro consumes less than 50W.
Unfortunately X86 is years behind in this regard and also in single core performance.
No, x265 at 4K with the slow preset doesn't saturate 32 threads, not by far. Saturating all threads means 100% CPU load at the power limit for the whole encoding time.

If the TPU codec benchmark saturated all cores, the difference between the 5900X, 5950X, 9900X and 9950X would be comparable to the Blender benchmark, for example. And that's not the case.
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9
Old 31st January 2025, 02:36   #9711  |  Link
OvejaNegra
ekTOMBE STUDIOS
 
 
Join Date: Dec 2005
Location: Cuba
Posts: 257
Hi to all!
It's been a long time since I used x265. I had to change the MeGUI version, and of course I updated x265 to the latest version.

I jumped from version 3.x to 4.1.

Since I still have some old habits, I want to know if things have changed so I can update my presets. I have some presets with options that I tweaked for encoding speed vs. quality using advice from other users.

As I said, it's been some time since I used x265, and maybe some old habits must change too.

Sorry in advance for my English; it is not my native language.

Here are my questions:

1 - I'm using rd 4 because rd 6 gave me some artifacts on sharp edges. I remember someone saying that 4 was safe for real-life content and 6 was better for animation (no artifacts with animation).

2 - I still use --aq-mode 1 --aq-strength xx because the other modes were a little inconsistent for me. Any changes? (for real-life content and animation)

3 - I'm using --max-merge 2; other values gave me problems with edges in dark scenes (worse on animation).

4 - Is --rskip 2 --rskip-edge-threshold X still recommended?

5 - Is --limit-refs 3 with more refs (5) still advised vs. --limit-refs 0 and fewer refs (taking into account encoding time and the benefits of the reference frames)?

6 - I'm using --ctu 32 because of a bug with the default value (on 4K content, if I remember correctly). Is it fixed? Should I use 64 for everything, or just for 4K content (and 32 for the rest)?

7 - I'm using --bframes 4 for everything. Should I use 2 for live action and 4 for animation?

8 - I'm always using --tu-intra-depth 3 --tu-inter-depth 3 (live action and animation). Is that OK? Should I use it only if I'm using rect and amp?

9 - Is --limit-tu X advised with --tu-intra-depth 3 --tu-inter-depth 3?

10 - If I have time, I use --rect for real-life content and --rect --amp for animation. Is that OK?

11 - Is --limit-modes still advised with --rect --amp if speed is required (vs. not using rect and amp at all)?

12 - I'm using --rc-lookahead 250, or as big as I can (for ABR and CRF). Is that still advised?

13 - I always use --weightp --weightb --b-intra.

14 - I use --psy-rd 2.0 --psy-rdoq 0.0 --rdoq-level 0 if it looks OK, and --psy-rd 2.0 --psy-rdoq 2.0 --rdoq-level 2 if I need to retain more detail. Anything new on that?

15 - Has anyone tested --tskip with animated content? Does it really help? Should I use --tskip-fast?

16 - For real-life content I don't like or use SAO, but I remember it being useful for animation at low bitrate. Any change on that?

17 - What's the use of --limit-sao and --sao-non-deblock? (More speed? Some kind of intelligent mode? A less strong effect?)

18 - Is --me 4 (star) advised? (for real-life content and animation)

19 - I use --subme 3 for everything. Should I use another mode? Should I use a different mode for animation?

20 - Any other advice? Something new that I should use / not use?

Thanks!!
__________________
So, it works or not???
Old 31st January 2025, 10:49   #9712  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
The problems you mentioned in 1, 3 and 6 feel unrealistic; can you elaborate?
The "core encoding functions" haven't changed for years, so you will pretty much get the exact same results (I have compared 3.5 to 4.1), but I have some advice... or just my 2 cents.

5. limit-refs makes very little difference (~1% speed and ~0.05% bd-rate). ref isn't that big of a deal for regular content either.

6. ctu (64 or 32) doesn't have much impact on quality. many other things are actually 32x32 max in HEVC.

7. just don't use too many b frames, they will slow down the encoding with b-adapt and have no real benefit.

8. tu-depth is not "tied" to rect and amp.

9. limit-tu is a good trade-off, but maybe lowering the depths to 2 is a better trade-off.

10. they are ok and will improve quality a little bit. with tu-depths already increased it'll be a tiny little bit.

15. tskip only works on 4x4 intra TUs. it's not gonna magically make anything look better. the situation it was designed for is esoteric, and it doesn't even work very well.

18. me 4 is not star.
Old 31st January 2025, 13:04   #9713  |  Link
OvejaNegra
ekTOMBE STUDIOS
 
 
Join Date: Dec 2005
Location: Cuba
Posts: 257
Quote:
Originally Posted by Z2697 View Post
The problems you mentioned in 1, 3 and 6 feel unrealistic; can you elaborate?
The "core encoding functions" haven't changed for years, so you will pretty much get the exact same results (I have compared 3.5 to 4.1), but I have some advice... or just my 2 cents.

5. limit-refs makes very little difference (~1% speed and ~0.05% bd-rate). ref isn't that big of a deal for regular content either.

6. ctu (64 or 32) doesn't have much impact on quality. many other things are actually 32x32 max in HEVC.

7. just don't use too many b frames, they will slow down the encoding with b-adapt and have no real benefit.

8. tu-depth is not "tied" to rect and amp.

9. limit-tu is a good trade-off, but maybe lowering the depths to 2 is a better trade-off.

10. they are ok and will improve quality a little bit. with tu-depths already increased it'll be a tiny little bit.

15. tskip only works on 4x4 intra TUs. it's not gonna magically make anything look better. the situation it was designed for is esoteric, and it doesn't even work very well.

18. me 4 is not star.

It's a little hard for me to find the information. As I said, I saw the artifacts, did some research, changed the values and it was OK (for me):


https://forum.doom9.org/showthread.p...87#post1892787


Quote:
"I don't go higher than max-merge 3. I've seen 4 and 5 produce worse results than 3 during fast motion in animated content. I don't know why."

Increasing max-merge gave something like edge ghosting (remember old LCDs?) on dark edges against a dark background, if I remember correctly.


https://forum.doom9.org/showthread.p...93#post1963393

Quote:
"CTU 64 will make a mess out of noisy flat backgrounds compared to CTU 32 (qg-size 32 in both cases since x265 doesn't use 64 by default for CTU 64)"

https://forum.doom9.org/showthread.p...28#post1963328


https://forum.doom9.org/showthread.p...52#post1963352



5 - Should I stay with limit-refs 3? (faster)

7 - 4 looked like a balanced default to me; maybe 2 is better for real-life content, but that's just speculation.

9 - Like: --tu-intra-depth 2 --tu-inter-depth 2 --limit-tu 0 instead of --tu-intra-depth 3 --tu-inter-depth 3 --limit-tu 4?

10 - So, for general content and animation, is --tu-intra-depth 2 --tu-inter-depth 2 a better investment, forgetting about rect and amp?

15 - I remember members commenting about the possible benefits of tskip for animation; maybe someone has some actual experience with it.

18 - Sorry: --me 3

thanks!
Old 31st January 2025, 18:56   #9714  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by OvejaNegra View Post
It's a little hard for me to find the information. As I said, I saw the artifacts, did some research, changed the values and it was OK (for me):


https://forum.doom9.org/showthread.p...87#post1892787





Increasing max-merge gave something like edge ghosting (remember old LCDs?) on dark edges against a dark background, if I remember correctly.


https://forum.doom9.org/showthread.p...93#post1963393




https://forum.doom9.org/showthread.p...28#post1963328


https://forum.doom9.org/showthread.p...52#post1963352



5 - Should I stay with limit-refs 3? (faster)

7 - 4 looked like a balanced default to me; maybe 2 is better for real-life content, but that's just speculation.

9 - Like: --tu-intra-depth 2 --tu-inter-depth 2 --limit-tu 0 instead of --tu-intra-depth 3 --tu-inter-depth 3 --limit-tu 4?

10 - So, for general content and animation, is --tu-intra-depth 2 --tu-inter-depth 2 a better investment, forgetting about rect and amp?

15 - I remember members commenting about the possible benefits of tskip for animation; maybe someone has some actual experience with it.

18 - Sorry: --me 3

thanks!
Personally speaking, I think they are experiencing a version of the "pigeonhole principle", where some parts look better and some parts look worse, and they are biased toward their initial hypothesis.

5. it's only 1% faster. but more performance is always good when the downside is negligible, right? I wouldn't say you should or should not, because it's about trade-offs; I think there's no definitive answer.

7. I don't know, I honestly can't tell the difference.

9. sometimes limit-tu will decrease quality more than "less depth" does, but we are talking about maybe 1% difference in bd-rate here, and I have biases like everyone.

10. if you think about it, rect and amp are like a "cheap version" of quadtree split, right? so when you allow more quadtree splits, they become much less effective at improving compression while having roughly the same computational cost. (less effective, but still effective, yes. it's just harder to justify the computational cost; you may get a better result spending those computational resources on some other parameter, for example HME)

15. 4x4 intra TUs are rare. the odds of them being "chosen to skip" are beyond rare. I haven't seen it improve anything in my tests, and I hypothesize it will not have any significant effect on regular content.
(Most video sources you can get are already "transformed", so what will "transform skip" save you from? for truly lossless cases, just go check how many 4x4 intra TUs there are and how many of them are transform skipped, with tools like YUView)

I mean, don't get me wrong, tskip works well in its designed use case: sharp, detailed, high-contrast and mostly static content, like screen recordings with mostly text on a simple / pure color background. I just don't think it's worth enabling for most regular content.

18. UMH and STAR are close in performance, both speed and quality. A cheap alternative is HEX with a large merange (like, hundreds; for HEX that won't slow things down much). If you want something better than STAR with merange 57, as used in many presets, go with HME.
(--hme --hme-search star,umh,star)
(star in the second level will make encoding very slow for some mysterious reason.)
(If you want something fast, use --hme-search hex,hex,hex which is still better than traditional ME)
(you can adjust each level's range as well, but the default is good enough)

(HME in combination with lookahead-slices will produce non-deterministic results. If you want encodes with the exact same settings to output identical bitstreams, use --lookahead-slices 0 with HME)
(there are other things that will make an encode non-deterministic; the most common example is VBV)
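
The HME advice above can be captured in a small helper that always pairs `--hme` with `--lookahead-slices 0` when reproducible output is wanted. A minimal sketch; the function name and its `deterministic` parameter are hypothetical, and only the x265 options named in this post are used:

```python
def hme_args(search=("star", "umh", "star"), deterministic=True):
    """Build x265 arguments for hierarchical motion estimation (HME).

    `search` lists the search method for each of the three HME levels.
    With deterministic=True, lookahead slices are disabled as suggested above.
    """
    if len(search) != 3:
        raise ValueError("HME expects one search method per level (3 levels)")
    args = ["--hme", "--hme-search", ",".join(search)]
    if deterministic:
        args += ["--lookahead-slices", "0"]
    return args


print(" ".join(hme_args()))                                           # slower variant
print(" ".join(hme_args(("hex", "hex", "hex"), deterministic=False)))  # fast variant
```

This only covers the HME and lookahead-slices interaction; other sources of non-determinism such as VBV are not handled.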

Last edited by Z2697; 1st February 2025 at 18:40.
Old 1st February 2025, 10:36   #9715  |  Link
OvejaNegra
ekTOMBE STUDIOS
 
 
Join Date: Dec 2005
Location: Cuba
Posts: 257
Thanks!

I'll do some tests and I'll see what I can improve.

Old 9th February 2025, 07:12   #9716  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
When using more than 1 slice, x265 creates intra blocks in prediction frames out of nowhere.
Let's do a simple test:
Code:
ffmpeg -lavfi color=gray:s=hd1080:d=10,noise=allf=u:alls=50 -x265-params slices=2 2sli.265
With the given command line, x265 gets to encode a completely static video. Static noise is applied to add variance to the image. It does not have to be noise, nor does it have to be completely static; you just need some "adequately" high-frequency information. A flat color image does not trigger the bug.
The resulting file is absurdly larger than the single-slice result.
Viewing the resulting file with YUView reveals that the encode has a lot of intra blocks near slice borders.


Setting frame-threads=1 seems to mitigate this issue.

The false intra mode in otherwise inter-predictable blocks will make the compression ratio a lot worse. If you are planning or forced to use more slices (e.g. UHD Blu-ray compatible encoding), make sure to use frame-threads=1; slices will provide parallelism similar to frame-threads.
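
To reproduce the comparison, the same test clip can be encoded three ways: single slice, two slices, and two slices with frame-threads=1. A minimal sketch that only builds and prints the ffmpeg commands, reusing the filter graph from the command above; the helper name and the output file names other than 2sli.265 are hypothetical:

```python
import shlex

# Filter graph copied from the test command above: static gray frame plus static noise.
NOISE_SOURCE = "color=gray:s=hd1080:d=10,noise=allf=u:alls=50"


def noise_encode_cmd(output_path, slices, frame_threads=None):
    """Build the ffmpeg command for one variant of the slice test."""
    params = [f"slices={slices}"]
    if frame_threads is not None:
        params.append(f"frame-threads={frame_threads}")
    # x265-params takes key=value pairs separated by colons
    return ["ffmpeg", "-lavfi", NOISE_SOURCE, "-x265-params", ":".join(params), output_path]


for cmd in (
    noise_encode_cmd("1sli.265", 1),
    noise_encode_cmd("2sli.265", 2),
    noise_encode_cmd("2sli_ft1.265", 2, frame_threads=1),
):
    print(shlex.join(cmd))
```

After running the three commands, comparing the output file sizes should show whether the two-slice encode with frame-threads=1 lands close to the single-slice result.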

Last edited by Z2697; 9th February 2025 at 09:01.