Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

Old 2nd January 2025, 20:46   #9681  |  Link
Sagittaire
Testeur de codecs
 
Join Date: May 2003
Location: France
Posts: 2,530
Quote:
Originally Posted by excellentswordfight View Post
These are two different methodologies; this is not a case of a "correct" and a "not correct" way of doing it. Single-instance encoding is still a thing, and is actually the most common case for most users, so benchmarking a single instance is still very much relevant.

Most software does not have perfect parallelization scaling, and in most cases where it does, i.e. 3D rendering, simulations etc., those loads usually gain more from being calculated on GPUs anyway. I think it makes perfect sense to test both cases here, because you can just run two encodes at the same time even if you don't want to start doing chunk encoding to get "more" out of your CPU. But it's not like we have a history of running multiple parallel benchmarks of a piece of software because we don't find the thread scaling good enough when the results don't show the full "potential" of the CPU, because this argument can be made for most of them (audio encoding, compression, compiling etc.).

Yes, but the time to encode WAV to MP3 with LAME is not really a problem.

Encoding a video source can take several hours, and multipart encoding or an ABR ladder to saturate the CPU are simply well-known techniques in the professional world.

For example, ABR ladder encoding is a full option included directly in the x265 codec.

Multi-session encoding is an option too, directly in HandBrake.

Why buy a $600 CPU to do the fastest possible encoding in AOM AV1 when you can do it 4 times faster with a $200 CPU using the right encoding technique?
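
A rough sketch of the multipart idea: split the source into contiguous frame ranges and prepare one x265 command per range, so that several instances can run side by side and keep all cores busy. The file names, the chunk count and the helper names `chunk_ranges` and `x265_commands` are hypothetical; `--seek` (first frame) and `--frames` (frame count) are existing x265 CLI options. The commands are only built here, not executed.

```python
from typing import List, Tuple


def chunk_ranges(total_frames: int, chunks: int) -> List[Tuple[int, int]]:
    """Split total_frames into contiguous (start_frame, frame_count) ranges."""
    if total_frames <= 0 or chunks <= 0:
        raise ValueError("total_frames and chunks must be positive")
    chunks = min(chunks, total_frames)        # never create empty chunks
    base, extra = divmod(total_frames, chunks)
    ranges = []
    start = 0
    for i in range(chunks):
        count = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, count))
        start += count
    return ranges


def x265_commands(source: str, total_frames: int, chunks: int) -> List[List[str]]:
    """Build one x265 command line per chunk (hypothetical output names)."""
    commands = []
    for i, (start, count) in enumerate(chunk_ranges(total_frames, chunks)):
        commands.append([
            "x265", "--input", source,
            "--seek", str(start), "--frames", str(count),
            "--output", f"chunk_{i:03d}.hevc",
        ])
    return commands


if __name__ == "__main__":
    for cmd in x265_commands("source.y4m", 1000, 4):
        print(" ".join(cmd))
```

The chunks still have to be joined afterwards, and in practice the cut points would probably be placed on scene changes rather than at equal intervals; neither step is shown here.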
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 2nd January 2025 at 20:52.
Old 3rd January 2025, 10:07   #9682  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by Sagittaire View Post
Yes, but the time to encode WAV to MP3 with LAME is not really a problem.

Encoding a video source can take several hours, and multipart encoding or an ABR ladder to saturate the CPU are simply well-known techniques in the professional world.

For example, ABR ladder encoding is a full option included directly in the x265 codec.

Multi-session encoding is an option too, directly in HandBrake.

Why buy a $600 CPU to do the fastest possible encoding in AOM AV1 when you can do it 4 times faster with a $200 CPU using the right encoding technique?
In fact you can buy a whole "lowest end" Mac mini M4 for the price of "just the CPU" with a 9950X (of course the performance is far off).
Since the M4 Pro only comes with severely overpriced memory, it's just not worth it if you are planning on only using the CPU to do "work" (but why).
Or maybe it's the other way around and the base model of the Mac mini M4 is underpriced? You know, like the razor-and-blades model?
I don't know, I don't own a Mac.
(Wait a minute, the M4 Pro has 2 variants? And they are very different, errrr.
It's a great CPU but Apple is just confusing.)

It's a great chip, but it doesn't come as just a chip, and I just don't want to buy it this way (and the "large scale" customers likely don't want to either).

It seems like even the M4 in the base model Mac mini has more transistors than the 9950X (although with integrated memory and GPU), and with the best process node at the time, I'm not surprised that it's performant and efficient, and that the best model (M4 Max 12P+4E) can come close to the 9950X with less power draw.
Physics works, how surprising.

I have to say this is very off topic now.

Last edited by Z2697; 3rd January 2025 at 10:25.
Old 3rd January 2025, 17:12   #9683  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Hell yeah let's just error out if input resolution exceeds 8192x4320
Old 3rd January 2025, 22:25   #9684  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 492
x265 v4.1+78-5223ea7
Built on January 03 2025, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit

https://bitbucket.org/multicoreware/.../branch/master

DL :
https://www.mediafire.com/file/86pd5zd03csrk67
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.
Old 12th January 2025, 17:50   #9685  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,871
I finally have a build where --frame-dup works and I'd like to play with it a bit, as I mostly encode anime.

What value of --dup-threshold should be OK? The default is 70, but that doesn't tell me much.

Is there a way to calculate the "difference" between two frames in a way similar to what x265 does?
__________________
@turment on Telegram
Old 12th January 2025, 20:11   #9686  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,180
Quote:
Originally Posted by tormento View Post
Is there a way to calculate the "difference" between two frames in a way similar to what x265 does?
Well, in Avisynth I'd say:

YDifferenceFromPrevious()
UDifferenceFromPrevious()
VDifferenceFromPrevious()

and the corresponding

YDifferenceToNext()
UDifferenceToNext()
VDifferenceToNext()
Old 12th January 2025, 20:14   #9687  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,871
Quote:
Originally Posted by FranceBB View Post
Well, in Avisynth I'd say:

YDifferenceFromPrevious()
UDifferenceFromPrevious()
VDifferenceFromPrevious()

and the corresponding

YDifferenceToNext()
UDifferenceToNext()
VDifferenceToNext()

As far as I’ve read, it uses some sort of PSNR.
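
For reference, a minimal sketch of a PSNR comparison between two frames, assuming (as suggested in this thread) that the duplicate decision compares a PSNR-like value against the threshold. Whether x265 computes it exactly this way, per plane or over the whole frame, is not confirmed here; the function names and the flat lists of 8-bit samples are purely illustrative.

```python
import math
from typing import Sequence


def psnr(frame_a: Sequence[int], frame_b: Sequence[int], max_value: int = 255) -> float:
    """PSNR in dB between two equally sized planes given as flat sample lists."""
    if len(frame_a) != len(frame_b) or len(frame_a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def looks_duplicate(prev: Sequence[int], cur: Sequence[int], threshold_db: float = 70.0) -> bool:
    """Assumed rule: treat cur as a duplicate when PSNR reaches the threshold."""
    return psnr(prev, cur) >= threshold_db


if __name__ == "__main__":
    a = [16, 16, 128, 235]
    b = [16, 17, 128, 235]
    print(psnr(a, b), looks_duplicate(a, b))
```

Under this assumption a higher threshold means frames must be closer to identical before they are treated as duplicates.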
__________________
@turment on Telegram
Old 13th January 2025, 18:40   #9688  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,388
Quote:
Originally Posted by tormento View Post
Finally I have a working build with --frame-dup working
You said that both my build and Patman's didn't work.
Have you made any changes to the code to make it work?
__________________
My github.
Old 13th January 2025, 18:54   #9689  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,871
Quote:
Originally Posted by jpsdr View Post
Have you made any changes to the code to make it work?
I am talking about Patman's latest build. I don't know what changes were made.
__________________
@turment on Telegram
Old 14th January 2025, 18:21   #9690  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,955
Quote:
Originally Posted by tormento View Post
I finally have a build where --frame-dup works and I'd like to play with it a bit, as I mostly encode anime.

What value of --dup-threshold should be OK? The default is 70, but that doesn't tell me much.

Is there a way to calculate the "difference" between two frames in a way similar to what x265 does?
How did you get a working --frame-dup? Do you know what the fix was?

Since the setting hadn't been working until now, we've not had much experience with it. Experimentally, I'd play around with different values and look at how frames get classified in a bitstream analyzer (or just compare the log files).

What you want to see is that frames that are duplicated in the source are mostly set as duplicated frames in the bitstream, and that frames that aren't duplicates in the source are distinct frames in the output.

Anime tends to have more duplicate frames than not, and the difference between dup and no-dup frames is pretty distinct, so it's really the best case for this feature, and you can probably use much more aggressive settings than for other classes of content.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 14th January 2025, 20:24   #9691  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,388
I took a look at Patman's latest commits and didn't notice anything specific to a frame-dup fix. If it's fixed, it seems to be a side effect of something else (unless I missed something).
__________________
My github.
Old 15th January 2025, 05:28   #9692  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by benwaggoner View Post
How did you get a working --frame-dup? Do you know what the fix was?

Since the setting hadn't been working until now, we've not had much experience with it. Experimentally, I'd play around with different values and look at how frames get classified in a bitstream analyzer (or just compare the log files).

What you want to see is that frames that are duplicated in the source are mostly set as duplicated frames in the bitstream, and that frames that aren't duplicates in the source are distinct frames in the output.

Anime tends to have more duplicate frames than not, and the difference between dup and no-dup frames is pretty distinct, so it's really the best case for this feature, and you can probably use much more aggressive settings than for other classes of content.
"frame-dup" has been working all along; it's just not working the way you'd expect from the name.
There are detailed explanations a few pages back in this thread.

I think tormento's claims about it "not working" or "now working" are untrustworthy.
What he actually means is that x265 somehow crashed on his computer when enabling frame-dup and is now not crashing, without any change to the code regarding the frame-dup feature. Not that the feature itself is broken.

No offense. I mean that an observation changing without any related change is not trustworthy. Not the guy (hopefully).

BTW, was there an x264 feature (or x264 mod) that does a similar thing? I think there was, but I don't trust my memory.

Last edited by Z2697; 15th January 2025 at 06:33.
Old 15th January 2025, 08:54   #9693  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,871
Scratch that, guys.

It turns out that --frame-dup is not working for me anymore. It's now crashing as the previous build did.

I am trying to reproduce the conditions under which it worked without errors, i.e. resolution and parameters.

Stay tuned, but I fear it will be a long road, as it was a random test and I need to recall the conditions under which it worked.
__________________
@turment on Telegram
Old 15th January 2025, 11:05   #9694  |  Link
cubicibo
Registered User
 
Join Date: Feb 2022
Posts: 166
Quote:
Originally Posted by Z2697 View Post
To have a better understanding of what this feature means: this feature removes frames based on PSNR thresholding, and signals picture timing SEIs to keep the correct... picture timing... yeah... which no commonly available decoder can recognize.
I don't question the looseness of the logic to detect dupe frames, but a decoder that does not implement Pic Timing SEI then relies solely on the decoded frame PTS, and the stream becomes VFR, with strictly identical output for the end user.

If a decoder does not support VFR, it should (shall?) support Pic Timing SEI. The decoded frame PTS timeline should appropriately reflect this. Gaps in the timeline should be filled by using the appropriate pic_struct entry to deliver CFR.
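
As a rough illustration of the timeline argument: if runs of near-identical frames are collapsed into a single coded frame, a repeat count per kept frame is enough to rebuild a CFR timeline. In the HEVC pic_struct table, value 0 is a plain frame, 7 is frame doubling and 8 is frame tripling. The function name and the `is_dup` input list are hypothetical, and this is a sketch of the idea rather than x265's actual logic.

```python
from typing import List, Tuple

# HEVC pic_struct values for progressive content:
# 0 = frame, 7 = frame doubling, 8 = frame tripling.
PIC_STRUCT_FOR_REPEAT = {1: 0, 2: 7, 3: 8}
MAX_REPEAT = 3  # one coded frame can stand in for at most three source frames


def collapse_duplicates(is_dup: List[bool]) -> List[Tuple[int, int, int]]:
    """Collapse duplicate runs.

    is_dup[i] is True when source frame i is judged identical to frame i-1.
    Returns one (pts, repeat, pic_struct) tuple per coded frame, where pts is
    the source frame index, i.e. the presentation time in frame ticks.
    """
    kept: List[Tuple[int, int]] = []  # (pts, repeat)
    for idx, dup in enumerate(is_dup):
        if dup and kept and kept[-1][1] < MAX_REPEAT:
            pts, repeat = kept[-1]
            kept[-1] = (pts, repeat + 1)   # extend the previous frame
        else:
            kept.append((idx, 1))          # code this frame normally
    return [(pts, rep, PIC_STRUCT_FOR_REPEAT[rep]) for pts, rep in kept]


if __name__ == "__main__":
    flags = [False, True, True, True, False, True]
    for pts, rep, ps in collapse_duplicates(flags):
        print(f"pts={pts} repeat={rep} pic_struct={ps}")
```

A decoder that ignores pic_struct would see the PTS values 0, 3, 4 in this example (a VFR timeline); one that honors it would repeat each frame to fill the gaps and output six frames at a constant rate.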

Quote:
Originally Posted by Z2697 View Post
Even if it works as it should, the PSNR thresholding is not ideal to begin with, and the bits saved by removing near-identical frames are, well, did you know picture timing SEIs cost bits?
It's not necessarily about bits, but about better utilization of B and P frames. Also, I am pretty sure 2K or 4K frame dupes are more costly than a few bytes.

EDIT: I dug into the x265 codebase, and I think it does not set the frame PTS appropriately.

Last edited by cubicibo; 15th January 2025 at 12:52.
Old 15th January 2025, 14:01   #9695  |  Link
higher
Registered User
 
Join Date: Apr 2017
Location: Hungary
Posts: 9
Quote:
Originally Posted by Sagittaire View Post
TechPowerUp's benchmark, like many others by non-codec specialists, is not able to test codecs correctly. If you seriously want to make a codec benchmark, you don't use a GUI like HandBrake, and you use a codec profile able to correctly saturate a 16C/32T CPU.

I created a codec benchmark to do that, and the 9950X at stock has 74% more performance than the 5950X for x265. I didn't test the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. In a correct H.265 benchmark (saturation of all CPU threads) the 9950X will deliver 110% more performance than the 5900X.

When TechPowerUp uses a benchmark that correctly saturates the CPU, like Cinebench, Blender or Stockfish, you evaluate the correct CPU power.
It's likely that Zen 5 is not much of an advance in x265 encoding compared to other workloads.

TPU uses x265 with preset slow at 4K resolution. It fully saturates my 5900X and I'm guessing it fully saturates a 9900X as well. Yet the 9900X is only 25% faster than the 5900X, while the 9950X is 27% faster than the 5950X in the TPU benchmark, and that sounds about right.

Power consumption is on a different level though. The 9900X consumes around 170W fully loaded while an M4 Pro consumes less than 50W.
Unfortunately x86 is years behind in this regard, and also in single-core performance.
Old 15th January 2025, 18:53   #9696  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by cubicibo View Post
I don't question the looseness of the logic to detect dupe frames, but a decoder that does not implement Pic Timing SEI then relies solely on the decoded frame PTS, and the stream becomes VFR, with strictly identical output for the end user.

If a decoder does not support VFR, it should (shall?) support Pic Timing SEI. The decoded frame PTS timeline should appropriately reflect this. Gaps in the timeline should be filled by using the appropriate pic_struct entry to deliver CFR.

It's not necessarily about bits, but about better utilization of B and P frames. Also, I am pretty sure 2K or 4K frame dupes are more costly than a few bytes.

EDIT: I dug into the x265 codebase, and I think it does not set the frame PTS appropriately.
Does the raw HEVC bitstream support VFR at all?
And as I already mentioned, x265 with the container output mod or FFmpeg's libx265 can get around that with the VFR "hack".

As for the bits, x265 now signals that timing SEI for every frame, no matter whether it's duped (removed) or not.
That results in, I think, around 2.5 kbps for 24 fps video. How many duped frames have to be removed to compensate for that (average it out), and how to decide the threshold... well, I just don't even bother.
But yeah, the savings will outweigh the signaling overhead very soon.

Worst case it's just a few kbps, not really a big deal at all.

But since he is not seeing incorrect timing, the probability of that worst case happening is high.
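
To put the overhead figure into per-frame terms, a small back-of-the-envelope calculation. The 2.5 kbps and 24 fps numbers come from the post above; the size of a coded duplicate frame (2000 bits) is a made-up value for illustration only, and both function names are hypothetical.

```python
def sei_bits_per_frame(overhead_bps: float, fps: float) -> float:
    """Average signaling cost per frame for a given overhead bitrate."""
    return overhead_bps / fps


def break_even_drops_per_second(overhead_bps: float, bits_saved_per_drop: float) -> float:
    """Frames that must be removed each second so the savings match the overhead."""
    if bits_saved_per_drop <= 0:
        raise ValueError("bits_saved_per_drop must be positive")
    return overhead_bps / bits_saved_per_drop


if __name__ == "__main__":
    per_frame = sei_bits_per_frame(2500, 24)
    print(f"{per_frame:.1f} bits (~{per_frame / 8:.0f} bytes) of SEI per frame")
    # Hypothetical: a near-identical frame would otherwise cost 2000 bits.
    print(break_even_drops_per_second(2500, 2000), "dropped frames per second to break even")
```

At roughly 13 bytes per frame, only a little over one removed frame per second would be needed to break even under the assumed frame size, which is consistent with the overhead being outweighed quickly on content that actually has duplicates.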

Last edited by Z2697; 15th January 2025 at 19:01.
Old 15th January 2025, 19:46   #9697  |  Link
cubicibo
Registered User
 
Join Date: Feb 2022
Posts: 166
VFR can be signaled either for the entire stream or at a CVS level. But specifying the actual picture output-presentation delay is overly complicated here.

Anyway, it does not matter for the current problem. pic_struct should be used with CFR. But the frame entry time in the decoder must be adapted with respect to the last pic_struct instruction. I can't find any such code in x265, so VBV conformance must be way off.
Old 15th January 2025, 20:26   #9698  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by cubicibo View Post
VFR can be signaled either for the entire stream or at a CVS level. But specifying the actual picture output-presentation delay is overly complicated here.

Anyway, it does not matter for the current problem. pic_struct should be used with CFR. But the frame entry time in the decoder must be adapted with respect to the last pic_struct instruction. I can't find any such code in x265, so VBV conformance must be way off.
The timing SEI we were talking about contains and utilizes pic_struct, unless you are referring to a different thing.
It shouldn't be necessary for "normal" CFR, only for "de-duped" CFR.

Last edited by Z2697; 15th January 2025 at 21:10. Reason: is pic_struct -> contains and utilizes pic_struct
Old 15th January 2025, 20:29   #9699  |  Link
tormento
Acid fr0g
 
Join Date: May 2002
Location: Italy
Posts: 2,871
Well, it turns out I've found something about --frame-dup crashing on my PC.

The very same video can be encoded when
  • 1920×1080
  • 1600×900
  • 1280×720
  • 960×540
but crashes miserably with
  • 1440×810
  • 1500×844
with error:

Video encoding returned exit code: -1073741819 (0xC0000005)

Any idea?
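
Two quick checks on the numbers above. The negative exit code is the signed 32-bit form of the Windows status 0xC0000005, which is STATUS_ACCESS_VIOLATION. Comparing the resolutions, the only arithmetic pattern is that every working width is a multiple of 64 (the default x265 CTU size) while the crashing widths are not. Whether that has anything to do with the crash is only a guess, not a diagnosis; the helper names are hypothetical.

```python
from typing import List, Tuple


def to_ntstatus(exit_code: int) -> str:
    """Render a signed 32-bit process exit code as an unsigned hex status."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


WORKING: List[Tuple[int, int]] = [(1920, 1080), (1600, 900), (1280, 720), (960, 540)]
CRASHING: List[Tuple[int, int]] = [(1440, 810), (1500, 844)]


def width_remainders(sizes: List[Tuple[int, int]], block: int = 64) -> List[int]:
    """Remainder of each width when divided by the block size."""
    return [w % block for w, _ in sizes]


if __name__ == "__main__":
    print(to_ntstatus(-1073741819))
    print("working :", width_remainders(WORKING))
    print("crashing:", width_remainders(CRASHING))
```

If the pattern is relevant, testing another width that is not a multiple of 64 (and one that is) would be a cheap way to confirm or rule it out.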
__________________
@turment on Telegram

Last edited by tormento; 15th January 2025 at 20:33.
Old 15th January 2025, 22:11   #9700  |  Link
cubicibo
Registered User
 
Join Date: Feb 2022
Posts: 166
Quote:
Originally Posted by Z2697 View Post
The timing SEI we were talking about contains and utilizes pic_struct, unless you are referring to a different thing.
It shouldn't be necessary for "normal" CFR, only for "de-duped" CFR.
We're talking about the same thing, and I am telling you that x265 does not seem to make use of that field in the rate control code. Frames following duplicated ones aren't shifted in time appropriately, so the buffer has less time to refill and hence fewer bits are allocated to those frames. More problematically, the computed HRD fields are probably wrong.
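
The refill argument can be illustrated with a toy leaky-bucket model. This is a deliberately simplified sketch, not x265's rate control nor the HRD model from the specification, and all numbers are made up: each coded frame drains its bits from the buffer, then the buffer refills at the channel bitrate for as long as that frame is displayed.

```python
from typing import List


def buffer_trace(frame_bits: List[int], repeats: List[int], bitrate: int,
                 fps: int, buf_size: int, honor_repeats: bool) -> List[int]:
    """Buffer fullness after each coded frame in a toy VBV-like model.

    repeats[i] is how many display intervals frame i covers (1, 2 or 3).
    With honor_repeats=False every frame is assumed to last one interval,
    which mimics a model that ignores frame doubling/tripling.
    """
    fullness = buf_size
    trace = []
    for bits, rep in zip(frame_bits, repeats):
        fullness -= bits                          # frame leaves the buffer
        ticks = rep if honor_repeats else 1
        refill = bitrate * ticks // fps           # bits arriving meanwhile
        fullness = min(buf_size, fullness + refill)
        trace.append(fullness)
    return trace


if __name__ == "__main__":
    bits, reps = [300, 300], [3, 1]   # first frame is shown for 3 intervals
    print(buffer_trace(bits, reps, 2400, 24, 500, honor_repeats=True))
    print(buffer_trace(bits, reps, 2400, 24, 500, honor_repeats=False))
```

In this example the model that ignores the repeat count believes the buffer is much emptier than it really is after the tripled frame, so it would budget fewer bits for the frames that follow.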

Since there's a copy-paste error in the CLI documentation for --pic-struct (it claims to be needed for HLG), I will assume they never tested the feature or verified that it was working correctly.