Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

 

Go Back   Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

Old 16th December 2024, 20:00   #9661  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,635
About mbtree and x264.
Disabling mbtree is recommended when you target Blu-ray, according to mp3dom's results and tests. But I think that is the only case (or, of course, cases using very similar encode parameters). The issue seems to be the small (1 s) keyint value, which does not work well with mbtree. With the standard keyint value of 250, mbtree does a proper job (still according to mp3dom).
__________________
My github.

Last edited by jpsdr; 16th December 2024 at 23:51.
jpsdr is offline   Reply With Quote
Old 16th December 2024, 22:11   #9662  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 615
Quote:
Originally Posted by jpsdr View Post
About mbtree and x264.
Disabling mbtree is recommended when you target Blu-ray, according to mp3dom's results and tests. But I think that is the only case (or, of course, cases using very similar encode parameters). The issue seems to be the small (1 s) keyint value, which does not work well with mbtree. With the standard keyint value of 250, mbtree does a proper job (still according to mp3dom).
And DS's paper: https://archive.org/download/x264_mb...264_mbtree.pdf
GeoffreyA is offline   Reply With Quote
Old 25th December 2024, 05:57   #9663  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 864
Hmm, something doesn't feel right.
Code:
source/encoder/slicetype.cpp

void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
...

    double totalDuration = 0.0;
    for (int j = 0; j <= numframes; j++)
        totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;

    double averageDuration = totalDuration / (numframes + 1);
I'm "investigating" (f* around and find out) the inconsistency between bitstreams encoded by binaries built with different compiler target flags, which is reportedly associated with cutree.
Me, being an incompetent wannabe programmer, also kind of narrowed things down (or rather "confirmed" it) to cutree after, what, about four days? And I am now moderately sure the quoted code (in combination with some compiler/ISA optimization) causes the inconsistency.
(Only averageDuration is used in the code that follows.)
As x265 itself has no variable-frame-rate awareness, this block of code is unnecessary, I think: in theory it all just reduces back to "(double)m_param->fpsDenom / m_param->fpsNum", unless the weirdness of FP math kicks in, you know, the 0.1 + 0.2 != 0.3 and non-associativity stuff.

Code:
double averageDuration = (double) m_param->fpsDenom / m_param->fpsNum;
After replacing the loop with just this line, things seem to be consistent. (But not consistent with the "pre-this-modification" version.)
Alternatively, you can put "#pragma GCC novec" (or another compiler's equivalent) right before the for loop. (This retains consistency with the "pre-this-modification" "nocona" (GCC's default -march) / "SSE3" version.) But that is compiler-specific.

I personally think the former "solution" is more elegant.

The inconsistency between bitstreams encoded by binaries produced by different compilers (e.g. GCC vs Clang) still exists after this modification.

Some suspicious executables (10bit only)
https://files.catbox.moe/nhla9t.7z

Am I missing something? Please help me. plzzzzzzzz

Last edited by Z2697; 25th December 2024 at 10:33.
Z2697 is offline   Reply With Quote
Old 25th December 2024, 07:26   #9664  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 523
One way to proceed would be to make a small repro case test application and then objdump -S that, checking what the differences in the generated code are.
__________________
My github...
rwill is offline   Reply With Quote
Old 25th December 2024, 07:48   #9665  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 615
Quote:
Originally Posted by Z2697 View Post
Hmm, something doesn't feel right.
Code:
source/encoder/slicetype.cpp

void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
...

    double totalDuration = 0.0;
    for (int j = 0; j <= numframes; j++)
        totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;

    double averageDuration = totalDuration / (numframes + 1);
I'm "investigating" (f* around and find out) the inconsistency between bitstreams encoded by binaries built with different compiler target flags, which is reportedly associated with cutree.
Me, being an incompetent wannabe programmer, also kind of narrowed things down (or rather "confirmed" it) to cutree after, what, about four days? And I am now moderately sure the quoted code (in combination with some compiler/ISA optimization) causes the inconsistency.
(Only averageDuration is used in the code that follows.)
As x265 itself has no variable-frame-rate awareness, this block of code is unnecessary, I think: in theory it all just reduces back to "(double)m_param->fpsDenom / m_param->fpsNum", unless the weirdness of FP math kicks in, you know, the 0.1 + 0.2 != 0.3 and non-associativity stuff.

Code:
double averageDuration = (double) m_param->fpsDenom / m_param->fpsNum;
After replacing the loop with just this line, things seem to be consistent. (But not consistent with the "pre-this-modification" version.)
The inconsistency between bitstreams encoded by binaries produced by different compilers (e.g. GCC vs Clang) still exists after this modification.

Am I missing something? Please help me. plzzzzzzzz
You're right: all that loop does is multiply (fpsDenom / fpsNum) by (numframes + 1) via repeated addition, and those values do not change within the loop. I wonder if there is some reason for such a superfluous piece of code. Perhaps some compiler issue back in the day, or the person was half asleep?
GeoffreyA is offline   Reply With Quote
Old 25th December 2024, 08:35   #9666  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 864
Quote:
Originally Posted by GeoffreyA View Post
You're right: all that loop does is multiply (fpsDenom / fpsNum) by (numframes + 1) via repeated addition, and those values do not change within the loop. I wonder if there is some reason for such a superfluous piece of code. Perhaps some compiler issue back in the day, or the person was half asleep?
It looks like "they were planning on VFR support" to me, but eventually that didn't happen.
Z2697 is offline   Reply With Quote
Old 25th December 2024, 10:20   #9667  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 615
Quote:
Originally Posted by Z2697 View Post
It looks like "they were planning on VFR support" to me, but eventually that didn't happen.
Looks like it.
GeoffreyA is offline   Reply With Quote
Old 29th December 2024, 23:27   #9668  |  Link
higher
Registered User
 
Join Date: Apr 2017
Location: Hungary
Posts: 9
Quote:
Originally Posted by benwaggoner View Post
Yeah, past a certain resolution and frame threads, more cores will start winning over better cores. M4 Max likely wins for 1080p or lower, and possibly 4K if using only 1 frame thread.

Unfortunately the M4 Max is only in MacBook Pro, which is a whole lot more expensive and bigger for headless work. Mac Mini tops out with the M4 Pro currently.
The Mac version of Handbrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with M4 Pro (on battery).

The 1-minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of Handbrake with identical settings (preset slow) on both platforms.

M4 Pro: 4m 20s
5900X: 5m 25s

Quite impressive.
higher is offline   Reply With Quote
Old 29th December 2024, 23:48   #9669  |  Link
Sagittaire
Testeur de codecs
 
Sagittaire's Avatar
 
Join Date: May 2003
Location: France
Posts: 2,546
Quote:
Originally Posted by higher View Post
The Mac version of Handbrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with M4 Pro (on battery).

The 1-minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of Handbrake with identical settings (preset slow) on both platforms.

M4 Pro: 4m 20s
5900X: 5m 25s

Quite impressive.
Well, not really, simply because the 9950X is at least 2x more powerful than the 5900X. A 9950X at stock will certainly produce something like ~2m 30s to encode this source with x265.
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 30th December 2024 at 06:04.
Sagittaire is offline   Reply With Quote
Old 30th December 2024, 08:56   #9670  |  Link
Ritsuka
Registered User
 
Join Date: Mar 2007
Posts: 113
Of course, but the 9950X has a 170 W TDP, and the M4 Pro is, what, 40 W at max, with six fewer performance cores than the 9950X.

And it seems all the latest arm64 optimizations are again stuck on the x265-devel mailing list.
Ritsuka is offline   Reply With Quote
Old 30th December 2024, 17:03   #9671  |  Link
higher
Registered User
 
Join Date: Apr 2017
Location: Hungary
Posts: 9
Quote:
Originally Posted by Sagittaire View Post
Well, not really, simply because the 9950X is at least 2x more powerful than the 5900X. A 9950X at stock will certainly produce something like ~2m 30s to encode this source with x265.
The 9950X is only 50% faster than the 5900X at 4K resolution. I guess an M4 Max could almost match a 9950X while consuming a lot less power.

Screenshot 2024-12-30 165716.png
higher is offline   Reply With Quote
Old 30th December 2024, 23:12   #9672  |  Link
Sagittaire
Testeur de codecs
 
Sagittaire's Avatar
 
Join Date: May 2003
Location: France
Posts: 2,546
Quote:
Originally Posted by higher View Post
The 9950X is only 50% faster than the 5900X at 4K resolution. I guess an M4 Max could almost match a 9950X while consuming a lot less power.

Attachment 18790
Techpowerup, like many other non-specialist outlets, is not able to test codecs correctly: if you want to benchmark a codec seriously, you don't use a GUI like Handbrake, and you use a codec profile able to properly saturate a 16C/32T CPU.

I created a codec benchmark for exactly that, and the 9950X at stock has 74% more performance than the 5950X for x265. I didn't test the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. In a correct H.265 benchmark (saturating all CPU threads), the 9950X will produce 110% more performance than the 5900X.

When Techpowerup uses proper CPU-saturation benchmarks like Cinebench, Blender, or Stockfish, you see the correct CPU power:

Stockfish:
5900X: 14.52 Mips
9950X: 30.78 Mips (+111%)

Blender:
5900X: 114.9 s
9950X: 56 s (+105%)

V-Ray:
5900X: 21538
9950X: 48899 (+127%)

__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 30th December 2024 at 23:36.
Sagittaire is offline   Reply With Quote
Old 31st December 2024, 20:49   #9673  |  Link
ShortKatz
Registered User
 
Join Date: Aug 2018
Location: Germany
Posts: 153
Quote:
Originally Posted by higher View Post
The Mac version of Handbrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with M4 Pro (on battery).

The 1-minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of Handbrake with identical settings (preset slow) on both platforms.

M4 Pro: 4m 20s
5900X: 5m 25s

Quite impressive.
I would be quite interested how much faster my M4 Max would be compared to the M4 Pro.
ShortKatz is offline   Reply With Quote
Old 31st December 2024, 23:26   #9674  |  Link
Barough
Registered User
 
Barough's Avatar
 
Join Date: Feb 2007
Location: Sweden
Posts: 505
x265 v4.1+62-441e1e4
Built on December 31 2024, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit

https://bitbucket.org/multicoreware/.../branch/master

DL :
https://www.mediafire.com/file/6nh9e7dfb72b3pi
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.

Last edited by Barough; 31st December 2024 at 23:29.
Barough is offline   Reply With Quote
Old 2nd January 2025, 14:07   #9675  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 394
Quote:
Originally Posted by Sagittaire View Post
Techpowerup, like many other non-specialist outlets, is not able to test codecs correctly: if you want to benchmark a codec seriously, you don't use a GUI like Handbrake, and you use a codec profile able to properly saturate a 16C/32T CPU.

I created a codec benchmark for exactly that, and the 9950X at stock has 74% more performance than the 5950X for x265. I didn't test the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. In a correct H.265 benchmark (saturating all CPU threads), the 9950X will produce 110% more performance than the 5900X.

When Techpowerup uses proper CPU-saturation benchmarks like Cinebench, Blender, or Stockfish, you see the correct CPU power:

Stockfish:
5900X: 14.52 Mips
9950X: 30.78 Mips (+111%)

Blender:
5900X: 114.9 s
9950X: 56 s (+105%)

V-Ray:
5900X: 21538
9950X: 48899 (+127%)
These are two different methodologies; this is not a case of a "correct" and an "incorrect" way of doing it. Single-instance encoding is still a thing, and actually the most common case for most users, so benchmarking a single instance is still very much relevant.

Most software does not have perfect parallelization scaling, and in most of the cases where it does (3D rendering, simulations, etc.), those loads usually gain more from being calculated on GPUs anyway. That said, I think it makes perfect sense to test both cases here, because you can simply run two encodes at the same time even if you don't want to start doing chunked encoding to get "more" out of your CPU. But we don't have a history of running multiple parallel instances of a benchmark just because we find its thread scaling poor and the results don't show the full "potential" of the CPU; that argument could be made for most workloads (audio encoding, compression, compiling, etc.).

Last edited by excellentswordfight; 2nd January 2025 at 14:43.
excellentswordfight is offline   Reply With Quote
Old 2nd January 2025, 20:46   #9676  |  Link
Sagittaire
Testeur de codecs
 
Sagittaire's Avatar
 
Join Date: May 2003
Location: France
Posts: 2,546
Quote:
Originally Posted by excellentswordfight View Post
These are two different methodologies; this is not a case of a "correct" and an "incorrect" way of doing it. Single-instance encoding is still a thing, and actually the most common case for most users, so benchmarking a single instance is still very much relevant.

Most software does not have perfect parallelization scaling, and in most of the cases where it does (3D rendering, simulations, etc.), those loads usually gain more from being calculated on GPUs anyway. That said, I think it makes perfect sense to test both cases here, because you can simply run two encodes at the same time even if you don't want to start doing chunked encoding to get "more" out of your CPU. But we don't have a history of running multiple parallel instances of a benchmark just because we find its thread scaling poor and the results don't show the full "potential" of the CPU; that argument could be made for most workloads (audio encoding, compression, compiling, etc.).

Yes, but the time to encode a WAV to MP3 with LAME is not really a problem.

Encoding a video source can take several hours. And multi-part encoding or an ABR ladder to saturate the CPU are well-known techniques in the professional world.

For example, the ABR ladder is a full option included directly in the x265 encoder.

Multi-session encoding is an option too, directly in Handbrake.

Why buy a $600 CPU to do the fastest possible encoding in AOM AV1, when you can do it 4 times faster with a $200 CPU using the right encoding technique?
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 2nd January 2025 at 20:52.
Sagittaire is offline   Reply With Quote
Old 3rd January 2025, 10:07   #9677  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 864
Quote:
Originally Posted by Sagittaire View Post
Yes, but the time to encode a WAV to MP3 with LAME is not really a problem.

Encoding a video source can take several hours. And multi-part encoding or an ABR ladder to saturate the CPU are well-known techniques in the professional world.

For example, the ABR ladder is a full option included directly in the x265 encoder.

Multi-session encoding is an option too, directly in Handbrake.

Why buy a $600 CPU to do the fastest possible encoding in AOM AV1, when you can do it 4 times faster with a $200 CPU using the right encoding technique?
In fact, you can buy a whole lowest-end M4 Mac mini for the money of a bare 9950X CPU. (Of course, the performance is far off.)
Since the M4 Pro only comes with severely overpriced memory, if you are planning to use only the CPU to do "work" (but why?), it's just not worth it.
Or maybe it's the other way around, and the base-model M4 Mac mini is underpriced? You know, like the razor-and-blades model?
I don't know; I don't own a Mac.
(Wait a minute, the M4 Pro has two variants? And they are very different... it's a great CPU, but Apple is just confusing.)

It's a great chip, but it doesn't come as just a chip, and I don't want to buy it this way. (And the "large-scale" customers likely don't want to either.)

Even the M4 in the base-model Mac mini seems to have more transistors than the 9950X (although with integrated memory and GPU), and it is on the best process node of the time, so I'm not surprised it is performant and efficient, and that the best model (M4 Max, 12P+4E) can come close to the 9950X with less power draw.
Physics works, how surprising.

I have to say this is very off-topic now.

Last edited by Z2697; 3rd January 2025 at 10:25.
Z2697 is offline   Reply With Quote
Old 3rd January 2025, 17:12   #9678  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 864
Hell yeah, let's just error out if the input resolution exceeds 8192x4320.
Z2697 is offline   Reply With Quote
Old 3rd January 2025, 22:25   #9679  |  Link
Barough
Registered User
 
Barough's Avatar
 
Join Date: Feb 2007
Location: Sweden
Posts: 505
x265 v4.1+78-5223ea7
Built on January 03 2025, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit

https://bitbucket.org/multicoreware/.../branch/master

DL :
https://www.mediafire.com/file/86pd5zd03csrk67
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.
Barough is offline   Reply With Quote
Old 12th January 2025, 17:50   #9680  |  Link
tormento
Acid fr0g
 
tormento's Avatar
 
Join Date: May 2002
Location: Italy
Posts: 3,065
Finally I have a working build with --frame-dup, and I'd like to play with it a bit, as I mostly encode anime.

What value of --dup-threshold should be OK? The default is 70, but that doesn't tell me much.

Is there a way to calculate the "difference" between two frames, similar to what x265 does?
__________________
@turment on Telegram
tormento is offline   Reply With Quote


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2026, vBulletin Solutions Inc.