Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

16th December 2024, 14:23   #9661
LunaRabbit
Lunarian
Join Date: Dec 2024
Posts: 15
Quote:
Originally Posted by Boulder
If you are on Windows, jpsdr does offer pre-compiled binaries on GitHub.
Yes, I'm aware. I just prefer to build from source, even for the Windows machine I still have kicking around. I'm used to it since I normally live in Emacs.

Whenever I get it working I'll update the .diff and maybe build some bins for anyone who wants to try it. Just trying to leave things nicer for the next person. I do appreciate having a diff to work from; it's really helpful.

Quote:
Originally Posted by Z2697
Here are 2 test results of "cutree-strength" I just ran.
I'm leaving them as links because they are long images.
Please ignore the speed; I ran them in a VM.

https://files.catbox.moe/4qugbj.png
https://files.catbox.moe/1tqlou.png

The curves for "corresponding" qcomp and "cutree-strength" values are almost completely aligned with each other. That may not be easy to see in the CRF results, since the final bitrate differs quite a lot, so I did a 2-pass test.

(qcomp 0.65 corresponds to cutree-strength 1.75, and qcomp 0.7 corresponds to cutree-strength 1.5)

Also keep in mind this is not a valid quality comparison across different "qcomp and cutree-strength pairs"; the metrics usually "perform" poorly when it comes to "bit redistribution". (My layman's understanding: AQ redistributes bits within a frame, cutree/mbtree redistributes bits across frames.)
Interesting, thanks for testing it. Whenever I get it working I'll run some tests and see how it compares to my older settings with qcomp 0.7 and 0.8. Metrics are nice, but all I really care about is whether there is any gain I can see in the material. I have a source I'm very familiar with that should serve as a good test of whether cutree-strength makes any difference, and I need to redo the whole thing anyway due to some recent improvements in the filters it requires.

Last edited by LunaRabbit; 16th December 2024 at 14:24. Reason: quote full post with links to images since pagination kicked in.
16th December 2024, 14:25   #9662
Z2697
Registered User
Join Date: Aug 2024
Posts: 370
frame-rc enables the rate control mode (CRF, ABR or CQP) to be reconfigured per frame, I think.

It enables the control, but how do you actually use it... zones? qpfile? API calls? IDK.
16th December 2024, 14:27   #9663
Z2697
Registered User
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by LunaRabbit
Yes, I'm aware. I just prefer to build from source, even for the Windows machine I still have kicking around. I'm used to it since I normally live in Emacs.

Whenever I get it working I'll update the .diff and maybe build some bins for anyone who wants to try it. Just trying to leave things nicer for the next person. I do appreciate having a diff to work from; it's really helpful.

Interesting, thanks for testing it. Whenever I get it working I'll run some tests and see how it compares to my older settings with qcomp 0.7 and 0.8. Metrics are nice, but all I really care about is whether there is any gain I can see in the material. I have a source I'm very familiar with that should serve as a good test of whether cutree-strength makes any difference, and I need to redo the whole thing anyway due to some recent improvements in the filters it requires.
But why not build it directly from the mod branch? Why "extract" the patch, apply it to the master branch and then build? Seems unnecessary.

As for the metric, I think it works OK for evaluating the similarity of the "corresponding" qcomp and cutree-strength pairs; it's just not valid when comparing different pairs.
I used my eyes as well, of course. The summary report is just easier to post.

Last edited by Z2697; 16th December 2024 at 14:32.
16th December 2024, 14:56   #9664
LunaRabbit
Lunarian
Join Date: Dec 2024
Posts: 15
Quote:
Originally Posted by Z2697
frame-rc enables the rate control mode (CRF, ABR or CQP) to be reconfigured per frame, I think.

It enables the control, but how do you actually use it... zones? qpfile? API calls? IDK.
That's what I'm wondering as well: whether it can be enabled from the qpfile and just work. The docs do not mention what kind of syntax is expected.

Quote:
Originally Posted by Z2697
But why not build it directly from the mod branch? Why "extract" the patch, apply it to the master branch and then build? Seems unnecessary.
As far as I'm aware, there are no mod branches that have the cu-tree modifications working on x265 v4.1. If you know of one, please share.

I do appreciate the testing and the helpful chart. Didn't mean to imply otherwise.
16th December 2024, 18:10   #9665
Z2697
Registered User
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by LunaRabbit
That's what I'm wondering as well: whether it can be enabled from the qpfile and just work. The docs do not mention what kind of syntax is expected.

As far as I'm aware, there are no mod branches that have the cu-tree modifications working on x265 v4.1. If you know of one, please share.

I do appreciate the testing and the helpful chart. Didn't mean to imply otherwise.
I think it would be easier to cherry-pick the relevant commits of the cutree-strength mod into jpsdr's x265_mod branch, since the cutree-strength mod involves far less code.
16th December 2024, 20:00   #9666
jpsdr
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,388
About mbtree and x264:
Disabling mbtree is recommended when you target Blu-ray, according to mp3dom's results and tests. But I think that's the only case (or, of course, cases using very, very similar encode parameters). The issue seems to be the small (1 s) keyint value, which doesn't work well with mbtree. With the standard [250] keyint value, mbtree does a proper job (still according to mp3dom).
__________________
My github.

Last edited by jpsdr; 16th December 2024 at 23:51.
16th December 2024, 22:11   #9667
GeoffreyA
Registered User
Join Date: Jun 2024
Location: South Africa
Posts: 260
Quote:
Originally Posted by jpsdr
About mbtree and x264:
Disabling mbtree is recommended when you target Blu-ray, according to mp3dom's results and tests. But I think that's the only case (or, of course, cases using very, very similar encode parameters). The issue seems to be the small (1 s) keyint value, which doesn't work well with mbtree. With the standard [250] keyint value, mbtree does a proper job (still according to mp3dom).
And DS's paper: https://archive.org/download/x264_mb...264_mbtree.pdf
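For a rough idea of what the paper's per-block adjustment looks like, here is a simplified sketch modeled on x264's mb-tree finish step as I understand it: a block's QP offset drops by strength * log2((intra + propagate) / intra), on top of whatever AQ already chose. It leaves out the weighted-prediction term and the fixed-point log2 the encoders use, and the function and parameter names are illustrative only, not x264/x265 code.

```cpp
#include <cmath>

// Simplified mb-tree / cutree QP offset for one block (sketch).
//   intraCost     : estimated cost of coding the block as intra (must be > 0)
//   propagateCost : information that later frames inherit from this block
//   strength      : cutree strength (e.g. derived from qcomp)
//   aqOffset      : QP offset already chosen by adaptive quantization
double cutreeQpOffset(double intraCost, double propagateCost,
                      double strength, double aqOffset)
{
    // How much "bigger" the block becomes once its influence on
    // future frames is counted, on a log2 scale.
    double log2Ratio = std::log2(intraCost + propagateCost) - std::log2(intraCost);

    // More propagated information -> larger log2Ratio -> lower QP.
    return aqOffset - strength * log2Ratio;
}
```

A block that nothing references (propagateCost = 0) keeps its AQ offset, while one whose propagate cost is three times its intra cost gets 2 * strength subtracted, e.g. -3 QP at strength 1.5.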
25th December 2024, 05:57   #9668
Z2697
Registered User
Join Date: Aug 2024
Posts: 370
Hmm, something doesn't feel right.
Code:
source/encoder/slicetype.cpp

void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
...

    double totalDuration = 0.0;
    for (int j = 0; j <= numframes; j++)
        totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;

    double averageDuration = totalDuration / (numframes + 1);
I'm "investigating" (f* around and find out) the "inconsistency between bitstreams encoded by binaries built with different compiler target flags", which is reportedly associated with cutree.
Me, being an incompetent wannabe programmer, also kind of narrowed things down to (or, rather, "confirmed") cutree after, what, like 4 days? And I'm now moderately sure the quoted code (in combination with some compiler / ISA optimization) causes the inconsistency.
(Only averageDuration is used in the code that follows.)
As x265 itself has no variable-frame-rate awareness, this block of code is unnecessary, I think. All of it just brings us back to "(double)m_param->fpsDenom / m_param->fpsNum", theoretically, if it weren't for the weirdness of FP math, you know, the .1+.2 != .3 and non-associativity stuff.

Code:
double averageDuration = (double) m_param->fpsDenom / m_param->fpsNum;
After replacing the loop with just this line, things seemed to be consistent (but not consistent with the pre-modification version).
Alternatively, you can put "#pragma GCC novector" or another compiler's equivalent right before the for loop. (This retains consistency with the pre-modification "nocona" (GCC's default -march) / "SSE3" version.) But this is compiler-specific.

I personally think the former "solution" is more elegant.

The "inconsistency between bitstreams encoded by binaries built with different compilers" still exists after this modification (e.g. GCC vs Clang).

Some suspicious executables (10-bit only):
https://files.catbox.moe/nhla9t.7z

Am I missing something? Please help me. plzzzzzzzz

Last edited by Z2697; 25th December 2024 at 10:33.
25th December 2024, 07:26   #9669
rwill
Registered User
Join Date: Dec 2013
Location: Berlin, Germany
Posts: 429
One way to proceed would be to make a small repro test application, then run objdump -S on it and check what differs in the generated code.
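A minimal sketch of such a repro, assuming the averaging loop from the cuTree snippet is the suspect (names are hypothetical): feed the inputs at run time so the compiler cannot constant-fold the result, and print the exact bits so outputs from two builds (e.g. different -march values) can be diffed before digging into the objdump -S output.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Raw IEEE-754 bit pattern of a double, for exact cross-build comparison.
std::uint64_t bitsOf(double x)
{
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Standalone copy of the accumulation pattern under investigation.
double averageDuration(int fpsNum, int fpsDenom, int numframes)
{
    double total = 0.0;
    for (int j = 0; j <= numframes; j++)
        total += (double)fpsDenom / fpsNum;
    return total / (numframes + 1);
}

// Print the result as a hex float plus its raw bits; call this from main()
// with values parsed from argv so they are unknown at compile time.
void report(int fpsNum, int fpsDenom, int numframes)
{
    double avg = averageDuration(fpsNum, fpsDenom, numframes);
    std::printf("%a 0x%016llx\n", avg, (unsigned long long)bitsOf(avg));
}
```

Build the same file with, say, -march=nocona and -march=native, run both with identical arguments, and compare the printed lines; if they differ, the disassembly of averageDuration (or report, if it got inlined) in the objdump -S listings should show where the generated code diverges.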
__________________
My github...
25th December 2024, 07:48   #9670
GeoffreyA
Registered User
Join Date: Jun 2024
Location: South Africa
Posts: 260
Quote:
Originally Posted by Z2697
Hmm, something doesn't feel right.
Code:
source/encoder/slicetype.cpp

void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
...

    double totalDuration = 0.0;
    for (int j = 0; j <= numframes; j++)
        totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;

    double averageDuration = totalDuration / (numframes + 1);
I'm "investigating" (f* around and find out) the "inconsistency between bitstreams encoded by binaries built with different compiler target flags", which is reportedly associated with cutree.
Me, being an incompetent wannabe programmer, also kind of narrowed things down to (or, rather, "confirmed") cutree after, what, like 4 days? And I'm now moderately sure the quoted code (in combination with some compiler / ISA optimization) causes the inconsistency.
(Only averageDuration is used in the code that follows.)
As x265 itself has no variable-frame-rate awareness, this block of code is unnecessary, I think. All of it just brings us back to "(double)m_param->fpsDenom / m_param->fpsNum", theoretically, if it weren't for the weirdness of FP math, you know, the .1+.2 != .3 and non-associativity stuff.

Code:
double averageDuration = (double) m_param->fpsDenom / m_param->fpsNum;
After replacing the loop with just this line, things seemed to be consistent (but not consistent with the pre-modification version).
The "inconsistency between bitstreams encoded by binaries built with different compilers" still exists after this modification (e.g. GCC vs Clang).

Am I missing something? Please help me. plzzzzzzzz
You're right, because all that loop does is multiply (Denom / Numer) by (numframes + 1) using repeated addition, and those values do not change within the loop. I wonder if there is some reason for such a superfluous piece of code. Perhaps some compiler issue back in the day, or the person was half asleep?
25th December 2024, 08:35   #9671
Z2697
Registered User
Join Date: Aug 2024
Posts: 370
Quote:
Originally Posted by GeoffreyA
You're right, because all that loop does is multiply (Denom / Numer) by (numframes + 1) using repeated addition, and those values do not change within the loop. I wonder if there is some reason for such a superfluous piece of code. Perhaps some compiler issue back in the day, or the person was half asleep?
It looks like "they were planning on VFR support" to me, but eventually that didn't happen.
25th December 2024, 10:20   #9672
GeoffreyA
Registered User
Join Date: Jun 2024
Location: South Africa
Posts: 260
Quote:
Originally Posted by Z2697
It looks like "they were planning on VFR support" to me, but eventually that didn't happen.
Looks like it.
29th December 2024, 23:27   #9673
higher
Registered User
Join Date: Apr 2017
Location: Hungary
Posts: 9
Quote:
Originally Posted by benwaggoner
Yeah, past a certain resolution and number of frame threads, more cores will start winning over better cores. The M4 Max likely wins for 1080p or lower, and possibly 4K if using only 1 frame thread.

Unfortunately the M4 Max is only in the MacBook Pro, which is a whole lot more expensive and bigger for headless work. The Mac Mini currently tops out at the M4 Pro.
The Mac version of HandBrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with the M4 Pro (on battery).

The 1-minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of HandBrake with identical settings (preset slow) on both platforms.

M4 Pro: 4m 20s
5900X: 5m 25s

Quite impressive.
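The arithmetic behind those two timings, as a trivial sketch (helper names are made up): same clip, same settings, so the speedup is just the ratio of the elapsed times.

```cpp
// Convert an "Xm Ys" wall-clock time to seconds.
int toSeconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}

// How many times faster run B is than run A on the same job.
double speedup(int secondsA, int secondsB)
{
    return static_cast<double>(secondsA) / secondsB;
}
```

toSeconds(5, 25) is 325 s and toSeconds(4, 20) is 260 s, so speedup(325, 260) is 1.25: the M4 Pro finished this encode 25% faster, i.e. in 20% less time than the 5900X.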
29th December 2024, 23:48   #9674
Sagittaire
Testeur de codecs
Join Date: May 2003
Location: France
Posts: 2,530
Quote:
Originally Posted by higher
The Mac version of HandBrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with the M4 Pro (on battery).

The 1-minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of HandBrake with identical settings (preset slow) on both platforms.

M4 Pro: 4m 20s
5900X: 5m 25s

Quite impressive.
Well, not really, simply because the 9950X is at least 2x as powerful as the 5900X. A 9950X at stock will certainly take something like ~2 m 30 s to encode this source with x265.
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 30th December 2024 at 06:04.
30th December 2024, 08:56   #9675
Ritsuka
Registered User
Join Date: Mar 2007
Posts: 103
Of course, but the 9950X has a 170 W TDP, and the M4 Pro is what, 40 W at most, with 6 fewer performance cores than the 9950X.

And it seems all the latest arm64 optimizations are stuck on the x265-devel mailing list again.
30th December 2024, 17:03   #9676
higher
Registered User
Join Date: Apr 2017
Location: Hungary
Posts: 9
Quote:
Originally Posted by Sagittaire
Well, not really, simply because the 9950X is at least 2x as powerful as the 5900X. A 9950X at stock will certainly take something like ~2 m 30 s to encode this source with x265.
The 9950X is only 50% faster than the 5900X at 4K resolution. I guess an M4 Max could almost match a 9950X while consuming a lot less power.

Attachment: Screenshot 2024-12-30 165716.png
30th December 2024, 23:12   #9677
Sagittaire
Testeur de codecs
Join Date: May 2003
Location: France
Posts: 2,530
Quote:
Originally Posted by higher
The 9950X is only 50% faster than the 5900X at 4K resolution. I guess an M4 Max could almost match a 9950X while consuming a lot less power.

Attachment 18790
TechPowerUp's benchmark, like those of many other non-codec-specialists, can't test codecs correctly: if you want to benchmark a codec seriously, you don't use a GUI like HandBrake, and you use a codec profile able to properly saturate a 16C/32T CPU.

I created a codec benchmark to do exactly that, and the 9950X at stock has 74% more performance than the 5950X in x265. I didn't test the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. So in a correct H.265 benchmark (all CPU threads saturated), the 9950X will deliver 110% more performance than the 5900X.

When TechPowerUp uses a benchmark that properly saturates the CPU, like Cinebench, Blender or Stockfish, you see the real CPU power:

Stockfish:
5900X: 14.52 Mips
9950X: 30.78 Mips (+111%)

Blender:
5900X: 114.9 s
9950X: 56 s (+105%)

V-Ray:
5900X: 21538
9950X: 48899 (+127%)

__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 30th December 2024 at 23:36.
31st December 2024, 20:49   #9678
ShortKatz
Registered User
Join Date: Aug 2018
Location: Germany
Posts: 139
Quote:
Originally Posted by higher
The Mac version of HandBrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with the M4 Pro (on battery).

The 1-minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of HandBrake with identical settings (preset slow) on both platforms.

M4 Pro: 4m 20s
5900X: 5m 25s

Quite impressive.
I would be quite interested in how much faster my M4 Max would be compared to the M4 Pro.
31st December 2024, 23:26   #9679
Barough
Registered User
Join Date: Feb 2007
Location: Sweden
Posts: 492
x265 v4.1+62-441e1e4
Built on December 31 2024, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit

https://bitbucket.org/multicoreware/.../branch/master

DL :
https://www.mediafire.com/file/6nh9e7dfb72b3pi
__________________
Do NOT re-post any of my Mediafire links. Download & re-host the content(s) if you want to share it somewhere else.

Last edited by Barough; 31st December 2024 at 23:29.
2nd January 2025, 14:07   #9680
excellentswordfight
Lost my old account :(
Join Date: Jul 2017
Posts: 352
Quote:
Originally Posted by Sagittaire
TechPowerUp's benchmark, like those of many other non-codec-specialists, can't test codecs correctly: if you want to benchmark a codec seriously, you don't use a GUI like HandBrake, and you use a codec profile able to properly saturate a 16C/32T CPU.

I created a codec benchmark to do exactly that, and the 9950X at stock has 74% more performance than the 5950X in x265. I didn't test the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. So in a correct H.265 benchmark (all CPU threads saturated), the 9950X will deliver 110% more performance than the 5900X.

When TechPowerUp uses a benchmark that properly saturates the CPU, like Cinebench, Blender or Stockfish, you see the real CPU power:

Stockfish:
5900X: 14.52 Mips
9950X: 30.78 Mips (+111%)

Blender:
5900X: 114.9 s
9950X: 56 s (+105%)

V-Ray:
5900X: 21538
9950X: 48899 (+127%)
These are two different methodologies; this is not a case of a "correct" and an "incorrect" way of doing it. Single-instance encoding is still a thing, and actually the most common case for most users, so benchmarking a single instance is still very much relevant.

Most software does not have perfect parallel scaling, and in most cases where it does, i.e. 3D rendering, simulations etc., those loads usually gain more from being computed on GPUs anyway. I do think it makes perfect sense to test both cases here, since you can just run two encodes at the same time even if you don't want to start doing chunk encoding to get "more" out of your CPU. But we don't have a history of running multiple parallel instances of a benchmark just because the thread scaling isn't good enough and the results don't show the CPU's full "potential"; that argument could be made for most of them (audio encoding, compression, compiling etc.).

Last edited by excellentswordfight; 2nd January 2025 at 14:43.

All times are GMT +1.