Doom9's Forum > Video Encoding > High Efficiency Video Coding (HEVC)

Old 17th October 2018, 03:52   #6441  |  Link
Wolfberry
Helenium(Easter)
 
Join Date: Aug 2017
Location: Hsinchu, Taiwan
Posts: 98
Quote:
Originally Posted by benwaggoner View Post
No reason it would be a bad idea, I just haven't seen it used before. Generally parameters that have a reliable quality/speed tradeoff are in a preset. But it isn't listed as experimental...
Anyone from multicoreware care to weigh in? What's the tradeoff? Is this something that should get added to the faster presets, or default to on but off in --preset placebo?
Quote:
In fact, this skip is not a fast skip algorithm.
As the sum of split cost is larger than none split CU's best cost (both rdcost of sub-cu and none split CU are without split flag cost), which means splitting into 4 parts at this depth of cu is a worse case compared with none split CU. So that, the remain N * 1/4 parts of CU analysis is useless.
Quote:
Originally Posted by LigH View Post
If I understood the patch comment in the mailing list correctly, it should speed up intra split cost calculation a little while possibly preserving identical output.
splitrd-skip sounds like a small speedup with no trade-off, but it is still disabled by default (according to the docs).
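
The early-out described in the quoted patch comment can be sketched roughly as follows. This is only an illustration in Python, not x265's actual code: `analyse_split` and the cost values are hypothetical, and it assumes RD costs are never negative, so once the running sum of sub-CU costs exceeds the unsplit cost, the split can no longer win and the decision is unchanged.

```python
def analyse_split(sub_cu_costs, unsplit_cost):
    """Sum sub-CU RD costs, stopping once the split can no longer win.

    sub_cu_costs: best RD cost of each of the 4 sub-CUs (hypothetical values;
                  in an encoder, computing each one is the expensive step).
    unsplit_cost: best RD cost of the unsplit CU at this depth.
    Returns (split_is_better, sub_cus_analysed).
    """
    total = 0
    analysed = 0
    for cost in sub_cu_costs:
        total += cost
        analysed += 1
        if total > unsplit_cost:
            # The split already costs more than not splitting,
            # so the remaining sub-CUs do not need to be analysed.
            return False, analysed
    return total < unsplit_cost, analysed


print(analyse_split([400, 700, 300, 200], unsplit_cost=1000))
print(analyse_split([200, 200, 200, 200], unsplit_cost=1000))
```

In the first call the split is rejected after two of the four sub-CUs; in the second all four are analysed and the split wins. Either way the decision matches a full analysis, which fits the "identical output" expectation mentioned above.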
__________________
Monochrome Anomaly
Old 17th October 2018, 11:10   #6442  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 6,807
Quote:
Originally Posted by benwaggoner View Post
Would it be hotter? Or would it just be slower due to thermal throttling.

I wouldn't be surprised if Intel's next big microarchitecture revision makes AVX512 useful in cases where it isn't today. We saw the same thing with Skylake and AVX2.
This will require a 10nm process. 14nm++++++++++++ has reached its thermal limits. Upcoming 8C/16T CPUs are already very hot at 5GHz.
Old 17th October 2018, 15:51   #6443  |  Link
RieGo
Registered User
 
Join Date: Nov 2009
Posts: 57
Quote:
Originally Posted by benwaggoner View Post
Would it be hotter? Or would it just be slower due to thermal throttling.
in my personal experience avx512 isn't generating much more heat than avx2 - on x265
so as long as you are only using it on x265 and use a safe thermal throttling setting u should be fine imho

in case i am wrong and anyone else has different results please correct me
Old 18th October 2018, 07:08   #6444  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,525
Quote:
Originally Posted by Boulder View Post
I recall someone else also complaining about max-merge for smoothing things if the value is too big. I'll need to retest and file a bug report if it's still reproducible. It's been some time since I tested it.

Will also retest deblock 0:0. When I tested it earlier, I did find it smoothed things slightly, so I went one notch down from the standard -1:-1 of x264's --tune film.
I've done some testing on a 1080p encode: basically, I set CRF 21 and then tested one parameter at a time by comparing still frames. I know it's not the optimal way, but it is very hard to notice the differences in motion. What I've tried to compare are areas which show distortion quite well, such as eyes and their surroundings, hair etc. I've also tried picking frames from fast motion and almost still scenes.

I'll post my results as soon as I finish the 720p tests. From what I can tell, 1080p requires slightly different parameters, but of course it could be that things have changed so much under the hood that my set of parameters has been obsolete all along.

What I already found strange is that deblock doesn't really affect bitrate. For example deblock 6:6 ended up around the same size as deblock 0:0.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 18th October 2018, 08:07   #6445  |  Link
Forteen88
Herr
 
Join Date: Apr 2009
Location: North Europe
Posts: 346
Quote:
Originally Posted by Boulder View Post
What I already found strange is that deblock doesn't really affect bitrate. For example deblock 6:6 ended up around the same size as deblock 0:0.
But did the avg QP, or visual quality remain the same?
Old 18th October 2018, 15:58   #6446  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,525
Quote:
Originally Posted by Forteen88 View Post
But did the avg QP, or visual quality remain the same?
The average QP remained pretty much the same; differences were around 0.01-0.03 units between deblock 1:1 and -3:-3. Based on my tests, the higher values did soften the image more, at least in the places I checked, so maybe the bits were allocated elsewhere in the image.
Old 18th October 2018, 16:40   #6447  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,827
Quote:
Originally Posted by RieGo View Post
in my personal experience avx512 isn't generating much more heat than avx2 - on x265
so as long as you are only using it on x265 and use a safe thermal throttling setting u should be fine imho
That's what I'd expect; TDP is TDP, and thermal throttling should kick in regardless of where the heat is coming from.

Picojoules per pixel is the more relevant metric here. Heat produced is equal to power draw. Better coolers can get the heat away from the CPU better, of course, but watts to the CPU is going to be the same as watts of heat to dissipate.
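
To make the picojoules-per-pixel idea concrete, here is a small sketch of the arithmetic: energy per pixel is power draw divided by pixels encoded per second. The function name and the numbers (100 W, 1080p, 2.0 vs 2.5 fps) are made up for illustration and are not measurements from this thread.

```python
def picojoules_per_pixel(watts, fps, width, height):
    """Energy spent per encoded pixel, in picojoules."""
    pixels_per_second = fps * width * height
    joules_per_pixel = watts / pixels_per_second
    return joules_per_pixel * 1e12


# Hypothetical example: same package power, different encode speeds.
slow = picojoules_per_pixel(100.0, 2.0, 1920, 1080)
fast = picojoules_per_pixel(100.0, 2.5, 1920, 1080)
print(f"{slow:.3e} pJ/px at 2.0 fps vs {fast:.3e} pJ/px at 2.5 fps")
```

At equal power draw, whichever instruction set encodes more frames per second spends less energy (and so produces less heat) per pixel, even if the CPU temperature looks the same.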
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 18th October 2018, 16:45   #6448  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,827
Quote:
Originally Posted by Boulder View Post
I'll post my results as soon as I finish the 720p tests. From what I can tell, 1080p requires slightly different parameters, but of course it could be that things have changed so much under the hood that my set of parameters has been obsolete all along.
Pretty much any settings from before x265 2.4 are invalid now due to the new lambda tables. Lots of other things have improved, but that was probably the biggest change.

Quote:
What I already found strange is that deblock doesn't really affect bitrate. For example deblock 6:6 ended up around the same size as deblock 0:0.
Lots of codecs' internal decisions are done ignoring in-loop deblocking to improve speed. So that's not terribly surprising. And if this is CRF, it is a delta from QP based on spatial complexity, so it's not surprising that it would change.

It IS surprising that you wouldn't see bitrate OR QP change, even with a significant loss of detail. Not that using 6:6 is something anyone is likely to do in practice. But reduced detail should result in lower bitrate and/or QP. With lowered detail (including grain/gain noise), prediction should be more efficient...

Maybe log your repro for this as an issue for MCW?
Old 18th October 2018, 16:48   #6449  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,827
Quote:
Originally Posted by Boulder View Post
The average QP remained pretty much the same; differences were around 0.01-0.03 units between deblock 1:1 and -3:-3. Based on my tests, the higher values did soften the image more, at least in the places I checked, so maybe the bits were allocated elsewhere in the image.
Hmmm. You should look at it in motion then; maybe the improvements are there.

Playing back at 1/4 speed is generally okay to see temporal artifacts better while still keeping temporal coherence. You can still detect discontinuities between IDR/i/P/B/b that way.

Testing at a higher CRF, like maybe 28, can also make differences a lot more obvious. Working at a quality where most things look pretty good can make differences much harder to detect.
Old 18th October 2018, 17:24   #6450  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,687
Quote:
Originally Posted by benwaggoner View Post
Would it be hotter? Or would it just be slower due to thermal throttling.

I wouldn't be surprised if Intel's next big microarchitecture revision makes AVX512 useful in cases where it isn't today. We saw the same thing with Skylake and AVX2.
The real problem with AVX512 is the strong downclock on stock CPUs, and the high density of the AVX512 instructions, causing a high power draw in a small die area.

With overclockable CPUs, you can control the AVX512 offset and make it useful for x265, but on Xeons or the like, you probably don't get that level of control, and the downclock during AVX512 might offset the advantages it offers.

Additionally, x265 is a pretty "light" AVX512 load. If you change the offset for a lower downclock and then run a strong AVX512 load (like pure math, i.e. FFTs, for a prolonged time), your system may become unstable, due to the extreme density of all the heat and power draw. So it's really hard to balance.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Old 19th October 2018, 07:47   #6451  |  Link
DotJun
Registered User
 
Join Date: Aug 2014
Posts: 17
Quote:
Originally Posted by Atak_Snajpera View Post
You should encode whole movie (130k frames) instead of ultra short clip with few hundred of frames.
The longer you encode the more heat your cpu will produce and hence more aggressive AVX negative offset will be activated.
I encoded a full length movie and temps did not go over 80c which I think is ok for my chip? I should have stated that this computer is in a climate controlled area set to 70F.

My chip is OC'd to 4.5GHz with a -4 offset so that it drops to 4.1GHz when using AVX. I guess I should have tried to compare encoding speed with AVX on and off instead of just AVX-512 enabled and disabled. I have no throttling issues and the load is a pretty consistent 100%, with the occasional 1 second dip down to 87% every 30 seconds or so.

What is the command to disable avx entirely?

Quote:
Originally Posted by RieGo View Post
afaik avx512 enabled and disabled should produce exact same output. at least on my tests it did. can you share your command line?
This is what I use for 4k source since there is no Film preset:
--preset slower --crf 17 --profile main10 --me 3 --subme 5 --psy-rd 1.5 --psy-rdoq 5.0 --rdoq-level 1 --qcomp 0.8 --deblock -1:-1 --no-sao --repeat-headers --hdr-opt --range limited --colorprim 9 --transfer 16 --colormatrix 9 --master-display "G(13250, 34500)B(7500, 3000)R(34000, 16000)WP(15635, 16450)L(10000000, 1)"

Last edited by DotJun; 19th October 2018 at 07:51.
Old 19th October 2018, 12:21   #6452  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 6,807
Quote:
I encoded a full length movie and temps did not go over 80c which I think is ok for my chip? I should have stated that this computer is in a climate controlled area set to 70F.
It is funny how you in the USA use two different scales for temperature.

Quote:
What is the command to disable avx entirely?
Bad idea! AVX2 is very useful in x265.

Last edited by Atak_Snajpera; 19th October 2018 at 12:23.
Old 19th October 2018, 13:53   #6453  |  Link
Ma
Registered User
 
Join Date: Feb 2015
Posts: 324
Quote:
Originally Posted by DotJun View Post
What is the command to disable avx entirely?
--asm sse4

Main asm levels are:
--asm no
--asm sse2
--asm ssse3
--asm sse4
--asm avx2
--asm avx512

--asm avx512 works a bit differently: it only enables the possibility of using AVX-512 during auto-detection of CPU capabilities. To force use of AVX-512 code (without checking), please use
--asm avx,avx512
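
For anyone wanting to compare these levels, a small helper like the one below could generate one command line per --asm value. It is only a sketch: `x265_command`, the file names `clip.y4m` and `out_<level>.hevc`, and the preset/CRF defaults are placeholders, and the forced form `avx,avx512` is deliberately left out of the list, so add it if you want to test that variant.

```python
import shlex

# asm levels as listed in the post above
ASM_LEVELS = ["no", "sse2", "ssse3", "sse4", "avx2", "avx512"]


def x265_command(asm, source="clip.y4m", preset="slower", crf=17):
    """Build an x265 argument list for one --asm level.

    source and the output file name are hypothetical placeholders.
    """
    if asm not in ASM_LEVELS:
        raise ValueError(f"unknown asm level: {asm}")
    return [
        "x265", "--input", source,
        "--preset", preset, "--crf", str(crf),
        "--asm", asm,
        "--output", f"out_{asm}.hevc",
    ]


# Print the commands for the three levels discussed in this thread.
for level in ("sse4", "avx2", "avx512"):
    print(shlex.join(x265_command(level)))
```

Keeping every other option identical between runs means any difference in fps can be attributed to the asm level alone.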
Old 19th October 2018, 19:36   #6454  |  Link
excellentswordfight
Lost my old account :(
 
Join Date: Jul 2017
Posts: 83
Not sure if anyone who works on x265 is still active in this thread, but I've seen some discussion regarding --ctu and --merange, both in this thread in the past and recently in a few other threads.

From what I've read here and from what I've seen in my own testing, a CU size of 64 is overkill for 1080p and below. I see both a speed increase and better multithreading when lowering it together with merange, with no apparent loss in compression. I also saw some posts way back in this thread suggesting that the default CTU size should be based on resolution. Was this left unimplemented for a specific reason, or is it just that no one has committed a patch for it?

I also have a question about merange. The docs say that the value of 57 is based on CTU size and search method, but the value is set to 57 for all presets even though CTU size and search method vary. How come?

Last edited by excellentswordfight; 19th October 2018 at 19:49.
Old 19th October 2018, 20:37   #6455  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,525
I finished my short tests today, and these are the settings that I found to suit my requirements.

1080p:
Code:
--deblock -3:-3 --no-strong-intra-smoothing --merange 44 --no-sao --qcomp 0.75 --aq-mode 3 --aq-strength 0.8 --ctu 32
--max-tu-size 32 --qg-size 8 --tu-inter-depth 4 --tu-intra-depth 4 --limit-tu 4 --limit-refs 3 --max-merge 3 --rd-refine
--ref 6 --bframes 10 --crf 20.5
720p:
Code:
--deblock -3:-3 --no-strong-intra-smoothing --merange 38 --no-sao --qcomp 0.8 --aq-mode 3 --aq-strength 0.8 --ctu 16
--max-tu-size 16 --qg-size 8 --tu-inter-depth 3 --tu-intra-depth 3 --limit-tu 4 --limit-refs 3 --max-merge 3 --rd-refine
--ref 6 --bframes 10 --crf 19.5
Very few changes between the two, but the CTU/TU size was an obvious change based on the comparison I made. Upon playback, the video looks very good in both resolutions on my 65" LCD even when watched from 1 meter or so. As a test source, I used the first episode of Black Sails as it's generally considered a very high quality Blu-ray release.

I did try finding out differences between deblock 1:1 and -3:-3 in motion but couldn't tell. If someone had ABX'd me, I probably wouldn't have been able to say which one is which. Mind you, the scene I used contained quite a lot of motion and some fast cuts so maybe they were just not visible there. Anyway, I chose a low value as I think it will retain detail and sharpness better. I watch everything from about 3-3.5 meters anyway, so any small blocking won't be that visible.

Edit: both encodes using the preset "veryslow".
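
The resolution-dependent part of the two settings above (CTU size, max TU size and merange) could be picked automatically with something like this sketch. The 720p and 1080p values are the ones posted above; the height cut-offs and the fallback to x265's defaults (CTU 64, max TU size 32, merange 57) for larger frames are assumptions for illustration, not tested recommendations.

```python
def ctu_settings(height):
    """Return (ctu, max_tu_size, merange) for a given frame height.

    The 720p and 1080p values come from the post above. The cut-offs and the
    fallback to the x265 defaults are assumptions made for this sketch.
    """
    if height <= 720:
        return 16, 16, 38
    if height <= 1080:
        return 32, 32, 44
    return 64, 32, 57


def as_arguments(height):
    """Format the chosen values as x265 command-line arguments."""
    ctu, max_tu, merange = ctu_settings(height)
    return ["--ctu", str(ctu), "--max-tu-size", str(max_tu), "--merange", str(merange)]


for h in (720, 1080, 2160):
    print(h, " ".join(as_arguments(h)))
```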

Last edited by Boulder; 20th October 2018 at 16:19.
Old 20th October 2018, 00:30   #6456  |  Link
Dclose
Registered User
 
Join Date: Aug 2014
Posts: 50
Quote:
Originally Posted by benwaggoner View Post
Did you see any issues with using 0:0?
I did a 4k down to 1080 encode the other day, with an accidental 0:0 deblock setting. It looks good in a "slick," "photoshopped" way, but it definitely loses a lot of detail, at least at CRF 23.

That's too much deblocking for me. Though I do put db at 0:0 when increasing CRF and dropping resolution a lot.
Old 20th October 2018, 05:53   #6457  |  Link
DotJun
Registered User
 
Join Date: Aug 2014
Posts: 17
x265 HEVC Encoder

Quote:
Originally Posted by Ma View Post
--asm sse4

Main asm levels are:
--asm no
--asm sse2
--asm ssse3
--asm sse4
--asm avx2
--asm avx512

--asm avx512 works a bit differently: it only enables the possibility of using AVX-512 during auto-detection of CPU capabilities. To force use of AVX-512 code (without checking), please use
--asm avx,avx512


Thanks, I'll put up three test runs of default, sse4 and avx512 to see what changes there are in FPS and efficiency.

Will the medium or fast preset be OK to use, or does it have to be slower or higher?
Will it be OK to use "asm sse4", or do I have to specify sse4.2 like my log file shows?

Last edited by DotJun; 20th October 2018 at 06:11.
Old 20th October 2018, 10:42   #6458  |  Link
Boulder
Pig on the wing
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,525
Quote:
Quote:
--qg-size 16, tested values from 64 to 8, 16 looked best (in terms of distortion in small details again) when compared frame-by-frame. Tested with a 720p encode, so I'll need to check that also with 1080p later.
I would expect it would look better, but at some cost to efficiency due to the signaling overhead. --opt-cu-delta-qp might help that some.
I was wondering about this parameter. Does it actually shift bits inside the frame so that the distribution is closer to the average - so that some areas with fine detail could suffer at the cost of flat areas looking better? Or the other way around as flat areas could be compressed using a lower QP to make them look equally good?
Old 20th October 2018, 22:39   #6459  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 2,827
Quote:
Originally Posted by Boulder View Post
I was wondering about this parameter. Does it actually shift bits inside the frame so that the distribution is closer to the average - so that some areas with fine detail could suffer at the cost of flat areas looking better? Or the other way around as flat areas could be compressed using a lower QP to make them look equally good?


AFAIK it works by reducing signaling overhead without changing actual pixels. There are several bitstream options that do things like that. This is the one likely to have the most impact, as QP signaling can take place MANY times per frame. The others are per frame or per GOP.


Old 21st October 2018, 07:23   #6460  |  Link
DotJun
Registered User
 
Join Date: Aug 2014
Posts: 17
I did a few more test runs with the same parameters I stated before, using a 50k-frame test clip. Here are the results.

Normal Preset:
sse4 2.19fps
avx2 2.89fps
512 3.18fps

They all had the exact same kbps of 23463.

Slower Preset:
sse4 0.71fps
avx2 0.89fps
512 0.95fps

All of them ended up with the exact same kbps of 22868.

My previous test from the other day seems to be a failure since I didn't use the correct switches for --asm.
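
To put the numbers above in relative terms, the sketch below computes the speedup of each asm level over sse4 from the posted fps figures. Only the fps values come from the post; the dictionary layout and the `speedups` helper are just for illustration.

```python
# fps figures as posted above, keyed by preset and asm level
results = {
    "normal": {"sse4": 2.19, "avx2": 2.89, "avx512": 3.18},
    "slower": {"sse4": 0.71, "avx2": 0.89, "avx512": 0.95},
}


def speedups(fps_by_asm, baseline="sse4"):
    """Return each level's fps divided by the baseline level's fps."""
    base = fps_by_asm[baseline]
    return {asm: fps / base for asm, fps in fps_by_asm.items()}


for preset, fps_by_asm in results.items():
    for asm, ratio in speedups(fps_by_asm).items():
        print(f"{preset:>6} {asm:<6} {ratio:.2f}x")
```

This works out to roughly 1.32x (avx2) and 1.45x (avx512) over sse4 for the first set of runs, and about 1.25x and 1.34x for the slower preset, so AVX-512 adds somewhere around 7-10% on top of AVX2 in these runs.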