Old 15th May 2008, 07:47   #21  |  Link
sparky
DivX Team
 
Join Date: Oct 2001
Location: San Diego, CA
Posts: 24
Quote:
Originally Posted by Dark Shikari View Post
Hmm, it appears there's still a lot of room for improvement here. As far as I can tell, DivX's decoder contains no SSE3, no SSSE3, and no SSE4.

SSSE3 is useful for palignr (luma MC) and pmaddubsw (chroma MC), for example.
This is correct. To my knowledge, CoreAVC does not contain any SSE3 either (though my knowledge could be outdated). We will add support for these instruction sets eventually.

Quote:
There's also some general serious fail in the code, such as emms being put after some assembly functions, which wastes clocks since floating point code is never or almost never used in a decoder (emms should be put before float functions, not after asm functions). There's also the fact that I'm seeing the frame pointer being used, in other words the code was compiled without -fomit-frame-pointer or its <insert compiler name here> equivalent.

I also get the feeling from reading some extremely bad assembly in here that this was built using an autovectorizing compiler of some sort. For example, an 8x7 VSAD (for adaptive deinterlacing, I assume) that keeps its sum in a GPR and repeatedly adds to it from MMX registers (WTF?!). That must be slowing down the function by at least a factor of 3 or 4.
Does not sound familiar at all. Care to copy/paste the code?

The decoder is part of a bigger source tree, and there was no effort to "prune" dead code for this release. You could be seeing some assembly that is not used by the H.264 decoder. For example, the ASP encoder does use floating point. That should cover most instances of 'emms'. But you have a good point.
Old 15th May 2008, 07:52   #22  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by sparky View Post
This is correct. To my knowledge, CoreAVC does not contain any SSE3 either (though my knowledge could be outdated). We will add support for these instruction sets eventually.
Yup, I don't think it does. I have seen use of lddqu in Elecard though, and FFDshow of course uses SSSE3 for luma/chroma MC.
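For readers following along, here is a rough scalar sketch in C of why pmaddubsw suits chroma MC. It assumes the standard H.264 bilinear chroma formula with eighth-sample offsets; the names maddubs_pair and chroma_bilinear are made up for illustration, and this is not how FFDshow or any other decoder actually lays out its data. pmaddubsw multiplies unsigned bytes by signed bytes and adds adjacent products into signed 16-bit lanes, which is the same shape as the two horizontal taps (8-dx, dx) of the filter.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one pmaddubsw lane: two unsigned bytes times two signed
 * bytes, products added into one signed 16-bit result (saturated). */
int16_t maddubs_pair(uint8_t u0, uint8_t u1, int8_t s0, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* H.264 chroma interpolation at eighth-sample offset (dx, dy):
 *   ((8-dx)(8-dy)A + dx(8-dy)B + (8-dx)dy*C + dx*dy*D + 32) >> 6
 * a,b are the two samples on the top row, c,d on the bottom row.
 * The horizontal taps of each row map onto one modeled lane. */
uint8_t chroma_bilinear(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                        int dx, int dy)
{
    assert(dx >= 0 && dx < 8 && dy >= 0 && dy < 8);
    int top = maddubs_pair(a, b, (int8_t)(8 - dx), (int8_t)dx);
    int bot = maddubs_pair(c, d, (int8_t)(8 - dx), (int8_t)dx);
    return (uint8_t)(((8 - dy) * top + dy * bot + 32) >> 6);
}
```

With weights in 0..8 the per-row sum never exceeds 255*8, so the 16-bit saturation cannot trigger in this use; it is modeled only to match the instruction's behaviour.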
Quote:
Originally Posted by sparky View Post
The decoder is part of a bigger source tree, and there was no effort to "prune" dead code for this release. You could be seeing some assembly that is not used by the H.264 decoder. For example, the ASP encoder does use floating point. That should cover most instances of 'emms'. But you have a good point.
Ah, that would explain why I found an MPEG-4 iDCT in there!

Here's the code I found:

Code:
1008962e:       55                      push   %ebp
1008962f:       89 e5                   mov    %esp,%ebp
10089631:       81 ec 08 00 00 00       sub    $0x8,%esp
10089637:       89 7d f8                mov    %edi,0xfffffff8(%ebp)
1008963a:       31 c9                   xor    %ecx,%ecx
1008963c:       8b 45 08                mov    0x8(%ebp),%eax
1008963f:       8b 7d 0c                mov    0xc(%ebp),%edi
10089642:       01 f8                   add    %edi,%eax
10089644:       0f 6f 00                movq   (%eax),%mm0
10089647:       0f 6f 0c 38             movq   (%eax,%edi,1),%mm1
1008964b:       0f f6 c1                psadbw %mm1,%mm0
1008964e:       0f 7e c2                movd   %mm0,%edx
10089651:       01 d1                   add    %edx,%ecx
10089653:       8d 04 78                lea    (%eax,%edi,2),%eax
10089656:       0f 6f 00                movq   (%eax),%mm0
10089659:       0f f6 c8                psadbw %mm0,%mm1
1008965c:       0f 7e ca                movd   %mm1,%edx
1008965f:       01 d1                   add    %edx,%ecx
10089661:       01 f8                   add    %edi,%eax
10089663:       0f 6f 08                movq   (%eax),%mm1
10089666:       0f f6 c1                psadbw %mm1,%mm0
10089669:       0f 7e c2                movd   %mm0,%edx
1008966c:       01 d1                   add    %edx,%ecx
1008966e:       0f 6f 04 38             movq   (%eax,%edi,1),%mm0
10089672:       0f 6f 0c 78             movq   (%eax,%edi,2),%mm1
10089676:       0f f6 c1                psadbw %mm1,%mm0
10089679:       0f 7e c2                movd   %mm0,%edx
1008967c:       01 d1                   add    %edx,%ecx
1008967e:       8d 04 78                lea    (%eax,%edi,2),%eax
10089681:       0f 6f 04 38             movq   (%eax,%edi,1),%mm0
10089685:       0f f6 c8                psadbw %mm0,%mm1
10089688:       0f 7e ca                movd   %mm1,%edx
1008968b:       01 d1                   add    %edx,%ecx
1008968d:       0f 6f 0c 78             movq   (%eax,%edi,2),%mm1
10089691:       0f f6 c1                psadbw %mm1,%mm0
10089694:       0f 7e c2                movd   %mm0,%edx
10089697:       01 d1                   add    %edx,%ecx
10089699:       31 c0                   xor    %eax,%eax
1008969b:       8b 55 10                mov    0x10(%ebp),%edx
1008969e:       d1 e2                   shl    %edx
100896a0:       39 d1                   cmp    %edx,%ecx
100896a2:       0f 9e c0                setle  %al
100896a5:       0f 77                   emms
100896a7:       8b 7d f8                mov    0xfffffff8(%ebp),%edi
100896aa:       89 ec                   mov    %ebp,%esp
100896ac:       5d                      pop    %ebp
100896ad:       c3                      ret
Akupenguin simplified this to the following (using x264 nasm syntax):
Code:
cglobal vsad, 2,3
lea    r2,  [r1*3]
movq   mm0, [r0]
movq   mm1, [r0+r1]
movq   mm2, [r0+r1*2]
movq   mm3, [r0+r2]
lea    r0,  [r0+r1*4]
movq   mm4, [r0]
movq   mm5, [r0+r1]
movq   mm6, [r0+r1*2]
psadbw mm0, mm1
psadbw mm1, mm2
psadbw mm2, mm3
psadbw mm3, mm4
psadbw mm4, mm5
psadbw mm5, mm6
paddd  mm0, mm1
paddd  mm2, mm3
paddd  mm4, mm5
paddd  mm0, mm2
mov    r2,  r2m
paddd  mm0, mm4
shl    r2,  1
xor    eax, eax
movd   r1,  mm0
cmp    r1,  r2
setle  al
ret
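For anyone who doesn't read asm, the simplified routine can be sketched in scalar C roughly as follows. This is a reading of the nasm listing only (rows 0 to 6, six adjacent-row pairs), not of the original disassembly, which addresses its rows slightly differently. The names sad_row8, vsad_ref and thresh are assumptions; the thread never says what the third argument means, only that it is doubled and compared against the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two 8-pixel rows, i.e. what
 * one psadbw on a pair of 64-bit MMX registers produces. */
unsigned sad_row8(const uint8_t *a, const uint8_t *b)
{
    unsigned sum = 0;
    for (int x = 0; x < 8; x++)
        sum += (unsigned)abs((int)a[x] - (int)b[x]);
    return sum;
}

/* Scalar model of the simplified listing: accumulate the SAD of
 * each pair of adjacent rows among rows 0..6 (six pairs), then
 * return 1 if the total is <= 2 * thresh, else 0 (the setle). */
int vsad_ref(const uint8_t *pix, ptrdiff_t stride, int thresh)
{
    assert(pix != NULL);
    unsigned total = 0;
    for (int y = 0; y < 6; y++)
        total += sad_row8(pix + y * stride, pix + (y + 1) * stride);
    return (int)total <= 2 * thresh;
}
```

The asm version keeps the six partial sums in MMX registers and adds them with paddd, moving to a general-purpose register only once at the end; the C loop hides that detail.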
I'm also noticing some other interesting stuff--you chose to put dequant as part of the iHCT process instead of as part of the entropy decoding process.
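To illustrate the two placements being contrasted, here is a minimal C sketch. Everything in it is hypothetical: the function names, the flat per-position scale table, and the 4x4 block size are stand-ins for illustration, not the real H.264 dequant tables and not code from DivX, ffmpeg, or x264. The only point is where the multiply happens.

```c
#include <assert.h>
#include <stdint.h>

enum { NCOEF = 16 };  /* one 4x4 block */

/* Placement A: the entropy decoder stores raw levels, and the
 * transform stage later scales every position, zero or not. */
void dequant_in_transform(int16_t blk[NCOEF], const int16_t scale[NCOEF])
{
    for (int i = 0; i < NCOEF; i++)
        blk[i] = (int16_t)(blk[i] * scale[i]);
}

/* Placement B: scale each coefficient at the moment it is decoded,
 * so only the nonzero coefficients cost a multiply. */
void store_coeff_dequant(int16_t blk[NCOEF], const int16_t scale[NCOEF],
                         int pos, int level)
{
    assert(pos >= 0 && pos < NCOEF);
    blk[pos] = (int16_t)(level * scale[pos]);
}
```

Both placements produce the same block; in this sketch the difference is that placement B touches only the coefficients that were actually coded.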

(This would be a whole lot easier if we could get an unstripped debug build, but like that'll ever happen...)

Last edited by Dark Shikari; 15th May 2008 at 07:57.
Old 15th May 2008, 08:25   #23  |  Link
sparky
DivX Team
 
Join Date: Oct 2001
Location: San Diego, CA
Posts: 24
Okay. That code is very old and it is part of ASP deblocking.

Quote:
This would be a whole lot easier if we could get an unstripped debug build, but like that'll ever happen...
That'll take all the challenge out of it, won't it? We only give unstripped debug builds to employees, sorry. You're lucky that Al forgot to ASProtect the filter :P

BTW do you have any lossless files we could test?

Last edited by sparky; 15th May 2008 at 08:32.
Old 15th May 2008, 08:28   #24  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by sparky View Post
that'll take all the challenge out of it, won't it?
Perhaps, but running oprofile in order to identify specific non-obvious functions (e.g. CABAC) does get annoying after a while...
Old 15th May 2008, 08:45   #25  |  Link
DigitAl56K
Registered User
 
Join Date: Nov 2002
Location: San Diego, CA
Posts: 936
Hey guys,

Although I don't like to do so, I feel I do need to point out that the license with this software does not allow reversing, decompiling, disassembling, and so forth.

Dark Shikari: Your comments are welcome, but let's avoid a public disassembly of the binaries (I do realize we asked you to post a code snippet earlier). You can contact sparky by PM or e-mail for lower-level development issues if you like.

Thanks for your understanding
Old 15th May 2008, 08:53   #26  |  Link
Gabriel_Bouvigne
L.A.M.E. developer
 
Join Date: Dec 2001
Location: Paris - France
Posts: 276
FGM?
Old 15th May 2008, 08:54   #27  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by DigitAl56K View Post
Hey guys,

Although I don't like to do so, I feel I do need to point out that the license with this software does not allow reversing, decompiling, disassembling, and so forth.

Dark Shikari: Your comments are welcome, but let's avoid a public disassembly of the binaries (I do realize we asked you to post a code snippet earlier). You can contact sparky by PM or e-mail for lower-level development issues if you like.

Thanks for your understanding
I was going to avoid posting any actual disassembly until you guys asked me to, and I'd be happy to not post anything else (that was what I was going to do anyway!). I fully understand not posting code, particularly since the beta isn't actually public yet. If you want me to elaborate on any more general lower-level comments in a manner that requires you to see the code I'm referring to, I'd be happy to PM it on request instead of posting it.

Of course, prohibiting "disassembly" in a license is not merely completely unenforceable (both legally and practically) but totally silly given how trivial disassembly is (objdump -d). And if you think that people actually obey rules about not disassembling programs, well, you might want to look at certain striking similarities between some code in CoreAVC, ffmpeg, and x264...
Quote:
Originally Posted by Gabriel_Bouvigne View Post
FGM?
Good question. Support for FGM in DivX would pave the way for x264 support of FGM.

Last edited by Dark Shikari; 15th May 2008 at 09:03.
Old 15th May 2008, 09:33   #28  |  Link
DigitAl56K
Registered User
 
Join Date: Nov 2002
Location: San Diego, CA
Posts: 936
Thanks, Dark Shikari. Instead of discussing the intricacies of disassembly and software licenses, let me just say that this will save me hours of thoroughly entertaining chit chat with our legal team in the morning.

Gabriel: We don't have FGM yet. Perhaps a discussion of the features the x264 team is interested in working on, and the order in which they want to accomplish them, might make for an interesting (but separate) thread. It's important to keep in mind that we're in the middle of preparing the future generation of DivX right now and we often can't elaborate too much on our roadmap ("Hey! Here's a super-fast H.264 decoder. Surprise!"), but there are times when we can aim to align our work to support other projects. If we can understand your priorities we can build a better codec. Thanks for your work on LAME, btw.

Last edited by DigitAl56K; 15th May 2008 at 09:52.
Old 15th May 2008, 15:17   #29  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Heh, after more glancing around the disassembly, my overall comment would be that if DivX has already managed to get faster than CoreAVC with this, let's just say they have a very wide margin in which to improve it. I suspect most of the efficiency here must be from overall code optimization rather than particularly ingenious ASM, which demonstrates yet again the sheer complexity of H.264 and how much good coding practices matter for performance. An oprofile of ffmpeg is a great way to see this; a massive amount of time is spent in many pure C functions (fill_caches, etc). Whatever design improvements DivX made in order to avoid a lot of this overhead: good job, the result is quite impressive.

The most important thing here is that there's finally a competitor for CoreAVC... perhaps this will force the Core guys to get working.

Edit: Perhaps FFDshow might have been used a bit too much as a model during DivX development...
Quote:
22:19 < checkers> apparently it gives bit identical output to ffdshow
22:19 < checkers> including errors in ffdshow decoding :|

Last edited by Dark Shikari; 15th May 2008 at 15:21.
Old 15th May 2008, 15:24   #30  |  Link
BetaBoy
CoreCodec Founder
 
Join Date: Oct 2001
Location: San Francisco
Posts: 1,421
Quote:
Originally Posted by Dark Shikari View Post

The most important thing here is that there's finally a competitor for CoreAVC... perhaps this will force the Core guys to get working
Done... CoreAVC 2.0 is on the devel deck. But as I said in the other thread, it's good to see DivX with something 'new', and it's great to have competition; it only makes for better products.
__________________
Dan "BetaBoy" Marlin
Ubiquitous Multimedia Technologies and Developer Tools

http://corecodec.com
Old 15th May 2008, 16:11   #31  |  Link
Inventive Software
Turkey Machine
 
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
Wow! Nice going, guys. Finally, an alternative to CoreAVC. How's interlacing handled? I know it's only been a day, but if it can handle all the streams that bob0r keeps throwing at CoreAVC and that CoreAVC fails on, then you lot are clear winners.
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld
Old 15th May 2008, 17:04   #32  |  Link
CiNcH
Registered User
 
Join Date: Jan 2004
Posts: 567
Got some PAFF samples from DVB broadcasts working through GraphStudio (with Haali Media Splitter). DVBViewer with its demuxer crashes for the time being...

__________________
Bye

Last edited by CiNcH; 15th May 2008 at 17:32.
Old 15th May 2008, 17:15   #33  |  Link
sparky
DivX Team
 
Join Date: Oct 2001
Location: San Diego, CA
Posts: 24
Quote:
Originally Posted by Dark Shikari View Post
Heh, after more glancing around the disassembly, my overall comment would be that if DivX has already managed to get faster than CoreAVC with this, let's just say they have a very wide margin in which to improve it. I suspect most of the efficiency here must be from overall code optimization rather than particularly ingenious ASM, which demonstrates yet again the sheer complexity of H.264 and how much good coding practices matter for performance.
Come on, our ASM isn't nearly as bad as you say (but if you have any specific ideas on how to improve it by a very wide margin, feel free to shoot a PM).

Quote:
Edit: Perhaps FFDshow might have been used a bit too much as a model during DivX development...
FFDshow hasn't been used at all. The decoder produces output that's bitexact with JM.

Last edited by sparky; 15th May 2008 at 17:44.
Old 15th May 2008, 19:27   #34  |  Link
Inventive Software
Turkey Machine
 
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
Quote:
Originally Posted by sparky View Post
FFDshow hasn't been used at all. The decoder produces output that's bitexact with JM.
So it's just sheer luck that the decoder gives the same errors that ffdshow does? If so, it would be good to fix the errors on both sides.
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld
Old 15th May 2008, 19:30   #35  |  Link
sparky
DivX Team
 
Join Date: Oct 2001
Location: San Diego, CA
Posts: 24
Can you give an example?
Old 15th May 2008, 19:58   #36  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by sparky View Post
Come on, our ASM isn't nearly as bad as you say (but if you have any specific ideas on how to improve it by a very wide margin, feel free to shoot a PM).
I suspect a lot of the "bad" ASM is old stuff that isn't used in the decoder, due to (as you said) the total mess of a source tree here, with old ASP stuff and so forth.

I'll probably go through it with oprofile in a few days to find what's actually used and see if it's really as bad as the initial unused functions demonstrated, or if those are just outliers.
Old 15th May 2008, 20:43   #37  |  Link
DigitAl56K
Registered User
 
Join Date: Nov 2002
Location: San Diego, CA
Posts: 936
Once again, this is beta 1, i.e. not yet perfect! We will clean up the project sources as we move closer to a release.

I would look at it this way: despite the flaws you think it may have, if the decoder is already extremely fast and you believe it still has room for improvement, then that is a good thing. If you think the decoder could be more efficient, but it is already outperforming the other decoders, then they could be more efficient still. Maybe we can try to be less negative unless it's really warranted.

BTW - I've just gone through my PMs and a whole bunch of people should now have access to the download.

@BetaBoy: Agreed. The choice of two powerful H.264 decoders is better than one.

On DVBViewer: Seems to be a common problem there, we'll take a look at it. We have a few DVBViewer users in the Rémoulade group now.
Old 15th May 2008, 20:44   #38  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,666
Quote:
Originally Posted by DigitAl56K View Post
Despite any flaws you think it may have, if the decoder is already extremely fast and you believe it still has room for improvement then that is a good thing. Also, if you think the decoder could be more efficient, but it is already outperforming the other decoders, then they could be more efficient still.
That was actually the entire point of my posts to begin with.

I was surprised that it managed to be significantly faster than Core given the lack of optimization on many levels, which obviously means that if those missing optimizations are added, it'll be even better. And everyone benefits from that.
Old 15th May 2008, 22:42   #39  |  Link
ChronoCross
Does it really matter?
 
Join Date: Jun 2004
Location: Chicago, IL
Posts: 1,542
Just remember to take it with a grain of salt, because we all know DivX's "incredible speed increases" in their encoder were proven to be nothing but marketing hype.
Old 15th May 2008, 22:55   #40  |  Link
DigitAl56K
Registered User
 
Join Date: Nov 2002
Location: San Diego, CA
Posts: 936
Wow, that's quite a flame! hehe

Please sign up at Labs and send me your account name so I can get a copy into your hands