Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264

Speeding up x264
Posted 2nd November 2015, 15:01 by viper_room (Registered User; joined May 2008; 12 posts), post #1

Hello,

I’m trying to run the x264 encoder in real time on an ARM processor: a dual-core 1.2 GHz ARM Cortex-A9 (ARMv7-A instruction set with NEON extensions).

I use this command line:

x264 --input-res 1920x1080 --preset ultrafast -I -1 -b 0 --bitrate 3500 --fps 25 --nr 0 --threads 2 -o /mnt/ramdisk/1.264 /mnt/ramdisk/00001.yuv

(It’s a low-delay application, hence -b 0 and -I -1. I know the quality is poor, but that isn’t the concern here.)

I get about 12 fps. Profiling shows the following functions as the main hotspots:
8.67% x264_quant_4x4x4_neon
6.81% x264_macroblock_cache_load_progressive
4.78% x264_plane_copy_neon
4.29% x264_frame_init_lowres_core_neon
4.19% x264_macroblock_cache_save
3.93% x264_mb_encode_chroma
3.73% x264_mb_encode_i16x16
3.23% x264_mc_copy_w16_aligned_neon
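A rough way to size the gap is a per-macroblock cycle budget plus Amdahl’s law. The sketch below assumes perfect scaling across both cores and that every cycle goes to encoding, so it is optimistic; the helper names are hypothetical and not part of x264.

```c
#include <assert.h>

/* Number of 16x16 macroblocks per frame. Partial macroblocks round up,
 * so 1080 lines are coded as 68 macroblock rows (1088 lines). */
static int mbs_per_frame(int width, int height)
{
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Average CPU cycles per macroblock when 'cores' cores running at 'hz'
 * are fully busy producing 'fps' frames of 'mbs' macroblocks each. */
static double cycles_per_mb(double hz, int cores, double fps, int mbs)
{
    return hz * cores / (fps * mbs);
}

/* Amdahl's law: overall speedup when a fraction p (0..1) of the run time
 * is made s times faster; a very large s approximates removing that part. */
static double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For 1920x1080 this gives 120 x 68 = 8160 macroblocks per frame. At 2 x 1.2 GHz, 25 fps leaves about 11,765 cycles per macroblock, while 12 fps corresponds to about 24,510, so roughly a 2.08x speedup is needed. Even if cache load and save (6.81% + 4.19% = 11%) cost nothing, amdahl_speedup(0.11, 1e30) is only about 1.12x (about 13.5 fps), and removing all eight listed functions (39.63% combined) would give about 1.66x (about 20 fps). This assumes the profile percentages cover only encoding time.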

So I’m looking for ways to speed up the code (any suggestion is welcome!). Because this is a low-delay application, we mainly use I frames only for now, or I and P frames (B frames add too much delay). For the I-frame-only bitstream, which is the fastest configuration I have found, I’m thinking about restricting the number of intra prediction modes tried or the number of possible partitions. I know this will cost quality or bitrate, but my aim is to simplify the encoder enough to run in real time.
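To illustrate what restricting intra prediction saves, here is a minimal sketch of intra 16x16 mode decision with a hypothetical mode mask: each enabled mode costs one 16x16 prediction plus one cost evaluation, so disabling modes removes that work per macroblock. The prediction formulas follow the H.264 Intra_16x16 luma modes, but everything else is an assumption for illustration, not x264’s code: it uses plain SAD rather than x264’s SATD-plus-bit-cost, assumes all neighbours are available, and the names (choose_i16_mode, I16_*) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mode bits for the four H.264 intra 16x16 luma modes. */
enum { I16_V = 1 << 0, I16_H = 1 << 1, I16_DC = 1 << 2, I16_PLANE = 1 << 3 };

static uint8_t clip_pixel(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Build the prediction for one mode from reconstructed neighbours:
 * top[] is the row above, left[] the column to the left, topleft the corner.
 * Plane mode relies on arithmetic right shift of negatives (common compilers). */
static void predict_16x16(int mode, const uint8_t top[16], const uint8_t left[16],
                          uint8_t topleft, uint8_t pred[16][16])
{
    if (mode == I16_V) {
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++) pred[y][x] = top[x];
    } else if (mode == I16_H) {
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++) pred[y][x] = left[y];
    } else if (mode == I16_DC) {
        int sum = 16; /* rounding term for the >> 5 (divide by 32) */
        for (int i = 0; i < 16; i++) sum += top[i] + left[i];
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++) pred[y][x] = (uint8_t)(sum >> 5);
    } else { /* I16_PLANE: fit a linear gradient to the neighbours */
        int H = 0, V = 0;
        for (int i = 0; i < 8; i++) {
            int t = (i == 7) ? topleft : top[6 - i];  /* index -1 is the corner */
            int l = (i == 7) ? topleft : left[6 - i];
            H += (i + 1) * (top[8 + i] - t);
            V += (i + 1) * (left[8 + i] - l);
        }
        int a = 16 * (left[15] + top[15]);
        int b = (5 * H + 32) >> 6;
        int c = (5 * V + 32) >> 6;
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                pred[y][x] = clip_pixel((a + b * (x - 7) + c * (y - 7) + 16) >> 5);
    }
}

/* Sum of absolute differences between the source block and a prediction. */
static int sad_16x16(const uint8_t *src, int stride, uint8_t pred[16][16])
{
    int sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++) sad += abs(src[y * stride + x] - pred[y][x]);
    return sad;
}

/* Try only the modes set in allowed_modes; return the cheapest by SAD. */
static int choose_i16_mode(const uint8_t *src, int stride, const uint8_t top[16],
                           const uint8_t left[16], uint8_t topleft,
                           int allowed_modes, int *best_sad)
{
    static const int modes[4] = { I16_V, I16_H, I16_DC, I16_PLANE };
    uint8_t pred[16][16];
    int best = 0, best_cost = -1;
    assert(allowed_modes & (I16_V | I16_H | I16_DC | I16_PLANE));
    for (int m = 0; m < 4; m++) {
        if (!(allowed_modes & modes[m])) continue; /* skipped mode: no work */
        predict_16x16(modes[m], top, left, topleft, pred);
        int cost = sad_16x16(src, stride, pred);
        if (best_cost < 0 || cost < best_cost) { best_cost = cost; best = modes[m]; }
    }
    *best_sad = best_cost;
    return best;
}
```

Passing, say, I16_DC alone cuts the per-macroblock analysis to one prediction and one SAD, at the cost of worse prediction on edges and gradients.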

I’ve got two questions about the code:
• Is there any documentation for the code? Not all steps are self-explanatory. For example, I couldn’t work out what mc_copy_w16 does, since it is also called when encoding with I frames only, i.e. when there is no motion compensation.
• I’m also trying to understand the macroblock_cache_load and macroblock_cache_save functions, as together they account for about 11% of the profile (6.81% + 4.19%). Since I’m using the ultrafast preset, the only prediction modes tried are the four intra 16x16 modes, so I’m surprised that so much time is spent loading and saving macroblocks in the cache, and I’m trying to save some time there.
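As far as I can tell, mc_copy_w16 is a generic "copy a 16-pixel-wide block" routine that sits in the motion-compensation function table (hence the mc_ prefix) but is also usable for plain pixel copies. A likely reason it appears in an intra-only profile is that per-macroblock setup copies pixels between the full frame and small fixed-stride working buffers (x264 appears to use FENC_STRIDE and FDEC_STRIDE for these), so SIMD kernels can assume a known stride. The sketch below shows that idea in scalar C; the stride values and function names are hypothetical, and the real cache functions also gather neighbour metadata (prediction modes, non-zero coefficient counts, motion vectors), which is omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed strides for the per-macroblock working buffers. */
#define ENC_STRIDE 16
#define REC_STRIDE 32

/* Copy a 16-pixel-wide block of 'height' rows between buffers with
 * different strides: the scalar equivalent of a "copy width 16" kernel. */
static void copy_w16(uint8_t *dst, intptr_t dst_stride,
                     const uint8_t *src, intptr_t src_stride, int height)
{
    assert(height > 0);
    for (int y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, 16);
}

/* "Cache load": gather source macroblock (mb_x, mb_y) from the full frame
 * into a contiguous ENC_STRIDE buffer used by prediction and transform. */
static void mb_load_source(uint8_t enc[16 * ENC_STRIDE], const uint8_t *frame,
                           intptr_t frame_stride, int mb_x, int mb_y)
{
    const uint8_t *src = frame + (intptr_t)mb_y * 16 * frame_stride + mb_x * 16;
    copy_w16(enc, ENC_STRIDE, src, frame_stride, 16);
}

/* "Cache save": write the reconstructed macroblock back into the frame so it
 * can serve as the top/left neighbour of the macroblocks that follow. */
static void mb_save_recon(uint8_t *frame, intptr_t frame_stride, int mb_x, int mb_y,
                          const uint8_t rec[16 * REC_STRIDE])
{
    uint8_t *dst = frame + (intptr_t)mb_y * 16 * frame_stride + mb_x * 16;
    copy_w16(dst, frame_stride, rec, REC_STRIDE, 16);
}
```

These copies run for every macroblock regardless of how many modes are tried, which would explain why they stay visible even with the ultrafast preset.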

Thanks!

Tags: performance, x264

All times are GMT +1.