2nd November 2015, 15:01 | #1
Registered User
Join Date: May 2008
Posts: 12
speeding up x264
Hello,
I'm trying to run the x264 encoder in real time on an ARM processor (2 cores, 1.2 GHz, ARM Cortex-A9 with the v7-A instruction set and NEON extensions). I use this command line:

x264 --input-res 1920x1080 --preset ultrafast -I -1 -b 0 --bitrate 3500 --fps 25 --nr 0 --threads 2 -o /mnt/ramdisk/1.264 /mnt/ramdisk/00001.yuv

(It's a low-delay application, which is why I use -b 0 and -I -1. I know the quality is bad, but that's not the concern here.) With this I get about 12 fps.

Profiling shows the following functions as the most expensive:

8.67% x264_quant_4x4x4_neon
6.81% x264_macroblock_cache_load_progressive
4.78% x264_plane_copy_neon
4.29% x264_frame_init_lowres_core_neon
4.19% x264_macroblock_cache_save
3.93% x264_mb_encode_chroma
3.73% x264_mb_encode_i16x16
3.23% x264_mc_copy_w16_aligned_neon

So I'm trying to see how I could accelerate the code (any suggestion is welcome!). Since we are targeting low delay, for now we mainly use I frames only, or I and P frames (B frames add too much delay). For the I-frame-only bitstream (the fastest configuration I could get), I'm thinking about restricting the number of intra predictions tried or the number of possible partitions. I know this will cost quality or bitrate, but my aim is to simplify the encoder so that it runs in real time…

I've got two questions about the code:

• Is there any documentation for the code? Not all steps are self-explanatory… For example, I couldn't find out what the function mc_copy_w16 does, since it is also called when we compress with I frames only (that is, when there is no motion compensation…).
• I'm also trying to figure out the macroblock_cache_load and save functions, as together they represent around 11% of the profile. As I'm using the ultrafast preset, the only prediction modes tried are the 4 intra 16x16 modes. So I'm surprised that we spend that much time loading and saving MBs in the cache, and I'm trying to save some time there…

Thanks!
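For scale, the real-time target above can be turned into a rough per-macroblock cycle budget. The sketch below is only back-of-the-envelope arithmetic in Python, not part of x264. It assumes 16x16 macroblocks, perfect scaling across both cores, and that the profile percentages are fractions of total run time; the function names are made up for illustration.

```python
import math


def macroblocks_per_frame(width, height, mb_size=16):
    """Number of 16x16 macroblocks, rounding each dimension up (1080 -> 68 rows)."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def cycles_per_macroblock(cores, clock_hz, fps, width, height):
    """CPU cycles available per macroblock, assuming ideal use of every core."""
    return cores * clock_hz / (fps * macroblocks_per_frame(width, height))


def fps_after_removing(current_fps, fraction_removed):
    """Frame rate if a given fraction of the run time disappeared completely."""
    return current_fps / (1.0 - fraction_removed)


MBS = macroblocks_per_frame(1920, 1080)
BUDGET_25 = cycles_per_macroblock(2, 1.2e9, 25, 1920, 1080)  # real-time target
SPENT_12 = cycles_per_macroblock(2, 1.2e9, 12, 1920, 1080)   # measured 12 fps
# 6.81% (cache_load) + 4.19% (cache_save) = 11% of the profile
FPS_NO_CACHE = fps_after_removing(12, 0.0681 + 0.0419)

if __name__ == "__main__":
    print(f"macroblocks per frame:       {MBS}")
    print(f"cycle budget per MB @25 fps: {BUDGET_25:.0f}")
    print(f"cycles spent per MB @12 fps: {SPENT_12:.0f}")
    print(f"fps with cache load/save removed: {FPS_NO_CACHE:.1f}")
```

Under these assumptions a 1080p frame has 8160 macroblocks, so 25 fps leaves roughly 11,765 cycles per macroblock, against roughly 24,510 at the measured 12 fps. Even if the cache load and save functions cost nothing at all, the frame rate would only rise to about 13.5 fps, so savings elsewhere would still be needed to get a bit more than a 2x speedup overall.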
Tags: performance, x264