From my VapourSynth experience, zscale isn't the speed-limiting factor, but I'm only guessing.
ffmpeg isn't very good at multi-threading in general, and I think the tonemap filter was written as a single-threaded proof of concept. I guess you'll have to ask on the ffmpeg-user mailing list.