From the screenshot, it seems Sony is using bilinear downsampling for 444->422, which blends at most 2 neighboring pixels per output sample, as opposed to summed-area averaging.
I tried various interpolation options until I got a very similar picture.
For 422->444 it is still bilinear upsampling; no changes.
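
To illustrate the 444->422 difference, here is a rough sketch on a single chroma row. It is not the actual code from this change and not Sony's pipeline. The function names, the `offset` parameter, and the co-sited [1, 2, 1] / 4 footprint used for the area average are all assumptions made for the example.

```python
import math

def bilinear_sample(row, x):
    """Linearly interpolate `row` at fractional position x (clamped to the row)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(row) - 1)
    t = x - i0
    # Only two source pixels (i0 and i1) can ever contribute.
    return (1.0 - t) * row[i0] + t * row[i1]

def downsample_bilinear(row, offset=0.0):
    """Halve the row: output i reads the source at 2*i + offset.

    offset=0.0 assumes co-sited chroma, offset=0.5 assumes centered chroma.
    Both are hypothetical settings for this sketch.
    """
    return [bilinear_sample(row, 2 * i + offset) for i in range(len(row) // 2)]

def downsample_area(row):
    """Halve the row by averaging the footprint of each output sample.

    Assumes a footprint 2 source pixels wide centered on pixel 2*i, which
    covers half of each neighbor and gives weights [1, 2, 1] / 4.
    """
    last = len(row) - 1
    out = []
    for i in range(len(row) // 2):
        c = 2 * i
        left = row[max(c - 1, 0)]      # clamp at the left edge
        right = row[min(c + 1, last)]  # clamp at the right edge
        out.append((left + 2 * row[c] + right) / 4.0)
    return out

# A single bright chroma pixel at an odd position.
row = [0, 0, 0, 100, 0, 0, 0, 0]
print(downsample_bilinear(row, offset=0.0))
print(downsample_bilinear(row, offset=0.5))
print(downsample_area(row))
```

With co-sited bilinear sampling the isolated pixel is skipped entirely, while the area average spreads it over two output samples, so thin colored edges are a good place to compare against the screenshot.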