Yes internally doing a BitBlt(..., dest_pitch*2, ....) would be twice as fast as the weave I suggested, which does the above blit twice.
The resizer core does struggle to do a point-resize efficently, it stupidly goes through the full motion, multiplying every pixel by 1 in a loop of 1 cycle.
|