Why wouldn't step3 work?
I mean, it's entirely possible that the matching position of some data block is so far from the decimal point that storing the pointer takes more space than storing the data block directly (for example, a uint8_t * takes 8 bytes on a typical 64-bit platform, while a uint8_t takes only one).
I think step3 is required to make sure the process actually compresses the data instead of making it bigger.
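To make the size argument concrete, here is a rough sketch in C of the kind of check I have in mind. The names `offset_size_bytes` and `pointer_is_smaller` are made up for illustration, and I'm assuming the pointer is stored as a plain byte offset using the fewest bytes that fit, which may not match how the original scheme encodes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest number of bytes that can hold `offset`. */
static size_t offset_size_bytes(uint64_t offset)
{
    size_t n = 1;
    while (offset > 0xFF) {
        offset >>= 8;   /* drop the lowest byte */
        n++;            /* and count one more byte of storage */
    }
    return n;
}

/* Hypothetical check: returns 1 only if storing the offset is strictly
 * smaller than storing the block itself, 0 otherwise. */
static int pointer_is_smaller(uint64_t offset, size_t block_len)
{
    return offset_size_bytes(offset) < block_len;
}
```

With this, a one-byte block matched at offset 300 would be stored literally (the offset needs 2 bytes), while a four-byte block at the same offset would be stored as a pointer.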