(further) lossless compression with pi?

feisty2 · 26th February 2016, 11:06

so π (the ratio of a circle's circumference to its diameter) is (very likely to be) normal (mathematically "normal", not "normal" normal) like, a common sense..
which equals to any combo of 0-9 will eventually appear somewhere within the infinite digits of π..

well, after typical lossless compression, the statistic redundancy of the compressed data will be like, none and makes further (lossless) compression almost impossible

but I guess π could help here, since any random base 10 data will appear somewhere within the digits of π like eventually, an extra compression could be done by the following steps:
1. convert the binary to many base 10 data blocks
2. search the data block in the digits of π, if any match found, mark the matching digit position a1 and the length of data block a2
3. if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching, if not, replace the whole data block with a1 and a2
4. repeat step 1-3 till all data blocks are matched

what do you think?

vivan · 26th February 2016, 17:57

πfs (except pt. 3 because it doesn't work this way)

feisty2 · 26th February 2016, 18:54

Why doesn't step3 work..?
I mean, it's totally possible that the matching position of some data block is so far far away from the decimal mark and it takes even more space to store the pointer than to store the data block directly (something like uint8_t * takes 8 bytes and uint8_t only takes one)
I think step3 is required to make sure the process is actually compressing the data not blowing it even fatter

wonkey_monkey · 26th February 2016, 18:55

It won't work.

Quote:

1. convert the binary to many base 10 data blocks

No need; a normal number, such as pi is suspected to be, has an equal distribution of digits in all bases. You could do it in hexadecimal, base 256, or binary.

Quote:

if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching,

Here's where it will fall down: in order to differentiate between raw data blocks and compressed data blocks, you will need to add additional bytes (or at the very least, bits). These will outweigh the miniscule savings you'll make even if you do happen to find matching blocks early enough in pi (which in almost all cases you won't).

feisty2 · 26th February 2016, 19:44

Well, that's a shame..
How about using it as some kind of data encryption?
Like, only take the first 1000 digits of pi
Find the closest (not perfect) match in this 1000 digits and convert the data block to position, length and error..
Sounds funny to me

ggf31416 · 29th February 2016, 02:18

You can't losslessly compress random data (in average). A simplified argument goes like this:
Take all 2^n possible different files of size n. Compress them with your random compressor by 1-bit. Now you have 2^n files of size n-1, but there can be only 2^(n-1) different files of size n-1, then you can't decompress correctly half of your files.

LoRd_MuldeR · 29th February 2016, 22:30

Quote:

Originally Posted by feisty2

but I guess π could help here, since any random base 10 data will appear somewhere within the digits of π like eventually, an extra compression could be done by the following steps:
1. convert the binary to many base 10 data blocks
2. search the data block in the digits of π, if any match found, mark the matching digit position a1 and the length of data block a2
3. if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching, if not, replace the whole data block with a1 and a2
4. repeat step 1-3 till all data blocks are matched

what do you think?

What you describe here is essentially "dictionary compression" (cf. LZ77). And pretty much any general purpose lossless compressor uses this in one way or another

I think the most important difference between your idea and how dictionary compression is implemented in practice is that you are using a fixed dictionary (i.e. the digits of π), while practical implementations use a dynamic dictionary. The latter means that the dictionary that is being built/updated while the data is being compressed/decompressed. Most often, a "sliding window" buffer, containing the last N symbols that have been compressed/decompressed, is used here.

Another difference is that your dictionary has an infinite size (i.e. the digits of π), while piratical application use a bounded size dictionary (e.g. the last N symbols that have been processed). Overall, I think your idea performs worse than the algorithms use in practice, because (a) your dictionary does not adapt to the input data and (b) because of that you will need a VERY large "a1" to find a match - which means "a1+a2" will almost always be bigger than the data block itself.

Quote:

Originally Posted by ggf31416

You can't losslessly compress random data (in average). A simplified argument goes like this:
Take all 2^n possible different files of size n. Compress them with your random compressor by 1-bit. Now you have 2^n files of size n-1, but there can be only 2^(n-1) different files of size n-1, then you can't decompress correctly half of your files.

This is called "Schubfachprinzip" or "Pigeonhole principle". It means that if you have exactly N places where you can put something, but you have more than N items to be stored, then at least one place unavoidably ends up containing more than one item! This explains why a "perfect" compression, that makes every possible input (file) smaller, can not exist in reality. Or, in other words, if the compressor makes some inputs (files) smaller, there also have to be some files that become bigger.

If a "perfect" compressor (which makes every input smaller by at least one byte) did exist, you could apply it recursively again and again, until eventually every input gets compress to a single byte (or even zero byte?). Clearly makes no sense

26th February 2016, 11:06	#1 \| Link
feisty2 I'm Siri Join Date: Oct 2012 Location: void Posts: 2,633	(further) lossless compression with pi? so π (the ratio of a circle's circumference to its diameter) is (very likely to be) normal (mathematically "normal", not "normal" normal) like, a common sense.. which equals to any combo of 0-9 will eventually appear somewhere within the infinite digits of π.. well, after typical lossless compression, the statistic redundancy of the compressed data will be like, none and makes further (lossless) compression almost impossible but I guess π could help here, since any random base 10 data will appear somewhere within the digits of π like eventually, an extra compression could be done by the following steps: 1. convert the binary to many base 10 data blocks 2. search the data block in the digits of π, if any match found, mark the matching digit position a1 and the length of data block a2 3. if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching, if not, replace the whole data block with a1 and a2 4. repeat step 1-3 till all data blocks are matched what do you think?

Thread Tools	Search this Thread
Show Printable Version Email this Page	Search this Thread: Advanced Search
Display Modes
Switch to Linear Mode Hybrid Mode Switch to Threaded Mode

26th February 2016, 17:57	#2 \| Link
vivan /人 ◕ ‿‿ ◕ 人\ Join Date: May 2011 Location: Russia Posts: 643	πfs (except pt. 3 because it doesn't work this way)

26th February 2016, 18:54	#3 \| Link
feisty2 I'm Siri Join Date: Oct 2012 Location: void Posts: 2,633	Why doesn't step3 work..? I mean, it's totally possible that the matching position of some data block is so far far away from the decimal mark and it takes even more space to store the pointer than to store the data block directly (something like uint8_t * takes 8 bytes and uint8_t only takes one) I think step3 is required to make sure the process is actually compressing the data not blowing it even fatter

26th February 2016, 19:44	#5 \| Link
feisty2 I'm Siri Join Date: Oct 2012 Location: void Posts: 2,633	Well, that's a shame.. How about using it as some kind of data encryption? Like, only take the first 1000 digits of pi Find the closest (not perfect) match in this 1000 digits and convert the data block to position, length and error.. Sounds funny to me

29th February 2016, 02:18	#6 \| Link
ggf31416 Registered User Join Date: Jul 2007 Posts: 16	You can't losslessly compress random data (in average). A simplified argument goes like this: Take all 2^n possible different files of size n. Compress them with your random compressor by 1-bit. Now you have 2^n files of size n-1, but there can be only 2^(n-1) different files of size n-1, then you can't decompress correctly half of your files.