
 Doom9's Forum (further) lossless compression with pi?

26th February 2016, 11:06   #1  |  Link
feisty2
I'm Siri

Join Date: Oct 2012
Location: Providence, RI
Posts: 2,269

(further) lossless compression with pi?

so π (the ratio of a circle's circumference to its diameter) is (very likely to be) normal (mathematically "normal", not "normal" normal), which means any combination of the digits 0-9 will eventually appear somewhere within the infinite digits of π. Now, after typical lossless compression, the statistical redundancy of the compressed data is close to none, which makes further (lossless) compression almost impossible. But I guess π could help here, since any base-10 data will eventually appear somewhere within its digits. Extra compression could be done by the following steps:
1. convert the binary to many base-10 data blocks
2. search for each data block in the digits of π; if a match is found, record the matching digit position a1 and the length of the data block a2
3. if the size of a1+a2 >= the size of the data block itself, skip the remaining steps and start matching the next data block; if not, replace the whole data block with a1 and a2
4. repeat steps 1-3 till all data blocks are matched
what do you think?
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
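The steps above can be sketched in Python as a toy illustration. A short hard-coded prefix of π stands in for the infinite digit stream, and all names (`compress_block`, `a1`, `a2` as decimal strings) are illustrative, not from the post:

```python
# First 100 decimal digits of pi (after the "3."), standing in for
# the infinite expansion the scheme assumes.
PI_DIGITS = ("14159265358979323846264338327950288419716939937510"
             "58209749445923078164062862089986280348253421170679")

def compress_block(block: str):
    """Steps 2-3: search for the block in pi's digits; keep the
    (position, length) pair only if writing it as decimal digits
    costs fewer digits than the block itself."""
    pos = PI_DIGITS.find(block)            # step 2: scan the digit stream
    if pos == -1:
        return None                        # no match in our finite prefix
    a1, a2 = str(pos), str(len(block))
    if len(a1) + len(a2) >= len(block):    # step 3: pointer must be smaller
        return None
    return (pos, len(block))

def decompress_block(pointer):
    """Inverse transform: read the digits back out of pi."""
    pos, length = pointer
    return PI_DIGITS[pos:pos + length]
```

For example, the block "26433" occurs at digit position 20, so it compresses to the pair (20, 5); a two-digit block never compresses, because the pointer already costs at least two digits.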
26th February 2016, 17:57   #2  |  Link
vivan
/人 ◕ ‿‿ ◕ 人\

Join Date: May 2011
Location: Russia
Posts: 649

πfs (except pt. 3, because it doesn't work this way)
26th February 2016, 18:54   #3  |  Link
feisty2
I'm Siri

Join Date: Oct 2012
Location: Providence, RI
Posts: 2,269

Why doesn't step 3 work..? I mean, it's entirely possible that the matching position of some data block lies so far from the decimal point that it takes more space to store the pointer than to store the data block directly (much like a uint8_t * takes 8 bytes while a uint8_t takes only one). I think step 3 is required to make sure the process is actually compressing the data, not blowing it up even fatter.
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
26th February 2016, 18:55   #4  |  Link
wonkey_monkey
Formerly davidh*****

Join Date: Jan 2004
Posts: 1,956
It won't work.

Quote:
 1. convert the binary to many base 10 data blocks
No need; a normal number, such as pi is suspected to be, has an equal distribution of digits in all bases. You could do it in hexadecimal, base 256, or binary.

Quote:
 if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching,
Here's where it will fall down: in order to differentiate between raw data blocks and compressed data blocks, you will need to add additional bytes (or at the very least, bits). These will outweigh the minuscule savings you'll make even if you do happen to find matching blocks early enough in pi (which in almost all cases you won't).
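This objection can be made quantitative: in a normal number, a given block of n digits is expected to first appear around position 10^n, so writing down the position alone already costs about n digits, before the length field or the raw/compressed flag is even counted. A rough sketch of the arithmetic (illustrative, not from the post):

```python
import math

def expected_pointer_digits(block_len: int) -> int:
    """Digits needed to write the expected first-match position of a
    block_len-digit block in a normal number, which is ~10**block_len."""
    expected_pos = 10 ** block_len
    return math.ceil(math.log10(expected_pos))

# The pointer alone is already as long as the block it replaces,
# for every block size -- so adding a length field and a flag bit
# guarantees expansion on average.
for n in range(1, 12):
    assert expected_pointer_digits(n) >= n
```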
__________________
My AviSynth filters / I'm the Doctor

Last edited by wonkey_monkey; 26th February 2016 at 19:01.

26th February 2016, 19:44   #5  |  Link
feisty2
I'm Siri

Join Date: Oct 2012
Location: Providence, RI
Posts: 2,269

Well, that's a shame.. How about using it as some kind of data encryption instead? Like, take only the first 1000 digits of pi, find the closest (not necessarily perfect) match within those 1000 digits, and convert the data block to a position, a length and an error.. Sounds fun to me
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
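For what it's worth, the approximate-match transform described here can be sketched as follows. This is a toy illustration (a 100-digit prefix instead of 1000, for brevity), and note it is keyless and reversible by anyone, so it would be an encoding rather than real encryption:

```python
# 100-digit prefix of pi used as the shared lookup table.
PI_PREFIX = ("14159265358979323846264338327950288419716939937510"
             "58209749445923078164062862089986280348253421170679")

def closest_match(block: str):
    """Find the window in the prefix whose digits are numerically
    closest to the block; return (position, error) so the block can
    be rebuilt as int(window) + error. Round-trips exactly only
    because the error is stored alongside the pointer."""
    target = int(block)
    best = min(range(len(PI_PREFIX) - len(block) + 1),
               key=lambda i: abs(int(PI_PREFIX[i:i + len(block)]) - target))
    window = int(PI_PREFIX[best:best + len(block)])
    return best, target - window

def rebuild(pos: int, length: int, error: int) -> str:
    """Inverse transform: digits at the stored position, plus the error."""
    return str(int(PI_PREFIX[pos:pos + length]) + error).zfill(length)
```

Any block round-trips by construction, since the error term absorbs whatever the nearest window misses.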
29th February 2016, 02:18   #6  |  Link
ggf31416
Registered User

Join Date: Jul 2007
Posts: 17

You can't losslessly compress random data (on average). A simplified argument goes like this: take all 2^n possible different files of n bits. Compress them with your hypothetical compressor so that each shrinks by 1 bit. Now you have 2^n files of size n-1, but there can only be 2^(n-1) different files of size n-1, so at least half of your files cannot be decompressed correctly.
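This counting argument can be checked directly for a small n (an illustrative sketch):

```python
from itertools import product

n = 4
# All 2^n possible files of n bits, and all 2^(n-1) files of n-1 bits.
inputs  = [''.join(bits) for bits in product('01', repeat=n)]
outputs = [''.join(bits) for bits in product('01', repeat=n - 1)]

# There are twice as many inputs as possible shorter outputs, so any
# compressor that shrinks every input by one bit must send at least two
# distinct inputs to the same output -- and the decompressor then has no
# way to tell them apart.
assert len(inputs) == 2 ** n
assert len(outputs) == 2 ** (n - 1)
assert len(inputs) == 2 * len(outputs)
```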
29th February 2016, 22:30   #7  |  Link
LoRd_MuldeR
Software Developer

Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,122
Quote:
 Originally Posted by feisty2 but I guess π could help here, since any random base 10 data will appear somewhere within the digits of π like eventually, an extra compression could be done by the following steps: 1. convert the binary to many base 10 data blocks 2. search the data block in the digits of π, if any match found, mark the matching digit position a1 and the length of data block a2 3. if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching, if not, replace the whole data block with a1 and a2 4. repeat step 1-3 till all data blocks are matched what do you think?
What you describe here is essentially "dictionary compression" (cf. LZ77), and pretty much any general-purpose lossless compressor uses this in one way or another.

I think the most important difference between your idea and how dictionary compression is implemented in practice is that you are using a fixed dictionary (i.e. the digits of π), while practical implementations use a dynamic dictionary. The latter means that the dictionary is built/updated while the data is being compressed/decompressed. Most often, a "sliding window" buffer containing the last N symbols that have been compressed/decompressed is used here.
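The sliding-window scheme described above can be sketched as a minimal LZ77-style coder (illustrative only; real implementations add entropy coding and efficient match search on top of this):

```python
def lz77_compress(data: bytes, window: int = 64, min_len: int = 3):
    """Greedy LZ77-style tokenizer: emit (offset, length) back-references
    into the sliding window of already-processed bytes, or literal bytes
    when no match of at least min_len exists."""
    tokens, i = [], 0
    while i < len(data):
        start = max(0, i - window)
        best_off, best_len = 0, 0
        for j in range(start, i):              # scan the window
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1                         # extend the match
            if k > best_len:
                best_off, best_len = i - j, k
        if best_len >= min_len:
            tokens.append(('ref', best_off, best_len))
            i += best_len
        else:
            tokens.append(('lit', data[i]))
            i += 1
    return tokens

def lz77_decompress(tokens) -> bytes:
    """Replay the tokens; copying byte-by-byte handles overlapping matches."""
    out = bytearray()
    for t in tokens:
        if t[0] == 'lit':
            out.append(t[1])
        else:
            _, off, length = t
            for _ in range(length):
                out.append(out[-off])
    return bytes(out)
```

The key contrast with the π scheme: here the dictionary is the data's own recent past, so matches are short-range and the offsets stay small.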

Another difference is that your dictionary has an infinite size (i.e. the digits of π), while practical applications use a bounded-size dictionary (e.g. the last N symbols that have been processed). Overall, I think your idea performs worse than the algorithms used in practice, because (a) your dictionary does not adapt to the input data, and (b) because of that, you will need a VERY large "a1" to find a match - which means "a1+a2" will almost always be bigger than the data block itself.

Quote:
 Originally Posted by ggf31416 You can't losslessly compress random data (in average). A simplified argument goes like this: Take all 2^n possible different files of size n. Compress them with your random compressor by 1-bit. Now you have 2^n files of size n-1, but there can be only 2^(n-1) different files of size n-1, then you can't decompress correctly half of your files.
This is called the "Schubfachprinzip" or "pigeonhole principle". It means that if you have exactly N places where you can put items, but more than N items to be stored, then at least one place unavoidably ends up containing more than one item! This explains why a "perfect" compressor, one that makes every possible input (file) smaller, cannot exist. Or, in other words, if the compressor makes some inputs (files) smaller, there also have to be some files that become bigger.

If a "perfect" compressor (which makes every input smaller by at least one byte) did exist, you could apply it recursively again and again, until eventually every input got compressed to a single byte (or even zero bytes?). That clearly makes no sense.
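The recursion argument can be made concrete with a quick sanity check: a compressor that always saves at least one byte, applied enough times, would collapse every file to a single byte, yet one byte can only distinguish 256 originals:

```python
n_bytes = 1024  # any starting file size, in bytes

# A compressor that always removes >= 1 byte, applied n_bytes - 1 times,
# leaves every input at exactly one byte...
final_size = n_bytes - (n_bytes - 1)
assert final_size == 1

# ...but a single byte can distinguish only 256 files, vastly fewer than
# the 2**(8 * n_bytes) possible inputs -- so such a compressor cannot exist.
assert 256 < 2 ** (8 * n_bytes)
```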
__________________
There was of course no way of knowing whether you were being watched at any given moment.
How often, or on what system, the Thought Police plugged in on any individual wire was guesswork.

Last edited by LoRd_MuldeR; 29th February 2016 at 22:51.