Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

 

Go Back   Doom9's Forum > Announcements and Chat > General Discussion

Reply
 
Thread Tools Search this Thread Display Modes
Old 26th February 2016, 11:06   #1  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Providence, RI
Posts: 2,269
(further) lossless compression with pi?

so π (the ratio of a circle's circumference to its diameter) is (very likely to be) normal (mathematically "normal", not "normal" normal) like, a common sense..
which equals to any combo of 0-9 will eventually appear somewhere within the infinite digits of π..

well, after typical lossless compression, the statistic redundancy of the compressed data will be like, none and makes further (lossless) compression almost impossible

but I guess π could help here, since any random base 10 data will appear somewhere within the digits of π like eventually, an extra compression could be done by the following steps:
1. convert the binary to many base 10 data blocks
2. search the data block in the digits of π, if any match found, mark the matching digit position a1 and the length of data block a2
3. if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching, if not, replace the whole data block with a1 and a2
4. repeat step 1-3 till all data blocks are matched

what do you think?
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 26th February 2016, 17:57   #2  |  Link
vivan
/人 ◕ ‿‿ ◕ 人\
 
Join Date: May 2011
Location: Russia
Posts: 649
πfs (except pt. 3 because it doesn't work this way)
vivan is offline   Reply With Quote
Old 26th February 2016, 18:54   #3  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Providence, RI
Posts: 2,269
Why doesn't step3 work..?
I mean, it's totally possible that the matching position of some data block is so far far away from the decimal mark and it takes even more space to store the pointer than to store the data block directly (something like uint8_t * takes 8 bytes and uint8_t only takes one)
I think step3 is required to make sure the process is actually compressing the data not blowing it even fatter
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 26th February 2016, 18:55   #4  |  Link
wonkey_monkey
Formerly davidh*****
 
wonkey_monkey's Avatar
 
Join Date: Jan 2004
Posts: 1,956
It won't work.

Quote:
1. convert the binary to many base 10 data blocks
No need; a normal number, such as pi is suspected to be, has an equal distribution of digits in all bases. You could do it in hexadecimal, base 256, or binary.

Quote:
if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching,
Here's where it will fall down: in order to differentiate between raw data blocks and compressed data blocks, you will need to add additional bytes (or at the very least, bits). These will outweigh the miniscule savings you'll make even if you do happen to find matching blocks early enough in pi (which in almost all cases you won't).
__________________
My AviSynth filters / I'm the Doctor

Last edited by wonkey_monkey; 26th February 2016 at 19:01.
wonkey_monkey is offline   Reply With Quote
Old 26th February 2016, 19:44   #5  |  Link
feisty2
I'm Siri
 
feisty2's Avatar
 
Join Date: Oct 2012
Location: Providence, RI
Posts: 2,269
Well, that's a shame..
How about using it as some kind of data encryption?
Like, only take the first 1000 digits of pi
Find the closest (not perfect) match in this 1000 digits and convert the data block to position, length and error..
Sounds funny to me
__________________
If I got new ideas, will post here: https://github.com/IFeelBloated
feisty2 is offline   Reply With Quote
Old 29th February 2016, 02:18   #6  |  Link
ggf31416
Registered User
 
Join Date: Jul 2007
Posts: 17
You can't losslessly compress random data (in average). A simplified argument goes like this:
Take all 2^n possible different files of size n. Compress them with your random compressor by 1-bit. Now you have 2^n files of size n-1, but there can be only 2^(n-1) different files of size n-1, then you can't decompress correctly half of your files.
ggf31416 is offline   Reply With Quote
Old 29th February 2016, 22:30   #7  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,122
Quote:
Originally Posted by feisty2 View Post
but I guess π could help here, since any random base 10 data will appear somewhere within the digits of π like eventually, an extra compression could be done by the following steps:
1. convert the binary to many base 10 data blocks
2. search the data block in the digits of π, if any match found, mark the matching digit position a1 and the length of data block a2
3. if the size of a1+a2 >= the data block itself, abort the rest steps and start the next data block matching, if not, replace the whole data block with a1 and a2
4. repeat step 1-3 till all data blocks are matched

what do you think?
What you describe here is essentially "dictionary compression" (cf. LZ77). And pretty much any general purpose lossless compressor uses this in one way or another

I think the most important difference between your idea and how dictionary compression is implemented in practice is that you are using a fixed dictionary (i.e. the digits of π), while practical implementations use a dynamic dictionary. The latter means that the dictionary that is being built/updated while the data is being compressed/decompressed. Most often, a "sliding window" buffer, containing the last N symbols that have been compressed/decompressed, is used here.

Another difference is that your dictionary has an infinite size (i.e. the digits of π), while piratical application use a bounded size dictionary (e.g. the last N symbols that have been processed). Overall, I think your idea performs worse than the algorithms use in practice, because (a) your dictionary does not adapt to the input data and (b) because of that you will need a VERY large "a1" to find a match - which means "a1+a2" will almost always be bigger than the data block itself.

Quote:
Originally Posted by ggf31416 View Post
You can't losslessly compress random data (in average). A simplified argument goes like this:
Take all 2^n possible different files of size n. Compress them with your random compressor by 1-bit. Now you have 2^n files of size n-1, but there can be only 2^(n-1) different files of size n-1, then you can't decompress correctly half of your files.
This is called "Schubfachprinzip" or "Pigeonhole principle". It means that if you have exactly N places where you can put something, but you have more than N items to be stored, then at least one place unavoidably ends up containing more than one item! This explains why a "perfect" compression, that makes every possible input (file) smaller, can not exist in reality. Or, in other words, if the compressor makes some inputs (files) smaller, there also have to be some files that become bigger.

If a "perfect" compressor (which makes every input smaller by at least one byte) did exist, you could apply it recursively again and again, until eventually every input gets compress to a single byte (or even zero byte?). Clearly makes no sense
__________________
There was of course no way of knowing whether you were being watched at any given moment.
How often, or on what system, the Thought Police plugged in on any individual wire was guesswork.



Last edited by LoRd_MuldeR; 29th February 2016 at 22:51.
LoRd_MuldeR is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 03:25.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, vBulletin Solutions Inc.