Status:
question/project completed
Full story: Part I here and Part II here
Here I will lay out the technical details of how to prove, by reduction, that pooled testing cannot do better than the Shannon entropy of the population.
We will take the following property for granted, and reduce the problem of pooling faster than Shannon to the central problem behind it: is communicating faster than Shannon possible?
Consider a random experiment with several outcomes (say, a die with 6 outcomes), whose distribution is not necessarily uniform. We wish to communicate the outcome over a bit-channel. You may share conventions and algorithms beforehand, since what we are interested in is the marginal cost: we wish to repeat the experiment a huge number of times, so any setup overhead is irrelevant. And since the random experiment has not yet been performed, you cannot smuggle any data in the preparation stage. We require correctness: the compression must be lossless, and the decompression faithful.
In this case, the marginal cost (i.e. per single run of the experiment, in bits) is bounded below by the entropy of the probability distribution: whatever the algorithm, whatever the distribution.
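To make the bound concrete, here is a minimal sketch (not from the original post) computing the Shannon entropy of a discrete distribution; no lossless scheme can average fewer bits per outcome than this number:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# A biased weather variable: sunny with probability 0.1, cloudy with 0.9.
# Sending one outcome alone costs a whole bit, but amortized over many
# repetitions the cost per outcome can approach (and never beat) the entropy.
p = [0.1, 0.9]
print(f"entropy = {entropy(p):.4f} bits per outcome")  # about 0.469 bits
```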
Let us give the specs of a deterministic pooled-testing algorithm:
We write a program to compress an N-tuple of Boolean variables, where each sub-variable is described by a probability vector P and the sub-variables are independently distributed.
Decompression works trivially: the only difference is that we read the received interaction log and feed the yes/no answers to the requested pools accordingly. Since the APT algorithm is deterministic by assumption, the interaction sequence is identical to that of compression. Finally, we take its declaration as the decompressed original message.
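The compress/replay loop above can be sketched as follows. This is a hedged illustration, not the post's actual implementation: the `NaiveTester` class and its `next_pool`/`feed` interface are hypothetical stand-ins for any deterministic adaptive pooled-testing (APT) strategy.

```python
class NaiveTester:
    """Trivial deterministic strategy (tests each individual alone); a stand-in
    for any deterministic adaptive pooled-testing algorithm."""
    def __init__(self, n):
        self.n, self.i, self.answers = n, 0, []
    def next_pool(self):
        # Return the next subset of indices to pool, or None when done.
        return [self.i] if self.i < self.n else None
    def feed(self, answer):
        # Receive the yes/no result of the last requested pool.
        self.answers.append(answer)
        self.i += 1
    def declaration(self):
        # With individual testing, the answers are exactly the sample.
        return self.answers

def compress(sample, make_tester):
    """Run the tester against the true sample; the interaction log of
    yes/no answers IS the compressed message."""
    tester, log = make_tester(), []
    while (pool := tester.next_pool()) is not None:
        answer = any(sample[i] for i in pool)  # a pool is positive iff any member is
        log.append(answer)
        tester.feed(answer)
    return log

def decompress(log, make_tester):
    """Replay the same deterministic tester, feeding back the recorded
    answers; determinism guarantees the identical pool sequence."""
    tester, it = make_tester(), iter(log)
    while tester.next_pool() is not None:
        tester.feed(next(it))
    return tester.declaration()
```

A round trip with `sample = [True, False, False, True]` recovers the sample exactly, and the message length equals the number of pools the strategy asked for.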
From the point of view of the APT algorithm, the interaction it sees is distributed identically to a scenario where COVID-19 pooled testing is being done according to a probability vector P. It cannot guess that you were actually using it to compress a variable over the internet…
We have thus realized a compression algorithm on Boolean tuples described by any probability vector P. By the theorem stated above, the size of the message cannot be smaller than the entropy. Since each bit of the message corresponds to one COVID-19 testing pool, we have just proved that the number of pools cannot be smaller than the Shannon entropy. QED.
Say that we wish to continuously compress battlefield statistics for Central Command. The first variable indicates the weather: sunny/cloudy (in England that's a 10-90 chance…). The next 100 bits describe whether each of the 100 soldiers in your battalion is operational. From past statistics you know that they are healthy, with odds of 99-1. You can compress using the APT algorithm, taking as parameter the probability list 0.1, 0.99, 0.99, …
tags: algorithm