Monday, August 13, 2018

Can you make a fair dice roll out of a rigged coin?



This post might be boring for non-math people out there, and too elementary for math people, but this question was really intriguing me while I waited in an airport during my Europe trip, so here are the thoughts that have been nagging at me for the past couple of days.

Let’s start with a simpler question: can you simulate a dice roll with a fair coin? Let’s treat coin tosses as a binary sequence of heads and tails. For example, a sequence of three coin tosses could look like HHT, which comes from heads, then heads, then tails, or any of the 7 other possibilities. Given this, we can show that for three tosses there are 8 possible sequences, each equally likely with probability 1/8. Listed below are all of the possibilities.

HHH
HHT
HTH
THH
TTH
THT
HTT
TTT

So we know that each of these sequences is equally likely, just like the faces of a die, but the problem is that this gives us 8 outcomes, each with probability 1/8, and a die only has 6. Well, what if we reflip, or reject, some samples? When we see either of the first two sequences (HHH or HHT), we choose to reflip. So with probability 1/4 we start reflipping, and with probability 3/4 we keep the sample. Then we can label each remaining sequence so that it mimics a dice roll as follows:

TTT -> 1
TTH -> 2
THT -> 3
THH -> 4
HTT -> 5
HTH -> 6

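Here’s a minimal Python sketch of this three-flip scheme. The function name `fair_die` and the use of `random.choice` to stand in for a fair coin are my own choices, not anything from the post:

```python
import random

# Six of the eight equally likely 3-flip sequences map to die faces;
# the remaining two (HHH and HHT) trigger a reflip.
FACES = {"TTT": 1, "TTH": 2, "THT": 3, "THH": 4, "HTT": 5, "HTH": 6}

def fair_die():
    """Roll a fair six-sided die using only fair coin flips."""
    while True:
        seq = "".join(random.choice("HT") for _ in range(3))
        if seq in FACES:          # HHH or HHT: flip all three again
            return FACES[seq]

rolls = [fair_die() for _ in range(60000)]   # each face should land near 1/6
```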
We also showed that each of these sequences happens with probability 1/8, so conditioned on not reflipping, each face has the same probability of occurring. Now two questions arise: how many flips will we do on average, and can we get the coin to mimic any rational probability? The answer to both of these questions is promising. Let’s start with the latter. Choosing 1/6 was a fairly arbitrary choice; can we do other numbers? Of course. Let’s do examples and then you will see my algorithm. We will solve for the two probabilities 3/10 and 7/100.

Ok, so first, to make my life easier, let’s say H = 1, T = 0.

We can write the decimal number 10 in binary as 1010. This means that if we want to simulate a probability with a denominator of 10, we can just look for coin-flip sequences from 0000 -> 1001, which gives us ten distinct numbers.
Namely,

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001

Choose the first 3 as successes and the other 7 as failures and you have probability 3/10. All other samples, from 1010 -> 1111, you reject. This is actually really cool because you can often reject early: for example, if your first two flips are heads, you should start over, since you have decided to reject 11XX, and similarly if you get 101X you can start over immediately. This reduces the total number of flips.

One more example of the power of this. Let’s say I want 7/100. Write 100 in binary: 64 + 32 + 4 = 1100100. Then we accept as valid samples 0000000 -> 1100011, and we take 0000000 -> 0000110 as our success samples.

So here is a clear algorithm description:

1. Write the denominator d of your probability in binary; you will accept samples from 0 up to d - 1.
2. Success is defined as the numbers from 0 up to n - 1, where n is your numerator.
3. Reflip as soon as you know the number will be d or greater.
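The three steps above can be sketched in Python with the early-rejection idea built in. This is only a sketch under my own naming (`bernoulli`), assuming a fair coin via `random.randint`:

```python
import random

def bernoulli(n, d):
    """Return True with probability n/d using fair coin flips (H = 1, T = 0).

    Build a k-bit number one flip at a time. As soon as every number the
    partial prefix could still become is below n, report success; if every
    such number is in [n, d), report failure; if every one is >= d, throw
    the flips away and start over (the early-rejection trick above).
    """
    k = max(1, (d - 1).bit_length())       # bits needed to write d - 1
    while True:
        value = 0
        for i in range(k):
            value = 2 * value + random.randint(0, 1)   # one fair flip
            remaining = k - 1 - i
            lo = value << remaining        # smallest possible final number
            hi = ((value + 1) << remaining) - 1   # largest possible final number
            if hi < n:
                return True                # guaranteed success
            if lo >= d:
                break                      # guaranteed out of range: reflip
            if lo >= n and hi < d:
                return False               # guaranteed failure
        # reached only via break: start a fresh round of flips

hits = sum(bernoulli(3, 10) for _ in range(50000))    # should be near 30%
```

For 3/10 this behaves exactly as described: a 11 prefix or a 101 prefix triggers an immediate reflip, and a 000 prefix is an immediate success.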

Cool. So now here is another cool thing about this: the number of flips required on average to simulate a given probability is logarithmic in the denominator. First, each attempt takes k = ceil(log_2(d)) flips, since that is how many bits it takes to write a number up to d - 1. Second, the probability of rejecting an attempt is at most 1/2. Why? We chose k as the smallest number of bits that fits d, so d > 2^(k-1), which means more than half of the 2^k equally likely sequences (all of those below 2^(k-1), at the very least) are accepted. So we accept with probability d/2^k > 1/2 and reject with probability less than 1/2. That makes the expected number of attempts at most 2, and so the expected total is at most about 2 log_2(d) flips: you only need a logarithmic number of flips.
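To see the bound in action, here is a rough empirical check (function name and trial counts are my own choices). It measures the average flips per accepted sample for the basic reject-and-reflip scheme, without early rejection:

```python
import random

def average_flips(d, trials=20000):
    """Average number of fair-coin flips needed to draw one accepted
    sample uniform on 0..d-1, reflipping whenever the k-bit number
    comes out as d or greater (no early rejection)."""
    k = max(1, (d - 1).bit_length())       # bits per attempt
    total = 0
    for _ in range(trials):
        while True:
            value = random.getrandbits(k)  # k fair flips at once
            total += k
            if value < d:                  # accepted
                break
    return total / trials

# d = 10 uses k = 4 bits; expected flips = 4 * 16/10 = 6.4 <= 2*log2(10) ~ 6.64
avg = average_flips(10)
```

Note that when d is an exact power of two (say d = 8), nothing is ever rejected and the average is exactly k.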

OK SO NOW IS WHERE IT GETS REALLY COOL! WE’RE GOING TO INTEGRATE A VON NEUMANN COIN FLIP INTO THIS. IF YOU KNOW WHAT A VON NEUMANN COIN FLIP IS, THEN YOU ARE COOL. IF YOU DON’T KNOW WHAT ONE IS, THEN YOU ARE ABOUT TO BECOME COOL AFTER READING BELOW. GET HYPED

Here is the idea: you go to a Vegas casino and the clerk hands you a rigged coin, and you’re like, hold up, I wanna place a 50/50 bet on something where the odds are actually 50/50. Unfortunately, the casino tells you that you can only bet on tails, at 50/50, but you suspect the probability of tails is less than 50%. WHAT SHOULD YOU DO?

Well, lucky for you, you will soon be able to propose a von Neumann flip, which guarantees fairness. Here is the idea: if you flip the coin twice, the probability of HT and the probability of TH are the same, even though p(HH) >> p(TT), since p(H) > p(T) as we assumed above. Why is that true? Well, if the two flips are independent, then p(HT) = p(H)p(T) = p(T)p(H) = p(TH). Let’s do an example to show you.

Say the coin is rigged such that the probability of heads is p(H) = 70%,
and the probability of tails is p(T) = 30%.
(An important trick is that the probability must always add up to 100% or 1 if your probabilities describe all of the possible events that could occur)

Well the probability of seeing two heads after two flips is p(HH)= 70% * 70% or .7 * .7 = 49%
The probability of seeing two tails is p(TT) =  30% * 30% = 9%
The probability of seeing heads then tails, p(HT) = 70% * 30% = 21%
The probability of seeing tails then heads, p(TH) = 30% * 70% = 21%
  
As you can see, these probabilities add up to 100% and describe the entire event space. And as we’ve shown above, the order in which you see HT or TH does not matter: both are equally likely, so if we can exploit this, we have a fair coin toss.
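The arithmetic above can be verified in a few lines of Python:

```python
p_h, p_t = 0.70, 0.30   # the rigged coin from the example

p_hh = p_h * p_h        # 0.49
p_tt = p_t * p_t        # 0.09
p_ht = p_h * p_t        # 0.21
p_th = p_t * p_h        # 0.21

# the two mixed orders are exactly equally likely...
assert p_ht == p_th
# ...and the four outcomes cover the whole event space
assert abs((p_hh + p_tt + p_ht + p_th) - 1.0) < 1e-9
```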

Here is the algorithm for the fair von Neumann coin toss,

1. Flip the coin twice; if you get HT or TH, accept the sample. Label HT as heads and TH as tails.
2. If you get HH or TT, start over fresh, and repeat until you get one of HT or TH.
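The two steps above can be sketched in Python; `biased_flip` below is a stand-in for the rigged coin, with the 70% heads bias from the example:

```python
import random

def biased_flip(p_heads=0.7):
    """A rigged coin: heads with probability p_heads."""
    return "H" if random.random() < p_heads else "T"

def von_neumann_flip():
    """A fair flip built from the rigged coin: keep HT/TH, discard HH/TT."""
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return "H" if (a, b) == ("H", "T") else "T"

results = [von_neumann_flip() for _ in range(40000)]   # should come out near 50/50
```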

Here is a sample run,

Flip two coins I get HH, reject the sample start over.

Flip two coins I get HH, reject the sample start over.

Flip two coins I get TT, reject the sample start over.

Flip two coins I get HH, reject the sample start over.

Flip two coins I get TH, Accept the sample the result is tails.

The full sequence looked like this: HHHHTTHHTH. How come an HT appears in the sequence before TH, but we went with TH? If the coin is rigged towards heads, HT is more likely to show up first as a substring: the first flip is most likely heads, and the flips stay heads until you hit a T. In fact, HT occurs first with probability equal to the probability of heads. However, when you segment the sequence into disjoint pairs of flips, TH and HT are equally likely within each pair. That’s why you should read it like the sample run above and not scan the whole sequence.

Let’s put everything in this post together: we can use a von Neumann flip to turn the rigged coin into a fair coin toss, and with a fair coin toss, we can use the algorithm described above to simulate any rational probability.
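Putting it together in one Python sketch (again with an assumed 70% heads bias, and without the early-rejection optimization, to keep it short): von Neumann flips turn the rigged coin into fair bits, and rejection sampling turns fair bits into probability n/d:

```python
import random

def biased_flip():
    """The rigged coin; assumed here to land heads 70% of the time."""
    return 1 if random.random() < 0.7 else 0

def fair_bit():
    """Von Neumann: two rigged flips, keep only the unequal pairs."""
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return a    # p(1,0) == p(0,1), so this bit is fair

def bernoulli(n, d):
    """True with probability n/d, built entirely from fair_bit()."""
    k = max(1, (d - 1).bit_length())
    while True:
        value = sum(fair_bit() << i for i in range(k))  # a uniform k-bit number
        if value < d:                                   # accepted sample
            return value < n                            # success iff below n

hits = sum(bernoulli(3, 10) for _ in range(50000))      # should be near 30%
```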

References:
Cover, T. M. and Thomas, J. A., Elements of Information Theory

