Wednesday, November 21, 2018

One in a hundred year events

On the news you often hear things like "it's the coldest it has been on Thanksgiving Day in 100 years." How should you make sense of this?

Don't. First, there is a huge implicit bias in the news: they like to report on rare events. You would not watch the news if the story read something like "Chris went to work today, and it was sunny." Instead the news reports interesting stories, which happen to be rare events. More important, though, is that things like temperature depend on the current state of the world, and that state constantly changes. The probability distribution (more Bayesian statistics to come on this blog) for the weather on Thanksgiving may have shifted over those 100 years based on different factors.

Anyways, there are three major takeaways from this post.

1. News has a bias towards rare or interesting events, so what you see on the news is not an accurate representation of the real world. I used weather here because it is relatively free of political charge, and this blog will do its best to steer clear of politics.
2. Things happen as a function of the environment they are in.
3. Historical data is a dangerous thing to rely on, because the future can be very different from the past.


Chris Haack

Ronaldo or Messi! The Argument for Ronaldo

So who is the best, Ronaldo or Messi?

First I think we need to clarify what "best" means, and I will start by answering four questions that do not determine who is the best.

1. Who is more entertaining? Leo Messi
2. Which player has more natural ability? Leo Messi
3. Which player is more "complete"? Leo Messi
4. Which player is more involved in the whole game? Leo Messi

The best soccer player is not determined by any of the above; it is the player who most increases your probability of winning, given the remaining players on your team.

(A brief aside) This notion of the best soccer player means that:
  1. It is possible to be objectively better on team A than on team B, and this is almost always the case.
  2. There is no absolute best player for every team, unless that player is so much better than everyone else that they would provide the greatest utility on every possible team.
  3. When acquiring new players for your team there is a trade-off between fit and objective quality, and that means different positions should have different market values!
  4. The quality of your other players in the same position determines the importance of upgrading that position, i.e. replacing a winger who runs the 100-meter dash in 16 seconds is more important than finding an upgrade for your already amazing striker.
So you might be wondering: if the best player can only be defined by assessing the needs of a particular team, how can you state anything about whether Messi or Ronaldo is best? First I will give examples of teams on which Messi is better than Ronaldo, and teams on which Ronaldo is better than Messi. Messi is better than Ronaldo on Man City with Pep, Barcelona, Arsenal with Arsene Wenger, and possibly a few more teams that are dominant. Ronaldo is better than Messi on any Mourinho team, probably Klopp's Liverpool, and any team in the bottom half of the Premier League.

Having Ronaldo on a team in the bottom half of a league is a much wiser pick than Messi, as Messi does well when his team does well and has control of the ball up the field. Ronaldo just needs his team to be able to cross the ball into the box or play it behind the defence. Messi needs to start from a decent position to create an attack. On average, in my opinion, Ronaldo is more useful (has a larger effect on win probability) to most soccer teams, and Messi is the better player when your team is already better than the opposition. Take Ronaldo in the Champions League in recent years: Real Madrid was far worse than its opposition in its later-stage games, yet Ronaldo came up clutch with huge goals.

Unfortunately, this post has to come with a disclaimer that says I have no idea whether Ronaldo is actually better than Messi, as I can't simulate outcomes of various teams with replacing Ronaldo and Messi in them. I hope this post gives people a different lens to which to view the debate between Ronaldo and Messi. 

If you find this interesting send me a message!

Chris Haack

P.S. I am thankful for everything Ronaldo and Messi have done for the game and for society at large! 

Monday, August 13, 2018

Can you make a fair dice roll out of a rigged coin?

This post might be boring for non-math people out there, and too elementary for math people, but this really intrigued me at the airport during my Europe trip, so I thought about it. Here are those thoughts that have been intriguing me the past couple of days.

Let's start with a simpler question: can you simulate a dice roll with a fair coin? Consider coin tosses as a binary sequence of heads or tails. For example, a sequence of three coin tosses could look like HHT, which comes from heads, then heads, then tails. For three tosses there are 8 possibilities, each equally likely and occurring with probability 1/8. Listed below are all of the possibilities:

HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
So we know that each of these sequences is equally likely, just like the faces of a die, but the problem is that currently this gives us each outcome with probability 1/8, not 1/6. Well, what if we reflip, that is, reject some samples? When we see either of the first two sequences (HHH or HHT) we choose to reflip. So with probability 1/4 we start reflipping, and with probability 3/4 we don't. Then we can label each remaining sequence so that it mimics a dice roll as follows:

TTT -> 1
TTH -> 2
THT -> 3
THH -> 4
HTT -> 5
HTH -> 6
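The whole scheme (flip three times, reflip on HHH or HHT, then apply the mapping above) is easy to sketch in code. Here is a minimal Python version; the function name is my own, and heads is read as 1 and tails as 0, so TTT is the binary number 000:

```python
import random

def fair_die(coin=lambda: random.randint(0, 1)):
    """Simulate a fair six-sided die from a fair coin (heads = 1, tails = 0).

    Three flips give a 3-bit number from 0 to 7. The sequences HHH (111)
    and HHT (110) are rejected and we reflip, so each of the remaining six
    sequences occurs with probability 1/8 and maps to a die face.
    """
    while True:
        x = 4 * coin() + 2 * coin() + coin()  # read three flips as one number
        if x < 6:
            return x + 1  # TTT -> 1, TTH -> 2, ..., HTH -> 6

rolls = [fair_die() for _ in range(6000)]
# each face should show up roughly 1000 times
```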

We have also shown that each of these sequences happens with probability 1/8, so the six faces have the same probability of occurring. Now two questions arise: how many flips will we do on average, and can we get the coin to mimic any rational probability? The answers to both questions are promising. Let's start with the latter. Choosing 1/6 was a fairly arbitrary choice; can we do other numbers? Of course. Let's do examples and then you will see my algorithm. We will solve for the two probabilities 3/10 and 7/100.

OK, first, to make my life easier, let's say H = 1 and T = 0.

We can write the decimal number 10 in binary as 1010. This means that if we want to simulate a probability with a denominator of 10, we can just look for coin-flip sequences from 0000 to 1001, which gives us ten distinct numbers.


Choose the first 3 (0000 through 0010) as successes and the other 7 as failures, and you have probability 3/10. All other samples, from 1010 to 1111, you reject. This is actually really cool because you can reject early: for example, if your first two flips are heads, you can start over right away, since you have already decided to reject 11XX; similarly, if you get 101X you can start over immediately. This reduces the total number of flips.

One more example of the power of this. Let's say I want 7/100. Write 100 in binary: 100 = 64 + 32 + 4 = 1100100. Then we accept as valid samples 0000000 through 1100011 (that is, 0 through 99), and we take 0000000 through 0000110 (0 through 6) as our success samples.

So here is a clear description of the algorithm:

1. Write the denominator d of your probability in binary; you will accept samples from 0 up to d - 1.
2. Success is defined as the numbers from 0 up to n - 1, where n is your numerator.
3. Reflip as soon as you know the number you are building will be at least d.
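The three steps above can be sketched in Python like this (a minimal version, assuming a fair coin; the early-reflip optimization from the 3/10 example is noted in a comment but omitted for clarity):

```python
import random

def rational_flip(n, d, coin=lambda: random.randint(0, 1)):
    """Return True with probability n/d using only fair coin flips.

    Step 1: accept only k-bit samples from 0 to d - 1, where k is the
            number of bits needed to write d - 1.
    Step 2: call the sample a success if it lands in 0 .. n - 1.
    Step 3: any sample of d or more is rejected and we reflip.
    """
    k = (d - 1).bit_length()           # e.g. d = 10 -> 4 bits (1010)
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | coin()      # build the number one flip at a time
        if x < d:                      # accepted sample
            return x < n               # success iff it is one of 0 .. n - 1
        # x >= d: reject and reflip (one could also reject early, as soon
        # as the partial bits already guarantee x >= d)
```

For example, `rational_flip(3, 10)` returns True about 30% of the time, and `rational_flip(7, 100)` about 7% of the time.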

Cool. So now here is another cool thing about this: the number of flips required on average to simulate a given probability is logarithmic in the denominator. First, each round uses k = ceil(log_2(d)) flips, where d is the denominator, since that is how many bits it takes to express a number from 0 to d - 1. Next, consider the probability of rejecting a round, which is (2^k - d) / 2^k. Because d is greater than 2^(k-1) (otherwise d - 1 would fit in fewer than k bits), this rejection probability is less than 1/2. So each round is accepted with probability greater than 1/2, the expected number of rounds is less than 2, and the expected number of flips is less than 2k, roughly 2 log_2(d). Thus you only need a logarithmic number of flips.
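This bound is easy to check empirically. The sketch below (my own code, not from the post) counts the flips spent drawing one uniform number from 0 to d - 1 by rejection, and compares the average against 2 log_2(d):

```python
import math
import random

def flips_per_draw(d, coin):
    """Draw a uniform number in [0, d) by rejection; return the flips used."""
    k = (d - 1).bit_length()
    flips = 0
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | coin()
            flips += 1
        if x < d:
            return flips

rng = random.Random(0)
coin = lambda: rng.randint(0, 1)
avg = sum(flips_per_draw(1000, coin) for _ in range(20000)) / 20000
# d = 1000 needs k = 10 bits and accepts 1000/1024 of the draws,
# so the average should come out near 10 * 1024/1000 = 10.24,
# comfortably below the bound 2 * log2(1000), about 19.9
```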


Here is the setup: you go to a Vegas casino and the clerk gives you a rigged coin, and you're like, hold up, I want to place a 50/50 bet on something where the odds are actually 50/50. Unfortunately, the casino tells you that you can only bet on tails, at even odds, but you suspect the probability of tails is less than 50%. WHAT SHOULD YOU DO?

Well, lucky for you, you will soon be able to propose a von Neumann flip, which guarantees fairness. Here is the idea: if you flip the coin exactly twice, the probability of HT and the probability of TH must be equal, even though p(HH) >> p(TT) (since p(H) > p(T), as we assumed above). Why is that true? The two flips are independent, so p(HT) = p(H) * p(T) = p(T) * p(H) = p(TH). Let's do an example to show you.

Say the coin is rigged such that the probability of heads p(H) = 70%,
And the probability of tails  p(T) = 30%,
(An important check: the probabilities must always add up to 100%, or 1, if they describe all of the possible events that could occur.)

Well the probability of seeing two heads after two flips is p(HH)= 70% * 70% or .7 * .7 = 49%
The probability of seeing two tails is p(TT) =  30% * 30% = 9%
The probability of seeing heads then tails, p(HT) = 70% * 30% = 21%
The probability of seeing tails then heads, p(TH) = 30% * 70% = 21%
As you can see, these probabilities add to 100% and describe the entire event space. But as we've shown above, the order in which HT or TH appears does not matter: both have probability 21%. If we can exploit this, then we have a fair coin toss.

Here is the algorithm for the fair von Neumann coin toss:

1. Flip 2 coins, if HT or TH accept the sample. Label HT as heads, TH as tails.
2. If you flip the coin twice and get HH or TT, then you start over fresh, repeating until you get HT or TH.
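The two steps fit in a few lines of Python. This is a minimal sketch (names are my own); the biased coin returns 1 for heads and 0 for tails, and the 70/30 coin at the bottom is just the example from above:

```python
import random

def von_neumann_flip(biased_coin):
    """Extract one fair flip from any biased coin (1 = heads, 0 = tails).

    Flip twice: HT counts as heads, TH counts as tails, and HH or TT is
    discarded and we start over. Since p(HT) = p(TH), the result is fair
    no matter how biased the coin is.
    """
    while True:
        a, b = biased_coin(), biased_coin()
        if a != b:          # accept only HT or TH
            return a        # HT (1, 0) -> heads, TH (0, 1) -> tails

# example: a coin rigged to land heads 70% of the time
rigged = lambda: 1 if random.random() < 0.7 else 0
fair_result = von_neumann_flip(rigged)  # heads with probability 1/2
```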

Here is a sample run,

Flip two coins I get HH, reject the sample start over.

Flip two coins I get HH, reject the sample start over.

Flip two coins I get TT, reject the sample start over.

Flip two coins I get HH, reject the sample start over.

Flip two coins I get TH, Accept the sample the result is tails.

The sequence looked like this: HHHHTTHHTH. How come an HT appears earlier in the sequence, but we went with TH? If the coin is rigged towards heads, HT is more likely to appear first in the raw sequence, since the first flip is most likely heads and the following flips stay heads until you reach a T. In fact, HT will occur first with probability equal to the probability of heads. However, when you segment the sequence into disjoint pairs of flips, TH and HT are equally likely within each pair. That's why you should think of it like the sample run above and not scan the whole sequence.

Let's put everything in this post together: we can use a von Neumann flip to get a fair coin toss, and with our fair coin toss we can use the algorithm described above to simulate any rational probability.
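In code, putting the pieces together just means feeding the von Neumann flip into the rejection algorithm as its coin. A self-contained sketch under the same assumptions as the snippets above (names are my own):

```python
import random

def von_neumann_flip(biased_coin):
    """One fair flip from a biased coin: HT -> 1, TH -> 0, else retry."""
    while True:
        a, b = biased_coin(), biased_coin()
        if a != b:
            return a

def rational_flip(n, d, fair_coin):
    """True with probability n/d, built from fair flips by rejection."""
    k = (d - 1).bit_length()
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | fair_coin()
        if x < d:
            return x < n

# a rigged 70/30 coin simulating a fair 3/10 event
rigged = lambda: 1 if random.random() < 0.7 else 0
result = rational_flip(3, 10, lambda: von_neumann_flip(rigged))
```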

Reference: Cover and Thomas, Elements of Information Theory.

Monday, July 23, 2018

Welcome to Haackalytics

Hi All and Welcome,

I have always wanted a place where I can share my thoughts on the world. I am excited to be releasing a blog at the start of my career as a computer scientist/ data scientist/ machine learning engineer/ statistician.

So what is it that I want to do with my career?

That's a good question, and honestly I have no idea what specifically I will be doing, although I do want to go into soccer analytics at some point. Even though I do not know exactly what I will do over the next 50 or so years, I have some intuition about how to proceed. I hope to become an expert at properly interpreting data to draw meaningful conclusions. The path to this goal is incredibly daunting: to interpret data properly you must always doubt yourself and the conclusions you draw. In this post I will give you a flavor of how I think about truths in the world.

So how do you know that what you discovered is correct?

I like to believe that I follow a Bayesian train of thought. I believe you should never be 100% certain about anything, and new data should always be able to update your beliefs (this is known as updating your posterior). If you hold an opinion or a belief, you must be willing to discard it at a moment's notice given new data that supports a contrary argument. In this regard I believe all beliefs are probability distributions, of which you hold the most likely outcome. I am very confident about certain things; for example, I'm almost 100% certain that the Earth is round. However, if there were strong evidence to the contrary, then I would have to update my belief system and assign some probability to the Earth being non-round. This process is not easy, and I would like to introduce one more tricky aspect of these kinds of problems: the concept of noisy data.

How can I properly update my beliefs if the data which I see in the world is not all correct?

It is often the case that well-intentioned people will see something that did not actually happen and believe it as truth. For example, you may only visit Southern California during a rainy week and think that it always rains there.* I think the most important thing to do whenever you handle any kind of data is to analyze it and try to gain some intuition about how accurate you think it is. If you have some understanding of the flaws in the data itself, then you are less likely to believe the wrong thing.

Thanks For Reading,

Chris Haack

* This is an effect caused by seemingly random events, like whether it rains on a given day at a certain location, combined with an insufficient sample size.