Monday, July 23, 2018

Welcome to Haackalytics

Hi All and Welcome,

I have always wanted a place where I can share my thoughts on the world. I am excited to be launching this blog at the start of my career as a computer scientist / data scientist / machine learning engineer / statistician.

So what is it that I want to do with my career?

That's a good question, and honestly I have no idea what specifically I will be doing, although I do want to go into soccer analytics at some point. Even though I do not know exactly what I will do over the next 50 or so years, I have some intuition about how to proceed. I hope to become an expert at interpreting data properly to draw meaningful conclusions. The path to this goal is incredibly daunting, and to interpret data properly you must always doubt yourself and the conclusions you draw. In this post I will give you a flavor of how I think about truths in the world.

So how do you know that what you discovered is correct?

I like to believe that I follow a Bayesian train of thought. I believe you should never be 100% certain about anything, and new data should always be able to update your beliefs. (In Bayesian terms, this is updating your prior to obtain a posterior.) If you hold an opinion or a belief, you must be willing to discard it at a moment's notice when new data supports a contrary argument. In this view, every belief is a probability distribution over possible outcomes, and the thing you "believe" is the outcome you consider most likely. I am very confident about certain things; for example, I'm almost 100% certain that the Earth is round. However, if strong evidence to the contrary appeared, I would have to update my beliefs and assign a higher probability to the Earth being non-round. This process is not easy, and I would like to introduce one more tricky aspect of these problems: noisy data.
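To make the updating step concrete, here is a minimal sketch of Bayes' rule for a single yes/no belief. The function name `update_belief` and every number in it are made up for illustration; they are assumptions, not measurements of anything.

```python
def update_belief(prior, likelihood_if_true, likelihood_if_false):
    """Return the posterior probability that a belief is true (Bayes' rule).

    prior: probability the belief is true before seeing the evidence
    likelihood_if_true: probability of seeing the evidence if the belief is true
    likelihood_if_false: probability of seeing the evidence if the belief is false
    """
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: I start out 99.9% sure the Earth is round, then see
# evidence that is ten times more likely if the Earth is NOT round.
prior = 0.999
posterior = update_belief(prior, likelihood_if_true=0.05, likelihood_if_false=0.5)

print(f"prior: {prior:.3f}, posterior: {posterior:.3f}")
```

With these invented numbers the belief drops only slightly, to about 0.99. A strong prior needs a lot of contrary evidence to move, but each new piece of evidence does move it.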

How can I properly update my beliefs if the data which I see in the world is not all correct?

It is often the case that well-intentioned people see something that did not actually happen and accept it as truth. For example, you may visit southern California only during a rainy week and conclude that it always rains there*. I think the most important thing to do whenever you handle any kind of data is to analyze it and build some intuition for how accurate it is. If you understand the flaws that come from the data itself, you are less likely to believe the wrong thing.
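The rainy-week example can be sketched as a small simulation. Everything here is an assumption for illustration: the daily rain probability `TRUE_RAIN_PROB` is a made-up value rather than real climate data, and each day is treated as independent of the others.

```python
import random
import statistics

TRUE_RAIN_PROB = 0.1  # hypothetical chance of rain on any given day


def observed_rain_fraction(days, rng):
    """Simulate a visit of `days` days and return the fraction that were rainy."""
    rainy = sum(rng.random() < TRUE_RAIN_PROB for _ in range(days))
    return rainy / days


rng = random.Random(0)  # fixed seed so the run is repeatable

# 1000 imaginary visitors stay one week; 1000 others stay a full year.
week_estimates = [observed_rain_fraction(7, rng) for _ in range(1000)]
year_estimates = [observed_rain_fraction(365, rng) for _ in range(1000)]

# Spread of each group's estimates around the true value.
week_spread = statistics.pstdev(week_estimates)
year_spread = statistics.pstdev(year_estimates)

print(f"one-week visits: estimates range from {min(week_estimates):.2f} "
      f"to {max(week_estimates):.2f}, spread {week_spread:.3f}")
print(f"one-year visits: estimates range from {min(year_estimates):.2f} "
      f"to {max(year_estimates):.2f}, spread {year_spread:.3f}")
```

The one-week visitors disagree with each other far more than the one-year visitors do, even though everyone is observing the same underlying weather. A short visit simply does not contain enough data to pin down the true rate.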

Thanks for reading,

Chris Haack

* This effect comes from combining seemingly random events, such as whether it rains on a given day at a certain location, with an insufficient sample size.
