I've been following along with [Bayesian Methods for Hackers](https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/tree/master/) and I wanted to try using PyMC3 with my own small dataset.
A week ago, a couple friends and I went out and played Disc Golf at a local park. In case you don't know what Disc Golf is, the goal is for a player to throw a disc at a target with as few throws as possible. Throughout each of the *holes*, I recorded the number of tosses required until we completed that hole.
| Brandon | Chris | Clare |
| ------- | ----- | ----- |
| 6 | 8 | 10 |
| 7 | 9 | 10 |
| 7 | 8 | 9 |
| 7 | 7 | 8 |
| 5 | 4 | 9 |
| 5 | 5 | 10 |
| 4 | 4 | 7 |
| 5 | 6 | 9 |
| 6 | 5 | 7 |
| 7 | 7 | 8 |
| 5 | 6 | 8 |
| 6 | 5 | 7 |
| 6 | 6 | 8 |
| 5 | 4 | 6 |
You can also [download the CSV](/data/discgolf03242020.csv).
What I want to know is the distribution of the number of tosses for each player.
## PyMC3
Let's try answering this with Bayesian Statistics + PyMC3.
Since the number of tosses are count data, we are going to use the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution) for the estimation. This means that we are going to characterize each player's number of tosses with this distribution. The Poisson distribution has a parameter $\lambda$ which also serves as the [expected value](https://en.wikipedia.org/wiki/Expected_value). The shape of the distribution is dependent on this parameter, so we'll need to estimate what this parameter is. Since the expected value must be positive, the exponential distribution is a good candidate.
$$
toss \sim Poisson(\lambda) \\
\lambda \sim Exp(\alpha)
$$
The exponential distribution also has a parameter $\alpha$. The expected value of an exponential distribution with respect to $\alpha$ is $\frac{1}{\alpha}$. At this point we can give an estimate of what we believe $\alpha$ could be. Given the relationships with the expected values, the mean score of each of the players is a great choice.
```python
with pm.Model() as model:
# Random Variables
ALPHA = [1. / DISC_DATA[person].mean() for person in PEOPLE]
LAMBDAS = [pm.Exponential("lambda_" + person, alpha) for person, alpha in zip(PEOPLE, ALPHAS)]
```
Now to show how easy the library is, we will provide the data and run Monte Carlo simulations to see the distribution that the $\lambda$s live in.
We can see in the first graphic that the average number of tosses required for Brandon and Chris is about $6$, while for Clare it's $8$. Looking at the second graphic though, shows us that Clare has a wide range of possible tosses.
With PyMC3, we got to see the larger trends in the data by analyzing distributions that are more likely given the data.