Confidence intervals expand the concept of a point estimate by attaching a margin of error, so that over repeated sampling a stated percentage of the resulting intervals contain the true parameter.
In this lab, we will look at confidence intervals for a mean. The lab focuses on a method of constructing confidence intervals that assumes the distribution of sample means is Normal. We will show how violating this assumption affects the probability that the interval contains the true parameter.
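For reference, an interval that is commonly built under this assumption is the one-sample t interval. The lab does not name its method explicitly, so treating it as the t interval is an assumption, and the symbols below are introduced only for illustration:

$$
\bar{x} \;\pm\; t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
$$

Here $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size, and $\alpha = 0.05$ for a 95% interval.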
## Methods
The observed level of confidence is the proportion of simulated confidence intervals that contain the true mean. To show how violating the Normality assumption affects it, we sample from a Normal distribution, a t distribution, and an exponential distribution at several sample sizes.
The Normal and t distributions are sampled with a mean of 5 and a standard deviation of 2. The exponential distribution is sampled with a rate (lambda) of 2, which gives a mean of 0.5.
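The three sampling schemes can be sketched with the Python standard library. This is a minimal sketch, not the code used in the lab: the function names are hypothetical, and the degrees of freedom of the t distribution (`DF`) are an assumption, since the lab does not state them. The t draws are shifted and rescaled so that they have mean 5 and standard deviation 2.

```python
import math
import random

DF = 5  # hypothetical degrees of freedom; the lab does not state the value used


def draw_normal(rng, n, mean=5.0, sd=2.0):
    """Draw n values from a Normal distribution with the given mean and sd."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def draw_scaled_t(rng, n, mean=5.0, sd=2.0, df=DF):
    """Draw n values from a shifted and rescaled t distribution (requires df > 2)."""
    # A standard t variate is Z / sqrt(V / df), with Z standard Normal and
    # V chi-squared with df degrees of freedom, i.e. Gamma(shape=df/2, scale=2).
    # Its variance is df / (df - 2), so we rescale to reach the target sd.
    scale = sd / math.sqrt(df / (df - 2))
    values = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = rng.gammavariate(df / 2, 2.0)
        values.append(mean + scale * z / math.sqrt(v / df))
    return values


def draw_exponential(rng, n, rate=2.0):
    """Draw n values from an exponential distribution with mean 1 / rate."""
    return [rng.expovariate(rate) for _ in range(n)]
```

Each function takes a `random.Random` instance, so a fixed seed such as `random.Random(0)` makes the draws reproducible.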
For each sample, we compute the mean and the lower and upper bounds of the confidence interval. This is repeated 100,000 times, which gives a distribution of these statistics.
A confidence interval is valid if its lower bound is no more than the true mean and its upper bound is no less than the true mean. From this definition, we compute the observed level of confidence as the proportion of valid intervals among the simulations.
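The simulation described above can be sketched as follows. This is an illustration under stated assumptions rather than the lab's own code: it assumes the interval is the one-sample t interval, hard-codes the two-sided 95% critical values for the four sample sizes in the Appendix so that only the standard library is needed, and uses hypothetical names (`t_interval`, `observed_confidence`).

```python
import math
import random

# Two-sided 95% critical values t_{n-1, 0.975}, keyed by sample size n.
# Hard-coded (rounded to three decimals) to avoid a dependency on SciPy.
T_CRIT_95 = {10: 2.262, 20: 2.093, 50: 2.010, 100: 1.984}


def t_interval(sample):
    """Return (lower, upper) bounds of a 95% t interval for the mean."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half_width = T_CRIT_95[n] * math.sqrt(var / n)
    return mean - half_width, mean + half_width


def observed_confidence(draw, true_mean, n, reps=100_000, seed=0):
    """Proportion of simulated intervals whose bounds enclose true_mean.

    draw is a function that takes a random.Random and returns one value.
    """
    rng = random.Random(seed)
    valid = 0
    for _ in range(reps):
        lower, upper = t_interval([draw(rng) for _ in range(n)])
        if lower <= true_mean <= upper:
            valid += 1
    return valid / reps
```

For example, `observed_confidence(lambda rng: rng.gauss(5, 2), 5.0, 10)` estimates the coverage for Normal samples of size 10, and `observed_confidence(lambda rng: rng.expovariate(2), 0.5, 10)` does the same for exponential samples. The exact proportions depend on the seed and the random number generator, so they will not match the Appendix tables digit for digit.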
### Visualizations
From the distributions of statistics, we can create visualizations to support the understanding of confidence intervals.
The first is a scatterplot of lower bounds against upper bounds. Valid confidence intervals are drawn in blue and invalid ones in red. The invalid intervals fall outside the box, that is, the region where the lower bound is at most the true mean and the upper bound is at least the true mean.
The second visualization is a histogram of all the sample means collected. Sample means whose confidence interval was not valid are shaded in red. This graphic shows the type I errors in both tails of the distribution.
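The two kinds of invalid interval behind these plots can also be counted directly. The sketch below is not the lab's plotting code: it assumes a 95% t interval with sample size 10, and the function name `miss_rates` is hypothetical. It separates intervals that lie entirely below the true mean from those that lie entirely above it.

```python
import math
import random

T_CRIT_N10 = 2.262  # t_{9, 0.975}, the two-sided 95% critical value for n = 10
N = 10              # sample size matching the critical value above


def miss_rates(draw, true_mean, reps=100_000, seed=0):
    """Return (below, above): the share of intervals lying entirely
    below the true mean and the share lying entirely above it."""
    rng = random.Random(seed)
    below = above = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(N)]
        mean = sum(xs) / N
        sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (N - 1))
        half_width = T_CRIT_N10 * sd / math.sqrt(N)
        if mean + half_width < true_mean:
            below += 1  # upper bound falls short of the true mean
        elif mean - half_width > true_mean:
            above += 1  # lower bound overshoots the true mean
    return below / reps, above / reps
```

For a symmetric population such as the Normal, the two shares should be roughly equal. For a right-skewed population such as the exponential, one would expect most invalid intervals to lie below the true mean, which is the asymmetry the shaded histogram makes visible.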
In this lab, we're interested in seeing how our observed level of confidence differs from our theoretical level of confidence (95%) when different sample sizes and distributions are applied.
## Results
We can see from Tables I and II in the Appendix that sampling from a Normal or t distribution does not adversely affect the observed level of confidence. It stays within about 0.002 of the theoretical level of 0.95.
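One way to judge whether such small gaps are more than simulation noise is to compare them with the Monte Carlo standard error of a proportion near 0.95 estimated from 100,000 replications. The sketch below uses the values from Tables I and II; the variable and function names are illustrative.

```python
import math

REPS = 100_000   # number of simulated intervals per table entry
NOMINAL = 0.95   # theoretical level of confidence

# Observed levels of confidence from Tables I and II, keyed by sample size.
observed = {
    "Normal": {10: 0.94849, 20: 0.94913, 50: 0.95045, 100: 0.94955},
    "t": {10: 0.94966, 20: 0.94983, 50: 0.94932, 100: 0.94999},
}

# Standard error of a simulated proportion when the true coverage is NOMINAL.
mc_se = math.sqrt(NOMINAL * (1 - NOMINAL) / REPS)


def se_units(value):
    """Shortfall from the nominal level, measured in Monte Carlo standard errors."""
    return (NOMINAL - value) / mc_se


for name, rows in observed.items():
    for n, value in rows.items():
        print(f"{name:6s} n={n:3d} shortfall={NOMINAL - value:+.5f} ({se_units(value):+.1f} SE)")
```

The standard error is about 0.0007, and the largest gap in the two tables (0.94849 for the Normal with sample size 10) is a little over two standard errors, so these deviations are small on the scale of the simulation noise.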
When sampling from the exponential distribution, however, the observed level of confidence depends strongly on the sample size.
Looking at Table III, we can see that for a sample size of 10, the observed level of confidence is only about 0.90 (0.89934). This is roughly 5 percentage points below the theoretical level of 0.95. It shows how much the coverage of the interval depends on the Normality assumption.
This comes from the fact that using this type of confidence interval on a mean from a non-Normal distribution requires a large sample size for the central limit theorem to take effect.
The central limit theorem states that if the sample size is "large", the distribution of sample means approaches the Normal distribution. You can see in Figure XVIII that the distribution of sample means is skewed, though as the sample size increases, the distribution of sample means becomes more symmetric (Figure XIX).
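The loss of skewness can be checked numerically. The sketch below simulates sample means from the exponential distribution with rate 2 and computes their skewness. The function names are hypothetical, and the default of 20,000 replications is chosen only to keep the run short. For reference, the mean of n independent exponential draws has theoretical skewness 2 / sqrt(n).

```python
import random


def skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sample_mean_skewness(n, reps=20_000, rate=2.0, seed=0):
    """Skewness of reps simulated sample means, each from n exponential draws."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(rate) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return skewness(means)


for n in (10, 20, 50, 100):
    print(f"n={n:3d} simulated={sample_mean_skewness(n):.3f} theoretical={2 / n ** 0.5:.3f}")
```

The skewness shrinks toward 0 as the sample size grows, which matches the distribution of sample means becoming more symmetric.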
## Conclusion
From this, we can conclude that violating the underlying assumption of Normality can lower the observed level of confidence below the theoretical level, as it did for the skewed exponential distribution. When sampling means from a non-Normal distribution, a larger sample size mitigates this decrease because of the central limit theorem.
## Appendix
### Tables
#### Table I. Sampling from Normal Distribution
| Sample Size | Proportion of Means Within CI |
| ----------- | ----------------------------- |
| 10 | 0.94849 |
| 20 | 0.94913 |
| 50 | 0.95045 |
| 100 | 0.94955 |
#### Table II. Sampling from T Distribution
| Sample Size | Proportion of Means Within CI |
| ----------- | ----------------------------- |
| 10 | 0.94966 |
| 20 | 0.94983 |
| 50 | 0.94932 |
| 100 | 0.94999 |
#### Table III. Sampling from Exponential Distribution
| Sample Size | Proportion of Means Within CI |
| ----------- | ----------------------------- |
| 10 | 0.89934 |
| 20 | 0.91829 |
| 50 | 0.93505 |
| 100 | 0.94172 |
### Figures
#### Normal Distribution
##### Figure I. Scatterplot of Bounds for Normal Distribution of Sample Size 10