Confidence Intervals (Psychology)

Confidence Interval

If you are working with a sample from a population and want to use it to estimate the mean of the whole population, you can use a confidence interval to give the range of values in which you would expect the true mean to lie. The interval is centred on the sample mean, and its width is determined by the standard error of the mean.

In what follows, the unknown population mean we are estimating is $\mu$ and the population variance is $\sigma^2$. A sample of size $n$ has sample mean $\bar{x}$ and sample variance $s^2$.

There are two important situations:

Population Variance Known

Here we know or have been given the population variance $\sigma^2$.

You can choose the confidence level of the interval. The usual choice is $95$%, meaning that if you were to repeat the experiment $100$ times and construct an interval each time, approximately $95$ of those intervals would contain the true value of the mean.

You can set the confidence level to other values $(1-\alpha)100$%, such as $99$% ($\alpha=0.01$) or $99.9$% ($\alpha=0.001$), where $\alpha$ is the significance level. A higher confidence level makes it more likely that the interval contains $\mu$, but the interval becomes wider. For each level, find the corresponding value $z_{\alpha/2}$ (the point with area $\alpha/2$ above it under the standard normal curve) from the normal table.
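Instead of reading a printed table, the values $z_{\alpha/2}$ can be computed directly. The minimal sketch below assumes Python 3.8 or later, which provides `statistics.NormalDist` in the standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return z_{alpha/2}, the upper alpha/2 point of N(0, 1),
    for a (1 - alpha)100% confidence level given as a fraction."""
    alpha = 1 - confidence
    # inv_cdf returns the value with the requested area to its LEFT,
    # so the upper alpha/2 point has area 1 - alpha/2 to its left.
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%} -> z = {z_critical(level):.4f}")
```

For the $95$% and $99$% levels this gives $1.9600$ and $2.5758$, the values used in this section; the value grows as the confidence level rises, which is why the interval widens.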

When working with data that is (approximately or exactly) normally distributed you can begin constructing the confidence interval for the population mean using the following formula:

\begin{equation} z = \dfrac{(\bar{x} -\mu)}{\sqrt{\dfrac{\sigma^2}{n}}} \end{equation}

This means $z$ follows a $N(0,1)$ distribution, so for a $95$% confidence level $P( -1.96 < z < 1.96 ) = 0.95$. For other levels, replace $1.96$ (which is $z_{0.025}$) by the corresponding value from the normal table.

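However, since you want an interval for $\mu$ and not for $z$, you must rearrange. Multiplying through by $\sqrt{\dfrac{\sigma^2}{n}}$ and then subtracting $\bar{x}$ gives \begin{align} -1.96 < &z < 1.96,\\ -1.96 <&\dfrac{\bar{x} - \mu }{\sqrt{\dfrac{\sigma^2}{n}}}<1.96,\\ -1.96\sqrt{\dfrac{\sigma^2}{n}} < \bar{x} - &\mu < 1.96\sqrt{\dfrac{\sigma^2}{n}},\\ -\bar{x} -1.96\sqrt{\dfrac{\sigma^2}{n}} < -&\mu < 1.96\sqrt{\dfrac{\sigma^2}{n}}-\bar{x}. \end{align}

Multiplying through by $-1$ reverses the inequalities, giving

\begin{equation} \bar{x} - 1.96\sqrt{\dfrac{\sigma^2}{n}} <\mu <\bar{x} +1.96\sqrt{\dfrac{\sigma^2}{n}} \end{equation}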

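The interval is usually written $\bar{x} \pm 1.96\sqrt{\dfrac{\sigma^2}{n}}$, or in general $\bar{x} \pm z_{\alpha/2}\sqrt{\dfrac{\sigma^2}{n}}$ for a $(1-\alpha)100$% confidence level.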
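The repeated-experiment interpretation of the confidence level can be checked by simulation. The sketch below assumes a hypothetical normal population with $\mu = 10$ and $\sigma = 2$ (illustrative values, not from the text), draws many samples of size $n = 20$, builds $\bar{x} \pm 1.96\sqrt{\sigma^2/n}$ for each one, and reports the fraction of intervals that contain $\mu$. The function name `coverage_known_sigma` is hypothetical.

```python
import math
import random

def coverage_known_sigma(mu, sigma, n, z=1.96, reps=10_000, seed=1):
    """Repeat the 'experiment' reps times and return the fraction of
    intervals xbar +/- z*sqrt(sigma^2/n) that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    half_width = z * math.sqrt(sigma**2 / n)  # same width every time: sigma is known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population values, chosen only for illustration.
print(coverage_known_sigma(mu=10.0, sigma=2.0, n=20))
```

The printed fraction should be close to $0.95$; it differs slightly from exactly $0.95$ because only a finite number of repetitions is simulated.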

Worked Example - Confidence Interval

Consider the percentage yearly return of the flu virus in a remote town. The return is known to be normally distributed with a standard deviation of $0.8$. A random sample of $20$ values yields a mean of $\bar{x} = 3.2$%.

Obtain a $99$% confidence interval for the mean yearly returns of flu.

Solution

You want a $99$% confidence interval, which is a $(1-\alpha)100$% interval with $\alpha = 0.01$, so $\alpha/2 = 0.005$ and thus $z_{0.005} = 2.5758$. The confidence interval is:

$3.2 \pm 2.5758\sqrt{\dfrac{0.8^2}{20}~} = 3.2 \pm 0.4608$, which means the true mean is likely to lie in the range $(2.739, 3.661)$.

The news here is not that bad! We can be $99$% confident that the true mean yearly return lies between $2.739$% and $3.661$%, so on average the chances of getting the flu are relatively low.
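The arithmetic of this worked example can be reproduced with a short function. This is a sketch assuming Python 3.8 or later for `statistics.NormalDist`; `z_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for mu when the population variance sigma^2 is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * math.sqrt(sigma**2 / n)     # z_{alpha/2} * sqrt(sigma^2 / n)
    return xbar - margin, xbar + margin

# Values from the worked example: sigma = 0.8, n = 20, xbar = 3.2, 99% level.
low, high = z_interval(xbar=3.2, sigma=0.8, n=20, confidence=0.99)
print(f"({low:.3f}, {high:.3f})")
```

It prints $(2.739, 3.661)$, matching the interval above.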

Population Variance Unknown

You will not always know $\sigma^2$ (the population variance); in fact, it is usually unknown. This means you cannot use the above formula for working out confidence intervals. Instead, you can substitute the sample variance $s^2$ for $\sigma^2$ in the formula for $z$ and call the result $T$. (Note: this still assumes that the observations are approximately normally distributed.)

\begin{equation} T =\dfrac{(\bar{x} -\mu)}{\sqrt{\dfrac{s^2}{n}}} \end{equation}

Because $s^2$ itself changes from sample to sample, $T$ varies more than $z$ does, so $T$ does not have a $N(0,1)$ distribution. Instead, $T$ follows a t-distribution with $(n-1)$ degrees of freedom. The t-distribution approaches the normal distribution as the number of degrees of freedom increases, and the two are close once there are more than about $30$ degrees of freedom. (See reading tables for the t-table and how to read it.) The formula for the confidence interval when the population variance is unknown is:

\begin{equation} \bar{x} - t_{\alpha/2}\sqrt{\dfrac{s^2}{n}} <\mu <\bar{x} +t_{\alpha/2}\sqrt{\dfrac{s^2}{n}} \end{equation}

Again, for convenience this is usually written $\bar{x} \pm t_{\alpha/2}\sqrt{\dfrac{s^2}{n}}$, where $t_{\alpha/2}$ is read from the t-table with $n-1$ degrees of freedom, and it is still interpreted as a confidence interval for the true value of the population mean.
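The extra variability of $T$ can be seen in a small simulation. The sketch below assumes a hypothetical normal population with $\mu = 10$ and $\sigma = 2$ (illustrative values, not from the text). It substitutes $s^2$ for $\sigma^2$ but deliberately keeps the normal value $1.96$, then counts how often the resulting interval contains $\mu$. The function name `coverage_with_s` is hypothetical.

```python
import math
import random

def coverage_with_s(mu, sigma, n, crit=1.96, reps=10_000, seed=2):
    """Fraction of simulated intervals xbar +/- crit*sqrt(s^2/n) that contain mu.

    s^2 is the sample variance (divisor n - 1), used in place of sigma^2.
    crit defaults to the normal value 1.96, i.e. it ignores the t-distribution.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        half_width = crit * math.sqrt(s2 / n)  # width now changes from sample to sample
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

# Hypothetical population values, chosen only for illustration.
for n in (5, 15, 50):
    print(n, coverage_with_s(mu=10.0, sigma=2.0, n=n))
```

With small $n$ the coverage falls noticeably below $95$%, and the shortfall shrinks as $n$ grows, in line with the t-distribution approaching the normal. Passing `crit=2.776` (approximately $t_{0.025}$ with $4$ degrees of freedom) for $n = 5$ should bring the coverage back to about $0.95$.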

Worked Example - Confidence Intervals

Test scores were collected for a sample of $15$ children from one-parent families. The sample mean was $15.29$ and the sample standard deviation was $1.95$.

Find the $95$% confidence interval for the mean test score. What do we need to assume?

Solution

Firstly, the population variance is unknown and the sample is relatively small, so we must construct the interval with the t-distribution, and therefore we must assume that the test scores are normally distributed.

You have a sample of size $n = 15$, so there are $n-1= 14$ degrees of freedom. Since you want the $95$% confidence interval, $\alpha = 0.05$ and so $\alpha/2 = 0.025$. Looking up the corresponding value in the t-distribution table gives $t_{0.025} = 2.1448$.
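If a t-table is not to hand, $t_{\alpha/2}$ can be computed numerically. The sketch below uses only Python's standard library: it integrates the t density with Simpson's rule and finds the quantile by bisection. The function names are hypothetical helpers, not part of any library; where SciPy is available, `scipy.stats.t.ppf(1 - alpha/2, df)` returns the same quantile.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: area 0.5 below zero (by symmetry) plus the
    area from 0 to x, estimated with Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence: float, df: int) -> float:
    """Upper alpha/2 point t_{alpha/2}, found by bisection on t_cdf."""
    target = 1 - (1 - confidence) / 2
    low, high = 0.0, 1.0
    while t_cdf(high, df) < target:  # widen the bracket until it contains the root
        high *= 2
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(round(t_critical(0.95, 14), 4))  # worked example: 14 degrees of freedom
```

The result for $14$ degrees of freedom should agree with the table value $2.1448$, and calling `t_critical(0.95, df)` for larger `df` gives values that shrink towards $1.96$.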

The formula for the $95$% confidence interval is:

$\bar{x} \pm t_{0.025}\sqrt{\dfrac{s^2}{n}~}$

Substituting the values in gives:

$15.29 \pm 2.1448\sqrt{\dfrac{1.95^2}{15}~} = 15.29 \pm 1.080$

Thus the confidence interval for the mean of the test scores is $(14.21 , 16.37)$.
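The same calculation can be written as a small function. In this sketch `t_interval` is a hypothetical helper, and the critical value $2.1448$ is the table value used above, passed in rather than computed by the code.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_{alpha/2} * sqrt(s^2 / n) for mu when the population
    variance is unknown; t_crit comes from a t-table with n - 1 degrees of freedom."""
    margin = t_crit * math.sqrt(s**2 / n)
    return xbar - margin, xbar + margin

# Values from the worked example: n = 15 (14 degrees of freedom), 95% level.
low, high = t_interval(xbar=15.29, s=1.95, n=15, t_crit=2.1448)
print(f"({low:.2f}, {high:.2f})")
```

It prints $(14.21, 16.37)$, the interval found above.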
