Residuals

Definition

The residual for each observation is the difference between the observed value of $y$ (the dependent variable) and the value of $y$ predicted by the regression line. \begin{align} \text{Residual}&=\text{actual } y \text{ value} - \text{predicted }y \text{ value} \text{,}\\ r_i&=y_i-\hat{y_i} . \end{align}

A negative residual means that the predicted value is too high; similarly, a positive residual means that the predicted value is too low. The aim of a least-squares regression line is to minimise the sum of the squared residuals.
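The following is a minimal sketch of why squared residuals are compared rather than raw residuals. It assumes a least-squares line of the form $\hat{y}=a+bx$ and uses made-up data; the function names `fit_line`, `residuals` and `sum_sq` are illustrative only. For the fitted line the positive and negative residuals cancel, so their plain sum is essentially zero, while the sum of squares grows as soon as the line is moved.

```python
# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Return the least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys, a, b):
    """Residual r_i = y_i - (a + b*x_i) for every observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_sq(rs):
    """Sum of squared residuals."""
    return sum(r * r for r in rs)


a, b = fit_line(xs, ys)
fitted = residuals(xs, ys, a, b)
shifted = residuals(xs, ys, a + 0.5, b)  # same slope, deliberately worse intercept

print(sum(fitted))                        # very close to 0: residuals cancel
print(sum_sq(fitted) < sum_sq(shifted))   # True: the fitted line has the smaller sum of squares
```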

Calculating Residuals

Since \[r_i=y_i-\hat{y_i}\] and the regression line has the equation \[\hat{y_i}=a+b{x_i},\] where $a$ is the intercept and $b$ is the slope, we calculate the residual of an observation as follows: \[r_i=y_i-\hat{y_i}=y_i-(a+bx_i).\]
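As a rough sketch, the formula translates directly into a small function. The name `residual` and the numbers used below are hypothetical and serve only to show the sign convention.

```python
def residual(x, y, a, b):
    """Residual of one observation: observed y minus predicted y = a + b*x."""
    predicted = a + b * x
    return y - predicted


# Hypothetical line y-hat = 1 + 2x and a hypothetical observation (3, 6.5).
r = residual(x=3.0, y=6.5, a=1.0, b=2.0)
print(r)  # -0.5: negative, so the prediction (7.0) was too high
```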

Worked Example

To see how students' physical ability has increased over a four-year period, ten students completed an obstacle course and then four years later they took the same course again. Here are their times:

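\[
\begin{array}{l|c|c}
\text{Student} & \text{First Test, } x \text{ (seconds)} & \text{Second Test, } y \text{ (seconds)} \\ \hline
\text{Debbie} & 67 & 46 \\
\text{Edna} & 53 & 29 \\
\text{Jerry} & 68 & 37 \\
\text{Norman} & 57 & 44 \\
\text{Joseph} & 71 & 41 \\
\text{Betty} & 74 & 35 \\
\text{Susan} & 63 & 41 \\
\text{Marilyn} & 75 & 43 \\
\text{Bert} & 66 & 33 \\
\text{Alice} & 66 & 36
\end{array}
\]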

The equation of our regression line is $\hat{y}=23.91+0.22x$. What is Betty's predicted time for the second test, and what is her residual?

Solution

Using our regression line equation, we can calculate the predicted value, $\hat{y}$, by substituting in our value for $x$ (Betty's time on the first test, $74$ seconds).

\begin{align} \hat{y_i}&=a+b{x_i}\\ &=23.91+0.22x_i\\ &=23.91+0.22\times74\\ &=40.19 \end{align}

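The residual is the observed second-test time minus this predicted time:

\begin{align} r_i&=y_i-\hat{y_i}\\ &=35-40.19\\ &=-5.19 \end{align}

The residual is negative, so the regression line overestimates Betty's second-test time by $5.19$ seconds.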
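The same calculation can be checked with a short script. This is a sketch only: the times and the coefficients $23.91$ and $0.22$ are taken from the worked example, while the names `times`, `predict` and `residual` are illustrative. The loop at the end applies the same formula to every student.

```python
# (first test x, second test y) in seconds, copied from the worked example.
times = {
    "Debbie": (67, 46),
    "Edna": (53, 29),
    "Jerry": (68, 37),
    "Norman": (57, 44),
    "Joseph": (71, 41),
    "Betty": (74, 35),
    "Susan": (63, 41),
    "Marilyn": (75, 43),
    "Bert": (66, 33),
    "Alice": (66, 36),
}

A = 23.91  # intercept of the given regression line
B = 0.22   # slope of the given regression line


def predict(x):
    """Predicted second-test time for a first-test time x."""
    return A + B * x


def residual(x, y):
    """Observed second-test time minus predicted second-test time."""
    return y - predict(x)


betty_x, betty_y = times["Betty"]
print(f"{predict(betty_x):.2f}")            # 40.19
print(f"{residual(betty_x, betty_y):.2f}")  # -5.19

# Residuals for all ten students, rounded to two decimal places.
for name, (x, y) in times.items():
    print(f"{name:8s} {residual(x, y):6.2f}")
```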

Video Example

This video example on calculating residuals was produced by Alissa Grant-Walker.
