Multiple Regression

Definition

Multiple linear regression aims to find a linear relationship between a dependent variable and several independent variables. The independent variables can be either continuous or qualitative; however, the dependent variable must be measured on a continuous scale. A multiple regression model with $k$ independent variables fits a regression “surface” in a $(k + 1)$-dimensional space.

The least squares regression equation for multiple regression with $n$ independent variables is \[\hat{y}=a+b_1x_1+b_2x_2+\ldots+b_nx_n\]

where:

  • $a$ is a constant (the intercept),
  • $x_i$ is the $i$th independent variable, for $i = 1, 2, \ldots, n$,
  • $b_i$ is the coefficient of the $i$th independent variable.
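As a minimal sketch of how the equation above is evaluated, the following Python function computes $\hat{y}$ for one observation. The function name `predict` and the numerical values are illustrative assumptions, not part of the original material.

```python
def predict(a, b, x):
    """Return y_hat = a + b_1*x_1 + ... + b_n*x_n for a single observation.

    a: the constant (intercept)
    b: list of coefficients, one per independent variable
    x: list of values of the independent variables, in the same order
    """
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per independent variable")
    return a + sum(b_i * x_i for b_i, x_i in zip(b, x))


# Hypothetical constant, coefficients and observation, chosen only for illustration.
a = 2.0
b = [0.5, -1.0, 3.0]
x = [4.0, 1.0, 2.0]

y_hat = predict(a, b, x)  # 2.0 + 0.5*4.0 - 1.0*1.0 + 3.0*2.0
print(y_hat)
```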

Bivariate Model

A bivariate model is a model with two independent variables: \[\hat{y}=a+b_1x_1+b_2x_2\] The predicted values form a plane in 3-dimensional space.

Figure: A multiple regression model with two predictor variables fits a regression plane in 3-dimensional space.

The intercept $a$ is where the plane crosses the $y$-axis, that is, the predicted value of $y$ when $x_1 = x_2 = 0$. The value $b_1$ is the gradient in the $x_1$ direction: the predicted change in $y$ for each one-unit increase in $x_1$ while $x_2$ is held constant. Similarly, $b_2$, the gradient in the $x_2$ direction, is the predicted change in $y$ for each one-unit increase in $x_2$ while $x_1$ is held constant.

An example of when you might use a bivariate model is to predict stress from the number of working hours and the number of days without exercise, since each of these is positively correlated with stress. An example of when a bivariate model would not be suitable is to predict depression from hours of sunlight and temperature, as hours of sunlight and temperature are not independent of each other, so their separate effects cannot be distinguished reliably.
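One rough way to check whether two independent variables are related to each other before fitting a model is to compute the correlation between them. The sketch below does this in plain Python; the function name `pearson_r` and the data are invented purely for illustration and do not come from a real study.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    s_uu = sum((p - mean_u) ** 2 for p in u)
    s_vv = sum((q - mean_v) ** 2 for q in v)
    return s_uv / sqrt(s_uu * s_vv)


# Invented values: daily hours of sunlight and temperature in degrees Celsius.
sunlight_hours = [4, 6, 8, 10, 12, 14]
temperature_c = [3, 7, 10, 15, 18, 22]

r = pearson_r(sunlight_hours, temperature_c)
print(f"correlation between the two independent variables: {r:.3f}")
```

A value of $r$ close to $1$ or $-1$ suggests the two variables carry much of the same information, which is the situation described above for sunlight and temperature. How close is "too close" is a judgement call rather than a fixed rule.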

Minimising the Sum of Squared Residuals

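Minimising the sum of the squared residuals extends the method used for simple regression. The constant and the coefficients are chosen so that the sum of the squared differences between the observed values of $y$ and the predicted values $\hat{y}$ is as small as possible, which gives the equation of the regression surface. The calculation is lengthy and involves solving several simultaneous equations, which is why it is usually carried out by a computer.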
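To illustrate what the computer does, here is a minimal sketch for the two-variable model $\hat{y}=a+b_1x_1+b_2x_2$, written in plain Python. It uses the standard closed-form least squares solution for two independent variables; the function names and the data are assumptions made for this example only.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares estimates (a, b1, b2) for y_hat = a + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    # Exact check only; nearly collinear data would need a tolerance instead.
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear: no unique plane")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


def sum_squared_residuals(a, b1, b2, x1, x2, y):
    """Sum of squared differences between observed y and predicted y_hat."""
    return sum((w - (a + b1 * u + b2 * v)) ** 2 for u, v, w in zip(x1, x2, y))


# Invented data, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9.5, 7.5, 19.0, 18.5, 28.5]

a, b1, b2 = fit_two_predictors(x1, x2, y)
best = sum_squared_residuals(a, b1, b2, x1, x2, y)
nudged = sum_squared_residuals(a, b1 + 0.1, b2, x1, x2, y)
print(f"a={a:.3f}, b1={b1:.3f}, b2={b2:.3f}")
print(f"SSR at the fitted plane: {best:.4f}; after nudging b1: {nudged:.4f}")
```

Moving any coefficient away from its fitted value cannot make the sum of squared residuals smaller, which is what "least squares" means. With more than two independent variables the same idea applies, but the simultaneous equations are normally solved with a numerical library rather than written out by hand.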

Interpreting a Multiple Regression Model

\[\hat{y}=a+b_1x_1+b_2x_2+\ldots + b_nx_n\] When all the independent variables $x_1, x_2, \ldots, x_n$ are equal to zero, the predicted value of $y$ is $a$.

The predicted value of $y$ changes by $b_1$ for each one-unit increase in $x_1$, when all other variables are held constant. Similarly, the predicted value of $y$ changes by $b_2$ for each one-unit increase in $x_2$, when all other variables are held constant. In general, the predicted value of $y$ changes by $b_i$ for each one-unit increase in $x_i$, when all other variables are held constant.
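A short worked step shows why this holds. Comparing the prediction at $x_1 + 1$ with the prediction at $x_1$, with every other variable unchanged, all terms cancel except one: \[\big(a+b_1(x_1+1)+b_2x_2+\ldots+b_nx_n\big)-\big(a+b_1x_1+b_2x_2+\ldots+b_nx_n\big)=b_1\]

As a numerical illustration with made-up coefficients, suppose $\hat{y}=10+2x_1-3x_2$. At $x_1=4$, $x_2=1$ the prediction is $10+8-3=15$; at $x_1=5$, $x_2=1$ it is $10+10-3=17$. The difference is $2$, which is $b_1$.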

Video Examples

Alissa Grant-Walker presents a video on how to interpret multiple regression.
