What Is Multiple Regression? (With Tips for Calculation)

By Indeed Editorial Team

Published 12 October 2022

Multiple regression is a statistical technique that allows us to estimate the relationships between multiple independent variables and a dependent variable. You can do this by fitting a regression model to data, meaning that you estimate the model using a set of data points. Knowing what this entails can help you understand how to properly use the technique and successfully interpret the results. In this article, we define what multiple linear regression is, give some helpful tips for calculating it, explain the five important assumptions behind it and discuss when this model is necessary.

What is multiple regression?

Multiple regression is a statistical technique that you can use to predict the value of a dependent variable based on the values of two or more independent variables. The dependent variable is also called the outcome variable, while the independent variables are also called predictor variables. It's an extension of simple linear regression, which involves only one independent variable.

The coefficients in multiple regression tell you the estimated effect of each independent variable on the dependent variable while controlling for the other independent variables. Specifically, each coefficient is the amount by which the predicted value of the dependent variable changes for a one-unit increase in that independent variable while all of the other independent variables are held constant.
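
To make the idea of holding other variables constant concrete, here's a minimal sketch in Python. The model, the variable names (price, advertising, sales) and the coefficient values are all made up for illustration and don't come from real data.

```python
# Hypothetical fitted model: sales = b0 + b1 * price + b2 * advertising.
# The coefficient values below are invented for illustration only.
COEFFICIENTS = {"intercept": 50.0, "price": -2.0, "advertising": 0.5}


def predict(price, advertising):
    """Return the predicted sales for one observation."""
    return (COEFFICIENTS["intercept"]
            + COEFFICIENTS["price"] * price
            + COEFFICIENTS["advertising"] * advertising)


# Raise price by one unit while holding advertising constant at 100.
baseline = predict(price=10.0, advertising=100.0)
one_more = predict(price=11.0, advertising=100.0)

# The change in the prediction equals the price coefficient.
change = one_more - baseline
print(change)
```

Whatever value you hold advertising at, the change in the prediction is the same, which is what the coefficient for price describes.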

Tips for calculating multiple regression

These are some tips you can follow when calculating a multiple linear regression:

Make sure you understand the formula

It's important to understand the formula for multiple linear regression before you start using it. This makes it easier for you to input the data and interpret the results. The formula is:

Y = b0 + b1X1 + b2X2 + ... + bpXp + e

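The model expresses the dependent variable (Y) as a linear combination of the p predictor variables (X1, X2, ..., Xp) plus an error term. Here, b0 is the intercept and b1, ..., bp are the regression coefficients that you estimate from the data. The error term (e) captures the part of Y that the predictors don't explain. For a fitted model, you estimate it with the residual, which is the difference between the actual value of the dependent variable (Y) and the predicted value (Ŷ).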
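
As a rough illustration of how the coefficients can be estimated, the sketch below applies ordinary least squares by solving the normal equations in plain Python. The function name fit_multiple_regression and the small data set are assumptions made for this example, and the data was built so that Y = 1 + 2X1 + 3X2 holds exactly. In practice, a statistical package does this step for you.

```python
def fit_multiple_regression(rows, y):
    """Estimate b0..bp by ordinary least squares via the normal equations.

    rows: list of observations, each a list of p predictor values.
    y:    list of observed values of the dependent variable.
    """
    # Add a leading 1 to every row so the intercept b0 is estimated too.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Build the augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]

    # Solve it with Gauss-Jordan elimination and partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up data generated from Y = 1 + 2*X1 + 3*X2 with no noise.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]

b0, b1, b2 = fit_multiple_regression(rows, y)
predicted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in rows]
errors = [actual - fitted for actual, fitted in zip(y, predicted)]
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because this data has no noise, the estimated coefficients come out very close to 1, 2 and 3 and the errors are close to zero. With real data, the errors are usually not zero.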

Use a software package

While it's possible to calculate multiple linear regression manually, it's usually much easier to use a statistical software package. This means you don't have to calculate the regression coefficients yourself. Instead, you input the data and the software does the work. You can find many different statistical software packages online.

Choose the right type of regression

There are different types of regression, so it's important to choose the right one for your data. If your dependent variable is continuous, then you can use multiple linear regression. If your dependent variable is categorical, such as a yes-or-no outcome, then logistic regression is usually a better fit. Meanwhile, if you have time series data, then you might want to use a different type of model, such as an autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) model.

Input the data correctly

When inputting the data, it's important to make sure that each variable, whether dependent or independent, is in its own column and each observation is in its own row. You can label the columns so you know which variables are which. This makes it easier to input the data and interpret the results. Also, make sure the data is clean and free of any errors, meaning there are no missing values or incorrect values.
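
The layout and cleaning step might look like the sketch below, which uses Python's built-in csv module. The column names and values are made up, and the helper load_clean_rows is a hypothetical name for this example. It keeps only rows where every value is present and numeric and reports the line numbers of the rest so that you can inspect them.

```python
import csv
import io

# Hypothetical data: each row is one observation, each column one variable.
# "sales" is the dependent variable; "price" and "advertising" are predictors.
RAW = """price,advertising,sales
10,100,80
12,90,
11,abc,75
9,120,92
"""


def load_clean_rows(text, columns):
    """Return (clean_rows, rejected_line_numbers) from CSV text."""
    clean, rejected = [], []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so the first data row is line 2.
    for line_number, record in enumerate(reader, start=2):
        try:
            clean.append([float(record[name]) for name in columns])
        except (TypeError, ValueError):
            # Missing or non-numeric values end up here.
            rejected.append(line_number)
    return clean, rejected


rows, rejected = load_clean_rows(RAW, ["price", "advertising", "sales"])
print(rows)
print(rejected)
```

Dropping incomplete rows is only one way to handle missing values. Whether it's appropriate depends on why the values are missing.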

Use a reliable source

When you're doing research, it's important to use a reliable source, meaning a source that's accurate and up-to-date. This way you can be sure that the information you're using is correct. This is especially true when you're looking for information on statistical methods, such as multiple linear regression. A reliable source might be a peer-reviewed journal article or a reputable website.

Check the results

Once you calculate the multiple linear regression, it's important to check the results. This includes checking the residuals, which are the differences between the actual values and the predicted values. If the residuals are approximately normally distributed and show no clear pattern, then the results of the multiple linear regression are more likely to be reliable.

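You can also check the R-squared, which measures the proportion of the variance in the dependent variable that the independent variables explain. A high R-squared value means that the model fits the data closely. Because R-squared never decreases when you add more independent variables, the adjusted R-squared is often a better guide when you compare models with different numbers of variables.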
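
As a small worked example, the sketch below computes the residuals and the R-squared from a list of actual values and a list of predicted values. The numbers are made up, and the predictions are assumed to come from a least-squares model with an intercept.

```python
def residuals(actual, predicted):
    """Return actual minus predicted for each observation."""
    return [a - p for a, p in zip(actual, predicted)]


def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot, the share of variance explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 14.0, 17.0, 17.0]

print(residuals(actual, predicted))
print(round(r_squared(actual, predicted), 3))
```

Here the squared residuals sum to 4 and the total sum of squares is 40, so the R-squared is 1 - 4/40 = 0.9.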

Interpret the results

It's also important to interpret the results of the multiple linear regression correctly. This includes understanding the coefficients and knowing how they relate to the dependent variable. For example, a positive coefficient means that as the independent variable increases, the predicted value of the dependent variable also increases, provided the other independent variables stay the same. Meanwhile, a negative coefficient means that as the independent variable increases, the predicted value of the dependent variable decreases.

Use caution when extrapolating

One limitation of multiple linear regression is that it's only reliable for predicting values within the range of the data that you used to fit it. This is because the model is only as good as the data that you use to create it. If you extrapolate, meaning you predict for values of the independent variables that fall outside of that range, then the results might not be accurate because you can't tell whether the same relationship holds there.
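
One simple safeguard is to check whether a new observation falls within the range of each predictor in the data that you used to fit the model. The sketch below assumes made-up training data and a hypothetical helper named out_of_range_predictors.

```python
def out_of_range_predictors(training_rows, names, new_row):
    """Return the names of predictors whose new value lies outside the training range."""
    flagged = []
    for index, name in enumerate(names):
        column = [row[index] for row in training_rows]
        if not min(column) <= new_row[index] <= max(column):
            flagged.append(name)
    return flagged


# Made-up training data: one row per observation, columns in the order of names.
training = [[10, 100], [12, 90], [9, 120], [11, 110]]
names = ["price", "advertising"]

print(out_of_range_predictors(training, names, [11, 95]))  # inside both ranges
print(out_of_range_predictors(training, names, [15, 95]))  # price above observed maximum
```

This check looks at each predictor separately. A new observation can be inside every individual range and still be an unusual combination of values, so treat an empty result as a minimum check rather than a guarantee.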

Replicate the results

When you're doing research, it's important to replicate the results, meaning you do the study again to see if you get the same results. This is because the results of a single study can sometimes be due to chance. So, by replicating the results, you can be more confident that they're accurate. Rerunning the analysis on the same data set confirms that the calculation is reproducible, while fitting the model to a different data set shows whether the findings hold more generally.

Remember cause and effect

It's important to remember that correlation doesn't imply causation, meaning that just because two things seem related, it doesn't mean that one thing causes the other. There might be a third variable that's causing both of the variables to increase or decrease. Keeping this in mind when you interpret the results of a multiple linear regression helps you draw the right conclusions from the data.

5 assumptions behind multiple regression

These are five of the assumptions that form the basis of multiple linear regression:

1. Multivariate normality

Multivariate normality means that the residuals of the model are normally distributed, so their distribution follows a bell-shaped curve. This assumption is important because the significance tests and confidence intervals for the coefficients rely on it. You can test this assumption by observing how the residuals are distributed.

If they aren't normally distributed, then the results of the multiple linear regression might not be reliable. You can test the assumption using a normal probability plot or a histogram that has a superimposed normal curve. The normal probability plot is usually more reliable, but the histogram is easier to produce.
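
A normal probability plot pairs each sorted residual with the value you'd expect at that position if the residuals were normally distributed. The sketch below builds those pairs with Python's statistics module and summarises how close they are to a straight line with a correlation. The residuals are made up, and the plotting positions (i - 0.5) / n are one common convention assumed for this example.

```python
from math import sqrt
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each theoretical normal quantile with the matching sorted residual."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n are an assumed convention.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 means a straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    spread = sqrt(sum((x - mean_x) ** 2 for x in xs)
                  * sum((y - mean_y) ** 2 for y in ys))
    return cov / spread


residuals = [0.3, -1.2, 0.1, 1.4, -0.5, 0.8, -0.9]  # made-up residuals
points = normal_plot_points(residuals)
print(round(plot_correlation(points), 3))
```

A correlation close to one suggests that the points lie near a straight line, which is consistent with normally distributed residuals. It's a rough summary and not a formal test, so it's still worth looking at the plot itself.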

2. Independence of observation

The independence of observation assumption means that the observations are independent of one another, so the value of one observation doesn't influence the value of another. This is important because if the observations are related, for example when you measure the same subject repeatedly over time, then it's not possible to accurately assess the impact of the independent variables on the dependent variable.

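You can test this assumption using the Durbin-Watson statistic, which is a test for autocorrelation in the residuals. The statistic ranges from zero to four. A value close to two suggests that there's no autocorrelation and the assumption is met, while values well below two suggest positive autocorrelation and values well above two suggest negative autocorrelation.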
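
The Durbin-Watson statistic is the sum of the squared differences between consecutive residuals divided by the sum of the squared residuals. The sketch below computes it for two made-up sequences of residuals, which are assumed to be in the order that you collected the observations.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals in observation order."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that stay on the same side of zero for several
# observations in a row, which indicates positive autocorrelation.
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

# Made-up residuals that flip sign every time, which indicates
# negative autocorrelation.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(round(durbin_watson(trending), 3))
print(round(durbin_watson(alternating), 3))
```

The first sequence gives 4/6, which is well below two, and the second gives 20/6, which is well above two.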

3. Constant variance of the residuals

This assumption, also called homoscedasticity, means that the residuals have the same variance across all of the predicted values. When the spread of the residuals changes systematically, for example by fanning out as the predicted values increase, it violates the assumption. You can test this assumption using a scatter plot of the residuals against the predicted values. If you see a pattern in the scatter plot, then it means that the assumption isn't met.
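
If you want a rough number to go with the scatter plot, one option is to compare the spread of the residuals for the lower half and the upper half of the predicted values. The sketch below does this with made-up values. The helper name spread_ratio is hypothetical, and this is an informal check for illustration and not a formal statistical test.

```python
def spread_ratio(fitted, residuals):
    """Ratio of mean squared residuals: upper half vs lower half of fitted values."""
    paired = sorted(zip(fitted, residuals))
    half = len(paired) // 2
    low = [e for _, e in paired[:half]]
    high = [e for _, e in paired[-half:]]
    mean_square_low = sum(e * e for e in low) / len(low)
    mean_square_high = sum(e * e for e in high) / len(high)
    return mean_square_high / mean_square_low


fitted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up predicted values

# Residuals with the same spread everywhere.
even = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]

# Residuals that fan out as the predicted values increase.
fanning = [0.5, -0.5, 0.5, -0.5, 2.0, -2.0, 2.0, -2.0]

print(spread_ratio(fitted, even))
print(spread_ratio(fitted, fanning))
```

A ratio near one is consistent with constant variance, while a ratio far from one, such as the second result here, suggests that the spread changes with the predicted values.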

4. Independent variables aren't highly correlated with each other

This assumption requires that the independent variables aren't highly correlated with each other, a problem known as multicollinearity. Highly correlated independent variables can make the coefficients unstable and cause problems with the interpretation of the results. You can test this assumption by using a correlation matrix, which is a table that shows the correlation between each pair of variables. If two independent variables have a correlation coefficient with an absolute value above about 0.7, which is a common rule of thumb, then the assumption may not be met.
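
The sketch below shows one way to carry out this check by calculating the correlation for each pair of independent variables and listing the pairs above the threshold. The variable names and values are made up, and price_with_tax is deliberately an exact multiple of price so that the pair is flagged.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient between two lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.7):
    """Return (name, name, correlation) for pairs whose |r| exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Made-up predictors; price_with_tax is price multiplied by 1.1.
predictors = {
    "price": [10, 12, 9, 11, 13],
    "advertising": [100, 110, 120, 90, 105],
    "price_with_tax": [11.0, 13.2, 9.9, 12.1, 14.3],
}

flagged = highly_correlated_pairs(predictors)
print(flagged)
```

Only the pair of price and price_with_tax is flagged here. A common response is to drop one variable from a flagged pair or combine the two, although the right choice depends on your data.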

5. Linear relationship between the dependent and independent variables

This assumption requires a linear relationship between the dependent variable and each independent variable because multiple linear regression is based on a linear model. You can test this assumption using a scatter plot of the dependent variable against each independent variable.

If you see a linear relationship, then it means that the assumption is met. If you see a non-linear relationship, then it means that the assumption isn't met and you might require a different type of regression.

When is multiple regression necessary?

There are many different fields that use multiple linear regression, such as:

  • Business: It can assess the impact of different factors on sales or profit, such as the impact of price, advertising and location on sales.

  • Medicine: It can assess the impact of different factors on a patient's recovery, such as the impact of age, weight and medications on a patient's health.

  • Social science: It can assess the impact of different factors on social outcomes, such as the impact of income, education and family size on happiness.

