How to interpret regression results in R
The lm() function in R can be used to fit a linear regression model.
The summary() command can then be used to view the regression model’s output.
This article shows you how to interpret each value in the regression output in R.
Example: How to interpret regression results in R
Using hp, drat, and wt as predictor variables and mpg as the response variable, the following code explains how to fit a multiple linear regression model with the built-in mtcars dataset:
#fit regression model with predictors hp, drat, and wt
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

#view model summary
summary(model)
Call:
lm(formula = mpg ~ hp + drat + wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3598 -1.8374 -0.5099  0.9681  5.7078 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.394934   6.156303   4.775 5.13e-05 ***
hp          -0.032230   0.008925  -3.611 0.001178 ** 
drat         1.615049   1.226983   1.316 0.198755    
wt          -3.227954   0.796398  -4.053 0.000364 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.561 on 28 degrees of freedom
Multiple R-squared:  0.8369,	Adjusted R-squared:  0.8194 
F-statistic: 47.88 on 3 and 28 DF,  p-value: 3.768e-11
Here’s how to figure out what each value in the output means.
lm(formula = mpg ~ hp + drat + wt, data = mtcars)
This part reminds us of the regression model formula we used earlier.
As shown, we used mpg as the response variable and hp, drat, and wt as the predictor variables, all taken from the mtcars dataset.
    Min      1Q  Median      3Q     Max 
-3.3598 -1.8374 -0.5099  0.9681  5.7078 
This section summarises the distribution of residuals from the regression model. Recall that a residual is the difference between the observed value and the value predicted by the regression model.
The minimum residual was -3.3598, the median residual was -0.5099, and the maximum residual was 5.7078.
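This residual summary can be reproduced directly from the fitted model with residuals(); a minimal sketch:

```r
# refit the same model from the example above
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# residuals() returns observed mpg minus fitted mpg for each car
res <- residuals(model)

# summary() on the residual vector reproduces the Min/1Q/Median/3Q/Max line
summary(res)

# a residual computed by hand for the first observation
mtcars$mpg[1] - fitted(model)[1]
```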
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.394934   6.156303   4.775 5.13e-05 ***
hp          -0.032230   0.008925  -3.611 0.001178 ** 
drat         1.615049   1.226983   1.316 0.198755    
wt          -3.227954   0.796398  -4.053 0.000364 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The regression model’s estimated coefficients are displayed in this section. These coefficients can be used to create the following estimated regression equation:
mpg = 29.39 - 0.03*hp + 1.62*drat - 3.23*wt
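Rather than plugging numbers into this equation by hand, we can ask R to apply it with predict(). The new car below is a made-up example, not a row from mtcars:

```r
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# hypothetical new car: 150 hp, rear axle ratio 3.5, weight 2,500 lbs (wt = 2.5)
new_car <- data.frame(hp = 150, drat = 3.5, wt = 2.5)

# predict() evaluates the estimated equation:
# mpg = 29.39 - 0.03*hp + 1.62*drat - 3.23*wt
predict(model, newdata = new_car)
```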
The following values are provided for each predictor variable:
Estimate: The estimated coefficient. It tells us the average change in the response variable associated with a one-unit increase in the predictor variable, assuming all other predictor variables are held constant.
Standard Error: The standard error of the coefficient estimate, a measure of the uncertainty in that estimate.
t value: The t-statistic for the predictor variable, calculated as (Estimate) / (Std. Error).
Pr(>|t|): The p-value corresponding to the t-statistic. If this value is less than some alpha level (e.g. 0.05), the predictor variable is deemed statistically significant.
If we use an alpha level of 0.05 to identify which predictors are significant, we can say that hp and wt are statistically significant predictors in this regression model, while drat (p = 0.199) is not.
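This check can be automated by pulling the coefficient table out of the summary object; a minimal sketch:

```r
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# coef(summary()) returns the coefficient table as a numeric matrix
coefs <- coef(summary(model))

# the fourth column holds the p-values, Pr(>|t|)
p_values <- coefs[, "Pr(>|t|)"]

# predictors significant at alpha = 0.05, excluding the intercept
names(which(p_values[-1] < 0.05))
```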
Assessing Model Fit
Residual standard error: 2.561 on 28 degrees of freedom
Multiple R-squared:  0.8369,	Adjusted R-squared:  0.8194 
F-statistic: 47.88 on 3 and 28 DF,  p-value: 3.768e-11
This final section reports several statistics that help us judge how well the regression model fits the data.
The residual standard error tells us how far the observed values fall from the regression line, on average. The lower the value, the better the regression model fits the data.
The degrees of freedom are calculated using the formula n-k-1, where n represents the total number of observations and k represents the number of predictors.
Because mtcars has 32 observations and the regression model has three predictors, the degrees of freedom are 32 - 3 - 1 = 28.
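R stores this same value on the fitted model, so we can check the arithmetic:

```r
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# n - k - 1: 32 observations, 3 predictors
n <- nrow(mtcars)
k <- length(coef(model)) - 1  # number of predictors, excluding the intercept
n - k - 1                     # 28

# df.residual() reports the same residual degrees of freedom
df.residual(model)            # 28
```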
Multiple R-squared is also known as the coefficient of determination. It indicates how much of the variance in the response variable the predictor variables can explain.
This number ranges from 0 to 1. The closer it is to 1, the better the predictor variables can predict the value of the response variable.
Adjusted R-squared: A version of R-squared adjusted for the number of predictors in the model. It is never greater than the multiple R-squared.
The adjusted R-squared is informative when comparing the fit of regression models with different numbers of predictor variables.
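For example, we can compare our three-predictor model against a smaller model that uses wt alone; both values live on the summary object:

```r
# a smaller model with a single predictor, for comparison
small <- lm(mpg ~ wt, data = mtcars)
full  <- lm(mpg ~ hp + drat + wt, data = mtcars)

# adjusted R-squared penalizes extra predictors, making the comparison fair
summary(small)$adj.r.squared
summary(full)$adj.r.squared
```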
The F-statistic indicates whether the regression model with its independent variables fits the data better than a model with no independent variables. In essence, it tests whether the regression model as a whole is useful.
The p-value shown alongside the F-statistic is the p-value for this overall test. If it is less than some significance level (e.g. 0.05), the regression model fits the data better than a model with no predictors.
When building regression models, we hope this p-value is smaller than our significance level, since that indicates the predictor variables are genuinely useful for predicting the value of the response variable.
Whenever we run a regression model, whether simple or multiple, a hypothesis test is performed on the global model.
The null hypothesis is that the dependent variable and the independent variable(s) have no relationship, while the alternative hypothesis is that they do.
In other words, the null hypothesis states that every coefficient in your model is zero, while the alternative states that at least one of them is not.
The F-statistic and overall p-value help us determine the outcome of this test. Looking at the F-statistic alone can be misleading, because its size depends on how many independent variables are in the model.
When there are many independent variables, an F-statistic close to 1 can still produce a p-value that rejects the null hypothesis; for smaller models, a larger F-statistic is needed to reject it.
A better approach is to look at the p-value associated with the F-statistic. In practice, a p-value below 0.05 usually indicates that at least one coefficient in your model is non-zero.
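The F-statistic and its p-value can also be extracted programmatically. Note that summary() stores the F-statistic but not the overall p-value, so we recompute the p-value from the upper tail of the F distribution:

```r
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# fstatistic holds the F value plus the numerator and denominator df
f <- summary(model)$fstatistic

# upper-tail probability of the F distribution gives the overall p-value
p_overall <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
unname(p_overall)  # ~3.77e-11, matching the summary output
```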