Ordinary Least Squares Regression Alternative in R
When there are outliers or influential observations in the dataset we're working with, we can use robust regression as an alternative to ordinary least squares regression.
In R, we can use the rlm() function from the MASS package to perform robust regression.
The following step-by-step example illustrates how to perform robust regression on a given dataset in R.
Step 1: Create the Data
Let’s start by making a fictitious dataset to work with:
Create the data:
df <- data.frame(x1=c(3, 2, 3, 3, 3, 4, 6, 7, 9, 6, 11, 16, 16, 11, 11, 21, 22, 23, 28, 21),
                 x2=c(7, 7, 4, 129, 13, 24, 17, 14, 20, 11, 25, 26, 26, 16, 17, 19, 30, 31, 31, 32),
                 y=c(12, 150, 29, 19, 24, 20, 125, 29, 35, 42, 44, 60, 41, 43, 63, 84, 81, 97, 49, 80))
Let’s view the first six rows of data
head(df)
  x1 x2   y
1  3  7  12
2  2  7 150
3  3  4  29
4  3 29  19
5  3 13  24
6  4 24  20
Step 2: Fit an Ordinary Least Squares Regression Model
Next, fit an ordinary least squares regression model and generate a plot of its standardized residuals.
In practice, any observation whose standardized residual has an absolute value larger than 3 is frequently regarded as an outlier.
Now we can fit a regression model using ordinary least squares
ols <- lm(y~x1+x2, data=df)
ols
Call:
lm(formula = y ~ x1 + x2, data = df)

Coefficients:
(Intercept)           x1           x2
    49.2703       1.2364      -0.2762
Plot the standardized residuals against the y-values:
plot(df$y, rstandard(ols), ylab='Standardized Residuals', xlab='y')
abline(h=0)
We can see from the plot that there are two observations with standardized residuals of about 3.
This indicates that there are one or two probable outliers in the dataset and that performing robust regression instead would be beneficial.
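If we want to see exactly which observations those are, one quick check is to list the rows whose standardized residuals exceed a chosen cutoff; the 2.5 used in this sketch is our own choice, set slightly below the usual rule of thumb of 3 so that borderline points also show up:

# flag observations with unusually large standardized residuals
# (the 2.5 cutoff here is an assumption; 3 is the more common rule of thumb)
which(abs(rstandard(ols)) > 2.5)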
Step 3: Fit a Robust Regression Model
Next, we’ll fit a robust regression model with the rlm() function:
library(MASS)
Now we can fit a robust regression model
robustre <- rlm(y~x1+x2, data=df)
robustre
Call:
rlm(formula = y ~ x1 + x2, data = df)
Converged in 15 iterations

Coefficients:
(Intercept)          x1          x2
20.34227587  2.70504796 -0.08318438

Degrees of freedom: 20 total; 17 residual
Scale estimate: 12.2
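One way to see what the robust fit is doing differently is to inspect the weights that rlm() assigned during its iteratively reweighted fitting. As a rough sketch, observations the method treats as outliers receive weights well below 1:

# inspect the final weights from the robust fit (stored in the w component);
# downweighted observations are the likely outliers
round(robustre$w, 2)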
To see whether the robust regression model fits the data better than the OLS model, we can calculate the residual standard error of each model.
In a regression model, the residual standard error (RSE) measures the standard deviation of the residuals. The lower the RSE, the better the model fits the data.
The code below demonstrates how to compute the RSE for each model:
Find the residual standard error of the OLS model:
summary(ols)$sigma

[1] 36.65571
Find the residual standard error of the robust regression model:
summary(robustre)$sigma

[1] 12.20205
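As a quick sanity check, and assuming the usual definition of the RSE as the square root of the residual sum of squares divided by the residual degrees of freedom, the OLS value above can also be reproduced by hand:

# manual check of the OLS residual standard error:
# sqrt(residual sum of squares / residual degrees of freedom)
sqrt(sum(residuals(ols)^2) / df.residual(ols))
# this should agree with summary(ols)$sigma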
The RSE for the robust regression model is substantially lower than the RSE for the ordinary least squares regression model, indicating that the robust regression model fits the data better.