Ordinary Least Squares Regression Alternative in R

by finnstats

When the dataset we’re working with contains outliers or influential observations, we can use robust regression as an alternative to ordinary least squares (OLS) regression, because it is less sensitive to those points.

In R, we may use the rlm() function from the MASS package to do robust regression.

For a given dataset, the following step-by-step example illustrates how to do robust regression in R.

Step 1: Create the Data

Let’s start by making a fictitious dataset to work with:

# create the dataset

df <- data.frame(x1=c(3, 2, 3, 3, 3, 4, 6, 7, 9, 6,
                      11, 16, 16, 11, 11, 21, 22, 23, 28, 21),
                 x2=c(7, 7, 4, 29, 13, 24, 17, 14, 20, 11,
                      25, 26, 26, 16, 17, 19, 30, 31, 31, 32),
                 y=c(12, 150, 29, 19, 24, 20, 125, 29, 35, 42,
                     44, 60, 41, 43, 63, 84, 81, 97, 49, 80))

Let’s view the first six rows of the data:

head(df)

  x1 x2   y
1  3  7  12
2  2  7 150
3  3  4  29
4  3 29  19
5  3 13  24
6  4 24  20

Step 2: Fit an Ordinary Least Squares Regression Model

Next, we’ll fit an ordinary least squares regression model and plot its standardized residuals.

In practice, an observation whose standardized residual has an absolute value larger than 3 is often regarded as an outlier.

First, fit the regression model using ordinary least squares:

ols <- lm(y~x1+x2, data=df)
ols

Call:
lm(formula = y ~ x1 + x2, data = df)
Coefficients:
(Intercept)           x1           x2 
    49.2703       1.2364      -0.2762

# plot the standardized residuals against the y-values

plot(df$y, rstandard(ols), ylab='Standardized Residuals', xlab='y')
abline(h=0)

We can see from the plot that there are two observations with standardized residuals of about 3.

This indicates that there are two probable outliers in the dataset and that robust regression may be a better choice.
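Instead of reading the values off the plot, the standardized residuals can also be ranked directly. The following is only a sketch: it rebuilds the Step 1 data (taking the fourth x2 value as 29, as shown in the head() output), and the names cutoff, flagged, and ranked are illustrative, not part of the original example.

```r
# Data from Step 1; the fourth x2 value is taken as 29, as in the head() output.
df <- data.frame(x1 = c(3, 2, 3, 3, 3, 4, 6, 7, 9, 6,
                        11, 16, 16, 11, 11, 21, 22, 23, 28, 21),
                 x2 = c(7, 7, 4, 29, 13, 24, 17, 14, 20, 11,
                        25, 26, 26, 16, 17, 19, 30, 31, 31, 32),
                 y  = c(12, 150, 29, 19, 24, 20, 125, 29, 35, 42,
                        44, 60, 41, 43, 63, 84, 81, 97, 49, 80))

# Fit the OLS model and extract its standardized residuals
ols <- lm(y ~ x1 + x2, data = df)
std_res <- rstandard(ols)

# Rule-of-thumb cutoff from the text (an assumption; adjust as needed)
cutoff <- 3
flagged <- which(abs(std_res) > cutoff)  # may be empty if nothing exceeds the cutoff
flagged

# Rank all observations by the absolute size of their standardized residual
ranking <- order(abs(std_res), decreasing = TRUE)
ranked <- data.frame(obs = ranking,
                     df[ranking, ],
                     std_res = round(std_res[ranking], 2),
                     row.names = NULL)
head(ranked, 3)
```

Because the plot shows values of "about 3", a strict cutoff may flag fewer points than the eye does, so the ranked table is usually the more informative of the two outputs.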

Step 3: Fit a Robust Regression Model

Next, we’ll fit a robust regression model with the rlm() function:

library(MASS)

Now we can fit a robust regression model:

robustre <- rlm(y~x1+x2, data=df)
robustre

Call:
rlm(formula = y ~ x1 + x2, data = df)
Converged in 15 iterations
Coefficients:
(Intercept)          x1          x2
20.34227587  2.70504796 -0.08318438
Degrees of freedom: 20 total; 17 residual
Scale estimate: 12.2
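By default, rlm() performs M-estimation with Huber weights, fitted by iteratively reweighted least squares, and the final weights are stored in the w component of the fitted object. The sketch below shows one way to see which observations were down-weighted. It rebuilds the Step 1 data (with the fourth x2 value taken as 29, as in the head() output), raises maxit above its default of 20 purely as a precaution, and uses the illustrative names weights_tbl and downweighted.

```r
library(MASS)

# Data from Step 1; the fourth x2 value is taken as 29, as in the head() output.
df <- data.frame(x1 = c(3, 2, 3, 3, 3, 4, 6, 7, 9, 6,
                        11, 16, 16, 11, 11, 21, 22, 23, 28, 21),
                 x2 = c(7, 7, 4, 29, 13, 24, 17, 14, 20, 11,
                        25, 26, 26, 16, 17, 19, 30, 31, 31, 32),
                 y  = c(12, 150, 29, 19, 24, 20, 125, 29, 35, 42,
                        44, 60, 41, 43, 63, 84, 81, 97, 49, 80))

# Robust fit with the default Huber psi function; maxit is raised as a precaution
robustre <- rlm(y ~ x1 + x2, data = df, maxit = 50)

# Final IWLS weights: 1 means full weight, smaller values mean down-weighted
weights_tbl <- data.frame(obs      = seq_len(nrow(df)),
                          y        = df$y,
                          residual = round(residuals(robustre), 2),
                          weight   = round(robustre$w, 3))

# Show the five observations that received the least weight
downweighted <- head(weights_tbl[order(weights_tbl$weight), ], 5)
downweighted
```

Observations with large residuals receive weights below 1, which is how the robust fit limits the pull of suspected outliers on the coefficients.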

To see whether the robust regression model fits the data better than the OLS model, we can compare the residual standard error of each model.

The residual standard error (RSE) estimates the standard deviation of a model’s residuals. The lower the RSE, the more closely the model fits the data.
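For the OLS model, this quantity can be reproduced by hand as the square root of the residual sum of squares divided by the residual degrees of freedom. The sketch below checks that against the value R reports. It rebuilds the Step 1 data (with the fourth x2 value taken as 29, as in the head() output), and rss and rse_manual are illustrative names.

```r
# Data from Step 1; the fourth x2 value is taken as 29, as in the head() output.
df <- data.frame(x1 = c(3, 2, 3, 3, 3, 4, 6, 7, 9, 6,
                        11, 16, 16, 11, 11, 21, 22, 23, 28, 21),
                 x2 = c(7, 7, 4, 29, 13, 24, 17, 14, 20, 11,
                        25, 26, 26, 16, 17, 19, 30, 31, 31, 32),
                 y  = c(12, 150, 29, 19, 24, 20, 125, 29, 35, 42,
                        44, 60, 41, 43, 63, 84, 81, 97, 49, 80))

ols <- lm(y ~ x1 + x2, data = df)

# Residual sum of squares
rss <- sum(residuals(ols)^2)

# RSE = sqrt(RSS / residual degrees of freedom); here 20 observations - 3 coefficients
rse_manual <- sqrt(rss / df.residual(ols))

# Compare the hand-computed value with the one stored by summary()
all.equal(rse_manual, summary(ols)$sigma)
```

For the rlm() fit, the sigma value returned by summary() appears to be the robust scale estimate shown as "Scale estimate" in the printed model, so it is not computed with this same formula.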

The code below demonstrates how to compute the RSE for each model:

# find the residual standard error of the OLS model

summary(ols)$sigma
[1] 36.65571

# find the residual standard error of the robust model

summary(robustre)$sigma
[1] 12.20205

The RSE for the robust regression model (12.2) is substantially lower than the RSE for the ordinary least squares regression model (36.66), which suggests that the robust regression model fits the bulk of the data better.
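To see the two fits next to each other, the coefficients and the sigma values can be collected into one small table. This is a sketch under the same assumptions as the steps above: the Step 1 data is rebuilt with the fourth x2 value taken as 29 (as in the head() output), maxit is raised as a precaution, and comparison is an illustrative name.

```r
library(MASS)

# Data from Step 1; the fourth x2 value is taken as 29, as in the head() output.
df <- data.frame(x1 = c(3, 2, 3, 3, 3, 4, 6, 7, 9, 6,
                        11, 16, 16, 11, 11, 21, 22, 23, 28, 21),
                 x2 = c(7, 7, 4, 29, 13, 24, 17, 14, 20, 11,
                        25, 26, 26, 16, 17, 19, 30, 31, 31, 32),
                 y  = c(12, 150, 29, 19, 24, 20, 125, 29, 35, 42,
                        44, 60, 41, 43, 63, 84, 81, 97, 49, 80))

# Fit both models on the same data
ols      <- lm(y ~ x1 + x2, data = df)
robustre <- rlm(y ~ x1 + x2, data = df, maxit = 50)

# One row per model: coefficients followed by the sigma reported by summary()
comparison <- rbind(OLS    = c(coef(ols),      sigma = summary(ols)$sigma),
                    Robust = c(coef(robustre), sigma = summary(robustre)$sigma))
round(comparison, 3)
```

Large differences between the two rows are a sign that a few observations are strongly influencing the OLS estimates.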
