Homoscedasticity in Regression Analysis

by finnstats

The Goldfeld–Quandt test checks for homoscedasticity (constant error variance) in regression analysis.

This is accomplished by separating a dataset into two portions or groups, which is why the test is also known as a two-group test.

The Goldfeld–Quandt test is one of two tests proposed by Stephen Goldfeld and Richard Quandt in a paper published in 1965.

Heteroscedasticity in a regression model refers to the unequal scatter of residuals at different levels of a response variable.

If heteroscedasticity is present, it violates one of the essential assumptions of linear regression: that the residuals have the same spread at each level of the response variable.

This article will show you how to use R to perform the Goldfeld-Quandt test to see if a regression model has heteroscedasticity.

Building a Regression Model is the first step.

First, we’ll use R’s built-in mtcars dataset to create a multiple linear regression model. We can draw on one of our previous posts to identify the best regression model for this data:

model <- lm(mpg ~ wt + qsec + am, data = mtcars)

Let’s view the model summary:

summary(model)

Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Residuals:
    Min      1Q  Median      3Q     Max
-3.4811 -1.5555 -0.7257  1.4110  4.6610
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   9.6178     6.9596   1.382 0.177915   
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
am            2.9358     1.4109   2.081 0.046716 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,      Adjusted R-squared:  0.8336
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

Now we can perform the Goldfeld-Quandt test, using the gqtest() function from the lmtest package, to see whether heteroscedasticity is present.

Hypothesis

Null (H0): Heteroscedasticity is not present.

Alternative (H1): Heteroscedasticity is present.

The syntax for this function is as follows:

gqtest(model, order.by, data, fraction)

where:

model: The linear regression model fitted with lm().

order.by: The predictor variable(s) used to order the observations before they are split.

data: The name of the dataset.

fraction: The number of central observations to omit. A value below 1 is interpreted as a proportion of the observations.

The Goldfeld-Quandt test is performed by ordering the observations, removing a certain number of them from the center of the dataset, and then comparing the spread of residuals between the two groups on either side of the removed observations.

Details:

The Goldfeld-Quandt test compares the residual variances of two submodels, separated at a specified breakpoint, and rejects the null hypothesis if the variances differ.

Under H0, the test statistic of the Goldfeld-Quandt test follows an F distribution with the degrees of freedom given in the parameter component of the result.
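To make the mechanics concrete, here is a minimal base R sketch of the calculation described above. It assumes the observations are ordered by wt alone and that the 25 remaining observations are split into groups of 12 and 13. These choices and all variable names are illustrative assumptions, so the result should not be expected to match the gqtest() output exactly.

```r
# Manual Goldfeld-Quandt sketch in base R.
# Assumption: observations are ordered by wt only.
dat <- mtcars[order(mtcars$wt), ]
n <- nrow(dat)        # 32 observations
drop_n <- 7           # central observations to omit

# Split what remains into a lower and an upper group
n_low <- floor((n - drop_n) / 2)          # 12 observations
low   <- dat[1:n_low, ]
high  <- dat[(n_low + drop_n + 1):n, ]    # 13 observations

# Fit the same regression separately in each group
fit_low  <- lm(mpg ~ wt + qsec + am, data = low)
fit_high <- lm(mpg ~ wt + qsec + am, data = high)

# Residual variance = residual sum of squares / residual degrees of freedom
df_low  <- df.residual(fit_low)
df_high <- df.residual(fit_high)
gq_stat <- (deviance(fit_high) / df_high) / (deviance(fit_low) / df_low)

# One-sided p-value: is the variance larger in the upper group?
p_val <- pf(gq_stat, df_high, df_low, lower.tail = FALSE)

c(GQ = gq_stat, df1 = df_high, df2 = df_low, p.value = p_val)
```

A large ratio, and therefore a small p-value, would suggest that the residual variance grows with the ordering variable.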

Value

The function returns the following components

statistic	the value of the test statistic.
p.value	the p-value of the test.
parameter	degrees of freedom.
method	a character string indicating what type of test was performed.
data.name	a character string giving the name(s) of the data.

A common choice is to discard roughly 20% of the total observations. Since mtcars has 32 observations, 20% is 6.4, so we omit the central 7 observations in this example.
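The arithmetic behind that choice can be checked quickly in R. This is only a sketch: the 20% figure is the rule of thumb mentioned above, and rounding up with ceiling() is an assumption made here to arrive at 7.

```r
n_obs  <- nrow(mtcars)            # 32 observations in mtcars
n_drop <- ceiling(0.20 * n_obs)   # 20% of 32 is 6.4, rounded up
n_drop
```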

library(lmtest)

Now we can perform the Goldfeld-Quandt test:

gqtest(model, order.by = ~wt + qsec + am, data = mtcars, fraction = 7)

            Goldfeld-Quandt test
data:  model
GQ = 1.6434, df1 = 9, df2 = 8, p-value = 0.2477
alternative hypothesis: variance increases from segment 1 to 2

The test statistic is 1.6434 and the corresponding p-value is 0.2477.

Since the p-value is not less than 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
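The same decision can be made in code by pulling the components out of the object that gqtest() returns. The sketch below assumes the lmtest package is installed. The names gq, alpha, and reject_h0 are illustrative, and the 0.05 threshold is the one used in the text.

```r
library(lmtest)

model <- lm(mpg ~ wt + qsec + am, data = mtcars)
gq <- gqtest(model, order.by = ~wt + qsec + am, data = mtcars, fraction = 7)

gq$statistic   # value of the test statistic
gq$parameter   # degrees of freedom (df1, df2)
gq$p.value     # p-value of the test

alpha <- 0.05                       # significance level used in the text
reject_h0 <- gq$p.value < alpha     # TRUE would indicate heteroscedasticity
reject_h0
```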

Because the Goldfeld-Quandt test fails to reject the null hypothesis here, we have no evidence of heteroscedasticity and can go on to interpret the original regression output.

If we had observed heteroscedasticity in the model, we could either transform the response variable or use weighted regression.

You might try transforming the response variable by taking its log, square root, or cube root. This often reduces or removes heteroscedasticity.
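As a rough illustration of the transformation approach, the sketch below refits the model with log(mpg) as the response and repeats the test. It assumes the lmtest package is installed. The names log_model and gq_log are illustrative, and the log is only one of the transformations mentioned above.

```r
library(lmtest)

# Refit the same predictors with a log-transformed response
log_model <- lm(log(mpg) ~ wt + qsec + am, data = mtcars)

# Repeat the Goldfeld-Quandt test on the transformed model
gq_log <- gqtest(log_model, order.by = ~wt + qsec + am,
                 data = mtcars, fraction = 7)
gq_log
```

Keep in mind that the coefficients of the transformed model describe changes in log(mpg), not in mpg itself.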

In weighted regression, each data point is given a weight based on the variance of its fitted value. Observations with higher variance receive smaller weights, which reduces the influence of their squared residuals on the fit.

Weighted regression can alleviate the problem of heteroscedasticity when the appropriate weights are employed.
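One possible way to build such weights in base R is sketched below. The weight model here is an assumption, not a prescribed method: it regresses the absolute residuals on the fitted values and uses the inverse of the squared prediction as the weight. The names spread_fit, wts, and wls_model are illustrative.

```r
# Ordinary least squares fit, as before
model <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Assumption: residual spread is modeled as a linear function of the fitted values
spread_fit <- lm(abs(residuals(model)) ~ fitted(model))

# Weight = 1 / estimated variance, so noisier observations count less
wts <- 1 / fitted(spread_fit)^2

# Weighted least squares fit using those weights
wls_model <- lm(mpg ~ wt + qsec + am, data = mtcars, weights = wts)
summary(wls_model)
```

Whether these weights are appropriate depends on how the variance actually changes in your data, so it is worth re-checking the residuals after refitting.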

Note:

No test can establish the presence or absence of heteroscedasticity with certainty. A test can only provide evidence for or against it.

So, if the null hypothesis is rejected, we may argue that heteroscedasticity is likely to exist. If it is not rejected, we simply have no evidence that heteroscedasticity is present.
