Homoscedasticity in Regression Analysis
The Goldfeld–Quandt test checks for homoscedasticity in regression analysis.
This is accomplished by separating a dataset into two portions or groups, which is why the test is also known as a two-group test.
The Goldfeld–Quandt test is one of two tests proposed by Stephen Goldfeld and Richard Quandt in a paper published in 1965.
Heteroscedasticity in a regression model refers to the unequal scatter of residuals at different levels of a response variable.
If there is heteroscedasticity, it violates one of the essential assumptions of linear regression: that the residuals have the same variance at each level of the response variable.
This article will show you how to use R to perform the Goldfeld-Quandt test to see if a regression model has heteroscedasticity.
Building a Regression Model is the first step.
One of our previous posts explains how to identify the best regression model; here, we’ll use R’s built-in mtcars dataset to fit a multiple linear regression model:
model <- lm(mpg ~ wt + qsec + am, data = mtcars)
Let’s view the model summary:
summary(model)
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4811 -1.5555 -0.7257  1.4110  4.6610

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.6178     6.9596   1.382 0.177915
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
am            2.9358     1.4109   2.081 0.046716 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
Next, we perform the Goldfeld-Quandt test with the gqtest() function from the lmtest package to see whether heteroscedasticity is present. The test uses the following hypotheses:
Null (H0): Heteroscedasticity is not present.
Alternative (H1): Heteroscedasticity is present.
The syntax for this function is as follows:
gqtest(model, order.by, data, fraction)
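model: The linear regression model created by lm().
order.by: The variable by which the observations are ordered before the split, given as a vector or a one-sided formula. The lmtest documentation describes a formula with a single explanatory variable; if order.by is left as NULL, the observations are assumed to be ordered already.
data: The name of the dataset.
fraction: The number of central observations to omit from the dataset. A value below 1 is treated as a proportion of the total number of observations.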
The Goldfeld-Quandt test is performed by eliminating a certain number of observations from the dataset’s center, then comparing the spread of residuals between the two datasets on either side of the central observations.
In other words, the test compares the residual variances of two submodels split at a defined breakpoint and rejects the null hypothesis if the variances differ.
Under H0, the test statistic follows an F distribution with the degrees of freedom reported in the parameter component of the result.
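To make the mechanics concrete, here is a minimal base-R sketch of the calculation described above. It rests on several assumptions that are ours, not part of gqtest(): it uses a simplified one-predictor model (mpg ~ wt), orders the data by wt, and keeps the first 12 and last 13 rows after omitting 7 central ones. It illustrates the idea and is not a replacement for gqtest().

```r
# Illustrative sketch only: simplified model and hand-picked split (assumptions).
dat <- mtcars[order(mtcars$wt), ]   # order observations by wt
n <- nrow(dat)                      # 32 observations

# Omit 7 central rows: keep rows 1-12 and rows 20-32.
lower <- dat[1:12, ]
upper <- dat[20:n, ]

# Fit the same regression separately in each group.
fit_lower <- lm(mpg ~ wt, data = lower)
fit_upper <- lm(mpg ~ wt, data = upper)

# Residual variance estimate per group: RSS / residual degrees of freedom.
s2_lower <- sum(resid(fit_lower)^2) / df.residual(fit_lower)
s2_upper <- sum(resid(fit_upper)^2) / df.residual(fit_upper)

# F statistic and one-sided p-value (alternative: variance increases).
gq_stat <- s2_upper / s2_lower
p_value <- pf(gq_stat, df.residual(fit_upper), df.residual(fit_lower),
              lower.tail = FALSE)

gq_stat
p_value
```

A large ratio, and therefore a small p-value, would suggest that the residual variance is larger in the upper group than in the lower one.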
The function returns the following components
|statistic||the value of the test statistic.|
|p.value||the p-value of the test.|
|parameter||degrees of freedom.|
|method||a character string indicating what type of test was performed.|
|data.name||a character string giving the name(s) of the data.|
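As a short sketch of how these components can be read from the result, the example below stores the test in a variable (the name gq is our own choice) and accesses each element with `$`. It assumes the lmtest package is installed and reuses the model and call from this article.

```r
library(lmtest)

# Same model and test call as used in this article.
model <- lm(mpg ~ wt + qsec + am, data = mtcars)
gq <- gqtest(model, order.by = ~wt + qsec + am, data = mtcars, fraction = 7)

gq$statistic   # value of the test statistic
gq$p.value     # p-value of the test
gq$parameter   # degrees of freedom (df1, df2)
gq$method      # description of the test performed
gq$data.name   # name of the data/model
```

Storing the result this way makes it easy to use the p-value in later code instead of reading it off the printed output.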
We usually choose to discard roughly 20% of the total observations. We can opt to eliminate the center 7 observations in this example because mtcars has 32 total observations.
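The arithmetic behind that choice can be sketched as follows. Rounding up with ceiling() is our assumption; it is consistent with the 7 observations used here, but the 20% rule is only a rough guideline.

```r
n <- nrow(mtcars)            # 32 observations
n_drop <- ceiling(0.2 * n)   # 20% of 32 is 6.4, rounded up
n_drop
```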
Now we can perform the Goldfeld-Quandt test:
library(lmtest)
gqtest(model, order.by = ~wt + qsec + am, data = mtcars, fraction = 7)
Goldfeld-Quandt test

data:  model
GQ = 1.6434, df1 = 9, df2 = 8, p-value = 0.2477
alternative hypothesis: variance increases from segment 1 to 2

The test statistic is 1.6434 and the corresponding p-value is 0.2477.
Since the p-value is not less than 0.05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that heteroscedasticity is present in the regression model.
When the Goldfeld-Quandt test fails to reject the null hypothesis, as in this case, we have no evidence of heteroscedasticity and can go on to interpret the output of the original regression.
If we had observed heteroscedasticity in the model, we could either transform the response variable or use weighted regression.
You might try transforming the response variable by taking its log, square root, or cube root. This often reduces heteroscedasticity, although it is not guaranteed to remove it.
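As a minimal sketch of the transformation approach, the example below refits the article's model with log(mpg) as the response. The variable name model_log is our own, and the log is just one of the transformations mentioned; whether it helps depends on the data.

```r
# Refit the same predictors with a log-transformed response.
model_log <- lm(log(mpg) ~ wt + qsec + am, data = mtcars)

summary(model_log)
```

After refitting, the Goldfeld-Quandt test can be run again on the transformed model to check whether the evidence of heteroscedasticity has weakened. Keep in mind that the coefficients are now on the log scale.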
In weighted regression, each data point is given a weight based on the variance of its fitted value. Data points with higher variances receive smaller weights, which shrinks their contribution to the sum of squared residuals.
Weighted regression can alleviate the problem of heteroscedasticity when the appropriate weights are employed.
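One common way to obtain weights, shown here only as a sketch, is to model the absolute residuals as a function of the fitted values and use the inverse of the squared predictions as weights. This choice of weights is an assumption on our part; other variance models may suit your data better. The names var_fit, wts, and model_wls are illustrative.

```r
# Original (unweighted) model from this article.
model <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Assumed variance model: absolute residuals regressed on fitted values.
var_fit <- lm(abs(resid(model)) ~ fitted(model))

# Weights: inverse of the squared predicted residual spread.
wts <- 1 / fitted(var_fit)^2

# Weighted least squares fit with the same predictors.
model_wls <- lm(mpg ~ wt + qsec + am, data = mtcars, weights = wts)

summary(model_wls)
```

Observations with a larger predicted spread get smaller weights, so they influence the fitted coefficients less.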
No test can determine in a black-and-white manner whether heteroscedasticity is present. A test can only tell us how strong the evidence is.
So, if the null hypothesis is rejected, we may argue that heteroscedasticity is very likely to exist; if it is not rejected, we can only say that we found no evidence of heteroscedasticity.