Analysis of Variance in R: 3 Steps

by finnstats

After completing this tutorial, you will be able to identify when an Analysis of Variance (ANOVA) test is appropriate for your data analysis.

You’ll also learn how to interpret the results of an ANOVA F-test.

Let’s imagine you want to look at a categorical variable and see how it relates to other variables.

Take, for example, the Airline dataset.

Step 1: Loading Data

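# tidyverse attaches dplyr and ggplot2, so one library() call is enough
library(tidyverse)

# Read the airline data; header = TRUE treats the first row as column names
data <- read.csv("D:/RStudio/Airlinedata.csv", header = TRUE)
head(data)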

A question you might wish to consider is: “How do the different categories of the Reporting_Airline feature (a categorical variable) affect flight delays?”

The ANOVA method can be used to evaluate categorical variables like “Reporting_Airline.”

ANOVA tests whether the mean of a numeric variable differs between two or more groups of a categorical variable.

You may use ANOVA to see if there is any difference in the average flight delays for the different airlines in the Airline dataset.

Step 2: Null Hypothesis

The null hypothesis for ANOVA is that the mean (here, the average arrival delay of each reporting airline) is the same for all groups.

The alternative, or research, hypothesis is that at least one group mean differs from the others.

Here we are going to work through two two-group cases: a comparison of AA vs AS and a comparison of AA vs PA (1).

In the first case, the null hypothesis is that the mean values of ‘AA’ and ‘AS’ are the same, while the alternative hypothesis is that they are not.

The F-test score and the p-value are returned by the ANOVA test.

The F-test score is the ratio of the variation between the sample group means to the variation within the sample groups.
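Written out, the ratio for a one-way ANOVA looks as follows. The symbols are notation assumed here for illustration, not taken from the tutorial: k is the number of groups, N the total number of observations, n_i the size of group i, \bar{y}_i the mean of group i, and \bar{y} the overall mean.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{\displaystyle \sum_{i=1}^{k} n_i \,(\bar{y}_i - \bar{y})^2 \,/\, (k - 1)}
         {\displaystyle \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \,/\, (N - k)}
```

The numerator and denominator correspond to the "Mean Sq" entries that summary() prints for the grouping variable and for the residuals, respectively.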

The p-value indicates whether or not the outcome is statistically significant.

In general, you can consider a difference between group means to be statistically significant if the p-value is less than 0.05.

A high F-test score suggests that the group means differ, while an F-test score near or below 1 is consistent with no difference between the groups.
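To make the F-test score less of a black box, here is a minimal sketch that computes it by hand and checks it against aov(). The data are simulated, and the airline labels "X1" and "X2" and the column names are hypothetical; none of this comes from the Airline dataset.

```r
# Simulated arrival delays (minutes) for two hypothetical airlines.
# These values are made up for illustration only.
set.seed(42)
sim <- data.frame(
  airline = rep(c("X1", "X2"), each = 50),
  delay   = c(rnorm(50, mean = 5, sd = 30), rnorm(50, mean = 20, sd = 30))
)

grand_mean  <- mean(sim$delay)
group_means <- tapply(sim$delay, sim$airline, mean)
group_sizes <- tapply(sim$delay, sim$airline, length)

k <- length(group_means)  # number of groups
n <- nrow(sim)            # total number of observations

# Between-group mean square (df = k - 1)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# Within-group mean square (df = n - k)
ss_within <- sum((sim$delay - group_means[sim$airline])^2)
ms_within <- ss_within / (n - k)

# F statistic and its upper-tail p-value
f_manual <- ms_between / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same quantities taken from aov() for comparison
fit   <- aov(delay ~ airline, data = sim)
f_aov <- summary(fit)[[1]][["F value"]][1]
p_aov <- summary(fit)[[1]][["Pr(>F)"]][1]

print(c(manual = f_manual, aov = f_aov))
```

The two printed values should agree up to floating-point error, which shows that the F-test score is simply the between-group mean square divided by the within-group mean square.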

Step 3: ANOVA comparison

The aov() function in the stats package can be used to perform the ANOVA test.

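# Keep only the delay and airline columns, restricted to AA and AS
data1 <- data %>%
  select(ArrDelay, Reporting_Airline) %>%
  filter(Reporting_Airline == 'AA' | Reporting_Airline == 'AS')

# One-way ANOVA of arrival delay by airline
AOV <- aov(ArrDelay ~ Reporting_Airline, data = data1)
summary(AOV)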

                    Df  Sum Sq Mean Sq F value Pr(>F)
Reporting_Airline    1     126   125.7    0.13  0.718
Residuals         1139 1097707   963.7

aov() calculates the ANOVA results from the arrival delay data of the two airline groups you want to compare.

Because the F-test score of 0.13 is less than 1 and the p-value of 0.718 is greater than 0.05, the mean arrival delays of “AA” and “AS” are not significantly different.
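When only two groups are compared, a one-way ANOVA is equivalent to a two-sample t-test with pooled variance: the F-test score equals the squared t statistic and the p-values match. The sketch below illustrates this on simulated data; the airline labels "X1" and "X2" and the delay values are hypothetical, not taken from the Airline dataset.

```r
# Simulated arrival delays (minutes) for two hypothetical airlines.
set.seed(1)
sim <- data.frame(
  airline = rep(c("X1", "X2"), times = c(40, 60)),
  delay   = c(rnorm(40, mean = 10, sd = 25), rnorm(60, mean = 12, sd = 25))
)

# One-way ANOVA with two groups
fit     <- aov(delay ~ airline, data = sim)
f_value <- summary(fit)[[1]][["F value"]][1]
p_anova <- summary(fit)[[1]][["Pr(>F)"]][1]

# Two-sample t-test assuming equal variances (pooled), as ANOVA does
tt      <- t.test(delay ~ airline, data = sim, var.equal = TRUE)
t_value <- unname(tt$statistic)

print(c(F = f_value, t_squared = t_value^2))
print(c(p_anova = p_anova, p_ttest = tt$p.value))
```

Note that t.test() defaults to the Welch test (var.equal = FALSE), which does not reproduce the ANOVA result exactly; the equivalence holds only for the pooled-variance version.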

A similar analysis can be applied to “AA” and “PA (1).”

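# Same comparison, now restricted to AA and PA (1)
data1 <- data %>%
  select(ArrDelay, Reporting_Airline) %>%
  filter(Reporting_Airline == 'AA' | Reporting_Airline == 'PA (1)')

# One-way ANOVA of arrival delay by airline
AOV <- aov(ArrDelay ~ Reporting_Airline, data = data1)
summary(AOV)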

                    Df  Sum Sq Mean Sq F value   Pr(>F)    
Reporting_Airline    1   24008   24008   17.95 2.45e-05 ***
Residuals         1127 1507339    1337

Because the F-test score of 17.95 is quite high and the p-value is 0.0000245, which is less than 0.05, the mean arrival delays of “AA” and “PA (1)” are significantly different.
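The Pr(>F) column is the upper-tail probability of the F distribution at the observed F value, with the degrees of freedom shown in the Df column. As a rough check, the sketch below recomputes both p-values from the numbers printed in the two tables. Because the printed F values are rounded, the results should land close to, but not exactly on, the reported p-values. The variable names are illustrative.

```r
# F values and degrees of freedom as printed in the two summary(AOV) tables
p_aa_as <- pf(0.13,  df1 = 1, df2 = 1139, lower.tail = FALSE)  # AA vs AS
p_aa_pa <- pf(17.95, df1 = 1, df2 = 1127, lower.tail = FALSE)  # AA vs PA (1)

print(c(AA_vs_AS = p_aa_as, AA_vs_PA1 = p_aa_pa))
```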

Because the ANOVA test produces a large F-test score and a small p-value, you can conclude that the mean arrival delay differs between these two airlines, that is, the categorical variable is associated with arrival delay. Note that statistical significance alone does not tell you how large the difference is.

You learned that an ANOVA test can be used to test for differences in means between distinct groups of a categorical variable and that the F-test score and p-value can be used to assess statistical significance.
