Post-Hoc Pairwise Comparisons in R: Quick Guide
A one-way ANOVA is used to determine whether there is a statistically significant difference between the means of three or more independent groups.
The following null and alternative hypotheses are used in a one-way ANOVA.
H0: The means of all the groups are the same.
HA: Not every group’s mean is the same.
If the overall p-value of the ANOVA is less than the chosen significance level (e.g. α = 0.05), we reject the null hypothesis and conclude that not all of the group means are equal.
We may then run posthoc pairwise comparisons to see which group means are different.
Post-Hoc Pairwise Comparisons in R
The following example demonstrates how to perform post-hoc pairwise comparisons in R using four methods:
1. Tukey’s Method
2. Scheffe’s Method
3. The Bonferroni Method
4. The Holm Approach
Example: One-Way ANOVA in R
Suppose a teacher wants to know whether three different studying techniques lead to different exam scores among students. To test this, the teacher randomly assigns 10 students to each studying technique and records their exam scores.
We can use the R code below to run a one-way ANOVA that tests for differences in mean exam scores across the three groups.
Let’s create a data frame first
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))
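Before fitting the model, it can help to look at the group means, since the post-hoc tests later compare exactly these values. The snippet below is a small sketch that rebuilds the same data frame; the names group_means and group_sds are illustrative and not part of the original example.

```r
# Same data as in the example
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))

# Mean and standard deviation of exam scores within each technique
group_means <- tapply(df$score, df$technique, mean)
group_sds   <- tapply(df$score, df$technique, sd)

print(round(group_means, 1))
print(round(group_sds, 1))
```

Technique A has the highest mean score, while B and C are close to each other, which is the pattern the pairwise comparisons go on to test formally.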
Now we can execute one-way ANOVA
model <- aov(score ~ technique, data = df)
model
Call:
   aov(formula = score ~ technique, data = df)

Terms:
                technique Residuals
Sum of Squares     1440.8    4169.5
Deg. of Freedom         2        27

Residual standard error: 12.42682
Estimated effects may be unbalanced
Let’s view the summary of ANOVA
summary(model)
            Df Sum Sq Mean Sq F value Pr(>F)  
technique    2   1441   720.4   4.665 0.0182 *
Residuals   27   4170   154.4                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Because the overall p-value of the ANOVA (0.0182) is less than 0.05, we reject the null hypothesis that the mean exam score is the same for each studying technique.
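The decision rule can also be checked programmatically instead of reading the p-value off the printed table. The sketch below shows one way to pull the F statistic and p-value out of the ANOVA summary; the names alpha, anova_table, f_value, p_value and reject_null are illustrative assumptions, not part of the original example.

```r
# Same data and model as in the example
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))
model <- aov(score ~ technique, data = df)

alpha <- 0.05                        # assumed significance level

# summary() of an aov fit returns a list; its first element is the ANOVA table
anova_table <- summary(model)[[1]]
f_value <- anova_table[["F value"]][1]   # first row is the technique term
p_value <- anova_table[["Pr(>F)"]][1]

reject_null <- p_value < alpha
cat("F =", round(f_value, 3), " p =", round(p_value, 4),
    " reject H0:", reject_null, "\n")
```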
Next, we can run post-hoc pairwise comparisons to see which groups have different means.
1. Tukey’s Method
When the sample sizes of each group are equal, the Tukey post-hoc procedure is usually the best choice.
To run the Tukey post-hoc test in R, we can use the built-in TukeyHSD() function:
# perform the Tukey post-hoc test
TukeyHSD(model, conf.level = 0.95)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = score ~ technique, data = df)

$technique
     diff       lwr        upr     p adj
B-A -14.8 -28.57923 -1.0207747 0.0334100
C-A -14.6 -28.37923 -0.8207747 0.0362015
C-B   0.2 -13.57923 13.9792253 0.9992862
As the output shows, the adjusted p-values (“p adj”) are smaller than 0.05 for two comparisons: B vs. A and C vs. A.
As a result, we may conclude that the difference in mean exam scores between students who used technique A and students who used technique B is statistically significant.
Likewise, the difference in mean exam scores between students who used technique C and students who used technique A is statistically significant, whereas the difference between C and B (p adj = 0.999) is not.
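With only three groups the significant pairs are easy to spot by eye, but the same check can be done in code. The following sketch filters the TukeyHSD() result for adjusted p-values below 0.05; the names tukey, tukey_table and significant are illustrative.

```r
# Same data and model as in the example
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))
model <- aov(score ~ technique, data = df)

tukey <- TukeyHSD(model, conf.level = 0.95)

# The result is a list with one matrix per factor; rows are the pairs,
# columns are diff, lwr, upr and "p adj"
tukey_table <- tukey$technique

# Keep only the comparisons whose adjusted p-value is below 0.05
significant <- tukey_table[tukey_table[, "p adj"] < 0.05, , drop = FALSE]
print(significant)
```

For this data set the filter keeps the B-A and C-A rows and drops C-B.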
2. Scheffe’s Method
Among the post-hoc methods for comparing group means, the Scheffe method is the most conservative and produces the widest confidence intervals.
To run the Scheffe post-hoc test in R, we can use the ScheffeTest() function from the DescTools package:
library(DescTools)
ScheffeTest(model)
  Posthoc multiple comparisons of means: Scheffe Test
    95% family-wise confidence level

$technique
     diff    lwr.ci     upr.ci   pval    
B-A -14.8 -29.19395 -0.4060463 0.0429 *  
C-A -14.6 -28.99395 -0.2060463 0.0463 *  
C-B   0.2 -14.19395 14.5939537 0.9994    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the output, we can see that two p-values are less than 0.05:
The difference in mean exam scores between students who used technique A and students who used technique B is statistically significant.
The difference in mean exam scores between students who used technique A and students who used technique C is statistically significant.
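To see why the Scheffe intervals are wider than the Tukey intervals, the margins of error can be computed by hand in base R. This is a sketch using the standard textbook formulas for equal group sizes, not the internals of either function; it is assumed here that they correspond to what TukeyHSD() and ScheffeTest() compute, and the margins do agree with the half-widths of the intervals printed earlier. All variable names are illustrative.

```r
# Same data and model as in the example
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))
model <- aov(score ~ technique, data = df)

k      <- nlevels(factor(df$technique))  # number of groups (3)
n      <- 10                             # students per group
df_res <- df.residual(model)             # residual degrees of freedom (27)
mse    <- deviance(model) / df_res       # mean squared error from the ANOVA

# Standard error of a difference between two group means
se_diff <- sqrt(mse * 2 / n)

# Tukey: studentized range quantile divided by sqrt(2)
tukey_margin <- qtukey(0.95, nmeans = k, df = df_res) / sqrt(2) * se_diff

# Scheffe: sqrt((k - 1) * F quantile)
scheffe_margin <- sqrt((k - 1) * qf(0.95, k - 1, df_res)) * se_diff

cat("Tukey margin:  ", round(tukey_margin, 3), "\n")
cat("Scheffe margin:", round(scheffe_margin, 3), "\n")
```

Each interval is the observed difference plus or minus the margin, so the larger Scheffe margin translates directly into wider intervals and slightly larger p-values.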
3. The Bonferroni Method
When you want to make a set of planned pairwise comparisons, the Bonferroni procedure is the way to go.
To perform the Bonferroni posthoc procedure in R, use the following syntax.
# apply the Bonferroni post-hoc test
pairwise.t.test(df$score, df$technique, p.adjust.method = "bonferroni")
	Pairwise comparisons using t tests with pooled SD 

data:  df$score and df$technique 

  A     B    
B 0.039 -    
C 0.042 1.000

P value adjustment method: bonferroni
From the output, we can see that the adjusted p-values below 0.05 are for the differences between A and B and between A and C.
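The Bonferroni adjustment itself is simple: each raw p-value is multiplied by the number of comparisons and capped at 1. The sketch below reproduces the adjusted values by hand from the unadjusted pairwise t-tests; the names raw, raw_p, bonf_manual and bonf_builtin are illustrative.

```r
# Same data as in the example
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))

# Unadjusted pairwise t-tests with pooled SD
raw <- pairwise.t.test(df$score, df$technique,
                       p.adjust.method = "none")$p.value

# Collect the three raw p-values into a named vector
raw_p <- c("B-A" = raw["B", "A"],
           "C-A" = raw["C", "A"],
           "C-B" = raw["C", "B"])

# Bonferroni by hand: multiply by the number of comparisons, cap at 1
bonf_manual <- pmin(raw_p * length(raw_p), 1)

# The same adjustment via the built-in p.adjust()
bonf_builtin <- p.adjust(raw_p, method = "bonferroni")

print(round(bonf_manual, 3))
print(round(bonf_builtin, 3))
```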
4. The Holm Approach
The Holm method is also used when you have a set of planned pairwise comparisons. It is at least as powerful as the Bonferroni method, so it is generally preferred.
To run the Holm post-hoc test in R, use the following syntax.
# use the Holm method for post-hoc analysis
pairwise.t.test(df$score, df$technique, p.adjust.method = "holm")
	Pairwise comparisons using t tests with pooled SD 

data:  df$score and df$technique 

  A     B    
B 0.039 -    
C 0.039 0.972

P value adjustment method: holm
From the output, we can see that the adjusted p-values below 0.05 are again for the differences between A and B and between A and C.
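The extra power of the Holm method comes from its step-down rule: the smallest raw p-value is multiplied by the number of comparisons m, the next smallest by m - 1, and so on, with the results forced to be non-decreasing and capped at 1. The sketch below applies that rule by hand and compares it with p.adjust(); all variable names are illustrative.

```r
# Same data as in the example
df <- data.frame(technique = rep(c("A", "B", "C"), each = 10),
                 score = c(56, 106, 102, 108, 103, 102, 73, 94, 108, 99,
                           77, 72, 73, 73, 77, 74, 77, 90, 92, 98,
                           76, 58, 77, 87, 88, 80, 81, 85, 85, 88))

# Unadjusted pairwise t-tests with pooled SD
raw <- pairwise.t.test(df$score, df$technique,
                       p.adjust.method = "none")$p.value
raw_p <- c("B-A" = raw["B", "A"],
           "C-A" = raw["C", "A"],
           "C-B" = raw["C", "B"])

m   <- length(raw_p)
ord <- order(raw_p)                       # positions from smallest to largest

# Multipliers m, m - 1, ..., 1 applied to the sorted p-values,
# then cummax() keeps the adjusted values non-decreasing
holm_sorted <- cummax((m - seq_len(m) + 1) * raw_p[ord])

# Cap at 1 and restore the original order
holm_manual <- pmin(holm_sorted, 1)[order(ord)]

holm_builtin <- p.adjust(raw_p, method = "holm")
bonf_builtin <- p.adjust(raw_p, method = "bonferroni")

print(round(rbind(holm = holm_manual, bonferroni = bonf_builtin), 3))
```

Every Holm-adjusted p-value is less than or equal to its Bonferroni counterpart, so Holm never rejects fewer hypotheses than Bonferroni at the same significance level.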