How to perform the Kruskal-Wallis test in R?
When there are more than two groups, the Kruskal-Wallis rank-sum test is a non-parametric alternative to the one-way ANOVA test.
It extends the two-sample Wilcoxon rank-sum test to more than two groups and is recommended when the assumptions of one-way ANOVA (such as normally distributed residuals) are not met.
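For reference, this is the standard textbook form of the statistic (the notation is ours, not from the original article). Rank all N observations together, giving tied values their average rank. Let k be the number of groups, n_i and R_i the size and rank sum of group i, and t_j the number of observations in the j-th set of tied values. Under the null hypothesis that all groups come from the same distribution, H (with the tie correction) is approximately chi-squared with k - 1 degrees of freedom. This appears to be what kruskal.test() reports as "Kruskal-Wallis chi-squared".

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1),
\qquad
H_{\text{tie}} = \frac{H}{1 - \dfrac{\sum_{j} \left(t_j^3 - t_j\right)}{N^3 - N}},
\qquad
H_{\text{tie}} \overset{H_0}{\approx} \chi^2_{k-1}
```

With three groups (ctrl, trt1, trt2), k - 1 = 2 degrees of freedom.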
This article will show you how to use R to compute the Kruskal-Wallis test.
We’ll use the built-in PlantGrowth data set. It records the dried weight of plants grown under a control condition and two different treatment conditions.
data <- PlantGrowth
Print the first six rows of the data:
head(data)
  weight group
1   4.17  ctrl
2   5.58  ctrl
3   5.18  ctrl
4   6.11  ctrl
5   4.50  ctrl
6   4.61  ctrl
The column “group” is a factor in R, and its categories (“ctrl”, “trt1”, “trt2”) are called factor levels. By default, R sorts factor levels alphabetically.
Display the group levels:
levels(data$group)
[1] "ctrl" "trt1" "trt2"
If the levels are not in the desired order, reorder them as follows:
data$group <- ordered(data$group, levels = c("ctrl", "trt1", "trt2"))
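A small sketch of how factor() and ordered() treat levels, using a hypothetical toy vector x (it is not part of PlantGrowth). It shows the alphabetical default and that ordered() additionally marks the factor as ordinal, while factor() with a levels argument only changes the level order.

```r
# Hypothetical toy vector, used only for illustration
x <- c("trt2", "ctrl", "trt1", "ctrl")

# factor() without 'levels' sorts the levels alphabetically
f_default <- factor(x)
levels(f_default)

# Explicit level order; the result is still an unordered factor
f_custom <- factor(x, levels = c("trt2", "trt1", "ctrl"))
levels(f_custom)
is.ordered(f_custom)

# ordered() uses the same level order but also marks the factor as ordinal
f_ord <- ordered(x, levels = c("trt2", "trt1", "ctrl"))
is.ordered(f_ord)
```

For kruskal.test() only group membership matters, so either form works. An ordered factor does change the default contrasts used by model functions such as lm(), so factor(..., levels = ...) is often the safer choice when you only want to control display order.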
Summary statistics by group can be computed with the dplyr package.
To install dplyr, run:
install.packages("dplyr")
Compute summary statistics by groups:
library(dplyr)
group_by(data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE),
    median = median(weight, na.rm = TRUE),
    IQR = IQR(weight, na.rm = TRUE)
  )
Source: local data frame [3 x 6]

   group count  mean        sd median    IQR
  (fctr) (int) (dbl)     (dbl)  (dbl)  (dbl)
1   ctrl    10 5.032 0.5830914  5.155 0.7425
2   trt1    10 4.661 0.7936757  4.550 0.6625
3   trt2    10 5.526 0.4425733  5.435 0.4675
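If you prefer not to install dplyr, roughly the same table can be built in base R. This is a sketch; the helper name stats_by_group and the object summary_tbl are illustrative names, not part of any package.

```r
data <- PlantGrowth

# Hypothetical helper: the five statistics from the dplyr example, for one group
stats_by_group <- function(w) {
  c(count  = length(w),
    mean   = mean(w, na.rm = TRUE),
    sd     = sd(w, na.rm = TRUE),
    median = median(w, na.rm = TRUE),
    IQR    = IQR(w, na.rm = TRUE))
}

# split() gives one weight vector per group; sapply() returns one column per group,
# so t() turns it into one row per group
summary_tbl <- t(sapply(split(data$weight, data$group), stats_by_group))
summary_tbl
```

The result is a numeric matrix rather than a data frame; wrap it in as.data.frame() if you need to join it with other tables.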
Use box plots to visualize the data.
Box plots can also be drawn with base R graphics, but here we’ll use the ggpubr R package for easy ggplot2-based data visualization.
Install the latest version of ggpubr from CRAN:
install.packages("ggpubr")
Plot weight by group, coloring the boxes by group:
library("ggpubr")
ggboxplot(data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")
Line plot with error bars (mean ± standard error, via "mean_se") and jittered data points:
library("ggpubr")
ggline(data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")
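Base R graphics are mentioned above as an alternative to ggpubr. A minimal base R box plot of the same data might look like this; the colors are simply copied from the ggboxplot() call, and the object name b is illustrative.

```r
data <- PlantGrowth

# Base R box plot of weight by group, one border color per group
b <- boxplot(weight ~ group, data = data,
             border = c("#00AFBB", "#E7B800", "#FC4E07"),
             xlab = "Treatment", ylab = "Weight")

# boxplot() also returns the statistics it drew; row 3 of b$stats holds the medians
b$stats[3, ]
b$n
```

The returned medians should agree with the median column of the summary table.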
Compute Kruskal-Wallis test
We want to know whether plant weights differ significantly between the three experimental conditions.
The test can be run using the kruskal.test() function as follows.
kruskal.test(weight ~ group, data = data)
	Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
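To see where the chi-squared value comes from, the sketch below computes the statistic by hand from the pooled ranks, applies the usual correction for ties, and takes the upper tail of a chi-squared distribution with k - 1 degrees of freedom. It assumes the built-in PlantGrowth data and should reproduce the kruskal.test() result up to floating-point rounding.

```r
data <- PlantGrowth

r   <- rank(data$weight)                 # pooled ranks; ties get their average rank
N   <- length(r)
n_i <- tapply(r, data$group, length)     # group sizes
R_i <- tapply(r, data$group, sum)        # rank sum of each group

# Uncorrected Kruskal-Wallis statistic
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Tie correction: t_j counts how often each distinct weight value occurs
t_j <- table(data$weight)
H <- H / (1 - sum(t_j^3 - t_j) / (N^3 - N))

# p-value from the chi-squared approximation with k - 1 degrees of freedom
k <- nlevels(data$group)
p <- pchisq(H, df = k - 1, lower.tail = FALSE)
c(H = H, df = k - 1, p = p)
```

In PlantGrowth only one weight value (4.17) is tied, so the tie correction changes H only slightly.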
Inference
Because the p-value (0.01842) is below the 0.05 significance level, we can conclude that there are significant differences between the groups.
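Rather than reading the p-value off the printed output, you can pull it from the object kruskal.test() returns (an "htest" list with statistic, parameter and p.value elements) and compare it with the chosen significance level. The names alpha and res are chosen here for illustration.

```r
data <- PlantGrowth
alpha <- 0.05   # significance level used in the text

res <- kruskal.test(weight ~ group, data = data)
res$statistic   # Kruskal-Wallis chi-squared
res$parameter   # degrees of freedom
res$p.value

# Decision rule: reject H0 when the p-value is below alpha
if (res$p.value < alpha) {
  message("Reject H0: at least one group differs (p = ", signif(res$p.value, 4), ")")
} else {
  message("No significant difference detected at alpha = ", alpha)
}
```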
Multiple pairwise comparisons between groups
The Kruskal-Wallis result shows a significant difference between groups, but it does not tell us which pairs of groups differ.
The function pairwise.wilcox.test() computes pairwise comparisons between group levels, with a choice of corrections for multiple testing set through p.adjust.method ("BH" is the Benjamini-Hochberg method).
pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group, p.adjust.method = "BH")
Pairwise comparisons using the Wilcoxon rank-sum test
data:  PlantGrowth$weight and PlantGrowth$group

     ctrl  trt1 
trt1 0.199 -    
trt2 0.095 0.027
p-value adjustment method: BH
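To make the multiple-testing correction concrete, this sketch computes the unadjusted pairwise p-values and applies p.adjust() with method = "BH" to them; the result should match the BH-adjusted table above. The object names raw, adj and bh are illustrative.

```r
data <- PlantGrowth

# Unadjusted pairwise Wilcoxon p-values. suppressWarnings() hides the ties
# warning wilcox.test() can emit here (4.17 occurs in both ctrl and trt1).
raw <- suppressWarnings(
  pairwise.wilcox.test(data$weight, data$group, p.adjust.method = "none")
)$p.value

# pairwise.wilcox.test() stores its p-values in the lower triangle
# (including the diagonal); adjust exactly those entries with Benjamini-Hochberg
adj <- raw
lt  <- lower.tri(raw, diag = TRUE)
adj[lt] <- p.adjust(raw[lt], method = "BH")

bh <- suppressWarnings(
  pairwise.wilcox.test(data$weight, data$group, p.adjust.method = "BH")
)$p.value

raw
adj
all.equal(adj, bh)   # TRUE if the manual adjustment matches
```

Other corrections such as "holm" or "bonferroni" can be passed to p.adjust.method in the same way; p.adjust.methods lists all available options.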
Conclusion
In the pairwise comparisons, only trt1 and trt2 differ significantly (BH-adjusted p = 0.027 < 0.05).