Repeated Measures of ANOVA in R Complete Tutorial
Repeated Measures of ANOVA in R, in this tutorial we are going to discuss one-way and two-way repeated measures of ANOVA.
In this case, the same individuals are measured the same outcome variable under different time points or conditions.
This test is also known as a within-subjects ANOVA or ANOVA with repeated measures.
Assumptions
1. No significant outlier
To identify the outlier, you can use the box plot method or the three sigma limit method.
2. Normality Assumption
The outcome of the dependent variable should be normally distributed in each cell of the design.
Based on the Shapiro Wilks test can use it for the same.
3. Assumption of Sphericity
These can be checked based on the anova_test function. The variance of the differences between groups should be equal.
Sphericity can be checked using Mauchly’s test of sphericity, which is automatically reported when using the R function anova_test()
The above assumptions are not met then you can use an alternative of Repeated Measures ANOVA in the nonparametric method (Friedman Test).
One-way Repeated Measures of ANOVA in R
Prerequisites
library(xlsx)
library(rstatix)
library(reshape)
library(tidyverse)
library(dplyr)
library(ggpubr)
library(plyr)
library(datarium)
Getting Data
data<-read.xlsx("D:/RStudio/data.xlsx",sheetName="Sheet1") data <- data %>% gather(key = "time", value = "score", T0, T1, T2) %>% convert_as_factor(id, time) data.frame(head(data, 3))
id time score 1 S1 T0 7.454333333 2 S2 T0 8.030000000 3 S3 T0 9.911666667
Objective
The objective is to identify any significant difference between time points exit or not.
Summary statistics
summary<-data %>% group_by(time) %>% get_summary_stats(score, type = "mean_sd") data.frame(summary)
time variable n mean sd 1 T0 score 24 7.853 3.082 2 T1 score 24 9.298 2.090 3 T2 score 24 5.586 0.396
Visualization
Create a box plot and add points corresponding to individual values:
You can use a box plot also for outlier identification.
Based on boxplot no outlier detected in the dataset.
Outlier Detection
outlier<-data %>% group_by(time) %>% identify_outliers(score) data.frame(outlier)
[1] time id score is.outlier is.extreme <0 rows> (or 0-length row.names)
These indicates no outlier in the data set.
Normality Checking
You can use Shapiro-Wilk’s test for normality checking, Please note if the data set is small then Shapiro-Wilk’s test is ideal otherwise go for the QQ plot.
normality<-data %>%
group_by(time) %>%
shapiro_test(score)
data.frame(normality)
time variable statistic p 1 T0 score 0.9540460218 0.33073045010 2 T1 score 0.9868670126 0.98282491871 3 T2 score 0.9261641361 0.08004275234
The normality assumption can be checked by computing the Shapiro-Wilk test for each time point. If the data is normally distributed, the p-value should be greater than 0.05.
Tested data was normally distributed at each time point, as assessed by Shapiro-Wilk’s test (p > 0.05).
If your sample size is greater than 50, the normal QQ plot is preferred because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to a minor deviation from normality.
ggqqplot(data, "score", ggtheme = theme_bw()) +
facet_grid(time ~ Treatment, labeller = "label_both")
Sphericity Assumption
res<-anova_test(data=data,dv=score,wid=id,within=time)
get_anova_table(res)
ANOVA Table (type III tests) Effect DFn DFd F p p<.05 ges 1 time 1.42 32.73 24.436 0.00000297 * 0.343
The data set score was significantly different at the different time points during the diet, F(1.42, 33) = 24.4, p < 0.0001, eta2[g] = 0.34
How to do data reshaping in R?
Based on the anova_test function sphericity correction automatically applied to the output. The default is to apply automatically the Greenhouse-Geisser sphericity correction to only within-subject factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).
Pair Wise comparison
pair<-data %>% pairwise_t_test( value~time,paired=TRUE, p.adjust.method = "bonferroni" ) data.frame(pair)
. .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif 1 value t1 t2 10 10 -4.967617803 9 0.000772000 0.002000 ** 2 value t1 t3 10 10 -13.228148424 9 0.000000334 0.000001 **** 3 value t2 t3 10 10 -4.867815675 9 0.000886000 0.003000 **
Summary
Based on the repeated measures of ANOVA, all the tested pairwise differences between time points are statistically significant.
Two-way Repeated Measures of ANOVA in R
The two-way repeated measures ANOVA can be performed in order to determine whether there is a significant interaction between treatment and time on the score.
Getting data
data<-read.xlsx("D:/RStudio/data.xlsx",sheetName="Sheet1") data <- data %>%
gather(key = "time", value = "score", T0, T1, T2) %>%
convert_as_factor(id, Treatment,time)
head(data, 3)
id Treatment time score
S1 A T0 7.5
S2 A T0 8.0
S3 A T0 9.9
Summary statistics
summary<-data %>%
group_by(Treatment,time) %>%
get_summary_stats(score, type = "mean_sd")
data.frame(summary)
Treatment time variable n mean sd A T0 score 10 7.2 2.49 A T1 score 10 8.7 2.00 A T2 score 10 5.7 0.27 B T0 score 10 7.8 3.01 B T1 score 10 9.1 2.05 B T2 score 10 5.7 0.50
Outlier Detection
outlier<-data %>%
group_by(Treatment,time) %>%
identify_outliers(score)
data.frame(outlier)
Treatment time id score is.outlier is.extreme A T1 S7 6.0 TRUE FALSE A T1 S9 4.7 TRUE TRUE
Outlier got detected in the data frame for the subject S7 and S9. Need to remove or revisit the data points in the dataset for further analysis.
Data points are revisited and re analyzed.
tidyverse in R complete tutorial
Summary statistics need to reanalyze again
summary<-data %>%
group_by(Treatment,time) %>%
get_summary_stats(score, type = "mean_sd")
data.frame(summary)
Treatment time variable n mean sd A T0 score 10 7.2 2.49 A T1 score 10 8.7 1.06 A T2 score 10 5.7 0.27 B T0 score 10 7.8 3.01 B T1 score 10 9.1 2.05 B T2 score 10 5.7 0.50
From the corrected data need check outlier detection again.
outlier<-data %>%
group_by(Treatment,time) %>%
identify_outliers(score)
data.frame(outlier)
[1] Treatment time id score is.outlier is.extreme <0 rows> (or 0-length row.names)
You can see now no outlier observed in the current dataset.
Normality Checking
normality<-data %>%
group_by(Treatment,time) %>%
shapiro_test(score)
data.frame(normality)
Treatment time variable statistic p A T0 score 0.89 0.169 A T1 score 0.92 0.347 A T2 score 0.85 0.065 B T0 score 0.96 0.739 B T1 score 0.93 0.411 B T2 score 0.81 0.020
The data set score was normally distributed at each time point (p > 0.05), except for treatment B at time point T2, as assessed by Shapiro-Wilk’s test.
QQ Plot
Based on Shapiro-Wilk’s analysis at time point T2 non-normality observed, so further you need to check the QQ plot for the verification.
From the plot above, as all the points fall approximately along the reference line, we can assume normality.
Sphericity Assumption
res.aov <- anova_test(
data = data, dv = score, wid = id,
within = c(Treatment, time)
)
get_anova_table(res.aov)
ANOVA Table (type III tests) Effect DFn DFd F p p<.05 ges Treatment 1.0 9 0.25 6.3e-01 0.009 time 2.0 18 26.30 4.5e-06 * 0.354 Treatment:time 1.2 11 0.12 7.9e-01 0.004
Observed statistically significant two-way interactions between treatment and time p < 0.005. If the interaction is not significant then simply execute pairwise t test comparison (Code mentioned in the end of the tutorial).
Effect of treatment at each time point
one.way <- data %>%
group_by(time) %>%
anova_test(dv = score, wid = id, within = Treatment) %>%
get_anova_table() %>%
adjust_pvalue(method = "bonferroni")
one.way
time Effect DFn DFd F p p..05 ges p.adj T0 Treatment 1 9 0.169 0.69 0.01200 1 T1 Treatment 1 9 0.244 0.63 0.01600 1 T2 Treatment 1 9 0.003 0.96 0.00021 1
Pairwise comparisons between treatment groups at each time point
pwc <- data %>%
group_by(time) %>%
pairwise_t_test(
score ~ Treatment, paired = TRUE,
p.adjust.method = "bonferroni"
)
data.frame(pwc)
time .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif T0 score A B 10 10 -0.411 9 0.69 0.69 ns T1 score A B 10 10 -0.494 9 0.63 0.63 ns T2 score A B 10 10 -0.051 9 0.96 0.96 ns
No significant difference was observed between samples for any of the tested time points.
Effect of time at each level of treatment
one.way2 <- data %>%
group_by(Treatment) %>%
anova_test(dv = score, wid = id, within = time) %>%
get_anova_table() %>%
adjust_pvalue(method = "bonferroni")
data.frame(one.way2)
Treatment Effect DFn DFd F p p..05 ges p.adj A time 1.3 11 11.0 0.005 * 0.40 0.01 B time 1.3 11 8.5 0.010 * 0.33 0.02
Pairwise comparisons between time points
pwc2 <- data %>%
group_by(Treatment) %>%
pairwise_t_test(
score ~ time, paired = TRUE,
p.adjust.method = "bonferroni"
)
data.frame(pwc2)
Treatment .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif A score T0 T1 10 10 -2.2 9 5.2e-02 1.6e-01 ns A score T0 T2 10 10 1.9 9 9.0e-02 2.7e-01 ns A score T1 T2 10 10 8.2 9 1.8e-05 5.5e-05 **** B score T0 T1 10 10 -2.4 9 3.7e-02 1.1e-01 ns B score T0 T2 10 10 2.0 9 8.1e-02 2.4e-01 ns B score T1 T2 10 10 4.3 9 2.0e-03 6.0e-03 **
A significant difference was observed between time points T1 and T2 for treatments A & B (p<0.05).
If the interaction effect from ANOVA is not significant then you can simply execute a pairwise t-test based on the below command.
Comparisons for treatment variable
data %>%
pairwise_t_test(
score ~ Treatment, paired = TRUE,
p.adjust.method = "bonferroni"
)
Comparisons for the time variable
data %>%
pairwise_t_test(
score ~ time, paired = TRUE,
p.adjust.method = "bonferroni"
)
Conclusion
In this tutorial, we discussed repeated measures of ANOVA in R with codes. In the same way, you can execute Three-way repeated measures of ANOVA.
This is very relevant topic with appropriate example dataset. However, in addition to posting as it is it better to add ppt or pdf
Please make the example dataset avaliable otherwise we cannot replicate your script.
Please kindly share the link from where we can download the dataset used in this tutorial.