How to Measure Execution Time in R
To compare the execution times of different expressions, use R's microbenchmark package.
To do so, use the following syntax:
library(microbenchmark)

# compare the execution times of two different expressions
microbenchmark(expression1, expression2)
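As a hedged illustration of this syntax, the sketch below times two ways of taking square roots of a vector `x`. The vector and the labels `sqrt_fn` and `power_op` are hypothetical, chosen only for this example. Naming the arguments gives readable labels in the output, and the `times` argument sets how many times each expression is evaluated (100 by default).

```r
library(microbenchmark)

# hypothetical input vector, used only for illustration
x <- runif(1e4)

# named arguments become the labels in the `expr` column;
# `times` sets how many times each expression is evaluated
res <- microbenchmark(
  sqrt_fn  = sqrt(x),
  power_op = x^0.5,
  times = 50
)

print(res)
```

The returned object holds one row per evaluation, so with two expressions and `times = 50` it has 100 rows.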
The following example shows how to use this syntax in practice.
Example: Using microbenchmark() in R
Assume we have the following R data frame containing the points scored by players on two basketball teams:
# make this example reproducible
set.seed(123)

# create a data frame
df <- data.frame(team=rep(c('A', 'B'), each=500), points=rnorm(1000, mean=20))
# view the first six rows of the data frame
head(df)

  team   points
1    A 19.43952
2    A 19.76982
3    A 21.55871
4    A 20.07051
5    A 20.12929
6    A 21.71506
Assume we want to compute the average points scored by players on each team using two different methods:
Method 1: Use aggregate() from Base R
Method 2: Use group_by() and summarise_at() from dplyr
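Before timing two methods, it is worth confirming that they compute the same thing; otherwise the speed comparison means little. A minimal check, assuming the `df` defined above (recreated here so the snippet stands alone); the names `base_res` and `dplyr_res` are just illustrative:

```r
library(dplyr)

# recreate the example data so this snippet runs on its own
set.seed(123)
df <- data.frame(team=rep(c('A', 'B'), each=500), points=rnorm(1000, mean=20))

# Method 1: aggregate() returns columns Group.1 (team) and x (mean)
base_res <- aggregate(df$points, list(df$team), FUN=mean)

# Method 2: with a single variable, summarise_at() names the output column
# after the list element, here "name"
dplyr_res <- df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean))

# both results are sorted by team, so the mean columns can be compared directly
all.equal(base_res$x, dplyr_res$name)
```

The two methods return differently shaped objects (a data frame versus a tibble with different column names), which is why the comparison is done on the numeric columns rather than on the whole results.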
We can use the microbenchmark() function to see how long each of these expressions takes to execute:
library(microbenchmark)
library(dplyr)
# time how long it takes to calculate the mean value of points by team
microbenchmark(aggregate(df$points, list(df$team), FUN=mean),
               df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean)))
Unit: microseconds
                                                                    expr     min       lq      mean  median      uq      max neval cld
                         aggregate(df$points, list(df$team), FUN = mean)   609.5   736.65  1013.814   820.2  1063.2   3628.1   100  a
 df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean)) 11745.8 12475.40 19648.114 14099.4 16279.9 439932.2   100   b
By default, the microbenchmark() function evaluates each expression 100 times (set by its times argument) and reports the following metrics:
min: The shortest amount of time it took to execute.
lq: Lower quartile (25th percentile) execution time.
mean: Mean (average) execution time.
median: Median execution time.
uq: Upper quartile (75th percentile) execution time.
max: The longest amount of time it took to execute.
neval: The number of times each expression was evaluated.
cld: A compact letter display; expressions that do not share a letter differ significantly in execution time.
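Typically, we focus on the mean or median execution time of each expression. The median is less sensitive to occasional slow runs, such as the 439,932.2-microsecond maximum recorded for the dplyr method.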
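The printed table summarizes the raw measurements, which can also be inspected directly. As a sketch (recreating `df` so the snippet stands alone, with the labels `base_r` and `dplyr` chosen only for this example), the code below looks at the raw timings, which microbenchmark stores in nanoseconds in the `time` column, computes each expression's median by hand, and retrieves the same summary statistics as a data frame with `summary()`.

```r
library(microbenchmark)
library(dplyr)

# recreate the example data so this snippet runs on its own
set.seed(123)
df <- data.frame(team=rep(c('A', 'B'), each=500), points=rnorm(1000, mean=20))

results <- microbenchmark(
  base_r = aggregate(df$points, list(df$team), FUN=mean),
  dplyr  = df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean))
)

# the raw object has one row per evaluation; `time` is in nanoseconds
head(results)

# median per expression, converted from nanoseconds to microseconds
median_us <- tapply(results$time, results$expr, median) / 1e3
median_us

# summary() returns the statistics of the printed table as a data frame
summary(results)
```

Your timings will differ from the table above, since they depend on the machine and on what else it is running.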
We can also use the boxplot() function to visualize the time distribution for each expression:
library(microbenchmark)
library(dplyr)
# time how long it takes to calculate the mean value of points by team
results <- microbenchmark(aggregate(df$points, list(df$team), FUN=mean),
                          df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean)))
# create a boxplot to visualize the results
boxplot(results, names=c('Base R', 'dplyr'))
According to the boxplots, the dplyr method takes longer on average to calculate the mean points value by team.
Note: In this example, we compared the execution times of two different expressions using the microbenchmark() function, but in practice, you can compare as many expressions as you want.
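For instance, a third expression can be added simply as another argument. The sketch below adds base R's `tapply()` as an extra method for illustration only (it is not one of the two methods compared above), uses 50 evaluations per expression, and prints the timings in milliseconds through the `unit` argument of `print()`.

```r
library(microbenchmark)
library(dplyr)

# recreate the example data so this snippet runs on its own
set.seed(123)
df <- data.frame(team=rep(c('A', 'B'), each=500), points=rnorm(1000, mean=20))

# any number of expressions can be passed; each is evaluated `times` times
results3 <- microbenchmark(
  aggregate = aggregate(df$points, list(df$team), FUN=mean),
  dplyr     = df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean)),
  tapply    = tapply(df$points, df$team, mean),  # extra method, added for illustration
  times = 50
)

# report the timings in milliseconds
print(results3, unit = "ms")
```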
Based on these findings, we can conclude that the base R method is significantly faster than the dplyr method for this task.
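To put a number on the difference, one option (a sketch, not part of the original analysis) is to divide the dplyr median by the base R median. With the medians in the table above, that is 14099.4 / 820.2, or roughly 17. The code below computes the same ratio from the data frame returned by `summary()`; because every row is reported in the same unit, the ratio does not depend on the unit. The labels `base_r` and `dplyr` are illustrative, and the exact figure will vary between machines and runs.

```r
library(microbenchmark)
library(dplyr)

# recreate the example data so this snippet runs on its own
set.seed(123)
df <- data.frame(team=rep(c('A', 'B'), each=500), points=rnorm(1000, mean=20))

results <- microbenchmark(
  base_r = aggregate(df$points, list(df$team), FUN=mean),
  dplyr  = df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean))
)

# summary() reports all rows in the same unit, so the ratio is unit-free
s <- summary(results)
speedup <- s$median[s$expr == "dplyr"] / s$median[s$expr == "base_r"]
cat(sprintf("dplyr median is %.1fx the base R median\n", speedup))
```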