How to Measure Execution Time in R

To compare the execution times of different expressions in R, you can use the microbenchmark package.

To do so, use the following syntax:

library(microbenchmark)

# compare the execution times of two different expressions
microbenchmark(expression1, expression2)
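Here is a minimal sketch that assumes only that the microbenchmark package is installed. expression1 and expression2 above are placeholders for any R expressions. The two used below, two ways of taking a square root, are illustrative choices and not part of the original example. Naming each argument sets its label in the output. The times argument sets how many evaluations are made, and it defaults to 100.

```r
library(microbenchmark)

# illustrative input vector (hypothetical data, not from the example below)
x <- runif(1e4)

# benchmark two equivalent ways of computing square roots;
# the argument names become the labels in the "expr" column
res <- microbenchmark(
  sqrt_fn = sqrt(x),
  power   = x^0.5,
  times   = 50
)

# print summary statistics with times reported in microseconds
print(summary(res, unit = "us"))
```

The result is a data frame with one row per evaluation, so two expressions run 50 times each give 100 rows.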

The following example shows how to use this syntax in practice.

Example: Using microbenchmark() in R

Suppose we have the following data frame containing the points scored by players on two basketball teams:

# make this example reproducible
set.seed(123)

# create the data frame
df <- data.frame(team=rep(c('A', 'B'), each=500), points=rnorm(1000, mean=20))

# view the first six rows of the data frame
head(df)
  team   points
1    A 19.43952
2    A 19.76982
3    A 21.55871
4    A 20.07051
5    A 20.12929
6    A 21.71506

Assume we want to compute the average points scored by players on each team using two different methods:

Method 1: Use aggregate() from Base R

Method 2: Use group_by() and summarise_at() from dplyr

We can use the microbenchmark() function to see how long each of these expressions takes to execute:

library(microbenchmark)
library(dplyr)

# time how long it takes to calculate the mean points value by team

microbenchmark(
  aggregate(df$points, list(df$team), FUN=mean),
  df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean)))
Unit: microseconds
                                                                    expr
                         aggregate(df$points, list(df$team), FUN = mean)
 df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean))
     min       lq      mean  median      uq      max neval cld
   609.5   736.65  1013.814   820.2  1063.2   3628.1   100  a
 11745.8 12475.40 19648.114 14099.4 16279.9 439932.2   100   b

By default, microbenchmark() evaluates each expression 100 times (set by the times argument) and reports the following summary statistics:

min: The shortest amount of time it took to execute.

lq: Execution time in the lower quartile (25th percentile).

mean: The average execution time.

median: The median execution time.

uq: Upper quartile (75th percentile) execution time.

max: The maximum amount of time it took to execute.

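neval: The number of times each expression was evaluated.

cld: A compact letter display of a statistical ranking of the expressions. Expressions that share no letter differ significantly. This column appears only when the multcomp package is installed.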
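Each of these statistics summarizes the raw timings that microbenchmark() stores, in nanoseconds, in the time column of its result. The sketch below recomputes the same kinds of statistics per expression with base R's quantile(), mean() and length() to make the definitions concrete. The benchmarked expressions are illustrative and not from the original example. microbenchmark's own summary() may compute the quartiles slightly differently.

```r
library(microbenchmark)

# illustrative benchmark (hypothetical expressions, not from the example above)
x <- runif(1e4)
res <- microbenchmark(sqrt_fn = sqrt(x), power = x^0.5, times = 50)

# raw timings are stored in nanoseconds; convert them to microseconds
times_us <- res$time / 1e3

# split the timings by expression label
groups <- split(times_us, res$expr)

# min, lq (25th percentile), median, uq (75th percentile) and max per expression
timing_stats <- sapply(groups, quantile, probs = c(0, 0.25, 0.5, 0.75, 1))
rownames(timing_stats) <- c("min", "lq", "median", "uq", "max")

# append the mean and the number of evaluations (neval)
timing_stats <- rbind(timing_stats,
                      mean  = sapply(groups, mean),
                      neval = sapply(groups, length))

print(round(timing_stats, 2))
```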

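In practice, the mean or median is usually the number to compare. The median is more robust, because a few unusually slow runs can inflate the mean. That happens here with the dplyr expression: its max of 439932.2 microseconds pulls its mean (19648.114) well above its median (14099.4).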
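To see how far the mean and median can diverge, consider a hypothetical set of 100 timings. In this set, 99 runs take 100 microseconds and one run is delayed to 10,000 microseconds. The numbers are made up for illustration. That single slow run roughly doubles the mean but leaves the median unchanged:

```r
# hypothetical timings in microseconds: 99 typical runs plus one outlier
timings <- c(rep(100, 99), 10000)

mean(timings)    # (99 * 100 + 10000) / 100 = 199
median(timings)  # 100
```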

We can also use the boxplot() function to visualize the time distribution for each expression:

library(microbenchmark)
library(dplyr)

# time how long it takes to calculate the mean points value by team

results <- microbenchmark(
  aggregate(df$points, list(df$team), FUN=mean),
  df %>% group_by(team) %>% summarise_at(vars(points), list(name = mean)))

# create a boxplot to visualize the results

boxplot(results, names=c('Base R', 'dplyr'))

The boxplots show that the dplyr method generally takes longer than the base R method to calculate the mean points value by team.

Note: This example compares two expressions with microbenchmark(), but you can pass as many expressions as you want.
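As a sketch of comparing more than two expressions, the block below rebuilds the example data frame and benchmarks three base R ways of computing the mean points by team. aggregate() comes from the example. tapply() and the split()/sapply() combination are illustrative additions that the original does not discuss, and the expression labels are arbitrary names chosen here.

```r
library(microbenchmark)

# recreate the example data frame
set.seed(123)
df <- data.frame(team = rep(c('A', 'B'), each = 500),
                 points = rnorm(1000, mean = 20))

# three base R ways to compute mean points by team;
# tapply() and split()/sapply() are illustrative additions
res <- microbenchmark(
  aggregate = aggregate(df$points, list(df$team), FUN = mean),
  tapply    = tapply(df$points, df$team, mean),
  split     = sapply(split(df$points, df$team), mean),
  times     = 50
)

# summary statistics in microseconds
print(res, unit = "us")
```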

Based on these results, the base R method is considerably faster than the dplyr method for this data set.
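A rough speedup factor comes from the ratio of the two medians reported in the benchmark output above. The values below are copied from that output. On another machine the timings, and therefore the ratio, will differ.

```r
# median times (microseconds) copied from the benchmark output above
medians <- c(base_r = 820.2, dplyr = 14099.4)

# how many times slower the dplyr pipeline was, by median time
speedup <- medians[["dplyr"]] / medians[["base_r"]]
round(speedup, 1)  # 17.2
```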
