Quantiles by Group calculation in R with examples

In statistics, quantiles are values that divide a ranked dataset into equal-sized groups.
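As a quick illustration of the definition, here is a minimal sketch using base R's quantile() on a small made-up vector (the vector x is hypothetical and is not part of the dataset used later):

```r
# Hypothetical sample: the integers 1 to 9 (not from the article's data)
x <- 1:9

# Quartiles split the sorted values into four equal-sized groups
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(quartiles)
```

With these nine values the three cut points fall on 3, 5 and 7.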

In R, we can use the following functions from the dplyr package to calculate quantiles grouped by a certain variable.

library(dplyr)

Identify the quantiles that you’re interested in.

q <- c(0.25, 0.5, 0.80)

Then calculate the quantiles for each level of the grouping variable. The general syntax looks like this, and the example in the next section applies it to real data:

df %>%
  group_by(grouping_variable) %>%
  summarize(quant25 = quantile(numeric_variable, probs = q[1]),
            quant50 = quantile(numeric_variable, probs = q[2]),
            quant80 = quantile(numeric_variable, probs = q[3]))

Example: Quantiles by Group in R

The following code demonstrates how to calculate quantiles for the number of wins in a dataset, grouped by team.

library(dplyr)

Now we can create a data frame:

df <- data.frame(team=c('X', 'X', 'X', 'X', 'X', 'X', 'X', 'X',
                        'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y',
                        'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'),
                 wins=c(12, 14, 24, 15, 8, 5, 13, 13, 13, 15, 12, 13,
                        10, 19, 19, 8, 12, 16, 15, 21, 20, 10, 15, 11))

Let’s look at the first six rows of the data frame:

head(df)
  team wins
1    X   12
2    X   14
3    X   24
4    X   15
5    X    8
6    X    5

Identify the quantiles that you’re interested in.

q <- c(0.25, 0.5, 0.80)

Let’s calculate the quantiles by the grouping variable.

df %>%
  group_by(team) %>%
  summarize(quant25 = quantile(wins, probs = q[1]),
            quant50 = quantile(wins, probs = q[2]),
            quant80 = quantile(wins, probs = q[3]))
  team  quant25 quant50 quant80
  <chr>   <dbl>   <dbl>   <dbl>
1 C        11.8      15    18.4
2 X        11        13    14.6
3 Y        11.5      13    17.4
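The decimal values in this output come from interpolation. By default, quantile() uses its type 7 method, which interpolates linearly between neighbouring sorted values. The following is a minimal sketch that reproduces team C's 25th percentile by hand. The variable names are illustrative, and the shortcut only covers the case where the position falls strictly between two observations:

```r
# Team C's wins, copied from the data frame above
wins_c <- c(12, 16, 15, 21, 20, 10, 15, 11)
sorted <- sort(wins_c)          # 10 11 12 15 15 16 20 21
n <- length(sorted)
p <- 0.25

# Type 7 (the default for quantile()): fractional position h = (n - 1) * p + 1
h <- (n - 1) * p + 1            # 2.75
lower <- floor(h)               # 2
frac <- h - lower               # 0.75

# Interpolate between the 2nd and 3rd sorted values: 11 + 0.75 * (12 - 11)
manual_q25 <- sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])
print(manual_q25)               # 11.75, which the tibble displays as 11.8

# Compare with the built-in result
print(all.equal(manual_q25, unname(quantile(wins_c, probs = p))))
```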

We can specify any number of quantiles. First, define the quantiles of interest:

q <- c(0.2, 0.4, 0.6, 0.8)

Now we can calculate the quantiles by the grouping variable:

df %>%
  group_by(team) %>%
  summarize(quant20 = quantile(wins, probs = q[1]),
            quant40 = quantile(wins, probs = q[2]),
            quant60 = quantile(wins, probs = q[3]),
            quant80 = quantile(wins, probs = q[4]))
  team  quant20 quant40 quant60 quant80
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 C        11.4    14.4    15.2    18.4
2 X         9.6    12.8    13.2    14.6
3 Y        10.8    12.8    13.4    17.4
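Writing one summarize() line per quantile gets repetitive as the list grows. One possible alternative, assuming dplyr 1.1.0 or later (where reframe() is available), returns the results in long format with one row per team and probability. The column names prob and value and the object name long_q are illustrative choices, not part of the original example:

```r
library(dplyr)

# Same data as in the example above
df <- data.frame(team = rep(c("X", "Y", "C"), each = 8),
                 wins = c(12, 14, 24, 15, 8, 5, 13, 13, 13, 15, 12, 13,
                          10, 19, 19, 8, 12, 16, 15, 21, 20, 10, 15, 11))

q <- c(0.2, 0.4, 0.6, 0.8)

# reframe() allows each group to return several rows,
# so a single quantile() call can cover every probability in q
long_q <- df %>%
  group_by(team) %>%
  reframe(prob = q,
          value = quantile(wins, probs = q, names = FALSE))

print(long_q)
```

The result has 12 rows (3 teams times 4 probabilities), and the values should match the wide table above.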

We can also calculate a single quantile per group. For example, here’s how to find the 95th percentile of each team’s wins:

df %>%
  group_by(team) %>%
  summarize(quant95 = quantile(wins, probs = 0.95))
  team  quant95
  <chr>   <dbl>
1 C        20.6
2 X        20.8
3 Y        19
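As a cross-check that does not depend on dplyr, the same per-team percentile can be computed with base R's tapply(). This is only a sketch for verification (the object name q95 is illustrative), not a replacement for the approach above:

```r
# Same data as in the example above
df <- data.frame(team = rep(c("X", "Y", "C"), each = 8),
                 wins = c(12, 14, 24, 15, 8, 5, 13, 13, 13, 15, 12, 13,
                          10, 19, 19, 8, 12, 16, 15, 21, 20, 10, 15, 11))

# Apply quantile() to the wins of each team; extra arguments are passed through
q95 <- tapply(df$wins, df$team, quantile, probs = 0.95)
print(q95)
```

Team X has the highest 95th percentile even though its median is lower than team C's, because its single 24-win value pulls the upper tail up.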

Cool, it’s working well.
