Cumulative Sum calculation in R

Using the dplyr package in R, you can calculate the cumulative sum of a column with either of the following approaches.

Approach 1: Calculate Cumulative Sum of One Column

df %>% mutate(cum_sum = cumsum(var1))

Approach 2: Calculate Cumulative Sum by Group

df %>% group_by(var1) %>% mutate(cum_sum = cumsum(var2))
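
For intuition, cumsum() is a base R function that returns a running total: element i of the result is the sum of elements 1 through i of the input. A minimal sketch on a plain vector (the names x and running_total are illustrative only, not part of the data sets used below):

```r
# Illustrative vector; x is a made-up name for this sketch
x <- c(57, 42, 50)

# Each element is the sum of all elements up to and including that position
running_total <- cumsum(x)
print(running_total)  # 57, 99, 149
```

Inside mutate(), the same function is applied to a whole column, and after group_by() it restarts for each group.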

The examples below demonstrate how to apply each strategy in practice.

Example 1: Using dplyr, calculate the cumulative sum.

Suppose we have daily sales data. First, create the data frame:

df <- data.frame(day=c(1, 2, 3, 4, 5, 6, 7, 8),
                 sales=c(57, 42, 50, 99, 59, 51, 58, 45))

Now we can view the dataset

df
  day sales
1   1    57
2   2    42
3   3    50
4   4    99
5   5    59
6   6    51
7   7    58
8   8    45

To create a new column that holds the cumulative sum of the values in the ‘sales’ column, use the following code.

library(dplyr)

Let’s calculate the cumulative sum of sales

df %>% mutate(cum_sales = cumsum(sales))
  day sales cum_sales
1   1    57        57
2   2    42        99
3   3    50       149
4   4    99       248
5   5    59       307
6   6    51       358
7   7    58       416
8   8    45       461
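
As a quick sanity check, the last value of a cumulative sum should equal the total of the column. The sketch below assumes the df from Example 1 and stores the result in a new object (the name df_cum is illustrative), since mutate() returns a new data frame rather than modifying df in place.

```r
library(dplyr)

# Same data as Example 1
df <- data.frame(day = c(1, 2, 3, 4, 5, 6, 7, 8),
                 sales = c(57, 42, 50, 99, 59, 51, 58, 45))

# mutate() returns a new data frame; assign it to keep the new column
df_cum <- df %>% mutate(cum_sales = cumsum(sales))

# The final running total should match the overall sum of sales
tail(df_cum$cum_sales, 1) == sum(df$sales)  # TRUE (both are 461)
```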

Example 2: Using dplyr, calculate the Cumulative Sum by Group.

Let’s say we have the following R data frame.

Create the dataset

df <- data.frame(store=c('X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'),
                 day=c(1, 2, 3, 4, 1, 2, 3, 4),
                 sales=c(87, 82, 80, 98, 98, 81, 88, 83))

View the dataset now

df
  store day sales
1     X   1    87
2     X   2    82
3     X   3    80
4     X   4    98
5     Y   1    98
6     Y   2    81
7     Y   3    88
8     Y   4    83

To construct a new column that holds the cumulative sum of the values in the ‘sales’ column, grouped by the ‘store’ column, we can use the following code:

library(dplyr)

Now we can calculate the cumulative sum of sales by store.

df %>% group_by(store) %>% mutate(cum_sales = cumsum(sales))
  store   day sales cum_sales
  <chr> <dbl> <dbl>     <dbl>
1 X         1    87        87
2 X         2    82       169
3 X         3    80       249
4 X         4    98       347
5 Y         1    98        98
6 Y         2    81       179
7 Y         3    88       267
8 Y         4    83       350
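
Because cumsum() works in row order, a grouped running total is only meaningful when the rows within each group are already in the intended order. The following is a hedged sketch, assuming the data may arrive unsorted (the shuffled row order and the name df_cum are illustrative): sort by store and day first, then drop the grouping with ungroup() once the column has been added.

```r
library(dplyr)

# Same values as Example 2, but with rows deliberately out of order
df <- data.frame(store = c('Y', 'X', 'X', 'Y', 'X', 'Y', 'X', 'Y'),
                 day   = c(2, 3, 1, 4, 2, 1, 4, 3),
                 sales = c(81, 80, 87, 83, 82, 98, 98, 88))

df_cum <- df %>%
  arrange(store, day) %>%               # chronological order within each store
  group_by(store) %>%                   # restart the running total per store
  mutate(cum_sales = cumsum(sales)) %>%
  ungroup()                             # later steps are no longer grouped

df_cum$cum_sales  # 87, 169, 249, 347, 98, 179, 267, 350
```

Without the arrange() step, the running totals would follow whatever order the rows happen to be in, which is rarely what a daily sales report needs.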
