Aggregate daily data to monthly and yearly in R
Data aggregation is an important step in data analysis: it summarizes data at a coarser temporal resolution for ease of analysis and visualization.
For example, daily data can be aggregated into monthly or yearly data, allowing for better identification of trends and patterns.
In this article, we will explore how to aggregate daily data into monthly and yearly in R with multiple examples.
Example 1: Aggregating daily data to monthly and yearly using the zoo package
The zoo package provides an easy-to-use function for aggregating daily data to monthly or yearly frequency. The steps are:
1. Load the zoo package:
library(zoo)
2. Create a sample daily dataset:
date <- seq(as.Date("2020/1/1"), by = "day", length.out = 365)
value <- rnorm(365, mean = 10, sd = 3)
daily_data <- zoo(value, date)
head(daily_data)
2020-01-01 2020-01-02 2020-01-03 2020-01-04 2020-01-05 2020-01-06
12.352776 13.781357 13.287932 2.734408 12.295595 8.494018
3. Aggregate the daily data to monthly frequency:
monthly_data <- aggregate(daily_data, as.yearmon(index(daily_data)), sum)
monthly_data
Jan 2020 Feb 2020 Mar 2020 Apr 2020 May 2020 Jun 2020 Jul 2020 Aug 2020 Sep 2020
280.3542 281.2937 319.1381 305.1400 287.6361 285.5280 309.2165 300.0028 339.0383
Oct 2020 Nov 2020 Dec 2020
319.4942 332.4325 307.3884
In the above code, `as.yearmon()` converts the date index of the daily series to a year-month index.
The `aggregate()` function then summarizes the daily data to monthly data by summing all values in each month.
4. Aggregate the daily data to yearly frequency:
yearly_data <- aggregate(daily_data, as.numeric(format(index(daily_data), "%Y")), mean)
In the above code, `format(index(daily_data), "%Y")` extracts the year from each date in the index, and `as.numeric()` converts it to a numeric year index. The `aggregate()` function then summarizes the daily data to yearly data by taking the mean of all values in each year.
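One way to sanity-check this kind of aggregation is to run it on a toy series whose totals are known in advance. The sketch below is only an illustration: the series of ones and the names `ones`, `monthly_counts`, and `yearly_counts` are hypothetical, and the yearly grouping key is built with base R's `format()`, which is one option among several. Because every day contributes 1, each monthly sum should equal the number of days in that month.

```r
library(zoo)

# Hypothetical toy series: a value of 1 for every day of 2020 (a leap year)
dates <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")
ones <- zoo(rep(1, length(dates)), dates)

# Monthly totals: group by the year-month of each date
monthly_counts <- aggregate(ones, as.yearmon(index(ones)), sum)

# Yearly totals: group by the calendar year extracted with format()
yearly_counts <- aggregate(ones, as.integer(format(index(ones), "%Y")), sum)

print(monthly_counts)
print(yearly_counts)
```

If the grouping keys are right, February shows 29 and the single yearly total is 366.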
Example 2: Aggregating daily data to monthly and yearly using the xts package
The xts package provides the convenience functions `apply.monthly()` and `apply.yearly()` for aggregating daily data to monthly or yearly frequency. The steps are:
1. Load the xts package:
library(xts)
2. Create a sample daily dataset:
date <- seq(as.Date("2021/1/1"), by = "day", length.out = 365)
value <- rnorm(365, mean = 10, sd = 3)
daily_data <- xts(value, order.by = date)
3. Aggregate the daily data to monthly frequency:
monthly_data <- apply.monthly(daily_data, mean)
monthly_data
                [,1]
2021-01-31  9.704321
2021-02-28 10.209904
2021-03-31 10.080591
2021-04-30  9.999510
2021-05-31 10.947511
2021-06-30 10.375928
2021-07-31 10.055084
2021-08-31 10.450189
2021-09-30 10.546134
2021-10-31  9.529693
2021-11-30 10.538989
2021-12-31  9.758710
In the above code, the `apply.monthly()` function summarizes the daily data to monthly data by taking the mean of all values in each month. Each result is indexed by the last observation date of its month.
4. Aggregate the daily data to yearly frequency:
yearly_data <- apply.yearly(daily_data, sum)
yearly_data
               [,1]
2021-12-31 3716.003
In the above code, the `apply.yearly()` function summarizes the daily data to yearly data by taking the sum of all values in each year.
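To see where the period boundaries come from, the same aggregation can be sketched with xts's `endpoints()` and `period.apply()`. This is an illustrative sketch, not the only way to do it: the series of ones and the names `ones`, `ep`, `monthly_manual`, and `monthly_helper` are hypothetical. `endpoints()` returns the row position of the last observation in each month (preceded by 0), and `period.apply()` applies the function to the rows between consecutive positions.

```r
library(xts)

# Hypothetical toy series: a value of 1 for every day of 2021
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
ones <- xts(rep(1, length(dates)), order.by = dates)

# Row positions of the last observation in each month, starting with 0
ep <- endpoints(ones, on = "months")

# Apply sum() to the rows between consecutive endpoints
monthly_manual <- period.apply(ones, INDEX = ep, FUN = sum)

# The convenience helper should give the same totals
monthly_helper <- apply.monthly(ones, sum)

print(ep)
print(monthly_manual)
```

With a series of ones, the monthly totals are simply the number of days in each month, which makes it easy to confirm that both approaches split the data at the same boundaries.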
Example 3: Aggregating daily data to monthly and yearly using the dplyr package
The dplyr package provides a flexible way of summarizing data by grouping data and applying summary statistics. The steps for aggregating daily data to monthly and yearly frequency are:
1. Load the dplyr package:
library(dplyr)
2. Create a sample daily dataset:
date <- seq(as.Date("2022/1/1"), by = "day", length.out = 365)
value <- rnorm(365, mean = 10, sd = 3)
daily_data <- data.frame(date, value)
3. Convert the date column to a date format and extract the year and month columns:
daily_data$date <- as.Date(daily_data$date)
daily_data$year <- format(daily_data$date, "%Y")
daily_data$month <- format(daily_data$date, "%m")
4. Group the daily data by year and month, and summarize by taking the mean of all values in each month:
monthly_data <- daily_data %>%
  group_by(year, month) %>%
  summarize(mean_value = mean(value))
monthly_data
year month mean_value
1 2022 01 10.2
2 2022 02 9.59
3 2022 03 9.97
4 2022 04 10.8
5 2022 05 9.85
6 2022 06 10.3
7 2022 07 9.28
8 2022 08 9.40
9 2022 09 10.9
10 2022 10 10.4
11 2022 11 10.3
12 2022 12 10.4
In the above code, the `group_by()` function groups the daily data by year and month. The `summarize()` function then reduces each group to a single row by taking the mean of all values in that month.
5. Group the daily data by year, and summarize by taking the sum of all values in each year:
yearly_data <- daily_data %>%
  group_by(year) %>%
  summarize(sum_value = sum(value))
yearly_data
  year  sum_value
  <chr>     <dbl>
1 2022      3692.
In the above code, the `group_by()` function groups the daily data by year. The `summarize()` function then reduces each group to a single row by taking the sum of all values in that year.
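Because `summarize()` accepts several expressions at once, one grouped pass can return multiple statistics per month. The following is a minimal sketch under stated assumptions: the toy column `value` (set to the day of the month) and the names `daily`, `monthly_stats`, `n_days`, and `max_value` are hypothetical. It also passes `.groups = "drop"`, which returns an ungrouped result after summarizing over more than one grouping column.

```r
library(dplyr)

# Hypothetical toy data: one row per day of 2022, value = day of the month
daily <- data.frame(
  date = seq(as.Date("2022-01-01"), as.Date("2022-12-31"), by = "day")
)
daily$value <- as.numeric(format(daily$date, "%d"))

monthly_stats <- daily %>%
  mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
  group_by(year, month) %>%
  summarize(
    n_days = n(),              # number of daily rows in the month
    mean_value = mean(value),  # average of the daily values
    max_value = max(value),    # largest daily value
    .groups = "drop"           # return an ungrouped tibble
  )

print(monthly_stats)
```

Including a count such as `n_days` alongside the other statistics is a cheap way to spot months with missing days before trusting their sums or means.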
Example 4: Aggregating daily data to monthly and yearly using the data.table package
The data.table package is a fast and efficient package for working with large datasets. The steps for aggregating daily data to monthly and yearly frequency are:
1. Load the data.table package:
library(data.table)
2. Create a sample daily dataset:
date <- seq(as.Date("2023/1/1"), by = "day", length.out = 365)
value <- rnorm(365, mean = 10, sd = 3)
daily_data <- data.table(date, value)
3. Convert the date column to a date format and extract the year and month columns:
daily_data[, date := as.Date(date)]
daily_data[, year := format(date, "%Y")]
daily_data[, month := format(date, "%m")]
4. Group the daily data by year and month, and summarize by taking the mean of all values in each month:
monthly_data <- daily_data[, .(mean_value = mean(value)), by = .(year, month)]
In the above code, `.()` is data.table's alias for `list()`. The `by = .(year, month)` argument groups the data by year and month.
The `.(mean_value = mean(value))` expression summarizes the data by taking the mean of all values in each month.
5. Group the daily data by year, and summarize by taking the sum of all values in each year:
yearly_data <- daily_data[, .(sum_value = sum(value)), by = .(year)]
In the above code, the `by = .(year)` argument groups the data by year. The `.(sum_value = sum(value))` expression summarizes the data by taking the sum of all values in each year.
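The grouping columns do not have to be created beforehand: `by` can compute them on the fly. The sketch below is an illustration only. The table of ones and the names `dt`, `monthly_a`, and `monthly_b` are hypothetical, and it uses data.table's own `year()` and `month()` helpers (which return integers) instead of `format()`. It also writes the same query once with `.()` and once with `list()` to show that the two spellings are interchangeable.

```r
library(data.table)

# Hypothetical toy table: one row per day of 2023, each with value 1
dt <- data.table(date = seq(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "day"))
dt[, value := 1]

# Group by year and month computed inside `by`, using the .() shorthand
monthly_a <- dt[, .(total = sum(value)),
                by = .(year = year(date), month = month(date))]

# The same query spelled with list() instead of .()
monthly_b <- dt[, list(total = sum(value)),
                by = list(year = year(date), month = month(date))]

print(monthly_a)
```

Computing the keys inside `by` leaves the original table unchanged, which can be convenient when the year and month columns are only needed for this one summary.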
Conclusion
In this article, we have explored how to aggregate daily data to monthly and yearly frequency using different R packages.
The zoo and xts packages are suitable for simple aggregation tasks, while the dplyr and data.table packages provide more flexibility for summarizing data by grouping and applying summary statistics.
The choice of package and function depends on the specific requirements of the analysis.