aggregate Function in R: A Powerful Tool for Data Frames

In this tutorial, we describe the aggregate function in R.

As the name indicates, it aggregates the input data frame: the rows are split into groups and a specified function is applied to each group.

Let’s see the basic R syntax of the aggregate function.

aggregate(x = any_data, by = group_list, FUN = any_function)

The by argument takes a list of grouping columns, so you can group by more than one variable; this is one of the main advantages of this function.

It can summarize one or more columns of a data frame within the groups defined by the by argument. FUN is the function applied to each group, and any function that summarizes a vector can be passed here.
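To see how a by list with more than one column behaves, here is a minimal sketch. It assumes the built-in mtcars data set, which is not used elsewhere in this tutorial, and the list names cyl and am are chosen here only to label the grouping columns in the output.

```r
# Assumption: mtcars (shipped with R) stands in for any data frame
# that has more than one grouping column.
result <- aggregate(x = mtcars[, c("mpg", "hp")],
                    by = list(cyl = mtcars$cyl, am = mtcars$am),
                    FUN = mean)

# One row per combination of cyl and am that occurs in the data.
print(result)
```

Naming the list elements replaces the default Group.1 and Group.2 column names in the result.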

When a data frame contains different groups, a common first step is to calculate the mean and standard deviation (sd) of each group.

Here we walk through simple examples of the aggregate function.

We use the iris data frame for the aggregate calculations. Let’s load the data set into a data frame named ‘data’.

data <- iris
head(data) 
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 1          5.1         3.5          1.4         0.2  setosa
 2          4.9         3.0          1.4         0.2  setosa
 3          4.7         3.2          1.3         0.2  setosa
 4          4.6         3.1          1.5         0.2  setosa
 5          5.0         3.6          1.4         0.2  setosa
 6          5.4         3.9          1.7         0.4  setosa

aggregate Function in R: Examples

Example 1: Compute Mean by Group Using aggregate Function

aggregate(x = data[ , colnames(data) != "Species"],
          by = list(data$Species),
          FUN = mean)
       Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
 1     setosa        5.006       3.428        1.462       0.246
 2 versicolor        5.936       2.770        4.260       1.326
 3  virginica        6.588       2.974        5.552       2.026

Example 2: Compute Sum by Group Using aggregate Function

aggregate(x = data[ , colnames(data) != "Species"],
          by = list(data$Species),
          FUN = sum)
      Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
 1     setosa        250.3       171.4         73.1        12.3
 2 versicolor        296.8       138.5        213.0        66.3
 3  virginica        329.4       148.7        277.6       101.3

Example 3: Compute SD by Group Using aggregate Function

aggregate(x = data[ , colnames(data) != "Species"],
          by = list(data$Species),
          FUN = sd)
      Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
 1     setosa    0.3524897   0.3790644    0.1736640   0.1053856
 2 versicolor    0.5161711   0.3137983    0.4699110   0.1977527
 3  virginica    0.6358796   0.3224966    0.5518947   0.2746501

In the same way, you can apply any other summary function by passing it to the FUN argument.
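As an illustration of passing other functions to FUN, the sketch below uses median and a small anonymous function that computes the spread (max minus min) of each column. The names num_cols, medians, and spreads are made up for this example.

```r
data <- iris
num_cols <- data[, colnames(data) != "Species"]

# Built-in function: median of each numeric column per species.
medians <- aggregate(x = num_cols,
                     by = list(Species = data$Species),
                     FUN = median)

# Anonymous function: spread (max minus min) of each column per species.
spreads <- aggregate(x = num_cols,
                     by = list(Species = data$Species),
                     FUN = function(v) max(v) - min(v))

print(medians)
print(spreads)
```

Any function that takes a vector and returns a single value can be used in this way.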

Example 4: Applying aggregate Function to Data Containing NAs

data1 <- data
data1[2, 3] <- NA
aggregate(x = data1[ , colnames(data1) != "Species"],
          by = list(data1$Species),
          FUN = mean)
       Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
 1     setosa        5.006       3.428           NA       0.246
 2 versicolor        5.936       2.770        4.260       1.326
 3  virginica        6.588       2.974        5.552       2.026

Here you can see that the Petal.Length mean for setosa is NA: mean returns NA when the group it is applied to contains a missing value.

Example 5: Applying aggregate Function to Data Containing NAs with na.rm

aggregate(x = data1[ , colnames(data1) != "Species"],
          by = list(data1$Species),
          FUN = mean,
          na.rm = TRUE)
       Group.1 Sepal.Length Sepal.Width Petal.Length Petal.Width
 1     setosa        5.006       3.428     1.463265       0.246
 2 versicolor        5.936       2.770     4.260000       1.326
 3  virginica        6.588       2.974     5.552000       2.026
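The na.rm = TRUE argument is not an argument of aggregate itself; extra arguments like this are passed through to FUN, here mean. The sketch below shows the same effect on a single group and then counts the missing values per group, which can help to check which means were affected. The name na_counts is made up for this example.

```r
data1 <- iris
data1[2, 3] <- NA  # same missing value as in Example 4

setosa_petal <- data1$Petal.Length[data1$Species == "setosa"]

# A single missing value makes mean() return NA unless na.rm = TRUE is set.
print(mean(setosa_petal))
print(mean(setosa_petal, na.rm = TRUE))

# Count missing values per group and column.
na_counts <- aggregate(x = data1[, colnames(data1) != "Species"],
                       by = list(Species = data1$Species),
                       FUN = function(v) sum(is.na(v)))
print(na_counts)
```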

Conclusion

We hope you found the above information useful. Here we discussed using the aggregate function to compute descriptive statistics by group; other functions and packages are also available that may run faster on large data sets.

2 Responses

  1. The aggregate function also works with formulas which I think is neater than using multiple arguments. Like,

    aggregate(. ~ Species, data = iris, FUN = mean)

    gives the same result as,

    aggregate(x = data[ , colnames(data) != "Species"],
    by = list(data$Species),
    FUN = mean)
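One difference is worth keeping in mind when the data contain NAs. The formula interface uses na.action = na.omit by default, which drops every row that has a missing value in any variable of the formula before FUN is applied. The sketch below, using the data1 example with one NA, compares that default with na.action = na.pass combined with na.rm = TRUE. The names by_formula and by_formula_pass are made up for this example.

```r
data1 <- iris
data1[2, 3] <- NA  # one missing Petal.Length value in the setosa group

# Default: row 2 is removed entirely, so all setosa means use 49 rows.
by_formula <- aggregate(. ~ Species, data = data1, FUN = mean)

# Keep the row and let mean() skip the NA in the affected column only.
by_formula_pass <- aggregate(. ~ Species, data = data1, FUN = mean,
                             na.action = na.pass, na.rm = TRUE)

print(by_formula)
print(by_formula_pass)
```

Note also that the formula interface names the grouping column Species rather than Group.1.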
