You can override using the .groups argument

by finnstats

In this post, you'll learn how to deal with the dplyr message "`summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument." in the R programming language.

The article first creates some example data, then shows how to reproduce the message, and finally shows how to avoid it.

Let’s get started with Example Data and Add-On Packages.

The first step is to gather some data to use as an example:

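df <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),  # four groups, three rows each
                 gr2 = letters[1:2],                 # recycled to twelve rows
                 values = 10:12)                     # recycled to twelve rows
df                                                   # print the example data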

    gr1 gr2 values
1    A   a     10
2    A   b     11
3    A   a     12
4    B   b     10
5    B   a     11
6    B   b     12
7    C   a     10
8    C   b     11
9    C   a     12
10   D   b     10
11   D   a     11
12   D   b     12


As the output above shows, our example data consists of twelve rows and three columns.

The column values has the integer class, and the columns gr1 and gr2 are character vectors.
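One way to check these dimensions and column classes directly is sketched below. The stringsAsFactors = FALSE argument is an addition to the original call: it is already the default from R 4.0.0 onward and is only spelled out here so that older R versions behave the same way.

```r
# Recreate the example data; stringsAsFactors = FALSE is an added assumption
# so that gr1 and gr2 stay character vectors on older R versions
df <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),
                 gr2 = letters[1:2],
                 values = 10:12,
                 stringsAsFactors = FALSE)

col_classes <- sapply(df, class)  # class of each column as a named vector
col_classes

dim(df)                           # number of rows and number of columns
```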

We also need to install and load the dplyr package from the tidyverse for this tutorial:

install.packages("dplyr")                    
library("dplyr")

Example 1: Reproduce the Message You can override using the .groups argument.

In this example, we'll reproduce the message "`summarise()` has grouped output by 'X'. You can override using the `.groups` argument."

Assume we want to group our data by multiple columns (i.e. the group indicators gr1 and gr2).

Then, as illustrated below, we can use the group_by() and summarise() functions of the dplyr package:

data_group <- df %>%                        # start from the example data
  group_by(gr1, gr2) %>%                    # group by both indicators
  dplyr::summarise(gr_sum = sum(values))    # sum of values per group

`summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument.

As you can see, the preceding R code returned the message "`summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument."

However, when we examine the generated data, everything appears to be in order:

  gr1   gr2   gr_sum
  <chr> <chr>  <int>
1 A     a         22
2 A     b         11
3 B     a         11
4 B     b         22
5 C     a         22
6 C     b         11
7 D     a         11
8 D     b         22

So, what does this message mean?

The message "`summarise()` has grouped output by 'X'. You can override using the `.groups` argument." tells us that when we group by multiple columns before calling summarise(), dplyr drops only the last grouping variable passed to group_by() (here gr2). The result is therefore still grouped by the remaining variable, gr1.

In other words, the message informs the user that the output is still a grouped data frame. It does not change the summarised values themselves.

It is a friendly informational message rather than a warning or an error, and it can be disregarded most of the time, as long as later steps do not depend on whether the data is still grouped.
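The remaining grouping can be inspected directly. The sketch below assumes the example data from above and uses dplyr's group_vars() function; it passes .groups = "drop_last", which to our understanding matches the default behaviour for this kind of summary, so the result is the same but no message is printed.

```r
library(dplyr)

# Example data from this article
df <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),
                 gr2 = letters[1:2],
                 values = 10:12)

# Same summary as before; "drop_last" drops only the last grouping variable
data_group <- df %>%
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "drop_last")

group_vars(data_group)   # names of the grouping variables still in place
```

Here group_vars() should report gr1 as the only remaining grouping variable, which is exactly what the message describes.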

Note: This message may appear unexpectedly in R code that previously ran without displaying it. The reason is that the message and the .groups argument were introduced in dplyr 1.0.0, so code written for earlier versions starts showing it after an upgrade.
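If you are unsure whether this applies to your installation, a quick check along the following lines may help. It is only a sketch: it assumes dplyr is installed, and the variable names dplyr_version and has_groups_arg are made up for illustration.

```r
# Installed dplyr version (packageVersion() comes from the utils package)
dplyr_version <- packageVersion("dplyr")
dplyr_version

# TRUE if the installed version is 1.0.0 or later
has_groups_arg <- dplyr_version >= "1.0.0"
has_groups_arg
```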

Example 2: Avoid the Message "`summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument."

Although this message may be valuable in some circumstances, it may be perplexing in others.

To disable these dplyr messages in your code, you can change the global option for the summarise function as shown below:


options(dplyr.summarise.inform = FALSE)

If we run our code again, the message "`summarise()` has grouped output by 'gr1'. You can override using the `.groups` argument." no longer appears:

df1 <- df %>%                               # use the example data frame df
  group_by(gr1, gr2) %>%
  dplyr::summarise(gr_sum = sum(values))

  gr1   gr2   gr_sum
  <chr> <chr>  <int>
1 A     a         22
2 A     b         11
3 B     a         11
4 B     b         22
5 C     a         22
6 C     b         11
7 D     a         11
8 D     b         22

The output is the same as before, but this time the message is not displayed.
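As the message itself suggests, the grouping of the result can also be set per call with the .groups argument instead of changing a global option. The following is a minimal sketch using the example data from above; the object names df_drop and df_keep are chosen only for illustration.

```r
library(dplyr)

# Example data from this article
df <- data.frame(gr1 = rep(LETTERS[1:4], each = 3),
                 gr2 = letters[1:2],
                 values = 10:12)

# .groups = "drop" returns an ungrouped result
df_drop <- df %>%
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "drop")

# .groups = "keep" keeps both grouping variables
df_keep <- df %>%
  group_by(gr1, gr2) %>%
  summarise(gr_sum = sum(values), .groups = "keep")

group_vars(df_drop)   # no grouping variables left
group_vars(df_keep)   # still grouped by gr1 and gr2
```

Setting .groups explicitly makes the intended grouping visible in the code itself and suppresses the message for that call only, which may be preferable to a global option in shared scripts.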