Filtering for Unique Values in R Using dplyr

With the distinct() function from the dplyr package, you can filter a data frame in R for unique values using the following methods.

Method 1: Filter for Unique Values in One Column

df %>% distinct(var1)

Method 2: Filter for Unique Values in Multiple Columns

df %>% distinct(var1, var2)

Method 3: Filter for Unique Values in All Columns

df %>% distinct()

The examples below show how to use each method in practice with the following data frame.


# create a data frame (rebounds stored as numbers rather than quoted strings)
df <- data.frame(team=c('X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'),
                 rebounds=c(8, 6, 5, 4, 3, 8, 9, 5),
                 points=c(107, 207, 208, 211, 213, 215, 219, 313))

Now we can view the data frame

df
   team rebounds points
1    X        8    107
2    X        6    207
3    X        5    208
4    X        4    211
5    Y        3    213
6    Y        8    215
7    Y        9    219
8    Y        5    313

Example 1: Filter for Unique Values in One Column

To filter for unique values in just the team column, we can use the following code.


library(dplyr)

# select only the unique values in the team column

df %>% distinct(team)
  team
1    X
2    Y

Note that only the unique values of the team column are returned; the rebounds and points columns are dropped from the result.
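If the other columns are needed as well, distinct() accepts a .keep_all argument. The snippet below is a minimal sketch of that option; the data frame is recreated here (with rebounds stored as numbers) so the code runs on its own, and the name first_per_team is only illustrative.

```r
library(dplyr)

# same data as the example data frame, recreated so this snippet is self-contained
df <- data.frame(team = c('X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'),
                 rebounds = c(8, 6, 5, 4, 3, 8, 9, 5),
                 points = c(107, 207, 208, 211, 213, 215, 219, 313))

# keep the first row encountered for each team, retaining every column
first_per_team <- df %>% distinct(team, .keep_all = TRUE)
first_per_team
```

With .keep_all = TRUE, the result has one row per team (the first X row and the first Y row) and still contains the rebounds and points columns.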

Example 2: Filter for Unique Values in Multiple Columns

To filter for unique values in the team and points columns, we can use the following code:

library(dplyr)

# select the unique combinations of values in the team and points columns

df %>% distinct(team, points)
  team points
1    X    107
2    X    207
3    X    208
4    X    211
5    Y    213
6    Y    215
7    Y    219
8    Y    313

Note that only the team and points columns are returned. All eight rows remain because every combination of team and points in this data frame is already unique.
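To see rows actually collapse when several columns are passed, the combinations have to repeat. The sketch below uses a hypothetical data frame, df2, with a made-up position column that is not part of the example data.

```r
library(dplyr)

# hypothetical data in which some team/position pairs occur more than once
df2 <- data.frame(team = c('X', 'X', 'X', 'Y', 'Y'),
                  position = c('G', 'G', 'F', 'G', 'G'),
                  points = c(10, 12, 9, 14, 11))

# one row per unique team/position combination; points is dropped
unique_pairs <- df2 %>% distinct(team, position)
unique_pairs
```

Here the five input rows reduce to three combinations: X/G, X/F, and Y/G.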


Example 3: Filter for Unique Values in All Columns

To filter for unique values across all columns in the data frame, we can use the following code.

library(dplyr)

# select the unique rows across all columns

df %>% distinct()
   team rebounds points
1    X        8    107
2    X        6    207
3    X        5    208
4    X        4    211
5    Y        3    213
6    Y        8    215
7    Y        9    219
8    Y        5    313

Note that distinct() with no arguments compares entire rows, not each column separately. All eight rows are returned here because no row is an exact duplicate of another.
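A duplicate row makes the effect visible. The following sketch appends a copy of the first row to the data frame and then removes it again with distinct(); the names df_dup and deduped are only illustrative, and the data frame is recreated (with rebounds stored as numbers) so the code runs on its own.

```r
library(dplyr)

# same data as the example data frame, recreated so this snippet is self-contained
df <- data.frame(team = c('X', 'X', 'X', 'X', 'Y', 'Y', 'Y', 'Y'),
                 rebounds = c(8, 6, 5, 4, 3, 8, 9, 5),
                 points = c(107, 207, 208, 211, 213, 215, 219, 313))

# append a copy of the first row so the data frame contains an exact duplicate
df_dup <- bind_rows(df, df[1, ])
nrow(df_dup)

# distinct() with no arguments drops rows that match an earlier row in every column
deduped <- df_dup %>% distinct()
nrow(deduped)
```

df_dup has nine rows, and deduped is back to the original eight.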
