How to Find Unmatched Records in R

To retrieve all rows in one data frame that do not have matching values in another data frame, use the anti_join() function from the dplyr package.

The basic syntax used by this function is as follows.

anti_join(df1, df2, by='col_name')

The usage of this syntax is demonstrated in the examples that follow.
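Before the full examples, here is a minimal sketch of the call on its own. The data frames `orders` and `customers` are hypothetical and exist only to illustrate the syntax; they are not part of the examples in this post. The sketch also shows that the result keeps only the columns of the first data frame.

```r
library(dplyr)

# Hypothetical data, used only to illustrate the syntax
orders <- data.frame(id = c(1, 2, 3, 4),
                     amount = c(10, 20, 30, 40))
customers <- data.frame(id = c(1, 2),
                        name = c('x', 'y'))

# Rows of orders whose id does not appear in customers
unmatched <- anti_join(orders, customers, by = 'id')

# Only the columns of the first data frame are returned
names(unmatched)
```

Because anti_join() is a filtering join, it never adds columns from the second data frame; it only decides which rows of the first one to keep.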

Example 1: Use anti_join() with One Column

Suppose we have the two R data frames shown below:

# Create the data frames

df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))

To return all rows in the first data frame that don’t have a matching Q1 in the second data frame, we can use the anti_join() function.

library(dplyr)

# Use the 'Q1' column to perform the anti join

anti_join(df1, df2, by='Q1')
  Q1  Q2
1  c 114
2  d 218
3  e 322
4  f 323

We can see that exactly four rows in the first data frame have a Q1 value that does not appear in the second data frame.
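As a cross-check, the same rows can be selected in base R without dplyr. This is only a sketch of an equivalent filter on the Example 1 data, not how anti_join() is implemented; the variable name `unmatched` is chosen here for illustration.

```r
# Same data as in Example 1
df1 <- data.frame(Q1 = c('a', 'b', 'c', 'd', 'e', 'f'),
                  Q2 = c(152, 514, 114, 218, 322, 323))
df2 <- data.frame(Q1 = c('a', 'a', 'a', 'b', 'b', 'b'),
                  Q3 = c(523, 324, 233, 134, 237, 141))

# Keep the rows of df1 whose Q1 is not found among the Q1 values of df2
unmatched <- df1[!(df1$Q1 %in% df2$Q1), ]

unmatched
```

One visible difference: base R subsetting keeps the original row names (3 to 6 here), whereas the anti_join() output above is renumbered from 1.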

Example 2: Use anti_join() with Multiple Columns

Suppose we have the two R data frames shown below.

# Create the data frames

df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(142, 214, 319, 133, 517, 422))

All rows in the first data frame that lack a matching team and position in the second data frame can be returned using the anti_join() function:

library(dplyr)

# Use the 'team' and 'position' columns to perform the anti join

anti_join(df1, df2, by=c('team', 'position'))
   team position points
1    A        F    219
2    B        C    441

We can see that there are exactly two records from the first data frame that do not have a matching team name and position in the second data frame.
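The multi-column result can also be cross-checked in base R by combining the key columns into a single composite key. This is a rough sketch on the Example 2 data, not the way anti_join() works internally; it assumes the separator '|' never occurs in the team or position values, and the names `key1`, `key2`, and `unmatched` are chosen here for illustration.

```r
# Same data as in Example 2
df1 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'F', 'G', 'F', 'C'),
                  points = c(152, 114, 219, 254, 356, 441))
df2 <- data.frame(team = c('A', 'A', 'A', 'B', 'B', 'B'),
                  position = c('G', 'G', 'C', 'G', 'F', 'F'),
                  points = c(142, 214, 319, 133, 517, 422))

# Build one composite key per row from team and position
# (assumes '|' does not appear in either column)
key1 <- paste(df1$team, df1$position, sep = '|')
key2 <- paste(df2$team, df2$position, sep = '|')

# Keep the rows of df1 whose team/position pair is absent from df2
unmatched <- df1[!(key1 %in% key2), ]

unmatched
```

The points column plays no part in the comparison, which mirrors passing only 'team' and 'position' to the by argument.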
