How to Find Unmatched Records in R
To retrieve all rows in one data frame that do not have matching values in another data frame, use the anti_join() function from the dplyr package in R.
The following is the fundamental syntax for this function.
anti_join(df1, df2, by='col_name')
The examples below demonstrate how to use this syntax in practice.
Example 1: Use anti_join() with One Column
Let’s pretend we have the following two R data frames:
Let’s create the data frames
df1 <- data.frame(team=c('A', 'B', 'C', 'D', 'E'), points=c(102, 104, 129, 224, 436))
df2 <- data.frame(team=c('A', 'B', 'C', 'F', 'G'), points=c(412, 514, 519, 233, 117))
To return all rows in the first data frame that do not have a matching team in the second data frame, we can use the anti_join() function.
library(dplyr)
Using the ‘team’ column, execute an anti-join.
anti_join(df1, df2, by='team')
  team points
1    D    224
2    E    436
Exactly two teams from the first data frame, D and E, have no matching team name in the second data frame.
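As a cross-check, the same rows can be recovered without dplyr. The sketch below assumes the df1 and df2 from this example and uses base R's %in% operator; the name unmatched is only illustrative. Unlike anti_join(), base R subsetting keeps the original row names, so the printed row index will differ.

```r
# Same data frames as in Example 1
df1 <- data.frame(team=c('A', 'B', 'C', 'D', 'E'), points=c(102, 104, 129, 224, 436))
df2 <- data.frame(team=c('A', 'B', 'C', 'F', 'G'), points=c(412, 514, 519, 233, 117))

# Keep the rows of df1 whose team value never appears in df2
# ('unmatched' is an illustrative name, not part of the original example)
unmatched <- df1[!(df1$team %in% df2$team), ]
unmatched
```

This works for a single key column; for several key columns anti_join() is usually the more convenient choice.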
Example 2: Use anti_join() with Multiple Columns
Let’s pretend we have the following two R data frames.
Let’s create the data frames
df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'), position=c('G', 'G', 'F', 'G', 'F', 'C'), points=c(182, 164, 159, 124, 136, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'), position=c('G', 'G', 'C', 'G', 'F', 'F'), points=c(152, 154, 159, 322, 217, 522))
The anti_join() function can be used to return all rows in the first data frame whose combination of team and position does not appear in the second data frame.
library(dplyr)
Use the ‘team’ and ‘position’ columns to do an anti-join.
anti_join(df1, df2, by=c('team', 'position'))
  team position points
1    A        F    159
2    B        C    441
Exactly two records from the first data frame, (A, F) and (B, C), have no matching combination of team name and position in the second data frame.
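To see why both columns matter here, it may help to compare against a join on team alone. The sketch below assumes the df1 and df2 from this example; the names by_team and by_both are illustrative. Because every team in df1 (A and B) also appears in df2, joining on team alone should leave nothing unmatched.

```r
library(dplyr)

# Same data frames as in Example 2
df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(182, 164, 159, 124, 136, 441))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(152, 154, 159, 322, 217, 522))

# Key on team only: teams A and B both exist in df2, so no rows remain
by_team <- anti_join(df1, df2, by='team')

# Key on team and position together: only unmatched pairs remain
by_both <- anti_join(df1, df2, by=c('team', 'position'))

nrow(by_team)
nrow(by_both)
```

The choice of key columns therefore determines what counts as "unmatched", so it is worth listing them explicitly in by.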