How to Remove Duplicates in R with Example
When working with data frames in R, one of the most common tasks is removing duplicate rows.
This can be handled with several functions in R, such as distinct, unique, and duplicated.
This tutorial describes how to remove duplicated rows from a data frame in R using the distinct, duplicated, and unique functions.
Remove Duplicates in R
Let’s load the dplyr library and create a data frame.
library(dplyr)
data <- data.frame(Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
                   Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14))
data
   Column1 Column2
1       P1       5
2       P1       5
3       P2       3
4       P3       5
5       P1       2
6       P1       3
7       P3       4
8       P4       7
9       P2      10
10      P4      14
Approach 1: Remove duplicated rows
Let’s use the distinct function from the dplyr library. Called with only the data frame, it keeps the first occurrence of each row and drops exact repeats.
distinct(data)
  Column1 Column2
1      P1       5
2      P2       3
3      P3       5
4      P1       2
5      P1       3
6      P3       4
7      P4       7
8      P2      10
9      P4      14
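As a quick sanity check, the row counts before and after can be compared to see how many duplicates were dropped. This is a minimal sketch that rebuilds the same data frame; the variable names deduped and removed are illustrative only.

```r
library(dplyr)

# Same example data as above
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# Drop rows that are exact repeats of an earlier row
deduped <- distinct(data)

# Number of rows removed: 10 original rows minus 9 distinct rows
removed <- nrow(data) - nrow(deduped)
removed
# → 1
```

Only the second row (P1, 5) repeats an earlier row in full, so a single row is removed.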
Approach 2: Remove Duplicates in Column
To drop repeated values in a particular column, pass that column name to the distinct function.
Let’s remove duplicate values from Column2.
distinct(data, Column2)
  Column2
1       5
2       3
3       2
4       4
5       7
6      10
7      14
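Suppose you want to remove duplicate values from Column2 but retain the corresponding values in Column1. Setting .keep_all = TRUE keeps every column and returns the first row for each distinct value of Column2: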
distinct(data, Column2, .keep_all = TRUE)
  Column1 Column2
1      P1       5
2      P2       3
3      P1       2
4      P3       4
5      P4       7
6      P2      10
7      P4      14
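If dplyr is not available, the same kind of result can be sketched in base R by applying duplicated to the single column and using the negated result as a row filter. The name first_per_value below is illustrative only.

```r
# Same example data as above
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# duplicated(data$Column2) is TRUE for values already seen earlier in the column;
# negating it keeps the first row for each distinct Column2 value
first_per_value <- data[!duplicated(data$Column2), ]
first_per_value
```

Unlike distinct, this keeps the original row names (1, 3, 5, 7, 8, 9, 10) rather than renumbering the rows.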
Approach 3: Duplicated function
The base R duplicated function is also handy for removing repeated rows from a data frame. It returns a logical vector that is TRUE for every row that repeats an earlier row.
duplicated(data)
[1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Let’s remove the duplicated rows by negating that vector and using it as a row filter.
data[!duplicated(data), ]
   Column1 Column2
1       P1       5
3       P2       3
4       P3       5
5       P1       2
6       P1       3
7       P3       4
8       P4       7
9       P2      10
10      P4      14
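By default duplicated flags the later copies, so the first occurrence is kept. To keep the last occurrence instead, the fromLast argument of duplicated can be set to TRUE. A minimal sketch on the same data, with last_kept as an illustrative name:

```r
# Same example data as above
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14)
)

# Scan from the bottom up, so the earlier copy (row 1) is the one flagged
last_kept <- data[!duplicated(data, fromLast = TRUE), ]
rownames(last_kept)
```

Here row 1 is dropped and row 2 is kept, which matters when later rows carry the more recent record.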
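Approach 4: Unique Function
The base R unique function returns the data frame with duplicate rows removed, keeping the first occurrence of each row.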
unique(data)
   Column1 Column2
1       P1       5
3       P2       3
4       P3       5
5       P1       2
6       P1       3
7       P3       4
8       P4       7
9       P2      10
10      P4      14
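The output above matches the result from Approach 3, and this can be checked directly. The sketch below also shows unique applied to a single column, which returns the distinct values as a vector. stringsAsFactors = FALSE is set explicitly here as an assumption, so that Column1 is a character vector on older R versions too.

```r
# Same example data as above, with Column1 kept as character
data <- data.frame(
  Column1 = c('P1', 'P1', 'P2', 'P3', 'P1', 'P1', 'P3', 'P4', 'P2', 'P4'),
  Column2 = c(5, 5, 3, 5, 2, 3, 4, 7, 10, 14),
  stringsAsFactors = FALSE
)

# unique() on a data frame and the duplicated() filter select the same rows
same_result <- identical(unique(data), data[!duplicated(data), ])
same_result

# unique() on one column returns its distinct values in order of first appearance
groups <- unique(data$Column1)
groups
```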