Imputing missing values in R

When an observation is absent from a column of a data frame, or holds a character value where a numeric value is expected, it is referred to in data science as a missing value. In R, missing values are represented as NA.
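The sketch below shows how both kinds of missing value end up as NA. It assumes the raw values arrive as text; the vector raw_price is a hypothetical example, not data from this article. Converting the text with as.numeric() turns any non-numeric entry into NA, and is.na() then flags it.

```r
# Hypothetical raw input: one entry is text rather than a number
raw_price <- c("612", "447", "n/a", "374", "831")

# as.numeric() coerces non-numeric strings to NA (with a warning);
# suppressWarnings() silences that warning here
price <- suppressWarnings(as.numeric(raw_price))

print(price)         # returns c(612, 447, NA, 374, 831)
print(is.na(price))  # returns c(FALSE, FALSE, TRUE, FALSE, FALSE)
```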

To draw correct conclusions from the data, missing values must be either removed or replaced (imputed).
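Removing missing values usually means dropping the incomplete rows. Here is a minimal sketch using base R's complete.cases() on the same data frame this article builds below:

```r
df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'),
                 Price   = c(612, 447, NA, 374, 831))

# complete.cases() is TRUE for rows that have no NA in any column
df_complete <- df[complete.cases(df), ]

nrow(df_complete)  # 4: the row for product C is dropped
```

The drawback is that the whole row is lost, including its non-missing Product value. Replacing (imputing) the missing value avoids that loss.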

In this article, we will learn several approaches to dealing with missing values.

In R, there are several ways to replace the missing values in a column. For example, you can replace them with zero, the mean (average), or the median.


We’ll look at three of these approaches in this article:

1. Replace the column’s missing value with zero.

2. Replace the column’s missing value with the mean.

3. Replace the column’s missing value with the median.
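The three approaches differ only in the fill value, so they can be wrapped in one small helper. This sketch assumes a numeric vector. Note that impute_column() is a hypothetical name, not a base R or package function.

```r
# Hypothetical helper: fill the NA entries of a numeric vector with
# zero, the mean, or the median of the observed (non-NA) values
impute_column <- function(x, method = c("zero", "mean", "median")) {
  method <- match.arg(method)
  fill <- switch(method,
                 zero   = 0,
                 mean   = mean(x, na.rm = TRUE),
                 median = median(x, na.rm = TRUE))
  x[is.na(x)] <- fill
  x
}

price <- c(612, 447, NA, 374, 831)
impute_column(price, "zero")    # returns c(612, 447, 0, 374, 831)
impute_column(price, "mean")    # returns c(612, 447, 566, 374, 831)
impute_column(price, "median")  # returns c(612, 447, 529.5, 374, 831)
```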


Let’s start by creating the data frame.

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'), Price = c(612, 447, NA, 374, 831))
df
  Product Price
1       A   612
2       B   447
3       C    NA
4       D   374
5       E   831

The Price column contains one missing value (for product C). Let’s replace it.
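Before replacing anything, it helps to confirm where the NA values are. Here is a short sketch with base R functions, rebuilding the same df:

```r
df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'),
                 Price   = c(612, 447, NA, 374, 831))

# Count of NA values in each column
colSums(is.na(df))      # Product 0, Price 1

# Row index of the missing price
which(is.na(df$Price))  # 3
```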

Replace the column’s missing value with zero (0):

In the Price column, replace the missing value with zero.

# Assign 0 to every NA entry of the Price column
df$Price[is.na(df$Price)] <- 0

As a result, the final data frame is:


df
  Product Price
1       A   612
2       B   447
3       C     0
4       D   374
5       E   831
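The same zero fill can also be written with base R's replace(), which returns a modified copy rather than assigning into a subset. Here is a minimal equivalent sketch:

```r
df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'),
                 Price   = c(612, 447, NA, 374, 831))

# replace(x, list, values): put 0 at every position where Price is NA
df$Price <- replace(df$Price, is.na(df$Price), 0)

df$Price  # returns c(612, 447, 0, 374, 831)
```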

Replace the column’s missing value with the mean:

Replace the missing value in the Price column with the average.

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'), Price = c(612, 447, NA, 374, 831))
# Replace NA in Price with the mean of the non-missing prices
df$Price[is.na(df$Price)] <- mean(df$Price, na.rm = TRUE)
df

The output data frame is:


  Product Price
1       A   612
2       B   447
3       C   566
4       D   374
5       E   831
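The na.rm = TRUE argument matters here. mean() returns NA if any input is NA, so without it the missing value would simply be "replaced" with NA. A quick check:

```r
price <- c(612, 447, NA, 374, 831)

mean(price)                # NA: a single NA propagates through mean()
mean(price, na.rm = TRUE)  # 566 = (612 + 447 + 374 + 831) / 4
```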

Replace the column’s missing value with the median:

In the Price column, replace the missing value with the median.

df <- data.frame(Product = c('A', 'B', 'C', 'D', 'E'), Price = c(612, 447, NA, 374, 831))
# Replace NA in Price with the median of the non-missing prices
df$Price[is.na(df$Price)] <- median(df$Price, na.rm = TRUE)
df

The output data frame is:

  Product Price
1       A 612.0
2       B 447.0
3       C 529.5
4       D 374.0
5       E 831.0
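Why 529.5? With an even number of observed values, the median is the average of the two middle values after sorting. This sketch reproduces median()'s result by hand:

```r
price <- c(612, 447, NA, 374, 831)

observed <- sort(price[!is.na(price)])  # 374 447 612 831
n <- length(observed)                   # 4 (even)

# Average of the two middle values: (447 + 612) / 2
manual_median <- (observed[n / 2] + observed[n / 2 + 1]) / 2

manual_median                                 # 529.5
manual_median == median(price, na.rm = TRUE)  # TRUE
```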

For further reading, see the article Handling missing values in R Programming.

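Note: Mean imputation is best used as a last resort when only a few values are missing, and it should generally be avoided. Every gap receives the same central value, so the method understates the variable's spread and weakens its relationships with other variables.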
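The sketch below shows one concrete reason for this advice. Every value filled with the mean sits exactly at the centre, so the column's variance shrinks. It compares the variance before and after mean imputation on this article's Price values:

```r
price <- c(612, 447, NA, 374, 831)

# Variance of the four observed prices (sum of squares / (n - 1) = 123366 / 3)
var_observed <- var(price, na.rm = TRUE)  # 41122

# Mean-imputed column: the filled value adds a zero deviation,
# but the divisor n - 1 grows from 3 to 4
imputed <- price
imputed[is.na(imputed)] <- mean(price, na.rm = TRUE)
var_imputed <- var(imputed)               # 30841.5

var_imputed < var_observed                # TRUE
```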


finnstats