How to Interpolate Missing Values in R With Example

by finnstats

In today’s world, data comes from a variety of places, is collected through numerous streams, and is then evaluated using a variety of methodologies.

In this article, we discuss missing values and how to fill them by linear interpolation using the zoo package.

To interpolate missing values in a data frame column in R, use the following basic syntax.

library(dplyr)
library(zoo)
df <- df %>%
        mutate(column_name = na.approx(column_name))
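
To see what na.approx() does on its own, here is a minimal sketch on a plain numeric vector. The vector x is made up for illustration and is not data from this article.

```r
library(zoo)

# Hypothetical vector with one interior gap
x <- c(1, NA, 3, 4)

# na.approx() fills the interior NA by linear interpolation
# between its two known neighbours
filled <- na.approx(x)
filled
```

Inside mutate(), the same call is applied to a whole data frame column.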

The example below demonstrates how to utilize this syntax in practice.

Example: Interpolate Missing Values in R

Let’s say we have the following data frame in R that shows a store’s total sales for 15 days in a row:

# create a data frame
df <- data.frame(day=1:15,
                 sales=c(2, 4, 9, 1, 10, 15, 2, NA, NA, 8, NA, 31, 32, 41, 45))

Now we can view the data frame

df

   day sales
1    1     2
2    2     4
3    3     9
4    4     1
5    5    10
6    6    15
7    7     2
8    8    NA
9    9    NA
10  10     8
11  11    NA
12  12    31
13  13    32
14  14    41
15  15    45

Notice that the data frame is missing sales numbers for three days: days 8, 9, and 11.
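
A quick way to confirm which days are missing is to count the NA values directly. The following is a minimal sketch in base R; it recreates the example data frame so that it runs on its own, and the names n_missing and missing_days are chosen here for illustration.

```r
# Recreate the example data frame from above
df <- data.frame(day = 1:15,
                 sales = c(2, 4, 9, 1, 10, 15, 2, NA, NA, 8, NA, 31, 32, 41, 45))

# Count the missing values in the sales column
n_missing <- sum(is.na(df$sales))

# Find the days on which sales are missing
missing_days <- df$day[is.na(df$sales)]

n_missing
missing_days
```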

Here’s what a basic line chart to show sales over time would look like:

# create a line chart to visualize sales
plot(df$sales, type='o', pch=16, col='red', xlab='Day', ylab='Sales')

Interpolate Missing Values in R

We can use the na.approx() function from the zoo package together with the mutate() function from the dplyr package to fill in the missing values.

library(dplyr)
library(zoo)

# interpolate missing values in the sales column
df <- df %>%
        mutate(sales = na.approx(sales))

Now we can view the updated data frame

df

   day sales
1    1   2.0
2    2   4.0
3    3   9.0
4    4   1.0
5    5  10.0
6    6  15.0
7    7   2.0
8    8   4.0
9    9   6.0
10  10   8.0
11  11  19.5
12  12  31.0
13  13  32.0
14  14  41.0
15  15  45.0

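Each missing value has been replaced by a linearly interpolated one: days 8 and 9 fall on the straight line between day 7 (2) and day 10 (8), and day 11 is the midpoint of day 10 (8) and day 12 (31).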
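
The filled-in values can be checked by hand with base R's approx() function, which performs linear interpolation between known points. The sketch below is only a sanity check on the numbers in this example, not part of the workflow above; gap1 and gap2 are names chosen here for illustration.

```r
# Days 7 and 10 (sales 2 and 8) bracket the gap at days 8 and 9
gap1 <- approx(x = c(7, 10), y = c(2, 8), xout = c(8, 9))$y

# Days 10 and 12 (sales 8 and 31) bracket the gap at day 11
gap2 <- approx(x = c(10, 12), y = c(8, 31), xout = 11)$y

gap1  # 4 6
gap2  # 19.5
```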

Here’s what it would look like if we made a new line chart to show the updated data frame:

# create a line chart to visualize sales
plot(df$sales, type='o', pch=16, col='green', xlab='Day', ylab='Sales')

Notice that the values chosen by the na.approx() function seem to fit the trend in the data quite well.
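
One caveat: the example above only has gaps between known values. When a column starts or ends with NA, na.approx() by default drops those values, because there is no point on one side to interpolate from. The result is then shorter than the input, which would typically cause mutate() to fail with a size mismatch. The sketch below uses a hypothetical vector x (not the article's data) to show two possible ways around this, assuming either behaviour suits your analysis.

```r
library(zoo)

# Hypothetical vector with a missing value at the end
x <- c(2, NA, 6, NA)

# Default: the trailing NA cannot be interpolated and is dropped,
# so the result is shorter than the input
dropped <- na.approx(x)
length(dropped)

# na.rm = FALSE keeps the trailing NA in place, preserving the length
keep_na <- na.approx(x, na.rm = FALSE)

# rule = 2 is passed on to approx() and repeats the nearest observed value
extended <- na.approx(x, rule = 2)
```

Using na.rm = FALSE is the more conservative choice, since it does not invent values beyond the observed range.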
