How to Interpolate Missing Values in R With Example

In today’s world, data comes from many sources, is collected through numerous streams, and is then evaluated using a variety of methods.

This article explains what to do about missing values by interpolating them with the zoo package.

To interpolate missing values in a data frame column in R, use the following basic syntax.

library(dplyr)
library(zoo)
df <- df %>%
        mutate(column_name = na.approx(column_name))
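
Before applying this to a data frame, it can help to see what na.approx() does on a bare vector. The following is a minimal sketch; the vector x is a made-up example, not data from this article.

```r
library(zoo)

# Hypothetical vector with NA gaps between known values
x <- c(1, NA, 3, NA, NA, 6)

# na.approx() replaces each interior NA with a value on the straight
# line joining the nearest known values on either side
filled <- na.approx(x)
print(filled)
```

Each gap is filled with evenly spaced values between its neighbours, so the result here is the sequence 1 through 6.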

The example below demonstrates how to utilize this syntax in practice.

Example: Interpolate Missing Values in R

Let’s say we have the following data frame in R that shows a store’s total sales for 15 days in a row:

#create a data frame

df <- data.frame(day=1:15,
                 sales=c(2, 4, 9, 1, 10, 15, 2, NA, NA, 8, NA, 31, 32, 41, 45))

#view the data frame

df
   day sales
1    1     2
2    2     4
3    3     9
4    4     1
5    5    10
6    6    15
7    7     2
8    8    NA
9    9    NA
10  10     8
11  11    NA
12  12    31
13  13    32
14  14    41
15  15    45

Notice that the data frame is missing sales numbers for three days: days 8, 9, and 11.

Here’s what a basic line chart of sales over time looks like:

#create a line chart to visualize sales

plot(df$sales, type='o', pch=16, col='red', xlab='Day', ylab='Sales')

Interpolate Missing Values in R

We can use the na.approx() function from the zoo package together with the mutate() function from the dplyr package to fill in the missing values.

library(dplyr)
library(zoo)

#interpolate missing values in the sales column

df <- df %>%
        mutate(sales = na.approx(sales))

#view the updated data frame

df
     day sales
1    1   2.0
2    2   4.0
3    3   9.0
4    4   1.0
5    5  10.0
6    6  15.0
7    7   2.0
8    8   4.0
9    9   6.0
10  10   8.0
11  11  19.5
12  12  31.0
13  13  32.0
14  14  41.0
15  15  45.0

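Notice that each missing value has been replaced by a linearly interpolated one. Days 8 and 9 fall on the straight line between day 7 (2) and day 10 (8), giving 4 and 6, and day 11 is the midpoint of day 10 (8) and day 12 (31), giving 19.5.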
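
The filled-in values in the table above can be checked by hand with base R's approx() function, which performs linear interpolation between known points. This is only a sketch to verify the arithmetic; the variable names known_day, known_sales, and interp are chosen here for illustration.

```r
# Known sales on either side of the gaps (taken from the data frame above)
known_day   <- c(7, 10, 12)
known_sales <- c(2, 8, 31)

# Linearly interpolate at the days with missing sales: 8, 9 and 11
interp <- approx(known_day, known_sales, xout = c(8, 9, 11))$y
print(interp)
```

The three results agree with the values 4, 6, and 19.5 shown for days 8, 9, and 11.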

Here’s what a new line chart of the updated data frame looks like:

#create a line chart to visualize sales

plot(df$sales, type='o', pch=16, col='green', xlab='Day', ylab='Sales')

Notice that the values chosen by the na.approx() function seem to fit the trend in the data quite well.
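
One caveat: the example above only has gaps in the middle of the column. By default na.approx() drops missing values at the very start or end of a vector, which would return fewer values than there are rows and make mutate() fail. The sketch below uses a hypothetical vector y to show two options that keep the length intact: na.rm = FALSE, and rule = 2, which is passed through to approx().

```r
library(zoo)

# Hypothetical series whose first and last values are missing
y <- c(NA, 2, NA, 6, NA)

# Default behaviour: leading and trailing NAs are dropped,
# so the result is shorter than the input
short <- na.approx(y)

# na.rm = FALSE keeps the original length; the edge NAs stay NA
kept <- na.approx(y, na.rm = FALSE)

# rule = 2 fills the edges with the nearest known value
extended <- na.approx(y, rule = 2)

print(length(short))
print(kept)
print(extended)
```

Which option is appropriate depends on the data: leaving the edges as NA avoids inventing values, while rule = 2 assumes the series was flat before its first and after its last observation.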
