droplevels in R with examples

To remove unused factor levels in R, use the droplevels() function.

This function comes in handy when we need to get rid of factor levels that are no longer in use as a result of subsetting a vector or a data frame.

The basic syntax for this function is as follows:

droplevels(x)

where x is a factor or a data frame from which unused factor levels should be removed.
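
The R documentation describes the factor method of droplevels() as equivalent to re-creating the factor with factor(), which keeps only the levels that actually occur, in their original order. A minimal sketch of that behaviour, using a small hypothetical factor f with an extra level that never appears:

```r
# A factor whose level set includes a value that never occurs ("extra")
f <- factor(c("low", "high", "medium"),
            levels = c("low", "medium", "high", "extra"))

# Subsetting keeps the full level set, even though "high" and "extra" are now unused
sub <- f[f != "high"]
levels(sub)               # "low" "medium" "high" "extra"

# droplevels() keeps only the levels that occur, in their original order
levels(droplevels(sub))   # "low" "medium"

# Re-creating the factor gives the same result
identical(droplevels(sub), factor(sub))   # TRUE
```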

This article shows you how to use this function in practice with a couple of examples.

Example 1: Drop Unused Factor Levels in a Vector

Assume we have a factor with seven levels. Let's say we create a new vector that keeps only five of the original seven values.

# define a factor with seven levels
data <- factor(c(1, 2, 3, 4, 5, 6, 7))

# create a new vector without the 4th and 5th elements
new <- data[-c(4, 5)]

Now we can view the new data

new
[1] 1 2 3 6 7
Levels: 1 2 3 4 5 6 7

Although the new vector contains only five distinct values, all seven original factor levels are still present.
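
One way to see the leftover levels is to compare the number of levels with the number of values that actually occur. A minimal sketch that rebuilds the vectors from this example:

```r
data <- factor(c(1, 2, 3, 4, 5, 6, 7))
new <- data[-c(4, 5)]

nlevels(new)         # 7: the level set is unchanged
length(unique(new))  # 5: only five values actually occur
table(new)           # levels 4 and 5 are still listed, each with a count of 0
```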

We can use the droplevels() function to remove these unused factor levels:

# remove factor levels that are no longer in use
new <- droplevels(new)

Let’s view the data

new
[1] 1 2 3 6 7
Levels: 1 2 3 6 7

The new vector now has only five factor levels.
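
If you know up front that you want the unused levels gone, the subsetting method for factors also accepts a drop argument. A minimal sketch reusing the data vector from this example: passing drop = TRUE inside the brackets should give the same result as calling droplevels() afterwards.

```r
data <- factor(c(1, 2, 3, 4, 5, 6, 7))

# drop = TRUE discards unused levels as part of the subset operation
new <- data[-c(4, 5), drop = TRUE]
new
# [1] 1 2 3 6 7
# Levels: 1 2 3 6 7
```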

Example 2: Drop Unused Factor Levels in a Data Frame

Assume we're working with a data frame in which one of the variables is a five-level factor.

Let's say we create a new data frame that excludes the rows for two of these factor levels.

Let’s create a data frame

df <- data.frame(region = factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                 sales = c(103, 106, 202, 257, 324))
df
  region sales
1     P1   103
2     P2   106
3     P3   202
4     P4   257
5     P5   324

Now we can define a new data frame that keeps only the rows with sales below 225:

newdf <- subset(df, sales < 225)

# view the new data frame
newdf
  region sales
1     P1   103
2     P2   106
3     P3   202

Let’s check the levels of the region variable.

levels(newdf$region)
[1] "P1" "P2" "P3" "P4" "P5"

Although the region column now contains only three distinct values, all five original factor levels are still present in the new data frame.

Leaving these unused levels in place can cause problems later, for example in summaries, tables, and some plots, which may still show empty categories for P4 and P5.
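
The effect is easy to see with table(). A minimal sketch that rebuilds df and newdf as defined in this example and tabulates the region column:

```r
df <- data.frame(region = factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                 sales = c(103, 106, 202, 257, 324))
newdf <- subset(df, sales < 225)

# P4 and P5 no longer occur, but table() still reports them with a count of 0
counts <- table(newdf$region)
counts
```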

The droplevels() function can be used to eliminate the unnecessary factor levels from the region variable:

# remove any unused factor levels
newdf$region <- droplevels(newdf$region)

Let's check the levels of the region variable again.

levels(newdf$region)
[1] "P1" "P2" "P3"
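
droplevels() can also be applied to the whole data frame, in which case it drops unused levels from every factor column; its data frame method has an except argument for columns whose levels should be left alone. A minimal sketch, where the channel column is a hypothetical addition used only to show more than one factor column:

```r
df <- data.frame(
  region  = factor(c("P1", "P2", "P3", "P4", "P5")),
  channel = factor(c("web", "web", "store", "store", "phone")),  # hypothetical column
  sales   = c(103, 106, 202, 257, 324)
)
newdf <- subset(df, sales < 225)

# drop unused levels from all factor columns at once
all_dropped <- droplevels(newdf)
levels(all_dropped$region)   # "P1" "P2" "P3"
levels(all_dropped$channel)  # "store" "web"

# keep every level of column 1 (region), drop unused levels elsewhere
keep_region <- droplevels(newdf, except = 1)
levels(keep_region$region)   # "P1" "P2" "P3" "P4" "P5"
levels(keep_region$channel)  # "store" "web"
```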

Hurray! Done for the day.
