droplevels in R with examples
To remove unused factor levels, use R’s droplevels() function.
This function comes in handy when we need to get rid of factor levels that are no longer in use as a result of subsetting a vector or a data frame.
The syntax for this function is as follows:
droplevels(x)
where x is the object, typically a factor or a data frame, from which unused factor levels should be removed.
This article shows you how to utilize this function in practice with a couple of examples.
Example 1: Drop Unused Factor Levels in a Vector
Assume we have a factor vector with seven levels. Let’s say we create a new vector that uses only five of the original seven levels.
# define a factor with seven levels
data <- factor(c(1, 2, 3, 4, 5, 6, 7))
# create the new data by dropping the 4th and 5th elements
new <- data[-c(4, 5)]
Now we can view the new data
new
[1] 1 2 3 6 7
Levels: 1 2 3 4 5 6 7
Although the new data contains only five distinct values, all seven of the original factor levels are still present.
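One way to confirm this, shown here as a small sketch that rebuilds the same vector so it runs on its own, is to compare the number of levels stored with the factor against the number of distinct values it actually contains:

```r
# rebuild the example vector so this snippet is self-contained
data <- factor(c(1, 2, 3, 4, 5, 6, 7))
new <- data[-c(4, 5)]

# number of levels stored with the factor (still the original seven)
stored_levels <- nlevels(new)
stored_levels

# number of distinct values actually present in the vector
distinct_values <- length(unique(new))
distinct_values
```

The two numbers differ because the levels are kept as an attribute of the factor, and subsetting with [ does not change that attribute by default.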
We may use the droplevels() function to remove these unneeded factor levels.
# remove factor levels that are no longer in use
new <- droplevels(new)
Let’s view the data
new
[1] 1 2 3 6 7
Levels: 1 2 3 6 7
There are now only five factor levels in the new data.
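For a single factor, base R offers a couple of other routes to the same result. The sketch below is not part of the original example: it rebuilds the vector and compares droplevels() with calling factor() again and with the drop = TRUE argument of [ for factors. The names a, b, and d are made up for illustration.

```r
# rebuild the example vector so this snippet is self-contained
data <- factor(c(1, 2, 3, 4, 5, 6, 7))
new <- data[-c(4, 5)]

# option 1: droplevels(), as used in the example
a <- droplevels(new)

# option 2: re-create the factor from the values that remain
b <- factor(new)

# option 3: drop unused levels at the moment of subsetting
d <- data[-c(4, 5), drop = TRUE]

# check whether the three results agree
identical(a, b)
identical(a, d)
```

droplevels() tends to be the more readable choice, and unlike the other two it also accepts a whole data frame.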
Example 2: Drop Unused Factor Levels in a Data Frame
Assume we’re working with a data frame in which one of the variables is a five-level factor.
Let’s say we create a new data frame that excludes two of these factor levels.
Let’s create a data frame
df <- data.frame(region = factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                 sales = c(103, 106, 202, 257, 324))
df
  region sales
1     P1   103
2     P2   106
3     P3   202
4     P4   257
5     P5   324
Now we can define a new data frame
newdf <- subset(df, sales < 225)
# view the new data frame
newdf
  region sales
1     P1   103
2     P2   106
3     P3   202
Let’s check the levels of the region variable.
levels(newdf$region)
[1] "P1" "P2" "P3" "P4" "P5"
All five original factor levels are still present in the new data frame, even though the region column now contains only three distinct values.
If we tabulated or plotted this data as it is, the unused levels P4 and P5 could still show up as empty categories.
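As a rough illustration of the kind of issue meant here, the sketch below rebuilds the example data frame and tabulates the region column. The variable name counts is made up for this snippet, and the barplot() call is left commented out so the code runs without a graphics device.

```r
# rebuild the example data so this snippet is self-contained
df <- data.frame(region = factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                 sales = c(103, 106, 202, 257, 324))
newdf <- subset(df, sales < 225)

# counts per level: the unused levels P4 and P5 are listed with a count of 0
counts <- table(newdf$region)
counts

# a base R bar plot of these counts would reserve empty slots for P4 and P5
# barplot(counts)
```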
The droplevels() function can be used to eliminate the unnecessary factor levels from the region variable:
# remove any unused factor levels
newdf$region <- droplevels(newdf$region)
Let’s check the levels of the region variable again.
levels(newdf$region)
[1] "P1" "P2" "P3"
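droplevels() also has a method for data frames, so the same cleanup can be applied to every factor column in one call. The following sketch rebuilds the example data and assumes we want to compare that with leaving a column untouched via the except argument, which takes column indices. The names cleaned and kept are made up for illustration.

```r
# rebuild the example data so this snippet is self-contained
df <- data.frame(region = factor(c('P1', 'P2', 'P3', 'P4', 'P5')),
                 sales = c(103, 106, 202, 257, 324))
newdf <- subset(df, sales < 225)

# drop unused levels from every factor column at once
cleaned <- droplevels(newdf)
levels(cleaned$region)

# except lists the columns (by index) whose levels should be left alone;
# region is column 1, so its levels are not dropped here
kept <- droplevels(newdf, except = 1)
levels(kept$region)
```

Non-factor columns such as sales pass through unchanged, so calling droplevels() on the whole data frame is usually the shortest option after subsetting.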
Hurray! Done for the day.