How to do Conditional Mutate in R?
It’s common to want to add a new variable to an existing data frame based on a condition. Fortunately, the mutate() and case_when() functions from the dplyr package make this task simple.
This lesson uses the following data frame to show several examples of how to apply these functions.
Let’s create a data frame
df <- data.frame(player = c('P1', 'P2', 'P3', 'P4', 'P5'),
                 position = c('A', 'B', 'A', 'B', 'B'),
                 points = c(102, 215, 319, 125, 112),
                 rebounds = c(22, 12, 19, 23, 36))
Let’s view the data frame
df
  player position points rebounds
1     P1        A    102       22
2     P2        B    215       12
3     P3        A    319       19
4     P4        B    125       23
5     P5        B    112       36
Example 1: Based on one existing variable, create a new variable
The following code creates a new variable called “score” based on the value in the “points” column.
library(dplyr)
Let’s define the new variable ‘score’ using mutate() and case_when()
df %>%
  mutate(score = case_when(points < 105 ~ 'LOW',
                           points < 212 ~ 'MED',
                           points < 450 ~ 'HIGH'))
  player position points rebounds score
1     P1        A    102       22   LOW
2     P2        B    215       12  HIGH
3     P3        A    319       19  HIGH
4     P4        B    125       23   MED
5     P5        B    112       36   MED
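Every row in the data frame above matches one of the three conditions. As a minimal sketch of what happens otherwise, the block below uses a hypothetical player P6 with 500 points (not part of the original data) and a hypothetical label 'ELITE'. A row that matches no condition gets NA, and a final TRUE condition can serve as a catch-all.

```r
library(dplyr)

# Hypothetical data: two rows from df plus an invented player above every cutoff
df_demo <- data.frame(player = c('P1', 'P4', 'P6'),
                      points = c(102, 125, 500))

# Without a fallback, the 500-point row matches nothing and becomes NA
no_fallback <- df_demo %>%
  mutate(score = case_when(points < 105 ~ 'LOW',
                           points < 212 ~ 'MED',
                           points < 450 ~ 'HIGH'))

# A final TRUE condition catches every row not matched earlier
# ('ELITE' is an illustrative label, not from the original lesson)
with_fallback <- df_demo %>%
  mutate(score = case_when(points < 105 ~ 'LOW',
                           points < 212 ~ 'MED',
                           points < 450 ~ 'HIGH',
                           TRUE ~ 'ELITE'))

no_fallback
with_fallback
```

Whether a catch-all is appropriate depends on the data. Leaving unmatched rows as NA can make unexpected values easier to spot.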
Example 2: Based on a number of existing variables, create a new variable
The following code demonstrates how to create a new variable called “Type” based on the values in the player and position columns.
library(dplyr)
Now we can define the new variable ‘Type’ using mutate() and case_when()
df %>%
  mutate(Type = case_when(player == 'P1' | player == 'P2' ~ 'starter',
                          player == 'P3' | player == 'P4' ~ 'backup',
                          position == 'B' ~ 'reserve'))
  player position points rebounds    Type
1     P1        A    102       22 starter
2     P2        B    215       12 starter
3     P3        A    319       19  backup
4     P4        B    125       23  backup
5     P5        B    112       36 reserve
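When one column is compared against several values, chaining == with | gets long. One possible alternative, sketched here with base R’s %in% operator, expresses the same conditions more compactly. The object name typed is only illustrative.

```r
library(dplyr)

# Only the columns needed for this example, copied from the lesson's df
df <- data.frame(player = c('P1', 'P2', 'P3', 'P4', 'P5'),
                 position = c('A', 'B', 'A', 'B', 'B'))

# %in% tests membership in a vector, replacing the chained == and | checks
typed <- df %>%
  mutate(Type = case_when(player %in% c('P1', 'P2') ~ 'starter',
                          player %in% c('P3', 'P4') ~ 'backup',
                          position == 'B' ~ 'reserve'))

typed
```

For this data the result should match the Type column shown above.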
To generate a new variable called “value” based on the values in the points and rebounds columns, use the following code.
library(dplyr)
Let’s define the new variable ‘value’ using mutate() and case_when()
df %>%
  mutate(value = case_when(points <= 102 & rebounds <= 45 ~ 2,
                           points <= 215 & rebounds > 55 ~ 4,
                           points < 225 & rebounds < 28 ~ 6,
                           points < 325 & rebounds > 29 ~ 7,
                           points >= 25 ~ 9))
  player position points rebounds value
1     P1        A    102       22     2
2     P2        B    215       12     6
3     P3        A    319       19     9
4     P4        B    125       23     6
5     P5        B    112       36     7
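The result above depends on the order of the conditions, because case_when() assigns each row the value of the first condition that is TRUE. As a small sketch of that behavior, the block below moves the broad points >= 25 condition to the top. The name reordered is only illustrative.

```r
library(dplyr)

# Only the columns needed for this example, copied from the lesson's df
df <- data.frame(points = c(102, 215, 319, 125, 112),
                 rebounds = c(22, 12, 19, 23, 36))

# Same conditions as in the lesson, but with the broadest one listed first
reordered <- df %>%
  mutate(value = case_when(points >= 25 ~ 9,
                           points <= 102 & rebounds <= 45 ~ 2,
                           points <= 215 & rebounds > 55 ~ 4,
                           points < 225 & rebounds < 28 ~ 6,
                           points < 325 & rebounds > 29 ~ 7))

reordered
```

Every player has at least 25 points, so the first condition captures all five rows and the later conditions are never reached. Listing the most specific conditions first and the broadest last avoids this.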
Hopefully the concept is now clear.