Dummy Variable Example in R
A dataset sometimes needs to be grouped according to particular properties.
Dummy variables are important for statistical modeling because they make it easy to group related observations: each dummy variable indicates whether a given property requirement has been satisfied.
Dummy variables are added to a dataset to record, as 1 or 0, whether each observation has a given attribute. They are used when categorizing data by particular attributes.
You need one fewer dummy variable than the number of categories you intend to represent. Suppose, for example, that you use a dataset of five different types of automobiles to divide a population into groups based on the vehicle each person drives.
Four dummy variables, each taking the value 1 or 0, would be created. Each dummy variable represents one vehicle type and equals 1 for people who drive that type. The fifth vehicle type is represented by all four dummy variables being equal to 0.
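The five-category case can be sketched in a few lines of base R. This is only an illustration: the vehicle labels are made up, and model.matrix() with R's default treatment contrasts is one way among several to produce the four 0/1 columns.

```r
# Hypothetical vehicle types; the labels are illustrative, not from the article.
vehicle <- factor(c("sedan", "suv", "truck", "van", "coupe", "suv"))

# model.matrix() expands the factor using the default treatment contrasts:
# an intercept column plus one 0/1 column per non-reference level.
dummies <- model.matrix(~ vehicle)[, -1]  # drop the intercept column

ncol(dummies)        # 4 dummy columns for 5 vehicle types
levels(vehicle)[1]   # "coupe", the reference level, encoded as all zeros
rowSums(dummies)     # 0 for the reference level, 1 for every other row
```

The reference level is simply the first factor level (alphabetical by default), so "coupe" plays the role of the fifth vehicle type here.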
How to create a dummy variable in R
A single operator, %in%, is all that is required to construct a dummy variable in R. It returns TRUE for each element of the variable that matches the value being sought and FALSE otherwise.
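One reason to prefer %in% over == for this job is how the two treat missing values. A small sketch, using a made-up vector that is not part of the article's data:

```r
# Illustrative vector with a missing value (not from the article's data).
sex <- c("M", "F", NA, "M")

is_male_in <- sex %in% "M"   # TRUE FALSE FALSE TRUE
is_male_eq <- sex == "M"     # TRUE FALSE    NA TRUE

# %in% also accepts several values at once.
is_known <- sex %in% c("M", "F")   # TRUE TRUE FALSE TRUE
```

With %in%, a missing value becomes FALSE rather than NA, so the dummy variable never contains NA. Whether that is the desired behavior depends on the analysis.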
df<-data.frame(ID=c("B","S","T","A"), sex=c("M","F","M","F"), Height=c(5.4,5.2,6,5.6), Weight=c(170,162,180,NA))
df
  ID sex Height Weight
1  B   M    5.4    170
2  S   F    5.2    162
3  T   M    6.0    180
4  A   F    5.6     NA
A data frame comprising four people’s height, weight, and sex is shown here.
df$male = df$sex %in% 'M'
df
  ID sex Height Weight  male
1  B   M    5.4    170  TRUE
2  S   F    5.2    162 FALSE
3  T   M    6.0    180  TRUE
4  A   F    5.6     NA FALSE
The data frame now has a new column, male, which holds the dummy variable we just created. Printing the data frame shows the original data together with the new variable.
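The new column is logical (TRUE/FALSE) rather than the 1/0 coding described at the start. If numeric values are wanted, one option is to wrap the comparison in as.integer(). A minimal sketch, rebuilding the same data frame so that it runs on its own:

```r
# Same data frame as in the example above.
df <- data.frame(ID = c("B", "S", "T", "A"),
                 sex = c("M", "F", "M", "F"),
                 Height = c(5.4, 5.2, 6, 5.6),
                 Weight = c(170, 162, 180, NA))

# as.integer() maps TRUE to 1 and FALSE to 0.
df$male <- as.integer(df$sex %in% "M")
df$male   # 1 0 1 0
```

Many R functions accept the logical version directly, so the conversion is mostly a matter of presentation.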
Useful application
Being able to group comparable objects together is frequently crucial in statistical modeling.
R – Base Data: How to Create a Dummy Variable
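The code that follows operates on a data frame named team, whose definition does not appear in the text. A minimal reconstruction, assuming the values are exactly those shown in the printed output of this example:

```r
# Reconstructed from the printed output; the original definition is not shown,
# so treat this as an assumption about how the data frame was built.
team <- data.frame(
  employee = 1:8,
  pastjob  = c("IT", "Research", "sales", "sales", "ops", "ops", "R&D", "IT"),
  results  = c(126, 1280, 212, 301, 215, 168, 212, 314)
)
```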
team$didsales = team$pastjob %in% c('Research','R&D')
team
  employee  pastjob results didsales
1        1       IT     126    FALSE
2        2 Research    1280     TRUE
3        3    sales     212    FALSE
4        4    sales     301    FALSE
5        5      ops     215    FALSE
6        6      ops     168    FALSE
7        7      R&D     212     TRUE
8        8       IT     314    FALSE
How to Create a Dummy Variable in R – Roll Up
aggregate(team, by=list(team$didsales),FUN=mean)
  Group.1 employee pastjob  results didsales
1   FALSE      4.5      NA 222.6667        0
2    TRUE      4.5      NA 746.0000        1
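In this sales team example, the aggregate() function averages every column within each group defined by the dummy variable, so the results column shows the average performance of the team members in each group. The pastjob column shows NA because a mean cannot be computed for text values. A manager can learn a lot about the sales team's activities as a whole from this kind of summary.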
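To average only the results column, and so avoid the NA column, the formula interface of aggregate() is one alternative. A sketch, assuming the team data frame matches the printed output above:

```r
# Data frame reconstructed from the printed output (an assumption).
team <- data.frame(
  employee = 1:8,
  pastjob  = c("IT", "Research", "sales", "sales", "ops", "ops", "R&D", "IT"),
  results  = c(126, 1280, 212, 301, 215, 168, 212, 314)
)
team$didsales <- team$pastjob %in% c("Research", "R&D")

# Mean of results within each level of the dummy variable.
avg <- aggregate(results ~ didsales, data = team, FUN = mean)
avg
```

This returns one row per group with just the didsales and results columns, and the group means agree with the values in the full aggregate() output.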
Dummy variables can be used to divide datasets into groups. R makes this simple because it requires only one small operation. This is just one of the many strengths of R as a data analysis tool.