Dummy Variable Example in R

A dataset occasionally needs to be grouped according to particular properties.

Dummy variables are important for statistical modeling because they make it easy to group related observations: each dummy variable indicates whether a given observation satisfies the property in question.

A dummy variable is a column added to a dataset to record, as 1 or 0, whether each observation has a particular attribute. It is used when categorizing data by that attribute.

You need one fewer dummy variable than the number of categories. Suppose, for example, that a dataset lists five different types of automobile and you want to divide a population into groups based on the vehicle each person drives.

Four dummy variables with values of 1 or 0 would be created. Each dummy variable represents one vehicle type, which is denoted by a value of 1; the fifth vehicle type is represented by all four dummy variables being equal to 0.
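The five-category case can be sketched in base R with model.matrix(), which expands a factor into an intercept plus one dummy column per non-reference level. The vehicle names below are made up for illustration and are not taken from any dataset in this post.

```r
# Hypothetical data: the vehicle type driven by each of six people.
vehicle <- factor(c("sedan", "suv", "truck", "van", "coupe", "suv"))

# model.matrix() returns an intercept plus k - 1 dummy columns.
# The first factor level ("coupe") acts as the reference category.
dummies <- model.matrix(~ vehicle)[, -1]  # drop the intercept column

ncol(dummies)     # number of dummy columns for the five categories
rowSums(dummies)  # 0 for the reference category, 1 for every other row
```

The fifth person drives the reference type, so every dummy column in that row is 0.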

How to create a dummy variable in R

A single operator, %in%, is all that is required to construct a dummy variable in R. It returns TRUE wherever the variable matches one of the values being sought and FALSE otherwise.
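As a small illustration of how %in% behaves, the sketch below compares it with == on a made-up vector that contains a missing value. The vector sex is an assumption for this example only.

```r
# Hypothetical vector with one missing value.
sex <- c("M", "F", NA, "M")

in_m   <- sex %in% "M"          # TRUE FALSE FALSE TRUE (NA becomes FALSE)
eq_m   <- sex == "M"            # TRUE FALSE NA TRUE    (NA stays NA)
in_any <- sex %in% c("M", "F")  # TRUE TRUE FALSE TRUE  (matches any listed value)
```

Because %in% never returns NA, the resulting dummy variable is always a clean TRUE/FALSE column, which may or may not be what you want when the source column has missing values.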

df<-data.frame(ID=c("B","S","T","A"),
               sex=c("M","F","M","F"),
               Height=c(5.4,5.2,6,5.6),
               Weight=c(170,162,180,NA))

df

  ID sex Height Weight
1  B   M    5.4    170
2  S   F    5.2    162
3  T   M    6.0    180
4  A   F    5.6     NA

A data frame comprising four people’s height, weight, and sex is shown here.

df$male = df$sex %in% 'M'
df

  ID sex Height Weight  male
1  B   M    5.4    170  TRUE
2  S   F    5.2    162 FALSE
3  T   M    6.0    180  TRUE
4  A   F    5.6     NA FALSE

The data frame now has a new column, male, the dummy variable we just added. Printing the data frame shows the original data together with the new variable.
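The column created this way is logical (TRUE/FALSE). If a numeric 1/0 coding is preferred, one option is to wrap the comparison in as.integer(). This is a minimal sketch using a cut-down copy of the data frame, not a step the original example performs.

```r
# Cut-down copy of the example data frame (ID and sex only).
df <- data.frame(ID  = c("B", "S", "T", "A"),
                 sex = c("M", "F", "M", "F"))

# as.integer() converts TRUE to 1 and FALSE to 0.
df$male <- as.integer(df$sex %in% "M")
df$male  # 1 0 1 0
```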

Useful application

Being able to group comparable objects together is frequently crucial in statistical modeling.

R – Base Data: How to Create a Dummy Variable
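The team data frame is not defined anywhere in this post. A version consistent with the printed output can be reconstructed as follows; the values are copied from that output, while the construction itself is an assumption.

```r
# Reconstructed from the printed table; not the author's original definition.
team <- data.frame(
  employee = 1:8,
  pastjob  = c("IT", "Research", "sales", "sales", "ops", "ops", "R&D", "IT"),
  results  = c(126, 1280, 212, 301, 215, 168, 212, 314)
)
```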

team$didsales = team$pastjob %in% c('Research','R&D')
team

  employee  pastjob results didsales
1        1       IT     126    FALSE
2        2 Research    1280     TRUE
3        3    sales     212    FALSE
4        4    sales     301    FALSE
5        5      ops     215    FALSE
6        6      ops     168    FALSE
7        7      R&D     212     TRUE
8        8       IT     314    FALSE

How to Create a Dummy Variable in R – Roll Up

aggregate(team, by=list(team$didsales),FUN=mean)

  Group.1 employee pastjob  results didsales
1   FALSE      4.5      NA 222.6667        0
2    TRUE      4.5      NA 746.0000        1

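Here the aggregate() function reports the average of each column for the two groups defined by the dummy variable. Despite its name, didsales flags the employees whose past job was Research or R&D; their average results (746) are well above those of the rest of the team (about 223). Summaries like this can tell a manager a lot about how a sales team is performing as a whole.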
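Averaging every column produces an NA for pastjob, because a mean cannot be taken of text. A possible alternative, sketched here with the formula interface of aggregate(), averages only the results column. The team data frame is rebuilt from the printed table so that the block runs on its own.

```r
# Rebuilt from the printed table; an assumption, not the author's definition.
team <- data.frame(
  employee = 1:8,
  pastjob  = c("IT", "Research", "sales", "sales", "ops", "ops", "R&D", "IT"),
  results  = c(126, 1280, 212, 301, 215, 168, 212, 314)
)
team$didsales <- team$pastjob %in% c("Research", "R&D")

# Average results only, grouped by the dummy variable.
avg <- aggregate(results ~ didsales, data = team, FUN = mean)
avg
```

This returns one row per group with just the grouping column and the mean of results.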

Dummy variables can be used to divide datasets into groups. R makes this simple because it requires only one small operation, which is just one of the many strengths of R as a data analysis tool.