How to Use Spread Function in R
A key-value pair can be “spread” across multiple columns using the tidyr package’s spread() function.
The basic syntax used by this function is as follows.
spread(data, key, value)
where:
data: data frame name
key: column whose values will serve as the names of variables
value: column whose values will populate the new variables created from the key
The usage of this function is demonstrated in the examples that follow.
The tidyr package’s objective is to produce “tidy” data, which possesses the following properties:
Each column contains a variable.
Each row represents an observation.
Each cell only contains one value.
To create tidy data, the tidyr package provides four essential functions:
1. The spread() function
2. The gather() function
3. The separate() function
4. The unite() function
You will be able to produce “tidy” data from any data frame if you can master these four functions.
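As a rough illustration of how the four functions listed above relate to one another, the sketch below runs each of them once on a tiny data frame. The data frame `long` and its column names (`id`, `key`, `value`) are invented for this illustration and do not come from the examples in this article.

```r
library(tidyr)

# Hypothetical long-format data, invented for illustration only
long <- data.frame(
  id    = c("a_1", "a_1", "b_2", "b_2"),
  key   = c("x", "y", "x", "y"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# spread(): one new column per distinct value of `key`
wide <- spread(long, key = key, value = value)

# gather(): collapse the x and y columns back into key-value pairs
long_again <- gather(wide, key = "key", value = "value", x, y)

# separate(): split the id column into two columns at the underscore
parts <- separate(wide, col = id, into = c("group", "num"), sep = "_")

# unite(): paste the two columns back together into a single id column
joined <- unite(parts, col = "id", group, num, sep = "_")
```

In this sketch gather() undoes spread() and unite() undoes separate(), which is one way to remember the four functions as two pairs.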
Example 1: Spread Values Over Two Columns
Let’s say we have the R data frame shown below:
Let’s create a data frame
df <- data.frame(player=rep(c('P1', 'P2'), each=4),
                 year=rep(c(1, 1, 2, 2), times=2),
                 stat=rep(c('points', 'assists'), times=4),
                 amount=c(104, 56, 108, 45, 333, 405, 508, 314))
Now we can view the data frame
df
  player year    stat amount
1     P1    1  points    104
2     P1    1 assists     56
3     P1    2  points    108
4     P1    2 assists     45
5     P2    1  points    333
6     P2    1 assists    405
7     P2    2  points    508
8     P2    2 assists    314
Using the spread() function, each distinct value in the stat column can be turned into its own column.
library(tidyr)
Spread the stat column into several columns
spread(df, key=stat, value=amount)
  player year assists points
1     P1    1      56    104
2     P1    2      45    108
3     P2    1     405    333
4     P2    2     314    508
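Newer releases of tidyr (1.0.0 and later) mark spread() as superseded in favour of pivot_wider(). The sketch below rebuilds the data shown in the output above and reshapes it both ways for comparison. It assumes tidyr 1.0.0 or later is installed, and the object names `wide_spread` and `wide_pivot` are chosen here for illustration.

```r
library(tidyr)

# Same data as in the printed output of Example 1
df <- data.frame(player = rep(c('P1', 'P2'), each = 4),
                 year   = rep(c(1, 1, 2, 2), times = 2),
                 stat   = rep(c('points', 'assists'), times = 4),
                 amount = c(104, 56, 108, 45, 333, 405, 508, 314))

# The approach used in this article
wide_spread <- spread(df, key = stat, value = amount)

# The newer equivalent: names_from plays the role of key,
# values_from plays the role of value
wide_pivot <- pivot_wider(df, names_from = stat, values_from = amount)
```

Both results hold the same values. The column order can differ, since pivot_wider() keeps the new columns in order of first appearance, and pivot_wider() returns a tibble rather than a plain data frame.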
Example 2: Spread Values Across More Than Two Columns
Let’s say we have the R data frame shown below:
Let’s create a data frame
df <- data.frame(player=rep(c('P1', 'P2'), each=4),
                 year=rep(c(1, 1, 2, 2), times=2),
                 stat=rep(c('points', 'assists', 'steals', 'blocks'), times=2),
                 amount=c(104, 56, 108, 45, 333, 405, 508, 314))
Now we can view the data frame
df
  player year    stat amount
1     P1    1  points    104
2     P1    1 assists     56
3     P1    2  steals    108
4     P1    2  blocks     45
5     P2    1  points    333
6     P2    1 assists    405
7     P2    2  steals    508
8     P2    2  blocks    314
We can use the spread() function to turn the four unique values in the stat column into four new columns. Any player and year combination that has no value for a given stat is filled with NA:
library(tidyr)
spread(df, key=stat, value=amount)
  player year assists blocks points steals
1     P1    1      56     NA    104     NA
2     P1    2      NA     45     NA    108
3     P2    1     405     NA    333     NA
4     P2    2      NA    314     NA    508
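If the NA entries are unwanted, spread() accepts a fill argument that supplies a replacement for missing combinations. The sketch below uses fill = 0, which assumes a missing stat means "none recorded". That is an assumption made for illustration, and it may not hold for real data where a gap means "unknown".

```r
library(tidyr)

# Same data as in Example 2
df <- data.frame(player = rep(c('P1', 'P2'), each = 4),
                 year   = rep(c(1, 1, 2, 2), times = 2),
                 stat   = rep(c('points', 'assists', 'steals', 'blocks'), times = 2),
                 amount = c(104, 56, 108, 45, 333, 405, 508, 314))

# fill = 0 replaces the NA produced for missing player/year/stat combinations
filled <- spread(df, key = stat, value = amount, fill = 0)
filled
```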