How to Use the spread() Function in R

The tidyr package’s spread() function “spreads” a key-value pair across multiple columns.

The basic syntax used by this function is as follows.


spread(data, key, value)

where:

data: the name of the data frame

key: the column whose values will become the names of the new columns

value: the column whose values will fill the new columns created from key
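To make the roles of key and value concrete, here is a minimal sketch on a small made-up data frame. The names toy, id, name, val, and wide are hypothetical and exist only for this illustration. Each distinct value in the key column becomes a new column name, and the matching entries from the value column fill those columns in.

```r
library(tidyr)

# hypothetical toy data: one row per (id, name) pair
toy <- data.frame(id   = c(1, 1, 2, 2),
                  name = c("x", "y", "x", "y"),
                  val  = c(10, 20, 30, 40))

# key = name: its values "x" and "y" become new column names
# value = val: its entries fill those new columns
wide <- spread(toy, key = name, value = val)

# expected: columns id, x, y with one row per id
wide
```
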

The usage of this function is demonstrated in the examples that follow.

The goal of the tidyr package is to produce “tidy” data, which has the following properties:

Each column contains a variable.

Each row represents an observation.

Each cell contains only one value.

To create tidy data, the tidyr package provides four essential functions:

1. spread()

2. gather()

3. separate()

4. unite()

If you master these four functions, you will be able to turn most data frames into “tidy” data.
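Because gather() performs the reverse reshaping of spread(), it can help to see the two side by side. The sketch below is illustrative only: the names long, wide, back, and orig are hypothetical, and the data simply mirrors the shape used in Example 1. It spreads a long-format data frame into wide form, gathers it back, and checks that the original amounts are recovered.

```r
library(tidyr)

# long ("key-value") data with the same shape as Example 1
long <- data.frame(player = rep(c("P1", "P2"), each = 4),
                   year   = rep(c(1, 1, 2, 2), times = 2),
                   stat   = rep(c("points", "assists"), times = 4),
                   amount = c(104, 56, 108, 45, 333, 405, 508, 314))

# spread(): long -> wide, one column per unique value of stat
wide <- spread(long, key = stat, value = amount)

# gather(): wide -> long, collapsing assists/points back into stat/amount
back <- gather(wide, key = "stat", value = "amount", assists, points)

# sort both the same way so their rows line up before comparing
back <- back[order(back$player, back$year, back$stat), ]
orig <- long[order(long$player, long$year, long$stat), ]

# TRUE if the round trip recovered every amount
all.equal(back$amount, orig$amount)
```
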

Example 1: Spread Values Over Two Columns

Let’s say we have the R data frame shown below:

# create a data frame
df <- data.frame(player=rep(c('P1', 'P2'), each=4),
                 year=rep(c(1, 1, 2, 2), times=2),
                 stat=rep(c('points', 'assists'), times=4),
                 amount=c(104, 56, 108, 45, 333, 405, 508, 314))

We can then view the data frame:


df
  player year    stat amount
1     P1    1  points    104
2     P1    1 assists     56
3     P1    2  points    108
4     P1    2 assists     45
5     P2    1  points    333
6     P2    1 assists    405
7     P2    2  points    508
8     P2    2 assists    314

We can use the spread() function to move the values in the stat column into separate columns:

library(tidyr)

# spread the stat column across multiple columns
spread(df, key=stat, value=amount)
  player year assists points
1     P1    1      56    104
2     P1    2      45    108
3     P2    1     405    333
4     P2    2     314    508
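spread() also has a sep argument that the syntax summary above does not cover. When sep is set, the new column names are built as the key column's name, then sep, then the key value, which can make it clearer where each new column came from. A short sketch reusing the Example 1 data (the name wide is hypothetical):

```r
library(tidyr)

df <- data.frame(player = rep(c("P1", "P2"), each = 4),
                 year   = rep(c(1, 1, 2, 2), times = 2),
                 stat   = rep(c("points", "assists"), times = 4),
                 amount = c(104, 56, 108, 45, 333, 405, 508, 314))

# sep = "_" prefixes each new column with the key column's name
wide <- spread(df, key = stat, value = amount, sep = "_")

# expected names: player, year, stat_assists, stat_points
names(wide)
```
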

Example 2: Spread Values Across More Than Two Columns

Let’s say we have the R data frame shown below:

# create a data frame
df <- data.frame(player=rep(c('P1', 'P2'), each=4),
                 year=rep(c(1, 1, 2, 2), times=2),
                 stat=rep(c('points', 'assists', 'steals', 'blocks'), times=2),
                 amount=c(104, 56, 108, 45, 333, 405, 508, 314))

We can then view the data frame:


df
  player year    stat amount
1     P1    1  points    104
2     P1    1 assists     56
3     P1    2  steals    108
4     P1    2  blocks     45
5     P2    1  points    333
6     P2    1 assists    405
7     P2    2  steals    508
8     P2    2  blocks    314

We can use the spread() function to turn the four unique values in the stat column into four new columns:

library(tidyr)

# spread the four unique values of stat into four new columns
spread(df, key=stat, value=amount)
  player year assists blocks points steals
1     P1    1      56     NA    104     NA
2     P1    2      NA     45     NA    108
3     P2    1     405     NA    333     NA
4     P2    2      NA    314     NA    508
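The NA cells appear because some player/year/stat combinations have no row in the data; for example, P1 has no points entry for year 2. spread() also accepts a fill argument that replaces these missing cells with a chosen value. The sketch below uses fill = 0, which assumes a missing stat should count as zero. That is an assumption about the data rather than something this example states, so keeping NA may be the safer choice in practice. The name wide is hypothetical.

```r
library(tidyr)

df <- data.frame(player = rep(c("P1", "P2"), each = 4),
                 year   = rep(c(1, 1, 2, 2), times = 2),
                 stat   = rep(c("points", "assists", "steals", "blocks"), times = 2),
                 amount = c(104, 56, 108, 45, 333, 405, 508, 314))

# fill = 0 replaces the cells that would otherwise be NA
# (player/year/stat combinations with no row in df)
wide <- spread(df, key = stat, value = amount, fill = 0)
wide
```
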
