Transposing Data in R

Transposing Data in R, Data manipulation and transformation are essential tasks in data analysis.

R, being a popular programming language for data science, offers various packages to perform these operations efficiently.

In this article, we will explore two R packages, tidyr and reshape2, which are part of the tidyverse and help in reshaping data from long to wide format or vice versa.

We will also discuss the dplyr package and its limitations in terms of transposing data.

Transpose Data with R

Before we dive into the examples, let’s create a sample dataset for demonstration.

data <- read.table(text="ID X Y Z
1 1 ProductA 13377
1 2 ProductA 14069
1 1 ProductB 11426
1 2 ProductB 11750
2 1 ProductA 11122
2 2 ProductA 11202
2 1 ProductB 14712
2 2 ProductB 10169
",header=TRUE)

Convert Long to Wide Format

Suppose you have data containing three variables (X, Y, and Z) where X contains IDs, Y contains dates, and Z contains income.

You need to convert this long-format data to wide format so that the dates move to column names, and the income information comes under these dates.

Reshape2 Package

The reshape2 package offers two functions, dcast and acast, to transform long-format data to wide format.

The primary difference between these two functions is that dcast returns a data frame as output, while acast returns a vector, matrix, or array as output.

# Install reshape2 package if not installed already
if (!require(reshape2)){
  install.packages('reshape2')
}
library(reshape2)

# Transform Long to Wide Format
mydt = dcast(data, X~Y, value.var = "Z")

tidyr Package

The tidyr package provides the spread() function to reshape data from long to wide format.

How to make a rounded corner bar plot in R? » Data Science Tutorials

library(tidyr)
mydt = spread(data, Y, Z)
mydt
ID X ProductA ProductB
1  1 1    13377    11426
2  1 2    14069    11750
3  2 1    11122    14712
4  2 2    11202    10169

Example 2: More than 1 ID Variable

Let’s consider an example where we have more than one ID variable. The dataset contains information about income generated from two products (Product A and B) reported semi-annually.

data <- read.table(text="Year SemiYear Product Income
1 1 ProductA 13377
1 2 ProductA 14069
1 1 ProductB 11426
1 2 ProductB 11750
2 1 ProductA 11122
2 2 ProductA 11202
2 1 ProductB 14712
2 2 ProductB 10169
",header=TRUE)

Transform Data

library(reshape2)
xx=dcast(data, Year + SemiYear ~ Product, value.var = "Income")
xx
Year SemiYear ProductA ProductB
1    1        1    13377    11426
2    1        2    14069    11750
3    2        1    11122    14712
4    2        2    11202    10169

Using tidyr Package

library(tidyr)
mydt = spread(data, Product, Income)
mydt
Year SemiYear ProductA ProductB
1    1        1    13377    11426
2    1        2    14069    11750
3    2        1    11122    14712
4    2        2    11202    10169

If you want the final output to be reported at the year level:

# Transform Data
dcast(data, Year ~ Product, fun.aggregate = sum, value.var = "Income")
Year ProductA ProductB
1    1    27446    23176
2    2    22324    24881

This will give you the total income for each product at the year level.

Convert Wide Format Data to Long Format

Suppose you have data containing information about species and their sepal length. The data of sepal length for species are in columns.

We will now convert this wide-format data to long format.

# Create Sample Data
mydata = read.table(text= "ID setosa versicolor virginica
1 5.1 NA NA
2 4.9 NA NA
3 NA 7 NA
4 NA 6.4 NA
5 NA NA 6.3
6 NA NA 5.8
", header=TRUE)

# Convert Wide Format Data to Long Format
library(reshape2)
x = colnames(mydata[,-1])
t = melt(mydata, id.vars = "ID", measure.vars = x , variable.name="Species", value.name="Sepal.Length", na.rm = TRUE)
t
ID    Species Sepal.Length
1   1     setosa          5.1
2   2     setosa          4.9
9   3 versicolor          7.0
10  4 versicolor          6.4
17  5  virginica          6.3
18  6  virginica          5.8

tidyr: gather() Function

The gather() function in the tidyr package transforms data from wide to long format. The parameters for this function are:

  • data: Data frame
  • key: Name of the new key column
  • value: Name of the new value column
  • na.rm: na.rm=TRUE means removing rows from the output where the value is missing
# Transform Data
gather(mydata, Species, Sepal.Length, x, na.rm = TRUE)
ID    Species Sepal.Length
1   1     setosa          5.1
2   2     setosa          4.9
9   3 versicolor          7.0
10  4 versicolor          6.4
17  5  virginica          6.3
18  6  virginica          5.8

Conclusion

In this article, we explored the process of reshaping data from long to wide format and vice versa using the tidyr and reshape2 packages in R.

We also discussed the limitations of the dplyr package in terms of transposing data.

By understanding and applying these techniques, you can efficiently manipulate and explore your data using R.

Creating Relative Frequency in R » finnstats

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *

7 + three =