How to apply a transformation to multiple columns in R?

by finnstats

To apply a transformation to many columns at once, use the across() function from the dplyr package.

This function has many applications. The following approaches cover three typical ones:

First Approach: Apply Function to Several Columns

Multiply values in col1 and col2 by 2

df %>%  mutate(across(c(col1, col2), function(x) x*2))

Second Approach: Calculate One Summary Statistic for Multiple Columns

Calculate the mean of col1 and col2

df %>%  summarise(across(c(col1, col2), function(x) mean(x, na.rm=TRUE)))

Third Approach: Calculate Multiple Summary Statistics for Multiple Columns

Calculate the mean and standard deviation of col1 and col2

df %>%  summarise(across(c(col1, col2), list(mean=function(x) mean(x, na.rm=TRUE), sd=function(x) sd(x, na.rm=TRUE))))

The examples below demonstrate each approach using the following data frame.

Let’s create a data frame

df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
points=c(26, 22, 28, 15, 32, 28),
rebounds=c(16, 15, 16, 12, 13, 10))

Now we can view the data frame

df

  team points rebounds
1   P1     26       16
2   P1     22       15
3   P1     28       16
4   P2     15       12
5   P2     32       13
6   P2     28       10

Example 1: Apply Function to Multiple Columns

The across() function can be used inside mutate() to multiply the values in the points and rebounds columns by 2.

library(dplyr)

Multiply the values in the points and rebounds columns by 2

df %>%  mutate(across(c(points, rebounds), function(x) x*2))

  team points rebounds
1   P1     52       32
2   P1     44       30
3   P1     56       32
4   P2     30       24
5   P2     64       26
6   P2     56       20
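
Note that the call above overwrites points and rebounds. If the original values should be kept, one option is the .names argument of across(), which writes the results to new columns. The sketch below is a minimal illustration; the "_x2" suffix and the object name doubled are arbitrary choices for this example, not part of the original code.

```r
library(dplyr)

# Same example data frame as above
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points=c(26, 22, 28, 15, 32, 28),
                 rebounds=c(16, 15, 16, 12, 13, 10))

# "{.col}_x2" is an illustrative naming pattern: {.col} is replaced
# by the name of each selected column
doubled <- df %>%
  mutate(across(c(points, rebounds), function(x) x*2, .names="{.col}_x2"))

names(doubled)
# "team" "points" "rebounds" "points_x2" "rebounds_x2"
```

The original columns stay untouched, and the doubled values appear in points_x2 and rebounds_x2.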

Example 2: Calculate One Summary Statistic for Multiple Columns

The across() function can be used inside summarise() to calculate the mean of both the points and rebounds columns.

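Calculate the mean of the points and rebounds columns. The na.rm argument is set inside an anonymous function, because passing extra arguments through across() is deprecated as of dplyr 1.1.0.

df %>%  summarise(across(c(points, rebounds), function(x) mean(x, na.rm=TRUE)))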

    points rebounds
1 25.16667 13.66667
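
The example data contains no missing values, so na.rm=TRUE has no visible effect above. The sketch below shows what the argument changes, using a small hypothetical data frame (df_na, invented for this illustration) with one NA in the points column.

```r
library(dplyr)

# Hypothetical data with one missing value in points (not from the example above)
df_na <- data.frame(points=c(26, NA, 28),
                    rebounds=c(16, 15, 16))

# Without na.rm, a single NA makes the mean of that column NA
with_na <- df_na %>%
  summarise(across(c(points, rebounds), mean))

# With na.rm=TRUE, the NA is dropped before the mean is taken
no_na <- df_na %>%
  summarise(across(c(points, rebounds), function(x) mean(x, na.rm=TRUE)))

with_na$points  # NA
no_na$points    # 27, the mean of 26 and 28
```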

We can also combine where() with is.numeric to select every numeric column in the data frame automatically, rather than naming each column.

Calculate the mean of every numeric column in the data frame.

df %>%  summarise(across(where(is.numeric), function(x) mean(x, na.rm=TRUE)))

    points rebounds
1 25.16667 13.66667
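
The same pattern also works on grouped data. As a minimal sketch, assuming we want one row of means per team, group_by() can be added before summarise(). The grouping column is left out of the across() selection, and the object name by_team is just a label chosen for this example.

```r
library(dplyr)

# Same example data frame as above
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points=c(26, 22, 28, 15, 32, 28),
                 rebounds=c(16, 15, 16, 12, 13, 10))

# One row per team, with the mean of every numeric column
by_team <- df %>%
  group_by(team) %>%
  summarise(across(where(is.numeric), function(x) mean(x, na.rm=TRUE)))

by_team
# P1: points is about 25.33, rebounds about 15.67
# P2: points is 25, rebounds about 11.67
```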

Example 3: Calculate Multiple Summary Statistics for Multiple Columns

Passing a named list of functions to across() calculates several statistics per column. Each result column is named after the source column and the function, for example points_mean.

Compute the mean and standard deviation of the points and rebounds columns.

df %>%  summarise(across(c(points, rebounds), list(mean=function(x) mean(x, na.rm=TRUE), sd=function(x) sd(x, na.rm=TRUE))))

  points_mean points_sd rebounds_mean rebounds_sd
1    25.16667  5.946988      13.66667     2.42212
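
As a quick sanity check, the same four statistics can be reproduced with base R. This is only a cross-check sketch and not part of the dplyr workflow; the helper names cols, means and sds are chosen for this example.

```r
# Same example data frame as above
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points=c(26, 22, 28, 15, 32, 28),
                 rebounds=c(16, 15, 16, 12, 13, 10))

cols <- c("points", "rebounds")

# sapply() applies the function to each selected column and
# returns a named numeric vector
means <- sapply(df[cols], mean, na.rm=TRUE)
sds <- sapply(df[cols], sd, na.rm=TRUE)

means  # points about 25.16667, rebounds about 13.66667
sds    # points about 5.946988, rebounds about 2.42212
```

The values agree with the summarise() output above.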

This nearly completes our coverage of dplyr techniques. We will discuss the transmute() function in an upcoming post.
