How to Scale Only Numeric Columns in R

To scale only the numeric columns in a data frame in R, use the following syntax from the dplyr package.

library(dplyr)
df %>% mutate(across(where(is.numeric), scale))

The following example shows how to use this syntax in practice.

Example: Use dplyr to Scale Only Numeric Columns

Let’s say we have the R data frame shown below, which contains details about several basketball players.

Let’s create a data frame

df <- data.frame(Team=c('P1', 'P2', 'P3', 'P4', 'P5'),
                 points=c(2, 3, 7, 22, 8),
                 value=c(27, 39, 49, 82, 54))

Now we can view the data frame

df
  Team points value
1   P1      2    27
2   P2      3    39
3   P3      7    49
4   P4     22    82
5   P5      8    54

Technical Remarks

R’s scale() function uses the following basic syntax.

scale(x, center = TRUE, scale = TRUE)

where:

x: Name of the object to scale

center: Whether to subtract the mean from each value. The default is TRUE.

scale: Whether to divide the centered values by the standard deviation. The default is TRUE.
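
As a rough illustration of how the two arguments interact, the sketch below applies scale() to the points values from the example data frame, once with centering only and once with the defaults. The variable names x, centered, and standardized are chosen here for illustration and are not part of the original example.

```r
# Illustrative vector: the points values from the example data frame
x <- c(2, 3, 7, 22, 8)

# Center only: subtract the mean, keep the original spread
centered <- scale(x, center = TRUE, scale = FALSE)

# Defaults: subtract the mean, then divide by the standard deviation
standardized <- scale(x)

# scale() returns a one-column matrix and stores the values it used
# for centering and scaling as attributes
attr(standardized, "scaled:center")  # the mean of x
attr(standardized, "scaled:scale")   # the standard deviation of x
```

Because the result is a matrix rather than a plain vector, wrapping the call in as.numeric() can be convenient when only the values are needed.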

With both arguments left at TRUE, the function calculates scaled values using the following formula:

x_scaled = (x_original - x̄) / s

where:

x_original: The original x-value

x̄: The sample mean

s: The sample standard deviation
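
To make the formula concrete, here is a minimal sketch that computes the z-scores by hand and compares them with the output of scale(). It assumes the points values from the example data frame; the names x, z_manual, and z_scale are illustrative only.

```r
# Illustrative vector: the points values from the example data frame
x <- c(2, 3, 7, 22, 8)

# Apply the formula directly: subtract the sample mean,
# then divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# The same calculation via scale(), converted from a matrix to a vector
z_scale <- as.numeric(scale(x))

# Compare the two results within numerical tolerance
all.equal(z_manual, z_scale)
```

The comparison should report that the two vectors agree, since sd() uses the same sample (n - 1) standard deviation that scale() applies when centering is enabled.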

This process, which converts each original value into a z-score, is known as standardizing data (it is also sometimes loosely called normalizing).

Let’s say we want to scale only the numeric columns of the data frame using R’s scale() function.

To do this, we can use the syntax shown below.

library(dplyr)

# scale only the numeric columns of the data frame

df %>% mutate(across(where(is.numeric), scale))
   Team      points      value
1   P1 -0.79813157 -1.1284228
2   P2 -0.67342351 -0.5447558
3   P3 -0.17459128 -0.0583667
4   P4  1.69602958  1.5467175
5   P5 -0.04988322  0.1848279

The Team column has remained the same, but the values in the two numeric columns (points and value) have been scaled.
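
One detail worth knowing is that scale() returns a one-column matrix, so the scaled columns produced above are stored as matrix columns rather than plain numeric vectors. If plain vectors are preferred, one possible variation (a sketch, not part of the original example) is to wrap the call in as.numeric() inside across(). The name scaled_df is hypothetical.

```r
library(dplyr)

# Recreate the example data frame
df <- data.frame(Team = c('P1', 'P2', 'P3', 'P4', 'P5'),
                 points = c(2, 3, 7, 22, 8),
                 value = c(27, 39, 49, 82, 54))

# Scale only the numeric columns and drop the matrix structure
# so each scaled column is an ordinary numeric vector
scaled_df <- df %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

# Inspect the column types of the result
str(scaled_df)
```

The scaled values are the same as before; only the storage type of the columns differs, which can matter for functions that expect ordinary vectors.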
