How to Standardize Data in R?
Standardization means scaling each variable in a dataset so that it has a mean of 0 and a standard deviation of 1.
The most common way to do this is z-score standardization, which transforms each value using the following formula:
$$z_i = \frac{x_i - \bar{x}}{s}$$
where:
$x_i$: the $i$th value in the dataset
$\bar{x}$: the sample mean
$s$: the sample standard deviation
$z_i$: the standardized value (z-score) of $x_i$
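To see the formula in action, the sketch below applies it by hand to a small vector and compares the result with base R's scale(). The vector x is made-up illustration data, not part of the tutorial's dataset.

```r
# a small, made-up numeric vector used only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# apply the z-score formula directly: (x_i - mean) / s
# sd() is the sample standard deviation (denominator n - 1)
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix, so flatten it to a plain vector
z_scale <- as.vector(scale(x))

# both approaches agree up to floating-point error
print(all.equal(z_manual, z_scale))  # TRUE
```

Because scale() divides by the sample standard deviation, it matches the formula above exactly, not a population (divide-by-n) version.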
The examples below show how to apply z-score standardization to one or more variables in an R data frame, using the base scale() function together with the dplyr package.
Standardize a Single Variable
The following code shows how to scale just one variable in a data frame that contains three variables.
library(dplyr)
# make this example reproducible
set.seed(123)
# create the original data frame
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))
# view the original data frame
df
        var1      var2      var3
1  14.378876 19.223000 27.238483
2  39.415257 10.160015 22.320085
3  20.448846 14.196271 21.012670
4  44.150870 12.307401 29.856744
5  47.023364  3.852644 21.392645
6   2.277825 18.196849 22.713262
7  26.405274  6.429579 18.601651
8  44.620952  2.757072 19.853551
9  27.571751  7.902573 12.228993
10 22.830737 19.181066  8.677841
# scale var1 to have mean = 0 and standard deviation = 1
df2 <- df %>% mutate_at(c('var1'), ~(scale(.) %>% as.vector))
df2
          var1      var2      var3
1  -0.98619132 19.223000 27.238483
2   0.71268801 10.160015 22.320085
3  -0.57430484 14.196271 21.012670
4   1.03402981 12.307401 29.856744
5   1.22894699  3.852644 21.392645
6  -1.80732540 18.196849 22.713262
7  -0.17012290  6.429579 18.601651
8   1.06592790  2.757072 19.853551
9  -0.09096999  7.902573 12.228993
10 -0.41267825 19.181066  8.677841
You’ll notice that the other two variables didn’t change; only the first variable was scaled.
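The as.vector step in the mutate_at() call is there because scale() does not return a plain vector: it returns a one-column matrix that carries the subtracted mean and the divisor as attributes. The sketch below illustrates this on a made-up vector.

```r
# made-up data for illustration
x <- c(10, 20, 30, 40, 50)

s <- scale(x)
print(is.matrix(s))               # TRUE: scale() returns a matrix
print(dim(s))                     # 5 rows, 1 column
print(attr(s, "scaled:center"))   # the mean that was subtracted
print(attr(s, "scaled:scale"))    # the standard deviation used as divisor

# as.vector() drops the dim and the scaling attributes,
# leaving an ordinary numeric vector suitable for a data frame column
v <- as.vector(s)
print(is.null(attributes(v)))     # TRUE
```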
We can quickly confirm that the scaled variable now has a mean of 0 and a standard deviation of 1.
# compute the scaled variable's mean
mean(df2$var1)
[1] 2.638406e-17
# essentially zero; the tiny residual is floating-point rounding error
# compute the scaled variable's standard deviation
sd(df2$var1)
[1] 1
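Because floating-point arithmetic rarely produces an exact zero, a check like this is more robust with a tolerance than with ==. The sketch below rebuilds the same data (same seed and ranges) so it runs on its own, standardizes var1 in base R without dplyr, and uses all.equal() for the comparison.

```r
# recreate the tutorial's data frame
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# standardize var1 in base R
z <- as.vector(scale(df$var1))

# compare against the targets with all.equal()'s default tolerance
print(isTRUE(all.equal(0, mean(z))))  # TRUE
print(isTRUE(all.equal(1, sd(z))))    # TRUE
```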
Standardize Multiple Variables
The following code scales several variables in a data frame at once:
# scale var1 and var2 to have mean = 0 and standard deviation = 1
df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))
df3
          var1       var2      var3
1  -0.98619132  1.2570692 27.238483
2   0.71268801 -0.2031057 22.320085
3  -0.57430484  0.4471923 21.012670
4   1.03402981  0.1428686 29.856744
5   1.22894699 -1.2193121 21.392645
6  -1.80732540  1.0917418 22.713262
7  -0.17012290 -0.8041315 18.601651
8   1.06592790 -1.3958243 19.853551
9  -0.09096999 -0.5668114 12.228993
10 -0.41267825  1.2503130  8.677841
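Since dplyr 1.0.0, the scoped verbs such as mutate_at() and mutate_all() are superseded by across() inside mutate(). The sketch below is a rewrite of the df3 example in that style; it assumes dplyr 1.0.0 or newer is installed and rebuilds the data so it runs on its own.

```r
library(dplyr)

set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# standardize var1 and var2 only; var3 is left untouched
df3 <- df %>%
  mutate(across(c(var1, var2), ~ as.vector(scale(.x))))

print(df3)
```

Selecting columns by name inside across() keeps the intent explicit, and the same pattern extends to tidyselect helpers such as starts_with().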
Standardize All Variables
The following code uses the mutate_all() function to scale every variable in a data frame.
# scale all variables to have mean = 0 and standard deviation = 1
df4 <- df %>% mutate_all(~(scale(.) %>% as.vector))
df4
          var1       var2        var3
1  -0.98619132  1.2570692  1.09158171
2   0.71268801 -0.2031057  0.30768348
3  -0.57430484  0.4471923  0.09930665
4   1.03402981  0.1428686  1.50888235
5   1.22894699 -1.2193121  0.15986731
6  -1.80732540  1.0917418  0.37034828
7  -0.17012290 -0.8041315 -0.28496363
8   1.06592790 -1.3958243 -0.08543481
9  -0.09096999 -0.5668114 -1.30064291
10 -0.41267825  1.2503130 -1.86662844
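mutate_all() applies scale() to every column, so it fails if the data frame also contains a non-numeric column (scale() raises an error on character data). A sketch, assuming dplyr 1.0.0 or newer, that standardizes only the numeric columns with across(where(is.numeric), ...); the group column is a hypothetical addition to illustrate a mixed data frame. It also shows a base R alternative, which converts the matrix returned by scale() back into a data frame.

```r
library(dplyr)

set.seed(123)
df <- data.frame(group = rep(c("a", "b"), each = 5),  # hypothetical non-numeric column
                 var1  = runif(10, 0, 50),
                 var2  = runif(10, 2, 20))

# scale only the numeric columns; 'group' passes through unchanged
df_std <- df %>%
  mutate(across(where(is.numeric), ~ as.vector(scale(.x))))

# base R alternative on the numeric columns only:
# scale() returns a matrix, so convert it back to a data frame
df_base <- as.data.frame(scale(df[, c("var1", "var2")]))

print(df_std)
print(df_base)
```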