Weighted Standard Deviation in R With Example
When some values in a dataset have higher weights than others, the weighted standard deviation is a handy technique to measure the dispersion of those values.
To calculate a weighted standard deviation, use the following formula.
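Assuming the formula meant here is the commonly cited one that rescales the sum of the weights by the number of non-zero weights (an assumption, chosen because it matches the symbols defined next), it can be written as:

```latex
s_w = \sqrt{\frac{\sum_{i=1}^{N} w_i \,(x_i - \bar{x})^2}
                 {\frac{M-1}{M}\,\sum_{i=1}^{N} w_i}},
\qquad
\bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}
```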
where:
N: the total number of observations
M: the number of non-zero weights
wi: the weight of the i-th observation
xi: the i-th data value
x̄: the weighted mean
The wtd.var() function from the Hmisc package is the simplest way to calculate a weighted standard deviation in R. It returns the weighted variance, whose square root is the weighted standard deviation, and it uses the following syntax.
Let’s define the data values
x <- c(2, 10, 10, 3, ...)
Now, we can define the weights
wt <- c(1, 1, 1, 2, ...)
Let’s figure out what the weighted variance is.
weighted_var <- wtd.var(x, wt)
The weighted standard deviation can now be calculated.
weighted_sd <- sqrt(weighted_var)
The examples below demonstrate how to utilize this function in practice.
Example 1: One-Vector Weighted Standard Deviation
In R, the weighted standard deviation for a single vector can be calculated using the code below.
library(Hmisc)
Let’s define data values
x <- c(10, 11, 12, 21, 22, 30, 23, 33, 33, 12)
Now we can define the weights
wt <- c(2, 1, 1.2, 3, 2, 1, 1.5, 2, 2, 2)
Let’s calculate the weighted variance
weighted_var <- wtd.var(x, wt)
Now we can calculate the weighted standard deviation.
sqrt(weighted_var)
[1] 8.707209
The weighted standard deviation turns out to be 8.707209.
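To see where this number comes from, the same calculation can be sketched in base R without Hmisc. This assumes that wtd.var() with its default arguments treats the weights as frequency weights and divides the weighted sum of squared deviations by the sum of the weights minus one. The names x_bar, ss, manual_var and manual_sd are illustrative only.

```r
# Data and weights from Example 1
x  <- c(10, 11, 12, 21, 22, 30, 23, 33, 33, 12)
wt <- c(2, 1, 1.2, 3, 2, 1, 1.5, 2, 2, 2)

# Weighted mean
x_bar <- sum(wt * x) / sum(wt)

# Weighted sum of squared deviations from the weighted mean
ss <- sum(wt * (x - x_bar)^2)

# Frequency-weight variance: divide by (sum of weights - 1).
# Assumption: this mirrors the default behaviour of Hmisc::wtd.var().
manual_var <- ss / (sum(wt) - 1)
manual_sd  <- sqrt(manual_var)

manual_sd  # approximately 8.707209
```

If the assumption holds, manual_sd should agree with sqrt(wtd.var(x, wt)) up to floating-point error.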
Example 2: Weighted Standard Deviation for a Data Frame Column
In R, the weighted standard deviation for one column of a data frame may be calculated using the following code.
library(Hmisc)
Create a data frame.
df <- data.frame(team=c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'),
                 wins=c(21, 19, 10, 10, 12, 11, 15, 12),
                 points=c(1.5, 3, 2, 3, 2, 1, 1, 2))
df

  team wins points
1    A   21    1.5
2    A   19    3.0
3    A   10    2.0
4    A   10    3.0
5    A   12    2.0
6    B   11    1.0
7    B   15    1.0
8    C   12    2.0
Let’s define weights
wt <- c(1, 2, 1.5, 3, 2, 2, 2, 1)
Calculate the weighted standard deviation of points
sqrt(wtd.var(df$points, wt))
[1] 0.8269873
The points column’s weighted standard deviation comes out to be 0.8269873.
Example 3: Weighted Standard Deviation for Data Frames with Multiple Columns
The following code demonstrates how to calculate the weighted standard deviation for many columns of a data frame in R using the sapply() function.
library(Hmisc)
Let’s define a data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'), wins=c(21, 19, 10, 10, 12, 11, 15, 12), points=c(1.5, 3, 2, 3, 2, 1, 1, 2))
Let’s define weights
wt <- c(1, 2, 1.5, 3, 2, 2, 2, 1)
Calculate the weighted standard deviation of points and wins
sapply(df[c('wins', 'points')], function(x) sqrt(wtd.var(x, wt)))
     wins    points
3.7972229 0.8269873
The weighted standard deviation is about 3.797 for the wins column and about 0.827 for the points column.
Note: the weighted standard deviation given by sqrt(Hmisc::wtd.var()) does not agree with the formula given at the top of this page, because by default the Hmisc package applies the "frequency weights" approach, which divides the weighted sum of squared deviations by the sum of the weights minus one (https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance).
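The difference between the two approaches can be sketched in base R. The helper weighted_sd() below is hypothetical and not part of Hmisc. Its "nonzero" option assumes the formula at the top of the page uses the denominator (M - 1) / M * sum(w), and its "frequency" option assumes the default wtd.var() denominator sum(w) - 1.

```r
# Hypothetical helper: weighted SD with a selectable denominator.
weighted_sd <- function(x, w, type = c("frequency", "nonzero")) {
  type  <- match.arg(type)
  x_bar <- sum(w * x) / sum(w)          # weighted mean
  ss    <- sum(w * (x - x_bar)^2)       # weighted sum of squared deviations
  m     <- sum(w != 0)                  # number of non-zero weights (M)
  denom <- switch(type,
    frequency = sum(w) - 1,             # frequency-weights denominator
    nonzero   = (m - 1) / m * sum(w)    # assumed denominator of the page's formula
  )
  sqrt(ss / denom)
}

# points column and weights from Example 2
points <- c(1.5, 3, 2, 3, 2, 1, 1, 2)
wt     <- c(1, 2, 1.5, 3, 2, 2, 2, 1)

sd_freq    <- weighted_sd(points, wt, "frequency")  # approximately 0.8269873
sd_nonzero <- weighted_sd(points, wt, "nonzero")
c(frequency = sd_freq, nonzero = sd_nonzero)
```

Under these assumptions the two denominators coincide only when the weights sum to M, so the results differ for the weights used here. The Hmisc documentation describes a normwt argument that rescales the weights; whether it reproduces the formula for a given dataset is worth checking against that documentation.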