Weighted Standard Deviation in R With Example

When some values in a dataset have higher weights than others, the weighted standard deviation is a handy technique to measure the dispersion of those values.

To calculate a weighted standard deviation, use the following formula:

$$ s_w = \sqrt{\frac{\sum_{i=1}^{N} w_i (x_i - \bar{x})^2}{\frac{M - 1}{M} \sum_{i=1}^{N} w_i}} $$

where:

N: the total number of observations

M: the number of non-zero weights

wi: the weight of the i-th observation (an element of the weights vector)

xi: the i-th data value (an element of the data vector)

x̄: the weighted mean of the data values
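The definitions above can be turned into a few lines of base R. This is only a sketch: it assumes the formula divides the weighted sum of squared deviations by ((M - 1) / M) times the sum of the weights, and the function name weighted_sd_formula and the demo vectors are made up for illustration.

```r
# Hypothetical helper (not part of any package): weighted SD with the
# (M - 1) / M correction, where M is the number of non-zero weights.
weighted_sd_formula <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0))
  m <- sum(w != 0)                 # M: number of non-zero weights
  stopifnot(m > 1)                 # the correction needs at least two
  xbar <- sum(w * x) / sum(w)      # weighted mean
  num <- sum(w * (x - xbar)^2)     # weighted sum of squared deviations
  den <- (m - 1) / m * sum(w)
  sqrt(num / den)
}

# Made-up data: the zero weight removes the second value from the result
x_demo <- c(2, 4, 6)
w_demo <- c(1, 0, 3)
demo_sd <- weighted_sd_formula(x_demo, w_demo)
print(demo_sd)                     # sqrt(6), about 2.449
```

Here the weighted mean is 5, the weighted sum of squares is 12, and the denominator is (1 / 2) * 4 = 2, so the result is sqrt(6).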

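The wtd.var() function from the Hmisc package is the simplest way to calculate a weighted standard deviation in R. It returns the weighted variance, so take the square root of its result. Note that, by default, wtd.var() treats the weights as frequency weights and divides by the sum of the weights minus one, so its result can differ from the formula above. The basic syntax is as follows.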

Let’s define the data values

x <- c(2, 10, 10, 3, ...)

Now, we can define the weights

wt <- c(1, 1, 1, 2, ...)

Let’s figure out what the weighted variance is.

weighted_var <- wtd.var(x, wt)

The weighted standard deviation can now be calculated.

weighted_sd <- sqrt(weighted_var)

The examples below demonstrate how to utilize this function in practice.

Example 1: One-Vector Weighted Standard Deviation

In R, the weighted standard deviation for a single vector can be calculated using the code below.

library(Hmisc)

Let’s define data values

x <- c(10, 11, 12, 21, 22, 30, 23, 33, 33, 12)

Now we can define the weights

wt <- c(2, 1, 1.2, 3, 2, 1, 1.5, 2, 2, 2)

Let’s calculate the weighted variance

weighted_var <- wtd.var(x, wt)

Now we can calculate the weighted standard deviation.

sqrt(weighted_var)
[1] 8.707209

The weighted standard deviation turns out to be 8.707209.
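As a cross-check that does not need Hmisc, the same number can be reproduced by hand in base R. This is a sketch rather than a description of the package's internals: it assumes the weights are treated as frequency weights, so the weighted sum of squared deviations is divided by sum(wt) - 1. The names xbar, manual_var and manual_sd are introduced here for illustration.

```r
# Same data and weights as in Example 1
x <- c(10, 11, 12, 21, 22, 30, 23, 33, 33, 12)
wt <- c(2, 1, 1.2, 3, 2, 1, 1.5, 2, 2, 2)

# Weighted mean
xbar <- sum(wt * x) / sum(wt)

# Frequency-weight variance: denominator is sum(wt) - 1 (assumption)
manual_var <- sum(wt * (x - xbar)^2) / (sum(wt) - 1)
manual_sd <- sqrt(manual_var)
print(manual_sd)   # about 8.7072, in line with the output above
```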

Example 2: Weighted Standard Deviation for a Data Frame Column

In R, the weighted standard deviation for one column of a data frame may be calculated using the following code.

library(Hmisc)

Let’s create a data frame

df <- data.frame(team=c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'),
                 wins=c(21, 19, 10, 10, 12, 11, 15, 12),
                 points=c(1.5, 3, 2, 3, 2, 1, 1, 2))
df
  team wins points
1    A   21    1.5
2    A   19    3.0
3    A   10    2.0
4    A   10    3.0
5    A   12    2.0
6    B   11    1.0
7    B   15    1.0
8    C   12    2.0

Let’s define weights

wt <- c(1, 2, 1.5, 3, 2, 2, 2, 1)

Calculate the weighted standard deviation of points

sqrt(wtd.var(df$points, wt))
[1] 0.8269873

The points column’s weighted standard deviation comes out to be 0.8269873.

Example 3: Weighted Standard Deviation for Data Frames with Multiple Columns

The following code demonstrates how to calculate the weighted standard deviation for many columns of a data frame in R using the sapply() function.

library(Hmisc)

Let’s define a data frame

df <- data.frame(team=c('A', 'A', 'A', 'A', 'A', 'B', 'B', 'C'),
                 wins=c(21, 19, 10, 10, 12, 11, 15, 12),
                 points=c(1.5, 3, 2, 3, 2, 1, 1, 2))

Let’s define weights

wt <- c(1, 2, 1.5, 3, 2, 2, 2, 1)

Calculate the weighted standard deviation of wins and points

sapply(df[c('wins', 'points')], function(x) sqrt(wtd.var(x, wt)))
     wins    points
3.7972229 0.8269873

The weighted standard deviation is about 3.797 for the wins column and about 0.827 for the points column.
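One way to read these results, assuming wtd.var() is called with its defaults (under which the weights act as frequency weights): for whole-number weights, the weighted standard deviation should equal the ordinary sd() of a vector in which each value is repeated as many times as its weight. The base-R sketch below illustrates this with made-up values and weights; x_int, w_int, freq_sd and expanded_sd are hypothetical names.

```r
# Made-up values and whole-number weights (illustration only)
x_int <- c(21, 19, 10, 12)
w_int <- c(1, 2, 3, 2)

# Frequency-weight standard deviation computed by hand
xbar <- sum(w_int * x_int) / sum(w_int)
freq_sd <- sqrt(sum(w_int * (x_int - xbar)^2) / (sum(w_int) - 1))

# Ordinary standard deviation after repeating each value w_int times
expanded_sd <- sd(rep(x_int, times = w_int))

# The two agree up to floating-point error
print(all.equal(freq_sd, expanded_sd))
```

With fractional weights such as 1.5 there is no literal "repeat 1.5 times", but the same formula is applied.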

1 Response

  1. Anonymous says:

    The weighted standard deviation given by sqrt(Hmisc::wtd.var) does not agree with the formula given on top of this page because Hmisc package applies the “frequency weights”-approach
    (https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance).
