Sample and Population Variance in R

Variance measures how widely data values are dispersed around their mean.

In probability theory and statistics, variance is the expected value of a random variable’s squared deviation from its mean. Informally, it indicates how far a set of values is spread out from its mean.

The formula for calculating a population’s variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where μ is the population mean, x_i is the ith population element, N is the population size, and Σ is the symbol for “sum.”

To determine a sample’s variance, use the following formula:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where x̄ (xbar) represents the sample mean, x_i represents the sample’s ith element, and n represents the sample size.
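
Both formulas translate directly into R. The sketch below uses a small made-up vector x (not part of the original example), treated first as a whole population and then as a sample; the names sigma2 and s2 are chosen here only to mirror the symbols above.

```r
# Hypothetical values, chosen only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n  <- length(x)               # number of observations
ss <- sum((x - mean(x))^2)    # sum of squared deviations from the mean

sigma2 <- ss / n              # population formula: divide by N
s2     <- ss / (n - 1)        # sample formula: divide by n - 1

sigma2
s2
```

The only difference between the two results is the denominator, and s2 should agree with what var(x) returns.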

Calculate Sample & Population Variance in R

Suppose we have the following dataset, stored in the vector data1.

data1 <- c(12, 84, 5, 17, 18, 11, 13, 19, 69, 92, 15, 10, 55)

The var() function in R calculates the sample variance:

var(data1)
[1] 957.8974

The population variance can be obtained by multiplying the sample variance by (n-1)/n, where n is the number of observations. First, find the length of data1:

n <- length(data1)
n
[1] 13

Now we can compute the population variance:

var(data1) * (n-1)/n
[1] 884.213
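
Because var() always divides by n - 1, the scaling step can be wrapped in a small helper. The name pop_var below is hypothetical and is not a base R function; this is a minimal sketch that assumes a numeric vector with no missing values.

```r
# Hypothetical helper (not part of base R): population variance of a numeric vector
pop_var <- function(x) {
  n <- length(x)
  var(x) * (n - 1) / n   # rescale the sample variance
}

data1 <- c(12, 84, 5, 17, 18, 11, 13, 19, 69, 92, 15, 10, 55)

pop_var(data1)

# Cross-check from the definition: mean squared deviation from the mean
mean((data1 - mean(data1))^2)
```

Both expressions should print the same value, which confirms that the rescaling matches the population formula.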

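It’s important to remember that the population variance is never larger than the sample variance for the same data, because (n-1)/n is less than 1. The two are equal only when every value is identical, in which case both are zero.

In practice, we usually calculate sample variances, because data are rarely available for an entire population.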
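
The gap between the two variances depends only on n, since they differ by the factor (n-1)/n. A quick sketch, using arbitrary sample sizes picked for illustration:

```r
# Arbitrary sample sizes, for illustration only
n <- c(2, 10, 100, 1000)

# Factor that converts a sample variance into a population variance
shrink <- (n - 1) / n
shrink

# A constant vector has zero variance, so both formulas agree there
const <- rep(5, 4)
var(const)
```

The factor approaches 1 as n grows, so the distinction matters most for small samples.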

Calculate the Sample Variance of Multiple Columns

Suppose we have the following R data frame:

data2 <- data.frame(X=c(12, 35, 55, 48, 54, 12, 8, 10),
                   Y=c(12, 24, 33, 77, 5, 46, 71, 106),
                   Z=c(1, 2, 63, 8, 12, 77, 92, 102))
data2
   X   Y   Z
1 12  12   1
2 35  24   2
3 55  33  63
4 48  77   8
5 54   5  12
6 12  46  77
7  8  71  92
8 10 106 102

To determine the sample variance of each column in the data frame, we can use the sapply() function:

sapply(data2, var)
        X         Y         Z
 439.6429 1238.7857 1863.9821
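
sapply() also accepts an anonymous function, so the same pattern can give each column’s population variance. The sketch below reuses the (n-1)/n scaling and assumes the columns contain no missing values; the name pop_vars is chosen here for illustration.

```r
data2 <- data.frame(X = c(12, 35, 55, 48, 54, 12, 8, 10),
                    Y = c(12, 24, 33, 77, 5, 46, 71, 106),
                    Z = c(1, 2, 63, 8, 12, 77, 92, 102))

# Scale each column's sample variance by (n - 1) / n
pop_vars <- sapply(data2, function(col) {
  n <- length(col)
  var(col) * (n - 1) / n
})

pop_vars
```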

The sample standard deviation is the square root of the sample variance. We can compute it for each column with sd():

sapply(data2, sd)
       X        Y        Z
20.96766 35.19639 43.17386
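
As a quick sanity check, the relationship between the two statistics can be confirmed directly. This sketch rebuilds the same data frame so that it runs on its own; the names sds and vars are chosen here for illustration.

```r
data2 <- data.frame(X = c(12, 35, 55, 48, 54, 12, 8, 10),
                    Y = c(12, 24, 33, 77, 5, 46, 71, 106),
                    Z = c(1, 2, 63, 8, 12, 77, 92, 102))

sds  <- sapply(data2, sd)    # per-column sample standard deviation
vars <- sapply(data2, var)   # per-column sample variance

# Compare with a numerical tolerance rather than exact equality
all.equal(sds, sqrt(vars))
```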

For data analysis, sapply() is a very handy function: here it applies a function to each column of a data frame and returns the results in a single named vector.
