68 95 99 Rule in R
The Empirical Rule, often known as the 68-95-99.7 rule, states that for a normally distributed dataset:
About 68% of data values fall within one standard deviation of the mean.
About 95% of data values fall within two standard deviations of the mean.
About 99.7% of data values fall within three standard deviations of the mean.
In this lesson, we’ll show you how to use R to apply the Empirical Rule to a dataset.
Using the Empirical Rule in R
The pnorm() function in R returns the value of the normal distribution's cumulative distribution function (CDF), that is, the area under the curve to the left of a given value.
The following is the fundamental syntax for this function:
pnorm(q, mean, sd)
where:
q: the value of a normally distributed random variable
mean: mean of the distribution
sd: standard deviation of the distribution
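As a quick illustration of these arguments, the sketch below evaluates pnorm() at a few points. The specific values of q, mean, and sd are chosen only for illustration, and the variable names are hypothetical.

```r
# When mean and sd are omitted, pnorm() uses the standard normal (mean = 0, sd = 1)
p_default <- pnorm(0)                     # area to the left of 0, which is 0.5 by symmetry

# Supplying mean and sd evaluates the CDF of that particular normal distribution
p_custom <- pnorm(7, mean = 5, sd = 2)

# Standardizing by hand gives the same area: z = (q - mean) / sd
p_standardized <- pnorm((7 - 5) / 2)

print(p_default)
print(all.equal(p_custom, p_standardized))
```

Because a value and its z-score give the same area, calls such as pnorm(1) can be read as "one standard deviation above the mean" for any normal distribution.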
To find the area under the normal distribution curve that lies within a given number of standard deviations of the mean, we can use the following syntax:
# find the area under the normal curve within 1 standard deviation of the mean
pnorm(1) - pnorm(-1)
[1] 0.6826895
# find the area under the normal curve within 2 standard deviations of the mean
pnorm(2) - pnorm(-2)
[1] 0.9544997
# find the area under the normal curve within 3 standard deviations of the mean
pnorm(3) - pnorm(-3)
[1] 0.9973002
We can confirm the following from the output:
About 68% of data values fall within one standard deviation of the mean.
About 95% of data values fall within two standard deviations of the mean.
About 99.7% of data values fall within three standard deviations of the mean.
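The same percentages can also be checked empirically on simulated data. The following is a minimal sketch, assuming a sample of 100,000 draws from rnorm() and an arbitrary seed; the helper share_within() is a hypothetical name introduced here, not part of base R.

```r
set.seed(42)                # arbitrary seed so the run is repeatable
x <- rnorm(100000)          # simulated sample from a standard normal distribution

# Hypothetical helper: share of values within k standard deviations of the sample mean
share_within <- function(x, k) {
  mean(abs(x - mean(x)) <= k * sd(x))
}

# Shares for k = 1, 2, and 3 standard deviations
shares <- sapply(1:3, function(k) share_within(x, k))
print(round(shares, 3))
```

The simulated shares should land close to 0.68, 0.95, and 0.997, with small differences due to sampling variation.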
The following examples demonstrate how to apply the Empirical Rule to various datasets.
Example 1: Using R to Apply the Empirical Rule to a Dataset
Suppose we have a normally distributed dataset with a mean of 5 and a standard deviation of 2.
To find the ranges that contain 68%, 95%, and 99.7% of the data, we can use the following code:
# define the mean and standard deviation
mean=5
sd=2
# find the values that contain 68% of the data (mean plus or minus 1 sd)
mean-sd; mean+sd
[1] 3
[1] 7
# find the values that contain 95% of the data (mean plus or minus 2 sd)
mean-2*sd; mean+2*sd
[1] 1
[1] 9
# find the values that contain 99.7% of the data (mean plus or minus 3 sd)
mean-3*sd; mean+3*sd
[1] -1
[1] 11
From this output, we can see:
About 68% of the data falls between 3 and 7.
About 95% of the data falls between 1 and 9.
About 99.7% of the data falls between -1 and 11.
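The calculation in Example 1 can be wrapped in a small reusable function. The sketch below is one possible way to do it; empirical_range(), mu, and sigma are hypothetical names introduced for illustration, and the share column uses pnorm() to confirm how much of a normal distribution each range covers.

```r
# Hypothetical helper: bounds that lie k standard deviations from the mean
empirical_range <- function(mu, sigma, k = 1:3) {
  lower <- mu - k * sigma
  upper <- mu + k * sigma
  data.frame(
    k = k,
    lower = lower,
    upper = upper,
    # share of a normal distribution inside each range
    share = pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
  )
}

# Same mean and standard deviation as Example 1
ranges <- empirical_range(mu = 5, sigma = 2)
print(ranges)
```

Using names such as mu and sigma also avoids reusing mean and sd as variable names, which are the names of base R functions.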
Example 2: Determining the Percent of Data That Falls Between Two Values
Consider a dataset that is normally distributed and has a mean of 100 and a standard deviation of 5.
Let’s say we want to know what proportion of the data in this distribution falls between 90 and 110.
To obtain the solution, we can utilise the pnorm() function:
# find the area under the normal curve between 90 and 110
pnorm(110, mean=100, sd=5) - pnorm(90, mean=100, sd=5)
[1] 0.9544997
In this distribution, about 95.45 percent of the data falls between the values 90 and 110.
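This result matches the Empirical Rule because 90 and 110 each sit two standard deviations from the mean of 100. The sketch below makes that link explicit by converting the bounds to z-scores; prop_between() is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: share of a normal distribution between two values
prop_between <- function(lower, upper, mean = 0, sd = 1) {
  pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd)
}

# Convert the bounds from Example 2 to z-scores: z = (value - mean) / sd
z_lower <- (90 - 100) / 5
z_upper <- (110 - 100) / 5

# The area is the same on the original scale and on the z-score scale
p_raw <- prop_between(90, 110, mean = 100, sd = 5)
p_z <- prop_between(z_lower, z_upper)

print(c(z_lower, z_upper))
print(all.equal(p_raw, p_z))
```

The same helper works for bounds that are not a whole number of standard deviations from the mean, where the Empirical Rule alone gives no direct answer.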