How to Calculate a Bootstrap Standard Error in R
Bootstrapping is a resampling technique that can be used to estimate the standard error of a statistic such as the mean.
The following is the basic procedure for calculating a bootstrapped standard error.
From a given dataset, take k repeated samples with replacement, each the same size as the original dataset, and calculate the standard error for each sample: s/√n, where s is the sample standard deviation and n is the sample size.
This gives k separate standard error estimates. Take the mean of the k standard errors to get the bootstrapped standard error.
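The procedure above can be sketched as an explicit loop in base R. This is only an illustration: the variable names (k, se_per_sample, bootstrapped_se), the choice of k = 200, and the seed are assumptions made for this sketch, and the data are the example values used in this article.

```r
# Illustrative sketch of the procedure described above (base R only).
# k, the seed, and all variable names are assumptions chosen for this example.
x <- c(112, 64, 84, 78, 67, 221, 125, 219, 45, 79)
k <- 200                 # number of bootstrap samples
n <- length(x)           # size of each resample (same as the original data)

set.seed(42)
se_per_sample <- numeric(k)
for (j in seq_len(k)) {
  # Draw n values from x with replacement
  resample <- sample(x, size = n, replace = TRUE)
  # Standard error of the mean for this resample: s / sqrt(n)
  se_per_sample[j] <- sd(resample) / sqrt(n)
}

# Average the k standard errors
bootstrapped_se <- mean(se_per_sample)
bootstrapped_se
```

Writing the loop out makes each step visible: resample, compute s/√n, store it, and average at the end.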
The following examples show how to calculate a bootstrapped standard error in R using two distinct methods.
Approach 1: Boot Package
The boot() function from the boot library is one technique to calculate a bootstrap standard error in R.
In R, the following code demonstrates how to compute a bootstrap standard error for a given dataset.
Let’s make the example reproducible
set.seed(123)
Now load the boot library
library(boot)
We can define the dataset
x <- c(112, 64, 84, 78, 67, 221, 125, 219, 45, 79)
Let’s create a function to calculate the mean of the resampled values (boot() passes the data as x and the resampled indices as i)
meanF <- function(x,i){mean(x[i])}
Now we can calculate the standard error using 5,000 bootstrapped samples
boot(x, meanF, 5000)
ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = x, statistic = meanF, R = 5000)

Bootstrap Statistics :
    original    bias    std. error
t1*    109.4 -0.13972    18.41172
The “original” number of 109.4 represents the dataset’s mean. The bootstrap standard error of the mean is represented by the value 18.41 in the “std. error” column.
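It can help to see where the printed values come from. The sketch below stores the result of boot() in an object (the name b is an assumption for this example) and recomputes the summary from its t0 and t components. The boot package's print method reports the standard deviation of the bootstrap replicates as the "std. error", so the values computed here should agree with the printed output for the same seed.

```r
# Sketch: recompute the printed summary from the boot object.
# The object name `b` is an assumption made for this example.
library(boot)

set.seed(123)
x <- c(112, 64, 84, 78, 67, 221, 125, 219, 45, 79)
meanF <- function(x, i) mean(x[i])

b <- boot(x, meanF, R = 5000)

original <- b$t0                     # statistic computed on the original data
se <- sd(b$t[, 1])                   # SD of the 5000 bootstrap means
bias <- mean(b$t[, 1]) - b$t0        # mean of the bootstrap means minus original

c(original = original, bias = bias, std.error = se)
```

Keeping the object around is also convenient for follow-up steps, for example passing it to boot.ci() to get confidence intervals.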
In this example, we used 5000 bootstrapped samples to estimate the standard error of the mean, but we could have used 1,000, 10,000, or any other number of bootstrapped samples.
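To get a feel for how the number of bootstrapped samples affects the result, the same calculation can be repeated for several values of R. This is a sketch only: the particular counts (1000, 5000, 10000) and the names replicate_counts and se_by_R are assumptions for this example.

```r
# Sketch: bootstrap standard error of the mean for several replicate counts.
# The counts and variable names are assumptions chosen for illustration.
library(boot)

x <- c(112, 64, 84, 78, 67, 221, 125, 219, 45, 79)
meanF <- function(x, i) mean(x[i])

set.seed(123)
replicate_counts <- c(1000, 5000, 10000)

# For each R, run boot() and take the SD of the bootstrap means
se_by_R <- sapply(replicate_counts, function(R) {
  sd(boot(x, meanF, R = R)$t[, 1])
})
names(se_by_R) <- replicate_counts
se_by_R
```

The estimates will differ slightly from run to run; using more bootstrapped samples generally reduces that run-to-run variation at the cost of more computation.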
Approach 2: Own Formula
We can also construct our own code to calculate a bootstrapped standard error.
The code below demonstrates how to do so:
Create a reproducible example
set.seed(123)
This approach uses only base R functions, so the boot library does not need to be loaded.
Now we can use the same dataset
x <- c(112, 64, 84, 78, 67, 221, 125, 219, 45, 79)
mean(replicate(500, sd(sample(x, replace=TRUE))/sqrt(length(x))))
[1] 18.11736
The bootstrapped standard error is approximately 18.12, which is close to the 18.41 obtained in the previous example.
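The two approaches do not compute exactly the same quantity, which is one reason the numbers differ slightly. boot() reports the standard deviation of the resampled means, while the formula above averages s/√n across resamples. As a rough sketch, the first quantity can also be computed in base R; the names boot_means, se_from_means, and classic_se are assumptions for this example.

```r
# Sketch: base R version of what boot() reports, i.e. the SD of resampled means.
# Variable names are assumptions chosen for illustration.
set.seed(123)
x <- c(112, 64, 84, 78, 67, 221, 125, 219, 45, 79)

# Mean of each of 5000 resamples drawn with replacement
boot_means <- replicate(5000, mean(sample(x, replace = TRUE)))

# Bootstrap standard error: spread of the resampled means
se_from_means <- sd(boot_means)

# Formula-based standard error on the original data, for comparison
classic_se <- sd(x) / sqrt(length(x))

c(bootstrap = se_from_means, formula = classic_se)
```

Because the random draws here are not generated in the same way as inside boot(), the result will not match the boot() output digit for digit even with the same seed, but it should be of similar size.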