Systematic Sampling in R with example

by finnstats

Systematic sampling is a type of probability sampling in which members of a larger population are selected at a fixed, periodic interval, beginning from a randomly chosen starting point.

The fixed periodic interval, also known as the sampling interval, is calculated by dividing the population size by the required sample size.
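As a quick illustration of that calculation, here is a minimal sketch in R. The population and sample sizes are made-up numbers, and rounding down with floor() is one common way to handle sizes that do not divide evenly, not the only one.

```r
# Sampling interval k = population size / sample size.
# The numbers below are illustrative assumptions.
N <- 1000   # population size
n <- 50     # required sample size
k <- N / n  # sampling interval
k

# When N is not a multiple of n, rounding down keeps every
# selected position within 1..N.
k_uneven <- floor(1030 / 50)
k_uneven
```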

Researchers frequently gather samples from a population and use the findings to derive conclusions about the entire population.

Systematic sampling is a widely used sampling approach that involves a simple two-step procedure.

1. Sort the members of a population into some sort of order.

2. Starting from a random point, select every kth member for the sample, where k is the sampling interval.
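The two steps can be sketched on a tiny population before moving to a full data frame. The names and the interval of 3 below are hypothetical and chosen only to keep the output short.

```r
# Hypothetical population of 12 members (names are made up for illustration).
population <- c("Mia", "Ben", "Zoe", "Ali", "Eva", "Tom",
                "Kim", "Lee", "Sam", "Ann", "Dan", "Joy")

# Step 1: put the population in a definite order.
ordered_pop <- sort(population)

# Step 2: pick a random start within the first k members,
# then take every kth member from there.
k <- 3                                   # sampling interval
start <- sample.int(k, 1)                # random start in 1..k
picked <- ordered_pop[seq(start, length(ordered_pop), by = k)]
picked
```

Whichever start is drawn, the sample contains 4 of the 12 members, each 3 positions apart in the sorted list.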

Systematic Sampling in R as an example

Assume a school manager wants to take a sample of 100 students from a school with a total enrollment of 500.

With systematic sampling, the manager would alphabetize the students by name, choose a starting point at random among the first five students, and then pick every fifth student for the sample, since 500 / 100 = 5.

The following code demonstrates how to generate a fictitious data frame in R:

Make this example reproducible.

set.seed(123)

Develop a simple function for generating random names.

Names <- function(n = 2000) {
  do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
}

Now we can create a data frame

df <- data.frame(name = Names(500),
                 score = rnorm(500, mean=25, sd=5))

Let’s view the first six rows of the data frame

head(df)

  name    score
1 XAEIP 22.83760
2 RDPZK 23.21978
3 QIQER 27.75507
4 HVSTN 26.25443
5 LHDUA 23.26211
6 ZCMLO 30.46350
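The data frame above is in the order in which the names were generated. If the alphabetical ordering described earlier is wanted before sampling, one way to get it is sketched below. The name df_sorted is introduced here for illustration, and the data frame is rebuilt so that the block runs on its own.

```r
# Rebuild a data frame of the same shape so this block runs on its own.
set.seed(123)
Names <- function(n = 2000) {
  do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
}
df <- data.frame(name = Names(500),
                 score = rnorm(500, mean = 25, sd = 5))

# Sort rows alphabetically by name (df_sorted is a hypothetical name).
df_sorted <- df[order(df$name), ]

# Reset row names so they match positions in the sorted frame.
rownames(df_sorted) <- NULL
head(df_sorted)
```

Sampling from df_sorted instead of df would select different rows, so its output would not match the listings shown in this post.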

The following code demonstrates how to use systematic sampling to obtain a sample of 100 students:

To obtain a systematic sample, define a function.

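sys <- function(N, n) {
  # Sampling interval; floor() keeps every position within 1..N
  # even when N is not a multiple of n
  k <- floor(N / n)
  # Random starting position within the first interval
  r <- sample(1:k, 1)
  # n positions: r, r + k, r + 2k, ...
  seq(r, r + k * (n - 1), k)
}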

Draw the systematic sample.

sys_sample <- df[sys(nrow(df), 100), ]

Now we can view the first six rows of the sample

head(sys_sample)

    name    score
3  QIQER 27.75507
8  FGNNE 19.50552
13 BSSUH 28.75092
18 JFSIS 24.28128
23 RAXJU 18.27119
28 THUAR 29.22662

dim(sys_sample)
[1] 100   2

Note that the first member of the sample is row 3 of the original data frame, and each subsequent member is 5 rows after the previous one.

Using dim(), we can confirm that the systematic sample is a data frame with 100 rows and 2 columns.
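The spacing and size of the sample can also be checked directly on the selected row positions. The sketch below repeats the sys() definition from this post so that it runs on its own; idx is a name introduced here for illustration.

```r
set.seed(123)

# Same sampling function as defined earlier in this post.
sys = function(N,n){
  k = ceiling(N/n)
  r = sample(1:k, 1)
  seq(r, r + k*(n-1), k)
}

# Row positions for a sample of 100 out of 500.
idx <- sys(500, 100)

length(idx)         # number of selected rows
unique(diff(idx))   # gap between consecutive selected rows
range(idx)          # smallest and largest selected positions
```

A single value from unique(diff(idx)) indicates that every selected row is the same distance from the one before it.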