# Cluster Sampling in R-Cluster or area sampling in a nutshell

Cluster Sampling in R, In most situations, the sampling frame for elementary units of the population is not available, moreover, it is not easy to prepare it.

Cluster Meaning, the information is available to groups of items called clusters.

For example, the list of houses may be available but not the person residing in them, list of individual farms may not be available but the list of villages is available.

Hence, in those conditions, homes or villages are referred to as clusters, and the choice must be the product of homes or villages withinside the sample. This sampling procedure is called Cluster sampling.

When the whole region containing the populace is split into smaller areas or segments, those small regions or segments are taken as sampling units.

This procedure is known as area sampling. Clusters or area segments are also known as primary units.

In cluster sampling, precautions should be taken to ensure that a unit will never belong to more than one cluster.

Also, every primary unit of the population should definitely belong to one primary unit.

Principal component analysis (PCA) in R »

### In what situation is cluster sampling preferred?

Cluster sampling is used when,

1. The sampling frame is not available and it is too expensive and time-consuming to prepare it.

2. The sampling units are situated distant apart.

3. In this situation selection of elementary units make the survey very difficult.

4. The elementary units may not be easily available and locatable.

In most cases, researchers take samples from a population and use the sample data to arrive at conclusions about the population.

One of the most commonly used methods is cluster sampling, where a population is divided into multiple clusters and where all units in particular clusters are selected.

KNN Algorithm Machine Learning » Classification & Regression »

### Cluster Sampling in R

set.seed(123) df <- data.frame(Person = rep(1:20, each=5), Expertise = rnorm(200, mean=7, sd=1)) head(df)

Output:-

Person Expertise 1 1 6.4 2 1 6.8 3 1 8.6 4 1 7.1 5 1 7.1 6 2 8.7

randomly choose 4 tour groups out of the 20

clusters <- sample(unique(df$Person), size=4, replace=F)

Define sample as all persons who belong to one of the 4 groups, you can adjust the size you want.

cluster_sample <- df[df$Person %in% clusters, ]

Lets check out the how many persons in each cluster

Customer Segmentation K Means Cluster »

table(cluster_sample$Person)

Output:-

3 7 9 15 10 10 10 10

From the output we can seen that,

10 persons from group 3, 10 persons from group 7, 10 persons from group 9 and 10 persons from group 15.

If you want to read more on cluster analysis in R, check the link Cluster Analysis in R » Unsupervised Approach »