Age structure diagram in R
An age structure diagram, also known as a population pyramid, is a graph that depicts a population’s age and gender distribution.
It’s a helpful chart for quickly grasping a population’s makeup as well as its current growth trend.
A rectangular population pyramid indicates that a population is growing at a slower rate, with older generations being replaced by new generations of nearly equal size.
A triangular pyramid, wide at the base and narrow at the top, indicates that a population is growing at a faster rate, with older generations producing larger new generations.
Gender is shown on the left and right sides of the chart, age on the y-axis, and the percentage or count of the population on the x-axis.
This R lesson shows you how to make a population pyramid.
Consider the following dataset, which depicts the percentage make-up of a population by age (1 to 100 years) and gender (M = “Male,” F = “Female”).
Let’s make this example reproducible
set.seed(123)
Now we can create a data frame
# 100 ages for each of the two genders: 200 rows in total
df <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
head(df)
  age gender
1   1      M
2   2      M
3   3      M
4   4      M
5   5      M
6   6      M
Now we can add the population variable into the above data frame.
# counts that shrink with age (1/sqrt(age)), scaled by random noise
df$population <- 1/sqrt(df$age) * runif(200, 15000, 20000)
head(df)
  age gender population
1   1      M  18917.779
2   2      M  10876.846
3   3      M  11247.496
4   4      M   7547.882
5   5      M   8638.949
6   6      M   7941.566
Let’s convert the population variable into a percentage of the total population
df$population <- df$population / sum(df$population) * 100
head(df)
  age gender population
1   1      M  0.5803445
2   2      M  0.3336711
3   3      M  0.3450417
4   4      M  0.2315479
5   5      M  0.2650188
6   6      M  0.2436250
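Because the division uses the grand total, the 200 percentages sum to 100 across both genders combined, not to 100 per gender. A quick base-R check, rebuilding the data from the steps above (100 rows per gender); the names total and gender_share are only illustrative:

```r
# Rebuild the example data: 100 ages x 2 genders = 200 rows
set.seed(123)
df <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
df$population <- 1/sqrt(df$age) * runif(200, 15000, 20000)
df$population <- df$population / sum(df$population) * 100

# All rows together sum to 100 (up to floating-point error)
total <- sum(df$population)
print(total)

# Share of the whole population held by each gender; the two values add up to 100
gender_share <- tapply(df$population, df$gender, sum)
print(gender_share)
```

If you want each gender to sum to 100 on its own, divide by the per-gender total instead, but note that the two sides of the pyramid then no longer share a common denominator.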
Using the ggplot2 library, we can make a basic population pyramid for this dataset:
library(ggplot2)
Now let’s create a population pyramid
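# males are plotted as negative values so their bars extend to the left
ggplot(df, aes(x = age, fill = gender,
               y = ifelse(test = gender == "M", yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  # abs() shows the negative tick values as positive; symmetric limits centre the axis
  scale_y_continuous(labels = abs, limits = max(df$population) * c(-1, 1)) +
  coord_flip() +
  ylab("")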
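The whole trick behind the pyramid is the sign flip inside aes(): male values become negative and are drawn to the left of zero, while abs() relabels the axis so both sides read as positive numbers. A minimal variant, assuming you prefer to store the mirrored value as a column (the name signed_pop is our own choice) and use geom_col(), which is ggplot2’s shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

# Rebuild the example data (same steps as above)
set.seed(123)
df <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
df$population <- 1/sqrt(df$age) * runif(200, 15000, 20000)
df$population <- df$population / sum(df$population) * 100

# Store the mirrored value once: males negative (left), females positive (right)
df$signed_pop <- ifelse(df$gender == "M", -df$population, df$population)

p <- ggplot(df, aes(x = age, y = signed_pop, fill = gender)) +
  geom_col() +
  # abs() turns the negative tick values back into positive labels
  scale_y_continuous(labels = abs, limits = max(df$population) * c(-1, 1)) +
  coord_flip() +
  ylab("")
print(p)
```

A precomputed column also gives the value axis a short default title instead of the full ifelse(...) expression.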
Adding Labels and Titles
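Using the labs() function, we can add a title and axis labels to the population pyramid. Because of coord_flip(), the x label (“Age”) is drawn on the vertical axis and the y label on the horizontal axis: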
ggplot(df, aes(x = age, fill = gender,
               y = ifelse(test = gender == "M", yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(df$population) * c(-1, 1)) +
  labs(title = "Age Structure Diagram", x = "Age", y = "Percentage of population") +
  coord_flip()
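Since the values are already percentages, one option is to show a percent sign on the tick labels. The sketch below swaps abs for a small custom label function; pct_label is a hypothetical helper of our own, not part of ggplot2. It avoids scales::percent(), which by default multiplies by 100 and would mislabel values that are already in percent.

```r
library(ggplot2)

# Hypothetical helper: mirror negative ticks back to positive and append "%"
pct_label <- function(x) paste0(abs(x), "%")

print(pct_label(c(-0.5, 0, 0.5)))  # "0.5%" "0%" "0.5%"

# Rebuild the example data (same steps as above)
set.seed(123)
df <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
df$population <- 1/sqrt(df$age) * runif(200, 15000, 20000)
df$population <- df$population / sum(df$population) * 100

p <- ggplot(df, aes(x = age, fill = gender,
                    y = ifelse(gender == "M", -population, population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = pct_label,
                     limits = max(df$population) * c(-1, 1)) +
  labs(title = "Age Structure Diagram", x = "Age",
       y = "Percentage of population") +
  coord_flip()
print(p)
```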
Changing the Colours
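Using the scale_colour_manual() function, we can change the two colours used to represent the genders. Setting aesthetics = c("colour", "fill") makes the palette apply to the fill aesthetic, which is what the bars use here: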
ggplot(df, aes(x = age, fill = gender,
               y = ifelse(test = gender == "M", yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(df$population) * c(-1, 1)) +
  labs(title = "Age Structure Diagram", x = "Age", y = "Percentage of population") +
  scale_colour_manual(values = c("red", "green"), aesthetics = c("colour", "fill")) +
  coord_flip()
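With unnamed values, colours are assigned in the order of the factor levels. gender is a character column, so its levels sort alphabetically (F before M): "red" goes to F and "green" to M. A sketch that makes this mapping explicit with a named vector (gender_colours is our own name) and scale_fill_manual(), since only fill is mapped:

```r
library(ggplot2)

# Rebuild the example data (same steps as above)
set.seed(123)
df <- data.frame(age = rep(1:100, 2), gender = rep(c("M", "F"), each = 100))
df$population <- 1/sqrt(df$age) * runif(200, 15000, 20000)
df$population <- df$population / sum(df$population) * 100

# Names tie each colour to a gender code, independent of level order
gender_colours <- c(F = "red", M = "green")

p <- ggplot(df, aes(x = age, fill = gender,
                    y = ifelse(gender == "M", -population, population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(df$population) * c(-1, 1)) +
  scale_fill_manual(values = gender_colours) +
  labs(title = "Age Structure Diagram", x = "Age",
       y = "Percentage of population") +
  coord_flip()
print(p)
```

Named values keep the colours stable even if the data later gains a different level order or a third category.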
Pyramids of Multiple Populations
The facet_wrap() function can also be used to plot multiple population pyramids side by side. Assume we have demographic data for three countries: A, B, and C.
The code below simulates that data and then draws a separate population pyramid for each country:
set.seed(123)
data <- data.frame(age = rep(1:100, 6),
                   gender = rep(c("M", "F"), each = 300),
                   country = rep(c("A", "B", "C"), each = 100, times = 2))
# one random draw per row (600 draws), so no two countries reuse the same values
data$population <- round(1/sqrt(data$age) * runif(nrow(data), 15000, 20000), 0)
head(data)
  age gender country population
1   1      M       A      16929
2   2      M       A      11190
3   3      M       A      11161
4   4      M       A       8049
5   5      M       A       6982
6   6      M       A       7122
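Before plotting, it is worth confirming that every country/gender pair covers ages 1 to 100 exactly once. This base-R sketch draws one random value per row with runif(nrow(data), ...); with runif(200, ...) the 200 draws would be recycled over the 600 rows, and countries A and C would end up with identical pyramids. The names counts and totals are illustrative.

```r
set.seed(123)
data <- data.frame(age = rep(1:100, 6),
                   gender = rep(c("M", "F"), each = 300),
                   country = rep(c("A", "B", "C"), each = 100, times = 2))
data$population <- round(1/sqrt(data$age) * runif(nrow(data), 15000, 20000), 0)

# Rows per country/gender pair: each cell should be 100
counts <- table(data$country, data$gender)
print(counts)

# Total population per country and gender (6 rows)
totals <- aggregate(population ~ country + gender, data = data, FUN = sum)
print(totals)
```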
Now we can create one population pyramid per country
ggplot(data, aes(x = age, fill = gender,
                 y = ifelse(test = gender == "M", yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1, 1)) +
  labs(y = "Population Amount") +
  coord_flip() +
  facet_wrap(~ country) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
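These panels show raw counts on one shared axis, so a much larger country would dwarf the others. If the countries differed greatly in size, one option, mirroring the earlier percentage step, is to divide by each country’s own total so every panel sums to 100. A sketch using base R’s ave() for the per-country totals (the column name pct is our own):

```r
library(ggplot2)

set.seed(123)
data <- data.frame(age = rep(1:100, 6),
                   gender = rep(c("M", "F"), each = 300),
                   country = rep(c("A", "B", "C"), each = 100, times = 2))
data$population <- round(1/sqrt(data$age) * runif(nrow(data), 15000, 20000), 0)

# Each row's country total, then the row as a percentage of that total
country_total <- ave(data$population, data$country, FUN = sum)
data$pct <- data$population / country_total * 100

p <- ggplot(data, aes(x = age, fill = gender,
                      y = ifelse(gender == "M", -pct, pct))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$pct) * c(-1, 1)) +
  labs(y = "Percentage of country population") +
  coord_flip() +
  facet_wrap(~ country)
print(p)
```

The pyramids then compare shape (age and gender mix) rather than absolute size.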
Changing the Theme
Finally, we can change the chart’s theme. The following code, for example, uses theme_classic() to give the charts a more minimalist look.
ggplot(data, aes(x = age, fill = gender,
                 y = ifelse(test = gender == "M", yes = -population, no = population))) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = abs, limits = max(data$population) * c(-1, 1)) +
  labs(y = "Population Amount") +
  coord_flip() +
  facet_wrap(~ country) +
  # a complete theme like theme_classic() must come before theme() tweaks,
  # otherwise it would overwrite them
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_colour_manual(values = c("red", "green"), aesthetics = c("colour", "fill"))
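Since every chart in this lesson repeats the same sign flip, symmetric limits and coord_flip(), it can be convenient to wrap them in a function. plot_pyramid() below is a hypothetical helper of our own, not part of ggplot2; it assumes a data frame with age and gender columns (gender coded "M"/"F") plus a numeric value column, and optionally facets by a column name.

```r
library(ggplot2)

# Hypothetical helper: draws a population pyramid from columns
# age, gender ("M"/"F") and a numeric value column
plot_pyramid <- function(d, value = "population", facet = NULL,
                         colours = c(F = "red", M = "green")) {
  # males negative (left side), females positive (right side)
  d$signed <- ifelse(d$gender == "M", -d[[value]], d[[value]])
  p <- ggplot(d, aes(x = age, y = signed, fill = gender)) +
    geom_col() +
    scale_y_continuous(labels = abs, limits = max(d[[value]]) * c(-1, 1)) +
    scale_fill_manual(values = colours) +
    labs(x = "Age", y = value) +
    coord_flip() +
    theme_classic()
  if (!is.null(facet)) {
    # reformulate("country") builds the formula ~country
    p <- p + facet_wrap(reformulate(facet)) +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))
  }
  p
}

# Usage with the three-country data
set.seed(123)
data <- data.frame(age = rep(1:100, 6),
                   gender = rep(c("M", "F"), each = 300),
                   country = rep(c("A", "B", "C"), each = 100, times = 2))
data$population <- round(1/sqrt(data$age) * runif(nrow(data), 15000, 20000), 0)
print(plot_pyramid(data, facet = "country"))
```

Because the function returns the ggplot object, you can still add layers afterwards, for example plot_pyramid(data) + labs(title = "All countries").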