How to Plot Categorical Data in R: Quick Guide
Categorical variables are variables whose values fall into a limited set of categories. Race, sex, age group, and educational level are examples of categorical variables.
This tutorial describes three approaches to plotting categorical data in R: bar charts, boxplots by group, and mosaic plots.
Approach 1: Bar Chart
The code below demonstrates how to make a bar chart that shows how often each team appears in a given data frame. Let’s load the ggplot2 library first:
library(ggplot2)
#create the data frame
data <- data.frame(result = c('W', 'L', 'L', 'W', 'W', 'L', 'L', 'L', 'W', 'L'),
                   team = c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D'),
                   score = c(18, 38, 29, 28, 32, 55, 22, 48, 33, 12),
                   rebounds = c(15, 5, 9, 10, 15, 8, 9, 12, 11, 10))
#view the first six rows of the data frame
head(data)
  result team score rebounds
1      W    B    18       15
2      L    D    38        5
3      L    B    29        9
4      W    A    28       10
5      W    D    32       15
6      L    A    55        8
#create a bar chart of teams, ordered from most to least frequent
ggplot(data, aes(x=reorder(team, team, function(x)-length(x)))) + geom_bar(fill='red') + labs(x='Team')
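The bar heights in this chart are simply the number of rows per team, so it can help to check them numerically. The following is a minimal base R sketch, assuming the same team column as in the data frame above; the name team_counts is introduced here only for illustration.

```r
# Recreate just the team column from the tutorial's data frame
data <- data.frame(team = c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D'))

# Count the rows per team and sort from most to least frequent,
# matching the bar order produced by reorder() in the chart
team_counts <- sort(table(data$team), decreasing = TRUE)
print(team_counts)
```

The sorted counts should line up with the bars from left to right.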
Approach 2: Boxplots by Group
Grouped boxplots are a useful way to show a numeric variable that is grouped by a categorical variable.
The following code shows how to make boxplots that display the distribution of points scored per team:
#create boxplots of points, grouped by team
ggplot(data, aes(x=team, y=score)) + geom_boxplot(fill='green')
The teams are represented on the x-axis, while the distribution of points scored by each team is represented on the y-axis.
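The thick line inside each box marks the median score of that team. As a rough cross-check, the medians can be computed directly in base R. This sketch assumes the same team and score columns as above; team_medians is a name chosen here for illustration.

```r
# Recreate the two columns used by the boxplot
data <- data.frame(team = c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D'),
                   score = c(18, 38, 29, 28, 32, 55, 22, 48, 33, 12))

# Median score per team; each value corresponds to the center line of one box
team_medians <- tapply(data$score, data$team, median)
print(team_medians)
```

Teams with very few rows, such as team C with a single game, produce a box that collapses to a single line, so the plot should be read with the group sizes in mind.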
Approach 3: Mosaic Plot
A mosaic plot is a type of graph that shows the frequencies of two categorical variables in a single figure.
The following code demonstrates how to make a mosaic plot that displays the frequency of the categorical variables “result” and “team” in one figure:
#create a table of counts
counts <- table(data$result, data$team)
#create a mosaic plot
mosaicplot(counts, xlab='Match Result', ylab='Team',main='Wins by Team', col='orange')
The game outcome is displayed on the x-axis, while the four separate teams are displayed on the y-axis.
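To see the numbers behind the tiles, it may help to print the contingency table and its row proportions. This is a minimal sketch in base R, assuming the same result and team columns as above; result_shares is a name introduced here for illustration.

```r
# Recreate the two categorical columns used by the mosaic plot
data <- data.frame(result = c('W', 'L', 'L', 'W', 'W', 'L', 'L', 'L', 'W', 'L'),
                   team = c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D'))

# Cross-tabulate result (rows) against team (columns)
counts <- table(data$result, data$team)
print(counts)

# Share of each team within each result; every row sums to 1
result_shares <- prop.table(counts, margin = 1)
print(round(result_shares, 2))
```

The width of each column in the mosaic plot reflects how many games had that result, and the tile heights within a column follow the row proportions computed here.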