How to Plot Categorical Data in R: Quick Guide

Categorical variables are variables whose values fall into a limited set of categories. Race, sex, age group, and educational level are examples of categorical variables.

This tutorial describes three ways to plot categorical data in R.

We will use bar charts, boxplots by group, and mosaic plots.

Approach 1: Bar Chart

The code below demonstrates how to make a bar chart that shows how often each team appears in a data frame. Let’s load the ggplot2 library first.

library(ggplot2)

Create the dataset

data <- data.frame(result = c('W', 'L', 'L', 'W', 'W', 'L', 'L', 'L', 'W', 'L'),
                   team = c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D'),
                   score = c(18, 38, 29, 28, 32, 55, 22, 48, 33, 12),
                   rebounds = c(15, 5, 9, 10, 15, 8, 9, 12, 11, 10))
head(data)
    result team score rebounds
1      W    B    18       15
2      L    D    38        5
3      L    B    29        9
4      W    A    28       10
5      W    D    32       15
6      L    A    55        8

Create a bar chart of teams, ordered from most to least frequent

ggplot(data, aes(x = reorder(team, team, function(x) -length(x)))) +
  geom_bar(fill = 'red') +
  labs(x = 'Team')
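
To check the bar heights and their order without drawing the plot, the same counts can be computed directly. This is a minimal sketch in base R; the variable names `team`, `team_counts`, and `team_ordered` are illustrative and not part of the original code.

```r
# Team column copied from the example data frame
team <- c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D')

# Count how often each team appears, sorted from most to least frequent
team_counts <- sort(table(team), decreasing = TRUE)
print(team_counts)

# reorder() sorts the factor levels by the value computed for each group;
# negating the group size puts the most frequent team first
team_ordered <- reorder(team, team, function(x) -length(x))
print(levels(team_ordered))
```

The level order printed at the end is the left-to-right order of the bars in the chart.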

Approach 2: Boxplots by Group

Grouped boxplots are a useful way to show how a numeric variable is distributed within each level of a categorical variable.

The following code shows how to make boxplots that display the distribution of points scored by each team.

Create boxplots of points, grouped by team

ggplot(data, aes(x = team, y = score)) +
  geom_boxplot(fill = 'green')

The teams are represented on the x-axis, while the distribution of points scored by each team is represented on the y-axis.
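
As a numeric companion to the boxplots, the per-team medians and group sizes can be computed in base R. This is a sketch only; `team_medians` and `team_sizes` are illustrative names, and the two vectors are copied from the example data frame so that the block runs on its own.

```r
# Team labels and points scored, copied from the example data frame
team  <- c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D')
score <- c(18, 38, 29, 28, 32, 55, 22, 48, 33, 12)

# Median score per team: the thick line inside each box
team_medians <- tapply(score, team, median)
print(team_medians)

# Number of observations behind each box
team_sizes <- table(team)
print(team_sizes)
```

Team C has a single observation, so its box collapses to a line; checking the group sizes helps avoid reading too much into boxes built from very few points.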

Approach 3: Mosaic Plot

A mosaic plot shows the joint frequencies of two categorical variables in a single figure, with the area of each tile proportional to the count for that combination of categories.

The following code demonstrates how to make a mosaic plot that displays the frequencies of the categorical variables “result” and “team” in one figure.

Create the counts

counts <- table(data$result, data$team)

Create a mosaic plot

mosaicplot(counts, xlab='Match Result', ylab='Team', main='Results by Team', col='orange')

The game outcome is displayed on the x-axis, while the four separate teams are displayed on the y-axis.
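
The numbers behind the tiles can be inspected directly. The sketch below rebuilds the contingency table in base R and converts it to row proportions; `row_shares` is an illustrative name, and the two vectors are copied from the example data frame so that the block is self-contained.

```r
# Result and team columns, copied from the example data frame
result <- c('W', 'L', 'L', 'W', 'W', 'L', 'L', 'L', 'W', 'L')
team   <- c('B', 'D', 'B', 'A', 'D', 'A', 'A', 'D', 'C', 'D')

# Contingency table: rows are results, columns are teams
counts <- table(result, team)
print(counts)

# Share of each team within each result (every row sums to 1)
row_shares <- prop.table(counts, margin = 1)
print(round(row_shares, 2))
```

In the mosaic plot, the width of each column reflects the row totals of this table (six losses and four wins), and the height of each tile within a column reflects the row shares.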
