How to create a Sankey plot in R?

How to create a Sankey plot in R?, You must install the ggsankey library and modify your dataset using the package’s make_long function in order to produce a Sankey diagram in ggplot2.

The data’s columns must correspond to the stages x (current stage), next_x (next stage), node (current node), and next_node (the following node).

Keep in mind that the final stage should indicate a NA.

A Side-by-Side Boxplot in R: How to Do It – Data Science Tutorials

Let’s install the remotes packages first,

install.packages("remotes")

Now we can install ggsankey package

remotes::install_github("davidsjoberg/ggsankey")
library(ggsankey)

Load Data

We can make use of mtcars data sets in R.

df <- mtcars %>%
  make_long(cyl, vs, am, gear, carb)
df
    x   node next_x next_node
1  cyl    6     vs         0
2   vs    0     am         1
3   am    1   gear         4
4 gear    4   carb         4
5 carb    4   <NA>        NA
6  cyl    6     vs         0

How to Create an Interaction Plot in R? – Data Science Tutorials

Sankey plot with ggsankey

To construct Sankey diagrams in ggplot2, the ggsankey package includes a geom called geom_sankey.

Keep in mind that you must give a factor as the fill colour when passing the variables to aes. The theme theme_sankey is also present in the function.

Let’s load ggplot2 for graph generation

library(ggplot2)
library(dplyr)
ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node))) +
  geom_sankey() +
  theme_sankey(base_size = 16)

How to add labels in Sankey Plot

The package’s geom_sankey_label function lets you add labels to Sankey diagrams.

Remember to give the variable you want to display as the label inside the aes.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey() +
  geom_sankey_label() +
  theme_sankey(base_size = 16)

How to Add Superscripts and Subscripts to Plots in R? (datasciencetut.com)

How to do Color customization in Sankey Plot

To alter how the Sankey diagram appears in R, a variety of arguments can be changed. The author of the program produced the following pictures as examples.

geom_sankey aesthetics
geom_sankey geometries
Color and fill of the Sankey plot

For instance, by adjusting the fill color palette and a few of the inputs to the geom_sankey_function, we can produce something like this.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d(option = "A", alpha = 0.95) +
  theme_sankey(base_size = 16)

How to Label Outliers in Boxplots in ggplot2? (datasciencetut.com)

Changing the title of the legend

Changes to the legend’s title are available, just like with other ggplot2 charts. Here are several options for action.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 16) +
  guides(fill = guide_legend(title = "Title"))

How to Add a caption to ggplot2 Plots in R? (datasciencetut.com)

Removing the legend

Finally, you can adjust the Sankey plot legend’s position to “none” if you want to remove it.

ggplot(df, aes(x = x,
               next_x = next_x,
               node = node,
               next_node = next_node,
               fill = factor(node),
               label = node)) +
  geom_sankey(flow.alpha = 0.5, node.color = 1) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 16) +
  theme(legend.position = "none")

Changing the Font Size in Base R Plots – Data Science Tutorials

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *

three × 2 =