How to Create Summary Tables in R
The describe() and describeBy() functions from the psych package are among the simplest ways to create summary tables in R.
library(psych)
Let’s create a summary table
describe(df)
We can now create a summary table that is organized by a certain variable.
describeBy(df, group=df$var_name)
The practical application of these features is demonstrated in the examples that follow.
Example 1: Create a simple summary table
Let’s say we have the R data frame shown below:
# make a data frame
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))
Now we can view the data frame
df
  team points rebounds steals
1   P1    150       17     11
2   P1    222       28    151
3   P1    229       36    152
4   P2    421       16     73
5   P2    330       17     85
6   P2    211       29     79
7   P1    219       15     58
For each variable in the data frame, a summary table can be made using the describe() function.
library(psych)
Now we will create a summary table
describe(df)
         vars n   mean    sd median trimmed   mad min max range skew kurtosis    se
team*       1 7   1.43  0.53      1    1.43  0.00   1   2     1 0.23    -2.20  0.20
points      2 7 254.57 90.56    222  254.57 16.31 150 421   271 0.71    -1.03 34.23
rebounds    3 7  22.57  8.30     17   22.57  2.97  15  36    21 0.44    -1.73  3.14
steals      4 7  87.00 50.34     79   87.00 31.13  11 152   141 0.08    -1.47 19.03
Here’s how to interpret each value in the output:
vars: The column number
n: The number of valid cases
mean: The mean value
sd: The standard deviation
median: The median value
trimmed: The trimmed mean (default trims 10% of observations from each end)
mad: The median absolute deviation (from the median)
min: The minimum value
max: The maximum value
range: The range of values (max – min)
skew: The skewness
kurtosis: The kurtosis
se: The standard error
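As a cross-check on these definitions, most of the columns can be reproduced with base R alone. The sketch below does this for the points column of the example data. The names points and stats are illustrative only, and skew and kurtosis are left out because their exact formula depends on the type argument of describe().

```r
# points column from the example data frame
points <- c(150, 222, 229, 421, 330, 211, 219)
n <- length(points)

stats <- c(
  n       = n,
  mean    = mean(points),
  sd      = sd(points),
  median  = median(points),
  trimmed = mean(points, trim = 0.1),  # trims 10% of observations from each end
  mad     = mad(points),               # base R scales by 1.4826 by default
  min     = min(points),
  max     = max(points),
  range   = max(points) - min(points),
  se      = sd(points) / sqrt(n)       # standard error of the mean
)

round(stats, 2)
```

With only seven observations, a 10% trim removes nothing from either end, which is why the trimmed mean in the table equals the ordinary mean.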
Any variable with an asterisk (*) next to its name was categorical or logical and has been converted to a numerical variable whose values represent the ordering of its categories.
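To see where the numbers in the team* row come from, the conversion can be imitated in base R. This is only a sketch: it assumes the conversion is equivalent to coding sorted factor levels as integers, and the internals of psych may differ in detail. The name team_codes is illustrative.

```r
team <- c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1')

# Convert the labels to integer codes: levels are sorted, so P1 -> 1 and P2 -> 2
team_codes <- as.numeric(factor(team))
team_codes

# These are statistics of the codes, not of anything meaningful about the teams
round(mean(team_codes), 2)
round(sd(team_codes), 2)
```

The mean and standard deviation of these codes match the 1.43 and 0.53 reported for team* in the table.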
We should not interpret the summary statistics for the variable “team”, since its numeric values are only codes for the categories.
Also note that setting fast=TRUE computes only the most common summary statistics (n, mean, sd, min, max, range, and se).
Now we can create a smaller summary table
describe(df, fast=TRUE)
         vars n   mean    sd min  max range    se
team        1 7    NaN    NA Inf -Inf  -Inf    NA
points      2 7 254.57 90.56 150  421   271 34.23
rebounds    3 7  22.57  8.30  15   36    21  3.14
steals      4 7  87.00 50.34  11  152   141 19.03
Additionally, we have the option of only computing the summary statistics for a subset of the data frame’s variables:
# make a summary table using only the columns "points" and "rebounds"
describe(df[ , c('points', 'rebounds')], fast=TRUE)
         vars n   mean    sd min max range    se
points      1 7 254.57 90.56 150 421   271 34.23
rebounds    2 7  22.57  8.30  15  36    21  3.14
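Instead of typing column names by hand, the numeric columns can also be picked programmatically, which keeps non-numeric columns such as team out of the table. The following is a minimal sketch. The name numeric_df is illustrative, and the call to describe() is guarded so that the snippet still runs when psych is not installed.

```r
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# Keep only the numeric columns; drop = FALSE guarantees a data frame is returned
numeric_df <- df[ , sapply(df, is.numeric), drop = FALSE]
names(numeric_df)

# Pass the numeric subset to describe() if psych is available
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(numeric_df, fast = TRUE))
}
```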
Example 2: Create a summary table grouped by a specific variable
The describeBy() function can be used to group the data frame’s summary table by the variable “team” using the following code.
# build the summary table, grouped by team
describeBy(df, group=df$team, fast=TRUE)
 Descriptive statistics by group
group: P1
         vars n mean    sd min  max range    se
team        1 4  NaN    NA Inf -Inf  -Inf    NA
points      2 4  205 36.91 150  229    79 18.45
rebounds    3 4   24  9.83  15   36    21  4.92
steals      4 4   93 70.22  11  152   141 35.11
-------------------------------------------------------------
group: P2
         vars n   mean     sd min  max range    se
team        1 3    NaN     NA Inf -Inf  -Inf    NA
points      2 3 320.67 105.31 211  421   210 60.80
rebounds    3 3  20.67   7.23  16   29    13  4.18
steals      4 3  79.00   6.00  73   85    12  3.46
The summary statistics for each of the two teams in the data frame (P1 and P2) are displayed in the output.
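The grouped means can be cross-checked without psych by using base R's aggregate(). This sketch is a sanity check, not a replacement for describeBy(), and the name group_means is illustrative.

```r
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# Mean of each numeric column within each team
group_means <- aggregate(cbind(points, rebounds, steals) ~ team,
                         data = df, FUN = mean)
group_means
```

After rounding to two decimals, the values agree with the mean column of each group in the describeBy() output.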