R Summary Statistics Table
The describe() and describeBy() functions from the psych package are a simple way to produce summary tables in R.
library(psych)
The basic syntax for a summary table is:
describe(df)
To create a summary table grouped by a specific variable:
describeBy(df, group=df$varname)
The following examples show how to use these functions in practice.
Example 1: Create a Basic Summary Table
Let’s use the built-in iris dataset for illustration purposes.
df <- iris
Now we can view the first six rows of the data frame:
head(df)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
To construct a summary table for each variable in the data frame, we may use the describe() function.
library(psych)
describe(df)
             vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
Sepal.Length    1 150 5.84 0.83   5.80    5.81 1.04 4.3 7.9   3.6  0.31    -0.61 0.07
Sepal.Width     2 150 3.06 0.44   3.00    3.04 0.44 2.0 4.4   2.4  0.31     0.14 0.04
Petal.Length    3 150 3.76 1.77   4.35    3.76 1.85 1.0 6.9   5.9 -0.27    -1.42 0.14
Petal.Width     4 150 1.20 0.76   1.30    1.18 1.04 0.1 2.5   2.4 -0.10    -1.36 0.06
Species*        5 150 2.00 0.82   2.00    2.00 1.48 1.0 3.0   2.0  0.00    -1.52 0.07
Here is how to interpret each value in the output:
vars: The column number (index) of the variable
n: The number of valid (non-missing) cases
mean: The mean value
sd: The standard deviation
median: The median value
trimmed: The trimmed mean (default trims 10% of observations from each end)
mad: The median absolute deviation (from the median)
min: The minimum value
max: The maximum value
range: The range of values (max – min)
skew: The skewness
kurtosis: The kurtosis
se: The standard error
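As a rough cross-check of these definitions, the following base R sketch recomputes several of the statistics for Sepal.Length by hand. It assumes the defaults described above (a 10% trimmed mean) and uses base R's mad() with its default scaling constant; the variable names are purely illustrative.

```r
# Recompute a few describe() statistics for one column using base R only
x <- iris$Sepal.Length

n       <- sum(!is.na(x))        # number of valid (non-missing) cases
avg     <- mean(x)               # mean
sdev    <- sd(x)                 # standard deviation
med     <- median(x)             # median
trimmed <- mean(x, trim = 0.1)   # mean after trimming 10% from each end
mad_x   <- mad(x)                # median absolute deviation (scaled by 1.4826)
rng     <- max(x) - min(x)       # range
se      <- sdev / sqrt(n)        # standard error of the mean

# Collect the results and round to two decimals, as in the summary table
stats <- round(c(n = n, mean = avg, sd = sdev, median = med,
                 trimmed = trimmed, mad = mad_x, range = rng, se = se), 2)
print(stats)
```

The rounded values should line up with the Sepal.Length row of the describe() output shown earlier.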
Any variable marked with an asterisk (*) is a categorical or logical variable that has been converted to a numerical variable whose values represent the ordering of its levels.
Because the variable ‘Species’ (shown as Species* in the output) has been converted to a numerical variable in our example, the summary statistics for it should not be taken literally.
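To see why those numbers are not meaningful, here is a small base R sketch of the kind of conversion involved. It assumes the factor is summarised through its integer level codes, which is consistent with the output above; species_codes is an illustrative name.

```r
# Factors are stored as integer codes that follow the order of their levels
species_codes <- as.numeric(iris$Species)

print(levels(iris$Species))   # the level order that defines the codes
print(table(species_codes))   # how many observations carry each code
print(mean(species_codes))    # mean of the codes, not a meaningful average
```

The mean of the codes is 2, the same value reported in the Species* row, which simply reflects three equally sized groups coded 1, 2 and 3.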
The argument fast=TRUE can also be used to calculate only the most common summary statistics.
# reduce the size of the summary table
describe(df, fast=TRUE)
             vars   n mean   sd min  max range   se
Sepal.Length    1 150 5.84 0.83 4.3  7.9   3.6 0.07
Sepal.Width     2 150 3.06 0.44 2.0  4.4   2.4 0.04
Petal.Length    3 150 3.76 1.77 1.0  6.9   5.9 0.14
Petal.Width     4 150 1.20 0.76 0.1  2.5   2.4 0.06
Species         5 150  NaN   NA Inf -Inf  -Inf   NA
We can also compute summary statistics for only a subset of the data frame’s variables:
# include only the 'Petal.Length' and 'Sepal.Length' columns in the summary table
describe(df[ , c('Petal.Length', 'Sepal.Length')], fast=TRUE)
             vars   n mean   sd min max range   se
Petal.Length    1 150 3.76 1.77 1.0 6.9   5.9 0.14
Sepal.Length    2 150 5.84 0.83 4.3 7.9   3.6 0.07
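Instead of naming columns one by one, the numeric columns can be picked programmatically, which also avoids the NaN row produced for Species with fast=TRUE. This is a minimal sketch; num_cols and df_num are illustrative names, and the describe() call only runs if psych is installed.

```r
# Keep only the numeric columns before summarising (base R selection)
df <- iris
num_cols <- vapply(df, is.numeric, logical(1))   # TRUE for each numeric column
df_num <- df[, num_cols, drop = FALSE]
print(names(df_num))

# Summarise the numeric columns, provided the psych package is available
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(df_num, fast = TRUE))
}
```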
Example 2: Create a Summary Table Grouped by a Variable
The following code demonstrates how to group the data frame by the ‘Species’ variable and use the describeBy() function to build a summary table.
# create a summary table grouped by the 'Species' variable
describeBy(df[,-5], group=df$Species, fast=TRUE)
Descriptive statistics by group
group: setosa
             vars  n mean   sd min max range   se
Sepal.Length    1 50 5.01 0.35 4.3 5.8   1.5 0.05
Sepal.Width     2 50 3.43 0.38 2.3 4.4   2.1 0.05
Petal.Length    3 50 1.46 0.17 1.0 1.9   0.9 0.02
Petal.Width     4 50 0.25 0.11 0.1 0.6   0.5 0.01
---------------------------------------------------------------------------
group: versicolor
             vars  n mean   sd min max range   se
Sepal.Length    1 50 5.94 0.52 4.9 7.0   2.1 0.07
Sepal.Width     2 50 2.77 0.31 2.0 3.4   1.4 0.04
Petal.Length    3 50 4.26 0.47 3.0 5.1   2.1 0.07
Petal.Width     4 50 1.33 0.20 1.0 1.8   0.8 0.03
---------------------------------------------------------------------------
group: virginica
             vars  n mean   sd min max range   se
Sepal.Length    1 50 6.59 0.64 4.9 7.9   3.0 0.09
Sepal.Width     2 50 2.97 0.32 2.2 3.8   1.6 0.05
Petal.Length    3 50 5.55 0.55 4.5 6.9   2.4 0.08
Petal.Width     4 50 2.03 0.27 1.4 2.5   1.1 0.04
The summary statistics for each of the three Species in the data frame are displayed in the output.
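As a quick sanity check on the grouped output, the group means and standard deviations can be reproduced with base R's aggregate(). This is a cross-check rather than a replacement for describeBy(); group_means and group_sds are illustrative names.

```r
# Per-species means and standard deviations using base R only
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)
group_sds   <- aggregate(. ~ Species, data = iris, FUN = sd)

# Round the numeric columns to two decimals for comparison with the table
group_means[, -1] <- round(group_means[, -1], 2)
group_sds[, -1]   <- round(group_sds[, -1], 2)

print(group_means)
print(group_sds)
```

The rounded values should agree with the mean and sd columns reported for each group above.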