R Summary Statistics Table

The describe() and describeBy() functions from the psych package are among the simplest ways to produce summary tables in R.

library(psych)

The basic syntax for a summary table of every variable in a data frame is:


describe(df)

To create a summary table grouped by a specific variable, use:

describeBy(df, group=df$varname)


The following examples show how to use these functions in practice.

Example 1: Create a Basic Summary Table

For illustration, we will use the built-in iris dataset as our data frame.

df <- iris

Now we can view the first six rows of the data frame:

head(df)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

To construct a summary table for each variable in the data frame, we may use the describe() function.

library(psych)
describe(df)
             vars   n mean   sd median trimmed  mad min max range  skew kurtosis   se
Sepal.Length    1 150 5.84 0.83   5.80    5.81 1.04 4.3 7.9   3.6  0.31    -0.61 0.07
Sepal.Width     2 150 3.06 0.44   3.00    3.04 0.44 2.0 4.4   2.4  0.31     0.14 0.04
Petal.Length    3 150 3.76 1.77   4.35    3.76 1.85 1.0 6.9   5.9 -0.27    -1.42 0.14
Petal.Width     4 150 1.20 0.76   1.30    1.18 1.04 0.1 2.5   2.4 -0.10    -1.36 0.06
Species*        5 150 2.00 0.82   2.00    2.00 1.48 1.0 3.0   2.0  0.00    -1.52 0.07

Each column in the output is interpreted as follows.


vars: The column number of the variable

n: The number of valid (non-missing) cases

mean: The mean value

sd: The standard deviation

median: The median value

trimmed: The trimmed mean (default trims 10% of observations from each end)

mad: The median absolute deviation (from the median)

min: The minimum value

max: The maximum value

range: The range of values (max - min)

skew: The skewness

kurtosis: The kurtosis

se: The standard error of the mean
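As a rough cross-check of these definitions, several of the columns can be reproduced for a single variable with base R. This is only a sketch: the variable names below are illustrative, and it assumes that describe() relies on the same defaults as base R (in particular the scaled mad() and a 10% trim), which is consistent with the Sepal.Length row shown above.

```r
# Reproduce several describe() columns for one variable using base R only.
x <- iris$Sepal.Length

n       <- sum(!is.na(x))        # n: number of non-missing cases
avg     <- mean(x)               # mean
std_dev <- sd(x)                 # sd: sample standard deviation
med     <- median(x)             # median
trimmed <- mean(x, trim = 0.1)   # trimmed: drop 10% of values from each end
mad_val <- mad(x)                # mad: median absolute deviation (base R default scaling)
rng     <- max(x) - min(x)       # range: max - min
se      <- std_dev / sqrt(n)     # se: standard error of the mean

# Round to two decimals to compare with the describe() output
round(c(n = n, mean = avg, sd = std_dev, median = med, trimmed = trimmed,
        mad = mad_val, range = rng, se = se), 2)
```

The rounded values should line up with the corresponding entries in the Sepal.Length row of the summary table.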

Any variable marked with an asterisk (*) is a categorical or logical variable that has been converted to a numeric variable whose values represent the ordering of its levels.

Because the variable ‘Species’ has been converted to a numeric variable in our example, the summary statistics for it should not be taken literally.
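To see why these statistics are not meaningful, the conversion can be imitated in base R. The sketch below assumes the conversion is equivalent to as.numeric() on the factor, which matches the Species* row above; species_codes is just an illustrative name.

```r
# A factor is stored as integer codes that follow the order of its levels.
levels(iris$Species)                      # the three level labels, in order

# Convert the factor to its underlying numeric codes (1, 2, 3)
species_codes <- as.numeric(iris$Species)
table(species_codes)                      # how many rows carry each code

# "Summary statistics" of the codes describe the coding, not the flowers
mean(species_codes)
sd(species_codes)
```

The mean and standard deviation here only reflect how many rows fall into each level and the arbitrary 1, 2, 3 coding.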


Also, the parameter fast=TRUE can be used to only calculate the most common summary statistics.

# reduce the size of the summary table

describe(df, fast=TRUE)
             vars   n mean   sd min  max range   se
Sepal.Length    1 150 5.84 0.83 4.3  7.9   3.6 0.07
Sepal.Width     2 150 3.06 0.44 2.0  4.4   2.4 0.04
Petal.Length    3 150 3.76 1.77 1.0  6.9   5.9 0.14
Petal.Width     4 150 1.20 0.76 0.1  2.5   2.4 0.06
Species         5 150  NaN   NA Inf -Inf  -Inf   NA

We can also compute summary statistics for only a subset of the data frame’s variables:

# only include the ‘Petal.Length’ and ‘Sepal.Length’ columns in the summary table

describe(df[ , c('Petal.Length', 'Sepal.Length')], fast=TRUE)
             vars   n mean   sd min max range   se
Petal.Length    1 150 3.76 1.77 1.0 6.9   5.9 0.14
Sepal.Length    2 150 5.84 0.83 4.3 7.9   3.6 0.07
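Rather than typing column names by hand, the numeric columns can be picked out programmatically, which also avoids the uninformative Species row seen with fast=TRUE. This is a minimal sketch: numeric_cols and df_numeric are illustrative names, and the describe() call only runs if the psych package is installed.

```r
df <- iris

# Find the names of all numeric columns in the data frame
numeric_cols <- names(df)[sapply(df, is.numeric)]

# Keep only those columns
df_numeric <- df[, numeric_cols]

# Summarise the numeric columns (skipped if psych is not installed)
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(df_numeric, fast = TRUE))
}
```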

Example 2: Create a Summary Table Grouped by a Specific Variable

The following code demonstrates how to group the data frame by the ‘Species’ variable and use the describeBy() function to build a summary table.


# create a summary table grouped by the ‘Species’ variable

describeBy(df[,-5], group=df$Species, fast=TRUE)

Descriptive statistics by group

group: setosa
             vars  n mean   sd min  max range   se
Sepal.Length    1 50 5.01 0.35 4.3  5.8   1.5 0.05
Sepal.Width     2 50 3.43 0.38 2.3  4.4   2.1 0.05
Petal.Length    3 50 1.46 0.17 1.0  1.9   0.9 0.02
Petal.Width     4 50 0.25 0.11 0.1  0.6   0.5 0.01
---------------------------------------------------------------------------

group: versicolor
             vars  n mean   sd min  max range   se
Sepal.Length    1 50 5.94 0.52 4.9  7.0   2.1 0.07
Sepal.Width     2 50 2.77 0.31 2.0  3.4   1.4 0.04
Petal.Length    3 50 4.26 0.47 3.0  5.1   2.1 0.07
Petal.Width     4 50 1.33 0.20 1.0  1.8   0.8 0.03
---------------------------------------------------------------------------

group: virginica
             vars  n mean   sd min  max range   se
Sepal.Length    1 50 6.59 0.64 4.9  7.9   3.0 0.09
Sepal.Width     2 50 2.97 0.32 2.2  3.8   1.6 0.05
Petal.Length    3 50 5.55 0.55 4.5  6.9   2.4 0.08
Petal.Width     4 50 2.03 0.27 1.4  2.5   1.1 0.04

The summary statistics for each of the three Species in the data frame are displayed in the output.
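The grouped means can be sanity-checked without psych. The sketch below uses base R's aggregate() as an independent cross-check, not as a replacement for describeBy(); group_means is an illustrative name.

```r
# Compute the mean of every numeric column within each Species group
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Round to two decimals to compare with the 'mean' column of each group
group_means[, -1] <- round(group_means[, -1], 2)
group_means
```

Each row of the result should agree with the mean column of the matching group in the describeBy() output.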
