Descriptive Statistics in R: A Step-by-Step Guide

Descriptive statistics are a crucial part of data analysis, as they provide a snapshot of the central tendency and variability of a dataset.

In R, two base functions are commonly used to calculate descriptive statistics: summary() and sapply().

In this article, we will explore how to use these functions to gain a deeper understanding of our data.

Method 1: Using the summary() Function

The summary() function is a simple and efficient way to calculate various descriptive statistics for each variable in a data frame. To use this function, simply call it on your data frame, like so:

summary(my_data)

For each numeric variable, the summary() function returns six values: the minimum, first quartile, median, mean, third quartile, and maximum.

For example, let’s say we have the following data frame:

df <- data.frame(x=c(1, 4, 4, 5, 6, 7, 10, 12),
                 y=c(2, 2, 3, 3, 4, 5, 11, 11),
                 z=c(8, 9, 9, 9, 10, 13, 15, 17))

We can use the summary() function to calculate descriptive statistics for each variable:

summary(df)

This will output:

       x                y                z        
 Min.   : 1.000   Min.   : 2.000   Min.   : 8.00  
 1st Qu.: 4.000   1st Qu.: 2.750   1st Qu.: 9.00  
 Median : 5.500   Median : 3.500   Median : 9.50  
 Mean   : 6.125   Mean   : 5.125   Mean   :11.25  
 3rd Qu.: 7.750   3rd Qu.: 6.500   3rd Qu.:13.50  
 Max.   :12.000   Max.   :11.000   Max.   :17.00  
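If a single value from this table is needed later in a script, one option is to call summary() on one column and index the result by name. The following is a minimal sketch, assuming the same df as above (recreated here so the block runs on its own). The names x_summary, x_median, and x_mean are illustrative only.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# summary() on a numeric vector returns a named object
x_summary <- summary(df$x)

# Extract individual statistics by their printed labels
x_median <- x_summary[["Median"]]
x_mean   <- x_summary[["Mean"]]

print(x_median)  # 5.5
print(x_mean)    # 6.125
```

The labels used for indexing ("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.") are the same ones shown in the printed output.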

Method 2: Using the sapply() Function

The sapply() function is a more versatile option for calculating descriptive statistics. It allows us to specify a custom function to apply to each variable in the data frame.

For example, we can use the sapply() function to calculate the standard deviation of each variable:

sapply(df, sd, na.rm=TRUE)

This will output:

       x        y        z 
3.522884 3.758324 3.327376 
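The function passed to sapply() can also return several values at once. The sketch below assumes the same df as above and uses a hypothetical helper named describe_column (not part of base R) to collect the mean, standard deviation, and median of every column in one call.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# Hypothetical helper: several statistics for one numeric vector
describe_column <- function(v) {
  c(mean   = mean(v, na.rm = TRUE),
    sd     = sd(v, na.rm = TRUE),
    median = median(v, na.rm = TRUE))
}

# sapply() simplifies the per-column named vectors into a matrix:
# one row per statistic, one column per variable
stats <- sapply(df, describe_column)
print(round(stats, 3))
```

Because the result is a matrix, a single value can be looked up by row and column name, for example stats["sd", "y"].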

We can also use the sapply() function to calculate more complex descriptive statistics by defining a custom function within it.

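For example, let’s say we want to calculate the range of each variable, meaning the maximum minus the minimum. Note that na.rm=TRUE has to be passed to max() and min() inside the custom function; a one-argument anonymous function cannot accept it as an extra argument from sapply() and would stop with an "unused argument" error:

sapply(df, function(v) max(v, na.rm=TRUE) - min(v, na.rm=TRUE))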

This will output:

 x  y  z 
11  9  9 
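To see why missing-value handling matters for a calculation like this, the sketch below introduces a hypothetical copy of the data, df_na, with one value set to NA. It uses diff(range(...)), an equivalent base R way of getting the maximum minus the minimum. The names df_na and spread are assumptions made for illustration.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
                 y = c(2, 2, 3, 3, 4, 5, 11, 11),
                 z = c(8, 9, 9, 9, 10, 13, 15, 17))

# Hypothetical copy with one missing value in column x
df_na <- df
df_na$x[2] <- NA

# range() returns c(min, max); diff() turns that into max - min.
# na.rm = TRUE is set inside the anonymous function so NAs are skipped.
spread <- sapply(df_na, function(v) diff(range(v, na.rm = TRUE)))
print(spread)

# Without na.rm = TRUE, a column containing NA yields NA
spread_raw <- sapply(df_na, function(v) diff(range(v)))
print(is.na(spread_raw[["x"]]))  # TRUE
```

Here the missing value replaces a 4, which is neither the minimum nor the maximum of x, so the spread of x stays at 11 once NAs are skipped.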

Conclusion

In this article, we have explored two methods for calculating descriptive statistics in R: the summary() function and the sapply() function.

The summary() function provides a quick and easy way to calculate common descriptive statistics for each variable in a data frame.

The sapply() function offers more flexibility and allows us to define custom functions to calculate more complex descriptive statistics.

By using these functions effectively, we can gain a deeper understanding of our data and make more informed decisions about our analysis and visualization strategies.
