PROC SUMMARY in SAS for Descriptive Statistics
When analyzing datasets in SAS, one of the most efficient ways to calculate descriptive statistics for one or more variables is the PROC SUMMARY procedure.
This method allows you to quickly summarize key statistical metrics including the number of observations (N), minimum and maximum values, mean, and standard deviation.
In this article, we’ll explore how to use this tool with the built-in dataset Fish, which contains various measurements for 159 fish caught in a Finnish lake.
Overview of PROC SUMMARY
PROC SUMMARY is designed to provide a concise summary of your data, giving you insight into its general distribution and characteristics through easily interpretable statistics.
Here’s what each statistic represents:
- N: Number of observations with a non-missing value
- MIN: Minimum value observed
- MAX: Maximum value observed
- MEAN: Average value
- STD: Standard deviation of the values
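As a minimal sketch, the same five statistics can also be requested by keyword and displayed directly in the results. The PRINT option and the explicit keyword list are additions for illustration; the examples in this article instead rely on the default statistics written by the OUTPUT statement.

```sas
/* Sketch: request the five statistics by keyword and display them.
   PROC SUMMARY prints nothing by default, so the PRINT option is added here. */
proc summary data=sashelp.Fish print n min max mean std;
    var Weight;
run;
```

Listing the keywords explicitly makes it easy to drop a statistic you do not need or to add another one later.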
Viewing the Fish Dataset
Before diving into the summaries, it’s helpful to inspect the dataset. You can do this by displaying the first ten observations of the Fish dataset from SAS:
/* View the first 10 observations from the Fish dataset */
proc print data=sashelp.Fish (obs=10);
run;
This step offers a glimpse into the data, preparing you for the subsequent analyses.
Example 1: Calculating Descriptive Statistics for a Single Variable
To illustrate the use of PROC SUMMARY, let’s start by calculating descriptive statistics for the Weight variable in the Fish dataset:
/* Calculate descriptive statistics for the Weight variable */
proc summary data=sashelp.Fish;
var Weight;
output out=summaryWeight;
run;
/* Print the output dataset */
proc print data=summaryWeight;
run;
Interpreting the Output
The output table from this procedure includes several columns:
- _TYPE_: Identifies which combination of class variables the row summarizes (0 = the overall summary across all rows).
- _FREQ_: The number of observations in the group, including any with a missing value for the analysis variable.
- _STAT_: The name of the descriptive statistic.
- Weight: The resulting values for the statistics.
From the output, you can glean insights, such as:
- Observations with a non-missing weight (N): 158
- Minimum weight: 0
- Maximum weight: 1,650
- Mean weight: 398.70
- Standard deviation: 359.09
These statistics give a solid understanding of the distribution of weights in the dataset.
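Note that N is 158 even though the dataset contains 159 fish, because N counts only non-missing values. One way to check this, sketched here with PROC MEANS (a closely related procedure that prints its results by default), is to request the NMISS statistic alongside N. This check is an illustrative addition and not part of the original walkthrough.

```sas
/* Sketch: compare the non-missing and missing counts for Weight */
proc means data=sashelp.Fish n nmiss;
    var Weight;
run;
```

If the two counts add up to the number of rows in the dataset, the gap between N and the row count is explained entirely by missing values.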
Example 2: Calculating Descriptive Statistics for Multiple Variables
To analyze multiple variables simultaneously, simply list them in the var statement. Here’s how to calculate the descriptive statistics for both Weight and Height:
/* Calculate descriptive statistics for Weight and Height variables */
proc summary data=sashelp.Fish;
var Weight Height;
output out=summaryWeightHeight;
run;
/* Print the output dataset */
proc print data=summaryWeightHeight;
run;
The resulting output will present descriptive statistics for both variables, allowing for comprehensive analysis.
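With several variables, it can be convenient to store each statistic in its own column instead of one row per statistic. The following is a minimal sketch of that layout, assuming the AUTONAME option of the OUTPUT statement; summaryWide is a hypothetical dataset name chosen for this example.

```sas
/* Sketch: one column per variable/statistic pair instead of one row per statistic.
   summaryWide is a hypothetical dataset name used only for this example. */
proc summary data=sashelp.Fish;
    var Weight Height;
    output out=summaryWide n= min= max= mean= std= / autoname;
run;

/* Print the output dataset */
proc print data=summaryWide;
run;
```

AUTONAME builds each column name from the variable and the statistic (for example, Weight_Mean), which avoids naming every output column by hand.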
Example 3: Grouping Statistics by Another Variable
For a more granular analysis, you can group the descriptive statistics by another variable using the class statement. Let’s look at how to summarize Weight grouped by Species:
/* Calculate descriptive statistics for Weight grouped by Species */
proc summary data=sashelp.Fish;
var Weight;
class Species;
output out=summaryWeightSpecies;
run;
/* Print the output dataset */
proc print data=summaryWeightSpecies;
run;
The output contains descriptive statistics for each species of fish (rows with _TYPE_ = 1), along with an overall summary across all species (rows with _TYPE_ = 0). For example, you should see the following for the Bream species:
- Total observations: 34
- Minimum weight: 242
- Maximum weight: 1,000
- Mean weight: 626
- Standard deviation: 206.60
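If only the per-species rows are of interest, the overall summary rows can be left out. A minimal sketch, assuming the NWAY option on the PROC SUMMARY statement; summaryWeightBySpecies is a hypothetical dataset name used for illustration.

```sas
/* Sketch: NWAY keeps only the rows for the full combination of class
   variables, so the overall (_TYPE_ = 0) rows are not written. */
proc summary data=sashelp.Fish nway;
    class Species;
    var Weight;
    output out=summaryWeightBySpecies;
run;

/* Print the output dataset */
proc print data=summaryWeightBySpecies;
run;
```

With a single class variable this simply drops the overall rows; with several class variables it also drops the intermediate groupings.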
Conclusion
The PROC SUMMARY procedure in SAS is a robust solution for calculating descriptive statistics efficiently.
Whether you want to analyze a single variable or multiple variables in your dataset, or group statistics by category, this procedure can be implemented with ease.
By following the examples provided, you can effectively summarize and interpret your data, leading to more informed analyses and insights.
Now that you are familiar with using PROC SUMMARY in SAS, you can enhance your data analysis skills and improve your understanding of statistical distributions in various datasets.
Happy analyzing!