Summarize Dataset in SAS
In SAS, PROC CONTENTS is a valuable tool for printing a summary of the contents of a dataset.
This procedure provides essential information about the structure and characteristics of the data.
Example: Implementing PROC CONTENTS in SAS
Consider a dataset that includes information about various basketball players. We can create this dataset in SAS as follows:
/* Create dataset */
data original_data;
input team $ points rebounds;
datalines;
A 12 8
A 12 8
A 12 8
A 23 9
A 20 12
A 14 7
A 14 7
B 20 2
B 20 5
B 29 4
B 14 7
B 20 2
B 20 2
B 20 5
;
run;
/* View the dataset */
proc print data=original_data;
run;
Obtaining a Summary of the Dataset
To get a summary of the contents of this dataset, we use PROC CONTENTS:
/* View contents of the dataset */
proc contents data=original_data;
run;
Interpreting the Output
The output generated by PROC CONTENTS is divided into several tables. Key information in the first table includes:
- Data Set Name: The name of the dataset (in this case, original_data).
- Observations: The total number of rows in the dataset (which is 14).
- Variables: The count of columns present in the dataset (which is 3).
The second table typically provides technical details about the SAS engine and the host, which may not be particularly relevant for most analyses.
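If the engine and host table is not needed, it can be left out of the printed output. The following is a minimal sketch, assuming the original_data dataset created above and assuming that Attributes and Variables are the ODS table names PROC CONTENTS uses for its first and third tables:

```sas
/* Keep only the attributes table and the variable listing.        */
/* Attributes and Variables are assumed ODS table names for        */
/* PROC CONTENTS; the selection applies to the next procedure step. */
ods select Attributes Variables;

proc contents data=original_data;
run;
```

Because the ODS SELECT statement here is not given the PERSIST option, later procedures should print their full output as usual.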
The third table displays an alphabetical listing of the variables along with their data types and lengths. For example, you can observe:
- points: Numeric variable
- rebounds: Numeric variable
- team: Character variable
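When the variable listing is needed as data rather than as printed output, PROC CONTENTS can also write it to a dataset. The following is a minimal sketch, assuming the original_data dataset created above; the output dataset name contents_summary is a hypothetical name chosen for illustration:

```sas
/* Save the variable metadata to a dataset instead of printing it. */
/* contents_summary is a hypothetical name used for illustration.  */
proc contents data=original_data out=contents_summary noprint;
run;

/* One row per variable. In this output dataset, TYPE is coded     */
/* numerically: 1 = numeric, 2 = character.                        */
proc print data=contents_summary;
    var name type length varnum nobs;
run;
```

Storing the summary this way makes it possible to filter or join on variable names and types in later steps.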
Displaying Variables in Their Original Order
If you prefer to view the variables in the order they appear in the dataset rather than alphabetically, you can specify the ORDER=VARNUM option:
/* View contents of the dataset, retaining original variable order */
proc contents data=original_data order=varnum;
run;
This modification lists the variables in the same sequence as they are defined in the dataset: team, points, rebounds.
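PROC CONTENTS also accepts a standalone VARNUM option that lists variables by their position in the dataset. The following is a sketch under the same assumption that original_data exists; the exact layout of the resulting table may differ from the ORDER=VARNUM output:

```sas
/* Alternative: list variables by position using the VARNUM option */
proc contents data=original_data varnum;
run;
```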
Conclusion
In this guide, we explored how to use PROC CONTENTS in SAS to obtain a summary of a dataset’s structure. Specifically, we highlighted its usefulness in identifying:
- The size of the dataset (number of rows and columns)
- The names and data types of each variable present
Using PROC CONTENTS prior to performing any statistical analysis is a best practice, as it helps analysts gain a clearer understanding of the size and structure of their data.