Differences Between Datasets in SAS: PROC COMPARE
PROC COMPARE in SAS is a powerful and convenient procedure for quickly assessing the similarities and differences between two datasets.
Below, we’ll explore the syntax and application of this procedure through a practical example.
Basic Syntax
The basic syntax for using PROC COMPARE is as follows:
proc compare
base=data1
compare=data2;
run;
Example: PROC COMPARE in Action
Let’s illustrate the usage of PROC COMPARE with two sample datasets.
- Creating the Datasets
First, we create two datasets named data1 and data2 using the following code:
/* Create datasets */
data data1;
input team $ points rebounds;
datalines;
A 25 10
B 18 4
C 18 7
D 24 12
E 27 11
;
run;
data data2;
input team $ points;
datalines;
A 25
B 18
F 27
G 21
H 20
;
run;
- Viewing the Datasets
To understand what each dataset contains, we can print them out:
/* View datasets */
proc print data=data1;
run;
proc print data=data2;
run;
- Comparing the Datasets
Now, we can use PROC COMPARE to find the similarities and differences:
/* Compare the two datasets */
proc compare
base=data1
compare=data2;
run;
Output Tables
When the PROC COMPARE procedure is executed, it produces three informative tables:
- Table 1: Summary of Both Datasets. This summary provides an overview of each dataset, highlighting the following:
  - Number of variables and observations in each dataset: data1 has 3 variables and 5 observations; data2 has 2 variables and 5 observations.
  - Number of common variables between the two datasets. In our case, they share 2 common variables: team and points.
- Table 2: Summary of Differences in Values. The second table summarizes the discrepancies observed between corresponding values in the datasets. Notable information includes:
  - The team variable differs in 3 observations.
  - The points variable also differs in 3 observations, with a maximum difference of 9.
- Table 3: Detailed Differences Between Observations. This table provides a granular view of the actual differences:
  - For the team variable, the third observation diverges: data1 lists ‘C’, while data2 lists ‘F’.
  - For the points variable, we can see an example where data1 has a value of 18 and data2 has a value of 27, resulting in a difference of 9.
These tables collectively furnish a comprehensive understanding of the differences between the two datasets.
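Note that the comparison above pairs observations by position: the third row of data1 is compared with the third row of data2, even though the teams differ. If you would rather pair rows by team, one option is the ID statement. The following is a minimal sketch, assuming team uniquely identifies each row; the sorting steps are there because the ID statement expects both datasets to be ordered by the ID variable.

```sas
/* Sketch: match observations by the value of team instead of by position. */
/* Assumption: team uniquely identifies a row in each dataset.             */
proc sort data=data1;
    by team;
run;

proc sort data=data2;
    by team;
run;

proc compare
    base=data1
    compare=data2;
    id team;   /* pair rows that share the same team value */
run;
```

With this approach, teams that appear in only one dataset should be reported as unmatched observations instead of being compared against an unrelated row.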
Comparing Specific Variables
If you wish to focus only on particular variables, you can list them in a VAR statement inside the PROC COMPARE step. Here’s how to compare just the points variable:
/* Compare only the 'points' variable */
proc compare
base=data1
compare=data2;
var points;
run;
The output will be similar to the previous comparison but will report only on the points variable.
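If you would prefer to capture these differences in a dataset instead of reading them from the printed report, the OUT= option can help. The sketch below is one possible setup, not the only one; the dataset name points_diff is hypothetical and chosen purely for illustration.

```sas
/* Sketch: write the differences in points to a dataset.               */
/* points_diff is a hypothetical name used only for this example.      */
proc compare
    base=data1
    compare=data2
    out=points_diff   /* send comparison results to a dataset          */
    outnoequal        /* keep only observations where values differ    */
    outdif            /* write the difference (compare minus base)     */
    noprint;          /* suppress the printed report                   */
    var points;
run;

/* Inspect the captured differences */
proc print data=points_diff;
run;
```

The resulting dataset includes the automatic variables _TYPE_ and _OBS_ alongside points, which makes it easier to filter or summarize the differences in later steps.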
Conclusion
Utilizing PROC COMPARE is an efficient method for identifying and understanding the differences between datasets in SAS.
The procedure generates detailed summaries that can help in data analysis, ensuring that you can quickly spot variations and discrepancies.
Whether you’re comparing entire datasets or focusing on specific variables, PROC COMPARE is an invaluable tool in the SAS programming arsenal.