Differences Between Datasets in SAS: PROC COMPARE

PROC COMPARE is a SAS procedure for quickly assessing the similarities and differences between two datasets.

Below, we’ll explore the syntax and application of this procedure through a practical example.

Basic Syntax

The basic syntax for using PROC COMPARE is as follows:

proc compare
    base=data1
    compare=data2;
run;

Example: PROC COMPARE in Action

Let’s illustrate the usage of PROC COMPARE with two sample datasets.

  1. Creating the Datasets

First, we create two datasets named data1 and data2 using the following code:

/* Create datasets */
data data1;
    input team $ points rebounds;
    datalines;
A 25 10
B 18 4
C 18 7
D 24 12
E 27 11
;
run;

data data2;
    input team $ points;
    datalines;
A 25
B 18
F 27
G 21
H 20
;
run;

  2. Viewing the Datasets

To understand what each dataset contains, we can print them out:

/* View datasets */
proc print data=data1;
run;

proc print data=data2;
run;

  3. Comparing the Datasets

Now, we can use PROC COMPARE to find the similarities and differences:

/* Compare the two datasets */
proc compare
    base=data1
    compare=data2;
run;
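
When only a yes/no answer is needed, the printed report can be skipped and the procedure's return code inspected instead. The following is a minimal sketch, not part of the original example: it assumes the automatic macro variable SYSINFO is read immediately after the PROC COMPARE step, and the macro variable name rc is purely illustrative.

/* Sketch: test for differences without printing the report */
proc compare
    base=data1
    compare=data2
    noprint;                /* suppress the printed output */
run;

/* Capture the return code right away; later steps overwrite SYSINFO */
%let rc = &sysinfo;

data _null_;
    /* rc is an illustrative name; 0 means no differences were detected */
    if &rc = 0 then put "NOTE: The datasets are identical.";
    else put "NOTE: Differences were found (SYSINFO=&rc).";
run;

A nonzero value is a sum of condition codes, so it signals that something differs without describing what; the printed tables below are still the place to look for details.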

Output Tables

When PROC COMPARE runs with its default options, the output can be read as three main tables:

  1. Table 1: Summary of Both Datasets
     This summary provides an overview of each dataset:
  • Number of variables and observations in each dataset.
    • data1 has 3 variables and 5 observations.
    • data2 has 2 variables and 5 observations.
  • Number of common variables. Here the datasets share 2 common variables: team and points.
  2. Table 2: Summary of Differences in Values
     This table summarizes the discrepancies between corresponding values. Because no ID statement is used, observations are compared by position (first with first, second with second, and so on):
  • The team variable differs in 3 observations (observations 3, 4, and 5).
  • The points variable also differs in 3 observations, with a maximum difference of 9.
  3. Table 3: Detailed Differences Between Observations
     This table lists the actual differing values:
  • For the team variable, the third observation diverges: data1 lists ‘C’, while data2 lists ‘F’.
  • For the points variable, the third observation has a value of 18 in data1 and 27 in data2, a difference of 9.

These tables collectively furnish a comprehensive understanding of the differences between the two datasets.
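
The comparison above pairs observations by position, which is why team differs in observations 3 to 5. If rows should instead be matched on a key, an ID statement can be used. The following is a minimal sketch under the assumption that team uniquely identifies a row in each dataset; the sorted copies data1_sorted and data2_sorted are illustrative names, not part of the original example.

/* Sketch: match observations on team instead of by position */
proc sort data=data1 out=data1_sorted;
    by team;
run;

proc sort data=data2 out=data2_sorted;
    by team;
run;

proc compare
    base=data1_sorted
    compare=data2_sorted
    listall;        /* also list observations and variables found in only one dataset */
    id team;        /* pair rows that share the same team value */
run;

With the sample data, only teams A and B appear in both datasets and their points values agree, so this run should report the remaining teams as unmatched observations rather than as value differences.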

Comparing Specific Variables

If you wish to focus only on particular variables, list them in a VAR statement inside the PROC COMPARE step. Here’s how to compare just the points variable:

/* Compare only the 'points' variable */
proc compare
    base=data1
    compare=data2;
    var points;
run;

The output has the same layout as the previous comparison but reports only on the points variable; differences in team are no longer listed.
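
If the restricted comparison should feed later steps rather than only the printed report, the differing observations can be written to a dataset. This is a sketch under stated assumptions: the output dataset name points_diff is hypothetical, and the option combination shown is one reasonable choice rather than the only one.

/* Sketch: save differing points values to a dataset */
proc compare
    base=data1
    compare=data2
    out=points_diff     /* hypothetical name for the output dataset */
    outnoequal          /* keep only observations where a compared value differs */
    outbase             /* include the row from the base dataset */
    outcomp             /* include the row from the comparison dataset */
    outdif              /* include a row holding the differences */
    noprint;            /* suppress the printed report */
    var points;
run;

proc print data=points_diff;
run;

The output dataset carries the automatic variables _TYPE_ and _OBS_, which identify whether a row comes from the base dataset, the comparison dataset, or the computed difference, and which observation it belongs to.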

Conclusion

PROC COMPARE is an efficient way to identify and understand the differences between datasets in SAS.

Its summary and detail tables show how many variables and observations each dataset has, which variables they share, and exactly where values differ.

Whether you’re comparing entire datasets or focusing on specific variables with a VAR statement, PROC COMPARE is a useful tool for SAS programmers.
