Sorting and Selecting Columns in SAS
SAS (Statistical Analysis System) offers powerful data processing capabilities, including the ability to sort datasets and retain only specific columns.
One effective way to do both in a single step is to combine PROC SORT with the KEEP= data set option.
In this article, we’ll explore how to effectively sort your data and keep only the necessary columns using SAS.
Basic Usage of PROC SORT with the KEEP= Option
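PROC SORT arranges the rows of a dataset in a specified order. By adding the KEEP= data set option to the OUT= dataset, you can restrict the output to the columns that are essential for your analysis. (In PROC SORT, KEEP= is a data set option written in parentheses after the dataset name, not the KEEP statement used in a DATA step.) The general syntax for this operation is: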
proc sort data=my_data out=sorted_data (keep=var1 var2);
by var2;
run;
In this example, the dataset is sorted by the values in var2, and only var1 and var2 are retained in the output dataset.
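KEEP= can be attached to either the input or the output dataset, and the placement changes when the columns are discarded. The sketch below contrasts the two; the dataset demo and its variables var1, var2, and var3 are made up purely for illustration.

```sas
/* Hypothetical three-variable dataset, invented for this illustration */
data demo;
    input var1 var2 var3;
    datalines;
1 30 100
2 10 200
3 20 300
;
run;

/* KEEP= on OUT=: every variable is read, only var1 and var2 are written */
proc sort data=demo out=sorted_out (keep=var1 var2);
    by var2;
run;

/* KEEP= on DATA=: only var1 and var2 are read, so the BY variable must be in the list */
proc sort data=demo (keep=var1 var2) out=sorted_in;
    by var2;
run;
```

Both output datasets should contain the same two columns in the same order. Filtering on the input side can reduce the amount of data the sort has to handle, but every variable named in the BY statement must then appear in the KEEP= list.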
Practical Example: Using PROC SORT with the KEEP= Option
To illustrate this process, let’s consider a sample dataset that contains information about various basketball teams, including their points and assists.
Step 1: Creating the Sample Dataset
We will first create a dataset named my_data that contains several basketball teams along with their points and assists:
/* Create dataset */
data my_data;
input team $ points assists;
datalines;
Mavs 113 22
Pacers 95 19
Cavs 100 34
Lakers 114 20
Heat 123 39
Kings 100 22
Raptors 105 11
Hawks 95 25
Magic 103 26
Spurs 119 29
;
run;
/* View dataset */
proc print data=my_data;
run;
Step 2: Sorting the Dataset
Next, let's sort this dataset by the values in the points column. By default, SAS retains all columns after sorting. Here's how to perform the sort:
/* Sort rows in dataset based on values in points column */
proc sort data=my_data out=sorted_data;
by points;
run;
/* View sorted dataset */
proc print data=sorted_data;
run;
After running this code, you'll see that the rows are arranged in ascending order of the points column. However, all columns are still included in the output.
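To reverse the order, the DESCENDING keyword can be placed before the variable name in the BY statement. This is a minimal sketch that assumes my_data from Step 1 is still available in the WORK library; the output name sorted_desc is made up for this illustration.

```sas
/* Sort rows from highest to lowest points (sorted_desc is a hypothetical name) */
proc sort data=my_data out=sorted_desc;
    by descending points;
run;

/* View the dataset sorted in descending order */
proc print data=sorted_desc;
run;
```

DESCENDING applies only to the variable that immediately follows it, so each BY variable that should be reversed needs its own keyword.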
Step 3: Using the KEEP= Option to Select Specific Columns
If you want the output to include only the relevant columns, add the KEEP= data set option to the OUT= dataset. For instance, to sort the dataset by points while retaining only the team and points columns, you would write:
/* Sort rows in dataset based on values in points column and only keep team and points */
proc sort data=my_data out=sorted_data (keep=team points);
by points;
run;
/* View sorted dataset */
proc print data=sorted_data;
run;
Now the sorted dataset contains only the team and points columns, and the assists column is no longer part of the output.
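When it is shorter to name the columns to discard than the ones to retain, the DROP= data set option can be used in the same position. This is a minimal sketch that assumes my_data from Step 1 still exists; sorted_drop is a hypothetical output name.

```sas
/* Sort by points and drop assists; for this dataset it matches keep=team points */
proc sort data=my_data out=sorted_drop (drop=assists);
    by points;
run;

/* List the variables in the result to confirm which columns remain */
proc contents data=sorted_drop;
run;
```

PROC CONTENTS is used here instead of PROC PRINT because it lists the variables of a dataset directly, which makes the effect of DROP= or KEEP= easy to confirm on wide datasets.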
Conclusion
By combining PROC SORT with the KEEP= data set option in SAS, you can sort your data and retain only the columns that are most pertinent to your analysis in a single step.
This produces a smaller, more focused output dataset that is easier to read and to pass on to later steps, so you can concentrate on the variables that matter.
Whether you’re working with sports data or any other type of information, these tools are vital for data analysis and presentation.