If you are familiar with SQL then you must be aware 'WHERE' keyword is used to retrieve those rows in a dataset which satisfy a particular condition. Luckily, SAS offers exactly same utility with its WHERE keyword
For this tutorial we shall be using the already built in SAS dataset BASEBALL from the library SASHELP.
Firstly let us initialize out libpath:
libname mylib '/home/u50132927/My_datasets';
We can filter for rows using 3 different methods:
Using WHERE keyword in DATA step.
Using WHERE keyword in SET step.
Using WHERE Statement after SET step.
All 3 of them yield same output.
In the following 3 code chunks we are filtering for rows where players belong to Team Cleveland.
DATA mylib.subset;
SET sashelp.baseball;
WHERE Team = "Cleveland";
RUN;
In this second method, SAS reads only rows for Team Cleveland from SASHELP.BASEBALL and then copies the result to our new data. This is an efficient method.
DATA mylib.subset;
SET sashelp.baseball(WHERE = (Team = "Cleveland"));
RUN;
In this third method, SAS reads all the rows from SASHELP.BASEBALL and copies all of them to our new data and then it filters for rows for Team Cleveland. This is comparatively an inefficient method.
DATA mylib.subset(WHERE = (Team = "Cleveland"));
SET sashelp.baseball;
RUN;
Multiple conditions in WHERE statement
We can use the AND or OR keywords to filter for multiple conditions in WHERE statement.
Task: Filter for rows for Team Cleveland where number of hits is more than 100.
DATA mylib.subset(WHERE = (Team = "Cleveland" AND NHITS > 100) );
SET sashelp.baseball;
RUN;
Alternatively,
DATA mylib.subset;
SET sashelp.baseball;
WHERE Team = "Cleveland" AND NHITS > 100;
RUN;
To learn more about filtering data using WHERE keyword refer to this tutorial!
Comments