If you are more comfortable in working with SQL then a good news for you! SAS offers a procedure using which you can use your SQL queries in SAS! Kudos to PROC SQL!!!!
To understand more let us firstly define our LIBNAME:
libname mylib '/home/u50193527/My_datasets';
Syntax:
PROC SQL;
CREATE TABLE <newtablename> AS
....SQL Queries....
...
QUIT;
Note: At the end of PROC SQL - We mention QUIT and not RUN (like other SAS procedures).
In this tutorial we shall make use of SAS' inbuilt dataset: SASHELP.SHOES;
Understanding via examples!
Example 1: Selecting all the columns from SASHELP.SHOES and saving them in a new dataset.
In the following code we are saving the output in our new table output1.
Select * means select all the columns.
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT * FROM SASHELP.SHOES;
QUIT;
Example 2: Selecting only some of the columns from SASHELP.SHOES and saving them in a new dataset.
Here we are selecting only 3 columns: Region, Product and Sales and saving the output in our new table output2.
PROC SQL;
CREATE TABLE MYLIB.output2 AS
SELECT Region, Product, Sales FROM SASHELP.SHOES;
QUIT;
Example 3: Creating a new column from SASHELP.SHOES and saving it in a new dataset.
Here we are selecting only 3 columns: Region, Product and Sales and creating a new columns double_sales = 2*Sales.
PROC SQL;
CREATE TABLE MYLIB.output2 AS
SELECT Region, Product,Sales, 2*Sales as double_sales
FROM SASHELP.SHOES;
QUIT;
CALCULATED in SQL
Example 4: Using a newly calculated column in our calculations
Here we are selecting only 3 columns: Region, Product and Sales and creating a new columns double_sales = 2*Sales and want to use double_sales for further calculations: (in our case original_sales = double_sales/2).
To refer this newly_created column in further computations we mention they keyword CALCULATED and then mention the calculation using new_column (double_sales)
PROC SQL;
CREATE TABLE MYLIB.output2 AS
SELECT Region, Product,Sales, 2*Sales as double_sales,
CALCULATED double_sales/2 as original_sales
FROM SASHELP.SHOES;
QUIT;
Example 4: Selecting unique values from a dataset
Using SELECT DISTINCT query we can retrieve distinct combinations of Region and Product.
PROC SQL;
CREATE TABLE MYLIB.output2 AS
SELECT DISTINCT Region, Product
FROM SASHELP.SHOES;
QUIT;
Example 5: Filtering the dataset
To retrieve the rows satisfying a particular condition we use WHERE.
In the following query we are filtering for rows where Region is either Africa or Canada.
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT * FROM SASHELP.SHOES
WHERE Region IN ('Africa','Canada');
QUIT;
Example 6: Aggregating the results
SQL has many inbuilt agggegate functions like mean, median, sum, count, min, max to get summary statistics for our data.
In the below query we are getting average sales for the entire data and saving the output in a column avg_sales
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT mean(Sales) as avg_sales FROM SASHELP.SHOES
QUIT;
Example 7: Aggregating the results for each category
In the below query we are getting average sales for each region (see GROUP BY statement)
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT Region, mean(Sales) as avg_sales FROM SASHELP.SHOES
GROUP BY Region;
QUIT;
Example 8: Sorting the data.
Firstly we are getting average sales for each region and then sorting the output by Region in descending order.
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT Region, mean(Sales) as avg_sales FROM SASHELP.SHOES
GROUP BY Region
ORDER BY Region DESC;
QUIT;
Example 9: Defining the formats:
Formats in SAS are used to view the values in a column in desired way. We can also specify the FORMATS in PROC SQL.
In the following code chunk we have specified a comma format on our new column avg_sales.
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT Region, mean(Sales) as avg_sales FORMAT comma7.
FROM SASHELP.SHOES
GROUP BY Region
ORDER BY Region DESC;
QUIT;
MONOTONIC( ) in PROC SQL
Example 10: Defining the row numbers:
To view or do the computation via row numbers PROC SQL offers a keyword MONOTONIC( ) which returns the row number. (It is similar to _N_ in SAS)
Here we have specified a new column Row_number using MONOTONIC( ) function.
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT Region, product, sales, MONOTONIC() as Row_number
FROM SASHELP.SHOES;
QUIT;
Example 11: Using MONOTONIC( ) for filtering the rows.
In the below query we are filtering for the first 9 rows using WHERE MONOTONIC( ) < 10
PROC SQL;
CREATE TABLE MYLIB.output1 AS
SELECT Region, product, sales, MONOTONIC() as Row_number
FROM SASHELP.SHOES
WHERE MONOTONIC() < 10;
QUIT;
There are a lot of things which we can do in SQL like dropping a column, dropping a table, creating columns using CASE WHEN (IF-ELSE) conditions, subqueries, joins etc.
Almost all of them can be executed using PROC SQL.
To learn more about them in detail refer to our SQL tutorials page!
Comments