To create frequency distributions or cross tabulations SAS offers a very powerful procedure named PROC FREQ. It not only provides the frequency but also the cumulative totals as well as percentages.
Let us firstly initialize our library:
libname mylib '/home/u47232927/My_datasets';
For this tutorial we shall be making use of SAS' inbuilt dataset: SASHELP.SHOES.
Task: Obtain frequency distribution for various regions.
PROC FREQ DATA=SASHELP.SHOES;
TABLES Region;
RUN;
In the output it can be observed that Western Europe has highest frequency, contributing to more than 15% of the data.
Sorting the data by frequency
To gain more insights from the above table we can also sort the data by frequency by defining ORDER = FREQ.
PROC FREQ DATA=SASHELP.SHOES ORDER = FREQ;
TABLES Region ;
RUN;
We can see that top 4 regions contribute to more than 50% of the rows.
NLEVELS
We can also get the number of unique regions using NLEVELS keyword.
PROC FREQ DATA=SASHELP.SHOES ORDER = FREQ NLEVELS;
TABLES Region ;
RUN;
It can be seen clearly that we have 10 distinct regions available in our data.
Filtering the output in PROC FREQ.
After getting the output we can filter for regions using WHERE statement as follows:
PROC FREQ DATA=SASHELP.SHOES;
TABLES Region;
WHERE Region IN ('Asia','Canada');
RUN;
Removing Cumulative percentages and sums
By default for frequency distributions PROC FREQ provides cumulative frequencies and percentages which might not be of much utility to us. We can remove them by defining NOCUM and NOPERCENT keywords.
PROC FREQ DATA=SASHELP.SHOES;
TABLES Region /NOCUM NOPERCENT;
RUN;
2 X 2 Cross Tabulations
PROC FREQ also provides the power to calculate 2*2 frequency distributions. We need to define the columns with a * sign in TABLES statement.
PROC FREQ DATA=SASHELP.SHOES;
TABLES Region * Product;
RUN;
Removing row and column percentages
Using NOROW, NOCOL and NOPERCENT keywords you can drop Row percentages, column percentages and overall percentages respectively.
PROC FREQ DATA=SASHELP.SHOES;
TABLES Region * Product / NOPERCENT NOROW NOCOL;
RUN;
Viewing the output as a LIST
Instead of a 2X2 format you can also view the output in a list format where both the attributes (here Region and Product) are two separate columns using LIST keyword.
PROC FREQ DATA=SASHELP.SHOES;
TABLES Region * Product / LIST;
RUN;
Saving the output of PROC FREQ
You can also save the output of PROC FREQ in the data using OUT statement.
In the following code I have dropped the PERCENT column because it wasn't necessary for me. You can keep it if you want.
PROC FREQ DATA=SASHELP.SHOES ;
TABLES Region * Product / OUT = MYLIB.proc_freq_output (DROP = PERCENT);
RUN;
PROC FREQ with PROC FORMAT
You can also calculate frequency tables with PROC FREQ for numeric columns using PROC FORMAT.
Let us create a format for sales column.
PROC FORMAT;
VALUE Sales_info
LOW -< 20001 = "Low Sales"
20001 -< 50001 = "Medium Sales"
50001 - HIGH = "High Sales";
RUN;
Define the format using FORMAT statement in PROC FREQ.
PROC FREQ data = SASHELP.SHOES;
TABLES Sales;
FORMAT Sales Sales_info.;
run;
תגובות