In this tutorial we shall try to understand what are long and wide format datasets and how to convert them from wide to long and long to wide in SAS.
Key concepts:
Long format:
Following is the example of a long data where both the attributes- department and month information are present as separate columns and we have a different column for values.
Wide format:
In a wide format at least one of our attributes is present as multiple columns (here we have separate columns for each month) and values are filled in these 2X2 table (where values are our number of cases corresponding to each department and month).
Many a times we need to transpose our data i.e. switch from long or wide format or vice - versa.
Dataset
Let us firstly define our library and create our dataset using the following code:
libname mylib '/home/u50167827/My_datasets';
data mylib.mydata;
input Department : $30. Months $ No_of_cases No_of_deaths;
cards;
Cardiology Jan 7713 23
Cardiology Feb 243 21
Cardiology Mar 543 10
E&A Jan 772 42
E&A Feb 443 13
E&A Mar 82 12
Neurology Jan 1039 282
Neurology Feb 1943 102
Neurology Mar 1474 221
;
run;
Our data looks as follows:
Transposing data using PROC TRANSPOSE
In SAS to transpose our data we use PROC TRANSPOSE. Before going through the code let us firstly understand a few definitions:
BY Statement: It acts as an identifier for each row. It can contain multiple columns which should have distinct combinations in original dataset.
VAR Statement: Can contain multiple columns. It denotes which values should be filled inside the cells after transposing.
ID Statement: It helps in assigning the column names to new columns in the transposed data.
Basic Syntax:
PROC TRANSPOSE DATA = mylib.mydata;
BY Department;
ID Months;
VAR No_of_cases;
run;
Output: Note that in the following output no_of_cases are stored as values.
_NAME_ column denotes which column value is being denoted by each row in the transposed table.
Similarly we can transpose the data by keeping no_of_deaths as values in our transposed table.
PROC TRANSPOSE DATA = mylib.mydata;
BY Department;
ID Months;
VAR No_of_deaths;
run;
We can have multiple columns defined in VAR statement as shown below:
PROC TRANSPOSE DATA = mylib.mydata;
BY Department;
ID Months;
VAR No_of_cases No_of_deaths;
run;
Renaming _NAME_ column
We can also rename _NAME_ column be defining NAME = new_column_name. In the following illustration we have rename _NAME_ as Variable_description
PROC TRANSPOSE DATA = mylib.mydata NAME = Variable_description;
BY Department;
ID Months;
VAR No_of_cases No_of_deaths;
run;
Adding a prefix in the newly created columns
We can also add a prefix to the columns defined in ID statement by using PREFIX = prefix_name.
In the code chunk provided below we have kept our prefix as 'Month_'
PROC TRANSPOSE DATA = mylib.mydata NAME = Variable_description Prefix = Month_;
BY Department;
ID Months;
VAR No_of_cases No_of_deaths;
run;
Skipping by statement
It is permissible to skip the by statement. In our example we have defined Department in the ID variables list.
PROC TRANSPOSE DATA = mylib.mydata delimiter=_;
ID Department Months;
VAR No_of_cases No_of_deaths;
run;
Saving the output
SAS also provides the utility to save the output provided by PROC TRANSPOSE using OUT = new_dataset_name.In the output dataset we can als
PROC TRANSPOSE DATA = mylib.mydata OUT = mylib.proc_transpose_output ;
BY Department;
ID Months;
VAR No_of_cases No_of_deaths;
run;
In the output dataset we can also remove _NAME_ column by defining (DROP = _NAME_) along with the name of our new dataset.
PROC TRANSPOSE DATA = mylib.mydata OUT = mylib.proc_transpose_output2 (DROP = _NAME_);
BY Department;
ID Months;
VAR No_of_cases;
run;
We can also rename _NAME_ column in our output dataset by defining NAME = new_column_name
PROC TRANSPOSE DATA = mylib.mydata OUT = mylib.proc_transpose_output3 NAME = column_which_is_transposed;
BY Department;
ID Months;
VAR No_of_cases ;
run;
Wide to long format
In SAS we can also transform our data from wide to long format. In our case we have skipped used the dataset created above in DATA statement and have skipped the ID and VAR statement.
PROC TRANSPOSE data = mylib.proc_transpose_output3;
BY department;
run;
Comments