In this tutorial we shall learn about how to import data in SAS using PROC IMPORT.
We will mainly focus on importing:
CSV files
EXCEL files
Let us first set our library using the following code: My directory name is mylib
libname mylib '/home/u50182927/My_datasets';
For this tutorial I have uploaded following 2 files:
CSV File:
Excel File:
I am using SAS' University edition thus I have uploaded my files on the SAS server as follows:
Importing a CSV file
PROC IMPORT DATAFILE = '/home/u50132927/My_datasets/Employee_info.csv'
OUT = mylib.imported_CSV_file
DBMS =csv REPLACE;
GETNAMES=Yes;
GUESSINGROWS = MAX;
RUN;
Using the above syntax we can import our CSV file in SAS. Following is the explanation of keywords in PROC IMPORT.
DATAFILE : Location of file and file name which needs to be imported.
OUT : Defining the library and new dataset name which we will use for referring in SAS.
DBMS = CSV (for CSV files)
REPLACE: Replace the already existing dataset. In our case if imported_CSV_file exists in our mylib then that will be replaced by our new file.
GETNAMES : If GETNAMES = Yes then first row would be treated as our header row, if your data does not have any headers then set GETNAMES = No
GUESSINGROWS: By default SAS takes only some of the rows to guess the informats of the dataset. By setting GUESSINGROWS = MAX we are telling SAS to guess the informats of the columns using all the rows.
Importing an Excel file
For importing an excel file we use DBMS = xlsx or DBMS = xls depending upon which version you have chosen to create that excel file.
PROC IMPORT DATAFILE = '/home/u50132927/My_datasets/Mydata.xlsx'
OUT = mylib.imported_excel_file
DBMS =xlsx REPLACE;
GETNAMES=No;
RUN;
If we set GETNAMES = No then our first row would be treated as a part of our data and default column names are provided by SAS.
Datarow= 3 means start reading the rows form 3rd line i.e. from 3rd row including the header.
PROC IMPORT DATAFILE = '/home/u50132927/My_datasets/Mydata.xlsx'
OUT = mylib.imported_excel_file
DBMS =xlsx REPLACE;
DATAROW = 3
GETNAMES=Yes;
RUN;
SHEET = ... If our excel file has multiple sheets then we can define the sheetname as SHEET = ... By default PROC IMPORT reads first sheet in an excel.
PROC IMPORT DATAFILE = '/home/u50132927/My_datasets/Mydata.xlsx'
OUT = mylib.imported_excel_file
DBMS =xlsx REPLACE;
SHEET = 'Sheet2';
GETNAMES=Yes;
RUN;