In this tutorial we will look at how to read (import) datasets of various formats into R.
Let us first set up our working directory using setwd( ) and confirm it with getwd( ).
setwd("C:\\Users\\Ekta\\Importing data in R")
getwd()
Reading a csv file
To read a .csv file we use the read.csv( ) function.
By default header = TRUE (abbreviated to T in the code here), which means the first row is treated as the header row.
data1 = read.csv("Employee_info.csv",header = T)
Defining NAs
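Suppose you know that certain values in your data, such as 999 or any other placeholder, stand for missing values. You can have them read as NA by listing them in na.strings, for example na.strings = "999". To treat several values as NA, pass a character vector such as na.strings = c("999", "NA").
data2 = read.csv("Employee_info.csv",header = T,na.strings = "999")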
Note that the class of datasets imported using read.csv( ) is data.frame.
class(data1)
class(data2)
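The effect of na.strings can be checked on a small file. The sketch below is only an illustration: it writes a made-up CSV to a temporary file (the column names emp_id, age and salary and the token "missing" are assumptions, not part of Employee_info.csv) and reads it back with two NA markers.

```r
# Build a small CSV in a temporary file; the columns and values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "emp_id,age,salary",
  "1,34,52000",
  "2,999,61000",
  "3,41,missing"
), csv_path)

# header = TRUE uses the first row as column names.
# na.strings lists every token that should be read as NA.
employees <- read.csv(csv_path, header = TRUE,
                      na.strings = c("999", "missing"))

print(employees)
print(colSums(is.na(employees)))  # number of NA values in each column
```

Because "missing" is turned into NA before type conversion, the salary column is still read as a numeric column rather than as text.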
Reading a table
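To read a table, we use the read.table( ) function.
We define the separator manually using the sep parameter. Unlike read.csv( ), read.table( ) does not assume a header row (it does so only when the first row has one fewer field than the remaining rows), so we pass header = T to have the first row treated as the header row.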
data4 = read.table("Employee_info.txt",sep = ",",header = T)
Note that the class of datasets imported using read.table( ) is also data.frame.
class(data4)
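To see what the header argument changes in read.table( ), here is a minimal sketch on a temporary file. The file contents (columns name and dept) are invented for the illustration.

```r
# Write a tiny comma-separated text file; the contents are made up.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("name,dept", "Asha,HR", "Ravi,IT"), txt_path)

# Without header = TRUE the first line is read as an ordinary data row.
no_header <- read.table(txt_path, sep = ",")

# With header = TRUE the first line supplies the column names.
with_header <- read.table(txt_path, sep = ",", header = TRUE)

print(names(no_header))    # default names V1, V2
print(nrow(no_header))     # 3 rows, the header line is counted as data
print(names(with_header))  # name, dept
print(nrow(with_header))   # 2 rows
```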
read.delim( )
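read.delim( ) and read.delim2( ) read delimited text files and assume a tab separator by default; read.delim2( ) additionally expects a comma as the decimal mark. If your file uses a separator other than a tab, a comma or a semicolon, you can pass it through the sep parameter.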
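A short sketch of the three cases, using temporary files with invented contents (columns item and price are assumptions made for the example):

```r
# Tab-separated file with a dot as the decimal mark.
tab_path <- tempfile(fileext = ".txt")
writeLines(c("item\tprice", "pen\t1.5", "book\t12.25"), tab_path)
tab_data <- read.delim(tab_path)

# Tab-separated file with a comma as the decimal mark.
eu_path <- tempfile(fileext = ".txt")
writeLines(c("item\tprice", "pen\t1,5", "book\t12,25"), eu_path)
eu_data <- read.delim2(eu_path)

# Pipe-separated file: the separator is passed explicitly through sep.
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("item|price", "pen|1.5", "book|12.25"), pipe_path)
pipe_data <- read.delim(pipe_path, sep = "|")

# All three calls produce the same numeric price column.
print(tab_data$price)
print(eu_data$price)
print(pipe_data$price)
```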
Reading an Excel file
Method 1: Using readxl package
Using the read_excel function from the readxl package, one can import Excel files.
library(readxl)
data7 = read_excel("Employee_info.xlsx",sheet = 1)
sheet = 1 means that the first sheet should be imported. A sheet name can be given instead of a number.
Method 2: Using openxlsx package
Using the read.xlsx function from the openxlsx package, one can import Excel files.
library("openxlsx")
data8 = read.xlsx('Employee_info.xlsx', sheet = 1)
sheet = 1 means that the first sheet should be imported.
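The two methods return slightly different objects, which the following sketch illustrates. It assumes both readxl and openxlsx are installed, and it uses openxlsx's write.xlsx only to create a throwaway workbook; the data frame staff is invented for the example.

```r
library(readxl)
library(openxlsx)

# Create a small workbook in a temporary file so the example is self-contained.
xlsx_path <- tempfile(fileext = ".xlsx")
staff <- data.frame(name = c("Asha", "Ravi"), age = c(34, 41))
write.xlsx(staff, xlsx_path)

# Read the first sheet with each package.
via_readxl   <- read_excel(xlsx_path, sheet = 1)
via_openxlsx <- read.xlsx(xlsx_path, sheet = 1)

print(class(via_readxl))    # a tibble, which is also a data.frame
print(class(via_openxlsx))  # a plain data.frame

# A tibble can be converted to a plain data.frame if needed.
via_readxl_df <- as.data.frame(via_readxl)
```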
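Most suitable way for txt and csv files: the data.table way
The fread( ) function of data.table is highly convenient for importing delimited text files such as txt and csv, and it is fast on large files. It detects the separator automatically. It does not read xlsx files; use one of the Excel methods above for those. The result is a data.table, which is also a data.frame.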
install.packages("data.table")
library(data.table)
data3 = fread("Employee_info.csv")
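As a rough illustration of the automatic separator detection, the sketch below writes a semicolon-separated file to a temporary location and reads it without specifying sep. It assumes data.table is installed; the file contents are invented.

```r
library(data.table)

# A semicolon-separated file with made-up contents.
semi_path <- tempfile(fileext = ".txt")
writeLines(c("emp_id;dept;salary", "1;HR;52000", "2;IT;61000"), semi_path)

# No sep argument: fread works out the separator from the file itself.
staff_dt <- fread(semi_path)

print(staff_dt)
print(class(staff_dt))  # "data.table" "data.frame"

# If a plain data.frame is preferred, ask for one directly.
staff_df <- fread(semi_path, data.table = FALSE)
print(class(staff_df))  # "data.frame"
```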
Reading a SAS dataset
A SAS dataset can be imported using the read_sas function from the haven package.
library(haven)
data5 = read_sas("filename.sas7bdat")
Reading a SPSS file
Method 1: Using library haven
library(haven)
data_spss = read_spss("filename.sav")
Method 2: Using library foreign
Using the read.spss( ) function from the foreign package, one can easily import SPSS datasets.
install.packages("foreign")
library(foreign)
data_spss <- read.spss("filename.sav", to.data.frame=TRUE)
To get the dataset as a data frame rather than a list, one needs to specify to.data.frame = TRUE.
Also, if you don't want the columns containing value labels to be converted into factors, you should specify use.value.labels = FALSE.
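The difference made by use.value.labels can be sketched as follows. This is an illustration under assumptions: it needs both haven and foreign installed, haven's write_sav is used only to create a small .sav file to read back, and the survey data with its Male/Female labels is invented. Columns are accessed by position so the example does not depend on how variable names are stored in the file.

```r
library(haven)    # used here only to write a small labelled .sav file
library(foreign)

# Build a data frame with one value-labelled column (codes 1 and 2).
survey <- data.frame(id = c(1, 2, 3))
survey$gender <- labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2))

sav_path <- tempfile(fileext = ".sav")
write_sav(survey, sav_path)

# Default behaviour: labelled values are converted to a factor.
with_labels <- read.spss(sav_path, to.data.frame = TRUE)

# use.value.labels = FALSE keeps the underlying numeric codes.
with_codes <- read.spss(sav_path, to.data.frame = TRUE,
                        use.value.labels = FALSE)

str(with_labels[[2]])  # second column read as a factor
str(with_codes[[2]])   # second column read as numeric codes
```

read.spss may print warnings about record types it does not recognise; the data is still imported.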
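Reading a Stata file
Using the read.dta function from the foreign package, one can import Stata datasets. read.dta supports files saved in Stata versions 5 to 12; for files from newer versions, read_dta from the haven package can be used instead.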
library(foreign)
file_stata = read.dta("filename.dta")
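A self-contained way to try read.dta is to write a small Stata file first and read it back. The sketch below does this with write.dta, also from the foreign package; the data frame staff is invented for the example.

```r
library(foreign)

# A small data frame with made-up values.
staff <- data.frame(emp_id = 1:3, salary = c(52000, 61000, 58000))

# Write it to a temporary .dta file, then read it back.
dta_path <- tempfile(fileext = ".dta")
write.dta(staff, dta_path)
staff_back <- read.dta(dta_path)

print(staff_back)
print(class(staff_back))  # "data.frame"
```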