Ekta Aggarwal

Filtering different objects in R

Updated: Jan 12, 2021

In this tutorial we shall explore different ways to subset vectors, matrices, data frames and lists:


Points to keep in mind:

  • To select only particular elements (i.e. to subset an object) we use square brackets [ ].

  • In R, indexing starts from 1 (in some languages, e.g. Python, indexing starts from 0), so to retrieve the first element we write x[1].


Subsetting a vector...


Subsetting a numeric vector


Let us consider a numeric vector x. Since indexing in R starts from 1, we write x[1] to extract its first element and x[2] to fetch the second.
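As a minimal sketch, assume x holds three made-up values (the numbers are illustrative, not taken from the original post):

```r
# Hypothetical numeric vector with three elements
x = c(10, 20, 30)

first_element = x[1]    # indexing starts at 1
second_element = x[2]

first_element
second_element
```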

If you have worked with Python, you will know that x[-1] returns the last element there. R behaves differently: a negative index drops that position, so x[-1] returns every element except the first.

Our vector x has length 3, so length(x) returns 3 and x[length(x)] gives the last element of x. Similarly, to get the second-last element we write length(x)-1 within the square brackets, i.e. x[length(x)-1].
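A short sketch of this, again assuming illustrative values for x. It also shows what a negative index does in R, which is to drop that position rather than count from the end:

```r
x = c(10, 20, 30)   # hypothetical values

last_element = x[length(x)]              # length(x) is 3, so this is x[3]
second_last_element = x[length(x) - 1]   # this is x[2]
without_first = x[-1]                    # negative index drops the first element

last_element
second_last_element
without_first
```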


Naming the elements of a vector and using it for filtering the elements.


Let us create a vector named 'price' containing the prices of water bottles from various brands. We can attach the brand names to their respective prices using the names( ) function.
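One way this could look is sketched below. Only the brand name Vedica appears in the text; the other two brand names and all three prices are placeholders chosen for illustration.

```r
# Hypothetical prices; Brand_A and Brand_B are placeholder names
price = c(20, 60, 35)
names(price) = c("Brand_A", "Vedica", "Brand_B")

price          # prints the prices with brand names as labels
names(price)   # the labels on their own
```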

By now you must be aware that to extract the first element you can write price[1]. But what if I want to know the price of Vedica? For a named vector, we can put the name in quotes inside the square brackets:

price[1]
price["Vedica"]
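The lookup by name can be tried end to end with the sketch below. The prices and the names Brand_A and Brand_B are assumptions for illustration; unname( ) is used only to show the bare number without its label.

```r
# Hypothetical named vector; only "Vedica" comes from the text
price = c(Brand_A = 20, Vedica = 60, Brand_B = 35)

vedica_price = price["Vedica"]           # length-1 vector that keeps its name
vedica_value = unname(price["Vedica"])   # just the number
several = price[c("Brand_A", "Vedica")]  # several names at once

vedica_price
vedica_value
several
```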

Subsetting a character vector

Let us create a character vector named my_string:

my_string = c("Doctor","Engineer","Lawyer","Panda","Pikachu")

Suppose I want to keep only those elements that are equal to Panda. Writing my_string == "Panda" returns a vector of TRUE and FALSE values, which is TRUE wherever Panda is found (in our case, at the 4th index). We use this logical vector inside the square brackets to filter my_string.

my_string[my_string == "Panda"]
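To make the intermediate step visible, the sketch below stores the logical vector before using it. The variable name is_panda is introduced here for illustration only.

```r
my_string = c("Doctor","Engineer","Lawyer","Panda","Pikachu")

is_panda = my_string == "Panda"   # one TRUE/FALSE per element
is_panda
which(is_panda)                   # position of the TRUE value
my_string[is_panda]               # keeps only the matching element
```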

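To match against several values at once, we use the %in% operator, which is TRUE for every element found in the given set:

my_string[my_string %in% c("Panda","Avengers","Pikachu")]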

Finding the elements which are not equal to Panda, Avengers or Pikachu.

Ans. We put a ! sign (logical NOT) in front of the condition, which flips every TRUE to FALSE and vice versa.

my_string[!my_string %in% c("Panda","Avengers","Pikachu")]
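The sketch below puts both cases side by side. The names targets and in_targets are introduced for illustration; note that Avengers is not in my_string, so it simply matches nothing.

```r
my_string = c("Doctor","Engineer","Lawyer","Panda","Pikachu")
targets = c("Panda","Avengers","Pikachu")

in_targets = my_string %in% targets   # TRUE where the element is in targets
kept = my_string[in_targets]          # elements found in targets
dropped = my_string[!in_targets]      # everything else

kept
dropped
```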

Subsetting a numeric vector using logical operators.

Let us create a vector of 1000 elements

my_vector = 1:1000

Filtering those elements of my_vector that are greater than 500. Here my_vector > 500 returns a logical vector (containing TRUE and FALSE values) of 1000 elements, where the first 500 elements are FALSE and the rest are TRUE. We use it in the following way:

my_vector[my_vector > 500]

Filtering those elements that are greater than the mean of my_vector and saving the result:

my_result = my_vector[my_vector > mean(my_vector)]
my_result
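The two filters can be checked with the sketch below. The last line combines two conditions with &, which is an addition for illustration and not part of the original examples.

```r
my_vector = 1:1000

above_500 = my_vector[my_vector > 500]
length(above_500)                 # number of elements that passed the filter

mean(my_vector)                   # 500.5 for the integers 1 to 1000
my_result = my_vector[my_vector > mean(my_vector)]

# Conditions can be combined with & (and) or | (or)
between = my_vector[my_vector > 500 & my_vector <= 510]
between
```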

Subsetting a matrix or a DataFrame...


Since every matrix can be transformed into a data frame, the approach to filtering a matrix and a data frame is the same in this lesson.

Let us create a matrix:
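A minimal sketch of such a matrix follows. The row name Amar and the columns Physics and Biology are used later in the text; the remaining names and all the marks are assumptions made up for illustration.

```r
# Hypothetical marks; matrix() fills column by column,
# so each line of three numbers below is one subject
my_matrix = matrix(c(85, 72, 90,    # Maths
                     78, 88, 65,    # Physics
                     92, 81, 70,    # Chemistry
                     60, 95, 83),   # Biology
                   nrow = 3, ncol = 4)
rownames(my_matrix) = c("Amar", "Akbar", "Anthony")
colnames(my_matrix) = c("Maths", "Physics", "Chemistry", "Biology")
my_matrix
```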

Subsetting the first row:

my_matrix[1,]

Subsetting the first column:

my_matrix[,1]

Subsetting last row:

my_matrix[nrow(my_matrix),]

Subsetting last column:

my_matrix[,ncol(my_matrix)]


Filtering a matrix or dataframe by rownames and colnames.


Filtering data where row-name is Amar.
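Row names work inside the square brackets just like positions do. The sketch below assumes a small marks matrix whose values and extra names are made up; only Amar, Physics and Biology come from the text.

```r
# Hypothetical marks matrix with named rows and columns
my_matrix = matrix(c(85, 72, 90, 78, 88, 65, 92, 81, 70, 60, 95, 83),
                   nrow = 3,
                   dimnames = list(c("Amar", "Akbar", "Anthony"),
                                   c("Maths", "Physics", "Chemistry", "Biology")))

amar = my_matrix["Amar", ]                        # whole row as a named vector
amar_physics = my_matrix["Amar", "Physics"]       # a single cell
two_students = my_matrix[c("Amar", "Anthony"), ]  # several rows by name

amar
amar_physics
two_students
```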

Filtering data for Amar, excluding columns Physics and Biology


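In R, you can't exclude columns of a matrix or data frame by writing !c("Physics","Biology"), because ! cannot be applied to character values and R throws an error. Instead, negate a logical condition on the column names, e.g. !colnames(my_matrix) %in% c("Physics","Biology").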
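One common workaround, sketched here under the same assumed marks matrix (made-up values and extra names), is to build a logical vector from the column names and negate that. The tryCatch( ) call is only there to show that the direct negation fails without stopping the script.

```r
# Hypothetical marks matrix with named rows and columns
my_matrix = matrix(c(85, 72, 90, 78, 88, 65, 92, 81, 70, 60, 95, 83),
                   nrow = 3,
                   dimnames = list(c("Amar", "Akbar", "Anthony"),
                                   c("Maths", "Physics", "Chemistry", "Biology")))

drop_cols = c("Physics", "Biology")
keep = !colnames(my_matrix) %in% drop_cols   # TRUE for columns to keep
amar_subset = my_matrix["Amar", keep]
amar_subset

# Negating a character vector directly raises an error
failed = tryCatch({
  my_matrix["Amar", !c("Physics", "Biology")]
  FALSE
}, error = function(e) TRUE)
failed
```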


Let us create a dataframe:

x = data.frame(Name = c("Thor","Iron Man","Black Widow","Captain America"),
               Weight = c(90,80,48,75),
               Like_or_Not = c(FALSE,TRUE,TRUE,FALSE))
x

Fetching a column from a dataframe:

x$Name
x[,1]     #Alternatively
x[,"Name"]      #Alternatively
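The logical filtering used for vectors earlier carries over to the rows of a data frame. The sketch below is an illustration along those lines rather than part of the original examples; stringsAsFactors = FALSE is added so that Name stays a character column on older R versions too.

```r
x = data.frame(Name = c("Thor","Iron Man","Black Widow","Captain America"),
               Weight = c(90,80,48,75),
               Like_or_Not = c(FALSE,TRUE,TRUE,FALSE),
               stringsAsFactors = FALSE)

heavy = x[x$Weight > 75, ]                  # rows where Weight exceeds 75
liked_names = x$Name[x$Like_or_Not]         # names where Like_or_Not is TRUE
no_weight = x[, !colnames(x) %in% "Weight"] # every column except Weight

heavy
liked_names
no_weight
```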

Subsetting a List..


Let us create a list. my_list contains 3 elements, all of which are vectors of different lengths:

my_list = list(Names = c("Archie","Henry","John","Leonard"),Gender = c("F","M","M","M","F","F"),
               Salary = c(50000,100000,45000))
my_list

Extracting the first element of a list

There are 3 ways of extracting the first element of the list: using a dollar sign, using [ ] brackets and using [[ ]] brackets.
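The three forms can be compared with the sketch below. The variable names by_dollar, by_double and by_single are introduced for illustration. Note that $ and [[ ]] return the vector itself, while [ ] returns a list of length one that still wraps the vector.

```r
my_list = list(Names = c("Archie","Henry","John","Leonard"),
               Gender = c("F","M","M","M","F","F"),
               Salary = c(50000,100000,45000))

by_dollar = my_list$Names   # character vector
by_double = my_list[[1]]    # the same character vector
by_single = my_list[1]      # a list containing that vector

class(by_dollar)
class(by_double)
class(by_single)
```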

What if we need the first element (Archie) of the first element (Names) of my_list? Since my_list[1] is a list of one element, writing [1] after it, i.e. my_list[1][1], returns that same one-element list rather than Archie.

Thus you need to use [[ ]] to first get the character vector and then write [1] after it, i.e. my_list[[1]][1], to retrieve the first element (Archie).
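The difference shows up clearly when the two forms are run side by side, as in this sketch (the variable names are illustrative):

```r
my_list = list(Names = c("Archie","Henry","John","Leonard"),
               Gender = c("F","M","M","M","F","F"),
               Salary = c(50000,100000,45000))

same_list = my_list[1][1]     # still a list holding the Names vector
archie = my_list[[1]][1]      # first element of the Names vector
archie_too = my_list$Names[1] # equivalent, using the dollar sign

same_list
archie
archie_too
```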

