In SQL interviews, sub-queries as an integral part to ask questions from. But what are sub-queries?
What is a sub-query?
When a SQL query is used inside another SQL query then it is called a sub-query. Sub-queries in SQL are mostly placed in SELECT, FROM or WHERE statements.
Output of a sub-query:
A sub-query can result in following outputs:
A scalar number
A vector
A SQL dataset
Prerequisites:
WHERE: Click here to review the tutorial Aggregate functions and GROUP BY: Click here to review the tutorial Joins in SQL: Click here to review the tutorial
Dataset:
For this tutorial we shall make use of 2 datasets: patient_info and hospital_record
CREATE TABLE patient_info(
patient_id int,
department varchar(20),
gender char(1),
age int)
INSERT INTO
patient_info(patient_id,department,gender,age)
VALUES
(1001,'Cardiology','M',24),
(1002,'Radiology','M',32),
(1003,'E&A','F',44),
(1004,'Radiology','M',32),
(1005,'Neurology','M',38);
CREATE TABLE hospital_record(
patient_id int,
department varchar(20),
length_of_stay int,
last_visit_date date)
INSERT INTO
hospital_record(patient_id,department,length_of_stay,last_visit_date)
VALUES
(1001,'Cardiology',20,'12Aug2020'),
(1002,'Radiology',0,'13Feb2019'),
(1004,'Radiology',2,'31Jan2020'),
(1005,'Neurology',4,'9Aug2018'),
(1005,'Cardiology',18,'9Aug2018'),
(1004,'Neurology',65,'19Jan2020');
Our data looks as follows:
SELECT * FROM patient_info;
SELECT * FROM hospital_record;
Subquery in WHERE statement:
Task: Retrieve information from patient_info where patient's age is more than the average age of all the patients.
Firstly we will try to get the average age of all the patients using following query:
SELECT avg(age) FROM patient_info
Above query will result in a scalar as an output. We need to get the rows where age of the patients is more than 34 years, thus we use the above query inside our main query.
SELECT * from patient_info
WHERE age > (SELECT avg(age) FROM patient_info);
Task: Retrieve information from patient_info where patient's record is available in hospital_record. Let us firstly select the list of patient id from hospital_record.
SELECT distinct patient_id FROM hospital_record
We need to get the information of only these patient id from patient_info, thus above query will be used as a subquery in our main query.
SELECT * FROM patient_info
WHERE patient_id IN (SELECT distinct patient_id FROM hospital_record)
Subquery in FROM statement:
Task: Get average age and average stay duration for each department. Firstly we will join both the tables patient_info and hospital_record to get the information in one single table.
SELECT a.*,b.length_of_stay FROM patient_info as a
INNER JOIN hospital_record as b
ON a.patient_id = b.patient_id AND a.department = b.department;
Note: The above query results in a table as an output. In the below query, after the subquery we MUST provide an alias (in our case our alias name is 'subquery')
SELECT department, avg (age) as avg_Age, avg(length_of_stay) as avg_stay_duration FROM
(SELECT a.*,b.length_of_stay FROM patient_info as a
INNER JOIN hospital_record as b
ON a.patient_id = b.patient_id AND a.department = b.department) as subquery
GROUP BY department;
Subquery in SELECT statement:
Task: Get average age for each department and overall average age for all the patients. Following query results in average of all the patients. (which is 34 years) (select avg(age) from patient_info)
Now we can use this same query in our main query to get department wise average age and overall age in one single table.
SELECT department, avg(age) as dept_avg_Age, (select avg(age) from patient_info) as total_avg_age
FROM patient_info
GROUP BY department
Task: Calculate the difference between average department age and overall average age.
SELECT department, avg(age) as dept_avg_Age, avg(age) - (select avg(age) from patient_info) as diff
FROM patient_info
GROUP BY department
Comments