In SQL, GROUP BY is used to aggregate the data and calculate summary statistics like averages, median, minimum, maximum, totals etc. on the basis of categorical variable(s).
HAVING key word helps us to filter the rows fulfilling particular condition (s).
Note: On grouped data WHERE command fails to filter the data for each group.
Dataset:
For this tutorial we shall make use of employee_performance:
CREATE TABLE employee_performance(
employee_id int,
department varchar(20),
education_level varchar(20) ,
gender char(1),
no_of_trainings int,
age int,
previous_year_rating numeric(2,1),
length_of_service int,
KPIs_met char(1),
avg_training_score numeric(5,2),
promoted_or_not varchar(3))
INSERT INTO
employee_performance(employee_id,department,education_level,gender,no_of_trainings,age,previous_year_rating,length_of_service,KPIs_met,avg_training_score,promoted_or_not)
VALUES
(1001,'Marketing','Graduate','M',2,24,NULL,1,'Y',69.5,'N'),
(1002,'Analytics','Post-Graduate','M',5,32,4.5,7,'Y',89.5,'Y'),
(1003,'R&D','Graduate','F',5,44,5,10,'Y',87,'Y'),
(1004,'HR','Graduate','M',1,32,3,3,'Y',54,'N'),
(1005,'Marketing','PhD','M',0,38,4,7,'N',79.5,'N'),
(1006,'IT','Graduate','F',2,23,NULL,0,'Y',83.5,'N'),
(1007,'Analytics','Post-Graduate','M',2,28,5,4,'Y',78.5,'Y'),
(1008,'Sales','Graduate','F',5,22,NULL,0,'Y',69.5,'N'),
(1009,'Marketing','PhD','F',2,48,3,7,'Y',98,'Y'),
(10010,'Marketing','PhD','M',10,39,5,4,'Y',71,'N'),
(10011,'Finance','Post-Graduate','M',3,35,3.5,10,'N',55,'N'),
(10012,'Sales','Post-Graduate','F',2,55,4,19,'Y',87,'Y'),
(10013,'HR','Graduate','M',0,48,2.5,21,'Y',72.5,'N'),
(10014,'IT','Graduate','F',2,24,3.5,12,'Y',64,'N');
Our dataset looks as follows:
SELECT * FROM employee_performance;
GROUP BY
Task: For each department calculate the total number of training sessions conducted, how many employees took the training, and average training score.
SELECT department,sum(no_of_trainings) as total_trainings_attended, count(employee_id)
as employee_strength,avg(avg_training_score) as avg_training_score
FROM employee_performance
GROUP BY department;
You can use several aggregate functions when making use of GROUP BY command. You can refer to this article for more details:
Aggregate functions in SQL
GROUP BY + HAVING
Task: Calculate the total number of training sessions conducted, how many employees took the training, and average training score for the departments where at least 5 training sessions have been conducted.
Firstly the data gets grouped by department and then HAVING command words on filtering the grouped data.
SELECT department,sum(no_of_trainings) as total_trainings_attended, count(employee_id)
as employee_strength,avg(avg_training_score) as avg_training_score
FROM employee_performance
GROUP BY department
HAVING sum(no_of_trainings) > 5;
GROUP BY + HAVING + ORDER BY
Task: Sort the above table by average training score.
If you wish to sort the grouped data, then ORDER BY keyword is placed after GROUP BY and HAVING commands.
SELECT department,sum(no_of_trainings) as total_trainings_attended, count(employee_id)
as employee_strength,avg(avg_training_score) as avg_training_score
FROM employee_performance
GROUP BY department
HAVING sum(no_of_trainings) > 5
ORDER BY avg_training_score desc;
Comments