Locating content of highest value groups in SQL

SQL (Structured Query Language) code on computer monitor and server room background. Example of SQL code to query data from a database.

Quite often, while skimming through SQL to prepare for interviews, one comes across the question of finding the employee with the highest or 2nd highest salary by joining a table that holds employee information with another that contains department information. This begs the question: what about finding the employee who earns the nth highest salary department-wise?

There may also be a scenario where a particular department(s) does/do not contain an nth highest value, for example, a department with only two employees will not have an employee earning the 3rd highest salary. This article tends to emphasize such a scenario.

So, let’s begin by creating two tables:

1. Department

This table includes fields such as: dept_id and dept_name.

CREATE TABLE department (
    dept_id int,
    dept_name varchar(60)
);

Let’s insert various departments into the new table.

INSERT INTO department (dept_id,dept_name)
VALUES (780,'HR');
INSERT INTO department (dept_id,dept_name)
VALUES (781,'Marketing');
INSERT INTO department (dept_id,dept_name)
VALUES (782,'Sales');
INSERT INTO department (dept_id,dept_name)
VALUES (783,'Web Dev');

2. Employee

This table incorporates the fields: first_name, last_name, dept_id, and salary.

CREATE TABLE employee (
    first_name varchar(100),
    last_name varchar(100),
    dept_id int,
    salary int
);

Inserting values into the table:

INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Sam','Burton',781,80000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Peter','Mellark',780,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Happy','Hogan',782,110000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Steve','Palmer',782,120000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Christopher','Walker',783,140000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Richard','Freeman',781,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Alex','Wilson',782,115000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Harry','Simmons',781,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Thomas','Henderson',780,95000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Ronald','Thompson',783,130000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('James','Martin',783,135000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Laurent','Fisher',780,100000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Brooks',780,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Bennington',783,140000);

Fig 2. Employee Table ordered by dept_id

The following is inferred from the employee table:

Department_ID:Number of employees:

780:4

781:3

782:3

783:4

To understand what this article aims to achieve, let us observe the desired output for different inputs.

Considering we want the information of the 2nd highest earning employees of different departments along with their department’s name (considering using DENSE_RANK), the output shall be as follows:

Fig 3. Information of 2nd highest earning employees department-wise

If we apply the same query for finding the 4th highest earning employees, the following will be the output:

Observe that though department 783 has four employees, there are two who will be classified as 3rd highest earning employees of that department. Now, to the main question: What if we wanted to display the dept_ID and dept_name with null values for employee-related fields when the department(s) does/do not have an nth highest-earning employee? For this, the result portrayed in Fig. 4 shall be modified as follows:

The table displayed in Fig. 5 is what we will aim to obtain when certain departments do not have an nth highest earning employees.

The ultimate query that helps obtain the table in Fig 5. is as follows:

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
from (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * from null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;

It may look overwhelming, but let’s break it down step-by-step:

(i) Using DENSE_RANK() to display employee and department information (not involving null for the absence of the nth highest earning member):

SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4;

Output:

(ii) Excluding the ‘rank1’ column from the table in Fig 6:

select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A;

Output:

(iii) Pointing out the departments from the department table that do not have an nth highest earning employee:

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null))
SELECT * FROM full1)B;

Output:

Replace the ‘full1’ in the last line of the above code with ‘null1’:

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null))
SELECT * FROM null1)B;

Now, we need to fill the null values of the dept_id and dept_name of the table in Fig. 9 with the corresponding values in the table in Fig. 8.

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * from nulled) B;

The ‘nulled’ query uses CASE WHEN on the nulls encountered in the dept_id and dept_name columns of ‘null1’ table to replace them with the corresponding values in the ‘full1’ table. Now, all we need to do is apply UNION on the tables obtained in Fig. 7 and Fig. 10. This can be accomplished by declaring the last query in the previous code using WITH and then UNION-izing it with ‘null1’.

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
from (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * from null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;

It can be inferred from Fig. 11 that ‘Marketing’, ‘Sales’, and ‘Web Dev’ are the departments with no employee earning the 4th highest salary.