Wednesday, July 27, 2011

SQL query to get last and second last payments to employees

Joe Celko is an American relational database expert. He has participated on the ANSI X3H2 Database Standards Committee, and helped write the SQL-89 and SQL-92 standards. The above lines would explain his innumerable contributions to RDBMS world. His “SQL Puzzles and Answers” is another golden feather on his cap. If you are an SQL geek and you’ve never been come across this book then please try this. I promise you will never be disappointed. Here is a puzzle from this book.

Assume that there is salary table with three columns hold employee name, salary date and salary. This table holds salary information paid to employees. The requirement is to write a query that should return employee’s current and previous salary status for each employee.

You can use the following statement and the pl/sql block to create and populate sample data to test the solutions.

CREATE TABLE emp ( NAME VARCHAR2(50), sal_date DATE, salary NUMBER(15,2))

BEGIN
FOR i IN 1 .. 10
LOOP
INSERT INTO emp
VALUES ('James', ADD_MONTHS (TO_DATE(‘01-01-2010’,’DD-MM-YYYY’), i), 5000 * i);

INSERT INTO emp
VALUES ('Rajendran', ADD_MONTHS (TO_DATE(‘01-01-2010’,’DD-MM-YYYY’), i), 5100 * i);

INSERT INTO emp
VALUES ('Krishna', ADD_MONTHS (TO_DATE(‘01-01-2010’,’DD-MM-YYYY’), i), 5200 * i);

INSERT INTO emp
VALUES ('Jagdish', ADD_MONTHS (TO_DATE(‘01-01-2010’,’DD-MM-YYYY’), i), 5300 * i);


END LOOP;
END;

Here is my first solution to this problem,

SELECT NAME,
(SELECT SUM (salary)
FROM emp emp_inner1
WHERE emp_inner1.sal_date > emp_outer.second_last
AND emp_inner1.NAME = emp_outer.NAME) last_salary,
(SELECT salary
FROM emp e1
WHERE e1.NAME = emp_outer.NAME
AND e1.sal_date = emp_outer.second_last) second_last_salary
FROM (SELECT NAME, MAX (sal_date) second_last
FROM emp emp1
WHERE sal_date < (SELECT MAX (sal_date)
FROM emp emp2
WHERE emp2.NAME = emp1.NAME)
GROUP BY NAME) emp_outer;

I think this is a little expensive query. The main query fetches the second last salary data for each employee. The sub queries in select clause find the last and second last salary by using second last salary date.

But the above query fails when there is a new employee who has got only one payment. The above query does not report the new employee at all. The following solution which looks better than the above also takes care of employees with only one payment.

SELECT NAME,
(SELECT salary
FROM emp emp_inner1
WHERE emp_inner1.sal_date = emp_outer.latest
AND emp_inner1.NAME = emp_outer.NAME) latest,
(SELECT salary
FROM emp emp_inner2
WHERE emp_inner2.sal_date = emp_outer.second_latest
AND emp_inner2.NAME = emp_outer.NAME) second_latest
FROM (SELECT e1.NAME, MAX (e1.sal_date) latest,
MAX (e2.sal_date) second_latest
FROM emp e1, emp e2
WHERE e1.NAME = e2.NAME(+) AND e1.sal_date > e2.sal_date(+)
GROUP BY e1.NAME) emp_outer;

The main query fetches both last and second last payment date using self-outer join. This outer join makes sure that employees with only one payment are not missed out. I'm hoping that I will be updating this post with an even better query one day


No comments: