I want to quickly go over some useful analytic functions introduced in sql server 2012 and how you can use it to accomplish some cool tasks in your data warehouse. Lets take a scenario where you have many employees who makes sales to various customers over the years. You are interested in generating report which gives you the employee with the highest sales and the one with the lowest sales each year. This report will help you identify best performing employee in terms of sales over the years or to find out who is always performing poorest over the years.
Lets generate some data to illustrate how to use the analytic functions First_Value and Last_value to easily extract this information.
We create employee table and insert some dummy data of employee names with below
CREATE table #employee(employeeid INT IDENTITY(1,1) PRIMARY KEY,FullName VARCHAR(255))
INSERT INTO #employee(FullName)
VALUES(‘Jane Doe1’),
(‘Jane Doe2’),
(‘Jane Doe3’),
(‘Jane Doe4’)
Now lets create Employee sales table to contain the total sales each year for our employees and insert some dummy total sales for the employees
CREATE TABLE #EmployeeSales(EmployeeID INT, SaleYear INT, TotalSales Money)
INSERT INTO #EmployeeSales(EmployeeID,SaleYear,TotalSales)
VALUES(1,2014,1000.00),
(1,2015,2000.00),
(1,2016,3000.00),
(1,2017,4000.00),
(1,2018,5000.00),
(2,2014,1500.00),
(2,2015,500.00),
(2,2016,300.00),
(2,2017,400.00),
(2,2018,5010.00),
(3,2014,8000.00),
(3,2015,2010.00),
(3,2016,3010.00),
(3,2017,4010.00),
(3,2018,5010.00),
(4,2015,2000.00),
(4,2016,4000.00),
(4,2017,8000.00),
(4,2018,5000.00)
Ok, now our task is given the employee sales table above, generate report showing the highest seller and lowest seller each year. Note that some employees do not have sales in some year.
Here is the query. lets explain
SELECT s.SaleYear,e.FullName,s.TotalSales,
FIRST_VALUE(e.FullName) OVER(PARTITION BY s.SaleYear ORDER BY s.TotalSales DESC) AS HighestSale,
LAST_VALUE(e.FullName) OVER(PARTITION BY s.SaleYear ORDER BY s.TotalSales DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LowestSale
FROM #EmployeeSales s
INNER JOIN #employee e ON e.employeeid = s.EmployeeID
Looking at the resultset, in 2014 we can see Jane Doe3 has the highest totalsales follow by Doe2 and Doe1. You can see Doe4 has no sales in 2014.
You can interpret the result set that its giving you the ordering of the totalsales descending for each saleyear and giving the highest seller and lowest seller in that year. In this resultset you can see all the employees and their total sales.
How about if we are interested in getting only the record for the highest seller and the lowest seller and their corresponding sales so we can have a smaller resultset for our report?
This can be accomplished differently but l will go with a common table expression for the query above and select from it what l want to use for my final report.
;WITH Sales_cte(SaleYear,FullName,TotalSales,HighestSale,LowestSale)
AS
(SELECT s.SaleYear,e.FullName,s.TotalSales,
FIRST_VALUE(e.FullName) OVER(PARTITION BY s.SaleYear ORDER BY s.TotalSales DESC) AS HighestSale,
LAST_VALUE(e.FullName) OVER(PARTITION BY s.SaleYear ORDER BY s.TotalSales DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LowestSale
FROM #EmployeeSales s
INNER JOIN #employee e ON e.employeeid = s.EmployeeID)
SELECT SaleYear,Max(TotalSales) AS HighestSalesTotal,Min(TotalSales) AS LowestSalesTotal,MAX(HighestSale) AS HighestSalesPerson,MAX(LowestSale) AS LowestSalesPerson
FROM Sales_cte
GROUP BY SaleYear
With this l am able to see clearly each year who is the best seller and who is the poorest seller no matter how many employees l have.
There are some other analytic functions such as LEAD,LAG, PERCENT_RANK,CUME_DISC,PERCENTILE_CONT,PERCENTILE_DISC etc all introduced in sql server 2012 so its good to know how this functions behave and take advantage of the functionality.
Hope you find this interesting. Let me know.