Windows Functions

Since solving any reasonable SQL problem requires a combination of all the topics covered here, hence it becomes difficult to seggregate problems based on one topic alone. So for SQL we are creating a dedicated Problems section. Theoretical and Basic questions will still be under their dedicated sections.

Reference: 📖Explanation, 🔫Playground

Window (also, windowing or windowed) functions perform a calculation over a set of rows. I like to think of “looking through the window” at the rows that are being returned and having one last chance to perform a calculation. The window is defined by the OVER clause which determines if the rows are partitioned into smaller sets and if they are ordered. They allow you to add your favourite aggregate function to a non-aggregate query. Similar to Transform is pandas group by clause.

Common Windows Functions

  • Ranking functions

    • ROW_NUMBER: is used to add unique row numbers to a partition or to the entire result set. It has the ability to turn non-unique rows into unique rows

    • RANK: it will give numbers same as row_number just that same data will get same rank

    • DENSE_RANK: doesnot skip and rank number

    • NTILE: It assigns bucket numbers to the rows instead of row numbers or ranks

  • Offset functions

    • LAG: the function allows you to pull columns or expressions from a row before the current row

    • LEAD: the function allows you to pull columns or expressions from a row after the current row

    • FIRST_VALUE: the functions allows you to return values from the first row of the partition

    • LAST_VALUE: the functions allows you to return values from the last row of the partition

  • Statistical functionsPERCENT_RANK: returns the percentage of rows that rank lower than the current row, its formula is Rank1Row count1\frac{\text{Rank} -1}{\text{Row count} -1}

    • CUME_DIST: cumulative distribution, returns the exact rank, its formula is RankRow count\frac{\text{Rank}}{\text{Row count}}

    • PERCENTILE_DISC & PERCENTILE_CONT: these two work in the opposite way. Given a percent rank, find the value at that rank. They differ in that PERCENTILE_DISC will return a value that exists in the set while PERCENTILE_CONT will calculate an exact value if none of the values in the set falls precisely at that rank. You can use PERCENTILE_CONT to calculate a median by supplying 0.5 as the percent rank. For example, which temperature ranks at 50% in St. Louis?

One problem is you cannot add window functions to the WHERE clause. But certain Data Bases like TeraData, Snowflake, Databricks, Big Query, etc. have support for something called QUALIFY. Let's take an example to understand the difference:

In traditional SQL you will write the below SQL:

select * from (SELECT Product, Region, Revenue, ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Revenue DESC) as rn FROM Sales ) where rn=1

Here's how you can use the QUALIFY keyword to achieve this:

SELECT Product, Region, Revenue FROM Sales QUALIFY ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Revenue DESC) = 1;

Remember you CAN include window function and GROUP BY in the same statement, in such case the WINDOW function is evaluated after GROUP BY. As an example, below we can get both the user2 count at a user1 level along with total number of users in the table in the same query:

select user1 
, 100*() as FLOAT)/CAST ()) as popularity_percent
from facebook_friends
group by user1

Last updated