Problems

If you want hands-on practice without the hassle of installing and setting up the required software on your local machine, DB Fiddle provides a free SQL sandbox. Many of the problems below already link to a prebuilt sandbox, but it is recommended that you set up your own sandbox to experiment in.

[Leetcode] Second highest salary

For a similar problem with a different approach, see the Nth highest salary problem.

Write a SQL query to get the second highest salary from the Employee table.

| Id | Salary |
|----|--------|
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.

Answer

Multiple solutions are possible; two approaches are given below for reference.

SELECT
(SELECT DISTINCT(Salary)
FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1) 
AS SecondHighestSalary

I feel the solution below is more complete, because it can be extended to return the id as well and it handles multiple employees with the same salary:

with cte as (
select 
salary,
dense_rank() over(order by salary desc) as rnk
from Employee)

-- max() returns null when no row has rnk = 2
select max(salary) as SecondHighestSalary
from cte where rnk = 2
[Leetcode] Rank Scores

Reference - Leetcode

Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.

| Id | Score |
|----|-------|
| 1  | 3.40  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.50  |
| 5  | 4.00  |
| 6  | 3.65  |

For example, given the above Scores table, your query should generate the following report (order by highest score):

| Score | Rank    |
|-------|---------|
| 4.00  | 1       |
| 4.00  | 1       |
| 3.65  | 2       |
| 3.65  | 2       |
| 3.50  | 3       |
| 3.40  | 4       |

Answer

The tie-handling method the question asks for is called dense rank. Using rank instead would leave "holes" in the ranking.

select 
Score, dense_rank() over(order by score desc) as Rank
from Scores
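
To see the "holes" side by side, here is a small sketch that runs both window functions on the Scores table above. It assumes SQLite (3.25 or newer, for window functions) driven from Python's built-in sqlite3 module; the column aliases `rank_with_holes` and `dense` are my own names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scores (Id INTEGER, Score REAL);
INSERT INTO Scores VALUES
  (1, 3.40), (2, 3.65), (3, 4.00), (4, 3.50), (5, 4.00), (6, 3.65);
""")

# RANK skips numbers after a tie, DENSE_RANK does not.
query = """
SELECT Score,
       RANK()       OVER (ORDER BY Score DESC) AS rank_with_holes,
       DENSE_RANK() OVER (ORDER BY Score DESC) AS dense
FROM Scores
ORDER BY Score DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, rank jumps from 1 to 3 after the tie at 4.00, while dense rank continues with 2.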
[CHEWY] 2nd Highest score
| Id | subject | marks |
|---:|---------|------:|
|  1 | Maths   |    30 |
|  1 | Phy     |    50 |
|  1 | Chem    |    85 |
|  2 | Maths   |    90 |
|  2 | Phy     |    50 |
|  2 | Chem    |    85 |

Select the second highest mark for each student.

Answer

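with CTE as(
	-- dense_rank keeps ranks consecutive when marks tie
	select *, dense_rank() over(partition by Id order by marks desc) as rnk from tablename
)
-- rnk = 2 is the second highest mark (rnk = 1 would be the highest)
select Id, subject, marks from CTE where rnk = 2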
[Leetcode] Consecutive Numbers

Reference - Leetcode

Write an SQL query to find all numbers that appear at least three times consecutively.

Return the result table in any order.

Input:

Logs table:

| Id | Num |
|----|-----|
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 1   |
| 6  | 2   |
| 7  | 2   |

Result table:

| ConsecutiveNums |
|-----------------|
| 1               |

1 is the only number that appears at least three times consecutively.

Answer

Multiple solutions are possible; one of them is given below.

with a(Num,NextNum,SecondNextNum ) as(

	SELECT   Num
	         , LEAD(Num, 1) OVER (ORDER BY Id) AS NextNum
	         , LEAD(Num, 2) OVER (ORDER BY Id) AS SecondNextNum
	      FROM Logs
	      
	)

	select distinct(Num) as ConsecutiveNums from a
	where
	Num = NextNum
	and Num = SecondNextNum
[SALESFORCE] User Growth

Playground

You are given user data for 2 accounts over 2 months. Calculate the growth rate of users in each account, where growth rate is defined as the number of unique users in month 2 divided by the number of unique users in month 1.

| date_details | account_id | user_id |
|--------------|------------|---------|
| 2021-01-01   | U1         | A1      |
| 2021-01-01   | U1         | A2      |
| 2021-01-01   | U1         | A3      |
| 2021-01-01   | U1         | A4      |
| 2021-02-01   | U1         | A1      |
| 2021-02-01   | U1         | A2      |
| 2021-02-01   | U1         | A3      |
| 2021-02-01   | U1         | A4      |
| 2021-02-01   | U1         | A5      |
| 2021-01-01   | U2         | A1      |
| 2021-01-01   | U2         | A2      |
| 2021-01-01   | U2         | A3      |
| 2021-02-01   | U2         | A1      |
| 2021-02-01   | U2         | A2      |

Answer

with cte as (
	select account_id, count(distinct(user_id)) as unique_user, MONTH(date_details) as user_month from tablename
	group by account_id, MONTH(date_details)
	)

-- multiply by 1.0 before dividing so the ratio is not truncated by integer division
select a.account_id,month_2,month_1,
1.0*month_2/month_1 as growth  from 
(select account_id, unique_user as month_1
from cte where user_month = 1)a
left join
(select account_id, unique_user as month_2
from cte where user_month = 2)b
on (a.account_id = b.account_id)
[SALESFORCE] Month over Month Revenue

Playground

You have 2 tables:

  • transactions: date, prod_id, quantity

  • products: prod_id, price

Calculate the month over month revenue. For example, the month over month revenue for month 2 is month2_Revenue - month1_Revenue.

Answer
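
One possible approach, as a sketch rather than a reference solution: compute revenue per month as the sum of quantity times price, then subtract the previous month's revenue with LAG. The example assumes SQLite 3.25+ through Python's sqlite3 module, and the sample rows are made up for illustration; only the table and column names come from the problem statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (prod_id INTEGER, price REAL);
CREATE TABLE transactions (date TEXT, prod_id INTEGER, quantity INTEGER);
-- made-up sample data
INSERT INTO products VALUES (1, 10.0), (2, 20.0);
INSERT INTO transactions VALUES
  ('2021-01-05', 1, 2),
  ('2021-01-20', 2, 1),
  ('2021-02-03', 1, 5),
  ('2021-03-10', 2, 4);
""")

query = """
WITH monthly AS (
  SELECT strftime('%Y-%m', t.date) AS month,
         SUM(t.quantity * p.price) AS revenue
  FROM transactions t
  JOIN products p ON p.prod_id = t.prod_id
  GROUP BY strftime('%Y-%m', t.date)
)
SELECT month,
       revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) AS mom_revenue
FROM monthly
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first month has no previous month, so its month over month value is NULL. In other databases, swap `strftime` for the local month-truncation function.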

[SALESFORCE] Retention Rate

(Source)

Find the monthly retention rate of users for each account separately for Dec 2020 and Jan 2021. Retention rate is the percentage of active users an account retains over a given period of time. In this case, assume the user is retained if he/she stays with the app in any future months. For example, if a user was active in Dec 2020 and has activity in any future month, consider them retained for Dec. You can assume all accounts are present in Dec 2020 and Jan 2021. Your output should have the account ID and the Jan 2021 retention rate divided by Dec 2020 retention rate.

Note: I believe the official solution provided on the website is not correct as of 25-10-2023

Answer

[SALESFORCE] Employee earning more than their manager

Reference - Leetcode

Write an SQL query to find the employees who earn more than their managers.

Output will be: Joe

Answer
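
A self join is one way to do this. The sketch below assumes an Employee table with columns Id, Name, Salary and ManagerId, which is my reading of the Leetcode schema since the table is not reproduced here; the sample rows are likewise assumed. It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and sample data
CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, ManagerId INTEGER);
INSERT INTO Employee VALUES
  (1, 'Joe',   70000, 3),
  (2, 'Henry', 80000, 4),
  (3, 'Sam',   60000, NULL),
  (4, 'Max',   90000, NULL);
""")

# Join each employee (e) to their manager (m) and compare salaries.
query = """
SELECT e.Name AS Employee
FROM Employee e
JOIN Employee m ON m.Id = e.ManagerId
WHERE e.Salary > m.Salary
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Employees without a manager drop out of the inner join, which is what we want here.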

[Leetcode] Highest Salary in each Department

Reference - Leetcode

Write an SQL query to find employees who have the highest salary in each of the departments.

Answer
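
A sketch of one approach: keep the employees whose salary equals the maximum salary of their own department. The schema (Employee with Id, Name, Salary, DepartmentId and Department with Id, Name) and the sample rows are assumptions on my part. The example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and sample data
CREATE TABLE Department (Id INTEGER, Name TEXT);
CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, DepartmentId INTEGER);
INSERT INTO Department VALUES (1, 'IT'), (2, 'Sales');
INSERT INTO Employee VALUES
  (1, 'Joe',   70000, 1),
  (2, 'Jim',   90000, 1),
  (3, 'Henry', 80000, 2),
  (4, 'Sam',   60000, 2),
  (5, 'Max',   90000, 1);
""")

# The correlated subquery finds the top salary in the employee's department.
query = """
SELECT d.Name AS dept_name, e.Name AS emp_name, e.Salary
FROM Employee e
JOIN Department d ON d.Id = e.DepartmentId
WHERE e.Salary = (SELECT MAX(Salary)
                  FROM Employee
                  WHERE DepartmentId = e.DepartmentId)
ORDER BY dept_name, emp_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ties are kept, so a department with two top earners returns both of them.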

[AMAZON] Cumulative Sum

Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.

Playground

Answer
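
One way to approach it, as a sketch: count new users per day, then take a running sum partitioned by month so the total restarts each month. The users table is assumed to have columns id and created_at, and the rows are made up. The example needs SQLite 3.25+ and uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and made-up sample data
CREATE TABLE users (id INTEGER, created_at TEXT);
INSERT INTO users VALUES
  (1, '2020-01-01'), (2, '2020-01-01'), (3, '2020-01-02'),
  (4, '2020-01-05'), (5, '2020-02-01'), (6, '2020-02-03'),
  (7, '2020-02-03');
""")

query = """
WITH daily AS (
  SELECT date(created_at) AS day, COUNT(*) AS new_users
  FROM users
  GROUP BY date(created_at)
)
SELECT day,
       SUM(new_users) OVER (
         PARTITION BY strftime('%Y-%m', day)  -- restart the total each month
         ORDER BY day
       ) AS monthly_cumulative
FROM daily
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```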

Tree Structure Labeling

Playground

Input:

Write SQL that labels each node as a “leaf”, “inner” or “Root” node, so that for the nodes above the output is:

Output:

Answer
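
A sketch of one approach, assuming the input is a table named tree with columns node and parent (the actual input is not reproduced here, so both the schema and the rows are assumptions). A node with no parent is the root, a node that appears as someone's parent is inner, and everything else is a leaf. It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and sample data
CREATE TABLE tree (node INTEGER, parent INTEGER);
INSERT INTO tree VALUES (1, 2), (2, 5), (3, 5), (4, 3), (5, NULL);
""")

query = """
SELECT node,
       CASE
         WHEN parent IS NULL THEN 'Root'
         WHEN node IN (SELECT parent FROM tree WHERE parent IS NOT NULL) THEN 'inner'
         ELSE 'leaf'
       END AS label
FROM tree
ORDER BY node
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The `parent IS NOT NULL` filter in the subquery matters if you ever flip the check to NOT IN, because a NULL in the list makes NOT IN return no rows.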

[FACEBOOK] Binning data

Playground

Input:

Bin the videos into groups of 5 seconds each.

Output:

A similar question was asked at Facebook, but instead of video length the task was to write a SQL query to create a histogram of the number of comments per user in January 2020. Because the approach is the same, it is not included here.

Answer
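
A sketch of the usual trick: integer-divide the length by the bin width and multiply back to get the start of each bin. The table name videos, its columns video_id and length_secs, and the rows are assumptions, as is the choice of half-open bins (a 5 second video falls into the 5 to 10 bin). The example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and made-up sample data
CREATE TABLE videos (video_id INTEGER, length_secs INTEGER);
INSERT INTO videos VALUES (1, 1), (2, 4), (3, 5), (4, 7), (5, 9), (6, 12);
""")

query = """
WITH binned AS (
  -- integer division drops the remainder, so 7 / 5 * 5 = 5
  SELECT (length_secs / 5) * 5 AS bin_start
  FROM videos
)
SELECT bin_start, bin_start + 5 AS bin_end, COUNT(*) AS video_count
FROM binned
GROUP BY bin_start
ORDER BY bin_start
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If the length column is not an integer type, wrap the division in a floor function (or a cast) so the bins still line up.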

[DROPBOX] Closest SAT Scores

Playground

Given a table of students and their SAT test scores, write a query to return the two students with the closest test scores, along with the score difference. If multiple pairs share the same score difference, pick one at random.

Input:

Output:

Answer
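
A sketch of one approach: self join the table so every pair of students appears once, then sort by the absolute score difference and keep the top row. The table name scores, its columns id, student and score, and the rows are assumptions. It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and made-up sample data
CREATE TABLE scores (id INTEGER, student TEXT, score INTEGER);
INSERT INTO scores VALUES
  (1, 'Jack', 1700), (2, 'Alice', 2250), (3, 'Miles', 2200), (4, 'Scott', 2100);
""")

query = """
SELECT a.student AS student_one,
       b.student AS student_two,
       ABS(a.score - b.score) AS score_diff
FROM scores a
JOIN scores b ON a.id < b.id   -- each unordered pair exactly once
ORDER BY score_diff
LIMIT 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

When several pairs tie on the smallest difference, LIMIT 1 returns an arbitrary one of them, which matches the "random pick" allowance in the question.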

[AMAZON] Average Distance between Cities

Playground

You are given a table with varying distances between various cities. How do you find the average distance between each pair of cities?

Output:

Another variant of this question is:

"Write a query to create a new table, named flight routes, that displays unique pairs of two locations."

Answer
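
The catch is that A to B and B to A are the same pair. A sketch of one approach is to put the two city names in a fixed order before grouping. The table name distances, its columns source, destination and distance, and the rows are assumptions. It runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and made-up sample data
CREATE TABLE distances (source TEXT, destination TEXT, distance INTEGER);
INSERT INTO distances VALUES
  ('A', 'B', 20), ('B', 'A', 30), ('A', 'B', 25),
  ('C', 'D', 10), ('D', 'C', 20);
""")

query = """
SELECT CASE WHEN source < destination THEN source ELSE destination END AS city1,
       CASE WHEN source < destination THEN destination ELSE source END AS city2,
       AVG(distance) AS avg_distance
FROM distances
GROUP BY city1, city2
ORDER BY city1, city2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Dropping the AVG and selecting only the distinct `city1`, `city2` pairs gives the unique-routes variant of the question. Some databases do not allow column aliases in GROUP BY; there you would repeat the CASE expressions or wrap them in a subquery.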

[AMAZON] Duplicate Rows

Given a users table, write a query to return only its duplicate rows

Answer

Multiple solutions are possible; only one approach is given below for reference.

Let's assume there are 2 columns: id, name
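
With those two columns, a sketch of one approach is to group by every column and keep the groups that occur more than once. The sample rows are made up, and the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- made-up sample data
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO users VALUES
  (1, 'Ann'), (2, 'Bob'), (2, 'Bob'), (3, 'Cy'), (3, 'Cy'), (3, 'Cy');
""")

# Group on all columns; a count above 1 means the row is duplicated.
query = """
SELECT id, name, COUNT(*) AS occurrences
FROM users
GROUP BY id, name
HAVING COUNT(*) > 1
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This returns each duplicated row once, together with how many times it occurs.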

[INTUIT] Product Average

transactions table

products table

Given a table of transactions and products, write a query to return the product id, product price, and average transaction price of all products with price greater than the average transaction price.

Answer

Source

[INTUIT] Data Analyst Interview Question

Given the following tables:

Where

Experiments is a table in which we store whether a user is part of an experiment and if so whether they are in test or control (assume there is only one test variant per experiment). The fields are:

● user_id - There are many users each of whom can be in many experiments

● assignment_ts - timestamp of when the user was allocated to the experiment. A user is only allocated once per experiment.

● experiment_id - An experiment has many users

● experiment_assignment - Whether the user is in test or control. Assignments are immutable and there is only one assignment per user/experiment combo.

Subscriptions is a table of subscription related events. For each user, there will always be a trial start event however there will only be a subscription start event if the user subscribes. Assume a given user can only have one trial start and at most one subscription start. The subscription can start at any time after the trial start and times for either event type are captured in event_ts.

Questions

Write queries to produce the following:

  1. When did each experiment start? Use the first instance of an experiment assignment to either test or control for an experiment to equate to when the experiment started. Results should look like:

  2. How long did each experiment last, expressed in days? Assume the last instance of an experiment assignment to test or control for an experiment to equate to when the experiment ended. Results should look like:

  3. How many users are in test and control for each experiment? Result should look like:

  4. What is the conversion rate by experiment assignment for each experiment? A conversion is any user for whom there is a subscription start event in addition to the trial start event (all users have a trial start event). If a user is in multiple experiments at the same time, it’s ok to count them towards the conversion rate of each experiment. We also want to only return one row per experiment. Result should look like:

  5. For each experiment_id, rank and list first 3 user_ids who subscribed to the product. Output should look like:
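
Questions 1 to 3 only need the Experiments table, so here is a sketch that answers them in a single grouped query. The column names come from the description above; the table name experiments, the sample rows, and rounding the duration to whole days are my assumptions. Questions 4 and 5 also need the Subscriptions table, whose columns are not fully listed here, so they are not attempted. The example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- made-up sample data
CREATE TABLE experiments (
  user_id INTEGER, assignment_ts TEXT,
  experiment_id TEXT, experiment_assignment TEXT
);
INSERT INTO experiments VALUES
  (1, '2021-01-01 10:00:00', 'exp_a', 'test'),
  (2, '2021-01-03 09:00:00', 'exp_a', 'control'),
  (3, '2021-01-11 10:00:00', 'exp_a', 'test'),
  (1, '2021-02-01 00:00:00', 'exp_b', 'control'),
  (4, '2021-02-05 00:00:00', 'exp_b', 'test');
""")

query = """
SELECT experiment_id,
       MIN(assignment_ts) AS start_ts,                        -- question 1
       ROUND(julianday(MAX(assignment_ts))
             - julianday(MIN(assignment_ts))) AS duration_days, -- question 2
       SUM(CASE WHEN experiment_assignment = 'test' THEN 1 ELSE 0 END) AS test_users,
       SUM(CASE WHEN experiment_assignment = 'control' THEN 1 ELSE 0 END) AS control_users
FROM experiments
GROUP BY experiment_id
ORDER BY experiment_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

`julianday` is SQLite specific; other databases have their own date-difference function.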

[INTUIT] Employer EINs

Playground

We're given a table called employers that consists of a user_id, year, and employer EIN label. Users can have multiple employers dictated by the different EIN labels.

Write a query to add a flag to each user if they've added a new employer in the current year.

Example:

Answer

This problem is a little trickier than it looks at the outset
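
A sketch of one reading of the problem: an employer is "new" for a user if the earliest year that EIN appears for that user is the current year. The column names user_id, year and ein, the sample rows, and passing the current year as a query parameter are all assumptions. The example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- assumed schema and made-up sample data
CREATE TABLE employers (user_id INTEGER, year INTEGER, ein TEXT);
INSERT INTO employers VALUES
  (1, 2019, 'A'), (1, 2020, 'A'), (1, 2020, 'B'),
  (2, 2019, 'C'), (2, 2020, 'C'),
  (3, 2019, 'D');
""")

query = """
WITH first_seen AS (
  -- first year each user reported each employer
  SELECT user_id, ein, MIN(year) AS first_year
  FROM employers
  GROUP BY user_id, ein
)
SELECT user_id,
       MAX(CASE WHEN first_year = ? THEN 1 ELSE 0 END) AS new_employer_flag
FROM first_seen
GROUP BY user_id
ORDER BY user_id
"""
current_year = 2020  # hypothetical "current year" for the sample data
rows = conn.execute(query, (current_year,)).fetchall()
for row in rows:
    print(row)
```

Simply checking for more than one EIN in the current year would miss a user who switched from one employer to another. Under this reading, a user whose first ever record is in the current year is also flagged, which may or may not be what the interviewer intends.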
