Problems
If you want some hands-on practice without the hassle of installing and setting up the required software on your local machine, 🔫DB Fiddle provides a free SQL sandbox. Many of the problems below already link to a prebuilt sandbox for reference, but it is always recommended that you set up your own sandbox to play around in.
[Leetcode] Second highest salary
For a similar problem with a different approach, check the Nth highest salary problem.
Write a SQL query to get the second highest salary from the Employee table.
| Id | Salary |
|----|--------|
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.
Answer
Multiple solutions are possible; two approaches are given below for reference.
SELECT
(SELECT DISTINCT(Salary)
FROM Employee
ORDER BY Salary DESC
LIMIT 1 OFFSET 1)
AS SecondHighestSalary
I feel the solution below is more complete, as it lets you handle edge cases where the id is also needed and multiple employees share the same salary. Note that it returns an empty result, not null, when there is no second highest salary:
with cte as (
select
salary,
-- alias is rnk because RANK is a reserved word in MySQL 8+
dense_rank() over(order by salary desc) as rnk
from Employee)
select salary as SecondHighestSalary
from cte where rnk = 2
[Leetcode] Rank Scores
Reference - Leetcode
Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.
| Id | Score |
|----|-------|
| 1 | 3.40 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.95 |
| 5 | 4.00 |
| 6 | 3.65 |
For example, given the above Scores table, your query should generate the following report (order by highest score):
| score | Rank |
|-------|---------|
| 4.00 | 1 |
| 4.00 | 1 |
| 3.95 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.40 | 4 |
Answer
The tie-resolution method the question asks for is called dense rank; if we use RANK instead, the ranking will have "holes".
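To see the "holes" concretely, here is a minimal sketch that runs both functions over a few illustrative scores. It assumes an in-memory SQLite database driven from Python's `sqlite3` module (my choice, only to keep the example self-contained) and SQLite 3.25 or newer for window functions. The helper name `rank_vs_dense_rank` is hypothetical.

```python
import sqlite3

def rank_vs_dense_rank(scores):
    """Return (score, RANK, DENSE_RANK) rows, highest score first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
    conn.executemany("INSERT INTO Scores (Score) VALUES (?)", [(s,) for s in scores])
    rows = conn.execute(
        """
        SELECT Score,
               RANK()       OVER (ORDER BY Score DESC) AS rnk,
               DENSE_RANK() OVER (ORDER BY Score DESC) AS dense_rnk
        FROM Scores
        ORDER BY Score DESC, Id
        """
    ).fetchall()
    conn.close()
    return rows

# Illustrative scores, not the table from the problem statement.
for row in rank_vs_dense_rank([3.4, 3.65, 4.0, 4.0]):
    print(row)
```

With a tie at the top, RANK jumps from 1 to 3 while DENSE_RANK continues with 2, which is the behaviour the question asks for.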
select
Score, dense_rank() over(order by score desc) as Rank
from Scores
[CHEWY] 2nd Highest score
| Id | subject | marks |
|---:|---------|------:|
| 1 | Maths | 30 |
| 1 | Phy | 50 |
| 1 | Chem | 85 |
| 2 | Maths | 90 |
| 2 | Phy | 50 |
| 2 | Chem | 85 |
Select the second highest mark for each student.
Answer
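with CTE as(
-- dense_rank keeps rank 2 populated even if two subjects tie for the top mark
select *, dense_rank() over(partition by Id order by marks desc) as rnk from tablename
)
-- rnk = 2 picks the second highest mark (rnk = 1 would return the highest)
select Id, subject, marks from CTE where rnk = 2
[Leetcode] Consecutive Numbers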
Reference - Leetcode
Write an SQL query to find all numbers that appear at least three times consecutively.
Return the result table in any order.
Input:
Logs table:
| Id | Num |
|----|-----|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
Result table:
| ConsecutiveNums |
|-----------------|
| 1 |
1 is the only number that appears consecutively for at least three times.
Answer
Multiple solutions are possible, one of them is given below
with a(Num,NextNum,SecondNextNum ) as(
SELECT Num
, LEAD(Num, 1) OVER (ORDER BY Id) AS NextNum
, LEAD(Num, 2) OVER (ORDER BY Id) AS SecondNextNum
FROM Logs
)
select distinct(Num) as ConsecutiveNums from a
where
Num = NextNum
and Num = SecondNextNum
[SALESFORCE] User Growth
You are given user data for 2 accounts over 2 months. Calculate the growth rate of users in each account, where growth rate is defined as the number of unique users in month 2 divided by the number of unique users in month 1.
| date_details | account_id | user_id |
|--------------|------------|---------|
| 2021-01-01 | U1 | A1 |
| 2021-01-01 | U1 | A2 |
| 2021-01-01 | U1 | A3 |
| 2021-01-01 | U1 | A4 |
| 2021-02-01 | U1 | A1 |
| 2021-02-01 | U1 | A2 |
| 2021-02-01 | U1 | A3 |
| 2021-02-01 | U1 | A4 |
| 2021-02-01 | U1 | A5 |
| 2021-01-01 | U2 | A1 |
| 2021-01-01 | U2 | A2 |
| 2021-01-01 | U2 | A3 |
| 2021-02-01 | U2 | A1 |
| 2021-02-01 | U2 | A2 |
Answer
with cte as (
select account_id, count(distinct(user_id)) as unique_user, MONTH(date_details) as user_month from tablename
group by account_id, MONTH(date_details)
)
select a.account_id,month_2,month_1,
-- cast before dividing: integer / integer truncates in some dialects (e.g. SQL Server)
cast(month_2 as float) / month_1 as growth from
(select account_id, unique_user as month_1
from cte where user_month = 1)a
left join
(select account_id, unique_user as month_2
from cte where user_month = 2)b
on (a.account_id = b.account_id)
[SALESFORCE] Month over Month Revenue
You have 2 tables:
transactions: date, prod_id, quantity
products: prod_id, price
Calculate the month-over-month revenue. For example, the month-over-month revenue for month 2 is month2_revenue - month1_revenue.
Answer
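One possible answer, sketched under assumptions: revenue is taken to be quantity times price, months are derived with SQLite's `strftime` (other dialects use different date functions), and the previous month is fetched with `LAG`, which needs SQLite 3.25 or newer. The in-memory database, the sample rows and the helper name `month_over_month_revenue` are mine, added only to make the example runnable.

```python
import sqlite3

def month_over_month_revenue(transactions, products):
    """Return (month, revenue, change vs previous month) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (date TEXT, prod_id INTEGER, quantity INTEGER)")
    conn.execute("CREATE TABLE products (prod_id INTEGER, price REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", transactions)
    conn.executemany("INSERT INTO products VALUES (?, ?)", products)
    rows = conn.execute(
        """
        WITH monthly AS (
            SELECT strftime('%Y-%m', t.date) AS month,
                   SUM(t.quantity * p.price) AS revenue
            FROM transactions t
            JOIN products p ON p.prod_id = t.prod_id
            GROUP BY strftime('%Y-%m', t.date)
        )
        SELECT month,
               revenue,
               revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
        FROM monthly
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return rows

# Illustrative data only.
sample_transactions = [
    ("2021-01-05", 1, 2),
    ("2021-01-20", 2, 1),
    ("2021-02-03", 1, 5),
    ("2021-02-10", 2, 1),
]
sample_products = [(1, 10.0), (2, 30.0)]
print(month_over_month_revenue(sample_transactions, sample_products))
```

The first month has no predecessor, so its change comes back as NULL (`None` in Python).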
[SALESFORCE] Retention Rate
(Source)
Find the monthly retention rate of users for each account separately for Dec 2020 and Jan 2021. Retention rate is the percentage of active users an account retains over a given period of time. In this case, assume the user is retained if he/she stays with the app in any future months. For example, if a user was active in Dec 2020 and has activity in any future month, consider them retained for Dec. You can assume all accounts are present in Dec 2020 and Jan 2021. Your output should have the account ID and the Jan 2021 retention rate divided by Dec 2020 retention rate.
Note: I believe the official solution provided on the website is not correct as of 25-10-2023
Answer
[SALESFORCE] Employee earning more than their manager
Reference - Leetcode
Write an SQL query to find the employees who earn more than their managers.
The output will be: Joe
Answer
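One possible answer is a self-join of the table with itself. The sketch below assumes an Employee table with columns Id, Name, Salary and ManagerId (the input table is not reproduced on this page, so the schema and the sample rows are assumptions), loaded into an in-memory SQLite database purely for illustration.

```python
import sqlite3

def earns_more_than_manager(employees):
    """Return the names of employees whose salary exceeds their manager's."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, ManagerId INTEGER)"
    )
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", employees)
    rows = conn.execute(
        """
        SELECT e.Name
        FROM Employee e
        JOIN Employee m ON e.ManagerId = m.Id   -- m is the employee's manager
        WHERE e.Salary > m.Salary
        ORDER BY e.Id
        """
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

# Assumed sample rows: Joe reports to Sam, Henry reports to Max.
sample_employees = [
    (1, "Joe", 70000, 3),
    (2, "Henry", 80000, 4),
    (3, "Sam", 60000, None),
    (4, "Max", 90000, None),
]
print(earns_more_than_manager(sample_employees))
```

Employees without a manager drop out of the inner join, so no extra NULL check is needed.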
[Leetcode] Highest Salary in each Department
Reference - Leetcode
Write an SQL query to find employees who have the highest salary in each of the departments.
Answer
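One possible answer ranks salaries within each department and keeps rank 1, so tied top earners are all returned. The schema is an assumption: Employee(Id, Name, Salary, DepartmentId) and Department(Id, Name). The sample rows and the in-memory SQLite setup (3.25+ for window functions) are only there to make the sketch runnable.

```python
import sqlite3

def highest_salary_per_department(employees, departments):
    """Return (department, employee, salary) for the top earners of each department."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, DepartmentId INTEGER)"
    )
    conn.execute("CREATE TABLE Department (Id INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", employees)
    conn.executemany("INSERT INTO Department VALUES (?, ?)", departments)
    rows = conn.execute(
        """
        WITH ranked AS (
            SELECT d.Name AS Department,
                   e.Name AS Employee,
                   e.Salary,
                   RANK() OVER (PARTITION BY e.DepartmentId
                                ORDER BY e.Salary DESC) AS rnk
            FROM Employee e
            JOIN Department d ON d.Id = e.DepartmentId
        )
        SELECT Department, Employee, Salary
        FROM ranked
        WHERE rnk = 1
        ORDER BY Department, Employee
        """
    ).fetchall()
    conn.close()
    return rows

# Assumed sample rows; Jim and Max tie for the top salary in IT.
sample_employees = [
    (1, "Joe", 70000, 1),
    (2, "Jim", 90000, 1),
    (3, "Henry", 80000, 2),
    (4, "Sam", 60000, 2),
    (5, "Max", 90000, 1),
]
sample_departments = [(1, "IT"), (2, "Sales")]
print(highest_salary_per_department(sample_employees, sample_departments))
```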
[AMAZON] Cumulative Sum
Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.
Answer
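One possible answer counts new users per day and then takes a running sum partitioned by month, which is what resets the total. The sketch assumes a users table with columns id and created_at (the real schema is not given), uses SQLite's `date` and `strftime` functions, and needs SQLite 3.25 or newer for the window function.

```python
import sqlite3

def cumulative_new_users(created_at_values):
    """Return (day, running total of new users within that month) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany(
        "INSERT INTO users (created_at) VALUES (?)", [(ts,) for ts in created_at_values]
    )
    rows = conn.execute(
        """
        WITH daily AS (
            SELECT date(created_at) AS day, COUNT(*) AS new_users
            FROM users
            GROUP BY date(created_at)
        )
        SELECT day,
               SUM(new_users) OVER (PARTITION BY strftime('%Y-%m', day)
                                    ORDER BY day) AS cumulative_users
        FROM daily
        ORDER BY day
        """
    ).fetchall()
    conn.close()
    return rows

# Illustrative sign-up timestamps.
sample_signups = [
    "2020-01-01 10:00:00",
    "2020-01-01 12:00:00",
    "2020-01-02 09:00:00",
    "2020-02-01 08:00:00",
]
print(cumulative_new_users(sample_signups))
```

The February row starts again from its own count because it falls in a different partition.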
Tree Structure Labeling
🔫Playground Input:
Write a SQL query that labels each node as a “leaf”, “inner” or “Root” node, so that for the nodes above the output is:
Output:
Answer
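One possible answer uses a CASE expression: a node with no parent is the root, a node that appears as someone's parent is an inner node, and everything else is a leaf. The sketch assumes a table named tree with columns node and parent (the input is not reproduced on this page, so both the schema and the sample rows are assumptions).

```python
import sqlite3

def label_nodes(edges):
    """Return (node, label) rows where label is 'Root', 'inner' or 'leaf'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (node INTEGER, parent INTEGER)")
    conn.executemany("INSERT INTO tree VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        SELECT node,
               CASE
                   WHEN parent IS NULL THEN 'Root'
                   WHEN node IN (SELECT parent FROM tree
                                 WHERE parent IS NOT NULL) THEN 'inner'
                   ELSE 'leaf'
               END AS label
        FROM tree
        ORDER BY node
        """
    ).fetchall()
    conn.close()
    return rows

# Assumed sample tree: 5 is the root, 2 and 3 are its children.
sample_edges = [(1, 2), (2, 5), (3, 5), (4, 3), (5, None)]
print(label_nodes(sample_edges))
```

The `parent IS NOT NULL` filter in the subquery matters in general: a NULL inside a `NOT IN` list would make every comparison unknown, so it is a good habit to exclude it.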
[FACEBOOK] Binning data
🔫Playground Input:
Bin the videos into groups of 5 secs each
Output:
A similar question was asked at Facebook, but instead of video length the task was to write a SQL query to create a histogram of the number of comments per user in January 2020. Since the approach is similar, it is not included here.
Answer
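One possible answer relies on integer division to find the lower edge of each 5-second bin. The sketch assumes a videos table with columns video_id and length_secs holding whole, non-negative seconds (the real input is not shown here), and treats bins as half-open, so a 5-second video lands in "5-10".

```python
import sqlite3

def bin_video_lengths(lengths, width=5):
    """Return (bucket label, number of videos) rows for fixed-width bins."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE videos (video_id INTEGER PRIMARY KEY, length_secs INTEGER)")
    conn.executemany("INSERT INTO videos (length_secs) VALUES (?)", [(n,) for n in lengths])
    rows = conn.execute(
        """
        SELECT CAST(bin_start AS TEXT) || '-' || CAST(bin_start + :w AS TEXT) AS bucket,
               COUNT(*) AS videos
        FROM (
            -- integer division drops the remainder, giving the bin's lower edge
            SELECT (length_secs / :w) * :w AS bin_start
            FROM videos
        ) AS b
        GROUP BY bin_start
        ORDER BY bin_start
        """,
        {"w": width},
    ).fetchall()
    conn.close()
    return rows

# Illustrative video lengths in seconds.
print(bin_video_lengths([2, 4, 5, 12, 13, 14]))
```

The same shape works for the comments-per-user histogram: compute the count per user first, then bin that count.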
[DROPBOX] Closest SAT Scores
Given a table of students and their SAT test scores, write a query to return the two students with the closest test scores, along with the score difference. If multiple pairs share the same score difference, pick one at random.
Input:
Output:
Answer
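One possible answer self-joins the table to build every pair once and keeps the pair with the smallest absolute difference. The sketch assumes a scores table with columns id, student and score (the input is not shown on this page). Instead of a random pick it breaks ties by id, which is my simplification to keep the result deterministic.

```python
import sqlite3

def closest_scores(students):
    """Return (student_a, student_b, score_diff) for the closest pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (id INTEGER, student TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", students)
    row = conn.execute(
        """
        SELECT a.student AS student_a,
               b.student AS student_b,
               ABS(a.score - b.score) AS score_diff
        FROM scores a
        JOIN scores b ON a.id < b.id      -- each unordered pair appears once
        ORDER BY score_diff, a.id, b.id   -- deterministic tie-break
        LIMIT 1
        """
    ).fetchone()
    conn.close()
    return row

# Illustrative students and scores.
sample_students = [
    (1, "Jack", 1700),
    (2, "Alice", 2200),
    (3, "Miles", 2160),
    (4, "Scott", 1820),
]
print(closest_scores(sample_students))
```

The self-join compares every pair, which is fine for interview-sized tables; on large tables, comparing each score with its neighbour via `LAG` over the sorted scores avoids the quadratic blow-up.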
[AMAZON] Average Distance between Cities
You are given a table with varying distances from various cities. How do you find the average distance between each of the pairs of the cities?
Output:
Another variant of this question is
"Write a query to create a new table, named flight routes, that displays unique pairs of two locations?"
Answer
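One possible answer normalises each row so that A to B and B to A count as the same pair, then averages per pair. The sketch assumes a rides table with columns source, destination and distance (the real column names are not shown on this page). It uses CASE rather than LEAST/GREATEST because not every dialect has those functions.

```python
import sqlite3

def average_distances(rides):
    """Return (city_a, city_b, average distance) per unordered city pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (source TEXT, destination TEXT, distance REAL)")
    conn.executemany("INSERT INTO rides VALUES (?, ?, ?)", rides)
    rows = conn.execute(
        """
        SELECT CASE WHEN source < destination THEN source ELSE destination END AS city_a,
               CASE WHEN source < destination THEN destination ELSE source END AS city_b,
               AVG(distance) AS avg_distance
        FROM rides
        GROUP BY city_a, city_b
        ORDER BY city_a, city_b
        """
    ).fetchall()
    conn.close()
    return rows

# Illustrative rides; A-B is recorded in both directions.
sample_rides = [
    ("A", "B", 20),
    ("B", "A", 28),
    ("A", "B", 18),
    ("C", "D", 15),
    ("D", "C", 17),
]
print(average_distances(sample_rides))
```

Dropping the AVG and selecting only the distinct (city_a, city_b) pairs gives the unique-routes variant.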
[AMAZON] Duplicate Rows
Given a users table, write a query to return only its duplicate rows
Answer
Multiple solutions are possible; only one approach is given below for reference.
Let's assume there are 2 columns: id, name
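A minimal sketch of one possible approach (not necessarily the one originally given here): group by every column and keep the groups that occur more than once. It uses the two assumed columns id and name; the sample rows and the in-memory SQLite setup are illustrative.

```python
import sqlite3

def duplicate_rows(users):
    """Return (id, name, occurrences) for rows that appear more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    rows = conn.execute(
        """
        SELECT id, name, COUNT(*) AS occurrences
        FROM users
        GROUP BY id, name
        HAVING COUNT(*) > 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return rows

# Illustrative rows: (1, 'a') appears three times, (2, 'b') twice.
sample_users = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b"), (1, "a")]
print(duplicate_rows(sample_users))
```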
[INTUIT] Product Average
transactions table
products table
Given a table of transactions and products, write a query to return the product id, product price, and average transaction price of all products with price greater than the average transaction price.
Answer
[INTUIT] Data Analyst Interview Question
Given the following tables:
Where
Experiments is a table in which we store whether a user is part of an experiment and if so whether they are in test or control (assume there is only one test variant per experiment). The fields are:
● user_id - There are many users each of whom can be in many experiments
● assignment_ts - timestamp of when the user was allocated to the experiment. A user is only allocated once per experiment.
● experiment_id - An experiment has many users
● experiment_assignment - Whether the user is in test or control. Assignments are immutable and there is only one assignment per user/experiment combo.
Subscriptions is a table of subscription related events. For each user, there will always be a trial start event however there will only be a subscription start event if the user subscribes. Assume a given user can only have one trial start and at most one subscription start. The subscription can start at any time after the trial start and times for either event type are captured in event_ts.
Questions
Write queries to produce the following:
1) When did each experiment start? Use the first instance of an experiment assignment to either test or control for an experiment to equate to when the experiment started. Results should look like:

2) How long did each experiment last, expressed in days? Assume the last instance of an experiment assignment to test or control for an experiment to equate to when the experiment ended. Results should look like:

3) How many users are in test and control for each experiment? Result should look like:

4) What is the conversion rate by experiment assignment for each experiment? A conversion is any user for whom there is a subscription start event in addition to the trial start event (all users have a trial start event). If a user is in multiple experiments at the same time, it’s ok to count them towards the conversion rate of each experiment. We also want to only return one row per experiment. Result should look like:

5) For each experiment_id, rank and list first 3 user_ids who subscribed to the product. Output should look like:
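The first three questions (start, duration in days, and users per arm) can be sketched with a single GROUP BY over the experiments table. The column names come from the description above; the literal values 'test' and 'control', the sample rows, and counting duration in whole calendar days are my assumptions. Date arithmetic uses SQLite's `julianday`, so other dialects will need their own date-difference function.

```python
import sqlite3

def experiment_summary(assignments):
    """Return (experiment_id, started_at, duration_days, test_users, control_users)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiments ("
        "user_id INTEGER, assignment_ts TEXT, "
        "experiment_id TEXT, experiment_assignment TEXT)"
    )
    conn.executemany("INSERT INTO experiments VALUES (?, ?, ?, ?)", assignments)
    rows = conn.execute(
        """
        SELECT experiment_id,
               MIN(assignment_ts) AS started_at,
               -- whole calendar days between first and last assignment
               CAST(julianday(date(MAX(assignment_ts)))
                    - julianday(date(MIN(assignment_ts))) AS INTEGER) AS duration_days,
               SUM(CASE WHEN experiment_assignment = 'test' THEN 1 ELSE 0 END) AS test_users,
               SUM(CASE WHEN experiment_assignment = 'control' THEN 1 ELSE 0 END) AS control_users
        FROM experiments
        GROUP BY experiment_id
        ORDER BY experiment_id
        """
    ).fetchall()
    conn.close()
    return rows

# Illustrative assignments for two hypothetical experiments.
sample_assignments = [
    (1, "2021-01-01 09:00:00", "exp_a", "test"),
    (2, "2021-01-03 10:00:00", "exp_a", "control"),
    (3, "2021-01-11 08:00:00", "exp_a", "test"),
    (1, "2021-02-01 00:00:00", "exp_b", "control"),
    (2, "2021-02-02 23:00:00", "exp_b", "test"),
]
print(experiment_summary(sample_assignments))
```

Because each user is assigned only once per experiment, counting rows per arm is the same as counting users.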

[INTUIT] Employer EINs
We're given a table called employers that consists of a user_id, year, and employer EIN label. Users can have multiple employers dictated by the different EIN labels.
Write a query to add a flag to each user if they've added a new employer in the current year.
Example:
Answer
This problem is a little trickier than it looks at the outset