Problems


If you want some hands-on practice without the hassle of installing and setting up the required software on your local machine, DB Fiddle provides a free SQL sandbox. Many of the problems below already link to a prebuilt sandbox, but it is always recommended that you set up your own sandbox to play around.

[Leetcode] Second highest salary

For a similar problem with a different approach, see the Nth highest salary problem.

Write a SQL query to get the second highest salary from the Employee table.

| Id | Salary |
|----|--------|
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |

For example, given the above Employee table, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return null.

Answer

Multiple solutions are possible; two approaches are given below for reference.

```sql
SELECT
    (SELECT DISTINCT Salary
     FROM Employee
     ORDER BY Salary DESC
     LIMIT 1 OFFSET 1) AS SecondHighestSalary;
```

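I feel the solution below is more complete, because it can be extended to handle edge cases such as needing the Id as well, or several employees sharing the same salary. Note, however, that it returns one row per employee at the second highest salary, and no rows at all (rather than NULL) when there is no second highest salary.

```sql
WITH cte AS (
    SELECT
        Salary,
        -- tied salaries share a rank and no rank numbers are skipped;
        -- "rank" itself is reserved in MySQL 8+, hence salary_rank
        DENSE_RANK() OVER (ORDER BY Salary DESC) AS salary_rank
    FROM Employee
)
SELECT Salary AS SecondHighestSalary
FROM cte
WHERE salary_rank = 2;
```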
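A minimal sketch for checking both approaches locally, assuming SQLite 3.25+ (needed for window functions) through Python's standard `sqlite3` module. The `run` helper and the `single`/`ties` data sets are hypothetical; they exist only to show the NULL-versus-empty-result and tie behaviour described above.

```python
import sqlite3

# Scalar subquery: yields a single NULL row when there is no second distinct salary.
SCALAR_QUERY = """
SELECT (SELECT DISTINCT Salary
        FROM Employee
        ORDER BY Salary DESC
        LIMIT 1 OFFSET 1) AS SecondHighestSalary
"""

# DENSE_RANK version: yields zero rows when no salary has rank 2,
# and one row per employee when several share the second highest salary.
RANK_QUERY = """
WITH cte AS (
    SELECT Salary, DENSE_RANK() OVER (ORDER BY Salary DESC) AS salary_rank
    FROM Employee
)
SELECT Salary AS SecondHighestSalary FROM cte WHERE salary_rank = 2
"""


def run(rows, query):
    """Load (Id, Salary) rows into a fresh in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Id INTEGER, Salary INTEGER)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [(1, 100), (2, 200), (3, 300)]   # the table from the problem
single = [(1, 100)]                       # no second highest salary
ties = [(1, 300), (2, 200), (3, 200)]     # two employees tied at second place

print(run(sample, SCALAR_QUERY))  # [(200,)]
print(run(single, SCALAR_QUERY))  # [(None,)]
print(run(single, RANK_QUERY))    # []
print(run(ties, RANK_QUERY))      # [(200,), (200,)]
```

If the NULL behaviour is required from the DENSE_RANK version, wrapping the outer select in `MAX(Salary)` is one option, at the cost of losing per-employee columns.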
[Leetcode] Rank Scores

Reference - Leetcode

Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.

| Id | Score |
|----|-------|
| 1  | 3.40  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.50  |
| 5  | 4.00  |
| 6  | 3.65  |

For example, given the above Scores table, your query should generate the following report (order by highest score):

| Score | Rank |
|-------|------|
| 4.00  | 1    |
| 4.00  | 1    |
| 3.65  | 2    |
| 3.65  | 2    |
| 3.50  | 3    |
| 3.40  | 4    |

Answer

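The tie-resolving method the question asks for is called dense rank. With plain `RANK()` there would be "holes": after two scores tied at rank 1, the next score would get rank 3 instead of 2.

```sql
SELECT
    Score,
    -- RANK is a reserved word in MySQL 8+, so the alias is quoted
    DENSE_RANK() OVER (ORDER BY Score DESC) AS "Rank"
FROM Scores
ORDER BY Score DESC;
```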
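To see the "holes" concretely, this sketch (assuming SQLite 3.25+ via Python's `sqlite3`) computes `DENSE_RANK()` and `RANK()` side by side on the sample Scores table. The column aliases `dense_rnk` and `rnk` are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER, Score REAL)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [(1, 3.40), (2, 3.65), (3, 4.00), (4, 3.50), (5, 4.00), (6, 3.65)],
)

# DENSE_RANK continues with the next integer after a tie; RANK skips ahead.
rows = conn.execute("""
    SELECT Score,
           DENSE_RANK() OVER (ORDER BY Score DESC) AS dense_rnk,
           RANK()       OVER (ORDER BY Score DESC) AS rnk
    FROM Scores
    ORDER BY Score DESC
""").fetchall()

for score, dense_rnk, rnk in rows:
    print(f"{score:.2f}  dense_rank={dense_rnk}  rank={rnk}")
```

The dense ranks come out as 1, 1, 2, 2, 3, 4, while plain ranks are 1, 1, 3, 3, 5, 6.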
[CHEWY] 2nd Highest score
| Id | subject | marks |
|---:|---------|------:|
|  1 | Maths   |    30 |
|  1 | Phy     |    50 |
|  1 | Chem    |    85 |
|  2 | Maths   |    90 |
|  2 | Phy     |    50 |
|  2 | Chem    |    85 |

Select the second highest mark for each student.

Answer

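```sql
WITH cte AS (
    -- DENSE_RANK so that a tie for the top mark does not leave rank 2 empty
    SELECT *, DENSE_RANK() OVER (PARTITION BY Id ORDER BY marks DESC) AS mark_rank
    FROM tablename
)
SELECT Id, subject, marks
FROM cte
WHERE mark_rank = 2;
```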
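A quick check of the per-student query against the sample table, assuming SQLite 3.25+ via Python's `sqlite3`; `tablename` is kept as the placeholder table name used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (Id INTEGER, subject TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "Maths", 30), (1, "Phy", 50), (1, "Chem", 85),
     (2, "Maths", 90), (2, "Phy", 50), (2, "Chem", 85)],
)

# Rank marks within each student (PARTITION BY Id), highest first, keep rank 2.
rows = conn.execute("""
    WITH cte AS (
        SELECT *,
               DENSE_RANK() OVER (PARTITION BY Id ORDER BY marks DESC) AS mark_rank
        FROM tablename
    )
    SELECT Id, subject, marks FROM cte WHERE mark_rank = 2 ORDER BY Id
""").fetchall()
print(rows)  # [(1, 'Phy', 50), (2, 'Chem', 85)]
```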
[Leetcode] Consecutive Numbers

Reference - Leetcode

Write an SQL query to find all numbers that appear at least three times consecutively.

Return the result table in any order.

Input:

Logs table:

| Id | Num |
|----|-----|
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 1   |
| 6  | 2   |
| 7  | 2   |

Result table:

| ConsecutiveNums |
|-----------------|
| 1               |

1 is the only number that appears at least three times consecutively.

Answer

Multiple solutions are possible; one of them is given below.

```sql
WITH a (Num, NextNum, SecondNextNum) AS (
    SELECT
        Num,
        LEAD(Num, 1) OVER (ORDER BY Id) AS NextNum,
        LEAD(Num, 2) OVER (ORDER BY Id) AS SecondNextNum
    FROM Logs
)
SELECT DISTINCT Num AS ConsecutiveNums
FROM a
WHERE Num = NextNum
  AND Num = SecondNextNum;
```
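The `LEAD` approach can be run as-is against the sample Logs table; this sketch assumes SQLite 3.25+ through Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logs (Id INTEGER, Num INTEGER)")
conn.executemany(
    "INSERT INTO Logs VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)],
)

# Each row sees the next two values (in Id order) next to its own value.
rows = conn.execute("""
    WITH a AS (
        SELECT Num,
               LEAD(Num, 1) OVER (ORDER BY Id) AS NextNum,
               LEAD(Num, 2) OVER (ORDER BY Id) AS SecondNextNum
        FROM Logs
    )
    SELECT DISTINCT Num AS ConsecutiveNums
    FROM a
    WHERE Num = NextNum AND Num = SecondNextNum
""").fetchall()
print(rows)  # [(1,)]
```

`LEAD` follows row order, not Id arithmetic, so if Ids can have gaps and "consecutive" means consecutive Ids, a self-join on `Id + 1` and `Id + 2` would be the stricter choice.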
[SALESFORCE] User Growth

Playground

Given user data for 2 accounts over 2 months, calculate the growth rate of users in each account, where the growth rate is defined as the number of unique users in month 2 divided by the number of unique users in month 1.

| date_details | account_id | user_id |
|--------------|------------|---------|
| 2021-01-01   | U1         | A1      |
| 2021-01-01   | U1         | A2      |
| 2021-01-01   | U1         | A3      |
| 2021-01-01   | U1         | A4      |
| 2021-02-01   | U1         | A1      |
| 2021-02-01   | U1         | A2      |
| 2021-02-01   | U1         | A3      |
| 2021-02-01   | U1         | A4      |
| 2021-02-01   | U1         | A5      |
| 2021-01-01   | U2         | A1      |
| 2021-01-01   | U2         | A2      |
| 2021-01-01   | U2         | A3      |
| 2021-02-01   | U2         | A1      |
| 2021-02-01   | U2         | A2      |

Answer

```sql
WITH cte AS (
    SELECT
        account_id,
        COUNT(DISTINCT user_id) AS unique_user,
        MONTH(date_details) AS user_month
    FROM tablename
    GROUP BY account_id, MONTH(date_details)
)
SELECT
    a.account_id,
    b.month_2,
    a.month_1,
    -- cast before dividing: engines with integer division (e.g. SQL Server)
    -- would otherwise truncate 5/4 to 1 before the cast
    CAST(b.month_2 AS FLOAT) / a.month_1 AS growth
FROM (SELECT account_id, unique_user AS month_1
      FROM cte WHERE user_month = 1) a
LEFT JOIN (SELECT account_id, unique_user AS month_2
           FROM cte WHERE user_month = 2) b
    ON a.account_id = b.account_id;
```
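A runnable version of the growth query on the sample data, assuming SQLite via Python's `sqlite3`. SQLite has no `MONTH()` function, so `strftime('%m', ...)` is swapped in here; `tablename` remains the placeholder table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (date_details TEXT, account_id TEXT, user_id TEXT)")
data = (
    [("2021-01-01", "U1", u) for u in ("A1", "A2", "A3", "A4")]
    + [("2021-02-01", "U1", u) for u in ("A1", "A2", "A3", "A4", "A5")]
    + [("2021-01-01", "U2", u) for u in ("A1", "A2", "A3")]
    + [("2021-02-01", "U2", u) for u in ("A1", "A2")]
)
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", data)

rows = conn.execute("""
    WITH cte AS (
        SELECT account_id,
               COUNT(DISTINCT user_id) AS unique_user,
               -- SQLite stand-in for MONTH(): month number as an integer
               CAST(strftime('%m', date_details) AS INTEGER) AS user_month
        FROM tablename
        GROUP BY account_id, CAST(strftime('%m', date_details) AS INTEGER)
    )
    SELECT a.account_id, b.month_2, a.month_1,
           CAST(b.month_2 AS REAL) / a.month_1 AS growth
    FROM (SELECT account_id, unique_user AS month_1 FROM cte WHERE user_month = 1) a
    LEFT JOIN (SELECT account_id, unique_user AS month_2 FROM cte WHERE user_month = 2) b
        ON a.account_id = b.account_id
    ORDER BY a.account_id
""").fetchall()
print(rows)  # U1 grows 5/4 = 1.25; U2 shrinks to 2/3
```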
[SALESFORCE] Month over Month Revenue

Playground

You have 2 tables:

  • transactions: date, prod_id, quantity

  • products: prod_id, price

Calculate the month-over-month revenue change. For example, the month-over-month revenue for month 2 is month2_revenue - month1_revenue.
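A minimal sketch of that definition, assuming SQLite 3.25+ via Python's `sqlite3` and the column names listed above (`date`, `prod_id`, `quantity`, `price`). Revenue per month is `SUM(quantity * price)`, and `LAG` fetches the previous month's revenue. The rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE transactions ("date" TEXT, prod_id INTEGER, quantity INTEGER)')
conn.execute("CREATE TABLE products (prod_id INTEGER, price INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 10), (2, 5)])
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [("2021-01-05", 1, 2), ("2021-01-20", 2, 4),   # Jan: 20 + 20 = 40
     ("2021-02-03", 1, 3), ("2021-02-14", 2, 1),   # Feb: 30 + 5  = 35
     ("2021-03-01", 2, 10)],                       # Mar: 50
)

rows = conn.execute("""
    WITH monthly AS (
        SELECT strftime('%Y-%m', t."date") AS month,
               SUM(t.quantity * p.price) AS revenue
        FROM transactions t
        JOIN products p ON p.prod_id = t.prod_id
        GROUP BY strftime('%Y-%m', t."date")
    )
    SELECT month,
           revenue,
           -- current month minus previous month; NULL for the first month
           revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
    FROM monthly
    ORDER BY month
""").fetchall()
print(rows)  # [('2021-01', 40, None), ('2021-02', 35, -5), ('2021-03', 50, 15)]
```

`LAG` compares against the previous month that has data, so a month with no transactions at all would be skipped rather than treated as zero revenue.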

Answer

[SALESFORCE] Retention Rate

(Source)

Find the monthly retention rate of users for each account separately for Dec 2020 and Jan 2021. Retention rate is the percentage of active users an account retains over a given period of time. In this case, assume the user is retained if he/she stays with the app in any future months. For example, if a user was active in Dec 2020 and has activity in any future month, consider them retained for Dec. You can assume all accounts are present in Dec 2020 and Jan 2021. Your output should have the account ID and the Jan 2021 retention rate divided by Dec 2020 retention rate.
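One way to express this definition, sketched under assumptions: SQLite via Python's `sqlite3`, and a hypothetical `events(record_date, account_id, user_id)` table standing in for the source's table. A user active in month M counts as retained when their last active month (per account) is later than M. The rate is the share of such users, and the output divides the Jan 2021 rate by the Dec 2020 rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (record_date TEXT, account_id TEXT, user_id TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("2020-12-01", "A1", "u1"), ("2021-01-10", "A1", "u1"), ("2021-02-03", "A1", "u1"),
    ("2020-12-15", "A1", "u2"),
    ("2021-01-20", "A1", "u3"),
    ("2021-01-05", "A1", "u4"), ("2021-02-07", "A1", "u4"),
    ("2020-12-02", "A2", "u5"), ("2021-01-02", "A2", "u5"),
    ("2020-12-09", "A2", "u6"), ("2021-01-09", "A2", "u6"),
])

rows = conn.execute("""
    WITH user_months AS (
        SELECT DISTINCT account_id, user_id, strftime('%Y-%m', record_date) AS month
        FROM events
    ),
    last_seen AS (
        SELECT account_id, user_id, MAX(month) AS last_month
        FROM user_months
        GROUP BY account_id, user_id
    ),
    rates AS (
        -- share of users active in the month who show up in any later month
        SELECT um.account_id, um.month,
               AVG(CASE WHEN ls.last_month > um.month THEN 1.0 ELSE 0.0 END) AS retention
        FROM user_months um
        JOIN last_seen ls
          ON ls.account_id = um.account_id AND ls.user_id = um.user_id
        WHERE um.month IN ('2020-12', '2021-01')
        GROUP BY um.account_id, um.month
    )
    SELECT account_id,
           MAX(CASE WHEN month = '2021-01' THEN retention END)
             / MAX(CASE WHEN month = '2020-12' THEN retention END) AS retention_ratio
    FROM rates
    GROUP BY account_id
    ORDER BY account_id
""").fetchall()
print(rows)  # A1: (2/3) / (1/2); A2: 0 / 1
```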

Note: I believe the official solution provided on the website is not correct as of 25-10-2023.

Answer

[SALESFORCE] Employee earning more than their manager

Reference - Leetcode

Write an SQL query to find the employees who earn more than their managers.

The expected output is: Joe
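A self-join is the usual way to compare each employee with their manager. This sketch assumes SQLite via Python's `sqlite3` and an `Employee(Id, Name, Salary, ManagerId)` table; the rows are hypothetical, chosen so that the result matches the expected output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, ManagerId INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1, "Joe", 70000, 3),
    (2, "Henry", 80000, 4),
    (3, "Sam", 60000, None),
    (4, "Max", 90000, None),
])

# e = the employee row, m = the same table joined again as the manager row.
rows = conn.execute("""
    SELECT e.Name AS Employee
    FROM Employee e
    JOIN Employee m ON e.ManagerId = m.Id
    WHERE e.Salary > m.Salary
""").fetchall()
print(rows)  # [('Joe',)]
```

The inner join also drops employees without a manager, which is what the question wants.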

Answer

[Leetcode] Highest Salary in each Department

Reference - Leetcode

Write an SQL query to find employees who have the highest salary in each of the departments.
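A window-function sketch, assuming SQLite 3.25+ via Python's `sqlite3`, an `Employee(Id, Name, Salary, DepartmentId)` table and a `Department(Id, Name)` table. The schema and rows are hypothetical. Ranking within each department and keeping rank 1 returns every employee tied for the top salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, DepartmentId INTEGER)")
conn.execute("CREATE TABLE Department (Id INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    (1, "Joe", 70000, 1), (2, "Jim", 90000, 1), (3, "Henry", 80000, 2),
    (4, "Sam", 60000, 2), (5, "Max", 90000, 1),
])
conn.executemany("INSERT INTO Department VALUES (?, ?)", [(1, "IT"), (2, "Sales")])

rows = conn.execute("""
    WITH ranked AS (
        SELECT d.Name AS Department, e.Name AS Employee, e.Salary,
               DENSE_RANK() OVER (PARTITION BY e.DepartmentId
                                  ORDER BY e.Salary DESC) AS salary_rank
        FROM Employee e
        JOIN Department d ON d.Id = e.DepartmentId
    )
    SELECT Department, Employee, Salary
    FROM ranked
    WHERE salary_rank = 1          -- ties for the top salary all get rank 1
    ORDER BY Department, Employee
""").fetchall()
print(rows)
```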


Answer

[AMAZON] Cumulative Sum

Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.
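A running total that resets each month can be built by partitioning a windowed `SUM` by month. This sketch assumes SQLite 3.25+ via Python's `sqlite3` and a hypothetical `users(id, created_at)` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "2020-01-01 09:00:00"), (2, "2020-01-01 17:30:00"),
    (3, "2020-01-02 10:00:00"),
    (4, "2020-01-05 08:00:00"), (5, "2020-01-05 12:00:00"), (6, "2020-01-05 20:00:00"),
    (7, "2020-02-01 11:00:00"),
    (8, "2020-02-03 09:00:00"), (9, "2020-02-03 15:00:00"),
])

rows = conn.execute("""
    WITH daily AS (
        SELECT date(created_at) AS day, COUNT(*) AS new_users
        FROM users
        GROUP BY date(created_at)
    )
    SELECT day,
           -- PARTITION BY month restarts the running total every month
           SUM(new_users) OVER (PARTITION BY strftime('%Y-%m', day)
                                ORDER BY day) AS cumulative_users
    FROM daily
    ORDER BY day
""").fetchall()
print(rows)
```

Aggregating to one row per day first keeps the window simple; days with no sign-ups simply do not appear.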

🔫Playgroundarrow-up-right

Answer

Tree Structure Labeling

Playground

Input:

Write a SQL query that labels each node as a “leaf”, “inner” or “Root” node, so that for the nodes above the output is:
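A sketch of the labeling logic, assuming SQLite via Python's `sqlite3` and a hypothetical `tree(node, parent)` table where the root has a NULL parent. A node is the root if it has no parent, inner if some other node points to it, and a leaf otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (node INTEGER, parent INTEGER)")
conn.executemany("INSERT INTO tree VALUES (?, ?)", [
    (1, 2), (3, 2), (6, 8), (9, 8), (2, 5), (8, 5), (5, None),
])

rows = conn.execute("""
    SELECT t.node,
           CASE
               WHEN t.parent IS NULL THEN 'Root'
               -- some other row names this node as its parent
               WHEN EXISTS (SELECT 1 FROM tree c WHERE c.parent = t.node) THEN 'Inner'
               ELSE 'Leaf'
           END AS label
    FROM tree t
    ORDER BY t.node
""").fetchall()
print(rows)
```

The Root check comes first so that a root with children is not labeled Inner.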

Output:

Answer

[FACEBOOK] Binning data

Playground

Input:

Bin the videos by length into groups of 5 seconds each.
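Integer division maps each length to the start of its 5-second bin. This sketch assumes SQLite via Python's `sqlite3` and a hypothetical `videos(video_id, length_secs)` table with whole-second lengths. Bins are half-open, so a 5-second video lands in the 5 to 10 bin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id INTEGER, length_secs INTEGER)")
conn.executemany("INSERT INTO videos VALUES (?, ?)",
                 list(enumerate([3, 4, 5, 9, 12, 14, 0], start=1)))

rows = conn.execute("""
    SELECT (length_secs / 5) * 5     AS bin_start,   -- integer division in SQLite
           (length_secs / 5) * 5 + 5 AS bin_end,
           COUNT(*)                  AS num_videos
    FROM videos
    GROUP BY bin_start, bin_end
    ORDER BY bin_start
""").fetchall()
print(rows)  # [(0, 5, 3), (5, 10, 2), (10, 15, 2)]
```

In MySQL, `/` returns a decimal, so `FLOOR(length_secs / 5) * 5` (or `length_secs DIV 5`) would be needed for the same bins.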

Output:

A similar question was also asked at Facebook, but instead of video length, the task was to write a SQL query that creates a histogram of the number of comments per user in January 2020. Since the approach is the same, it is not included here.

Answer

[DROPBOX] Closest SAT Scores

Playground

Given a table of students and their SAT test scores, write a query to return the two students with the closest test scores, along with their score difference. If several pairs have the same score difference, pick one at random.
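A self-join over all pairs followed by sorting on the absolute difference is one direct approach. This sketch assumes SQLite via Python's `sqlite3` and a hypothetical `scores(id, student, score)` table. `s1.id < s2.id` keeps each pair once, and `RANDOM()` breaks ties as the question allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, student TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", [
    (1, "Jack", 1100), (2, "Jill", 1450), (3, "John", 1340), (4, "Jose", 1200),
])

rows = conn.execute("""
    SELECT s1.student AS one_student,
           s2.student AS other_student,
           ABS(s1.score - s2.score) AS score_diff
    FROM scores s1
    JOIN scores s2 ON s1.id < s2.id      -- each unordered pair exactly once
    ORDER BY score_diff, RANDOM()        -- random pick among equal differences
    LIMIT 1
""").fetchall()
print(rows)  # [('Jack', 'Jose', 100)]
```

The pairwise join is quadratic in the number of students; for large tables, sorting by score and comparing each row with its neighbour via `LAG` avoids that.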

Input:

Output:

Answer

[AMAZON] Average Distance between Cities

Playground

You are given a table with varying distances between various cities. How do you find the average distance for each pair of cities?
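The key step is to treat A to B and B to A as the same pair by ordering the two city names before grouping. This sketch assumes SQLite via Python's `sqlite3` and a hypothetical `distances(source, destination, distance)` table with placeholder city names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distances (source TEXT, destination TEXT, distance INTEGER)")
conn.executemany("INSERT INTO distances VALUES (?, ?, ?)", [
    ("A", "B", 100), ("B", "A", 110),
    ("A", "C", 200), ("C", "A", 220),
    ("B", "C", 50),
])

rows = conn.execute("""
    SELECT CASE WHEN source < destination THEN source ELSE destination END AS city_a,
           CASE WHEN source < destination THEN destination ELSE source END AS city_b,
           AVG(distance) AS avg_distance
    FROM distances
    GROUP BY city_a, city_b       -- direction no longer matters
    ORDER BY city_a, city_b
""").fetchall()
print(rows)  # [('A', 'B', 105.0), ('A', 'C', 210.0), ('B', 'C', 50.0)]
```

Dropping `AVG` and selecting the two normalized columns with `DISTINCT` gives the unique pairs asked for in the flight-routes variant.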

Output:

Another variant of this question is:

"Write a query to create a new table, named flight routes, that displays unique pairs of two locations?"

Answer

[AMAZON] Duplicate Rows

Given a users table, write a query to return only its duplicate rows.

Answer

Multiple solutions are possible; only one approach is given below for reference.

Let's assume there are two columns: id and name.
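With those two columns, grouping by every column and keeping groups with more than one row returns each duplicated row along with how many copies exist. This sketch assumes SQLite via Python's `sqlite3`; the `users` rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "Ann"), (2, "Bob"), (1, "Ann"), (3, "Cy"), (2, "Bob"), (2, "Bob"),
])

rows = conn.execute("""
    SELECT id, name, COUNT(*) AS copies
    FROM users
    GROUP BY id, name          -- group on every column that defines a duplicate
    HAVING COUNT(*) > 1
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Ann', 2), (2, 'Bob', 3)]
```

If the extra copies themselves are needed (for example, to delete them), `ROW_NUMBER() OVER (PARTITION BY id, name)` and keeping rows numbered above 1 is a common alternative.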

[INTUIT] Product Average

transactions table

products table

Given a table of transactions and products, write a query to return the product id, product price, and average transaction price of all products with price greater than the average transaction price.
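A sketch assuming SQLite via Python's `sqlite3`, hypothetical tables `transactions(id, product_id, quantity)` and `products(id, price)`, and the assumption that "average transaction price" means the average of `quantity * price` over all transactions. The source may define it differently. The average is computed once in a CTE and cross-joined so each product can be compared against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, price INTEGER)")
conn.execute("CREATE TABLE transactions (id INTEGER, product_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 10), (2, 50), (3, 30)])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (1, 1, 2),   # 2 * 10 = 20
    (2, 2, 1),   # 1 * 50 = 50
    (3, 1, 1),   # 1 * 10 = 10
    (4, 3, 1),   # 1 * 30 = 30
])

rows = conn.execute("""
    WITH avg_txn AS (
        -- assumed definition: mean transaction amount (quantity * price)
        SELECT AVG(t.quantity * p.price) AS avg_transaction_price
        FROM transactions t
        JOIN products p ON p.id = t.product_id
    )
    SELECT p.id AS product_id, p.price, a.avg_transaction_price
    FROM products p
    CROSS JOIN avg_txn a
    WHERE p.price > a.avg_transaction_price
    ORDER BY p.id
""").fetchall()
print(rows)  # [(2, 50, 27.5), (3, 30, 27.5)]
```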

Answer

Source

[INTUIT] Data Analyst Interview Question

Given the following tables:

Where

Experiments is a table in which we store whether a user is part of an experiment and if so whether they are in test or control (assume there is only one test variant per experiment). The fields are:

● user_id - There are many users each of whom can be in many experiments

● assignment_ts - timestamp of when the user was allocated to the experiment. A user is only allocated once per experiment.

● experiment_id - An experiment has many users

● experiment_assignment - Whether the user is in test or control. Assignments are immutable and there is only one assignment per user/experiment combo.

Subscriptions is a table of subscription-related events. For each user there will always be a trial start event; however, there will only be a subscription start event if the user subscribes. Assume a given user can have only one trial start and at most one subscription start. The subscription can start at any time after the trial start, and the times for either event type are captured in event_ts.

Questions

Write queries to produce the following:

  1. When did each experiment start? Use the first instance of an experiment assignment to either test or control for an experiment to equate to when the experiment started. Results should look like:

  2. How long did each experiment last, expressed in days? Assume the last instance of an experiment assignment to test or control for an experiment to equate to when the experiment ended. Results should look like:

  3. How many users are in test and control for each experiment? Results should look like:

  4. What is the conversion rate by experiment assignment for each experiment? A conversion is any user for whom there is a subscription start event in addition to the trial start event (all users have a trial start event). If a user is in multiple experiments at the same time, it’s ok to count them towards the conversion rate of each experiment. We also want to return only one row per experiment. Results should look like:

  5. For each experiment_id, rank and list the first 3 user_ids who subscribed to the product. Output should look like:
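A combined sketch for questions 1 to 5, assuming SQLite 3.25+ via Python's `sqlite3`. The `experiments` columns follow the field list above. For `subscriptions`, only `user_id` and `event_ts` come from the description; the `event_type` column and its values `'trial_start'` and `'subscription_start'` are hypothetical, as are all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiments (user_id INTEGER, assignment_ts TEXT,
                          experiment_id TEXT, experiment_assignment TEXT);
CREATE TABLE subscriptions (user_id INTEGER, event_type TEXT, event_ts TEXT);
""")
conn.executemany("INSERT INTO experiments VALUES (?, ?, ?, ?)", [
    (1, "2021-01-01 00:00:00", "E1", "test"),
    (2, "2021-01-02 00:00:00", "E1", "control"),
    (3, "2021-01-05 00:00:00", "E1", "test"),
    (4, "2021-01-11 00:00:00", "E1", "control"),
    (1, "2021-02-01 00:00:00", "E2", "control"),
    (2, "2021-02-03 00:00:00", "E2", "test"),
])
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", [
    (1, "trial_start", "2021-01-01 09:00:00"), (2, "trial_start", "2021-01-02 09:00:00"),
    (3, "trial_start", "2021-01-05 09:00:00"), (4, "trial_start", "2021-01-11 09:00:00"),
    (1, "subscription_start", "2021-01-03 09:00:00"),
    (3, "subscription_start", "2021-01-06 09:00:00"),
    (4, "subscription_start", "2021-01-12 09:00:00"),
])

# Q1 + Q2: first assignment = start, last assignment = end; julianday() differences are in days.
q1_q2 = conn.execute("""
    SELECT experiment_id, MIN(assignment_ts) AS experiment_start,
           julianday(MAX(assignment_ts)) - julianday(MIN(assignment_ts)) AS duration_days
    FROM experiments GROUP BY experiment_id ORDER BY experiment_id
""").fetchall()

# Q3: users in test and control, pivoted into one row per experiment.
q3 = conn.execute("""
    SELECT experiment_id,
           SUM(CASE WHEN experiment_assignment = 'test' THEN 1 ELSE 0 END) AS test_users,
           SUM(CASE WHEN experiment_assignment = 'control' THEN 1 ELSE 0 END) AS control_users
    FROM experiments GROUP BY experiment_id ORDER BY experiment_id
""").fetchall()

# Q4: converted = has a subscription_start event; AVG of 1.0/0.0 gives the rate.
q4 = conn.execute("""
    WITH converted AS (
        SELECT DISTINCT user_id FROM subscriptions WHERE event_type = 'subscription_start'
    )
    SELECT e.experiment_id,
           AVG(CASE WHEN e.experiment_assignment = 'test'
                    THEN CASE WHEN c.user_id IS NOT NULL THEN 1.0 ELSE 0.0 END END) AS test_conversion,
           AVG(CASE WHEN e.experiment_assignment = 'control'
                    THEN CASE WHEN c.user_id IS NOT NULL THEN 1.0 ELSE 0.0 END END) AS control_conversion
    FROM experiments e LEFT JOIN converted c ON c.user_id = e.user_id
    GROUP BY e.experiment_id ORDER BY e.experiment_id
""").fetchall()

# Q5: first three subscribers per experiment, ordered by subscription time.
q5 = conn.execute("""
    WITH ranked AS (
        SELECT e.experiment_id, s.user_id,
               ROW_NUMBER() OVER (PARTITION BY e.experiment_id ORDER BY s.event_ts) AS sub_rank
        FROM experiments e
        JOIN subscriptions s ON s.user_id = e.user_id AND s.event_type = 'subscription_start'
    )
    SELECT experiment_id, sub_rank, user_id FROM ranked
    WHERE sub_rank <= 3 ORDER BY experiment_id, sub_rank
""").fetchall()

for result in (q1_q2, q3, q4, q5):
    print(result)
```

The CASE expressions inside `AVG` return NULL for the other arm, and `AVG` ignores NULLs, which is what lets test and control rates sit in one row per experiment.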

[INTUIT] Employer EINs

Playground

We're given a table called employers that consists of a user_id, year, and employer EIN label. Users can have multiple employers dictated by the different EIN labels.

Write a query to add a flag to each user if they've added a new employer in the current year.
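A sketch under stated assumptions: SQLite via Python's `sqlite3`, columns named `user_id`, `year` and `employer_ein` (hypothetical names for the fields described), and "current year" taken to be the latest year in the table. A user is flagged when they have a current-year EIN that never appeared for them in an earlier year. Users with no earlier history (user 3 below) therefore count as having added a new employer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employers (user_id INTEGER, year INTEGER, employer_ein TEXT)")
conn.executemany("INSERT INTO employers VALUES (?, ?, ?)", [
    (1, 2020, "EIN-A"), (1, 2021, "EIN-A"),                      # same employer only
    (2, 2020, "EIN-A"), (2, 2021, "EIN-A"), (2, 2021, "EIN-B"),  # added EIN-B
    (3, 2021, "EIN-C"),                                          # first employer this year
    (4, 2020, "EIN-D"),                                          # nothing this year
])

rows = conn.execute("""
    WITH current_year AS (SELECT MAX(year) AS yr FROM employers)
    SELECT e.user_id,
           MAX(CASE
                   WHEN e.year = cy.yr
                    AND NOT EXISTS (SELECT 1 FROM employers prev
                                    WHERE prev.user_id = e.user_id
                                      AND prev.employer_ein = e.employer_ein
                                      AND prev.year < e.year)
                   THEN 1 ELSE 0
               END) AS new_employer_flag
    FROM employers e
    CROSS JOIN current_year cy
    GROUP BY e.user_id
    ORDER BY e.user_id
""").fetchall()
print(rows)  # [(1, 0), (2, 1), (3, 1), (4, 0)]
```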

Example:

Answer

This problem is a little trickier than it looks at first.
