Probability Basics

Probability theory is the mathematical foundation of statistical inference, which is indispensable for analyzing data affected by chance, and thus essential for data scientists.

Probability theory is the mathematical framework that allows us to analyze chance events in a logically sound manner. The probability of an event is a number indicating how likely that event will occur.

Note that when we say the probability of a head is 1/2, we are not claiming that any sequence of coin tosses will consist of exactly 50% heads. If we toss a fair coin ten times, it would not be surprising to observe 6 heads and 4 tails, or even 3 heads and 7 tails. But as we continue to toss the coin over and over again, we expect the long-run frequency of heads to get ever closer to 50%.

In general, it is important in statistics to understand the distinction between theoretical and empirical quantities. Here, the true (theoretical) probability of a head was 1/2, but any realized (empirical) sequence of coin tosses may have more or less than exactly 50% heads.

Common Terminologies

The sample space is the set of all possible outcomes in the experiment: for some dice =1,2,3,4,5,6Ω = {1, 2, 3, 4, 5, 6}.

Any subset of Ω is a valid event. we can speak of the event FF of rolling a 4, F=4F = {4}.

Consider the outcome of a single die roll and call it XX. A reasonable question one might ask is “What is the average value of XX?". We define this notion of “average” as a weighted sum of outcomes. This is called the expected value, or expectation of XX, denoted by E(X)E(X),

WeightedAverage=16(1+2+3+4+5+6)=3.5Weighted Average = \frac{1}{6} * (1 + 2 + 3 + 4 + 5 + 6) = 3.5

If you play the game \infty times the average value becomes E(X)E(X)

The variance of a random variable XX is a nonnegative number that summarizes on average how much XX differs from its mean, or expectation. The square root of the variance is called the standard deviation.

Var(X)=(13.5)2+(23.5)2+(33.5)2+(43.5)2+(53.5)2+(63.5)26=17.56Var(X) = \frac{(1−3.5)^2+(2−3.5)^2+(3−3.5)^2+(4−3.5)^2+(5−3.5)^2+(6−3.5)^2}{6} = \frac{17.5}{6}


A set, broadly defined, is a collection of objects. In the context of probability theory, we use set notation to specify compound events. For example, we can represent the event roll an even number by the set {2, 4, 6}.

Permutation and Combination

It can be surprisingly difficult to count the number of sequences or sets satisfying certain conditions. This is where Premutation and Combination comes in. For example, consider a bag of marbles in which each marble is a different color. If we draw marbles one at a time from the bag without replacement, how many different ordered sequences (permutations) of the marbles are possible? How many different unordered sets (combinations)?

  • Permutation(ABBAAB \neq BA , order matters) = nPr=n!(nr)!nPr = \frac{n!}{(n-r)!}

  • Combination (AB=BAAB = BA, order does not matter) = nCr=n!r!(nr)!nCr = \frac{n!}{r!(n-r)!}

Joint & Conditional Probability

  • Joint Probability is the probability of two independent events occurring: P(AB)=P(A)P(B)P(A \cap B) = P(A)*P(B)

  • Conditional probability tells the probability of BB given AA has occurred, it allows us to account for information we have about our system of interest: P(BA)=P(AB)P(A)P(B|A) = \frac{P(A \cap B)}{P(A)}

If both are same, then A and B are independent events.

Bayes' Theorem

Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. Conditional probability is the likelihood of an outcome occurring, based on a previous outcome occurring.

An easy way of remembering it is using the below example:

What is the probability of a fruit being banana given that it is long and yellow?

P(BananaLong,Yellow)=P(LongBanana)P(YellowBanana)P(Banana)P(Long)P(Yellow)P(Banana|Long,Yellow) = \frac{P(Long|Banana)*P(Yellow|Banana)*P(Banana)}{P(Long)*P(Yellow)}


The Maximum Aposteriori Probability (MAP) Estimation of the random variable y, given we have observed IID (x1,x2,x3,...)(x_1, x_2, x_3, ... ) here we try to accommodate our prior knowledge when estimating. In Maximum Likelihood Estimation (MLE), we assume we don’t have any prior knowledge of the quantity being estimated.


[UBER] Dice in increasing order

We throw 3 dice one by one. What is the probability that we obtain 3 points in strictly increasing order?


Suppose we get 44 in the first roll then,

Total Probability = P(4)P(5)P(6)=1/61/61/6=1/216P(4) * P(5) * P(6) = 1/6 * 1/6 * 1/6 = 1/216

Similarly for 33, P(3)P(4,54,65,6)=1/6(1/36+1/36+1/36)=3/216P(3) * P(4,5 | 4,6 | 5,6) = 1/6 * (1/36 + 1/36 + 1/36) = 3/216

Taking into consideration P(1)P(1) and P(2)P(2) we have the total as =10/216+6/216+3/216+1/216=20/216= 10/216 + 6/216 + 3/216 + 1/216 = 20/216

[LINKEDIN] Cards in increasing order

Imagine a deck of 500 cards numbered from 1 to 500. If all the cards are shuffled randomly and you are asked to pick three cards, one at a time, what's the probability of each subsequent card being larger than the previous drawn card?


It is actually easy to solve this if you think on it a little. Let's pick any 33 cards, now if you rearrange it there will only be 11 way in which each subsequent card is larger the previous card. So, a total of 66 **** ways to arrange the cards out of which only 11 is valid. So the result is 16\frac{1}{6}.

[STATE FARM] Cards without replacement

Pull 2 cards from a deck without replacement what is probability that both are of different colors.

There can be many variants to this question.



Here it is not specified which color the cards should be - they can be either red or black.

The probability that the first card drawn is either red or black is 11 since these two are the only possible outcomes.

After the first draw, the total number of cards remaining in the pack is 5151, out of which 2525 cards are of the same colour as that of the card that is already drawn. Hence the probability of drawing a card of the same colour as the first one is 2551\frac{25}{51}.

⇒ The probability of drawing two cards of the same colour is 12551=25511*\frac{25}{51}=\frac{25}{51}.

Another approach to this can be:

Two cards of a particular color can be drawn in C(26,2)C(26,2) ways.

⇒ Two cards of either red or black can be drawn in 2×C(26,2)2×C(26,2) ways.

The total number of ways of drawing any two cards from the pack is C(52,2)C(52,2).

⇒ The probability of drawing two cards of the same colour is 2×C(26,2)C(52,2)=2×26!2!×24!2!×50!52!=2551\frac{2×C(26,2)}{C(52,2)} = \frac{2×26!}{2!×24!}\frac{2!×50!}{52!}=\frac{25}{51}


Suppose you're playing a dice game. You have 2 dice.

  • What's the probability of rolling at least one 3?

  • What's the probability of rolling at least one 3 given N die?


P(at least 1 three) = P(exactly 1 three) + P(2 three) = 1/6 * 5/6 + 5/6 * 1/6 + 1/36 = 11/36

Solution received from the community via mail

To count the number of ways to throw at least 11 three for NN dice, you need to sum overall kk, 1<k<=N1<k<=N, where kk is the number of threes you throw. For each kk, there are C(N,k)C(N,k) possible combinations of dice that are three. For each of these combinations, there are 55 possible values for the other NKN-K dice. So, the number of ways to throw kk threes with NN dice is 5(Nk)C(N,k)5^{(N-k)}*C(N,k).

The total sum over 1<k<=N1<k<=N is k=1N5(Nk)(n k )=6N5N\sum_{k=1}^N 5^{(N-k)} \begin{pmatrix} n\ k\ \end{pmatrix} = 6^N-5^N. Since there are 6N6^N ways to throw the dice, the probability is (6N5N)/6N=1(5/6)N(6^N - 5^N)/6^N = 1 - (5/6)^N.

There is a simpler way to solve this problem: calculate the number of ways to not throw any threes, then subtract this number from the total number of ways to throw the dice. For N=2N=2, this is 1(5/6)2=125/36=11/361 - (5/6)^2 = 1 - 25/36 = 11/36. For NN, it is 1(5/6)N1 - (5/6)^N You can see that this is equivalent to the probability calculated using the above sum: 1(5/6)N1 - (5/6)^N.

Tip:`` ``Check the general case for N=2 and see if the numbers match

[FACEBOOK] 3 Zebras

Three zebras are chilling in the desert. Suddenly a lion attacks.

Each zebra is sitting on a corner of an equally length triangle. Each zebra randomly picks a direction and only runs along the outline of the triangle to either edge of the triangle.

What is the probability that none of the zebras collide?


Each zebra has 2 options of travel: clockwise or anticlockwise. So a total of 222=82*2*2 = 8 options.

Out of this only way in which they donot collide is if all of them travel clockwise or anticlockwise. So a total of 22.

Therefore the probability of no collision =2/8=25= 2/8 = 25%\

[POSTMATES] Four Person Elevator

There are four people on the ground floor of a building that has five levels not including the ground floor. They all get into the same elevator.

If each person is equally likely to get on any floor and they leave independently of each other, what is the probability that no two passengers will get off at the same floor?


The number of ways to assigning five floors to four different people is to get the total sample space. In this case it would be 55555 * 5 * 5 * 5.

The number of ways to assign five floors to four people without repetition of floors is 54325 * 4 * 3 * 2 because for the first passenger you have five different options. The second person has four, and so on. Note that this number counts all possible orders between passengers as well.

The result is then 54325555=0.192\frac{5 * 4 * 3 * 2}{5 * 5 * 5 * 5} = 0.192

[AMAZON] Found Item

Amazon has a warehouse system where items on the website are located at different distribution centers across a city. Let's say in one example city, the probability that a specific item X at location A is 0.6, and at location B the probability is 0.8.

Given you're a customer in this example city and the items are only found on the website if they exist in the distribution centers, what is the probability that the item X would be found on Amazon's website?


Probability of the item being present= 11- p(item NOT in A AND NOT in B) =1(0.40.2)=0.92= 1-(0.4*0.2)=0.92

[SPOTIFY] Max Dice Roll

A fair die is rolled nn times. What is the probability that the largest number rolled is rr, for each rr in 1..61..6?

Answer If r(1r6)r(1≤r≤6) is the largest number you have allowed for your nn rolls, then you forbid any number larger than rr. That is, you forbid 6r6−r values. The probability that your single roll does not show any of these 6r6−r values is 6r6\frac{6−r}{6} and the probability that this happens each time during a series of nn rolls is the obviously (6r6)n(\frac{6−r}{6})^n

There is a subtle nuance to this problem, in the above solution we have assumed the max<=rmax<=r which is different from max=rmax=r or in other words if r=3r=3, the above solution gives results for r=1,2,3r= 1,2,3. The solution of r=3r=3 is a little more involved:

Let's take r=3r=3, for nn die rolls we should have atleast one rr. The Probability of that is:

P(r=3)P(r=3) =P(of getting all n values as 1,2,3P(atleast one 3))= P(\text{of getting all n values as 1,2,3} * P(\text{atleast one 3})) = (\frac{3}{6})^n * (1-P(\text{no 3's occuring}))$$$$= (\frac{3}{6})^n * (1-(\frac{\text{only getting 1,2}}{\text{out of 1,2,3}})^n)$$$$= (\frac{3}{6})^n * (1-(\frac{2}{3})^n)$$$$= \text{generalizing } (\frac{r}{6})^n * (1-(\frac{r-1}{r})^n)$$$$= \frac{r^n - (r-1)^n}{6^n}

[FACEBOOK] Labeling Content

Facebook has a content team that labels pieces of content on the platform as spam or not spam. 9090%\ of them are diligent raters and will label 2020%\ of the content as spam and 8080%\ as non-spam. The remaining 1010%\ are non-diligent raters and will label 00%\ of the content as spam and 100100%\ as non-spam. Assume the pieces of content are labeled independently from one another, for every rater. Given that a rater has labeled 44 pieces of content as good, what is the probability that they are a diligent rater?


This can be solved using Baye's theorem:

  • Not Spam = NSNS

  • Spam = SS

  • Diligent =DD

  • NotDiligent =NDND

P(DNS,NS,NS,NS)=P(NS,NS,NS,NSD)P(D)P(NS,NS,NS,NSD)P(D)+P(NS,NS,NS,NSND)P(ND)P(D|NS, NS, NS, NS) = \frac{P(NS, NS, NS, NS|D)*P(D)}{P(NS, NS, NS, NS|D)*P(D)+P(NS, NS, NS, NS|ND)*P(ND)} P(DNS,NS,NS,NS)=0.840.90.840.9+140.1P(D|NS, NS, NS, NS) = \frac{0.8^4*0.9}{0.8^4*0.9+1^4*0.1} = ~0.7870.787

[FACEBOOK] Raining

You are about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call 33 random friends of yours who live there and ask each independently if it's raining. Each of your friends has a 2/32/3 chance of telling you the truth and a 1/31/3 chance of messing with you by lying. All 33 friends tell you that "Yes" it is raining.

What is the probability that it's actually raining in Seattle?


Even though the problem is straightforward one can interpret the problem in many ways. Taking a Bayesian approach is probably appropriate in a real world sense, but if you are told by the interviewer you have no ability to determine the priors, you can't use Bayesian. Check this thread for a detailed discussion on this problem.

For it to be not raining, all friends must be lying. Therefore, the solution must be the inverse of the probability that all three are "messing with you." (1/3)(1/3)(1/3)=1/27(1/3)*(1/3)*(1/3)=1/27 (3.7% chance they are all lying).

Since there is only a 3.73.7%\ chance all three friends are messing with you, there is a 96.396.3%\ chance it is raining.

[MICROSOFT] First to Six

Amy and Brad take turns in rolling a fair six-sided die. Whoever rolls a 66 first wins the game. Amy starts by rolling first.

What's the probability that Amy wins?


Amy can win on the first roll, third roll, fifth roll, and so on.

Probability of Amy winning in the first roll = P(six rolled by her) = 1/61/6

Probability of Amy winning in the third roll = P(six NOT rolled by her in first try) * P(six NOT rolled by Brad in first try) * P(six rolled by her in 2nd try) = (5/6)(5/6)(1/6)=1/6(5/6)2(5/6) * (5/6) * (1/6) = 1/6 * (5/6)^2

Similarly, the probability of Amy winning in the fifth roll = (1/6)(5/6)4(1/6) * (5/6)^4

Similarly, the probability of Amy winning in the seventh roll = (1/6)(5/6)6(1/6) * (5/6)^6

Hence, total probability of Amy winning = Sum of all such events = (1/6)+(1/6(5/6)2)+(1/6(5/6)4)+(1/6(5/6)6)+...(1/6) + (1/6 * (5/6)^2) + (1/6 * (5/6)^4) + (1/6 * (5/6)^6) + ...

The sum of such an infinite Geometric Progression series is = a1r=(1/6)/(125/36)=(1/6)/(11/36)=6/11\frac{a}{1-r} = (1/6) / (1 - 25/36) = (1/6) / (11/36) = 6/11

Hence, probability of Amy winning in any of her turns = 6/116/11

[GOOGLE][FACEBOOK] Double Sided Coin

A jar has 10001000 coins, of which 999999 are fair and 11 is double headed. Pick a coin at random, and toss it 1010 times. Given that you see 1010 heads, what is the probability that the next toss of that coin is also a head?


There are two ways of choosing the coin. One is to pick a fair coin and the other is to pick the one with two heads.

  • Probability of selecting fair coin =999/1000=0.999= 999/1000 = 0.999

  • Probability of selecting unfair coin =1/1000=0.001= 1/1000 = 0.001

Selecting 1010 heads in a row = Selecting fair coin * Getting 10 heads + Selecting an unfair coin

  • P (A) =0.999(1/2)5=0.999(1/1024)=0.000976= 0.999 * (1/2)^5 = 0.999 * (1/1024) = 0.000976

  • P (B) =0.0011=0.001= 0.001 * 1 = 0.001

  • P( A / A + B ) =0.000976/(0.000976+0.001)=0.4939= 0.000976 / (0.000976 + 0.001) = 0.4939

  • P( B / A + B ) =0.001/0.001976=0.5061= 0.001 / 0.001976 = 0.5061

Probability of selecting another head =P(A/A+B)0.5+P(B/A+B)1=0.49390.5+0.5061=0.7531= P(A/A+B) * 0.5 + P(B/A+B) * 1 = 0.4939 * 0.5 + 0.5061 = 0.7531

[GOOGLE] Forming a triangle

A 10 feet pole is randomly cut into 3 pieces. What is the probability that exactly form a triangle?



Suppose the pole is ABAB and there are two points PP and QQ such that AP=xAP = x and PQ=yPQ = y, so that QB=10xyQB = 10 – x – y as we know that sum of two sides of a triangle is greater than the 3rd side. Hence

x+y>(10xy)x + y > (10 – x – y) or x+y>5x + y > 5

y+(10xy)>xy + (10 – x – y) > x or x<5x < 5

(10xy)>y(10 – x – y) > y or y<5y < 5

Also, we know that all the parts of pole must be greater than 00,

or x>0,y>0,10xy>0orx>0,y>0,x+y<10x > 0, y > 0, 10 – x – y > 0 or x > 0, y > 0, x + y < 10

Plotting the lines x+y=10x+y=5,x=5,y=5x + y = 10 x + y = 5, x = 5, y = 5. Now favorable area is the area of the middle red shaded triangle.

Required probability =1/4= 1/4

[LYFT] Flips until two heads

What is the expected number of coin flips needed to get two consecutive heads?



Let's first assume xx is the expected number of coin flips required for getting two heads in a row. Now:

  • If the first flip turns out to be tail you need xx more flips since the events are independent. Probability of the event 1/21/2. Since 11 flip was wasted total number of flips required (1+x)(1+x).

  • If the first flip becomes head, but the second one is tail(HTHT) - 22 flips are wasted, here total number flips required would be (2+x)(2+x). Probability of HTHT out of HH,HT,TH,TTHH, HT, TH, TT is (1/4)(1/4)

  • The best case, the first two flips turn out to be heads both(HHHH). Probability, 1/41/4 i.e. HHHH out of HH,HT,TH,TTHH, HT, TH, TT. No of flips required 22.

So from the above scenarios, x=1/2(1+x)+1/4(2+x)+(1/4)2x = 1/2(1+x) + 1/4(2+x) + (1/4 )* 2 =1/2[(1+x)+1/2(2+x)+1]= 1/2 [ (1+x) + 1/2(2+x) + 1 ] =1/2[1+x+1+x/2+1]= 1/2 [ 1 + x + 1 + x/2 + 1 ]

x/4=3/2x / 4 = 3/2 x=6x = 6

So the expected number of flips would be 66

[LYFT] Number of cards before an ace

How many cards would you expect to draw from a standard deck before seeing the first ace?



Let XX represent the number of cards that are turned up to produce the 1st1^{st} ace. For this problem, we cannot apply the Geometric Distribution because cards are sampled without replacement.

Instead, we begin by considering the probabilities of drawing the 1st1^{st} ace on the 1st1^{st} card, 2nd2^{nd} card, and so on:

P(1stcard)=452P(1^{st} card)= \frac{4}{52}

P(2ndcard)=4852451P(2^{nd} card)= \frac{48}{52}\frac{4}{51}

P(3rdcard)=48524751450P(3^{rd} card)= \frac{48}{52}\frac{47}{51}\frac{4}{50}

P(nthcard)=448!(49x)!(52x)!(52)!P(n^{th} card)= 4* \frac{48!}{(49-x)!}\frac{(52-x)!}{(52)!}

With this we can calculate the average number of cards by applying the definition of expected value:

E[X]=x=1524x48!(49x)!(52x)!(52)!=535=10.6E[X]= \sum\limits_{x=1}^{52} 4x \frac{48!}{(49-x)!}\frac{(52-x)!}{(52)!} = \frac{53}{5} = 10.6

[FACEBOOK] Ad Raters

Let’s say we use people to rate ads.

There are two types of raters. Random and independent from our point of view:

80% of raters are careful and they rate an ad as good (60% chance) or bad (40% chance). 20% of raters are lazy and they rate every ad as good (100% chance).

  • Suppose we have 100 raters each rating one ad independently. What’s the expected number of good ads?

  • Now suppose we have 1 rater rating 100 ads. What’s the expected number of good ads?

  • Suppose we have 1 ad, rated as bad. What’s the probability the rater was lazy?


  • 100 raters are divided into 2 groups according to probabilities:

2020 lazy raters: 100100%\ good ads -> 2020 good ads; 8080 careful raters: 6060%\ good ads -> 800.6=4880 * 0.6 = 48 good ads. Total 6868 good ads.

  • There could be 2 cases:

Random rater is careful with probability of 0.8:0.80.6=0.480.8: 0.8 * 0.6 = 0.48 - probability or rating good ad Random rater is lazy with probability of 0.2:0.21=0.20.2: 0.2 * 1 = 0.2 - probability or rating good ad Total probability of rating ad as good is 0.48+0.2=0.680.48+0.2 = 0.68. The expected amount of good rates 1000.68=68100*0.68 = 68.

  • It’s 00 probability that the rater is lazy because lazy raters always rate ads as good.

[INTUIT] Chances of Winning

Let’s say you can play a coin flipping guessing game either once or a 2 out of 3 game. What is the best strategy for winning?


Scenario 1: Single Coin Flip Game

  • In this game, you have a 50% chance of winning on each flip because there are two possible outcomes (heads or tails), and you need one specific outcome (either heads or tails) to win.

Probability of winning a single coin flip = 0.5 (50%)

Scenario 2: 2 out of 3 Coin Flip Game

  • In this game, you need to win at least two out of three coin flips to win the game. To calculate this probability, we can use the binomial probability formula.

Probability of winning two out of three coin flips:

  • Calculate the probability of winning exactly two out of three flips:

    • P(Win 2 out of 3) = C(3, 2) * (0.5)^2 * (0.5)^(3-2) = 3 * 0.25 * 0.5 = 0.375

  • Calculate the probability of winning all three flips:

    • P(Win 3 out of 3) = (0.5)^3 = 0.125

Now, sum these probabilities to get the overall probability of winning the 2 out of 3 game:

Probability of winning the 2 out of 3 game = P(Win 2 out of 3) + P(Win 3 out of 3) = 0.375 + 0.125 = 0.5 (50%)

So, the probability of winning in either a single coin flip game or a 2 out of 3 coin flip game is 50% for both scenarios.

Last updated