Links

Probability Basics

Probability theory is the mathematical foundation of statistical inference, which is indispensable for analyzing data affected by chance, and thus essential for data scientists.
Probability theory is the mathematical framework that allows us to analyze chance events in a logically sound manner. The probability of an event is a number indicating how likely that event will occur.
Note that when we say the probability of a head is 1/2, we are not claiming that any sequence of coin tosses will consist of exactly 50% heads. If we toss a fair coin ten times, it would not be surprising to observe 6 heads and 4 tails, or even 3 heads and 7 tails. But as we continue to toss the coin over and over again, we expect the long-run frequency of heads to get ever closer to 50%.
In general, it is important in statistics to understand the distinction between theoretical and empirical quantities. Here, the true (theoretical) probability of a head was 1/2, but any realized (empirical) sequence of coin tosses may have more or less than exactly 50% heads.

Common Terminologies

The sample space is the set of all possible outcomes in the experiment: for some dice
=1,2,3,4,5,6Ω = {1, 2, 3, 4, 5, 6}
.
Any subset of Ω is a valid event. we can speak of the event
FF
of rolling a 4,
F=4F = {4}
.
Consider the outcome of a single die roll and call it
XX
. A reasonable question one might ask is “What is the average value of
XX
?". We define this notion of “average” as a weighted sum of outcomes. This is called the expected value, or expectation of
XX
, denoted by
E(X)E(X)
,
WeightedAverage=16(1+2+3+4+5+6)=3.5Weighted Average = \frac{1}{6} * (1 + 2 + 3 + 4 + 5 + 6) = 3.5
If you play the game
\infty
times the average value becomes
E(X)E(X)
The variance of a random variable
XX
is a nonnegative number that summarizes on average how much
XX
differs from its mean, or expectation. The square root of the variance is called the standard deviation.
Var(X)=(13.5)2+(23.5)2+(33.5)2+(43.5)2+(53.5)2+(63.5)26=17.56Var(X) = \frac{(1−3.5)^2+(2−3.5)^2+(3−3.5)^2+(4−3.5)^2+(5−3.5)^2+(6−3.5)^2}{6} = \frac{17.5}{6}

Set

A set, broadly defined, is a collection of objects. In the context of probability theory, we use set notation to specify compound events. For example, we can represent the event roll an even number by the set {2, 4, 6}.

Permutation and Combination

It can be surprisingly difficult to count the number of sequences or sets satisfying certain conditions. This is where Premutation and Combination comes in. For example, consider a bag of marbles in which each marble is a different color. If we draw marbles one at a time from the bag without replacement, how many different ordered sequences (permutations) of the marbles are possible? How many different unordered sets (combinations)?
  • Permutation(
    ABBAAB \neq BA
    , order matters) =
    nPr=n!(nr)!nPr = \frac{n!}{(n-r)!}
  • Combination (
    AB=BAAB = BA
    , order does not matter) =
    nCr=n!r!(nr)!nCr = \frac{n!}{r!(n-r)!}

Joint & Conditional Probability

  • Joint Probability is the probability of two independent events occurring:
    P(AB)=P(A)P(B)P(A \cap B) = P(A)*P(B)
  • Conditional probability tells the probability of
    BB
    given
    AA
    has occurred, it allows us to account for information we have about our system of interest:
    P(BA)=P(AB)P(A)P(B|A) = \frac{P(A \cap B)}{P(A)}
If both are same, then A and B are independent events.

Bayes' Theorem

Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. Conditional probability is the likelihood of an outcome occurring, based on a previous outcome occurring.
Baye's Theorem
An easy way of remembering it is using the below example:
What is the probability of a fruit being banana given that it is long and yellow?
P(BananaLong,Yellow)=P(LongBanana)P(YellowBanana)P(Banana)P(Long)P(Yellow)P(Banana|Long,Yellow) = \frac{P(Long|Banana)*P(Yellow|Banana)*P(Banana)}{P(Long)*P(Yellow)}

MAP vs MLE

The Maximum Aposteriori Probability (MAP) Estimation of the random variable y, given we have observed IID
(x1,x2,x3,...)(x_1, x_2, x_3, ... )
here we try to accommodate our prior knowledge when estimating. In Maximum Likelihood Estimation (MLE), we assume we don’t have any prior knowledge of the quantity being estimated.
MAP vs MLE

Questions

[UBER] Dice in increasing order
We throw 3 dice one by one. What is the probability that we obtain 3 points in strictly increasing order?
Answer
Suppose we get
44
in the first roll then,
Total Probability =
P(4)P(5)P(6)=1/61/61/6=1/216P(4) * P(5) * P(6) = 1/6 * 1/6 * 1/6 = 1/216
Similarly for
33
,
P(3)P(4,54,65,6)=1/6(1/36+1/36+1/36)=3/216P(3) * P(4,5 | 4,6 | 5,6) = 1/6 * (1/36 + 1/36 + 1/36) = 3/216
Taking into consideration
P(1)P(1)
and
P(2)P(2)
we have the total as
=10/216+6/216+3/216+1/216=20/216= 10/216 + 6/216 + 3/216 + 1/216 = 20/216
[LINKEDIN] Cards in increasing order
Imagine a deck of 500 cards numbered from 1 to 500. If all the cards are shuffled randomly and you are asked to pick three cards, one at a time, what's the probability of each subsequent card being larger than the previous drawn card?
Answer
It is actually easy to solve this if you think on it a little. Let's pick any
33
cards, now if you rearrange it there will only be
11
way in which each subsequent card is larger the previous card. So, a total of
66
**** ways to arrange the cards out of which only
11
is valid. So the result is
16\frac{1}{6}
.
[STATE FARM] Cards without replacement
Pull 2 cards from a deck without replacement what is probability that both are of different colors.
There can be many variants to this question.
Answer
Source
Here it is not specified which color the cards should be - they can be either red or black.
The probability that the first card drawn is either red or black is
11
since these two are the only possible outcomes.
After the first draw, the total number of cards remaining in the pack is
5151
, out of which
2525
cards are of the same colour as that of the card that is already drawn. Hence the probability of drawing a card of the same colour as the first one is
2551\frac{25}{51}
.
⇒ The probability of drawing two cards of the same colour is
12551=25511*\frac{25}{51}=\frac{25}{51}
.
Another approach to this can be:
Two cards of a particular color can be drawn in
C(26,2)C(26,2)
ways.
⇒ Two cards of either red or black can be drawn in
2×C(26,2)2×C(26,2)
ways.
The total number of ways of drawing any two cards from the pack is
C(52,2)C(52,2)
.
⇒ The probability of drawing two cards of the same colour is
2×C(26,2)C(52,2)=2×26!2!×24!2!×50!52!=2551\frac{2×C(26,2)}{C(52,2)} = \frac{2×26!}{2!×24!}\frac{2!×50!}{52!}=\frac{25}{51}
[FACEBOOK] N Dice
Suppose you're playing a dice game. You have 2 dice.
  • What's the probability of rolling at least one 3?
  • What's the probability of rolling at least one 3 given N die?
Answer
P(at least 1 three) = P(exactly 1 three) + P(2 three) = 1/6 * 5/6 + 5/6 * 1/6 + 1/36 = 11/36
Solution received from the community via mail
To count the number of ways to throw at least
11
three for
NN
dice, you need to sum overall
kk
,
1<k<=N1<k<=N
, where
kk
is the number of threes you throw. For each
kk
, there are
C(N,k)C(N,k)
possible combinations of dice that are three. For each of these combinations, there are
55
possible values for the other
NKN-K
dice. So, the number of ways to throw
kk
threes with
NN
dice is
5(Nk)C(N,k)5^{(N-k)}*C(N,k)
.
The total sum over
1<k<=N1<k<=N
is
k=1N5(Nk)(n k )=6N5N\sum_{k=1}^N 5^{(N-k)} \begin{pmatrix} n\ k\ \end{pmatrix} = 6^N-5^N
. Since there are
6N6^N
ways to throw the dice, the probability is
(6N5N)/6N=1(5/6)N(6^N - 5^N)/6^N = 1 - (5/6)^N
.
There is a simpler way to solve this problem: calculate the number of ways to not throw any threes, then subtract this number from the total number of ways to throw the dice. For
N=2N=2
, this is
1(5/6)2=125/36=11/361 - (5/6)^2 = 1 - 25/36 = 11/36
. For
NN
, it is
1(5/6)N1 - (5/6)^N
You can see that this is equivalent to the probability calculated using the above sum:
1(5/6)N1 - (5/6)^N
.
Tip:`` ``Check the general case for N=2 and see if the numbers match
[FACEBOOK] 3 Zebras
Three zebras are chilling in the desert. Suddenly a lion attacks.
Each zebra is sitting on a corner of an equally length triangle. Each zebra randomly picks a direction and only runs along the outline of the triangle to either edge of the triangle.
What is the probability that none of the zebras collide?
Answer
Each zebra has 2 options of travel: clockwise or anticlockwise. So a total of
222=82*2*2 = 8
options.
Out of this only way in which they donot collide is if all of them travel clockwise or anticlockwise. So a total of
22
.
Therefore the probability of no collision
=2/8=25= 2/8 = 25%\
[POSTMATES] Four Person Elevator
There are four people on the ground floor of a building that has five levels not including the ground floor. They all get into the same elevator.
If each person is equally likely to get on any floor and they leave independently of each other, what is the probability that no two passengers will get off at the same floor?
Answer
The number of ways to assigning five floors to four different people is to get the total sample space. In this case it would be
55555 * 5 * 5 * 5
.
The number of ways to assign five floors to four people without repetition of floors is
54325 * 4 * 3 * 2
because for the first passenger you have five different options. The second person has four, and so on. Note that this number counts all possible orders between passengers as well.
The result is then
54325555=0.192\frac{5 * 4 * 3 * 2}{5 * 5 * 5 * 5} = 0.192
[AMAZON] Found Item
Amazon has a warehouse system where items on the website are located at different distribution centers across a city. Let's say in one example city, the probability that a specific item X at location A is 0.6, and at location B the probability is 0.8.
Given you're a customer in this example city and the items are only found on the website if they exist in the distribution centers, what is the probability that the item X would be found on Amazon's website?
Answer
Probability of the item being present=
11-
p(item NOT in A AND NOT in B)
=1(0.40.2)=0.92= 1-(0.4*0.2)=0.92
[SPOTIFY] Max Dice Roll
A fair die is rolled
nn
times. What is the probability that the largest number rolled is
rr
, for each
rr
in
1..61..6
?
Answer If
r(1r6)r(1≤r≤6)
is the largest number you have allowed for your
nn
rolls, then you forbid any number larger than
rr
. That is, you forbid
6r6−r
values. The probability that your single roll does not show any of these
6r6−r
values is
6r6\frac{6−r}{6}
and the probability that this happens each time during a series of
nn
rolls is the obviously
(6r6)n(\frac{6−r}{6})^n
There is a subtle nuance to this problem, in the above solution we have assumed the
max<=rmax<=r
which is different from
max=rmax=r
or in other words if
r=3r=3
, the above solution gives results for
r=1,2,3r= 1,2,3
. The solution of
r=3r=3
is a little more involved:
Let's take
r=3r=3
, for
nn
die rolls we should have atleast one
rr
. The Probability of that is:
P(r=3)P(r=3)
=P(of getting all n values as 1,2,3P(atleast one 3))= P(\text{of getting all n values as 1,2,3} * P(\text{atleast one 3}))
[FACEBOOK] Labeling Content
Facebook has a content team that labels pieces of content on the platform as spam or not spam.
9090%\
of them are diligent raters and will label
2020%\
of the content as spam and
8080%\
as non-spam. The remaining
1010%\
are non-diligent raters and will label
00%\
of the content as spam and
100100%\
as non-spam. Assume the pieces of content are labeled independently from one another, for every rater. Given that a rater has labeled
44
pieces of content as good, what is the probability that they are a diligent rater?
Answer
This can be solved using Baye's theorem:
  • Not Spam =
    NSNS
  • Spam =
    SS
  • Diligent =
    DD
  • NotDiligent =
    NDND
P(DNS,NS,NS,NS)=P(NS,NS,NS,NSD)P(D)P(NS,NS,NS,NSD)P(D)+P(NS,NS,NS,NSND)P(ND)P(D|NS, NS, NS, NS) = \frac{P(NS, NS, NS, NS|D)*P(D)}{P(NS, NS, NS, NS|D)*P(D)+P(NS, NS, NS, NS|ND)*P(ND)}
P(DNS,NS,NS,NS)=0.840.90.840.9+140.1P(D|NS, NS, NS, NS) = \frac{0.8^4*0.9}{0.8^4*0.9+1^4*0.1}
= ~
0.7870.787
[FACEBOOK] Raining
You are about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call
33
random friends of yours who live there and ask each independently if it's raining. Each of your friends has a
2/32/3
chance of telling you the truth and a
1/31/3
chance of messing with you by lying. All
33
friends tell you that "Yes" it is raining.
What is the probability that it's actually raining in Seattle?
Answer
Even though the problem is straightforward one can interpret the problem in many ways. Taking a Bayesian approach is probably appropriate in a real world sense, but if you are told by the interviewer you have no ability to determine the priors, you can't use Bayesian. Check this thread for a detailed discussion on this problem.
For it to be not raining, all friends must be lying. Therefore, the solution must be the inverse of the probability that all three are "messing with you."
(1/3)(1/3)(1/3)=1/27(1/3)*(1/3)*(1/3)=1/27
(3.7% chance they are all lying).
Since there is only a
3.73.7%\
chance all three friends are messing with you, there is a
96.396.3%\
chance it is raining.
[MICROSOFT] First to Six
Amy and Brad take turns in rolling a fair six-sided die. Whoever rolls a
66
first wins the game. Amy starts by rolling first.
What's the probability that Amy wins?
Answer
Amy can win on the first roll, third roll, fifth roll, and so on.
Probability of Amy winning in the first roll = P(six rolled by her) =
1/61/6
Probability of Amy winning in the third roll = P(six NOT rolled by her in first try) * P(six NOT rolled by Brad in first try) * P(six rolled by her in 2nd try) =
(5/6)(5/6)(1/6)=1/6(5/6)2(5/6) * (5/6) * (1/6) = 1/6 * (5/6)^2
Similarly, the probability of Amy winning in the fifth roll =
(1/6)(5/6)4(1/6) * (5/6)^4
Similarly, the probability of Amy winning in the seventh roll =
(1/6)(5/6)6(1/6) * (5/6)^6
Hence, total probability of Amy winning = Sum of all such events =
(1/6)+(1/6(5/6)2)+(1/6(5/6)4)+(1/6(5/6)6)+...(1/6) + (1/6 * (5/6)^2) + (1/6 * (5/6)^4) + (1/6 * (5/6)^6) + ...
The sum of such an infinite Geometric Progression series is =
a1r=(1/6)/(125/36)=(1/6)/(11/36)=6/11\frac{a}{1-r} = (1/6) / (1 - 25/36) = (1/6) / (11/36) = 6/11
Hence, probability of Amy winning in any of her turns =
6/116/11
[GOOGLE][FACEBOOK] Double Sided Coin
A jar has
10001000
coins, of which
999999
are fair and
11
is double headed. Pick a coin at random, and toss it
1010
times. Given that you see
1010
heads, what is the probability that the next toss of that coin is also a head?
Answer
There are two ways of choosing the coin. One is to pick a fair coin and the other is to pick the one with two heads.
  • Probability of selecting fair coin
    =999/1000=0.999= 999/1000 = 0.999
  • Probability of selecting unfair coin
    =1/1000=0.001= 1/1000 = 0.001
Selecting
1010
heads in a row = Selecting fair coin * Getting 10 heads + Selecting an unfair coin
  • P (A)
    =0.999(1/2)5=0.999(1/1024)=0.000976= 0.999 * (1/2)^5 = 0.999 * (1/1024) = 0.000976
  • P (B)
    =0.0011=0.001= 0.001 * 1 = 0.001
  • P( A / A + B )
    =0.000976/(0.000976+0.001)=0.4939= 0.000976 / (0.000976 + 0.001) = 0.4939
  • P( B / A + B )
    =0.001/0.001976=0.5061= 0.001 / 0.001976 = 0.5061
Probability of selecting another head
=P(A/A+B)0.5+P(B/A+B)1=0.49390.5+0.5061=0.7531= P(A/A+B) * 0.5 + P(B/A+B) * 1 = 0.4939 * 0.5 + 0.5061 = 0.7531
[GOOGLE] Forming a triangle
A 10 feet pole is randomly cut into 3 pieces. What is the probability that exactly form a triangle?
Answer
Source
Suppose the pole is
ABAB
and there are two points
PP
and
QQ
such that
AP=xAP = x
and
PQ=yPQ = y
, so that
QB=10xyQB = 10 – x – y
as we know that sum of two sides of a triangle is greater than the 3rd side. Hence
x+y>(10xy)x + y > (10 – x – y)
or
x+y>5x + y > 5
y+(10xy)>xy + (10 – x – y) > x
or
x<5x < 5
(10xy)>y(10 – x – y) > y
or
y<5y < 5
Also, we know that all the parts of pole must be greater than
00
,
or
x>0,y>0,10xy>0orx>0,y>0,x+y<10x > 0, y > 0, 10 – x – y > 0 or x > 0, y > 0, x + y < 10
Plotting the lines
x+y=10x+y=5,x=5,y=5x + y = 10 x + y = 5, x = 5, y = 5
. Now favorable area is the area of the middle red shaded triangle.
Required probability
=1/4= 1/4
[LYFT] Flips until two heads
What is the expected number of coin flips needed to get two consecutive heads?
Answer
Source:
Let's first assume
xx
is the expected number of coin flips required for getting two heads in a row. Now:
  • If the first flip turns out to be tail you need
    xx
    more flips since the events are independent. Probability of the event
    1/21/2
    . Since
    11
    flip was wasted total number of flips required
    (1+x)(1+x)
    .
  • If the first flip becomes head, but the second one is tail(
    HTHT
    ) -
    22
    flips are wasted, here total number flips required would be
    (2+x)(2+x)
    . Probability of
    HTHT
    out of
    HH,HT,TH,TTHH, HT, TH, TT
    is
    (1/4)(1/4)
  • The best case, the first two flips turn out to be heads both(
    HHHH
    ). Probability,
    1/41/4
    i.e.
    HHHH
    out of
    HH,HT,TH,TTHH, HT, TH, TT
    . No of flips required
    22
    .
So from the above scenarios,
x=1/2(1+x)+1/4(2+x)+(1/4)2x = 1/2(1+x) + 1/4(2+x) + (1/4 )* 2
=1/2[(1+x)+1/2(2+x)+1]= 1/2 [ (1+x) + 1/2(2+x) + 1 ]
=1/2[1+x+1+x/2+1]= 1/2 [ 1 + x + 1 + x/2 + 1 ]
x/4=3/2x / 4 = 3/2
x=6x = 6
So the expected number of flips would be
66
[LYFT] Number of cards before an ace
How many cards would you expect to draw from a standard deck before seeing the first ace?
Answer
Source:
Let
XX
represent the number of cards that are turned up to produce the
1st1^{st}
ace. For this problem, we cannot apply the Geometric Distribution because cards are sampled without replacement.
Instead, we begin by considering the probabilities of drawing the
1st1^{st}
ace on the
1st1^{st}
card,
2nd2^{nd}
card, and so on:
P(1stcard)=452P(1^{st} card)= \frac{4}{52}
P(2ndcard)=4852451P(2^{nd} card)= \frac{48}{52}\frac{4}{51}
P(3rdcard)=48524751450P(3^{rd} card)= \frac{48}{52}\frac{47}{51}\frac{4}{50}
P(nthcard)=448!(49x)!(52x)!(52)!P(n^{th} card)= 4* \frac{48!}{(49-x)!}\frac{(52-x)!}{(52)!}
With this we can calculate the average number of cards by applying the definition of expected value:
E[X]=x=1524x48!(49x)!(52x)!(52)!=535=10.6E[X]= \sum\limits_{x=1}^{52} 4x \frac{48!}{(49-x)!}\frac{(52-x)!}{(52)!} = \frac{53}{5} = 10.6
[FACEBOOK] Ad Raters
Let’s say we use people to rate ads.
There are two types of raters. Random and independent from our point of view:
80% of raters are careful and they rate an ad as good (60% chance) or bad (40% chance). 20% of raters are lazy and they rate every ad as good (100% chance).
  • Suppose we have 100 raters each rating one ad independently. What’s the expected number of good ads?
  • Now suppose we have 1 rater rating 100 ads. What’s the expected number of good ads?
  • Suppose we have 1 ad, rated as bad. What’s the probability the rater was lazy?
Answer
  • 100 raters are divided into 2 groups according to probabilities:
2020
lazy raters:
100100%\
good ads ->
2020
good ads;
8080
careful raters:
6060%\
good ads ->
800.6=4880 * 0.6 = 48
good ads. Total
6868
good ads.
  • There could be 2 cases:
Random rater is careful with probability of
0.8:0.80.6=0.480.8: 0.8 * 0.6 = 0.48
- probability or rating good ad Random rater is lazy with probability of
0.2:0.21=0.20.2: 0.2 * 1 = 0.2
- probability or rating good ad Total probability of rating ad as good is
0.48+0.2=0.680.48+0.2 = 0.68
. The expected amount of good rates
1000.68=68100*0.68 = 68
.
  • It’s
    00
    probability that the rater is lazy because lazy raters always rate ads as good.
[INTUIT] Chances of Winning
Let’s say you can play a coin flipping guessing game either once or a 2 out of 3 game. What is the best strategy for winning?
Answer
Scenario 1: Single Coin Flip Game
  • In this game, you have a 50% chance of winning on each flip because there are two possible outcomes (heads or tails), and you need one specific outcome (either heads or tails) to win.
Probability of winning a single coin flip = 0.5 (50%)
Scenario 2: 2 out of 3 Coin Flip Game
  • In this game, you need to win at least two out of three coin flips to win the game. To calculate this probability, we can use the binomial probability formula.
Probability of winning two out of three coin flips:
  • Calculate the probability of winning exactly two out of three flips:
    • P(Win 2 out of 3) = C(3, 2) * (0.5)^2 * (0.5)^(3-2) = 3 * 0.25 * 0.5 = 0.375
  • Calculate the probability of winning all three flips:
    • P(Win 3 out of 3) = (0.5)^3 = 0.125
Now, sum these probabilities to get the overall probability of winning the 2 out of 3 game:
Probability of winning the 2 out of 3 game = P(Win 2 out of 3) + P(Win 3 out of 3) = 0.375 + 0.125 = 0.5 (50%)
So, the probability of winning in either a single coin flip game or a 2 out of 3 coin flip game is 50% for both scenarios.