# Probability Basics

Probability theory is the mathematical foundation of statistical inference, which is indispensable for analyzing data affected by chance, and thus essential for data scientists.


Probability theory is the mathematical framework that allows us to analyze chance events in a logically sound manner. The probability of an event is a number between 0 and 1 indicating how likely that event is to occur.

Note that when we say the probability of a head is 1/2, we are not claiming that any sequence of coin tosses will consist of exactly 50% heads. If we toss a fair coin ten times, it would not be surprising to observe 6 heads and 4 tails, or even 3 heads and 7 tails. But as we continue to toss the coin over and over again, we expect the long-run frequency of heads to get ever closer to 50%.

**In general, it is important in statistics to understand the distinction between theoretical and empirical quantities. Here, the true (theoretical) probability of a head is 1/2, but any realized (empirical) sequence of coin tosses may contain more or fewer than exactly 50% heads.**
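The gap between the theoretical value and an empirical run can be seen in a small simulation. This is a minimal sketch: the function name `heads_frequency`, the seed, and the toss counts are illustrative choices, not part of the original text.

```python
import random


def heads_frequency(n_tosses: int, seed: int = 0) -> float:
    """Fraction of heads observed in n_tosses of a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# Short runs can land far from 0.5; long runs tend to settle near it.
for n in (10, 100, 10_000, 100_000):
    print(n, heads_frequency(n))
```

Changing the seed changes the short-run frequencies noticeably, while the long-run values stay close to 1/2.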

## Common Terminologies

### Set

A set, broadly defined, is a collection of objects. In the context of probability theory, we use set notation to specify compound events. For example, for a six-sided die we can represent the event "roll an even number" by the set {2, 4, 6}.

### Permutation and Combination

It can be surprisingly difficult to count the number of sequences or sets satisfying certain conditions. This is where **permutations and combinations** come in. For example, consider a bag of marbles in which each marble is a different color. If we draw marbles one at a time from the bag without replacement, how many different ordered sequences (permutations) of the marbles are possible? How many different unordered sets (combinations)?
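The marble question can be checked by brute force. The sketch below assumes a hypothetical bag of four marbles and draws of size two; both numbers are illustrative. It lists the draws with `itertools` and compares the counts with `math.perm` and `math.comb` (available from Python 3.8).

```python
import math
from itertools import combinations, permutations

marbles = ["red", "green", "blue", "yellow"]  # hypothetical bag
k = 2  # hypothetical number of marbles drawn

ordered = list(permutations(marbles, k))    # order matters
unordered = list(combinations(marbles, k))  # order does not matter

# The enumerated counts agree with the closed-form counts.
assert len(ordered) == math.perm(len(marbles), k)
assert len(unordered) == math.comb(len(marbles), k)
print(len(ordered), len(unordered))  # 12 6
```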

### Joint & Conditional Probability

**If both are the same, that is, if $P(A \mid B) = P(A)$, then A and B are independent events.**

### Bayes' Theorem

Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. **Conditional probability is the likelihood of an outcome occurring, based on a previous outcome occurring.**

An easy way to remember it is with the example below:

What is the probability of a fruit being banana given that it is long and yellow?
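Written out for this example, Bayes' theorem takes the following form. The event names are shorthand introduced here, with "long, yellow" standing for the observed evidence:

$$
P(\text{banana} \mid \text{long, yellow}) = \frac{P(\text{long, yellow} \mid \text{banana}) \, P(\text{banana})}{P(\text{long, yellow})}
$$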

### MAP vs MLE

## Questions

The **sample space is the set of all possible outcomes in the experiment**: for a six-sided die, $\Omega = \{1, 2, 3, 4, 5, 6\}$.

Any **subset of Ω is a valid event**. For example, we can speak of the event of rolling a 4, $\{4\}$.

Consider the outcome of a single die roll and call it $X$. A reasonable question one might ask is "What is the average value of $X$?". We define this notion of "average" as a weighted sum of outcomes. This is called the **expected value**, or expectation of $X$, denoted by $E[X]$:

$$E[X] = \sum_{x \in \Omega} x \, P(X = x)$$

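If you play the game $n$ times, the average of the observed values $x_1, \dots, x_n$ becomes $\frac{1}{n} \sum_{i=1}^{n} x_i$, which approaches $E[X]$ as $n$ grows.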

The **variance** of a random variable $X$ is a nonnegative number that summarizes on average how much $X$ differs from its mean, or expectation: $\mathrm{Var}(X) = E\left[(X - E[X])^2\right]$. The square root of the variance is called the **standard deviation.**
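For a single fair six-sided die these quantities can be computed exactly. The sketch below assumes equally likely faces 1 to 6 and uses `fractions.Fraction` to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die
p = Fraction(1, 6)      # each face is equally likely

# Expected value: weighted sum of the outcomes.
mean = sum(x * p for x in outcomes)

# Variance: expected squared deviation from the mean.
variance = sum((x - mean) ** 2 * p for x in outcomes)

# Standard deviation: square root of the variance.
std_dev = float(variance) ** 0.5

print(mean, variance, round(std_dev, 4))  # 7/2 35/12 1.7078
```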

Permutation (choose $k$ of $n$ items, order matters) $= \frac{n!}{(n - k)!}$

Combination (choose $k$ of $n$ items, order does not matter) $= \frac{n!}{k! \, (n - k)!}$

Joint probability is the probability of two events both occurring. For two independent events $A$ and $B$:

$$P(A \cap B) = P(A) \, P(B)$$

Conditional probability gives the probability of $A$ given that $B$ has occurred. It allows us to account for information we have about our system of interest:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

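The Maximum A Posteriori (MAP) estimate of the random variable $y$, given that we have observed IID samples $x_1, \dots, x_n$, is $\hat{y}_{\text{MAP}} = \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)$. Here we try to accommodate our prior knowledge $P(y)$ when estimating. In Maximum Likelihood Estimation (MLE), we assume we don't have any prior knowledge of the quantity being estimated, so the prior term is dropped: $\hat{y}_{\text{MLE}} = \arg\max_y \prod_{i=1}^{n} P(x_i \mid y)$.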
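The difference is easiest to see on a coin whose heads probability is being estimated. The sketch below is an illustration under stated assumptions, not a method from the original text: the prior is taken to be a Beta(a, b) distribution, for which the MAP estimate is the mode of the Beta posterior, and the function names are hypothetical.

```python
def mle_heads(heads: int, tosses: int) -> float:
    """Maximum likelihood estimate of P(heads): the observed frequency."""
    return heads / tosses


def map_heads(heads: int, tosses: int, a: float = 2.0, b: float = 2.0) -> float:
    """MAP estimate of P(heads) under an assumed Beta(a, b) prior.

    This is the mode of the Beta(a + heads, b + tails) posterior,
    which is valid when a + heads > 1 and b + tails > 1.
    """
    return (heads + a - 1) / (tosses + a + b - 2)


# Three heads in three tosses: MLE says the coin always lands heads,
# while the prior pulls the MAP estimate back toward 1/2.
print(mle_heads(3, 3))  # 1.0
print(map_heads(3, 3))  # 0.8
```

With a flat prior (`a = b = 1`) the two estimates coincide, which matches the statement that MLE assumes no prior knowledge.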

Suppose we get in the first roll then,

Total Probability =

Similarly for ,

Taking into consideration and we have the total as

This is actually easy to solve if you think about it a little. Pick any $n$ cards. If you rearrange them, there is only $1$ order in which each subsequent card is larger than the previous card. So there are a total of $n!$ ways to arrange the cards, out of which only $1$ is valid. So the result is $\frac{1}{n!}$.
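The counting argument can be confirmed by enumerating every ordering of a small hand. The sketch assumes the cards carry distinct values; the function name `p_increasing` and the hand sizes tried are illustrative, since the number of cards in the question is not preserved in the text.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_increasing(n_cards: int) -> Fraction:
    """Probability that n_cards distinct cards come out in increasing order."""
    orders = list(permutations(range(n_cards)))
    increasing = sum(
        all(a < b for a, b in zip(order, order[1:])) for order in orders
    )
    return Fraction(increasing, len(orders))


for n in (2, 3, 4):
    assert p_increasing(n) == Fraction(1, factorial(n))
    print(n, p_increasing(n))
```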

The probability that the first card drawn is either red or black is $1$, since these two are the only possible outcomes.

After the first draw, the total number of cards remaining in the pack is $51$, out of which $25$ cards are of the same colour as the card that is already drawn. Hence the probability of drawing a card of the same colour as the first one is $\frac{25}{51}$.

⇒ The probability of drawing two cards of the same colour is $1 \times \frac{25}{51} = \frac{25}{51}$.

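Two cards of a particular colour can be drawn in $\binom{26}{2}$ ways.

⇒ Two cards of either red or black can be drawn in $2 \binom{26}{2}$ ways.

The total number of ways of drawing any two cards from the pack is $\binom{52}{2}$.

⇒ The probability of drawing two cards of the same colour is $\frac{2 \binom{26}{2}}{\binom{52}{2}} = \frac{650}{1326} = \frac{25}{51}$.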
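Both routes to the answer can be checked with exact arithmetic. The sketch assumes a standard 52-card pack with 26 red and 26 black cards; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Route 1: any first card, then one of the 25 remaining same-colour cards.
sequential = Fraction(1) * Fraction(25, 51)

# Route 2: count same-colour pairs among all pairs of cards.
counting = Fraction(2 * comb(26, 2), comb(52, 2))

assert sequential == counting
print(sequential)  # 25/51
```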

*Solution received from the community via*

To count the number of ways to throw at least one three with $n$ dice, you need to sum over all $k$, $1 \le k \le n$, where $k$ is the number of threes you throw. For each $k$, there are $\binom{n}{k}$ possible combinations of dice that show a three. For each of these combinations, there are $5^{n-k}$ possible values for the other dice. So, the number of ways to throw $k$ threes with $n$ dice is $\binom{n}{k} 5^{n-k}$.

The total sum over $k$ is $\sum_{k=1}^{n} \binom{n}{k} 5^{n-k}$. Since there are $6^n$ ways to throw the dice, the probability is $\frac{1}{6^n} \sum_{k=1}^{n} \binom{n}{k} 5^{n-k}$.

There is a simpler way to solve this problem: calculate the number of ways to not throw any threes, then subtract this number from the total number of ways to throw the dice. For $n$ dice, this is $6^n - 5^n$, so the probability is $1 - \left(\frac{5}{6}\right)^n$. By the binomial theorem this is equivalent to the probability calculated using the above sum: $\sum_{k=1}^{n} \binom{n}{k} 5^{n-k} = (1 + 5)^n - 5^n = 6^n - 5^n$.
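The two counting arguments can be compared against a brute-force enumeration. The sketch assumes the question asks for at least one three among $n$ fair six-sided dice; the function names are illustrative.

```python
from itertools import product
from math import comb


def count_by_sum(n: int) -> int:
    """Sum over k = number of threes: choose which dice, fill the rest."""
    return sum(comb(n, k) * 5 ** (n - k) for k in range(1, n + 1))


def count_by_complement(n: int) -> int:
    """All throws minus the throws that contain no three."""
    return 6 ** n - 5 ** n


def count_by_enumeration(n: int) -> int:
    """List every throw of n dice and count those containing a three."""
    return sum(3 in throw for throw in product(range(1, 7), repeat=n))


for n in (1, 2, 3, 4):
    assert count_by_sum(n) == count_by_complement(n) == count_by_enumeration(n)
    print(n, count_by_complement(n) / 6 ** n)
```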

Each zebra has 2 options of travel: clockwise or anticlockwise. So for $n$ zebras there are a total of $2^n$ options.

Out of these, the only way in which they do not collide is if all of them travel clockwise or all of them travel anticlockwise. So a total of $2$.

Therefore the probability of no collision $= \frac{2}{2^n} = \frac{1}{2^{n-1}}$

To get the total sample space, count the number of ways of assigning five floors to four different people, with repetition allowed. In this case it would be $5^4 = 625$.

The number of ways to assign five floors to four people without repetition of floors is $5 \times 4 \times 3 \times 2 = 120$, because the first passenger has five different options, the second person has four, and so on. Note that this number counts all possible orders between passengers as well.

The result is then $\frac{120}{625} = \frac{24}{125} = 0.192$
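The count is small enough to enumerate directly. The sketch below lists every assignment of one of five floors to each of four people and keeps those in which all floors differ; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

floors, people = 5, 4

# Every way to give each person one of the floors (repeats allowed).
assignments = list(product(range(floors), repeat=people))

# Assignments in which no two people share a floor.
distinct = [a for a in assignments if len(set(a)) == people]

probability = Fraction(len(distinct), len(assignments))
print(len(assignments), len(distinct), probability)  # 625 120 24/125
```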

Probability of the item being present= p(item NOT in A AND NOT in B)

A fair die is rolled $n$ times. What is the probability that the largest number rolled is $r$, for each $r$ in $1, \dots, 6$?

**Answer** If $r$ is the largest number you have allowed for your rolls, then you forbid any number larger than $r$. That is, you forbid $6 - r$ values. The probability that a single roll does not show any of these values is $\frac{r}{6}$, and the probability that this happens each time during a series of $n$ rolls is then

$$P(\max \le r) = \left(\frac{r}{6}\right)^n$$

There is a subtle nuance to this problem: the solution above computes the probability that the largest number is at most $r$, which is different from the probability that it is exactly $r$. In other words, the rolls counted above may all be smaller than $r$. The solution for exactly $r$ is a little more involved.

For a given $r$, the $n$ die rolls must contain at least one $r$ and nothing larger. The probability of that is:

$$P(\max = r) = \left(\frac{r}{6}\right)^n - \left(\frac{r - 1}{6}\right)^n$$
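The "exactly the largest" case can be checked against enumeration. The sketch assumes a fair six-sided die rolled $n$ times, with $r$ the candidate maximum; these symbols and the function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def p_max_equals(r: int, n: int) -> Fraction:
    """P(all rolls <= r) minus P(all rolls <= r - 1)."""
    return Fraction(r, 6) ** n - Fraction(r - 1, 6) ** n


def p_max_by_enumeration(r: int, n: int) -> Fraction:
    """Same probability, found by listing every sequence of n rolls."""
    rolls = list(product(range(1, 7), repeat=n))
    return Fraction(sum(max(roll) == r for roll in rolls), len(rolls))


n = 3  # hypothetical number of rolls
for r in range(1, 7):
    assert p_max_equals(r, n) == p_max_by_enumeration(r, n)
    print(r, p_max_equals(r, n))

# The six cases are exhaustive, so their probabilities sum to 1.
assert sum(p_max_equals(r, n) for r in range(1, 7)) == 1
```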

Facebook has a content team that labels pieces of content on the platform as spam or not spam. of them are diligent raters and will label of the content as spam and as non-spam. The remaining are non-diligent raters and will label of the content as spam and as non-spam. Assume the pieces of content are labeled independently from one another, for every rater. Given that a rater has labeled pieces of content as good, what is the probability that they are a diligent rater?

Not Spam =

Spam =

Diligent =

NotDiligent =

= ~
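The Bayes' theorem calculation for this kind of question can be sketched as a small function. The numeric values of the question are not preserved in the text, so every number in the example call is a hypothetical placeholder, and the function and parameter names are illustrative.

```python
def posterior_diligent(
    p_diligent: float,
    p_good_if_diligent: float,
    p_good_if_not: float,
    n_good: int,
) -> float:
    """P(rater is diligent | rater labeled n_good independent items as good)."""
    evidence_diligent = p_diligent * p_good_if_diligent ** n_good
    evidence_not = (1 - p_diligent) * p_good_if_not ** n_good
    return evidence_diligent / (evidence_diligent + evidence_not)


# Hypothetical inputs: 90% of raters are diligent and label 80% of items
# as good, the rest label everything as good, and 4 good labels are seen.
print(round(posterior_diligent(0.9, 0.8, 1.0, 4), 4))
```

If both kinds of rater label items as good at the same rate, the labels carry no information and the posterior equals the prior.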

You are about to get on a plane to Seattle. You want to know if you should bring an umbrella. You call $3$ random friends of yours who live there and ask each independently if it's raining. Each of your friends has a $\frac{2}{3}$ chance of telling you the truth and a $\frac{1}{3}$ chance of messing with you by lying. All $3$ friends tell you that "Yes" it is raining.

Even though the problem is straightforward one can interpret the problem in many ways. Taking a Bayesian approach is probably appropriate in a real world sense, but if you are told by the interviewer you have no ability to determine the priors, you can't use Bayesian. for a detailed discussion on this problem.

For it to be not raining, all three friends must be lying. Therefore, the solution is the complement of the probability that all three are "messing with you": $\left(\frac{1}{3}\right)^3 = \frac{1}{27}$ (a 3.7% chance they are all lying).

Since there is only a $\frac{1}{27}$ chance all three friends are messing with you, there is a $\frac{26}{27} \approx 96.3\%$ chance it is raining.
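For comparison, the Bayesian reading mentioned earlier can be sketched once a prior is chosen. The prior probability of rain is an assumption supplied by the caller, not something the problem states, and the function name is hypothetical. The truth probability of 2/3 and the three friends are taken from the 3.7% figure quoted for all three lying.

```python
from fractions import Fraction


def p_rain_given_all_yes(
    prior_rain: Fraction,
    p_truth: Fraction = Fraction(2, 3),
    n_friends: int = 3,
) -> Fraction:
    """Posterior P(rain | every friend says yes) for an assumed prior."""
    yes_if_rain = p_truth ** n_friends       # all friends tell the truth
    yes_if_dry = (1 - p_truth) ** n_friends  # all friends lie
    numerator = prior_rain * yes_if_rain
    return numerator / (numerator + (1 - prior_rain) * yes_if_dry)


# With an assumed 50% prior chance of rain, the posterior is 8/9.
print(p_rain_given_all_yes(Fraction(1, 2)))
```

The result depends on the prior, which is why the two readings of the problem generally give different numbers.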

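Amy and Brad take turns in rolling a fair six-sided die. Whoever rolls a $6$ first wins the game. Amy starts by rolling first.

Probability of Amy winning in the first roll = P(six rolled by her) $= \frac{1}{6}$

Probability of Amy winning in the third roll = P(six NOT rolled by her in first try) * P(six NOT rolled by Brad in first try) * P(six rolled by her in 2nd try) $= \frac{5}{6} \cdot \frac{5}{6} \cdot \frac{1}{6}$

Similarly, the probability of Amy winning in the fifth roll $= \left(\frac{5}{6}\right)^4 \cdot \frac{1}{6}$

Similarly, the probability of Amy winning in the seventh roll $= \left(\frac{5}{6}\right)^6 \cdot \frac{1}{6}$

Hence, total probability of Amy winning = Sum of all such events $= \frac{1}{6} + \left(\frac{5}{6}\right)^2 \frac{1}{6} + \left(\frac{5}{6}\right)^4 \frac{1}{6} + \left(\frac{5}{6}\right)^6 \frac{1}{6} + \dots$

The sum of such an infinite Geometric Progression series with first term $a$ and common ratio $r$ is $= \frac{a}{1 - r}$, here with $a = \frac{1}{6}$ and $r = \left(\frac{5}{6}\right)^2 = \frac{25}{36}$

Hence, probability of Amy winning in any of her turns $= \frac{1/6}{1 - 25/36} = \frac{6}{11}$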
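The geometric series can be checked with exact fractions. The sketch assumes the winning face is a six, as the solution states, and compares the closed form with a truncated sum; the variable names and the number of terms are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 6)  # chance of rolling a six on any single roll
q = 1 - p           # chance of not rolling a six

# Closed form: first term p, common ratio q**2 (both players miss once).
closed_form = p / (1 - q ** 2)

# Truncated series: Amy wins on her 1st, 2nd, ..., 50th turn.
partial_sum = sum(q ** (2 * k) * p for k in range(50))

print(closed_form, float(partial_sum))
```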

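A jar has $N$ coins, of which $N - 1$ are fair and $1$ is double headed. Pick a coin at random, and toss it $10$ times. Given that you see $10$ heads, what is the probability that the next toss of that coin is also a head?

Probability of selecting a fair coin $= \frac{N - 1}{N}$

Probability of selecting the unfair coin $= \frac{1}{N}$

Getting 10 heads in a row = selecting a fair coin * getting 10 heads + selecting the unfair coin

$P(A) = \frac{N - 1}{N} \cdot \left(\frac{1}{2}\right)^{10}$

$P(B) = \frac{1}{N}$

Probability that the coin is fair given 10 heads $= \frac{P(A)}{P(A) + P(B)}$

Probability that the coin is unfair given 10 heads $= \frac{P(B)}{P(A) + P(B)}$

Probability of selecting another head $= \frac{P(A)}{P(A) + P(B)} \cdot \frac{1}{2} + \frac{P(B)}{P(A) + P(B)} \cdot 1$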
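The steps above can be sketched as a function of the jar size. The number of coins is not preserved in the text, so the value of 1000 used in the example call is a hypothetical placeholder, as are the function and parameter names.

```python
from fractions import Fraction


def p_next_head(n_coins: int, n_heads: int) -> Fraction:
    """P(next toss is a head | n_heads heads in a row), one double-headed coin."""
    p_fair = Fraction(n_coins - 1, n_coins)
    p_unfair = Fraction(1, n_coins)

    a = p_fair * Fraction(1, 2) ** n_heads  # fair coin and all heads
    b = p_unfair                            # double-headed coin, heads for sure

    posterior_fair = a / (a + b)
    posterior_unfair = b / (a + b)
    return posterior_fair * Fraction(1, 2) + posterior_unfair


# Hypothetical jar of 1000 coins, 10 heads observed.
print(float(p_next_head(1000, 10)))
```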

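Suppose the pole has length $1$ and there are two break points that cut it into parts of length $x$, $y$ and $1 - x - y$. We know that the sum of two sides of a triangle is greater than the third side. Hence

$$x + y > 1 - x - y \quad \text{or} \quad x + y > \tfrac{1}{2}$$

$$x + (1 - x - y) > y \quad \text{or} \quad y < \tfrac{1}{2}$$

$$y + (1 - x - y) > x \quad \text{or} \quad x < \tfrac{1}{2}$$

Also, we know that all the parts of the pole must be greater than $0$,

$$x > 0, \quad y > 0, \quad 1 - x - y > 0 \quad \text{or} \quad x + y < 1$$

Plot the lines $x = \tfrac{1}{2}$, $y = \tfrac{1}{2}$, $x + y = \tfrac{1}{2}$ and $x + y = 1$ in the $(x, y)$ plane. The possible region is the triangle with vertices $(0, 0)$, $(1, 0)$ and $(0, 1)$, which has area $\tfrac{1}{2}$. The favorable area is the area of the middle triangle with vertices $(\tfrac{1}{2}, 0)$, $(0, \tfrac{1}{2})$ and $(\tfrac{1}{2}, \tfrac{1}{2})$, which is $\tfrac{1}{8}$.

Required probability $= \frac{1/8}{1/2} = \frac{1}{4}$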
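A Monte Carlo run gives a rough check of the geometric argument. The sketch assumes a pole of length 1 broken at two independent, uniformly random points, and uses the fact that three parts form a triangle exactly when each is shorter than half the total. The function name, trial count and seed are illustrative.

```python
import random


def triangle_probability(trials: int = 200_000, seed: int = 0) -> float:
    """Estimate P(three pieces of a unit pole can form a triangle)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = sorted((rng.random(), rng.random()))  # two break points
        parts = (a, b - a, 1 - b)
        # Triangle inequality holds iff no part reaches half the length.
        if max(parts) < 0.5:
            hits += 1
    return hits / trials


print(triangle_probability())  # an estimate close to 0.25
```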


Let's first assume $x$ is the expected number of coin flips required for getting two heads in a row. Now:

If the first flip turns out to be a tail, you need $x$ more flips, since the events are independent. Probability of the event $= \frac{1}{2}$. Since $1$ flip was wasted, the total number of flips required $= x + 1$.

If the first flip is a head but the second one is a tail ($HT$), $2$ flips are wasted, and the total number of flips required would be $x + 2$. The probability of $HT$ out of $HH, HT, TH, TT$ is $\frac{1}{4}$.

In the best case, the first two flips both turn out to be heads ($HH$). Probability $= \frac{1}{4}$, i.e. $1$ out of $4$. Number of flips required $= 2$.

So from the above scenarios,

$$x = \frac{1}{2}(x + 1) + \frac{1}{4}(x + 2) + \frac{1}{4} \cdot 2$$

So the expected number of flips would be $x = 6$
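The expectation can be checked empirically by simulating the experiment many times. This is a minimal sketch assuming a fair coin; the function names, trial count and seed are illustrative.

```python
import random


def flips_until_two_heads(rng: random.Random) -> int:
    """Flip a fair coin until two heads appear in a row; return the flip count."""
    streak = 0
    flips = 0
    while streak < 2:
        flips += 1
        # A head extends the streak, a tail resets it.
        streak = streak + 1 if rng.random() < 0.5 else 0
    return flips


def average_flips(trials: int = 100_000, seed: int = 0) -> float:
    """Average flip count over many simulated experiments."""
    rng = random.Random(seed)
    return sum(flips_until_two_heads(rng) for _ in range(trials)) / trials


print(average_flips())  # an estimate close to 6
```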


Let $X$ represent the number of cards that are turned up to produce the ace. For this problem, we cannot apply the Geometric Distribution because cards are sampled without replacement.

Instead, we begin by considering the probabilities of drawing the ace on the $1^{st}$ card, $2^{nd}$ card, and so on:
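These probabilities can be computed exactly and summed into an expectation. The sketch assumes a standard 52-card deck containing 4 aces and takes $X$ to be the position of the first ace; the text does not preserve these details, so they are assumptions, and the function name is illustrative.

```python
from fractions import Fraction


def p_first_ace_at(k: int, deck: int = 52, aces: int = 4) -> Fraction:
    """P(first ace appears on the k-th card) when drawing without replacement."""
    p = Fraction(1)
    for i in range(k - 1):
        # The i-th card drawn (0-based) is not an ace.
        p *= Fraction(deck - aces - i, deck - i)
    # The k-th card is an ace, with k - 1 cards already removed.
    return p * Fraction(aces, deck - (k - 1))


# The first ace can appear no later than card 49 (after all 48 non-aces).
positions = range(1, 50)
expected = sum(k * p_first_ace_at(k) for k in positions)

print(p_first_ace_at(1), p_first_ace_at(2), expected)  # 1/13 16/221 53/5
```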

lazy raters: good ads -> good ads; careful raters: good ads -> good ads. Total good ads.

Random rater is careful with probability of - probability or rating good ad Random rater is lazy with probability of - probability or rating good ad Total probability of rating ad as good is . The expected amount of good rates .

It’s probability that the rater is lazy because lazy raters always rate ads as good.