# Probability Basics

Probability theory is the mathematical framework that lets us analyze chance events in a logically sound manner. It is the foundation of statistical inference, which is indispensable for analyzing data affected by chance, and thus essential for data scientists. The probability of an event is a number indicating how likely that event is to occur.

Note that when we say the probability of a head is 1/2, we are not claiming that any sequence of coin tosses will consist of exactly 50% heads. If we toss a fair coin ten times, it would not be surprising to observe 6 heads and 4 tails, or even 3 heads and 7 tails. But as we continue to toss the coin over and over again, we expect the long-run frequency of heads to get ever closer to 50%.

**In general, it is important in statistics to understand the distinction between theoretical and empirical quantities. Here, the true (theoretical) probability of a head was 1/2, but any realized (empirical) sequence of coin tosses may have more or less than exactly 50% heads.**
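This long-run behavior is easy to see in a quick simulation. The sketch below is illustrative Python, not from the text; the helper name and toss counts are my own choices:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def running_head_frequency(n_tosses):
    """Toss a fair coin n_tosses times and return the fraction of heads."""
    heads = 0
    for _ in range(n_tosses):
        heads += random.random() < 0.5  # True (=1) with probability 1/2
    return heads / n_tosses

# The empirical frequency wanders for small n but settles near 0.5.
for n in (10, 1_000, 100_000):
    print(n, running_head_frequency(n))
```

With only 10 tosses the frequency can easily be 0.3 or 0.7, but at 100,000 tosses it sits very close to 0.5, illustrating the theoretical-versus-empirical distinction above.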

## Common Terminologies

The **sample space is the set of all possible outcomes of the experiment**: for a single die roll, $\Omega = \{1, 2, 3, 4, 5, 6\}$.

Any **subset of $\Omega$ is a valid event**. For example, we can speak of the event $F$ of rolling a 4: $F = \{4\}$.

Consider the outcome of a single die roll and call it $X$. A reasonable question one might ask is "What is the average value of $X$?" We define this notion of "average" as a weighted sum of outcomes. This is called the **expected value**, or expectation, of $X$, denoted by $E(X)$:

$$E(X) = \sum_{x} x \, P(X = x)$$

If you play the game a very large number of times, the average value of the outcomes approaches $E(X)$.

The **variance** of a random variable $X$ is a nonnegative number that summarizes on average how much $X$ differs from its mean, or expectation: $\mathrm{Var}(X) = E\big[(X - E(X))^2\big]$. The square root of the variance is called the **standard deviation**.
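These definitions can be computed exactly for the die-roll example. A minimal sketch using Python's `fractions` module to keep the arithmetic exact (the variable names are my own):

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6.
outcomes = range(1, 7)
p = Fraction(1, 6)

# E(X) = sum over x of x * P(X = x)
mean = sum(x * p for x in outcomes)                     # 7/2
# Var(X) = E[(X - E(X))^2]
variance = sum((x - mean) ** 2 * p for x in outcomes)   # 35/12

print(mean, variance)  # 7/2 35/12
```

So the expected value of a die roll is 3.5 (a value the die itself can never show), and the standard deviation is $\sqrt{35/12} \approx 1.71$.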

## Set

A set, broadly defined, is a collection of objects. In the context of probability theory, we use set notation to specify compound events. For example, we can represent the event "roll an even number" by the set $\{2, 4, 6\}$.

## Permutation and Combination

It can be surprisingly difficult to count the number of sequences or sets satisfying certain conditions. This is where **permutations and combinations** come in. For example, consider a bag of marbles in which each marble is a different color. If we draw marbles one at a time from the bag without replacement, how many different ordered sequences (permutations) of the marbles are possible? How many different unordered sets (combinations)?

Permutation ($AB \neq BA$, order matters): $nPr = \frac{n!}{(n-r)!}$

Combination ($AB = BA$, order does not matter): $nCr = \frac{n!}{r!(n-r)!}$
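Python's standard library provides `math.perm` and `math.comb` (Python 3.8+), which can be checked against the factorial formulas above:

```python
from math import perm, comb, factorial

n, r = 5, 2

# nPr = n! / (n - r)!  -- ordered sequences
assert perm(n, r) == factorial(n) // factorial(n - r)
# nCr = n! / (r! (n - r)!)  -- unordered sets
assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))

print(perm(n, r), comb(n, r))  # 20 10
```

Drawing 2 of 5 marbles gives 20 ordered sequences but only 10 unordered sets, since each pair can be ordered in $2! = 2$ ways.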

## Joint & Conditional Probability

Joint probability is the probability of two events occurring together: $P(A \cap B)$. When $A$ and $B$ are independent, it factorizes as $P(A \cap B) = P(A) \cdot P(B)$.

Conditional probability is the probability of $B$ given that $A$ has occurred. It allows us to account for information we have about our system of interest: $P(B|A) = \frac{P(A \cap B)}{P(A)}$

**If $P(B|A) = P(B)$, then $A$ and $B$ are independent events.**

## Bayes' Theorem

Bayes' theorem, named after the 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. **Conditional probability is the likelihood of an outcome occurring, based on another outcome having occurred.** The theorem states:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$

An easy way to remember it is with an example: what is the probability that a fruit is a banana, given that it is long and yellow?
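This can be worked through numerically. The counts below are made-up illustrative numbers, not data from the text:

```python
# Hypothetical fruit inventory: totals and how many of each
# are both long and yellow. Illustrative numbers only.
counts = {
    "banana": {"total": 500, "long_and_yellow": 400},
    "orange": {"total": 300, "long_and_yellow": 0},
    "other":  {"total": 200, "long_and_yellow": 50},
}
total = sum(f["total"] for f in counts.values())  # 1000 fruits

# Bayes' theorem:
#   P(banana | long, yellow)
#     = P(long, yellow | banana) * P(banana) / P(long, yellow)
p_banana = counts["banana"]["total"] / total
p_evidence_given_banana = (counts["banana"]["long_and_yellow"]
                           / counts["banana"]["total"])
p_evidence = sum(f["long_and_yellow"] for f in counts.values()) / total

posterior = p_evidence_given_banana * p_banana / p_evidence
print(posterior)
```

With these assumed counts, the prior $P(\text{banana}) = 0.5$ is updated by the evidence to a posterior of $8/9 \approx 0.89$: observing "long and yellow" makes banana much more likely.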

## MAP vs MLE

In **Maximum A Posteriori (MAP)** estimation of a random variable $y$, given observed IID samples $(x_1, x_2, x_3, \ldots)$, we try to accommodate our prior knowledge about $y$ when estimating it. In **Maximum Likelihood Estimation (MLE)**, we assume we have no prior knowledge of the quantity being estimated and rely on the data alone.
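The contrast is easy to see for estimating a coin's bias from IID flips. The sketch below uses a Beta prior for the MAP estimate; the prior parameters and flip counts are illustrative assumptions, not from the text:

```python
# Estimate a coin's heads-probability theta from IID flips.
heads, tails = 7, 3

# MLE: theta that maximizes the likelihood -> the sample proportion.
theta_mle = heads / (heads + tails)

# MAP with an assumed Beta(a, b) prior: the posterior is
# Beta(a + heads, b + tails), whose mode is the MAP estimate.
a, b = 2, 2  # prior pseudo-counts pulling the estimate toward 0.5
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)

print(theta_mle, theta_map)  # 0.7 vs. 2/3
```

The MLE reports exactly what the data say (0.7), while the MAP estimate is pulled toward the prior belief that the coin is roughly fair; with more data the two converge.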
