🎲 Mathematical Probability Theory Unit 1 – Probability Theory Fundamentals

Probability theory fundamentals form the backbone of mathematical analysis of random events. These concepts, including sample spaces, probability axioms, and distributions, provide a framework for quantifying uncertainty and making predictions in various fields. Expected values, conditional probability, and independence are crucial tools for understanding complex systems. By mastering these concepts, you'll be equipped to tackle real-world problems involving randomness, from finance to scientific research.

Key Concepts and Definitions

  • Probability measures the likelihood of an event occurring and is expressed as a number between 0 and 1 (inclusive)
  • Sample space ($\Omega$) represents the set of all possible outcomes of an experiment or random process
    • For example, when rolling a fair six-sided die, the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$
  • An event is a subset of the sample space and represents a collection of outcomes
  • Random variables are functions that assign numerical values to the outcomes in a sample space
  • Probability distributions describe the likelihood of different outcomes for a random variable
    • Discrete probability distributions (probability mass functions) are used for random variables with countable outcomes
    • Continuous probability distributions (probability density functions) are used for random variables with uncountable outcomes
  • Expected value (mean) of a random variable is the probability-weighted average of its possible values; it is also the long-run average obtained over a large number of trials
  • Variance and standard deviation measure the dispersion or spread of a probability distribution around its expected value (these definitions are illustrated in the sketch below)
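
A minimal sketch of these definitions in Python, using a fair six-sided die as the running example; the die, its uniform PMF, and the "even roll" event are arbitrary choices for illustration, and exact arithmetic comes from the standard-library fractions module:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}

# Uniform probability measure: each outcome gets probability 1/6 (a PMF)
pmf = {outcome: Fraction(1, 6) for outcome in omega}

# An event is a subset of the sample space, e.g. "the roll is even"
even = {2, 4, 6}
p_even = sum(pmf[o] for o in even)                        # 1/2

# Expected value and variance of the random variable X(omega) = omega
mean = sum(o * pmf[o] for o in omega)                     # 7/2
variance = sum((o - mean) ** 2 * pmf[o] for o in omega)   # 35/12

print(p_even, mean, variance)
```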

Probability Axioms and Properties

  • Axiom 1 (Non-negativity): The probability of any event A is non-negative, i.e., $P(A) \geq 0$
  • Axiom 2 (Normalization): The probability of the entire sample space is 1, i.e., $P(\Omega) = 1$
  • Axiom 3 (Countable Additivity): For any countable sequence of mutually exclusive events $A_1, A_2, \ldots$, the probability of their union is the sum of their individual probabilities, i.e., $P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$
  • Complement Rule: For any event A, $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of A
  • Addition Rule: For any two events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
    • If A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$
  • Multiplication Rule: For any two events A and B, $P(A \cap B) = P(A) \cdot P(B|A) = P(B) \cdot P(A|B)$, where $P(B|A)$ is the conditional probability of B given A
    • If A and B are independent, then $P(A \cap B) = P(A) \cdot P(B)$ (these rules are checked numerically in the sketch after this list)
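
One way to sanity-check these rules is to enumerate a small finite sample space. The sketch below does this for two independent rolls of a fair die; the events A, B, and C are arbitrary choices made for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of outcomes from two fair six-sided dice
omega = set(product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability measure: |event| / |omega|."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 6}           # first roll is a six
B = {w for w in omega if w[0] + w[1] >= 10}   # the two rolls sum to at least ten
C = {w for w in omega if w[1] % 2 == 0}       # second roll is even

assert P(omega) == 1                          # Axiom 2 (normalization)
assert P(omega - A) == 1 - P(A)               # complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)     # addition rule
assert P(A & C) == P(A) * P(C)                # A and C are independent
assert P(A & B) != P(A) * P(B)                # A and B are not: knowing A makes B likelier
print(P(A & B) / P(A))                        # P(B|A) = 1/2, versus P(B) = 1/6
```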

Sample Spaces and Events

  • A sample space can be discrete (finite or countably infinite) or continuous (uncountably infinite)
    • Examples of discrete sample spaces: coin flips, dice rolls, card draws
    • Examples of continuous sample spaces: time between arrivals, weight of a randomly selected object
  • Events can be simple (single outcome) or compound (multiple outcomes)
  • The empty set ($\emptyset$) and the sample space ($\Omega$) are always events
  • The complement of an event A ($A^c$) contains all outcomes in the sample space that are not in A
  • Two events A and B are mutually exclusive if their intersection is empty, i.e., $A \cap B = \emptyset$
  • The union of two events A and B ($A \cup B$) contains all outcomes that are in either A or B (or both)
  • The intersection of two events A and B ($A \cap B$) contains all outcomes that are in both A and B (these set operations are worked through for a card draw in the sketch below)
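
As a concrete illustration, the sketch below builds the 52-outcome sample space for drawing a single card and checks which events are mutually exclusive; the (rank, suit) encoding is an arbitrary choice:

```python
from itertools import product

# Sample space for drawing one card from a standard 52-card deck
ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
omega = set(product(ranks, suits))

hearts = {c for c in omega if c[1] == "hearts"}   # compound event (13 outcomes)
spades = {c for c in omega if c[1] == "spades"}
aces = {c for c in omega if c[0] == "A"}

print(hearts & spades == set())   # True: mutually exclusive (empty intersection)
print(hearts & aces)              # {('A', 'hearts')}: not mutually exclusive
print(len(omega - hearts))        # 39 outcomes in the complement of "hearts"
```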

Probability Distributions

  • Probability mass function (PMF) for a discrete random variable X is denoted by $p_X(x)$ and gives the probability that X takes on a specific value x
    • Properties of a PMF: non-negative, sum over all possible values equals 1
  • Cumulative distribution function (CDF) for a random variable X is denoted by $F_X(x)$ and gives the probability that X is less than or equal to a specific value x
    • Properties of a CDF: non-decreasing, right-continuous, $\lim_{x \to -\infty} F_X(x) = 0$, $\lim_{x \to \infty} F_X(x) = 1$
  • Probability density function (PDF) for a continuous random variable X is denoted by $f_X(x)$ and is used to calculate probabilities for intervals of values
    • Properties of a PDF: non-negative, area under the curve equals 1
  • Common discrete probability distributions: Bernoulli, Binomial, Poisson, Geometric (see the binomial sketch after this list)
  • Common continuous probability distributions: Uniform, Normal (Gaussian), Exponential, Gamma
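
A minimal sketch of a PMF and the CDF built from it, assuming X ~ Binomial(n, p) and plain standard-library Python; the parameter values are arbitrary, and statistics libraries provide these quantities ready-made:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), accumulated from the PMF."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

n, p = 10, 0.3
# The PMF sums to 1 over all possible values, as required
assert abs(sum(binomial_pmf(k, n, p) for k in range(n + 1)) - 1.0) < 1e-12

print(binomial_pmf(3, n, p))   # ≈ 0.2668
print(binomial_cdf(3, n, p))   # ≈ 0.6496
```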

Conditional Probability and Independence

  • Conditional probability of an event B given an event A is denoted by $P(B|A)$ and is defined as $P(B|A) = \frac{P(A \cap B)}{P(A)}$, where $P(A) > 0$
  • Bayes' Theorem: For events A and B with $P(B) > 0$, $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$
    • Useful for updating probabilities based on new information or evidence (see the diagnostic-test sketch after this list)
  • Two events A and B are independent if $P(A \cap B) = P(A) \cdot P(B)$, or equivalently, $P(A|B) = P(A)$ and $P(B|A) = P(B)$
  • Conditional independence: Events A and B are conditionally independent given an event C if $P(A \cap B|C) = P(A|C) \cdot P(B|C)$
  • Chain Rule: For events $A_1, A_2, \ldots, A_n$, $P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1 \cap A_2) \cdot \ldots \cdot P(A_n|A_1 \cap A_2 \cap \ldots \cap A_{n-1})$
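
A short numerical sketch of Bayes' Theorem in the classic diagnostic-test setting; all three input probabilities are hypothetical values chosen only to make the arithmetic concrete:

```python
# Hypothetical inputs chosen only for illustration
prevalence = 0.01       # P(D): prior probability of having the condition
sensitivity = 0.95      # P(+ | D): test is positive given the condition
false_positive = 0.05   # P(+ | not D): test is positive without the condition

# Law of total probability: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' Theorem: P(D | +) = P(+|D) P(D) / P(+)
p_d_given_positive = sensitivity * prevalence / p_positive
print(round(p_d_given_positive, 3))   # ≈ 0.161, far below the 0.95 sensitivity
```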

Random Variables and Expected Values

  • A random variable is a function that assigns a numerical value to each outcome in a sample space
  • The expected value (mean) of a discrete random variable X is denoted by $E[X]$ and is calculated as $E[X] = \sum_{x} x \cdot p_X(x)$
  • The expected value of a continuous random variable X is denoted by $E[X]$ and is calculated as $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx$
  • Linearity of expectation: For random variables X and Y and constants a and b, $E[aX + bY] = aE[X] + bE[Y]$
  • The variance of a random variable X is denoted by $Var(X)$ and is defined as $Var(X) = E[(X - E[X])^2]$
    • Can also be calculated using the formula $Var(X) = E[X^2] - (E[X])^2$
  • The standard deviation of a random variable X is denoted by $\sigma_X$ and is the square root of the variance, i.e., $\sigma_X = \sqrt{Var(X)}$
  • Chebyshev's Inequality: For a random variable X with mean $\mu$ and standard deviation $\sigma$, and any $k > 0$, $P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$ (compared with an empirical frequency in the simulation sketch below)
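
The sketch below estimates the mean of the sum of two fair dice by simulation and compares the empirical tail frequency with Chebyshev's bound; the exact values E[X] = 7 and Var(X) = 35/6 follow from linearity and independence, and the sample size and seed are arbitrary:

```python
import random
from statistics import mean

random.seed(0)  # arbitrary seed, for reproducibility only

# X = sum of two fair dice; exactly, E[X] = 7 and Var(X) = 35/6
def roll_two() -> int:
    return random.randint(1, 6) + random.randint(1, 6)

samples = [roll_two() for _ in range(100_000)]
mu = 7.0
sigma = (35 / 6) ** 0.5

# The sample mean should be close to E[X] = 7
print(mean(samples))

# Chebyshev with k = 2: P(|X - mu| >= 2*sigma) <= 1/4; the true tail is much smaller
k = 2
empirical = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
print(empirical, "<=", 1 / k**2)
```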

Applications and Problem-Solving Techniques

  • Identify the sample space and relevant events for the given problem
  • Determine whether to use discrete or continuous probability distributions based on the nature of the random variable
  • Apply the appropriate probability rules, axioms, and properties to calculate the desired probabilities
  • Utilize conditional probability and Bayes' Theorem when dealing with problems involving updated information or dependent events
  • Calculate expected values, variances, and standard deviations to characterize the behavior of random variables
  • Use linearity of expectation to simplify calculations involving multiple random variables
  • Apply Chebyshev's Inequality to bound the probability of a random variable deviating from its mean by a certain amount
  • Solve problems involving common probability distributions by identifying their parameters and using their properties (a short worked example follows)
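
As a worked example combining several of these steps (identify the distribution, apply the complement rule, use linearity of expectation), consider counting sixes in four rolls of a fair die; the numbers are purely illustrative:

```python
# Roll a fair die n = 4 times and let X count the sixes, so X ~ Binomial(4, 1/6)
n, p = 4, 1 / 6

# Complement rule: P(at least one six) = 1 - P(no sixes)
p_at_least_one = 1 - (1 - p) ** n      # ≈ 0.5177

# Linearity of expectation: X = X_1 + ... + X_4 with E[X_i] = p, so E[X] = n*p
expected_sixes = n * p                 # ≈ 0.667

# For a binomial random variable, Var(X) = n*p*(1 - p)
variance = n * p * (1 - p)             # ≈ 0.556

print(p_at_least_one, expected_sixes, variance)
```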

Advanced Topics and Extensions

  • Moment-generating functions (MGFs) are used to uniquely characterize probability distributions and calculate moments
    • The MGF of a random variable X is defined as $M_X(t) = E[e^{tX}]$
  • Joint probability distributions describe the probabilities of multiple random variables simultaneously
    • Joint PMF for discrete random variables X and Y: $p_{X,Y}(x,y)$
    • Joint PDF for continuous random variables X and Y: $f_{X,Y}(x,y)$
  • Marginal probability distributions are obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variable(s)
  • Conditional probability distributions describe the probabilities of one random variable given the value of another
  • Covariance measures the linear relationship between two random variables X and Y and is defined as $Cov(X,Y) = E[(X - E[X])(Y - E[Y])]$
  • Correlation coefficient measures the strength and direction of the linear relationship between two random variables X and Y and is defined as $\rho_{X,Y} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$ (computed for a dependent pair of dice variables in the sketch after this list)
  • Stochastic processes are collections of random variables indexed by time or space, such as Markov chains and Poisson processes
  • Limit theorems, such as the Law of Large Numbers and the Central Limit Theorem, describe the behavior of random variables and their sums as the number of variables increases
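
A small sketch of covariance and correlation computed directly from a joint PMF, using X = first die roll and Y = total of two rolls as a deliberately dependent pair; the shortcut Cov(X,Y) = E[XY] − E[X]E[Y] used below is equivalent to the definition above:

```python
from itertools import product

# Two fair dice: X = first roll, Y = total of both rolls, so X and Y are dependent
rolls = list(product(range(1, 7), repeat=2))
pairs = [(a, a + b) for a, b in rolls]   # (X, Y) values, each with probability 1/36
p = 1 / 36

ex = sum(x * p for x, _ in pairs)
ey = sum(y * p for _, y in pairs)
exy = sum(x * y * p for x, y in pairs)

cov = exy - ex * ey                                   # Cov(X, Y) = E[XY] - E[X]E[Y]
var_x = sum(x**2 * p for x, _ in pairs) - ex**2
var_y = sum(y**2 * p for _, y in pairs) - ey**2
rho = cov / (var_x**0.5 * var_y**0.5)

print(cov, rho)   # Cov(X, Y) = 35/12 ≈ 2.92, rho = 1/sqrt(2) ≈ 0.707
```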


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
