Probability theory forms the foundation of statistical analysis, providing tools to quantify uncertainty and make predictions. This unit covers key concepts like sample spaces, events, and probability axioms, as well as techniques for calculating probabilities and working with random variables.
Students learn about various probability distributions, including discrete and continuous types, and their applications. The unit also explores important concepts like expectation, variance, and covariance, which are essential for understanding and analyzing random phenomena in real-world situations.
Key Concepts and Definitions
Probability measures the likelihood of an event occurring and ranges from 0 (impossible) to 1 (certain)
Sample space (Ω): the set of all possible outcomes of a random experiment
Event (E): a subset of the sample space that represents a specific outcome or set of outcomes
Mutually exclusive events cannot occur simultaneously in a single trial (rolling a 1 and a 2 on a fair die)
Collectively exhaustive events cover all possible outcomes in the sample space
Example: rolling a number less than 4 and rolling a number greater than or equal to 4 on a 6-sided die
Complement of an event (E^c) consists of all outcomes in the sample space that are not in the event E
Union of events (E∪F) includes all outcomes that are in either event E or event F, or both
Intersection of events (E∩F) includes only the outcomes that are common to both events E and F
Sample Spaces and Events
Discrete sample space has a finite or countably infinite number of possible outcomes (rolling a die, flipping a coin)
Continuous sample space has an uncountably infinite number of possible outcomes (measuring the height of a person)
Simple event consists of a single outcome from the sample space (drawing a specific card from a deck)
Compound event combines two or more simple events (drawing a red card and then a black card from a deck)
Venn diagrams visually represent relationships between events using overlapping circles or other shapes
Example: two overlapping circles representing events A and B, with the overlapping region representing A∩B
Tree diagrams illustrate all possible outcomes of a sequence of events using branches and nodes
Permutations count the number of ways to arrange a set of objects in a specific order
Combinations count the number of ways to select a subset of objects from a larger set, disregarding the order
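Python's standard library exposes both counting operations directly; a minimal sketch:

```python
# Counting arrangements (order matters) and selections (order ignored)
# with the standard-library math module (Python 3.8+).
from math import perm, comb

# Permutations: ordered arrangements of 3 items chosen from 5
# -> 5! / (5 - 3)! = 60
print(perm(5, 3))   # 60

# Combinations: unordered selections of 3 items from 5
# -> 5! / (3! * 2!) = 10
print(comb(5, 3))   # 10
```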
Probability Axioms and Rules
Non-negativity axiom: P(E)≥0 for any event E in the sample space
Normalization axiom: P(Ω)=1, where Ω is the entire sample space
Additivity axiom: for mutually exclusive events E1, E2, …, P(E1 ∪ E2 ∪ ⋯) = P(E1) + P(E2) + ⋯
Complement rule: P(E^c) = 1 − P(E), where E^c is the complement of event E
Addition rule: P(E∪F)=P(E)+P(F)−P(E∩F) for any two events E and F
Simplifies to P(E∪F)=P(E)+P(F) when E and F are mutually exclusive
Multiplication rule: P(E∩F)=P(E)⋅P(F∣E), where P(F∣E) is the conditional probability of F given E
Simplifies to P(E∩F)=P(E)⋅P(F) when E and F are independent events
Inclusion-exclusion principle calculates the probability of the union of multiple events by considering their intersections
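These rules can be checked concretely on a small sample space; a minimal sketch using exact fractions for one roll of a fair die:

```python
from fractions import Fraction

# One roll of a fair six-sided die, with equally likely outcomes.
omega = set(range(1, 7))       # sample space
E = {2, 4, 6}                  # event: roll an even number
F = {4, 5, 6}                  # event: roll a number greater than 3

def P(A):
    """Probability of event A under equally likely outcomes."""
    return Fraction(len(A), len(omega))

# Addition rule: P(E ∪ F) = P(E) + P(F) − P(E ∩ F)
lhs = P(E | F)                        # direct probability of the union
rhs = P(E) + P(F) - P(E & F)          # via the addition rule
print(lhs, rhs)   # 2/3 2/3
```

Using `Fraction` avoids floating-point rounding, so the two sides match exactly.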
Conditional Probability and Independence
Conditional probability P(F∣E) measures the probability of event F occurring given that event E has already occurred
Formula: P(F∣E) = P(E∩F) / P(E), where P(E) > 0
Independence: two events E and F are independent if the occurrence of one does not affect the probability of the other
Mathematically, P(E∩F)=P(E)⋅P(F) or equivalently, P(F∣E)=P(F) and P(E∣F)=P(E)
Bayes' theorem relates conditional probabilities P(E∣F) and P(F∣E)
Formula: P(E∣F) = P(F∣E)⋅P(E) / P(F), where P(F) > 0
Law of total probability expresses the probability of an event as a sum of conditional probabilities
Formula: P(F) = Σ P(F∣Ei)⋅P(Ei), summed over i = 1, …, n, where E1, E2, …, En form a partition of the sample space
Chain rule (multiplication rule for conditional probabilities) calculates the probability of the intersection of multiple events: P(E1 ∩ E2 ∩ ⋯ ∩ En) = P(E1)⋅P(E2∣E1)⋅P(E3∣E1∩E2)⋯P(En∣E1∩⋯∩En−1)
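Bayes' theorem and the law of total probability combine naturally in diagnostic-test problems; a minimal sketch with hypothetical numbers (1% prevalence, 90% sensitivity, 5% false-positive rate — illustrative values, not from the text):

```python
# Hypothetical scenario: a condition D with 1% prevalence, a test with
# sensitivity P(+|D) = 0.90 and false-positive rate P(+|not D) = 0.05.
p_d = 0.01
p_pos_given_d = 0.90
p_pos_given_not_d = 0.05

# Law of total probability over the partition {D, not D}:
# P(+) = P(+|D)·P(D) + P(+|not D)·P(not D)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(D|+) = P(+|D)·P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 4))   # 0.1538
```

Even with a fairly accurate test, the posterior probability stays low because the condition is rare — the base rate dominates.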
Random Variables and Distribution Functions
Random variable (X): a function that assigns a real number to each outcome in a sample space
Discrete random variable has a countable number of possible values (number of heads in 10 coin flips)
Continuous random variable has an uncountable number of possible values within a range (time taken to complete a task)
Probability mass function (PMF) for a discrete random variable X, denoted by p_X(x), gives the probability of X taking on a specific value x
Properties: p_X(x) ≥ 0 for all x, and Σ_x p_X(x) = 1
Cumulative distribution function (CDF) for a random variable X, denoted by F_X(x), gives the probability of X being less than or equal to a specific value x
Formula: F_X(x) = P(X ≤ x)
Properties: 0 ≤ F_X(x) ≤ 1, F_X(x) → 0 as x → −∞, F_X(x) → 1 as x → ∞, and F_X(x) is non-decreasing
Probability density function (PDF) for a continuous random variable X, denoted by f_X(x), is used to calculate probabilities for ranges of values
Properties: f_X(x) ≥ 0 for all x, and the total integral ∫ f_X(x) dx over (−∞, ∞) equals 1
Relationship with CDF: F_X(x) = ∫ f_X(t) dt over (−∞, x]
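The PMF and CDF definitions can be made concrete for a fair die; a minimal sketch:

```python
from fractions import Fraction
from itertools import accumulate

# PMF of a fair six-sided die: p_X(x) = 1/6 for x in 1..6
support = list(range(1, 7))
pmf = {x: Fraction(1, 6) for x in support}
assert sum(pmf.values()) == 1          # PMF property: probabilities sum to 1

# CDF: F_X(x) = P(X <= x), built as a running sum of the PMF;
# it is non-decreasing and reaches 1 at the top of the support.
cdf = dict(zip(support, accumulate(pmf.values())))
print(cdf[3])   # 1/2  -> P(X <= 3)
```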
Expectation and Variance
Expectation (mean) of a discrete random variable X, denoted by E[X] or μ_X, is the weighted average of all possible values
Formula: E[X] = Σ_x x⋅p_X(x)
Expectation of a continuous random variable X is calculated using the PDF
Formula: E[X] = ∫ x⋅f_X(x) dx over (−∞, ∞)
Linearity of expectation for random variables X and Y and constants a and b: E[aX + bY] = aE[X] + bE[Y]
Variance of a random variable X, denoted by Var(X) or σ_X², measures the average squared deviation from the mean
Formula for discrete X: Var(X) = E[(X − μ_X)²] = Σ_x (x − μ_X)²⋅p_X(x)
Formula for continuous X: Var(X) = ∫ (x − μ_X)²⋅f_X(x) dx over (−∞, ∞)
Standard deviation σ_X is the square root of the variance
Properties of variance: Var(aX + b) = a²Var(X) for constants a and b, and Var(X + Y) = Var(X) + Var(Y) for independent random variables X and Y
Covariance Cov(X, Y) measures the linear relationship between two random variables X and Y
Formula: Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
Correlation coefficient ρ_X,Y standardizes covariance to lie between −1 and 1
Formula: ρ_X,Y = Cov(X, Y) / (σ_X σ_Y)
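The discrete expectation and variance formulas can be verified exactly for a fair die; a minimal sketch:

```python
from fractions import Fraction

# Fair six-sided die: p_X(x) = 1/6 for each x in 1..6
support = range(1, 7)
p = Fraction(1, 6)

# E[X] = Σ x·p_X(x)
mean = sum(x * p for x in support)

# Var(X) = E[(X − μ_X)²] = Σ (x − μ_X)²·p_X(x)
var = sum((x - mean) ** 2 * p for x in support)

print(mean, var)   # 7/2 35/12
```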
Common Probability Distributions
Bernoulli distribution models a single trial with two possible outcomes (success with probability p, failure with probability 1-p)
PMF: p_X(x) = p^x (1 − p)^(1 − x) for x ∈ {0, 1}
Mean: E[X] = p, Variance: Var(X) = p(1 − p)
Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials
PMF: p_X(x) = C(n, x) p^x (1 − p)^(n − x) for x ∈ {0, 1, …, n}, where C(n, x) is the binomial coefficient
Mean: E[X] = np, Variance: Var(X) = np(1 − p)
Poisson distribution models the number of rare events occurring in a fixed interval of time or space
PMF: p_X(x) = e^(−λ) λ^x / x! for x ∈ {0, 1, 2, …}
Mean: E[X] = λ, Variance: Var(X) = λ
Uniform distribution models a random variable with constant probability density over a specified range
PDF (continuous): f_X(x) = 1 / (b − a) for x ∈ [a, b]
Mean: E[X] = (a + b) / 2, Variance: Var(X) = (b − a)² / 12
Normal (Gaussian) distribution models many natural phenomena and has a bell-shaped PDF
PDF: f_X(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)) for x ∈ (−∞, ∞)
Mean: E[X] = μ, Variance: Var(X) = σ²
Exponential distribution models the time between rare events in a Poisson process
PDF: f_X(x) = λe^(−λx) for x ≥ 0
Mean: E[X] = 1/λ, Variance: Var(X) = 1/λ²
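The PMF and PDF formulas translate directly into code; a minimal sketch checking the Poisson mean against λ and evaluating the standard normal density at its peak:

```python
from math import exp, factorial, pi, sqrt

# Poisson PMF: P(X = x) = e^(−λ) λ^x / x!
def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

# The mean of Poisson(λ) equals λ; summing the PMF far into the tail
# (60 terms is plenty for λ = 4) recovers it numerically.
lam = 4.0
mean = sum(x * poisson_pmf(x, lam) for x in range(60))
print(round(mean, 6))   # 4.0

# Normal PDF: f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))
def normal_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Peak of the standard normal density, at x = μ = 0: 1/√(2π)
print(round(normal_pdf(0.0, 0.0, 1.0), 4))   # 0.3989
```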
Applications and Problem-Solving Techniques
Identify the sample space and events relevant to the problem
Determine the type of probability distribution that best models the situation (discrete or continuous, specific distribution)
Use the given information to find the parameters of the distribution (success probability, mean, variance)
Apply the appropriate probability rules and formulas to calculate the desired probabilities or values
Example: using the binomial PMF to find the probability of a specific number of successes in a fixed number of trials
Utilize conditional probability and Bayes' theorem when dealing with dependent events or updating probabilities based on new information
Recognize when to use the law of total probability to break down a complex problem into simpler subproblems
Apply the properties of expectation and variance to solve problems involving random variables
Example: using linearity of expectation to find the mean of a sum of random variables
Interpret the results in the context of the original problem and communicate the findings clearly
Verify the reasonableness of the solution by checking if the probabilities are within the valid range [0, 1] and if the results make sense intuitively
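As a worked instance of this workflow, the binomial PMF gives the probability of exactly 7 heads in 10 fair coin flips, with a final range check on the result:

```python
from math import comb

# Model: X ~ Binomial(n = 10, p = 0.5); find P(X = 7).
n, p, k = 10, 0.5, 7

# Binomial PMF: P(X = k) = C(n, k) p^k (1 − p)^(n − k)
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(prob)   # 0.1171875  (= 120/1024)

# Verify reasonableness: a valid probability lies in [0, 1]
assert 0 <= prob <= 1
```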