📈 Theoretical Statistics Unit 3 – Expectation and moments

Expectation and moments are fundamental concepts in probability theory and statistics. They provide powerful tools for analyzing random variables and their distributions, allowing us to quantify average values, spread, and other important characteristics. Building from basic definitions to advanced applications, this unit covers probability foundations, random variables, moment generating functions, and their roles in statistical inference, giving you a solid understanding of these essential concepts.

Key Concepts and Definitions

  • Expectation represents the average value of a random variable over its entire range of possible outcomes
  • Moments measure different aspects of a probability distribution, such as central tendency, dispersion, and shape
  • First moment is the mean or expected value, denoted as $\mathbb{E}[X]$ for a random variable $X$
  • Second moment is the expected value of the squared random variable, $\mathbb{E}[X^2]$, related to the variance
    • Variance measures the spread of a distribution around its mean, defined as $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$
  • Higher moments (third, fourth, etc.) capture additional characteristics of a distribution, such as skewness and kurtosis
  • Moment generating functions (MGFs) are a tool for generating moments of a random variable through differentiation
  • MGFs uniquely characterize a probability distribution and can be used to derive its properties
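As a concrete illustration of the first two moments, here is a minimal sketch in Python that computes $\mathbb{E}[X]$, $\mathbb{E}[X^2]$, and $\text{Var}(X)$ directly from a PMF; the support and probabilities are made up for the example.

```python
# Minimal sketch: first two moments of a small discrete distribution,
# computed directly from its PMF (values and probabilities are illustrative).
import numpy as np

values = np.array([0, 1, 2, 3])          # support of X (hypothetical)
probs = np.array([0.1, 0.3, 0.4, 0.2])   # PMF p_X(x), sums to 1

mean = np.sum(values * probs)            # E[X], the first raw moment
second_raw = np.sum(values**2 * probs)   # E[X^2], the second raw moment
variance = second_raw - mean**2          # Var(X) = E[X^2] - (E[X])^2

print(f"E[X] = {mean:.3f}, E[X^2] = {second_raw:.3f}, Var(X) = {variance:.3f}")
```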

Probability Foundations

  • Probability is a measure of the likelihood of an event occurring, ranging from 0 (impossible) to 1 (certain)
  • Sample space $\Omega$ is the set of all possible outcomes of a random experiment
  • Events are subsets of the sample space, and the probability of an event $A$ is denoted as $P(A)$
  • Probability axioms: non-negativity ($P(A) \geq 0$), normalization ($P(\Omega) = 1$), and countable additivity ($P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$ for disjoint events $A_i$)
  • Conditional probability $P(A|B)$ is the probability of event $A$ given that event $B$ has occurred, defined as $P(A|B) = \frac{P(A \cap B)}{P(B)}$ when $P(B) > 0$
  • Independence of events: Two events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$, meaning the occurrence of one does not affect the probability of the other
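To make the conditional-probability and independence definitions concrete, here is a minimal simulation sketch with two fair dice; the events $A$ and $B$ are chosen purely for illustration (and are deliberately dependent).

```python
# Minimal sketch: estimating a conditional probability and checking
# independence by simulating two fair dice (event choices are illustrative).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
d1 = rng.integers(1, 7, size=n)   # first die
d2 = rng.integers(1, 7, size=n)   # second die

A = (d1 % 2 == 0)      # event A: first die is even
B = (d1 + d2 >= 10)    # event B: the sum is at least 10

p_A = A.mean()
p_B = B.mean()
p_AB = (A & B).mean()

print(f"P(A|B) = P(A and B)/P(B) ~ {p_AB / p_B:.3f}")                 # conditional probability
print(f"P(A)P(B) = {p_A * p_B:.4f} vs P(A and B) = {p_AB:.4f}")       # unequal: A and B are dependent
```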

Random Variables and Distributions

  • A random variable is a function that assigns a numerical value to each outcome in a sample space
  • Discrete random variables take values in a countable set (such as the integers), while continuous random variables take values in an uncountable set (such as an interval of real numbers)
  • Probability mass function (PMF) for a discrete random variable $X$ is denoted as $p_X(x) = P(X = x)$, giving the probability of $X$ taking a specific value $x$
  • Probability density function (PDF) for a continuous random variable $X$ is denoted as $f_X(x)$, satisfying $P(a \leq X \leq b) = \int_a^b f_X(x) dx$
  • Cumulative distribution function (CDF) $F_X(x) = P(X \leq x)$ gives the probability of a random variable being less than or equal to a given value $x$
    • For discrete random variables, $F_X(x) = \sum_{y \leq x} p_X(y)$
    • For continuous random variables, $F_X(x) = \int_{-\infty}^x f_X(y) dy$
  • Common discrete distributions include Bernoulli, Binomial, Poisson, and Geometric
  • Common continuous distributions include Uniform, Normal (Gaussian), Exponential, and Beta
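The sketch below evaluates a PMF, a PDF, and the corresponding CDFs with scipy.stats for two of the distributions listed above; the parameter values (Binomial with $n=10$, $p=0.3$; standard Normal) are arbitrary choices for illustration.

```python
# Minimal sketch: evaluating PMFs, PDFs, and CDFs with scipy.stats
# (parameter values are arbitrary).
from scipy import stats

# Discrete example: X ~ Binomial(n=10, p=0.3)
binom = stats.binom(n=10, p=0.3)
print("P(X = 3)  =", binom.pmf(3))    # PMF at a single point
print("P(X <= 3) =", binom.cdf(3))    # CDF: sum of the PMF up to 3

# Continuous example: Y ~ Normal(mu=0, sigma=1)
norm = stats.norm(loc=0, scale=1)
print("f_Y(1.96) =", norm.pdf(1.96))                                # density, not a probability
print("P(-1.96 <= Y <= 1.96) =", norm.cdf(1.96) - norm.cdf(-1.96))  # roughly 0.95
```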

Expectation: Basics and Properties

  • Expectation is a linear operator, meaning $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$ for constants $a$ and $b$ and random variables $X$ and $Y$
  • For a discrete random variable $X$ with PMF $p_X(x)$, the expectation is calculated as $\mathbb{E}[X] = \sum_x x \cdot p_X(x)$
  • For a continuous random variable $X$ with PDF $f_X(x)$, the expectation is calculated as $\mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f_X(x) dx$
  • Law of the unconscious statistician (LOTUS): For a function $g(X)$ of a random variable $X$, $\mathbb{E}[g(X)] = \sum_x g(x) \cdot p_X(x)$ (discrete case) or $\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f_X(x) dx$ (continuous case)
  • Expectation of a constant: $\mathbb{E}[c] = c$ for any constant $c$
  • Expectation of a sum: $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$ for random variables $X$ and $Y$
  • Expectation of a product: $\mathbb{E}[XY] = \mathbb{E}[X] \cdot \mathbb{E}[Y]$ for independent random variables $X$ and $Y$
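Here is a minimal numerical check of LOTUS and linearity, assuming an Exponential random variable with rate 1 and the arbitrary choice $g(x) = x^2$.

```python
# Minimal sketch: checking LOTUS and linearity for an Exponential(1) variable
# (the rate and the function g are arbitrary illustrative choices).
import numpy as np
from scipy import stats, integrate

X = stats.expon(scale=1.0)   # Exponential with rate 1, so E[X] = 1

# LOTUS: E[g(X)] = integral of g(x) f_X(x) dx, with g(x) = x^2
g = lambda x: x**2
lotus, _ = integrate.quad(lambda x: g(x) * X.pdf(x), 0, np.inf)
print("E[X^2] via LOTUS:", lotus)    # exact value is 2 for rate 1

# Linearity: E[aX + b] = a*E[X] + b, checked by Monte Carlo
sample = X.rvs(size=100_000, random_state=1)
a, b = 3.0, 5.0
print("E[3X + 5] by simulation:", (a * sample + b).mean(), "vs exact:", a * X.mean() + b)
```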

Moments and Their Significance

  • Raw moments: The $k$-th raw moment of a random variable $X$ is defined as $\mathbb{E}[X^k]$
    • First raw moment is the mean, $\mathbb{E}[X]$
    • Second raw moment is $\mathbb{E}[X^2]$, used to calculate variance
  • Central moments: The $k$-th central moment of a random variable $X$ is defined as $\mathbb{E}[(X - \mathbb{E}[X])^k]$
    • First central moment is always 0
    • Second central moment is the variance, $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$
  • Standardized moments: The $k$-th standardized moment of a random variable $X$ is defined as $\mathbb{E}[(\frac{X - \mathbb{E}[X]}{\sqrt{\text{Var}(X)}})^k]$
    • Third standardized moment measures skewness, the asymmetry of a distribution
    • Fourth standardized moment measures kurtosis, the heaviness of the tails of a distribution
  • Moments can be used to characterize and compare different probability distributions
  • Higher moments provide additional information about the shape and properties of a distribution
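The following sketch computes raw, central, and standardized moments from a simulated sample; the Exponential(1) distribution is used only because its skewness (2) and kurtosis (9) are known, which makes the output easy to check.

```python
# Minimal sketch: raw, central, and standardized moments of a skewed sample
# (an Exponential(1) sample is used purely as an illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=100_000)

mean = x.mean()                         # first raw moment
second_raw = np.mean(x**2)              # second raw moment E[X^2]
variance = np.mean((x - mean)**2)       # second central moment
skewness = stats.skew(x)                # third standardized moment (exactly 2 for Exponential)
kurt = stats.kurtosis(x, fisher=False)  # fourth standardized moment (exactly 9 for Exponential)

print(f"mean={mean:.3f}  E[X^2]={second_raw:.3f}  var={variance:.3f}")
print(f"skewness={skewness:.3f}  kurtosis={kurt:.3f}")
```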

Moment Generating Functions

  • The moment generating function (MGF) of a random variable $X$ is defined as $M_X(t) = \mathbb{E}[e^{tX}]$ for the real values of $t$ at which this expectation is finite
  • When the MGF exists in an open interval around $t = 0$, it uniquely determines the distribution: two random variables with the same MGF on such an interval have the same distribution
  • The $k$-th moment of $X$ can be found by differentiating the MGF $k$ times and evaluating at $t=0$: $\mathbb{E}[X^k] = M_X^{(k)}(0)$
  • MGFs can be used to derive the mean, variance, and other properties of a distribution
  • For independent random variables $X$ and $Y$, the MGF of their sum is the product of their individual MGFs: $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$
  • MGFs can be used to prove various results in probability theory, such as the Central Limit Theorem
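As a small symbolic example, the sketch below differentiates the MGF of an Exponential($\lambda$) variable, $M_X(t) = \lambda/(\lambda - t)$, to recover its first three moments; the choice of distribution is illustrative.

```python
# Minimal sketch: recovering moments by differentiating an MGF symbolically.
# The MGF of an Exponential(lambda) variable, lambda/(lambda - t), is the example.
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)   # MGF of Exponential(lambda), valid for t < lambda

for k in (1, 2, 3):
    moment = sp.diff(M, t, k).subs(t, 0)        # E[X^k] = M^(k)(0)
    print(f"E[X^{k}] =", sp.simplify(moment))   # 1/lambda, 2/lambda**2, 6/lambda**3
```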

Applications in Statistical Inference

  • Moments and MGFs play a crucial role in parameter estimation and hypothesis testing
  • Method of moments estimators are obtained by equating sample moments to population moments and solving for the parameters
    • For example, the sample mean $\bar{X}$ is an estimator for the population mean $\mu$
  • Maximum likelihood estimation (MLE) is another common approach, which finds the parameter values that maximize the likelihood function
  • MGFs can be used to derive the sampling distributions of estimators and test statistics
  • Moments and MGFs are also used in Bayesian inference to specify prior and posterior distributions for parameters
  • Higher moments, such as skewness and kurtosis, can be used to assess the normality assumption in various statistical tests
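Here is a minimal method-of-moments sketch for a Gamma(shape, scale) model; the data are simulated and the true parameter values are arbitrary.

```python
# Minimal sketch: method-of-moments estimation for a Gamma(shape, scale) model
# (the sample is simulated; true parameters are arbitrary).
import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.5, size=50_000)   # "observed" data

# Population moments: E[X] = shape*scale, Var(X) = shape*scale^2.
# Equate them to the sample mean and sample variance, then solve.
xbar = x.mean()
s2 = x.var(ddof=1)

scale_hat = s2 / xbar
shape_hat = xbar / scale_hat

print(f"method-of-moments estimates: shape={shape_hat:.3f}, scale={scale_hat:.3f}")
```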

Advanced Topics and Extensions

  • Multivariate moments and MGFs extend the concepts to random vectors and joint distributions
  • Conditional expectation $\mathbb{E}[X|Y]$ is the expected value of $X$ given the value of another random variable $Y$
  • Moment inequalities, such as Markov's inequality and Chebyshev's inequality, use moments to bound tail probabilities; Chebyshev's inequality, for example, bounds the probability that a random variable deviates from its mean by more than a given amount (a simulation check appears after this list)
  • Characteristic functions, defined as $\phi_X(t) = \mathbb{E}[e^{itX}]$ for real $t$, are another tool for uniquely characterizing distributions
  • Cumulants are an alternative to moments, with the $k$-th cumulant defined as the $k$-th derivative of the logarithm of the MGF evaluated at 0
  • Empirical moments and MGFs can be used to estimate population moments and MGFs from sample data
  • Robust alternatives to moment-based summaries, such as trimmed means and winsorized means, are less sensitive to outliers and heavy-tailed distributions
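As referenced above, here is a minimal simulation comparing Chebyshev's bound with the empirical tail probability; the Exponential(1) distribution and the values of $k$ are arbitrary choices.

```python
# Minimal sketch: comparing Chebyshev's bound P(|X - mu| >= k*sigma) <= 1/k^2
# with the empirical frequency for an Exponential(1) sample
# (choice of distribution and k is arbitrary).
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=200_000)
mu, sigma = 1.0, 1.0   # exact mean and standard deviation of Exponential(1)

for k in (2, 3, 4):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: empirical tail prob = {empirical:.4f}, Chebyshev bound = {1/k**2:.4f}")
```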