discrete random variables

unit 3 review

Discrete random variables are a fundamental concept in probability theory, describing outcomes that can be counted or listed. They form the basis for understanding various probabilistic scenarios, from coin flips to customer arrivals, and are essential in fields like statistics and data science. This unit covers key concepts, types of discrete random variables, probability mass functions, cumulative distribution functions, expected values, and variance. It also explores common discrete distributions like Bernoulli, binomial, geometric, Poisson, and hypergeometric, providing a solid foundation for analyzing real-world probabilistic events.

Key Concepts and Definitions

  • Discrete random variables take on a finite or countably infinite set of distinct values (for example, integers or whole numbers)
  • Sample space $S$ consists of all possible outcomes of a random experiment
    • Each outcome is assigned a probability $P(x)$ where $x$ is an element of the sample space
  • Probability distribution assigns a probability to each possible value of a discrete random variable
    • Sum of all probabilities in a probability distribution equals 1
  • Independence means the occurrence of one event does not affect the probability of another event occurring
    • Example: flipping a fair coin multiple times, each flip is independent of the others
  • Mutually exclusive events cannot occur at the same time (rolling a 1 and a 6 on a single die roll)

Types of Discrete Random Variables

  • Bernoulli random variable takes on only two possible values, typically 0 and 1 (success or failure)
    • Example: a single coin flip where heads is 1 and tails is 0
  • Binomial random variable represents the number of successes in a fixed number of independent Bernoulli trials
    • Trials must have the same probability of success $p$ for each trial
  • Geometric random variable counts the number of trials needed to achieve the first success in a series of independent Bernoulli trials
  • Poisson random variable models the number of events occurring in a fixed interval of time or space
    • Events occur independently at a constant average rate $\lambda$
  • Hypergeometric random variable describes the number of successes in a fixed number of draws from a population without replacement
    • Population size, number of successes in the population, and number of draws are all fixed
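
The relationship between the first three types can be sketched in code: a binomial count and a geometric count are both built from repeated Bernoulli trials. This is a minimal illustration using Python's standard `random` module; the helper names `bernoulli`, `binomial`, and `geometric`, the seed, and the parameter values are assumptions chosen for the example.

```python
import random

def bernoulli(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial(n, p, rng):
    """Count the successes in n independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n))

def geometric(p, rng):
    """Count the trials up to and including the first success."""
    trials = 1
    while bernoulli(p, rng) == 0:
        trials += 1
    return trials

rng = random.Random(0)  # fixed seed so the run is repeatable
binomial_draws = [binomial(10, 0.3, rng) for _ in range(5)]
geometric_draws = [geometric(0.3, rng) for _ in range(5)]
print(binomial_draws, geometric_draws)
```

Every binomial draw lies between 0 and 10, while a geometric draw is at least 1 and has no upper bound.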

Probability Mass Functions

  • Probability Mass Function (PMF) denoted as $P(X=x)$ gives the probability that a discrete random variable $X$ takes on a specific value $x$
    • $P(X=x) \geq 0$ for every possible value $x$ of $X$
    • $\sum_{x} P(X=x) = 1$ where the sum is taken over all possible values of $X$
  • PMF can be represented as a table, graph, or formula
    • Table lists all possible values of $X$ and their corresponding probabilities
    • Graph plots the probability $P(X=x)$ against the value $x$
  • PMF uniquely characterizes the probability distribution of a discrete random variable
  • Example: PMF for a fair six-sided die roll where $P(X=x) = \frac{1}{6}$ for $x = 1, 2, 3, 4, 5, 6$
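
As a small sketch of the table representation, the die PMF can be stored as a dictionary mapping each value to its probability, and the two PMF conditions can be checked directly. The names `die_pmf`, `is_valid_pmf`, and `prob` are illustrative, and exact fractions are used so the sum is not affected by rounding.

```python
from fractions import Fraction

# Table form of the PMF: value -> probability, for a fair six-sided die.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def is_valid_pmf(pmf):
    """Check non-negativity and that the probabilities sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def prob(pmf, x):
    """P(X = x); values not listed in the table have probability 0."""
    return pmf.get(x, Fraction(0))

print(is_valid_pmf(die_pmf))  # True
print(prob(die_pmf, 3))       # 1/6
print(prob(die_pmf, 7))       # 0
```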

Cumulative Distribution Functions

  • Cumulative Distribution Function (CDF) denoted as $F(x) = P(X \leq x)$ gives the probability that a discrete random variable $X$ takes on a value less than or equal to $x$
    • $F(x)$ is a non-decreasing function with $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to \infty$
  • CDF can be obtained from the PMF by summing the probabilities of all values less than or equal to $x$
    • $F(x) = \sum_{t \leq x} P(X=t)$ where the sum is taken over all values $t$ less than or equal to $x$
  • CDF uniquely determines the probability distribution of a discrete random variable
    • $P(a < X \leq b) = F(b) - F(a)$ for any values $a$ and $b$ with $a < b$
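  • Example: CDF for a fair six-sided die roll where $F(x) = \frac{x}{6}$ at $x = 1, 2, 3, 4, 5, 6$; between integers $F$ stays flat, so $F(x) = \frac{\lfloor x \rfloor}{6}$ for $1 \leq x \leq 6$, with $F(x) = 0$ for $x < 1$ and $F(x) = 1$ for $x > 6$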
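
The summation that turns a PMF into a CDF can be sketched as follows, again for the fair die. The function names `cdf` and `prob_between` are assumptions made for this example.

```python
from fractions import Fraction

die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(pmf, x):
    """F(x) = P(X <= x): add up the PMF over every value t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

def prob_between(pmf, a, b):
    """P(a < X <= b) = F(b) - F(a), for a < b."""
    return cdf(pmf, b) - cdf(pmf, a)

print(cdf(die_pmf, 3))              # 1/2
print(cdf(die_pmf, 3.7))            # 1/2, the CDF is flat between integers
print(prob_between(die_pmf, 2, 5))  # 1/2, covers the values 3, 4, 5
```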

Expected Value and Variance

  • Expected value (mean) of a discrete random variable $X$ denoted as $E(X)$ is the weighted average of all possible values
    • $E(X) = \sum_{x} x \cdot P(X=x)$ where the sum is taken over all possible values of $X$
  • Variance of a discrete random variable $X$ denoted as $Var(X)$ measures the spread of the distribution around the mean
    • $Var(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$
  • Standard deviation $\sigma$ is the square root of the variance
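    • Measures the typical distance of values from the mean (the root-mean-square deviation, not the average absolute distance) and is expressed in the same units as $X$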
  • Linearity of expectation states that $E(aX + bY) = aE(X) + bE(Y)$ for constants $a$ and $b$ and random variables $X$ and $Y$
    • Holds even if $X$ and $Y$ are not independent
  • Example: for a fair six-sided die roll, $E(X) = \frac{7}{2}$ and $Var(X) = \frac{35}{12}$
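
The die example can be checked by applying the definitions directly. The sketch below computes the variance both from the definition and from the shortcut formula; the function names are illustrative, and exact fractions keep the results free of rounding.

```python
from fractions import Fraction

die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all possible values."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Var(X) = E[(X - E(X))^2]."""
    mean = expected_value(pmf)
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - expected_value(pmf) ** 2

die_mean = expected_value(die_pmf)
die_var = variance_by_shortcut(die_pmf)
print(die_mean)  # 7/2
print(die_var)   # 35/12
```

Both variance functions return the same value, which is what the identity $E[(X - E(X))^2] = E(X^2) - [E(X)]^2$ predicts.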

Common Discrete Distributions

  • Bernoulli distribution with parameter $p$ where $P(X=1) = p$ and $P(X=0) = 1-p$
    • Mean is $p$ and variance is $p(1-p)$
  • Binomial distribution with parameters $n$ and $p$ where $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \ldots, n$
    • Mean is $np$ and variance is $np(1-p)$
  • Geometric distribution with parameter $p$ where $P(X=k) = (1-p)^{k-1}p$ for $k = 1, 2, \ldots$
    • Mean is $\frac{1}{p}$ and variance is $\frac{1-p}{p^2}$
  • Poisson distribution with parameter $\lambda$ where $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$ for $k = 0, 1, 2, \ldots$
    • Mean and variance are both equal to $\lambda$
  • Hypergeometric distribution with parameters $N$, $K$, and $n$ where $P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$ for $\max(0, n+K-N) \leq k \leq \min(n, K)$
    • Mean is $\frac{nK}{N}$ and variance is $\frac{nK(N-K)(N-n)}{N^2(N-1)}$
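
The PMF formulas above translate almost directly into code. The following sketch uses only Python's standard `math` module and checks each stated mean by summing $k \cdot P(X=k)$ over the support. The parameter values are arbitrary choices for illustration, and the infinite supports of the geometric and Poisson distributions are truncated where the remaining tail is negligible.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    return (1 - p)**(k - 1) * p

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def hypergeometric_pmf(k, N, K, n):
    # Only valid for max(0, n + K - N) <= k <= min(n, K).
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def mean_of(pmf, support):
    """Sum k * P(X = k) over the given support."""
    return sum(k * pmf(k) for k in support)

binomial_mean = mean_of(lambda k: binomial_pmf(k, 10, 0.3), range(0, 11))
geometric_mean = mean_of(lambda k: geometric_pmf(k, 0.3), range(1, 500))
poisson_mean = mean_of(lambda k: poisson_pmf(k, 4.0), range(0, 60))
hypergeometric_mean = mean_of(
    lambda k: hypergeometric_pmf(k, 50, 5, 10), range(0, 6)
)

print(binomial_mean, 10 * 0.3)            # compare with np
print(geometric_mean, 1 / 0.3)            # compare with 1/p
print(poisson_mean, 4.0)                  # compare with lambda
print(hypergeometric_mean, 10 * 5 / 50)   # compare with nK/N
```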

Applications and Examples

  • Quality control inspecting a sample of products for defects (binomial distribution)
    • Each product is either defective or non-defective, and the probability of a defect is constant
  • Modeling the number of customers arriving at a store within a given time period (Poisson distribution)
    • Customers arrive independently at a constant average rate
  • Analyzing the number of trials needed to achieve a success in a series of experiments (geometric distribution)
    • Each trial is independent and has the same probability of success
  • Studying the distribution of rare events such as accidents or machine failures (Poisson distribution)
    • Events occur randomly and independently over time or space
  • Sampling without replacement from a population to estimate proportions (hypergeometric distribution)
    • Population size, sample size, and number of successes in the population are fixed
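
Three of these applications can be made concrete with small calculations. All numbers below (a 5% defect rate, a sample of 20, three arrivals per hour, a lot of 50 items containing 5 defective ones) are hypothetical values picked only to illustrate the formulas.

```python
import math

# Quality control (binomial): probability of at most 1 defect
# in a sample of 20 items when each is defective with probability 0.05.
n, p = 20, 0.05
at_most_one_defect = sum(
    math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2)
)

# Customer arrivals (Poisson): probability of no arrivals in an hour
# when customers arrive at an average rate of 3 per hour.
lam = 3.0
no_arrivals = math.exp(-lam) * lam**0 / math.factorial(0)

# Sampling without replacement (hypergeometric): probability that a
# sample of 10 from a lot of 50 with 5 defective items has no defects.
N, K, draws = 50, 5, 10
no_defects_in_sample = (
    math.comb(K, 0) * math.comb(N - K, draws) / math.comb(N, draws)
)

print(round(at_most_one_defect, 4))
print(round(no_arrivals, 4))
print(round(no_defects_in_sample, 4))
```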

Problem-Solving Techniques

  • Identify the type of discrete random variable and its parameters based on the problem description
    • Determine if the variable follows a specific distribution (binomial, geometric, Poisson, etc.)
  • Write the probability mass function or cumulative distribution function for the given random variable
    • Use the appropriate formula based on the distribution type and parameters
  • Calculate probabilities, expected values, and variances using the PMF, CDF, or distribution-specific formulas
    • Apply the definitions and properties of expectation and variance
  • Use the linearity of expectation to find the expected value of a sum or difference of random variables
    • Simplify complex problems by breaking them down into simpler components
  • Recognize when to apply the Central Limit Theorem for approximating the distribution of a sum or average of many independent, identically distributed random variables with finite variance
    • Use the normal distribution as an approximation when the number of terms is large
  • Solve problems involving conditional probability and independence by using the multiplication rule and Bayes' theorem
    • Update probabilities based on new information or events
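
As one illustration of the normal approximation, the sketch below compares an exact binomial probability with its normal approximation. The values $n = 100$, $p = 0.5$, and $k = 55$ are arbitrary, and the half-unit continuity correction is a common refinement that this guide does not cover, so treat it as an added assumption.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1)
    )

def normal_cdf(x, mean, std):
    """P(Z <= x) for a normal variable, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

n, p, k = 100, 0.5, 55
mean = n * p                       # binomial mean np
std = math.sqrt(n * p * (1 - p))   # binomial standard deviation

exact = binomial_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mean, std)  # continuity correction (assumption)

print(round(exact, 4), round(approx, 4))
```

A binomial variable is a sum of $n$ independent Bernoulli trials, which is why the Central Limit Theorem applies here; the two printed values agree closely for this choice of $n$.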