🎲 Intro to Probability Unit 5 – Discrete Random Variables & Distributions

Discrete random variables are the building blocks of probability theory, describing outcomes that can be counted or listed. They're essential for modeling real-world scenarios like coin flips, dice rolls, and survey responses. Understanding their properties and distributions is crucial for data analysis and decision-making.
This unit covers key concepts like probability mass functions, cumulative distribution functions, expected values, and variance. It also explores common discrete distributions such as binomial, Poisson, and geometric, providing tools to analyze and predict outcomes in various fields like finance, engineering, and social sciences.
Key Concepts
Discrete random variables take on a countable number of distinct values
Probability mass function (PMF) assigns probabilities to each possible value of a discrete random variable
Cumulative distribution function (CDF) gives the probability that a discrete random variable is less than or equal to a specific value
Expected value represents the average value of a discrete random variable over many trials
Variance measures the spread or dispersion of a discrete random variable around its expected value
Calculated as the probability-weighted average of the squared differences between each value and the expected value
Independence and mutual exclusivity are important properties of discrete random variables
Independent events do not affect each other's probabilities
Mutually exclusive events cannot occur simultaneously
Discrete distributions describe the probabilities of different outcomes for a discrete random variable (coin flips, dice rolls)
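As a quick illustration of these concepts, here is a minimal Python sketch assuming a fair six-sided die (an invented example, not from the unit): it builds the PMF, verifies the probabilities sum to 1, and computes the expected value and variance.

```python
# Minimal sketch of the key concepts, assuming a fair six-sided die
pmf = {x: 1/6 for x in range(1, 7)}       # PMF: each face has probability 1/6

assert abs(sum(pmf.values()) - 1) < 1e-12  # PMF probabilities must sum to 1

mean = sum(x * p for x, p in pmf.items())               # E(X)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X): weighted squared deviations

print(mean, var)  # 3.5 and ~2.9167
```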
Types of Discrete Random Variables
Bernoulli random variables have only two possible outcomes (success or failure)
Examples include coin flips (heads or tails) and yes/no survey questions
Binomial random variables count the number of successes in a fixed number of independent Bernoulli trials
Characterized by the number of trials $n$ and the probability of success $p$
Geometric random variables represent the number of trials needed to achieve the first success in a series of independent Bernoulli trials
Poisson random variables model the number of events occurring in a fixed interval of time or space
Characterized by the average rate of occurrence $\lambda$
Hypergeometric random variables describe the number of successes in a fixed number of draws from a population without replacement
Negative binomial random variables represent the number of failures before a specified number of successes in a series of independent Bernoulli trials
Discrete uniform random variables assign equal probabilities to a finite set of values
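To connect these types concretely, the sketch below simulates them from raw Bernoulli trials, assuming an arbitrary success probability of 0.3: summing successes over a fixed number of trials gives a binomial count, and counting trials until the first success gives a geometric count.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.3, 10                        # assumed parameters, chosen for illustration

flips = rng.random(n) < p             # one run of n independent Bernoulli(p) trials
binomial_count = flips.sum()          # binomial: number of successes in n trials

trials = 1                            # geometric: trials until the first success
while rng.random() >= p:
    trials += 1

print(binomial_count, trials)
```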
Probability Mass Functions (PMF)
A probability mass function (PMF) is a function that gives the probability of a discrete random variable taking on a specific value
Denoted as $P(X = x)$, where $X$ is the random variable and $x$ is a possible value
The sum of all probabilities in a PMF must equal 1
$\sum_{x} P(X = x) = 1$
PMFs can be represented as tables, formulas, or graphs
In a PMF graph, the height of each point represents the probability of the corresponding value
The PMF of a sum of independent discrete random variables is the convolution of their individual PMFs
PMFs are used to calculate probabilities, expected values, and other properties of discrete random variables
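The convolution fact above is easy to check numerically. A short sketch, assuming two fair six-sided dice: convolving their PMFs gives the PMF of the sum, which is still a valid PMF.

```python
import numpy as np

die = np.full(6, 1/6)                  # PMF of one fair die over values 1..6

sum_pmf = np.convolve(die, die)        # PMF of the sum of two independent dice
values = np.arange(2, 13)              # the sum ranges over 2..12

assert abs(sum_pmf.sum() - 1) < 1e-12  # convolution preserves total probability
print(dict(zip(values, sum_pmf.round(4))))  # e.g., P(sum = 7) = 6/36 ≈ 0.1667
```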
Cumulative Distribution Functions (CDF)
A cumulative distribution function (CDF) gives the probability that a discrete random variable is less than or equal to a specific value
Denoted as $F(x) = P(X \leq x)$, where $X$ is the random variable and $x$ is a possible value
CDFs are non-decreasing functions, meaning that $F(a) \leq F(b)$ if $a \leq b$
The CDF can be obtained by summing the PMF values for all values less than or equal to $x$
$F(x) = \sum_{t \leq x} P(X = t)$
CDFs can be used to calculate probabilities for intervals of values
$P(a < X \leq b) = F(b) - F(a)$
The CDF of a discrete random variable is a step function, with jumps at each possible value
CDFs are useful for comparing different discrete distributions and determining percentiles
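Again assuming a fair six-sided die, the sketch below builds the CDF by cumulatively summing the PMF and then uses $F(b) - F(a)$ for an interval probability.

```python
import numpy as np

values = np.arange(1, 7)           # faces of a fair die (assumed example)
pmf = np.full(6, 1/6)
cdf = np.cumsum(pmf)               # F(x) = sum of P(X = t) for t <= x

# P(2 < X <= 5) = F(5) - F(2); indices are offset since values start at 1
prob = cdf[5 - 1] - cdf[2 - 1]
print(prob)  # 3/6 = 0.5
```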
Expected Value and Variance
The expected value (or mean) of a discrete random variable is the average of all possible values, weighted by their probabilities
Denoted as $E(X)$ or $\mu$
Calculated by summing the product of each value and its probability: $E(X) = \sum_{x} x \cdot P(X = x)$
The variance of a discrete random variable measures the average squared deviation from the expected value
Denoted as $Var(X)$ or $\sigma^2$
Calculated using the formula: $Var(X) = E(X^2) - [E(X)]^2$
The standard deviation is the square root of the variance and has the same units as the random variable
The expected value and variance have important properties:
Linearity of expectation: $E(aX + b) = aE(X) + b$
Variance of a constant: $Var(a) = 0$
Variance of a sum of independent random variables: $Var(X + Y) = Var(X) + Var(Y)$
These properties are useful for calculating the expected value and variance of transformed or combined discrete random variables
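The sketch below verifies these properties on the fair-die example (again an invented example): the shortcut formula for variance, linearity of expectation, and additivity of variance for two independent dice whose sum PMF comes from convolution.

```python
import numpy as np

x = np.arange(1, 7)                      # faces of a fair die (assumed example)
p = np.full(6, 1/6)

ex = (x * p).sum()                       # E(X)
ex2 = (x**2 * p).sum()                   # E(X^2)
var = ex2 - ex**2                        # Var(X) = E(X^2) - [E(X)]^2

a, b = 2.0, 3.0
assert np.isclose(((a*x + b) * p).sum(), a*ex + b)  # E(aX + b) = aE(X) + b

# Sum of two independent dice: PMF by convolution, then check additivity
s = np.arange(2, 13)
sum_pmf = np.convolve(p, p)
es = (s * sum_pmf).sum()
vs = ((s - es)**2 * sum_pmf).sum()
assert np.isclose(vs, 2 * var)           # Var(X + Y) = Var(X) + Var(Y)
```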
Common Discrete Distributions
Binomial distribution: models the number of successes in a fixed number of independent Bernoulli trials
PMF: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
Expected value: $E(X) = np$
Variance: $Var(X) = np(1-p)$
Poisson distribution: models the number of events occurring in a fixed interval of time or space
PMF: $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$
Expected value: $E(X) = \lambda$
Variance: $Var(X) = \lambda$
Geometric distribution: models the number of trials needed to achieve the first success in a series of independent Bernoulli trials
PMF: $P(X = k) = (1-p)^{k-1}p$
Expected value: $E(X) = \frac{1}{p}$
Variance: $Var(X) = \frac{1-p}{p^2}$
Hypergeometric distribution: models the number of successes in a fixed number of draws from a population without replacement
PMF: $P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
Expected value: $E(X) = n\frac{K}{N}$
Variance: $Var(X) = n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}$
Negative binomial distribution: models the number of failures before a specified number of successes in a series of independent Bernoulli trials
PMF: $P(X = k) = \binom{k+r-1}{k}(1-p)^k p^r$
Expected value: $E(X) = \frac{r(1-p)}{p}$
Variance: $Var(X) = \frac{r(1-p)}{p^2}$
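These closed-form results can be cross-checked against scipy.stats, as in the sketch below; all parameter values are arbitrary choices for illustration. Note that scipy's geom counts trials until the first success and its nbinom counts failures before the $r$-th success, matching the conventions used above.

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0     # assumed parameters, chosen for illustration
K, N, draws = 7, 20, 5       # hypergeometric: K successes in a population of N
r = 3                        # negative binomial: target number of successes

binom = stats.binom(n, p)
assert abs(binom.mean() - n*p) < 1e-9 and abs(binom.var() - n*p*(1-p)) < 1e-9

poisson = stats.poisson(lam)
assert abs(poisson.mean() - lam) < 1e-9 and abs(poisson.var() - lam) < 1e-9

geom = stats.geom(p)         # counts trials until the first success
assert abs(geom.mean() - 1/p) < 1e-9 and abs(geom.var() - (1-p)/p**2) < 1e-9

hyper = stats.hypergeom(N, K, draws)   # arguments: (population, successes, draws)
assert abs(hyper.mean() - draws*K/N) < 1e-9

nbinom = stats.nbinom(r, p)  # counts failures before the r-th success
assert abs(nbinom.mean() - r*(1-p)/p) < 1e-9
```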
Properties and Applications
Discrete random variables have several important properties:
The probability of any single value is between 0 and 1 (inclusive)
The sum of all probabilities in a PMF equals 1
The CDF is a non-decreasing function with values between 0 and 1 (inclusive)
Discrete distributions can be used to model various real-world phenomena:
Binomial distribution: quality control (defective items), medical trials (treatment success)
Poisson distribution: call center arrivals, website traffic, radioactive decay
Geometric distribution: number of attempts until first success (job interviews, sales)
Hypergeometric distribution: sampling without replacement (defective items in a batch)
Negative binomial distribution: number of failures before a specified number of successes (customer complaints before a product is redesigned)
Discrete random variables can be transformed or combined to create new random variables
Example: the sum of two independent discrete random variables is a new discrete random variable
Understanding the properties and applications of discrete random variables is crucial for modeling and analyzing real-world situations
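As one concrete modeling sketch, suppose calls arrive at a call center at an assumed average rate of 3 per minute (a made-up parameter, not from the text): a Poisson model then answers practical questions directly.

```python
from scipy import stats

calls = stats.poisson(3)     # assumed rate: 3 calls per minute on average

print(calls.pmf(0))          # P(no calls in a minute) = e^{-3} ≈ 0.0498
print(1 - calls.cdf(5))      # P(more than 5 calls) = 1 - P(X <= 5) ≈ 0.0839
```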
Problem-Solving Techniques
Identify the type of discrete random variable and its parameters (e.g., binomial with $n$ and $p$)
Write the PMF or CDF of the random variable using the appropriate formula
Calculate probabilities using the PMF, CDF, or properties of the distribution
For PMF: $P(X = x)$
For CDF: $P(a < X \leq b) = F(b) - F(a)$
For complement: $P(X > a) = 1 - P(X \leq a)$
Use the expected value and variance formulas to calculate these properties for the given distribution
Apply the linearity of expectation and properties of variance when working with transformed or combined random variables
Recognize when to use the PMF, CDF, or other properties of the distribution to solve a problem
Interpret the results in the context of the problem, considering the real-world implications
Verify that the solution makes sense and satisfies any given conditions or constraints
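Putting these steps together, here is a hypothetical worked example (all numbers invented for illustration): a batch of 20 items, each independently defective with probability 0.05, modeled as a binomial random variable.

```python
from scipy import stats

# Identify the variable: binomial with n = 20 and p = 0.05 (assumed values)
defects = stats.binom(20, 0.05)

# Calculate probabilities with the PMF, CDF, and complement rules
print(defects.pmf(0))        # P(X = 0): no defective items ≈ 0.3585
print(defects.cdf(2))        # P(X <= 2) ≈ 0.9245
print(1 - defects.cdf(2))    # P(X > 2) via the complement rule ≈ 0.0755

# Expected value and variance from the binomial formulas
print(defects.mean(), defects.var())  # np = 1.0, np(1-p) = 0.95
```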