Statistical distributions are the mathematical backbone of everything you'll do in data science and statistics. When you're fitting models, running hypothesis tests, or making predictions, you're implicitly assuming your data follows some underlying distribution. Choose the wrong one, and your confidence intervals are meaningless, your p-values are garbage, and your predictions fall apart. The distributions in this guide aren't just abstract math—they're the tools you'll use to model everything from customer arrivals to stock prices to survival times.
You're being tested on more than just memorizing PDFs and parameters. Exams will ask you to identify which distribution fits a scenario, derive relationships between distributions, and justify computational choices. The key concepts here include discrete vs. continuous modeling, conjugate relationships, limiting behaviors, and the role of parameters in shaping distributions. Don't just memorize formulas—know what problem each distribution solves and when one distribution approximates or generalizes another.
Foundational Discrete Distributions
These distributions model countable outcomes—successes, failures, and events. They form the building blocks of discrete probability and appear constantly in sampling, quality control, and event modeling.
Bernoulli Distribution
Single binary trial—the simplest random variable, taking value 1 (success) with probability p and 0 (failure) with probability 1−p
PMF: $P(X=x)=p^x(1-p)^{1-x}$ for $x\in\{0,1\}$, with mean $\mu=p$ and variance $\sigma^2=p(1-p)$
Foundation for compound distributions—the binomial, geometric, and negative binomial all build on independent Bernoulli trials
Binomial Distribution
Counts successes in n fixed trials—each trial independent with success probability p, giving X∼Binomial(n,p)
PMF: $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$, with mean $np$ and variance $np(1-p)$
Normal approximation applies when np≥10 and n(1−p)≥10—critical for computational efficiency in large-sample problems
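To see the approximation in action, here's a minimal check (the values of n, p, and the cutoff k are assumed for illustration) comparing the exact binomial CDF against the normal approximation with a continuity correction:

```python
# Normal approximation to the binomial: n = 100, p = 0.3, so np = 30 and
# n(1-p) = 70 both clear the rule-of-thumb threshold of 10.
from scipy import stats

n, p, k = 100, 0.3, 35

exact = stats.binom.cdf(k, n, p)  # exact P(X <= 35)
# Continuity correction: approximate P(X <= 35) by P(Y <= 35.5),
# where Y ~ N(np, np(1-p))
approx = stats.norm.cdf(k + 0.5, loc=n * p, scale=(n * p * (1 - p)) ** 0.5)

print(f"exact binomial CDF:   {exact:.4f}")
print(f"normal approximation: {approx:.4f}")  # the two agree to ~2 decimals here
```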
Geometric Distribution
Trials until first success—models waiting time in discrete steps, with X∼Geometric(p)
Memoryless property: $P(X>m+n\mid X>m)=P(X>n)$—the geometric is the only memoryless discrete distribution; past failures don't affect future success probability
Mean $1/p$ and variance $(1-p)/p^2$—useful in reliability testing and retry-until-success algorithms
Compare: Geometric vs. Negative Binomial—both model trials until success, but geometric stops at the first success while negative binomial waits for r successes. If an FRQ asks about "number of attempts needed," identify whether you need one success or multiple.
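A quick simulation makes the distinction concrete (the values of p and r are assumed for illustration; note that numpy's negative binomial counts failures, not trials):

```python
# Contrast the two waiting-time counts: trials until the FIRST success
# vs. trials until the r-th success.
import numpy as np

rng = np.random.default_rng(0)
p, r, reps = 0.25, 3, 100_000

geom = rng.geometric(p, size=reps)  # trials until first success
# numpy's negative_binomial returns failures before the r-th success,
# so add r to convert to total trials until the r-th success
negbin = rng.negative_binomial(r, p, size=reps) + r

print(f"geometric mean  ~ {geom.mean():.2f} (theory 1/p = {1/p:.2f})")
print(f"neg. binom mean ~ {negbin.mean():.2f} (theory r/p = {r/p:.2f})")
```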
Count and Rate-Based Distributions
When you're modeling the number of events in a fixed interval—whether time, space, or another continuous domain—these distributions capture the underlying randomness of arrival processes.
Poisson Distribution
Events in fixed intervals—models count data when events occur independently at constant rate λ, giving X∼Poisson(λ)
PMF: $P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$, with the elegant property that mean = variance = $\lambda$
Binomial limit: as n→∞ and p→0 with np=λ fixed, Binomial(n,p)→Poisson(λ)—this is your go-to approximation for rare events
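Here's a sketch of that limit numerically, holding $np=\lambda$ fixed while n grows (λ = 2 assumed for illustration):

```python
# Binomial-to-Poisson limit: hold np = λ fixed, let n grow and p shrink.
from scipy import stats

lam = 2.0
for n in (10, 100, 10_000):
    p = lam / n
    print(f"n={n:>6}: P(X=3) binomial={stats.binom.pmf(3, n, p):.5f}  "
          f"poisson={stats.poisson.pmf(3, lam):.5f}")
# the binomial PMF converges to the Poisson PMF as n increases
```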
Negative Binomial Distribution
Trials until r successes—generalizes geometric distribution with parameters r (target successes) and p (success probability)
Handles overdispersion in count data where variance exceeds mean—unlike Poisson, allows Var(X)>E[X]
Mean $\frac{r(1-p)}{p}$ and variance $\frac{r(1-p)}{p^2}$ (these count failures before the $r$-th success; add $r$ for total trials)—preferred over Poisson when modeling clustered or bursty events
Compare: Poisson vs. Negative Binomial—both model counts, but Poisson assumes mean equals variance. When your data shows overdispersion (variance > mean), negative binomial is the better choice. This distinction appears frequently in regression model selection.
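A rough dispersion check is often the first diagnostic. This sketch uses synthetic counts (assumed for illustration) and compares the sample mean to the sample variance:

```python
# Dispersion check on count data: if sample variance clearly exceeds the
# sample mean, a negative binomial model usually fits better than a Poisson.
import numpy as np

rng = np.random.default_rng(1)
counts = rng.negative_binomial(5, 0.3, size=5_000)  # deliberately overdispersed

mean, var = counts.mean(), counts.var(ddof=1)
print(f"mean={mean:.2f}, variance={var:.2f}, dispersion ratio={var/mean:.2f}")
# ratio >> 1 signals overdispersion; a Poisson fit would understate the variance
```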
Continuous Distributions for Modeling Data
These workhorses model measurements, times, and proportions. Understanding their shapes, parameters, and relationships is essential for both theoretical derivations and practical modeling.
Normal (Gaussian) Distribution
The universal limit—symmetric, bell-shaped with $X\sim N(\mu,\sigma^2)$; PDF is $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Central Limit Theorem: sample means of any distribution converge to normal as n→∞—this justifies most large-sample inference
Standard normal $Z\sim N(0,1)$ is the reference; transform via $Z=\frac{X-\mu}{\sigma}$ for probability calculations and hypothesis tests
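As a worked example of the transform (μ, σ, and x are assumed values for illustration):

```python
# Standardize to Z for a tail-probability lookup: μ = 100, σ = 15, x = 130.
from scipy import stats

mu, sigma, x = 100, 15, 130
z = (x - mu) / sigma                                  # Z = (X - μ) / σ
print(f"z = {z:.2f}")                                 # 2.00
print(f"P(X > {x}) = {1 - stats.norm.cdf(z):.4f}")    # ~0.0228, the upper tail
```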
Uniform Distribution
Equal probability across $[a,b]$—the "maximum entropy" distribution when you only know the range, with PDF $f(x)=\frac{1}{b-a}$
Mean $\frac{a+b}{2}$ and variance $\frac{(b-a)^2}{12}$—memorize these for quick calculations
Simulation foundation: if U∼Uniform(0,1), you can generate any distribution via inverse transform sampling—this is how random number generators work
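Here's a minimal sketch of inverse transform sampling, using the exponential's closed-form inverse CDF $F^{-1}(u)=-\ln(1-u)/\lambda$ (λ = 2 assumed for illustration):

```python
# Inverse transform sampling: push Uniform(0,1) draws through the inverse CDF
# of the target distribution. Target here: Exponential(λ).
import numpy as np

rng = np.random.default_rng(42)
lam = 2.0
u = rng.uniform(0.0, 1.0, size=100_000)
x = -np.log(1.0 - u) / lam                   # inverse CDF of Exponential(λ)

print(f"sample mean ~ {x.mean():.3f} (theory 1/λ = {1/lam:.3f})")
```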
Lognormal Distribution
Exponential of normal—if $Y\sim N(\mu,\sigma^2)$, then $X=e^Y$ is lognormal; always positive and right-skewed
Multiplicative processes naturally produce lognormal data—stock returns, biological growth, income distributions
Mean $e^{\mu+\sigma^2/2}$ and variance $e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$—note that $\mu$ and $\sigma$ are the mean and standard deviation of the underlying normal (the log of the data), not of the lognormal itself
Compare: Normal vs. Lognormal—normal models additive effects (measurement error, heights), while lognormal models multiplicative effects (returns, concentrations). If your data is strictly positive and right-skewed, try logging it and checking for normality.
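This sketch illustrates the log-and-check heuristic on a synthetic lognormal sample (parameters assumed for illustration):

```python
# Log-and-check heuristic: strictly positive, right-skewed data that becomes
# roughly symmetric after a log transform is a lognormal candidate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.lognormal(mean=0.0, sigma=0.5, size=2_000)  # positive and skewed

print(f"skewness of x:      {stats.skew(x):.2f}")           # clearly right-skewed
print(f"skewness of log(x): {stats.skew(np.log(x)):.2f}")   # near 0 if lognormal
```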
Waiting Time and Reliability Distributions
These continuous distributions model duration, lifetime, and time-to-event data. They're fundamental in survival analysis, queuing theory, and engineering reliability.
Exponential Distribution
Time between Poisson events—the continuous analog of the geometric, with rate $\lambda$ and PDF $f(x)=\lambda e^{-\lambda x}$ for $x\ge 0$
Memoryless property: $P(X>s+t\mid X>s)=P(X>t)$—the only continuous memoryless distribution, modeling "fresh start" scenarios
Mean $1/\lambda$ and variance $1/\lambda^2$—connects directly to Poisson: if arrivals are Poisson($\lambda$), inter-arrival times are Exponential($\lambda$)
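You can verify that connection by simulation. This sketch (rate and time window assumed for illustration) stacks exponential inter-arrival times and counts arrivals per window:

```python
# Poisson-exponential connection: exponential inter-arrival times accumulated
# up to time T yield a Poisson(λT) count of arrivals.
import numpy as np

rng = np.random.default_rng(3)
lam, T, reps = 4.0, 1.0, 50_000

counts = []
for _ in range(reps):
    t, n = 0.0, 0
    while True:
        t += rng.exponential(1.0 / lam)   # numpy parameterizes by scale = 1/λ
        if t > T:
            break
        n += 1
    counts.append(n)

counts = np.array(counts)
print(f"simulated mean count {counts.mean():.2f} vs λT = {lam * T:.2f}")
print(f"simulated variance   {counts.var():.2f} (Poisson: equal to mean)")
```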
Gamma Distribution
Sum of exponentials—if $X_1,\dots,X_k$ are i.i.d. Exponential($\lambda$), then $\sum_i X_i\sim\text{Gamma}(k,\lambda)$
Two-parameter flexibility with shape $k$ and rate $\lambda$ (or scale $\theta=1/\lambda$); mean $k/\lambda$ and variance $k/\lambda^2$
Special cases: Exponential is Gamma(1, λ); Chi-square(ν) is Gamma(ν/2, 1/2)—these relationships are heavily tested
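A quick simulation confirms the sum-of-exponentials construction (k and λ are assumed values for illustration):

```python
# Gamma(k, λ) as a sum of k i.i.d. Exponential(λ) draws: k = 3, λ = 2.
import numpy as np

rng = np.random.default_rng(5)
k, lam, reps = 3, 2.0, 100_000

sums = rng.exponential(1.0 / lam, size=(reps, k)).sum(axis=1)
print(f"mean ~ {sums.mean():.3f} (theory k/λ  = {k/lam:.3f})")
print(f"var  ~ {sums.var():.3f} (theory k/λ² = {k/lam**2:.3f})")
```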
Weibull Distribution
PDF: $f(x)=\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}$ for $x\ge 0$, with shape $k$ and scale $\lambda$
Reliability standard—models infant mortality (k<1), random failures (k=1), and wear-out (k>1) in a single framework
Compare: Exponential vs. Weibull—exponential assumes constant failure rate (memoryless), while Weibull allows the failure rate to change over time. If a problem mentions "aging" or "wear-out," Weibull with k>1 is your answer.
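To see how the shape parameter drives the failure rate, this sketch evaluates the Weibull hazard $h(x)=\frac{k}{\lambda}(x/\lambda)^{k-1}$ at a few points (λ = 1 and the x grid are assumed for illustration):

```python
# Weibull hazard rate for three shape regimes, with λ = 1 for simplicity:
# h(x) = k * x^(k-1).
import numpy as np

x = np.array([0.5, 1.0, 2.0])
for k in (0.5, 1.0, 2.0):
    h = k * x ** (k - 1)                  # hazard with λ = 1
    trend = {0.5: "decreasing (infant mortality)",
             1.0: "constant (exponential case)",
             2.0: "increasing (wear-out)"}[k]
    print(f"k={k}: h(x) = {np.round(h, 3)}  -> {trend}")
```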
Distributions for Proportions and Bounded Data
When your random variable is constrained to a finite interval—especially [0,1]—these distributions provide the necessary flexibility.
Beta Distribution
Flexible on $[0,1]$—shape parameters $\alpha$ and $\beta$ control skewness; PDF $f(x)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$
Conjugate prior for binomial likelihood in Bayesian inference—if prior is Beta(α,β) and you observe k successes in n trials, posterior is Beta(α+k,β+n−k)
Mean $\frac{\alpha}{\alpha+\beta}$ and special cases: Uniform(0,1) = Beta(1,1); symmetric when $\alpha=\beta$
Compare: Beta vs. Uniform—uniform is just Beta(1,1), assuming no prior information. As you observe data, the beta posterior concentrates around the true proportion. This prior-to-posterior update is a classic Bayesian exam topic.
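The prior-to-posterior update is one line of arithmetic. This sketch (prior and data values assumed for illustration) computes the posterior and a credible interval:

```python
# Beta-binomial conjugate update: prior Beta(2, 2), then observe k = 7
# successes in n = 10 trials; posterior is Beta(α + k, β + n − k).
from scipy import stats

alpha, beta, n, k = 2, 2, 10, 7
post = stats.beta(alpha + k, beta + n - k)

lo, hi = post.interval(0.95)
print(f"posterior mean: {post.mean():.3f}")   # (α+k)/(α+β+n) ≈ 0.643
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```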
Sampling and Inference Distributions
These distributions arise from sampling theory and are essential for hypothesis testing, confidence intervals, and model comparison. They're derived from the normal distribution and appear whenever you're doing inference.
Chi-Square Distribution
Sum of squared normals—if $Z_1,\dots,Z_\nu$ are i.i.d. $N(0,1)$, then $\sum Z_i^2\sim\chi^2_\nu$ with $\nu$ degrees of freedom
Variance inference: sample variance $S^2$ from normal data satisfies $\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}$—used for confidence intervals on variance
Goodness-of-fit and independence tests—the test statistic $\sum\frac{(O_i-E_i)^2}{E_i}$ follows chi-square under the null hypothesis
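Here's the statistic computed by hand on a small example (the observed counts are assumed for illustration), checked against the chi-square tail:

```python
# Goodness-of-fit by hand: Σ (O_i − E_i)² / E_i referred to χ² with
# (categories − 1) degrees of freedom.
import numpy as np
from scipy import stats

observed = np.array([22, 17, 21, 13, 16, 31])   # 120 rolls of a die
expected = np.full(6, observed.sum() / 6)       # fair-die expectation: 20 each

chi2 = ((observed - expected) ** 2 / expected).sum()
pval = stats.chi2.sf(chi2, df=5)
print(f"chi2 = {chi2:.2f}, p = {pval:.4f}")     # same as stats.chisquare(observed)
```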
Student's t-Distribution
Ratio of normal to chi-square—if $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$ independently, then $T=\frac{Z}{\sqrt{V/\nu}}\sim t_\nu$
Heavier tails than normal—accounts for uncertainty in estimating σ; converges to N(0,1) as ν→∞
Small-sample inference: use $t_{n-1}$ for confidence intervals and hypothesis tests on means when $\sigma$ is unknown—this is the default for real data
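A minimal sketch of the standard recipe (the sample here is synthetic, assumed for illustration):

```python
# t-based 95% confidence interval for a mean when σ is unknown.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(loc=50, scale=8, size=15)        # small sample, σ "unknown"

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
tcrit = stats.t.ppf(0.975, df=n - 1)            # t_{n-1} critical value
half = tcrit * s / np.sqrt(n)
print(f"95% CI for μ: ({xbar - half:.2f}, {xbar + half:.2f})")
```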
F-Distribution
Ratio of chi-squares—if $U\sim\chi^2_{d_1}$ and $V\sim\chi^2_{d_2}$ independently, then $F=\frac{U/d_1}{V/d_2}\sim F_{d_1,d_2}$
ANOVA test statistic: compares between-group variance to within-group variance; large F suggests group means differ
Regression significance: the overall F-test checks if at least one predictor has a nonzero coefficient—always report alongside $R^2$
Compare: t vs. F distributions—t-tests compare one or two means, while F-tests compare variances or multiple means simultaneously. Note that $t_\nu^2=F_{1,\nu}$—a two-sided t-test is equivalent to an F-test with 1 numerator degree of freedom.
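You can confirm the $t_\nu^2=F_{1,\nu}$ identity numerically (ν = 12 assumed for illustration):

```python
# Squaring the two-sided t critical value recovers the F critical value
# with 1 numerator degree of freedom.
from scipy import stats

nu = 12
t_crit = stats.t.ppf(0.975, df=nu)              # two-sided 5% t cutoff
f_crit = stats.f.ppf(0.95, dfn=1, dfd=nu)       # upper 5% F cutoff
print(f"t² = {t_crit**2:.4f}, F = {f_crit:.4f}")  # identical up to rounding
```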
Quick Reference Table
| Concept | Best Examples |
| --- | --- |
| Discrete counts (fixed trials) | Bernoulli, Binomial, Geometric, Negative Binomial |
| Event rates in continuous time/space | Poisson, Exponential |
| Symmetric continuous data | Normal, Student's t |
| Positive, right-skewed data | Lognormal, Gamma, Weibull, Exponential |
| Bounded proportions [0,1] | Beta, Uniform |
| Variance and model comparison | Chi-Square, F |
| Small-sample mean inference | Student's t |
| Bayesian conjugate priors | Beta (for binomial), Gamma (for Poisson) |
Self-Check Questions
A call center receives an average of 4 calls per minute. Which distribution models the number of calls in a 5-minute window, and which models the time until the next call?
You're modeling the proportion of defective items in a batch using Bayesian inference with a binomial likelihood. What distribution family should your prior belong to, and why?
Compare the exponential and Weibull distributions: under what conditions does Weibull reduce to exponential, and when would you prefer Weibull in a reliability analysis?
Your sample variance from 25 observations is used to construct a confidence interval for the population variance. What distribution does the pivotal quantity follow, and how many degrees of freedom does it have?
Explain why the normal distribution appears so frequently in inference, even when the underlying data is clearly non-normal. What theorem justifies this, and what conditions must hold?