🧮 Data Science Numerical Analysis

Common Statistical Distributions

Why This Matters

Statistical distributions are the mathematical backbone of everything you'll do in data science and statistics. When you're fitting models, running hypothesis tests, or making predictions, you're implicitly assuming your data follows some underlying distribution. Choose the wrong one, and your confidence intervals are meaningless, your p-values are garbage, and your predictions fall apart. The distributions in this guide aren't just abstract math—they're the tools you'll use to model everything from customer arrivals to stock prices to survival times.

You're being tested on more than just memorizing PDFs and parameters. Exams will ask you to identify which distribution fits a scenario, derive relationships between distributions, and justify computational choices. The key concepts here include discrete vs. continuous modeling, conjugate relationships, limiting behaviors, and the role of parameters in shaping distributions. Don't just memorize formulas—know what problem each distribution solves and when one distribution approximates or generalizes another.


Foundational Discrete Distributions

These distributions model countable outcomes—successes, failures, and events. They form the building blocks of discrete probability and appear constantly in sampling, quality control, and event modeling.

Bernoulli Distribution

  • Single binary trial—the simplest random variable, taking value 1 (success) with probability $p$ and 0 (failure) with probability $1-p$
  • PMF: $P(X=x) = p^x(1-p)^{1-x}$ for $x \in \{0,1\}$, with mean $\mu = p$ and variance $\sigma^2 = p(1-p)$
  • Foundation for compound distributions—the binomial, geometric, and negative binomial all build on independent Bernoulli trials

Binomial Distribution

  • Counts successes in $n$ fixed trials—each trial independent with success probability $p$, giving $X \sim \text{Binomial}(n, p)$
  • PMF: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$, with mean $np$ and variance $np(1-p)$
  • Normal approximation applies when $np \geq 10$ and $n(1-p) \geq 10$—critical for computational efficiency in large-sample problems
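
A quick numerical check of that approximation, sketched with scipy (the particular $n$, $p$, and cutoff below are illustrative choices, not values from the text):

```python
from scipy import stats

n, p = 100, 0.3                    # np = 30 and n(1-p) = 70, so the rule of thumb holds
k = 35

exact = stats.binom.cdf(k, n, p)   # exact P(X <= 35)
mu = n * p
sigma = (n * p * (1 - p)) ** 0.5
approx = stats.norm.cdf(k + 0.5, loc=mu, scale=sigma)  # normal approximation with continuity correction

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```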

Geometric Distribution

  • Trials until first success—models waiting time in discrete steps, with $X \sim \text{Geometric}(p)$
  • The only memoryless discrete distribution: $P(X > m + n \mid X > m) = P(X > n)$—past failures don't affect future success probability (verified numerically below)
  • Mean $1/p$ and variance $\frac{1-p}{p^2}$—useful in reliability testing and retry-until-success algorithms
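
A minimal sketch of the memoryless property using scipy's survival function (the values of $p$, $m$, and $n$ are arbitrary illustrations):

```python
from scipy import stats

p, m, n = 0.2, 3, 5
geom = stats.geom(p)                         # scipy's geom counts trials until the first success

conditional = geom.sf(m + n) / geom.sf(m)    # P(X > m + n | X > m)
unconditional = geom.sf(n)                   # P(X > n)
print(f"conditional = {conditional:.6f}, unconditional = {unconditional:.6f}")  # identical
```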

Compare: Geometric vs. Negative Binomial—both model trials until success, but geometric stops at the first success while negative binomial waits for $r$ successes. If an FRQ asks about "number of attempts needed," identify whether you need one success or multiple.


Count and Rate-Based Distributions

When you're modeling the number of events in a fixed interval—whether time, space, or another continuous domain—these distributions capture the underlying randomness of arrival processes.

Poisson Distribution

  • Events in fixed intervals—models count data when events occur independently at a constant rate $\lambda$, giving $X \sim \text{Poisson}(\lambda)$
  • PMF: $P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$, with the elegant property that mean = variance = $\lambda$
  • Binomial limit: as $n \to \infty$ and $p \to 0$ with $np = \lambda$ fixed, $\text{Binomial}(n,p) \to \text{Poisson}(\lambda)$—this is your go-to approximation for rare events
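
A small sketch of the rare-event approximation (the $n$, $p$, and $k$ below are illustrative):

```python
from scipy import stats

n, p = 10_000, 0.0003            # large n, tiny p; lambda = np = 3
lam = n * p
k = 5

exact = stats.binom.pmf(k, n, p)
approx = stats.poisson.pmf(k, lam)
print(f"binomial = {exact:.6f}, Poisson approximation = {approx:.6f}")  # nearly identical
```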

Negative Binomial Distribution

  • Trials until $r$ successes—generalizes the geometric distribution with parameters $r$ (target number of successes) and $p$ (success probability)
  • Handles overdispersion in count data where variance exceeds mean—unlike Poisson, allows $\text{Var}(X) > E[X]$
  • Counting the failures before the $r$-th success gives mean $\frac{r(1-p)}{p}$ and variance $\frac{r(1-p)}{p^2}$ (counting total trials shifts the mean to $r/p$; the variance is unchanged)—preferred over Poisson when modeling clustered or bursty events
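
A sketch of the overdispersion point using scipy, which parameterizes the negative binomial by failures before the $r$-th success (the $r$ and $p$ here are illustrative):

```python
from scipy import stats

r, p = 5, 0.4
nb = stats.nbinom(r, p)          # failures before the r-th success

print(f"mean = {nb.mean():.2f}, variance = {nb.var():.2f}")
# variance > mean: overdispersion that a Poisson (variance == mean) cannot capture
```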

Compare: Poisson vs. Negative Binomial—both model counts, but Poisson assumes mean equals variance. When your data shows overdispersion (variance > mean), negative binomial is the better choice. This distinction appears frequently in regression model selection.


Continuous Distributions for Modeling Data

These workhorses model measurements, times, and proportions. Understanding their shapes, parameters, and relationships is essential for both theoretical derivations and practical modeling.

Normal (Gaussian) Distribution

  • The universal limit—symmetric, bell-shaped with $X \sim N(\mu, \sigma^2)$; PDF is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  • Central Limit Theorem: sample means of any distribution with finite variance converge to normal as $n \to \infty$—this justifies most large-sample inference (see the simulation sketch below)
  • Standard normal $Z \sim N(0,1)$ is the reference; transform via $Z = \frac{X - \mu}{\sigma}$ for probability calculations and hypothesis tests
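
A short simulation of the CLT using a heavily skewed parent distribution (the exponential); the sample size and repetition count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000

# Means of n skewed (exponential) observations, repeated many times
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(f"mean of sample means = {means.mean():.3f} (theory: 1.0)")
print(f"std of sample means  = {means.std():.3f} (theory: {1 / np.sqrt(n):.3f})")
# A histogram of `means` looks close to a normal curve despite the skewed parent
```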

Uniform Distribution

  • Equal probability across $[a, b]$—the "maximum entropy" distribution when you only know the range, with PDF $f(x) = \frac{1}{b-a}$
  • Mean $\frac{a+b}{2}$ and variance $\frac{(b-a)^2}{12}$—memorize these for quick calculations
  • Simulation foundation: if $U \sim \text{Uniform}(0,1)$, you can generate any distribution via inverse transform sampling—this is how uniform random numbers get turned into draws from other distributions
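
A minimal inverse transform sketch: generate Exponential($\lambda$) draws from Uniform(0,1) via the inverse CDF $F^{-1}(u) = -\ln(1-u)/\lambda$ (the rate and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)     # U ~ Uniform(0, 1)

lam = 2.0
x = -np.log(1 - u) / lam          # apply the exponential's inverse CDF to the uniforms

print(f"sample mean = {x.mean():.4f}, theory = {1 / lam:.4f}")
```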

Lognormal Distribution

  • Exponential of normal—if $Y \sim N(\mu, \sigma^2)$, then $X = e^Y$ is lognormal; always positive and right-skewed
  • Multiplicative processes naturally produce lognormal data—stock returns, biological growth, income distributions
  • Mean $e^{\mu + \sigma^2/2}$ and variance $(e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$—note that $\mu$ and $\sigma$ are the mean and standard deviation of the underlying normal (the log scale), not of $X$ itself
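
A quick simulation check of the mean formula, with illustrative $\mu$ and $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 0.8

x = np.exp(rng.normal(mu, sigma, size=500_000))   # X = e^Y with Y ~ N(mu, sigma^2)
print(f"sample mean = {x.mean():.3f}, theory = {np.exp(mu + sigma**2 / 2):.3f}")
```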

Compare: Normal vs. Lognormal—normal models additive effects (measurement error, heights), while lognormal models multiplicative effects (returns, concentrations). If your data is strictly positive and right-skewed, try logging it and checking for normality.


Waiting Time and Reliability Distributions

These continuous distributions model duration, lifetime, and time-to-event data. They're fundamental in survival analysis, queuing theory, and engineering reliability.

Exponential Distribution

  • Time between Poisson events—the continuous analog of the geometric, with rate $\lambda$ and PDF $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$
  • Memoryless property: $P(X > s + t \mid X > s) = P(X > t)$—the only continuous memoryless distribution, modeling "fresh start" scenarios
  • Mean $1/\lambda$ and variance $1/\lambda^2$—connects directly to Poisson: if arrivals are Poisson($\lambda$), inter-arrival times are Exponential($\lambda$)
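
A simulation sketch of that connection: exponential inter-arrival gaps with rate $\lambda$ should produce, on average, $\lambda T$ arrivals in a window of length $T$ (the rate, window, and repetition count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, reps = 4.0, 1.0, 20_000

counts = []
for _ in range(reps):
    gaps = rng.exponential(scale=1 / lam, size=50)   # 50 gaps is more than enough to cover T = 1
    arrivals = np.cumsum(gaps)
    counts.append(np.sum(arrivals <= T))

print(f"mean count = {np.mean(counts):.3f}, Poisson mean = {lam * T:.3f}")
```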

Gamma Distribution

  • Sum of exponentials—if $X_1, \ldots, X_k$ are i.i.d. Exponential($\lambda$), then $\sum X_i \sim \text{Gamma}(k, \lambda)$
  • Two-parameter flexibility with shape $k$ and rate $\lambda$ (or scale $\theta = 1/\lambda$); mean $k/\lambda$ and variance $k/\lambda^2$
  • Special cases: Exponential is Gamma(1, $\lambda$); Chi-square($\nu$) is Gamma($\nu/2$, $1/2$)—these relationships are heavily tested
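
A sketch verifying the sum-of-exponentials relationship with a Kolmogorov–Smirnov comparison (shape, rate, and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, lam = 3, 2.0

sums = rng.exponential(scale=1 / lam, size=(50_000, k)).sum(axis=1)  # sums of k Exp(lam) draws
ks = stats.kstest(sums, stats.gamma(a=k, scale=1 / lam).cdf)
print(f"KS statistic = {ks.statistic:.4f}")   # near zero: the sums match Gamma(k, lam)
```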

Weibull Distribution

  • Flexible failure modeling—shape parameter $k$ controls hazard rate behavior: $k < 1$ (decreasing), $k = 1$ (constant, i.e., exponential), $k > 1$ (increasing)
  • PDF: $f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}$ for $x \geq 0$, with scale $\lambda$
  • Reliability standard—models infant mortality ($k < 1$), random failures ($k = 1$), and wear-out ($k > 1$) in a single framework
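
A small sketch of how the shape parameter changes the hazard rate $h(x) = f(x)/S(x)$, evaluated at a few illustrative points:

```python
import numpy as np
from scipy import stats

x = np.array([0.5, 1.0, 2.0])
for k in (0.5, 1.0, 2.0):                  # decreasing, constant, increasing hazard
    w = stats.weibull_min(k, scale=1.0)
    hazard = w.pdf(x) / w.sf(x)            # h(x) = f(x) / S(x)
    print(f"k = {k}: hazard at {x} -> {np.round(hazard, 3)}")
```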

Compare: Exponential vs. Weibull—exponential assumes constant failure rate (memoryless), while Weibull allows the failure rate to change over time. If a problem mentions "aging" or "wear-out," Weibull with $k > 1$ is your answer.


Distributions for Proportions and Bounded Data

When your random variable is constrained to a finite interval—especially $[0, 1]$—these distributions provide the necessary flexibility.

Beta Distribution

  • Flexible on $[0, 1]$—shape parameters $\alpha$ and $\beta$ control skewness; PDF $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$
  • Conjugate prior for the binomial likelihood in Bayesian inference—if the prior is Beta($\alpha, \beta$) and you observe $k$ successes in $n$ trials, the posterior is Beta($\alpha + k, \beta + n - k$); see the sketch after this list
  • Mean $\frac{\alpha}{\alpha + \beta}$ and special cases: Uniform(0,1) = Beta(1,1); symmetric when $\alpha = \beta$
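
A minimal sketch of the prior-to-posterior update (the prior parameters and observed counts are illustrative):

```python
from scipy import stats

alpha, beta = 2, 2          # prior Beta(2, 2): mild belief that the proportion is near 0.5
k, n = 7, 10                # observe 7 successes in 10 trials

posterior = stats.beta(alpha + k, beta + n - k)     # conjugate update
lo, hi = posterior.interval(0.95)
print(f"posterior mean = {posterior.mean():.3f}, 95% credible interval = ({lo:.3f}, {hi:.3f})")
```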

Compare: Beta vs. Uniform—uniform is just Beta(1,1), assuming no prior information. As you observe data, the beta posterior concentrates around the true proportion. This prior-to-posterior update is a classic Bayesian exam topic.


Sampling and Inference Distributions

These distributions arise from sampling theory and are essential for hypothesis testing, confidence intervals, and model comparison. They're derived from the normal distribution and appear whenever you're doing inference.

Chi-Square Distribution

  • Sum of squared normals—if $Z_1, \ldots, Z_\nu$ are i.i.d. $N(0,1)$, then $\sum Z_i^2 \sim \chi^2_\nu$ with $\nu$ degrees of freedom
  • Variance inference: the sample variance $S^2$ from normal data satisfies $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$—used for confidence intervals on the variance, as sketched below
  • Goodness-of-fit and independence tests—the test statistic $\sum \frac{(O_i - E_i)^2}{E_i}$ follows a chi-square distribution under the null hypothesis
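
A sketch of that pivotal quantity turned into a 95% confidence interval for $\sigma^2$ (the simulated data and true parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=3, size=25)    # n = 25 observations, true sigma^2 = 9
n, s2 = len(x), x.var(ddof=1)               # sample variance with n - 1 in the denominator

lower = (n - 1) * s2 / stats.chi2.ppf(0.975, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(0.025, df=n - 1)
print(f"95% CI for sigma^2: ({lower:.2f}, {upper:.2f})")   # should usually contain 9
```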

Student's t-Distribution

  • Ratio of normal to chi-square—if $Z \sim N(0,1)$ and $V \sim \chi^2_\nu$ independently, then $T = \frac{Z}{\sqrt{V/\nu}} \sim t_\nu$
  • Heavier tails than the normal—accounts for uncertainty in estimating $\sigma$; converges to $N(0,1)$ as $\nu \to \infty$
  • Small-sample inference: use $t_{n-1}$ for confidence intervals and hypothesis tests on means when $\sigma$ is unknown—this is the default for real data
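
A small sketch of a t-based confidence interval for a mean when $\sigma$ is unknown (the simulated sample is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=2, size=12)       # small sample, sigma unknown

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
tcrit = stats.t.ppf(0.975, df=n - 1)          # critical value from t with n - 1 df
half = tcrit * s / np.sqrt(n)
print(f"95% CI for the mean: ({xbar - half:.2f}, {xbar + half:.2f})")
```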

F-Distribution

  • Ratio of chi-squares—if $U \sim \chi^2_{d_1}$ and $V \sim \chi^2_{d_2}$ independently, then $F = \frac{U/d_1}{V/d_2} \sim F_{d_1, d_2}$
  • ANOVA test statistic: compares between-group variance to within-group variance; a large $F$ suggests the group means differ (see the sketch below)
  • Regression significance: the overall F-test checks whether at least one predictor has a nonzero coefficient—always report it alongside $R^2$
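
A minimal one-way ANOVA sketch with three simulated groups (the group means and sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2, size=30)
b = rng.normal(10.5, 2, size=30)
c = rng.normal(12.0, 2, size=30)

F, pval = stats.f_oneway(a, b, c)      # between-group vs. within-group variance
print(f"F = {F:.2f}, p = {pval:.4f}")  # large F and small p suggest the group means differ
```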

Compare: t vs. F distributions—t-tests compare one or two means, while F-tests compare variances or multiple means simultaneously. Note that $t^2_\nu = F_{1,\nu}$—a two-sided t-test is equivalent to an F-test with 1 numerator degree of freedom.


Quick Reference

  • Discrete counts (fixed trials): Bernoulli, Binomial, Geometric, Negative Binomial
  • Event rates in continuous time/space: Poisson, Exponential
  • Symmetric continuous data: Normal, Student's t
  • Positive, right-skewed data: Lognormal, Gamma, Weibull, Exponential
  • Bounded proportions on $[0,1]$: Beta, Uniform
  • Variance and model comparison: Chi-Square, F
  • Small-sample mean inference: Student's t
  • Bayesian conjugate priors: Beta (for binomial), Gamma (for Poisson)

Self-Check Questions

  1. A call center receives an average of 4 calls per minute. Which distribution models the number of calls in a 5-minute window, and which models the time until the next call?

  2. You're modeling the proportion of defective items in a batch using Bayesian inference with a binomial likelihood. What distribution family should your prior belong to, and why?

  3. Compare the exponential and Weibull distributions: under what conditions does Weibull reduce to exponential, and when would you prefer Weibull in a reliability analysis?

  4. Your sample variance from 25 observations is used to construct a confidence interval for the population variance. What distribution does the pivotal quantity follow, and how many degrees of freedom does it have?

  5. Explain why the normal distribution appears so frequently in inference, even when the underlying data is clearly non-normal. What theorem justifies this, and what conditions must hold?