
Biostatistics

Common Probability Distributions


Why This Matters

Probability distributions are the foundation of everything you'll do in biostatistics—from designing clinical trials to interpreting p-values to modeling disease outbreaks. When you're tested on this material, you're not just being asked to recall formulas. You're being evaluated on whether you understand when to apply each distribution, what assumptions it requires, and how distributions relate to each other. The exam will present scenarios and expect you to identify the appropriate distribution based on the data type and underlying process.

Think of distributions as tools in a toolkit: the normal distribution handles continuous measurements, the binomial counts successes in fixed trials, and the Poisson tracks rare events over time. Each distribution encodes specific assumptions about how data behave—independence, fixed trials, constant rates, symmetry. Don't just memorize parameters—know what real-world process each distribution models and when one distribution approximates another.


Continuous Distributions for Measurement Data

These distributions model variables that can take any value within a range. They're essential for analyzing measurements like blood pressure, weight, reaction times, and biomarker concentrations.

Normal (Gaussian) Distribution

  • Bell-shaped and symmetric around the mean (μ)—the most important distribution in statistics because of its mathematical properties and real-world prevalence
  • Empirical Rule (68-95-99.7): approximately 68% of data falls within ±1σ, 95% within ±2σ, and 99.7% within ±3σ of the mean (checked numerically in the sketch below)
  • Central Limit Theorem guarantees that sample means approach normality as n increases, even when the underlying population isn't normal
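
Here's a quick numerical check of the Empirical Rule. It's a minimal sketch, assuming Python with scipy installed; the standard normal (mean 0, standard deviation 1) is chosen purely for illustration.

```python
from scipy import stats

# Standard normal: mean 0, standard deviation 1
z = stats.norm(loc=0, scale=1)

# Probability of falling within ±1, ±2, ±3 standard deviations of the mean
for k in (1, 2, 3):
    prob = z.cdf(k) - z.cdf(-k)
    print(f"P(|Z| <= {k} sigma) = {prob:.4f}")
# Prints roughly 0.6827, 0.9545, 0.9973 (the 68-95-99.7 rule)
```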

Student's t-Distribution

  • Heavier tails than the normal distribution—accounts for extra uncertainty when estimating population parameters from small samples (compare tail probabilities in the sketch below)
  • Degrees of freedom (df) control the shape; as df → ∞, the t-distribution converges to the standard normal
  • Primary use: hypothesis tests and confidence intervals for means when σ is unknown and n < 30
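
A minimal sketch of both points, assuming scipy is available: the tail probability beyond 2 shrinks toward the normal value as df grows.

```python
from scipy import stats

# Tail probability beyond 2 for t-distributions with increasing degrees of freedom
for df in (3, 10, 30, 100):
    print(f"df = {df:>3}: P(T > 2) = {stats.t.sf(2, df):.4f}")

# The standard normal tail is the limiting case as df grows
print(f"normal:   P(Z > 2) = {stats.norm.sf(2):.4f}")
```

Heavier tails mean larger critical values, which is why t-based confidence intervals are wider than z-based intervals at the same sample size.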

Uniform Distribution

  • All outcomes equally likely between minimum (a) and maximum (b) values—the "no information" distribution
  • Mean is (a + b)/2 and variance is (b - a)²/12 for the continuous case
  • Simulation workhorse: random number generators produce uniform variates that are transformed into other distributions (see the inverse-transform sketch below)
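
One standard transformation is inverse-transform sampling. The sketch below assumes numpy and uses the exponential as the target distribution; the rate λ = 2 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(42)

# If U ~ Uniform(0, 1), then -ln(1 - U) / lam follows an Exponential(lam) distribution
lam = 2.0
u = rng.uniform(0.0, 1.0, size=100_000)
exp_samples = -np.log(1.0 - u) / lam

print("sample mean:    ", exp_samples.mean())  # close to 1/lam = 0.5
print("sample variance:", exp_samples.var())   # close to 1/lam**2 = 0.25
```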

Compare: Normal vs. t-distribution—both are symmetric and bell-shaped, but the t-distribution has heavier tails to account for uncertainty in small samples. If an FRQ asks about confidence intervals with unknown population standard deviation, reach for the t-distribution.


Discrete Distributions for Counting Events

These distributions model outcomes you can count—number of successes, number of occurrences, binary outcomes. They're fundamental for categorical data analysis and clinical trial design.

Bernoulli Distribution

  • Single trial with two outcomes: success (1) with probability p or failure (0) with probability 1 - p
  • Mean is p and variance is p(1 - p)—maximum variance occurs when p = 0.5
  • Building block for the binomial distribution; understanding Bernoulli trials is essential for grasping more complex counting distributions

Binomial Distribution

  • Counts successes in n independent Bernoulli trials—think drug response rates, disease prevalence in samples, or treatment outcomes
  • Parameters: n (number of trials) and p (probability of success); mean is np, variance is np(1 - p)
  • Normal approximation works well when np ≥ 10 and n(1 - p) ≥ 10, making large-sample calculations tractable (checked in the sketch below)
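
A minimal sketch of the mean, variance, and normal approximation, assuming scipy; n = 50 and p = 0.3 are illustrative numbers, not from the guide.

```python
from scipy import stats

n, p = 50, 0.3                    # e.g. 50 treated patients, 30% response rate (illustrative)
binom = stats.binom(n, p)

print("mean np          =", binom.mean())   # 15.0
print("variance np(1-p) =", binom.var())    # 10.5

# Approximation condition holds: np = 15 >= 10 and n(1 - p) = 35 >= 10
approx = stats.norm(loc=n * p, scale=(n * p * (1 - p)) ** 0.5)
print("exact  P(X <= 20):", binom.cdf(20))
print("approx P(X <= 20):", approx.cdf(20.5))   # 20.5 applies a continuity correction
```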

Poisson Distribution

  • Models count of rare events in a fixed interval of time or space, given average rate λ
  • Key property: mean and variance are both equal to λ—if observed variance greatly exceeds the mean, Poisson assumptions may be violated (simulated below)
  • Applications: hospital admissions per day, mutations per genome region, adverse events in clinical trials
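
A minimal simulation of the mean-equals-variance property, assuming numpy; the rate λ = 4 mirrors the hospital-admissions example and is otherwise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily admission counts with rate lambda = 4
lam = 4.0
counts = rng.poisson(lam, size=10_000)

print("sample mean:    ", counts.mean())  # close to 4
print("sample variance:", counts.var())   # also close to 4: the defining Poisson property
# Real data with variance far above the mean (overdispersion) would argue against a Poisson model.
```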

Compare: Binomial vs. Poisson—both count discrete events, but binomial requires a fixed number of trials while Poisson models events in continuous time/space. Poisson approximates binomial when n is large and p is small (λ = np). Use this approximation for rare disease incidence.
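
To see the approximation in action, here's a sketch assuming scipy; n = 10,000 and p = 0.0003 are illustrative rare-event numbers.

```python
from scipy import stats

# Rare-event setting: large n, small p
n, p = 10_000, 0.0003
lam = n * p                         # lambda = np = 3

binom = stats.binom(n, p)
pois = stats.poisson(lam)

for k in range(6):
    print(f"P(X = {k}):  binomial = {binom.pmf(k):.5f}   Poisson = {pois.pmf(k):.5f}")
# The two columns agree closely, which is why Poisson is used for rare disease incidence.
```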


Distributions for Time-to-Event and Waiting Times

These continuous distributions model how long until something happens—critical for survival analysis, reliability studies, and pharmacokinetics.

Exponential Distribution

  • Models time between events in a Poisson process, with rate parameter λ (or equivalently, mean 1/λ)
  • Memoryless property: P(T > s + t | T > s) = P(T > t)—the system doesn't "age," making future predictions independent of elapsed time (demonstrated below)
  • Survival analysis foundation: models time to death, equipment failure, or disease recurrence when hazard rate is constant
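
A minimal demonstration of the memoryless property, assuming scipy; the rate and the values of s and t are arbitrary. Note that scipy parameterizes the exponential by scale = 1/λ.

```python
from scipy import stats

lam = 0.5
T = stats.expon(scale=1 / lam)      # scipy uses scale = 1/lambda

s, t = 3.0, 2.0
# P(T > s + t | T > s) = P(T > s + t) / P(T > s)
conditional = T.sf(s + t) / T.sf(s)
unconditional = T.sf(t)

print("P(T > s + t | T > s) =", conditional)
print("P(T > t)             =", unconditional)   # identical: the process has no memory
```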

Gamma Distribution

  • Generalizes the exponential—models the waiting time for k events in a Poisson process (shape parameter k, scale parameter θ); see the sketch after this list
  • Flexible shape: can be right-skewed (small k) or nearly symmetric (large k); when k = 1, reduces to exponential
  • Applications: total hospital length of stay, aggregate waiting times, and as a prior distribution in Bayesian analysis
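
Here's a minimal simulation of that relationship, assuming numpy and scipy; k = 3 and rate λ = 2 are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, lam = 3, 2.0                     # waiting time for the 3rd event, rate 2 events per unit time

# Sum of k independent Exponential(lam) inter-event times ...
sums = rng.exponential(scale=1 / lam, size=(100_000, k)).sum(axis=1)

# ... follows a Gamma(shape = k, scale = 1/lam) distribution
gamma = stats.gamma(a=k, scale=1 / lam)
print("simulated mean:", sums.mean(), "  theoretical k/lam:", k / lam)
print("simulated 90th percentile:", np.quantile(sums, 0.9), "  theoretical:", gamma.ppf(0.9))
```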

Compare: Exponential vs. Gamma—exponential models time to one event; gamma models time to multiple events. If a problem asks about time until the third patient arrives, you need the gamma distribution with k = 3.


Distributions for Proportions and Hypothesis Testing

These distributions arise in specific statistical procedures—testing hypotheses, estimating proportions, and Bayesian inference.

Chi-Square Distribution

  • Sum of squared standard normal variables—arises naturally when estimating variance or testing categorical data
  • Degrees of freedom (df) determine shape; distribution is right-skewed but approaches normality as df increases
  • Primary uses: goodness-of-fit tests, tests of independence in contingency tables, and confidence intervals for variance
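
A minimal simulation of the "sum of squared standard normals" construction, assuming numpy and scipy; df = 5 is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
df = 5

# Sum of df squared standard normal draws ...
samples = (rng.standard_normal(size=(100_000, df)) ** 2).sum(axis=1)

# ... follows a chi-square distribution with df degrees of freedom
print("simulated mean:", samples.mean(), "  theory:", df)        # mean = df
print("simulated var: ", samples.var(), "  theory:", 2 * df)     # variance = 2*df
print("95th percentile, simulated vs. chi2:",
      np.quantile(samples, 0.95), stats.chi2.ppf(0.95, df))
```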

Beta Distribution

  • Defined on the interval [0, 1]—perfect for modeling probabilities, proportions, and rates
  • Shape parameters α and β control the distribution: symmetric when α = β, skewed otherwise; uniform when α = β = 1
  • Bayesian workhorse: serves as the conjugate prior for binomial data, making posterior calculations elegant
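
A minimal conjugate-update sketch, assuming scipy; the Beta(1, 1) prior and the data (18 responders out of 50 patients) are made-up illustrative numbers.

```python
from scipy import stats

# Beta(alpha, beta) prior on a response probability; Beta(1, 1) is the uniform prior
alpha_prior, beta_prior = 1, 1

# Hypothetical data: 18 responders out of 50 patients
successes, n = 18, 50

# Conjugacy: posterior = Beta(alpha + successes, beta + failures)
alpha_post = alpha_prior + successes
beta_post = beta_prior + (n - successes)
posterior = stats.beta(alpha_post, beta_post)

print("posterior mean:", posterior.mean())            # alpha_post / (alpha_post + beta_post)
print("95% credible interval:", posterior.interval(0.95))
```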

Compare: Chi-square vs. t-distribution—both depend on degrees of freedom and are used in hypothesis testing, but chi-square tests categorical relationships and variance while t-tests compare means. Know which test statistic follows which distribution.


Quick Reference Table

Concept                              | Best Examples
Continuous measurements (symmetric)  | Normal, t-distribution
Counting successes in fixed trials   | Bernoulli, Binomial
Rare events in time/space            | Poisson
Time until event occurs              | Exponential, Gamma
Modeling proportions on [0, 1]       | Beta, Uniform
Hypothesis testing (categorical)     | Chi-square
Small-sample inference               | t-distribution
Bayesian priors                      | Beta, Gamma

Self-Check Questions

  1. A researcher is counting the number of patients who respond to a new drug out of 50 treated. Which distribution models this outcome, and what parameters define it?

  2. Compare and contrast the Poisson and exponential distributions. How are they mathematically related, and when would you use each?

  3. Why does the t-distribution have heavier tails than the normal distribution? Under what conditions do they become equivalent?

  4. You're modeling the proportion of time a patient spends in remission (a value between 0 and 1). Which distribution is most appropriate, and why?

  5. An FRQ presents hospital emergency room data showing the mean number of arrivals per hour equals 4, but the variance equals 12. Should you use a Poisson model? Explain your reasoning using the distribution's key property.