Why This Matters
Probability distributions are the foundation of everything you'll do in biostatistics—from designing clinical trials to interpreting p-values to modeling disease outbreaks. When you're tested on this material, you're not just being asked to recall formulas. You're being evaluated on whether you understand when to apply each distribution, what assumptions it requires, and how distributions relate to each other. The exam will present scenarios and expect you to identify the appropriate distribution based on the data type and underlying process.
Think of distributions as tools in a toolkit: the normal distribution handles continuous measurements, the binomial counts successes in fixed trials, and the Poisson tracks rare events over time. Each distribution encodes specific assumptions about how data behave—independence, fixed trials, constant rates, symmetry. Don't just memorize parameters—know what real-world process each distribution models and when one distribution approximates another.
Continuous Distributions for Measurement Data
These distributions model variables that can take any value within a range. They're essential for analyzing measurements like blood pressure, weight, reaction times, and biomarker concentrations.
Normal (Gaussian) Distribution
- Bell-shaped and symmetric around the mean (μ)—the most important distribution in statistics because of its mathematical properties and real-world prevalence
- Empirical Rule (68-95-99.7): approximately 68% of data falls within ±1σ, 95% within ±2σ, and 99.7% within ±3σ of the mean
- Central Limit Theorem: the sampling distribution of the sample mean approaches normality as n increases, even when the underlying population isn't normal
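The 68-95-99.7 figures can be verified directly from the normal CDF. A quick sketch using Python's standard-library `statistics.NormalDist` (the standard normal here is just an illustration; the percentages hold for any μ and σ):

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal

def within(k: float) -> float:
    """Probability that a value falls within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(round(within(1), 4))  # 0.6827
print(round(within(2), 4))  # 0.9545
print(round(within(3), 4))  # 0.9973
```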
Student's t-Distribution
- Heavier tails than the normal distribution—accounts for extra uncertainty when estimating population parameters from small samples
- Degrees of freedom (df) control the shape; as df→∞, the t-distribution converges to the standard normal
- Primary use: hypothesis tests and confidence intervals for means when σ is unknown, especially in small samples (n<30)
Uniform Distribution
- All outcomes equally likely between minimum (a) and maximum (b) values—the "no information" distribution
- Mean is (a+b)/2 and variance is (b−a)²/12 for the continuous case
- Simulation workhorse: random number generators produce uniform variates that are transformed into other distributions
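The "simulation workhorse" bullet refers to inverse-transform sampling: a uniform variate U on (0, 1) is pushed through the inverse CDF of the target distribution. A minimal sketch, with an exponential target (the rate λ=2 and the sample size are arbitrary illustrative choices):

```python
import math
import random

random.seed(42)  # reproducible illustration

lam = 2.0   # assumed rate for the target exponential distribution
n = 100_000

# Inverse transform: if U ~ Uniform(0, 1), then -ln(1 - U)/lam ~ Exponential(lam)
samples = [-math.log(1 - random.random()) / lam for _ in range(n)]

print(round(sum(samples) / n, 3))  # close to the theoretical mean 1/lam = 0.5
```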
Compare: Normal vs. t-distribution—both are symmetric and bell-shaped, but the t-distribution has heavier tails to account for uncertainty in small samples. If an FRQ asks about confidence intervals with unknown population standard deviation, reach for the t-distribution.
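The heavier tails show up concretely in critical values: for a two-sided 95% interval the t multiplier starts well above 1.96 and shrinks toward it as df grows. A sketch assuming SciPy is available (the df values are arbitrary):

```python
from scipy.stats import norm, t  # assumes SciPy is installed

# Two-sided 95% critical values: t is wider for small df, converging to z
z_crit = norm.ppf(0.975)
for df in (5, 10, 30, 1000):
    print(df, round(t.ppf(0.975, df), 3))

print(round(z_crit, 3))  # 1.96
```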
Discrete Distributions for Counting Events
These distributions model outcomes you can count—number of successes, number of occurrences, binary outcomes. They're fundamental for categorical data analysis and clinical trial design.
Bernoulli Distribution
- Single trial with two outcomes: success (1) with probability p or failure (0) with probability 1−p
- Mean is p and variance is p(1−p)—maximum variance occurs when p=0.5
- Building block for the binomial distribution; understanding Bernoulli trials is essential for grasping more complex counting distributions
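The claim that variance p(1−p) peaks at p=0.5 is easy to check on a grid of p values (the grid itself is just an illustration):

```python
# Bernoulli variance p(1 - p) across a grid of p values; it peaks at p = 0.5
variances = {p / 10: round(p / 10 * (1 - p / 10), 2) for p in range(1, 10)}
print(variances)

best = max(variances, key=variances.get)
print(best)  # 0.5
```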
Binomial Distribution
- Counts successes in n independent Bernoulli trials—think drug response rates, disease prevalence in samples, or treatment outcomes
- Parameters: n (number of trials) and p (probability of success); mean is np, variance is np(1−p)
- Normal approximation works well when np≥10 and n(1−p)≥10, making large-sample calculations tractable
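These formulas can be exercised with nothing beyond `math.comb`. A sketch using hypothetical numbers (50 patients, 30% response rate — not from any real trial):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.3  # hypothetical trial: 50 patients, 30% response probability
print(n * p, n * p * (1 - p))  # mean 15.0, variance 10.5

# Sanity check: the pmf over all possible counts sums to 1
print(round(sum(binom_pmf(k, n, p) for k in range(n + 1)), 6))  # 1.0
```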
Poisson Distribution
- Models count of rare events in a fixed interval of time or space, given average rate λ
- Key property: mean and variance are both equal to λ—if observed variance greatly exceeds the mean, Poisson assumptions may be violated
- Applications: hospital admissions per day, mutations per genome region, adverse events in clinical trials
Compare: Binomial vs. Poisson—both count discrete events, but binomial requires a fixed number of trials while Poisson models events in continuous time/space. Poisson approximates binomial when n is large and p is small (λ=np). Use this approximation for rare disease incidence.
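The λ=np approximation above can be seen numerically. A sketch comparing the two pmfs in the rare-event regime (n=1000 and p=0.002 are illustrative choices):

```python
from math import comb, exp, factorial

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return exp(-lam) * lam**k / factorial(k)

# Rare-event regime: large n, small p
n, p = 1000, 0.002
lam = n * p  # 2.0

# The two pmfs agree to about three decimal places for small k
for k in range(5):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```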
Distributions for Time-to-Event and Waiting Times
These continuous distributions model how long until something happens—critical for survival analysis, reliability studies, and pharmacokinetics.
Exponential Distribution
- Models time between events in a Poisson process, with rate parameter λ (or equivalently, mean 1/λ)
- Memoryless property: P(T>s+t∣T>s)=P(T>t)—the system doesn't "age," making future predictions independent of elapsed time
- Survival analysis foundation: models time to death, equipment failure, or disease recurrence when hazard rate is constant
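The memoryless property follows from the exponential survival function P(T>t)=e^(−λt), and can be checked in a few lines (the rate and the times s, t are arbitrary):

```python
from math import exp, isclose

lam = 0.5  # assumed constant hazard rate

def survival(t: float) -> float:
    """P(T > t) for T ~ Exponential(lam)."""
    return exp(-lam * t)

s, t = 2.0, 3.0
# Memoryless: P(T > s + t | T > s) = P(T > s + t) / P(T > s) = P(T > t)
conditional = survival(s + t) / survival(s)
print(isclose(conditional, survival(t)))  # True
```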
Gamma Distribution
- Generalizes the exponential—models the waiting time for k events in a Poisson process (shape parameter k, scale parameter θ)
- Flexible shape: can be right-skewed (small k) or nearly symmetric (large k); when k=1, reduces to exponential
- Applications: total hospital length of stay, aggregate waiting times, and as a prior distribution in Bayesian analysis
Compare: Exponential vs. Gamma—exponential models time to one event; gamma models time to multiple events. If a problem asks about time until the third patient arrives, you need the gamma distribution with k=3.
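The "time until the third patient" reading can be simulated: summing k independent exponential waits gives a Gamma(k, 1/λ) variable, whose mean is k/λ. A hedged sketch (the rate λ=2 and sample count are arbitrary):

```python
import random

random.seed(0)  # reproducible illustration

lam, k, n = 2.0, 3, 100_000  # hypothetical arrival rate; k = 3 events

# Time until the k-th event = sum of k independent exponential waits,
# which follows a Gamma(shape=k, scale=1/lam) distribution
waits = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(n)]

mean = sum(waits) / n
print(round(mean, 2))  # close to the theoretical mean k/lam = 1.5
```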
Distributions for Proportions and Hypothesis Testing
These distributions arise in specific statistical procedures—testing hypotheses, estimating proportions, and Bayesian inference.
Chi-Square Distribution
- Sum of squared standard normal variables—arises naturally when estimating variance or testing categorical data
- Degrees of freedom (df) determine shape; distribution is right-skewed but approaches normality as df increases
- Primary uses: goodness-of-fit tests, tests of independence in contingency tables, and confidence intervals for variance
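The "sum of squared standard normals" definition can be checked by simulation: the resulting draws should have mean ≈ df and variance ≈ 2·df. A sketch with arbitrary df and sample size:

```python
import random

random.seed(1)  # reproducible illustration

df, n = 4, 50_000  # degrees of freedom and number of simulated draws

# A chi-square(df) variable is the sum of df squared standard normals
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n)]

mean = sum(draws) / n
var = sum((x - mean) ** 2 for x in draws) / n
print(round(mean, 1), round(var, 1))  # mean ≈ df, variance ≈ 2·df
```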
Beta Distribution
- Defined on the interval [0, 1]—perfect for modeling probabilities, proportions, and rates
- Shape parameters α and β control the distribution: symmetric when α=β, skewed otherwise; uniform when α=β=1
- Bayesian workhorse: serves as the conjugate prior for binomial data, making posterior calculations elegant
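Conjugacy means the posterior update is pure arithmetic: a Beta(α, β) prior plus s successes and f failures gives a Beta(α+s, β+f) posterior. A sketch with made-up numbers (the prior and the 12-of-50 outcome are hypothetical):

```python
# Beta-binomial conjugacy: prior Beta(a, b) + data (s successes, f failures)
# gives posterior Beta(a + s, b + f)
a, b = 2, 2      # assumed prior
s, f = 12, 38    # hypothetical trial: 12 responders out of 50 patients

post_a, post_b = a + s, b + f
post_mean = post_a / (post_a + post_b)
print(post_a, post_b, round(post_mean, 3))  # 14 40 0.259
```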
Compare: Chi-square vs. t-distribution—both depend on degrees of freedom and are used in hypothesis testing, but chi-square tests categorical relationships and variance while t-tests compare means. Know which test statistic follows which distribution.
Quick Reference Table
| Scenario | Distribution(s) |
|---|---|
| Continuous measurements (symmetric) | Normal, t-distribution |
| Counting successes in fixed trials | Bernoulli, Binomial |
| Rare events in time/space | Poisson |
| Time until event occurs | Exponential, Gamma |
| Modeling proportions [0, 1] | Beta, Uniform |
| Hypothesis testing (categorical) | Chi-square |
| Small-sample inference | t-distribution |
| Bayesian priors | Beta, Gamma |
Self-Check Questions
- A researcher is counting the number of patients who respond to a new drug out of 50 treated. Which distribution models this outcome, and what parameters define it?
- Compare and contrast the Poisson and exponential distributions. How are they mathematically related, and when would you use each?
- Why does the t-distribution have heavier tails than the normal distribution? Under what conditions do they become equivalent?
- You're modeling the proportion of time a patient spends in remission (a value between 0 and 1). Which distribution is most appropriate, and why?
- An FRQ presents hospital emergency room data showing the mean number of arrivals per hour equals 4, but the variance equals 12. Should you use a Poisson model? Explain your reasoning using the distribution's key property.