4.1 Hypergeometric Distribution

3 min read · June 25, 2024

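The hypergeometric distribution calculates probabilities for sampling without replacement from finite populations. It's used when the population size and the number of successes in the population are known, like drawing cards from a deck or selecting defective items from a batch.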

This distribution differs from others in its handling of independence and sampling. Unlike the binomial distribution, which assumes constant probability and independence, the hypergeometric distribution accounts for changing probabilities as items are removed from the population.

Hypergeometric Distribution

Hypergeometric distribution calculations

  • Calculates probabilities for scenarios involving sampling without replacement from a finite population
  • Probability mass function:
    • $P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}$
      • $N$: total population size (number of items in a batch)
      • $K$: number of successes in the population (defective items in the batch)
      • $n$: sample size (items selected for inspection)
      • $k$: number of successes in the sample (defective items found in the sample)
    • Binomial coefficient $\binom{n}{k}$ calculated as:
      • $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
        • $n!$ represents the factorial of $n$, the product of all positive integers ≤ $n$ (5! = 5 × 4 × 3 × 2 × 1)
  • Steps to calculate probabilities using hypergeometric distribution:
    1. Identify values for $N$, $K$, $n$, and $k$
    2. Substitute values into hypergeometric distribution formula
    3. Calculate binomial coefficients using formula or calculator
    4. Simplify fraction to obtain probability
  • The cumulative distribution function (CDF) can be used to calculate probabilities for ranges of values
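
The calculation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `hypergeom_pmf` and `hypergeom_cdf` are made up for this example, and the scenario (counting aces in a 5-card hand from a standard 52-card deck) is borrowed from the card examples in this guide.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k successes in a sample of n drawn without
    replacement from a population of N items containing K successes."""
    # Values of k outside 0..n are impossible.
    if k < 0 or k > n:
        return 0.0
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so other impossible combinations also get probability 0.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k, N, K, n):
    """P(X <= k): sum of the PMF over 0, 1, ..., k."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k + 1))

# Step 1: identify the values (52 cards, 4 aces, 5-card hand, 1 ace).
N, K, n, k = 52, 4, 5, 1

# Steps 2-4: substitute, evaluate the binomial coefficients, simplify.
p_one_ace = hypergeom_pmf(k, N, K, n)
p_at_most_one_ace = hypergeom_cdf(k, N, K, n)

print(round(p_one_ace, 4))          # 0.2995
print(round(p_at_most_one_ace, 4))  # 0.9583
```

The second value is a range probability (0 or 1 aces), which is the kind of question the cumulative distribution function answers.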

Applications of hypergeometric distribution

  • Appropriate when:
    • Sampling without replacement from finite population (cards from deck, defective items from batch)
    • Population size known (52 cards in standard deck)
    • Number of successes in population known (4 aces in deck)
    • Sample size fixed (5-card hand)
  • Examples of hypergeometric distribution applications:
    • Drawing cards from deck without replacing them
    • Selecting defective items from batch without replacing them
    • Choosing fixed number of individuals with specific characteristic from population (10 left-handed people from group of 50)
  • Used to model discrete random variables in scenarios with finite populations
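
One way to check that a scenario fits the hypergeometric model is to simulate it. The sketch below deals many 5-card hands without replacement and compares the observed share of hands with each ace count against the formula. The helper name `simulate_ace_counts`, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random
from math import comb

def simulate_ace_counts(trials, seed=0):
    """Estimate P(X = k) for k = 0..4 aces in a 5-card hand by simulation."""
    rng = random.Random(seed)
    deck = [1] * 4 + [0] * 48       # 1 marks an ace, 0 any other card
    counts = [0] * 5                # a hand holds between 0 and 4 aces
    for _ in range(trials):
        hand = rng.sample(deck, 5)  # sample() draws without replacement
        counts[sum(hand)] += 1
    return [c / trials for c in counts]

# Exact probabilities from the hypergeometric formula (N=52, K=4, n=5).
exact = [comb(4, k) * comb(48, 5 - k) / comb(52, 5) for k in range(5)]
estimated = simulate_ace_counts(trials=20000)

for k in range(5):
    print(k, round(exact[k], 4), round(estimated[k], 4))
```

With enough trials the simulated frequencies should sit close to the exact values, although they will not match exactly.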

Hypergeometric vs other distributions

  • Hypergeometric distribution differs from other probability distributions in independence and sampling
    • Independence:
      • Hypergeometric distribution: probability of success changes with each trial due to sampling without replacement
      • Other distributions (binomial): probability of success constant for each trial due to sampling with replacement or assumed independence
    • Sampling:
      • Hypergeometric distribution: sampling without replacement from finite population
      • Other distributions (binomial): assume sampling with replacement or infinite population
  • Comparison with binomial distribution:
    • Binomial distribution assumes constant probability of success for each trial and independence between trials (coin flips)
    • Hypergeometric distribution used when population finite and sampling without replacement, leading to dependent trials (cards drawn from deck)
  • Variance of hypergeometric distribution includes the finite population correction factor $\frac{N-n}{N-1}$, which shrinks the binomial variance to account for the changing probability of success as items are removed from the population
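
The comparison with the binomial distribution can be made concrete with a small numerical sketch. It assumes the binomial success probability is set to $p = K/N$, and it measures the largest difference between the two probability mass functions for a small population and for a much larger one with the same success proportion. The function names and population sizes are illustrative choices.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when sampling n items without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with constant success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_pmf_gap(N, K, n):
    """Largest absolute difference between the two PMFs over k = 0..n."""
    p = K / N
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
        for k in range(n + 1)
    )

# Same success proportion (40%) and sample size, different population sizes.
small_population_gap = max_pmf_gap(N=20, K=8, n=10)
large_population_gap = max_pmf_gap(N=20000, K=8000, n=10)

print(small_population_gap)
print(large_population_gap)
```

When the sample is half of the population, removing items changes the odds noticeably and the two models disagree. When the sample is a tiny fraction of the population, the draws are nearly independent and the binomial becomes a close approximation.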

Additional Concepts

  • Variance: Measures the spread of the hypergeometric distribution
  • Conditional probability: The hypergeometric distribution can be used to calculate probabilities of events given that certain conditions have been met
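
The spread of the distribution can be illustrated with the standard closed-form results, mean $n\frac{K}{N}$ and variance $n\frac{K}{N}\left(1-\frac{K}{N}\right)\frac{N-n}{N-1}$. These formulas are not given in this guide, so the sketch below checks them against a direct summation over the probability mass function. The function names and the card example are illustrative assumptions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when sampling n items without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_mean(N, K, n):
    """Closed-form expected number of successes in the sample."""
    return n * K / N

def hypergeom_variance(N, K, n):
    """Closed-form variance: binomial variance times (N - n) / (N - 1)."""
    p = K / N
    return n * p * (1 - p) * (N - n) / (N - 1)

# Aces in a 5-card hand from a 52-card deck.
N, K, n = 52, 4, 5
pmf = [hypergeom_pmf(k, N, K, n) for k in range(n + 1)]

# Mean and variance computed directly from their definitions.
mean_from_pmf = sum(k * p for k, p in enumerate(pmf))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * p for k, p in enumerate(pmf))

print(hypergeom_mean(N, K, n), mean_from_pmf)
print(hypergeom_variance(N, K, n), var_from_pmf)
```

The two routes should agree up to floating-point rounding.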

Key Terms to Review (22)

Binomial Coefficient: The binomial coefficient is a mathematical concept that represents the number of ways to choose a certain number of items from a set, without regard to order. It is a fundamental concept in probability theory and combinatorics, with applications in various fields, including statistics, computer science, and finance.
Conditional Probability: Conditional probability is the likelihood of an event occurring given that another event has already occurred. It represents the probability of one event happening, given the knowledge of another event happening.
Conditional probability of A given B: Conditional probability of A given B, denoted as $P(A|B)$, is the probability that event A occurs given that event B has already occurred. It quantifies the relationship between two events in a probabilistic context.
Cumulative Distribution Function: The cumulative distribution function (CDF) is a fundamental concept in probability and statistics that describes the probability of a random variable taking a value less than or equal to a given value. It provides a comprehensive way to represent the distribution of a random variable and is closely related to other important statistical concepts such as probability density functions and probability mass functions.
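Cumulative distribution function (CDF): A cumulative distribution function (CDF) gives the probability that a random variable takes on a value less than or equal to a specific value. For a continuous random variable it is the integral of the probability density function (PDF); for a discrete random variable, such as a hypergeometric count, it is the sum of the probability mass function (PMF) over all values up to that point.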
Discrete Random Variable: A discrete random variable is a random variable that can take on a countable number of distinct values. It is a variable that represents the outcome of a random experiment or process, where the possible values are separated and distinct, rather than a continuous range of values.
Estimate of the error variance: Estimate of the error variance is a measure of the variability in the observed values that cannot be explained by the regression model. It is often denoted as $\hat{\sigma}^2$ and calculated as the sum of squared residuals divided by the degrees of freedom.
Excel: Excel is a powerful spreadsheet software application that allows users to organize, analyze, and visualize data through a variety of tools and functions. It is widely used in business, finance, and academic settings for tasks such as data manipulation, statistical analysis, and creating reports and presentations.
Expected value: Expected value is the weighted average of all possible values that a random variable can take on, with weights being their respective probabilities. It provides a measure of the center of the distribution of the variable.
Expected Value: Expected value is a statistical concept that represents the average or central tendency of a probability distribution. It is the weighted average of all possible outcomes, where the weights are the probabilities of each outcome occurring. The expected value provides a measure of the central tendency and is a useful tool for decision-making and analysis in various contexts, including the topics of 3.1 Terminology, 4.1 Hypergeometric Distribution, 4.2 Binomial Distribution, 5.1 Properties of Continuous Probability Density Functions, 5.2 The Uniform Distribution, and 6.3 Estimating the Binomial with the Normal Distribution.
Factorial: The factorial of a non-negative integer n, denoted as n!, is the product of all positive integers less than or equal to n. It is a fundamental concept in probability and combinatorics that is particularly relevant in the context of the Hypergeometric and Poisson distributions.
Fisher's Exact Test: Fisher's Exact Test is a statistical significance test used to determine if there is a significant relationship between two categorical variables in a 2x2 contingency table. It is particularly useful when sample sizes are small and the assumptions of the chi-square test are not met.
Hypergeometric Distribution: The hypergeometric distribution is a discrete probability distribution that describes the probability of a certain number of successes in a fixed number of draws, without replacement, from a finite population. It is used to model situations where a sample is drawn from a population without replacement, and the interest lies in the number of items with a particular characteristic in the sample.
Hypergeometric Distribution Formula: The hypergeometric distribution formula is a probability distribution used to calculate the likelihood of obtaining a certain number of successes in a fixed number of trials without replacement from a finite population. It is particularly useful in situations where the population size is relatively small, and the probability of success in each trial is not constant.
Number of Successes: The number of successes, or favorable outcomes, that occur in a given experiment or trial. This term is particularly relevant in the context of the Hypergeometric Distribution, which models the probability of obtaining a certain number of successes in a fixed number of trials without replacement.
Population Size: Population size refers to the total number of individuals or units that make up a given population. It is a fundamental concept in statistics and is particularly relevant in the context of statistical distributions and sampling methods.
Probability Mass Function: The probability mass function (PMF) is a fundamental concept in probability theory that describes the probability distribution of a discrete random variable. It assigns a probability to each possible value that the random variable can take, providing a complete description of the likelihood of different outcomes occurring.
R: R is a statistical programming language and software environment used for data analysis, visualization, and statistical computing. It is widely used in various fields, including business, academia, and research, due to its powerful capabilities and versatility.
Sample Size: Sample size refers to the number of observations or data points collected in a statistical study or experiment. It is a crucial factor that determines the reliability and precision of the conclusions drawn from the data.
Sampling without replacement: Sampling without replacement is a method of sample selection where each selected unit is not returned to the population before the next draw. This ensures that no unit can be chosen more than once.
Sampling Without Replacement: Sampling without replacement is a statistical technique where items or individuals are selected from a finite population, and once an item is selected, it is not returned to the population before the next selection. This method prevents the same item from being selected more than once within a single sample, so the composition of the remaining population changes after each draw.
Variance: Variance is a measure of the spread or dispersion of a dataset, indicating how far each data point deviates from the mean or average value. It is a fundamental statistical concept that quantifies the variability within a distribution and plays a crucial role in various statistical analyses and probability distributions.
© 2024 Fiveable Inc. All rights reserved.