Joint probability distributions are a fundamental concept in stochastic processes, describing how multiple random variables interact. They allow us to model complex systems with multiple uncertain components, providing a framework for analyzing their behavior and making predictions.

These distributions come in discrete and continuous forms, each with unique properties and calculation methods. Understanding marginal and conditional distributions derived from joint distributions is crucial for extracting specific information and updating probabilities based on observed data.

Joint probability distribution definition

  • Joint probability distributions describe the probabilistic relationship between two or more random variables, capturing how likely different combinations of values are to occur simultaneously
  • Allow modeling and analyzing systems or experiments involving multiple uncertain quantities, which is foundational in stochastic processes and many real-world applications

Discrete vs continuous

  • Discrete joint distributions are used when the random variables can only take on a countable number of distinct values (integers, specific categories)
  • Continuous joint distributions apply when the variables have an uncountably infinite range of possible values (real numbers on an interval or the whole real line)
  • The type of joint distribution affects how probabilities are calculated and represented mathematically (sums for discrete, integrals for continuous)

Marginal vs conditional distributions

  • Marginal distributions consider only one variable at a time, ignoring information about the others
    • Obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variables
    • Represents the individual behavior of each component variable
  • Conditional distributions fix the values of some variables and look at the probabilities for the remaining ones
    • Calculated by dividing the joint probability by the marginal of the fixed variables (like Bayes' rule)
    • Shows how the distribution of certain variables changes based on knowledge of others
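
To make these sums and divisions concrete, here is a minimal Python/NumPy sketch using a small, made-up joint PMF table for two discrete variables; the probabilities are illustrative only.

```python
import numpy as np

# Hypothetical joint PMF of discrete variables X (rows, values 0-1) and Y (columns, values 0-2)
joint = np.array([
    [0.10, 0.20, 0.10],   # P(X=0, Y=0), P(X=0, Y=1), P(X=0, Y=2)
    [0.05, 0.25, 0.30],   # P(X=1, Y=0), P(X=1, Y=1), P(X=1, Y=2)
])
assert np.isclose(joint.sum(), 1.0)   # a valid PMF sums to 1

# Marginals: sum the joint table over the other variable
p_x = joint.sum(axis=1)               # P(X=x)  -> [0.4, 0.6]
p_y = joint.sum(axis=0)               # P(Y=y)  -> [0.15, 0.45, 0.40]

# Conditional of Y given X = 1: joint row divided by the marginal of the fixed variable
p_y_given_x1 = joint[1, :] / p_x[1]   # -> [0.0833..., 0.4167..., 0.5]

print(p_x, p_y, p_y_given_x1)
```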

Joint probability mass functions

  • A joint probability mass function (PMF) gives the probability of each possible combination of values for discrete random variables
  • The PMF is a function $p(x_1, x_2, \ldots, x_n)$ that maps the possible values of the variables to probabilities between 0 and 1
  • The probabilities for all possible outcomes must sum to 1, a key property of valid PMFs

Discrete random variables

  • PMFs are defined over a countable sample space, the set of all possible combinations of values the discrete random variables can take
  • Common discrete distributions used in multivariate settings include multinomial, Poisson, geometric, and more
  • Many concepts from univariate discrete distributions extend intuitively to the multivariate case (expected values, variance, generating functions)

Multivariate distributions

  • A multivariate distribution is a joint distribution over more than one variable, discrete or continuous
  • Multivariate PMFs can be represented by tables or matrices enumerating the probability of each possible combination of values
  • Sums and other operations on the PMF can be used to derive useful quantities and distributions (marginals, conditionals, moments)

Calculating probabilities

  • Probabilities of events are calculated by summing the PMF values for all outcomes contained in the event
  • For an event $A$ defined by conditions on the variables: $P(A) = \sum_{(x_1,\ldots,x_n) \in A} p(x_1,\ldots,x_n)$
  • The inclusion-exclusion principle and other counting techniques are often helpful in determining which outcomes satisfy the conditions defining an event of interest
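
As a quick illustration of summing the PMF over an event, the sketch below reuses the same kind of made-up joint PMF table and computes $P(X + Y \geq 2)$; the table values and the event are arbitrary choices.

```python
import numpy as np

# Hypothetical joint PMF over X in {0, 1} (rows) and Y in {0, 1, 2} (columns)
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

# Event A: X + Y >= 2.  Sum the PMF over every outcome that satisfies the condition.
prob_A = sum(
    joint[x, y]
    for x in range(joint.shape[0])
    for y in range(joint.shape[1])
    if x + y >= 2
)

print(prob_A)   # 0.10 + 0.25 + 0.30 = 0.65
```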

Joint probability density functions

  • A joint probability density function (PDF) is used to specify a continuous multivariate distribution
  • Gives the relative likelihood of different combinations of values, but its values are not directly interpretable as probabilities
  • Probabilities are found by integrating the PDF over a region of interest, not just evaluating it at a point

Continuous random variables

  • Joint PDFs apply to continuous random variables that can take any value in a specified range
  • Common continuous multivariate distributions include multivariate normal, exponential, beta, gamma, and more
  • Densities allow working with continuous quantities (measurements, times, etc.) without discretization

Multivariate density functions

  • A multivariate PDF is a function $f(x_1,\ldots,x_n)$ that gives the joint density of continuous random variables $X_1,\ldots,X_n$
  • Must be non-negative everywhere, and integrate to 1 over the entire domain
  • Can be used to find marginal and conditional PDFs through integration and division similar to the discrete case

Probability calculations with integrals

  • For an event $A$ defined by conditions on the continuous random variables, the probability is given by an integral: $P(A) = \int_{A} f(x_1,\ldots,x_n)\, dx_1 \cdots dx_n$
  • Multiple integrals are often required, taken over the region of the sample space corresponding to event $A$
  • Computational tools and clever manipulations are often needed to evaluate the integrals for complex regions
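
As one small example of such an integral, the sketch below uses SciPy's `dblquad` to compute $P(X + Y \leq 1)$ for two independent Exponential(1) variables; the density and the region are chosen only for illustration.

```python
import numpy as np
from scipy import integrate

# Illustrative joint PDF of two independent Exponential(1) variables (x, y >= 0)
def f(y, x):                # dblquad integrates func(y, x): the inner variable comes first
    return np.exp(-x - y)

# P(X + Y <= 1): outer integral over x in [0, 1], inner over y in [0, 1 - x]
prob, err = integrate.dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)

print(prob)                 # ~0.2642
print(1 - 2 * np.exp(-1))   # closed-form answer for this particular density, for comparison
```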

Joint cumulative distribution functions

  • The joint cumulative distribution function (CDF) of random variables $X_1,\ldots,X_n$ is defined as $F(x_1,\ldots,x_n) = P(X_1 \leq x_1,\ldots, X_n \leq x_n)$
  • Gives the probability that each variable is less than or equal to a specified value simultaneously
  • Applies to both discrete and continuous distributions, unifying the PMF and PDF perspectives

CDF definition for joint distributions

  • For discrete variables, the joint CDF can be expressed as a sum: $F(x_1,\ldots,x_n) = \sum_{y_1 \leq x_1} \cdots \sum_{y_n \leq x_n} p(y_1,\ldots,y_n)$
  • In the continuous case, the CDF is an integral: $F(x_1,\ldots,x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(y_1,\ldots,y_n)\, dy_n \cdots dy_1$
  • The CDF is the fundamental way to specify any multivariate distribution, from which other representations can be derived

Properties of joint CDFs

  • Joint CDFs are monotonically non-decreasing in each argument: if $x_i \leq y_i$ for all $i$, then $F(x_1,\ldots,x_n) \leq F(y_1,\ldots,y_n)$
  • Marginal CDFs can be found by taking limits as the other arguments go to infinity: $\lim_{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n \to \infty} F(x_1,\ldots,x_n) = F_i(x_i)$
  • The joint CDF converges to 1 as all arguments go to $\infty$, and to 0 if any argument goes to $-\infty$

Relationship to probability

  • The joint CDF evaluated at particular values gives the probability of the random variables falling in the rectangular region bounded above by those values
  • $P(a_1 < X_1 \leq b_1, \ldots, a_n < X_n \leq b_n) = F(b_1,\ldots,b_n) - F(b_1,\ldots,b_{n-1},a_n) - \cdots - F(a_1,b_2,\ldots,b_n) + \cdots + (-1)^n F(a_1,\ldots,a_n)$
  • Intuitively, the probability is found by including and excluding the relevant corners of the rectangular region
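
A minimal two-variable sketch of this corner formula, assuming for illustration that the joint CDF factors as the product of two standard normal CDFs:

```python
from scipy.stats import norm

# Illustrative joint CDF: two independent standard normals, F(x, y) = Phi(x) * Phi(y)
def F(x, y):
    return norm.cdf(x) * norm.cdf(y)

# P(a1 < X <= b1, a2 < Y <= b2) from the four corners of the rectangle
a1, b1, a2, b2 = -1.0, 1.0, 0.0, 2.0
prob = F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

# Because the variables are independent here, the answer also factors, giving a sanity check
check = (norm.cdf(b1) - norm.cdf(a1)) * (norm.cdf(b2) - norm.cdf(a2))
print(prob, check)   # both ~0.326
```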

Independent vs dependent variables

  • Independence and dependence describe the relationship between random variables in a joint distribution
  • Determine whether knowing the value of one variable provides any information about the likely values of the others
  • Have significant implications for inference, sampling, and many applications of joint distributions

Definition of independence

  • Random variables $X_1,\ldots,X_n$ are independent if their joint PMF or PDF factors as a product of marginals: $p(x_1,\ldots,x_n) = p_1(x_1) \cdots p_n(x_n)$ or $f(x_1,\ldots,x_n) = f_1(x_1) \cdots f_n(x_n)$
  • Intuitively, the variables are independent if knowing the values of some of them provides no information about the probabilities of the others
  • Independent variables can be treated separately, simplifying analysis and allowing results from univariate distributions to be applied more easily

Factoring joint distributions

  • For independent variables, the joint PMF, PDF, or CDF can be written as a product of the marginal distributions for each variable
  • This factorization greatly simplifies working with the joint distribution, as the individual variables can be considered in isolation
  • Many results for sums and transformations of independent random variables rely on this product structure
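
One way to see the factorization in practice: the sketch below compares a joint PMF built as an outer product of marginals with a made-up table that has the same marginals but does not factor.

```python
import numpy as np

# Independent by construction: the joint PMF is the outer product of its marginals
p_x = np.array([0.4, 0.6])
p_y = np.array([0.15, 0.45, 0.40])
joint_indep = np.outer(p_x, p_y)

# Hypothetical dependent PMF with the same marginals but no product structure
joint_dep = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])

def is_independent(joint, tol=1e-12):
    """Check whether a joint PMF equals the outer product of its marginals."""
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    return np.allclose(joint, np.outer(px, py), atol=tol)

print(is_independent(joint_indep))   # True
print(is_independent(joint_dep))     # False
```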

Conditional distributions for dependence

  • If random variables are not independent, their conditional distributions provide a way to describe the dependence between them
  • The conditional PMF or PDF of $X_1,\ldots,X_k$ given $X_{k+1},\ldots,X_n$ is defined as $p(x_1,\ldots,x_k \mid x_{k+1},\ldots,x_n) = \frac{p(x_1,\ldots,x_n)}{p(x_{k+1},\ldots,x_n)}$ or $f(x_1,\ldots,x_k \mid x_{k+1},\ldots,x_n) = \frac{f(x_1,\ldots,x_n)}{f(x_{k+1},\ldots,x_n)}$
  • Conditional distributions allow updating probabilities based on observed values, a key idea in Bayesian inference and many applications

Covariance and correlation

  • Covariance and correlation are two measures of the linear dependence between random variables
  • Provide a way to quantify the strength and direction of any linear relationship
  • Are important summary statistics for multivariate data and appear in many formulas related to joint distributions

Measures of dependence

  • The covariance between random variables $X$ and $Y$ is defined as $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
    • Measures the joint variability of the variables around their means
    • Is positive when larger values of one variable tend to occur with larger values of the other, and negative when larger values of one tend to occur with smaller values of the other
  • The correlation between $X$ and $Y$ is defined as $\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$
    • Normalizes the covariance to be between -1 and 1, allowing comparison across different scales
    • Measures the linear relationship: $\rho = \pm 1$ implies a perfect linear relationship, while $\rho = 0$ implies no linear relationship (but a nonlinear relationship may exist)
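
The covariance and correlation definitions above can be applied directly to a discrete joint PMF; the sketch below does so for the same made-up table used earlier.

```python
import numpy as np

# Hypothetical joint PMF over X in {0, 1} (rows) and Y in {0, 1, 2} (columns)
joint = np.array([
    [0.10, 0.20, 0.10],
    [0.05, 0.25, 0.30],
])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

E_x = np.sum(x_vals * p_x)                         # 0.6
E_y = np.sum(y_vals * p_y)                         # 1.25
E_xy = np.sum(np.outer(x_vals, y_vals) * joint)    # 0.85

cov = E_xy - E_x * E_y                             # 0.10
var_x = np.sum(x_vals**2 * p_x) - E_x**2
var_y = np.sum(y_vals**2 * p_y) - E_y**2
rho = cov / np.sqrt(var_x * var_y)                 # ~0.29

print(cov, rho)
```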

Covariance matrix

  • The covariance matrix $\Sigma$ of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is an $n \times n$ matrix whose $(i,j)$ entry is $\text{Cov}(X_i,X_j)$
  • Summarizes all pairwise covariances between the components of the random vector
  • Is symmetric and positive semi-definite, with diagonal entries equal to the variances of each component
  • Appears in multivariate versions of Chebyshev's inequality, the weak law of large numbers, and the central limit theorem

Correlation coefficient

  • The correlation coefficient matrix $\mathbf{R}$ has $(i,j)$ entry equal to the correlation $\rho(X_i,X_j)$
  • Is the covariance matrix of the standardized variables $(X_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $X_i$
  • Has diagonal entries of 1 and off-diagonal entries between -1 and 1
  • Is often easier to interpret than the covariance matrix due to the normalized scale
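
In practice both matrices are usually estimated from data; the sketch below draws samples from an illustrative bivariate normal and compares NumPy's `np.cov` and `np.corrcoef` with the standardization of $\Sigma$ described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bivariate normal with a known covariance matrix
mean = [0.0, 2.0]
cov_true = [[1.0, 0.8],
            [0.8, 2.0]]
samples = rng.multivariate_normal(mean, cov_true, size=100_000)   # shape (N, 2)

Sigma = np.cov(samples, rowvar=False)        # sample covariance matrix
R = np.corrcoef(samples, rowvar=False)       # correlation matrix (diagonal = 1)

# R can also be recovered by standardizing Sigma
d = np.sqrt(np.diag(Sigma))
R_from_Sigma = Sigma / np.outer(d, d)

print(np.round(Sigma, 3))
print(np.round(R, 3))
print(np.allclose(R, R_from_Sigma))          # True
```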

Transformations of random vectors

  • Transformations of random vectors are used to create new random variables or vectors from existing ones
  • Often used to simplify calculations, standardize variables, or obtain distributions with desirable properties
  • The distribution of the transformed variables can be found using the joint distribution of the original variables

Linear transformations

  • A linear transformation of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is a new vector $\mathbf{Y} = (Y_1,\ldots,Y_m)$ defined by $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ for an $m \times n$ matrix $\mathbf{A}$ and $m \times 1$ vector $\mathbf{b}$
  • The mean vector and covariance matrix of $\mathbf{Y}$ are given by $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] + \mathbf{b}$ and $\text{Cov}(\mathbf{Y}) = \mathbf{A}\text{Cov}(\mathbf{X})\mathbf{A}^T$
  • Many important results in statistics and signal processing involve linear transformations of random vectors (principal component analysis, filtering, etc.)
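
A quick Monte Carlo check of the mean and covariance formulas, with the matrix $\mathbf{A}$, vector $\mathbf{b}$, and input distribution chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative random vector X with known mean and covariance
mu_X = np.array([1.0, -1.0])
Sigma_X = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
X = rng.multivariate_normal(mu_X, Sigma_X, size=200_000)   # shape (N, 2)

# Linear transformation Y = A X + b, applied row by row
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([5.0, -2.0])
Y = X @ A.T + b

# Theory: E[Y] = A mu_X + b and Cov(Y) = A Sigma_X A^T
print(np.round(Y.mean(axis=0), 2), A @ mu_X + b)           # ~[4, -5] vs [4, -5]
print(np.round(np.cov(Y, rowvar=False), 2))                # ~[[8, 7.5], [7.5, 9]]
print(A @ Sigma_X @ A.T)
```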

Jacobian matrix

  • For a general (nonlinear) transformation $\mathbf{Y} = g(\mathbf{X})$, the joint PDF of $\mathbf{Y}$ is related to that of $\mathbf{X}$ by $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y}))\, |\det(J_{g^{-1}}(\mathbf{y}))|$
  • $J_{g^{-1}}(\mathbf{y})$ is the Jacobian matrix of the inverse transformation $\mathbf{X} = g^{-1}(\mathbf{Y})$, with $(i,j)$ entry equal to $\frac{\partial x_i}{\partial y_j}$
  • The Jacobian matrix accounts for how the transformation stretches or compresses regions of the sample space, affecting the probability density

Distribution of transformed variables

  • The joint CDF of $\mathbf{Y} = g(\mathbf{X})$ is given by $F_{\mathbf{Y}}(\mathbf{y}) = P(g(\mathbf{X}) \leq \mathbf{y}) = \int_{g(\mathbf{x}) \leq \mathbf{y}} f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}$
    • The region of integration is the set of $\mathbf{x}$ values that map into the rectangle $(-\infty,y_1] \times \cdots \times (-\infty,y_m]$ under $g$
  • For invertible linear transformations of continuous random vectors, the joint PDF can be found using the Jacobian formula with $J_{g^{-1}}(\mathbf{y}) = \mathbf{A}^{-1}$
  • In the discrete case, the PMF of $\mathbf{Y}$ is given by $p_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}: g(\mathbf{x}) = \mathbf{y}} p_{\mathbf{X}}(\mathbf{x})$
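
The sketch below checks the Jacobian formula for an invertible linear map of a standard bivariate normal, where the transformed density is also available in closed form; the matrix and shift are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

# X is a standard bivariate normal; Y = A X + b is an invertible linear transformation
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
b = np.array([1.0, -1.0])
A_inv = np.linalg.inv(A)

f_X = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).pdf

def f_Y(y):
    """Density of Y via the change-of-variables (Jacobian) formula."""
    x = A_inv @ (np.asarray(y) - b)
    return f_X(x) * abs(np.linalg.det(A_inv))

# Independent check: Y is itself normal with mean b and covariance A A^T
f_Y_direct = multivariate_normal(mean=b, cov=A @ A.T).pdf

y_test = np.array([0.5, 0.3])
print(f_Y(y_test), f_Y_direct(y_test))   # the two values agree
```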

Sums of random variables

  • Sums of random variables arise in many applications, such as repeated measurements, cumulative effects, or aggregations
  • The distribution of a sum depends on the joint distribution of the individual variables being added together
  • Convolutions provide a general way to find the distribution of sums in both the discrete and continuous cases

Convolution for discrete variables

  • For independent discrete random variables $X$ and $Y$ with PMFs $p_X$ and $p_Y$, the PMF of their sum $Z = X + Y$ is given by the convolution sum: $p_Z(z) = \sum_k p_X(k)\, p_Y(z-k)$
    • The convolution evaluates the probability of all ways to achieve a sum of $z$ by adding values of $X$ and $Y$
  • The convolution sum extends to more than two variables: $p_{X_1 + \cdots + X_n}(z) = \sum_{k_1 + \cdots + k_n = z} p_{X_1}(k_1) \cdots p_{X_n}(k_n)$
  • Convolution sums can be efficiently computed using generating functions or Fourier transforms
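
As a small worked case, the sketch below uses NumPy's `np.convolve` to get the PMF of the sum of two fair six-sided dice from the convolution sum.

```python
import numpy as np

# PMF of one fair die on values 1..6 (index k holds P(value = k), index 0 unused)
die = np.zeros(7)
die[1:] = 1 / 6

# PMF of the sum of two independent dice via the convolution sum
two_dice = np.convolve(die, die)           # supported on 2..12 (length-13 array)

for total, p in enumerate(two_dice):
    if p > 0:
        print(total, round(p, 4))          # e.g. P(sum = 7) = 6/36 ≈ 0.1667
```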

Convolution integral for continuous variables

  • For independent continuous random variables $X$ and $Y$ with PDFs $f_X$ and $f_Y$, the PDF of their sum $Z = X + Y$ is given by the convolution integral: $f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx$
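
A numeric sketch of the convolution integral, using two independent Exponential(1) densities whose sum is known to be Gamma(2, 1); the distributions are chosen only so the result can be checked against a closed form.

```python
import numpy as np
from scipy import integrate, stats

# PDFs of two independent Exponential(1) variables (zero for negative arguments)
f_X = lambda x: np.exp(-x) * (x >= 0)
f_Y = lambda y: np.exp(-y) * (y >= 0)

def f_Z(z):
    """PDF of Z = X + Y from the convolution integral."""
    val, _ = integrate.quad(lambda x: f_X(x) * f_Y(z - x), 0, max(z, 0.0))
    return val

# The sum of two independent Exp(1) variables is Gamma(shape=2, scale=1): f(z) = z * exp(-z)
for z in [0.5, 1.0, 2.5]:
    print(z, f_Z(z), stats.gamma.pdf(z, a=2))   # the two columns agree
```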

Key Terms to Review (16)

Bivariate Normal Distribution: The bivariate normal distribution is a probability distribution that describes the joint behavior of two continuous random variables, each following a normal distribution. This distribution is characterized by its mean vector and covariance matrix, capturing the relationship between the two variables. It is essential in understanding how changes in one variable can impact another and helps in multivariate statistical analysis.
Change of variables: Change of variables is a mathematical technique used to transform random variables in probability distributions, making it easier to work with joint and marginal distributions. This technique allows us to express probabilities in terms of new variables that may be more convenient, often simplifying the integration and analysis involved in calculating probabilities.
Conditional distribution: Conditional distribution describes the probability distribution of a random variable given that another random variable takes on a specific value. It allows us to understand how one variable behaves in relation to another, highlighting the dependencies between them. This concept is essential for analyzing joint behaviors and can be applied to both discrete and continuous variables, as well as in the context of marginal distributions, where it helps reveal how distributions change under specific conditions.
Contour Plots: Contour plots are graphical representations of three-dimensional data in two dimensions, where contour lines connect points of equal value on a plane. They are useful for visualizing joint probability distributions, as they allow us to see how probabilities are distributed across different combinations of two variables, highlighting areas of higher and lower likelihoods.
Correlation: Correlation is a statistical measure that describes the degree to which two variables move in relation to each other. A strong correlation indicates that when one variable changes, the other variable tends to change as well, either positively or negatively. Understanding correlation is crucial in analyzing relationships between random variables and interpreting how joint distributions behave, especially in continuous contexts and when looking at marginal and conditional distributions.
Covariance: Covariance is a statistical measure that indicates the extent to which two random variables change together. It helps in understanding the relationship between variables, such as whether they tend to increase or decrease simultaneously. This concept is crucial for assessing how variables interact, and it plays a significant role in analyzing joint distributions, continuous distributions, and in determining the characteristics of stochastic processes like Brownian motion.
f(x, y): In the context of joint probability distributions, f(x, y) represents the joint probability density function (PDF) for two random variables, X and Y. This function describes the likelihood of simultaneous outcomes for both variables, illustrating how they interact with each other. It provides a way to visualize and analyze the relationship between the two variables by depicting their combined probabilities across their possible values.
Independence: Independence refers to the statistical property where two random variables do not influence each other's outcomes. When two variables are independent, the occurrence of one does not affect the probability of the other occurring. This concept is crucial in understanding how random variables interact and is foundational in determining joint and conditional probabilities.
Joint probability density function: A joint probability density function (PDF) is a mathematical function that describes the likelihood of two or more continuous random variables occurring simultaneously. It provides a way to model the relationship between these variables and their associated probabilities, allowing us to compute the probability of specific outcomes within a given range for each variable. Understanding joint PDFs is crucial for analyzing the behavior of multiple random variables and their interdependencies.
Joint Probability Mass Function: A joint probability mass function (PMF) is a function that gives the probability of each possible combination of outcomes for two or more discrete random variables. This function captures the relationship between these random variables, allowing for the calculation of probabilities associated with their joint occurrences. Understanding the joint PMF is crucial for analyzing how multiple random variables interact, as it provides insights into their dependencies and correlations.
Jointly distributed random variables: Jointly distributed random variables refer to a set of two or more random variables that have a defined probability distribution together, capturing the relationship between them. This distribution allows us to understand how these variables interact, revealing insights into their joint behavior and dependencies. By examining jointly distributed random variables, we can assess correlations, independence, and the overall structure of their combined probabilities.
Marginal Distribution: Marginal distribution is the probability distribution of a single random variable within a multi-dimensional context, obtained by summing or integrating over the other variables. This concept is essential as it helps to understand how the probabilities of individual variables are influenced by their relationships with others, highlighting key insights in both discrete and continuous settings. It also lays the groundwork for analyzing conditional distributions, allowing for a deeper exploration of dependence and independence between variables.
Multivariate distribution: A multivariate distribution describes the probability distribution of two or more random variables simultaneously. It captures the relationships and interactions between these variables, allowing for a deeper understanding of their joint behavior and dependencies. This concept is crucial in assessing how the variables influence one another and helps in modeling complex scenarios in various fields such as statistics, finance, and machine learning.
p(x, y): p(x, y) represents the joint probability mass function for two discrete random variables, x and y. This function gives the probability that x takes on a specific value and y takes on another specific value simultaneously. Understanding p(x, y) is essential for analyzing the relationship between two variables, as it allows us to determine how they may influence each other and assess the probabilities of their combined outcomes.
Scatter plots: A scatter plot is a graphical representation that displays values for two variables for a set of data. Each point on the scatter plot corresponds to one observation in the dataset, with the x-axis representing one variable and the y-axis representing another. Scatter plots are particularly useful for identifying relationships and trends between the two variables, such as correlation or distribution patterns.
Transformations of Random Variables: Transformations of random variables refer to the process of applying a mathematical function to a random variable to create a new random variable. This concept allows us to analyze how changes in the original variable influence the behavior and distribution of the new variable, which is particularly important when dealing with joint probability distributions and understanding the relationships between multiple random variables.