Fiveable
🥖Linear Modeling Theory Unit 13 Review

13.2 Link Functions and Linear Predictors

Written by the Fiveable Content Team • Last updated August 2025
Generalized Linear Models (GLMs) expand on traditional linear regression by allowing for non-normal response variables. Two core components make this possible: the link function, which transforms the expected value of the response so predictions stay in a valid range, and the linear predictor, which combines your explanatory variables and coefficients into a single expression. Together, they let you model everything from binary outcomes to count data within one unified framework.

Purpose and Concept

In ordinary linear regression, you model the mean of the response directly as a linear combination of predictors. That works fine when the response is continuous and unbounded, but it breaks down for other types of data. If you're modeling a probability, for instance, nothing stops a plain linear model from predicting values below 0 or above 1.

A link function solves this by transforming the expected value of the response onto a scale where a linear model makes sense. Formally, if μ = E(Y) is the mean of the response, the link function g(·) satisfies:

g(μ) = η

where η is the linear predictor. The link function must be monotonic (strictly increasing or decreasing) and differentiable, so the transformation is smooth and invertible.

A few key points about choosing a link function:

  • The choice depends on the nature of the response variable and the assumed relationship between predictors and response.
  • The link function constrains predicted values to the valid range for the response distribution (e.g., positive values for counts, values between 0 and 1 for probabilities).
  • The canonical link is the link function that corresponds to the natural parameter of the exponential family distribution assumed for the response. Canonical links have nice mathematical properties (they simplify estimation), but you're not required to use them.
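To make the monotonic-and-invertible requirement concrete, here is a minimal Python sketch (not tied to any statistics library) of the logit link and its inverse; the round trip g⁻¹(g(p)) recovers the original probability:

```python
import math

def logit(p):
    """Logit link: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit (logistic function): maps the real line back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))

# Because the link is strictly increasing and differentiable, it is
# invertible: applying the inverse link undoes the link exactly.
for p in (0.1, 0.5, 0.9):
    assert abs(inv_logit(logit(p)) - p) < 1e-12
```

The same pattern holds for any valid link: a strictly monotone, smooth g pairs with a well-defined inverse g⁻¹ that maps the linear predictor back to the mean.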

Importance in GLMs

  • Link functions are what allow a single modeling framework to handle binary, count, and continuous response variables.
  • They let you capture non-linear relationships between predictors and the response while keeping the model linear on the transformed scale.
  • Choosing the wrong link function can lead to biased coefficient estimates and poor model fit. If your residual diagnostics look off, the link function is one of the first things to reconsider.
  • By keeping predicted values within the valid range for the response distribution, link functions help the model satisfy the assumptions of the GLM framework.

Binary Response Variables

For binary outcomes (0/1), you need a link that maps probabilities in (0, 1) to the entire real line (−∞, +∞).

  • Logit link (canonical for binomial): transforms the mean to the log-odds scale. If p is the probability of success, the logit link is g(p) = log(p / (1 − p)). This is the most common choice for binary data.
  • Probit link: uses the inverse of the standard normal CDF, so g(p) = Φ⁻¹(p). Results are often similar to the logit, but the probit assumes a latent normal variable underlying the binary outcome.

Example: Modeling the probability that a customer purchases a product (yes/no) based on age and income.
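The two binary links can be compared directly with the Python standard library (a sketch for intuition only; `statistics.NormalDist` supplies Φ⁻¹):

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit link: log-odds of p."""
    return math.log(p / (1 - p))

def probit(p):
    """Probit link: inverse of the standard normal CDF, Φ⁻¹(p)."""
    return NormalDist().inv_cdf(p)

# Both links send (0, 1) onto the real line and agree in direction,
# but they differ in scale: the logistic curve has heavier tails.
for p in (0.05, 0.5, 0.95):
    print(f"p={p:0.2f}  logit={logit(p):+.3f}  probit={probit(p):+.3f}")
```

Both links equal 0 at p = 0.5; away from 0.5, the logit grows faster than the probit, which is why logit coefficients are typically larger in magnitude than probit coefficients fit to the same data.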


Count Response Variables

Count data (0, 1, 2, ...) are non-negative integers, so you need predicted values that are always positive.

  • Log link (canonical for Poisson): g(μ) = log(μ). This ensures μ > 0 and is the standard choice when assuming a Poisson distribution.

Example: Modeling the number of car accidents per day based on weather conditions and traffic volume.
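A hand-computed prediction from such a model might look like the following sketch (the coefficients are made up for illustration, not fitted to real data):

```python
import math

# Hypothetical fitted coefficients for a Poisson model with a log link:
#   log(mu) = b0 + b1*rain + b2*traffic
b0, b1, b2 = 0.5, 0.8, 0.002

def expected_accidents(rain, traffic):
    eta = b0 + b1 * rain + b2 * traffic  # linear predictor on the log scale
    return math.exp(eta)                 # inverse link: guarantees a positive mean

# Predictions are positive no matter what values the predictors take.
print(expected_accidents(rain=0, traffic=500))  # dry day
print(expected_accidents(rain=1, traffic=500))  # rainy day: mean multiplied by e^0.8
```

Note how the rain coefficient acts multiplicatively on the original scale: switching `rain` from 0 to 1 multiplies the expected count by e^0.8 ≈ 2.23.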

Continuous Positive Response Variables

When the response is continuous but strictly positive (like dollar amounts or durations), two common links apply:

  • Log link with a gamma distribution: g(μ) = log(μ). Models the log of the expected value.
  • Inverse link (canonical for gamma): g(μ) = 1/μ. Also used with the inverse Gaussian distribution.

Example: Modeling insurance claim amounts based on policyholder characteristics.

Normal Response Variables

  • Identity link (canonical for normal): g(μ) = μ. The link function does nothing; the linear predictor directly equals the expected value. This is just ordinary linear regression.

Example: Modeling height based on age and gender.

Rare Events or Extreme Probabilities

  • Complementary log-log link: g(p) = log(−log(1 − p)). Useful for binary outcomes where the event of interest is rare, because the link is asymmetric (unlike the logit and probit, which are symmetric around p = 0.5).

Example: Modeling the occurrence of a rare disease based on risk factors.
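The asymmetry claim is easy to verify numerically (a minimal sketch using only the standard library):

```python
import math

def logit(p):
    """Symmetric link: logit(p) = -logit(1 - p)."""
    return math.log(p / (1 - p))

def cloglog(p):
    """Complementary log-log link: asymmetric around p = 0.5."""
    return math.log(-math.log(1 - p))

# The logit is symmetric around 0.5, so these two values match exactly:
print(logit(0.2), -logit(0.8))
# The cloglog is not symmetric, so these two values differ:
print(cloglog(0.2), -cloglog(0.8))
```

That asymmetry means the cloglog approaches 0 and 1 at different rates, which can fit rare-event data better than a symmetric link.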

Linear Predictors in GLMs


Construction

The linear predictor η is a linear combination of explanatory variables and their coefficients. In its simplest form:

η = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₚxₚ

Beyond main effects, the linear predictor can include:

  • Interaction terms (e.g., β₃x₁x₂), which capture how the effect of one variable changes depending on the level of another.
  • Polynomial terms (e.g., β₄x₁²), which allow for curvature in the relationship between a predictor and the response on the link scale.

For categorical explanatory variables, you incorporate them using dummy variables or contrast coding:

  • Dummy variables create binary (0/1) indicators for each category (minus one reference category).
  • Contrast coding compares categories to a reference level or to the overall mean.

The intercept β₀ represents the value of the linear predictor when all explanatory variables equal zero. For models with categorical predictors, it corresponds to the reference level of those variables.
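Putting these pieces together, a linear predictor with a dummy-coded categorical variable and an interaction term can be written out by hand (all coefficient values here are hypothetical, chosen only to illustrate the construction):

```python
# Hypothetical coefficients. "region" has levels A (reference), B, C,
# encoded as two dummy indicators; the intercept corresponds to region A.
coefs = {
    "intercept": 1.0,
    "age": 0.05,
    "region_B": -0.3,        # 1 if region == "B", else 0
    "region_C": 0.7,         # 1 if region == "C", else 0
    "age_x_region_B": 0.02,  # interaction: age effect differs in region B
}

def linear_predictor(age, region):
    is_b = 1 if region == "B" else 0
    is_c = 1 if region == "C" else 0
    return (coefs["intercept"]
            + coefs["age"] * age
            + coefs["region_B"] * is_b
            + coefs["region_C"] * is_c
            + coefs["age_x_region_B"] * age * is_b)

print(linear_predictor(age=30, region="A"))  # reference level: intercept + age effect
print(linear_predictor(age=30, region="B"))  # adds the dummy and interaction terms
```

For region A (the reference level) both dummies are zero, so η reduces to the intercept plus the age term, exactly as described above.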

The linear predictor and the link function are connected by:

g(μ) = η

or equivalently:

μ = g⁻¹(η)

This means the linear predictor lives on the transformed scale, and you apply the inverse link to get back to the scale of the response. The choice of link function determines how you interpret η:

  • With the log link, η represents log(μ), so a one-unit increase in a predictor shifts the log of the expected response by the corresponding coefficient.
  • With the logit link, η represents log(p / (1 − p)), so coefficients describe changes in log-odds.

The relationship between predictors and the response on its original scale is non-linear whenever the link function is non-linear. The linearity in a GLM refers to the linear predictor, not to the relationship between predictors and the response itself.
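This non-linearity on the response scale is easy to see numerically. In the sketch below (logit link, standard library only), equal one-unit steps in η produce unequal steps in the predicted probability:

```python
import math

def inv_logit(eta):
    """Inverse logit: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Equal steps on the link scale give unequal steps on the response scale:
# the model is linear only in eta, not in the probability itself.
for eta in (-2, -1, 0, 1, 2):
    print(f"eta={eta:+d}  p={inv_logit(eta):.3f}")
```

The probability changes fastest near η = 0 (p = 0.5) and flattens out in the tails, which is exactly the S-shaped curve familiar from logistic regression.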

Interpreting Coefficients in GLMs

How you interpret a coefficient depends entirely on which link function you're using. In every case, the interpretation is "holding all other variables constant."

Identity link (linear regression): Coefficients represent the change in the expected response for a one-unit increase in the predictor. A coefficient of 2.5 for age means the expected response increases by 2.5 units per additional year of age.

Log link: Coefficients represent the change in log(μ) for a one-unit increase in the predictor. Exponentiating gives the multiplicative effect on the expected response. A coefficient of 0.3 for income means the expected response is multiplied by e^0.3 ≈ 1.35 for each additional unit of income, i.e., a 35% increase.

Logit link: Coefficients represent the change in the log-odds of the response for a one-unit increase in the predictor. Exponentiating gives the odds ratio. A coefficient of 1.2 for a binary predictor means the odds of "success" are e^1.2 ≈ 3.32 times higher when that predictor equals 1 compared to 0.

Probit link: Coefficients represent the change in Φ⁻¹(p) for a one-unit increase in the predictor. These are harder to interpret directly. A coefficient of 0.5 means the probit of the expected probability increases by 0.5 per unit increase in the predictor. Converting to probability requires plugging specific values into the normal CDF, so marginal effects are often reported instead.
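The exponentiation steps used in the log-link and logit-link interpretations above are just one-liners (coefficient values taken from the worked examples in this section):

```python
import math

# Coefficients from the examples above.
log_link_coef = 0.3  # log link: additive on log(mu)
logit_coef = 1.2     # logit link: additive on the log-odds

# Exponentiating converts to the multiplicative scale:
rate_ratio = math.exp(log_link_coef)  # multiplicative effect on the mean
odds_ratio = math.exp(logit_coef)     # odds ratio

print(f"rate ratio: {rate_ratio:.2f}")  # about 1.35, i.e. a 35% increase
print(f"odds ratio: {odds_ratio:.2f}")  # about 3.32
```

A common pitfall is reading a logit coefficient as a change in probability; it is a change in log-odds, and the implied probability change depends on where on the curve you start.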

Interaction Terms

An interaction term means the effect of one predictor on the response depends on the level of another predictor. Interpretation varies by link function:

Identity link: The interaction coefficient gives the additional change in the expected response when both interacting variables increase by one unit, beyond the sum of their individual effects. If the interaction between age and gender has a coefficient of −0.8, the effect of age on the response is 0.8 units lower for one gender compared to the other.

Log link: The interaction coefficient gives an additional multiplicative factor. If the interaction between price and promotion has a coefficient of 0.2, the multiplicative effect of price on the response is scaled by an additional factor of e^0.2 ≈ 1.22 when a promotion is present.

When interpreting interactions, always consider the main effects alongside the interaction term. The effect of one variable on the response varies across levels of the interacting variable, so a single number rarely tells the full story. Plotting predicted values or marginal effects across the range of the interacting variables is the most reliable way to understand what the interaction is doing.
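The price-by-promotion example above can be checked by direct computation. In this sketch all coefficients are invented for illustration; the point is that the multiplicative effect of one extra unit of price differs depending on whether a promotion is present:

```python
import math

# Hypothetical log-link model with a price x promotion interaction:
#   log(mu) = b0 + b_price*price + b_promo*promo + b_int*price*promo
b0, b_price, b_promo, b_int = 2.0, -0.1, 0.5, 0.2

def mu(price, promo):
    """Expected response: inverse link (exp) applied to the linear predictor."""
    return math.exp(b0 + b_price * price + b_promo * promo + b_int * price * promo)

# Multiplicative effect of one extra unit of price, with and without promotion:
print(mu(price=6, promo=0) / mu(price=5, promo=0))  # e^{-0.1}: price lowers the mean
print(mu(price=6, promo=1) / mu(price=5, promo=1))  # e^{-0.1 + 0.2} = e^{0.1}: it raises it
```

Here the interaction flips the sign of the price effect on the response scale, a case where reporting the main effect alone would be actively misleading; this is why plotting predictions across the interacting variables is recommended.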