Outliers in Regression Analysis
Outliers are data points that fall far from the overall pattern in your data. In regression, even a single outlier can drag the regression line in its direction, inflate or deflate the correlation coefficient, and distort predictions. Knowing how to identify outliers and decide what to do with them is a core skill for this unit.
Outliers Using the Standard Deviations Rule
The simplest way to flag potential outliers is to look at how far each data point sits from the mean of a variable.
- Calculate the mean and standard deviation of your dataset.
- Check each data point's distance from the mean.
- Any point more than two standard deviations from the mean is a potential outlier.
This rule assumes the data roughly follows a normal distribution, where about 95% of values fall within two standard deviations of the mean. If the distribution is heavily skewed, this rule becomes less reliable.
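The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method; the function name and sample data are invented for the example.

```python
import numpy as np

def flag_outliers_2sd(values):
    """Flag points more than two sample standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sd = values.std(ddof=1)  # sample standard deviation
    # True where a point lies more than 2 SDs from the mean
    return np.abs(values - mean) > 2 * sd

# Hypothetical exam scores; the last value is deliberately extreme
data = [70, 72, 68, 74, 71, 69, 73, 120]
print(flag_outliers_2sd(data))
```

Only the extreme value is flagged here; the routine says nothing about *why* a point is extreme, which is what the follow-up investigation below is for.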
Once you flag a potential outlier, investigate it further:
- Is it an error? Measurement mistakes or data entry typos are common sources of outliers.
- Is it a genuine extreme value? Some real observations are just unusual. A student who scores 100 on every exam in a class averaging 72 is an outlier, but that's real data.
Whether you include or exclude an outlier depends on the context and goals of your analysis.

Standard Deviation of Residuals
While the standard deviations rule looks at individual variables, residuals let you find outliers relative to the regression line itself. A residual is the difference between an observed y-value and the predicted ŷ-value from the regression model: residual = y − ŷ.
A data point might look normal on its own but still be an outlier in the context of the regression if it falls far from the predicted line.
To identify regression outliers using residuals:
- Fit the regression model and calculate each point's residual.
- Compute the standard deviation of the residuals using: s = √( Σ(y − ŷ)² / (n − 2) ). The denominator is n − 2 (not n − 1) because the regression model estimates two parameters: the slope and the intercept.
- Flag any point whose residual is more than two standard deviations from the mean residual as a potential outlier.
These flagged points may be pulling the regression line toward them and weakening the model's fit. Studentized residuals are a refined version that standardizes each residual by its own estimated variability, making comparisons across data points more reliable.
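The residual-based procedure can be sketched as follows (NumPy only; the function name and threshold handling are illustrative, and the n − 2 denominator matches the formula above):

```python
import numpy as np

def residual_outliers(x, y):
    """Flag points whose residual exceeds 2 standard deviations of the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares fit
    residuals = y - (slope * x + intercept)      # observed minus predicted
    s = np.sqrt(np.sum(residuals**2) / (len(x) - 2))  # n - 2 denominator
    # Least-squares residuals sum to zero, so the mean residual is zero
    # and the check simplifies to |residual| > 2s
    return np.abs(residuals) > 2 * s
```

Because least-squares residuals always sum to zero, comparing each residual to twice their standard deviation is the same as measuring distance from the mean residual.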

Impact of Outlier Removal
Outliers can substantially shift both the regression line and the correlation coefficient r. To evaluate their influence, compare results with and without the suspected outlier:
- Fit the regression model with all data points. Record the regression equation and r (or R²).
- Remove the identified outlier(s) from the dataset.
- Refit the regression model without the outlier(s). Record the new equation and r.
- Compare the two sets of results.
- Large changes in slope, intercept, or r mean the outlier was highly influential.
- Small changes suggest the outlier wasn't driving the results, and the model is stable without it.
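One way to run this with/without comparison, sketched with NumPy (the dataset is invented for illustration, with the last point deliberately extreme):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 20.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 5.0])  # last point suspect

def fit_summary(x, y):
    """Return slope, intercept, and r for a simple linear fit."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

s1, _, r1 = fit_summary(x, y)            # with the outlier
s2, _, r2 = fit_summary(x[:-1], y[:-1])  # without the outlier
print(f"with outlier:    slope={s1:.2f}, r={r1:.2f}")
print(f"without outlier: slope={s2:.2f}, r={r2:.2f}")
```

In this contrived example the single extreme point nearly destroys an otherwise strong linear relationship, which is exactly the kind of large shift that marks a highly influential observation.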
Be cautious about removing outliers. A few things to keep in mind:
- Outliers may represent valid, informative data. Removing them just because they're inconvenient can introduce bias.
- Always document and justify any removal. "It looked weird" is not a good reason; "the value was a confirmed recording error" is.
- Consider the consequences: removing a point that anchors one end of the data can make the remaining relationship look stronger (or weaker) than it really is.
Distribution Characteristics and Outlier Treatment
The shape of your data's distribution affects how you identify and handle outliers.
- Skewness measures asymmetry. In a right-skewed distribution, high values are more spread out, so points on the right tail may look like outliers under the two-standard-deviations rule even though they're a natural part of the distribution's shape.
- Kurtosis describes how heavy the tails are. Distributions with high kurtosis (called leptokurtic) naturally produce more extreme values, so you'd expect to see more points beyond two standard deviations even without any true outliers.
Two additional techniques sometimes come up in this context:
- Winsorization replaces extreme values with less extreme ones (for example, capping all values beyond the 95th percentile at the 95th percentile value). This limits the influence of outliers without removing data points entirely.
- Jackknife resampling systematically removes one observation at a time and recalculates the statistic of interest. If dropping a single point causes a large shift in the result, that point is highly influential.
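Both techniques can be sketched with NumPy. The percentile caps, function names, and data below are chosen for illustration (for winsorization, `scipy.stats.mstats.winsorize` offers a ready-made implementation):

```python
import numpy as np

def winsorize(values, lower=5, upper=95):
    """Cap values at the given lower/upper percentiles instead of removing them."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, upper])
    return np.clip(values, lo, hi)

def jackknife_r(x, y):
    """Leave-one-out correlation: recompute r with each point dropped in turn."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_full = np.corrcoef(x, y)[0, 1]
    r_loo = np.array([
        np.corrcoef(np.delete(x, i), np.delete(y, i))[0, 1]
        for i in range(len(x))
    ])
    return r_full, r_loo

# The observation whose removal shifts r the most is the most influential
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 5.0]
r_full, r_loo = jackknife_r(x, y)
print(int(np.argmax(np.abs(r_loo - r_full))))  # index of most influential point
```

Note that winsorization keeps the sample size intact while limiting extreme influence, whereas the jackknife does not change the data at all: it only measures how much each point matters.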