Numerical Analysis I

Curve Fitting Methods


Why This Matters

Curve fitting connects messy real-world data to clean mathematical models you can use for analysis, prediction, and computation. In Numerical Analysis I, you need to choose the right method for a given situation: minimizing overall error, passing exactly through data points, or avoiding numerical instabilities like oscillation. These concepts tie directly into error analysis, convergence behavior, computational efficiency, and numerical stability.

The methods below represent fundamentally different philosophies about how to approximate data. Some prioritize exactness at known points (interpolation), others prioritize minimizing total error (regression), and still others prioritize smoothness or special mathematical properties. Knowing when each method shines and why it might fail matters just as much as knowing the formulas.


Interpolation: Passing Exactly Through Points

These methods construct functions that pass exactly through every data point. The tradeoff: you gain precision at known locations but risk instability between them, especially as the number of points grows.

Lagrange Interpolation

Given $n+1$ data points, Lagrange interpolation constructs the unique degree-$n$ interpolating polynomial directly using basis polynomials $L_i(x)$. Each basis polynomial equals 1 at point $i$ and 0 at every other data point, so the interpolant is:

$$P(x) = \sum_{i=0}^{n} y_i L_i(x), \qquad L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$

This formulation is conceptually elegant but computationally expensive: it requires $O(n^2)$ operations and a complete recalculation if any data point changes. For that reason, Lagrange interpolation is most useful in theoretical analysis and proofs rather than large-scale computation.
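As a sketch, the double loop below evaluates $P(x)$ directly from the basis-polynomial definition; the $O(n^2)$ cost per evaluation is visible in the nested loops. (The function name is illustrative, not from any particular library.)

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolant of the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # L_i(x): equals 1 at x_i and 0 at every other node
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total
```

For the three points $(0,0)$, $(1,1)$, $(2,4)$ the interpolant is $x^2$, so the sketch returns 2.25 at $x = 1.5$.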

Newton's Divided Difference Method

Newton's method builds the same interpolating polynomial but does so recursively through a divided difference table. The polynomial takes the form:

$$P(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + \cdots$$

where $f[x_0, x_1, \ldots, x_k]$ denotes the $k$th-order divided difference.

The key advantage: when a new data point arrives, you only need to compute new divided differences and append one more term. You don't start from scratch. This method also handles unevenly spaced data naturally since the divided difference formulation makes no assumption about uniform spacing.
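A compact sketch of the table computation (names illustrative, assuming the standard divided-difference recurrence): the coefficients are built in place, and the polynomial is evaluated with Horner-style nesting.

```python
def divided_differences(xs, ys):
    """Return [f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]] computed in place."""
    coef = list(ys)
    n = len(xs)
    for k in range(1, n):
        # update from the end so lower-order differences remain available
        for i in range(n - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form via Horner-style nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result
```

Appending a new data point only requires extending the table by one row of differences; the existing coefficients are untouched.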

Compare: Lagrange and Newton's divided differences produce the same interpolating polynomial, but Newton's method is computationally superior when data points are added incrementally. If a problem asks about updating an interpolation with new data, Newton is your answer.

Polynomial Interpolation

A few foundational facts tie the interpolation methods together:

  • Uniqueness theorem: For $n$ data points, there is exactly one polynomial of degree at most $n-1$ passing through all of them. Lagrange and Newton are just two different ways to construct it.
  • Runge's phenomenon: High-degree polynomials can oscillate wildly between data points, especially near the boundaries of the interval. For example, interpolating the function $f(x) = \frac{1}{1+25x^2}$ on equally spaced nodes produces increasingly severe oscillations as the degree grows.
  • Practical limit: Polynomial interpolation works best for small, well-behaved datasets. Avoid it when you have many points or data with high variability near the edges of your interval.
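Runge's phenomenon is easy to check numerically. The pure-Python sketch below (helper names are made up here) interpolates $f(x) = \frac{1}{1+25x^2}$ at equally spaced nodes and measures the worst error on a fine grid; the error grows with the degree instead of shrinking.

```python
def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolant of (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

def max_interp_error(degree, samples=400):
    """Worst error of equispaced degree-n interpolation of runge on [-1, 1]."""
    nodes = [-1.0 + 2.0 * i / degree for i in range(degree + 1)]
    vals = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / samples for k in range(samples + 1)]
    return max(abs(lagrange_eval(nodes, vals, x) - runge(x)) for x in grid)
```

Raising the degree from 5 to 10 to 20 makes the worst-case error strictly larger each time, which is exactly the failure mode the bullet above warns about.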

Piecewise Methods: Controlling Local Behavior

Rather than forcing one polynomial through all points, these methods use different polynomials in different regions. This local control dramatically improves stability and smoothness.

Spline Interpolation

A spline is a piecewise polynomial where the pieces are joined at connection points called knots, with continuity conditions enforced at each knot.

Cubic splines are the standard choice. Between each pair of consecutive data points, you fit a cubic polynomial, then require:

  1. The spline passes through every data point (interpolation condition).
  2. Adjacent cubic pieces share the same function value at each knot ($C^0$ continuity).
  3. First derivatives match at each knot ($C^1$ continuity, ensuring no corners).
  4. Second derivatives match at each knot ($C^2$ continuity, ensuring visual smoothness).

Two additional boundary conditions are needed to close the system. Common choices include natural splines ($S''(x_0) = S''(x_n) = 0$) and clamped splines (first derivatives specified at the endpoints).

Because each piece is only cubic, splines avoid Runge's phenomenon entirely. This makes them ideal for interpolating large datasets where a single high-degree polynomial would oscillate.
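The conditions above, plus the natural boundary condition, reduce to a tridiagonal linear system for the second derivatives at the knots, solvable in $O(n)$ time with the Thomas algorithm. A self-contained sketch (names illustrative; assumes strictly increasing knots):

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through the points (xs, ys)."""
    n = len(xs) - 1                       # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the interior second derivatives M_1..M_{n-1};
    # the natural boundary condition fixes M_0 = M_n = 0.
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                  - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
    # Thomas algorithm: forward elimination, then back substitution
    for k in range(1, n - 1):
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    M = [0.0] * (n + 1)
    for k in range(n - 2, -1, -1):
        M[k + 1] = (rhs[k] - h[k + 1] * M[k + 2]) / diag[k]

    def spline(t):
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        A = (xs[i + 1] - t) / h[i]        # linear weights on this piece
        B = (t - xs[i]) / h[i]
        return (A * ys[i] + B * ys[i + 1]
                + ((A**3 - A) * M[i] + (B**3 - B) * M[i + 1]) * h[i] * h[i] / 6.0)

    return spline
```

Each evaluation touches only one cubic piece, which is the "local control" that keeps splines stable where a single global polynomial would oscillate.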

Bezier Curves

Bezier curves take a different approach: they're defined by control points rather than data points. The curve is influenced by all control points but only passes through the first and last. The mathematical basis is the set of Bernstein polynomials:

$$B_i^n(t) = \binom{n}{i} t^i (1-t)^{n-i}, \qquad t \in [0,1]$$

A Bezier curve with $n+1$ control points has degree $n$. Quadratic (3 control points) and cubic (4 control points) Bezier curves dominate practical applications. Moving a single control point creates a predictable, localized change to the curve shape, which is why Bezier curves are the standard in computer graphics and CAD.
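In practice, Bezier curves are usually evaluated with de Casteljau's algorithm (repeated linear interpolation of the control points) rather than by summing Bernstein polynomials directly; it is numerically stable and only a few lines. A sketch for 2-D control points (the function name is illustrative):

```python
def de_casteljau(control, t):
    """Evaluate a Bezier curve at parameter t in [0, 1].

    control: list of (x, y) control points.
    """
    pts = list(control)
    while len(pts) > 1:
        # one round of linear interpolation between consecutive points
        pts = [((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])
               for p, q in zip(pts, pts[1:])]
    return pts[0]
```

The curve passes through the first control point at $t = 0$ and the last at $t = 1$, but generally not through the interior ones.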

Compare: Splines vs. Bezier Curves: splines pass through all data points while Bezier curves use control points to shape the curve. Splines excel at data fitting; Bezier curves excel at design applications where intuitive manipulation matters more than exact interpolation.


Regression: Minimizing Overall Error

When data contains noise or you don't need to pass exactly through points, regression methods find the "best fit" by minimizing error across all observations.

Least Squares Method

The least squares method minimizes the sum of squared residuals:

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Why squared? Squaring penalizes large errors more heavily than small ones, prevents positive and negative errors from canceling, and yields a smooth objective function with a unique minimum.

For a linear model (linear in the parameters, not necessarily in $x$), the optimal coefficients satisfy the normal equations:

$$A^T A \mathbf{x} = A^T \mathbf{b}$$

where $A$ is the design matrix, $\mathbf{x}$ is the vector of unknown coefficients, and $\mathbf{b}$ is the vector of observed $y$-values. Understanding this system is essential since it connects curve fitting directly to linear algebra.
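A NumPy sketch, fitting a quadratic (linear in its three coefficients) by forming and solving the normal equations directly. In production code a QR-based routine such as `numpy.linalg.lstsq` is preferred, since forming $A^TA$ squares the condition number; the direct route is shown here because it mirrors the equation above.

```python
import numpy as np

# noise-free quadratic data y = 1 + 2x + 3x^2, so the fit should be exact
x = np.linspace(0.0, 2.0, 9)
y = 1.0 + 2.0 * x + 3.0 * x**2

# design matrix: one column per basis function (1, x, x^2)
A = np.column_stack([np.ones_like(x), x, x**2])

# normal equations: (A^T A) c = A^T y
coef = np.linalg.solve(A.T @ A, A.T @ y)
```

Note that the model is a polynomial in $x$ yet still "linear" for least-squares purposes: each column of $A$ is a fixed basis function, and only the coefficients are unknown.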

Linear Regression

Linear regression fits the model $y = mx + b$ by finding the slope and intercept that minimize squared error. The key assumptions are:

  • The relationship between $x$ and $y$ is approximately linear.
  • Errors are independent and roughly constant in magnitude.

A closed-form solution exists (directly from the normal equations), so no iteration is needed. This makes linear regression fast and reliable. Violations of the assumptions, such as a clearly curved relationship or errors that grow with $x$, degrade prediction quality and should prompt you to consider a different model.
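The closed form follows from specializing the normal equations to the two unknowns $m$ and $b$. A minimal pure-Python sketch (function name illustrative):

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept via the closed-form normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b
```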

Nonlinear Regression

When the model is not linear in its parameters, you need nonlinear regression. Models like $y = ae^{bx}$ or $y = \frac{a}{1 + e^{-bx}}$ fall into this category.

No closed-form solution exists, so you must use iterative optimization methods such as Gauss-Newton or Levenberg-Marquardt. These algorithms refine parameter estimates step by step, which introduces two practical concerns:

  • Initial guesses matter. Poor starting values can cause convergence to a local minimum (not the global best) or outright failure to converge.
  • Convergence is not guaranteed. You may need to try multiple starting points or apply regularization.
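A damped Gauss-Newton sketch for the model $y = ae^{bx}$ (all names are illustrative; the step-halving loop is one simple safeguard against the convergence issues above, not part of the classical algorithm):

```python
import math

def sse(xs, ys, a, b):
    """Sum of squared residuals for the model y = a * exp(b * x)."""
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

def gauss_newton_exp(xs, ys, a, b, iters=50):
    """Fit y = a*exp(b*x) by damped Gauss-Newton from the initial guess (a, b)."""
    for _ in range(iters):
        e = [math.exp(b * x) for x in xs]
        r = [y - a * ei for y, ei in zip(ys, e)]          # residuals
        J = [(ei, a * x * ei) for x, ei in zip(xs, e)]    # Jacobian rows
        # normal equations (J^T J) d = J^T r, solved by Cramer's rule (2x2)
        g11 = sum(j1 * j1 for j1, _ in J)
        g12 = sum(j1 * j2 for j1, j2 in J)
        g22 = sum(j2 * j2 for _, j2 in J)
        t1 = sum(j1 * ri for (j1, _), ri in zip(J, r))
        t2 = sum(j2 * ri for (_, j2), ri in zip(J, r))
        det = g11 * g22 - g12 * g12
        da = (t1 * g22 - t2 * g12) / det
        db = (g11 * t2 - g12 * t1) / det
        # damping: halve the step until the error stops increasing
        step = 1.0
        while (sse(xs, ys, a + step * da, b + step * db) > sse(xs, ys, a, b)
               and step > 1e-8):
            step /= 2.0
        a, b = a + step * da, b + step * db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b
```

Even with damping, a starting point far from the truth can still stall at a local minimum; the safeguard only rejects steps that make the fit worse.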

Compare: Linear vs. Nonlinear Regression: linear regression has a guaranteed unique solution computed directly, while nonlinear regression requires iterative methods with potential convergence issues. A useful strategy: try linearizing your model first (e.g., taking logs to turn $y = ae^{bx}$ into $\ln y = \ln a + bx$) before resorting to full nonlinear fitting.
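The linearization trick in concrete form, as a sketch on noise-free data (where it recovers the parameters exactly; with noisy data, be aware that least squares on $\ln y$ weights errors differently than least squares on $y$):

```python
import math

# data from y = 2 * exp(0.5 x); with real measurements you'd use observed ys
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# ln y = ln a + b x is linear in (ln a, b): fit a straight line to (x, ln y)
lys = [math.log(y) for y in ys]
n = len(xs)
sx, sy = sum(xs), sum(lys)
sxx = sum(x * x for x in xs)
sxy = sum(x * ly for x, ly in zip(xs, lys))
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = math.exp((sy - b * sx) / n)
```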


Special Approximation Techniques

These methods optimize for specific mathematical properties beyond just fitting data: minimizing maximum error, capturing frequency content, or achieving optimal convergence.

Chebyshev Approximation

Most methods minimize total error. Chebyshev approximation instead minimizes the maximum error across the entire interval (the minimax criterion). This distributes error evenly rather than allowing it to concentrate in certain regions.

The Chebyshev polynomials $T_n(x) = \cos(n \arccos x)$ on $[-1, 1]$ have remarkable properties:

  • Their roots, called Chebyshev nodes, provide the optimal choice of interpolation points to minimize Runge's phenomenon. Using these nodes instead of equally spaced points dramatically reduces interpolation error.
  • The equioscillation theorem characterizes the best polynomial approximation of degree $n$: the error attains its maximum magnitude, with alternating sign, at least $n+2$ times.

Use Chebyshev approximation when you need to control worst-case error, such as in function approximation for numerical libraries.
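A sketch comparing node choices on the Runge function $f(x) = \frac{1}{1+25x^2}$: interpolating at the Chebyshev nodes (roots of $T_{n+1}$) instead of equally spaced points cuts the worst-case error dramatically. Helper names are illustrative.

```python
import math

def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolant of (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

def max_error(nodes, samples=400):
    """Worst error on [-1, 1] of interpolating runge at the given nodes."""
    vals = [runge(x) for x in nodes]
    grid = [-1.0 + 2.0 * k / samples for k in range(samples + 1)]
    return max(abs(lagrange_eval(nodes, vals, x) - runge(x)) for x in grid)

deg = 10
equi = [-1.0 + 2.0 * i / deg for i in range(deg + 1)]
# Chebyshev nodes: roots of T_{deg+1}, clustered toward the endpoints
cheb = [math.cos((2 * k + 1) * math.pi / (2 * (deg + 1))) for k in range(deg + 1)]
```

The clustering of Chebyshev nodes near $\pm 1$ is precisely what tames the endpoint oscillations of Runge's phenomenon.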

Fourier Series Approximation

Fourier series represent functions as sums of sines and cosines:

$$f(x) \approx \frac{a_0}{2} + \sum_{n=1}^{N} \left[ a_n \cos(nx) + b_n \sin(nx) \right]$$

The coefficients $a_n$ and $b_n$ capture the frequency content of the original function. This makes Fourier approximation the natural choice for periodic functions: for piecewise smooth functions, the series converges at every point of continuity.

At discontinuities, Fourier series exhibit the Gibbs phenomenon: the partial sums overshoot by roughly 9% of the jump size, and this overshoot persists no matter how many terms you include. Fourier methods also form the foundation for spectral methods in solving differential equations and for signal processing applications.
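The overshoot can be seen numerically with the standard square wave of height $\pm 1$ (jump size 2), whose Fourier series contains only odd sine harmonics with $b_n = \frac{4}{\pi n}$. A sketch:

```python
import math

def square_partial_sum(x, terms):
    """Fourier partial sum of the odd square wave using `terms` odd harmonics."""
    return sum(4.0 / (math.pi * n) * math.sin(n * x)
               for n in range(1, 2 * terms, 2))

# scan a fine grid just to the right of the jump at x = 0;
# the partial sum peaks at roughly 1.18 even though the wave's height is 1,
# an overshoot near 9% of the jump size that adding terms does not remove
peak = max(square_partial_sum(0.0005 * k, 50) for k in range(1, 200))
```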

Compare: Chebyshev vs. Fourier Approximation: Chebyshev minimizes the worst-case error for general (non-periodic) functions on an interval, while Fourier is optimal for periodic functions and frequency analysis. Choose Chebyshev for polynomial approximation problems; choose Fourier when periodicity or frequency content matters.


Quick Reference Table

| Concept | Best Methods |
| --- | --- |
| Exact interpolation through points | Lagrange, Newton's Divided Difference, Polynomial Interpolation |
| Avoiding oscillation / Runge's phenomenon | Spline Interpolation, Chebyshev Approximation |
| Piecewise construction | Spline Interpolation, Bezier Curves |
| Minimizing total error | Least Squares, Linear Regression, Nonlinear Regression |
| Efficient updates with new data | Newton's Divided Difference |
| Periodic function approximation | Fourier Series |
| Minimax (minimize maximum error) | Chebyshev Approximation |
| Design / graphics applications | Bezier Curves |

Self-Check Questions

  1. Both Lagrange interpolation and Newton's divided difference produce the same polynomial. What computational advantage does Newton's method offer, and in what scenario does this matter most?

  2. You have 20 noisy data points and want to find a trend line. Would you choose polynomial interpolation or least squares regression? Explain your reasoning in terms of error behavior.

  3. Compare and contrast spline interpolation with high-degree polynomial interpolation. What phenomenon does spline interpolation avoid, and what mathematical property makes this possible?

  4. A function exhibits periodic behavior with multiple frequency components. Which approximation method would best capture this structure, and what mathematical objects form its basis?

  5. Given a dataset where new points are frequently added, recommend an interpolation method and justify your choice. Then explain what would change if the data contained significant measurement noise.