Overview of RLS algorithm
The Recursive Least Squares (RLS) algorithm solves a core problem in adaptive filtering: how to update filter coefficients efficiently as new data arrives, without re-solving the entire least squares problem from scratch each time. It does this by recursively minimizing a weighted least squares cost function, which gives it significantly faster convergence than gradient-based methods like LMS.
RLS achieves this speed by exploiting second-order statistics of the input signal (through a recursively updated covariance matrix), rather than relying on instantaneous gradient estimates. The trade-off is computational cost: $O(M^2)$ per time step versus $O(M)$ for LMS, where $M$ is the filter length. Despite this, RLS remains a go-to algorithm when fast adaptation and accurate tracking of time-varying systems are priorities.
Key concepts in RLS
Adaptive filtering
Adaptive filtering adjusts filter coefficients automatically based on the input signal and a desired reference output. The filter "learns" by minimizing the error between its output and the desired signal according to some optimization criterion.
In the RLS context, that criterion is weighted least squares. This makes adaptive filters useful in situations where the signal statistics are unknown or changing over time:
- System identification: estimating an unknown system's impulse response
- Noise cancellation: subtracting an estimated noise component from a corrupted signal
- Channel equalization: compensating for distortions in a communication channel
Least squares estimation
Least squares estimation finds the parameter vector that minimizes the sum of squared errors between observed data and model predictions. Unlike gradient-descent approaches (which take incremental steps toward the optimum), least squares provides a closed-form solution via the normal equations.
In RLS, the least squares criterion is weighted by the forgetting factor $\lambda$, so recent samples contribute more than older ones. The key advantage is that this closed-form solution can be updated recursively: each new sample refines the previous solution rather than requiring a full recomputation.
Recursive updates
The recursive update mechanism is what makes RLS practical for real-time use. Rather than inverting the autocorrelation matrix from scratch at every time step, RLS applies the matrix inversion lemma to update the inverse incrementally.
Each recursion involves four operations:
- Compute the gain vector from the current input and previous covariance matrix
- Compute the a priori error using the previous weights
- Update the filter weights using the gain vector and error
- Update the inverse covariance matrix
This structure keeps the per-sample cost at $O(M^2)$ instead of the $O(M^3)$ that direct matrix inversion would require.
Mathematical formulation
Cost function minimization
RLS minimizes a weighted least squares cost function that exponentially discounts older samples:

$$J(n) = \sum_{i=1}^{n} \lambda^{n-i} \left[ d(i) - \mathbf{w}^T(n)\,\mathbf{x}(i) \right]^2$$

Here, $d(i)$ is the desired signal, $\mathbf{w}(n)$ is the filter weight vector at time $n$, $\mathbf{x}(i)$ is the input vector, and $\lambda$ is the forgetting factor ($0 < \lambda \le 1$). The term $\lambda^{n-i}$ means a sample that is $k$ steps in the past gets weighted by $\lambda^k$, so older data fades exponentially.
Normal equations
Setting the gradient of $J(n)$ with respect to $\mathbf{w}(n)$ to zero yields the normal equations:

$$\mathbf{R}(n)\,\mathbf{w}(n) = \mathbf{r}(n)$$
where:
- $\mathbf{R}(n) = \sum_{i=1}^{n} \lambda^{n-i}\,\mathbf{x}(i)\,\mathbf{x}^T(i)$ is the weighted autocorrelation matrix
- $\mathbf{r}(n) = \sum_{i=1}^{n} \lambda^{n-i}\, d(i)\,\mathbf{x}(i)$ is the weighted cross-correlation vector
Solving $\mathbf{w}(n) = \mathbf{R}^{-1}(n)\,\mathbf{r}(n)$ directly at each step would cost $O(M^3)$. The matrix inversion lemma avoids this.
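For reference, the direct (non-recursive) weighted least squares solve can be sketched in a few lines of NumPy. The data dimensions and the true coefficient vector `w_true` below are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
n, M, lam = 200, 3, 0.98
X = rng.standard_normal((n, M))             # rows are input vectors x(i)^T
w_true = np.array([1.0, -0.5, 0.25])        # hypothetical true weights
d = X @ w_true                              # noiseless desired signal

weights = lam ** np.arange(n - 1, -1, -1)   # lambda^(n-i) for i = 1, ..., n
R = (X * weights[:, None]).T @ X            # weighted autocorrelation matrix
r = (X * weights[:, None]).T @ d            # weighted cross-correlation vector

w = np.linalg.solve(R, r)                   # the O(M^3) solve that RLS avoids repeating
```

Repeating this solve at every time step is exactly the cost the recursive update removes.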
Matrix inversion lemma
The matrix inversion lemma (Woodbury identity) provides a way to update $\mathbf{R}^{-1}(n)$ from $\mathbf{R}^{-1}(n-1)$ when $\mathbf{R}(n)$ changes by a rank-one update. Since:

$$\mathbf{R}(n) = \lambda\,\mathbf{R}(n-1) + \mathbf{x}(n)\,\mathbf{x}^T(n)$$

the inverse $\mathbf{P}(n) = \mathbf{R}^{-1}(n)$ can be computed recursively without ever performing a full matrix inversion. This is the mathematical backbone that makes RLS computationally feasible.
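The rank-one update and its recursive inverse are easy to check numerically. The NumPy sketch below (illustrative dimensions) writes `P_prev` for the inverse of the previous autocorrelation matrix and verifies that the Woodbury-based update matches a direct inversion:

```python
import numpy as np

rng = np.random.default_rng(0)
M, lam = 4, 0.99

A = rng.standard_normal((M, M))
R_prev = A @ A.T + M * np.eye(M)        # positive-definite R(n-1)
P_prev = np.linalg.inv(R_prev)          # P(n-1) = inverse of R(n-1)

x = rng.standard_normal(M)              # new input vector x(n)
R_new = lam * R_prev + np.outer(x, x)   # rank-one update of R

# Woodbury form used by RLS: update the inverse directly
k = P_prev @ x / (lam + x @ P_prev @ x)            # gain vector
P_new = (P_prev - np.outer(k, x) @ P_prev) / lam   # recursive inverse

# P_new agrees with np.linalg.inv(R_new) to machine precision
```

The recursive form costs $O(M^2)$ per step instead of the $O(M^3)$ of `np.linalg.inv`.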
RLS algorithm derivation
Here is the complete RLS algorithm, step by step:
Step 0: Initialization
- Set filter weights: $\mathbf{w}(0) = \mathbf{0}$
- Set inverse covariance matrix: $\mathbf{P}(0) = \delta^{-1}\mathbf{I}$, where $\delta$ is a small positive constant (e.g., $\delta = 0.01$). A smaller $\delta$ means a larger initial $\mathbf{P}(0)$, reflecting high initial uncertainty.
- Choose forgetting factor $\lambda$, typically in the range $0.95 \le \lambda \le 1$. A common default is $\lambda = 0.99$.
Step 1: Gain vector calculation
For each new time step $n$, compute the gain vector:

$$\mathbf{k}(n) = \frac{\mathbf{P}(n-1)\,\mathbf{x}(n)}{\lambda + \mathbf{x}^T(n)\,\mathbf{P}(n-1)\,\mathbf{x}(n)}$$

The gain vector controls how much the new sample influences the weight update. The denominator $\lambda + \mathbf{x}^T(n)\,\mathbf{P}(n-1)\,\mathbf{x}(n)$ is a scalar, so this computation costs $O(M^2)$ (one matrix-vector product plus an inner product).
Step 2: Error signal computation
Compute the a priori estimation error using the previous weights:

$$e(n) = d(n) - \mathbf{w}^T(n-1)\,\mathbf{x}(n)$$
This is the error before the weights are updated, which is what drives the correction.
Step 3: Filter weights update
Update the weight vector:

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{k}(n)\,e(n)$$

The correction $\mathbf{k}(n)\,e(n)$ pushes the weights in the direction that reduces the current error, scaled by the gain vector.
Step 4: Covariance matrix update
Update the inverse covariance matrix:

$$\mathbf{P}(n) = \lambda^{-1}\left[\mathbf{P}(n-1) - \mathbf{k}(n)\,\mathbf{x}^T(n)\,\mathbf{P}(n-1)\right]$$

This is the matrix inversion lemma applied recursively. The factor $\lambda^{-1}$ accounts for the exponential weighting. Then return to Step 1 for the next sample.
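The four steps above can be collected into a compact reference implementation. This is a minimal NumPy sketch (the function name, tap ordering, and default parameters are illustrative choices, not a fixed API), followed by a noiseless system-identification check with a hypothetical 3-tap system:

```python
import numpy as np

def rls_filter(x, d, M, lam=0.99, delta=0.01):
    """Exponentially weighted RLS. Returns final weights and a priori errors."""
    N = len(x)
    w = np.zeros(M)              # Step 0: w(0) = 0
    P = np.eye(M) / delta        # Step 0: P(0) = (1/delta) * I
    e = np.zeros(N)
    for n in range(M, N):
        u = x[n - M + 1:n + 1][::-1]        # input vector [x(n), ..., x(n-M+1)]
        Pu = P @ u
        k = Pu / (lam + u @ Pu)             # Step 1: gain vector
        e[n] = d[n] - w @ u                 # Step 2: a priori error
        w = w + k * e[n]                    # Step 3: weight update
        P = (P - np.outer(k, u) @ P) / lam  # Step 4: inverse covariance update
    return w, e

# Example: identify a hypothetical 3-tap FIR system driven by white noise
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
h = np.array([0.5, -0.3, 0.2])
d = np.convolve(x, h)[:len(x)]
w, e = rls_filter(x, d, M=3)    # w converges to h
```

With noiseless data the recovered weights match the true taps to high precision, and the a priori error decays toward zero after the initial transient.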
Properties of RLS
Fast convergence
RLS converges much faster than LMS because it incorporates second-order statistics of the input through the covariance matrix $\mathbf{P}(n)$. Where LMS convergence speed depends on the eigenvalue spread of the input autocorrelation matrix (and slows dramatically when the spread is large), RLS effectively whitens the input, making its convergence rate nearly independent of the eigenvalue spread.
In practice, RLS can converge to near-optimal weights in roughly $2M$ to $3M$ iterations (where $M$ is the filter order), regardless of the input signal's spectral characteristics.
Tracking ability
The forgetting factor $\lambda$ gives RLS the ability to track time-varying systems. By exponentially down-weighting past data, the algorithm continuously adapts to changes in the underlying system. A smaller $\lambda$ makes the algorithm more responsive to recent changes, while a larger $\lambda$ provides smoother, more stable estimates.
This tracking capability is what makes RLS valuable in non-stationary environments like mobile communication channels, where the system parameters change continuously.
Computational complexity
The per-sample complexity of RLS is $O(M^2)$ due to the matrix-vector operations in the gain vector and covariance matrix updates. Compare this to:
- LMS: $O(M)$ per sample
- Direct least squares (re-solving from scratch): $O(M^3)$ per sample
For large filter orders, the $O(M^2)$ cost can become significant. Fast RLS variants (lattice-based, fast transversal filters) can reduce this to $O(M)$, but at the cost of increased algorithmic complexity and potential numerical sensitivity.
Forgetting factor
Role in RLS
The forgetting factor $\lambda$ ($0 < \lambda \le 1$) controls the effective memory of the algorithm. It determines how quickly the influence of past data decays.
- $\lambda = 1$: infinite memory. All past samples are weighted equally. This is appropriate for stationary environments and is equivalent to growing-window least squares.
- $\lambda < 1$: finite effective memory. The effective window length is approximately $1/(1-\lambda)$. For example, $\lambda = 0.99$ gives an effective window of about 100 samples; $\lambda = 0.95$ gives about 20 samples.
Exponential weighting
The weight assigned to a sample $k$ steps in the past is $\lambda^k$. This creates an exponential envelope over the data history. For $\lambda = 0.99$, a sample 100 steps old has weight $0.99^{100} \approx 0.37$, while a sample 500 steps old has weight $0.99^{500} \approx 0.007$. The decay is smooth and continuous, with no hard cutoff.
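These numbers are quick to verify; the snippet below computes the exponential envelope and the effective window length for $\lambda = 0.99$:

```python
import numpy as np

lam = 0.99
k = np.arange(501)
envelope = lam ** k              # weight on a sample k steps in the past

w100 = envelope[100]             # roughly 0.37
w500 = envelope[500]             # roughly 0.007
window = 1.0 / (1.0 - lam)       # effective window length, about 100 samples
```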
Trade-off: tracking vs. stability
Choosing $\lambda$ involves a fundamental trade-off:
- Smaller $\lambda$ (e.g., 0.95): faster tracking of system changes, but higher misadjustment noise. The algorithm reacts quickly but is more sensitive to measurement noise, and the covariance matrix can grow, potentially causing instability.
- Larger $\lambda$ (e.g., 0.999): smoother, more stable estimates, but slower response to system changes. The algorithm may lag behind rapid variations.
In practice, the optimal $\lambda$ depends on the rate of system variation relative to the noise level. For slowly varying systems in moderate noise, $\lambda = 0.99$ to $0.999$ is typical. For rapidly changing environments, values closer to $0.95$ may be needed.
Variants and extensions
Sliding window RLS
Instead of exponential weighting over the entire data history, sliding window RLS uses only the most recent $L$ samples. Data older than $L$ steps is discarded entirely. This provides a hard cutoff rather than the soft exponential decay of standard RLS.
Sliding window RLS can be more appropriate when the system undergoes abrupt changes (where you want to completely forget old data) and avoids the gradual covariance growth that can occur with $\lambda < 1$. The trade-off is additional bookkeeping to manage the window boundaries.
Stabilized fast RLS
Standard RLS can suffer from numerical instability, particularly when is close to 1 or the input signal is poorly conditioned. The covariance matrix may lose symmetry or positive definiteness due to accumulated rounding errors.
Stabilized variants address this through matrix factorization techniques:
- Square-root RLS: propagates a square-root factor of $\mathbf{P}(n)$ instead of $\mathbf{P}(n)$ itself, which guarantees positive definiteness
- UD factorization: decomposes $\mathbf{P}(n)$ into $\mathbf{U}\mathbf{D}\mathbf{U}^T$ (unit upper triangular times diagonal times its transpose), providing better numerical conditioning
These methods add some implementation complexity but are strongly recommended for long-running or safety-critical applications.
QR decomposition-based RLS
QR-RLS reformulates the problem by maintaining a QR decomposition of the (weighted) data matrix rather than explicitly tracking $\mathbf{P}(n)$. At each time step, the QR factorization is updated using Givens rotations or Householder reflections.
QR-RLS offers superior numerical stability compared to standard RLS and is particularly well-suited for high filter orders or ill-conditioned inputs. The per-sample cost remains $O(M^2)$, but the constant factor is somewhat larger than for standard RLS.
Applications of RLS
System identification
RLS is widely used to estimate the parameters of an unknown system from its input-output data. The adaptive filter models the unknown system, and RLS adjusts the filter weights until the model output matches the actual system output.
This is valuable in control systems (estimating plant dynamics online), acoustic modeling (estimating room impulse responses), and biomedical engineering (modeling physiological systems). RLS is preferred over LMS here when fast, accurate identification is needed or when the system parameters drift over time.
Channel equalization
In digital communications, the transmitted signal is distorted by the channel (multipath propagation, frequency-selective fading). An RLS-based equalizer estimates the inverse of the channel response and applies it to the received signal to recover the original data.
RLS equalizers adapt quickly to changing channel conditions, which is critical in mobile wireless systems where the channel can vary on a millisecond timescale. The fast convergence of RLS means fewer training symbols are needed compared to LMS-based equalizers, improving spectral efficiency.
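As a concrete (noiseless, hypothetical) illustration, the sketch below trains an RLS equalizer on BPSK symbols sent through a simple two-tap channel. The channel taps, filter order, and forgetting factor are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, lam = 500, 8, 0.99
s = rng.choice([-1.0, 1.0], size=N)     # BPSK training symbols
h = np.array([1.0, 0.5])                # illustrative channel impulse response
y = np.convolve(s, h)[:N]               # received signal (noiseless for clarity)

w, P = np.zeros(M), np.eye(M) / 0.01
out = np.zeros(N)
for n in range(M, N):
    u = y[n - M + 1:n + 1][::-1]
    Pu = P @ u
    k = Pu / (lam + u @ Pu)
    out[n] = w @ u                      # equalizer output (a priori)
    w = w + k * (s[n] - out[n])         # training-directed adaptation
    P = (P - np.outer(k, u) @ P) / lam

decisions = np.sign(out[100:])          # hard decisions after convergence
```

After the short training transient, the equalizer output closely approximates the transmitted symbols, so hard decisions recover them exactly in this noiseless setting.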
Noise cancellation
In adaptive noise cancellation, a reference signal correlated with the noise (but not with the desired signal) is filtered by an adaptive filter. The filter output estimates the noise component, which is then subtracted from the noisy observation.
RLS provides faster noise suppression than LMS because it converges more quickly to the optimal noise estimate. Applications include removing 50/60 Hz power line interference from ECG signals, canceling engine noise in hands-free communication systems, and suppressing acoustic echo in teleconferencing.
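A minimal noise-cancellation sketch along these lines, assuming a synthetic 50 Hz interferer and a hypothetical 3 Hz "signal of interest" (all frequencies, amplitudes, and parameters are illustrative):

```python
import numpy as np

N, M, lam, fs = 4000, 4, 0.995, 500.0
t = np.arange(N) / fs

s = np.sin(2 * np.pi * 3 * t)                    # hypothetical signal of interest
noise = 0.8 * np.sin(2 * np.pi * 50 * t + 0.7)   # interference reaching the sensor
primary = s + noise                              # corrupted observation
ref = np.sin(2 * np.pi * 50 * t)                 # reference correlated with the noise

# RLS predicts the interference from the reference;
# the a priori error is the cleaned signal.
w, P = np.zeros(M), np.eye(M) / 0.01
clean = np.zeros(N)
for n in range(M, N):
    u = ref[n - M + 1:n + 1][::-1]
    Pu = P @ u
    k = Pu / (lam + u @ Pu)
    clean[n] = primary[n] - w @ u
    w = w + k * clean[n]
    P = (P - np.outer(k, u) @ P) / lam

# After convergence, the residual should track s closely
mse = np.mean((clean[N // 2:] - s[N // 2:]) ** 2)
```

The key design point is that the desired signal for the adaptive filter is the corrupted observation itself: whatever the filter can predict from the reference is removed, and the unpredictable part (the signal of interest) survives as the error.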
Comparison with other algorithms
RLS vs. LMS
| Property | RLS | LMS |
|---|---|---|
| Update basis | Second-order statistics (covariance matrix) | First-order (instantaneous gradient) |
| Convergence speed | Fast, nearly independent of eigenvalue spread | Slow when eigenvalue spread is large |
| Per-sample complexity | $O(M^2)$ | $O(M)$ |
| Memory requirements | $O(M^2)$ (stores the $M \times M$ matrix $\mathbf{P}$) | $O(M)$ |
| Tracking ability | Superior | Adequate for slowly varying systems |
| Numerical stability | Requires care (factorization methods) | Inherently stable |
| Tuning parameter | Forgetting factor $\lambda$ | Step size $\mu$ |
The choice between them depends on your constraints. If computational resources are limited and convergence speed is not critical, LMS is simpler and more robust. If you need fast convergence or are dealing with colored (non-white) input signals, RLS is the better choice.
RLS vs. Kalman filter
RLS can be viewed as a special case of the Kalman filter. Specifically, if you model the filter weights as the state vector with a random-walk state transition model and set the process noise covariance appropriately, the Kalman filter equations reduce to the RLS equations.
The Kalman filter is more general:
- It handles state-space models with arbitrary dynamics (not just static or random-walk weight models)
- It can incorporate known process noise statistics
- It provides uncertainty estimates (error covariance) alongside state estimates
RLS is preferred when you specifically need an adaptive FIR filter and don't have a detailed state-space model. The Kalman filter is preferred when you have a physical model of how the system evolves and want to exploit that structure.
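The correspondence is easy to verify numerically for the growing-window case: with $\lambda = 1$, an identity state transition, zero process noise, and unit measurement noise, the Kalman measurement update reproduces the RLS recursion step for step. The NumPy sketch below uses arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 3, 50
X = rng.standard_normal((N, M))                 # rows are regressors x(n)^T
d = rng.standard_normal(N)                      # arbitrary measurements

w_rls, P_rls = np.zeros(M), 100.0 * np.eye(M)   # RLS with lambda = 1
w_kf, P_kf = np.zeros(M), 100.0 * np.eye(M)     # Kalman: F = I, Q = 0, R = 1

for x, dn in zip(X, d):
    # RLS recursion (growing window)
    k = P_rls @ x / (1.0 + x @ P_rls @ x)
    w_rls = w_rls + k * (dn - w_rls @ x)
    P_rls = P_rls - np.outer(k, x) @ P_rls
    # Kalman measurement update with observation row H = x^T
    K = P_kf @ x / (x @ P_kf @ x + 1.0)
    w_kf = w_kf + K * (dn - x @ w_kf)
    P_kf = (np.eye(M) - np.outer(K, x)) @ P_kf

# w_rls == w_kf and P_rls == P_kf up to rounding
```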
Numerical examples
Simulated data
Simulated examples are the standard way to verify an RLS implementation and study its behavior. A typical setup:
- Generate a known FIR system with coefficients $\mathbf{w}_{\text{true}}$
- Drive it with white Gaussian noise to produce the desired signal
- Add observation noise at a specified SNR
- Run RLS and monitor the weight error norm over time
You can then vary $\lambda$, the filter order, or the SNR to observe how convergence speed, steady-state misadjustment, and tracking behavior change. Comparing the RLS learning curve against LMS under the same conditions clearly demonstrates the convergence advantage.
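A minimal version of that experiment, comparing RLS and LMS on the same data, can be sketched as follows (the seed, filter order, noise level, and LMS step size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 1000, 8
h = rng.standard_normal(M)                  # hypothetical "unknown" system
x = rng.standard_normal(N)                  # white Gaussian excitation
d = np.convolve(x, h)[:N] + 0.01 * rng.standard_normal(N)  # observation noise

def weight_error(w):                        # normalized weight error norm
    return np.linalg.norm(w - h) / np.linalg.norm(h)

lam, delta, mu = 0.999, 0.01, 0.05
w_rls, P = np.zeros(M), np.eye(M) / delta   # RLS state
w_lms = np.zeros(M)                         # LMS state
err_rls, err_lms = [], []
for n in range(M, N):
    u = x[n - M + 1:n + 1][::-1]
    Pu = P @ u
    k = Pu / (lam + u @ Pu)
    w_rls = w_rls + k * (d[n] - w_rls @ u)       # RLS update
    P = (P - np.outer(k, u) @ P) / lam
    w_lms = w_lms + mu * (d[n] - w_lms @ u) * u  # LMS update
    err_rls.append(weight_error(w_rls))
    err_lms.append(weight_error(w_lms))
```

Plotting `err_rls` and `err_lms` against the iteration index gives the learning curves; early in the run, the RLS weight error is well below the LMS weight error at the same iteration.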
Real-world signals
Real-world examples demonstrate RLS under practical conditions where the signal statistics are unknown and non-stationary. Common demonstrations include:
- Acoustic echo cancellation: using a recorded speech signal and a measured room impulse response
- ECG denoising: removing power line interference from clinical ECG recordings
- Wireless channel equalization: using measured or standards-based channel models (e.g., ITU pedestrian or vehicular channel profiles)
These examples highlight challenges that don't appear in simulations, such as model mismatch (the true system may not be exactly FIR), non-Gaussian noise, and abrupt system changes.
Implementation considerations
Numerical stability
Numerical stability is the most important practical concern when implementing RLS. The covariance matrix must remain symmetric and positive definite throughout operation. In finite-precision arithmetic, accumulated rounding errors can violate these properties, leading to divergence.
Practical measures to ensure stability:
- Use symmetrization: periodically replace $\mathbf{P}$ with $\frac{1}{2}(\mathbf{P} + \mathbf{P}^T)$
- Use square-root or UD factorization variants for long-running applications
- Initialize $\mathbf{P}(0)$ carefully; too large an initial value can cause transient instability, too small can slow initial convergence
- Monitor the trace or eigenvalues of $\mathbf{P}(n)$ as a diagnostic
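The symmetrization and positive-definiteness checks can be packaged as a small helper, sketched here in NumPy (the function name and the Cholesky-based check are one possible design, not a standard API):

```python
import numpy as np

def stabilize(P):
    """Re-symmetrize P and verify it is still positive definite.

    Cholesky succeeds only for symmetric positive definite matrices, so a
    LinAlgError here signals that P should be reinitialized (or that a
    square-root/UD variant is needed).
    """
    P = 0.5 * (P + P.T)         # enforce symmetry
    np.linalg.cholesky(P)       # raises np.linalg.LinAlgError if not SPD
    return P

# e.g., inside the RLS loop, call every few hundred samples:
# P = stabilize(P)
```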
Computational efficiency
For real-time systems, the $O(M^2)$ per-sample cost sets a limit on the maximum filter order at a given sample rate. For example, at a 48 kHz audio sample rate, you have roughly 20 microseconds per sample; whether a given filter order fits in that budget depends on your processor.
Strategies to improve efficiency:
- Exploit symmetry of $\mathbf{P}$ to halve the storage and reduce multiplications
- Use fast RLS variants (lattice RLS, fast transversal filter) for $O(M)$ complexity when applicable
- Consider block RLS, which processes multiple samples at once and can leverage vectorized hardware
- Use fixed-point arithmetic on embedded platforms, though this requires careful scaling analysis
Hardware requirements
RLS requires storing the full $M \times M$ matrix $\mathbf{P}$, so memory scales as $O(M^2)$. For filter orders in the few hundreds, that is roughly 1 MB in double precision. This is manageable on modern processors but can be a constraint on microcontrollers or low-power DSPs.
For demanding applications (large $M$, high sample rates), FPGA implementations can exploit the regular structure of the matrix operations for pipelined, parallel execution. GPU acceleration is another option for offline or batch processing scenarios where latency is less critical.