~/satyajit

The Kalman filter from first principles

mdjsonmcp

2026-06-28 · 7 min · state-estimation · kalman-filter · slam · robotics · explainer

I spend a lot of time on odometry and SLAM, and underneath almost all of it is the same idea: you have a noisy sensor, and a model of how the thing you're tracking moves, and you want to fuse them into a single best estimate that also tells you how much to trust itself. That's the Kalman filter. It's the optimal recursive estimator for a linear system with Gaussian noise, and once it clicks, you see it everywhere — GPS, IMUs, radar tracking, and the LiDAR-inertial filters I'll build on in the next article.

Let me derive it the way it actually makes sense: as fusing two Gaussians.

Two sources of information

You're tracking a state x\mathbf{x} — say the position and velocity of something. You have two things:

  1. A process model: how the state evolves on its own. For constant velocity, pt=pt1+vΔtp_{t} = p_{t-1} + v\,\Delta t. You can predict forward, but the prediction drifts — it accumulates uncertainty.
  2. A measurement model: a sensor that observes some function of the state, noisily. A position sensor gives you z=p+noisez = p + \text{noise}.

Neither is enough alone. The prediction drifts; the sensor is noisy and jittery. The Kalman filter combines them, and the key fact is that combining two Gaussian estimates yields a third Gaussian that is sharper than either input.

prediction × measurement = posterior
predictionmeasurementposterior
prediction spread σ_pred10
measurement spread σ_z7
Kalman gain K
0.67
posterior mean
55.4
posterior σ
5.7 (< both)

The posterior mean is μpred + K(z − μpred): the gain K = σ²pred / (σ²pred + σ²z) is the fraction of the way you move from the prediction toward the measurement. Shrink the sensor noise and K → 1 (snap to the measurement); shrink the prediction noise and K → 0 (keep the prediction). Either way the posterior is narrower than both inputs — combining two noisy estimates always tells you more than either alone. That’s the whole filter.

That's the whole update step. If the prediction is N(μ1,σ12)\mathcal{N}(\mu_1, \sigma_1^2) and the measurement is N(μ2,σ22)\mathcal{N}(\mu_2, \sigma_2^2), their normalized product has

μ=μ1+K(μ2μ1),σ2=(1K)σ12,K=σ12σ12+σ22.\mu = \mu_1 + K(\mu_2 - \mu_1), \qquad \sigma^2 = (1-K)\,\sigma_1^2, \qquad K = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.

KK is the Kalman gain — the fraction of the way you move from the prediction toward the measurement, set by which one is more certain. Trust the sensor (σ20\sigma_2 \to 0) and K1K \to 1; trust the prediction (σ10\sigma_1 \to 0) and K0K \to 0. Everything else is this, generalized to vectors.

The predict-update cycle

In general the state is a vector x\mathbf{x} with covariance P\mathbf{P}, and the models are linear maps with Gaussian noise:

xt=Fxt1+But+w,wN(0,Q);zt=Hxt+v,vN(0,R).\mathbf{x}_t = \mathbf{F}\mathbf{x}_{t-1} + \mathbf{B}\mathbf{u}_t + \mathbf{w}, \quad \mathbf{w}\sim\mathcal{N}(0,\mathbf{Q}); \qquad \mathbf{z}_t = \mathbf{H}\mathbf{x}_t + \mathbf{v}, \quad \mathbf{v}\sim\mathcal{N}(0,\mathbf{R}).

F\mathbf{F} is the state-transition, H\mathbf{H} maps state to measurement, Q\mathbf{Q} is process noise (how much the model drifts), R\mathbf{R} is measurement noise. The filter alternates two steps forever:

Predict — push the state and its uncertainty forward through the model:

x^=Fx+Bu,P^=FPF ⁣+Q.\hat{\mathbf{x}} = \mathbf{F}\mathbf{x} + \mathbf{B}\mathbf{u}, \qquad \hat{\mathbf{P}} = \mathbf{F}\mathbf{P}\mathbf{F}^{\!\top} + \mathbf{Q}.

Update — correct with the measurement, by the matrix Kalman gain:

y=zHx^(innovation),S=HP^H ⁣+R(innovation covariance),\mathbf{y} = \mathbf{z} - \mathbf{H}\hat{\mathbf{x}} \quad(\text{innovation}), \qquad \mathbf{S} = \mathbf{H}\hat{\mathbf{P}}\mathbf{H}^{\!\top} + \mathbf{R} \quad(\text{innovation covariance}), K=P^H ⁣S1,x=x^+Ky,P=(IKH)P^.\mathbf{K} = \hat{\mathbf{P}}\mathbf{H}^{\!\top}\mathbf{S}^{-1}, \qquad \mathbf{x} = \hat{\mathbf{x}} + \mathbf{K}\mathbf{y}, \qquad \mathbf{P} = (\mathbf{I} - \mathbf{K}\mathbf{H})\hat{\mathbf{P}}.
predictx̂=Fx · P̂=FPFᵀ+Qupdatex=x̂+K(z−Hx̂)prior x̂, P̂posterior x, P → next stepmeasurement z
The Kalman loop: predict pushes the estimate and its covariance forward through the motion model (uncertainty grows by Q); update pulls it back toward the measurement by the gain K (uncertainty shrinks). The two steps alternate for every timestep.

Watch it run on a constant-velocity tracker. The state is [position,velocity][\text{position}, \text{velocity}], the sensor sees only position, and the filter has to infer velocity and smooth the noise. Drag RR and QQ to feel the trust trade-off the gain encodes:

constant-velocity Kalman filter · live
truthmeasurementsestimate ±1σ
measurement noise R (distrust sensor)49
process noise Q (let state drift)0.50

Raise R and the filter trusts its prediction over the sensor — the estimate smooths out but lags. Raise Qand it trusts fresh measurements more — it snaps to the data but gets jittery. The Kalman gain is exactly this ratio, computed optimally each step from the covariances, and the band is the filter’s own honest estimate of how unsure it is.

In code

The whole thing is a few lines of linear algebra. Here's the constant-velocity tracker above, in NumPy — no framework, runnable as-is:

import numpy as np
 
dt = 1.0
F = np.array([[1, dt],      # constant-velocity transition
              [0, 1]])
H = np.array([[1.0, 0.0]])  # observe position only
Q = 0.5 * np.array([[dt**3/3, dt**2/2],   # process noise (drift)
                    [dt**2/2, dt]])
R = np.array([[49.0]])      # measurement noise (sensor variance)
 
x = np.array([[0.0], [0.0]])      # initial state
P = np.eye(2) * 50.0              # initial uncertainty
 
def step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    y = z - H @ x                 # innovation
    S = H @ P @ H.T + R           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
 
for z in measurements:            # stream of noisy position readings
    x, P = step(x, P, np.array([[z]]))
    # x[0] is the smoothed position estimate, P[0,0] its variance

Two knobs do all the tuning. R says how noisy the sensor is; Q says how much you let the model drift. Get their ratio right and the filter is optimal; get it wrong and it either lags reality (Q too small) or chases noise (R too small). In practice you start from the sensor's datasheet for R and tune Q until the innovation y\mathbf{y} looks like white noise.

When the world isn't linear

The plain Kalman filter assumes F\mathbf{F} and H\mathbf{H} are linear. Robotics is full of rotations and projections that aren't. Three extensions matter, and the last one is the bridge to LiDAR-inertial SLAM:

That last combination — an iterated, error-state Kalman filter on a manifold — is exactly the engine inside FAST-LIO2, fusing a high-rate IMU prediction with thousands of raw LiDAR points per scan. The Kalman gain there gets one more clever rewrite to handle those thousands of measurements cheaply, which is where I'll pick up next.

The one-paragraph summary

A Kalman filter holds a Gaussian belief over a state. Predict moves the belief through a motion model and inflates its uncertainty; update multiplies it by the measurement's Gaussian, which sharpens it and pulls the mean toward the data by the gain K\mathbf{K} — optimally weighted by which source is more certain. Linearize for nonlinear systems (EKF), relinearize for hard ones (iEKF), and track the error in the tangent space for rotations. That's the whole toolkit, and it runs real robots.

share