# The Kalman filter from first principles

> Satyajit Ghana — Head of Engineering @ Inkers Technology
> canonical: https://ai.thesatyajit.com/articles/kalman-filter
> date: 2026-06-28
> tags: state-estimation, kalman-filter, slam, robotics, explainer
I spend a lot of time on odometry and SLAM, and underneath almost all of it is the same
idea: you have a noisy sensor, and a model of how the thing you're tracking moves, and you
want to fuse them into a single best estimate that also tells you how much to trust itself.
That's the Kalman filter. It's the optimal recursive estimator for a linear system with
Gaussian noise, and once it clicks, you see it everywhere — GPS, IMUs, radar tracking, and
the LiDAR-inertial filters I'll build on in the next article.

Let me derive it the way it actually makes sense: as fusing two Gaussians.

## Two sources of information

You're tracking a state $\mathbf{x}$ — say the position and velocity of something. You have
two things:

1. A **process model**: how the state evolves on its own. For constant velocity,
   $p_{t} = p_{t-1} + v\,\Delta t$. You can *predict* forward, but the prediction drifts —
   it accumulates uncertainty.
2. A **measurement model**: a sensor that observes some function of the state, noisily.
   A position sensor gives you $z = p + \text{noise}$.

Neither is enough alone. The prediction drifts; the sensor is noisy and jittery. The
Kalman filter combines them, and the key fact is that **combining two Gaussian estimates
yields a third Gaussian that is sharper than either input**.

<KalmanFuse />

That's the whole update step. If the prediction is $\mathcal{N}(\mu_1, \sigma_1^2)$ and the
measurement is $\mathcal{N}(\mu_2, \sigma_2^2)$, their normalized product has

$$
\mu = \mu_1 + K(\mu_2 - \mu_1), \qquad \sigma^2 = (1-K)\,\sigma_1^2,
\qquad K = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.
$$

$K$ is the **Kalman gain** — the fraction of the way you move from the prediction toward
the measurement, set by which one is more certain. Trust the sensor ($\sigma_2 \to 0$) and
$K \to 1$; trust the prediction ($\sigma_1 \to 0$) and $K \to 0$. Everything else is this,
generalized to vectors.

## The predict-update cycle

In general the state is a vector $\mathbf{x}$ with covariance $\mathbf{P}$, and the models
are linear maps with Gaussian noise:

$$
\mathbf{x}_t = \mathbf{F}\mathbf{x}_{t-1} + \mathbf{B}\mathbf{u}_t + \mathbf{w},
\quad \mathbf{w}\sim\mathcal{N}(0,\mathbf{Q});
\qquad
\mathbf{z}_t = \mathbf{H}\mathbf{x}_t + \mathbf{v}, \quad \mathbf{v}\sim\mathcal{N}(0,\mathbf{R}).
$$

$\mathbf{F}$ is the state-transition, $\mathbf{H}$ maps state to measurement, $\mathbf{Q}$
is process noise (how much the model drifts), $\mathbf{R}$ is measurement noise. The filter
alternates two steps forever:

**Predict** — push the state and its uncertainty forward through the model:

$$
\hat{\mathbf{x}} = \mathbf{F}\mathbf{x} + \mathbf{B}\mathbf{u}, \qquad
\hat{\mathbf{P}} = \mathbf{F}\mathbf{P}\mathbf{F}^{\!\top} + \mathbf{Q}.
$$

**Update** — correct with the measurement, by the matrix Kalman gain:

$$
\mathbf{y} = \mathbf{z} - \mathbf{H}\hat{\mathbf{x}}
\quad(\text{innovation}), \qquad
\mathbf{S} = \mathbf{H}\hat{\mathbf{P}}\mathbf{H}^{\!\top} + \mathbf{R}
\quad(\text{innovation covariance}),
$$
$$
\mathbf{K} = \hat{\mathbf{P}}\mathbf{H}^{\!\top}\mathbf{S}^{-1}, \qquad
\mathbf{x} = \hat{\mathbf{x}} + \mathbf{K}\mathbf{y}, \qquad
\mathbf{P} = (\mathbf{I} - \mathbf{K}\mathbf{H})\hat{\mathbf{P}}.
$$

<Diagram caption="The Kalman loop: predict pushes the estimate and its covariance forward through the motion model (uncertainty grows by Q); update pulls it back toward the measurement by the gain K (uncertainty shrinks). The two steps alternate for every timestep.">
  <svg viewBox="0 0 620 170" role="img" aria-label="The predict-update cycle: predict grows covariance via the motion model, update shrinks it via the measurement and Kalman gain." style={{ width: "100%", height: "auto" }}>
    <rect x="40" y="56" width="170" height="58" rx="10" fill="oklch(0.72 0.13 60)" opacity="0.25" stroke="var(--border)" />
    <text x="125" y="80" textAnchor="middle" fontFamily="monospace" fontSize="12" fill="var(--foreground)">predict</text>
    <text x="125" y="98" textAnchor="middle" fontFamily="monospace" fontSize="9" fill="var(--muted-foreground)">x̂=Fx · P̂=FPFᵀ+Q</text>
    <rect x="410" y="56" width="170" height="58" rx="10" fill="oklch(0.72 0.13 195)" opacity="0.25" stroke="var(--border)" />
    <text x="495" y="80" textAnchor="middle" fontFamily="monospace" fontSize="12" fill="var(--foreground)">update</text>
    <text x="495" y="98" textAnchor="middle" fontFamily="monospace" fontSize="9" fill="var(--muted-foreground)">x=x̂+K(z−Hx̂)</text>
    {/* arrows */}
    <path d="M 210 70 q 100 -34 200 0" fill="none" stroke="var(--muted-foreground)" strokeWidth="1.3" markerEnd="url(#ka)" />
    <text x="310" y="36" textAnchor="middle" fontFamily="monospace" fontSize="9" fill="var(--muted-foreground)">prior x̂, P̂</text>
    <path d="M 410 100 q -100 34 -200 0" fill="none" stroke="var(--muted-foreground)" strokeWidth="1.3" markerEnd="url(#ka)" />
    <text x="310" y="146" textAnchor="middle" fontFamily="monospace" fontSize="9" fill="var(--muted-foreground)">posterior x, P → next step</text>
    <text x="495" y="34" textAnchor="middle" fontFamily="monospace" fontSize="9" fill="var(--muted-foreground)">measurement z</text>
    <line x1="495" y1="38" x2="495" y2="54" stroke="var(--muted-foreground)" strokeWidth="1.3" markerEnd="url(#ka)" />
    <defs><marker id="ka" markerWidth="7" markerHeight="7" refX="6" refY="3.5" orient="auto"><path d="M0,0 L7,3.5 L0,7 Z" fill="var(--muted-foreground)" /></marker></defs>
  </svg>
</Diagram>

Watch it run on a constant-velocity tracker. The state is $[\text{position}, \text{velocity}]$,
the sensor sees only position, and the filter has to infer velocity and smooth the noise.
Drag $R$ and $Q$ to feel the trust trade-off the gain encodes:

<KalmanTrack />

## In code

The whole thing is a few lines of linear algebra. Here's the constant-velocity tracker
above, in NumPy — no framework, runnable as-is:

```python
import numpy as np

dt = 1.0
F = np.array([[1, dt],      # constant-velocity transition
              [0, 1]])
H = np.array([[1.0, 0.0]])  # observe position only
Q = 0.5 * np.array([[dt**3/3, dt**2/2],   # process noise (drift)
                    [dt**2/2, dt]])
R = np.array([[49.0]])      # measurement noise (sensor variance)

x = np.array([[0.0], [0.0]])      # initial state
P = np.eye(2) * 50.0              # initial uncertainty

def step(x, P, z):
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    y = z - H @ x                 # innovation
    S = H @ P @ H.T + R           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

for z in measurements:            # stream of noisy position readings
    x, P = step(x, P, np.array([[z]]))
    # x[0] is the smoothed position estimate, P[0,0] its variance
```

Two knobs do all the tuning. `R` says how noisy the sensor is; `Q` says how much you let
the model drift. Get their ratio right and the filter is optimal; get it wrong and it either
lags reality (`Q` too small) or chases noise (`R` too small). In practice you start from the
sensor's datasheet for `R` and tune `Q` until the innovation $\mathbf{y}$ looks like white
noise.

<Callout type="note">
For numerical robustness in real systems, use the **Joseph form** of the covariance update,
$\mathbf{P} = (\mathbf{I}-\mathbf{KH})\hat{\mathbf{P}}(\mathbf{I}-\mathbf{KH})^{\!\top} +
\mathbf{KRK}^{\!\top}$, which stays symmetric positive-definite even with floating-point
error, where the compact $(\mathbf{I}-\mathbf{KH})\hat{\mathbf{P}}$ can drift and diverge.
</Callout>

## When the world isn't linear

The plain Kalman filter assumes $\mathbf{F}$ and $\mathbf{H}$ are linear. Robotics is full
of rotations and projections that aren't. Three extensions matter, and the last one is the
bridge to LiDAR-inertial SLAM:

- **Extended KF (EKF).** The dynamics $f$ and measurement $h$ are nonlinear, so you
  *linearize* them at the current estimate: use the Jacobians
  $\mathbf{F} = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}}$ and
  $\mathbf{H} = \left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}}$ in
  place of the matrices, run the same equations. Cheap, and it works when the nonlinearity
  is mild over one step.
- **Iterated EKF (iEKF).** One linearization point can be bad if the prior is far from the
  truth. So *relinearize*: after the update, recompute $\mathbf{H}$ at the new estimate and
  redo the update, iterating until it converges. Each iteration is a Gauss-Newton step on
  the maximum-a-posteriori objective. This is what FAST-LIO uses — it matters because the
  point-to-plane LiDAR residual is very nonlinear in the pose.
- **Error-state / on-manifold KF.** You can't add a vector to a rotation
  $\mathbf{R}\in SO(3)$ and stay on the manifold. So you track the state on the manifold but
  the *error* (and its covariance) in the tangent space, fusing with the $\boxplus/\boxminus$
  operators instead of $+/-$. This keeps rotations valid and the covariance minimal
  (3 numbers for orientation, not 9).

That last combination — an **iterated, error-state Kalman filter on a manifold** — is
exactly the engine inside FAST-LIO2, fusing a high-rate IMU prediction with thousands of raw
LiDAR points per scan. The Kalman gain there gets one more clever rewrite to handle those
thousands of measurements cheaply, which is where I'll pick up next.

## The one-paragraph summary

A Kalman filter holds a Gaussian belief over a state. **Predict** moves the belief through a
motion model and inflates its uncertainty; **update** multiplies it by the measurement's
Gaussian, which sharpens it and pulls the mean toward the data by the gain $\mathbf{K}$ —
optimally weighted by which source is more certain. Linearize for nonlinear systems (EKF),
relinearize for hard ones (iEKF), and track the error in the tangent space for rotations.
That's the whole toolkit, and it runs real robots.
