# The Kalman filter from first principles > Satyajit Ghana — Head of Engineering @ Inkers Technology > canonical: https://ai.thesatyajit.com/articles/kalman-filter > date: 2026-06-28 > tags: state-estimation, kalman-filter, slam, robotics, explainer I spend a lot of time on odometry and SLAM, and underneath almost all of it is the same idea: you have a noisy sensor, and a model of how the thing you're tracking moves, and you want to fuse them into a single best estimate that also tells you how much to trust itself. That's the Kalman filter. It's the optimal recursive estimator for a linear system with Gaussian noise, and once it clicks, you see it everywhere — GPS, IMUs, radar tracking, and the LiDAR-inertial filters I'll build on in the next article. Let me derive it the way it actually makes sense: as fusing two Gaussians. ## Two sources of information You're tracking a state $\mathbf{x}$ — say the position and velocity of something. You have two things: 1. A **process model**: how the state evolves on its own. For constant velocity, $p_{t} = p_{t-1} + v\,\Delta t$. You can *predict* forward, but the prediction drifts — it accumulates uncertainty. 2. A **measurement model**: a sensor that observes some function of the state, noisily. A position sensor gives you $z = p + \text{noise}$. Neither is enough alone. The prediction drifts; the sensor is noisy and jittery. The Kalman filter combines them, and the key fact is that **combining two Gaussian estimates yields a third Gaussian that is sharper than either input**. That's the whole update step. If the prediction is $\mathcal{N}(\mu_1, \sigma_1^2)$ and the measurement is $\mathcal{N}(\mu_2, \sigma_2^2)$, their normalized product has $$ \mu = \mu_1 + K(\mu_2 - \mu_1), \qquad \sigma^2 = (1-K)\,\sigma_1^2, \qquad K = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}. $$ $K$ is the **Kalman gain** — the fraction of the way you move from the prediction toward the measurement, set by which one is more certain. Trust the sensor ($\sigma_2 \to 0$) and $K \to 1$; trust the prediction ($\sigma_1 \to 0$) and $K \to 0$. Everything else is this, generalized to vectors. ## The predict-update cycle In general the state is a vector $\mathbf{x}$ with covariance $\mathbf{P}$, and the models are linear maps with Gaussian noise: $$ \mathbf{x}_t = \mathbf{F}\mathbf{x}_{t-1} + \mathbf{B}\mathbf{u}_t + \mathbf{w}, \quad \mathbf{w}\sim\mathcal{N}(0,\mathbf{Q}); \qquad \mathbf{z}_t = \mathbf{H}\mathbf{x}_t + \mathbf{v}, \quad \mathbf{v}\sim\mathcal{N}(0,\mathbf{R}). $$ $\mathbf{F}$ is the state-transition, $\mathbf{H}$ maps state to measurement, $\mathbf{Q}$ is process noise (how much the model drifts), $\mathbf{R}$ is measurement noise. The filter alternates two steps forever: **Predict** — push the state and its uncertainty forward through the model: $$ \hat{\mathbf{x}} = \mathbf{F}\mathbf{x} + \mathbf{B}\mathbf{u}, \qquad \hat{\mathbf{P}} = \mathbf{F}\mathbf{P}\mathbf{F}^{\!\top} + \mathbf{Q}. $$ **Update** — correct with the measurement, by the matrix Kalman gain: $$ \mathbf{y} = \mathbf{z} - \mathbf{H}\hat{\mathbf{x}} \quad(\text{innovation}), \qquad \mathbf{S} = \mathbf{H}\hat{\mathbf{P}}\mathbf{H}^{\!\top} + \mathbf{R} \quad(\text{innovation covariance}), $$ $$ \mathbf{K} = \hat{\mathbf{P}}\mathbf{H}^{\!\top}\mathbf{S}^{-1}, \qquad \mathbf{x} = \hat{\mathbf{x}} + \mathbf{K}\mathbf{y}, \qquad \mathbf{P} = (\mathbf{I} - \mathbf{K}\mathbf{H})\hat{\mathbf{P}}. $$ Watch it run on a constant-velocity tracker. The state is $[\text{position}, \text{velocity}]$, the sensor sees only position, and the filter has to infer velocity and smooth the noise. Drag $R$ and $Q$ to feel the trust trade-off the gain encodes: ## In code The whole thing is a few lines of linear algebra. Here's the constant-velocity tracker above, in NumPy — no framework, runnable as-is: ```python import numpy as np dt = 1.0 F = np.array([[1, dt], # constant-velocity transition [0, 1]]) H = np.array([[1.0, 0.0]]) # observe position only Q = 0.5 * np.array([[dt**3/3, dt**2/2], # process noise (drift) [dt**2/2, dt]]) R = np.array([[49.0]]) # measurement noise (sensor variance) x = np.array([[0.0], [0.0]]) # initial state P = np.eye(2) * 50.0 # initial uncertainty def step(x, P, z): # predict x = F @ x P = F @ P @ F.T + Q # update y = z - H @ x # innovation S = H @ P @ H.T + R # innovation covariance K = P @ H.T @ np.linalg.inv(S) # Kalman gain x = x + K @ y P = (np.eye(2) - K @ H) @ P return x, P for z in measurements: # stream of noisy position readings x, P = step(x, P, np.array([[z]])) # x[0] is the smoothed position estimate, P[0,0] its variance ``` Two knobs do all the tuning. `R` says how noisy the sensor is; `Q` says how much you let the model drift. Get their ratio right and the filter is optimal; get it wrong and it either lags reality (`Q` too small) or chases noise (`R` too small). In practice you start from the sensor's datasheet for `R` and tune `Q` until the innovation $\mathbf{y}$ looks like white noise. For numerical robustness in real systems, use the **Joseph form** of the covariance update, $\mathbf{P} = (\mathbf{I}-\mathbf{KH})\hat{\mathbf{P}}(\mathbf{I}-\mathbf{KH})^{\!\top} + \mathbf{KRK}^{\!\top}$, which stays symmetric positive-definite even with floating-point error, where the compact $(\mathbf{I}-\mathbf{KH})\hat{\mathbf{P}}$ can drift and diverge. ## When the world isn't linear The plain Kalman filter assumes $\mathbf{F}$ and $\mathbf{H}$ are linear. Robotics is full of rotations and projections that aren't. Three extensions matter, and the last one is the bridge to LiDAR-inertial SLAM: - **Extended KF (EKF).** The dynamics $f$ and measurement $h$ are nonlinear, so you *linearize* them at the current estimate: use the Jacobians $\mathbf{F} = \left.\frac{\partial f}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}}$ and $\mathbf{H} = \left.\frac{\partial h}{\partial \mathbf{x}}\right|_{\hat{\mathbf{x}}}$ in place of the matrices, run the same equations. Cheap, and it works when the nonlinearity is mild over one step. - **Iterated EKF (iEKF).** One linearization point can be bad if the prior is far from the truth. So *relinearize*: after the update, recompute $\mathbf{H}$ at the new estimate and redo the update, iterating until it converges. Each iteration is a Gauss-Newton step on the maximum-a-posteriori objective. This is what FAST-LIO uses — it matters because the point-to-plane LiDAR residual is very nonlinear in the pose. - **Error-state / on-manifold KF.** You can't add a vector to a rotation $\mathbf{R}\in SO(3)$ and stay on the manifold. So you track the state on the manifold but the *error* (and its covariance) in the tangent space, fusing with the $\boxplus/\boxminus$ operators instead of $+/-$. This keeps rotations valid and the covariance minimal (3 numbers for orientation, not 9). That last combination — an **iterated, error-state Kalman filter on a manifold** — is exactly the engine inside FAST-LIO2, fusing a high-rate IMU prediction with thousands of raw LiDAR points per scan. The Kalman gain there gets one more clever rewrite to handle those thousands of measurements cheaply, which is where I'll pick up next. ## The one-paragraph summary A Kalman filter holds a Gaussian belief over a state. **Predict** moves the belief through a motion model and inflates its uncertainty; **update** multiplies it by the measurement's Gaussian, which sharpens it and pulls the mean toward the data by the gain $\mathbf{K}$ — optimally weighted by which source is more certain. Linearize for nonlinear systems (EKF), relinearize for hard ones (iEKF), and track the error in the tangent space for rotations. That's the whole toolkit, and it runs real robots.