2026-06-28 · 21 min · slam · lidar · state-estimation · point-cloud · explainer
FAST-LIO2 is the LiDAR-inertial odometry I keep coming back to: it's accurate, it runs at 100 Hz on a laptop, it survives 1000 deg/s rotations, and the whole thing is one tight loop around a Kalman filter. But the official code is a maze of templates, and the cleanest annotated fork is in Chinese. So this is the article I wanted: what FAST-LIO2 actually does, derived from first principles, with real code — and a path to rebuild the core without ROS, because once you can do that, you understand SLAM.
If the Kalman filter isn't fresh, read my Kalman piece first — FAST-LIO2 is exactly the "iterated, error-state, on-manifold" filter that article ends on, fed by a high-rate IMU and corrected by thousands of LiDAR points per scan.
The whole system in one loop
The problem: a LiDAR gives you ~100k 3D points per scan at 10 Hz, but a scan takes ~100 ms during which the sensor moves, so the points are distorted; and LiDAR alone is slow to register and fragile under fast motion. An IMU gives you 200–1000 Hz acceleration and angular velocity — great for short-term motion, but it drifts. Fuse them tightly and each fixes the other's weakness.

Before the math, here's the cycle as a sequence — what each stage does and, just as important, why it has to be there. It runs once per LiDAR scan; the IMU drives the prediction in between:
What: Read the next batch of 200–1000 Hz accelerometer + gyro samples.
Why: The IMU is the only thing fast enough to describe motion within a 100 ms LiDAR sweep.
Two contributions set FAST-LIO2 apart from its predecessor:
- Direct registration. No edge/plane feature extraction — it registers raw points to the map by point-to-plane residuals. Less to tune, and it uses all the geometry.
- The ikd-Tree. An incremental k-d tree that inserts, deletes, downsamples, and re-balances in place, so the map updates in real time instead of being rebuilt.
Underneath both is a tightly-coupled iterated error-state Kalman filter on a manifold.
Let's build it piece by piece. I'll quote C++ from
zlwang7/S-FAST_LIO — a clean reimplementation
that writes the filter out explicitly instead of hiding it in template magic — and give
simplified NumPy alongside.
The state lives on a manifold
You can't store an orientation as a 3-vector and add to it — rotations live on the manifold . FAST-LIO2 tracks a 24-dimensional nominal state but a 23-dimensional error state (and covariance), because each rotation needs 3 tangent dimensions, not the 4 of a quaternion, and gravity lives on the sphere (2 dimensions). The state is:
position, attitude, LiDAR→IMU extrinsic rotation and translation, velocity, gyro bias, accel bias, and gravity. In the clean code that's one manifold declaration:
// include/use-ikfom.hpp — 24-D nominal, 23-D tangent
MTK_BUILD_MANIFOLD(state_ikfom,
((vect3, pos)) ((SO3, rot))
((SO3, offset_R_L_I)) ((vect3, offset_T_L_I))
((vect3, vel)) ((vect3, bg)) ((vect3, ba))
((S2, grav)));We update on the manifold with (retraction) and measure differences with — for the part, and . Everything else is ordinary vector .
Forward propagation: ride the IMU
Between LiDAR scans, the IMU drives the state forward. The continuous kinematics are the standard strapdown model — position integrates velocity, attitude integrates de-biased angular velocity, velocity integrates de-biased, gravity-corrected acceleration:
with the biases doing a slow random walk. That's get_f verbatim:
// f(x,u): continuous-time kinematics
vect3 omega = in.gyro - s.bg; // ω = ω_m − b_g
vect3 a_inertial = s.rot * (in.acc - s.ba); // R(a_m − b_a)
res(i) = s.vel[i]; // ṗ = v
res(i + 3) = omega[i]; // Ṙ ← ω
res(i + 12) = a_inertial[i] + s.grav[i]; // v̇ = R(a_m−b_a) + gThe predict step pushes the mean forward by and the covariance forward with the error-state Jacobians :
void predict(double &dt, Matrix<double,12,12> &Q, const input_ikfom &i_in) {
Matrix<double,24,1> f_ = get_f(x_, i_in); // 24×1
Matrix<double,24,23> df_dx_ = df_dx(x_, i_in); // ∂f/∂x
Matrix<double,24,12> df_dw_ = df_dw(x_, i_in); // ∂f/∂w
x_ = x_.plus(f_, dt); // x ⊞ (dt·f)
// F_x = I + dt·A·df_dx , F_w = dt·df_dw (assembled via the boxplus Jacobian)
P_ = F_x1 * P_ * F_x1.transpose() + (dt*F_w1) * Q * (dt*F_w1).transpose();
}In NumPy the shape of it is just:
def predict(x, P, imu, dt, Q):
f = get_f(x, imu) # kinematics above
F_x, F_w = jacobians(x, imu, dt) # error-state transition + noise maps
x = boxplus(x, f * dt) # advance the mean on the manifold
P = F_x @ P @ F_x.T + F_w @ Q @ F_w.T
return x, PThis runs once per IMU sample, and the per-sample poses are cached — we need them next.
Backward propagation: deskew the scan
Because the scan sweeps over time while the platform moves, every point was measured from a slightly different pose. Stack them naively and a flat wall comes out sheared:
Every point lands in the same frame even though the sensor moved between measurements, so a flat wall comes out sheared by the platform's motion. Register this against the map and you'd fight your own motion. Faster motion, worse skew — turn the speed up.
FAST-LIO2 fixes this with backward propagation: walk the cached IMU poses from the scan-end time backward, and transform each point from the pose it was actually sampled at into the scan-end frame. For a point sampled at time with the IMU pose relative to the scan-end pose :
which is exactly the compensation in UndistortPcl:
M3D R_i(R_imu * Exp(angvel_avr, dt)); // attitude at this point's sample time
V3D T_ei(pos_imu + vel_imu*dt + 0.5*acc_imu*dt*dt - imu_state.pos);
V3D P_compensate = imu_state.offset_R_L_I.conjugate() *
(imu_state.rot.conjugate() * (R_i * (imu_state.offset_R_L_I * P_i
+ imu_state.offset_T_L_I) + T_ei) - imu_state.offset_T_L_I);Now every point lives in one consistent frame and it's safe to register against the map.
The measurement: point-to-plane
FAST-LIO2 doesn't extract features. For each deskewed point it transforms it into the world with the current state, finds its 5 nearest map points via the ikd-Tree, fits a plane to them, and the residual is the point-to-plane distance — zero when the point sits exactly on the surface:

For a point in the body frame, transformed to world
, a plane with
unit normal and offset gives residual .
That's h_share_model:
V3D p_global(s.rot * (s.offset_R_L_I * p_body + s.offset_T_L_I) + s.pos); // to world
ikdtree.Nearest_Search(point_world, NUM_MATCH_POINTS, points_near, sqDis); // 5 nearest
if (esti_plane(pabcd, points_near, 0.1f)) { // fit plane (a,b,c,d)
float pd2 = pabcd(0)*x + pabcd(1)*y + pabcd(2)*z + pabcd(3); // point-to-plane dist
...
}
// Jacobian row (w.r.t. attitude θ and extrinsic), residual = −distance
V3D C(s.rot.conjugate() * norm_vec); // Rᵀu
V3D A(point_I_crossmat * C); // (R_L^I p + t_L^I)^∧ Rᵀu
ekfom_data.h_x.block<1,12>(i,0) << norm_p.x, norm_p.y, norm_p.z, A, ...; // [ u | A | … ]
ekfom_data.h(i) = -norm_p.intensity; // the residualThe crucial detail: the Jacobian is — thousands of points, but only 12 columns (6 for pose, 6 for the extrinsic), because a single LiDAR scan can't observe velocity, biases, or gravity directly. Hold that thought; it's why the next step is fast. In NumPy:
def build_H_z(points_body, x, ikdtree, map_pts, R_LI, t_LI):
H, z = [], []
for p in points_body:
pw = x.R @ (R_LI @ p + t_LI) + x.p # body → world
nn = ikdtree.nearest(pw, k=5) # 5 nearest map points
n, d = fit_plane(map_pts[nn]) # unit normal, offset
r = n @ pw + d # point-to-plane distance
if abs(r) < 0.1: # keep confident matches
pI = R_LI @ p + t_LI
A = skew(pI) @ (x.R.T @ n) # ∂r/∂θ (attitude block)
H.append(np.concatenate([n, A])) # [ normal | attitude ]
z.append(-r)
return np.array(H), np.array(z) # H: m×6 (here), z: mThe iterated update, and the gain that makes it cheap
A single EKF update would linearize the very-nonlinear point-to-plane fit once, at a possibly-wrong pose, and be off. So FAST-LIO2 iterates: re-associate, rebuild at the latest estimate, take one Kalman step, repeat until the correction is tiny. Watch the scan snap onto the map:
Each iteration re-associates points to planes at the latest estimate, builds the measurement Jacobian H, and takes one Kalman step dx = K·h + (I−KH)(x⊟x̂). Relinearizing is what makes the very nonlinear point-to-plane fit converge.
Each iteration is
iterating until every component of drops below . The piece that makes FAST-LIO fast is the reformulated Kalman gain. The textbook form,
inverts an matrix — and is thousands of points. FAST-LIO uses the information-form identity to rewrite it as
which inverts a matrix — the state dimension — no matter how many points there are. That's the whole trick, and in clean code it's one block:
// R is a scalar (LASER_POINT_COV = 0.001), so R⁻¹ = 1/R
auto K_front = (HTH / R + P_.inverse()).inverse(); // (HᵀR⁻¹H + P⁻¹)⁻¹ — 23×23
K = K_front.block<23,12>(0,0) * H.transpose() / R; // … Hᵀ R⁻¹
Matrix<double,23,1> dx_ = K * dyn_share.h // K z
+ (Matrix<double,23,23>::Identity() - K*H) * dx_new; // (I−KH)(x ⊟ x̂)
x_ = x_.boxplus(dx_);
// convergence: every |dx_[j]| < epsi (0.001); then update covariance
P_ = (Matrix<double,23,23>::Identity() - K*H) * P_;The same loop in NumPy, with the cheap gain spelled out:
def update_iterated(x, P, points_body, ikdtree, map_pts, R=1e-3, max_iter=4, eps=1e-3):
x_prior = x.copy()
n = P.shape[0] # 23 (error-state dim)
for _ in range(max_iter):
H, z = build_H_z(points_body, x, ikdtree, map_pts, R_LI, t_LI) # relinearize
# information-form gain: invert (state × state), independent of len(z)
S = H.T @ H / R + np.linalg.inv(P) # n×n
K = np.linalg.solve(S, H.T) / R # K = S⁻¹ Hᵀ R⁻¹
dx = K @ z + (np.eye(n) - K @ H) @ (-boxminus(x, x_prior))
x = boxplus(x, dx)
if np.max(np.abs(dx)) < eps:
break
P = (np.eye(n) - K @ H) @ P
return x, PThat's the engine. The converged is your odometry output, published at LiDAR rate.
The map: an incremental k-d tree
The nearest-neighbor search in the measurement step is the hot path, and the map is growing and moving. A static k-d tree would be rebuilt every scan — fatal. The ikd-Tree instead inserts points in place, downsamples on the tree, deletes whole regions with one box-wise delete as the local map window slides with the sensor, and lazily re-balances only the subtrees that get lopsided:
A static k-d tree would have to be rebuilt from scratch every time the map changed. The ikd-Tree instead inserts and deletes points in place, does voxel downsampling on the tree itself, removes whole regions with one box-wise delete as the window slides, and lazily re-balances only the subtrees that get lopsided — so a moving robot keeps a bounded, balanced map and the nearest-neighbor search in the measurement step stays cheap.

In code it's a handful of calls:
ikdtree.Build(feats_down_world->points); // first scan
ikdtree.Add_Points(PointToAdd, true); // incremental insert + on-tree downsample
ikdtree.Delete_Point_Boxes(cub_needrm); // box-wise delete (window slid)
ikdtree.Nearest_Search(point_world, 5, near, d); // kNN, inside the measurement stepThe payoff is real: on the authors' benchmarks FAST-LIO2 spends less time per scan than FAST-LIO while holding a larger map, on both Intel and Arm.

Putting it together — and dropping ROS
Here's the entire main loop, which is shorter than you'd expect:
while (running) {
if (sync_packages(Measures)) { // group IMU + one LiDAR scan by time
p_imu->Process(Measures, kf, feats_undistort); // forward-propagate + deskew
downSizeFilterSurf.filter(*feats_down_body); // voxel-downsample the scan
kf.update_iterated_dyn_share_modified( // the iterated point-to-plane EKF
LASER_POINT_COV, feats_down_body, ikdtree, Nearest_Points,
NUM_MAX_ITERATIONS, extrinsic_est_en);
state_point = kf.get_x(); // odometry output
map_incremental(); // ikdtree.Add_Points(...)
}
}Notice what's not algorithm here: sync_packages is just time-aligning two streams,
publish_odometry/publish_frame_world are ROS topics, and tf is bookkeeping. None of
that is the filter. To reproduce FAST-LIO2 without ROS you only need:
| You need | You don't need |
|---|---|
| read IMU samples (t, ω, a) from a file/array | ROS subscribers / message types |
| read LiDAR points (x, y, z, per-point time) | rosbag, nodelets |
| forward-propagate + deskew (the IMU code) | tf tree |
a k-d tree over the map (ikd-Tree, or even scipy cKDTree rebuilt per scan to start) | rviz, publishers |
| the iterated point-to-plane update | the IKFoM template layer |
A no-ROS skeleton is just the loop, fed from arrays:
x, P = init_state(), init_cov()
ikdtree = KDMap(voxel=0.5) # or scipy cKDTree to begin with
for scan in lidar_scans: # each: points + per-point timestamps
imu_batch = imu_between(prev_t, scan.t_end)
for imu in imu_batch: # 1) forward propagation
x, P = predict(x, P, imu, imu.dt, Q)
pts = deskew(scan.points, imu_poses, x) # 2) backward propagation
pts = voxel_downsample(pts, 0.5) # 3) downsample
x, P = update_iterated(x, P, pts, ikdtree, ikdtree.points) # 4) iterated EKF
ikdtree.add(transform_to_world(pts, x)) # 5) grow the map
yield x.p, x.R # pose = odometryStart with a cKDTree rebuilt each scan to get the algorithm working end to end, then swap
in a true incremental tree once you care about speed. That ordering — correctness first,
then the ikd-Tree — is exactly how to learn it.
To anchor yourself in the real repo, here's the file → concept map for the clean version:
| File | What it owns |
|---|---|
use-ikfom.hpp | the state manifold, get_f, df_dx, df_dw |
esekfom.hpp | the explicit ESEKF: predict, h_share_model, update_iterated_dyn_share_modified, the reformulated gain |
IMU_Processing.hpp | IMU init, forward propagation, UndistortPcl (deskew) |
ikd_Tree.cpp | Build, Add_Points, Delete_Point_Boxes, Nearest_Search |
laserMapping.cpp | the ROS glue + main loop (the part you replace) |
A complete, runnable implementation
I wrote the whole thing as one dependency-light file —
fastlio2_mini.py (≈250 lines, numpy +
scipy + rosbags). It's a faithful teaching implementation: the on-manifold state,
forward/backward propagation, the iterated point-to-plane update with the reformulated
gain — all the code blocks above, assembled and tested. It takes a real LiDAR→IMU
extrinsic and does per-scan voxel downsampling; the remaining simplifications, called out
honestly, are gravity fixed after init and a scipy cKDTree rebuilt per scan instead of a
true ikd-Tree.
The driver is the no-ROS loop, fed from plain arrays — read IMU, propagate (caching poses), deskew into the IMU frame, downsample, iterated-update, grow the map:
def run_offline(imu_stream, lidar_scans, voxel=0.4, scan_voxel=0.5,
T_LI=None, R_LI=None, acc_cov=1e-2, gyr_cov=1e-2,
bacc_cov=1e-4, bgyr_cov=1e-4, init_secs=0.5):
Q = np.diag([gyr_cov]*3 + [acc_cov]*3 + [bgyr_cov]*3 + [bacc_cov]*3)
R_LI = np.eye(3) if R_LI is None else np.asarray(R_LI, float)
T_LI = np.zeros(3) if T_LI is None else np.asarray(T_LI, float)
to_imu = lambda p: (R_LI @ p.T).T + T_LI # LiDAR points -> IMU frame
g, bg = imu_init([s for s in imu_stream if s[0] < imu_stream[0][0] + init_secs])
kf = ESEKF(g); kf.x.bg = bg
lmap = LocalMap(voxel); traj = []; imu_i = 0; bootstrapped = False
for scan in lidar_scans:
poses = []
while imu_i < len(imu_stream) and imu_stream[imu_i][0] <= scan['t_end']:
t, acc, gyro = imu_stream[imu_i]
dt = t - (imu_stream[imu_i-1][0] if imu_i > 0 else t)
if dt > 0: kf.predict(acc, gyro, dt, Q) # 1. forward propagation
poses.append((t, kf.x.R.copy(), kf.x.p.copy()))
imu_i += 1
body = to_imu(scan['points']) # extrinsic
pts = deskew(body, scan['dts'], poses, scan['t_end']) # 2. backward deskew
pts = voxel_downsample(pts, scan_voxel) # sparse, even set
if not bootstrapped:
lmap.add((kf.x.R @ pts.T).T + kf.x.p); bootstrapped = True # seed the map
else:
kf.update(pts, lmap) # 3. iterated point-to-plane EKF
lmap.add((kf.x.R @ pts.T).T + kf.x.p) # 4. grow the map
traj.append((scan['t_end'], kf.x.p.copy(), kf.x.R.copy()))
return traj, lmapIt ships with a synthetic world (a robot looping through a 10×10×3 m room) so you can run it with no dataset at all — and that's how I validated it:
$ python fastlio2_mini.py
in-memory : ATE rmse = 0.037 m final = 0.020 m
via .bag : ATE rmse = 0.069 m final = 0.058 m
The second line is the important one: the file also writes the simulated data to a real
ROS1 .bag (sensor_msgs/Imu + PointCloud2), reads it back through read_bag(), and
re-runs — exercising the exact bag-parsing path you'd use on real hardware, end to end, to
4–7 cm of absolute trajectory error. The math is correct.
Reading a real .bag — including Livox
read_bag() uses the pure-python rosbags (no ROS install) and handles both standard
sensor_msgs/PointCloud2 (Velodyne/Ouster) and Livox's custom CustomMsg. Livox is the
catch with FAST-LIO data — its bags aren't PointCloud2, they're a custom message, so you
register the type definition and parse it yourself:
from rosbags.typesys import Stores, get_typestore
from rosbags.typesys.msg import get_types_from_msg
ts = get_typestore(Stores.ROS1_NOETIC)
ts.register(get_types_from_msg( # the Livox point struct
"uint32 offset_time\nfloat32 x\nfloat32 y\nfloat32 z\n"
"uint8 reflectivity\nuint8 tag\nuint8 line\n", 'livox_ros_driver/msg/CustomPoint'))
ts.register(get_types_from_msg( # the Livox scan message
"std_msgs/Header header\nuint64 timebase\nuint32 point_num\nuint8 lidar_id\n"
"uint8[3] rsvd\nlivox_ros_driver/CustomPoint[] points\n", 'livox_ros_driver/msg/CustomMsg'))
# then: msg.points -> (x,y,z, offset_time); offset_time is per-point time for the deskewSo fetching and running an actual HKU dataset is two steps:
pip install gdown
gdown 1YqxHuDKzWUcda80QKBV61lXI86TXsGjP -O avia.bag # a Livox Avia indoor bag
python fastlio2_mini.py avia.bagWhat happens on real data — and the bug that taught me the most
I ran exactly that on the HKU Avia "quick-shack" bag (49 s, 9953 IMU + 491 Livox scans). It tracks. Over the full bag the filter follows 47.1 m of path and 47.9 rad of cumulative rotation (the sensor is waved around a small room — roughly eight full turns in a ~3 m box) and reconstructs a recognizable room:

But it did not track on the first try, and the reason is the most useful thing in this whole article — because it wasn't the filter math at all. The first run collapsed exactly the way a broken LIO does: the trajectory froze within a few centimetres of the origin while the IMU clearly showed the sensor swinging through ~1 rad/s of rotation. I almost wrote it off as "the teaching filter isn't robust enough." It wasn't that.
The bug was a sensor-clock mismatch. I instrumented the scan timestamps and found them
landing at t ≈ -1.6e9 relative to the IMU — an impossible 50-year gap. The Livox
CustomMsg header stamps each scan on the sensor's own clock (seconds since the LiDAR
booted, ~361 s into this bag), while the /livox/imu messages are stamped on the bag's
record clock (Unix time, ~1.6 billion). My propagate-up-to-scan-end loop compares the two:
while imu_stream[imu_i][0] <= scan['t_end']: # IMU time vs LiDAR header time
kf.predict(...) # ...never true → never runsBecause every IMU timestamp (1.6e9) was vastly larger than every scan's header time (361),
that condition was never true. The IMU never propagated. The filter sat at its
initial pose, the update snapped each scan onto the origin-seeded map, and the whole thing
looked like a plausible "data-association collapse" — when really it was a unit/epoch bug
two layers down. The fix is to ignore the Livox header entirely and timestamp each scan
with the bag record time (which rosbags gives you for every message, on one
consistent clock):
for conn, t, raw in reader.messages(connections=conns): # t = bag record time (ns)
...
lidar_scans.append({'t_end': t * 1e-9, 'points': pts, 'dts': dts}) # not msg.header!That one change is the difference between a frozen origin and a tracked 47 m trajectory.
Three smaller changes turned tracking into clean tracking, and they're the calibration
work any real deployment needs: the LiDAR→IMU extrinsic from the repo's avia.yaml
(T = [0.04165, 0.02326, -0.0284], R = I), the real IMU noise (acc_cov = gyr_cov = 0.1, which my synthetic Q had 100× too small — too small and the filter trusts a stale
prediction and refuses to move), and per-scan voxel downsampling so the iterated update
stays real-time on tens of thousands of raw points. All four are wired into run_offline
and the CLI uses the Avia values by default, so python fastlio2_mini.py avia.bag just
works.
The honest caveats remain: this is still a teaching filter. The reconstructed map smears
where fast rotation meets the narrow ~70° Avia FOV (you can see stray points in the wide
view), there's no loop closure, and the cKDTree-rebuilt-per-scan map is why a full bag
takes a few minutes offline rather than running at sensor rate. Real FAST-LIO2's ikd-Tree,
in-state gravity, and tighter outlier handling are what close that last gap. But the spine
— the five steps, the manifold state, the reformulated gain — is exactly what's running
here, and it's enough to track a real Livox bag. The lesson I'd carry out of this: in
sensor fusion, check your clocks first. A mismatched epoch or a nanosecond-vs-second
unit error masquerades perfectly as a modelling failure, and you can waste a day tuning
covariances that were never the problem.
Honest notes
- It needs a decent IMU and initialization. The filter assumes a high-rate IMU (200 Hz+) and a short static period at start to estimate the gravity direction and biases. Garbage init, garbage trajectory.
- Geometric degeneracy is the real failure mode. Point-to-plane constraints vanish in a long featureless tunnel or an open field — the update becomes unobservable along the degenerate direction and the IMU drift takes over. This is fundamental to LiDAR odometry, not a bug.
- It's odometry, not loop-closing SLAM. FAST-LIO2 drifts slowly but has no global loop closure; pair it with a pose-graph backend (e.g. a FAST-LIO-SLAM setup) if you need globally consistent maps.
- The "100 Hz" is real but hardware-shaped. The headline rates assume the ikd-Tree and a reasonable CPU; the per-scan cost grows with map density and point count, which is exactly what the downsampling and the moving window are there to bound.
The thing I'd take away: FAST-LIO2 is not a pile of heuristics — it's one iterated error-state Kalman filter, fed a deskewed point cloud, corrected by point-to-plane residuals, over an incremental map, with a gain rewritten so thousands of measurements cost the same as a few. Understand those five steps and you can rebuild it, and you understand the spine of modern LiDAR SLAM.
Built on FAST-LIO2: Fast Direct LiDAR-Inertial Odometry (Xu, Cai, Bai, Zhang, 2021), the original FAST-LIO and ikd-Tree papers, and the HKU-MARS/FAST_LIO source. C++ snippets are from the clean reimplementation zlwang7/S-FAST_LIO; Python is simplified for teaching.