# FAST-LIO2 from scratch: LiDAR-inertial odometry you can actually reproduce

> Satyajit Ghana — Head of Engineering @ Inkers Technology
> canonical: https://ai.thesatyajit.com/articles/fast-lio2-lidar-inertial-odometry
> date: 2026-06-28
> tags: slam, lidar, state-estimation, point-cloud, explainer
FAST-LIO2 is the LiDAR-inertial odometry I keep coming back to: it's accurate, it runs at
100 Hz on a laptop, it survives 1000 deg/s rotations, and the whole thing is one tight loop
around a Kalman filter. But the official code is a maze of templates, and the cleanest
annotated fork is in Chinese. So this is the article I wanted: what FAST-LIO2 actually
*does*, derived from first principles, with real code — and a path to rebuild the core
without ROS, because once you can do that, you understand SLAM.

If the Kalman filter isn't fresh, read [my Kalman piece](/articles/kalman-filter) first —
FAST-LIO2 is exactly the "iterated, error-state, on-manifold" filter that article ends on,
fed by a high-rate IMU and corrected by thousands of LiDAR points per scan.

## The whole system in one loop

The problem: a LiDAR gives you ~100k 3D points per scan at 10 Hz, but a scan takes ~100 ms
during which the sensor moves, so the points are distorted; and LiDAR alone is slow to
register and fragile under fast motion. An IMU gives you 200–1000 Hz acceleration and
angular velocity — great for short-term motion, but it drifts. Fuse them tightly and each
fixes the other's weakness.

<Figure
  src="/articles/fastlio2/x1.png"
  alt="FAST-LIO2 system overview: IMU and LiDAR inputs feed forward propagation, points accumulation, backward propagation, residual computation, and an iterated state update that loops until converged; on the right an ikd-Tree map handles point-wise insert, box-wise delete, and kNN search."
  caption="FAST-LIO2's system overview (paper, Figure 1). Left: the state-estimation loop — IMU forward propagation, point accumulation, backward-propagation deskew, point-to-plane residual, iterated state update, repeat until converged. Right: the ikd-Tree map — incremental insert with on-tree downsampling, box-wise delete as the map window moves, and kNN search feeding the residual."
/>

Before the math, here's the cycle as a sequence — what each stage does and, just as
important, *why it has to be there*. It runs once per LiDAR scan; the IMU drives the
prediction in between:

<LioFlow />

Two contributions set FAST-LIO2 apart from its predecessor:

1. **Direct registration.** No edge/plane feature extraction — it registers *raw* points to
   the map by point-to-plane residuals. Less to tune, and it uses all the geometry.
2. **The ikd-Tree.** An incremental k-d tree that inserts, deletes, downsamples, and
   re-balances in place, so the map updates in real time instead of being rebuilt.

Underneath both is a **tightly-coupled iterated error-state Kalman filter on a manifold**.
Let's build it piece by piece. I'll quote C++ from
[`zlwang7/S-FAST_LIO`](https://github.com/zlwang7/S-FAST_LIO) — a clean reimplementation
that writes the filter out explicitly instead of hiding it in template magic — and give
simplified NumPy alongside.

## The state lives on a manifold

You can't store an orientation as a 3-vector and add to it — rotations live on the manifold
$SO(3)$. FAST-LIO2 tracks a **24-dimensional nominal state** but a **23-dimensional error
state** (and covariance), because each $SO(3)$ rotation needs 3 tangent dimensions, not the
4 of a quaternion, and gravity lives on the sphere $S^2$ (2 dimensions). The state is:

$$
\mathbf{x} = \big[\ \mathbf{p},\ \mathbf{R},\ \mathbf{R}_{L}^{I},\ \mathbf{t}_{L}^{I},\
\mathbf{v},\ \mathbf{b}_g,\ \mathbf{b}_a,\ \mathbf{g}\ \big]
$$

position, attitude, LiDAR→IMU extrinsic rotation and translation, velocity, gyro bias,
accel bias, and gravity. In the clean code that's one manifold declaration:

```cpp
// include/use-ikfom.hpp  — 24-D nominal, 23-D tangent
MTK_BUILD_MANIFOLD(state_ikfom,
  ((vect3, pos))      ((SO3, rot))
  ((SO3, offset_R_L_I)) ((vect3, offset_T_L_I))
  ((vect3, vel))      ((vect3, bg)) ((vect3, ba))
  ((S2, grav)));
```

We update on the manifold with $\boxplus$ (retraction) and measure differences with
$\boxminus$ — for the $SO(3)$ part, $\mathbf{R}\boxplus\delta = \mathbf{R}\,\mathrm{Exp}(\delta)$
and $\mathbf{R}_1\boxminus\mathbf{R}_2 = \mathrm{Log}(\mathbf{R}_2^{\!\top}\mathbf{R}_1)$.
Everything else is ordinary vector $+/-$.

## Forward propagation: ride the IMU

Between LiDAR scans, the IMU drives the state forward. The continuous kinematics are the
standard strapdown model — position integrates velocity, attitude integrates de-biased
angular velocity, velocity integrates de-biased, gravity-corrected acceleration:

$$
\dot{\mathbf{p}} = \mathbf{v}, \qquad
\dot{\mathbf{R}} = \mathbf{R}\,(\boldsymbol{\omega}_m - \mathbf{b}_g)^\wedge, \qquad
\dot{\mathbf{v}} = \mathbf{R}(\mathbf{a}_m - \mathbf{b}_a) + \mathbf{g},
$$

with the biases doing a slow random walk. That's `get_f` verbatim:

```cpp
// f(x,u): continuous-time kinematics
vect3 omega      = in.gyro - s.bg;            // ω = ω_m − b_g
vect3 a_inertial = s.rot * (in.acc - s.ba);   // R(a_m − b_a)
res(i)      = s.vel[i];                        // ṗ = v
res(i + 3)  = omega[i];                        // Ṙ ← ω
res(i + 12) = a_inertial[i] + s.grav[i];       // v̇ = R(a_m−b_a) + g
```

The predict step pushes the *mean* forward by $\mathbf{x} \boxplus (\Delta t\,\mathbf{f})$ and
the *covariance* forward with the error-state Jacobians $\mathbf{F_x},\mathbf{F_w}$:

$$
\hat{\mathbf{x}} = \mathbf{x}\boxplus(\Delta t\,\mathbf{f}), \qquad
\hat{\mathbf{P}} = \mathbf{F_x}\,\mathbf{P}\,\mathbf{F_x}^{\!\top} +
(\Delta t\,\mathbf{F_w})\,\mathbf{Q}\,(\Delta t\,\mathbf{F_w})^{\!\top}.
$$

```cpp
void predict(double &dt, Matrix<double,12,12> &Q, const input_ikfom &i_in) {
    Matrix<double,24,1>  f_     = get_f(x_, i_in);    // 24×1
    Matrix<double,24,23> df_dx_ = df_dx(x_, i_in);    // ∂f/∂x
    Matrix<double,24,12> df_dw_ = df_dw(x_, i_in);    // ∂f/∂w
    x_ = x_.plus(f_, dt);                              // x ⊞ (dt·f)
    // F_x = I + dt·A·df_dx ,  F_w = dt·df_dw  (assembled via the boxplus Jacobian)
    P_ = F_x1 * P_ * F_x1.transpose() + (dt*F_w1) * Q * (dt*F_w1).transpose();
}
```

In NumPy the shape of it is just:

```python
def predict(x, P, imu, dt, Q):
    f      = get_f(x, imu)              # kinematics above
    F_x, F_w = jacobians(x, imu, dt)   # error-state transition + noise maps
    x = boxplus(x, f * dt)             # advance the mean on the manifold
    P = F_x @ P @ F_x.T + F_w @ Q @ F_w.T
    return x, P
```

This runs once per IMU sample, and the per-sample poses are cached — we need them next.

## Backward propagation: deskew the scan

Because the scan sweeps over time while the platform moves, every point was measured from a
slightly different pose. Stack them naively and a flat wall comes out sheared:

<Deskew />

FAST-LIO2 fixes this with **backward propagation**: walk the cached IMU poses from the
scan-end time backward, and transform each point from the pose it was *actually* sampled at
into the scan-end frame. For a point sampled at time $\rho_j$ with the IMU pose
$(\mathbf{R}_j,\mathbf{t}_j)$ relative to the scan-end pose $(\mathbf{R}_e,\mathbf{t}_e)$:

$$
\mathbf{p}^{\text{end}}_j = \mathbf{R}_{L}^{I\,\top}\!\Big(\mathbf{R}_e^{\!\top}\big(\mathbf{R}_j(\mathbf{R}_{L}^{I}\mathbf{p}_j + \mathbf{t}_{L}^{I}) + (\mathbf{t}_j - \mathbf{t}_e)\big) - \mathbf{t}_{L}^{I}\Big)
$$

which is exactly the compensation in `UndistortPcl`:

```cpp
M3D R_i(R_imu * Exp(angvel_avr, dt));        // attitude at this point's sample time
V3D T_ei(pos_imu + vel_imu*dt + 0.5*acc_imu*dt*dt - imu_state.pos);
V3D P_compensate = imu_state.offset_R_L_I.conjugate() *
    (imu_state.rot.conjugate() * (R_i * (imu_state.offset_R_L_I * P_i
     + imu_state.offset_T_L_I) + T_ei) - imu_state.offset_T_L_I);
```

Now every point lives in one consistent frame and it's safe to register against the map.

## The measurement: point-to-plane

FAST-LIO2 doesn't extract features. For each deskewed point it transforms it into the world
with the current state, finds its 5 nearest map points via the ikd-Tree, fits a plane to
them, and the **residual is the point-to-plane distance** — zero when the point sits exactly
on the surface:

<Figure
  src="/articles/fastlio2/x2.png"
  alt="The point-to-plane measurement model: a scan point (red) and the corresponding plane fit from nearby map points (blue), with the plane normal u_j; the residual is the signed distance from the point to the plane."
  caption="The measurement model (paper, Figure 2): a scan point (red) is matched to the local plane fit from its nearest map points (blue). The residual is the signed point-to-plane distance along the normal uⱼ — minimized when the estimated pose lands the point on the surface."
/>

For a point $\mathbf{p}$ in the body frame, transformed to world
$\mathbf{p}^W = \mathbf{R}(\mathbf{R}_L^I\mathbf{p}+\mathbf{t}_L^I)+\mathbf{p}$, a plane with
unit normal $\mathbf{u}$ and offset $d$ gives residual $z = \mathbf{u}^{\!\top}\mathbf{p}^W + d$.
That's `h_share_model`:

```cpp
V3D p_global(s.rot * (s.offset_R_L_I * p_body + s.offset_T_L_I) + s.pos);  // to world
ikdtree.Nearest_Search(point_world, NUM_MATCH_POINTS, points_near, sqDis); // 5 nearest
if (esti_plane(pabcd, points_near, 0.1f)) {                                 // fit plane (a,b,c,d)
    float pd2 = pabcd(0)*x + pabcd(1)*y + pabcd(2)*z + pabcd(3);            // point-to-plane dist
    ...
}
// Jacobian row (w.r.t. attitude θ and extrinsic), residual = −distance
V3D C(s.rot.conjugate() * norm_vec);          // Rᵀu
V3D A(point_I_crossmat * C);                  // (R_L^I p + t_L^I)^∧ Rᵀu
ekfom_data.h_x.block<1,12>(i,0) << norm_p.x, norm_p.y, norm_p.z, A, ...;   // [ u | A | … ]
ekfom_data.h(i) = -norm_p.intensity;          // the residual
```

The crucial detail: the Jacobian $\mathbf{H}$ is $m\times 12$ — $m$ thousands of points,
but only **12 columns** (6 for pose, 6 for the extrinsic), because a single LiDAR scan can't
observe velocity, biases, or gravity directly. Hold that thought; it's why the next step is
fast. In NumPy:

```python
def build_H_z(points_body, x, ikdtree, map_pts, R_LI, t_LI):
    H, z = [], []
    for p in points_body:
        pw = x.R @ (R_LI @ p + t_LI) + x.p          # body → world
        nn = ikdtree.nearest(pw, k=5)               # 5 nearest map points
        n, d = fit_plane(map_pts[nn])               # unit normal, offset
        r = n @ pw + d                              # point-to-plane distance
        if abs(r) < 0.1:                            # keep confident matches
            pI = R_LI @ p + t_LI
            A  = skew(pI) @ (x.R.T @ n)             # ∂r/∂θ  (attitude block)
            H.append(np.concatenate([n, A]))        # [ normal | attitude ]
            z.append(-r)
    return np.array(H), np.array(z)                 # H: m×6 (here), z: m
```

## The iterated update, and the gain that makes it cheap

A single EKF update would linearize the very-nonlinear point-to-plane fit once, at a
possibly-wrong pose, and be off. So FAST-LIO2 **iterates**: re-associate, rebuild $\mathbf{H}$
at the latest estimate, take one Kalman step, repeat until the correction is tiny. Watch the
scan snap onto the map:

<IEKFRegister />

Each iteration is

$$
\delta\mathbf{x} = \mathbf{K}\,\mathbf{z} + (\mathbf{I}-\mathbf{K}\mathbf{H})(\mathbf{x}^\kappa \boxminus \hat{\mathbf{x}}),
\qquad \mathbf{x}^{\kappa+1} = \mathbf{x}^\kappa \boxplus \delta\mathbf{x},
$$

iterating $\kappa$ until every component of $\delta\mathbf{x}$ drops below $10^{-3}$. The
piece that makes FAST-LIO *fast* is the **reformulated Kalman gain**. The textbook form,

$$
\mathbf{K} = \hat{\mathbf{P}}\mathbf{H}^{\!\top}(\mathbf{H}\hat{\mathbf{P}}\mathbf{H}^{\!\top}+\mathbf{R})^{-1},
$$

inverts an $m\times m$ matrix — and $m$ is *thousands* of points. FAST-LIO uses the
information-form identity to rewrite it as

$$
\mathbf{K} = (\mathbf{H}^{\!\top}\mathbf{R}^{-1}\mathbf{H} + \hat{\mathbf{P}}^{-1})^{-1}\mathbf{H}^{\!\top}\mathbf{R}^{-1},
$$

which inverts a $23\times 23$ matrix — the **state** dimension — no matter how many points
there are. That's the whole trick, and in clean code it's one block:

```cpp
// R is a scalar (LASER_POINT_COV = 0.001), so R⁻¹ = 1/R
auto K_front = (HTH / R + P_.inverse()).inverse();      // (HᵀR⁻¹H + P⁻¹)⁻¹  — 23×23
K = K_front.block<23,12>(0,0) * H.transpose() / R;      // … Hᵀ R⁻¹
Matrix<double,23,1> dx_ = K * dyn_share.h               // K z
       + (Matrix<double,23,23>::Identity() - K*H) * dx_new;  // (I−KH)(x ⊟ x̂)
x_ = x_.boxplus(dx_);
// convergence: every |dx_[j]| < epsi (0.001); then update covariance
P_ = (Matrix<double,23,23>::Identity() - K*H) * P_;
```

The same loop in NumPy, with the cheap gain spelled out:

```python
def update_iterated(x, P, points_body, ikdtree, map_pts, R=1e-3, max_iter=4, eps=1e-3):
    x_prior = x.copy()
    n = P.shape[0]                                   # 23 (error-state dim)
    for _ in range(max_iter):
        H, z = build_H_z(points_body, x, ikdtree, map_pts, R_LI, t_LI)  # relinearize
        # information-form gain: invert (state × state), independent of len(z)
        S = H.T @ H / R + np.linalg.inv(P)           # n×n
        K = np.linalg.solve(S, H.T) / R              # K = S⁻¹ Hᵀ R⁻¹
        dx = K @ z + (np.eye(n) - K @ H) @ (-boxminus(x, x_prior))
        x  = boxplus(x, dx)
        if np.max(np.abs(dx)) < eps:
            break
    P = (np.eye(n) - K @ H) @ P
    return x, P
```

That's the engine. The converged $\mathbf{x}$ is your odometry output, published at LiDAR
rate.

## The map: an incremental k-d tree

The nearest-neighbor search in the measurement step is the hot path, and the map is
*growing and moving*. A static k-d tree would be rebuilt every scan — fatal. The **ikd-Tree**
instead inserts points in place, downsamples on the tree, deletes whole regions with one
box-wise delete as the local map window slides with the sensor, and lazily re-balances only
the subtrees that get lopsided:

<IkdMap />

<Figure
  src="/articles/fastlio2/x3.png"
  alt="2D illustration of ikd-Tree map region management: a local map cube around the sensor that slides as the platform moves, with regions added at the leading edge and removed at the trailing edge."
  caption="Map region management (paper, Figure 3): the ikd-Tree keeps a local map window around the sensor. As the platform moves, new regions are inserted and far regions are removed with box-wise deletes — keeping the active map bounded and the kNN query fast."
/>

In code it's a handful of calls:

```cpp
ikdtree.Build(feats_down_world->points);          // first scan
ikdtree.Add_Points(PointToAdd, true);             // incremental insert + on-tree downsample
ikdtree.Delete_Point_Boxes(cub_needrm);           // box-wise delete (window slid)
ikdtree.Nearest_Search(point_world, 5, near, d);  // kNN, inside the measurement step
```

The payoff is real: on the authors' benchmarks FAST-LIO2 spends *less* time per scan than
FAST-LIO while holding a *larger* map, on both Intel and Arm.

<Figure
  src="/articles/fastlio2/x6.png"
  alt="Processing time per LiDAR scan over time for FAST-LIO and FAST-LIO2 on Intel and Arm CPUs (log scale, top), and the number of map points held over time (bottom); FAST-LIO2 is consistently faster while keeping more map points."
  caption="Per-scan processing time (paper, Figure 8): FAST-LIO2 (cyan/red) stays below FAST-LIO (green/purple) on both Intel and Arm — while the bottom panel shows it maintaining a larger map. The ikd-Tree is why the direct, all-points approach is still real-time."
/>

## Putting it together — and dropping ROS

Here's the entire main loop, which is shorter than you'd expect:

```cpp
while (running) {
  if (sync_packages(Measures)) {                  // group IMU + one LiDAR scan by time
    p_imu->Process(Measures, kf, feats_undistort); // forward-propagate + deskew
    downSizeFilterSurf.filter(*feats_down_body);   // voxel-downsample the scan
    kf.update_iterated_dyn_share_modified(         // the iterated point-to-plane EKF
        LASER_POINT_COV, feats_down_body, ikdtree, Nearest_Points,
        NUM_MAX_ITERATIONS, extrinsic_est_en);
    state_point = kf.get_x();                      // odometry output
    map_incremental();                             // ikdtree.Add_Points(...)
  }
}
```

Notice what's *not* algorithm here: `sync_packages` is just time-aligning two streams,
`publish_odometry`/`publish_frame_world` are ROS topics, and `tf` is bookkeeping. **None of
that is the filter.** To reproduce FAST-LIO2 without ROS you only need:

| You need | You don't need |
|---|---|
| read IMU samples (t, ω, a) from a file/array | ROS subscribers / message types |
| read LiDAR points (x, y, z, per-point time) | rosbag, nodelets |
| forward-propagate + deskew (the IMU code) | tf tree |
| a k-d tree over the map (ikd-Tree, or even scipy `cKDTree` rebuilt per scan to start) | rviz, publishers |
| the iterated point-to-plane update | the IKFoM template layer |

A no-ROS skeleton is just the loop, fed from arrays:

```python
x, P = init_state(), init_cov()
ikdtree = KDMap(voxel=0.5)                 # or scipy cKDTree to begin with
for scan in lidar_scans:                   # each: points + per-point timestamps
    imu_batch = imu_between(prev_t, scan.t_end)
    for imu in imu_batch:                  # 1) forward propagation
        x, P = predict(x, P, imu, imu.dt, Q)
    pts = deskew(scan.points, imu_poses, x)        # 2) backward propagation
    pts = voxel_downsample(pts, 0.5)               # 3) downsample
    x, P = update_iterated(x, P, pts, ikdtree, ikdtree.points)  # 4) iterated EKF
    ikdtree.add(transform_to_world(pts, x))        # 5) grow the map
    yield x.p, x.R                                 # pose = odometry
```

Start with a `cKDTree` rebuilt each scan to get the algorithm working end to end, then swap
in a true incremental tree once you care about speed. That ordering — correctness first,
then the ikd-Tree — is exactly how to learn it.

To anchor yourself in the real repo, here's the file → concept map for the clean version:

| File | What it owns |
|---|---|
| `use-ikfom.hpp` | the state manifold, `get_f`, `df_dx`, `df_dw` |
| `esekfom.hpp` | the explicit ESEKF: `predict`, `h_share_model`, `update_iterated_dyn_share_modified`, the reformulated gain |
| `IMU_Processing.hpp` | IMU init, forward propagation, `UndistortPcl` (deskew) |
| `ikd_Tree.cpp` | `Build`, `Add_Points`, `Delete_Point_Boxes`, `Nearest_Search` |
| `laserMapping.cpp` | the ROS glue + main loop (the part you replace) |

## A complete, runnable implementation

I wrote the whole thing as one dependency-light file —
[`fastlio2_mini.py`](/articles/fastlio2/fastlio2_mini.py) (≈250 lines, `numpy` +
`scipy` + `rosbags`). It's a faithful teaching implementation: the on-manifold state,
forward/backward propagation, the iterated point-to-plane update with the reformulated
gain — all the code blocks above, assembled and tested. It takes a real LiDAR→IMU
extrinsic and does per-scan voxel downsampling; the remaining simplifications, called out
honestly, are gravity fixed after init and a `scipy` cKDTree rebuilt per scan instead of a
true ikd-Tree.

The driver is the no-ROS loop, fed from plain arrays — read IMU, propagate (caching
poses), deskew into the IMU frame, downsample, iterated-update, grow the map:

```python
def run_offline(imu_stream, lidar_scans, voxel=0.4, scan_voxel=0.5,
                T_LI=None, R_LI=None, acc_cov=1e-2, gyr_cov=1e-2,
                bacc_cov=1e-4, bgyr_cov=1e-4, init_secs=0.5):
    Q = np.diag([gyr_cov]*3 + [acc_cov]*3 + [bgyr_cov]*3 + [bacc_cov]*3)
    R_LI = np.eye(3) if R_LI is None else np.asarray(R_LI, float)
    T_LI = np.zeros(3) if T_LI is None else np.asarray(T_LI, float)
    to_imu = lambda p: (R_LI @ p.T).T + T_LI                # LiDAR points -> IMU frame
    g, bg = imu_init([s for s in imu_stream if s[0] < imu_stream[0][0] + init_secs])
    kf = ESEKF(g); kf.x.bg = bg
    lmap = LocalMap(voxel); traj = []; imu_i = 0; bootstrapped = False
    for scan in lidar_scans:
        poses = []
        while imu_i < len(imu_stream) and imu_stream[imu_i][0] <= scan['t_end']:
            t, acc, gyro = imu_stream[imu_i]
            dt = t - (imu_stream[imu_i-1][0] if imu_i > 0 else t)
            if dt > 0: kf.predict(acc, gyro, dt, Q)        # 1. forward propagation
            poses.append((t, kf.x.R.copy(), kf.x.p.copy()))
            imu_i += 1
        body = to_imu(scan['points'])                                   # extrinsic
        pts = deskew(body, scan['dts'], poses, scan['t_end'])           # 2. backward deskew
        pts = voxel_downsample(pts, scan_voxel)                         # sparse, even set
        if not bootstrapped:
            lmap.add((kf.x.R @ pts.T).T + kf.x.p); bootstrapped = True   # seed the map
        else:
            kf.update(pts, lmap)                            # 3. iterated point-to-plane EKF
            lmap.add((kf.x.R @ pts.T).T + kf.x.p)           # 4. grow the map
        traj.append((scan['t_end'], kf.x.p.copy(), kf.x.R.copy()))
    return traj, lmap
```

It ships with a synthetic world (a robot looping through a 10×10×3 m room) so you can run
it with **no dataset at all** — and that's how I validated it:

```
$ python fastlio2_mini.py
in-memory  : ATE rmse = 0.037 m   final = 0.020 m
via .bag   : ATE rmse = 0.069 m   final = 0.058 m
```

The second line is the important one: the file also writes the simulated data to a real
ROS1 `.bag` (`sensor_msgs/Imu` + `PointCloud2`), reads it back through `read_bag()`, and
re-runs — exercising the exact bag-parsing path you'd use on real hardware, end to end, to
**4–7 cm** of absolute trajectory error. The math is correct.

## Reading a real `.bag` — including Livox

`read_bag()` uses the pure-python `rosbags` (no ROS install) and handles both standard
`sensor_msgs/PointCloud2` (Velodyne/Ouster) and Livox's custom `CustomMsg`. Livox is the
catch with FAST-LIO data — its bags aren't PointCloud2, they're a custom message, so you
register the type definition and parse it yourself:

```python
from rosbags.typesys import Stores, get_typestore
from rosbags.typesys.msg import get_types_from_msg
ts = get_typestore(Stores.ROS1_NOETIC)
ts.register(get_types_from_msg(                              # the Livox point struct
    "uint32 offset_time\nfloat32 x\nfloat32 y\nfloat32 z\n"
    "uint8 reflectivity\nuint8 tag\nuint8 line\n", 'livox_ros_driver/msg/CustomPoint'))
ts.register(get_types_from_msg(                              # the Livox scan message
    "std_msgs/Header header\nuint64 timebase\nuint32 point_num\nuint8 lidar_id\n"
    "uint8[3] rsvd\nlivox_ros_driver/CustomPoint[] points\n", 'livox_ros_driver/msg/CustomMsg'))
# then: msg.points -> (x,y,z, offset_time);  offset_time is per-point time for the deskew
```

So fetching and running an actual HKU dataset is two steps:

```bash
pip install gdown
gdown 1YqxHuDKzWUcda80QKBV61lXI86TXsGjP -O avia.bag    # a Livox Avia indoor bag
python fastlio2_mini.py avia.bag
```

## What happens on real data — and the bug that taught me the most

I ran exactly that on the HKU Avia "quick-shack" bag (49 s, 9953 IMU + 491 Livox scans).
It tracks. Over the full bag the filter follows **47.1 m of path** and **47.9 rad of
cumulative rotation** (the sensor is waved around a small room — roughly eight full turns
in a ~3 m box) and reconstructs a recognizable room:

<Figure
  src="/articles/fastlio2/avia-trajectory.png"
  alt="Two plots from running fastlio2_mini on the HKU Livox Avia bag: a top-down view showing the estimated path (colored by time) tracing through a reconstructed room point cloud, and a side x–z view; the path stays bounded within the room rather than collapsing or diverging."
  caption="fastlio2_mini on the real HKU Livox Avia bag (491 scans, 49 s). Left: top-down path (colored by scan time) over the reconstructed map — the room's walls are clearly recovered. Right: the x–z side view. This is the minimal Python filter — no ROS, ~250 lines — tracking a real Livox bag end to end."
/>

But it did **not** track on the first try, and the reason is the most useful thing in this
whole article — because it wasn't the filter math at all. The first run collapsed exactly
the way a broken LIO does: the trajectory froze within a few centimetres of the origin
while the IMU clearly showed the sensor swinging through ~1 rad/s of rotation. I almost
wrote it off as "the teaching filter isn't robust enough." It wasn't that.

**The bug was a sensor-clock mismatch.** I instrumented the scan timestamps and found them
landing at `t ≈ -1.6e9` *relative to the IMU* — an impossible 50-year gap. The Livox
`CustomMsg` header stamps each scan on the **sensor's own clock** (seconds since the LiDAR
booted, ~361 s into this bag), while the `/livox/imu` messages are stamped on the **bag's
record clock** (Unix time, ~1.6 billion). My propagate-up-to-scan-end loop compares the two:

```python
while imu_stream[imu_i][0] <= scan['t_end']:   # IMU time vs LiDAR header time
    kf.predict(...)                            # ...never true → never runs
```

Because every IMU timestamp (1.6e9) was vastly larger than every scan's header time (361),
that condition was *never* true. **The IMU never propagated.** The filter sat at its
initial pose, the update snapped each scan onto the origin-seeded map, and the whole thing
looked like a plausible "data-association collapse" — when really it was a unit/epoch bug
two layers down. The fix is to ignore the Livox header entirely and timestamp each scan
with the **bag record time** (which `rosbags` gives you for every message, on one
consistent clock):

```python
for conn, t, raw in reader.messages(connections=conns):   # t = bag record time (ns)
    ...
    lidar_scans.append({'t_end': t * 1e-9, 'points': pts, 'dts': dts})  # not msg.header!
```

That one change is the difference between a frozen origin and a tracked 47 m trajectory.
Three smaller changes turned tracking into *clean* tracking, and they're the calibration
work any real deployment needs: the **LiDAR→IMU extrinsic** from the repo's `avia.yaml`
(`T = [0.04165, 0.02326, -0.0284]`, `R = I`), the **real IMU noise** (`acc_cov = gyr_cov =
0.1`, which my synthetic `Q` had 100× too small — too small and the filter trusts a stale
prediction and refuses to move), and **per-scan voxel downsampling** so the iterated update
stays real-time on tens of thousands of raw points. All four are wired into `run_offline`
and the CLI uses the Avia values by default, so `python fastlio2_mini.py avia.bag` just
works.

The honest caveats remain: this is still a teaching filter. The reconstructed map smears
where fast rotation meets the narrow ~70° Avia FOV (you can see stray points in the wide
view), there's no loop closure, and the `cKDTree`-rebuilt-per-scan map is why a full bag
takes a few minutes offline rather than running at sensor rate. Real FAST-LIO2's ikd-Tree,
in-state gravity, and tighter outlier handling are what close that last gap. But the spine
— the five steps, the manifold state, the reformulated gain — is exactly what's running
here, and it's enough to track a real Livox bag. The lesson I'd carry out of this: in
sensor fusion, **check your clocks first.** A mismatched epoch or a nanosecond-vs-second
unit error masquerades perfectly as a modelling failure, and you can waste a day tuning
covariances that were never the problem.

## Honest notes

- **It needs a decent IMU and initialization.** The filter assumes a high-rate IMU
  (200 Hz+) and a short static period at start to estimate the gravity direction and biases.
  Garbage init, garbage trajectory.
- **Geometric degeneracy is the real failure mode.** Point-to-plane constraints vanish in a
  long featureless tunnel or an open field — the update becomes unobservable along the
  degenerate direction and the IMU drift takes over. This is fundamental to LiDAR odometry,
  not a bug.
- **It's odometry, not loop-closing SLAM.** FAST-LIO2 drifts slowly but has no global loop
  closure; pair it with a pose-graph backend (e.g. a FAST-LIO-SLAM setup) if you need
  globally consistent maps.
- **The "100 Hz" is real but hardware-shaped.** The headline rates assume the ikd-Tree and a
  reasonable CPU; the per-scan cost grows with map density and point count, which is exactly
  what the downsampling and the moving window are there to bound.

The thing I'd take away: FAST-LIO2 is not a pile of heuristics — it's *one* iterated
error-state Kalman filter, fed a deskewed point cloud, corrected by point-to-plane residuals,
over an incremental map, with a gain rewritten so thousands of measurements cost the same as
a few. Understand those five steps and you can rebuild it, and you understand the spine of
modern LiDAR SLAM.

---

*Built on [FAST-LIO2: Fast Direct LiDAR-Inertial Odometry](https://arxiv.org/abs/2107.06829)
(Xu, Cai, Bai, Zhang, 2021), the original [FAST-LIO](https://arxiv.org/abs/2010.08196) and
[ikd-Tree](https://arxiv.org/abs/2102.10808) papers, and the
[HKU-MARS/FAST_LIO](https://github.com/hku-mars/FAST_LIO) source. C++ snippets are from the
clean reimplementation [zlwang7/S-FAST_LIO](https://github.com/zlwang7/S-FAST_LIO); Python is
simplified for teaching.*
