FAST-LIO2 from scratch: LiDAR-inertial odometry you can actually reproduce

2026-06-28 · 21 min · slam · lidar · state-estimation · point-cloud · explainer

FAST-LIO2 is the LiDAR-inertial odometry I keep coming back to: it's accurate, it runs at 100 Hz on a laptop, it survives 1000 deg/s rotations, and the whole thing is one tight loop around a Kalman filter. But the official code is a maze of templates, and the cleanest annotated fork is in Chinese. So this is the article I wanted: what FAST-LIO2 actually does, derived from first principles, with real code — and a path to rebuild the core without ROS, because once you can do that, you understand SLAM.

If the Kalman filter isn't fresh, read my Kalman piece first — FAST-LIO2 is exactly the "iterated, error-state, on-manifold" filter that article ends on, fed by a high-rate IMU and corrected by thousands of LiDAR points per scan.

The whole system in one loop

The problem: a LiDAR gives you ~100k 3D points per scan at 10 Hz, but a scan takes ~100 ms during which the sensor moves, so the points are distorted; and LiDAR alone is slow to register and fragile under fast motion. An IMU gives you 200–1000 Hz acceleration and angular velocity — great for short-term motion, but it drifts. Fuse them tightly and each fixes the other's weakness.

FAST-LIO2 system overview: IMU and LiDAR inputs feed forward propagation, points accumulation, backward propagation, residual computation, and an iterated state update that loops until converged; on the right an ikd-Tree map handles point-wise insert, box-wise delete, and kNN search. — FAST-LIO2's system overview (paper, Figure 1). Left: the state-estimation loop — IMU forward propagation, point accumulation, backward-propagation deskew, point-to-plane residual, iterated state update, repeat until converged. Right: the ikd-Tree map — incremental insert with on-tree downsampling, box-wise delete as the map window moves, and kNN search feeding the residual.

Before the math, here's the cycle as a sequence — what each stage does and, just as important, why it has to be there. It runs once per LiDAR scan; the IMU drives the prediction in between:

FAST-LIO2 cycle · one scan

→→→→→→→↻ loop

1. IMU in

What: Read the next batch of 200–1000 Hz accelerometer + gyro samples.

Why: The IMU is the only thing fast enough to describe motion within a 100 ms LiDAR sweep.

Two contributions set FAST-LIO2 apart from its predecessor:

Direct registration. No edge/plane feature extraction — it registers raw points to the map by point-to-plane residuals. Less to tune, and it uses all the geometry.
The ikd-Tree. An incremental k-d tree that inserts, deletes, downsamples, and re-balances in place, so the map updates in real time instead of being rebuilt.

Underneath both is a tightly-coupled iterated error-state Kalman filter on a manifold. Let's build it piece by piece. I'll quote C++ from zlwang7/S-FAST_LIO — a clean reimplementation that writes the filter out explicitly instead of hiding it in template magic — and give simplified NumPy alongside.

The state lives on a manifold

You can't store an orientation as a 3-vector and add to it — rotations live on the manifold $SO(3)$ . FAST-LIO2 tracks a 24-dimensional nominal state but a 23-dimensional error state (and covariance), because each $SO(3)$ rotation needs 3 tangent dimensions, not the 4 of a quaternion, and gravity lives on the sphere $S^2$ (2 dimensions). The state is:

\mathbf{x} = \big[\ \mathbf{p},\ \mathbf{R},\ \mathbf{R}_{L}^{I},\ \mathbf{t}_{L}^{I},\ \mathbf{v},\ \mathbf{b}_g,\ \mathbf{b}_a,\ \mathbf{g}\ \big]

position, attitude, LiDAR→IMU extrinsic rotation and translation, velocity, gyro bias, accel bias, and gravity. In the clean code that's one manifold declaration:

// include/use-ikfom.hpp  — 24-D nominal, 23-D tangent
MTK_BUILD_MANIFOLD(state_ikfom,
  ((vect3, pos))      ((SO3, rot))
  ((SO3, offset_R_L_I)) ((vect3, offset_T_L_I))
  ((vect3, vel))      ((vect3, bg)) ((vect3, ba))
  ((S2, grav)));

We update on the manifold with $\boxplus$ (retraction) and measure differences with $\boxminus$ — for the $SO(3)$ part, $\mathbf{R}\boxplus\delta = \mathbf{R}\,\mathrm{Exp}(\delta)$ and $\mathbf{R}_1\boxminus\mathbf{R}_2 = \mathrm{Log}(\mathbf{R}_2^{\!\top}\mathbf{R}_1)$ . Everything else is ordinary vector $+/-$ .

Forward propagation: ride the IMU

Between LiDAR scans, the IMU drives the state forward. The continuous kinematics are the standard strapdown model — position integrates velocity, attitude integrates de-biased angular velocity, velocity integrates de-biased, gravity-corrected acceleration:

\dot{\mathbf{p}} = \mathbf{v}, \qquad \dot{\mathbf{R}} = \mathbf{R}\,(\boldsymbol{\omega}_m - \mathbf{b}_g)^\wedge, \qquad \dot{\mathbf{v}} = \mathbf{R}(\mathbf{a}_m - \mathbf{b}_a) + \mathbf{g},

with the biases doing a slow random walk. That's get_f verbatim:

// f(x,u): continuous-time kinematics
vect3 omega      = in.gyro - s.bg;            // ω = ω_m − b_g
vect3 a_inertial = s.rot * (in.acc - s.ba);   // R(a_m − b_a)
res(i)      = s.vel[i];                        // ṗ = v
res(i + 3)  = omega[i];                        // Ṙ ← ω
res(i + 12) = a_inertial[i] + s.grav[i];       // v̇ = R(a_m−b_a) + g

The predict step pushes the mean forward by $\mathbf{x} \boxplus (\Delta t\,\mathbf{f})$ and the covariance forward with the error-state Jacobians $\mathbf{F_x},\mathbf{F_w}$ :

\hat{\mathbf{x}} = \mathbf{x}\boxplus(\Delta t\,\mathbf{f}), \qquad \hat{\mathbf{P}} = \mathbf{F_x}\,\mathbf{P}\,\mathbf{F_x}^{\!\top} + (\Delta t\,\mathbf{F_w})\,\mathbf{Q}\,(\Delta t\,\mathbf{F_w})^{\!\top}.

void predict(double &dt, Matrix<double,12,12> &Q, const input_ikfom &i_in) {
    Matrix<double,24,1>  f_     = get_f(x_, i_in);    // 24×1
    Matrix<double,24,23> df_dx_ = df_dx(x_, i_in);    // ∂f/∂x
    Matrix<double,24,12> df_dw_ = df_dw(x_, i_in);    // ∂f/∂w
    x_ = x_.plus(f_, dt);                              // x ⊞ (dt·f)
    // F_x = I + dt·A·df_dx ,  F_w = dt·df_dw  (assembled via the boxplus Jacobian)
    P_ = F_x1 * P_ * F_x1.transpose() + (dt*F_w1) * Q * (dt*F_w1).transpose();
}

In NumPy the shape of it is just:

def predict(x, P, imu, dt, Q):
    f      = get_f(x, imu)              # kinematics above
    F_x, F_w = jacobians(x, imu, dt)   # error-state transition + noise maps
    x = boxplus(x, f * dt)             # advance the mean on the manifold
    P = F_x @ P @ F_x.T + F_w @ Q @ F_w.T
    return x, P

This runs once per IMU sample, and the per-sample poses are cached — we need them next.

Backward propagation: deskew the scan

Because the scan sweeps over time while the platform moves, every point was measured from a slightly different pose. Stack them naively and a flat wall comes out sheared:

scan deskew · motion compensation

platform speed during scan22

Every point lands in the same frame even though the sensor moved between measurements, so a flat wall comes out sheared by the platform's motion. Register this against the map and you'd fight your own motion. Faster motion, worse skew — turn the speed up.

FAST-LIO2 fixes this with backward propagation: walk the cached IMU poses from the scan-end time backward, and transform each point from the pose it was actually sampled at into the scan-end frame. For a point sampled at time $\rho_j$ with the IMU pose $(\mathbf{R}_j,\mathbf{t}_j)$ relative to the scan-end pose $(\mathbf{R}_e,\mathbf{t}_e)$ :

\mathbf{p}^{\text{end}}_j = \mathbf{R}_{L}^{I\,\top}\!\Big(\mathbf{R}_e^{\!\top}\big(\mathbf{R}_j(\mathbf{R}_{L}^{I}\mathbf{p}_j + \mathbf{t}_{L}^{I}) + (\mathbf{t}_j - \mathbf{t}_e)\big) - \mathbf{t}_{L}^{I}\Big)

which is exactly the compensation in UndistortPcl:

M3D R_i(R_imu * Exp(angvel_avr, dt));        // attitude at this point's sample time
V3D T_ei(pos_imu + vel_imu*dt + 0.5*acc_imu*dt*dt - imu_state.pos);
V3D P_compensate = imu_state.offset_R_L_I.conjugate() *
    (imu_state.rot.conjugate() * (R_i * (imu_state.offset_R_L_I * P_i
     + imu_state.offset_T_L_I) + T_ei) - imu_state.offset_T_L_I);

Now every point lives in one consistent frame and it's safe to register against the map.

The measurement: point-to-plane

FAST-LIO2 doesn't extract features. For each deskewed point it transforms it into the world with the current state, finds its 5 nearest map points via the ikd-Tree, fits a plane to them, and the residual is the point-to-plane distance — zero when the point sits exactly on the surface:

The point-to-plane measurement model: a scan point (red) and the corresponding plane fit from nearby map points (blue), with the plane normal u_j; the residual is the signed distance from the point to the plane. — The measurement model (paper, Figure 2): a scan point (red) is matched to the local plane fit from its nearest map points (blue). The residual is the signed point-to-plane distance along the normal uⱼ — minimized when the estimated pose lands the point on the surface.

For a point $\mathbf{p}$ in the body frame, transformed to world $\mathbf{p}^W = \mathbf{R}(\mathbf{R}_L^I\mathbf{p}+\mathbf{t}_L^I)+\mathbf{p}$ , a plane with unit normal $\mathbf{u}$ and offset $d$ gives residual $z = \mathbf{u}^{\!\top}\mathbf{p}^W + d$ . That's h_share_model:

V3D p_global(s.rot * (s.offset_R_L_I * p_body + s.offset_T_L_I) + s.pos);  // to world
ikdtree.Nearest_Search(point_world, NUM_MATCH_POINTS, points_near, sqDis); // 5 nearest
if (esti_plane(pabcd, points_near, 0.1f)) {                                 // fit plane (a,b,c,d)
    float pd2 = pabcd(0)*x + pabcd(1)*y + pabcd(2)*z + pabcd(3);            // point-to-plane dist
    ...
}
// Jacobian row (w.r.t. attitude θ and extrinsic), residual = −distance
V3D C(s.rot.conjugate() * norm_vec);          // Rᵀu
V3D A(point_I_crossmat * C);                  // (R_L^I p + t_L^I)^∧ Rᵀu
ekfom_data.h_x.block<1,12>(i,0) << norm_p.x, norm_p.y, norm_p.z, A, ...;   // [ u | A | … ]
ekfom_data.h(i) = -norm_p.intensity;          // the residual

The crucial detail: the Jacobian $\mathbf{H}$ is $m\times 12$ — $m$ thousands of points, but only 12 columns (6 for pose, 6 for the extrinsic), because a single LiDAR scan can't observe velocity, biases, or gravity directly. Hold that thought; it's why the next step is fast. In NumPy:

def build_H_z(points_body, x, ikdtree, map_pts, R_LI, t_LI):
    H, z = [], []
    for p in points_body:
        pw = x.R @ (R_LI @ p + t_LI) + x.p          # body → world
        nn = ikdtree.nearest(pw, k=5)               # 5 nearest map points
        n, d = fit_plane(map_pts[nn])               # unit normal, offset
        r = n @ pw + d                              # point-to-plane distance
        if abs(r) < 0.1:                            # keep confident matches
            pI = R_LI @ p + t_LI
            A  = skew(pI) @ (x.R.T @ n)             # ∂r/∂θ  (attitude block)
            H.append(np.concatenate([n, A]))        # [ normal | attitude ]
            z.append(-r)
    return np.array(H), np.array(z)                 # H: m×6 (here), z: m

The iterated update, and the gain that makes it cheap

A single EKF update would linearize the very-nonlinear point-to-plane fit once, at a possibly-wrong pose, and be off. So FAST-LIO2 iterates: re-associate, rebuild $\mathbf{H}$ at the latest estimate, take one Kalman step, repeat until the correction is tiny. Watch the scan snap onto the map:

iterated EKF · point-to-plane registration

iteration

0/5

mean residual

8.0

point-to-plane residual

Each iteration re-associates points to planes at the latest estimate, builds the measurement Jacobian H, and takes one Kalman step dx = K·h + (I−KH)(x⊟x̂). Relinearizing is what makes the very nonlinear point-to-plane fit converge.

Each iteration is

\delta\mathbf{x} = \mathbf{K}\,\mathbf{z} + (\mathbf{I}-\mathbf{K}\mathbf{H})(\mathbf{x}^\kappa \boxminus \hat{\mathbf{x}}), \qquad \mathbf{x}^{\kappa+1} = \mathbf{x}^\kappa \boxplus \delta\mathbf{x},

iterating $\kappa$ until every component of $\delta\mathbf{x}$ drops below $10^{-3}$ . The piece that makes FAST-LIO fast is the reformulated Kalman gain. The textbook form,

\mathbf{K} = \hat{\mathbf{P}}\mathbf{H}^{\!\top}(\mathbf{H}\hat{\mathbf{P}}\mathbf{H}^{\!\top}+\mathbf{R})^{-1},

inverts an $m\times m$ matrix — and $m$ is thousands of points. FAST-LIO uses the information-form identity to rewrite it as

\mathbf{K} = (\mathbf{H}^{\!\top}\mathbf{R}^{-1}\mathbf{H} + \hat{\mathbf{P}}^{-1})^{-1}\mathbf{H}^{\!\top}\mathbf{R}^{-1},

which inverts a $23\times 23$ matrix — the state dimension — no matter how many points there are. That's the whole trick, and in clean code it's one block:

// R is a scalar (LASER_POINT_COV = 0.001), so R⁻¹ = 1/R
auto K_front = (HTH / R + P_.inverse()).inverse();      // (HᵀR⁻¹H + P⁻¹)⁻¹  — 23×23
K = K_front.block<23,12>(0,0) * H.transpose() / R;      // … Hᵀ R⁻¹
Matrix<double,23,1> dx_ = K * dyn_share.h               // K z
       + (Matrix<double,23,23>::Identity() - K*H) * dx_new;  // (I−KH)(x ⊟ x̂)
x_ = x_.boxplus(dx_);
// convergence: every |dx_[j]| < epsi (0.001); then update covariance
P_ = (Matrix<double,23,23>::Identity() - K*H) * P_;

The same loop in NumPy, with the cheap gain spelled out:

def update_iterated(x, P, points_body, ikdtree, map_pts, R=1e-3, max_iter=4, eps=1e-3):
    x_prior = x.copy()
    n = P.shape[0]                                   # 23 (error-state dim)
    for _ in range(max_iter):
        H, z = build_H_z(points_body, x, ikdtree, map_pts, R_LI, t_LI)  # relinearize
        # information-form gain: invert (state × state), independent of len(z)
        S = H.T @ H / R + np.linalg.inv(P)           # n×n
        K = np.linalg.solve(S, H.T) / R              # K = S⁻¹ Hᵀ R⁻¹
        dx = K @ z + (np.eye(n) - K @ H) @ (-boxminus(x, x_prior))
        x  = boxplus(x, dx)
        if np.max(np.abs(dx)) < eps:
            break
    P = (np.eye(n) - K @ H) @ P
    return x, P

That's the engine. The converged $\mathbf{x}$ is your odometry output, published at LiDAR rate.

The map: an incremental k-d tree

The nearest-neighbor search in the measurement step is the hot path, and the map is growing and moving. A static k-d tree would be rebuilt every scan — fatal. The ikd-Tree instead inserts points in place, downsamples on the tree, deletes whole regions with one box-wise delete as the local map window slides with the sensor, and lazily re-balances only the subtrees that get lopsided:

ikd-Tree · incremental map with a moving window

active map points

window

bounded

query

kNN, balanced

A static k-d tree would have to be rebuilt from scratch every time the map changed. The ikd-Tree instead inserts and deletes points in place, does voxel downsampling on the tree itself, removes whole regions with one box-wise delete as the window slides, and lazily re-balances only the subtrees that get lopsided — so a moving robot keeps a bounded, balanced map and the nearest-neighbor search in the measurement step stays cheap.

2D illustration of ikd-Tree map region management: a local map cube around the sensor that slides as the platform moves, with regions added at the leading edge and removed at the trailing edge. — Map region management (paper, Figure 3): the ikd-Tree keeps a local map window around the sensor. As the platform moves, new regions are inserted and far regions are removed with box-wise deletes — keeping the active map bounded and the kNN query fast.

In code it's a handful of calls:

ikdtree.Build(feats_down_world->points);          // first scan
ikdtree.Add_Points(PointToAdd, true);             // incremental insert + on-tree downsample
ikdtree.Delete_Point_Boxes(cub_needrm);           // box-wise delete (window slid)
ikdtree.Nearest_Search(point_world, 5, near, d);  // kNN, inside the measurement step

The payoff is real: on the authors' benchmarks FAST-LIO2 spends less time per scan than FAST-LIO while holding a larger map, on both Intel and Arm.

Processing time per LiDAR scan over time for FAST-LIO and FAST-LIO2 on Intel and Arm CPUs (log scale, top), and the number of map points held over time (bottom); FAST-LIO2 is consistently faster while keeping more map points. — Per-scan processing time (paper, Figure 8): FAST-LIO2 (cyan/red) stays below FAST-LIO (green/purple) on both Intel and Arm — while the bottom panel shows it maintaining a larger map. The ikd-Tree is why the direct, all-points approach is still real-time.

Putting it together — and dropping ROS

Here's the entire main loop, which is shorter than you'd expect:

while (running) {
  if (sync_packages(Measures)) {                  // group IMU + one LiDAR scan by time
    p_imu->Process(Measures, kf, feats_undistort); // forward-propagate + deskew
    downSizeFilterSurf.filter(*feats_down_body);   // voxel-downsample the scan
    kf.update_iterated_dyn_share_modified(         // the iterated point-to-plane EKF
        LASER_POINT_COV, feats_down_body, ikdtree, Nearest_Points,
        NUM_MAX_ITERATIONS, extrinsic_est_en);
    state_point = kf.get_x();                      // odometry output
    map_incremental();                             // ikdtree.Add_Points(...)
  }
}

Notice what's not algorithm here: sync_packages is just time-aligning two streams, publish_odometry/publish_frame_world are ROS topics, and tf is bookkeeping. None of that is the filter. To reproduce FAST-LIO2 without ROS you only need:

You need	You don't need
read IMU samples (t, ω, a) from a file/array	ROS subscribers / message types
read LiDAR points (x, y, z, per-point time)	rosbag, nodelets
forward-propagate + deskew (the IMU code)	tf tree
a k-d tree over the map (ikd-Tree, or even scipy `cKDTree` rebuilt per scan to start)	rviz, publishers
the iterated point-to-plane update	the IKFoM template layer

A no-ROS skeleton is just the loop, fed from arrays:

x, P = init_state(), init_cov()
ikdtree = KDMap(voxel=0.5)                 # or scipy cKDTree to begin with
for scan in lidar_scans:                   # each: points + per-point timestamps
    imu_batch = imu_between(prev_t, scan.t_end)
    for imu in imu_batch:                  # 1) forward propagation
        x, P = predict(x, P, imu, imu.dt, Q)
    pts = deskew(scan.points, imu_poses, x)        # 2) backward propagation
    pts = voxel_downsample(pts, 0.5)               # 3) downsample
    x, P = update_iterated(x, P, pts, ikdtree, ikdtree.points)  # 4) iterated EKF
    ikdtree.add(transform_to_world(pts, x))        # 5) grow the map
    yield x.p, x.R                                 # pose = odometry

Start with a cKDTree rebuilt each scan to get the algorithm working end to end, then swap in a true incremental tree once you care about speed. That ordering — correctness first, then the ikd-Tree — is exactly how to learn it.

To anchor yourself in the real repo, here's the file → concept map for the clean version:

File	What it owns
`use-ikfom.hpp`	the state manifold, `get_f`, `df_dx`, `df_dw`
`esekfom.hpp`	the explicit ESEKF: `predict`, `h_share_model`, `update_iterated_dyn_share_modified`, the reformulated gain
`IMU_Processing.hpp`	IMU init, forward propagation, `UndistortPcl` (deskew)
`ikd_Tree.cpp`	`Build`, `Add_Points`, `Delete_Point_Boxes`, `Nearest_Search`
`laserMapping.cpp`	the ROS glue + main loop (the part you replace)

A complete, runnable implementation

I wrote the whole thing as one dependency-light file — fastlio2_mini.py (≈250 lines, numpy + scipy + rosbags). It's a faithful teaching implementation: the on-manifold state, forward/backward propagation, the iterated point-to-plane update with the reformulated gain — all the code blocks above, assembled and tested. It takes a real LiDAR→IMU extrinsic and does per-scan voxel downsampling; the remaining simplifications, called out honestly, are gravity fixed after init and a scipy cKDTree rebuilt per scan instead of a true ikd-Tree.

The driver is the no-ROS loop, fed from plain arrays — read IMU, propagate (caching poses), deskew into the IMU frame, downsample, iterated-update, grow the map:

def run_offline(imu_stream, lidar_scans, voxel=0.4, scan_voxel=0.5,
                T_LI=None, R_LI=None, acc_cov=1e-2, gyr_cov=1e-2,
                bacc_cov=1e-4, bgyr_cov=1e-4, init_secs=0.5):
    Q = np.diag([gyr_cov]*3 + [acc_cov]*3 + [bgyr_cov]*3 + [bacc_cov]*3)
    R_LI = np.eye(3) if R_LI is None else np.asarray(R_LI, float)
    T_LI = np.zeros(3) if T_LI is None else np.asarray(T_LI, float)
    to_imu = lambda p: (R_LI @ p.T).T + T_LI                # LiDAR points -> IMU frame
    g, bg = imu_init([s for s in imu_stream if s[0] < imu_stream[0][0] + init_secs])
    kf = ESEKF(g); kf.x.bg = bg
    lmap = LocalMap(voxel); traj = []; imu_i = 0; bootstrapped = False
    for scan in lidar_scans:
        poses = []
        while imu_i < len(imu_stream) and imu_stream[imu_i][0] <= scan['t_end']:
            t, acc, gyro = imu_stream[imu_i]
            dt = t - (imu_stream[imu_i-1][0] if imu_i > 0 else t)
            if dt > 0: kf.predict(acc, gyro, dt, Q)        # 1. forward propagation
            poses.append((t, kf.x.R.copy(), kf.x.p.copy()))
            imu_i += 1
        body = to_imu(scan['points'])                                   # extrinsic
        pts = deskew(body, scan['dts'], poses, scan['t_end'])           # 2. backward deskew
        pts = voxel_downsample(pts, scan_voxel)                         # sparse, even set
        if not bootstrapped:
            lmap.add((kf.x.R @ pts.T).T + kf.x.p); bootstrapped = True   # seed the map
        else:
            kf.update(pts, lmap)                            # 3. iterated point-to-plane EKF
            lmap.add((kf.x.R @ pts.T).T + kf.x.p)           # 4. grow the map
        traj.append((scan['t_end'], kf.x.p.copy(), kf.x.R.copy()))
    return traj, lmap

It ships with a synthetic world (a robot looping through a 10×10×3 m room) so you can run it with no dataset at all — and that's how I validated it:

$ python fastlio2_mini.py
in-memory  : ATE rmse = 0.037 m   final = 0.020 m
via .bag   : ATE rmse = 0.069 m   final = 0.058 m

The second line is the important one: the file also writes the simulated data to a real ROS1 .bag (sensor_msgs/Imu + PointCloud2), reads it back through read_bag(), and re-runs — exercising the exact bag-parsing path you'd use on real hardware, end to end, to 4–7 cm of absolute trajectory error. The math is correct.

Reading a real `.bag` — including Livox

read_bag() uses the pure-python rosbags (no ROS install) and handles both standard sensor_msgs/PointCloud2 (Velodyne/Ouster) and Livox's custom CustomMsg. Livox is the catch with FAST-LIO data — its bags aren't PointCloud2, they're a custom message, so you register the type definition and parse it yourself:

from rosbags.typesys import Stores, get_typestore
from rosbags.typesys.msg import get_types_from_msg
ts = get_typestore(Stores.ROS1_NOETIC)
ts.register(get_types_from_msg(                              # the Livox point struct
    "uint32 offset_time\nfloat32 x\nfloat32 y\nfloat32 z\n"
    "uint8 reflectivity\nuint8 tag\nuint8 line\n", 'livox_ros_driver/msg/CustomPoint'))
ts.register(get_types_from_msg(                              # the Livox scan message
    "std_msgs/Header header\nuint64 timebase\nuint32 point_num\nuint8 lidar_id\n"
    "uint8[3] rsvd\nlivox_ros_driver/CustomPoint[] points\n", 'livox_ros_driver/msg/CustomMsg'))
# then: msg.points -> (x,y,z, offset_time);  offset_time is per-point time for the deskew

So fetching and running an actual HKU dataset is two steps:

pip install gdown
gdown 1YqxHuDKzWUcda80QKBV61lXI86TXsGjP -O avia.bag    # a Livox Avia indoor bag
python fastlio2_mini.py avia.bag

What happens on real data — and the bug that taught me the most

I ran exactly that on the HKU Avia "quick-shack" bag (49 s, 9953 IMU + 491 Livox scans). It tracks. Over the full bag the filter follows 47.1 m of path and 47.9 rad of cumulative rotation (the sensor is waved around a small room — roughly eight full turns in a ~3 m box) and reconstructs a recognizable room:

Two plots from running fastlio2_mini on the HKU Livox Avia bag: a top-down view showing the estimated path (colored by time) tracing through a reconstructed room point cloud, and a side x–z view; the path stays bounded within the room rather than collapsing or diverging. — fastlio2_mini on the real HKU Livox Avia bag (491 scans, 49 s). Left: top-down path (colored by scan time) over the reconstructed map — the room's walls are clearly recovered. Right: the x–z side view. This is the minimal Python filter — no ROS, ~250 lines — tracking a real Livox bag end to end.

But it did not track on the first try, and the reason is the most useful thing in this whole article — because it wasn't the filter math at all. The first run collapsed exactly the way a broken LIO does: the trajectory froze within a few centimetres of the origin while the IMU clearly showed the sensor swinging through ~1 rad/s of rotation. I almost wrote it off as "the teaching filter isn't robust enough." It wasn't that.

The bug was a sensor-clock mismatch. I instrumented the scan timestamps and found them landing at t ≈ -1.6e9 relative to the IMU — an impossible 50-year gap. The Livox CustomMsg header stamps each scan on the sensor's own clock (seconds since the LiDAR booted, ~361 s into this bag), while the /livox/imu messages are stamped on the bag's record clock (Unix time, ~1.6 billion). My propagate-up-to-scan-end loop compares the two:

while imu_stream[imu_i][0] <= scan['t_end']:   # IMU time vs LiDAR header time
    kf.predict(...)                            # ...never true → never runs

Because every IMU timestamp (1.6e9) was vastly larger than every scan's header time (361), that condition was never true. The IMU never propagated. The filter sat at its initial pose, the update snapped each scan onto the origin-seeded map, and the whole thing looked like a plausible "data-association collapse" — when really it was a unit/epoch bug two layers down. The fix is to ignore the Livox header entirely and timestamp each scan with the bag record time (which rosbags gives you for every message, on one consistent clock):

for conn, t, raw in reader.messages(connections=conns):   # t = bag record time (ns)
    ...
    lidar_scans.append({'t_end': t * 1e-9, 'points': pts, 'dts': dts})  # not msg.header!

That one change is the difference between a frozen origin and a tracked 47 m trajectory. Three smaller changes turned tracking into clean tracking, and they're the calibration work any real deployment needs: the LiDAR→IMU extrinsic from the repo's avia.yaml (T = [0.04165, 0.02326, -0.0284], R = I), the real IMU noise (acc_cov = gyr_cov = 0.1, which my synthetic Q had 100× too small — too small and the filter trusts a stale prediction and refuses to move), and per-scan voxel downsampling so the iterated update stays real-time on tens of thousands of raw points. All four are wired into run_offline and the CLI uses the Avia values by default, so python fastlio2_mini.py avia.bag just works.

The honest caveats remain: this is still a teaching filter. The reconstructed map smears where fast rotation meets the narrow ~70° Avia FOV (you can see stray points in the wide view), there's no loop closure, and the cKDTree-rebuilt-per-scan map is why a full bag takes a few minutes offline rather than running at sensor rate. Real FAST-LIO2's ikd-Tree, in-state gravity, and tighter outlier handling are what close that last gap. But the spine — the five steps, the manifold state, the reformulated gain — is exactly what's running here, and it's enough to track a real Livox bag. The lesson I'd carry out of this: in sensor fusion, check your clocks first. A mismatched epoch or a nanosecond-vs-second unit error masquerades perfectly as a modelling failure, and you can waste a day tuning covariances that were never the problem.

Honest notes

It needs a decent IMU and initialization. The filter assumes a high-rate IMU (200 Hz+) and a short static period at start to estimate the gravity direction and biases. Garbage init, garbage trajectory.
Geometric degeneracy is the real failure mode. Point-to-plane constraints vanish in a long featureless tunnel or an open field — the update becomes unobservable along the degenerate direction and the IMU drift takes over. This is fundamental to LiDAR odometry, not a bug.
It's odometry, not loop-closing SLAM. FAST-LIO2 drifts slowly but has no global loop closure; pair it with a pose-graph backend (e.g. a FAST-LIO-SLAM setup) if you need globally consistent maps.
The "100 Hz" is real but hardware-shaped. The headline rates assume the ikd-Tree and a reasonable CPU; the per-scan cost grows with map density and point count, which is exactly what the downsampling and the moving window are there to bound.

The thing I'd take away: FAST-LIO2 is not a pile of heuristics — it's one iterated error-state Kalman filter, fed a deskewed point cloud, corrected by point-to-plane residuals, over an incremental map, with a gain rewritten so thousands of measurements cost the same as a few. Understand those five steps and you can rebuild it, and you understand the spine of modern LiDAR SLAM.

Built on FAST-LIO2: Fast Direct LiDAR-Inertial Odometry (Xu, Cai, Bai, Zhang, 2021), the original FAST-LIO and ikd-Tree papers, and the HKU-MARS/FAST_LIO source. C++ snippets are from the clean reimplementation zlwang7/S-FAST_LIO; Python is simplified for teaching.