Unconventional AI's Un-0: generating images with coupled oscillators

2026-06-27 · 12 min · generative-models · neuromorphic · physics · image-generation · explainer

Every image model you know is built from the same parts: neural-network layers, a lot of matrix multiplies, and — for the generative step — either a diffusion schedule or an adversary. Un-0 throws all of that out. Its computational core is a population of coupled oscillators, and the generative step is just letting them settle.

This is the first release from Unconventional AI, the company Naveen Rao (ex-Databricks AI head, founder of Nervana and MosaicML) started with Michael Carbin and Sara Achour, on a $475M seed. The thesis is one sentence: physics as a computational primitive. Instead of simulating a dynamical system on a von Neumann machine, run the dynamical system directly in analog silicon, and let the chip's physics be the computation — chasing brain-like (~20 W) efficiency.

Un-0 is explicitly the "hello world" of that program: a proof, in software simulation, that the math produces real images. The chip doesn't exist yet. Keep that line bright; I'll come back to it.

Here is the thing itself: a field of oscillators, each pulled toward its neighbours. Raise the coupling and watch incoherent speckle organise into travelling waves. That self-organisation — not a matrix multiply — is the computation Un-0 runs.

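[Interactive figure: 2D coupled-oscillator field. Hue encodes phase; controls are a coupling-strength slider K and a colour palette selector, with a live readout of the order parameter r.]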

From random phase the field is pure speckle (low K). Raise the coupling and neighbours fall into step, carving out travelling spiral waves and chimera-like domains where order and chaos coexist — the structure Un-0 decodes into pixels.
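For readers without the interactive version, the basic trend can be reproduced with a small sketch in plain Python: phases on a periodic grid, each coupled to its four nearest neighbours, integrated with explicit Euler steps. Everything here is an assumption made for illustration (grid size, frequency spread, step size, the `local_coherence` measure, and nearest-neighbour coupling itself); it is not the widget's or Un-0's code. It only tracks how strongly neighbouring phases align and does not try to reproduce spirals or chimera states.

```python
import math
import random

def neighbours(i, j, size):
    """Four nearest neighbours on a periodic (wrap-around) grid."""
    return [((i + 1) % size, j), ((i - 1) % size, j),
            (i, (j + 1) % size), (i, (j - 1) % size)]

def local_coherence(theta):
    """Mean cos of neighbouring phase differences: near 0 for speckle, 1 if aligned."""
    size = len(theta)
    total = sum(math.cos(theta[a][b] - theta[i][j])
                for i in range(size) for j in range(size)
                for a, b in neighbours(i, j, size))
    return total / (4 * size * size)

def evolve_field(K, size=12, dt=0.05, steps=400, seed=0):
    """Return (coherence before, coherence after) for coupling strength K."""
    rng = random.Random(seed)
    # Hypothetical setup: narrow Gaussian natural frequencies, uniform random phases.
    omega = [[rng.gauss(0.0, 0.1) for _ in range(size)] for _ in range(size)]
    theta = [[rng.uniform(0.0, 2.0 * math.pi) for _ in range(size)]
             for _ in range(size)]
    before = local_coherence(theta)
    for _ in range(steps):
        # Explicit Euler step: own frequency plus the pull from four neighbours.
        theta = [[theta[i][j] + dt * (omega[i][j] + K * sum(
                      math.sin(theta[a][b] - theta[i][j])
                      for a, b in neighbours(i, j, size)))
                  for j in range(size)] for i in range(size)]
    return before, local_coherence(theta)

if __name__ == "__main__":
    for K in (0.0, 2.0):
        before, after = evolve_field(K)
        print(f"K = {K}: coherence {before:.2f} -> {after:.2f}")
```

With no coupling the field stays speckle; with coupling switched on, neighbouring phases pull together and the coherence measure rises.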

The primitive: Kuramoto oscillators

An oscillator is just a phase $\theta_i \in [0, 2\pi)$ turning at its own natural frequency $\omega_i$. Couple a population of them and each one also feels a pull toward its neighbours' phases. That's the Kuramoto model:

$$\dot{\theta}_i \;=\; \omega_i \;+\; \frac{K}{N}\sum_{j=1}^{N} \sin(\theta_j - \theta_i)$$

$K$ is the coupling strength and $N$ the number of oscillators. The behaviour has a sharp phase transition. Below a critical $K$, every oscillator runs at its own frequency and the phases scatter: the incoherent state. Above it, a growing fraction of the population spontaneously locks into one cluster rotating at a shared frequency. The standard measure is the order parameter

$$r\,e^{i\psi} \;=\; \frac{1}{N}\sum_{j=1}^{N} e^{i\theta_j},$$

where $r \to 0$ is total incoherence and $r \to 1$ is full lock. You've seen this in the physical world — pendulum metronomes started out of step on a shared, freely-moving base pull each other into perfect synchrony:

Several pendulum metronomes started at different phases on a common moving platform gradually synchronizing into lockstep. — Coupled metronomes on a shared base (Unconventional AI): the same Kuramoto physics in hardware — independent oscillators, weakly coupled through the platform, spontaneously phase-lock.
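To make the transition concrete, here is a minimal sketch in plain Python that integrates the model with explicit Euler steps and reports $r$ for a weak and a strong coupling. It is an illustration only, not code from the Un-0 repository: the function names, the Gaussian frequency spread, the step size, and the population size are all assumptions chosen for the demo. It relies on the identity that the all-to-all coupling sum equals $K r \sin(\psi - \theta_i)$, which keeps each step linear in the number of oscillators.

```python
import cmath
import math
import random

def order_parameter(theta):
    """Return (r, psi) such that r * exp(i * psi) is the mean of exp(i * theta_j)."""
    z = sum(cmath.exp(1j * t) for t in theta) / len(theta)
    return abs(z), cmath.phase(z)

def simulate_kuramoto(K, n=200, dt=0.05, steps=400, seed=0):
    """Integrate the all-to-all Kuramoto model and return the final r."""
    rng = random.Random(seed)
    # Assumed setup: unit-variance Gaussian frequencies, uniform random phases.
    omega = [rng.gauss(0.0, 1.0) for _ in range(n)]
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        r, psi = order_parameter(theta)
        # Mean-field form of the coupling term: (K/N) * sum_j sin(theta_j - theta_i)
        # equals K * r * sin(psi - theta_i).
        theta = [(t + dt * (w + K * r * math.sin(psi - t))) % (2.0 * math.pi)
                 for t, w in zip(theta, omega)]
    return order_parameter(theta)[0]

if __name__ == "__main__":
    for K in (0.2, 4.0):
        print(f"K = {K}: r = {simulate_kuramoto(K):.2f}")
```

For unit-variance Gaussian frequencies the critical coupling in the infinite-population limit is about 1.6, so the first value sits below the transition and the second well above it.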

Each oscillator can be drawn as its own dial. Drag $K$ through the transition and watch the hands go from smeared to locked, and $r$ climb:

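[Interactive figure: oscillator phases drawn as a grid of dials, evolving under the mean-field form $\dot{\theta}_i = \omega_i + K\,r\,\sin(\psi - \theta_i)$, with a live readout of $r$ and a coupling-strength slider $K$ running from "each to its own rhythm" to "locked in step".]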

At K=0 every dial runs at its own natural frequency ωᵢ and the hands smear across all phases (r≈0). Raise K and each hand feels a pull toward the population’s mean phase proportional to how synchronized it already is — a positive feedback that snaps the whole grid into lockstep past a critical coupling. Un-0 shapes exactly this settling to encode an image.
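The pull toward the mean phase described above can be derived from the order parameter in two lines. Multiply the definition of $r\,e^{i\psi}$ by $e^{-i\theta_i}$, take the imaginary part, and substitute into the Kuramoto equation:

$$
\begin{aligned}
r\,e^{i(\psi - \theta_i)} &= \frac{1}{N}\sum_{j=1}^{N} e^{i(\theta_j - \theta_i)} \\
r\,\sin(\psi - \theta_i) &= \frac{1}{N}\sum_{j=1}^{N} \sin(\theta_j - \theta_i) \\
\dot{\theta}_i &= \omega_i + K\,r\,\sin(\psi - \theta_i)
\end{aligned}
$$

So each oscillator sees the rest of the population only through $r$ and $\psi$, and the effective pull $K r$ grows as synchrony grows. This holds for the uniform all-to-all coupling of the textbook model; a learned coupling matrix would not, in general, collapse to a single mean field.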

The point for Un-0: that transition, and the rich partially-synchronized regime around it, is a programmable dynamical system. If you can shape the coupling and the frequencies, the settled phase pattern can encode something — like an image.

The pipeline: condition, evolve, read out

Un-0's main class is a ConditionalImplicitKuramotoGenerator. There's no denoising schedule, no adversary, no iterative refinement loop in the diffusion sense — just an ODE you integrate forward once.

Un-0's generation pipeline: random initial phases, conditioned by a separate class-oscillator array through one-directional coupling, are evolved through the Kuramoto ODE for a fixed time T (explicit Euler). The settled phases are read out via sin/cos and a small conventional decoder (≤15% of parameters) renders pixels. No diffusion schedule.

Step by step:

1. Initialize every oscillator's phase randomly.
2. Condition on the class label through a separate oscillator array that couples one-directionally into the main population: the label bends the dynamics without being bent back.
3. Evolve the coupled ODE forward for a fixed time $T$ with explicit Euler integration. This is the entire "generation": no schedule, no sampler loop.
4. Read out the settled phases via $\sin\theta, \cos\theta$.
5. Decode with a small conventional network, capped at ≤15% of total parameters, to produce pixels.
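The first four steps can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the `ConditionalImplicitKuramotoGenerator` itself: the parameter layout, the names `make_toy_params` and `generate_features`, the sizes, and in particular the choice to let the label set the class array's starting phases are all hypothetical, and the random weights stand in for values that training would learn. The decoder (step 5) is left out.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def make_toy_params(n=16, m=4, n_classes=3, seed=1234):
    """Random stand-ins for learned parameters (hypothetical layout)."""
    rng = random.Random(seed)
    return {
        "K": [[rng.gauss(0, 1) / n for _ in range(n)] for _ in range(n)],  # main <- main
        "C": [[rng.gauss(0, 1) / m for _ in range(m)] for _ in range(n)],  # main <- class
        "omega": [rng.gauss(0, 1) for _ in range(n)],
        "class_omega": [rng.gauss(0, 1) for _ in range(m)],
        # Assumption: each label sets the starting phases of the class array.
        "class_phase": [[rng.uniform(0, TWO_PI) for _ in range(m)]
                        for _ in range(n_classes)],
    }

def generate_features(params, label, T=2.0, dt=0.05, seed=0):
    n = len(params["omega"])
    rng = random.Random(seed)
    theta = [rng.uniform(0, TWO_PI) for _ in range(n)]   # 1. random initial phases
    phi = list(params["class_phase"][label])             # 2. class oscillator array
    for _ in range(round(T / dt)):                       # 3. explicit Euler to time T
        dtheta = [
            params["omega"][i]
            + sum(params["K"][i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            + sum(params["C"][i][c] * math.sin(phi[c] - theta[i])
                  for c in range(len(phi)))
            for i in range(n)
        ]
        theta = [t + dt * d for t, d in zip(theta, dtheta)]
        # One-directional coupling: class phases advance on their own and never read theta.
        phi = [p + dt * w for p, w in zip(phi, params["class_omega"])]
    # 4. sin/cos readout; a full model would hand this vector to the decoder.
    return [math.sin(t) for t in theta] + [math.cos(t) for t in theta]

if __name__ == "__main__":
    feats = generate_features(make_toy_params(), label=1)
    print(len(feats), "readout features")
```

The same seed and label always give the same readout, while changing the label changes the trajectory through the class-to-main coupling alone.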

Training learns the coupling matrix $K$ (now a full matrix of pairwise couplings rather than the single scalar of the textbook model above), the natural frequencies $\omega$, and the decoder weights, via a "drifting loss" that uses a frozen DINOv2 feature extractor, with AdamW. So the learning is conventional gradient descent; what's unconventional is that the thing being learned is the physics of a dynamical system, not a stack of attention layers.

Training: differentiating through the dynamics

The subtle part is how you get gradients into an ODE. The forward pass is the Euler integration of the Kuramoto system (a long chain of $\sin$-coupled updates) and the decoder reads the final state. Because every step is differentiable, you can backpropagate through the unrolled trajectory and update $K$, $\omega$, and the decoder end-to-end. The "drifting loss" supervises in a perceptual feature space (a frozen DINOv2 encoder) rather than raw pixels, which is what lets a tiny decoder, capped at ≤15% of parameters, get away with so little work: the oscillator field is doing the heavy lifting, and the loss only has to match high-level features, not paint exact RGB.
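To show what "backpropagate through the unrolled trajectory" means mechanically, here is a hand-written reverse pass for a deliberately tiny case: a single scalar coupling $K$, all-to-all coupling, and a toy loss that rewards final phases for landing near target phases. None of this is Un-0's training code. The loss is a stand-in for the drifting loss, the function names are hypothetical, and a real implementation would let an autodiff framework do this bookkeeping.

```python
import math

def rhs(theta, omega, K):
    """Kuramoto right-hand side: omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    n = len(theta)
    return [w + K / n * sum(math.sin(tj - ti) for tj in theta)
            for ti, w in zip(theta, omega)]

def forward(theta0, omega, K, dt, steps):
    """Explicit Euler; keeps every state so the backward pass can revisit them."""
    traj = [list(theta0)]
    for _ in range(steps):
        th = traj[-1]
        traj.append([t + dt * v for t, v in zip(th, rhs(th, omega, K))])
    return traj

def toy_loss(theta, target):
    """Stand-in loss (not the drifting loss): minimal when theta matches target."""
    return -sum(math.cos(t - g) for t, g in zip(theta, target))

def loss_and_grad_K(theta0, omega, K, target, dt=0.05, steps=40):
    traj = forward(theta0, omega, K, dt, steps)
    n = len(theta0)
    lam = [math.sin(t - g) for t, g in zip(traj[-1], target)]  # dL/dtheta at time T
    grad_K = 0.0
    for th in reversed(traj[:-1]):  # walk the Euler steps backwards
        # Direct dependence of this step on K: d(rhs_i)/dK = (1/N) sum_j sin(theta_j - theta_i)
        grad_K += dt * sum(l * sum(math.sin(tj - ti) for tj in th) / n
                           for l, ti in zip(lam, th))
        # Pull the adjoint back one step: lam <- lam + dt * J^T lam, where
        # (J^T lam)_k = (K/N) * sum_i cos(theta_k - theta_i) * (lam_i - lam_k)
        jt_lam = [K / n * sum(math.cos(tk - ti) * (li - lk) for ti, li in zip(th, lam))
                  for tk, lk in zip(th, lam)]
        lam = [l + dt * j for l, j in zip(lam, jt_lam)]
    return toy_loss(traj[-1], target), grad_K
```

The forward pass has to store the whole trajectory, so memory grows with the number of Euler steps; that is the usual price of differentiating through an unrolled integrator. A quick sanity check is to compare `grad_K` against a central finite difference of the loss in `K`.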

Two things fall out of this design that are worth stating plainly:

Capacity lives in the coupling. Almost all the model's parameters are the coupling matrix $K$ (it's $O(n^2)$ in the oscillator count $n$), which fits the reported trend of FID improving monotonically as you scale $n$: you're literally adding interaction terms to the dynamical system.
The natural frequencies $\omega$ are learned, not fixed. The model gets to choose each oscillator's intrinsic rhythm, so it can place itself wherever in the synchronize/desynchronize landscape is most useful for a given class.
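The "capacity lives in the coupling" point can be checked with back-of-envelope arithmetic against the parameter counts reported in the results table later in this post. The sketch below assumes a dense $n \times n$ coupling matrix, which is an assumption on my part rather than something the writeup spells out, and `coupling_share` is a hypothetical helper name.

```python
# (oscillator count n, reported total parameters) from the results table in this post
REPORTED = {
    "CIFAR-10 n1024": (1024, 1.3e6),
    "CIFAR-10 n2048": (2048, 4.9e6),
    "CIFAR-10 n4096": (4096, 19.4e6),
    "ImageNet-64 n6656": (6656, 57e6),
    "ImageNet-64 n10240": (10240, 130e6),
    "ImageNet-64 n16384": (16384, 322e6),
}

def coupling_share(n, total_params):
    """Fraction of the reported total that a dense n x n coupling matrix would cover."""
    return n * n / total_params

if __name__ == "__main__":
    for name, (n, total) in REPORTED.items():
        share = coupling_share(n, total)
        print(f"{name}: n^2 = {n * n / 1e6:.1f}M, share of total = {share:.2f}")
```

Under that assumption the share lands between roughly 0.78 and 0.86 for every row, which is in line with a decoder capped at 15% plus a small remainder for frequencies and class coupling.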

Oscillators vs diffusion

It's tempting to file Un-0 under "another iterative generator," but the comparison is instructive precisely because of how it differs:

	Diffusion model	Un-0 (oscillators)
Generative step	reverse a noising schedule over many denoising passes	integrate one coupled ODE to time T
Core compute	matrix multiplies in NN layers	sin-coupled phase updates
Conditioning	cross-attention / adaLN on the class	one-directional coupling from class oscillators
Stochasticity	initial noise, plus fresh noise at each step in stochastic samplers	random initial phases only
Why it might be efficient	—	the physics can run in analog silicon

A diffusion model spends its compute pushing tensors through learned layers many times. Un-0 spends its compute letting a physical system relax. On a GPU that's a wash at best — more on that below — but the bet is that the relaxation is free when the substrate is the right kind of analog hardware.

Does it actually generate images?

Yes — and the honest version is "yes, modestly". Un-0 is class-conditional and low-res, and the company is upfront that it underperforms state-of-the-art generators like EDM. The headline is FID 6.74 on ImageNet 64×64, which they frame as matching early conventional generators.
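For reference, FID (Fréchet Inception Distance) compares the feature statistics of generated and real images, with lower being better. Writing $(\mu_g, \Sigma_g)$ and $(\mu_r, \Sigma_r)$ for the mean and covariance of Inception-network features over generated and real samples, the standard definition is:

$$\mathrm{FID} \;=\; \lVert \mu_g - \mu_r \rVert_2^2 \;+\; \operatorname{Tr}\!\left(\Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2}\right)$$

A value of 0 means the two feature distributions share the same mean and covariance. Scores depend on the reference statistics and the number of samples used, which is one more reason to read the numbers here as approximate.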

A mosaic of small images generated by Un-0's coupled-oscillator model — recognizable class-conditional samples at low resolution. — Samples from Un-0 (Unconventional AI). Class-conditional, low-resolution, generated by integrating a coupled-oscillator ODE and decoding the settled phases — no diffusion schedule involved.

And here is the generation happening — a row of samples resolving out of the oscillator field as the ODE integrates forward in time. There's no denoising loop; this is the population relaxing toward its conditioned attractor and the decoder reading it out frame by frame:

A row of Un-0 generated images sharpening from blur into recognizable class-conditional samples as the oscillator ODE integrates over time. — Un-0 generation over integration time (Unconventional AI): each tile resolves from noise into a recognizable image as the coupled oscillators settle — the forward pass is the dynamics relaxing, not a sampler loop.

FID scales the way you'd hope with oscillator count $n$ (more oscillators, lower FID):

Dataset	config	params	FID (↓)
CIFAR-10 32×32	n1024	1.3M	~11.0
CIFAR-10 32×32	n2048	4.9M	~9.3
CIFAR-10 32×32	n4096	19.4M	~8.8
ImageNet 64×64	n6656	57M	~8.4
ImageNet 64×64	n10240	130M	~8.0
ImageNet 64×64	n16384	322M	6.74

Parameter-count versus FID Pareto curve for Un-0 on ImageNet 64x64, FID dropping as oscillator count and parameters grow. — Params-vs-FID frontier on ImageNet 64×64 (Unconventional AI): FID falls monotonically as the oscillator population grows, reaching 6.74 at n16384 / 322M params.

The same monotone scaling holds on CIFAR-10, where even a 1.3M-parameter field already reaches a usable FID:

Parameter-count versus FID Pareto curve for Un-0 on CIFAR-10, FID dropping from about 11 to about 8.8 as the oscillator population grows. — Params-vs-FID frontier on CIFAR-10 32×32 (Unconventional AI): from ~11.0 at 1.3M params (n1024) down to ~8.8 at 19.4M (n4096).

Note the FID values wobble slightly between the blog and the repo README (e.g. 8.41 vs 8.36) — these are self-reported, not third-party-reproduced, so treat them as approximate. The compute is non-trivial too: the largest ImageNet run is reported around 640 B200-GPU-hours — simulating the oscillators on conventional GPUs is the expensive part, which is exactly the cost the proposed chip is meant to erase.

The hardware bet

This is where the whole thing either pays off or doesn't. Today's accelerators are von Neumann machines: weights live in memory, you stream them to compute units, multiply, and write back. That shuffle — not the arithmetic — is where most of the energy goes.

Unconventional AI's proposal is to build the oscillators in physical silicon (CMOS ring oscillators are the usual candidate), so that the coupled dynamics happen rather than being computed. There's no weight streaming because the coupling is the wiring; the system's settling to a synchronized state is the forward pass. The aspiration is brain-like efficiency — order-of-tens-of-watts, against data-center GPUs — and the "1000×" figure is a projection of what that substrate could do relative to simulating the same ODE on a GPU.

It's a real idea with real lineage: analog and neuromorphic computing have chased this for decades, and the team (Naveen Rao, plus Michael Carbin from MIT and Sara Achour from Stanford on the hardware/compiler side) is credible. But it is, today, a proposal. The repo says chip schematics are "coming soon".

The part to keep straight

So separate two claims cleanly. Demonstrated today, in simulation: a coupled-oscillator ODE, conditioned and evolved once, decodes into recognizable class-conditional images at FID 6.74 (ImageNet-64). Proposed, not yet built: the analog oscillator chip whose physics would run that ODE for ~1000× less energy. The first is a real, open, checkable result. The second is a hardware vision — credible given the team and funding, but unbuilt and unverified.

What I make of it

The idea is genuinely different, not a reskin. Replacing layers + a diffusion schedule with "set up a dynamical system and let it settle" is a real departure. The generative step is an ODE integration, and the learned object is the physics itself.
The demo is honest and modest. FID 6.74 on ImageNet-64 is a proof-of-concept that the math closes, deliberately framed as a "hello world", explicitly behind SOTA. That honesty is worth more than a cherry-picked headline.
The whole bet lives in the hardware that isn't here. On a GPU, simulating oscillators is slower and costlier than just running a normal generator — the entire payoff is conditional on the analog chip materializing and delivering the projected efficiency. Until silicon exists, "1000×" is a hypothesis, and the right way to read Un-0 is as a credible research demonstration of physics-based generative computing — not a shipping efficiency win.

Built on Unconventional AI's Un-0 technical writeup and the MIT-licensed Un-0 code. Benchmarks are self-reported; the analog-hardware efficiency claim is a founder projection, not a measured result.