~/satyajit

Unconventional AI's Un-0: generating images with coupled oscillators


2026-06-27 · 12 min · generative-models · neuromorphic · physics · image-generation · explainer

Every image model you know is built from the same parts: neural-network layers, a lot of matrix multiplies, and — for the generative step — either a diffusion schedule or an adversary. Un-0 throws all of that out. Its computational core is a population of coupled oscillators, and the generative step is just letting them settle.

This is the first release from Unconventional AI, the company Naveen Rao (ex-Databricks AI head, founder of Nervana and MosaicML) started with Michael Carbin and Sara Achour, on a $475M seed. The thesis is one sentence: physics as a computational primitive. Instead of simulating a dynamical system on a von Neumann machine, run the dynamical system directly in analog silicon, and let the chip's physics be the computation — chasing brain-like (~20 W) efficiency.

Un-0 is explicitly the "hello world" of that program: a proof, in software simulation, that the math produces real images. The chip doesn't exist yet. Keep that line bright; I'll come back to it.

Here is the thing itself: a field of oscillators, each pulled toward its neighbours. Raise the coupling and watch incoherent speckle organise into travelling waves. That self-organisation — not a matrix multiply — is the computation Un-0 runs.

[Interactive figure: 2D coupled-oscillator field, hue = phase; slider for coupling strength K; readout of the order parameter r]

From random phase the field is pure speckle (low K). Raise the coupling and neighbours fall into step, carving out travelling spiral waves and chimera-like domains where order and chaos coexist — the structure Un-0 decodes into pixels.
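
To make the field concrete, here is a minimal sketch of a nearest-neighbour oscillator lattice, assuming a periodic square grid, explicit Euler steps, and Gaussian natural frequencies. None of these choices come from the Un-0 code; the grid size, `K`, `dt` and the `local_coherence` measure are illustrative picks of mine.

```python
import math
import random

def lattice_step(theta, omega, K, dt):
    """One explicit Euler step of nearest-neighbour Kuramoto on a periodic grid."""
    n = len(theta)
    new = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            t = theta[i][j]
            # Pull from the four neighbours (up, down, left, right), wrapping at edges.
            pull = (math.sin(theta[(i - 1) % n][j] - t)
                    + math.sin(theta[(i + 1) % n][j] - t)
                    + math.sin(theta[i][(j - 1) % n] - t)
                    + math.sin(theta[i][(j + 1) % n] - t))
            new[i][j] = (t + dt * (omega[i][j] + K * pull)) % (2 * math.pi)
    return new

def local_coherence(theta):
    """Mean cosine of neighbour phase differences: about 0 for speckle, 1 when aligned."""
    n = len(theta)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += math.cos(theta[i][(j + 1) % n] - theta[i][j])
            total += math.cos(theta[(i + 1) % n][j] - theta[i][j])
    return total / (2 * n * n)

def run_lattice(n=16, K=2.6, dt=0.05, steps=400, spread=0.1, seed=0):
    """Start from random phases, evolve, and return coherence before and after."""
    rng = random.Random(seed)
    theta = [[rng.uniform(0, 2 * math.pi) for _ in range(n)] for _ in range(n)]
    omega = [[rng.gauss(0.0, spread) for _ in range(n)] for _ in range(n)]
    before = local_coherence(theta)
    for _ in range(steps):
        theta = lattice_step(theta, omega, K, dt)
    return before, local_coherence(theta)

if __name__ == "__main__":
    before, after = run_lattice()
    print(f"local coherence: {before:.2f} -> {after:.2f}")
```

Mapping each phase to a hue and redrawing after every `lattice_step` call gives a picture of the same kind as the figure, though the exact patterns depend on the parameters.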

The primitive: Kuramoto oscillators

An oscillator is just a phase $\theta_i \in [0, 2\pi)$ turning at its own natural frequency $\omega_i$. Couple a population of them and each one also feels a pull toward its neighbours' phases. That's the Kuramoto model:

$$\dot{\theta}_i \;=\; \omega_i \;+\; \frac{K}{N}\sum_{j=1}^{N} \sin(\theta_j - \theta_i)$$

$K$ is the coupling strength. The behaviour has a sharp phase transition. Below a critical $K$, everyone runs at their own frequency and the phases scatter: incoherent. Above it, the population spontaneously synchronizes into one travelling cluster. The standard measure is the order parameter

$$r\,e^{i\psi} \;=\; \frac{1}{N}\sum_{j=1}^{N} e^{i\theta_j},$$

where $r \to 0$ is total incoherence and $r \to 1$ is full lock. You've seen this in the physical world. Pendulum metronomes started out of step on a shared, freely-moving base pull each other into perfect synchrony:

Coupled metronomes on a shared base (Unconventional AI): the same Kuramoto physics in hardware — independent oscillators, weakly coupled through the platform, spontaneously phase-lock.
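
The transition is easy to reproduce numerically. The sketch below Euler-integrates the all-to-all model and reports the final $r$. It assumes Gaussian natural frequencies with unit standard deviation, for which the classical mean-field estimate of the critical coupling is $K_c = 2/(\pi g(0)) \approx 1.6$, with $g$ the frequency density. The function names, population size and step size are my own illustrative choices, not anything from Un-0.

```python
import cmath
import math
import random

def order_parameter(theta):
    """Return (r, psi) such that r * exp(i * psi) is the mean of exp(i * theta_j)."""
    z = sum(cmath.exp(1j * t) for t in theta) / len(theta)
    return abs(z), cmath.phase(z)

def simulate(K, n=300, dt=0.05, steps=1000, sigma=1.0, seed=0):
    """Euler-integrate the all-to-all Kuramoto model and return the final r."""
    rng = random.Random(seed)
    theta = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    omega = [rng.gauss(0.0, sigma) for _ in range(n)]
    for _ in range(steps):
        r, psi = order_parameter(theta)
        # (K/N) * sum_j sin(theta_j - theta_i) equals K * r * sin(psi - theta_i),
        # so each step costs O(N) rather than O(N^2).
        theta = [t + dt * (w + K * r * math.sin(psi - t))
                 for t, w in zip(theta, omega)]
    return order_parameter(theta)[0]

if __name__ == "__main__":
    for K in (0.5, 1.0, 2.0, 4.0):
        print(f"K = {K:.1f}  r = {simulate(K):.2f}")
```

Well below the critical coupling $r$ stays near the finite-size noise floor of order $1/\sqrt{N}$; well above it, most of the population locks.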

Each oscillator can be drawn as its own dial. Drag $K$ through the transition and watch the hands go from smeared to locked, and $r$ climb:

[Interactive figure: oscillator phases drawn as dials, θ̇ᵢ = ωᵢ + K·r·sin(ψ − θᵢ); slider for coupling strength K, running from "each to its own rhythm" to "locked in step"; readout of r]

At K=0 every dial runs at its own natural frequency ωᵢ and the hands smear across all phases (r≈0). Raise K and each hand feels a pull toward the population’s mean phase proportional to how synchronized it already is — a positive feedback that snaps the whole grid into lockstep past a critical coupling. Un-0 shapes exactly this settling to encode an image.
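
The "pull toward the mean phase, scaled by how synchronized the population already is" reading is not a separate model. It is the pairwise sum rewritten through the order parameter. A short derivation, using only the two definitions above:

```latex
\begin{aligned}
r\,e^{i\psi} &= \frac{1}{N}\sum_{j=1}^{N} e^{i\theta_j}
  && \text{(definition of the order parameter)} \\
r\,e^{i(\psi-\theta_i)} &= \frac{1}{N}\sum_{j=1}^{N} e^{i(\theta_j-\theta_i)}
  && \text{(multiply both sides by } e^{-i\theta_i}\text{)} \\
r\sin(\psi-\theta_i) &= \frac{1}{N}\sum_{j=1}^{N}\sin(\theta_j-\theta_i)
  && \text{(take imaginary parts)} \\
\dot{\theta}_i &= \omega_i + K\,r\,\sin(\psi-\theta_i)
  && \text{(substitute into the Kuramoto equation)}
\end{aligned}
```

The effective pull on each oscillator is $K r$, so it grows with $r$ itself. That is the positive feedback behind the sharp transition.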

The point for Un-0: that transition, and the rich partially-synchronized regime around it, is a programmable dynamical system. If you can shape the coupling and the frequencies, the settled phase pattern can encode something — like an image.

The pipeline: condition, evolve, read out

Un-0's main class is a ConditionalImplicitKuramotoGenerator. There's no denoising schedule, no adversary, no iterative refinement loop in the diffusion sense — just an ODE you integrate forward once.

[Pipeline diagram: random phases θ(0) → Kuramoto ODE (Euler, evolve to T; learn K, ω) → readout (sin θ, cos θ) → decoder (≤15% params) → image. Class label → conditioning oscillators → 1-way coupling into the ODE.]
Un-0's generation pipeline: random initial phases, conditioned by a separate class-oscillator array through one-directional coupling, are evolved through the Kuramoto ODE for a fixed time T (explicit Euler). The settled phases are read out via sin/cos and a small conventional decoder (≤15% of parameters) renders pixels. No diffusion schedule.

Step by step:

  1. Initialize every oscillator's phase randomly.
  2. Condition on the class label through a separate oscillator array that couples one-directionally into the main population — the label bends the dynamics without being bent back.
  3. Evolve the coupled ODE forward for a fixed time $T$ with explicit Euler integration. This is the entire "generation": no schedule, no sampler loop.
  4. Read out the settled phases via $\sin\theta, \cos\theta$.
  5. Decode with a small conventional network — capped at ≤15% of total parameters — to produce pixels.
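
The five steps can be sketched as a toy forward pass. This is a hypothetical illustration, not the `ConditionalImplicitKuramotoGenerator` itself: the parameter names (`K`, `C`, `class_phases`, `omega_cond`, `W`), the per-class phase lookup, the freely rotating conditioning oscillators, and the single linear layer with a sigmoid as the decoder are all my assumptions. The weights are random rather than trained, so the output is noise of the right shape.

```python
import math
import random

def random_params(n=12, m=4, n_classes=3, n_pixels=16, seed=1):
    """Untrained, random stand-ins for the quantities training would learn."""
    rng = random.Random(seed)

    def gauss_matrix(rows, cols, scale):
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "omega": [rng.gauss(0.0, 1.0) for _ in range(n)],   # natural frequencies
        "K": gauss_matrix(n, n, 1.0 / n),                   # main coupling matrix
        "C": gauss_matrix(n, m, 1.0),                       # conditioning -> main coupling
        "class_phases": [[rng.uniform(0, 2 * math.pi) for _ in range(m)]
                         for _ in range(n_classes)],        # assumed per-class lookup
        "omega_cond": [rng.gauss(0.0, 1.0) for _ in range(m)],
        "W": gauss_matrix(n_pixels, 2 * n, 0.5),            # linear decoder weights
    }

def generate(label, params, steps=100, dt=0.05, seed=0):
    """Random phases -> conditioned Euler evolution -> sin/cos readout -> decoder."""
    rng = random.Random(seed)
    n = len(params["omega"])
    theta = [rng.uniform(0, 2 * math.pi) for _ in range(n)]  # step 1
    phi = list(params["class_phases"][label])                # step 2
    for _ in range(steps):                                   # step 3
        new_theta = []
        for i in range(n):
            drive = params["omega"][i]
            drive += sum(params["K"][i][j] * math.sin(theta[j] - theta[i])
                         for j in range(n))
            drive += sum(params["C"][i][c] * math.sin(phi[c] - theta[i])
                         for c in range(len(phi)))
            new_theta.append(theta[i] + dt * drive)
        theta = new_theta
        # One-directional: phi advances on its own and never sees theta.
        phi = [p + dt * w for p, w in zip(phi, params["omega_cond"])]
    feats = [math.sin(t) for t in theta] + [math.cos(t) for t in theta]  # step 4
    return [1.0 / (1.0 + math.exp(-sum(w * f for w, f in zip(row, feats))))
            for row in params["W"]]                          # step 5

if __name__ == "__main__":
    image = generate(label=0, params=random_params())
    print(len(image), "pixel values in [0, 1]")
```

The structural point is that the label only enters through the `C` term of the drive, and the only randomness is the initial `theta`.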

Training learns the coupling matrix $K$ (one learned strength per oscillator pair, generalizing the scalar $K$ above), the natural frequencies $\omega$, and the decoder weights, via a "drifting loss" that uses a frozen DINOv2 feature extractor, with AdamW. So the learning is conventional gradient descent; what's unconventional is that the thing being learned is the physics of a dynamical system, not a stack of attention layers.

Training: differentiating through the dynamics

The subtle part is how you get gradients into an ODE. The forward pass is the Euler integration of the Kuramoto system (a long chain of $\sin$-coupled updates), and the decoder reads the final state. Because every step is differentiable, you can backpropagate through the unrolled trajectory and update $K$, $\omega$, and the decoder end-to-end. The "drifting loss" supervises in a perceptual feature space (a frozen DINOv2 encoder) rather than raw pixels, which is what lets a tiny decoder, capped at ≤15% of parameters, get away with so little work: the oscillator field is doing the heavy lifting, and the loss only has to match high-level features, not paint exact RGB.
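
To show what "backpropagate through the unrolled trajectory" means mechanically, here is a hand-written reverse pass for a toy case: a single scalar coupling `K`, fixed frequencies, and a squared error on $\sin\theta$ standing in for the feature-space loss. All of that is my simplification. Real training would use an autodiff framework over the full coupling matrix, and nothing here is taken from the Un-0 training code.

```python
import math

def forward(theta0, omega, K, dt, steps):
    """Unrolled Euler integration; keeps every state for the backward pass."""
    n = len(theta0)
    traj = [list(theta0)]
    for _ in range(steps):
        th = traj[-1]
        traj.append([
            th[i] + dt * (omega[i]
                          + K / n * sum(math.sin(th[j] - th[i]) for j in range(n)))
            for i in range(n)
        ])
    return traj

def loss(theta, target):
    """Toy readout loss on sin(theta)."""
    return 0.5 * sum((math.sin(t) - y) ** 2 for t, y in zip(theta, target))

def grad_K(theta0, omega, K, dt, steps, target):
    """Reverse-mode dL/dK through the unrolled Euler steps."""
    n = len(theta0)
    traj = forward(theta0, omega, K, dt, steps)
    # Adjoint g[i] = dL/dtheta_i, seeded at the final state.
    g = [(math.sin(t) - y) * math.cos(t) for t, y in zip(traj[-1], target)]
    dK = 0.0
    for th in reversed(traj[:-1]):
        # Direct dependence of this step's update on K.
        dK += sum(g[i] * dt / n * sum(math.sin(th[j] - th[i]) for j in range(n))
                  for i in range(n))
        # Pull the adjoint back through this step's Jacobian.
        g = [
            g[k] + dt * K / n * (
                sum(g[i] * math.cos(th[k] - th[i]) for i in range(n))
                - g[k] * sum(math.cos(th[j] - th[k]) for j in range(n))
            )
            for k in range(n)
        ]
    return dK

if __name__ == "__main__":
    theta0 = [0.3, 1.7, 2.9, 4.1, 5.6]
    omega = [0.5, -0.2, 0.1, 0.9, -0.7]
    target = [0.0, 1.0, 0.0, -1.0, 0.5]
    print("dL/dK =", grad_K(theta0, omega, 1.2, 0.05, 60, target))
```

A central finite difference on `loss(forward(...)[-1], target)` is a cheap sanity check for the analytic gradient. Note that memory grows with the number of stored steps, which is the usual cost of differentiating through an unrolled integrator.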

Two things fall out of this design that are worth stating plainly:

Oscillators vs diffusion

It's tempting to file Un-0 under "another iterative generator," but the comparison is instructive precisely because of how it differs:

| | Diffusion model | Un-0 (oscillators) |
|---|---|---|
| Generative step | reverse a noising schedule, T denoising passes | integrate one coupled ODE to time T |
| Core compute | matrix multiplies in NN layers | sin-coupled phase updates |
| Conditioning | cross-attention / adaLN on the class | one-directional coupling from class oscillators |
| Stochasticity | injected noise at each step | random initial phases only |
| Why it might be efficient | | the physics can run in analog silicon |

A diffusion model spends its compute pushing tensors through learned layers many times. Un-0 spends its compute letting a physical system relax. On a GPU that's a wash at best — more on that below — but the bet is that the relaxation is free when the substrate is the right kind of analog hardware.

Does it actually generate images?

Yes — and the honest version is "yes, modestly". Un-0 is class-conditional and low-res, and the company is upfront that it underperforms state-of-the-art generators like EDM. The headline is FID 6.74 on ImageNet 64×64, which they frame as matching early conventional generators.
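
For readers who have not met the metric: FID is the Fréchet distance between two Gaussians fitted to image features, conventionally taken from an Inception-v3 network, one for real images and one for generated ones. The formula below is the standard definition. The writeup's exact evaluation protocol (sample count, feature layer) is not something I can confirm from the source.

```latex
\mathrm{FID} \;=\; \lVert \mu_r - \mu_g \rVert_2^{2}
  \;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature mean and covariance of the real and generated sets. Lower is better, and identical statistics give 0.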

A mosaic of small images generated by Un-0's coupled-oscillator model — recognizable class-conditional samples at low resolution.
Samples from Un-0 (Unconventional AI). Class-conditional, low-resolution, generated by integrating a coupled-oscillator ODE and decoding the settled phases — no diffusion schedule involved.

And here is the generation happening — a row of samples resolving out of the oscillator field as the ODE integrates forward in time. There's no denoising loop; this is the population relaxing toward its conditioned attractor and the decoder reading it out frame by frame:

Un-0 generation over integration time (Unconventional AI): each tile resolves from noise into a recognizable image as the coupled oscillators settle — the forward pass is the dynamics relaxing, not a sampler loop.

FID scales the way you'd hope with oscillator count nn (more oscillators, lower FID):

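| Dataset | config | params | FID (↓) |
|---|---|---|---|
| CIFAR-10 32×32 | n1024 | 1.3M | ~11.0 |
| CIFAR-10 32×32 | n2048 | 4.9M | ~9.3 |
| CIFAR-10 32×32 | n4096 | 19.4M | ~8.8 |
| ImageNet 64×64 | n6656 | 57M | ~8.4 |
| ImageNet 64×64 | n10240 | 130M | ~8.0 |
| ImageNet 64×64 | n16384 | 322M | 6.74 |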
Parameter-count versus FID Pareto curve for Un-0 on ImageNet 64x64, FID dropping as oscillator count and parameters grow.
Params-vs-FID frontier on ImageNet 64×64 (Unconventional AI): FID falls monotonically as the oscillator population grows, reaching 6.74 at n16384 / 322M params.

The same monotone scaling holds on CIFAR-10, where even a 1.3M-parameter field already reaches a usable FID:

Parameter-count versus FID Pareto curve for Un-0 on CIFAR-10, FID dropping from about 11 to about 8.8 as the oscillator population grows.
Params-vs-FID frontier on CIFAR-10 32×32 (Unconventional AI): from ~11.0 at 1.3M params (n1024) down to ~8.8 at 19.4M (n4096).
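
One back-of-envelope check on those parameter counts, under an assumption the source does not state: that the coupling is a dense $n \times n$ matrix. If so, $n^2$ alone would account for most of each reported total. The snippet just does that arithmetic on the rounded figures from the table; `dense_coupling_share` is a name I made up.

```python
# (dataset, oscillator count n, reported total parameters), as listed in the table
REPORTED = [
    ("CIFAR-10 32x32", 1024, 1.3e6),
    ("CIFAR-10 32x32", 2048, 4.9e6),
    ("CIFAR-10 32x32", 4096, 19.4e6),
    ("ImageNet 64x64", 6656, 57e6),
    ("ImageNet 64x64", 10240, 130e6),
    ("ImageNet 64x64", 16384, 322e6),
]

def dense_coupling_share(n, total_params):
    """Fraction of the reported total that a dense n x n matrix would use."""
    return n * n / total_params

if __name__ == "__main__":
    for name, n, total in REPORTED:
        share = dense_coupling_share(n, total)
        print(f"{name} n{n}: n^2 = {n * n:,} ({share:.0%} of reported total)")
```

Under that assumption the share lands between roughly 78% and 86% in every row, which would explain why parameters grow about quadratically with oscillator count. Treat it as a plausibility check on rounded numbers, not a statement about the actual architecture.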

Note the FID values wobble slightly between the blog and the repo README (e.g. 8.41 vs 8.36) — these are self-reported, not third-party-reproduced, so treat them as approximate. The compute is non-trivial too: the largest ImageNet run is reported around 640 B200-GPU-hours — simulating the oscillators on conventional GPUs is the expensive part, which is exactly the cost the proposed chip is meant to erase.

The hardware bet

This is where the whole thing either pays off or doesn't. Today's accelerators are von Neumann machines: weights live in memory, you stream them to compute units, multiply, and write back. That shuffle — not the arithmetic — is where most of the energy goes.

Unconventional AI's proposal is to build the oscillators in physical silicon (CMOS ring oscillators are the usual candidate), so that the coupled dynamics happen rather than being computed. There's no weight streaming because the coupling is the wiring; the system's settling to a synchronized state is the forward pass. The aspiration is brain-like efficiency — order-of-tens-of-watts, against data-center GPUs — and the "1000×" figure is a projection of what that substrate could do relative to simulating the same ODE on a GPU.

It's a real idea with real lineage — analog and neuromorphic computing has chased this for decades — and the team (Naveen Rao, plus Michael Carbin from MIT and Sara Achour from Stanford on the hardware/compiler side) is credible. But it is, today, a proposal. The repo says chip schematics are "coming soon".

The part to keep straight

So separate two claims cleanly. Demonstrated today, in simulation: a coupled-oscillator ODE, conditioned and evolved once, decodes into recognizable class-conditional images at FID 6.74 (ImageNet-64). Proposed, not yet built: the analog oscillator chip whose physics would run that ODE for ~1000× less energy. The first is a real, open, checkable result. The second is a hardware vision — credible given the team and funding, but unbuilt and unverified.

What I make of it


Built on Unconventional AI's Un-0 technical writeup and the MIT-licensed Un-0 code. Benchmarks are self-reported; the analog-hardware efficiency claim is a founder projection, not a measured result.
