LLM Brainscan maps every trainable parameter of a transformer neural network onto an 8K display — one pixel per weight, updating live as the model learns. Attention mechanisms form, MLP features sharpen, embedding clusters emerge, and you watch it all happen frame by frame.

LLM Brainscan running on an 8K display: weight matrices tiled above a strip of generated text.
Roughly 30 million parameters laid out as a single image. The bottom strip renders the model's generated text, each token coloured by prediction confidence.

Making training visible

Training a language model is an opaque process — you stare at a loss curve ticking downward and hope for the best. LLM Brainscan makes the internal dynamics visible: roughly 30 million trainable parameters, laid out as a spatial image you can actually watch evolve.

The mapping works because the numbers line up. An 8K display offers 7680×4320 pixels, or 33,177,600 in total, which is enough to hold a compact transformer’s roughly 30 million parameters with some rows left over. Weight matrices tile across the top of the display, while the bottom rows render the model’s generated text in real time, each token coloured by prediction confidence. You can see structure forming in the attention heads, watch layer norms stabilise, and spot the moment the model starts producing coherent output.
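As a rough illustration of the confidence colouring, here is a minimal sketch. It assumes that "confidence" means the softmax probability the model gave to the token it emitted, and that the colour runs from red (unsure) to green (sure). Neither detail is stated on this page, and the function names are made up for the example.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_colour(prob):
    """Map a probability in [0, 1] to (r, g, b): red = unsure, green = sure.

    The red-to-green ramp is an assumption, not the project's palette.
    """
    p = min(max(prob, 0.0), 1.0)  # clamp out-of-range inputs
    return (round(255 * (1 - p)), round(255 * p), 0)

def colour_token(logits, chosen_index):
    """Colour for one generated token, given the scores it was picked from."""
    probs = softmax(logits)
    return confidence_colour(probs[chosen_index])

# A token picked from a sharply peaked distribution comes out green,
# while one picked from a nearly flat distribution drifts towards red.
print(colour_token([8.0, 0.5, 0.1], 0))
print(colour_token([0.2, 0.1, 0.1, 0.0], 0))
```

Early in training most tokens would sit near the red end of such a ramp, and the strip would shift towards green as predictions firm up.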

A full-width 8K strip showing one pixel per transformer parameter.
One pixel per parameter, across the full 7680-pixel width of an 8K panel.
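The pixel arithmetic behind this layout is easy to check. The sketch below uses the 7680×4320 panel size and a round figure of 30 million parameters (the exact count is not given here). It fills the screen in plain row-major order, which is a simplifying assumption: the real piece tiles whole weight matrices rather than one flat list. The function names are hypothetical.

```python
DISPLAY_WIDTH = 7680   # pixels per row on an 8K panel
DISPLAY_HEIGHT = 4320  # rows on an 8K panel

def pixel_budget(n_params, width=DISPLAY_WIDTH, height=DISPLAY_HEIGHT):
    """Return (total_pixels, rows_for_weights, rows_left_over)."""
    total_pixels = width * height
    rows_for_weights = -(-n_params // width)  # ceiling division
    rows_left_over = height - rows_for_weights
    return total_pixels, rows_for_weights, rows_left_over

def pixel_for_parameter(index, width=DISPLAY_WIDTH):
    """Row-major (x, y) position of the parameter with this flat index."""
    return index % width, index // width

# 30_000_000 is a stand-in for "roughly 30 million", not the real count.
total, weight_rows, spare_rows = pixel_budget(30_000_000)
print(total, weight_rows, spare_rows)  # 33177600 3907 413
```

Under these assumptions the weights fill 3,907 full-width rows, leaving about 400 rows at the bottom for the text strip.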

Static snapshots of learned weights are common enough. The temporal dimension is where it gets interesting — watching training dynamics unfold live, at full resolution, reveals patterns that summary statistics flatten out: symmetry breaking in the attention layers, transient features that appear and dissolve, different parts of the network learning at visibly different rates. Loss curves don’t show you any of this.

A brain you can talk to

The text strip along the bottom is not just a readout. A microphone makes the piece interactive: speak to it, and your words both steer what the model says back and get folded into the text it learns from. You are not only watching the model train — you are, in a small way, one of the things it trains on.

How it works

Everything about the model is sized to fit the screen rather than to win a benchmark, and that is the point: it exists to be watched. The network is deliberately small, and it uses the simplest vocabulary there is, one symbol per byte, so that almost none of the display is spent on bookkeeping. The vast majority of the pixels show the weights themselves, where the interesting changes happen.
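To see why a one-symbol-per-byte vocabulary keeps the bookkeeping small, here is a minimal sketch of a byte-level tokenizer and the screen area its embedding table would need. It assumes exactly 256 symbols with no extra special tokens. The model width of 512 and the 50,000-token comparison vocabulary are invented for illustration and do not come from the project.

```python
VOCAB_SIZE = 256  # one symbol for every possible byte value (assumed, no specials)

def encode(text):
    """Turn text into token ids: simply its UTF-8 bytes."""
    return list(text.encode("utf-8"))

def decode(token_ids):
    """Turn token ids back into text, replacing invalid byte sequences."""
    return bytes(token_ids).decode("utf-8", errors="replace")

def embedding_pixels(d_model, vocab_size=VOCAB_SIZE):
    """Pixels used by the token-embedding table at one pixel per weight."""
    return vocab_size * d_model

D_MODEL = 512                # hypothetical model width
TOTAL_PIXELS = 7680 * 4320   # pixels on an 8K panel

print(encode("héllo"))  # [104, 195, 169, 108, 108, 111]
print(embedding_pixels(D_MODEL) / TOTAL_PIXELS)          # byte-level vocabulary
print(embedding_pixels(D_MODEL, 50_000) / TOTAL_PIXELS)  # large subword vocabulary
```

With these made-up numbers the byte-level table takes under half a percent of the display, while a 50,000-entry vocabulary at the same width would take about three quarters of it.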

Holding a live, full-resolution picture at 8K is the hard part. Each frame, the weights travel straight from the model to the screen with nothing copied back in between, which is the only way to keep the image moving in real time. The whole thing runs on a single compact machine, small enough to stand in a gallery rather than a server room. The technical write-up is in the code repository.

A Cybernetic Studio project by Ben Swift. Code: github.com/ANUcybernetics/llm-brainscan.

You are on Aboriginal land.

The Australian National University acknowledges, celebrates, and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work as the oldest continuing culture and knowledges in human history.
