LLMs Unplugged is a family of artefacts that make large language models tangible: n-gram language models you can build, hold, and run by hand, with no computers required. The statistical machinery underneath ChatGPT, brought down to human scale — because training is just counting, and generation is just weighted sampling.

This is the project page for the growing LLM Library, the Studio’s shelf of printed n-gram models. The hands-on lessons, worksheets and paper-and-dice activities live on a separate site, llmsunplugged.org.

The LLM Library: hardbound volumes lined up on a shelf.
The LLM Library — pre-trained language models in printed, hardbound form, published by Cybernetic Studio Press.

The LLM Library

The centrepiece is the LLM Library: a collection of pre-trained language models in printed, hardbound book form. Each volume contains n-gram frequency tables typeset from a classic work of literature — the same statistical patterns that underpin modern LLMs, but at a scale you can read on the page. You can hold the entire model in your hands and use it to generate new text with pen, paper, and dice.

The distinctive feature is the curation. Each volume is built from a specific literary work, then typeset in bigram, trigram, and 4-gram variants. The progression demonstrates the fundamental trade-off between model size and output quality — a trigram model of Frankenstein produces noticeably more coherent text than the bigram version, but the book is considerably thicker. Pick up a Hemingway bigram and you get terse, punchy fragments; the Cloudstreet model wanders into something more sprawling. The model is the text it was trained on, in a way that’s immediately legible.
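The size side of that trade-off can be sketched in a few lines of Python. This is an illustration only, not the Library's own pipeline: `table_size` is a hypothetical helper, whitespace splitting stands in for whatever tokenisation the books use, and the sample sentence is made up.

```python
from collections import defaultdict

def table_size(tokens, n):
    """Return (contexts, entries) for an n-gram table built from tokens.

    contexts: distinct (n-1)-word prefixes, i.e. headings to look up
    entries:  distinct (prefix, next word) pairs, i.e. rows to print
    """
    followers = defaultdict(set)
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i:i + n - 1])
        followers[prefix].add(tokens[i + n - 1])
    contexts = len(followers)
    entries = sum(len(words) for words in followers.values())
    return contexts, entries

# Made-up sample text; any tokenised book would do.
tokens = "the cat saw the dog and the dog saw the cat and the cat ran".split()

for n in (2, 3, 4):
    contexts, entries = table_size(tokens, n)
    print(f"{n}-gram: {contexts} contexts, {entries} entries")
```

Even on this fifteen-word sample the number of distinct contexts climbs from 5 (bigram) to 8 (trigram) to 12 (4-gram). On a full novel the same effect is what thickens the book.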

The shelf keeps growing; the volumes so far are:

  • Mary Shelley’s Frankenstein, in bigram, trigram and 4-gram editions
  • Tim Winton’s Cloudstreet
  • the collected works of Ernest Hemingway
  • a synthetic TinyStories dataset, for comparison

All are published by Cybernetic Studio Press under CC BY-NC-SA 4.0.

A bigram volume open to its typeset frequency tables.
Inside a volume: n-gram frequency tables, four columns to an A4 page, ready to generate text with a die.

Here is what a few entries look like on the page. Each is a word from the book followed by the words that can come after it. Roll a ten-sided die, or two for the busier entries marked ♦♦, and read down the running totals: the small number by each word is the cumulative count. After Frankenstein’s “abandoned”, for example, a roll of up to three gives a comma, up to six gives “his”, up to nine gives “me”, and a ten means roll again.
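The lookup rule can be written out as a small Python sketch. The running totals (3, 6, 9) are the ones quoted above. The list layout and the names `read_entry` and `next_word` are assumptions made for illustration, and the two-dice ♦♦ entries are left out because the text does not say how those rolls are combined.

```python
import random

# One entry: each follower paired with its running total.
# Totals are from the text; the data layout is an assumption.
ABANDONED = [(",", 3), ("his", 6), ("me", 9)]

def read_entry(entry, roll):
    """Return the first follower whose running total is >= roll, or None to reroll."""
    for word, running_total in entry:
        if roll <= running_total:
            return word
    return None  # roll exceeds the last total: roll again

def next_word(entry, rng=random):
    """Roll a ten-sided die until it lands on a follower."""
    while True:
        word = read_entry(entry, rng.randint(1, 10))
        if word is not None:
            return word

print(read_entry(ABANDONED, 5))   # a roll of 5 falls in the 4 to 6 band
print(next_word(ABANDONED))
```

Because each follower's band is as wide as its count, reading down the cumulative column is the same as sampling in proportion to the counts. The reroll only discards faces of the die that no band covers.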

Because each model is only its source text counted up, the book’s own voice leaks through. Cloudstreet’s “absolutely” runs on to “bugger”, “packin” or “troppo”; Hemingway’s “able” is followed by “to” more than nine times in ten. You can make a booklet like this from any text you like, using the in-browser tools at llmsunplugged.org.

From workshop to artwork

The Library grew out of LLMs Unplugged, a set of hands-on activities that teach the training-to-generation loop with nothing but paper, dice, pens, and scissors. The core activity is disarmingly simple: count word patterns in a children’s book, then generate new text by rolling dice weighted according to those counts. Something clicks when you watch plausible-ish sentences emerge from pure statistics — the difference between your grid paper and GPT-4 is scale, not sorcery.
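For readers who would rather see the loop than roll it, here is a minimal Python sketch of the same two steps for a bigram model. It is not the workshop's material or the project's code: `train` and `generate` are hypothetical names, the three short sentences are a made-up stand-in for a children's book, and `random.choices` plays the part of the weighted dice.

```python
import random
from collections import Counter, defaultdict

def train(tokens):
    """Training is counting: tally how often each word follows each other word."""
    model = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        model[current][following] += 1
    return model

def generate(model, start, length, rng=random):
    """Generation is sampling: pick each next word in proportion to its count."""
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        choices, counts = zip(*followers.items())
        words.append(rng.choices(choices, weights=counts, k=1)[0])
    return words

# Made-up stand-in for a children's book.
tokens = "see spot run . see spot jump . see jane run .".split()
model = train(tokens)
print(" ".join(generate(model, "see", 8)))
```

Here "see" is followed by "spot" twice and "jane" once, so the sampler picks "spot" about two times in three. Scaling up changes the size of the table, not the nature of the two steps.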

LLMs Unplugged workshop materials: grids, tokens, and dice.
The unplugged toolkit: tally grids, labelled buckets, and weighted dice. Training is counting; generation is sampling.

The activities build on the CS Unplugged tradition and, further back, on Claude Shannon’s 1948 work generating synthetic English from hand-drawn frequency tables. They have run with over 400 participants, from primary-school students to senior public-service executives, at venues including the ACT Academy of Future Skills, Brimbank Tech School in Victoria and a Year 5 class at Duffy Primary. At the ANU’s Tjabal Centre, the Luritja poet Matt Heffernan brought one of his poems in both Luritja and English, and participants built language models from it. That session makes a simple point: there is nothing especially English about a language model. Train it on Luritja, and Luritja is what comes out. The resources are published openly at llmsunplugged.org and written up for the ACE 2026 computing-education conference.

Generating text by hand from an unplugged language model.
Generating text the unplugged way — pen, paper, and a weighted roll of the dice.

Where the workshop materials are disposable, the Library volumes are curated, printed, and hardbound as standalone artefacts: the same Rust-to-Typst pipeline, turned toward something you would keep on a shelf.

A Cybernetic Studio project by Ben Swift. Code and source texts: github.com/ANUcybernetics/llms-unplugged.

You are on Aboriginal land.

The Australian National University acknowledges, celebrates, and pays our respects to the Ngunnawal and Ngambri people of the Canberra region and to all First Nations Australians on whose traditional lands we meet and work as the oldest continuing culture and knowledges in human history.
