Loop Architecture

@user-00000000·in Andrej Karpathy Autoresearch·Updated 29d ago

The AutoResearch loop centers on three files: prepare.py for fixed data/runtime utilities, train.py as the agent-editable experiment surface, and program.md as the human-authored instruction layer.

prepare.py train.py program.md val_bpb architecture ratchet-loop validation-metric

AutoResearch is intentionally small. The official README says only three files really matter:

prepare.py: fixed constants, data preparation, tokenizer training, dataloader, and evaluation utilities. This file is not meant to be modified by the agent.
train.py: the main editable surface. It contains the GPT model, optimizer setup, and training loop. The agent can change architecture, hyperparameters, optimizer details, batch size, and related choices.
program.md: the human-written instruction file. This is where the human defines the research direction and agent behavior.

See also Overview and Running AutoResearch.

The ratchet loop

The research loop is a form of hill climbing under a time budget:

The agent reads the repository and program.md.
It edits train.py to try a research idea.
It runs the training experiment.
It evaluates val_bpb (validation bits per byte); lower is better.
If val_bpb improves, the change is kept and committed.
If val_bpb worsens, the change is discarded.
The cycle repeats.
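
The steps above can be sketched as a greedy hill-climbing loop. This is a minimal illustration, not code from the repository: `ratchet_loop`, `propose`, and `evaluate` are hypothetical names, the candidate stands in for an edited train.py, and the returned score stands in for val_bpb. Discarding ties is also an assumption; the steps above only say that worse results are dropped.

```python
from typing import Callable, List, Tuple, TypeVar

C = TypeVar("C")  # a candidate, e.g. one version of train.py


def ratchet_loop(
    propose: Callable[[C], C],
    evaluate: Callable[[C], float],
    baseline: C,
    n_iterations: int,
) -> Tuple[C, float, List[Tuple[C, float, bool]]]:
    """Greedy hill climbing: keep a candidate only if it lowers the metric."""
    best = baseline
    best_score = evaluate(best)
    history: List[Tuple[C, float, bool]] = []
    for _ in range(n_iterations):
        candidate = propose(best)      # the agent edits the current best version
        score = evaluate(candidate)    # run the experiment, read the metric
        kept = score < best_score      # lower is better; ties are discarded (assumption)
        if kept:
            best, best_score = candidate, score  # analogous to keeping the commit
        history.append((candidate, score, kept))
    return best, best_score, history
```

Because a candidate is only ever accepted when it beats the current best, the best score can never get worse from one iteration to the next, which is what makes the loop a ratchet.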

The key design move is that the benchmark is cheap enough to run repeatedly, while still being real enough to give the agent grounded feedback.
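
The feedback signal itself, bits per byte, can be illustrated with a short conversion. This sketch assumes the evaluation sums the cross-entropy loss in nats over all predicted tokens and divides by the number of bytes in the evaluated text; the function name and arguments are hypothetical and are not taken from prepare.py.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by the byte count
    # normalizes by the size of the text rather than by the token count.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Normalizing by bytes rather than tokens means the value does not depend on how the tokenizer splits the text.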

Why fixed time matters

Training runs use a fixed five-minute wall-clock budget, excluding startup and compilation. This makes each attempt comparable within the same hardware environment. It also means the loop optimizes for the best result achievable under that platform-specific budget, not for a universal leaderboard score.
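
A fixed wall-clock budget of this kind can be sketched as follows. This is an illustration under stated assumptions, not the loop in train.py: `run_with_budget`, `step`, `warmup_steps`, and the injectable `clock` are hypothetical names, and the warmup steps stand in for startup and compilation that are excluded from the budget.

```python
import time
from typing import Callable


def run_with_budget(
    step: Callable[[], None],
    budget_seconds: float,
    warmup_steps: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run `step` until the wall-clock budget is spent; return the counted steps."""
    for _ in range(warmup_steps):
        step()  # startup/compilation stand-in: runs before the timer starts
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step()
        steps += 1
    return steps
```

With a five-minute budget the call would be `run_with_budget(step, budget_seconds=300)`. Faster hardware completes more steps in the same budget, so results from this kind of loop are only comparable on the same machine.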

Sources

Official README: https://github.com/karpathy/autoresearch/blob/master/README.md
Official repository: https://github.com/karpathy/autoresearch