Loop Architecture
The AutoResearch loop centers on three files: prepare.py for fixed data/runtime utilities, train.py as the agent-editable experiment surface, and program.md as the human-authored instruction layer.
AutoResearch is intentionally small. The official README says only three files really matter:
- prepare.py: fixed constants, data preparation, tokenizer training, dataloader, and evaluation utilities. This file is not meant to be modified by the agent.
- train.py: the main editable surface. It contains the GPT model, optimizer setup, and training loop. The agent can change architecture, hyperparameters, optimizer details, batch size, and related choices.
- program.md: the human-written instruction file. This is where the human defines the research direction and agent behavior.
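The division of roles above can be pictured as a simple guard on which files an agent's change is allowed to touch. The following is only an illustrative sketch: the `AGENT_EDITABLE` set and the `disallowed_changes` helper are hypothetical names, not part of the AutoResearch repository, and the project itself may rely on instructions in program.md rather than any such check.

```python
from typing import Iterable, List

# Hypothetical allow-list: per the file roles above, only train.py is
# meant to be edited by the agent.
AGENT_EDITABLE = {"train.py"}


def disallowed_changes(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that fall outside the agent-editable surface."""
    return sorted(path for path in changed_paths if path not in AGENT_EDITABLE)


# A change that only touches train.py passes the guard.
ok_change = disallowed_changes(["train.py"])

# A change that also touches prepare.py or program.md is flagged.
bad_change = disallowed_changes(["train.py", "prepare.py", "program.md"])
```

Keeping the editable surface to a single file means every experiment is a diff against train.py, which makes it easy to see exactly what a given attempt changed.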
See also Overview and Running AutoResearch.
The ratchet loop
The research loop is a form of hill climbing under a time budget:
- The agent reads the repository and program.md.
- It edits train.py to try a research idea.
- It runs the training experiment.
- It evaluates val_bpb, validation bits per byte.
- If the metric improves, the change is kept and committed.
- If the metric worsens, the change is discarded.
- The cycle repeats.
The key design move is that the benchmark is cheap enough to run repeatedly, while still being real enough to give the agent grounded feedback.
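The keep-or-discard cycle described in this section can be sketched as greedy hill climbing. This is a minimal toy, not the project's implementation: `ratchet`, `toy_propose`, and `toy_evaluate` are hypothetical names, the "edit" is a change to one made-up hyperparameter rather than a real edit to train.py, and the score is a synthetic function standing in for val_bpb (where lower is better).

```python
import math
import random
from typing import Callable, List, Tuple


def ratchet(
    baseline: dict,
    propose: Callable[[dict, random.Random], dict],
    evaluate: Callable[[dict], float],
    steps: int,
    seed: int = 0,
) -> Tuple[dict, float, List[float]]:
    """Greedy hill climbing: keep a candidate only if it lowers the metric."""
    rng = random.Random(seed)
    best = dict(baseline)
    best_score = evaluate(best)
    history = [best_score]
    for _ in range(steps):
        candidate = propose(best, rng)  # stands in for the agent editing train.py
        score = evaluate(candidate)     # stands in for a training run reporting val_bpb
        if score < best_score:          # improvement: keep (the "commit" step)
            best, best_score = candidate, score
        # otherwise the candidate is simply dropped (the "discard" step)
        history.append(best_score)
    return best, best_score, history


def toy_propose(config: dict, rng: random.Random) -> dict:
    """Hypothetical edit: halve or double a learning rate."""
    new_config = dict(config)
    new_config["lr"] = config["lr"] * rng.choice([0.5, 2.0])
    return new_config


def toy_evaluate(config: dict) -> float:
    """Synthetic score with its minimum at lr = 0.004; not a real val_bpb."""
    return 1.0 + abs(math.log2(config["lr"] / 0.004))


best, best_score, history = ratchet(
    {"lr": 0.064}, toy_propose, toy_evaluate, steps=20
)
```

Because a candidate is accepted only when it strictly improves the score, the best-so-far value recorded in `history` can never get worse, which is the "ratchet" property. In the real loop, each call to `evaluate` would cost one full training run.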
Why fixed time matters
Training runs use a fixed five-minute wall-clock budget, excluding startup and compilation. This makes each attempt comparable within the same hardware environment. It also means the loop optimizes for the best result achievable under that platform-specific budget, not for a universal leaderboard score.
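To illustrate why a fixed wall-clock budget is comparable only within one hardware environment, here is a small sketch of a time-budgeted loop. It is an assumption-laden toy: `train_for_budget` and `FakeMachine` are hypothetical names, the clock is simulated so the example runs instantly, and the 60-second compile time and per-step costs are made-up numbers. Only the 300-second (five-minute) budget and the exclusion of startup and compilation come from the text above.

```python
import time
from typing import Callable, Optional

BUDGET_SECONDS = 300.0  # the five-minute budget described above


def train_for_budget(
    step_fn: Callable[[], None],
    budget_seconds: float,
    warmup_fn: Optional[Callable[[], None]] = None,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Run step_fn until the budget is spent; return the number of steps taken."""
    if warmup_fn is not None:
        warmup_fn()      # startup and compilation happen before the timer starts
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step_fn()
        steps += 1
    return steps


class FakeMachine:
    """Simulated hardware: each training step advances a fake clock."""

    def __init__(self, seconds_per_step: float) -> None:
        self.now = 0.0
        self.seconds_per_step = seconds_per_step

    def clock(self) -> float:
        return self.now

    def compile(self) -> None:
        self.now += 60.0  # made-up compile time, excluded from the budget

    def step(self) -> None:
        self.now += self.seconds_per_step


slow = FakeMachine(seconds_per_step=1.0)
fast = FakeMachine(seconds_per_step=0.5)

slow_steps = train_for_budget(slow.step, BUDGET_SECONDS, slow.compile, slow.clock)
fast_steps = train_for_budget(fast.step, BUDGET_SECONDS, fast.compile, fast.clock)
```

Under the same budget, the simulated faster machine completes twice as many steps. A change that wins on one machine is therefore the best use of five minutes on that machine, and results from different hardware should not be compared directly.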
Sources
- Official README: https://github.com/karpathy/autoresearch/blob/master/README.md
- Official repository: https://github.com/karpathy/autoresearch