what we measure
Cold start. Steady state. Concurrent users. Tool calls. Memory. Watts. Heat.
Every run we believe in publishes its raw JSONL, its tee’d log, a human-readable summary, and a metadata file with the model checkpoint and the harness git SHA. Clone it, re-run it, tell us where we got it wrong.
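As a rough illustration of that artifact set, here is a minimal sketch of writing the raw JSONL and the metadata file for one run. The file names (`results.jsonl`, `metadata.json`), the field names, and both helper functions are assumptions made for the example, not the harness's actual layout.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def harness_git_sha(repo_dir="."):
    """Return the commit SHA of the harness checkout, or "unknown" if git fails."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def write_run_artifacts(run_dir, records, model_checkpoint, git_sha):
    """Write raw results as JSONL plus a metadata file (hypothetical layout)."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)

    # Raw results: one JSON object per line.
    with open(run_dir / "results.jsonl", "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

    # Metadata: enough to tie the numbers to a checkpoint and a harness commit.
    metadata = {
        "model_checkpoint": model_checkpoint,
        "harness_git_sha": git_sha,
        "num_records": len(records),
    }
    (run_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return metadata


if __name__ == "__main__":
    # Placeholder records and checkpoint name, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        meta = write_run_artifacts(
            tmp,
            records=[{"phase": "cold_start"}, {"phase": "steady_state"}],
            model_checkpoint="example-checkpoint",
            git_sha=harness_git_sha(),
        )
        print(meta)
```

Recording the SHA at write time, rather than reconstructing it later, is what lets someone check out the same harness commit before re-running.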