vps50-cpu-matrix-1 — gemma/phi/qwen2.5/qwen3 on vps50
6 calls across 5 cells; ~0.0 tok/s mean; p50 duration 527.8s (8m47s)
Methodology
See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure.
Reproducible at git SHA ddbaaf46.
Results
| Cell | tok/s mean | tok/s p50 | tok/s p95 | duration p50 | calls |
|---|---|---|---|---|---|
| phi-4 | — | — | — | — | 0 |
| gemma-4-26b-a4b | — | — | — | — | 0 |
| qwen3-30b-a3b | — | — | — | — | 0 |
| qwen2.5-72b | — | — | — | — | 0 |
| phi-4-q4km-cpu-ctx32k | 0.0 | 0.0 | 0.0 | 8m47s | 6 |
tokens per second — mean · p50 · p95
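The columns in the table above can be reproduced from per-call records with a few lines of Python. This is a minimal sketch, not the report's own tooling: the record keys `tokens` and `seconds` are hypothetical, and the nearest-rank percentile is an assumption, since the percentile method used for this run is not stated here.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * q / 100))
    return ordered[rank - 1]


def format_duration(seconds):
    """Render seconds in the table's style, e.g. 527.8 -> '8m47s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs}s"


def summarize(calls):
    """Summarize one cell. `calls` is a list of dicts with the
    hypothetical keys 'tokens' and 'seconds' (seconds > 0)."""
    if not calls:
        return None  # a cell with 0 calls has no statistics to report
    rates = [c["tokens"] / c["seconds"] for c in calls]
    durations = [c["seconds"] for c in calls]
    return {
        "tok_s_mean": statistics.mean(rates),
        "tok_s_p50": percentile(rates, 50),
        "tok_s_p95": percentile(rates, 95),
        "duration_p50": format_duration(percentile(durations, 50)),
        "calls": len(calls),
    }
```

`format_duration(527.8)` gives `8m47s`, which is how the 527.8s p50 in the run summary maps to the `duration p50` column.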
Cold start vs warm
Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The ratio shows how much of the deployment’s wall-time cost is one-time vs steady-state.
| Cell | cold n | cold tok/s | cold p50 | warm n | warm tok/s | warm p50 | warm/cold |
|---|---|---|---|---|---|---|---|
| vps50:llamacpp:phi-4-q4km-cpu-ctx32k | 1 | 0.0 | 9m0s | 3 | 0.0 | 8m35s | 1.00× |
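The cold/warm split described above can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: it takes the tok/s values for one cell in call order and assumes the model was loaded exactly once, so only the first call is cold. How the published table arrives at 1.00× when both rates are 0.0 is not stated; the sketch returns `None` for the ratio in that case rather than guess.

```python
def cold_warm_ratio(rates):
    """Split one cell's tok/s values (in call order) into cold and warm.

    Assumption: a single model load, so the first call is the only cold one.
    """
    if not rates:
        return None
    cold, warm = rates[:1], rates[1:]
    cold_rate = sum(cold) / len(cold)
    result = {"cold_n": len(cold), "warm_n": len(warm), "ratio": None}
    if warm and cold_rate > 0:
        warm_rate = sum(warm) / len(warm)
        result["ratio"] = warm_rate / cold_rate  # > 1 means warm is faster
    return result
```

A ratio well above 1 would indicate that the first call carries a large one-time load cost; a ratio near 1 would indicate that steady-state generation dominates.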
By prompt difficulty
Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. The shape of the gap tells you whether the model is bottlenecked on parsing or on generation.
| Cell | hello |
|---|---|
| vps50:llamacpp:phi-4-q4km-cpu-ctx32k | 0.0 tok/s 8m47s · n=4 |
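Grouping calls by prompt class, as in the table above, can be sketched like this. The record keys `prompt` and `tok_s` are hypothetical names chosen for illustration; the published data may use different ones.

```python
from collections import defaultdict


def by_prompt_class(calls):
    """Group calls by prompt class and report mean tok/s and count.

    `calls` is a list of dicts with the hypothetical keys
    'prompt' (e.g. 'hello', 'P-MEDIUM', 'P-HARD') and 'tok_s'.
    """
    groups = defaultdict(list)
    for call in calls:
        groups[call["prompt"]].append(call["tok_s"])
    return {
        prompt: {"tok_s_mean": sum(rates) / len(rates), "n": len(rates)}
        for prompt, rates in groups.items()
    }
```

Only classes that actually received calls appear in the result, which matches the table here showing a single hello column.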
Per-call timeline
Every call placed during this run, in order, colored by phase. Width is proportional to the call’s share of the cell’s wall-time. Hover any segment for the prompt id and tok/s.
Raw data
Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.
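For readers who want to inspect the raw JSONL, a minimal loader might look like the sketch below. It assumes only the JSONL convention of one JSON object per line; the file name in the usage note and any field names inside the records are assumptions, since the schema is not described here.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text into a list of records, skipping blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a JSONL file from disk; `path` is wherever the archive was cloned."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle.read())
```

For example, `load_jsonl("calls.jsonl")` would return one dict per call, where `calls.jsonl` is a hypothetical file name standing in for the run's published JSONL.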
Cite
Margetic, S. et al. (2026). Public benchmarks of the Weeyuga cluster. benchmarks.weeyuga.com/benchmarks/cef4b219.html. Run id: cef4b219-6b65-4740-8c3b-5b68aa10cf3f. SHA ddbaaf46.