predator-a3b-ngl-ctx-2d-1 — qwen3 on predator
90 calls across 6 cells; ~5.9 tok/s mean; p50 duration 73.4s
Methodology
See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.
Results
| Cell | tok/s mean | tok/s p50 | tok/s p95 | duration p50 | calls |
|---|---|---|---|---|---|
| qwen3:30b-a3b-iq2m-think500-ngl12-ctx64k | 5.5 | 5.3 | 6.8 | 1m18s | 15 |
| qwen3:30b-a3b-iq2m-think500-ngl12-ctx96k | 5.8 | 5.8 | 6.5 | 58.5s | 15 |
| qwen3:30b-a3b-iq2m-think500-ngl12-ctx131k | 5.0 | 5.1 | 6.0 | 1m45s | 15 |
| qwen3:30b-a3b-iq2m-think500-ngl24-ctx48k | 6.6 | 6.6 | 6.9 | 1m4s | 15 |
| qwen3:30b-a3b-iq2m-think500-ngl24-ctx64k | 6.4 | 6.4 | 6.9 | 1m20s | 15 |
| qwen3:30b-a3b-iq2m-think500-ngl24-ctx96k | 6.1 | 6.4 | 6.7 | 1m8s | 15 |
Chart: tokens per second per cell (mean, p50, p95).
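The cell names encode the two swept parameters, so the results table can be read as a 2D grid over ngl and context size. A minimal sketch, assuming every cell name ends in a `-ngl<N>-ctx<M>k` suffix; the regex and helper names are illustrative and not part of the benchmark tooling, while the tok/s means are copied from the table above:

```python
import re
from statistics import mean

# tok/s mean per cell, copied from the results table.
CELL_MEANS = {
    "qwen3:30b-a3b-iq2m-think500-ngl12-ctx64k": 5.5,
    "qwen3:30b-a3b-iq2m-think500-ngl12-ctx96k": 5.8,
    "qwen3:30b-a3b-iq2m-think500-ngl12-ctx131k": 5.0,
    "qwen3:30b-a3b-iq2m-think500-ngl24-ctx48k": 6.6,
    "qwen3:30b-a3b-iq2m-think500-ngl24-ctx64k": 6.4,
    "qwen3:30b-a3b-iq2m-think500-ngl24-ctx96k": 6.1,
}

# Assumption: every cell name ends in "-ngl<layers>-ctx<size>k".
CELL_SUFFIX = re.compile(r"-ngl(\d+)-ctx(\d+)k$")


def parse_cell(name: str) -> tuple[int, int]:
    """Return (ngl, context size in k) parsed from a cell name."""
    match = CELL_SUFFIX.search(name)
    if match is None:
        raise ValueError(f"unrecognised cell name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# Map (ngl, ctx_k) -> tok/s mean, then pick the fastest cell.
grid = {parse_cell(name): tok_s for name, tok_s in CELL_MEANS.items()}
best_cell = max(grid, key=grid.get)
overall_mean = mean(list(CELL_MEANS.values()))

print(best_cell, grid[best_cell])
print(round(overall_mean, 1))
```

Because every cell has 15 calls, the mean of the six cell means equals the mean over all 90 calls, which is consistent with the ~5.9 tok/s headline figure.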
Cold start vs warm
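A cold-start call is the first call into a model after it loads from disk; every later call is warm. The warm/cold column is warm tok/s divided by cold tok/s, so a value above 1× means generation is faster once the model is warm, and a value near 1× means the load cost is paid once rather than on every call.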
| Cell | cold n | cold tok/s | cold p50 | warm n | warm tok/s | warm p50 | warm/cold |
|---|---|---|---|---|---|---|---|
| predator:llamacpp:qwen3:30b-a3b-i… | 3 | 5.4 | 1m50s | 9 | 5.5 | 1m16s | 1.02× |
| predator:llamacpp:qwen3:30b-a3b-i… | 3 | 5.9 | 45.7s | 9 | 5.8 | 59.7s | 0.97× |
| predator:llamacpp:qwen3:30b-a3b-i… | 3 | 4.1 | 1m58s | 9 | 5.3 | 1m35s | 1.29× |
| predator:llamacpp:qwen3:30b-a3b-i… | 3 | 6.4 | 58.6s | 9 | 6.7 | 1m9s | 1.05× |
| predator:llamacpp:qwen3:30b-a3b-i… | 3 | 6.0 | 1m39s | 9 | 6.5 | 1m9s | 1.09× |
| predator:llamacpp:qwen3:30b-a3b-i… | 3 | 4.9 | 1m12s | 9 | 6.5 | 1m4s | 1.31× |
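The warm/cold column can be sketched as a ratio of two means. This is a minimal illustration, assuming the tok/s columns are arithmetic means and that each call record carries a `tok_s` value and a `cold` flag; both key names are hypothetical, and the example calls are made up rather than taken from the run:

```python
from statistics import mean


def warm_cold_ratio(calls: list[dict]) -> float:
    """Mean warm tok/s divided by mean cold tok/s for one cell.

    Each call is a dict with hypothetical keys "tok_s" (float) and
    "cold" (bool, True for the first call after a model load).
    """
    cold = [c["tok_s"] for c in calls if c["cold"]]
    warm = [c["tok_s"] for c in calls if not c["cold"]]
    if not cold or not warm:
        raise ValueError("need at least one cold and one warm call")
    return mean(warm) / mean(cold)


# Made-up calls for illustration only; not taken from the run.
example_calls = [
    {"tok_s": 4.0, "cold": True},
    {"tok_s": 5.0, "cold": False},
    {"tok_s": 5.2, "cold": False},
    {"tok_s": 4.2, "cold": True},
    {"tok_s": 5.4, "cold": False},
]
ratio = warm_cold_ratio(example_calls)
print(f"{ratio:.2f}×")
```

Recomputing the ratio from the rounded tok/s columns can differ from the published column in the last digit (for example 5.8 / 5.9 ≈ 0.98 against a published 0.97×), presumably because the published ratio is computed from unrounded values.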
By prompt difficulty
Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. Each entry lists tok/s, call duration, and the number of calls (n). The shape of the gap tells you whether the model is bottlenecked on parsing or on generation.
| Cell | hello | P-MEDIUM | P-HARD |
|---|---|---|---|
| predator:llamacpp:qwen3:30b-a3b-i… | 6.0 tok/s 20.0s · n=4 | 5.3 tok/s 1m18s · n=4 | 5.1 tok/s 2m33s · n=4 |
| predator:llamacpp:qwen3:30b-a3b-i… | 5.8 tok/s 18.1s · n=4 | 5.8 tok/s 58.5s · n=4 | 5.7 tok/s 2m12s · n=4 |
| predator:llamacpp:qwen3:30b-a3b-i… | 4.7 tok/s 31.9s · n=4 | 5.2 tok/s 1m45s · n=4 | 5.1 tok/s 2m30s · n=4 |
| predator:llamacpp:qwen3:30b-a3b-i… | 6.7 tok/s 15.3s · n=4 | 6.6 tok/s 1m14s · n=4 | 6.5 tok/s 1m54s · n=4 |
| predator:llamacpp:qwen3:30b-a3b-i… | 6.5 tok/s 16.6s · n=4 | 6.4 tok/s 1m21s · n=4 | 6.2 tok/s 2m4s · n=4 |
| predator:llamacpp:qwen3:30b-a3b-i… | 5.5 tok/s 25.9s · n=4 | 6.4 tok/s 1m8s · n=4 | 6.4 tok/s 1m57s · n=4 |
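A per-class breakdown like the one above can be sketched as a group-by over call records. This assumes each record has `prompt_class`, `tok_s`, and `duration_s` keys (all hypothetical names) and that the duration shown per class is a median, which the page does not state; the sample calls are invented for illustration:

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_class(calls: list[dict]) -> dict:
    """Return {prompt class: (mean tok/s, median seconds, n)}."""
    groups = defaultdict(list)
    for call in calls:
        groups[call["prompt_class"]].append(call)
    return {
        cls: (
            mean([c["tok_s"] for c in members]),
            median([c["duration_s"] for c in members]),
            len(members),
        )
        for cls, members in groups.items()
    }


# Made-up calls for illustration only; not taken from the run.
sample_calls = [
    {"prompt_class": "hello", "tok_s": 6.0, "duration_s": 20.0},
    {"prompt_class": "hello", "tok_s": 6.2, "duration_s": 22.0},
    {"prompt_class": "P-HARD", "tok_s": 5.0, "duration_s": 150.0},
    {"prompt_class": "P-HARD", "tok_s": 5.2, "duration_s": 154.0},
]
summary = summarise_by_class(sample_calls)
for cls, (tok_s, seconds, n) in summary.items():
    print(f"{cls}: {tok_s:.1f} tok/s {seconds:.1f}s n={n}")
```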
Reasoning vs answer
Thinking models split their output into a hidden reasoning trace and a visible answer. The ratio is reasoning characters divided by answer characters; it shows how much of the output budget the model spent thinking versus answering.
| Cell | reasoning chars | answer chars | reasoning / answer |
|---|---|---|---|
| predator:llamacpp:qwen3:30b-a3b-i… | 1194 | 809 | 1.47× |
| predator:llamacpp:qwen3:30b-a3b-i… | 1210 | 619 | 1.95× |
| predator:llamacpp:qwen3:30b-a3b-i… | 1296 | 873 | 1.48× |
| predator:llamacpp:qwen3:30b-a3b-i… | 1153 | 769 | 1.50× |
| predator:llamacpp:qwen3:30b-a3b-i… | 1064 | 944 | 1.13× |
| predator:llamacpp:qwen3:30b-a3b-i… | 1185 | 742 | 1.60× |
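The last column can be checked directly from the two character counts. A small sketch using the second and fifth rows of the table above; the function name is illustrative:

```python
def reasoning_ratio(reasoning_chars: int, answer_chars: int) -> float:
    """Reasoning characters divided by answer characters."""
    if answer_chars <= 0:
        raise ValueError("answer_chars must be positive")
    return reasoning_chars / answer_chars


# (reasoning chars, answer chars) from the second and fifth table rows.
rows = [(1210, 619), (1064, 944)]
formatted = [f"{reasoning_ratio(r, a):.2f}×" for r, a in rows]
print(formatted)
```

Not every row reproduces exactly from the rounded counts: 1194 / 809 rounds to 1.48, against a published 1.47×, possibly because the published ratio is computed from unrounded or per-call values.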
Per-call timeline
Every call placed during this run, in order, colored by phase. Width is proportional to the call’s share of the cell’s wall-time.
Raw data
Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.
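For anyone re-analysing the published JSONL, a reader along these lines may be a useful starting point. It is a sketch only: the record keys shown (`cell`, `tok_s`) are hypothetical, since the archive's schema is not described here, and an in-memory stream stands in for the real file:

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-blank line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"bad JSON on line {line_number}") from exc


# Stand-in for an opened archive file; the keys are hypothetical.
sample = io.StringIO(
    '{"cell": "example-cell", "tok_s": 5.5}\n'
    "\n"
    '{"cell": "example-cell", "tok_s": 6.1}\n'
)
records = list(read_jsonl(sample))
print(len(records), records[0]["tok_s"])
```

With a real file, `open(path, encoding="utf-8")` can be passed in place of the `io.StringIO` object.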
Cite
Margetic, S. et al. (2026). Public benchmarks of the Weeyuga cluster. benchmarks.weeyuga.com/benchmarks/e08a7051.html. Run id: e08a7051-a856-4053-b34b-561d7ce5e8cf. SHA ddbaaf46.