predator-a3b-ngl-ctx-2d-1 — qwen3 on predator — benchmarks.weeyuga.com

5 MAY 2026 · Predator · gaming laptop · GTX 1060 6 GB · 28 GB RAM · qwen3 · param-sweep

90 calls across 6 cells; ~5.9 tok/s mean; p50 duration 73.4s

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.

Results

Cell                                         tok/s mean   tok/s p50   tok/s p95   duration p50   calls
qwen3:30b-a3b-iq2m-think500-ngl12-ctx64k     5.5          5.3         6.8         1m18s          15
qwen3:30b-a3b-iq2m-think500-ngl12-ctx96k     5.8          5.8         6.5         58.5s          15
qwen3:30b-a3b-iq2m-think500-ngl12-ctx131k    5.0          5.1         6.0         1m45s          15
qwen3:30b-a3b-iq2m-think500-ngl24-ctx48k     6.6          6.6         6.9         1m4s           15
qwen3:30b-a3b-iq2m-think500-ngl24-ctx64k     6.4          6.4         6.9         1m20s          15
qwen3:30b-a3b-iq2m-think500-ngl24-ctx96k     6.1          6.4         6.7         1m8s           15
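
The cell names encode the two swept parameters. As a reading aid, the sketch below pivots the mean tok/s column into an ngl × ctx grid. It assumes every cell name ends in `-ngl<N>-ctx<M>k`, which holds for the six rows above but is not a documented naming scheme; ngl is presumably the llama.cpp GPU-layer count and ctx the context size. `parse_cell` and `pivot` are illustrative helper names, not part of the benchmark tooling.

```python
import re

# Rows copied from the Results table: (cell name, mean tok/s).
RESULTS = [
    ("qwen3:30b-a3b-iq2m-think500-ngl12-ctx64k", 5.5),
    ("qwen3:30b-a3b-iq2m-think500-ngl12-ctx96k", 5.8),
    ("qwen3:30b-a3b-iq2m-think500-ngl12-ctx131k", 5.0),
    ("qwen3:30b-a3b-iq2m-think500-ngl24-ctx48k", 6.6),
    ("qwen3:30b-a3b-iq2m-think500-ngl24-ctx64k", 6.4),
    ("qwen3:30b-a3b-iq2m-think500-ngl24-ctx96k", 6.1),
]

# Assumed suffix of every cell name: -ngl<layers>-ctx<size>k
CELL_RE = re.compile(r"-ngl(?P<ngl>\d+)-ctx(?P<ctx>\d+)k$")


def parse_cell(name):
    """Return (ngl, ctx_k) parsed from a cell name, or raise ValueError."""
    match = CELL_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognised cell name: {name!r}")
    return int(match["ngl"]), int(match["ctx"])


def pivot(results):
    """Build {ngl: {ctx_k: mean tok/s}} from (cell name, mean) rows."""
    grid = {}
    for name, mean_tok_s in results:
        ngl, ctx = parse_cell(name)
        grid.setdefault(ngl, {})[ctx] = mean_tok_s
    return grid


grid = pivot(RESULTS)
ctx_sizes = sorted({ctx for row in grid.values() for ctx in row})

# Print one row per ngl value; "-" marks combinations that were not run.
print("ngl  " + "".join(f"{str(c) + 'k':>7}" for c in ctx_sizes))
for ngl in sorted(grid):
    row = "".join(
        f"{grid[ngl][c]:>7.1f}" if c in grid[ngl] else f"{'-':>7}"
        for c in ctx_sizes
    )
    print(f"{ngl:<5}{row}")
```

On these rows the ngl24 cells are faster than the ngl12 cells at both shared context sizes (6.4 vs 5.5 at 64k, 6.1 vs 5.8 at 96k). 48k was only measured at ngl24 and 131k only at ngl12.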

Chart: tokens per second (mean, p50, p95) per cell; y-axis ticks at 2, 4, 6, 8 tok/s. Plotted values, in chart order:

Cell                 mean   p50    p95
qwen3:30b-a3b-i…     5.48   5.31   6.79
qwen3:30b-a3b-i…     5.79   5.75   6.47
qwen3:30b-a3b-i…     5.01   5.13   5.97
qwen3:30b-a3b-i…     6.60   6.59   6.89
qwen3:30b-a3b-i…     6.41   6.39   6.88
qwen3:30b-a3b-i…     6.08   6.44   6.68

Cold start vs warm

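Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The warm/cold ratio compares warm tok/s to cold tok/s, so values above 1× mean generation is faster once warm. Together with the p50 durations, it shows how much of the deployment’s wall-time cost is one-time vs steady-state.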

Cell                                  cold n   cold tok/s   cold p50   warm n   warm tok/s   warm p50   warm/cold
predator:llamacpp:qwen3:30b-a3b-i…    3        5.4          1m50s      9        5.5          1m16s      1.02×
predator:llamacpp:qwen3:30b-a3b-i…    3        5.9          45.7s      9        5.8          59.7s      0.97×
predator:llamacpp:qwen3:30b-a3b-i…    3        4.1          1m58s      9        5.3          1m35s      1.29×
predator:llamacpp:qwen3:30b-a3b-i…    3        6.4          58.6s      9        6.7          1m9s       1.05×
predator:llamacpp:qwen3:30b-a3b-i…    3        6.0          1m39s      9        6.5          1m9s       1.09×
predator:llamacpp:qwen3:30b-a3b-i…    3        4.9          1m12s      9        6.5          1m4s       1.31×
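
To make the cold/warm columns concrete, here is a minimal sketch that recomputes them for one cell from the calls listed under the first cell of the per-call timeline. It assumes tok/s is averaged with a plain mean, p50 is the median duration, and warm/cold is the ratio of the two mean tok/s values. Those assumptions reproduce the published first row, but they are inferred from the numbers, not taken from the benchmark code. `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

# (prompt, phase, duration in seconds, tok/s) for the first cell
# in the per-call timeline.
CALLS = [
    ("hello", "cold", 25.7, 5.6),
    ("hello", "warm", 10.0, 6.8),
    ("hello", "warm", 15.6, 6.8),
    ("hello", "warm", 24.3, 4.9),
    ("P-MEDIUM", "cold", 110.8, 5.3),
    ("P-MEDIUM", "warm", 76.3, 5.1),
    ("P-MEDIUM", "warm", 70.4, 5.3),
    ("P-MEDIUM", "warm", 81.6, 5.5),
    ("P-HARD", "cold", 142.0, 5.3),
    ("P-HARD", "warm", 150.0, 5.3),
    ("P-HARD", "warm", 162.4, 5.0),
    ("P-HARD", "warm", 157.8, 4.8),
]


def summarize(calls, phase):
    """Count, mean tok/s and median duration for one phase."""
    rows = [c for c in calls if c[1] == phase]
    return {
        "n": len(rows),
        "tok_s": mean([r[3] for r in rows]),
        "p50_s": median([r[2] for r in rows]),
    }


cold = summarize(CALLS, "cold")
warm = summarize(CALLS, "warm")
ratio = warm["tok_s"] / cold["tok_s"]

print(f"cold n={cold['n']} tok/s={cold['tok_s']:.1f} p50={cold['p50_s']:.1f}s")
print(f"warm n={warm['n']} tok/s={warm['tok_s']:.1f} p50={warm['p50_s']:.1f}s")
print(f"warm/cold = {ratio:.2f}x")
```

For this cell the sketch gives 3 cold calls at 5.4 tok/s with a 110.8s median, 9 warm calls at 5.5 tok/s with a 76.3s median, and a ratio of 1.02, matching the first table row.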

By prompt difficulty

Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. The shape of the gap tells you whether the model is bottlenecked on parsing or on generation.

Cell                                  hello                      P-MEDIUM                   P-HARD
predator:llamacpp:qwen3:30b-a3b-i…    6.0 tok/s · 20.0s · n=4    5.3 tok/s · 1m18s · n=4    5.1 tok/s · 2m33s · n=4
predator:llamacpp:qwen3:30b-a3b-i…    5.8 tok/s · 18.1s · n=4    5.8 tok/s · 58.5s · n=4    5.7 tok/s · 2m12s · n=4
predator:llamacpp:qwen3:30b-a3b-i…    4.7 tok/s · 31.9s · n=4    5.2 tok/s · 1m45s · n=4    5.1 tok/s · 2m30s · n=4
predator:llamacpp:qwen3:30b-a3b-i…    6.7 tok/s · 15.3s · n=4    6.6 tok/s · 1m14s · n=4    6.5 tok/s · 1m54s · n=4
predator:llamacpp:qwen3:30b-a3b-i…    6.5 tok/s · 16.6s · n=4    6.4 tok/s · 1m21s · n=4    6.2 tok/s · 2m4s · n=4
predator:llamacpp:qwen3:30b-a3b-i…    5.5 tok/s · 25.9s · n=4    6.4 tok/s · 1m8s · n=4     6.4 tok/s · 1m57s · n=4
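
The per-prompt figures can be rebuilt from the per-call timeline. The sketch below does so for the fourth cell listed there, assuming each entry is the mean tok/s plus the median duration over the four calls of that prompt class, and that durations of a minute or more are shown as whole seconds, truncated. These rules are inferred from the published numbers. `format_duration` and `by_prompt` are hypothetical helpers.

```python
from collections import defaultdict
from statistics import mean, median

# (prompt, duration in seconds, tok/s) for the fourth cell in the timeline.
CALLS = [
    ("hello", 19.9, 6.2), ("hello", 14.7, 6.9),
    ("hello", 15.6, 6.9), ("hello", 15.0, 6.9),
    ("P-MEDIUM", 58.6, 6.6), ("P-MEDIUM", 89.9, 6.6),
    ("P-MEDIUM", 92.5, 6.6), ("P-MEDIUM", 59.9, 6.7),
    ("P-HARD", 121.6, 6.3), ("P-HARD", 69.3, 6.7),
    ("P-HARD", 114.3, 6.5), ("P-HARD", 115.2, 6.5),
]


def format_duration(seconds):
    """Render like the page: '15.3s' under a minute, else '1m14s'."""
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rest = divmod(int(seconds), 60)  # truncates fractional seconds
    return f"{minutes}m{rest}s"


def by_prompt(calls):
    """Group calls by prompt class; report mean tok/s, median duration, n."""
    groups = defaultdict(list)
    for prompt, duration_s, tok_s in calls:
        groups[prompt].append((duration_s, tok_s))
    return {
        prompt: {
            "tok_s": round(mean([t for _, t in rows]), 1),
            "p50": format_duration(median([d for d, _ in rows])),
            "n": len(rows),
        }
        for prompt, rows in groups.items()
    }


summary = by_prompt(CALLS)
for prompt, stats in summary.items():
    print(f"{prompt}: {stats['tok_s']} tok/s · {stats['p50']} · n={stats['n']}")
```

For this cell, tok/s is nearly flat across prompt classes (6.7, 6.6, 6.5) while the median duration grows from about 15s to nearly two minutes, so the longer calls reflect more output, not slower generation.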

Reasoning vs answer

Thinking models split their output into a hidden reasoning trace and a visible answer. The ratio shows how much of the budget the model spent thinking vs answering.

Cell                                  reasoning chars   answer chars   reasoning / answer
predator:llamacpp:qwen3:30b-a3b-i…    1194              809            1.47×
predator:llamacpp:qwen3:30b-a3b-i…    1210              619            1.95×
predator:llamacpp:qwen3:30b-a3b-i…    1296              873            1.48×
predator:llamacpp:qwen3:30b-a3b-i…    1153              769            1.50×
predator:llamacpp:qwen3:30b-a3b-i…    1064              944            1.13×
predator:llamacpp:qwen3:30b-a3b-i…    1185              742            1.60×
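
As a quick worked check, the ratio column is just reasoning characters divided by answer characters. The sketch below recomputes it from the rounded character counts and also expresses it as the share of total output spent on reasoning. The share is an extra derived figure, not one the page reports. Recomputed ratios can differ from the published ones in the last digit, presumably because the page divides unrounded averages.

```python
# (reasoning chars, answer chars, published ratio), in table order.
ROWS = [
    (1194, 809, 1.47),
    (1210, 619, 1.95),
    (1296, 873, 1.48),
    (1153, 769, 1.50),
    (1064, 944, 1.13),
    (1185, 742, 1.60),
]


def ratio(reasoning_chars, answer_chars):
    """Reasoning-to-answer ratio, as in the last table column."""
    return reasoning_chars / answer_chars


def reasoning_share(reasoning_chars, answer_chars):
    """Fraction of all output characters that were reasoning."""
    return reasoning_chars / (reasoning_chars + answer_chars)


for reasoning, answer, published in ROWS:
    print(
        f"{reasoning:>5} / {answer:<4} = {ratio(reasoning, answer):.2f}x "
        f"(published {published:.2f}x), "
        f"reasoning share {reasoning_share(reasoning, answer):.0%}"
    )
```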

Per-call timeline

Every call placed during this run, in order, labeled by phase (cold or warm), with its prompt id, duration, and tok/s. In the original chart, each segment’s width is proportional to the call’s share of the cell’s wall-time.

predator:llamacpp:qwen3:3…
hello · cold · 25.7s · 5.6 tok/s
hello · warm · 10.0s · 6.8 tok/s
hello · warm · 15.6s · 6.8 tok/s
hello · warm · 24.3s · 4.9 tok/s
P-MEDIUM · cold · 110.8s · 5.3 tok/s
P-MEDIUM · warm · 76.3s · 5.1 tok/s
P-MEDIUM · warm · 70.4s · 5.3 tok/s
P-MEDIUM · warm · 81.6s · 5.5 tok/s
P-HARD · cold · 142.0s · 5.3 tok/s
P-HARD · warm · 150.0s · 5.3 tok/s
P-HARD · warm · 162.4s · 5.0 tok/s
P-HARD · warm · 157.8s · 4.8 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 13.7s · 5.8 tok/s
hello · warm · 33.0s · 4.9 tok/s
hello · warm · 15.8s · 6.7 tok/s
hello · warm · 20.4s · 5.9 tok/s
P-MEDIUM · cold · 45.7s · 6.3 tok/s
P-MEDIUM · warm · 59.7s · 5.6 tok/s
P-MEDIUM · warm · 57.4s · 5.7 tok/s
P-MEDIUM · warm · 104.3s · 5.9 tok/s
P-HARD · cold · 137.5s · 5.7 tok/s
P-HARD · warm · 144.3s · 5.5 tok/s
P-HARD · warm · 101.9s · 5.7 tok/s
P-HARD · warm · 127.9s · 5.9 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 59.7s · 2.3 tok/s
hello · warm · 10.8s · 6.3 tok/s
hello · warm · 20.0s · 5.0 tok/s
hello · warm · 43.7s · 5.3 tok/s
P-MEDIUM · cold · 118.1s · 5.0 tok/s
P-MEDIUM · warm · 116.2s · 5.1 tok/s
P-MEDIUM · warm · 95.5s · 4.9 tok/s
P-MEDIUM · warm · 84.4s · 5.7 tok/s
P-HARD · cold · 150.7s · 5.1 tok/s
P-HARD · warm · 148.9s · 5.1 tok/s
P-HARD · warm · 159.1s · 4.8 tok/s
P-HARD · warm · 149.7s · 5.4 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 19.9s · 6.2 tok/s
hello · warm · 14.7s · 6.9 tok/s
hello · warm · 15.6s · 6.9 tok/s
hello · warm · 15.0s · 6.9 tok/s
P-MEDIUM · cold · 58.6s · 6.6 tok/s
P-MEDIUM · warm · 89.9s · 6.6 tok/s
P-MEDIUM · warm · 92.5s · 6.6 tok/s
P-MEDIUM · warm · 59.9s · 6.7 tok/s
P-HARD · cold · 121.6s · 6.3 tok/s
P-HARD · warm · 69.3s · 6.7 tok/s
P-HARD · warm · 114.3s · 6.5 tok/s
P-HARD · warm · 115.2s · 6.5 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 18.5s · 5.6 tok/s
hello · warm · 18.5s · 6.9 tok/s
hello · warm · 14.7s · 6.8 tok/s
hello · warm · 14.4s · 6.9 tok/s
P-MEDIUM · cold · 99.5s · 6.2 tok/s
P-MEDIUM · warm · 56.2s · 6.4 tok/s
P-MEDIUM · warm · 92.5s · 6.4 tok/s
P-MEDIUM · warm · 69.6s · 6.6 tok/s
P-HARD · cold · 123.4s · 6.2 tok/s
P-HARD · warm · 90.9s · 6.3 tok/s
P-HARD · warm · 131.5s · 6.3 tok/s
P-HARD · warm · 125.6s · 6.2 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 35.5s · 2.7 tok/s
hello · warm · 27.1s · 6.2 tok/s
hello · warm · 24.7s · 6.7 tok/s
hello · warm · 13.4s · 6.4 tok/s
P-MEDIUM · cold · 72.3s · 6.0 tok/s
P-MEDIUM · warm · 92.1s · 6.5 tok/s
P-MEDIUM · warm · 64.3s · 6.4 tok/s
P-MEDIUM · warm · 64.4s · 6.7 tok/s
P-HARD · cold · 127.3s · 6.1 tok/s
P-HARD · warm · 123.1s · 6.3 tok/s
P-HARD · warm · 74.6s · 6.6 tok/s
P-HARD · warm · 112.8s · 6.5 tok/s
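
For readers who want to re-analyse the timeline without the raw files, a small parser is enough. This is a sketch under the assumption that every entry follows the `prompt · phase · duration · tok/s` pattern seen above; `Call`, `ENTRY_RE`, and `parse_calls` are illustrative names, and the published JSONL may well use a different structure.

```python
import re
from typing import NamedTuple


class Call(NamedTuple):
    prompt: str
    phase: str
    duration_s: float
    tok_s: float


# Matches entries such as "P-MEDIUM · cold · 110.8s · 5.3 tok/s".
ENTRY_RE = re.compile(
    r"(?P<prompt>[\w-]+) · (?P<phase>cold|warm) · "
    r"(?P<duration>\d+(?:\.\d+)?)s · (?P<tok_s>\d+(?:\.\d+)?) tok/s"
)


def parse_calls(text):
    """Extract every timeline entry found in a block of text, in order."""
    return [
        Call(m["prompt"], m["phase"], float(m["duration"]), float(m["tok_s"]))
        for m in ENTRY_RE.finditer(text)
    ]


sample = (
    "predator:llamacpp:qwen3:3… hello · cold · 25.7s · 5.6 tok/s "
    "hello · warm · 10.0s · 6.8 tok/s "
    "P-MEDIUM · cold · 110.8s · 5.3 tok/s"
)
calls = parse_calls(sample)
for call in calls:
    print(call)

# Total wall time of the parsed calls, in seconds.
print(f"{sum(c.duration_s for c in calls):.1f}s total")
```

The pattern skips the truncated cell label because it only matches text that is followed by the ` · cold` or ` · warm` field, so it works on both one-entry-per-line and run-together text.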

Raw data

Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/e08a7051.html
Public benchmarks of the Weeyuga cluster. Run id: e08a7051-a856-4053-b34b-561d7ce5e8cf. SHA ddbaaf46.