
4 MAY 2026 · Predator · gaming laptop · GTX 1060 6 GB · 28 GB RAM · qwen3 · param-sweep

predator-a3b-ngl-matrix-1 — qwen3 on predator

42 calls across 3 cells; mean throughput ~5.7 tok/s; median (p50) call duration 71.5s
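The headline p50 appears to be the median wall-time over the calls listed in the per-call timeline, which shows 12 calls per cell (36 in total) rather than the 14 per cell counted in the Results table. That reading is an assumption. The minimal Python check below uses only the published timeline durations. Its duration formatter truncates to whole seconds, which is also an assumption, chosen because it matches the 1m4s shown for the 64.7 s cold call in the cold-start table.

```python
from statistics import median

# Per-call wall-times in seconds, copied from the per-call timeline,
# in page order (12 calls per cell).
DURATIONS_S = [
    [23.9, 19.9, 22.3, 32.6, 108.0, 71.3, 92.4, 100.6, 155.4, 92.5, 153.1, 153.2],
    [41.9, 10.9, 9.4, 18.4, 64.7, 96.0, 93.9, 70.2, 149.0, 92.5, 108.9, 144.4],
    [11.7, 15.0, 10.3, 13.1, 67.1, 59.2, 71.6, 67.0, 119.3, 72.0, 117.3, 127.7],
]


def fmt_duration(seconds: float) -> str:
    """Render seconds as '23.1s' or '1m32s'; whole seconds are truncated (assumption)."""
    if seconds < 60:
        return f"{seconds:.1f}s"
    minutes, rest = divmod(int(seconds), 60)
    return f"{minutes}m{rest}s"


all_calls = [d for cell in DURATIONS_S for d in cell]
overall_p50 = median(all_calls)  # even count: mean of the two middle values
per_cell_p50 = [fmt_duration(median(cell)) for cell in DURATIONS_S]

print(len(all_calls), round(overall_p50, 2), per_cell_p50)
```

The overall median comes out at 71.45 s, and the per-cell medians reproduce the duration p50 column of the Results table.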

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.

Results

Cell | tok/s mean | tok/s p50 | tok/s p95 | duration p50 | calls
qwen3:30b-a3b-iq2m-think500-ctx32k-ngl6 | 5.0 | 5.0 | 5.2 | 1m32s | 14
qwen3:30b-a3b-iq2m-think500-ctx32k-ngl12 | 5.6 | 5.4 | 6.6 | 1m21s | 14
qwen3:30b-a3b-iq2m-think500-ctx32k-ngl24 | 6.6 | 6.7 | 6.9 | 1m7s | 14
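The three cells differ only in their ngl suffix. This sketch assumes the suffix maps to llama.cpp's -ngl (--n-gpu-layers) option, which sets how many layers are offloaded to the GTX 1060. It also assumes that ctx32k means -c 32768. The model path is hypothetical, the sketch does not model the think500 setting, and the run's real harness is the one described in A3B_AND_CPU_OVERNIGHT_2026-05-05. The code only builds one llama-server command line per cell and does not launch anything.

```python
import shlex

# Hypothetical GGUF path: the run's actual model file is not named on this page.
MODEL_PATH = "models/qwen3-30b-a3b-IQ2_M.gguf"
NGL_VALUES = (6, 12, 24)  # assumed GPU-offloaded layer counts, from the cell names
CTX_SIZE = 32768  # assumption: "ctx32k" means a 32,768-token context


def server_command(ngl: int) -> list[str]:
    """Build a llama-server argv for one sweep cell (not executed here)."""
    return [
        "llama-server",
        "-m", MODEL_PATH,
        "-ngl", str(ngl),
        "-c", str(CTX_SIZE),
    ]


for ngl in NGL_VALUES:
    print(shlex.join(server_command(ngl)))
```

Mean throughput rises from 4.95 to 6.58 tok/s as ngl goes from 6 to 24. That trend is consistent with more of the model running on the GPU.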

Figure: tokens per second by cell (mean · p50 · p95). Unrounded values, in the order of the table above: 4.95 · 4.98 · 5.15 tok/s; 5.59 · 5.36 · 6.61 tok/s; 6.58 · 6.68 · 6.88 tok/s.

Cold start vs warm

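A cold-start call is the first call into a model after it loads from disk; warm calls are all the calls after that. The warm/cold column divides mean warm tok/s by mean cold tok/s. It therefore shows how much faster the model generates once it is resident. A value near 1.0× means the first call already runs at close to steady-state speed.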

Cell | cold n | cold tok/s | cold p50 duration | warm n | warm tok/s | warm p50 duration | warm/cold
predator:llamacpp:qwen3:30b-a3b-i… | 3 | 4.9 | 1m48s | 9 | 5.0 | 1m32s | 1.03×
predator:llamacpp:qwen3:30b-a3b-i… | 3 | 5.4 | 1m4s | 9 | 5.6 | 1m32s | 1.04×
predator:llamacpp:qwen3:30b-a3b-i… | 3 | 6.2 | 1m7s | 9 | 6.7 | 1m7s | 1.08×
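The warm/cold column appears to be mean warm tok/s divided by mean cold tok/s, and that formula is an assumption. The sketch below recomputes the ratio for the third cell from the per-call values in the per-call timeline. Those inputs are rounded to 0.1 tok/s, so a recomputed ratio can differ from the table in the last digit. For example, the first cell comes out at 1.02× instead of 1.03×.

```python
from statistics import mean

# (phase, tok/s) per call for the third cell, copied from the per-call timeline.
CALLS = [
    ("cold", 5.7), ("warm", 6.9), ("warm", 6.9), ("warm", 6.9),  # hello
    ("cold", 6.6), ("warm", 6.7), ("warm", 6.7), ("warm", 6.7),  # P-MEDIUM
    ("cold", 6.4), ("warm", 6.7), ("warm", 6.5), ("warm", 6.4),  # P-HARD
]


def warm_cold_ratio(calls: list[tuple[str, float]]) -> float:
    """Mean warm throughput divided by mean cold throughput (assumed definition)."""
    cold = [tps for phase, tps in calls if phase == "cold"]
    warm = [tps for phase, tps in calls if phase == "warm"]
    return mean(warm) / mean(cold)


ratio = warm_cold_ratio(CALLS)
print(f"{ratio:.2f}x")
```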

By prompt difficulty

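Throughput by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. Each entry gives mean tok/s, p50 wall-time, and the number of calls. If tok/s stays roughly flat across classes, generation is the bottleneck. If the short hello prompt runs much slower per token, fixed per-call overhead such as prompt processing (parsing) is the bottleneck.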

Cell | hello | P-MEDIUM | P-HARD
predator:llamacpp:qwen3:30b-a3b-i… | 4.8 tok/s · 23.1s · n=4 | 4.9 tok/s · 1m36s · n=4 | 5.1 tok/s · 2m33s · n=4
predator:llamacpp:qwen3:30b-a3b-i… | 6.2 tok/s · 14.6s · n=4 | 5.3 tok/s · 1m22s · n=4 | 5.3 tok/s · 2m6s · n=4
predator:llamacpp:qwen3:30b-a3b-i… | 6.6 tok/s · 12.4s · n=4 | 6.7 tok/s · 1m7s · n=4 | 6.5 tok/s · 1m58s · n=4
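Each entry in the prompt-class table appears to combine two statistics over that class's four calls: the mean tok/s and the median wall-time. That reading is an assumption. The minimal sketch below applies the grouping to the third cell's calls from the per-call timeline.

```python
from collections import defaultdict
from statistics import mean, median

# (prompt id, seconds, tok/s) for the third cell, copied from the per-call timeline.
CALLS = [
    ("hello", 11.7, 5.7), ("hello", 15.0, 6.9), ("hello", 10.3, 6.9), ("hello", 13.1, 6.9),
    ("P-MEDIUM", 67.1, 6.6), ("P-MEDIUM", 59.2, 6.7), ("P-MEDIUM", 71.6, 6.7), ("P-MEDIUM", 67.0, 6.7),
    ("P-HARD", 119.3, 6.4), ("P-HARD", 72.0, 6.7), ("P-HARD", 117.3, 6.5), ("P-HARD", 127.7, 6.4),
]

# Group (seconds, tok/s) pairs by prompt class.
by_prompt: dict[str, list[tuple[float, float]]] = defaultdict(list)
for prompt, seconds, tps in CALLS:
    by_prompt[prompt].append((seconds, tps))

summary = {
    prompt: (
        round(mean(tps for _, tps in rows), 1),  # mean tok/s
        round(median(s for s, _ in rows), 1),    # p50 wall-time in seconds
        len(rows),                               # n
    )
    for prompt, rows in by_prompt.items()
}
print(summary)
```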

Reasoning vs answer

Thinking models split their output into a hidden reasoning trace and a visible answer. The last column divides reasoning characters by answer characters. It shows how much of the output the model spent thinking vs answering.

Cell | reasoning chars | answer chars | reasoning / answer
predator:llamacpp:qwen3:30b-a3b-i… | 1264 | 657 | 1.92×
predator:llamacpp:qwen3:30b-a3b-i… | 1129 | 658 | 1.72×
predator:llamacpp:qwen3:30b-a3b-i… | 1095 | 736 | 1.49×
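Qwen3 in thinking mode places its reasoning between <think> and </think> tags, ahead of the visible answer. The page does not say how the harness splits the two. This sketch assumes the split happens at the closing tag and that the ratio is simply reasoning characters over answer characters. Both helper functions are hypothetical.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer) at the closing </think> tag.

    Assumes Qwen3-style output: '<think>reasoning</think>answer'. If no closing
    tag is present, the whole text is treated as the answer.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def reasoning_answer_ratio(reasoning_chars: int, answer_chars: int) -> float:
    """Reasoning characters per answer character (assumed definition of the column)."""
    return reasoning_chars / answer_chars


# Illustrative completion, not taken from the run.
sample = "<think>The user greets me; reply briefly.</think>Hello! How can I help?"
reasoning, answer = split_reasoning(sample)
print(len(reasoning), len(answer))

# First row's published counts: 1264 reasoning chars, 657 answer chars.
print(f"{reasoning_answer_ratio(1264, 657):.2f}x")
```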

Per-call timeline

This section lists every call placed during the run, in order, with its prompt id, phase (cold or warm), wall-time, and tok/s. In the chart, each call's width is proportional to its share of the cell's wall-time.

predator:llamacpp:qwen3:3…
hello: cold 23.9s, 4.8 tok/s · warm 19.9s, 4.7 tok/s · warm 22.3s, 4.7 tok/s · warm 32.6s, 5.1 tok/s
P-MEDIUM: cold 108.0s, 4.7 tok/s · warm 71.3s, 4.9 tok/s · warm 92.4s, 4.9 tok/s · warm 100.6s, 5.1 tok/s
P-HARD: cold 155.4s, 5.1 tok/s · warm 92.5s, 5.1 tok/s · warm 153.1s, 5.1 tok/s · warm 153.2s, 5.1 tok/s

predator:llamacpp:qwen3:3…
hello: cold 41.9s, 5.4 tok/s · warm 10.9s, 6.6 tok/s · warm 9.4s, 6.6 tok/s · warm 18.4s, 5.9 tok/s
P-MEDIUM: cold 64.7s, 5.7 tok/s · warm 96.0s, 5.3 tok/s · warm 93.9s, 5.0 tok/s · warm 70.2s, 5.3 tok/s
P-HARD: cold 149.0s, 5.2 tok/s · warm 92.5s, 5.3 tok/s · warm 108.9s, 5.3 tok/s · warm 144.4s, 5.4 tok/s

predator:llamacpp:qwen3:3…
hello: cold 11.7s, 5.7 tok/s · warm 15.0s, 6.9 tok/s · warm 10.3s, 6.9 tok/s · warm 13.1s, 6.9 tok/s
P-MEDIUM: cold 67.1s, 6.6 tok/s · warm 59.2s, 6.7 tok/s · warm 71.6s, 6.7 tok/s · warm 67.0s, 6.7 tok/s
P-HARD: cold 119.3s, 6.4 tok/s · warm 72.0s, 6.7 tok/s · warm 117.3s, 6.5 tok/s · warm 127.7s, 6.4 tok/s
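In the chart, each segment's width is the call's duration divided by the cell's total wall-time. The short sketch below applies that normalization to the third cell, using the durations from the timeline. The 800-pixel bar width is a hypothetical value chosen for illustration.

```python
# Wall-times in seconds for the third cell, in call order, from the per-call timeline.
DURATIONS_S = [11.7, 15.0, 10.3, 13.1, 67.1, 59.2, 71.6, 67.0, 119.3, 72.0, 117.3, 127.7]


def wall_time_shares(durations: list[float]) -> list[float]:
    """Each call's fraction of the cell's total wall-time (the fractions sum to 1.0)."""
    total = sum(durations)
    return [d / total for d in durations]


shares = wall_time_shares(DURATIONS_S)
# Scale to a hypothetical 800-px bar; per-segment rounding means the widths
# may not add up to exactly 800.
widths_px = [round(s * 800) for s in shares]
print(round(sum(DURATIONS_S), 1), [f"{s:.1%}" for s in shares[:4]])
```

For this cell, the four hello calls take about 6.7% of the 751.3 s total. The P-MEDIUM and P-HARD calls make up the rest of the wall-time.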

Raw data

Every run's JSONL, log, summary, and metadata are published. Clone the archive; re-run it; tell us where we got it wrong.
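The JSONL file can be read one record per line. The archive's schema is not shown on this page, so the field names below (cell, prompt, phase, duration_s, tok_s) are hypothetical. This sketch only illustrates how the per-cell summaries could be rebuilt from such a file.

```python
import json
from collections import defaultdict
from statistics import mean, median
from typing import Iterable


def load_calls(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(calls: list[dict]) -> dict[str, dict]:
    """Per-cell call count, mean tok/s and median duration (hypothetical field names)."""
    by_cell: dict[str, list[dict]] = defaultdict(list)
    for call in calls:
        by_cell[call["cell"]].append(call)
    return {
        cell: {
            "calls": len(rows),
            "tok_s_mean": mean(r["tok_s"] for r in rows),
            "duration_p50_s": median(r["duration_s"] for r in rows),
        }
        for cell, rows in by_cell.items()
    }


# Two illustrative records shaped like the timeline entries; not real archive rows.
SAMPLE = """\
{"cell": "cell-3", "prompt": "hello", "phase": "cold", "duration_s": 11.7, "tok_s": 5.7}
{"cell": "cell-3", "prompt": "hello", "phase": "warm", "duration_s": 15.0, "tok_s": 6.9}
"""

print(summarize(load_calls(SAMPLE.splitlines())))
```

To read a real file, pass the file object directly, for example `with open(path) as f: summarize(load_calls(f))`. This assumes the records actually use these keys.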

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/b37836bf.html
Public benchmarks of the Weeyuga cluster. Run id: b37836bf-d1a2-4d6b-a732-aff89da1fa07. SHA ddbaaf46.
