
5 MAY 2026 · Predator · gaming laptop · GTX 1060 6 GB · 28 GB RAM · qwen3 · param-sweep

predator-a3b-ngl6-ctx-1 — qwen3 on predator

45 calls across 3 cells; ~5.2 tok/s mean; p50 call duration 84.8s

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.
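The cell ids pack the sweep parameters into a single string. Below is a minimal Python sketch that splits one apart. The field meanings are assumptions because the page does not define them: `iq2m` is read as the weight quantization, `think500` as a reasoning budget, `ngl6` as llama.cpp's GPU-offload layer count, and `ctx64k` as the context size. `CELL_RE` and `parse_cell` are hypothetical names.

```python
import re

# Hypothetical reading of the cell-id convention. The field meanings are
# assumptions, not taken from the A3B_AND_CPU_OVERNIGHT_2026-05-05 procedure.
CELL_RE = re.compile(
    r"^(?P<family>[^:]+):(?P<size>[^-]+)-(?P<variant>[^-]+)"
    r"-(?P<quant>[^-]+)-think(?P<think>\d+)"
    r"-ngl(?P<ngl>\d+)-ctx(?P<ctx>\d+k)$"
)


def parse_cell(cell_id: str) -> dict:
    """Split an id such as 'qwen3:30b-a3b-iq2m-think500-ngl6-ctx64k' into fields."""
    m = CELL_RE.match(cell_id)
    if m is None:
        raise ValueError(f"unrecognized cell id: {cell_id!r}")
    fields = m.groupdict()
    fields["think"] = int(fields["think"])  # assumed: reasoning budget
    fields["ngl"] = int(fields["ngl"])      # assumed: llama.cpp GPU-offloaded layers
    return fields                           # "ctx" is kept as the label, e.g. "64k"


for cell in (
    "qwen3:30b-a3b-iq2m-think500-ngl6-ctx64k",
    "qwen3:30b-a3b-iq2m-think500-ngl6-ctx96k",
    "qwen3:30b-a3b-iq2m-think500-ngl6-ctx131k",
):
    print(parse_cell(cell))
```

The context is left as its label ("64k", "131k") rather than converted to a token count, since the page does not say whether "k" means 1,000 or 1,024.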

Results

Cell | tok/s mean | tok/s p50 | tok/s p95 | duration p50 | calls
qwen3:30b-a3b-iq2m-think500-ngl6-ctx64k | 5.4 | 5.3 | 5.9 | 1m13s | 15
qwen3:30b-a3b-iq2m-think500-ngl6-ctx96k | 5.4 | 5.4 | 5.8 | 1m39s | 15
qwen3:30b-a3b-iq2m-think500-ngl6-ctx131k | 4.8 | 4.8 | 4.9 | 1m38s | 15
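The duration p50 column can be cross-checked against the per-call timeline further down the page. This is a small Python sketch, assuming the page truncates seconds when formatting durations, which is what the numbers suggest. The median of each timeline cell's listed durations reproduces the table's 1m13s, 1m39s and 1m38s in row order. The median of all of them gives the headline 84.8s, and the average of the three cell means gives the headline ~5.2 tok/s. The timeline lists 12 calls per cell, while the table counts 15. `fmt_minutes` is a hypothetical helper.

```python
from statistics import median

# Per-call durations in seconds, copied from the per-call timeline, one list
# per timeline cell in page order (4 hello, 4 P-MEDIUM, 4 P-HARD calls each).
DURATIONS = [
    [38.2, 16.0, 13.8, 22.4, 81.1, 70.8, 60.4, 75.6, 147.5, 157.6, 146.3, 146.2],
    [16.2, 18.2, 19.2, 16.8, 110.9, 110.8, 112.1, 66.1, 151.4, 152.8, 88.4, 151.3],
    [34.4, 18.1, 40.4, 43.7, 120.9, 73.2, 104.1, 96.6, 167.3, 145.9, 99.8, 156.6],
]
# Duration p50 and tok/s mean, as printed in the Results table.
RESULTS_P50 = {"ctx64k": "1m13s", "ctx96k": "1m39s", "ctx131k": "1m38s"}
CELL_MEANS = [5.42, 5.43, 4.79]


def fmt_minutes(seconds: float) -> str:
    """Format as whole minutes plus truncated (not rounded) seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs}s"


for (label, printed), durations in zip(RESULTS_P50.items(), DURATIONS):
    print(label, "computed", fmt_minutes(median(durations)), "printed", printed)

all_durations = [d for cell in DURATIONS for d in cell]
print(f"headline p50: {median(all_durations):.1f}s")
print(f"headline mean: {sum(CELL_MEANS) / len(CELL_MEANS):.2f} tok/s")
```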

Chart: tokens per second by cell (mean · p50 · p95), in the same order as the table above: 5.42 / 5.35 / 5.91, 5.43 / 5.38 / 5.78, and 4.79 / 4.80 / 4.93 tok/s.

Cold start vs warm

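Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The timeline marks the first call of each prompt class as cold, hence n=3 cold calls per cell. The warm/cold column divides mean warm tok/s by mean cold tok/s, so a value near 1× means the one-time load cost has little effect on steady-state generation speed.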

Cell | cold n | cold tok/s | cold p50 | warm n | warm tok/s | warm p50 | warm/cold
predator:llamacpp:qwen3:30b-a3b-i… | 3 | 5.2 | 1m21s | 9 | 5.5 | 1m10s | 1.05×
predator:llamacpp:qwen3:30b-a3b-i… | 3 | 5.3 | 1m50s | 9 | 5.5 | 1m28s | 1.04×
predator:llamacpp:qwen3:30b-a3b-i… | 3 | 4.7 | 2m0s | 9 | 4.8 | 1m36s | 1.03×
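The cold vs warm table can be rebuilt from the per-call timeline. This is a minimal Python sketch, assuming "tok/s" is the arithmetic mean of the per-call rates and "p50" is the median duration. Under that reading, the timeline's three cells reproduce the table rows in order, including the 1.05×, 1.04× and 1.03× ratios. `CALLS` and `split_phase` are hypothetical names.

```python
from statistics import mean, median

# (phase, duration_s, tok_s) for every call in the per-call timeline,
# one list per timeline cell in page order.
CALLS = [
    [("cold", 38.2, 5.2), ("warm", 16.0, 5.9), ("warm", 13.8, 5.9), ("warm", 22.4, 5.2),
     ("cold", 81.1, 5.4), ("warm", 70.8, 5.5), ("warm", 60.4, 5.6), ("warm", 75.6, 5.5),
     ("cold", 147.5, 5.1), ("warm", 157.6, 5.3), ("warm", 146.3, 5.3), ("warm", 146.2, 5.3)],
    [("cold", 16.2, 5.3), ("warm", 18.2, 5.6), ("warm", 19.2, 5.5), ("warm", 16.8, 5.9),
     ("cold", 110.9, 5.3), ("warm", 110.8, 5.4), ("warm", 112.1, 5.4), ("warm", 66.1, 5.6),
     ("cold", 151.4, 5.2), ("warm", 152.8, 5.2), ("warm", 88.4, 5.6), ("warm", 151.3, 5.2)],
    [("cold", 34.4, 4.4), ("warm", 18.1, 4.7), ("warm", 40.4, 4.9), ("warm", 43.7, 4.9),
     ("cold", 120.9, 4.9), ("warm", 73.2, 4.8), ("warm", 104.1, 4.8), ("warm", 96.6, 4.8),
     ("cold", 167.3, 4.7), ("warm", 145.9, 4.7), ("warm", 99.8, 4.9), ("warm", 156.6, 4.8)],
]


def split_phase(calls, phase):
    """Return (call count, mean tok/s, median duration) for one phase."""
    durations = [d for p, d, _ in calls if p == phase]
    rates = [r for p, _, r in calls if p == phase]
    return len(durations), mean(rates), median(durations)


for cell in CALLS:
    cold_n, cold_rate, cold_p50 = split_phase(cell, "cold")
    warm_n, warm_rate, warm_p50 = split_phase(cell, "warm")
    print(f"cold n={cold_n} {cold_rate:.1f} tok/s p50 {cold_p50:.1f}s | "
          f"warm n={warm_n} {warm_rate:.1f} tok/s p50 {warm_p50:.1f}s | "
          f"warm/cold {warm_rate / cold_rate:.2f}x")
```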

By prompt difficulty

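Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. Each entry gives tok/s, p50 call duration, and number of calls (n). The shape of the gap tells you whether the model is bottlenecked on parsing the prompt or on generation: roughly flat tok/s with duration rising by class, as in every cell here, suggests the extra time goes into generating longer outputs.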

Cell | hello | P-MEDIUM | P-HARD
predator:llamacpp:qwen3:30b-a3b-i… | 5.5 tok/s · 19.2s · n=4 | 5.5 tok/s · 1m13s · n=4 | 5.2 tok/s · 2m26s · n=4
predator:llamacpp:qwen3:30b-a3b-i… | 5.6 tok/s · 17.5s · n=4 | 5.4 tok/s · 1m50s · n=4 | 5.3 tok/s · 2m31s · n=4
predator:llamacpp:qwen3:30b-a3b-i… | 4.7 tok/s · 37.4s · n=4 | 4.8 tok/s · 1m40s · n=4 | 4.8 tok/s · 2m31s · n=4
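Because tok/s stays near 5 across prompt classes while duration grows, multiplying a call's duration by its tok/s gives a rough estimate of how much output it produced. This is a Python sketch for the first timeline cell. It assumes tok/s is measured over the whole call, which the page does not state. If prompt processing is excluded from the rate, the estimate overshoots. `implied_tokens` is a hypothetical helper.

```python
from statistics import median

# First timeline cell: (prompt class, duration_s, tok_s) per call.
CELL_1 = [
    ("hello", 38.2, 5.2), ("hello", 16.0, 5.9), ("hello", 13.8, 5.9), ("hello", 22.4, 5.2),
    ("P-MEDIUM", 81.1, 5.4), ("P-MEDIUM", 70.8, 5.5), ("P-MEDIUM", 60.4, 5.6), ("P-MEDIUM", 75.6, 5.5),
    ("P-HARD", 147.5, 5.1), ("P-HARD", 157.6, 5.3), ("P-HARD", 146.3, 5.3), ("P-HARD", 146.2, 5.3),
]


def implied_tokens(calls):
    """Median of duration * tok/s per prompt class: a rough output-length estimate."""
    by_class = {}
    for prompt, duration, rate in calls:
        by_class.setdefault(prompt, []).append(duration * rate)
    return {prompt: median(tokens) for prompt, tokens in by_class.items()}


for prompt, tokens in implied_tokens(CELL_1).items():
    print(f"{prompt}: ~{tokens:.0f} tokens")
```

For this cell the estimate comes out at roughly 105, 403 and 775 tokens for hello, P-MEDIUM and P-HARD. Under the stated assumption, the longer wall time tracks longer outputs rather than slower decoding.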

Reasoning vs answer

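Thinking models split their output into a hidden reasoning trace and a visible answer. The ratio (reasoning characters divided by answer characters) shows how much of the output the model spent thinking vs answering; above 1× means more text went into reasoning than into the answer.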

Cell | reasoning chars | answer chars | reasoning / answer
predator:llamacpp:qwen3:30b-a3b-i… | 1031 | 891 | 1.16×
predator:llamacpp:qwen3:30b-a3b-i… | 1266 | 766 | 1.65×
predator:llamacpp:qwen3:30b-a3b-i… | 1250 | 715 | 1.75×
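The reasoning / answer column is a plain character ratio. This short Python sketch reproduces it from the table's counts and adds one complementary figure: the fraction of all output characters that went to reasoning. That share is an extra derived view, not a column on the page, and characters are only a rough proxy for a token budget. `reasoning_stats` is a hypothetical helper.

```python
# (reasoning chars, answer chars) from the table above, in row order.
ROWS = [(1031, 891), (1266, 766), (1250, 715)]


def reasoning_stats(reasoning: int, answer: int) -> tuple[float, float]:
    """Return (reasoning/answer ratio, reasoning share of all output characters)."""
    return reasoning / answer, reasoning / (reasoning + answer)


for reasoning, answer in ROWS:
    ratio, share = reasoning_stats(reasoning, answer)
    print(f"{ratio:.2f}x reasoning/answer, {share:.0%} of characters in reasoning")
```

By characters, reasoning makes up roughly 54%, 62% and 64% of the output in the three rows.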

Per-call timeline

Every call placed during this run, in order, with its prompt class, phase (cold or warm), duration, and tok/s. On the web page each call is drawn as a segment colored by phase, with width proportional to the call's share of the cell's wall time.

predator:llamacpp:qwen3:3…
hello · cold · 38.2s · 5.2 tok/s; hello · warm · 16.0s · 5.9 tok/s; hello · warm · 13.8s · 5.9 tok/s; hello · warm · 22.4s · 5.2 tok/s
P-MEDIUM · cold · 81.1s · 5.4 tok/s; P-MEDIUM · warm · 70.8s · 5.5 tok/s; P-MEDIUM · warm · 60.4s · 5.6 tok/s; P-MEDIUM · warm · 75.6s · 5.5 tok/s
P-HARD · cold · 147.5s · 5.1 tok/s; P-HARD · warm · 157.6s · 5.3 tok/s; P-HARD · warm · 146.3s · 5.3 tok/s; P-HARD · warm · 146.2s · 5.3 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 16.2s · 5.3 tok/s; hello · warm · 18.2s · 5.6 tok/s; hello · warm · 19.2s · 5.5 tok/s; hello · warm · 16.8s · 5.9 tok/s
P-MEDIUM · cold · 110.9s · 5.3 tok/s; P-MEDIUM · warm · 110.8s · 5.4 tok/s; P-MEDIUM · warm · 112.1s · 5.4 tok/s; P-MEDIUM · warm · 66.1s · 5.6 tok/s
P-HARD · cold · 151.4s · 5.2 tok/s; P-HARD · warm · 152.8s · 5.2 tok/s; P-HARD · warm · 88.4s · 5.6 tok/s; P-HARD · warm · 151.3s · 5.2 tok/s

predator:llamacpp:qwen3:3…
hello · cold · 34.4s · 4.4 tok/s; hello · warm · 18.1s · 4.7 tok/s; hello · warm · 40.4s · 4.9 tok/s; hello · warm · 43.7s · 4.9 tok/s
P-MEDIUM · cold · 120.9s · 4.9 tok/s; P-MEDIUM · warm · 73.2s · 4.8 tok/s; P-MEDIUM · warm · 104.1s · 4.8 tok/s; P-MEDIUM · warm · 96.6s · 4.8 tok/s
P-HARD · cold · 167.3s · 4.7 tok/s; P-HARD · warm · 145.9s · 4.7 tok/s; P-HARD · warm · 99.8s · 4.9 tok/s; P-HARD · warm · 156.6s · 4.8 tok/s
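The segment widths on the web page encode each call's share of its cell's wall time. This minimal Python sketch computes the same kind of share per prompt class from the listed durations. It assumes the 12 timeline calls form the whole denominator. The Results table counts 15 calls per cell, so the page's own widths may differ slightly. `wall_time_shares` is a hypothetical helper.

```python
# Per-call durations in seconds from the timeline, in page order. Each cell
# lists four hello calls, then four P-MEDIUM, then four P-HARD.
DURATIONS = [
    [38.2, 16.0, 13.8, 22.4, 81.1, 70.8, 60.4, 75.6, 147.5, 157.6, 146.3, 146.2],
    [16.2, 18.2, 19.2, 16.8, 110.9, 110.8, 112.1, 66.1, 151.4, 152.8, 88.4, 151.3],
    [34.4, 18.1, 40.4, 43.7, 120.9, 73.2, 104.1, 96.6, 167.3, 145.9, 99.8, 156.6],
]
CLASSES = ("hello", "P-MEDIUM", "P-HARD")


def wall_time_shares(durations):
    """Fraction of a cell's listed wall time spent on each prompt class."""
    total = sum(durations)
    return {
        prompt: sum(durations[4 * i : 4 * i + 4]) / total
        for i, prompt in enumerate(CLASSES)
    }


for n, durations in enumerate(DURATIONS, start=1):
    shares = wall_time_shares(durations)
    print(f"cell {n}: " + ", ".join(f"{p} {s:.0%}" for p, s in shares.items()))
```

P-HARD accounts for about 61%, 54% and 52% of the listed wall time in the three cells.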

Raw data

Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.
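The published JSONL can be re-analysed directly. This minimal Python sketch assumes a hypothetical one-record-per-call schema with `cell`, `prompt_id`, `phase`, `duration_s` and `tok_s` fields. The real field names are not given on this page, so adjust them to match the archive. The three sample records reuse values from the first cell of the per-call timeline. `summarize` is a hypothetical helper.

```python
import io
import json
from collections import defaultdict
from statistics import mean, median

# Hypothetical record schema; values copied from the first timeline cell.
SAMPLE = io.StringIO(
    '{"cell": "cell-1", "prompt_id": "hello", "phase": "cold", "duration_s": 38.2, "tok_s": 5.2}\n'
    '{"cell": "cell-1", "prompt_id": "hello", "phase": "warm", "duration_s": 16.0, "tok_s": 5.9}\n'
    '{"cell": "cell-1", "prompt_id": "hello", "phase": "warm", "duration_s": 13.8, "tok_s": 5.9}\n'
)


def summarize(lines):
    """Group JSONL records by cell; return (calls, mean tok/s, p50 duration)."""
    rows = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            record = json.loads(line)
            rows[record["cell"]].append(record)
    return {
        cell: (len(rs), mean(r["tok_s"] for r in rs), median(r["duration_s"] for r in rs))
        for cell, rs in rows.items()
    }


print(summarize(SAMPLE))
```

Iterating over a file object opened on the archive's JSONL works the same way as iterating over the `StringIO` here.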

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/79afe366.html
Public benchmarks of the Weeyuga cluster. Run id: 79afe366-1055-4e45-adf6-593864a530e8. SHA ddbaaf46.

Related runs