5 MAY 2026 · Predator · gaming laptop · GTX 1060 6 GB · 28 GB RAM · qwen3 · param-sweep

predator-a3b-ngl-ctx-2d-1 — qwen3 on predator

90 calls across 6 cells; ~5.9 tok/s mean; p50 duration 73.4s

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.

Results

Cell	tok/s mean	tok/s p50	tok/s p95	duration p50	calls
qwen3:30b-a3b-iq2m-think500-ngl12-ctx64k	5.5	5.3	6.8	1m18s	15
qwen3:30b-a3b-iq2m-think500-ngl12-ctx96k	5.8	5.8	6.5	58.5s	15
qwen3:30b-a3b-iq2m-think500-ngl12-ctx131k	5.0	5.1	6.0	1m45s	15
qwen3:30b-a3b-iq2m-think500-ngl24-ctx48k	6.6	6.6	6.9	1m4s	15
qwen3:30b-a3b-iq2m-think500-ngl24-ctx64k	6.4	6.4	6.9	1m20s	15
qwen3:30b-a3b-iq2m-think500-ngl24-ctx96k	6.1	6.4	6.7	1m8s	15
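The mean, p50 and p95 columns summarize the per-call tok/s within each cell. The sketch below shows one way to compute them. It is an assumption that the report uses nearest-rank percentiles; other interpolation methods give slightly different p50/p95 values on 15 calls.

```python
import math
from statistics import mean


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method; the report does not state its own)."""
    ordered = sorted(values)
    # 1-based rank ceil(pct/100 * n), clamped so small samples still return a value.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(tok_per_s: list[float]) -> dict[str, float]:
    """Mean, p50 and p95 of the per-call tok/s values for one cell."""
    return {
        "mean": mean(tok_per_s),
        "p50": percentile(tok_per_s, 50),
        "p95": percentile(tok_per_s, 95),
    }


# Hypothetical per-call rates for a single cell, not data from this run.
print(summarize([5.0, 5.3, 5.4, 6.8, 5.1]))
```

Because every cell has 15 calls, the mean over all 90 calls equals the mean of the six per-cell means: (5.5 + 5.8 + 5.0 + 6.6 + 6.4 + 6.1) / 6 = 5.9, which matches the ~5.9 tok/s headline.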

Figure: tokens per second per cell (mean, p50, p95).

Cold start vs warm

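Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The warm/cold column is warm mean tok/s divided by cold mean tok/s: values above 1× mean cold calls ran slower, so part of the deployment's wall-time cost is one-time, while values near 1× mean the cost is mostly steady-state.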

Cell	cold n	cold tok/s	cold p50	warm n	warm tok/s	warm p50	warm/cold
predator:llamacpp:qwen3:30b-a3b-i…	3	5.4	1m50s	9	5.5	1m16s	1.02×
predator:llamacpp:qwen3:30b-a3b-i…	3	5.9	45.7s	9	5.8	59.7s	0.97×
predator:llamacpp:qwen3:30b-a3b-i…	3	4.1	1m58s	9	5.3	1m35s	1.29×
predator:llamacpp:qwen3:30b-a3b-i…	3	6.4	58.6s	9	6.7	1m9s	1.05×
predator:llamacpp:qwen3:30b-a3b-i…	3	6.0	1m39s	9	6.5	1m9s	1.09×
predator:llamacpp:qwen3:30b-a3b-i…	3	4.9	1m12s	9	6.5	1m4s	1.31×
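The cold/warm split and the warm/cold ratio can be sketched as follows. The `Call` record and its `cold` flag are hypothetical; the published JSONL may mark the first call after a model load differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Call:
    # Hypothetical record shape; the real JSONL fields may differ.
    cold: bool        # True for the first call after the model loads from disk
    tok_per_s: float


def warm_cold_ratio(calls: list[Call]) -> float:
    """Warm mean tok/s divided by cold mean tok/s; above 1.0 means warm calls decode faster."""
    cold = [c.tok_per_s for c in calls if c.cold]
    warm = [c.tok_per_s for c in calls if not c.cold]
    if not cold or not warm:
        raise ValueError("need at least one cold and one warm call")
    return mean(warm) / mean(cold)
```

As a check against the table, the third row gives 5.3 / 4.1 ≈ 1.29. Recomputing other rows from the displayed means can differ by up to about 0.02 (for example 6.5 / 4.9 ≈ 1.33 vs the listed 1.31), presumably because the ratio is taken before rounding.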

By prompt difficulty

Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. Each entry gives tok/s, call duration, and the number of calls (n). How tok/s changes across classes hints at whether the model is bottlenecked on prompt processing or on token generation.

Cell	hello	P-MEDIUM	P-HARD
predator:llamacpp:qwen3:30b-a3b-i…	6.0 tok/s · 20.0s · n=4	5.3 tok/s · 1m18s · n=4	5.1 tok/s · 2m33s · n=4
predator:llamacpp:qwen3:30b-a3b-i…	5.8 tok/s · 18.1s · n=4	5.8 tok/s · 58.5s · n=4	5.7 tok/s · 2m12s · n=4
predator:llamacpp:qwen3:30b-a3b-i…	4.7 tok/s · 31.9s · n=4	5.2 tok/s · 1m45s · n=4	5.1 tok/s · 2m30s · n=4
predator:llamacpp:qwen3:30b-a3b-i…	6.7 tok/s · 15.3s · n=4	6.6 tok/s · 1m14s · n=4	6.5 tok/s · 1m54s · n=4
predator:llamacpp:qwen3:30b-a3b-i…	6.5 tok/s · 16.6s · n=4	6.4 tok/s · 1m21s · n=4	6.2 tok/s · 2m4s · n=4
predator:llamacpp:qwen3:30b-a3b-i…	5.5 tok/s · 25.9s · n=4	6.4 tok/s · 1m8s · n=4	6.4 tok/s · 1m57s · n=4
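A per-class breakdown like this one can be produced by grouping calls on their prompt class. In this sketch the input tuples are hypothetical, and using the median for the duration figure is an assumption, since the table does not say which statistic it shows.

```python
from collections import defaultdict
from statistics import mean, median


def by_prompt_class(
    calls: list[tuple[str, float, float]],
) -> dict[str, dict[str, float]]:
    """Group (prompt_class, tok_per_s, duration_s) tuples per class.

    Reports mean tok/s, median duration (an assumed statistic) and call count n.
    """
    groups: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for prompt_class, tok_per_s, duration_s in calls:
        groups[prompt_class].append((tok_per_s, duration_s))
    return {
        cls: {
            "tok_per_s": mean([rate for rate, _ in rows]),
            "duration_s": median([dur for _, dur in rows]),
            "n": len(rows),
        }
        for cls, rows in groups.items()
    }
```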

Reasoning vs answer

Thinking models split their output into a hidden reasoning trace and a visible answer. The reasoning / answer column divides reasoning characters by answer characters, showing how much of its output the model spent thinking vs answering.

Cell	reasoning chars	answer chars	reasoning / answer
predator:llamacpp:qwen3:30b-a3b-i…	1194	809	1.47×
predator:llamacpp:qwen3:30b-a3b-i…	1210	619	1.95×
predator:llamacpp:qwen3:30b-a3b-i…	1296	873	1.48×
predator:llamacpp:qwen3:30b-a3b-i…	1153	769	1.50×
predator:llamacpp:qwen3:30b-a3b-i…	1064	944	1.13×
predator:llamacpp:qwen3:30b-a3b-i…	1185	742	1.60×
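The character counts above can be obtained by separating the reasoning trace from the answer. This sketch assumes the raw completion wraps the trace in `<think>...</think>` tags, as Qwen3's chat template does; a server may instead return the trace in a separate field, in which case no splitting is needed.

```python
import re

# Assumption: reasoning arrives inline as <think>...</think> in the raw completion.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) text extracted from one raw completion."""
    reasoning = "".join(m.group(1) for m in THINK_RE.finditer(raw))
    answer = THINK_RE.sub("", raw)
    return reasoning.strip(), answer.strip()


def reasoning_answer_ratio(raw: str) -> float:
    """Reasoning characters divided by answer characters."""
    reasoning, answer = split_reasoning(raw)
    if not answer:
        raise ValueError("empty answer; ratio undefined")
    return len(reasoning) / len(answer)
```

Applied to the table's averages, for example, 1153 / 769 ≈ 1.50.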

Per-call timeline

Every call placed during this run, in order, colored by phase. Width is proportional to the call's share of the cell's wall-time.
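Proportional segment widths reduce to each call's duration divided by the cell's total. This minimal sketch assumes the cell's wall-time is the sum of its call durations (ignoring any gaps between calls); the `total_px` width is a hypothetical rendering parameter.

```python
def segment_widths(durations_s: list[float], total_px: float = 1000.0) -> list[float]:
    """Width of each call's segment, proportional to its share of the cell's wall-time."""
    total = sum(durations_s)
    if total <= 0:
        raise ValueError("total duration must be positive")
    # Each segment gets the same fraction of the bar as of the summed duration.
    return [total_px * d / total for d in durations_s]
```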

Raw data

Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.
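Re-running the analysis starts with reading the JSONL. The sketch below assumes one JSON object per line and a hypothetical `cell` field naming the configuration; the published schema may use different field names.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-blank JSONL line, reporting the line number on bad JSON."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


def calls_per_cell(records: Iterable[dict], key: str = "cell") -> Counter:
    """Count records per cell; the 'cell' field name is an assumption."""
    return Counter(r[key] for r in records)
```

An open file handle can be passed directly as `lines`. If the JSONL holds one record per call (again an assumption about the schema), this run should count 15 calls in each of the six cells.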

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/e08a7051.html
Public benchmarks of the Weeyuga cluster. Run id: e08a7051-a856-4053-b34b-561d7ce5e8cf. SHA ddbaaf46.

Related runs

predator-a3b-ngl6-ctx-1 — qwen3 on predator

predator-a3b-ctx-sweep-1 — qwen3 on predator

predator-a3b-ngl-matrix-1 — qwen3 on predator

A3B cross-machine — Predator side