
4 MAY 2026 · Predator · gaming laptop · GTX 1060 6 GB · 28 GB RAM · qwen3/qwen3.5 · chat

Predator Qwen rerun matrix

Three Qwen variants on Predator: 36 calls, ~10.9 tok/s mean, p50 24s. The single-machine flagship for the rerun campaign.

What Janie says

This run puts Qwen on Predator under three inference shapes: qwen3.5 9B with the budget=500 thinking config, the same model with the no-think config, and the 14B dense qwen3 sibling for contrast. Three variants, twelve prompts each, thirty-six calls. The headline: 10.9 tokens per second mean across the matrix, p50 24 seconds per call.

The 14B dense model is the casualty; it sits in this matrix at roughly one token per second, even on the trivial hello prompt, and serves as the empirical floor of "what a six-gigabyte VRAM GPU can't do gracefully." The budget=500 thinking config and the no-think baseline come within striking distance of each other on throughput (14.2 vs 12.6 tok/s mean), while the thinking config's p50 wall-time stays bounded at 33.7s against 7.2s: the reasoning-budget cap is doing its job. Same machine, same model architecture across two of the three variants, different deployment knobs. This matrix is what tuning looks like in practice on one machine.

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.

Results

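| Cell | tok/s mean | tok/s p50 | tok/s p95 | duration p50 | calls |
| --- | --- | --- | --- | --- | --- |
| qwen3.5:9b-q4km-think500 | 14.2 | 14.9 | 15.3 | 33.7s | 14 |
| qwen3.5:9b-q4km-nothink | 12.6 | 14.4 | 14.9 | 7.2s | 14 |
| qwen3:14b-q4km | 1.1 | 1.1 | 1.3 | 1m18s | 8 |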

Chart: tokens per second by cell (mean · p50 · p95)

qwen3.5:9b-q4km…: mean 14.25 tok/s · p50 14.91 tok/s · p95 15.30 tok/s
qwen3.5:9b-q4km…: mean 12.55 tok/s · p50 14.37 tok/s · p95 14.93 tok/s
qwen3:14b-q4km: mean 1.06 tok/s · p50 1.09 tok/s · p95 1.28 tok/s

Cold start vs warm

Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The warm/cold ratio (warm tok/s divided by cold tok/s) shows how much of the deployment’s cost is one-time vs steady-state.

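| Cell | cold n | cold tok/s | cold p50 | warm n | warm tok/s | warm p50 | warm/cold |
| --- | --- | --- | --- | --- | --- | --- | --- |
| predator:llamacpp:qwen3.5:9b-q4km… | 3 | 12.9 | 35.9s | 9 | 14.7 | 33.6s | 1.14× |
| predator:llamacpp:qwen3.5:9b-q4km… | 3 | 11.6 | 7.3s | 9 | 12.9 | 7.1s | 1.12× |
| predator:llamacpp:qwen3:14b-q4km | 2 | 0.9 | 3m18s | 4 | 1.1 | 59.4s | 1.27× |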
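
As a rough illustration of how the cold and warm columns can be derived, the sketch below splits one cell's calls by phase and computes mean tok/s, median duration, and the warm/cold throughput ratio. The inputs are the six qwen3:14b calls as printed in the per-call timeline; the record layout and the `summarize` helper are assumptions for illustration, not the benchmark's own code. Because the timeline values are already rounded, the ratio can differ from the published 1.27× in the last digit.

```python
from statistics import mean, median

# (phase, duration_s, tok_s) for the qwen3:14b cell, copied from the per-call timeline.
calls = [
    ("cold", 98.0, 0.7),
    ("warm", 59.9, 1.1),
    ("warm", 58.7, 1.1),
    ("warm", 58.9, 1.1),
    ("cold", 299.4, 1.1),
    ("warm", 210.4, 1.3),
]

def summarize(calls, phase):
    """Return (n, mean tok/s, median duration in seconds) for one phase.

    Hypothetical helper: the real pipeline may aggregate differently.
    """
    rows = [c for c in calls if c[0] == phase]
    return len(rows), mean(r[2] for r in rows), median(r[1] for r in rows)

cold_n, cold_tps, cold_p50 = summarize(calls, "cold")
warm_n, warm_tps, warm_p50 = summarize(calls, "warm")

# Ratio above 1 means steady-state throughput is higher than cold-start throughput.
ratio = warm_tps / cold_tps

print(f"cold n={cold_n} tok/s={cold_tps:.2f} p50={cold_p50:.1f}s")
print(f"warm n={warm_n} tok/s={warm_tps:.2f} p50={warm_p50:.1f}s")
print(f"warm/cold={ratio:.2f}x")
```

The median durations from this sketch (about 198.7s cold, 59.4s warm) line up with the 3m18s and 59.4s shown in the table.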

By prompt difficulty

Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. The shape of the gap tells you whether the model is bottlenecked on parsing or on generation.

| Cell | hello | P-MEDIUM | P-HARD |
| --- | --- | --- | --- |
| predator:llamacpp:qwen3.5:9b-q4km… | 12.7 tok/s (4.7s · n=4) | 15.0 tok/s (33.7s · n=4) | 15.1 tok/s (51.8s · n=4) |
| predator:llamacpp:qwen3.5:9b-q4km… | 8.5 tok/s (1.4s · n=4) | 14.3 tok/s (7.2s · n=4) | 14.9 tok/s (22.1s · n=4) |
| predator:llamacpp:qwen3:14b-q4km | 1.0 tok/s (59.4s · n=4) | 1.2 tok/s (4m14s · n=2) | (no data) |
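
A minimal sketch of the per-prompt breakdown, assuming each call is a `(prompt, duration_s, tok_s)` record: group calls by prompt class, then take the mean tok/s and the median duration. The twelve records are the first cell in the per-call timeline; the tuple layout and variable names are illustrative assumptions, not the site's actual schema.

```python
from collections import defaultdict
from statistics import mean, median

# (prompt, duration_s, tok_s) for the first cell in the per-call timeline.
calls = [
    ("hello", 6.8, 9.4), ("hello", 4.6, 13.9),
    ("hello", 4.6, 13.8), ("hello", 4.7, 13.6),
    ("P-MEDIUM", 35.9, 14.2), ("P-MEDIUM", 33.8, 15.1),
    ("P-MEDIUM", 33.6, 15.2), ("P-MEDIUM", 33.5, 15.3),
    ("P-HARD", 53.6, 14.9), ("P-HARD", 51.8, 14.9),
    ("P-HARD", 51.8, 15.2), ("P-HARD", 50.9, 15.3),
]

# Bucket calls by prompt class.
by_prompt = defaultdict(list)
for prompt, duration_s, tok_s in calls:
    by_prompt[prompt].append((duration_s, tok_s))

# One summary row per prompt class: call count, mean throughput, median duration.
summary = {
    prompt: {
        "n": len(rows),
        "tok_s_mean": mean(t for _, t in rows),
        "duration_p50": median(d for d, _ in rows),
    }
    for prompt, rows in by_prompt.items()
}

for prompt, s in summary.items():
    print(f"{prompt}: {s['tok_s_mean']:.2f} tok/s, {s['duration_p50']:.2f}s, n={s['n']}")
```

For this cell the hello prompt is the slowest in tok/s, mostly because its cold call (9.4 tok/s) drags the four-call mean down.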

Reasoning vs answer

Thinking models split their output into a hidden reasoning trace and a visible answer. The ratio shows how much of the budget the model spent thinking vs answering.

| Cell | reasoning chars | answer chars | reasoning / answer |
| --- | --- | --- | --- |
| predator:llamacpp:qwen3.5:9b-q4km… | 1457 | 434 | 3.36× |
| predator:llamacpp:qwen3:14b-q4km | 504 | 201 | 2.51× |
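
The ratio column is plain division of the two character counts. The sketch below reproduces it from the published counts and also reports the share of total output spent on reasoning, which is an extra figure not shown on the page. The function names are hypothetical.

```python
# (reasoning chars, answer chars) per cell, as published in the table above.
cells = {
    "predator:llamacpp:qwen3.5:9b-q4km…": (1457, 434),
    "predator:llamacpp:qwen3:14b-q4km": (504, 201),
}

def reasoning_ratio(reasoning_chars, answer_chars):
    """Characters of hidden reasoning per character of visible answer."""
    return reasoning_chars / answer_chars

def reasoning_share(reasoning_chars, answer_chars):
    """Fraction of all output characters that were reasoning (not on the page)."""
    return reasoning_chars / (reasoning_chars + answer_chars)

ratios = {cell: reasoning_ratio(r, a) for cell, (r, a) in cells.items()}
shares = {cell: reasoning_share(r, a) for cell, (r, a) in cells.items()}

for cell in cells:
    print(f"{cell}: {ratios[cell]:.2f}x, {shares[cell]:.0%} of output is reasoning")
```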

Per-call timeline

Every call placed during this run, in order, labeled by phase (cold or warm), with its duration and tok/s.

predator:llamacpp:qwen3.5…
hello · cold · 6.8s · 9.4 tok/s
hello · warm · 4.6s · 13.9 tok/s
hello · warm · 4.6s · 13.8 tok/s
hello · warm · 4.7s · 13.6 tok/s
P-MEDIUM · cold · 35.9s · 14.2 tok/s
P-MEDIUM · warm · 33.8s · 15.1 tok/s
P-MEDIUM · warm · 33.6s · 15.2 tok/s
P-MEDIUM · warm · 33.5s · 15.3 tok/s
P-HARD · cold · 53.6s · 14.9 tok/s
P-HARD · warm · 51.8s · 14.9 tok/s
P-HARD · warm · 51.8s · 15.2 tok/s
P-HARD · warm · 50.9s · 15.3 tok/s

predator:llamacpp:qwen3.5…
hello · cold · 2.4s · 5.5 tok/s
hello · warm · 1.4s · 9.6 tok/s
hello · warm · 1.4s · 9.1 tok/s
hello · warm · 1.2s · 9.7 tok/s
P-MEDIUM · cold · 7.3s · 14.4 tok/s
P-MEDIUM · warm · 8.0s · 14.4 tok/s
P-MEDIUM · warm · 7.1s · 14.3 tok/s
P-MEDIUM · warm · 6.4s · 14.1 tok/s
P-HARD · cold · 20.9s · 14.8 tok/s
P-HARD · warm · 24.8s · 14.9 tok/s
P-HARD · warm · 23.3s · 14.9 tok/s
P-HARD · warm · 20.9s · 14.8 tok/s

predator:llamacpp:qwen3:1…
hello · cold · 98.0s · 0.7 tok/s
hello · warm · 59.9s · 1.1 tok/s
hello · warm · 58.7s · 1.1 tok/s
hello · warm · 58.9s · 1.1 tok/s
P-MEDIUM · cold · 299.4s · 1.1 tok/s
P-MEDIUM · warm · 210.4s · 1.3 tok/s
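
Pooling the 30 calls listed in this timeline gives figures in line with the headline (about 10.9 tok/s mean and a 24s median duration). The sketch below shows that arithmetic; whether the site computes its headline exactly this way is an assumption, and the list layout is illustrative only. Cell labels are kept truncated as they appear in the source.

```python
from statistics import mean, median

# (duration_s, tok_s) per call, copied from the timeline, grouped by cell.
timeline = [
    ("predator:llamacpp:qwen3.5…", [
        (6.8, 9.4), (4.6, 13.9), (4.6, 13.8), (4.7, 13.6),
        (35.9, 14.2), (33.8, 15.1), (33.6, 15.2), (33.5, 15.3),
        (53.6, 14.9), (51.8, 14.9), (51.8, 15.2), (50.9, 15.3),
    ]),
    ("predator:llamacpp:qwen3.5…", [
        (2.4, 5.5), (1.4, 9.6), (1.4, 9.1), (1.2, 9.7),
        (7.3, 14.4), (8.0, 14.4), (7.1, 14.3), (6.4, 14.1),
        (20.9, 14.8), (24.8, 14.9), (23.3, 14.9), (20.9, 14.8),
    ]),
    ("predator:llamacpp:qwen3:1…", [
        (98.0, 0.7), (59.9, 1.1), (58.7, 1.1), (58.9, 1.1),
        (299.4, 1.1), (210.4, 1.3),
    ]),
]

# Flatten every call across the three cells into one pool.
all_calls = [call for _, calls in timeline for call in calls]

pooled_tok_s_mean = mean(tok_s for _, tok_s in all_calls)
pooled_duration_p50 = median(duration_s for duration_s, _ in all_calls)

print(f"calls={len(all_calls)}")
print(f"mean tok/s={pooled_tok_s_mean:.2f}")
print(f"p50 duration={pooled_duration_p50:.2f}s")
```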

Raw data

Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/fba9d9b1.html
Public benchmarks of the Weeyuga cluster. Run id: fba9d9b1-cc5d-40bc-9e21-beafbb72c65d. SHA ddbaaf46.

Related runs