
4 MAY 2026 · Predator · gaming laptop · GTX 1060 6 GB · 28 GB RAM · gemma/granite/qwen3.5 · chat

Predator trio bench

42 calls across 3 cells; ~16.5 tok/s mean; p50 duration 9.6s

What Janie says

This is the clearest snapshot of what the bigger of the two consumer-GPU machines tested actually does. Three model families ran on the same Predator hardware, all-GPU, at Q4_K_M quant: gemma-4-E4B-it at 23.5 tokens per second on the hard prompt, granite-4.1-8B at 15.7, and qwen3.5-9B (no-think) at 14.4.

None of these are frontier-model GPUs running frontier-model parameter counts. They're a six-year-old GTX 1060 running the smartest models we could fit at Q4 quant. The result is fast enough for assistant-style chat: generation is faster than typing speed across all three, the three model families land in roughly the same usable performance band, and there are no thirty-second pauses while you wait for a response. This is the "what works" tier on this hardware: the floor of useful, not the ceiling of impressive.

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.

Results

| Cell | tok/s mean | tok/s p50 | tok/s p95 | duration p50 | calls |
|---|---|---|---|---|---|
| granite-4.1:8b-q4km | 13.5 | 15.2 | 15.7 | 6.4s | 14 |
| gemma-4:e4b-it-q4km | 21.8 | 22.9 | 23.6 | 8.2s | 14 |
| qwen3.5:9b-q4km | 14.0 | 14.4 | 14.6 | 35.3s | 14 |

Chart: tokens per second (mean · p50 · p95) per cell. granite-4.1:8b-…: 13.49 · 15.16 · 15.73 tok/s; gemma-4:e4b-it-…: 21.82 · 22.91 · 23.64 tok/s; qwen3.5:9b-q4km: 14.04 · 14.42 · 14.57 tok/s.
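
The three summary columns can be sketched as follows. This is a minimal illustration, not the bench's own code: the nearest-rank percentile method is an assumption (the source does not say which method is used), and the samples are the twelve granite calls listed in the per-call timeline, whereas the results table counts 14 calls, so the mean differs slightly from the published figure.

```python
import math
from statistics import mean

def percentile(samples, q):
    """Nearest-rank percentile (assumed method): the smallest sample
    such that at least q percent of samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(samples):
    """Return mean, p50 and p95 for a list of tok/s samples."""
    return {
        "mean": round(mean(samples), 2),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
    }

# tok/s for the granite cell, copied from the per-call timeline (12 calls)
granite_tok_s = [5.6, 10.9, 10.9, 11.5, 14.4, 15.2, 15.2, 15.4,
                 15.6, 15.7, 15.7, 15.7]

summary = summarize(granite_tok_s)
print(summary)
```

On these rounded samples the p50 and p95 agree with the table to one decimal place, which suggests the published statistics are of this general kind.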

Cold start vs warm

Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The ratio shows how much of the deployment’s wall-time cost is one-time vs steady-state.

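| Cell | cold n | cold tok/s | cold p50 | warm n | warm tok/s | warm p50 | warm/cold |
|---|---|---|---|---|---|---|---|
| predator:llamacpp:granite-4.1:8b-… | 3 | 11.9 | 6.1s | 9 | 14.0 | 6.6s | 1.18× |
| predator:llamacpp:gemma-4:e4b-it-… | 3 | 21.4 | 3.9s | 9 | 22.0 | 12.3s | 1.03× |
| predator:llamacpp:qwen3.5:9b-q4km | 3 | 13.3 | 35.7s | 9 | 14.3 | 35.3s | 1.08× |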
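
A short sketch of how the cold/warm split could be computed, assuming the warm/cold column is mean warm tok/s divided by mean cold tok/s and that p50 is the median call duration. The tuple layout and the `phase_stats` helper are hypothetical; the values are the granite calls from the per-call timeline.

```python
from statistics import mean, median

# (phase, duration_s, tok_s) for the granite cell, from the per-call timeline
calls = [
    ("cold", 1.8, 5.6), ("warm", 0.9, 10.9), ("warm", 0.9, 10.9), ("warm", 0.9, 11.5),
    ("cold", 6.1, 14.4), ("warm", 6.2, 15.2), ("warm", 7.0, 15.2), ("warm", 6.6, 15.4),
    ("cold", 20.3, 15.6), ("warm", 18.8, 15.7), ("warm", 18.9, 15.7), ("warm", 17.7, 15.7),
]

def phase_stats(calls, phase):
    """Count, mean tok/s and median duration for one phase."""
    rows = [(d, t) for p, d, t in calls if p == phase]
    return {
        "n": len(rows),
        "tok_s": mean([t for _, t in rows]),
        "p50_s": median([d for d, _ in rows]),
    }

cold = phase_stats(calls, "cold")
warm = phase_stats(calls, "warm")
ratio = warm["tok_s"] / cold["tok_s"]  # assumed definition of warm/cold

print(f"cold n={cold['n']} {cold['tok_s']:.1f} tok/s p50 {cold['p50_s']:.1f}s")
print(f"warm n={warm['n']} {warm['tok_s']:.1f} tok/s p50 {warm['p50_s']:.1f}s")
print(f"warm/cold {ratio:.2f}x")
```

Under these assumptions the granite row of the table (3 cold calls at 11.9 tok/s, 9 warm calls at 14.0 tok/s, ratio 1.18×) is reproduced from the timeline values.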

By prompt difficulty

Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. The shape of the gap tells you whether the model is bottlenecked on parsing or on generation.

| Cell | hello | P-MEDIUM | P-HARD |
|---|---|---|---|
| predator:llamacpp:granite-4.1:8b-… | 9.7 tok/s (0.9s · n=4) | 15.1 tok/s (6.4s · n=4) | 15.7 tok/s (18.8s · n=4) |
| predator:llamacpp:gemma-4:e4b-it-… | 19.2 tok/s (2.9s · n=4) | 22.8 tok/s (8.2s · n=4) | 23.5 tok/s (16.0s · n=4) |
| predator:llamacpp:qwen3.5:9b-q4km | 13.2 tok/s (4.6s · n=4) | 14.5 tok/s (35.3s · n=4) | 14.4 tok/s (1m10s · n=4) |
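
The per-prompt breakdown can be sketched as a simple group-by. This assumes each table cell shows the mean tok/s and the median duration over that prompt's four calls; the `by_prompt` helper and the tuple layout are illustrative, and the data is the granite cell from the per-call timeline.

```python
from collections import defaultdict
from statistics import mean, median

# (prompt, duration_s, tok_s) for the granite cell, from the per-call timeline
calls = [
    ("hello", 1.8, 5.6), ("hello", 0.9, 10.9), ("hello", 0.9, 10.9), ("hello", 0.9, 11.5),
    ("P-MEDIUM", 6.1, 14.4), ("P-MEDIUM", 6.2, 15.2), ("P-MEDIUM", 7.0, 15.2), ("P-MEDIUM", 6.6, 15.4),
    ("P-HARD", 20.3, 15.6), ("P-HARD", 18.8, 15.7), ("P-HARD", 18.9, 15.7), ("P-HARD", 17.7, 15.7),
]

def by_prompt(calls):
    """Group calls by prompt class; report mean tok/s, median duration, count."""
    groups = defaultdict(list)
    for prompt, duration_s, tok_s in calls:
        groups[prompt].append((duration_s, tok_s))
    return {
        prompt: {
            "tok_s": mean([t for _, t in rows]),
            "p50_s": median([d for d, _ in rows]),
            "n": len(rows),
        }
        for prompt, rows in groups.items()
    }

table = by_prompt(calls)
for prompt, row in table.items():
    print(f"{prompt}: {row['tok_s']:.2f} tok/s, p50 {row['p50_s']:.2f}s, n={row['n']}")
```

The low `hello` figure for granite comes mostly from the single cold call at 5.6 tok/s. With only four calls per prompt, one slow call moves the mean noticeably.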

Per-call timeline

Every call placed during this run, in order, labeled by phase (cold or warm). In the chart, each segment's width is proportional to the call's share of the cell's wall-time.

predator:llamacpp:granite…
hello · cold · 1.8s · 5.6 tok/s
hello · warm · 0.9s · 10.9 tok/s
hello · warm · 0.9s · 10.9 tok/s
hello · warm · 0.9s · 11.5 tok/s
P-MEDIUM · cold · 6.1s · 14.4 tok/s
P-MEDIUM · warm · 6.2s · 15.2 tok/s
P-MEDIUM · warm · 7.0s · 15.2 tok/s
P-MEDIUM · warm · 6.6s · 15.4 tok/s
P-HARD · cold · 20.3s · 15.6 tok/s
P-HARD · warm · 18.8s · 15.7 tok/s
P-HARD · warm · 18.9s · 15.7 tok/s
P-HARD · warm · 17.7s · 15.7 tok/s

predator:llamacpp:gemma-4…
hello · cold · 3.5s · 18.2 tok/s
hello · warm · 0.9s · 14.1 tok/s
hello · warm · 2.9s · 21.9 tok/s
hello · warm · 2.9s · 22.5 tok/s
P-MEDIUM · cold · 3.9s · 22.3 tok/s
P-MEDIUM · warm · 4.1s · 21.7 tok/s
P-MEDIUM · warm · 12.3s · 23.4 tok/s
P-MEDIUM · warm · 13.0s · 23.6 tok/s
P-HARD · cold · 28.4s · 23.6 tok/s
P-HARD · warm · 16.2s · 23.7 tok/s
P-HARD · warm · 15.8s · 23.4 tok/s
P-HARD · warm · 14.7s · 23.4 tok/s

predator:llamacpp:qwen3.5…
hello · cold · 5.5s · 11.6 tok/s
hello · warm · 4.7s · 13.7 tok/s
hello · warm · 4.6s · 13.8 tok/s
hello · warm · 4.6s · 13.8 tok/s
P-MEDIUM · cold · 35.7s · 14.4 tok/s
P-MEDIUM · warm · 35.3s · 14.5 tok/s
P-MEDIUM · warm · 35.3s · 14.5 tok/s
P-MEDIUM · warm · 35.2s · 14.5 tok/s
P-HARD · cold · 73.7s · 13.9 tok/s
P-HARD · warm · 70.3s · 14.6 tok/s
P-HARD · warm · 70.3s · 14.6 tok/s
P-HARD · warm · 70.3s · 14.6 tok/s
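
The timeline entries follow a regular `prompt · phase · duration · tok/s` pattern, so a small parser is enough to recover each call and its share of wall-time. This is a sketch under that assumption: the regular expression and the helper names are illustrative, not part of the published tooling, and the sample string holds three granite entries from the timeline.

```python
import re

# One timeline entry, e.g. "hello · cold · 1.8s · 5.6 tok/s"
ENTRY = re.compile(
    r"(?P<prompt>\S+) · (?P<phase>cold|warm) · (?P<dur>[\d.]+)s · (?P<tps>[\d.]+) tok/s"
)

def parse_calls(text):
    """Extract every timeline entry found in text as a dict."""
    return [
        {
            "prompt": m["prompt"],
            "phase": m["phase"],
            "duration_s": float(m["dur"]),
            "tok_s": float(m["tps"]),
        }
        for m in ENTRY.finditer(text)
    ]

def width_shares(calls):
    """Each call's fraction of the total wall-time of the calls given."""
    total = sum(c["duration_s"] for c in calls)
    return [c["duration_s"] / total for c in calls]

sample = (
    "hello · cold · 1.8s · 5.6 tok/s "
    "hello · warm · 0.9s · 10.9 tok/s "
    "P-MEDIUM · cold · 6.1s · 14.4 tok/s"
)
calls = parse_calls(sample)
shares = width_shares(calls)
for call, share in zip(calls, shares):
    print(f"{call['prompt']} {call['phase']}: {share:.1%} of wall-time")
```

Because the shares are computed within a cell, a long P-HARD call dominates the bar even when its tok/s matches the shorter calls.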

Raw data

Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/09d8fbde.html
Public benchmarks of the Weeyuga cluster. Run id: 09d8fbde-0008-49bb-99da-03eeaca72be1. SHA ddbaaf46.
