Benchmarks

12 publishable runs · 4 hardware targets · 5 model families · 10 archive-only (hidden by default) · last updated 2026-05-06

Flagship · A3B cross-machine: Pavilion side

2026-05-04

Pavilion · GTX 1050 · qwen3 · chat

Pavilion (GTX 1050) runs Qwen3-30B-A3B IQ2-XXS at ~6 tok/s mean, p50 58s. This is the smaller half of the cross-machine pair.

Flagship · A3B cross-machine: Predator side

2026-05-04

Predator · GTX 1070 · qwen3 · chat

Predator (GTX 1070) runs the same Qwen3-30B-A3B IQ2-XXS at ~3.9 tok/s mean, p50 102s. This is the larger half of the cross-machine pair.

Flagship · Predator Qwen rerun matrix

2026-05-04

Predator · GTX 1070 · qwen3/qwen3.5 · chat

Three Qwen variants on Predator: 36 calls, ~10.9 tok/s mean, p50 24s. The single-machine flagship for the rerun campaign.

Flagship · Predator trio bench

2026-05-04

Predator · GTX 1070 · gemma/granite/qwen3.5 · chat

42 calls across 3 cells; ~16.5 tok/s mean; p50 9.6s

Pascal Vulkan-vs-CUDA Cross-Machine Benchmark — 2026-05-06

2026-05-06

pavilion+predator (cross-machine) · gemma/qwen3 · chat

26 calls across 26 cells

vps50-cpu-matrix-1 — gemma/phi/qwen2.5/qwen3 on vps50

2026-05-05

cicd · CPU only · gemma/phi/qwen2.5/qwen3 · param-sweep

2 calls across 5 cells

predator-a3b-ngl6-ctx-1 — qwen3 on predator

2026-05-05

Predator · GTX 1070 · qwen3 · param-sweep

45 calls across 3 cells; ~5.2 tok/s mean; p50 84.8s

predator-a3b-ngl-ctx-2d-1 — qwen3 on predator

2026-05-05

Predator · GTX 1070 · qwen3 · param-sweep

90 calls across 6 cells; ~5.9 tok/s mean; p50 73.4s

vps50-cpu-matrix-1 — gemma/phi/qwen2.5/qwen3 on vps50

2026-05-05

cicd · CPU only · gemma/phi/qwen2.5/qwen3 · param-sweep

6 calls across 5 cells; ~0.0 tok/s mean; p50 527.8s

predator-a3b-ctx-sweep-1 — qwen3 on predator

2026-05-04

Predator · GTX 1070 · qwen3 · param-sweep

45 calls across 3 cells; ~5.0 tok/s mean; p50 83.9s

predator-a3b-ngl-matrix-1 — qwen3 on predator

2026-05-04

Predator · GTX 1070 · qwen3 · param-sweep

42 calls across 3 cells; ~5.7 tok/s mean; p50 71.5s

predator-a3b-1 — qwen3 on predator

2026-05-04

Predator · GTX 1070 · qwen3 · chat

2 calls across 1 cell
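
How a summary line like the ones in this catalogue could be derived from per-call records is sketched below. This is a minimal illustration, not the site's actual pipeline. It assumes each call is recorded as a (completion tokens, wall-clock seconds) pair, that "tok/s mean" is the mean of per-call rates (not total tokens over total time), and that "p50" is the median per-call duration. The function name `summarise` and the record shape are hypothetical.

```python
from statistics import mean, median


def summarise(calls, cells):
    """Build a one-line run summary from per-call records.

    calls: list of (completion_tokens, seconds) tuples, one per call
           (hypothetical record shape, assumed for illustration).
    cells: number of benchmark cells the calls were spread across.
    """
    cell_word = "cell" if cells == 1 else "cells"
    call_word = "call" if len(calls) == 1 else "calls"
    parts = [f"{len(calls)} {call_word} across {cells} {cell_word}"]
    if not calls:
        return parts[0]

    # Per-call throughput; skip zero-duration records to avoid dividing by zero.
    rates = [tokens / seconds for tokens, seconds in calls if seconds > 0]
    if rates:
        parts.append(f"~{mean(rates):.1f} tok/s mean")

    # Assumed meaning of p50: the median wall-clock duration of a call.
    durations = [seconds for _, seconds in calls]
    parts.append(f"p50 {median(durations):.1f}s")
    return "; ".join(parts)


if __name__ == "__main__":
    demo = [(100, 10.0), (120, 20.0), (90, 30.0)]  # made-up records
    print(summarise(demo, cells=1))
    # prints: 3 calls across 1 cell; ~6.3 tok/s mean; p50 20.0s
```

Under these assumptions, a run in which calls finish after producing almost no tokens would show a mean close to 0.0 tok/s alongside a large p50, because the two figures are computed from different fields of the same records.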

Download the full catalogue (JSON) → · Browse on GitHub →