Benchmarks
12 publishable runs · 4 hardware platforms · 5 model families · 10 archive-only (hidden by default) · last updated 2026-05-06
Flagship · A3B cross-machine (Pavilion side)
2026-05-04 · Pavilion (GTX 1050) running Qwen3-30B-A3B IQ2-XXS at ~6 tok/s mean, p50 58s. The smaller half of the cross-machine pair.
Flagship · A3B cross-machine (Predator side)
2026-05-04 · Predator (GTX 1070) running the same A3B IQ2-XXS at ~3.9 tok/s mean, p50 102s. The larger half of the pair.
Flagship · Predator Qwen rerun matrix
2026-05-04 · Three Qwen variants on Predator: 36 calls, ~10.9 tok/s mean, p50 24s. The single-machine flagship for the rerun campaign.
Flagship · Predator trio bench
2026-05-04 · 42 calls across 3 cells; ~16.5 tok/s mean; p50 9.6s
Pascal Vulkan-vs-CUDA Cross-Machine Benchmark — 2026-05-06
2026-05-06 · 26 calls across 26 cells
vps50-cpu-matrix-1 — gemma/phi/qwen2.5/qwen3 on vps50
2026-05-05 · 2 calls across 5 cells
predator-a3b-ngl6-ctx-1 — qwen3 on predator
2026-05-05 · 45 calls across 3 cells; ~5.2 tok/s mean; p50 84.8s
predator-a3b-ngl-ctx-2d-1 — qwen3 on predator
2026-05-05 · 90 calls across 6 cells; ~5.9 tok/s mean; p50 73.4s
vps50-cpu-matrix-1 — gemma/phi/qwen2.5/qwen3 on vps50
2026-05-05 · 6 calls across 5 cells; ~0.0 tok/s mean; p50 527.8s
predator-a3b-ctx-sweep-1 — qwen3 on predator
2026-05-04 · 45 calls across 3 cells; ~5.0 tok/s mean; p50 83.9s
predator-a3b-ngl-matrix-1 — qwen3 on predator
2026-05-04 · 42 calls across 3 cells; ~5.7 tok/s mean; p50 71.5s
predator-a3b-1 — qwen3 on predator
2026-05-04 · 2 calls across 1 cell
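
Each run card above reports a call count, a mean tok/s, and a p50 latency. The sketch below shows one plausible way such a summary line could be derived from per-call records. It is an illustration under stated assumptions, not the pipeline that produced these numbers: the record layout, the sample values, and the `summarize` helper are hypothetical, and it assumes the mean is an average of per-call rates (a total-tokens over total-time average would give a different figure).

```python
from statistics import mean, median

# Hypothetical per-call records: (generated_tokens, wall_clock_seconds).
# These values are made up for illustration; they are not from any run above.
sample_calls = [(120, 10.0), (90, 9.0), (200, 25.0)]


def summarize(calls):
    """Reduce per-call records to the three figures shown on a run card."""
    # Assumption: throughput is averaged per call, not pooled across calls.
    rates = [tokens / seconds for tokens, seconds in calls]
    latencies = [seconds for _, seconds in calls]
    return {
        "calls": len(calls),
        "mean_tok_s": mean(rates),   # arithmetic mean of per-call tok/s
        "p50_s": median(latencies),  # p50 is the median call latency
    }


summary = summarize(sample_calls)
print(f"{summary['calls']} calls; ~{summary['mean_tok_s']:.1f} tok/s mean; "
      f"p50 {summary['p50_s']:.1f}s")
```

Because the two statistics summarize different quantities, a run can show a higher mean tok/s alongside a shorter or longer p50 depending on how many tokens each call generates.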