
5 MAY 2026 · VPS50 · cloud · 16 vCPU AMD EPYC · 62 GB RAM · no GPU · gemma/phi/qwen2.5/qwen3 · param-sweep

vps50-cpu-matrix-1 — gemma/phi/qwen2.5/qwen3 on vps50

6 calls across 5 cells; mean ≈ 0.02 tok/s; p50 duration 527.8 s

Methodology

See A3B_AND_CPU_OVERNIGHT_2026-05-05 for the full procedure. Reproducible at git SHA ddbaaf46.

Results

Cell                    tok/s mean  tok/s p50  tok/s p95  duration p50  calls
phi-4                   -           -          -          -             0
gemma-4-26b-a4b         -           -          -          -             0
qwen3-30b-a3b           -           -          -          -             0
qwen2.5-72b             -           -          -          -             0
phi-4-q4km-cpu-ctx32k   0.0         0.0        0.0        8m47s         6

tokens per second — mean · p50 · p95

[Chart: tok/s per cell. phi-4, gemma-4-26b-a4b, qwen3-30b-a3b, qwen2.5-72b: 0 tok/s. phi-4-q4km-cpu-…: mean 0.02 tok/s, p50 0.02 tok/s, p95 0.02 tok/s.]

Cold start vs warm

Cold-start measurements are the first call into a model after it loads from disk; warm calls are everything after. The ratio shows how much of the deployment’s wall-time cost is one-time vs steady-state.

Cell                                   cold n  cold tok/s  cold p50  warm n  warm tok/s  warm p50  warm/cold
vps50:llamacpp:phi-4-q4km-cpu-ctx32k   1       0.0         9m0s      3       0.0         8m35s     1.00×
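
As a rough illustration of the cold/warm split, the sketch below groups the four call durations reported in the per-call timeline of this page by phase and takes the median of each group. The `(phase, seconds)` layout and the helper names `p50_by_phase` and `fmt_duration` are assumptions made for this example, not the benchmark suite's own code. Truncating fractional seconds appears to match the duration labels shown on this page. The warm/cold column (1.00×) is not recomputed here, since the page does not state which quantity it divides.

```python
from statistics import median

# Per-call durations in seconds, copied from the per-call timeline of this run.
# The (phase, seconds) tuple layout is an assumption for this sketch.
calls = [
    ("cold", 540.0),
    ("warm", 628.1),
    ("warm", 512.1),
    ("warm", 515.6),
]

def p50_by_phase(calls):
    """Group durations by phase and return the median duration per phase."""
    grouped = {}
    for phase, seconds in calls:
        grouped.setdefault(phase, []).append(seconds)
    return {phase: median(values) for phase, values in grouped.items()}

def fmt_duration(seconds):
    """Format seconds as XmYs, truncating fractional seconds."""
    whole = int(seconds)
    return f"{whole // 60}m{whole % 60}s"

p50 = p50_by_phase(calls)
for phase in ("cold", "warm"):
    # Prints "cold 9m0s" and then "warm 8m35s"
    print(phase, fmt_duration(p50[phase]))
```

With a single cold call, the cold p50 is just that call's duration, so the cold figure carries no information about variance.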

By prompt difficulty

Tokens per second by prompt class. hello is a trivial one-line prompt; P-MEDIUM and P-HARD are the deeper questions in the suite. The shape of the gap tells you whether the model is bottlenecked on parsing or on generation.

Cell                                   hello
vps50:llamacpp:phi-4-q4km-cpu-ctx32k   0.0 tok/s (8m47s · n=4)

Per-call timeline

Every call placed during this run, in order, labeled by phase. In the chart, each segment's width is proportional to the call's share of the cell's wall-time.

vps50:llamacpp:phi-4-q4km…
hello · cold · 540.0s · 0.0 tok/s
hello · warm · 628.1s · 0.0 tok/s
hello · warm · 512.1s · 0.0 tok/s
hello · warm · 515.6s · 0.0 tok/s
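
The headline p50 and the relative segment widths can be re-derived from the four durations listed in the timeline. The following is a minimal sketch, assuming the p50 is a plain median over the listed calls and that each width is the call's duration divided by the cell's total; the variable names are illustrative only.

```python
from statistics import median

# Durations in seconds for the four timeline calls, in run order.
durations_s = [540.0, 628.1, 512.1, 515.6]

# With an even number of calls, the median is the mean of the two middle values
# (515.6 and 540.0), which gives approximately 527.8 s.
p50_s = median(durations_s)

# Each call's share of the cell's total wall-time (assumed to drive segment width).
total_s = sum(durations_s)
shares = [d / total_s for d in durations_s]

print(f"p50 = {p50_s:.1f} s")
for d, share in zip(durations_s, shares):
    print(f"{d:.1f} s -> {share:.1%} of wall-time")
```

Under these assumptions the result agrees with the 527.8 s headline figure and the 8m47s shown in the results table.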

Raw data

Every run gets its JSONL, log, summary, and metadata published. Clone the archive; re-run it; tell us where we got it wrong.

Cite

Margetic, S. et al. (2026). benchmarks.weeyuga.com/benchmarks/cef4b219.html
Public benchmarks of the Weeyuga cluster. Run id: cef4b219-6b65-4740-8c3b-5b68aa10cf3f. SHA ddbaaf46.
