Archive · Pavilion windows-laptop · 2026-04-13-windows-gpu-qwen35-2b-long-context.html. Originally rendered 2026-04-13. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards.

2026-04-13 Pavilion GPU lane - Qwen3.5 2B

Generated 2026-04-13 18:32:41. Source run: C:\CodexProjects\MyServers\instances\pavilion-windows-laptop\telemetry\generated\long-context\qwen35-2b\2026-04-13_16-06-31\gpu

Note: The hybrid lane (262K/500K/1M CPU-KV scenarios) was stopped manually and aborted after the 262K run exceeded 40 minutes without progress. Treat this YaRN/CPU-offload configuration as non-viable on Pavilion unless we redesign the workload.

Per-scenario timings
| Scenario | Context Tokens | Prompt Tokens | Prefill Seconds | Prefill tok/s | Generation Tokens | Generation Seconds | Generation tok/s | Wall Seconds |
|---|---|---|---|---|---|---|---|---|
| gpu-016k | 16,384 | 15,565 | 39.936 | 389.749 | 128 | 8.555 | 14.962 | 53.252 |
| gpu-032k | 32,768 | 31,633 | 101.825 | 310.66 | 128 | 8.925 | 14.342 | 117.235 |
| gpu-050k | 50,000 | 50,000 | 210.937 | 237.038 | 128 | 91.734 | 1.395 | 311.297 |
| gpu-080k | 80,000 | 80,000 | 446.696 | 179.093 | 128 | 178.008 | 0.719 | 634.421 |
| gpu-128k | 131,072 | 131,072 | 1054.38 | 124.312 | 256 | 415.008 | 0.617 | 1481.089 |
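
The tok/s columns appear to be token counts divided by the matching seconds column. The sketch below recomputes them from the reported figures as a sanity check. It is not the harness code: the names `ROWS`, `throughput`, and `summarize` are hypothetical, and `other_s` is simply wall time minus prefill and generation time, with no claim about what that remainder contains.

```python
# Rows copied from the per-scenario timings:
# (scenario, prompt_tokens, prefill_s, generation_tokens, generation_s, wall_s)
ROWS = [
    ("gpu-016k", 15_565, 39.936, 128, 8.555, 53.252),
    ("gpu-032k", 31_633, 101.825, 128, 8.925, 117.235),
    ("gpu-050k", 50_000, 210.937, 128, 91.734, 311.297),
    ("gpu-080k", 80_000, 446.696, 128, 178.008, 634.421),
    ("gpu-128k", 131_072, 1054.38, 256, 415.008, 1481.089),
]


def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second; assumes seconds > 0."""
    return tokens / seconds


def summarize(rows):
    """Recompute both throughput columns and the unaccounted wall time."""
    summary = []
    for name, p_tok, p_s, g_tok, g_s, wall_s in rows:
        summary.append({
            "scenario": name,
            "prefill_tok_s": round(throughput(p_tok, p_s), 3),
            "gen_tok_s": round(throughput(g_tok, g_s), 3),
            # Wall time not covered by the prefill or generation timers.
            "other_s": round(wall_s - p_s - g_s, 3),
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(
            f'{row["scenario"]}: prefill {row["prefill_tok_s"]} tok/s, '
            f'generation {row["gen_tok_s"]} tok/s, other {row["other_s"]} s'
        )
```

Running this should reproduce the reported throughput values to within rounding. It also makes the drop in generation throughput between gpu-032k and gpu-050k easy to see.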