Generated 2026-04-13 18:32:41. Source run: C:\CodexProjects\MyServers\instances\pavilion-windows-laptop\telemetry\generated\long-context\qwen35-2b\2026-04-13_16-06-31\gpu

Note: The hybrid lane (262K/500K/1M CPU-KV scenarios) was aborted: the 262K run was stopped manually after it exceeded 40 minutes without progress. Treat this YaRN/CPU-offload configuration as non-viable on Pavilion unless the workload is redesigned.

| Scenario | Context Tokens | Prompt Tokens | Prefill Seconds | Prefill tok/s | Generation Tokens | Generation Seconds | Generation tok/s | Wall Seconds |
|---|---|---|---|---|---|---|---|---|
| gpu-016k | 16,384 | 15,565 | 39.936 | 389.749 | 128 | 8.555 | 14.962 | 53.252 |
| gpu-032k | 32,768 | 31,633 | 101.825 | 310.660 | 128 | 8.925 | 14.342 | 117.235 |
| gpu-050k | 50,000 | 50,000 | 210.937 | 237.038 | 128 | 91.734 | 1.395 | 311.297 |
| gpu-080k | 80,000 | 80,000 | 446.696 | 179.093 | 128 | 178.008 | 0.719 | 634.421 |
| gpu-128k | 131,072 | 131,072 | 1054.380 | 124.312 | 256 | 415.008 | 0.617 | 1481.089 |
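
The throughput columns appear to be plain ratios of the token and second columns (tokens divided by seconds). The sketch below recomputes them from values copied out of the table and reports where generation throughput falls most sharply between adjacent scenarios. The names `ROWS`, `throughput`, and `largest_generation_drop` are illustrative and are not part of the telemetry tooling that produced this report.

```python
# Rows copied from the table above:
# (scenario, prompt_tokens, prefill_seconds, generation_tokens, generation_seconds)
ROWS = [
    ("gpu-016k", 15_565, 39.936, 128, 8.555),
    ("gpu-032k", 31_633, 101.825, 128, 8.925),
    ("gpu-050k", 50_000, 210.937, 128, 91.734),
    ("gpu-080k", 80_000, 446.696, 128, 178.008),
    ("gpu-128k", 131_072, 1054.38, 256, 415.008),
]


def throughput(tokens, seconds):
    """Tokens per second, assuming the report uses a simple ratio."""
    return tokens / seconds


def largest_generation_drop(rows):
    """Return (from_scenario, to_scenario, ratio) for the adjacent pair
    of rows whose generation tok/s falls by the largest factor."""
    worst = None
    for prev, cur in zip(rows, rows[1:]):
        prev_rate = throughput(prev[3], prev[4])
        cur_rate = throughput(cur[3], cur[4])
        ratio = prev_rate / cur_rate
        if worst is None or ratio > worst[2]:
            worst = (prev[0], cur[0], ratio)
    return worst


if __name__ == "__main__":
    for name, p_tok, p_s, g_tok, g_s in ROWS:
        print(f"{name}: prefill {throughput(p_tok, p_s):.3f} tok/s, "
              f"generation {throughput(g_tok, g_s):.3f} tok/s")
    src, dst, ratio = largest_generation_drop(ROWS)
    print(f"largest generation drop: {src} -> {dst} ({ratio:.1f}x)")
```

On these numbers the steepest fall in generation throughput is between gpu-032k and gpu-050k, roughly a tenfold drop, while prefill throughput declines more gradually across the whole range.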