Archive · vps-81 historical telemetry · local-mac/2026-04-11-parallel-qwen-same-model-20q.html. Originally rendered 2026-04-11. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards.
Parallel 20Q Benchmark

Local Mac Qwen Same-Model Parallel 20Q

Run the shared 20-question Python benchmark in two-question batches against one model at a time. Questions 1+2 run together, then 3+4, and so on, while Ollama stays on one loaded model with two parallel request slots and a 32K request context.

Runtime

Benchmark Shape

This report keeps the question pairing explicit: questions 1 and 2 run together, then 3 and 4, and so on. Each lane below preserves that same batch rhythm so the only moving part is the model configuration.
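The pairing described above can be sketched in a few lines of Python. This is a minimal illustration, not the harness that produced this report: `pair_up`, `run_batches`, and the `ask` callable are hypothetical names, and the sketch covers only the primary request for each question, not the follow-up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def pair_up(questions):
    """Group questions into consecutive pairs: (Q1, Q2), (Q3, Q4), ..."""
    return [tuple(questions[i:i + 2]) for i in range(0, len(questions), 2)]


def run_batches(questions, ask):
    """Run each pair concurrently; the batches themselves run one after another.

    `ask` is a hypothetical callable that sends one question to the model
    and returns its answer.
    """
    results = []
    for batch in pair_up(questions):
        start = time.perf_counter()
        # Two workers mirror the two parallel request slots.
        with ThreadPoolExecutor(max_workers=2) as pool:
            answers = list(pool.map(ask, batch))
        results.append({
            "questions": batch,
            "answers": answers,
            "wall_s": time.perf_counter() - start,
        })
    return results
```

With twenty questions, `pair_up` yields ten batches, and each batch's `wall_s` is the time until the slower of its two requests finishes.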

Run mode: same-model pairs

Same-model batches keep one model hot. Mixed-model batches keep two Qwen sizes live at once.

Runtime shape: 1 loaded / 2 parallel

This reflects the Ollama envelope requested for the suite, not a guessed runtime after the fact.

Context: 32768 tokens

Every primary and follow-up request is sent with this target context size.

Suite wall time: 9.0s

Total wall time for the full suite file on this host.
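The report does not show how the runtime envelope or the context size were requested. One plausible way, sketched here under assumptions, is to set the Ollama server's `OLLAMA_MAX_LOADED_MODELS` and `OLLAMA_NUM_PARALLEL` environment variables and to pass `num_ctx` in each request's `options`. The helper names `server_env` and `build_request` are hypothetical.

```python
import os

CONTEXT_TOKENS = 32768  # value from the "Context" card


def server_env(base_env=None, max_loaded=1, num_parallel=2):
    """Return an environment for the Ollama server: 1 loaded model, 2 parallel slots."""
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_MAX_LOADED_MODELS"] = str(max_loaded)
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    return env


def build_request(model, prompt, num_ctx=CONTEXT_TOKENS):
    """Build a JSON body for Ollama's /api/generate with an explicit context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response instead of chunks
        "options": {"num_ctx": num_ctx},
    }
```

Under these assumptions, the server could be started with something like `subprocess.Popen(["ollama", "serve"], env=server_env())`, and each primary or follow-up body posted to `/api/generate` on the local Ollama port. The model tag passed to `build_request` (for example `qwen2.5-coder:0.5b`) is also an assumption; the report gives only the display name.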

same model

Qwen2.5 Coder 0.5B same-model 2-up

Qwen2.5 Coder 0.5B (shared)

1.677x speedup
Total wall: 9.0s

Measured lane wall clock across all ten two-question batches.

Serial equivalent: 15.2s

Sum of primary and follow-up request durations as if they had been run one by one.

Wall savings: 40.4%

How much wall time the two-up batching saved relative to the summed request durations.

Primary avg: 6.0s

Average duration per primary answer in this lane.
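The speedup and wall-savings figures above are two views of the same ratio. The sketch below assumes the definitions implied by the card descriptions (speedup is serial time over wall time, savings is the fraction of serial time avoided); the function names are illustrative.

```python
def speedup(serial_s, wall_s):
    """Serial-equivalent time divided by measured wall time."""
    return serial_s / wall_s


def wall_savings(serial_s, wall_s):
    """Fraction of the serial-equivalent time that batching avoided."""
    return 1.0 - wall_s / serial_s


def savings_from_speedup(s):
    """Wall savings expressed through the speedup factor: 1 - 1/s."""
    return 1.0 - 1.0 / s
```

Plugging the reported 1.677x speedup into `savings_from_speedup` gives about 40.4%, matching the wall-savings card. Using the rounded card values (15.2s and 9.0s) directly gives slightly higher numbers, roughly 1.689x and 40.8%, presumably because the report computes from unrounded durations.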

Average batch wall time: 9.0s
Primary average throughput: 73.46 tok/s
Follow-up average throughput: 75.48 tok/s
Usable primary answers: 2/2
Usable follow-up answers: 2/2
Primary format passes: 0/2
Follow-up format passes: 0/2
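The report does not say how the tok/s figures were derived. One common approach with Ollama, shown here as an assumption, uses the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields of each response and averages the per-request rates. Whether the harness averaged per-request rates or divided total tokens by total time is not stated.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation rate for one response; eval_duration is in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def average_throughput(responses):
    """Mean of per-request rates over a list of Ollama-style response dicts."""
    rates = [
        tokens_per_second(r["eval_count"], r["eval_duration"])
        for r in responses
    ]
    return sum(rates) / len(rates)
```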

Per-model slice

Model | Role | Primary avg | Follow-up avg | Primary throughput | Wall savings
Qwen2.5 Coder 0.5B | shared | 6.0s | 1.5s | 73.46 tok/s | 0.0%

Batch timing

Batch | Questions | Assignments | Primary wall | Follow-up wall | Total wall | Speedup
1 | Q1, Q2 | Qwen2.5 Coder 0.5B (py_csv_parse), Qwen2.5 Coder 0.5B (py_file_scan) | 7.0s | 2.1s | 9.0s | 1.678