Archive · vps-81 historical telemetry · local-mac/2026-04-11-parallel-qwen-mixed-model-20q.html. Originally rendered 2026-04-11. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards.
Parallel 20Q Benchmark

Local Mac Qwen Mixed-Model Parallel 20Q

Runs the shared 20-question Python benchmark in two-question batches against pairs of Qwen model sizes. Within each batch the first model answers the odd-numbered question and the second model answers the even-numbered question, while Ollama keeps two models loaded with two parallel request slots and a 32K request context.
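The two-up dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the harness itself: `run_batch`, `fake_ask`, and the model tags are assumed names, the `ask` hook stands in for whatever client actually talks to Ollama, and only a single request phase is shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(assignments, ask):
    """Send both questions of one batch concurrently and time them.

    assignments: list of (model, question) pairs, one per parallel slot.
    ask: callable(model, question) -> answer text (hypothetical client hook).
    """
    def timed(model, question):
        start = time.perf_counter()
        answer = ask(model, question)
        return {
            "model": model,
            "question": question,
            "answer": answer,
            "seconds": time.perf_counter() - start,
        }

    batch_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = [pool.submit(timed, m, q) for m, q in assignments]
        results = [f.result() for f in futures]  # keeps submission order
    wall = time.perf_counter() - batch_start
    return results, wall


def fake_ask(model, question):
    # Stand-in for a real HTTP call so the sketch runs offline.
    return f"{model} answered {question}"


# Model tags are assumptions; the report names the models, not their tags.
results, wall = run_batch(
    [("qwen2.5-coder:0.5b", "py_csv_parse"), ("qwen2.5-coder:3b", "py_file_scan")],
    fake_ask,
)
```

The batch wall time is measured around the whole pool, so it tracks the slower of the two requests rather than their sum.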

Runtime

Benchmark Shape

This report keeps the question pairing explicit: questions 1 and 2 run together, then 3 and 4, and so on. Each lane below preserves that same batch rhythm so the only moving part is the model configuration.
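As a small sketch of that pairing, assuming questions are identified by their 1-based position (the helper names `pair_questions` and `assign_models` are illustrative, not taken from the harness):

```python
def pair_questions(question_ids):
    """Group an even-length sequence of question ids into consecutive pairs."""
    ids = list(question_ids)
    if len(ids) % 2:
        raise ValueError("expected an even number of questions")
    return [(ids[i], ids[i + 1]) for i in range(0, len(ids), 2)]


def assign_models(batch, odd_model, even_model):
    """Give the first question of a pair to odd_model, the second to even_model."""
    first, second = batch
    return [(odd_model, first), (even_model, second)]


# Twenty questions yield ten batches: (1, 2), (3, 4), ..., (19, 20).
batches = pair_questions(range(1, 21))
```

Keeping the pairing separate from the model assignment means the same batches can be reused unchanged for every lane.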

Run mode mixed model pairs

Same-model batches keep one model hot. Mixed-model batches keep two Qwen sizes live at once.

Runtime shape 2 loaded / 2 parallel

This reflects the Ollama envelope requested for the suite, not a guessed runtime after the fact.

Context 32768 tokens

Every primary and follow-up request is sent with this target context size.
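A request carrying this context size might look like the sketch below. It assumes the harness uses Ollama's `/api/generate` endpoint with `options.num_ctx`, and that the "2 loaded / 2 parallel" envelope was set through the `OLLAMA_MAX_LOADED_MODELS` and `OLLAMA_NUM_PARALLEL` server variables. The report does not confirm either detail, and `build_request` and the model tag are illustrative names.

```python
import json

NUM_CTX = 32768  # context size stated in this report

# Assumed server-side settings; the report only says "2 loaded / 2 parallel".
SERVER_ENV = {
    "OLLAMA_MAX_LOADED_MODELS": "2",
    "OLLAMA_NUM_PARALLEL": "2",
}


def build_request(model, prompt, num_ctx=NUM_CTX):
    """Build a JSON-serializable body for a non-streaming generate call."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


# The model tag is an assumption; the report names the model, not its tag.
body = json.dumps(build_request("qwen2.5-coder:3b", "example prompt"))
```

Setting the context per request keeps primary and follow-up calls on the same footing regardless of each model's default.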

Suite wall time 18.1s

Total wall time for the full suite file on this host.

mixed model

Qwen2.5 Coder 0.5B plus Qwen2.5 Coder 3B

Qwen2.5 Coder 0.5B (odd_questions), Qwen2.5 Coder 3B (even_questions)

1.351x speedup
Total wall 18.1s

Measured lane wall clock across all ten two-question batches.

Serial equivalent 24.5s

Sum of primary and follow-up request durations as if they had been run one by one.

Wall savings 26.0%

How much wall time the two-up batching saved relative to the summed request durations.
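The arithmetic behind these three figures can be reproduced from the rounded per-model durations listed in this report's per-model slice. This is a sketch under that assumption: the report's own 1.351x and 26.0% were presumably computed from unrounded timings, so the values here agree only to within rounding.

```python
# Rounded durations (seconds) as published in the per-model slice.
durations = {
    "0.5B primary": 2.5,
    "0.5B follow-up": 6.1,
    "3B primary": 12.0,
    "3B follow-up": 3.9,
}
total_wall = 18.1  # measured lane wall clock, seconds

# Serial equivalent: every request run one after another.
serial_equivalent = sum(durations.values())

# Speedup and savings are two views of the same ratio.
speedup = serial_equivalent / total_wall
wall_savings = 1 - total_wall / serial_equivalent

print(f"{serial_equivalent:.1f}s {speedup:.3f}x {wall_savings:.1%}")
```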

Primary avg 7.2s

Average duration per primary answer in this lane.

Average batch wall time 18.1s
Primary average throughput 50.01 tok/s
Follow-up average throughput 47.84 tok/s
Usable primary answers 2/2
Usable follow-up answers 2/2
Primary format passes 0/2
Follow-up format passes 1/2

Per-model slice

Model | Role | Primary avg | Follow-up avg | Primary throughput | Wall savings
Qwen2.5 Coder 3B | even_questions | 12.0s | 3.9s | 24.46 tok/s | 0.0%
Qwen2.5 Coder 0.5B | odd_questions | 2.5s | 6.1s | 75.56 tok/s | 0.0%
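The lane-level primary figures appear to be plain unweighted means of the per-model values above. The sketch below checks that reading. It is an inference from the published numbers rather than a documented rule of the harness, and the dictionary keys are illustrative.

```python
from statistics import mean

# Per-model primary figures as published in the slice above.
per_model = {
    "Qwen2.5 Coder 3B": {"primary_s": 12.0, "primary_tok_s": 24.46},
    "Qwen2.5 Coder 0.5B": {"primary_s": 2.5, "primary_tok_s": 75.56},
}

# Unweighted means across the two models in the lane.
lane_primary_avg = mean(m["primary_s"] for m in per_model.values())
lane_primary_tok_s = mean(m["primary_tok_s"] for m in per_model.values())
```

Under this assumption the throughput mean comes to 50.01 tok/s, matching the lane figure, and the duration mean comes to 7.25s against the reported 7.2s. Because the mean is unweighted, it is not the same as total tokens divided by total time.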

Batch timing

Batch | Questions | Assignments | Primary wall | Follow-up wall | Total wall | Speedup
1 | Q1, Q2 | Qwen2.5 Coder 0.5B (py_csv_parse), Qwen2.5 Coder 3B (py_file_scan) | 12.0s | 6.1s | 18.1s | 1.352
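One way to read the batch row: each phase lasts as long as its slower request, and the batch total is the sum of the two phases. The sketch below tests that reading against the rounded per-model durations. It is an assumption that fits the published numbers, not a confirmed description of the harness.

```python
# Rounded per-model durations (seconds) for batch 1.
primary = {"0.5B": 2.5, "3B": 12.0}
follow_up = {"0.5B": 6.1, "3B": 3.9}

# Each phase is assumed to wait for both parallel requests to finish.
primary_wall = max(primary.values())
follow_up_wall = max(follow_up.values())
total_wall = primary_wall + follow_up_wall
```

With these inputs the phase walls come to 12.0s and 6.1s, and the total to 18.1s, in line with the row above.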