Parallel 20Q Benchmark

Local Mac Qwen Mixed-Model Parallel 20Q

Run the shared 20-question Python benchmark in two-question batches against pairs of Qwen model sizes. Within each batch, the first model answers the odd-numbered question and the second model answers the even-numbered question. Ollama keeps two models loaded with two parallel request slots and a 32K-token (32768) request context.
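The request shape described above could be sketched roughly as follows. This is not the suite's actual harness: the model tags (`qwen2.5-coder:0.5b`, `qwen2.5-coder:3b`), the helper names, and the use of Ollama's default local `/api/generate` endpoint are assumptions for illustration. The "2 loaded / 2 parallel" envelope is assumed to be set on the server side, for example through the `OLLAMA_MAX_LOADED_MODELS=2` and `OLLAMA_NUM_PARALLEL=2` environment variables, since the report does not say how it was configured.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"
NUM_CTX = 32768  # the 32K request context stated in the report


def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request that asks for the 32K context."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": NUM_CTX},
    }


def post_generate(payload: dict) -> dict:
    """Send one request to Ollama and return the decoded JSON response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_batch(odd_model, even_model, odd_prompt, even_prompt, send=post_generate):
    """Send both questions of one batch concurrently, one question per model."""
    payloads = [
        build_payload(odd_model, odd_prompt),
        build_payload(even_model, even_prompt),
    ]
    # Two workers mirror the two parallel request slots.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(send, payloads))
```

Calling `run_batch("qwen2.5-coder:0.5b", "qwen2.5-coder:3b", q1_prompt, q2_prompt)` against a running server would return the two responses in odd, even order. The `send` parameter is injectable so the batching logic can be exercised without a live Ollama instance.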

Started 2026-04-11T18:56:38Z

Finished 2026-04-11T18:56:56Z

Runtime

Benchmark Shape

This report keeps the question pairing explicit: questions 1 and 2 run together, then 3 and 4, and so on. Each lane below preserves that same batch rhythm so the only moving part is the model configuration.
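The pairing rule above can be sketched in a few lines. The function name and the `Q1` to `Q20` labels are illustrative, not taken from the suite.

```python
def pair_questions(question_ids):
    """Group an ordered question list into consecutive two-question batches."""
    if len(question_ids) % 2 != 0:
        raise ValueError("expected an even number of questions")
    return [
        (question_ids[i], question_ids[i + 1])
        for i in range(0, len(question_ids), 2)
    ]


# Twenty questions yield ten batches: (Q1, Q2), (Q3, Q4), and so on.
batches = pair_questions([f"Q{n}" for n in range(1, 21)])
```

In each tuple the first entry is the odd-numbered question and the second is the even-numbered one, which is what lets a mixed-model lane route them to different models.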

Run mode mixed model pairs

Same-model batches keep one model hot. Mixed-model batches keep two Qwen sizes loaded at once.

Runtime shape 2 loaded / 2 parallel

This is the Ollama configuration requested for the suite (two loaded models, two parallel request slots), not a runtime state inferred after the fact.

Context 32768 tokens

Every primary and follow-up request is sent with this target context size.

Suite wall time 18.1s

Total wall time for the full suite file on this host.

mixed model

Qwen2.5 Coder 0.5B plus Qwen2.5 Coder 3B

Qwen2.5 Coder 0.5B (odd_questions), Qwen2.5 Coder 3B (even_questions)

1.351x speedup

Total wall 18.1s

Measured lane wall clock across the lane's two-question batches. The batch timing table for this run records a single batch (Q1, Q2).

Serial equivalent 24.5s

Sum of primary and follow-up request durations as if they had been run one by one.

Wall savings 26.0%

How much wall time the two-up batching saved relative to the summed request durations.
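The speedup and wall-savings figures follow from the two totals above. A minimal sketch of the arithmetic, with illustrative function names:

```python
def speedup(serial_s: float, wall_s: float) -> float:
    """Ratio of summed request durations to measured wall time."""
    return serial_s / wall_s


def wall_savings(serial_s: float, wall_s: float) -> float:
    """Fraction of the serial-equivalent time saved by running two-up."""
    return 1.0 - wall_s / serial_s


# Rounded figures from this report: serial equivalent 24.5s, total wall 18.1s.
s = speedup(24.5, 18.1)
w = wall_savings(24.5, 18.1)
print(f"{s:.3f}x speedup, {w:.1%} wall savings")
```

With the rounded inputs this gives about 1.354x and 26.1%. The reported 1.351x and 26.0% are slightly lower, presumably because the report computes them from unrounded timings.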

Primary avg 7.2s

Average duration per primary answer in this lane.

Average batch wall time	18.1s
Primary average throughput	50.01 tok/s
Follow-up average throughput	47.84 tok/s
Usable primary answers	2/2
Usable follow-up answers	2/2
Primary format passes	0/2
Follow-up format passes	1/2

Per-model slice

Model	Role	Primary avg	Follow-up avg	Primary throughput	Wall savings
Qwen2.5 Coder 3B	even_questions	12.0s	3.9s	24.46 tok/s	0.0%
Qwen2.5 Coder 0.5B	odd_questions	2.5s	6.1s	75.56 tok/s	0.0%

Batch timing

Batch	Questions	Assignments	Primary wall	Follow-up wall	Total wall	Speedup
1	Q1, Q2	Qwen2.5 Coder 0.5B (py_csv_parse), Qwen2.5 Coder 3B (py_file_scan)	12.0s	6.1s	18.1s	1.352
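The batch row is consistent with the per-model slice if each phase (primary, then follow-up) runs both requests concurrently and ends when the slower one finishes. That reading is an assumption inferred from the numbers, not something the report states. A short check under that assumption:

```python
# Per-model durations in seconds, copied from the per-model slice table.
primary = {"Qwen2.5 Coder 0.5B": 2.5, "Qwen2.5 Coder 3B": 12.0}
follow_up = {"Qwen2.5 Coder 0.5B": 6.1, "Qwen2.5 Coder 3B": 3.9}

# Assumption: a phase lasts as long as its slowest concurrent request.
primary_wall = max(primary.values())      # matches the 12.0s primary wall
follow_up_wall = max(follow_up.values())  # matches the 6.1s follow-up wall
total_wall = primary_wall + follow_up_wall

# Serial equivalent: every request run one after another.
serial_equivalent = sum(primary.values()) + sum(follow_up.values())

print(f"total wall {total_wall:.1f}s, serial equivalent {serial_equivalent:.1f}s")
```

Under this reading the 3B model bounds the primary phase and the 0.5B model bounds the follow-up phase, which reproduces the 18.1s total wall and the 24.5s serial equivalent.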