Parallel 20Q Benchmark

Local Mac Qwen Same-Model Parallel 20Q

Run the shared 20-question Python benchmark in two-question batches against one model at a time. Questions 1+2 run together, then 3+4, and so on, while Ollama stays on one loaded model with two parallel request slots and a 32K request context.

Started: 2026-04-11T18:56:28Z
Finished: 2026-04-11T18:56:37Z

Runtime

Benchmark Shape

This report keeps the question pairing explicit: questions 1 and 2 run together, then 3 and 4, and so on. Each lane below preserves that same batch rhythm so the only moving part is the model configuration.
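The pairing rule described above can be sketched in a few lines. This is only an illustration: `pair_questions` and the `Q1`..`Q20` placeholder IDs are hypothetical names introduced here, not taken from the benchmark harness, which identifies questions by task names such as `py_csv_parse`.

```python
from typing import List, Sequence, Tuple


def pair_questions(question_ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Group an even-length question list into consecutive two-question batches."""
    if len(question_ids) % 2 != 0:
        raise ValueError("two-up batching expects an even number of questions")
    return [
        (question_ids[i], question_ids[i + 1])
        for i in range(0, len(question_ids), 2)
    ]


# Placeholder IDs (assumption); the real suite uses named tasks.
batches = pair_questions([f"Q{n}" for n in range(1, 21)])
print(len(batches))  # 10
print(batches[0])    # ('Q1', 'Q2')
print(batches[1])    # ('Q3', 'Q4')
```

Keeping the pairing fixed and deterministic is what lets different lanes be compared batch by batch.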

Run mode: same-model pairs

Same-model batches keep one model hot. Mixed-model batches keep two Qwen sizes live at once.

Runtime shape: 1 loaded / 2 parallel

This is the Ollama configuration requested for the suite (one loaded model, two parallel request slots), not a value inferred after the run.

Context: 32768 tokens

Every primary and follow-up request is sent with this target context size.
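One way the runtime shape and context size above could be expressed is sketched here. `OLLAMA_MAX_LOADED_MODELS` and `OLLAMA_NUM_PARALLEL` are Ollama server environment variables, and `num_ctx` is a request option. The model tag `qwen2.5-coder:0.5b`, the prompt, and the helper name `build_generate_payload` are assumptions for illustration; the report only gives the display name "Qwen2.5 Coder 0.5B" and does not show how the harness builds its requests.

```python
import json

# Server-side settings, normally exported before starting `ollama serve`.
SERVER_ENV = {
    "OLLAMA_MAX_LOADED_MODELS": "1",  # one model resident at a time
    "OLLAMA_NUM_PARALLEL": "2",       # two parallel request slots
}

CONTEXT_TOKENS = 32768  # context size reported for every request


def build_generate_payload(model: str, prompt: str) -> dict:
    """Build a JSON body for Ollama's /api/generate with the target context."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": CONTEXT_TOKENS},
    }


# Model tag and prompt are illustrative assumptions.
payload = build_generate_payload("qwen2.5-coder:0.5b", "Write a CSV parser.")
print(json.dumps(payload, indent=2))
```

Sending the same `num_ctx` on primary and follow-up requests avoids a mismatch between the two phases of a batch.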

Suite wall time: 9.0s

Total wall time for the full suite file on this host.

same model

Qwen2.5 Coder 0.5B same-model 2-up

Qwen2.5 Coder 0.5B (shared)

1.677x speedup

Total wall: 9.0s

Measured lane wall clock across all ten two-question batches.

Serial equivalent: 15.2s

Sum of primary and follow-up request durations as if they had been run one by one.
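The difference between batch wall time and the serial equivalent can be illustrated with a small timing harness. This is a sketch under stated assumptions, not the benchmark's own code: `run_batch` and `fake_request` are hypothetical names, and `time.sleep` stands in for a real model request.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def run_batch(tasks: Sequence[Callable[[], object]]) -> Tuple[float, float]:
    """Run tasks concurrently; return (batch wall seconds, summed task seconds)."""

    def timed(task: Callable[[], object]) -> float:
        start = time.perf_counter()
        task()
        return time.perf_counter() - start

    batch_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        durations = list(pool.map(timed, tasks))
    wall = time.perf_counter() - batch_start
    # The serial equivalent is what the same requests would cost back to back.
    return wall, sum(durations)


def fake_request() -> None:
    """Stand-in for one model request (assumption: real calls go to Ollama)."""
    time.sleep(0.1)


wall, serial_equivalent = run_batch([fake_request, fake_request])
print(f"wall={wall:.3f}s serial_equivalent={serial_equivalent:.3f}s")
```

With two overlapping requests, the wall time approaches the longer of the two durations while the serial equivalent approaches their sum.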

Wall savings: 40.4%

How much wall time the two-up batching saved relative to the summed request durations.
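The relationship between these figures can be checked with a few lines of arithmetic. The definitions below (speedup as serial equivalent divided by wall time, savings as one minus the inverse) are assumed from the descriptions in this report; the function names are illustrative. Plugging in the rounded 15.2s and 9.0s gives slightly different results than the reported 1.677x and 40.4%, presumably because the report computes from unrounded durations.

```python
def speedup(serial_s: float, wall_s: float) -> float:
    """Serial-equivalent time divided by measured wall time."""
    return serial_s / wall_s


def wall_savings(serial_s: float, wall_s: float) -> float:
    """Fraction of the serial-equivalent time saved by batching."""
    return 1.0 - wall_s / serial_s


def savings_from_speedup(s: float) -> float:
    """Same savings fraction, derived from a speedup factor."""
    return 1.0 - 1.0 / s


# From the rounded values shown in the report:
print(round(speedup(15.2, 9.0), 3))             # 1.689
print(round(100 * wall_savings(15.2, 9.0), 1))  # 40.8

# From the reported speedup, which matches the reported savings:
print(round(100 * savings_from_speedup(1.677), 1))  # 40.4
```

The last line shows that the reported 1.677x speedup and 40.4% savings are consistent with each other.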

Primary avg: 6.0s

Average duration per primary answer in this lane.

Average batch wall time	9.0s
Primary average throughput	73.46 tok/s
Follow-up average throughput	75.48 tok/s
Usable primary answers	2/2
Usable follow-up answers	2/2
Primary format passes	0/2
Follow-up format passes	0/2

Per-model slice

Model	Role	Primary avg	Follow-up avg	Primary throughput	Wall savings
Qwen2.5 Coder 0.5B	shared	6.0s	1.5s	73.46 tok/s	0.0%

Batch timing

Batch	Questions	Assignments	Primary wall	Follow-up wall	Total wall	Speedup
1	Q1, Q2	Qwen2.5 Coder 0.5B (py_csv_parse), Qwen2.5 Coder 0.5B (py_file_scan)	7.0s	2.1s	9.0s	1.678