Same-model batches keep one model hot. Mixed-model batches keep two Qwen sizes live at once.
Local Mac Qwen Same-Model Parallel 20Q
Run the shared 20-question Python benchmark in two-question batches against one model at a time. Questions 1+2 run together, then 3+4, and so on, while Ollama stays on one loaded model with two parallel request slots and a 32K request context.
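The batch rhythm can be sketched in a few lines of Python. This is a minimal illustration, not the suite's actual runner: `pair_batches`, `run_suite`, and `fake_ask` are hypothetical names, and `fake_ask` stands in for whatever function sends one request to Ollama. The sketch assumes each batch must finish before the next one starts, which matches the "1+2, then 3+4" description.

```python
from concurrent.futures import ThreadPoolExecutor


def pair_batches(questions, size=2):
    """Yield consecutive groups of `size` questions: (Q1, Q2), (Q3, Q4), ..."""
    for start in range(0, len(questions), size):
        yield list(questions[start:start + size])


def run_suite(questions, ask, slots=2):
    """Run each batch concurrently; wait for the whole batch before moving on.

    `ask` is a placeholder callable (hypothetical) that sends one question
    to the model and returns its answer.
    """
    results = []
    with ThreadPoolExecutor(max_workers=slots) as pool:
        for batch in pair_batches(questions, slots):
            # pool.map preserves input order; extend() drains the iterator,
            # so both requests in the batch complete before the next batch.
            results.extend(pool.map(ask, batch))
    return results


def fake_ask(question):
    """Stand-in for a real model call, so the sketch runs offline."""
    return f"answer to {question}"


questions = [f"Q{n}" for n in range(1, 21)]
batches = list(pair_batches(questions))
answers = run_suite(questions, fake_ask)
print(len(batches), batches[0], answers[0])
```

With 20 questions and two slots this produces ten batches, the same count the lane timing refers to. A real runner would replace `fake_ask` with a function that calls the loaded model.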
Benchmark Shape
This report keeps the question pairing explicit: questions 1 and 2 run together, then 3 and 4, and so on. Each lane below preserves that same batch rhythm so the only moving part is the model configuration.
This reflects the Ollama envelope requested for the suite, not a guessed runtime after the fact.
Every primary and follow-up request is sent with this target context size.
Total wall time for the full suite file on this host.
Qwen2.5 Coder 0.5B same-model 2-up
Qwen2.5 Coder 0.5B (shared)
Measured lane wall clock across all ten two-question batches.
Sum of primary and follow-up request durations as if they had been run one by one.
How much wall time the two-up batching saved relative to the summed request durations.
Average duration per primary answer in this lane.
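The relationship between lane wall clock, summed request durations, and wall savings can be written out as a short calculation. The report does not state the exact formula it uses, so the definitions below are assumptions: savings as the fraction of summed time that batching removed, and speedup as summed time divided by wall time. The input numbers are illustrative only and are not taken from this run.

```python
def wall_savings(wall_s, summed_s):
    """Fraction of the one-by-one time saved by batching (assumed definition)."""
    if summed_s <= 0:
        raise ValueError("summed request duration must be positive")
    return 1.0 - wall_s / summed_s


def speedup(wall_s, summed_s):
    """Ratio of one-by-one time to measured wall time (assumed definition)."""
    if wall_s <= 0:
        raise ValueError("wall time must be positive")
    return summed_s / wall_s


# Illustrative values, not measurements from this report.
example_wall_s = 10.0     # measured wall clock for a lane
example_summed_s = 15.0   # sum of individual request durations

savings = wall_savings(example_wall_s, example_summed_s)
ratio = speedup(example_wall_s, example_summed_s)
print(f"savings={savings:.1%} speedup={ratio:.3f}")
```

Under these definitions a savings of 0% corresponds to a speedup of 1.0, so the two figures move together when they are computed from the same pair of inputs.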
| Metric | Value |
|---|---|
| Average batch wall time | 9.0s |
| Primary average throughput | 73.46 tok/s |
| Follow-up average throughput | 75.48 tok/s |
| Usable primary answers | 2/2 |
| Usable follow-up answers | 2/2 |
| Primary format passes | 0/2 |
| Follow-up format passes | 0/2 |
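The throughput rows can be reproduced from per-request counters. The sketch below assumes each response carries Ollama's `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields, and that the lane average is the mean of per-request rates; the report does not say whether it averages rates or divides total tokens by total time, so treat that choice as an assumption. The sample responses are made up for illustration.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation rate for one request; eval_duration is in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)


def average_throughput(responses):
    """Mean of per-request rates (assumed averaging method)."""
    rates = [
        tokens_per_second(r["eval_count"], r["eval_duration"])
        for r in responses
    ]
    return sum(rates) / len(rates)


# Made-up responses, not data from this report.
sample_responses = [
    {"eval_count": 300, "eval_duration": 4_000_000_000},
    {"eval_count": 450, "eval_duration": 6_000_000_000},
]

avg_rate = average_throughput(sample_responses)
print(f"{avg_rate:.2f} tok/s")
```

Averaging per-request rates weights short and long answers equally; dividing total tokens by total generation time would weight long answers more heavily, and the two methods can differ when answer lengths vary.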
Per-model slice
| Model | Role | Primary avg | Follow-up avg | Primary throughput | Wall savings |
|---|---|---|---|---|---|
| Qwen2.5 Coder 0.5B | shared | 6.0s | 1.5s | 73.46 tok/s | 0.0% |
Batch timing
| Batch | Questions | Assignments | Primary wall | Follow-up wall | Total wall | Speedup |
|---|---|---|---|---|---|---|
| 1 | Q1, Q2 | Qwen2.5 Coder 0.5B (py_csv_parse), Qwen2.5 Coder 0.5B (py_file_scan) | 7.0s | 2.1s | 9.0s | 1.678 |