Same-model batches keep one model hot. Mixed-model batches keep two Qwen sizes loaded at once.
Local Mac Qwen Mixed-Model Parallel 20Q
Run the shared 20-question Python benchmark in two-question batches against pairs of Qwen model sizes. Within each batch, the first model answers the odd-numbered question and the second model answers the even-numbered question. Ollama keeps both models loaded, with two parallel request slots and a 32K request context.
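A minimal sketch of how one mixed-model batch could be sent, assuming Ollama's default local endpoint and its `/api/generate` route with a per-request `num_ctx` option. The model tags, the `build_payload` and `run_batch` helpers, and the thread-pool approach are illustrative assumptions, not the harness actually used for this report. It also assumes the server was started with `OLLAMA_MAX_LOADED_MODELS=2` and `OLLAMA_NUM_PARALLEL=2` so both models stay resident and both slots are available.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
NUM_CTX = 32768  # 32K request context

# Assumed Ollama tags for the two lane models (first = odd question, second = even).
MODELS = ("qwen2.5-coder:0.5b", "qwen2.5-coder:3b")


def build_payload(model, prompt):
    """Build a non-streaming generate request with the target context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": NUM_CTX},
    }


def ask(payload):
    """POST one request to Ollama and return the decoded JSON response."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_batch(odd_prompt, even_prompt, send=ask):
    """Send both questions of a batch concurrently, one per model."""
    payloads = [
        build_payload(MODELS[0], odd_prompt),
        build_payload(MODELS[1], even_prompt),
    ]
    with ThreadPoolExecutor(max_workers=2) as pool:
        # pool.map preserves input order: [odd answer, even answer]
        return list(pool.map(send, payloads))
```

The `send` parameter exists only so the batch logic can be exercised without a running server; in a real run the default `ask` would be used.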
Benchmark Shape
This report keeps the question pairing explicit: questions 1 and 2 run together, then 3 and 4, and so on. Each lane below preserves that same batch rhythm so the only moving part is the model configuration.
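The pairing rule can be sketched in a few lines. The function names here are hypothetical and the role labels simply reuse the `odd_questions` and `even_questions` names that appear in the lane description.

```python
def pair_questions(n_questions=20):
    """Group question numbers 1..n into consecutive pairs: (1, 2), (3, 4), ..."""
    if n_questions % 2 != 0:
        raise ValueError("two-question batches need an even number of questions")
    numbers = list(range(1, n_questions + 1))
    return [(numbers[i], numbers[i + 1]) for i in range(0, n_questions, 2)]


def role_for(question_number):
    """Return which lane role answers a given question number."""
    return "odd_questions" if question_number % 2 == 1 else "even_questions"


batches = pair_questions(20)
```

With 20 questions this yields ten batches, and every batch contains exactly one question for each role.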
This reflects the Ollama envelope requested for the suite, not a guessed runtime after the fact.
Every primary and follow-up request is sent with this target context size.
Total wall time for the full suite file on this host.
Qwen2.5 Coder 0.5B plus Qwen2.5 Coder 3B
Qwen2.5 Coder 0.5B (odd_questions), Qwen2.5 Coder 3B (even_questions)
Measured lane wall clock across all ten two-question batches.
Sum of primary and follow-up request durations as if they had been run one by one.
How much wall time the two-up batching saved relative to the summed request durations.
Average duration per primary answer in this lane.
| Metric | Value |
|---|---|
| Average batch wall time | 18.1s |
| Primary average throughput | 50.01 tok/s |
| Follow-up average throughput | 47.84 tok/s |
| Usable primary answers | 2/2 |
| Usable follow-up answers | 2/2 |
| Primary format passes | 0/2 |
| Follow-up format passes | 1/2 |
Per-model slice
| Model | Role | Primary avg | Follow-up avg | Primary throughput | Wall savings |
|---|---|---|---|---|---|
| Qwen2.5 Coder 3B | even_questions | 12.0s | 3.9s | 24.46 tok/s | 0.0% |
| Qwen2.5 Coder 0.5B | odd_questions | 2.5s | 6.1s | 75.56 tok/s | 0.0% |
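The per-model throughput figures are consistent with the lane-level primary average being an unweighted mean: (24.46 + 75.56) / 2 = 50.01 tok/s. The sketch below shows that check, plus one plausible way to derive a single request's throughput from an Ollama response, assuming it is computed from the `eval_count` and `eval_duration` (nanoseconds) fields. The report does not state how throughput was measured, so treat that part as an assumption.

```python
from statistics import mean


def tokens_per_second(response):
    """Generation throughput for one Ollama response (eval_duration is in ns)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


# Primary throughput per model, taken from the per-model table.
per_model_primary = [24.46, 75.56]
lane_average = mean(per_model_primary)
```

Because the mean is unweighted, the fast 0.5B model pulls the lane average well above what the slower 3B model delivers on its own.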
Batch timing
| Batch | Questions | Assignments | Primary wall | Follow-up wall | Total wall | Speedup |
|---|---|---|---|---|---|---|
| 1 | Q1, Q2 | Qwen2.5 Coder 0.5B (py_csv_parse), Qwen2.5 Coder 3B (py_file_scan) | 12.0s | 6.1s | 18.1s | 1.352 |
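The speedup column can be approximately reproduced from the rounded figures in the tables above, assuming speedup is the summed request durations divided by the batch wall time, and wall savings is one minus the inverse of that ratio. Both formulas are an interpretation of the metric descriptions, not a confirmed definition from the benchmark harness.

```python
def speedup(serial_seconds, wall_seconds):
    """Ratio of one-by-one time to measured parallel wall time."""
    return serial_seconds / wall_seconds


def wall_savings(serial_seconds, wall_seconds):
    """Fraction of the serial time saved by running the batch two-up."""
    return 1.0 - wall_seconds / serial_seconds


# Rounded per-request durations from the per-model table (seconds).
primary = [12.0, 2.5]    # 3B primary, 0.5B primary
follow_up = [3.9, 6.1]   # 3B follow-up, 0.5B follow-up

serial = sum(primary) + sum(follow_up)   # about 24.5s if run one by one
wall = max(primary) + max(follow_up)     # 12.0s + 6.1s = 18.1s, as in the batch row

batch_speedup = speedup(serial, wall)
batch_savings = wall_savings(serial, wall)
```

With these rounded inputs the speedup comes out near 1.35, close to the reported 1.352. The small gap is presumably due to rounding in the displayed durations.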