Runtime Comparison
Qwen3.5 4B MLX vs Qwen3.5 4B Ollama
Both runs were made on the same 8 GB M1 MacBook Air, so this page compares the Apple-native MLX lane against the regular Ollama lane on identical hardware and identical benchmark questions.
| Metric | MLX | Ollama |
|---|---|---|
| Questions | 35 | 35 |
| Primary Avg | 19.75s | 63.67s |
| Tok/s | 16.14 | 5.24 |
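To put the headline figures in relative terms, the short sketch below derives two ratios from the summary values above: how much longer Ollama takes per question, and how much higher MLX's token throughput is. The variable names are illustrative and not part of the benchmark harness.

```python
# Headline figures copied from the summary above.
mlx_avg_s, ollama_avg_s = 19.75, 63.67   # primary average latency, seconds
mlx_tps, ollama_tps = 16.14, 5.24        # tokens per second

# Ratio > 1 means Ollama took that many times longer per question.
latency_ratio = ollama_avg_s / mlx_avg_s
# Ratio > 1 means MLX generated that many times more tokens per second.
throughput_ratio = mlx_tps / ollama_tps

print(f"latency ratio:    {latency_ratio:.2f}x")     # 3.22x
print(f"throughput ratio: {throughput_ratio:.2f}x")  # 3.08x
```

On these numbers MLX is roughly three times faster than Ollama by either measure on this machine.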
Per-Suite Comparison
| Suite | Questions | MLX Primary Avg | Ollama Primary Avg | MLX Tok/s | Ollama Tok/s | MLX Format | Ollama Format | MLX Markers | Ollama Markers |
|---|---|---|---|---|---|---|---|---|---|
| small-model-coding-eval-v1-qwen3_5_4b_mlx | 5 | 9.56s | 30.71s | 14.62 | 6.57 | 2/5 | 4/5 | 20/23 | 20/23 |
| overnight-python-telemetry-v1-qwen3_5_4b_mlx | 20 | 12.78s | 35.32s | 15.72 | 5.14 | 0/20 | 0/20 | 70/80 | 68/80 |
| overnight-python-telemetry-v2-qwen3_5_4b_mlx | 10 | 38.78s | 136.85s | 17.74 | 4.79 | 3/10 | 8/10 | 56/66 | 55/66 |
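The headline averages appear to be question-weighted means of the per-suite rows. That is an inference from the numbers, not something the report states, and the sketch below checks it by recomputing the four summary values from the table. The tuple layout and the `weighted_mean` helper are hypothetical names used only for this illustration.

```python
# Per-suite rows copied from the table above:
# (questions, mlx_avg_s, ollama_avg_s, mlx_tok_s, ollama_tok_s)
rows = [
    (5, 9.56, 30.71, 14.62, 6.57),     # small-model-coding-eval-v1
    (20, 12.78, 35.32, 15.72, 5.14),   # overnight-python-telemetry-v1
    (10, 38.78, 136.85, 17.74, 4.79),  # overnight-python-telemetry-v2
]

def weighted_mean(rows, col):
    """Mean of column `col`, weighted by each suite's question count (column 0)."""
    total_questions = sum(r[0] for r in rows)
    return sum(r[0] * r[col] for r in rows) / total_questions

mlx_avg = round(weighted_mean(rows, 1), 2)
ollama_avg = round(weighted_mean(rows, 2), 2)
mlx_tps = round(weighted_mean(rows, 3), 2)
ollama_tps = round(weighted_mean(rows, 4), 2)

print(mlx_avg, ollama_avg, mlx_tps, ollama_tps)  # 19.75 63.67 16.14 5.24
```

The recomputed values match the reported summary to two decimals, which is consistent with question-count weighting. The 10-question v2 suite has by far the longest per-question latency, so it contributes more than half of the total time behind each average.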
Artifact Paths
- instances/vps-81-17-99-14/telemetry/generated/local-mac-mlx/qwen3_5_4b_mlx/small_eval/small-eval-qwen3_5_4b_mlx-2026-04-16_00-32-03/python_task_suite.json
- instances/vps-81-17-99-14/telemetry/generated/local-mac-mlx/qwen3_5_4b_mlx/python_v1_20q/python20-qwen3_5_4b_mlx-2026-04-16_00-33-36/python_task_suite.json
- instances/vps-81-17-99-14/telemetry/generated/local-mac-mlx/qwen3_5_4b_mlx/python_v2_10q/python10-qwen3_5_4b_mlx-2026-04-16_00-40-57/python_task_suite.json
- instances/vps-81-17-99-14/telemetry/generated/local-mac/qwen3_5_4b_small_eval/local-mac-2026-04-16_01-18-40/small_model_eval.json
- instances/vps-81-17-99-14/telemetry/generated/local-mac/qwen3_5_4b_python_v1_20q/python-suite-2026-04-16_01-21-14/python_task_suite.json
- instances/vps-81-17-99-14/telemetry/generated/local-mac/qwen3_5_4b_python_v2_10q/python-suite-2026-04-16_01-40-32/python_task_suite.json