Runtime Comparison

Qwen3.5 4B MLX vs Qwen3.5 4B Ollama

Both runs were made on the same 8 GB M1 MacBook Air with the same benchmark questions, so this page compares the Apple-native MLX lane against the regular Ollama lane without differences in machine or question set.

Metric	MLX	Ollama
Questions	35	35
Primary Avg	19.75s	63.67s
Tok/s	16.14	5.24

Per-Suite Comparison

Suite	Questions	MLX Primary Avg	Ollama Primary Avg	MLX Tok/s	Ollama Tok/s	MLX Format	Ollama Format	MLX Markers	Ollama Markers
small-model-coding-eval-v1-qwen3_5_4b_mlx	5	9.56s	30.71s	14.62	6.57	2/5	4/5	20/23	20/23
overnight-python-telemetry-v1-qwen3_5_4b_mlx	20	12.78s	35.32s	15.72	5.14	0/20	0/20	70/80	68/80
overnight-python-telemetry-v2-qwen3_5_4b_mlx	10	38.78s	136.85s	17.74	4.79	3/10	8/10	56/66	55/66
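The headline figures appear to be question-weighted means of the per-suite rows. That is an inference from the numbers, not something the page states. The sketch below copies the values from the table and recomputes the headline averages along with the MLX-to-Ollama ratios. The names `SUITES` and `weighted_mean` are illustrative and do not come from the benchmark tooling.

```python
# Per-suite rows copied from the table above:
# (questions, mlx_avg_s, ollama_avg_s, mlx_tok_s, ollama_tok_s)
SUITES = {
    "small-model-coding-eval-v1-qwen3_5_4b_mlx": (5, 9.56, 30.71, 14.62, 6.57),
    "overnight-python-telemetry-v1-qwen3_5_4b_mlx": (20, 12.78, 35.32, 15.72, 5.14),
    "overnight-python-telemetry-v2-qwen3_5_4b_mlx": (10, 38.78, 136.85, 17.74, 4.79),
}


def weighted_mean(column: int) -> float:
    """Mean of one column, weighted by each suite's question count.

    Assumption: the headline numbers weight every question equally,
    so larger suites contribute proportionally more.
    """
    total_questions = sum(row[0] for row in SUITES.values())
    weighted_sum = sum(row[0] * row[column] for row in SUITES.values())
    return weighted_sum / total_questions


total_questions = sum(row[0] for row in SUITES.values())
mlx_avg = weighted_mean(1)
ollama_avg = weighted_mean(2)
mlx_tps = weighted_mean(3)
ollama_tps = weighted_mean(4)

print(f"questions:       {total_questions}")
print(f"MLX avg:         {mlx_avg:.2f}s")
print(f"Ollama avg:      {ollama_avg:.2f}s")
print(f"MLX tok/s:       {mlx_tps:.2f}")
print(f"Ollama tok/s:    {ollama_tps:.2f}")
print(f"latency ratio:   {ollama_avg / mlx_avg:.2f}x")
print(f"throughput ratio: {mlx_tps / ollama_tps:.2f}x")
```

With these inputs the weighted means reproduce the headline values (19.75s and 63.67s, 16.14 and 5.24 tok/s) over 35 questions. On this machine MLX answers in roughly 3.2x less time and generates tokens about 3.1x faster. The gap is widest on the v2 suite, where the longest responses occur.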

Artifact Paths

instances/vps-81-17-99-14/telemetry/generated/local-mac-mlx/qwen3_5_4b_mlx/small_eval/small-eval-qwen3_5_4b_mlx-2026-04-16_00-32-03/python_task_suite.json
instances/vps-81-17-99-14/telemetry/generated/local-mac-mlx/qwen3_5_4b_mlx/python_v1_20q/python20-qwen3_5_4b_mlx-2026-04-16_00-33-36/python_task_suite.json
instances/vps-81-17-99-14/telemetry/generated/local-mac-mlx/qwen3_5_4b_mlx/python_v2_10q/python10-qwen3_5_4b_mlx-2026-04-16_00-40-57/python_task_suite.json
instances/vps-81-17-99-14/telemetry/generated/local-mac/qwen3_5_4b_small_eval/local-mac-2026-04-16_01-18-40/small_model_eval.json
instances/vps-81-17-99-14/telemetry/generated/local-mac/qwen3_5_4b_python_v1_20q/python-suite-2026-04-16_01-21-14/python_task_suite.json
instances/vps-81-17-99-14/telemetry/generated/local-mac/qwen3_5_4b_python_v2_10q/python-suite-2026-04-16_01-40-32/python_task_suite.json