Archive · Mac (slobodans-macbook-air) · 2026-04-16-qwen3_5_4b_mlx-local-mac-benchmark.html. Originally rendered 2026-04-16. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards. ← back to all benchmarks
Local Mac MLX Benchmark

Qwen3.5 4B MLX Full Run

This report captures the completed Apple-silicon MLX lane on the 8 GB M1 MacBook Air. The final run used Qwen thinking disabled in the chat template so the benchmark answers stayed human-facing and format-compliant instead of spilling chain-of-thought style text.

Total Questions
35
Total Wall Time
1081.9s
Weighted Primary Avg
19.75s
Weighted Primary Tok/s
16.14
Primary Format Pass
5/35
Follow-up Format Pass
35/35

Source Artifacts

Suite

small-model-coding-eval-v1-qwen3_5_4b_mlx

Questions
5
Wall Time
88.6s
Primary Avg
9.56s
Primary Tok/s
14.62
Primary Format
2/5
Marker Hits
20/23
IDTitleCategoryPrimaryTok/sMarkersPrimary FormatFollow-upFollow-up Format
disk_guard_bashDisk Guard Scriptshell9.257s14.913/4no7.938syes
ipv4_python_testsIPv4 Validatorpython13.642s17.014/4no9.742syes
nginx_safe_reloadNginx Safe Reloadops3.397s10.012/4yes5.040syes
yaml_cli_planYAML Validator Planplanning11.230s15.586/6yes9.971syes
ssh_lockout_triageSSH Lockout Triagedebugging10.275s15.575/5no7.797syes
Suite

overnight-python-telemetry-v1-qwen3_5_4b_mlx

Questions
20
Wall Time
436.2s
Primary Avg
12.78s
Primary Tok/s
15.72
Primary Format
0/20
Marker Hits
70/80
IDTitleCategoryPrimaryTok/sMarkersPrimary FormatFollow-upFollow-up Format
py_csv_parseCSV Parserparsing17.146s17.674/4no10.890syes
py_file_scanFile Scannerfile_io14.576s15.784/4no9.443syes
py_cli_argsCLI Argumentscli9.947s16.394/4no8.685syes
py_typing_dataclassTyped Dataclasstyping4.210s12.354/4no6.436syes
py_pytest_fixturePytest Fixturetests7.001s14.714/4no7.177syes
py_async_fetchAsync Fetchasync14.672s16.774/4no9.309syes
py_http_retryHTTP Retryhttp24.083s15.613/4no12.282syes
py_json_validateJSON Validationvalidation31.132s17.253/4no13.006syes
py_sqlite_storeSQLite Storesqlite10.306s15.824/4no8.584syes
py_fastapi_handlerFastAPI Handlerweb4.441s10.814/4no6.631syes
py_config_dataclassConfig Dataclassconfig12.334s18.083/4no9.577syes
py_logging_setupLogging Setuplogging15.921s18.783/4no9.687syes
py_thread_poolThread Poolconcurrency13.157s18.243/4no8.453syes
py_package_layoutPackage Layoutpackage6.159s16.894/4no7.029syes
py_debug_stacktraceDebug Stacktracedebugging3.276s12.823/4no5.175syes
py_refactor_splitRefactor Splitrefactor9.143s17.283/4no7.895syes
py_csv_summaryCSV Summaryanalysis15.536s17.253/4no10.806syes
py_pathlib_cleanPathlib Cleanerfilesystem12.375s13.664/4no9.663syes
py_pydantic_modelPydantic Modelvalidation4.239s11.093/4no8.782syes
py_regex_log_parserRegex Log Parserparsing25.890s17.193/4no10.416syes
Suite

overnight-python-telemetry-v2-qwen3_5_4b_mlx

Questions
10
Wall Time
557.1s
Primary Avg
38.78s
Primary Tok/s
17.74
Primary Format
3/10
Marker Hits
56/66
IDTitleCategoryPrimaryTok/sMarkersPrimary FormatFollow-upFollow-up Format
myboard_auth_redirect_triage Auth Redirect Triagedebugging38.507s18.188/8no16.360syes
myboard_board_snapshot_regression_testBoard Snapshot Regression Testtests48.141s18.704/7no19.206syes
myboard_lane_config_patch_planLane Config Patch Planplanning37.823s18.516/6no17.117syes
myboard_api_token_audit_regression_testAPI Token Audit Regression Testtests48.104s18.714/7no18.248syes
myboard_announcements_state_sync_reviewAnnouncements State Sync Reviewreview38.529s18.176/6no19.272syes
myboard_feature_flag_lifecycle_testFeature Flag Lifecycle Testtests48.196s18.675/7no17.979syes
myboard_task_bulk_job_debug_packetTask Bulk Job Debug Packetdebugging31.124s17.006/6yes15.484syes
myboard_user_preferences_contract_testUser Preferences Contract Testtests53.665s17.705/7no18.962syes
myboard_orchestration_timeline_forensicsOrchestration Timeline Forensicsforensics24.918s16.176/6yes14.320syes
truthgraph_ingest_log_triage Ingest Log Triagecross_repo_debugging18.833s15.566/6yes11.936syes