Archive · vps-81 historical telemetry · local-mac/2026-04-12-qwen2_5_coder_14b-local-mac-overview-aborted.html. Originally rendered 2026-04-12. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards.
Recovered Full-Suite Summary

Qwen2.5 Coder 14B on the Local Mac was a ceiling probe, not a viable workflow.

This page follows the same finished-report layout as the other telemetry summaries, but the run did not complete. The Mac began freezing under sustained 14B load, the operator aborted the suite during the 20Q stage, and the results below are recovered from the artifacts of the completed and partial stages.

Aborted by operator because the Mac became unpleasant to use under load. Treat this as evidence that 14B is beyond the practical ceiling for this 8 GB machine.
Verdict

Recovered headline numbers

Hello check 138.3s

A simple greeting still took over two minutes.

5Q small eval 3/5 passes

Average latency 1990.4s and 0.07 tok/s.

20Q progress 4/20

Recovered from raw JSON after 5702.0s of runtime.

10Q progress 0/10

The runner never reached the final 10Q stage.

Stages

Stage-by-stage outcome

Stage | Status | Progress | Elapsed | Notes
Hello check | Completed | 1/1 | 138.3s | Basic model smoke test returned a valid greeting.
5Q small eval | Completed | 5/5 | 1990.4s (avg per question) | 3/5 strict passes. 1 timeout.
20Q Python suite | Partial | 4/20 | 5702.0s | Recovered from per-question JSON after the runner was aborted.
10Q Python suite | Not started | 0/10 | n/a | The run never reached the 10Q stage before the abort.
Recovered 5Q

Completed small-eval details

The 5Q stage did finish, but only 3 of 5 questions met the strict pass rules. One question timed out at the full 3600-second ceiling, which is the clearest sign that the 14B lane is unusable on this host for normal iteration.

Question | Category | Duration | Throughput | Marker hit rate | Format OK | Outcome
Disk Guard Script | shell | 1193.9s | 0.09 tok/s | 0.75 | no | usable
IPv4 Validator | python | 2138.0s | 0.09 tok/s | 1.00 | yes | usable
Nginx Safe Reload | ops | 495.5s | 0.09 tok/s | 0.75 | yes | usable
YAML Validator Plan | planning | 3600.0s | n/a | 0.00 | no | timed out
SSH Lockout Triage | debugging | 2524.7s | 0.10 tok/s | 1.00 | yes | usable
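
As a cross-check on the headline figures, the 5Q durations in the table can be summed and averaged directly. The sketch below is illustrative rather than part of the original harness: the variable names are invented here, and the treatment of the timed-out question as 0 tok/s is an assumption that happens to reproduce the 0.07 tok/s headline, not a documented rule.

```python
from statistics import mean

# (question, duration in seconds, throughput in tok/s) copied from the 5Q table.
# None marks the timed-out question, which reported no throughput.
five_q = [
    ("Disk Guard Script", 1193.9, 0.09),
    ("IPv4 Validator", 2138.0, 0.09),
    ("Nginx Safe Reload", 495.5, 0.09),
    ("YAML Validator Plan", 3600.0, None),
    ("SSH Lockout Triage", 2524.7, 0.10),
]

total_s = sum(duration for _, duration, _ in five_q)
avg_latency_s = total_s / len(five_q)

# Assumption: a timed-out question counts as 0 tok/s in the average.
avg_tps_timeout_as_zero = mean(
    tps if tps is not None else 0.0 for _, _, tps in five_q
)

print(f"total {total_s:.1f}s, average {avg_latency_s:.1f}s")
print(f"average throughput, timeout as zero: {avg_tps_timeout_as_zero:.2f} tok/s")
```

The total comes to 9952.1s across the five questions, so the 1990.4s figure is the per-question average latency rather than the wall-clock time for the whole stage.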
Recovered 20Q

Partial 20Q progress before abort

The top-level 20Q suite summary never finalized, so this section is reconstructed from the per-question primary and follow-up JSON files that were already on disk when the run was aborted.
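
A reconstruction like this can be sketched as a scan over whatever per-question files survived the abort. The layout assumed here is hypothetical: one JSON object per file, with `question` and `phase` keys, where `phase` is `"primary"` or `"followup"`. The actual harness file names and schema are not documented on this page.

```python
import json
from pathlib import Path


def recover_partial_results(results_dir):
    """Group surviving per-question JSON records by question and phase.

    The "question" and "phase" keys are hypothetical field names used for
    illustration; adjust them to whatever the real runner wrote.
    """
    recovered = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (ValueError, OSError):
            # A file that was mid-write at abort time may be truncated; skip it.
            continue
        if not isinstance(record, dict):
            continue
        question = record.get("question")
        phase = record.get("phase")
        if question is None or phase is None:
            continue
        recovered.setdefault(question, {})[phase] = record
    return recovered
```

Skipping unreadable files instead of raising keeps the recovery usable when the runner was killed partway through writing a result.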

Primary avg 547.1s

Average primary duration across the four recovered questions.

Primary throughput 0.14 tok/s

Estimated from eval token count over eval duration.

Follow-up avg 878.2s

Average follow-up duration across the four recovered questions.

Follow-up throughput 0.14 tok/s

Estimated from eval token count over eval duration.

Question | Category | Primary duration | Primary throughput | Follow-up duration | Follow-up throughput
CSV Parser | parsing | 418.2s | 0.15 tok/s | 1042.3s | 0.14 tok/s
File Scanner | file_io | 665.9s | 0.14 tok/s | 840.0s | 0.14 tok/s
CLI Arguments | cli | 754.4s | 0.14 tok/s | 955.9s | 0.14 tok/s
Typed Dataclass | typing | 349.9s | 0.14 tok/s | 674.4s | 0.13 tok/s
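
The averages and the throughput estimate in this section can be reproduced with a few lines of arithmetic. This is a minimal sketch: the durations are copied from the table, but `tokens_per_second` and its sample inputs are hypothetical, since the raw eval token counts and eval durations are not shown on this page.

```python
def tokens_per_second(eval_tokens, eval_seconds):
    """Throughput as eval token count over eval duration (both assumed known)."""
    if eval_seconds <= 0:
        return None  # no meaningful rate, for example after a timeout
    return eval_tokens / eval_seconds


# Durations in seconds, copied from the recovered 20Q table.
primary_s = [418.2, 665.9, 754.4, 349.9]
followup_s = [1042.3, 840.0, 955.9, 674.4]

primary_avg_s = sum(primary_s) / len(primary_s)
followup_avg_s = sum(followup_s) / len(followup_s)

print(f"primary avg {primary_avg_s:.1f}s over {len(primary_s)} questions")
print(f"follow-up avg {followup_avg_s:.2f}s over {len(followup_s)} questions")

# Illustrative values only, not taken from the run: 70 tokens in 500 seconds.
example_tps = tokens_per_second(70, 500.0)
print(f"example throughput {example_tps:.2f} tok/s")
```

The primary average works out to 547.1s and the follow-up average to about 878.15s, which matches the 878.2s headline after rounding.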
Interpretation

What this means

Artifacts

Recovered source files

Hello response preview: Of course! How may I assist you today?

Source host: