Recovered Full-Suite Summary

Qwen2.5 Coder 14B on the Local Mac was a ceiling probe, not a viable workflow.

This page is written in the same finished-report style as the other telemetry summaries, but the run did not complete. The computer started freezing under sustained 14B load, the suite was aborted during the 20Q stage, and the results below are the recovered artifacts from the completed and partial stages.

Aborted by operator because the Mac became unpleasant to use under load. Treat this as evidence that 14B is beyond the practical ceiling for this 8 GB machine.

qwen2.5-coder:14b ollama-local-mac Output: 2026-04-12-qwen2_5_coder_14b-local-mac-overview-aborted.html

Verdict

Recovered headline numbers

Hello check 138.3s

A simple greeting still took over two minutes.

5Q small eval 3/5 passes

Average latency 1990.4s per question at 0.07 tok/s.

20Q progress 4/20

Recovered from raw JSON after 5702.0s of runtime.

10Q progress 0/10

The runner never reached the final 10Q stage.

Stages

Stage-by-stage outcome

Stage	Status	Progress	Elapsed	Notes
Hello check	Completed	1/1	138.3s	Basic model smoke test returned a valid greeting.
5Q small eval	Completed	5/5	1990.4s avg/question	3/5 strict passes. 1 timeout. Per-question durations sum to 9952.1s.
20Q Python suite	Partial	4/20	5702.0s	Recovered from per-question JSON after the runner was aborted.
10Q Python suite	Not started	0/10	n/a	The run never reached the 10Q stage before the abort.

Recovered 5Q

Completed small-eval details

The 5Q stage did finish, but only 3 of 5 questions met the strict pass rules. One question timed out at the full 3600-second ceiling, which is the clearest sign that the 14B lane is unusable on this host for normal iteration.

Question	Category	Duration	Throughput	Marker hit rate	Format OK	Outcome
Disk Guard Script	shell	1193.9s	0.09 tok/s	0.75	no	usable
IPv4 Validator	python	2138.0s	0.09 tok/s	1.00	yes	usable
Nginx Safe Reload	ops	495.5s	0.09 tok/s	0.75	yes	usable
YAML Validator Plan	planning	3600.0s	n/a	0.00	no	timed out
SSH Lockout Triage	debugging	2524.7s	0.10 tok/s	1.00	yes	usable
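
As a cross-check, the headline 5Q figures can be recomputed from the table above. This is a minimal sketch: the row values are copied from the table, and counting the timed-out question as 0 tok/s is an assumption that happens to reproduce the reported 0.07 tok/s. The runner's actual averaging rule is not documented on this page.

```python
# Durations (s) and throughputs (tok/s) copied from the 5Q table.
# None marks the timed-out question, which reported no throughput.
rows = [
    ("Disk Guard Script", 1193.9, 0.09),
    ("IPv4 Validator", 2138.0, 0.09),
    ("Nginx Safe Reload", 495.5, 0.09),
    ("YAML Validator Plan", 3600.0, None),
    ("SSH Lockout Triage", 2524.7, 0.10),
]

durations = [duration for _, duration, _ in rows]
avg_latency = sum(durations) / len(durations)

# Assumption: the timeout counts as 0 tok/s in the headline average.
tps_with_timeout = sum(tps or 0.0 for _, _, tps in rows) / len(rows)

# Alternative: average only over questions that reported a throughput.
completed = [tps for _, _, tps in rows if tps is not None]
tps_completed_only = sum(completed) / len(completed)

print(f"average latency: {avg_latency:.1f}s")
print(f"throughput, timeout counted as zero: {tps_with_timeout:.2f} tok/s")
print(f"throughput, completed questions only: {tps_completed_only:.2f} tok/s")
```

Under these assumptions the average latency comes out at 1990.4s, which matches the headline, and the gap between the two throughput figures shows how much a single timeout pulls the average down.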

Recovered 20Q

Partial 20Q progress before abort

The top-level 20Q suite summary never finalized, so this section is reconstructed from the per-question primary and follow-up JSON files that were already on disk when the run was aborted.
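
A recovery pass of this kind can be sketched as follows. This is not the runner's own code: the function name `load_recovered` and the flat directory of `*.json` files are assumptions made for illustration, and the real artifact layout may differ. The point it shows is that files cut off mid-write by an abort should be skipped and listed rather than allowed to fail the whole reconstruction.

```python
import json
from pathlib import Path


def load_recovered(directory):
    """Load every parseable JSON file in a directory.

    Returns (parsed, skipped): parsed maps file name to decoded data,
    skipped lists files that could not be decoded, for example a file
    that was only partly written when the run was aborted.
    """
    parsed, skipped = {}, []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            parsed[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            skipped.append(path.name)
    return parsed, skipped
```

Reporting the skipped files alongside the parsed ones keeps the reconstruction honest about what was and was not recoverable.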

Primary avg 547.1s

Average primary duration across the four recovered questions.

Primary throughput 0.14 tok/s

Estimated from eval token count over eval duration.

Follow-up avg 878.2s

Average follow-up duration across the four recovered questions.

Follow-up throughput 0.14 tok/s

Estimated from eval token count over eval duration.
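
The throughput estimate can be sketched as below, assuming the recovered per-question JSON keeps Ollama's response fields `eval_count` (generated tokens) and `eval_duration` (generation time in nanoseconds). Whether the runner stores those fields unchanged is an assumption, and the helper name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(record):
    """Estimate decode throughput from an Ollama-style response record.

    Returns None when either field is missing or zero, as would be
    expected for a question that timed out before producing output.
    """
    count = record.get("eval_count")
    duration_ns = record.get("eval_duration")
    if not count or not duration_ns:
        return None
    # eval_duration is in nanoseconds, so convert to seconds first.
    return count / (duration_ns / 1e9)


# Hypothetical values for illustration only, not taken from the run.
example = {"eval_count": 70, "eval_duration": 500_000_000_000}
print(f"{tokens_per_second(example):.2f} tok/s")
```

Because the estimate divides by generation time only, it excludes model load and prompt processing, so wall-clock durations can look worse than the tok/s figure alone suggests.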

Question	Category	Primary duration	Primary throughput	Follow-up duration	Follow-up throughput
CSV Parser	parsing	418.2s	0.15 tok/s	1042.3s	0.14 tok/s
File Scanner	file_io	665.9s	0.14 tok/s	840.0s	0.14 tok/s
CLI Arguments	cli	754.4s	0.14 tok/s	955.9s	0.14 tok/s
Typed Dataclass	typing	349.9s	0.14 tok/s	674.4s	0.13 tok/s

Interpretation

What this means

This Mac can technically load and answer with qwen2.5-coder:14b, but the response times are too slow for practical use.
The first greeting completed in 138.3s, which already disqualifies the model for normal interactive workflows on this machine.
The 5Q stage averaged 1990.4s (about 33 minutes) per question and still suffered a full timeout.
After 5702.0s (about 95 minutes) in the 20Q stage, the run had completed only 4/20 questions.
The clean recommendation is to stop treating 14B as a candidate daily-driver on this Mac and move back down to a smaller tier for any local benchmark that needs to finish end to end.
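
To put the 20Q pace in perspective, the recovered durations can be extrapolated to the full suite. This is a naive linear projection, not a measurement: it assumes the remaining 16 questions would take as long on average as the four recovered ones, and it ignores runner overhead and possible timeouts.

```python
# (primary, follow-up) durations in seconds, copied from the 20Q table.
recovered = [
    (418.2, 1042.3),  # CSV Parser
    (665.9, 840.0),   # File Scanner
    (754.4, 955.9),   # CLI Arguments
    (349.9, 674.4),   # Typed Dataclass
]
TOTAL_QUESTIONS = 20

# Average combined time per question across the recovered sample.
per_question = sum(primary + follow_up for primary, follow_up in recovered) / len(recovered)

# Linear extrapolation to the whole 20Q stage.
projected_seconds = per_question * TOTAL_QUESTIONS
projected_hours = projected_seconds / 3600

print(f"per question: {per_question:.1f}s")
print(f"projected 20Q stage: {projected_seconds:.0f}s ({projected_hours:.1f} h)")
```

Under those assumptions the 20Q stage alone would need just under eight hours on this machine, before the 10Q stage even starts.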

Artifacts

Recovered source files

Hello check: [archive-source]
Small eval: [archive-source]
Partial 20Q dir: [archive-source]
Status file: [archive-source]

Hello response preview: Of course! How may I assist you today?

Source host: