Qwen2.5 Coder 1.5B Mac vs VPS

This page compares the same qwen2.5-coder:1.5b benchmark stack on Slobodan's Apple M1 Mac and on vps50. The shared comparison uses only the suites that both hosts completed: the five-question small eval, the twenty-question Python suite, and the ten-question real-context suite.

Shared Suite Total 769.4s vs 2108.1s

Mac versus VPS across the shared 5Q, 20Q, and 10Q suites.

Wall-Time Advantage 2.74x

How many times faster the Mac finished the shared benchmark stack (VPS total wall time divided by Mac total wall time).
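The headline figures can be reproduced from the per-suite wall times in the tables on this page. The sketch below is only an illustration of that arithmetic; the dictionary layout and the function name `shared_totals` are made up for this example. Note that the rounded per-suite VPS values sum to 2108.0s, 0.1s below the 2108.1s headline, which is consistent with the headline being computed from unrounded timings.

```python
# Per-suite wall times in seconds, copied from the tables on this page.
# The structure and names here are hypothetical, for illustration only.
WALL_TIMES = {
    "5Q": {"mac": 68.6, "vps": 116.0},
    "20Q": {"mac": 378.6, "vps": 1433.8},
    "10Q": {"mac": 322.2, "vps": 558.2},
}


def shared_totals(wall_times):
    """Sum wall time per host across all suites, rounded to 0.1s."""
    mac = round(sum(s["mac"] for s in wall_times.values()), 1)
    vps = round(sum(s["vps"] for s in wall_times.values()), 1)
    return mac, vps


mac_total, vps_total = shared_totals(WALL_TIMES)

# Wall-time advantage: how many times longer the VPS took than the Mac.
advantage = round(vps_total / mac_total, 2)

print(f"Mac {mac_total}s, VPS {vps_total}s, advantage {advantage}x")
```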

5Q Throughput Advantage 2.27x

Mac tokens-per-second divided by VPS tokens-per-second on the five-question packet.

Mac Hello Check 4.8s

Model reply: "Of course! How may I assist you today?"

Why The Speed Gap Changes By Suite

The ratio is not fixed because each packet stresses the model differently. The Mac's lead is largest when the benchmark spends most of its time generating answer tokens, and the gap narrows when the benchmark spends more time digesting large prompt packets first.

In the 20Q Python stack, the Mac cut combined model-eval time from 1365.7s on vps50 to 338.7s locally, roughly a 4.0x reduction. That packet is dominated by many medium-length code answers and summary follow-ups, so raw decode throughput matters most and the Mac posts its largest wall-time lead of the three suites.

In the 10Q real-context stack, both sides had to chew through much larger repo-shaped prompts before answering. The Mac still finished faster, but the prompt burden stayed high on both sides: about 10,204 prompt tokens locally versus 10,458 on vps50. That pulls the overall ratio back down because the suite is less purely generation-bound.
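To make the suite-by-suite difference concrete, the ratios can be computed directly from the table values on this page. This is a minimal sketch: the `SUITES` layout and the `ratios` helper are hypothetical, and the 20Q and 10Q throughput figures use the primary-answer averages, which is a choice made for this example rather than something the report specifies.

```python
# (mac, vps) pairs copied from the tables on this page.
# "wall" is total wall time in seconds; "tps" is average tokens per second
# (primary-answer average for 20Q and 10Q, an assumption of this sketch).
SUITES = {
    "5Q": {"wall": (68.6, 116.0), "tps": (21.17, 9.31)},
    "20Q": {"wall": (378.6, 1433.8), "tps": (16.89, 4.25)},
    "10Q": {"wall": (322.2, 558.2), "tps": (17.82, 10.61)},
}


def ratios(suites):
    """Return the Mac's wall-time and throughput advantage per suite."""
    out = {}
    for name, metrics in suites.items():
        mac_wall, vps_wall = metrics["wall"]
        mac_tps, vps_tps = metrics["tps"]
        out[name] = {
            # Values above 1.0 mean the Mac was faster.
            "wall_ratio": round(vps_wall / mac_wall, 2),
            "tps_ratio": round(mac_tps / vps_tps, 2),
        }
    return out


for suite, r in ratios(SUITES).items():
    print(f"{suite}: wall {r['wall_ratio']}x, throughput {r['tps_ratio']}x")
```

On these numbers the 20Q suite shows the widest gap on both measures, while the 10Q real-context suite sits well below it.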

The 5Q packet is the noisiest of the three. It is only five mixed-format prompts, so one oddball answer or formatting miss moves the average more than it does in the longer Python stacks.
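The sensitivity to a single answer is simple arithmetic: with n questions, one pass/fail flip moves the pass rate by 100/n percentage points. The helper below is a hypothetical illustration of that point, not part of the benchmark tooling.

```python
def swing_per_question(n_questions):
    """Percentage points that one pass/fail flip moves the pass rate."""
    return 100.0 / n_questions


# Suite sizes used on this page: 5Q, 10Q, and 20Q.
for n in (5, 10, 20):
    print(f"{n} questions: one answer moves the pass rate by "
          f"{swing_per_question(n):.0f} points")
```

This is why the 2/5 versus 3/5 format-pass difference in the five-question packet amounts to a single answer, even though it reads as a 20-point gap.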

Five-Question Packet

Shell, ops, planning, and SSH triage tasks.

Metric	Mac M1	VPS50
Total wall time	68.6s	116.0s
Average question time	13.7s	23.2s
Average throughput	21.17 tok/s	9.31 tok/s
Average marker hit	83%	90%
Format passes	2/5	3/5
Strict passes	2/5	3/5
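As a quick consistency check, the average question time in the table follows from the total wall time divided by the five questions, and the pass counts convert to percentages the same way. The variable names below are invented for this sketch.

```python
N_QUESTIONS = 5

# Values copied from the five-question table above.
wall = {"mac": 68.6, "vps": 116.0}   # total wall time, seconds
format_passes = {"mac": 2, "vps": 3}  # out of N_QUESTIONS

# Average seconds per question, rounded to 0.1s as in the table.
avg_time = {host: round(total / N_QUESTIONS, 1) for host, total in wall.items()}

# Format pass rate as a whole-number percentage.
pass_rate = {host: round(100 * n / N_QUESTIONS) for host, n in format_passes.items()}

print(avg_time)
print(pass_rate)
```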

Python 20Q

General Python implementation and debugging tasks.

Metric	Mac M1	VPS50
Total wall time	378.6s	1433.8s
Primary avg duration	11.4s	42.0s
Follow-up avg duration	7.5s	29.7s
Primary avg throughput	16.89 tok/s	4.25 tok/s
Follow-up avg throughput	16.56 tok/s	3.87 tok/s
Primary avg marker hit	88%	89%
Follow-up avg marker hit	0%	0%
Usable primary answers	20/20	20/20
Usable follow-up answers	20/20	20/20

Real-Context 10Q

Repo-shaped multi-file tasks closer to real production prompts.

Metric	Mac M1	VPS50
Total wall time	322.2s	558.2s
Primary avg duration	24.5s	45.4s
Follow-up avg duration	7.7s	10.4s
Primary avg throughput	17.82 tok/s	10.61 tok/s
Follow-up avg throughput	17.96 tok/s	11.64 tok/s
Primary avg marker hit	69%	74%
Follow-up avg marker hit	0%	0%
Usable primary answers	10/10	10/10
Usable follow-up answers	10/10	10/10
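The 20Q and 10Q tables can be cross-checked against each other. Assuming each question runs one primary answer and one follow-up back to back (an assumption of this sketch, not something the report states), the total wall time should be close to the question count times the sum of the two average durations. The `RUNS` table and `estimated_wall` helper are hypothetical names.

```python
# (question count, primary avg s, follow-up avg s, reported wall time s),
# copied from the Python 20Q and Real-Context 10Q tables on this page.
RUNS = {
    ("20Q", "mac"): (20, 11.4, 7.5, 378.6),
    ("20Q", "vps"): (20, 42.0, 29.7, 1433.8),
    ("10Q", "mac"): (10, 24.5, 7.7, 322.2),
    ("10Q", "vps"): (10, 45.4, 10.4, 558.2),
}


def estimated_wall(n, primary_avg, followup_avg):
    """Estimate wall time assuming one primary plus one follow-up per question."""
    return round(n * (primary_avg + followup_avg), 1)


for (suite, host), (n, primary, followup, reported) in RUNS.items():
    est = estimated_wall(n, primary, followup)
    print(f"{suite} {host}: estimated {est}s, reported {reported}s, "
          f"difference {round(reported - est, 1)}s")
```

Under that assumption every estimate lands within one second of the reported wall time, which suggests the averages and totals in the tables are mutually consistent up to rounding.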

Drill-Down Reports