Qwen2.5 Coder 3B Mac vs VPS

This page compares the same qwen2.5-coder:3b benchmark stack on Slobodan's Apple M1 Mac and on vps50. The comparison uses only the suites that both machines completed: the five-question small eval, the twenty-question Python suite, and the ten-question real-context suite.

Shared Suite Total 914.5s vs 2282.3s

Mac versus VPS across the shared 5Q, 20Q, and 10Q suites.

Wall-Time Advantage 2.50x

VPS wall time divided by Mac wall time across the shared stack: the VPS took 2.50x as long to finish.

5Q Throughput Advantage 2.82x

Mac tokens-per-second divided by VPS tokens-per-second on the five-question packet.

Mac Hello Check 5.8s

Reply: "Of course! I'm here to help. What do you need assistance with today?"
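The headline cards are straight arithmetic over the suite tables further down. A small Python sketch reproducing them from the wall times and 5Q throughput figures published on this page (the variable and function names are illustrative, not part of the benchmark tooling):

```python
# Wall times (seconds) copied from the three suite tables on this page.
MAC_WALL = {"5Q": 110.1, "20Q": 397.1, "10Q": 407.3}
VPS_WALL = {"5Q": 336.2, "20Q": 1268.5, "10Q": 677.6}

# 5Q average throughput (tokens per second) from the Five-Question Packet table.
MAC_5Q_TPS = 12.86
VPS_5Q_TPS = 4.56


def shared_total(walls: dict[str, float]) -> float:
    """Sum wall time across the shared 5Q, 20Q, and 10Q suites."""
    return round(sum(walls.values()), 1)


mac_total = shared_total(MAC_WALL)
vps_total = shared_total(VPS_WALL)
wall_advantage = vps_total / mac_total          # how many times longer the VPS took
throughput_advantage = MAC_5Q_TPS / VPS_5Q_TPS  # Mac tok/s over VPS tok/s

print(f"Shared suite total: {mac_total}s vs {vps_total}s")   # 914.5s vs 2282.3s
print(f"Wall-time advantage: {wall_advantage:.2f}x")         # 2.50x
print(f"5Q throughput advantage: {throughput_advantage:.2f}x")  # 2.82x
```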

Why The Speed Gap Changes By Suite

The ratio is not fixed because each suite stresses the model differently. The Mac's lead is largest when a benchmark spends most of its time generating answer tokens, and it narrows when more of the time goes to digesting large prompts first.
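Dividing each suite's VPS wall time by its Mac wall time makes the drift concrete. A short Python sketch over the table values gives roughly 3.05x for 5Q, 3.19x for 20Q, and 1.66x for 10Q; the 2.50x headline is the time-weighted blend of these:

```python
# Total wall times (seconds) as (Mac, VPS) pairs, from the suite tables on this page.
SUITES = {
    "5Q": (110.1, 336.2),
    "20Q": (397.1, 1268.5),
    "10Q": (407.3, 677.6),
}


def wall_ratio(mac_s: float, vps_s: float) -> float:
    """How many times longer the VPS took than the Mac on one suite."""
    return vps_s / mac_s


ratios = {name: round(wall_ratio(mac, vps), 2) for name, (mac, vps) in SUITES.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```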

In the 20Q Python stack, the Mac cut combined model-eval time from 1189.8s on vps50 to 357.4s locally. That packet is dominated by many medium-length code answers and summary follow-ups, so raw decode throughput matters most, and this is where the Mac opens its biggest wall-time lead.

In the 10Q real-context stack, both sides had to work through much larger repo-shaped prompts before answering. The Mac still finished faster, but the prompt burden was equally heavy on both sides: 10,074 prompt tokens locally versus 10,026 on vps50. Because this suite is less purely generation-bound, its Mac-to-VPS ratio drops well below the other two.
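One way to see why prompt-heavy suites compress the ratio is a two-phase latency model: time to digest the prompt plus time to decode the answer. The Python sketch below assumes, purely for illustration, that the Mac's advantage is larger at decode than at prompt processing; the `Machine` class, the rates, and the token counts are hypothetical, not measured values from this page:

```python
from dataclasses import dataclass


@dataclass
class Machine:
    prefill_tps: float  # prompt tokens processed per second (hypothetical)
    decode_tps: float   # answer tokens generated per second (hypothetical)


def request_seconds(m: Machine, prompt_tokens: int, output_tokens: int) -> float:
    """Two-phase latency model: digest the prompt, then decode the answer."""
    return prompt_tokens / m.prefill_tps + output_tokens / m.decode_tps


# Hypothetical rates: the Mac is assumed 3x faster at decode but only 1.5x at prefill.
mac = Machine(prefill_tps=150.0, decode_tps=15.0)
vps = Machine(prefill_tps=100.0, decode_tps=5.0)

ratios = {}
for label, prompt, output in [("generation-bound", 200, 400), ("prompt-heavy", 10_000, 400)]:
    ratios[label] = request_seconds(vps, prompt, output) / request_seconds(mac, prompt, output)
    print(f"{label}: VPS/Mac time ratio {ratios[label]:.2f}x")
```

Under these assumed rates the ratio falls from about 2.93x to about 1.93x as the prompt grows, the same direction the 10Q suite moves relative to 20Q.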

The 5Q packet is the noisiest of the three. It has only five mixed-format prompts, so one oddball answer or formatting miss moves the average more than it would in the longer 20Q and 10Q suites.
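The effect of packet size on noise is simple division: a change in one question's score moves the suite average by that change divided by the number of questions. A minimal Python sketch, using a hypothetical 50-point swing on a single answer's marker hit:

```python
def mean_shift(n_questions: int, score_change: float) -> float:
    """How far the suite average moves when one question's score changes."""
    return score_change / n_questions


# One answer dropping from 100% to 50% marker hit (a hypothetical 50-point swing).
for n in (5, 10, 20):
    print(f"{n} questions: average moves {mean_shift(n, 50.0):.1f} points")
# 5 -> 10.0 points, 10 -> 5.0 points, 20 -> 2.5 points
```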

Five-Question Packet

Shell, ops, planning, and SSH triage tasks.

Metric	Mac M1	VPS50
Total wall time	110.1s	336.2s
Average question time	22.0s	67.2s
Average throughput	12.86 tok/s	4.56 tok/s
Average marker hit	90%	90%
Format passes	3/5	3/5
Strict passes	3/5	3/5
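The average question time row appears to be the total wall time divided by the five prompts. A short Python check against the table values, assuming the averages are plain arithmetic means rounded to one decimal place:

```python
QUESTIONS = 5
# (total wall time s, reported average question time s) from the table above.
ROWS = {"Mac M1": (110.1, 22.0), "VPS50": (336.2, 67.2)}

for host, (total_s, reported_avg_s) in ROWS.items():
    derived = total_s / QUESTIONS
    # The table rounds to one decimal place, so compare after rounding.
    assert round(derived, 1) == reported_avg_s, host
    print(f"{host}: {derived:.2f}s per question")
```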

Python 20Q

General Python implementation and debugging tasks.

Metric	Mac M1	VPS50
Total wall time	397.1s	1268.5s
Primary avg duration	13.3s	38.1s
Follow-up avg duration	6.5s	25.3s
Primary avg throughput	14.20 tok/s	4.16 tok/s
Follow-up avg throughput	14.91 tok/s	3.91 tok/s
Primary avg marker hit	85%	89%
Follow-up avg marker hit	0%	0%
Usable primary answers	20/20	20/20
Usable follow-up answers	20/20	20/20

Real-Context 10Q

Repo-shaped multi-file tasks closer to real production prompts.

Metric	Mac M1	VPS50
Total wall time	407.3s	677.6s
Primary avg duration	31.1s	54.2s
Follow-up avg duration	9.6s	13.5s
Primary avg throughput	14.01 tok/s	7.91 tok/s
Follow-up avg throughput	14.34 tok/s	8.98 tok/s
Primary avg marker hit	78%	73%
Follow-up avg marker hit	0%	0%
Usable primary answers	10/10	10/10
Usable follow-up answers	10/10	10/10
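Across the 20Q and 10Q tables, each total wall time lands within about a second of questions × (primary average + follow-up average), which suggests the wall time is almost entirely the two model calls per question. A Python sketch of that consistency check, assuming each question is one primary prompt plus one follow-up (the tables imply this but do not state it; `rebuilt_total` is an illustrative helper):

```python
# (questions, total wall s, primary avg s, follow-up avg s) from the 20Q and 10Q tables.
ROWS = {
    ("20Q", "Mac M1"): (20, 397.1, 13.3, 6.5),
    ("20Q", "VPS50"): (20, 1268.5, 38.1, 25.3),
    ("10Q", "Mac M1"): (10, 407.3, 31.1, 9.6),
    ("10Q", "VPS50"): (10, 677.6, 54.2, 13.5),
}


def rebuilt_total(n: int, primary_avg: float, follow_up_avg: float) -> float:
    """Wall time implied if every question runs one primary prompt and one follow-up."""
    return n * (primary_avg + follow_up_avg)


gaps = {}
for key, (n, wall, primary, follow_up) in ROWS.items():
    rebuilt = rebuilt_total(n, primary, follow_up)
    gaps[key] = wall - rebuilt
    # Each average is rounded to 0.1s, so the rebuilt total can be off by up to n * 0.1s.
    print(f"{key[0]} {key[1]}: reported {wall}s, rebuilt {rebuilt:.1f}s")
```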
