Mac versus VPS across the shared 5Q, 20Q, and 10Q suites.
This page compares the same qwen2.5-coder:1.5b benchmark stack on Slobodan's Apple M1 Mac and on vps50. The shared comparison uses the suites both sides definitely completed: the five-question small eval, the twenty-question Python suite, and the ten-question real-context suite.
How much shorter the shared benchmark stack was on the Mac.
Mac tokens-per-second divided by VPS tokens-per-second on the five-question packet.
The ratio is not fixed because each packet stresses the model differently. The Mac wins biggest when the benchmark spends most of its time generating answer tokens, and the gap narrows when the benchmark spends more time digesting large prompt packets first.
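To make the varying ratio concrete, here is a minimal sketch that recomputes the per-suite wall-time ratio from the "Total wall time" rows in the tables on this page. The dictionary layout and the `wall_time_ratio` helper are illustrative names, not part of the benchmark tooling.

```python
# Total wall times in seconds, copied from the per-suite tables on this page.
# The structure and names here are hypothetical, chosen only for illustration.
WALL_TIME_S = {
    "5Q": {"mac": 68.6, "vps50": 116.0},
    "20Q": {"mac": 378.6, "vps50": 1433.8},
    "10Q": {"mac": 322.2, "vps50": 558.2},
}


def wall_time_ratio(suite: str) -> float:
    """Return how many times longer vps50 took than the Mac on one suite."""
    times = WALL_TIME_S[suite]
    return times["vps50"] / times["mac"]


for suite in WALL_TIME_S:
    print(f"{suite}: vps50 took {wall_time_ratio(suite):.2f}x the Mac wall time")
```

On these numbers the 20Q Python suite shows the widest gap (close to 3.8x), while the 5Q and 10Q suites both land near 1.7x.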
In the 20Q Python stack, the Mac cut combined model-eval time from 1365.7s on vps50 to 338.7s locally, roughly a 4x reduction. That packet is dominated by many medium-length code answers and summary follow-ups, so raw decode throughput matters most and the Mac opens up its biggest wall-time lead of the three suites.
In the 10Q real-context stack, both sides had to chew through much larger repo-shaped prompts before answering. The Mac still finished faster, but the prompt burden stayed high on both sides: about 10,204 prompt tokens locally versus 10,458 on vps50. That pulls the overall ratio back down because the suite is less purely generation-bound.
The 5Q packet is the noisiest of the three. It is only five mixed-format prompts, so one oddball answer or formatting miss moves the average more than it does in the two longer suites.
5Q small eval: shell, ops, planning, and SSH triage tasks.
| Metric | Mac M1 | VPS50 |
|---|---|---|
| Total wall time | 68.6s | 116.0s |
| Average question time | 13.7s | 23.2s |
| Average throughput | 21.17 tok/s | 9.31 tok/s |
| Average marker hit | 83% | 90% |
| Format passes | 2/5 | 3/5 |
| Strict passes | 2/5 | 3/5 |
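The throughput ratio described near the top of the page (Mac tokens-per-second divided by VPS tokens-per-second on the five-question packet) can be recomputed from the "Average throughput" row above. This is a small sketch; `throughput_ratio` is a hypothetical helper name, and the inputs are the table's averaged figures rather than per-question measurements.

```python
def throughput_ratio(mac_tok_s: float, vps_tok_s: float) -> float:
    """Mac tokens-per-second divided by VPS tokens-per-second."""
    if vps_tok_s <= 0:
        raise ValueError("VPS throughput must be positive")
    return mac_tok_s / vps_tok_s


# Average throughput values from the 5Q table (tok/s).
ratio_5q = throughput_ratio(21.17, 9.31)
print(f"5Q throughput ratio: {ratio_5q:.2f}x")
```

The same helper works for the primary and follow-up throughput rows of the other two suites.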
20Q Python suite: general Python implementation and debugging tasks.
| Metric | Mac M1 | VPS50 |
|---|---|---|
| Total wall time | 378.6s | 1433.8s |
| Primary avg duration | 11.4s | 42.0s |
| Follow-up avg duration | 7.5s | 29.7s |
| Primary avg throughput | 16.89 tok/s | 4.25 tok/s |
| Follow-up avg throughput | 16.56 tok/s | 3.87 tok/s |
| Primary avg marker hit | 88% | 89% |
| Follow-up avg marker hit | 0% | 0% |
| Usable primary answers | 20/20 | 20/20 |
| Usable follow-up answers | 20/20 | 20/20 |
10Q real-context suite: repo-shaped multi-file tasks closer to real production prompts.
| Metric | Mac M1 | VPS50 |
|---|---|---|
| Total wall time | 322.2s | 558.2s |
| Primary avg duration | 24.5s | 45.4s |
| Follow-up avg duration | 7.7s | 10.4s |
| Primary avg throughput | 17.82 tok/s | 10.61 tok/s |
| Follow-up avg throughput | 17.96 tok/s | 11.64 tok/s |
| Primary avg marker hit | 69% | 74% |
| Follow-up avg marker hit | 0% | 0% |
| Usable primary answers | 10/10 | 10/10 |
| Usable follow-up answers | 10/10 | 10/10 |
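One plausible way to read "how much shorter the shared benchmark stack was on the Mac" is as a percentage reduction in summed wall time across the three suites. The sketch below assumes that definition; the page's own headline figure may be computed differently (for example from model-eval time rather than wall time), and `percent_shorter` is a hypothetical helper.

```python
# "Total wall time" rows from the 5Q, 20Q, and 10Q tables, in seconds.
MAC_WALL_S = [68.6, 378.6, 322.2]
VPS_WALL_S = [116.0, 1433.8, 558.2]


def percent_shorter(fast_s: float, slow_s: float) -> float:
    """Percentage by which fast_s is shorter than slow_s."""
    if slow_s <= 0:
        raise ValueError("slow_s must be positive")
    return (1.0 - fast_s / slow_s) * 100.0


mac_total = sum(MAC_WALL_S)
vps_total = sum(VPS_WALL_S)
print(f"Mac total: {mac_total:.1f}s, vps50 total: {vps_total:.1f}s")
print(f"Mac was {percent_shorter(mac_total, vps_total):.1f}% shorter overall")
```

Under this assumption the three suites sum to about 769s on the Mac and 2108s on vps50.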
Every report below is archived separately so the results stay stable and shareable.