Archive · vps-81 historical telemetry · local-mac/2026-04-11-qwen0_5b-local-mac-vs-vps-report.html. Originally rendered 2026-04-11. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards.

Qwen2.5-Coder 0.5B: Mac vs VPS50

This page compares the same qwen2.5-coder:0.5b benchmark stack on Slobodan's Apple M1 Mac and on VPS50. It covers only the suites completed on both machines: hello-check, the five-question small eval, the twenty-question Python suite, and the ten-question real-context suite.

Shared Benchmark Total 316.3s vs 627.7s

Mac M1 against VPS50 for hello + 5Q + 20Q + 10Q.

Wall-Time Advantage 1.98x

VPS50 total wall time divided by Mac total wall time for the full shared benchmark (627.7s / 316.3s).
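
The two headline figures can be re-derived from the per-suite wall times reported in the tables on this page. The following is a minimal sketch, assuming the shared total is the plain sum of the four suite wall times and the advantage is the VPS50 total divided by the Mac total. The dictionary layout and function name are illustrative and are not taken from the original harness.

```python
# Per-suite wall times in seconds, copied from the tables on this page.
# The dict layout and names are illustrative, not part of the benchmark harness.
WALL_TIMES = {
    "hello": {"mac": 1.8, "vps50": 4.0},
    "5q": {"mac": 88.0, "vps50": 116.6},
    "20q": {"mac": 122.2, "vps50": 260.1},
    "10q": {"mac": 104.3, "vps50": 247.0},
}


def shared_total(machine: str) -> float:
    """Sum the wall time of every shared suite for one machine."""
    return round(sum(suite[machine] for suite in WALL_TIMES.values()), 1)


mac_total = shared_total("mac")
vps_total = shared_total("vps50")

# Assumed definition: how many times longer the VPS50 run took.
advantage = vps_total / mac_total

print(f"{mac_total}s vs {vps_total}s, {advantage:.2f}x")
```

Under these assumptions the sketch reproduces the reported 316.3s, 627.7s, and 1.98x.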

5Q Throughput Advantage 2.21x

Mac average tokens per second divided by the VPS50 average on the five-question small eval (45.69 / 20.67 tok/s).

Quality Snapshot 78% vs 81%

Primary marker-hit average on the 20-question Python suite.

Hello Check

Metric | Mac M1 | VPS50
Latency | 1.8s | 4.0s
Reply length | 38 | 38

Five-Question Small Eval

Metric | Mac M1 | VPS50
Total wall time | 88.0s | 116.6s
Average question time | 17.6s | 23.3s
Average throughput | 45.69 tok/s | 20.67 tok/s
Average marker hit | 80% | 80%
Format passes | 3/5 | 3/5
Strict passes | 2/5 | 2/5
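
The derived rows in this table can be checked against the raw ones. The following is a minimal sketch, assuming the average question time is total wall time divided by the five questions and the throughput advantage is the Mac average divided by the VPS50 average. All names are illustrative.

```python
# Five-question small-eval figures copied from the table above.
# Names are illustrative; the original harness is not shown on this page.
QUESTIONS = 5
total_wall = {"mac": 88.0, "vps50": 116.6}       # seconds
avg_throughput = {"mac": 45.69, "vps50": 20.67}  # tokens per second

# Assumed definition: mean seconds per question.
avg_question_time = {m: t / QUESTIONS for m, t in total_wall.items()}

# Assumed definitions: Mac-over-VPS50 throughput, VPS50-over-Mac wall time.
throughput_ratio = avg_throughput["mac"] / avg_throughput["vps50"]
wall_ratio = total_wall["vps50"] / total_wall["mac"]

for machine, seconds in avg_question_time.items():
    print(f"{machine}: {seconds:.1f}s per question")
print(f"throughput ratio: {throughput_ratio:.2f}x")
print(f"wall-time ratio: {wall_ratio:.3f}x")
```

The two ratios need not agree, because wall time also depends on how many tokens each run generated, and that figure is not listed on this page.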

Python 20Q

Metric | Mac M1 | VPS50
Total wall time | 122.2s | 260.1s
Primary avg duration | 4.4s | 9.4s
Follow-up avg duration | 1.7s | 3.6s
Primary avg throughput | 42.18 tok/s | 20.88 tok/s
Follow-up avg throughput | 41.80 tok/s | 22.08 tok/s
Primary avg marker hit | 78% | 81%
Follow-up avg marker hit | 60% | 60%
Usable primary answers | 20/20 | 20/20
Usable follow-up answers | 20/20 | 20/20

Python 10Q

Metric | Mac M1 | VPS50
Total wall time | 104.3s | 247.0s
Primary avg duration | 8.4s | 20.4s
Follow-up avg duration | 2.0s | 4.2s
Primary avg throughput | 38.14 tok/s | 17.21 tok/s
Follow-up avg throughput | 38.91 tok/s | 18.07 tok/s
Primary avg marker hit | 55% | 63%
Follow-up avg marker hit | 80% | 70%
Usable primary answers | 10/10 | 10/10
Usable follow-up answers | 10/10 | 10/10
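
The throughput rows of the 20Q and 10Q tables can be reduced to the same kind of Mac-to-VPS50 ratio that the 5Q headline uses. The following is a minimal sketch, assuming the ratio is the Mac average divided by the VPS50 average. The data structure and names are illustrative.

```python
# (mac, vps50) average throughput in tok/s, copied from the 20Q and 10Q tables.
# The keys and layout are illustrative, not part of the benchmark harness.
THROUGHPUT = {
    ("20Q", "primary"): (42.18, 20.88),
    ("20Q", "follow-up"): (41.80, 22.08),
    ("10Q", "primary"): (38.14, 17.21),
    ("10Q", "follow-up"): (38.91, 18.07),
}

# Assumed definition: Mac throughput divided by VPS50 throughput.
ratios = {key: round(mac / vps, 2) for key, (mac, vps) in THROUGHPUT.items()}

for (suite, phase), ratio in ratios.items():
    print(f"{suite} {phase}: {ratio:.2f}x")
```

On these figures the ratio stays close to 2x across both suites and both phases, in line with the 2.21x reported for the five-question eval.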

20Q Extremes

Fastest and slowest primary tasks on each machine, so the tail latency shape is easy to compare.

Mac M1
  • Fastest: Pydantic Model in 1.9s
  • Slowest: Pytest Fixture in 8.5s
VPS50
  • Fastest: Typed Dataclass in 3.2s
  • Slowest: Refactor Split in 31.3s
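
One way to put a number on the tail shape is the ratio of the slowest to the fastest primary task on each machine. The following is a minimal sketch of that spread. It is an illustrative measure chosen here, not a metric the original report computes, and the names are hypothetical.

```python
# Fastest and slowest 20Q primary task durations in seconds, from the list above.
# The "spread" metric is an illustrative choice, not part of the original report.
EXTREMES_20Q = {
    "Mac M1": {"fastest": 1.9, "slowest": 8.5},
    "VPS50": {"fastest": 3.2, "slowest": 31.3},
}


def spread(fastest: float, slowest: float) -> float:
    """Return how many times longer the slowest task ran than the fastest."""
    return slowest / fastest


for machine, times in EXTREMES_20Q.items():
    print(f"{machine}: slowest is {spread(**times):.1f}x the fastest")
```

By this measure the VPS50 tail is roughly twice as stretched as the Mac's (about 9.8x against 4.5x).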

10Q Extremes

Fastest and slowest primary tasks in the real-context packet. Primary repository prompts ranged from 4.9s to 15.0s on the Mac and from 4.4s to 33.8s on VPS50.

Mac M1
  • Fastest: Orchestration Timeline Forensics in 4.9s
  • Slowest: API Token Audit Regression Test in 15.0s
VPS50
  • Fastest: Ingest Log Triage in 4.4s
  • Slowest: Task Bulk Job Debug Packet in 33.8s

Artifacts

Direct links to the underlying JSON and standalone report pages.

Mac
  • Hello report
  • 5Q report
  • 20Q report
  • 10Q report
VPS50
  • Hello report
  • 5Q report
  • 20Q report
  • 10Q report