# benchmarks.weeyuga.com > Honest benchmarks for personal AI on consumer hardware. Used gaming laptops, base-model Macs, cheap CPU VPSes — measured, not estimated. Published with raw JSONL, methodology, and the git SHA of the harness so anyone can reproduce. Released CC-BY-4.0 with no marketing register. This site exists to save the next person time. What didn't work matters more than what did. Every claim links to its underlying run; every run links to its raw artifacts; every methodology gap we know about is documented up front. ## What's on the site - [Home](https://benchmarks.weeyuga.com/): hero question + walk through the four hardware tiers we measure (CPU VPS, 4 GB GPU laptop, 6 GB GPU laptop, M1 Mac). - [Quickstart](https://benchmarks.weeyuga.com/quickstart.html): click-to-copy agent prompt that delegates a whole new benchmark contribution to a coding agent (read methodology, clone the public archive, run the bench, file a PR). The fastest path from "I want to benchmark this model on my hardware" to "my numbers are in the archive." - [Browse benchmarks](https://benchmarks.weeyuga.com/benchmarks/): chip-filtered card list of every site-grade run, by hardware, model family, or task kind. - [Catalog](https://benchmarks.weeyuga.com/catalog.html): the full 125-entry index including the 104 legacy reports surfaced from earlier campaigns. - [Findings](https://benchmarks.weeyuga.com/findings/): synthesis posts that weave multiple benchmarks together, plus a "dead ends" subsection for things we tried that didn't work. - [How we measure](https://benchmarks.weeyuga.com/methodology.html): what we measure, what we don't, fairness rules, reproducibility, and a running list of mistakes we've published and corrected. - [Data](https://benchmarks.weeyuga.com/data.html): bulk download + GitHub archive link + citation block. - [About](https://benchmarks.weeyuga.com/about.html): one paragraph on who runs it (Sloba Margetic) and why. ## Schema for machine readers - [Catalogue (JSON)](https://benchmarks.weeyuga.com/catalogue.json): every benchmark with id, hardware, model_family, task_kind, results_table, chart_spec, raw_data_urls, related_ids, site_grade. Schema version `1.0` per Ben's lock 2026-05-06T10:30Z. - [Build metadata (JSON)](https://benchmarks.weeyuga.com/build.json): currently-deployed git SHA + timestamp + grade breakdown. - [Public GitHub archive](https://github.com/slobodanmargetic988/weeyuga-benchmarks-public): every run's raw `run.jsonl`, `run.log`, human-readable `run.md`, and `metadata.json` (model checkpoint, env, harness git SHA). ## What you can rely on - **Hardware honestly described.** Every benchmark declares the exact machine: CPU model, GPU model + VRAM, RAM, year. No "consumer GPU" without a number. - **Methodology declared per run.** Every detail page links to the methodology section that produced it. Run-specific deviations get a `methodology_deviations` field rather than being elided. - **Cold-start vs warm separated.** First-call-after-load measurements are tagged separately from steady-state measurements. Speedup ratios published. - **Per-prompt difficulty broken out.** `hello`, `P-MEDIUM`, `P-HARD` (and `P-EASY` where the suite covers it) measured separately so you can see whether a model's bottleneck is parsing or generation. - **Caveats published, not buried.** Any benchmark with a methodology gap or framing risk gets an explicit caveat box in warm-amber on its detail page. - **Mistakes corrected, not memory-holed.** The methodology page lists corrections; v0 numbers stay reachable even after v1 supersedes them. ## What's deliberately not here - Hosted-API comparisons. We don't run benchmarks on hardware we don't own; cloud passthroughs aren't measured here. - Marketing claims. No "fastest local AI on the planet" — every superlative has a measurement next to it. - Frontier-class GPUs. The most capable card we test is a six-year-old GTX 1060 with 6 GB of VRAM. The site is for engineers planning around hardware they can actually buy used. - Synthetic benchmarks with no production analog. Every measurement maps to a workload someone might actually run. ## How to contribute You can re-run any benchmark on your own hardware and tell us where we got it wrong. Four paths: 1. **Hand the [`/quickstart.html`](https://benchmarks.weeyuga.com/quickstart.html) prompt to a coding agent** (Claude Code / Cursor / Codex / Aider / your own). The page hosts a click-to-copy prompt that does the whole loop: read the methodology, clone the public archive, run a benchmark against the model you specify on your hardware, fill out the four canonical files, open a PR. Highest leverage if you already have an AI agent in your workflow. 2. **File an issue** on the public GitHub repo: . Mention the run id (e.g. `09d8fbde-...`) and what you measured differently. 3. **Submit a PR** with your own benchmark JSONL. The harness shape is documented in `methodology.md` inside the public repo. Same metadata fields, same controlled vocab. 4. **Email** if your contribution doesn't fit issue or PR shape (e.g. a corrected methodology proposal that spans multiple runs). Disconfirmation is more valuable than another credulous re-share. If you find a number that doesn't match what your hardware does on the same model with the same prompt, please tell us — the fix is mechanical and the credibility win is large. ## Citation ``` Margetic, S. et al. (2026). benchmarks.weeyuga.com. Public benchmarks of personal AI on consumer hardware. ``` Per-benchmark citation blocks are on every detail page. Data: CC-BY-4.0. Helper scripts: MIT.