Archive · vps-81 historical telemetry · qwen3b/2026-04-11-qwen3b-small-model-manual.html. Originally rendered 2026-04-11. Re-hosted from MyServers on 2026-05-06. Methodology and harness conventions may differ from what we use today; see /methodology.html for current standards.
vps50 telemetry manual

Small-model performance dashboard

This manual turns the small-model telemetry JSON into one scannable page so we can compare latency, throughput, suite pass-rate, requirement-hit-rate, and question-level behavior without reading a wall of raw logs.

Source: 2026-04-11-qwen3b-small-model-eval.json
Generated: 2026-04-11T10:49:28Z
Question suite: small-model-coding-eval-v1
Server: vmi3206382
Runner: ollama
Ollama version: 0.20.2

Question Set

Suite: small-model-coding-eval-v1

shell bash_code

Disk Guard Script

disk_guard_bash

Return only Bash code. Write a script that checks disk usage for /, prints a human-readable warning, and exits with status 1 when usage is above 85 percent. Requirements: include a shebang, use df -P /, parse the numeric percentage, and keep the script production-safe.

Markers: #!/usr/bin/env bash • df -P / • 85 • exit 1

python python_code

IPv4 Validator

ipv4_python_tests

Return only Python code. Write a function named is_valid_ipv4(value: str) -> bool and include exactly three pytest tests that cover a valid address, an out-of-range octet, and a non-numeric input.

Markers: def is_valid_ipv4 • def test_ • assert • split('.')

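For orientation, here is a minimal sketch of one answer that would fit the shape this prompt asks for: a single function plus exactly three pytest tests. It is not the model's captured reply. The test names and the choice to accept leading zeros in an octet are assumptions made for illustration.

```python
def is_valid_ipv4(value: str) -> bool:
    """Return True when value is a dotted-quad IPv4 address such as 192.168.0.1."""
    parts = value.split('.')
    if len(parts) != 4:
        return False
    for part in parts:
        # isascii() plus isdigit() rejects empty parts, signs, whitespace,
        # and non-ASCII digit characters.
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:
            return False
    return True


# Hypothetical test names; the prompt only fixes the three cases to cover.
def test_valid_address():
    assert is_valid_ipv4("192.168.0.1")


def test_out_of_range_octet():
    assert not is_valid_ipv4("256.1.1.1")


def test_non_numeric_input():
    assert not is_valid_ipv4("abc.def.ghi.jkl")
```

This sketch accepts octets with leading zeros (for example "01"); a stricter validator could reject them, and the standard-library ipaddress module is an alternative if hand-rolled parsing is not required.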
ops shell_lines

Nginx Safe Reload

nginx_safe_reload

Return only Bash commands, one per line. Back up /etc/nginx/nginx.conf, validate nginx config, and reload nginx only if validation passes.

Markers: cp /etc/nginx/nginx.conf • nginx -t • systemctl reload nginx • &&

planning four_numbered_steps

YAML Validator Plan

yaml_cli_plan

Return exactly four numbered steps. Plan a Python CLI that scans a git repo for changed YAML files, validates them against a JSON schema, and exits nonzero on failure.

Markers: 1. • 2. • 3. • 4. • JSON schema • git

debugging five_bullets

SSH Lockout Triage

ssh_lockout_triage

Return exactly five bullet points. After hardening, SSH started returning Permission denied (publickey,password). List the safest first checks before changing config. Mention sshd_config, authorized_keys, journalctl, rollback, and PasswordAuthentication.

Markers: sshd_config • authorized_keys • journalctl • rollback • PasswordAuthentication

Models: 1 (small-model runs captured)
Avg latency: 67.24 s (across all models)
Avg tokens/sec: 4.6 tok/s (throughput comparison)
Pass rate: 60.0% (suite success average)
Requirement hit-rate: 90.0% (spec alignment average)

Latency

Lower is better. Sorted fastest to slowest.

Qwen2.5 Coder 3B
67.24 s

Tokens Per Second

Higher is better. Sorted fastest to slowest.

Qwen2.5 Coder 3B
4.56 tok/s

Pass Rate

Higher is better. Derived from the small-model evaluation suite.

Qwen2.5 Coder 3B pass
60.0%

Requirement Hit-Rate

Higher is better. Measures how closely the answer matches the requested shape.

Qwen2.5 Coder 3B hit
90.0%
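The 90.0% figure appears consistent with an unweighted mean of the five per-question percentages reported in the Per-Question Matrix on this page. The sketch below reproduces that arithmetic. The page does not say that the harness aggregates this way, so treat the averaging rule as an assumption.

```python
# Per-question percentages copied from the Per-Question Matrix on this page.
per_question = {
    "disk_guard_bash": 75.0,
    "ipv4_python_tests": 100.0,
    "nginx_safe_reload": 75.0,
    "yaml_cli_plan": 100.0,
    "ssh_lockout_triage": 100.0,
}

# Assumption: the suite-level figure is an unweighted mean over questions.
hit_rate = sum(per_question.values()) / len(per_question)
print(f"{hit_rate:.1f}%")  # prints 90.0%
```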

Per-Question Matrix

Each cell summarizes marker coverage and format fidelity for one model-question pair.

Model | Disk Guard Script | IPv4 Validator | Nginx Safe Reload | YAML Validator Plan | SSH Lockout Triage
Qwen2.5 Coder 3B | 75.0% (no fmt • reply) | 100.0% (fmt • reply) | 75.0% (fmt • reply) | 100.0% (fmt • reply) | 100.0% (no fmt • reply)
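As a rough illustration of what "marker coverage" could mean, the sketch below scores a reply by the share of expected marker strings it contains. The helper name marker_coverage, the plain substring matching, and the sample reply are all hypothetical. The marker list is the Nginx Safe Reload set as read from its question card, and the reply is not the model's captured output.

```python
def marker_coverage(reply: str, markers: list[str]) -> float:
    """Percentage of markers that occur verbatim in the reply (hypothetical helper)."""
    if not markers:
        return 0.0
    hits = sum(1 for marker in markers if marker in reply)
    return 100.0 * hits / len(markers)


# Illustrative reply only: it backs up, validates, and reloads, but on
# separate lines, so the "&&" marker is missing.
reply = (
    "cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak\n"
    "nginx -t\n"
    "systemctl reload nginx\n"
)
markers = ["cp /etc/nginx/nginx.conf", "nginx -t", "systemctl reload nginx", "&&"]

print(marker_coverage(reply, markers))  # prints 75.0
```

Three of four markers match here, which is one way a 75.0% cell could arise. Whether the original harness used substring matching or something stricter is not stated on this page.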
Evaluation Prompts
  1. disk_guard_bash: Return only Bash code. Write a script that checks disk usage for /, prints a human-readable warning, and exits with status 1 when usage is above 85 percent. Requirements: include a shebang, use df -P /, parse the numeric percentage, and keep the script production-safe. [bash_code]
  2. ipv4_python_tests: Return only Python code. Write a function named is_valid_ipv4(value: str) -> bool and include exactly three pytest tests that cover a valid address, an out-of-range octet, and a non-numeric input. [python_code]
  3. nginx_safe_reload: Return only Bash commands, one per line. Back up /etc/nginx/nginx.conf, validate nginx config, and reload nginx only if validation passes. [shell_lines]
  4. yaml_cli_plan: Return exactly four numbered steps. Plan a Python CLI that scans a git repo for changed YAML files, validates them against a JSON schema, and exits nonzero on failure. [four_numbered_steps]
  5. ssh_lockout_triage: Return exactly five bullet points. After hardening, SSH started returning Permission denied (publickey,password). List the safest first checks before changing config. Mention sshd_config, authorized_keys, journalctl, rollback, and PasswordAuthentication. [five_bullets]
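The bracketed tags above name the response shape each prompt expects. For the two countable shapes, a format check could look roughly like the sketch below. Both function names and the exact line-matching rules are assumptions, since the page does not document how the harness decides between "fmt" and "no fmt".

```python
import re


def is_four_numbered_steps(reply: str) -> bool:
    """True when the reply has exactly four non-empty lines numbered 1. to 4."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    expected = ["1.", "2.", "3.", "4."]
    return len(lines) == 4 and all(
        line.startswith(prefix) for line, prefix in zip(lines, expected)
    )


def is_five_bullets(reply: str) -> bool:
    """True when the reply has exactly five non-empty lines, each a bullet."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    # Assumption: "-", "*", and "•" all count as bullet characters.
    return len(lines) == 5 and all(
        re.match(r"[-*•]\s+\S", line) is not None for line in lines
    )
```

A check this strict fails any reply that adds a preamble or a closing sentence, which is one plausible way an answer could mention every required term and still be marked "no fmt".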

Model Rollup

Compact index of the captured model rows for quick cross-checking.