Cockpit
Server management, systemd, logs, terminal
Uptime Kuma
6 monitors — HTTP + Push, Telegram alerts
Beszel
Historical metrics, 5 alerts configured
Services (9/9)
kairos-w: Active
kairos-api: Active
kairos-bridge: Active
kairos-dispatcher: Active
kairos-loop: Active
kairos-proactive: Active
openclaw-gateway: Active
caddy: Active
uptime-kuma: Active
Resources
RAM: 1.5 GB / 16 GB (10%)
CPU: Load 0.33 / 2 vCPU
Disk: 14 GB / 35 GB (41%)
Memory-Optimized droplet · $84/mo · DigitalOcean
Beszel Alerts
CPU > 80%: sustain 5 min
Memory > 90%: sustain 5 min
Disk > 85%: sustain 5 min
Status Down: sustain 1 min
LoadAvg5 > 2.0: sustain 10 min
Network & Ports
Caddy (HTTPS): 443
kairos-api: 7070 (0.0.0.0)
kairos-bridge: 8765
clawmetry: 8080 (0.0.0.0)
openclaw-gateway: 18789
cockpit: 9090
uptime-kuma: 3001
beszel: 8090
OpenClaw
Status: Hardened (S231)
Agents: main, optimizer, test-engineer, auditor
Memory Cap: 1.5 GB (MemoryMax)
Smoke Test: 5/5 pass (6AM daily)
Health: 127.0.0.1:18789/health
Architecture Stack
Kairos W: VPS runtime (6 services)
ANVIL: Governance spine
OpenClaw: Agent execution layer
Manifest: Port mismatch
FORGE: Mac M4 Pro (build machine)
VPS — Service Management
Check all services
systemctl list-units 'kairos-*' --no-pager
OpenClaw status
systemctl --user status openclaw-gateway
Restart kairos-w
sudo systemctl restart kairos-w
Restart OpenClaw
systemctl --user restart openclaw-gateway
OpenClaw logs (live)
journalctl --user -u openclaw-gateway -f --no-pager
Kairos-w logs (50 lines)
sudo journalctl -u kairos-w -n 50 --no-pager
VPS — Health & Monitoring
OpenClaw health
curl -s http://127.0.0.1:18789/health | python3 -m json.tool
Run smoke test
bash ~/scripts/openclaw-smoke-test.sh
Heartbeat status
cat ~/nightagent-sync/ops/health/openclaw-health.json
Resource snapshot
free -h && echo "---" && df -h / && echo "---" && uptime
Top processes
ps aux --sort=-%mem | head -10
ANVIL — Pipeline
Overnight pipeline
python3 anvil/scripts/overnight_loop.py --base-dir .
Dry-run pipeline
python3 anvil/scripts/overnight_loop.py --base-dir . --dry-run
Compile context
python3 anvil/scripts/compile_context.py --base-dir .
Read briefing
python3 anvil/scripts/read_briefing.py --base-dir .
Emit outcome
python3 anvil/scripts/emit_outcome.py --base-dir . --summarize
Trust recalculate
python3 anvil/scripts/trust_engine.py --ledger anvil/runtime/trust/trust_ledger.json --recalc
ANVIL — Testing
ANVIL tests
python3 -m pytest anvil/tests/ -v
APEX Router tests
cd apex-router && python3 -m pytest tests/ -v
Config drift check
python3 anvil/scripts/check_config_drift.py
Dedupe patterns
python3 anvil/scripts/dedupe_patterns.py anvil/runtime/patterns/patterns.jsonl
SSH & Deployment
SSH to VPS
ssh -i .cowork-secrets/vps_ed25519 ubuntu@134.209.50.232
Backup file (pre-op)
bash scripts/backup_file.sh kairos-vps <path>
Sync diff check
diff <(ls ~/nightagent/anvil/scripts/*.py | xargs -n1 basename) <(ls ~/nightagent-sync/anvil/scripts/*.py | xargs -n1 basename)
Open Threads — High Priority
Lock down open ports (HIGH)
Bind kairos-api (7070) and clawmetry (8080) to 127.0.0.1 or put behind Caddy with auth.
Manifest tier routing (HIGH)
Debug port mismatch (NestJS expects 3001, config says 2099). Populate tier assignments. Test 5 delegations.
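For the port lock-down item above, a minimal sketch of how to confirm which listeners are still bound to every interface before and after rebinding. The function name `public_listeners` is hypothetical; it assumes the input is the output of `ss -tlnH`, where column 4 is the local address and port. A wildcard bind does not by itself prove the port is reachable from the internet (a firewall may still block it), so treat the output as a list of candidates to review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print listening TCP sockets bound to all interfaces.
# Reads `ss -tlnH` output on stdin; field 4 is the local address:port.
# Matches IPv4 wildcard (0.0.0.0), the combined wildcard (*), and IPv6 wildcard ([::]).
public_listeners() {
  awk '$4 ~ /^(0\.0\.0\.0|\*|\[::\]):[0-9]+$/ { print $4 }'
}

# Usage on the VPS (not executed by this block):
#   ss -tlnH | public_listeners
```

Once kairos-api and clawmetry are bound to 127.0.0.1, ports 7070 and 8080 should no longer appear in the output.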
Open Threads — Medium
Unify Python runtime (MED)
Standardize 6 services on nightagent/.venv/bin/python (3 runtimes currently).
VPS cleanup (MED)
Execute cleanup commands from s231-optimizer-cleanup-findings.md (~250MB reclaimable).
Log rotation (MED)
Six service logs append to flat files, with no visible logrotate config.
Investigate clawmetry (MED)
9.8% CPU sustained — assess if still necessary.
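For the Python runtime item above, a rough sketch of how to list which interpreter each Kairos unit launches. The helper name `exec_path` is hypothetical; it assumes the input is the output of `systemctl show -p ExecStart --value <unit>`, which systemd prints in the form `{ path=... ; argv[]=... ; ... }`. The unit names in the usage comment are the six kairos-* services from the service list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the executable path from the output of
# `systemctl show -p ExecStart --value <unit>`, e.g.
#   { path=/usr/bin/python3 ; argv[]=/usr/bin/python3 app.py ; ignore_errors=no ; ... }
exec_path() {
  sed -n 's/^{ path=\([^ ;]*\).*/\1/p'
}

# Usage on the VPS (not executed by this block):
#   for unit in kairos-w kairos-api kairos-bridge kairos-dispatcher kairos-loop kairos-proactive; do
#     printf '%s\t%s\n' "$unit" "$(systemctl show -p ExecStart --value "$unit" | exec_path)"
#   done
```

After standardizing, every unit should report the same interpreter path under nightagent/.venv/bin/python.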
Recent Sessions
2026-03-20 — S231
Architecture Review Deep Dive
4 deliverables (review summary, service map, capacity plan, decision log). Beszel 5 alerts configured. Manifest investigation — port mismatch blocker found. OpenClaw systemd hardened (MemoryMax, journald, EnvironmentFile). Daily smoke test deployed (5/5 pass). Cleanup report: 250MB reclaimable.
2026-03-19 — S230
Sprint: 6 Items + VPS Resize
OpenClaw emergency config fix. Email intel integration. VPS resize 8→16GB. loop_runner regex fix. 87 new tests.
2026-03-18 — S229
E2E Delegation Pipe
Delegation pipe fix. Classifier tightening. KHS refactor. WCR hardening. Manifest install. 87 tests.
2026-03-17 — S228
Heartbeat & Deploy Fix
Heartbeat monitor fix (iptables). Deploy bundle sync. Delegation classifier health keyword fix.
Key Discoveries (S231)
3 different Python runtimes across 6 Kairos services
Ports 7070 & 8080 bind 0.0.0.0 (internet-accessible)
Node.js + cgroup MemoryHigh = V8 GC OOM (never use MemoryHigh on Node)
Manifest API: NestJS expects 3001, config says 2099
Beszel agent was already connected — just needed alerts
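To guard against the MemoryHigh finding above regressing, a small sketch that checks a unit's cgroup memory settings. The helper name `memory_high_unset` is hypothetical; it assumes the input is `Key=Value` output from `systemctl show` and that an unset MemoryHigh is reported as `infinity`. It also assumes openclaw-gateway is the Node.js service the finding refers to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl show -p MemoryHigh -p MemoryMax` output
# (Key=Value lines) on stdin. Exit 0 when MemoryHigh is unset ("infinity"),
# exit 1 when MemoryHigh carries any other value.
memory_high_unset() {
  awk -F= '$1 == "MemoryHigh" && $2 != "infinity" { bad = 1 } END { exit bad + 0 }'
}

# Usage on the VPS (not executed by this block):
#   systemctl --user show openclaw-gateway -p MemoryHigh -p MemoryMax | memory_high_unset \
#     || echo "MemoryHigh is set on openclaw-gateway" >&2
```

The check only reads properties; the hard cap itself stays in MemoryMax, as recorded in the OpenClaw section.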
Test Health
ANVIL: 371 pass
ANVIL (pre-existing fail): 14
APEX Router: 134 pass
APEX Router (pre-existing fail): 16
OpenClaw delegation: 62 pass
Regressions: 0