v0.8.1–0.8.4 add hierarchical skill matching, a routing feedback loop, synonym expansion, and a self-contained benchmark suite that verifies the v1.0 perf gates.

The v0.8.0 release shipped the resilience push — retry, recovery, schema versioning, and fan-out routing. This post covers the four releases that followed: v0.8.1 through v0.8.4. Together they turn the routing from “keyword overlap” into something that actually understands your mesh, and they prove the perf numbers we promised in the v1.0 plan.

v0.8.1 — Skill taxonomy

Before v0.8.1, route_work only matched if the description literally contained a role or skill. A description like “build a nextjs app” wouldn’t match an agent registered as role: "frontend", skills: ["react"], because neither “frontend” nor “react” appears in it, even though Next.js is a React framework.

v0.8.1 introduces a hierarchical skill taxonomy. Load a JSON hierarchy like:

{
  "frontend": { "react": ["nextjs", "remix"], "vue": ["nuxt"] },
  "backend":  { "node": ["express", "fastify"], "python": ["django", "flask"] }
}

A description containing “react” now scores the react skill at 1.0, its parent frontend at 0.5, and its children nextjs and remix at 0.5. Weight decays with distance in the hierarchy, and ties are broken by skill name so the ordering is stable. The new src/skill-taxonomy.ts module provides parseSkillTaxonomy, expandSkillWithAncestors, and scoreSkillsAgainstKeywords. Wiring it into route_work is the v0.9 follow-up.
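To make the scoring concrete, here is a minimal TypeScript sketch of hierarchical matching over the JSON above. It is not the src/skill-taxonomy.ts code: buildEdges, scoreSkills, and DECAY are hypothetical names, and the sketch assumes weight halves per hop up the ancestor chain and that direct children sit one hop away, which reproduces the react example.

```typescript
// Hypothetical sketch of hierarchical skill scoring. buildEdges, scoreSkills and
// DECAY are illustrative names, not the src/skill-taxonomy.ts API.

// A node maps a skill to nested sub-skills (object) or to leaf skills (array).
type TaxonomyNode = { [skill: string]: TaxonomyNode | string[] };

interface SkillEdges {
  parent: Map<string, string>; // child -> parent
  children: Map<string, string[]>; // parent -> direct children
}

// Assumption: weight halves per hop, matching "react 1.0, frontend 0.5".
const DECAY = 0.5;

// Flatten the nested JSON into parent/child lookups.
function buildEdges(root: TaxonomyNode): SkillEdges {
  const parent = new Map<string, string>();
  const children = new Map<string, string[]>();
  const walk = (node: TaxonomyNode | string[], parentName?: string): void => {
    const names = Array.isArray(node) ? node : Object.keys(node);
    for (const name of names) {
      if (parentName !== undefined) {
        parent.set(name, parentName);
        children.set(parentName, [...(children.get(parentName) ?? []), name]);
      }
      if (!Array.isArray(node)) walk(node[name], name);
    }
  };
  walk(root);
  return { parent, children };
}

// Score every skill reachable from the keywords; keep the best score per skill.
function scoreSkills(keywords: string[], edges: SkillEdges): Array<[string, number]> {
  const scores = new Map<string, number>();
  const bump = (skill: string, score: number): void => {
    scores.set(skill, Math.max(scores.get(skill) ?? 0, score));
  };
  for (const raw of keywords) {
    const kw = raw.toLowerCase();
    if (!edges.parent.has(kw) && !edges.children.has(kw)) continue; // not in taxonomy
    bump(kw, 1.0);
    // Walk up the ancestor chain, decaying once per hop; `seen` guards against cycles.
    const seen = new Set<string>([kw]);
    let weight = DECAY;
    let p = edges.parent.get(kw);
    while (p !== undefined && !seen.has(p)) {
      seen.add(p);
      bump(p, weight);
      weight *= DECAY;
      p = edges.parent.get(p);
    }
    // Direct children are one hop away.
    for (const child of edges.children.get(kw) ?? []) bump(child, DECAY);
  }
  // Highest score first; ties fall back to skill name so the order is stable.
  return [...scores.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

const taxonomy: TaxonomyNode = JSON.parse(`{
  "frontend": { "react": ["nextjs", "remix"], "vue": ["nuxt"] },
  "backend":  { "node": ["express", "fastify"], "python": ["django", "flask"] }
}`);

const edges = buildEdges(taxonomy);
console.log(scoreSkills("build a react dashboard".split(" "), edges));
// react 1, then frontend, nextjs, remix at 0.5 each (ties sorted by name)
```

A keyword deeper in the tree, such as “nextjs”, scores its grandparent frontend at 0.25 under this assumed decay.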

v0.8.2 — Routing feedback loop

route_work now learns from results. A new MCP tool, record_routing_outcome(agent_id, capability_key, success), records whether a routed task succeeded. Later route_work calls multiply each match’s score by an adjustment computed from the accumulated outcomes, using a Wilson-style smoothing:

adjustment = 1.0 + (successes − failures) / (total + 4) × 0.5

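Here total = successes + failures, so the adjustment always lies strictly between 0.5 and 1.5 and only approaches those bounds as outcomes accumulate; the + 4 in the denominator keeps a handful of early results from swinging the weight. A fresh agent is neutral at 1.0, ten straight successes give about 1.36, and ten straight failures give about 0.64. The RouteMatch interface gained an optional weight field that exposes the adjusted score. Call the tool from your orchestrator after each task: record_routing_outcome(agent_id, "react", taskSucceeded).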
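Here is a minimal sketch of the adjustment and how it re-ranks matches, assuming total = successes + failures. The Outcomes and Match shapes and the applyFeedback helper are hypothetical, not the RouteMatch interface itself. Since record_routing_outcome also takes a capability_key, a fuller version would key history by agent and capability; this one keys by agent only.

```typescript
// Hypothetical sketch of the routing feedback weight; names are illustrative.

interface Outcomes {
  successes: number;
  failures: number;
}

interface Match {
  agentId: string;
  score: number; // raw keyword/taxonomy score
  weight?: number; // score after the feedback adjustment
}

// adjustment = 1.0 + (successes - failures) / (total + 4) * 0.5
function adjustment({ successes, failures }: Outcomes): number {
  const total = successes + failures; // assumption: total counts both outcomes
  return 1.0 + ((successes - failures) / (total + 4)) * 0.5;
}

// Multiply each raw score by its agent's adjustment, then re-rank by weight.
// Keyed by agent id only; a fuller version would key by (agent, capability).
function applyFeedback(matches: Match[], outcomes: Map<string, Outcomes>): Match[] {
  return matches
    .map((m) => {
      const h = outcomes.get(m.agentId) ?? { successes: 0, failures: 0 }; // fresh agent
      return { ...m, weight: m.score * adjustment(h) };
    })
    .sort((a, b) => b.weight - a.weight);
}

const outcomesByAgent = new Map<string, Outcomes>([
  ["agent-a", { successes: 0, failures: 6 }], // adjustment 0.7
  ["agent-b", { successes: 6, failures: 0 }], // adjustment 1.3
]);

const ranked = applyFeedback(
  [
    { agentId: "agent-a", score: 0.8 },
    { agentId: "agent-b", score: 0.7 },
  ],
  outcomesByAgent,
);
console.log(ranked.map((m) => m.agentId)); // agent-b now outranks agent-a
```

A weaker keyword match with a good track record can overtake a stronger match with a poor one, which is the point of the loop.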

v0.8.3 — Synonym expansion

The final piece of the routing story. Even with skill taxonomy, “ui” still didn’t match a frontend agent. v0.8.3 ships a curated synonym table for 30+ common dev terms:

frontend ↔ ui, ux, web, client, browser, spa
database ↔ db, sql, postgres, mysql, mongo, redis, sqlite
auth ↔ authentication, authorization, login, oauth, jwt, session, sso
devops ↔ deploy, ci, cd, docker, k8s, kubernetes, infra
mobile ↔ ios, android, react-native, flutter, swift, kotlin
…and 20+ more

route_work now calls expandKeywordsWithSynonyms() before scoring, so “ui” in a description routes to a frontend agent. Zero network cost, no ML dep, no model download. Override the table with setSynonymOverrides({ myTerm: ["alt1", "alt2"] }). Swap for a true embedding model later by replacing the one function call.
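The expansion step can be sketched as a bidirectional lookup: every term in a group maps to the whole group, so “ui” pulls in “frontend” and vice versa. This is a minimal sketch under assumptions. It includes only an excerpt of the table, expandKeywords and setOverrides are hypothetical stand-ins for expandKeywordsWithSynonyms and setSynonymOverrides, and here an override replaces a built-in group with the same key, which the real function may handle differently.

```typescript
// Hypothetical sketch of synonym expansion. Only three groups from the table are
// included, and the function names are stand-ins for the real exports.

const BUILT_IN: Record<string, string[]> = {
  frontend: ["ui", "ux", "web", "client", "browser", "spa"],
  database: ["db", "sql", "postgres", "mysql", "mongo", "redis", "sqlite"],
  auth: ["authentication", "authorization", "login", "oauth", "jwt", "session", "sso"],
};

let overrides: Record<string, string[]> = {};
let synonymIndex: Map<string, Set<string>> | null = null; // rebuilt lazily after changes

// Assumption: an override with the same key replaces the built-in group.
function setOverrides(next: Record<string, string[]>): void {
  overrides = next;
  synonymIndex = null;
}

// Map every term to the union of all groups it appears in.
function getIndex(): Map<string, Set<string>> {
  if (synonymIndex !== null) return synonymIndex;
  const built = new Map<string, Set<string>>();
  for (const [canonical, alts] of Object.entries({ ...BUILT_IN, ...overrides })) {
    const group = [canonical, ...alts].map((t) => t.toLowerCase());
    for (const term of group) {
      const set = built.get(term) ?? new Set<string>();
      group.forEach((t) => set.add(t));
      built.set(term, set);
    }
  }
  synonymIndex = built;
  return built;
}

// Return the original keywords (lowercased) plus every synonym, without duplicates.
function expandKeywords(keywords: string[]): string[] {
  const out = new Set<string>();
  for (const raw of keywords) {
    const kw = raw.toLowerCase();
    out.add(kw);
    getIndex().get(kw)?.forEach((syn) => out.add(syn));
  }
  return Array.from(out);
}

console.log(expandKeywords(["fix", "the", "UI"]));
// includes "frontend", so a frontend agent can now score against this description

setOverrides({ myTerm: ["alt1", "alt2"] });
console.log(expandKeywords(["alt1"])); // alt1, myterm, alt2
```

Because the expansion happens before scoring, swapping the table lookup for an embedding model later only changes what expandKeywords returns.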

v0.8.4 — Performance benchmarks

The v1.0 plan committed to two perf gates: sub-100ms overhead per agent spawn, and 10k messages on a single fleet with no dropped events. v0.8.4 ships a self-contained benchmark suite that verifies both.

The new benchmark/bench.ts is a single TypeScript file using perf_hooks and the same setLedgerOverride test seam the unit tests use. It measures routeWork at 10/100/1000-agent rosters, sendMessage warm and bulk (10k), saveData/loadData/getInbox on a 1k-agent + 10k-message ledger, and spawn-path bookkeeping.
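The core of a harness like this is small: warm up, time each call with performance.now() from perf_hooks, sort the samples, and read off percentiles. The sketch below is an assumption about the approach, not the contents of benchmark/bench.ts. It uses nearest-rank percentiles and times a hypothetical stand-in workload (a naive overlap score over a synthetic 1000-agent roster) rather than the real routeWork.

```typescript
import { performance } from "node:perf_hooks";

interface Summary {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile over samples already sorted ascending.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// Time fn repeatedly and summarize the per-call latency in milliseconds.
function bench(fn: () => void, iterations = 1000, warmup = 50): Summary {
  for (let i = 0; i < warmup; i++) fn(); // let the JIT settle before measuring
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return { p50: percentile(samples, 50), p95: percentile(samples, 95), p99: percentile(samples, 99) };
}

// Stand-in workload (hypothetical): rank a synthetic 1000-agent roster by skill overlap.
const roster = Array.from({ length: 1000 }, (_, i) => ({
  id: `agent-${i}`,
  skills: [`skill-${i % 50}`, i % 2 === 0 ? "react" : "node"],
}));
const wanted = new Set(["react", "skill-7"]);

const stats = bench(() => {
  roster
    .map((a) => ({ id: a.id, score: a.skills.filter((s) => wanted.has(s)).length }))
    .sort((x, y) => y.score - x.score);
});
console.log(`p50=${stats.p50.toFixed(2)}ms p95=${stats.p95.toFixed(2)}ms p99=${stats.p99.toFixed(2)}ms`);
```

Reporting p95 and p99 alongside the median is what makes a gate like “sub-100ms per spawn” meaningful: the tail, not the average, is what users notice.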

v1.0 perf gates: met

| Operation | p50 | p95 | p99 | Gate |
| --- | --- | --- | --- | --- |
| `routeWork(roster=1000)` | 0.73ms | 0.90ms | 1.19ms | n/a |
| `spawn bookkeeping` | 6.41ms | 8.06ms | 9.01ms | <100ms ✓ |
| `sendMessage × 10000 (bulk)` | n/a | n/a | no drops | no drops ✓ |

The known bottleneck: sendMessage rewrites the full ledger on every call, costing ~3.8ms per message at 10k scale. Batch writes are a v0.9 follow-up. The full results live in BENCHMARKS.md in the repo.
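Why a full rewrite per call hurts: each send serializes the entire ledger, so total bytes written grow roughly with the square of the message count, while buffering and flushing once per batch divides the number of rewrites by the batch size. The sketch below is hypothetical (CountingStore, sendEach, and sendBatched are not agent-mesh code, and the batch size of 100 is arbitrary); it counts what a real write would cost instead of touching disk.

```typescript
// Hypothetical sketch contrasting per-call rewrites with batched flushes.

interface Message {
  from: string;
  to: string;
  body: string;
}

// Stand-in for a file write: records how many writes and bytes a real one would cost.
class CountingStore {
  writes = 0;
  bytesWritten = 0;
  write(serialized: string): void {
    this.writes += 1;
    this.bytesWritten += serialized.length;
  }
}

// Per-call pattern as described: append, then re-serialize the whole ledger.
function sendEach(messages: Message[], store: CountingStore): void {
  const ledger: Message[] = [];
  for (const m of messages) {
    ledger.push(m);
    store.write(JSON.stringify(ledger));
  }
}

// Batched pattern: append in memory, rewrite only when batchSize messages are pending.
function sendBatched(messages: Message[], store: CountingStore, batchSize = 100): void {
  const ledger: Message[] = [];
  let pending = 0;
  for (const m of messages) {
    ledger.push(m);
    pending += 1;
    if (pending === batchSize) {
      store.write(JSON.stringify(ledger));
      pending = 0;
    }
  }
  if (pending > 0) store.write(JSON.stringify(ledger)); // flush the remainder
}

const messages: Message[] = Array.from({ length: 2000 }, (_, i) => ({
  from: `agent-${i % 10}`,
  to: `agent-${(i + 1) % 10}`,
  body: `msg-${i}`,
}));

const perCall = new CountingStore();
const batched = new CountingStore();
sendEach(messages, perCall);
sendBatched(messages, batched);
console.log(perCall.writes, batched.writes); // 2000 20
console.log((perCall.bytesWritten / batched.bytesWritten).toFixed(0)); // bytes ratio, per-call vs batched
```

The trade-off is durability: messages still pending in memory are lost if the process dies before the next flush, so a real batch writer would also need a time-based flush or a write-ahead append.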

Bonus: examples/ directory

Alongside v0.8.4, the repo now ships examples/ with four working fleet configs you can drop into spawn_fleet as-is or save as templates:

code-review-trio.json — Explorer + Analyst + Engineer for PR review
frontend-bug-bash.json — 3-agent fan-out (React, CSS, a11y) for UI bug investigation
pipeline-explore-plan-implement.json — sequential handoff for the JWT refresh case
load-test.json — 10 short workers for retry + heartbeat smoke

Each one is documented in examples/README.md with the exact spawn_fleet payload and a save_fleet_template snippet.

The numbers

6 releases on GitHub: v0.7.0, v0.8.0, v0.8.1, v0.8.2, v0.8.3, v0.8.4
159 tests across 15 files, all passing on the 3 OS × 3 Node CI matrix
15 MCP tools + supporting modules (retry, recovery, taxonomy, feedback, synonyms, bench)
One npm command away from public: npm install -g agent-mesh works as soon as the package is published to the registry

Try it

git clone https://github.com/johnmwhitman/agent-mesh.git \
  ~/.config/opencode/mcp-servers/agent-mesh
cd ~/.config/opencode/mcp-servers/agent-mesh
npm install && npm run build

Or, once published: npm install -g agent-mesh.

Register the server in ~/.config/opencode/opencode.jsonc and restart OpenCode. Then try the frontend-bug-bash example with route_work(description, top_n=3) to see fan-out routing in action.

Full release notes: GitHub releases
Roadmap: ROADMAP.md
Benchmarks: BENCHMARKS.md
Compat: COMPATIBILITY.md

Built this. Open-sourced it. Would love your feedback.

Meshfleet v0.8.4 — Smart routing, feedback, synonyms, and verified perf