Meshfleet v0.4.0 — Resilience
Per-fleet timeouts, a structured event log, and list_fleets. The mesh is starting to watch its own back.
We just shipped v0.4.0 of the Agent Mesh MCP server. Three new tools, one new observability surface, and a meaningful change to the timeout model. This post walks through what changed and why.
What’s new
1. Per-fleet timeouts
Until v0.4.0, the only way to control how long an agent could run was the global AGENT_MESH_AGENT_TIMEOUT_MS env var. That works fine when you want a uniform timeout across all your work. It fails the moment you have a heterogeneous workload — a quick lint agent that should time out in 30 seconds, alongside a deep-reasoning agent that needs the full 30 minutes.
set_fleet_timeout solves that:
await callTool("set_fleet_timeout", {
  fleet_id: "lint-fleet",
  timeout_ms: 30_000,
});
After this call, agents in lint-fleet get killed after 30 seconds. Other fleets in the same MCP server instance are unaffected. The effective timeout for each fleet is resolved as: per-fleet override → env var → 30-minute default. Inspect with get_fleet_timeout_ms(fleet_id).
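The resolution order above can be sketched as a small pure function. This is only an illustration of the lookup chain, not Meshfleet's actual code: `FleetLike`, `resolveTimeoutMs`, and `parsePositiveInt` are hypothetical names, and falling back to the default when the env value is not a positive integer is an assumption.

```typescript
// Hypothetical sketch of the lookup chain: per-fleet override, then env var, then default.
// None of these names come from the Meshfleet source.
const DEFAULT_TIMEOUT_MS = 30 * 60 * 1000; // the 30-minute default

interface FleetLike {
  id: string;
  timeout_ms?: number; // optional per-fleet override
}

// Assumption: an unset, empty, or non-numeric env value is treated as "not set".
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

// envValue stands in for the value of AGENT_MESH_AGENT_TIMEOUT_MS.
function resolveTimeoutMs(fleet: FleetLike, envValue: string | undefined): number {
  if (fleet.timeout_ms !== undefined) return fleet.timeout_ms; // 1. per-fleet override
  return parsePositiveInt(envValue) ?? DEFAULT_TIMEOUT_MS; // 2. env var, 3. default
}
```

Passing the env value in as an argument keeps the function easy to test; a real caller would presumably read it from the process environment once at startup.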
2. Structured event log
We’ve added an append-only NDJSON log at ~/.config/opencode/agent-mesh.events.log. Every fleet_created, agent_spawned, fleet_timeout_set, and spawn_fleet_called event is now written to it. Format:
{"event":"fleet_created","fleet_id":"abc-123","timestamp":1751347200000}
{"event":"agent_spawned","fleet_id":"abc-123","agent_id":"a1","role":"Explorer","timestamp":1751347201000}
{"event":"agent_spawned","fleet_id":"abc-123","agent_id":"a2","role":"Analyst","agent_file":"oracle","timestamp":1751347201100}
This is the foundation for the upcoming v0.5.0 fleet inspector (CLI/TUI). For now, you can tail -f the log to watch fleets spawn in real time, or jq it to query historical runs.
The log is intentionally minimal — just the events, no PII, no prompt contents, no agent output. If you want to add observability without leaking data, this is a safe baseline.
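If you would rather query the log from code than from jq, a minimal parser might look like the sketch below. It assumes every line carries at least `event` and `timestamp`, as in the samples above; the `MeshEvent` type and the helper names are illustrative, not part of the server's API.

```typescript
// Hypothetical shape inferred from the sample lines; other fields are passed through untyped.
interface MeshEvent {
  event: string;
  timestamp: number;
  fleet_id?: string;
  [extra: string]: unknown;
}

function isMeshEvent(value: unknown): value is MeshEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["event"] === "string" && typeof v["timestamp"] === "number";
}

// Parse NDJSON text: one JSON object per line, skipping blank or malformed lines.
function parseEventLog(text: string): MeshEvent[] {
  const events: MeshEvent[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isMeshEvent(parsed)) events.push(parsed);
    } catch {
      // Ignore lines that are not valid JSON (for example, a partially written last line).
    }
  }
  return events;
}

// Count how many times each event type occurs.
function countByEvent(events: MeshEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) counts[e.event] = (counts[e.event] ?? 0) + 1;
  return counts;
}
```

Feeding the file's contents to `parseEventLog` and then `countByEvent` gives a quick per-event tally for a historical run.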
3. list_fleets tool
Until v0.4.0, you could only inspect one fleet at a time via fleet_status({ fleet_id }). The new list_fleets tool returns a summary of every fleet the MCP server knows about:
const { fleets } = await callTool("list_fleets", {});
// → {
//   fleets: [
//     { id: "abc-123", status: "complete", agent_count: 3, agents_complete: 3, agents_failed: 0, agents_running: 0, ... },
//     { id: "def-456", status: "running", agent_count: 2, agents_complete: 1, agents_failed: 0, agents_running: 1, ... },
//   ]
// }
Each summary includes agent counts broken down by status (complete, failed, running). The first thing we built with this was a “what’s running right now?” check in our own internal tooling.
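A "what's running right now?" check over that response can be a simple filter. The sketch below is our guess at a reasonable shape, not the internal tooling itself: `FleetSummary` lists only the fields shown in the example output, and `runningFleets` and `formatFleet` are hypothetical helpers.

```typescript
// Only the fields visible in the example response; the real summary may carry more.
interface FleetSummary {
  id: string;
  status: string;
  agent_count: number;
  agents_complete: number;
  agents_failed: number;
  agents_running: number;
}

// Keep fleets that still have at least one running agent.
function runningFleets(fleets: FleetSummary[]): FleetSummary[] {
  return fleets.filter((f) => f.agents_running > 0);
}

// One-line, human-readable summary of a fleet.
function formatFleet(f: FleetSummary): string {
  return `${f.id}: ${f.agents_running}/${f.agent_count} running, ${f.agents_failed} failed`;
}
```

In practice you would pass the `fleets` array returned by `list_fleets` to `runningFleets` and print each result with `formatFleet`.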
4. Auto-emit events from core paths
createFleet, spawn_fleet, and set_fleet_timeout all emit structured events to the log now. This is what powers the log above — but it also means external tools (custom dashboards, alerting, audit pipelines) can subscribe to the file directly and know exactly when fleets come and go.
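A tool that follows the file as it grows has to cope with chunks that end mid-line. One way to handle that is a small buffer that only yields complete lines, sketched below. `NdjsonLineBuffer` is a hypothetical helper, and how you obtain the appended text (a file watcher, a polling read from a saved offset) is left to the consumer.

```typescript
// Hypothetical helper: accumulates appended text and parses only complete NDJSON lines.
class NdjsonLineBuffer {
  private pending = "";

  // Feed newly appended text; returns one parsed value per complete, valid line.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is "" if the chunk ended on a newline, otherwise a partial line to keep.
    this.pending = lines.pop() ?? "";
    const out: unknown[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      try {
        out.push(JSON.parse(line));
      } catch {
        // Skip lines that are not valid JSON rather than stopping the consumer.
      }
    }
    return out;
  }
}
```

Holding back the trailing partial line means a dashboard or alerting hook never sees half an event, even if a read lands in the middle of a write.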
Breaking changes
None. v0.3.0 callers continue to work unchanged. The new Fleet.timeout_ms field is optional and defaults to “use the env var.”
What’s next
v0.5.0 is where the mesh gets real-time. The plan:
- Heartbeat / watchdog: emit periodic heartbeat events; auto-fail agents that miss N heartbeats (this is the one piece left over from the v0.4 resilience push)
- SSE push notifications: subscribe_inbox(agent_id, callback) for real-time message delivery instead of polling
- Fleet events: emit events on fleet start / agent complete / fleet complete, building on the event log
- CLI inspector: npx agent-mesh inspect <fleet_id> shows a live TUI of running agents
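To make the "miss N heartbeats" rule from the watchdog item concrete, here is one way it could be computed. Nothing below is shipped or final: the function names, the fixed heartbeat interval, and counting misses as whole elapsed intervals are all assumptions made for illustration.

```typescript
// Hypothetical: number of whole heartbeat intervals elapsed since the last beat.
function missedHeartbeats(lastBeatMs: number, nowMs: number, intervalMs: number): number {
  return Math.max(0, Math.floor((nowMs - lastBeatMs) / intervalMs));
}

// Hypothetical: ids of agents that have missed at least maxMissed heartbeats.
// lastBeats maps agent id to the timestamp (ms) of its most recent heartbeat.
function agentsToFail(
  lastBeats: Map<string, number>,
  nowMs: number,
  intervalMs: number,
  maxMissed: number,
): string[] {
  const failed: string[] = [];
  lastBeats.forEach((lastBeatMs, agentId) => {
    if (missedHeartbeats(lastBeatMs, nowMs, intervalMs) >= maxMissed) failed.push(agentId);
  });
  return failed;
}
```

A watchdog built this way would run `agentsToFail` on a timer and mark the returned agents as failed; the real design may well differ.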
Try it
cd ~/.config/opencode/mcp-servers/agent-mesh
git pull
npm install
npm run build
Restart OpenCode. Check the new tools:
// Per-fleet timeout
await callTool("set_fleet_timeout", { fleet_id: "abc-123", timeout_ms: 60_000 });
// List all your fleets
const { fleets } = await callTool("list_fleets", {});
// Watch the event log
// tail -f ~/.config/opencode/agent-mesh.events.log
36 tests passing, MIT licensed, no telemetry, no cloud. Read the full spec →
— The Meshfleet team