Really nice approach. Running parallel agents is the obvious next step once a single agent works, but coordination and cost tracking become the blocker.
One thing I'd add: make cost-per-agent visibility mandatory from day one. When you've got 5-10 agents running, token burn compounds fast. Each agent needs its own cost dashboard.
We bake this into every agent setup now—cost awareness prevents expensive mistakes.
One thing we’ve been thinking about with Amux is that the unit of compute shouldn’t just be the terminal session—it should be the agent itself. That means each pane/session can expose things like:
* tokens in / tokens out
* cumulative run cost
* model + pricing tier
* runtime duration
* optional budget caps
So when you spin up 5–10 agents, you can immediately see which one is burning tokens or looping.
Longer term I’d love for Amux to treat agents a bit like processes in `htop` where you can see resource usage across all agents in one place and kill/restart the expensive ones quickly.
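To make the per-agent fields above concrete, here is a rough Python sketch of what one agent's usage record and an `htop`-style "most expensive first" view could look like. This is not Amux's actual data model: `AgentUsage`, `PRICES`, the model names, and the per-million-token prices are all made up for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical (input, output) prices in USD per million tokens.
# Real pricing depends on the provider and model.
PRICES = {"model-a": (3.00, 15.00), "model-b": (0.25, 1.25)}


@dataclass
class AgentUsage:
    name: str
    model: str
    budget_usd: Optional[float] = None  # optional budget cap
    tokens_in: int = 0
    tokens_out: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, tokens_in: int, tokens_out: int) -> None:
        """Add the token counts reported for one request."""
        self.tokens_in += tokens_in
        self.tokens_out += tokens_out

    @property
    def cost_usd(self) -> float:
        price_in, price_out = PRICES[self.model]
        return (self.tokens_in * price_in + self.tokens_out * price_out) / 1_000_000

    @property
    def runtime_s(self) -> float:
        return time.monotonic() - self.started

    @property
    def over_budget(self) -> bool:
        return self.budget_usd is not None and self.cost_usd > self.budget_usd


def top(agents):
    """Sort agents by cumulative cost, most expensive first."""
    return sorted(agents, key=lambda a: a.cost_usd, reverse=True)
```

A runner would call `record()` after every model response and check `over_budget` before sending the next request, which is what catches the looping agent early.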
Curious how you're currently surfacing cost in your setups — logs, dashboards, or something inline with the agent runtime?
When you're running multiple Claude Code agents (or even just one agent working on complex problems), this task scattering is real.
We solve it differently—all agent state lives in markdown files + git history. Simpler than MCP, more transparent than a database.
The key insight: make debugging obvious. When something breaks, you should be able to grep the logs and understand exactly what happened. Black box task managers fail at this.
100% agree on making debugging obvious. That's exactly why everything is local JSON files: you can literally run `cat tasks.json | jq '.tasks[] | select(.kanban=="in-progress")'` and see exactly what's happening. No database queries, no admin panels, just files.
The activity log captures every state change with timestamps and actor IDs, so when something breaks you can trace the exact sequence. And since agents communicate through inbox.json, there's a full message trail — who delegated what, what reports came back, what decisions were requested.
Curious about your markdown + git approach — do you get merge conflicts when multiple agents write to the same state files simultaneously? That was the main reason I went with JSON + async-mutex instead. Git history gives you great auditability but concurrent writes to the same file need some kind of coordination layer.
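For anyone wondering what that coordination layer looks like, here is a minimal Python analogue of the "JSON + mutex" idea. It is not the async-mutex package (that one is for JavaScript); it uses `asyncio.Lock` plus a write-to-temp-then-rename step. The `StateFile` class and the `{"tasks": []}` default are assumptions for the sketch.

```python
import asyncio
import json
import os
import tempfile


class StateFile:
    """Serializes read-modify-write cycles on one JSON file within a process."""

    def __init__(self, path):
        self.path = path
        self._lock = asyncio.Lock()

    def _read(self):
        try:
            with open(self.path) as f:
                return json.load(f)
        except FileNotFoundError:
            return {"tasks": []}  # assumed initial shape

    def _write(self, state):
        # Write to a temp file in the same directory, then rename it over the
        # original so readers never see a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
        os.replace(tmp, self.path)

    async def update(self, mutate):
        # Without the lock, two tasks could both read the old state and the
        # second write would silently drop the first task's change.
        async with self._lock:
            state = await asyncio.to_thread(self._read)
            mutate(state)
            await asyncio.to_thread(self._write, state)
            return state
```

Note the lock only protects writers inside a single process. Agents running as separate processes would need an OS-level file lock or a single writer process instead.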
Nice approach to the per-key cost cap problem. We built something similar for tracking AI spend across providers - the "one forgotten loop" scenario is real and expensive.
The JSON-on-disk pattern works surprisingly well for this scale. We found the key insight is making costs visible in real-time rather than waiting for end-of-month bills. Even just seeing token counts per request changes behavior.
Curious if you've hit the CLI latency wall yet with concurrent users - that 3-8s overhead compounds fast with a team.
Spot on, real-time cost visibility changes behavior more than any hard limit does. On CLI latency with concurrent users, each request spawns its own process so they don't block each other, but yeah, 3-8s per call isn't great at scale. Works fine for a small team doing internal demos and testing, which is all this is meant for. It becomes very noticeable for chat-type functionality, though: that kind of latency ruins the conversational flow. Anything needing real-time responses or throughput should just use the official APIs directly.
Great patterns here. I'd add one more critical layer that many miss: orchestration state management.
Running multiple agents concurrently (QA, content, conversions, distribution), we hit this exact wall - agents didn't know what other agents had done, creating duplicate work and missed context.
Solved it with a stupidly simple approach:
1. Single TODO.md with "DO NOW" (unblocked), "BLOCKED", "DONE" sections
2. Named output files per agent type (qa-status.md, scout-finds.md, etc)
3. active-tasks.md for crash recovery - breadcrumbs from interrupted runs
4. Daily memory logs with session IDs for searchability
The key: File-based state is deterministic. After a crash, the next agent reads identical input, same decision rules, same output structure. Zero state collision, zero "what was I thinking?"
Deployment: ~8 agents on cron. They wake, read files, work, write results, die. No persistent terminal. No coordination overhead.
This turned "5 terminal tabs with unmanageable logs" into "grep yesterday's log, see exactly what happened."
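A rough sketch of the wake, read, work, write, die loop against a TODO.md like the one described. The exact file layout is an assumption (`## DO NOW` / `## BLOCKED` / `## DONE` headings with `- ` bullets), and `parse_todo`, `render_todo`, and `complete_next` are hypothetical helper names.

```python
from pathlib import Path

SECTIONS = ("DO NOW", "BLOCKED", "DONE")


def parse_todo(text):
    """Group '- ' bullet items under the '## SECTION' heading they follow."""
    tasks = {name: [] for name in SECTIONS}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            heading = stripped[3:].strip().upper()
            current = heading if heading in tasks else None
        elif stripped.startswith("- ") and current:
            tasks[current].append(stripped[2:].strip())
    return tasks


def render_todo(tasks):
    """Write the sections back out in a fixed order."""
    parts = []
    for name in SECTIONS:
        parts.append(f"## {name}")
        parts.extend(f"- {item}" for item in tasks[name])
        parts.append("")
    return "\n".join(parts)


def complete_next(path):
    """Take the first unblocked task, move it to DONE, and return it (or None)."""
    path = Path(path)
    tasks = parse_todo(path.read_text())
    if not tasks["DO NOW"]:
        return None
    task = tasks["DO NOW"].pop(0)
    # ... the agent would do the actual work here ...
    tasks["DONE"].append(task)
    path.write_text(render_todo(tasks))
    return task
```

Because the parse step depends only on the file contents, any agent that wakes up after a crash sees the same task list the previous one did.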
I solve this exact problem running OpenClaw (24/7 agent orchestrator). Here's what actually works after months of iteration:
The core insight: Multiple agents with zero shared state = chaos. Solution: State files are your control plane.
Architecture:
1. HEARTBEAT.md: Every agent reads this before acting. It routes to current priorities and cascades work through a defined sequence (rather than a random agent picking a task)
2. TODO.md: Single source of truth, split into "DO NOW" (unblocked), "BLOCKED" (waiting), and "DONE" (completed) sections. Agents read this, not chat history or logs
3. active-tasks.md: Breadcrumbs from interrupted runs. If an agent crashes, the next session reads what was interrupted and resumes
4. Sub-agent output files (qa-status.md, conversion-tracker.md, etc): Each specialized agent writes structured results. This prevents re-running the same analysis.
The key: State files are deterministic. No "what was I thinking?" after a crash. The next agent reads the same input, same structure, same decision rules.
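The breadcrumb idea behind active-tasks.md could look roughly like this in Python. The one-line-per-task format (`- [session-id] task`) and the function names are assumptions for the sketch, not the actual OpenClaw layout.

```python
from pathlib import Path


def _read(path):
    p = Path(path)
    return p.read_text().splitlines() if p.exists() else []


def mark_started(path, session_id, task):
    """Leave a breadcrumb before starting work."""
    lines = _read(path)
    lines.append(f"- [{session_id}] {task}")
    Path(path).write_text("\n".join(lines) + "\n")


def mark_finished(path, session_id, task):
    """Remove the breadcrumb once the work is done."""
    entry = f"- [{session_id}] {task}"
    lines = [line for line in _read(path) if line != entry]
    Path(path).write_text(("\n".join(lines) + "\n") if lines else "")


def interrupted(path):
    """Return (session_id, task) pairs left behind by runs that never finished."""
    found = []
    for line in _read(path):
        if line.startswith("- [") and "] " in line:
            session_id, task = line[3:].split("] ", 1)
            found.append((session_id, task))
    return found
```

An agent calls `interrupted()` first thing on wake-up. Anything it returns is work a previous session started and never closed out.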
Terminal log management:
- Each agent logs its own session ID (e.g., [agent-001-mar05-0430])
- Output files are prefixed by agent type (qa-, scout-, coo-)
- Logs are sent to daily files (memory/2026-03-05.md) for searchability
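A small sketch of the logging convention above, assuming the session ID and daily file name formats shown in the examples. The helper names and the exact shape of each log line are made up.

```python
from datetime import datetime
from pathlib import Path


def session_id(agent, now):
    """Build an ID like 'agent-001-mar05-0430' from the agent name and start time."""
    return f"{agent}-{now.strftime('%b%d-%H%M').lower()}"


def log_line(memory_dir, sid, message, now):
    """Append one entry to the daily file, e.g. memory/2026-03-05.md."""
    path = Path(memory_dir) / f"{now:%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {now:%H:%M} [{sid}] {message}\n")
    return path


def grep_session(memory_dir, sid):
    """Collect every logged line for one session across all daily files."""
    hits = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        hits += [line for line in path.read_text().splitlines() if f"[{sid}]" in line]
    return hits
```

Tagging every line with the session ID is what makes "grep yesterday's log" work when several agents write to the same daily file.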
Deployment: I run ~8 agents on cron (QA watchdog, scout, content factory, conversion tracker, etc). They wake up, read their input files, write results, die. No persistent terminal. No state collision.
The problem you're hitting is probably: agents don't know what other agents already did. Solution is just: everything runs through a shared TODO.md and writes results to named files.
This is exactly the problem most agent builders hit around turn 10-15. Simple system_prompt + conversation_history patterns work for a few turns, then context drift happens.
The 5-layer approach (chunk → embed → retrieve → rank → synthesize) fixes this: your agent needs to forget smartly, not remember everything.
Key insight from building agents daily: the hard part isn't storage - it's knowing WHEN to chunk, expire, or summarize. Session boundaries matter more than raw persistence.
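As a toy illustration of "forget smartly" (only the summarize/expire part, not the full chunk/embed/retrieve pipeline): keep the most recent turns verbatim and collapse older ones once a token budget is exceeded. The word-count token estimate and the placeholder `summarize` are stand-ins; a real agent would use a tokenizer and a model call.

```python
def estimate_tokens(text):
    # Crude stand-in: roughly one token per whitespace-separated word.
    return len(text.split())


def summarize(turns):
    # Placeholder: a real agent would ask a model to summarize these turns.
    return "Summary of %d earlier turns" % len(turns)


def compact(history, budget, keep_recent=4):
    """Return history unchanged if within budget, else one summary + recent turns."""
    total = sum(estimate_tokens(turn) for turn in history)
    if total <= budget or len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent
```

Running `compact` at session boundaries rather than on every turn is one way to act on the "session boundaries matter" point.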
If you're building for multi-agent workflows, think about concurrent write conflicts early. Much cheaper to design around than retrofit.
Fair question @verdverm! You absolutely could build this yourself. Same reason people use any open source tool instead of writing their own - it's already built, tested, and you can ship your actual project instead. If your AI outputs something better, I'd genuinely love to see it. PRs welcome.