What It Does
Kiln is a Claude Code plugin, installed with `claude plugin marketplace add Fredasterehub/kiln`, that orchestrates 34 named agents across a 7-step pipeline: Onboarding, Brainstorm, Research, Architecture, Build, Validate, and Report. It is implemented entirely as markdown agent definitions and shell scripts. Beyond Claude Code itself, its only requirements are the system tools jq and Node.js 18+; it needs no daemon and no npm dependencies.
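The stated prerequisites (Claude Code, jq, Node.js 18+) can be checked before installing. The following is a minimal sketch, not part of Kiln: the function names are hypothetical, and it assumes the tools are exposed on `PATH` as `claude`, `jq`, and `node`.

```python
import re
import shutil
import subprocess
from typing import List, Optional

MIN_NODE_MAJOR = 18  # from the stated requirement "Node.js 18+"


def parse_node_major(version_output: str) -> Optional[int]:
    """Extract the major version from `node --version` output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def check_prerequisites() -> List[str]:
    """Return human-readable problems; an empty list means every check passed.

    ASSUMPTION: executable names `claude`, `jq`, and `node` on PATH.
    """
    problems = []
    for tool in ("claude", "jq"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    node = shutil.which("node")
    if node is None:
        problems.append("node not found on PATH")
    else:
        result = subprocess.run(
            [node, "--version"], capture_output=True, text=True, check=False
        )
        major = parse_node_major(result.stdout)
        if major is None or major < MIN_NODE_MAJOR:
            found = result.stdout.strip() or "unknown"
            problems.append(f"Node.js {MIN_NODE_MAJOR}+ required, found {found}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```

Returning a list of problems rather than raising on the first failure lets a user see every missing tool in one pass.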
The pipeline uses Claude Code’s native team primitives (TeamCreate, SendMessage, TaskCreate/Update/List) to create persistent agent teams per pipeline step. After an interactive brainstorm phase (where a human approves a vision document), steps 3–7 run without human intervention. State is persisted in .kiln/STATE.md, and the /kiln-fire command resumes from the last recorded position after crashes or interruptions. Optionally, Codex CLI can be integrated to run GPT-5.4 alongside Claude Opus 4.6 for planning and code generation phases.
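To illustrate the resume behavior, here is a minimal sketch of how a last-recorded position could be read back from a state file. It is an assumption-laden illustration, not Kiln's implementation: the layout of `.kiln/STATE.md` is not described in this review, so the `step: <name>` line format and both function names are hypothetical. Only the step names and the "autonomous from Research onward" rule come from the description above.

```python
import re

# Step names and order are taken from the pipeline description.
PIPELINE_STEPS = [
    "Onboarding", "Brainstorm", "Research",
    "Architecture", "Build", "Validate", "Report",
]


def resume_step(state_text: str) -> str:
    """Return the step to resume from, given the text of a state file.

    ASSUMPTION: the file records progress as a line like `step: Build`.
    With no such line, the pipeline starts from the first step.
    """
    match = re.search(
        r"^step:\s*(\w+)\s*$", state_text, flags=re.MULTILINE | re.IGNORECASE
    )
    if match is None:
        return PIPELINE_STEPS[0]
    recorded = match.group(1).capitalize()
    if recorded not in PIPELINE_STEPS:
        raise ValueError(f"unknown step in state file: {recorded!r}")
    return recorded


def runs_unattended(step: str) -> bool:
    """Research and every later step run without human intervention."""
    return PIPELINE_STEPS.index(step) >= PIPELINE_STEPS.index("Research")
```

For example, a state file containing `step: Build` would resume at Build, and `runs_unattended("Build")` is true because Build follows Research.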
Key Features
- 7-step pipeline: Onboarding, Brainstorm, Research, Architecture, Build, Validate, Report — autonomous from Research onward
- 34 named agents with individual responsibilities, scoped file ownership, and behavioral boundaries (examples: Da Vinci for brainstorming facilitation, KRS-One for chunk scoping, Judge Dredd for QA tribunal, Argus for user-flow validation)
- Persistent teams via `TeamCreate`: agents survive across full milestone scope without restarting
- Worker cycling: fresh builder/reviewer pairs per implementation chunk; persistent “minds” (Rakim, Sentinel, Thoth) retain cumulative knowledge
- Three-layer review: paired per-chunk reviewer, dual-model QA tribunal (Ken/Ryu with Denzel reconciliation), Argus user-flow validation (up to 3 correction cycles)
- Just-in-time (JIT) scoping: KRS-One scopes each implementation chunk from current codebase state, not from a stale upfront plan
- TDD built into build loop: builders apply RED-GREEN-REFACTOR by default, no flag required
- Crash-proof state in `.kiln/STATE.md` with resume via `/kiln-fire`
- Brownfield support via Alpha agent auto-detection and routing
- Optional GPT-5.4 integration via Codex CLI for planning and code phases; full Claude-only fallback path available
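The list above describes Argus user-flow validation as allowing up to 3 correction cycles. A bounded validate-and-correct loop of that kind can be sketched as follows. This is an illustration of the general pattern only; the function and parameter names are hypothetical, and nothing here reflects how Kiln's agents are actually wired together.

```python
from typing import Callable, List

MAX_CORRECTION_CYCLES = 3  # from the feature list: "up to 3 correction cycles"


def validate_with_corrections(
    validate: Callable[[], List[str]],
    correct: Callable[[List[str]], None],
    max_cycles: int = MAX_CORRECTION_CYCLES,
) -> bool:
    """Validate, feed failures back for correction, and re-validate.

    `validate` returns a list of failing user flows (empty means success).
    `correct` receives those failures and attempts a fix.
    Returns True if validation passes within `max_cycles` corrections.
    """
    failures = validate()
    cycles = 0
    while failures and cycles < max_cycles:
        correct(failures)
        cycles += 1
        failures = validate()
    return not failures
```

Capping the loop means a flow that never passes ends in a reported failure instead of an unbounded retry, which matters when each cycle spends tokens.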
Use Cases
- Greenfield project development from conversation: Teams wanting to hand off a product vision and let the pipeline produce an architectural plan, implementation, and validation without intervening at each step
- Exploring Claude Code agent team primitives: Developers who want a production example of `TeamCreate`/`SendMessage`/`TaskCreate` usage for building their own orchestration systems
- Iterative full-pipeline testing: AI tooling researchers evaluating autonomous multi-agent development pipelines on real codebases
Adoption Level Analysis
Small teams (<20 engineers): Partial fit. Zero infrastructure overhead: it installs as a plugin and runs in Claude Code. However, a full 7-step pipeline run across 34 agents on a non-trivial project will consume a large number of Claude Opus 4.6 tokens, potentially costing hundreds of dollars per run at commercial rates. The pipeline is currently yellow-status (creator’s own label: “few edge cases remain”), meaning human correction is still likely necessary for production use. Suitable for experimentation or solo developers comfortable with early-stage tooling.
Medium orgs (20-200 engineers): Poor fit currently. No multi-developer coordination model, no audit logging of agent actions, no governance for what the agents execute, and no visibility into per-agent token spend. The pipeline’s assumption of a single orchestrated run does not map well to iterative team development workflows.
Enterprise (200+ engineers): Not suitable. No enterprise governance, centralized configuration management, access control, or compliance features. Enterprise teams would need to build significant wrapper infrastructure.
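The cost concern raised for small teams can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only: Kiln publishes no per-run figures, so the per-agent token counts and per-million-token prices in the example are hypothetical placeholders to be replaced with measured usage and current published rates.

```python
def estimate_run_cost(
    agents: int,
    input_tokens_per_agent: int,
    output_tokens_per_agent: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the dollar cost of one pipeline run.

    Prices are expressed in dollars per million tokens.
    All inputs are caller-supplied assumptions, not measured Kiln data.
    """
    total_input = agents * input_tokens_per_agent
    total_output = agents * output_tokens_per_agent
    return (
        total_input / 1_000_000 * input_price_per_mtok
        + total_output / 1_000_000 * output_price_per_mtok
    )


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 34 agents (from the review), the rest are placeholders.
    cost = estimate_run_cost(
        agents=34,
        input_tokens_per_agent=1_000_000,
        output_tokens_per_agent=100_000,
        input_price_per_mtok=5.0,
        output_price_per_mtok=25.0,
    )
    print(f"estimated cost per run: ${cost:,.2f}")
```

Under these placeholder inputs the estimate lands in the hundreds of dollars, and it scales linearly with both agent count and tokens per agent.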
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| BMAD Method | Document-first spec-driven methodology with six agent personas; does not use Claude Code native agent team primitives | You want structured documentation artifacts (PRD, architecture, stories) and want to control each phase manually |
| Ralph Loop Pattern | Autonomous iterative loop against a PRD task list with context-reset; simpler single-agent model | You want lighter-weight autonomous looping without a 34-agent team structure |
| Claude Flow (Ruflo) | 16+ agent roles, 314 MCP tools, shared memory; heavier orchestration framework | You want a more mature multi-agent framework with MCP integration and a larger community |
| Vibe Kanban | Local Kanban UI for orchestrating multiple Claude Code sessions in parallel worktrees; visual oversight | You want human-in-the-loop visual monitoring of parallel agents rather than autonomous pipeline execution |
| Composio Agent Orchestrator | Dual-layer orchestrator for parallel agent fleets with structured workflows | You need parallel multi-agent coordination with more flexible task decomposition |
Evidence & Sources
- Kiln GitHub Repository (MIT, v1.4.0)
- Claude Code Agent Teams Documentation
- Agent teams unusable in claude-code-action due to SDK session lifecycle (known platform limitation)
- Shared channel for agent teams — pending platform feature
- From Tasks to Swarms: Agent Teams in Claude Code (alexop.dev)
Notes & Caveats
- Yellow/work-in-progress status is the creator’s own label. The repository header explicitly states “pipeline stable, few edge cases remain” — which means edge cases remain. No changelogs or issue trackers document what those edge cases are.
- Single-contributor risk. One GitHub contributor, no public maintainer profile, no organizational backing. Abandonment risk is high compared to BMAD Method (43.6k stars, active community) or Claude Flow (21.6k+ stars, multiple contributors).
- No independent benchmarks or production case studies. 167 stars and 17 forks as of April 2026. No third-party reviews, blog posts, or documented production deployments surfaced in search. All capability claims originate from the repository README.
- Token costs are not disclosed. A full 7-step pipeline run across 34 Claude Opus 4.6 agent sessions on a non-trivial codebase spans many context windows and could plausibly consume millions of tokens, which is consistent with the hundreds-of-dollars-per-run estimate under Adoption Level Analysis. There is no per-pipeline cost estimate in the documentation.
- The “full autonomy” claim contradicts the review architecture. Three-layer review (paired reviewer, QA tribunal, Argus validation) exists because individual agent outputs cannot be trusted. This is appropriate engineering practice, but it means the autonomy claim should be read as “human-out-of-loop after brainstorm” not “correct by default.”
- Claude Code agent team primitives are experimental. The platform features Kiln relies on (`TeamCreate`, `SendMessage`) are documented but flagged as experimental by Anthropic. Known issues include incompatibility with the `claude-code-action` SDK session lifecycle, and absent shared persistent channels. Platform-level changes could break Kiln without notice.
- No support for multi-developer workflows. The pipeline assumes a single orchestrator context. Concurrent use by multiple developers is not described and likely unsupported.
- `--dangerously-skip-permissions` flag is recommended. The README recommends running with this flag to avoid interruption during autonomous steps, which disables Claude Code’s permission prompts for file writes and command execution. This represents a meaningful security surface expansion that teams should evaluate before deployment.