What It Does

Kiln is a Claude Code plugin, installed with the command claude plugin marketplace add Fredasterehub/kiln, that orchestrates 34 named agents across a 7-step pipeline: Onboarding, Brainstorm, Research, Architecture, Build, Validate, and Report. It is implemented entirely as markdown agent definitions and shell scripts. It needs no daemon and no npm dependencies; its only prerequisites are Claude Code itself and the system tools jq and Node.js 18+.

The pipeline uses Claude Code’s native team primitives (TeamCreate, SendMessage, TaskCreate/Update/List) to create persistent agent teams per pipeline step. After an interactive brainstorm phase (where a human approves a vision document), steps 3–7 run without human intervention. State is persisted in .kiln/STATE.md, and the /kiln-fire command resumes from the last recorded position after crashes or interruptions. Optionally, Codex CLI can be integrated to run GPT-5.4 alongside Claude Opus 4.6 for planning and code generation phases.
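The resume behaviour described above can be illustrated with a small sketch. It is not Kiln's implementation: the layout of .kiln/STATE.md is not documented here, so the `current_step:` line and the `remaining_steps` helper are hypothetical. Only the seven step names come from the description.

```python
import re

# The seven step names, in order, as listed in the description above.
PIPELINE_STEPS = [
    "Onboarding", "Brainstorm", "Research", "Architecture",
    "Build", "Validate", "Report",
]


def remaining_steps(state_text: str) -> list[str]:
    """Return the steps still to run, given the text of a state file.

    ASSUMPTION: the state file records its position on a line such as
    'current_step: Build'. Kiln's real STATE.md format may differ.
    """
    match = re.search(r"^current_step:\s*(\w+)\s*$", state_text, re.MULTILINE)
    if match is None:
        # No recorded position: start from the first step.
        return list(PIPELINE_STEPS)
    step = match.group(1)
    if step not in PIPELINE_STEPS:
        raise ValueError(f"unknown step in state file: {step!r}")
    # Resume from the recorded step, inclusive, since it may be unfinished.
    return PIPELINE_STEPS[PIPELINE_STEPS.index(step):]
```

Resuming from the recorded step rather than the one after it is a deliberate choice in this sketch: after a crash, the recorded step may have been interrupted partway through.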

Key Features

7-step pipeline: Onboarding, Brainstorm, Research, Architecture, Build, Validate, Report — autonomous from Research onward
34 named agents with individual responsibilities, scoped file ownership, and behavioral boundaries (examples: Da Vinci for brainstorming facilitation, KRS-One for chunk scoping, Judge Dredd for QA tribunal, Argus for user-flow validation)
Persistent teams via TeamCreate — agents survive across full milestone scope without restarting
Worker cycling: fresh builder/reviewer pairs per implementation chunk; persistent “minds” (Rakim, Sentinel, Thoth) retain cumulative knowledge
Three-layer review: paired per-chunk reviewer, dual-model QA tribunal (Ken/Ryu with Denzel reconciliation), Argus user-flow validation (up to 3 correction cycles)
Just-in-time (JIT) scoping: KRS-One scopes each implementation chunk from current codebase state, not from a stale upfront plan
TDD built into build loop: builders apply RED-GREEN-REFACTOR by default, no flag required
Crash-proof state in .kiln/STATE.md with resume via /kiln-fire
Brownfield support via Alpha agent auto-detection and routing
Optional GPT-5.4 integration via Codex CLI for planning and code phases; full Claude-only fallback path available
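The bounded correction cycle mentioned for Argus validation (up to 3 cycles) follows a common retry pattern, sketched below. This is an illustration of that pattern only: `validate` and `correct` are hypothetical stand-ins, and nothing here reflects how Kiln's agents are actually invoked.

```python
from typing import Callable

MAX_CORRECTION_CYCLES = 3  # "up to 3 correction cycles" per the feature list


def validate_with_corrections(
    validate: Callable[[], list[str]],
    correct: Callable[[list[str]], None],
    max_cycles: int = MAX_CORRECTION_CYCLES,
) -> tuple[bool, int]:
    """Validate, then apply at most max_cycles rounds of correction.

    `validate` returns a list of failures (empty means pass) and `correct`
    attempts to fix them. Both are HYPOTHETICAL callables for illustration.
    Returns (passed, correction_cycles_used).
    """
    cycles = 0
    failures = validate()
    while failures and cycles < max_cycles:
        correct(failures)
        cycles += 1
        failures = validate()  # re-check after every correction
    return (not failures, cycles)
```

The cap matters because it guarantees termination: if validation still fails after the last allowed cycle, the loop stops and reports failure instead of spending tokens indefinitely.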

Use Cases

Greenfield project development from conversation: Teams wanting to hand off a product vision and let the pipeline produce an architectural plan, implementation, and validation without intervening at each step
Exploring Claude Code agent team primitives: Developers who want a production example of TeamCreate/SendMessage/TaskCreate usage for building their own orchestration systems
Iterative full-pipeline testing: AI tooling researchers evaluating autonomous multi-agent development pipelines on real codebases

Adoption Level Analysis

Small teams (<20 engineers): Partial fit. There is zero infrastructure overhead: Kiln installs as a plugin and runs inside Claude Code. However, a full 7-step pipeline run across 34 agents on a non-trivial project will consume a large volume of Claude Opus 4.6 tokens, potentially costing hundreds of dollars per run at commercial rates. The pipeline is currently yellow-status (the creator’s own label: “few edge cases remain”), so human correction is still likely to be necessary for production use. It is suitable for experimentation or for solo developers comfortable with early-stage tooling.

Medium orgs (20-200 engineers): Poor fit currently. No multi-developer coordination model, no audit logging of agent actions, no governance for what the agents execute, and no visibility into per-agent token spend. The pipeline’s assumption of a single orchestrated run does not map well to iterative team development workflows.

Enterprise (200+ engineers): Not suitable. No enterprise governance, centralized configuration management, access control, or compliance features. Enterprise teams would need to build significant wrapper infrastructure.

Alternatives

Alternative	Key Difference	Prefer when…
BMAD Method	Document-first spec-driven methodology with six agent personas; does not use Claude Code native agent team primitives	You want structured documentation artifacts (PRD, architecture, stories) and want to control each phase manually
Ralph Loop Pattern	Autonomous iterative loop against a PRD task list with context-reset; simpler single-agent model	You want lighter-weight autonomous looping without a 34-agent team structure
Claude Flow (Ruflo)	16+ agent roles, 314 MCP tools, shared memory; heavier orchestration framework	You want a more mature multi-agent framework with MCP integration and a larger community
Vibe Kanban	Local Kanban UI for orchestrating multiple Claude Code sessions in parallel worktrees; visual oversight	You want human-in-the-loop visual monitoring of parallel agents rather than autonomous pipeline execution
Composio Agent Orchestrator	Dual-layer orchestrator for parallel agent fleets with structured workflows	You need parallel multi-agent coordination with more flexible task decomposition

Evidence & Sources

Notes & Caveats

Yellow/work-in-progress status is the creator’s own label. The repository header explicitly states “pipeline stable, few edge cases remain”, which concedes that known edge cases are unresolved. No changelog or issue tracker entry documents what those edge cases are.
Single-contributor risk. One GitHub contributor, no public maintainer profile, no organizational backing. Abandonment risk is high compared to BMAD Method (43.6k stars, active community) or Claude Flow (21.6k+ stars, multiple contributors).
No independent benchmarks or production case studies. 167 stars and 17 forks as of April 2026. No third-party reviews, blog posts, or documented production deployments surfaced in search. All capability claims originate from the repository README.
Token costs are not disclosed. A full 7-step pipeline run across 34 Claude Opus 4.6 agent sessions on a non-trivial codebase could plausibly consume millions of tokens across multiple context windows; tens of thousands of tokens would not fill even a single context window. There is no per-pipeline cost estimate in the documentation.
The “full autonomy” claim is in tension with the review architecture. Three-layer review (paired reviewer, QA tribunal, Argus validation) exists because individual agent outputs cannot be trusted. This is appropriate engineering practice, but it means the autonomy claim should be read as “human-out-of-loop after brainstorm”, not “correct by default.”
Claude Code agent team primitives are experimental. The platform features Kiln relies on (TeamCreate, SendMessage) are documented but flagged as experimental by Anthropic. Known issues include incompatibility with the claude-code-action SDK session lifecycle and the absence of shared persistent channels. Platform-level changes could break Kiln without notice.
No support for multi-developer workflows. The pipeline assumes a single orchestrator context. Concurrent use by multiple developers is not described and likely unsupported.
The --dangerously-skip-permissions flag is recommended. The README recommends running with this flag to avoid interruption during autonomous steps, which disables Claude Code’s permission prompts for file writes and command execution. This meaningfully expands the attack surface, and teams should evaluate it before deployment.
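Since the documentation gives no per-pipeline cost estimate, the token-cost caveat above can only be checked with back-of-the-envelope arithmetic. The sketch below shows that arithmetic. The prices and token counts are placeholders chosen to illustrate scale; they are not Anthropic's published rates or measurements of a Kiln run, and the function name is hypothetical.

```python
def estimate_run_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate the dollar cost of one pipeline run from token counts."""
    input_cost = input_tokens * usd_per_million_input / 1_000_000
    output_cost = output_tokens * usd_per_million_output / 1_000_000
    return input_cost + output_cost


# PLACEHOLDER prices (USD per million tokens) and token counts.
# Substitute current list prices and measured usage before relying on this.
PRICE_IN, PRICE_OUT = 5.0, 25.0

small = estimate_run_cost_usd(50_000, 10_000, PRICE_IN, PRICE_OUT)
large = estimate_run_cost_usd(20_000_000, 4_000_000, PRICE_IN, PRICE_OUT)
print(f"tens of thousands of tokens: ${small:.2f}")
print(f"tens of millions of tokens:  ${large:.2f}")
```

Under these placeholder figures, a run measured in tens of thousands of tokens costs well under a dollar, while a cost in the hundreds of dollars requires token volumes in the tens of millions. Per-agent token logging, which Kiln does not currently expose, would be needed to replace the guesses with real numbers.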
