What It Does
Caveman is a Claude Code skill — a small Agent Skills-formatted package — that instructs the AI agent to respond in minimal, caveman-style language. It strips filler phrases (“I’d be happy to help…”), hedging language, articles (a, an, the), and pleasantries from prose responses while leaving code blocks, technical terms, error messages, file paths, commands, and URLs completely unchanged. The result is shorter, denser responses that the project claims preserve full technical accuracy.
The project also ships a companion Python utility called caveman-compress that applies similar compression to CLAUDE.md project memory files, reducing the input token cost of loading project context at session start. The tool creates a backup (CLAUDE.original.md) before overwriting, which is a responsible design choice given the risk of lossy compression.
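The backup-before-overwrite behavior is worth making concrete. This is a minimal sketch of that flow, not the actual caveman-compress implementation — the function name, stop-word list, and compression logic here are illustrative assumptions; only the backup filename (CLAUDE.original.md) comes from the project:

```python
import re
import shutil
from pathlib import Path

def compress_claude_md(path: str = "CLAUDE.md") -> None:
    """Back up CLAUDE.md, then overwrite it with a compressed version.

    Illustrative sketch only; the real caveman-compress tool's
    internals may differ.
    """
    src = Path(path)
    backup = src.with_name("CLAUDE.original.md")
    shutil.copy2(src, backup)  # keep a pristine copy before lossy compression

    text = src.read_text()
    # Toy compression: drop common filler words, but only outside
    # fenced code blocks (split keeps the fences as separate parts).
    parts = re.split(r"(```.*?```)", text, flags=re.DOTALL)
    filler = re.compile(r"\b(?:the|a|an|please|simply|just)\b\s*", re.IGNORECASE)
    compressed = "".join(
        p if p.startswith("```") else filler.sub("", p) for p in parts
    )
    src.write_text(compressed)
```

The key design point the sketch captures: the original file survives as a sibling backup, so a compression pass that degrades agent behavior can be reverted by restoring CLAUDE.original.md.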
Key Features
- Three compression levels: Lite (minimal filler removal, grammatically coherent), Full (default; dropped articles, fragment sentences), Ultra (maximum compression with abbreviations)
- Selective preservation: Code blocks, inline code, technical terms, error messages, URLs, file paths, and commit messages are explicitly excluded from compression
- Cross-agent compatibility: Packaged as an Agent Skills module; activates via `npx skills add JuliusBrussee/caveman` and works across Claude Code, GitHub Copilot, Cursor, Windsurf, Cline, and 35+ other agents. Also available as a Codex plugin (`$caveman` trigger)
- Natural language triggers: Activated by `/caveman`, “talk like caveman,” “caveman mode,” or “less tokens please.” Deactivated with “stop caveman” or “normal mode”
- Caveman Compress companion tool: Python utility that compresses CLAUDE.md input context files, self-reporting ~45% reduction in project memory file token counts
- Reasoning token agnostic: Caveman affects only output prose — Claude’s reasoning/thinking tokens (if extended thinking is enabled) are not reduced
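The levels and the selective preservation can be thought of together as progressively larger stop-lists applied everywhere except protected spans. The toy sketch below illustrates that mental model only — the actual skill defines its behavior as prose instructions to the agent, not as code, and these word lists are my own assumptions:

```python
import re

# Illustrative stop-lists per level (assumed, not the skill's actual rules).
LEVELS = {
    "lite":  r"\b(?:I'd be happy to|certainly|of course)\b,?\s*",
    "full":  r"\b(?:I'd be happy to|certainly|of course|the|a|an)\b,?\s*",
    "ultra": r"\b(?:I'd be happy to|certainly|of course|the|a|an|is|are|that|which)\b,?\s*",
}

def caveman(text: str, level: str = "full") -> str:
    """Strip filler outside inline-code spans, per the chosen level."""
    pattern = re.compile(LEVELS[level], re.IGNORECASE)
    # Split on `inline code` spans, keeping them as separate parts
    # so they pass through verbatim.
    parts = re.split(r"(`[^`]*`)", text)
    return "".join(p if p.startswith("`") else pattern.sub("", p) for p in parts)
```

For example, `caveman("I'd be happy to explain the `npm install` command", "full")` drops the pleasantry and the article while leaving the command span untouched.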
Use Cases
- Interactive CLI sessions: Developers using Claude Code interactively who want shorter, faster responses during debugging or exploration sessions. Output tokens are a meaningful fraction of cost and latency in non-agentic interactive use.
- High-volume developer tooling: Teams running many short Claude Code sessions per day where output verbosity is a noticeable cost factor.
- Learning token dynamics: Teams wanting a concrete, installable demonstration of how output verbosity affects token counts — useful as an educational tool even if not production-deployed.
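A back-of-envelope calculation makes the token-dynamics point concrete. The prices and session shapes below are illustrative assumptions (not measurements from the project), but they show why halving output tokens matters in interactive use and barely registers in agentic use:

```python
# Illustrative $/M-token prices and session shapes; adjust for your model.
PRICE_IN, PRICE_OUT = 3.00, 15.00  # assumed input/output rates

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost in dollars for one session."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# Interactive session: short context, relatively verbose answers.
interactive = session_cost(input_tokens=8_000, output_tokens=4_000)
# Agentic session: huge context from tool results, modest output.
agentic = session_cost(input_tokens=400_000, output_tokens=10_000)

# Halving output tokens (roughly what caveman targets):
interactive_saved = session_cost(8_000, 2_000)
agentic_saved = session_cost(400_000, 5_000)
```

Under these assumed numbers, halving output cuts the interactive session's cost by over a third, but the agentic session's by only a few percent — the input side dominates there, which is the point the Adoption Level Analysis below turns on.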
Adoption Level Analysis
Small teams (<20 engineers): Marginally fits — mostly as a developer quality-of-life preference rather than a cost reduction mechanism. Token savings are real but modest in absolute terms for small teams.
Medium orgs (20–200 engineers): Does not meaningfully fit. At this scale, input token accumulation across long agent conversations, tool call results, and context windows dominates cost — not output verbosity. Caveman addresses the wrong part of the token budget for agentic workloads.
Enterprise (200+ engineers): Does not fit. Enterprise LLM cost optimization requires gateway-level routing, caching, and model tiering — not style constraints on individual sessions. Caveman is not a substitute for structured cost governance.
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| LiteLLM | Gateway-level token budget enforcement, model routing, cost tracking | You need organization-wide token cost control |
| Prompt engineering (system prompt) | Craft a concise system prompt once per deployment | You want brevity without installing a skill dependency |
| LLM Gateway Pattern | Architectural pattern for proxy-based cost governance | You need cross-team, cross-model cost enforcement |
| LLMlingua (Microsoft) | Algorithmic prompt compression preserving semantic information | You need input token compression with measurable accuracy guarantees |
Evidence & Sources
- GitHub repository — JuliusBrussee/caveman
- Hacker News thread — community discussion and criticisms
- Caveman companion site with benchmark data
- Brevity Constraints Reverse Performance Hierarchies in Language Models (arXiv:2604.00025) — cited by the project; found brevity constraints improved accuracy by 26pp on certain benchmarks
- SimpleNews coverage
Notes & Caveats
- Self-reported benchmarks only. The author disclosed on Hacker News that the headline “~75%” figure (later revised to “~65% average”) “needs proper benchmarking before credibility.” All benchmark data is from a single run with no variance statistics, baseline controls, or independent replication.
- Output tokens are usually not the bottleneck. In agentic Claude Code workflows, the input context window (tool call results, file contents, conversation history) dominates token costs — not output verbosity. Multiple Hacker News commenters flagged this as the fundamental limitation of the approach. The Caveman Compress tool partially addresses this, but also carries the risk of lossy compression degrading agent behavior.
- Style constraints may affect reasoning quality. Constraining an LLM to respond in a particular style can reduce the quality of multi-step reasoning — the model may “think” in fewer tokens than optimal. The cited arXiv paper (2604.00025) does support brevity improving accuracy in some cases, but that paper studied decoding-level constraints, not style-mimicry prompt instructions, so applicability is indirect.
- Compression of CLAUDE.md is a human DX risk. Compressed caveman-style project instructions are harder for humans to read, maintain, and debug when agent behavior deviates. The backup file mitigates data loss but not cognitive load.
- Started as a joke. The author explicitly described the project as joke-originated on Hacker News. It has since gained genuine traction (coverage in multiple tech media outlets) but the engineering rigor expected of a production cost-reduction tool is not present.
- Minimal security surface. The core skill is a markdown file with no executable code. The caveman-compress companion is a Python script that modifies local files — read the source before running it.