skrun: Turning Agent Skills into REST APIs
skrun solves a genuine friction point, bridging the SKILL.md authoring experience to a callable HTTP endpoint, but ships today as a local-only tool with no cloud runtime, no authentication beyond a static bearer token, and none of the isolation guarantees that production requires.
The Problem
SKILL.md has become the lingua franca of AI agent capabilities. Anthropic introduced it, OpenAI adopted it for Codex CLI, Microsoft shipped it in GitHub Copilot, and as of early 2026 there are over 490,000 skills published across three major marketplaces. The format works well inside AI coding assistants: Claude Code discovers skills on the filesystem, loads them on demand, and executes them in a VM with full bash access. The problem is that the design is entirely local and model-specific.
If you want to expose a skill as a service — callable by a CI pipeline, another agent, a backend job, or a third-party integration — you have to wrap it yourself. That means writing a server, handling model API keys, normalizing inputs and outputs, managing conversation state between runs, and wiring up fallback logic when a model is unavailable. None of that is hard, but it is all boilerplate that every team reinvents independently.
skrun’s pitch is that this boilerplate should not need to be reinvented. Define the skill, write an agent.yaml that declares the I/O contract, and get a POST /run endpoint back.
What It Is
skrun is an open-source TypeScript CLI and local development server that turns agent skills into typed REST APIs. The core primitive is agent.yaml: a configuration file that sits alongside a SKILL.md file and declares the model to use, the input schema, the output schema, permission boundaries, state persistence, and test cases. The CLI generates a local HTTP server that accepts POST /run with a JSON body matching the input schema and returns a JSON object matching the output schema.
The framework is model-agnostic by design. A single agent.yaml can specify a primary model (say, Gemini 2.5 Flash) and a fallback (for instance, a model from a different provider). It supports Anthropic, OpenAI, Google, Mistral, and Groq. The fallback mechanism is automatic: if the primary call fails or times out, the runtime retries against the fallback configuration without caller involvement.
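The fallback behavior described above can be sketched in a few lines. This is a minimal illustration, not skrun's implementation: the `ModelRef` and `CallModel` types and the `runWithFallback` and `withTimeout` helpers are hypothetical names, and the model call itself is injected so the sketch stays provider-neutral.

```typescript
// Hypothetical types for illustration; skrun's real internals may differ.
type ModelRef = { provider: string; name: string };
type CallModel = (model: ModelRef, prompt: string) => Promise<string>;

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Try each model in order and return the first successful result.
// The caller never sees which attempt failed unless every model fails.
async function runWithFallback(
  chain: ModelRef[],
  prompt: string,
  call: CallModel,
  timeoutMs: number,
): Promise<{ model: ModelRef; text: string }> {
  const errors: string[] = [];
  for (const model of chain) {
    try {
      const text = await withTimeout(call(model, prompt), timeoutMs);
      return { model, text };
    } catch (err) {
      errors.push(`${model.provider}/${model.name}: ${String(err)}`);
    }
  }
  throw new Error(`all models failed: ${errors.join("; ")}`);
}
```

Passing the primary model first and the fallback second reproduces the "retry without caller involvement" behavior. Note that a timed-out call is abandoned rather than cancelled in this sketch, so a slow primary may still consume tokens.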
State is handled through a key-value store local to each agent. The SEO audit example uses this to compare the current run’s score against the previous run’s score, storing last_score, last_audit_date, and audit_count with a 30-day TTL. The state is scoped per agent name, not per caller, which means concurrent callers share state. That is a footgun for anything beyond single-user use cases.
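To make the shared-state footgun concrete, here is a small simulation. It assumes nothing about skrun's storage code: the in-memory `store`, the `read` and `write` helpers, and the agent name `seo-audit` are all hypothetical stand-ins for an agent-scoped key-value store.

```typescript
// Hypothetical in-memory stand-in for an agent-scoped KV store.
type AgentState = { audit_count: number };
const store = new Map<string, AgentState>();

// Return a copy so each caller works on its own snapshot, as with a file read.
function read(key: string): AgentState {
  const current = store.get(key);
  return { audit_count: current ? current.audit_count : 0 };
}

function write(key: string, state: AgentState): void {
  store.set(key, state);
}

// Two callers hit the same agent and both read before either writes.
function simulateInterleavedRuns(agent: string): number {
  const snapshotA = read(agent); // caller A starts a run
  const snapshotB = read(agent); // caller B starts before A finishes
  write(agent, { audit_count: snapshotA.audit_count + 1 });
  write(agent, { audit_count: snapshotB.audit_count + 1 }); // overwrites A's update
  return read(agent).audit_count;
}

// Keying by agent and caller keeps each caller's counter separate.
function simulateCallerScopedRuns(agent: string, callers: string[]): number[] {
  for (const caller of callers) {
    const key = `${agent}:${caller}`;
    write(key, { audit_count: read(key).audit_count + 1 });
  }
  return callers.map((caller) => read(`${agent}:${caller}`).audit_count);
}
```

In the interleaved case two runs complete but the counter advances only once. Scoping the key by caller is one possible mitigation if you wrap skrun yourself; it is not something the article claims skrun supports.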
The architecture anticipates cloud deployment through a RuntimeAdapter interface, but v0.1 ships no cloud runtime. skrun deploy exists as a CLI command but its behavior depends on infrastructure that is not yet public.
How It Works
agent.yaml: The Contract Layer
The agent.yaml is where skrun does its real work. It separates the what (SKILL.md: instructions, workflow, examples) from the how (agent.yaml: model, schema, permissions, state, tests). Here is a representative example from the code-review agent:
name: dev/code-review
version: 1.0.0
model:
  provider: google
  name: gemini-2.5-flash
  # fallback: used automatically when the primary call fails or times out
  fallback:
    provider: anthropic
    name: claude-sonnet-4-5
inputs:
  - name: code
    type: string
    required: true
    description: The code to review
outputs:
  - name: review
    type: string
  - name: issues
    type: array
  - name: score
    type: number
permissions:
  network: []        # no outbound network calls
  filesystem: read-only
  secrets: []
runtime:
  timeout: 60s
  sandbox: strict    # sandbox mode (local isolation only)
state:
  type: none
  ttl: 30d
tests:
  - name: basic-review
    input:
      code: "function add(a, b) { return a + b; }"
    assert: output.score >= 0
The permissions block is declarative intent, not enforcement. In the local runtime, nothing prevents the model from making network calls if the instructions ask it to. sandbox: strict signals intent but the actual enforcement depends on the runtime adapter — which, in v0.1, is the local environment.
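The gap between declaration and enforcement is easier to see next to what enforcement would involve. The sketch below is hypothetical: `assertNetworkAllowed` is not a skrun function, and the `Permissions` shape is only assumed to mirror the permissions block in agent.yaml. It shows the kind of check a runtime adapter would need to run before every outbound request for the declaration to mean anything.

```typescript
// Assumed to mirror the `permissions` block in agent.yaml.
type Permissions = { network: string[]; filesystem: string; secrets: string[] };

// Hypothetical guard: a runtime would have to call this before any tool
// performs a network request. Nothing equivalent is described for v0.1.
function assertNetworkAllowed(permissions: Permissions, url: string): void {
  const host = new URL(url).hostname;
  if (!permissions.network.includes(host)) {
    throw new Error(`network access to ${host} is not declared in agent.yaml`);
  }
}
```

With `network: []`, every call to the guard throws. Without such a guard in the execution path, the same declaration has no effect, which is the situation the local runtime is in.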
MCP Server Integration
The web-scraper example shows how skrun delegates browser automation to an MCP server rather than handling it directly. The agent.yaml specifies:
mcp_servers:
  - name: browser
    transport: stdio
    command: npx
    args:
      - "-y"
      - "@playwright/mcp"
      - "--headless"
The skrun runtime starts the MCP server as a child process via stdio transport, then exposes its tools to the LLM during the run. This is the same pattern Claude Desktop uses for MCP integration. The implication is that skrun agents can consume any MCP-compatible tool without bespoke SDK work — including file systems, databases, browser automation, and custom internal tools.
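For readers unfamiliar with the stdio transport, the wire format is newline-delimited JSON-RPC 2.0: the host writes one JSON message per line to the child's stdin and reads responses line by line from its stdout. The sketch below covers only that framing. `encodeRequest` and `LineDecoder` are hypothetical helper names, and process spawning and the MCP initialization handshake are deliberately left out.

```typescript
type JsonRpcRequest = { jsonrpc: "2.0"; id: number; method: string; params?: unknown };
type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

// Serialize one request as a single line for the child process's stdin.
function encodeRequest(id: number, method: string, params?: unknown): string {
  const message: JsonRpcRequest = { jsonrpc: "2.0", id, method };
  if (params !== undefined) message.params = params;
  return JSON.stringify(message) + "\n";
}

// Accumulate stdout chunks and return every complete message seen so far.
// Chunks can split a message anywhere, so a partial line stays buffered.
class LineDecoder {
  private buffer = "";

  push(chunk: string): JsonRpcResponse[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    const rest = lines.pop();
    this.buffer = rest === undefined ? "" : rest;
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as JsonRpcResponse);
  }
}
```

A host would send `encodeRequest(1, "tools/list")` after initialization and feed stdout data into `LineDecoder.push` to collect the tool descriptions it then exposes to the model.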
State Persistence
The SEO audit example demonstrates stateful agents. The agent.yaml defines:
state:
  type: kv
  ttl: 30d
The SKILL.md can reference a _state variable that persists across invocations. The agent reads _state.last_score on entry and writes back an updated score at completion. The runtime handles serialization and TTL-based expiry. In practice this is a JSON file on disk in the local runtime — simple, fragile at concurrency, and not suitable for distributed deployments.
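A TTL-based key-value store of the kind described here takes little code. The sketch below is an assumption-laden illustration rather than skrun's storage layer: `parseTtl` and `TtlStore` are hypothetical names, the duration grammar (`s`, `m`, `h`, `d`) is inferred from values like 30d and 60s in the examples, and disk persistence is omitted.

```typescript
type Entry = { value: unknown; expiresAt: number };

// Parse durations such as "30d" or "60s" into milliseconds (assumed grammar).
function parseTtl(ttl: string): number {
  const match = /^(\d+)([smhd])$/.exec(ttl);
  if (!match) throw new Error(`unsupported ttl: ${ttl}`);
  const unitMs: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 };
  return Number(match[1]) * unitMs[match[2]];
}

class TtlStore {
  private entries = new Map<string, Entry>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Expired entries are dropped lazily, on read.
  get(key: string): unknown {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Every write restarts the TTL for that key.
  set(key: string, value: unknown): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

An agent run would call `get("last_score")` on entry and `set("last_score", score)` at completion. Nothing in this structure serializes concurrent read-modify-write cycles, which is why the approach is fragile under concurrency.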
The Development Loop
# Initialize a new agent (creates agent.yaml + SKILL.md scaffold)
skrun init my-agent
# Import an existing skill
skrun init --from-skill path/to/SKILL.md
# Start local dev server with hot-reload on http://localhost:3000
skrun dev
# Run declared tests
skrun test
# Package as .agent bundle
skrun build
# Build + push + return live URL (cloud runtime not yet available)
skrun deploy
The hot-reload development server means you can edit SKILL.md and immediately call POST /run against the local endpoint to see the effect. The skrun test command executes the test cases declared in agent.yaml against the live model — these are end-to-end tests, not unit tests. They cost real tokens per run.
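The shape of such an end-to-end test loop can be sketched as follows. How skrun actually evaluates the assert expression is not documented in this article, so `evaluateAssertion` is a guess that uses the JavaScript `Function` constructor. The `TestCase`, `RunAgent`, and `runTests` names are likewise hypothetical.

```typescript
type TestCase = { name: string; input: Record<string, unknown>; assert: string };
type RunAgent = (input: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Evaluate an expression such as "output.score >= 0" with `output` in scope.
// The Function constructor executes arbitrary code: acceptable only for
// assertions you wrote yourself, never for untrusted agent.yaml files.
function evaluateAssertion(expression: string, output: Record<string, unknown>): boolean {
  const check = new Function("output", `return (${expression});`);
  return Boolean(check(output));
}

// Run every declared test sequentially against the agent.
async function runTests(
  tests: TestCase[],
  run: RunAgent,
): Promise<{ name: string; passed: boolean }[]> {
  const results: { name: string; passed: boolean }[] = [];
  for (const test of tests) {
    const output = await run(test.input); // in skrun this is a real, billed model call
    results.push({ name: test.name, passed: evaluateAssertion(test.assert, output) });
  }
  return results;
}
```

Because `run` is injected, the same loop can be pointed at a recorded response during CI and at the live model before a release, which is one way to limit token spend.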
Calling the API
Once running, agents respond to:
POST /api/agents/{namespace}/{agent-name}/run
Authorization: Bearer <token>
Content-Type: application/json

{
  "code": "function add(a, b) { return a + b; }"
}
Response:
{
  "review": "Function is correct but lacks input validation...",
  "issues": ["no type checking", "no error handling for NaN"],
  "score": 72
}
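From a caller's side, the request above could be assembled like this. `buildRunRequest` is a hypothetical helper, and the base URL is assumed to be the local dev server address mentioned earlier (http://localhost:3000). The function only builds the request so it can be inspected or handed to any HTTP client.

```typescript
type RunRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Build the request for an agent named "namespace/agent-name".
function buildRunRequest(
  baseUrl: string,
  agent: string,
  token: string,
  input: Record<string, unknown>,
): RunRequest {
  const parts = agent.split("/");
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`expected "namespace/agent-name", got "${agent}"`);
  }
  const path = parts.map((part) => encodeURIComponent(part)).join("/");
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/agents/${path}/run`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(input),
  };
}
```

With the built-in `fetch`, the result can be sent as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, followed by `response.json()` to read the typed output.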
The typed I/O contract is the key value here. Callers do not need to parse LLM output — skrun coerces the model’s response into the declared output schema.
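What "coercing into the declared schema" can mean in practice is sketched below. This is one plausible approach, not skrun's actual algorithm: `coerceOutput` is a hypothetical function, and the three field types are the ones used in the code-review example.

```typescript
type FieldType = "string" | "number" | "array";
type OutputField = { name: string; type: FieldType };

// Extract a JSON object from raw model text and shape it to the schema.
// Undeclared fields are dropped; missing or mistyped fields raise an error.
function coerceOutput(raw: string, schema: OutputField[]): Record<string, unknown> {
  // Models often wrap JSON in prose, so take the outermost braces.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("no JSON object in model response");
  const parsed = JSON.parse(raw.slice(start, end + 1)) as Record<string, unknown>;

  const result: Record<string, unknown> = {};
  for (const field of schema) {
    const value = parsed[field.name];
    if (field.type === "number") {
      // Accept numeric strings such as "72", a common model quirk.
      const n = typeof value === "string" ? Number(value) : value;
      if (typeof n !== "number" || Number.isNaN(n)) throw new Error(`${field.name}: expected number`);
      result[field.name] = n;
    } else if (field.type === "array") {
      if (!Array.isArray(value)) throw new Error(`${field.name}: expected array`);
      result[field.name] = value;
    } else {
      if (typeof value !== "string") throw new Error(`${field.name}: expected string`);
      result[field.name] = value;
    }
  }
  return result;
}
```

Failing loudly on a mismatch, rather than returning a partial object, is what lets callers skip defensive parsing. A runtime could also retry the model call on failure, though the article does not say whether skrun does.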
In Practice
The primary audience for skrun today is developers who already have SKILL.md files and want to expose them as HTTP endpoints without writing a server. If you have built a collection of skills for Claude Code or Codex CLI, skrun gives you a way to call them from non-agent contexts — CI jobs, backend services, other agents — without duplicating the logic.
The multi-model fallback is genuinely useful for teams that want model-agnostic resilience. Rather than binding a skill to one provider, you can declare a preference chain and let the runtime handle switching. This is hard to get right manually — rate limits, transient failures, and API versioning all create edge cases.
The MCP integration is the most architecturally interesting feature. It means a skrun agent can consume a Playwright browser, a database query tool, or an internal API via MCP without custom SDK work. The skill focuses on what to do; MCP handles the how.
When to Use It
- You have existing SKILL.md files and want a lightweight HTTP wrapper without writing a custom server. The import path (skrun init --from-skill) is the strongest use case.
- You need typed output schemas from LLM calls. The input/output contract enforcement eliminates downstream JSON parsing code.
- You want multi-model fallback without building the retry logic yourself. Useful when you cannot tolerate dependency on a single provider.
- Local automation pipelines — CI hooks, developer tooling, internal scripts — where the local-only deployment model is acceptable.
- Prototyping a skill-as-service concept before building production infrastructure around it.
When NOT to Use It
- Public-facing APIs. There is no authentication beyond a static bearer token, no rate limiting, no request validation beyond schema type checking. Exposing a skrun endpoint to the internet is straightforwardly unsafe today.
- Multi-user concurrent workloads. The KV state store is not concurrent-safe. Multiple callers writing to the same agent’s state will produce undefined behavior.
- Production deployments. Cloud runtime is not shipped. skrun deploy has no stable target. Building a production deployment on a feature that does not exist yet is not a plan.
- Security-sensitive contexts. The permissions block in agent.yaml is declarative, not enforced at the local runtime level. A model that ignores its permission constraints will not be stopped by skrun v0.1.
- Regulated environments. No audit logging, no RBAC, no compliance controls. If you need to demonstrate that an agent cannot exfiltrate data, skrun cannot provide that guarantee today.
Trade-offs
| Advantage | Disadvantage |
|---|---|
| Zero-boilerplate HTTP wrapper for SKILL.md files | Cloud deployment is not yet available — skrun deploy has no stable target |
| Typed input/output schemas enforce contracts | Permission declarations are not enforced in the local runtime |
| Multi-model fallback with automatic switching | State store is not concurrent-safe; shared-state footgun for multi-user scenarios |
| MCP server integration reuses existing ecosystem tools | All model API keys live in local environment; no secrets management |
| Built-in test framework for end-to-end skill validation | Tests cost real tokens — expensive to run frequently in CI |
| Hot-reload dev server speeds up the authoring loop | Agent skills supply chain risk: the ToxicSkills study found 36.8% of public skills contain at least one security flaw |
| SKILL.md compatibility with Claude Code, Copilot, Codex | No authentication beyond a static bearer token |
| MIT license; TypeScript source is auditable | v0.1 — no production deployments, no post-mortems, no stability guarantees |
Alternatives
The right comparison depends on what problem you are actually solving.
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| Custom Express/Fastify server | Full control; no magic | You need custom authentication, rate limiting, or non-standard I/O handling |
| LangGraph | Graph-based agent orchestration with checkpointing, streaming, and a cloud platform | You need multi-step orchestration, human-in-the-loop, or a persistent agent execution platform |
| OpenHands API | SWE-benchmark-validated coding agent, sandboxed execution, commercial platform | You need a production-ready coding agent with proven isolation and published performance |
| DeerFlow | SuperAgent harness with sub-agent coordination, deep research, and persistent memory | You want multi-agent workflows, not single-skill APIs |
| Direct LLM SDK calls | No runtime overhead; maximum flexibility | You only need one model, one skill, and do not need typed output schemas |
The honest comparison for skrun’s target use case — SKILL.md to HTTP endpoint — is “write it yourself.” That is 50–100 lines of TypeScript. skrun adds multi-model fallback, state management, and a test runner on top of that. Whether those additions justify the dependency is a judgment call for each team.
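For a sense of scale, the core of the "write it yourself" version might look like the sketch below. It is an illustration under stated assumptions: `validateInput` and `handleRun` are hypothetical names, the skill execution is injected as `runSkill`, and the HTTP listener, authentication, and model call that would fill out the remaining lines are omitted.

```typescript
type InputField = { name: string; type: "string" | "number"; required: boolean };
type HttpResult = { status: number; body: Record<string, unknown> };
type RunSkill = (input: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Check a parsed request body against the declared inputs.
function validateInput(body: unknown, fields: InputField[]): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["body must be a JSON object"];
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];
  for (const field of fields) {
    const value = record[field.name];
    if (value === undefined) {
      if (field.required) errors.push(`${field.name} is required`);
      continue;
    }
    if (typeof value !== field.type) errors.push(`${field.name} must be a ${field.type}`);
  }
  return errors;
}

// Framework-neutral handler: raw request body in, status and JSON body out.
async function handleRun(
  rawBody: string,
  fields: InputField[],
  runSkill: RunSkill,
): Promise<HttpResult> {
  let body: unknown;
  try {
    body = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, body: { error: "invalid JSON" } };
  }
  const errors = validateInput(body, fields);
  if (errors.length > 0) return { status: 400, body: { errors } };
  try {
    return { status: 200, body: await runSkill(body as Record<string, unknown>) };
  } catch (err) {
    return { status: 502, body: { error: String(err) } };
  }
}
```

Mounting `handleRun` on an Express or Fastify route is a few more lines. What this version lacks is exactly what skrun adds: fallback, state, and a test runner.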
The SKILL.md Ecosystem Risk
skrun inherits a supply chain problem from the SKILL.md ecosystem it targets. The Snyk ToxicSkills study (2026) examined 3,984 public agent skills and found that 36.8% contained at least one security flaw. Of confirmed malicious skills, 91% combined prompt injection with traditional malware techniques — skills that instruct the agent to exfiltrate credentials via base64-encoded network calls while appearing to do something benign.
This matters for skrun specifically because the threat model changes when a skill runs as an API rather than inside a local coding assistant. A skill running in Claude Code operates in the context of a single developer. A skill running behind POST /run might be invoked by CI pipelines, other agents, or backend services with broader access to credentials and file systems. The attack surface is larger, the blast radius is wider, and skrun v0.1 provides no protection against either.
If you are importing skills from public marketplaces into a skrun deployment, audit them as you would audit any executable dependency.
Key Takeaways
- skrun solves a real and specific problem: exposing SKILL.md agent definitions as typed HTTP APIs without writing a server. For teams with existing skills they want to call programmatically, the import path is the strongest argument for trying it.
- The typed I/O contract (declaring inputs and outputs in agent.yaml and having the runtime enforce the schema) is the most durable feature. Everything else in v0.1 has caveats; this one holds regardless of whether cloud deployment ships.
- Do not build production deployments on skrun today. Cloud runtime is not available. The deploy command exists as a CLI verb but its target does not yet exist publicly. Wait for a cloud implementation of the RuntimeAdapter to land before evaluating it for anything beyond local automation.
- The multi-model fallback is genuinely useful but the defaults are naive. In a shared state scenario with concurrent callers, fallback logic can produce duplicate state writes. Test your failure modes before relying on them.
- Treat the permissions block as documentation, not enforcement. In the local runtime, permission declarations do not constrain model behavior. A model that decides to make network calls will make network calls.
- The SKILL.md ecosystem has an active supply chain attack problem. If you use public skills with skrun, audit them before deploying them to any environment with access to credentials or sensitive file systems.
References
- skrun GitHub repository — Source, examples, and agent.yaml specification
- Show HN: Skrun – Deploy any agent skill as an API — Community discussion, including security concerns and use case validation from practitioners
- Anthropic Agent Skills documentation — Definitive reference for the SKILL.md format, progressive loading model, and security considerations
- AI Agent Skills: The Complete Guide to SKILL.md for Developers in 2026 — Cross-platform compatibility reference for the SKILL.md standard
- ToxicSkills: Malicious AI Agent Skills in ClawHub — Snyk’s supply chain security study; 36.8% of public skills contain security flaws
- Use Agent Skills in VS Code — Microsoft’s Copilot implementation of the SKILL.md standard, demonstrating cross-platform reach