skrun: Turning Agent Skills into REST APIs

skrun solves a genuine friction point — bridging the SKILL.md authoring experience to a callable HTTP endpoint — but ships today as a local-only tool with no cloud runtime, no authentication layer, and no isolation guarantees that matter for production.

The Problem

SKILL.md has become the lingua franca of AI agent capabilities. Anthropic introduced it, OpenAI adopted it for Codex CLI, Microsoft shipped it in GitHub Copilot, and as of early 2026 there are over 490,000 skills published across three major marketplaces. The format works well inside AI coding assistants: Claude Code discovers skills on the filesystem, loads them on demand, and executes them in a VM with full bash access. The problem is that the design is entirely local and model-specific.

If you want to expose a skill as a service — callable by a CI pipeline, another agent, a backend job, or a third-party integration — you have to wrap it yourself. That means writing a server, handling model API keys, normalizing inputs and outputs, managing conversation state between runs, and wiring up fallback logic when a model is unavailable. None of that is hard, but it is all boilerplate that every team reinvents independently.

skrun’s pitch is that this boilerplate should not need to be reinvented. Define the skill, write an agent.yaml that declares the I/O contract, and get a POST /run endpoint back.

What It Is

skrun is an open-source TypeScript CLI and local development server that turns agent skills into typed REST APIs. The core primitive is agent.yaml: a configuration file that sits alongside a SKILL.md file and declares the model to use, the input schema, the output schema, permission boundaries, state persistence, and test cases. The CLI generates a local HTTP server that accepts POST /run with a JSON body matching the input schema and returns a JSON object matching the output schema.

The framework is model-agnostic by design. A single agent.yaml can specify a primary model (say, Gemini 2.5 Flash) and a fallback chain (same model, different provider endpoint). It supports Anthropic, OpenAI, Google, Mistral, and Groq. The fallback mechanism is automatic — if the primary call fails or times out, the runtime retries against the fallback configuration without caller involvement.

State is handled through a key-value store local to each agent. The SEO audit example uses this to compare the current run’s score against the previous run’s score, storing last_score, last_audit_date, and audit_count with a 30-day TTL. The state is scoped per agent name, not per caller, which means concurrent callers share state. That is a footgun for anything beyond single-user use cases.

The architecture anticipates cloud deployment through a RuntimeAdapter interface, but v0.1 ships no cloud runtime. skrun deploy exists as a CLI command but its behavior depends on infrastructure that is not yet public.

How It Works

agent.yaml: The Contract Layer

The agent.yaml is where skrun does its real work. It separates the what (SKILL.md: instructions, workflow, examples) from the how (agent.yaml: model, schema, permissions, state, tests). Here is a representative example from the code-review agent:

name: dev/code-review
version: 1.0.0

model:
  provider: google
  name: gemini-2.5-flash
  # fallback: retries the same model on transient failure
  fallback:
    provider: anthropic
    name: claude-sonnet-4-5

inputs:
  - name: code
    type: string
    required: true
    description: The code to review

outputs:
  - name: review
    type: string
  - name: issues
    type: array
  - name: score
    type: number

permissions:
  network: []          # no outbound network calls
  filesystem: read-only
  secrets: []

runtime:
  timeout: 60s
  sandbox: strict      # sandbox mode (local isolation only)

state:
  type: none
  ttl: 30d

tests:
  - name: basic-review
    input:
      code: "function add(a, b) { return a + b; }"
    assert: output.score >= 0

The permissions block is declarative intent, not enforcement. In the local runtime, nothing prevents the model from making network calls if the instructions ask it to. sandbox: strict signals intent but the actual enforcement depends on the runtime adapter — which, in v0.1, is the local environment.

MCP Server Integration

The web-scraper example shows how skrun delegates browser automation to an MCP server rather than handling it directly. The agent.yaml specifies:

mcp_servers:
  - name: browser
    transport: stdio
    command: npx
    args:
      - "-y"
      - "@playwright/mcp"
      - "--headless"

The skrun runtime starts the MCP server as a child process via stdio transport, then exposes its tools to the LLM during the run. This is the same pattern Claude Desktop uses for MCP integration. The implication is that skrun agents can consume any MCP-compatible tool without bespoke SDK work — including file systems, databases, browser automation, and custom internal tools.

State Persistence

The SEO audit example demonstrates stateful agents. The agent.yaml defines:

state:
  type: kv
  ttl: 30d

The SKILL.md can reference a _state variable that persists across invocations. The agent reads _state.last_score on entry and writes back an updated score at completion. The runtime handles serialization and TTL-based expiry. In practice this is a JSON file on disk in the local runtime — simple, fragile at concurrency, and not suitable for distributed deployments.

The Development Loop

# Initialize a new agent (creates agent.yaml + SKILL.md scaffold)
skrun init my-agent

# Import an existing skill
skrun init --from-skill path/to/SKILL.md

# Start local dev server with hot-reload on http://localhost:3000
skrun dev

# Run declared tests
skrun test

# Package as .agent bundle
skrun build

# Build + push + return live URL (cloud runtime not yet available)
skrun deploy

The hot-reload development server means you can edit SKILL.md and immediately call POST /run against the local endpoint to see the effect. The skrun test command executes the test cases declared in agent.yaml against the live model — these are end-to-end tests, not unit tests. They cost real tokens per run.

Calling the API

Once running, agents respond to:

POST /api/agents/{namespace}/{agent-name}/run
Authorization: Bearer <token>
Content-Type: application/json

{
  "code": "function add(a, b) { return a + b; }"
}

Response:

{
  "review": "Function is correct but lacks input validation...",
  "issues": ["no type checking", "no error handling for NaN"],
  "score": 72
}

The typed I/O contract is the key value here. Callers do not need to parse LLM output — skrun coerces the model’s response into the declared output schema.

In Practice

The primary audience for skrun today is developers who already have SKILL.md files and want to expose them as HTTP endpoints without writing a server. If you have built a collection of skills for Claude Code or Codex CLI, skrun gives you a way to call them from non-agent contexts — CI jobs, backend services, other agents — without duplicating the logic.

The multi-model fallback is genuinely useful for teams that want model-agnostic resilience. Rather than binding a skill to one provider, you can declare a preference chain and let the runtime handle switching. This is hard to get right manually — rate limits, transient failures, and API versioning all create edge cases.

The MCP integration is the most architecturally interesting feature. It means a skrun agent can consume a Playwright browser, a database query tool, or an internal API via MCP without custom SDK work. The skill focuses on what to do; MCP handles the how.

When to Use It

You have existing SKILL.md files and want a lightweight HTTP wrapper without writing a custom server. The import path (skrun init --from-skill) is the strongest use case.
You need typed output schemas from LLM calls. The input/output contract enforcement eliminates downstream JSON parsing code.
You want multi-model fallback without building the retry logic yourself. Useful when you cannot tolerate dependency on a single provider.
Local automation pipelines — CI hooks, developer tooling, internal scripts — where the local-only deployment model is acceptable.
Prototyping a skill-as-service concept before building production infrastructure around it.

When NOT to Use It

Public-facing APIs. There is no authentication beyond a static bearer token, no rate limiting, no request validation beyond schema type checking. Exposing a skrun endpoint to the internet is straightforwardly unsafe today.
Multi-user concurrent workloads. The KV state store is not concurrent-safe. Multiple callers writing to the same agent’s state will produce undefined behavior.
Production deployments. Cloud runtime is not shipped. skrun deploy has no stable target. Building a production deployment on a feature that does not exist yet is not a plan.
Security-sensitive contexts. The permissions block in agent.yaml is declarative, not enforced at the local runtime level. A model that ignores its permission constraints will not be stopped by skrun v0.1.
Regulated environments. No audit logging, no RBAC, no compliance controls. If you need to demonstrate that an agent cannot exfiltrate data, skrun cannot provide that guarantee today.

Trade-offs

Advantage	Disadvantage
Zero-boilerplate HTTP wrapper for SKILL.md files	Cloud deployment is not yet available — `skrun deploy` has no stable target
Typed input/output schemas enforce contracts	Permission declarations are not enforced in the local runtime
Multi-model fallback with automatic switching	State store is not concurrent-safe; shared-state footgun for multi-user scenarios
MCP server integration reuses existing ecosystem tools	All model API keys live in local environment; no secrets management
Built-in test framework for end-to-end skill validation	Tests cost real tokens — expensive to run frequently in CI
Hot-reload dev server speeds up the authoring loop	Agent skills supply chain risk: the ToxicSkills study found 36% of public skills contain at least one security flaw
SKILL.md compatibility with Claude Code, Copilot, Codex	No authentication beyond a static bearer token
MIT license; TypeScript source is auditable	v0.1 — no production deployments, no post-mortems, no stability guarantees

Alternatives

The right comparison depends on what problem you are actually solving.

Alternative	Key Difference	Prefer when…
Custom Express/Fastify server	Full control; no magic	You need authentication, rate limiting, custom auth, or non-standard I/O handling
LangGraph	Graph-based agent orchestration with checkpointing, streaming, and a cloud platform	You need multi-step orchestration, human-in-the-loop, or a persistent agent execution platform
OpenHands API	SWE-benchmark-validated coding agent, sandboxed execution, commercial platform	You need a production-ready coding agent with proven isolation and published performance
DeerFlow	SuperAgent harness with sub-agent coordination, deep research, and persistent memory	You want multi-agent workflows, not single-skill APIs
Direct LLM SDK calls	No runtime overhead; maximum flexibility	You only need one model, one skill, and do not need typed output schemas

The honest comparison for skrun’s target use case — SKILL.md to HTTP endpoint — is “write it yourself.” That is 50–100 lines of TypeScript. skrun adds multi-model fallback, state management, and a test runner on top of that. Whether those additions justify the dependency is a judgment call for each team.

The SKILL.md Ecosystem Risk

skrun inherits a supply chain problem from the SKILL.md ecosystem it targets. The Snyk ToxicSkills study (2026) examined 3,984 public agent skills and found that 36.8% contained at least one security flaw. Of confirmed malicious skills, 91% combined prompt injection with traditional malware techniques — skills that instruct the agent to exfiltrate credentials via base64-encoded network calls while appearing to do something benign.

This matters for skrun specifically because the threat model changes when a skill runs as an API rather than inside a local coding assistant. A skill running in Claude Code operates in the context of a single developer. A skill running behind POST /run might be invoked by CI pipelines, other agents, or backend services with broader access to credentials and file systems. The attack surface is larger, the blast radius is higher, and skrun v0.1 provides no protection against it.

If you are importing skills from public marketplaces into a skrun deployment, audit them as you would audit any executable dependency.

Key Takeaways

skrun solves a real and specific problem: exposing SKILL.md agent definitions as typed HTTP APIs without writing a server. For teams with existing skills they want to call programmatically, the import path is the strongest argument for trying it.
The typed I/O contract — declaring inputs and outputs in agent.yaml and having the runtime enforce the schema — is the most durable feature. Everything else in v0.1 has caveats; this one holds regardless of whether cloud deployment ships.
Do not build production deployments on skrun today. Cloud runtime is not available. The deploy command exists as a CLI verb but its target does not yet exist publicly. Wait for the RuntimeAdapter to land before evaluating it for anything beyond local automation.
The multi-model fallback is genuinely useful but the defaults are naive. In a shared state scenario with concurrent callers, fallback logic can produce duplicate state writes. Test your failure modes before relying on them.
Treat the permissions block as documentation, not enforcement. In the local runtime, permission declarations do not constrain model behavior. A model that decides to make network calls will make network calls.
The SKILL.md ecosystem has an active supply chain attack problem. If you use public skills with skrun, audit them before deploying them to any environment with access to credentials or sensitive file systems.

References

skrun GitHub repository — Source, examples, and agent.yaml specification
Show HN: Skrun – Deploy any agent skill as an API — Community discussion, including security concerns and use case validation from practitioners
Anthropic Agent Skills documentation — Definitive reference for the SKILL.md format, progressive loading model, and security considerations
AI Agent Skills: The Complete Guide to SKILL.md for Developers in 2026 — Cross-platform compatibility reference for the SKILL.md standard
ToxicSkills: Malicious AI Agent Skills in ClawHub — Snyk’s supply chain security study; 36.8% of public skills contain security flaws
Use Agent Skills in VS Code — Microsoft’s Copilot implementation of the SKILL.md standard, demonstrating cross-platform reach