OpenHands

★ New · AI/ML · open-source · MIT · freemium (trial)

What It Does

OpenHands is an open-source platform for building and running autonomous AI coding agents. Agents interact with codebases the way a human developer would: reading and editing files, running terminal commands, browsing the web, and executing multi-step development tasks end-to-end. The platform provides a sandboxed Docker runtime for safe code execution, supports multiple LLM providers (Anthropic Claude, OpenAI GPT, Google Gemini, DeepSeek, Qwen, local Ollama models), and ships four distinct interfaces: a CLI, a local web GUI, a Python SDK for programmatic agent orchestration, and a hosted cloud platform.
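The read-edit-run loop described above can be sketched conceptually. This is an illustrative toy, not the actual OpenHands implementation: the function names are invented, the "LLM" is a scripted stub, and the workspace is a plain temporary directory rather than the Docker sandbox the real platform uses.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Toy agent loop: a model proposes actions (edit a file, run a command),
# an executor applies them in a workspace, and the observation is fed
# back until the model signals completion.

def scripted_llm(history):
    """Stand-in for a real LLM call: returns the next action dict."""
    step = len(history)
    if step == 0:
        return {"type": "edit", "path": "hello.py", "content": "print('hi')\n"}
    if step == 1:
        return {"type": "run", "command": [sys.executable, "hello.py"]}
    return {"type": "finish"}

def run_agent(llm, workspace: Path, max_steps: int = 10):
    history = []
    for _ in range(max_steps):
        action = llm(history)
        if action["type"] == "finish":
            break
        if action["type"] == "edit":
            (workspace / action["path"]).write_text(action["content"])
            observation = f"wrote {action['path']}"
        elif action["type"] == "run":
            result = subprocess.run(
                action["command"], cwd=workspace,
                capture_output=True, text=True, timeout=30,
            )
            observation = result.stdout + result.stderr
        history.append((action, observation))
    return history

with tempfile.TemporaryDirectory() as d:
    trace = run_agent(scripted_llm, Path(d))
    for action, obs in trace:
        print(action["type"], "->", obs.strip())
```

The real platform's value is in what this sketch omits: the Docker isolation around the `run` step, the browser tool, and the prompting that makes a frontier model produce sensible action sequences.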

Originally called OpenDevin, the project emerged from CMU and UIUC research and was published at ICLR 2025. The commercial entity All Hands AI provides the cloud and enterprise tiers while the core remains MIT-licensed.

Key Features

  • Docker-sandboxed code execution environment isolating agent actions from host system
  • Model-agnostic architecture supporting Claude, GPT, Gemini, DeepSeek, Qwen, and local models via Ollama
  • Software Agent SDK (Python + REST API) for defining custom agents with built-in tools (file editor, terminal, task tracker)
  • CLI interface comparable to Claude Code or Codex for interactive terminal-based development
  • Local web GUI with real-time observation of agent reasoning and actions
  • Cloud platform with GitHub, GitLab, Bitbucket, Slack, Jira, and Linear integrations
  • Public skills marketplace for distributing reusable agent capabilities
  • OpenHands Index — a multi-domain benchmark evaluating LLMs across five software engineering task types (issue resolution, greenfield dev, frontend dev, test generation, information gathering)
  • SWE-bench Verified score of 77.6% (as of early 2026), claimed to be the top open-source agent on the leaderboard
  • Enterprise self-hosted deployment via Kubernetes Helm charts with RBAC and multi-tenancy
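The model-agnostic design is driven by configuration. A hedged sketch of a `config.toml` follows; the `[llm]` section and its keys reflect the project's documented config format but should be verified against current docs, and the model strings follow LiteLLM conventions (which is how the platform addresses many providers through one interface):

```toml
[llm]
model = "anthropic/claude-3-5-sonnet-20241022"  # example frontier model ID
api_key = "sk-..."                               # elided

# Swapping providers is intended to be a config change, e.g. a local
# Ollama model instead (see the caveat on local-model quality below):
# model = "ollama/qwen2.5-coder:32b"
# base_url = "http://localhost:11434"
```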

Use Cases

  • Automated bug fixing and PR creation from issue trackers (GitHub Issues, Jira, Linear)
  • Code migration and dependency upgrades across microservice fleets
  • Vulnerability triage and automated patching at scale
  • Parallel agent orchestration for large refactoring or migration campaigns
  • Research and evaluation platform for testing new LLMs on software engineering benchmarks
  • Enterprise teams needing model-agnostic, self-hosted AI coding infrastructure to avoid vendor lock-in
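The "issues in, PRs out" use case usually needs a triage layer deciding which issues are worth handing to an agent. A minimal sketch: the selection heuristic is hypothetical, the issue dicts mirror the GitHub REST API shape (`GET /repos/{owner}/{repo}/issues`), and `dispatch_to_agent` is a stub standing in for an OpenHands cloud or SDK call.

```python
# Heuristic triage: small, well-labeled bugs are the best agent targets;
# open-ended or heavily debated issues go to humans.

def eligible_for_agent(issue, max_comments=5):
    labels = {label["name"] for label in issue.get("labels", [])}
    return (
        issue.get("state") == "open"
        and "bug" in labels
        and "needs-design" not in labels              # skip open-ended work
        and issue.get("comments", 0) <= max_comments  # skip long debates
    )

def dispatch_to_agent(issue):
    # Placeholder: a real integration would create an agent task that
    # checks out the repo, attempts a fix, and opens a draft PR.
    return f"queued agent task for issue #{issue['number']}"

issues = [
    {"number": 101, "state": "open", "comments": 2,
     "labels": [{"name": "bug"}]},
    {"number": 102, "state": "open", "comments": 14,
     "labels": [{"name": "bug"}]},
    {"number": 103, "state": "open", "comments": 0,
     "labels": [{"name": "bug"}, {"name": "needs-design"}]},
]

queued = [dispatch_to_agent(i) for i in issues if eligible_for_agent(i)]
print(queued)  # only #101 qualifies
```

Filtering like this also bounds spend, which matters given the per-task costs discussed below.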

Adoption Level Analysis

Small teams (<20 engineers): Possible but with friction. The CLI and local GUI work well for individual developers. However, useful autonomous coding requires frontier LLM API access (Claude, GPT-4+), which costs $3+/task based on real-world reports. Local models via Ollama produce dramatically worse results — 14-32B models managed only 1-2 actions before losing context in independent testing. Docker dependency for the sandbox adds setup overhead. Cost-effective for occasional use, but not a game-changer for small teams at current LLM pricing.

Medium orgs (20-200 engineers): Good fit. The cloud platform and SDK enable shared infrastructure for AI-assisted development. GitHub/GitLab integrations and multi-user support make it viable as a team tool. The model-agnostic architecture provides negotiating leverage with LLM providers. Cost management becomes important — heavy usage runs $100-200/month per active developer in LLM API costs alone.
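The per-task and per-developer figures above imply a rough budget model. A quick sanity check, assuming the ~$3/task figure and a 22-workday month: the $100-200/month range corresponds to roughly 2-3 agent tasks per developer per workday.

```python
# Back-of-envelope budget check using the figures cited above.
COST_PER_TASK = 3.0  # USD per task, from real-world reports

def monthly_cost(tasks_per_day, workdays=22):
    return tasks_per_day * workdays * COST_PER_TASK

for tasks in (1, 2, 3):
    print(f"{tasks} task(s)/day -> ${monthly_cost(tasks):.0f}/month")
```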

Enterprise (200+ engineers): Viable but enterprise product is still maturing. Self-hosted Kubernetes deployment via Helm charts exists but is self-described as having “gotchas.” The PostgreSQL-backed multi-tenancy migration was targeted for April 2026 completion. For organizations with strict data residency or air-gapped requirements, this is one of the few open-source options. However, enterprises should evaluate RBAC maturity, audit logging completeness, and the dual-license model (MIT core + commercial enterprise directory) before committing.

Alternatives

| Alternative | Key Difference | Prefer when… |
| --- | --- | --- |
| Claude Code | Single-model (Anthropic), CLI-only, more polished autonomous coding experience | You are committed to the Anthropic ecosystem and want the most refined CLI agent experience |
| Codex (OpenAI) | Single-model (OpenAI), async task delegation model | You want fire-and-forget task delegation with OpenAI models |
| Devin (Cognition) | Fully managed SaaS, proprietary, most autonomous | You want maximum autonomy without infrastructure management |
| Goose (Block) | MCP-native, lighter weight, community-governed via AAIF | You want a simpler agent with strong MCP ecosystem integration |
| OpenCode | MIT-licensed, TUI + desktop app, lighter footprint | You want a simpler open-source alternative without sandboxed execution overhead |

Notes & Caveats

  • Benchmark score nuance: The 77.6% SWE-bench Verified score reflects the combined system (OpenHands harness + frontier LLM). Performance collapses dramatically with smaller or local models. The score is heavily model-dependent, not platform-dependent.
  • SWE-bench Verified vs Live gap: Across all agents, SWE-bench Verified scores (60%+) far exceed SWE-bench Live scores (~19%), suggesting possible memorization effects in the static benchmark. METR found roughly half of test-passing SWE-bench PRs would not be merged by maintainers.
  • Local model quality: Independent testing found Ollama models (7B-70B) effectively unusable for autonomous coding with OpenHands. Only frontier models produce useful results.
  • Enterprise maturity: The Helm chart for self-hosted deployment is acknowledged as work-in-progress by the project itself. PostgreSQL-backed multi-tenancy targeted April 2026 completion.
  • Credentials and secrets: No native secrets management. GitHub tokens work via the web interface, but other credentials require workarounds (pasting secrets into prompts or passing them as environment variables), creating security exposure.
  • Git operations: Multiple independent reports of agents struggling with git operations — pushing to wrong branches, failing to use credentials correctly, inability to interact with PR comments programmatically.
  • Cost at scale: ~$3/task for simple microservice upgrades. Heavy usage estimated at $100-200/month per developer in LLM API costs. The platform cost is secondary to the LLM cost.
  • Dual licensing: Core MIT, enterprise directory source-available with commercial license. Docker images are MIT. This is a legitimate open-core model but teams should understand what requires a paid license.
  • Name history: Project was originally called “OpenDevin” before rebranding to OpenHands, which may cause confusion in older references and search results.