ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Songyang Liu, Chaozhuo Li, Chenxu Wang, Jinyu Hou, Zejian Chen, Litian Zhang, Zheng Liu, Qiwei Ye, Yiming Hei, Xi Zhang, Zhongyuan Wang | April 3, 2026 | Category: research | Credibility: medium

Source: GitHub (SafeAI-Lab-X) / arXiv 2603.24414 | Published: 2026-03-25

Executive Summary

  • ClawKeeper is a three-layer real-time security framework for OpenClaw autonomous agents, implementing skill-based (instruction-level), plugin-based (runtime enforcement), and watcher-based (system-level monitoring) protections against threats including prompt injection, credential leakage, privilege escalation, and malicious skill execution.
  • The project is an academic research contribution from researchers with ties to Microsoft Research Asia and Beijing University of Posts and Telecommunications (Chaozhuo Li was a Lead Researcher at MSRA 2020-2024). It is released under MIT license with an accompanying arXiv paper and 305 GitHub stars.
  • The authors claim “optimal defense performance” on a self-constructed benchmark of 140 adversarial instances across 7 safety categories, but the benchmark has not been independently validated and the paper is a preprint without peer review. Note a name collision: RAD Security (a commercial security vendor) ships a separate, unrelated bash-based host-auditing CLI also called “clawkeeper”.

Critical Analysis

Claim: “ClawKeeper achieved optimal defense performance across seven safety task categories”

  • Evidence quality: vendor-sponsored (self-constructed benchmark by the same authors)
  • Assessment: The benchmark comprises 140 adversarial instances (20 per category, split into 10 simple / 10 complex). While the methodology sounds reasonable, the benchmark was created by the same team that built ClawKeeper, and no independent party has reproduced these results. “Optimal” is a strong word — it means they beat all competitors they tested against, but we do not know which competitors were included or whether they were configured favorably. The paper is a preprint (arXiv) with no peer review yet.
  • Counter-argument: Self-constructed benchmarks are common in academic AI security papers where standardized benchmarks do not yet exist. However, the competing paper “Don’t Let the Claw Grip Your Hand” (arXiv 2603.10387) used a different methodology (47 adversarial scenarios derived from MITRE ATLAS/ATT&CK frameworks) and found OpenClaw’s native defense rate was only 17%, improving to 19-92% with their HITL approach. The fact that multiple independent groups found severe OpenClaw vulnerabilities lends credibility to ClawKeeper’s problem statement, even if its solution claims remain unvalidated.

Claim: “Three-layer architecture provides comprehensive protection across the full agent lifecycle”

  • Evidence quality: anecdotal (architectural design claim, no independent deployment evidence)
  • Assessment: The three-layer design (skill-based instruction injection, plugin-based runtime enforcement, watcher-based decoupled monitoring) is architecturally sound in theory. The decoupled watcher layer is the most novel contribution — it monitors agent state evolution without coupling to internal logic, enabling deployment both locally and in the cloud. However, “comprehensive” is a strong claim for a v1.0 release. No production deployment case studies exist. The skill-based layer is essentially prompt-level defense, which has well-documented limitations against sophisticated prompt injection.
  • Counter-argument: Defense-in-depth (multiple layers) is a well-established security principle and the approach is more complete than single-layer alternatives. However, the OWASP Top 10 for Agentic Applications (2026) identifies attack vectors like tool misuse (ASI02) and identity/privilege abuse (ASI03) that may not be fully addressed by instruction-level or behavioral monitoring alone. The skill-based layer in particular relies on the LLM correctly interpreting security instructions injected into its context, which is fundamentally the same mechanism that prompt injection attacks exploit.
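To make the layer distinction concrete, the sketch below shows what plugin-based runtime enforcement looks like in principle: a hook that sits between the agent and its tools and vetoes calls matching a deny-list, independent of whether the LLM “chooses” to obey injected instructions. All names here (`ToolCall`, `enforce`, `BLOCKED_PATTERNS`) and the patterns themselves are illustrative assumptions, not taken from the ClawKeeper codebase.

```python
# Hypothetical sketch of plugin-style runtime enforcement, NOT ClawKeeper's API.
import re
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str        # e.g. "shell", "http_get" (assumed tool names)
    argument: str    # raw argument string the agent passed to the tool

# Example deny-list a runtime plugin might enforce before execution.
BLOCKED_PATTERNS = {
    "shell": re.compile(r"rm\s+-rf|curl\s+.*\|\s*sh"),  # destructive / pipe-to-shell
    "http_get": re.compile(r"169\.254\.169\.254"),      # cloud metadata endpoint
}

def enforce(call: ToolCall) -> bool:
    """Return True if the call may proceed, False if policy blocks it."""
    pattern = BLOCKED_PATTERNS.get(call.tool)
    if pattern and pattern.search(call.argument):
        return False
    return True
```

The key property is that enforcement runs outside the model's context window, so a prompt injection that subverts the skill-based layer still hits this check. Real enforcement would need allow-lists and argument canonicalization rather than a few regexes; this only illustrates the interception point.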

Claim: “Watcher-based protection introduces a novel, decoupled system-level security middleware”

  • Evidence quality: anecdotal (design novelty claim)
  • Assessment: The watcher paradigm — an independent monitor that observes agent state evolution and can halt execution — is genuinely useful architecturally. It decouples security enforcement from the agent runtime, which is valuable because it means the agent does not need to cooperate with its own security (unlike skill-based injection where the LLM must “choose” to obey). However, the concept is not entirely novel: StrongDM’s Leash uses eBPF for similar decoupled runtime interception, and traditional OS-level security monitors (SELinux, AppArmor) have done this at the process level for decades. The novelty is in applying it specifically to the OpenClaw agent lifecycle.
  • Counter-argument: The watcher’s effectiveness depends on what state it can observe. If it only sees high-level actions (tool calls, file operations), sophisticated attacks that operate within “normal-looking” operations (e.g., exfiltrating data character-by-character through legitimate API calls) may evade detection. The paper’s evaluation presumably tests this, but without independent reproduction, we cannot assess detection coverage.
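The watcher idea described above can be sketched as a consumer of agent state events that runs outside the agent process and halts execution when a behavioral budget is exceeded. The event shape, the `file_write` budget rule, and all names are assumptions for illustration; ClawKeeper's actual watcher interfaces are not documented here.

```python
# Hypothetical sketch of a decoupled watcher, NOT ClawKeeper's implementation.
from collections import Counter

class HaltAgent(Exception):
    """Signal that the watcher wants the agent runtime stopped."""

class Watcher:
    def __init__(self, max_file_writes: int = 5):
        self.max_file_writes = max_file_writes
        self.counts = Counter()  # running tally of observed actions

    def observe(self, event: dict) -> None:
        """Inspect one state-evolution event; raise HaltAgent on violation."""
        self.counts[event["action"]] += 1
        # Example behavioral rule: an unusual burst of file writes in one
        # episode could indicate mass modification or exfiltration staging.
        if self.counts["file_write"] > self.max_file_writes:
            raise HaltAgent("file_write budget exceeded")
```

This also makes the counter-argument concrete: a watcher like this only sees what the event stream exposes, so an attack staged through individually normal-looking actions below the threshold would pass unnoticed.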

Claim: “OpenClaw’s broad operational privileges introduce critical security vulnerabilities”

  • Evidence quality: case-study (multiple independent CVE disclosures and security analyses)
  • Assessment: This claim is well-supported by external evidence. OpenClaw has had multiple severe CVEs in early 2026, including CVE-2026-25253 (CVSS 8.8, one-click RCE), credential leakage exposing 1.5M API tokens via Moltbook database misconfiguration, and 135,000+ exposed instances across 82 countries. Multiple independent security papers (at least 4 on arXiv in March 2026 alone) have analyzed OpenClaw vulnerabilities. The problem statement is legitimate and well-documented.
  • Counter-argument: None needed — this claim is well-evidenced. The question is not whether OpenClaw has security problems (it clearly does), but whether ClawKeeper is the right solution compared to alternatives or to waiting for OpenClaw itself to mature.

Credibility Assessment

  • Author background: Lead author Chaozhuo Li is a former Microsoft Research Asia Lead Researcher (2020-2024) now at Beijing University of Posts and Telecommunications, with 100+ papers including 60+ CCF-A publications. This is a credible academic background. The “SafeAI-Lab-X” GitHub organization does not map to a well-known institution — it appears to be a project-specific GitHub org rather than an established lab. Not to be confused with CMU’s Safe AI Lab (safeai-lab) or ETH Zurich’s SafeAI.
  • Publication bias: Academic preprint (arXiv, not peer-reviewed). The project is open-source (MIT), not a commercial product, which reduces but does not eliminate self-promotion bias. The authors have no apparent commercial stake, but academic incentives (citation counts, novelty claims) still apply.
  • Competing projects: At least 3 other independent research groups published OpenClaw security papers in March 2026 alone (“Don’t Let the Claw Grip Your Hand,” “SafeClaw-R,” “A Systematic Taxonomy of Security Vulnerabilities in OpenClaw”). This validates the problem space but also means ClawKeeper is one of several proposed solutions, not the definitive one.
  • Name collision: A completely separate project called “clawkeeper” by RAD Security (a commercial security vendor) is a bash-based host auditing tool for AI agent machines with 42 checks. This is not the same project. The name collision is unfortunate and could cause confusion.
  • Verdict: medium — Credible academic authors with relevant expertise and a legitimate problem statement backed by extensive independent evidence. However, the benchmark is self-constructed, the paper is not peer-reviewed, the project is v1.0 with no production deployments, and the “optimal” performance claim cannot be verified independently. The three-layer architecture is sound in principle but untested at scale.

Entities Extracted

| Entity | Type | Catalog Entry |
| --- | --- | --- |
| ClawKeeper | open-source | link |
| OpenClaw | open-source | link |
| Agent Runtime Security | pattern | link |