OpenClaw Skill Security: Intent-Capability-Behavior Consistency as a New Framework

Personal agents become significantly more useful when they can extend themselves with skills. But the same mechanism that makes them adaptive also creates a new trust boundary: a skill can reshape what an agent is allowed to do, what secrets it can touch, and how it behaves after installation. For teams building agents that are meant to learn from real experience, this is not a peripheral security issue. It is part of the core problem of keeping adaptive systems trustworthy over time.

figure_01.png

Figure 1: Skills expand a personal agent's capability surface, which also expands its security boundary.

OpenClaw's rapid rise showed how compelling the personal-agent-and-skills model can be [1]. Skills are no longer just lightweight plugins [2]. They increasingly determine what an agent can do, how it does it, and how far its capabilities can extend in practice. The ClawHavoc incident then made the risk concrete [3]. As the marketplace scaled, seemingly legitimate skills were used to deliver malware and steal credentials, browser sessions, API keys, SSH keys, and other sensitive data from affected machines.

The core issue is not only whether malicious code has been inserted, but whether a skill can continuously shape how an agent acquires and exercises capabilities through documentation, installation flows, permission requests, and runtime behavior. This is why we believe skills security can no longer be treated as a simple extension of traditional plugin scanning or malware detection. It requires a framework that can reason about capability use across the full skill lifecycle. At Mind Lab, this matters because personal agents like Macaron are meant to reason, use tools, and improve from real experience. If that learning loop is to stay trustworthy, the skill layer has to be part of the trust boundary itself.

1. Limitations of Existing Detection Methods

If skills security is reduced to a traditional malware-detection problem, many of the most important risks are excluded from the outset. Existing methods typically assume that risk resides mainly in explicit malicious code, malicious dependencies, suspicious strings, or dangerous calls that match a rule set. This approach remains useful for traditional software supply chains, but it is insufficient for the skills ecosystem, where natural language, installation flows, and execution context are central [4].

The reason is that the risk posed by a skill is often not concentrated in any single file. It may appear in the task definition inside SKILL.md, in installation instructions that request permissions or environment variables, or in dependency configuration, external web content, remote configuration, or even runtime tool invocation patterns.

As a result, what skills security needs is not a stronger point scanner, but a new analytical framework capable of jointly interpreting the relationship among declared intent, platform-granted capabilities, and actual behavior.

2. A New Core Lens: Intent-Capability-Behavior Consistency

To truly understand skill risk, we need to ask a different question. Rather than asking whether a skill looks malicious, the more important question is: given its stated purpose and the capabilities granted to it, is its actual behavior still explainable, constrainable, and auditable?

We summarize this framework as Intent-Capability-Behavior Consistency (ICB).

A skill should be evaluated along at least three dimensions at the same time:

  • Intent: what it claims to do, as expressed in SKILL.md, description text, installation instructions, and other documentation.
  • Capability: what capabilities the platform actually grants it, such as network access, shell execution, secret access, file writes, remote configuration fetching, and so on.
  • Behavior: what it actually does during installation and runtime, including installation scripts, dependency actions, tool invocation traces, runtime observations, and capability expansion in later versions.

Whether a skill is trustworthy should not be determined by any one dimension alone, but by whether these three dimensions remain consistent with one another. The significance of ICB is that it shifts the problem from "Does this resemble malware?" to "Do the granted capabilities match the declared intent?" What deserves the most scrutiny is not whether a skill matches traditional malicious-sample signatures, but whether its capability boundary has drifted away from its original intent.
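The three-way comparison can be sketched as a small data structure. This is a minimal illustration, not the production system: the capability labels and field names are assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass
class SkillProfile:
    """Hypothetical ICB record for one skill (illustrative field names)."""
    declared_intent: set[str]  # capabilities implied by SKILL.md and docs
    granted: set[str]          # capabilities the platform actually grants
    observed: set[str]         # capabilities seen at install time or runtime

def icb_findings(p: SkillProfile) -> dict[str, set[str]]:
    """Flag capability drift along the two ICB boundaries."""
    return {
        # granted but never justified by the declared purpose
        "over_granted": p.granted - p.declared_intent,
        # exercised at runtime without ever being granted
        "unapproved_behavior": p.observed - p.granted,
    }

profile = SkillProfile(
    declared_intent={"network:stock_api"},
    granted={"network:stock_api", "shell", "secrets:browser_session"},
    observed={"network:stock_api", "shell", "secrets:browser_session"},
)
findings = icb_findings(profile)
print(findings["over_granted"])  # shell and browser-session access were never justified
```

The point of the sketch is that neither set difference alone is the verdict: a non-empty `over_granted` set is a proportionality question, while a non-empty `unapproved_behavior` set is an enforcement question.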

figure_02.png

Figure 2: Intent-Capability-Behavior Consistency evaluates whether declared purpose, granted permissions, and observed behavior remain aligned.

3. Building a Defense Platform Around This Risk: Pre-Publication, Installation, and Runtime

If ICB is to become a practical security framework, it cannot remain merely conceptual. It must be engineered into a control system that spans the entire skill lifecycle. The goal is not to build a scanner that scores a skill once at a single point in time, but to establish a platform that continuously evaluates risk across pre-publication, installation, and runtime.

For skills, risk is rarely exposed all at once at a single static moment. Instead, it often emerges gradually through installation, authorization, execution, and version evolution. These three stages are not separate mechanisms answering different questions. They are different phases of answering the same question: does the observed use of capability remain consistent with the declared intent of the skill?

  • Pre-publication: Does what the skill claims to be match what the repository actually contains?
  • Installation: What capabilities is the platform granting, and are those capabilities proportionate to the declared purpose?
  • Runtime: Does the skill's real execution behavior remain within the approved boundary?

Any single stage will miss cases. Pre-publication analysis cannot fully predict runtime behavior. Installation-time permission review cannot replace enforcement during execution. And runtime auditing often compensates for what the first two stages fail to catch. A more robust platform therefore should not rely on a single scan to establish permanent trust, but instead should continuously accumulate and update trust evidence.
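The idea of continuously accumulating trust evidence, rather than issuing one permanent verdict, can be sketched as a running ledger. Stage names, findings, and weights below are illustrative assumptions, not a calibrated scoring scheme.

```python
from dataclasses import dataclass, field

@dataclass
class TrustLedger:
    """Illustrative sketch: trust evidence accumulated across lifecycle stages."""
    evidence: list[tuple[str, str, int]] = field(default_factory=list)

    def record(self, stage: str, finding: str, weight: int) -> None:
        # weight > 0 supports trust; weight < 0 erodes it
        self.evidence.append((stage, finding, weight))

    def score(self) -> int:
        # trust is re-evaluated as evidence arrives, not granted once
        return sum(w for _, _, w in self.evidence)

ledger = TrustLedger()
ledger.record("pre-publication", "docs match repository contents", +2)
ledger.record("installation", "requests shell beyond declared purpose", -3)
ledger.record("runtime", "contacted out-of-scope endpoint", -5)
print(ledger.score())  # -6: later stages can revoke trust the first stage implied
```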

figure_03.png

Figure 3: A practical defense stack for skills needs coordinated checks before publication, during installation, and at runtime.

3.1 Pre-Publication Detection

The key question at the pre-publication stage is not "Has this skill already been proven malicious?" but rather: is its stated purpose consistent with its repository contents, instruction design, dependency structure, and historical evolution?

Accordingly, pre-publication detection should not stop at traditional repository scanning or dependency analysis. It should also analyze SKILL.md, installation instructions, configuration files, scripts, dependency manifests, and version diffs together. For skills, SKILL.md is not just ordinary documentation; it directly affects how an agent organizes tasks and invokes tools. It is therefore part of the security boundary itself.

From the ICB perspective, the purpose of pre-publication detection is to reconstruct, as much as possible at static-analysis time, the skill's intent-capability surface: what it claims to do, what capabilities it actually requires, whether those capabilities exceed legitimate needs, and whether recent versions show suspicious patterns of expansion.
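One way to reconstruct an intent-capability surface statically is to map textual and code patterns across all skill artifacts to capability labels. The patterns below are a minimal sketch under assumed naming; a real scanner would use proper parsers rather than regular expressions.

```python
import re

# Hypothetical mapping from artifact patterns to capability labels.
CAPABILITY_PATTERNS = {
    "shell": re.compile(r"subprocess\.run|os\.system|sh\s+-c"),
    "network": re.compile(r"https?://[\w.-]+|requests\.(get|post)"),
    "secrets": re.compile(r"\.env\b|API_KEY|AUTH_TOKEN|cookie", re.I),
    "file_write": re.compile(r"open\([^)]*['\"]w|Path\(.+\)\.write"),
}

def capability_surface(artifacts: dict[str, str]) -> dict[str, list[str]]:
    """Map each detected capability to the artifacts that imply it."""
    surface: dict[str, list[str]] = {}
    for name, text in artifacts.items():
        for cap, pattern in CAPABILITY_PATTERNS.items():
            if pattern.search(text):
                surface.setdefault(cap, []).append(name)
    return surface

artifacts = {
    "SKILL.md": "Fetch quotes from https://api.example.com and cache locally.",
    "install.sh": "pip install requests && echo AUTH_TOKEN=token >> .env",
}
print(capability_surface(artifacts))
# the install script implies secret access that SKILL.md never mentions
```

Crucially, `SKILL.md` is scanned alongside the scripts: because it shapes how the agent organizes tasks, its text contributes to the capability surface just as code does.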

3.2 Installation-Time Detection

If the pre-publication stage focuses on whether a skill is consistent before it enters the distribution pipeline, the installation stage addresses a different question: when the skill is actually installed, what capabilities is the platform granting it, and are those capabilities proportionate to its declared purpose?

Installation should not be treated as a simple download. It should be understood as a capability grant event. The permissions a skill requests during installation, the environment variables it depends on, the directories it can access, the tools it may invoke, and whether it touches sensitive surfaces such as networking, shell, secrets, or browser sessions all reflect the platform's decision to admit a new set of capabilities into the execution environment.

From the user's perspective, the installation interface should not be just a download button. It should clearly present what capabilities the skill is requesting, why those capabilities may or may not be reasonable, and which ones exceed its originally declared purpose. From the platform's perspective, the skill's installation text should not be treated as a trusted declaration, but as input that must itself be verified.
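Treating installation as a capability grant event suggests a proportionality check at install time: split requested capabilities into those an allowlist for the declared purpose covers and those it does not. The purpose-to-capability map below is an invented assumption for illustration.

```python
# Hypothetical allowlist: capabilities a declared purpose reasonably needs.
EXPECTED_BY_PURPOSE = {
    "stock-analysis": {"network:market_data", "file_write:cache"},
    "pdf-export": {"file_write:output"},
}

def review_install(purpose: str, requested: set[str]) -> dict[str, set[str]]:
    """Split a skill's requested capabilities into proportionate vs. excess."""
    expected = EXPECTED_BY_PURPOSE.get(purpose, set())
    return {
        "proportionate": requested & expected,
        # excess grants are what the install UI should surface to the user
        "excess": requested - expected,
    }

verdict = review_install(
    "stock-analysis",
    {"network:market_data", "shell", "secrets:browser_session"},
)
print(verdict["excess"])  # shell and browser-session access exceed the purpose
```

Note that the declared purpose string itself comes from the skill's own text, so in this framing it is untrusted input that the platform must verify before the allowlist lookup means anything.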

3.3 Runtime Detection

Even if pre-publication and installation-time controls are in place, runtime detection remains indispensable. The reason is simple: no matter how strict the upfront controls are, they cannot cover every case. A skill may appear normal in static analysis and may request only seemingly reasonable capabilities at install time, yet the real problem may only emerge during execution.

The core runtime question is: does the skill's observed behavior remain consistent with both its declared purpose and the capabilities that were approved? Once it begins to trigger high-risk shell commands, access unapproved secrets, connect to out-of-scope external endpoints, or escalate tool use far beyond its stated purpose, its behavior can no longer be reasonably explained by the original intent and capability grant.

This layer typically includes sandboxing, tool policy enforcement, gates for high-risk operations, and lightweight behavioral auditing. The goal of the runtime layer is not to reach a final verdict based on a single observation, but to control risk as much as possible without assuming that the earlier layers were perfect, while also generating stronger evidence for whether the skill should continue to be trusted.
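A tool-policy gate of this kind can be sketched as a per-invocation decision: block anything outside the approved boundary, and escalate approved-but-high-risk operations rather than silently allowing them. The risk tiers and decision labels are assumptions for illustration.

```python
# Illustrative runtime gate: capability names and risk tiers are assumptions.
HIGH_RISK = {"shell", "secrets"}

def gate_invocation(approved: set[str], tool: str, capability: str) -> str:
    """Return an enforcement decision for one tool invocation."""
    if capability not in approved:
        return "block"        # outside the boundary approved at install time
    if capability in HIGH_RISK:
        return "escalate"     # approved, but still gated for review/auditing
    return "allow"

approved = {"network", "file_write", "shell"}
print(gate_invocation(approved, "run_cmd", "shell"))     # escalate
print(gate_invocation(approved, "read_env", "secrets"))  # block
print(gate_invocation(approved, "fetch", "network"))     # allow
```

The three-valued outcome mirrors the goal stated above: a single observation produces an enforcement action and an audit record, not a final trust verdict.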

4. CrackedShell-Sec-171: A Security Benchmark for Agent Skills

Once the problem is framed as lifecycle-wide consistency, evaluation also needs a benchmark that reflects language-driven, capability-shaping risk. Existing evaluations tend to focus on explicit malicious artifacts, suspicious code patterns, or conventional software supply-chain signals. Such benchmarks, however, do not fully capture the distinctive risks introduced by skills: risks expressed not only through code, but also through documentation, installation flows, capability requests, and runtime behavior.

To address this gap, we built CrackedShell-Sec-171, a benchmark designed specifically for skill-level security evaluation. In constructing it, we first predefined eight categories of skill-security risk: prompt injection, malicious code, data exfiltration, credential theft, agent hijacking, supply-chain attacks, memory poisoning, and mixed attacks. We then instantiated each category by combining representative attack fragments and patterns drawn from real-world attacks and prior studies into concrete benchmark skills, with a frontier LLM assisting in the synthesis of realistic skill artifacts. In parallel, we constructed benign controls to balance the evaluation set. The final benchmark contains 171 cases, including 138 malicious and 33 benign skills.

For evaluation, we treat both mid and high risk as detected and requiring escalation, since from a platform-governance perspective, neither category should be considered safe by default. On CrackedShell-Sec-171, we compare three methods:

  • Cisco static baseline [5]
  • Prompt-only direct prompting for static auditing
  • ICB: rule-based static analysis + capability extraction + LLM-based consistency judgment

figure_04.png

Figure 4: On CrackedShell-Sec-171, traditional static scanning misses most malicious skills once risk is expressed through language, permissions, and workflow design.

The Cisco static scanner achieved very high precision, but its recall was only 0.1014: it caught just 14 of 138 malicious cases, missing 124. In other words, traditional baselines still have value for explicit artifact-level risk, but once risk shifts toward language-driven, context-dependent, or supply-chain-style expansion, they severely under-call.

Prompt-only auditing fails in a different way. On this benchmark it produced zero false positives, with both precision and specificity at 1.0, but at the cost of letting 47 malicious cases pass as safe. In essence, it behaves like an overly credulous semantic reviewer: as long as the text and surface-level purpose appear plausible, it tends to accept the skill's self-explanation.

ICB better matches the needs of platform governance. It achieved the best accuracy, recall, and F1 among the three methods: it detected 133 out of 138 malicious cases, missing only 5. More importantly, it did not simply mark all suspicious samples as high risk. Among the 138 malicious cases, 70 were directly classified as high, while 63 were elevated to mid. Among the 33 benign cases, only 1 was directly elevated to high, and 11 were marked as mid. This suggests that its false positives mainly take the form of "requires further explanation or review," rather than "treat all as clearly malicious."
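The headline numbers follow directly from the confusion counts reported above, using the standard metric definitions (with mid and high both counted as "detected", the 12 flagged benign cases are false positives):

```python
# Confusion counts for ICB on CrackedShell-Sec-171, taken from the text above.
tp, fn = 133, 5   # 138 malicious skills: 133 detected, 5 missed
fp, tn = 12, 21   # 33 benign skills: 1 high + 11 mid flagged, 21 passed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.917 recall=0.964 f1=0.940

# Cisco baseline recall for comparison: 14 detected of 138 malicious cases.
print(f"cisco_recall={14 / 138:.4f}")  # 0.1014
```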

figure_05.png

Figure 5: ICB improves overall detection quality while keeping most false positives in a review-oriented escalation band instead of collapsing everything into hard blocks.

Case Study: When a Stock Analysis Skill Starts Requesting Browser Session Tokens

In our scan of the ClawHub Top 500 skills, stock-analysis-6.2.0 was a particularly illustrative example, more revealing than traditional dangerous-command cases. On the surface, it appears to be a stock and cryptocurrency analysis skill used for watchlists, portfolio tracking, hot-topic scanning, and sentiment analysis. As a result, the prompt-only method ultimately classified it as allow, while the Cisco baseline assigned it only MEDIUM. Our ICB method, however, classified it as high, with an internal enforcement outcome of block.

The most important risk signal was that the skill explicitly instructed the user to extract active browser session tokens. Its bundled documentation directly referenced AUTH_TOKEN and CT0, and told users to open the browser DevTools Cookies panel, copy auth_token and ct0 from x.com, and write them into a .env file. More importantly, the skill also had multiple instances of local command execution via subprocess.run(...), accessed multiple external domains, and performed file writes and dependency installation. In other words, the real issue was that it combined session-token access, command execution, and broad external network access within a single high-download skill.
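What makes this case hard for per-signal scanners is the combination: each signal alone might be explainable, but together they exceed the declared purpose. A minimal sketch of a combination check, with illustrative patterns that are not the production rule set:

```python
import re

def combined_risk_signals(doc_text: str, code_text: str) -> set[str]:
    """Detect the co-occurring signals from the stock-analysis case.

    The regexes are illustrative assumptions, not the deployed rules.
    """
    signals = set()
    # documentation instructing the user to hand over browser session tokens
    if re.search(r"auth_token|\bct0\b|DevTools.*Cookies", doc_text, re.I):
        signals.add("session_token_request")
    # local command execution in the bundled code
    if "subprocess.run" in code_text:
        signals.add("command_execution")
    # contacting more than one external domain
    if len(set(re.findall(r"https?://([\w.-]+)", code_text))) > 1:
        signals.add("broad_network_access")
    return signals

doc = "Open DevTools -> Cookies, copy auth_token and ct0 into your .env file."
code = (
    "subprocess.run(['pip', 'install', 'requests'])\n"
    "requests.get('https://api.example.com/quotes')\n"
    "requests.post('https://telemetry.example.net/log')\n"
)
print(combined_risk_signals(doc, code))  # all three fire together: escalate
```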

figure_06.png

Figure 6: The stock-analysis-6.2.0 case combines session-token requests, command execution, and broad network access in ways that exceed its declared purpose.

At the same time, we found that ClawHub's static publication gate provides very limited coverage for language-driven skill risk, whereas ICB substantially narrows that gap.

figure_07.png

Figure 7: Static marketplace screening leaves a substantial gap on language-driven skill risk, and ICB closes more of that gap.

5. Conclusion

Skills security is not simply traditional plugin security under a different name. As agent platforms elevate skills into a first-class extension mechanism, what must be defended is no longer just malicious payloads hidden in code repositories, but a broader supply-chain problem shaped by language, permissions, and behavior over time.

ICB is not a claim that one classifier or one scan can solve the problem. It still depends on good visibility into skill artifacts, capability grants, and runtime telemetry, and it will need stronger benchmarks, better permission interfaces, and tighter enforcement surfaces as skill ecosystems evolve. But it provides a more faithful question for the field: what does a skill claim, what is it allowed to do, and what does it actually do once installed?

At Mind Lab, this connects directly to how we think about Experiential Intelligence and Learning from Real Experience. Personal agents cannot safely adapt in the real world if the layer that extends their abilities remains opaque or weakly governed. We see Intent-Capability-Behavior Consistency as one practical step toward agents whose growth remains explainable, constrainable, and trustworthy over time.

References

[1] OpenClaw: Your own personal AI assistant (OpenClaw et al., 2026)

[2] Introducing Agent Skills (Anthropic et al., 2025)

[3] ClawHavoc: 341 Malicious Clawed Skills Found by the Bot They Were Targeting (Alex et al., 2026)

[4] Malicious OpenClaw Skills Used to Distribute Atomic macOS Stealer (Oliveira et al., 2026)

[5] Skill Scanner (Cisco et al., 2026)

Author

Mind Lab

Core Contributors

Shiro Yang, Rio Yang, Andrew Chen, Pony Ma

Team

Andrew Chen, Kaijie Chen, Song Cao, Yuan Cheng, Nolan Ho, Chongru Huang, Songlin Jiang, Fancy Kong, Jingdi Lei, Xiang Lei, Lucian Li, Rui Li, Tianchen Li, Nan Liu, Qihan Liu, Xiang Liu, Yiwen Lu, Pony Ma, Wenbin Wang, Guikun Yang, Rio Yang, Shiro Yang, Jiarui Yao, Ruijian Ye, Di Zhang, Ruijia Zhang, Conley Zhao, Congjie Zheng, Changhai Zhou, Yihui Zhuang and Mindverse Team

Names are listed alphabetically within team.

Citation

Please cite this work using the BibTeX citation:

@misc{shiroyang2026skillssecurity,
  author       = {Shiro Yang and Rio Yang and Andrew Chen and Pony Ma and {Mind Lab}},
  title        = {OpenClaw Skill Security: Intent-Capability-Behavior Consistency as a New Framework},
  year         = {2026},
  howpublished = {Mind Lab: A Lab for Experiential Intelligence},
  note         = {https://macaron.im/mindlab/research/openclaw-skill-security-intent-capability-behavior-consistency-framework}
}