Concepts & Glossary

Key terms and concepts in the Agent Skills ecosystem. Understanding these definitions helps when working with skills-check commands and the broader skill toolchain.

Core Concepts

Agent Skill

A markdown document (SKILL.md) with YAML frontmatter that instructs AI coding agents how to work with specific products, frameworks, and patterns. Unlike regular documentation, skills are loaded into LLM context windows and treated as executable instructions. Because agents can access the file system and run shell commands, skill quality is a security and correctness concern — not just a readability one.

`SKILL.md`

The standard file format for agent skills. Contains YAML frontmatter (name, description, version, compatibility) followed by a markdown body with instructions, code examples, and patterns. Validated by the lint command.

Version drift

When a skill's compatibility (or legacy product-version) frontmatter references an outdated version of the product it covers. For example, a React skill with compatibility: "react@^18.0.0" when v19 is current. Detected by the check command and resolvable via refresh.

Skill staleness

A skill that hasn't been updated relative to product releases, potentially containing outdated APIs, deprecated patterns, or removed features. Staleness is a spectrum — a one-patch lag may be acceptable, while a major version behind is urgent. The policy command can enforce staleness limits.

Security Concepts

Hallucinated packages

Package names referenced in skills that don't exist on package registries (npm, PyPI, crates.io). When an LLM generates a skill, it may invent plausible package names that a malicious actor could register, enabling dependency confusion attacks. Detected by the audit command's registry checker.

Prompt injection in skills

Malicious instructions embedded in skill content designed to override the agent's system prompt, exfiltrate data, or execute unauthorized commands. Because skills are loaded directly into the LLM context, they have a privileged position to influence agent behavior. Scanned by the audit command.

Dangerous commands

Destructive shell commands (rm -rf, chmod 777, curl | sh) that skills should not recommend, as agents may execute them with full system access. The audit command flags these patterns, and the policy command can ban them organization-wide.

Quality Concepts

Token budget

The context window cost of loading a skill, measured in tokens. Skills compete for limited context space; oversized skills reduce room for user code and conversation. The budget command measures per-skill and per-section token counts, detects redundancy between skills, and tracks costs over time.

Semver verification

Validating that content changes between skill versions match the declared semantic version bump. A typo fix shouldn't be a major version bump; a new API section shouldn't be a patch. The verify command uses heuristic rules and optionally LLM-assisted analysis to detect mismatches.

Policy enforcement

Organizational rules applied to skill collections via .skill-policy.yml. Policies can require trusted sources, ban patterns, mandate metadata, and set staleness limits. Enforced by the policy command and integrable into CI pipelines.

Skill registry

The skills-check.json file that maps product names to npm packages, tracks verified versions, and lists associated skill files. Created by init and consumed by check, report, and refresh.

Ecosystem

skills.sh

The primary CLI and registry for installing and distributing agent skills (npx skills add). Handles discovery, installation, and lifecycle management. skills-check complements skills.sh as the verification layer — skills.sh installs skills, skills-check keeps them safe.

Agent harness

The AI coding tool (Claude Code, Cursor, Codex, Windsurf) that loads skills into its LLM context window and executes them as part of its instruction set. The test command runs eval suites through configurable agent harnesses to verify skill behavior.

Context window

The LLM's working memory where skills, user code, conversation history, and system prompts compete for space. Typically 100K-200K tokens. The budget command helps teams understand and optimize how much of this space their skills consume.