# Skills Check — Complete Reference

> The missing quality toolkit for Agent Skills

skills-check is the quality and integrity layer for Agent Skills (SKILL.md files). It provides 10 commands covering freshness detection, security auditing, metadata linting, token budget analysis, semver verification, policy enforcement, and eval testing.

Agent Skills are markdown documents with YAML frontmatter that instruct AI coding agents (Claude Code, Cursor, Codex, etc.) how to work with specific products, frameworks, and patterns. Skills look like documentation but are treated as executable instructions by agents with file system and shell access. This makes skill quality a security and correctness concern, not just a readability one.

skills-check sits alongside skills.sh (which handles installation, discovery, and distribution) as the complementary verification layer: skills.sh installs skills, skills-check keeps them safe.

## Installation

No installation required. Run directly with npx:

```
npx skills-check check
```

Or install globally:

```
npm install -g skills-check
```

## Commands Overview

| Command | Group | Description |
|---------|-------|-------------|
| check | Freshness & Currency | Detect version drift by comparing skill frontmatter against the npm registry. |
| refresh | Freshness & Currency | AI-assisted updates to stale skills using LLMs. Fetches changelogs and generates diffs. |
| report | Freshness & Currency | Generate a formatted staleness report in markdown or JSON for your team or CI. |
| audit | Security & Quality | Scan for hallucinated packages, prompt injection, dangerous commands, and dead URLs. |
| lint | Security & Quality | Validate metadata completeness, structural quality, and format in skill files. |
| policy | Security & Quality | Enforce organizational trust rules for skills via .skill-policy.yml policy-as-code. |
| budget | Analysis & Verification | Measure token cost per skill, detect redundancy, and track context window usage over time. |
| verify | Analysis & Verification | Validate that content changes between skill versions match the declared semver bump. |
| test | Analysis & Verification | Run eval test suites declared in skill tests/ directories for regression detection. |
| fingerprint | Analysis & Verification | Generate a fingerprint registry of installed skills with content hashes and watermarks. |
| usage | Analysis & Verification | Analyze skill telemetry events for usage patterns, cost estimation, and policy compliance. |
| init | Setup | Scan a skills directory for SKILL.md files and generate a skills-check.json registry. |

---

## Freshness & Currency

### check

Compare the product-version in your SKILL.md frontmatter against the latest version on npm. Instantly know which skills are stale and by how much.

**Why it matters:** Agent skills that reference outdated APIs lead to hallucinated code, broken builds, and wasted developer time. A skill written for React 18 won't generate correct React 19 Server Component patterns. Version drift is the #1 source of skill quality decay.

**What it does:**

- Reads your skills-check.json registry to map products to npm packages
- Fetches the latest version from the npm registry for each product
- Compares it against the product-version declared in each skill's frontmatter
- Reports stale products with the exact version gap (e.g. 4.2.0 → 4.5.1)
- Exits with code 1 in CI mode when staleness is detected

**Usage:** `npx skills-check check [options]`

**Options:**

- `--registry <path>` — Path to skills-check.json (default: skills-check.json)
- `--json` — Output results as JSON
- `--ci` — CI mode — exit code 1 if stale products found
- `-p, --product <name>` — Check a single product

**Examples:**

Check all products:

```
npx skills-check check
```

JSON output for scripts:

```
npx skills-check check --json
```

CI gate:

```
npx skills-check check --ci
```

Check one product:

```
npx skills-check check -p ai-sdk
```

**CI tip:** Pair with the GitHub Action to automatically open issues when skills go stale. Set `fail-on-stale: true` to block merges until skills are updated.

### refresh

Automatically update stale skill files by fetching changelogs, analyzing breaking changes, and generating targeted diffs using LLMs. Review, approve, or auto-apply.

**Why it matters:** Manually updating skills after every dependency release is tedious and error-prone. Refresh automates the grunt work — it reads the changelog, understands what changed, and proposes precise edits to your skill files so agents always have current instructions.
**What it does:**

- Identifies stale skills by running check internally
- Fetches changelogs and release notes from GitHub for each stale product
- Sends the current skill content + changelog to an LLM to generate a targeted update
- Presents a diff for review in interactive mode, or auto-applies with -y
- Preserves your skill's structure and style — only changes what's outdated

**Usage:** `npx skills-check refresh [skills-dir] [options]`

**Options:**

- `-y, --yes` — Auto-apply all changes without prompting
- `--dry-run` — Preview changes without writing files
- `--provider <name>` — LLM provider: anthropic, openai, or google
- `--model <id>` — Specific model ID to use
- `-p, --product <name>` — Refresh a single product

**Examples:**

Interactive review:

```
npx skills-check refresh ./skills
```

Auto-apply:

```
npx skills-check refresh -y
```

Preview only:

```
npx skills-check refresh --dry-run
```

Specific provider:

```
npx skills-check refresh --provider anthropic --model claude-sonnet-4-20250514
```

**CI tip:** Run refresh in --dry-run mode as a CI check to surface which skills need updates, then handle updates in a separate PR workflow.

### report

Produce a comprehensive staleness report summarizing which skills are current, which are stale, and by how much. Output as markdown for issues or JSON for automation.

**Why it matters:** Teams need visibility into skill health across their entire fleet. A weekly report in a GitHub issue or Slack message keeps everyone aware of drift without requiring manual checks.

**What it does:**

- Runs a full version check against the npm registry
- Generates a formatted report with current vs. latest versions
- Groups results by status (stale, current, error)
- Outputs markdown suitable for GitHub issues or JSON for pipelines

**Usage:** `npx skills-check report [options]`

**Options:**

- `--registry <path>` — Path to skills-check.json
- `--format <fmt>` — Output format: markdown or json

**Examples:**

Markdown report:

```
npx skills-check report
```

JSON for automation:

```
npx skills-check report --format json
```

**CI tip:** The GitHub Action automatically generates a report and opens/updates a GitHub issue when staleness is detected. Use the report output for custom Slack or email notifications.

## Security & Quality

### audit

A security-focused scan that verifies every package, URL, and command in your skill files. Catches hallucinated dependencies, prompt injection patterns, dangerous shell commands, and broken links before they reach an agent.

**Why it matters:** Skills are executable instructions — an agent will npm install packages, run shell commands, and follow URLs exactly as written. A hallucinated package name could install malware via typosquatting. A prompt injection pattern could override agent safety boundaries. Audit catches these before they cause harm.
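Pattern-based scanning of this kind can be illustrated with a few regular expressions. Here is a minimal Python sketch; the patterns are illustrative assumptions, not skills-check's actual rule set:

```python
import re

# Illustrative suspicious-content patterns (NOT skills-check's real rules).
INJECTION_PATTERNS = [
    (r"ignore (all )?(previous|prior) instructions", "instruction override"),
    (r"curl\s+\S+\s*\|\s*(ba)?sh", "pipe-to-shell install"),
    (r"rm\s+-rf\s+/", "destructive command"),
    (r"(cat|send|upload).{0,40}(\.env|id_rsa|credentials)", "sensitive file access"),
]

def scan_skill(text: str) -> list[tuple[int, str]]:
    """Return (line_number, finding_label) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, label in INJECTION_PATTERNS:
            if re.search(pattern, line, re.IGNORECASE):
                findings.append((lineno, label))
    return findings
```

A real scanner layers registry lookups and URL liveness checks on top of this kind of static matching, which is why the audit command also makes network requests unless `--no-network` is set.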
**What it does:**

- Extracts all npm/pip/cargo package references and verifies they exist on their registries
- Cross-references against known hallucinated package databases (Aikido Security, Socket.dev research)
- Scans for prompt injection patterns: instruction overrides, data exfiltration, obfuscation
- Flags dangerous shell commands: destructive operations, pipe-to-shell installs, sensitive file access
- Checks every URL for liveness via HEAD requests with SSRF protection
- Validates frontmatter metadata completeness

**Usage:** `npx skills-check audit [path] [options]`

**Options:**

- `--format <fmt>` — Output: terminal, json, markdown, or sarif
- `--fail-on <severity>` — Exit 1 at threshold: critical, high, medium, low
- `--ci` — CI mode with strict exit codes
- `--quiet` — Suppress non-finding output
- `--no-network` — Skip network-dependent checks (registry, URLs)
- `--isolation <runtime>` — Run in an isolated container (auto, docker, podman, apple-container, vercel-sandbox, etc.)
- `--no-isolation` — Disable isolation and run directly on the host

**Examples:**

Audit everything:

```
npx skills-check audit
```

Audit one file:

```
npx skills-check audit ./skills/ai-sdk-core.md
```

SARIF for GitHub Security tab:

```
npx skills-check audit --format sarif
```

CI gate at high severity:

```
npx skills-check audit --fail-on high --ci
```

Audit in Docker:

```
npx skills-check audit --isolation docker
```

**CI tip:** Use --format sarif and upload to GitHub's code scanning to see findings inline on PRs. Combine with --fail-on high to block merges on critical issues.

### lint

Enforce metadata standards across your skill fleet. Validates required frontmatter fields, checks SPDX license identifiers, verifies URLs, and can auto-fix missing fields from git context.

**Why it matters:** Incomplete metadata breaks downstream tooling. Without a name, skills-check can't track a skill. Without product-version, check can't detect drift. Without a license, legal compliance is impossible. Lint ensures every skill meets the bar before it ships.

**What it does:**

- Validates required fields: name, description (always required)
- Checks publish-ready fields: author, license, repository
- Validates conditional fields: product-version when products are referenced, agents when agent-specific
- Verifies format: semver syntax, SPDX license identifiers (100+ supported with OR/AND expressions), valid URLs
- Auto-fix mode populates missing fields from git context (author from git config, repo from git remote)

**Usage:** `npx skills-check lint [dir] [options]`

**Options:**

- `--fix` — Auto-fix missing fields from git context
- `--ci` — CI mode with strict exit codes
- `--fail-on <level>` — Threshold: error or warning
- `-f, --format <fmt>` — Output: terminal or json

**Examples:**

Lint all skills:

```
npx skills-check lint
```

Auto-fix from git:

```
npx skills-check lint --fix
```

CI gate:

```
npx skills-check lint --ci --fail-on error
```

JSON output:

```
npx skills-check lint --format json
```

**CI tip:** Run lint --fix locally before committing to auto-populate metadata. Use lint --ci in CI to catch skills that slip through without proper frontmatter.

### policy

Define and enforce organizational rules for which skills are allowed, what they must contain, and where they can come from. Policy-as-code via a .skill-policy.yml file that lives in your repo.

**Why it matters:** In team and enterprise environments, you need guardrails: only skills from approved sources, mandatory security disclaimers, banned patterns, freshness requirements. Policy turns these rules into automated checks that run in CI.
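To make policy-as-code concrete, here is a hypothetical `.skill-policy.yml`. The field names are illustrative assumptions, not the authoritative schema; run `policy init` to generate a real starting point:

```yaml
# Hypothetical .skill-policy.yml (illustrative field names, not the real schema)
sources:
  allow:
    - "npm:@your-org/*"
  deny:
    - "npm:untrusted-*"
required-skills:
  - security-baseline
banned-skills:
  - deprecated-v1-helpers
metadata:
  require: [name, description, license]
  allowed-licenses: [MIT, Apache-2.0]
freshness:
  max-version-drift: minor
  max-age-days: 90
```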
**What it does:**

- Source allow/deny lists with glob matching (e.g., allow only npm:@your-org/*)
- Required skills verification — ensure critical skills are always present
- Banned skills — block known-bad or deprecated skills
- Metadata requirements — enforce specific frontmatter fields and allowed licenses
- Content deny/require patterns — flag or require specific content with line numbers
- Freshness limits — max version drift and max age in days
- Audit integration — require clean audit results as part of policy

**Usage:** `npx skills-check policy [options]`

**Options:**

- `--policy <file>` — Path to .skill-policy.yml
- `--fail-on <level>` — Threshold: blocked, violation, or warning
- `--ci` — CI mode with strict exit codes
- `-f, --format <fmt>` — Output: terminal or json

**Examples:**

Check against policy:

```
npx skills-check policy check
```

Initialize default policy:

```
npx skills-check policy init
```

Validate policy file:

```
npx skills-check policy validate
```

CI gate:

```
npx skills-check policy check --ci --fail-on violation
```

**CI tip:** Commit .skill-policy.yml to your repo root. Policy discovery walks up directories, so monorepo subdirectories inherit the root policy automatically.

## Analysis & Verification

### budget

Every skill you load into an agent consumes context window tokens. Budget tells you exactly how many, finds redundancy between skills, estimates costs across model pricing tiers, and tracks changes over time.

**Why it matters:** Context windows are finite and expensive. Loading 5 verbose skills at 10K tokens each consumes 50K tokens of context before the user even types a prompt. Budget helps you optimize: trim bloated skills, deduplicate overlapping content, and set token ceilings that CI enforces.
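One way such tooling can quantify overlap between two skills is n-gram Jaccard similarity. A minimal sketch using word-level 4-grams, as a simplified illustration rather than skills-check's actual implementation:

```python
def ngrams(text: str, n: int = 4) -> set[tuple[str, ...]]:
    """Sliding word n-grams, lowercased so near-identical prose overlaps."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over 4-gram sets; 1.0 = identical, 0.0 = disjoint."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga and not gb:
        return 1.0
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0
```

A high score between two skills suggests one of them can be trimmed or the shared guidance factored out, which is exactly the kind of redundancy a token budget report surfaces.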
**What it does:**

- Counts tokens per skill using cl100k_base encoding (within 5% across model families)
- Breaks down token usage per section within each skill
- Detects inter-skill redundancy via 4-gram Jaccard similarity
- Estimates cost across model pricing tiers (Haiku, Sonnet, Opus)
- Saves snapshots and compares against baselines to track budget changes over time
- Enforces token ceilings — exit 1 if total exceeds a configurable threshold

**Usage:** `npx skills-check budget [dir] [options]`

**Options:**

- `-s, --skill <name>` — Analyze a specific skill
- `-d, --detailed` — Per-section token breakdown
- `--max-tokens <n>` — Token ceiling — exit 1 if exceeded
- `--save <file>` — Save snapshot for future comparison
- `--compare <file>` — Compare against a saved snapshot
- `--model <id>` — Pricing model for cost estimates
- `-f, --format <fmt>` — Output: terminal or json

**Examples:**

Analyze all skills:

```
npx skills-check budget
```

Detailed breakdown:

```
npx skills-check budget --detailed
```

Enforce a ceiling:

```
npx skills-check budget --max-tokens 50000
```

Save baseline:

```
npx skills-check budget --save baseline.json
```

Compare to baseline:

```
npx skills-check budget --compare baseline.json
```

**CI tip:** Set --max-tokens in CI to prevent skill bloat. Save a baseline on main and use --compare on PRs to catch token regressions before they merge.

### verify

Like cargo semver-checks, but for knowledge. Verify that the version bump declared in a skill's frontmatter actually matches the magnitude of content changes. Catches both under-bumps (breaking changes in a patch) and over-bumps (a typo fix shipped as a major).

**Why it matters:** Semver is a contract. If an agent pins to ^1.0.0, a breaking change in 1.1.0 violates that contract. Verify catches dishonest or accidental version bumps by analyzing the actual content diff — using both heuristics and optional LLM-assisted semantic analysis.
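The heuristic side of such analysis can be pictured as simple rules over a structural diff. The sketch below is illustrative only; the rules and thresholds are assumptions, not verify's actual logic:

```python
def classify_bump(removed_sections: list[str],
                  added_sections: list[str],
                  removed_packages: list[str],
                  similarity: float) -> str:
    """Illustrative classification of a skill diff as major/minor/patch.
    `similarity` is a 0..1 content-similarity score between versions."""
    # Removing guidance or package references can break consumers: major.
    if removed_sections or removed_packages:
        return "major"
    # New sections or substantially rewritten content add capability: minor.
    if added_sections or similarity < 0.8:
        return "minor"
    # Otherwise: small edits, typo fixes, clarifications: patch.
    return "patch"
```

Comparing the classified level against the declared bump is then a straight comparison: a diff classified as major declared in a patch release is an under-bump, and vice versa.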
**What it does:**

- Retrieves the previous version of each skill from git history
- Computes section-level diffs, package changes, and content similarity scores
- Runs heuristic rules to classify changes as major, minor, or patch
- Optionally uses an LLM for semantic analysis of uncertain cases
- Compares the classified change level against the declared version bump
- Suggests the correct version bump when mismatches are found

**Usage:** `npx skills-check verify [options]`

**Options:**

- `-s, --skill <file>` — Verify a specific skill file
- `-a, --all` — Verify all skills with git history
- `--suggest` — Suggest the correct version bump
- `--skip-llm` — Heuristic-only mode (no API key needed)
- `--provider / --model` — LLM provider and model for semantic analysis
- `-f, --format <fmt>` — Output: terminal or json

**Examples:**

Verify all skills:

```
npx skills-check verify --all
```

Suggest correct bump:

```
npx skills-check verify --suggest
```

Heuristic only:

```
npx skills-check verify --all --skip-llm
```

One skill:

```
npx skills-check verify -s ./skills/ai-sdk-core.md
```

**CI tip:** Run verify --all --skip-llm in CI for fast, deterministic checks. Use the full LLM-assisted mode locally for nuanced semantic analysis before publishing.

### test

Execute eval test suites that verify skills actually work when loaded by an agent. Define test cases in cases.yaml with prompts, expected outcomes, and graders. Track baselines to catch regressions after refresh.

**Why it matters:** A skill can have perfect metadata and pass every lint check, but still produce wrong code when an agent uses it. Test closes this gap by actually running prompts through an agent harness and grading the output — like integration tests for your skill files.

**What it does:**

- Discovers tests/ directories inside skill directories containing cases.yaml
- Parses declarative test suites with trigger, outcome, style, and regression test types
- Executes prompts through configurable agent harnesses (Claude Code CLI, generic shell)
- Grades results with 7 built-in graders: file-exists, command, contains, not-contains, json-match, package-has, llm-rubric
- Supports custom graders via dynamic module import
- Runs multiple trials per test case with configurable pass thresholds and flaky test detection
- Stores baselines for regression tracking across skill updates

**Usage:** `npx skills-check test [dir] [options]`

**Options:**

- `-s, --skill <name>` — Test a specific skill
- `-t, --type <type>` — Filter: trigger, outcome, style, or regression
- `--agent <harness>` — Agent harness: claude-code or generic
- `--trials <n>` — Number of runs per test case
- `--dry` — Preview test plan without executing
- `--update-baseline` — Save results as new baseline
- `--ci` — CI mode with strict exit codes
- `-f, --format <fmt>` — Output: terminal or json
- `--isolation <runtime>` — Run in an isolated container (auto, docker, podman, apple-container, vercel-sandbox, etc.)
- `--no-isolation` — Disable isolation and accept the risk of running directly on the host

**Examples:**

Run all tests:

```
npx skills-check test
```

Test one skill:

```
npx skills-check test -s ai-sdk-core
```

Outcome tests only:

```
npx skills-check test --type outcome
```

Preview plan:

```
npx skills-check test --dry
```

Update baseline:

```
npx skills-check test --update-baseline
```

Test in isolation:

```
npx skills-check test --isolation auto
```

**CI tip:** Run test --ci after refresh to catch regressions. Use --update-baseline on main after verified changes so future PRs compare against the latest known-good results.
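For orientation, a hypothetical `cases.yaml` might look like the following. The field names and grader parameters here are illustrative assumptions, a sketch of the shape rather than the authoritative schema:

```yaml
# Hypothetical skills/ai-sdk-core/tests/cases.yaml (illustrative schema)
cases:
  - name: generates-streaming-handler
    type: outcome
    prompt: "Add a streaming chat endpoint using the AI SDK"
    trials: 3
    graders:
      - grader: file-exists
        path: app/api/chat/route.ts
      - grader: contains
        path: app/api/chat/route.ts
        text: streamText
  - name: avoids-deprecated-api
    type: regression
    prompt: "Add a completion endpoint"
    graders:
      - grader: not-contains
        path: app/api/chat/route.ts
        text: OpenAIStream
```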
### fingerprint

Discover SKILL.md files, compute SHA-256 content hashes, extract or inject watermarks, and produce a FingerprintRegistry for integrity verification, runtime detection, and deduplication.

**Why it matters:** When agents load skills into context, you need to know exactly which skills are present and whether they've been tampered with. Fingerprint gives every skill a unique identity — a content hash that changes when the skill changes — enabling runtime detection, integrity verification, and deduplication across environments.

**What it does:**

- Discovers all SKILL.md files in the target directory
- Computes SHA-256 hashes for frontmatter, content body, and a combined prefix
- Extracts existing watermarks or injects new ones into skill files
- Produces a FingerprintRegistry mapping each skill to its content hashes
- Outputs results in terminal, JSON, or machine-readable formats

**Usage:** `npx skills-check fingerprint [dir] [options]`

**Options:**

- `--output <file>` — Write fingerprint registry to a file
- `--inject-watermarks` — Inject watermarks into skill files
- `--json` — Output results as JSON
- `--ci` — CI mode with strict exit codes
- `--verbose` — Show detailed processing information
- `--quiet` — Suppress non-essential output

**Examples:**

Fingerprint all skills:

```
npx skills-check fingerprint
```

Inject watermarks:

```
npx skills-check fingerprint --inject-watermarks
```

JSON output to file:

```
npx skills-check fingerprint --json --output fingerprints.json
```

**CI tip:** Generate fingerprints in CI and compare against a known-good registry to detect unauthorized skill modifications. Use --inject-watermarks during build to enable runtime telemetry.

### usage

Read telemetry events from JSONL or SQLite stores to understand which skills are actually being used, how often, and at what cost. Cross-reference against organizational policies to detect unauthorized or non-compliant skill usage.
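In a JSONL store, each line is one self-contained event object. A hypothetical event, with illustrative field names rather than the tool's actual schema, might look like:

```json
{"event": "skill-loaded", "skill": "ai-sdk-core", "version": "4.2.0", "hash": "sha256:9f2c41d0", "agent": "claude-code", "tokens": 8421, "timestamp": "2026-01-15T09:30:00Z"}
```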
**Why it matters:** Installing and fingerprinting skills is only half the story — you also need to know which skills agents actually load at runtime, how much they cost, and whether usage complies with organizational policies. Usage closes the observability gap between skill installation and agent behavior.

**What it does:**

- Reads telemetry events from JSONL or SQLite data stores
- Deduplicates events and groups them by skill identity
- Detects version drift between deployed and actually-used skill versions
- Estimates token cost based on skill usage frequency and context window consumption
- Cross-references usage against organizational policies for compliance checking

**Usage:** `npx skills-check usage [options]`

**Options:**

- `--store <path>` — Path to telemetry store (JSONL or SQLite)
- `--since <date>` — Filter events after this date (ISO 8601)
- `--until <date>` — Filter events before this date (ISO 8601)
- `--check-policy` — Cross-reference usage against policy rules
- `--policy <file>` — Path to .skill-policy.yml for compliance checks
- `--format <fmt>` — Output: terminal, json, or markdown
- `--json` — Shorthand for --format json
- `--markdown` — Shorthand for --format markdown
- `--output <file>` — Write report to a file
- `--ci` — CI mode with strict exit codes
- `--fail-on <severity>` — Exit 1 at threshold: critical, high, medium, low
- `--detailed` — Show per-event breakdown
- `--verbose` — Show detailed processing information
- `--quiet` — Suppress non-essential output

**Examples:**

Analyze usage:

```
npx skills-check usage --store ./telemetry.jsonl
```

Usage with policy check:

```
npx skills-check usage --store ./telemetry.jsonl --check-policy
```

Filter by date range:

```
npx skills-check usage --store ./telemetry.jsonl --since 2026-01-01 --until 2026-03-01
```

**CI tip:** Run usage --check-policy in CI to catch unauthorized skill usage before it reaches production. Use --fail-on to set severity thresholds for policy violations.

## Setup

### init

Bootstrap skills-check for your project: scan a directory for SKILL.md files, confirm npm package mappings at interactive prompts, and generate the skills-check.json registry that all other commands depend on.

**Why it matters:** The skills-check.json registry is the foundation — it maps product names to npm packages and tracks which skill files belong to which product. Without it, check, report, and refresh can't function. Init sets everything up in seconds.

**What it does:**

- Recursively scans a directory for files matching *SKILL.md or *skill.md
- Extracts product names and version references from frontmatter
- In interactive mode, prompts you to confirm or correct npm package mappings
- In non-interactive mode (-y), auto-detects mappings from frontmatter
- Generates a skills-check.json with a $schema reference for editor validation

**Usage:** `npx skills-check init [dir] [options]`

**Options:**

- `-y, --yes` — Non-interactive mode (auto-detect mappings)
- `-o, --output <path>` — Output path (default: skills-check.json)

**Examples:**

Interactive setup:

```
npx skills-check init ./skills
```

Auto-detect:

```
npx skills-check init ./skills -y
```

Custom output path:

```
npx skills-check init ./skills -o config/registry.json
```

**CI tip:** Run init once locally, then commit skills-check.json to your repo. Other commands will find it automatically.

---

## Registry Format (skills-check.json)

The `skills-check.json` file maps product names to npm packages, tracks verified versions, and lists associated skill/agent files. It follows a JSON Schema available at https://skillscheck.ai/schema.json.
### Structure

```json
{
  "$schema": "https://skillscheck.ai/schema.json",
  "version": 1,
  "products": {
    "ai-sdk": {
      "displayName": "Vercel AI SDK",
      "package": "ai",
      "verifiedVersion": "4.2.0",
      "verifiedAt": "2026-01-15T00:00:00Z",
      "skills": ["ai-sdk-core", "ai-sdk-tools"],
      "agents": ["ai-sdk-engineer"]
    }
  }
}
```

### Fields

- `$schema` — URL to the JSON Schema for editor validation
- `version` — Schema version (currently 1)
- `products` — Map of product names to their configuration:
  - `displayName` — Human-readable product name
  - `package` — npm package name used for version lookups
  - `verifiedVersion` — Last verified version string
  - `verifiedAt` — ISO 8601 timestamp of last verification
  - `skills` — Array of skill file names (without .md extension)
  - `agents` — Array of agent file names (without .md extension)

---

## SKILL.md Frontmatter Specification

Each SKILL.md file should include YAML frontmatter with the following fields:

### Required Fields

- `name` — Unique identifier for the skill (e.g., "ai-sdk-core")
- `description` — Brief description of what the skill covers

### Recommended Fields

- `product-version` — Semver version of the product this skill targets (e.g., "4.2.0"). Required for version drift detection via `check`.
- `author` — Skill author name or organization
- `license` — SPDX license identifier (e.g., "MIT", "Apache-2.0"). Supports OR/AND expressions.
- `repository` — URL to the skill's source repository

### Example

```yaml
---
name: ai-sdk-core
description: Core patterns for the Vercel AI SDK
product-version: "4.2.0"
author: "Your Name"
license: "MIT"
repository: "https://github.com/your-org/skills"
---

# AI SDK Core

Your skill content here...
```

---

## GitHub Action

The `voodootikigod/skills-check` action runs one or more skills-check commands in your CI pipeline.
### Basic Usage

```yaml
- uses: voodootikigod/skills-check@v1
  with:
    commands: check,audit,lint,budget
    audit-fail-on: high
    lint-fail-on: error
    budget-max-tokens: 50000
```

### Command Selection Inputs

| Input | Default | Description |
|-------|---------|-------------|
| commands | "" | Comma-separated list (e.g., check,audit,lint). Overrides toggle flags. |
| check | true | Run version drift detection |
| audit | false | Run security and hallucination detection |
| lint | false | Run metadata validation |
| budget | false | Run token cost analysis |
| policy | false | Run policy enforcement |
| verify | false | Run semver bump validation |
| test | false | Run eval test suites |

### Threshold Inputs

| Input | Default | Description |
|-------|---------|-------------|
| audit-fail-on | high | Severity threshold: critical, high, medium, low |
| lint-fail-on | error | Level threshold: error or warning |
| budget-max-tokens | "" | Token ceiling (empty = no limit) |
| policy-file | "" | Path to .skill-policy.yml |
| policy-fail-on | blocked | Threshold: blocked, violation, warning |

### Shared Inputs

| Input | Default | Description |
|-------|---------|-------------|
| skills-dir | . | Directory containing skill files |
| registry | skills-check.json | Path to registry file |
| node-version | 22 | Node.js version |
| open-issues | true | Open/update GitHub issue on staleness |
| fail-on-stale | false | Exit non-zero when stale |

### Outputs

| Output | Description |
|--------|-------------|
| stale-count | Number of stale products (0 if current) |
| issue-number | Issue number created/updated (empty if none) |
| report | Full markdown report from check |
| results | JSON with per-command exit codes |

### Full Quality Gate Example

```yaml
name: Skill Quality Check

on:
  schedule:
    - cron: "0 9 * * 1" # Monday 09:00 UTC
  workflow_dispatch:

permissions:
  contents: read
  issues: write

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: voodootikigod/skills-check@v1
        with:
          commands: check,audit,lint,budget
          audit-fail-on: high
          budget-max-tokens: 100000
          fail-on-stale: "true"
```

---

## LLM Provider Setup

For AI-assisted commands (`refresh`, `verify`, `test`), install a provider SDK and set the API key:

### Anthropic (Claude)

```
npm install @ai-sdk/anthropic
export ANTHROPIC_API_KEY=sk-...
```

### OpenAI

```
npm install @ai-sdk/openai
export OPENAI_API_KEY=sk-...
```

### Google (Gemini)

```
npm install @ai-sdk/google
export GOOGLE_GENERATIVE_AI_API_KEY=...
```

---

## Links

- npm: https://www.npmjs.com/package/skills-check
- GitHub: https://github.com/voodootikigod/skills-check
- Documentation: https://skillscheck.ai/docs
- JSON Schema: https://skillscheck.ai/schema.json