# test
Execute eval test suites that verify skills actually work when loaded by an agent. Define test cases in `cases.yaml` with prompts, expected outcomes, and graders. Track baselines to catch regressions after `refresh`.
## Why it matters
A skill can have perfect metadata and pass every lint check, yet still produce wrong code when an agent uses it. `test` closes this gap by actually running prompts through an agent harness and grading the output: integration tests for your skill files.
## What it does
- Discovers `tests/` directories inside skill directories containing a `cases.yaml`
- Parses declarative test suites with `trigger`, `outcome`, `style`, and `regression` test types
- Executes prompts through configurable agent harnesses (Claude Code CLI, generic shell)
- Grades results with 7 built-in graders: `file-exists`, `command`, `contains`, `not-contains`, `json-match`, `package-has`, `llm-rubric`
- Supports custom graders via dynamic module import
- Runs multiple trials per test case, with configurable pass thresholds and flaky-test detection
- Stores baselines for regression tracking across skill updates
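A suite combining these pieces could look like the sketch below. The field names (`suite`, `cases`, `graders`, `trials`) and the skill name are illustrative assumptions based on the test types and graders listed above, not a verbatim schema:

```yaml
# Illustrative cases.yaml sketch — key names are assumptions, not the exact schema.
suite: ai-sdk-core
cases:
  - name: generates-streaming-route
    type: outcome
    prompt: "Add a streaming chat route using the AI SDK"
    graders:
      - grader: file-exists
        path: app/api/chat/route.ts
      - grader: contains
        file: app/api/chat/route.ts
        text: streamText
  - name: skill-fires-on-streaming-question
    type: trigger
    prompt: "How do I stream model output to the client?"
    trials: 3
    graders:
      - grader: llm-rubric
        rubric: "Response should use the streaming helpers the skill documents."
```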
## Usage

```shell
npx skills-check test [dir] [options]
```

### Options
| Flag | Description |
|---|---|
| `-s, --skill <name>` | Test a specific skill |
| `-t, --type <type>` | Filter by type: `trigger`, `outcome`, `style`, or `regression` |
| `--agent <name>` | Agent harness: `claude-code` or `generic` |
| `--trials <n>` | Number of runs per test case |
| `--dry` | Preview the test plan without executing |
| `--update-baseline` | Save results as the new baseline |
| `--ci` | CI mode with strict exit codes |
| `-f, --format <type>` | Output format: `terminal` or `json` |
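The built-in graders cover most checks; for anything else, custom graders load via dynamic module import. The exact grader interface is not documented on this page, so the argument and return shapes below are assumptions. In a real project this function would be the module's default export:

```javascript
// Hypothetical custom grader — argument/return shapes are assumptions,
// not the documented skills-check interface. The runner would load this
// module via dynamic import() and call the exported function per test case.
async function semverGrader({ output }) {
  // Pass when the agent's output contains a semver-style version string.
  const ok = /\b\d+\.\d+\.\d+\b/.test(output);
  return {
    pass: ok,
    reason: ok ? 'found a semver version' : 'no semver version in output',
  };
}

// Quick local check of the grader logic:
semverGrader({ output: 'Bumped package to 2.1.0' }).then((r) => console.log(r.pass)); // true
```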
## Examples
Run all tests:

```shell
npx skills-check test
```

Test one skill:

```shell
npx skills-check test -s ai-sdk-core
```

Outcome tests only:

```shell
npx skills-check test --type outcome
```

Preview the plan:

```shell
npx skills-check test --dry
```

Update the baseline:

```shell
npx skills-check test --update-baseline
```

**CI tip:** Run `test --ci` after a `refresh` to catch regressions. Use `--update-baseline` on `main` after verified changes so future PRs compare against the latest known-good results.
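Wired into a pipeline, that tip might look like the step below. This is a sketch assuming GitHub Actions; the step name and the `refresh` invocation are illustrative, and only the `test --ci` command comes from this page:

```yaml
# Illustrative GitHub Actions step — workflow shape and refresh usage are assumptions.
- name: Refresh skills and check for regressions
  run: |
    npx skills-check refresh
    npx skills-check test --ci --format json > skills-test-report.json
```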