# Getting Started

Add skill evaluation to your CI pipeline in 3 steps.
## Prerequisites

- A GitHub repository with Claude Code skills
- `ANTHROPIC_API_KEY` set as a repository or environment secret
- Eval YAML files in each skill's `evals/` directory
## Step 1: Add eval cases

Create test cases in your skill directory:

```text
skills/my-skill/
  SKILL.md
  evals/
    001-basic-usage.yaml
    002-edge-case.yaml
    003-negative-trigger.yaml
```
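With several skills in a repository, it can help to enumerate which eval cases exist before wiring up CI. A minimal sketch, assuming the `skills/<skill>/evals/*.yaml` layout above (the function name is hypothetical, not part of the action):

```python
from pathlib import Path

def discover_eval_cases(skills_root: str) -> dict[str, list[Path]]:
    """Map each skill name to its sorted eval YAML files.

    Assumes the layout shown above: skills/<skill>/evals/*.yaml.
    Skills without an evals/ directory are skipped.
    """
    cases: dict[str, list[Path]] = {}
    for skill_dir in sorted(Path(skills_root).iterdir()):
        evals_dir = skill_dir / "evals"
        if evals_dir.is_dir():
            cases[skill_dir.name] = sorted(evals_dir.glob("*.yaml"))
    return cases
```

Running this against your repository root shows at a glance which skills have eval coverage and which do not.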
Each YAML file defines a test:

```yaml
# evals/001-basic-usage.yaml
name: Basic usage
prompt: "The user prompt that should trigger this skill"
criteria:
  - "Output contains the expected pattern"
  - "Response follows the skill's conventions"
expect_skill: true
timeout: 120
```
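A malformed eval file otherwise only surfaces when CI runs. A rough pre-commit check, working on an already-parsed mapping (parsing itself, e.g. with PyYAML, is left out), can catch the obvious mistakes. The required/optional split below is inferred from the example fields and is an assumption until checked against the action's docs:

```python
def validate_eval_case(case: dict) -> list[str]:
    """Return a list of problems with a parsed eval case (empty = looks valid).

    Field names follow the example above; the exact schema the action
    enforces is an assumption here.
    """
    problems = []
    for field in ("name", "prompt", "criteria"):
        if field not in case:
            problems.append(f"missing required field: {field}")
    if not isinstance(case.get("criteria", []), list):
        problems.append("criteria must be a list of strings")
    if not isinstance(case.get("expect_skill", True), bool):
        problems.append("expect_skill must be a boolean")
    if not case.get("timeout", 120) > 0:
        problems.append("timeout must be a positive number of seconds")
    return problems
```

Wiring this into a pre-commit hook keeps broken eval files from reaching the workflow in the first place.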
## Step 2: Add the workflow

Create `.github/workflows/skill-eval.yml`:
```yaml
name: Skill Eval

on:
  pull_request:
    paths:
      - 'skills/**'
  workflow_dispatch:

permissions:
  contents: read
  pull-requests: write

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v6
      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill
          skill-path: skills/my-skill
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```
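The workflow above evaluates a single skill. If your repository holds several, one way to fan out is a standard GitHub Actions matrix over skill names; a sketch under the same inputs as above (`my-skill` and `another-skill` are hypothetical names):

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
        skill: [my-skill, another-skill]  # hypothetical skill names
    steps:
      - uses: actions/checkout@v6
      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: ${{ matrix.skill }}
          skill-path: skills/${{ matrix.skill }}
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```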
## Step 3: Run it

Push the workflow and trigger it:

```shell
gh workflow run skill-eval.yml
```
The action will:

- Execute each eval case via `claude -p`
- Grade responses against your criteria
- Post results in the GitHub Actions step summary
- Upload an interactive HTML viewer as an artifact
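The grading step above can be pictured as a per-criterion pass/fail rollup. A toy stand-in (the real action grades with a model; the substring check here is only a placeholder so the shape of the result stays runnable offline, and the function name is hypothetical):

```python
def grade_response(response: str, criteria: list[str]) -> dict:
    """Toy grader: a criterion passes if its text appears in the response.

    The real action's grading is model-based; this substring match is a
    deliberate simplification to illustrate the result structure only.
    """
    per_criterion = {c: c.lower() in response.lower() for c in criteria}
    return {
        "passed": sum(per_criterion.values()),
        "total": len(per_criterion),
        "ok": all(per_criterion.values()),
        "per_criterion": per_criterion,
    }
```

An eval case then fails as soon as any criterion fails, which is the behavior to expect in the step summary.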
## Next steps
- Writing Evals - learn how to write effective test cases
- Configuration - customize thresholds, timeouts, and retries
- Usage Patterns - evaluate multiple skills in parallel