
Getting Started

Add skill evaluation to your CI pipeline in 3 steps.

Prerequisites

  • A GitHub repository with Claude Code skills
  • ANTHROPIC_API_KEY as a repository or environment secret
  • Eval YAML files in each skill's evals/ directory

Step 1: Add eval cases

Create test cases in your skill directory:

skills/my-skill/
  SKILL.md
  evals/
    001-basic-usage.yaml
    002-edge-case.yaml
    003-negative-trigger.yaml

Each YAML file defines a test:

# evals/001-basic-usage.yaml
name: Basic usage
prompt: "The user prompt that should trigger this skill"
criteria:
  - "Output contains the expected pattern"
  - "Response follows the skill's conventions"
expect_skill: true
timeout: 120
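The directory listing above also includes a negative-trigger case, which checks that the skill does not fire on unrelated prompts. A sketch, assuming `expect_skill: false` inverts the trigger check:

```yaml
# evals/003-negative-trigger.yaml
# Hypothetical example: field names mirror the eval above;
# the prompt and criteria text are placeholders.
name: Negative trigger
prompt: "An unrelated prompt that should not activate this skill"
criteria:
  - "Response answers normally without following the skill's conventions"
expect_skill: false
timeout: 120
```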

Step 2: Add the workflow

Create .github/workflows/skill-eval.yml:

name: Skill Eval

on:
  pull_request:
    paths:
      - 'skills/**'
  workflow_dispatch:

permissions:
  contents: read
  pull-requests: write

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v6

      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill
          skill-path: skills/my-skill
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
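If your repository holds several skills, a standard GitHub Actions matrix can run the same eval job once per skill. A sketch, with placeholder skill names:

```yaml
# Hypothetical matrix variant of the eval job; the skill names
# in the matrix are placeholders for your own directories.
jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false          # let every skill's evals finish
      matrix:
        skill: [my-skill, my-other-skill]
    steps:
      - uses: actions/checkout@v6

      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: ${{ matrix.skill }}
          skill-path: skills/${{ matrix.skill }}
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```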

Step 3: Run it

Commit and push the workflow, then trigger a run manually (it also runs automatically on pull requests that touch skills/):

gh workflow run skill-eval.yml

The action will:

  1. Execute each eval case via claude -p
  2. Grade responses against your criteria
  3. Post results in the GitHub Actions step summary
  4. Upload an interactive HTML viewer as an artifact
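Before pushing, you can sanity-check your eval files locally. A minimal sketch: the required-field list below is an assumption inferred from the example eval above, not the action's documented schema, and `validate_eval` takes an already-parsed case as a dict.

```python
# Minimal local sanity check for one parsed eval case.
# ASSUMPTION: name/prompt/criteria treated as required, and the
# expect_skill/timeout types inferred from the example eval above.

REQUIRED_KEYS = {"name", "prompt", "criteria"}

def validate_eval(case: dict) -> list[str]:
    """Return a list of problems found in one eval case (empty = OK)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - case.keys())]
    if not isinstance(case.get("criteria", []), list):
        problems.append("criteria must be a list of strings")
    if not isinstance(case.get("expect_skill", True), bool):
        problems.append("expect_skill must be true or false")
    if not isinstance(case.get("timeout", 120), int):
        problems.append("timeout must be an integer")
    return problems

# Example: the eval case from Step 1, as a parsed dict.
case = {
    "name": "Basic usage",
    "prompt": "The user prompt that should trigger this skill",
    "criteria": ["Output contains the expected pattern"],
    "expect_skill": True,
    "timeout": 120,
}
print(validate_eval(case))  # []
```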

Next steps