
Getting Started

Add skill evaluation to your CI pipeline in 3 steps.

Prerequisites

  • A GitHub repository with Claude Code skills
  • ANTHROPIC_API_KEY as a repository or environment secret
  • Eval YAML files in each skill's evals/ directory

Step 1: Add eval cases

Create test cases in your skill directory:

skills/my-skill/
  SKILL.md
  evals/
    001-basic-usage.yaml
    002-edge-case.yaml
    003-negative-trigger.yaml

Each YAML file defines a test:

# evals/001-basic-usage.yaml
name: Basic usage
prompt: "The user prompt that should trigger this skill"
criteria:
  - "Output contains the expected pattern"
  - "Response follows the skill's conventions"
expect_skill: true
timeout: 120
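The directory listing above also includes a negative-trigger case, which checks that the skill does not fire on unrelated prompts. A sketch, assuming `expect_skill: false` inverts the trigger check:

```yaml
# evals/003-negative-trigger.yaml
# Hypothetical example: field names mirror the eval above;
# the prompt and criteria text are placeholders.
name: Negative trigger
prompt: "An unrelated prompt that should not activate this skill"
criteria:
  - "Response answers normally without following the skill's conventions"
expect_skill: false
timeout: 120
```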

Step 2: Add the workflow

Create .github/workflows/skill-eval.yml:

name: Skill Eval

on:
  pull_request:
    paths:
      - 'skills/**'
  workflow_dispatch:

permissions:
  contents: read
  pull-requests: write

jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v6

      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill
          skill-path: skills/my-skill
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
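If your repository holds several skills, a standard GitHub Actions matrix can run the same eval job once per skill. A sketch, with placeholder skill names:

```yaml
# Hypothetical matrix variant of the eval job; the skill names
# in the matrix are placeholders for your own directories.
jobs:
  eval:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    strategy:
      fail-fast: false          # let every skill's evals finish
      matrix:
        skill: [my-skill, my-other-skill]
    steps:
      - uses: actions/checkout@v6

      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: ${{ matrix.skill }}
          skill-path: skills/${{ matrix.skill }}
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```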

Step 3: Run it

Commit and push the workflow, then trigger a run manually (it also runs automatically on pull requests that touch skills/):

gh workflow run skill-eval.yml

The action will:

  1. Execute each eval case via claude -p
  2. Grade responses against your criteria
  3. Post results in the GitHub Actions step summary
  4. Upload an interactive HTML viewer as an artifact
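Before pushing, you can sanity-check your eval files locally. A minimal sketch: the required-field list below is an assumption inferred from the example eval above, not the action's documented schema, and `validate_eval` takes an already-parsed case as a dict.

```python
# Minimal local sanity check for one parsed eval case.
# ASSUMPTION: name/prompt/criteria treated as required, and the
# expect_skill/timeout types inferred from the example eval above.

REQUIRED_KEYS = {"name", "prompt", "criteria"}

def validate_eval(case: dict) -> list[str]:
    """Return a list of problems found in one eval case (empty = OK)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - case.keys())]
    if not isinstance(case.get("criteria", []), list):
        problems.append("criteria must be a list of strings")
    if not isinstance(case.get("expect_skill", True), bool):
        problems.append("expect_skill must be true or false")
    if not isinstance(case.get("timeout", 120), int):
        problems.append("timeout must be an integer")
    return problems

# Example: the eval case from Step 1, as a parsed dict.
case = {
    "name": "Basic usage",
    "prompt": "The user prompt that should trigger this skill",
    "criteria": ["Output contains the expected pattern"],
    "expect_skill": True,
    "timeout": 120,
}
print(validate_eval(case))  # []
```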

Next steps