Configuration
All configuration is done through action inputs in your workflow file.
Inputs
| Input | Required | Default | Description |
|---|---|---|---|
| `skill-name` | Yes | - | Name of the skill to evaluate |
| `skill-path` | Yes | - | Path to the skill directory |
| `anthropic-api-key` | Yes | - | Anthropic API key used by the `claude` CLI |
| `pass-threshold` | No | `80` | Minimum pass rate (0-100) required for the step to succeed |
| `timeout` | No | `120` | Timeout per eval case, in seconds |
| `post-comment` | No | `true` | Post results as a PR comment |
| `github-token` | No | `github.token` | Token used to post PR comments |
| `upload-viewer` | No | `true` | Upload the eval viewer HTML as a workflow artifact |
| `node-version` | No | `22` | Node.js version used to run the `claude` CLI |
| `max-retries` | No | `3` | Maximum attempts per API call |
| `retry-delay` | No | `10` | Base delay between retries, in seconds |
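A minimal workflow that wires up the three required inputs and overrides two optional ones might look like the sketch below. It assumes the skill lives in the same repository (hence the `actions/checkout` step) and that posting PR comments with the default token needs `pull-requests: write`; the skill name and path are hypothetical placeholders.

```yaml
name: Skill eval

on:
  pull_request:

jobs:
  eval:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # assumption: needed for post-comment with the default token
    steps:
      # Assumes the skill directory is in this repository, so check it out first.
      - uses: actions/checkout@v4
      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill             # hypothetical skill name
          skill-path: skills/my-skill      # hypothetical path in this repo
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          pass-threshold: '90'             # stricter than the default of 80
          timeout: '180'                   # seconds per eval case
```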
Outputs
| Output | Description |
|---|---|
| `pass-rate` | Overall pass rate as a percentage (0-100) |
| `passed` | Number of criteria passed |
| `total` | Number of criteria evaluated |
| `cases-run` | Number of eval cases executed |
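To use these outputs in later steps, give the action step an `id` and read them through `steps.<id>.outputs`. The job fragment below is a sketch: the `eval` id and the skill values are hypothetical, and the summary step runs under `always()` on the assumption that the action sets its outputs before failing the step on the threshold (if it does not, the values will be empty).

```yaml
steps:
  - uses: skill-bench/skill-eval-action@v1
    id: eval                               # hypothetical id used to reference outputs
    with:
      skill-name: my-skill                 # hypothetical
      skill-path: skills/my-skill          # hypothetical
      anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
  - name: Summarize eval results
    if: always()                           # still runs if the eval step failed the threshold
    env:
      PASS_RATE: ${{ steps.eval.outputs.pass-rate }}
      PASSED: ${{ steps.eval.outputs.passed }}
      TOTAL: ${{ steps.eval.outputs.total }}
      CASES_RUN: ${{ steps.eval.outputs.cases-run }}
    run: |
      # Passing outputs through env avoids interpolating them directly into the script.
      echo "Pass rate: ${PASS_RATE}% (${PASSED}/${TOTAL} criteria, ${CASES_RUN} cases)" >> "$GITHUB_STEP_SUMMARY"
```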
Environment secrets
For tighter security, store your API key as a secret in a GitHub environment, so only jobs that reference that environment can read it:
jobs:
  eval:
    runs-on: ubuntu-latest
    environment: skill-eval  # scoped secret access
    steps:
      - uses: skill-bench/skill-eval-action@v1
        with:
          # plus the required skill-name and skill-path inputs
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
Retry behavior
The action retries failed API calls automatically:
- Up to 3 attempts per `claude -p` call (configurable via `max-retries`)
- Linear backoff: the wait is `retry-delay` * attempt number (10s, 20s, 30s by default)
- Retries on: timeout, non-zero exit code, or JSON parse failure
- Only the final attempt's result is reported
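If your runs hit rate limits or transient failures often, you can raise both knobs. The step below is a sketch with hypothetical skill values; the comment on `retry-delay` simply applies the `retry-delay` * attempt number schedule described above.

```yaml
- uses: skill-bench/skill-eval-action@v1
  with:
    skill-name: my-skill                   # hypothetical
    skill-path: skills/my-skill            # hypothetical
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    max-retries: '5'                       # up to 5 attempts per claude -p call
    retry-delay: '15'                      # waits grow as 15s, 30s, 45s, ...
```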
Threshold enforcement
The action fails the step if the pass rate is below `pass-threshold`. Set `pass-threshold: '0'` to never fail on the threshold (reporting only).
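One way to combine both modes is to pick the threshold with an expression, for example enforcing a gate on pushes while only reporting on pull requests. This is a sketch with hypothetical skill values and an arbitrary 95% gate; putting `'0'` last in the `&& ... ||` idiom means it is returned whenever the condition is false.

```yaml
- uses: skill-bench/skill-eval-action@v1
  with:
    skill-name: my-skill                   # hypothetical
    skill-path: skills/my-skill            # hypothetical
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    # Enforce 95% on push events; report only ('0') on every other event.
    pass-threshold: ${{ github.event_name == 'push' && '95' || '0' }}
```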