
Configuration

All configuration is done through action inputs in your workflow file.

Inputs

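| Input | Required | Default | Description |
| --- | --- | --- | --- |
| `skill-name` | Yes | - | Name of the skill to evaluate |
| `skill-path` | Yes | - | Path to the skill directory |
| `anthropic-api-key` | Yes | - | Anthropic API key used by the `claude` CLI |
| `pass-threshold` | No | `80` | Minimum pass rate (0-100) required for the step to succeed |
| `timeout` | No | `120` | Timeout per eval case, in seconds |
| `post-comment` | No | `true` | Post results as a PR comment |
| `github-token` | No | `github.token` | Token used to post PR comments |
| `upload-viewer` | No | `true` | Upload the eval viewer HTML as an artifact |
| `node-version` | No | `22` | Node.js version used to run the `claude` CLI |
| `max-retries` | No | `3` | Maximum retry attempts per API call |
| `retry-delay` | No | `10` | Base delay between retries, in seconds |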
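A minimal workflow that sets the three required inputs and overrides two defaults might look like the sketch below. The `pull_request` trigger, the `actions/checkout` step, the `permissions` block, and the `my-skill` / `skills/my-skill` values are assumptions for illustration, not requirements stated on this page.

```yaml
name: Skill eval

on:
  pull_request:

permissions:
  contents: read
  pull-requests: write  # assumption: lets post-comment write results to the PR

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # puts the skill directory in the workspace
      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill          # hypothetical skill name
          skill-path: skills/my-skill   # hypothetical skill directory
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          pass-threshold: '90'          # stricter than the default 80
          timeout: '300'                # allow each eval case up to 5 minutes
```

Inputs are passed as strings, so quoting numeric values keeps YAML from reinterpreting them.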

Outputs

| Output | Description |
| --- | --- |
| `pass-rate` | Overall pass rate as a percentage (0-100) |
| `passed` | Total criteria passed |
| `total` | Total criteria evaluated |
| `cases-run` | Number of eval cases executed |
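To read the outputs, give the action step an `id` and reference it through the `steps` context. The sketch below writes the results to the job summary; the step id `eval` and the skill values are hypothetical. The `if: always()` lets the summary step run even when the threshold check fails the eval step, assuming the action sets its outputs before failing.

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - id: eval  # hypothetical step id, used below to read outputs
        uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill          # hypothetical
          skill-path: skills/my-skill   # hypothetical
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Summarize eval results
        if: always()  # run even if the eval step failed the threshold
        run: |
          echo "Pass rate: ${{ steps.eval.outputs.pass-rate }}%" >> "$GITHUB_STEP_SUMMARY"
          echo "Criteria: ${{ steps.eval.outputs.passed }}/${{ steps.eval.outputs.total }}" >> "$GITHUB_STEP_SUMMARY"
          echo "Cases run: ${{ steps.eval.outputs.cases-run }}" >> "$GITHUB_STEP_SUMMARY"
```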

Environment secrets

To limit which jobs can read your API key, store it as a secret in a GitHub environment and reference that environment from the job:

```yaml
jobs:
  eval:
    environment: skill-eval  # scoped secret access
    steps:
      - uses: skill-bench/skill-eval-action@v1
        with:
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```

Retry behavior

The action retries failed API calls automatically:

  • 3 attempts per `claude -p` call (configurable via `max-retries`)
  • Linear backoff: the wait is `retry-delay` multiplied by the attempt number (10s, 20s, 30s by default)
  • Retries on: timeout, non-zero exit code, or JSON parse failure
  • Only the final attempt's result is reported
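If your runs hit transient API errors often, you can raise both retry inputs. The step fragment below is a sketch; the values `5` and `15` are illustrative, and the skill values are hypothetical. Following the backoff rule above, each wait grows by `retry-delay`.

```yaml
steps:
  - uses: skill-bench/skill-eval-action@v1
    with:
      skill-name: my-skill          # hypothetical
      skill-path: skills/my-skill   # hypothetical
      anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
      max-retries: '5'   # up to 5 attempts per claude -p call
      retry-delay: '15'  # waits grow linearly: 15s, 30s, 45s, ...
```

Longer delays trade total job time for a better chance of riding out rate limits.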

Threshold enforcement

The action fails the step when the overall pass rate is below `pass-threshold`. Set `pass-threshold: '0'` to disable this check and use the action for reporting only.
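One way to stay in report-only mode while still surfacing regressions is to pair `pass-threshold: '0'` with a follow-up step that reads the `pass-rate` output and emits a warning annotation instead of failing. The 70% cutoff, the step id `eval`, and the skill values are hypothetical, and the `fromJSON` comparison assumes `pass-rate` is a plain number such as `85` or `85.5`.

```yaml
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - id: eval
        uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill          # hypothetical
          skill-path: skills/my-skill   # hypothetical
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          pass-threshold: '0'           # report only: never fail on pass rate
      - name: Warn on low pass rate
        # fromJSON turns the string output into a number for the comparison
        if: fromJSON(steps.eval.outputs.pass-rate) < 70
        run: echo "::warning::Skill pass rate is ${{ steps.eval.outputs.pass-rate }}%, below 70%"
```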