Configuration

All configuration is done through action inputs in your workflow file.

Inputs

| Input | Required | Default | Description |
| --- | --- | --- | --- |
| `skill-name` | Yes | - | Name of the skill to evaluate |
| `skill-path` | Yes | - | Path to the skill directory |
| `anthropic-api-key` | Yes | - | Anthropic API key for the `claude` CLI |
| `pass-threshold` | No | `80` | Minimum pass rate (0-100) required for the step to succeed |
| `timeout` | No | `120` | Timeout per eval case, in seconds |
| `post-comment` | No | `true` | Post results as a PR comment |
| `github-token` | No | `github.token` | Token used to post PR comments |
| `upload-viewer` | No | `true` | Upload the eval viewer HTML as an artifact |
| `node-version` | No | `22` | Node.js version used to run the `claude` CLI |
| `max-retries` | No | `3` | Maximum retry attempts per API call |
| `retry-delay` | No | `10` | Base delay between retries, in seconds |
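
As a rough illustration of how these inputs fit together, a workflow that sets the three required inputs and overrides two defaults might look like the sketch below. The workflow name, trigger, runner, checkout step, and the `my-skill` name and path are placeholder assumptions, not values prescribed by the action.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/skill-eval.yml
name: Skill eval                # assumed name
on: pull_request                # assumed trigger

jobs:
  eval:
    runs-on: ubuntu-latest      # assumed runner
    steps:
      - uses: actions/checkout@v4
      - uses: skill-bench/skill-eval-action@v1
        with:
          skill-name: my-skill              # placeholder skill name
          skill-path: skills/my-skill       # placeholder path
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          pass-threshold: '90'              # stricter than the default of 80
          timeout: '180'                    # seconds per eval case
```

Inputs left out of `with:` fall back to the defaults listed in the table.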

Outputs

| Output | Description |
| --- | --- |
| `pass-rate` | Overall pass rate as a percentage (0-100) |
| `passed` | Total number of criteria passed |
| `total` | Total number of criteria evaluated |
| `cases-run` | Number of eval cases executed |
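
One way to consume these outputs is to give the action step an `id` and read the values in a later step. The sketch below writes them to the job summary; the step id `eval`, the follow-up step, and the placeholder skill name and path are assumptions made for illustration, and it presumes the action sets its outputs even when the threshold check fails.

```yaml
steps:
  - id: eval                    # arbitrary step id, used to reference outputs
    uses: skill-bench/skill-eval-action@v1
    with:
      skill-name: my-skill              # placeholder skill name
      skill-path: skills/my-skill       # placeholder path
      anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

  - name: Summarize eval results
    if: always()                # still run if the eval step failed
    run: |
      echo "Pass rate: ${{ steps.eval.outputs.pass-rate }}%" >> "$GITHUB_STEP_SUMMARY"
      echo "Criteria passed: ${{ steps.eval.outputs.passed }} of ${{ steps.eval.outputs.total }}" >> "$GITHUB_STEP_SUMMARY"
      echo "Cases run: ${{ steps.eval.outputs.cases-run }}" >> "$GITHUB_STEP_SUMMARY"
```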

Environment secrets

For tighter security, store your API key as a secret in a GitHub environment so that only jobs targeting that environment can read it:

```yaml
jobs:
  eval:
    environment: skill-eval  # scoped secret access
    steps:
      - uses: skill-bench/skill-eval-action@v1
        with:
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```

Retry behavior

The action retries failed API calls automatically:

- Up to 3 attempts per `claude -p` call by default (configurable via `max-retries`)
- Linear backoff: the wait is `retry-delay` multiplied by the attempt number (10s, 20s, 30s by default)
- Retries on: timeout, non-zero exit code, JSON parse failure
- Only the final attempt's result is reported
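
To tune this behavior, both retry inputs can be set on the action step. The values below are illustrative only, as are the placeholder skill name and path; the delays in the comment simply apply the delay-times-attempt-number rule described above.

```yaml
- uses: skill-bench/skill-eval-action@v1
  with:
    skill-name: my-skill              # placeholder skill name
    skill-path: skills/my-skill       # placeholder path
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    max-retries: '5'                  # example value; default is 3
    retry-delay: '5'                  # base delay of 5s, so waits grow as 5s, 10s, 15s, ...
```

A shorter base delay with more retries keeps the early retries quick while still spacing out later ones.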

Threshold enforcement

The action fails the step if the pass rate is below `pass-threshold`. Set `pass-threshold: '0'` to never fail the step and use the action for reporting only.
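
A reporting-only setup might look like the following sketch, where the skill name and path are placeholders. With the threshold at zero, the pass rate can never fall below it, so the step should not fail on eval results.

```yaml
- uses: skill-bench/skill-eval-action@v1
  with:
    skill-name: my-skill              # placeholder skill name
    skill-path: skills/my-skill       # placeholder path
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    pass-threshold: '0'               # report results without gating the job
```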
