Skip to main content

Eval Case Format

YAML schema

name: string              # required - human-readable case name
prompt: string # required - user prompt sent to Claude
criteria: # required - list of pass/fail assertions
- string
- string
expect_skill: boolean # optional - default true
timeout: integer # optional - default from action input (120s)
files: # optional - temp files created before execution
- path: string
content: string

How execution works

For positive cases (expect_skill: true), the SKILL.md content is prepended to the prompt so Claude has access to the skill's instructions.

For negative cases (expect_skill: false), the bare prompt is sent without skill content.

Grading output

Each case produces a grading.json:

{
"expectations": [
{
"text": "Output contains a valid resource block",
"passed": true,
"evidence": "The response includes 'resource \"aws_instance\" \"web\" {...}'"
},
{
"text": "Uses for_each, not count",
"passed": false,
"evidence": "Response uses 'count = var.instance_count' instead of for_each"
}
],
"summary": {
"passed": 1,
"failed": 1,
"total": 2,
"pass_rate": 0.5
}
}