# Eval Case Format
## YAML schema

```yaml
name: string           # required - human-readable case name
prompt: string         # required - user prompt sent to Claude
criteria:              # required - list of pass/fail assertions
  - string
  - string
expect_skill: boolean  # optional - default true
timeout: integer       # optional - default from action input (120s)
files:                 # optional - temp files created before execution
  - path: string
    content: string
```
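A concrete case might look like the following. This is an illustrative sketch: the case name, prompt wording, and file contents are invented here to match the Terraform-flavored criteria shown in the grading output below, not taken from a real suite.

```yaml
name: prefers-for-each
prompt: Create an aws_instance resource for each name in var.instance_names.
criteria:
  - Output contains a valid resource block
  - Uses for_each, not count
expect_skill: true
timeout: 120
files:
  - path: variables.tf
    content: |
      variable "instance_names" {
        type = set(string)
      }
```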
## How execution works
For positive cases (`expect_skill: true`), the `SKILL.md` content is prepended to the prompt so Claude has access to the skill's instructions.
For negative cases (`expect_skill: false`), the bare prompt is sent without skill content.
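The branch on `expect_skill` can be sketched as follows. This is a minimal illustration, assuming the case has been parsed from YAML into a dict; `build_prompt` is a hypothetical helper name, not part of the harness's documented API.

```python
def build_prompt(case: dict, skill_md: str) -> str:
    # Positive cases (expect_skill defaults to true) get SKILL.md prepended,
    # so the model sees the skill's instructions before the user prompt.
    if case.get("expect_skill", True):
        return f"{skill_md}\n\n{case['prompt']}"
    # Negative cases send the bare prompt with no skill content.
    return case["prompt"]

skill = "# SKILL.md\nAlways use for_each."
positive = {"name": "uses-skill", "prompt": "Write the module."}
negative = {"name": "no-skill", "prompt": "Write the module.", "expect_skill": False}

print(build_prompt(positive, skill).startswith("# SKILL.md"))  # True
print(build_prompt(negative, skill))                           # Write the module.
```

Defaulting the missing key to `True` mirrors the schema above, where `expect_skill` is optional with a default of true.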
## Grading output
Each case produces a `grading.json`:
```json
{
  "expectations": [
    {
      "text": "Output contains a valid resource block",
      "passed": true,
      "evidence": "The response includes 'resource \"aws_instance\" \"web\" {...}'"
    },
    {
      "text": "Uses for_each, not count",
      "passed": false,
      "evidence": "Response uses 'count = var.instance_count' instead of for_each"
    }
  ],
  "summary": {
    "passed": 1,
    "failed": 1,
    "total": 2,
    "pass_rate": 0.5
  }
}
```
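The `summary` fields are a straightforward aggregation over `expectations`. A minimal sketch of that derivation, using the two expectations from the example above (the grader's actual implementation may differ):

```python
import json

grading = {
    "expectations": [
        {"text": "Output contains a valid resource block", "passed": True},
        {"text": "Uses for_each, not count", "passed": False},
    ]
}

# Count passing expectations and derive the remaining summary fields.
passed = sum(1 for e in grading["expectations"] if e["passed"])
total = len(grading["expectations"])
grading["summary"] = {
    "passed": passed,
    "failed": total - passed,
    "total": total,
    "pass_rate": passed / total if total else 0.0,
}

print(json.dumps(grading["summary"]))
# {"passed": 1, "failed": 1, "total": 2, "pass_rate": 0.5}
```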