← Datasets

Refund-flow policy guardrails

The checks the support agent must keep passing before any release.

4 items · 69e8360b-cd22-4fc2-bebf-77394b6eca0e

Run an eval

Replays every item and scores the output. Use the captured model for a regression check, or override the model to compare.

Items
Eval history