Evals
An eval re-runs every step in a dataset and scores it against your checks. Green means your change is safe; a red row is a regression you caught before it shipped. Start one from any dataset's Run eval button.
| Status | Eval | Dataset | Model | Pass | Started |
|---|---|---|---|---|---|
| done | Pre-release · refund-flow | Refund-flow policy guardrails | captured | 3/4 | 31m ago |
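To make the Pass column concrete, here is a minimal sketch of what "re-run every step and score it against your checks" could look like. It is not the product's actual implementation: `Step`, `run_eval`, `toy_model`, `mentions_policy`, and the four sample prompts are all hypothetical names and toy data invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One dataset step (hypothetical shape): a name and the prompt to re-run."""
    name: str
    prompt: str


# A check takes the model's output and returns True if it passes.
Check = Callable[[str], bool]


def run_eval(
    steps: List[Step],
    model: Callable[[str], str],
    checks: List[Check],
) -> Tuple[List[Tuple[str, bool]], str]:
    """Re-run every step through `model` and score the output against every check.

    A step passes (green) only if all checks pass; otherwise it fails (red).
    Returns the per-step rows and a "passed/total" summary string.
    """
    rows = []
    for step in steps:
        output = model(step.prompt)
        passed = all(check(output) for check in checks)
        rows.append((step.name, passed))
    passed_count = sum(1 for _, ok in rows if ok)
    return rows, f"{passed_count}/{len(rows)}"


# --- Toy data, invented for this sketch only ---

def toy_model(prompt: str) -> str:
    """Stand-in for a real model call: returns canned answers."""
    canned = {
        "refund after 10 days": "Refund approved under the 30-day policy.",
        "refund after 45 days": "Refund denied: outside the 30-day policy.",
        "refund without receipt": "Refund denied: a receipt is required under the policy.",
        "refund for gift card": "Sure, refunded!",
    }
    return canned[prompt]


def mentions_policy(output: str) -> bool:
    """Example check: the answer must cite the policy."""
    return "policy" in output.lower()


STEPS = [
    Step("within-window", "refund after 10 days"),
    Step("outside-window", "refund after 45 days"),
    Step("no-receipt", "refund without receipt"),
    Step("gift-card", "refund for gift card"),
]

rows, summary = run_eval(STEPS, toy_model, [mentions_policy])
for name, ok in rows:
    print(f"{'green' if ok else 'red':5}  {name}")
print("Pass:", summary)
```

In this toy run three steps pass and one fails, so the summary reads 3/4, the same shape as the Pass column above. Requiring every check to pass for a step to count as green is a design choice made for this sketch; a real scorer might weight checks or report them individually.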