Datasets
A dataset is a saved set of agent steps you want to keep an eye on, plus the checks each one has to pass. It's how you turn a bug you fixed once into a test that catches it if it ever comes back.
1. Add a step. Open a run, pick an LLM step, and click + Add to dataset on its Replay tab.
2. Set the checks. Choose what its output must always do, e.g. must not error, must call refund(), or satisfy a plain-English rubric.
3. Run an eval. Runback replays every step and scores it, so a regression shows up as a red row before it ships.
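
To make the three steps concrete, here is a minimal sketch of how saved steps and their checks could be modeled and scored. It is not Runback's API: `StepOutput`, `DatasetEntry`, `must_not_error`, `must_call`, `run_eval`, and the `fake_replay` stand-in are all hypothetical names invented for illustration. A plain-English rubric is left out because this sketch has no grader to score one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical shape of a replayed step's output (an assumption, not Runback's schema).
@dataclass
class StepOutput:
    text: str = ""
    error: Optional[str] = None
    tool_calls: List[str] = field(default_factory=list)


# A check takes a step's output and returns True when the output passes.
Check = Callable[[StepOutput], bool]


def must_not_error(out: StepOutput) -> bool:
    """Pass when the step finished without an error."""
    return out.error is None


def must_call(tool_name: str) -> Check:
    """Build a check that passes when the step called the named tool."""
    def check(out: StepOutput) -> bool:
        return tool_name in out.tool_calls
    return check


# A dataset entry: one saved step plus the labeled checks it has to pass.
@dataclass
class DatasetEntry:
    name: str
    checks: Dict[str, Check]


def run_eval(
    entries: List[DatasetEntry],
    replay: Callable[[str], StepOutput],
) -> List[Tuple[str, str, List[str]]]:
    """Replay every entry and return (name, 'green' or 'red', failed check labels)."""
    rows = []
    for entry in entries:
        out = replay(entry.name)
        failed = [label for label, check in entry.checks.items() if not check(out)]
        rows.append((entry.name, "red" if failed else "green", failed))
    return rows


def fake_replay(name: str) -> StepOutput:
    """Stand-in for a real replay: returns canned outputs keyed by step name."""
    outputs = {
        "refund-request": StepOutput(text="Refund issued.", tool_calls=["refund"]),
        "angry-customer": StepOutput(text="Sorry to hear that."),
    }
    return outputs[name]


checks = {"must not error": must_not_error, "must call refund()": must_call("refund")}
entries = [
    DatasetEntry("refund-request", checks),
    DatasetEntry("angry-customer", checks),
]
results = run_eval(entries, fake_replay)
for name, status, failed in results:
    print(name, status, failed)
```

In this sketch the second step never calls `refund`, so it comes back as a red row listing the failed check, which is the kind of regression signal described in step 3.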