Test Case Models
evaldeck.test_case.EvalCase
Bases: BaseModel
A test case for evaluating an agent.
A test case defines the conversation turns to send to the agent and, for each turn, the expected behavior or output to validate against.
Example

Single turn:

    turns:
      - user: "Book a flight to NYC"
        expected:
          tools_called: [search_flights, book_flight]

Multi-turn:

    turns:
      - user: "I want to book a flight"
      - user: "NYC to LA, March 15"
        expected:
          tools_called: [search_flights]
      - user: "Book the cheapest one"
        expected:
          tools_called: [book_flight]
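The multi-turn example can be mirrored as plain Python data to show how expectations attach to individual turns. This is a minimal sketch built on stdlib dataclasses, not on evaldeck's own pydantic models: the `Turn` and `Expected` classes and the `missing_tools` helper are hypothetical names, and the "every expected tool must appear among the actual calls" check is an assumption, not evaldeck's documented grading logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expected:
    # Hypothetical stand-in for the `expected` block of a turn.
    tools_called: list[str] = field(default_factory=list)


@dataclass
class Turn:
    # Hypothetical stand-in for one entry under `turns`.
    user: str
    expected: Optional[Expected] = None


def missing_tools(turn: Turn, actual_calls: list[str]) -> list[str]:
    """Return the expected tools that were not called (assumed subset semantics)."""
    if turn.expected is None:
        return []  # a turn without `expected` has nothing to validate
    return [t for t in turn.expected.tools_called if t not in actual_calls]


# The multi-turn example from above, expressed as data.
turns = [
    Turn(user="I want to book a flight"),
    Turn(user="NYC to LA, March 15", expected=Expected(["search_flights"])),
    Turn(user="Book the cheapest one", expected=Expected(["book_flight"])),
]
```

Under these assumptions, `missing_tools(turns[2], ["search_flights"])` reports `book_flight` as missing, while the first turn, which carries no expectation, always passes.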
evaldeck.test_case.ExpectedBehavior
Bases: BaseModel
Expected behavior for an agent test case.
evaldeck.test_case.EvalSuite
Bases: BaseModel
A collection of test cases.
from_directory (classmethod)
Load all test cases from a directory.
Source code in src/evaldeck/test_case.py
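The page does not show how `from_directory` discovers files, so the following is only a rough sketch of the discovery step such a loader might perform. The helper name `find_case_files`, the `.yaml`/`.yml` extensions, and the non-recursive, sorted traversal are all assumptions for illustration, not evaldeck's actual behavior.

```python
from __future__ import annotations

from pathlib import Path


def find_case_files(directory: str) -> list[Path]:
    """Collect YAML files in `directory`, sorted for a stable load order."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {directory}")
    # Assumption: test cases live in *.yaml / *.yml files at the top level.
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )
```

Each returned path would then be parsed into a test case; sorting keeps the resulting suite order deterministic across platforms.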
filter_by_tags
Return a new suite with only test cases matching the given tags.
Source code in src/evaldeck/test_case.py
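To illustrate what "a new suite with only test cases matching the given tags" can look like, here is a small self-contained sketch. The `Case` and `Suite` dataclasses are hypothetical stand-ins for `EvalCase` and `EvalSuite`, and the "keep a case if it shares at least one tag" rule is an assumption; the real method may require all tags to match.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Case:
    # Hypothetical stand-in for EvalCase: just a name and its tags.
    name: str
    tags: list[str] = field(default_factory=list)


@dataclass
class Suite:
    # Hypothetical stand-in for EvalSuite.
    cases: list[Case] = field(default_factory=list)

    def filter_by_tags(self, tags: list[str]) -> Suite:
        """Return a new Suite; the original suite is left unchanged."""
        wanted = set(tags)
        # Assumed semantics: keep a case if it shares any tag with `wanted`.
        kept = [c for c in self.cases if wanted & set(c.tags)]
        return Suite(cases=kept)


suite = Suite([Case("book", ["booking", "smoke"]), Case("cancel", ["refunds"])])
smoke = suite.filter_by_tags(["smoke"])
```

Returning a fresh object rather than mutating in place matches the "return a new suite" wording, so the full suite stays available for other filters.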
evaldeck.test_case.GraderConfig
Bases: BaseModel
Configuration for a grader.