Base Grader
evaldeck.graders.BaseGrader
Bases: ABC
Base class for all graders.
Graders evaluate a trace against the expected behavior of a test case and return a GradeResult. Both synchronous and asynchronous evaluation are supported.
Async behavior:
- The default grade_async() runs the synchronous grade() in a thread pool.
- Override grade_async() for true async I/O (e.g., LLMGrader).
- When using Evaluator.evaluate_async(), all graders run concurrently.
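The concurrency described in the last bullet can be pictured with plain asyncio. This is a sketch, not evaldeck code: GradeResult, SlowGrader and grade_all are hypothetical stand-ins, and it assumes only that concurrent evaluation behaves like asyncio.gather over each grader's grade_async().

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class GradeResult:
    # Hypothetical stand-in for evaldeck's GradeResult; only the fields used here.
    grader_name: str
    passed: bool


class SlowGrader:
    # Hypothetical grader whose grade_async() awaits simulated I/O.
    def __init__(self, name: str, delay: float, passed: bool) -> None:
        self.name = name
        self.delay = delay
        self.passed = passed

    async def grade_async(self, trace, test_case) -> GradeResult:
        await asyncio.sleep(self.delay)  # stands in for an awaited network call
        return GradeResult(self.name, self.passed)


async def grade_all(graders, trace, test_case) -> list:
    # gather() runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(g.grade_async(trace, test_case) for g in graders))


graders = [SlowGrader("a", 0.2, True), SlowGrader("b", 0.1, False), SlowGrader("c", 0.15, True)]
start = time.perf_counter()
results = asyncio.run(grade_all(graders, trace=None, test_case=None))
elapsed = time.perf_counter() - start
# elapsed is close to the slowest delay (0.2 s), not the 0.45 s sum of all delays.
print([(r.grader_name, r.passed) for r in results], f"{elapsed:.2f}s")
```

Because the delays overlap, wall-clock time tracks the slowest grader rather than the total.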
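Creating a custom async grader:

```python
import httpx
import requests

from evaldeck.graders import BaseGrader
# Also import GradeResult from evaldeck (its module path is not shown on this page).


class MyAPIGrader(BaseGrader):
    name = "my_api"

    def grade(self, trace, test_case):
        # Sync fallback (blocking HTTP call); must also return a GradeResult
        return GradeResult.from_api(requests.post(...).json())

    async def grade_async(self, trace, test_case):
        # Async implementation (non-blocking HTTP call)
        async with httpx.AsyncClient() as client:
            response = await client.post(...)
            return GradeResult.from_api(response.json())
```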
grade(trace, test_case) [abstractmethod]
Evaluate the trace and return a grade result.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trace | Trace | The execution trace to evaluate. | required |
| test_case | EvalCase | The test case with expected behavior. | required |
Returns:
| Type | Description |
|---|---|
| GradeResult | GradeResult indicating pass/fail and details. |
Source code in src/evaldeck/graders/base.py
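As a sketch of what implementing the abstract grade() involves, the block below mirrors the documented contract (take a trace and a test case, return a GradeResult) using hypothetical stand-ins. The Trace.output and EvalCase.expected_substring fields and the ContainsGrader class are illustrative assumptions, not evaldeck's real attributes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins: evaldeck's real Trace, EvalCase and GradeResult
# may use different field names.
@dataclass
class Trace:
    output: str


@dataclass
class EvalCase:
    expected_substring: str


@dataclass
class GradeResult:
    grader_name: str
    passed: bool
    message: str = ""


class BaseGrader(ABC):
    name = "base"

    @abstractmethod
    def grade(self, trace: Trace, test_case: EvalCase) -> GradeResult:
        """Evaluate the trace and return a grade result."""


class ContainsGrader(BaseGrader):
    # Hypothetical grader: passes when the final output contains the expected text.
    name = "contains"

    def grade(self, trace: Trace, test_case: EvalCase) -> GradeResult:
        found = test_case.expected_substring in trace.output
        message = "found" if found else f"missing {test_case.expected_substring!r}"
        return GradeResult(self.name, found, message)


grader = ContainsGrader()
ok = grader.grade(Trace("The capital is Paris."), EvalCase("Paris"))
bad = grader.grade(Trace("I don't know."), EvalCase("Paris"))
print(ok.passed, bad.passed)  # True False
```

Since grade() is declared with @abstractmethod, a subclass that forgets to implement it cannot be instantiated; Python raises TypeError at construction time.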
grade_async(trace, test_case) [async]
Async version of grade().
The default implementation runs the synchronous grade() in a thread pool, so a blocking grader does not stall the event loop. Override this method for true async behavior (e.g., async API calls).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trace | Trace | The execution trace to evaluate. | required |
| test_case | EvalCase | The test case with expected behavior. | required |
Returns:
| Type | Description |
|---|---|
| GradeResult | GradeResult indicating pass/fail and details. |
Source code in src/evaldeck/graders/base.py
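The default "run grade() in a thread pool" behavior can be sketched with asyncio.to_thread (Python 3.9+), which hands the call to the event loop's default ThreadPoolExecutor. Whether evaldeck uses to_thread or loop.run_in_executor is not stated here; the BaseGrader, GradeResult and BlockingGrader below are hypothetical stand-ins.

```python
import asyncio
import threading
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class GradeResult:
    # Hypothetical stand-in for evaldeck's GradeResult.
    passed: bool


class BaseGrader(ABC):
    # Hypothetical stand-in mirroring the documented default behavior.
    @abstractmethod
    def grade(self, trace, test_case) -> GradeResult: ...

    async def grade_async(self, trace, test_case) -> GradeResult:
        # Run the blocking grade() in the loop's default thread pool so the
        # event loop stays free to service other coroutines meanwhile.
        return await asyncio.to_thread(self.grade, trace, test_case)


class BlockingGrader(BaseGrader):
    # Hypothetical sync-only grader: overrides grade() but not grade_async().
    def __init__(self) -> None:
        self.ran_on = None

    def grade(self, trace, test_case) -> GradeResult:
        self.ran_on = threading.current_thread().name  # record the worker thread
        time.sleep(0.05)  # simulate blocking work such as a synchronous HTTP call
        return GradeResult(passed=trace == test_case)


async def main():
    grader = BlockingGrader()
    result = await grader.grade_async("expected", "expected")
    return result, grader.ran_on, threading.current_thread().name


result, worker_thread, loop_thread = asyncio.run(main())
print(result.passed, worker_thread != loop_thread)  # True True
```

A sync-only subclass therefore still works with async evaluation, but each call occupies a pool thread; overriding grade_async() with real async I/O avoids that.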
evaldeck.graders.CompositeGrader
Bases: BaseGrader
A grader that combines multiple graders.
By default, all graders must pass for the composite to pass; with require_all=False, the composite passes if any grader passes.
Initialize the composite grader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graders | list[BaseGrader] | List of graders to run. | required |
| require_all | bool | If True, all graders must pass. If False, any one passing is enough. | True |
Source code in src/evaldeck/graders/base.py
grade(trace, test_case)
Run all graders and combine their results according to require_all.
Source code in src/evaldeck/graders/base.py
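The require_all rule maps naturally onto Python's all() and any(). The sketch below assumes the composite runs its children one after another and keeps their results; CompositeGrader here is a simplified hypothetical stand-in, as are ConstantGrader and the children field on GradeResult.

```python
from dataclasses import dataclass, field


@dataclass
class GradeResult:
    # Hypothetical stand-in; evaldeck's GradeResult may carry more fields.
    grader_name: str
    passed: bool
    children: list = field(default_factory=list)


class ConstantGrader:
    # Hypothetical child grader that always returns the same verdict.
    def __init__(self, name: str, passed: bool) -> None:
        self.name = name
        self.passed = passed

    def grade(self, trace, test_case) -> GradeResult:
        return GradeResult(self.name, self.passed)


class CompositeGrader:
    name = "composite"

    def __init__(self, graders, require_all: bool = True) -> None:
        self.graders = graders
        self.require_all = require_all

    def grade(self, trace, test_case) -> GradeResult:
        # Run each child in turn, then combine: all() when every grader
        # must pass, any() when a single pass is enough.
        results = [g.grade(trace, test_case) for g in self.graders]
        combine = all if self.require_all else any
        return GradeResult(self.name, combine(r.passed for r in results), results)


graders = [ConstantGrader("a", True), ConstantGrader("b", False)]
print(CompositeGrader(graders).grade(None, None).passed)                     # False
print(CompositeGrader(graders, require_all=False).grade(None, None).passed)  # True
```

Note the edge case with no children: all() of an empty sequence is True and any() is False, so an empty composite would pass only in require_all mode under this sketch.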
grade_async(trace, test_case) [async]
Run all graders concurrently and combine their results according to require_all.
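A concurrent composite can be sketched by gathering every child's grade_async() and then applying the same all()/any() rule. This assumes asyncio.gather-style fan-out; DelayedGrader, the simplified CompositeGrader and the children field are hypothetical stand-ins, not evaldeck's implementation.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GradeResult:
    # Hypothetical stand-in for evaldeck's GradeResult.
    grader_name: str
    passed: bool
    children: list = field(default_factory=list)


class DelayedGrader:
    # Hypothetical async child grader; asyncio.sleep stands in for awaited I/O.
    def __init__(self, name: str, passed: bool, delay: float) -> None:
        self.name = name
        self.passed = passed
        self.delay = delay

    async def grade_async(self, trace, test_case) -> GradeResult:
        await asyncio.sleep(self.delay)
        return GradeResult(self.name, self.passed)


class CompositeGrader:
    name = "composite"

    def __init__(self, graders, require_all: bool = True) -> None:
        self.graders = graders
        self.require_all = require_all

    async def grade_async(self, trace, test_case) -> GradeResult:
        # Start every child together; gather returns results in the order
        # of self.graders, not in the order they finish.
        results = await asyncio.gather(
            *(g.grade_async(trace, test_case) for g in self.graders)
        )
        combine = all if self.require_all else any
        return GradeResult(self.name, combine(r.passed for r in results), list(results))


composite = CompositeGrader(
    [DelayedGrader("slow", True, 0.1), DelayedGrader("fast", False, 0.01)],
    require_all=False,
)
result = asyncio.run(composite.grade_async(trace=None, test_case=None))
print(result.passed, [c.grader_name for c in result.children])  # True ['slow', 'fast']
```

Unlike a sequential loop, this waits roughly as long as the slowest child, and child results stay aligned with the graders list.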