Evaluation Results

evaldeck.results.EvaluationResult

Bases: BaseModel

Complete result of evaluating a single test case.

passed property

passed

Check if the evaluation passed.

failed_grades property

failed_grades

Get all failed grades.

pass_rate property

pass_rate

Calculate pass rate across all grades.

is_multi_turn property

is_multi_turn

Check if this result is from a multi-turn conversation.

turns_completed property

turns_completed

Number of turns that were actually run (not skipped).

total_turns property

total_turns

Total number of turns in the test case.

add_grade

add_grade(grade)

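Add a grade result and update the overall status. An ERROR grade sets the status to ERROR; a FAIL grade sets it to FAIL unless the status is already ERROR.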

Source code in src/evaldeck/results.py
def add_grade(self, grade: GradeResult) -> None:
    """Add a grade result."""
    self.grades.append(grade)
    # Update overall status
    if grade.status == GradeStatus.ERROR:
        self.status = GradeStatus.ERROR
    elif grade.status == GradeStatus.FAIL and self.status != GradeStatus.ERROR:
        self.status = GradeStatus.FAIL
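
The status update in add_grade is one-way: ERROR takes precedence over FAIL, and a later passing grade never resets the status. The following is a minimal, self-contained sketch of that precedence rule. `Status`, `Grade`, `Result`, and `final_status` are hypothetical stand-ins, not evaldeck classes; the `PASS` member and the initial status of `PASS` are assumptions, since the source shown here names only `ERROR` and `FAIL`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Stand-in for GradeStatus; PASS is an assumed member.
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"


@dataclass
class Grade:
    # Stand-in for GradeResult, reduced to the one field the logic reads.
    status: Status


@dataclass
class Result:
    # Stand-in for EvaluationResult; starting at PASS is an assumption.
    status: Status = Status.PASS
    grades: list = field(default_factory=list)

    def add_grade(self, grade: Grade) -> None:
        # Same branching as the add_grade source listed above.
        self.grades.append(grade)
        if grade.status == Status.ERROR:
            self.status = Status.ERROR
        elif grade.status == Status.FAIL and self.status != Status.ERROR:
            self.status = Status.FAIL


def final_status(statuses):
    """Feed grades in order and return the resulting overall status."""
    result = Result()
    for status in statuses:
        result.add_grade(Grade(status))
    return result.status
```

Under these assumptions, the order of grades does not matter for the final status: any ERROR yields ERROR, otherwise any FAIL yields FAIL.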

add_metric

add_metric(metric)

Add a metric result.

Source code in src/evaldeck/results.py
def add_metric(self, metric: MetricResult) -> None:
    """Add a metric result."""
    self.metrics.append(metric)

add_turn_result

add_turn_result(turn_result)

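Add a turn result. The turn's grades are appended to the overall grades, and the overall status and failed_at_turn are updated: an ERROR turn sets the status to ERROR and records its turn_index; a FAIL turn sets the status to FAIL unless it is already ERROR, and records its turn_index only if no failing turn has been recorded yet.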

Source code in src/evaldeck/results.py
def add_turn_result(self, turn_result: TurnResult) -> None:
    """Add a turn result."""
    self.turn_results.append(turn_result)
    # Add turn's grades to overall grades
    for grade in turn_result.grades:
        self.grades.append(grade)
    # Update overall status
    if turn_result.status == GradeStatus.ERROR:
        self.status = GradeStatus.ERROR
        self.failed_at_turn = turn_result.turn_index
    elif turn_result.status == GradeStatus.FAIL and self.status != GradeStatus.ERROR:
        self.status = GradeStatus.FAIL
        if self.failed_at_turn is None:
            self.failed_at_turn = turn_result.turn_index
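
One detail of add_turn_result that is easy to miss: a FAIL turn records `failed_at_turn` only the first time, while an ERROR turn always overwrites it. The sketch below replays that logic with hypothetical stand-in classes (`Status`, `Turn`, `Result`, and the `replay` helper are not part of evaldeck). The `PASS` member, the initial `PASS` status, and 0-based turn indices are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Stand-in for GradeStatus; PASS is an assumed member.
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"


@dataclass
class Turn:
    # Stand-in for TurnResult with only the fields the logic reads.
    turn_index: int
    status: Status
    grades: list = field(default_factory=list)


@dataclass
class Result:
    # Stand-in for EvaluationResult; the defaults are assumptions.
    status: Status = Status.PASS
    failed_at_turn: Optional[int] = None
    grades: list = field(default_factory=list)
    turn_results: list = field(default_factory=list)

    def add_turn_result(self, turn: Turn) -> None:
        # Same branching as the add_turn_result source listed above.
        self.turn_results.append(turn)
        self.grades.extend(turn.grades)
        if turn.status == Status.ERROR:
            self.status = Status.ERROR
            self.failed_at_turn = turn.turn_index
        elif turn.status == Status.FAIL and self.status != Status.ERROR:
            self.status = Status.FAIL
            if self.failed_at_turn is None:
                self.failed_at_turn = turn.turn_index


def replay(statuses):
    """Add one turn per status (indices assumed 0-based) and report the outcome."""
    result = Result()
    for index, status in enumerate(statuses):
        result.add_turn_result(Turn(turn_index=index, status=status))
    return result.status, result.failed_at_turn
```

With this sketch, `replay([Status.FAIL, Status.PASS, Status.ERROR])` ends with `failed_at_turn` pointing at the errored turn rather than the earlier failed one, so the field is best read as "the turn that determined the final status" rather than strictly "the first turn that went wrong".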

evaldeck.results.SuiteResult

Bases: BaseModel

Result of evaluating a test suite.

total property

total

Total number of test cases.

passed property

passed

Number of passed test cases.

failed property

failed

Number of failed test cases.

errors property

errors

Number of errored test cases.

pass_rate property

pass_rate

Overall pass rate.

duration_ms property

duration_ms

Total duration in milliseconds.

add_result

add_result(result)

Add an evaluation result.

Source code in src/evaldeck/results.py
def add_result(self, result: EvaluationResult) -> None:
    """Add an evaluation result."""
    self.results.append(result)

evaldeck.results.RunResult

Bases: BaseModel

Result of a complete evaluation run (multiple suites).

total property

total

Total test cases across all suites.

passed property

passed

Total passed across all suites.

failed property

failed

Total failed across all suites.

pass_rate property

pass_rate

Overall pass rate.

all_passed property

all_passed

Check if all tests passed.

add_suite

add_suite(suite)

Add a suite result.

Source code in src/evaldeck/results.py
def add_suite(self, suite: SuiteResult) -> None:
    """Add a suite result."""
    self.suites.append(suite)
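
The source for the aggregate properties (total, passed, failed, pass_rate, all_passed) is not shown on this page. As a rough illustration of how run-level numbers could roll up from suites, here is a self-contained sketch. `Suite`, `Run`, and `build_example_run` are hypothetical stand-ins, and the specific choices (pass rate as a fraction between 0 and 1, 0.0 for an empty run, `all_passed` meaning passed equals total) are assumptions, not evaldeck's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Suite:
    # Stand-in for SuiteResult: one status string per test case
    # ("pass", "fail", or "error"); this representation is an assumption.
    statuses: list = field(default_factory=list)

    @property
    def total(self) -> int:
        return len(self.statuses)

    @property
    def passed(self) -> int:
        return self.statuses.count("pass")

    @property
    def failed(self) -> int:
        return self.statuses.count("fail")


@dataclass
class Run:
    # Stand-in for RunResult.
    suites: list = field(default_factory=list)

    def add_suite(self, suite: Suite) -> None:
        self.suites.append(suite)

    @property
    def total(self) -> int:
        return sum(s.total for s in self.suites)

    @property
    def passed(self) -> int:
        return sum(s.passed for s in self.suites)

    @property
    def failed(self) -> int:
        return sum(s.failed for s in self.suites)

    @property
    def pass_rate(self) -> float:
        # Assumed convention: fraction in [0, 1], 0.0 when there are no tests.
        return self.passed / self.total if self.total else 0.0

    @property
    def all_passed(self) -> bool:
        # Assumed convention: every test case passed.
        return self.passed == self.total


def build_example_run() -> Run:
    """Two suites: one fully passing, one with a failure and an error."""
    run = Run()
    run.add_suite(Suite(["pass", "pass"]))
    run.add_suite(Suite(["pass", "fail", "error"]))
    return run
```

In this sketch an errored test case counts toward neither `passed` nor `failed`, so `passed + failed` can be less than `total`. Whether evaldeck counts errors the same way should be checked against src/evaldeck/results.py.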