API Reference

Comprehensive documentation for Evaldeck's Python API, auto-generated from source code docstrings.

Core Classes

Class       Description
Trace       Complete execution record of an agent
Step        Single step in an execution trace
Evaluator   Core evaluation engine
EvalCase    Test case definition

Result Types

Class              Description
GradeResult        Result from a single grader
EvaluationResult   Complete evaluation of one test

Graders

Class          Description
BaseGrader     Abstract base class for graders
Code Graders   Deterministic graders
LLM Graders    Model-as-judge graders

Metrics

Class              Description
Built-in Metrics   Quantitative measurements

Configuration

Class            Description
EvaldeckConfig   Configuration loading

Quick Import Guide

# Core classes
from evaldeck import (
    Trace,
    Step,
    Evaluator,
    EvalCase,
    EvalSuite,
    ExpectedBehavior,
)

# Result types
from evaldeck import (
    GradeResult,
    GradeStatus,
    MetricResult,
    EvaluationResult,
    SuiteResult,
    RunResult,
)

# Enums and related types
from evaldeck import (
    StepType,
    StepStatus,
    TraceStatus,
    TokenUsage,
)

# Graders
from evaldeck.graders import (
    BaseGrader,
    ContainsGrader,
    ToolCalledGrader,
    LLMGrader,
)

# Metrics
from evaldeck.metrics import (
    BaseMetric,
    StepCountMetric,
    TokenUsageMetric,
)
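
Because this page is generated from docstrings, the names in the import guide can drift from what an installed release exports. The sketch below is one way to check that: it uses only the standard library's `importlib` and the names listed above. The `DOCUMENTED_NAMES` table and the `find_missing_names` helper are illustrative and not part of Evaldeck.

```python
import importlib

# Names copied from the import guide above, grouped by the module they are
# imported from. This table is illustrative, not something Evaldeck provides.
DOCUMENTED_NAMES = {
    "evaldeck": [
        "Trace", "Step", "Evaluator", "EvalCase", "EvalSuite", "ExpectedBehavior",
        "GradeResult", "GradeStatus", "MetricResult", "EvaluationResult",
        "SuiteResult", "RunResult",
        "StepType", "StepStatus", "TraceStatus", "TokenUsage",
    ],
    "evaldeck.graders": [
        "BaseGrader", "ContainsGrader", "ToolCalledGrader", "LLMGrader",
    ],
    "evaldeck.metrics": ["BaseMetric", "StepCountMetric", "TokenUsageMetric"],
}


def find_missing_names(documented=DOCUMENTED_NAMES):
    """Return {module: [names]} for every documented name that cannot be found."""
    missing = {}
    for module_name, names in documented.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is unavailable, so all of its names are missing.
            missing[module_name] = list(names)
            continue
        absent = [name for name in names if not hasattr(module, name)]
        if absent:
            missing[module_name] = absent
    return missing


if __name__ == "__main__":
    for module_name, names in find_missing_names().items():
        print(f"{module_name}: missing {', '.join(names)}")
```

Running the script prints one line per module that has missing names and prints nothing when every documented name resolves. If Evaldeck is not installed, all three modules are reported as missing.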