Configuration¶
Evaldeck uses a YAML configuration file to define project settings. This guide covers all configuration options.
Configuration File¶
Evaldeck looks for configuration in this order:
1. `evaldeck.yaml`
2. `evaldeck.yml`
3. `.evaldeck.yaml`
You can also specify a custom path:
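For example (the `--config` flag name is an assumption; check `evaldeck --help` for the exact option):

```shell
# Hypothetical flag name for pointing at a config outside the search order
evaldeck run --config path/to/evaldeck.yaml
```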
Full Configuration Reference¶
```yaml
# evaldeck.yaml

# Configuration version (required)
version: 1

# Agent configuration (required for `evaldeck run`)
agent:
  module: my_agent        # Python module containing your agent
  function: run_agent     # Function to call (receives input string)
  # class_name: MyAgent   # Optional: if using a class

# Test directory
test_dir: tests/evals     # Where to find test case YAML files

# Test suites (optional, alternative to test_dir)
suites:
  - name: critical
    path: tests/evals/critical
    tags: [critical]
  - name: regression
    path: tests/evals/regression

# Default settings for all tests
defaults:
  timeout: 30             # Timeout in seconds
  retries: 0              # Number of retries on failure

# Grader settings
graders:
  llm:
    model: gpt-4o-mini    # Default model for LLM graders
    provider: openai      # openai or anthropic
    timeout: 60           # LLM call timeout

# Pass/fail thresholds
thresholds:
  min_pass_rate: 0.0      # Minimum pass rate (0.0-1.0)
  max_failures: null      # Maximum allowed failures (null = unlimited)

# Output settings
output_dir: .evaldeck     # Directory for results and artifacts
```
Agent Configuration¶
With Framework Integration (Recommended)¶
For supported frameworks, use the `framework` option for automatic instrumentation:
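A sketch of what that configuration might look like (the exact shape of the `agent` block with `framework` is an assumption; the value comes from the supported frameworks table below):

```yaml
agent:
  framework: langchain    # Framework value from the table below
  module: my_agent        # Module containing your factory function
  function: create_agent  # Returns the agent instance, not a Trace
```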
Your function returns the agent instance (not a Trace):
```python
# my_agent.py
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

def create_agent():
    llm = ChatOpenAI(model="gpt-4o-mini")
    return create_react_agent(llm, tools=[...])
```
Evaldeck handles OTel instrumentation and trace capture automatically.
Supported frameworks:
| Framework | Value | Install |
|---|---|---|
| LangChain / LangGraph | `langchain` | `pip install evaldeck[langchain]` |
Note: Agent invocations are serialized to ensure clean trace capture. Grading runs in parallel.
Without Framework Integration¶
If not using a supported framework, your function must return a Trace:
```python
# my_package/agents/booking.py
from evaldeck import Trace, Step

def run_booking_agent(input: str) -> Trace:
    trace = Trace(input=input)
    # ... agent logic ...
    trace.complete(output="...")
    return trace
```
Using a Class¶
If your agent is a class:
```python
# my_package/agents.py
from evaldeck import Trace

class BookingAgent:
    def __init__(self):
        # Setup...
        pass

    def run(self, input: str) -> Trace:
        trace = Trace(input=input)
        # ... agent logic ...
        return trace
```
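Point the config at the class via `class_name` (shown commented out in the full reference above); that Evaldeck calls a `run` method on the instance is an assumption based on this example:

```yaml
agent:
  module: my_package.agents
  class_name: BookingAgent
```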
Test Directories and Suites¶
Simple Setup¶
For most projects, a single test directory is sufficient:
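```yaml
test_dir: tests/evals
```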
Evaldeck will recursively discover all .yaml files:
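For example, a layout like this (file names are illustrative) yields three test cases, including the one in the nested directory:

```
tests/evals/
├── booking.yaml
├── cancellation.yaml
└── edge_cases/
    └── timeouts.yaml
```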
Named Suites¶
For larger projects, organize into named suites:
```yaml
suites:
  - name: smoke
    path: tests/smoke
    tags: [quick]
  - name: regression
    path: tests/regression
    tags: [full]
  - name: critical
    path: tests/critical
    tags: [critical, ci]
```
Run specific suites:
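For example (the `--suite` flag name is an assumption; check `evaldeck --help` for the exact option):

```shell
# Hypothetical flag name for selecting suites by name
evaldeck run --suite smoke
evaldeck run --suite smoke --suite critical
```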
Default Settings¶
Defaults apply to all test cases unless overridden:
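An illustrative `defaults` block (the values here are examples chosen to make the override below meaningful; the full reference above shows the shipped defaults):

```yaml
defaults:
  timeout: 30   # All tests time out after 30 seconds
  retries: 2    # Retry each failing test up to twice
```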
Override per test case:
```yaml
# tests/evals/slow_test.yaml
name: complex_operation
timeout: 120   # Override: 2 minute timeout
retries: 0     # Override: no retries
```
Grader Configuration¶
LLM Grader Defaults¶
Set defaults for all LLM graders:
```yaml
graders:
  llm:
    model: gpt-4o-mini   # Model to use
    provider: openai     # openai or anthropic
    timeout: 60          # Timeout for LLM calls
```
Provider Configuration¶
LLM graders use environment variables for authentication:
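Set the key for whichever provider you configured (variable names are from the Environment Variables table below; key values are placeholders):

```shell
export OPENAI_API_KEY="sk-..."         # for provider: openai
export ANTHROPIC_API_KEY="sk-ant-..."  # for provider: anthropic
```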
Override the model per test case:
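A sketch of a per-test override (the exact test-case schema for grader options is an assumption):

```yaml
# tests/evals/nuanced_case.yaml
name: nuanced_case
graders:
  llm:
    model: gpt-4o   # Use a stronger model for this case
```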
Thresholds¶
Thresholds determine when an evaluation run passes or fails:
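```yaml
thresholds:
  min_pass_rate: 0.0   # Minimum pass rate (0.0-1.0)
  max_failures: null   # Maximum allowed failures (null = unlimited)
```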
CI/CD Thresholds¶
For production CI:
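An illustrative strict setting (values are examples, not recommendations):

```yaml
thresholds:
  min_pass_rate: 0.95  # At least 95% of tests must pass
  max_failures: 3      # More than 3 failures fails the run
```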
For development:
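A permissive setting that reports results without ever failing the run:

```yaml
thresholds:
  min_pass_rate: 0.0   # Never fail on pass rate
  max_failures: null   # Unlimited failures allowed
```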
Environment Variables¶
Evaldeck supports environment variable substitution:
```yaml
agent:
  module: ${AGENT_MODULE:-my_agent}   # Default: my_agent

graders:
  llm:
    model: ${LLM_MODEL:-gpt-4o-mini}
```
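The `${VAR:-default}` form follows shell parameter-expansion conventions: use the environment value if the variable is set, otherwise fall back to the default. A minimal Python sketch of that resolution rule (not Evaldeck's actual implementation):

```python
import os
import re

def substitute(value, env=None):
    """Expand ${VAR:-default} references using env (defaults to os.environ)."""
    env = os.environ if env is None else env

    def repl(match):
        name, default = match.group(1), match.group(2) or ""
        return env.get(name, default)

    return re.sub(r"\$\{(\w+)(?::-([^}]*))?\}", repl, value)

print(substitute("${LLM_MODEL:-gpt-4o-mini}", env={}))                        # gpt-4o-mini
print(substitute("${LLM_MODEL:-gpt-4o-mini}", env={"LLM_MODEL": "gpt-4o"}))   # gpt-4o
```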
Common environment variables:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key for LLM graders |
| `ANTHROPIC_API_KEY` | Anthropic API key for LLM graders |
| `EVALDECK_CONFIG` | Custom config file path |
| `EVALDECK_VERBOSE` | Enable verbose output |
Multiple Configurations¶
Use different configs for different scenarios:
```
configs/
├── evaldeck.yaml        # Default development config
├── evaldeck.ci.yaml     # CI/CD config (stricter thresholds)
└── evaldeck.local.yaml  # Local testing config
```
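Select one at runtime with the `EVALDECK_CONFIG` environment variable from the table above:

```shell
EVALDECK_CONFIG=configs/evaldeck.ci.yaml evaldeck run
```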
Validation¶
Evaldeck validates your configuration on load. Common errors:
```
Error: Invalid configuration
  - agent.module: Required field missing
  - thresholds.min_pass_rate: Must be between 0.0 and 1.0
```
To validate without running:
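A hypothetical subcommand (the exact CLI verb is an assumption; check `evaldeck --help`):

```shell
evaldeck validate
```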