OpenTelemetry / OpenInference Integration¶
The OpenTelemetry adapter provides broad framework coverage with a single integration. It uses OpenInference semantic conventions to capture traces from any instrumented AI framework.
Supported Frameworks¶
Any framework with an OpenInference instrumentor works automatically:
| Framework | Instrumentor Package |
|---|---|
| LangChain / LangGraph | openinference-instrumentation-langchain |
| CrewAI | openinference-instrumentation-crewai |
| LiteLLM | openinference-instrumentation-litellm |
| OpenAI SDK | openinference-instrumentation-openai |
| Anthropic SDK | openinference-instrumentation-anthropic |
| AutoGen | openinference-instrumentation-autogen |
| LlamaIndex | openinference-instrumentation-llama-index |
| Haystack | openinference-instrumentation-haystack |
| DSPy | openinference-instrumentation-dspy |
| Bedrock | openinference-instrumentation-bedrock |
| Groq | openinference-instrumentation-groq |
| Mistral | openinference-instrumentation-mistralai |
See the OpenInference repository for the full list of instrumentors.
Installation¶
pip install evaldeck
# Install instrumentor(s) for your framework
pip install openinference-instrumentation-langchain
pip install openinference-instrumentation-crewai
pip install openinference-instrumentation-openai
# ... etc
Quick Start¶
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from openinference.instrumentation.langchain import LangChainInstrumentor
from evaldeck.integrations import EvaldeckSpanProcessor
from evaldeck import Evaluator
# 1. Setup tracing
processor = EvaldeckSpanProcessor()
provider = TracerProvider()
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
# 2. Instrument your framework
LangChainInstrumentor().instrument()
# 3. Run your agent (no changes to your code needed)
result = agent.invoke({"input": "Book a flight to NYC"})
# 4. Get the captured trace and evaluate it
evaldeck_trace = processor.get_latest_trace()
evaluator = Evaluator()
eval_result = evaluator.evaluate(evaldeck_trace, test_case)
Convenience Setup¶
Use the helper function for simpler setup:
from evaldeck.integrations import setup_otel_tracing
from openinference.instrumentation.langchain import LangChainInstrumentor
# One-line setup
processor = setup_otel_tracing()
LangChainInstrumentor().instrument()
# Run agent...
trace = processor.get_latest_trace()
Multiple Frameworks¶
Instrument multiple frameworks at once:
from evaldeck.integrations import setup_otel_tracing
from openinference.instrumentation.langchain import LangChainInstrumentor
from openinference.instrumentation.openai import OpenAIInstrumentor
from openinference.instrumentation.litellm import LiteLLMInstrumentor
processor = setup_otel_tracing()
# Instrument all frameworks you use
LangChainInstrumentor().instrument()
OpenAIInstrumentor().instrument()
LiteLLMInstrumentor().instrument()
# All traces go to the same processor
# regardless of which framework generated them
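After a run that touches several of these frameworks, a single call returns everything the shared processor captured (get_all_traces() is covered in the API reference below):
traces = processor.get_all_traces()  # traces from LangChain, OpenAI, and LiteLLM alike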
Running Test Suites¶
from evaldeck.integrations import setup_otel_tracing
from openinference.instrumentation.crewai import CrewAIInstrumentor
from evaldeck import Evaluator, EvalSuite
# Setup once at module level
processor = setup_otel_tracing()
CrewAIInstrumentor().instrument()
def run_agent(input_text: str):
    """Agent function that returns a trace."""
    processor.reset()  # Clear previous traces
    # Run your agent
    crew.kickoff(inputs={"query": input_text})
    return processor.get_latest_trace()
# Load and run test suite
suite = EvalSuite.from_directory("tests/evals/")
evaluator = Evaluator()
results = evaluator.evaluate_suite(suite, run_agent)
How It Works¶
The EvaldeckSpanProcessor intercepts OpenTelemetry spans and converts them to Evaldeck's Trace/Step format:
| OpenInference Span Kind | Evaldeck StepType |
|---|---|
| LLM | LLM_CALL |
| TOOL | TOOL_CALL |
| CHAIN (root) | Trace container |
| CHAIN (nested) | REASONING |
| EMBEDDING | TOOL_CALL (tool_name="embedding") |
| RETRIEVER | TOOL_CALL (tool_name="retriever") |
| RERANKER | TOOL_CALL (tool_name="reranker") |
| GUARDRAIL | REASONING |
Captured data:

- LLM calls: model name, input/output messages, token usage
- Tool calls: tool name, arguments, results
- Timing: duration for each step
- Errors: captured with status and message
- Metadata: OpenTelemetry trace/span IDs for cross-referencing
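To inspect the result of the conversion, you can walk a captured trace. A minimal sketch, assuming the converted Trace exposes its steps as a steps list with step_type, name, duration, and tokens attributes; these names are illustrative assumptions, not confirmed Evaldeck API:
trace = processor.get_latest_trace()
for step in trace.steps:  # "steps" and the attributes below are assumed names
    print(step.step_type, step.name, step.duration, step.tokens)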
API Reference¶
EvaldeckSpanProcessor¶
processor = EvaldeckSpanProcessor()
# Get a specific trace by ID
trace = processor.get_trace("abc123...")
# Get the most recent trace
trace = processor.get_latest_trace()
# Get all captured traces
traces = processor.get_all_traces()
# Clear all traces (useful between test runs)
processor.reset()
setup_otel_tracing()¶
from evaldeck.integrations import setup_otel_tracing
# Create processor and configure OpenTelemetry
processor = setup_otel_tracing()
# Or pass an existing processor
existing_processor = EvaldeckSpanProcessor()
processor = setup_otel_tracing(processor=existing_processor)
Using with Arize Phoenix¶
You can send traces to both Evaldeck and Phoenix simultaneously:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from evaldeck.integrations import EvaldeckSpanProcessor
# Evaldeck processor
evaldeck_processor = EvaldeckSpanProcessor()
# Phoenix exporter (or any OTLP endpoint)
phoenix_exporter = OTLPSpanExporter(endpoint="http://localhost:6006/v1/traces")
phoenix_processor = BatchSpanProcessor(phoenix_exporter)
# Add both to the provider
provider = TracerProvider()
provider.add_span_processor(evaldeck_processor)
provider.add_span_processor(phoenix_processor)
trace.set_tracer_provider(provider)
# Now traces go to both Evaldeck (for evaluation) and Phoenix (for visualization)
Troubleshooting¶
No traces captured¶
- Ensure you call setup_otel_tracing() before instrumenting frameworks (see the ordering sketch below)
- Ensure you instrument the framework before importing/creating agents
- Check that spans have the openinference.span.kind attribute set
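For reference, a minimal sketch of the correct ordering; the LangChain instrumentor stands in for whichever one you use:
from evaldeck.integrations import setup_otel_tracing
from openinference.instrumentation.langchain import LangChainInstrumentor

# 1. Configure tracing first
processor = setup_otel_tracing()
# 2. Instrument the framework next
LangChainInstrumentor().instrument()
# 3. Import/create and run your agents only after instrumenting
# ...
# 4. Sanity check: the processor should now hold at least one trace
print(len(processor.get_all_traces()))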
Missing steps¶
Some frameworks may not emit all span types. Check which span kinds your instrumentor supports.
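As a quick diagnostic, you can list which step types actually landed in a captured trace. This sketch reuses the illustrative steps/step_type attribute names assumed in the How It Works sketch above:
trace = processor.get_latest_trace()
print({step.step_type for step in trace.steps})  # e.g. reveals whether any TOOL_CALL steps arrived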
Token counts missing¶
Not all LLM providers report token usage. When usage is not available, the tokens field contains zeros.
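If a check depends on token counts, consider guarding on zero first. A sketch, assuming tokens is a plain integer on each LLM step (the field's real shape may differ):
for step in trace.steps:  # attribute names assumed, as in the sketches above
    # StepType may be an enum; string comparison shown for illustration
    if step.step_type == "LLM_CALL" and step.tokens == 0:
        print(f"no usage reported for {step.name}; skipping token assertions")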