
Development Setup

Set up your environment for Evaldeck development.

Prerequisites

  • Python 3.10+
  • Git
  • (Optional) OpenAI or Anthropic API key for LLM grader tests
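
The required prerequisites can be verified before going further. The following is a minimal sketch, not part of Evaldeck itself; the function name `check_prerequisites` is hypothetical, and it only checks the Python version and whether `git` is on the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version from the prerequisites list


def check_prerequisites(version=sys.version_info, which=shutil.which):
    """Return (name, ok, detail) tuples for the required prerequisites."""
    major, minor = version[0], version[1]
    git_path = which("git")
    return [
        ("python", (major, minor) >= MIN_PYTHON, f"found {major}.{minor}"),
        ("git", git_path is not None, git_path or "not found on PATH"),
    ]


for name, ok, detail in check_prerequisites():
    print(f"{'ok' if ok else 'MISSING'}: {name} ({detail})")
```

The `version` and `which` parameters exist only so the check can be exercised without changing the real environment.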

Installation

1. Clone the Repository

git clone https://github.com/tantra-run/evaldeck-py.git
cd evaldeck-py

2. Create Virtual Environment

python -m venv venv
source venv/bin/activate  # Linux/macOS
# or
venv\Scripts\activate     # Windows

3. Install Dependencies

# Core + dev dependencies
pip install -e ".[dev]"

# With all optional dependencies
pip install -e ".[dev,all]"

4. Install Pre-commit Hooks

pre-commit install

Once installed, the hooks run linting and formatting on every commit.

Running Tests

# All tests
pytest

# With coverage
pytest --cov=evaldeck --cov-report=html

# Specific file
pytest tests/test_evaluator.py

# Specific test
pytest tests/test_evaluator.py::test_basic_evaluation -v

# Skip slow tests
pytest -m "not slow"

Code Quality

Linting

# Check for issues
ruff check .

# Auto-fix issues
ruff check --fix .

Formatting

# Check formatting
ruff format --check .

# Apply formatting
ruff format .

Type Checking

mypy src/

Run All Checks

# Same as CI
ruff check .
ruff format --check .
mypy src/
pytest
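
If running the four commands by hand gets tedious, they can be wrapped in a small helper. The sketch below is an illustration only: the names `CHECKS` and `run_checks` are hypothetical, and stopping at the first failing command is an assumption, not a description of how the project's CI behaves.

```python
import subprocess

# The four commands listed above, in the same order.
CHECKS = [
    ["ruff", "check", "."],
    ["ruff", "format", "--check", "."],
    ["mypy", "src/"],
    ["pytest"],
]


def run_checks(checks=CHECKS, call=subprocess.call):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        print("$", " ".join(cmd))
        code = call(cmd)  # subprocess.call returns the process exit code
        if code != 0:
            print(f"failed with exit code {code}: {' '.join(cmd)}")
            return code
    return 0
```

From the repository root with the virtual environment active, `raise SystemExit(run_checks())` would run everything and propagate the exit status. If a tool is not installed, `subprocess.call` raises `FileNotFoundError` instead of returning a code.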

Building Documentation

# Install docs dependencies
pip install mkdocs-material "mkdocstrings[python]"

# Serve locally
mkdocs serve

# Build static site
mkdocs build

While mkdocs serve is running, visit http://localhost:8000 to view the docs.

Project Structure

evaldeck/
├── src/evaldeck/         # Source code
│   ├── __init__.py       # Public API
│   ├── cli.py            # CLI commands
│   ├── config.py         # Configuration
│   ├── evaluator.py      # Core engine
│   ├── trace.py          # Trace models
│   ├── test_case.py      # Test case models
│   ├── results.py        # Result models
│   ├── graders/          # Grader implementations
│   │   ├── base.py
│   │   ├── code.py
│   │   └── llm.py
│   ├── metrics/          # Metric implementations
│   └── integrations/     # Framework adapters
├── tests/                # Test suite
│   ├── conftest.py       # Fixtures
│   ├── test_evaluator.py
│   └── ...
├── docs/                 # Documentation
├── examples/             # Usage examples
├── pyproject.toml        # Project config
└── mkdocs.yml            # Docs config

Environment Variables

For LLM grader tests:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
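
To confirm which keys the current shell actually exposes, a quick check along these lines may help. It is a sketch under assumptions: the variable names come from the export lines above, while the provider labels and the function `available_providers` are hypothetical and not part of Evaldeck.

```python
import os

# Variable names from the export lines above; the labels are illustrative.
KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def available_providers(environ=None):
    """Return the labels whose API key variable is set and non-blank."""
    env = os.environ if environ is None else environ
    return [name for name, var in KEY_VARS.items() if env.get(var, "").strip()]


providers = available_providers()
print("LLM grader keys found for:", ", ".join(providers) or "none")
```

Only presence is checked; the key values are never printed or validated against the provider.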

IDE Setup

VS Code

Recommended extensions:

  • Python
  • Pylance
  • Ruff

Settings (.vscode/settings.json):

{
    "python.defaultInterpreterPath": "./venv/bin/python",
    "[python]": {
        "editor.defaultFormatter": "charliermarsh.ruff",
        "editor.formatOnSave": true,
        "editor.codeActionsOnSave": {
            "source.fixAll.ruff": "explicit",
            "source.organizeImports.ruff": "explicit"
        }
    },
    "python.analysis.typeCheckingMode": "basic"
}

PyCharm

  1. Mark src as Sources Root
  2. Enable Ruff plugin
  3. Configure Python interpreter to use venv

Troubleshooting

Import Errors

Ensure you installed in editable mode:

pip install -e ".[dev]"
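
To see which copy of the package the interpreter resolves, the import location can be inspected. This is a minimal sketch using only the standard library; `locate_package` is a hypothetical helper name.

```python
import importlib.util


def locate_package(name):
    """Return the file a package would be imported from, or None if absent."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


# "evaldeck" is the package name used under src/ in the project structure.
path = locate_package("evaldeck")
print(path or "evaldeck is not importable from this interpreter")
```

With an editable install, the printed path should normally point into the checkout's src/evaldeck/ directory. A path inside site-packages, or no path at all, suggests a different environment is active or the editable install is missing.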

Pre-commit Failures

Run checks manually to see details:

ruff check .
ruff format .

Test Failures

Check Python version:

python --version  # Should be 3.10+

Run with verbose output:

pytest -v --tb=long