Development Guide

Prerequisites

Tool	Version	Install
Docker	any	docker.com
uv	latest	`brew install uv` (for local dev / pre-commit)

Setup

git clone git@github.com:coilysiren/gauntlet.git
cd gauntlet
uv sync
uv run pre-commit install  # install git hooks

Running the MCP server locally

uv run gauntlet-mcp

This speaks stdio.

Exercising the plugin end-to-end

The repo doubles as a Claude Code plugin. Point Claude Code at it during development:

cd your-test-project
claude --plugin-dir /absolute/path/to/gauntlet

This loads the plugin from disk for the current session - no install, no cache. Claude Code will:

register the gauntlet MCP server via uv run --project ${CLAUDE_PLUGIN_ROOT} gauntlet-mcp
auto-discover the skill at skills/gauntlet/SKILL.md

Verify:

/mcp lists gauntlet with its 13 tools
/agents lists gauntlet-attacker, gauntlet-inspector, gauntlet-holdout-evaluator
Typing a trigger phrase like "run gauntlet" loads the skill

To install the plugin permanently (for non-development use):

claude plugin marketplace add coilysiren/gauntlet
claude plugin install gauntlet@coilysiren-gauntlet

Restart Claude Code after install so the skill, MCP server, and subagents register.

Files the plugin system reads:

.claude-plugin/plugin.json - manifest (MCP server declaration, metadata)
skills/gauntlet/SKILL.md - the Orchestrator skill (auto-discovered by trigger phrase)
skills/gauntlet-author/SKILL.md - the trial-authoring skill (auto-discovered by trigger phrase)
agents/gauntlet-attacker.md, agents/gauntlet-inspector.md, agents/gauntlet-holdout-evaluator.md - per-role subagent definitions with MCP-tool allowlists

All paths are load-bearing. Moving any of them breaks the plugin; update plugin.json if you relocate a file.

Tests

# Run tests inside Docker (canonical)
docker compose run --rm test

# Run tests locally (faster iteration)
uv run pytest -m "not docker"

# Run docker integration tests (requires Docker daemon)
uv run pytest -m docker

Coverage is printed to the terminal and written to coverage.xml after every run. coverage.xml is gitignored.

Linting & formatting

Pre-commit hooks run automatically on every git commit. To run manually:

uv run ruff check .          # lint
uv run ruff check . --fix    # lint + auto-fix
uv run ruff format .         # format
uv run mypy gauntlet tests --strict  # type-check

CI

Three jobs run on every push and PR to main:

Job	What it checks
`lint`	ruff + mypy
`test`	pytest + uploads coverage to Codecov
`docker`	`docker compose build` + `docker compose run --rm test`

See .github/workflows/ci.yml.

Dependency management

Add a runtime dependency:

uv add <package>

Add a dev-only dependency:

uv add --dev <package>

Always commit the updated uv.lock alongside pyproject.toml.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Development Guide

Prerequisites

Setup

Running the MCP server locally

Exercising the plugin end-to-end

Tests

Linting & formatting

CI

Dependency management

FilesExpand file tree

development.md

Latest commit

History

development.md

File metadata and controls

Development Guide

Prerequisites

Setup

Running the MCP server locally

Exercising the plugin end-to-end

Tests

Linting & formatting

CI

Dependency management