Skip to content

Latest commit

 

History

History
115 lines (79 loc) · 2.97 KB

File metadata and controls

115 lines (79 loc) · 2.97 KB

Development Guide

Prerequisites

Tool Version Install
Docker any docker.com
uv latest brew install uv (for local dev / pre-commit)

Setup

git clone git@github.com:coilysiren/gauntlet.git
cd gauntlet
uv sync
uv run pre-commit install  # install git hooks

Running the MCP server locally

uv run gauntlet-mcp

This speaks stdio.

Exercising the plugin end-to-end

The repo doubles as a Claude Code plugin. Point Claude Code at it during development:

cd your-test-project
claude --plugin-dir /absolute/path/to/gauntlet

This loads the plugin from disk for the current session - no install, no cache. Claude Code will:

  • register the gauntlet MCP server via uv run --project ${CLAUDE_PLUGIN_ROOT} gauntlet-mcp
  • auto-discover the skill at skills/gauntlet/SKILL.md

Verify:

  • /mcp lists gauntlet with its 13 tools
  • /agents lists gauntlet-attacker, gauntlet-inspector, gauntlet-holdout-evaluator
  • Typing a trigger phrase like "run gauntlet" loads the skill

To install the plugin permanently (for non-development use):

claude plugin marketplace add coilysiren/gauntlet
claude plugin install gauntlet@coilysiren-gauntlet

Restart Claude Code after install so the skill, MCP server, and subagents register.

Files the plugin system reads:

  • .claude-plugin/plugin.json - manifest (MCP server declaration, metadata)
  • skills/gauntlet/SKILL.md - the Orchestrator skill (auto-discovered by trigger phrase)
  • skills/gauntlet-author/SKILL.md - the trial-authoring skill (auto-discovered by trigger phrase)
  • agents/gauntlet-attacker.md, agents/gauntlet-inspector.md, agents/gauntlet-holdout-evaluator.md - per-role subagent definitions with MCP-tool allowlists

All paths are load-bearing. Moving any of them breaks the plugin; update plugin.json if you relocate a file.

Tests

# Run tests inside Docker (canonical)
docker compose run --rm test

# Run tests locally (faster iteration)
uv run pytest -m "not docker"

# Run docker integration tests (requires Docker daemon)
uv run pytest -m docker

Coverage is printed to the terminal and written to coverage.xml after every run. coverage.xml is gitignored.

Linting & formatting

Pre-commit hooks run automatically on every git commit. To run manually:

uv run ruff check .          # lint
uv run ruff check . --fix    # lint + auto-fix
uv run ruff format .         # format
uv run mypy gauntlet tests --strict  # type-check

CI

Three jobs run on every push and PR to main:

Job What it checks
lint ruff + mypy
test pytest + uploads coverage to Codecov
docker docker compose build + docker compose run --rm test

See .github/workflows/ci.yml.

Dependency management

Add a runtime dependency:

uv add <package>

Add a dev-only dependency:

uv add --dev <package>

Always commit the updated uv.lock alongside pyproject.toml.