| Tool | Version | Install |
|---|---|---|
| Docker | any | docker.com |
| uv | latest | brew install uv (for local dev / pre-commit) |

    git clone git@github.com:coilysiren/gauntlet.git
    cd gauntlet
    uv sync
    uv run pre-commit install  # install git hooks

Run the MCP server:

    uv run gauntlet-mcp

The server speaks stdio.
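
Because the server speaks stdio, a client talks to it by writing JSON-RPC messages to its stdin and reading replies from its stdout. The sketch below shows how such a message might be framed, assuming the standard MCP stdio transport (one JSON message per line). The protocol version string and the `clientInfo` values are assumptions for illustration, not values taken from this repo, and `smoke_test` is a hypothetical helper.

```python
import json
import subprocess


def initialize_request(request_id: int = 1) -> dict:
    """Build an MCP `initialize` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumption: adjust to what the server supports
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.0"},  # hypothetical client
        },
    }


def frame_message(message: dict) -> bytes:
    """Serialize one message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the only raw
    # newline in the frame is the trailing delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def smoke_test() -> dict:
    """Hypothetical helper: start the server, send `initialize`, return the first reply."""
    with subprocess.Popen(
        ["uv", "run", "gauntlet-mcp"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    ) as proc:
        proc.stdin.write(frame_message(initialize_request()))
        proc.stdin.flush()
        reply = proc.stdout.readline()  # blocks until the server answers
        proc.terminate()
    return json.loads(reply)
```

Calling `smoke_test()` from the repo root should return the server's `initialize` response if the assumptions above hold; it is not run automatically here.
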
The repo doubles as a Claude Code plugin. Point Claude Code at it during development:

    cd your-test-project
    claude --plugin-dir /absolute/path/to/gauntlet

This loads the plugin from disk for the current session: no install, no cache. Claude Code will:

- register the `gauntlet` MCP server via `uv run --project ${CLAUDE_PLUGIN_ROOT} gauntlet-mcp`
- auto-discover the skill at `skills/gauntlet/SKILL.md`

Verify:

- `/mcp` lists `gauntlet` with its 13 tools
- `/agents` lists `gauntlet-attacker`, `gauntlet-inspector`, `gauntlet-holdout-evaluator`
- Typing a trigger phrase like "run gauntlet" loads the skill

To install the plugin permanently (for non-development use):

    claude plugin marketplace add coilysiren/gauntlet
    claude plugin install gauntlet@coilysiren-gauntlet

Restart Claude Code after install so the skill, MCP server, and subagents register.

Files the plugin system reads:

- `.claude-plugin/plugin.json` - manifest (MCP server declaration, metadata)
- `skills/gauntlet/SKILL.md` - the Orchestrator skill (auto-discovered by trigger phrase)
- `skills/gauntlet-author/SKILL.md` - the trial-authoring skill (auto-discovered by trigger phrase)
- `agents/gauntlet-attacker.md`, `agents/gauntlet-inspector.md`, `agents/gauntlet-holdout-evaluator.md` - per-role subagent definitions with MCP-tool allowlists

All paths are load-bearing. Moving any of them breaks the plugin; update plugin.json if you relocate a file.
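
Since every path above is load-bearing, a quick existence check can catch an accidental move before Claude Code does. This is a minimal sketch, not part of the repo: `missing_plugin_files` is a hypothetical helper, and the path list is copied from the list above.

```python
from pathlib import Path

# Paths the plugin system reads, relative to the repo root (copied from the list above).
PLUGIN_FILES = (
    ".claude-plugin/plugin.json",
    "skills/gauntlet/SKILL.md",
    "skills/gauntlet-author/SKILL.md",
    "agents/gauntlet-attacker.md",
    "agents/gauntlet-inspector.md",
    "agents/gauntlet-holdout-evaluator.md",
)


def missing_plugin_files(repo_root: str) -> list[str]:
    """Return the plugin paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in PLUGIN_FILES if not (root / rel).is_file()]
```

Running `missing_plugin_files(".")` from the repo root should return an empty list when nothing has been moved.
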

    # Run tests inside Docker (canonical)
    docker compose run --rm test
    # Run tests locally (faster iteration)
    uv run pytest -m "not docker"
    # Run docker integration tests (requires Docker daemon)
    uv run pytest -m docker

Coverage is printed to the terminal and written to coverage.xml after every run. coverage.xml is gitignored.
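
If you want the coverage figure in a script rather than from the terminal output, coverage.xml can be parsed directly. The sketch below assumes the file is the Cobertura-style report that coverage.py emits, whose root element carries a `line-rate` attribute; `line_coverage_percent` is a hypothetical helper, not something the repo ships.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage as a percentage from a coverage.xml document."""
    root = ET.fromstring(xml_text)
    # Assumption: the root <coverage> element has a line-rate attribute in [0, 1].
    return float(root.attrib["line-rate"]) * 100.0
```

For example, `line_coverage_percent(open("coverage.xml").read())` after a test run would give the overall percentage, provided the report has the assumed shape.
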
Pre-commit hooks run automatically on every git commit. To run manually:

    uv run ruff check .                  # lint
    uv run ruff check . --fix            # lint + auto-fix
    uv run ruff format .                 # format
    uv run mypy gauntlet tests --strict  # type-check

Three jobs run on every push and PR to main:

| Job | What it checks |
|---|---|
| `lint` | ruff + mypy |
| `test` | pytest + uploads coverage to Codecov |
| `docker` | `docker compose build` + `docker compose run --rm test` |

See .github/workflows/ci.yml.
Add a runtime dependency:

    uv add <package>

Add a dev-only dependency:

    uv add --dev <package>

Always commit the updated uv.lock alongside pyproject.toml.
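
The lockfile rule can be checked mechanically. The sketch below is a rough heuristic under stated assumptions, not a hook this repo ships: `lockfile_forgotten` is a hypothetical helper that takes the staged file names (for instance the output of `git diff --cached --name-only`) and flags a commit that touches pyproject.toml without uv.lock. It will also flag pyproject.toml edits that do not change dependencies, so treat a hit as a prompt to double-check rather than an error.

```python
from collections.abc import Iterable


def lockfile_forgotten(staged_paths: Iterable[str]) -> bool:
    """True if pyproject.toml is staged but uv.lock is not."""
    staged = set(staged_paths)
    return "pyproject.toml" in staged and "uv.lock" not in staged
```
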