[prompt-clustering] Copilot agent prompt clustering — 2026-05-18 #33017

2026-05-18T11:30:46Z

github-actions[bot]
Bot May 18, 2026

Summary

Analysis window: PRs created since 2026-04-18 (last 30 days)
PRs analyzed: 1104
Clusters identified: 8 (silhouette=0.043 on cosine TF-IDF)
Overall merge rate: 79.4%
Top cluster: C4 — CI job-failure fixes (224 PRs, 78% merged)
Best merge rate: C6 — Address-review-comments tasks (94%) · Worst: C1 — WIP / planning-stage runs (46%)

Cluster sizes

Key findings

Three task families dominate the queue. CI job-failure fixes (C4, 20%), prompt/experiment tuning (C7, 18%), and new-feature / docs work (C0, 17%) account for 55% of all copilot PRs in the window.
Review-comment follow-ups merge the most reliably (94%). Tasks where a human asks the agent to address PR review comments are the highest-yield workflow — short turnaround (avg 4.5 commits, 4.8 review threads) with the cleanest success signal.
WIP / planning-stage prompts under-perform (46% merged). Cluster C1 contains PRs with auto-generated titles like [WIP] Fix failing GitHub Actions job <name>, where the agent produced only an empty plan/progress description before the PR was abandoned. More than half of them (14 of 26) never reach merge, a strong indicator that they should be filtered out of merge-rate dashboards or expanded before being assigned.
AWF / firewall / MCP bumps are the most expensive (avg 83 files changed, 941 additions, 1080 deletions) and have the second-lowest merge rate (66%). These are bulk regeneration PRs; when they fail, it's usually because the bump landed on a stale baseline.
Bug-fix cluster (C2) needs many iterations (4.9 commits/PR, second only to the 6.0 of the C3 merge-main chores) but achieves a strong 86% merge rate: the agent is willing to keep working until the behavior is right.

Cluster sizes & merge rates

Cluster	Theme	Size	Merge %	Avg commits	Avg files	Top terms
C4	CI job-failure fixes	224	78%	2.7	20.2	job, failure, failing, cli, files, cause
C7	Prompt tuning & experiments	200	84%	3.0	17.0	prompt, run, experiment, file, token, output
C0	New features & docs	189	78%	4.0	24.1	new, adds, feature, model, shared, guidance
C2	Bug fixes & behavior changes	173	86%	4.9	20.9	bug, behavior, coverage, path, updated, change
C5	AWF / MCP / firewall bumps	116	66%	3.7	82.8	awf, mcp, version, config, gateway, default
C6	Address-review-comments tasks	101	94%	4.5	19.7	comments, review comments, review, silently, run, comment
C3	Merge-main & recompile chores	75	75%	6.0	60.7	recompile, merge, merge main, main recompile, main, run
C1	WIP / planning-stage runs	26	46%	2.2	29.7	asking, asking work, form plan, work started, plan progress, started description
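The headline figures in the Summary can be re-derived from the per-cluster counts. The sketch below is illustrative only: the (size, merged) pairs are copied by hand from the cluster cards in this report, and the variable names are not taken from the actual analysis script.

```python
# (size, merged) per cluster, copied from the cluster cards in this report.
cluster_counts = {
    "C4": (224, 175),
    "C7": (200, 167),
    "C0": (189, 148),
    "C2": (173, 148),
    "C5": (116, 76),
    "C6": (101, 95),
    "C3": (75, 56),
    "C1": (26, 12),
}

total_prs = sum(size for size, _ in cluster_counts.values())
total_merged = sum(merged for _, merged in cluster_counts.values())

# Overall merge rate across every analyzed PR, in percent.
overall_rate = 100 * total_merged / total_prs

# Share of the queue taken by the three largest clusters (C4, C7, C0).
top3_share = 100 * sum(cluster_counts[c][0] for c in ("C4", "C7", "C0")) / total_prs

print(f"PRs analyzed: {total_prs}")
print(f"Overall merge rate: {overall_rate:.1f}%")
print(f"Top-3 cluster share: {top3_share:.1f}%")
```

The totals agree with the Summary: 1104 PRs, a 79.4% overall merge rate, and roughly 55% of PRs in the three largest clusters.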

2-D projection

Truncated-SVD projection of the TF-IDF matrix. Cohesion is loose (silhouette ≈ 0.04 — typical for short technical text), but the eight families are distinguishable by their top-term centroids rather than by visual separation.

Effort per cluster

Bulk regeneration tasks (C5 awf/mcp bumps, C3 merge-main recompiles) ship 60–80+ files per PR. Most other work hovers around 15–25 changed files.

Detailed cluster cards (top terms · representative PRs · stats)

C4 — CI job-failure fixes

Size: 224 PRs (20.3% of total)
Merge rate: 78% (175 merged / 43 closed-unmerged / 6 open)
Avg commits per PR: 2.7
Avg review-thread comments: 1.5
Avg files changed: 20.2
Avg additions / deletions: 182 / 89
Top TF-IDF terms: job, failure, failing, cli, files, cause, replace, run, root, failures
Representative PRs (closest to centroid):
- #30197 — fix: add actions: read permission to smoke-water.yml (#investigate-smoke-water-failure) · ✅ merged
- #29154 — fix: case-insensitive isPermissionsError in create_discussion.cjs (#25116260447) · ✅ merged
- #30413 — Stabilize Documentation Unbloat docs-server readiness probe · ❌ closed-unmerged

C7 — Prompt tuning & experiments

Size: 200 PRs (18.1% of total)
Merge rate: 84% (167 merged / 32 closed-unmerged / 1 open)
Avg commits per PR: 3.0
Avg review-thread comments: 1.2
Avg files changed: 17.0
Avg additions / deletions: 300 / 103
Top TF-IDF terms: prompt, run, experiment, file, token, output, bash, daily, summary, report
Representative PRs (closest to centroid):
- #31586 — Reduce Daily Syntax Error Quality workflow token churn without changing cadence · ✅ merged
- #32406 — feat(experiments): add output_format A/B test to daily-compiler-quality · ✅ merged
- #32339 — Add prompt_style A/B experiment to ci-coach with concise vs detailed prompt variants · ✅ merged

C0 — New features & docs

Size: 189 PRs (17.1% of total)
Merge rate: 78% (148 merged / 39 closed-unmerged / 2 open)
Avg commits per PR: 4.0
Avg review-thread comments: 2.7
Avg files changed: 24.1
Avg additions / deletions: 502 / 93
Top TF-IDF terms: new, adds, feature, model, shared, guidance, implementation, frontmatter, spec, daily
Representative PRs (closest to centroid):
- #32233 — Add OpenTelemetry reference page and centralize observability docs · ✅ merged
- #30681 — docs: SPDD spec improvements — multiplier registry, safeguards, conflict norms, error norms, sync notes, compl · ✅ merged
- #31540 — Add AWF /reflect reference for gateway-based model discovery and routing · ✅ merged

C2 — Bug fixes & behavior changes

Size: 173 PRs (15.7% of total)
Merge rate: 86% (148 merged / 20 closed-unmerged / 5 open)
Avg commits per PR: 4.9
Avg review-thread comments: 2.3
Avg files changed: 20.9
Avg additions / deletions: 263 / 138
Top TF-IDF terms: bug, behavior, coverage, path, updated, change, existing, regression, testing, fallback
Representative PRs (closest to centroid):
- #31104 — Use AWF audit JSONL as source for effective token failure parsing · ✅ merged
- #30941 — Add set-issue-field safe output with allowed-fields constraints, schema/compiler wiring, actionable field-va · ✅ merged
- #32220 — Handle missing bundle prerequisite commits in safe-output create_pull_request · ✅ merged

C5 — AWF / MCP / firewall bumps

Size: 116 PRs (10.5% of total)
Merge rate: 66% (76 merged / 39 closed-unmerged / 1 open)
Avg commits per PR: 3.7
Avg review-thread comments: 4.9
Avg files changed: 82.8
Avg additions / deletions: 941 / 1080
Top TF-IDF terms: awf, mcp, version, config, gateway, default, bump, golden, cli, firewall
Representative PRs (closest to centroid):
- #32913 — Bump default AWF firewall to v0.25.48 and MCP gateway to v0.3.11 · ✅ merged
- #30406 — Bump default AWF firewall image set to v0.25.40 · ✅ merged
- #32582 — Bump default Claude Code to 2.1.143 and MCP Gateway to v0.3.10, then regenerate compiled workflow artifacts · 🟡 open

C6 — Address-review-comments tasks

Size: 101 PRs (9.1% of total)
Merge rate: 94% (95 merged / 6 closed-unmerged / 0 open)
Avg commits per PR: 4.5
Avg review-thread comments: 4.8
Avg files changed: 19.7
Avg additions / deletions: 443 / 127
Top TF-IDF terms: comments, review comments, review, silently, run, comment, mcp, summary, safe-output, lines
Representative PRs (closest to centroid):
- #31197 — (empty title) · ❌ closed-unmerged
- #29580 — refactor: EngineRegistry.Register returns error instead of panicking on invalid port · ✅ merged
- #31411 — Comment out top-level on.labels in compiled workflows to prevent push-time workflow parse failures · ✅ merged

C3 — Merge-main & recompile chores

Size: 75 PRs (6.8% of total)
Merge rate: 75% (56 merged / 19 closed-unmerged / 0 open)
Avg commits per PR: 6.0
Avg review-thread comments: 5.6
Avg files changed: 60.7
Avg additions / deletions: 567 / 352
Top TF-IDF terms: recompile, merge, merge main, main recompile, main, run, bug, validation, files, step
Representative PRs (closest to centroid):
- #29819 — chore: upgrade gh-aw-firewall to v0.25.35 · ❌ closed-unmerged
- #31128 — minEffextiveTokens · ✅ merged
- #31614 — Auto-detect ARC/DinD and emit AWF --docker-host-path-prefix in generated workflows · ✅ merged

C1 — WIP / planning-stage runs

Size: 26 PRs (2.4% of total)
Merge rate: 46% (12 merged / 14 closed-unmerged / 0 open)
Avg commits per PR: 2.2
Avg review-thread comments: 0.3
Avg files changed: 29.7
Avg additions / deletions: 225 / 22
Top TF-IDF terms: asking, asking work, form plan, work started, plan progress, started description, date form, description date, started, progress
Representative PRs (closest to centroid):
- #32003 — [WIP] Fix failing GitHub Actions job lint-go · ✅ merged
- #32036 — [WIP] Fix failing GitHub Actions job lint-js · ✅ merged
- #32042 — [WIP] Fix failing GitHub Actions job lint-js · ✅ merged

Sample data table — 60 most-recent PRs (full assignments CSV is 1104 rows)

PR	Title	Cluster	Outcome	Files	Commits
#32944	Revert default firewall/MCP gateway bump from `ac0fd25`	C5 — AWF / MCP / firewall bumps	✅ merged	236	3
#32939	fix(model-inventory): enrich /reflect null models via models_url fallback	C0 — New features & docs	✅ merged	2	2
#32938	fix(daily-model-inventory): remove runner-host /reflect pre-step and query reflect in-agen	C3 — Merge-main & recompile chores	❌ closed	3	3
#32937	Pin Daily Compiler Threat Spec Optimizer to AWF v0.25.46	C5 — AWF / MCP / firewall bumps	❌ closed	2	2
#32913	Bump default AWF firewall to v0.25.48 and MCP gateway to v0.3.11	C5 — AWF / MCP / firewall bumps	✅ merged	236	3
#32911	Add SSL Skill Normalizer and convert reporting, error-messages, jqschema to SSL JSON	C0 — New features & docs	🟡 open	5	4
#32910	fix(safe-output): prevent silent 422 on PR review submission	C6 — Address-review-comments tasks	✅ merged	8	6
#32909	Reject removed `tools.serena` in parser and align with schema	C2 — Bug fixes & behavior changes	✅ merged	3	4
#32906	Harden privileged checkout path in `q.lock.yml` for comment-triggered runs	C2 — Bug fixes & behavior changes	🟡 open	54	9
#32905	Add BMAD-guided PR reviewer workflow and slash-command routing	C3 — Merge-main & recompile chores	❌ closed	14	10
#32904	Add UK AI operational resilience workflow with recent-change triage and sub-agent risk gov	C0 — New features & docs	✅ merged	2	3
#32900	Handle `update_pull_request.update_branch` workflow-permission failures as non-fatal	C2 — Bug fixes & behavior changes	✅ merged	2	10
#32894	Remove duplicate CLI install step in deep-report agent job	C4 — CI job-failure fixes	✅ merged	2	1
#32890	Centralize default HTTP client timeout in `pkg/constants` and remove duplicated 30s litera	C2 — Bug fixes & behavior changes	✅ merged	8	4
#32889	docs: add explicit multi-engine signal in README and make engine guidance need-based	C0 — New features & docs	✅ merged	2	2
#32888	Consolidate CLI not-found detection and fix lowercase “not found” miss	C2 — Bug fixes & behavior changes	✅ merged	3	2
#32872	Improve sanitize test assertions in pkg/stringutil/sanitize_test.go	C4 — CI job-failure fixes	✅ merged	1	4
#32868	Align docs GEO discovery artifacts for GitHub Pages subpath hosting	C4 — CI job-failure fixes	✅ merged	3	2
#32867	Expose and advertise `llms-full.txt` for docs GEO discoverability	C0 — New features & docs	✅ merged	3	2
#32866	Strengthen docs homepage JSON-LD graph with explicit node context and KG disambiguation me	C0 — New features & docs	✅ merged	1	4
#32863	GEO: Fix broken links and reduce keyword stuffing in README	C0 — New features & docs	❌ closed	1	2
#32862	Add `/sitemap.xml` alias and strengthen README GEO/brand signals	C0 — New features & docs	❌ closed	3	2
#32861	Add log-triage and workflow-file-scanner inline sub-agents to q.md	C7 — Prompt tuning & experiments	✅ merged	1	2
#32849	feat: infer gh CLI permissions for activation job pre-steps	C4 — CI job-failure fixes	✅ merged	8	16
#32847	Improve JS step-summary agent trace readability with ordered event markers	C7 — Prompt tuning & experiments	✅ merged	2	2
#32844	specs: daily SPDD work plan 2026-05-17 — safeguards, DriftRecord schema, sync notes, confo	C0 — New features & docs	✅ merged	5	3
#32842	Guard `MergeUnique` allocation sizing against integer overflow (CodeQL #592)	C2 — Bug fixes & behavior changes	✅ merged	1	2
#32841	Harden run step sanitizer against allocation-size overflow pattern	C2 — Bug fixes & behavior changes	✅ merged	1	2
#32840	[WIP] Fix failing GitHub Actions job JS Tests (shard 4/4)	C1 — WIP / planning-stage runs	✅ merged	2	7
#32839	[WIP] Fix failing GitHub Actions job js-typecheck	C1 — WIP / planning-stage runs	✅ merged	1	2
#32836	Normalize report-formatting guidance across non-compliant reporting workflows	C7 — Prompt tuning & experiments	✅ merged	3	2
#32832	[WIP] Fix the failing GitHub Actions job "Integration Unauthenticated Add (Public Repo)"	C1 — WIP / planning-stage runs	✅ merged	12	3
#32831	[WIP] Fix failing GitHub Actions job lint-go	C1 — WIP / planning-stage runs	✅ merged	12	3
#32827	Add `otlp-env-vars` skill for OpenTelemetry SDK env var configuration	C0 — New features & docs	❌ closed	10	2
#32826	Increase default repo-memory MaxFileSize from 10KB to 100KB	C5 — AWF / MCP / firewall bumps	✅ merged	3	1
#32825	Allow bash * in linter-miner workflow	C4 — CI job-failure fixes	✅ merged	1	1
#32822	Add missing `errorutil` package spec and align dependency sections in package READMEs	C0 — New features & docs	✅ merged	3	2
#32820	Connect `agent-performance-analyzer` to AgentDB for trend, recall, and regression memory	C0 — New features & docs	✅ merged	1	4
#32819	Add `checkout.clean-git-credentials` to support submodule-safe checkout credential cleanup	C2 — Bug fixes & behavior changes	✅ merged	13	29
#32818	Apply safe-output mention policy across markdown-producing handlers	C2 — Bug fixes & behavior changes	✅ merged	18	7
#32816	Linter Miner: raise execution limits and switch bash tool allowlist to wildcard	C7 — Prompt tuning & experiments	✅ merged	2	1
#32805	fix: pass mentions config to add_comment handler so allowed mentions aren't escaped	C6 — Address-review-comments tasks	✅ merged	6	4
#32804	docs: fix broken pattern links in patterns.md	C4 — CI job-failure fixes	✅ merged	1	2
#32802	feat(daily-semgrep-scan): add semgrep_output_format A/B experiment	C7 — Prompt tuning & experiments	✅ merged	2	2
#32801	Unify OTel engine identity on agent spans and remove duplicate `gen_ai.system`	C2 — Bug fixes & behavior changes	✅ merged	2	2
#32791	Align OTLP shared import with header-based secret contract	C2 — Bug fixes & behavior changes	🟡 open	229	5
#32771	Add shared AgentDB MCP import and wire deep-report for large-scale discussion search	C0 — New features & docs	🟡 open	3	2
#32770	Prevent chaos create-pull-request fallback when branch already exists	C2 — Bug fixes & behavior changes	🟡 open	2	1
#32769	fix(chaos-fuzzer): add `recreate-ref: true` to prevent branch-exists push failure	C4 — CI job-failure fixes	🟡 open	2	1
#32761	Surface OTLP export failures in observability step summary	C2 — Bug fixes & behavior changes	✅ merged	2	2
#32759	feat: surface denied commands and fix prompt in agent failure reports	C7 — Prompt tuning & experiments	🟡 open	11	8
#32758	Adopt isPermissionError helper for gh CLI auth-error detection	C4 — CI job-failure fixes	✅ merged	6	2
#32757	Export shared IsNotFoundError helper to pkg/errorutil, adopt across pkg/cli and pkg/parser	C4 — CI job-failure fixes	✅ merged	9	3
#32746	lint-monster: skip-if recent open issues (24h), single agent session	C7 — Prompt tuning & experiments	❌ closed	2	3
#32744	docs: add llms.md — AWF /reflect endpoint guide for LLM tool configuration	C5 — AWF / MCP / firewall bumps	✅ merged	2	2
#32743	docs: fix broken link to charmbracelet/crush repo in engines.md	C0 — New features & docs	✅ merged	1	2
#32742	Refine ET budget exhaustion message for scanability, link fidelity, and optimization guida	C7 — Prompt tuning & experiments	✅ merged	3	4
#32735	[model-inventory] Register 2026-05-17 OpenAI/Gemini model variants in ET multiplier regist	C0 — New features & docs	✅ merged	4	2
#32733	Fix cache-memory artifact upload path generation in threat-detection workflows	C3 — Merge-main & recompile chores	✅ merged	89	4
#32720	Refactor call-workflow extraction to use a shared YAML/Markdown loader	C2 — Bug fixes & behavior changes	❌ closed	5	3

Recommendations

Treat C1 as a noise channel, not a success channel. WIP planning-stage PRs ([WIP] Fix failing GitHub Actions job ...) add no-op completions to the dataset and carry the lowest merge rate (46%). Either auto-close them after N hours of no agent activity, or filter them out of success_rate rollups so the headline number reflects intentional work.
Standardize the AWF/MCP bump workflow (C5). It is the most expensive cluster per PR (avg 82.8 files changed), and 34% of its PRs have not merged. Establish a fixed baseline (last green main) before launching the bump, and require the agent to recompile only after a successful rebase, so that merge-main and bump work are not conflated (see the C3 vs C5 overlap).
Promote the @copilot review all comments flow (C6). 94% merge rate, low effort. If we can make this the canonical second-pass step after an initial agent PR, we lift the overall merge rate by ~2 pp at almost no marginal cost.
Give bug-fix prompts (C2) more turns. 86% merge rate with 4.9 commits/PR — the cluster is succeeding because it iterates. Resist tightening turn budgets here; the iteration loop is doing the work.
Tag prompt-experiment PRs (C7) with the experiment ID at PR-creation time. They share top terms (experiment, token, output, cache, daily, analysis) but aren't grouped semantically — a prompt-experiment:<id> label would let us attribute lift back to the underlying prompt change instead of relying on TF-IDF clustering after the fact.
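To gauge the first recommendation, the headline merge rate can be recomputed with C1 filtered out. This is a hedged sketch: the counts are copied from the cluster cards above, and merge_rate is a hypothetical helper, not an existing function in the analysis workflow.

```python
# (size, merged) per cluster, copied from the cluster cards in this report.
CLUSTERS = {
    "C4": (224, 175),
    "C7": (200, 167),
    "C0": (189, 148),
    "C2": (173, 148),
    "C5": (116, 76),
    "C6": (101, 95),
    "C3": (75, 56),
    "C1": (26, 12),
}


def merge_rate(clusters, exclude=()):
    """Merge rate in percent over all clusters not listed in `exclude`."""
    kept = [counts for name, counts in clusters.items() if name not in exclude]
    total = sum(size for size, _ in kept)
    return 100 * sum(merged for _, merged in kept) / total


headline = merge_rate(CLUSTERS)
without_wip = merge_rate(CLUSTERS, exclude=("C1",))
print(f"all clusters: {headline:.1f}%")
print(f"excluding C1: {without_wip:.1f}%")
```

On these counts, dropping the 26 C1 PRs moves the rollup from 79.4% to about 80.2%. The headline shift is small because the cluster is small, so the filter matters more for interpretation than for the number itself.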

Methodology

Source: /tmp/gh-aw/prompt-cache/pr-full-data/*.json (1,104 PRs created in last 30 days, copilot-authored).
Prompt text = PR title + lead body section (before ## Changes / ## Implementation / ## Test ...) + first non-bot comment mentioning copilot.
Vectorizer: TF-IDF, 1–2 grams, sublinear TF, min_df=3, max_df=0.45, project-specific stopwords removed.
Model: K-means; k chosen by best cosine silhouette over {4,5,6,7,8} → k=8 (silhouette 0.043). Silhouette is modest because PR titles/bodies are short and share heavy domain vocabulary; cluster identity comes from top-term centroids, not geometric isolation.
Workflow-turn metrics from aw_info.json were not joined in this run (logs not pre-fetched for the full PR set); cluster effort is approximated via changedFiles / commits_count / review-thread length instead.
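The prompt-text assembly and k-selection steps above could look roughly like the following, assuming scikit-learn. Everything named here (build_prompt_text, choose_k, the toy PRs) is hypothetical. The section-break list follows the three headings named above and may be incomplete, since the report elides the rest, and the project-specific stopword list is not reproduced.

```python
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Headings that end the "lead" section of a PR body (possibly incomplete).
SECTION_BREAK = re.compile(r"^## (Changes|Implementation|Test)", re.MULTILINE)


def build_prompt_text(title, body, first_copilot_comment=""):
    """Join the title, the lead body section, and the first copilot comment."""
    match = SECTION_BREAK.search(body)
    lead = body[: match.start()] if match else body
    parts = [title, lead, first_copilot_comment]
    return "\n".join(p.strip() for p in parts if p.strip())


def choose_k(texts, candidate_ks=(4, 5, 6, 7, 8), min_df=3, max_df=0.45,
             stop_words=None, seed=0):
    """Return (best_k, best_score) by cosine silhouette over TF-IDF vectors."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True,
                                 min_df=min_df, max_df=max_df,
                                 stop_words=stop_words)
    matrix = vectorizer.fit_transform(texts)
    best_k, best_score = None, None
    for k in candidate_ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(matrix)
        score = silhouette_score(matrix, labels, metric="cosine")
        if best_score is None or score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy PRs (title, body, comment); min_df/max_df are relaxed for this tiny corpus.
toy_prs = [
    ("Fix failing lint job", "The lint job fails on main.\n## Changes\n- fix", ""),
    ("Fix failing test job", "The test job fails after merge.\n## Test\n- ran", ""),
    ("Fix failing build job", "Root cause of the failing build job.", ""),
    ("Bump firewall version", "Bump default firewall version.\n## Changes\n- bump", ""),
    ("Bump gateway version", "Bump default gateway version in config.", ""),
    ("Bump firewall and gateway", "Bump both default versions.", ""),
    ("Address review comments", "Follow-up.", "@copilot address review comments"),
    ("Apply review feedback", "Follow-up on review.", "@copilot review all comments"),
]
texts = [build_prompt_text(*pr) for pr in toy_prs]
best_k, score = choose_k(texts, candidate_ks=(2, 3), min_df=1, max_df=1.0)
print(best_k, round(score, 3))
```

The defaults mirror the settings listed above (1–2 grams, sublinear TF, min_df=3, max_df=0.45, k from 4 to 8). The toy call overrides them only because eight documents cannot satisfy those frequency thresholds.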

Generated by Prompt Clustering Analysis · run §26029513288 · 2026-05-18T11:28:20Z

expires on May 19, 2026, 11:30 AM UTC

2026-05-19T11:14:30Z

github-actions[bot]
Bot May 19, 2026
Author

This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis.

A newer discussion is available at Discussion #33277.
