[prompt-clustering] Copilot agent prompt clustering — 2026-05-18 #33017
Closed
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #33277.
Summary
Cluster sizes
Key findings
`[WIP] Fix failing GitHub Actions job <name>` where the agent has only produced an empty plan/progress description before being abandoned. Half of them never reach merge, a strong indicator they should be filtered out of merge-rate dashboards or expanded before being assigned.
Cluster sizes & merge rates
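The filtering suggested in the key findings can be sketched as below: compute the merge rate over all PRs, then again with the WIP cluster (C1) excluded. This is only an illustration. The `cluster` and `merged` field names and the sample rows are hypothetical, not taken from the report's data.

```python
def merge_rate(prs):
    """Share of PRs that were merged; returns None for an empty list."""
    if not prs:
        return None
    return sum(1 for pr in prs if pr["merged"]) / len(prs)


def headline_rates(prs, excluded_clusters=("C1",)):
    """Merge rate over all PRs and over PRs outside the excluded clusters."""
    kept = [pr for pr in prs if pr["cluster"] not in excluded_clusters]
    return {"all": merge_rate(prs), "excluding_wip": merge_rate(kept)}


# Illustrative rows only; hypothetical field names, made-up values.
sample = [
    {"cluster": "C1", "merged": False},
    {"cluster": "C1", "merged": True},
    {"cluster": "C6", "merged": True},
    {"cluster": "C4", "merged": True},
]
rates = headline_rates(sample)
print(rates)
```

Reporting both numbers side by side keeps the unfiltered rate visible while showing how much of the gap is attributable to planning-stage runs.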
2-D projection
Truncated-SVD projection of the TF-IDF matrix. Cohesion is loose (silhouette ≈ 0.04 — typical for short technical text), but the eight families are distinguishable by their top-term centroids rather than by visual separation.
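For readers unfamiliar with the silhouette figure quoted above, here is a minimal standard-library sketch of how a mean silhouette coefficient can be computed over TF-IDF vectors. It makes several assumptions the report does not confirm: whitespace tokenisation, smoothed IDF, and cosine distance. It also skips the Truncated-SVD projection entirely. The example texts are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse, L2-normalised TF-IDF dict per document (smoothed IDF)."""
    tokenised = [d.lower().split() for d in docs]
    n = len(tokenised)
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity; inputs are unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def silhouette(vectors, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        by_cluster = {}
        for j, (u, other) in enumerate(zip(vectors, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(cosine_distance(v, u))
        own = by_cluster.pop(lab, [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Made-up prompt texts, two obvious families.
docs = ["fix failing job", "fix failing ci job", "bump mcp version", "bump awf version"]
vectors = tfidf(docs)
print(silhouette(vectors, [0, 0, 1, 1]))
```

Values near 1 mean tight, well-separated clusters, values near 0 mean overlapping clusters, and negative values mean points sit closer to another cluster than to their own.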
Effort per cluster
Bulk regeneration tasks (C5 awf/mcp bumps, C3 merge-main recompiles) ship 60–80+ files per PR. Most other work hovers around 15–25 changed files.
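A per-cluster effort figure like the one described above could be derived roughly as follows. This sketch groups PRs by cluster and takes the median of `changedFiles`. Using the median rather than the mean is an assumption here, as are the `cluster` key and the sample values, none of which come from the report's data.

```python
from collections import defaultdict
from statistics import median


def median_changed_files(prs):
    """Map each cluster label to the median changedFiles of its PRs."""
    by_cluster = defaultdict(list)
    for pr in prs:
        by_cluster[pr["cluster"]].append(pr["changedFiles"])
    return {cluster: median(counts) for cluster, counts in sorted(by_cluster.items())}


# Illustrative rows only; these numbers are made up.
sample = [
    {"cluster": "C5", "changedFiles": 64},
    {"cluster": "C5", "changedFiles": 82},
    {"cluster": "C5", "changedFiles": 71},
    {"cluster": "C2", "changedFiles": 15},
    {"cluster": "C2", "changedFiles": 22},
]
effort = median_changed_files(sample)
print(effort)
```

A median keeps a single very large regeneration PR from dominating the figure for an otherwise small-change cluster.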
Detailed cluster cards (top terms · representative PRs · stats)
C4 — CI job-failure fixes
job, failure, failing, cli, files, cause, replace, run, root, failures
`actions: read` permission to smoke-water.yml (#investigate-smoke-water-failure) · ✅ merged
C7 — Prompt tuning & experiments
prompt, run, experiment, file, token, output, bash, daily, summary, report
`prompt_style` A/B experiment to `ci-coach` with concise vs detailed prompt variants · ✅ merged
C0 — New features & docs
new, adds, feature, model, shared, guidance, implementation, frontmatter, spec, daily
`/reflect` reference for gateway-based model discovery and routing · ✅ merged
C2 — Bug fixes & behavior changes
bug, behavior, coverage, path, updated, change, existing, regression, testing, fallback
`set-issue-field` safe output with allowed-fields constraints, schema/compiler wiring, actionable field-va · ✅ merged
`create_pull_request` · ✅ merged
C5 — AWF / MCP / firewall bumps
awf, mcp, version, config, gateway, default, bump, golden, cli, firewall
C6 — Address-review-comments tasks
comments, review comments, review, silently, run, comment, mcp, summary, safe-output, lines
`on.labels` in compiled workflows to prevent push-time workflow parse failures · ✅ merged
C3 — Merge-main & recompile chores
recompile, merge, merge main, main recompile, main, run, bug, validation, files, step
`--docker-host-path-prefix` in generated workflows · ✅ merged
C1 — WIP / planning-stage runs
asking, asking work, form plan, work started, plan progress, started description, date form, description date, started, progress
Sample data table — 60 most-recent PRs (full assignments CSV is 1,104 rows)
`tools.serena` in parser and align with schema
`q.lock.yml` for comment-triggered runs
`update_pull_request.update_branch` workflow-permission failures as non-fatal
`pkg/constants` and remove duplicated 30s literal
`llms-full.txt` for docs GEO discoverability
`/sitemap.xml` alias and strengthen README GEO/brand signals
`MergeUnique` allocation sizing against integer overflow (CodeQL #592)
`otlp-env-vars` skill for OpenTelemetry SDK env var configuration
`errorutil` package spec and align dependency sections in package READMEs
`agent-performance-analyzer` to AgentDB for trend, recall, and regression memory
`checkout.clean-git-credentials` to support submodule-safe checkout credential cleanup
`gen_ai.system`
`recreate-ref: true` to prevent branch-exists push failure
Recommendations
`[WIP] Fix failing GitHub Actions job ...`) are flooding the dataset with no-op completions. Either auto-close them after N hours of no agent activity, or filter them out of `success_rate` rollups so the headline number reflects intentional work.
`@copilot review all comments` flow (C6). 94% merge rate, low effort. If we can make this the canonical second-pass step after an initial agent PR, we lift the overall merge rate by ~2 pp at almost no marginal cost.
`experiment`, `token`, `output`, `cache`, `daily`, `analysis`) but aren't grouped semantically; a `prompt-experiment:<id>` label would let us attribute lift back to the underlying prompt change instead of relying on TF-IDF clustering after the fact.
Methodology
`/tmp/gh-aw/prompt-cache/pr-full-data/*.json` (1,104 PRs created in last 30 days, copilot-authored).
`## Changes` / `## Implementation` / `## Test...`) + first non-bot comment mentioning `copilot`.
`aw_info.json` were not joined in this run (logs not pre-fetched for the full PR set); cluster effort is approximated via `changedFiles` / `commits_count` / review-thread length instead.
Generated by Prompt Clustering Analysis · run §26029513288 · 2026-05-18T11:28:20Z