Agent Performance Report - Week of May 20, 2026 #33558

2026-05-20T13:45:46Z

github-actions[bot]
Bot May 20, 2026

Executive Summary

Agents analyzed: 323 workflows
Total outputs reviewed: 374 (185 issues, 189 PRs)
Average quality score: 74/100 (18-day plateau)
Average effectiveness score: 71/100 (18-day plateau)
Overall health score: 63/100 (stable but degraded)
Top performers: Issue Monster (87), Auto-Triage (85), Bot Detection (83)
Critical issues: 3 new improvement issues created + 2 existing P1 blockers

Key Finding: The agent ecosystem is stable but significantly degraded by an orchestrator failure (Agentic Maintenance) and critical bugs that have persisted for 90+ days (CGO/CJS). Quality and effectiveness scores have been flat for 18 days. Both are expected to break out, to 76-78 for quality and 73-75 for effectiveness, once the P1 issues are resolved.

Performance Rankings

Top Performing Agents 🏆

1. Issue Monster (Quality: 85/100, Effectiveness: 87/100)

Highest effectiveness in ecosystem
Fast execution: ~6m39s average runtime
Excellent task completion rate
Strength: Well-scoped, single-responsibility design
Example: Coordinates with copilot-swe-agent effectively every 30 minutes

2. Auto-Triage Issues (Quality: 82/100, Effectiveness: 85/100)

100% success rate in recent runs
Strong labeling accuracy
Fast execution: ~8 minutes
Strength: Clear scope, good error handling
Example: Processes 30 unlabeled issues per run

3. Bot Detection (Quality: 82/100, Effectiveness: 83/100)

Ultra-fast execution: 9 seconds
100% success rate
Zero false positives detected
Strength: Simple, focused, reliable
Example: Processes all open issues efficiently

4. License Compliance Check (Quality: 80/100, Effectiveness: 82/100)

~98% success rate
Specialized domain expertise
Consistent quality
Strength: Domain-specific validation

5. PR Sous Chef (Quality: 80/100, Effectiveness: 82/100)

100% success rate
Effective PR assistance
Good collaboration patterns
Strength: Clear PR improvement workflow

6. Copilot SWE Agent (Quality: 78/100, Effectiveness: 85/100)

30-day performance:
- PRs created: 125
- PRs merged: 101 (80.8% merge rate) 🎯
- PRs closed without merge: 19 (15.2%)
- PRs open: 5
Strength: High merge rate, productive output
Minor concern: 15.2% of PRs were closed without merge, which indicates some inconsistency
Recent examples (all merged):
- Allow patch-diff.githubusercontent.com in the GitHub domain ecosystem #33543
- Remove centralized pull_request_reviewer dispatching from agentic_commands.yml #33542
- Add sub_agent_strategy A/B experiment to smoke-gemini workflow #33540
- fix(otlp): always emit gen_ai.response.finish_reasons; use GITHUB_SHA as service.version fallback #33528

Agents Needing Improvement 📉

🔴 CRITICAL - Agentic Maintenance (Effectiveness: 0/100) - P1

Status: DOWN (Day 2)

Issues:

Compile failure in compile-workflows step
Orchestrator completely impaired
All downstream automation blocked
Zero outputs in past 2 days

Impact: Meta-orchestrator capacity lost, quality/effectiveness plateau

Action: Created issue for immediate fix
Expected recovery: 2-4 hours, +2-4 points quality/effectiveness

🔴 CRITICAL - CGO/CJS Workflows (Effectiveness: 0/100) - P1

Status: FAILING (90+ days, 0% success rate)

Issues:

Complete workflow category failure
Zero successful runs in 90+ days
Failing on every push to main

Action: Issue #29669 needs escalation to dedicated engineering
Decision deadline: June 1, 2026 (fix or deprecate)

⚠️ HIGH - Codex Workflows (Effectiveness: 0/100) - P1

Status: BLOCKED (12 workflows)

Issues:

12 workflows unable to execute
OPENAI_API_KEY sandbox exclusion

Action: Issue #32446 needs sandbox configuration fix
Expected recovery: 48 hours

⚠️ MEDIUM - github-actions (Quality: 55/100, Effectiveness: 50/100) - P2

Status: ACTIVE but underperforming

30-day performance:

Issues: 185 created, 139 closed (75.1%) ✅
PRs: 64 created, 21 merged (32.8%) ⚠️, 43 closed without merge (67.2%)

Issues:

Very low PR merge rate vs. copilot-swe-agent (32.8% vs. 80.8%)
Scope creep: Mixed responsibilities (issues + PRs)
43 rejected PRs = wasted review bandwidth

Action: Created issue to split mixed workflows, improve PR quality
Target: >60% PR merge rate within 30 days
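The merge-rate target can be translated into a concrete merge count. The following is a minimal sketch of that arithmetic, assuming PR volume stays at the 64 PRs seen in this window; the function names are illustrative and not part of any workflow in this repository.

```python
import math


def merge_rate(merged: int, created: int) -> float:
    """Return the merge rate as a percentage of created PRs."""
    if created == 0:
        raise ValueError("no PRs created in the window")
    return 100.0 * merged / created


def merges_needed(created: int, target_pct: float) -> int:
    """Smallest merged count whose rate is strictly above target_pct."""
    return math.floor(created * target_pct / 100.0) + 1


# Counts taken from the 30-day window in this report.
current = merge_rate(21, 64)        # github-actions
benchmark = merge_rate(101, 125)    # copilot-swe-agent
needed = merges_needed(64, 60.0)    # strictly more than 60% of 64 PRs

print(f"github-actions: {current:.1f}%  copilot-swe-agent: {benchmark:.1f}%")
print(f"Merges needed for >60% at the same volume: {needed} ({needed - 21} more)")
```

At an unchanged volume of 64 PRs, the target corresponds to 39 merged PRs, 18 more than in the current window. If volume drops because low-quality PRs are no longer opened, the required count falls accordingly.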

⚠️ MEDIUM - Token Budget Exhaustion (Multiple Workflows) - P2

Status: INTERMITTENT failures

Issues:

Multiple daily workflows hitting token limits
Estimated 15-20% token waste
Inconsistent completion rates

Action: Created issue for token usage audit and optimization
Target: 95%+ consistent completion, <5% waste

Inactive Agents

None identified in this analysis period. All 323 workflows have lock files and are deployable.

Quality Analysis

Output Quality Distribution

Excellent (80-100): 5 of the 6 top performers (Copilot SWE Agent scores 78 and falls in Good)
Good (60-79): Majority of ecosystem
Fair (40-59): github-actions PR quality, token-constrained workflows
Poor (<40): Failed workflows (Agentic Maintenance, CGO/CJS, Codex)

Common Quality Issues

1. Incomplete Outputs (Under-Creation Pattern)

Affected agents: 5+

Agentic Maintenance (compile failure)
CGO/CJS workflows (90+ day failure)
Codex workflows (12 blocked)
Token budget constrained workflows
github-actions PRs (low merge rate = ineffective)

Impact: Reduced ecosystem output volume and effectiveness

2. Inconsistent Performance

Affected agents: 3

copilot-swe-agent (15.2% closure rate)
github-actions (roughly 2.3x gap between issue and PR success: 75.1% vs. 32.8%)
Token budget workflows (intermittent failures)

Impact: Unpredictable outcomes, wasted partial runs

3. Scope Creep

Affected agents: github-actions

Mixed issues + PRs under same actor
Poor PR performance (32.8% merge rate)
Confusion about responsibilities

Impact: Wasted review bandwidth, unclear accountability

Effectiveness Analysis

Task Completion Rates

High completion (>80%): Issue Monster (87), Auto-Triage (85), Bot Detection (83), copilot-swe-agent (85)
Medium completion (50-80%): Most daily orchestrators, specialized workflows
Low completion (<50%): github-actions PRs (32.8%), token-exhausted workflows, blocked workflows (0%)

PR Merge Statistics (30-Day Window)

Excellent Merge Rates (>75%)

copilot-swe-agent: 80.8% (101/125 PRs merged) 🏆
Top performer benchmark

Poor Merge Rates (<40%)

github-actions: 32.8% (21/64 PRs merged) ⚠️
Needs improvement: See improvement issue

Time to Completion

Fast (<10 min): Bot Detection (9s), Issue Monster (6m39s), Auto-Triage (8m), smoke tests, simple validators
Medium (10-30 min): License Compliance
Slow (>30 min): Daily orchestrators, complex analysis workflows

Optimization opportunity: 4 daily orchestrators flagged for resource waste
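The runtime bands above amount to a simple threshold check. A minimal sketch, using the average runtimes quoted in this report; the function name and the handling of exact boundaries (10 and 30 minutes both count as Medium) are assumptions, since the report does not specify them.

```python
from datetime import timedelta

# Band boundaries from the "Time to Completion" section.
FAST_LIMIT = timedelta(minutes=10)
MEDIUM_LIMIT = timedelta(minutes=30)


def runtime_bucket(runtime: timedelta) -> str:
    """Classify a runtime as Fast (<10 min), Medium (10-30 min), or Slow (>30 min)."""
    if runtime < FAST_LIMIT:
        return "Fast"
    if runtime <= MEDIUM_LIMIT:
        return "Medium"
    return "Slow"


# Average runtimes stated for the top performers.
runtimes = {
    "Bot Detection": timedelta(seconds=9),
    "Issue Monster": timedelta(minutes=6, seconds=39),
    "Auto-Triage": timedelta(minutes=8),
}

for name, runtime in runtimes.items():
    print(f"{name}: {runtime_bucket(runtime)}")
```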

Behavioral Patterns

Problematic Patterns ⚠️

Under-Creation (5+ agents):

Agentic Maintenance (compile failure) ← IMMEDIATE
CGO/CJS workflows (90+ day failure) ← ESCALATE
Codex workflows (12 blocked) ← HIGH PRIORITY
Token budget workflows (exhaustion) ← NEW ISSUE
github-actions PRs (low merge rate = ineffective) ← NEW ISSUE

Inconsistency (3 agents):

copilot-swe-agent (15.2% closure rate)
github-actions (roughly 2.3x gap between issue and PR success: 75.1% vs. 32.8%)
Token budget workflows (intermittent failures)

Scope Creep (1 agent):

github-actions (mixed issues + PRs, poor PR performance)

Resource Waste (4 agents):

Daily orchestrators with excessive runtimes
Token budget exhaustion (estimated 15-20% token waste)

Productive Patterns ✅

High-Quality Single-Responsibility:

Issue Monster: Clear scope, fast, effective
Auto-Triage: Focused labeling, 100% success
Bot Detection: Simple, 9-second runtime, reliable

Effective Coordination:

Issue Monster → copilot-swe-agent handoff (every 30 min)
Auto-Triage → Label application workflow
PR Sous Chef → PR improvement feedback

Specialized Expertise:

License Compliance: Domain validation, 98% success
Bot Detection: Focused spam/bot detection

Coverage Analysis

Well-Covered Areas ✅

Issue triage and labeling (Issue Monster, Auto-Triage, Bot Detection)
PR creation and review (copilot-swe-agent, PR Sous Chef)
License and compliance checking
Bot and spam detection
Daily orchestration (60 workflows)

Coverage Gaps 🔍

Security vulnerability tracking (limited)
Performance optimization monitoring
User experience improvements
Documentation quality assessment
Technical debt tracking

Redundancy Concerns ⚠️

Issue handling: 3 agents (Issue Monster, Auto-Triage, Bot Detection) - some overlap but acceptable specialization
Daily orchestration: 60 workflows - potential consolidation opportunities

Ecosystem Health

Agent Diversity

Engine Distribution (Total: 323 workflows):

Copilot: 140 workflows (43.3%)
Claude: 60 workflows (18.6%)
Codex: 12 workflows (3.7%) - BLOCKED ⚠️
Others: 19 workflows (5.9%)
Non-agentic: 15 workflows (4.6%)
Unknown: 77 workflows (23.8%)

Observations:

✅ Healthy engine diversity (3+ engines active)
✅ Copilot dominant but not monopolistic
⚠️ Codex blockage temporarily reduces diversity
💡 Claude underutilized (opportunity for expansion)
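Each share in the engine distribution is that engine's workflow count divided by the 323-workflow total. A minimal sketch of the calculation, using the counts listed in this section; the variable names are illustrative.

```python
# Workflow counts per engine, as listed in the engine distribution.
engine_counts = {
    "Copilot": 140,
    "Claude": 60,
    "Codex": 12,
    "Others": 19,
    "Non-agentic": 15,
    "Unknown": 77,
}

total = sum(engine_counts.values())

# Share of each engine as a percentage of all workflows, to one decimal place.
shares = {
    engine: round(100 * count / total, 1)
    for engine, count in engine_counts.items()
}

for engine, share in shares.items():
    print(f"{engine}: {engine_counts[engine]} workflows ({share}%)")
print(f"Total: {total} workflows")
```

Because each share is rounded independently, the printed percentages need not sum to exactly 100.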

Trends

Quality Score: 74/100

Current: 74/100
Change: 0 (18-day plateau) ⚠️
Expected breakout: 76-78 once P1 issues resolved
Contributing factors: Top performers at 80-85, critical failures dragging average

Effectiveness Score: 71/100

Current: 71/100
Change: 0 (18-day plateau) ⚠️
Expected breakout: 73-75 once P1 issues resolved
Contributing factors: Under-creation, blocked workflows, low PR merge rates

Health Score: 63/100

Current: 63/100 (stable but degraded)
Change: 0
Status: Degraded from Agentic Maintenance failure
Lock file coverage: 100% (231/231) ✅
Open aw-failures: ~22 (stable)

Output Volume (30 days):

Issues created: 185 (↔ stable)
PRs created: 189 (↔ stable)
PRs merged: 122 (64.6% overall merge rate)
Overall activity: Stable
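The overall merge rate is volume-weighted: it is total merged PRs over total created PRs, not the average of the per-actor rates. A minimal sketch, using the per-actor counts from this report (which sum to the totals above); the `PrStats` class is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrStats:
    created: int
    merged: int

    @property
    def rate(self) -> float:
        """Merge rate as a percentage of created PRs."""
        return 100.0 * self.merged / self.created


# Per-actor PR counts for the 30-day window.
per_actor = {
    "copilot-swe-agent": PrStats(created=125, merged=101),
    "github-actions": PrStats(created=64, merged=21),
}

total = PrStats(
    created=sum(s.created for s in per_actor.values()),
    merged=sum(s.merged for s in per_actor.values()),
)

# Simple mean of the two rates, shown only for contrast with the weighted figure.
unweighted = sum(s.rate for s in per_actor.values()) / len(per_actor)

print(f"Overall: {total.merged}/{total.created} = {total.rate:.1f}%")
print(f"Unweighted mean of per-actor rates: {unweighted:.1f}%")
```

The weighted figure sits closer to the copilot-swe-agent rate because that actor opened roughly twice as many PRs.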

Recommendations

🔴 Immediate Actions (0-24 hours)

Restore Agentic Maintenance (P1) ← NEW ISSUE
- Fix compile failure
- Verify orchestrator functionality
- Expected: +2-4 points quality/effectiveness
Investigate Codex Blockage (P1)
- Fix OPENAI_API_KEY sandbox exclusion
- Unblock 12 workflows
- Issue: Codex OpenAI proxy fails because OPENAI_API_KEY is excluded from AWF sandbox #32446

⚠️ High Priority (24-72 hours)

Address Token Budget Exhaustion (P2) ← NEW ISSUE
- Audit token usage
- Test compression experiments
- Target: 95%+ completion, <5% waste
Escalate CGO/CJS Issue (P1)
- Dedicated engineering assignment
- Decision: Fix or deprecate by June 1
- Issue: [CGO] Workflow failure on main - Run #2565 #29669 (needs escalation)

💡 Medium Priority (1-2 weeks)

Split github-actions Mixed Workflows (P2) ← NEW ISSUE
- Separate issue and PR creation
- Target: >60% PR merge rate
Break Quality and Effectiveness Plateaus
- Unblock critical workflows
- Optimize token usage
- Implement quality ratcheting
- Target: 78/100 quality, 75/100 effectiveness

🔧 Low Priority (2-4 weeks)

Optimize Resource Usage - Profile daily orchestrators, reduce runtime 25%
Implement Deduplication - Add similarity detection across agents
Document Top-Performer Patterns - Create reusable templates

Actions Taken This Run

✅ Created 3 improvement issues:

Agentic Maintenance compile failure (P1)
github-actions low PR merge rate (P2)
Token budget exhaustion (P2)

✅ Generated comprehensive performance report

✅ Updated shared memory:

agent-performance-latest.md
shared-alerts.md

✅ Coordinated with other meta-orchestrators:

Notes for Campaign Manager (token pressure, Codex blocked)
Notes for Workflow Health Manager (Agentic Maintenance priority)

✅ Pattern detection completed:

5+ under-creation agents identified
3 inconsistency patterns
1 scope creep
4 resource waste

Next Steps

Daily monitoring of P1 issues until resolved (health score >70)
Track improvement metrics weekly:
- Quality score (target: 78+)
- Effectiveness score (target: 75+)
- PR merge rates (github-actions target: >60%)
- Token waste (target: <5%)
- Blocked workflows (target: 0)
Run pattern-detector again after fixes deployed (target: 7 days)
Coordinate with Campaign Manager on quality plateau impact
Next full analysis: Weekly (or daily until health improves)
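The weekly tracking list can be expressed as a small target check. This is a sketch under stated assumptions, not the analyzer's actual logic: the `Metric` type is hypothetical, token waste uses the lower bound of the estimated 15-20% range, and the blocked-workflow count includes only the 12 Codex workflows named in this report.

```python
from typing import Callable, NamedTuple


class Metric(NamedTuple):
    name: str
    current: float
    target: str
    meets_target: Callable[[float], bool]


metrics = [
    Metric("Quality score", 74, "78+", lambda v: v >= 78),
    Metric("Effectiveness score", 71, "75+", lambda v: v >= 75),
    Metric("github-actions PR merge rate (%)", 32.8, ">60", lambda v: v > 60),
    # Assumption: lower bound of the estimated 15-20% waste.
    Metric("Token waste (%)", 15, "<5", lambda v: v < 5),
    # Assumption: only the 12 blocked Codex workflows are counted.
    Metric("Blocked workflows", 12, "0", lambda v: v == 0),
]

for m in metrics:
    status = "met" if m.meets_target(m.current) else "not met"
    print(f"{m.name}: current {m.current}, target {m.target} ({status})")

unmet = [m.name for m in metrics if not m.meets_target(m.current)]
print(f"{len(unmet)} of {len(metrics)} targets not yet met")
```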

Analysis Period: April 20 - May 20, 2026 (30 days)
Run ID: §26165726464
Next Report: 2026-05-27 (weekly) or 2026-05-21 (daily if health <70)
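The next-report rule above can be sketched as a function of the health score. This is a minimal illustration, assuming the cadence depends only on the health score and the 70-point threshold; the function and constant names are hypothetical.

```python
from datetime import date, timedelta

HEALTH_THRESHOLD = 70  # below this score, reports run daily


def next_report_date(last_report: date, health_score: int) -> date:
    """Return the next report date: daily while health is below the threshold, else weekly."""
    days = 1 if health_score < HEALTH_THRESHOLD else 7
    return last_report + timedelta(days=days)


# This report: dated 2026-05-20 with a health score of 63.
print(next_report_date(date(2026, 5, 20), 63))  # 2026-05-21
```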

Generated by ⚡ Agent Performance Analyzer - Meta-Orchestrator · ● 11.5M · ◷

expires on May 21, 2026, 1:45 PM UTC
