Agent Persona Exploration - 2026-05-17 #32837

2026-05-17T15:51:49Z

github-actions[bot]
Bot May 17, 2026

Overview

Seven scenarios were tested across two runs (2026-05-16 and 2026-05-17), covering all five software worker personas. The overall average score was 4.2/5.0. The agent consistently excels at PR-triggered analysis workflows and struggles with scenarios that require external infrastructure tooling (cloud CLIs, credentials, network allowlisting).

Key Findings

PR-triggered analysis workflows are the sweet spot — path-filtered pull_request triggers with add-comment safe-output consistently scored 4.4–4.8/5
Scheduled report workflows are reliable — schedule + create-issue/create-discussion with skip-if-match dedup works cleanly
Complex DevOps scenarios score lower because of concerns outside the framework's scope (cloud credentials, binary installation, timeout risk on large workspaces), not because of framework limitations
Security posture is consistently correct — read-only agent job + safe-outputs delegation scored 4–5 across all scenarios
The visual-regression.md and test-coverage.md reference prompts give the agent a major quality boost for those domains

Quality Scores by Scenario

Scenario	Persona	Task	Avg (out of 5)
S3	QA Tester	Test coverage PR comment	4.8
S5	Frontend	Visual regression report	4.6
S1	Backend	Schema migration review	4.4
S4	PM	Weekly feature digest	4.4
S7	Backend	API docs diff on PR	4.4
S6	DevOps	Terraform drift detection	3.8
S2	DevOps	Deployment log monitoring	3.2
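
The overall average quoted in the Overview can be reproduced from the per-scenario scores in the table. The snippet below is only a sanity-check sketch. The dictionary layout and variable names are illustrative and not part of the explorer's tooling. The two thresholds mirror the "High Quality" (≥ 4.4) and "Areas for Improvement" (≤ 3.8) cut-offs used later in this report.

```python
from statistics import mean

# Per-scenario average scores, copied from the table above.
scores = {
    "S3": 4.8,  # QA Tester: test coverage PR comment
    "S5": 4.6,  # Frontend: visual regression report
    "S1": 4.4,  # Backend: schema migration review
    "S4": 4.4,  # PM: weekly feature digest
    "S7": 4.4,  # Backend: API docs diff on PR
    "S6": 3.8,  # DevOps: Terraform drift detection
    "S2": 3.2,  # DevOps: deployment log monitoring
}

# Unweighted mean across all seven scenarios.
overall = mean(scores.values())
print(f"overall average: {overall:.2f}")

# Bucket scenarios with the same cut-offs the report uses.
high = sorted(s for s, v in scores.items() if v >= 4.4)
low = sorted(s for s, v in scores.items() if v <= 3.8)
print("high:", high)
print("low:", low)
```

The unweighted mean is 29.6 / 7, which rounds to the 4.2 reported in the Overview.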

Top Patterns Observed

Most common trigger: pull_request (opened/synchronize) with optional paths: filter
Most recommended tools: github (gh-proxy), bash, playwright (browser scenarios), cache-memory (persistence)
Universal security: read-only agent job + safe-outputs for writes + skip-if-match deduplication on scheduled runs

High Quality Responses (scores ≥ 4.4)

S5 — Visual Regression (Frontend, 4.6/5)
Near-perfect scenario fit. The agent maps directly to .github/aw/visual-regression.md as a reference, applies Playwright with SSRF protection (localhost-only), uses cache-memory for baseline persistence, and correctly rate-limits add-comment with max:1. The only gap: the agent should ask which app server command to use (storybook, vite, etc.).

S3 — Test Coverage (QA, 4.8/5)
The agent correctly routes to the test-coverage.md prompt and applies coverage diff analysis with clear guidance on thresholds. Best-performing scenario across both runs.

S7 — API Docs (Backend, 4.4/5)
Clean path-filtered trigger, correct token-cost mitigation (scope analysis to changed files only), and update-comment pattern to avoid PR comment spam on re-runs.
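
The update-comment idea mentioned above comes down to a small decision: look for a previous report comment and update it, otherwise create a new one. The sketch below shows only that decision logic in plain Python. It assumes comments arrive as dictionaries with `id` and `body` keys, and it uses a hidden HTML marker to recognise earlier reports. The marker string and function names are hypothetical and are not the framework's actual mechanism.

```python
from typing import Optional

# Hypothetical hidden marker embedded in every report comment so that
# later runs can recognise it. Not a framework-defined value.
MARKER = "<!-- api-docs-diff-report -->"


def find_existing_comment(comments: list[dict], marker: str = MARKER) -> Optional[int]:
    """Return the id of the first comment whose body contains the marker, else None."""
    for comment in comments:
        if marker in (comment.get("body") or ""):
            return comment["id"]
    return None


def plan_comment(comments: list[dict], report: str, marker: str = MARKER) -> dict:
    """Decide whether to create a new comment or update the previous report."""
    body = f"{marker}\n{report}"
    existing_id = find_existing_comment(comments, marker)
    if existing_id is None:
        return {"action": "create", "body": body}
    return {"action": "update", "comment_id": existing_id, "body": body}


# First run: no earlier report exists, so a comment is created.
first = plan_comment([{"id": 1, "body": "LGTM"}], "2 endpoints changed")

# Re-run on a synchronize event: the earlier report is updated in place.
rerun = plan_comment(
    [{"id": 1, "body": "LGTM"}, {"id": 2, "body": first["body"]}],
    "3 endpoints changed",
)
```

Keeping the decision separate from the API call makes it easy to test, and it means a re-run never adds a second report comment to the PR.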

Areas for Improvement (scores ≤ 3.8)

S2 — Deployment Monitoring (DevOps, 3.2/5)
The workflow_run trigger is correct but the required actions:read permission is easy to miss. The scenario implies multi-stage behavior (wait for deployment to stabilize) that the single-job model cannot support — the agent should explicitly call this out.

S6 — Terraform Drift (DevOps, 3.8/5)
The framework mechanics are sound, but the agent may generate a syntactically valid workflow that silently fails at runtime because:

Terraform binary isn't pre-installed
Cloud credentials require manual secret configuration
Cloud API hostnames must be explicitly allowlisted (non-trivial for AWS/GCP/Azure)
Large workspaces risk the default 20-minute timeout

The agent should proactively surface these blockers rather than generate a workflow that appears complete.
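
One way to surface these blockers early is a preflight step that fails loudly before any real work starts. The following is a minimal sketch under stated assumptions: the function name, its parameters, and the example inputs are illustrative. It covers the binary, credential, and timeout checks only. Network allowlisting is not checked here because it depends on the runner's firewall configuration.

```python
import os
import shutil


def preflight_blockers(required_binaries, required_env, timeout_minutes, expected_minutes):
    """Return a list of human-readable blockers; an empty list means none were found."""
    blockers = []

    # A binary such as terraform may not be pre-installed on the runner.
    for name in required_binaries:
        if shutil.which(name) is None:
            blockers.append(f"binary not found on PATH: {name}")

    # Cloud credentials usually reach the job as secrets mapped to environment variables.
    for var in required_env:
        if not os.environ.get(var):
            blockers.append(f"missing environment variable: {var}")

    # Large workspaces may not finish within the configured timeout.
    if expected_minutes >= timeout_minutes:
        blockers.append(
            f"expected run time ({expected_minutes} min) reaches the "
            f"{timeout_minutes}-minute timeout"
        )

    return blockers


if __name__ == "__main__":
    # Example inputs are assumptions; 20 minutes is the default timeout cited above.
    found = preflight_blockers(["terraform"], ["AWS_ACCESS_KEY_ID"], 20, 30)
    for blocker in found:
        print("BLOCKER:", blocker)
```

Reporting every blocker at once, rather than stopping at the first, lets the user fix the whole setup in a single pass.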

Recommendations

Add a "complex ops checklist" to .github/aw/create-agentic-workflow.md — when the agent detects keywords like "cloud", "terraform", "kubernetes", or "deploy", it should automatically prompt the user for: cloud provider, credentials strategy, required binaries, and expected run time. This prevents silent workflow failures on DevOps scenarios.
Expand .github/aw/create-agentic-workflow.md with workflow_run trigger guidance: document the actions:read permission requirement and the single-job constraint (no waiting for deployment stabilization). S2 and similar monitoring scenarios are blocked by this knowledge gap.
Add update-comment pattern to the PR workflow template in .github/aw/create-agentic-workflow.md — the "check for existing comment before posting" pattern (to avoid spam on synchronize events) came up in multiple scenarios (S1, S5, S7) and should be a first-class recommendation for any PR-triggered workflow that posts comments.
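
The keyword-triggered "complex ops checklist" from the first recommendation could work roughly as sketched below. The keyword list and the four follow-up questions come from that recommendation. The function name, the word-prefix matching rule, and the example requests are assumptions made for illustration, and a real implementation would live in the prompt file, not in Python.

```python
import re

# Keywords and follow-up questions taken from the first recommendation above.
OPS_KEYWORDS = ("cloud", "terraform", "kubernetes", "deploy")
CHECKLIST = (
    "cloud provider",
    "credentials strategy",
    "required binaries",
    "expected run time",
)


def matched_keywords(request: str) -> list[str]:
    """Return the ops keywords that start a word in the request (case-insensitive)."""
    text = request.lower()
    # \b before the keyword lets "deploy" also match "deployment" or "deploys".
    return [k for k in OPS_KEYWORDS if re.search(rf"\b{re.escape(k)}", text)]


def ops_checklist(request: str) -> list[str]:
    """Return the questions to ask the user, or an empty list for non-ops requests."""
    return list(CHECKLIST) if matched_keywords(request) else []


# A DevOps-style request triggers the checklist; a PR-analysis request does not.
drift_questions = ops_checklist("Detect Terraform drift every night")
coverage_questions = ops_checklist("Comment on test coverage changes in each PR")
```

Prefix matching is deliberately loose. For this use case, asking a few extra questions costs less than generating a workflow that fails silently.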

Runs: §25995345892 | Previous: 2026-05-16, 2026-05-15

Generated by 🎭 Agent Persona Explorer · ● 10.7M

2026-05-20T16:45:25Z

github-actions[bot]
Bot May 20, 2026
Author

This discussion has been marked as outdated by Agent Persona Explorer.

A newer discussion is available at Discussion #33589.

