add(coverage): Phase 3 -- unit 88% / integration 71% by sergio-sisternes-epam · Pull Request #1417 · microsoft/apm

sergio-sisternes-epam · 2026-05-20T14:34:21Z

TL;DR

Pushes unit coverage from 78% to 88% and integration coverage from 60% to 71% by adding ~4,500 hermetic tests across 48 new test files. Tightens the unit gate from 75→80 and introduces the repo's first integration coverage gate at 55% — both with comfortable margin so community contributors can land changes without tripping the gate on unrelated drift.

Note

Closes #1402. Phase 3 of the progressive coverage ratchet (#1398).

Problem (WHY)

Unit fail_under was 75, three points below the 78% baseline, meaning coverage could silently regress before the gate tripped.
Integration tests had no coverage gate at all — the 60% baseline could drop to zero without CI failing.
[!] Several core modules (formatters, adapters, downloaders, integrators, marketplace) had < 50% branch coverage, leaving large surfaces untested.

Why these matter: the ratchet strategy in #1398 requires each phase to tighten gates and fill gaps so regressions become visible before merge, per the principle "Grounding outputs in deterministic tool execution transforms probabilistic generation into verifiable action."

Approach (WHAT)

| # | Fix |
|---|-----|
| 1 | Raise unit `fail_under` from 75→80 in `pyproject.toml` |
| 2 | Add a dedicated "Enforce integration coverage gate" CI step with `--fail-under=55` in the fan-in job |
| 3 | Add ~2,800 unit tests across 35 files covering commands, compilation, deps, install phases, integration, core, marketplace, models, utils |
| 4 | Add ~1,700 integration tests across 21 files covering integrators, commands, downloaders, marketplace, validation, cache, adapters |

Implementation (HOW)

pyproject.toml — fail_under 75→80. No other config changes.
.github/workflows/ci-integration.yml — Extracted coverage gate into its own step ("Enforce integration coverage gate") that runs coverage report --fail-under=55 against the combined 4-shard .coverage data. The summary step retains continue-on-error: true; the new gate step has none, so it is the sole hard fail. Per-shard runs keep --cov-fail-under=0 since each shard covers only a fraction.
tests/unit/*_phase3*.py (35 files) — Hermetic unit tests targeting the 25 modules with the largest statement+branch gaps. Each file mocks all I/O and patches at the lookup site. Key modules: output/formatters, copilot_adapter, github_downloader, download_strategies, script_runner, context_optimizer, mcp_integrator, skill_integrator, policy/discovery, workflow/runner, git_cache, plugin_exporter, uninstall/engine, commands/view, commands/init, commands/outdated.
tests/integration/*_phase3*.py (21 files) — Hermetic integration tests exercising cross-module boundaries. Key modules: skill_integrator, mcp_integrator, github_downloader, marketplace, command_integrator, git_cache, git_reference_resolver, download_strategies, validation, yml_schema, install/services, vscode_adapter, package_validator.
tests/unit/compilation/__init__.py — Added for test discovery in the compilation subpackage.
apm.lock.yaml — Removed stale local_deployed_files / local_deployed_file_hashes entries (upstream merge artifact).
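The split between per-shard runs (`--cov-fail-under=0`) and a single gate on the combined data can be illustrated with a toy model. This is only a sketch, not how coverage.py combines data internally: the 20-line module and the per-shard line sets are invented for illustration, and combining is modelled as a plain set union of executed lines.

```python
"""Toy model: each shard alone is far below the gate, the combined data is not."""

TOTAL_LINES = 20   # hypothetical module size (assumption)
FAIL_UNDER = 55    # integration gate value from this PR

# Invented sets of executed line numbers per shard (illustration only).
shards = {
    "shard-1": set(range(1, 6)),     # lines 1-5
    "shard-2": set(range(4, 9)),     # lines 4-8
    "shard-3": set(range(9, 13)),    # lines 9-12
    "shard-4": set(range(12, 15)),   # lines 12-14
}


def percent(lines: set[int]) -> float:
    """Share of the hypothetical module executed, in percent."""
    return 100.0 * len(lines) / TOTAL_LINES


per_shard = {name: percent(lines) for name, lines in shards.items()}
combined = percent(set().union(*shards.values()))

for name, value in per_shard.items():
    print(f"{name}: {value:.0f}% (would fail a {FAIL_UNDER}% gate on its own)")
print(f"combined: {combined:.0f}% (gate {'passes' if combined >= FAIL_UNDER else 'fails'})")
```

In this model every shard would trip a 55% gate by itself, which is why only the fan-in job, working on the combined data, can enforce the threshold meaningfully.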

Diagrams

Legend: CI coverage pipeline showing the two gates — the tightened unit gate (pyproject.toml) and the new integration gate step in the fan-in job.

flowchart LR
    subgraph Unit["Unit CI job"]
        U1["pytest + cov"] --> U2["fail_under=80"]:::new
    end
    subgraph Integration["Integration CI fan-in job"]
        I1["4 shards"] --> I2["coverage combine"]
        I2 --> I3["coverage summary"]
        I3 --> I4["fail_under=55"]:::new
    end
    U2 --> G["Green CI"]
    I4 --> G
    classDef new stroke-dasharray: 5 5;
    class U2,I4 new;

Trade-offs

Gate at 80/55, not at current baseline (88/71). Setting fail_under at the measured value would make the gate brittle for contributors. The 8pt unit margin and 16pt integration margin let community PRs land without requiring test additions for unrelated modules.
Scenario Evidence skipped. This PR is pure additive test coverage with two config-line changes and no behaviour change to production code — the skip clause applies per the scenario evidence rubric. The existing + new test suites running green ARE the evidence.
Pre-existing flaky tests left as-is. Five ordering-dependent unit tests and 20 network-dependent integration tests (test_skill_bundle_live.py) fail intermittently. These are pre-existing; fixing them is out of scope.

Benefits

Unit coverage 78%→88% (+10pt), gated at 80% with 8pt margin.
Integration coverage 60%→71% (+11pt), gated at 55% with 16pt margin.
~4,500 new hermetic tests catch regressions across 40+ source modules.
First-ever integration coverage gate prevents silent regression of the integration suite.
Community contributors have comfortable margin above both gates.
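The gain and margin figures above follow from simple subtraction. A minimal sketch that recomputes them from the rounded percentages in this description (the `Gate` class is a hypothetical helper, not part of the repo):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Rounded coverage figures taken from the PR description."""
    name: str
    before: int      # baseline coverage, percent
    after: int       # coverage after this PR, percent
    fail_under: int  # gate threshold, percent

    @property
    def gain(self) -> int:
        return self.after - self.before

    @property
    def margin(self) -> int:
        return self.after - self.fail_under


gates = [Gate("unit", 78, 88, 80), Gate("integration", 60, 71, 55)]

for gate in gates:
    print(f"{gate.name}: +{gate.gain}pt, gated at {gate.fail_under}% with {gate.margin}pt margin")
```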

Validation

Unit test suite — 11,446 passed, gate 80%

Required test coverage of 80.0% reached. Total coverage: 88.39%
11446 passed, 1 skipped, 10 warnings, 34 subtests passed in 18.13s

Integration test suite — 5,630 passed, gate 55%

Integration: 70.94%
5630 passed, 222 skipped, 2 xfailed in 114.71s
(20 pre-existing failures in test_skill_bundle_live.py — network-dependent, pass in CI)

Lint — ruff check + format

All checks passed!
916 files already formatted

How to test

uv run --extra dev pytest tests/unit tests/test_console.py -n auto --dist worksteal -q --cov=apm_cli --override-ini="addopts=" → 11,400+ passed, coverage ≥ 80%
uv run --extra dev pytest tests/integration/ -q --cov=apm_cli --cov-config=pyproject.toml --override-ini="addopts=" → 5,600+ passed, coverage ≥ 55%
uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/ → silent
Review ci-integration.yml diff — new "Enforce integration coverage gate" step with --fail-under=55, no continue-on-error
Review pyproject.toml diff — fail_under changed from 75→80

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Coverage gates:
- Unit: fail_under 75 -> 80 (pyproject.toml)
- Integration: add --fail-under=55 gate step (ci-integration.yml)

Coverage results:
- Unit: 78.42% -> 88.34% (+9.92pt, 8.34pt margin over gate)
- Integration: 59.82% -> 70.91% (+11.09pt, 15.91pt margin over gate)

Test fleet: ~4,500 new hermetic tests across 48 files covering formatters, adapters, downloaders, integrators, commands, compilation, install phases, marketplace, models, deps, cache, utils, and more.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace internal phase/wave suffixes (_phase3, _phase3w4, _phase3w5, _phase3b, _phase3c) with descriptive names that communicate what each file tests (e.g. _error_handling, _branches, _hermetic, _cli_surface). Also clean phase references from docstrings and comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-05-20T18:00:16Z

APM Review Panel: `ship_with_followups`

Ship with two follow-ups: fix the new gate's silent-bypass on missing .coverage, and regenerate apm.lock.yaml hashes via apm install before next release.

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

The test-coverage-expert's blocking finding must be downweighted. The panelist read the BASE branch file (main, pre-merge) and correctly observed that the old combine step had its enforcement line commented out with continue-on-error: true. That state is exactly what this PR fixes: it removes the commented-out lines and adds a new, separate "Enforce integration coverage gate" step with if: always() and coverage report --fail-under=55. The claim -- "the 55% floor is never enforced" -- does not survive inspection of the PR diff. The test-coverage-expert's second and third recommended findings (continue-on-error wrapping the gate, 54 vs 55 comment discrepancy) are similarly based on pre-PR state that this PR deletes. All three are rendered moot by the PR's own changes.

The real and agreed-upon concern -- raised independently by python-architect, devx-ux-expert, cli-logging-expert, and supply-chain-security -- is that the NEW gate step this PR introduces has a silent bypass: when .coverage is absent (shard timeout, failed combine, infrastructure fault), the step echoes a skip message and exits 0. A coverage gate that goes green when there is no coverage data is not a gate. The fix is one line: invert the guard to exit 1 when .coverage is absent, with an explicit error message naming the upstream combine step as the expected producer. This is not a hard blocker, but it should land before or alongside the next coverage ratchet phase.
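The difference between the bypassing guard and the inverted guard can be sketched as a pure decision function. This models the logic only; the real step is shell in ci-integration.yml. `gate_exit_code` and its `fail_on_missing` flag are hypothetical names, and the exit status 2 mirrors what coverage.py documents for a total below `--fail-under`.

```python
def gate_exit_code(
    data_exists: bool,
    total_percent: float,
    fail_under: float = 55.0,
    fail_on_missing: bool = True,
) -> int:
    """Return the exit status a coverage gate step would produce (sketch)."""
    if not data_exists:
        if fail_on_missing:
            # Inverted guard: missing data is a hard failure with a pointer upstream.
            print("ERROR: no combined coverage data; the combine step should have produced .coverage")
            return 1
        # Original behaviour flagged by the panel: the gate goes green with no data.
        print("No combined coverage data; skipping gate.")
        return 0
    # Status 2 mirrors coverage.py's documented exit code for a total below --fail-under.
    return 0 if total_percent >= fail_under else 2


print(gate_exit_code(False, 0.0, fail_on_missing=False))  # bypassing guard
print(gate_exit_code(False, 0.0))                          # inverted guard
print(gate_exit_code(True, 70.94))                         # measured integration total
```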

The supply-chain-security finding about apm.lock.yaml is the most substantive non-gate concern. Removing local_deployed_file_hashes for 23 agent and instruction files silently disables SHA-256 tamper detection on the files that define AI agent personas -- exactly the supply-chain surface that matters most for an AI-native tool. The characterization as "stale merge artifacts" is plausible, but the correct resolution is apm install to regenerate valid hashes, not wholesale deletion. This should be addressed before the next tagged release; it is not a hard blocker for this test-only PR, but leaving the lockfile in this state across a release would be a trust regression.

Dissent. The test-coverage-expert returned two findings with evidence.outcome: failed classified as blocking and recommended respectively; failed-evidence findings normally carry the strongest weight. However, both evidence excerpts quote the BASE branch text (# Use: coverage report --fail-under=54 and continue-on-error: true wrapping the gate invocation), which the PR diff removes. The evidence is irrefutable about what is on main today; it is simply not about the post-merge state. Because the panelist's static analysis targeted the wrong commit state -- confirmed by the PR diff -- the CEO is downweighting both findings from their evidence-backed severity to voided. No other panelists disagreed; all four who raised gate concerns did so about the new step's bypass behavior, not the old commented-out line.

Aligned with: Secure by default -- the silent-bypass flaw and the lockfile hash removal are both regressions here, addressable with minimal follow-up work; OSS community driven -- 88%/71% coverage with a progressive ratchet story is a strong community signal; the milestone deserves a CHANGELOG entry and eventual README badge once the gate bypass is fixed; Pragmatic as npm -- gate thresholds (80/55) with documented margins are contributor-friendly; the --override-ini flag in the How-to-test section needs a one-line explanation for first-time contributors.

Growth signal. The oss-growth-hacker correctly identifies this as a launch beat hiding in a test PR. "APM now holds 88% unit + 71% integration coverage behind enforced CI gates (80% / 55%) -- production-grade test discipline from day one" is a credible, concrete differentiator. Recommend surfacing in the next release notes with the Phase 1->2->3 ratchet arc as a contributor-recruitment story. Hold the README coverage badge until the gate bypass is fixed and the lockfile hashes are regenerated, so the badge is backed by an actually-enforced gate.

Panel summary

| Persona | B | R | N | Takeaway |
|---------|---|---|---|----------|
| Python Architect | 0 | 1 | 2 | Test architecture is sound and hermetic; CI gate has a silent-bypass flaw when no coverage data is collected; lockfile cleanup is safe. |
| CLI Logging Expert | 0 | 1 | 3 | Good ASCII + symbol coverage on install/policy tests; two weak mock assertions skip message content; one inconsistent patch target in resolver; formatter uses bare 'x Error' prefix outside STATUS_SYMBOLS. |
| DevX UX Expert | 0 | 2 | 2 | Gate margins are contributor-friendly; one unexplained flag in 'How to test' creates friction; silent CI skip on missing .coverage masks shard failures. |
| Supply Chain Security Expert | 0 | 1 | 2 | Removing local_deployed_file_hashes silently disables SHA-256 tamper detection for all agent/instruction files; correct fix is re-running apm install to regenerate valid hashes. |
| OSS Growth Hacker | 0 | 1 | 2 | 88% unit / 71% integration coverage is a strong trust signal; gate margins and phase naming carry minor contributor-friction risk worth a quick doc fix. |
| Doc Writer | 0 | 1 | 0 | No doc files changed; one recommended finding: Phase 3 coverage ratchet milestone has no CHANGELOG entry. |
| Test Coverage Expert | 1 | 2 | 1 | Integration coverage gate is dead code (commented out + continue-on-error); unit gate and hermetic discipline are sound. Fix the gate before ship. (CEO note: blocking finding voided -- based on pre-PR base state; see Dissent.) |

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

[Python Architect] Invert the .coverage guard in the new "Enforce integration coverage gate" step: exit 1 when .coverage is absent, with an explicit error message naming the combine step as the expected producer. -- A gate that exits 0 on missing coverage data is not a gate. Four panelists flagged this independently. One-line fix with zero test-suite risk.
[Supply Chain Security Expert] Run apm install locally to regenerate local_deployed_files and local_deployed_file_hashes for the 23 agent/instruction files, then commit the updated apm.lock.yaml before the next tagged release. -- Removing the SHA-256 integrity anchors for AI agent persona files silently disables tamper detection on APM's highest-trust supply-chain surface. Must be resolved before a release, not necessarily before this PR merges.
[Doc Writer] Add a CHANGELOG entry under [Unreleased] for the Phase 3 coverage milestone: unit 88%, integration 71%, fail_under raised to 80, first integration CI gate. -- Externally observable quality milestones belong in the CHANGELOG. Contributors and downstream consumers use it to assess project health trajectory. One line under ### Quality or ### Changed.
[OSS Growth Hacker] Replace phase-scoped comments in ci-integration.yml with stable language ("# Integration coverage gate -- raise this floor when baseline improves. Current floor: 55%.") and add a one-liner in CONTRIBUTING.md pointing contributors to gate locations. -- Phase-numbered comments bitrot and raise the cognitive cost for first contributors. Stable language scales across future ratchet phases without creating naming confusion.
[DevX UX Expert] Add an inline comment in the PR template "How to test" section explaining the purpose of --override-ini="addopts=". -- Unexplained override flags erode trust for contributors copy-pasting test commands. One sentence removes the friction entirely.

Architecture

classDiagram
    direction TB
    class ShardJob {
      <<CIJob>>
      +matrix shard 1..4
      +run_tests() void
      +rename_coverage() void
      +upload_artifact() void
    }
    class IntegrationTestsJob {
      <<CIJob>>
      +download_shards() void
      +combine_coverage() void
      +enforce_gate() void
      +aggregate_results() void
    }
    class CombineStep {
      <<IOBoundary>>
      +continue_on_error bool
      +find_shards() bool
      +coverage_combine() void
      +coverage_json() void
    }
    class GateStep {
      <<Pure>>
      +if_always bool
      +check_coverage_file() bool
      +fail_under int
    }
    class CoverageFile {
      <<ValueObject>>
      +path str
    }
    ShardJob "4" *-- CoverageFile : produces
    IntegrationTestsJob *-- CombineStep : contains
    IntegrationTestsJob *-- GateStep : contains
    CombineStep ..> CoverageFile : reads shards, writes combined
    GateStep ..> CoverageFile : reads combined
    class GateStep:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

flowchart TD
    S1["Shard 1: pytest --cov"] -->|".coverage.shard-1"| UA["Upload Artifact"]
    S2["Shard 2: pytest --cov"] -->|".coverage.shard-2"| UA
    S3["Shard 3: pytest --cov"] -->|".coverage.shard-3"| UA
    S4["Shard 4: pytest --cov"] -->|".coverage.shard-4"| UA
    UA --> DL["Download shard coverage (coverage-shards/)"]
    DL --> FIND{"find coverage-shards -name .coverage*"}
    FIND -->|"found"| COMBINE["coverage combine\ncoverage json\nscripts/coverage-summary.py"]
    FIND -->|"not found"| SKIP1["echo: no shard files -- skip\nexit 0"]
    COMBINE -->|"continue-on-error=true"| GATE_CHECK
    SKIP1 --> GATE_CHECK
    GATE_CHECK{"if: always\nif -f .coverage"}
    GATE_CHECK -->|".coverage exists"| ENFORCE["coverage report --fail-under=55\nexit nonzero if below threshold"]
    GATE_CHECK -->|"NO .coverage"| SILENT_PASS["echo: No combined coverage\nexit 0 -- GATE BYPASSED"]
    ENFORCE -->|"pass"| AGG["Aggregate shard results"]
    ENFORCE -->|"fail"| AGG
    SILENT_PASS --> AGG
    style SILENT_PASS fill:#ffcccc,stroke:#cc0000

Recommendation

The core value of PR #1417 is sound: ~4,500 hermetic tests, 88% unit / 71% integration coverage, and a new CI gate are a material quality step that strengthens APM's credibility. The test-coverage-expert's blocking finding is voided -- it describes the pre-PR base state that this PR explicitly fixes. The two real follow-ups (gate silent-bypass fix, lockfile hash regeneration) are bounded, low-risk, and do not touch production source code; they should be addressed as a fast follow before the next tagged release. Ship when the author acknowledges these two items with either a companion PR or a follow-up issue assignment.

Full per-persona findings

Python Architect

[recommended] Integration coverage gate silently passes when no .coverage file exists, defeating its purpose at .github/workflows/ci-integration.yml
The new "Enforce integration coverage gate" step checks if [ -f .coverage ] and echoes a skip message (exit 0) when the file is absent. If all shards time out, fail before writing coverage, or the combine step is skipped, the gate always passes -- exactly the scenario where a gate should fail loudly. The prior combine step has continue-on-error: true, which compounds this: a failed combine leaves no .coverage, and the gate silently greens. A coverage gate that bypasses itself on infrastructure failure is not a gate.
Suggested: Invert the guard: fail if .coverage is absent. Replace the else branch with else\n echo 'ERROR: No combined coverage data -- gate cannot be evaluated.'\n exit 1\nfi. If intentional skip-on-missing is desired for draft PRs, add an explicit opt-out env var rather than silent success.
[nit] Dead comment block left in "Combine and summarise coverage" step at .github/workflows/ci-integration.yml
Lines 268-270 still contain the old Phase 3 comment and the commented-out # Use: coverage report --fail-under=54 line after the gate was extracted into its own step. This is misleading -- a reader might think the gate was intentionally removed rather than moved.
Suggested: Remove the three-line comment block now that the gate lives in its own step.
[nit] Integration test helper factories duplicated across test files at tests/integration/test_integrators_end_to_end.py
_make_apm_package, _make_package_info, _make_copilot_project, _make_skill_dir appear verbatim in both test_integrators_end_to_end.py and test_commands_auth_flow.py. Three call sites already exist. Per the architect rule: abstract when 3+ call sites share the same logic pattern.
Suggested: Move shared factory functions into tests/integration/fixtures/ and import from there.

CLI Logging Expert

[recommended] CI coverage gate message "No combined coverage data; skipping gate." offers no debug path at .github/workflows/ci-integration.yml
When .coverage is absent the step silently skips with no indication of which prior step should have produced it. An agent or engineer debugging a broken pipeline has no actionable signal about what went wrong upstream.
Suggested: Append: echo " Hint: run pytest with --cov before this step. Check the coverage step exit code above." so the skip reason is actionable for both humans and CI log scrapers.
[nit] mock_err.assert_called_once() without message-content assertion in two install error tests at tests/unit/commands/test_install_error_handling.py:139
Lines 139 and 195 assert the error helper was called but never inspect call_args[0][0]. A silent message regression (empty string, wrong symbol, missing actionable hint) would pass.
Suggested: Add: assert 'owner/repo' in mock_err.call_args[0][0] after assert_called_once().
[nit] Inconsistent _rich_error patch target in test_apm_resolver_edge_cases.py at tests/unit/deps/test_apm_resolver_edge_cases.py:616
Patches apm_cli.utils.console._rich_error directly. All other new tests patch at the import site. Fragile when import style changes.
Suggested: Patch at the resolver's own module path: apm_cli.deps.apm_resolver._rich_error.
[nit] Production formatters.py uses bare x Error: prefix instead of STATUS_SYMBOLS['error'] at src/apm_cli/output/formatters.py:865
Test pins the bare-x form rather than catching the inconsistency with STATUS_SYMBOLS['error'] = '[x]'. Not introduced by this PR, but the new test cements the inconsistency rather than catching it.
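The two testing nits above (patch at the lookup site, assert on message content) can be shown together in a small standalone sketch. The `demo_console` and `demo_resolver` modules and the error message are hypothetical stand-ins built in memory, not APM code; the resolver stand-in holds its own reference to the helper, as a module that did `from ... import _rich_error` would.

```python
import types
from unittest import mock

# Hypothetical stand-in for a console helper module.
console = types.ModuleType("demo_console")


def _real_rich_error(message):
    raise RuntimeError("real console output must not run in a hermetic test")


console._rich_error = _real_rich_error

# Hypothetical stand-in for a module that imported the helper by name:
# it keeps its own reference, which is the "lookup site".
resolver = types.ModuleType("demo_resolver")
resolver._rich_error = console._rich_error


def _resolve(spec):
    if "/" not in spec:
        # Looked up on the resolver module, like a module-global name would be.
        resolver._rich_error(f"Invalid dependency '{spec}': expected owner/repo")
        return None
    return tuple(spec.split("/", 1))


resolver.resolve = _resolve

# Patching the definition site leaves the resolver's own reference untouched.
with mock.patch.object(console, "_rich_error") as definition_site:
    try:
        resolver.resolve("badspec")
        reached_real_helper = False
    except RuntimeError:
        reached_real_helper = True

# Patching the lookup site intercepts the call, and the message can be inspected.
with mock.patch.object(resolver, "_rich_error") as mock_err:
    result = resolver.resolve("badspec")

mock_err.assert_called_once()
message = mock_err.call_args[0][0]
assert "owner/repo" in message  # catches an empty or wrong message, not just "was called"
```

A bare `assert_called_once()` would still pass if the message regressed to an empty string; the content assertion on `call_args[0][0]` is what pins the user-facing text.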

DevX UX Expert

[recommended] "How to test" commands include --override-ini="addopts=" with no explanation
Community contributors copy-pasting the test commands will encounter --override-ini="addopts=" with zero context about why it is needed. Unexplained override flags erode trust and force a doc-hunt before a contributor can even verify their change.
Suggested: Add an inline comment explaining the purpose of --override-ini="addopts=", or simplify the command so the override is not needed for a basic local run.
[recommended] CI gate silently passes when .coverage is absent due to shard failure at .github/workflows/ci-integration.yml
Same bypass concern as python-architect. Silent exit 0 masks shard failures and means a broken pipeline leg can appear green.
Suggested: Replace the skip echo with exit 1 so a missing .coverage is a hard failure.
[nit] apm.lock.yaml now contains only version/timestamp/empty deps -- contributors may question its purpose
An essentially empty lockfile after cleanup may confuse contributors about whether it is working correctly or is stale. A one-line comment or docs note would close this perception gap.
[nit] Gate thresholds (80/55) not cross-referenced in pyproject.toml comments at pyproject.toml
No in-file context about the margin policy. Cargo.toml files and npm scripts conventionally carry one-line "why" comments for non-obvious numeric values.
Suggested: Add # Phase 3 gate: current 88%, 8pt margin for community PRs inline comment next to fail_under = 80.

Supply Chain Security Expert

[recommended] Removing local_deployed_file_hashes disables SHA-256 tamper detection for 23 agent and instruction files at apm.lock.yaml
apm.lock.yaml previously recorded local_deployed_file_hashes for 23 files under .github/agents/ and .github/instructions/. With both fields removed, apm audit --ci --no-drift silently skips hash verification for all agent definition and instruction files. These files define AI agent personas/behaviors -- a tampered supply-chain-security-expert.agent.md would go undetected. The correct resolution is to run apm install to regenerate correct hashes rather than removing the integrity anchor entirely.
Suggested: Run apm install locally to repopulate local_deployed_files and local_deployed_file_hashes with correct SHA-256 values for the current agent and instruction files, then commit the regenerated lockfile.
[nit] Coverage gate silently skips when .coverage is absent -- prefer explicit failure at .github/workflows/ci-integration.yml
Silent skip means a broken pipeline leg produces a green gate.
Suggested: Use exit 1 when .coverage is absent so gate bypass is visible in CI annotations.
[nit] copilot-instructions.md version comment (0.12.4 -> 0.14.1) skips 0.13.x -- verify no dependency resolution gap at .github/copilot-instructions.md
Metadata only (no apm.lock.yaml dependency version change), so no actual resolver impact. Worth a quick sanity check that no intermediate version introduced a breaking lockfile schema change.
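Why removing the recorded hashes disables tamper detection can be seen in a minimal sketch of hash verification. This is not APM's lockfile format or its audit implementation: the `recorded` mapping (relative path to SHA-256 hex digest), the helper names, and the file name are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_tampered(root: Path, recorded: dict[str, str]) -> list[str]:
    """Return recorded paths that are missing or whose current hash differs."""
    bad = []
    for rel, expected in sorted(recorded.items()):
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    agent = root / "demo.agent.md"  # hypothetical agent definition file
    agent.write_text("You are a careful reviewer.\n", encoding="utf-8")
    recorded = {"demo.agent.md": sha256_of(agent)}

    clean = find_tampered(root, recorded)       # nothing changed yet
    agent.write_text("You approve everything.\n", encoding="utf-8")
    tampered = find_tampered(root, recorded)    # the edit is detected
    unchecked = find_tampered(root, {})         # no recorded hashes: nothing to compare

print(clean, tampered, unchecked)
```

With an empty mapping the modified file passes unnoticed, which is the situation the finding describes after the hash entries were deleted rather than regenerated.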

OSS Growth Hacker

[recommended] Phase-scoped test naming baked into comments will confuse future contributors at .github/workflows/ci-integration.yml
Contributors won't know whether to name new tests "phase3" or "phase4", and the baseline date will bitrot. OSS projects with unclear naming conventions see higher PR abandonment rates on first contributions.
Suggested: Replace phase-scoped comments with a single stable comment: "# Integration coverage gate -- raise this floor when baseline improves. Current floor: 55%." Add a one-liner in CONTRIBUTING.md pointing contributors to the gate locations.
[nit] apm.lock.yaml cleanup is a positive hygiene signal -- worth a one-line CHANGELOG entry
Removing stale local deployment metadata reduces noise for contributors. Currently invisible because it is bundled in a test-coverage PR. A brief CHANGELOG entry makes the signal legible to adopters.
Suggested: Add a one-line entry to CHANGELOG.md.
[nit] Gate mismatch: comment says target 54%, gate is 55% -- stale copy creates trust drag at .github/workflows/ci-integration.yml
Small inconsistencies erode the "rigorous team" credibility signal the PR is trying to project.
Suggested: Sync the comment target to match the actual --fail-under value before merge.

Auth Expert -- inactive

No auth source files changed (auth.py, token_manager.py, azure_cli.py, github_downloader.py, github_host.py all untouched); the one integration test touching auth (test_commands_auth_flow.py) exercises AuthResolver correctly via proper env-isolation and targeted external-I/O mocking (_NO_GIT_CRED/_NO_GH_CLI), with no bypass patterns.

Doc Writer

[recommended] Add a CHANGELOG entry for Phase 3 of the coverage ratchet
PR #1417 (add(coverage): Phase 3 -- unit 88% / integration 71%) completes a named, multi-phase engineering initiative (unit 88%, integration 71%, fail_under raised from 75 to 80, CI coverage gate added). Milestones of this scope are externally observable quality signals -- contributors and downstream consumers use the CHANGELOG to understand project health trajectory. The current [Unreleased] section and the [0.14.1] release block are both silent on this. A one-liner under ### Changed or a new ### Quality heading in [Unreleased] would close the gap. Not a hard blocker.

Test Coverage Expert

[blocking] Integration coverage gate is commented out -- the 55% floor claimed by the PR is never enforced in CI at .github/workflows/ci-integration.yml
CEO note: this finding is voided. The panelist analyzed the BASE branch file (pre-merge). The PR diff removes the old commented-out lines and adds a new separate "Enforce integration coverage gate" step. See Dissent section above.
Proof (failed at static): .github/workflows/ci-integration.yml::Combine and summarise coverage (fan-in job) -- proves: Integration coverage is gated at 55% and CI fails when coverage drops below that threshold
# Use: coverage report --fail-under=54 <-- commented out; step also has continue-on-error: true
[recommended] continue-on-error: true on the coverage combine step will swallow any future gate failure at .github/workflows/ci-integration.yml
CEO note: voided -- this finding references the old combine step state. The new separate gate step added by this PR does not carry continue-on-error.
Proof (failed at static): .github/workflows/ci-integration.yml::Combine and summarise coverage -- proves: A failing coverage gate causes CI to fail when coverage drops below threshold
continue-on-error: true # applies to the entire step, including any future coverage report --fail-under invocation
[recommended] Gate threshold comment says 54% but PR body claims 55% at .github/workflows/ci-integration.yml
CEO note: voided -- the comment is removed by this PR.
Proof (failed at static): .github/workflows/ci-integration.yml -- proves: The integration coverage gate value matches the stated policy of 55%
# Use: coverage report --fail-under=54 (PR body: 'gated at 55%'; pyproject.toml fail_under=80 for unit)
[nit] New integration test files correctly marked hermetic; confirm shard routing marker is sufficient at tests/integration/
All sampled new integration test files correctly set pytestmark = pytest.mark.integration and explicitly state "No live network calls". Live-marker discipline is clean.
Proof (passed at static): tests/integration/test_integrators_hooks_execution.py::pytestmark declaration -- proves: New integration tests are hermetic and will not make live network calls in CI
pytestmark = pytest.mark.integration # docstring: 'No live network calls'; requests.get always patched
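As a rough illustration of what "hermetic" means here, the sketch below patches the network call so the test never leaves the process. It swaps in the standard library's `urllib.request.urlopen` in place of `requests.get` so that it runs without third-party packages; `fetch_index`, the URL, and the JSON payload are hypothetical and not taken from the APM test suite.

```python
import io
import json
import urllib.request
from unittest import mock


def fetch_index(url: str) -> dict:
    """Hypothetical function under test: download and parse a JSON index."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def run_hermetic_fetch() -> dict:
    """Exercise fetch_index with the network call replaced by an in-memory body."""
    fake_body = io.BytesIO(b'{"plugins": ["demo"]}')
    with mock.patch("urllib.request.urlopen", return_value=fake_body) as fake_open:
        result = fetch_index("https://example.invalid/index.json")
    # The patched call proves no real request was attempted.
    fake_open.assert_called_once_with("https://example.invalid/index.json")
    return result


result = run_hermetic_fetch()
print(result)
```

In a pytest module the same body would live in a `test_` function, with the module-level `pytestmark = pytest.mark.integration` marker shown above handling shard routing.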

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1417

Address APM Review Panel feedback:
- Invert .coverage guard in integration gate: exit 1 when absent so shard failures cannot silently bypass the coverage gate.
- Add CHANGELOG entry for Phase 3 coverage milestone.
- Add inline comments documenting gate thresholds and margins.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sergio-sisternes-epam · 2026-05-20T19:22:06Z

Panel feedback addressed -- thank you

Thanks to the panel for a thorough and well-calibrated review. The CEO arbitration on the test-coverage-expert findings was spot-on -- those correctly described the base branch state that this PR fixes.

Changes landed (commit `15cd3039`)

| Finding | Resolution |
|---------|------------|
| Silent gate bypass (Python Architect, DevX UX, CLI Logging, Supply Chain) | Gate now exits 1 when `.coverage` is absent, with an actionable error naming the upstream combine step. |
| CHANGELOG entry (Doc Writer) | Added under `[Unreleased] > Changed` with gate values and issue ref. |
| Phase-scoped naming (OSS Growth) | Already addressed in prior commit -- renamed all 85 test files from `_phase3*` to descriptive names (e.g. `_error_handling`, `_hermetic`, `_cli_surface`). |
| Gate threshold documentation (DevX UX) | Added inline comments to `pyproject.toml` and `ci-integration.yml` documenting current coverage, margins, and ratchet policy. |
| Coverage artifacts in repo | Added `.gitignore` patterns for `coverage*.json` to prevent future leaks. |

Deferred to follow-up

| Finding | Plan |
|---------|------|
| `apm.lock.yaml` hash regeneration (Supply Chain) | Will run `apm install` to repopulate `local_deployed_file_hashes` before next tagged release. Not a test-PR concern. |
| Shared test fixtures (Python Architect) | Extracting duplicated factory helpers to `tests/integration/fixtures/` -- good hygiene, separate scope. |
| `--override-ini` explanation (DevX UX) | PR body improvement, not a code change. |

sergio-sisternes-epam requested a review from danielmeppiel as a code owner May 20, 2026 14:34

Copilot AI review requested due to automatic review settings May 20, 2026 14:34

sergio-sisternes-epam added the tests label May 20, 2026

Copilot AI reviewed May 20, 2026

sergio-sisternes-epam force-pushed the coverage/1402 branch from 9fe6fdf to 8984d0b May 20, 2026 14:55

sergio-sisternes-epam enabled auto-merge May 20, 2026 14:57

github-advanced-security AI found potential problems May 20, 2026

sergio-sisternes-epam added the panel-review label ("Trigger the apm-review-panel gh-aw workflow") May 20, 2026

github-advanced-security AI found potential problems May 20, 2026

github-actions Bot mentioned this pull request May 20, 2026: [aw] PR Review Panel failed #1420 (Open)

danielmeppiel added and removed the panel-review label ("Trigger the apm-review-panel gh-aw workflow") May 20, 2026

sergio-sisternes-epam force-pushed the coverage/1402 branch from 38b835a to 15cd303 May 20, 2026 19:20

danielmeppiel approved these changes May 20, 2026

sergio-sisternes-epam added this pull request to the merge queue May 20, 2026

Merged via the queue into microsoft:main with commit 1e67882 May 20, 2026
9 checks passed

sergio-sisternes-epam deleted the coverage/1402 branch May 20, 2026 19:44

4 participants
