Test Coverage Automation: Achieve 100% in 2 Weeks

The Bottom Line

We stopped asking AI to write code. We asked it to run a pipeline. That one decision took a regulated medical device software team from 25% to 100% automated test coverage in two weeks — with the entire team contributing, not just QA engineers.

01 Where We Started — and Why Coverage Stalled

Two weeks ago our project sat at 25% automated test coverage. Not because we lacked requirements — we had a full backlog. Not because the QA team wasn’t working hard. Because automation was siloed.

QA automation on a regulated medical device platform that interprets genomic variants for clinical labs carries real weight. Every automated test needs a traceable link from requirement to test evidence, a peer-reviewed pull request before merge, human sign-off before code lands in main, consistent patterns across 30+ contributor touchpoints, and Jira tickets in the correct state at every handoff.

Only QA engineers who knew the Playwright framework, understood the page object pattern, and had memorised which step definitions already existed could contribute. Developers who built the features and business analysts who wrote the acceptance criteria sat on the sidelines. The bottleneck wasn’t effort — it was access.

We tried AI code generation. It helped. But copy-pasting from a chat window into an IDE, debugging alone, and manually filing Jira tickets brought the gains back down to earth fast.

Coverage stalled at 25% not because we lacked requirements — but because we lacked bandwidth from the narrow group of people who could write the tests.

02 The Shift — Stop Generating Code, Start Orchestrating Agents

The real unlock came when we stopped treating AI as a faster typist and started treating it as an orchestrator. We built an AI agent pipeline where Claude agents don’t just write code — they hand off to each other, gate themselves at quality checkpoints, self-heal when tests fail, and close the Jira loop end-to-end.

The key design principle: the pipeline requires zero knowledge of the test framework to use it. The person feeding it a Jira ticket number needs to know the feature. The agents handle everything else — the right patterns, the right selectors, the right file structure, the framework conventions accumulated across every previous PR.

The pipeline has 23 orchestrated agents across 5 phases. Here is how each one works.

Figure: AI Agent Orchestration Pipeline (All 23 Agents)

Jira Requirement → Automated Tests → RTM → Merged PR · Fully Agentic · Self-Improving · Zero Framework Knowledge Required

Phase 1 · Intake · Analysis · Discovery
Jira PR Input (fetch ticket, AC, context) → Create Task (assign, link to requirement) → Transition: In Progress (board reflects reality) → Analyze & Clarify (reusable steps, edge cases) → App Review (live DOM, toasts, modals) → Human Gate 1: Req Update Approval (app vs spec gap sign-off) → DOM Gathering (exact selectors, hidden inputs)

Phase 2 · Spec · Code Generation · Static Analysis
Create Feature Branch (correct base, naming convention) → Draft Feature File (Cucumber BDD, tags) → Human Gate 2: Feature File Approval (GO / NO-GO · NO-GO loops back to redraft) → Generate Test Code (page objects, step definitions) → Static Analysis (tsc, pattern BLOCKERs)

Phase 3 · Test Execution · Self-Heal
Run Generated Tests (headless Playwright, 4 parallel workers, CI parity) → Self-Heal (retry loop until 100% · 100% pass gate is non-negotiable)

Phase 4 · Delivery · Review · RTM · Merge · Learn
Create Test Tickets (1 ticket per scenario, requirement link = RTM) → Commit · Push · PR (test files only) → Transition: Ready for Review (board reflects real state) → Automated PR Review (framework standards) → Human Gate 3: Human Final Review (GO / NO-GO before merge) → Auto-Merge to Master (CI pass, squash, delete branch) → Transition: Done (task closed, fully traceable) → Pattern Tracker (appends to the Known Issues Log)

Known Issues Log: Cross-PR Pattern Memory
Read by: Generate Test Code (avoid patterns) · Static Analysis (escalated = BLOCKER) · Self-Heal (known-bad selectors)
Written by: Pattern Tracker after every merge · 2+ occurrences → ESCALATED · 3+ occurrences → AUTO-BLOCKER · Pipeline improves with each PR

Coverage in 2 weeks: 25% → 100% (entire team, zero framework knowledge) · 250+ requirements automated · 1,800+ test scenarios created · 1,800+ RTM tickets auto-created · 4× parallel workers (CI parity) · 0 lines of manual code written · 3 human gates per ticket

Key insight: Contributors only need domain knowledge. Framework patterns, selectors, and best practices are encoded in the pipeline and enforced automatically.

23 agents across 5 phases, from Jira ticket to merged PR with full RTM. Human gates at Requirement Approval, Feature File Approval, and Final Review. The Known Issues Log feeds back into every new run.

03 The Pipeline — Phase by Phase

Here is exactly what happens from the moment a Jira ticket number enters the pipeline to the moment a PR merges and the RTM updates.

Requirement Intake (3 Agents)

JIRA API · TRIGGER

A developer drops a Jira requirement ticket number into the pipeline. Three things happen automatically — zero clicks from the engineer.

Jira PR Input — fetches the full ticket: acceptance criteria, description, project context
Create Task from Requirement — checks if a linked task exists; if not, creates one, assigns it, and links it back to the requirement
Transition: In Progress — moves the Jira task so the whole team sees who is working on what. The board already reflects reality
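
The three intake steps above can be sketched as one function over a small Jira client. This is a minimal sketch under stated assumptions, not the team's implementation: the `JiraClient` interface, its method names, and `runIntake` are hypothetical, and a real integration would call Jira's REST API asynchronously. It is kept synchronous here for brevity.

```typescript
// Hypothetical, minimal view of the Jira operations the intake phase needs.
interface Requirement {
  key: string;
  description: string;
  acceptanceCriteria: string[];
}

interface JiraClient {
  fetchRequirement(key: string): Requirement;
  findLinkedTask(requirementKey: string): string | undefined;
  createTask(requirementKey: string, assignee: string): string; // returns the new task key
  transition(taskKey: string, status: "In Progress" | "Ready for Review" | "Done"): void;
}

interface IntakeResult {
  requirement: Requirement;
  taskKey: string;
  taskCreated: boolean;
}

function runIntake(jira: JiraClient, requirementKey: string, assignee: string): IntakeResult {
  // Step 1: fetch the full ticket (description, acceptance criteria).
  const requirement = jira.fetchRequirement(requirementKey);

  // Step 2: reuse a linked task if one exists, otherwise create and link one.
  const existing = jira.findLinkedTask(requirementKey);
  const taskKey = existing ?? jira.createTask(requirementKey, assignee);

  // Step 3: move the task so the board reflects who is working on what.
  jira.transition(taskKey, "In Progress");

  return { requirement, taskKey, taskCreated: existing === undefined };
}
```

Passing the client in as a parameter keeps the step testable with an in-memory fake, which matters when the same logic has to run hundreds of times without touching a live board.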

Analysis & Discovery (4 Agents)

BROWSER AGENT · ANALYSIS

This phase asks the hard questions before writing a single line of code — a discipline most human developers skip.

Analyze & Clarify — reads the requirement, searches every existing step definition for reusable steps, identifies edge cases, asks clarifying questions. Won’t proceed until ambiguities are resolved

App Review — navigates the live application with a Playwright browser agent and documents every DOM element: selectors, toast notifications, modal IDs, button states

Req Update Approval (Human Gate 1) — presents agent findings for team sign-off before any code is written. If the live app diverges from the requirement, the gap is caught here, not in a failing CI run at midnight

DOM Gathering — captures precise locators: hidden elements, multiselect plugins, Bootstrap modal IDs, TinyMCE editor instances. Not guesses. Actuals.
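
To make the output of the discovery steps concrete, here is a rough sketch of what one captured element record and its derived locator might look like. Everything in it is an assumption for illustration: the `CapturedElement` shape, the `toLocatorEntry` helper, and the preference order (id, then name, then a recorded CSS path) are hypothetical and not taken from the pipeline itself.

```typescript
// Hypothetical shape of one element recorded by a DOM-gathering pass.
interface CapturedElement {
  label: string;       // human-readable name, e.g. "Confirm modal"
  tag: string;         // e.g. "button", "select", "div"
  id?: string;         // assumed to be a valid CSS identifier when present
  name?: string;
  cssFallback: string; // CSS path recorded from the live page
  visible: boolean;    // false for hidden inputs or plugin-wrapped <select>s
}

interface LocatorEntry {
  label: string;
  selector: string;
  needsVisibilityWorkaround: boolean;
}

// Prefer the most stable attribute available; fall back to the recorded CSS path.
function toLocatorEntry(el: CapturedElement): LocatorEntry {
  let selector: string;
  if (el.id) {
    selector = `#${el.id}`;
  } else if (el.name) {
    selector = `${el.tag}[name="${el.name}"]`;
  } else {
    selector = el.cssFallback;
  }
  // Hidden elements (for example a <select> wrapped by a multiselect plugin)
  // are flagged so later steps know a plain click or fill will not work.
  return { label: el.label, selector, needsVisibilityWorkaround: !el.visible };
}
```

Recording visibility next to the selector is the point of the sketch: a locator can be exact and still unusable if the element it targets is hidden behind a plugin wrapper.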

Specification & Code Generation (5 Agents)

BDD · CODE · STATIC ANALYSIS

Create Feature Branch — checks out a new branch following naming conventions, always from the correct base. Prevents merge conflicts that plagued early runs

Draft Feature File — writes the Cucumber BDD .feature file: scenarios, Given/When/Then steps, correct tags (@regression, feature tag, Jira ticket tag)

Feature File Approval (Human Gate 2) — team reviews and approves or requests changes. A NO-GO loops the agent back to redraft before a single line of TypeScript is generated

Generate Test Code — produces page object methods and step definitions. Reads the Known Issues Log before writing — every anti-pattern already discovered is avoided from the start

Static Analysis — runs tsc --noEmit and pattern checks. Escalated patterns are automatic BLOCKERs: raw locators in step files, unconditional waitForTimeout, unguarded networkidle. If any of them fires, the code is rejected
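
The pattern checks named in the Static Analysis step could be approximated with a line-based scanner like the one below. This is a sketch under assumptions, not the pipeline's actual checker: the rule names, the `.steps.ts` file-naming convention, and the regular expressions are hypothetical. It is also cruder than the rules in the text. It flags every `waitForTimeout` call rather than only unconditional ones, and it misses a `.catch()` guard placed on a following line.

```typescript
interface Finding {
  rule: string;
  file: string;
  line: number; // 1-based
  text: string;
}

interface Rule {
  name: string;
  appliesTo: (file: string) => boolean;
  matches: (line: string) => boolean;
}

// Hypothetical BLOCKER rules mirroring the three patterns named above.
const rules: Rule[] = [
  {
    name: "raw-locator-in-step-file",
    appliesTo: (file) => file.endsWith(".steps.ts"), // assumed naming convention
    matches: (line) => /\bpage\.(locator|\$\$?)\s*\(/.test(line),
  },
  {
    name: "wait-for-timeout",
    appliesTo: () => true,
    matches: (line) => /\.waitForTimeout\s*\(/.test(line),
  },
  {
    name: "unguarded-networkidle",
    appliesTo: () => true,
    matches: (line) => /networkidle/.test(line) && !/\.catch\s*\(/.test(line),
  },
];

// Scan one file's source text and return every rule violation found.
function scan(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of rules) {
      if (rule.appliesTo(file) && rule.matches(text)) {
        findings.push({ rule: rule.name, file, line: index + 1, text: text.trim() });
      }
    }
  });
  return findings;
}
```

A gate built on this would reject the change whenever `scan` returns a non-empty list. A production checker would more likely work on the TypeScript AST than on raw lines.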

Test Execution & Self-Healing (2 Agents)

EXECUTOR · SELF-HEAL

Tests run with 4 parallel workers matching CI configuration. Running with fewer workers masks race conditions that will fail in GitHub Actions. The configuration is intentional.

AGENT 1

Run Generated Tests

Executes all scenarios with npm run test:parallel:tag — CI parity from the start. 4 parallel workers, no shortcuts.

AGENT 2 · LOOP

Self-Heal Until 100% Pass

Reads the Playwright error trace, identifies root cause, cross-references the Known Issues Log, applies a targeted fix, re-runs, and loops. 100% pass gate is non-negotiable. No failing test reaches Phase 5. Hardest ticket: 11 iterations — the agent figured it out.
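
The control flow of the self-heal loop can be sketched as follows. This is an illustrative outline only: `selfHeal`, `ScenarioResult`, and the injected `runTests` and `applyFix` callbacks are hypothetical names, the diagnosis step is hidden inside `applyFix`, and the iteration cap is a parameter because the article does not state its value. A real runner would be asynchronous; the sketch is synchronous to keep it short.

```typescript
interface ScenarioResult {
  name: string;
  passed: boolean;
  errorTrace?: string; // the runner's error output for a failed scenario
}

interface HealOutcome {
  passed: boolean;             // true only when every scenario passes
  iterations: number;          // number of fix rounds applied
  remainingFailures: string[]; // scenario names still failing at exit
}

function selfHeal(
  runTests: () => ScenarioResult[],
  applyFix: (failure: ScenarioResult) => void,
  maxIterations: number
): HealOutcome {
  let results = runTests();
  let iterations = 0;

  // Loop until everything passes or the cap is reached.
  while (results.some((r) => !r.passed) && iterations < maxIterations) {
    for (const failure of results.filter((r) => !r.passed)) {
      applyFix(failure); // diagnose from the trace and patch the test code
    }
    iterations += 1;
    results = runTests();
  }

  const remainingFailures = results.filter((r) => !r.passed).map((r) => r.name);
  return { passed: remainingFailures.length === 0, iterations, remainingFailures };
}
```

The 100% pass gate then reduces to a single check on `outcome.passed` before anything moves to delivery. A run that hits the cap with failures left is a candidate for escalation to a human.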

Delivery, Review & Traceability (9 Agents)

RTM · PR · MERGE · LEARN

Create Test Tickets & Tag Scenarios — one Jira Test ticket per passing scenario, direct link to parent requirement. This is the RTM. No spreadsheet.

Commit, Push & Create PR — commits only test files (non-test commits blocked by Static Analysis), pushes to feature branch, raises GitHub PR

Transition: Ready for Review — moves Jira from In Progress to Ready for Review automatically

Automated PR Review — checks entire diff against framework standards: BasePage method usage, locator encapsulation, console.log prefixing, SOLID patterns

Fix Issues — applies PR review findings autonomously, up to 3 rounds, before escalating to a human

Human Final Review (Human Gate 3) — a team member reviews the final diff. The last line of defense before code lands in master

Auto-Merge to Master — squash-merges the PR, deletes the feature branch, requires CI to pass

Transition: Done — closes the Jira task

Pattern Tracker — reads full run history, extracts new issues, appends to Known Issues Log. 2+ occurrences → ESCALATED. 3+ → AUTO-BLOCKER in all future Static Analysis runs
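
The escalation thresholds used by the Pattern Tracker (2+ occurrences escalated, 3+ auto-blocked) can be expressed in a few lines. The sketch below is an assumption about how such a log might be modelled: the `KnownIssue` shape, the in-memory `Map`, and the `NOTED` label for a first occurrence are hypothetical. The article describes the log only as a text file the agents read.

```typescript
type Severity = "NOTED" | "ESCALATED" | "AUTO-BLOCKER";

interface KnownIssue {
  pattern: string;
  occurrences: number;
  severity: Severity;
}

// Thresholds from the text: 2+ occurrences escalate, 3+ become blockers.
function severityFor(occurrences: number): Severity {
  if (occurrences >= 3) return "AUTO-BLOCKER";
  if (occurrences >= 2) return "ESCALATED";
  return "NOTED";
}

// Record one more sighting of a pattern and return its updated entry.
function recordOccurrence(log: Map<string, KnownIssue>, pattern: string): KnownIssue {
  const previous = log.get(pattern);
  const occurrences = (previous ? previous.occurrences : 0) + 1;
  const updated: KnownIssue = { pattern, occurrences, severity: severityFor(occurrences) };
  log.set(pattern, updated);
  return updated;
}
```

Because severity is derived from the count rather than stored by hand, an entry cannot drift out of step with its history, which is what lets the static analysis step trust the log.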

04 Real Numbers From 250+ Production Runs

Metric	Result
Test coverage before pipeline	25%
Test coverage after 2 weeks	100%
Requirements automated	250+
Test scenarios created	1,800+
Jira RTM tickets auto-created	1,800+
PRs merged	250+
Clean first-run passes (0 self-heal needed)	~40% of tickets
Most self-heal iterations on a single ticket	11 (19-scenario modal/multiselect run)
Parallel workers during test run	4 (matching CI)
Lines of test code written manually	0
Human gates per ticket	3

05 You Don’t Need to Know the Framework

In the first week, QA engineers were the only contributors. By the second week, developers were running it independently. Not because anyone ran a training session. Because the workflow abstracts the framework entirely.

When a developer drops a Jira ticket into the pipeline, here is everything they need to know:

→ What the feature does (they already know — they built it)
→ Whether the agent’s feature file covers the scenarios correctly (Human Gate 2)
→ Whether the final PR diff looks reasonable (Human Final Review)

They don’t need to know what a page object pattern is. How Cucumber BDD syntax works. Why waitForFunction is not the same as waitForTimeout. How to scope a DataTables search selector to <tfoot> instead of <thead>. Why networkidle needs a .catch() guard in parallel execution. How to handle a Bootstrap static modal blocking Playwright navigation.

The pipeline knows all of this. The Known Issues Log has encoded every hard-won lesson from every previous PR. The Static Analysis agent enforces it. The institutional knowledge that previously lived only in the heads of the most experienced QA engineers — now available to everyone, every time, automatically.

Coverage moved from 25% to 100% not because we hired more QA engineers. It moved because we removed the prerequisite that you had to be a QA engineer to contribute.

06 The Self-Improving System

The part worth dwelling on isn’t the speed. It’s that the system gets smarter with every PR.

The Pattern Tracker appends to the Known Issues Log after every merge. The Generate Test Code agent reads that log before writing. The Static Analysis agent treats escalated entries as BLOCKERs. The cycle is closed: every run teaches the pipeline something new, and that knowledge propagates into every future run automatically.

Early patterns — forgetting to scope DataTables <tfoot> search selectors, using waitForTimeout instead of polling with waitForFunction — were blockers in early tickets. They haven’t appeared in new code for weeks. We don’t document best practices in a wiki nobody reads. We encode them as hard gates that fire before code ships.

The self-heal agent’s effectiveness is bounded by the quality of Playwright’s error traces — which are rich enough that root cause diagnosis is almost always correct. The cases where it struggled were genuine upstream app defects. The agent correctly tagged those as @wip rather than attempting to fix the wrong thing. That judgment matters.

07 What Full RTM Looks Like Now

Our regulatory compliance team needs evidence that every requirement has a tested scenario. Previously, that was a manually maintained spreadsheet — one QA engineer, one Friday afternoon, copying ticket numbers.

Now: every Jira requirement processed through the pipeline gets a Task ticket linked to the requirement, a set of Test tickets (one per scenario, each linked to the requirement), every ticket in the correct Jira state when the PR merges, and a live Jira link tree that is the RTM. When the compliance team pulls a Requirements Traceability Matrix report — it’s populated from the pipeline output. Real-time, always accurate, no manual entry.
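
Since the RTM is simply the link tree between requirements and Test tickets, deriving a report from it is a small grouping exercise. The following is a minimal sketch with hypothetical names (`TestTicket`, `RtmRow`, `buildRtm`). It assumes the ticket links have already been exported from Jira into plain objects and does not show any Jira API call.

```typescript
interface TestTicket {
  key: string;            // Jira key of the Test ticket
  requirementKey: string; // parent requirement it links to
  scenario: string;       // scenario name from the feature file
}

interface RtmRow {
  requirementKey: string;
  testKeys: string[];
  covered: boolean; // true when at least one Test ticket links to the requirement
}

// Group Test tickets under their requirement, one RTM row per requirement.
function buildRtm(requirementKeys: string[], tests: TestTicket[]): RtmRow[] {
  return requirementKeys.map((requirementKey) => {
    const testKeys = tests
      .filter((t) => t.requirementKey === requirementKey)
      .map((t) => t.key);
    return { requirementKey, testKeys, covered: testKeys.length > 0 };
  });
}
```

Filtering the rows on `covered === false` gives the list of requirements with no test evidence yet, which is the question a compliance review usually starts with.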

08 What Surprised Us

The DOM Gathering agent is the most valuable node in the pipeline. Wrong selectors are the number one cause of flaky tests. Having an agent browse the live application and extract precise locators — including hidden inputs, plugin-wrapped elements, and modal IDs — before a single test is written eliminated an entire class of failures that previously consumed most of the debugging time.

The Known Issues Log pattern escalation works. We were sceptical that a text file read by an agent would actually prevent repeat mistakes. It does. Patterns escalated as BLOCKERs have not recurred in new code. The scepticism was wrong.

Self-heal loops are bounded by error trace quality. Playwright’s output is rich enough that the agent almost always diagnoses the correct root cause. The hardest ticket — Bootstrap static backdrop modals blocking navigation, multiselect plugins hiding underlying <select> elements, SweetAlert toasts dismissed before the polling loop caught them — required 11 iterations. All resolved.

The future of QA automation isn’t faster typing. It’s removing the requirement to type at all.

09 What’s Next

Three areas we’re expanding into:

01 Expanding the Known Issues Log to include app behavior quirks — documented environments, feature flags, known defects — so the agent can make smarter scenario design decisions
02 Running the pipeline against the full requirements backlog to surface which tickets still have zero test coverage
03 Integrating test result uploads directly into Jira’s test execution records for full traceability from test run to evidence artifact

Chandrasekhar Boddi

Lead SDET · Sails Software

Chandrasekhar Boddi builds AI-powered QA automation systems for regulated medical device software at Sails Software. He specialises in agent orchestration pipelines, Playwright-based test frameworks, and Requirements Traceability Matrix automation for clinical and compliance-heavy environments.

Frequently Asked Questions

How did the team go from 25% to 100% test coverage in 2 weeks?

By replacing manual test writing with a 23-agent AI orchestration pipeline. Instead of asking AI to generate code in a chat window, the team built a pipeline where Claude agents handle the entire workflow — from fetching Jira requirements to writing Playwright tests, self-healing failures, creating RTM tickets, and merging PRs. The whole team contributed, not just QA engineers, which eliminated the bandwidth bottleneck.

Does the pipeline require knowledge of Playwright or test frameworks?

No. Contributors only need domain knowledge of the feature they’re testing. The pipeline encodes all framework knowledge — page object patterns, selector strategies, BDD syntax, CI parity configuration, known anti-patterns — and enforces them automatically. Developers and business analysts ran it independently without any training or Playwright documentation.

What is the Known Issues Log and how does it prevent repeat mistakes?

The Known Issues Log is a cross-PR pattern memory that accumulates every anti-pattern discovered during pipeline runs. The Generate Test Code agent reads it before writing. The Static Analysis agent treats escalated entries as automatic BLOCKERs. A pattern seen twice becomes escalated; seen three or more times it becomes a hard BLOCKER that prevents code from passing static analysis regardless of who or what wrote it.

How does the self-healing test agent work?

When tests fail, the Self-Heal agent reads the Playwright error trace, identifies the root cause (selector problem, timing issue, data dependency, or app defect), cross-references the Known Issues Log to avoid reintroducing known-bad patterns, applies a targeted fix, and re-runs affected scenarios. It loops until 100% pass rate is achieved or a hard cap is reached. The 100% pass gate is non-negotiable — no failing test reaches the delivery phase.

What does automated RTM look like with this pipeline?

Every Jira requirement processed through the pipeline gets a linked Task ticket and a set of Test tickets — one per passing scenario — automatically. All tickets are in the correct Jira state when the PR merges. The Jira issue link tree becomes the Requirements Traceability Matrix: real-time, always accurate, no spreadsheet, no manual entry. Compliance teams pull the RTM report directly from Jira.

How many human gates are in the pipeline and what do they cover?

Three. (1) Requirement Update Approval — the team reviews agent findings before any code is written. If the live app diverges from the requirement, it’s caught here. (2) Feature File Approval — the team reviews the BDD feature file before TypeScript is generated. A NO-GO loops the agent back to redraft. (3) Human Final Review — a team member reviews the final PR diff before it merges to master. Human judgment is preserved at every critical decision point.

Is Your Team’s Test Coverage Stuck?

If you have a backlog of untested requirements and a coverage number that hasn’t moved in months, we can show you what this pipeline looks like in practice. You don’t need a bigger QA team. You need a better pipeline.
