From 25% to 100% Test Coverage in 2 Weeks | Sails Software
QA Automation · AI Orchestration

From 25% to 100% Test Coverage in 2 Weeks, Without Writing a Single Line of Code

A 23-agent AI orchestration pipeline. 250+ requirements automated. 1,800+ test scenarios created. Zero manual code. And a Requirements Traceability Matrix that builds itself.

Chandrasekhar Boddi
12 min read
June 26, 2026
QA Automation · AI Agents · Playwright · Medical Device
The Bottom Line

We stopped asking AI to write code. We asked it to run a pipeline. That one decision took a regulated medical device software team from 25% to 100% automated test coverage in two weeks — with the entire team contributing, not just QA engineers.

01 Where We Started and Why Coverage Stalled

Two weeks ago our project sat at 25% automated test coverage. Not because we lacked requirements — we had a full backlog. Not because the QA team wasn’t working hard. Because automation was siloed.

QA automation on a regulated medical device platform that interprets genomic variants for clinical labs carries real weight. Every automated test needs a traceable link from requirement to test evidence, a peer-reviewed pull request before merge, human sign-off before code lands in master, consistent patterns across 30+ contributor touchpoints, and Jira tickets in the correct state at every handoff.

Only QA engineers who knew the Playwright framework, understood the page object pattern, and had memorised which step definitions already existed could contribute. Developers who built the features and business analysts who wrote the acceptance criteria sat on the sidelines. The bottleneck wasn’t effort — it was access.

We tried AI code generation. It helped. But copy-pasting from a chat window into an IDE, debugging alone, and manually filing Jira tickets brought the gains back down to earth fast.

Coverage stalled at 25% not because we lacked requirements — but because we lacked bandwidth from the narrow group of people who could write the tests.

02 The Shift: Stop Generating Code, Start Orchestrating Agents

The real unlock came when we stopped treating AI as a faster typist and started treating it as an orchestrator. We built an AI agent pipeline where Claude agents don’t just write code — they hand off to each other, gate themselves at quality checkpoints, self-heal when tests fail, and close the Jira loop end-to-end.

The key design principle: the pipeline requires zero knowledge of the test framework to use it. The person feeding it a Jira ticket number needs to know the feature. The agents handle everything else — the right patterns, the right selectors, the right file structure, the framework conventions accumulated across every previous PR.
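The following minimal TypeScript sketch makes the hand-off idea concrete. It models sequential stages, each with an optional quality gate, and a NO-GO sends the stage back for a redraft. It illustrates the idea under stated assumptions and is not the pipeline's actual code. The Stage, WorkItem and runPipeline names, the shape of a gate decision, and the default of two redrafts are all hypothetical.

```typescript
// Hypothetical sketch of agent hand-offs with quality gates; not the article's actual code.
type Decision = "GO" | "NO-GO";

interface WorkItem {
  ticket: string;      // Jira requirement key: the only input a contributor supplies
  artifacts: string[]; // outputs accumulated by earlier stages
}

interface Stage {
  label: string;
  run(item: WorkItem): WorkItem;   // the agent's work for this step
  gate?(item: WorkItem): Decision; // optional checkpoint (static check or human review)
}

interface PipelineResult {
  item: WorkItem;
  completed: string[]; // labels of stages that handed off successfully
  haltedAt?: string;   // label of the stage whose gate never returned GO
}

// Runs stages in order. A NO-GO re-runs the same stage from its original input
// (a redraft) up to maxRedrafts times; after that the pipeline halts instead of
// passing unapproved work downstream.
function runPipeline(ticket: string, stages: Stage[], maxRedrafts = 2): PipelineResult {
  let item: WorkItem = { ticket, artifacts: [] };
  const completed: string[] = [];
  for (const stage of stages) {
    let redrafts = 0;
    let output = stage.run(item);
    while (stage.gate && stage.gate(output) === "NO-GO") {
      redrafts += 1;
      if (redrafts > maxRedrafts) {
        return { item, completed, haltedAt: stage.label };
      }
      output = stage.run(item);
    }
    item = output;
    completed.push(stage.label);
  }
  return { item, completed };
}
```

A stage without a gate hands off immediately. A stage whose gate keeps returning NO-GO halts the run rather than passing work downstream, which matches the rule that nothing crosses a gate without a GO.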

The pipeline has 23 orchestrated agents across 5 phases. Here is how each one works.

AI Agent Orchestration Pipeline: All 23 Agents
Jira Requirement → Automated Tests → RTM → Merged PR · Fully Agentic · Self-Improving · Zero Framework Knowledge Required

Phase 1 · Intake · Analysis · Discovery
  • Jira PR Input (trigger): fetch ticket, AC, context
  • Create Task (Jira API): assign, link to requirement
  • → In Progress (Jira): board reflects reality
  • Analyze & Clarify (analysis): reusable steps, edge cases
  • App Review (browser): live DOM, toasts, modals
  • Req Update Approval (Human Gate 1): app vs. spec gap sign-off
  • DOM Gathering (Playwright): exact selectors, hidden inputs
Phase 2 · Spec · Code Generation · Static Analysis
  • Create Feature Branch (Git): correct base, naming conventions
  • Draft Feature File (spec): Cucumber BDD, tags
  • Feature File Approval (Human Gate 2): GO / NO-GO; NO-GO ↺ redraft
  • Generate Test Code (code agent): page objects, step definitions
  • Static Analysis: tsc, pattern BLOCKERs
Phase 3 · Test Execution · Self-Heal
  • Run Generated Tests (test runner, 4 parallel workers): headless Playwright, CI parity
  • Self-Heal ↺ (self-heal agent): 100% pass gate, non-negotiable; retry loop until 100%
Phase 4 · Delivery · Review · RTM · Merge · Learn
  • Create Test Tickets (Jira, RTM): 1 ticket per scenario; requirement link = RTM
  • Commit · Push · PR (Git, GitHub): test files only
  • → Ready for Review (Jira): board reflects real state
  • Automated PR Review (PR reviewer): framework standards
  • Human Final Review (Human Gate 3): GO / NO-GO before merge
  • Auto-Merge to Master (requires CI pass): squash, delete branch
  • → Done (Jira): task closed, fully traceable
  • Pattern Tracker (learn agent): append to Known Issues Log ↺

Known Issues Log: Cross-PR Pattern Memory
Read by:
  • Generate Test Code (avoid patterns)
  • Static Analysis (escalated = BLOCKER)
  • Self-Heal (known-bad selectors)
Written by:
  • Pattern Tracker after every merge
  • 2+ occurrences → ESCALATED
  • 3+ occurrences → AUTO-BLOCKER
  • Pipeline improves with each PR

Coverage in 2 weeks: 25% → 100% · Entire team · Zero framework knowledge
250+ requirements automated · 1,800+ test scenarios created · 1,800+ RTM tickets auto-created · 4 parallel workers (CI parity) · 0 manual code written · 3 human gates per ticket
Key insight: Contributors only need domain knowledge — framework patterns, selectors, and best practices are encoded in the pipeline and enforced automatically.
23 agents across 5 phases — from Jira ticket to merged PR with full RTM. Human gates at Requirement Approval, Feature File Approval, and Final Review. The Known Issues Log feeds back into every new run.

03 The Pipeline, Phase by Phase

Here is exactly what happens from the moment a Jira ticket number enters the pipeline to the moment a PR merges and the RTM updates.

P1 · Requirement Intake (3 agents)

A developer drops a Jira requirement ticket number into the pipeline. Three things happen automatically — zero clicks from the engineer.

  • Jira PR Input — fetches the full ticket: acceptance criteria, description, project context
  • Create Task from Requirement — checks if a linked task exists; if not, creates one, assigns it, and links it back to the requirement
  • Transition: In Progress — moves the Jira task so the whole team sees who is working on what. The board already reflects reality
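The Create Task agent first checks for a linked task and creates one only when none exists. That check makes it safe to feed the same requirement in twice. The sketch below shows this idempotent step against a tiny in-memory stand-in. FakeJira, the QA-n key format, the single linkedTo field, and the "To Do" starting status are hypothetical and do not model Jira's REST API.

```typescript
// Hypothetical in-memory stand-in for Jira; it does not model Jira's REST API.
interface Issue {
  key: string;
  kind: "Requirement" | "Task" | "Test";
  statusName: string;
  assignee?: string;
  linkedTo?: string; // key of the requirement this issue traces back to
}

class FakeJira {
  private issues: Issue[] = [];
  private counter = 0;

  add(kind: Issue["kind"], linkedTo?: string, assignee?: string): Issue {
    this.counter += 1;
    const issue: Issue = { key: `QA-${this.counter}`, kind, statusName: "To Do", linkedTo, assignee };
    this.issues.push(issue);
    return issue;
  }

  find(predicate: (issue: Issue) => boolean): Issue | undefined {
    return this.issues.find(predicate);
  }

  count(): number {
    return this.issues.length;
  }
}

// Reuses the Task already linked to the requirement, or creates, assigns and links one,
// then moves it to In Progress so the board shows who is working on what.
// Running it twice for the same requirement leaves exactly one Task.
function ensureTaskInProgress(jira: FakeJira, requirementKey: string, assignee: string): Issue {
  const existing = jira.find((issue) => issue.kind === "Task" && issue.linkedTo === requirementKey);
  const task = existing ?? jira.add("Task", requirementKey, assignee);
  task.statusName = "In Progress";
  return task;
}
```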
P2 · Analysis & Discovery (4 agents)

This phase asks the hard questions before writing a single line of code — a discipline most human developers skip.

Analyze & Clarify — reads the requirement, searches every existing step definition for reusable steps, identifies edge cases, asks clarifying questions. Won’t proceed until ambiguities are resolved
App Review — navigates the live application with a Playwright browser agent and documents every DOM element: selectors, toast notifications, modal IDs, button states
Req Update Approval (Human Gate 1): presents the agent’s findings for team sign-off before any code is written. If the live app diverges from the requirement, the gap is caught here, not in a failing CI run at midnight
DOM Gathering — captures precise locators: hidden elements, multiselect plugins, Bootstrap modal IDs, TinyMCE editor instances. Not guesses. Actuals.
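One way to approximate the search for reusable step definitions is to turn each Cucumber expression into a regular expression and test the new step text against it. This simplified sketch supports only the {string}, {int} and {word} parameter types. It expects step text without the Given/When/Then keyword. Real Cucumber expressions also support optional text, alternation and custom parameter types. The function names are hypothetical.

```typescript
// Simplified sketch: only {string}, {int} and {word} are supported.
const PARAM_PATTERNS = new Map<string, string>([
  ["{string}", "(?:\"[^\"]*\"|'[^']*')"], // a double- or single-quoted value
  ["{int}", "-?\\d+"],
  ["{word}", "[^\\s]+"],
]);

// Escapes regex metacharacters in the literal parts of an expression.
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// "I click the {string} button" becomes /^I click the (?:"[^"]*"|'[^']*') button$/.
function expressionToRegex(expression: string): RegExp {
  const parts = expression.split(/(\{string\}|\{int\}|\{word\})/); // keeps the placeholders
  const body = parts.map((part) => PARAM_PATTERNS.get(part) ?? escapeRegex(part)).join("");
  return new RegExp(`^${body}$`);
}

// Returns the existing step expressions that already match the proposed step text.
function findReusableSteps(stepText: string, expressions: string[]): string[] {
  return expressions.filter((expression) => expressionToRegex(expression).test(stepText));
}
```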
P3 · Specification & Code Generation (5 agents)

  1. Create Feature Branch: checks out a new branch following naming conventions, always from the correct base. This prevents the merge conflicts that plagued early runs
  2. Draft Feature File: writes the Cucumber BDD .feature file with scenarios, Given/When/Then steps, and the correct tags (@regression, a feature tag, the Jira ticket tag)
  3. Feature File Approval (Human Gate 2): the team reviews and either approves or requests changes. A NO-GO loops the agent back to redraft before a single line of TypeScript is generated
  4. Generate Test Code: produces page object methods and step definitions. It reads the Known Issues Log before writing, so every anti-pattern already discovered is avoided from the start
  5. Static Analysis: runs tsc --noEmit and pattern checks. Escalated patterns are automatic BLOCKERs: raw locators in step files, unconditional waitForTimeout, unguarded networkidle. If any of them fires, the code is rejected
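The three BLOCKER rules listed above can be sketched as line-based checks. The sketch makes several assumptions. The rule IDs are hypothetical. It assumes step files live under a steps/ directory or end in .steps.ts. It looks for a .catch() guard on the same line only. It also treats every waitForTimeout call as blocking, which is stricter than the "unconditional" rule described. A real implementation would more likely inspect the TypeScript AST.

```typescript
// Hypothetical line-based version of the three BLOCKER rules; a real check would likely
// walk the TypeScript AST instead of matching text.
interface Finding {
  file: string;
  line: number; // 1-based line number
  rule: string;
}

interface Rule {
  id: string;
  appliesTo(file: string): boolean;
  violates(line: string): boolean;
}

// Assumed convention: step definitions live under steps/ or end in .steps.ts.
const isStepFile = (file: string): boolean => /(^|\/)steps\//.test(file) || /\.steps\.ts$/.test(file);

const BLOCKER_RULES: Rule[] = [
  {
    // Steps should call page-object methods rather than build locators themselves.
    id: "raw-locator-in-step-file",
    appliesTo: isStepFile,
    violates: (line) => /\bpage\.(locator|getBy[A-Za-z]+)\(/.test(line),
  },
  {
    // Fixed sleeps hide timing problems; this sketch blocks every call.
    id: "wait-for-timeout",
    appliesTo: () => true,
    violates: (line) => /\.waitForTimeout\(/.test(line),
  },
  {
    // networkidle waits must carry a .catch() guard on the same line.
    id: "unguarded-networkidle",
    appliesTo: () => true,
    violates: (line) => /networkidle/.test(line) && !/\.catch\(/.test(line),
  },
];

// Returns every rule violation; any finding means the generated code is rejected.
function staticAnalysis(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  const lines = source.split("\n");
  for (const rule of BLOCKER_RULES) {
    if (!rule.appliesTo(file)) continue;
    lines.forEach((text, index) => {
      if (rule.violates(text)) findings.push({ file, line: index + 1, rule: rule.id });
    });
  }
  return findings;
}
```

In this sketch a networkidle wait passes in a page object as long as it carries a .catch() guard. A raw page.locator() call is flagged only inside step files.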
P4 · Test Execution & Self-Healing (2 agents)

Tests run with 4 parallel workers, matching the CI configuration. Running with fewer workers can mask race conditions that only surface later in GitHub Actions. The configuration is intentional.

  • Agent 1 · Run Generated Tests: executes all scenarios with npm run test:parallel:tag, so CI parity holds from the start. 4 parallel workers, no shortcuts.
  • Agent 2 · Loop · Self-Heal Until 100% Pass: reads the Playwright error trace, identifies the root cause, cross-references the Known Issues Log, applies a targeted fix, re-runs, and loops. The 100% pass gate is non-negotiable, so no failing test reaches Phase 5. The hardest ticket took 11 iterations, and the agent still resolved it.
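The self-heal loop can be sketched as run, diagnose, fix, repeat, with a hard cap. App defects are parked as @wip rather than patched over, matching how the team's agent handles upstream defects. In this minimal sketch the runner and the agent are injected callbacks, and the cause categories follow the FAQ's list. The Known Issues Log lookup is left out. The default cap of 15 runs is a hypothetical value, since the article does not state the real cap.

```typescript
// Hypothetical bounded self-heal loop; the runner and the agent are injected callbacks.
type Cause = "selector" | "timing" | "data" | "app-defect";

interface Failure {
  scenario: string;
  trace: string; // Playwright error output for the failing scenario
}

interface HealHooks {
  run(scenarios: string[]): Failure[];       // runs the scenarios, returns only failures
  diagnose(failure: Failure): Cause;         // root-cause guess from the trace
  fix(failure: Failure, cause: Cause): void; // patches test code, never the app
}

interface HealReport {
  passed: boolean; // true only when every remaining scenario passed
  runs: number;    // how many times the suite was executed
  wip: string[];   // scenarios parked as @wip because the app itself is wrong
}

function selfHeal(scenarios: string[], hooks: HealHooks, maxRuns = 15): HealReport {
  let active = scenarios.slice();
  const wip: string[] = [];
  for (let runs = 1; runs <= maxRuns; runs++) {
    const failures = hooks.run(active);
    if (failures.length === 0) return { passed: true, runs, wip };
    for (const failure of failures) {
      const cause = hooks.diagnose(failure);
      if (cause === "app-defect") {
        // Park the scenario instead of bending the test to match a bug.
        wip.push(failure.scenario);
        active = active.filter((s) => s !== failure.scenario);
      } else {
        hooks.fix(failure, cause);
      }
    }
  }
  return { passed: false, runs: maxRuns, wip };
}
```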
P5 · Delivery, Review & Traceability (9 agents)

Create Test Tickets & Tag Scenarios — one Jira Test ticket per passing scenario, direct link to parent requirement. This is the RTM. No spreadsheet.
Commit, Push & Create PR — commits only test files (non-test commits blocked by Static Analysis), pushes to feature branch, raises GitHub PR
Transition: Ready for Review — moves Jira from In Progress to Ready for Review automatically
Automated PR Review — checks entire diff against framework standards: BasePage method usage, locator encapsulation, console.log prefixing, SOLID patterns
Fix Issues — applies PR review findings autonomously, up to 3 rounds, before escalating to a human
Human Final Review (Human Gate 3): a team member reviews the final diff, the last line of defense before code lands in master
Auto-Merge to Master: once CI passes, squash-merges the PR and deletes the feature branch
Transition: Done — closes the Jira task
Pattern Tracker — reads full run history, extracts new issues, appends to Known Issues Log. 2+ occurrences → ESCALATED. 3+ → AUTO-BLOCKER in all future Static Analysis runs
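The Jira hand-offs in P1 and P5 (In Progress, Ready for Review, Done) amount to a small state machine. The sketch below assumes a strictly linear workflow and rejects any skipped step. The To Do starting state and the one-step-forward rule are assumptions, since real Jira workflows are configured per project.

```typescript
// Assumed linear workflow; real Jira workflows are configured per project.
const WORKFLOW = ["To Do", "In Progress", "Ready for Review", "Done"] as const;
type TaskState = (typeof WORKFLOW)[number];

// Allows exactly one step forward, so a task cannot show Done before it has
// been through Ready for Review, and cannot move backwards silently.
function transition(current: TaskState, target: TaskState): TaskState {
  const from = WORKFLOW.indexOf(current);
  const to = WORKFLOW.indexOf(target);
  if (to !== from + 1) {
    throw new Error(`Illegal transition: ${current} -> ${target}`);
  }
  return target;
}
```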

04 Real Numbers From 250+ Production Runs

100% coverage (from 25% in 2 weeks) · 250+ requirements automated · 1,800+ test scenarios created · 0 lines of code written · 11 max self-heal iterations
Metric | Result
Test coverage before pipeline | 25%
Test coverage after 2 weeks | 100%
Requirements automated | 250+
Test scenarios created | 1,800+
Jira RTM tickets auto-created | 1,800+
PRs merged | 250+
Clean first-run passes (no self-heal needed) | ~40% of tickets
Most self-heal iterations on a single ticket | 11 (19-scenario modal/multiselect run)
Parallel workers during test run | 4 (matching CI)
Lines of test code written manually | 0
Human gates per ticket | 3

05 You Don’t Need to Know the Framework

In the first week, QA engineers were the only contributors. By the second week, developers were running it independently. Not because anyone ran a training session. Because the workflow abstracts the framework entirely.

When a developer drops a Jira ticket into the pipeline, here is everything they need to know:

  • What the feature does (they already know — they built it)
  • Whether the agent’s feature file covers the scenarios correctly (Human Gate 2)
  • Whether the final PR diff looks reasonable (Human Gate 3, Human Final Review)

They don’t need to know what a page object pattern is. How Cucumber BDD syntax works. Why waitForFunction is not the same as waitForTimeout. How to scope a DataTables search selector to <tfoot> instead of <thead>. Why networkidle needs a .catch() guard in parallel execution. How to handle a Bootstrap static modal blocking Playwright navigation.
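The networkidle guard is one example of what the pipeline encodes. Playwright's page.waitForLoadState("networkidle") can time out on pages with continuous background requests. Playwright's own documentation discourages relying on networkidle, and on waitForTimeout, in tests. A guarded wait turns that timeout into a "carry on" result instead of a failed step. In this sketch, LoadStateWaiter is a hypothetical structural type that mirrors the one Page method used, so the code compiles without Playwright installed. The settleNetwork name and the 5-second default are also assumptions.

```typescript
// Hypothetical structural type for the one Playwright Page method used here, so the sketch
// compiles without Playwright installed. A real Playwright Page fits this shape.
interface LoadStateWaiter {
  waitForLoadState(
    state?: "load" | "domcontentloaded" | "networkidle",
    options?: { timeout?: number },
  ): Promise<void>;
}

// Waits for network idle, but a timeout (or any other rejection) becomes `false`
// instead of failing the step. Callers can log the result and continue with a
// locator-based wait for the element they actually need.
function settleNetwork(page: LoadStateWaiter, timeout = 5000): Promise<boolean> {
  return page
    .waitForLoadState("networkidle", { timeout })
    .then(() => true)
    .catch(() => false);
}
```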

The pipeline knows all of this. The Known Issues Log has encoded every hard-won lesson from every previous PR. The Static Analysis agent enforces it. The institutional knowledge that previously lived only in the heads of the most experienced QA engineers — now available to everyone, every time, automatically.

Coverage moved from 25% to 100% not because we hired more QA engineers. It moved because we removed the prerequisite that you had to be a QA engineer to contribute.

06 The Self-Improving System

The part worth dwelling on isn’t the speed. It’s that the system gets smarter with every PR.

The Pattern Tracker appends to the Known Issues Log after every merge. The Generate Test Code agent reads that log before writing. The Static Analysis agent treats escalated entries as BLOCKERs. The cycle is closed: every run teaches the pipeline something new, and that knowledge propagates into every future run automatically.
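The escalation thresholds (2+ occurrences → ESCALATED, 3+ → AUTO-BLOCKER) can be sketched as a small log update that runs after each merge. The sketch makes three assumptions. Occurrences count the runs in which a pattern appeared, so repeats within one run count once. NOTED is a hypothetical label for a pattern seen only once. Escalated entries are treated as BLOCKERs, so both ESCALATED and AUTO-BLOCKER entries are returned as blocking. The entry shape and function names are illustrative.

```typescript
// Hypothetical sketch of the Known Issues Log escalation thresholds.
type Severity = "NOTED" | "ESCALATED" | "AUTO-BLOCKER"; // NOTED is an assumed label for 1 occurrence

interface LogEntry {
  pattern: string;     // e.g. "waitForTimeout instead of waitForFunction"
  occurrences: number; // number of runs in which the pattern appeared
  severity: Severity;
}

function severityFor(occurrences: number): Severity {
  if (occurrences >= 3) return "AUTO-BLOCKER";
  if (occurrences >= 2) return "ESCALATED";
  return "NOTED";
}

// Called after each merge with the patterns found in that run; returns a new log.
function recordRun(log: LogEntry[], patternsSeen: string[]): LogEntry[] {
  const updated: LogEntry[] = log.map((entry) => ({ ...entry }));
  const unique = patternsSeen.filter((p, i) => patternsSeen.indexOf(p) === i); // once per run
  for (const pattern of unique) {
    let entry: LogEntry | undefined = updated.find((e) => e.pattern === pattern);
    if (!entry) {
      entry = { pattern, occurrences: 0, severity: "NOTED" };
      updated.push(entry);
    }
    entry.occurrences += 1;
    entry.severity = severityFor(entry.occurrences);
  }
  return updated;
}

// Patterns Static Analysis would reject on the next run (escalated entries count as BLOCKERs).
function blockingPatterns(log: LogEntry[]): string[] {
  return log.filter((e) => e.severity !== "NOTED").map((e) => e.pattern);
}
```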

Early patterns, such as forgetting to scope DataTables search selectors to <tfoot> or using waitForTimeout instead of polling with waitForFunction, were blockers in the first tickets. They have not appeared in new code since they were escalated. We don’t document best practices in a wiki nobody reads. We encode them as hard gates that fire before code ships.

The self-heal agent’s effectiveness is bounded by the quality of Playwright’s error traces — which are rich enough that root cause diagnosis is almost always correct. The cases where it struggled were genuine upstream app defects. The agent correctly tagged those as @wip rather than attempting to fix the wrong thing. That judgment matters.

07 What Full RTM Looks Like Now

Our regulatory compliance team needs evidence that every requirement has a tested scenario. Previously, that was a manually maintained spreadsheet — one QA engineer, one Friday afternoon, copying ticket numbers.

Now: every Jira requirement processed through the pipeline gets a Task ticket linked to the requirement, a set of Test tickets (one per scenario, each linked to the requirement), every ticket in the correct Jira state when the PR merges, and a live Jira link tree that is the RTM. When the compliance team pulls a Requirements Traceability Matrix report — it’s populated from the pipeline output. Real-time, always accurate, no manual entry.
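Every Test ticket links to its parent requirement, so an RTM view is a grouping over those links. The sketch below builds one from in-memory link records. It computes requirement-level coverage as the share of requirements with at least one linked test. That coverage definition, the REQ-n and TEST-n keys, and the record shape are assumptions, since the article does not say exactly how its coverage figure is computed.

```typescript
// Hypothetical sketch: derive an RTM view from requirement -> test-ticket links.
interface TraceLink {
  requirement: string; // requirement ticket key
  testTicket: string;  // one Test ticket per passing scenario
}

interface RtmRow {
  requirement: string;
  tests: string[];
  covered: boolean; // true when at least one test traces to the requirement
}

function buildRtm(requirements: string[], links: TraceLink[]): RtmRow[] {
  return requirements.map((requirement) => {
    const tests = links.filter((l) => l.requirement === requirement).map((l) => l.testTicket);
    return { requirement, tests, covered: tests.length > 0 };
  });
}

// Requirement-level coverage, rounded to a whole percentage.
function coveragePercent(rows: RtmRow[]): number {
  if (rows.length === 0) return 0;
  const covered = rows.filter((r) => r.covered).length;
  return Math.round((covered / rows.length) * 100);
}
```

The same rows list the requirements that still have no linked tests, which is the kind of gap report the backlog run aims to surface.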

08 What Surprised Us

The DOM Gathering agent is the most valuable node in the pipeline. In our runs, wrong selectors were the number one cause of flaky tests. Having an agent browse the live application and extract precise locators (including hidden inputs, plugin-wrapped elements, and modal IDs) before a single test is written eliminated an entire class of failures that previously consumed most of the debugging time.

The Known Issues Log pattern escalation works. We were skeptical that a text file read by an agent would actually prevent repeat mistakes. It does. Patterns escalated as BLOCKERs have not recurred in new code. The skepticism was wrong.

Self-heal loops are bounded by error trace quality. Playwright’s output is rich enough that the agent almost always diagnoses the correct root cause. The hardest ticket — Bootstrap static backdrop modals blocking navigation, multiselect plugins hiding underlying <select> elements, SweetAlert toasts dismissed before the polling loop caught them — required 11 iterations. All resolved.

The future of QA automation isn’t faster typing. It’s removing the requirement to type at all.

09 What’s Next

Three areas we’re expanding into:

  • 01 Expanding the Known Issues Log to include app behavior quirks — documented environments, feature flags, known defects — so the agent can make smarter scenario design decisions
  • 02 Running the pipeline against the full requirements backlog to surface which tickets still have zero test coverage
  • 03 Integrating test result uploads directly into Jira’s test execution records for full traceability from test run to evidence artifact
Chandrasekhar Boddi, Lead SDET at Sails Software

Chandrasekhar Boddi builds AI-powered QA automation systems for regulated medical device software at Sails Software. He specialises in agent orchestration pipelines, Playwright-based test frameworks, and Requirements Traceability Matrix automation for clinical and compliance-heavy environments.

Frequently Asked Questions

By replacing manual test writing with a 23-agent AI orchestration pipeline. Instead of asking AI to generate code in a chat window, the team built a pipeline where Claude agents handle the entire workflow — from fetching Jira requirements to writing Playwright tests, self-healing failures, creating RTM tickets, and merging PRs. The whole team contributed, not just QA engineers, which eliminated the bandwidth bottleneck.
No. Contributors only need domain knowledge of the feature they’re testing. The pipeline encodes all framework knowledge — page object patterns, selector strategies, BDD syntax, CI parity configuration, known anti-patterns — and enforces them automatically. Developers and business analysts ran it independently without any training or Playwright documentation.
The Known Issues Log is a cross-PR pattern memory that accumulates every anti-pattern discovered during pipeline runs. The Generate Test Code agent reads it before writing. The Static Analysis agent treats escalated entries as automatic BLOCKERs. A pattern seen twice becomes escalated; seen three or more times it becomes a hard BLOCKER that prevents code from passing static analysis regardless of who or what wrote it.
When tests fail, the Self-Heal agent reads the Playwright error trace, identifies the root cause (selector problem, timing issue, data dependency, or app defect), cross-references the Known Issues Log to avoid reintroducing known-bad patterns, applies a targeted fix, and re-runs affected scenarios. It loops until 100% pass rate is achieved or a hard cap is reached. The 100% pass gate is non-negotiable — no failing test reaches the delivery phase.
Every Jira requirement processed through the pipeline gets a linked Task ticket and a set of Test tickets — one per passing scenario — automatically. All tickets are in the correct Jira state when the PR merges. The Jira issue link tree becomes the Requirements Traceability Matrix: real-time, always accurate, no spreadsheet, no manual entry. Compliance teams pull the RTM report directly from Jira.
Three. (1) Requirement Update Approval — the team reviews agent findings before any code is written. If the live app diverges from the requirement, it’s caught here. (2) Feature File Approval — the team reviews the BDD feature file before TypeScript is generated. A NO-GO loops the agent back to redraft. (3) Human Final Review — a team member reviews the final PR diff before it merges to master. Human judgment is preserved at every critical decision point.

Is Your Team’s Test Coverage Stuck?

If you have a backlog of untested requirements and a coverage number that hasn’t moved in months, we can show you what this pipeline looks like in practice. You don’t need a bigger QA team. You need a better pipeline.

