From 25% to 100% Test Coverage in 2 Weeks, Without Writing a Single Line of Code
A 23-agent AI orchestration pipeline. 250+ requirements automated. 1,800+ test scenarios created. Zero manual code. And a Requirements Traceability Matrix that builds itself.
We stopped asking AI to write code. We asked it to run a pipeline. That one decision took a regulated medical device software team from 25% to 100% automated test coverage in two weeks — with the entire team contributing, not just QA engineers.
01 Where We Started, and Why Coverage Stalled
Two weeks ago our project sat at 25% automated test coverage. Not because we lacked requirements — we had a full backlog. Not because the QA team wasn’t working hard. Because automation was siloed.
QA automation on a regulated medical device platform that interprets genomic variants for clinical labs carries real weight. Every automated test needs a traceable link from requirement to test evidence, a peer-reviewed pull request before merge, human sign-off before code lands in main, consistent patterns across 30+ contributor touchpoints, and Jira tickets in the correct state at every handoff.
Only QA engineers who knew the Playwright framework, understood the page object pattern, and had memorised which step definitions already existed could contribute. Developers who built the features and business analysts who wrote the acceptance criteria sat on the sidelines. The bottleneck wasn’t effort — it was access.
We tried AI code generation. It helped. But copy-pasting from a chat window into an IDE, debugging alone, and manually filing Jira tickets brought the gains back down to earth fast.
Coverage stalled at 25% not because we lacked requirements — but because we lacked bandwidth from the narrow group of people who could write the tests.
02 The Shift: Stop Generating Code, Start Orchestrating Agents
The real unlock came when we stopped treating AI as a faster typist and started treating it as an orchestrator. We built an AI agent pipeline where Claude agents don’t just write code — they hand off to each other, gate themselves at quality checkpoints, self-heal when tests fail, and close the Jira loop end-to-end.
The key design principle: using the pipeline requires zero knowledge of the test framework. The person feeding it a Jira ticket number needs to know the feature. The agents handle everything else: the right patterns, the right selectors, the right file structure, and the framework conventions accumulated across every previous PR.
The pipeline has 23 orchestrated agents across 5 phases. Here is how each one works.
[Diagram: Known Issues Log feedback loop. Labels: Generate Test Code (avoid patterns); Static Analysis (escalated = BLOCKER); Self-Heal (known-bad selectors); Pattern Tracker after every merge; 2+ occurrences → ESCALATED; 3+ occurrences → AUTO-BLOCKER; pipeline improves with each PR.]
03 The Pipeline, Phase by Phase
Here is exactly what happens from the moment a Jira ticket number enters the pipeline to the moment a PR merges and the RTM updates.
A developer drops a Jira requirement ticket number into the pipeline. Three things happen automatically — zero clicks from the engineer.
- Jira PR Input — fetches the full ticket: acceptance criteria, description, project context
- Create Task from Requirement — checks if a linked task exists; if not, creates one, assigns it, and links it back to the requirement
- Transition: In Progress — moves the Jira task so the whole team sees who is working on what. The board already reflects reality
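The three intake steps above can be sketched roughly as follows. This is a minimal illustration against an in-memory stand-in, not the pipeline's actual code: `FakeJira`, `intake`, the ticket keys, and the status names are all hypothetical, and a real implementation would call the Jira API instead.

```typescript
// Hypothetical in-memory stand-in for Jira. None of these names are a real Jira API.
type Status = "To Do" | "In Progress" | "Done";

interface Ticket {
  key: string;
  kind: "Requirement" | "Task";
  summary: string;
  status: Status;
  assignee?: string;
  links: string[]; // keys of linked tickets
}

class FakeJira {
  private tickets = new Map<string, Ticket>();
  private nextId = 1;

  add(ticket: Ticket): void {
    this.tickets.set(ticket.key, ticket);
  }

  get(key: string): Ticket {
    const ticket = this.tickets.get(key);
    if (!ticket) throw new Error(`Unknown ticket ${key}`);
    return ticket;
  }

  createTask(summary: string, assignee: string): Ticket {
    const task: Ticket = {
      key: `TASK-${this.nextId++}`,
      kind: "Task",
      summary,
      status: "To Do",
      assignee,
      links: [],
    };
    this.add(task);
    return task;
  }

  // Link two tickets in both directions, without duplicating an existing link.
  link(a: string, b: string): void {
    const first = this.get(a);
    const second = this.get(b);
    if (first.links.indexOf(b) === -1) first.links.push(b);
    if (second.links.indexOf(a) === -1) second.links.push(a);
  }
}

function intake(jira: FakeJira, requirementKey: string, engineer: string): Ticket {
  // Step 1: fetch the full requirement ticket.
  const requirement = jira.get(requirementKey);

  // Step 2: reuse a linked task if one exists; otherwise create, assign, and link one.
  const existing = requirement.links
    .map((key) => jira.get(key))
    .find((ticket) => ticket.kind === "Task");
  const task =
    existing ?? jira.createTask(`Automate ${requirement.key}: ${requirement.summary}`, engineer);
  jira.link(requirement.key, task.key);

  // Step 3: move the task to In Progress so the board reflects who is working on it.
  task.status = "In Progress";
  return task;
}
```

Running `intake` twice for the same requirement returns the same task rather than creating a duplicate, which is the property the "checks if a linked task exists" step is there to guarantee.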
This phase asks the hard questions before writing a single line of code — a discipline most human developers skip.
Tests run with 4 parallel workers, matching the CI configuration. Running with fewer workers locally can mask race conditions that only surface as failures in GitHub Actions. The configuration is intentional.
Command: npm run test:parallel:tag. CI parity from the start: 4 parallel workers, no shortcuts.

04 Real Numbers From 250+ Production Runs
| Metric | Result |
|---|---|
| Test coverage before pipeline | 25% |
| Test coverage after 2 weeks | 100% |
| Requirements automated | 250+ |
| Test scenarios created | 1,800+ |
| Jira RTM tickets auto-created | 1,800+ |
| PRs merged | 250+ |
| Clean first-run passes (0 self-heal needed) | ~40% of tickets |
| Most self-heal iterations on a single ticket | 11 (19-scenario modal/multiselect run) |
| Parallel workers during test run | 4 (matching CI) |
| Lines of test code written manually | 0 |
| Human gates per ticket | 3 |
05 You Don’t Need to Know the Framework
In the first week, QA engineers were the only contributors. By the second week, developers were running it independently. Not because anyone ran a training session. Because the workflow abstracts the framework entirely.
When a developer drops a Jira ticket into the pipeline, here is everything they need to know:
- What the feature does (they already know: they built it)
- Whether the agent’s feature file covers the scenarios correctly (Human Gate 2)
- Whether the final PR diff looks reasonable (Human Final Review)
They don’t need to know what a page object pattern is. How Cucumber BDD syntax works. Why waitForFunction is not the same as waitForTimeout. How to scope a DataTables search selector to <tfoot> instead of <thead>. Why networkidle needs a .catch() guard in parallel execution. How to handle a Bootstrap static modal blocking Playwright navigation.
The pipeline knows all of this. The Known Issues Log has encoded every hard-won lesson from every previous PR. The Static Analysis agent enforces it. The institutional knowledge that previously lived only in the heads of the most experienced QA engineers is now available to everyone, every time, automatically.
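To make "the Static Analysis agent enforces it" concrete, here is a minimal sketch of what a rule-based gate over generated test code could look like. It is an assumption-laden illustration, not the article's implementation: the rule format, the rule ids, the `WARNING` severity, and the functions `analyse` and `passesGate` are hypothetical. Only the two checks themselves (no `waitForTimeout`, and a `.catch()` guard on `networkidle` waits) come from the lessons listed above.

```typescript
type Severity = "WARNING" | "BLOCKER"; // WARNING is an assumed non-blocking level

interface Rule {
  id: string;
  pattern: RegExp;    // tested against one source line at a time
  message: string;
  escalated: boolean; // escalated entries are treated as BLOCKERs
}

interface Finding {
  ruleId: string;
  line: number;
  severity: Severity;
  message: string;
}

// Illustrative rules only; the real Known Issues Log format is not shown in the article.
const rules: Rule[] = [
  {
    id: "no-fixed-sleep",
    pattern: /\.waitForTimeout\s*\(/,
    message: "Fixed sleep: poll for the condition with waitForFunction instead.",
    escalated: true,
  },
  {
    id: "guard-networkidle",
    // A networkidle wait with no .catch( later on the same line.
    pattern: /waitForLoadState\(\s*['"]networkidle['"][^)]*\)(?!.*\.catch\()/,
    message: "networkidle wait without a .catch() guard can reject under parallel execution.",
    escalated: false,
  },
];

// Scan the source line by line and report every rule that matches.
function analyse(source: string, activeRules: Rule[] = rules): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of activeRules) {
      if (rule.pattern.test(text)) {
        findings.push({
          ruleId: rule.id,
          line: index + 1,
          severity: rule.escalated ? "BLOCKER" : "WARNING",
          message: rule.message,
        });
      }
    }
  });
  return findings;
}

// The gate fails as soon as a single BLOCKER is present.
function passesGate(findings: Finding[]): boolean {
  return findings.every((finding) => finding.severity !== "BLOCKER");
}
```

A line-based regex scan is the simplest thing that could work and will miss multi-line calls; a production gate would more plausibly inspect the syntax tree.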
Coverage moved from 25% to 100% not because we hired more QA engineers. It moved because we removed the prerequisite that you had to be a QA engineer to contribute.
06 The Self-Improving System
The part worth dwelling on isn’t the speed. It’s that the system gets smarter with every PR.
The Pattern Tracker appends to the Known Issues Log after every merge. The Generate Test Code agent reads that log before writing. The Static Analysis agent treats escalated entries as BLOCKERs. The cycle is closed: every run teaches the pipeline something new, and that knowledge propagates into every future run automatically.
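The closed cycle can be sketched as a small occurrence counter. The thresholds (2+ occurrences escalate, 3+ become an auto-blocker) are the ones shown in the pipeline diagram; everything else here is assumed for illustration, including the class and method names, the `LOGGED` level for a first occurrence, and the choice to treat both escalated levels as blocking.

```typescript
type Level = "LOGGED" | "ESCALATED" | "AUTO-BLOCKER"; // LOGGED is an assumed first level

interface KnownIssue {
  pattern: string;
  occurrences: number;
  level: Level;
}

// Thresholds taken from the diagram: 2+ occurrences escalate, 3+ auto-block.
function levelFor(occurrences: number): Level {
  if (occurrences >= 3) return "AUTO-BLOCKER";
  if (occurrences >= 2) return "ESCALATED";
  return "LOGGED";
}

class KnownIssuesLog {
  private entries = new Map<string, KnownIssue>();

  // Called by a (hypothetical) Pattern Tracker after each merge.
  record(pattern: string): KnownIssue {
    const previous = this.entries.get(pattern);
    const occurrences = (previous ? previous.occurrences : 0) + 1;
    const issue: KnownIssue = { pattern, occurrences, level: levelFor(occurrences) };
    this.entries.set(pattern, issue);
    return issue;
  }

  // What a code-generation step would read before writing: every pattern seen so far.
  patternsToAvoid(): string[] {
    return Array.from(this.entries.values()).map((issue) => issue.pattern);
  }

  // What a static-analysis step would enforce: escalated entries block the merge.
  blockers(): string[] {
    return Array.from(this.entries.values())
      .filter((issue) => issue.level !== "LOGGED")
      .map((issue) => issue.pattern);
  }
}
```

The point of the design is that the same log serves two readers: the generator uses it as advice, and the gate uses the escalated subset as a hard rule.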
Early patterns — forgetting to scope DataTables <tfoot> search selectors, using waitForTimeout instead of polling with waitForFunction — were blockers in early tickets. They haven’t appeared in new code for weeks. We don’t document best practices in a wiki nobody reads. We encode them as hard gates that fire before code ships.
The self-heal agent’s effectiveness is bounded by the quality of Playwright’s error traces — which are rich enough that root cause diagnosis is almost always correct. The cases where it struggled were genuine upstream app defects. The agent correctly tagged those as @wip rather than attempting to fix the wrong thing. That judgment matters.
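The judgment described here, repair test defects but tag genuine app defects as @wip, can be sketched as a bounded loop. This is a sketch under stated assumptions rather than the agent's real logic: the hook names, the `Diagnosis` shape, the `gave-up` outcome, and the iteration cap are all hypothetical. Only the @wip tagging behaviour is taken from the text.

```typescript
type Diagnosis =
  | { kind: "test-defect"; fix: string }    // something repairable in the test code
  | { kind: "app-defect"; reason: string }; // upstream bug: do not "fix" the test

interface HealHooks {
  runTests: () => string | null;          // error trace, or null when the run is green
  diagnose: (trace: string) => Diagnosis; // root-cause analysis of the trace
  applyFix: (fix: string) => void;        // patch the generated test code
}

interface HealResult {
  outcome: "passed" | "tagged-wip" | "gave-up";
  iterations: number; // number of fixes applied; 0 means a clean first run
  tags: string[];
}

function selfHeal(hooks: HealHooks, maxIterations: number): HealResult {
  let iterations = 0;
  for (;;) {
    const trace = hooks.runTests();
    if (trace === null) return { outcome: "passed", iterations, tags: [] };

    const diagnosis = hooks.diagnose(trace);
    if (diagnosis.kind === "app-defect") {
      // Tag the scenario instead of bending the test around an application bug.
      return { outcome: "tagged-wip", iterations, tags: ["@wip"] };
    }

    // Assumed safety cap so a misdiagnosis cannot loop forever.
    if (iterations >= maxIterations) return { outcome: "gave-up", iterations, tags: [] };

    hooks.applyFix(diagnosis.fix);
    iterations += 1;
  }
}
```

Checking for an app defect before applying any fix is the important ordering: it keeps the loop from "healing" a test into agreeing with broken behaviour.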
07 What Full RTM Looks Like Now
Our regulatory compliance team needs evidence that every requirement has a tested scenario. Previously, that was a manually maintained spreadsheet — one QA engineer, one Friday afternoon, copying ticket numbers.
Now: every Jira requirement processed through the pipeline gets a Task ticket linked to the requirement, a set of Test tickets (one per scenario, each linked to the requirement), every ticket in the correct Jira state when the PR merges, and a live Jira link tree that is the RTM. When the compliance team pulls a Requirements Traceability Matrix report, it is populated from the pipeline output: real-time, always accurate, no manual entry.
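As a rough illustration of how a link tree becomes a matrix, the sketch below folds a flat list of requirement-to-ticket links into one RTM row per requirement. The data shapes, the `buildRtm` function, and the ticket keys are hypothetical; the article does not show how the report is actually generated.

```typescript
interface JiraLink {
  requirement: string;   // requirement ticket key
  ticket: string;        // linked ticket key
  type: "Task" | "Test"; // one Task per requirement, one Test per scenario
}

interface RtmRow {
  requirement: string;
  task: string | null;
  tests: string[];
  covered: boolean; // true when at least one Test ticket traces to the requirement
}

// Build one matrix row per requirement, including requirements with no links at all.
function buildRtm(requirements: string[], links: JiraLink[]): RtmRow[] {
  return requirements.map((requirement) => {
    const own = links.filter((link) => link.requirement === requirement);
    const taskLink = own.find((link) => link.type === "Task");
    const tests = own.filter((link) => link.type === "Test").map((link) => link.ticket);
    return {
      requirement,
      task: taskLink ? taskLink.ticket : null,
      tests,
      covered: tests.length > 0,
    };
  });
}

// Hypothetical example data.
const rtm = buildRtm(
  ["REQ-1", "REQ-2"],
  [
    { requirement: "REQ-1", ticket: "TASK-1", type: "Task" },
    { requirement: "REQ-1", ticket: "TEST-1", type: "Test" },
    { requirement: "REQ-1", ticket: "TEST-2", type: "Test" },
  ],
);
const uncovered = rtm.filter((row) => !row.covered).map((row) => row.requirement);
```

Iterating over the requirement list rather than the links is deliberate: a requirement with no links still gets a row, so gaps show up as `covered: false` instead of disappearing from the report.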
08 What Surprised Us
The DOM Gathering agent is the most valuable node in the pipeline. Wrong selectors are the number one cause of flaky tests. Having an agent browse the live application and extract precise locators — including hidden inputs, plugin-wrapped elements, and modal IDs — before a single test is written eliminated an entire class of failures that previously consumed most of the debugging time.
The Known Issues Log pattern escalation works. We were sceptical that a text file read by an agent would actually prevent repeat mistakes. It does. Patterns escalated as BLOCKERs have not recurred in new code. The scepticism was wrong.
Self-heal loops are bounded by error trace quality. Playwright’s output is rich enough that the agent almost always diagnoses the correct root cause. The hardest ticket — Bootstrap static backdrop modals blocking navigation, multiselect plugins hiding underlying <select> elements, SweetAlert toasts dismissed before the polling loop caught them — required 11 iterations. All resolved.
The future of QA automation isn’t faster typing. It’s removing the requirement to type at all.
09 What’s Next
Three areas we’re expanding into:
- 01 Expanding the Known Issues Log to include app behavior quirks — documented environments, feature flags, known defects — so the agent can make smarter scenario design decisions
- 02 Running the pipeline against the full requirements backlog to surface which tickets still have zero test coverage
- 03 Integrating test result uploads directly into Jira’s test execution records for full traceability from test run to evidence artifact
Chandrasekhar Boddi builds AI-powered QA automation systems for regulated medical device software at Sails Software. He specialises in agent orchestration pipelines, Playwright-based test frameworks, and Requirements Traceability Matrix automation for clinical and compliance-heavy environments.
Is Your Team’s Test Coverage Stuck?
If you have a backlog of untested requirements and a coverage number that hasn’t moved in months, we can show you what this pipeline looks like in practice. You don’t need a bigger QA team. You need a better pipeline.
