Agentic Coding: QA & Testing
AI agents can write tests, detect bugs, and even heal flaky tests — but only with the right setup. Here's how to build testing workflows that leverage agent capabilities without creating maintenance nightmares.
Last Updated: April 5, 2026
The Testing Paradox
AI agents generate code faster than ever. But code without tests is technical debt at scale.
The trap: "Generate unit tests. Aim for 100% coverage. We can always regenerate them."
Wrong. If tests change every time you refactor, they're not guarding behavior — they're guarding implementation. They become maintenance tax, not safety net.
What AI Testing Does Well
✅ Strengths
- Generating test scaffolding and boilerplate
- Creating test data and fixtures
- Writing edge case tests (boundary values, nulls, empty inputs)
- Translating requirements into test cases
- Explaining failing tests in natural language
- Identifying gaps in test coverage
❌ Limitations
- Understanding why a feature matters (business logic)
- Writing tests that survive refactoring
- Determining appropriate test scope
- Catching integration issues
- Non-functional testing (performance, UX)
Test Generation Strategies
Strategy 1: Property-Based Test Generation
AI generates tests based on properties that should always hold:
Property: "sorting should maintain all elements"
Test: generate random arrays, verify output is sorted permutation of input
Property: "auth should reject invalid tokens"
Test: generate various token formats, verify all rejected
Tools: Hypothesis (Python), fast-check (JS), PropEr (Erlang)
Strategy 2: Mutation Testing + AI
- Mutate source code (change operators, remove lines)
- Run test suite
- AI identifies mutations not caught by tests
- AI suggests tests to kill surviving mutants
Benefit: Tests improve based on actual gaps, not coverage metrics.
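The core mechanic can be shown in miniature: run the same tests against the original function and against a mutant with one operator changed, and check whether any test fails on (kills) the mutant. A toy sketch, with deliberately simple functions:

```javascript
// Function under test, and a mutant with '+' changed to '-'
const original = (a, b) => a + b;
const mutant = (a, b) => a - b;

// A weak suite: only checks a case where + and - agree
const weakSuite = [(f) => f(0, 0) === 0];
// A stronger suite with a test that distinguishes + from -
const strongSuite = [...weakSuite, (f) => f(2, 3) === 5];

// A mutant "survives" if every test still passes against it
function survives(mutantFn, suite) {
  return suite.every((test) => test(mutantFn));
}

console.log(survives(mutant, weakSuite));   // true — a gap in the tests
console.log(survives(mutant, strongSuite)); // false — mutant killed
```

A surviving mutant is exactly the signal the AI step above acts on: it points at a behavior change no test noticed, so a test should be added to kill it.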
Strategy 3: Behavior-Driven Generation
Start with behavior specifications, generate tests:
Spec: "When user adds item to cart, cart total updates"
AI generates:
- Test: add item → verify total increased
- Test: add multiple → verify sum correct
- Test: add zero-price item → verify handling
- Test: add invalid item → verify error
Self-Healing Tests
Flaky tests waste time. AI can help:
Flakiness Detection
- Run tests multiple times, detect non-determinism
- Identify timing-dependent assertions
- Find tests depending on external state
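The first detection step is mechanical enough to sketch: run the same test body repeatedly and flag it as flaky if it both passes and fails across runs. A minimal version, assuming a synchronous test function:

```javascript
// Run a test body N times; "flaky" means it neither always passes
// nor always fails across runs
function detectFlakiness(testFn, runs = 50) {
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    try { testFn(); passes++; } catch (_) { /* counted as a failure */ }
  }
  return { passes, failures: runs - passes, flaky: passes > 0 && passes < runs };
}

// A deliberately non-deterministic "test" for demonstration
const result = detectFlakiness(() => {
  if (Math.random() < 0.3) throw new Error("intermittent failure");
});
console.log(result); // flaky is almost certainly true over 50 runs
```

Real tools also vary the environment between runs (test order, clock, network stubs) to surface timing and shared-state dependencies, not just raw non-determinism.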
Healing Patterns
Original (flaky):
expect(element).toBeVisible() // fails if animation in progress
Healed:
await expect(element).toBeVisible({ timeout: 5000 })
// adds wait, more resilient
Original (flaky):
expect(Date.now()).toEqual(expectedTimestamp) // exact-millisecond match almost never passes
Healed:
expect(new Date().toISOString().slice(0, 10))
.toEqual(expectedDate) // compare the date only, not the time
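The "add a wait" pattern generalizes to a polling helper: retry an assertion until it passes or a timeout expires, instead of asserting once against state that may still be settling. A sketch of such a helper (names are illustrative; frameworks like Playwright build this retry behavior into their assertions):

```javascript
// Retry a throwing assertion until it passes or the timeout expires
async function eventually(assertion, { timeout = 5000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  let lastError;
  for (;;) {
    try { return assertion(); } catch (err) { lastError = err; }
    if (Date.now() >= deadline) throw lastError;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
}

// Usage sketch: passes once the condition becomes true within the timeout
// await eventually(() => { if (!element.isVisible()) throw new Error("not visible"); });
```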
Caution: Self-healing can mask real issues. Review all changes.
The Test Pyramid for AI
Traditional pyramid still applies:
      /\
     /  \          E2E tests (AI-assisted, human-written)
    /____\
   /      \        Integration tests (AI generates scaffold)
  /________\
 /          \      Unit tests (AI helps, humans review)
/____________\
Unit Tests
- AI: Generate boilerplate, edge cases
- Human: Review behavior coverage, remove brittle tests
Integration Tests
- AI: Generate API call sequences, mock setup
- Human: Define meaningful assertions
E2E Tests
- AI: Help with selectors, page objects
- Human: Define user journeys, critical paths
AI-Driven QA Workflows
Workflow 1: Continuous Test Generation
Developer writes feature
↓
AI suggests test cases based on code analysis
↓
Developer selects/rejects suggestions
↓
AI generates selected tests
↓
Developer reviews and refines
Workflow 2: Regression Test Selection
Code change committed
↓
AI analyzes impact (changed files, dependencies)
↓
AI selects subset of tests to run
↓
Run selected tests (fast feedback)
↓
Full suite runs in background
Benefit: 5-minute feedback instead of 30-minute full suite.
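The selection step can be sketched as a lookup over a test-to-source dependency map (the file names and map here are illustrative; real tools derive this map from import graphs or coverage data):

```javascript
// Map from test files to the source files they exercise (illustrative)
const testDependencies = {
  "auth.test.js": ["auth.js", "session.js"],
  "cart.test.js": ["cart.js", "pricing.js"],
  "checkout.test.js": ["cart.js", "payment.js"],
};

// Select only the tests whose dependencies intersect the change set
function selectTests(changedFiles, deps = testDependencies) {
  const changed = new Set(changedFiles);
  return Object.keys(deps).filter((test) =>
    deps[test].some((src) => changed.has(src))
  );
}

console.log(selectTests(["cart.js"]));
// → ["cart.test.js", "checkout.test.js"]
```

Jest's `--onlyChanged` flag implements a version of this using the module graph and version control state; an AI layer can go further by also weighing historical failure correlations.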
Workflow 3: Visual Regression + AI
UI change made
↓
Screenshots captured
↓
AI compares to baseline, flags differences
↓
Human reviews flagged changes
↓
Approve/reject each diff
↓
Update baselines or fix code
Testing Tool Integration
Via MCP (Model Context Protocol)
Testing tools can expose MCP servers:
Agent: "Run tests for changed files"
MCP Test Server: executes jest --onlyChanged
Agent: reads results, identifies failures
Agent: "Fix failing test in auth.test.js"
Agent: edits code, re-runs test
Testing-Specific Agents
| Tool | Testing Features |
|---|---|
| Cursor | Generate tests, run in agent loop |
| Claude Code | Test-driven development mode |
| Kiro | Spec-driven test generation |
| Momentic | AI-powered E2E testing platform |
| Chromatic | Visual testing, AI diff review |
Quality Gates with AI
Automated quality checkpoints:
Pre-commit:
✓ Type check passes
✓ Linter clean
✓ Unit tests for changed files pass
✓ AI: No obvious security issues detected
Pre-PR:
✓ All tests pass
✓ Coverage doesn't decrease
✓ AI: Architecture checks pass
✓ AI: Test quality review (no brittle tests)
Pre-merge:
✓ Integration tests pass
✓ Performance benchmarks (if applicable)
✓ Human: Code review approved
✓ Human: Test coverage acceptable
The "Survive Refactoring" Rule
The best test metric: Would this test survive a refactoring that doesn't change behavior?
Good test (survives refactoring):
test("applies discount to total", () => {
const cart = new Cart([item({ price: 100 })])
cart.applyDiscount(10) // 10% off
expect(cart.total).toBe(90)
})
Bad test (breaks on refactoring):
test("calculates total correctly", () => {
const cart = new Cart()
cart.items = [{ price: 100 }]
cart.discount = 10
expect(cart.calculate()).toBe(90)
})
Test behavior, not implementation.
AI Testing Best Practices
Do:
- ✅ Review AI-generated tests before committing
- ✅ Keep tests behavior-focused
- ✅ Use AI for edge cases you'd miss
- ✅ Let AI explain failures before you debug
- ✅ Generate tests from specifications, not just code
Don't:
- ❌ Blindly accept 100% coverage from AI
- ❌ Let AI tests become implementation snapshots
- ❌ Skip manual testing of critical paths
- ❌ Trust AI security tests without human review
- ❌ Generate tests after the fact — include in dev workflow
The Bottom Line
AI is a powerful testing assistant, not a replacement for testing strategy.
Effective AI testing workflow:
- Define behavior in specifications (human)
- Generate test scaffolding (AI)
- Review and refine assertions (human)
- Run tests, analyze failures (AI helps interpret)
- Iterate until green (human + AI)
Tests are documentation. Tests are safety nets. Don't let AI-generated slop undermine both.