Agentic Coding: QA & Testing

AI agents can write tests, detect bugs, and even heal flaky tests — but only with the right setup. Here's how to build testing workflows that leverage agent capabilities without creating maintenance nightmares.

Last Updated: April 5, 2026

The Testing Paradox

AI-assisted coding produces code faster than ever. But code without tests is technical debt at scale.

The trap: "Generate unit tests. Aim for 100% coverage. We can always regenerate them."

Wrong. If tests change every time you refactor, they're not guarding behavior — they're guarding implementation. They become maintenance tax, not safety net.

What AI Testing Does Well

✅ Strengths

  • Generating test scaffolding and boilerplate
  • Creating test data and fixtures
  • Writing edge case tests (boundary values, nulls, empty inputs)
  • Translating requirements into test cases
  • Explaining failing tests in natural language
  • Identifying gaps in test coverage

❌ Limitations

  • Understanding why a feature matters (business logic)
  • Writing tests that survive refactoring
  • Determining appropriate test scope
  • Catching integration issues
  • Non-functional testing (performance, UX)

Test Generation Strategies

Strategy 1: Property-Based Test Generation

AI generates tests based on properties that should always hold:

Property: "sorting should maintain all elements"
Test: generate random arrays, verify output is sorted permutation of input

Property: "auth should reject invalid tokens"
Test: generate various token formats, verify all rejected

Tools: Hypothesis (Python), fast-check (JS), PropEr (Erlang)

Strategy 2: Mutation Testing + AI

  1. Mutate source code (change operators, remove lines)
  2. Run test suite
  3. AI identifies mutations not caught by tests
  4. AI suggests tests to kill surviving mutants

Benefit: Tests improve based on actual gaps, not coverage metrics.
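A toy version of the mutate-and-check loop makes the idea concrete. Real mutation tools (Stryker, for example) mutate the AST at scale; this sketch uses string replacement on a single function and a deliberately weak test suite:

```javascript
// Mutation-testing sketch: mutate one operator, re-run the tests, and report
// mutants the suite fails to kill. String replacement stands in for real AST
// mutation here.
const source = "(a, b) => a + b";

// Deliberately weak tests: both pass for multiplication too.
const tests = [
  (fn) => fn(0, 0) === 0,
  (fn) => fn(2, 2) === 4,
];

const mutations = [
  { name: "plus-to-minus", apply: (src) => src.replace("+", "-") },
  { name: "plus-to-times", apply: (src) => src.replace("+", "*") },
];

const survivors = [];
for (const m of mutations) {
  const mutant = eval(m.apply(source)); // build the mutated function
  const killed = tests.some((t) => !t(mutant)); // any failing test kills it
  if (!killed) survivors.push(m.name);
}
console.log("surviving mutants:", survivors); // plus-to-times survives
// An AI step 4 suggestion: add a test like fn(2, 3) === 5 to kill it.
```

The surviving mutant points at a real gap: no test distinguishes addition from multiplication, which no coverage metric would reveal.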

Strategy 3: Behavior-Driven Generation

Start with behavior specifications, generate tests:

Spec: "When user adds item to cart, cart total updates"
AI generates:
- Test: add item → verify total increased
- Test: add multiple → verify sum correct
- Test: add zero-price item → verify handling
- Test: add invalid item → verify error

Self-Healing Tests

Flaky tests waste time. AI can help:

Flakiness Detection

  • Run tests multiple times, detect non-determinism
  • Identify timing-dependent assertions
  • Find tests depending on external state
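The first bullet, run-and-compare, is straightforward to sketch: execute a test repeatedly and flag mixed outcomes. Real runners also vary ordering and parallelism to surface state-dependent flakiness:

```javascript
// Flakiness detection sketch: run a test many times; a mix of passes and
// failures marks it flaky.
function detectFlakiness(testFn, runs = 50) {
  let passes = 0;
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      testFn();
      passes++;
    } catch {
      failures++;
    }
  }
  return { flaky: passes > 0 && failures > 0, passRate: passes / runs };
}

// A deliberately timing-like flaky test: fails roughly 30% of the time.
const result = detectFlakiness(() => {
  if (Math.random() < 0.3) throw new Error("raced the animation");
});
console.log(result); // flaky: true with overwhelming probability over 50 runs
```

A deterministic test reports `flaky: false` with a pass rate of 0 or 1; anything in between is a candidate for healing or quarantine.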

Healing Patterns

Original (flaky):
  expect(element).toBeVisible() // fails if animation in progress

Healed:
  await expect(element).toBeVisible({ timeout: 5000 })
  // adds wait, more resilient

Original (flaky):
  expect(Date.now()).toEqual(expectedTimestamp) // exact-millisecond match races the clock

Healed:
  expect(new Date().toISOString().slice(0, 10))
    .toEqual(expectedDate) // compare date only, not time of day

Caution: Self-healing can mask real issues. Review all changes.

The Test Pyramid for AI

Traditional pyramid still applies:

        /\
       /  \     E2E tests (AI-assisted, human-written)
      /____\
     /      \   Integration tests (AI generates scaffold)
    /________\
   /          \ Unit tests (AI helps, humans review)
  /____________\

Unit Tests

  • AI: Generate boilerplate, edge cases
  • Human: Review behavior coverage, remove brittle tests

Integration Tests

  • AI: Generate API call sequences, mock setup
  • Human: Define meaningful assertions

E2E Tests

  • AI: Help with selectors, page objects
  • Human: Define user journeys, critical paths

AI-Driven QA Workflows

Workflow 1: Continuous Test Generation

Developer writes feature
  ↓
AI suggests test cases based on code analysis
  ↓
Developer selects/rejects suggestions
  ↓
AI generates selected tests
  ↓
Developer reviews and refines

Workflow 2: Regression Test Selection

Code change committed
  ↓
AI analyzes impact (changed files, dependencies)
  ↓
AI selects subset of tests to run
  ↓
Run selected tests (fast feedback)
  ↓
Full suite runs in background

Benefit: 5-minute feedback instead of 30-minute full suite.
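The selection step reduces to a set intersection once you have a map from each test file to the source files it touches. The dependency map below is hand-written for illustration; real setups derive it from the module graph or per-test coverage data:

```javascript
// Regression test selection sketch: run only tests whose dependency set
// intersects the changed files. File names here are illustrative.
const testDeps = {
  "auth.test.js": ["auth.js", "token.js"],
  "cart.test.js": ["cart.js", "pricing.js"],
  "search.test.js": ["search.js"],
};

function selectTests(changedFiles, deps = testDeps) {
  const changed = new Set(changedFiles);
  return Object.keys(deps).filter((test) =>
    deps[test].some((file) => changed.has(file))
  );
}

console.log(selectTests(["pricing.js"])); // → [ 'cart.test.js' ]
console.log(selectTests(["token.js", "search.js"])); // auth + search tests
```

The full suite still runs in the background, so a stale dependency map costs you delayed feedback rather than a missed regression.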

Workflow 3: Visual Regression + AI

UI change made
  ↓
Screenshots captured
  ↓
AI compares to baseline, flags differences
  ↓
Human reviews flagged changes
  ↓
Approve/reject each diff
  ↓
Update baselines or fix code
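The compare-to-baseline step, at its core, is a pixel diff with a tolerance and a flagging threshold. This is a bare sketch; production visual-testing tools layer perceptual color metrics and anti-aliasing detection on top:

```javascript
// Visual-diff sketch: count pixels that differ beyond a per-channel tolerance
// in flat RGBA buffers, and flag the screenshot when the changed fraction
// exceeds a threshold.
function diffRatio(baseline, candidate, tolerance = 8) {
  if (baseline.length !== candidate.length) return 1; // size change: full diff
  let changed = 0;
  const pixels = baseline.length / 4; // RGBA: 4 bytes per pixel
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[p * 4 + c] - candidate[p * 4 + c]) > tolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixels;
}

const flagForReview = (ratio, threshold = 0.01) => ratio > threshold;
```

The tolerance absorbs rendering noise (sub-pixel font hinting, GPU differences) so only meaningful changes reach the human review step.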

Testing Tool Integration

Via MCP (Model Context Protocol)

Testing tools can expose MCP servers:

Agent: "Run tests for changed files"
MCP Test Server: executes jest --onlyChanged
Agent: reads results, identifies failures
Agent: "Fix failing test in auth.test.js"
Agent: edits code, re-runs test

Testing-Specific Agents

  • Cursor: generate tests, run in agent loop
  • Claude Code: test-driven development mode
  • Kiro: spec-driven test generation
  • Momentic: AI-powered E2E testing platform
  • Chromatic: visual testing, AI diff review

Quality Gates with AI

Automated quality checkpoints:

Pre-commit:
  ✓ Type check passes
  ✓ Linter clean
  ✓ Unit tests for changed files pass
  ✓ AI: No obvious security issues detected

Pre-PR:
  ✓ All tests pass
  ✓ Coverage doesn't decrease
  ✓ AI: Architecture checks pass
  ✓ AI: Test quality review (no brittle tests)

Pre-merge:
  ✓ Integration tests pass
  ✓ Performance benchmarks (if applicable)
  ✓ Human: Code review approved
  ✓ Human: Test coverage acceptable

The "Survive Refactoring" Rule

The best test metric: Would this test survive a refactoring that doesn't change behavior?

Good test (survives refactoring):

test("applies discount to total", () => {
  const cart = new Cart([item({ price: 100 })])
  cart.applyDiscount(10) // 10% off
  expect(cart.total).toBe(90)
})

Bad test (breaks on refactoring):

test("calculates total correctly", () => {
  const cart = new Cart()
  cart.items = [{ price: 100 }]
  cart.discount = 10
  expect(cart.calculate()).toBe(90)
})

Test behavior, not implementation.

AI Testing Best Practices

Do:

  • ✅ Review AI-generated tests before committing
  • ✅ Keep tests behavior-focused
  • ✅ Use AI for edge cases you'd miss
  • ✅ Let AI explain failures before you debug
  • ✅ Generate tests from specifications, not just code

Don't:

  • ❌ Blindly accept 100% coverage from AI
  • ❌ Let AI tests become implementation snapshots
  • ❌ Skip manual testing of critical paths
  • ❌ Trust AI security tests without human review
  • ❌ Generate tests after the fact — include in dev workflow

The Bottom Line

AI is a powerful testing assistant, not a replacement for testing strategy.

Effective AI testing workflow:

  1. Define behavior in specifications (human)
  2. Generate test scaffolding (AI)
  3. Review and refine assertions (human)
  4. Run tests, analyze failures (AI helps interpret)
  5. Iterate until green (human + AI)

Tests are documentation. Tests are safety nets. Don't let AI-generated slop undermine both.

Further Reading