Agentic Coding: QA & Testing

AI agents can write tests, detect bugs, and even heal flaky tests — but only with the right setup. Here's how to build testing workflows that leverage agent capabilities without creating maintenance nightmares.

Last Updated: April 5, 2026

The Testing Paradox

AI-assisted coding produces code faster than ever. But code without tests is technical debt at scale.

The trap: "Generate unit tests. Aim for 100% coverage. We can always regenerate them."

Wrong. If tests change every time you refactor, they're not guarding behavior — they're guarding implementation. They become maintenance tax, not safety net.

What AI Testing Does Well

✅ Strengths

  • Generating test scaffolding and boilerplate
  • Creating test data and fixtures
  • Writing edge case tests (boundary values, nulls, empty inputs)
  • Translating requirements into test cases
  • Explaining failing tests in natural language
  • Identifying gaps in test coverage

❌ Limitations

  • Understanding why a feature matters (business logic)
  • Writing tests that survive refactoring
  • Determining appropriate test scope
  • Catching integration issues
  • Non-functional testing (performance, UX)

Test Generation Strategies

Strategy 1: Property-Based Test Generation

AI generates tests based on properties that should always hold:

Property: "sorting should maintain all elements"
Test: generate random arrays, verify output is sorted permutation of input

Property: "auth should reject invalid tokens"
Test: generate various token formats, verify all rejected

Tools: Hypothesis (Python), fast-check (JS), PropEr (Erlang)

Strategy 2: Mutation Testing + AI

  1. Mutate source code (change operators, remove lines)
  2. Run test suite
  3. AI identifies mutations not caught by tests
  4. AI suggests tests to kill surviving mutants

Benefit: Tests improve based on actual gaps, not coverage metrics.
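A toy version of the mutate-and-check loop makes the idea concrete. Real mutation tools (Stryker, for example) mutate the AST at scale; this sketch uses string replacement on a single function and a deliberately weak test suite:

```javascript
// Mutation-testing sketch: mutate one operator, re-run the tests, and report
// mutants the suite fails to kill. String replacement stands in for real AST
// mutation here.
const source = "(a, b) => a + b";

// Deliberately weak tests: both pass for multiplication too.
const tests = [
  (fn) => fn(0, 0) === 0,
  (fn) => fn(2, 2) === 4,
];

const mutations = [
  { name: "plus-to-minus", apply: (src) => src.replace("+", "-") },
  { name: "plus-to-times", apply: (src) => src.replace("+", "*") },
];

const survivors = [];
for (const m of mutations) {
  const mutant = eval(m.apply(source)); // build the mutated function
  const killed = tests.some((t) => !t(mutant)); // any failing test kills it
  if (!killed) survivors.push(m.name);
}
console.log("surviving mutants:", survivors); // plus-to-times survives
// An AI step 4 suggestion: add a test like fn(2, 3) === 5 to kill it.
```

The surviving mutant points at a real gap: no test distinguishes addition from multiplication, which no coverage metric would reveal.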

Strategy 3: Behavior-Driven Generation

Start with behavior specifications, generate tests:

Spec: "When user adds item to cart, cart total updates"
AI generates:
- Test: add item → verify total increased
- Test: add multiple → verify sum correct
- Test: add zero-price item → verify handling
- Test: add invalid item → verify error

Self-Healing Tests

Flaky tests waste time. AI can help:

Flakiness Detection

  • Run tests multiple times, detect non-determinism
  • Identify timing-dependent assertions
  • Find tests depending on external state
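The first bullet, run-and-compare, is straightforward to sketch: execute a test repeatedly and flag mixed outcomes. Real runners also vary ordering and parallelism to surface state-dependent flakiness:

```javascript
// Flakiness detection sketch: run a test many times; a mix of passes and
// failures marks it flaky.
function detectFlakiness(testFn, runs = 50) {
  let passes = 0;
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      testFn();
      passes++;
    } catch {
      failures++;
    }
  }
  return { flaky: passes > 0 && failures > 0, passRate: passes / runs };
}

// A deliberately timing-like flaky test: fails roughly 30% of the time.
const result = detectFlakiness(() => {
  if (Math.random() < 0.3) throw new Error("raced the animation");
});
console.log(result); // flaky: true with overwhelming probability over 50 runs
```

A deterministic test reports `flaky: false` with a pass rate of 0 or 1; anything in between is a candidate for healing or quarantine.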

Healing Patterns

Original (flaky):
  expect(element).toBeVisible() // fails if animation in progress

Healed:
  await expect(element).toBeVisible({ timeout: 5000 })
  // adds wait, more resilient

Original (flaky):
  expect(Date.now()).toEqual(expectedTimestamp) // exact-millisecond match races the clock

Healed:
  expect(new Date().toISOString().slice(0, 10))
    .toEqual(expectedDate) // compare date only, not time of day

Caution: Self-healing can mask real issues. Review all changes.

The Test Pyramid for AI

Traditional pyramid still applies:

        /\
       /  \     E2E tests (AI-assisted, human-written)
      /____\
     /      \   Integration tests (AI generates scaffold)
    /________\
   /          \ Unit tests (AI helps, humans review)
  /____________\

Unit Tests

  • AI: Generate boilerplate, edge cases
  • Human: Review behavior coverage, remove brittle tests

Integration Tests

  • AI: Generate API call sequences, mock setup
  • Human: Define meaningful assertions

E2E Tests

  • AI: Help with selectors, page objects
  • Human: Define user journeys, critical paths

AI-Driven QA Workflows

Workflow 1: Continuous Test Generation

Developer writes feature
  ↓
AI suggests test cases based on code analysis
  ↓
Developer selects/rejects suggestions
  ↓
AI generates selected tests
  ↓
Developer reviews and refines

Workflow 2: Regression Test Selection

Code change committed
  ↓
AI analyzes impact (changed files, dependencies)
  ↓
AI selects subset of tests to run
  ↓
Run selected tests (fast feedback)
  ↓
Full suite runs in background

Benefit: 5-minute feedback instead of 30-minute full suite.
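The selection step reduces to a set intersection once you have a map from each test file to the source files it touches. The dependency map below is hand-written for illustration; real setups derive it from the module graph or per-test coverage data:

```javascript
// Regression test selection sketch: run only tests whose dependency set
// intersects the changed files. File names here are illustrative.
const testDeps = {
  "auth.test.js": ["auth.js", "token.js"],
  "cart.test.js": ["cart.js", "pricing.js"],
  "search.test.js": ["search.js"],
};

function selectTests(changedFiles, deps = testDeps) {
  const changed = new Set(changedFiles);
  return Object.keys(deps).filter((test) =>
    deps[test].some((file) => changed.has(file))
  );
}

console.log(selectTests(["pricing.js"])); // → [ 'cart.test.js' ]
console.log(selectTests(["token.js", "search.js"])); // auth + search tests
```

The full suite still runs in the background, so a stale dependency map costs you delayed feedback rather than a missed regression.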

Workflow 3: Visual Regression + AI

UI change made
  ↓
Screenshots captured
  ↓
AI compares to baseline, flags differences
  ↓
Human reviews flagged changes
  ↓
Approve/reject each diff
  ↓
Update baselines or fix code
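The compare-to-baseline step, at its core, is a pixel diff with a tolerance and a flagging threshold. This is a bare sketch; production visual-testing tools layer perceptual color metrics and anti-aliasing detection on top:

```javascript
// Visual-diff sketch: count pixels that differ beyond a per-channel tolerance
// in flat RGBA buffers, and flag the screenshot when the changed fraction
// exceeds a threshold.
function diffRatio(baseline, candidate, tolerance = 8) {
  if (baseline.length !== candidate.length) return 1; // size change: full diff
  let changed = 0;
  const pixels = baseline.length / 4; // RGBA: 4 bytes per pixel
  for (let p = 0; p < pixels; p++) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(baseline[p * 4 + c] - candidate[p * 4 + c]) > tolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixels;
}

const flagForReview = (ratio, threshold = 0.01) => ratio > threshold;
```

The tolerance absorbs rendering noise (sub-pixel font hinting, GPU differences) so only meaningful changes reach the human review step.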

Testing Tool Integration

Via MCP (Model Context Protocol)

Testing tools can expose MCP servers:

Agent: "Run tests for changed files"
MCP Test Server: executes jest --onlyChanged
Agent: reads results, identifies failures
Agent: "Fix failing test in auth.test.js"
Agent: edits code, re-runs test

Testing-Specific Agents

  • Cursor: generate tests, run in agent loop
  • Claude Code: test-driven development mode
  • Kiro: spec-driven test generation
  • Momentic: AI-powered E2E testing platform
  • Chromatic: visual testing, AI diff review

Quality Gates with AI

Automated quality checkpoints:

Pre-commit:
  ✓ Type check passes
  ✓ Linter clean
  ✓ Unit tests for changed files pass
  ✓ AI: No obvious security issues detected

Pre-PR:
  ✓ All tests pass
  ✓ Coverage doesn't decrease
  ✓ AI: Architecture checks pass
  ✓ AI: Test quality review (no brittle tests)

Pre-merge:
  ✓ Integration tests pass
  ✓ Performance benchmarks (if applicable)
  ✓ Human: Code review approved
  ✓ Human: Test coverage acceptable

The "Survive Refactoring" Rule

The best test metric: Would this test survive a refactoring that doesn't change behavior?

Good test (survives refactoring):

test("applies discount to total", () => {
  const cart = new Cart([item({ price: 100 })])
  cart.applyDiscount(10) // 10% off
  expect(cart.total).toBe(90)
})

Bad test (breaks on refactoring):

test("calculates total correctly", () => {
  const cart = new Cart()
  cart.items = [{ price: 100 }]
  cart.discount = 10
  expect(cart.calculate()).toBe(90)
})

Test behavior, not implementation.

AI Testing Best Practices

Do:

  • ✅ Review AI-generated tests before committing
  • ✅ Keep tests behavior-focused
  • ✅ Use AI for edge cases you'd miss
  • ✅ Let AI explain failures before you debug
  • ✅ Generate tests from specifications, not just code

Don't:

  • ❌ Blindly accept 100% coverage from AI
  • ❌ Let AI tests become implementation snapshots
  • ❌ Skip manual testing of critical paths
  • ❌ Trust AI security tests without human review
  • ❌ Generate tests after the fact — include in dev workflow

The Bottom Line

AI is a powerful testing assistant, not a replacement for testing strategy.

Effective AI testing workflow:

  1. Define behavior in specifications (human)
  2. Generate test scaffolding (AI)
  3. Review and refine assertions (human)
  4. Run tests, analyze failures (AI helps interpret)
  5. Iterate until green (human + AI)

Tests are documentation. Tests are safety nets. Don't let AI-generated slop undermine both.

Further Reading