claude-code

Lumo Turns Mobile UI Taste Into Deterministic Tools Claude Code Can Actually Call

Anatoliy Kolodkin

16 May 2026 • 3 min read

Mobile UI review is one of the places where AI agents sound confident while being dangerously imprecise. A model can declare a screen “clean,” “modern,” or “accessible” without measuring a single contrast ratio or tap target. Lumo, a new Claude Code-compatible plugin and MCP package, is interesting because it pushes the agent away from aesthetic vibes and toward tools that actually measure things.

The repository was created on May 16 and had 0 stars and 0 forks at research time, so this is early software, not a mature ecosystem bet. But the pattern is worth paying attention to. Lumo exposes four command-line tools — lumo-wcag, lumo-theory, lumo-parity, and lumo-mcp — for mobile UI/UX checks grounded in WCAG contrast, Fitts, Hick, Gestalt, Nielsen, Apple HIG, and Material Design. The install paths include npx @onexeor/lumo init, npx skills add OneXeor-Dev/lumo, Claude Code plugin marketplace, pipx install lumo-mobile, and manual clone.

The agent should orchestrate. The ruler should measure.

The most credible AI developer tools are not the ones that ask the model to become an expert at everything. They are the ones that give the model deterministic instruments. Lumo fits that philosophy. Claude Code can explain findings, propose patches, and coordinate changes across files. The actual contrast math, parity comparison, and layout heuristics come from callable tools with structured output.

That distinction matters because UI quality has a lot of “looks right” review theater. Accessibility is the clearest example. A pale blue button on a white background may look acceptable in a screenshot and still fail WCAG contrast. Lumo’s README example moves #7DD3FC on white from a 1.667:1 ratio to 4.779:1 using #1B7BA1 after 14 OKLCH iterations. The important detail is not just that the number passes. It is that the correction preserves hue/chroma instead of smashing brand color through naive RGB darkening.

That is the kind of output engineers can review. “Make it accessible” becomes a concrete before/after diff with ratios, colors, and a strategy. A human designer can still disagree, but now the disagreement is about tradeoffs rather than whether the assistant’s taste had a good day.

Cross-platform parity is where agents need receipts

Lumo’s parity checker may be even more useful for teams shipping Android and iOS together. It compares Android Compose/XML in dp against iOS SwiftUI/UIKit in pt, flags mismatches, and whitelists known platform defaults like Material’s 48dp and Apple HIG’s 44pt. The example output reports 6 findings: 1 high, 3 medium, and 2 info, including a missing iOS FAB and an Android/iOS height mismatch.

This is exactly the sort of work that falls between design QA, code review, and “we’ll fix it later.” Compose and SwiftUI screens drift. XML and UIKit screens drift. A card gets 16dp padding on Android and 48pt on iOS. A floating action button exists on one platform but not the other. None of those failures require superintelligence to find. They require the agent to stop guessing and call a parity tool.

The practitioner value is concrete: wire this class of check into implementation, not just final QA. Ask Claude Code to run contrast and parity while building the feature. Have it attach findings to the PR. Let deterministic checks block objective regressions and leave interpretive layout feedback as comments. That keeps the agent useful without pretending it has perfect design judgment.

Heuristics are helpful; gates need restraint

The lumo-theory side is more nuanced. Tap targets, Fitts difficulty, Hick overload in equal-weight choice groups, Gestalt proximity violations, and one-handed reachability are all useful review lenses. They are also not laws of physics. Context matters. A dense professional tool, a casual consumer app, and a game UI can make different choices for good reasons.

That means teams should treat Lumo’s interpretive findings differently from contrast and platform parity. Contrast ratios and missing platform counterparts can be objective enough for CI in many cases. Cognitive-science layout warnings should start as PR comments with severity labels and escape hatches. The fastest way to kill a useful agent tool is to let it block releases on noisy taste policing.

There is also the supply-chain angle. Claude Code plugins and MCP packages can package skills, tools, hooks, binaries, MCP servers, and settings. A brand-new plugin with no adoption signal should be reviewed before it touches a real repo. Inspect the manifest, understand what commands run, test on a sample project, and pin versions where possible. “It improves accessibility” is not a reason to skip the same plugin hygiene you would apply to any tool in the agent runtime.

The broader lesson is bigger than Lumo. Domain plugins get useful when they turn squishy review categories into structured checks. A backend agent should call a database migration verifier. A security agent should call a protocol-aware scanner. A mobile agent should call contrast, parity, and layout tools. The model’s job is to connect context, explain tradeoffs, and produce patches. The tool’s job is to measure things the model should not be trusted to eyeball.

If you build mobile software, start with the boring gates: WCAG contrast and Android/iOS parity. Run them during feature work. Put the output in PRs. Let the agent fix the obvious failures before a human reviewer spends attention on them. The best coding agent is not the one with the fanciest taste. It is the one that knows when to ask a ruler.

Sources: Lumo repository, Claude Code plugin docs, Model Context Protocol overview, Claude Code MCP docs

The agent should orchestrate. The ruler should measure.

Cross-platform parity is where agents need receipts

Heuristics are helpful; gates need restraint

Sign up for more like this.