The Problem with Selectors
Selector-based automation ties your workflows to implementation details. Whether you're automating an Android app, a Windows desktop application, a React web app, or an automotive infotainment system — the pattern is the same: find an element by its ID, class, or XPath, then interact with it.
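To make the coupling concrete, here is a toy sketch (not any real framework) of the selector pattern: the script looks up elements by ID in a simulated DOM, so a simple rename on the developer side breaks it.

```python
# The UI as the automation script sees it: selector -> element label.
ui_v1 = {"#login-btn": "Log in", "#user-field": "Username"}

# After a routine refactor, a developer renames the button's ID.
ui_v2 = {"#signin-btn": "Log in", "#user-field": "Username"}

def click(dom, selector):
    """Find an element by selector and 'interact' with it."""
    if selector not in dom:
        raise LookupError(f"no element matches {selector!r}")
    return f"clicked {dom[selector]}"

print(click(ui_v1, "#login-btn"))  # works against the old UI
# click(ui_v2, "#login-btn")       # raises LookupError: the selector is stale
```

Nothing about the user-visible button changed, yet the automation fails, which is exactly the maintenance cycle described below.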
Every UI update triggers a maintenance cycle. A developer renames a button, refactors a component, or updates a framework — and suddenly dozens of automations fail. Teams spend more time fixing scripts than building new ones.
The Cross-Platform Challenge
Modern applications span multiple platforms. A single product might include a React web dashboard, native Android and iOS apps, an Electron desktop client, and embedded displays.
Traditional automation requires different tools for each: Selenium for web, Appium for mobile, WinAppDriver for Windows, custom solutions for embedded systems. Each tool has its own API, selector syntax, and maintenance burden.
And some platforms don't support selectors at all. Embedded HMI systems, legacy desktop applications, games, and canvas-based UIs have no accessible DOM. You either can't automate them, or you resort to fragile image-based approaches that break with every pixel change.
The Multimodal Reality
Modern interfaces aren't just touchscreens. Users interact through multiple channels simultaneously — and your automation needs to handle all of them.
Mobile apps combine touch gestures, voice commands, camera input, and biometric authentication. Desktop applications mix keyboard shortcuts, mouse interactions, drag-and-drop, and clipboard operations. Automotive infotainment adds physical knobs, steering wheel buttons, and voice assistants. Industrial HMI includes ruggedized touchscreens, physical controls, and hardware integrations.
Traditional automation requires separate frameworks for each input type. But real user journeys combine them — a voice command while scrolling, a gesture after typing. Agentic automation handles multimodal interactions natively, automating the experience as users actually use it.
How Agentic Automation Works
Instead of writing code that manipulates UI elements by selector, you define what you want to automate, and the agent figures out how. Task definitions live in spreadsheets or natural language.
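For illustration, a spreadsheet-style task definition might look like the following (the column names and wording here are hypothetical, not a prescribed schema):

```text
Step | Instruction                        | Expected result
1    | Open the settings screen           | Settings page is visible
2    | Tap "Wi-Fi" and switch it on       | Wi-Fi toggle shows "On"
3    | Say "connect to HomeNetwork"       | Status shows "Connected"
```

Each row states intent and expected outcome in user-facing terms; nothing references an ID, class, or XPath.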
The agent reads instructions, observes the screen with computer vision, executes actions, verifies results, and generates evidence — screenshots, logs, and reports for every step. It doesn't need to know how a button was rendered or what framework built it. It just needs to see it.
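The loop above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical stand-ins (`observe_screen`, `perform`, `verify`) for the vision model and input driver; it shows the shape of the observe-act-verify cycle and the per-step evidence it accumulates, not a real implementation.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    step: str
    screenshot: str  # path to the frame captured before acting
    passed: bool

def run_task(steps, observe_screen, perform, verify):
    """Execute natural-language steps, collecting evidence for each one."""
    evidence = []
    for step in steps:
        frame = observe_screen()             # computer vision: what is on screen?
        perform(step, frame)                 # act on what was seen, not on a selector
        ok = verify(step, observe_screen())  # did the UI reach the expected state?
        evidence.append(Evidence(step, frame, ok))
    return evidence
```

Because the loop only consumes pixels and instructions, the same cycle applies whether the screen belongs to a web app, a native app, or an embedded HMI.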
Benefits
For automation teams: Define tasks in spreadsheets. Automate across web, mobile, desktop, and embedded with one tool. Reduce maintenance by 80%+. Get audit-ready evidence automatically.
For development: Decouple automation logic from UI implementation. Refactor freely without breaking workflows. Integrate with CI/CD. Ship faster without automation bottlenecks.
For compliance: Full traceability from requirement to execution. Detailed reporting with every step documented and validated against expected behavior. Consistent coverage across all platforms.
The Bottom Line
Whether you're automating a mobile app, a desktop application, a web dashboard, or an embedded HMI system — users don't interact with selectors. They interact with what they see.
That's the difference between automating implementation and automating experience.