With Claude's Computer Use feature generating hype for what it can do on your desktop, it is safe to say that AI will eventually reach everybody.
AI models can now understand images well enough to perform solid Visual Question Answering: detecting relations between objects. This is a big step forward, because humans solve this task effortlessly, yet computer vision had failed at it ever since the field's beginnings in the 1960s.
With the rapid growth in computing power (namely GPUs) and the rise of Large Language Models (LLMs), AI models can now reason well enough to decide what needs to be done on a User Interface (UI) to achieve a goal.
But unfortunately, a critical component needed for widespread adoption in real businesses is often missing from the demos: a reliable Device UI Controller that can act as a real human user. At AskUI we believe that true UI automation is only possible if you control and automate your UI like a real human: with mouse movements, keypresses, and clicks/taps.
Only then can everybody build reliable and intelligent Vision Agents for their use case. In this blog post we discuss what a working Device UI Controller requires.
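To make concrete what "controlling the UI like a real human" means, here is a minimal, hypothetical sketch of the primitives such a controller has to expose. All names are illustrative, not AskUI's actual API; the logging backend stands in for a real OS-level implementation:

```python
from abc import ABC, abstractmethod


class DeviceController(ABC):
    """Hypothetical interface: the human-like input primitives a
    Device UI Controller must provide."""

    @abstractmethod
    def move_mouse(self, x: int, y: int) -> None: ...

    @abstractmethod
    def click(self, button: str = "left") -> None: ...

    @abstractmethod
    def type_text(self, text: str) -> None: ...


class LoggingController(DeviceController):
    """Dummy backend that records actions instead of touching a device."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def move_mouse(self, x: int, y: int) -> None:
        self.actions.append(f"move({x},{y})")

    def click(self, button: str = "left") -> None:
        self.actions.append(f"click({button})")

    def type_text(self, text: str) -> None:
        self.actions.append(f"type({text!r})")


# A Vision Agent drives the controller the way a human would:
ctrl = LoggingController()
ctrl.move_mouse(120, 340)
ctrl.click()
ctrl.type_text("hello")
```

The point of the abstraction is that the agent's reasoning stays the same while the backend that synthesizes real mouse and keyboard events differs per operating system.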
Missing From The Demos: A Reliable Device UI Controller
While the demos look impressive, there are massive hindrances to using them anywhere beyond an impressive demo. Most demos rely on a library such as PyAutoGUI. These libraries serve their purpose well, but they cannot be used in enterprise production applications because there are too many edge cases where they fall flat:
- Cross-platform compatibility
- Real Unicode character typing
- Multi-screen support
- Typing into a command line
- No need for administrator permissions
- Support for all desktop and native mobile operating systems
- Application selection
- Process visualization
If you check the current landscape, no tool or library fulfills all of these requirements. Most work on only a few operating systems or struggle with real Unicode character typing, which renders them fairly useless for business applications.
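The Unicode problem, for example, stems from how most of these libraries synthesize keystrokes: each character is looked up in a fixed keyboard-layout table (usually US), and anything outside that table simply cannot be typed. The sketch below is a simplified stand-in for such a table, not any library's actual implementation, but it illustrates the failure mode:

```python
# Simplified stand-in for a fixed US-layout key table, as used by
# keystroke-based typing libraries (illustrative, not a real library's table).
US_LAYOUT_KEYS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789 !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~\t\n"
)


def typeable(text: str) -> bool:
    """True only if every character has a key in the layout table."""
    return all(ch in US_LAYOUT_KEYS for ch in text)


print(typeable("Hello, World!"))  # plain ASCII works
print(typeable("Héllo wörld"))    # accented characters have no key mapping
print(typeable("Preis: 42€"))     # currency symbols outside the table fail too
```

A controller that injects text at the operating-system level (rather than replaying per-character keystrokes against one layout) avoids this class of failure entirely.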
AskUI's Controller: Production-Ready for Intelligent Vision Agents
AskUI developed its own Device UI Controller from scratch with all of these requirements in mind, and continues to extend it. By integrating it deeply into the operating system we achieve superior performance and features on each platform:
- Cross-platform compatibility: Windows, Linux, macOS, Android
- Multi-screen support
- Real Unicode character typing
- Typing into a command line
- No need for administrator rights
- Application selection
- Process visualization
Up and coming:
- iOS support
- In-background automation
- Native Tasks
- Video Streaming
Conclusion
The missing link in all the demos, a reliable Device UI Controller that can act as a real human user, is already available today and ready for building Agentic AI.
Check out our AskUI Vision Agent implementation.
And if you want to use AskUI's Device Controller to build reliable, enterprise production-ready Agents with Vision: