Transforming UI Automation: AskUI and LLM

October 16, 2023
Revolutionizing UI Automation: Unleashing AskUI's Power with GPT. Discover how AskUI's integration with advanced language models like GPT is reshaping automation, from intuitive workflows to translating natural language into automation commands, and explore the future of effortless UI automation.

Introduction

The importance of rapidly prototyping and testing ideas cannot be overstated in today's fast-paced technology landscape. With the emergence of powerful Large Language Models (LLMs) such as GPT, businesses now have a remarkable resource for speeding up these critical processes. In this article, we share our vision for AskUI and its potential integration with LLMs, and illustrate how it could change the way real-world automation workflows are built.

Check out the original here: LLMs Prototype

Catering to Enterprises

AskUI is a UI automation tool. While many people think of Selenium when they hear “UI automation,” AskUI goes further. Traditional tools like Selenium depend on the underlying website’s code. But what happens when that code changes? Or when you need to automate tasks in desktop applications? AskUI instead uses vision-based techniques, similar to human perception, to identify elements. Because it sees the screen the way a person does, you can instruct it much as you would instruct your testing team, for example to click the red button or the signup button. Through object detection and other techniques, AskUI is a robust automation tool that is not limited to web applications.

Empowering Natural Language

One of the distinctive attributes of AskUI is its user-friendly Domain Specific Language (DSL). For instance, a command like 'aui.click().button().withText("Hello World").exec();' is designed to be self-explanatory. However, because we aim to serve a broader audience, from financial analysts at large institutions to everyday users, we recognized the potential for an even more intuitive way to interact with AskUI. Our goal is to convert natural language instructions directly into AskUI DSL commands.
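To illustrate why a chain like this reads almost like a sentence, here is a minimal sketch of the fluent-interface pattern the DSL follows. The FluentCommand class and its methods are hypothetical stand-ins written only for this example: they record each call and return a description, whereas AskUI's real exec() runs the instruction against the UI.

```typescript
// Hypothetical stand-in for a fluent command builder. It is NOT the AskUI API;
// it only records each chained call so the resulting "sentence" can be inspected.
class FluentCommand {
  private parts: string[] = [];

  click(): this {
    this.parts.push("click");
    return this; // returning `this` is what makes the calls chainable
  }

  button(): this {
    this.parts.push("button");
    return this;
  }

  withText(text: string): this {
    this.parts.push(`with text "${text}"`);
    return this;
  }

  // In AskUI, exec() executes the instruction; here it just returns the description.
  exec(): string {
    return this.parts.join(" ");
  }
}

// Hypothetical entry point mirroring the `aui` object used in the article.
const aui = { click: () => new FluentCommand().click() };

console.log(aui.click().button().withText("Hello World").exec());
// click button with text "Hello World"
```

Each method narrows the instruction a little further, so the full chain can be read left to right as plain English.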

Translating Natural Language to DSL

AskUI framework DSL

Rather than immediately diving into the complexities of building a machine translation model, we turned to the capabilities of GPT and other LLMs. By feeding them our documentation and list of commands, we were able to quickly prototype a system that translates natural language steps into our DSL. We only had to provide the existing functions (like get, await, etc.) and the end goal, and GPT would generate the entire workflow. For instance, if the goal was to “click on the SignUp button,” GPT would generate the DSL command aui.click().button().withText("SignUp").exec();
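The prototyping loop described above can be sketched in two parts: assembling a prompt from the documentation, the command list, and the end goal, and then keeping only the lines of the model's reply that look like DSL statements. This is a minimal sketch under stated assumptions: PromptInput, buildPrompt, extractDslStatements, the prompt wording, and the validation pattern are all hypothetical, and the actual call to the language model is omitted.

```typescript
// Hypothetical inputs for the translation prompt.
interface PromptInput {
  documentation: string; // excerpt of the DSL documentation
  commands: string[]; // available functions, e.g. "click", "get"
  goal: string; // the user's end goal in plain English
}

// Assemble the context the article describes: docs, commands, and the goal.
// The exact wording is an assumption, not AskUI's production prompt.
function buildPrompt({ documentation, commands, goal }: PromptInput): string {
  return [
    "You translate UI automation goals into AskUI DSL statements.",
    `Documentation:\n${documentation}`,
    `Available commands: ${commands.join(", ")}`,
    `Goal: ${goal}`,
    "Answer with one DSL statement per line and nothing else.",
  ].join("\n\n");
}

// Models often wrap code in extra prose, so keep only lines shaped like
// `aui.<chain>.exec();`, optionally prefixed with `await`.
function extractDslStatements(modelOutput: string): string[] {
  const pattern = /^(await\s+)?aui\.\S.*\.exec\(\);$/;
  return modelOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => pattern.test(line));
}
```

A reply such as "Here is the workflow:" followed by aui.click().button().withText("SignUp").exec(); would reduce to that single statement. Filtering the output this way is one simple guard against conversational filler; it does not check that the statement is semantically correct.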

Natural Language to DSL Demo

The results went beyond translation: they demonstrated GPT's potential not just to interpret instructions, but to strategically plan and execute entire workflows based on a defined end goal. This could represent a transformative moment for us. We could potentially use GPT to generate workflows for our users while giving them the flexibility to fine-tune and customize those workflows as needed. This approach isn't just about saving time; it lets users stay focused on their ultimate goals rather than on the intricacies of the steps required to achieve them.

Streamlining Workflows with Vision

Expanding on this vision, we've imagined a future where users may not even need to type commands explicitly. With AskUI’s workflows, users can define a series of steps using screenshots: for each step, they specify the action, and our engine executes it. But what if we could automate this specification process? What if users could simply perform the intended action themselves, and the corresponding AskUI command were recorded automatically?
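A screenshot-based workflow like this can be modeled as a list of steps, each pairing a screenshot with an action. The sketch below is a hypothetical data model (WorkflowStep, Action, toDsl, and toScript are names invented for illustration, not AskUI's workflow format) that renders the steps into AskUI-style DSL statements, assuming each click is described by an element type and the element's text.

```typescript
// Hypothetical action types a workflow step might carry.
type Action =
  | { kind: "click"; elementType: string; text: string } // e.g. "button", "SignUp"
  | { kind: "type"; value: string }; // text to type into the focused field

// Hypothetical step: the screenshot the step was defined on, plus its action.
interface WorkflowStep {
  screenshot: string; // path or ID of the screenshot
  action: Action;
}

// Render one action as an AskUI-style DSL statement using a fixed template.
function toDsl(action: Action): string {
  if (action.kind === "click") {
    // elementType is assumed to match a DSL element method such as button().
    return `await aui.click().${action.elementType}().withText(${JSON.stringify(action.text)}).exec();`;
  }
  return `await aui.type(${JSON.stringify(action.value)}).exec();`;
}

// Turn an ordered list of steps into a script, one statement per line.
function toScript(steps: WorkflowStep[]): string {
  return steps.map((step) => toDsl(step.action)).join("\n");
}
```

Keeping the screenshot alongside each action means a recorded step can later be reviewed or re-specified visually, while the script is regenerated from the actions.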

Easy Workflow: Click to command conversion

In this scenario, the AskUI inference engine identifies all visible UI elements on the screen. Because the engine also returns the positions of the identified elements, we can map the click coordinates to the element at that position. Using this element and the relevant documentation as context, GPT can then generate the corresponding DSL command. This approach could streamline workflow creation, since users would demonstrate each step instead of writing it, and in turn improve their overall experience with AskUI.
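The coordinate-mapping step can be sketched as a hit test: find the detected element whose bounding box contains the click, preferring the smallest box when boxes are nested. This is a minimal sketch, assuming a hypothetical DetectedElement shape (type, text, and a pixel bounding box) rather than AskUI's actual inference response, and it uses a fixed string template where the article describes GPT generating the command.

```typescript
// Hypothetical shape of one element returned by a UI element detector.
// Field names are assumptions for illustration, not AskUI's response format.
interface DetectedElement {
  type: string; // e.g. "button" or "textfield"
  text: string; // text recognized on the element
  x: number; // left edge, in screen pixels
  y: number; // top edge, in screen pixels
  width: number;
  height: number;
}

// Return the element whose bounding box contains the click. When boxes are
// nested (a button inside a form, say), the smallest box is the most specific.
function elementAt(
  elements: DetectedElement[],
  clickX: number,
  clickY: number,
): DetectedElement | undefined {
  let best: DetectedElement | undefined;
  for (const el of elements) {
    const inside =
      clickX >= el.x &&
      clickX <= el.x + el.width &&
      clickY >= el.y &&
      clickY <= el.y + el.height;
    if (!inside) continue;
    if (best === undefined || el.width * el.height < best.width * best.height) {
      best = el;
    }
  }
  return best;
}

// Template stand-in for the GPT step: turn the hit element into a DSL line.
function clickToCommand(
  elements: DetectedElement[],
  clickX: number,
  clickY: number,
): string | undefined {
  const el = elementAt(elements, clickX, clickY);
  if (el === undefined) return undefined; // click landed on no detected element
  return `await aui.click().${el.type}().withText(${JSON.stringify(el.text)}).exec();`;
}
```

Passing the hit element and the documentation to GPT instead of a fixed template, as described above, would let the model choose a more robust element description when, for example, several buttons share the same text.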

Conclusion

The integration of AskUI with LLMs like GPT has shown us how UI automation could be reshaped into an intuitive, efficient, and powerful tool for achieving users’ automation goals. AskUI will continue exploring ways to improve the automation experience, so stay tuned for updates and news!

Sign up for a free AskUI trial to test its visual UI automation features.

Murali Kondragunta