The AskUI Vision Agent lets users automate applications with natural language commands. One of its standout features is the Act command, act(), which takes a plain-language instruction and carries out actions such as typing, clicking, or navigating through interfaces. To help you get the most out of it, here are five essential tips for using the Act command effectively, along with code examples.
1. Use "simulate" for Enhanced Stability
When using the Act command, the keyword simulate can improve stability and accuracy. It mimics user interactions more precisely, ensuring that actions like typing or clicking are performed as a real user would.
Example:
agent.act("simulate a user clicking into the textfield and typing username: xyz and password: 123456")
By simulating actions rather than directly executing them, you reduce errors caused by UI inconsistencies or timing issues.
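If many prompts in a script should carry the keyword, a small helper can add it consistently. The following is a minimal sketch: with_simulate is a hypothetical helper name, not part of AskUI, and it only builds the prompt string that would then be handed to act().

```python
def with_simulate(action: str) -> str:
    """Return the action phrased as a 'Simulate a user ...' prompt.

    Hypothetical helper (not part of AskUI): it only builds the string.
    """
    text = action.strip()
    # Leave prompts untouched if they already start with the keyword.
    if text.lower().startswith("simulate"):
        return text
    return f"Simulate a user {text}"


prompt = with_simulate("clicking into the textfield and typing username: xyz")
print(prompt)
```

The result can be passed on as agent.act(prompt), where agent is the same object used in the examples of this article.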
2. Add "if" Conditions for Context Awareness
To ensure reliable execution, include conditions in the prompt, phrased as if statements. For example, proceed with an action only if specific fields (e.g., search bars) are empty, or tell the agent what to do if an action has no visible effect. This approach prevents unintended overwrites or conflicts in native apps.
Example:
agent.act("Click on Login button, if the screen does not change then that probably means that you forgot to enter the username and password, which is username: xyz and password: 123456")
This technique ensures the agent adapts dynamically to the application's state, improving robustness in automation workflows.
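The "action, then if-clause" pattern from the example can also be assembled from its parts, which keeps the fallback wording consistent across prompts. This is only a sketch under assumptions: with_fallback and its parameter names are hypothetical, and the function produces a string for act() rather than calling AskUI itself.

```python
def with_fallback(action: str, symptom: str, remedy: str) -> str:
    """Build 'ACTION. If SYMPTOM, REMEDY.' as a single prompt string.

    Hypothetical helper (not part of AskUI).
    """
    # Normalise trailing punctuation so the pieces join cleanly.
    action = action.strip().rstrip(".")
    symptom = symptom.strip().rstrip(".,")
    remedy = remedy.strip().rstrip(".")
    return f"{action}. If {symptom}, {remedy}."


prompt = with_fallback(
    "Click on the Login button",
    "the screen does not change",
    "enter username: xyz and password: 123456, then click on the Login button again",
)
print(prompt)
```

Spelling out the symptom ("the screen does not change") separately from the remedy makes it easier to reuse the same recovery instruction in several prompts.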
3. Break Actions Into Step-by-Step Instructions
Complex tasks can be simplified by breaking them into smaller steps. This makes debugging easier and enhances reliability by ensuring each step is executed sequentially.
Example:
agent.act("""
Simulate a user doing the following actions:
1. Click on the Textfield below text username
2. Type in username: xyz
3. Click on the Textfield below text password
4. Type in password: 123456
5. Click on the Login button
""")
This approach is particularly useful for multi-step processes like form filling or playing games (e.g., Blackjack), where precision is critical.
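When the steps already live in a Python list (for example, loaded from a test case), the numbered prompt shown above can be generated instead of typed by hand. A minimal sketch, assuming a hypothetical helper named numbered_steps that is not part of AskUI:

```python
from typing import Sequence

DEFAULT_INTRO = "Simulate a user doing the following actions:"


def numbered_steps(steps: Sequence[str], intro: str = DEFAULT_INTRO) -> str:
    """Join an intro line and the steps into a 1-based numbered prompt.

    Hypothetical helper (not part of AskUI): it only formats text.
    """
    if not steps:
        raise ValueError("at least one step is required")
    lines = [intro]
    # enumerate(..., start=1) yields the "1.", "2.", ... prefixes.
    lines.extend(f"{i}. {step.strip()}" for i, step in enumerate(steps, start=1))
    return "\n".join(lines)


login_prompt = numbered_steps([
    "Click on the Textfield below text username",
    "Type in username: xyz",
    "Click on the Textfield below text password",
    "Type in password: 123456",
    "Click on the Login button",
])
print(login_prompt)
```

Keeping the steps in a list also makes it straightforward to rerun only the first few steps while debugging, e.g. numbered_steps(steps[:2]).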
4. Add Verification Conditions for Stability
Verification conditions are crucial when dealing with dynamic or ambiguous application states. For example, you can instruct the agent to proceed only if certain fields (e.g., search bars) are empty or specific UI elements are visible.
Example:
agent.act("""
Simulate a user doing the following actions:
1. Click on the Textfield below text username
2. Type in username: xyz
3. Click on the Textfield below text password
4. Type in password: 123456
5. Click on the Login button
Only proceed if the search bar is empty.
""")
This ensures that your automation script adapts dynamically to real-time application states, preventing errors like overwriting existing data.
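Several verification conditions can be attached to one prompt in a uniform way. The sketch below is an assumption-laden illustration: with_preconditions is a hypothetical helper, and both placing the checks before the steps and the closing "otherwise stop" are design choices made here, not something the Act command requires.

```python
from typing import Sequence


def with_preconditions(prompt: str, conditions: Sequence[str]) -> str:
    """Prepend a checklist of conditions to an existing prompt.

    Hypothetical helper (not part of AskUI): it only formats text.
    """
    if not conditions:
        return prompt.strip()
    checks = "\n".join(f"- {condition.strip()}" for condition in conditions)
    return (
        "Before doing anything, check that all of the following are true:\n"
        f"{checks}\n"
        "Only proceed if every check passes; otherwise stop.\n"
        f"{prompt.strip()}"
    )


guarded_prompt = with_preconditions(
    "Simulate a user clicking on the Login button",
    ["the search bar is empty", "the Login button is visible"],
)
print(guarded_prompt)
```

Stating the conditions first means the agent reads them before the actions they guard; whether this ordering works better than a trailing condition is worth testing against your own application.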
5. Use Multiple Approaches for Actions
Flexibility is key when automating tasks, especially for repetitive actions like deleting text. The AskUI Vision Agent allows you to use multiple approaches for the same action, ensuring compatibility across different scenarios.
Example:
# Approach 1: Delete using backspace
agent.act("Simulate deleting the text 'xyz' by pressing backspace")
# Approach 2: Delete using right-click and delete option
agent.act("Simulate deleting the text 'xyz' by right-clicking and pressing delete")
# Approach 3: Delete using cmd+a and backspace
agent.act("Simulate deleting the text 'xyz' by pressing command + a, and afterwards pressing backspace")
By combining multiple approaches, you increase flexibility and ensure that your automation script works across various environments and input methods.
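The three approaches can also be tried one after another until one of them works. The following is a rough sketch, not AskUI functionality: try_approaches is a hypothetical helper, the succeeded callback is a caller-supplied check (how to verify the result on a real screen is left open here), and fake_act stands in for agent.act so the example runs without a screen.

```python
from typing import Callable, Iterable, List, Optional


def try_approaches(
    act: Callable[[str], None],
    prompts: Iterable[str],
    succeeded: Callable[[], bool],
) -> Optional[str]:
    """Send prompts in order; return the first one after which succeeded() is true.

    Hypothetical helper (not part of AskUI). Returns None if no approach worked.
    """
    for prompt in prompts:
        act(prompt)
        if succeeded():
            return prompt
    return None


# Stand-in for agent.act: records the prompts instead of driving a UI.
sent: List[str] = []


def fake_act(prompt: str) -> None:
    sent.append(prompt)


approaches = [
    "Simulate deleting the text 'xyz' by pressing backspace",
    "Simulate deleting the text 'xyz' by right-clicking and pressing delete",
    "Simulate deleting the text 'xyz' by pressing command + a, and afterwards pressing backspace",
]

# For the demo, pretend the text is gone after the second attempt.
used = try_approaches(fake_act, approaches, succeeded=lambda: len(sent) >= 2)
print(used)
```

With a real agent, act would be agent.act from the examples above, and succeeded would be whatever check fits your application, such as a lookup in application state or a manual assertion in a test.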
Why These Tips Matter
The AskUI Vision Agent's Act command leverages cutting-edge AI capabilities to automate tasks across diverse platforms (Windows, macOS, Linux, Android, iOS). By applying these tips:
- You enhance stability and reliability in your automation workflows.
- You improve context awareness for dynamic applications.
- You simplify complex tasks into manageable steps.
- You ensure precision when interacting with ambiguous UI elements.
- You increase flexibility by incorporating multiple approaches to achieve actions.
If you have any feedback, feel free to join our Discord and share what you were able to automate or build!