TLDR
Step-by-step guide to automating web search on Android devices using AskUI's VisionAgent + Python. Works with any UI without relying on DOM or accessibility IDs.
Introduction
Mobile automation doesn't have to be complex. This tutorial shows you how to automate a web search on Android using AskUI and Python, starting from your device's home screen.
Unlike traditional tools that break when UI layouts change, AskUI uses computer vision to interact with your device the way a human would, making your automation code resilient and adaptable.
Why Vision-Based Android Automation?
AskUI vs Traditional Automation Tools
| Traditional Tools | Vision-Based (AskUI) |
|---|---|
| Needs accessibility IDs | Works with any visible element |
| Breaks when UI updates | Adapts to visual changes |
| Can't automate games/canvas | Automates anything visible |
| Requires app source code | Works on production apps |
| Single app only | Cross-app workflows |
Key Benefits:
Universal Automation
AskUI sees the screen like a user does, enabling automation across:
- Native Android apps
- Web browsers
- Hybrid applications
- Games and custom UIs
Resilience to Changes
Visual recognition means your automation survives:
- Button repositioning
- Theme updates
- Layout modifications
- Minor UI redesigns
Human-Readable Code
Write intuitive commands like:
agent.click(loc.Text("Search"))
agent.type("wikipedia")
agent.wait(2)
Prerequisites Checklist
Before starting, ensure you have:
- Android Device Setup - Device/emulator with USB debugging enabled
- ADB Installed - Android Debug Bridge for device connection
- AskUI Controller App - Running on your Android device
- Python 3.8+ - With pip package manager
- AskUI Account - For workspace ID and access token
Follow the complete Android setup guide for detailed instructions.
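Before diving in, a quick sanity check from the host machine can save debugging time later. The sketch below only verifies host-side prerequisites (Python version and `adb` on the PATH, plus whether any device is attached); the AskUI Controller app still has to be confirmed on the device itself:

```python
import shutil
import subprocess
import sys

def check_host_prerequisites():
    """Return a list of host-side setup problems (empty list = looks good)."""
    issues = []
    if sys.version_info < (3, 8):
        issues.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    if shutil.which("adb") is None:
        issues.append("adb not found on PATH - install Android platform-tools")
    else:
        # 'adb devices' prints a header line, then one line per device
        out = subprocess.run(["adb", "devices"], capture_output=True, text=True)
        devices = [line for line in out.stdout.splitlines()[1:] if line.strip()]
        if not devices:
            issues.append("no devices reported by 'adb devices'")
    return issues

# Usage:
# for problem in check_host_prerequisites():
#     print("WARNING:", problem)
```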
Step 1: Set Up Your Python Environment
# Create and activate virtual environment
python -m venv askui_env
source askui_env/bin/activate # Windows: askui_env\Scripts\activate
# Install AskUI SDK
pip install askui
# Install python-dotenv for credentials management
pip install python-dotenv
# Create .env file for credentials
echo "ASKUI_WORKSPACE_ID=your_workspace_id" >> .env
echo "ASKUI_TOKEN=your_access_token" >> .env
Step 2: Write Your Automation Code
Create search_automation.py:
from askui import VisionAgent
from askui import locators as loc
import logging
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

def automate_wikipedia_search():
    """
    Automates Wikipedia search on Android device using VisionAgent.
    Prerequisites: AskUI Controller running on Android device.
    """
    # Initialize VisionAgent with credentials
    with VisionAgent(
        workspace_id=os.getenv('ASKUI_WORKSPACE_ID'),
        token=os.getenv('ASKUI_TOKEN'),
        log_level=logging.INFO
    ) as agent:
        try:
            print("Starting Android automation...")

            # Step 1: Open Chrome browser
            print("Opening Chrome...")
            agent.click(loc.Text("Chrome"))
            agent.wait(3)  # Wait for Chrome to load

            # Step 2: Click search bar
            print("Finding search field...")
            # Try different text variations that might appear
            try:
                agent.click(loc.Text("Search or type web address"))
            except Exception:
                # Fallback for different UI text
                agent.click(loc.Text("Search"))

            # Step 3: Type search query
            print("Typing search query...")
            agent.type("wikipedia")

            # Step 4: Submit search (Enter key)
            print("Submitting search...")
            agent.keyboard('enter')
            agent.wait(2)  # Wait for results

            # Step 5: Handle potential cookie consent
            try:
                # Check if cookie popup exists
                agent.locate(loc.Text("Accept"))
                agent.click(loc.Text("Accept"))
                agent.wait(1)
            except Exception:
                # No cookie popup, continue
                pass

            # Step 6: Click Wikipedia link
            print("Clicking Wikipedia result...")
            agent.click(
                loc.Text("Wikipedia")
                .nearest_to(loc.Text("wikipedia.org"))
            )
            agent.wait(2)

            # Step 7: Verify page loaded
            print("Verifying Wikipedia page...")
            try:
                # Check that a visible "Wikipedia" element is present on the page
                agent.locate(loc.Text("Wikipedia"))
                print("Success! Wikipedia page opened.")
                return True
            except Exception:
                print("Warning: Could not verify Wikipedia page")
                return False

        except Exception as e:
            print(f"Error: {e}")
            print("\nTroubleshooting tips:")
            print("1. Ensure AskUI Controller is running on Android")
            print("2. Check ADB connection: adb devices")
            print("3. Verify Chrome is installed on device")
            return False

if __name__ == "__main__":
    automate_wikipedia_search()

Version Notice: This code uses AskUI VisionAgent patterns. As the Python SDK evolves, some method names or import paths may vary. Consult the latest documentation for updates.
Step 3: Run Your Automation
Execute the automation:
python search_automation.py
Expected flow:
- Connects to Android device via AskUI Controller
- Visually locates and clicks Chrome icon
- Finds and interacts with search field
- Types "wikipedia" and submits search
- Handles cookie popups if present
- Clicks on Wikipedia result
- Verifies the page loaded
Advanced: Making Your Code More Robust
Handle Different UI States
from askui import VisionAgent
from askui import locators as loc

def open_chrome_flexible(agent):
    """
    Opens Chrome with fallback strategies
    """
    strategies = [
        # Strategy 1: Direct click on Chrome text
        lambda: agent.click(loc.Text("Chrome")),
        # Strategy 2: Click on Chrome in app drawer
        lambda: (
            agent.click(loc.Text("Apps")),
            agent.wait(1),
            agent.click(loc.Text("Chrome"))
        ),
        # Strategy 3: Use AI element detection
        lambda: agent.click(loc.AiElement("chrome-icon"))
    ]
    for i, strategy in enumerate(strategies, 1):
        try:
            print(f"Trying strategy {i}...")
            strategy()
            return True
        except Exception:
            continue
    raise Exception("Could not open Chrome with any strategy")

Visual Debugging
def debug_screen(agent):
    """
    Analyze what's visible on screen
    """
    # Use AI to describe screen
    screen_content = agent.get("What text and buttons are visible?")
    print(f"Screen analysis: {screen_content}")

    # Check for specific elements
    elements_to_check = ["Chrome", "Search", "Apps"]
    for element in elements_to_check:
        try:
            agent.locate(loc.Text(element))
            print(f"Found: {element}")
        except Exception:
            print(f"Not found: {element}")

Using Visual Relationships
# Click button near specific text
agent.click(
    loc.Text("Add to cart")
    .below_of(loc.Text("Product Name"))
)

# Click icon to the right of text
agent.click(
    loc.AiElement("settings-icon")
    .right_of(loc.Text("Options"))
)

# Complex relationships
agent.click(
    loc.Text("Submit")
    .above_of(loc.Text("Cancel"))
    .nearest_to(loc.Text("Form"))
)

Troubleshooting Guide
Common Issues and Solutions
| Issue | Solution |
|---|---|
| Connection refused | Ensure AskUI Controller app is running on your Android device |
| Element not found | Use agent.get() to analyze screen content |
| Slow performance | Add agent.wait() between actions |
| Chrome not opening | Try different locator strategies (Text, AiElement, Prompt) |
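For the "element not found" and timing issues above, a small retry wrapper often helps more than scattering `agent.wait()` calls. This is a generic sketch, not part of the AskUI API; it retries any zero-argument action with a fixed delay between attempts:

```python
import time

def retry(action, attempts=3, delay=1.0):
    """Call a zero-argument action, retrying on failure.

    Returns the action's result on the first success; re-raises
    the last exception if every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as e:
            last_error = e
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error

# Usage with the agent (locator text is just an example):
# retry(lambda: agent.click(loc.Text("Search")), attempts=3, delay=2)
```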
Debugging Techniques
# Check element existence
def element_exists(agent, locator):
    try:
        agent.locate(locator)
        return True
    except Exception:
        return False

# Wait for element
def wait_for_element(agent, locator, timeout=10):
    import time
    start = time.time()
    while time.time() - start < timeout:
        if element_exists(agent, locator):
            return True
        agent.wait(0.5)
    return False

# Usage
if wait_for_element(agent, loc.Text("Welcome")):
    print("Page loaded")

Use Cases
AskUI VisionAgent excels at:
- Cross-app workflows - Automate across multiple applications
- Legacy app testing - No source code needed
- Competitor analysis - Test any app on Play Store
- Visual regression - Detect UI changes automatically
- Game automation - Works with any visual interface
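For cross-app workflows in particular, a simple step-runner pattern keeps long flows readable and tells you exactly which step broke. The runner below is our own convention, not an AskUI API; the commented agent calls are hypothetical examples in the style of this tutorial:

```python
def run_workflow(steps):
    """Run (description, action) pairs in order.

    Stops at the first failure and reports which step broke,
    which makes long cross-app flows much easier to debug.
    """
    for number, (description, action) in enumerate(steps, 1):
        try:
            action()
        except Exception as e:
            return False, f"Step {number} failed ({description}): {e}"
    return True, "All steps completed"

# Hypothetical cross-app flow (agent and loc as in the tutorial):
# ok, message = run_workflow([
#     ("open Chrome",   lambda: agent.click(loc.Text("Chrome"))),
#     ("search",        lambda: agent.type("wikipedia")),
#     ("open Settings", lambda: agent.click(loc.Text("Settings"))),
# ])
# print(message)
```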
FAQ
Q: Do I need an AskUI account for local Android testing?
A: Yes, you need workspace credentials even for local device testing. Sign up at askui.com.
Q: Can I use this with emulators?
A: Yes, AskUI works with Android emulators that support ADB connections.
Q: How does VisionAgent differ from traditional selectors?
A: VisionAgent uses computer vision to "see" the screen, while traditional tools rely on code-level element identifiers.
Q: What Android versions are supported?
A: Android 6.0 (API 23) and above with AskUI Controller installed.
Next Steps
Ready to build more complex automations?
- Cross-app workflows - Automate between multiple apps
- Visual assertions - Verify UI states visually
- Parallel execution - Run on multiple devices
- CI/CD integration - Add to your testing pipeline
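As a starting point for visual assertions, `agent.locate()` can be wrapped so that a missing element raises an `AssertionError` your test runner understands. The wrapper is our own convention (a sketch, not an AskUI API):

```python
def assert_visible(agent, locator, message=None):
    """Raise AssertionError if the locator cannot be found on screen."""
    try:
        agent.locate(locator)
    except Exception as e:
        raise AssertionError(
            message or f"Expected element not visible: {locator}"
        ) from e

# Usage (hypothetical):
# assert_visible(agent, loc.Text("Wikipedia"), message="Wikipedia page did not load")
```

Because it raises a plain `AssertionError`, this plugs directly into pytest or unittest without any extra glue.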
