Tutorial · 7 min read · February 13, 2026

    Getting Started: Computer-Use Agents with the AskUI Python SDK

    Learn how to build computer-use agents with the AskUI Python SDK. Run your first VisionAgent (agent.act/agent.get), then make runs debuggable and repeatable with Tool Store tools like screenshots, file I/O, and LoadImageTool.

    Jonas Menesklou

    TLDR

    Computer-use agents let AI operate real user interfaces by perceiving what’s on screen and taking OS-level actions, useful when selectors are missing or unstable. With the AskUI Python SDK, you can run intent-based instructions with agent.act() / agent.get() and attach Tool Store tools (e.g., screenshots, file tools) to make runs easier to debug and reuse. If you’re automating a web-only app with stable DOM selectors, Playwright may be simpler. AskUI fits when you need cross-app, OS-level automation beyond the DOM.

    Note on naming
This guide reflects the AskUI Python SDK naming introduced in the v0.23.1 release. In code, you typically create a VisionAgent.

    Introduction

    In the agent era, automation is shifting from brittle selector scripts to agents that can execute intent directly on interfaces users actually operate. A computer-use agent perceives what’s on screen and takes OS-level actions such as click, type, and navigate, making it practical when DOM-based automation is fragile or not available.

    What you’ll build in this guide

    1. Run a first intent-based agent with VisionAgent
    2. Save a screenshot artifact for debugging
    3. Parameterize a run with input.txt → write results to output/result.txt

    If you want a deeper architecture explanation, read: Understanding AskUI: The Eyes and Hands of AI Agents

Computer-Use Agents vs. Selector-Based Automation

Traditional automation relies on DOM selectors or object identifiers. Computer-use agents rely on visible cues such as text, layout, icons, and images, which makes them useful when selectors are missing or brittle. The table and the short code contrast below it make the difference concrete.

| Feature | Computer-use agents | Selector-based tools |
| --- | --- | --- |
| Element targeting | Visual cues such as text, layout, icons, and images | DOM selectors such as id, class, XPath, and CSS |
| Most likely to break when | UI changes visually in meaningful ways | DOM structure or selectors change |
| App coverage | Any UI you can see on screen | Mostly web apps with an accessible DOM |
| Maintenance | Lower selector maintenance, more resilient to refactors | Ongoing selector maintenance and brittleness over time |
| Best fit | Desktop apps, virtualized environments, kiosks, and custom-rendered UIs | Modern web apps with stable selectors |
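To see the contrast in code, here is a minimal sketch. The Playwright selector and the button label are illustrative, not taken from a real app:

# Selector-based (Playwright-style): breaks if the element's id changes.
#   page.click("#submit")

# Computer-use (AskUI): targets what a user actually sees on screen.
from askui import VisionAgent

with VisionAgent() as agent:
    agent.act("Click the 'Submit' button")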

    Key Applications and Use Cases

Computer-use agents are especially useful when:

• You need to automate beyond stable DOM selectors (desktop apps, virtualized environments, custom-rendered UIs)
• The UI is canvas-based or custom-rendered, where selectors are brittle or missing
• You want end-to-end workflows that reflect real user behavior and catch UI regressions

    They also work well for:

• Cross-application workflows
• Document-assisted processes where saving screenshots improves debugging and auditability

    Getting Started: Build Your First Agent with Python

    Prerequisites

• Python 3.10 or higher
• VS Code or any Python IDE
    • Windows, macOS, or Linux

Step 1: Installation

    pip install "askui[all]"

For the latest installation notes and platform-specific extras, see the docs.
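To confirm the install worked, a quick import check is enough. This is a minimal sketch; the package may not expose a __version__ attribute in every release, so the script falls back to a plain confirmation:

# If this import succeeds, the SDK is installed in the active environment.
import askui

print(getattr(askui, "__version__", "askui imported successfully"))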

    Step 2: Sign up with AskUI

    To run the examples, you’ll need an AskUI workspace and access token.

    1. Sign up at hub.askui.com.
    2. Copy your Workspace ID and Access Token from the Hub.

Step 3: Configure environment variables

    macOS / Linux:

export ASKUI_WORKSPACE_ID="<your-workspace-id>"
export ASKUI_TOKEN="<your-access-token>"

    Windows PowerShell:

$env:ASKUI_WORKSPACE_ID="<your-workspace-id>"
$env:ASKUI_TOKEN="<your-access-token>"

    Optional (Anthropic models):

    export ANTHROPIC_API_KEY="<your-anthropic-api-key>"
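Before moving on, you can sanity-check that the credentials are actually visible to Python. A minimal sketch using only the variable names above:

import os

# Confirm the required credentials are set before running the agent scripts.
# ANTHROPIC_API_KEY is only needed if you use Anthropic models.
for var in ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN"):
    print(var, "is set" if os.environ.get(var) else "is MISSING")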

Step 4: Verify your setup with a first script

    Create a file named agent_demo.py:

from askui import VisionAgent

def main():
    with VisionAgent() as agent:
        agent.act(
            "Open a browser, go to wikipedia.org, open the English Wikipedia main page, "
            "and find the 'On this day' section."
        )
        text = agent.get(
            "Read the first bullet point under 'On this day' and return it as plain text."
        )
        print(f"\n📌 On this day: {text}\n")

if __name__ == "__main__":
    main()

    Run it:

    python agent_demo.py

If you see logs in the terminal and a line like 📌 On this day: ..., you’re ready: your agent is operating the interface and extracting information from the screen.

    Step 5: Save artifacts (screenshots) for faster debugging

    Screenshots are a useful debugging artifact because they show what the agent actually saw on the screen. With AskUI’s Tool Store, you can attach optional tools to your runs to capture artifacts like screenshots for easier debugging and repeatability.

5.1 Create the screenshots folder (first-time setup)

Create a local folder where screenshots can be written:

    mkdir -p screenshots

5.2 Run the same flow + save a screenshot

Create a new file named agent_demo_with_artifacts.py:

from askui import VisionAgent
from askui.tools.store.computer import ComputerSaveScreenshotTool
from askui.tools.store.universal import PrintToConsoleTool

def main():
    with VisionAgent() as agent:
        agent.act(
            "Open a browser, go to wikipedia.org, open the English Wikipedia main page.",
        )
        agent.act(
            "Now take a screenshot and save it into the screenshots folder (e.g., wiki.png). "
            "Also print 'screenshot saved'.",
            tools=[
                ComputerSaveScreenshotTool(base_dir="./screenshots"),
                PrintToConsoleTool(),
            ],
        )

if __name__ == "__main__":
    main()

    Tip: For more reliable results, keep “do the task” and “save an artifact” as separate agent.act() calls.

    Run it:

    python agent_demo_with_artifacts.py

    After it finishes, check:

    ls -la screenshots

    You should see at least one image saved in the ./screenshots folder.

    Note: PrintToConsoleTool is optional. It’s nice if you want extra log messages, but screenshots do not require it.

    Step 6: Extend runs with Tool Store (files)

Once your first run works, the next step is making it repeatable: move inputs and outputs into files so you can rerun the same flow with different data and keep artifacts for debugging. Tool Store’s universal file tools (ReadFromFileTool / WriteToFileTool) let you read inputs from disk and persist outputs back to files.

    6.1 Create an input file

    Create input.txt:

    echo "Artificial intelligence" > input.txt

6.2 Read input → run the flow → write output

Create a new file named agent_demo_with_files.py:

from askui import VisionAgent
from askui.tools.store.universal import ReadFromFileTool, WriteToFileTool

def main():
    with VisionAgent() as agent:
        agent.act(
            "1) Use ReadFromFileTool to read 'input.txt'. "
            "2) Open a browser, go to wikipedia.org, search for that exact text, and open the first result. "
            "3) Read the first sentence of the article introduction. "
            "4) Save ONLY that first sentence into 'result.txt' using WriteToFileTool.",
            tools=[
                ReadFromFileTool(base_dir="."),
                WriteToFileTool(base_dir="./output"),
            ],
        )
        print("\n✅ Done. Check ./output/result.txt\n")

if __name__ == "__main__":
    main()

    Run it:

    python agent_demo_with_files.py

    Check the output:

    cat output/result.txt

At this point you have a reusable run (see the example after this list):

    • Change one line in input.txt
    • Run the script again
    • Get a new result in ./output/result.txt
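For example (the topic below is just an illustrative input):

echo "Quantum computing" > input.txt
python agent_demo_with_files.py
cat output/result.txt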

6.3 Load an image from disk (optional)

If your workflow needs a reference image (e.g., compare against a baseline screenshot), you can load images from disk with LoadImageTool for analysis or visual inspection.

from askui import VisionAgent
from askui.tools.store.universal import LoadImageTool

def main():
    with VisionAgent() as agent:
        agent.act(
            "Describe the logo image called './images/logo.png'.",
            tools=[LoadImageTool(base_dir="./images")],
        )

if __name__ == "__main__":
    main()
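As a variation on the same tool, here is a hedged sketch of the baseline-comparison idea mentioned above. The file ./images/baseline.png and the comparison instruction are illustrative assumptions, not a documented recipe:

from askui import VisionAgent
from askui.tools.store.universal import LoadImageTool

def main():
    with VisionAgent() as agent:
        # Load a reference image from disk and ask the agent to compare it
        # with what is currently visible on screen (illustrative instruction).
        agent.act(
            "Load './images/baseline.png' with LoadImageTool, compare it to the "
            "current screen, and describe any visible differences.",
            tools=[LoadImageTool(base_dir="./images")],
        )

if __name__ == "__main__":
    main()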

    Step 7: Where to go next

    At this point you have a working baseline:

    • Run intent-based automation with VisionAgent (agent.act() / agent.get())
    • Capture screenshots as debugging artifacts
    • Parameterize runs with input.txt → persist results to output/result.txt

    Next, you can level this up by:

    • Saving screenshots around key actions to make debugging faster
    • Splitting long instructions into smaller steps for stability
    • Adding more Tool Store tools as your workflows grow

    Demo project (optional)

    If you want a complete end-to-end example (Tool Store, custom tools, CSV-driven steps, caching, and HTML reports), check out the AskUI Demo Project.

    For more examples and platform-specific setup, see the AskUI documentation.

