Benchmarks

Proven on real-world benchmarks

Independently evaluated on desktop and mobile benchmarks built from real-world tasks, not synthetic tests.

OSWorld: desktop automation

Android World: mobile automation

94.8% Pass@1 on Android World (first-attempt success)

+14.8 points above the human baseline (80.0%)

OSWorld

Desktop OS automation

A unified environment for evaluating open-ended computer tasks on Ubuntu, Windows, and macOS, testing real-world automation with arbitrary applications.

Ranked #1 on the OSWorld leaderboard, 21 points ahead of second place (66.2 vs. 45.2)

Generalizes across different operating systems

Works with arbitrary applications without pre-training

OSWorld Leaderboard (success rate, %)

AskUI: 66.2
GTA1 w/ o3: 45.2
OpenAI CUA o3: 42.9
UI-TARS-1.5: 42.5
Agent S2 w/ Gemini 2.5: 41.4
Human baseline (reference): 72.4

Android World Leaderboard (success rate)

AGI-0: 97.4%
AskUI: 94.8%
DroidRun: 91.4%
Surfer 2: 87.1%
gbox.ai: 86.2%
Human baseline (reference): 80.0%

Android World

Mobile device automation

A comprehensive benchmark for mobile device automation that evaluates agents on real Android tasks and reports first-attempt success rates.

94.8% first-attempt (Pass@1) task success rate

Outperforms the human baseline (80.0%) by 14.8 percentage points

Tested on real Android device interactions

Get started

Start building in minutes.

Free trial with 5,000 credits. No credit card required.

Works on any HMI · Desktop · Mobile · Embedded

