Benchmarks

Measured computer-use performance.

Independent desktop and mobile evaluations show how AskUI agents perform on real tasks, not synthetic demos.

Snapshot shown as of May 2026. Use the linked public leaderboards to confirm current ranks before citing externally.

#1 · OSWorld · Desktop automation

#2 · Android World · Mobile automation

94.8% · Pass@1 · First-attempt success

OSWorld

OSWorld evaluates agents on open-ended computer tasks across Ubuntu, Windows, and macOS. The snapshot below shows AskUI first among published systems on the linked leaderboard.

Runs across arbitrary desktop applications
21 points ahead of the next listed system
Designed for cross-OS generalization
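
The 21-point margin can be checked against the scores in the OSWorld leaderboard snapshot on this page. A minimal Python sketch, assuming only those snapshot figures; the dictionary and function names are illustrative and not part of any AskUI or OSWorld API:

```python
# Scores copied from the OSWorld snapshot on this page (as of May 2026).
# All names below are illustrative; this is not an AskUI or OSWorld API.
OSWORLD_SNAPSHOT = {
    "AskUI": 66.2,
    "GTA1 w/ o3": 45.2,
    "OpenAI CUA o3": 42.9,
    "UI-TARS-1.5": 42.5,
    "Agent S2 w/ Gemini 2.5": 41.4,
}
HUMAN_BASELINE = 72.4


def lead_over_runner_up(scores: dict[str, float]) -> float:
    """Gap between the two highest scores, rounded to one decimal."""
    ranked = sorted(scores.values(), reverse=True)
    return round(ranked[0] - ranked[1], 1)


def gap_to_human(scores: dict[str, float], baseline: float) -> float:
    """How far the top listed system sits below the human baseline."""
    return round(baseline - max(scores.values()), 1)


lead = lead_over_runner_up(OSWORLD_SNAPSHOT)
remaining = gap_to_human(OSWORLD_SNAPSHOT, HUMAN_BASELINE)
print(f"Lead over next listed system: {lead} points")
print(f"Gap to human baseline: {remaining} points")
```

Re-running the same calculation with current leaderboard values is a quick way to confirm the margin before citing it.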

OSWorld leaderboard

Rank  System                  Score
#1    AskUI                   66.2
#2    GTA1 w/ o3              45.2
#3    OpenAI CUA o3           42.9
#4    UI-TARS-1.5             42.5
#5    Agent S2 w/ Gemini 2.5  41.4
-     Human baseline          72.4

Android World leaderboard

Rank  System          Pass@1
#1    AGI-0           97.4%
#2    AskUI           94.8%
#3    DroidRun        91.4%
#4    Surfer 2        87.1%
#5    gbox.ai         86.2%
-     Human baseline  80.0%

Android World

Android World tests real mobile device interactions and scores first-attempt task completion. The snapshot above lists AskUI at 94.8% pass@1, second on the linked leaderboard.

Works on real Android interaction flows
Outperforms the published human baseline
Useful signal for mobile agent reliability
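
Pass@1 here is the share of tasks completed on the first attempt. A minimal sketch of how such a rate could be computed from per-task outcomes; the function name is hypothetical and the outcome list is made-up illustration data, not Android World results:

```python
def pass_at_1(first_attempt_results: list[bool]) -> float:
    """Percentage of tasks whose first attempt succeeded."""
    if not first_attempt_results:
        raise ValueError("need at least one task result")
    successes = sum(first_attempt_results)  # True counts as 1
    return 100.0 * successes / len(first_attempt_results)


# Hypothetical outcomes for 8 tasks (True = solved on the first try).
example_outcomes = [True, True, False, True, True, True, True, False]
rate = pass_at_1(example_outcomes)
print(f"pass@1 = {rate:.1f}%")
```

Because only the first attempt counts, retries that eventually succeed do not raise this number.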

Get started

Start free. Scale when ready.

The Free plan includes a 14-day trial, a non-commercial AgentOS license, and 5,000 inference credits. Paid plans add a commercial AgentOS license and token-based Hub usage.

Trial · Starter · Pro · Enterprise

