Benchmarks
Measured computer-use performance.
Independent desktop and mobile evaluations show how AskUI agents perform on real tasks, not synthetic demos.
Snapshot shown as of May 2026. Use the linked public leaderboards to confirm current ranks before citing externally.
OSWorld
OSWorld evaluates agents on open-ended computer tasks across Ubuntu, Windows, and macOS. The snapshot below shows AskUI first among published systems on the linked leaderboard.
- Runs across arbitrary desktop applications
- 21 points ahead of the next listed system
- Designed for cross-OS generalization
OSWorld leaderboard (snapshot as of May 2026)
Android World leaderboard (snapshot as of May 2026)
Android World
Android World tests real mobile device interactions and scores first-attempt task completion. The leaderboard snapshot lists AskUI at 94.8% pass@1.
- Works on real Android interaction flows
- Outperforms the published human baseline
- Useful signal for mobile agent reliability
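For readers unfamiliar with the metric, pass@1 can be read as the share of tasks completed on the first attempt. The sketch below illustrates that arithmetic only. The function name `pass_at_1` and the sample results are hypothetical, and it is not the Android World scoring harness.

```python
from typing import Sequence


def pass_at_1(first_attempt_success: Sequence[bool]) -> float:
    """Return the fraction of tasks solved on the first attempt."""
    if not first_attempt_success:
        raise ValueError("need at least one task result")
    # True counts as 1 and False as 0, so sum() is the number of solved tasks.
    return sum(first_attempt_success) / len(first_attempt_success)


# Hypothetical outcomes for four tasks (illustrative only, not benchmark data).
results = [True, True, False, True]
print(f"pass@1 = {pass_at_1(results):.1%}")  # prints "pass@1 = 75.0%"
```

Under this reading, a 94.8% pass@1 would mean roughly 95 of every 100 tasks were completed without a retry.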
Start free. Scale when ready.
The Free plan includes a 14-day trial, a non-commercial AgentOS license, and 5,000 inference credits. Paid plans add a commercial AgentOS license and token-based Hub usage.