Why test automation needs intelligence

Software is stateful, and state breaks script-based test automation. Here's why coverage stalls at the maintenance wall, and how a computer-use agent resolves what selectors and scripts can't.

Why traditional QA breaks

It all comes back to state.

The more state a system carries, the harder scripts are to keep alive. Four kinds of state break test automation, and none of them is under the tester's control.

State	What keeps changing	Why scripts break
Application state	New releases keep changing the UI	Selectors break; every fix needs a developer
Environment state	Eight teams change infra, data, and access	Flaky runs, QA locked out, hours spent finding the cause
Toolchain	Twelve tools and logins for one test cycle	Most of the day goes to navigating tools, not testing
Decay over time	State drifts and scripts rot	Coverage stalls at the 40–60% maintenance wall

Environment state

Most tests fail for reasons outside the tester's control.

State isn't only in the app. Ops pushes updates, devs ship bugs, IT changes the network. Eight teams keep changing it, and every change can break a test.

Test Management (Test Manager)

Missing test concept
Tests rot when ownership shifts
Quality varies by author

Test Definition (Tester)

Code-based tests need dev skills
Test architecture rots over time
Toolchain & dependency churn

Operations (DevOps)

Surprise environment updates
QA locked out of DBs & backends
Config drift between envs

Application (Devs)

New bugs ship every release
Surprise releases break tests
Features change without QA notice

Test Reports (Tester)

No history across runs
No classification of errors
Stakeholders can't read CI logs

Test Execution (Tester)

Flakes from unexpected app states
No visibility into what ran
Selectors break on every UI tweak

Test Infrastructure (Tester · DevOps)

Provisioning takes weeks
Security limits test scope
OS updates break runners

IT / Security (IT / Sec)

Sockets blocked at random
Whitelisting takes a ticket per env
Automation APIs blocked (ADB, a11y)

Leads to: hours of debugging, hours of clarification, useless defect reports.

The maintenance wall

More testers don't help. More weeks don't help either.

Algorithmic automation can't absorb state, so coverage stalls at the 40–60% maintenance wall, and only manual effort carries the rest.

Charts: "More testers don't help" and "More weeks don't help either".

* Healthiness = passed / all tests. Failed = failed + broken + skipped.
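The footnote's arithmetic can be made concrete with a short sketch. The status names and the sample counts below are illustrative assumptions, not figures from a real run.

```python
from collections import Counter

# Statuses that the footnote groups under "failed".
FAILED_STATUSES = {"failed", "broken", "skipped"}

def healthiness(statuses):
    """Return passed / all tests, or 0.0 for an empty run."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return counts["passed"] / total if total else 0.0

def failed_count(statuses):
    """Count every test that is failed, broken, or skipped."""
    return sum(1 for status in statuses if status in FAILED_STATUSES)

# Hypothetical run: 5 passed, 2 failed, 2 broken, 1 skipped.
run = ["passed"] * 5 + ["failed"] * 2 + ["broken"] * 2 + ["skipped"]
print(healthiness(run))   # 0.5
print(failed_count(run))  # 5
```

Broken and skipped tests count against healthiness even though they never ran to an assertion, which is why a single upstream problem can pull the number down sharply.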

Cascading dependencies

When one piece of state falls out of sync, every test that depends on it fails, and scripted algorithms can't recover from the error.
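A rough sketch of how one failure cascades, assuming tests declare which earlier tests they depend on. The suite, the test names, and the `blocked_by` helper are hypothetical and only illustrate the effect.

```python
def blocked_by(failed, depends_on):
    """Return every test that transitively depends on a failed test."""
    blocked = set()
    changed = True
    while changed:
        changed = False
        for test, deps in depends_on.items():
            if test in blocked or test in failed:
                continue
            # A test is blocked if any prerequisite failed or is blocked.
            if any(dep in failed or dep in blocked for dep in deps):
                blocked.add(test)
                changed = True
    return blocked

# Hypothetical suite: everything sits downstream of the login step.
depends_on = {
    "login": [],
    "open_cart": ["login"],
    "checkout": ["open_cart"],
    "export_invoice": ["checkout"],
}
print(sorted(blocked_by({"login"}, depends_on)))
# ['checkout', 'export_invoice', 'open_cart']
```

Here a single failing login takes the other three tests with it, even though nothing is wrong with them.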

Trapped in the loop

One engineer rewrites the scripts while the team waits. State shifts again, and the loop repeats.

Manual fills the gap

Only manual testing covers what lies above the wall. Each release demands more manual work.

The shift

The fix isn't more scripts. It's intelligence.

Traditional QA resolves ambiguity through algorithms: selectors and code that only cover the fixed parts. A computer-use agent resolves it through intelligence: it sees the screen, reasons, and acts like a tester, so it absorbs state instead of breaking on it.

Why intelligence

Algorithms resolve the fixed parts. Intelligence resolves the rest.

Deterministic selectors and scripted heuristics only cover a fixed slice of the screen. As instruction ambiguity and UI complexity grow, a computer-use agent keeps resolving them, far past the point where scripts stall.

Quadrant diagram: instruction ambiguity versus UI complexity, showing deterministic, algorithmic, and intelligent zones. The agentic testing frontier extends far past the traditional QA frontier.

An agent reasons about whatever is on the screen.

A computer-use agent works at the intelligence layer: it does everything deterministic and scripted tools do, and keeps going when the unexpected happens. Coverage is no longer capped at the wall.

Absorbs open-ended state, popups, and slow loads
Recovers from errors instead of failing the run
A human teaches it once when it gets stuck

With intelligence, coverage reaches 100%.

How you write tests

Write the test in plain English. The agent runs it.

No translating requirements into selector code, no engineer in the loop. The natural-language requirement is the test: paste it in, and the agent reads it like a tester would.

01 Authoring

From a spreadsheet, CSV, or sentence.

Use existing test cases or write them in plain text. Either way the agent executes on the real UI and reports what it saw, and the same test survives the next UI change.

Use existing CSV/XLS test cases, or write them in plain text
Any tester authors it; no selectors, no code
Runs on desktop, web, mobile, and embedded HMI
Structured, audit-ready report on every run
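As a minimal sketch of the authoring path, the snippet below turns rows of an exported test sheet into plain-language instructions. The column names, the sample row, and the helper functions are assumptions for illustration; this is not AskUI's import format or API.

```python
import csv
import io

# Hypothetical CSV export; the column names and the row are assumptions.
SAMPLE_CSV = """Test ID,Description,Expected
TC-001,Log in as alice@example.com,The dashboard is shown
"""

def load_test_cases(csv_text):
    """Parse CSV text into a list of dicts, one per test case."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def to_instruction(case):
    """Turn one row into a plain-language instruction for an agent."""
    return f"{case['Test ID']}: {case['Description']}. Expected: {case['Expected']}."

cases = load_test_cases(SAMPLE_CSV)
for case in cases:
    print(to_instruction(case))
```

Nothing in the row names a selector or an element ID, so the same instruction stays valid when the markup behind the login form changes.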

Traditional: engineer translates → code

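from selenium.webdriver.common.by import By

def test_login_alice(driver):  # driver: a Selenium WebDriver, e.g. from a pytest fixture
    driver.get(".../login")
    # Both locators are tied to the current markup
    driver.find_element(By.ID, "email").send_keys("alice@example.com")
    driver.find_element(By.CSS_SELECTOR, ".login-btn").click()
    assert "dashboard" in driver.title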

Hours per test · breaks on every selector change

AskUI: paste it in → the agent runs it

login_tests.xlsx

Test ID	Description	Expected

Seconds per test · any tester can author it

The same computer-use agent will also document and operate your interfaces: capture once, and a test, a work instruction, and an operation become interchangeable. More soon.

Architecture

How it fits together.

Plain language goes in. A cloud LLM is the brain; AgentOS is the runtime that drives any interface. Structured test results come back out: pass/fail, screenshots, and traces on every run.

You write

Plain-language tests · SOPs & instructions · CSV/XLS test files

The brain: an LLM in the cloud

AskUI Inference by default, or your own (BYOM): Anthropic · AWS Bedrock · GCP Vertex · Azure. Screenshots never leave your tenancy.

1. See: captures a screenshot

2. Think: reasons about the screen

3. Act: clicks, types, scrolls

↻ loops until the task is done. No DOM, no selectors.
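The see, think, act loop can be sketched in a few lines. Everything here is an assumption for illustration: `capture`, `decide`, and `perform` are hypothetical stand-ins for screen capture, model reasoning, and input, and `max_steps` is an added safety budget. None of it is the AgentOS API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    detail: str = ""

def run_task(capture, decide, perform, max_steps=20):
    """Loop see -> think -> act until the model reports 'done'."""
    history = []
    for _ in range(max_steps):
        screenshot = capture()                 # 1. See
        action = decide(screenshot, history)   # 2. Think
        if action.kind == "done":
            return True, history
        perform(action)                        # 3. Act
        history.append(action)
    return False, history                      # step budget exhausted

# Stubbed demo: a fake screen that becomes "dashboard" after one click.
screen = {"state": "login"}

def fake_capture():
    return screen["state"]

def fake_decide(screenshot, history):
    if screenshot == "dashboard":
        return Action("done")
    return Action("click", "Log in")

def fake_perform(action):
    screen["state"] = "dashboard"

ok, steps = run_task(fake_capture, fake_decide, fake_perform)
print(ok, [step.detail for step in steps])  # True ['Log in']
```

The loop only ever sees what `capture` returns, so it has no dependency on the DOM or on selectors. An unexpected popup would simply show up in the next screenshot for the model to reason about.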

AgentOS runtime: captures the screen, performs the input

Host mode: Agent + AgentOS + app on one machine. Desktops, VMs, CI runners.

Companion mode: Agent on a Pi or mini-PC via USB + HDMI. The target stays untouched: locked-down HMIs, mobile, embedded.

Any interface

Desktop · Web · Mobile · Embedded HMI · Bench instruments

Structured output, every run

Pass / fail reports · Screenshots & traces · Audit evidence
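One way to picture a structured result is a small record serialized to JSON. The field names, the test ID, and the screenshot path below are hypothetical; the actual report schema is not described here.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class StepTrace:
    action: str            # what the agent did
    screenshot_path: str   # evidence captured for that step

@dataclass
class RunReport:
    test_id: str
    passed: bool
    steps: list = field(default_factory=list)

# Hypothetical report for a single passing run.
report = RunReport(
    test_id="TC-001",
    passed=True,
    steps=[StepTrace("click 'Log in'", "screenshots/step-1.png")],
)
report_json = json.dumps(asdict(report), indent=2)
print(report_json)
```

Keeping the verdict, the trace, and the screenshot references in one record is what lets a stakeholder read the outcome without digging through CI logs.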

Why AskUI

Built to run in production.

The same computer-use agent, runtime, and workspace, wherever your interfaces and your tests live.

One platform, every surface.

AgentOS runtime, a cloud LLM brain, and Hub for API keys, billing, and users, across desktop, web, mobile, embedded HMI, and physical devices.

One runtime

Plain language, not scripts.

Domain experts author and review in plain text. No selectors, no brittle recordings.

No code

Reaches what others can't.

Host Mode installs on the target; Companion Mode drives locked-down and embedded screens via capture and input.

Host + Companion

Production- and audit-grade.

On-prem and air-gapped, ISO 27001, GDPR, BYOM, and machine-generated evidence on every run.

Compliance-ready

For builders

Start for free.

Download AgentOS, clone the demo project, or start with the SDK. Add API keys when you are ready to run agents.

For teams

Ready for production?

Commercial AgentOS, bring-your-own-model, and on-prem for distributed fleets. We'll map a plan to your stack.

Book a demo