Computer-use agents · Test automation

    Why test automation needs intelligence

    Software is stateful, and state breaks script-based test automation. Here's why coverage stalls at the maintenance wall, and how a computer-use agent resolves what selectors and scripts can't.

    Why traditional QA breaks

    It all comes back to state.

    The more state a system carries, the harder scripts are to keep alive. Four kinds of state break test automation, and none of them are in the tester's control.

    State | What keeps changing | Why scripts break
    Application state | New releases keep changing the UI | Selectors break; every fix needs a developer
    Environment state | Eight teams change infra, data, and access | Flaky runs, QA locked out, hours spent finding the cause
    Toolchain | Twelve tools and logins for one test cycle | Most of the day goes to navigating tools, not testing
    Decay over time | State drifts and scripts rot | Coverage stalls at the 40–60% maintenance wall

    Environment state

    Most tests fail for reasons outside the tester's control.

    State isn't only in the app. Ops pushes updates, devs ship bugs, IT changes the network. Eight teams keep changing the environment, and every change can break a test.

    Test Management (Test Manager)
    • Missing test concept
    • Tests rot when ownership shifts
    • Quality varies by author

    Test Definition (Tester)
    • Code-based tests need dev skills
    • Test architecture rots over time
    • Toolchain & dependency churn

    Operations (DevOps)
    • Surprise environment updates
    • QA locked out of DBs & backends
    • Config drift between envs

    Application (Devs)
    • New bugs ship every release
    • Surprise releases break tests
    • Features change without QA notice

    Test Reports (Tester)
    • No history across runs
    • No classification of errors
    • Stakeholders can't read CI logs

    Test Execution (Tester)
    • Flakes from unexpected app states
    • No visibility into what ran
    • Selectors break on every UI tweak

    Test Infrastructure (Tester · DevOps)
    • Provisioning takes weeks
    • Security limits test scope
    • OS updates break runners

    IT / Security (IT / Sec)
    • Sockets blocked at random
    • Whitelisting takes a ticket per env
    • Automation APIs blocked (ADB, a11y)

    Leads to: hours of debugging, hours of clarification, useless defect reports

    The maintenance wall

    More testers don't help. More weeks don't help either.

    Algorithmic automation can't absorb state, so coverage stalls at the 40–60% maintenance wall, and only manual effort carries the rest.

    Chart: test coverage versus testers hired. The automation curve stalls at the maintenance wall, between 40% and 60%.

    More testers don't help

    Chart: test base healthiness and maintenance effort (0% to 100%) over weeks for automation, with annotated events: defect found; setup failing: IT changed login flow.

    More weeks don't help either

    * Healthiness = passed / all tests. Failed = failed + broken + skipped.
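
The footnote's definition can be made concrete with a minimal sketch. Only the formula comes from the footnote; the status labels, the `healthiness` and `failed_count` helpers, and the handling of an empty test base are illustrative assumptions, not part of any AskUI API.

```python
from collections import Counter

# Statuses the footnote counts as "failed" (label spellings are assumed).
FAILED_STATUSES = {"failed", "broken", "skipped"}


def healthiness(statuses):
    """Return passed / all tests for a list of per-test status strings."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an empty test base counts as 0% healthy
    return counts["passed"] / total


def failed_count(statuses):
    """Count tests that are failed, broken, or skipped."""
    return sum(1 for s in statuses if s in FAILED_STATUSES)


run = ["passed", "passed", "failed", "broken", "skipped", "passed", "passed", "passed"]
print(f"healthiness: {healthiness(run):.1%}")  # healthiness: 62.5%
print(f"failed: {failed_count(run)}")          # failed: 3
```

Counting skipped and broken tests against healthiness means a suite that quietly stops running tests does not look healthier than one that runs them and fails.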

    Cascading dependencies

    When one state falls out of sync, everything downstream fails, and algorithms can't recover from the error.

    Trapped in the loop

    One engineer rewrites the scripts while the team waits. State shifts again, and the loop repeats.

    Manual fills the gap

    Only manual testers cover what lies above the wall. Each release demands more manual work.

    The shift

    The fix isn't more scripts. It's intelligence.

    Traditional QA resolves ambiguity through algorithms: selectors and code that only cover the fixed parts. A computer-use agent resolves it through intelligence: it sees the screen, reasons, and acts like a tester, so it absorbs state instead of breaking on it.

    Why intelligence

    Algorithms resolve the fixed parts. Intelligence resolves the rest.

    Deterministic selectors and scripted heuristics cover only a fixed slice of the screen. As instruction ambiguity and UI complexity grow, a computer-use agent keeps resolving both, far past the point where scripts stall.

    Quadrant diagram: instruction ambiguity versus UI complexity, showing deterministic, algorithmic, and intelligent zones. The agentic testing frontier extends far past the traditional QA frontier.

    An agent reasons about whatever is on the screen.

    A computer-use agent works at the intelligence layer: it does everything deterministic and scripted tools do, and keeps going when the unexpected happens. Coverage is no longer capped at the wall.

    • Absorbs open-ended state, popups, and slow loads
    • Recovers from errors instead of failing the run
    • A human teaches it once when it gets stuck

    With intelligence, coverage reaches 100%.

    How you write tests

    Write the test in plain English. The agent runs it.

    No translating requirements into selector code, no engineer in the loop. The natural-language requirement is the test: paste it in, and the agent reads it like a tester would.

    01 Authoring

    From a spreadsheet, CSV, or sentence.

    Use existing test cases or write them in plain text. Either way the agent executes on the real UI and reports what it saw, and the same test survives the next UI change.

    • Use existing CSV/XLS test cases, or write them in plain text
    • Any tester authors it; no selectors, no code
    • Runs on desktop, web, mobile, and embedded HMI
    • Structured, audit-ready report on every run
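
As a rough illustration of the CSV route, the sketch below turns rows from an existing test-case export into plain-language instructions that could be handed to an agent. The column names (`id`, `title`, `steps`, `expected`), the semicolon-separated step format, and the `load_test_cases` helper are all assumptions made for this example; it does not call any AskUI API.

```python
import csv
import io

# Hypothetical export of existing test cases; the column names are assumptions.
CSV_TEXT = """id,title,steps,expected
TC-1,Login,"Open the app; enter valid credentials; click Sign in",The dashboard is visible
TC-2,Logout,"Open the user menu; click Sign out",The login screen is visible
"""


def load_test_cases(csv_text):
    """Parse CSV rows into plain-language test instructions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cases = []
    for row in reader:
        # Split the semicolon-separated steps and number them for readability.
        steps = [s.strip() for s in row["steps"].split(";") if s.strip()]
        numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
        instruction = (
            f"Test {row['id']}: {row['title']}\n"
            f"{numbered}\n"
            f"Expected: {row['expected']}"
        )
        cases.append(instruction)
    return cases


for case in load_test_cases(CSV_TEXT):
    print(case)
    print("---")
```

Nothing in the generated text refers to a selector or element ID, which is what lets the same instruction survive a UI change.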

    The same computer-use agent will also document and operate your interfaces: capture once, and a test, a work instruction, and an operation become interchangeable. More soon.

    Architecture

    How it fits together.

    Plain language goes in. A cloud LLM is the brain; AgentOS is the runtime that drives any interface. Structured test results come back out: pass/fail, screenshots, and traces on every run.

    You write
    Plain-language tests · SOPs & instructions · CSV/XLS test files

    The brain: an LLM in the cloud
    AskUI Inference by default, or your own (BYOM): Anthropic · AWS Bedrock · GCP Vertex · Azure. Screenshots never leave your tenancy.
    1. See: captures a screenshot
    2. Think: reasons about the screen
    3. Act: clicks, types, scrolls
    ↻ Loops until the task is done. No DOM, no selectors.

    AgentOS runtime: captures the screen, performs the input
    Host mode: Agent + AgentOS + app on one machine. Desktops, VMs, CI runners.
    Companion mode: Agent on a Pi or mini-PC via USB + HDMI. The target stays untouched: locked-down HMIs, mobile, embedded.

    Any interface
    Desktop · Web · Mobile · Embedded HMI · Bench instruments

    Structured output, every run
    Pass / fail reports · Screenshots & traces · Audit evidence

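
The see, think, act loop described above can be sketched in a few lines. This is a toy model under stated assumptions, not the AskUI SDK: `FakeRuntime`, `fake_model`, `Action`, and `run_test` are hypothetical stand-ins, the "screenshot" is a plain string, and the `max_steps` budget is an added safeguard the page does not mention.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    detail: str = ""


class FakeRuntime:
    """Stand-in for a runtime that captures the screen and performs input."""

    def __init__(self):
        self.screen = "login form"
        self.log = []

    def capture(self):
        return self.screen  # a real runtime would return screenshot pixels

    def perform(self, action):
        self.log.append(action)
        if action.kind == "click" and action.detail == "Sign in":
            self.screen = "dashboard"


def fake_model(instruction, screenshot):
    """Stand-in for the LLM: maps what it 'sees' to the next action."""
    if screenshot == "dashboard":
        return Action("done", "dashboard is visible")
    return Action("click", "Sign in")


def run_test(instruction, runtime, model, max_steps=10):
    """See, think, act; repeat until the model reports done or steps run out."""
    for _ in range(max_steps):
        screenshot = runtime.capture()            # 1. See
        action = model(instruction, screenshot)   # 2. Think
        if action.kind == "done":
            return True
        runtime.perform(action)                   # 3. Act
    return False  # step budget exhausted: treat as a failed run


runtime = FakeRuntime()
passed = run_test("Sign in and check the dashboard", runtime, fake_model)
print("pass" if passed else "fail", len(runtime.log))  # pass 1
```

The loop only ever consumes what `capture` returns and emits input actions, so nothing in it depends on a DOM or a selector.
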
    Why AskUI

    Built to run in production.

    The same computer-use agent, runtime, and workspace, wherever your interfaces and your tests live.

    One platform, every surface.

    AgentOS runtime, a cloud LLM brain, and Hub for API keys, billing, and users, across desktop, web, mobile, embedded HMI, and physical devices.

    One runtime

    Plain language, not scripts.

    Domain experts author and review in plain text. No selectors, no brittle recordings.

    No code

    Reaches what others can't.

    Host Mode installs on the target; Companion Mode drives locked-down and embedded screens via capture and input.

    Host + Companion

    Production- and audit-grade.

    On-prem and air-gapped, ISO 27001, GDPR, BYOM, and machine-generated evidence on every run.

    Compliance-ready

    For builders

    Start for free.

    Download AgentOS, clone the demo project, or start with the SDK. Add API keys when you are ready to run agents.

    For teams

    Ready for production?

    Commercial AgentOS, bring-your-own-model, and on-prem for distributed fleets. We'll map a plan to your stack.

    Book a demo
