In this article, we will give a detailed overview of askui's architecture and how it works under the hood.
askui is built on top of a number of components. We will cover what these components are and how they work together to provide a flexible and reliable way to automate interactions with UI elements of any operating system or platform.
By the end of this article, whether you're a software developer, QA engineer, or automation specialist, you'll have a solid understanding of how askui works, and be able to use this knowledge to build more efficient automation for your project.
askui consists of three building blocks:
- askui Control Client
- askui UI Controller
- askui Inference Server
We will step through each of them and see how they work together to perform UI automation.
Throughout this article, we will use some terms that describe certain parts of askui. Some of them are used only internally and not exposed by the askui Control Client API, but are important for understanding how askui works and what it can do. Please refer to these terms while reading.
- Element-description: A description of a UI element. In the askui Control Client API, it is expressed in code, e.g. button() or textfield().contains().text().withText('Email').
- Action: A method in the askui Control Client API that describes an action to be taken on the operating system, e.g. click() or type().
- InputEvent (internal): A specific type of action to be taken on the operating system, e.g. MouseInputEvent or KeyboardInputEvent.
- ControlCommand (internal): A command sent to the UI Controller telling it what to perform on the operating system. It consists of one or more InputEvents.
askui Control Client
The askui Control Client provides the API that tells askui what and how to automate. Once you start using askui, you will mostly interact with it through the askui Control Client. In most of our tutorials and demonstrations, you will see a client declared as let aui: UiControlClient and combined with an Action and Element-descriptions, which together form an instruction, e.g.:
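A minimal sketch of such an instruction (credentials and configuration are omitted here; the button label 'Login' is illustrative):

```typescript
import { UiControlClient } from 'askui';

// Declare and connect the client (configuration omitted for brevity).
let aui: UiControlClient;
aui = await UiControlClient.build();
await aui.connect();

// The Action click() chained with the Element-description
// button().withText('Login') forms one instruction; exec() runs it.
await aui.click().button().withText('Login').exec();
```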
As shown above, you form an instruction by chaining an Action with Element-descriptions. The askui Control Client is designed as a fluent interface to increase readability and make instructions easy to understand.
The askui Control Client sends requests to the askui UI Controller:
- to take a screenshot.
- to execute a ControlCommand that specifies which InputEvents to perform on the operating system.
The askui Control Client also communicates with the askui Inference Server:
- to send a screenshot, together with the instruction, to be annotated.
- to receive the annotation, e.g. the detected elements (see the getter sketch below).
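On the client side, this round trip is also exposed through getters. A hedged sketch, assuming an already connected client aui:

```typescript
// get() sends a screenshot to the Inference Server and returns the
// detected elements (the annotation) instead of acting on them.
const buttons = await aui.get().button().exec();
console.log(`Detected ${buttons.length} button(s)`);
```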
User credentials are required to use the askui Control Client. They can be obtained via our User Portal.
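One common way to pass them is when building the client; a sketch with placeholder values (obtain the real ones from the User Portal):

```typescript
import { UiControlClient } from 'askui';

// Placeholder credentials — replace with values from the User Portal.
const aui = await UiControlClient.build({
  credentials: {
    workspaceId: '<your-workspace-id>',
    token: '<your-access-token>',
  },
});
await aui.connect();
```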
See our API documentation for more information on this component.
askui UI Controller
The askui UI Controller is a binary that controls the operating system. The binary is downloaded automatically when the UiController is initialized and started via UiController.start(). Once running, it stays in the background and communicates with the askui Control Client on a specific port to receive ControlCommands. Based on a given ControlCommand, it triggers the corresponding InputEvents.
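A minimal lifecycle sketch, assuming the instance-based API implied above (the display option and the stop() call are illustrative):

```typescript
import { UiController } from 'askui';

// start() downloads the controller binary if necessary, launches it in
// the background, and opens the port the Control Client connects to.
const uiController = new UiController({ display: 0 }); // illustrative option
await uiController.start();

// ... connect the askui Control Client and run instructions ...

await uiController.stop();
```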
The askui UI Controller is responsible for:
- Taking a screenshot.
- Triggering the InputEvent, i.e. a MouseInputEvent, a KeyboardInputEvent, or a shell execution.
- Running the interactive annotation (see the sketch after this list).
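The interactive annotation can be triggered through the Control Client; a one-line sketch, assuming a connected client aui:

```typescript
// Opens an interactive, annotated view of the current screen so you can
// inspect which elements were detected and how to reference them.
await aui.annotateInteractively();
```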
See our API documentation for more information on this component.
askui Inference Server
The askui Inference Server is responsible for predicting UI elements within a given screenshot. As soon as it receives a request from the askui Control Client, it runs the prediction on the given image and returns the annotation to the askui Control Client.
For the inference, we use a machine-learning model that consists of several submodels:
- Object Detector: Detects UI elements (e.g. button, textfield).
- Icon Classifier: Predicts the class of an icon based on the detected objects (e.g. a user icon 👤).
- Optical Character Recognition (OCR): Converts the image of a text into text.
- Custom Element Detector: Searches for an area in the given screenshot that matches the image given by the Element-description .customElement() (sketched below).
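To illustrate the last submodel, a hedged sketch of .customElement(), assuming a connected client aui (the image path is illustrative):

```typescript
// Clicks the area of the screen that best matches the template image.
await aui
  .click()
  .customElement({
    customImage: './logo.png', // illustrative path to a template image
  })
  .exec();
```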
Seeing Them All in Action
Assuming that we run askui on the same device we want to automate, the simplest flow can be described as follows:
When running askui,
1. The askui Control Client checks whether the instruction needs to be processed by the Inference Server.
2. If the code contains any Element-descriptions or Getters, the askui Control Client tells the askui UI Controller to take a screenshot of the given screen and sends it to the Inference Server.
3. After the askui Control Client has received the annotation back from the server, it sends a ControlCommand to the askui UI Controller. The askui UI Controller then triggers the InputEvent on the operating system.
4. If the code contains an Action but no Element-description, the askui Control Client sends the ControlCommand to the askui UI Controller directly, which triggers the InputEvent without any inference. Both paths are sketched after this list.
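Both paths side by side, assuming a connected client aui (the texts are illustrative):

```typescript
// Steps 2/3 — contains an Element-description, so a screenshot is taken,
// annotated by the Inference Server, and only then is the ControlCommand sent:
await aui.click().textfield().contains().text().withText('Email').exec();

// Step 4 — an Action without an Element-description goes straight to the
// askui UI Controller as a ControlCommand; no screenshot, no inference:
await aui.type('jane.doe@example.com').exec();
```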
- An Element-description represents a specific type of UI element that can be recognized by inference. Most commonly used UI elements, such as button and textfield, are supported.
- An Action represents a specific type of action to be performed, i.e. a mouse/keyboard InputEvent or a shell command. It can be performed on a specific element when combined with Element-descriptions, or on its own, as shown in the sketch above.
Please visit our API Docs if you want to learn more about the different types of Element-descriptions and Actions.
Here we have seen the three core components of askui. If you aim to use askui in a more advanced way, e.g. by integrating it into your CI/CD pipeline, it is worthwhile to have an overview of how it is composed. For more practical examples, please refer to our Tutorials and API docs. And don't forget to come over to our Discord community if you have any questions about askui!