The shining future of UI test automation

This blog post covers the most promising technologies that will change the UI test automation market.

UI test automation will change a lot in the coming years, both the technologies used and the tools on the market. While this may sound alarming at first, it’s actually a great thing: UI test automation will profit tremendously from recent scientific accomplishments. This is overdue. Some of the most popular tools still use the same methods they used 20 years ago; the requirements for UI testing have changed, but the tools have not. This has led to frustration for both customers and companies. Luckily, five or even ten years from now, UI testing will be on par with current research, and it will be better for it. The future of UI test automation is already shining, and in this article we’ll show you why and what this future may look like.

1. The status quo of UI test automation

Let’s take a step back and have a look at the core problem: the current state of UI testing. UI testing has become increasingly important, seeing as basically every technical device is equipped with a user interface and most of these interfaces change very frequently. UI testing has existed since UIs were introduced to the world, and in the beginning testing a UI may have been a fun journey: you got a pioneer’s view of something you knew was going to be important some day. Testers had a list of all the features and click paths they wanted to test, they pioneered their way through the application, and after a relatively short time they would hand a report over to the responsible engineers. In the early days of the internet, software and applications, this process was easily manageable because the content to be tested was minimal.

Manual testers – tedious, expensive, inefficient?

Sadly and surprisingly, this is pretty much what UI testing still looks like for a vast majority of companies in 2021. Manual testers are still in demand because current test automation solutions don’t fit companies’ demands or budgets. But manual testing today is anything but “pioneer work”, and it is anything but quick. Manual testing is a farce, because user interfaces no longer stay static for weeks or months as they did decades ago. UIs change all the time.

Just think of your favorite shop, or your preferred news page. There are new sale options, special offers, express news and much more every few minutes, sometimes seconds. How are manual testers supposed to keep up with these release cycles? Actually, they can’t. This leads to errors and bugs in applications and frustrated customers.

Selector-based UI test automation – an expired solution?

Obviously there are automation solutions available but they have some major flaws. Looking at the most popular automation tools, it’s remarkable how many of them end up having the same problem manual testers do: they do not meet the requirements. They are a solution to the tedious work manual testers have to do, but they are not a solution to the actual problem. Why is that?

Most of these popular tools are selector based, mostly using CSS and XPath selectors. This is actually a great and smart automation solution for static websites: the exact position of a UI element inside the DOM is stored, and you can rerun your test over and over to check whether your selectors find the element again. Now suppose that UI element is a button and you decide to change it, say by relocating it or changing its color. If the selector depends on the button’s position in the DOM or on a color-specific class, a selector-based tool will no longer be able to identify the button. The test report will mark it as a bug, testers and engineers will have to change the test plan and test code … you get the problem. A manual tester will still find the element; a button is a button, right? Does this make manual testing the best solution again? No, it’s just a hint that automation tools should approach UI testing in a more humanlike way, but more on that later. To sum it up, the most popular test automation tools use the same technology they used 20 years ago. Requirements have changed; test automation tools barely have.
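To make the problem tangible, here is a minimal sketch in Python. The markup, the class names and the helper functions are all made up for illustration, and the standard library’s limited XPath support stands in for a real testing tool. It shows two recorded selectors breaking after a small redesign while a lookup by visible label still succeeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical login-page markup before a small redesign.
BEFORE = """
<html><body>
  <form>
    <input name="email"/>
    <input name="password"/>
    <button class="btn-purple">Login</button>
  </form>
</body></html>
""".strip()

# After the redesign: the button moved into a wrapper <div>
# and its color class changed.
AFTER = """
<html><body>
  <form>
    <input name="email"/>
    <input name="password"/>
    <div class="actions"><button class="btn-green">Login</button></div>
  </form>
</body></html>
""".strip()

POSITIONAL = "./body/form/button"            # encodes the exact DOM path
BY_CLASS = ".//button[@class='btn-purple']"  # encodes the color class


def selector_finds(markup: str, selector: str) -> bool:
    """Return True if the recorded selector still matches an element."""
    return ET.fromstring(markup).find(selector) is not None


def label_finds(markup: str, label: str) -> bool:
    """Look for a button by its visible label, as a human tester would."""
    root = ET.fromstring(markup)
    return any((b.text or "").strip() == label for b in root.iter("button"))


for name, markup in (("before", BEFORE), ("after", AFTER)):
    print(name,
          selector_finds(markup, POSITIONAL),
          selector_finds(markup, BY_CLASS),
          label_finds(markup, "Login"))
# before True True True
# after False False True
```

Nothing about the button’s purpose changed between the two versions, yet both recorded selectors fail on the second one.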

Current visual approaches and their flaws

Some tools are already trying to solve this problem. One approach is dynamic selectors, which try to make selectors less error-prone by combining multiple simple selectors that are intelligently chosen to re-find an element. Sooner or later, though, they end up with the very same problem of not re-finding elements.
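The idea behind such multi-selector strategies can be sketched as a simple fallback chain. This is an illustrative assumption about how they might work, not the implementation of any particular tool; the page versions and the selector list are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical page versions: in V2 the button moved and changed color,
# in V3 it was replaced by a link.
PAGE_V2 = ('<html><body><form><div>'
           '<button id="login" class="btn-green">Login</button>'
           '</div></form></body></html>')
PAGE_V3 = ('<html><body><form><div>'
           '<a class="link">Sign in</a>'
           '</div></form></body></html>')

# Several simple selectors recorded for the same element, tried in order.
CANDIDATES = [
    "./body/form/button",              # original DOM path
    ".//button[@class='btn-purple']",  # original color class
    ".//button[@id='login']",          # element id
]


def find_with_fallback(markup: str, selectors: list) -> Optional[str]:
    """Return the first selector that still matches, or None if all fail."""
    root = ET.fromstring(markup)
    for selector in selectors:
        if root.find(selector) is not None:
            return selector
    return None


print(find_with_fallback(PAGE_V2, CANDIDATES))  # .//button[@id='login']
print(find_with_fallback(PAGE_V3, CANDIDATES))  # None
```

The chain survives the first redesign thanks to the id-based fallback, but once every recorded selector is outdated it fails just like a single selector would.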

Visual approaches to testing mostly focus on finding and identifying such errors. While this is a nice feature and a step in the right direction, it doesn’t solve the remaining problems. Test engineers will still end up making changes to the code. That keeps developers in the test management process, which is not just expensive for companies and tedious for developers, it’s simply not necessary. Why? Because there are research solutions just waiting to unfold their power and potential.

2. UI testing – “A new hope”

Star Wars fans may smirk at this headline. The first Star Wars movie in 1977 introduced the world to Luke Skywalker, who (don’t worry, no spoiler) is the eponymous “new hope” for the rebellion in their fight against the empire.

Of course there is no evil empire in the UI testing world, but either way there is a new hope: science. Recent scientific accomplishments in computer vision (CV) and natural language processing (NLP) have led to a “new hope” for many use cases, and UI testing is one of them.

Before we jump to potential transformations though, let’s have a look at some key definitions. Earlier we proposed that UI test automation should be more easily accessible to non-developers, and that the best solution might be a humanlike approach to UI testing. Let’s unpack these proposals first.

Scriptless Test Automation (STA) – Let’s humanize it!

Recently SoftwareTestingNews released an article about the future of scriptless test automation. They asked two industry experts for their definition of STA.

Manikanta Gona kept it really short:

“STA is a way to create automation work without doing any manual effort.”

Miroslav Lazarov went one step further:

“(With STA …) the tester doesn’t need to act as a programmer, it’s much better to describe the product under test with words and actions used by the customer.”

No manual effort and testers who need zero programming knowledge. The fact that this sounds like a complete utopia compared to the current state of UI testing is everything that is wrong with it. That’s exactly what’s about to change in the coming years, because the technology needed is being researched thoroughly, as you will see now.

There are three technologies we are going to present. For test creation we’ll look at Visual Description Generation (VDG), for test execution we are going to cover Visual Question Answering (VQA), and for test case management we’ll look at Text Based Reconstruction (TBR). We’ll cover all of them at a very basic level with the same use case: a simple login screen consisting of text fields for email and password and a login button.

All of them have one thing in common: what we are going to call an image-centric approach to testing.

Test Creation with Visual Description Generation

VDG is strongly linked to research fields such as conditional language modelling and natural language generation in NLP. All it requires as input is non-linguistic information, for example an image or a video. The goal of VDG is to generate a text snippet that describes the input and is readable for humans.

The last goal “readable for humans” may sound weird at first, but it’s actually a pretty big step forward. The ability to generate “sentence-level” descriptions for images offers a whole new world of possibilities for UI testing.

Look at the picture above and ask yourself how you would describe it. User interfaces are relatively easy to describe, right? All colors are definite, the objects and elements can be identified clearly, and there aren’t dozens of perspectives or backgrounds and foregrounds. And that’s exactly why VDG can lift test creation to a new level: it will describe pictures like the one above just like you would.

If you ran an image caption generation model over the UI, it would give you an output like

“There is the AskUI company logo on top of the picture. Underneath is a text field that reads email. Underneath is a text field that reads password. Underneath is a purple button that reads login.”

That is literally everything there is to the picture, and it is exactly what you would do when creating test cases. You look at all the elements in a picture and then decide what should happen to each of them: type into it, click it, et cetera.
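To illustrate the shape of such an output, here is a small sketch. It is not a caption generation model: the list of elements is a hand-written, hypothetical stand-in for what a trained model would extract from the screenshot, and the sentences come from fixed templates. All names in it are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str        # e.g. "logo", "text field", "button"
    label: str       # the visible text
    color: str = ""  # optional, only set where it matters


# Hypothetical stand-in for what a model would detect, top to bottom.
LOGIN_SCREEN = [
    Element("logo", "AskUI company"),
    Element("text field", "email"),
    Element("text field", "password"),
    Element("button", "login", "purple"),
]


def phrase(element: Element) -> str:
    """Turn one element into a short noun phrase."""
    if element.kind == "logo":
        return f"the {element.label} logo"
    noun = f"{element.color} {element.kind}".strip()
    return f"a {noun} that reads {element.label}"


def describe(elements: list) -> str:
    """Describe the elements from top to bottom in plain sentences."""
    sentences = []
    for index, element in enumerate(elements):
        if index == 0:
            sentences.append(f"There is {phrase(element)} on top of the picture.")
        else:
            sentences.append(f"Underneath is {phrase(element)}.")
    return " ".join(sentences)


def suggest_steps(elements: list) -> list:
    """Derive candidate test steps from the described elements."""
    actions = {"text field": "Type into", "button": "Click"}
    return [f'{actions[e.kind]} the {e.kind} "{e.label}"'
            for e in elements if e.kind in actions]


print(describe(LOGIN_SCREEN))
print(suggest_steps(LOGIN_SCREEN))
```

The second function hints at why such descriptions are useful for test creation: once every element is named in plain language, candidate test steps follow almost mechanically.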

Test Execution with Visual Question Answering (VQA)

Visual Question Answering aims at learning a model to comprehend visual content so it can answer questions in natural language.

This may sound difficult at first, so let’s start with some basics. Visual question answering models are trained to answer questions about an image. For the picture above you could ask “What color is the login button?” and the model would answer “purple”. You could also ask “What does the upper text field read?” and the model would answer “email address”. While the answers may seem very short, they actually contain quite a lot of information. A VQA model trained on UI data is able to identify UI elements and mark them. Answering the question about the color of the login button requires the learned knowledge that this element is a button and not something else. The same goes for the second question: the model not only has to recognize a text field, it also has to resolve the location hint “upper”.
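The following toy sketch shows how much information those two short answers rely on. It is deliberately not a VQA model: a real model works on the pixels of a screenshot, whereas this example uses simple keyword matching over a hypothetical, hand-written list of detected elements. The class, the coordinates and the matching rules are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detected:
    kind: str   # element type, e.g. "button"
    text: str   # visible text
    color: str  # dominant color
    y: int      # vertical position in pixels, 0 = top of the screenshot


# Hypothetical detections for the login screen.
SCREEN = [
    Detected("text field", "email address", "white", 120),
    Detected("text field", "password", "white", 180),
    Detected("button", "login", "purple", 240),
]


def answer(question: str, screen: list) -> Optional[str]:
    """Answer a narrow set of questions via keyword matching."""
    q = question.lower().rstrip("?")
    # 1. Element type: which kinds of element does the question mention?
    candidates = [e for e in screen if e.kind in q]
    # 2. Visible text: narrow down if the question names a label.
    named = [e for e in candidates if e.text in q]
    if named:
        candidates = named
    if not candidates:
        return None
    # 3. Location hint: resolve words like "upper" or "lower".
    if "upper" in q or "top" in q:
        target = min(candidates, key=lambda e: e.y)
    elif "lower" in q or "bottom" in q:
        target = max(candidates, key=lambda e: e.y)
    else:
        target = candidates[0]
    # 4. Requested property.
    if "color" in q:
        return target.color
    if "read" in q:
        return target.text
    return None


print(answer("What color is the login button?", SCREEN))      # purple
print(answer("What does the upper text field read?", SCREEN)) # email address
```

Even this crude version has to combine element type, visible text, position and color to get both answers right. A trained VQA model has to learn all of that from images.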

In the end, this becomes an insanely powerful tool to execute your UI tests. As we mentioned before, the ultimate goal of scriptless test automation has to be that users don’t need any programming skills. VQA is exactly the solution to the problems we described for manual testing and selector-based automation tools.

The VQA approach enables the algorithm to look at a picture just like humans do, and to answer questions just like humans do, even in their native language instead of a programming language. Literally all that is needed is a screenshot of the application, that’s it. This solution is the farewell to selector-based and programming-heavy testing tools.

The scalability of VQA is immense: any visual content can be tested, regardless of platform, display size or changes to elements. The algorithm copes with such changes because it does not store where an element sits in the page structure; it knows what the element looks like, no matter where it is in the picture.

VQA offers the most promising solution to UI testing and it will change the market dramatically in the (near) future.

Test Case Maintenance with Visual Text Based Reconstruction

Last but not least, we want to offer a perspective on what test case maintenance might look like with something we want to call text based reconstruction (TBR). TBR is closely related to the research field of visual dialog, which aims to create AI agents that can hold a dialog with humans in natural language about visual content. In other words, a model is trained with a dialog history and an image, which together function as context. Given this context, the model produces the answer it expects to come next.

Now what does that mean for UI testing? With the two previously mentioned technologies in mind, we can easily generate visual information that can function as context. We can give this information to the trained model and it will predict the expected answer.

This is best described with our example of the login button. Let’s say we upload the screenshot of the login page and type our test case instructions. Now imagine you have relocated the login button, a test step fails, and you want to fix the test case easily. This technology would suggest a one-click solution to fix the test case.

The AI has to be able to predict the <mask> (the failed test step) from nothing but the new image and the surrounding textual context (the other test steps). Although this sounds really cool, it is unlikely to hit the market soon because of the huge amounts of data the AI needs in order to predict the expected output correctly. It is a pretty exciting outlook into the future though, imagining TBR in combination with the other presented technologies.
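The input and output of such a prediction can be sketched as follows. This is only a rule-based illustration of the task, not a learned model: the step wording, the list of detected elements and the “pick the one element no other step uses” heuristic are all assumptions made for this example.

```python
from typing import Optional

MASK = "<mask>"

# The test case with the failed step masked out.
STEPS = [
    'Type into the text field "email"',
    'Type into the text field "password"',
    MASK,
]

# Hypothetical elements detected on the NEW screenshot as (kind, label).
# Their order has changed because the button was relocated.
NEW_SCREEN = [
    ("text field", "email"),
    ("button", "login"),
    ("text field", "password"),
]

ACTIONS = {"text field": "Type into", "button": "Click"}


def suggest_fix(steps: list, screen: list) -> Optional[str]:
    """Propose a replacement for the masked step from the context."""
    context = [step for step in steps if step != MASK]
    # Elements that none of the surrounding steps already refer to.
    unused = [(kind, label) for kind, label in screen
              if not any(f'"{label}"' in step for step in context)]
    if len(unused) != 1:
        return None  # ambiguous; a real model would have to rank candidates
    kind, label = unused[0]
    return f'{ACTIONS[kind]} the {kind} "{label}"'


print(suggest_fix(STEPS, NEW_SCREEN))  # Click the button "login"
```

On a three-element login screen this heuristic is enough. On a realistic screen with dozens of elements it is not, which is where the large amounts of training data come in.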

Talking about an outlook, what do all these technologies and possibilities actually mean for the future of UI testing?

3. Outlook

We presented three technologies for three different stages of the UI testing process. We had a look at a model that can generate descriptions, which helps us create test cases. Then we had a look at visual question answering models that can be transferred to the execution stage. Finally we looked at a way to maintain our test cases with text based reconstruction.

Let’s come to a conclusion by looking at their potential and likelihood of success when it comes to UI testing:

Starting with the most recent one, text based reconstruction: as we already stated at the end of the last chapter, it is unlikely to hit the UI testing market soon. All ML-trained algorithms need tons of data, and even then some perform better than others. Text based reconstruction shows great potential but is unlikely to unfold it soon, also because it does not completely revolutionize the UI testing “game”. It’s a nice-to-have feature that will surely find its way into some tools once the available data catches up with the research progress. For now, though, companies focus on the earlier stages of the UI testing process. Once the necessary data is generated, many use cases will follow.

That brings us to the first technology we presented, description generation of visual content, used to create our test cases. This technology also shows great potential, and it has a use case it can soon be applied to. Throughout this article we mentioned how important it is to “humanize” UI testing; VDG is a first step in that direction.

We want to wrap this post up with the most exciting technology, one that will definitely shape the future of UI testing: visual question answering. VQA is the technology that enables a truly humanlike approach to UI testing. All you need is a screenshot, and that’s literally it. No need to look into the code, no need to change your tests when elements change position or color, and no need to worry about platforms or display sizes ever again. VQA is here to stay and it’s a blessing for UI testing.

Key takeaways:

  • Current UI test automation does not meet the industry’s requirements
  • Accomplishments in computer vision and natural language processing research will change the UI testing market for the better
  • Visual question answering is the most promising technology as it closely mimics how manual testers work
