RoboCon 2025

Dear AI, Which Tests should Robot Framework Execute Now?
2025-02-13, RoboCon

The more tests we have, the longer it takes to execute them all. This makes debugging more painful and costly. How can we select those few tests that find new bugs quickly?
Many approaches have been proposed that use AI to tackle this question. Our team has implemented them and tried them in many projects. I will share our failures and successes.


The more tests we have, the longer it takes to execute them all. This increases the feedback time between when a new bug gets introduced and when a Robot Framework test reveals it. As a consequence, debugging gets more painful and costly.
And let's be honest, it also takes a lot of the fun out of test automation, as recipients of late test failures are often unhappy about the news and tend to vent on the messenger. It is better for everyone involved when test results are delivered quickly.

In theory, there is a simple approach to fix this: Don’t always execute all tests. Instead, select a small subset of tests that runs much faster. Then execute this subset more frequently.

This is a good idea if this small subset finds a large percentage of the bugs in a small fraction of the time. For example, if we can find 80% of the bugs (that executing all tests would discover) in 5% of the time (it would take to execute all tests), then we could improve feedback times massively for most bugs.
In practice, however, this idea obviously hinges on how well we manage to select those tests.

We have spent the last decade working on this problem, both in research (through master's and PhD thesis projects) and in practice. We started out with approaches that do not use AI: for example, test impact analysis uses test-case-specific code coverage and greedy optimization algorithms to select those tests that cover a given set of code changes most quickly. In recent years, we have added approaches that use AI to tackle this question. For example, predictive test selection learns from code changes and past test failures to predict which tests can spot new bugs in new code changes. Other approaches use information retrieval or distances between LLM embeddings of test cases to suggest tests without requiring code coverage information. Finally, defect prediction approaches go one step further and predict where in the code base bugs are most likely to occur.
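To make the test impact analysis idea concrete, here is a minimal sketch of coverage-based greedy selection. It assumes per-test coverage is already available as a set of covered (file, line) pairs together with per-test durations; all names and data are illustrative, not our actual implementation.

# Minimal sketch of coverage-based greedy test selection (test impact analysis).
# Assumes per-test coverage as sets of covered (file, line) pairs and
# per-test durations in seconds; all names and data are illustrative.

def select_tests(coverage, durations, changed_lines):
    """Greedily pick the tests that cover the changed lines most cheaply."""
    remaining = set(changed_lines)
    selected = []
    while remaining:
        best_test, best_ratio = None, 0.0
        for test, covered in coverage.items():
            if test in selected:
                continue
            gain = len(covered & remaining)
            if gain == 0:
                continue
            ratio = gain / durations[test]  # newly covered lines per second
            if ratio > best_ratio:
                best_test, best_ratio = test, ratio
        if best_test is None:  # nothing left covers the remaining change
            break
        selected.append(best_test)
        remaining -= coverage[best_test]
    return selected

if __name__ == "__main__":
    coverage = {
        "Login.Valid Login": {("auth.py", 10), ("auth.py", 11)},
        "Checkout.Pay By Card": {("cart.py", 40), ("auth.py", 10)},
    }
    durations = {"Login.Valid Login": 3.0, "Checkout.Pay By Card": 12.0}
    changed = {("auth.py", 10), ("auth.py", 11), ("cart.py", 40)}
    print(select_tests(coverage, durations, changed))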

Our team has implemented all of these approaches. We have tried them in our own development and test projects. We have also applied them in customer contexts. In this talk, I will share our failures and successes and outline how they can be applied when using Robot Framework.
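As a hedged illustration of how a computed selection can be handed to Robot Framework: the programmatic entry point robot.run accepts the same options as the robot command line, so a subset can be passed via the test option (or via tags with include). The paths and test names below are made up for the example.

from robot import run

# Test names produced by whatever selection approach is in use; illustrative only.
selected = ["Valid Login", "Pay By Card"]

# robot.run mirrors the robot CLI options, so the subset can be passed directly.
run("tests/", test=selected, outputdir="results/selected")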

I will also give a checklist of which approaches work best in which contexts -- often you can really find 80% of the bugs in 5% or less of the time. But I will also reveal which approaches should be avoided at all costs, even when they are really shiny, because they do not work at all.


In-Person or Online talk?:

In-person only. (The online conference is during public school holidays where I live, so I am on family time and cannot participate.)

Categorize / Tags:

test selection, test impact analysis, predictive test selection, defect prediction, find more bugs faster

Lessons Learned:

1) Overview of test selection approaches
2) Which test selection approaches operate on easily available data and how well they work
3) Which test selection approaches operate on more costly data, how much better they are, and when they are worth it
4) Why defect prediction is snake oil and does not work at all
5) How to optimize a smoke test suite to find 80% of the bugs in 6% of the time
6) How to use a selected test suite as a quality gate before more costly testing (a sketch follows after this list)
7) Best practices for integrating test selection into continuous test execution in the CI pipeline.
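One possible shape of the quality-gate idea from point 6, sketched under the assumption that robot.run's return value (the number of failed tests) decides whether the costly full run is started at all; all paths and test names are illustrative.

import sys
from robot import run

# Fast gate: run only the selected subset first.
failures = run("tests/", test=["Valid Login", "Pay By Card"],
               outputdir="results/gate")
if failures:
    sys.exit(failures)  # fail fast, skip the costly full execution

# Gate is green: run the full, expensive suite.
run("tests/", outputdir="results/full")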

Describe your intended audience:

This talk is for a broad audience. No prior knowledge of test selection is required.

Is this suitable for ..?:

Beginner RF User, Intermediate RF User, Advanced RF User

Elmar works both as a researcher and a founder. He wrote his award-winning PhD thesis on static code analysis and is still active as a researcher in software quality analysis. In 2009, he co-founded CQSE GmbH. Since then, he and his 60 colleagues have helped international teams use software intelligence analyses to write better software in less time. Elmar frequently gives keynotes and talks at research conferences (e.g. ICSE, ICPC, SANER) and industry events (e.g. German Testing Day, Software Quality Days, Agile Testing Days, AutomationSTAR, EuroSTAR). He has been elected best speaker 10+ times, including at Clean Code Days, German Testing Day and Software Quality Days. Elmar was named Junior Fellow of the GI in 2015. His research has been cited 3600+ times. Elmar received the German Award for Software Quality in 2024.