Don’t let Test Automation be the final nail in the coffin!

by Ger Cloudt, author of “What is Software Quality?”

Image by Michael Schwarzenberger from Pixabay

“We need an extra Test Engineer,” the Agile Master informed me on a sunny Monday morning. I had already noticed that the team needed more effort to keep the dashboard of the nightly build and test run green. More and more often, “reds” were reported in the morning, indicating that the build or a test had failed. Good, our short feedback loop seems to work: defects are detected early and can thus be addressed immediately.
However, before granting the Agile Master’s request, I wanted to better understand the root cause of the increase in reds.

Automate everything.

Clearly there is a lot of pressure on development teams to become faster, more efficient and more predictable. Over time the software industry has addressed this demand by, among other things, changing processes and becoming Agile. One of the key principles for limiting waste is to create short feedback loops. If something is done wrong and detected quickly, the waste is limited, simply because it is relatively easy to fix. That’s why we try to test as early as possible and catch defects as early as possible: we all know that the cost of a defect grows with the time gap between its insertion and its resolution. The earlier a defect is detected and solved, the cheaper it is.
Testing as early as possible implies a lot of testing, over and over again. That’s why we started to automate testing: activities that are repeated over and over again should be automated, with clear reporting so that failures are noticed immediately without too much additional effort.

How about all these reds?

Before deciding on adding an extra Test Engineer, we needed to take a closer look at the root cause of the increasing number of reds in the nightly test run. The first question was whether the reds were indeed caused by regressions in the Software Under Test. Analysis of the reds showed that this was not the case. Failing test cases, instability in the test framework and test infrastructure, and test cases that had not been updated clearly caused the increase in reds. The next interesting question is what percentage of reds is caused by actual regressions in the product code. As it turned out, the majority of reds were caused not by regressions in the product software but by other reasons.

Over the years the automated test bench grew: configurations were added, tests were added, the infrastructure changed, and apparently more and more effort was needed to maintain all these automated tests and everything related to them. To be honest, this should not be a surprise, because test automation is software development. And software is subject to Technical Debt (see Help…, my software rots!). So your test cases and test environment will be as well.

Gherkin scenarios

Let’s have a closer look at Gherkin scenarios, supported by tooling such as SpecFlow and Cucumber.
Gherkin is a business-readable language used to describe behavior, which can be used for defining executable test cases. It consists of steps and uses keywords like “Given”, “When” and “Then” to describe a precondition, an action and a result. Each step is associated with a keyword. Scenarios are written in Gherkin, and the steps require “glue-code” to address the Software Under Test. Tooling like SpecFlow generates a signature for each step, consisting of a method interface in a specific programming language such as C# or Java. The actual code (glue-code) that implements the step and addresses the Software Under Test in the correct way needs to be programmed by the Test Automation Engineer.
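The relation between a scenario, its steps and the glue-code described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `step` decorator, the `STEPS` registry, `run_scenario` and the `Calculator` class are hypothetical names made up for this sketch, and they are not the API of SpecFlow or Cucumber.

```python
import re

STEPS = []  # hypothetical registry of (compiled pattern, glue function)


def step(pattern):
    """Register a glue function for every step text matching the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


class Calculator:
    """Stand-in for the Software Under Test (an assumption for this sketch)."""
    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount


@step(r"a calculator showing (\d+)")
def given_calculator(ctx, start):
    # Precondition: create the system under test in a known state.
    ctx["calc"] = Calculator()
    ctx["calc"].add(int(start))


@step(r"I add (\d+)")
def when_add(ctx, amount):
    # Action: address the Software Under Test.
    ctx["calc"].add(int(amount))


@step(r"the display shows (\d+)")
def then_display(ctx, expected):
    # Result: verify the observable outcome.
    assert ctx["calc"].total == int(expected)


def run_scenario(text):
    """Execute each line of a scenario by dispatching to its glue function."""
    ctx = {}
    for line in text.strip().splitlines():
        _keyword, _, rest = line.strip().partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(rest)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no glue code for step: {line.strip()}")
    return ctx


SCENARIO = """
Given a calculator showing 2
When I add 3
Then the display shows 5
"""

context = run_scenario(SCENARIO)
```

Even in this tiny sketch the glue functions are ordinary code: they have names, arguments and shared state, and they need the same care as product code.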

Although Gherkin is a simple language, you can still build a mess… and if you do not pay attention to good programming practices, you will build a mess, even in Gherkin.

Therefore, even in Gherkin scenarios you might think about defining generic steps, to be re-used in multiple test cases, next to specific steps. Even in Gherkin scenarios you might think about Clean Code principles like using meaningful names, following naming conventions and keeping steps small. Once I saw a step in a Gherkin scenario which resulted in a method with more than 15 arguments! Ouch… what does Robert C. Martin’s Clean Code say about the number of arguments? What if you need to adapt this Gherkin scenario due to a new requirement? Imagine you have many scenarios of this kind…
As you can build a mess in your Gherkin scenarios as well as in your glue-code, test cases contain Technical Debt, which might slow you down significantly and result in increasing numbers of reds in your automated test execution.
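One common way to tame a step method with a long argument list is to group related values into a single object. The following is a minimal sketch of that idea, assuming a hypothetical ordering domain: the names `Order` and `place_order` are invented for illustration and do not come from any real test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Groups values that would otherwise be separate step arguments."""
    customer: str
    product: str
    quantity: int = 1
    express: bool = False


def place_order(order: Order) -> str:
    """Glue-code style helper that takes one meaningful argument."""
    shipping = "express" if order.express else "standard"
    return f"{order.quantity} x {order.product} for {order.customer} ({shipping})"


summary = place_order(Order("Alice", "widget", quantity=2))
```

A new requirement then becomes a new field with a default value, instead of yet another positional argument in every scenario that uses the step.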

Even more code.

For testing we use Gherkin scenarios, glue-code and unit tests. All of these are code. But there is even more code: we have build scripts, configuration code and test framework code, for example. There is a lot of code outside the actual software which is delivered as product or service. This code needs to be maintained too, and it needs to change as your product evolves. New components are developed, meaning build scripts need to be adapted and new test cases need to be created. Existing test cases need to be changed, so there is always work to be done on code. For this reason, to accomplish a sustainable pace of development, this non-product code needs to be handled in the correct way, in the same way as our product code: mitigating Technical Debt as much as possible.
Is your test framework actually designed? Or did it grow without any design or structure? Is your test code under version control? Do you apply Clean Code practices to your test code and Gherkin scenarios? Do you apply static code analysis to your test code? Is your test code reviewed? Is your test infrastructure maintained? Do you track defects in test code? Are your Test Automation Engineers actual Software Engineers?

To summarize, non-product code is code as well. Technical Debt is not only applicable to your product code but to all other code as well. To keep your automated testing up and running without too much effort, one needs to apply proper software craftsmanship to all code and not only to the product code. If not, more and more effort needs to be spent on analyzing “false-positive” reds, slowing down your regular development until it becomes the final nail in the coffin.

Speed versus Pace

by Ger Cloudt, author of “What is Software Quality?”

Photo by Patrick Robert Doyle on Unsplash

Last Sunday, June 20th 2021, we witnessed maybe one of the most exciting Formula-1 races: a battle between the teams of Mercedes and Red Bull. It was a race which clearly demonstrated speed versus pace, and it can act as a metaphor for software development, where management regularly asks for speed. However, the question is whether management should ask for speed or for pace.

Qualification versus race

In a Formula-1 weekend, qualification is driven on Saturday to determine the starting order of the drivers. The driver who drives the fastest lap in Q3 of qualification gets the first and best starting position: pole position. Clearly, qualification is about speed, being the fastest over only 1 lap.

However, the race itself is about pace; in Formula-1 they even call it race-pace. The race last Sunday was run over 53 laps with a total distance of approx. 310 kilometers, taking 1 hour and 27 minutes for the winner, Max Verstappen.

The fastest lap during the qualification was driven in 1 minute and 29.99 seconds, the fastest lap during the race was driven in 1 minute and 36.4 seconds, clearly demonstrating the difference between speed and pace.

One of the reasons for the difference in velocity between qualification and the race is tire wear. The condition of the tires largely determines the grip of the car and therefore strongly influences the velocity. At higher velocities the tires wear out faster, resulting in less grip and a lower velocity. That’s why it is important for the drivers to manage their tires carefully to avoid a strong decline in grip. An additional choice drivers have is to make a pit-stop and change tires. The associated time penalty is, depending on the circuit, approx. 25 seconds for each pit-stop.

In last Sunday’s race, the winner, Max Verstappen, opted for a 2-stopper (2 pit-stops) against the 1-stopper of Lewis Hamilton, who finished second. While leading the race, closely chased by Hamilton, Verstappen made his second pit-stop to change to a fresh set of tires, giving away the lead and taking a penalty of approx. 25 seconds. Hamilton continued without an additional pit-stop, resulting in a slower pace. Verstappen overtook Hamilton 1 lap before the finish.

Software Development

Let’s see whether we can discover a similarity between Formula-1 and software development. Like the tires in a Formula-1 race wear out, software wears out as well over time. Over time, imperfections will creep into your software, called Technical Debt, by which your software wears out. As a consequence complexity of your software will increase and your development velocity will decline. Internal software quality is like the tires of a Formula-1 race car, both wear out causing a decline in velocity.

In software development there are 2 ways of mitigating Technical Debt. The first one is “doing things as they should be”, meaning applying good engineering practices like proper and thorough requirements engineering, solid design practices, producing Clean Code, and testing efficiently and sufficiently to detect defects as early as possible. This way of mitigating Technical Debt can be compared with “carefully driving” in Formula-1 to limit the wear of the tires as much as possible. However, “carefully driving”, like “doing things as they should be”, will give a lower velocity than speeding as fast as you can. Initially you will be faster when speeding or taking short-cuts, but over time your tires or software will be worn out, so that you become slower and slower. The Formula-1 car will become slower simply because the grip of the tires is declining, possibly even resulting in a blowout. The software development will become slower simply because the complexity of your software is increasing, possibly even resulting in a situation in which your engineers do not dare to touch the code anymore because they no longer understand it.

The second way of mitigating Technical Debt is “correcting imperfections” in the software, meaning restructuring and/or refactoring your code, solving open defects or adding missing test cases. This way of mitigating Technical Debt can be compared with the pit-stop to change the tires in Formula-1. A penalty is taken: in software development all the effort needed to perform this refactoring, in Formula-1 the time needed to make the pit-stop. However, the result should be less complexity in your software, so that development velocity increases, just as in Formula-1 the pit-stop results in a fresh set of tires with high grip and a resulting higher velocity.

Pace versus speed, a balancing act.

In Formula-1 it is a balancing act between speed and pace during a race. Should the team opt for a 2-stopper or a 1-stopper? How “carefully” should the driver drive to limit the wear of the tires? The same questions can be asked about mitigating Technical Debt in software development. How much effort should be invested in performing re-designs or refactoring the code? How much effort should be invested in analyzing the requirements, creating the perfect design, producing perfectly Clean Code and testing to the maximum? It depends. It depends on how long your race will take. It depends on how long you will need to develop and maintain your software. What is clear is that software development in general is a long-lasting activity which might take years or even decades. That’s why we should focus on achieving a sustainable pace instead of a high speed.

Debugging – scope matters.

by Ger Cloudt, author of “What is Software Quality?”

Image by mohamed Hassan from Pixabay

When considering the core activities of a software development team, one would typically come up with the following: requirements engineering, design & modeling, coding & unit testing, functional testing, system testing, acceptance testing, and not to forget configuration management and tool management & support. However, it seems that on average a developer creates ~70 bugs per 1000 lines-of-code, of which ~15 will find their way to the user of the software[1]. This implies there is another, not yet mentioned, important and time-consuming activity to be performed by a software development team: debugging.

Simply stated, debugging comprises the activities performed to find the root cause of a malfunction of the software and resolve it, or in short, solving that bug. Debugging might be one of the most underestimated activities of a software development team. According to Coralogix[1], solving a bug takes 30 times longer than writing a line of code. Reason enough to have a closer look at debugging, I would argue.

Execution Paths

Source code is executed sequentially by the CPU; the order in which the different statements are executed is referred to as the execution path. The figure below depicts the elementary execution paths for the possible control flows in source code.

Each circle is called a node and represents a statement. The arrows between these nodes are called edges and signify possible control flows. As an example, whenever an if-statement evaluates a condition, execution continues in one of two directions (one of the two edges), depending on the outcome of evaluating the condition. Thus, an if-then or an if-then-else statement has two possible execution paths.

The following figure shows a small program implementing an algorithm which searches for the smallest number that occurs in both of two sorted arrays, together with a visualization of the possible execution paths through this program’s code.

If we execute this program with the following input data:

array_A = {1,2,4,6,8}
array_B = {2,3,5,6,7,8,10,11}

the following execution path would be executed resulting in the correct outcome “2”.
A,B,C,D,E,G,H,J,K,L,D,E,F,K,L,D,M

Let’s insert an error

When there is a bug in the software, in most cases an unexpected execution path is executed. Suppose we introduce an error in the small program above by replacing statement E by

“if (array_A[i] != array_B[j])”

The execution path with mentioned input data would become
A,B,C,D,E,F,K,L,D,M
resulting in the wrong outcome “1”.
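The difference between the two runs can be reproduced with a small sketch. Note the assumptions: the original program is only available as a figure, so the Python function below is a reconstruction of the same idea (smallest number present in both sorted arrays), not the author’s code, and its statements do not map one-to-one onto the labels A to M. The `buggy` flag and the `trace` list are additions for illustration.

```python
def smallest_common(array_a, array_b, buggy=False):
    """Return (result, trace) for two sorted lists.

    trace records every comparison that was made, so the executed
    path can be inspected afterwards.
    """
    i, j = 0, 0
    trace = []
    while i < len(array_a) and j < len(array_b):
        a, b = array_a[i], array_b[j]
        # The inserted error replaces the equality check by '!='.
        match = (a != b) if buggy else (a == b)
        trace.append((i, j, a, b, match))
        if match:
            return a, trace
        if a < b:
            i += 1
        else:
            j += 1
    return None, trace


array_a = [1, 2, 4, 6, 8]
array_b = [2, 3, 5, 6, 7, 8, 10, 11]

good_result, good_trace = smallest_common(array_a, array_b)
bad_result, bad_trace = smallest_common(array_a, array_b, buggy=True)

print(good_result, good_trace)
print(bad_result, bad_trace)
```

The correct version needs two loop iterations before it finds the match, while the erroneous version already leaves the loop in the first iteration. Comparing the two traces shows exactly where the executed path deviates from the expected one.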

Debugging therefore focuses on understanding which execution path is executed in the error situation and where it differs from the expected execution path. To do so, one needs insight into the values of the different variables in the code at different stages of the execution, to understand why certain decisions are made during execution. Debug tooling may support this process with the ability to set breakpoints, step through the code statement by statement and examine the values of variables at any required time. Once the actual, unexpected execution path is identified, the engineer needs to understand why this path was executed before the bug can be fixed.

Scope matters

Debugging a bug in the example in this article would be relatively simple due to the scope of the problem. In this program we only have 1 function consisting of 13 lines-of-code taking less than ½ page of paper if printed. A minimum scope which decreases complexity of debugging.

But how about a medium scope consisting of multiple functions like in the following figure? Here we have a visualization of execution paths of 11 functions consisting of 99 lines-of-code, which would take ~3 pages of paper if printed.

Somewhere in this code there is a bug……. As you can imagine it will be more complex to debug and find out where the bug is located, it could be anywhere.

And how about a bug detected during system testing or by a customer while running the complete software program? A large scope consisting of e.g. 100,000 lines-of-code? Thousands of functions and when printed being an equivalent of a pile of books like in the picture below.

Somewhere in this code there is a bug……. It can be anywhere, in any book, in any chapter, on any page. You can imagine that debugging in a large scope is much more complex than having to consider a medium scope or, even better, a minimum scope only. As modern software programs are even much bigger than the pile of books in the picture, it becomes evident that we should focus on finding bugs in the smallest possible scope. That’s why engineers should focus on unit testing, in which the scope is the smallest, and try to catch the majority of bugs there. Only a few bugs should find their way into component or module testing, and even fewer should find their way into system and acceptance testing, simply because debugging becomes more complex when the scope increases.
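As an illustration of catching a bug in the smallest scope, here is a minimal unit test sketch using Python’s standard `unittest` module. It assumes a Python reconstruction of the article’s example algorithm (the function `smallest_common` is that reconstruction, not the author’s original program), and the test names are hypothetical.

```python
import unittest


def smallest_common(array_a, array_b):
    """Return the smallest value present in both sorted lists, or None."""
    i, j = 0, 0
    while i < len(array_a) and j < len(array_b):
        if array_a[i] == array_b[j]:
            return array_a[i]
        if array_a[i] < array_b[j]:
            i += 1
        else:
            j += 1
    return None


class SmallestCommonTest(unittest.TestCase):
    def test_article_example(self):
        result = smallest_common([1, 2, 4, 6, 8], [2, 3, 5, 6, 7, 8, 10, 11])
        self.assertEqual(result, 2)

    def test_first_elements_differ(self):
        # Fails immediately if '==' is accidentally replaced by '!='.
        self.assertEqual(smallest_common([1, 5], [2, 5]), 5)

    def test_no_common_value(self):
        self.assertIsNone(smallest_common([1, 3], [2, 4]))

    def test_empty_input(self):
        self.assertIsNone(smallest_common([], [1]))
```

Saved in a file, these tests can be run with `python -m unittest`. When one of them fails, the bug must be inside this single function, which is a far smaller search area than a pile of books.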


[1] https://coralogix.com/log-analytics-blog/this-is-what-your-developers-are-doing-75-of-the-time-and-this-is-the-cost-you-pay/

Follow the green-line when tempted to follow the red-line.

by Ger Cloudt, author of “What is Software Quality?”

Organizations are striving for high-productivity, which, in Agile software development, can be expressed as velocity of a team. Teams with a high velocity are considered as being efficient. But is this true? Is it possible that a team with an apparent high velocity ultimately has a lower velocity? Yes, it is.

The cumulative velocity of a team results in what we call earned value: a cumulative representation of all story points earned by completing User Stories. An earned value chart is often used in projects to show progress in the completion of User Stories. The slope of this chart is determined by the velocity of the team, and guess what… we would like to see a steep slope, indicating a high velocity and reaching the targeted delivery scope as soon as possible. Here lies a risk: the team may be tempted to deliver a high velocity by, for example, postponing the resolution of bugs. Because the later a bug is detected and solved, the more effort is needed, postponing the resolution of bugs to later stages of the project will result in a long period of “maturing the system” in which the unresolved bugs need to be addressed. This is what we call following the “red-line”: an apparently high velocity in finishing User Stories, followed by a relatively long maturing period.

We, as software professionals, however should not be tempted to follow the “red-line”, but we should make sure testing is sufficiently addressed and found bugs are solved fast, resulting in a flatter slope (and thus in an apparently lower velocity) such that the period of “maturing the system” is limited which results in an earlier delivery date! Mitigating the defect resolution gap as much as possible is what we call following the “green-line”.

Another reason to get on and stick to the green-line is related to risk and uncertainty. Insufficient testing, for example, introduces risk that is difficult to estimate. Which tests have not been executed yet? How many bugs will result from these tests, and how much effort will it take to solve them? These are good reasons to test-as-early-as-possible and solve-defects-as-early-as-possible, resulting in a flatter slope of your earned value line but in a significantly shorter maturing period and thus an earlier delivery of your product. Therefore, as a manager you should not ask your team about their velocity, but you should ask whether the team is on the “green-line”!

Trial & Error Programming

by Ger Cloudt, author of “What is Software Quality?”

credit: www.geek-and-poke.com, CC by 3.0

When browsing the internet I came across the cartoon above from “geek & poke”. It reminded me of interviews with job applicants in which I noticed that they did not know what they were doing. “Good coders know what they’re doing…..”, then why is it so hard to find good coders?

It is one of my biggest disappointments that many people who claim to be software engineers do not seem to understand coding anymore. A basic piece of C code with a simple pointer algorithm is not understood by job applicants who claim to be knowledgeable and experienced in C programming. Asking them to solve a simple programming assignment brings many of them to despair……..

Wondering whether I was the only hiring manager in embedded software stumbling across this problem, I did some searching on the internet and encountered “why programmers can’t program”, from which I learned that I am not the only one. Reading blogs about experiences with recruiting and interviewing SW engineers, it seems their authors have the same problems as I have.

Punch Cards

Having some thoughts about how I learned programming myself in the late 70’s and early 80’s reminds me of the “good old days” in which we did not have an editor. No, when I learned programming we had to use punch cards. You had to type your code using a punching machine producing punch cards. Each line-of-code was captured on one punch card, resulting in a pile of punch cards to be delivered to the computer facility and the day after you could examine your results.

Because you needed to wait 1 day before the results of your program were available, there was quite a threshold to getting these results. Because of this threshold you made sure that you reviewed your code, together with your colleagues, over and over again, to be sure you understood your code so that your run would be successful. If not, you would have lost at least a complete day.

Press F5

What a contrast to today’s tooling! In modern IDEs (Integrated Development Environments) you can type and modify your code, and pressing e.g. “F5” will compile, build and execute your program instantly.
In case of any syntax errors you will get feedback within seconds, and if the build succeeds you can execute your program and tests immediately, with fast results. You do not need to wait for any results anymore; they are available instantly. Why bother reviewing your code? Why bother understanding your code? The tooling and test cases will immediately inform you about your success or failure. There is no longer any threshold to trying changes and seeing whether they bring the wanted result.

Making it as easy and fast as possible to code and check your results is an invitation to “trial & error”, and programmers will start to rely on the available tooling.

Automated testing

Next to the modern IDEs, which help us in coding, we usually automate everything we can automate, enabling a swift execution of test cases that detects regressions as early as possible. A good practice, because when a regression is detected early, it is easier (and cheaper) to correct.
Secondly, when changes are made to your code or code is refactored, your automated test suite is also used to see whether unwanted side effects or regressions are introduced.
However, the problem, again, is that programmers start to rely on this automated test execution: even if you do not completely understand your code, your program is considered to be ok as long as the tests pass.

The problem lies in the fact that your test suite can never reach a coverage of 100%. Running all possible execution paths through your code is impossible and there will be always happy-, alternative- or sad-flows which are not executed during your automated tests.
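A small sketch can make this concrete. In the hypothetical function below (the names `scale`, `run_test_suite` and `paths_for` are invented for illustration), the test suite executes every line and both outcomes of every decision, yet one of the four execution paths is never run, and that is exactly the path that fails. In general, n independent decisions in sequence already give 2^n possible paths.

```python
def scale(value, double, invert):
    """Hypothetical function with two independent decisions (four paths)."""
    factor = 1
    if double:
        factor += 1   # factor becomes 2
    if invert:
        factor -= 2   # factor becomes -1, or 0 when double is also set
    return value / factor


def run_test_suite():
    """Covers every line and every branch outcome, but only 3 of 4 paths."""
    assert scale(10, double=True, invert=False) == 5
    assert scale(10, double=False, invert=True) == -10
    assert scale(10, double=False, invert=False) == 10
    return "all tests passed"


def paths_for(n_decisions):
    """Number of execution paths through n independent sequential decisions."""
    return 2 ** n_decisions


print(run_test_suite())
print(paths_for(2), paths_for(30))
```

The untested combination `scale(10, double=True, invert=True)` divides by zero. A green test run says nothing about that path; only understanding the code reveals it.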

That’s why it is important not to rely fully on your tooling and automated tests, but still to understand your code and know what you are doing!

Whiteboard programming

And then we ask the job applicant to produce a piece of code performing a simple, basic task on the whiteboard. No IDE, no build, no test cases. Just explaining and discussing the code produced with the whiteboard marker. To be honest….., a great way of learning to understand coding. Once, a job applicant who failed the exercise thanked us afterwards, saying he had never learned as much in one hour as during the interview.
Therefore, to become a “good coder who knows what he/she is doing”, do not use the IDE, build and test cases to prove your solution right, but instead review, discuss and understand your code before pressing “F5”.