Monday, June 25, 2012

5' on IT-Architecture: four laws of robust software systems

Murphy's Law ("If anything can go wrong, it will") was born in 1949 at North Base, Edwards Air Force Base. It is named after Capt. Edward A. Murphy, an engineer working on Air Force Project MX981, a project designed to see how much sudden deceleration a person can stand in a crash. One day, after finding that a transducer was wired wrong, he cursed the technician responsible and said, "If there is any way to do it wrong, he'll find it."

For that reason it is a good idea to put some quality assurance process in place. I could also have called this post "the four laws of steady software quality". It's about some fundamental techniques that help to achieve superior quality over the long run. This is particularly important if you're developing a central component that will cause serious damage if it fails in production. OK, here is my (never final and not comprehensive) list of practical quality assurance tips.

Law 1: facilitate change

There is nothing permanent except change. If a system isn't designed with this fundamental reality in mind, its probability of failure is likely to be above average. A widely used technique to facilitate change is to develop a sufficient set of unit tests. Unit tests uncover regressions in existing functionality after changes have been made to a system. Writing them also encourages you to really think about the desired functionality and the required design of the component under development.
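To make this concrete, here is a minimal sketch of such a regression safety net using Python's standard unittest module. The function `apply_discount` and its rules are made up for illustration; the point is only that a later change to the function has to keep these tests green.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical business rule: reduce a price by a whole-number percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    def test_main_path(self):
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)


def run_tests() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If someone later "simplifies" the validation or the rounding, at least one of these tests fails before the change reaches production.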

Law 2: don't rush through the functional testing phase

In economics, the marginal utility of a good is the gain (or loss) from an increase (or decrease) in the consumption of that good. The law of diminishing marginal utility says that the marginal utility of each (homogeneous) unit decreases as the supply of units increases (and vice versa). The first functional test cases often walk through the main scenarios, covering the main paths of the software under test. None of the code they exercise has been executed before, so these test cases have a very high marginal utility. Subsequent test cases may walk through the same code ranges, except for specific side paths, for instance at specific validation conditions. These test cases may cover three or four additional lines of code in your application. As a result, they have a smaller marginal utility than the first test cases.

My law about functional testing suggests: as long as the execution of the next test case yields a significant utility, the more time you invest in testing, the better the outcome! So don't rush through a functional testing phase and miss out on some useful test case (this assumes the special case in which usefulness can be quantified). Try to find the useful test cases that promise a significant gain in perceptible quality. On the other hand, if you're executing test cases with a negative marginal utility, you're investing more effort than you gain in terms of perceptible quality. There is also the special (but not uncommon) situation where the client does not run functional tests on a systematic basis. The law then suggests: the longer the application is in the test environment, the better the outcome.
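One rough way to put a number on this is to count how many lines each test case executes for the first time. The sketch below uses newly covered lines as a stand-in for utility, which is my simplifying assumption (real utility also depends on how critical the code is), and the coverage data is made up.

```python
def marginal_coverage(test_runs):
    """Return (test name, number of lines covered for the first time) per test.

    `test_runs` is an ordered list of (name, iterable of executed line numbers).
    """
    seen = set()
    result = []
    for name, lines in test_runs:
        new_lines = set(lines) - seen
        result.append((name, len(new_lines)))
        seen |= new_lines
    return result


# Made-up coverage data: which line numbers each test case executed.
runs = [
    ("main_scenario", range(1, 201)),
    ("second_main_scenario", range(150, 301)),
    ("validation_sidepath", list(range(1, 201)) + [310, 311, 312]),
]

gains = marginal_coverage(runs)
# The first test covers 200 new lines, the second 100, the side path only 3.
```

The last test case still has a positive marginal utility, just a much smaller one than the first two.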

Law 3: run (non-functional) benchmark tests

Another piece of good, lasting software quality is a regular load test. To make the results usable, load tests need a defined, steady environment and a baseline of measured values (a benchmark). These values are at least: CPU, response time, memory footprint. Load tests of new releases can then be compared to the load tests of older releases. Because the comparison is relative, we can also bypass the often stated requirement that the load test environment needs to have the same capacity parameters as the production environment. In many cases it is possible to see the really big issues with a relatively small set of parallel users (e.g. 50 users).
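The comparison against the baseline can be automated. The following is a minimal sketch, assuming the three metrics named above have already been measured; the class name, the field names and the 10% tolerance are my own choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestResult:
    cpu_percent: float
    response_time_ms: float
    memory_mb: float


def regressions(baseline, candidate, tolerance=0.10):
    """Return the metrics that got worse by more than `tolerance` (relative).

    Assumes all baseline values are greater than zero.
    """
    findings = {}
    for metric in ("cpu_percent", "response_time_ms", "memory_mb"):
        old = getattr(baseline, metric)
        new = getattr(candidate, metric)
        change = (new - old) / old
        if change > tolerance:
            findings[metric] = round(change, 3)
    return findings


# Made-up measurements from the same, unchanged load test environment.
previous_release = LoadTestResult(cpu_percent=40.0, response_time_ms=200.0, memory_mb=512.0)
new_release = LoadTestResult(cpu_percent=42.0, response_time_ms=260.0, memory_mb=520.0)

findings = regressions(previous_release, new_release)
# Only the response time exceeds the tolerance here (30% slower).
```

Since only relative changes are evaluated, the absolute capacity of the test environment does not matter as long as it stays the same between runs.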

It makes limited sense to do load testing if single-user profiling results are bad. Therefore it's a good idea to run repeatable profiling test cases with every release. This way profiling results can be compared to each other (again: the benchmark idea). We do CPU and elapsed-time profiling as well as memory profiling. Profiling is an activity that runs in parallel to actual development. It makes sense to focus on the main scenarios that are used regularly in production.
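As an illustration of a repeatable single-user profiling run, here is a small sketch based on Python's standard cProfile, pstats and tracemalloc modules. This is not the tooling we use; `main_scenario` is a hypothetical stand-in for a real production scenario, and the returned numbers would be stored per release and compared like any other benchmark.

```python
import cProfile
import pstats
import tracemalloc


def main_scenario():
    """Hypothetical stand-in for a main scenario used regularly in production."""
    return sum(i * i for i in range(10_000))


def profile_scenario(scenario):
    """Run one scenario and collect call count, time and peak memory."""
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    scenario()
    profiler.disable()
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    stats = pstats.Stats(profiler)
    return {
        "total_calls": stats.total_calls,    # number of profiled function calls
        "elapsed_seconds": stats.total_tt,   # time spent in profiled functions
        "peak_bytes": peak_bytes,            # peak traced memory during the run
    }


profile = profile_scenario(main_scenario)
```

The absolute values depend on the machine, so they are only meaningful when compared with earlier runs in the same environment.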

Law 4: avoid dependency lock-in

The difference between trouble and a severe crisis is the time it takes to fix the problem that causes the trouble. For this reason you always need a way back to your previous release: a fallback scenario that avoids a production crisis with severe business impact. You enable rollback by avoiding dependency lock-in. Runtime dependencies between your application and neighbouring systems arise when interfaces or contracts are changed jointly during development. If you implemented requirements that resulted in changed interfaces and contracts, then you cannot simply roll back your application on its own. Therefore you need to avoid too many interface and contract changes. Short release cycles help to reduce dependencies between application versions in one release, because fewer changes are rolled out to production. Another countermeasure against dependency lock-in is to keep neighbouring systems backward compatible for one version.
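Here is a minimal sketch of what "backward compatible for one version" can look like on the consuming side. The message layout, the field names and the version numbers are invented for illustration; the idea is that the neighbouring system accepts both the new and the previous contract, so the sender can be rolled back on its own.

```python
def parse_order(message: dict) -> dict:
    """Accept the current (v2) and the previous (v1) contract version."""
    version = message.get("version", 1)  # assumption: v1 messages carry no version field
    if version == 2:
        # Hypothetical v2 contract: the amount is a nested object with a currency.
        return {
            "order_id": message["order_id"],
            "amount_cents": message["amount"]["cents"],
            "currency": message["amount"]["currency"],
        }
    if version == 1:
        # Hypothetical v1 contract: a flat amount in cents, currency implied.
        return {
            "order_id": message["order_id"],
            "amount_cents": message["amount_cents"],
            "currency": "EUR",
        }
    raise ValueError(f"unsupported contract version: {version}")


# Both the new release and a rolled-back sender are understood.
from_new_release = parse_order(
    {"version": 2, "order_id": "A-1", "amount": {"cents": 1250, "currency": "EUR"}}
)
from_rolled_back_release = parse_order({"order_id": "A-1", "amount_cents": 1250})
```

Support for v1 can be dropped one release later, once a rollback to the old sender is no longer an option.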

That's it in terms of robust systems.
Cheers, Niklas


  1. As a sales engineer, I have seen corporations spend an inordinate amount of time worrying about dependency lock-ins only to then purchase a component because it appeals to their current needs. In other words, we are always locked into some feature, starting from the OS and the device on which the app is running.

    1. This misses the point, I think. Dependency lock-in happens when application A is deployed and it requires application B in a new version. If application B fails and you need to undeploy it and roll back to the previous version, then application A will not function properly. That means you also need to undeploy application A. What I am suggesting is to make an effort to avoid such a scenario.