Net Objectives

Thursday, March 29, 2012

Notice: Test Categories finally finished

This was an edit to an existing blog posting, so you might have missed it... but, we finally did finish part 3 of test categories.  Whew!

It's here: http://www.sustainabletdd.com/2012/02/testing-best-practices-test-categories.html

Thursday, March 15, 2012

The Importance of Test Failure

The typical process of Test-Driven Development goes something like this:

  1. Write a test that expresses one required behavior of the system.
  2. Create just enough production code (a “stub”) to allow the test to compile, and fail.
  3. Run the test and watch it fail (red).
  4. Modify the production code just enough to allow the test to pass.
  5. Run the test and watch it pass (green).
  6. Refactor the production code for quality, running the tests as you do so.
  7. Return to Step 1 until all required behaviors are implemented (aka: rinse, repeat).
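
To make the cycle concrete, here is a minimal sketch of one pass through steps 1 through 5, assuming Python's built-in unittest and a hypothetical Stack class (our illustration, not a prescription):

    import unittest

    # Step 2: just enough production code for the test to compile -- a stub.
    class Stack:
        def push(self, item):
            pass

        def is_empty(self):
            return True   # deliberately wrong stub value

    # Step 1: a test expressing one required behavior of the system.
    class StackTest(unittest.TestCase):
        def test_stack_is_not_empty_after_push(self):
            stack = Stack()
            stack.push("x")
            self.assertFalse(stack.is_empty())   # Step 3: run and watch it fail

    if __name__ == "__main__":
        unittest.main()

Step 4 would then replace the stub with a real implementation (for example, tracking pushed items in a list), and step 5 is simply running the same test again and watching it go from red to green.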

There are variations (we’ll suggest a few in another blog), but this is fairly representative. We create process steps like this to guide us, but also to create agreement across the team about what we’re going to do and when we’re going to do it.  Sometimes, however, it seems unnecessary to follow every step every time, rigidly, when we’re doing something that appears to be simple or trivial. 

In particular, developers new to TDD often skip step #3 (Run the test and watch it fail) when it appears completely obvious that the test is going to fail.  The attitude here is often something like “it’s silly to run a test I absolutely know is going to fail, just so I can say I followed the process.  I’m not a robot, I’m a smart, thinking person, and the steps are just a guideline anyway.  Running a test that’s obviously going to fail is a waste of my time.” 

In general, we agree with the sentiment that we don’t want to blindly follow process steps without thinking.  We are also very pragmatic about the value of developer time, and agree that it should not be wasted on meaningless activities.

However, it is absolutely crucial that every test is run in the failure mode before implementation code is created to make it pass.  Every time, always, no exceptions. Why do we say this?

First of all, this step really does need to be habitual.  We don’t want to have to decide each and every time whether to run the test or not, as this decision-making itself takes time.  If it’s a habit, it becomes like breathing in and out; we do it, but we don’t think about it. [1]

Frankly, running the tests should not feel like a big burden anyway -- if it is, then we suspect the tests are running too slowly, and that’s a problem in and of itself.  We may not be managing dependencies adequately, or the entity we’re testing may have excessive coupling to other entities in our design, etc...[2]  The pain of slow tests is an indicator that we’re making mistakes, and if we avoid the pain we don’t fix the mistakes. Once the tests start feeling “heavy”, we won’t run them as often, and the TDD process will start to gradually collapse. 

Running the tests can never have zero cost, but we want the cost to be so low (in terms of time) that we treat it as zero.  Good TDD practitioners will sometimes run their tests just because, at the moment, they are not sure what to do next.  When in doubt, we run the tests.  Why not? 

And let’s acknowledge that writing tests takes effort.  We want all our effort to be paid back, otherwise it is waste.  One place where a test repays us for writing it is whenever it fails.  If we’re working on a system and suddenly a test fails, one thing we will surely think is “whoa, we’re glad we wrote that test”... because it just helped us to avoid making a mistake. 

If... it can fail. 

It is actually very easy (everyone does it eventually) to accidentally write a test which in truth can never fail under any circumstances.  This is very bad.  This is worse than no test at all.  A test which can never fail gives us confidence we don’t deserve, makes us think we’ve clearly specified something about the system when we have not, and will provide no regression coverage when we later need to refactor or enhance the system. Watching the test fail, even once, proves that it can fail and thus has value. 
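
To illustrate how this happens (this sketch is ours, not from the original post), here are two common shapes of a test that can never fail, written against a hypothetical apply_discount function:

    import unittest

    def apply_discount(price, rate):
        return price * (1 - rate)

    class DiscountTest(unittest.TestCase):
        def test_tautology(self):
            # The "expected" value is computed by the very code under test,
            # so the assertion compares the code to itself and can never fail.
            expected = apply_discount(100.0, 0.2)
            self.assertEqual(expected, apply_discount(100.0, 0.2))

        def test_missing_assertion(self):
            # The result is computed but never asserted; this test passes
            # no matter what apply_discount actually does.
            apply_discount(100.0, 0.2)

    if __name__ == "__main__":
        unittest.main()

Running either test against a deliberately wrong stub would never show red -- which is exactly the warning the failure step is designed to give us.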

Furthermore, the “surprise” passing of a test can often be a source of useful information. When a test passes unexpectedly, we must stop and investigate why this has happened.  There are multiple possibilities:

  • We’ve written a test that cannot fail, as we said.  The test is therefore badly written.
  • This test is a duplicate of another test already written.  We don’t want that.
  • We got lucky; something in our language or framework already does what we want, and we just didn’t know that or we forgot.  We were about to write code we don’t need.  This would be waste.
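
As a small, hypothetical example of that last case: suppose we write a test intending to add whitespace handling to a parse_port helper, only to see it pass immediately because the language already takes care of it:

    import unittest

    def parse_port(text):
        # Existing production code.
        return int(text)

    class ParsePortTest(unittest.TestCase):
        def test_surrounding_whitespace_is_tolerated(self):
            # We planned to add an explicit .strip() to parse_port, but this
            # test passes right away: Python's int() already ignores leading
            # and trailing whitespace.  The surprise pass tells us the code
            # we were about to write is unnecessary.
            self.assertEqual(8080, parse_port("  8080 \n"))

    if __name__ == "__main__":
        unittest.main()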

So: test failure validates that the test is meaningful and unique, and it also confirms that the code we’re about to write is useful and necessary.  For such a simple thing, it provides an awful lot of value. 

Also, the practice should really be to run all the tests in step 3, and observe that only the test we just wrote is failing.  Similarly, we should really run all the tests in step 5 (Run the test and watch it pass), and observe that all the tests are now green; that the only change was that the test we just wrote went from red to green. 

Running all the tests gives us a level of confidence about what we’ve done that simply cannot be replaced by any other kind of certainty.  We may think “I know this change could not possibly affect anything else in the system”, but there is nothing like seeing the tests all pass to give us complete certainty.  When we have this certainty, we will move faster because we know we have a safety net, and our energy will remain at a relatively high level throughout the day, which will also speed up the development process. 

Confidence is a very rare coin in software development.  Anything that offers confidence to us is something we want to adhere to.  In TDD we are constantly offered moments of confirmation:

  • The test failing confirms the validity of the test.
  • The test passing confirms the validity of the system.
  • The other tests also passing confirms that we have no hidden coupling.
  • The entire suite passing during refactoring confirms that we are, in fact, refactoring.

Always observe the failing test before you write the code.  You’ll be glad you did, and if you don’t you will certainly, eventually, wish you had.   

And, finally, here is a critical concept that will help you remember all of this:

In TDD, success is not getting to green.  Success is the transition from red to green.

Therefore, without seeing the red we cannot succeed.
-----

[1] And we apologize for the fact that you are now conscious about your breathing.  It’ll pass.

[2] We will dig into the various aspects of TDD and its relationship to design and code quality in another blog.  For now, we’ll just stipulate the correlation.

Friday, March 2, 2012

ATDD and TDD

Download the Podcast Part 1
Download the Podcast Part 2



A question that we are often asked is: “What is the difference between Acceptance Test Driven Development (ATDD) and Test Driven Development (TDD)?” These two activities are related by name but otherwise seem to have little to do with each other. 

ATDD is a whole-team practice where the team members discuss a requirement and come to an agreement about the acceptance criteria for that requirement. Through the process of accurately specifying the acceptance criteria -- the acceptance test -- the team fleshes out the requirement, discovering and corroborating the various assumptions made by the team members and identifying and answering the various questions that, unanswered, would prevent the team from implementing or testing the system correctly.

The word acceptance is used in a wide sense here:

  • The customer agrees that if the system, which the team is about to implement, fulfills the acceptance criteria then the work was done properly
  • The developers accept the responsibility for implementing the system
  • The testers accept the responsibility for testing the system

This is a human-oriented interaction that focuses on the customer, identifying their needs. These needs are specified using the external, public interfaces of the system. 

TDD, on the other hand, is a developer-oriented activity designed to assist the developers in writing the code by strict analysis of the requirements and the establishment of functional boundaries, work-flows, significant values, and initial states. TDD tests are written in the developer’s language and are not designed to be read by the customers. These tests can use the public interfaces of the system, but are also used to test internal design elements. 

We often see the developers take the tests written through the ATDD process and implement them with a unit testing framework.
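
As a hedged sketch of that handoff (the names and the criterion here are invented for illustration), an acceptance criterion agreed on with the customer, phrased against the system's public interface, might be captured directly in a unit testing framework:

    import unittest

    # Hypothetical public interface of the system under discussion.
    class ShoppingCart:
        def __init__(self):
            self._items = []

        def add_item(self, name, price):
            self._items.append((name, price))

        def total(self):
            return sum(price for _, price in self._items)

    class CartAcceptanceTest(unittest.TestCase):
        # Acceptance criterion agreed with the customer:
        # "The cart total is the sum of the prices of the items added."
        def test_total_is_sum_of_added_item_prices(self):
            cart = ShoppingCart()                  # given an empty cart
            cart.add_item("book", 12.50)           # when two items are added
            cart.add_item("pen", 2.50)
            self.assertEqual(15.00, cart.total())  # then the total is their sum

    if __name__ == "__main__":
        unittest.main()

A developer-facing TDD test for the same system, by contrast, might pin down how the cart stores its items or how a discount policy is wired in -- details the customer never reads.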

Requirements from the customer

Before we continue, we need to ask ourselves -- what is a requirement? It is something that the customer needs the system to do. But who is the customer? 

In truth, every system has more than one customer... dozens at times:

  • Stakeholders
  • End users, of different types
  • Operators
  • Administrators (DB, network, user, storage)
  • Support (field, customer, technical)
  • Sales, marketing, legal, training
  • QA and developers (e.g., traces and logs, simulators for QA)
  • etc...

All requirements coming from all of these different customers must be identified, expressed, and addressed through the ATDD process. For example:

  • The legal department needs an End User License Agreement (EULA) to be displayed when the software is first run, and for the end user to check off the agreement before the system can be used.  This is of no interest to the end users (whom we sometimes think of as ‘the customers’); in fact it might be an annoyance to them, but it is required for the system to be acceptable to the lawyers.
  • The production support team needs all error messages in the system to be accompanied by error codes that can be reported along with the condition that caused the error.  Here again, end users are not interested in these codes, but they can be crucial for the system to be acceptably supported.

And let us not forget that the developers are customers too; who else do we build tracers and loggers for? This is an obvious, publicly visible facet of the developer’s work. But when do we need these facilities? When we try to fix bugs. When we want to understand how the system works. When we work on the system for any reason.  

In other words, when we do maintenance on the system.

Maintainability is a requirement

We need our maintenance to be as easy as possible. No car owner would like to disassemble the car’s engine just to change a windshield wiper; nor would they want to worry that by changing a tire they have damaged the car’s entertainment system. 

Maintainability is a crucial requirement for any software system. Software maintenance should be fast, safe, and predictable: you should be able to make a change quickly, without breaking anything, and you should be able to say reliably how long it will take. We expect this of our car mechanic as well as of our software developer. So although maintainability is primarily the concern of the developer, it definitely affects the non-technical customers. 

The way maintainability manifests itself in software is through design. Design principles are to developers what mathematics is to physicists: they are the basis of everything that we do. If we do not pay attention to the system’s design as it is developed, it will quickly become unmanageable. 

How often, however, have you seen “maintainability” as a requirement? We’ve never seen it. We call it the “hidden requirement.” It’s always there, but no one talks about it. And because we don't talk about it, we forget about it; we focus on fulfilling the written requirements, thinking that we will be done when we complete them. And very quickly, the system becomes very hard and unsafe to change.  We are accumulating technical debt, which we could just call “the silent killer.” 

If maintainability is such a crucial requirement, where are the acceptance criteria for it? Who is the customer for this requirement? The development team.  

We need to prove to the customer that the design, as conceived, was actually implemented, and that this design is in fact maintainable: that the correct abstractions exist, that object factories do what they are supposed to, that the functional units operate the way they should, etc... 

Indeed there is something that we can do, in the developers’ own language -- code -- that does precisely these things. It’s TDD. 

One key purpose of TDD is to prove and document the design of the system, hence proving and documenting its maintainability.
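
As a small, hypothetical illustration of what “proving the design” can look like in code (the names here are ours), consider a test that pins down the contract of an object factory: whatever concrete type it chooses, callers receive the abstraction they depend on.

    import unittest
    from abc import ABC, abstractmethod

    # A hypothetical abstraction, implementation, and factory.
    class Notifier(ABC):
        @abstractmethod
        def send(self, message): ...

    class EmailNotifier(Notifier):
        def send(self, message):
            return "email: " + message

    class NotifierFactory:
        @staticmethod
        def create():
            return EmailNotifier()

    class NotifierFactoryTest(unittest.TestCase):
        def test_factory_returns_the_notifier_abstraction(self):
            # Documents a design decision: clients of the factory can rely
            # on the Notifier interface, not on any concrete class.
            notifier = NotifierFactory.create()
            self.assertIsInstance(notifier, Notifier)
            self.assertEqual("email: hello", notifier.send("hello"))

    if __name__ == "__main__":
        unittest.main()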

TDD is developer-facing ATDD


ATDD is about the acceptability of the system to its various customers.  When the specific customer is the development team, then the tests are about the acceptability of the system’s design and its resulting maintainability.  Our focus in this work is acceptability in this sense: is the design acceptable?  Is our domain understanding sufficient and correct?  Have we asked enough questions, and were they the right ones?  A system that fails to meet these acceptance criteria will quickly become too expensive to maintain and thus will fail to meet the needs of those who use it. 

Software that fails to meet a need is worthless.  It dies.  So, here again, failing to pass the “maintainability” acceptance criteria is the silent killer.  TDD is the answer to this ailment. 

Note to readers: This was a philosophical treatise. Specific, practical examples abound and will constitute much of our work here, so, read on.