Sustainable Test-Driven Development: The Importance of Test Failure

The typical process of Test-Driven Development goes something like this:

1. Write a test that expresses one required behavior of the system.
2. Create just enough production code (a “stub”) to allow the test to compile, and fail.
3. Run the test and watch it fail (red).
4. Modify the production code just enough to allow the test to pass.
5. Run the test and watch it pass (green).
6. Refactor the production code for quality, running the tests as you do so.
7. Return to Step 1 until all required behaviors are implemented (aka: rinse, repeat).
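
To make the cycle concrete, here is a minimal sketch of one pass through the steps above, using Python's standard unittest module. The function names (add_stub, add_impl) and the helpers (make_suite, run) are hypothetical, invented only for this illustration; in real work the stub and the implementation would be the same function at two points in time.

```python
import io
import unittest


def make_suite(add):
    """Build a one-test suite that specifies add(2, 3) == 5 for the given function."""

    class AddTest(unittest.TestCase):
        def test_adds_two_numbers(self):
            # Step 1: the test expresses one required behavior.
            self.assertEqual(add(2, 3), 5)

    return unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)


def run(suite):
    """Run a suite quietly and return the resulting unittest result object."""
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


def add_stub(a, b):
    # Step 2: just enough production code for the test to run, and fail.
    return None


def add_impl(a, b):
    # Step 4: just enough production code for the test to pass.
    return a + b


red = run(make_suite(add_stub))    # Step 3: watch it fail
green = run(make_suite(add_impl))  # Step 5: watch it pass

print("saw red:", not red.wasSuccessful())   # saw red: True
print("saw green:", green.wasSuccessful())   # saw green: True
```

The point of the sketch is that the same test is run twice, and that both results are observed rather than assumed.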

There are variations (we’ll suggest a few in another blog), but this is fairly representative. We create process steps like this to guide us, but also to create agreement across the team about what we’re going to do and when we’re going to do it. Sometimes, however, it seems unnecessary to follow every step every time, rigidly, when we’re doing something that appears to be simple or trivial.

In particular, developers new to TDD often skip step #3 (Run the test and watch it fail) when it appears completely obvious that the test is going to fail. The attitude here is often something like “it’s silly to run a test I absolutely know is going to fail, just so I can say I followed the process. I’m not a robot, I’m a smart, thinking person, and the steps are just a guideline anyway. Running a test that’s obviously going to fail is a waste of my time.”

In general, we agree with the sentiment that we don’t want to blindly follow process steps without thinking. We are also very pragmatic about the value of developer time, and agree that it should not be wasted on meaningless activities.

However, it is absolutely crucial that every test is run in the failure mode before implementation code is created to make it pass. Every time, always, no exceptions. Why do we say this?

First of all, this step really does need to be habitual. We don’t want to have to decide each and every time whether to run the test or not, as this decision-making itself takes time. If it’s a habit, it becomes like breathing in and out; we do it, but we don’t think about it. [1]

Frankly, running the tests should not feel like a big burden anyway -- if it is, then we suspect the tests are running too slowly, and that’s a problem in and of itself. We may not be managing dependencies adequately, or the entity we’re testing may have excessive coupling to other entities in our design, and so on. [2] The pain of slow tests is an indicator that we’re making mistakes, and if we avoid the pain we don’t fix the mistakes. Once the tests start feeling “heavy”, we won’t run them as often, and the TDD process will start to gradually collapse.

Running the tests can never have zero cost, but we want the cost to be so low (in terms of time) that we treat it as zero. Good TDD practitioners will sometimes run their tests just because, at the moment, they are not sure what to do next. When in doubt, we run the tests. Why not?

And let’s acknowledge that writing tests takes effort. We want all our effort to be paid back, otherwise it is waste. One place where a test repays us for writing it is whenever it fails. If we’re working on a system and suddenly a test fails, one thing we will surely think is “whoa, we’re glad we wrote that test”... because it just helped us to avoid making a mistake.

If... it can fail.

It is actually very easy (everyone does it eventually) to accidentally write a test which in truth can never fail under any circumstances. This is very bad. This is worse than no test at all. A test which can never fail gives us confidence we don’t deserve, makes us think we’ve clearly specified something about the system when we have not, and will provide no regression coverage when we later need to refactor or enhance the system. Watching the test fail, even once, proves that it can fail and thus has value.

Furthermore, the “surprise” passing of a test can often be a source of useful information. When a test passes unexpectedly we now must stop and investigate why this has happened. There are multiple possibilities:

We’ve written a test that cannot fail, as we said. The test is therefore badly written.
This test is a duplicate of another test already written. We don’t want that.
We got lucky; something in our language or framework already does what we want, we just didn’t know that or we forgot. We were about to write code we don’t need. This would be waste.
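
The third case can be sketched as follows, assuming a hypothetical WordTally class built on Python's collections.Counter. Suppose we are about to write code so that a word never seen counts as zero. The test passes before we write anything, because Counter already returns 0 for missing keys.

```python
import io
import unittest
from collections import Counter


class WordTally(Counter):
    """Hypothetical class; no code for unseen words has been written yet."""


class WordTallyTest(unittest.TestCase):
    def test_unseen_word_counts_as_zero(self):
        tally = WordTally(["red", "green", "red"])
        self.assertEqual(tally["blue"], 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordTallyTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
surprise_pass = result.wasSuccessful()

print(surprise_pass)  # True: the behavior already exists, so stop and investigate
```

Had we skipped the run, we would likely have added a special case for missing words that duplicates what the base class already does.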

So: test failure validates that the test is meaningful and unique, and it also confirms that the code we’re about to write is useful and necessary. For such a simple thing, it provides an awful lot of value.

Also, the practice should really be to run all the tests in step 3, and observe that only the test we just wrote is failing. Similarly, we should really run all the tests in step 5 (Run the test and watch it pass), and observe that all the tests are now green; that the only change was that the test we just wrote went from red to green.

Running all the tests gives us a level of confidence about what we’ve done that simply cannot be replaced by any other kind of certainty. We may think “I know this change could not possibly affect anything else in the system”, but there is nothing like seeing the tests all pass to give us complete certainty. When we have this certainty, we will move faster because we know we have a safety net, and our energy will remain at a relatively high level throughout the day, which will also speed up the development process.

Confidence is a very rare coin in software development. Anything that offers confidence to us is something we want to adhere to. In TDD we are constantly offered moments of confirmation:

The test failing confirms the validity of the test.
The test passing confirms the validity of the system.
The other tests also passing confirms that we have no hidden coupling.
The entire suite passing during refactoring confirms that we are, in fact, refactoring.

Always observe the failing test before you write the code. You’ll be glad you did, and if you don’t you will certainly, eventually, wish you had.

And, finally, here is a critical concept that will help you remember all of this:

In TDD, success is not getting to green. Success is the transition from red to green.

Therefore, without seeing the red we cannot succeed.
-----

[1] And we apologize for the fact that you are now conscious about your breathing. It’ll pass.

[2] We will dig into the various aspects of TDD and its relationship to design and code quality in another blog. For now, we’ll just stipulate the correlation.

3 comments:

Grzegorz Gałęzowski, May 12, 2012 at 7:06 AM
Hi, guys!

I wanted to comment on this post, but the comment grew so big, that I made it a blog post by itself. It's at
http://feelings-erased.blogspot.com/2012/05/test-first-why-is-it-so-important-in.html

Have a good weekend!

Amir Kolsky, July 2, 2012 at 2:57 AM
Great blog, Grzegorz!

Grzegorz Gałęzowski, October 12, 2012 at 8:13 AM
Thanks, Amir! If this blog is what you're thinking in process of writing the book, the blog I created is actually what I'm thinking in process of applying what I learn from you (both this blog and your Emergent Design class I took part in last year) in my work environment.

So in one way it's a "feedback from the battlefield" :-), and in another - a pay-back from me for the enormous effort you put into teaching us TDD, as I think all feedback will be helpful in writing a great book (which, by the way, I'll definitely buy).

Net Objectives

Thursday, March 15, 2012