Net Objectives


Wednesday, September 23, 2015

TDD and Its (at least) 5 Benefits

Many developers have concerns about adopting test-driven development, specifically regarding:
  • It's more work.  I'm already over-burdened and now you're giving me a new job to do.
  • I'm not a tester.  We have testers for testing, and they have more expertise than I do.  It will take me a long time to learn how to write tests as well as they do.
  • If I write the code, and then test it, the test-pass will only tell me what I already know: the code works.
  • If I write the test before the code the failing of the test will only tell me what I already know: I have not written the code yet.
Here we are going to deal primarily with the first one: the concern that TDD is going to add work.

This is an understandable concern, at least initially, and it is not only the developers who express it.  Project managers fear that the team's productivity, for which they are accountable, will decrease.  Project sponsors fear that the cost of the project will go up if the developers end up spending a fair amount of their time writing tests; after all, the primary cost of creating software is developer time.

The fact is, TDD is not about adding new burdens to the developers, but rather it is just the opposite: TDD is about gaining multiple benefits from a single activity.

In the test-first activity developers are not really writing tests.  They look like tests, but they are not (yet).  They are an executable specification (this is a critical part of our entry on redefining TDD).  As such, they do what specifications do: they guide the creation of the code.  Traditional specifications, however, are usually expressed in some colloquial form, perhaps a document and/or some diagrams.  Communication in this form can be very lossy and easy to misinterpret.  Missing information can go unnoticed.

For example, one team decided to create a poker game as part of their training on TDD.  An enjoyable project is often a good choice when learning, as we tend to retain information better when we're having a good time.  Also, these developers happened to live and work in Las Vegas. :) Anyway, it was a contrived project and so the team came up with the requirements themselves: basically the rules of poker and the mechanics of the game.  One requirement they came up with was "the system should be able to shuffle the deck of cards into a reordered state."  That seemed like a reasonable thing to require until they tried to write a test for it.  How does one define "reordered?"  One developer said "oh, let's say at least 90% of the cards need to be in a new position after the shuffle completes."  Another developer smiled and said "OK, just take the top card and put it on the bottom.  100% will be in a new position.  Will that be acceptable?"  They all agreed it would not.  This seemingly simple issue ended up being more complicated than anyone had anticipated.

In TDD we express the specification in actual test code, which is very unforgiving.  One of the early examples of this for us was the creation of a Fahrenheit-to-Celsius temperature conversion routine.  The idea seemed simple: take a measurement in Fahrenheit (say 212 degrees, the boiling point of water at sea level), and convert it to Celsius (100 degrees).  That statement seems very clear until you attempt to write a unit test for it, and realize you do not know how accurate the measurements should be.  Do we include fractional degrees?  To how many decimal places?  And of course the real question is: what is this thing going to be used for?  This form of specification will not let you get away with not knowing, because code is that exacting.

Put another way, a test asks "how accurate is this conversion routine?"  A specification asks "how accurate does this conversion routine need to be?" which is, of course, a good question to ask before you attempt to create it.
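To make this concrete, here is a minimal sketch of what such an executable specification might look like (C#, NUnit-style).  The class name, the method, and the one-tenth-of-a-degree tolerance are all assumptions made up for illustration; the tolerance is precisely the decision the test refuses to let you dodge.

```csharp
// A hypothetical executable specification for the conversion routine.
// In test-first work the TemperatureConverter class would not exist yet;
// the stub is included only so this sketch compiles on its own.
using NUnit.Framework;

public static class TemperatureConverter
{
    public static double FahrenheitToCelsius(double fahrenheit)
    {
        return (fahrenheit - 32.0) * 5.0 / 9.0;
    }
}

[TestFixture]
public class TemperatureConversionSpecification
{
    [Test]
    public void BoilingPointOfWaterAtSeaLevelConvertsTo100Celsius()
    {
        double celsius = TemperatureConverter.FahrenheitToCelsius(212.0);

        // The third argument is the tolerance: "how accurate does it need
        // to be?" cannot be left unanswered here.  0.1 is an assumed answer.
        Assert.AreEqual(100.0, celsius, 0.1);
    }
}
```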

The first benefit of TDD is just this: it provides a very detailed, reliable form of something we need to create anyway, a functional specification.

Once the code-writing begins, this test-as-specification serves another purpose.  Once we know what needs to be written, we can begin to write it with a clear indication of when we will have gotten it done.  The test stands as a rubric against which we measure our work.  Once it passes, the behavior is correct.  Developers quickly develop a strong sense of confidence in their work once they experience this phenomenon, and of course confidence reduces hesitancy and tends to speed us up.

The second benefit of TDD is that it provides clear, rapid feedback to the developers as they are creating the product code.

At some point, we finish our work.  Once this happens the suite of tests that we say are not really tests (but specifications) essentially "graduate" into their new life: as tests, in the traditional sense.  This happens with no additional effort from the developers.  Tests in the traditional sense are very good to have around and provide three more benefits in this new mode...

First, they guard against code regression when refactoring.  Sometimes code needs to be cleaned up either because it has quality issues (what we call "olfactoring"[1]), or because we are preparing for a new addition to the system and we want to re-structure the existing code to allow for a smooth introduction of the enhancement.  In either case, if we have a set of tests we can run repeatedly during the refactoring process, then we can be assured that we have not accidentally introduced a defect.  Here again, the confidence this yields will tend to increase productivity.

The third benefit is being able to refactor existing code in a confident and reassured fashion.

But also, they provide this same confirmation when we actually start writing new features to add to an existing system.  We return to test-as-specification when writing the new features, with the benefits we've already discussed, but the older tests (as they continue to pass) also tell us that the new work we are doing is not disturbing the existing system.  Here again, this allows us to be more aggressive in how we integrate the newly-wanted behavior.

The fourth benefit is being able to add new behavior in this same way.

But wait, there's more!  Another critical issue facing a development team is preventing the loss of knowledge.  Legacy code often has this problem: the people who designed and wrote the system are long gone, and nobody really understands the code very well.  A test suite, if written with this intention in mind, can capture that knowledge, because we can treat it at any time as "the spec" and read it as such.

There are actually three kinds of knowledge we need to retain.
  1. What is the valuable business behavior that is implemented by the system?
  2. What is the design of the system?  Where are things implemented?
  3. How is the system to be used?  What examples can we look at? 
All of this knowledge is captured by the test suite, or perhaps more accurately, the specification suite.  It has an advantage over traditional documentation: it can be run against the system at any time to ensure it is still correct.

So the fifth benefit is being able to retain knowledge in a trustworthy form.

Up to this point we've connected TDD to several critical aspects of software development:
  1. Knowing what to build (test-first, with the test failing)
  2. Knowing that we built it (turning the test green)
  3. Knowing that we did not break it when refactoring it (keeping the test green)
  4. Knowing that we did not break it when enhancing/tuning/extending/scaling it (keeping the test green)
  5. Knowing, even much later, what we built (reading the tests after the fact)

All of this comes from one effort, one action.

And here's a final, sort of fun one:  Have you ever been reviewing code that was unfamiliar to you, perhaps written by someone else or even by you a long time ago, and come across a line of code that you cannot figure out?  "Why is this here?  What is it for?  What does it do?  Is it needed?"  One can spend hours poring over the system, or trying to hunt down the original author, who may herself not remember.  It can be very annoying and time-consuming.

If the system was created using TDD, this problem is instantly solved.  Don't know what a line of code does?  Break it, and run your tests.  A test should fail.  Go read that test.  Now you know.

Just don't forget to Ctrl-Z. :)

But what if no test fails?  Or more than one test fails?  Well, that's why you're reading this blog.  For TDD to provide all these benefits, you need to do it properly...

[1] We'll add a link here when we've written this one

Tuesday, January 20, 2015

TDD and Defects

We've said all along that TDD is not really about "testing" but rather about creating an executable form of specification that drives development forward.  This is true, and important, but it does not mean that TDD does not have a relationship to testing.  One interesting issue where there is significant synergy is in our relationship to defects.

Two important issues we'll focus on are: when/how a defect becomes known to us, and the actions we take at that point.

Time and Development

In the cyclic nature of agile development, we repeatedly encounter points in time when we may discover that something is not right.  First, as we are writing the source code itself, most modern tools can let us know that something is not the way we intended it to be.  For example, when you end a method with a closing curly brace, a good IDE will underline or otherwise highlight any temporary method variables that you created but never used.  Obviously, if you created a variable you intended to use it, so you must have done something other than what you meant to do.  Or, if you type an object reference name and then hit the dot, many IDEs will bring up a list of methods available for you to call on that type.  If the list does not appear, then something is not right.

When compiling the source into the executable we encounter a number of points in time where the technology can check our work.  The pre-compiler (macros, if-defs, #defines), the compiler, the linker (resolving dependencies), and so forth.

And there are run-time checks too: the class loader, generic type constraints, assertions of preconditions and postconditions, etc.  Various languages and technologies provide different levels of these services, and any of them can be "the moment" when we realize that we made an error that has resulted in a defect.

Detection vs. Prevention

Defects are inevitable, and so we have to take action either to detect them or to prevent them.  Let's say, for example, that you have a method that takes as its parameters the position of a given baseball player on a team and his jersey number, and then adds the player to a roster somewhere.  If you use an integer to represent the position (1 = Pitcher, 2 = Catcher, and so forth) then you will have to decide what to do if another part of the system incorrectly calls this method with something below 1 or above 9.  That would be a defect that the IDE/compiler/linker/loader would not find, because an int is type-safe for all values from minint to maxint [1].  So if the method was called with a 32, you'd have to put something in the code to deal with it: 32 mod 9 to determine what position that effectively is (Third Base, if you're curious), correct the data (anything above 9 is reduced to 9, anything below 1 becomes 1), return a null, throw an IllegalPositionException to raise the alarm... something.  Whatever the customer wants.  Then you'd write a failing test first to drive it into the code.
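As a sketch, here is what the "raise the alarm" option might look like, test first.  The Roster class, its AddPlayer signature, and the exception name are assumptions made for illustration; the behavior chosen would really come from the customer.

```csharp
// A hypothetical failing-test-first sketch for the "throw on illegal position"
// option.  Names and the chosen behavior are illustrative assumptions.
using System;
using NUnit.Framework;

public class IllegalPositionException : Exception { }

public class Roster
{
    public void AddPlayer(int position, int jerseyNumber)
    {
        // Only positions 1 (Pitcher) through 9 (Right Field) are legal.
        if (position < 1 || position > 9)
            throw new IllegalPositionException();

        // ... add the player to the roster ...
    }
}

[TestFixture]
public class RosterSpecification
{
    [Test]
    public void RejectsAPositionOutsideTheLegalRange()
    {
        var roster = new Roster();

        // 32 is not a baseball position; the spec says raise the alarm.
        Assert.Throws<IllegalPositionException>(() => roster.AddPlayer(32, 7));
    }
}
```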

If, however, you chose not to use an int, but rather create your own type with its own constraints... for example, an enumeration called PLAYER with members PITCHER, CATCHER, SHORTSTOP, etc... then a defect elsewhere that attempted to pass in PLAYER.QUARTERBACK would not compile and therefore would never make it into production.  We can think of this as defect prevention even though it isn't really; it's just very early detection.  But that vastly decreases the cost of repair.
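And here is a sketch of that constrained-type alternative.  The member list mirrors the standard baseball scoring positions, and the Roster signature is again illustrative; a call site that tried to pass PLAYER.QUARTERBACK simply would not compile.

```csharp
// A sketch of the constrained-type alternative described above.
// The range check (and the test that drove it) is no longer needed,
// because the compiler rejects anything that is not a PLAYER member.
public enum PLAYER
{
    PITCHER = 1,
    CATCHER = 2,
    FIRST_BASE = 3,
    SECOND_BASE = 4,
    THIRD_BASE = 5,
    SHORTSTOP = 6,
    LEFT_FIELD = 7,
    CENTER_FIELD = 8,
    RIGHT_FIELD = 9
}

public class Roster
{
    public void AddPlayer(PLAYER position, int jerseyNumber)
    {
        // No range check: the type system has already done it.
        // ... add the player to the roster ...
    }
}
```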

Cost of Delays

The earlier you find the bug, the cheaper it is to fix.  First of all, the issue is fresher in your mind, and thus you don't have to recapitulate the thought process that got you there.  It's also less likely that you'll have more than one bug to deal with at a time (late detection often means that other bugs have arisen during the delay, sometimes bugs which involve each other), which means you can focus.  And if you're in a very short cycle, then the defect is in something you just did, which makes it more obvious.

The worst time to find out a defect exists, therefore, is the latest time: when the system is operating, either in the QA department's testing process or, especially, when actually in use by a customer.  When QA finds the bug, it's a delayed find.  When a customer finds the defect, it's further delayed, but it also means:
  1. The customer's business has suffered
  2. The product's reputation is tarnished
  3. Your organization's reputation is tarnished
  4. It is personally embarrassing to you
  5. And, as we said, the cost to fix will be much higher
In a perfect world this would never happen, of course, but the world is complex and we are prone to errors.

TDD and Time

In TDD we add another point in time when we can discover an error: test time.  Not QA's testing, but developer testing: tests we run ourselves, and thus our own, non-delayed moment of run time.  Tests execute the system, so they have the same "experience" as QA or a customer, but since we run them very frequently they represent a faster and more granular defect indication.

You would prefer to prevent all defects from making it into runtime, of course.  But you cannot.  So a rule in TDD is this: any defect that cannot be prevented from getting into production must have a specification associated with it, and thus a test that will fail if the spec is not followed.

Since we write the tests as part of the code-writing process, and if we adhere perfectly to the TDD rule that says "code is never put into the source without a failing test that requires it"... and if we see each test fail until the code that makes it pass is added... then we should never have code that is not covered (and meaningfully so [2]) by tests.  But here we're going to make mistakes too.  Our good intentions will fall afoul of the forces they always do: fatigue, misunderstandings, things we forget, bad days and interruptions, the fat-fingered gods of chaos.

With TDD as your process, certainly far fewer defects will make it into the product, but it will still happen from time to time.  What that means, however, will be different.

TDD and Runtime Defects

Traditionally, a bug report from outside the team is placed into a tracking system and addressed in order of priority, or severity, or the order in which it was entered; something along those lines.  But traditionally "addressed" means "fixed."  This is not so in TDD.

In TDD a bug reported from production is not really a bug... yet.  Because if all of our tests are passing and if our tests are the specification of the system, this means the code is performing as specified.  There is no bug.  But it is not doing what the customer wants so it is the specification that must be wrong: we have a missing test.

Therefore, fixing the problem is not job #1; adding the missing test is.  In fact, we want the defect left in place so that when we 1) figure out what the missing test was and 2) add it to the suite, we can 3) run it and see it fail.  Then, and only then, do we fix the bug and watch the new test go green, completely proving the connection between the test and the code, and also proving that the defect in question can never make it into production again.
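As a sketch of that sequence, suppose (purely hypothetically) a customer reports that our roster accepts the same player twice.  Assuming the Roster and PLAYER types sketched above, plus an assumed Count property, the missing specification comes before any fix:

```csharp
// A hypothetical production defect: the same player can be added twice.
// Step 1 is NOT to fix it; step 1 is to add the specification that was missing.
// Assumes the Roster and PLAYER types sketched above, plus a Count property.
using NUnit.Framework;

[TestFixture]
public class RosterDuplicateSpecification
{
    [Test]
    public void AddingTheSamePlayerTwiceLeavesOnlyOneEntryOnTheRoster()
    {
        var roster = new Roster();

        roster.AddPlayer(PLAYER.PITCHER, 23);
        roster.AddPlayer(PLAYER.PITCHER, 23);

        // Run this against the defective code and watch it fail (red).
        // Only then do we change Roster, and watch it go green.
        Assert.AreEqual(1, roster.Count);
    }
}
```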

That's significant.  The effort engaged in traditional bug fixing is transitory; you found it and fixed it for now, but if it gets back in there somehow you'll have to find it and fix it again.   In TDD the effort is focused more on adding the test, and thus it is persistent effort.  You keep it forever.

Special Cases

One question that may be occurring to you is "what about bad behavior that gets into the code that really is not part of the spec and should never be?"  For example, in the case of our baseball-player-accepting method above, what if a developer on the team adds some code that says "if the method gets called with PLAYER.PITCHER and a jersey number of exactly 23, then add the player to the roster twice."  Let's further stipulate that no customer asked for this; it's simply wrong.

Could I write a test to guard against that?  Sure; the given-when-then is pretty clear:

Given: a pitcher with jersey number 23
            an empty roster

When: the pitcher is passed into method X once

Then: a pitcher with jersey number 23 will appear once in the roster

But I shouldn't.  First of all, the customer did not say anything about this scenario, and we don't create our own specifications.  Second, where would that end?  How many scenarios like that could you potentially dream up?  Combinations and permutations abound. [3]

The real issue for a TDD team in the above example is how did that code get into the system anyway?  There was no failing test that drove it.  In TDD adding code to the system without a failing test is a malicious attack by the development team on their own code.  If that's what you're about then nothing can really stop you.

So the answer to this conundrum is... don't do that.  TDD does not work, as a process, if you don't follow its rules in a disciplined way.  But then again, what process would?


[1] You might, in fact, have chosen to do this because the rules of baseball told you to:

[2] What is "non-meaningful coverage"?  I refer you to:

[3] I am not saying issues never arise with special cases, or that it's wrong to speculate; sometimes we discover possibilities the customer simply didn't think of.  But the right thing to do when this happens is go back to the customer and ask what the desired behavior of the system should be under circumstance X before doing anything at all.  And then write the failing test to specify it.

Monday, January 19, 2015

Welcome Max Guernsey

Max has joined Net Objectives, as some of you may know, as a trainer, coach, and mentor.  We've been friends with Max for a long while, and he has been a contributor to this blog and to the progress of our thinking in general.

So, we're adding him to the official authorship here, and when (if ever :)) we get this thing written, he will be co-author with Amir and me.

I know this has been terribly slow going, but hopefully with another hand at the oars we can pick up the pace.


Tuesday, October 21, 2014

TDD and Asynchronous Behavior: Part 2

In part 1, we discussed the benefits of separating out the code that ensures mutual exclusion (in this case, using thread locks) from the code that provides core behavior, using a Synchronization Proxy.  The core behavior can be tested in a straightforward, single-threaded way.  What remains, in terms of TDD and asynchronous behavior, is how to effectively specify/test the Synchronization Proxy.

Testing the Synchronization Proxy

You might be saying “the proxy class is so simple, I’m not sure I’d need to drive its behavior from a specification/test.  All it does is take the lock and delegate.”  The level of rigor in your specifications is always a judgment call, so we’ll set aside whether a given proxy behavior needs a test. We’re going to focus on how to write such a test in the case where you wish to. In other words: if you decide not to include it in the specification, we want it to be because you decided not to, not because you didn’t know how.[3]
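To ground the discussion, here is a minimal sketch of the kind of proxy we mean, assuming a single-method ITarget interface.  The names are illustrative, not taken from the downloadable project.

```csharp
// A minimal Synchronization Proxy sketch: take the lock, delegate, release.
// ITarget and SetX are illustrative assumptions standing in for the real interface.
public interface ITarget
{
    void SetX(int value);
}

public class SynchronizationProxy : ITarget
{
    private readonly ITarget _realTarget;
    private readonly object _gate = new object();

    public SynchronizationProxy(ITarget realTarget)
    {
        _realTarget = realTarget;
    }

    public void SetX(int value)
    {
        // The proxy's entire job: only one thread at a time may reach the target.
        lock (_gate)
        {
            _realTarget.SetX(value);
        }
    }
}
```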

The given-when-then layout of the specification would be something along these lines:


Given: Threads A and B are running
            Thread A is running code T

When: Thread B attempts to run code T

Then: Thread B will wait until Thread A is done: the accesses will be serial, not parallel.

The key here is the word “until”. What the test needs to drive/specify/ensure is that the timing is right, that Thread B writes *after* Thread A even if Thread A takes a long time. Let’s look at an implementation sequence diagram.

Client A and Client B are inner classes of the test, created just to exercise the proxy, each in its own thread.  If the proxy did not add the synchronization behavior, the writes to Target would be 2, and then 1, because we tell the Target to wait 10 seconds before writing the state for Client A, but only 1 second for Client B.  If the proxy prevents this (the proper behavior) then the writes will be 1, and then 2, because Client B couldn’t get access until Client A was finished.

This is a partial solution, but it raises a few questions.
  1. How does the test get RealTarget to wait these different amounts of time?
  2. How does the test assert the sequence of these writes is 1, 2?
  3. If the RealTarget “waits 10 seconds” won’t the test execution be horribly slow?
The first two questions are answered by replacing RealTarget with a Mock Object[4]. Remember, we are not specifying RealTarget here, we are specifying the proxy’s behavior, therefore RealTarget must be controlled by the test. A mock allows this.

What about the time issue? Well, time is in scope and we certainly are not testing that time works. So we have to control it in the test as well.

Here’s the implementation sequence diagram with the mock object in place of RealTarget, and another object that replaces time.

Time is a simulator, which can be told by the test to “be” at any time we want. MockTarget basically calls on Time and says “let me know when we’ve reached or passed second x”. We use the Observer Pattern [5] to implement this. The first time the mock is called it will ask Time to notify it when 10 seconds have passed. The second time, it will ask for a 1 second notification. We do this with a simple conditional.

Furthermore MockTarget maintains a log of all calls made to it, in order, which the test can ask for to determine the sequence of setX() calls and assert that it is 1, 2 rather than 2, 1.

10 and 1 are not significant numbers, so as you’ll see in the code we made constants “longWait” and “shortWait” to be used in the test. It’s only important that the first thread waits longer than the second, and since time itself is being simulated anyway the “actual” lengths of time are unimportant. We can pretend they are one year and one hundred years if you want. It’s nice to control time. :)

MockTarget, Time, and the ClientA and ClientB objects are all part of the test, and so a good practice is to make them private inner classes of the test. Also the Observer interface and all constants used in this test are similarly part of the test itself. Remember, a test tests everything which is in scope but which the test does not control. The only thing not controlled by the test is the Synchronization Proxy.
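To make the shape of this more concrete, here is a simplified sketch of how those pieces can fit together, reusing the ITarget interface from the proxy sketch above.  The names follow the text (MockTarget, a simulated Time, the longWait and shortWait constants), but the details are illustrative and the real project may differ; in particular, a real test must also ensure both client threads have started and registered with the clock before it advances time.

```csharp
// Simplified test doubles for specifying the Synchronization Proxy.
// SimulatedTime stands in for "Time" in the text; no real time passes.
using System.Collections.Generic;
using System.Threading;

public interface ITimeObserver
{
    void TimeReached();
}

public class SimulatedTime
{
    private readonly object _gate = new object();
    private readonly List<KeyValuePair<int, ITimeObserver>> _pending =
        new List<KeyValuePair<int, ITimeObserver>>();
    private int _currentSecond;

    // Observer Pattern: "let me know when we've reached or passed second x."
    public void NotifyAtOrAfter(int second, ITimeObserver observer)
    {
        bool fireNow;
        lock (_gate)
        {
            fireNow = second <= _currentSecond;
            if (!fireNow)
                _pending.Add(new KeyValuePair<int, ITimeObserver>(second, observer));
        }
        if (fireNow) observer.TimeReached();
    }

    // The test drives the clock; observers due by 'second' are notified.
    public void AdvanceTo(int second)
    {
        List<KeyValuePair<int, ITimeObserver>> due;
        lock (_gate)
        {
            _currentSecond = second;
            due = _pending.FindAll(p => p.Key <= second);
            _pending.RemoveAll(p => p.Key <= second);
        }
        foreach (var entry in due)
            entry.Value.TimeReached();
    }
}

public class MockTarget : ITarget
{
    private const int LongWait = 10;  // first caller pretends to take a long time
    private const int ShortWait = 1;  // second caller is quick

    private readonly SimulatedTime _time;
    private readonly object _gate = new object();
    private bool _firstCall = true;

    // The test reads this log and asserts the order of writes is 1, 2
    // with the proxy in place (it would be 2, 1 without it).
    public List<int> Log { get; } = new List<int>();

    public MockTarget(SimulatedTime time)
    {
        _time = time;
    }

    public void SetX(int value)
    {
        var waiter = new Waiter();
        bool first;
        lock (_gate)
        {
            first = _firstCall;
            _firstCall = false;
        }
        _time.NotifyAtOrAfter(first ? LongWait : ShortWait, waiter);

        // Block this caller's thread until the simulated clock says we're done,
        // then record the write.
        waiter.WaitUntilReached();
        lock (_gate)
        {
            Log.Add(value);
        }
    }

    private class Waiter : ITimeObserver
    {
        private readonly ManualResetEventSlim _reached = new ManualResetEventSlim(false);
        public void TimeReached() { _reached.Set(); }
        public void WaitUntilReached() { _reached.Wait(); }
    }
}
```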

We’ve coded all this up in C#. Click here to download the Visual Studio project.

[3] Perhaps later we’ll make our argument about whether you should or not. :)
[4] If you don't know about the Mock Object Pattern, visit this link:
[5] For more details on the Observer Pattern, visit this link: