Net Objectives


Monday, November 28, 2011

Redefining Test-Driven Development, Pt. 2

Download the Podcast 


In part 1 we said “How you do something new is often influenced to a great extent by what you think you are doing.”  Let’s add that, similarly, changing the way you do something you are already doing can come from a new understanding of its nature.

Something development teams already do (or, in our opinion, really should be doing) is to write a specification of the system before they create it.  This specification comes from an analysis of requirements, and reflects the development team’s understanding of the business value of the system from the customer’s perspective and the technology used to create the solution.  “The spec” is then referred to throughout the development process as fundamental guidance for everything the team does.

Specifications have great value; this value, however, is not persistent.  

Let’s say you created a specification in a traditional way: you wrote a document, embedded some design diagrams, charts, and graphs, and so forth.  This would form an artifact that expressed your understanding of the system.

Let’s further say that you used this specification to work from, completed the development process, released the system, and moved on.  

Now, eighteen months later, the customer wants to make changes to the system.  You’ve been away from the system for quite a while, and you’re fuzzy on the details, so job one is to re-acquaint yourself with it.  Should you re-read that specification you created way back when?  You could, but how do you know it is still accurate?  Someone could easily have made changes to the system and not updated the spec accordingly.

We all know we should not do that, but as a practical matter it happens all the time.  People make changes with limited time and resources, and under pressure... and often they simply neglect the spec entirely, or they update it incompletely or incorrectly.

And even if you don’t have any reason to suspect this has happened, how can you know, really know for sure, that it has not?  The only way is to examine the system in detail and compare it to the spec.  If you have to do this, then what good did having the written spec really do you?

So, consider this, a typical unit test:

// pseudocode
public class AccountTest {
    public void testAccountAmortizesCorrectly() {
        double value = Any.value();
        int term = Any.term();
        int yearToWriteOff = Any.yearUpTo(term);

        Account testAccount = new Account(value, term);
        double expectedAmount = max(value/term, 100.00);

        double actualAmount = testAccount.amortize(yearToWriteOff);

        assertAreEqual(expectedAmount, actualAmount, 1);
    }
}

Look closely.  What does this tell you?

  1. There is an object called Account that can amortize itself
  2. Account takes a value and a term via its constructor
  3. Value is double, term is int, and neither is constrained (“Any”) [1]
  4. Amortize means “write off”
  5. All years amortize in the same way (“Any” again)
  6. You call an amortize() method and pass the year to write off (an int) to it
  7. The way you know how much to write off is value/term, but never less than 100.00
  8. We do not care about pennies (the tolerance for the assertion is 1)
Would you not say that this could serve, at least for the development team, as a specification?  It tells you how the system should work, how it is structured, the API specifics (both constructor and public methods), etc... everything that a traditional spec would record.
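To see just how complete this specification is, consider a minimal implementation that would satisfy it.  This is only a hypothetical sketch (in Python, since the original is pseudocode); the constant name is ours:

```python
class Account:
    """An account that can amortize (write off) part of its value each year."""

    # Assumed floor implied by max(value/term, 100.00) in the specification.
    MINIMUM_WRITE_OFF = 100.00

    def __init__(self, value, term):
        self.value = value  # unconstrained, per the "Any" in the specification
        self.term = term    # unconstrained as well

    def amortize(self, year_to_write_off):
        # Every year writes off the same amount: value/term, floored at 100.00.
        return max(self.value / self.term, self.MINIMUM_WRITE_OFF)
```

Run the specification against this class and it passes; change the floor and the specification fails, which is exactly what a spec should do.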

Compare this kind of specification, in the scenario where you’re coming back eighteen months later, to the document you would normally create.  You can run this “unit test” immediately: if it compiles, the APIs have not changed; if it passes, the behavior of the system has not changed; and thus you confirm that it is still accurate with no effort at all.  If we then further stipulate that every behavior of the system has a test like this, and that we can run them all with a single click of the mouse, then we know our entire test suite is accurate to the code.  Now run your code coverage measurement... is it 100%?  Then you also know that no one has added behavior to the system without adding such a unit test.

So, in TDD we do not write tests.  We write specifications.  Executable specifications.

Note that the testing framework itself (with just about every tool you’ll encounter) uses the term “assert.”  Look that one up:

assert (v.): to state with assurance, confidence, or force; state strongly or positively; affirm; aver: He asserted his innocence of the crime. [2]

Note that this is not “check” or “examine to determine if” or “confirm”.  When we assert something we do not say “this should be true”; we say “this is true”.  It’s a statement of truth, not an investigation.  It is not a test, but a fact about the system.

This simple shift in thinking from “I am writing a test” to “I am writing a specification” changes so many things -- how you’ll write them, what you will and won’t write, what qualities you will look for and emphasize, how you’ll name things, and on and on -- that we won’t even try to enumerate them here.  We’ll devote an entire posting to this (Testing as Specification).

So, why do we still call them tests?  Two reasons.

  1. First, “Test-Driven Development” is the term we are stuck with.  Language is a living thing, a shared thing, and we cannot dictate on our own what things are called.  We’d love to call it what it is: “Behaviour-Based Analysis and Design”, and we think of it that way, but at the end of the day...
  2. We’re not going to throw these executable specifications away when we’re done driving our development with them.  Why would we?  It took effort to make them, and we want to be able to refer to them later.  But you know what else they magically turn into at this point?  Tests!  We can use them to guard against regression when we need to refactor the system.  These are regression tests we got for no extra effort, by the way.[3]

So, does TDD add new work to the development team?  No.  We were going to write a specification anyway; we’re just doing it in a different way now.  A better way, because it will be written in cold, hard code (rather than vaguely in human language), and it will be automatically verifiable against the real system at any point we desire, with no effort on our part.

And, for free, it will produce a regression suite as well.  Most teams struggle mightily and resort to all sorts of shenanigans (see our upcoming blog Lies, Damn Lies, and Code Coverage) to achieve 75% to 80% code coverage.  We will have 100% [4] and we don’t have to do anything additional to get it.

All this leaves is the third objection from part 1: what about the maintenance burden we take on when we have to keep the test suite up to date?  What about new requirements that cause dozens or even hundreds of tests to break and have to be repaired?

Yes indeed, what about that?  Must have something to do with the word... Sustainable.

Stay tuned.




----

[1] We’ll talk about Any in a future blog

[2] http://dictionary.reference.com/browse/assert

[3] This is not to say that our test suite will replace all traditional testing.  It will not.  But as a regression test suite it has a lot of value for developers and testers alike

[4] ...or very close to it.  Nothing is ever perfect, after all

Friday, November 18, 2011

Redefining Test-Driven Development, Pt. 1

Download the Podcast


How you do something new is often influenced to a great extent by what you think you are doing -- its precise nature, the steps and work-flows, and how it relates to other things that you already do and understand.  The term “Test-Driven Development”, while well-established in our industry, is perhaps an unfortunate choice of words to describe what we are doing, and thus how we choose to do it.  Here in part 1 we’ll examine the problem, and then later in part 2 we’ll suggest a solution.

Let’s start with the word “test”.  This is a word we already have a definition for: typically we think of a test as an evaluation of something, a judgement of something relative to a standard, or perhaps an action that determines the correctness or incorrectness of something.  Test is a verb: “I shall test this.”  It is also a noun: “Let’s conduct a test to find out if this works.”

In any case, the presumption is that there is something that either is correct, or operates correctly, or does not.  Clearly this is a nonsensical idea if the thing to be tested does not actually exist yet.

In a typical TDD process, we write the test before we create the code we’re testing [1].  At the “testing point”, there is nothing to test.  Will the test fail?  Of course it will [2].  Something that does not exist can neither be right nor do the right thing.  So it would seem that we’re not really doing anything meaningful [3].

Some of you are probably thinking: “The test won’t fail.  It won’t even compile!”  Very true, but this is only because our technology (typically) works the way it does.  In a dynamically-typed technology (Python, for example), referencing something that does not exist will not be caught until the code actually runs, and in some environments may simply yield 0, or null, or something else.  This is one reason why we like strongly-typed languages and strict compilers.  However, note what the compiler is actually saying: “This makes no sense!  You’re trying to refer to something that does not exist!”

All of this would seem to indicate that we have to do it the other way ‘round: that we’ve got to create the thing to be tested before we can create the test.  It’s just common sense.

Then there is the notion of “driven”.  The notion of “test” is in conflict with the notion of “driven”: if one activity drives another, then one would normally expect the driving activity to precede the driven activity, temporally.  If thing X happens and then causes thing Y, and if this causality can be proven, then we can say X drove Y.  But if the test must be created after the tested thing, then how can the test drive the tested?
     
Finally we have “development”.  Development is the creation of something, usually from a plan, goal, or set of principles.  If tests are to drive development, then they must cause it; thus they must constitute the plan, goal, or set of principles.  But tests in the traditional software sense are not plans; they are an examination of the system to determine whether it meets its success criteria.

This confusion can cause lots of problems:

  1. People won’t get the point, and will reject the idea intellectually: “that makes no sense”
  2. People will see this as “new work” for the team to do, which will slow the team down: “that will be wasteful”
  3. People will see the product (a collection of tests) as a new maintenance burden for the team: “that cannot be sustained over time”

In other words, TDD tests would seem to constitute at best a tremendous added cost, and at worst a totally meaningless one.  This is categorically untrue, and we begin by re-defining what we’re doing.

In TDD, as it turns out, we don’t write tests first.  In fact... in TDD we don’t write tests at all.

Stay tuned for part 2... :)


---

[1] As we will see in future blogs, the test-first technique does not actually equate to TDD, but it is a very common approach, and very compatible with TDD.

[2] ...and what if it doesn’t?  What would that mean?  That’s the subject of another blog...

[3] I can tell you a priori that any test written before the thing it tests exists will fail, without even knowing what the test is about.  Actually writing the test and watching it fail is therefore not going to tell me something I didn’t already know.  So why do it?

Wednesday, November 9, 2011

Test Reflexology, Part 1 (second post)

Download the Podcast

...continued from previous post...


Overly Protective Test
Sometimes when examining a test we find it to be much larger than the production class.  Oftentimes we can just split the test into multiple tests, but not in this case; remember, our initial assumption is that the tests are as good as they can get.  What could be the cause, then?  It could be that the test is overly protective.

In a protective test we end up testing not only the specified behavior; we are also testing to ensure that another behavior implemented by the tested unit does not interfere with the original behavior.  For example, if unit X deals with both a computation and data caching, we will need to ensure that the results of the computation are independent of, for example, inserting a new item into the cache.

The need for such a test is often a result of perfect hindsight: a bug.  For instance, we discover later that certain computations accidentally alter the items in the cache.
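As a concrete sketch (hypothetical names, ours rather than from the original post): a unit that mixes computation with caching, and the protective test that re-verifies the primary behavior after the secondary behavior has been exercised:

```python
class Analyzer:
    """Hypothetical unit with low cohesion: it both computes and caches."""

    def __init__(self):
        self.cache = {}

    def compute(self, x):
        return x * x  # the primary behavior

    def insert(self, key, value):
        self.cache[key] = value  # the secondary behavior

def test_compute_unaffected_by_cache_insert():
    analyzer = Analyzer()
    before = analyzer.compute(3)
    analyzer.insert("a", 1)  # exercise the secondary behavior...
    # ...then re-check the primary behavior, because both behaviors share
    # one object's state and could interfere (the "protective" part).
    assert analyzer.compute(3) == before
```

Note that the test must re-verify compute() for every secondary behavior the unit acquires, which is exactly the growth problem described below.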

Before fixing the bug, a developer practicing TDD will update the test to ensure that the secondary behavior never again interferes with the primary behavior.  This is a good idea, but the need to create this overly protective test indicates a design issue.  Yes, you’ve guessed it: we have another problem with cohesion.  The lack of cohesion leads to missing encapsulation[5], which has allowed the secondary behavior to couple unexpectedly to the primary behavior and affect it.  The solution is to extract the behaviors into individual, encapsulated entities and prevent the coupling from occurring.  Encapsulated entities cannot encroach on each other’s state.
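The extraction might look like this (again a hypothetical sketch with names of our choosing): each behavior lives in its own class, and neither can reach the other’s state:

```python
class Computation:
    """The primary behavior, alone; it holds no cache state to corrupt."""

    def compute(self, x):
        return x * x

class Cache:
    """The secondary behavior, alone; its state is hidden from Computation."""

    def __init__(self):
        self._items = {}

    def insert(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)
```

Each class can now be specified by its own small test, and no protective cross-checks between behaviors are needed.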

Once we see an overly protective test we are sure to see it many times: whenever the primary behavior is tested in conjunction with a different secondary behavior.  This is done to ensure that none of the other activities of the unit interferes with the primary behavior we really want to test.  The number of discrete test scenarios will increase geometrically, because all behaviors need to be tested in conjunction with all of the other behaviors.  Even if it were possible to create all these scenarios and test them in a reasonable period of time, there is an obvious redundancy in the tests which is highly undesirable[6].

Combinatorial Scenarios
We often see tests that do not test all the possible scenarios but rather a selected subset of them, because there are simply too many scenarios to reasonably go through all of them.  For example, let’s assume that during the week an employee is allowed to be late at most once.  If we treat the week as five distinct days (Mon, Tue, Wed, Thu, and Fri), we will need to test the 5 scenarios where the employee is late exactly once to prove that no action is taken; we also need to test the 10 scenarios (five choose two) where the employee is late twice to make sure that an action is taken.  This test reeks of repetition and would either be very long or would require some parameterization or iteration built into it to reduce the number of redundant scenarios.  Alternatively, we may choose to test only a subset of the scenarios, which leads to incomplete testing.

The test is shedding light on a problem, namely that we are not choosing the correct abstractions in our design.  In the example above, we should have considered the week to be a collection of days, and verified that if that collection contains only one day of tardiness no action is taken, and if it contains two, an action is taken.  In this case we only need 2 tests to guarantee that the behavior is correct.  This makes the complexity of the test the same as the complexity of the code, and shows that the correct abstraction here is a collection of days rather than individual days.
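With the week modeled as a collection of days, the two tests might look like this (a hypothetical sketch; the policy function name is ours):

```python
def action_required(week):
    """week: a collection of booleans, True meaning the employee was late
    that day.  Policy: at most one late day per week is tolerated."""
    late_days = sum(1 for late in week if late)
    return late_days > 1

# Two tests now cover the policy, regardless of which days are involved:
def test_one_late_day_requires_no_action():
    assert action_required([True, False, False, False, False]) is False

def test_two_late_days_require_action():
    assert action_required([True, True, False, False, False]) is True
```

Because the policy only depends on the count of late days, which particular days were late no longer multiplies the scenarios.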

Stay Tuned for Test Reflexology, Part 2
Part one has focused on how a given unit test can provide you with insights about the quality of your design.  Part two will extend this notion to the entire suite of tests: how the nature of the suite can also let you know when your design may be wanting.  Coming soon!


-----

[5] Inside the scope of a class, we really cannot encapsulate much.  “Private” means nothing there.  Temporary method variables are really the only encapsulation available “inside the curly braces”.

[6] This will be the subject of a future blog, we promise.