Net Objectives


Friday, December 11, 2015

Specifying The Negative in TDD

One of the issues that frequently comes up is "how do I write a test about a behavior that the system is specified not to have?"  It's an interesting question given the nature of unit tests.  Let's examine it.

The Decision Tree of Negatives


When it comes to behaviors that the system should not have, there are different ways that this can be specified and ensured for the future:

Inherently Impossible


Some things are inherently impossible, depending on the technology being used.  For example, you cannot write to read-only memory.  This is in the nature of the memory and thus does not require a specification (nor a test, since that would be a test that could never fail).  In languages like C# and Java, there exists the concept of “private”, and we know that an attempt to read or write a private value from outside a class will not compile and so will never exist in the executable system.

Some things are inherently impossible and cannot be made possible even accidentally.  Read-only memory cannot be made writable.  However other things which are impossible by nature can be made possible if desired.  A good example of this is an immutable object.

Let's say there exists in our system a SaleAmount class that represents an amount of money for a given retail sale in an online environment.  Such a class might exist in order to restrict, validate, or perfect the data it holds.  In this case, however, there is a customer requirement that the value held must be immutable, for reasons of security and consistency in their transactions. 

This brings up the question "how do I specify in a test that you cannot change the value?"
How can we test-drive such an entity when part of what we wish to specify is that the value, once established in an instance of this class, cannot be changed from the outside?  A typical way this question is stated is "how can I show, in a test, that there is no SetValue() method?  Any test that references such a method simply will not compile because it does not exist.  Therefore, I cannot write the test."

Developers will sometimes suggest two different ideas:
  1. Add the SetValue() method, but make it throw an exception if anyone ever calls it.  Write a test that calls this method and fails if the exception is not thrown.[1]  Sometimes other actions are suggested if the method gets called, but an exception is quite common.
  2. Use reflection in the test to examine the object and, if SetValue() is found, fail the test.

The problem with option #1 is that this is not what the requirement says; it is not what was wanted.  The specification should be "you cannot change the value," not "if you change the value, thing x will happen."  So here, the developer is creating his own specification and ignoring the actual requirements.

The problem with option #2 is twofold:  First, reflection is typically a very sluggish thing and in TDD we want our tests to be extremely fast so that we can run them frequently without this slowing down our process.  But even if we overcame that somehow, what would we have the test look for?  SetValue()?  PutValue()?  ChangeValue()?  AlterValue()?  The possibilities are vast and the cost of fully verifying immutability, in this case, would be enormous compared to the value.

The key to solving this is in reminding ourselves once again that TDD is not initially about testing but creating a specification.  Developers have always worked from some form of specification; it's just that the form was usually some kind of document.

So think about the traditional specification, the one you're likely more familiar with.  Ask yourself this: Does a specification indicate everything the system does not do?  Obviously not, for this would create a document of infinite length.  Every system does a finite set of things, and then there is an infinite set of things it does not do.

For example, here is an acceptance test for the positive requirement [2]:

Given: A SaleAmount S with value V
When: You ask for the value of S
Then: V is retrieved

This could be made into an executable specification by the following simple test:

[TestClass]
public class SaleAmountTest
{
    [TestMethod]
    public void TestSaleAmountPersistence()
    {
        var initialValue = 10.50d;
        var testDollar = new SaleAmount(initialValue);

        var retrievedValue = testDollar.GetValue();

        Assert.AreEqual(initialValue, retrievedValue);
    }
}


Which would drive the entity and its behavior into existence:

public class SaleAmount
{
    private double myValue;
    public SaleAmount(double aValue)
    {
        myValue = aValue;
    }

    public double GetValue()
    {
        return myValue;
    }
}


Ask yourself the following question:  If we were using the TDD process to create this SaleAmount object, and if the object had a method allowing the value to be changed (SetValue() or whatever), how would it have gotten there?  Where is the test that drove that mechanism into existence?  It's not there because there is a specific requirement that it not be there.  In TDD we never add code to the system without having a failing test first, and we only add the code that is needed to make the test pass, and nothing more. 

Put another way, if a developer on our team added a method that allowed such a change, and did not have a failing test written first, then he would be ignoring the rules of TDD and would be creating a bug as a result.  TDD does not work if you don't do it.  We don't know of any process that does. 

And if we think back to the concept of a specification there is an implicit rule here, which basically has two parts.

1.    Everything the system does, every behavior, must be specified.
2.    Given this, anything that is not specified is by default specified as not a behavior of the system. 

If such a behavior exists nonetheless, it is a defect.

 

Inherently Possible


We don’t have a test that shows the value being changed, so it cannot be.  But this does not mean we have a “test for immutability.”  Anything that comes from the customer must be retained; we never want to lose that knowledge.  So if we think of this requirement in terms of acceptance testing we could express it using the ATDD nomenclature:

Given: A SaleAmount S with value V exists in the system
Then: You cannot change V

There is no “When” in this case because this is a requirement that is always true; it is not based on system state.  But this, of course, implies a strongly-typed, compiled language with access-control idioms (like making things "private" and so forth).  What if your technology does not provide this?  What if it is an interpreted language, or one with no enforcement mechanism to prevent access to internal variables?

The first answer is: You have to ask the customer.  You have to tell them that you cannot do precisely what they are asking for, and investigate other alternatives with them.  It may well be that we are using the wrong technology.

The second answer is that there will be some occasions where the only way you can ensure that an illegal or unwanted behavior is not added to a system accidentally is through static analysis (a traditional code review, or perhaps a code analysis tool).  This is still “a test” but one that either cannot or should not be automated in all cases.

On the other hand, sometimes we can make an inherently possible thing impossible by adding behaviors.  Such behaviors must, of course, be test driven.

Let's add a requirement to our SaleAmount class.  If the context of this object was, say, an online book store, the customer might have a maximum amount of money that he allows to be entered into a transaction.

We used a double-precision number [3] to hold the value in SaleAmount.  A double can inherently hold an incredibly large value.  In .NET, for example, it can hold a value as high as 1.7976931348623157E+308 [4].  It does not seem credible that any purchase made at our customer's site could total up to something like that!  So the requirement is: Any SaleAmount object that is instantiated with a value greater than the customer's maximum credible value should raise a visible alarm, because this probably means the system is being hacked or has a very serious calculation bug.

As developers, we know a good way to raise an alarm is to throw an exception.  We can do that, but we also capture the customer's view of what the maximum credible value is, so we specify it.  Let's say he says "nothing over $1,000.00 makes any sense".  But... how much "over"?  A dollar?  A cent?  We have to ask, of course.  Let's say the customer says "one cent".

In TDD everything must be specified, all customer rules, behaviors, values, everything.  So we start with this:

Given: The system
Then: The Maximum value for a Sale Amount is $1000.00

We also have to capture the tolerance in its own specification:

Given: The System
Then: Tolerance for comparing SaleAmount to its Maximum is one cent

These tests establish bits of domain-specific language that can then be used in any number of other specifications (we won’t have to repeatedly define them whenever we make comparisons).

[TestMethod]
public void SpecifyMaximumDollarValue()
{
    Assert.AreEqual(1000d, SaleAmount.MAXIMUM);
}

[TestMethod]
public void SpecifyToleranceForComparisonToMaximum()
{
    Assert.AreEqual(.01, SaleAmount.TOLERANCE);
}


In order to get these to pass we drive the Maximum and the Tolerance into the system.
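
A minimal sketch of what driving these in might look like, assuming we simply add public constants to the SaleAmount class we already have (the exact form is a design choice, not something the specification dictates):

public class SaleAmount
{
    // Driven into existence by the two specification tests above:
    // the customer's maximum credible value, and the tolerance used
    // when comparing a sale amount against it.
    public const double MAXIMUM = 1000d;
    public const double TOLERANCE = .01;

    private double myValue;
    public SaleAmount(double aValue)
    {
        myValue = aValue;
    }

    public double GetValue()
    {
        return myValue;
    }
}
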
Now we can write this test, which will also fail initially of course:

Given: Value S greater than or equal to Maximum + Tolerance
When: An attempt is made to create a SaleAmount with value S
Then: A warning is issued

[TestMethod]
public void TestSaleAmountThrowsSaleAmountValueTooLargeException()
{
    var saleAmountMaximum = SaleAmount.MAXIMUM;
    var tolerance = SaleAmount.TOLERANCE;
    var excessiveAmount = saleAmountMaximum + tolerance;

    try
    {
        new SaleAmount(excessiveAmount);
        Assert.Fail("SaleAmount created with excessive " +
                    "value should have thrown an exception");
    }
    catch (SaleAmountValueTooLargeException)
    { }
}


But now the question is, what code do we write to make this test pass?  The temptation would be to add something like this to the constructor of SaleAmount:

if(aValue >= MAXIMUM + TOLERANCE) 
          throw new SaleAmountValueTooLargeException();

But this is a bit of a mistake.  Remember, it's not just "add no code without a failing test", it is "add only the needed code to make the failing test pass."

Your spec is supposed to be your pal.  He's supposed to be there at your elbow saying "don't worry.  I won't let you make a mistake.  I won't let you write the wrong code, I promise."  He's not just your pal, he's your best pal. 

Here, however, the spec is just a mediocre friend because he will let you write the wrong code and say nothing about it.  He’ll let you get in your car when you are in no condition to drive.  He'll let you do this, and let it pass:

throw new SaleAmountValueTooLargeException();

There is no conditional.  We’re just throwing the exception all the time.  That's wrong, obviously.  This behavior has a boundary (as we discussed in our blog about test categories) and every boundary has two sides.  We need a little more specification.  We need something like this:

try
{
    new SaleAmount(SaleAmount.MAXIMUM);
}
catch (SaleAmountValueTooLargeException)
{
    Assert.Fail("SaleAmount created with value at the maximum " +
                "should not have thrown an exception");
}


Now the "aValue >= MAXIMUM + TOLERANCE" part must be added to the production code or your best buddy will let you know you're blowing it.  Friends don’t let friends implement incorrectly. 
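
Pulling the pieces together, the constructor might now look something like this (a sketch, using the constants driven in earlier and assuming a SaleAmountValueTooLargeException type defined elsewhere):

public SaleAmount(double aValue)
{
    // The boundary test requires the exception at MAXIMUM + TOLERANCE;
    // the companion test at MAXIMUM requires that we not throw below
    // the boundary.  Together they force exactly this conditional.
    if (aValue >= MAXIMUM + TOLERANCE)
        throw new SaleAmountValueTooLargeException();

    myValue = aValue;
}
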
...
[1] There are a variety of ways to do this.  We’ll show one way here a bit further on.
[2] [TODO] Link to ATDD blog
[3] If you’re thinking “you used the wrong type, a long would be better” it’s a fair point.  We simply wanted to make the conceptual point that primitives do not impose domain constraints inherently, and the use of the double just makes the idea really clear.
[4] For those who dislike exponential notation, this is:
$179,769,313,486,231,520,616,720,392,992,464,536,472,240,560,432,240,240,944,616,576,160,448,992,408,768,712,032,320,616,672,472,536,248,456,776,672,352,088,672,544,960,568,304,616,280,032,664,704,344,880,448,832,696,664,856,832,848,208,048,648,264,984,808,584,712,312,912,080,856,536,512,272,
952,424,048,992,064,568,952,496,632,264,936,656,128,816,232,688,512,496,536,552,712,648,144,200,160,624,560,424,848,368
...and no cents. :)


Wednesday, November 4, 2015

Structure of Tests-As-Specifications

A big part of our thesis is that TDD is not really a testing activity, but rather a specifying activity that generates tests as a very useful side effect.  For TDD to be a sustainable process, it is important to understand the various implications of this distinction. [1]

Here, we will discuss the way our tests are structured when we seek to use them as the functional specification of the system.

A question we hear frequently is "how does TDD relate to BDD?"  BDD is "Behavior-Driven Development," a term coined by Dan North and Chris Matts in their 2006 article "Introducing BDD" [2].  Many have made various distinctions between TDD, ATDD, and BDD, but we feel these distinctions to be largely unimportant.  To us, TDD is BDD, except that we conduct the activity at a level very close to the code, and automation is much more critical.  Also, we contend that “development” includes analysis and design, and thus what TDD enables is more accurately stated to be “behavior-based analysis and design”, or BBAD.

In BBAD, the general idea is that the "unit" of software that is being specified is a behavior.  Software is behavior, after all.  Software is not a noun, it is a verb.  Software’s value lies entirely in what it does, what value the user accrues as a result of its behavior.  In essence, software only exists in any meaningful sense of the word when it is up and running.  The job of a software development team is to take a general-purpose computer and cause it to act in specific, valuable ways.  We call these behaviors.

The nomenclature that North and Matts proposed for specifying each behavior of a system is this: Given-When-Then.  Here's a simple example:

Given:
     User U has a valid account on our system with Username UN and password PW
     The login username is set to UN and the login password is set to PW
When:
    Login is requested
Then:
    U is logged in

Everything that software does, every behavior can be expressed in this fashion.  Each Given-When-Then expression is a specific scenario that is deemed to have business value, and that the team has taken upon itself to implement.

In TDD, when the scenario is interpreted as a test, we strive to make this scenario actionable.  So we think of these three parts of the scenario a little differently: we "verbify" them to convert these conditions into activities.

Imagine that you were a manual tester who was seeking to make sure the system was behaving correctly in terms of the scenario above.  You would not wait around until a user with a valid account happened to browse to the login page, enter his info, and click the "Login" button... you would create a valid user (or identify an existing one) and, as that person, browse to the page, enter the correct username and password, and then click the button yourself.  Then you'd check to see if your login was successful.  You would do all of these things.

So the Given wasn't given, it was done by the tester (you, in this case), the When was not when, it was now do, and the Then was not a condition but rather an action: go and see if things are correct.

"Given" becomes "Setup".
"When" becomes "Trigger".
"Then" becomes "Verify".

We want to structure our tests in such a way that these three elements of the specification are clear and, as much as possible, separate from each other.  Typical programming languages can make this a bit challenging at times, but we can overcome these problems fairly easily.

For example: Let's say we have a behavior that calculates the arithmetic mean of two real numbers, accurate within 0.1.  Most likely this will be a method call on some object that takes two values as parameters and returns the arithmetic mean of those values, accurate within 0.1.

Let’s start with the Given-When-Then:

Given:
     Two real values R1 and R2
     Required accuracy A is 0.1
When:
     The arithmetic mean of R1 and R2 is requested
Then:
     The return is (R1+R2)/2, accurate to A

Let's look at a typical unit test for such a behavior:

(Code samples are in C# with MSTest as the testing framework)

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        Assert.AreEqual(5.5d,
                        MathUtils.GetInstance().ArithmeticMean(7.0d, 4.0d),
                        .1);
    }
}



This test is simple because the behavior is simple.  But this is really not great as a specification.

The Setup (creation of the MathUtils object, the creation of the example doubles 7.0d and 4.0d), the Trigger (the calling of the ArithmeticMean method with our two example doubles), and the Verify (comparing the method's return to the expectation, 5.5d, and establishing the precision as .1), are all expressed together in the assertion.  If we can separate them, we can make the specification easier to read and also make it clear that some of these particular values are not special, that they were just picked as convenient examples.

This is fairly straightforward, but easy to miss:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {         
        // Setup
        var mathUtils = MathUtils.GetInstance();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue)/2;

        // Trigger
        var actualMean = mathUtils.ArithmeticMean(anyFirstValue,
                                                  anySecondValue);

        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }
}


Here we have included comments to make it clear that the three different aspects of this behavioral specification are now separate and distinct from each other.  The "need" for comments always seems like a smell, doesn't it?  It means we can still make this better.

But we've also used variable names like "anyFirstValue" to indicate that the number we chose was not a significant value, creating more clarity about what is important here.  Note that tolerance and expectedMean were not named in this way, because their values are specific to the required behavior.

This, now, is using TDD to form a readable specification, which also happens to be executable as a test [2].  Obviously the value of this as a test is very high; we do not intend to trivialize this.  But we write them with a different mindset when we think of them as specifications and, as we'll see, this leads to many good things.

Looking at both code examples above however, some of you may be thinking "what is this GetInstance() stuff?  I would do this: "

        // Setup
        var mathUtils = new MathUtils();

Perhaps.  We have reasons for preferring our version, which we'll set aside for its own discussion.
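
For completeness, here is a minimal MathUtils that would satisfy the tests so far.  This is just a sketch, the simplest thing that could work, not a statement about how such an object should really be created or shared:

public class MathUtils
{
    private static readonly MathUtils instance = new MathUtils();

    // The accessor the tests use; how (and whether) to share an
    // instance is the design discussion we are setting aside here.
    public static MathUtils GetInstance()
    {
        return instance;
    }

    public double ArithmeticMean(double firstValue, double secondValue)
    {
        return (firstValue + secondValue) / 2;
    }
}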

But the interesting question is: what if you started creating the object one way (using “new”), and then later changed your mind and used a static GetInstance() method, or maybe even some factory pattern?  If, when that change was made, you had many test methods on this class doing it the "old" way, this would require the same change in all of them.

We can do it this way instead:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator =
                           GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = arithmeticMeanCalculator.
                         ArithmeticMean(anyFirstValue,
                                        anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}



Now, no matter how many test methods on this test class needed to access this arithmetic mean behavior (for different scenarios), a change in terms of how you access the behavior would only involve the modification of the single "helper" method that is providing the object for all of them.

Many testing frameworks have their own mechanisms for eliminating redundant object creation, usually in the form of a Setup() or Initialize() method, etc., and these can be used.  But we prefer the helper method because we then gain the ability to decouple the specification from the fact that the behavior we’re specifying happens to be implemented in a class called MathUtils.  We could also change this design detail and the impact would only be on the helper method (the fact that C# has the var keyword is a real plus here… you might be limited a bit in other languages).
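
For comparison, here is a sketch of that framework mechanism in MSTest, using [TestInitialize]; we show it only to illustrate the alternative, not as our preferred form:

[TestClass]
public class MathTests
{
    private MathUtils mathUtils;

    // Runs before each test method; eliminates the redundant creation.
    [TestInitialize]
    public void InitializeCalculator()
    {
        mathUtils = MathUtils.GetInstance();
    }

    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = mathUtils.ArithmeticMean(anyFirstValue,
                                                  anySecondValue);

        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }
}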

But the spec is also not about the particular method you call to get the mean, just how the calculation works, behaviorally.  Certainly an ArithmeticMean() method is logical, but what if we decided to make it more flexible, allowing any number of parameters rather than just two?  The meaning of "arithmetic mean" would not change, but our spec would have to.  Which seems wrong.  So, we could take the idea a little bit farther:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculator(
                         arithmeticMeanCalculator, 
                         anyFirstValue, anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private double TriggerArithmeticMeanCalculator(MathUtils mathUtils, 
                                                  double anyFirstValue, 
                                                  double anySecondValue)
    {
        return mathUtils.ArithmeticMean(anyFirstValue,
            anySecondValue);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}

Now if we change the ArithmeticMean() method to take a container rather than discrete parameters, or whatever, then we only change this private helper method and not all the various specification-tests that show the behavior with more parameters, etc...
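
For instance, if ArithmeticMean() were changed to accept an array of values instead of two discrete parameters (a hypothetical signature, purely for illustration), only the binding helper would change:

private double TriggerArithmeticMeanCalculator(MathUtils mathUtils,
                                               double anyFirstValue,
                                               double anySecondValue)
{
    // Hypothetical: the production method now takes a double[].
    // The specification test above does not change; only this
    // binding knows about the new shape of the call.
    return mathUtils.ArithmeticMean(new[] { anyFirstValue, anySecondValue });
}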

The idea here is to separate the meaning of the specification from the way the production code is designed.  We talk about the specification being one thing, and the "binding" being another.  The specification should change only if the behavior changes.  The binding (these private helpers) should only change if the design of the system changes.

Another benefit here is clarity, and readability.  Let's improve it a bit more:

[TestClass]
public class MathTests
{
    [TestMethod] 
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculation(
                                             anyFirstValue,
                                             anySecondValue);
           
        // Verify
        var expectedMean = (anyFirstValue + anySecondValue) / 2;
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private double TriggerArithmeticMeanCalculation(
                                double anyFirstValue, 
                                double anySecondValue)
    {
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        return arithmeticMeanCalculator.
                                ArithmeticMean(anyFirstValue, 
                                anySecondValue);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}

We have moved the call GetArithmeticMeanCalculator() to the Trigger, and expectedMean to the Verification [3].  Also we changed the notion of "trigger the calculator" to "trigger the calculation". Now, remember the original specification?

Given:
     Two real values R1 and R2
     Required accuracy A is 0.1
When:
     The Arithmetic Mean of R1 and R2 is requested
Then:
     The return is (R1+R2)/2, accurate to A

The unit test, which is our specification, very closely mirrors this Given-When-Then expression of the behavior. Do we really need the comments to make that clear?  Probably not.  We’ve created a unit test that is a true specification of the behavior without coupling it to the specifics of how the behavior is expressed by the system.

Can we take this even further?  Of course... but that's for another entry. :)

[1] It should be acknowledged that Max prefers to say "it is a test which also serves as a specification."  We'll probably beat him into submission :), but for the time being that's how he likes to think of it.  We welcome discussion, as always.

[2] Better Software Magazine, March 2006.

[3] It should also be acknowledged that we're currently discussing the relative merits of using Setup/Trigger/Verify in TDD rather than just sticking with Given/When/Then throughout. See Grzegorz Gałęzowski's very interesting comment below on this (and other things). 

Wednesday, September 23, 2015

TDD and Its (at least) 5 Benefits

Many developers have concerns about adopting test-driven development, specifically regarding:
  • It's more work.  I'm already over-burdened and now you're giving me a new job to do.
  • I'm not a tester.  We have testers for testing, and they have more expertise than I do.  It will take me a long time to learn how to write tests as well as they do.
  • If I write the code, and then test it, the test-pass will only tell me what I already know: the code works.
  • If I write the test before the code the failing of the test will only tell me what I already know: I have not written the code yet.
Here we are going to deal primarily with the first one:  It's going to add work.

This is an understandable concern, at least initially, and it is not only the developers who express it.  Project managers fear that the team's productivity, for which they are accountable, will decrease.  Project sponsors fear that the cost of the project will go up if the developers end up spending a fair amount of their time writing tests.  The primary cost of creating software is developer time.

The fact is, TDD is not about adding new burdens to the developers, but rather it is just the opposite: TDD is about gaining multiple benefits from a single activity.

In the test-first activity developers are not really writing tests.  They look like tests, but they are not (yet).  They are an executable specification (this is a critical part of our entry on redefining TDD).  As such, they do what specifications do: they guide the creation of the code.  Traditional specifications, however, are usually expressed in some colloquial form, perhaps a document and/or some diagrams.  Communication in this form can be very lossy and easy to misinterpret.  Missing information can go unnoticed.

For example, one team decided to create a poker game as part of their training on TDD.  An enjoyable project is often good for learning, as we tend to retain information better when we're having a good time.  Also, these developers happened to live and work in Las Vegas. :) Anyway, it was a contrived project and so the team came up with the requirements themselves; basically the rules of poker and the mechanics of the game.  One requirement they came up with was "the system should be able to shuffle the deck of cards into a reordered state."  That seemed like a reasonable thing to require until they tried to write a test for it.  How does one define "reordered?"  One developer said "oh, let's say at least 90% of the cards need to be in a new position after the shuffle completes."  Another developer smiled and said "OK, just take the top card and put it on the bottom.  100% will be in a new position.  Will that be acceptable?"  They all agreed it would not.  This seemingly simple issue ended up being more complicated than anyone had anticipated.

In TDD we express the specification in actual test code, which is very unforgiving.  One of the early examples of this for us was the creation of a Fahrenheit-to-Celsius temperature conversion routine.  The idea seemed simple: take a measurement in Fahrenheit (say 212 degrees, the boiling point of water at sea level), and convert it to Celsius (100 degrees).  That statement seems very clear until you attempt to write a unit test for it, and realize you do not know how accurate the measurements should be.  Do we include fractional degrees?  To how many decimal places?  And of course the real question is what is this thing going to be used for?  This form of specification will not let you get away with not knowing because code is exacting like this.

Put another way, a test would ask "how accurate is this conversion routine?"  A specification asks "how accurate does this conversion routine need to be" which is of course a good question to ask before you attempt to create it.
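
For instance, suppose the customer answers "one decimal place is enough."  The specification can then be captured directly; here is a sketch (the TemperatureConverter class and method names are illustrative, not from any project described above):

[TestClass]
public class TemperatureConverterTest
{
    [TestMethod]
    public void SpecifyFahrenheitToCelsiusConversionAtBoilingPoint()
    {
        // Setup: the customer's required accuracy, and a known example
        var tolerance = .1;
        var boilingPointFahrenheit = 212d;

        // Trigger
        var celsius =
            TemperatureConverter.FahrenheitToCelsius(boilingPointFahrenheit);

        // Verify
        Assert.AreEqual(100d, celsius, tolerance);
    }
}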

The first benefit of TDD is just this: it provides a very detailed, reliable form of something we need to create anyway, a functional specification.

Once the code-writing begins, this test-as-specification serves another purpose.  Once we know what needs to be written, we can begin to write it with a clear indication of when we will have gotten it done.  The test stands as a rubric against which we measure our work.  Once it passes, the behavior is correct.  Developers quickly develop a strong sense of confidence in their work once they experience this phenomenon, and of course confidence reduces hesitancy and tends to speed us up.

The second benefit of TDD is that it provides clear, rapid feedback to the developers as they are creating the product code.

At some point, we finish our work.  Once this happens the suite of tests that we say are not really tests (but specifications) essentially "graduate" into their new life: as tests, in the traditional sense.  This happens with no additional effort from the developers.  Tests in the traditional sense are very good to have around and provide three more benefits in this new mode...

First, they guard against code regression when refactoring.  Sometimes code needs to be cleaned up either because it has quality issues (what we call "olfactoring"[1]), or because we are preparing for a new addition to the system and we want to re-structure the existing code to allow for a smooth introduction of the enhancement.  In either case, if we have a set of tests we can run repeatedly during the refactoring process, then we can be assured that we have not accidentally introduced a defect.  Here again, the confidence this yields will tend to increase productivity.

The third benefit is being able to refactor existing code in a confident and reassured fashion.

But also, they provide this same confirmation when we actually start writing new features to add to an existing system.  We return to test-as-specification when writing the new features, with the benefits we've already discussed, but also the older tests (as they continue to pass) tell us that the new work we are doing is not disturbing the existing system.  Here again, this allows us to be more aggressive in how we integrate the newly-wanted behavior.

The fourth benefit is being able to add new behavior in this same way.

But wait, there's more!  Another critical issue facing a development team is preventing the loss of knowledge.  Legacy code often has this problem:  the people who designed and wrote the systems are long gone, and nobody really understands the code very well.  A test suite, if written with this intention in mind, can capture knowledge because we can consider it any time to be "the spec" and read it as such. 

There are actually three kinds of knowledge we need to retain.
  1. What is the valuable business behavior that is implemented by the system?
  2. What is the design of the system?  Where are things implemented?
  3. How is the system to be used?  What examples can we look at? 
All of this knowledge is captured by the test suite, or perhaps more accurately, the specification suite.  It has the advantage over traditional documentation of being able to be run against the system to ensure it is still correct.

So the fifth benefit is being able to retain knowledge in a trustworthy form.

Up to this point we've connected TDD to several critical aspects of software development:
  1. Knowing what to build (test-first, with the test failing)
  2. Knowing that we built it (turning the test green)
  3. Knowing that we did not break it when refactoring it (keeping the test green)
  4. Knowing that we did not break it when enhancing/tuning/extending/scaling it (keeping the test green)
  5. Knowing, even much later, what we built (reading the tests after the fact)

All of this comes from one effort, one action.

And here's a final, sort of fun one:  Have you ever been reviewing code that was unfamiliar to you... perhaps written by someone else, or even by you a long time ago... and come across a line of code that you cannot figure out?  "Why is this here?  What is it for?  What does it do?  Is it needed?"  One can spend hours poring over the system, or trying to hunt down the original author, who may herself not remember.  It can be very annoying and time-consuming.

If the system was created using TDD, this problem is instantly solved.  Don't know what a line of code does?  Break it, and run your tests.  A test should fail.  Go read that test.  Now you know.

Just don't forget to Ctrl-Z. :)

But what if no test fails?  Or more than one test fails?  Well, that's why you're reading this blog.  For TDD to provide all these benefits, you need to do it properly...

[1] We'll add a link here when we've written this one

Tuesday, January 20, 2015

TDD and Defects

We've said all along that TDD is not really about "testing" but rather about creating an executable form of specification that drives development forward.  This is true, and important, but it does not mean that TDD does not have a relationship to testing.  One interesting issue where there is significant synergy is in our relationship to defects.

Two important issues we'll focus on are: when/how a defect becomes known to us, and the actions we take at that point.

Time and Development


In the cyclic nature of agile development, we repeatedly encounter various points in time when we may discover that something is not right.  First, as we are writing the source code itself, most modern tools can let us know that something is not the way we intended it to be.  For example, when you end a method with a closing curly brace, a good IDE will underline or otherwise highlight any temporary method variables that you created but never used.  Obviously, if you created a variable you intended to use it, so you must have done something other than what you meant to.  Or, if you type an object reference name and then hit the dot, many IDEs will bring up a list of methods available for you to call on that type.  If the list does not appear then something is not right.

When compiling the source into the executable we encounter a number of points in time where the technology can check our work.  The pre-compiler (macros, if-defs, #defines), the compiler, the linker (resolving dependencies), and so forth.

And there are run-time checks too.  The class loader, generic type constraints, assertions of preconditions and postconditions, etc..  Various languages and technologies provide different levels of these services and they all can be "the moment" where we realize that we made an error that has resulted in a defect.

Detection vs. Prevention


Defects are inevitable and so we have to take action to either detect them or to prevent them.  Let's say, for example, that you have a method that takes as its parameters the position of a given baseball player on a team and his jersey number, and then adds the player to a roster somewhere.  If you use an integer to represent the position (1 = Pitcher, 2 = Catcher, and so forth) then you will have to decide what to do if another part of the system incorrectly calls this method with something below 1 or above 9.  That would be a defect that the IDE/compiler/linker/loader would not find, because an int is type-safe for all values from minint to maxint [1].  So if the method was called with a 32, you'd have to put something in the code to deal with it: 32 mod 9 to determine what position that effectively is (Third Base, if you're curious), correct the data (anything above 9 is reduced to 9, below 1 becomes 1), return a null, throw an IllegalPositionException to raise the alarm... something.  Whatever the customer wants.  Then you'd write a failing test first to drive it into the code.
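
As a sketch, if the customer chose the exception option, the failing test driving that behavior in might look like this (the Roster, AddPlayer, and IllegalPositionException names are illustrative):

[TestMethod]
public void TestAddPlayerRejectsPositionOutsideLegalRange()
{
    // Setup: a position outside the legal 1..9 range
    var roster = new Roster();
    var anyIllegalPosition = 32;
    var anyJerseyNumber = 23;

    // Trigger / Verify
    try
    {
        roster.AddPlayer(anyIllegalPosition, anyJerseyNumber);
        Assert.Fail("Adding a player with an illegal position " +
                    "should have thrown an exception");
    }
    catch (IllegalPositionException)
    { }
}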

If, however, you chose not to use an int, but rather create your own type with its own constraints... for example, an enumeration called POSITION with members PITCHER, CATCHER, SHORTSTOP, etc... then a defect elsewhere that attempted to pass in POSITION.QUARTERBACK would not compile and therefore would never make it into production.  We can think of this as defect prevention even though it isn't really; it's just very early detection.  But that vastly decreases the cost of repair.
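
A sketch of that alternative, using the POSITION enumeration described above (member names are illustrative; the numbering follows standard baseball scoring):

// With an enumeration, an illegal position cannot even be expressed:
// POSITION.QUARTERBACK does not exist, so the defect is caught at
// compile time rather than at run time.
public enum POSITION
{
    PITCHER = 1,
    CATCHER = 2,
    FIRST_BASE = 3,
    SECOND_BASE = 4,
    THIRD_BASE = 5,
    SHORTSTOP = 6,
    LEFT_FIELD = 7,
    CENTER_FIELD = 8,
    RIGHT_FIELD = 9
}

public void AddPlayer(POSITION position, int jerseyNumber)
{
    // No range check is needed here; the type does the work.
    // ...
}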

Cost of Delays


The earlier you find the bug, the cheaper it is to fix.  First of all, the issue is fresher in your mind and thus you don't have to recapitulate the thought process that got you there.   It's less likely that you'll have more than one bug to deal with at a time (late detection often means that other bugs have arisen during the delay, sometimes bugs which involve each other) which means you can focus.  Also, if you're in a very short cycle then the defect is something you just did, which makes it more obvious.

The worst time to find out a defect exists, therefore, is the latest time.  It is when the system is operating either in the QA department's testing process or especially when actually in use by a customer.  When QA finds the bug it's a delayed find.  When a customer finds the defect it's further delayed but it also means:
  1. The customer's business has suffered
  2. The product's reputation is tarnished
  3. Your organization's reputation is tarnished
  4. It is personally embarrassing to you
  5. And, as we said, the cost to fix will be much higher
In a perfect world this would never happen, of course, but the world is complex and we are prone to errors.

TDD and Time


In TDD we add another point in time when we can discover an error: test time.  Not QA's testing, but developer test time: tests we run ourselves, and thus a non-delayed moment of run time that we create.  Tests execute the system so they have the same "experience" as QA or a customer, but since we run them very frequently they represent a faster and more granular defect indication.

You would prefer to prevent all defects from making it into runtime, of course.  But you cannot.  So a rule in TDD is this: any defect that cannot be prevented from getting into production must have a specification associated with it, and thus a test that will fail if the spec is not followed.

Since we write the tests as part of the code-writing process, and if we adhere perfectly to the TDD rule that says "code is never put into the source without a failing test that requires it"... and if we see that the test fails until the code is added which then makes it pass... then we should never have code that is not covered (and meaningfully so [2]) by tests.  But here we're going to make mistakes too.  Our good intentions will fall afoul of the forces they always do: fatigue, misunderstandings, things we forget, bad days and interruptions, the fat-fingered gods of chaos.

With TDD as your process, certainly far fewer defects will make it into the product, but it will still happen from time to time.  But what that means will be different.

TDD and Runtime Defects


Traditionally, a bug report from outside the team is placed into a tracking system and addressed in order of priority or severity, or in the order the reports were entered, something along those lines.  But traditionally, "addressed" means "fixed."  This is not so in TDD.

In TDD a bug reported from production is not really a bug... yet.  Because if all of our tests are passing and if our tests are the specification of the system, this means the code is performing as specified.  There is no bug.  But it is not doing what the customer wants so it is the specification that must be wrong: we have a missing test.

Therefore fixing the problem is not job #1; adding the missing test is.  In fact, we want the defect in place so that when we 1) figure out what the missing test was and 2) add it to the suite we can 3) run it and see it fail.  Then and only then we fix the bug and watch the new test go green, completely proving the connection between the test and the code, and also proving that the defect in question can never make it into production again. 

That's significant.  The effort engaged in traditional bug fixing is transitory; you found it and fixed it for now, but if it gets back in there somehow you'll have to find it and fix it again.   In TDD the effort is focused more on adding the test, and thus it is persistent effort.  You keep it forever.

Special Cases


One question that may be occurring to you is "what about bad behavior that gets into the code that really is not part of the spec and should never be?"  For an example in the case of our baseball-player-accepting method above, what if a developer on the team adds some code that says "if the method gets called with POSITION.PITCHER and a jersey number of exactly 23, then add them to the roster twice."  Let's further stipulate that no customer asked for this, it's simply wrong.

Could I write a test to guard against that?  Sure; the given-when-then is pretty clear:

Given: a pitcher with jersey number 23
            an empty roster

When: the pitcher is passed into method X once

Then: a pitcher with jersey number 23 will appear once in the roster

But I shouldn't.  First of all, the customer did not say anything about this scenario, and we don't create our own specifications.  Second, where would that end?  How many scenarios like that could you potentially dream up?  Combinations and permutations abound. [3]

The real issue for a TDD team in the above example is how did that code get into the system anyway?  There was no failing test that drove it.  In TDD adding code to the system without a failing test is a malicious attack by the development team on their own code.  If that's what you're about then nothing can really stop you.

So the answer to this conundrum is... don't do that.  TDD does not work, as a process, if you don't follow its rules in a disciplined way.  But then again, what process would?

-S-

[1] You might, in fact, have chosen to do this because the rules of baseball told you to:
http://en.wikipedia.org/wiki/Baseball_positions

[2] What is "non-meaningful coverage"?  I refer you to:
http://www.sustainabletdd.com/2011/12/lies-damned-lies-and-code-coverage.html

[3] I am not saying issues never arise with special cases, or that it's wrong to speculate; sometimes we discover possibilities the customer simply didn't think of.  But the right thing to do when this happens is go back to the customer and ask what the desired behavior of the system should be under circumstance X before doing anything at all.  And then write the failing test to specify it.