
Friday, September 23, 2016

Magic Buttons and Code Coverage

This will be a quickie.  But sometimes good things come in small packages.

This idea came to us from Amir's good friend Eran Pe'er, when he was visiting Net Objectives from his home in Israel.

I'd like you to imagine something, then I'm going to ask you a question.  Once I ask the question you'll see a horizontal line of dashes.  Stop reading at that point and really try to answer the question.  Once you have your answer, see if you can work out why you answered the way you did.  Consider putting a comment below with your thoughts.

Here is the thing to imagine:

In front of you are two magic buttons, A and B.  You can only push one of them.
  • If you push A, you will instantly achieve 80 percent code coverage; only 20 percent of your code will be left without test coverage.
  • If you push B, you will get only 70 percent code coverage, leaving 30 percent of your code uncovered.

But there is one more difference.  If you push A, you will have no idea where in your code the lack of coverage is; you just know that some 20% of your code is not covered by tests.  If you push B, you will know precisely where your covered and uncovered code is, though of course there will be more uncovered code.

Which would you press, and why?

--------------------------------------Stop and Answer------------------------------------

For me, the answer is unequivocally B.  A big part of the value of TDD is reducing uncertainty and communicating with precision.  Coverage is all about adding a degree of safety when making changes, whether through refactoring or through enhancements to existing systems.  If I know where the uncovered code is, I know where I have to be more careful; conversely, if I know where the covered code is, I know where I can be aggressive.

Also, if I push A, then I know that for every line of code there is a 1-in-5 chance that it is uncovered.

But I think this is only part of the answer.  We'd like to hear from you...

Tuesday, August 30, 2016

TDD and Design: Frameworks

Increasingly, in modern software development, we create software using components that are provided as part of a language, framework, or other element of an overall development ecosystem.  In test-driven development this can potentially cause difficulties because our code becomes dependent on components that we did not create, that may not be amenable to our testing approaches, and that the test cannot inherently control.

That last part is particularly critical.  In TDD, we want to create unique, isolated tests that fail for one specific reason and thus specify one narrowly-defined behavior of the system.  Tests that involve multiple behaviors are hard to read (as a specification) and will have multiple reasons to fail.  When a test fails, we want to know unequivocally why it failed so that we can efficiently address the issue.

We write tests in TDD to specify the proper behavior of the system, allowing us to confidently create the right things.  But they also, later, serve as tests to ensure that defects have not been introduced.  In this second mode, what does the test actually test?

A test will always test everything that is in scope which it does not control.  If a test is to be narrowly focused on just one thing, then everything else that is in scope must be brought under the control of the test, otherwise it is testing those things as well.

Let's roll the dice and look at an example:

Most people are familiar with the game Yahtzee.  Briefly, you roll five dice and try to make the best pattern you can.  Examples are three of a kind, or numbers in a sequence (a "straight"), and so forth, all the way up to a "Yahtzee," which means all five dice are the same.  "Chance" means you have no pattern at all, just five unrelated numbers.

public enum Result { CHANCE, ACES, TWOS, THREES, FOURS, PAIR,
                     THREEOFAKIND, FOUROFAKIND, FULLHOUSE,
                     SMALLSTRAIGHT, LARGESTRAIGHT, YAHTZEE }

public class Yahtzee
{
    public Result RollDice() {
        char[] dice = new char[5];
        Result myResult = Result.CHANCE;
        Random rand = new Random();

        dice[0] = (char)rand.Next(1, 7);   // Next(1, 7) yields a die face from 1 to 6
        dice[1] = (char)rand.Next(1, 7);
        dice[2] = (char)rand.Next(1, 7);
        dice[3] = (char)rand.Next(1, 7);
        dice[4] = (char)rand.Next(1, 7);

        // Logic to determine the best result and set myResult

        return myResult;
    }
}


The idea here is to roll five dice and have the game tell you what the best pattern is that you can make from the five random results that you got.  The default is "Chance" unless something better can be made from the die rolls you got.

What we would want to specify here is that the logic (which is commented out for brevity) would correctly identify various patterns of die rolls.  If we rolled four 5s, for example, it would identify the result as Result.FOUROFAKIND even though it is also true that we have three of a kind.

The problem is that Random is in scope... we are using it.  But unless we bring it under the control of the test we are also testing Random, which is not what we want.  Also, we cannot predict what Random will do.  We could seed the Random class with a known value, but even so we are testing more than we wish to.  How can we truly bring Random under control?

This same issue would exist in code that is dependent upon any component: a GPS module, the system clock, a network socket, etc...

A Design Principle


In seeking to ensure high-quality designs, we need standards or rubrics to apply to any proposed design.  We want to check our thinking, to make sure we're not fooling ourselves or missing anything.  One such rubric is this:

When examining an entity (class, method, whatever) in our system we ask: is this entity aware of the framework, or aware of the application logic?  If the answer is "both", then we seek some way to separate the two aspects of the entity from each other.

If you examine the code above you'll see that this game entity is aware of the framework (how you create and use the Random class) and also of the application logic (the rules of this particular game).  This is a clue that we should reconsider the structure of our code.  Note that this concern also impacts the testability of the code because, as we've already noted, the test does not want to be vulnerable to the Random component.  We want the test to be solely concerned with the game logic.

A Testing Adapter


One way to bring the framework component under the control of the test is to wrap it in an adapter, and then mock the wrapper:

class DieRoller
{
    Random rand;
    public DieRoller() {
        rand = new Random();
    }

    public virtual char RollDie() {
        return (char)rand.Next(1, 7);   // a die face from 1 to 6
    }
}


...and then change the product code to use this class instead of using the framework element directly.  For the test, we could mock [1] this class and inject the mock instead of the adapter, bringing the die rolled in each case under the control of the test.

One example:

class MockDieRoller : DieRoller {
    private char roll;

    public void setRoll(char aRoll) {
        roll = aRoll;
    }

    public override char RollDie() {
        return roll;
    }
}
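
Here is one way the changed product code and the injection might look (a sketch only; the constructor-injection wiring is just one option, chosen for illustration, and it assumes DieRoller is declared public so it can appear in a public constructor):

public class Yahtzee
{
    private DieRoller roller;

    // Production code uses a real DieRoller by default
    public Yahtzee() : this(new DieRoller()) { }

    // The test can inject a MockDieRoller through this constructor
    public Yahtzee(DieRoller aRoller) {
        roller = aRoller;
    }

    public Result RollDice() {
        char[] dice = new char[5];
        Result myResult = Result.CHANCE;

        for (int i = 0; i < dice.Length; i++)
            dice[i] = roller.RollDie();

        // Logic to determine the best result and set myResult

        return myResult;
    }
}

In a test, the setup might then be:

    MockDieRoller mockRoller = new MockDieRoller();
    mockRoller.setRoll((char)3);
    Yahtzee myGame = new Yahtzee(mockRoller);
    // myGame.RollDice() now operates only on dice the test controls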


An Endo Test


Creating an adapter class is a viable option when it comes to framework components, but it may seem like overkill in some cases.  When the issue in question is very simple, as in our example, you could also control the dependency through a simple technique called "endo-testing."

public class Yahtzee
{
    public Result RollDice()
    {
        char[] dice = new char[5];
        Result myResult = Result.CHANCE;

        dice[0] = RollDie();
        dice[1] = RollDie();
        dice[2] = RollDie();
        dice[3] = RollDie();
        dice[4] = RollDie();

        // Logic to determine the best result and set myResult

        return myResult;
    }

    protected virtual char RollDie() {
        return (char)(new Random().Next(1, 7));   // a die face from 1 to 6
    }
}


All we have done is extracted the use of the Random class into a local, protected virtual method.  This is a very simple and quick refactor; virtually any decent IDE will do this for you.  The test will look like this:

[TestClass]
public class YahtzeeTest
{
    [TestMethod]
    public void TestGameResults(){
           
        TestableYahtzee myGame = new TestableYahtzee();
        // conduct the test against controllable results
    }

    private class TestableYahtzee : Yahtzee {
        private char roll;
        public void setRoll(char aRoll) {
            roll = aRoll;
        }

        protected override char RollDie() {
            return roll;
        }
    }
}


Now the test can control what dice are rolled and conduct all the various scenarios to ensure that the rules of the game are adhered to.  Also, we've satisfied our design principles by separating game logic and framework knowledge into two methods, rather than two classes.
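
As a concrete illustration, a scenario test inside YahtzeeTest might look something like this (a sketch only; the expected result assumes the elided scoring logic treats five identical dice as a Yahtzee):

    [TestMethod]
    public void TestFiveIdenticalDiceScoreAsYahtzee()
    {
        // Setup: every die rolled will come back as the same value
        TestableYahtzee myGame = new TestableYahtzee();
        myGame.setRoll((char)3);

        // Trigger
        Result result = myGame.RollDice();

        // Verify: assumes the scoring logic recognizes five identical dice as a Yahtzee
        Assert.AreEqual(Result.YAHTZEE, result);
    }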




[1] See our blogs and podcast on mocking for more details:
http://www.sustainabletdd.com/2012/05/mock-objects-part-1.html


Friday, January 29, 2016

TDD and the "6 Do's and 8 Skills" of Software Development: Pt. 1

This post is not about TDD per se, but rather a context in which TDD can demonstrate its place in and contribution to the value stream.  This context has to do with the 6 things that we must accomplish (do) and the 8 skills that the team must have in order to accomplish them.  We'll describe each "do", noting where and whether TDD has an impact, and then do the same thing with the skills.

6 Dos:
  • Do the right thing
  • Do the thing right
  • Do it efficiently
  • Do it safely
  • Do it predictably
  • Do it sustainably

8 Skills:

  • Programming
  • Designing
  • Analysis
  • Refactoring
  • Testing
  • Dev ops
  • Estimation
  • Process Improvement

 

Do the right thing


Everything the team does must be traceable back to business value.  This means “the right thing” is the thing that has been chosen by the business to be the next most important thing, in terms of business value, that we should work on.  TDD has no contribution to make to this.  Our assumption is that this decision has been made, and made correctly before we begin our work.  How the business makes this decision is out of scope for us, and if they make the wrong one we will certainly build the wrong thing.  This is an issue of product portfolio management and business prioritization, and we do not mean to minimize its importance; it is crucial.  But it’s not a TDD activity.  It is the responsibility of project/product management.

An analogy:

As a restaurant owner, the boss has determined that the next thing that should be added to the menu is strawberry cheesecake.  He made this decision based on customer surveys, or the success of his competitors at selling this particular dessert, or some other form of market research that tells him this added item will sell well and increase customer satisfaction ratings.  It will have great business value and, in his determination, is the most valuable thing to have the culinary staff work on.

Do the thing right


One major source of mistakes is misunderstanding.  Communication is an extremely tricky thing, and there can be extremely subtle differences in meaning with even the simplest of words.  “Clip” means to attach (clip one thing to another) and to remove (clipping coupons). 

A joke we like: My wife sent me to the store and said “please get a gallon of milk -- if they have eggs get six.”  So I came back with 6 gallons of milk.  When she asked why I did that, I replied “they had eggs.” 

The best way we know to ferret out the hidden assumptions, different uses of terms, different understanding, missing information, and the all-important “why” of a requirement (which is so often simply missing) is by engaging in a richly communicative collaboration involving developers, testers, and businesspeople.  The process of writing acceptance tests provides an excellent framework for this collaboration, and is the responsibility of everyone in the organization.

The analogy, continued:

You work as a chef in the restaurant, and the owner has told you to add strawberry cheesecake to the menu.  You prepare a graham-cracker crust, and a standard cheesecake base to which you add strawberry syrup as a flavoring.  You finish the dish and invite your boss to try it.  He says “I did not ask for strawberry flavored cheesecake, I asked for a strawberry cheesecake.  Cheesecake with strawberry.”

So you try again, this time making a plain cheesecake base and adding chopped up strawberries, stirring them in.  The boss stops by to sample the product and says “no, no, not strawberries in the cake, I meant on the cake.”

So you try another version where the plain cheesecake is topped by sliced strawberries.  Again the boss is unhappy with the result.  "Not strawberries, strawberry.  As in a strawberry topping."

What he wanted was a cheesecake topped with strawberry preserves, which he has always thought of as “strawberry cheesecake.”  All this waste and delay could have been avoided if the requirements had been communicated with more detail and accuracy.

Do it efficiently


For most organizations the primary costs of developing software are the time spent by developers and testers doing their work, and the effect of any delays caused by errors in the development process.  Anything that wastes time or delays value must be rooted out and corrected.

TDD has a major role to play here. 
  • When tests are written as the specification that guides development, they keep the team focused on what is actually needed. 
  • The tests themselves require precision in our understanding of a requirement and thus lead to code that satisfies the exact need and nothing more.  Traditionally developers have worked in an environment of considerable uncertainty, and thus have spent time writing code that ends up being unnecessary, which wastes their time. 
  • Without TDD, defects in the code will largely be dealt with after development is over, requiring much re-investigation of the system after the fact.  TDD drives the issue to one of bug prevention (much more time-efficient) rather than bug detection.

 

Do it safely


Software must be able to change if it is to remain valuable, because its value comes from its ability to meet a need of an organization or individual.  Since these needs change, software must change. 

Changing software means doing new work, and this is usually done in the context of existing work that was already completed.  One of the concerns that arises when this is done is: will the new work damage the existing system?  When adding a new feature, for example, we need to guard against introducing bugs in the code that existed before we started our work.

TDD has a significant role here, because all of our work proceeds from tests, and thus we have test coverage protecting our code from accidental changes.  Furthermore, this test coverage is known to be meaningful because of how it was achieved.

Test coverage that is added after a system is created only guarantees that the production code is executed; it guarantees nothing about the behavior that results from that execution.  In TDD the coverage is created by writing tests that drive the creation of the behavior, so if they continue to pass we can be assured that the behavior remains the same.

Do it predictably


A big part of success in business is planning effectively, and this includes the notion of predictability.  Every development initiative is either about creating something new, or changing something that already exists (and, in fact, you could say that creating something new is just a form of change: from nothing to something).

One question we seek to answer when planning and prioritizing work is: how long will it take and how many resources will be required?  Although we know we can never perfectly predict these things, we want to reduce the degree of error in our predictions.

TDD has a role to play here:
  • TDD increases design and code quality.  There are many reasons for this, but the shorthand explanation is that bad designs and poor code are very hard to test.  If we start from the testing perspective, we tend to create more quality.  Higher quality creates clarity, and the more clarity you have the better your predictions will be.
  • TDD points out gaps in analysis earlier than traditional methodologies do.  These gaps, when discovered late, create unexpected, unplanned-for work, and this derails our predictions.
  • TDD provides meaningful code coverage.  This reduces the creation of unexpected errors, and fewer unexpected anything increases predictability.
  • TDD helps us to retain knowledge, and the more you understand a thing the more accurate your predictions will be about changing it.

Do it Sustainably


The team must work in a way that can be sustained over the long haul.  Part of this is avoiding overwork and rework, and making sure the pace of work is humane.  Part of this is allowing time for the team to get appropriate training, and thus to "sharpen the saw" between major development efforts.  Issues like these are the responsibility of management whether the team is practicing TDD or not.

However, this work is called "Sustainable Test-Driven Development" for a reason.  TDD itself can create sustainability problems if maintaining the test suite presents an increasingly significant burden for the team.  Much of our focus overall has been, and will continue to be, avoiding this problem.

In other words, TDD will not create sustainability unless you learn how to do it right.
Next up: how TDD impacts the 8 skills of software development.

Friday, December 11, 2015

Specifying The Negative in TDD

One of the issues that frequently comes up is "how do I write a test about a behavior that the system is specified not to have?"  It's an interesting question given the nature of unit tests.  Let's examine it.

The Decision Tree of Negatives


When it comes to behaviors that the system should not have, there are different ways that this can be specified and ensured for the future:

Inherently Impossible


Some things are inherently impossible, depending on the technology being used.  For example you cannot write to read-only memory.  This is in the nature of the memory and thus does not require a specification (nor a test, since that would be a test that could never fail).  In languages like C# and Java, there exists the concept of “private”, and we know that an attempt to read or write a private value from outside a class will not compile and so will never exist in the executable system. 

Some things are inherently impossible and cannot be made possible even accidentally.  Read-only memory cannot be made writable.  However other things which are impossible by nature can be made possible if desired.  A good example of this is an immutable object.

Let's say there exists in our system a SaleAmount class that represents an amount of money for a given retail sale in an online environment.  Such a class might exist in order to restrict, validate, or perfect the data it holds.  In this case, however, there is a customer requirement that the value held must be immutable, for reasons of security and consistency in their transactions. 

This brings up the question "how do I specify in a test that you cannot change the value?"
How can we test-drive such an entity when part of what we wish to specify is that the value, once established in an instance of this class, cannot be changed from the outside?  A typical way this question is stated is: "How can I show, in a test, that there is no SetValue() method?  Any test that references such a method simply will not compile, because it does not exist.  Therefore, I cannot write the test."

Developers will sometimes suggest two different ideas:
  1. Add the SetValue() method, but make it throw an exception if anyone ever calls it.  Write a test that calls this method and fails if the exception is not thrown.[1]  Sometimes other actions are suggested if the method gets called, but an exception is quite common.
  2. Use reflection in the test to examine the object and, if SetValue() is found, fail the test.

The problem with option #1 is that this is not what the requirement says; it is not what was wanted.  The specification should be "you cannot change the value," not "if you change the value, thing x will happen."  So here, the developer is creating his own specification and ignoring the actual requirements.

The problem with option #2 is twofold.  First, reflection is typically a very sluggish thing, and in TDD we want our tests to be extremely fast so that we can run them frequently without slowing down our process.  But even if we overcame that somehow, what would we have the test look for?  SetValue()?  PutValue()?  ChangeValue()?  AlterValue()?  The possibilities are vast, and the cost of fully verifying immutability, in this case, would be enormous compared to the value.

The key to solving this is reminding ourselves once again that TDD is not initially about testing but about creating a specification.  Developers have always worked from some form of specification; it's just that the form was usually some kind of document.

So think about the traditional specification, the one you're likely more familiar with.  Ask yourself this: Does a specification indicate everything the system does not do?  Obviously not, for this would create a document of infinite length.  Every system does a finite set of things, and then there is an infinite set of things it does not do.

For example, here is an acceptance test for the positive requirement [2]:

Given: A SaleAmount S with value V
When: You ask for the value of S
Then: V is retrieved

This could be made into an executable specification by the following simple test:

[TestClass]
public class SaleAmountTest
{
    [TestMethod]
    public void TestSaleAmountPersistence()
    {
        var initialValue = 10.50d;
        var testDollar = new SaleAmount(initialValue);

        var retrievedValue = testDollar.GetValue();

        Assert.AreEqual(initialValue, retrievedValue);
    }
}


Which would drive the entity and its behavior into existence:

public class SaleAmount
{
    private double myValue;
    public SaleAmount(double aValue)
    {
        myValue = aValue;
    }

    public double GetValue()
    {
        return myValue;
    }
}


Ask yourself the following question:  If we were using the TDD process to create this SaleAmount object, and if the object had a method allowing the value to be changed (SetValue() or whatever), how would it have gotten there?  Where is the test that drove that mechanism into existence?  It's not there because there is a specific requirement that it not be there.  In TDD we never add code to the system without having a failing test first, and we only add the code that is needed to make the test pass, and nothing more. 

Put another way, if a developer on our team added a method that allowed such a change, and did not have a failing test written first, then he would be ignoring the rules of TDD and would be creating a bug as a result.  TDD does not work if you don't do it.  We don't know of any process that does. 

And if we think back to the concept of a specification there is an implicit rule here, which basically has two parts.

1. Everything the system does, every behavior, must be specified.
2. Given this, anything that is not specified is, by default, specified as not a behavior of the system.

If it is nonetheless a behavior, it is a defect.

 

Inherently Possible


We don’t have a test that shows the value being changed, so it cannot be.  But this does not mean we have a “test for immutability.”  Anything that comes from the customer must be retained; we never want to lose that knowledge.  So if we think of this requirement in terms of acceptance testing we could express it using the ATDD nomenclature:

Given: A SaleAmount S with value V exists in the system
Then: You cannot change V

There is no “When” in this case because this is a requirement that is always true, it is not based on system state.  But this, of course, implies a strongly-typed, compiled language with access-control idioms (like making things "private" and so forth).  What if your technology does not provide this?  What if it is an interpreted language, or one with no enforcement mechanism to prevent access to internal variables?

The first answer is: You have to ask the customer.  You have to tell them that you cannot do precisely what they are asking for, and consider other alternatives in that investigation.   It may well be that we are using the wrong technology.

The second answer is that there will be some occasions where the only way you can ensure that an illegal or unwanted behavior is not added to a system accidentally is through static analysis (a traditional code review, or perhaps a code analysis tool).  This is still “a test” but one that either cannot or should not be automated in all cases.

On the other hand, sometimes we can make an inherently possible thing impossible by adding behaviors.  Such behaviors must, of course, be test driven.

Let's add a requirement to our SaleAmount class.  If the context of this object was, say, an online book store, the customer might have a maximum amount of money that he allows to be entered into a transaction.

We used a double-precision number [3] to hold the value in SaleAmount.  A double can inherently hold an incredibly large value.  In .NET, for example, it can hold a value as high as 1.7976931348623157E+308 [4].  It does not seem credible that any purchase made at our customer's site could total up to something like that!  So the requirement is: any SaleAmount object that is instantiated with a value greater than the customer's maximum credible value should raise a visible alarm, because this probably means the system is being hacked or has a very serious calculation bug.

As developers, we know a good way to raise an alarm is to throw an exception.  We can do that, but we also capture the customer's view of what the maximum credible value is, so we specify it.  Let's say he says "nothing over $1,000.00 makes any sense".  But... how much "over"?  A dollar?  A cent?  We have to ask, of course.  Let's say the customer says "one cent".

In TDD everything must be specified, all customer rules, behaviors, values, everything.  So we start with this:

Given: The system
Then: The Maximum value for a Sale Amount is $1000.00

We also have to capture the tolerance in its own specification:

Given: The System
Then: Tolerance for comparing SaleAmount to its Maximum is one cent

These tests establish bits of domain-specific language that can then be used in any number of other specifications (we won’t have to repeatedly define them whenever we make comparisons).

[TestMethod]
public void SpecifyMaximumDollarValue()
{
    Assert.AreEqual(1000d, SaleAmount.MAXIMUM);
}

[TestMethod]
public void SpecifyComparisonTolerance()
{
    Assert.AreEqual(.01, SaleAmount.TOLERANCE);
}


In order to get these to pass we drive the Maximum and the Tolerance into the system.
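
Driving them in might look something like this (a sketch; the choice of public constants is ours, the values are the customer's):

public class SaleAmount
{
    public const double MAXIMUM = 1000d;    // "nothing over $1,000.00 makes any sense"
    public const double TOLERANCE = .01d;   // the customer's "one cent"

    private double myValue;

    public SaleAmount(double aValue)
    {
        myValue = aValue;
    }

    public double GetValue()
    {
        return myValue;
    }
}
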
Now we can write this test, which will also fail initially of course:

Given: Value S greater than or equal to Maximum + Tolerance
When: An attempt is made to create a SaleAmount with value S
Then: A warning is issued

[TestMethod]
public void TestSaleAmountThrowsSaleAmountValueTooLargeException()
{
    var saleAmountMaximum = SaleAmount.MAXIMUM;
    var tolerance = SaleAmount.TOLERANCE;
    var excessiveAmount = saleAmountMaximum + tolerance;

    try
    {
        new SaleAmount(excessiveAmount);
        Assert.Fail("SaleAmount created with excessive " +
                    "value should have thrown an exception");
    }
    catch (SaleAmountValueTooLargeException)
    { }
}


But now the question is, what code do we write to make this test pass?  The temptation would be to add something like this to the constructor of SaleAmount:

if(aValue >= MAXIMUM + TOLERANCE)
          throw new SaleAmountValueTooLargeException();

But this is a bit of a mistake.  Remember, it's not just "add no code without a failing test", it is "add only the needed code to make the failing test pass."

Your spec is supposed to be your pal.  He's supposed to be there at your elbow saying "don't worry.  I won't let you make a mistake.  I won't let you write the wrong code, I promise."  He's not just your pal, he's your best pal. 

Here, however, the spec is just a mediocre friend because he will let you write the wrong code and say nothing about it.  He’ll let you get in your car when you are in no condition to drive.  He'll let you do this, and let it pass:

throw new SaleAmountValueTooLargeException();

There is no conditional; we're just throwing the exception all the time.  That's wrong, obviously.  This behavior has a boundary (as we discussed in our blog about test categories) and every boundary has two sides.  We need a little more specification.  We need something like this:

try
{
    new SaleAmount(SaleAmount.MAXIMUM);
}
catch (SaleAmountValueTooLargeException)
{
    Assert.Fail("SaleAmount created with value at the maximum"+
                "should not have thrown an exception");
}


Now the "anAmount => MAXIMUM + TOLERANCE" part must be added to the production code or your best buddy will let you know you're blowing it.  Friends don’t let friends implement incorrectly. 
...
[1] There are a variety of ways to do this.  We’ll show one way here a bit further on.
[2] [TODO] Link to ATDD blog
[3] If you’re thinking “you used the wrong type, a long would be better” it’s a fair point.  We simply wanted to make the conceptual point that primitives do not impose domain constraints inherently, and the use of the double just makes the idea really clear.
[4] For those who dislike exponential notation, this is:
$179,769,313,486,231,520,616,720,392,992,464,536,472,240,560,432,240,240,944,616,576,160,448,992,408,768,712,032,320,616,672,472,536,248,456,776,672,352,088,672,544,960,568,304,616,280,032,664,704,344,880,448,832,696,664,856,832,848,208,048,648,264,984,808,584,712,312,912,080,856,536,512,272,
952,424,048,992,064,568,952,496,632,264,936,656,128,816,232,688,512,496,536,552,712,648,144,200,160,624,560,424,848,368
...and no cents. :)


Wednesday, November 4, 2015

Structure of Tests-As-Specifications

A big part of our thesis is that TDD is not really a testing activity, but rather a specifying activity that generates tests as a very useful side effect.  For TDD to be a sustainable process, it is important to understand the various implications of this distinction. [1]

Here, we will discuss the way our tests are structured when we seek to use them as the functional specification of the system.

A question we hear frequently is "how does TDD relate to BDD?"  BDD is "Behavior-Driven Development" a term coined by Dan North and Chris Matts in their 2006 article "Introducing BDD" [2].  Many have made various distinctions between TDD, ATDD, and BDD, but we feel these distinctions to be largely unimportant.  To us, TDD is BDD, except that we conduct the activity at a level very close to the code, and automation is much more critical. Also, we contend that “development” includes analysis and design, and thus what TDD enables is more accurately stated to be “behavior-based analysis and design”, or BBAD.

In BBAD, the general idea is that the "unit" of software that is being specified is a behavior.  Software is behavior, after all.  Software is not a noun, it is a verb.  Software’s value lies entirely in what it does, what value the user accrues as result of its behavior.  In essence, software only exists in any meaningful sense of the word when it is up and running.  The job of a software development team is to take a general-purpose computer and cause it to act in specific, valuable ways.  We call these behaviors.

The nomenclature that North and Matts proposed for specifying each behavior of a system is this: Given-When-Then.  Here's a simple example:

Given:
     User U has a valid account on our system with Username UN and password PW
     The login username is set to UN and the login password is set to PW
When:
    Login is requested
Then:
    U is logged in

Everything that software does, every behavior can be expressed in this fashion.  Each Given-When-Then expression is a specific scenario that is deemed to have business value, and that the team has taken upon itself to implement.

In TDD, when the scenario is interpreted as a test, we strive to make this scenario actionable.  So we think of these three parts of the scenario a little differently: we "verbify" them, converting these conditions into activities.

Imagine that you were a manual tester who was seeking to make sure the system was behaving correctly in terms of the scenario above.  You would not wait around until a user with a valid account happened to browse to the login page, enter his info, and click the "Login" button... you would create or identify an existing valid user and, as that person, browse to the page, enter the correct username and password, and then click the button yourself.  Then you'd check to see if your login was successful.  You would do all of these things.

So the Given wasn't given, it was done by the tester (you, in this case); the When was not "when," it was "do now"; and the Then was not a condition but rather an action: go and see if things are correct.

"Given" becomes "Setup".
"When" becomes "Trigger".
"Then" become "Verify".

We want to structure our tests in such a way that these three elements of the specification are clear and, as much as possible, separate from each other.  Typical programming languages can make this a bit challenging at times, but we can overcome these problems fairly easily.

For example: Let's say we have a behavior that calculates the arithmetic mean of two real numbers, accurate to within 0.1.  Most likely this will be a method call on some object that takes two values as parameters and returns the arithmetic mean of those values, accurate to within 0.1.

Let’s start with the Given-When-Then:

Given:
     Two real values R1 and R2
     Required accuracy A is 0.1
When:
     The arithmetic mean of R1 and R2 is requested
Then:
     The return is (R1+R2)/2, accurate to A

Let's look at a typical unit test for such a behavior:

(Code samples are in C# with MSTest as the testing framework)

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        Assert.AreEqual(5.5d,
                        MathUtils.GetInstance().ArithmeticMean(7.0d, 4.0d),
                        .1);
    }
}



This test is simple because the behavior is simple.  But this is really not great as a specification.

The Setup (the creation of the MathUtils object and of the example doubles 7.0d and 4.0d), the Trigger (the calling of the ArithmeticMean method with our two example doubles), and the Verify (comparing the method's return to the expectation, 5.5d, and establishing the precision as .1) are all expressed together in the assertion.  If we can separate them, we can make the specification easier to read and also make it clear that some of these particular values are not special, that they were just picked as convenient examples.

This is fairly straightforward, but easy to miss:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {         
        // Setup
        var mathUtils = MathUtils.GetInstance();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue)/2;

        // Trigger
        var actualMean = mathUtils.ArithmeticMean(anyFirstValue,
                                                  anySecondValue);

        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }
}


Here we have included comments to make it clear that the three different aspects of this behavioral specification are now separate and distinct from each other.  The "need" for comments always seems like a smell, doesn't it?  It means we can still make this better.

But we've also used variable names like "anyFirstValue" to indicate that the number we chose was not a significant value, creating more clarity about what is important here.  Note that tolerance and expectedMean were not named in this way, because their values are specific to the required behavior.

This, now, is using TDD to form a readable specification, which also happens to be executable as a test [2].  Obviously the value of this as a test is very high; we do not intend to trivialize this.  But we write them with a different mindset when we think of them as specifications and, as we'll see, this leads to many good things.

Looking at both code examples above however, some of you may be thinking "what is this GetInstance() stuff?  I would do this: "

        // Setup
        var mathUtils = new MathUtils();

Perhaps.  We have reasons for preferring our version, which we'll set aside for its own discussion.

But the interesting question is: what if you started creating the object one way (using “new”), and then later changed your mind and used a static GetInstance() method, or maybe even some factory pattern?  If, when that change was made, you had many test methods on this class doing it the "old" way this would require the same change in all of them.

We can do it this way instead:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator =
                           GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = arithmeticMeanCalculator.
                         ArithmeticMean(anyFirstValue,
                                        anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}



Now, no matter how many test methods on this test class needed to access this arithmetic mean behavior (for different scenarios), a change in terms of how you access the behavior would only involve the modification of the single "helper" method that is providing the object for all of them.

Many testing frameworks have their own mechanisms for eliminating redundant object creation, usually in the form of a Setup() or Initialize() method, etc., and these can be used.  But we prefer the helper method because we then gain the ability to decouple the specification from the fact that the behavior we're specifying happens to be implemented in a class called MathUtils.  We could also change this design detail and the impact would only be on the helper method.  (The fact that C# has a var type is a real plus here... you might be limited a bit in other languages.)
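
For comparison, the framework-level alternative in MSTest is a [TestInitialize] method; a minimal sketch of that approach could look like this:

[TestClass]
public class MathTests
{
    // Unlike the helper-method approach, the field's type must be spelled out here
    private MathUtils mathUtils;

    [TestInitialize]
    public void Setup()
    {
        mathUtils = MathUtils.GetInstance();
    }

    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = mathUtils.ArithmeticMean(anyFirstValue, anySecondValue);

        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }
}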

But the spec is also not about the particular method you call to get the mean, just how the calculation works, behaviorally.  Certainly an ArithmeticMean() method is logical, but what if we decided to make it more flexible, allowing any number of parameters rather than just two?  The meaning of "arithmetic mean" would not change, but our spec would have to.  Which seems wrong.  So, we could take the idea a little bit farther:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculator(
                         arithmeticMeanCalculator, 
                         anyFirstValue, anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private double TriggerArithmeticMeanCalculator(MathUtils mathUtils, 
                                                  double anyFirstValue, 
                                                  double anySecondValue)
    {
        return mathUtils.ArithmeticMean(anyFirstValue,
            anySecondValue);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}

Now if we change the ArithmeticMean() method to take a container rather than discrete parameters, or whatever, then we only change this private helper method and not all the various specification-tests that show the behavior with more parameters, etc...

The idea here is to separate the meaning of the specification from the way the production code is designed.  We talk about the specification being one thing, and the "binding" being another.  The specification should change only if the behavior changes.  The binding (these private helpers) should only change if the design of the system changes.

Another benefit here is clarity, and readability.  Let's improve it a bit more:

[TestClass]
public class MathTests
{
    [TestMethod] 
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculation(
                                             anyFirstValue,
                                             anySecondValue);
           
        // Verify
        var expectedMean = (anyFirstValue + anySecondValue) / 2;
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private double TriggerArithmeticMeanCalculation(
                                double anyFirstValue, 
                                double anySecondValue)
    {
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        return arithmeticMeanCalculator.
                                ArithmeticMean(anyFirstValue, 
                                anySecondValue);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}

We have moved the call GetArithmeticMeanCalculator() to the Trigger, and expectedMean to the Verification [3].  Also we changed the notion of "trigger the calculator" to "trigger the calculation". Now, remember the original specification?

Given:
     Two real values R1 and R2
     Required accuracy A is 0.1
When:
     The Arithmetic Mean of R1 and R2 is requested
Then:
     The return is (R1+R2)/2, accurate to A

The unit test, which is our specification, very closely mirrors this Given-When-Then expression of the behavior. Do we really need the comments to make that clear?  Probably not.  We’ve created a unit test that is a true specification of the behavior without coupling it to the specifics of how the behavior is expressed by the system.

Can we take this even further?  Of course... but that's for another entry. :)

[1] It should be acknowledged that Max prefers to say "it is a test which also serves as a specification."  We'll probably beat him into submission :), but for the time being that's how he likes to think of it.  We welcome discussion, as always.

[2] Better Software Magazine, March 2006.

[3] It should also be acknowledged that we're currently discussing the relative merits of using Setup/Trigger/Verify in TDD rather than just sticking with Given/When/Then throughout. See Grzegorz Gałęzowski's very interesting comment below on this (and other things).