Net Objectives


Wednesday, November 4, 2015

Structure of Tests-As-Specifications

A big part of our thesis is that TDD is not really a testing activity, but rather a specifying activity that generates tests as a very useful side effect.  For TDD to be a sustainable process, it is important to understand the various implications of this distinction. [1]

Here, we will discuss the way our tests are structured when we seek to use them as the functional specification of the system.

A question we hear frequently is "how does TDD relate to BDD?"  BDD is "Behavior-Driven Development," a term coined by Dan North and Chris Matts in their 2006 article "Introducing BDD" [2].  Many have drawn various distinctions between TDD, ATDD, and BDD, but we feel these distinctions are largely unimportant.  To us, TDD is BDD, except that we conduct the activity at a level very close to the code, and automation is much more critical. Also, we contend that “development” includes analysis and design, and thus what TDD enables is more accurately described as “behavior-based analysis and design,” or BBAD.

In BBAD, the general idea is that the "unit" of software being specified is a behavior.  Software is behavior, after all.  Software is not a noun; it is a verb.  Software’s value lies entirely in what it does: the value the user accrues as a result of its behavior.  In essence, software only exists in any meaningful sense of the word when it is up and running.  The job of a software development team is to take a general-purpose computer and cause it to act in specific, valuable ways.  We call these behaviors.

The nomenclature that North and Matts proposed for specifying each behavior of a system is this: Given-When-Then.  Here's a simple example:

Given:
     User U has a valid account on our system with Username UN and password PW
     The login username is set to UN and the login password is set to PW
When:
    Login is requested
Then:
    U is logged in

Everything that software does, every behavior, can be expressed in this fashion.  Each Given-When-Then expression is a specific scenario that is deemed to have business value and that the team has taken upon itself to implement.

In TDD, when the scenario is interpreted as a test, we strive to make it actionable.  So we think of its three parts a little differently: we "verbify" them, converting these conditions into activities.

Imagine that you were a manual tester seeking to make sure the system behaved correctly in terms of the scenario above.  You would not wait around until a user with a valid account happened to browse to the login page, enter his info, and click the "Login" button.  You would create or identify a valid user and, as that person, browse to the page, enter the correct username and password, and click the button yourself. Then you'd check whether your login was successful.  You would do all of these things.

So the Given wasn't given; it was done by the tester (you, in this case).  The When was not "when"; it was "do it now."  And the Then was not a condition but an action: go and see whether things are correct.

"Given" becomes "Setup".
"When" becomes "Trigger".
"Then" becomes "Verify".
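To make the verbification concrete, here is one way the login scenario above might look as a Setup/Trigger/Verify test, using C# and MSTest as the samples later in this entry do. It is a sketch only: `AccountStore`, `LoginService`, and all their members are hypothetical stand-ins invented for this example, with an in-memory store in place of a real system.

```csharp
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical in-memory account store standing in for "our system".
public class AccountStore
{
    private readonly Dictionary<string, string> passwords = new Dictionary<string, string>();

    public void CreateAccount(string username, string password)
    {
        passwords[username] = password;
    }

    public bool IsValid(string username, string password)
    {
        return username != null
            && passwords.TryGetValue(username, out var stored)
            && stored == password;
    }
}

// Hypothetical login component whose behavior the scenario specifies.
public class LoginService
{
    private readonly AccountStore accounts;
    private readonly HashSet<string> loggedInUsers = new HashSet<string>();

    public LoginService(AccountStore accounts)
    {
        this.accounts = accounts;
    }

    public string LoginUsername { get; set; }
    public string LoginPassword { get; set; }

    // Logs the user in only if the username/password pair is valid.
    public void RequestLogin()
    {
        if (accounts.IsValid(LoginUsername, LoginPassword))
        {
            loggedInUsers.Add(LoginUsername);
        }
    }

    public bool IsLoggedIn(string username)
    {
        return loggedInUsers.Contains(username);
    }
}

[TestClass]
public class LoginTests
{
    [TestMethod]
    public void TestValidUserIsLoggedIn()
    {
        // Setup: we do the Given ourselves rather than wait for it
        var accounts = new AccountStore();
        var username = "UN";
        var password = "PW";
        accounts.CreateAccount(username, password);
        var loginService = new LoginService(accounts);
        loginService.LoginUsername = username;
        loginService.LoginPassword = password;

        // Trigger: the When becomes an action we take now
        loginService.RequestLogin();

        // Verify: the Then becomes going and checking
        Assert.IsTrue(loginService.IsLoggedIn(username));
    }
}
```

Each Given line maps to a Setup statement, the single When to one Trigger call, and the Then to one assertion.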

We want to structure our tests in such a way that these three elements of the specification are clear and, as much as possible, separate from each other.  Typical programming languages can make this a bit challenging at times, but we can overcome these problems fairly easily.

For example, let's say we have a behavior that calculates the arithmetic mean of two real numbers, accurate within 0.1. Most likely this will be a method call on some object that takes the two values as parameters and returns their arithmetic mean, accurate within 0.1.

Let’s start with the Given-When-Then:

Given:
     Two real values R1 and R2
     Required accuracy A is 0.1
When:
     The arithmetic mean of R1 and R2 is requested
Then:
     The return is (R1+R2)/2, accurate to A

Let's look at a typical unit test for such a behavior:

(Code samples are in C# with MSTest as the testing framework)

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        Assert.AreEqual(5.5d,
                        MathUtils.GetInstance().ArithmeticMean(7.0d, 4.0d),
                        .1);
    }
}



This test is simple because the behavior is simple.  But this is really not great as a specification.

The Setup (creation of the MathUtils object, the creation of the example doubles 7.0d and 4.0d), the Trigger (the calling of the ArithmeticMean method with our two example doubles), and the Verify (comparing the method's return to the expectation, 5.5d, and establishing the precision as .1) are all expressed together in the assertion.  If we can separate them, we can make the specification easier to read and also make it clear that some of these particular values are not special, that they were just picked as convenient examples.

This is fairly straightforward, but easy to miss:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {         
        // Setup
        var mathUtils = MathUtils.GetInstance();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue)/2;

        // Trigger
        var actualMean = mathUtils.ArithmeticMean(anyFirstValue,
                                                  anySecondValue);

        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }
}


Here we have included comments to make it clear that the three different aspects of this behavioral specification are now separate and distinct from each other.   The "need" for comments always seems like a smell, doesn't it?  It means we can still make this better.

But we've also used variable names like "anyFirstValue" to indicate that the number we chose was not a significant value, creating more clarity about what is important here.  Note that tolerance and expectedMean were not named in this way, because their values are specific to the required behavior.

This, now, is using TDD to form a readable specification, which also happens to be executable as a test [2].  Obviously the value of this as a test is very high; we do not intend to trivialize this.  But we write them with a different mindset when we think of them as specifications and, as we'll see, this leads to many good things.

Looking at both code examples above, however, some of you may be thinking, "What is this GetInstance() stuff?  I would do this:"

        // Setup
        var mathUtils = new MathUtils();

Perhaps.  We have reasons for preferring our version, which we'll set aside for its own discussion.
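None of the samples show `MathUtils` itself, so for them to compile something like the following is assumed. It is one plausible shape only: the lazily created shared instance and the private constructor are our assumptions, not the authors' design, whose reasons they deliberately defer.

```csharp
using System;

// Hypothetical production class assumed by the tests in this entry.
public class MathUtils
{
    // Assumption: a single shared instance, created on first use.
    private static readonly Lazy<MathUtils> instance =
        new Lazy<MathUtils>(() => new MathUtils());

    // Assumption: a private constructor, so callers must use GetInstance().
    private MathUtils() { }

    public static MathUtils GetInstance()
    {
        return instance.Value;
    }

    // Returns the arithmetic mean of two real values.
    public double ArithmeticMean(double firstValue, double secondValue)
    {
        return (firstValue + secondValue) / 2;
    }
}
```

With a private constructor, the `new MathUtils()` alternative shown above would not compile, which is one way a `GetInstance()` method keeps creation under the class's control.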

But the interesting question is: what if you started creating the object one way (using “new”) and later changed your mind and used a static GetInstance() method, or maybe even some factory pattern?  If, when that change was made, you had many test methods on this class doing it the "old" way, the same change would be required in all of them.

We can do it this way instead:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator =
                           GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = arithmeticMeanCalculator.
                         ArithmeticMean(anyFirstValue,
                                        anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}



Now, no matter how many test methods on this test class needed to access this arithmetic mean behavior (for different scenarios), a change in terms of how you access the behavior would only involve the modification of the single "helper" method that is providing the object for all of them.

Many testing frameworks have their own mechanisms for eliminating redundant object creation, usually in the form of a Setup() or Initialize() method, and these can be used. But we prefer the helper method because it decouples the specification from the fact that the behavior we’re specifying happens to be implemented in a class called MathUtils.  We could change this design detail too, and the impact would be confined to the helper method.  (C#'s var keyword, which lets the test use the returned object without naming its type, is a real plus here; you might be somewhat more limited in other languages.)
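For comparison, MSTest's own mechanism for shared setup is a method marked `[TestInitialize]`, which the framework runs before each test method in the class. The sketch below writes the same specification that way; the class name `MathTestsWithInitialize` and field name `calculator` are ours, and the `MathUtils` stand-in is a hypothetical minimal version included only so the block compiles.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical minimal stand-in so this sketch compiles on its own.
public class MathUtils
{
    public static MathUtils GetInstance() => new MathUtils();

    public double ArithmeticMean(double first, double second) => (first + second) / 2;
}

[TestClass]
public class MathTestsWithInitialize
{
    // Shared by every test method; reassigned before each one runs.
    private MathUtils calculator;

    // MSTest calls this before every [TestMethod] in the class.
    [TestInitialize]
    public void Initialize()
    {
        calculator = MathUtils.GetInstance();
    }

    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = calculator.ArithmeticMean(anyFirstValue, anySecondValue);

        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }
}
```

Either approach keeps object creation in one place; the post's preference for the helper method rests on the decoupling argument above rather than on removing duplication alone.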

But the spec is also not about the particular method you call to get the mean, just how the calculation works, behaviorally.  Certainly an ArithmeticMean() method is logical, but what if we decided to make it more flexible, allowing any number of parameters rather than just two?  The meaning of "arithmetic mean" would not change, but our spec would have to.  Which seems wrong.  So, we could take the idea a little bit farther:

[TestClass]
public class MathTests
{
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculator(
                         arithmeticMeanCalculator, 
                         anyFirstValue, anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private double TriggerArithmeticMeanCalculator(MathUtils mathUtils, 
                                                  double anyFirstValue, 
                                                  double anySecondValue)
    {
        return mathUtils.ArithmeticMean(anyFirstValue,
            anySecondValue);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}


Now if we change the ArithmeticMean() method to take a container rather than discrete parameters, or whatever, we change only this private helper method, not all the various specification-tests that show the behavior with more parameters, and so on.
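To illustrate, suppose (hypothetically) that `ArithmeticMean` were redesigned to take an `IEnumerable<double>` and the two-argument version removed. In the sketch below, the specification method is unchanged from the version above; only the body of the Trigger helper differs. The redesigned `MathUtils` is an assumed stand-in. A `params double[]` parameter would have kept the old two-argument call compiling, so the sketch deliberately uses a container parameter, as the text describes, so that the binding really has to change.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Hypothetical redesigned production class: one method taking a container.
public class MathUtils
{
    public static MathUtils GetInstance() => new MathUtils();

    public double ArithmeticMean(IEnumerable<double> values) => values.Average();
}

[TestClass]
public class MathTests
{
    // The specification: identical to the version before the redesign.
    [TestMethod]
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;
        var expectedMean = (anyFirstValue + anySecondValue) / 2;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculator(
                         arithmeticMeanCalculator,
                         anyFirstValue, anySecondValue);
        // Verify
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    // The binding: the only code that changed, now wrapping the values in an array.
    private double TriggerArithmeticMeanCalculator(MathUtils mathUtils,
                                                   double anyFirstValue,
                                                   double anySecondValue)
    {
        return mathUtils.ArithmeticMean(new[] { anyFirstValue, anySecondValue });
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}
```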

The idea here is to separate the meaning of the specification from the way the production code is designed.  We talk about the specification being one thing, and the "binding" being another.  The specification should change only if the behavior changes.  The binding (these private helpers) should only change if the design of the system changes.

Another benefit here is clarity and readability.  Let's improve it a bit more:

[TestClass]
public class MathTests
{
    [TestMethod] 
    public void TestArithmeticMeanOfTwoValues()
    {
        // Setup
        var anyFirstValue = 7.0;
        var anySecondValue = 4.0;
        var tolerance = .1;

        // Trigger
        var actualMean = TriggerArithmeticMeanCalculation(
                                             anyFirstValue,
                                             anySecondValue);

        // Verify
        var expectedMean = (anyFirstValue + anySecondValue) / 2;
        Assert.AreEqual(expectedMean, actualMean, tolerance);
    }

    private double TriggerArithmeticMeanCalculation(
                                double anyFirstValue, 
                                double anySecondValue)
    {
        var arithmeticMeanCalculator = GetArithmeticMeanCalculator();
        return arithmeticMeanCalculator.
                                ArithmeticMean(anyFirstValue, 
                                anySecondValue);
    }

    private MathUtils GetArithmeticMeanCalculator()
    {
        return MathUtils.GetInstance();
    }
}

We have moved the call to GetArithmeticMeanCalculator() into the Trigger, and the calculation of expectedMean into the Verify [3].  We also changed the notion of "trigger the calculator" to "trigger the calculation." Now, remember the original specification?

Given:
     Two real values R1 and R2
     Required accuracy A is 0.1
When:
     The arithmetic mean of R1 and R2 is requested
Then:
     The return is (R1+R2)/2, accurate to A

The unit test, which is our specification, very closely mirrors this Given-When-Then expression of the behavior. Do we really need the comments to make that clear?  Probably not.  We’ve created a unit test that is a true specification of the behavior without coupling it to the specifics of how the behavior is expressed by the system.

Can we take this even further?  Of course... but that's for another entry. :)

[1] It should be acknowledged that Max prefers to say "it is a test which also serves as a specification."  We'll probably beat him into submission :), but for the time being that's how he likes to think of it.  We welcome discussion, as always.

[2] Better Software Magazine, March 2006.

[3] It should also be acknowledged that we're currently discussing the relative merits of using Setup/Trigger/Verify in TDD rather than just sticking with Given/When/Then throughout. See Grzegorz Gałęzowski's very interesting comment below on this (and other things). 

2 comments:

  1. Hi, thanks for a great post! A few comments from me (I'm not trying to prove you wrong, just sharing my preferences. Hope you find them constructive):

    1. Do you find it valuable to introduce another notation (I mean Setup/Trigger/Verify) for unit tests besides Given/When/Then? I just use the latter everywhere, even though I know there's already an alternative in the form of "arrange, act, assert". I prefer given/when/then because I find they make the specification a bit more declarative, plus I have a single terminology.
    2. You say: "To us, TDD is BDD, except that we conduct the activity at a level very close to the code, and automation is much more critical". In reality, BDD touches the unit level as well, at least as I see it. From what I remember, Dan North came up with the convention of naming test classes "XYZBehaviors" and methods starting with "should" - this was really invented for the lower-level tests. As I understand it, BDD is applicable to all levels (even above the acceptance tests level), but, while conceptually the same, is technically translated to something different at each level.

    3. I remember Liz Keogh's post about BDD language: http://lizkeogh.com/2012/05/30/showcasing-the-language-of-bdd/

    There, she touched on the subject of unit level behavior specification with the words: "Instead of writing a test, you’re going to write an example of how you can use your class (and you can’t use it except through public methods). You’re going to show why your class is valuable to other classes."

    I sympathize with this approach. The difference I see between your attempt and Liz's words (this is just my interpretation) is that you seem to want to specify business value in unit tests, while Liz says that the value is in how the class is useful to other classes. This is why I tend not to use helper methods too much - my mindset is that I should be able to show how a class is valuable by using this class directly in a test. Of course, on higher levels than unit level, I specify different things, so I want to show something different - then I use a lot of helper methods to convey the meaning.

    4. Also, I found that using helper methods takes away a bit of the "diagnostic pain" I remember you talking about. What I mean is that when I don't use helper methods to convey the meaning of what I do, I tend to pay more attention to shaping the API of the class to convey this meaning. So, taking the example of math utils, if I were tempted to use a method like "TriggerAveragingBehavior();", it would lead me to redistribute the responsibilities and convert my production code to something like "averagingBehavior.TriggerFor(anyFirstDouble, anySecondDouble)". The result may look laughable for this toy example, but you have to give me credit for doing away with the "utils" class :-P, which I usually treat as a smell. Also, I would tend not to hide the arguments when triggering the average, as I find it valuable to know what I am applying the operation to, even on a conceptual level. Even if I were to write this scenario in Gherkin, with steps like "Given first number is 7.0 And second number is 4.0", I would still write at least "When I calculate average for ___those___ numbers". I would never write "When I calculate average"; even when reading Gherkin, I find such scenarios a bit hard to follow. Again, this is a toy example, but in real-life cases I like to be explicit about what matters.


  2. I agree with what Grzegorz Gałęzowski says in his comment. Part of the reason for directly accessing the unit under test is to test its API. Let’s take your example. One starts with:

    mathUtils.ArithmeticMean(anyFirstValue,anySecondValue);

    and now you decide you need a method that takes any number of values.
    You can change the API in at least two ways:

    1.) Add an additional method that takes an array (or some collection)

    mathUtils.ArithmeticMean(double[] values);

    If you do not get rid of the current method, you do not need to change anything in your tests.
    Inside of mathUtils, the two-value version can call this version, so there is no redundancy in implementation.

    2.) Add this additional method and then take away the two-value one. That means that anyone calling the two-value version (particularly production code) has to change.
    The fact that you have to change tests indicates that you are changing an API which means that production code has to change. That suggests that maybe you want to keep the original method.

    If you did not keep that two-value method, then production code would probably create a convenience method to avoid redundant code. And where better to put that convenience method than in MathUtils?

    You suggest that:

    “Now if we change the ArithmeticMean() method to take a container rather than discrete parameters, or whatever, then we only change this private helper method and not all the various specification-tests that show the behavior with more parameters, etc...”

    I would suspect that you are over-testing ArithmeticMean() if it is getting called from lots of places. And even if these extra tests were not redundant, then those tests suggest that there will be lots of production code that has to change.

    There are tradeoffs in everything and how you see tradeoffs is based on your experiences. To me, adding another layer of abstraction to every class under test would have little benefit and much cost. Your experience may differ and I’d really like to see real life examples of where this has paid off.
