How I Learned to Stop Worrying and Love the Mock

I learned something very useful from Marty Nelson’s blog entry on the proficiency levels of TDD. It was the first time that I saw a construction which included mock objects as a good thing and that also passed my sniff test. So, I hereby recant: Mocks are no more Evil than are nuclear bombs. I’ve learned to love them, as the tools of mass destruction that they are.

Actually, I really liked Marty’s formation and information. And I liked James Shore’s presentation for proficiencies of Planning. So I’m going to steal from both of them, and change just enough that I can say it’s my own work.

The important thing that both of these authors point out is something that they stole from Where Are Your Keys (a game for learning to speak languages). Achieve, and celebrate, fluency at each level of proficiency.

Someone who speaks French like Tarzan, but is able to ask simple questions and understand responses, has achieved a hugely important level of proficiency. If he can do it while late for a train and really having to pee, he’s got an extremely valuable proficiency. That person is much more likely to enjoy his trip to Paris.

Sure, we’d all like to be able to discuss morality with Sartre. And it’s good to celebrate fluent discourse in epistemology, aesthetics, and other high-falutin’ topics. We should celebrate it to the same degree as we do the person who can ask for a restroom rather than just using the nearest stairwell.

One more aside, then I promise I’ll get back to the joys of a good nuclear mocking.

For those of you who have not played WAYK, it consists of a pair of simple games with a large number of simple techniques. These techniques apply at the meta-level of the game. They help organize the thinking around learning a language. They provide specific actions, or specific patterns of behavior, that a player can do to improve their rate of achieving fluency. There are lots of good ideas here, with evocative names such as Lunatic Fringe, Angel on the Shoulder, Full, Technique Technique, Obviously!, and Craig’s List.

The one that both of these authors used is Travels With Charlie. This technique’s purpose is to organize levels of proficiency at a skill that you’re trying to learn. This lets you notice, and celebrate, success along the way. It lets you identify what you need to learn next, and what to fall back on when you’re having difficulty. And, it distinguishes proficiency (what you can do) from fluency (how automatically you can do it).

Here, then, is my view of the road to TDD mastery.

Level 1: Zog Run Test

At this level, the coder knows what an in-language automated test is. He is able to use a test framework, such as JUnit, to write tests over parts of his code. His tests may not run everywhere, they may not run quickly, and they may have all sorts of other problems. However, the developer is writing tests, and he’s not using a UI automation tool to do so. Some of his tests will survive a UI change.

Focus: “we write automated tests.”

Skills:

  • Write automated tests using a developer-oriented test framework.
  • Tests don’t go through the UI controls, so are unaffected when that changes. However, they may not be far removed from the UI.
  • Can test easy pieces of code: procedures with few dependencies.
  • Can write a good state-based test (verify state before and after calling code under test).
  • Running tests is easy; the developers can use a single command to run all the tests in their system.
  • Developers run tests frequently.
  • Developers put tests around tricky areas of code, so that they get feedback before QA can do a test run.
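A state-based test of the kind listed above might look like the following minimal sketch. It uses Python's unittest as a stand-in for JUnit; the ShoppingCart class and its methods are hypothetical, invented only for illustration.

```python
import unittest


class ShoppingCart:
    """Hypothetical code under test: a class with no outside dependencies."""

    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


class ShoppingCartTest(unittest.TestCase):
    def test_adding_an_item_changes_the_total(self):
        cart = ShoppingCart()
        self.assertEqual(cart.total(), 0)   # state before

        cart.add("book", 12)                # call the code under test

        self.assertEqual(cart.total(), 12)  # state after
```

Saved as a test module, this runs with a single command such as `python -m unittest`, which is the "easy to run" property the list asks for.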

Signs and portents:

  • Many areas are still too hard to test, due to dependencies. These are typically tested indirectly, from a higher level.
  • Most erroneous changes cause a test failure. However, many of them cause multiple tests to fail.
  • The suite of tests is usually still pretty quick, because the tests don’t hit the things with external dependencies.
  • The tests aren’t that useful as documentation.
  • Tests use Assert.IsTrue().

A development team that is fluent at a given level exhibits the signs and portents for that level. It is also able to apply the skills consistently and automatically—the developers need not pay attention to the skills, but use them when focusing on their work.

Level 2: Coverage is My Sun, My Moon, and All My Stars

The dev team is concerned with coverage. They’ve seen some benefits from their testing, and now want to get everything under test. They’re starting to see the benefits of tests in supporting refactoring. They may start using red-green-refactor; they’re almost certainly at least doing red-green (failing test before code). TDD stands for test-driven development.

Focus: “we test our code.”

Skills:

  • Can test any part of the system, though often with a multi-unit test.
  • Can plug in tests at many levels of the code; each test tests from there “down.”
  • Developers see themselves as partly responsible for quality. They don’t just code stuff and then toss it over the wall to QA.
  • Can test both behavior and state, but doesn’t distinguish between them.
  • Can use tests to localize defects.
  • Can use tests to replace most uses of the debugger.

Signs and portents:

  • The team attempts to get everything under test—even if they don’t see a good way to do so.
  • Most erroneous changes cause a test failure. However, many of them cause multiple tests to fail.
  • Tests take a long time to run.
  • Tests use Assert.AreEqual().
  • Tests depend on environment. They might break when run on “someone else’s box.”
  • Tests sometimes depend on each other. Order of execution matters.
  • Tests are often long sequences of operations interleaved with assertions.
  • Tests have a lot of duplication.
  • Tests commonly use SetUp and TearDown.
  • Many-to-many relationship between tests & classes under test.
  • Uses code coverage to find places where tests are missing.

Level 3: An Earth-Shattering Ka-Boom!

A team fluent at this level suddenly starts writing decomposable code. They are able to hear the needs of the tests, and use them to reduce dependencies between code elements. This decreases coupling and increases (class-level-) cohesion. They start thinking of TDD as a design activity, and refactor both the code and the tests. They usually have a fair amount of legacy test built up from earlier levels, which they now refactor. TDD influences only the surface design; it doesn’t result in fundamental changes yet.

Focus: “our tests and code work together to make each other maintainable.”

Skills:

  • Can use test doubles to break dependencies when testing.
  • Can write a unit test for any part of the system, though often by artificially breaking dependencies.
  • Can use dependency injection to reduce test setup complexity.
  • Can see the value in reducing coupling & dependencies.
  • Can get legacy systems under test, safely.
  • Use of stubs, DI, or similar techniques to allow testing without a database, filesystem, or other external resource.
  • Learns from TDD how to write better tests.
  • Test code is kept as healthy as regular code. Little duplication in either.
  • Distinguish between state tests and behavior tests.
  • Distinguish between unit and multi-unit tests, and prefer the former.
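As a rough sketch of the stub and dependency-injection skills above: the code under test receives its collaborator through the constructor, so a test can hand it a canned stand-in instead of a real external resource. PriceConverter, FixedRateSource, and rate_for are hypothetical names, assumed here purely for illustration.

```python
import unittest


class FixedRateSource:
    """Stub: returns a canned rate instead of calling a real rate service."""

    def __init__(self, rate):
        self._rate = rate

    def rate_for(self, currency):
        return self._rate


class PriceConverter:
    """Hypothetical code under test; the rate source is injected."""

    def __init__(self, rate_source):
        self._rate_source = rate_source

    def convert(self, amount, currency):
        return amount * self._rate_source.rate_for(currency)


class PriceConverterTest(unittest.TestCase):
    def test_converts_using_the_injected_rate(self):
        converter = PriceConverter(FixedRateSource(2))

        self.assertEqual(converter.convert(10, "EUR"), 20)
```

No network, database, or filesystem is touched, so the test stays fast and runs the same on anyone's box.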

Signs and portents:

  • Testability is a reason to refactor code.
  • 1:1 relationship between test classes and code classes.
  • Most bugs cause exactly one test to fail.
  • Build time speeds up again, as slower tests have dependencies replaced with mocks.
  • Uses code coverage to find hot spots—places where too many tests cover the same code.
  • Tests are either 3-line simple state tests, or longer mock-based tests that end with a mock.VerifyAll().
  • Use of most of the equality-style assertions (Assert.Contains, Assert.Equivalent, etc), as well as of the mock verify assertions.
  • Strongly OO design.
  • Tons of interfaces; a fair bit of inheritance.
  • SetUp and TearDown are commonly used to set up infrastructure (such as test doubles), but no longer to initialize the system under test.
  • Often writes their own test framework, mocking framework, DI tool, or some other “better way to do unit testing.”
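The "longer mock-based test" shape might be sketched as follows. This uses Python's unittest.mock, where assert_called_once_with plays roughly the role that mock.VerifyAll() plays in the frameworks named above. OrderProcessor and its mailer collaborator are hypothetical.

```python
import unittest
from unittest.mock import Mock


class OrderProcessor:
    """Hypothetical code under test; the mailer is an injected collaborator."""

    def __init__(self, mailer):
        self._mailer = mailer

    def complete(self, order_id, email):
        self._mailer.send(email, f"Order {order_id} is complete")


class OrderProcessorTest(unittest.TestCase):
    def test_completing_an_order_notifies_the_customer(self):
        mailer = Mock()
        processor = OrderProcessor(mailer)

        processor.complete(42, "pat@example.com")

        # Behavior test: verify the interaction, not any resulting state.
        mailer.send.assert_called_once_with(
            "pat@example.com", "Order 42 is complete"
        )
```

The test ends by verifying a call rather than inspecting state, which is what separates it from the 3-line state tests.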

Level 4: Knock on the Sky and Listen to the Sound!

A team fluent at this level is finally able to listen to what the tests are telling them about the design. Where above they could hear the tests talking, now they can hear what the tests are saying. Thus, the code and tests express multiple kinds of decomposition in multiple ways. They have method, class, and module cohesion. The degree and type of coupling matches the cohesion in the underlying domain. The distinction between test code and production code blurs; a fair amount of code is actually both. Testing now actually is a design activity; TDD stands for test-driven design.

Focus: “our testing teaches us about the domain and about the health of our design.”

Skills:

  • Can write a true-unit test for any part of the system; testing produces fundamental changes in the code under test.
  • Name stuff well.
  • Use tests to express intent & to design how a thing should be used.
  • Listen to the testing process to receive design feedback. When received, refactor appropriately—which sometimes means not refactoring.
  • Use techniques from both OO and Functional paradigms to enable decoupling without violating encapsulation.
  • Use many different ways to attach program parts together, depending on the desired coupling level.

Signs and portents:

  • Can safely check in after every red-green-refactor cycle, every minute or two.
  • 1:1 relationship between test classes and system responsibilities. When this isn’t a 1:1 mapping between test class and system class, treat that as design feedback (violation of SRP).
  • Sees the need for a mock as design feedback (system has too much coupling).
  • Sees difficulty in naming a test class as design feedback (lack of cohesion of responsibility).
  • Sees the need to do behavior testing as design feedback (typically a method that combines query with update).
  • Uses simulators, anti-corruption layers, bounded contexts, and other mechanisms to remove cross-boundary dependencies. Tests provide design feedback to identify when those are needed.
  • No remaining use for DI or mocking, so they tend not to get used.
  • Testing updaters becomes a combination of testing the simple updaters at the end of binding sequences, testing the state of bindings, and testing that the code results in the right events firing (or continuations, or whatever).
  • Tests use custom assertions & conditions. Often those condition objects are used by both test and production code.
  • Very little inheritance and few interfaces. Lots of composition. Often a fair amount of function dispatch (events, higher-order functions, fields with a function type, etc).
  • SetUp and TearDown are rarely used. The team eliminates duplicative initializations by eliminating initialization.
  • Throws away their custom “better way to do testing” tool, and usually most of their 3rd-party testing support libraries as well. The only tools they use regularly are a library of assertions, a way to indicate that something is a test, and two runners (command-line in build and in-IDE for dev).
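One possible reading of "the need to do behavior testing as design feedback" is sketched below. It is an illustration under stated assumptions, not a reconstruction of any particular code base. ReorderService, its inventory and purchasing collaborators, and units_to_order are all hypothetical names. The first class mixes a query with an update, so testing it in isolation tends to need doubles for both collaborators. Pulling the decision out into a function of plain values leaves something that simple state tests can cover.

```python
import unittest


# Before: one method queries a collaborator, decides, and updates another.
class ReorderService:
    def __init__(self, inventory, purchasing):
        self._inventory = inventory
        self._purchasing = purchasing

    def restock(self, sku):
        on_hand = self._inventory.count(sku)           # query
        if on_hand < 10:
            self._purchasing.order(sku, 10 - on_hand)  # update


# After: the decision is a pure function of plain values.
def units_to_order(on_hand, target=10):
    return max(target - on_hand, 0)


class UnitsToOrderTest(unittest.TestCase):
    def test_orders_the_shortfall(self):
        self.assertEqual(units_to_order(on_hand=4), 6)

    def test_orders_nothing_when_fully_stocked(self):
        self.assertEqual(units_to_order(on_hand=12), 0)
```

What remains of the updater is a thin piece of wiring that passes the count in and the order quantity out.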

So, in the end, there is a good use for mocks (& DI). The value in the third level is learning to write cohesive, loosely-coupled code. In order to use mocks, you have to achieve at least a minimal amount of cohesion (so that you have something to mock) and decoupling (so that you can put the mock in).

A lightweight mocking framework—one which is unable to mock out static methods and other tightly coupled constructs—can help a team learn to write decoupled code. They are not the only way to gain proficiency at level 3, but they are a way that provides direct feedback and guidance during the learning process.

Also, mocks allow a team to start learning to write maintainable tests without having to learn good design yet. It’s a lot easier to learn these separately.

Mocks are weapons of mass destruction. They can be usefully applied to a code base in order to explode it into lots of testable bits. We should celebrate fluency at this level of proficiency. A team which is very comfortable with mocks can blow any system into little pieces extremely quickly. The functionality of each fragment is much…simpler, and thus a lot easier to test.

The team should be valued for their capabilities. They’ll get a lot done. They’ll be able to build things out of the rubble, then shift them to another purpose fairly quickly. They’ll certainly move faster than a team that is trapped in a vine-infested legacy jungle.

Mocks aren’t Evil. They’re one of the most effective ways to turn a jungle of intertwined vines and obstructions into easy-to-test green glass.

So I invite you all to celebrate the mock. Wave your cowboy hat as you ride it all the way down. And then take what you’ve learned and find a way to accomplish your ends with a little less collateral damage.

20 thoughts on “How I Learned to Stop Worrying and Love the Mock”

  1. Hey Arlo,

    I enjoyed how you tied it all up at the end with the 'Mocks as a useful weapon of mass destruction' bit.

    Maybe we'll get to have some beers again next year @ Rock Bottom for ALT .NET Seattle!

    Cheers,

    Mario

    1. It has been rumored that I am always up for beer.

      Always.

      And there are many nights in Seattle beyond the Alt.Net ones.

  2. The leap from level 3 to 4 seems to me much bigger leap than from 1 to 2 or 2 to 3. You're essentially declaring that (1) all dependency graphs should be dynamic and (2) a dependency graph should be inspectable at runtime. It's the latter that lets you forget about mocks. Instead of inferring a dependency graph by a test that says (for example):

    should "try calculate output value from input" do
      during {
        @button.click!
      }.behold! {
        @slider.receives(:intValue) { 5 }
        @textfield.receives(:stringValue) { "some string" }
        @output_field.receives(:setStringValue, "Your result is…")
      }
    end

    (If that doesn't format right, see https://gist.github.com/1001003 )

    … you check the dependency graph directly. If you were programming in Clojure (as in my comment to https://arlobelshee.com/essay/decoupled-design ), you could make the graph explicit enough that checking its construction would seem silly.

    Perhaps I'm missing the point and need more and larger examples.

    1. I may well be missing some levels. I certainly don't feel that I've found the final set. I figure that this is one area that will be re-examined in my stages of practice workshop at the conference this year…and often between now and then as I try to figure out the starting point for that discussion.

      As for the step between 3 & 4 being larger, how do you mean?

      If you mean more of an assumption upon my part, then I agree. Level 4 is closer to my preferred style. Currently, there are a tremendous number of people out there making the 2 to 3 transition or feeling comfortable in 3 (and trying to build ever more powerful mocking libraries). I personally find more effectiveness in "listening to the tests and changing the design," than in "listening to the tests and changing the tests but leaving the SUT unaltered." Thus, I prefer to not use mocks, and I see it as a later stage.

      Others could well disagree. If so, we can discuss.

      OTOH, if you mean that the step between 3 & 4 is greater for the practitioner, then I have to disagree. It is simply a step of a different nature. In the earlier steps, the practitioner is learning new skills in an existing language. Those skills are applied without regard to context. To gain rough proficiency in step 4, they "merely" have to learn to listen to the context and learn a new thing for every context. Fluency, at this level, is mostly about how easily they can invent a new solution to a new problem.

      I feel steps 1-3 are mastered with practice. You learn each skill, and then are fluent. Step 4, however, is an hour to learn and a lifetime to master.

    1. I wish.

      Unfortunately, even I don't write code that would be ideal under my rubric. I'll always see one more thing that is clunky to test, or where the testability screws up an otherwise simple design. However, each project I manage to find solutions for a bunch of problems that I didn't solve in the last one.

      Also, I don't get a chance to write a bunch of OSS. Which is to say I've started several projects, but the only ones that have survived to get very far along are the ones where I just hack away without doing any TDD at all. Aka, those where I achieve a large amount of business value in the first 8 hours' work, then never touch it again. The others achieve little in the first 8 hours, whereupon I never touch them again.

      Hopefully Minions, which has gotten some 20 hrs work at this point, will be the first exception. We'll see on that.

      1. It'll be interesting to see when it turns up. I agree with most of your points but don't understand the extent of your anti-mock position. And, until I see the goods, I can't see what I might have been doing wrong. In particular, I don't understand why "behaviour" testing implies "a method that combines query with update".

        1. First, this post is a partial retraction of my previous position on mocks. My historical position has been intentionally, and provocatively, extreme (Mocks are Evil). I am still against them for me, because they don't work out as well for me as do other options. But I'm not against them for you.

          What I've found, for me, is that if I let myself test by using doubles, and use a mock framework to make use of those doubles simpler, then I frequently test with doubles rather than refactoring to code such that a double is not needed. I have seen this behavior in others as well, though I have by no means seen everyone, and do believe that there are some for whom it is not true.

          To be clear, doubles are still an option in my toolbag, and I still use them. They're just well down the priority tree. They hang out with things like code generation that are used to work around flawed designs. They let me make progress when I know that a design is flawed, but don't see a way to fix it. By telling myself that these things are Evil (and agreeing on that with the team), we keep seeing the smell and trying to do something about it. Often, this eventually leads to either a refactor to a deeper design (typically a cohesion fix) or learning a new technique for decomposition.

          Mocks, per se (the kind that verify that certain methods were called), I think may well be totally Evil. Doubles are in my toolbox; mocks are not.

          1. Well, provocative statements are easy. As one might expect, I'm still struggling to understand just what you actually do in practice.

            Perhaps you could write up one example of a smell and its refactoring. The example you've posted has interesting ideas but then it has a null model with no dependencies.

          2. I hear you, and I agree. I am currently putting together an example, gleaned from one of my past hobby projects. It looks like it'll have to end up as a series of blog posts. I'm using a full (though small) project as my example, so there are lots of useful bits and lots of messy bits.

          3. I have a question about Mocks (mocks specifically, as opposed to other doubles).

            Suppose I have a Button object, and I want to be able to subscribe to events from it. Specifically, if I call button.addListener(me), then I want it to call my click() method whenever it's clicked.

            I'd use a mock in this case. Indeed, I think of it as pretty much the pinnacle of justifiable mocking, so it seems like a useful example to ask about.

            What's evil here: Is it the mocking, meaning that the code would be better tested some other way? Is it the underlying design? Is this just collateral damage–fine on its own, but not worth the risk of encouraging the mocking habit? Or is the answer more complicated than this?

          4. Well, if you let me use that answer, then of course "the answer is more complicated than this."
            I can think of a design which I'd prefer: I think most UI elements should send events to functions, or better yet, to data pipelines. Command pattern is a good option, especially with databinding, as used on MVVM. This lets me fully test the view model without ever having a UI, and then simply test that the databinding declarative code is correct (without invoking anything).
            That said, most UI toolkits are frameworks, not libraries. So if they weren't written to be easily used with view models, databinding, and command, then it's a bear to add that. So then I might have to use a mock – at least a hand-rolled test double – to verify that the expected thing happens on button click (and especially that the right thing happens when code updates the UI).
            Also, I do subscribe to the collateral damage argument. There are places where mocks are useful. There are times where they are useful in all places. However, becoming comfortable with them tends, I find, to reduce the number of times when I (or people I know) look for a better design.

        2. As regards the second point: behavior testing doesn't necessarily imply a method that combines query with update. However, a method that combines query with update is very difficult (not impossible) to test with anything but behavior testing.

          There are several other uses for behavior testing, but I find them similarly to be design problems (method on wrong class, split responsibility, etc). So, I find behavior testing to be a strong indication that something is wrong in the design. I don't know what it is, but once I see where it is, I can usually see what it is very quickly.

          And, I think the most common in the code bases I've seen is a method that combines query with update (usually several of each, in an alternating sequence, with some conditional logic thrown in for joy).

          Once I fix these design problems, I can usually test the resulting system with state tests (usually here is something like 98%). When I can't, then I move on, but I leave it in the back of my brain. I assume that there is a design flaw in there that I'm just not a skilled enough designer to see, yet. And often I come back to it a year or so later and do see the design flaw.

      2. The thing about WAYK and Travels with Charlie is that you need that fluent fool who can model that next level for you. And, although I have never seen this Charlie Rose, I am told he's a non-fictional guy who has documented conversations with people, (though Sartre may not be one of them).

        But here in the realm of TDD mastery, without a fluent speaker to run the game at "Knock on the Sky" level, we're left wondering if what you've written actually describes a thing that exists in this world, or if it's one of B. S. Johnson's designs that doesn't work outside of Ankh-Morpork.

  3. Great piece!

    What role do you think mocking frameworks play as well? Do they help or hurt people who either want to get started mocking or want to mock the unmockable?

    1. Mocking frameworks certainly make it easier to get started with mocking. I agree with the wisdom of the clouds here.

      However, I disagree with the general sentiment that mocking is a good thing forever. And my impression (based on too little data) is that mocking frameworks make it more difficult to see the non-mocking design improvement options. That would / does make it more difficult to go beyond using mocks to force splits to the point of actually writing loosely-coupled, highly-cohesive code.

      I would be all for someone finding another technique to teach "people who write procedural code and call it OO" how to actually create a cohesive class with a small number of responsibilities that delegates to other classes rather than violating their encapsulation. Currently, it seems that mocking is one of the best ways to teach people this critical OO design skill. It allows people who lack the skill to fake it 'till they learn it.

      Of course, a more powerful mocking framework allows more faking, which can inhibit learning. For example, some frameworks allow you to mock out calls to static functions, typically by selective re-writing of VM bytecode. This reinforces the bad habit of writing OO systems where 90% of the methods are static procedures – procedural code with OO trappings.

      In my impression, a weak mock framework is optimal for the development of good OO design skill (and is better than no mocking framework at all). A powerful mock framework is optimal for getting highly-indebted, highly-coupled code under test, but inhibits learning. Use the right one for the context.

      And both are tools to be used for a time, but to constantly strive to get rid of. As you gain skill and the code gains suppleness, you will use them less. Eventually you can do away with the powerful mock framework (the code is supple), but you may still use the weak one for quite a long time (you have more to learn).

      1. For the record, I absolutely agree that mock libraries should not be too powerful. I've just been reviewing a codebase where use of one of the more adventurous frameworks has done a tremendous amount of damage.

        Now, all I need to understand is the next step in the logic.

  4. It is one of B.S. Johnson's ideas. Taken to an absolute extreme, it violates the laws of physics. However, applied just a little this side of absolutely bonkers, it's crazy effective (at least, it is for me).

    I agree, it does help a lot to have a fluent master TDD speaker. And I hope to provide many examples through this blog (over time). I've been thinking for a while that I'll likely end up writing an "alternatives to mocks" series. I don't have examples on hand, but I can generate them on demand (I do at work most days). So I'll get off my arse and do that publicly at some point.

    However, you can also bootstrap yourself to that level of proficiency. You won't be fluent at it for a long time, but you can get there.

    This is why I stole the title for the 4th level from a Zen koan. I feel level 4 to be much like Zen. You can get there yourself. In fact, you are already there, but for making a complete change of the way you think and then taking a set of actions almost identical to what you did when fluent only at level 2.

    Unfortunately, I am insufficiently a Zen master to phrase it as a koan, or even as a question. I can phrase a statement, which, for me, blocks out the modes of thought that gave me level 3 proficiency (mocks are a strategic nuclear bomb. Use accordingly).

    That said, my attempt at the question would be: "test and code under test are one and the same. How do I test in isolation code which blends into the code around it? How does each piece of my code perceive itself together and apart from other code at once?," or "How can I see in isolation the beauty of the whole?" Perhaps that's the koan form too, or perhaps it's a modification of the Zen koan I chose (very non-randomly): (My tests) knock on the sky. Listen to the sound!

  5. In reply to the comments asking for examples, I am starting a new blog mini-series. I'm posting a partially-complete project, and showing how some of these ideas played out in that project. See the Mock Free Example blog series.
