
A coding dojo story

It was 2008, and I was at the CITCON conference in Amsterdam. I’d only started going to conferences that year, and was feeling as intimidated as I was inspired by the depth of experience in the people I was meeting. It seemed like everyone at CITCON had written a book, their own mocking framework, or both.

I found myself in a session on refactoring legacy code. The session used a format that was new to me, and to most of the people in the room: a coding dojo.

Our objective, I think, was to take some very ugly, coupled code, add tests to it, and then refactor it into a better design. We had a room full of experts in TDD, refactoring, and code design. What could possibly go wrong?

One thing I learned in that session is the importance of the “no heckling on red” rule. I watched as Experienced Agile Consultant after Experienced Agile Consultant cracked under the pressure of criticism from the baying crowd of Other Experienced Agile Consultants. With so many egos in the room, everyone had an opinion about the right way to approach the problem, and nobody was shy of sharing his opinion. It was chaos!

We got almost nowhere. As each pair switched, the code lurched back and forth between different ideas for the direction it should take. When my turn came around, I tried to shut out the noise from the room, control my quivering fingers, and focus on what my pair was saying. We worked in small steps, inching towards a goal that was being ridiculed by the crowd as we worked.

The experience taught me how much a coding dojo is about collaboration. The rules about when to critique code and when to stay quiet help to keep a coding dojo fun and satisfying, but they also teach you bigger lessons about working with each other day to day.

Agile / Lean Software Development

Comments (0)

Permalink

Optimising a slow build? You’re solving the wrong problem

At the time I left Songkick, it took 1.5 hours to run all the cukes and rspec ‘unit’ tests on the big ball of Rails. We were already parallelising over a few in-house VMs at the time to make this manageable, but it still took 20 minutes or so to get feedback. After I left, the team worked around this by getting more slave nodes from EC2, and the build time went down to under 10 minutes.

Then guess what happened?

They added more features to the product, more tests for those features, and the build time went up again. So they added more test slave nodes. In the end, I think the total build time was something like 15 hours. 15 fucking hours! You’re hardly going to run all of that on your laptop before you check in.

The moral of this story: if you optimise your build, all you’ll do is mask the problem. You haven’t changed the trajectory of your project, you’ve just deferred the inevitable.

The way Songkick solved this took real courage. First, they started with heart-to-heart conversations with their stakeholders about removing rarely-used features from the product. Those features were baggage, and once the product team saw what it was costing them to carry that baggage, they were persuaded to remove them.

Then, with a slimmed-down feature set, they set about carving up their architecture, so that many of those slow end-to-end Cucumber scenarios became fast unit tests for simple, decoupled web service components. Now it takes them 15 seconds to run the tests on the main Rails app. That’s more like it!

So by all means, use tricks to optimise and speed up the feedback you get from your test suite. In the short term, it will definitely help. But realise that the real problem is your architecture: if your tests take too long, the code you’re testing has too many responsibilities. The sooner you start tackling this problem head-on, the sooner you can start enjoying the benefits.

Agile / Lean Software Development

Comments (7)

Permalink

TDD vs BDD

I regularly find myself explaining to people the difference between TDD (Test-Driven Development) and BDD (Behaviour-Driven Development). There still seems to be a lot of confusion over this, so I wanted to write this up for reference.

Late last year I was interviewed for a virtual panel on InfoQ along with Dan, Gojko, and Liz. Probably the most interesting part of that conversation covered the difference between TDD and BDD. Or rather the lack of any great difference.

We’ll start with some snippets from that discussion.

Both TDD and BDD include acceptance testing

One common misconception is that TDD is what you do when you’re unit-testing, and BDD is what you do when you’re writing customer-facing acceptance tests. Here’s Dan North on that point:

TDD – as originally described – is also about the behaviour of entire systems. Kent [Beck] specifically describes it as operating on multiple levels of abstraction, not just “down in the code”. BDD is equally important in this space, because describing the behaviour of systems is fractal: you can describe different granularities of behaviour from the entire application right down to individual small components, classes or functions.

Extreme Programming has always talked about writing acceptance tests, sometimes also called functional tests, to describe what the customer expects to be done at the end of an iteration.

So this is nothing new. What’s new is how we explain it, and therefore how successful teams end up being in making it work for them.

BDD describes TDD done well

When Dan was working as a coach teaching TDD, he found that it was easier to get people to understand the principles of TDD if he stopped using the word ‘test’:

My experiences as a coach told me people were missing the point, with all this talk of unit tests, acceptance tests, functional tests, integration tests… Kent Beck’s style of TDD is a very smart way to develop software, so I tried removing the word “test” when I was coaching it, replacing it with things like behaviour, examples, scenarios etc. The result was very encouraging: People seemed to “get” TDD much quicker when I avoided referring to testing.

When Aslak and I wrote the Cucumber Book, I wrote this description of BDD:

BDD builds upon TDD by formalising the good habits of the best TDD practitioners.

That’s basically all there is to it. We want to re-explain TDD in a way that highlights the habits that successful TDD practitioners have been using for over a decade.

So what are those good habits?

Specifically, I think those good habits are:

  1. Working outside-in, starting from a business or organisational goal
  2. Using examples to clarify requirements
  3. Developing and using a ubiquitous language

Working outside-in seems obvious to habitual TDD practitioners, but many teams seem to limit themselves to doing this at the level of small units of code. Business-level black-box testing is still done manually, or automated as a check after the code has already been implemented.

This misses out on the major benefit of working outside-in, which is having the requirement challenged: if you need to explain to a computer how to check the requirement, you’ll need to be damn sure you understand it yourself. If you don’t (and you often don’t) it’s much cheaper to find that out before you write the code.

Examples have always been a great way to make sure you really understand a requirement. What BDD does is formalise this by encouraging you to use scenarios to describe behaviour. These examples provide the perfect bridge between the business-facing and technology-facing sides of a team: they’re just formal enough that you can get a computer to check them, but anyone on the team can read them and make sure they’re describing behaviour that they actually want.

The GOOS Book, written by two of the best TDD practitioners in the business, frequently highlights the importance of domain language in our programs. In software teams, communication is probably the biggest overhead you have, and you make that communication a lot harder when you allow different dialects of terminology to be used by different parts of the team. Developing and then sticking to a consistent language takes deliberate effort, but it’s something that the best TDD practitioners have long learned will give them a significant advantage.

My experience is that BDD’s emphasis on collaboration, and the use of business-readable, executable specifications, means that this shared language develops much more quickly. When everyone is involved in writing documentation that describes what the system should do, they all get a chance to learn the language of the domain together.

So BDD really isn’t all that different to TDD. What BDD adds is a clear emphasis on what it takes to make TDD succeed.

BDD

Comments (4)

Permalink

Skillsmatter BDD Exchange

Last week I travelled down to London to the BDD Exchange conference. It was a one-day conference organised by Gojko Adzic and I had a great time. I missed Gojko’s talk as I travelled down from my cave in Scotland on the day, but I did arrive in time to see Chris Matts’ excellent lecture on what business analysis really should be about.

I particularly enjoyed the talk from Christian Hassa about teams failing to make BDD work. We can learn the most from failure, and Christian’s thoughtful analysis of what he’s observed in the field as a consultant with TechTalk is useful to any team trying to get the most from these techniques. The message of Christian’s talk very much echoed my own: the tooling you use is entirely secondary to the collaborative relationship you need to build between the business-facing and technical-facing members of the team. I was interested to learn about SpecLog, the tool TechTalk are building to help teams with this problem, which seems to share many goals with my own Relish. It was nice of Christian to give Relish a name-check in his talk.

My session ran along the same theme as my talk from earlier in the year at Skillsmatter, describing the value of writing acceptance tests at the right level of abstraction, so that they describe business rules rather than implementation details. You can watch the session here.

Agile / Lean Software Development

Comments (2)

Permalink

BDD Training

Update: This training is now available as a public course, starting October 8th in London.

Would you like to learn how Behaviour-Driven Development can help your company get better at software development?

I’ve helped several teams learn BDD, and I’ve started to formalise the training I’ve been doing into a set of course modules. The modules aim to provide the foundations for a teamʼs successful adoption of BDD.

We start by immersing the whole team in BDD for a day to get everyone enthusiastic about the process. Then I take the programmers and testers and implement their very first scenario, end-to-end, on their own code. Now that we’ve proved it can be done, I work with project managers, product owners, and development leads to streamline their agile process to get the best from BDD. We practise collaborative scenario-writing sessions, and we learn how to use metrics to track progress and how Kanban and BDD can fit into your existing agile process.

Please take a look at the course prospectus and get in touch to see how I can help.

Agile / Lean Software Development
BDD

Comments (11)

Permalink

Fixing my testing workflow

Okay I’m bored of this. I need to talk about it.

I love to use Ruby, RSpec, Cucumber and Rails to do test-driven development, but my tools for running tests are just infuriatingly dumb. Here’s what I want:

  • When a test fails, it should be kept on a list until it has been seen to pass
  • When more than one test fails:
    • Show me the list, let me choose one
    • Focus on that one until it passes, or I ask to go ‘back up’ to the list
    • When it passes, go back up to the list and let me choose again
    • When the list is empty, I get a free biscuit
  • When a test case is run, a mapping should be stored to the source files that were covered as it ran so that:
    • When a file changes, I can use that mapping to guess which test cases to run. Fuck all this naming convention stuff, it’s full of holes.
    • At any time, I can pipe the git diff through the tool to figure out which test cases to run to cover the entire commit I’m about to make.
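
The mapping idea in the last group of bullets can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the CoverageMap class, its method names, and the test case identifiers are all made up, and it takes for granted that something else (Ruby’s built-in Coverage module, say) has already worked out which files each test case touched.

```ruby
require "set"

# Hypothetical sketch: remembers which source files each test case covered.
class CoverageMap
  def initialize
    # source file path => set of test case identifiers that covered it
    @tests_by_file = Hash.new { |hash, file| hash[file] = Set.new }
  end

  # Record the files covered during one run of a test case.
  def record(test_case, covered_files)
    # Forget the previous run of this test case so stale links don't pile up.
    @tests_by_file.each_value { |tests| tests.delete(test_case) }
    covered_files.each { |file| @tests_by_file[file] << test_case }
  end

  # Guess which test cases to run for a list of changed files.
  def tests_for(changed_files)
    changed_files
      .flat_map { |file| @tests_by_file.fetch(file, Set.new).to_a }
      .uniq
      .sort
  end
end

# Made-up identifiers standing in for a Cucumber scenario and an RSpec example.
map = CoverageMap.new
map.record("features/pay_bill.feature:9", ["lib/visit.rb", "lib/receipt.rb"])
map.record("spec/receipt_spec.rb:12", ["lib/receipt.rb"])

puts map.tests_for(["lib/receipt.rb"])
```

For the pre-commit case, the list of changed files could come from the output of git diff --name-only, split into lines and handed to tests_for.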

When I say test case, I personally mean:

  • An RSpec example
  • A Cucumber scenario

…but it should work for any other testing framework too.
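
The first half of the wish list (the list of failing tests, with focus and ‘back up’) is framework-agnostic enough to sketch in plain Ruby. Everything here is hypothetical: FailureList and its methods are invented names, and a real tool would need to persist the list between runs and hook into each framework’s reporter.

```ruby
# Hypothetical sketch of the "keep it on a list until it passes" behaviour.
class FailureList
  attr_reader :focused

  def initialize
    @failing = []   # test case identifiers, in the order they first failed
    @focused = nil  # the one test case currently being worked on, if any
  end

  # Feed in the outcome of every test case that runs.
  def record(test_case, passed)
    if passed
      @failing.delete(test_case)
      back_up if @focused == test_case # it passed, so go back up to the list
    elsif !@failing.include?(test_case)
      @failing << test_case
    end
  end

  # A copy of the list, so callers can't tamper with the real one.
  def failing
    @failing.dup
  end

  # Choose one failing test case to concentrate on.
  def focus_on(test_case)
    unless @failing.include?(test_case)
      raise ArgumentError, "#{test_case} is not on the list"
    end
    @focused = test_case
  end

  # Stop focusing and return to the full list.
  def back_up
    @focused = nil
  end

  # What to run next: the focused test case, or everything still failing.
  def to_run
    @focused ? [@focused] : failing
  end

  # True when the list is empty and a free biscuit is due.
  def biscuit?
    @failing.empty?
  end
end
```

A runner would call record after every test case, and ask to_run what to execute on the next cycle.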

I feel like having a tool like this that I trusted would make a huge difference to me. There are all these various scrappy little pieces of the puzzle around: guard plugins, autotest, cucover, cucumber’s rerun formatter. None of them seem to quite do it, for me. Am I missing something?

Or shall we make one?

Agile / Lean Software Development
Ruby Programming

Comments (8)

Permalink

Outside-In vs Inside Out – Comparing TDD Approaches

At last month’s ScotRUG Brian Swan and I attempted to solve the TDD Avatars problem as a live recital in our chosen style. We each had 35 minutes.

The videos are here:

Brian’s Inside-Out TDD approach

Matt’s Outside-In approach

When Brian had walked us through his approach and solution at the previous month’s meeting, he’d built his solution as a Rails application, with web forms for filling out bookings and viewing receipts and so on.

When I came to start practicing and converted the use case from the TDD Avatars paper into a Cucumber feature, it quickly became clear that the value of the system I was building, at least as described by the use case, was to provide printed receipts to customers. I then started to think about the simplest way I could build a system to provide that value.

Here’s the feature I wrote:

Feature: Pay bill
 
  Background: Prices
    Given the following operations are available:
      | operation        | price |
      | routine check up | 10    |
      | shots            | 5     |
 
  Scenario: Dave Pays for Fluffy
    Given there is an owner Dave Atkins, let's call him "Dave"
    And Dave brings his pet named Fluffy into the clinic for the following operations:
      | routine check up |
      | shots            |
    When the veterinarian charges him for the visit
    And Dave pays cash
    Then Dave is given a receipt which looks like this:
      """
      Operations:
        $10 (routine check up)
        $5 (shots)
 
      Total to pay: $15
 
      Paid cash, received with thanks
 
      """

Notice that the scenario doesn’t talk about clicking particular buttons or filling in boxes on a form? I’ve used a higher-level declarative style to describe the behaviour I want. In my experience this helps in various ways:

  • more human-readable features
  • features that aren’t coupled to a particular user interface

If you watch the video, you’ll see that the first thing I did, working my way in from the step definitions, was to create a custom step definition DSL for my problem domain. Instead of using a generic DSL like Capybara’s fill_in, click_button etc, I created this one:

module VetsHelper
  def register_operation_price(operation, price)
  end
 
  def remember_owner(name, nickname)
  end
 
  def create_visit(owner_nickname, pet_name, operations)
  end
 
  def charge_for_visit
  end
 
  def pay_with(payment_type, nickname)
  end
 
  def receipt
    ""
  end
end

This is arguably unnecessary: my step definitions are already translating from English into Ruby, so why add this extra layer of indirection?

As I worked my way from the outside (the features) into the step definitions, I wasn’t ready to commit myself to how I was going to couple the tests to my new application. By defining this interface, I’ve deferred that commitment a little later. I’ve also given myself a clean view of all the behaviour the new application needs to support.

My first iteration implementation (the one in the video) of VetsHelper drives out a domain model directly from the methods in that module. If that was what we released to our user, they’d only be able to print receipts if they knew how to use an IRB prompt. That might seem ridiculous, but we’ve gone a long way to solving the problem, and we could probably spike a simple script that let them do it from the command-line without much risk.
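
To give a feel for what ‘driving out a domain model directly’ can look like, here is a rough sketch of a VetsHelper that talks straight to plain Ruby objects. It is not the code from the video: the internals of Practice, the Bill class, and every method on them are assumptions made up for this illustration. Only the VetsHelper method names come from the module above.

```ruby
# Hypothetical entry point to the domain model: knows the price of each operation.
class Practice
  def initialize
    @prices = {}
  end

  def register_operation(name, price)
    @prices[name] = price
  end

  # Turn a list of operation names into a bill.
  def charge_for(operations)
    Bill.new(operations.map { |name| [name, @prices.fetch(name)] })
  end
end

# Hypothetical bill: a list of [operation name, price] pairs plus how it was paid.
class Bill
  def initialize(items)
    @items = items
    @payment_type = nil
  end

  def pay(payment_type)
    @payment_type = payment_type
  end

  def total
    @items.sum { |_name, price| price }
  end

  def receipt
    lines = @items.map { |name, price| "  $#{price} (#{name})" }
    ["Operations:", *lines, "", "Total to pay: $#{total}", "",
     "Paid #{@payment_type}, received with thanks", ""].join("\n")
  end
end

# Same interface as the empty module, now wired directly to the model.
module VetsHelper
  def register_operation_price(operation, price)
    # Prices arrive from the feature table as strings.
    practice.register_operation(operation, Integer(price))
  end

  def remember_owner(name, nickname)
    owners[nickname] = name
  end

  def create_visit(owner_nickname, pet_name, operations)
    @visit = { owner: owners.fetch(owner_nickname), pet: pet_name, operations: operations }
  end

  def charge_for_visit
    @bill = practice.charge_for(@visit[:operations])
  end

  def pay_with(payment_type, nickname)
    @bill.pay(payment_type)
  end

  def receipt
    @bill.receipt
  end

  private

  def practice
    @practice ||= Practice.new
  end

  def owners
    @owners ||= {}
  end
end

# Driving it by hand, much as you would from an IRB prompt.
world = Object.new.extend(VetsHelper)
world.register_operation_price("routine check up", "10")
world.register_operation_price("shots", "5")
world.remember_owner("Dave Atkins", "Dave")
world.create_visit("Dave", "Fluffy", ["routine check up", "shots"])
world.charge_for_visit
world.pay_with("cash", "Dave")
puts world.receipt
```

Because the step definitions only ever call the VetsHelper methods, nothing in the feature file needs to know that this version skips any user interface.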

For our second iteration, we can talk to the customer about that command-line interface, then write a new implementation of VetsHelper, perhaps using some of Aruba’s DSL, which goes through that command-line interface instead of directly to the model. This is the beauty of using a declarative style together with your own domain-specific step definition DSL: it gives you the flexibility to swap in connections to the system that hit it at different levels, using exactly the same acceptance tests.

Did BDD Save Me Time?

When Brian and I were planning this month’s session, I showed him the code I’d written and he decided to do a comparable solution this time, without any UI, so that the two were easy to compare. In fact, Brian’s solution looked much simpler, and was certainly quicker to write, because he didn’t have to spend any time writing the acceptance testing layers and he didn’t write any kind of entry-point Practice class. He just went straight into building the Appointment class.

A big difference between the solution we produced this month and the one that Brian had originally built was that we didn’t use Rails, and instead went for a much simpler solution that still provided some immediate value. I like to think that the idea for doing this came from the BDD approach I took—I’m pretty sure I remember the lightbulb going on as I typed out the feature—but we’ll never know now where this idea originated.

I noticed that Brian spent time testing getters on his classes, which I probably wouldn’t have done. I tend to try to avoid using them, except on value objects, and I rarely test the behaviour of value objects. I rely on my acceptance tests to tell me if they’re not working.

Focus and Design

Brian’s big take-away was the difference in our approaches when we needed a collaborator object. When I needed a collaborator for a class, I would just mock out the collaborator and carry on finishing off the class I was building, whereas he would leave the current class broken and go and build the other class first.

I find my (mock-based) approach gives me focus, and also means I can sketch out the design of the collaborator without having to commit myself to that design until I understand how it’s going to be used.
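
Here is a small sketch of that style using a hand-rolled stand-in rather than a mocking library, so it runs with nothing but Ruby. The Visit and FakePriceList classes and the price_of method are invented for the example. They are not taken from either of our solutions.

```ruby
# The class being built. It needs a price list, which doesn't exist yet.
class Visit
  def initialize(operations, price_list)
    @operations = operations
    @price_list = price_list
  end

  def total
    @operations.sum { |operation| @price_list.price_of(operation) }
  end
end

# Hand-rolled stand-in for the missing collaborator. It records every
# question it is asked, which doubles as a first sketch of the interface
# the real class will eventually have to provide.
class FakePriceList
  attr_reader :lookups

  def initialize(prices)
    @prices = prices
    @lookups = []
  end

  def price_of(operation)
    @lookups << operation
    @prices.fetch(operation)
  end
end

price_list = FakePriceList.new({ "routine check up" => 10, "shots" => 5 })
visit = Visit.new(["routine check up", "shots"], price_list)

puts visit.total
p price_list.lookups
```

Visit can be finished and tested today. The only commitment made about the real price list is that it answers price_of, and even that is cheap to change while it exists only in the fake.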

I’m really happy with the design I ended up with. It’s hard to make much of a judgement in such a simple problem, but I’d be interested to hear your thoughts on how the two designs compare. Which one would you have preferred to add a new feature to?

Agile / Lean Software Development

Comments (0)

Permalink

Belly Wants to Eat Your Tests

Ever since I led the team at Songkick through an Acceptance-Test-Driven re-write of their gorgeous web-ui, I’ve been thinking about the problem of scaling a large suite of acceptance tests. By the time I left Songkick for the wilds of Scotland, it would take over 3 hours to run all the Cucumber tests on a single machine.

When things take that long, TDD stops being fun.

Intelligent Selection

In order to make an intelligent selection of which tests to run, you need some knowledge of the past history of each of your tests. Most testing tools are like goldfish: they run the tests and show you what failed, then on the next run they wipe the slate clean and start over. Dumb.

Sir Kent Beck, always ahead of the game, has been building an exciting new product to enable precisely this kind of selective testing for Java projects.

But I don’t work on Java projects.

Enter the Belly

I decided to build a web service that would record the history of each scenario in my Cucumber test suite, so that I could start to make decisions about which ones to run. I see no reason why this service can’t be generic enough to work for any kind of test case, but Cucumber scenarios seem like a good place to get started, since that’s where I do a lot of my testing, and they’re often slow.

Belly works by installing a little hook into your Cucumber test suite. When the tests run, Belly sends a message to the central ‘hub’ web service (currently hosted at http://belly.heroku.com) reporting the result of the test. Gradually, Belly builds up a picture of your test suite, which you can browse from the website.

Features

The current version of Belly is alpha-ware, proof-of-concept. It works, but I’m sure it won’t scale well to thousands of users with thousands of tests. I’m sure you’ll find bugs. It also looks pretty rough, but don’t let that put you off; there’s huge potential here.

works-on-my-machine

Right now, probably the most useful feature is the belly rerun command, which helps you focus on running just the cukes that you’ve broken. Rather than having to keep track of them in a rerun.txt file, Belly will remember everything you’ve broken and give you the output you need to run it again with Cucumber:

cucumber `belly rerun`

You can see a demonstration of how to get started using Belly in this slick and polished screencast.

How To

If you can’t make out the details on the horribly blurry screencast, here’s the summary:

# install the gem:
gem install belly
 
# write belly's hook and config files into your project:
belly init
 
# run your features
cucumber features
 
# see your test results live on the internets. tada!
open http://belly.heroku.com
 
# see what's left to do
belly rerun
 
# tell cucumber to run what's left to do
cucumber `belly rerun`

Disclaimer

I can’t stress how rough-and-ready this is, but I think it’s still useful enough to provoke you into giving me some feedback. Use it at your own risk, and let me know your thoughts.

Coincidences

Incredibly, it turns out that Joe Wilk, my old team-mate at Songkick and fellow Cucumber-core hacker, has been working on another solution to exactly the same problem. Living with a 3-hour build can be quite a motivator! I’m hoping Joe and I can figure out a way to combine our efforts into something really beautiful.

Agile / Lean Software Development

Comments (5)

Permalink

Battling Robots at Software Craftsmanship 2010

I’ve submitted a session for the Software Craftsmanship 2010 Conference. It’s a redux of the Robot Tournament I ran at SPA2010.

The idea behind the session is to simulate the life of a start-up software company. In the early rounds of the tournament, the priority for each team is to get a robot, any robot, out there and playing matches. As the tournament progresses, quality becomes more important as you need to adapt your robot to make it a better competitor.

This ability to adapt your approach to the work to the context you’re doing it in is, I think, really important for the true craftsperson to grasp. If you’ve been reading Kent Beck’s posts about start-ups and design, you’ll be familiar with this subject. It’s wonderful to be able to patiently produce a beautifully-worked piece of furniture, but what if the building is burning down and you just need a ladder to escape, right now? Can you rip up some floorboards and knock something together from the materials to hand and save your family?

There’s a great skill in understanding and working to the appropriate level of quality for the context you’re currently in, and I hope this session gives people a little insight into that.

You can watch my screencast audition for the session here. Please leave any feedback or comments here.

Agile / Lean Software Development

Comments (0)

Permalink

Acceptance Tests Trump Unit Tests

At work, we have been practising something approximating Acceptance Test Driven Development for several months now. This means that pretty much every feature of the system that a user would expect to be there has an automated test to ensure that it really is.

It has given me a whole new perspective on the value of tests as artefacts produced by a project.

I made a pledge to myself when I started this new job in August that I would not (knowingly) check in a single line of code that wasn’t driven out by a failing test. At the time, I thought this would always mean a failing unit test, but I’m starting to see that this isn’t always necessary, or in fact even wise.

Don’t get me wrong. Unit testing is extremely important, and there’s no doubt that practising TDD helps you to write well-structured, low-defect code in a really satisfying manner. But I do feel like the extent to which TDD, at the level of unit testing alone, allows for subsequent changes to the behaviour of the code, has been oversold.

If you think you’re doing TDD, and you’re only writing unit tests, I think you’re doing it wrong.

As new requirements come in, the tensions influencing the design of the code shift. Refactoring eases these tensions, but by definition means that the design has to change. This almost certainly means that some, often significant, portion of the unit tests around that area of the code will have to change too.

I struggled with this for a long time. I had worked hard on those tests, for one thing, and was intuitively resistant to letting go of them. More than that, I knew that somewhere in there, they were testing behaviour that I wanted to preserve: if I threw them out, how would I know it still worked?

Yet those old unit tests were so coupled to the old design that I wanted to change…

Gulliver

In my mind, I have started to picture the tests we write to drive out a system like little strings, each one pulling at the code in a slightly different direction. The sum total of these tensions is, hopefully, the system we want right now.

While these strings are useful to make sure the code doesn’t fall loose and do something unexpected, they can sometimes mean that the code, like Gulliver in the picture above, is too restrained and inflexible to change.

The promise of writing automated tests up front is regression confidence: if every change to the system is covered by a test, then it’s impossible to accidentally reverse that change without being alerted by a failing test. Yet how often do unit tests really give us regression alerts, compared to the number of times they whinge and whine when we simply refactor the design without altering the behaviour at all? Worse still, how often do they fail to let us know when the mocks or stubs for one unit fail to accurately simulate the actual behaviour of that unit?
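
That last failure mode is easy to reproduce. In this contrived sketch (all class names are invented for illustration) the stub assumes an unknown operation costs nothing, while the real collaborator returns nil, so the unit-level check is happy and the same call through the real layers blows up.

```ruby
# The real collaborator: returns nil for operations it doesn't know about.
class PriceList
  def initialize(prices)
    @prices = prices
  end

  def price_of(operation)
    @prices[operation]
  end
end

# A stub written from memory: it assumes unknown operations cost nothing.
class StubPriceList
  def price_of(_operation)
    0
  end
end

# The unit under test, which adds up whatever the price list hands back.
class Bill
  def initialize(price_list)
    @price_list = price_list
  end

  def total(operations)
    operations.sum { |operation| @price_list.price_of(operation) }
  end
end

# Unit level: works fine against the stub.
unit_total = Bill.new(StubPriceList.new).total(["x-ray"])
puts "with the stub: #{unit_total}"

# Through the real layers: the same call fails, because 0 + nil is a TypeError.
begin
  Bill.new(PriceList.new({ "shots" => 5 })).total(["x-ray"])
  integration_error = nil
rescue TypeError => error
  integration_error = error
  puts "with the real price list: #{error.class}"
end
```

No amount of unit tests written against StubPriceList would have caught this. A test that exercises Bill and PriceList together would.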

Enter acceptance tests.

By working at a higher level, acceptance tests give you a number of advantages over unit tests:

  • You get a much larger level of coverage per test
  • You get more space within which to refactor
  • You will test through layers to ensure they integrate correctly
  • They remain valuable even as underlying implementation technology changes

Admittedly, the larger level of coverage per test has a downside: when you get a regression failure, the signpost to the point of failure isn’t as clear. This is where unit tests come in: if you haven’t written any at all yet, you can use something like the Saff Squeeze to isolate the fault and cover it with a new test.

They’re also much slower to run, which can be important when you’re iterating quickly over changes to a specific part of the system.

To be clear, I’m not advocating that you stop unit testing altogether. I do feel there’s a better balance to strike, though, than forcing yourself to get 100% coverage from unit tests alone. They’re not always the most appropriate tool for the job.

To go back to the metaphor of the pulling strings, I think of acceptance tests as sturdy ropes, anchoring the system to the real world. While sometimes the little strings will need to be cut in order to facilitate a refactoring, the acceptance tests live on.

The main thing is to have the assurance that if you accidentally regress the behaviour of the system, something will let you know. As long as every change you make is driven out by some kind of automated test, be it at the system level or the unit level, I think you’re on the right track.

Agile / Lean Software Development

Comments (2)

Permalink