
Costs and benefits of test automation

I think balance is important. Whenever I teach people about BDD or automated testing, we make a list of the costs and benefits of test automation.

The lists typically look something like this:

Benefits:

  • thorough analysis of a requirement
  • confidence to refactor
  • quick feedback about defects
  • repeatable test
  • living / trustworthy documentation
  • frees up manual testers for more interesting exploratory testing

Costs:

  • time spent learning how to write a test
  • time spent writing the test
  • waiting for the test to run
  • time spent maintaining the test
  • having a false sense of security when an automated test is passing

The benefits are great, but don’t underestimate the costs. If your team are in the early stages of adopting test automation, you’re going to invest a lot of time in learning how to do it well. You’ll make some mistakes and end up with tests that are hard to maintain.

Even once you’re proficient, it’s important for each test to justify its existence in your test suite. Does it provide enough of a benefit to justify the investment needed to write it, and the ongoing maintenance cost? Is there a way to bring down the ongoing cost, for example by making it faster?

I also find that listing costs and benefits helps to tackle skepticism. Having a balanced discussion makes space for everyone’s point of view.


I’m teaching a public BDD course in London, 4-6 December. If you’d like to take part you can sign up here:

http://kickstartacademy.io/dates#london

BDD

Comments (1)

Permalink

Half-arsed agile

Transitioning to agile is hard. I don’t think enough people are honest about this.

The other week I went to see a team at a company. This team are incredibly typical of the teams I go to see at the moment. They’d adopted the basic agile practices from a ScrumMaster course, and then coasted along for a couple of years. Here are the practices I saw in action:

  • having daily stand-up meetings
  • working in fixed-length iterations, called sprints
  • tracking their work as things called Stories
  • estimating their work using things called Story Points
  • trying to predict a release schedule using a thing called Velocity

This is where it started to fall down. This team have no handle on their velocity. It seems to vary wildly from sprint to sprint, and the elephant in the room is that it’s steadily dropping.

I see this a lot. Even where the velocity appears to be steady, it’s often because the team have been gradually inflating their estimates as time has gone on. They do this without noticing, because it genuinely is getting harder and harder to add the same amount of functionality to the code.

Why? Sadly, the easiest bits of agile are not enough on their own.

Keeping the code clean

Let’s have a look at what the team are not doing:

  • continuous integration, where everyone on the team knows the current status of the build
  • test-driven development
  • refactoring
  • pair programming

All of these practices are focussed on keeping the quality of the code high, keeping it malleable, and ultimately keeping your velocity under control in the long term.

Yet most agile teams don’t do nearly enough of them. My view is that product owners should demand refactoring, just as the owner of a restaurant would demand that their staff keep a clean kitchen. Most of the product owners I meet don’t know the meaning of the term, let alone the business benefit.

So why does this happen?

Reasons

Firstly, everyone on the team needs to understand what these practices are, and how they benefit the business. Without this buy-in from the whole team, it’s a rare developer who has the brass neck to invest enough time in writing tests and refactoring. Especially since these practices are largely invisible to anyone who doesn’t actually read the code.

On top of this, techniques like TDD do, despite the hype, slow a team down initially, as they climb the learning curve. It takes courage, investment, and a bit of faith to push your team up this learning curve. Many teams take one look at it and decide not to bother.

The dirty secret is that without these technical practices, your agile adoption is hollow. Sure, your short iterations and your velocity give you finer control over scope, but you’re still investing huge amounts of money having people write code that will ultimately have to be thrown away and re-written because it can no longer be maintained.

What to do

Like any investment decision, there’s a trade-off. Time and money spent helping developers learn skills like TDD and refactoring is time and money that could be spent paying them to build new features. If those features are urgently needed, then it may be the right choice in the short term to forgo the quality and knock them out quickly. If everyone is truly aware of the choice you’re making, and the consequences of it, I think there are situations where this is acceptable.

In my experience though, it’s far more common to see teams sleep-walking into this situation without having really attempted the alternative. If you recognise this as a problem on your team, take the time to explain to everyone what the business benefits of refactoring are. Ask them: would you throw a dinner party every night without doing the washing up?


Does the world need to hear this message? Vote for this article on Hacker News

Agile / Lean Software Development

Comments (9)

Permalink

How much do you refactor?

Refactoring is probably the main benefit of doing TDD. Without refactoring, your codebase degrades, accumulates technical debt, and eventually has to be thrown away and rewritten. But how much refactoring is enough? How do you know when to stop and get back to adding new features?

TDD loop (image credit: Nat Pryce)

I get asked this question a lot when I’m coaching people who are new to TDD. My answers in the past have been pretty woolly. Refactoring is something I do by feel. I rely on my experience and instincts to tell me when I’m satisfied with the design in the codebase and feel comfortable with adding more complexity again.

Some people rely heavily on metrics to guide their refactoring. I like the signals I get from metrics, alerting me to problems with a design that I might not have noticed, but I’ll never blindly follow their advice. I can’t imagine metrics ever replacing my design intuition.

So how can I give TDD newbies some clear advice to follow? The advice I’ve been giving them up to now has been this:

There are plenty of codebases that suffer from too little refactoring but not many that suffer from too much. If you’re not sure whether you’re doing enough refactoring, you’re probably not.

I think this is a good general rule, but I’d like something more concrete. So today I did some research.

Cucumber’s new Core

This summer my main coding project has been to re-write the guts of Cucumber. Steve Tooke and I have been pairing on a brand new gem, cucumber-core, that will become the inner hexagon of Cucumber v2.0. We’ve imported some code from the existing project, but the majority is brand new code. We use spikes sometimes, but all the code in the master branch has been written test-first. We generally make small, frequent commits, and we’ve been refactoring as much as we can.

There are 160 commits in the codebase. How can I look back over those and work out which ones were refactoring commits?

Git log

My first thought was to use git log --dirstat, which shows where your commit has changed files. If the commit doesn’t change the tests, it must be a refactoring commit.

Of the 160 commits in the codebase, 58 of them don’t touch the specs. Because we drive all our changes from tests, I’m confident that each of these must be a refactoring commit. So based on this measure alone, at least 36% of all the commits in our codebase are refactorings.
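
In case it helps, here’s roughly what that first pass might look like. This is a simplified sketch rather than the real script mentioned at the end of this post, and it assumes the specs all live under spec/:

# Simplified sketch: list every commit on master, then keep the ones
# that don't touch any file under spec/.
commits = `git rev-list master`.split("\n")

refactoring_candidates = commits.select do |sha|
  changed = `git show --name-only --pretty=format: #{sha}`.split("\n").reject(&:empty?)
  changed.none? { |file| file.start_with?('spec/') }
end

puts "#{refactoring_candidates.size} of #{commits.size} commits don't touch the specs"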

Sometimes though, refactorings (renaming something, for example) will legitimately need to change the tests too. How can we identify those commits?

Commit message

One obvious way is to look at the commit message. It turns out that a further 11 (or 7%) of the commits in our codebase contained the word ‘refactor’. Now we know that at least 43% of our commits are refactorings.
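
Continuing the sketch above, the commit-message check is just a case-insensitive string match over whatever’s left:

# Of the commits that do touch the specs, count those whose message
# mentions refactoring.
remaining = commits - refactoring_candidates

message_refactorings = remaining.select do |sha|
  `git log -1 --pretty=%B #{sha}`.downcase.include?('refactor')
end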

This still didn’t feel like enough. My instinct is that most of our commits are refactorings.

Running tests

One other indication of a refactoring is that the commit doesn’t increase the number of tests. Sure, it’s possible that you change behaviour by swapping one test for another one, but this is pretty unlikely. In the main, adding new features will mean adding new tests.

So to measure this I extended my script to go back over each commit that hadn’t already been identified as a refactoring, check out the code and run the tests. I then did the same for the previous commit, and compared the results. All the tests had to pass, otherwise it didn’t count as a refactoring. If the number of passing tests was unchanged, I counted it as a refactoring.
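
In outline, that pass might look something like this, where run_specs is a hypothetical helper that shells out to rspec and parses the counts from its summary line:

# Check out each remaining commit and its parent, run the specs,
# and compare the number of passing examples.
def results_at(sha)
  system("git checkout --quiet #{sha}")
  run_specs # hypothetical helper, returns e.g. { passed: 120, failed: 0 }
end

more_refactorings = (remaining - message_refactorings).select do |sha|
  this_commit = results_at(sha)
  parent      = results_at("#{sha}^")
  this_commit[:failed].zero? && parent[:failed].zero? &&
    this_commit[:passed] == parent[:passed]
end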

Here are the results now:

Refactoring vs Feature Adding Commits

Wow. So according to this new rule, less than 25% of the commits to our codebase have added features. The rest have either been improving the design or, perhaps, improving the build infrastructure. That feels about right from my memory of our work on the code, but it’s still quite amazing to see the chart.

Conclusions

It looks as though in this codebase, there are about three refactoring commits for every one that adds new behaviour.

There will be some errors in how I’ve collected the data, and I may have made some invalid assumptions about what does or does not constitute a refactoring commit. It’s also possible that this number is artificially high because this is a new codebase, but I’m not so sure about that. We know the Cucumber domain pretty well at this stage, but we are being extremely rigorous about paying down technical debt as soon as we spot it.

We have no commercial pressure on us, so we can take our time and do our best to ensure the design is ready before forcing it to absorb more complexity.

If you’re interested, here’s the script I used to analyse my git repo. I realise it’s a cliche to end your blog post with a question, but I’d love to hear how this figure of 3:1 compares to anything you can mine from your own codebases.

Update 28 July 2013: Corrected ratio from 4:1 to 3:1 – thanks Mike for pointing out my poor maths!

Agile / Lean Software Development
BDD

Comments (4)

Permalink

Death to sleeps!

When I run workshops to review and improve people’s automated tests, a common problem I see is the use of sleeps.

I have a simple rule about sleeps: I might use them to diagnose a race condition, but I never check them into source control.

This blog post will look at what it means to use sleeps, why people do it, why they shouldn’t, and what the alternatives are.

TL;DR

If you don’t have time to read this whole article, you can sum it up with this quote from Martin Fowler’s excellent essay on the subject:

Never use bare sleeps to wait for asynchronous responses: use a callback or polling. — MartinFowler.com

Why sleep?

When two code paths run in parallel and then meet at a certain point, you have what’s called a race condition: the outcome depends on which path gets there first. For example, imagine you’re testing the AJAX behaviour of Google Search. Your test says something like this:

Given I am on the google homepage
When I type "matt" into the search box
Then I should see a list of results
And the wikipedia page for Matt Damon should be the top result

Notice that I didn’t hit Enter in the test, so the results we’re looking for in the two Then steps will be populated by asynchronous javascript calls. As soon as the tests have finished typing “Matt” into the search box, we have a race on our hands: will the app be able to return and populate the results before the tests examine the page to see if the right results are there?

We don’t need this kind of excitement in automated tests. They need to be deterministic, and behave exactly the same way each time they’re run.

The easy route to achieve this is to handicap the tests so that they always lose. By adding a sleep into the test, we can give the app sufficient time to fetch the results, and everything is good.

Given I am on the google homepage
When I type "matt" into the search box
And I wait for 3 seconds
Then I should see a list of results
And the wikipedia page for Matt Damon should be the top result

Of course in practice you’d push this sleep down into step definitions, but you get the point.

So why is this a bad idea?

What’s wrong with sleeps?

Sleeps quickly add up. When you use sleeps, you normally have to pad out the delay to a large number of seconds to give you confidence that the test will pass reliably, even when the system is having a slow day.

This means that most of the time, your tests will be sleeping unnecessarily. The system has already got into the state you want, but the tests are hanging around for a fixed amount of time.

All this means you have to wait longer for feedback. Slow tests are boring tests, and boring tests are no fun to work with.

What can I do instead?

The goal is to minimise the time you waste waiting for the system to get into the right state. As soon as it reaches the desired state, you want to move on with the next step of your test. There are two ways to achieve that:

  1. Have the system send out events (which the tests can listen for) as soon as it’s done
  2. Poll the system regularly to see if it has reached the right state yet

Using events is great when you can. You don’t need to use some fancy AMQP setup though; this can be as simple as touching a known file on the filesystem which the tests are polling for. Anything to give a signal to the tests that the synchronisation point has been reached. Using events has the advantage that you waste absolutely no time – as soon as the system is ready, the tests are notified and they’re off again.

In many situations though, polling is a more pragmatic option. This does involve the use of sleeps, but only a very short one, in a loop where you poll for changes in the system. As soon as the system reaches the desired state, you break out of the loop and move on.
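
Here’s a minimal sketch of that kind of loop in plain Ruby. The timeout, the polling interval, and the file I’m waiting for are all just illustrative; the point is the short sleep inside a loop, rather than one long sleep up front:

# Keep checking a condition, sleeping briefly between attempts,
# and give up with an error after a timeout.
def eventually(timeout = 5, interval = 0.1)
  deadline = Time.now + timeout
  loop do
    return if yield
    raise "condition not met within #{timeout}s" if Time.now > deadline
    sleep interval
  end
end

# For example, wait for the signal file mentioned above to appear:
eventually { File.exist?('tmp/import_finished') }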

How Capybara can save you

Many people using Capybara for web automation don’t realise how sophisticated it is at solving this problem.

For example, if you ask Capybara to find an element, it will automatically poll the page if it can’t find the element right away:

find('.results') # will poll for 5 seconds until this element appears

After five seconds, if the element hasn’t appeared, Capybara will raise an error. So your tests won’t get stuck forever.

This also works with assertions on Capybara’s page object:

page.should have_css('.results')

Similarly, if you want to wait for something to disappear before moving on, you can tell Capybara like this:

page.should have_no_css('.loading')

The reason you need to use should have_no_css here, rather than should_not have_css, is that the have_no_css matcher is going to deliberately poll the page until the thing disappears. Think about what will happen if you use the have_css matcher instead, even with a negative assertion: if the element is still on the page, have_css matches straight away and the negated expectation fails, and if the element has already gone, have_css keeps polling for the full wait time before giving up. At best you get a slow test, at worst a flaky one.

A more generic polling loop

As Jonas explained, there used to be a wait_until method on Capybara’s API, but it was removed. It’s easy enough to roll your own, but you can also use a library like anticipate if you’d rather not reinvent the wheel.

BDD

Comments (2)

Permalink

A coding dojo story

It was 2008, and I was at the CITCON conference in Amsterdam. I’d only started going to conferences that year, and was feeling as intimidated as I was inspired by the depth of experience in the people I was meeting. It seemed like everyone at CITCON had written a book, their own mocking framework, or both.

I found myself in a session on refactoring legacy code. The session used a format that was new to me, and to most of the people in the room: a coding dojo.

Our objective, I think, was to take some very ugly, coupled code, add tests to it, and then refactor it into a better design. We had a room full of experts in TDD, refactoring, and code design. What could possibly go wrong?

One thing I learned in that session is the importance of the “no heckling on red” rule. I watched as Experienced Agile Consultant after Experienced Agile Consultant cracked under the pressure of criticism from the baying crowd of Other Experienced Agile Consultants. With so many egos in the room, everyone had an opinion about the right way to approach the problem, and nobody was shy of sharing his opinion. It was chaos!

We got almost nowhere. As each pair switched, the code lurched back and forth between different ideas for the direction it should take. When my turn came around, I tried to shut out the noise from the room, control my quivering fingers, and focus on what my pair was saying. We worked in small steps, inching towards a goal that was being ridiculed by the crowd as we worked.

The experience taught me how much a coding dojo is about collaboration. The rules about when to critique code and when to stay quiet help to keep a coding dojo fun and satisfying, but they teach you bigger lessons about working with each other day to day.

Agile / Lean Software Development

Comments (0)

Permalink

Optimising a slow build? You’re solving the wrong problem

At the time I left Songkick, it took 1.5 hours to run all the cukes and rspec ‘unit’ tests on the big ball of Rails. We were already parallelising over a few in-house VMs at the time to make this manageable, but it still took 20 minutes or so to get feedback. After I left, the team worked around this by getting more slave nodes from EC2, and the build time went down to under 10 minutes.

Then guess what happened?

They added more features to the product, more tests for those features, and the build time went up again. So they added more test slave nodes. In the end, I think the total build time was something like 15 hours. 15 fucking hours! You’re hardly going to run all of that on your laptop before you check in.

The moral of this story: if you optimise your build, all you’ll do is mask the problem. You haven’t changed the trajectory of your project, you’ve just deferred the inevitable.

The way Songkick solved this took real courage. First, they started with heart-to-heart conversations with their stakeholders about removing rarely-used features from the product. Those features were baggage, and once the product team saw what it was costing them to carry that baggage, they were persuaded to remove them.

Then, with a slimmed-down feature set, they set about carving up their architecture, so that many of those slow end-to-end Cucumber scenarios became fast unit tests for simple, decoupled web service components. Now it takes them 15 seconds to run the tests on the main Rails app. That’s more like it!

So by all means, use tricks to optimise and speed up the feedback you get from your test suite. In the short term, it will definitely help. But realise that the real problem is your architecture: if your tests take too long, the code you’re testing has too many responsibilities. The sooner you start tackling this problem head-on, the sooner you can start enjoying the benefits.

Agile / Lean Software Development

Comments (7)

Permalink

TDD vs BDD

I regularly find myself explaining to people the difference between TDD (Test-Driven Development) and BDD (Behaviour-Driven Development). There still seems to be a lot of confusion over this, so I wanted to write this up for reference.

Late last year I was interviewed for a virtual panel on InfoQ along with Dan, Gojko, and Liz. Probably the most interesting part of that conversation covered the difference between TDD and BDD. Or rather the lack of any great difference.

We’ll start with some snippets from that discussion.

Both TDD and BDD include acceptance testing

One common misconception is that TDD is what you do when you’re unit-testing, and BDD is what you do when you’re writing customer-facing acceptance tests. Here’s Dan North on that point:

TDD – as originally described – is also about the behaviour of entire systems. Kent [Beck] specifically describes it as operating on multiple levels of abstraction, not just “down in the code”. BDD is equally important in this space, because describing the behaviour of systems is fractal: you can describe different granularities of behaviour from the entire application right down to individual small components, classes or functions.

Extreme Programming has always talked about writing acceptance tests, sometimes also called functional tests, to describe what the customer expects to be done at the end of an iteration.

So this is nothing new. What’s new is how we explain it, and therefore how successful teams end up being in making it work for them.

BDD describes TDD done well

When Dan was working as a coach teaching TDD, he found that it was easier to get people to understand the principles of TDD if he stopped using the word ‘test’:

My experiences as a coach told me people were missing the point, with all this talk of unit tests, acceptance tests, functional tests, integration tests… Kent Beck’s style of TDD is a very smart way to develop software, so I tried removing the word “test” when I was coaching it, replacing it with things like behaviour, examples, scenarios etc. The result was very encouraging: People seemed to “get” TDD much quicker when I avoided referring to testing.

When Aslak and I wrote the Cucumber Book, I wrote this description of BDD:

BDD builds upon TDD by formalising the good habits of the best TDD practitioners.

That’s basically all there is to it. We want to re-explain TDD in a way that highlights the habits that successful TDD practitioners have been using for over a decade.

So what are those good habits?

Specifically, I think those good habits are:

  1. Working outside-in, starting from a business or organisational goal
  2. Using examples to clarify requirements
  3. Developing and using a ubiquitous language

Working outside-in seems obvious to habitual TDD practitioners, but many teams seem to limit themselves to doing this at the level of small units of code. Business-level black-box testing is still done manually, or automated as a check after the code has already been implemented.

This misses out on the major benefit of working outside-in, which is having the requirement challenged: if you need to explain to a computer how to check the requirement, you’ll need to be damn sure you understand it yourself. If you don’t (and you often don’t) it’s much cheaper to find that out before you write the code.

Examples have always been a great way to make sure you really understand a requirement. What BDD does is formalise this by encouraging you to use scenarios to describe behaviour. These examples provide the perfect bridge between the business-facing and technology-facing sides of a team: they’re just formal enough that you can get a computer to check them, but anyone on the team can read them and make sure they’re describing behaviour that they actually want.

The GOOS Book, written by two of the best TDD practitioners in the business, frequently highlights the importance of domain language in our programs. In software teams, communication is probably the biggest overhead you have, and you make that communication a lot harder when you allow different dialects of terminology to be used by different parts of the team. Developing and then sticking to a consistent language takes deliberate effort, but it’s something that the best TDD practitioners have long learned will give them a significant advantage.

My experience is that BDD’s emphasis on collaboration, and the use of business-readable, executable specifications, means that this shared language develops much more quickly. When everyone is involved in writing documentation that describes what the system should do, they all get a chance to learn the language of the domain together.

So BDD really isn’t all that different to TDD. What BDD adds is a clear emphasis on what it takes to make TDD succeed.

BDD

Comments (5)

Permalink

Skillsmatter BDD Exchange

Last week I travelled down to London to the BDD Exchange conference. It was a one-day conference organised by Gojko Adzic, and I had a great time. I missed Gojko’s talk as I travelled down from my cave in Scotland on the day, but I did arrive in time to see Chris Matts’s excellent lecture on what business analysis really should be about.

I particularly enjoyed the talk from Christian Hassa about teams failing to make BDD work. We can learn the most from failure, and Christian’s thoughtful analysis of what he’s observed in the field as a consultant with TechTalk is useful to any team trying to get the most from these techniques. The message of Christian’s talk very much echoed my own: the tooling you use is entirely secondary to the collaborative relationship you need to build between the business-facing and technical-facing members of the team. I was interested to learn about SpecLog, the tool TechTalk are building to help teams with this problem, which seems to have many similar goals to my own Relish. It was nice of Christian to give Relish a name-check in his talk.

My session ran along the same theme as my talk from earlier in the year at Skillsmatter, describing the value of writing acceptance tests at the right level of abstraction, so that they describe business rules rather than implementation details. You can watch the session here.

Agile / Lean Software Development

Comments (2)

Permalink

BDD Training

Update: This training is now available as a public course, starting October 8th in London.

Would you like to learn how Behaviour-Driven Development can help your company get better at software development?

I’ve helped several teams learn BDD, and I’ve started to formalise the training I’ve been doing into a set of course modules. The modules aim to provide the foundations for a team’s successful adoption of BDD.

We start by immersing the whole team in BDD for a day to get everyone enthusiastic about the process. Then I take the programmers and testers and implement their very first scenario, end-to-end, on their own code. Now that we’ve proved it can be done, I work with project managers, product owners, and development leads to streamline their agile process to get the best from BDD. We practise collaborative scenario-writing sessions, learn how to use metrics to track progress, and see how Kanban and BDD can fit into your existing agile process.

Please take a look at the course prospectus and get in touch to see how I can help.

Agile / Lean Software Development
BDD

Comments (11)

Permalink

Fixing my testing workflow

Okay I’m bored of this. I need to talk about it.

I love to use Ruby, RSpec, Cucumber and Rails to do test-driven development, but my tools for running tests are just infuriatingly dumb. Here’s what I want:

  • When a test fails, it should be kept on a list until it has been seen to pass
  • When more than one test fails:
    • Show me the list, let me choose one
    • Focus on that one until it passes, or I ask to go ‘back up’ to the list
    • When it passes, go back up to the list and let me choose again
    • When the list is empty, I get a free biscuit
  • When a test case is run, a mapping should be stored to the source files that were covered as it ran (there’s a rough sketch of this idea below) so that:
    • When a file changes, I can use that mapping to guess which test cases to run. Fuck all this naming convention stuff, it’s full of holes.
    • At any time, I can pipe the git diff through the tool to figure out which test cases to run to cover the entire commit I’m about to make.

When I say test case, I personally mean:

  • An RSpec example
  • A Cucumber scenario

…but it should work for any other testing framework too.
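
For what it’s worth, the coverage-to-test-case mapping doesn’t seem impossible to prototype. Here’s a rough sketch of the RSpec half, using Ruby’s Coverage module (you need a Ruby recent enough to have Coverage.peek_result); the coverage_map.json filename and the app/ filter are just placeholder assumptions:

# Record which application files each RSpec example touches, and dump
# the mapping to a JSON file at the end of the run.
require 'coverage'
require 'json'

Coverage.start

RSpec.configure do |config|
  map = {}

  config.around(:each) do |example|
    before = Coverage.peek_result
    example.run
    after = Coverage.peek_result
    # any file whose line counters moved was exercised by this example
    touched = after.select { |file, lines| lines != before[file] && file.include?('/app/') }.keys
    map[example.metadata[:full_description]] = touched
  end

  config.after(:suite) do
    File.write('coverage_map.json', JSON.pretty_generate(map))
  end
end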

I feel like having a tool like this that I trusted would make a huge difference to me. There are all these various scrappy little pieces of the puzzle around: guard plugins, autotest, cucover, cucumber’s rerun formatter. None of them seem to quite do it for me. Am I missing something?

Or shall we make one?

Agile / Lean Software Development
Ruby Programming

Comments (8)

Permalink