Optimising a slow build? You’re solving the wrong problem

At the time I left Songkick, it took 1.5 hours to run all the cukes and RSpec ‘unit’ tests on the big ball of Rails. We were already parallelising over a few in-house VMs to make this manageable, but it still took 20 minutes or so to get feedback. After I left, the team worked around this by adding more slave nodes from EC2, and the build time went down to under 10 minutes.

Then guess what happened?

They added more features to the product, more tests for those features, and the build time went up again. So they added more test slave nodes. In the end, I think the total build time was something like 15 hours. 15 fucking hours! You’re hardly going to run all of that on your laptop before you check in.

The moral of this story: if you optimise your build, all you’ll do is mask the problem. You haven’t changed the trajectory of your project, you’ve just deferred the inevitable.

The way Songkick solved this took real courage. First, they started with heart-to-heart conversations with their stakeholders about removing rarely-used features from the product. Those features were baggage, and once the product team saw what it was costing them to carry that baggage, they were persuaded to remove them.

Then, with a slimmed-down feature set, they set about carving up their architecture, so that many of those slow end-to-end Cucumber scenarios became fast unit tests for simple, decoupled web service components. Now it takes them 15 seconds to run the tests on the main Rails app. That’s more like it!

So by all means, use tricks to optimise and speed up the feedback you get from your test suite. In the short term, it will definitely help. But realise that the real problem is your architecture: if your tests take too long, the code you’re testing has too many responsibilities. The sooner you start tackling this problem head-on, the sooner you can start enjoying the benefits.

Agile / Lean Software Development

Comments (7)


Using Cucumber for Load Testing

I sometimes get asked whether it’s possible to use Cucumber to test performance. The way to do it is to specify concrete examples of scenarios that the system will find itself in under stress. For example:

Given there are 100,000 users registered on the system
When I create a new account
Then I should be taken to my dashboard within 5ms

Given 1000 users are hitting the homepage simultaneously
Then each user should get a response within 2ms
Talking through these kinds of scenarios with your stakeholders will help you understand where the boundary lies for what they consider to be acceptable performance. You might find it hard to get them to be this specific at first, but help them understand that what you’re doing is drawing a line in the sand for the minimum acceptable performance – most of the time the application may be much faster than this. Once you have agreement on this, the next step is to work out how to automate these scenarios, by calling your load testing tool of choice from the Cucumber step definitions.

The key thing is to have Cucumber delegate to the stress testing tool, rather than the other way around. A common mistake people make is to simply point JMeter at existing Cucumber scenarios, but this doesn’t give you the benefit of having the parameters of the performance test documented in readable Cucumber scenarios.
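To make the delegation concrete, here’s a sketch of the kind of helper a step definition could call. This is illustrative only: `hit_homepage` stands in for whatever actually generates the load (shelling out to JMeter, ab, or an HTTP client), and the method names are made up.

```ruby
require 'benchmark'

# Stand-in for the real load-generation call -- in practice this would
# drive your load testing tool against a production-like environment.
def hit_homepage
  sleep 0.001
end

# "Given 1000 users are hitting the homepage simultaneously" -- fire the
# requests from concurrent threads and collect each response time (seconds).
def concurrent_response_times(users)
  threads = users.times.map do
    Thread.new { Benchmark.realtime { hit_homepage } }
  end
  threads.map(&:value)
end

# "Then each user should get a response within Nms" -- assert on the
# slowest response, not the average.
def all_within?(times, limit_seconds)
  times.max <= limit_seconds
end
```

Inside an actual step definition (e.g. `Then /^each user should get a response within (\d+)ms$/`) you would call these helpers and raise if `all_within?` returns false, so Cucumber reports the scenario as failed.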

These are not normal kinds of Cucumber tests. You would need to run these against an environment that’s representative of your production hardware, whereas normal behaviour scenarios could be run against any environment. It’s useful to create these scenarios early on during the project and run them against every build so they become constraints that every build must meet as the project progresses.

Agile / Lean Software Development

Comments (8)


Fixing my testing workflow

Okay I’m bored of this. I need to talk about it.

I love to use Ruby, RSpec, Cucumber and Rails to do test-driven development, but my tools for running tests are just infuriatingly dumb. Here’s what I want:

  • When a test fails, it should be kept on a list until it has been seen to pass
  • When more than one test fails:
    • Show me the list, let me choose one
    • Focus on that one until it passes, or I ask to go ‘back up’ to the list
    • When it passes, go back up to the list and let me choose again
    • When the list is empty, I get a free biscuit
  • When a test case is run, a mapping should be stored to the source files that were covered as it ran so that:
    • When a file changes, I can use that mapping to guess which test cases to run. Fuck all this naming convention stuff, it’s full of holes.
    • At any time, I can pipe the git diff through the tool to figure out which test cases to run to cover the entire commit I’m about to make.
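That mapping is essentially an inverted index from source files to test cases. A minimal sketch of the idea (the data and file names here are made up; in a real tool the file lists would be recorded with something like Ruby’s Coverage API while each test ran):

```ruby
# test case -> source files it touched while running (illustrative data)
COVERAGE_MAP = {
  "features/sign_up.feature:12" => ["app/models/user.rb", "app/controllers/signups_controller.rb"],
  "spec/models/user_spec.rb:8"  => ["app/models/user.rb"],
  "spec/helpers/nav_spec.rb:3"  => ["app/helpers/nav_helper.rb"],
}

# Given a list of changed files (e.g. from `git diff --name-only`),
# guess which test cases need to run to cover them.
def tests_to_run(changed_files, coverage_map = COVERAGE_MAP)
  coverage_map.select { |_test, files| (files & changed_files).any? }.keys
end
```

With this data, `tests_to_run(["app/models/user.rb"])` picks out the first two test cases and skips the third – no naming conventions required.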

When I say test case, I personally mean:

  • An RSpec example
  • A Cucumber scenario

…but it should work for any other testing framework too.

I feel like having a tool like this that I trusted would make a huge difference to me. There are all these scrappy little pieces of the puzzle around: Guard plugins, autotest, cucover, Cucumber’s rerun formatter. None of them seems to quite do it for me. Am I missing something?

Or shall we make one?

Agile / Lean Software Development
Ruby Programming

Comments (8)


Cucumber: Why Bother?

It’s perfectly possible to write automated acceptance tests without using Cucumber. You can just write them in pure Ruby. Take this test for withdrawing cash from an ATM:

Scenario: Attempt withdrawal using stolen card 
  Given I have $100 in my account 
  But my card is invalid
  When I request $50
  Then my card should not be returned 
  And I should be told to contact the bank

We could automate that test using good old Test::Unit, perhaps something like this:

require 'test/unit'

class WithdrawalTests < Test::Unit::TestCase
  def test_attempt_withdrawal_using_stolen_card
    bank = Bank.new
    account = Account.new(bank)
    account.deposit(100)            # Given I have $100 in my account
    card = DebitCard.new(account)
    card.invalidate                 # But my card is invalid
    atm = Atm.new(bank)
    atm.insert(card)
    atm.request_withdrawal(50)      # When I request $50
    assert atm.card_withheld?, "Expected the card to be withheld by the ATM"
    assert_equal "Please contact the bank.", atm.message_on_screen
  end
end

The big disadvantage of writing acceptance tests in pure Ruby like this is that it’s unlikely you’ll be able to show this test to your team’s analyst without their eyes glazing over.


Unless your analyst is, or has recently been, a programmer themselves, they won’t be able to see past the noise of Ruby’s syntax, clean as it may be, to understand the actual behaviour that’s being specified. The specification of behaviour and the implementation of the test are all mixed up together, and that’s a problem if we want to get feedback from our stakeholders about whether we’ve specified the right thing before we go ahead and build it.

If we want the benefits of using plain language to write our behaviour specification, then we need a way to translate it into automation code that actually pulls and pokes at our application. Step definitions give you that translation layer between the plain-language specification of behaviour and the test automation code, mapping the Gherkin steps of each scenario to Ruby code that Cucumber can execute.
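To make the idea concrete, here’s a toy version of that translation layer. This is not Cucumber’s implementation – just an illustration of the shape of it: each step definition pairs a regular expression with a block, and the framework’s job is to match each plain-language step against the patterns and invoke the right block with the captured arguments.

```ruby
# A step definition is just a pattern plus a block. The `world` hash plays
# the role of the shared scenario state Cucumber gives you.
STEP_DEFINITIONS = {
  /^I have \$(\d+) in my account$/ => lambda { |world, amount| world[:balance] = amount.to_i },
  /^I request \$(\d+)$/            => lambda { |world, amount| world[:requested] = amount.to_i },
}

# Match a plain-language step against the registered patterns and run the
# first matching block with the captured arguments.
def execute_step(step_text, world)
  STEP_DEFINITIONS.each do |pattern, body|
    if (match = pattern.match(step_text))
      body.call(world, *match.captures)
      return true
    end
  end
  raise "Undefined step: #{step_text}"
end

world = {}
execute_step("I have $100 in my account", world)
execute_step("I request $50", world)
```

After those two calls, `world` holds `{:balance=>100, :requested=>50}` – the plain-language steps have been translated into executable Ruby, while the feature file itself stays readable by your analyst.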

The cost of this extra layer is complexity: yes, you have more test code to maintain than you would if you stuck to writing your tests in pure Ruby. The benefit is clarity: by separating the what (the features) from the how (the Ruby automation code), you keep each part simpler and easier for its target audience to understand.

Agile / Lean Software Development

Comments (11)


Seven Truths Exercise

Recently, I played a game with a team I was training which I called “Seven Truths of Test Automation”.

I wrote each “truth” on an index card and sealed it in a (blank) envelope. I got to the training room early and hid them around the room – not very well, so that they were quite easy to find.

We did some other stuff first, and people were looking at all these little envelopes sticking out from behind whiteboards and under cushions, wondering what was going to happen.

Then I told them we were going to play a game: there are seven truths of test automation, and they were about to discover them. I split them into six small groups and told them each group would go out and discover a truth. When they found it, they had to ask themselves the following questions:

  • Do we agree with this?
  • What are the implications of acting on this truth?
  • What are we going to do now?

I wrote those questions on a flip-chart. We then played the game out as a group on a single truth to make sure everyone got it. I made sure that we had a full and frank discussion about whether we agreed with the truth or not. We thought through the implications (good and bad) and listed them out. We then talked about some concrete steps we could take. When they offered vague intents like “We’ll start getting better at reducing duplication”, I urged them to say exactly what they were going to do next week to get better at reducing duplication.

Then they split off and each did their own. They wrote up a poster and we did a gallery at the end of the session where they had a chance to share their learning with the group.

It worked really well, generated great energy and used up a good couple of hours.

For the record, I think the truths I used with this group were:

  • test automation is software development (this is the one I picked to do with the whole group)*
  • duplication destroys maintainability*
  • incidental details destroy maintainability*
  • know when it’s OK to cheat
  • some things just aren’t worth testing
  • work with developers to make their systems testable
  • don’t test other people’s code

It felt a bit egotistical handing them down My Seven Truths, so I made the point that they were just my truths, and they would discover their own as they learned to become better testers. You could obviously vary the truths depending on what you think that group needs to hear / discuss, and those truths could be about anything, not just test automation.

Agile / Lean Software Development

Comments (3)


Belly Wants to Eat Your Tests

Ever since I led the team at Songkick through an Acceptance-Test-Driven re-write of their gorgeous web-ui, I’ve been thinking about the problem of scaling a large suite of acceptance tests. By the time I left Songkick for the wilds of Scotland, it took over 3 hours to run all the Cucumber tests on a single machine.

When things take that long, TDD stops being fun.

Intelligent Selection

In order to make an intelligent selection of which tests to run, you need some knowledge of the past history of each of your tests. Most testing tools are like goldfish: they run the tests and show you what failed, then on the next run they wipe the slate clean and start over. Dumb.

Sir Kent Beck, always ahead of the game, has been building an exciting new product to enable precisely this kind of selective testing for Java projects.

But I don’t work on Java projects.

Enter the Belly

I decided to build a web service that would record the history of each scenario in my Cucumber test suite, so that I could start to make decisions about which ones to run. I see no reason why this service can’t be generic enough to work for any kind of test case, but Cucumber scenarios seem like a good place to get started, since that’s where I do a lot of my testing, and they’re often slow.

Belly works by installing a little hook into your Cucumber test suite. When the tests run, Belly sends a message to the central ‘hub’ web service (currently hosted at http://belly.heroku.com) reporting the result of the test. Gradually, Belly builds up a picture of your test suite, which you can browse from the website.
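The hook itself can be tiny. Here’s a sketch of the shape of such a reporting hook – Belly’s actual internals may well differ, and the `post` callable here stands in for an HTTP client posting to the hub:

```ruby
require 'json'

# Build the result payload for one scenario and hand it to a transport
# callable (in the real gem this would POST to the hub web service).
def report_scenario(name, passed, post)
  payload = { "scenario" => name, "status" => passed ? "passed" : "failed" }
  post.call(JSON.generate(payload))
end

# In a Cucumber support file this would be wired up as an After hook,
# something along the lines of:
#
#   After do |scenario|
#     report_scenario(scenario.name, !scenario.failed?, http_poster)
#   end
```

Because every run reports every result, the hub gradually accumulates the per-scenario history that goldfish-brained test runners throw away.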


The current version of Belly is alpha-ware, proof-of-concept. It works, but I’m sure it won’t scale well to thousands of users with thousands of tests. I’m sure you’ll find bugs. It also looks pretty rough, but don’t let that put you off; there’s huge potential here.


Right now, probably the most useful feature is the belly rerun command, which helps you focus on running just the cukes that you’ve broken. Rather than having to keep track of them in a rerun.txt file, Belly will remember everything you’ve broken and give you the output you need to run it again with Cucumber:

cucumber `belly rerun`

You can see a demonstration of how to get started using Belly in this slick and polished screencast.

How To

If you can’t make out the details on the horribly blurry screencast, here’s the summary:

# install the gem:
gem install belly
# write belly's hook and config files into your project:
belly init
# run your features
cucumber features
# see your test results live on the internets. tada!
open http://belly.heroku.com
# see what's left to do
belly rerun
# tell cucumber to run what's left to do
cucumber `belly rerun`


I can’t stress enough how rough-and-ready this is, but I think it’s still useful enough to provoke you into giving me some feedback. Use it at your own risk, and let me know your thoughts.


Incredibly, it turns out that Joe Wilk, my old team-mate at Songkick and fellow Cucumber-core hacker, has been working on another solution to exactly the same problem. Living with a 3-hour build can be quite a motivator! I’m hoping Joe and I can figure out a way to combine our efforts into something really beautiful.

Agile / Lean Software Development

Comments (5)


Acceptance Tests Trump Unit Tests

At work, we have been practising something approximating Acceptance Test Driven Development for several months now. This means that pretty much every feature of the system that a user would expect to be there has an automated test to ensure that it really is.

It has given me a whole new perspective on the value of tests as artefacts produced by a project.

I made a pledge to myself when I started this new job in August that I would not (knowingly) check in a single line of code that wasn’t driven out by a failing test. At the time, I thought this would always mean a failing unit test, but I’m starting to see that this isn’t always necessary, or in fact even wise.

Don’t get me wrong. Unit testing is extremely important, and there’s no doubt that practising TDD helps you to write well-structured, low-defect code in a really satisfying manner. But I do feel that the extent to which TDD, at the level of unit testing alone, allows for subsequent changes to the behaviour of the code has been oversold.

If you think you’re doing TDD, and you’re only writing unit tests, I think you’re doing it wrong.

As new requirements come in, the tensions influencing the design of the code shift. Refactoring eases these tensions, but by definition means that the design has to change. This almost certainly means that some, often significant, portion of the unit tests around that area of the code will have to change too.

I struggled with this for a long time. I had worked hard on those tests, for one thing, and was intuitively resistant to letting go of them. More than that, I knew that somewhere in there, they were testing behaviour that I wanted to preserve: if I threw them out, how would I know it still worked?

Yet those old unit tests were so coupled to the old design that I wanted to change…


In my mind, I have started to picture the tests we write to drive out a system like little strings, each one pulling at the code in a slightly different direction. The sum total of these tensions is, hopefully, the system we want right now.

While these strings are useful to make sure the code doesn’t fall loose and do something unexpected, they can sometimes mean that the code, like Gulliver in the picture above, is too restrained and inflexible to change.

The promise of writing automated tests up front is regression confidence: if every change to the system is covered by a test, then it’s impossible to accidentally reverse that change without being alerted by a failing test. Yet how often do unit tests really give us regression alerts, compared to the number of times they whinge and whine when we simply refactor the design without altering the behaviour at all? Worse still, how often do they fail to let us know when the mocks or stubs for one unit fail to accurately simulate the actual behaviour of that unit?

Enter acceptance tests.

By working at a higher level, acceptance tests give you a number of advantages over unit tests:

  • You get a much larger level of coverage per test
  • You get more space within which to refactor
  • You will test through layers to ensure they integrate correctly
  • They remain valuable even as underlying implementation technology changes

Admittedly, the larger level of coverage per test has a downside: when you get a regression failure, the signpost to the point of failure isn’t as clear. This is where unit tests come in: if you haven’t written any at all yet, you can use something like the Saff squeeze to isolate the fault and cover it with a new test.

They’re also much slower to run, which can be important when you’re iterating quickly over changes to a specific part of the system.

To be clear, I’m not advocating that you stop unit testing altogether. I do feel there’s a better balance to strike, though, than forcing yourself to get 100% coverage from unit tests alone. They’re not always the most appropriate tool for the job.

To go back to the metaphor of the pulling strings, I think of acceptance tests as sturdy ropes, anchoring the system to the real world. While sometimes the little strings will need to be cut in order to facilitate a refactoring, the acceptance tests live on.

The main thing is to have the assurance that if you accidentally regress the behaviour of the system, something will let you know. As long as every change you make is driven out by some kind of automated test, be it at the system level or the unit level, I think you’re on the right track.

Agile / Lean Software Development

Comments (3)


Is the Value Fetish Killing Agile Teams?

Last weekend I was at CITCON Europe, a great opportunity to meet some of the leading minds in the agile software movement. One intriguing new term I heard a few times was “value fetish”. Let me try to explain what I think it means, and discuss the implications for agile teams.

Continue Reading »

Agile / Lean Software Development

Comments (7)


WatiN Goes Cross-Browser

The WatiN (Web Application Testing In .NET) framework, a port of the popular Watir framework in Ruby, has recently announced support for Firefox. This should make it a compelling alternative to Selenium, especially as it looks to be a good deal quicker.

Sweet. Now if only I had a way to serve up an ASP.NET web application from code. Could this be what I need?

Agile / Lean Software Development

Comments (1)


Awesome Acceptance Testing

My notes on Dan North and Joe Walnes’ session at SPA 2008.

Five artefacts:

  • Automation – the glue that binds the tests to the code
  • Vocabulary – the language that the tests are expressed in
  • Syntax – the technology that the tests are expressed in (C#, Java)
  • Intent – the actual scenario being tested
  • Harness – the thing that runs the tests and tells you if they passed

Four roles. People might fill more than one, or more than one person might be in a role:

  • Stakeholder
  • Analyst
  • Tester
  • Developer

Taking a requirement, the Stakeholder and the Analyst have a conversation:

  • what does that requirement mean?
  • how can we create a shared understanding?

Then the Analyst and the Tester have a conversation:

  • what is the scope of (‘bigness’) of this requirement?
  • how will we know when we’re done?
  • => Scenarios (examples)

Tester then ‘monkeyfies’ the scenarios, using the following template:

Given … - assumptions, context in which the scenario occurs.

When … - user action, interaction with the system

Then … - expected outcome

e.g.

Given we have an account holder Joe
And their current account contains $100
And the interest rate is 10%
When Joe withdraws $120
Then Joe’s balance should be $-22

The tester and the developer sit down and write an automated test to implement each scenario.

You might chain these up, but you can always categorise test code into these three partitions (given / when / then). This really helps the way you look at test code.

Consistency Validation Between ‘Units’

See the Consumer Driven Contracts paper on Martin Fowler‘s website.

Tooling for Automation

Consider extending / creating the domain model to cover the application itself – the UI, the back end.

Loads of tools are available. Use whatever works and build on it.

Building a Vocabulary

Ubiquitous Language – Start with a shared language. It becomes ubiquitous when it appears everywhere – documents, code, databases, conversations.

You will use different vocabularies in different bounded contexts. A context might be your problem domain, testing domain, software domain, or the user interface domain.

Beware which roles understand you when you’re talking in a particular domain. Often terms will span domains.

e.g. NHibernateCustomerRepository breaks down into three domains:

  • NHibernate – 3rd-party provider domain
  • Customer – problem domain
  • Repository – software domain

Make your tests tell a story – make it flow. Don’t hide away things in Setup methods that will make the test hard to read. If that means a little bit of duplication, so be it. ‘Damp not DRY’.

Syntax – Implementing Your Tests

  • write your own
    • keep it simple – don’t fart around writing too fancy a DSL; you’ll be surprised what testers / analysts / stakeholders will be prepared to read
    • great way to learn
  • JBehave2
    • training wheels?
  • RSpec
    • very nice
    • create templates for each given / when / then which you can plug together with parameter values into scenarios
  • Fit
  • Concordion
  • NBehave (Joe Ocampo)

Basically what you need is a way to assemble different permutations and combinations of Given / When / Then with different parameters to make different scenarios.
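One way to picture that: each Given / When / Then is a parameterised template, and a scenario is one particular combination of templates filled with particular values. A toy sketch (the template names and steps are made up for illustration):

```ruby
# Parameterised step templates: plug in a value to get concrete step text.
TEMPLATES = {
  :account_balance => lambda { |amount| "Given my account contains $#{amount}" },
  :withdraw        => lambda { |amount| "When I withdraw $#{amount}" },
  :balance_should  => lambda { |amount| "Then my balance should be $#{amount}" },
}

# Assemble a scenario from a list of [template_name, value] pairs.
def build_scenario(title, steps)
  lines = steps.map { |name, value| "  " + TEMPLATES[name].call(value) }
  (["Scenario: #{title}"] + lines).join("\n")
end

puts build_scenario("Simple withdrawal",
                    [[:account_balance, 100], [:withdraw, 40], [:balance_should, 60]])
```

Different permutations of the same three templates with different parameters give you different scenarios, which is exactly the assembly job described above.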

Expressing Intent

Think in terms of narrative, flow. Think in terms of bounded contexts, and who the audience (role) is for that context. Who will understand that vocabulary?

Make sure the intent is clear – that’s the main thing.


The Harness

Do you want to hook into the continuous integration build?

Which version of the code is it going to run against?

Keep the tests in two buckets:

  • in progress
  • done

Those in the ‘done’ bucket should always work; those in progress are allowed to be failing, until you make them pass.

Getting Started

Things you can do today.

  • Try it for your next requirement
  • Given When Then helps guide the tests
  • It’s a collaborative process – get people involved
  • Works for bug fixes
    • a bug is a scenario that you missed in the first place
  • Use the tools you’re most comfortable with
    • doesn’t have to be perfect

Down The Line

What to aim for.

  • ALL requirements have acceptance criteria specified up front
    • helps with estimation
  • acceptance tests are automated where appropriate
    • just having thought about it helps – you may come back to automating it later
  • push-button, available to all
    • helps build trust with stakeholders


  • Automate pragmatically
    • don’t try to automate what you can’t do manually
  • Testing is validating an outcome against intention
  • Non-functional requirements
  • Plan for false positives
  • Quality is a variable
    • doesn’t mean you don’t go test first
    • doesn’t mean low quality code
    • does mean: how complete is the solution? how many scenarios / edge cases are you going to try and meet?


  • Have a shared understanding of done
  • There is no Golden Hammer
  • Be aware of the five aspects of test automation: Automation, Vocabulary, Syntax, Intent, Harness
  • Start simple, then you can benefit now

Agile / Lean Software Development

Comments (5)