Fix RubyMine 2.0.2 Cucumber Integration

If you’re using the latest version of RubyMine (2.0.2) with the latest version of Cucumber (actually anything above 0.7), you’ll probably see this ugly warning when you try to run your cukes from within the IDE:


The bug has been logged, and there’s a published workaround, but I wanted something a bit easier to use.

Try this instead. Close RubyMine, open a terminal, and run this command:

curl | patch -p0

or even

curl -L | patch -p0

It won’t work with Cucumbers older than 0.7, but why would you want to use them?

Update: If you like life on the bleeding edge, you can also try the EAP release (latest development build) of the forthcoming RubyMine 2.5, which contains this fix.

Belly Wants to Eat Your Tests

Ever since I led the team at Songkick through an Acceptance-Test-Driven re-write of their gorgeous web-ui, I’ve been thinking about the problem of scaling a large suite of acceptance tests. By the time I left Songkick for the wilds of Scotland, it would take over 3 hours to run all the Cucumber tests on a single machine.

When things take that long, TDD stops being fun.

Intelligent Selection

In order to make an intelligent selection of which tests to run, you need some knowledge of the past history of each of your tests. Most testing tools are like goldfish: they run the tests and show you what failed, then on the next run they wipe the slate clean and start over. Dumb.

Sir Kent Beck, always ahead of the game, has been building an exciting new product to enable precisely this kind of selective testing for Java projects.

But I don’t work on Java projects.

Enter the Belly

I decided to build a web service that would record the history of each scenario in my Cucumber test suite, so that I could start to make decisions about which ones to run. I see no reason why this service can’t be generic enough to work for any kind of test case, but Cucumber scenarios seem like a good place to get started, since that’s where I do a lot of my testing, and they’re often slow.

Belly works by installing a little hook into your Cucumber test suite. When the tests run, Belly sends a message to the central ‘hub’ web service, reporting the result of each test. Gradually, Belly builds up a picture of your test suite, which you can browse from the website.


The current version of Belly is alpha-ware, proof-of-concept. It works, but I’m sure it won’t scale well to thousands of users with thousands of tests. I’m sure you’ll find bugs. It also looks pretty rough, but don’t let that put you off; there’s huge potential here.


Right now, probably the most useful feature is the belly rerun command, which helps you focus on running just the cukes that you’ve broken. Rather than having to keep track of them in a rerun.txt file, Belly will remember everything you’ve broken and give you the output you need to run it again with Cucumber:

cucumber `belly rerun`

You can see a demonstration of how to get started using Belly in this slick and polished screencast.

How To

If you can’t make out the details on the horribly blurry screencast, here’s the summary:

# install the gem:
gem install belly

# write belly's hook and config files into your project:
belly init

# run your features
cucumber features

# see your test results live on the internets. tada!

# see what's left to do
belly rerun

# tell cucumber to run what's left to do
cucumber `belly rerun`


I can’t stress how rough-and-ready this is, but I think it’s still useful enough to provoke you into giving me some feedback. Use it at your own risk, and let me know your thoughts.


Incredibly, it turns out that Joe Wilk, my old team-mate at Songkick and fellow Cucumber-core hacker, has been working on another solution to exactly the same problem. Living with a 3-hour build can be quite a motivator! I’m hoping Joe and I can figure out a way to combine our efforts into something really beautiful.

Battling Robots at Software Craftsmanship 2010

I’ve submitted a session for the Software Craftsmanship 2010 Conference. It’s a redux of the Robot Tournament I ran at SPA2010.

The idea behind the session is to simulate the life of a start-up software company. In the early rounds of the tournament, the priority for each team is to get a robot, any robot, out there and playing matches. As the tournament progresses, quality becomes more important as you need to adapt your robot to make it a better competitor.

This ability to adapt your approach to the context you’re working in is, I think, really important for the true craftsperson to grasp. If you’ve been reading Kent Beck’s posts about start-ups and design, you’ll be familiar with this subject. It’s wonderful to be able to patiently produce a beautifully-worked piece of furniture, but what if the building is burning down and you just need a ladder to escape, right now? Can you rip up some floorboards and knock something together from the materials to hand and save your family?

There’s a great skill in understanding and working to the appropriate level of quality for the context you’re currently in, and I hope this session gives people a little insight into that.

You can watch my screencast audition for the session here. Please leave any feedback or comments here.

Hi-Fidelity Project Management

If the only metric you use for measuring and forecasting your team’s progress is their iteration velocity, you’re missing out on a great deal of richer information that, for just a few extra minutes per day, you could easily be collecting. This is information that the team can use during the iteration to help spot and fix problems that are holding them back.

If you’re using scrum, you’re probably familiar with using a Release Burndown chart to track the team’s progress, iteration by iteration, towards a bigger goal:

Release Burn-Down Chart

Here we can see the team’s velocity appears to be slowing down. They might still release at the end of Sprint #9, but there’s a chance, if the flattening trend in their velocity continues to worsen, that they may never finish the project at all.

We have enough information here to know that something may be wrong. We can use this information to warn stakeholders that our release date may be at risk of slipping, and may also start looking through the backlog of stories for things we can drop.

The problem with these responses is that they’re purely reactive. Assuming there is a systemic cause at the root of this slow-down, we’re not doing anything to deal with it and get the team back on track. We can go and ask the team what they think might be wrong, and try to help them correct any problems they can identify. This can often work, but sometimes the team may not have the experience or perspective to be able to see what’s going wrong.

Another problem is that we’ve had to wait until the end of the iteration to get this information. For an organisation that’s used to going into the darkness of 9-month waterfall projects, getting iteration velocity figures once a fortnight can seem pretty decent. An awful lot can happen inside an iteration though, and having that summed up by a single number loses a lot of important detail. Like the signal on an old short-wave radio, this is low-fidelity data.

One crucial piece of data that’s hidden within the burn-down chart is what I call the rate of discovery. The whole point of agile development is that it’s iterative: we build and ship something, get some feedback, learn from that, build and ship some more, and repeat until our stakeholders are happy with it. If we’re doing iterative (as opposed to repetitive) development, we’re going to discover new user stories as we go through the project, perhaps even as we go through the iteration. This is a good thing: it means we’re listening to our customers. We need to make sure our project plans can handle this as a matter of course.

Going back to our release burn-down, we want to separate the rate of discovery from the rate of delivery. A great way to do this is simply to flip the vertical axis of the chart, and use a Release Burn-Up instead. On here we can start tracking two lines instead of one. First we draw the number of completed stories (or story points), and then stacked on top of that, we draw the number of stories still to do. That includes any story not yet done – whether it’s in the backlog or being worked on in the next iteration.

I love these charts – they seem to easily map to people’s understanding of the project. When you explain that the area underneath the bottom line represents all the features that have been done, it’s easy for anyone involved with the project to quickly understand what it means. In the case of the chart above, we can identify that while the team are delivering at a pretty steady rate, they’re discovering features at a steady rate too. They’ll probably need to de-scope if they want to meet their release date.
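The two stacked lines of a burn-up can be derived from a simple tally kept per iteration. Here is a minimal sketch in Ruby, assuming you record how many stories were completed and how many were newly discovered in each iteration. The `Iteration` struct, the `burn_up` method and all the figures are made up for illustration.

```ruby
# Hypothetical per-iteration tallies: stories completed and stories newly discovered.
Iteration = Struct.new(:name, :completed, :discovered)

# Returns one row per iteration with the two stacked values for the burn-up:
# the running total of done stories, and the stories still to do on top of it.
def burn_up(initial_scope, iterations)
  done = 0
  scope = initial_scope
  iterations.map do |iteration|
    done += iteration.completed    # rate of delivery
    scope += iteration.discovered  # rate of discovery
    { name: iteration.name, done: done, to_do: scope - done, total: scope }
  end
end

iterations = [
  Iteration.new("Sprint 1", 5, 3),
  Iteration.new("Sprint 2", 6, 4),
  Iteration.new("Sprint 3", 5, 4)
]

rows = burn_up(30, iterations)
rows.each do |row|
  puts format("%-9s done=%2d to_do=%2d total=%2d",
              row[:name], row[:done], row[:to_do], row[:total])
end
```

With these made-up numbers the team delivers about five stories per sprint but discovers nearly four, so the remaining work barely shrinks, which is the kind of pattern the burn-up makes visible.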

We can add extra fidelity to this chart in two dimensions: We can collect samples more often, and we can collect details about just how done the stories are as they move from ‘not done’ into ‘done’. Let’s start by collecting more detail about how done the stories are. Imagine our task board looks like this at the end of an iteration:

One story (Story A) is done, and 8 other stories are not done. As we track these counts over time, we can draw one line on our Release Burn Up chart for each category of ‘not done’ and stack the lines:

This chart has another name, Cumulative Flow Diagram (CFD). We call it this because as stories flow from ‘not done’ to ‘done’ across the task board, we’re drawing the accumulation of that flow on the diagram. There are lots of things we can glean from this diagram.

If we look at the example above, we can see that work is stacking up in the design stage of our process. Because our CFD chart highlights this, we can put more directed effort into relieving the bottleneck on the designers, perhaps by adding an extra analyst to the team to run ahead and do some more detailed analysis of the upcoming stories in the backlog, or by helping the designers to break the existing stories up into smaller ones that are easier to understand.

You can wait until the end of the iteration to count these numbers, but why stay ignorant? If you collect this data every day, you’ll get quick feedback about where bottlenecks are appearing in your team, and be able to try small tweaks to correct them.
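Collecting the daily numbers can be as simple as counting the cards in each column of the task board. Here is a rough sketch, assuming a board snapshot is a hash of story name to column. The column names and the snapshots are hypothetical, so substitute the stages from your own board.

```ruby
# Column names are assumptions; use whatever stages appear on your own task board.
COLUMNS = %w[backlog design development testing done].freeze

# Count how many stories sit in each column in one snapshot of the board.
def column_counts(board)
  counts = board.values.tally
  COLUMNS.to_h { |column| [column, counts.fetch(column, 0)] }
end

# Hypothetical daily snapshots: story name => column it is currently in.
snapshots = {
  "Mon" => { "A" => "development", "B" => "design", "C" => "design", "D" => "backlog" },
  "Tue" => { "A" => "testing", "B" => "design", "C" => "design", "D" => "design" },
  "Wed" => { "A" => "done", "B" => "development", "C" => "design", "D" => "design" }
}

# One row of counts per day; stacking these rows gives the bands of the CFD.
cfd = snapshots.transform_values { |board| column_counts(board) }
cfd.each { |day, counts| puts "#{day}: #{counts}" }
```

A band that keeps getting thicker from day to day (the design column in this invented data) is where work is piling up.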

Random Notes from SPA2010

Usage-Centric Design

Narrative Journey Maps

  • Duncan prefers the term Usage-Centred Design to User-Centric Design. There was a book reference here but I missed it.
  • Narrative Journey Maps (NJM) are a way to model and visualise the steps a user has to follow as they try to achieve a goal.
  • Each step is decorated with:
      • Comments
      • Questions
      • Ideas
  • An Emotional Barometer is drawn across the top which highlights pain-points where the user may be particularly frustrated with the process.
  • NJMs are usually created from a Contextual Study where users are quietly observed trying to reach the goal.
  • They are a way to record the current state of play, and a place to start thinking about how things could change.

Personas


  • Alan Cooper 1985
  • Based on real data: capture and use direct observations of real experience
  • Group and merge those observations to form personas
  • Personas have:
      • Motivations
      • Goals
      • Frustrations
      • Behaviours

It seemed that what had worked for the presenters was to focus on just one persona at a time, when it became obvious who their core user was, and work at satisfying their needs. Once they’d started to make significant inroads into this, it became clearer and more worthwhile to look for other personas.

To Read

Luke Hohmann’s Innovation Games
Mr Squiggle (seed drawings for workshop exercises)

  • Quaker Business Method
  • Sociocracy
  • Formal Consensus (CT Butler)
  • “Participatory Decision Making” – Sam Kaner

“The Logical Thinking Process” – H William Dettmer. A book on conflict clouds.
“Round the Bend” – Nevil Shute. Like Zen and The Art of Motorcycle Maintenance but English

Other Conferences

Keith Braithwaite was heading out to ‘Next Generation Testing’ and said he was really looking forward to it. He said it was quite easy to stand out from the crowd of big vendors, and that if you have something original to say you’ll likely be asked back. He also mentioned ‘Test Expo’ as being another conference in this vein.

I’m really interested in taking the Cucumber message to these conferences.

Remote Pairing and Wolf-pack programming

I met several people who are making use of this, or would like to make more use of it. Many of them were using Ubuntu so don’t have access to iChat, and were struggling with slow VNC connections. I suggested screen + emacs/vim to a few people (not that I’ve used it myself, but I’ve heard good things). People mentioned plug-ins for Eclipse, and my old favourite SubEthaEdit came up. It does feel like there’s a product missing here.

Some guys did a BoF trying out a crazy contraption they had using a Smalltalk environment that allowed a dozen people to all edit the same code at the same time, on their own workstations. It sounded pretty amazing.

My Session

I ran a session, Robot Tournament, at the conference. Despite what I had considered thorough preparation, I had some rather exciting moments when the tournament engine spluttered and needed some running repairs. Overall though the feedback I got was positive. Some observations:

  • The (accidental) downtime gave people an opportunity to make build scripts and so on. I wonder whether this could be engineered deliberately another time.
  • More logging and visibility of exactly what’s going on when a player runs would be useful to help participants with debugging.
  • The warm-up should include calling a robot with a command-line argument so that any teething problems with reading the input can be resolved at a relaxed pace.
  • A better explanation (role play?) of how the tournament works would help.
  • Need to limit the number of players to 1 per team. Although it was worth experimenting with allowing more than one, there were a couple of disadvantages that seemed to outweigh the advantages:
      • When people realised they could write scripts to add several robots, this really slowed down the time to run a round due to the number of permutations of matches. I guess you could deal with this by using a league system, but for now the simplest thing seems to be to just limit the number of players.
      • There is a strategy (which the winning team used) where you use a patsy player which can recognise a game against another player from the same team and throw the game, thus giving that player an advantage. By releasing several patsy players you leverage that advantage.
  • I was surprised (and a bit disappointed) at how conservative most of the language choices were. I think we had 3 Ruby robots, 2 Java ones and one Haskell robot. Sadly I couldn’t get Smalltalk working for the guy who wanted to use that. It seemed clear that rather than one language being particularly better than another for the problem at hand, teams who used a language they were familiar with did best.
  • It was hard for people to see what was going on when they were getting their robots running. More visibility of exactly what it looks like when their program is run on the server environment would be helpful.
  • Also more functionality on the UI to slice the results and look at just how your own robot had performed.
  • The problem was so small that tests were hardly needed. Pivoting, changing the rules of the game half-way through the tournament might have helped here.
  • I would also be interested in trying out variable-length iterations – some long ones, some short ones.
  • Shipping simple solutions early was definitely a strategy that had worked for everyone.
  • People enjoyed the fact that the goal – getting points – was clear, so that rather than it being about writing clean code or writing tests, it was more realistic to a business context.
  • Trying a more open game where you could learn more about your opponent might be interesting
  • Getting teams to swap code might also be interesting
  • Doing a code show & tell wasn’t in my plan but worked really well
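The slowdown from scripted extra robots mentioned above is easy to see with a little arithmetic. Here is a rough sketch, assuming each robot plays every other robot once per round. That is an assumption made for illustration, not a description of how the tournament engine actually pairs players, and the robot names are invented.

```ruby
# Assumption: a round is a full round-robin, one match per pair of robots.
def matches_per_round(robots)
  robots.combination(2).count
end

# Hypothetical line-ups: one robot per team, then teams adding extras.
[6, 12, 24].each do |number_of_robots|
  robots = (1..number_of_robots).map { |n| "robot-#{n}" }
  puts "#{number_of_robots} robots => #{matches_per_round(robots)} matches"
end
```

Doubling the number of robots roughly quadruples the matches in a round, which is why a cap of one player per team (or a league system) keeps round times manageable.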

The session format ended up being something like this:

  • 10 minutes introduction
  • 25 minutes warm-up
  • 30-45 minutes faffing around fixing the engine while people started to build their real robots
  • 7 x 7 ≈ 50 minutes tournament rounds
  • 25 minutes code show & tell
  • 15 minutes retrospective looking at what we’d learned and any insights

Fancy a Game of Robots?


I’m running a session next week at the SPA conference where we’re going to have a battle between rival teams of programmers.

I’ve been working on the tournament engine for the past few weeks and really picked up pace on it this week. I hadn’t realised it would be so much work!

The thing is, dear reader, I would really like to test the tournament engine out before the session runs for real at the conference. So I’m proposing to run the tournament via the web some time this weekend. It will take at most 2 hours. I’ll set up an IRC channel so we can heckle each other.

If this sounds like your idea of fun, please sign up here indicating the best time for you, and I’ll be in touch.

That’s right, you can SIGN UP HERE

Agile North 2010

I’ll be speaking at Agile North this year. The title of my talk is “The Lean Startup”, in which I’ll describe my experiences with an incredible young company I’ve been working with for the last couple of years.

Here are some more details about the conference:

Friday 14th May 2010 – UCLan, Preston

Price held at £95 per person – be there to learn, share, take part and enjoy

Details on

MegaMutex: A Distributed Mutex for Ruby

Sometimes I need to do this:

unless enough_widgets?
  make_more_widgets
end

Which is all well and good, until I start letting two or more copies of this code run in parallel. If you’ve never thought about this before, what can happen is something nasty called a race condition, where two or more processes (or threads) simultaneously check #enough_widgets?, and simultaneously both decide that they need to go and #make_more_widgets. With multiple processes now making more widgets, we end up with too many.

The solution is to lock this critical section of code so that only one process could ever run it at once – everyone else has to queue up and wait their turn. That way each check for #enough_widgets? will return an answer that’s accurate. In a single process with threads, this is achieved using the Mutex class, but when you run multiple processes in parallel, across multiple machines, you need something more. You need MegaMutex.
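For the single-process case, here is a minimal sketch of the same check-then-act sequence guarded by Ruby’s built-in Mutex. The `WidgetStore` class, its `TARGET` constant and the method bodies are hypothetical stand-ins for illustration; only `Mutex#synchronize` and `Thread` come from Ruby itself.

```ruby
# Hypothetical stand-in for whatever owns the widgets.
class WidgetStore
  TARGET = 10

  attr_reader :widgets

  def initialize
    @widgets = 0
    @lock = Mutex.new
  end

  def ensure_enough_widgets
    # Only one thread at a time may run the check-then-act sequence.
    @lock.synchronize do
      make_more_widgets unless enough_widgets?
    end
  end

  private

  def enough_widgets?
    @widgets >= TARGET
  end

  def make_more_widgets
    sleep 0.01 # simulate slow work, which would widen the window for a race
    @widgets += TARGET
  end
end

store = WidgetStore.new
threads = 5.times.map { Thread.new { store.ensure_enough_widgets } }
threads.each(&:join)
puts store.widgets # prints 10: only the first thread through the lock makes widgets
```

This lock lives in the memory of one Ruby process, so it cannot coordinate separate processes or machines; that is the gap a distributed mutex fills.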


Suppose you have a WidgetMaker:

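class WidgetMaker
  include MegaMutex

  def ensure_just_enough_things
    # Only one process at a time, across all machines, may run this block.
    with_distributed_mutex("WidgetMaker Mutex ID") do
      unless enough_widgets?
        make_more_widgets
      end
    end
  end
end
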
Now, thanks to the magic of MegaMutex, you can be sure that all processes trying to run this code will wait their turn, so each one will have the chance to make exactly the right number of widgets without anyone else poking their nose in.


MegaMutex uses memcache-client to store the mutex, so your infrastructure must be set up to use memcache servers.

By default, MegaMutex will attempt to connect to a memcache on the local machine, but you can configure any number of servers like so:

MegaMutex.configure do |config|
  config.memcache_servers = ['mc1', 'mc2']
end


sudo gem install mega_mutex


Agile 2009 Session – Debugging Pair Programming – Part 1

I had the chance to run a 90 minute session at this year’s Agile Conference in Chicago. The conference as a whole was a terrific experience but here I’d like to talk about my session, and what I learned from it.

The session focusses on the barriers that prevent people from adopting pair programming. I’ve encountered a whole variety of reasons for this in different teams I’ve worked on, and the first part of the session involved the participants working in groups to talk about specific characters I’d created who exhibit some of these problems.

We used these discussions as a catalyst to look over the general set of issues that can be a barrier to people’s adoption of pairing. We then went on to talk about tips and tricks for tackling these issues, but I’ll save those for another post.

Here’s what we came up with.


Lack of Feedback Within the Team

Trying out a radical new practice like pair programming can feel pretty disruptive to some people. It’s important when a team is going through such volatile changes that they get plenty of opportunities to feed back to one another about how it’s going. It’s also important that they recognise the benefits that pairing can bring for them. Offering pairing as a solution to a problem that the team has identified for itself will make for much more enthusiastic adoption than simply mandating it as a necessary agile practice.

Lack of Management / Team Approval of Pairing as a Valuable Practice

Again, if the team doesn’t recognise that pairing will help them, they’re unlikely to really give it an enthusiastic – and therefore successful – shot. More importantly, if management isn’t actively encouraging teams to try the practice out, it’s likely that at least some of the team will be reluctant to try it for fear of displeasing their boss.

Lack of Trust in Team from Management

The benefits of pairing are counter-intuitive: two people working together are actually much more than twice as productive as if they were coding solo, yet this is not obvious to the casual or cynical observer. Management don’t necessarily need to understand this, but they certainly need to make the team feel trusted to make their own decisions about whether the practice can make them more effective. If the team don’t feel trusted by their management, they’re unlikely to try out a practice that can seem mysterious and even frightening to the uninitiated.

Dilbert Agile Programming

Lack of Trust Between Team Members

Pair programming brings people much closer together than they might normally have to be in a work situation. Thoughts and opinions are shared in rapid succession about quite subjective issues. It’s important that team members trust each other in order that they can settle in to this new level of intimacy.

Lack of Respect for Cultural / Gender Differences

Again, the level of intimacy between team members that pairing demands can cause problems when people fail to consider one another’s backgrounds. Different people have different needs for personal space, for example. People with an intense need for personal space may be put off pairing simply by the idea of having someone sitting in such close proximity to them. Pairing that person with someone who is from a very tactile culture can be a disaster! It’s important to respect and work around these kinds of issues as they are usually deep seated and cannot be simply rationalised away.

Need the Right Incentives

Often team members, especially from organisations used to a more traditional way of working, can be given incentives that don’t make pairing the obvious choice. If people tend to be singled out for praise for delivering a particular feature, for example, it’s likely that they will be keen to own features by themselves so that they can take all the credit.

Lack of Experience at Pairing

There are some basic protocols that make pairing work effectively – talking about what you’re doing; swapping the keyboard when a new test is written or a test passes for the first time; taking regular breaks – but these are tricks that people tend to have learned through experience. On a team full of people who have never tried pairing before, you can expect some fairly basic mistakes to be made that may make pairing feel less enjoyable or effective than it could be.

Fear


This seems to be a significant barrier for people who’ve never tried pairing before. Some people are simply afraid of the unknown – they may have been programming solo since they were quite young and are really used to this way of working.

The other big fear is the one of intimacy – having to share your ideas and opinions freely and have them instantly judged by someone else is a much more direct way of working than many programmers are used to. This goes along with a fear of looking stupid – many programmers fear their ideas or designs are not that good, and would rather not have to demonstrate this to a pair.

A more sinister fear is the one some ivory-tower builders can have of losing their job security: if they are forced to share their ‘unique’ domain knowledge with their team members, will they lose their power and potentially their hold on their job?


Joseph Pelrine has talked about the higher-than-average incidence of Asperger’s Syndrome or other ASD psychological conditions within the computer programming profession. People on the autistic spectrum demonstrate many skills that make them excellent programmers, yet they also suffer real barriers in communicating with other people. These barriers can make the close collaborative experience of pairing stressful for both them and their pair, and it’s important to be able to recognise the signs of this and work around them.

Need to Understand the Importance of Learning

A big part of the benefit of pairing is the spread of knowledge throughout the team, both about the problem domain and the technology being used. When an organisation simply doesn’t understand the value of learning, it may not yet be ready for the benefits that pairing can bring.