This afternoon I paired up with a colleague to fix a bug that had been introduced some time ago but, because the effects weren’t very noticeable, had only just come to our attention. Fixing the defect itself was actually quite easy – the real pain was writing a script to clean up the bad data that the bug had been silently strewing all over the database since it sneaked into production.
On my way home I reflected on the root cause of the defect, and how we could have avoided it. The faulty code was pretty good: it read nicely and was obviously written test-first, but there was a tiny leak in the logic, obvious enough with hindsight but easy to overlook at the time. I pondered whether, with the investment of a little more attention at the time of writing, we might have saved my pair and me an afternoon of relatively tedious data clean-up. My reckoning was that we probably could have.
At the time the original defective code was written, the team were under a fair amount of pressure. As a start-up, we’re living on borrowed time, and we had a great product that was almost finished hidden away in private beta while our public offering languished behind the competition. There was real urgency to get the new product released. I’m extremely proud that, under that pressure, the team stuck to the practices we knew made us most effective: writing code test-first, working in pairs and keeping the code clean and defect-free as we went. Or so we thought.
On reflection, I’m surprised to find I’m perfectly comfortable with the fact that we allowed that bug to creep through. Given the choice between going more slowly and avoiding these kinds of fairly subtle mistakes, or going at the pace we did and launching when we did with the product we did, it seems to me that going more quickly was the right choice to make at the time. Although the net amount of programmer time spent was eventually greater, we exchanged that cost for the benefit of being able to launch the product at an earlier date.
This is an extremely dangerous lever to fiddle with. A programming team that allows itself to make too many mistakes will certainly not be able to ship to a predictable schedule, and may never even manage to ship at all. We were lucky that the damage this bug did to the data could be completely repaired: a more serious error might have left us having to contact users and apologise for losing their data, for example.
In this case I think we got the balance just about right, but that took skill, experience, and probably a bit of luck too.
Interesting, thoughtful, and well put. The idea makes me nervous, as so many people get the balance wrong, but it sounds like you folks came through ok.
I’d be interested to see a follow-up post that addresses other common levers. For example, could you have made time by removing features? Could you have reduced schedule pressure with more frequent releases? Could the application of more cash, or time stolen from other parts of the company, have helped?
That’s not to suggest you guys made a bad choice; it sounds like you thought you were doing fine at the time. I’m just curious what you might do differently next time.
Thoughtful piece. I wonder how many companies are sacrificing the basics (writing code test-first, working in pairs and keeping the code clean and defect-free) at the altar of the credit crunch. I know we are, and it feels like a ticking bomb that I’m pretty certain is going to come back and bite me (I know that’s a mixed metaphor).
Good work fella.
Great post, brings back memories of my time with a startup launching a new .com. Like you say, it’s a fine line.