Recovery over Perfection, Predictability over Commitment, Safety Nets over Change Control, Collaboration over Handoffs

Author: Elisabeth Hendrickson

A couple weeks ago I tweeted:

Apparently it resonated. I think that’s more retweets than anything else original I’ve said on Twitter in my seven years on the platform. (SEVEN years? Holy snack-sized sound bytes! But I digress.)

@jonathandart said, “I would love to read a fleshed out version of that tweet.”

OK, here you go.

First, a little background. Since I worked on Cloud Foundry at Pivotal for a couple years, I’ve been living the DevOps life. My days were filled with zero-downtime deployments, monitoring, configuration as code, and a deep antipathy for snowflakes. We honed our practices around deployment checklists, incident response, and no-blame post mortems.

It is within that context that I came to appreciate these four simple statements.

Recovery over Perfection

Something will go wrong. Software might behave differently with real production data or traffic than you could possibly have imagined. AWS could have an outage. Humans, being fallible, might publish secret credentials in public places. A new security vulnerability may come to light (oh hai, Heartbleed).

If we aim for perfection, we’ll be too afraid to deploy. We’ll delay deploying while we attempt to test all the things (and fail anyway because ‘all the things’ is an infinite set). Lowering the frequency with which we deploy in order to attempt perfection will ironically increase the odds of failure: we’ll have fewer turns of the crank and thus fewer opportunities to learn, so we’ll be even farther from perfect.

Perfect is indeed the enemy of good. Striving for perfection creates brittle systems. So rather than strive for perfection, I prefer to have a Plan B. What happens if the deployment fails? Make sure we can roll back. What happens if the software exhibits bad behavior? Make sure we can update it quickly.

Predictability over Commitment

Surely you have seen at least one case where estimates were interpreted as a commitment, and a team was then pressured to deliver a fixed scope in fixed time. Some even think such commitments light a fire under the team. They give everyone something to strive for.
It’s a trap.

Any interesting, innovative, and even slightly complex development effort will encounter unforeseen obstacles. Surprises will crop up that affect our ability to deliver. If those surprises threaten our ability to meet our commitments, we have to make painful tradeoffs: Do we live up to our commitment and sacrifice something else, like quality? Or do we break our commitment? The very notion of commitment means we probably take the tradeoff. We made a commitment, after all. Broken commitments are a sign of failure.

Commitment thus trumps sustainability. It leads to mounting technical debt. Some number of years later find themselves constantly firefighting and unable to make any progress. The real problem with commitments is that they suggest that achieving a given goal is more important than positioning ourselves for ongoing success. It is not enough to deliver on this one thing. With each delivery, we need to improve our position to deliver in the future.

So rather than committing in the face of the unknown, I prefer to use historical information and systems that create visibility to predict outcomes. That means having a backlog that represents a single stream of work, and using velocity to enable us to predict when a given story will land. When we’re surprised by the need for additional work, we put that work in the backlog and see the implications. If we don’t like the result, we make an explicit decision to tradeoff scope and time instead of cutting corners to make a commitment.

Aiming for predictability instead of commitment allows us to adapt when we discover that our assumptions were not realistic. There is no failure, there is only learning.

Safety Nets over Change Control

If you want to prevent a given set of changes from breaking your system, you can either put in place practices to tightly control the nature of the changes, or you can make it safer to change things. Controlling the changes typically means having mechanisms to accept or reject proposed changes: change control boards, review cycles, quality gates.

Such systems may be intended to mitigate risk, but they do so by making change more expensive. The people making changes have to navigate through the labyrinth of these control systems to deliver their work. More expensive change means less change means less risk. Unless the real risk to your business is a slogging pace of innovation in a rapidly changing market.

Thus rather than building up control systems that prevent change, I’d rather find ways to make change safe. One way is to ensure recoverability. Recovery over perfection, after all. Fast feedback cycles make change safe too. So instead of a review board, I’d rather have CI to tell us when the system is violating expectations. And instead of a laborious code review process, I’d rather have a pair work with me in real time.

If you want to keep the status quo, change control is fine. But if you want to go fast, find ways to make change cheap and safe.

Collaboration over Handoffs

Full article:

You may also like...

Leave a Reply