We needed to improve quality. Not just once, but all the time. And we needed to do it across more than 100 teams, each of which had a different context and different quality problems. The most widespread problem affected only 6 teams, and more than 70% of the problems affected only one team.
No central plan could help with either of these. We needed a systematic solution, a strategy. But not a central plan. We needed a practice, a habit — something that combined funding with problem fixing, while reinforcing local ownership.
That’s when we invented Safeguarding. That was more than a year ago; teams are owning their quality and actively making things better.
Another company had consistent risk aversion. Commonly, people identified the option with the highest expected value…and then did something else. In that environment, the personal costs of even tiny failures were too high. So everyone consistently chose the lowest-risk option, even when that was a guaranteed bad outcome. Psychological safety was totally missing. And yet each relationship was different, with different risks.
Safeguarding was also an effective strategy for creating Psychological Safety.
The key insight?
No one ever tries to create a bug. Instead, humans are set up to fail, and manage to recover 99.9% of the time. Different environments have different hazards, and thus different frequencies of setting someone up to fail.
If you want fewer failures, make the environment more safe. Making humans more cautious has small payoff and large cost, while fixing hazards in the environment is cheap and effective.
It is, perhaps, hubris to say we invented Safeguarding. Safeguarding is not a new concept. But the implementation is new, and some new concepts follow from it.
The Essence of Safeguarding
Safeguarding consists of the following:
- Get a signal that indicates when important problems happen. (We use bugs.)
- Every time the signal fires, examine the specific case to find hazards.
- Specify a timebox: how much effort is worth spending to prevent problems like the one that just fired the signal (commonly a person-day or two).
- Execute remediations within the current sprint — partial improvements of hazards.
That’s really all there is to it. Most importantly, the following are not part of Safeguarding:
- We don’t look for patterns across multiple instances.
- We don’t deeply analyze what happened to find all the root causes.
- We don’t do any work that would exceed our effort timebox or go beyond the sprint.
- We don’t look for common patterns across teams or try to actively share improvements between teams.
- We don’t base the effort on the remediations we’re going to perform.
- We don’t ever block a remediation if the team decides it’s one of the good-enough ideas for the current hazard.
Adding any of these would decrease the impact. We’ve often had to fight to keep people from adding them, because they seem like good ideas. They simply turn out to be bad ideas in practice. The concerted efforts to do the wrong thing come from our near-universal misunderstanding of change, particularly how organizations change.
Real Change Can’t Be Managed
We often talk about Change Management. Heck, it’s a discipline, even a job. But Change Management operates from a fundamentally flawed assumption. It assumes that there is such a thing as a change.
In truth, change happens all the time. And there is no Right Change. Rather, any real situation has many different contexts, and the right change in any one of them would be an absolutely terrible idea in several others. To change successfully, the organization needs to change in multiple, opposite directions, at the same time.
As an example, in the bugs case mentioned above, 3 teams identified that they needed to add automated integration tests. That was right for their context. 3 others identified that they needed to remove automated integration tests. That was right for their context. They picked exactly the same measure, but needed to make exactly opposite changes. And the other 100+ teams? They didn’t need to do anything with integration tests. Any time spent even reporting that number would be total waste for them.
This means that any shared goal (or measure) is immediately and obviously stupid to those who are undergoing the change. And that’s why change managers say people fear change.
Yet if we look at the real world, we see that all of these people who supposedly fear change are constantly changing themselves. They put huge effort into change, they get excited by it and happy about it. People don’t resist change: they seek it!
The truth is simpler:
People don’t fear change. They fear being changed in obviously stupid ways.
Additionally, changing any system alters the local context. That’s the whole point. This means that when we define a change project or goal, then get started, the fact that we have started the change will invalidate some or all of the goal. It would be obviously dumb to keep going after the same goal.
We need to change our goal, and probably what we are measuring. The first ones who will see this are those doing the work; the last to see this are those “leading” the change. They tend to be further from the actual experience and more dependent on the measures and data, which makes it very difficult for them to see when they are measuring the wrong thing.
Thus, change projects and a unified goal with measures create a conflict between those making the change and those choosing what change to make. All of Change Management is an attempt to resolve that tension, usually in favor of those deciding what change to make.
Yet there is a far more effective solution: make it possible for each team and person to make good choices about what success looks like for themselves, then do that. And that’s what Safeguarding does.
How To Safeguard (Overview)
- Pick a bug. Any bug. (90 seconds, individual)
- Get together the right people, in one room: (15 min, individual)
  - The bug finder, the bug fixer, the bug author, others with critical info from the debugging.
  - Anyone required to approve adding work items to the current sprint (direct manager, product owner).
- Use Parallel Writing + Outline Voting to build a hazard outline, starting from the 3 key questions: (13 min, in meeting)
  - (The before question): what things increased the probability of the author making this mistake in the first place?
  - (The escape question): what things made it harder for us to notice the problem, thus allowing it to escape?
  - (The recovery question): once we knew there was a problem, what things made it more difficult to find the cause & to repair the follow-on effects?
- Use Fist of Three to pick the remediation timebox. (90 sec, in meeting)
- Use Parallel Writing + Outline Voting to identify 2-3 remediations that: (9 min, in meeting)
  - Partially address the selected hazards.
  - Are each completable in no more than 1/4 of the timebox.
- Add the remediations to the current sprint. (3 min, individual)
  - Communicate to stakeholders.
  - Add them to your sprint demo.
- DO THE REMEDIATIONS. This is really the only step that matters. Everything else is here only to set up this step. (time as per timebox)
This blog series will go into details about each of these sections, and a few other critical elements.
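The two sizing rules for remediations in the overview (pick 2-3 of them, each completable in no more than 1/4 of the timebox) are simple arithmetic, so here is a minimal Python sketch of them. It is an illustration, not a tool we use: the `Remediation` record, the `check_plan` function, the example remediations, and the 8-hour person-day are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Remediation:
    """Hypothetical record for one proposed remediation."""
    description: str
    estimate_hours: float


def check_plan(timebox_hours: float, remediations: list[Remediation]) -> list[str]:
    """Return a list of rule violations; an empty list means the plan fits."""
    problems = []

    # Rule from the overview: identify 2-3 remediations.
    if not 2 <= len(remediations) <= 3:
        problems.append(f"expected 2-3 remediations, got {len(remediations)}")

    # Rule from the overview: each one fits in no more than 1/4 of the timebox.
    limit = timebox_hours / 4
    for r in remediations:
        if r.estimate_hours > limit:
            problems.append(
                f"'{r.description}' needs {r.estimate_hours}h, limit is {limit}h"
            )
    return problems


# Example: a two-person-day timebox, assuming 8-hour days (an assumption).
plan = [
    Remediation("Add a log line where the config is loaded", 2.0),
    Remediation("Rename the misleading flag", 3.0),
    Remediation("Rewrite the deployment script", 6.0),
]
for problem in check_plan(16.0, plan):
    print(problem)
```

With a 16-hour timebox the per-remediation limit is 4 hours, so only the third item is flagged. In Safeguarding terms that is a hint to shrink it to a partial improvement, not to grow the timebox.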
The Safeguarding Blog Series
- Culture is a Process, not a Single Change (this article)
- Using Parallel Writing to Engage All the Brains
- Recognizing Hazards
- Recognizing Good Remediations
- Tracking, Data, and Feedback
- Avoiding Common Pitfalls
- Addressing Hazards Beyond Your Control – Without Getting Blocked
- How to Lead Safeguarding Across an Organization
- Safeguarding as a Change Strategy