Someone I know had the following conversation recently during code review:

[WontFix] //depot/[elided].cs: Line 232

[reviewer]: Unit test this directly

[author]: the integrations test this. Don’t see any direct benefit of UT it directly. (26 minutes ago)

He wanted advice on how to get past this. His goal is to get his team to write direct unit tests, as opposed to multi-unit component and integration tests. But how? And what does he do in code review to help the team accomplish that in specific cases?

Code reviews aren’t about learning or change

First the dirty little secret of code reviews: no oneâ€™s mind is ever changed during a code review (CR). People can be reminded about something they already believe and go change the code to match their belief. But they wonâ€™t change their thinking. Several aspects of CR give this result:

CR is an after-the-fact check; the dev has moved on (or wants to).
CR is a critique. Most people highlight the mistakes (we are trained to), not the successes. This builds conflict and inhibits rewarding risk-taking.

Learning is all about taking a risk and getting feedback before a decision is made. CR fails both parts.

So why code review?

Simple: code review is excellent at helping people maintain their own agreements.

The hardest period in making a change, especially as a team, is in changing the habit. Getting agreement about the problem or the solution is relatively easy. Actually changing the habit and applying the new way of thinking consistently takes effort.

Code review is a great way to reinforce habits, because:

CR is an after-the-fact check. The dev has already corrected any mistakes he noticed. This is defense in depth.
CR is detailed. No summarized info, we can see all the specifics for each case.
CR is semi-public. It is seen only by trusted people trying to make the same change, but all of them can see it. So you can get social support for admitting and fixing your mistake and others can learn from it.

Habit forming is all about setting up social situations to help backstop individual responsibility. The individual is responsible for changing their own habits, but the social systems give them feedback and support as they are learning to notice specific mistakes and fix them. CR does this well.

Highlight successes, not failures

Side note: don’t highlight mistakes in CR. Hunt for successes and highlight those.

Then share the CRs widely. Much like every mistake will be made by someone on your team, every good behavior will be done by someone. Highlight those moments of success and repeat them.

Now people can follow examples to the good things, rather than trying to figure out the good thing from hundreds of bad examples.

Bad examples are useful after the team has nearly attained mastery. Cleaning up the final errors to achieve perfection has its time and place. But they’re only useful once the bad example is the minority case.

A great example of this is when a team is first starting to write dev automated tests.

At this point it doesn’t matter whether they write the test first or after, whether the test is a unit, component, or integration test, or really about anything else. We’re just trying to establish the habit of devs writing automated tests at all (or, rather, doing so consistently).

So call out the people who write tests. Ignore whether those are “good” tests. If you start seeing “good” tests, then call those out too. Highlight that not only did this person write tests, they also refactored the code to create units, and then wrote unit tests.

Celebrate the tests that everyone wrote. Celebrate the refactoring that this person did. Celebrate the greatness of these tests. Ignore the failures and people will change their habits to align with the celebrated successes.

How to change the way we code

We want multiple systems. CR helps us maintain our agreements. We want another system that is good at learning and at making agreements about habits to change.

The key is to get agreement and alignment outside of the specific case. We need total alignmentâ€”consent or consensus, not just majority.

So Iâ€™d bring this review result up to the whole team in a way that scrubs the names. Start a discussion about unit testing and ask why people unit test and why they donâ€™t.

Ask the team whether this kind of judgment call makes senseâ€”should we evaluate whether a UT is required in this case, or should we just blanket require it and assess whether an integration test makes sense? Donâ€™t push a particular result or agenda; let the team decide what it wants.

Then help hold the team to that standard during CR. Get the rest of the team to do the same.

Why this works

This is intentionally manipulative. It is a way for the team to take advantage of their own psychological biases.

First, people decide things based on the information readily accessible to them (What You See Is All There Is). People’s subconscious minds (which makeÂ the decisions that we then rationalize)Â don’t take any other information (or absence of information) into account.

During the discussion, specifics of cases are out of context (as long as you don’t bring them in). This drives the team towards optimism. They are more likely to agree / align to something that is right in general but may not apply to some specific case.

They are willing to try things that agree with their beliefs and with 95% their experience, even if there would be contradictions with some of their experience.

This helps the team make an agreement that aspires to 1-2 steps beyond what they can do today. The team chooses to try to be a little better than they areâ€”in whatever way they think is better.

Get them to state that agreement as a simple rule, and ask them to name the rule. Make it pithy and short (less than 5Â words). This makes it easy to access, so the subconscious will treat it as high-importance: it will always be something the subconscious sees.

Second, we are preying on our own commitment bias. Once a person has committed to a thing, even in just a tiny way, they will make subsequent decisions in a way to support that initial decision.

By ensuring that the team made at least a consent decision, every single person on the team has made a partial commitment to the agreement. Now their subconscious will push them towards follow-on decisions that support that agreement.

Everyone has biased themselves in favor of the direction they wanted to go.

Third, we are now all on the same side. There is no us/them conflict.

When a CR comes up you say â€œthis needs a UT,â€ they say â€œNah, covered by integration,â€ and you say â€œNever let a mess growâ€ (or whatever your rule is). Theyâ€™ve already agreed to that, so you are reminding them of their agreement.

Suddenly your words are peer support helping them meet their agreement, rather than a critique trying to get them to do something that they don’t see as relevant and preventing them from completing this task. They are much more likely to make the change.

Of course, we can do better

I could hardly leave this topic without discussing pairing.

Pairing works well because it has the good parts of a code review for habit formation, but a fundamentally different character for agreement and learning. The discussion happens before a decision is made. The two people are on the same side, trying to get stuff done well. Feedback is frequent, and most of the feedback is about things done well, not mistakes made. Pairing supports both learning and habit formation.

Of course, it still misses agreement. Getting a simple rule across the full team requires some structure beyond the pair. Retrospectives are the usual solution. Use your retrospectives to generate either one experiment to try or one habit to change (either to be completed before the next retro), and do them often (no less frequent than weekly; I currently like every 2 days). The main focus is on executing those experiments and habit changes, and that’s where pairing works.

Another option is mobs. I wonder (speculation alert!) if this is a big part of the secret of why mobbing works. Not only do you get the continuous learning advantages from pairing, but full-team agreement becomes continuous. You no longer wait for retrospectives to discuss things. You no longer discuss in abstract. Instead you make a decision in the moment and move on.

Of course, that would lose the advantageÂ ofÂ losing specific context when talking. Which couldÂ make it harder to slight of mind yourself into cultural shifts. But who knows: perhaps the reduced cost of making a decisionÂ opens up incremental culture shift options that don’t requireÂ even the small acts of faithÂ involved in settingÂ a new habit that you don’t yet live and trying to live up to it.

Don’t use code review to drive change

Back to the original topic, CR is a bad way to drive change. Heck, driving change is a bad idea in and of itself, but that’s a topic for another entry.

CR is a great way to help a team establish a habit that they have already decided to establishâ€”assuming that habit is visible in the code. Pairing is better, but CR can serve some benefit if used correctly. And, of course, all of this applies when youÂ are pairing too.

The hard part, of course, is changing your own habits around how to give a code review. How could your team support yourselves in that?

4 thoughts on “Using code review to support change”

Ted says:

June 3, 2014 at 23:56

Good post, Arlo!

I’m in the process of starting to have a team do code reviews and pair programming as part of their daily development work, and you’ve given me something to think about. Especially the things you say about habit forming and celebrating successes are really aligned with my own goals with the team. Keep up the good work!

Regards, Ted

@jaybazuzi says:

October 14, 2014 at 15:55

I just noticed a special case of the code reviews problem:

If the code change is complex, the author will always be proud of their accomplishment.They worked hard to solve a difficult problem. Telling them their solution sucks will hurt.

1. abelshee says:
  
  December 30, 2014 at 08:16
  
  And telling them it sucks because it is complex will hurt even more. Good point.
  
EricGu says:

August 24, 2015 at 10:15

Insightful as usual.

I've never been happy with the psychology of code reviews. You have – in most groups – spent a day or so on a task, and now, in your mind, you have completed it.

You get one of three kinds of feedback:
1) It's fine (or minor changes). You feel good.
2) Moderate changes required. You missed something, and you feel that you shouldn't have missed it. Your sense of accomplishment goes away; you have rework to do.
3) Major changes required. Usually happens when you don't understand something (how to write code for a section, what a good design would be, etc. ) This is really the worst kind of negative feedback, the kind that can hit you in the stomach. You *MESSED UP*, now you have to fix your mess, which is going to push you out past your estimate. Etc. Etc.

As a reviewer, there are good and bad ways to write feedback, but there is no great way to give the feedback for #3; you cannot nicely say, "you went totally in the wrong directly on this, and you should revert your changes and start over" without it being an attack on their competence. And it being semi-public just makes it worse.

Oh, and there's another factor going on; if you are a "scheduled" team – and by that I mean you have features/stories assigned to you to complete, if somebody needs to rework their code because of a #3 review, it shows up to management.

Which makes people defensive, and reinforces they "not great but at least it kindof works" standard.

Throw all of this together, and code review mostly becomes a way to check the things that are easy to check (naming conventions & other style stuff), a way to catch some slightly deeper things (did you remember to write tests or handle a null case here?), and a poor way to catch big design or architectural issues.

It can also lead to teams that are really focused on minutae.