Donnerstag, Juni 25, 2020

They want to be small, they want to be big: thoughts on code reviews and the power of patch series

Code reviews are a central fact of life in software development. It's important to do them well, and developer quality of life depends on a good review workflow.

Unfortunately, code reviews also appear to be a difficult problem. Many projects are bottlenecked by code reviews, in that reviewers are hard to find and progress gets slowed down by having to wait a long time for reviews.

The "solution" that I've often seen applied in practice is to have lower quality code reviews. Reviewers don't attempt to gain a proper understanding of a change, so reviews become shallower and therefore easier. This is convenient on the surface, but more likely to allow bad code to go through: a subtle corner case that isn't covered by tests (yet?) may be missed, there may be a misunderstanding of a relevant underlying spec, a bad design decision slips through, and so on. This is bound to cause pains later on.

I've experienced a number of different code review workflows in practice, based on a number of tools: GitHub PRs and GitLab MRs, Phabricator, and the e-mail review of patch series that is the original workflow for which Git was designed. Of those, the e-mail review flow produced the highest quality. There may be confounding factors, such as the nature of the projects and the selection of developers working on them, but quality issues aside I certainly feel that the e-mail review flow was the most pleasant to work with. Over time I've been thinking and having discussions a lot about just why that is. I feel that I have distilled it to two key factors, which I've decided to write down here so I can just link to this blog post in the future.

First, the UI experience of e-mail is a lot nicer. All of the alternatives are ultimately web-based, and their UI latency is universally terrible. Perhaps I'm particularly sensitive, but I just cannot stand web UIs for serious work. Give me something that reacts to all input reliably in under 50ms and I will be much happier. E-mail achieves that, web UIs don't. Okay, to be fair, e-mail is probably more in the 100ms range given the general state of the desktop. The point is, web UIs are about an order of magnitude worse. It's incredibly painful. (All of this assumes that you're using a decent native e-mail client. Do yourself a favor and give that a try if you haven't. The CLI warriors all have their favorites, but frankly Thunderbird works just fine. Outlook doesn't.)

Second, I've come to realize that there are conflicting goals in review granularity that e-mail happens to address pretty well, but none of the alternatives do a good job of it. Most of the alternatives don't even seem to understand that there is a problem to begin with! Here's the thing:

Reviews want to be small. The smaller and the more self-contained a change is, the easier it is to wrap your head around and judge. If you do a big refactor that is supposed to have no functional impact, followed by a separate small functional change that is enabled by the refactor, then each change individually is much easier to review. Approving changes at a fine granularity also helps ensure that you've really thought through each individual change and that each change has a reasonable justification. Important details don't get lost in something larger.

Reviews want to be big. A small, self-contained change can be difficult to understand and judge in isolation. You're doing a refactor that moves a function somewhere else? Fine, it's easy to tell that the change is correct, but is it a good change? To judge that, you often need to understand how the refactored result ends up being used in later changes, so it's good to see all those changes at once. Keep in mind though that you don't necessarily have to approve them at the same time. It's entirely possible to say, yes, that refactor looks good, we can go ahead with that, but please fix the way it's being used in a subsequent change.

There is another reason why reviews want to be big. Code reviews have a mental context-switching overhead. As a reviewer, you need to think yourself into the affected code in order to judge it well. If you do many reviews, you typically need to context-switch between each review. This can be very taxing mentally and ultimately unpleasant. A similar, though generally smaller, context-switching overhead applies to the author of the change as well: let's say you send out some changes for review, then go off and do another thing, and come back a day or two later to some asynchronously written reviews. In order to respond to the review, you may now have to page the context of that change back in. The point of all this is that when reviews are big, the context-switching overhead gets amortized better, i.e. the cost per change drops.

Reviews want to be both small and big. Guess what, patch series solve that problem! You get to review an entire feature implementation in the form of a patch series at once, so your context-switching overhead is reduced and you can understand how the different parts of the change play together. At the same time, you can drill down into individual patches and review those. Two levels of detail are available simultaneously.

So why e-mail? Honestly, I don't know. Given that the original use case for Git is based on patch series review, it's mind-boggling in a bad way that web-based Git hosting and code review systems do such a poor job of it, if they handle it at all.

Gerrit is the only system I know of that really takes patch series as an idea seriously, but while I am using it occasionally, I haven't had the opportunity to really stress it. Unfortunately, most people don't even want to consider Gerrit as an option because it's ugly.

Phabricator's stacks are a pretty decent attempt and I've made good use of them in the context of LLVM. However, they're too hidden and clumsy to navigate. Both Phabricator and Gerrit lack a mechanism for discussing a patch series as a whole.

GitHub and Gitlab? They're basically unusable. Yes, you can look at individual commits, but then GitHub doesn't even display the commits in the correct order: they're sorted by commit or author date, not by the Git DAG order, which is an obviously and astonishingly bad idea. Comments tend to get lost when authors rebase, which is what authors should do in order to ensure a clean history, and actually reviewing an individual commit is impossible. Part of the power of patch series is the ability to say: "Patches 1-6, 9, and 11 are good to go, the rest needs work."

Oh, and by the way: Commit messages? They're important! Gerrit again is the only system I know of that allows review comments on commit messages. It's as if the authors of Gerrit are the only ones who really understood the problem space. Unfortunately, they seem to lack business and web design skills, and so we ended up in the mess we're in right now.

Mind you, even if the other players got their act together and supported the workflow properly, there'd still be the problem of UI latency. One can dream...