Donnerstag, Juni 25, 2020

They want to be small, they want to be big: thoughts on code reviews and the power of patch series

Code reviews are a central fact of life in software development. It's important to do them well, and developer quality of life depends on a good review workflow.

Unfortunately, code reviews also appear to be a difficult problem. Many projects are bottlenecked by code reviews, in that reviewers are hard to find and progress gets slowed down by having to wait a long time for reviews.

The "solution" that I've often seen applied in practice is to have lower quality code reviews. Reviewers don't attempt to gain a proper understanding of a change, so reviews become shallower and therefore easier. This is convenient on the surface, but more likely to allow bad code to go through: a subtle corner case that isn't covered by tests (yet?) may be missed, there may be a misunderstanding of a relevant underlying spec, a bad design decision slips through, and so on. This is bound to cause pains later on.

I've experienced a number of different code review workflows in practice, based on a number of tools: GitHub PRs and GitLab MRs, Phabricator, and the e-mail review of patch series that is the original workflow for which Git was designed. Of those, the e-mail review flow produced the highest quality. There may be confounding factors, such as the nature of the projects and the selection of developers working on them, but quality issues aside I certainly feel that the e-mail review flow was the most pleasant to work with. Over time I've been thinking and having discussions a lot about just why that is. I feel that I have distilled it to two key factors, which I've decided to write down here so I can just link to this blog post in the future.

First, the UI experience of e-mail is a lot nicer. All of the alternatives are ultimately web-based, and their UI latency is universally terrible. Perhaps I'm particularly sensitive, but I just cannot stand web UIs for serious work. Give me something that reacts to all input reliably in under 50ms and I will be much happier. E-mail achieves that, web UIs don't. Okay, to be fair, e-mail is probably more in the 100ms range given the general state of the desktop. The point is, web UIs are about an order of magnitude worse. It's incredibly painful. (All of this assumes that you're using a decent native e-mail client. Do yourself a favor and give that a try if you haven't. The CLI warriors all have their favorites, but frankly Thunderbird works just fine. Outlook doesn't.)

Second, I've come to realize that there are conflicting goals in review granularity that e-mail happens to address pretty well, but none of the alternatives do a good job of it. Most of the alternatives don't even seem to understand that there is a problem to begin with! Here's the thing:

Reviews want to be small. The smaller and the more self-contained a change is, the easier it is to wrap your head around and judge. If you do a big refactor that is supposed to have no functional impact, followed by a separate small functional change that is enabled by the refactor, then each change individually is much easier to review. Approving changes at a fine granularity also helps ensure that you've really thought through each individual change and that each change has a reasonable justification. Important details don't get lost in something larger.

Reviews want to be big. A small, self-contained change can be difficult to understand and judge in isolation. You're doing a refactor that moves a function somewhere else? Fine, it's easy to tell that the change is correct, but is it a good change? To judge that, you often need to understand how the refactored result ends up being used in later changes, so it's good to see all those changes at once. Keep in mind though that you don't necessarily have to approve them at the same time. It's entirely possible to say, yes, that refactor looks good, we can go ahead with that, but please fix the way it's being used in a subsequent change.

There is another reason why reviews want to be big. Code reviews have a mental context-switching overhead. As a reviewer, you need to think yourself into the affected code in order to judge it well. If you do many reviews, you typically need to context-switch between each review. This can be very taxing mentally and ultimately unpleasant. A similar, though generally smaller, context-switching overhead applies to the author of the change as well: let's say you send out some changes for review, then go off and do another thing, and come back a day or two later to some asynchronously written reviews. In order to respond to the review, you may now have to page the context of that change back in. The point of all this is that when reviews are big, the context-switching overhead gets amortized better, i.e. the cost per change drops.

Reviews want to be both small and big. Guess what, patch series solve that problem! You get to review an entire feature implementation in the form of a patch series at once, so your context-switching overhead is reduced and you can understand how the different parts of the change play together. At the same time, you can drill down into individual patches and review those. Two levels of detail are available simultaneously.

So why e-mail? Honestly, I don't know. Given that the original use case for Git is based on patch series review, it's mind-boggling in a bad way that web-based Git hosting and code review systems do such a poor job of it, if they handle it at all.

Gerrit is the only system I know of that really takes patch series as an idea seriously, but while I am using it occasionally, I haven't had the opportunity to really stress it. Unfortunately, most people don't even want to consider Gerrit as an option because it's ugly.

Phabricator's stacks are a pretty decent attempt and I've made good use of them in the context of LLVM. However, they're too hidden and clumsy to navigate. Both Phabricator and Gerrit lack a mechanism for discussing a patch series as a whole.

GitHub and Gitlab? They're basically unusable. Yes, you can look at individual commits, but then GitHub doesn't even display the commits in the correct order: they're sorted by commit or author date, not by the Git DAG order, which is an obviously and astonishingly bad idea. Comments tend to get lost when authors rebase, which is what authors should do in order to ensure a clean history, and actually reviewing an individual commit is impossible. Part of the power of patch series is the ability to say: "Patches 1-6, 9, and 11 are good to go, the rest needs work."

Oh, and by the way: Commit messages? They're important! Gerrit again is the only system I know of that allows review comments on commit messages. It's as if the authors of Gerrit are the only ones who really understood the problem space. Unfortunately, they seem to lack business and web design skills, and so we ended up in the mess we're in right now.

Mind you, even if the other players got their act together and supported the workflow properly, there'd still be the problem of UI latency. One can dream...


Michel Dänzer hat gesagt…

Interesting post!

I agree it would be nice if GitLab allowed doing all the same things with Git commit logs as it does with the changes themselves. But I don't consider this a showstopper; general comments mostly work OK for discussing the commit logs.

The other issues you raise against GitHub and GitLab seem mostly GitHub specific. GitLab is working pretty well for reviewing Mesa patch series consisting of any number of commits. It lists the commits in the same order as the Git command line tools, and doesn't drop comments across rebases (but clearly points out when the code reference by a comment has been changed in the meantime). Comments can be made in the context of specific commits, and the current GitLab version 13.1 now supports combining multiple such comments to a "review" which can be prepared as a whole and then atomically posted at once.

The last point highlights that while GitLab is certainly not perfect yet for patch review, it is constantly getting better. But even now, it can be worth putting up with any patch review deficiencies in exchange for other benefits from GitLab, e.g.: Integrated CI; merge requests provide a central and persistent reference for changes and corresponding discussion; easy cross-references between merge requests and issues (also between separate projects of the same GitLab instance).

P.S. I suspect you're exaggerating with 100ms :). But FWIW, bad desktop response may indicate a setup issue. E.g. this laptop used to feel kind of sluggish due to , but now it's really snappy (with GNOME 3.36 Wayland).

Nicolai Hähnle hat gesagt…

This is good to hear about GitLab, I haven't used it that much with newer versions. It's always been better than GitHub from what I remember.

And the 100ms... I haven't actually measured it. I suspect it's lower than those 100ms (it certainly feels snappy enough here), but I remember reading a blog post by somebody who did serious measurement found that most mice and keyboards have surprisingly high latency in practice. So don't nail me down on the exact numbers, I haven't done the measurements and subjective impressions can be very misleading. The point is that web UIs are just horribly bad. For example, certainly in GitHub and in the older versions of GitLab I've used more seriously, switching between commits feels like it's taking closer to a second, where switching between e-mails in a native client, or between commits in something like gitk, is pretty much instantaneous.

What I'd really want is something like gitk, but with the added ability to both read and write review comments offline and then send them off as a batch. Or having it integrated fully in an IDE that I want to use, but a separate tool is fine in practice.

Michel Dänzer hat gesagt…

Yeah, makes sense. What I usually do with GitLab is open all commits I want to look at in separate Firefox tabs, then switching between them is quick.

GitLab provides an API which should make something like what you want at least theoretically possible. E.g. there's for VSCode.

Michel Dänzer hat gesagt…

Just happened to come across Bichon, a terminal based GitLab merge request review client:

fungi hat gesagt…

Similarly, there's a console-based Gerrit REST API client from called "gertty" which I rely on all day every day. It attempts to make code review more like replying to E-mail or Usenet News discussions, includes dependency-based threading of patch series, extensive configurability, hotkey binding for custom queries, batch processing... I'm really not a fan of Gerrit's WebUI either, but with gertty I can still love using Gerrit.