Roman Imankulov


Full-stack Python web developer from Porto

08 Jan 2022

Dealing with large pull requests

Fragility of adding new code to stacked pull requests
Photo by Elise Coates

Sometimes, my pull requests grow too big to review. I want to split them into small chunks to be reviewed and merged independently. At the same time, I want to keep working on the feature in a branch, creating newer pull requests until the work is done.

Automating PR stacks

GitHub lets you stack pull requests. For two PRs, you can set their base branches to say, “I want to merge PR 2 into PR 1, and PR 1 into the main branch.” This is helpful when you work on a large code change but want to review and merge it in smaller, independent chunks. Managing these stacks of PRs is tedious, though.
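To make the "base" idea concrete, here is a minimal Python sketch of how a stack chains together. It only models the bookkeeping and does not talk to GitHub; the `PullRequest` class, the function names, and the branch names are all hypothetical, made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PullRequest:
    head: str  # branch that contains the changes
    base: str  # branch the PR asks to be merged into


def build_stack(branches: list[str], trunk: str = "main") -> list[PullRequest]:
    """The first PR targets the trunk; every next PR targets the previous branch."""
    stack = []
    base = trunk
    for head in branches:
        stack.append(PullRequest(head=head, base=base))
        base = head
    return stack


def retarget_after_merge(stack: list[PullRequest], merged_head: str) -> list[PullRequest]:
    """Drop the merged PR and re-point PRs that were based on it at its old base."""
    merged = next(pr for pr in stack if pr.head == merged_head)
    remaining = []
    for pr in stack:
        if pr.head == merged_head:
            continue
        new_base = merged.base if pr.base == merged_head else pr.base
        remaining.append(PullRequest(head=pr.head, base=new_base))
    return remaining
```

With `build_stack(["feature-part-1", "feature-part-2"])`, the second PR is based on `feature-part-1`. Once the first PR is merged, the second one has to be re-pointed at `main`. Keeping these pointers (and the branches behind them) in sync by hand is the tedious part.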

It turned out there are tools that promise to streamline the process. I wanted to see if I could incorporate one of them into my workflow and maybe make it a recommended solution for my team. Despite my initial enthusiasm, I have to admit that none of the tools quite fits the bill.

None of them has gained widespread adoption: the most popular one has fewer than 200 stars. After taking a closer look at some of them, I think I understand why.

First, to work around GitHub limitations, the tools add meta-information to commits or pull requests, or require you to use specific names for branches. It’s not a big deal, but I note it here as a minor annoyance.

ghstack adds meta-information to the commit


Second, all tools come with a list of caveats and warnings. For example, ghstack warns you against merging commits using the GitHub UI or merging changes from the main branch back into the pull request (you can only rebase). That is a show-stopper if you want to adopt it for the team and keep onboarding seamless for newcomers.


ghstack README warns you against clicking the merge PR button

Finally, the tools start breaking when people bend them to fit their own unique needs.

Issues in ejoffe/spr show that the tool has difficulty covering a wide variety of use cases.


In hindsight, this makes sense. They build an independent patch management system and create a translation layer to convert it to a representation of GitHub pull requests, a system that’s not supposed to work that way.

3 tools to manage stacks of pull requests

Stacked PR management tools can probably be helpful in some environments. If you want to give them a try anyway, here is the list of the ones I tried.

ghstack

A Python tool to submit stacks of diffs to GitHub as separate pull requests, created by Edward Yang from Facebook. It pushes and creates pull requests for each commit on the stack (one PR per commit).


spr

Like ghstack, it creates one pull request per commit. It can be installed with Homebrew on macOS and apt on Debian/Ubuntu.


gh-stack

In contrast to the previous tools, gh-stack builds individual pull requests from branches rather than commits, which is more flexible but harder to maintain. The author outlines the problem well and describes his solution in the blog post A Better Model for Stacked Pull Requests. The project may be abandoned, but there’s an actively maintained fork.


How about creating smaller PRs?

Hitting the wall in my attempt to solve the issue on the first layer, I came to the next “why.” Why am I tempted to create large pull requests in the first place?

In my case, the legacy system made it difficult to plan my work: as I dive deeper into the task, I find myself in the trenches, changing the system in several seemingly unrelated places, and letting the PR grow organically with commits like, “Update subsystem X to support the feature” or “Fix tests that implicitly relied on the old behavior.”

Maintaining aging systems is a separate topic. In my reading list, there is a book Kill It with Fire by Marianne Bellotti that promises to give insights on how to modernize legacy systems and not go nuts.

Legacy or not, any project requires an architecture, and the next step to address the issue beyond patching can be to introduce a framework for making architecture decisions. The framework suggested by Andrew Harmel-Law from Thoughtworks in Scaling the Practice of Architecture, Conversationally provides a good starting point.

My low-tech workflow

Changing project architecture takes ages. In the meantime, I have developed a low-tech workflow for unexpectedly large pieces of work.

I prefer not to use stacked pull requests. Instead, when needed, I create a big draft PR, sort of a master copy of my work.

  • While working on a feature or an update, I periodically groom the draft PR. I cherry-pick one or a few related commits into independent pull requests based on the main branch.
  • When those pull requests are merged, I rebase my draft PR on top of the main branch.
  • Eventually, I deconstruct the entire draft and merge it into the main branch as a series of smaller PRs.

Not ideal, but it works nicely for me so far.
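For illustration, the steps above can be sketched as a small Python helper that only builds the git commands instead of running them. This is a rough sketch under assumptions, not a tool I use: the function names, the branch names, and the `origin` remote name are hypothetical.

```python
from __future__ import annotations


def extract_pr_commands(new_branch: str, commits: list[str], main: str = "main") -> list[list[str]]:
    """Commands that copy selected commits from the draft into a small PR branch."""
    if not commits:
        raise ValueError("pick at least one commit")
    return [
        ["git", "fetch", "origin"],
        # Start the small branch from the up-to-date main branch, not from the draft.
        ["git", "switch", "-c", new_branch, f"origin/{main}"],
        ["git", "cherry-pick", *commits],
        ["git", "push", "-u", "origin", new_branch],
    ]


def refresh_draft_commands(draft_branch: str, main: str = "main") -> list[list[str]]:
    """Commands that rebase the draft branch once a small PR has been merged."""
    return [
        ["git", "fetch", "origin"],
        ["git", "switch", draft_branch],
        ["git", "rebase", f"origin/{main}"],
        # The rebase rewrites history, so the draft branch has to be force-pushed.
        ["git", "push", "--force-with-lease", "origin", draft_branch],
    ]
```

Each returned list can be printed for review or passed to `subprocess.run(cmd, check=True)`. During the rebase, git normally skips commits whose changes are already in the main branch, so the draft shrinks as the small PRs land. With squash merges, expect to resolve some conflicts or drop commits by hand.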


Hey, I'm Roman, a full-stack Python web developer.

If you like what you read and want to work with me, drop me a line to roman.imankulov@gmail.com.
