GPT Models for PR Review

Jul 11, 2023 | 1 minute read

Tags: blog, openai, chatgpt, development, software


We recently had a problem all too familiar to development teams: an important feature needed releasing, but there was a backlog of pull requests waiting for review.
I decided to give GPT-assisted PR review another try.
TL;DR - it’s promising but still needs work.

I ran GPT against some spike (i.e. crap) code used to generate some statistics for this website, and here’s what it found:

  • 1x potentially valid review comment that I would’ve made as a human.
  • 2x generic “Cover Your Ass” type suggestions (review documentation, check variable names, write tests)
  • 6x items that linters or CI/CD tools could’ve caught, with better semantics, readability, and IDE integration.

The open source tools in this space are still evolving. I’m on the waitlist for GitHub Next’s Copilot PR Review, which seems to be the most promising.
The token limit (4k or 10k) barely fits small PRs, and splitting a larger PR across multiple requests means the model never sees the whole change at once, which limits static analysis capabilities.
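
To make the splitting problem concrete, here is a minimal sketch of how a tool might chop a PR diff into requests that fit under a token budget. It is not taken from any of the tools I tried. The function names, the 4-characters-per-token estimate, and the per-file splitting strategy are all assumptions for illustration; a real tool would use a proper tokenizer and probably something smarter than greedy packing.

```python
from typing import List

# Assumption: roughly 4 characters per token. A real tokenizer would be more accurate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def split_diff_by_file(diff: str) -> List[str]:
    """Split a unified diff into one chunk per file, using 'diff --git' headers."""
    chunks: List[str] = []
    current: List[str] = []
    for line in diff.splitlines(keepends=True):
        # A new file header closes the chunk collected so far.
        if line.startswith("diff --git ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def pack_into_requests(file_diffs: List[str], token_budget: int) -> List[str]:
    """Greedily group per-file diffs so each group stays under token_budget.

    A single file larger than the budget still gets its own request;
    this sketch does not split it any further.
    """
    requests: List[str] = []
    current = ""
    for file_diff in file_diffs:
        if current and estimate_tokens(current + file_diff) > token_budget:
            requests.append(current)
            current = ""
        current += file_diff
    if current:
        requests.append(current)
    return requests
```

Each request is then reviewed on its own, so a change in one file that breaks a caller in another file can land in separate requests and go unnoticed. That is the static analysis limitation in a nutshell.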

Here are the tools I reviewed:

Here is the sample review output for the crapcode PR reviewed:

If you’ve got any experience with GPT-assisted PR review, I’d love to hear about it.