Session

AI-Enhanced Reviews for Linux Networking

Speakers

Jesse Brandeburg
Kamel Ayari

Label

Moonshot

Session Type

Talk

Description

Problem Statement

Open-source projects rely heavily on community contributions, but the sheer volume of patches often overwhelms even the most dedicated reviewers. These reviewers invest considerable time in meticulously reviewing each patch, ensuring adherence to guidelines, code quality, and semantic correctness. However, contributors frequently overlook established guidelines and conventions (such as whitespace issues, comment format irregularities, use of the imperative mood, commit message inconsistencies, reverse Christmas tree (RCT) ordering, etc.). This shifts reviewers’ focus away from more significant aspects of the code and from providing deeper code reviews. Furthermore, reviewers find themselves repeatedly addressing the same issues, which slows down the overall review process. While legacy automation tools (checkpatch et al.) are used to enforce certain rules, they lack the flexibility to adapt to new practices or to understand context. These tools typically concentrate on syntactic correctness but fall short on semantic subtleties and are unable to provide meaningful feedback, leaving that task to manual reviewers.

Proposed Solution

To address this challenge, we propose leveraging advances in generative Artificial Intelligence and Large Language Models (LLMs) to provide AI-based patch reviews. The solution ingests patch emails and submits the raw email as context attached to a review-request prompt; the review returned by the LLM is generated as an email reply. The prompt instructs the model about commit message rules and netdev mailing list best practices. Initially, the model can scan and comment on commit messages from mailing list patch postings. Furthermore, we envision expanding this approach to review entire patches, including commit messages and code, in search of standard rule violations and obvious mistakes.
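To make the intended flow concrete, the following Python sketch parses a raw patch email, attaches it to a review-request prompt, and wraps the model’s output as a mailing-list reply. This is a minimal illustration under assumptions: the complete() helper stands in for whichever LLM backend is used, and the prompt text only summarizes a few netdev rules; neither is the actual prompt or implementation.

    # Minimal sketch: complete() is a placeholder for any LLM backend, and the
    # prompt below only summarizes a few netdev rules for illustration.
    from email import message_from_bytes, policy
    from email.message import EmailMessage

    REVIEW_PROMPT = (
        "You are reviewing a Linux netdev patch. Check the commit message for "
        "imperative mood, a correct subject prefix, reverse Christmas tree "
        "(RCT) variable ordering, and whitespace or formatting problems. "
        "Reply only with actionable review comments.\n\n"
    )

    def complete(prompt: str) -> str:
        """Placeholder for a call to whatever LLM service is available."""
        raise NotImplementedError

    def review_patch_email(raw_email: bytes) -> EmailMessage:
        # Parse the posting; patch mails are assumed to be single-part text/plain.
        msg = message_from_bytes(raw_email, policy=policy.default)

        # The raw patch email (commit message plus diff) is the prompt context.
        review = complete(REVIEW_PROMPT + msg.get_content())

        # Wrap the model output as a reply threaded under the original posting.
        reply = EmailMessage()
        reply["Subject"] = "Re: " + (msg["Subject"] or "")
        reply["In-Reply-To"] = msg["Message-ID"]
        reply["References"] = msg["Message-ID"]
        reply.set_content(review)
        return reply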

Solution Benefits

The following are a few examples of the benefits this solution brings to the review process:

  • Efficiency: AI-based patch reviews reduce the burden on manual reviewers, allowing them to focus on more critical aspects of the code.
  • Adaptability: Unlike legacy automation tools, our solution can adapt to new practices and understand context, providing meaningful feedback.
  • Semantic Understanding: Large Language Models (LLMs) can identify semantic subtleties beyond syntactic correctness.
  • Faster Review Process: By addressing common issues automatically, we accelerate the overall review process.
  • Scalability: The approach can be expanded to review entire patches (especially large ones), including commit messages and code.
  • Consistency: AI-based reviews maintain consistency in enforcing guidelines and conventions.

Potential Future Work

It is essential to emphasize that this approach will not replace human reviews; rather, it will significantly reduce the time human reviewers spend on mundane comments. Looking ahead, integrating this AI-based review process into the zero-day bot, or enabling direct replies on the mailing list (with maintainers’ permission), could enhance its effectiveness. The AI assistant will not (currently) add Acked-by or other tags, but it is interesting to discuss what the prerequisites and method for doing so would be (for example: Bot-reviewed: AI Assistant Formatting Bot ai.bot@company.com).

Examples

The paper will show example patches with problems that were found on the mailing list and compare our assistant’s output with the reviewers’ comments.

Plans

Gather netdev mailing list postings of patches from one kernel cycle

  • Use a “net-next” kernel cycle (v6.7..v6.8) of postings, from the close of a merge window (the rc1 release) to the “net-next is closed” announcement (when the merge window for the next kernel opens).
  • Use the lore.kernel.org/netdev git repositories to scan real traffic (a sketch of this workflow follows the list).

Compare AI-assistant output to real comments from reviewers

  • Use a sampling strategy to compare the assistant-generated output with human reviewers’ comments.
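As a rough outline of how the gathering and sampling steps might work, the sketch below mirrors a public-inbox archive of the netdev list and randomly samples postings from the chosen window. The mirror URL, the local directory name, the date-based window bounds, and the assumption that each archive commit stores the raw message in a file named “m” are illustrative details, not a description of the final tooling.

    # Illustrative sketch only: ARCHIVE_URL, CLONE_DIR, and the date window are
    # placeholders; the "m" file layout is how public-inbox archives typically
    # store each message, but the real tooling may differ.
    import random
    import subprocess

    ARCHIVE_URL = "https://lore.kernel.org/netdev/0"  # public-inbox epoch mirror (assumed)
    CLONE_DIR = "netdev-archive"

    def clone_archive() -> None:
        """Mirror the mailing list archive locally."""
        subprocess.run(["git", "clone", "--mirror", ARCHIVE_URL, CLONE_DIR], check=True)

    def list_postings(since: str, until: str) -> list[str]:
        """Return commit hashes for postings inside the chosen review window."""
        out = subprocess.run(
            ["git", "-C", CLONE_DIR, "log", "--format=%H",
             f"--since={since}", f"--until={until}"],
            check=True, capture_output=True, text=True,
        )
        return out.stdout.split()

    def sample_postings(commits: list[str], k: int = 100) -> list[bytes]:
        """Randomly pick k postings and return their raw email bytes."""
        chosen = random.sample(commits, min(k, len(commits)))
        return [
            subprocess.run(["git", "-C", CLONE_DIR, "show", f"{commit}:m"],
                           check=True, capture_output=True).stdout
            for commit in chosen
        ]

The sampled raw emails could then be fed to the review flow sketched earlier and compared, posting by posting, against the human review threads from the same window.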