Generative AI: The emerging bottleneck in software development - Human Code Review
Hey, I am Klaus Haeuptle! Welcome to this edition of the Engineering Ecosystem newsletter in which I write about a variety of software engineering and architecture topics like clean code, test automation, decision-making, technical debt, large scale refactoring, culture, sustainability, cost and performance, generative AI and more.
GitHub Copilot and similar LLM-based tools have made it easier than ever to generate code, text, and other artifacts. However, the quality of the generated artifacts is not always high, and they often require human review and intervention. This has led to a new bottleneck in AI-assisted development: human review. Early studies show that GitHub Copilot improves productivity. At the same time, challenges are emerging around code duplication, code churn, security, and the ability to review code.
In a recent blog post, Productivity and Generative AI: Importance of focusing on effectiveness and looking at the whole system, I highlighted the importance of looking at the whole system of software engineering with context-awareness. In this post, we will explore the new bottleneck in AI-assisted development, the importance of human review, and some strategies to address the challenges. We will also discuss the implications for software engineering and the future of AI-assisted development. In the recent GitHub blog post How AI Code Generation works, they share some insights on how they are addressing some of these challenges.
Code Duplication
One of the main challenges with AI-generated code is code duplication. AI models like GitHub Copilot are trained on large datasets of code, and they often generate code that is similar to existing code. This can lead to duplication, which makes the codebase harder to maintain and understand. It can also introduce security vulnerabilities, as duplicated code may carry bugs and other issues that are not immediately obvious. To partially mitigate this issue, GitHub Copilot plans to integrate a duplicate filter.
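To make the idea of a duplicate filter concrete, here is a minimal sketch of flagging exact, whitespace-insensitive duplicates by fingerprinting normalized snippets. This is an illustration only, not how Copilot's planned filter works internally (those details are not public); real clone detectors use much more sophisticated token- and AST-based matching.

```python
import hashlib

def fingerprint(snippet: str) -> str:
    # Normalize whitespace so trivially reformatted copies still match.
    normalized = " ".join(snippet.split())
    return hashlib.sha256(normalized.encode()).hexdigest()

def find_exact_duplicates(snippets: dict[str, str]) -> dict[str, list[str]]:
    # Group snippet names by fingerprint; any group larger than one
    # contains (whitespace-insensitive) duplicates.
    groups: dict[str, list[str]] = {}
    for name, code in snippets.items():
        groups.setdefault(fingerprint(code), []).append(name)
    return {fp: names for fp, names in groups.items() if len(names) > 1}
```

Even a crude check like this catches copy-paste drift across a codebase; the hard part, which the real tools tackle, is near-duplicates that differ in variable names or structure.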
Security
Another challenge with AI-generated code is security. AI models are not perfect, and they can generate code that contains security vulnerabilities. This is a serious problem, as vulnerabilities can lead to data breaches, financial losses, and other damage. It is important to review AI-generated code for security issues and to address them, which can be difficult: vulnerabilities are often subtle and hard to detect, especially in large codebases. To reduce the risk, developers can use tools like code scanning, which reviews code for potential security issues and integrates the findings into the developer workflow, along with vulnerability checks for dependencies. Both are a must even if you do not use AI tooling.
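As a concrete illustration of the kind of subtle issue a human reviewer (or a code scanner) has to catch, consider this hypothetical Python sketch: the first function mirrors a pattern sometimes seen in generated code, interpolating user input directly into SQL, which is open to injection; the second is the reviewed, parameterized version. The `make_demo_db` helper is invented here purely for the example.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    # Tiny in-memory database for demonstration purposes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    return conn

def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Anti-pattern: string interpolation into SQL. An input like
    # "x' OR '1'='1" turns the WHERE clause into a tautology and
    # leaks every row (SQL injection).
    return conn.execute(
        f"SELECT id, name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Reviewed version: a parameterized query treats the input as
    # data, never as SQL, so the injection payload matches nothing.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()
```

Both functions look plausible at a glance, which is exactly why this class of bug slips through when review time is scarce.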
Code Churn
Another challenge with AI-generated code is code churn. AI models are constantly evolving, and they often generate code that is not immediately useful or correct. Developers then have to review and refactor that code to make it useful, which can slow down development and frustrate teams. One strategy to address churn is to treat AI-generated code as a starting point and refine it through human review and intervention. Doing proper code review is hard, however, and requires a good understanding of the context and the programming language.
GenAI as a Major Incentive to Write Clean Code
The challenges of AI-generated code have led to a renewed focus on clean code. Clean code is readable, maintainable, and intention-revealing, which makes it easier to review and understand. It is also easier for AI models to work with: cleaner input leads to better suggestions. Clean code is therefore becoming more important in the age of generative AI tool support. The GitHub article mentions, for example, the importance of intention-revealing names: "If developers give specific names to their functions and variables, and write documentation, they can get better suggestions, too. That’s because GitHub Copilot can read the variable names and use them as an indicator for what that function should do."
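A small illustrative sketch of that point: the two functions below have identical behavior, but only the second one tells a reviewer, and an AI assistant reading the context, what the code is for. The names and the business-days example are invented here for illustration.

```python
from datetime import date, timedelta

def calc(d, n):
    # Vague names: neither the function nor its parameters reveal intent.
    r = d
    while n > 0:
        r += timedelta(days=1)
        if r.weekday() < 5:
            n -= 1
    return r

def add_business_days(start: date, business_days: int) -> date:
    # Same logic, but intention-revealing names and types make the
    # purpose obvious, to humans and to code-completion tools alike.
    result = start
    remaining = business_days
    while remaining > 0:
        result += timedelta(days=1)
        if result.weekday() < 5:  # Monday=0 .. Friday=4: skip weekends
            remaining -= 1
    return result
```

The first version forces every reader to reverse-engineer the weekday check; the second makes both review and follow-up suggestions cheaper.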
The Importance of Human Code Review of Generative AI Code
Human code review is essential to identify and address issues before the code is deployed. Generative AI lacks the ability to understand the broader context and the specific requirements of a project. Human code reviewers, with their understanding of the project's goals and constraints, can ensure that the generated code aligns with the project's needs. Therefore, human code review remains an essential part of the process. It improves the quality and security of the generated code, aligns the code with the project's needs, and fosters a culture of learning and accountability. The GitHub article puts it this way:
"Code reviews play an important role in maintaining code quality and reliability in software projects, regardless of whether AI coding tools are involved. In fact, the earlier developers can spot bugs in the code development process, the cheaper it is by orders of magnitude."
"But using AI chatbots doesn’t mean developers should be hands-off. Mistakes in reasoning could lead the AI down a path of further mistakes if left unchecked. Berryman recommends that users should interact with the chat assistant in much the same way that you would when pair programming with a human. “Go back and forth with it. Tell the assistant about the task you are working on, ask it for ideas, have it help you write code, and critique and redirect the assistant’s work in order to keep it on the right track.”"
The Importance of the Ability to Review Code
Very experienced and highly skilled software engineers get the most benefit from using Copilot. While every developer can reap the benefits of AI coding tools, experienced programmers often feel these gains even more. They are quicker to understand the context and the suggestions, can make better use of them, and are able to review and iterate on the generated code.
The New Bottleneck in AI Development: Human Code Review
If generative AI support increases the amount of code being changed, the need for human review increases with it. This creates a new bottleneck in AI-assisted development: human review. The time spent on code review will rise, and this is a challenge we need to address as a community. We need to find ways to make human review more efficient and effective, so we can ensure the quality and security of changes and keep up with a growing codebase.
How can we make code review better while still ensuring the quality of the code? What are our requirements for tools like GitHub's code review support, IDEs, or other tools in the review process? What are your ideas for improving code review?
Related Resources
Subscription: If you want to get updates, you can subscribe to the free newsletter:
Mark as not spam: When you subscribe to the newsletter, please do not forget to check your spam/junk folder. Make sure to "mark as not spam" in your email client and move it to your inbox. Add the publication's Substack email address to your contact list. All posts will be sent from this address: ecosystem4engineering@substack.com.
❤️ Share it — The Engineering Ecosystem newsletter lives thanks to word of mouth. Share the article with someone to whom it might be useful, by forwarding the email or sharing it on social media!