Productivity and Generative AI: Importance of focusing on effectiveness and looking at the whole system
Hey, I am Klaus Haeuptle! Welcome to this edition of the Engineering Ecosystem newsletter, in which I write about a variety of software engineering and architecture topics: clean code, test automation, decision-making, technical debt, large-scale refactoring, culture, sustainability, cost and performance, generative AI, and more. In this edition I touch on the complex topic of the productivity improvements promised by generative AI.
GitHub claims that GitHub Copilot allows developers to complete some tasks up to 55% faster. The measurement of productivity is already a controversial and complex topic in itself, as discussed by Kent Beck, Gergely Orosz, and many other authors. With this blog post I do not want to discuss the measurement of productivity; rather, I want to highlight the need to look at the whole system of software engineering with context-awareness, and the need to ask the right set of questions while exploring generative AI.
Personally, I see huge value in using generative AI for many tasks, but I likewise see the risk of overconfidence in the results and of skipping the necessary checks. A recent HBR article on generative AI and productivity highlights the problems with the current focus on task-level productivity measurement and some of the system-level issues that have already occurred. Generative AI is here to stay and will change the way we develop software; we need to embrace this and learn how to use it effectively.
It depends on the task context
It is also important to consider that the degree of usefulness depends on the task context. Generative AI is very powerful for some coding tasks, but probably less useful in unreadable and tightly coupled code. While it can explain legacy code to a certain degree, this capability is limited by data quality (unreadable code, missing documentation, code complexity, etc.). Further, the prompt context has limits, and retrieval-augmented generation (RAG) or fine-tuning come with their own limitations.
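To make the prompt-context limitation concrete, here is a minimal sketch of the retrieval idea behind RAG: because a large legacy codebase does not fit into the prompt, only the chunks most similar to the question are selected. The chunk texts, the toy bag-of-words "embedding", and the `retrieve` helper are all illustrative assumptions; real systems use learned vector embeddings and a vector store.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words term counts. Real RAG pipelines use
    # learned vector embeddings, but the retrieval idea is the same.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    # Rank code/documentation chunks by similarity to the query and keep
    # only the top few, so the prompt stays within the context limit.
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), embed(query)),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical chunks from a legacy billing codebase:
chunks = [
    "calculate_invoice_total sums item net amounts and applies tax to the invoice total",
    "send_reminder_email notifies the customer about an overdue invoice",
    "legacy module: undocumented billing batch job, tightly coupled to the db",
]
print(retrieve(chunks, "how is the invoice total calculated", top_k=1))
```

Note how the quality of the retrieved context depends directly on the quality of the underlying code and documentation: the undocumented, tightly coupled chunk contributes little that the model can use.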
With this context-awareness in mind: if you have tried generative AI and your experiments did not result in something valuable, I recommend continuing to experiment with different approaches, or even switching the context, if you do not get good results in your given context. It is worth creating the aha moment to see the potential of generative AI in software engineering.
Moreover, generative AI is not a one-size-fits-all solution; its efficacy is highly contingent on the specific context of the task at hand. The task context extends beyond mere code: it encompasses a variety of factors that need to be differentiated, including different forms of legacy code, greenfield projects, technology stacks, and varying levels of maturity. So the leading questions are: how useful is generative AI given the detailed task context? And how can we use generative AI to become more effective and produce better outcomes for the given context and from a holistic system perspective?
The whole software engineering system matters
Improving productivity in coding tasks is indeed a crucial aspect of software development, but it is important to note that it represents only a fraction of the overall system of enterprise software development. In fact, writing or maintaining code accounts for less than 30% of the total work involved. The majority of the work encompasses a broad range of other tasks such as requirements gathering, exploration of the problem and solution spaces, architecture, system design, testing, deployment, documentation, and more. Developers need to understand a certain domain, grasp its mental models and concepts, and convert them into a unique and unambiguous specification of the system. LLMs do not have a complete notion of what these concepts are. Additionally, communication, exploration, and collaboration within the team, as well as with stakeholders, play a significant role. Therefore, while enhancing coding productivity can lead to some improvements, a comprehensive approach that addresses all aspects of the software development lifecycle is necessary for substantial progress in enterprise software development.
Emerging Risks with generative AI
While AI-generated code can accelerate development, it also introduces new risks. AI tools currently rely on LLMs, which are trained on general data and do not know the specifics of the application at hand. LLMs have no concept of the domain they are generating code for; they have to be taught those concepts. Moreover, they are being used to produce much higher quantities of code, which in turn will lead to more bugs and vulnerabilities. The reliance on AI can also lead to a reduced understanding of the codebase by human programmers and a false sense of security, as found in a study by Stanford University. The study found that developers using AI assistants were more likely to introduce security vulnerabilities than those who did not use them, one reason being that they were less likely to review the whole of the code generated by the AI assistant.
What about quality? Strictly speaking, an AI based on LLMs doesn't have a concept of "correctness"; some tokens are more probable than others, but the AI itself doesn't execute the code, nor does it "prove" its response in any meaningful way. Therefore, with the emergence of generative AI, solid software engineering practices like unit testing, code quality, shift left, continuous integration, and code reviews become even more important for understanding and verifying machine-generated code. A major challenge is that reviewing code is much harder than writing it. Another pitfall is that failures can be subtle and not obvious to the human eye during code review. An additional observation is that inexperienced developers might simply generate code without understanding the concepts and without the ability to do a proper code review.
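Since the model itself never executes or proves its output, the verification has to come from outside. A minimal sketch of what that can look like: imagine `apply_discount` was produced by an AI assistant (the function and its behaviour are hypothetical examples, not from any real tool), and we pin down its behaviour, especially edge cases, with unit tests before merging it.

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    # Pretend this function was generated by an AI assistant.
    # We do not trust it until its behaviour is pinned down by tests.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_and_full_discount(self):
        # Boundary values: generated code often looks right for the
        # happy path while mishandling the edges.
        self.assertEqual(apply_discount(99.99, 0), 99.99)
        self.assertEqual(apply_discount(99.99, 100), 0.0)

    def test_invalid_percent_rejected(self):
        # Subtle failures often hide in inputs the generator never "considered".
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

# Run with: python -m unittest this_module
```

The tests double as an executable specification: they encode the domain concepts (what a discount of 0%, 100%, or 150% should mean) that the LLM itself does not have.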
Leading Questions for continuous exploration
While we keep exploring the potential of generative AI, we as a community of developers and decision-makers should also keep in mind the risks and the need for a more in-depth understanding of the impact of AI on the whole system of software engineering. For example: what is the impact on technical debt? What is the impact on quality and security? Will developers get overconfident in the results and do less rigorous code reviews?
We should also ask how we can use generative AI to become more effective and produce better outcomes. Focusing too much on efficiency can lead to local optimization, sub-optimization, and increased risks. Further, we should ask which risks can emerge and how to deal with them. How do we get there? One step is that leaders embrace and understand the complexity of software development and are able to ask the right set of questions. For example, instead of assuming that AI can write the documentation, ask how AI can help humans improve the documentation: AI requires good documentation to be more useful, and good documentation in turn helps in the context of generative AI use cases. Instead of asking with efficiency in mind, ask questions related to the effectiveness of the system. Another step is that we as a community of software engineers conduct many experiments with an open but skeptical mindset, learn, and adapt. And that research explores the impact of generative AI on the whole system of software engineering.
Further Resources
Below are some interesting resources for reading more about the impact of generative AI from different perspectives:
Three things GenAI will not change about software delivery by Birgitta Böckeler from Thoughtworks
Refactoring vs Refuctoring: Advancing the state of AI-automated code improvements
Signs and Portents: Some hints about what the next year of AI looks like
Subscription: If you want to get updates, you can subscribe to the free newsletter:
Mark as not spam: When you subscribe to the newsletter, please do not forget to check your spam / junk folder. Make sure to "mark as not spam" in your email client and move it to your inbox. Add the publication's Substack email address to your contact list. All posts will be sent from this address: ecosystem4engineering@substack.com.
❤️ Share it — The Engineering Ecosystem newsletter lives thanks to word of mouth. Share the article with someone to whom it might be useful, by forwarding the email or sharing it on social media!
I think it was Grady Booch who said "..., one of the things that tools can do is help bad designers create ghastly designs much more quickly than they ever could in the past."
Having said that, I believe AI will have as large an impact on software development as the internet did in the mid-to-late 1990s. Before access to the internet was generally available, developers had to rely on manuals, journals, and co-located senior colleagues for help in designing and coding.
Now, no one thinks twice about searching online for help with a design or coding problem, or for an existing library or framework that already does what they need. AI takes this to a whole new level because it can summarise thousands of answers for you and generate designs and code for a problem in a fraction of the time it would take a human developer to do so.
... but just as the early internet could give really bad and incomplete information as well as good, and the quality of the information has steadily improved ... so I suspect we will see some very bad results from early adoption of AI in software development but steady improvement in the quality of the results over the next decade.
One thing I will be interested to see is whether AI spawns a new generation of widely adopted general-purpose programming languages that supersede the Java and JavaScript generation of languages born in the early years of the world wide web nearly thirty years ago.