How can software transplants fix bad code?

Copying and pasting bad code into an application is a big problem for developers, but software transplants can help. Expert Michael Cobb explains the technology.

Michael Cobb

Published: 28 Dec 2015

What are software transplants, and how can they fix bad code? Is there a place for this method in the enterprise software lifecycle? Or could this lead to more software bugs and security vulnerabilities?

Software developers love copying and pasting code from the Internet. It doesn't matter whether it's a short example of how a function works, a code snippet or a multiline open source library. Reusing code saves huge amounts of time and money and enables developers to quickly add features and functionality without having to create them from scratch. A development team can easily use a hundred or more different open source libraries, frameworks and tools, along with code snippets copied off the Internet, when building an application. The 2014 Sonatype Open Source Development Survey found that 90% of a typical application is assembled from open source components, many of which contain known security flaws.

This is a serious problem; an error in a popular piece of bad code can be copied or incorporated into hundreds and sometimes thousands of applications. Using components with known vulnerabilities actually appears in the latest OWASP Top 10 list of application vulnerabilities. To tackle this problem, researchers from the Massachusetts Institute of Technology have come up with a way of replacing bad code with code that works correctly from another program -- a form of Darwinian, best-of-breed self-improvement. Their system is called CodePhage, and it can recognize and fix common programming errors such as out-of-bounds accesses, integer overflows and divide-by-zero errors.
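To make these three error classes concrete, here is a minimal Python sketch of the kind of input checks that guard against them. It is an illustration only, not code produced by or taken from CodePhage; the function names and the 32-bit limit are assumptions chosen for the example.

```python
# Hypothetical examples of the input checks that prevent the three error
# classes named above. None of these names come from CodePhage itself.

INT32_MAX = 2**31 - 1  # assumed limit, mimicking a 32-bit signed integer


def safe_buffer_size(width: int, height: int) -> int:
    """Integer overflow: reject sizes whose product exceeds the 32-bit limit."""
    if width < 0 or height < 0:
        raise ValueError("dimensions must be non-negative")
    if height != 0 and width > INT32_MAX // height:
        raise ValueError("width * height would overflow")
    return width * height


def safe_average(total: int, count: int) -> float:
    """Divide-by-zero: reject a zero divisor before dividing."""
    if count == 0:
        raise ValueError("count must be non-zero")
    return total / count


def safe_read(buffer: bytes, index: int) -> int:
    """Out-of-bounds access: reject an index outside the buffer."""
    if not 0 <= index < len(buffer):
        raise ValueError("index out of bounds")
    return buffer[index]
```

In each case the fix is a short validation step placed in front of the risky operation, which is the sort of check a developer might forget to copy along with a snippet.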

CodePhage works by analyzing how the bad code processes data that doesn't cause it to crash or malfunction and data that does. It then feeds the error-inducing input to a donor program and records the checks and constraints the donor uses to handle the input safely. CodePhage uses that information to correct the original program by taking code snippets from the more secure donor program that correctly handles the input -- usually a function or routine that sanitizes the input data. It then checks that the inserted code has fixed the bug. If it hasn't, it carries on looking for divergences in how the two programs handle the input. CodePhage doesn't require access to the source code of the donor applications, and it can import checks from applications written in programming languages other than the one in which the program it's repairing was written.
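The loop described above can be sketched as a toy Python model. This is a simplified illustration under stated assumptions, not the actual CodePhage implementation: the real system analyzes compiled donor programs, whereas here the donor's checks are simply a hand-written list, and every name (`recipient`, `DONOR_CHECKS`, `repair` and so on) is hypothetical.

```python
from typing import Callable, List, Optional

Check = Callable[[int], bool]  # returns True when the input is safe to process


def recipient(divisor: int) -> float:
    """Hypothetical buggy program: crashes when divisor is 0."""
    return 100 / divisor


# Hypothetical checks "recorded" from a donor that handles both inputs safely.
DONOR_CHECKS: List[Check] = [
    lambda n: isinstance(n, int),  # passes both inputs, so it explains nothing
    lambda n: n != 0,              # the two inputs diverge here
]


def crashes(program: Callable, value: int) -> bool:
    """Run the program on one input and report whether it raised."""
    try:
        program(value)
        return False
    except Exception:
        return True


def transplant(program: Callable, check: Check) -> Callable:
    """Insert the donor check in front of the recipient's logic."""
    def patched(value: int) -> Optional[float]:
        if not check(value):
            return None  # reject the input cleanly instead of crashing
        return program(value)
    return patched


def repair(program: Callable, good_input: int, bad_input: int,
           checks: List[Check]) -> Optional[Callable]:
    """Try each check on which the inputs diverge until one fixes the bug."""
    for check in checks:
        if check(good_input) and not check(bad_input):
            candidate = transplant(program, check)
            # Validate: the bad input no longer crashes, and the behavior
            # on the good input is unchanged.
            if (not crashes(candidate, bad_input)
                    and candidate(good_input) == program(good_input)):
                return candidate
    return None  # no donor check fixed the bug


patched = repair(recipient, good_input=5, bad_input=0, checks=DONOR_CHECKS)
```

Here `patched(5)` still returns the original result, while `patched(0)` is rejected by the transplanted check rather than raising a divide-by-zero error. The validation step inside `repair` mirrors the article's point that the inserted code is tested before the patch is accepted.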

Early tests have been very promising. On various common open source programs in which DIODE, the researchers' crash-inducing input generator, had found bugs, CodePhage patched the bad code in between two and 10 minutes, using code taken from between two and four donors each time. As a lot of coding time goes into ensuring data inputs won't disrupt the expected flow of a program, CodePhage could reduce development times by automatically correcting coding errors during the development process. As an automated code analysis and repair tool, CodePhage sounds very promising, and enterprise development teams should follow its development and look to incorporate it into their build process when possible. It would be a misuse of this technology, however, if developers abandoned secure coding practices and just relied on it to insert vital security checks for them.


