How can software transplants fix bad code?

Copying and pasting bad code into an application is a big problem for developers, but software transplants can help. Expert Michael Cobb explains the technology.

What are software transplants, and how can they fix bad code? Is there a place for this method in the enterprise software lifecycle? Or could this lead to more software bugs and security vulnerabilities?

Software developers love copying and pasting code from the Internet. It doesn't matter if it's a short example of how a function works, a code snippet or an entire open source library. Reusing code saves huge amounts of time and money and enables developers to quickly add features and functionality without having to create them from scratch. Development teams can easily use a hundred or more different open source libraries, frameworks and tools, along with code snippets copied off the Internet, when building an application. The 2014 Sonatype Open Source Development Survey found that 90% of a typical application is assembled from open source components, many of which contain known security flaws.

This is a serious problem; an error in a popular piece of code can be copied or incorporated into hundreds and sometimes thousands of applications. The use of components with known vulnerabilities actually appears in the latest OWASP Top 10 list of application vulnerabilities. To tackle this problem, researchers from the Massachusetts Institute of Technology have come up with a way of replacing bad code with code from another program that works correctly -- a form of Darwinian, best-of-breed self-improvement. Their system is called CodePhage, and it can recognize and fix common programming errors such as out-of-bounds access, integer overflows and divide-by-zero errors.
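To make those three error classes concrete, here is a minimal Python sketch of the kind of guard check that protects against each one. It is an illustration only, not CodePhage output: the function names and the signed 32-bit limit are assumptions chosen for the example, and because Python integers do not overflow, the overflow case is simulated against that limit.

```python
# Hypothetical examples of the three error classes named above.
# All names and the 32-bit limit are assumptions made for illustration.

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def read_pixel(pixels, index):
    """Out-of-bounds guard: reject an index outside the buffer."""
    if not 0 <= index < len(pixels):
        raise ValueError("index out of bounds")
    return pixels[index]


def buffer_size(width, height):
    """Integer overflow guard: reject sizes that would not fit in 32 bits."""
    if width < 0 or height < 0:
        raise ValueError("negative dimension")
    if height != 0 and width > INT32_MAX // height:
        raise ValueError("size overflows a signed 32-bit integer")
    return width * height


def average(total, count):
    """Divide-by-zero guard: reject an empty count before dividing."""
    if count == 0:
        raise ValueError("count must be non-zero")
    return total / count
```

A program missing any one of these checks is the sort of recipient that could, in principle, borrow the check from a donor that has it.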

CodePhage works by analyzing how the bad code processes data that doesn't cause it to crash or malfunction and data that does. It then feeds the error-inducing input to a donor program and records what checks and constraints the donor uses to handle the input safely. CodePhage then uses that information to correct the original program by taking code snippets from the more secure donor program that correctly handles the input -- usually a function or routine that sanitizes the input data. It then checks that the inserted code has fixed the bug. If it hasn't, it carries on looking for divergences in how the two programs handle the input. CodePhage doesn't require access to the source code of the donor applications and can import checks from applications written in programming languages other than the one in which the program it's repairing was written.
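The loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the researchers' implementation: it treats donor checks as plain Python functions, whereas the article describes CodePhage working without donor source code, and every name here (`recipient_scale`, `donor_zero_check`, `repair` and so on) is hypothetical.

```python
# Toy model of a check transplant. All names are hypothetical.

def recipient_scale(total, parts):
    """Buggy recipient: crashes with ZeroDivisionError when parts == 0."""
    return total // parts


def donor_sign_check(total, parts):
    """A donor check that does not address the bug (assumed for the demo)."""
    return total >= 0


def donor_zero_check(total, parts):
    """A donor check that rejects the error-inducing input."""
    return parts != 0


def transplant(recipient, check):
    """Return a patched recipient that applies the donor check first."""
    def patched(*args):
        if not check(*args):
            raise ValueError("input rejected by transplanted check")
        return recipient(*args)
    return patched


def crashes(func, args):
    """True if func fails on args other than by cleanly rejecting them."""
    try:
        func(*args)
    except ValueError:
        return False  # a clean rejection counts as safe handling
    except Exception:
        return True
    return False


def behaves_same(original, patched, args):
    """True if the patch leaves behaviour on a benign input unchanged."""
    try:
        return patched(*args) == original(*args)
    except Exception:
        return False


def repair(recipient, candidate_checks, benign_inputs, bad_input):
    """Try each donor check until one fixes the crash without regressions."""
    for check in candidate_checks:
        patched = transplant(recipient, check)
        if crashes(patched, bad_input):
            continue  # bug not fixed; keep looking at other checks
        if all(behaves_same(recipient, patched, a) for a in benign_inputs):
            return patched
    return None
```

Calling `repair(recipient_scale, [donor_sign_check, donor_zero_check], [(10, 2), (9, 3)], (10, 0))` skips the first check, because the patched program still crashes on the bad input, and returns a version guarded by the second. The validation step matters as much as the transplant itself: a check is accepted only if it stops the crash and leaves results on the benign inputs unchanged.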

Early tests on various common open source programs in which the researchers' crash-inducing input generator, DIODE, had found bugs have been very promising: bad code was patched in between two and 10 minutes, using code taken from between two and four donors each time. As a lot of coding time goes into ensuring data inputs won't disrupt the expected flow of a program, CodePhage could reduce development times by automatically correcting coding errors during the development process. As an automated code analysis and repair tool, CodePhage sounds very promising, and enterprise development teams should follow its development and look to incorporate it into their build process when possible. That said, it would be a misuse of this technology if developers abandoned secure coding practices and just relied on it to insert vital security checks for them.

This was last published in December 2015

Join the conversation


Does your organization use software transplants to help avoid bad code snippets? If so, do you find them effective?
That's certainly a very interesting approach with CodePhage, and it might be helpful. On the other hand, great caution is needed when running an automatic copy-and-replace code generation tool.
I would have to work with a tool like this a bit on my own before I could give it my seal of approval. I'm sure it may be helpful in some ways, but I wonder if overreliance will lead to new problems.
We don't, currently, although the more I research transplants, the more that I think we could benefit from them.
@Michael -

> the more I research transplants, the more that I think we could benefit from them
- Could you share a few examples?
A pretty interesting concept. It seems that the results should be better when smaller snippets of code are pulled from the donor program. Is there anything that allows for configuration of the size of snippets it can take?
I agree, sounds interesting. I've never heard of the term "software transplant" before. We definitely use a number of open source projects, and have run into a couple of issues with bugs in those. I can't think of any issues that stemmed from copying a code snippet online, though.
I’ve seen problems creep in when someone is working with a new (to them) technology and tries to copy/paste from multiple locations to get something working. Since they don’t really understand the code they are copying, they tend to come up with a solution that seems to work but introduces many unexpected consequences into the main code base. It sounds like transplants may be able to help identify some of these areas.
CodePhage might be helpful in addressing a certain class of bugs, but only of the "mechanical" kind. I also don't get how it would fit into a mature XP / TDD development model.
Copying and pasting code has another problem: IP concerns. Who owns what you copied?

The security issue here is major. Good devs look at examples, and then port and update the idea to fit properly into their code bases.

However, if they have a tool that can detect these security issues, and you actually trust it enough to click and apply the patch, that could be a significant development.