De-anonymizing malware: Can a new tactic help?

Researchers have discovered a new technique for de-anonymizing software that could help infosec teams attribute malware attacks. Expert Nick Lewis explains how it works.

Identifying patterns can be incredibly useful in understanding data -- even when that data is incomplete or hidden. For example, Morse code operators could be identified by their "fist," the individual style in which an operator sends messages. Weather prediction likewise relies on computer models that identify patterns. New research from Princeton University offers a way of de-anonymizing malware and identifying programmers based on executable code. Being able to identify a programmer without looking at source code could benefit information security professionals and enterprises alike.

This tip looks at the research into de-anonymizing malware and identifying programmers based on executable code, and explains how that could be useful.

Identifying programmers based on executable code

In its research, the Princeton University team analyzed binary executables and identified malware authors in limited scenarios. The researchers decompiled the executable code into source code and then used machine learning to identify patterns in the code that could be attributed to a specific programmer from a limited group of individuals. They then analyzed executables from Google Code Jam and GitHub whose authors were known, and they were able to identify the author in 96% of the cases.
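
As a rough illustration of the idea described above, the following Python sketch attributes a code snippet to one of a small set of known authors. It is a toy under stated assumptions, not the researchers' method: the snippets stand in for decompiler output, the authors "alice" and "bob" are hypothetical, and a simple nearest-profile comparison using token bigrams and cosine similarity stands in for the machine learning model, which this article does not describe in detail.

```python
import math
import re
from collections import Counter


def features(code):
    """Count adjacent token pairs (bigrams) in a snippet of code."""
    # Identifiers are kept whole; every other non-space character is one token.
    tokens = re.findall(r"[A-Za-z_]\w*|\S", code)
    return Counter(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Build one aggregate bigram profile per known author."""
    profiles = {}
    for author, snippets in samples.items():
        profile = Counter()
        for snippet in snippets:
            profile.update(features(snippet))
        profiles[author] = profile
    return profiles


def attribute(code, profiles):
    """Return the known author whose profile is most similar to the code."""
    feats = features(code)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))


# Hypothetical training data: two authors with visibly different habits.
KNOWN_SAMPLES = {
    "alice": [
        "for (i = 0; i < n; i++) { sum += v[i]; }",
        "for (j = 0; j < m; j++) { total += w[j]; }",
    ],
    "bob": [
        "while (n--) { *p++ = 0; }",
        "while (k--) { *q++ = 1; }",
    ],
}

profiles = train(KNOWN_SAMPLES)
print(attribute("for (k = 0; k < len; k++) { acc += buf[k]; }", profiles))
```

Like the research, this sketch only picks the closest match from a closed set of candidate authors; it cannot say that a sample was written by someone outside that set.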

When a compiler turns source code into executable binary code that an operating system can directly execute, it typically strips out variable names and comments, applies several optimizations and makes many other changes that remove most identifying information from the source code. It does this to improve the performance of the executable code, not to conceal who wrote it.

In the paper, the researchers outlined some of the difficulties with applying these techniques in the wild, such as requiring that a specific binary have only one author, handling statically linked executables and knowing the compiler used. They specifically mentioned obfuscation techniques as an area that could make it more difficult to identify the author of a piece of malware.

De-anonymizing malware authors

In the paper, the researchers offer a few caveats about the challenges of using their techniques to identify malware authors. Even with those caveats, the work is a valuable advancement that further research can build upon. Being able to identify the author(s) of an executable could provide several useful pieces of intelligence for malware analysis or incident response. Investigators may not even need to know the identity of the author -- just the "fist," or the characteristics of the programmer, to categorize malware. Knowing the author of an executable can serve as a shortcut for analyzing malware or for attributing a specific attack. If investigators have a profile of an author's attack capabilities or techniques and can identify that author by analyzing the binaries used in an attack, this could provide a high degree of confidence in attribution. This de-anonymizing analysis could drastically reduce the scope and time required to identify and attribute an attack simply because investigators know, for example, that a particular author or group always uses a specific executable packer.
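
The narrowing effect described above can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the group names, the technique labels, the "custom-xor" packer label and the ActorProfile structure are assumptions, not data from the research. The sketch simply keeps the known actors whose profile is consistent with what was observed in a sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorProfile:
    """Hypothetical profile of a known author or group."""
    name: str
    packers: frozenset      # executable packers the actor is known to use
    techniques: frozenset   # attack techniques seen in the actor's malware


# Made-up profiles; a real investigation would build these from prior cases.
KNOWN_ACTORS = [
    ActorProfile("group-a", frozenset({"UPX"}),
                 frozenset({"keylogging", "process-injection"})),
    ActorProfile("group-b", frozenset({"custom-xor"}),
                 frozenset({"dns-tunneling"})),
    ActorProfile("group-c", frozenset({"UPX", "custom-xor"}),
                 frozenset({"dns-tunneling", "keylogging"})),
]


def shortlist(actors, packer, techniques):
    """Return names of actors consistent with the observed packer and techniques."""
    observed = frozenset(techniques)
    return [
        actor.name
        for actor in actors
        if packer in actor.packers and observed <= actor.techniques
    ]


# Each extra observation shrinks the list of candidates to investigate.
print(shortlist(KNOWN_ACTORS, "UPX", {"keylogging"}))
print(shortlist(KNOWN_ACTORS, "UPX", {"keylogging", "process-injection"}))
```

A shortlist like this is only a starting point for analysis; a matching packer or technique is shared evidence, not proof of authorship.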

On the other hand, the researchers called out the potential security and privacy implications for anonymous legitimate programmers who could be identified by their executable code. This de-anonymizing research might spur advances in privacy protection for programmers, but even with such protection, it would still be easy to identify the programmer behind a particular piece of code by looking through programming forums and inquiring about the specific code or algorithms. Even looking through LinkedIn profiles and connections can provide clues about the authorship of commercial or open source code. In addition, software development teams within enterprises retain authorship details in their source code management systems for many different reasons, so legitimate programmers are unlikely to be anonymous in an enterprise to begin with.


There are enormous challenges in trying to identify programmers based on executable code, and the work is probably best left to malware researchers who have experience conducting such analyses. But the de-anonymizing technique outlined by the Princeton University researchers can provide benefits in those efforts. It can be used as a shortcut for analyzing malware or attributing a specific attack, and that information can be used to identify a threat and help prioritize the security controls known to protect against it. For most enterprises, this may not have significant value beyond reviewing research reports to ensure their information security program includes the necessary security controls. But in the event of a major attack, it could provide much-needed intelligence about the malware authors behind the attack to both incident response teams and the authorities.

This was last published in March 2016
