
obfuscation

What is obfuscation?

Obfuscation means to make something difficult to understand. Programming code is often obfuscated to protect intellectual property or trade secrets, and to prevent an attacker from reverse engineering a proprietary software program.

Encrypting some or all of a program's code is one obfuscation method. Other approaches include stripping out potentially revealing metadata, replacing class and variable names with meaningless labels and adding unused or meaningless code to an application script. A tool called an obfuscator will automatically convert straightforward source code into a program that works the same way, but is more difficult to read and understand.

Unfortunately, malicious code writers also use these methods to prevent their attack mechanisms from being detected by antimalware tools. The 2020 SolarWinds attack is an example of hackers using obfuscation to evade defenses.

Deobfuscation techniques can be used to reverse engineer -- or undo -- obfuscation. These techniques include program slicing, which involves narrowing the program code to just the relevant statements at a particular point in the program. Compiler optimization and program synthesis are two other deobfuscation techniques. Obfuscation aims to make reverse engineering difficult and not worth the trouble.
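Program slicing can be sketched for the simplest case: straight-line code with no branches or loops. This is a simplified illustration, not how production slicers work, since those analyze full control- and data-dependence graphs. The statement format, with hypothetical `def` and `uses` fields, is an assumption made to keep the example short. The function starts from the variable of interest, walks backward, and keeps only the statements that contribute to it, so inserted junk code drops out.

```javascript
// Each statement records the single variable it assigns (def) and the
// variables it reads (uses). This format is hypothetical, for illustration.
const program = [
  { text: "var a = 5;", def: "a", uses: [] },
  { text: "var junk = a * 7;", def: "junk", uses: ["a"] }, // dummy code
  { text: "var b = a + 1;", def: "b", uses: ["a"] },
  { text: "junk = junk ^ 3;", def: "junk", uses: ["junk"] }, // dummy code
  { text: "var result = b * 2;", def: "result", uses: ["b"] },
];

// Backward slice over straight-line code: walk from the last statement to the
// first, keep any statement that assigns a variable that is still needed, then
// replace that variable in the "needed" set with the variables it reads.
function backwardSlice(statements, target) {
  const needed = new Set([target]);
  const slice = [];
  for (let i = statements.length - 1; i >= 0; i--) {
    const stmt = statements[i];
    if (needed.has(stmt.def)) {
      slice.unshift(stmt.text);
      needed.delete(stmt.def);
      stmt.uses.forEach((v) => needed.add(v));
    }
  }
  return slice;
}

console.log(backwardSlice(program, "result").join("\n"));
// var a = 5;
// var b = a + 1;
// var result = b * 2;
```

Both `junk` statements are absent from the slice for `result`, because nothing that `result` depends on ever reads `junk`.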

How does obfuscation work?

Obfuscation in computer code uses complex, roundabout expressions and redundant logic to make the code difficult for the reader to understand. The goal is to distract the reader with complicated syntax so that it is hard to determine what the code actually does.

With computer code, the reader may be a person, a computing device or another program. Obfuscation is also used to fool antivirus tools and other programs that identify code by matching it against signatures, which are known patterns of bytes or instructions. Decompilers, which automatically reconstruct approximate source code from compiled programs, are available for languages such as Java, operating systems such as Android and iOS, and development platforms like .NET. Obfuscation aims to make it difficult for these tools to produce readable output as well.

Code obfuscation does not change what a program does; it changes how the code is written and presented so that it is more confusing to read. The obfuscated program works the same way and produces the same output as the original.

What follows is an example snippet of normal JavaScript code:

var greeting = 'Hello World';
greeting = 10;
var product = greeting * greeting;

That same snippet in obfuscated form looks like this:

var _0x154f=['98303fgKsLC','9koptJz','1LFqeWV','13XCjYtB','6990QlzuJn','87260lXoUxl','2HvrLBZ','15619aDPIAh','1kfyliT','80232AOCrXj','2jZAgwY','182593oBiMFy','1lNvUId','131791JfrpUY'];var _0x52df=function(_0x159d61,_0x12b953){_0x159d61=_0x159d61-0x122;var _0x154f4b=_0x154f[_0x159d61];return _0x154f4b;};(function(_0x19e682,_0x2b7215){var _0x5e377c=_0x52df;while(!![]){try{var _0x2d3a87=-parseInt(_0x5e377c(0x129))*parseInt(_0x5e377c(0x123))+-parseInt(_0x5e377c(0x125))*parseInt(_0x5e377c(0x12e))+parseInt(_0x5e377c(0x127))*-parseInt(_0x5e377c(0x126))+-parseInt(_0x5e377c(0x124))*-parseInt(_0x5e377c(0x12f))+-parseInt(_0x5e377c(0x128))*-parseInt(_0x5e377c(0x12b))+parseInt(_0x5e377c(0x12a))*parseInt(_0x5e377c(0x12d))+parseInt(_0x5e377c(0x12c))*parseInt(_0x5e377c(0x122));if(_0x2d3a87===_0x2b7215)break;else _0x19e682['push'](_0x19e682['shift']());}catch(_0x22c179){_0x19e682['push'](_0x19e682['shift']());}}}(_0x154f,0x1918c));var greeting='Hello\x20World';greeting=0xa;var product=greeting*greeting;

The obfuscated version is nearly impossible for a human reader to follow, even though it does the same thing as the original.
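Part of the visual noise in the obfuscated snippet comes from how values are encoded rather than from what they are. Its final statements write the space in 'Hello World' as the hex escape `\x20` and the number 10 as the hexadecimal literal `0xa`. The sketch below compares those final statements with the readable version. It renames the obfuscated variables with a hypothetical `Obf` suffix so that both versions can run in one script, and it leaves out the string-array rotation function at the start of the obfuscated snippet, which does not touch these variables.

```javascript
// Tail of the readable snippet.
var greeting = 'Hello World';
greeting = 10;
var product = greeting * greeting;

// Tail of the obfuscated snippet, with variables renamed (hypothetical
// "Obf" suffix) so both versions can run in the same script.
var greetingObf = 'Hello\x20World'; // \x20 is the escape for a space
greetingObf = 0xa;                   // 0xa is hexadecimal for 10
var productObf = greetingObf * greetingObf;

console.log('Hello\x20World' === 'Hello World'); // true
console.log(product, productObf);               // 100 100
```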

Obfuscation techniques

Obfuscation involves several different methods. Often, multiple techniques are used to create a layered effect.

Programs written in languages that compile to intermediate code, such as C# and Java, are easier to obfuscate. Their compilers produce intermediate-level instructions, such as Java bytecode and .NET Common Intermediate Language, which are generally easier to read and work with. In contrast, C++ is more difficult to obfuscate because it compiles directly to machine code, which is more difficult for people to work with.

Some common obfuscation techniques include the following:

  • Renaming. The obfuscator alters the names of methods and variables. The new names may include unprintable or invisible characters.
  • Packing. This compresses the entire program to make the code unreadable.
  • Control flow. The program's control flow is restructured so that the decompiled code looks like spaghetti logic: unstructured, hard-to-maintain code in which the line of thought is obscured. Its results are not clear, and it's hard to tell what the point of the code is by looking at it.
  • Instruction pattern transformation. This approach takes common instructions created by the compiler and swaps them for more complex, less common instructions that effectively do the same thing.
  • Dummy code insertion. Dummy code can be added to a program to make it harder to read and reverse engineer, but it does not affect the program's logic or outcome.
  • Metadata or unused code removal. Metadata and unused code give the reader extra information about the program that can help them read and debug it, much like annotations on a Word document. Removing them leaves the reader with less information about the program and its code.
  • Opaque predicate insertion. A predicate in code is a logical expression that is either true or false. An opaque predicate is a conditional branch -- or if-then statement -- whose outcome is fixed and known to the obfuscator but cannot easily be determined with static analysis. Inserting one introduces a branch that is never executed but that puzzles the reader trying to understand the decompiled output.
  • Anti-debug. Legitimate software engineers and hackers alike use debug tools to examine code line by line. Software engineers use them to spot problems with the code, and hackers use them to reverse engineer it. IT security pros can use anti-debug tools to identify when a hacker is running a debug program as part of an attack. Hackers, in turn, can use anti-debug tools to detect when someone is running a debugger to identify the changes their code is making.
  • Anti-tamper. These tools detect whether code has been tampered with and, if it has been modified, stop the program.
  • String encryption. This method uses encryption to hide the strings in the executable and only restores the values when they are needed to run the program. This makes it difficult to go through a program and search for particular strings.
  • Code transposition. This is the reordering of routines and branches in the code without having a visible effect on its behavior.
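Several of these techniques can be layered on a single function. The hand-written sketch below is illustrative only and is not the output of any particular obfuscator. It applies renaming, control-flow flattening (the loop is replaced by a `switch` dispatcher), an opaque predicate and dummy code insertion to a small function while leaving its behavior unchanged. The function names and the choice of predicate are assumptions made for the example.

```javascript
// Readable original: 1*1 + 2*2 + ... + n*n.
function sumOfSquares(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i * i;
  }
  return total;
}

// Hand-obfuscated equivalent (illustrative only, not produced by a real tool).
function _0xa1(_0xb2) {
  let _0xc3 = 0; // renamed from "total"
  let _0xd4 = 1; // renamed from "i"
  let _0xe5 = 0; // dispatcher state for the flattened control flow
  let _0xf6 = 0; // dummy variable: updated but never affects the result

  while (true) {
    switch (_0xe5) {
      case 0:
        // Opaque predicate: for any integer n, n * (n + 1) is even,
        // so execution always moves to state 1 and state 3 is dead code.
        _0xe5 = (_0xb2 * (_0xb2 + 1)) % 2 === 0 ? 1 : 3;
        break;
      case 1:
        // The loop test, moved out of the for statement into the dispatcher.
        _0xe5 = _0xd4 <= _0xb2 ? 2 : 4;
        break;
      case 2:
        _0xc3 += _0xd4 * _0xd4;
        _0xf6 = (_0xf6 ^ _0xd4) & 0xff; // dummy code insertion
        _0xd4++;
        _0xe5 = 1;
        break;
      case 3:
        return -1; // never reached for integer input
      case 4:
        return _0xc3;
    }
  }
}

for (let n = 0; n <= 5; n++) {
  console.log(n, sumOfSquares(n), _0xa1(n)); // both columns match
}
```

The opaque predicate relies on the fact that the product of two consecutive integers is always even. That is easy to prove by hand but may not be obvious to a tool that does not reason about arithmetic. The property only holds for integer input, which is what this example assumes.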

How to measure obfuscation success

The success of obfuscation methods can be measured using the following criteria:

  • Strength. Strength is the extent to which transformed code resists automated deobfuscation attempts. The more effort, time and resources deobfuscation takes, the stronger the obfuscation.
  • Differentiation. The degree to which transformed code differs from the original is another measure of how effective it is. Some of the ways used to judge differentiation include:
    • The number of predicates the new code contains.
    • The depth of the inheritance tree (DIT) -- a metric used to indicate the complexity of code. A higher DIT means a more complex program.
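  • Expense. This covers both the cost of applying the method and the extra execution time and memory the obfuscated program needs. A cost-efficient obfuscation method is more useful than an expensive one, particularly when it comes to how well it scales for larger applications.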
  • Complexity. The more layers the obfuscator adds, the more complex the program will be, making the obfuscation more successful.
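The depth of the inheritance tree can be computed directly for JavaScript classes by walking the constructor prototype chain. The sketch below assumes that a class with no superclass has depth 0. Some tools count the root class (for example, `java.lang.Object` in Java) and report values one higher. The class names are hypothetical. Computing this value before and after obfuscation is one way to apply the DIT criterion.

```javascript
// Hypothetical class hierarchy used only to exercise the metric.
class Shape {}
class Polygon extends Shape {}
class Square extends Polygon {}

// Depth of inheritance tree: count the superclasses above `cls` by walking
// the constructor prototype chain. A class without an `extends` clause has
// Function.prototype as its prototype, so it gets depth 0 here.
function depthOfInheritance(cls) {
  let depth = 0;
  let parent = Object.getPrototypeOf(cls);
  while (parent !== null && parent !== Function.prototype) {
    depth++;
    parent = Object.getPrototypeOf(parent);
  }
  return depth;
}

console.log(depthOfInheritance(Shape));   // 0
console.log(depthOfInheritance(Polygon)); // 1
console.log(depthOfInheritance(Square));  // 2
```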

Advantages of obfuscation

The main advantages of obfuscation are as follows:

  • Secrecy. Obfuscation hides the valuable information contained in code. This is an advantage for legitimate organizations looking to protect code from competitors and attackers. Conversely, bad actors capitalize on the secrecy of obfuscation to hide their malicious code.
  • Efficiency. Some obfuscation techniques, like unused code removal, have the effect of shrinking the program and making it less resource intensive to run.
  • Security. Obfuscation is a built-in security method, sometimes referred to as application self-protection. Instead of using an external security method, it works within what's being protected. It is well-suited for protecting applications that run in an untrusted environment and that contain sensitive information.
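One common form of built-in self-protection is an anti-tamper check, in which the code verifies its own integrity before running. The sketch below is a minimal, hypothetical example for Node.js. It hashes a function's source text with SHA-256 using Node's built-in `crypto` module and refuses to continue if the hash changes. The function names are illustrative. A check this simple is easy for a determined attacker to strip out, so in practice it would be combined with other obfuscation techniques.

```javascript
const crypto = require("crypto");

// Fingerprint a function by hashing its source text. The output of
// Function.prototype.toString can differ between engines, so the expected
// value must come from the same runtime that performs the check.
function fingerprint(fn) {
  return crypto.createHash("sha256").update(fn.toString()).digest("hex");
}

// Hypothetical function the application wants to protect.
function applyDiscount(price) {
  return price * 0.9;
}

// In a real build, this value would be computed ahead of time and embedded
// in the shipped code; computing it at startup keeps the sketch self-contained.
const EXPECTED_HASH = fingerprint(applyDiscount);

// Anti-tamper check: stop if the protected code no longer matches.
function verifyIntegrity(fn, expectedHash) {
  if (fingerprint(fn) !== expectedHash) {
    throw new Error("Integrity check failed: code was modified");
  }
}

verifyIntegrity(applyDiscount, EXPECTED_HASH); // passes silently

// Simulate an attacker patching the function so that everything is free.
const patched = function applyDiscount(price) { return 0; };
try {
  verifyIntegrity(patched, EXPECTED_HASH);
} catch (err) {
  console.log(err.message); // Integrity check failed: code was modified
}
```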

Disadvantages of obfuscation

One of the main disadvantages of obfuscation is that it is also used in malware. Malware writers rely on it to evade antivirus programs that scan code for specific features. By obscuring those features, they make the malware appear legitimate to the antivirus software.

Common techniques malware authors use include:

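  • Exclusive or (XOR). A bitwise operation that hides data by XORing each byte with a key. Applying the same key again restores the original data, but the encoded form no longer matches the plain-text patterns that scanners look for.
  • ROT-13. A simple substitution cipher that replaces each letter with the letter 13 places after it in the alphabet. Because the alphabet has 26 letters, applying ROT-13 twice restores the original text.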
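Both encodings are simple to implement and just as simple to reverse, which is why they hide strings from pattern-matching scanners rather than providing real secrecy. The sketch below is a minimal illustration; the function names and the single-byte key `0x5a` are arbitrary choices, not taken from any particular malware family.

```javascript
// XOR every character code with a one-byte key. Running the function again
// with the same key restores the original string.
function xorEncode(text, key) {
  return Array.from(text, (ch) => String.fromCharCode(ch.charCodeAt(0) ^ key)).join("");
}

// ROT-13: rotate each ASCII letter 13 places, preserving case; other
// characters pass through unchanged.
function rot13(text) {
  return text.replace(/[a-z]/gi, (ch) => {
    const base = ch <= "Z" ? 65 : 97; // 65 = 'A', 97 = 'a'
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

const hidden = xorEncode("Hello World", 0x5a);
console.log(hidden === "Hello World");    // false
console.log(xorEncode(hidden, 0x5a));     // Hello World
console.log(rot13("Hello World"));        // Uryyb Jbeyq
console.log(rot13(rot13("Hello World"))); // Hello World
```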

With obfuscation, instead of developing new malware, authors repackage commonly used, commodity attack methods to disguise their features. In some cases, malicious actors include vendor-specific techniques.

Another disadvantage of obfuscation is that it can make code harder to read and debug, even for its own developers, and can slow it down. For example, code that uses the string encryption obfuscation method must decrypt its strings at runtime, which slows performance.

Obfuscation and SolarWinds

An attack on SolarWinds, an IT management and monitoring software maker based in Austin, Texas, resulted in a host of other companies and government agencies being breached. The attack is thought to have started as far back as September 2019. It was discovered in December 2020 and is attributed to Russian hackers. It initially compromised SolarWinds' Orion IT management platform.

The attackers used Sunburst malware, which combined obfuscation, machine learning and AI techniques to plant a backdoor in software updates for the Orion platform. To disguise their efforts and bypass defenses, they altered audit logs, deleted files and programs after use and faked activity so that it appeared to come from legitimate applications on the network.

This supply chain attack is suspected to have remained undetected for more than a year. The malware inserted in the Orion code lay dormant and hidden until users downloaded the infected updates. It then spread through the network undetected and infected a long list of organizations using Orion.

Obfuscation is one of many techniques hackers employ to break into IT systems.

This was last updated in April 2021
