Malware in encrypted traffic uncovered with machine learning

Cisco claims it can accurately detect malware activity in encrypted traffic using machine learning, but some experts worry about privacy implications.

Detecting malware activity in encrypted traffic was thought to be an impossible task, but machine learning appears to have led to a working technique.

Blake Anderson, a technical leader at Cisco, and David McGrew, a fellow in the company's advanced security research group, said it isn't possible to look into encrypted traffic. But the two developed a machine learning model that studied data features in "TLS [Transport Layer Security] handshake metadata, DNS [domain name system] contextual flows linked to the encrypted flow, and the HTTP headers of HTTP-contextual flows" in order to see the difference in how these encrypted traffic streams were used in malicious and benign scenarios.

According to an article posted by Cisco and written by Jason Deign, the technique is called Encrypted Traffic Analytics (ETA), and it "involves looking for telltale signs in three features of encrypted data."

"The first [telltale sign] is the initial data packet of the connection. This by itself may contain valuable data about the rest of the content. Then, there is the sequence of packet lengths and times, which offers vital clues into traffic contents beyond the beginning of the encrypted flow," Cisco wrote. "Finally, ETA checks the byte distribution across the payloads of the packets within the flow being analyzed. Since this network-based detection process is aided by machine learning, its efficacy improves over time."
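The three signals Cisco describes can be pictured as simple per-flow features. The following is a minimal Python sketch, not Cisco's implementation: the `Packet` class, its fields and the `flow_features` function are hypothetical names introduced here for illustration, and a real system would work from captured network flows rather than hand-built objects.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Packet:
    """Hypothetical stand-in for one captured packet of an encrypted flow."""
    timestamp: float  # seconds since the start of the capture (assumed unit)
    payload: bytes    # payload bytes exactly as observed; never decrypted


def byte_distribution(packets: List[Packet]) -> List[float]:
    """Relative frequency of each byte value (0-255) across all payloads."""
    counts: Counter = Counter()
    for packet in packets:
        counts.update(packet.payload)  # iterating bytes yields ints 0-255
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 256
    return [counts[value] / total for value in range(256)]


def flow_features(packets: List[Packet]) -> Dict[str, object]:
    """Collect the three kinds of signal the article describes for one flow."""
    lengths = [len(packet.payload) for packet in packets]
    # Time gap between each packet and the one before it.
    gaps = [later.timestamp - earlier.timestamp
            for earlier, later in zip(packets, packets[1:])]
    return {
        "initial_packet": packets[0].payload if packets else b"",
        "packet_lengths": lengths,
        "inter_arrival_times": gaps,
        "byte_distribution": byte_distribution(packets),
    }
```

Feature dictionaries like this one, labeled as malicious or benign, are the kind of input a supervised classifier could be trained on. None of the steps requires decrypting the payload.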

Ajay Uggirala, director of product marketing at Imperva, based in Redwood Shores, Calif., said he was concerned about how the method would work in practice because "there are several challenges with inspecting encrypted traffic."

"There are different types of encryption and standards, and network security needs to make sure they can decrypt all the types of encrypted traffic to see what is in it, not just some of the traffic. Inspecting encrypted traffic impacts network performance, causing latency to network traffic," Uggirala told SearchSecurity. "The network operations team's service-level agreements rely on maintaining high performance, and in most cases, the team would not deploy a security device that adds latency to network traffic."

Cisco did not respond to requests for comment, so it is unclear whether this method adds latency to encrypted traffic.

Nick Bilogorskiy, senior director of threat operations at Cyphort Inc., based in Santa Clara, Calif., and Mounir Hahad, senior director of Cyphort Labs, said the success of Cisco's technique will depend on accuracy.

"Looking for malware activity in encrypted traffic without decryption is similar to being in a dark room looking for a black cat without a flashlight. Yes, you will see shadows, but I don't expect the accuracy of this method is sufficiently high," Bilogorskiy told SearchSecurity. "Supervised machine learning is a good tool to pick up anomalies, given clean voluminous data. But as attackers learn of this ETA detection method, they will likely try to modify their encrypted traffic to blend in and remove the features that machine learning models rely on for detection."

Hahad added, "It is interesting to note that this technique is not being introduced in Cisco's purpose-built security products like the ASA firewall or Advanced Malware Protection Threat Grid appliances, but rather in its switches and routers."

In their research abstract, Anderson and McGrew claimed "incorporating this contextual information into a supervised learning system significantly increases performance at a 0.00% false discovery rate for the problem of classifying encrypted, malicious flows."

Sam McLane, head of security engineering at Arctic Wolf Networks Inc., based in Sunnyvale, Calif., noted that even "an algorithm that results in zero false positives is going to have false negatives [and] threats that were missed. There is no silver bullet in security, and you cannot rely on just one detection method."
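The gap between a 0.00% false discovery rate and missed threats is easy to see with a little arithmetic. The sketch below uses the standard definitions of the two rates. The counts in the usage note are invented for illustration and are not drawn from Anderson and McGrew's research.

```python
def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """Share of flagged flows that were actually benign: FP / (FP + TP)."""
    flagged = true_positives + false_positives
    return false_positives / flagged if flagged else 0.0


def false_negative_rate(true_positives: int, false_negatives: int) -> float:
    """Share of malicious flows that were never flagged: FN / (FN + TP)."""
    malicious = true_positives + false_negatives
    return false_negatives / malicious if malicious else 0.0
```

With hypothetical counts of 90 malicious flows flagged, zero benign flows flagged and 10 malicious flows missed, `false_discovery_rate(90, 0)` is 0.0, while `false_negative_rate(90, 10)` is 0.1. Every alert would be correct, yet one in 10 threats would still go undetected.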

Potential privacy concerns

Tom Kellermann, CEO of Strategic Cyber Ventures, based in Washington, D.C., said the research itself was sound, but he wasn't convinced it was "foolproof or scalable." And he said he was worried this technique of scanning encrypted traffic could be modified "to find other types of data and allow for futuristic packet inspection and sniffing."

McLane echoed this concern and said this method can likely "be modified for any type of data. Just like any technology, it can be used for legal or illegal purposes."

"This can absolutely be used for malicious purposes. TLS encryption has been around for a long time, but cybercriminals do not use it as much because there is an added expense for them," McLane said. "As it becomes more widely adopted for the public internet, I expect cybercriminals will likely adopt it, as well, for their criminal efforts."

Uggirala said, "In the wrong hands, the technology could be modified to look for other types of metadata, such as email headers, user IDs, etc., which could then be copied and sent to bad actors to monetize."

Bilogorskiy imagined Cisco's technique "may be adapted to classify web browsing patterns and try to deanonymize a user's persona online. If there is enough distinction in the training data, then one could track individual users or groups of users online."
