Preparing for extrusion detection with a network traffic analysis

Extrusion detection and prevention products can help companies proactively thwart internal data security breaches, but preparation is required before making a purchase. In this Data Protection Security School tip, Richard Bejtlich discusses the importance of network traffic awareness, along with the ability to acquire data from conversing hosts.
This tip is part of the Data Protection Security School lesson on preventing data leakage. Visit the Preventing data leakage lesson page for more learning resources.

Extrusion detection and prevention products are designed to inspect and/or deny network traffic carrying unauthorized content beyond the perimeter of the enterprise. Terms associated with extrusion products include data leak protection (DLP), exfiltration, and intellectual property leakage (IPL). The basic idea is to identify and/or stop sensitive business content -- such as Social Security numbers, credit card numbers, sales data, and the like -- from leaving the network.

But before buying an extrusion detection or prevention product, security professionals must prepare by engaging in a number of technical steps. It's worth noting that a number of non-technical steps -- including creating or reviewing the organization's security policy, identifying and prioritizing business information and systems and formulating attack scenarios -- are just as important and should typically come first. But once these non-technical steps are taken care of, one can turn to technical considerations. The most efficient way to do this is by conducting a network traffic analysis.

Knowing the network
It's important to acquire and maintain a sense of the traffic traversing the network. This sort of situational awareness doesn't need to take place at the per-packet level. Instead, start with statistical data. Open source tools like Darkstat and Ntop can be deployed on stand-alone passive sensors to gather traffic volume statistics, active IP addresses and observed services. For example, one might run Darkstat for 48 hours and notice a lot of traffic from a company host to a machine in Russia. Simply seeing this traffic could indicate a security problem.

The following is an example of output for a specific IP from Darkstat:

In: 595,241,799
Out: 21,944,219
Total: 617,186,018

TCP ports
(1-5 of 5)
Port  Service  In           Out         Total        SYNs
22    ssh      594,567,104  11,398,612  605,965,716  6
995   pop3s    213,802      7,136,793   7,350,595    42
80    http     271,439      2,867,041   3,138,480    249
443   https    87,593       312,445     400,038      51
25    smtp     101,861      229,328     331,189      34

UDP ports

The table is empty.

IP protocols
(1-1 of 1)
#  Proto  In           Out         Total
6  tcp    595,241,799  21,944,219  617,186,018
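Counters like these become easier to read once each service is expressed as a share of the host's total traffic. The following is a minimal Python sketch of that arithmetic, assuming the per-port rows above have been copied in by hand. The names PORT_ROWS and summarize are illustrative only and are not part of Darkstat.

```python
# Per-port byte counters copied by hand from the Darkstat output above.
# Each row is (port, service, bytes_in, bytes_out).
PORT_ROWS = [
    (22, "ssh", 594_567_104, 11_398_612),
    (995, "pop3s", 213_802, 7_136_793),
    (80, "http", 271_439, 2_867_041),
    (443, "https", 87_593, 312_445),
    (25, "smtp", 101_861, 229_328),
]


def summarize(rows):
    """Return one dict per port, sorted by total bytes, largest first."""
    grand_total = sum(b_in + b_out for _, _, b_in, b_out in rows)
    summary = []
    for port, service, b_in, b_out in rows:
        total = b_in + b_out
        summary.append({
            "port": port,
            "service": service,
            "total": total,
            # Fraction of all bytes seen for this host
            "share": total / grand_total,
        })
    return sorted(summary, key=lambda row: row["total"], reverse=True)


if __name__ == "__main__":
    for row in summarize(PORT_ROWS):
        print(f"{row['port']:>5} {row['service']:<6} "
              f"{row['total']:>13,} {row['share']:6.1%}")
```

A single service dominating a host's traffic, as ssh does here, is the kind of pattern that statistical data surfaces quickly and that may merit a closer look.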

Statistical data is helpful, but it's not granular enough to identify individual connections of interest. To acquire information on hosts conversing on a per-connection basis, I recommend collecting session data. Session data records the source IP, destination IP, source port, destination port, protocol and amount of traffic sent by each side of a conversation. Layer 3 switches and routers can export session data in NetFlow and similar formats to open source collectors and analyzers like Flow-tools. Other open source tools like Argus can operate independently, collecting and analyzing session data. The Security Analyst's Network Connection Profiler (SANCP) is integrated into Sguil, an open source suite for network security monitoring.
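To make the idea of a session record concrete, the following Python sketch groups individual packet summaries into bidirectional sessions and counts the packets and bytes sent by each side. It is an illustration under stated assumptions, not how Argus, SANCP or a NetFlow exporter is implemented. The Packet tuple, the build_sessions function and the sample addresses (taken from the reserved documentation ranges) are all hypothetical.

```python
from collections import namedtuple

# Hypothetical packet summary holding only the fields session data needs.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto length")


def build_sessions(packets):
    """Group packets into sessions keyed by the first-seen direction."""
    sessions = {}
    for p in packets:
        forward = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        reverse = (p.dst_ip, p.dst_port, p.src_ip, p.src_port, p.proto)
        if forward in sessions:
            # Same direction as the packet that opened the session
            sessions[forward]["src_packets"] += 1
            sessions[forward]["src_bytes"] += p.length
        elif reverse in sessions:
            # Reply traffic from the other side of the conversation
            sessions[reverse]["dst_packets"] += 1
            sessions[reverse]["dst_bytes"] += p.length
        else:
            # First packet of a new conversation
            sessions[forward] = {
                "src_packets": 1, "src_bytes": p.length,
                "dst_packets": 0, "dst_bytes": 0,
            }
    return sessions


# Placeholder traffic: a client on port 1031 talking to a web server.
DEMO_PACKETS = [
    Packet("192.0.2.10", 1031, "198.51.100.20", 80, "tcp", 60),
    Packet("198.51.100.20", 80, "192.0.2.10", 1031, "tcp", 40),
    Packet("192.0.2.10", 1031, "198.51.100.20", 80, "tcp", 242),
]

if __name__ == "__main__":
    for key, counts in build_sessions(DEMO_PACKETS).items():
        print(key, counts)
```

Real collectors also track start and end times and expire idle sessions, which this sketch leaves out for brevity.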

The following is an example of session data for a conversation exported from SANCP and Sguil. (In Sguil this data is represented in a row format.)

Sensor: cel433
Session ID: 5055537005472227539
Start Time: 2007-04-20 15:45:35
End Time: 2007-04-20 15:45:35
->
Source Packets: 5 Bytes:302
Dest Packets: 5 Bytes:131

Beyond the packet
In addition to statistical and session data, one should be familiar with the process of collecting full content data in order to identify exactly what's represented by a session of interest. Full content data can be collected by many tools. Open source options include Tcpdump, Wireshark/Tshark/Dumpcap, Snort and Daemonlogger. When confronting unencrypted traffic, full content is the only way to identify the information transmitted in a session of interest.
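Once full content has been saved to disk, it can be read back with very little code. The following Python sketch walks the records of a capture file using only the standard library. It assumes the classic libpcap file format with microsecond timestamps (not pcapng), and the function names read_pcap and make_pcap are illustrative. The make_pcap helper exists only to build a tiny in-memory file so the reader can be exercised without a real capture.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def read_pcap(stream):
    """Yield (ts_sec, ts_usec, captured_bytes, original_length) per record."""
    header = stream.read(24)  # fixed-size global header
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order used by the writer
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record = stream.read(16)  # per-packet record header
        if len(record) < 16:
            break
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # capture was cut off mid-packet
        yield ts_sec, ts_usec, data, orig_len


def make_pcap(payloads, snaplen=65535, linktype=1):
    """Build a little-endian pcap byte string for demonstration only."""
    out = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)
    for i, payload in enumerate(payloads):
        out += struct.pack("<IIII", i, 0, len(payload), len(payload))
        out += payload
    return out


if __name__ == "__main__":
    demo = io.BytesIO(make_pcap([b"abc", b"hello"]))
    for ts_sec, ts_usec, data, orig_len in read_pcap(demo):
        print(ts_sec, ts_usec, len(data), orig_len)
```

The sketch stops at the raw bytes of each record. Decoding the link, IP and TCP layers and reassembling streams is the work that tools such as Wireshark and Tcpflow perform.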

The following is the full content for the session previously demonstrated. It was collected by Snort running in packet-collection mode and reconstructed within Sguil by Tcpflow. P0f provided operating system identification.

Sensor Name: cel433
Timestamp: 2007-04-20 15:45:35
Connection ID: .cel433_5055537005472227539
Src IP: (
Dst IP: (
Src Port: 1031
Dst Port: 80
OS Fingerprint: - Windows XP SP1+, 2000 SP3
OS Fingerprint: -> (distance 2, link: ethernet/modem)

SRC: Accept: */*
SRC: Accept-Language: en-us
SRC: UA-CPU: x86
SRC: Accept-Encoding: gzip, deflate
SRC: If-Modified-Since: Mon, 08 Jan 2007 04:44:47 GMT
SRC: If-None-Match: "403b6-d5e-16cc4dc0"
SRC: User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
SRC: Host:
SRC: Connection: Keep-Alive
DST: HTTP/1.1 304 Not Modified
DST: Date: Fri, 20 Apr 2007 15:45:47 GMT
DST: Server: Apache/2
DST: Connection: close
DST: ETag: "403b6-d5e-16cc4dc0"

As you can see, the connection shows a request for the / or index of

Conducting a network traffic analysis or forensics exam to reveal statistical, session and full content data helps security professionals understand their networks, thereby guiding their decision to implement extrusion products. After all, it does not make sense to select and deploy an extrusion product if an organization doesn't understand the traffic on its network. Only after gaining the ability to recognize the properties of the data traversing the wire does it become possible to be an informed buyer of extrusion tools.

About the author
Richard Bejtlich is an expert on data protection and information leakage. He is the author of The Tao of Network Security Monitoring and Extrusion Detection, and co-author of Real Digital Forensics. He is a frequent speaker and the author of the TaoSecurity blog.

This was last published in May 2007
