ipsumdump - produce ASCII summary of network traffic or tcpdump(1) trace
ipsumdump [-r | -i | ...] [--src, --dst, --sport, --dport, ...] [other options] [files or interfaces]
The ipsumdump program reads IP packets from one or more data sources, then summarizes those packets into a line-based ASCII file. The resulting summary dump is easy to process with text-based tools. (But see the --binary option, which generates a smaller binary file.)
Here are a couple of lines of ipsumdump output, from 'ipsumdump -sd /home/kohler/largedump.gz':
!IPSummaryDump 1.3
!creator "ipsumdump -sd /home/kohler/largedump.gz"
!host max.lcdf.org
!runtime 1000943858.353723 (Wed Sep 19 16:57:38 2001)
!data ip_src ip_dst
64.55.139.202 209.247.204.242
18.26.4.9 64.55.139.202
The '-sd' option, which is equivalent to '--src --dst', tells ipsumdump to log source and destination IP addresses. '/home/kohler/largedump.gz' is a compressed tcpdump(1) file. Each data line represents a packet; a space separates the two addresses. The !data comment describes the contents of each data line.
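Because the !data comment names the columns, a consumer can turn each data line into a record without hard-coding the field order. The Python sketch below does this for the example above. The function name is hypothetical. It assumes the current unquoted !data syntax and no payload field, because payload strings can contain spaces that a plain whitespace split would break apart.

```python
def parse_ipsumdump(lines):
    """Split ipsumdump ASCII output into (field_names, rows).

    Assumes unquoted '!data' names and no quoted payload field.
    """
    fields = []
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("!"):
            # '!data' names the columns; other '!' lines (!creator, !host,
            # !runtime, !bad, ...) are comments and are skipped here.
            if line.startswith("!data "):
                fields = line.split()[1:]
            continue
        rows.append(dict(zip(fields, line.split())))
    return fields, rows


sample = [
    "!IPSummaryDump 1.3",
    '!creator "ipsumdump -sd /home/kohler/largedump.gz"',
    "!host max.lcdf.org",
    "!runtime 1000943858.353723 (Wed Sep 19 16:57:38 2001)",
    "!data ip_src ip_dst",
    "64.55.139.202 209.247.204.242",
    "18.26.4.9 64.55.139.202",
]
fields, rows = parse_ipsumdump(sample)
print(fields)
print(rows[0]["ip_dst"])
```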
Source options tell ipsumdump what kind of data source to use: tcpdump(1) raw-packet files (--tcpdump), live network interfaces (--interface), NetFlow summary files (--netflow-summary), ipsumdump output files (--ipsumdump), DAG or NLANR-formatted files (--dag, --nlanr), or others.
Non-option arguments specify the files, or interfaces, to read. For example, 'ipsumdump -r eth0 eth1' will read two tcpdump(1) files, named "eth0" and "eth1"; 'ipsumdump -i eth0 eth1' will read from two live network interfaces, "eth0" and "eth1".
Options that read files read from the standard input when you supply a single dash, -, as a filename, or when you give no filenames at all.
Read from one or more files produced by tcpdump(1)'s -w option (also known as "pcap files"). Stop when all the files are exhausted. This is the default. Files (except for standard input) may be compressed by gzip(1) or bzip2(1); ipsumdump will uncompress them on the fly.
Read from live network interfaces. When run this way, ipsumdump will continue until interrupted with SIGINT or SIGTERM. When stopped, ipsumdump appends a comment to its output file indicating how many packets were dropped by the kernel before output.
Read from one or more ipsumdump files. Any packet characteristics not specified by the input files are set to 0.
Read from one or more ipsumdump files, using the specified default format. The format should be a space-separated list of content types; see ToIPSummaryDump(n) for a list.
Read from one or more DAG-formatted trace files. For new-style ERF dumps, which contain encapsulation type information, just say --dag. For old-style dumps, you must supply the right encap argument: ATM for ATM RFC-1483 encapsulation (the most common), ETHER for Ethernet, PPP for PPP, IP for raw IP, HDLC for Cisco HDLC, PPP_HDLC for PPP HDLC, or SUNATM for Sun ATM. See http://dag.cs.waikato.ac.nz/.
Read from one or more NLANR-formatted trace files (fr, fr+, or tsh format). See http://pma.nlanr.net/Traces/.
Read from one or more NetFlow summary files. These are line-oriented ASCII files; blank lines, and lines starting with '!' or '#', are ignored. Other lines should contain 15 or more fields separated by vertical bars '|'. Ipsumdump pays attention to some of these fields:
Field Meaning Example
----- ---------------------------- ----------
0 Source IP address 192.4.1.32
1 Destination IP address 18.26.4.44
5 Packet count in flow 5
6 Byte count in flow 10932
7 Flow timestamp (UNIX-style) 998006995
8 Flow end timestamp 998006999
9 Source port 3917
10 Destination port 80
12 TCP flags (OR of all pkts) 18
13 IP protocol 6
14 IP TOS bits 0
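The field table above is enough to pull the interesting values out of a NetFlow summary line. In the sketch below, the key names ("packets", "start", "end", and so on) are illustrative labels chosen for this example, not ipsumdump's own names. The sample line is made up: it reuses the example values from the table and puts placeholder zeros in the fields that ipsumdump ignores.

```python
# Field index -> illustrative name; indices follow the table above.
NETFLOW_FIELDS = {
    0: "src", 1: "dst", 5: "packets", 6: "bytes",
    7: "start", 8: "end", 9: "sport", 10: "dport",
    12: "tcp_flags", 13: "proto", 14: "tos",
}


def parse_netflow_summary(line):
    """Return a dict for one NetFlow summary line, or None for skipped lines."""
    line = line.strip()
    # Blank lines and lines starting with '!' or '#' are ignored.
    if not line or line[0] in "!#":
        return None
    parts = line.split("|")
    if len(parts) < 15:
        raise ValueError("expected at least 15 '|'-separated fields")
    record = {}
    for index, name in NETFLOW_FIELDS.items():
        value = parts[index].strip()
        # Addresses stay as strings; every other field used here is an integer.
        record[name] = value if index in (0, 1) else int(value)
    return record


example = "192.4.1.32|18.26.4.44|0|0|0|5|10932|998006995|998006999|3917|80|0|18|6|0"
print(parse_netflow_summary(example))
```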
Read from one or more files containing tcpdump(1) textual output. It's much better to use the binary files produced by 'tcpdump -w', but if someone threw those away and all you have is the ASCII output, you can still make do. Only works with tcpdump versions 3.7 and earlier.
These options determine the dump's contents. Each data option adds a field to the output file; you can supply any number of data options. In the output, fields are separated by spaces. If you say '-sd', or the equivalent '--src --dst', the dump's data lines will contain an IP source address, a space, and an IP destination address:
192.168.1.101 18.26.4.44
If you supply no dump content options, ipsumdump will not create a summary dump. This may be useful if you're only interested in creating a tcpdump file with --write-tcpdump.
Include packet timestamp in the dump. Example: 1000212480.005813. For NetFlow summary input, the packet timestamp equals the flow-end timestamp. The timestamp has nanosecond precision when the input timestamps have nanosecond precision.
Include flow-begin timestamp in the dump. Example: 1000212479.001937. This is meaningful only for packet sources that include flow-begin timestamps, such as NetFlow summaries.
Include packet count in the dump. Some kinds of logs -- such as NetFlow summary logs -- record information about flows, not packets. A flow represents multiple packets; the packet count says exactly how many. Example: 1. See also --multipacket, below.
Include wire length in the dump. This is the packet's length in the capture file, including any link headers and packet trailers. This is usually larger than --length, which returns the IP length.
Include the link number in the dump. TSH-format NLANR logs, NetFlow summary logs, and some IP summary logs can contain a link number. Example: 2. For NetFlow summary logs, --link uses the input interface number.
Include the Ethernet source address in the dump. Example: 00-0A-95-A6-D9-BC. Note that Ethernet addresses are only printed for IP packets.
Include the Ethernet destination address in the dump. Example: 00-0A-95-A6-D9-BC. Note that Ethernet addresses are only printed for IP packets.
Include IP source address in the dump. Example: 192.168.1.101.
Include IP destination address in the dump. Example: 18.26.4.44.
Include IP packet length in the dump, not including any link-level headers. Example: 72. See also --wire-length.
Include IP protocol in the dump. Can be T for TCP, U for UDP, I for ICMP, or a number for some other protocol.
Include IP fragment test in the dump. The field value is F for first fragments, f for second and subsequent fragments, and . (a single period) for nonfragments.
Include IP fragment offset in the dump. The field value is the fragment offset in bytes, possibly followed by a + suffix, indicating the MF (more fragments) flag. Examples: 0+ (fragment offset 0, more fragments forthcoming), 552 (fragment offset 552, this is the last fragment).
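Reading the two fragment fields back is mostly a matter of splitting off the optional + suffix. The sketch below is illustrative; the function names and the format_version parameter are hypothetical. The version check reflects the compatibility note later on this page, which says that format 1.0 expressed ip_fragoff in units of 8 bytes. A dash, which marks data from a bad or uncaptured header, comes back as None.

```python
def parse_ip_frag(field):
    """Map the fragment-test field to a label; '-' or unknown -> None."""
    return {"F": "first", "f": "later", ".": "unfragmented"}.get(field)


def parse_ip_fragoff(field, format_version=(1, 3)):
    """Return (offset_in_bytes, more_fragments), or None for a dash."""
    if field == "-":
        return None
    more = field.endswith("+")  # '+' suffix = MF (more fragments) flag
    offset = int(field[:-1] if more else field)
    if format_version < (1, 1):
        # Format 1.0 stored the offset in units of 8 bytes.
        offset *= 8
    return offset, more


print(parse_ip_fragoff("0+"), parse_ip_fragoff("552"))
```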
Include IP ID field in the dump. Example: 19371.
Include IP checksum in the dump. Example: 34987.
Include IP options in the dump. Single IP option fields have the following representations:
EOL, NOP Not written, but FromIPSummaryDump
understands 'eol' and 'nop'
RR 'rr{10.0.0.1,20.0.0.2}+5' (addresses
inside the braces come before the
pointer; '+5' means there is space for
5 more addresses after the pointer)
SSRR, LSRR 'ssrr{1.0.0.1,1.0.0.2^1.0.0.3}'
('^' indicates the pointer)
TS 'ts{1,10000,!45}+2++3' (timestamps only
[type 0]; timestamp values 1, 10000,
and 45 [but 45 has the "nonstandard
timestamp" bit set]; the option has
room for 2 more timestamps; the
overflow counter is set to 3)
'ts.ip{1.0.0.1=1,1.0.0.2=2}+5'
(timestamps with IP addresses [type 1])
'ts.preip{1.0.0.1=1^1.0.0.2,1.0.0.3}'
(prespecified IP addresses [type 3];
the caret is the pointer)
Other options '98' (option 98, no data),
'99=0:5:10' (option with data, data
octets separated by colons)
Multiple options are separated by semicolons. Any invalid option causes the entire field to be replaced by a single question mark, ?. A period, ., is used for packets with no options (except possibly EOL and NOP).
Include the IP time-to-live field in the dump.
Include the IP type of service field in the dump.
Include the IP header length in the dump. The length is measured in bytes.
Include the length of captured IP data in the dump. This can be less than the full IP length (see --length), since many packet capture programs will store only part of each packet's data.
Include TCP or UDP source port in the dump. Example: 8928. For packets that are neither TCP nor UDP, and for fragments after the first, this field is a single dash, -.
Include TCP or UDP destination port in the dump. Example: 80.
Include length of packet payload in the dump. This is the length of the TCP or UDP payload, for TCP or UDP packets, or the length of the IP payload, for other IP packets. Example: 1000.
Include the actual packet payload in the dump. This is the TCP or UDP payload, for TCP or UDP packets, or the IP payload, for other IP packets. Output as a double-quoted C string; non-ASCII characters, and double-quotes and backslashes, appear as C backslash escapes. Example: ",25\r\n\000".
Include an MD5 checksum of the packet payload in the dump. The payload is as defined above. In ASCII output, the checksum is a 22-character string consisting of characters [a-zA-Z0-9_@]; in binary output, it is a 16-byte binary digest. Example: sQy@IjqXnFPwZtgtwaC5Hb.
Like --payload-md5, but in ASCII output, the checksum is printed as 32 hexadecimal digits (the same format used by md5sum). Example: 12f6bb1941df66b8f138a446d4e8670c.
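To check a payload field against its hexadecimal MD5 field, you first undo the C escaping, then hash the resulting bytes. The sketch below assumes that bytes are escaped with the usual single-letter C escapes or with octal escapes of up to three digits, as in the ",25\r\n\000" example. It does not handle other escape forms such as \x. The function names are hypothetical. The 22-character --payload-md5 encoding is not covered because this page does not specify its alphabet order.

```python
import hashlib

# Single-character C escapes; other bytes are assumed to use octal escapes.
_SIMPLE_ESCAPES = {"n": 10, "r": 13, "t": 9, "a": 7, "b": 8,
                   "f": 12, "v": 11, "\\": 92, '"': 34}
_OCTAL = "01234567"


def decode_payload(field):
    """Turn a double-quoted C-string payload field back into bytes."""
    if len(field) < 2 or field[0] != '"' or field[-1] != '"':
        raise ValueError("expected a double-quoted string")
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))  # plain printable ASCII character
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt in _OCTAL:
            # Up to three octal digits, as in C.
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in _OCTAL:
                j += 1
            out.append(int(body[i + 1:j], 8))
            i = j
        elif nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            raise ValueError("unsupported escape: \\" + nxt)
    return bytes(out)


def payload_md5_hex(payload):
    """32 hexadecimal digits, the same format md5sum prints."""
    return hashlib.md5(payload).hexdigest()


data = decode_payload('",25\\r\\n\\000"')
print(data, payload_md5_hex(data))
```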
TCP header fields equal a dash, -, for non-TCP packets and non-first fragments.
Include TCP flags byte in the dump. Each flag is represented by an uppercase letter. Example: PA (PSH and ACK are on, everything else is off). If no flags are on, the field is . (a single period).
Flag characters are F for FIN, S for SYN, R for RST, P for PSH, A for ACK, U for URG, E for ECE (flag bit 6), C for CWR (flag bit 7), and N for Nonce Sum (flag bit 8).
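The flag letters correspond to bits of the TCP flags field, so converting between the two forms is a table lookup. The sketch below assumes that FIN is bit 0, which agrees with ECE, CWR, and Nonce Sum being bits 6, 7, and 8 above. It emits letters in bit order, which reproduces the PA example. The names are hypothetical. With this mapping, "SA" gives 18, the same value as the NetFlow example's TCP flags field.

```python
# Flag letters in bit order, FIN (bit 0) through Nonce Sum (bit 8).
TCP_FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08, "A": 0x10,
                 "U": 0x20, "E": 0x40, "C": 0x80, "N": 0x100}


def tcp_flags_to_int(field):
    """'PA' -> 0x18; '.' (no flags) -> 0; '-' (not TCP) -> None."""
    if field == "-":
        return None
    if field == ".":
        return 0
    value = 0
    for letter in field:
        value |= TCP_FLAG_BITS[letter]
    return value


def int_to_tcp_flags(value):
    """Inverse of tcp_flags_to_int, emitting letters in bit order."""
    letters = "".join(letter for letter, bit in TCP_FLAG_BITS.items()
                      if value & bit)
    return letters or "."


print(tcp_flags_to_int("PA"), int_to_tcp_flags(18))
```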
Include TCP sequence number in the dump. Example: 4009339012.
Include TCP acknowledgement number in the dump. Example: 4009339012.
Include TCP receive window in the dump. This value is not scaled by the connection's window scale, if any. Example: 480.
Include TCP options in the dump. Single TCP option fields have the following representations:
EOL, NOP No representation
MSS 'mss1400'
Window scale 'wscale10'
SACK permitted 'sackok'
SACK 'sack95-98'; each SACK block
is listed separately
Timestamp 'ts669063908:38382731'
Other options '98' (option 98, no data),
'99=0:5:10' (option with data, data
octets separated by colons)
Multiple options are separated by semicolons. Any invalid option causes the entire field to be replaced by a single question mark, ?. A period, ., is used for packets with no options (except possibly EOL and NOP).
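The representations in the TCP option table can be parsed back into structured values. The sketch below makes a few labeled assumptions. It assumes each SACK block appears as its own semicolon-separated 'sackLEFT-RIGHT' item. It assumes the colon-separated data octets of unknown options are decimal. It treats a dash or ? as "no usable value". The function name is hypothetical. Note that 'sackok' has to be tested before the 'sack' prefix.

```python
def parse_tcp_opts(field):
    """Parse a TCP options field into a list of (kind, value) pairs."""
    if field == ".":
        return []          # no options (EOL/NOP are never written)
    if field in ("?", "-"):
        return None        # invalid options, or not a TCP packet
    opts = []
    for item in field.split(";"):
        if item == "sackok":
            opts.append(("sackok", None))
        elif item.startswith("mss"):
            opts.append(("mss", int(item[3:])))
        elif item.startswith("wscale"):
            opts.append(("wscale", int(item[6:])))
        elif item.startswith("sack"):
            # Assumption: one block per 'sackLEFT-RIGHT' item.
            left, right = item[4:].split("-")
            opts.append(("sack", (int(left), int(right))))
        elif item.startswith("ts"):
            value, echo = item[2:].split(":")
            opts.append(("ts", (int(value), int(echo))))
        elif "=" in item:
            # Unknown option with data; octets assumed to be decimal.
            kind, data = item.split("=", 1)
            opts.append((int(kind), bytes(int(b) for b in data.split(":"))))
        else:
            opts.append((int(item), b""))  # unknown option, no data
    return opts


print(parse_tcp_opts("mss1400;sackok;wscale10"))
```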
Include SACK-related TCP options in the dump, using the format given under --tcp-opt, above.
UDP header fields equal a dash, -, for non-UDP packets and non-first fragments.
Include UDP length in the dump. This is the length reported in the UDP packet header. Example: 1000.
ICMP header fields equal a dash, -, for non-ICMP packets and non-first fragments.
Include ICMP type in the dump. Example: 3. A dash is output for non-ICMP packets.
Include ICMP code in the dump. Example: 8.
Include ICMP type in the dump, using textual names if known. Examples: echo, echo-reply, 100.
Include ICMP code in the dump, using textual names if known. Examples: filterprohibited, srcroutefail, reassembly, 97.
Write the summary dump to file instead of to the standard output.
Write the summary dump in binary format. See below for more information.
Write processed packets to a tcpdump(1) file (or to the standard output, if file is a single dash, -) in addition to the usual summary output. Options including --filter and dump contents require IP; in the presence of these options, the output tcpdump(1) file will contain only IP packets. (ARP packets, for example, will not be written.)
The file written for --write-tcpdump will use microsecond-precision timestamps, rather than nanosecond-precision timestamps (the default).
Do not include IP packet payloads in any --write-tcpdump output.
Only include packets and flows matching a tcpdump(1) filter. For example, 'ipsumdump -f "tcp && src net 18/8"' will summarize data only for TCP packets from net 18. (The syntax for filter is currently a subset of tcpdump's syntax.)
Print lines like '!bad IP header length 4' for packets with no IP headers, bad IP headers, or bad TCP/UDP headers. (A bad header has an incorrect length or unexpected version, or is spread across multiple fragments.) The !bad line will immediately precede the normal output line. Whether or not --bad-packets is true, a dash, -, is printed for any piece of information that came from a bad header, or that came from a portion of the header that was not captured.
Anonymize IP addresses in the output. The anonymization preserves prefix and class. This means, first, that two anonymized addresses will share the same prefix when their non-anonymized counterparts share the same prefix; and second, that anonymized addresses will be in the same class (A, B, C, or D) as their non-anonymized counterparts. The anonymization algorithm comes from tcpdpriv(1); it works like 'tcpdpriv -A50 -C4'.
If --anonymize and --write-tcpdump are both on, the tcpdump output file will have anonymized IP addresses. However, the file will contain actual packet data, unlike tcpdpriv output.
Do not place interfaces into promiscuous mode. Promiscuous mode is the default.
Sample packets with probability p. That is, p is the chance that a packet will cause output to be generated. The actual probability may differ from the specified probability, due to fixed point arithmetic; check the output for a !sampling_prob comment to see the real probability. Strictly speaking, this option samples records, not packets; so for NetFlow summaries without --multipacket, it will sample flows.
Supply this option if you are reading NetFlow or IP summaries -- files where each record might represent multiple packets -- and you would like the output summary to have one line per packet, instead of the default one line per record. See also --packet-count, above.
Sort output packets by increasing timestamp. Use this option when reading from multiple tcpdump(1) files to ensure that the output has sorted timestamps. Combine --collate with --write-tcpdump to collate overlapping tcpdump(1) files into a single, sorted tcpdump(1) file.
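The idea behind collation is a k-way merge of inputs that are each already in timestamp order. The sketch below applies that idea to ASCII dump lines. It does not show how ipsumdump implements --collate, which works on packets. It assumes that each source yields only data lines (no '!' comments), that each source is already sorted, and that the timestamp is the first field, as in a dump whose !data line starts with timestamp. The function name is hypothetical.

```python
import heapq
from decimal import Decimal


def collate(*sources):
    """Merge data lines from several already-sorted dumps by timestamp."""
    # Decimal keeps every printed digit, so microsecond and nanosecond
    # timestamps compare exactly (a float could round them).
    return heapq.merge(*sources, key=lambda line: Decimal(line.split()[0]))


a = ["1000967303.641581 64.71.165.130", "1000967304.253874 64.71.165.130"]
b = ["1000967303.670506 64.71.165.130"]
for line in collate(a, b):
    print(line)
```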
Process packets for time, an interval length in seconds (or give a suffix like '2m' or '1hr'). For --interface, ipsumdump will quit after it has run for time. For other options, ipsumdump will quit before writing a packet whose timestamp is more than time seconds later than the timestamp on the first packet it sees.
Skip the first count packets.
Output at most count packets, then quit.
addrs is a space- or comma-separated list of IP addresses and/or prefixes. When the summary dump completes, ipsumdump will write those addresses to the standard error, paired with their anonymized counterparts.
Useful when reading from interfaces. This option causes ipsumdump to write a comment recording the cumulative number of packets output, and the number of packets dropped by the kernel before ipsumdump could process them, every time seconds. (Or you can say, for example, '2m' for 2 minutes.) A sample comment:
!counts out 0 kdrop 0
This says that ipsumdump has output 0 records, and the kernel reported 0 packet drops since ipsumdump began.
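Because the !counts values are cumulative, per-interval rates come from differences between successive comments. The sketch below assumes that the comment consists of alternating name/value words after '!counts', as in the sample. The function names are hypothetical.

```python
def parse_counts(line):
    """'!counts out 0 kdrop 0' -> {'out': 0, 'kdrop': 0}"""
    words = line.split()
    if not words or words[0] != "!counts":
        raise ValueError("not a !counts comment")
    pairs = words[1:]
    return {pairs[i]: int(pairs[i + 1]) for i in range(0, len(pairs) - 1, 2)}


def interval_deltas(count_lines):
    """Yield per-interval differences between successive cumulative counts."""
    previous = None
    for line in count_lines:
        current = parse_counts(line)
        if previous is not None:
            yield {key: current[key] - previous.get(key, 0) for key in current}
        previous = current


print(parse_counts("!counts out 0 kdrop 0"))
```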
Set the random seed deterministically to seed, an unsigned integer. By default, the random seed is initialized to a random value using /dev/random, if it exists, combined with other data. The random seed indirectly determines which packets are sampled, and the values of anonymized IP addresses.
Do not use memory mapping when reading files. This may prevent crashes if you feed ipsumdump a corrupted file. See BUGS, below.
Do not print a progress bar to standard error. This is the default when ipsumdump isn't running interactively.
Do not print the IP summary dump header lines that make the dump self-describing.
Do not produce a summary. Instead, write the Click configuration that ipsumdump would run to the standard output.
Produce more verbose error messages.
Print a help message to the standard output, then exit.
Print version number and license information to the standard output, then exit.
When killed with SIGTERM or SIGINT, ipsumdump will exit cleanly by flushing its buffers. If you want it to flush its buffers without exiting, kill it with SIGHUP.
The '-tsSdDp' option set covers the most commonly useful information about each packet: timestamp, source address, source port, destination address, destination port, and protocol. Invoking 'ipsumdump -i eth1 -tsSdDp' might produce output like this:
!IPSummaryDump 1.3
!creator "ipsumdump -i eth1 -tsSdDp"
!host max.lcdf.org
!runtime 1000967293.569808 (Wed Sep 19 23:28:13 2001)
!data timestamp ip_src sport ip_dst dport ip_proto
1000967303.641581 64.71.165.130 80 192.168.1.101 4450 T
1000967303.670506 64.71.165.130 80 192.168.1.101 4450 T
1000967303.882621 18.26.4.44 - 192.168.1.101 - I
1000967304.253874 64.71.165.130 80 192.168.1.101 4442 T
1000967304.390016 192.150.187.11 53 192.168.1.101 1299 U
1000967304.425992 207.171.182.16 80 192.168.1.101 4451 T
Here is the same data, anonymized with -A:
!IPSummaryDump 1.3
!creator "ipsumdump --ipsumdump -A -tsSdDp"
!host max.lcdf.org
!runtime 1000968019.67508 (Wed Sep 19 23:40:19 2001)
!data timestamp ip_src sport ip_dst dport ip_proto
1000967303.641581 29.50.142.215 80 204.196.101.50 4450 T
1000967303.670506 29.50.142.215 80 204.196.101.50 4450 T
1000967303.882621 89.142.236.79 - 204.196.101.50 - I
1000967304.253874 29.50.142.215 80 204.196.101.50 4442 T
1000967304.390016 204.224.59.219 53 204.196.101.50 1299 U
1000967304.425992 192.230.64.231 80 204.196.101.50 4451 T
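The two example dumps above allow a quick check of the properties claimed for anonymization: any two anonymized addresses share exactly as many leading bits as their originals do, and each address keeps its class. The sketch below only tests those properties on the sample mapping. It does not reproduce the tcpdpriv algorithm, and passing the test on five addresses is a consistency check, not a proof. The helper names are hypothetical.

```python
import ipaddress
from itertools import combinations


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses share (0-32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def address_class(addr):
    """Classful network class of an IPv4 address, from its first octet."""
    first = int(ipaddress.IPv4Address(addr)) >> 24
    for limit, cls in ((128, "A"), (192, "B"), (224, "C"), (240, "D")):
        if first < limit:
            return cls
    return "E"


def preserves_prefixes_and_classes(mapping):
    """Check both properties over an original -> anonymized mapping."""
    for orig, anon in mapping.items():
        if address_class(orig) != address_class(anon):
            return False
    for (o1, a1), (o2, a2) in combinations(mapping.items(), 2):
        if common_prefix_len(o1, o2) != common_prefix_len(a1, a2):
            return False
    return True


# Original -> anonymized addresses, read off the two example dumps above.
sample_map = {
    "64.71.165.130": "29.50.142.215",
    "192.168.1.101": "204.196.101.50",
    "18.26.4.44": "89.142.236.79",
    "192.150.187.11": "204.224.59.219",
    "207.171.182.16": "192.230.64.231",
}
print(preserves_prefixes_and_classes(sample_map))
```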
Binary ipsumdump files begin with several ASCII lines, just like regular ipsumdump files. The line !binary indicates that the rest of the file, starting immediately after the newline, consists of binary records. Each record looks like this:
+---------------+------------...
|X|record length| data
+---------------+------------...
<---4 bytes--->
The initial word of data contains the record length in bytes. (All numbers in the file are stored in network byte order.) The record length includes the initial word itself, so the minimum valid record length is 4. The high-order bit X is the metadata indicator. It is zero for regular packets and one for metadata lines.
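The record framing described above can be walked with a few lines of code. The sketch below splits a binary dump at its !binary line. It then reads each 4-byte big-endian word, takes the high bit as the metadata flag and the remaining 31 bits as the record length, and returns the data that follows. It assumes the !binary line is not the first line of the file, which holds for the headers shown on this page. The function names are hypothetical.

```python
import struct

BINARY_MARKER = b"\n!binary\n"


def split_binary_dump(data):
    """Split a binary dump into (ASCII header bytes, binary record bytes)."""
    index = data.find(BINARY_MARKER)
    if index < 0:
        raise ValueError("no !binary line found")
    return data[:index + 1], data[index + len(BINARY_MARKER):]


def iter_records(body):
    """Yield (is_metadata, data) for each record in the binary part."""
    pos = 0
    while pos < len(body):
        if len(body) - pos < 4:
            raise ValueError("truncated record header")
        (word,) = struct.unpack_from("!I", body, pos)  # network byte order
        is_metadata = bool(word & 0x80000000)          # high-order bit X
        length = word & 0x7FFFFFFF                     # includes this word
        if length < 4 or pos + length > len(body):
            raise ValueError(f"bad record length {length} at offset {pos}")
        yield is_metadata, body[pos + 4:pos + length]
        pos += length


dump = (b"!IPSummaryDump 1.3\n!data ip_src\n!binary\n"
        + struct.pack("!I", 8) + bytes([18, 26, 4, 44])
        + struct.pack("!I", 0x80000000 | 9) + b"!bad\n")
header, body = split_binary_dump(dump)
print(list(iter_records(body)))
```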
Regular packet records have binary fields stored in the order indicated by the !data line, as follows:
Field Name Length Description
timestamp 8 timestamp sec, usec
ntimestamp 8 timestamp sec, nsec
first_timestamp 8 timestamp sec, usec
first_ntimestamp 8 timestamp sec, nsec
ip_src 4 source IP address
ip_dst 4 destination IP address
sport 2 source port
dport 2 destination port
ip_len 4 IP length field
ip_proto 1 IP protocol
ip_id 2 IP ID
ip_frag 1 fragment descriptor
('F', 'f', or '.')
ip_fragoff 2 IP fragment offset field
tcp_seq 4 TCP sequence number
tcp_ack 4 TCP ack number
tcp_flags 1 TCP flags
tcp_opt ? TCP options
tcp_sack ? TCP SACK options
payload_len 4 payload length
count 4 packet count
Each field is Length bytes long. Variable-length fields have Length ? in the table; in a packet record, these fields consist of a single length byte, followed by that many bytes of data.
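Given the !data field names and the table above, the data portion of a regular packet record (everything after the 4-byte length word) can be decoded field by field. The sketch below covers only the fields listed in the table and rejects any others. It keeps variable-length fields and the raw fragment-offset field uninterpreted. It returns timestamps as (seconds, subseconds) pairs, with the subsecond unit given in the table. The function and constant names are hypothetical.

```python
import ipaddress
import struct

# struct formats (network byte order) for fixed-width fields in the table.
FIXED_FIELDS = {
    "timestamp": "!II", "ntimestamp": "!II",
    "first_timestamp": "!II", "first_ntimestamp": "!II",
    "ip_src": "!4s", "ip_dst": "!4s",
    "sport": "!H", "dport": "!H",
    "ip_len": "!I", "ip_proto": "!B", "ip_id": "!H",
    "ip_frag": "!c", "ip_fragoff": "!H",
    "tcp_seq": "!I", "tcp_ack": "!I", "tcp_flags": "!B",
    "payload_len": "!I", "count": "!I",
}
# Variable-length ("?") fields: one length byte, then that many bytes.
VARIABLE_FIELDS = {"tcp_opt", "tcp_sack"}


def decode_packet_record(data_fields, data):
    """Decode one regular packet record's data, given the !data names."""
    record = {}
    pos = 0
    for name in data_fields:
        if name in VARIABLE_FIELDS:
            n = data[pos]
            record[name] = bytes(data[pos + 1:pos + 1 + n])  # kept raw
            pos += 1 + n
        elif name in FIXED_FIELDS:
            fmt = FIXED_FIELDS[name]
            values = struct.unpack_from(fmt, data, pos)
            pos += struct.calcsize(fmt)
            if name in ("ip_src", "ip_dst"):
                record[name] = str(ipaddress.IPv4Address(values[0]))
            elif name == "ip_frag":
                record[name] = values[0].decode("ascii")  # 'F', 'f', or '.'
            else:
                # Timestamps stay (sec, subsec) pairs; others are ints.
                record[name] = values if len(values) > 1 else values[0]
        else:
            raise KeyError(f"field {name!r} is not in this sketch's table")
    if pos != len(data):
        raise ValueError("record data has unexpected trailing bytes")
    return record


payload = (struct.pack("!II", 1000967303, 641581) + bytes([64, 71, 165, 130])
           + struct.pack("!H", 80) + bytes([6]))
print(decode_packet_record(["timestamp", "ip_src", "sport", "ip_proto"], payload))
```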
The data stored in a metadata record is just an ASCII string, ending with newline, same as in a regular ASCII IPSummaryDump file. !bad records, for example, are stored this way.
The ipsumdump program uses the Click modular router, an extensible system for processing packets. Click routers consist of C++ components called elements. While some elements run only in a Linux kernel, most can run either in the kernel or in user space, and there are user-level elements for reading packets from libpcap or from tcpdump files.
Ipsumdump creates and runs a user-level Click configuration. However, you don't need to install Click to run ipsumdump; the libclick directory contains all the relevant parts of Click, bundled into a library.
If you're curious, try running 'ipsumdump --config' with some other options to see the Click configuration ipsumdump would run.
This is, I think, a pleasant way to write a packet processor!
Version 1.0 of the IPSummaryDump ASCII file format expressed 'ip_fragoff' fields in units of 8 bytes. In version 1.1 and later, these fields are expressed in bytes.
Version 1.1 used W for CWR in tcp_flags fields. Early releases of version 1.0 printed a number between 0 and 255 for tcp_flags, or used X and Y for ECE and CWR. Version 1.2 and later use C for CWR.
The names of !data fields were formerly printed in quotes, and could contain spaces, like the following:
!data 'timestamp' 'ip src' 'sport' 'ip dst' 'dport' 'ip proto'
ipsumdump still understands files with the old format.
Version 1.2 could unfortunately contain incorrect MD5 checksums for packets with both link-level headers and short payloads, such as pure TCP acknowledgments.
Ipsumdump can use the mmap(2) system call to access files, which often has better performance. Unfortunately, if ipsumdump memory-maps a corrupt file, it may crash with a segmentation violation.
tcpdump(1), tcpdpriv(1), click(1), ipaggcreate(1)
See http://www.read.cs.ucla.edu/click/ for more on Click.
Eddie Kohler <kohler@cs.ucla.edu>, based on the Click modular router.
Extensive feedback and suggestions from Vern Paxson <vern@icir.org>. Anonymization algorithm from tcpdpriv(1) by Greg Minshall.