Application Protocol Identification

Shane Alcock, WAND Network Research Group, University of Waikato

Update: This work has now been converted into a software library. This means that our application protocol identification rules can be used in your own network traffic analysis programs.

See the libprotoident page for more information and to download the software.


Traditionally, network researchers have looked at TCP and UDP port numbers to determine which application is responsible for a given flow. For example, HTTP flows have typically operated over TCP port 80, SMTP over TCP port 25 and DNS over UDP port 53. With the advent of peer-to-peer protocols such as BitTorrent, which frequently operate on high ports that have been more-or-less randomly selected, port numbers alone are no longer enough to detect many applications of interest. This problem is best highlighted by the paper "Is P2P dying or just hiding".

Over the past few years, members of the WAND group have spent a significant amount of time developing rules to identify applications based on the first four bytes of application payload. Packets are truncated in many of our trace sets to provide us with access to these four bytes. The work has been primarily done by Shane Alcock with help from Aaron Murrihy and Donald Neal.

The particular version that is documented on this page uses only the payload extracted from the first packet observed for each direction of a bidirectional flow. This has the advantage that the identification code need only be called once per flow and can be done as soon as payload has been observed in both directions. It also means that we are less likely to be fooled by plain text content that may contain a string we are using to identify a protocol. The drawback is that any subsequent information that may have been useful in identifying the application is ignored, but previous experience has suggested that there are few occasions where this genuinely matters.
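The once-per-flow approach described above can be sketched roughly as follows. This is only an illustration and not the libprotoident API: the names Flow, Direction and observe are hypothetical, and the choice to store the full payload length alongside the first four bytes is an assumption based on what the rules further down this page refer to.

```python
from dataclasses import dataclass, field

PAYLOAD_BYTES = 4  # the rules on this page only inspect the first four bytes


@dataclass
class Direction:
    payload: bytes = b""  # first four bytes of the first payload-bearing packet
    length: int = 0       # full payload length of that packet (0 = none seen yet)


@dataclass
class Flow:
    # Index 0 and 1 are the two directions of the bidirectional flow.
    dirs: list = field(default_factory=lambda: [Direction(), Direction()])
    identified: bool = False


def observe(flow: Flow, direction: int, payload: bytes) -> bool:
    """Record a packet. Returns True exactly once per flow, at the moment
    payload has been seen in both directions."""
    d = flow.dirs[direction]
    if d.length == 0 and len(payload) > 0:
        # Only the first payload-bearing packet in each direction is kept.
        d.payload = payload[:PAYLOAD_BYTES]
        d.length = len(payload)
    ready = all(x.length > 0 for x in flow.dirs)
    if ready and not flow.identified:
        flow.identified = True
        return True
    return False
```

A caller would run the identification rules when observe returns True. Flows that only ever carry payload in one direction would need a separate trigger, such as flow expiry, which this sketch does not attempt.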

To perform our identification, the following data is required for each flow:

The following is a list of all the application protocols that we have developed rules for, along with a plain English description of the rules themselves. Unless otherwise noted, ALL of the rules described for a protocol must be satisfied to confirm an identification. If you spot any errors or inaccuracies, please contact salcock@cs.waikato.ac.nz

NOTE: This list is pretty out of date - the libprotoident library supports all these protocols plus many more. At the moment, the libprotoident source code itself is the best documentation.


All Seeing Eye

Either of the payloads matches the string "EYE1".


ar Archive

Either of the payloads matches the string "!<ar".


Ares

Either of the payloads matches the string "ARES".


Azureus

When converted to a 32-bit unsigned integer, the value of each payload is equal to the length of that payload

AND one of the port numbers is 27001 (because many other protocols also begin with a four byte length field, so we need to differentiate them somehow).
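A minimal sketch of the Azureus length-field check. The function names are hypothetical, and network (big-endian) byte order is an assumption, since the rule above does not state one.

```python
import struct


def length_field_matches(payload: bytes, length: int) -> bool:
    """True if the first four bytes, read as an unsigned 32-bit integer,
    equal the full payload length. Big-endian order is an assumption."""
    if len(payload) < 4:
        return False
    (value,) = struct.unpack("!I", payload[:4])
    return value == length


def is_azureus(p0: bytes, len0: int, p1: bytes, len1: int,
               port_a: int, port_b: int) -> bool:
    # Both directions must carry a matching length field, and the port
    # check separates Azureus from other length-prefixed protocols.
    return (length_field_matches(p0, len0)
            and length_field_matches(p1, len1)
            and 27001 in (port_a, port_b))
```

Here p0 and p1 are the first four payload bytes in each direction and len0 and len1 are the full payload lengths of those packets.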


BitTorrent

Either of the payloads begins with the hexadecimal character '\x13' followed by the string "Bit".
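Many of the rules on this page have the same shape: either payload equals a fixed four byte pattern. A small table-driven sketch, using the patterns given for the protocols above, might look like this. The names SIMPLE_RULES, either_matches and identify_simple are hypothetical.

```python
# (protocol name, four byte pattern) pairs taken from the rules above.
SIMPLE_RULES = [
    ("All Seeing Eye", b"EYE1"),
    ("ar Archive", b"!<ar"),
    ("Ares", b"ARES"),
    ("BitTorrent", b"\x13Bit"),
]


def either_matches(p0: bytes, p1: bytes, pattern: bytes) -> bool:
    """True if the first four bytes of either payload equal the pattern."""
    return p0[:4] == pattern or p1[:4] == pattern


def identify_simple(p0: bytes, p1: bytes):
    """Return the first protocol whose pattern matches, or None."""
    for name, pattern in SIMPLE_RULES:
        if either_matches(p0, p1, pattern):
            return name
    return None
```

An empty bytes object stands for a direction in which no payload was observed.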


BitTorrent Extension

ONE of the following conditions is met:


BitTorrent (UDP)

Either of the payloads begins with the string "d1:" followed by any other character.


Blizzard

Either of the payloads matches the hexadecimal character sequence "\x00\x06\xec\x01" and the other payload begins with the hexadecimal character '\x00' followed by any single character followed by the hexadecimal sequence "\xed\x01"

OR either of the payloads matches the hexadecimal character sequence "\x10\xdf\x22\x00" and the other payload matches the hexadecimal sequence "\x10\x00\x00\x00".
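Rules such as this one pair a pattern in one direction with a pattern in the other, sometimes with an "any single character" position. One way to express that is sketched below. The ANY wildcard marker and the helper names are hypothetical.

```python
ANY = None  # wildcard: matches any single byte


def match(payload: bytes, pattern: tuple) -> bool:
    """Compare the start of the payload against a pattern of byte values,
    where ANY matches any byte."""
    if len(payload) < len(pattern):
        return False
    return all(want is ANY or got == want
               for got, want in zip(payload, pattern))


def either_pair(p0: bytes, p1: bytes, a: tuple, b: tuple) -> bool:
    """True if one payload matches a and the other matches b."""
    return (match(p0, a) and match(p1, b)) or (match(p1, a) and match(p0, b))


def is_blizzard(p0: bytes, p1: bytes) -> bool:
    return (either_pair(p0, p1, (0x00, 0x06, 0xEC, 0x01),
                        (0x00, ANY, 0xED, 0x01))
            or either_pair(p0, p1, (0x10, 0xDF, 0x22, 0x00),
                           (0x10, 0x00, 0x00, 0x00)))
```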


Call of Duty

Either of the payloads is 15 bytes in length and matches the hexadecimal character sequence "\xff\xff\xff\xff"

AND if payload is observed in the other direction, it must also begin with the hexadecimal sequence "\xff\xff\xff\xff".
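The Call of Duty rule combines a payload length check with an optional check on the other direction. A sketch, with hypothetical names, where a length of zero means no payload was observed:

```python
MAGIC = b"\xff\xff\xff\xff"


def is_call_of_duty(p0: bytes, len0: int, p1: bytes, len1: int) -> bool:
    # Try each direction as the 15 byte payload in turn.
    for (a, na), (b, nb) in (((p0, len0), (p1, len1)),
                             ((p1, len1), (p0, len0))):
        if na == 15 and a[:4] == MAGIC:
            # The other direction may be empty, but if present it must
            # also start with the same four bytes.
            if nb == 0 or b[:4] == MAGIC:
                return True
    return False
```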


Citrix ICA

Either of the payloads matches the hexadecimal character sequence "\x7f\x7f\x49\x43".


DHCP

Either of the payloads matches one of the following hexadecimal sequences:


DirectConnect

Either of the payloads matches one of the following strings:


DNS

The third and fourth bytes of both payloads must be exactly the same

AND the second byte of both payloads must be exactly the same after being bitwise ANDed with 0x79

AND the most significant bit of the second byte must NOT be the same for both payloads, i.e. it must be 1 for one of the payloads and 0 for the other.
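The three DNS conditions can be sketched as below. The function name is hypothetical, and the byte positions in the rules are read as counting from one, so the "second byte" is index 1 and the "third and fourth bytes" are indexes 2 and 3.

```python
def is_dns(p0: bytes, p1: bytes) -> bool:
    if len(p0) < 4 or len(p1) < 4:
        return False
    # Third and fourth bytes must be identical in both directions.
    if p0[2:4] != p1[2:4]:
        return False
    # Second byte must agree once masked with 0x79.
    if (p0[1] & 0x79) != (p1[1] & 0x79):
        return False
    # The most significant bit of the second byte must differ:
    # set in one direction and clear in the other.
    return (p0[1] & 0x80) != (p1[1] & 0x80)
```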


DXP (SilverPlatter)

Either of the payloads matches the hexadecimal character sequence "\xb0\x04\x15\x00".


eMule

For BOTH payloads, the first byte is either the hexadecimal character '\xe3' or '\xc5', but both initial bytes cannot be '\xc5'. '\xe3' in both directions is acceptable.

AND the third and fourth bytes must be the hexadecimal character '\x00'.

Note: if there is no payload for one of the directions, the other payload must begin with '\xe3'.
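A sketch of the eMule rule, including the one-directional case. The function names are hypothetical, and applying the third and fourth byte check to every payload that is present is our reading of the rule rather than something it states outright.

```python
def _emule_header(p: bytes) -> bool:
    """First byte is 0xe3 or 0xc5 and the third and fourth bytes are zero."""
    return len(p) >= 4 and p[0] in (0xE3, 0xC5) and p[2] == 0 and p[3] == 0


def is_emule(p0: bytes, p1: bytes) -> bool:
    if not p0 or not p1:
        # Payload in one direction only: it must begin with 0xe3.
        present = p0 or p1
        return _emule_header(present) and present[0] == 0xE3
    if not (_emule_header(p0) and _emule_header(p1)):
        return False
    # 0xc5 in both directions is not accepted; 0xe3 in both is fine.
    return not (p0[0] == 0xC5 and p1[0] == 0xC5)
```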


FTP Control

Either of the payloads matches the string "FEAT" OR either of the payloads matches the string "USER" and the other payload matches one of the following strings:


FTP Data

One of the payload lengths is zero bytes and the other payload resembles the first four characters of a set of Unix-style file permissions:

OR one of the payload lengths is zero bytes and either of the port numbers is 20. (Note: I'm not a big fan of relying on port numbers, but all the actual FTP protocol info is exchanged via the control channel so it's hard to find any recognisable patterns)


Gamespy v1

Either of the payloads matches one of the following strings:


Gamespy v2+

Either of the payloads begins with the hexadecimal character sequence "\xfe\xfd" and the third and fourth bytes of that payload match the first two bytes of the other payload.
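This rule compares bytes across the two directions rather than against a fixed pattern. A sketch with a hypothetical function name:

```python
def is_gamespy2(p0: bytes, p1: bytes) -> bool:
    # Try each direction as the one that starts with "\xfe\xfd".
    for a, b in ((p0, p1), (p1, p0)):
        if (len(a) >= 4 and len(b) >= 2
                and a[:2] == b"\xfe\xfd"
                and a[2:4] == b[:2]):
            return True
    return False
```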


Gnutella (TCP)

Note that Gnutella often uses HTTP to complete exchanges, so this set of rules must be checked before performing an HTTP check.

Either of the payloads matches one of the following strings:


Gnutella (UDP)

Either of the payloads begins with the string "GND" followed by any other character.


Goku Chat

Either of the payloads matches the string "baut" and the other payload matches one of the following strings:


Hamachi

When converted to a 32-bit unsigned integer, the value of each payload is equal to the length of that payload

AND one of the port numbers is 12975 (because many other protocols also begin with a four byte length field, so we need to differentiate them somehow).


HTTP

Either of the payloads matches one of the following strings:


HTTP (Microsoft Extensions)

Probably redundant now, as the regular HTTP check will probably catch these flows first with the "HTTP" string in the response.

Either of the payloads matches one of the following strings:


HTTP Tunnelling

Note: this protocol should be evaluated BEFORE checking regular HTTP.

Either of the payloads matches the string "CONN" and the other payload matches the string "HTTP".

If one of the directions lacks payload, the other payload must match the string "CONN".
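A sketch of the HTTP Tunnelling rule with its one-directional fallback. The function name is hypothetical. In a full classifier this check would have to run before the regular HTTP check, as noted above.

```python
def is_http_tunnel(p0: bytes, p1: bytes) -> bool:
    if not p0 or not p1:
        # Only one direction has payload: it must be the CONNECT request.
        return (p0 or p1)[:4] == b"CONN"
    # Otherwise one side must be "CONN" and the other "HTTP".
    return {p0[:4], p1[:4]} == {b"CONN", b"HTTP"}
```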


HTTPS

Note: HTTPS is not actually a separate protocol - it is just HTTP encrypted using SSL. Hence, we have to use the port number to distinguish between the two.

One of the port numbers is port 443 and the payloads successfully match the rules for SSL described below.


ID

Either of the payloads matches the string " : U".


IMAP

Either of the payloads matches the string "* OK".


IRC

Either of the payloads matches one of the following strings:


Lotus Notes

This is a proprietary protocol so I have been unable to uncover any docs that would confirm this rule as valid.

Either of the payloads matches the hexadecimal character sequence "\x78\x00\x00\x00" and the third and fourth bytes of the other payload match the hexadecimal sequence "\x00\x00".


Mitglieder

A trojan that is often used to relay spam via SMTP.

Either of the payloads matches the hexadecimal character sequence "\x04\x01\x00\x19".


MSN

Either of the payloads matches one of the following strings:


MSN Video

Either of the payloads has a length of 10 bytes and begins with the hexadecimal character sequence "\x48\x00\x00\x00".


MSN Voice

One of the payloads matches the hexadecimal character sequence "\x01\x01\x00\x70" and the other payload matches the hexadecimal character sequence "\x00\x01\x00\x64".


MySQL

When treated as unsigned integers, the value of the first three bytes of both payloads is equal to the length of the payloads themselves minus four bytes.

AND the fourth byte of one of the payloads is the hexadecimal character '\x00' and the fourth byte of the other payload is the hexadecimal character '\x01'.

Note: if payload is only present in one direction, the fourth byte of that payload must be '\x00'.
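A sketch of the MySQL rule with hypothetical function names. The rule does not state a byte order, so the three byte length is read as little-endian here, which is how the MySQL wire protocol encodes its packet length. The note about one-directional flows is read as applying to the payload that is present, and that reading is an assumption.

```python
def _mysql_len_ok(p: bytes, length: int) -> bool:
    """First three bytes (little-endian) equal the payload length minus four."""
    return len(p) >= 4 and int.from_bytes(p[:3], "little") == length - 4


def is_mysql(p0: bytes, len0: int, p1: bytes, len1: int) -> bool:
    if len0 == 0 or len1 == 0:
        # One direction only: the payload that is present must have
        # a fourth byte of 0x00 (assumed reading of the note).
        p, n = (p0, len0) if len0 else (p1, len1)
        return n > 0 and _mysql_len_ok(p, n) and p[3] == 0x00
    if not (_mysql_len_ok(p0, len0) and _mysql_len_ok(p1, len1)):
        return False
    # One fourth byte must be 0x00 and the other 0x01.
    return {p0[3], p1[3]} == {0x00, 0x01}
```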


Mzinga

Either of the payloads matches the string "PCHA".


NCSoft

NCSoft is a major publisher of Online Role Playing Games. This protocol appears to be used to communicate with their servers.

Either of the payloads matches the hexadecimal character sequence "\x00\x05\x0c\x00".


Netbios

BOTH of the payloads begin with the hexadecimal character sequence "\x81\x00"

AND when converted to a 16-bit unsigned integer, the value of the third and fourth bytes of BOTH payloads is equal to the length of the payloads themselves minus four bytes.
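A sketch of the Netbios rule, which combines a fixed two byte prefix with a 16-bit length field. The function name is hypothetical and network (big-endian) byte order is an assumption, as the rule does not state one.

```python
def is_netbios(p0: bytes, len0: int, p1: bytes, len1: int) -> bool:
    def ok(p: bytes, n: int) -> bool:
        return (len(p) >= 4
                and p[:2] == b"\x81\x00"
                # Third and fourth bytes hold the payload length minus
                # the four byte header (big-endian assumed).
                and int.from_bytes(p[2:4], "big") == n - 4)

    # The rule must hold for BOTH directions.
    return ok(p0, len0) and ok(p1, len1)
```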


NNTP

Either of the payloads matches one of the following strings:

OR one of the payloads matches the string "AUTH" and the other payload matches one of the following strings:


NTP v3

One of the payloads is 48 bytes in length and begins with the hexadecimal character '\x1b' and the other payload, if present, begins with the hexadecimal character '\x1c'.


NTP v4

One of the payloads is 48 bytes in length and begins with the hexadecimal character '\x23' and the other payload, if present, begins with the hexadecimal character '\x24'.
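The NTP v3 and NTP v4 rules differ only in the leading byte of the request and reply, so one helper can serve both. A sketch, with hypothetical names, where a length of zero means no payload was observed:

```python
def is_ntp(p0: bytes, len0: int, p1: bytes, len1: int,
           request: bytes, reply: bytes) -> bool:
    # Try each direction as the 48 byte request in turn.
    for (a, na), (b, nb) in (((p0, len0), (p1, len1)),
                             ((p1, len1), (p0, len0))):
        if na == 48 and a[:1] == request and (nb == 0 or b[:1] == reply):
            return True
    return False


def ntp_version(p0: bytes, len0: int, p1: bytes, len1: int):
    if is_ntp(p0, len0, p1, len1, b"\x1b", b"\x1c"):
        return "NTP v3"
    if is_ntp(p0, len0, p1, len1, b"\x23", b"\x24"):
        return "NTP v4"
    return None
```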


POP3

Either of the payloads begins with one of the following strings:


Razor

Razor is a protocol used for updating spam matching rules, e.g. SpamAssassin.

Either of the payloads begins with the string "sn=" followed by any character.


RDP

BOTH of the payloads must begin with the hexadecimal character sequence "\x03\x00"

AND when converted to a 16-bit unsigned integer, the value of the third and fourth bytes of BOTH payloads is equal to the length of the payloads themselves.


RFB

Either of the payloads matches the string "RFB ".


RPC Exploit

A particular exploit in Windows Remote Procedure Call.

Either of the payloads matches the hexadecimal character sequence "\x05\x00\x0b\x03".


Rsync

Either of the payloads matches the string "@RSY".


RTSP

Either of the payloads matches the string "RTSP".


RTP

Either of the payloads matches the hexadecimal character sequence "\x00\x01\x00\x08" and the other payload begins with the hexadecimal sequence "\x80\x80" followed by any other two characters.


SIP (TCP)

One of the payloads matches the string "SIP/" and the other matches the string "REGI".

OR one of the payloads matches the string "SIP-" and the other begins with "R " followed by any other two characters.
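The two alternatives for SIP over TCP can be sketched as follows. The function name is hypothetical.

```python
def is_sip_tcp(p0: bytes, p1: bytes) -> bool:
    for a, b in ((p0, p1), (p1, p0)):
        # "SIP/" in one direction and "REGI" in the other.
        if a[:4] == b"SIP/" and b[:4] == b"REGI":
            return True
        # "SIP-" in one direction and "R " plus any two characters
        # in the other.
        if a[:4] == b"SIP-" and len(b) >= 4 and b[:2] == b"R ":
            return True
    return False
```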


SIP (UDP)

Either of the payloads begins with the string "SIP" followed by any other character.


SMTP

One of the payloads matches the string "220 " and the other payload matches one of the following strings:

If the match occurs on either of "quit" or "QUIT", one of the port numbers must be 25 to differentiate from FTP which also uses "220 " and "QUIT" as a valid exchange.


SMTP (Rejected)

We created a distinct category for SMTP connections where the server immediately sends a rejection code rather than the 220 banner - mainly so we could identify spammers.

Either of the payloads matches one of the following strings:


SMTP Scan

We also created a distinct category for SMTP connections where the client never sent any SMTP commands.

No payload was observed for one direction and the payload for the other direction matches the string "220 ".


SSH

Either of the payloads matches the string "SSH-" OR either of the payloads matches the string "QUIT" and one of the port numbers is 22.


SSL

Note: this SSL rule also matches TLS - we do not currently distinguish between the two.

One of the following conditions must be met:


Steam

Either of the payloads matches the hexadecimal character sequence "\xff\xff\xff\xff"

AND one of the following conditions is met:


Steam Friends

Either of the payloads matches the string "VS01".


TDS

Either of the payloads begins with the hexadecimal character sequence "\x04\x01" and the other payload begins with "\x12\x01".

AND when converted to a 16-bit unsigned integer, the value of the third and fourth bytes of BOTH payloads is equal to the length of the payloads themselves.


Telnet

Either of the payloads matches one of the following hexadecimal sequences:


TOR

This one is actually a bit of a guess - none of the TOR documentation I've found describes anything like this, but it was observed on the known TOR port for multiple IPs.

Either of the payloads matches the hexadecimal character sequence "\x3d\x00\x00\x00" and the payload length for that packet was exactly four bytes.


Warcraft 3

The rules for this protocol are not 100% confirmed.

Either of the payloads matches the hexadecimal character sequence "\xf7\x37\x012\x00".


Windows Messenger

Either of the payloads matches the hexadecimal character sequence "\x04\x00\x78\x00".


Xunlei (TCP)

BOTH of the payloads match the hexadecimal character sequence "\x29\x00\x00\x00".


Xunlei (UDP)

BOTH of the payloads match the hexadecimal character sequence "\x32\x00\x00\x00".


Yahoo

Either of the payloads matches one of the following strings:


Yahoo Webcam

Either of the payloads matches one of the following strings or hexadecimal sequences: