Update: This work has now been converted into a software library. This means that our application protocol identification rules can be used in your own network traffic analysis programs.
See the libprotoident page for more information and to download the software.
Traditionally, network researchers have looked at TCP and UDP port numbers to determine which application is responsible for a given flow. For example, HTTP operates over TCP port 80, SMTP over TCP port 25 and DNS over UDP port 53. However, peer-to-peer protocols such as BitTorrent frequently operate on high ports that have been more or less randomly selected, so many applications of interest cannot be detected by port numbers alone. This problem is best highlighted by the paper "Is P2P dying or just hiding?".
Over the past few years, members of the WAND group have spent a significant amount of time developing rules to identify applications based on the first four bytes of application payload. Packets are truncated in many of our trace sets to provide us with access to these four bytes. The work has been primarily done by Shane Alcock with help from Aaron Murrihy and Donald Neal.
The particular version that is documented on this page uses only the payload extracted from the first packet observed for each direction of a bidirectional flow. This has the advantage that the identification code need only be called once per flow and can be run as soon as payload has been observed in both directions. It also means that we are less likely to be fooled by plain text content that happens to contain a string we are using to identify a protocol. The drawback is that any subsequent information that may have been useful in identifying the application is ignored, although previous experience suggests that there are few occasions where this genuinely matters.
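The once-per-flow approach described above can be sketched in Python as follows. This is only an illustration: the `Flow` class, its field names and the `identify` callback are hypothetical and are not part of libprotoident. The sketch keeps the first four bytes of the first payload-bearing packet in each direction and calls the identification function once both have been seen. Rules that also depend on payload lengths or port numbers would need those values recorded as well.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Flow:
    """Hypothetical per-flow record; all names here are illustrative."""
    identify: Callable[[bytes, bytes], str]
    first_payload: List[Optional[bytes]] = field(
        default_factory=lambda: [None, None])
    protocol: Optional[str] = None

    def on_packet(self, direction: int, payload: bytes) -> None:
        # Skip packets without payload, and every packet after the first
        # payload-bearing one seen in this direction.
        if not payload or self.first_payload[direction] is not None:
            return
        self.first_payload[direction] = payload[:4]
        # Identify exactly once, as soon as both directions have payload.
        if self.protocol is None and None not in self.first_payload:
            self.protocol = self.identify(*self.first_payload)
```

Later packets in either direction never reach `identify`, which is what keeps plain text further into the flow from triggering a false match.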
To perform our identification, the following data is required for each flow:
The following is a list of all the application protocols that we have developed rules for, along with a plain English description of the rules themselves. Unless otherwise noted, ALL of the rules described for a protocol must be satisfied to confirm an identification. If you spot any errors or inaccuracies, please contact salcock@cs.waikato.ac.nz
NOTE: This list is pretty out of date - the libprotoident library supports all these protocols plus many more. At the moment, the libprotoident source code itself is the best documentation.
Either of the payloads matches the string "EYE1".
Either of the payloads matches the string "!<ar".
Either of the payloads matches the string "ARES".
When the first four bytes of each payload are converted to a 32-bit unsigned integer, the value for BOTH payloads is equal to the length of that payload
AND one of the port numbers is 27001 (because many other protocols also begin with a four byte length field, so we need to differentiate them somehow).
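A minimal sketch of this kind of length-field rule, assuming the four-byte field is in network (big-endian) byte order and that the full payload, or at least its original length, is available. Neither assumption is stated on this page, and the function name and parameters are hypothetical.

```python
import struct


def matches_length_field_rule(payload_a: bytes, payload_b: bytes,
                              port_a: int, port_b: int,
                              expected_port: int = 27001) -> bool:
    """Both payloads start with their own length; one port is expected_port."""
    def declares_own_length(payload: bytes) -> bool:
        if len(payload) < 4:
            return False
        # Assumption: big-endian 32-bit unsigned integer.
        (declared,) = struct.unpack("!I", payload[:4])
        return declared == len(payload)

    return (declares_own_length(payload_a)
            and declares_own_length(payload_b)
            and expected_port in (port_a, port_b))
```

Calling the same function with `expected_port=12975` expresses the other length-field rule on this page.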
Either of the payloads begins with the hexadecimal character '\x13' followed by the string "Bit".
ONE of the following conditions is met:
Either of the payloads begins with the string "d1:" followed by any other character.
Either of the payloads matches the hexadecimal character sequence "\x00\x06\xec\x01" and the other payload begins with the hexadecimal character '\x00' followed by any single character followed by the hexadecimal sequence "\xed\x01"
OR either of the payloads matches the hexadecimal character sequence "\x10\xdf\x22\x00" and the other payload matches the hexadecimal sequence "\x10\x00\x00\x00".
Either of the payloads is 15 bytes in length and matches the hexadecimal character sequence "\xff\xff\xff\xff"
AND if payload is observed in the other direction, it must also begin with the hexadecimal sequence "\xff\xff\xff\xff".
Either of the payloads matches the hexadecimal character sequence "\x7f\x7f\x49\x43".
Either of the payloads matches one of the following hexadecimal sequences:
Either of the payloads matches one of the following strings:
The third and fourth bytes of both payloads must be exactly the same
AND the second byte of both payloads must be exactly the same after being bitwise ANDed with 0x79
AND the most significant bit of the second byte must NOT be the same for both payloads, i.e. it must be 1 for one of the payloads and 0 for the other.
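The three conditions above translate directly into bitwise operations. The sketch below is illustrative only; the function name is hypothetical, and it assumes that at least four bytes of payload are available in each direction.

```python
def matches_paired_header_rule(payload_a: bytes, payload_b: bytes) -> bool:
    """Check the three byte-level conditions against both payloads."""
    if len(payload_a) < 4 or len(payload_b) < 4:
        return False
    # Third and fourth bytes must be identical.
    same_tail = payload_a[2:4] == payload_b[2:4]
    # Second bytes must agree once masked with 0x79.
    same_masked = (payload_a[1] & 0x79) == (payload_b[1] & 0x79)
    # The top bit of the second byte must be set in exactly one payload.
    msb_differs = (payload_a[1] & 0x80) != (payload_b[1] & 0x80)
    return same_tail and same_masked and msb_differs
```

Note that the 0x79 mask already excludes the most significant bit, so the second and third conditions cannot contradict each other.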
Either of the payloads matches the hexadecimal character sequence "\xb0\x04\x15\x00".
For BOTH payloads, the first byte is either the hexadecimal character '\xe3' or '\xc5', but both initial bytes cannot be '\xc5'. '\xe3' in both directions is acceptable.
AND the third and fourth bytes must be the hexadecimal character '\x00'.
Note: if there is no payload for one of the directions, the other payload must begin with '\xe3'.
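A sketch of this rule, including the single-direction case. The function name is hypothetical, and the sketch assumes that the '\x00' check on the third and fourth bytes still applies when only one direction has payload, which the text above does not state explicitly.

```python
from typing import Optional


def matches_marker_rule(payload_a: Optional[bytes],
                        payload_b: Optional[bytes]) -> bool:
    """First byte is 0xe3 or 0xc5 (never 0xc5 in both), bytes 3-4 are zero."""
    present = [p for p in (payload_a, payload_b) if p]
    if not present:
        return False
    for p in present:
        if len(p) < 4 or p[0] not in (0xE3, 0xC5) or p[2:4] != b"\x00\x00":
            return False
    if len(present) == 1:
        # Only one direction carried payload: it must start with 0xe3.
        return present[0][0] == 0xE3
    # Both directions present: reject only the 0xc5 / 0xc5 combination.
    return not (present[0][0] == 0xC5 and present[1][0] == 0xC5)
```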
Either of the payloads matches the string "FEAT" OR either of the payloads matches the string "USER" and the other payload matches one of the following strings:
One of the payload lengths is zero bytes and the other payload resembles the first four characters of a set of Unix-style file permissions:
OR one of the payload lengths is zero bytes and either of the port numbers is 20. (Note: I'm not a big fan of relying on port numbers, but all the actual FTP protocol info is exchanged via the control channel so it's hard to find any recognisable patterns)
Either of the payloads matches one of the following strings:
Either of the payloads begins with the hexadecimal character sequence "\xfe\xfd" and the third and fourth bytes of that payload matches the first two bytes of the other payload.
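Because the rule above may apply in either direction, a sketch of it has to try both orderings. The function and helper names below are hypothetical.

```python
def matches_echoed_id_rule(payload_a: bytes, payload_b: bytes) -> bool:
    """One payload starts with 0xfe 0xfd; its bytes 3-4 open the other payload."""
    def one_way(first: bytes, second: bytes) -> bool:
        return (len(first) >= 4 and len(second) >= 2
                and first[:2] == b"\xfe\xfd"
                and first[2:4] == second[:2])

    # The flow direction is not known in advance, so test both orderings.
    return one_way(payload_a, payload_b) or one_way(payload_b, payload_a)
```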
Note that Gnutella often uses HTTP to complete exchanges, so this set of rules must be checked before performing an HTTP check.
Either of the payloads matches one of the following strings:
Either of the payloads begins with the string "GND" followed by any other character.
Either of the payloads matches the string "baut" and the other payload matches one of the following strings:
When the first four bytes of each payload are converted to a 32-bit unsigned integer, the value for BOTH payloads is equal to the length of that payload
AND one of the port numbers is 12975 (because many other protocols also begin with a four byte length field, so we need to differentiate them somehow).
Either of the payloads matches one of the following strings:
This rule is probably redundant now, as the regular HTTP check is likely to catch these flows first via the "HTTP" string in the response.
Either of the payloads matches one of the following strings:
Note: this protocol should be evaluated BEFORE checking regular HTTP.
Either of the payloads matches the string "CONN" and the other payload matches the string "HTTP".
If one of the directions lacks payload, the other payload must match the string "CONN".
Note: HTTPS is not actually a separate protocol - it is just HTTP encrypted using SSL. Hence, we have to use the port number to distinguish between the two.
One of the port numbers is port 443 and the payloads successfully match the rules for SSL described below.
Either of the payloads matches the string " : U".
Either of the payloads matches the string "* OK".
Either of the payloads matches one of the following strings:
This is a proprietary protocol so I have been unable to uncover any docs that would confirm this rule as valid.
Either of the payloads matches the hexadecimal character sequence "\x78\x00\x00\x00" and the third and fourth bytes of the other payload match the hexadecimal sequence "\x00\x00".
A trojan that is often used to relay spam via SMTP.
Either of the payloads matches the hexadecimal character sequence "\x04\x01\x00\x19".
Either of the payloads matches one of the following strings:
Either of the payloads has a length of 10 bytes and begins with the hexadecimal character sequence "\x48\x00\x00\x00".
One of the payloads matches the hexadecimal character sequence "\x01\x01\x00\x70" and the other payload matches the hexadecimal character sequence "\x00\x01\x00\x64".
When treated as unsigned integers, the value of the first three bytes of both payloads is equal to the length of the payloads themselves minus four bytes.
AND the fourth byte of one of the payloads is the hexadecimal character '\x00' and the fourth byte of the other payload is the hexadecimal character '\x01'.
Note: if payload is present in only one direction, the fourth byte of that payload must be '\x00'.
Either of the payloads matches the string "PCHA".
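A sketch of the three-byte length rule. The byte order of the length field is not stated on this page; the sketch assumes little-endian, and it reads the note about single-direction flows as applying to the payload that is present. Both are assumptions, and the function name is hypothetical.

```python
from typing import Optional


def matches_three_byte_length_rule(payload_a: Optional[bytes],
                                   payload_b: Optional[bytes]) -> bool:
    """Bytes 1-3 hold (payload length - 4); byte 4 is 0x00 one way, 0x01 the other."""
    def header_ok(p: bytes) -> bool:
        # Assumption: the three-byte length is little-endian.
        return len(p) >= 4 and int.from_bytes(p[:3], "little") == len(p) - 4

    present = [p for p in (payload_a, payload_b) if p]
    if not present or not all(header_ok(p) for p in present):
        return False
    if len(present) == 1:
        # Payload in one direction only: its fourth byte must be 0x00.
        return present[0][3] == 0x00
    return {present[0][3], present[1][3]} == {0x00, 0x01}
```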
NCSoft is a major publisher of Online Role Playing Games. This protocol appears to be used to communicate with their servers.
Either of the payloads matches the hexadecimal character sequence "\x00\x05\x0c\x00".
BOTH of the payloads begin with the hexadecimal character sequence "\x81\x00"
AND when converted to a 16-bit unsigned integer, the value of the third and fourth bytes of BOTH payloads is equal to the length of the payloads themselves minus four bytes.
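A sketch of this prefix-plus-length check, assuming the 16-bit field is in network (big-endian) byte order, which this page does not state. The function name and the `prefix` and `header_len` parameters are hypothetical.

```python
import struct


def matches_prefixed_length_rule(payload_a: bytes, payload_b: bytes,
                                 prefix: bytes = b"\x81\x00",
                                 header_len: int = 4) -> bool:
    """Both payloads: fixed two-byte prefix, then a 16-bit length field."""
    def ok(p: bytes) -> bool:
        if len(p) < 4 or p[:2] != prefix:
            return False
        # Assumption: big-endian 16-bit unsigned integer.
        (declared,) = struct.unpack("!H", p[2:4])
        return declared == len(p) - header_len

    return ok(payload_a) and ok(payload_b)
```

With `prefix=b"\x03\x00"` and `header_len=0`, the same function expresses the rule further down this page in which the 16-bit value equals the full payload length.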
Either of the payloads matches one of the following strings:
OR one of the payloads matches the string "AUTH" and the other payload matches one of the following strings:
One of the payloads is 48 bytes in length and begins with the hexadecimal character '\x1b' and the other payload, if present, begins with the hexadecimal character '\x1c'.
One of the payloads is 48 bytes in length and begins with the hexadecimal character '\x23' and the other payload, if present, begins with the hexadecimal character '\x24'.
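The two rules above share the same shape and differ only in the leading byte values, so one parameterised check can cover both. The sketch below is illustrative; the function and parameter names are hypothetical.

```python
from typing import Optional


def matches_request_reply_rule(payload_a: Optional[bytes],
                               payload_b: Optional[bytes],
                               request_byte: int,
                               reply_byte: int) -> bool:
    """A 48-byte payload starting with request_byte; any reply starts with reply_byte."""
    def one_way(request: Optional[bytes], reply: Optional[bytes]) -> bool:
        if not request or len(request) != 48 or request[0] != request_byte:
            return False
        # The reply is optional, but if present it must start correctly.
        return not reply or reply[0] == reply_byte

    return one_way(payload_a, payload_b) or one_way(payload_b, payload_a)
```

The first rule corresponds to `request_byte=0x1b, reply_byte=0x1c` and the second to `request_byte=0x23, reply_byte=0x24`.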
Either of the payloads begins with one of the following strings:
Razor is a protocol used for updating spam matching rules, e.g. SpamAssassin.
Either of the payloads begins with the string "sn=" followed by any character.
BOTH of the payloads must begin with the hexadecimal character sequence "\x03\x00"
AND when converted to a 16-bit unsigned integer, the value of the third and fourth bytes of BOTH payloads is equal to the length of the payloads themselves.
Either of the payloads matches the string "RFB ".
A particular exploit in Windows Remote Procedure Call.
Either of the payloads matches the hexadecimal character sequence "\x05\x00\x0b\x03".
Either of the payloads matches the string "@RSY".
Either of the payloads matches the string "RTSP".
Either of the payloads matches the hexadecimal character sequence "\x00\x01\x00\x08" and the other payload begins with the hexadecimal sequence "\x80\x80" followed by any other two characters.
One of the payloads matches the string "SIP/" and the other matches the string "REGI".
OR one of the payloads matches the string "SIP-" and the other begins with "R " followed by any other two characters.
Either of the payloads begins with the string "SIP" followed by any other character.
One of the payloads matches the string "220 " and the other payload matches one of the following strings:
If the match occurs on either of "quit" or "QUIT", one of the port numbers must be 25 to differentiate from FTP which also uses "220 " and "QUIT" as a valid exchange.
We created a distinct category for SMTP connections where the server immediately sends a rejection code rather than the 220 banner - mainly so we could identify spammers.
Either of the payloads matches one of the following strings:
We also created a distinct category for SMTP connections where the client never sent any SMTP commands.
No payload was observed for one direction and the payload for the other direction matches the string "220 ".
Either of the payloads matches the string "SSH-" OR either of the payloads matches the string "QUIT" and one of the port numbers is 22.
Note: this SSL rule also matches TLS - we do not currently distinguish between the two.
One of the following conditions must be met:
Either of the payloads matches the hexadecimal character sequence "\xff\xff\xff\xff"
AND one of the following conditions is met:
Either of the payloads matches the string "VS01".
Either of the payloads begins with the hexadecimal character sequence "\x04\x01" and the other payload begins with "\x12\x01".
AND when converted to a 16-bit unsigned integer, the value of the third and fourth bytes of BOTH payloads is equal to the length of the payloads themselves.
Either of the payloads matches one of the following hexadecimal sequences:
This one is actually a bit of a guess - none of the TOR documentation I've found describes anything like this, but it was observed on the known TOR port for multiple IPs.
Either of the payloads matches the hexadecimal character sequence "\x3d\x00\x00\x00" and the payload length for that packet is exactly four bytes.
The rules for this protocol are not 100% confirmed.
Either of the payloads matches the hexadecimal character sequence "\xf7\x37\x12\x00".
Either of the payloads matches the hexadecimal character sequence "\x04\x00\x78\x00".
BOTH of the payloads match the hexadecimal character sequence "\x29\x00\x00\x00".
BOTH of the payloads match the hexadecimal character sequence "\x32\x00\x00\x00".
Either of the payloads matches one of the following strings:
Either of the payloads matches one of the following strings or hexadecimal sequences: