Rev | Author | Line No. | Line |
---|---|---|---|
1 | office | 1 | This file is a HOWTO for Wireshark developers. It describes how Wireshark |
2 | heuristic protocol dissectors work and how to write them. |
||
3 | |||
4 | This file is compiled to give in-depth information on Wireshark. |
||
5 | It is by no means all inclusive and complete. Please feel free to send |
||
6 | remarks and patches to the developer mailing list. |
||
7 | |||
8 | |||
9 | Prerequisites |
||
10 | ------------- |
||
11 | As this file is an addition to README.dissector, it is essential to read |
||
12 | and understand that document first. |
||
13 | |||
14 | |||
15 | Why heuristic dissectors? |
||
16 | ------------------------- |
||
17 | When Wireshark "receives" a packet, it has to find the right dissector to |
||
18 | start decoding the packet data. Often this can be done by known conventions, |
||
19 | e.g. the Ethernet type 0x0800 means "IP on top of Ethernet" - an easy and |
||
20 | reliable match for Wireshark. |
||
21 | |||
22 | Unfortunately, these conventions are not always available, or (accidentally |
||
23 | or knowingly) some protocols don't care about those conventions and "reuse" |
||
24 | existing "magic numbers / tokens". |
||
25 | |||
26 | For example, TCP port 80 is conventionally reserved for HTTP traffic. But this |
||
27 | convention doesn't prevent anyone from using TCP port 80 for some different |
||
28 | protocol, or, on the other hand, from using HTTP on a port number other than 80. |
||
29 | |||
30 | To solve this problem, Wireshark introduced the so-called heuristic dissector |
||
31 | mechanism. |
||
32 | |||
33 | |||
34 | How does Wireshark use heuristic dissectors? |
||
35 | -------------------------------------------- |
||
36 | While Wireshark starts, heuristic dissectors (HD) register themselves slightly |
||
37 | differently from "normal" dissectors, e.g. an HD can ask for any TCP packet, as |
||
38 | it *may* contain interesting packet data for this dissector. In reality more |
||
39 | than one HD will exist for e.g. TCP packet data. |
||
40 | |||
41 | So if Wireshark has to decode TCP packet data, it will first try to find a |
||
42 | dissector registered directly for the TCP port used in that packet. If it |
||
43 | finds such a registered dissector it will just hand over the packet data to it. |
||
44 | |||
45 | In case there is no such "normal" dissector, Wireshark (WS) will hand over the packet data |
||
46 | to the first registered HD. Now the HD will look into the data and decide if that |
||
47 | data looks like something the dissector "is interested in". The return value |
||
48 | signals WS if the HD processed the data (so WS can stop working on that packet) |
||
49 | or if the heuristic didn't match (so WS tries the next HD until one matches - |
||
50 | or the data simply can't be processed). |
||
51 | |||
52 | Note that it is possible to configure WS through preference settings so that it |
||
53 | hands off a packet to the heuristic dissectors before the "normal" dissectors |
||
54 | are called. This allows the HD the chance to receive packets and process them |
||
55 | differently than they otherwise would be. Of course if no HD is interested in |
||
56 | the packet, then the packet will ultimately get handed off to the "normal" |
||
57 | dissector as if the HD wasn't involved at all. As of this writing, the DCCP, |
||
58 | SCTP, TCP, TIPC and UDP dissectors all provide this capability via their |
||
59 | "Try heuristic sub-dissectors first" preference, but none of them have this |
||
60 | option enabled by default. |
||
61 | |||
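
The lookup order described above can be sketched outside Wireshark as a small
stand-alone C program. This is only an illustration under stated assumptions:
the tables, the names (port_table, heur_list, dispatch, looks_like_foo,
looks_like_bar) and the "data" fallback label are hypothetical and are not
Wireshark APIs. The heur_first flag stands in for the "Try heuristic
sub-dissectors first" preference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heuristic test: returns true if the payload looks like "its" protocol. */
typedef bool (*heur_fn)(const uint8_t *data, size_t len);

struct port_entry { uint16_t port; const char *name; };
struct heur_entry { const char *name; heur_fn test; };

/* Two made-up heuristics; each checks the length before reading any byte. */
static bool looks_like_foo(const uint8_t *d, size_t len) { return len >= 1 && d[0] == 0x42; }
static bool looks_like_bar(const uint8_t *d, size_t len) { return len >= 1 && d[0] == 0x7e; }

/* "Normal" dissectors are found by port number alone. */
static const struct port_entry port_table[] = { { 80, "http" } };

/* Heuristic dissectors are tried in registration order. */
static const struct heur_entry heur_list[] = {
    { "foo", looks_like_foo },
    { "bar", looks_like_bar },
};

static const char *try_port(uint16_t port)
{
    for (size_t i = 0; i < sizeof port_table / sizeof port_table[0]; i++)
        if (port_table[i].port == port)
            return port_table[i].name;
    return NULL;
}

static const char *try_heuristics(const uint8_t *d, size_t len)
{
    for (size_t i = 0; i < sizeof heur_list / sizeof heur_list[0]; i++)
        if (heur_list[i].test(d, len))
            return heur_list[i].name;   /* first HD that accepts wins */
    return NULL;
}

/* Returns the name of whoever handled the payload, or "data" if nobody did. */
const char *dispatch(uint16_t port, const uint8_t *d, size_t len, bool heur_first)
{
    const char *who;

    if (heur_first && (who = try_heuristics(d, len)) != NULL)
        return who;
    if ((who = try_port(port)) != NULL)
        return who;
    if (!heur_first && (who = try_heuristics(d, len)) != NULL)
        return who;
    return "data";
}
```

With heur_first set to false, a payload on port 80 goes to the port-based
entry even if a heuristic would have accepted it; with heur_first set to true,
the heuristics see it first and the port-based entry is only the fallback.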
62 | Once a packet for a particular "connection" has been identified as belonging |
||
63 | to a particular protocol, Wireshark should then be set up to always directly |
||
64 | call the dissector for that protocol. This removes the overhead of having |
||
65 | to identify each packet of the connection heuristically. |
||
66 | |||
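
The idea of remembering the protocol per "connection" can be sketched with a
small lookup table keyed by the two endpoints. This is a simplified stand-in
for illustration only, not Wireshark's conversation implementation: the
struct layout, the fixed table size and the function names (conv_get_dissector,
conv_set_dissector) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection key: two endpoints (address + port). */
struct conv_key {
    uint32_t addr_a, addr_b;
    uint16_t port_a, port_b;
};

struct conv_entry {
    struct conv_key key;
    const char *dissector;   /* name of the dissector chosen for this connection */
    bool used;
};

#define MAX_CONVS 64
static struct conv_entry conv_table[MAX_CONVS];

/* A connection is the same in both directions, so also compare with endpoints swapped. */
static bool key_equal(const struct conv_key *x, const struct conv_key *y)
{
    bool same = x->addr_a == y->addr_a && x->port_a == y->port_a &&
                x->addr_b == y->addr_b && x->port_b == y->port_b;
    bool swapped = x->addr_a == y->addr_b && x->port_a == y->port_b &&
                   x->addr_b == y->addr_a && x->port_b == y->port_a;
    return same || swapped;
}

/* Returns the remembered dissector name, or NULL if the connection is unknown. */
const char *conv_get_dissector(const struct conv_key *k)
{
    for (size_t i = 0; i < MAX_CONVS; i++)
        if (conv_table[i].used && key_equal(&conv_table[i].key, k))
            return conv_table[i].dissector;
    return NULL;
}

/* Remembers the dissector for a connection; returns false if the table is full. */
bool conv_set_dissector(const struct conv_key *k, const char *name)
{
    size_t free_slot = MAX_CONVS;

    for (size_t i = 0; i < MAX_CONVS; i++) {
        if (conv_table[i].used && key_equal(&conv_table[i].key, k)) {
            conv_table[i].dissector = name;   /* update existing entry */
            return true;
        }
        if (!conv_table[i].used && free_slot == MAX_CONVS)
            free_slot = i;
    }
    if (free_slot == MAX_CONVS)
        return false;
    conv_table[free_slot].key = *k;
    conv_table[free_slot].dissector = name;
    conv_table[free_slot].used = true;
    return true;
}
```

A caller would first ask conv_get_dissector() and only fall back to the
heuristic tests when it returns NULL. After a heuristic accepts the first
packet, conv_set_dissector() records the result so later packets in either
direction skip the heuristic step.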
67 | |||
68 | How do these heuristics work? |
||
69 | ----------------------------- |
||
70 | It's difficult to give a general answer here. The usual heuristic works as follows: |
||
71 | |||
72 | An HD looks into the first few packet bytes and searches for common patterns that |
||
73 | are specific to the protocol in question. Most protocols start with a |
||
74 | specific header, so a specific pattern may look like (synthetic example): |
||
75 | |||
76 | 1) first byte must be 0x42 |
||
77 | 2) second byte is a type field and can only contain values between 0x20 - 0x33 |
||
78 | 3) third byte is a flag field, where the lower 4 bits always contain the value 0 |
||
79 | 4) fourth and fifth bytes contain a 16 bit length field, where the value can't |
||
80 | be larger than 10000 bytes |
||
81 | |||
82 | So the heuristic dissector will check incoming packet data for all four of the |
||
83 | above conditions. Only if all four conditions are true is there a |
||
84 | good chance that the packet really contains the expected protocol - and the |
||
85 | dissector continues to decode the packet data. If one condition fails, it's |
||
86 | almost certainly not the protocol in question and the dissector immediately |
||
87 | tells WS "this is not my protocol" - maybe some other heuristic dissector |
||
88 | is interested! |
||
89 | |||
90 | Obviously, this is *not* 100% bulletproof, but it's the best WS can offer to |
||
91 | its users here - and improving the heuristic is always possible if it turns out |
||
92 | that it's not good enough to distinguish between two given protocols. |
||
93 | |||
94 | Note: The heuristic code in a dissector *must not* cause an exception |
||
95 | (before returning FALSE) as this will prevent following |
||
96 | heuristic dissector handoffs. In practice, this normally means |
||
97 | that a test should be done to verify that the required data is |
||
98 | available in the tvb before fetching from the tvb. (See the |
||
99 | example below). |
||
100 | |||
101 | |||
102 | Heuristic Code Example |
||
103 | ---------------------- |
||
104 | You can find a lot of code examples in the Wireshark sources, e.g.: |
||
105 | grep -l heur_dissector_add epan/dissectors/*.c |
||
106 | returns 177 files (October 2015). |
||
107 | |||
108 | For the above example criteria, the following code example might do the work |
||
109 | (combine this with the dissector skeleton in README.developer): |
||
110 | |||
111 | XXX - please note: The following code examples were not tried in reality, |
||
112 | please report problems to the dev-list! |
||
113 | |||
114 | -------------------------------------------------------------------------------------------- |
||
115 | |||
116 | static dissector_handle_t PROTOABBREV_tcp_handle; |
||
117 | static dissector_handle_t PROTOABBREV_pdu_handle; |
||
118 | |||
119 | /* Heuristics test */ |
||
120 | static gboolean |
||
121 | test_PROTOABBREV(packet_info *pinfo _U_, tvbuff_t *tvb, int offset _U_, void *data _U_) |
||
122 | { |
||
123 | /* 0) Verify needed bytes available in tvb so tvb_get...() doesn't cause exception. */ |
||
124 | if (tvb_captured_length(tvb) < 5) |
||
125 | return FALSE; |
||
126 | |||
127 | /* 1) first byte must be 0x42 */ |
||
128 | if ( tvb_get_guint8(tvb, 0) != 0x42 ) |
||
129 | return FALSE; |
||
130 | |||
131 | /* 2) second byte is a type field and only can contain values between 0x20-0x33 */ |
||
132 | if ( tvb_get_guint8(tvb, 1) < 0x20 || tvb_get_guint8(tvb, 1) > 0x33 ) |
||
133 | return FALSE; |
||
134 | |||
135 | /* 3) third byte is a flag field, where the lower 4 bits always contain the value 0 */ |
||
136 | if ( tvb_get_guint8(tvb, 2) & 0x0f ) |
||
137 | return FALSE; |
||
138 | |||
139 | /* 4) fourth and fifth bytes contain a 16 bit length field, where the value can't be larger than 10000 bytes */ |
||
140 | /* Assumes network byte order */ |
||
141 | if ( tvb_get_ntohs(tvb, 3) > 10000 ) |
||
142 | return FALSE; |
||
143 | |||
144 | /* Assume it's your packet ... */ |
||
145 | return TRUE; |
||
146 | } |
||
147 | |||
148 | /* Dissect the complete PROTOABBREV pdu */ |
||
149 | static int |
||
150 | dissect_PROTOABBREV_pdu(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data _U_) |
||
151 | { |
||
152 | /* Dissection ... */ |
||
153 | |||
154 | return tvb_reported_length(tvb); |
||
155 | } |
||
156 | |||
157 | /* For tcp_dissect_pdus() */ |
||
158 | static guint |
||
159 | get_PROTOABBREV_len(packet_info *pinfo _U_, tvbuff_t *tvb, int offset, void *data _U_) |
||
160 | { |
||
161 | return (guint) tvb_get_ntohs(tvb, offset+3); |
||
162 | } |
||
163 | |||
164 | static int |
||
165 | dissect_PROTOABBREV_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) |
||
166 | { |
||
167 | tcp_dissect_pdus(tvb, pinfo, tree, TRUE, 5, |
||
168 | get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data); |
||
169 | return tvb_reported_length(tvb); |
||
170 | } |
||
171 | |||
172 | static gboolean |
||
173 | dissect_PROTOABBREV_heur_tcp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) |
||
174 | { |
||
175 | if (!test_PROTOABBREV(pinfo, tvb, 0, data)) |
||
176 | return FALSE; |
||
177 | |||
178 | /* specify that dissect_PROTOABBREV is to be called directly from now on for |
||
179 | * packets for this "connection" ... but only do this if your heuristic sits directly |
||
180 | * on top of (was called by) a dissector which established a conversation for the |
||
181 | * protocol "port type". In other words: only directly over TCP, UDP, DCCP, ... |
||
182 | * otherwise you'll be overriding the dissector that called your heuristic dissector. |
||
183 | */ |
||
184 | conversation_t *conversation = find_or_create_conversation(pinfo); |
||
185 | conversation_set_dissector(conversation, PROTOABBREV_tcp_handle); |
||
186 | |||
187 | /* and do the dissection */ |
||
188 | dissect_PROTOABBREV_tcp(tvb, pinfo, tree, data); |
||
189 | |||
190 | return (TRUE); |
||
191 | } |
||
192 | |||
193 | static int |
||
194 | dissect_PROTOABBREV_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) |
||
195 | { |
||
196 | udp_dissect_pdus(tvb, pinfo, tree, 5, NULL, |
||
197 | get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data); |
||
198 | return tvb_reported_length(tvb); |
||
199 | } |
||
200 | |||
201 | static gboolean |
||
202 | dissect_PROTOABBREV_heur_udp(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree, void *data) |
||
203 | { |
||
204 | ... |
||
205 | /* and do the dissection */ |
||
206 | return (udp_dissect_pdus(tvb, pinfo, tree, 5, test_PROTOABBREV, |
||
207 | get_PROTOABBREV_len, dissect_PROTOABBREV_pdu, data) != 0); |
||
208 | } |
||
209 | |||
210 | void |
||
211 | proto_reg_handoff_PROTOABBREV(void) |
||
212 | { |
||
213 | PROTOABBREV_tcp_handle = create_dissector_handle(dissect_PROTOABBREV_tcp, |
||
214 | proto_PROTOABBREV); |
||
215 | PROTOABBREV_pdu_handle = create_dissector_handle(dissect_PROTOABBREV_pdu, |
||
216 | proto_PROTOABBREV); |
||
217 | |||
218 | /* register as heuristic dissector for both TCP and UDP */ |
||
219 | heur_dissector_add("tcp", dissect_PROTOABBREV_heur_tcp, "PROTOABBREV over TCP", |
||
220 | "PROTOABBREV_tcp", proto_PROTOABBREV, HEURISTIC_ENABLE); |
||
221 | heur_dissector_add("udp", dissect_PROTOABBREV_heur_udp, "PROTOABBREV over UDP", |
||
222 | "PROTOABBREV_udp", proto_PROTOABBREV, HEURISTIC_ENABLE); |
||
223 | |||
224 | #ifdef OPTIONAL |
||
225 | /* It's possible to write a dissector to be a dual heuristic/normal dissector */ |
||
226 | /* by also registering the dissector "normally". */ |
||
227 | dissector_add_uint("ip.proto", IP_PROTO_PROTOABBREV, PROTOABBREV_pdu_handle); |
||
228 | #endif |
||
229 | } |
||
230 | |||
231 | |||
232 | Please note that registering a heuristic dissector is only possible for a |
||
233 | small variety of protocols. In most cases a heuristic is not needed, and |
||
234 | adding the support would only add unused code to the dissector. |
||
235 | |||
236 | TCP and UDP are prominent examples that support HDs, as there seems to be a |
||
237 | tendency to re-use known port numbers for new protocols. But TCP and UDP are |
||
238 | not the only dissectors that provide support for HDs. You can find more |
||
239 | examples by searching the Wireshark sources as follows: |
||
240 | grep -l register_heur_dissector_list epan/dissectors/packet-*.c |
||
241 | returns 45 files (November 2014). |