Protocol Dissection in XML Format
=================================
Copyright (c) 2003 by Gilbert Ramirez <gram@alumni.rice.edu>


Wireshark has the ability to export its protocol dissection in an
XML format; tshark has similar functionality by using the "-Tpdml"
option.

The XML that wireshark produces follows the Packet Details Markup
Language (PDML) specified by the group at the Politecnico Di Torino
working on Analyzer. The specification can be found at:

http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm

That URL is not functioning any more, but a copy can be found at:

http://gd.tuwien.ac.at/.vhost/analyzer.polito.it/docs/dissectors/PDMLSpec.htm

A related XML format, the Packet Summary Markup Language (PSML), is
also defined by the Analyzer group to provide packet summary information.
The PSML format is not documented in a publicly-available HTML document,
but its format is simple. Wireshark can export this format too. Some day it
may be added to tshark so that "-Tpsml" would produce PSML.

One wonders if the "-T" option should read "-Txml" instead of "-Tpdml"
(and in the future, "-Tpsml"), but if tshark were required to produce
another XML-based format of its protocol dissection, then "-Txml" would
be ambiguous.

PDML
====
The PDML that wireshark produces is known not to be loadable into Analyzer.
It causes Analyzer to crash. As such, the PDML that wireshark produces
is labeled with a version number of "0", which means that the PDML does
not fully follow the PDML spec. Furthermore, a "creator" attribute in the
"<pdml>" tag gives the version number of wireshark/tshark that produced the PDML.
In that way, as the PDML produced by wireshark matures, but still does not
meet the PDML spec, scripts can make intelligent decisions about how to
best parse the PDML, based on the "creator" attribute.

A PDML file is delimited by a "<pdml>" tag.
A PDML file contains multiple packets, denoted by the "<packet>" tag.
A packet will contain multiple protocols, denoted by the "<proto>" tag.
A protocol might contain one or more fields, denoted by the "<field>" tag.

A pseudo-protocol named "geninfo" is produced, as is required by the PDML
spec, and exported as the first protocol after the opening "<packet>" tag.
Its information comes from wireshark's "frame" protocol, which serves
the similar purpose of storing packet meta-data. Both "geninfo" and
"frame" protocols are provided in the PDML output.

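The nesting described above can be walked with any XML parser. The
following is a minimal sketch using Python's standard
xml.etree.ElementTree module. The sample document, its attribute values,
and the helper name summarize_pdml are made up for illustration; they are
not actual tshark output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; real tshark output carries many more attributes.
SAMPLE_PDML = """<pdml version="0" creator="wireshark/0.9.17">
  <packet>
    <proto name="geninfo" pos="0" size="60">
      <field name="num" show="1"/>
    </proto>
    <proto name="frame" pos="0" size="60"/>
    <proto name="eth" pos="0" size="14">
      <field name="eth.type" show="0x0800" value="0800"/>
    </proto>
  </packet>
</pdml>"""


def summarize_pdml(text):
    """Return, per packet, a list of (protocol name, direct field count)."""
    root = ET.fromstring(text)
    summary = []
    for packet in root.findall("packet"):
        protos = []
        for proto in packet.findall("proto"):
            protos.append((proto.get("name"), len(proto.findall("field"))))
        summary.append(protos)
    return summary


packets = summarize_pdml(SAMPLE_PDML)
```

In this sample, "geninfo" comes first in the packet and is followed by
"frame", matching the ordering described above.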

The "<pdml>" tag
================
Example:
    <pdml version="0" creator="wireshark/0.9.17">

The creator is "wireshark" version 0.9.17 (i.e., the "wireshark" engine;
it will always say "wireshark", not "tshark").

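A script that wants to adapt its parsing to the producing version could
read these two attributes first. The sketch below assumes the creator
string always has the "name/version" shape shown in the example; the
function name parse_creator is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_creator(pdml_text):
    """Return (pdml_version, program, program_version) from the root tag."""
    root = ET.fromstring(pdml_text)
    if root.tag != "pdml":
        raise ValueError("not a PDML document: root tag is %r" % root.tag)
    creator = root.get("creator", "")
    # Split "wireshark/0.9.17" into the program name and its version.
    program, _, program_version = creator.partition("/")
    return root.get("version"), program, program_version


info = parse_creator('<pdml version="0" creator="wireshark/0.9.17"></pdml>')
```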

The "<proto>" tag
=================
"<proto>" tags can have the following attributes:

    name - the display filter name for the protocol
    showname - the label used to describe this protocol in the protocol
        tree. This is usually the descriptive name of the protocol,
        but it can be modified by dissectors to include more data
        (tcp can do this)
    pos - the starting offset within the packet data where this
        protocol starts
    size - the number of octets in the packet data that this protocol
        covers.

The "<field>" tag
=================
"<field>" tags can have the following attributes:

    name - the display filter name for the field
    showname - the label used to describe this field in the protocol
        tree. This is usually the descriptive name of the field,
        followed by some representation of the value.
    pos - the starting offset within the packet data where this
        field starts
    size - the number of octets in the packet data that this field
        covers.
    value - the actual packet data, in hex, that this field covers
    show - the representation of the packet data ('value') as it would
        appear in a display filter.

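As an illustration of how these attributes might be consumed, the sketch
below converts one "<field>" element into native Python values. It assumes
'pos' and 'size' are decimal integers and 'value' is an even-length hex
string. The sample field and the helper name decode_field are made up for
this example.

```python
import xml.etree.ElementTree as ET


def decode_field(field):
    """Turn a <field> element's attributes into native Python values."""
    value = field.get("value")
    pos = field.get("pos")
    size = field.get("size")
    return {
        "name": field.get("name"),
        "show": field.get("show"),
        "pos": int(pos) if pos is not None else None,
        "size": int(size) if size is not None else None,
        # 'value' is hex text; convert it to the raw octets it represents.
        "raw": bytes.fromhex(value) if value is not None else None,
    }


# Hypothetical field, written by hand for this example.
sample = ET.fromstring(
    '<field name="ip.ttl" showname="Time to live: 64" '
    'pos="22" size="1" show="64" value="40"/>'
)
decoded = decode_field(sample)
```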

Some dissectors sometimes place text into the protocol tree, without using
a field with a field-name. Those appear in PDML as "<field>" tags with no
'name' attribute, but with a 'show' attribute giving that text.

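A parser may want to collect such text-only entries separately. The
following sketch gathers the 'show' text of every field that lacks a
name. As a precaution it treats an empty 'name' the same as a missing
one. The sample protocol and the helper name text_labels are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample: the second field has no 'name' attribute.
SAMPLE_PROTO = """<proto name="http">
  <field name="http.request.method" show="GET"/>
  <field show="Some free-form text"/>
</proto>"""


def text_labels(proto):
    """Return the 'show' text of every nested field lacking a 'name'."""
    return [
        field.get("show")
        for field in proto.iter("field")
        if not field.get("name")  # missing or empty name
    ]


labels = text_labels(ET.fromstring(SAMPLE_PROTO))
```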

Many dissectors label the undissected payload of a protocol as belonging
to a "data" protocol, and the "data" protocol usually resides inside
the last protocol dissected. In the PDML, the "data" protocol becomes
a "data" field, placed exactly where the "data" protocol is in wireshark's
protocol tree. So, if wireshark would normally show:

+-- Frame
|
+-- Ethernet
|
+-- IP
|
+-- TCP
|
+-- HTTP
    |
    +-- Data

In PDML, the "Data" protocol would become another field under HTTP:

<packet>
    <proto name="frame">
    ...
    </proto>

    <proto name="eth">
    ...
    </proto>

    <proto name="ip">
    ...
    </proto>

    <proto name="tcp">
    ...
    </proto>

    <proto name="http">
    ...
    <field name="data" value="........."/>
    </proto>
</packet>

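Because the payload ends up as an ordinary field, a script can recover it
by looking for a field named "data" inside each protocol. This is a rough
sketch under that assumption; the sample packet and the helper name
undissected_payloads are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written packet; the hex payload spells "hello".
SAMPLE_PACKET = """<packet>
  <proto name="tcp"/>
  <proto name="http">
    <field name="http.request.method" show="GET"/>
    <field name="data" value="68656c6c6f"/>
  </proto>
</packet>"""


def undissected_payloads(packet):
    """Map each protocol name to the bytes of its "data" field, if any."""
    payloads = {}
    for proto in packet.findall("proto"):
        for field in proto.iter("field"):
            if field.get("name") == "data" and field.get("value"):
                payloads[proto.get("name")] = bytes.fromhex(field.get("value"))
    return payloads


payloads = undissected_payloads(ET.fromstring(SAMPLE_PACKET))
```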

tools/WiresharkXML.py
=====================
This is a python module which provides some infrastructure for
Python developers who wish to parse PDML. It is designed to read
a PDML file and call a user's callback function every time a packet
is constructed from the protocols and fields for a single packet.

The python user should import the module, define a callback function
which accepts one argument, and call the parse_fh function:

------------------------------------------------------------
import WiresharkXML

def my_callback(packet):
    # do something with the packet
    pass

# If the PDML is stored in a file, you can:
fh = open(xml_filename)
WiresharkXML.parse_fh(fh, my_callback)

# or, if the PDML is contained within a string, you can:
WiresharkXML.parse_string(my_string, my_callback)

# Now that the script has the packet data, do something.
------------------------------------------------------------

The object that is passed to the callback function is a
WiresharkXML.Packet object, which corresponds to a single packet.
WiresharkXML provides three classes, each of which corresponds to a PDML tag:

    Packet - "<packet>" tag
    Protocol - "<proto>" tag
    Field - "<field>" tag

Each of these classes has accessors which will return the defined attributes:

    get_name()
    get_showname()
    get_pos()
    get_size()
    get_value()
    get_show()

Protocols and fields can contain other fields. Thus, the Protocol and
Field classes have a "children" member, which is a simple list of the
Field objects, if any, that are contained. The "children" list can be
directly accessed by code using the object. The "children" list will be
empty if this Protocol or Field contains no Fields.

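Since "children" is a plain list, a recursive walk is enough to visit
every nested field. The sketch below relies only on get_name() and
"children" as described above. StubItem is a made-up stand-in so the
example runs without WiresharkXML installed; it is not part of the
module.

```python
def dump_tree(item, depth=0, lines=None):
    """Collect one indented line per Protocol/Field, depth first."""
    if lines is None:
        lines = []
    lines.append("  " * depth + str(item.get_name()))
    for child in item.children:
        dump_tree(child, depth + 1, lines)
    return lines


class StubItem:
    """Stand-in for a Protocol or Field, used only to run this example."""

    def __init__(self, name, children=()):
        self._name = name
        self.children = list(children)

    def get_name(self):
        return self._name


ip = StubItem("ip", [StubItem("ip.flags", [StubItem("ip.flags.df")]),
                     StubItem("ip.ttl")])
tree_lines = dump_tree(ip)
```

Inside a real callback, the same function could be called on each
Protocol found in the packet.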

Furthermore, the Packet class is a sub-class of the PacketList class.
The PacketList class provides methods to look for protocols and fields.
The term "item" is used when the item being looked for can be
a protocol or a field:

    item_exists(name) - checks if an item exists in the PacketList
    get_items(name) - returns a PacketList of all matching items

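A callback might combine the two methods as sketched below. This assumes
that the PacketList returned by get_items() can be iterated like a list,
which is not stated above. StubField and StubPacket are invented
stand-ins so the example runs on its own; they are not part of
WiresharkXML.

```python
def collect_shows(packet, name):
    """Return the 'show' text of every item called 'name' in the packet."""
    if not packet.item_exists(name):
        return []
    # Assumption: the PacketList returned by get_items() can be iterated.
    return [item.get_show() for item in packet.get_items(name)]


class StubField:
    """Stand-in for WiresharkXML.Field, for this example only."""

    def __init__(self, name, show):
        self.name = name
        self.show = show

    def get_show(self):
        return self.show


class StubPacket:
    """Stand-in for WiresharkXML.Packet, for this example only."""

    def __init__(self, fields):
        self.fields = fields

    def item_exists(self, name):
        return any(f.name == name for f in self.fields)

    def get_items(self, name):
        return [f for f in self.fields if f.name == name]


packet = StubPacket([StubField("ip.ttl", "64"), StubField("tcp.port", "80")])
ttl_shows = collect_shows(packet, "ip.ttl")
missing = collect_shows(packet, "udp.port")
```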

General Notes
=============
Generally, parsing XML is slow. If you're writing a script to parse
the PDML output of tshark, pass a read filter with "-R" to tshark to
try to reduce as much as possible the number of packets coming out of tshark.
The less your script has to process, the faster it will be.

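One way to apply this advice from Python is to build the tshark command
line with the filter included and read the PDML from its standard output.
The sketch below is only an outline: the file name and filter are
placeholders, the helper names are hypothetical, and newer tshark releases
expect "-Y" for a single-pass display filter (with "-R" reserved for
two-pass analysis), so check the manual page of the installed version.

```python
import subprocess


def tshark_pdml_command(capture_file, read_filter=None, tshark="tshark"):
    """Build the argument list for a PDML export of one capture file."""
    command = [tshark, "-r", capture_file, "-Tpdml"]
    if read_filter:
        # "-R" as described above; newer releases may need "-Y" instead.
        command += ["-R", read_filter]
    return command


def run_tshark_pdml(capture_file, read_filter=None):
    """Run tshark and return its PDML output as text (needs tshark installed)."""
    command = tshark_pdml_command(capture_file, read_filter)
    return subprocess.run(
        command, check=True, capture_output=True, text=True
    ).stdout


# Placeholder capture file and filter, for illustration only.
command = tshark_pdml_command("capture.pcap", "http")
```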

'tools/msnchat' is a sample Python program that uses WiresharkXML to parse
PDML. Given one or more capture files, it runs tshark on each of them,
providing a read filter to reduce tshark's output. It finds MSN Chat
conversations in the capture file and produces nice HTML showing the
conversations. It has only been tested with capture files containing
non-simultaneous chat sessions, but was written to more-or-less handle any
number of simultaneous chat sessions.