WebSVN – nexmon – Blame – Rev 1 – /utilities/wireshark/doc/README.display

1

office

1

(This is a consolidation of documentation written by stig, sahlberg, and gram)

2

3

What is the display filter system?

4

==================================

5

The display filter system allows the user to select packets by testing

6

for values in the proto_tree that Wireshark constructs for that packet.

7

Every proto_item in the proto_tree has an 'abbrev' field

8

and a 'type' field, which tells the display filter engine the name

9

of the field and its type (what values it can hold).

10

11

For example, this is the definition of the ip.proto field from packet-ip.c:

12

13

{ &hf_ip_proto,

14

{ "Protocol", "ip.proto", FT_UINT8, BASE_DEC | BASE_EXT_STRING,

15

&ipproto_val_ext, 0x0, NULL, HFILL }},

16

17

This definition says that "ip.proto" is the display-filter name for

18

this field, and that its field-type is FT_UINT8.

19

20

The display filter system has 3 major parts to it:

21

22

1. A type system (field types, or "ftypes")

23

2. A parser, to convert a user's query to an internal representation

24

3. An engine that uses the internal representation to select packets.

code:

epan/dfilter/* - the display filter engine, including

29

scanner, parser, syntax-tree semantics checker, DFVM bytecode

30

generator, and DFVM engine.

31

epan/ftypes/* - the definitions of the various FT_* field types.

32

epan/proto.c - proto_tree-related routines

33

34

35

The field type system

36

=====================

37

The field type system is stored in epan/ftypes.

38

39

The proto_tree system #includes ftypes.h, which gives it the ftenum

40

definition, which is the enum of all possible ftypes:

/* field types */

enum ftenum {

FT_NONE, /* used for text labels with no value */

45

FT_PROTOCOL,

46

FT_BOOLEAN, /* TRUE and FALSE come from <glib.h> */

47

FT_UINT8,

48

FT_UINT16,

49

FT_UINT24, /* really a UINT32, but displayed as3 hex-digits if FD_HEX*/

FT_UINT32,

FT_UINT64,

etc., etc.

}

It also provides the definition of fvalue_t, the struct that holds the *value*

56

that corresponds to the type. Each proto_item (proto_node) holds an fvalue_t

57

due to having a field_info struct (defined in proto.h).

58

59

The fvalue_t is mostly just a gigantic union of possible C-language types

60

(as opposed to FT_* types):

61

62

typedef struct _fvalue_t {

63

ftype_t *ftype;

64

union {

65

/* Put a few basic types in here */

guint32 uinteger;

gint32 sinteger;

guint64 integer64;

gdouble floating;

gchar *string;

guchar *ustring;

GByteArray *bytes;

ipv4_addr ipv4;

ipv6_addr ipv6;

e_guid_t guid;

nstime_t time;

tvbuff_t *tvb;

GRegex *re;

} value;

/* The following is provided for private use

82

* by the fvalue. */

83

gboolean fvalue_gboolean1;

} fvalue_t;

Defining a field type

89

---------------------

90

The ftype system itself is designed to be modular, so that new field types

91

can be added when necessary.

92

93

Each field type must implement an ftype_t structure, also defined in

94

ftypes.h. This is the way a field type is registered with the ftype engine.

95

96

If you take a look at ftype-integer.c, you will see that it provides

97

an ftype_register_integers() function, that fills in many such ftype_t

98

structs. It creates one for each integer type: FT_UINT8, FT_UINT16,

99

FT_UINT32, etc.

100

101

The ftype_t struct defines the things needed for the ftype:

102

103

* its ftenum value

104

* a string representation of the FT name ("FT_UINT8")

105

* how much data it consumes in the packet

106

* how to store that value in an fvalue_t: new(), free(),

107

various value-related functions

108

* how to compare that value against another

109

* how to slice that value (strings and byte ranges can be sliced)

Using an fvalue_t

-----------------

Once the value of a field is stored in an fvalue_t (stored in

114

each proto_item via field_info), it's easy to use those values,

115

thanks to the various fvalue_*() functions defined in ftypes.h.

116

117

Functions like fvalue_get(), fvalue_eq(), etc., are all generic

118

interfaces to get information about the field's value. They work

119

on any field type because of the ftype_t struct, which is the lookup

120

table that the field-type engine uses to work with any field type.

121

122

The display filter parser

123

=========================

124

The display filter parser (along with the comparison engine)

125

is stored in epan/dfilter.

126

127

The scanner/parser pair read the string representing the display filter

128

and convert it into a very simple syntax tree. The syntax tree is very

129

simple in that it is possible that many of the nodes contain unparsed

130

chunks of text from the display filter.

131

132

There are four phases to parsing a user's request:

133

134

1. Scanning the string for dfilter syntax

135

2. Parsing the keywords according to the dfilter grammar, into a

136

syntax tree

137

3. Doing a semantic check of the nodes in that syntax tree

138

4. Converting the syntax tree into a series of DFVM byte codes

139

140

The dfilter_compile() function, in epan/dfilter/dfilter.c,

141

runs these 4 phases. The end result is a dfwork_t object (dfw), that

142

can be passed to dfilter_apply() to actually run the display filter

143

against a set of proto_trees.

144

145

146

Scanning the display filter string

147

----------------------------------

148

epan/dfilter/scanner.l is the lex scanner for finding keywords

149

in the user's display filter string.

150

151

Its operation is simple. It finds the special function and comparison

152

operators ("==", "!=", "eq", "ne", etc.), it finds slice operations

153

( "[0:1]" ), quoted strings, IP addresses, numbers, and any other "special"

154

keywords or string types.

155

156

Anything it doesn't know how to handle is passed to to grammar parser

157

as an unparsed string (TOKEN_UNPARSED). This includes field names. The

158

scanner does not interpret any protocol field names at all.

159

160

The scanner has to return a token type (TOKEN_*, and in many cases,

161

a value. The value will be an stnode_t struct, which is a syntax

162

tree node object. Since the final storage of the parse will

163

be in a syntax tree, it is convenient for the scanner to fill in

164

syntax tree nodes with values when it can.

165

166

The stnode_t definition is in epan/dfilter/syntax-tree.h

167

168

169

Parsing the keywords according to the dfilter grammar

170

-----------------------------------------------------

171

The grammar parser is implemented with the 'lemon' tool,

172

rather than the traditional yacc or bison grammar parser,

173

as lemon grammars were found to be easier to work with. The

174

lemon parser specification (epan/dfilter/grammar.lemon) is

175

much easier to read than its bison counterpart would be,

176

thanks to lemon's feature of being able to name fields, rather

177

then using numbers ($1, $2, etc.)

178

179

The lemon tool is located in tools/lemon in the Wireshark

180

distribution.

181

182

An on-line introduction to lemon is available at:

183

184

http://www.sqlite.org/src/doc/trunk/doc/lemon.html

185

186

The grammar specifies which type of constructs are possible

187

within the dfilter language ("dfilter-lang")

188

189

An "expression" in dfilter-lang can be a relational test or a logical test.

190

191

A relational test compares a value against another, which is usually

192

a field (or a slice of a field) against some static value, like:

193

194

ip.proto == 1

195

eth.dst != ff:ff:ff:ff:ff:ff

196

197

A logical test combines other expressions with "and", "or", and "not".

198

199

At the end of the grammatical parsing, the dfw object will

200

have a valid syntax tree, pointed at by dfw->st_root.

201

202

If there is an error in the syntax, the parser will call dfilter_fail()

203

with an appropriate error message, which the UI will need to report

204

to the user.

205

206

The syntax tree system

207

----------------------

208

The syntax tree is created as a result of running the lemon-based

209

grammar parser on the scanned tokens. The syntax tree code

210

is in epan/dfilter/syntax-tree* and epan/dfilter/sttree-*. It too

211

uses a set of code modules that implement different syntax node types,

212

similar to how the field-type system registers a set of ftypes

213

with a central engine.

214

215

Each node (stnode_t) in the syntax tree has a type (sttype).

216

These sttypes are very much related to ftypes (field types), but there

217

is not a one-to-one correspondence. The syntax tree nodes are slightly

218

high-level. For example, there is only a single INTEGER sttype, unlike

219

the ftype system that has a type for UINT64, UINT32, UINT16, UINT8, etc.

220

221

typedef enum {

222

STTYPE_UNINITIALIZED,

STTYPE_TEST,

STTYPE_UNPARSED,

STTYPE_STRING,

STTYPE_FIELD,

STTYPE_FVALUE,

STTYPE_INTEGER,

STTYPE_RANGE,

STTYPE_FUNCTION,

STTYPE_SET,

STTYPE_NUM_TYPES

} sttype_id_t;

The root node of the syntax tree is the main test or comparison

being done.

Semantic Check

--------------

After the parsing is done and a syntax tree is available, the

242

code in semcheck.c does a semantic check of what is in the syntax

243

tree.

244

245

The semantics of the simple syntax tree are checked to make sure that

246

the fields that are being compared are being compared to appropriate

247

values. For example, if a field is an integer, it can't be compared to

248

a string, unless a value_string has been defined for that field.

249

250

During the process of checking the semantics, the simple syntax tree is

251

fleshed out and no longer contains nodes with unparsed information. The

252

syntax tree is no longer in its simple form, but in its complete form.

253

254

For example, if the dfilter is slicing a field and comparing

255

against a set of bytes, semcheck.c has to check that the field

256

in question can indeed be sliced.

257

258

Or, can a field be compared against a certain type of value (string,

259

integer, float, IPv4 address, etc.)

260

261

The semcheck code also makes adjustments to the syntax tree

262

when it needs to. The parser sometimes stores raw, unparsed strings

263

in the syntax tree, and semcheck has to convert them to

264

certain types. For example, the display filter may contain

265

a value_string string (the "enum" type that protocols can use

266

to define the possible textual descriptions of numeric fields), and

267

semcheck will convert that value_string string into the correct

268

integer value.

269

270

Truth be told, the semcheck.c code is a bit disorganized, and could

271

be re-designed & re-written.

DFVM Byte Codes

---------------

The syntax tree is analyzed to create a sequence of bytecodes in the

276

"DFVM" language. "DFVM" stands for Display Filter Virtual Machine. The

277

DFVM is similar in spirit, but not in definition, to the BPF VM that

278

libpcap uses to analyze packets.

279

280

A virtual bytecode is created and used so that the actual process of

281

filtering packets will be fast. That is, it should be faster to process

282

a list of VM bytecodes than to attempt to filter packets directly from

283

the syntax tree. (heh... no measurement has been made to support this

284

supposition)

285

286

The DFVM opcodes are defined in epan/dfilter/dfvm.h (dfvm_opcode_t).

287

Similar to how the BPF opcode system works in libpcap, there is a

288

limited set of opcodes. They operate by loading values from the

289

proto_tree into registers, loading pre-defined values into

290

registers, and comparing them. The opcodes are checked in sequence, and

291

there are only 2 branching opcodes: IF_TRUE_GOTO and IF_FALSE_GOTO.

292

Both of these can only branch forwards, and never backwards. In this way

293

sets of DFVM instructions will never get into an infinite loop.

294

295

The epan/dfilter/gencode.c code converts the syntax tree

296

into a set of dvfm instructions.

297

298

The constants that are in the DFVM instructions (the constant

299

values that the user is checking against) are pre-loaded

300

into registers via the dvfm_init_const() call, and stored

301

in the dfilter_t structure for when the display filter is

actually applied.

DFVM Engine

===========

Once the DFVM bytecode has been produced, it's a simple matter of

308

running the DFVM engine against the proto_tree from the packet

309

dissection, using the DFVM bytecodes as instructions. If the DFVM

310

bytecode is known before packet dissection occurs, the

311

proto_tree-related code can be "primed" to store away pointers to

312

field_info structures that are interesting to the display filter. This

313

makes lookup of those field_info structures during the filtering process

314

faster.

315

316

The dfilter_apply() function runs a single pre-compiled

317

display filter against a single proto_tree function, and returns

318

TRUE or FALSE, meaning that the filter matched or not.

319

320

That function calls dfvm_apply(), which runs across the DFVM

321

instructions, loading protocol field values into DFVM registers

322

and doing the comparisons.

323

324

There is a top-level Makefile target called 'dftest' which

325

builds a 'dftest' executable that will print out the DFVM

326

bytecode for any display filter given on the command-line.

To build it, run:

$ make dftest

To use it, give it the display filter on the command-line:

332

333

$ ./dftest 'ip.addr == 127.0.0.1'

334

Filter: "ip.addr == 127.0.0.1"

335

336

Constants:

337

00000 PUT_FVALUE 127.0.0.1 <FT_IPv4> -> reg#1

338

339

Instructions:

340

00000 READ_TREE ip.addr -> reg#0

341

00001 IF-FALSE-GOTO 3

342

00002 ANY_EQ reg#0 == reg#1

00003 RETURN

The output shows the original display filter, then the opcodes

347

that put constant values into registers. The registers are

348

numbered, and are shown in the output as "reg#n", where 'n' is the

349

identifying number.

350

351

Then the instructions are shown. These are the instructions

352

which are run for each proto_tree.

353

354

This is what happens in this example:

355

356

00000 READ_TREE ip.addr -> reg#0

357

358

Any ip.addr fields in the proto_tree are loaded into register 0. Yes,

359

multiple values can be loaded into a single register. As a result

360

of this READ_TREE, the accumulator will hold TRUE or FALSE, indicating

361

if any field's value was loaded, or not.

362

363

00001 IF-FALSE-GOTO 3

364

365

If the load failed because there were no ip.addr fields

366

in the proto_tree, then we jump to instruction 3.

367

368

00002 ANY_EQ reg#0 == reg#1

369

370

This checks to see if any of the fields in register 0

371

(which has the pre-loaded constant value of 127.0.0.1) are equal

372

to any of the fields in register 1 (which are all of the ip.addr

373

fields in the proto tree). The resulting value in the

374

accumulator will be TRUE if any of the fields match, or FALSE

if none match.

00003 RETURN

This returns the accumulator's value, either TRUE or FALSE.

380

381

In addition to dftest, there is also a tools/dfilter-test script

382

which is a unit-test script for the display filter engine.

383

It makes use of text2pcap and tshark to run specific display

384

filters against specific captures (embedded within dfilter-test).

Display Filter Functions

389

========================

390

You define a display filter function by adding an entry to

391

the df_functions table in epan/dfilter/dfunctions.c. The record struct

392

is defined in dfunctions.h, and shown here:

typedef struct {

char *name;

DFFuncType function;

397

ftenum_t retval_ftype;

398

guint min_nargs;

399

guint max_nargs;

400

DFSemCheckType semcheck_param_function;

401

} df_func_def_t;

402

403

name - the name of the function; this is how the user will call your

404

function in the display filter language

405

406

function - this is the run-time processing of your function.

407

408

retval_ftype - what type of FT_* type does your function return?

409

410

min_nargs - minimum number of arguments your function accepts

411

max_nargs - maximum number of arguments your function accepts

412

413

semcheck_param_function - called during the semantic check of the

414

display filter string.

415

416

DFFuncType function

417

-------------------

418

typedef gboolean (*DFFuncType)(GList *arg1list, GList *arg2list, GList **retval);

419

420

The return value of your function is a gboolean; TRUE if processing went fine,

421

or FALSE if there was some sort of exception.

422

423

For now, display filter functions can accept a maximum of 2 arguments.

424

The "arg1list" parameter is the GList for the first argument. The

425

'arg2list" parameter is the GList for the second argument. All arguments

426

to display filter functions are lists. This is because in the display

427

filter language a protocol field may have multiple instances. For example,

428

a field like "ip.addr" will exist more than once in a single frame. So

429

when the user invokes this display filter:

430

431

somefunc(ip.addr) == TRUE

432

433

even though "ip.addr" is a single argument, the "somefunc" function will

434

receive a GList of *all* the values of "ip.addr" in the frame.

435

436

Similarly, the return value of the function needs to be a GList, since all

437

values in the display filter language are lists. The GList** retval argument

438

is passed to your function so you can set the pointer to your return value.

DFSemCheckType

--------------

typedef void (*DFSemCheckType)(int param_num, stnode_t *st_node);

443

444

For each parameter in the syntax tree, this function will be called.

445

"param_num" will indicate the number of the parameter, starting with 0.

446

The "stnode_t" is the syntax-tree node representing that parameter.

447

If everything is okay with the value of that stnode_t, your function

448

does nothing --- it merely returns. If something is wrong, however,

449

it should THROW a TypeError exception.

450

451

452

Example: add an 'in' display filter operation

453

=============================================

454

455

This example has been discussed on wireshark-dev in April 2004. It illustrates

456

how a more complex operation can be added to the display filter language.

Question:

If I want to add an 'in' display filter operation, I need to define

461

several things. This can happen in different ways. For instance,

462

every value from the "in" value collection will result in a test.

463

There are 2 options here, either a test for a single value:

(x in {a b c})

or a test for a value in a given range:

(x in {a ... z})

or even a combination of both. The former example can be reduced to:

472

473

((x == a) or (x == b) or (x == c))

474

475

while the latter can be reduced to

476

477

((x >= MIN(a, z)) and (x <= MAX(a, z)))

478

479

I understand that I can replace "x in {" with the following steps:

480

first store x in the "in" test buffer, then add "(" to the display

481

filter expression internally.

482

483

Similarly I can replace the closing brace "}" with the following

484

steps: release x from the "in" test buffer and then add ")"

485

to the display filter expression internally.

486

487

How could I do this?

Answer:

This could be done in grammar.lemon. The grammar would produce

492

syntax tree nodes, combining them with "or", when it is given

493

tokens that represent the "in" syntax.

494

495

It could also be done later in the process, maybe in

496

semcheck.c. But if you can do it earlier, in grammar.lemon,

497

then you shouldn't have to worry about modifying anything in

498

semcheck.c, as the syntax tree that is passed to semcheck.c

499

won't contain any new type of operators... just lots of nodes

500

combined with "or".

501

502

How to add an operator FOO to the display filter language?

503

==========================================================

504

505

Go to wireshark/epan/dfilter/

506

507

Edit grammar.lemon and add the operator. Add the operator FOO and the

508

test logic (defining TEST_OP_FOO).

509

510

Edit scanner.l and add the operator name(s) hence defining

511

TOKEN_TEST_FOO. Also update the simple() or add the new operand's code.

512

513

Edit sttype-test.h and add the TEST_OP_FOO to the list of test operations.

514

515

Edit sttype-test.c and add TEST_OP_FOO to the num_operands() method.

516

517

Edit gencode.c, add TEST_OP_FOO in the gen_test() method by defining

518

ANY_FOO.

519

520

Edit dfvm.h and add ANY_FOO to the enum dfvm_opcode_t structure.

521

522

Edit dfvm.c and add ANY_FOO to dfvm_dump() (for the dftest display filter

523

test binary), to dfvm_apply() hence defining the methods fvalue_foo().

524

525

Edit semcheck.c and look at the check_relation_XXX() methods if they

526

still apply to the foo operator; if not, amend the code. Start from the

527

check_test() method to discover the logic.

528

529

Go to wireshark/epan/ftypes/

530

531

Edit ftypes.h and declare the fvalue_foo(), ftype_can_foo() and

532

fvalue_foo() methods. Add the cmp_foo() method to the struct _ftype_t.

533

534

This is the first time that a make in wireshark/epan/dfilter/ can

535

succeed. If it fails, then some code in the previously edited files must

536

be corrected.

537

538

Edit ftypes.c and define the fvalue_foo() method with its associated

539

logic. Define also the ftype_can_foo() and fvalue_foo() methods.

540

541

Edit all ftype-*.c files and add the required fvalue_foo() methods.

542

543

This is the point where you should be able to compile without errors in

544

wireshark/epan/ftypes/. If not, first fix the errors.

545

546

Go to wireshark/epan/ and run make. If this one succeeds, then we're

547

almost done as no errors should occur here.

548

549

Go to wireshark/ and run make. One thing to do is make dftest and see

550

if you can construct valid display filters with your new operator. Or

551

you may want to move directly to the generation of Wireshark.

552

553

Look also at wireshark/gtk/dfilter_expr_dlg.c and edit the display filter

554

expression generator.

555

556

How to add a new test to dfilter-test.py

557

========================================

558

Note: dfilter-test.py requires Python 2.7 or newer (including Python 3)

559

560

"tools/dfilter-test.py" is the main test script. It includes

561

the test from files in tools/dftestlib. You can add a test

562

to a file in tools/dftestlib, or you can create a new file

563

in tools/dftestlib. If you do add a new file, you must

564

import it (and the class it defines) in "tools/dfilter-test.py"

565

566

Each new test class must define "trace_file", which names

567

a capture file in "tools/dftestfiles". All the tests

568

run in that class will use that one capture file.

569

570

There are 2 methods you can use for testing:

571

572

assertDfilter(dfilter_text, expected_count)

573

574

This will run the display filter through tshark, on the

575

file named by "trace_file", and assert that the

576

number of resulting packets equals "expected_count". This

577

also asserts that tshark does not fail; success with zero

578

matches is not the same as failure to compile the display

579

filter string.

580

581

assertDFilterFail(dfilter_text)

582

583

This will run tshark with the display filter, and

584

asser that tshark fails. This is useful when expecting

585

display filter syntax errors to be caught.

586

587

Then, simply run "dfilter-test.py". You can run the tests

588

in a single Test class by naming that -class on the

589

dfilter-test.py command-line, or even run a single

590

test by naming it. E.g., the following are all valid ways

591

of running dfilter-test.py:

592

593

# Run all tests

594

$ ./tools/dfilter-test.py

595

596

# Run all tests in "testTVB"

597

$ ./tools/dfilter-test.py testTVB

598

599

# Run the the "test_contains_1" test from testTVB

600

$ ./tools/dfilter-test.py testTVB.test_contains_1

601

602

Note that dfilter-test.py should be run from the top of the

603

Wireshark distribution, so it knows where to find the default

604

tshark executable.

605

nexmon – Blame information for rev 1