NetPDL Core Specification

Language Basics

Before describing the NetPDL language in depth, a brief overview of the language itself is presented.

<netpdl>: the root element

A NetPDL file starts with the <netpdl> root element, which supports the following attributes in addition to the Standard ones:

Attribute | Description
version (optional) | The version of the NetPDL specification.
creator (optional) | The creator of the current NetPDL database. For instance, the database embedded in the NetBee library has creator="nbee.org". There is a small semantic difference between creator and author (which is one of the Standard Attributes): the former usually identifies the tool that created the NetPDL library, while the latter usually identifies the person who wrote the code.

General Structure

The general structure of a NetPDL document is the following:

<netpdl>
  <protocol ...>
    <format>
      ...
    </format>
    <encapsulation>
      ...
    </encapsulation>
  </protocol>
</netpdl>

The <netpdl> element must contain at least one <protocol>, but usually it contains a sequence of protocols. These objects will be used to describe network PDUs. Each protocol contains a section for describing the header format (included in the <format> section), and a section <encapsulation> for describing which protocols follow the present one.

Syntactical rules

Case-Sensitive syntax

The NetPDL language is case sensitive. As a general rule, everything has to be written in lower case. There are two exceptions:

  • the content of some attributes (like “description”), which is left to the user
  • hexadecimal numbers, which can be written in either lower or upper case.

Identifiers

Identifiers are used for protocols and fields. These identifiers support only the following character set:

a-z, A-Z, 0-9, _

No other characters are allowed in identifiers (an identifier is the value of the name attribute). Other attributes can use any printable character.
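The identifier rule above can be checked mechanically. The following is a minimal Python sketch, not part of NetPDL itself; the function name is_valid_identifier is hypothetical, and the sketch assumes that an identifier must be non-empty.

```python
import re

# Assumption: an identifier is a non-empty string made only of a-z, A-Z, 0-9 and '_'.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_identifier(name: str) -> bool:
    """Return True if 'name' uses only the characters allowed for identifiers."""
    return _IDENTIFIER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("Eth", "version_and_length", "hdr-len", ""):
        print(repr(candidate), is_valid_identifier(candidate))
```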

Expressions

Although NetPDL makes extensive use of mathematical, boolean and string expressions, their syntax is rather intuitive, and this specification can be understood without a detailed knowledge of how expressions are defined. Expressions are defined in a companion document.

Defining a protocol: the general structure

Defining a network protocol: the <protocol> element

The <protocol> element is the base object that makes up the NetPDL library. It includes the information required for:

  • describing the protocol format (e.g. the list of fields which constitute the header)
  • describing the protocol encapsulation (i.e. how to determine, from the analysis of the current protocol, which one has to be used to interpret the bytes of the payload)
  • describing the protocol custom processing (i.e. custom code that has to be executed in order to complete the protocol processing), such as verifying the correctness of the protocol (i.e. if the protocol we are currently examining is the right one).

A protocol is made up of the following child elements:

Element | Description
<execute-code> (optional) | It defines a set of code that might be executed when some event occurs. For instance, in this section we can define a set of expressions that indicate if the data currently under analysis belongs to this protocol, which can be used in case we are not sure if we are processing data according to the right protocol. This element belongs to the “advanced” part of NetPDL, hence it is presented in the Advanced NetPDL document.
<format> (required) | It contains the definition of the headers of the current protocol.
<encapsulation> (optional) | It contains a list of statements that indicate (when possible) the next PDU (i.e. the next protocol) contained in the packet.

A <protocol> element can contain the following attributes in addition to the Standard ones:

Attribute | Description
name (required) | A unique name that identifies the object within its scope (e.g. “Eth”).
longname (optional) | It keeps a 'human' name (e.g. “Ethernet 802.3”) and it may be used when the object has to be shown. Conversely, the name attribute is usually used only inside the NetPDL description to provide a unique reference (within its scope) to the object, so it can be kept as short as possible since it is not used outside the NetPDL file. Warning: although it is not required, the longname attribute should always be present.

Standard Attributes

In addition, the <protocol> element also supports the following attributes, which are listed separately because they are used by many other NetPDL elements:

Attribute | Description
description (optional) | Description of the object (e.g. “Encapsulation of the 802.3 CSMA/CD technology”).
comment (optional) | A comment on the object; usually this comment is related to the NetPDL description itself (e.g. “used this field format although it is not really standard, but it is simpler”) and it should not have any meaning outside the NetPDL file itself.
author (optional) | The name of the person who implemented this object in the NetPDL Library.
date (optional) | Date of the last update, using the format dd-mm-yyyy (e.g. 14-09-2006 means 14-Sept-2006).
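As an illustration of the dd-mm-yyyy convention used by the date attribute, the following Python sketch parses such a value with the standard library. The helper name parse_netpdl_date is hypothetical; NetPDL itself does not define how tools should validate this attribute.

```python
from datetime import date, datetime


def parse_netpdl_date(text: str) -> date:
    """Parse a 'date' attribute written as dd-mm-yyyy; raises ValueError if malformed."""
    return datetime.strptime(text, "%d-%m-%Y").date()


if __name__ == "__main__":
    print(parse_netpdl_date("14-09-2006"))
```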

Protocol skeleton

The protocol skeleton looks like the following:

<protocol name="IP">
  <execute-code>
    <!-- Check if 'version' is equal to '4' -->
    ...
  </execute-code>

  <format>

    <fields>
      <!-- Put all fields which define an IPv4 header -->
      <field type="fixed" name="version_and_length" size="1"/>
      <field type="fixed" name="tos" size="1"/>
      ...
    </fields>

    <block name="ipRecordRoute">
      <!-- Define here the IP Record Route format -->
      <field type="fixed" name="type" size="1"/>
      <field type="fixed" name="length" size="1"/>
      ...
    </block>
    <!-- Define other options here -->

  </format>

  <encapsulation>
    <!-- Determine the protocol that follows this header -->
    ...
  </encapsulation>
</protocol>

The three fundamental blocks (<execute-code>, <format> and <encapsulation>) are positioned as children of the <protocol> element. In addition, the <format> block can be further organized in <fields>, which contains the list of fields of the current protocol, and <block>, which contains some pieces of definition that can be part of the protocol format. More details will be presented in the following sections.

Defining the protocol format

Defining the section that contains all the protocol headers: the <format> element

The <format> element is a container for all the elements that define the structure of the protocol headers. It does not have any attribute; it supports the following child elements:

Element | Description
<fields> (required) | It contains all the fields that constitute the protocol header.
<block> (optional) | It contains the definition of some group of fields (e.g. optional headers) that can be present within the protocol format. This element may be present zero, one or more times.

Defining the structure of the protocol headers: the <fields> element

This element is a container for all the structures that define the header of the protocol. The protocol header starts with the first element contained in the <fields> section.

This element does not have any attribute. It supports several child elements: <field>, Additional Format Elements and Conditional Elements.

Protocol fields: the <field> element

Fields are defined through the <field> element. It supports the following attributes in addition to the Standard ones:

Attribute | Description
name (required) | It defines a unique name that identifies the object within its scope.
longname (optional) | It defines a 'human' name; it may be used when the object has to be shown.
type (required) | It defines the type of the field (i.e. fixed-length, variable-length, etc.). More details will follow.
bigendian (optional) | It specifies whether the current protocol stores fields in network byte order (i.e. big endian). For instance, most protocols (e.g. TCP/IP) use the network byte order for storing multibyte fields, while some others (e.g. Bluetooth) use little-endian. It can assume the following values: no (default), meaning that this protocol stores multibyte fields in network byte order; yes, meaning that this protocol does not store multibyte fields in network byte order. Note: this attribute is associated with a field (instead of a protocol) because the byte ordering problem arises only for numeric fields; for instance, fields that contain ASCII values do not have this problem. However, NetPDL does not know the semantics of each field (it does not know whether a field contains ASCII values or numbers), therefore this attribute cannot be associated with an entire protocol.

The most important attribute of a <field> element is the type attribute, which defines how the field length has to be calculated. The values of this attribute are the following:

type value | Description
fixed | It defines a fixed-length field.
bit | It defines a bit field, i.e. a field that is not aligned to a byte boundary.
variable | It defines a variable-length field, whose size can be derived from a mathematical expression computed at run-time. A typical example is a Type-Length-Value field, in which the length of the third field (“Value”) is defined in the second field (“Length”).
tokenended | It defines a variable-length field terminated by a given token. This can be a field which is terminated by a special character (e.g. ";") or a special string.
tokenwrapped | It defines a variable-length field that starts with a first token and ends with a second one. This can be a field that starts with a marker and is terminated by a special character (e.g. ";") or a special string.
line | It defines a special type of delimited field, i.e. a field which is terminated by a “new line”. This type of field takes into account the differences related to the “new line” in different operating systems.
padding | It defines a field that realigns the current header to a 16-bit or 32-bit boundary.
plugin | It defines a field that cannot be described with NetPDL elements; it requires an appropriate plugin to be implemented in the NetPDL engine.

Depending on the value of the type attribute, more attributes must follow. The additional attributes required in each case are listed in the section Different types of the <field> element.

The <field> element can have the following types of child elements:

Element | Description
<offset> (optional) | It defines the field offset; it is used in case a field does not immediately follow its previous sibling.
<field>, <if>, <switch>, <loop> (optional) | All the elements allowed as children of the <fields> element (i.e. all the Protocol Fields and all the Additional Format Elements) and all the Conditional Elements. In other words, fields can be nested. For more details about nested fields, please refer to the Nested Fields section.

Different types of the <field> element

Fields with well-known length: the fixed type

A fixed type is used to declare a fixed length field. In addition to the standard attributes of the <field> element, in case of fixed fields one more attribute is required:

Attribute | Description
size (required) | It defines the size of the field, in number of bytes.

This example defines the Ethernet 802.3 header that is made up of 3 fixed fields whose sizes are equal to 6, 6 and 2 bytes.

<protocol name="Ethernet">
  <format>
    <fields>
      <field type="fixed" name="dst" size="6"/>
      <field type="fixed" name="src" size="6"/>
      <field type="fixed" name="ethertype" size="2"/>
    </fields>
  </format>
</protocol>

Bit-fields: the bit type

A bit type can be used in case a field is not aligned to a byte-boundary. This element is used mostly for defining bit-fields such as flags, etc.

In addition to the standard attributes of the <field> element, in case of bit fields two more attributes are required:

Attribute | Description
size (required) | It defines the size of the “master” field, in number of bytes.
mask (required) | It defines which bits (within the size specified by the size attribute) belong to this field. This attribute contains a number that can be written in decimal, binary or hexadecimal according to the same rules used in NetPDL expressions (although the hexadecimal format is preferred, e.g. 0x0F). The bits equal to '1' in the mask select the bits that belong to the bit field (e.g. 0x10, i.e. 0b00010000; in this case the field is made up of a single bit).

This example shows how to define the first bit fields of the IPv4 header (version and header length):

<protocol name="IPv4">
  <format>
    <fields>
      <field type="bit" name="ver" longname="Version" size="1" mask="0xF0"/>
      <field type="bit" name="hlen" longname="Header length" size="1" mask="0x0F"/>

      <field type="fixed" name="tos" longname="Type of service" size="1"/>
      ...
    </fields>
  </format>
</protocol>
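To make the role of the mask attribute concrete, the following Python sketch extracts the ver and hlen values from the first byte of an IPv4 header. It is only an illustration: the function name extract_bitfield is hypothetical, and the sketch assumes that the “master” field is read as a big-endian integer and that the selected bits are shifted down to bit 0, which is a common convention but is not stated by this specification.

```python
def extract_bitfield(master: bytes, mask: int) -> int:
    """Return the bits of 'master' selected by 'mask', shifted down to bit 0."""
    if mask == 0:
        raise ValueError("mask must select at least one bit")
    # Assumption: the master field is interpreted as a big-endian integer.
    value = int.from_bytes(master, "big") & mask
    # (mask & -mask) isolates the lowest set bit; its position is the shift amount.
    shift = (mask & -mask).bit_length() - 1
    return value >> shift


if __name__ == "__main__":
    first_byte = bytes([0x45])  # typical first byte of an IPv4 header
    print("ver  =", extract_bitfield(first_byte, 0xF0))
    print("hlen =", extract_bitfield(first_byte, 0x0F))
```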

One problem with bit-fields is that several of them can be present at the same offset. For instance, both fields ver and hlen are derived from the same block of the packet (whose size is one byte), at the same offset. The engine therefore has to determine the offset at which each bit field must be computed.

In order to make definitions simpler, a NetPDL-based engine determines the offset automatically according to the following rules:

  • if a bitfield B is preceded by another bitfield A, B's offset is the same as A's, unless the last (least significant) bit of A's mask is '1', in which case B starts at a new offset
  • all fields (in this case A and B) that share the same offset must have the same value in the size attribute
  • if a bitfield B is followed by a third field C (not a bitfield), the offset of field C is the offset of B plus the value of B's size attribute.

This example shows how to define the following 6 bit-fields followed by a fixed field:

0               8               16              24            31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|<--F1->|<--F2->|<-F3>|<---F4-->|<-F5>|U|<-F6>|U|<----F7------->|



<fields>
  <field type="bit" name="F1" size="1" mask="0xF0"/>
  <field type="bit" name="F2" size="1" mask="0x0F"/>

  <!-- The 'mask' of previous field ends with '1', so next field has a new offset -->
  <field type="bit" name="F3" size="1" mask="0xE0"/>
  <field type="bit" name="F4" size="1" mask="0x1F"/>

  <!-- The 'mask' of previous field ends with '1', so next field has a new offset -->
  <field type="bit" name="F5" size="1" mask="0xE0"/>
  <field type="bit" name="F6" size="1" mask="0x0E"/>

  <!-- The next field is not a bitfield, therefore it will start at a new offset -->
  <field type="fixed" name="F7" size="1"/>
</fields>

This example makes clear how a set of bit fields can be used to describe the protocol format. Moreover, it shows that some of the bits may not be assigned to any field.
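The offset rules for bit fields can be sketched in a few lines of Python. This is an illustrative model and not the algorithm of any specific NetPDL engine: the function name assign_offsets and the tuple layout (name, type, size, mask) are assumptions made for this sketch, and only fixed and bit fields are modelled.

```python
def assign_offsets(fields, start=0):
    """Compute the byte offset of each field.

    'fields' is a list of (name, type, size, mask) tuples; mask is None for
    non-bit fields. Returns a dict mapping each name to its byte offset.
    """
    offsets = {}
    offset = start
    prev = None  # (type, size, mask) of the previous field
    for name, ftype, size, mask in fields:
        if prev is not None:
            ptype, psize, pmask = prev
            # Two consecutive bit fields share the offset unless the previous
            # mask ends with a '1' in its least significant bit.
            shares = ptype == "bit" and ftype == "bit" and (pmask & 1) == 0
            if shares:
                if size != psize:
                    raise ValueError(f"{name}: size differs from the field sharing its offset")
            else:
                offset += psize
        offsets[name] = offset
        prev = (ftype, size, mask)
    return offsets


EXAMPLE_FIELDS = [
    ("F1", "bit", 1, 0xF0),
    ("F2", "bit", 1, 0x0F),
    ("F3", "bit", 1, 0xE0),
    ("F4", "bit", 1, 0x1F),
    ("F5", "bit", 1, 0xE0),
    ("F6", "bit", 1, 0x0E),
    ("F7", "fixed", 1, None),
]

if __name__ == "__main__":
    print(assign_offsets(EXAMPLE_FIELDS))
```

Under these assumptions F1 and F2 share offset 0, F3 and F4 share offset 1, F5 and F6 share offset 2, and F7 starts at offset 3, which matches the 32-bit layout drawn above.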

Fields whose size is known only at run-time: the variable type

A variable type is used to declare a variable length field, i.e. a field whose size can be determined only at run-time by means of some other parameters (e.g. the length of the field can be the value of some other preceding field). In addition to the standard attributes of the <field> element, a variable field requires one more attribute:

Attribute | Description
expr (required) | It contains the expression used to determine the size of the field at run-time.

This example defines a variable field called payload, whose size is derived from the value of a previously processed field, whose name is length.

<protocol name="Example">
  <format>
     <fields>
       <field type="fixed" name="length" size="2"/>
       <field type="variable" name="payload" expr="buf2int(length)"/>
     </fields>

  </format>
</protocol>
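The behaviour described by this definition can be sketched in Python as follows. The sketch assumes that buf2int(length) converts the two bytes of the length field into an unsigned integer in network byte order; the function name parse_length_prefixed and the error handling are hypothetical and are not defined by NetPDL.

```python
def parse_length_prefixed(data: bytes):
    """Decode a 2-byte 'length' field followed by a 'payload' of that many bytes."""
    if len(data) < 2:
        raise ValueError("not enough bytes for the length field")
    # Assumption: buf2int(length) reads the field as a big-endian unsigned integer.
    payload_size = int.from_bytes(data[0:2], "big")
    payload = data[2:2 + payload_size]
    if len(payload) != payload_size:
        raise ValueError("truncated payload")
    return payload_size, payload


if __name__ == "__main__":
    print(parse_length_prefixed(b"\x00\x03abcXYZ"))
```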

Fields that are ended by a given token: the tokenended type

The tokenended type is used to declare a field that ends when a specific token is encountered. In addition to the standard attributes of the <field> element, in case of tokenended fields the following attributes are allowed:

Attribute | Description
endtoken (optional) | It contains the token that defines the end of this field. The token must be a string (e.g. "\x0D\x0A"), written according to the same rules (and limitations) defined for NetPDL String Operands, with the exception that the string must not be delimited by the ' character. For instance, the value shown means that the field ends when two consecutive bytes with values 0x0D and 0x0A are found in the packet data. Note: either endregex or endtoken must be present in the attribute list.
endregex (optional) | It contains the token, in the form of a regular expression, that defines the end of this field. The token must be a string containing a regular expression (e.g. "\x0D\x0A[^\x09\x20]"), written according to the same rules (and limitations) defined for NetPDL String Operands, with the exception that the string must not be delimited by the ' character. For instance, the value shown means that the field ends when two consecutive bytes with values 0x0D and 0x0A are found in the packet data, and these bytes are not followed by \x09 (i.e. the horizontal tab) or \x20 (i.e. the space). Note: either endregex or endtoken must be present in the attribute list.
endoffset (optional) | It defines the offset (computed from the starting offset of this field) that must be considered for terminating this field. Usually, the field ends when the terminating token (or the terminating regular expression) ends, and the field length includes the length of the terminator. Sometimes it may be useful to terminate this field earlier: this attribute can be used for this. This attribute is a generic NetPDL expression, although it often makes use of the $token_begintlen, $token_fieldlen and $token_endtlen NetPDL variables.
enddiscard (optional) | It defines the number of bytes to discard before starting the analysis of the following field. In other words, the discarded bytes belong neither to the current field nor to the next one. This feature is often used to avoid analyzing the same bytes of the packet dump twice. More details are given in the example below.

Example: a simple tokenended field

This example defines a field that is terminated by a carriage return / line feed sequence, and another that is terminated by a comma.

<protocol name="Example">
  <format>

    <fields>
      <!-- This field is terminated by a CR/LF string -->
      <field type="tokenended" name="field1" endtoken="\x0D\x0A"/>
      <!-- This field is terminated by a comma -->
      <field type="tokenended" name="field2" endtoken=","/>
    </fields>

  </format>
</protocol>

Example: a more complex tokenended field

In this new example, we define a field that is terminated by the \r\n characters, unless they are followed by a horizontal tab or a space. For instance, this is the format of the header fields within the email envelope, in which fields can be folded and unfolded (as per RFC 2822).

<protocol name="Email">
  <format>
    <fields>
      ...
      <field type="tokenended" name="emailheader" endregex="\r\n[^\t ]" endoffset="$token_fieldlen + $token_endtlen - 1"/>
      ...
    </fields>

  </format>
</protocol>

In this example the endoffset attribute is used: it tells the NetPDL engine that the field does not end when the ending signature terminates; instead, it ends at an offset equal to the given expression. For instance, let us suppose the following packet dump:

from: <from@domainfrom.com>\r\nto: <to@domainto.com>

The definition in the previous example allows the extraction of the “from” part from the packet dump. However, the ending regular expression is "\r\n[^\t ]", which also selects the first character of the “to” part. This is not what we want, because the first character after \r\n must only be checked and must not be assigned to the current field. The endoffset attribute solves the problem: it states that the field ends at an offset equal to the “pure” field size (without beginning/ending tokens) plus the size of the ending token minus one. The offset defined by the endoffset attribute is relative to the start of the current field, not to the whole packet dump (in this case $token_fieldlen is equal to 27 and $token_endtlen is equal to 3, so the field ends at offset 29).

For more details about the variables used in this example, please refer to the NetPDL variables section.
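The arithmetic behind the endoffset expression can be reproduced with a short Python sketch. It assumes that the ending expression is CR LF followed by one character that is neither a tab nor a space, as described in the prose of this example; the local names token_fieldlen and token_endtlen simply mirror the NetPDL variables, and the function name is hypothetical.

```python
import re

# Assumption: CR LF followed by any byte that is not a tab or a space.
END_REGEX = re.compile(rb"\r\n[^\t ]")


def tokenended_with_endoffset(buf: bytes):
    """Return (token_fieldlen, token_endtlen, field bytes) for the first header."""
    match = END_REGEX.search(buf)
    if match is None:
        raise ValueError("ending expression not found")
    token_fieldlen = match.start()               # bytes before the ending token
    token_endtlen = match.end() - match.start()  # bytes matched by the ending token
    end_offset = token_fieldlen + token_endtlen - 1
    return token_fieldlen, token_endtlen, buf[:end_offset]


if __name__ == "__main__":
    dump = b"from: <from@domainfrom.com>\r\nto: <to@domainto.com>"
    print(tokenended_with_endoffset(dump))
```

With the packet dump shown above, the field is 27 bytes long, the ending token matches 3 bytes, and the field ends at offset 29, so the trailing \r\n belongs to the field while the first character of the “to” part does not.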

Example: a more complex tokenended field with discarded bytes

The PORT command is used by the FTP protocol to dynamically negotiate the port associated with a data transfer. An example of a PORT command (as it appears in the packet dump) is the following:

PORT 130,192,226,140,18,255

which corresponds to port 4863 (which is 18*256 + 255) on host 130.192.226.140.

The following fragment can be used to describe this record:

<field type="line" name="portline">

  <field type="tokenended" name="command" endtoken=" " endoffset="$token_fieldlen" enddiscard="1" />
  <field type="tokenended" name="host1" endtoken="," endoffset="$token_fieldlen" enddiscard="1" />
  <field type="tokenended" name="host2" endtoken="," endoffset="$token_fieldlen" enddiscard="1" />
  <field type="tokenended" name="host3" endtoken="," endoffset="$token_fieldlen" enddiscard="1" />
  <field type="tokenended" name="host4" endtoken="," endoffset="$token_fieldlen" enddiscard="1" />
  <field type="tokenended" name="port1" endtoken="," endoffset="$token_fieldlen" enddiscard="1" />

  <field type="tokenended" name="port2" endtoken="\x0D" endoffset="$token_fieldlen" />
</field>

The first field (type="line") defines the format of the entire line. Inside this line, other fields are defined, each spanning only a portion of the line. The first nested field contains the command (i.e. PORT) and is terminated by a space. However, the space (i.e. the ending token) is not part of the command itself, therefore the ending offset of that field is equal to the field length (endoffset="$token_fieldlen"), not including the length of the ending token. Moreover, the space is not part of the next field either, which contains the first byte of the host address: hence the enddiscard="1" attribute, which tells the NetPDL engine that the byte following the ending offset must be discarded.
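The combined effect of endoffset="$token_fieldlen" and enddiscard="1" can be sketched in Python as follows. This is an illustrative model of the fragment above, not engine code: the helper names split_tokenended and parse_port_line are hypothetical, and the sketch assumes single-byte tokens as in the example.

```python
def split_tokenended(buf: bytes, pos: int, endtoken: bytes, discard: int):
    """Return (field, next_pos): the field stops just before 'endtoken'
    (endoffset = $token_fieldlen), then 'discard' bytes are skipped."""
    end = buf.index(endtoken, pos)  # raises ValueError if the token is missing
    return buf[pos:end], end + discard


def parse_port_line(line: bytes):
    """Split an FTP PORT line into (command, host, port)."""
    pos = 0
    command, pos = split_tokenended(line, pos, b" ", 1)
    numbers = []
    for _ in range(5):  # host1..host4 and port1 are terminated by a comma
        token, pos = split_tokenended(line, pos, b",", 1)
        numbers.append(int(token.decode("ascii")))
    port2, pos = split_tokenended(line, pos, b"\r", 0)  # port2 has no enddiscard
    numbers.append(int(port2.decode("ascii")))
    host = ".".join(str(n) for n in numbers[:4])
    port = numbers[4] * 256 + numbers[5]
    return command.decode("ascii"), host, port


if __name__ == "__main__":
    print(parse_port_line(b"PORT 130,192,226,140,18,255\r\n"))
```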

A note about regular expressions in tokenended and tokenwrapped fields

Considering (for simplicity) a tokenended field, an endregex="[\t\n]" corresponds to a field made up of any string terminated by either \t or \n. For instance, the whole field can be described by the following regular expression:

.*?[\t\n]

where the question mark is used in order to define a non-greedy regular expression.

The tokenended field (and the same applies to tokenwrapped) requires defining only the ending regular expression instead of the regular expression for the entire field. This is because the ending regular expression is usually clearer, and because it allows distinguishing the characters that belong to the signature (e.g. the ending regular expression) from the ones that belong to the rest of the field; the corresponding lengths can then be used to customize field offsets through the $token_fieldlen and $token_endtlen NetPDL variables.

Fields that are delimited by two tokens: the tokenwrapped type

The tokenwrapped type is used to define a field that is delimited by two tokens, one at the beginning and one at the end. In addition to the standard attributes of the <field> element, in case of tokenwrapped fields the following attributes are supported:

Attribute | Description
begintoken (optional) | It contains the token that defines the beginning of this field. The token must be a string (e.g. "\x0D\x0A"), written according to the same rules (and limitations) defined for NetPDL String Operands, with the exception that the string must not be delimited by the ' character. For instance, the value shown means that the field begins when two consecutive bytes with values 0x0D and 0x0A are found in the packet data. Note: either beginregex or begintoken must be present in the attribute list.
beginregex (optional) | It contains the token, in the form of a regular expression, that defines the beginning of this field. The token must be a string containing a regular expression (e.g. "\x0D\x0A[^\x09\x20]"), written according to the same rules (and limitations) defined for NetPDL String Operands, with the exception that the string must not be delimited by the ' character. For instance, the value shown means that the field begins when two consecutive bytes with values 0x0D and 0x0A are found in the packet data, and these bytes are not followed by \x09 (i.e. the horizontal tab) or \x20 (i.e. the space). Note: either beginregex or begintoken must be present in the attribute list.
endtoken (optional) | It contains the token that defines the end of this field. The format follows the same rules as the begintoken attribute. Note: either endregex or endtoken must be present in the attribute list.
endregex (optional) | It contains the token, in the form of a regular expression, that defines the end of this field. The format follows the same rules as the beginregex attribute. Note: either endregex or endtoken must be present in the attribute list.
beginoffset (optional) | It defines the offset (computed from the starting offset of the entire field, including the starting token) that must be considered for beginning this field. Usually, the field begins where the beginning token (or the beginning regular expression) is located, and the field length includes the length of the beginning token. Sometimes it may be useful to start this field later: this attribute can be used for this. This attribute is a generic NetPDL expression, although it often makes use of the $token_begintlen, $token_fieldlen and $token_endtlen NetPDL variables.
endoffset (optional) | It defines the offset (computed from the starting offset of the entire field, including the starting token) that must be considered for terminating this field. It follows the same rules as the beginoffset attribute.

This example defines a field called field1, which is delimited by two carriage return / line feed tokens, one at the beginning and one at the end of the field.

<protocol name="Example">
  <format>
    <fields>
      <!-- This field is delimited by a couple of CR/LF string -->

      <field type="tokenwrapped" name="field1" begintoken="\x0D\x0A" endtoken="\x0D\x0A"/>
    </fields>
  </format>
</protocol>

Note: after processing, the length assigned to the field includes also the length of the two token delimiters unless the beginoffset and endoffset are defined.

Note: in case the begintoken is used, this field MUST begin at the current offset. If the data at the current offset does not match the begintoken string, the field processing is aborted and processing continues with the next field within the packet format. In case the beginregex is used, the field can start at an offset larger than the current offset, depending on whether the defined regular expression (e.g. "[^:]*") allows discarding data at the beginning of the string.
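The two notes above (the field length includes both delimiters, and a begintoken must match exactly at the current offset) can be sketched in Python as follows. The function name match_tokenwrapped is hypothetical, and returning None when the ending token is missing is an assumption of this sketch: the specification only states what happens when the begintoken does not match.

```python
def match_tokenwrapped(buf: bytes, pos: int, begintoken: bytes, endtoken: bytes):
    """Return the bytes of a tokenwrapped field starting at 'pos', delimiters
    included, or None if the field does not match."""
    # With begintoken, the field MUST begin at the current offset.
    if not buf.startswith(begintoken, pos):
        return None
    end = buf.find(endtoken, pos + len(begintoken))
    if end < 0:
        return None  # assumption: a missing ending token is treated as no match
    return buf[pos:end + len(endtoken)]


if __name__ == "__main__":
    data = b"\r\nHELLO\r\nrest"
    print(match_tokenwrapped(data, 0, b"\r\n", b"\r\n"))
```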

ASCII strings with carriage return delimiter: the line type

A line type is a particular case of a tokenended type in which the token is equal to a “new-line” string. This type has been defined because of the large presence of text lines in higher level protocols, like HTTP, SMTP and many others, and the different ways the 'new line' string is handled by different operating systems (e.g. UNIX uses a '\n' character, while DOS/Windows uses the '\r\n' string).

This type of field does not have any new attribute in addition to the ones defined for the <field> element.

The following code is equivalent to the first field of the simple tokenended example above, but it makes use of the line type.

<protocol name="Example">

  <format>
    <fields>
      <field type="line" name="string"/>
    </fields>
  </format>
</protocol>
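The way a line field can cope with both '\n' and '\r\n' terminators can be sketched in Python as follows. The function name read_line and the choice of returning the content and the terminator separately are assumptions made for illustration; they do not describe how a NetPDL engine reports the field.

```python
def read_line(buf: bytes, pos: int = 0):
    """Return (content, terminator, next_pos) for the line starting at 'pos'.

    Both the UNIX '\n' and the DOS/Windows '\r\n' terminators are accepted.
    """
    lf = buf.find(b"\n", pos)
    if lf < 0:
        # No terminator: the line extends to the end of the available data.
        return buf[pos:], b"", len(buf)
    if lf > pos and buf[lf - 1:lf] == b"\r":
        return buf[pos:lf - 1], b"\r\n", lf + 1
    return buf[pos:lf], b"\n", lf + 1


if __name__ == "__main__":
    data = b"GET / HTTP/1.1\r\nHost: x\n"
    first = read_line(data)
    print(first)
    print(read_line(data, first[2]))
```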

Fields that align data to an N-bytes boundary: the padding type

The padding type is used to realign the PDU to either a 16-bit or a 32-bit boundary. It is often used after a variable-length element (e.g. variable, tokenended or line). In addition to the standard attributes of the <field> element, in case of padding fields one more attribute is required:

Attribute | Description
align (required) | It specifies the type of the alignment, in number of bytes. For instance, a value of "4" is used to realign the protocol headers to the next 32-bit boundary, while "2" is used to realign to the next 16-bit boundary. Allowed values are 2 and 4.

This example defines a padding field that realigns the header to a 32-bit boundary after a variable-length payload.

<protocol name="Example">
  <format>
    <fields>
      <field type="fixed" name="length" size="2"/>
      <field type="variable" name="payload" expr="length"/>
      <field type="padding" name="pad" align="4"/>
    </fields>
  </format>
</protocol>
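The size of a padding field follows from simple modular arithmetic, as in the Python sketch below. It assumes that the alignment is computed on the offset measured from the beginning of the current protocol header, which this specification does not state explicitly; the function name padding_size is hypothetical.

```python
def padding_size(offset: int, align: int) -> int:
    """Return the number of padding bytes needed to reach the next 'align' boundary."""
    if align not in (2, 4):
        raise ValueError("allowed values for align are 2 and 4")
    # Assumption: 'offset' is counted from the start of the protocol header.
    return (-offset) % align


if __name__ == "__main__":
    # A 2-byte length field followed by a 5-byte payload ends at offset 7.
    print(padding_size(2 + 5, 4))
```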

Fields that cannot be processed using the NetPDL specification: the plugin type

The plugin type is used to define a field that cannot be described with other NetPDL primitives. In addition to the standard attributes of the <field> element, in case of plugin fields one more attribute is required:

Attribute | Description
plugin (required) | It defines the name of the plugin that has to be used to process the current field. This plugin corresponds to a piece of native code that must be implemented in the NetPDL engine. Please note that the use of plugins leads to non-portable NetPDL descriptions, because such a description can be used only by NetPDL engines which implement that plugin natively.

This example defines a field that has to be processed through a plugin.

<protocol name="Example">
  <format>

    <fields>
      <field type="plugin" name="length" plugin="mynativeplugin"/>
    </fields>
  </format>
</protocol>

Nested Fields

In general, the <field> element can contain any other element that is allowed as a child of the <fields> element, i.e. all the Protocol Fields, the Additional Format Elements and the Conditional Elements.

In this case, any processing related to a sub-field cannot extend beyond the parent field. For instance, if the size of the parent field is 10 bytes, the sum of the sizes of the subfields cannot be larger than 10 bytes. Apart from this constraint, subfields are processed in the same way as top-level fields.

Example: formatting the to: field of an e-mail

The email envelope, within an SMTP packet, contains a to: field that can include multiple recipients. A simple (and not complete) description that allows splitting each recipient is the following:

<field type="line" name="mailto" longname="To">
  <loop type="while" expr="1">

    <field type="tokenended" name="recipient" longname="Destination address" endtoken=",">
      <field type="tokenended" name="alias" longname="Email alias" endtoken="\x3C"/>
      <field type="tokenwrapped" name="email" longname="Email address" begintoken="\x3C" endtoken="\x3E"/>
    </field>
  </loop>
</field>

In this example, a line field is further split in a set of tokens (whose separator is a comma), and each token is split between the alias and the email address. The <loop> element will terminate when no additional data is available for processing.

Additional elements allowed in the <fields> section

Some additional elements have been defined for taking into account some additional needs. However, these do not aim at describing protocol fields directly; instead, they provide support for describing fields that can be found under special conditions (e.g. repeated fields).

Element | Description
<block> | It contains the definition of some group of fields (e.g. optional headers) that can be present within the protocol format.
<includeblk> | It includes a block within the current position in the protocol headers.
<loop> | It defines a set of fields that can be repeated several times.
<loopctrl> | It defines an event that can modify the standard processing of a loop.

Note: although the <loop> and <loopctrl> elements look like conditional elements, they are allowed only within the protocol format section. For this reason they are presented here and not in the Conditional Elements section.

Modularizing the definition of the current protocol: the <block> element

A <block> is an object that looks very similar to a <fields> element. It contains the definitions required to describe a given set of fields belonging to the current protocol. This can be, for example, a protocol option (e.g. the IPv4 Record Route Option) or an extension header (e.g. the IPv6 Hop-by-Hop Option), which can be seen as blocks of code that can be placed outside the main block that defines the protocol header in order to improve readability.

There is no golden rule about using blocks. However, here are some suggestions:

  • if your protocol is made of a first part (named A) that is always present and a second part that can have different formats (named B, C and D), then a <block> is a good choice. You can implement 'A' in the usual way within the <fields> section, while 'B', 'C' and 'D' can be implemented as blocks. The code becomes more readable if options are defined within smaller <block> tags. In this case, the <block> might be coupled with some Conditional Elements, which are used to determine if (and how) the option is present in the current protocol.
  • if your protocol has a well-defined section that can be repeated several times in different places (for example a group of IP addresses), then you can implement that part as a block and include this set of fields every time it is needed by means of the <includeblk> element.
  • if your protocol is made up of a first part which is always present, and a second optional part, you can use the <block> to group the list of fields related to the optional part.

The NetPDL is not a one-choice language: you can implement the same protocol in several different ways. The ones above are just suggestions.

A <block> does not influence protocol processing: all the elements within a block are processed in the same way with or without a block. The <block> element simply provides a way to organize the protocol structure more clearly for the (human) reader. Please note that a <block> is scope-limited to the current protocol.

The <block> element supports the following attributes in addition to the Standard ones:

Attribute | Description
name (required) | A unique name that identifies the object within its scope.
longname (optional) | It keeps a 'human' name and it may be used when the object has to be shown.

The <block> element supports several child elements: <field>, Additional Format Elements and Conditional Elements.

Linking the fields contained in a <block>: the <includeblk> element

The <includeblk> provides a way to include a set of fields (the ones defined in the <block>) in the current list of fields, like the #include directive of the C/C++ languages. The <includeblk> does not have any child element and supports the following attributes in addition to the Standard ones:

Attribute | Description
name (required) | A unique name that corresponds to the name of the block that has to be included. The included block must be defined within the current protocol.

An example can be the following:

<protocol name="Example">
  <format>

    <fields>
      ...
      <includeblk name="TLVoption"/>
      <includeblk name="SentinelField"/>
    </fields>

    <!-- Now we can define the options -->

    <block name="TLVoption">
      <field type="fixed" name="type" size="1"/>
      <field type="fixed" name="length" size="4"/>
      <field type="variable" name="payload" expr="length"/>
    </block>

    <block name="SentinelField">

      <field type="fixed" name="sentinel" size="1"/>
    </block>
  </format>
</protocol>

In this case, the protocol has two options, called TLVoption and SentinelField, which are processed after all the preceding fields.

Often, however, options are present only under some condition: their number may not be known in advance, their order may vary, and so on. In practice, the <block> and <includeblk> elements are usually coupled with Conditional Elements.
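As a minimal sketch of this coupling, the fragment below includes a block only when a flag in the fixed part of the header is set. The protocol name, the field names (hasext, extlen, extdata), the block name ExtHeader and the flag value are hypothetical and serve only as an illustration. The <if> element is described in the Conditional Elements section, and the use of buf2int() to convert a field reference into a number is an assumption based on the other examples in this specification.

```xml
<protocol name="Example2">
  <format>
    <fields>
      <!-- Fixed part, always present (hypothetical field) -->
      <field type="fixed" name="hasext" longname="Extension flag" size="1"/>

      <!-- The block is included only if the flag is set -->
      <if expr="buf2int(hasext) == 1">
        <if-true>
          <includeblk name="ExtHeader"/>
        </if-true>
      </if>
    </fields>

    <!-- Optional part, defined outside the main list of fields -->
    <block name="ExtHeader" longname="Extension header">
      <field type="fixed" name="extlen" longname="Extension length" size="1"/>
      <field type="variable" name="extdata" longname="Extension data" expr="buf2int(extlen)"/>
    </block>
  </format>
</protocol>
```

Keeping the optional part in its own <block> leaves the <fields> section short, so the condition that governs the option is visible at a glance.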

Defining a set of fields that can be repeated an arbitrary number of times: the <loop> element

In the most common case, a field has a single occurrence in a PDU. A <loop> element is used to describe a field or an ordered collection of fields that have multiple occurrences. A <loop> element has the following attributes in addition to the Standard ones:

Attribute | Value | Description
type (required) | | It defines the type of the loop.
 | size | It is used to specify the size (in bytes) occupied by the iterated fields. The list of fields contained in the <loop> is repeated until their size reaches the given value. If the size is equal to zero, all the fields defined in the <loop> block are ignored.
 | times2repeat | It is used to specify the number of times the list of fields contained in the <loop> block must be repeated. If the number of repetitions is equal to zero, all the fields defined in the <loop> block are ignored.
 | while | It is used to specify a loop in which the condition is evaluated before entering the loop. The list of fields contained in the <loop> block can be repeated zero or more times.
 | do-while | It is used to specify a loop in which the condition is evaluated at the end of the loop. The list of fields contained in the <loop> block can be repeated one or more times.
expr (required) | | This attribute defines an expression that will return the "size" of the loop. The meaning of the value returned by the expression differs according to the type of the loop (defined in the type attribute):
* type="size": it gives the size (in bytes) of the data that has to be processed using the fields contained in the loop
* type="times2repeat": it gives the number of times the loop has to be repeated
* type="while": the expression must return a boolean value; the condition is evaluated before entering the loop. The loop is executed only if the result is true.
* type="do-while": the expression must return a boolean value; the loop is executed, then the expression is evaluated. If the result is true, the loop is repeated again.

The <loop> element supports several child elements: <field>, Additional Format Elements and Conditional Elements (hence nested <loop> elements are allowed). At least one child is required.

A special child is the <missing-packetdata> element (optional), which (if present) MUST BE the first child of the <loop>. This element defines a special branch that has to be executed when the expression cannot be evaluated due to the fact that the packet buffer does not have enough data in it. For more details about this element, please read the <missing-packetdata> section.
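The following fragment is a minimal sketch of a size-bounded loop. It assumes a hypothetical header in which a one-byte field (optlen) gives the total number of bytes occupied by a list of two-byte entries; all field names are illustrative, and the use of buf2int() to turn the field reference into a number is an assumption based on the other examples in this specification.

```xml
<fields>
  <!-- Hypothetical field: total size, in bytes, of the entries that follow -->
  <field type="fixed" name="optlen" longname="Options length" size="1"/>

  <!-- Entries are processed until optlen bytes have been consumed -->
  <loop type="size" expr="buf2int(optlen)">
    <field type="fixed" name="entrytype" longname="Entry type" size="1"/>
    <field type="fixed" name="entryvalue" longname="Entry value" size="1"/>
  </loop>
</fields>
```

If optlen is equal to zero, the fields inside the <loop> are ignored, as stated in the description of the size type.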

Example: processing N times a block of fields

This example presents an extract of the DNS protocol, in which the Question Section (i.e. the names that we want to be resolved into addresses) can be repeated several times. The DNS packet has a field (qstcnt) that keeps the number of questions contained in the packet. The NetPDL definition for the DNS protocol uses a <loop> whose type attribute is times2repeat, so that the loop is repeated qstcnt times. The fields that are processed qstcnt times are contained in the <loop> section.

<protocol name="DNS">
  <format>
    <fields>
      <field type="fixed" name="ID" longname="Identifier" size="2"/>
      <field type="fixed" name="flags" longname="Flags" size="2"/>
      <field type="fixed" name="qstcnt" longname="Question section count" size="2"/>

      <field type="fixed" name="anscnt" longname="Answer section count" size="2"/>
      <field type="fixed" name="autcnt" longname="Authority section count" size="2"/>
      <field type="fixed" name="addcnt" longname="Additional section count" size="2"/>
      <loop type="times2repeat" expr="qstcnt">
        <field type="delimited" name="Qname" longname="Question name" token="00"/>
        <field type="fixed" name="Qtype" longname="Question type" size="2"/>

        <field type="fixed" name="Qclass" longname="Question Class" size="2"/>
      </loop>
      ...
    </fields>
  </format>
</protocol>

Breaking a loop: the <loopctrl> element

Sometimes, the previous elements are still not enough to process a protocol. As in the C/C++ languages, a condition may occur that requires breaking out of a loop (i.e. exiting from the <loop> and continuing with the next element) or restarting it without reaching the end of the instructions contained in the loop. This condition is managed by an additional tag, the <loopctrl> element.

The <loopctrl> element can have the following attributes in addition to the Standard ones:

Attribute | Value | Description
type (required) | | It defines the type of the loopctrl.
 | break | The NetPDL protocol processing engine is forced to terminate the loop, without regard to the condition. It corresponds to the break instruction in the C/C++ languages.
 | continue | The NetPDL protocol processing engine is forced to stop the processing at the present position and restart the processing from the first field of the loop. It corresponds to the continue instruction in the C/C++ languages.

Warning: unlike the equivalent C/C++ instructions, the <loopctrl> element can be used only to control the execution flow within a <loop>.
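As a minimal sketch of the continue type, the fragment below walks through a list of options and skips the remaining fields when a one-byte padding option is found. The field names, the convention that type 0 denotes padding, and the optlen field (assumed to be defined earlier in the same protocol) are all hypothetical.

```xml
<loop type="size" expr="buf2int(optlen)">
  <field type="fixed" name="opttype" longname="Option type" size="1"/>

  <!-- Hypothetical convention: type 0 is a one-byte padding option -->
  <if expr="buf2int(opttype) == 0">
    <if-true>
      <!-- Skip the remaining fields and restart from opttype -->
      <loopctrl type="continue"/>
    </if-true>
  </if>

  <field type="fixed" name="optdatalen" longname="Option data length" size="1"/>
  <field type="variable" name="optdata" longname="Option data" expr="buf2int(optdatalen)"/>
</loop>
```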

Example: breaking the loop

This example presents an extract of the IPv6 protocol, which is made up of a mandatory part (the first 40 bytes) and a set of options. Options can be chained, i.e. there can be several options one after the other, and there is no way to know a priori either the global size of the options or their number.

The simplest way to process IPv6 options is to define a <loop>, which is repeated as long as there are options. Each option has a field named nexthdr that keeps the code of the next option: the expression in the <loop> could check that the next option is one of the allowed ones. If so, there is another option; otherwise the protocol has ended. The loop is repeated if the option code is 43, 44, 51 or 60 (please note that the IPv6 processing has even more options, not reported here for clarity). Here is a snippet of that code:

<loop type="while" expr="(nexthdr == 43) || (nexthdr == 44) || (nexthdr == 51) || (nexthdr == 60)">
  ...
</loop>

Clearly, this code does its job, but it is quite complex. A better solution that makes use of the <loopctrl> element is the following:

<protocol name="IPv6">
  <format>

    <fields>
      <field type="fixed" name="verhlen" size="4">
        <field type="bit" name="ver" longname="Version" mask="0xF0000000"/>
        <field type="bit" name="tos" longname="Type of service" mask="0x0FF00000"/>
        <field type="bit" name="flabel" longname="Flow label" mask="0x000FFFFF"/>
      </field>

      <field type="fixed" name="plen" longname="Payload Length" size="2"/>
      <field type="fixed" name="nexthdr" longname="Next Header" size="1"/>
      <field type="fixed" name="hop" longname="Hop limit" size="1"/>
      <field type="fixed" name="src" longname="Source address" size="16"/>
      <field type="fixed" name="dst" longname="Destination address" size="16"/>

      <loop type="while" expr="1" comment="loop until interrupted">

        <switch expr="nexthdr">
          <case value="43"> <includeblk name="RH"/>  </case>
          <case value="44"> <includeblk name="FH"/>  </case>

          <case value="51"> <includeblk name="AH"/>  </case>
          <case value="60"> <includeblk name="DOH"/> </case>
          <default>

            <!-- Default branch -->
            <loopctrl type="break"/>
          </default>
        </switch>
      </loop>
    </fields>

    ...
  </format>
</protocol>

The <loop> evaluates a dummy condition, which is always true. Control is then passed to a <switch>-<case> element: if a known option is found, it is processed and the loop continues. Otherwise, the default clause is executed, which breaks the loop through the <loopctrl> element.

Conditional elements

Conditional elements are widely used in packet processing: they are required for processing headers, for detecting the correct protocol encapsulation, and for optional features such as printing the protocol structure.

This section defines the following conditional elements:

Element | Description
<if> | It defines a "branch" in the current elaboration process.
<switch> | It defines a "branch" with multiple options in the current elaboration process.

The standard choice method: the <if> element

The <if> element aims at evaluating a condition and at performing some actions only if the condition is matched. It has the following attributes in addition to the Standard ones:

Attribute | Description
expr (required) | It contains the boolean expression that has to be evaluated at run-time in order to select the proper <if> branch: if it is true, the processing jumps to the fields defined in the <if-true> element, otherwise the NetPDL engine jumps to the <if-false> element. If the latter element is missing, it jumps out of the entire <if> branch.

In addition, the <if> element supports the following child nodes:

Element | Description
<if-true> (required) | It defines the list of fields that must be processed if the condition is true. This element does not have attributes; it can have all the child elements that are supported within the current section. For more details about the supported child nodes, please check the section that lists the allowed children in conditional elements. For example, if the <if> is included within a <fields> section, it can support all the elements allowed within a <fields> element.
<if-false> (optional) | It defines the list of fields that must be processed if the condition is false. This element can be omitted if the 'else' condition is not required. For more details about the supported child nodes, please check the section that lists the allowed children in conditional elements.
<missing-packetdata> (optional) | It defines a special branch that has to be executed when the expression cannot be evaluated because the packet buffer does not have enough data in it. For more details about this element, please read the <missing-packetdata> section.

An example of the <if> element can be seen when processing the Ethernet frame:

<protocol name="Ethernet">
  <format>
    <fields>
      <field type="fixed" name="dst" size="6"/>
      <field type="fixed" name="src" size="6"/>

      <!-- Check if the next two bytes are less than or equal to 1500 -->

      <if expr="buf2int($packet[$currentoffset:2]) le 1500">
        <if-true>
          <field type="fixed" name="Length" size="2"/>
        </if-true>
        <if-false>
          <field type="fixed" name="EtherType" size="2"/>

        </if-false>
      </if>

    </fields>
  </format>
</protocol>

The NetPDL engine will process the first two fields of the packet, then it needs to know the value of the next two bytes for further processing. This is done through an <if> element, whose result is used to select which one of the <if-true> and <if-false> branches must be used. In the first case, the next field is Length; otherwise it is EtherType.

Making more compact the code needed for several similar choices: the <switch> - <case> elements

It is often the case that several <if> choices are performed on the same "handle". For these cases the <switch> - <case> elements have been defined in order to make the protocol description cleaner.

Defining the key in a <switch> element

The <switch> element defines the following attributes in addition to the Standard ones:

Attribute | Description
expr (required) | This attribute defines an expression that will return the value to be compared against the <case> elements that follow. The returned value is called key and it can be either an integer or a string, depending on the expression type. The result of the expression is used to determine which is the correct branch. The possible branches are defined in the <case> elements. The most common expression is a reference to a field that has already been encountered in the processing.
casesensitive (optional) | It defines if the matching has to be done in a case-sensitive way (default) or not. This attribute is effective only in case of a string-based match. Its value is yes in case of a case-sensitive match, no otherwise.

A <switch> element has the following child elements:

Element | Description
<case> (required) | This element defines the values to which the key (i.e. the value of the field specified in the previous expression) has to be compared and the type of the comparison that has to be made.
<default> (optional) | This element defines the 'default' choice, in case the key does not match any <case>.

An example can be seen below:

<switch expr="MACsrc[0:1]">
  <case ...> ... </case>
  ...
</switch>

This example makes use of a <switch> element: the expression gets the value of the MACsrc field and it extracts only the first byte of the MAC address. The resulting value will be the key that will be used in the following <case> elements. In this case, the <switch> - <case> elements operate on strings because any field reference (such as MACsrc) is a buffer.

Comparing the key to a different set of conditions: the <case> element

The second part of the <switch>-<case> block is the <case> element, which defines the possible branches. The <case> element has the following attributes (in addition to the Standard ones):

Attribute | Description
value (required) | It keeps the value used to evaluate the expression: the branch is selected if the value matches the key. The value must be a number (e.g. value="10") if the switch contains a mathematical expression; a string (e.g. value="'string'") otherwise. In both cases, numbers and strings must be written according to the same rules (and limitations) defined for NetPDL String and Number Operands.
maxvalue (optional) | It keeps the maximum value allowed for the key and it is used for checking the key against a range of values instead of a single value. If this attribute is present, the <case> is selected if the relationship value <= key <= maxvalue is true. Warning: this attribute can be used only in case of numeric comparison; it cannot be used in case of strings.

For example, the following statement:

<case value="10"> ... </case>

will be selected only if the key produced by evaluating the <switch> expression is equal to 10.

The value attribute supports both numbers and strings and it uses the same syntax defined in NetPDL expressions. Briefly:

Format | Description | Example
[0-9]+ | Decimal number | 10
0x[0-9a-fA-F]+ | Hex number | 0x86DD
0b[0-1]+ | Binary number | 0b00001100
'[^']*' | String | 'abcd', 'endofline\x0D'
#[a-zA-Z0-9_]+ | Protocol reference | #tcp

Please be careful to use the same data type as the one used in the <switch> element. In other words, if the expression defined in the <switch> element returns a string, all the <case> elements must be defined as strings (hex or ascii).

The <case> element contains the list of fields related to the section in which the <case> is placed. For more details about the supported child nodes, please check the section that lists the allowed children in conditional elements. For example, if the <case> is placed within a <fields> section, it can support all the elements allowed within a <fields> element. An empty <case> should not be used.

Warning: only one <case> branch will be selected and their evaluation is order dependent.
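As a minimal sketch of a string-based match, the fragment below compares a text field against quoted string values and disables case sensitivity. The field name method and the matched strings are hypothetical; the key is a string because a field reference is a buffer.

```xml
<!-- "method" is a hypothetical field that holds a text token -->
<switch expr="method" casesensitive="no">
  <case value="'get'">  ... </case>
  <case value="'post'"> ... </case>
  <default> ... </default>
</switch>
```

Since the key is a string, every value is written as a quoted string, and maxvalue cannot be used.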

Comparing the key to a different set of conditions: the <default> element

The <default> element looks like a <case> without any attribute. This element is optional; if present, it is selected if no other <case> elements match against the key.

Limitations

The <switch>-<case> elements are not intended to replace the <if> tag, because only <if> can handle complex expressions. In particular, the <switch>-<case> can be used when:

  • all the possible branches depend on the result of a single expression.
  • only a single <case> can be selected: the <switch>-<case> does not allow for more than one choice. In other words, although there may be more than one matching branch (when maxvalue is used, branches may overlap), only the first one will be selected; the following branches cannot be selected even if the first one proves to be wrong (e.g. due to a post-validation through a <checkproto> element).

Example: processing a protocol whose format is made up of a common part and a set of options for the second part

This situation is quite common among network protocols. An example can be seen in the ICMP protocol, which has a common part made up of three fields and a second part that can have many formats, depending on the value of a field present in the first part.

<protocol name="ICMP">
  <format>
    <fields>
      <field type="fixed" name="type" longname="Type" size="1"/>
      <field type="fixed" name="code" longname="Code" size="1"/>

      <field type="fixed" name="checksum" longname="Checksum" size="2"/>
      <!-- fields switch -->
      <switch expr="buf2int(type)">

        <case value="0">
          <!-- This is an echo reply packet -->
          <field type="fixed" name="identifier" longname="Identifier" size="2"/>

          <field type="fixed" name="seqnumber" longname="Seq. number" size="2"/>
        </case>

        <case value="3">
          <!-- This is a destination unreachable report -->
          <field type="fixed" name="unused" longname="Unused" size="4"/>
        </case>

        <default>
          <!-- Unknown option -->
        </default>
      </switch>
      ...
    </fields>

  </format>
</protocol>

The common part is processed first; then, the <switch>-<case> is used to check the value of the field type (present in the common part of the header) and to determine the format of the second part. The value of the field type (the function buf2int() is used to convert the field reference into a number) is compared against the number 0: if it matches, the fields present in the packet are the ones included in the first <case> tag. Otherwise, the switch continues and the NetPDL engine checks the field type against the next <case> tag. If no <case> matches, the <default> branch is selected.

Each <case> and <default> branch can contain any set of fields; it follows that nested <switch> are possible.

Example: comparing against range of values

This fragment of code presents an example of a branch that compares against ranges of values:

<switch expr="buf2int(type)">
  <case value="126"> ... </case>
  <case value="127"> ... </case>
  <case value="255"> ... </case>

  <case value="128" maxvalue="254"> ... </case>
  <default> ... </default>
</switch>

In this example we execute the third branch if the value of the type field is equal to 255, while we follow the fourth branch for all the values that are >= 128 and <= 254. If no <case> element is suitable, the <default> branch is executed.

Allowed child elements for conditional elements

Conditional elements can be present in several different places within a NetPDL file. Child elements of <if-true>, <if-false> and <case> elements change according to their position within the NetPDL file and can be the following:

Position of the given elements | Child elements
Within a <format> element | All the elements allowed as children of the <fields> element (i.e. all the Protocol Fields and all the Additional Format Elements) and all the Conditional Elements.
Within an <encapsulation> element | All the elements allowed as children of the <encapsulation> element and all the Conditional Elements.

Protocol encapsulation

This section specifies the way a protocol is linked with other protocols, i.e. the protocol encapsulation.

Defining the protocol encapsulation: the <encapsulation> element

NetPDL uses the <encapsulation> element to describe the PDU encapsulation. This element contains a collection of <nextproto>, <nextproto-candidate> and (optionally) Conditional Elements. The <encapsulation> element does not have any attributes in addition to the Standard ones.

Elements allowed inside the <encapsulation> section are the following:

Element | Description
<nextproto> (optional) | It defines the protocol to jump to.
<nextproto-candidate> (optional) | It defines a candidate protocol to jump to.
<if> (optional) | It allows defining conditional constructs in protocol encapsulation.
<switch> (optional) | It allows defining conditional constructs in protocol encapsulation.

Defining the encapsulated protocol: the <nextproto> element

The <nextproto> element is the most general way to specify which protocol is encapsulated in the present one. This includes the name of the next protocol we have to move to. The <nextproto> element supports one attribute in addition to the Standard ones:

Attribute | Description
proto (optional) | This attribute defines an expression that will return a reference to the wanted protocol. Obviously, the NetPDL library must contain a protocol whose name is equal to the one defined in the expression. Please note that references to protocols begin with the # sign, followed by the protocol name. For instance, #http means "the result of this expression is the HTTP protocol".

Example: jumping to the protocol contained within a TokenRing frame

This example shows the portion of NetPDL needed to jump to the next protocol contained in a Token Ring frame:

<protocol name="TokenRing">
  ...
  <encapsulation>
    <nextproto proto="#llc"/>

  </encapsulation>
</protocol>

This example is rather simple because a Token Ring frame always contains an LLC packet. However, some conditions often have to be evaluated before taking the decision. This is the reason for using more complex structures (<if> and <switch>) coupled with the previous one.

Example: jumping to the protocol in the session entry of a TCP session table

This example shows the portion of NetPDL needed to jump to the next protocol for a given TCP session, contained in the TCP session table:

<protocol name="tcp">
  ...
  <encapsulation>
    <if expr="checklookuptable($tcpsessiontable, $ipsrc, $ipdst, $portsrc, $portdst)">
      <if-true>
        <nextproto proto="$tcpsessiontable.nextproto"/>

      </if-true>
    </if>
  </encapsulation>
</protocol>

In this case, the checklookuptable() function checks if the TCP session (identified by the tuple $ipsrc, $ipdst, $portsrc, $portdst) belongs to the $tcpsessiontable lookup table. If so, the <nextproto> jumps to the protocol that is defined in the field nextproto of the lookup table.

Defining the encapsulated protocol upon verification: the <nextproto-candidate> element

Usually, the information about which is the next protocol is included in the current protocol headers. For instance, when examining the TCP header, the SourcePort and DestinationPort fields tell us which is the application-level protocol present in the packet.

However, sometimes this information is not enough. For instance, there are protocols that do not have a well-known port, or other cases in which a protocol is not using its well-known port (e.g. an HTTP packet on port 2000). The <nextproto-candidate> tells the NetPDL engine that a given protocol may follow, but some additional checks are required to determine the correctness of this choice. The <nextproto-candidate> element tells the NetPDL engine that the <verify> section of the candidate protocol must be executed and the decision must be taken only upon verification. The <verify> section contains a piece of code (e.g. conditional expressions) that can be useful to verify if data is compatible with the protocol format itself. For instance, in case of an HTTP packet, it is likely that the data payload will start with one of the following keywords: GET, POST or HTTP.

In case the target protocol does not contain any <verify> section, the NetPDL engine will assume that the target protocol does not satisfy the verification. For more details about protocol verification, please refer to the Executing code for verifying the correctness of a protocol section.

The <nextproto-candidate> element supports the same attribute as the <nextproto> element (i.e. proto), used in exactly the same way.
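As a minimal sketch, the fragment below jumps directly to HTTP when the well-known port is used and proposes HTTP as a candidate in all the other cases. The field name dport is hypothetical, and the <verify> section of the HTTP protocol (not shown here) is assumed to exist.

```xml
<protocol name="tcp">
  ...
  <encapsulation>
    <!-- Well-known port: jump directly ("dport" is a hypothetical field name) -->
    <if expr="buf2int(dport) == 80">
      <if-true>
        <nextproto proto="#http"/>
      </if-true>
    </if>

    <!-- Otherwise HTTP is only a candidate: its <verify> section decides -->
    <nextproto-candidate proto="#http"/>
  </encapsulation>
</protocol>
```

Placing the candidate after the direct jump matters because the statements inside <encapsulation> are examined in order.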

Special cases for protocol encapsulation: default protocol, ethernet padding and conditional elements

In order to select the next protocol, NetPDL permits the usage of <if> and <switch> - <case> elements. The syntax is the one already presented; the allowed children for <if-true>, <if-false> and <case> elements are all the elements allowed within the <encapsulation> section.

If the NetPDL engine is not able to find the next protocol, the <encapsulation> returns without any result. In that case, the default protocol (defined by the defaultproto protocol) is selected if there is still data to be parsed. Conversely, the etherpadding protocol is selected if no data has to be processed, but there is still some Ethernet padding at the end of the packet. These two elements will be shown later.

Warning: The statements are examined in order; that is, the protocol analysis engine will jump to the first suitable protocol encountered.

The $nextproto variable

The result of the search for the protocol encapsulation is stored in the $nextproto variable. This variable can be used in order to check (if needed) which headers follow the current one. For instance, if no suitable protocol is found, this variable will assume the value #defaultproto or #etherpadding (please remember that protocol references, in expressions, begin with the # sign).

Example: retrieving the network-layer protocol contained in an Ethernet frame

This example shows the portion of NetPDL needed to determine the network-level protocol contained in an Ethernet frame:

<protocol name="Ethernet" longname="Ethernet 802.3">
  <format>
    <fields>
      <field type="fixed" name="dst" longname="MAC Destination" size="6"/>

      <field type="fixed" name="src" longname="MAC Source" size="6"/>
      <field type="fixed" name="ethertype" longname="Ethertype" size="2"/>
    </fields>
  </format>

  <encapsulation>
    <!-- Check if next protocol is IPv4 -->

    <if expr="buf2int(ethertype) == 0x0800">
      <if-true>
        <nextproto proto="#ipv4"/>
      </if-true>
    </if>

    <!-- Check if next protocol is IPv6 -->

    <if expr="buf2int(ethertype) == 0x86DD">
      <if-true>
        <nextproto proto="#ipv6"/>
      </if-true>
    </if>
  </encapsulation>

</protocol>

This portion of code shows the instructions that determine if the next protocol is IPv4 or IPv6. The <if> element checks the value of the ethertype field: if the value is 0x0800 the <if-true> branch is executed (hence, the <nextproto> element points to IPv4); if it is 0x86DD the next protocol is IPv6. In the other cases, no suitable protocol is found and the NetPDL engine jumps to the default one (#defaultproto or #etherpadding).

The same example can be rewritten using <switch> - <case> elements:

<protocol name="Ethernet" longname="Ethernet 802.3">
  <format>
    <fields>

      <field type="fixed" name="dst" longname="MAC Destination" size="6"/>
      <field type="fixed" name="src" longname="MAC Source" size="6"/>
      <field type="fixed" name="ethertype" longname="Ethertype" size="2"/>
    </fields>
  </format>
  <encapsulation>

    <switch expr="buf2int(ethertype)">
      <case value="2048">  <nextproto proto="#ipv4"/> </case>
      <case value="34525"> <nextproto proto="#ipv6"/> </case>

    </switch>
  </encapsulation>
</protocol>

Defining the starting protocol: the startproto protocol

Given the hex dump of a network packet, the first problem is to determine the link layer on which the packet has been captured: the parsing of the hex dump is completely different from one link-layer technology to another.

In order to determine the protocol to start processing with, the startproto protocol is provided. This element is similar to a zero-sized dummy protocol and it does not have any <format> child. For instance, this element could determine the first protocol by checking the status of the run-time variable that contains the link-layer type.

Please note that startproto, defaultproto and etherpadding protocols must always be present in a valid NetPDL file.

Example: defining the startproto protocol

This example shows a startproto protocol that checks the value of the run-time variable holding the link-layer type ($linklayer in the code below): if it is equal to 1, the first bytes of the packet are handled using the syntax of the Ethernet frame; the Token Ring format is used if it is equal to 6, and the FDDI format if it is equal to 10.

<protocol name="startproto">
  <encapsulation>
    <switch expr="$linklayer">
      <case value="1">  <nextproto proto="#ethernet"/> </case>
      <case value="6">  <nextproto proto="#tokenring"/> </case>

      <case value="10"> <nextproto proto="#fddi"/> </case>
    </switch>
  </encapsulation>
</protocol>

Defining a "default" protocol: the defaultproto protocol

The defaultproto protocol offers a way to process the packet when no appropriate protocol is available. This is the case, for example, when the NetPDL library does not contain the description of some esoteric protocol, or when all the protocols have been recognized and the remaining part of the packet is generic application data.

The defaultproto is a protocol that contains only a <format> child and does not have the <encapsulation> element.

The presence of the defaultproto protocol depends on the value of the $packetlength variable, which is not known “a priori”. This variable contains the value of the valid data in the packet, excluding some padding bytes (e.g. ethernet padding) at the end, and it is updated within the NetPDL file. For more details about how this variable is updated, please check a valid NetPDL file.

If the data has been completely processed up to $packetlength, defaultproto is not present in the packet; the #etherpadding protocol, however, may still be present.

Please note that startproto, defaultproto and etherpadding protocols must always be present in a valid NetPDL file.

Example: defining the defaultproto protocol

This example shows a defaultproto protocol: all the data that has not been processed yet (whose size is obtained by subtracting the $currentoffset run-time variable from $packetlength) is assigned to this protocol.

<protocol name="defaultproto">
  <format>

    <fields>
      <field type="variable" name="payload" expr="$packetlength - $currentoffset"/>
    </fields>
  </format>
</protocol>

The "etherpadding" protocol

The etherpadding protocol is a special protocol that is present only when there is some Ethernet padding at the end of the packet. In this case the values of $framelength and $packetlength differ (the former also counts the padding bytes, while the latter does not), and the etherpadding protocol is inserted at the end of the packet to take care of these 'spare' bytes.

It follows that the etherpadding protocol (like defaultproto) is not linked explicitly from within any other protocol in the NetPDL file; it is the responsibility of the NetPDL engine to check the values of the $framelength and $packetlength variables and decide which protocol (defaultproto or etherpadding), if any, is present in the current packet.

Please note that, while the $packetlength variable must be explicitly updated within the NetPDL file, the $framelength variable is under the responsibility of the NetPDL engine and it is updated automatically upon the receipt of a new packet.
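
The decision left to the engine can be sketched as follows. This Python fragment is a minimal model under stated assumptions, not the actual engine code: the function name trailing_protocols is hypothetical, and the only inputs are the three run-time variables described above.

```python
def trailing_protocols(currentoffset: int, packetlength: int, framelength: int):
    """Return (name, size) pairs for the pseudo-protocols that close a packet.

    Hypothetical model: the real engine may apply further checks.
    """
    if not 0 <= currentoffset <= packetlength <= framelength:
        raise ValueError("expected 0 <= currentoffset <= packetlength <= framelength")

    trailing = []

    # Valid data not claimed by any protocol goes to defaultproto.
    unparsed = packetlength - currentoffset
    if unparsed > 0:
        trailing.append(("defaultproto", unparsed))

    # Bytes beyond the valid data are padding and go to etherpadding.
    padding = framelength - packetlength
    if padding > 0:
        trailing.append(("etherpadding", padding))

    return trailing
```

With a 60-byte frame that carries 54 bytes of valid data, all of them already parsed, only etherpadding (6 bytes) is present; if parsing stopped at offset 34, defaultproto (20 bytes) comes first and etherpadding follows.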

Please note that startproto, defaultproto and etherpadding protocols must always be present in a valid NetPDL file.

NetPDL Limitations

Even though NetPDL is designed to be as general as possible, it is not Turing-complete and there are some protocols that cannot be described with this language.

Generally speaking, the most notable limitations of NetPDL are:

  • Repeated fields: if a protocol contains a field that is repeated several times (e.g. within a <loop>), there is no way to specify from within an expression which instance is being referred to. NetPDL always uses the last instance of the selected field.
  • 'Spare bit fields' must be aligned to a byte/short/word boundary.
  • Trailers: fields must be defined starting from the beginning of the packet. For instance, the Ethernet CRC field is not supported by the current NetPDL description (more precisely, it can be described, but it cannot be associated with the Ethernet frame).
  • Variable Length Trailer delimited by a token: NetPDL is not able to describe a trailer if its position is specified by a (preceding) token; the trailer must start at an offset that can be computed from previously defined fields.
  • Run-time negotiated processing algorithm: NetPDL does not support protocols that negotiate the field format at run-time. This could be the case when two entities exchange information about a mask that must be used to delimit the fields.
  • XML-related limitation: NetPDL files editing: it is not easy to edit NetPDL descriptions with a text editor.
  • XML-related limitation: Validating the NetPDL file: some NetPDL choices make validation difficult. In particular, some elements (e.g. <field>) change their format (attributes, supported child elements) according to the context: this keeps the syntax compact and makes the document more readable, but it complicates document validation.
  • Field Validity Checking: the current NetPDL specification does not support field validity checking, e.g. computing a checksum in order to verify whether the value carried in the packet is correct.
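
The first limitation (repeated fields) can be pictured with a small model. The Python sketch below only illustrates the "last instance wins" behaviour described above; the function collect_fields and the field name opt_type are hypothetical, and the sketch does not claim to reflect how a NetPDL engine stores fields internally.

```python
def collect_fields(parsed):
    """Build the name -> value table that expressions would see (hypothetical model)."""
    table = {}
    for name, value in parsed:
        # A later instance of the same field replaces the earlier one,
        # so only the last value remains visible by name.
        table[name] = value
    return table


# Three instances of the same field, e.g. decoded inside a <loop>.
options = [("opt_type", 1), ("opt_type", 3), ("opt_type", 8)]
visible = collect_fields(options)
```

After the loop, an expression that refers to opt_type can only see the value of the third instance; the first two are no longer reachable by name.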

Debugging NetPDL code

Although the special debug() function has been defined primarily for debugging expressions, it can be used to debug NetPDL processing as well.

For instance, the following code:

  <switch expr="debug($packet[$currentoffset:2] le 1500)">
    <default>
    </default>
  </switch>

will print '0' if the expression evaluates to false, or a non-zero number if it evaluates to true.

The debug() function can also be used to print a message when some branch of the NetPDL code is executed. The message can be very simple, e.g. debug('breakpoint1').

Appendix

Link-Layer type codes

The following table defines a unique ID for each link-layer type. It is needed to interpret correctly the value of the run-time variable that holds the link-layer type ($linklayer in the startproto example). For compatibility, these codes are equal to the ones defined in the libpcap / WinPcap libraries.

Link layer    Code
Ethernet      1
Token Ring    6
FDDI          10
 
netpdl/core_specs.txt · Last modified: 2010/07/26 08:26 by fulvio