NetPDL Advanced Primitives

This document presents a set of NetPDL primitives that are aimed mainly at advanced protocol description and verification, including session tracking. Although these primitives are part of the NetPDL specification, they are described in a separate document because of their (relative) complexity.

Executing code associated to a given protocol: the <execute-code> section

Often, the declarative primitives presented in the main NetPDL document are not enough: advanced processing may be required, and some custom code must be used. For instance, we may need to check the correctness of a given protocol (e.g. “this packet is on TCP port 80, but is it really an HTTP packet?”), or we may need to create temporary structures in order to process the protocol correctly.

These actions require executing some custom code that is defined in the <execute-code> section. The code within this section can be activated upon different events, notably at initialization time, before and after processing the protocol, and when the correctness of the protocol has to be checked.

The <execute-code> section can have four types of child elements:

Element | Description
<init> (optional) | It contains the code that has to be executed at the initialization of the NetPDL engine. For instance, it may contain variable declarations.
<before> (optional) | It contains the code that has to be executed before starting the processing for the given protocol. For instance, it may contain some code that may be required to process the protocol itself (e.g. filling in some temporary variables).
<after> (optional) | It contains the code that has to be executed when the processing for the given protocol is terminated. For instance, it may contain some code that may be required to process other protocols (e.g. filling in some temporary variables).
<verify> (optional) | It contains the code that has to be executed in order to check the validity of a given protocol. For instance, if a protocol must have a given value in the 'version' field, this code can check this value.
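
As a rough orientation, the four child elements listed above could be laid out as in the following skeleton. The protocol name is hypothetical, and the order in which the children appear is only an assumption; the comments summarize the descriptions given above.

```xml
<protocol name="myproto">  <!-- hypothetical protocol name -->
  <execute-code>
    <init>
      <!-- executed once, when the NetPDL engine is initialized
           (e.g. variable declarations) -->
    </init>
    <verify>
      <!-- executed to check whether the payload really is "myproto" -->
    </verify>
    <before>
      <!-- executed before the processing of this protocol starts -->
    </before>
    <after>
      <!-- executed when the processing of this protocol is terminated -->
    </after>
  </execute-code>

  <format>
    <fields>
      <!-- field definitions -->
    </fields>
  </format>
</protocol>
```

All four children are optional, so a protocol that only needs verification would keep the <verify> element and drop the others.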

Executing some custom code in the <fields> section

Although all the assignment elements (i.e. <assign-variable>, <assign-lookuptable> and <update-lookuptable>) are allowed also in the <fields> section, users are strongly encouraged not to use these elements there. In fact, these elements should be used in the <execute-code> section whenever possible.

The reason is that the NetPDL engine may stop protocol processing when no additional data is present in the packet dump. Therefore, if some execution elements are present after the last field in the protocol header, these elements may not be executed. Conversely, these elements will always be executed when placed, for example, in the <after> section; therefore, users should place such instructions there.
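
A minimal sketch of this recommendation is shown below: the assignment is kept out of the <fields> section and placed in <after>, so it does not depend on the engine reaching the end of the field list. The protocol name and the $myproto_seen variable are hypothetical, and the variable is assumed to have been declared in an <init> section.

```xml
<protocol name="myproto">  <!-- hypothetical protocol name -->
  <execute-code>
    <after>
      <!-- $myproto_seen is a hypothetical variable, assumed to be
           declared in an <init> section -->
      <assign-variable name="$myproto_seen" value="1"/>
    </after>
  </execute-code>

  <format>
    <fields>
      <field type="line" name="hdrline" longname="Header line" showtemplate="FieldAscii"/>
      <!-- no assignment elements after the last field: they might be
           skipped if the packet ends before the engine reaches them -->
    </fields>
  </format>
</protocol>
```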

The when attribute for elements in the <execute-code> section

All these elements support the when attribute in addition to the Standard ones:

Attribute | Description
when (optional) | It contains an expression that has to be checked for determining if the section has to be executed, depending upon a given condition (e.g. a configuration variable).

The <init>, <before>, <after> and <verify> elements can appear several times within a single <execute-code> section. Each element can be executed (or not) depending on the value of the when expression. For instance, let's analyze the following NetPDL fragment:

<execute-code>
  <before when="$track_L4sessions">
    ...
  </before>
  <before when="$enable_protoverify">
    ...
  </before>
</execute-code>

Depending on the values of the $track_L4sessions and $enable_protoverify variables, both <before> sections, only one of them, or neither may be executed. The when attribute is often used to avoid executing some portions of the code when a given NetPDL feature is not required by the programmer, who can use the NetPDL configuration variables to enable or disable portions of code.

In case the when attribute is not present, that section is always executed.

Executing code at initialization time: the <init> section

This section contains the code that must be executed at initialization time, i.e. when the NetPDL engine is started. Currently, this section supports only the declaration (and initialization) of the variables; therefore only the <variable> and <lookuptable> elements are supported.

Please note that variables and lookup tables have global scope: a variable defined in the <init> section of protocol X is also visible from protocol Y. NetPDL allows defining variables within a protocol only to make clearer which protocol will (probably) use that variable; it does not limit the scope of the variable.
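
The global scope can be pictured with the sketch below, where a variable declared by one protocol is updated by another. The protocol names and the $sessions_seen variable are hypothetical. The attributes of the <variable> element (type, validity, value) are assumptions made for illustration and should be checked against the main NetPDL document.

```xml
<protocol name="protoX">  <!-- hypothetical protocol name -->
  <execute-code>
    <init>
      <!-- Assumed attribute set: only the element name <variable>
           is given in this document -->
      <variable name="$sessions_seen" type="number" validity="static" value="0"/>
    </init>
  </execute-code>
</protocol>

<protocol name="protoY">  <!-- hypothetical protocol name -->
  <execute-code>
    <before>
      <!-- protoY can use the variable even though protoX declared it -->
      <assign-variable name="$sessions_seen" value="$sessions_seen + 1"/>
    </before>
  </execute-code>
</protocol>
```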

The <init> element does not support any attribute in addition to the when attribute and the Standard ones.

Warning. Please be careful when using the when attribute in the <init> section. For instance, if the when attribute contains a variable, this variable must already have been defined in the NetPDL database. In addition, please note that in this case the value of the variable is the one defined in the NetPDL database, because the user can change the default value of configuration variables only after initialization time.

For these reasons, it is probably better to avoid the when attribute in the <init> section.

Executing code before starting the processing of a protocol: the <before> section

This section contains the code that must be executed before starting to process the protocol headers. This code usually changes the value of some NetPDL variables (or lookup tables) so that the protocol headers can be processed.

For instance, let us suppose the following fragment is present in the NetPDL file:

<protocol name="tcp">
  <execute-code>
    <before>
      ...
    </before>
  </execute-code>

  <format>
    <fields>
      ...
    </fields>
  </format>
</protocol>

In this case, the code inside the <before></before> tags must be executed in case the packet contains the TCP header, and before processing the header itself. The <before> section contains some code that is needed in order to be able to process the headers of this protocol.

In the <before> section, the following elements are allowed (also nested):

Element | Description
<if> (optional) | It contains a conditional expression and it can define the <if-true> and <if-false> branches.
<switch> - <case> (optional) | It defines a switch-case section.
<assign-variable> (optional) | It assigns a given value to a standard variable.
<assign-lookuptable> (optional) | It assigns a given value to a field within a record inside a lookup table.
<update-lookuptable> (optional) | It updates a lookup table, by adding/removing an entry in it.
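
As an illustration of nesting these elements, the sketch below uses an <if> with both branches inside a <before> section to set a variable before the header is processed. The $http_is_request variable is hypothetical and is assumed to have been declared in an <init> section; the expression syntax follows the other examples in this document.

```xml
<protocol name="http">
  <execute-code>
    <before>
      <!-- $http_is_request is a hypothetical variable, assumed to be
           declared in an <init> section -->
      <if expr="$packet[$currentoffset : 3] == 'GET'">
        <if-true>
          <assign-variable name="$http_is_request" value="1"/>
        </if-true>
        <if-false>
          <assign-variable name="$http_is_request" value="0"/>
        </if-false>
      </if>
    </before>
  </execute-code>
</protocol>
```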

Executing code after ending the processing of a protocol: the <after> section

The <after> section is almost identical to the <before> section: the only difference is that this code is executed after the protocol headers have been processed.

For supported attributes and child elements, please refer to the <before> section.

Executing code for verifying the correctness of a protocol: the <verify> section

The <verify> section contains some code that can be used to verify if a payload belongs to a given protocol. For instance, in case of an HTTP packet, it is likely that the data payload will start with one of the following keywords: GET, POST or HTTP.

The relationship between the verification and the encapsulation section

The <verify> section is invoked each time a <nextproto-candidate> is encountered in the <encapsulation> section. Upon verification, a candidate protocol can be in the following states:

  • notfound: the verification has returned a negative result. The payload under examination does not contain the selected protocol.
  • found: the verification has returned a positive result. The payload under examination contains the selected protocol.
  • candidate: the verification has returned an intermediate result. The payload under examination seems to contain the selected protocol, but we are not 100% sure that this will be the correct protocol.
  • deferred: the verification has returned an intermediate result. From the information available at the moment, this protocol appears to be the correct one, but some more checks are required in the future in order to have a better decision.

In particular, the candidate and deferred results require some more explanation. For the former, we can cite the Kazaa protocol, which is carried in an HTTP packet with some special headers (x-kazaa:). Since a Kazaa packet is, in fact, an HTTP packet, the <verify> section of HTTP should not return 'found', otherwise we would not be able to detect the Kazaa protocol correctly. The solution is to let the HTTP protocol return 'candidate', which means “this can well be an HTTP packet, but if something even more appropriate appears, let's pick that protocol”. In this case, if the Kazaa verification section returns 'found', the next protocol will be Kazaa; otherwise the HTTP dissector will be chosen.

Conversely, a case that shows the deferred result is the RTP protocol. An RTP packet does not rely on standard port numbers, and a possible heuristic for detecting RTP packets is that two consecutive packets must have the same value in the 'SSRC' field. The first packet can trigger the insertion of the session tuple (IP source/destination, UDP port source/destination) and the SSRC field, while the second can check whether its SSRC field is equal to the one stored in the session table. According to this mechanism, the first packet will return a verification result equal to 'deferred', while the second packet will return a positive result.
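
The candidate case described for HTTP and Kazaa could be sketched as follows. Both regular expressions are simplified, illustrative patterns rather than real signatures, and the kazaa protocol name is an assumption. The $protoverify_result variable, which carries the result, is described in the syntax section of this document.

```xml
<protocol name="http">
  <execute-code>
    <verify>
      <!-- Illustrative pattern only: HTTP returns "candidate" so that a
           more specific protocol can still win -->
      <if expr="hasstring($packet[$currentoffset:0], 'http/1\.[01]', 0)">
        <if-true>
          <assign-variable name="$protoverify_result" value="%CANDIDATE"/>
        </if-true>
      </if>
    </verify>
  </execute-code>
</protocol>

<protocol name="kazaa">  <!-- hypothetical protocol name -->
  <execute-code>
    <verify>
      <!-- Illustrative pattern only: the special header identifies Kazaa -->
      <if expr="hasstring($packet[$currentoffset:0], 'x-kazaa:', 0)">
        <if-true>
          <assign-variable name="$protoverify_result" value="%FOUND"/>
        </if-true>
      </if>
    </verify>
  </execute-code>
</protocol>
```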

In case the <encapsulation> section contains multiple <nextproto> and <nextproto-candidate> entries, processing will be the following:

  • when a <nextproto> entry is encountered, the NetPDL engine will return from the <encapsulation> section because the result of this element is not subject to verification;
  • when a <nextproto-candidate> entry that returns 'found' is encountered, the NetPDL engine will return from the <encapsulation> section because the result of this element is considered valid;
  • when a <nextproto-candidate> entry that returns either 'candidate' or 'deferred' is encountered, the NetPDL engine will continue processing the following elements in the <encapsulation> section, looking for a “positive” result. In case there are no protocols returning 'found' (or no <nextproto> are encountered), the NetPDL engine will return the first 'candidate' or 'deferred' protocol encountered.
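
As a rough illustration of these rules, an <encapsulation> section with two candidates might look like the sketch below. The proto attribute and the "#name" reference syntax are assumptions (this document gives only the element names), and the kazaa protocol name is hypothetical.

```xml
<protocol name="tcp">
  <encapsulation>
    <!-- Assumed attribute syntax; see the main NetPDL document -->

    <!-- If HTTP verification returns "candidate", the engine keeps
         scanning the entries that follow -->
    <nextproto-candidate proto="#http"/>

    <!-- If Kazaa verification returns "found", Kazaa is selected;
         otherwise the first candidate (HTTP) is returned -->
    <nextproto-candidate proto="#kazaa"/>
  </encapsulation>
</protocol>
```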

Please note that the checks defined in the <verify> section must be done before entering the protocol itself; protocol processing will therefore start only if the check has been satisfied. The <verify> conditions must be checked when examining the <nextproto-candidate> element: before jumping to the protocol named in that element, the NetPDL engine must check the existence (and, if it exists, the result) of the <verify> element. In case the <verify> element returns 'notfound', the NetPDL engine should select another candidate protocol as the next one in the encapsulation stack.

Syntax for the <verify> section

The <verify> section looks like its sibling code-related sections: it supports a when attribute (which defines when the section has to be executed), and it supports the <if>, <switch> - <case>, <assign-variable>, <assign-lookuptable> and <update-lookuptable> elements. In addition, it requires that the code assign the proper value to the $protoverify_result variable:

<protocol name="http">
  <execute-code>
    <verify>
      <if expr="hasstring($packet[$currentoffset:0], 'http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9]|post [\x09-\x0d -~]* http/[01]\.[019]', 0)">
        <if-true>
          <assign-variable name="$protoverify_result" value="%FOUND"/>
        </if-true>
      </if>
    </verify>
  </execute-code>
</protocol>

Values allowed for the $protoverify_result variable are %NOTFOUND, %FOUND, %CANDIDATE and %DEFERRED, as presented before, which are aliases for the values 0, 1, 2 and 3 respectively.

Warnings

  • Please note that protocol verification is rather CPU-consuming.
  • Please note that a verification string may not be present in every packet: in order for this feature to work properly, the NetPDL code should be organized in such a way that only the interesting packets (i.e. the ones that contain the signature) are checked. This depends on the code present in the <encapsulation> section, since the <verify> element is executed only when a <nextproto-candidate> is encountered.

Updating status variables in the <verify> section

The <verify> section supports all the elements allowed in the <before> and <after> sections. However, putting the same piece of code in the <verify> section or in the <before> section does not have the same effect: the <before> section is executed only if the packet contains that protocol, while the <verify> section is executed every time we have to check whether the packet contains this protocol.

The difference is clearer if we consider a protocol P1 whose verification result is %CANDIDATE. In this case, the NetPDL engine may find a better protocol (let's say P2): if some code (e.g. the code that updates the session table with the newly identified session) is placed in the P1 verification section, it will be executed even when the winning protocol is P2.

Therefore, please be careful about where you place the code; in general, update status variables (e.g. session tables) in the <before> section and avoid doing so in the <verify> section.
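
A minimal sketch of this split is shown below: the <verify> section only sets the verification result, while the status update lives in <before>, which runs only once HTTP has actually been selected. The $http_packets counter is hypothetical and assumed to be declared in an <init> section, and the regular expression is a simplified, illustrative pattern.

```xml
<protocol name="http">
  <execute-code>
    <verify>
      <!-- Verification only: no status variables are touched here,
           because this code also runs when another protocol wins -->
      <if expr="hasstring($packet[$currentoffset:0], 'http/1\.[01]', 0)">
        <if-true>
          <assign-variable name="$protoverify_result" value="%CANDIDATE"/>
        </if-true>
      </if>
    </verify>
    <before>
      <!-- $http_packets is a hypothetical counter, assumed to be
           declared in an <init> section -->
      <assign-variable name="$http_packets" value="$http_packets + 1"/>
    </before>
  </execute-code>
</protocol>
```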

Example: verify if a packet belongs to the HTTP protocol

It happens quite often that packets on TCP port 80 are not HTTP packets, because other applications use this port to get around firewall rules. The following code can help verify the correctness of the HTTP protocol:

<protocol name="http">
  <execute-code>
    <verify>
      <if expr="hasstring($packet[$currentoffset:0], 'http/(0\.9|1\.0|1\.1) [1-5][0-9][0-9]|post [\x09-\x0d -~]* http/[01]\.[019]', 0)">
        <if-true>
          <assign-variable name="$protoverify_result" value="%FOUND"/>
        </if-true>
      </if>
    </verify>
  </execute-code>
  <format>
    <fields>
      <field type="line" name="hdrline" longname="Header line" showtemplate="FieldAscii"/>
      <!-- Other fields follow -->
    </fields>
  </format>
</protocol>

In this example the <if> element checks if the packet contains a given string (under the form of a regular expression) that corresponds to an HTTP session. Please note that this string is not present in every packet, hence it works only when a <nextproto-candidate> element is encountered.

Writing application-layer dissectors

NetPDL was not originally intended for writing application-layer dissectors; its original purpose was limited to dissectors up to layer 4. Although NetPDL has been extended to accomplish this task, there are still some limitations that cannot be avoided. One of the main NetPDL characteristics is its 'packet-based' nature: a NetPDL engine stops its processing when the packet terminates (irrespective of the fact that the application-layer payload may continue in the next packet).

One problem that requires particular care is the use of conditional elements inside the dissector, which is explained below.

Conditional expressions and packet boundaries

Sometimes, packet processing requires some kind of conditional element in order to decide how to proceed. Let us consider, for example, the following fragment:

<protocol name="http">
  <format>
    <fields>
      <!-- Check if this packet contains a header -->
      <if expr="($packet[$currentoffset : 3] == 'GET') or ($packet[$currentoffset : 4] == 'POST')">
        <if-true>
          <field type="line" name="cmdline">
            <field type="tokenended" name="method" endtoken=" "/>
            <field type="tokenended" name="url" endtoken=" "/>
            <field type="line" name="version"/>
          </field>
        </if-true>
        <if-false>
          <field type="line" name="statusline">
            <field type="tokenended" name="version" endtoken=" "/>
            <field type="tokenended" name="statuscode" endtoken=" "/>
            <field type="line" name="reasonphrase"/>
          </field>
        </if-false>
      </if>
      ...
    </fields>
  </format>
</protocol>

The problem appears when an expression (e.g. $packet[$currentoffset : 3]) tries to access an invalid portion of the packet (e.g. outside the packet boundary). The NetPDL engine has little choice but to abort the processing and return to the caller, forcing the application-layer processing to terminate. As a result, some bytes belonging to the application layer will not be associated with that dissector and will be discarded instead.

This problem does not have a simple solution because nobody knows, a priori, if the application-layer message fits within the packet or is split between different packets. The only option we have is to avoid, whenever possible, this kind of processing (i.e. application-layer processing), which means avoiding <if> tags in application dissectors.
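
One way to follow this advice, sketched under the assumption that a generic first line is acceptable for the application at hand, is to describe the first line with a single field instead of branching on its content. The firstline field name is hypothetical.

```xml
<protocol name="http">
  <format>
    <fields>
      <!-- A single generic line replaces the separate command-line and
           status-line branches, so no <if> has to read the packet
           buffer before the field is decoded -->
      <field type="line" name="firstline"/>
      <!-- Other fields follow -->
    </fields>
  </format>
</protocol>
```

The trade-off is a less detailed decoding: the method, URL and status code are no longer exposed as separate fields.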

Conversely, the problem may have a better solution when some field is truncated: the NetPDL engine should assign every byte (from the beginning of the field to the end of the packet) to the selected field, even if the field is truncated and continues in the next packet. However, please remember that you may experience some trouble in this case as well, since the remainder of the field, which appears in the next packet, cannot be assigned to any field.

The <missing-packetdata> element

In order to solve the previous problem, the <missing-packetdata> element has been defined. This element can be present within <if> and <loop> elements, and it defines a special branch that has to be executed when the conditional expression fails because the packet does not contain enough data to evaluate the expression itself. For instance, let us consider the following example:

<protocol name="http">
  <format>
    <fields>
      <!-- Check if this packet contains a header -->
      <if expr="($packet[$currentoffset : 3] == 'GET') or ($packet[$currentoffset : 4] == 'POST')">
        <if-true>
          <field type="line" name="cmdline">
            <field type="tokenended" name="method" endtoken=" "/>
            <field type="tokenended" name="url" endtoken=" "/>
            <field type="line" name="version"/>
          </field>
        </if-true>
        <if-false>
          <field type="line" name="statusline">
            <field type="tokenended" name="version" endtoken=" "/>
            <field type="tokenended" name="statuscode" endtoken=" "/>
            <field type="line" name="reasonphrase"/>
          </field>
        </if-false>
        <missing-packetdata>
          <field type="variable" name="truncdata" expr="$packetlength - $currentoffset"/>
        </missing-packetdata>
      </if>
      ...
    </fields>
  </format>
</protocol>

In this example, the expression in the <if> is evaluated. In case the result is true or false, the processing continues as usual. Vice versa, if the expression cannot be evaluated because the packet buffer does not have enough data in it, then the <missing-packetdata> branch is executed.

Please note that this behavior is not always respected: for example, if a regular expression is used to perform a test (e.g. through an extractstring() function), the current implementation of the NetPDL engine is not able to return the appropriate error code, and the <missing-packetdata> branch will therefore not be executed. However, this element is useful in most cases, since it allows the dissector to “recover” from an error caused by the application expecting data that is missing because it is split over different packets.

Please note also that this element does not address the case in which the packet is truncated and one field appears to have only a portion of the data it should have. This element is useful only if an expression fails and returns the 'not enough data in the packet buffer' error code.

 
netpdl/advanced.txt · Last modified: 2010/03/31 14:57 by fulvio