Draft - New NetPDL Extensions (experimental proposal)
Defining the protocol format
Defining the structure of the protocol headers: the <fields> element
This element is a container for all the structures that define the header of the protocol. The protocol header starts with the first <field> or <cfield> element contained in the <fields> section.
This element does not have any attribute. It supports several child elements: <field> (I), <field> (II), <cfield>, Additional format elements, Conditional elements, and Format-based conditional elements.
Simple and complex fields
In NetPDL, simple fields are the ones in which the field value spans the entire length of the field itself. For instance, a binary field containing an IP address, whose size is 4 bytes, is a simple field.
Simple fields, i.e. the ones defined by the <field> element, mainly target simple binary protocols. However, application-layer protocols need more complex types of fields, which have at least one of the following two characteristics:
- The content of the field (i.e., the part of the data we are interested in) does not match the binary footprint of the field itself. This is the case, for example, of “line” fields, which are delimited by a set of CRLF characters. However, the content of the field (i.e., the portion of the field that contains real data) does not include these “marker” characters; in other words, the ending token represents only a delimiter but does not transport any useful information with respect to the field value.
- Some fields are naturally structured, e.g., TLV or ASN.1 fields. Therefore, the definition of a complex field may automatically trigger the creation of a set of “default” subfields, such as the parts Type, Length and Value of a TLV field.
A field type that has at least one of these two characteristics is defined through the “complex field” (<cfield>) element.
Simple protocol fields: the <field> element
Simple fields, i.e. fields whose length matches the information they convey, are defined through the <field> element.
The most important attribute for a <field> element is the type attribute, which defines how the field length has to be calculated. tokenended and tokenwrapped values are now deprecated, while new values for this attribute are the following:
| type value | Description |
|---|---|
pattern | It defines a special type of variable-length field whose size can be obtained from a regular expression applied on the payload. |
eatall | It defines a special type of variable-length field that takes care of all the remaining bytes with respect to the current scope. For instance, an eatall field within the main <fields> section will include all the data from that point to the end of the packet. In case this field is a child of another element, it will contain all the data from the current point to the end of its parent field. |
In addition, the <field> element can have three additional child elements:
| Element | Description |
|---|---|
<cfield>, <set>, <choice>(optional) | All the elements allowed as children of the <fields> element (i.e. all the Simple protocol fields, Complex protocol fields, and all the Additional Format Elements), all the Conditional Elements, and all the Format-based Conditional Elements. In other words, fields can be nested. For more details about nested fields, please refer to the Nested fields for <field> elements section. |
New types of <field> elements
Fields based on regular expression: the pattern type
The pattern type is used to declare a dynamic field whose content can be matched by a regular expression. It targets well-specified fields (often text-based) whose pattern is well-known at specification-time. All the data matched by the pattern will be part of the field.
In addition to the standard attributes of the <field> element, the following attributes are supported:
| Attribute | Description |
|---|---|
pattern(required) | It contains the regular expression that defines the pattern of this field. |
onpartialmatch (optional) | It specifies what to do if the regular expression specified in pattern returns a partial match (i.e., the match is successful so far, but the packet does not contain enough data to complete it). Allowed values are keep (default), which means that the field will contain the data that resulted from the partial match, and skipfield, which forces the field to be discarded if the match is not completely successful. In the latter case a zero-length field may be created, depending on the implementation of the NetPDL processing engine, and the $currentoffset variable keeps the value it had before the processing of this field started. |
On successful processing, the $currentoffset variable points to the first byte after this field, i.e. to the first byte that does not match the pattern regular expression.
Please be careful to define regular expressions correctly. For instance, the regular expressions used by NetPDL (which follow the conventions defined by pcre, whose summary is available in NetPDL Expressions) are greedy by default, which means that they try to match as much as they can.
For instance, an attempt to match a quoted string by applying the following pattern
\".+\"
will produce the following results:
| Input string | Matched pattern |
|---|---|
“first quoted string” plus some other text | “first quoted string” |
“first quoted string” “second quoted string” plus some other text | “first quoted string” “second quoted string” |
If the first behavior is the one the user looks for, the pattern must be written as follows:
\".+?\"
In fact, the question mark disables the greedy behavior of the quantifier, so that the minimum number of bytes is matched.
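The difference between the two quantifiers can be checked with any PCRE-like engine. The following is a minimal sketch that uses Python's `re` module as a stand-in for the NetPDL engine (an assumption: NetPDL itself relies on pcre), with plain ASCII quotes in the sample input.

```python
import re

# Sample payload with two quoted strings (ASCII quotes, chosen for illustration).
payload = '"first quoted string" "second quoted string" plus some other text'

# Greedy quantifier: the match extends up to the LAST quote in the payload.
greedy = re.match(r'".+"', payload).group(0)

# Lazy quantifier: the match stops at the FIRST closing quote.
lazy = re.match(r'".+?"', payload).group(0)

print(greedy)
print(lazy)
```

The greedy form swallows both quoted strings, while the lazy form returns only the first one.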
Example: Request Line of an HTTP header
This example shows how to define the request line of an HTTP header and its inner subfields with pattern type (although there are some more appropriate field types for this case).
<field type="pattern" pattern="[^\r\n]+\r\n" name="request_line">
<field type="pattern" pattern="[^\x20]+" name="method"/>
<field type="fixed" size="1" name="space1"/>
<field type="pattern" pattern="[^\x20]+" name="request_uri"/>
<field type="fixed" size="1" name="space2"/>
<field type="pattern" pattern="[^\r\n]+" name="http_version"/>
<field type="fixed" size="2" name="endofline"/>
</field>
The request_line field is supposed to be a text-based line. If a successful match for this field is found, the field is further decomposed into a set of six “subfields”: two space-delimited fields (method and request_uri), a third one (http_version) delimited by the end-of-line sequence, and the three separators. Separators are defined by fixed fields, since they are known to have a fixed length.
In this case, the following Request Line:
GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
request_line | ”GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1\r\n” | Entire request line including final “newline” sequence |
method | ”GET” | HTTP method, excluding following space |
space1 | ” ” | A single space character |
request_uri | ”http://www.w3.org/pub/WWW/TheProject.html” | URL excluding surrounding spaces |
space2 | ” ” | A single space character |
http_version | ”HTTP/1.1” | HTTP version excluding final “newline” sequence |
endofline | ”\r\n” | Final “newline” sequence |
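The decomposition above can be reproduced with a few lines of code. This is only a sketch of how an engine might walk the nested fields, not the behavior of an actual NetPDL implementation: the helper names `match_pattern` and `take_fixed` are hypothetical, Python's `re` module replaces pcre, and the trailing `Host` line in the sample packet is added for illustration.

```python
import re

def match_pattern(data: bytes, offset: int, end: int, pattern: bytes):
    """Pattern field: return (value, new_offset), or (None, offset) on no match."""
    m = re.compile(pattern).match(data, offset, end)
    if m is None:
        return None, offset
    return m.group(0), m.end()

def take_fixed(data: bytes, offset: int, size: int):
    """Fixed field: return (value, new_offset)."""
    return data[offset:offset + size], offset + size

packet = (b"GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1\r\n"
          b"Host: www.w3.org\r\n")

# Outer field: its footprint becomes the scope of the nested fields.
request_line, scope_end = match_pattern(packet, 0, len(packet), rb"[^\r\n]+\r\n")

fields = {}
off = 0
fields["method"], off = match_pattern(packet, off, scope_end, rb"[^\x20]+")
fields["space1"], off = take_fixed(packet, off, 1)
fields["request_uri"], off = match_pattern(packet, off, scope_end, rb"[^\x20]+")
fields["space2"], off = take_fixed(packet, off, 1)
fields["http_version"], off = match_pattern(packet, off, scope_end, rb"[^\r\n]+")
fields["endofline"], off = take_fixed(packet, off, 2)

for name, value in fields.items():
    print(name, value)
```

After the last subfield the running offset coincides with the end of request_line, i.e. the nested fields cover the parent footprint exactly.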
Fields that consume all the remaining data associated to the current scope: the eatall type
The eatall type is used to declare a field that spans all the remaining bytes of the current scope. It is appropriate for the last field of a protocol (whose size may be unknown a priori), or for the last sub-field of a previously defined field. In the first case, the result is a field starting at the current offset and ending at the end of the current packet. In other words, the eatall type can be seen as a particular kind of variable field whose expr attribute holds a well-known value. It is also useful for text-based fields, when the last variable-length sub-field of a series has to be defined.
The eatall field does not have any attribute in addition to the standard attributes of the <field> element.
Example: Request Line of an HTTP header
This example shows one way to define the request line of an HTTP header and its inner subfields with the pattern and eatall types.
<field type="pattern" pattern="[^\r\n]+\r\n" name="request_line">
<field type="pattern" pattern="[^\x20]+" name="method"/>
<field type="fixed" size="1" name="space1"/>
<field type="pattern" pattern="[^\x20]+" name="request_uri"/>
<field type="fixed" size="1" name="space2"/>
<field type="eatall" name="http_version"/>
</field>
This example looks like the one presented for the pattern type, but the last two elements are replaced with an eatall field. Processing is similar, the only difference being that the last field now also includes the end-of-line characters. As in the previous examples, some other field types may be more appropriate for this kind of processing.
In this case, the following Request Line:
GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
request_line | ”GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1\r\n” | Entire request line including final “newline” sequence |
method | ”GET” | HTTP method, excluding following space |
space1 | ” ” | A single space character |
request_uri | ”http://www.w3.org/pub/WWW/TheProject.html” | URL excluding surrounding spaces |
space2 | ” ” | A single space character |
http_version | ”HTTP/1.1\r\n” | HTTP version including final “newline” sequence |
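The notion of “current scope” is the key point of the eatall type. The sketch below illustrates it under a few assumptions: the `eatall` helper is hypothetical, the scope is modeled as a simple end offset, and the sample packet is made up for the occasion.

```python
def eatall(data: bytes, offset: int, scope_end: int):
    """Consume every byte from offset up to the end of the current scope."""
    return data[offset:scope_end], scope_end

packet = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n"

# Scope of the nested field: the footprint of request_line, CRLF included.
line_end = packet.index(b"\r\n") + 2
version_start = packet.rindex(b" ", 0, line_end) + 1

# eatall as a child of request_line: it stops at the end of the parent field.
nested_value, nested_off = eatall(packet, version_start, line_end)

# eatall in the main <fields> section: it stops at the end of the packet.
top_value, top_off = eatall(packet, line_end, len(packet))

print(nested_value)
print(top_value)
```

The same helper yields different results only because the scope passed to it changes, which mirrors the two cases described for this field type.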
Complex protocol fields: the <cfield> element
Complex fields include field types whose value is different from their binary footprint, or that are intrinsically structured, or that have both these characteristics at the same time.
Structured fields in application-layer protocols
NetPDL defines that fields that are naturally structured (e.g., the TLV fields) must automatically generate a precise set of subfields.
For instance, if the user defines a TLV field named mytlv in the NetPDL database, a NetPDL engine must automatically recognize fields mytlv.type, mytlv.length, and mytlv.value even if these subfields are not present explicitly within the NetPDL database.
The most important characteristic of subfields is that their length is known a priori, i.e. these subfields do not contain any attribute aimed at defining their length. In fact, please note that the type attribute of the <field> element is a way to define the structure of the field, and hence its length, through some additional attributes (e.g. the size attribute for fixed fields, expr for variable fields, and more).
Obviously, NetPDL defines some elements that can be used to customize the behavior of these subfields. For example, NetPDL allows defining the visualization format of the mytlv.value subfield of the previous example. It is worth noting that these elements can be used to further specify some portions of the base element, but are not required. If they are missing, the NetPDL engine must be able to handle these fields with a default behavior.
A possible structure of a TLV field can be the following:
<!-- other fields -->
<cfield type="tlv" name="ip_timestamp_option">
<subfield portion="type" name="option_type"/>
<subfield portion="length" name="option_length"/>
<subfield portion="value" name="option_value"/>
</cfield>
The <subfield> element can have the same child elements allowed within a <fields> element. This means that a subfield can be further specified through other NetPDL elements.
In addition to the Standard Attributes, the <subfield> element supports the following attributes:
| Attribute | Description |
|---|---|
portion(required) | It defines the portion of the field we are referring to. Please note that the value contained in this attribute depends on the intrinsic structure of the parent field. Allowed values will be defined for each type of complex field. |
name(optional) | A unique name that identifies the object within its scope. If missing, the name will be the name of the parent field dot the value contained in the portion attribute. For instance, the name of the first subfield in the previous example would be ip_timestamp_option.type. |
longname(optional) | It keeps a “human” name; it may be used when the object has to be shown. If missing, the name will be the name of the parent field dot the value contained in the portion attribute. For instance, the name of the first subfield in the previous example would be ip_timestamp_option.type. |
| Visualization extension attributes (optional) | All the visualization-related attributes are allowed. If missing a default behavior (e.g. a binary view of the field value) will be used that depends on the implementation of the NetPDL engine. |
Fabio, MEMO for Fulvio: The following sentences in the text must be revised owing to the removal of the <cfield> element. In fact, it no longer makes sense in this context to have a <csubfield> element, which is absorbed by <subfield>; from now on it is the presence of the old ctype attribute (now renamed subtype) that determines whether we are dealing with a simple subfield or a complex one.
A <csubfield> element looks like a <subfield>, but it defines a portion that is further structured. In other words, <subfield> is oriented to fields that have a plain structure (i.e., fields that could otherwise be defined with the <field> element), while <csubfield> refers to subfields that could otherwise be defined with the <cfield> element.
This results in a difference with respect to child elements: a <csubfield> can have only <subfield> and <csubfield> as child elements because we need to define the inner structure of this complex subfield.
Like <subfield>, <csubfield> does not define any attribute used to calculate the length of the field, because its length can be automatically calculated by the NetPDL engine. However, <csubfield> must have a further attribute that specifies the structure of the inner subfields. In fact, the <csubfield> element supports all the attributes defined for <subfield>, plus an additional one:
Fabio: here in the table I have already renamed the ctype attribute to subtype and <csubfield> to <subfield>, but a cross-check would not hurt.
| Attribute | Description |
|---|---|
subtype(required) | It defines the inner structure of the current portion of the field. It assumes the same values allowed for the type attribute of the <cfield> element. |
Example: definition of a subfield that is further structured
This example looks similar to the previous one, but the value portion of the field is structured as another TLV field:
<cfield type="tlv" name="ip_timestamp_option">
<subfield portion="type" name="option_type"/>
<subfield portion="length" name="option_length"/>
<subfield portion="value" subtype="tlv" name="option_value">
<subfield portion="type" name="inner_option_type"/>
<subfield portion="length" name="inner_option_length"/>
<subfield portion="value" name="inner_option_value"/>
</subfield>
</cfield>
Different types of <cfield> elements
Complex fields are defined by the <cfield> element.
Similarly to the <field> element, type is the most important attribute of a <cfield> element. It defines not only how the field length has to be calculated (as it already does with <field> elements), but also the values of the “portion” attribute of inner <subfield>/<csubfield> elements.
The type attribute of a <cfield> element can assume the values listed in the following table. Please note that the table defines also if the field is intrinsically structured (i.e. it defines a set of subfields) and/or its footprint is different from its semantic value (e.g., the value of a line field does not include the \r\n ending character, while the footprint does).
| type value | Structured | Footprint != Value | Description |
|---|---|---|---|
tlv | Yes | No | It defines a structured field made up of three subfields: type, length and value (with the latter representing the real content of the field). |
delimited | No | Yes | It defines variable-length field started by an optional preceding token and terminated by an optional ending token. This can be a field whose position is started (or not) by a marker and is terminated by a special character (e.g. ”;”) or a special string, while the actual value is enclosed between such markers. |
line | No | Yes | It defines a special type of delimited field, i.e. a field whose actual value is delimited by a “new line”. This field type has been defined because of its extensive usage in current application-layer protocols. |
hdrline | Yes | Yes | It defines a special type of delimited and structured field, i.e. a field which is terminated by a “new line” and not followed by a horizontal space ” ” or ”\t”, made up of two sections: header name and header value (the latter is the content). |
dynamic | Yes | Yes | It defines a special “last-resort” structured field defined by an overall regular expression whose named subpatterns are the relevant subfields. |
asn1 | Yes | Yes | It defines a structured field observing ASN.1 BER encoding rules, similar to tlv fields. |
xml | Yes | No | It defines a structured field containing a complete XML document with relevant prolog and body. |
Child elements of a <cfield> element
A <cfield> element may have different types of child elements, according to the following three cases.
| Condition | Allowed child elements |
|---|---|
| Non-structured complex field | |
| Structured complex field | If the complex field is intrinsically structured, it supports only <subfield> and <csubfield> elements, except for the xml type. |
| Structured complex field whose type is xml | If the complex field has type=“xml”, it supports only the <map> element as child. |
Fields having a Type-Length-Value internal structure: the tlv type
A tlv type is used to declare a common structured field that is composed of the following parts:
- Type: specifies the field type, i.e. syntax and semantic of the information that will be contained in the next two parts;
- Length: specifies the length of the Value portion of the field. Usually, the length of this field is fixed; occasionally it depends on the value contained in the Type part;
- Value: the actual valuable information contained in this field. The length of this part is given by the Length portion of this field.
In addition to the standard attributes of the <cfield> element, tlv complex fields support the following attributes:
| Attribute | Description |
|---|---|
tsize(required) | It defines the size of the Type subfield, in number of bytes. |
lsize(required) | It defines the size of the Length subfield, in number of bytes. |
vexpr(optional) | It contains an expression that calculates the size of the Value part at run-time. This can be required for TLV fields in which the Length portion does not directly hold the size of the Value portion, e.g. when the size is expressed in multiples of 4 bytes. |
Portions defined in tlv fields
A tlv field defines an inner structure composed of three portions. The following table lists the values allowed in the portion attribute of any possible subfield, with their description and the default value for the name attribute in case the associated subfield is missing:
| portion attribute | Description | Default value for name attribute |
|---|---|---|
tlv.type | It defines the Type portion of a tlv field | type |
tlv.length | It defines the Length portion of a tlv field | length |
tlv.value | It defines the Value portion of a tlv field | value |
Example: deep decoding of an IP Option Header
This example presents a possible structure of an IP Option Header, which is a TLV field with the following characteristics:
- the type portion has to be further specified (different bits have different meanings)
- the length portion keeps the entire length of the option, not only the length of the value part.
The resulting definition will be the following:
<cfield type="tlv" tsize="1" lsize="1" vexpr="buf2int(ip_option.length)-2" name="ip_option">
<subfield portion="tlv.type" name="option_type">
<field type="bit" size="1" mask="0x80" name="copy"/>
<field type="bit" size="1" mask="0x60" name="class"/>
<field type="bit" size="1" mask="0x1F" name="option"/>
</subfield>
</cfield>
The tlv.type portion has been explicitly specified by a <subfield>, which is further specified with a set of bit fields.
This example also makes use of the vexpr attribute, which computes the length of the value part. To get the correct value, we need access to the length portion of the ip_option field (i.e., the current field), hence the expression refers to the ip_option.length field. The name of the length portion is the default one, because this portion has not been further specified (i.e. no <subfield portion=“tlv.length”> is available in the protocol description). If this definition were present, the default name of the portion would be replaced by the name given to the subfield itself, as can be seen in the following example:
<cfield type="tlv" tsize="1" lsize="1" vexpr="buf2int(ip_option.option_length)-2" name="ip_option">
<subfield portion="tlv.length" name="option_length"/>
</cfield>
The length and value portions in the original example were not specified; hence the NetPDL engine must use default values.
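The way tsize, lsize and vexpr cooperate can be sketched as follows. This is an illustration rather than the behavior of a real NetPDL engine: `parse_tlv` and `split_option_type` are hypothetical helpers, vexpr is modeled as a Python callable, buf2int() is assumed to decode a big-endian unsigned integer, and the sample bytes are made up (an option of type 0x44 with 6 bytes of value, followed by one more byte).

```python
def parse_tlv(data: bytes, offset: int, tsize: int, lsize: int, vexpr=None):
    """Split a TLV field into its type, length and value portions.

    vexpr is a hypothetical stand-in for the NetPDL vexpr attribute: it
    receives the decoded Length and returns the size of the Value portion.
    Without it, the Length is taken as the size of Value.
    """
    t = data[offset:offset + tsize]
    l = data[offset + tsize:offset + tsize + lsize]
    length = int.from_bytes(l, "big")  # assumed equivalent of buf2int()
    vsize = vexpr(length) if vexpr else length
    vstart = offset + tsize + lsize
    portions = {"type": t, "length": l, "value": data[vstart:vstart + vsize]}
    return portions, vstart + vsize

def split_option_type(type_byte: int):
    """Apply the three masks of the example (0x80, 0x60, 0x1F)."""
    return {
        "copy": (type_byte & 0x80) >> 7,
        "class": (type_byte & 0x60) >> 5,
        "option": type_byte & 0x1F,
    }

options = bytes([0x44, 0x08, 0x05, 0x00, 0xAA, 0xBB, 0xCC, 0xDD, 0x00])

# The Length counts the whole option, hence "length - 2" for the Value.
ip_option, next_off = parse_tlv(options, 0, tsize=1, lsize=1,
                                vexpr=lambda length: length - 2)
bits = split_option_type(ip_option["type"][0])

print(ip_option, next_off, bits)
```

Keys of the returned dictionary follow the default portion names (type, length, value), which is what an engine would expose when no <subfield> renames them.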
Token-delimited fields: the delimited type
The delimited type is used to declare complex fields whose actual value is enclosed between a starting preamble and an ending trailer. These special sequences of bytes act only as separators and therefore are not part of the field value: the actual value of the field is the portion that lies between the starting and the ending delimiter. Either the preamble or the trailer may be missing, leading to a field that only starts or only ends with a specific token. Preambles and trailers are specified through regular expressions, enabling the usage of simple string-based markers but also of more complex delimiters.
In addition to the standard attributes of the <cfield> element, the delimited field supports the following attributes:
| Attribute | Description |
|---|---|
beginregex (optional) | It defines the regular expression that contains the string that precedes this field. If missing, the field starts immediately, without any bytes associated to the preamble. |
endregex (optional) | It defines the regular expression that contains the trailer sequence. If missing, the field terminates at the current scope (i.e., it consumes all the bytes from the current offset to the ending offset associated to current scope). |
onmissingbegin (optional) | It specifies how the current delimited field must behave if the starting preamble does not match. It can assume the following values: continue (default) means that the field value starts from the beginning of the field (i.e. as if the starting preamble were not required); skipfield means that processing aborts, and the field value is empty. |
onmissingend (optional) | Similar to the onmissingbegin attribute, but it refers to the ending trailer. In case the trailer sequence does not match the endregex attribute and this attribute is equal to continue (default), the field becomes similar to an eatall field. Vice versa, if the value is skipfield, the field will be skipped in case the ending regular expression does not match. |
Briefly, the processing of a delimited field starts by matching the beginregex expression to the field content. On successful matching, these bytes are associated to the preamble, and the starting offset of this field is defined as the next byte in the packet data. Then, packet payload is analyzed for the endregex token. When a match is found, these bytes are associated to the trailer, and the data associated to field value will terminate with the last byte (in the packet payload) before the beginning of the ending trailer.
On successful processing, each delimited field returns the actual value without delimiters. Compared with other field types, the delimited type makes the description less verbose and more readable, because the automatic update of the internal offset avoids declaring additional NetPDL fields just to decode trivial separators.
The regular expressions specified in beginregex and endregex must be a string written according to the same rules (and limitations) defined for NetPDL string operands, with the exception that the string must not be delimited by the ” ' ” character. For instance, endregex=“\x0D\x0A[^\x09\x20]” means that the field ends when two (consecutive) bytes of values \x0D\x0A are found in the packet data, and these characters are not followed by \x09 (i.e. the horizontal tab) or \x20 (i.e. the space).
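The processing steps described above can be summarized in a short sketch. It is written under several assumptions: `parse_delimited` is a hypothetical helper, Python's `re` module replaces pcre, a skipped field is represented by `None`, and the offset is left untouched when a field is skipped (the text states this explicitly only for pattern fields).

```python
import re

def parse_delimited(data, offset, scope_end, beginregex=None, endregex=None,
                    onmissingbegin="continue", onmissingend="continue"):
    """Return (value, new_offset); value is None when the field is skipped."""
    start = offset
    if beginregex is not None:
        m = re.compile(beginregex).match(data, offset, scope_end)
        if m:
            start = m.end()            # the preamble is not part of the value
        elif onmissingbegin == "skipfield":
            return None, offset
    if endregex is None:
        return data[start:scope_end], scope_end
    m = re.compile(endregex).search(data, start, scope_end)
    if m:
        # The value ends right before the trailer; the offset moves past it.
        return data[start:m.start()], m.end()
    if onmissingend == "skipfield":
        return None, offset
    return data[start:scope_end], scope_end   # behaves like eatall

data = b"Sun, 06 Nov 1994 08:49:37 GMT"
wkday, off = parse_delimited(data, 0, len(data), endregex=rb", ")
# Offset 16 is the space that follows the 11-byte date "06 Nov 1994".
time_value, off2 = parse_delimited(data, 16, len(data),
                                   beginregex=rb" ", endregex=rb" ")
print(wkday, off, time_value, off2)
```

Returning the new offset together with the value is what removes the need for explicit separator fields: the next field simply starts where the trailer ended.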
Fulvio: here we need to define what happens to the various offsets in PDML. In this case, is the offset of the field the one associated with the value, or is the offset the one relative to the beginning of the value? The same problem also arises with other fields, e.g. ASN.1. Fabio: in PDML the offset is the footprint of the field.
Portions defined in 'delimited' fields
A delimited field may include a preamble and a trailer, but it is not intrinsically structured. Therefore, this field does not support any <subfield> child element.
Example: overall decoding of HTTP date based on RFCs 822/1123
This example presents a possible description for the date/time format defined by RFCs 822/1123, which can be found in an HTTP header:
<cfield type="delimited" endregex=", " name="wkday" />
<!-- Date is 2 digits, space, 3 letters, space, 4 digits -->
<field type="pattern" pattern="[[:digit:]]{2}\x20[[:alpha:]]{3}\x20[[:digit:]]{4}" name="date"/>
<cfield type="delimited" beginregex=" " endregex=" " name="time"/>
<field type="fixed" name="gmt_str" size="3"/>
In case the data present in the packet payload is Sun, 06 Nov 1994 08:49:37 GMT, fields will assume the following values:
| Field name | Value |
|---|---|
wkday | “Sun” |
date | “06 Nov 1994” |
time | “08:49:37” |
gmt_str | “GMT” |
In other words, the NetPDL engine starts from the wkday field, which does not include a preceding token and terminates with the “, ” (comma space) string. Then, it continues with the date pattern field, which defines a well-formatted field.
Example: deep decoding of HTTP date based on RFCs 850/1036
This example shows a way to perform a deep decoding of a date format defined by RFCs 850/1036, another date format allowed within a header value of some HTTP header field.
<cfield type="delimited" endregex=", " name="weekday"/>
<cfield type="delimited" endregex=" " name="date">
<cfield type="delimited" endregex="-" name="day"/>
<cfield type="delimited" endregex="-" name="month"/>
<field type="eatall" name="year"/>
</cfield>
<cfield type="delimited" endregex=" " name="time">
<cfield type="delimited" endregex=":" name="hour"/>
<cfield type="delimited" endregex=":" name="min"/>
<field type="eatall" name="sec"/>
</cfield>
<field type="fixed" name="gmt_str" size="3"/>
Completing this example, date Sunday, 06-Nov-94 08:49:37 GMT will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
weekday | ”Sunday” | |
date | ”06-Nov-94” | |
day | ”06” | |
month | ”Nov” | |
year | ”94” | |
time | ”08:49:37” | |
hour | ”08” | |
min | ”49” | |
sec | ”37” | |
gmt_str | ”GMT” |
Example: deep decoding of a SMTP recipient
This example shows how the delimited type can be used to decode the RCPT TO command of the SMTP protocol, i.e. a line whose content is an e-mail address optionally preceded by its alias name (e.g. a quoted string).
<!-- Field starts with RCPT TO followed by a colon. Any space, tab or \r\n before or after the colon is discarded -->
<!-- Field ends with a \r\n string that is not followed by a tab or a space -->
<cfield type="delimited" beginregex="RCPT TO[ \t\r\n]*:[ \t\r\n]*" endregex="\r\n(?!\t| )" name="recipient">
<!-- This field MUST begin with a double quotation mark. If missing, the field must be skipped -->
<cfield type="delimited" beginregex="\x22" endregex="\x22[ \t\r\n]*" onmissingbegin="skipfield" name="alias" />
<!-- This field starts with "<" and ends with ">". Hex codes are due because these are reserved chars in XML -->
<cfield type="delimited" beginregex="\x3C" endregex="\x3E" name="email" longname="Email address"/>
</cfield>
Completing this example, command RCPT TO: <foo@bar.com> will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
recipient | ”<foo@bar.com>” | |
alias | (no value) | This field is missing because its beginregex failed, and the onmissingbegin attribute mandates to skip the field in case the initial marker cannot be found. |
email | ”foo@bar.com” | This field contains the email address, i.e. the value between '<' and '>' markers. |
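The regular expressions of this example can be exercised directly. The following sketch applies them with Python's `re` module (used here in place of pcre); the `decode_rcpt` function, the second sample command with a quoted alias and the trailing `DATA` line are assumptions introduced only for illustration.

```python
import re

def decode_rcpt(line: bytes):
    """Decode recipient, alias and email following the description above."""
    # Outer field: strip the preamble, stop at a CRLF not followed by tab/space.
    begin = re.compile(rb"RCPT TO[ \t\r\n]*:[ \t\r\n]*").match(line)
    end = re.compile(rb"\r\n(?!\t| )").search(line, begin.end())
    recipient = line[begin.end():end.start()]

    # alias: the opening quote is mandatory (onmissingbegin="skipfield").
    alias, pos = None, 0
    quote = re.compile(rb"\x22").match(recipient, pos)
    if quote:
        close = re.compile(rb"\x22[ \t\r\n]*").search(recipient, quote.end())
        if close:
            alias, pos = recipient[quote.end():close.start()], close.end()

    # email: the value enclosed between "<" and ">".
    email = None
    lt = re.compile(rb"\x3C").match(recipient, pos)
    if lt:
        gt = re.compile(rb"\x3E").search(recipient, lt.end())
        if gt:
            email = recipient[lt.end():gt.start()]
    return {"recipient": recipient, "alias": alias, "email": email}

plain = decode_rcpt(b"RCPT TO: <foo@bar.com>\r\nDATA\r\n")
quoted = decode_rcpt(b'RCPT TO: "Foo Bar" <foo@bar.com>\r\nDATA\r\n')
print(plain)
print(quoted)
```

When the alias is skipped the offset does not move, so the email field is still matched from the beginning of the recipient value.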
Textual dynamic fields with carriage return delimiter: the line type
A line type is a particular case of delimited textual field in which the ending token is equal to a “new-line” string. This type has been defined because of the large presence of text lines in higher level protocols, such as HTTP, SMTP and many others.
The line field does not have any attribute in addition to the standard ones defined for the <cfield> element.
Processing of this complex field is very simple: starting from the current offset, the NetPDL engine looks for a “newline” sequence. If the “newline” marker cannot be found, all the data consumed during the search is assigned to the field itself. Vice versa, in case of a successful match, the $currentoffset variable points to the first byte after the “newline” sequence. In any case, the value of a line field does not include the “newline” ending sequence.
A line element is equivalent to the following delimited:
<cfield type="delimited" endregex="\r\n" onmissingend="continue"/>
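A minimal sketch of this behavior follows. The `parse_line` helper is hypothetical, the scope is modeled as an end offset, and the second line of the sample packet is made up.

```python
def parse_line(data: bytes, offset: int, scope_end: int):
    """Return (value, new_offset) for a line field."""
    eol = data.find(b"\r\n", offset, scope_end)
    if eol == -1:
        # No "newline" in the scope: the field takes all the remaining data.
        return data[offset:scope_end], scope_end
    # The value excludes the "newline"; the offset moves past it.
    return data[offset:eol], eol + 2

packet = b"SUBSCRIBE sip:user@domain.com SIP/2.0\r\nVia: SIP/2.0/UDP host\r\n"
first_line, off = parse_line(packet, 0, len(packet))
print(first_line, off)
```

The value and the footprint differ by exactly the two bytes of the “newline” sequence, which is why the line type is classified as a complex field.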
Portions defined in 'line' fields
A line field is not intrinsically structured. Therefore, this field does not support any <subfield> child element.
Example: deep decoding of SIP First Line
A possible description of the first line of the SIP protocol is the following:
<cfield type="line" name="sip_firstline">
<cfield type="delimited" endregex=" " onmissingend="skipfield" name="method"/>
<cfield type="delimited" endregex=" " onmissingend="skipfield" name="request_uri"/>
<field type="eatall" name="sip_version"/>
</cfield>
In this case, this SIP Request Line:
SUBSCRIBE sip:user@domain.com SIP/2.0\r\n
will be processed as follows:
| Field name | Value | Notes |
|---|---|---|
sip_firstline | ”SUBSCRIBE sip:user@domain.com SIP/2.0” | Actual value without “newline” sequence |
method | ”SUBSCRIBE” | |
request_uri | ”sip:user@domain.com” | |
sip_version | ”SIP/2.0” |
It is worth noting that, after processing the sip_firstline element, the $currentoffset variable will point to the first byte after the “newline” sequence.
In addition, the inner fields (i.e. request_uri and sip_version) also receive the actual value of the parent field, i.e. a string that does not contain the “newline” sequence.
Textual fields with header-style structure: the hdrline type
An hdrline field is used to describe textual header fields, i.e. a field made up of two subfields separated by a token and terminated by a “newline” character with no following horizontal space (neither “blank space” nor “horizontal tab”). This field type is largely used in text-based application-level protocols, like HTTP, SIP and many others.
The internal structure of this field is the following:
- Header Name: specifies the header type, i.e. syntax and semantic of the information transported;
- Separator: a token that acts as a separator between the header name and the associated value;
- Header Value: the actual information contained in the field, whose structure can change according to the Header Name.
- EOL: the end-of-line terminator.
In addition to the standard attributes of the <cfield> element, the hdrline field supports the following attribute:
| Attribute | Description |
|---|---|
| sepregex (required) | It contains, in the form of a regular expression, the separator token between Header Name and Header Value. |
Processing of this field is similar to the line one: the matching of a “newline” sequence is done first (please note that the newline is, in regular expression terms, \r\n(?!\t| )).
If there is a positive full match and the field length is greater than zero, the processing of this field will continue; otherwise, the processing will fail.
After a successful completion of this first phase, a second matching deals with the separator sequence, which is searched within the bytes returned by the first step. If the separator is found, the field value is further split between the part before the separator (the header name) and the part after the separator (the header value). The separator sequence belongs neither to the header name nor to the header value. If the separator is not found, the entire input data is assigned to the header name, leaving the header value as an empty field.
The regular expression specified by sepregex must be a string written according to the same rules (and limitations) defined for NetPDL string operands, with the exception that the string must not be delimited by the ”'” character. For instance, sepregex = "[\x09\x20]*:[\x09\x20]*" means that the separator of the current header field is a byte of value \x3A possibly surrounded by several bytes \x09 (i.e. the horizontal tab) or \x20 (i.e. the space), and this sequence must be present before the newline sequence.
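The two-phase processing described above can be sketched in Python as follows. This is a minimal illustration, not the engine's actual implementation: the function name `decode_hdrline` and the keys of the returned dictionary are hypothetical, and Python's `re` module is used as a stand-in for whatever regular expression library a NetPDL engine relies on.

```python
import re

# Newline sequence of an hdrline: CRLF not followed by a tab or a space.
NEWLINE = re.compile(rb"\r\n(?!\t| )")

def decode_hdrline(payload: bytes, offset: int, sepregex: bytes):
    """Sketch of hdrline processing. Returns (fields, new_offset)."""
    # Phase 1: locate the terminating newline; an empty field is a failure.
    eol = NEWLINE.search(payload, offset)
    if eol is None or eol.start() == offset:
        raise ValueError("hdrline: newline not found or empty field")
    value = payload[offset:eol.start()]

    # Phase 2: look for the separator only within the bytes isolated above.
    sep = re.search(sepregex, value)
    if sep is None:
        hname, hvalue = value, b""           # no separator: all bytes go to the name
    else:
        hname, hvalue = value[:sep.start()], value[sep.end():]

    return {"value": value, "hname": hname, "hvalue": hvalue}, eol.end()
```

Because the newline pattern uses a negative lookahead, a CRLF followed by a space or a tab (a folded header) is kept inside the field value instead of terminating it.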
Portions defined in 'hdrline' fields
An hdrline field defines an inner structure composed of two portions. The following table lists the values allowed in the portion attribute of any subfield, with their description and the default value of the name attribute used when the associated subfield is missing:
| portion attribute | Description | Default value for name attribute |
|---|---|---|
| hdrline.hname | It defines the Header Name portion of an hdrline field | hname |
| hdrline.hvalue | It defines the Header Value portion of an hdrline field | hvalue |
Example: simple decoding of a generic HTTP header field
We can describe a generic HTTP field as follows:
<cfield type="hdrline" sepregex="(\r\n)?[\t ]*:(\r\n)?[\t ]*" name="http_hfield"/>
The separator between header name and header value is usually the colon space sequence (”: ”); however the specification permits an arbitrary amount of linear white spaces (the sequence \r\n can appear zero or one times, while we may have an arbitrary number of tab and space characters).
A NetPDL engine will make available three fields: http_hfield contains the overall header without “newline” character; http_hfield.hname contains header name (bytes on the left side of separator); http_hfield.hvalue contains header value (remaining bytes from the right side of separator).
Completing this example, the following Accept HTTP header field:
Accept: text/plain; q=0.5, text/html,
text/x-dvi; q=0.8, text/x-c
will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
| http_hfield | "Accept: text/plain; q=0.5, text/html,text/x-dvi; q=0.8, text/x-c" | Base field |
| hname | "Accept" | First subfield |
| hvalue | "text/plain; q=0.5, text/html,text/x-dvi; q=0.8, text/x-c" | Second subfield |
Example: deep decoding of the Accept HTTP header field
This example is more complex than the previous one and shows a possible description for the Accept HTTP header.
<cfield type="hdrline" sepregex="(\r\n)?[\t ]*:(\r\n)?[\t ]*" name="accept_header">
  <subfield type="hdrline.hvalue" name="accepted_format">
    <loop type="while" expr="1">
      <cfield type="delimited" endregex="(\r\n)?[\t ]*,(\r\n)?[\t ]*" name="format">
        <cfield type="delimited" endregex="(\r\n)?[\t ]*;(\r\n)?[\t ]*" name="media_type"/>
        <loop type="while" expr="1">
          <cfield type="delimited" endregex="(\r\n)?[\t ]*(;|,)(\r\n)?[\t ]*" name="accept_param"/>
        </loop>
      </cfield>
    </loop>
  </subfield>
</cfield>
In this case, we describe the accept_header as a standard hdrline type. Then, we further specify the hvalue portion through a <subfield>, while we use the standard description of the hname portion.
The hvalue portion is made up of a set of strings terminated by a comma. White spaces (”(\r\n)?[\t ]*”; see previous example) can appear before and after the comma.
In addition, each token is further specified by an inner delimited field in order to split the first portion of the string (i.e. the media type) from the second (optional) portion containing the accept parameter.
Completing this example, the following Accept HTTP header field:
Accept: text/plain; q=0.5, text/html,
text/x-dvi; q=0.8, text/x-c
will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
| accept_header | "Accept: text/plain; q=0.5, text/html,text/x-dvi; q=0.8, text/x-c" | |
| hname | "Accept" | This is created through a default <subfield> |
| accepted_format | "text/plain; q=0.5, text/html,text/x-dvi; q=0.8, text/x-c" | |
| format (1) | "text/plain; q=0.5" | |
| media_type (1) | "text/plain" | |
| accept_param (1) | "q=0.5" | |
| format (2) | "text/html" | |
| media_type (2) | "text/html" | |
| format (3) | "text/x-dvi; q=0.8" | |
| media_type (3) | "text/x-dvi" | |
| accept_param (2) | "q=0.8" | |
| format (4) | "text/x-c" | |
| media_type (4) | "text/x-c" | |
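The splitting performed by the nested delimited fields can be reproduced with the following Python sketch. It is an approximation under stated assumptions: the function name `split_accept` is hypothetical, `re.split` replaces the explicit loops of the NetPDL description, and the optional `\r\n` group is written as non-capturing so that `re.split` does not return it among the results.

```python
import re

# Linear white space as used in the example: optional CRLF, then tabs/spaces.
LWS = rb"(?:\r\n)?[\t ]*"
COMMA = re.compile(LWS + rb"," + LWS)    # terminates each 'format'
SEMI = re.compile(LWS + rb";" + LWS)     # separates media type and parameters

def split_accept(hvalue: bytes):
    """Split the value of an Accept header into formats, media types, parameters."""
    result = []
    for fmt in COMMA.split(hvalue):
        media_type, *params = SEMI.split(fmt)
        result.append({
            "format": fmt,
            "media_type": media_type,
            "accept_param": params,      # empty list when no parameter is present
        })
    return result
```

Applied to the header value of the example, the sketch yields four formats, of which only the first and the third carry an accept parameter.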
Fields based on regular expression with named sub-patterns: the dynamic type
The dynamic field type is used to declare a structured field whose size can be defined by a comprehensive regular expression and whose subfields are the named sub-patterns specified within the regular expression itself. Its declaration requires as many <subfield>/<csubfield> elements as the number of named sub-patterns present in the regular expression.
In addition to the standard attributes of the <cfield> element, the dynamic field supports the following attribute:
| Attribute | Description |
|---|---|
| pattern (required) | It defines the regular expression containing the named sub-patterns, which makes it possible to determine the size of the complex field at run-time. |
The processing of this complex field is based on the regular expression specified by the pattern attribute: starting from the current offset, the regular expression is matched against the packet content. The match MUST include all the initial bytes from the current point. If there is a positive match, the subfield under examination will be retrieved for subsequent processing; otherwise, the decoding process aborts.
When a subpattern is found, the string is isolated from the rest of the packet and is assigned to the <subfield> element whose name is equal to the named subpattern. For the sake of compatibility with the pcre syntax, each sub-pattern MUST have a unique identifier consisting of up to 32 alphanumeric characters and underscores.
Portions defined in dynamic fields
A dynamic field defines an inner structure composed of as many portions as the number of subpatterns present in the regular expression.
When present, the <subfield> and <csubfield> elements must have a type equal to the name defined in the subpattern. Furthermore, the default value of the name attribute, used when the associated subfield is missing, is equal to the name of the subpattern as it appears in the regular expression:
| portion attribute | Description | Default value for name attribute |
|---|---|---|
| name present in the named subpattern | It defines the portion of the field that is associated with the given subpattern | name present in the named subpattern |
The number of subfields present in a dynamic field is not known at specification time, since it depends on the number of subpatterns present in the regular expression.
Example: Request Line of an HTTP header
This example presents a possible description for the Request Line of an HTTP header:
<cfield type="dynamic" pattern="(?'m'[^ ]+).(?'ru'[^ ]+).(?'hv'[^\r\n]+)\r\n" name="request_line">
<subfield type="m" name="method"/>
<subfield type="ru" name="request_uri"/>
<subfield type="hv" name="http_version"/>
</cfield>
It is evident that this example looks more elegant than the one provided for the pattern type.
The NetPDL engine evaluates the regular expression within pattern attribute starting from current point. On successful matching, m, ru, and hv named sub-patterns are available and are assigned to the associated subfields.
Completing this example, the following Request Line:
GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
will be processed in the following way:
| Field name | Value | Notes |
|---|---|---|
| request_line | "GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1" | Entire request line including final “newline” sequence |
| method | "GET" | space-free HTTP method |
| request_uri | "http://www.w3.org/pub/WWW/TheProject.html" | space-free URL |
| http_version | "HTTP/1.1" | space-free HTTP version |
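The same decoding can be tried out with Python's `re` module, as in the sketch below. Two caveats apply: Python spells named groups as `(?P<name>...)` rather than the PCRE form `(?'name'...)` used in the NetPDL example, and the names `decode_request_line` and `SUBFIELDS` are hypothetical helpers introduced only for this illustration.

```python
import re

# Same pattern as in the example, with Python's named-group spelling.
PATTERN = re.compile(rb"(?P<m>[^ ]+).(?P<ru>[^ ]+).(?P<hv>[^\r\n]+)\r\n")

# Sub-pattern name -> field name, mirroring the three <subfield> elements.
SUBFIELDS = {"m": "method", "ru": "request_uri", "hv": "http_version"}

def decode_request_line(payload: bytes, offset: int = 0):
    """Sketch of a 'dynamic' cfield. Returns (fields, new_offset)."""
    match = PATTERN.match(payload, offset)   # must match from the current offset
    if match is None:
        raise ValueError("dynamic field: pattern does not match")
    fields = {"request_line": match.group(0)}
    for group, name in SUBFIELDS.items():
        fields[name] = match.group(group)
    return fields, match.end()
```

Since the pattern itself ends with `\r\n`, the value of the whole field includes the newline sequence, while the three subfields do not.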
FULVIO: it remains evident that with this system we can describe many field types, but we cannot define inheritance, i.e. fields that draw on several different descriptions. For example, it would be useful if the previous field could be a combination of dynamic and line in order to discard the trailing newline, but at the moment this cannot be done. It is feasible, but only with nested fields, which means having N NetPDL fields one inside the other.
ASN.1-based fields: the asn1 type
An asn1 type is used to define complex fields that are encoded according to the ASN.1 specification. Currently, only BER (Basic Encoding Rules) is supported. ASN.1 fields define a sort of more sophisticated TLV structure as follows:
- Identifier Octet: specifies the field type, i.e. syntax and semantics of the information conveyed; it is further structured in:
  - Class: defines the scope of the type;
  - Primitive/Constructed: defines whether the field is a base type or it is intrinsically structured;
  - Tag Number: defines a numeric value identifier, strictly related to Class, that identifies the type;
- Length Octet: specifies the length of the actual information;
- Contents Octet: contains the actual value of the field, whose format is given by the Identifier Octet and whose length is given by the Length Octet;
- End-of-Contents Octet: defines the end of the Contents Octet only in case of Indefinite Form for the Length Octet.
The asn1 field does not have any attribute in addition to the standard ones defined for the <cfield> element.
Portions defined in 'asn1' fields
Although the asn1 structure is rather complex, what really matters is the field value. Therefore the asn1 type does not define any inner structure. The field value will be equal to the Contents Octet of the ASN.1 field, while the other bytes are discarded.
Example: SNMPv1 Message
This example shows how SNMPv1 messages can be described. Let's start from the ASN.1 definition of a generic SNMPv1 message:
Message ::=
SEQUENCE {
version -- version-1 for this RFC
INTEGER {
version-1(0)
},
community -- community name
OCTET STRING,
data -- e.g., PDUs if trivial
ANY -- authentication is being used
}
Such a message is made up of one ASN.1 field (whose base type is the native ASN.1 SEQUENCE type), which is the concatenation of three different ASN.1 fields: version, community and data. The translation into NetPDL syntax is straightforward:
<cfield type="asn1" name="message">
<cfield type="asn1" name="version"/>
<cfield type="asn1" name="community"/>
<cfield type="asn1" name="data"/>
</cfield>
It is interesting to note that, while version and community are primitive types, data is an ASN.1 field that is further structured. Therefore, we may be able to further specify that field through a proper set of <csubfield> coupled with a check that discriminates different SNMP PDU types.
Completing this example, the following SNMPv1 Message-like bytes (ASN.1 BER encoding):
HEX   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000x  30 2A 02 01 00 04 09 53 4E 4D 50 5F 74 72 61 70
001x  A1 1A 02 02 00 C3 02 01 00 02 01 00 30 0E 30 0C
002x  06 08 2B 06 01 02 01 04 02 00 05 00
will be processed as follows:
| Field name | Value | Notes |
|---|---|---|
| message | "02 01 00 04 09 53 4E 4D 50 5F 74 72 61 70 A1 1A 02 02 00 C3 02 01 00 02 01 00 30 0E 30 0C 06 08 2B 06 01 02 01 04 02 00 05 00" | Starting from the beginning of the packet dump, the first byte "30" is the Leading Octet of the Identifier Octet and indicates the SEQUENCE structured built-in type, while the subsequent byte "2A" is the Leading Octet of the Length Octet and reports a size of 42 bytes for the following Contents Octet. The actual field value therefore starts with the Contents Octet, which begins at offset 2 (i.e. the third byte of the dump). |
| version | "00" | Starting from the Contents Octet of the message field, byte "02" belongs to the Identifier Octet (it indicates the INTEGER primitive built-in type), while the following byte "01" belongs to the Length Octet. Therefore, the actual value of this field is "00". |
| community | "53 4E 4D 50 5F 74 72 61 70" | This field begins after the previous one. Byte "04" belongs to the Identifier Octet (it indicates the OCTET STRING primitive built-in type), while the following byte "09" belongs to the Length Octet. The actual value of this field starts after these bytes and is equal to "SNMP_trap" (in ASCII). |
| data | "02 02 00 C3 02 01 00 02 01 00 30 0E 30 0C 06 08 2B 06 01 02 01 04 02 00 05 00" | After the bytes associated with the community field, byte "A1" belongs to the Identifier Octet of this new field (it indicates a context-specific ASN.1 type), while the following byte "1A" defines a Contents Octet of 26 bytes. Hence data refers to the twenty-six bytes after these two, and it contains other encoded ASN.1 fields. |
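The decoding steps listed in the table can be checked with a small BER reader written in Python. The sketch is deliberately limited: the helper name `read_ber` is hypothetical, only single-octet identifiers and definite lengths are handled, and no attempt is made to interpret the contents according to the ASN.1 type.

```python
def read_ber(data: bytes, offset: int = 0):
    """Decode one BER element. Returns (identifier, contents, next_offset)."""
    identifier = data[offset]
    if identifier & 0x1F == 0x1F:
        raise NotImplementedError("high tag number form is not handled")
    length = data[offset + 1]
    start = offset + 2
    if length == 0x80:
        raise NotImplementedError("indefinite length form is not handled")
    if length & 0x80:                        # long form: next N octets hold the length
        n = length & 0x7F
        length = int.from_bytes(data[start:start + n], "big")
        start += n
    end = start + length
    if end > len(data):
        raise ValueError("truncated element")
    return identifier, data[start:end], end


# Bytes of the SNMPv1 message used in the example.
dump = bytes.fromhex(
    "30 2A 02 01 00 04 09 53 4E 4D 50 5F 74 72 61 70 "
    "A1 1A 02 02 00 C3 02 01 00 02 01 00 30 0E 30 0C "
    "06 08 2B 06 01 02 01 04 02 00 05 00"
)

# Outer SEQUENCE first, then the three inner fields in declaration order.
_, message, _ = read_ber(dump)
fields, pos = {}, 0
for name in ("version", "community", "data"):
    _, fields[name], pos = read_ber(message, pos)
```

Each field keeps only its Contents Octet, which matches the rule that the identifier and length bytes are discarded from the field value.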
Fields containing XML documents: the xml type
The xml type is oriented to describe XML data. However, this field type operates differently from the previous field types because it is oriented to describe XML documents and not XML fields.
In fact, NetPDL considers an XML document as a single field, which can be internally structured. The xml field type is used to describe the whole document, which can be structured in two portions:
- XML Prolog: contains Processing Instructions and an optional Schema Declaration;
- XML Body: the actual value content of the XML document, which may be structured according to the definition contained into a DTD or XML Schema companion document.
The xml element differs from the other <cfield> elements with respect to its supported child elements. In this case, a <cfield type="xml"> element supports only the following child element:
| Element | Description |
|---|---|
| map (optional) | It contains a more in-depth description of the XML element. For more information, please go to the <map> element. |
The xml field type may appear useless on its own; in fact, its usage is usually coupled with the <map> element, which will be presented later. The <map> element is in charge of a more accurate definition of the XML document and makes it possible to describe each single XML element within the XML document.
In addition to the standard attributes of the <cfield> element, the xml field supports the following attribute:
| Attribute | Description |
|---|---|
| size (optional) | It contains a NetPDL expression that defines the entire size of the XML document. If it is missing, the NetPDL engine must be able to detect (e.g. through a proper regular expression) the ending point of the XML document. In this case, some trailing space characters may not be included in the XML document. |
Portions defined in xml fields
An xml field defines an inner structure composed of two portions. The following table lists the values allowed in the portion attribute of any subfield, with their description and the default value of the name attribute used when the associated subfield is missing:
| portion attribute | Description | Default value for name attribute |
|---|---|---|
| xml.prolog | It defines the XML Prolog portion of an xml field | prolog |
| xml.body | It defines the XML Body portion of an xml field | body |
Example: simple decoding of an XML document contained in a SIP message
This example shows a part of the NetPDL description related to the SIP protocol, with respect to the portion that describes the optional protocol contained in a SIP body message. Particularly, it refers to the part that describes the presence of an XML fragment aimed to notify e-presence messages.
<!-- ... -->
<!-- SIP header fields -->
<!-- Please note that this simple description supposes fields are present and are in order -->
<cfield type="hdrline" sepregex="(\r\n)?[\t ]*:(\r\n)?[\t ]*" name="content_type"/>
<cfield type="hdrline" sepregex="(\r\n)?[\t ]*:(\r\n)?[\t ]*" name="content_len"/>
<!-- ... -->
<!-- SIP message body -->
<!-- FULVIO: are we sure the expression is really written this way? Did we write it down anywhere? -->
<switch expr="ascii2int(content_type.hvalue)">
<!-- ... -->
<case value="'application/pidf+xml'">
<!-- Fulvio: has this example been tested? Because buf2int() does not work here, unless I am mistaken -->
<cfield type="xml" size="buf2int(content_len.hvalue)" name="e_presence"/>
</case>
<!-- ... -->
</switch>
Completing this example, the following XML-like payload fragment:
<?xml version="1.0" encoding="UTF-8"?>
<isComposing xmlns="urn:ietf:params:xml:ns:im-iscomposing">
<state>active</state>
<refresh timeunit="sec">60</refresh>
</isComposing>
will be processed as follows:
| Field name | Value | Notes |
|---|---|---|
| e_presence | "<?xml version="1.0" encoding="UTF-8"?> <isComposing xmlns="urn:ietf:params:xml:ns:im-iscomposing"> <state>active</state> <refresh timeunit="sec">60</refresh> </isComposing>" | The entire XML Document |
| prolog | "<?xml version="1.0" encoding="UTF-8"?>" | The XML Prolog portion |
| body | "<isComposing xmlns="urn:ietf:params:xml:ns:im-iscomposing"> <state>active</state> <refresh timeunit="sec">60</refresh> </isComposing>" | The XML Body portion |
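How the two default portions could be isolated is sketched below in Python. This is an assumption-laden illustration rather than the engine's algorithm: the function name `split_xml` is hypothetical, the prolog is taken to be every processing instruction, comment and DOCTYPE declaration that precedes the root element, and a DOCTYPE with an internal subset containing ">" is not handled.

```python
import re

# Everything that may precede the root element: processing instructions
# (including the XML declaration), comments and a simple DOCTYPE.
PROLOG = re.compile(r"(?:\s*(?:<\?.*?\?>|<!--.*?-->|<!DOCTYPE[^>]*>))*", re.DOTALL)

def split_xml(document: str):
    """Split an XML document into the 'prolog' and 'body' portions."""
    prolog_end = PROLOG.match(document).end()    # always matches, possibly empty
    return {
        "prolog": document[:prolog_end].strip(),
        "body": document[prolog_end:].strip(),
    }
```

When the document has no prolog at all, the pattern matches the empty string and the whole document is returned as the body.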
Defining the precise XML structure: the 'map' element
The NetPDL language does not aim at creating a new (and semantically equivalent) form of an XML Schema or DTD. Such documents define the rules that govern the allowed structure of an XML document. Instead, NetPDL wants to define a way to describe the content of an XML document.
While the xml field makes it possible to detect an entire XML section, the new <map> element enables an in-depth description of what the XML section may look like.
In fact, a <map> element looks like a recognizer that tries to locate an XML element.
In addition to the Standard Attributes, the <map> element supports the following attributes:
| Attribute | Description |
|---|---|
| type (required) | It defines the map type, i.e. the type of XML syntactical component desired (XML element, XML Processing Instruction, etc.). Allowed values are defined in the following table. |
| srcref (required) | It defines the name of the XML element that has to be located within the XML document. Warning: this attribute is optional if the map type is equal to xmldoctype. |
| name (optional) | It defines a unique name that identifies the object within its scope. |
| longname (optional) | It defines a 'human' name; it may be used when the object has to be shown. |
Allowed values for the type attribute are the following:
| type value | Description |
|---|---|
| xmlpi | It defines a NetPDL element that will describe an XML Processing Instruction, i.e. an XML element starting with "<?" and ending with "?>". |
| xmldoctype | It defines a NetPDL element that contains the XML Doctype declaration, i.e. an XML element starting with <!DOCTYPE and ending with >. |
| xmlelement | It defines a NetPDL element that contains a standard XML element (within the body part). |
The <map> element does not have child elements.
<!--
FULVIO: it seems to me that there are still a few problems with XML.
- How do I filter on the attribute of a field? For example, “capture the packets in which timeunit=sec”?
- The hierarchy is not defined. If I have a more complex object with many XML elements nested one inside the other, how do I describe it?
- When I define an xmlelement, is the value of that field only the content, or the entire element (including all the attributes)?
Possible solution: define it as a structured field with value and attributes, where the attributes can be created dynamically. In this way I can filter on “element.attributes.myattribute == value”. The compiler must be able to understand that what follows “attributes” is a dynamic value that belongs to the XML attributes (not known a priori).
The value of 'element' is in any case the entire field, from "<" to the closing tag ">".
In addition: I am saving here some text that I found around and that I do not feel like validating (at the moment).
This type enables the following attributes: attrview, which can assume the values no (default: the attributes of the mapped element are not decoded) and yes (the attributes of the mapped element are decoded into equivalent NetPDL fields, whose name is that of the attribute and whose value is the attribute value); namespace, which keeps the name of the namespace the mapped element belongs to; hierarchy, which stores the path of XML elements to follow in order to find the mapped element.
-->
Example: deep decoding of an XML content within SIP protocol with <map> element
This example is similar to the previous one, but it is able to describe more accurately the XML portion of a SIP message.
<!-- ... -->
<!-- SIP header fields -->
<!-- Please note that this simple description supposes fields are present and are in order -->
<cfield type="hdrline" sepregex="(\r\n)?[\t ]*:(\r\n)?[\t ]*" name="content_type"/>
<cfield type="hdrline" sepregex="(\r\n)?[\t ]*:(\r\n)?[\t ]*" name="content_len"/>
<!-- ... -->
<!-- SIP message body -->
<!-- FULVIO: are we sure the expression is really written this way? Did we write it down anywhere? -->
<switch expr="content_type.hvalue">
<!-- ... -->
<case value="'application/pidf+xml'">
<!-- Fulvio: has this example been tested? Because buf2int() does not work here, unless I am mistaken -->
<cfield type="xml" size="buf2int(content_len.hvalue)" name="e_presence">
<map type="xmlpi" srcref="xml" attrview="yes"
name="xml_firstline" longname="First XML Line"/>
<map type="xmlelement" srcref="state" hierarchy="isComposing"
name="state_comp" longname="State of Composition"/>
<map type="xmlelement" srcref="refresh" hierarchy="isComposing"
name="refreshtime" longname="Refresh Time"/>
</cfield>
</case>
<!-- ... -->
</switch>
Completing this example, the following XML-like payload fragment:
<?xml version="1.0" encoding="UTF-8"?>
<isComposing xmlns="urn:ietf:params:xml:ns:im-iscomposing">
<state>active</state>
<refresh timeunit="sec">60</refresh>
</isComposing>
will be processed as follows:
FULVIO: this example should be revised somewhat in light of what was said in the previous comment. In addition, I am not convinced that the fields can be called “xml_firstline.version” or “version” interchangeably.
Reply: in the PDML the field is called 'version', but to access it from filtering one must write parentfield.version.
| Field name | Value | Notes |
|---|---|---|
| e_presence | "<?xml version="1.0" encoding="UTF-8"?> <isComposing xmlns="urn:ietf:params:xml:ns:im-iscomposing"> <state>active</state> <refresh timeunit="sec">60</refresh> </isComposing>" | The entire XML Document |
| xml_firstline | "<?xml version="1.0" encoding="UTF-8"?>" | |
| xml_firstline.version or version | "1.0" | |
| xml_firstline.encoding or encoding | "UTF-8" | |
| state_comp | "active" | |
| refreshtime | "60" | |
| refreshtime.timeunit or timeunit | "sec" | |
Fulvio: I am not convinced by what is written here.
Fabio: the real problem is the extreme dynamism of XML code, which is why identifying the namespace correctly (it is defined either by appending it to the opening tag or through the xmlns attribute) introduces non-negligible delays in the decoding, since it complicates the related regular expressions. Because our goal is to preserve efficiency, we decided to drop namespace support. I would write something like the following (introducing a possible solution that I worked out on the fly):
“xml fields do not allow XML namespaces to be defined through <map> elements of type xmlelement. However, the NetPDL engine will take care of defining a default namespace subfield for the parent field, which will contain the name of the namespace of the located element, if any.”
which, in a possible wording for the specification, would read:
“When a <map type="xmlelement"> element is defined, the parent xml field does not deal with the relevant XML namespaces. However, a default namespace subfield will be created and appended to each declared <map type="xmlelement"> element that matches an XML element whose namespace is indicated.”
This looks like a somewhat more balanced solution to me, provided that the user does not create conflicts by defining a mapping named namespace (in any case, our name will be the first in the list of PDML fields).
Let us make an example to clarify: if I have the following XML document
<?xml version="1.0" encoding="UTF-8"?>
<isComposing xmlns="urn:ietf:params:xml:ns:im-iscomposing">
<state>active</state>
<refresh timeunit="sec">60</refresh>
</isComposing>
and I declare in NetPDL
<cfield type="xml" size="ascii2int(content_len.hvalue)" name="e_presence">
<map type="xmlelement" srcref="state" hierarchy="isComposing"
name="state_comp" longname="State of Composition"/>
</cfield>
then at decoding time I will get a state_comp.namespace field whose value is urn:ietf:params:xml:ns:im-iscomposing, without the user having had to say anything. Optionally, the user could decide through a boolean attribute whether this automatic work is required or not, but this could cause some problems in packet filtering (I have to remember that I defined the attribute in order to use this feature!). In general, I think the solution as a whole can also be valuable for packet filtering, because I can automatically generate a filter that says the following
state_comp.namespace == urn:ietf:params:xml:ns:im-iscomposing
keeping in mind, however, that in XML the namespace may not be specified, so I risk not selecting some fields with this particular filter. In the end the problem is always the same: finding the right trade-off in supporting the extreme dynamism of the XML syntax.
END OF COMMENT (Fabio)
Please note that the value of the hierarchy attribute can prepend the namespace domain to the name of the element indicated; for instance, the value hierarchy="urn:ietf:params:xml:ns:im-iscomposing:isComposing" is a valid replacement for hierarchy="isComposing" and is more specific than the latter. The value of the hierarchy attribute can also include a list of subsequent parents separated by the "." character.
Reply: the hierarchy attribute (missing from the list of <map> attributes) is needed to distinguish the scope of the field.
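The lookup performed by <map type="xmlelement"> elements can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch rests on several assumptions that the discussion above leaves open: the helper names `local` and `apply_maps` are hypothetical, namespaces are simply ignored, hierarchy names only the direct parent, the field value is the element text, and attributes are exposed as `name.attribute`.

```python
import xml.etree.ElementTree as ET

def local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def apply_maps(document: bytes, maps):
    """maps: (name, srcref, hierarchy) tuples, one per <map type="xmlelement">."""
    root = ET.fromstring(document)
    fields = {}
    for parent in root.iter():               # every element is a candidate parent
        for child in parent:
            for name, srcref, hierarchy in maps:
                if local(child.tag) == srcref and local(parent.tag) == hierarchy:
                    fields[name] = child.text
                    for attr, value in child.attrib.items():
                        fields[name + "." + attr] = value
    return fields
```

With the two xmlelement maps of the example, the sketch produces the state_comp and refreshtime fields plus refreshtime.timeunit; the xmlpi map is left out because ElementTree does not expose the XML declaration.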
Format-based Conditional Elements
Often, a protocol has a set of different choices for its field list. For example, the presence of a field may depend on some conditions (e.g., a specific value in a previous field). Currently, NetPDL handles these situations through the <if>, <switch> and <loop> elements. However, these elements have the following limitations:
- although they are extremely simple to implement (in fact, these are imperative elements and therefore can be easily mapped onto a computer) and can be profitably used to generate very efficient code, their abstraction level is rather poor and may lead to verbose, complex and difficult-to-read descriptions;
- these elements take their decisions based on the result of a very simple mathematical expression, which is often the value of a field.
The new <set> and <choice> elements aim at overcoming the previous limitations, while still enabling an easy and efficient implementation. In fact, although these primitives look similar to the equivalent defined in other languages (e.g., ASN.1), they have peculiar characteristics that guarantee expressiveness while not losing in efficiency. Particularly:
- they are characterized by a higher-level structure (“declarative elements”), but some of their characteristics (e.g., specific attributes) enable the generation of highly efficient code;
- they are format-based, i.e., they are oriented to describe a set of fields that have the same format;
- they take their decisions based on the value of a structured field (e.g., a header line) instead of a simple field or a set of bytes in the packet payload.
In fact, both <set> and <choice> are based on the concept of field format.
Both <set> and <choice> are typed, i.e. they have a type attribute that can assume the same values already defined for <field> and <cfield> elements.
The idea is that NetPDL allows describing a set of fields that share the same external format, no matter how data is structured within each field. This enables a fast skipping of unwanted fields (e.g., in case of packet filtering) since the NetPDL engine can immediately skip a field it is not interested in, while being able to locate easily another field.
When a group of fields sharing the same general format (e.g., a set of hdrline fields) have to be described, the <set> element can be used.
When a single field is present, but it may come in different forms that share the same general format (e.g., a choice between two hdrline fields), the <choice> element can be used.
Basically, a <set> element may be seen as a replacement of a <loop> element with a nested <switch>. In fact, it is something more because of its format-based processing, while previous elements operate on raw data. This characteristic is one of the key points that can be used to implement some processing optimizations that are more problematic in case of previous elements.
The <set> element
The <set> element is oriented to define a sequence of fields having the same format. Fields in this sequence may appear in any order, and may be present zero, one or more times.
A <set> element supports the following attributes in addition to the Standard ones:
| Attribute | Description |
|---|---|
| type (required) | It defines the type of the fields that will be present in the set. It can assume the same values defined for the type attribute of the <field> and <cfield> elements. |
| Other attributes | Other attributes might be present according to the value of the type attribute. See later for more details. |
More specifically, the <set> element looks like a specific field. Therefore, it will support all the attributes that aim at defining the length of that type of field.
For instance, the <set> element related to a set of fixed-length fields must have the size attribute, which keeps the length of a fixed field:
<set type="fixed" size="4">
<!-- a set of possible fixed-length fields -->
</set>
In the same way, a set of pattern fields will have the pattern attribute (and may also have the optional onpartialmatch attribute), and so on for the other field types.
When a <set> element is encountered in the NetPDL description, the NetPDL engine must be able to process following data according to the format of the field as indicated in the <set> element.
For instance, we can imagine this description:
<!-- Defines a set of TLV fields; 'Type' is 2 bytes, 'Length' is 1 byte -->
<set type="tlv" tsize="2" lsize="1">
<cfield name="firsttlv" match="this.type == 1" longname="First TLV"/>
<cfield name="secondtlv" match="this.type == 2" longname="Second TLV"/>
</set>
The NetPDL engine must recognize the following bytes in the packet payload as a tlv field and must be able to access the three portions defined for a TLV field (type, length, value). These portions can be used in the field set to select the correct field based on the value of the match attribute present in each <cfield> element.
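A format-based set can be pictured as a loop that first decodes the common TLV envelope and only then decides which item matches. The Python sketch below illustrates this idea under explicit assumptions: the function name `decode_tlv_set` is hypothetical, Type and Length are read as big-endian integers, Length counts only the Value bytes, and the set ends when the payload is exhausted.

```python
def decode_tlv_set(payload: bytes, tsize: int = 2, lsize: int = 1):
    """Sketch of <set type="tlv" tsize="2" lsize="1"> with two matching items."""
    # match="this.type == N" -> field name; anything else falls to the default item.
    items = {1: "firsttlv", 2: "secondtlv"}
    header = tsize + lsize
    decoded, offset = [], 0
    while offset < len(payload):
        if offset + header > len(payload):
            raise ValueError("truncated TLV header")
        tlv_type = int.from_bytes(payload[offset:offset + tsize], "big")
        length = int.from_bytes(payload[offset + tsize:offset + header], "big")
        value = payload[offset + header:offset + header + length]
        if len(value) < length:
            raise ValueError("truncated TLV value")
        decoded.append((items.get(tlv_type, "default-item"), tlv_type, value))
        offset += header + length            # skipping never needs the item details
    return decoded
```

Since the envelope is decoded before any item is selected, an uninteresting field can be skipped by using its length alone, which is the optimization that the format-based approach is meant to enable.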
A <set> element allows the following child elements:
| Element | Description |
|---|---|
| <exit-when> (required) | This element defines the exit condition for the processing of the parent <set> element. When present, it MUST be the first element after the <missing-packetdata> one. |
| <field> (optional) | This element contains the description of a simple field and it is similar but not equal to the base <field> element. It must be present if the <set> element is supposed to contain simple fields. This element defines the remaining attributes related to the field type that are not specified within the <set> element, plus some other new attributes. |
| <cfield> (optional) | This element contains the description of a complex field and it is similar but not equal to the base <cfield> element. It must be present if the <set> element is supposed to contain complex fields. This element defines the remaining attributes related to the field type that are not specified within the <set> element, plus some other new attributes. |
| <default-item> (required) | This element defines the default item of the set, i.e. how to process the current element if the set does not contain any matching element. |
| <missing-packetdata> (optional) | This element defines how to deal with packet data when the payload appears to be truncated. If present, this element must be the first child of the parent element. (Fulvio: can we remove this limitation?) This element is equal to the one already defined in other elements such as <if> and <switch>. It is useful only when the payload appears to be truncated, e.g. when the NetPDL engine is working at packet level and the application-layer message spans several packets. For more details about this element, please read the <missing-packetdata> section. |
Defining an exit condition from the <set>: the <exit-when> element
This element defines the exit condition from the set. This exit condition is checked after the identification of one field and before starting the processing of the next one.
The <exit-when> element allows the following attribute in addition to the Standard ones:
| Attribute | Description |
|---|---|
| expr (required) | The expression that defines the exit condition. It can also be the constant false when no explicit exit condition is needed. |
Defining the proper set of fields: the <field> and <cfield> elements
This portion of the <set> section contains the list of <field> or <cfield> elements, which describe the actual content of the set. These elements must be of the same type as the <set> element.
In addition to the Standard Attributes, these <field> and <cfield> elements support the following attributes:
| Attribute | Description |
|---|---|
| match (required) | It contains the boolean expression that has to be evaluated at run-time in order to select the current element. |
| recurring (optional) | It specifies whether the current field can be present more than once. It can assume the following values: no (default): this field will appear at most once within the set; yes: this field may appear more than once within the set. |
Please remember that these elements MUST NOT contain any attribute that determines the field format (e.g. the field size), because these attributes are inherited from the parent <set> element. Such attributes, which depend on the type of the fields, must be specified in the parent <set> element.
Note: only one field can match within a given set. Therefore, the NetPDL description must not contain two match conditions that are true at the same time.
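The two constraints above (a single matching element, and the recurring attribute) can be sketched as a small Python check. The names `SetItem` and `select_item` are hypothetical, and raising an error when a constraint is violated is an assumption of this sketch: the text does not state how the engine reacts in those cases.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class SetItem(NamedTuple):
    name: str
    match: Callable[[int], bool]  # stands in for the 'match' expression
    recurring: bool = False       # 'recurring' attribute, default "no"


def select_item(items: List[SetItem], candidate: int, seen: Dict[str, int]) -> Optional[str]:
    """Return the name of the single matching item, or None if nothing matches."""
    matching = [item for item in items if item.match(candidate)]
    if len(matching) > 1:
        # The description must not contain two conditions true at the same time.
        raise ValueError("ambiguous description: " + ", ".join(i.name for i in matching))
    if not matching:
        return None  # the caller would fall back to <default-item>
    item = matching[0]
    if not item.recurring and seen.get(item.name, 0) > 0:
        # Assumption: a second occurrence of a non-recurring field is an error.
        raise ValueError(f"field '{item.name}' is not recurring but appeared twice")
    seen[item.name] = seen.get(item.name, 0) + 1
    return item.name


items = [
    SetItem("firsttlv", lambda t: t == 1),
    SetItem("secondtlv", lambda t: t == 2, recurring=True),
]
seen: Dict[str, int] = {}
names = [select_item(items, t, seen) for t in (1, 2, 2, 7)]
```

Here the candidate is just the Type value of a TLV item; a real match expression could inspect any portion of the field.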
Default choice for a group of set elements: the <default-item> element
The <default-item> element defines a default field, i.e. the one that is used when no <field>/<cfield> element matches.
This element supports all the attributes that are allowed in a generic type of field.
The <default-item> element supports all child elements that are supported in the corresponding <field>/<cfield> whose type attribute is equal to the one specified in the <set> element.
Example: defining the set of HTTP Header Fields
This example shows a possible description of the HTTP header:
<set type="hdrline" sepregex="[ \t]*:[ \t\r\n]*">
<!-- FULVIO: personally, I would invert the semantics of the exit condition (I would put '==' here) -->
<!-- Reply: agreed, invert it -->
<exit-when expr="$packet[$currentoffset:2] == '\x0D\x0A'"/>
<cfield match="hasstring(this.hname,'User(-)?Agent', 0)" name="user_agent"/>
<cfield match="hasstring(this.hname,'Accept', 0)" name="accept"/>
<cfield match="hasstring(this.hname,'Content-Type', 0)" name="content_type"/>
<cfield match="hasstring(this.hname,'Content-Encoding', 0)" name="content_enc"/>
<cfield match="hasstring(this.hname,'Content-Length', 0)" name="content_len"/>
[...]
<default-item name="generic_option"/>
</set>
The NetPDL engine verifies the exit condition first. If it is not satisfied, the engine processes the next field according to the type specified in the <set> element (hdrline in this case); hence, it extracts the hname and hvalue portions of the next field.
Then, it scans the <cfield> elements present in the description, looking for the one that matches. If no <cfield> matches, the <default-item> is selected.
Then, the process restarts from the beginning, and it terminates either when the exit condition is verified or when there is no more input data available.
Completing this example, the following HTTP payload fragment:
Keep-Alive: timeout=5, max=100\r\n
Content-Length: 354\r\n
Content-Type: text/html; charset=iso-8859-1\r\n
\r\n
will be processed in the following way:
| Field name | Value |
|---|---|
| generic_option | "Keep-Alive: timeout=5, max=100" |
| generic_option.hname or hname | "Keep-Alive" |
| generic_option.hvalue or hvalue | "timeout=5, max=100" |
| content_len | "Content-Length: 354" |
| content_len.hname or hname | "Content-Length" |
| content_len.hvalue or hvalue | "354" |
| content_type | "Content-Type: text/html; charset=iso-8859-1" |
| content_type.hname or hname | "Content-Type" |
| content_type.hvalue or hvalue | "text/html; charset=iso-8859-1" |
Fulvio: here too, I am not convinced that content_len.hname can safely be replaced with hname, and so on.
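The processing loop of this example can be sketched in Python as follows. It is a simplified model rather than the engine itself: `parse_header_set`, `FIELDS` and `SEP` are hypothetical names, `hasstring` is approximated with a case-sensitive regular-expression search (its third argument is not modelled), and the loop is assumed to stop at the empty line that closes the header.

```python
import re
from typing import List, Tuple

SEP = re.compile(rb"[ \t]*:[ \t\r\n]*")  # same expression as the sepregex attribute
CRLF = b"\r\n"

# (field name, pattern searched in hname), in the order of the <cfield> elements.
FIELDS = [
    ("user_agent", re.compile(rb"User(-)?Agent")),
    ("accept", re.compile(rb"Accept")),
    ("content_type", re.compile(rb"Content-Type")),
    ("content_enc", re.compile(rb"Content-Encoding")),
    ("content_len", re.compile(rb"Content-Length")),
]


def parse_header_set(payload: bytes) -> List[Tuple[str, str, str]]:
    """Return (field name, hname, hvalue) for every header line in the payload."""
    result = []
    offset = 0
    while offset < len(payload):  # stop when no more input data is available
        if payload[offset:offset + 2] == CRLF:
            break  # exit condition: empty line reached
        end = payload.find(CRLF, offset)
        if end == -1:
            break  # truncated line; <missing-packetdata> handling is not modelled
        line = payload[offset:end]  # the trailing CRLF is not part of the value
        parts = SEP.split(line, maxsplit=1)
        hname = parts[0]
        hvalue = parts[1] if len(parts) > 1 else b""
        # First matching entry wins; otherwise fall back to the default item.
        name = next((n for n, rx in FIELDS if rx.search(hname)), "generic_option")
        result.append((name, hname.decode("ascii"), hvalue.decode("ascii")))
        offset = end + len(CRLF)
    return result


PAYLOAD = (
    b"Keep-Alive: timeout=5, max=100\r\n"
    b"Content-Length: 354\r\n"
    b"Content-Type: text/html; charset=iso-8859-1\r\n"
    b"\r\n"
)
decoded = parse_header_set(PAYLOAD)
```

Applied to the fragment above, the sketch yields one `generic_option`, one `content_len` and one `content_type` entry, with the same hname and hvalue portions listed in the table.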
The <choice> element
The <choice> element is meant to define a field that may have different internal structures while still keeping a common external structure. For instance, this is the case of the first line of an HTTP message, which is always a line field but may be either a request line or a status line.
A <choice> element supports the following attributes in addition to the Standard ones:
| Attribute | Description |
|---|---|
| type (required) | It defines the type of the fields that will be present in the choice. It can assume the same values defined for the type attribute of the <field> and <cfield> elements. |
| Other attributes | Other attributes might be present according to the value of the type attribute. See later for more details. |
As may be evident, the <choice> element is very similar to the <set> element, and the two share most of their characteristics (hence also attributes and child elements).
When a <choice> element is encountered in the NetPDL description, the NetPDL engine must be able to process the following data according to the field format indicated in the <choice> element.
For instance, we can imagine this description:
<!-- Defines a choice between two TLV fields; 'Type' is 2 bytes, 'Length' is 1 byte -->
<choice type="tlv" tsize="2" lsize="1">
<cfield name="firsttlv" match="this.type == 1" longname="First TLV"/>
<cfield name="secondtlv" match="this.type == 2" longname="Second TLV"/>
</choice>
The NetPDL engine must recognize the following bytes in the packet payload as a tlv field, and it must be able to access the three portions defined for a TLV field (type, length, value). These portions can be used within the choice to select the correct field, based on the value of the match attribute present in each <cfield> element.
A <choice> element allows the following child elements:
| Element | Description |
|---|---|
| <field> (optional) | This element contains the description of a simple field; it is similar, but not equal, to the base <field> element. It must be present if the <choice> element is supposed to contain simple fields. This element defines the remaining attributes related to the field type that are not specified within the <choice> element, plus some new attributes. |
| <cfield> (optional) | This element contains the description of a complex field; it is similar, but not equal, to the base <cfield> element. It must be present if the <choice> element is supposed to contain complex fields. This element defines the remaining attributes related to the field type that are not specified within the <choice> element, plus some new attributes. |
| <default-item> (optional) | This element defines the default choice, i.e. how to process the current element if the choice does not contain any matching element. |
| <missing-packetdata> (optional) | This element defines how to deal with packet data when the payload appears to be truncated, e.g. when the NetPDL engine works at packet level and the application-layer message spans several packets. If present, this element must be the first child of the parent element. (Fulvio: can we remove this restriction?) This element is identical to the one already defined for other elements such as <if> and <switch>. For more details about this element, please read the <missing-packetdata> section. |
Defining the possible field formats: the <field> and <cfield> elements
This portion of the <choice> section contains the list of <field> or <cfield> elements, which describe the possible choices among the allowed fields. These elements must be of the same type as the <choice> element. In addition to the Standard Attributes, these <field> and <cfield> elements support the following attribute:
| Attribute | Description |
|---|---|
| match (required) | It contains the boolean expression that has to be evaluated at run-time in order to select the correct field format. |
Please remember that these elements MUST NOT contain any attribute that determines the field format (e.g. the field size), because these attributes are inherited from the parent <choice> element. Such attributes, which depend on the type of the fields, must be specified in the parent <choice> element.
Note: only one field can match within a given choice. Therefore, the NetPDL description must not contain two match conditions that are true at the same time.
Default choice for a group of choice elements: the <default-item> element
Fulvio: Note: this element has been renamed from <default-choice> to <default-item>, since it is functionally identical to the one defined for <set> (except that it does not support "recurring").
The <default-item> element defines a default field, i.e. the one that is used when no <field>/<cfield> element matches.
This element supports all the attributes that are allowed in a generic type of field.
The <default-item> element supports all the child elements that are supported in the corresponding <field>/<cfield> whose type attribute is equal to the one specified in the <choice> element.
Fulvio: I am wondering whether to rename these <field> and <cfield> elements to something like <matchfield>, which would let me distinguish this tag from the existing ones. The explanation would be clearer, and it would probably also be simpler to manage the list of attributes allowed in the various cases.
Fabio: allow me to claim authorship of the idea described here because, if you remember, the initial syntax provided several names for this specific element, from <match> to <fieldmatch> (although I had not thought of <matchfield>! ⇒ <fieldmatch>). So this is a return to the origins, feasible without problems other than modifying the current prototype code.
Fulvio: Along the same lines, I am also wondering whether it really makes sense to define <field> and <cfield> as distinct elements, or whether we should rather say "they are all fields" and then, inside nbProtoDb, add to the "field" structure a flag that tells whether the field is intrinsically structured and/or has a value that differs from its binary footprint.
Fabio: to be objective, the code already goes in the direction you outline, simply because at the data-structure level <cfield> and <field> are equivalent (in plain words, I do not add other variables specific to the new set of fields). However, the code changes are fairly substantial considering the duplication of the ADTs, although feasible (it just takes a bit more time to reach a final implementation). Besides, doing so would mean modifying the thesis, right? That gives me grey hair because of LaTeX, but if we have to do it I will not back out. I just need to know soon, because the submission is on the 27th (and I want to be done a few days earlier to avoid mishaps).
Fulvio: The specification says that this element must be the last one in the list. Can we remove this restriction from the implementation?
Fabio: we have already discussed this and agreed that we can.
Example: determining the type of the first line of an HTTP header
This example shows a possible description of the first line of an HTTP header:
<choice type="line">
<cfield match="this[0:4] == 'HTTP'" name="statusline">
<field type="delimited" endregex=" " onmissingend="abort" name="version"/>
<field type="delimited" endregex=" " onmissingend="abort" name="statuscode"/>
<field type="eatall" name="reasonphrase"/>
</cfield>
<default-item name="cmdline">
<field type="delimited" endregex=" " onmissingend="abort" name="method"/>
<field type="delimited" endregex=" " onmissingend="abort" name="url"/>
<field type="eatall" name="version"/>
</default-item>
</choice>
The NetPDL engine will recognize the following data as a line field. Then, it tries to locate a matching field by evaluating the match condition present in each field. If a matching field cannot be found, the <default-item> branch is selected.
Completing this example, the following HTTP Command Line:
GET www.nbee.org HTTP/1.1\r\n
will be processed as follows:
| Field name | Value | Notes |
|---|---|---|
| cmdline | "GET www.nbee.org HTTP/1.1" | The ending sequence "\r\n" is discarded, because this is a line field |
| method | "GET" | |
| url | "www.nbee.org" | |
| version | "HTTP/1.1" | |
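The selection performed by this <choice> can be sketched in Python as follows. The function name `parse_first_line` and the dictionary layout are hypothetical; the sketch assumes that a missing space delimiter aborts the decoding (mirroring onmissingend="abort") and that subfield names follow the NetPDL description above.

```python
from typing import Dict


def parse_first_line(payload: bytes) -> Dict[str, str]:
    """Decode the first line of an HTTP message as a status line or a command line."""
    end = payload.find(b"\r\n")
    if end == -1:
        raise ValueError("no line terminator found")
    line = payload[:end].decode("ascii")  # the CRLF marker is not part of the value
    # Two space-delimited fields followed by one field that takes the rest (eatall).
    parts = line.split(" ", 2)
    if len(parts) < 3:
        raise ValueError("missing delimiter")  # assumption: models onmissingend="abort"
    if line[0:4] == "HTTP":  # match="this[0:4] == 'HTTP'"
        field, names = "statusline", ("version", "statuscode", "reasonphrase")
    else:                    # <default-item>
        field, names = "cmdline", ("method", "url", "version")
    decoded = {"field": field, "value": line}
    decoded.update(zip(names, parts))
    return decoded


request = parse_first_line(b"GET www.nbee.org HTTP/1.1\r\n")
response = parse_first_line(b"HTTP/1.1 404 Not Found\r\n")
```

The command line falls into the default branch because it does not start with "HTTP", while the status line keeps the whole reason phrase (including spaces) in its last subfield.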
Example: different types of parameters within Accept HTTP header field
The official ABNF description of the HTTP Accept header is the following:
Accept = "Accept" ":"
         #( media-range [ accept-params ] )
media-range = ( "*/*" | ( type "/" "*" ) | ( type "/" subtype ))
              *( ";" parameter )
accept-params = ";" "q" "=" qvalue *( accept-extension )
accept-extension = ";" token [ "=" ( token | quoted-string )]
We now focus on the definition of the accept-extension part, which is a token optionally followed by "=" and either another token or a quoted string.
The difference between token and quoted-string is that the latter is a text enclosed between double-quote (") characters.
A possible NetPDL description of this field is the following:
<cfield type="dynamic"
pattern="[\h\v]*;[\h\v]*(?'pname'[^=]+)(=(?'pvalue'[\h\v]+))?[\h\v]*"
name="ext_param">
<subfield portion="dynamic.pname" name="param_name"/>
<subfield portion="dynamic.pvalue" name="param_value">
<if expr="$packet[$currentoffset:1] == '\x22'">
<if-true>
<cfield type="delimited" beginregex="\x22" endregex="\x22"
onmissingbegin="abort" onmissingend="abort"
name="quoted"/>
</if-true>
<if-false>
<field type="eatall" name="token"/>
</if-false>
</if>
</subfield>
</cfield>
An alternative description can make use of the <choice> element as follows:
<cfield type="dynamic"
pattern="[\h\v]*;[\h\v]*(?'pname'[^=]+)(=(?'pvalue'[\h\v]+))?[\h\v]*"
name="ext_param">
<subfield portion="dynamic.pname" name="param_name"/>
<!-- Fulvio: I am puzzled: is "dynamic.pvalue" a type? -->
<!-- Fabio: no, it is actually the choice of a subfield, which we have put on hold in the implementation -->
<choice type="dynamic.pvalue">
<csubfield match="this[0:1] == '\x22'" ctype="delimited" beginregex="\x22" endregex="\x22"
onmissingbegin="abort" onmissingend="abort" name="quoted"/>
<default-item name="token"/>
</choice>
</cfield>
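The distinction between the two value forms can also be sketched with a Python regular expression. This is an illustration of the ABNF rule only, not a translation of the NetPDL pattern above: `TOKEN`, `QUOTED`, `EXT` and `parse_extension` are hypothetical names, the token character class follows the separators listed in RFC 2616, and quoted-pair unescaping is deliberately left out.

```python
import re
from typing import Optional, Tuple

# token: one or more characters that are neither control characters nor separators.
TOKEN = r'[^\x00-\x20\x7f()<>@,;:\\"/\[\]?={}]+'
# quoted-string: text between double quotes, allowing backslash-escaped characters.
QUOTED = r'"(?:[^"\\]|\\.)*"'
EXT = re.compile(rf"\s*;\s*(?P<pname>{TOKEN})(?:\s*=\s*(?P<pvalue>{TOKEN}|{QUOTED}))?")


def parse_extension(text: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (parameter name, value kind, value) or None if the text does not match."""
    m = EXT.match(text)
    if m is None:
        return None
    pvalue = m.group("pvalue")
    if pvalue is None:
        return m.group("pname"), None, None
    if pvalue.startswith('"'):
        # As with a delimited field, the quote markers are not part of the value.
        return m.group("pname"), "quoted", pvalue[1:-1]
    return m.group("pname"), "token", pvalue


token_case = parse_extension("; level=1")
quoted_case = parse_extension(';foo="bar baz"')
bare_case = parse_extension("; flag")
```

The test on the first character of the value plays the same role as the `this[0:1]` check in the description: a leading double quote selects the quoted form, anything else is treated as a token.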
Syntax
Fulvio: Personally, I am always doubtful about including something like this. It greatly inflates the description... but is it needed?
Fabio: If you look at the reply to the previous comment you can see for yourself, while another concrete example is the decoding of the date in the HTTP header fields that carry it as header value (for now solved with a choice on eatall). I do not know whether this is sufficient; removing this descriptive capability only makes writing the final code simpler.
The syntax of a <subfield>-related <choice> element differs from the standard one in the details listed below.
- The "type" attribute of this special <choice> element can assume only the values provided for the "portion" attribute of the <subfield> elements allowed within the <cfield> element that contains the <choice> element itself. Only one among the special <choice> and <subfield> elements can be specified for a given value of the "type" attribute; e.g. in a tlv complex field, neither <subfield type="tlv.value"> nor <csubfield type="tlv.value"> is allowed if a <choice type="tlv.value"> is already defined (though subfields can be listed within the latter).
- The <default-item> element is required, because at least one way to decode a subfield must be provided.
New NetPDL functions
In addition to the functions already available in NetPDL, one new function is added.
number isasn1type(buffer ASNdot1Field, number Class, number TagNumber)
It checks whether the given buffer ASNdot1Field, which contains an ASN.1 field, matches the ASN.1 Class and Tag Number given as second and third parameters (both of which are part of the Type portion of an ASN.1 field). This function returns a non-zero value if the match is positive.
Param ASNdot1Field
Buffer that contains the data in which we want to check the ASN.1 type. It can be, for example, a run-time variable, a portion of the packet buffer, and more.
Param Class
Number that indicates the ASN.1 Class number to check. Current allowed values are the following:
- "0": ASN.1 UNIVERSAL class;
- "1": ASN.1 APPLICATION class;
- "2": ASN.1 CONTEXT-SPECIFIC class;
- "3": ASN.1 PRIVATE class.
Param TagNumber
Number that indicates the ASN.1 Tag Number to check. This parameter must be a positive integer value, otherwise the function will return 'false'.
Returns
'true' if the buffer ASNdot1Field matches the ASN.1 type (i.e. Class and TagNumber parameters), 'false' otherwise.
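A possible reading of this function can be sketched in Python, assuming the field uses the BER identifier-octet layout (class in the two most significant bits, tag number in the five least significant bits, with the multi-octet form when those five bits are all set). The parameter names are adapted to Python, the boolean result stands in for the non-zero/zero return value, and accepting a tag number of zero is an assumption of this sketch.

```python
def isasn1type(asn1_field: bytes, asn1_class: int, tag_number: int) -> bool:
    """Check the Class and Tag Number encoded at the start of an ASN.1 (BER) field."""
    if not asn1_field or asn1_class not in (0, 1, 2, 3) or tag_number < 0:
        return False
    first = asn1_field[0]
    field_class = first >> 6   # bits 8-7: UNIVERSAL, APPLICATION, CONTEXT-SPECIFIC, PRIVATE
    field_tag = first & 0x1F   # bits 5-1: tag number in the single-octet form
    if field_tag == 0x1F:
        # Multi-octet form: 7 bits per following octet, top bit set on all but the last.
        field_tag = 0
        for octet in asn1_field[1:]:
            field_tag = (field_tag << 7) | (octet & 0x7F)
            if not octet & 0x80:
                break
        else:
            return False       # buffer ended before the last tag octet
    return field_class == asn1_class and field_tag == tag_number


is_sequence = isasn1type(b"\x30\x03\x02\x01\x05", 0, 16)  # UNIVERSAL 16 (SEQUENCE)
is_context0 = isasn1type(b"\xa0\x00", 2, 0)               # CONTEXT-SPECIFIC 0
is_app31 = isasn1type(b"\x5f\x1f\x00", 1, 31)             # APPLICATION 31, multi-octet tag
```

In a match expression, such a check would let a description select a field according to its ASN.1 type without decoding the Length and Value portions.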