<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd"[
  <!ENTITY draft-gurbani-sipping-clf PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml3/reference.I-D.draft-gurbani-sipping-clf-01.xml'>

]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes"?>
<?rfc compact="yes" ?>
<?rfc sortrefs="no" ?>
<?rfc symrefs="yes" ?>

<rfc ipr="trust200902">
  <front>
    <title>Binary Syntax for SIP Common Log Format</title>

    <author fullname="Adam Roach" initials="A. B." surname="Roach">
      <organization>Tekelec</organization>

      <address>
        <postal>
          <street>17210 Campbell Rd.</street>
          <street>Suite 250</street>
          <city>Dallas</city>
          <region>TX</region>
          <code>75252</code>
          <country>US</country>
        </postal>
        <email>adam@nostrum.com</email>
      </address>
    </author>

    <date month="May" year="2009" />

    <area>Real Time Applications and Infrastructure</area>

    <abstract>
      <t>
        This document proposes a binary syntax for the SIP
        common log format (CLF). It does not cover semantic issues,
        and is meant to be evaluated in the context of the
        other efforts discussing SIP CLF.
      </t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>
        The Common Log File (CLF) format for the Session Initiation
        Protocol (SIP) <xref target="I-D.gurbani-sipping-clf"/>
        proposes a syntax for logging SIP messages received and
        sent by SIP clients, servers, and proxies. The syntax
        proposed by that document has been inspired by the common
        HTTP log format. However, experience with that format has
        shown that dealing with large quantities of log data can
        be very processor intensive, as doing so necessary requires
        reading and parsing every byte in the log file(s) of interest.
      </t>
      <t>
        This document counter-proposes a format that is no more
        difficult to generate by logging entities, while being
        radically faster to process. In particular, the format
        is optimized for both rapidly scanning through log
        records, as well as quickly locating commonly-accessed
        data fields. Both operations can be performed in constant time
        (as compared with O(n) time associated with the current format,
        where n is the length of the log record).
      </t>
      <t>
        Further, the format proposed by this document retains the
        ability to be read by humans and processed using traditional
        Unix text processing tools, such as sed, awk, perl, cut, and grep.
      </t>

    </section>

    <section title="Format">
      <t>
        Each data record is encoded according to the following
        format. Note that indications of "hexadecimal encoded"
        indicate that the value is to be written out in human-readable
        base-16 numbers using the ASCII characters 0x30 through 0x39
        and 0x41 through 0x46 ('0' through '9' and 'A' through 'F').
        Similarly, indications of "decimal encoded" indicate that the 
        value is to be written out in
        human readable base-10 number using the ASCII characters
        0x30 through 0x39 ('0' through '9'). In both encodings, numbers
        always take up the number of bytes indicated, and are padded on
        the left with ASCII '0' characters to fill the entire space.
      </t>
      <t>
        <figure>
          <artwork><![CDATA[
 0      7 8     15 16    23 24    31
 +--------+--------+--------+--------+
 |Version |       Flags Field        | 0 - 3
 +--------+--------+--------+--------+
 |  0x2C  |      Record Length       | 4 - 7
 +--------+--------+--------+--------+
 |   Record Length (cont)   |  0x2C  | 8 - 11
 +--------+--------+--------+--------+
 |     Server Txn Pointer (Hex)      | 12 - 15
 +--------+--------+--------+--------+
 |      Server Txn Length (Hex)      | 16 - 19
 +--------+--------+--------+--------+
 |     Client Txn Pointer (Hex)      | 20 - 23
 +--------+--------+--------+--------+
 |      Client Txn Length (Hex)      | 24 - 27
 +--------+--------+--------+--------+
 |       Method Pointer (Hex)        | 28 - 31
 +--------+--------+--------+--------+
 |        Method Length (Hex)        | 32 - 35
 +--------+--------+--------+--------+
 |      To Value Pointer (Hex)       | 36 - 39
 +--------+--------+--------+--------+
 |       To Value Length (Hex)       | 40 - 43
 +--------+--------+--------+--------+
 |       To Tag Pointer (Hex)        | 44 - 47
 +--------+--------+--------+--------+
 |        To Tag Length (Hex)        | 48 - 51
 +--------+--------+--------+--------+
 |     From Value Pointer (Hex)      | 52 - 55
 +--------+--------+--------+--------+
 |      From Value Length (Hex)      | 56 - 59
 +--------+--------+--------+--------+
 |      From Tag Pointer (Hex)       | 60 - 63
 +--------+--------+--------+--------+
 |       From Tag Length (Hex)       | 64 - 67
 +--------+--------+--------+--------+
 |       Call-Id Pointer (Hex)       | 68 - 71
 +--------+--------+--------+--------+
 |       Call-Id Length (Hex)        | 72 - 75
 +--------+--------+--------+--------+
 |      TLV Start Pointer (Hex)      | 76 - 79
 +--------+--------+--------+--------+
 |  0x0A  |                          | 80 - 83
 +--------+                          +
 |            Date/Time              | 84 - 87
 +                          +--------+
 |                          |  0x2E  | 88 - 91
 +--------+--------+--------+--------+
 |        Fractional Seconds         | 92 - 95
 +                 +--------+--------+
 |                 |  0x09  |        | 96 - 99
 +--------+--------+--------+        +
 |                                   | 100 - 103
 +               CSeq                +
 |                                   | 104 - 107
 +        +--------+--------+--------+
 |        |  0x09  |    Response     | 108 - 111
 +--------+--------+--------+--------+
 |Response|  0x09  |                 | 112 - 115
 +--------+--------+                 +
 |                                   | 
 |                                   | 
 |         Mandatory Fields          | 
 |                                   | 
 |                                   | 
 +--------+--------+--------+--------+ \
 |  0x09  |        Tag (Hex)         |  \
 +--------+--------+--------+--------+   \
 |Tag,cont|  0x2C  |  Length (Hex)   |    \   Repeated as
 +--------+--------+--------+--------+     >  many times
 |  Length (cont)  |  0x2C  |        |    /   as necessary
 +--------+--------+--------+        +   /
 |              Value                |  /
 +--------+--------+--------+--------+ /
 |  0x0A  |
 +--------+
]]></artwork>
        </figure>
      </t>
      <t>
        First, an 80-byte header indicates meta-data about
        the record. Note that the field lengths encoded in the
        header do not include the ASCII tab characters used to
        separate fields from each other.
      </t>
      <t>
        <list style="hanging">
          <t hangText="Version (1 byte):"> 0x41 for this document
          <vspace blankLines="1"/></t>

          <t hangText="Flags Field (3 bytes):">
            <list style="hanging">
              <t hangText="byte 1 - "> Request/Response flag
                  (R = request, r = response)</t>

              <t hangText="byte 2 - "> Retransmission flag
                  (o = original transmission; d = duplicate transmission;
                   s = server is stateless [i.e., retransmissions are
                   not detected])</t>

              <t hangText="byte 3 - "> Sent/Received flag
                  (r = message received, s = message sent)</t>
            </list>
          <vspace blankLines="1"/></t>

          <t hangText="Record Length (6 bytes):"> Hexadecimal-Encoded Total
          length of this log record, including "Flags" and "Record Length" fields,
          and terminating line-feed<vspace blankLines="1"/></t>

         </list>

            Bytes 12 through 72 contain hexadecimal-encoded
            pointer/length pairs that point
            to the values of variable-length mandatory fields. The
            "Pointer" fields indicate absolute byte values within the
            record, and must be >= 103. They point to the start
            of the corresponding value within the "Mandatory Fields"
            area. The "Length" fields indicate the length of the
            corresponding value. 
            The final
            pointer, "TLV Start Pointer," points to the ASCII Tab
            (0x09) character for the first entry in the Tag/Length/Value
            area; if no such entries are present, this value is set to
            zero.
        </t>
        <t>
            Note that the "Length" fields do
            not include the tab delimiters between fields. 
            Further note that there are no delimiters between
            these pointer/length values -- they are packed together
            as a single, 68-character hexadecimal encoded string.
        </t>
        <t>
          Following the pointer/length pairs, several fixed-length
          fields are encoded. As before, all fields are completely
          filled, pre-pending values with '0' characters as necessary.
          <vspace blankLines="1"/>

        <list style="hanging">

          <t hangText="Date/Time (10 bytes):"> Seconds since midnight, January 1st,
          1970, GMT; decimal encoded<vspace blankLines="1"/></t>

          <t hangText="Fraction Seconds (6 bytes):"> Microseconds since
          the time in Date/Time field; decimal encoded<vspace blankLines="1"/></t>

          <t hangText="CSeq Number (10 bytes):"> CSeq number from the SIP message; 
          decimal encoded<vspace blankLines="1"/></t>

          <t hangText="Response Code (3 bytes):"> Set to the value of the response
          code for responses. Set to 0 for requests. Decimal encoded.
          <vspace blankLines="1"/></t>

          <t hangText="Mandatory Field Data:"> Contains actual values for the
          mandatory fields. This data must appear in the order listed, 
          and each field must be present. Fields are separated by a
          single ASCII Tab character (0x09). Any tab characters present
          in the data to be written will be replaced by an ASCII space
          character (0x20) prior to being logged.
          <vspace blankLines="1"/></t>

         <t><list style="hanging">

          <t hangText="Server Txn:">  The transaction identifier
             associated with the server transaction.  Implementations MAY reuse
             the server transaction identifier (the topmost branch-id of the
             incoming request, with or without the magic cookie), or they MAY
             generate a unique identification string for a server transaction
             (this identifier needs to be locally unique to the server only.)
             This identifier is used to correlate ACKs and CANCELs to an INVITE
             transaction; it is also used to aid in forking.
             <vspace blankLines="1"/></t>

          <t hangText="Client Txn:"> This field is used to
             associate client transactions with a server transaction for
             forking proxies or B2BUAs.<vspace blankLines="1"/></t>

          <t hangText="Method:"> In requests, the method from the start line. 
          In responses, the method found in the CSeq header field.
          <vspace blankLines="1"/></t>

          <t hangText="To Value:"> Value of the To header field, possibly 
          with the tag parameter removed. (Whether to remove the tag parameter is
          left up to the logging entity).<vspace blankLines="1"/></t>

          <t hangText="To Tag:"> Value of the To header field tag parameter. If 
          no To header field tag parameter is present, the pointer field
          is ignored; the length field is set to 0; and the field in the
          mandatory section is encoded as a single ASCII dash (0x2D).
          <vspace blankLines="1"/></t>

          <t hangText="From Value:"> Value of the From header field, possibly 
          with the tag parameter removed. (Whether to remove the tag parameter is
          left up to the logging entity)<vspace blankLines="1"/></t>

          <t hangText="From Tag:"> Value of the From header field tag parameter.
          <vspace blankLines="1"/></t>

          <t hangText="Call-Id:"> The value of the Call-ID header field
          <vspace blankLines="1"/></t>

         </list></t>
         </list></t>
         <t>
            After the "Mandatory Fields" section, 
            Tag/Length/Value groups appear zero or more times. The location
            within the log record is indicated by the "TLV Start Ptr" field. 
            They are used to log information that is not mandatory for all 
            messages (although specific TLVs are mandatory in request logs).
          <vspace blankLines="1"/>
         </t>
         <t><list style="hanging">

          <t hangText="Tag Field (4 bytes):"> indicates the type of value coded
         by this TLV; hexadecimal encoded.
         Currently defined tags are: <vspace blankLines="1"/>
            <list style="hanging">

              <t hangText="0x0000 -"> Contact value (can be repeated)<vspace/>
                  Contains entire value of Contact header field<vspace blankLines="1"/></t>

              <t hangText="0x0001 -"> Request URI (mandatory in request)<vspace/>
                  Contains Request URI in start line<vspace blankLines="1"/></t>

              <t hangText="0x0002 -"> Remote Host (mandatory in request)<vspace/>
                  The DNS name of IP address from which the
                  message was received (if "sent/received flag" is 0)
                  of the IP address to which the message is being
                  send (if "sent/received flag" is 1)<vspace blankLines="1"/></t>

              <t hangText="0x0003 -"> Authenticated User<vspace/>
                  Contains the user name by which the user has
                  been authenticated<vspace blankLines="1"/></t>

              <t hangText="0x0004 -"> Complete SIP Message (optional, should be omitted by default)<vspace/>
                  Contains complete SIP message. Can be repeated
                  multiple times to accommodate SIP messages that
                  exceed 65535 bytes in length.<vspace blankLines="1"/></t>
            </list>
          <vspace blankLines="1"/></t>

          <t hangText="Length Field (2 bytes):"> indicates the length of the value
         coded in this TLV, hexadecimal encoded. This length does NOT include the TLV
         header.<vspace blankLines="1"/></t>

          <t hangText="Value Field (0 to 65535 bytes):"> contains the actual
          value of this TLV. As with the mandatory fields, ASCII Tab characters
          (0x09) are replaced with ASCII space characters (0x20).
          <vspace blankLines="1"/></t>
        </list>
      <vspace blankLines="1"/>
        </t>
    </section>

    <section title="Example Record">
      <t>
        The following demonstrates approximately how a single log record
        appears in a logging file.
        Due to internet-draft conventions, this log entry 
        has been split into ten lines, instead of the two lines
        that actually appear in a log file; and the tab characters
        have been padded out using spaces to simulate their appearance
        in a text terminal.
        <figure>
          <artwork><![CDATA[
ARor,000181,
0072000D00800000008100060088002000A9000600B0002500D6000A00E100270108

1241708241.308241	0000000187	000	7yuz67jhyi9-9	-
INVITE	Bob <sip:bob@biloxi.example.com>	314159
Alice <sip:alice@atlanta.example.com>	9fxced76sl
3848276298220188511@atlanta.example.com
0000,0034,<sip:alice@client.atlanta.example.com;transport=tcp>
0001,001A,sip:bob@biloxi.example.com
0002,000C,192.168.9.12
]]></artwork>
        </figure>
        A uuencoded version of this log entry (without the
        changes required to format it for an internet-draft)
        follows.
        <figure>
          <artwork><![CDATA[
begin 644 sip-clf.txt
M05)O<BPP,#`Q.#$L,#`W,C`P,$0P,#@P,#`P,#`P.#$P,#`V,#`X.#`P,C`P
M,$$Y,#`P-C`P0C`P,#(U,#!$-C`P,$$P,$4Q,#`R-S`Q,#@*,3(T,3<P.#(T
M,2XS,#@R-#$),#`P,#`P,#$X-PDP,#`)-WEU>C8W:FAY:3DM.0DM"4E.5DE4
M10E";V(@/'-I<#IB;V)`8FEL;WAI+F5X86UP;&4N8V]M/@DS,30Q-3D)06QI
M8V4@/'-I<#IA;&EC94!A=&QA;G1A+F5X86UP;&4N8V]M/@DY9GAC960W-G-L
M"3,X-#@R-S8R.3@R,C`Q.#@U,3%`871L86YT82YE>&%M<&QE+F-O;0DP,#`P
M+#`P,S0L/'-I<#IA;&EC94!C;&EE;G0N871L86YT82YE>&%M<&QE+F-O;3MT
M<F%N<W!O<G0]=&-P/@DP,#`Q+#`P,4$L<VEP.F)O8D!B:6QO>&DN97AA;7!L
=92YC;VT),#`P,BPP,#!#+#$Y,BXQ-C@N.2XQ,@H`
`
end
]]></artwork>
        </figure>
        
      </t>
    </section>

    <section title="Text Tool Considerations">
      <t>
        This format has been designed to allow text tools to easily
        process logs without needing to understand the indexing
        format. Index lines may be rapidly discarded by checking
        the first character of the line: index lines will always
        start with an alphabetical character, while field lines
        will start with a numerical character.
      </t>
      <t>
        Within a field line, script tools can quickly split fields
        at the tab characters. The first 11 fields are positional,
        and the meaning of any subsequent fields can be determined
        by checking the first four characters of the field.
        Alternately, these non-positional fields can be located
        using a regular expression. For example, the "Request URI"
        in a request can be found by searching for the perl regex
        /\t0001,....,([^\t]*)/.
      </t>
      <t>
        Note also that requests can be distinguished from responses
        by checking the third positional field -- for requests, 
        it will always be set to "000"; any other value indicates 
        a response.
      </t>
    </section>

</middle>

  <back>
    <references title="Normative References">
      &draft-gurbani-sipping-clf;
    </references>
    <section title="Acknowledgements">
      <t>
        Cullen put me up to this.
      </t>
      <t>
        Tom Taylor suggested the technique
        of combining the length field structure from the binary
        format with the human-readable ASCII format to allow both
        rapid processing by advanced tools, and easy processing by
        simpler, text-centric tools. Dean Willis suggested the use
        of tab delimiters as a means to avoid the need to escape values
        within a field. Vijay Gurbani provided significant feedback,
        and wrote the original proof-of-concept program which was 
        adapted to produce the examples in this document.
      </t>
    </section>
  </back>
</rfc>
