<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [ 
  <!ENTITY rfc3261 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml'>
  <!ENTITY rfc5390 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.5390.xml'>
  <!ENTITY rfc4412 PUBLIC '' 'http://xml.resource.org/public/rfc/bibxml/reference.RFC.4412.xml'>
]>
<rfc ipr='trust200902' category='info' docName='draft-ietf-sipping-overload-design-01'>

<?rfc toc='yes'?>
<?rfc compact='yes'?>
<?rfc sortrefs='yes'?>

<front>
  <title abbrev='Overload Control'>Design Considerations for Session
  Initiation Protocol (SIP) Overload Control</title> 

  <author initials='V.H.' surname='Hilt (Ed.)' fullname='Volker Hilt (Ed.)'>
    <organization>Bell Labs/Alcatel-Lucent</organization>
    <address>
      <postal>
	<street>791 Holmdel-Keyport Rd</street>
	<city>Holmdel</city> <region>NJ</region>
	<code>07733</code>
	<country>USA</country>
      </postal> 
      <email>volkerh@bell-labs.com</email>
    </address>
  </author>

  <date month='March' year='2009' />
  <area>Real-time Applications and Infrastructure</area>
  <workgroup>SIPPING Working Group</workgroup>
  <keyword>SIP</keyword>
  <keyword>Overload Control</keyword>
  <abstract>
    <t>Overload occurs in Session Initiation Protocol (SIP) networks when
    SIP servers have insufficient resources to handle all SIP messages
    they receive. Even though the SIP protocol provides a limited
    overload control mechanism through its 503 (Service Unavailable)
    response code, SIP servers are still vulnerable to overload. This
    document discusses models and design considerations for a SIP
    overload control mechanism.</t> 
  </abstract>
</front>

<middle>

  <section title="Introduction">

    <t>As with any network element, a Session Initiation Protocol
    (SIP) <xref target="RFC3261" /> server can suffer from overload
    when the number of SIP messages it receives exceeds the number of
    messages it can process. Overload occurs if a SIP server does not
    have sufficient resources to process all incoming SIP
    messages. These resources may include CPU, memory, network
    bandwidth, input/output, or disk resources. </t>

    <t>Overload can pose a serious problem for a network of SIP
    servers. During periods of overload, the throughput of a network
    of SIP servers can be significantly degraded. In fact, overload
    may lead to a situation in which the throughput drops down to a
    small fraction of the original processing capacity. This is often
    called congestion collapse.</t> 

    <t>An overload control mechanism enables a SIP server to perform
    close to its capacity limit during times of overload. Overload
    control is used by a SIP server if it is unable to process all SIP 
    requests due to resource constraints. There are other failure
    cases in which a SIP server can successfully process incoming
    requests but has to reject them for other reasons. For example, a
    PSTN gateway that runs out of trunk lines but still has plenty of
    capacity to process SIP messages should reject incoming INVITEs
    using a 488 (Not Acceptable Here) response <xref target="RFC4412"
    />. Similarly, a SIP registrar that has lost connectivity to its
    registration database but is still capable of processing SIP
    messages should reject REGISTER requests with a 500 (Server Internal
    Error) response <xref target="RFC3261" />. Overload control mechanisms do
    not apply in these cases and SIP provides appropriate response
    codes for them.</t>

    <t>The SIP protocol provides a limited mechanism for overload
    control through its 503 (Service Unavailable) response code and
    the Retry-After header. However, this mechanism cannot prevent
    overload of a SIP server and it cannot prevent congestion
    collapse. In fact, it may cause traffic to oscillate and to shift
    between SIP servers and thereby worsen an overload condition. A
    detailed discussion of the SIP overload problem, the problems with
    the 503 (Service Unavailable) response code and the Retry-After
    header and the requirements for a SIP overload control mechanism
    can be found in <xref target="RFC5390" />.</t>

    <t>This document discusses the models, assumptions and design
    considerations for a SIP overload control mechanism. The document
    is a product of the SIP overload control design team. </t> 

  </section>

<!--
  <section title="Terminology">

    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
    "OPTIONAL" in this document are to be interpreted as described in
    <xref target="RFC2119">RFC 2119</xref>.</t>

  </section>
-->

  <section title="SIP Overload Problem">

    <t>A key contributor to the SIP congestion collapse <xref
    target="RFC5390" /> is the 
    regenerative behavior of overload in the SIP protocol. When SIP is
    running over the UDP protocol, it will retransmit messages that
    were dropped by a SIP server due to overload and thereby increase
    the offered load for the already overloaded server. This increase
    in load worsens the severity of the overload condition and, in
    turn, causes more messages to be dropped. Congestion collapse
    can result (see <xref target="Noel et al." />, <xref target="Shen et al." />
    and <xref target="Hilt et al." />).</t> 
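
    <t>The degree of regeneration can be estimated from the INVITE
    client transaction timers in <xref target="RFC3261" />. The
    following Python sketch is illustrative only. It assumes UDP
    transport, the default T1 of 500 ms, and a server that never
    responds; the function name is hypothetical.</t>

    <figure>
<artwork><![CDATA[
T1 = 0.5  # seconds; default T1 value (assumed here)


def invite_transmission_times(t1=T1):
    """Times (in seconds) at which an unanswered INVITE is sent."""
    timeout = 64 * t1      # Timer B: the transaction gives up
    times = [0.0]          # initial transmission
    interval = t1          # Timer A starts at T1 and doubles
    t = interval
    while t < timeout:
        times.append(t)
        interval *= 2
        t += interval
    return times


times = invite_transmission_times()
print(times)
print(len(times))
]]></artwork>
    </figure>

    <t>Under these assumptions, a single unanswered INVITE is
    transmitted seven times before the transaction times out after 32
    seconds, so every dropped request adds further load to the already
    overloaded server.</t>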

    <t>Regenerative behavior under overload should ideally be avoided
    by any protocol as this would lead to stable operation under
    overload. However, this is often difficult to achieve in
    practice. For example, changing the SIP retransmission timer
    mechanisms can reduce the degree of regeneration during overload
    but will impact the ability of SIP to recover from message
    losses. Without any retransmission, each message that is dropped
    due to SIP server overload will eventually lead to a failed
    call.</t> 

    <t>For a SIP INVITE transaction to be successful a minimum of
    three messages need to be forwarded by a SIP server. Often an
    INVITE transaction consists of five or more SIP messages. If a SIP
    server under overload randomly discards messages without
    evaluating them, the chances that all messages belonging 
    to a transaction are successfully forwarded will decrease as the
    load increases. Thus, the number of transactions that complete
    successfully will decrease even if the message throughput of a
    server remains up and assuming the overload behavior is fully
    non-regenerative. A SIP server might (partially) parse incoming
    messages to determine if it is a new request or a message
    belonging to an existing transaction. However, after having spent
    resources on parsing a SIP message, discarding this message is
    expensive as the resources already spent are lost. The number of
    successful transactions will therefore decline with an increase in
    load as fewer and fewer resources can be spent on forwarding
    messages and more and more resources are consumed by inspecting
    messages that will eventually be dropped. The slope of the decline
    depends on the amount of resources spent to inspect each message.</t> 
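
    <t>The effect of random discarding on transaction completion can
    be illustrated with a short calculation. The Python sketch below
    is a simplification: it assumes that every message is dropped
    independently with the same probability and that no message is
    retransmitted. The function name is hypothetical.</t>

    <figure>
<artwork><![CDATA[
def completion_probability(drop_prob, messages):
    """Probability that all messages of one transaction get through."""
    return (1.0 - drop_prob) ** messages


# Compare a three-message and a five-message INVITE transaction.
for drop_prob in (0.1, 0.3, 0.5):
    three = completion_probability(drop_prob, 3)
    five = completion_probability(drop_prob, 5)
    print(drop_prob, round(three, 3), round(five, 3))
]]></artwork>
    </figure>

    <t>Because the per-message success probability is raised to the
    number of messages in the transaction, the fraction of completed
    transactions falls faster than the fraction of forwarded
    messages.</t>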

    <t>Another key challenge for SIP overload control is that the rate
    of the true traffic source usually cannot be controlled. Overload
    is often caused by a large number of UAs each of which creates
    only a single message. These UAs cannot be rate controlled as they
    only send one message. However, the sum of their traffic can
    overload a SIP server. </t>  

  </section>

  <section title="Explicit vs. Implicit Overload Control">

    <t>The main difference between explicit and implicit overload
    control is the way overload is signaled from a SIP server that is
    reaching an overload condition to its upstream neighbors.</t>

    <t>In an explicit overload control mechanism, a SIP server uses an
    explicit overload signal to indicate that it is reaching its
    capacity limit. Upstream neighbors receiving this signal can
    adjust their transmission rate as indicated by the overload signal
    to a level that is acceptable to the downstream server. The
    overload signal enables a SIP server to steer the load it is
    receiving to a rate at which it can perform at maximum 
    capacity.</t>

    <t>Implicit overload control uses the absence of responses and
    packet loss as an indication of overload. A SIP server that is
    sensing such a condition reduces the load it is forwarding to a
    downstream neighbor. Since there is no explicit overload signal,
    this mechanism is robust as it does not depend on actions taken by
    the SIP server running into overload.</t>
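
    <t>One way a sending server could react to implicit overload
    indications is sketched below in Python. The reaction shown
    (halving the rate on a missing response and raising it by a fixed
    step on each response) is an assumption chosen for illustration
    and is not prescribed by this document. All names are
    hypothetical.</t>

    <figure>
<artwork><![CDATA[
class ImplicitThrottle:
    """Adjust the forwarding rate from implicit overload indications."""

    def __init__(self, max_rate, min_rate=1.0, step=1.0):
        self.max_rate = max_rate   # requests per second, upper bound
        self.min_rate = min_rate   # never throttle below this rate
        self.step = step           # recovery step per response
        self.rate = max_rate

    def on_timeout(self):
        # A missing response is treated as a sign of overload.
        self.rate = max(self.min_rate, self.rate / 2.0)

    def on_response(self):
        # A response suggests the neighbor is keeping up.
        self.rate = min(self.max_rate, self.rate + self.step)


throttle = ImplicitThrottle(max_rate=100.0)
throttle.on_timeout()
throttle.on_timeout()
throttle.on_response()
print(throttle.rate)
]]></artwork>
    </figure>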

    <t>The ideas of explicit and implicit overload control are in fact
    complementary. By considering implicit overload indications a
    server can avoid overloading an unresponsive downstream
    neighbor. An explicit overload signal enables a SIP server to
    actively steer the incoming load to a desired level.</t>  

  </section>

  <section title="System Model" anchor="sec:model">

    <t>The model shown in <xref target="fig:archa" /> identifies
    fundamental components of an explicit SIP overload control
    mechanism:</t>

    <t><list style='hanging'>
      <t hangText="SIP Processor:">The SIP Processor processes SIP
      messages and is the component that is protected by overload
      control.</t>

      <t hangText="Monitor:">The Monitor measures the current load of
      the SIP processor on the receiving entity. It implements the
      mechanisms needed to determine the current usage of resources
      relevant for the SIP processor and reports load samples (S) to
      the Control Function.</t>

      <t hangText="Control Function:">The Control Function implements
      the overload control algorithm. The control function uses the
      load samples (S) and determines if overload has occurred and a
      throttle (T) needs to be set to adjust the load sent to the SIP
      processor on the receiving entity. The control function on the
      receiving entity sends load feedback (F) to the sending
      entity.</t>

      <t hangText="Actuator:">The Actuator implements the algorithms
      needed to act on the throttles (T) and ensures that the amount
      of traffic forwarded to the receiving entity meets the criteria
      of the throttle. For example, a throttle may instruct the
      Actuator to not forward more than 100 INVITE messages per
      second. The Actuator implements the algorithms to achieve this
      objective, e.g., using message gapping. It also implements
      algorithms to select the messages that will be affected and
      determine whether they are rejected or redirected.</t>

    </list></t>
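
    <t>As an illustration, a throttle of 100 INVITE messages per
    second could be enforced with message gapping roughly as shown in
    the following Python sketch. The class and its parameters are
    hypothetical, and the arrival time is passed in explicitly (in
    milliseconds) to keep the example deterministic.</t>

    <figure>
<artwork><![CDATA[
class GappingActuator:
    """Forward at most one request per gap interval."""

    def __init__(self, max_per_second):
        # Minimum spacing between forwarded requests, in milliseconds.
        self.gap_ms = 1000.0 / max_per_second
        self.next_allowed_ms = 0.0

    def admit(self, now_ms):
        """Return True if a request arriving at now_ms is forwarded."""
        if now_ms >= self.next_allowed_ms:
            self.next_allowed_ms = now_ms + self.gap_ms
            return True
        return False  # the caller rejects or redirects this request


# 200 INVITEs arrive evenly spaced over one second; throttle is 100/s.
actuator = GappingActuator(max_per_second=100)
forwarded = sum(actuator.admit(t) for t in range(0, 1000, 5))
print(forwarded)
]]></artwork>
    </figure>

    <t>In this run the Actuator forwards every second request. Which
    of the remaining requests are rejected and which are redirected is
    a separate policy decision that the sketch leaves to the
    caller.</t>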

    <t>The type of feedback (F) conveyed from the receiving to the
    sending entity depends on the overload control method used
    (i.e., loss-based, rate-based, window-based or signal-based
    overload control; see <xref target="sec:method" />), the overload
    control algorithm (see <xref target="sec:algorithm" />) as well as
    other design parameters. The feedback (F) enables the sending
    entity to adjust the amount of traffic forwarded to the receiving
    entity to a level that is acceptable to the receiving entity
    without causing overload.</t>

    <t><xref target="fig:archa" /> depicts a general system model for
    overload control. In this diagram, one instance of the control
    function is on the sending entity (i.e., associated with the
    actuator) and one is on the receiving entity (i.e., associated with
    the monitor). However, a specific mechanism may not require both
    elements. In this case, one of the two control function elements can
    be empty and simply pass along feedback. For example, if (F) is defined
    as a loss-rate (e.g., reduce traffic by 10%), there is no need for
    a control function on the sending entity as the content of (F) can
    be copied directly into (T).</t> 
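
    <t>The loss-rate case can be sketched in Python as follows. This
    is a minimal illustration, not a specified algorithm: the class
    name is hypothetical, the loss-rate is expressed as an integer
    percentage, and a deterministic credit counter is used in place of
    random selection so that the result is reproducible.</t>

    <figure>
<artwork><![CDATA[
class LossActuator:
    """Drop a fixed percentage of requests, as requested in (F)."""

    def __init__(self):
        self.drop_percent = 0
        self.credit = 0

    def set_throttle(self, drop_percent):
        # (F) is copied into (T) unchanged; the sending entity needs
        # no control function of its own.
        self.drop_percent = drop_percent

    def admit(self):
        """Return True to forward the next request, False to drop it."""
        self.credit += self.drop_percent
        if self.credit >= 100:
            self.credit -= 100
            return False
        return True


actuator = LossActuator()
actuator.set_throttle(10)  # "reduce traffic by 10%"
forwarded = sum(actuator.admit() for _ in range(1000))
print(forwarded)
]]></artwork>
    </figure>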

    <t>The model in <xref target="fig:archa" /> shows a scenario with
    one sending and one receiving entity. In a more realistic scenario
    a receiving entity will receive traffic from multiple sending
    entities and vice versa (see <xref target="sec:topologies"
    />). The feedback generated by a Monitor will therefore often be
    distributed across multiple Actuators. A Monitor needs to be able
    to split the load it can process across multiple sending entities
    and generate feedback that correctly adjusts the load each sending
    entity is allowed to send. Similarly, an Actuator needs to be
    prepared to receive different levels of feedback from different
    receiving entities and throttle traffic to these entities
    accordingly.</t>

    <figure title="System Model for Explicit Overload Control" anchor="fig:archa">
<artwork><![CDATA[
       Sending                Receiving 
        Entity                  Entity
  +----------------+      +----------------+    
  |    Server A    |      |    Server B    | 
  |  +----------+  |      |  +----------+  |    -+
  |  | Control  |  |  F   |  | Control  |  |     | 
  |  | Function |<-+------+--| Function |  |     | 
  |  +----------+  |      |  +----------+  |     |
  |     T |        |      |       ^        |     | Overload 
  |       v        |      |       | S      |     | Control
  |  +----------+  |      |  +----------+  |     |
  |  | Actuator |  |      |  | Monitor  |  |     | 
  |  +----------+  |      |  +----------+  |     |
  |       |        |      |       ^        |    -+
  |       v        |      |       |        |    -+
  |  +----------+  |      |  +----------+  |     | 
<-+--|   SIP    |  |      |  |   SIP    |  |     |  SIP
--+->|Processor |--+------+->|Processor |--+->   | System
  |  +----------+  |      |  +----------+  |     | 
  +----------------+      +----------------+    -+

 ]]></artwork>
    </figure>

  </section>

  <section title="Degree of Cooperation">

    <t>A SIP request is usually processed by more than one SIP
    server on its path to the destination. Thus, a design choice for
    an explicit overload control mechanism is where to place the
    components of overload 
    control along the path of a request and, in particular, where to
    place the Monitor and Actuator. This design choice determines the
    degree of cooperation between the SIP servers on the
    path. Overload control can be implemented hop-by-hop with the
    Monitor on one server and the Actuator on its direct upstream
    neighbor. Overload control can be implemented end-to-end with
    Monitors on all SIP servers along the path of a request and an
    Actuator on the sender. In this case, the Control Functions
    associated with each Monitor have to cooperate to jointly
    determine the overall feedback for this path. Finally,
    overload control can be implemented locally on a SIP server if
    Monitor and Actuator reside on the same server. In this case, the
    sending entity and receiving entity are the same SIP server and
    Actuator and Monitor operate on the same SIP processor (although
    the Actuator typically operates on a pre-processing stage in local
    overload control). Local overload control is an internal overload
    control mechanism as the control loop is implemented internally on
    one server. Hop-by-hop and end-to-end are external overload
    control mechanisms. All three configurations are shown in <xref
    target="fig:hbh-e2e" />.</t>  

    <figure title="Degree of Cooperation between Servers" anchor="fig:hbh-e2e">
<artwork><![CDATA[

            +---------+             +------(+)---------+
   +------+ |         |             |       ^          | 
   |      | |        +---+          |       |         +---+
   v      | v    //=>| C |          v       |     //=>| C | 
+---+    +---+ //    +---+       +---+    +---+ //    +---+ 
| A |===>| B |                   | A |===>| B |     
+---+    +---+ \\    +---+       +---+    +---+ \\    +---+
            ^    \\=>| D |          ^       |     \\=>| D |
            |        +---+          |       |         +---+  
            |         |             |       v          |   
            +---------+             +------(+)---------+    

      (a) hop-by-hop                   (b) end-to-end

                      +-+    
                      v |    
 +-+      +-+        +---+   
 v |      v |    //=>| C |   
+---+    +---+ //    +---+   
| A |===>| B |               
+---+    +---+ \\    +---+   
                 \\=>| D |   
                     +---+   
                      ^ |    
                      +-+    

        (c) local            

 ==> SIP request flow
 <-- Overload feedback loop

 ]]></artwork>
    </figure>

    <section title="Hop-by-Hop">

      <t>The idea of hop-by-hop overload control is to instantiate a
      separate control loop between all neighboring SIP servers that
      directly exchange traffic. I.e., the Actuator is located on the
      SIP server that is the direct upstream neighbor of the SIP
      server that has the corresponding Monitor. Each control loop
      between two servers is completely independent of the control
      loop between other servers further up- or downstream. In the
      example in <xref target="fig:hbh-e2e" />(a), three independent
      overload control loops are instantiated: A - B, B - C and B -
      D. Each loop only controls a single hop. Overload feedback
      received from a downstream neighbor is not forwarded further
      upstream. Instead, a SIP server acts on this feedback, for 
      example, by rejecting SIP messages if needed. If the
      upstream neighbor of a server also becomes overloaded, it will
      report this problem to its upstream neighbors, which again take
      action based on the reported feedback. Thus, in hop-by-hop
      overload control, overload is always resolved by the direct
      upstream neighbors of the overloaded server without the need to
      involve entities that are located multiple SIP hops away.</t> 

      <t>Hop-by-hop overload control reduces the impact of overload on
      a SIP network and can avoid congestion collapse. It is simple and scales well to
      networks with many SIP entities. An advantage is that it does
      not require feedback to be transmitted across multiple hops,
      possibly crossing multiple trust domains. Feedback is sent to 
      the next hop only. Furthermore, it does not require a SIP entity
      to aggregate a large number of overload status values or keep
      track of the overload status of SIP servers it is not
      communicating with. </t>

    </section>

    <section title="End-to-End">

      <t>End-to-end overload control implements an overload control
      loop along the entire path of a SIP request, from UAC to UAS. An
      end-to-end overload control mechanism consolidates overload
      information from all SIP servers on the way (including all
      proxies and the UAS) and uses this information to throttle  
      traffic as far upstream as possible. An end-to-end overload
      control mechanism has to be able to frequently collect the
      overload status of all servers on the potential path(s) to a
      destination and combine this data into meaningful overload
      feedback.</t>

      <t>A UA or SIP server only throttles requests if it
      knows that these requests will eventually be forwarded to an
      overloaded server. For example, if D is overloaded in <xref
      target="fig:hbh-e2e" />(b), A should only throttle requests it
      forwards to B when it knows that they will be forwarded to D. It
      should not throttle requests that will eventually be forwarded
      to C, since server C is not overloaded. In many cases, it is
      difficult for A to determine which requests will be routed to C
      and D since this depends on the local routing decision made by
      B. These routing decisions can be highly variable and, for
      example, depend on call routing policies configured by the user,
      services invoked on a call, load balancing policies,
      etc. The fact that a previous message to a target has been routed
      through an overloaded server does not necessarily mean the next
      message to this target will also be routed through the same
      server.</t>

      <t>The main problem of end-to-end overload control is its
      inherent complexity since a UAC or SIP server needs to monitor all 
      potential paths to a destination in order to determine which
      requests should be throttled and which requests may be
      sent. Even if this information is available, it is not clear
      which path a specific request will take.</t>

      <t>A variant of end-to-end overload control is to implement a
      control loop between a set of well-known SIP servers along
      the path of a SIP request. For example, an overload control loop
      can be instantiated between a server that only has one
      downstream neighbor or a set of closely coupled SIP servers. A
      control loop spanning multiple hops can be used if the sending
      entity has full knowledge about the SIP servers on the path of a
      SIP message.</t>

      <t>A key difference to transport protocols using end-to-end
      congestion control such as TCP is that the traffic exchanged
      between SIP servers consists of many individual SIP
      messages. Each of these SIP messages has its own source and
      destination. Even SIP messages containing identical SIP URIs
      (e.g., a SUBSCRIBE and an INVITE message to the same SIP URI) can
      be routed to different destinations. This is different from TCP
      which controls a stream of packets between a single source and a
      single destination. </t>

    </section>

    <section title="Local Overload Control"> 
      
      <t>The idea of local overload control (see
      <xref target="fig:hbh-e2e" />(c)) is to run the Monitor and
      Actuator on the same server. This enables the server to monitor
      the current resource usage and to reject messages that can't be
      processed without overusing the local resources. The fundamental
      assumption behind local overload control is that it is less
      resource consuming for a server to reject messages than to
      process them. A server can therefore reject the excess messages
      it cannot process to stop all retransmissions of these
      messages. </t> 

      <t>Local overload control can be used in conjunction with
      other overload control mechanisms and provides an additional
      layer of protection against overload. It is fully implemented on
      the local server and does not require any cooperation from
      upstream neighbors. In general, SIP servers should apply
      implicit or explicit overload control techniques to control load
      before a local overload control mechanism is activated as a
      mechanism of last resort.</t>

    </section>

  </section>

  <section title="Topologies" anchor="sec:topologies">

    <t>The following topologies describe four generic SIP server 
    configurations. These topologies illustrate specific challenges
    for an overload control mechanism. An actual SIP server topology
    is likely to consist of combinations of these generic
    scenarios.</t>

    <t>In the "load balancer" configuration shown in <xref
    target="fig:multiple" />(a) a set of SIP servers (D, E and F)
    receives traffic from a single source A. A load balancer is a
    typical example for such a configuration. In this configuration,
    overload control needs to prevent server A (i.e., the load
    balancer) from sending too much traffic to any of its downstream
    neighbors D, E and F. If one of the downstream neighbors becomes
    overloaded, A can direct traffic to the servers that still have
    capacity. If one of the servers serves as a backup, it can be
    activated once one of the primary servers reaches overload.</t>

    <t>If A can reliably determine that D, E and F are its only
    downstream neighbors and all of them are in overload, it may
    choose to report overload upstream on behalf of D, E and
    F. However, if the set of downstream neighbors is not fixed or
    only some of them are in overload then A should not activate
    overload control since A can still forward the requests destined
    to non-overloaded downstream neighbors. These requests would be
    throttled as well if A used overload control towards its
    upstream neighbors.</t>

    <t>In the "multiple sources" configuration shown in <xref
    target="fig:multiple" />(b), a SIP server D receives traffic from  
    multiple upstream sources A, B and C. Each of these sources can
    contribute a different amount of traffic, which can vary over
    time. The set of active upstream neighbors of D can change as
    servers may become inactive and previously inactive servers may
    start contributing traffic to D.</t>
    
    <t>If D becomes overloaded, it needs to generate feedback to
    reduce the amount of traffic it receives from its upstream
    neighbors. D needs to decide by how much each upstream neighbor
    should reduce traffic. This decision can require the consideration
    of the amount of traffic sent by each upstream neighbor and it may
    need to be re-adjusted as the traffic contributed by each upstream
    neighbor varies over time. Server D can use a local fairness
    policy to determine how much traffic it accepts from each upstream
    neighbor. </t>
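
    <t>One possible local policy is to divide the capacity of D among
    its upstream neighbors in proportion to the traffic each of them
    currently offers. The Python sketch below illustrates this policy
    only as an example; other policies are equally valid, and the
    function name and the sample rates are hypothetical.</t>

    <figure>
<artwork><![CDATA[
def proportional_caps(capacity, offered):
    """Split capacity (requests/s) in proportion to offered load.

    offered maps each upstream neighbor to its current request rate.
    """
    total = sum(offered.values())
    if total <= capacity:
        return dict(offered)   # no throttling needed
    return {name: capacity * rate / total
            for name, rate in offered.items()}


# D can process 100 requests/s; A, B and C together offer 250.
caps = proportional_caps(100, {"A": 150, "B": 50, "C": 50})
print(caps)
]]></artwork>
    </figure>

    <t>Since the offered rates vary over time, such a calculation
    would have to be repeated whenever the traffic mix or the set of
    active upstream neighbors changes.</t>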

    <t>In many configurations, SIP servers form a "mesh" as shown in <xref
    target="fig:multiple" />(c). Here, multiple upstream servers A, B
    and C forward traffic to multiple alternative servers D and
    E. This configuration is a combination of the "load balancer" and
    "multiple sources" scenarios.</t> 

    <figure title="Topologies" anchor="fig:multiple">
<artwork><![CDATA[

                +---+              +---+
             /->| D |              | A |-\
            /   +---+              +---+  \
           /                               \   +---+
    +---+-/     +---+              +---+    \->|   |
    | A |------>| E |              | B |------>| D |
    +---+-\     +---+              +---+    /->|   |
           \                               /   +---+
            \   +---+              +---+  / 
             \->| F |              | C |-/ 
                +---+              +---+

    (a) load balancer             (b) multiple sources    

    +---+                          
    | A |---\                        a--\
    +---+-\  \---->+---+                 \
           \/----->| D |             b--\ \--->+---+
    +---+--/\  /-->+---+                 \---->|   |
    | B |    \/                      c-------->| D |
    +---+---\/\--->+---+                       |   |
            /\---->| E |            ...   /--->+---+
    +---+--/   /-->+---+                 /
    | C |-----/                      z--/
    +---+                         

          (c) mesh                   (d) edge proxy

 ]]></artwork>
    </figure>

    <t>Overload control that is based on reducing the number of
    messages a sender is allowed to send is not suited for servers
    that receive requests from a very large population of senders, each of
    which only infrequently sends a request. This scenario is shown in 
    <xref target="fig:multiple" />(d). An edge proxy that is connected
    to many UAs is a typical example for such a configuration.</t>

    <t>Since each UA typically only contributes a few requests, which
    are often related to the same call, it can't decrease its message
    rate to resolve the overload. In such a configuration, a SIP
    server can resort to local overload control by rejecting a
    percentage of the requests it receives with 503 (Service Unavailable)
    responses. Since there are many upstream neighbors that contribute  
    to the overall load, sending 503 (Service Unavailable) to a
    fraction of them can gradually reduce load without entirely
    stopping all incoming traffic. The Retry-After header can be used
    in 503 (Service Unavailable) responses to ask UAs to wait a
    given number of seconds before trying the call again. Using 503
    (Service Unavailable) towards individual sources cannot, however,
    prevent overload if a large number of users place calls at the
    same time.</t> 
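
    <t>The percentage-based rejection described above could be
    sketched in Python as follows. The function, its parameters and
    the returned data structure are hypothetical and stand in for
    whatever interface a SIP stack actually provides; only the status
    line and the Retry-After header field are taken from SIP.</t>

    <figure>
<artwork><![CDATA[
import random


def handle_request(reject_fraction, retry_after_s, rng=random):
    """Decide whether to reject one incoming request.

    Returns a (status line, headers) pair for a rejected request,
    or None if the request should be processed normally.
    """
    if rng.random() < reject_fraction:
        headers = {"Retry-After": str(retry_after_s)}
        return "SIP/2.0 503 Service Unavailable", headers
    return None


# Reject roughly 20% of requests and ask UAs to retry after 30 s.
rng = random.Random(1)   # fixed seed, for a repeatable example only
results = [handle_request(0.2, 30, rng) for _ in range(10000)]
rejected = sum(result is not None for result in results)
print(rejected)
]]></artwork>
    </figure>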

    <t><list>
        <t>Note: The requirements of the "edge proxy" topology are
        different from those of the other topologies, which may
        require a different method for overload control.</t> 
    </list></t>

  </section>

  <section title="Fairness" anchor="sec:fairness">

    <t>There are many different ways to define fairness if a SIP
    server has multiple upstream neighbors. In the context of SIP
    server overload, it is helpful to describe two categories of
    fairness criteria: basic fairness and customized fairness. With
    basic fairness a SIP server treats all end-users equally and
    ensures that each end-user has the same chance in accessing the
    server resources. With customized fairness the server allocates
    resources according to different priorities. An example
    application of the basic fairness criterion is the "Third caller
    receives free tickets" scenario, where each end-user should have
    an equal success probability in making calls through an overloaded 
    SIP server, regardless of which service provider he/she is
    subscribing to. An example of customized fairness would be a 
    server which gives different resource allocations to its upstream
    neighbors (e.g., service providers) as defined in service level
    agreements. </t>

  </section>

  <section title="Performance Metrics" anchor="sec:metrics">

    <t>The performance of an overload control mechanism can be
    measured using different metrics. </t>

    <t>A key performance indicator is the goodput of a SIP server
    during overload. Ideally, a SIP server is enabled to perform at
    its capacity limit during periods of overload. E.g., if a SIP
    server has a processing capacity of 140 INVITE transactions per
    second then an overload control mechanism should enable it to
    handle 140 INVITEs per second even if the offered load is much
    higher. The delay introduced by a SIP server is another important
    indicator. An overload control mechanism should ensure that the
    delay encountered by a SIP message is not increased significantly
    during periods of overload.</t>

    <t>Reactiveness and stability are other important performance
    indicators. An overload control mechanism should quickly react to
    an overload occurrence and ensure that a SIP server does not become
    overloaded even during sudden peaks of load. Similarly, an
    overload control mechanism should quickly remove all throttles if
    the overload disappears. Stability is another important criterion
    as using an overload control mechanism should not lead to the
    oscillation of load on a SIP server. The performance of SIP
    overload control mechanisms is discussed in 
    <xref target="Noel et al." />, <xref target="Shen et al." /> and
    <xref target="Hilt et al." />. </t> 

    <t>In addition to the above metrics, there are other indicators
    that are relevant for the evaluation of an overload control
    mechanism:</t>

    <t><list style='hanging'>
      <t hangText="Fairness:">Which types of fairness does the
      overload control mechanism implement?</t> 
      <t hangText="Self-limiting:">Is the overload control
      self-limiting if a SIP server becomes unresponsive? </t>      
      <t hangText="Changes in neighbor set:">How does the mechanism
      adapt to a changing set of sending entities?</t> 
      <t hangText="Data points to monitor:">Which data points does an
      overload control mechanism need to monitor? </t>
      <t hangText="Tuning requirements:">Does the algorithm work out
      of the box or is parameter tweaking required?</t>      
    </list></t>

    <t><list>
        <t>TBD: a discussion of these metrics for the following
        overload control mechanisms is needed.</t> 
    </list></t>

  </section>

  <section title="Explicit Overload Control Feedback" anchor="sec:method">

    <t>Explicit overload control feedback enables a receiver to
    indicate how much traffic it wants to receive. Explicit
    overload control mechanisms can be differentiated based on the
    type of information conveyed in the overload control
    feedback. Another way to classify explicit overload control
    mechanisms is whether the control function is in the receiving or
    sending entity (receiver- vs. sender-based overload control). </t>

    <section title="Rate-based Overload Control">

      <t>The key idea of rate-based overload control is to limit the
      request rate at which an upstream element is allowed to forward
      traffic to the downstream neighbor. If overload occurs, a
      SIP server instructs each upstream neighbor to send at most X 
      requests per second. Each upstream neighbor can be assigned a
      different rate cap. </t>

      <t>An example algorithm for the Actuator in a sending entity to
      implement a rate cap is request gapping. After transmitting a
      request to a downstream neighbor, a server waits for 1/X seconds
      before it transmits the next request to the same
      neighbor. Requests that arrive during the waiting period are not
      forwarded and are either redirected, rejected or buffered.</t>
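
      <t>The request gapping behavior described above can be sketched
      as follows. This is a minimal illustration in Python, not a
      normative algorithm: the class name RequestGapper, the method
      try_forward and the use of explicit timestamps are assumptions
      made for this example. What happens to a gapped request
      (redirect, reject or buffer) is left to the caller.</t>

      <figure><artwork><![CDATA[
```python
class RequestGapper:
    """Sender-side actuator enforcing a rate cap by request gapping."""

    def __init__(self, rate_cap):
        # rate_cap is X, the maximum number of requests per second.
        self.gap = 1.0 / rate_cap
        # Earliest time at which the next request may be forwarded.
        self.next_allowed = 0.0

    def try_forward(self, now):
        """Return True if a request arriving at `now` may be sent."""
        if now < self.next_allowed:
            # Inside the waiting period: the caller has to redirect,
            # reject or buffer this request.
            return False
        self.next_allowed = now + self.gap
        return True


gapper = RequestGapper(rate_cap=10)  # X = 10, so the gap is 0.1 s
arrivals = [0.00, 0.05, 0.15, 0.20, 0.30]  # arrival times in seconds
decisions = [gapper.try_forward(t) for t in arrivals]
print(decisions)  # [True, False, True, False, True]
```
]]></artwork></figure>

      <t>In this sketch the gap is measured from the time a request is
      actually forwarded, so requests arriving at 0.05 and 0.20
      seconds fall into a waiting period and are not forwarded.</t>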

      <t>The rate cap ensures that the number of requests received by 
      a SIP server never increases beyond the sum of all rate caps
      granted to upstream neighbors. Rate-based overload control
      protects a SIP server against overload even during load spikes
      assuming there are no new upstream neighbors that start sending
      traffic. When a new upstream neighbor appears, the rate caps
      currently assigned to the other upstream neighbors need to be
      revised to account for it. The current
      overall rate cap of a SIP server is determined by an overload
      control algorithm, e.g., based on system load.</t>

      <t>Rate-based overload control requires a SIP server to assign a
      rate cap to each of its upstream neighbors while it is
      activated. Effectively, a server needs to assign a share of its
      overall capacity to each upstream neighbor. A server needs to
      ensure that the sum of all rate caps assigned to upstream
      neighbors is not (significantly) higher than its actual
      processing capacity. This requires a SIP server to keep track of
      the set of upstream neighbors and to adjust the rate cap if a
      new upstream neighbor appears or an existing neighbor stops
      transmitting. For example, if the capacity of the server is X
      and this server is receiving traffic from two upstream
      neighbors, it can assign a rate of X/2 to each of them. If a
      third sender appears, the rate for each sender is lowered to
      X/3. If the rate cap assigned to upstream neighbors is too high,
      a server may still experience overload. If the cap is too low,
      the upstream neighbors will reject requests even though they
      could be processed by the server. </t>
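
      <t>The equal division used in the example above (X/2 for two
      neighbors, X/3 for three) could be computed as in the following
      Python sketch. The function name equal_rate_caps and the
      neighbor names are hypothetical, and an equal split is only one
      possible policy; a server may just as well use unequal
      shares.</t>

      <figure><artwork><![CDATA[
```python
def equal_rate_caps(capacity, neighbors):
    """Split the capacity X equally among the upstream neighbors."""
    if not neighbors:
        return {}
    share = capacity / len(neighbors)
    return {name: share for name in neighbors}


# Capacity X = 120 requests per second (an assumed value).
caps_two = equal_rate_caps(120.0, ["proxy-a", "proxy-b"])
# A third sender appears, so every rate cap is recomputed.
caps_three = equal_rate_caps(
    120.0, ["proxy-a", "proxy-b", "proxy-c"])

print(caps_two["proxy-a"])    # 60.0
print(caps_three["proxy-a"])  # 40.0
```
]]></artwork></figure>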

<!--
      <t>A SIP server can evaluate the amount of load it receives from
      each upstream neighbor and assign a rate cap that is suitable
      for this neighbor without limiting it too much. This way, the
      SIP server can allocate resources that are not used by one
      upstream neighbor because it is sending less requests than
      allowed by the rate cap to another server.</t> 
-->
      
      <t>An approach for estimating a rate cap for each upstream neighbor
      is using a fixed proportion of a control variable, X, where X
      is initially equal to the capacity of the SIP server. The server
      then increases or decreases X until the workload arrival rate
      matches the actual server capacity. Usually, this will mean that
      the sum of the rate caps sent out by the server (=X) exceeds its
      actual capacity, but enables upstream neighbors who are not
      generating more than their fair share of the work to be
      effectively unrestricted. In this approach, the server only has
      to measure the aggregate arrival rate. However, since the
      overall rate cap is usually higher than the actual capacity,
      brief periods of overload may occur.</t> 
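
      <t>The control variable approach can be illustrated with the
      following Python sketch. It is an illustration under stated
      assumptions and not a recommended algorithm: the multiplicative
      step of 10 percent, the function names and the simulated
      traffic figures are all invented for this example. The server
      only looks at the aggregate arrival rate when adjusting X.</t>

      <figure><artwork><![CDATA[
```python
def adjust_control_variable(x, arrival_rate, capacity, step=0.1):
    """Raise or lower X based on the aggregate arrival rate.

    The multiplicative step is an assumption of this sketch.
    """
    if arrival_rate > capacity:
        return x * (1.0 - step)
    if arrival_rate < capacity:
        return x * (1.0 + step)
    return x


def rate_caps(x, proportions):
    """Give each upstream neighbor its fixed proportion of X."""
    return {name: x * share for name, share in proportions.items()}


# Hypothetical scenario: capacity of 100 requests per second and
# two neighbors with equal proportions, one offering little
# traffic and one offering far more than its share.
capacity = 100.0
proportions = {"proxy-a": 0.5, "proxy-b": 0.5}
offered = {"proxy-a": 30.0, "proxy-b": 200.0}

x = capacity  # X is initially equal to the server capacity
for _ in range(10):
    caps = rate_caps(x, proportions)
    # Each neighbor sends what it has, limited by its rate cap.
    arrival_rate = sum(min(offered[n], caps[n]) for n in caps)
    x = adjust_control_variable(x, arrival_rate, capacity)

print(x > capacity)  # True: X ends up above the actual capacity
```
]]></artwork></figure>

      <t>In this scenario X moves above the actual capacity because
      the first neighbor does not use its full share, which leaves
      that neighbor effectively unrestricted.</t>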

    </section>
   
    <section title="Loss-based Overload Control">
 
      <t>A loss percentage enables a SIP server to ask an upstream
      neighbor to reduce the number of requests it would normally
      forward to this server by a percentage X. For example, a SIP
      server can ask an upstream neighbor to reduce the number of
      requests this neighbor would normally send by 10%. The upstream
      neighbor then redirects or rejects X percent of the traffic that
      is destined for this server.</t>
	
      <t>An algorithm for the sending entity to implement a loss
      percentage is to draw a random number between 1 and 100 for each
      request to be forwarded. The request is not forwarded to the
      server if the random number is less than or equal to X. </t>
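
      <t>The random draw described above can be written as the
      following Python sketch. The function name should_forward and
      the seeded generator used for the demonstration are assumptions
      of this example.</t>

      <figure><artwork><![CDATA[
```python
import random


def should_forward(loss_percentage, rng=random):
    """Decide whether a single request is forwarded.

    A number between 1 and 100 is drawn; the request is not
    forwarded if the number is less than or equal to X.
    """
    draw = rng.randint(1, 100)  # uniform integer, both ends included
    return draw > loss_percentage


# Demonstration with X = 10 and a seeded generator (assumption).
rng = random.Random(42)
forwarded = sum(should_forward(10, rng) for _ in range(10000))
print(forwarded / 10000.0)  # close to 0.9
```
]]></artwork></figure>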

      <t>An advantage of loss-based overload control is that the
      receiving entity does not need to track the set of upstream
      neighbors or the request rate it receives from each upstream
      neighbor. It is sufficient to monitor the overall system
      utilization. To reduce load, a server can ask its upstream
      neighbors to lower the traffic forwarded by a certain
      percentage. The server calculates this percentage by 
      combining the loss percentage that is currently in use (i.e.,
      the loss percentage the upstream neighbors are currently using
      when forwarding traffic), the current system utilization and the
      desired system utilization. For example, if the server load
      approaches 90% and the current loss percentage is set to a 50%
      traffic reduction, then the server can decide to increase the loss
      percentage to 55% in order to get to a system utilization of 
      80%. Similarly, the server can lower the loss percentage if
      permitted by the system utilization. </t>
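
      <t>One way to combine the three inputs named above is shown in
      the following Python sketch. It assumes that system utilization
      scales linearly with the amount of traffic forwarded to the
      server, which is a simplification made for this example; the
      function name next_loss_percentage is likewise hypothetical.
      With the numbers from the example above (90% utilization, 50%
      loss, 80% target) it yields about 55.6%, in line with the
      roughly 55% mentioned in the text.</t>

      <figure><artwork><![CDATA[
```python
def next_loss_percentage(current_loss, current_util, target_util):
    """Estimate the loss percentage needed to reach target_util.

    Assumes utilization is proportional to forwarded traffic.
    All arguments and the result are percentages (0 to 100).
    """
    if current_util <= 0:
        return 0.0  # an idle server needs no throttling
    # Fraction of the offered traffic that is forwarded today.
    pass_fraction = 1.0 - current_loss / 100.0
    # Scale that fraction by the desired change in utilization.
    new_pass = pass_fraction * target_util / current_util
    new_loss = 100.0 * (1.0 - new_pass)
    return min(100.0, max(0.0, new_loss))


print(round(next_loss_percentage(50.0, 90.0, 80.0), 1))  # 55.6
```
]]></artwork></figure>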

<!--
       Loss-based overload control
      achieves fairness among incoming requests if all upstream
      neighbors are throttled by the same percentage. In this case,
      each request destined for an overloaded server has the same
      chance of being rejected by overload control.</t>
-->

      <t>Loss-based overload control requires that the throttle
      percentage is adjusted to the current overall number of requests
      received by the server. This is particularly important if the
      number of requests received fluctuates quickly. For example, if
      a SIP server sets a throttle value of 10% at time t1 and the
      number of requests offered by the upstream neighbors increases
      by 20% between time t1 and t2 (t1&lt;t2), then the traffic
      arriving at the server at t2 is still about 8% higher than it
      was before the throttle was set (0.9 x 1.2 = 1.08). This is
      the case even though all upstream neighbors have reduced
      traffic by 10% as instructed. Thus, percentage
      throttling requires an adjustment of the throttling percentage
      in response to the traffic received and may not always be able
      to prevent a server from encountering brief periods of overload
      in extreme cases. </t> 

    </section>

    <section title="Window-based Overload Control" anchor="sec:window">

      <t>The key idea of window-based overload control is to allow an
      entity to transmit a certain number of messages before it needs
      to receive a confirmation for the messages in transit. Each
      sender maintains an overload window that limits the number of
      messages that can be in transit without being confirmed.</t>

      <t>Each sender maintains an unconfirmed message counter for each
      downstream neighbor it is communicating with. For each message
      sent to the downstream neighbor, the counter is increased. For
      each confirmation received, the counter is decreased. The sender
      stops transmitting messages to the downstream neighbor when the
      unconfirmed message counter has reached the current window
      size.</t>

      <t>A crucial parameter for the performance of window-based
      overload control is the window size. Each sender has
      an initial window size it uses when first sending a
      request. This window size can be changed based on the feedback
      it receives from the receiver.  </t>

      <t>The sender adjusts its window size as soon as it receives the
      corresponding feedback from the receiver. If the new window size
      is smaller than the current unconfirmed message counter, the
      sender stops transmitting messages until more messages are
      confirmed and the current unconfirmed message counter is less
      than the window size.</t>
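
      <t>The sender-side bookkeeping described in the preceding
      paragraphs can be sketched in Python as follows. The class and
      method names are hypothetical, and the sketch leaves open how
      confirmations and window updates are actually conveyed between
      receiver and sender.</t>

      <figure><artwork><![CDATA[
```python
class WindowSender:
    """Per-neighbor state for window-based overload control."""

    def __init__(self, initial_window):
        self.window = initial_window  # current window size
        self.unconfirmed = 0          # unconfirmed message counter

    def can_send(self):
        # Sending stops once the counter has reached the window.
        return self.unconfirmed < self.window

    def on_send(self):
        if not self.can_send():
            raise RuntimeError("window is full")
        self.unconfirmed += 1

    def on_confirmation(self):
        if self.unconfirmed > 0:
            self.unconfirmed -= 1

    def on_window_update(self, new_window):
        # Feedback from the receiver takes effect immediately.
        self.window = new_window


sender = WindowSender(initial_window=3)
for _ in range(3):
    sender.on_send()
print(sender.can_send())    # False: three messages in transit

sender.on_window_update(2)  # receiver shrinks the window
sender.on_confirmation()    # counter drops to 2, still not below 2
print(sender.can_send())    # False
sender.on_confirmation()    # counter drops to 1
print(sender.can_send())    # True
```
]]></artwork></figure>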

      <t>A sender should not treat the reception of a 100 Trying
      response as an implicit confirmation for a message. 100 Trying
      responses are often created by a SIP server very early in 
      processing and do not indicate that a message has been successfully
      processed and cleared from the input buffer. If the downstream
      neighbor is a stateless proxy, it will not create 100 Trying
      responses at all and instead pass through 100 Trying responses
      created by the next stateful server. Also, 100 Trying responses
      are typically only created for INVITE requests. Explicit message
      confirmations do not have these problems.</t>

      <t>The behavior and issues of window-based overload control are
      similar to rate-based overload control, in that the total
      available receiver buffer space needs to be divided among all
      upstream neighbors. However, unlike rate-based overload control,
      window-based overload control can ensure that the receiver
      buffer does not overflow under normal conditions. The
      transmission of messages by senders is effectively clocked by
      message confirmations received from the receiver. A buffer
      overflow can occur if a large number of new upstream neighbors
      arrives at the same time. In window-based overload control, the
      number of messages a sender is allowed to send can be set to
      zero. In these cases, the sender needs to be informed through an
      out-of-band mechanism that it is allowed to send again if the
      window at the receiver has opened up. </t>

<!--
      Window-based
      overload control is also more robust against errors in the
      division of capacity among upstream neighbors than rate-based
      overload control. A window size that is too large will create a
      buffer overflow, however, senders will eventally stop
      transmitting new requests. 
-->

    </section>

    <section title="Overload Signal-based Overload Control" anchor="sec:osig">

      <t>The key idea of overload signal-based overload control is to use the
      transmission of a 503 (Service Unavailable) response as a signal 
      for overload in the downstream neighbor. After receiving a 503
      (Service Unavailable) response, the sender reduces the load
      forwarded to the downstream neighbor to avoid triggering more
      503 (Service Unavailable) responses. The sender keeps reducing
      the load as long as 503 (Service Unavailable) responses continue
      to be received. This scheme is based on the use of 503 (Service
      Unavailable) responses without a Retry-After header, as the
      Retry-After header would require a sender to entirely stop
      forwarding requests. </t>

      <t>A sender which has not received 503 (Service Unavailable)
      responses for a while but is still throttling traffic can start
      to increase the offered load. By slowly increasing the traffic
      forwarded, a sender can detect that overload in the downstream
      neighbor has been resolved and more load can be forwarded. The
      load is increased until the sender again receives another 503
      (Service Unavailable) response or is forwarding all requests it
      has.</t> 

      <t>A possible algorithm for adjusting traffic is additive
      increase/multiplicative decrease (AIMD).</t> 
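
      <t>A minimal AIMD update step could look like the following
      Python sketch. The additive increase of one request per second,
      the multiplicative decrease factor of 0.5 and the idea of
      evaluating the rule once per measurement interval are
      assumptions of this example and are not specified by this
      document.</t>

      <figure><artwork><![CDATA[
```python
def aimd_update(rate, got_503, max_rate, increase=1.0, decrease=0.5):
    """Return the new sending rate after one interval.

    got_503 tells whether a 503 (Service Unavailable) response
    was received during the interval.
    """
    if got_503:
        return rate * decrease              # multiplicative decrease
    return min(max_rate, rate + increase)   # additive increase


rate = 100.0  # requests per second the sender would like to forward
history = []
for got_503 in [True, True, False, False, False]:
    rate = aimd_update(rate, got_503, max_rate=100.0)
    history.append(rate)
print(history)  # [50.0, 25.0, 26.0, 27.0, 28.0]
```
]]></artwork></figure>

      <t>The cap at max_rate reflects the case in which the sender is
      already forwarding all requests it has.</t>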

      <t>Overload Signal-based Overload Control is a sender-based
      overload control mechanism.</t>

    </section>

    <section title="On-/Off Overload Control" anchor="sec:onoff">

      <t>On-/off overload control feedback enables a SIP server to
      turn the traffic it is receiving either on or off. The 503
      (Service Unavailable) response with a Retry-After header
      implements on-/off overload control. On-/off overload control is
      less effective in controlling load than the fine-grained control
      methods above. In fact, all of the above methods can realize
      on-/off overload control, e.g., by setting the allowed rate to
      either zero or unlimited.</t> 

    </section>

  </section>

  <section title="Implicit Overload Control">

    <t>Implicit overload control ensures that the transmission of a
    SIP server is self-limiting. It slows down the transmission rate
    of a sender when there is an indication that the receiving entity
    is experiencing overload. Such an indication can be that the
    receiving entity is not responding within the expected timeframe
    or is not responding at all. The idea of implicit overload control
    is that senders should try to sense overload of a downstream
    neighbor even if there is no explicit overload control
    feedback. This prevents an overloaded server, which has become
    unable to generate overload control feedback, from being
    overwhelmed with requests.</t> 

    <t>Window-based overload control is inherently self-limiting since
    a sender cannot continue without receiving confirmations. All
    other explicit overload control schemes described above do not have this
    property and require additional implicit controls to limit
    transmissions in case an overloaded downstream neighbor does not
    generate explicit feedback.</t>

  </section>

  <section title="Overload Control Algorithms" anchor="sec:algorithm">

    <t>An important aspect of the design of an overload control mechanism
    is the overload control algorithm. The control algorithm
    determines when the amount of traffic to a SIP server needs
    to be decreased and when it can be increased. In terms of the
    model described in <xref target="sec:model" /> the control
    algorithm takes (S) as an input value and generates (T) as a
    result. </t>

    <t>Overload control algorithms have been studied extensively,
    and many different overload control algorithms exist. With many
    different overload control algorithms available, it seems
    reasonable to define a baseline algorithm and allow the use of
    other algorithms if they do not violate the protocol
    semantics. This will also allow the development of future
    algorithms, which may lead to better performance. </t>

  </section>

  <section title="Message Prioritization"> 

    <t>Overload control can require a SIP server to prioritize
    messages and select messages that need to be rejected or
    redirected. The selection is largely a matter of local policy of
    the SIP server. As a general rule, a SIP server should prioritize
    high-priority requests, such as emergency service requests, and
    preserve them as much as possible during times of overload. It
    should also prioritize messages for ongoing sessions over messages
    that set up a new session.</t>
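
    <t>The general rule above can be illustrated with the following
    Python sketch. Because the selection is a matter of local policy,
    the sketch shows only one possible policy: the dictionary fields
    "emergency" and "in_dialog", the three priority levels and the
    fixed processing budget are all assumptions made for this
    example.</t>

    <figure><artwork><![CDATA[
```python
def priority(msg):
    """Lower values are served first (hypothetical local policy)."""
    if msg.get("emergency"):
        return 0  # high-priority requests, e.g., emergency services
    if msg.get("in_dialog"):
        return 1  # messages that belong to an ongoing session
    return 2      # messages that set up a new session


def select_messages(messages, budget):
    """Split messages into those to process and those to reject."""
    ranked = sorted(messages, key=priority)  # stable sort
    return ranked[:budget], ranked[budget:]


messages = [
    {"id": 1, "method": "INVITE", "in_dialog": False},
    {"id": 2, "method": "BYE", "in_dialog": True},
    {"id": 3, "method": "INVITE", "emergency": True},
    {"id": 4, "method": "INVITE", "in_dialog": False},
]
accepted, rejected = select_messages(messages, budget=2)
print([m["id"] for m in accepted])  # [3, 2]
print([m["id"] for m in rejected])  # [1, 4]
```
]]></artwork></figure>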

  </section>

<!--
  <section title="An Overload Control Response Code" anchor="sec:suppretran">

    <t>A significant contributor to the overload of a SIP server are
    retransmissions of SIP transactions that are unanswered. 

  </section>
-->

  <section anchor="sec:security" title="Security Considerations">
 
    <t>Overload control mechanisms, in general, have security
    implications. If not designed carefully they can, for example, be
    used to launch a denial of service attack. The specific security
    risks and their remedies depend on the actual protocol mechanisms
    chosen for overload control. They need to be addressed in a
    document that specifies such a mechanism.</t>  

  </section>

  <section anchor="sec:iana" title="IANA Considerations">
  
    <t>This document does not require any IANA actions.</t> 

  </section>

</middle>

<back>

<references title='Informative References'>

  &rfc3261;

  &rfc4412;

  &rfc5390;

<reference anchor='Noel et al.'>
<front>
  <title>Initial Simulation Results That Analyze SIP Based VoIP Networks Under Overload</title> 
  <author initials='E.N.' surname='Noel' fullname='Eric Noel' />
  <author initials='C.J.' surname='Johnson' fullname='Carolyn Johnson' />
</front>
<seriesInfo name='International Teletraffic Congress (ITC&apos;07),' value='Ottawa, Canada, June 2007' />
</reference>

<reference anchor='Shen et al.'>
<front>
  <title>Session Initiation Protocol (SIP) Server Overload Control: Design and Evaluation</title> 
  <author initials='C.S.' surname='Shen' fullname='Charles Shen' />
  <author initials='H.S.' surname='Schulzrinne' fullname='Henning Schulzrinne' />
  <author initials='E.N.' surname='Nahum' fullname='Erich Nahum' />
</front>
<seriesInfo name='Principles, Systems and Applications of IP Telecommunications (IPTComm&apos;08),' value='Heidelberg, Germany, July 2008' />
</reference>

<reference anchor='Hilt et al.'>
<front>
  <title>Controlling Overload in Networks of SIP Servers</title> 
  <author initials='V.H.' surname='Hilt' fullname='Volker Hilt' />
  <author initials='I.W.' surname='Widjaja' fullname='Indra Widjaja' />
</front>
<seriesInfo name='IEEE International Conference on Network Protocols (ICNP&apos;08),' value='Orlando, Florida, October 2008' />
</reference>

</references>

<section title="Contributors">

  <t>Contributors to this document are: Ahmed Abdelal (Sonus
   Networks), Mary Barnes (Nortel), Carolyn Johnson (AT&amp;T Labs), Daryl
   Malas (CableLabs), Eric Noel (AT&amp;T Labs), Tom Phelan (Sonus
   Networks), Jonathan Rosenberg (Cisco), Henning Schulzrinne
   (Columbia University), Charles Shen (Columbia University), Nick
   Stewart (British Telecommunications plc), Rich Terpstra (Level 3),
   Fangzhe Chang (Bell Labs/Alcatel-Lucent). Many thanks!</t>

</section> 

</back>

</rfc>


 

