SPEC SIP Design Document FAQ

Revision Date: Aug 27, 2007

What application is modeled by SPEC SIP?
How is this benchmark different from SIPStone?
How is this benchmark different from the ETSI IMS benchmark?
What SIP RFCs must a SUT support?
What transport protocols are used by the SIP Benchmark?
Is the benchmark IPv4 only or IPv6 too?
Why does SRD measure the UAS in addition to the SUT?
Why is voicemail handled using a 302 redirection response?
Is Digest Authentication used?
Why are all transactions authenticated?
Can't a UAC cache the nonce and avoid the extra round-trip?
Why must the SUT have only a single IP address?
Does the benchmark make use of DNS/SRV or does it work with IPs alone?
What SIP timer defaults are required by the benchmark?
Why is there no AAA component that uses something like RADIUS or Diameter?
Where did you get the values for the various parameters in the design document?

1 What application is modeled by SPEC SIP?

SPEC SIP models a VoIP deployment. We anticipate that future releases of the benchmark (or separate benchmarks released by the SPEC SIP SubCommittee) will support Instant Messaging and Presence.

2 How is this benchmark different from SIPStone?

SIPStone is more like SPEC CPU in that it specifies 10 call flows (what we call "scenarios") and then reports a weighted average of the 10. In that sense it is more micro-benchmark oriented. SPEC SIP is more of a macro-benchmark, meant to capture user behavior and to be useful for capacity planning. Thus it uses "Simultaneous Number of Virtual Users" as its primary performance metric. SIPStone was developed by Columbia University and is available via license from SIPQuest.com. SPEC SIP was developed by consensus via the SPEC standardization process.

3 How is this benchmark different from the ETSI IMS benchmark?

The ETSI IMS benchmark was developed by ETSI to provide a benchmark for wireless/3GPP providers using IMS. SPEC SIP is meant to be SIP-specific and makes no IMS assumptions. The ETSI IMS benchmark is a specification, not a code release; SPEC SIP is both a specification and a released body of code that can be run and submitted for publication through SPEC's auditing process. SPEC SIP focuses on a single-node SIP server system under test (SUT), rather than a complete network architecture such as IMS.

4 What SIP RFCs must a SUT support?

Currently, only RFC 3261.

5 What transport protocols are used by the SIP Benchmark?

Currently, only UDP is supported. TCP, TLS, and SCTP may be supported in future releases.

6 Is the benchmark IPv4 only or IPv6 too?

IPv4 only.

7 Why does SRD measure the UAS in addition to the SUT?

For simplicity. Those running the benchmark must take care that the UAS is not a bottleneck and does not contribute significantly to the measured latency. The response-time limits therefore need a sufficient "fudge" factor to account for the UAS's share.

8 Why is voicemail handled using a 302 redirection response?

There are several ways to implement voicemail in SIP. We chose the 302 response because it follows RFC 4458, appears to be a common method for handling voicemail (if not the most common), and places the fewest requirements on the SUT.
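As a rough illustration of what the 302 approach asks of the SUT, the sketch below builds a redirect that points the caller at a voicemail server using the RFC 4458 "target" and "cause" URI parameters (cause 486 is the RFC 4458 value for "user busy"; 408 would signal "no reply"). It is a sketch under stated assumptions, not benchmark code: the host vm.sip.spec.org, the function names, and the single-Via request are hypothetical.

```python
from urllib.parse import quote

# Hypothetical voicemail server; SPEC SIP does not name one.
VOICEMAIL_HOST = "vm.sip.spec.org"

def voicemail_contact(target_aor: str, cause: int) -> str:
    """Return an RFC 4458-style Contact value for the voicemail server."""
    # The original callee goes in 'target' (percent-escaped because it is
    # embedded as a URI parameter) and the diversion reason goes in 'cause'.
    escaped = quote(target_aor, safe=":")
    return f"<sip:voicemail@{VOICEMAIL_HOST};target={escaped};cause={cause}>"

def redirect_to_voicemail(invite_headers: dict, callee_aor: str,
                          to_tag: str, cause: int = 486) -> str:
    """Build a 302 response that sends the caller to voicemail."""
    lines = ["SIP/2.0 302 Moved Temporarily"]
    # RFC 3261 Section 8.2.6.2: copy Via, From, Call-ID, and CSeq from the
    # request, and copy To with a tag added. Assumes a single Via header.
    for name in ("Via", "From", "Call-ID", "CSeq"):
        lines.append(f"{name}: {invite_headers[name]}")
    lines.append(f"To: {invite_headers['To']};tag={to_tag}")
    lines.append(f"Contact: {voicemail_contact(callee_aor, cause)}")
    lines.append("Content-Length: 0")
    # SIP header lines end in CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

The SUT only has to emit this one response; the UAC then places a fresh INVITE to the Contact URI, which is why this method is light on the server.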

9 Is Digest Authentication used?

Yes, for all methods that are appropriate (i.e., INVITE, BYE, and REGISTER, but not ACK or CANCEL).
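The exchange follows RFC 2617 Digest: the SUT answers an unauthenticated INVITE, BYE, or REGISTER with a 401 (registrar) or 407 (proxy) carrying a nonce, and the UAC retries with a hash of its credentials. The sketch below shows the basic MD5 computation without the optional qop parameter; the realm, user name, and password in any call are hypothetical, and a deployment using qop=auth would also mix a client nonce and counter into the hash.

```python
import hashlib

# Methods challenged per this FAQ; ACK and CANCEL are never challenged.
AUTHENTICATED_METHODS = {"INVITE", "BYE", "REGISTER"}

def requires_auth(method: str) -> bool:
    return method.upper() in AUTHENTICATED_METHODS

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """RFC 2617 digest without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # credentials
    ha2 = md5_hex(f"{method}:{uri}")                 # request being signed
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

def authorization_header(username: str, realm: str, password: str,
                         method: str, uri: str, nonce: str) -> str:
    # After a 407 from a proxy the header name would be Proxy-Authorization.
    resp = digest_response(username, realm, password, method, uri, nonce)
    return (f'Authorization: Digest username="{username}", realm="{realm}", '
            f'nonce="{nonce}", uri="{uri}", response="{resp}", algorithm=MD5')
```

Because the method and request URI are part of the hash, a response computed for one INVITE cannot simply be replayed as a BYE.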

10 Why are all transactions authenticated?

For security: to prevent calls from being hijacked (INVITE, BYE) and to prevent a user from assuming someone else's identity (REGISTER).

11 Can't a UAC cache the nonce and avoid the extra round-trip?

In some cases a UAC can cache the nonce from a challenge and use it to compute the Authorization header for later SIP requests, without going through the authorization challenge again. However, for security reasons, a nonce is valid only for a limited period of time. SPEC SIP assumes that any cached nonce has expired, so the authorization challenge is necessary for every transaction.
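One common way for a server to bound a nonce's lifetime without keeping per-nonce state is to embed the issue time in the nonce and authenticate it with a server-side key. The sketch below takes that approach; the secret, the 30-second lifetime, and the "timestamp.mac" layout are hypothetical, since neither RFC 3261 nor this FAQ fixes them. Under SPEC SIP's assumption, a nonce cached from an earlier transaction would always fail such a freshness check.

```python
import hashlib
import hmac

SERVER_SECRET = b"example-secret"  # hypothetical server-side key
NONCE_LIFETIME_S = 30              # hypothetical validity window, in seconds

def _mac(timestamp: str) -> str:
    return hmac.new(SERVER_SECRET, timestamp.encode(), hashlib.sha256).hexdigest()

def make_nonce(now: float) -> str:
    """Issue a stateless nonce: issue time plus a MAC over that time."""
    ts = str(int(now))
    return f"{ts}.{_mac(ts)}"

def nonce_is_fresh(nonce: str, now: float) -> bool:
    """Accept only untampered nonces issued within the lifetime window."""
    try:
        ts, mac = nonce.split(".", 1)
        issued = int(ts)
    except ValueError:
        return False  # malformed nonce
    if not hmac.compare_digest(mac, _mac(ts)):
        return False  # forged or altered timestamp
    return 0 <= now - issued <= NONCE_LIFETIME_S
```

A stale nonce leads the server to challenge again (RFC 2617 defines a stale=TRUE flag for this), which costs the same extra round trip the benchmark assumes.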

12 Why must the SUT have only a single IP address?

The IP address is a single point-of-presence (POP) for a SUT, which may be a single SIP server or even a cluster of servers sharing a single virtual IP address. The benchmark is intended to measure a single SIP configuration. Supporting multiple IP addresses would make scaling the experiment trivial, essentially running N instances of the benchmark in parallel. Instead, those wishing to scale the benchmark using multiple machines must use a load-balancing proxy or switch, which exposes a single IP address.
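A load-balancing proxy in front of a cluster usually has to keep every message of a call on the same back-end server. One simple way to do that, sketched below, is to hash the Call-ID. The back-end addresses are hypothetical, and the sketch assumes the back ends share registration state so that any of them can resolve any user.

```python
import hashlib
from typing import List

# Hypothetical back-end servers behind the SUT's single virtual IP address.
BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

def pick_backend(call_id: str, backends: List[str] = BACKENDS) -> str:
    """Map a Call-ID to a back end so all messages of a dialog land together."""
    # A stable hash (not Python's per-process randomized hash()) keeps the
    # mapping consistent across restarts and across balancer instances.
    digest = hashlib.sha1(call_id.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

From the benchmark's point of view the whole arrangement is still one IP address, which is exactly the single point of presence this answer requires.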

13 Does the benchmark make use of DNS/SRV or does it work with IPs alone?

The benchmark does make use of user name resolution (mapping URIs to IP addresses). All users are registered to the same domain, sip.spec.org, and the SUT must map a user URI to the appropriate IP address.

The benchmark does not perform any DNS resolution. While DNS resolution can have a significant impact on performance (e.g., if the server blocks while waiting for a lookup to complete), modeling it is complex and would require estimating DNS cache miss ratios and the cost of retrieving DNS records over the network. DNS resolution is therefore outside the scope of the benchmark, at least for the first release.
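The user-name resolution the SUT performs amounts to a location service: REGISTER binds a user's URI in the sip.spec.org domain to an IP address, and later requests are routed by looking up that binding, with no DNS involved. The sketch below is a deliberately simplified, hypothetical version; it ignores registration expiry, multiple contacts per user, and full SIP URI parsing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

DOMAIN = "sip.spec.org"  # the single domain all benchmark users register to

def address_of_record(uri: str) -> str:
    """Strip URI parameters such as ';transport=udp' (simplified parsing)."""
    return uri.split(";", 1)[0]

@dataclass
class LocationService:
    # Address-of-record -> (IP address, port) learned from REGISTER.
    bindings: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def register(self, uri: str, ip: str, port: int = 5060) -> None:
        self.bindings[address_of_record(uri)] = (ip, port)

    def resolve(self, request_uri: str) -> Optional[Tuple[str, int]]:
        aor = address_of_record(request_uri)
        if not aor.startswith("sip:") or "@" not in aor:
            return None
        if aor.split("@", 1)[1] != DOMAIN:
            return None  # not a benchmark user; no DNS lookup is attempted
        return self.bindings.get(aor)
```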

14 What SIP timer defaults are required by the benchmark?

All timers are assumed to use their defaults as specified in RFC 3261.
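For reference, the RFC 3261 defaults are T1 = 500 ms, T2 = 4 s, and T4 = 5 s, with the INVITE and non-INVITE transaction timeouts (Timers B and F) set to 64*T1 = 32 s. Because SPEC SIP runs over UDP, these defaults determine when a client transaction retransmits. The sketch below computes that schedule; the function names are illustrative only.

```python
T1 = 0.5          # seconds: RTT estimate
T2 = 4.0          # seconds: retransmit-interval cap for non-INVITE requests
T4 = 5.0          # seconds: max time a message stays in the network (Timer K; unused here)
TIMER_B = 64 * T1  # INVITE client transaction timeout (32 s)
TIMER_F = 64 * T1  # non-INVITE client transaction timeout (32 s)

def invite_retransmit_times() -> list:
    """Timer A: starts at T1 and doubles on every firing until Timer B expires."""
    times, t, interval = [], 0.0, T1
    while t + interval < TIMER_B:
        t += interval
        times.append(t)
        interval *= 2
    return times

def non_invite_retransmit_times() -> list:
    """Timer E: starts at T1 and doubles, capped at T2, until Timer F expires."""
    times, t, interval = [], 0.0, T1
    while t + interval < TIMER_F:
        t += interval
        times.append(t)
        interval = min(interval * 2, T2)
    return times
```

With these defaults an unanswered INVITE is retransmitted six times (at 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 s) before Timer B fires, while a non-INVITE request is retransmitted ten times because its interval stops growing at T2.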

15 Why is there no AAA component that uses something like RADIUS or Diameter?

While DB-like interactions are an important component of any SIP deployment, the SPEC SIP SubCommittee decided to omit them from the benchmark for two reasons: simplicity and lack of standardization. Omitting the DB makes the benchmark simpler and more a measure of the native SIP stack than of a DB server. As for standardization, there does not yet seem to be a standard way of communicating with a DB-like server that is widely used across the industry. At the moment, many different approaches appear to be in use: JDBC, LDAP, RADIUS, Diameter, and even proprietary protocols. Choosing one of these protocols would favor products that support it and penalize those that do not; choosing several of them would make apples-to-apples comparisons difficult. After a great deal of discussion, the SPEC SIP SubCommittee decided not to include this in the benchmark, with the understanding that future releases of the benchmark might change this decision. For example, if Diameter becomes widely deployed in practice, a later release of SPEC SIP might require the SUT to use Diameter.

16 Where did you get the values for the various parameters in the design document?

The values were based on workload studies done by Communigate Systems and IBM. Workload characterizations of SIP server deployments are difficult to acquire, so the values may reflect individual deployments rather than more representative scenarios. This is one reason why the benchmark is so highly parameterized. The SPEC SIP SubCommittee encourages more thorough workload characterization of SIP servers and is open to making the benchmark more representative.

Standard Performance Evaluation Corporation