These papers are placed into one of the following categories: books, protocols and standards, and historical.
In this paper we present a new model of Web traffic and its applications in the performance evaluation of Web servers. We consider typical behavior of a user's hypertext navigation within a Web server. We propose a traffic model at the session level, formulated as a stochastic marked point process, which describes when users arrive and how they browse the server. We provide results of statistical analyses and goodness-of-fit of various simple parametric distributions and of their mixtures. We developed a Web server benchmark: WAGON (Web trAffic GeneratOr and beNchmark), and we validated the traffic model by comparing various characteristics of the synthetic traffic generated by WAGON against measurements. We then report benchmark and analysis results on the Apache server, currently the most widely used Web server software. We analyze the impact of the traffic parameters on the HTTP request arrival process and the packet arrival process. We also show that the aggregate traffic is self-similar in most cases, and that, more importantly, the Hurst parameter increases with the traffic intensity. We further provide performance comparison results between HTTP/1.0 and HTTP/1.1 and show that HTTP/1.1 could be much worse for users as well as for servers if some control parameters of the server and of browsers are incorrectly set.
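As a rough illustration of the self-similarity analysis mentioned above, the Hurst parameter of an arrival-count series can be estimated with the aggregated-variance method. The sketch below is written independently of WAGON; the block sizes and variable names are illustrative.

```python
import numpy as np

def hurst_aggregated_variance(counts, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate the Hurst parameter of a count series (e.g. packets per
    10 ms interval) with the aggregated-variance method: for self-similar
    traffic Var(X^(m)) ~ m^(2H - 2), so the slope of log-variance versus
    log-block-size gives H."""
    counts = np.asarray(counts, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(counts) // m
        if n_blocks < 2:
            break
        blocks = counts[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(blocks.var()))
    slope, _ = np.polyfit(log_m, log_var, 1)
    return 1.0 + slope / 2.0   # H between 0.5 and 1 indicates long-range dependence
```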
The widespread use of the World Wide Web and related applications places interesting performance demands on network servers. The ability to measure the effect of these demands is important for tuning and optimizing the various software components that make up a Web server. To measure these effects, it is necessary to generate realistic HTTP client requests. Unfortunately, accurate generation of such traffic in a testbed of limited scope is not trivial. In particular, the commonly used approach is unable to generate client request-rates that exceed the capacity of the server being tested even for short periods of time. This paper examines pitfalls that one encounters when measuring Web server capacity using a synthetic workload. We propose and evaluate a new method for Web traffic generation that can generate bursty traffic, with peak loads that exceed the capacity of the server. Finally, we use the proposed method to measure the performance of a Web server.
In this paper we propose models for both temporal and spatial locality of reference in streams of requests arriving at Web servers. We show that simple models based on document popularity alone are insufficient for capturing either temporal or spatial locality. Instead, we rely on an equivalent, but numerical, representation of a reference stream: a stack distance trace. We show that temporal locality can be characterized by the marginal distribution of the stack distance trace, and we propose models for typical distributions and compare their cache performance to our traces. We also show that spatial locality in a reference stream can be characterized using the notion of self-similarity. Self-similarity describes long-range correlations in the dataset, which is a property that previous researchers have found hard to incorporate into synthetic reference strings. We show that stack distance strings appear to be strongly self-similar, and we provide measurements of the degree of self-similarity in our traces. Finally, we discuss methods for generating synthetic Web traces that exhibit the properties of temporal and spatial locality that we measured in our data.
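For readers unfamiliar with the stack distance representation, the following sketch (not taken from the paper; names are illustrative) converts a reference stream into a stack distance trace, whose marginal distribution is the temporal-locality measure described above.

```python
def stack_distances(references):
    """Convert a reference stream into a stack distance trace: for each
    request, the depth of the document in an LRU stack (1 = most recently
    used), or None for a first reference."""
    stack = []          # most recently used document at index 0
    trace = []
    for doc in references:
        if doc in stack:
            trace.append(stack.index(doc) + 1)
            stack.remove(doc)
        else:
            trace.append(None)   # infinite / first-reference distance
        stack.insert(0, doc)
    return trace

# Repeated references at small depths signal strong temporal locality.
print(stack_distances(["a", "b", "a", "c", "b", "a"]))
# [None, None, 2, None, 3, 3]
```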
In this paper, we analyze the way in which Web browsers use TCP connections based on extensive traffic traces obtained from a busy Web server (the official Web server of the 1996 Atlanta Olympic games). At the time of operation, this Web server was one of the busiest on the Internet, handling tens of millions of requests per day from hundreds of thousands of clients. We first describe the techniques used to gather these traces and reconstruct the behavior of TCP on the server. We then present a detailed analysis of TCP's loss recovery and congestion control behavior from the recorded transfers. The two most important results are that: (1) short Web transfers lead to poor loss recovery performance for TCP, and (2) concurrent connections cause an overly aggressive use of the network. We then discuss techniques designed to solve these problems. To improve the loss recovery performance of short transfers, we present a new technique for TCP loss recovery. To improve the congestion control and loss recovery performance of multiple parallel connections, we present a new transport-level integrated connection approach to congestion control and loss recovery. Simulation and trace analysis results show that our enhanced loss recovery scheme could have eliminated 25% of all timeout events, and our integrated connection approach provides greater fairness and improved startup performance for parallel connections. Our solutions are more general than a specific protocol enhancement such as the use of persistent connections in P-HTTP and HTTP/1.1, and include techniques, such as improved TCP loss recovery, that are not addressed by these protocols.
We analyze two days of queries to the popular NCSA Mosaic server to assess the geographic distribution of transaction requests. The wide geographic diversity of query sources and popularity of a relatively small portion of the web server file set present a strong case for deployment of geographically distributed caching mechanisms to improve server and network efficiency.
This paper addresses two unresolved issues about Web caching. [...] We first investigate the page request distribution seen by Web proxy caches using traces from a variety of sources. We find that the distribution does not follow Zipf's law precisely, but instead follows a Zipf-like distribution with the exponent varying from trace to trace. Furthermore, we find that there is i) a weak correlation between the access frequency of a Web page and its size and ii) a weak correlation between access frequency and its rate of change. We then consider a simple model where the Web accesses are independent and the reference probability of the documents follows a Zipf-like distribution. We find that the model yields asymptotic behaviors that are consistent with the experimental observations, suggesting that the various observed properties of hit ratios and temporal locality are indeed inherent to Web accesses observed by proxies. Finally, we revisit Web cache replacement algorithms and show that the algorithm that is suggested by this simple model performs best on real trace data. The results indicate that while page requests do indeed reveal short-term correlations and other structures, a simple model for an independent request stream following a Zipf-like distribution is sufficient to capture certain asymptotic properties observed at Web proxies.
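A minimal sketch of the Zipf-like model discussed above, assuming an illustrative exponent and page count: the i-th most popular page is requested with probability proportional to 1/i^alpha, and alpha can be estimated from a trace by a log-log fit of request count against popularity rank.

```python
import numpy as np

def zipf_like_probabilities(n_pages, alpha):
    """Zipf-like popularity: page of rank i has probability ~ 1 / i**alpha
    (alpha = 1 is classic Zipf)."""
    ranks = np.arange(1, n_pages + 1)
    weights = 1.0 / ranks ** alpha
    return weights / weights.sum()

def fit_alpha(request_counts):
    """Estimate alpha by a least-squares fit of log(count) vs. log(rank)
    over the ranked popularity profile of a trace."""
    counts = np.sort(np.asarray(request_counts, dtype=float))[::-1]
    counts = counts[counts > 0]
    ranks = np.arange(1, len(counts) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

# Independent-reference synthetic stream drawn from a Zipf-like law:
p = zipf_like_probabilities(n_pages=10_000, alpha=0.8)
stream = np.random.default_rng(0).choice(len(p), size=100_000, p=p)
```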
This paper presents an analysis of access traces collected from seven proxy servers deployed in various locations throughout the Internet. The traces record a total of 47.4 million requests made by 23,700 clients over a twenty-one day period. We use a combination of static analysis and trace-driven cache simulation to characterize the locality and sharing properties of these accesses. Our analysis shows that a 2- to 10-GB second-level cache yields hit rates between 24% and 45% with 85% of these hits due to sharing among different clients. Caches with more clients exhibit more sharing and thus higher hit rates. Between 2% and 7% of accesses are consistency misses to unmodified objects, using the Squid and CERN proxy cache coherence protocols. Sharing is bimodal. Requests for shared objects are divided evenly between objects that are narrowly shared and those that are shared by many clients; widely shared objects also tend to be shared by clients from unrelated traces.
This paper discusses the benefits and drawbacks of caching and replication strategies in the WWW with respect to the Internet infrastructure. Bandwidth consumption, latency, and overall error rates are considered to be most important from a network point of view. The dependencies of these values with input parameters like degree of replication, document popularity, actual cache hit rates, and error rates are highlighted. In order to determine the influence of different caching and replication strategies on the behavior of a single proxy server with respect to these values, trace-based simulations are used. Since the overall effects of such strategies can hardly be decided with this approach alone, a mathematical model has been developed to deal with their influence on the network as a whole. Together, this two-tiered approach permits us to propose quantitative assessments on the influence different caching and replication proposals (are going to) have on the Internet infrastructure.
In this paper we describe the Medusa proxy, a tool for exploring user-perceived Web performance. The Medusa proxy is a non-caching forwarding proxy used in conjunction with a user's browser. The two key features that the Medusa proxy provides are (1) the ability to simultaneously mirror HTTP requests from the browser to different Web delivery systems and directly compare the results, and (2) the ability to transform requests, e.g., to transform Akamai URLs back into URLs to the customer's origin server.
We show how the Medusa proxy can be used to explore user-perceived Web performance using two experiments that determine (1) to what extent using large-scale caching systems like the NLANR cache hierarchy improve user-perceived latency, and (2) to what extent content distribution networks like Akamai improve latency compared with using their customer's origin server. We found that the NLANR cache hierarchy served 63% of HTTP requests in our workload at least as fast as the origin servers, resulting in a decrease of average latency of 15%. By transforming and mirroring Akamai URLs to both Akamai edge servers and their customer's origin servers, we find that Akamai edge servers are able to serve HTTP requests an average of 5.7 times as fast as their customers' origin servers. However, because requests to Akamai edge servers are only 6% of our workload, we find that using Akamai reduces overall mean latency by only 2% for our workload.
The response time of a WWW service often plays an important role in its success or demise. From a user's perspective, the response time is the time elapsed from when a request is initiated at a client to the time that the response is fully loaded by the client. This paper presents a framework for accurately measuring the client-perceived response time in a WWW service. Our framework provides feedback to the service provider and eliminates the uncertainties that are common in existing methods. This feedback can be used to determine whether performance expectations are met, and whether additional resources (e.g. more powerful server or better network connection) are needed. The framework can also be used when a consolidator provides Web hosting service, in which case the framework provides quantitative measures to verify the consolidator's compliance to a specified Service Level Agreement. Our approach assumes the existing infrastructure of the Internet with its current technologies and protocols. No modification is necessary to existing browsers or servers, and we accommodate intermediate proxies that cache documents. The only requirement is to instrument the documents to be measured, which can be done automatically using a tool we provide.
Much work in the analysis of proxy caching has focused on high-level metrics such as hit rates, and has approximated actual reference patterns by ignoring exceptional cases such as connection aborts. Several of these low-level details have a strong impact on performance, particularly in heterogeneous bandwidth environments such as modem pools connected to faster networks. Trace-driven simulation of the modem pool of a large ISP suggests that "cookies" dramatically affect the cacheability of resources; wasted bandwidth due to aborted connections can more than offset the savings from cached documents; and using a proxy to keep from repeatedly opening new TCP connections can reduce latency more than simply caching data.
The explosion of WWW traffic necessitates an accurate picture of WWW use, and in particular requires a good understanding of client requests for WWW documents. To address this need, we have collected traces of actual executions of NCSA Mosaic, reflecting over half a million user requests for WWW documents. In this paper we describe the methods we used to collect our traces, and the formats of the collected data. Next, we present a descriptive statistical summary of the traces we collected, which identifies a number of trends and reference patterns in WWW use. In particular, we show that many characteristics of WWW use can be modelled using power-law distributions, including the distribution of document sizes, the popularity of documents as a function of size, the distribution of user requests for documents, and the number of references to documents as a function of their overall rank in popularity (Zipf's law). Finally, we show how the power-law distributions derived from our traces can be used to guide system designers interested in caching WWW documents.
The bandwidth demands of the World Wide Web continue to grow at a hyper-exponential rate. Given this rocketing growth, caching of web objects as a means to reduce network bandwidth consumption is likely to be a necessity in the very near future. Unfortunately, many Web caches do not satisfactorily maintain cache consistency. This paper presents a survey of contemporary cache consistency mechanisms in use on the Internet today and examines recent research in Web cache consistency. Using trace-driven simulation, we show that a weak cache consistency protocol (the one used in the Alex ftp cache) reduces network bandwidth consumption and server load more than either time-to-live fields or an invalidation protocol and can be tuned to return stale data less than 5% of the time.
Clusters of identical intermediate servers are often created to improve availability and robustness in many domains. The use of proxy servers for the WWW and of Rendezvous Points in multicast routing are two such situations. However, this approach is inefficient if identical requests are received and processed by multiple servers. We present an analysis of this problem, and develop a method called the Highest Random Weight (HRW) Mapping that eliminates these difficulties. Given an object name, HRW maps it to a server within a given cluster using the object name, rather than any a priori knowledge of server states. Since HRW always maps a given object name to the same server within a given cluster, it may be used locally at client sites to achieve consensus on object-server mappings. We present an analysis of HRW and validate it with simulation results showing that it gives faster service times than traditional request allocation schemes such as round-robin or least-loaded, and adapts well to changes in the set of servers. HRW is particularly applicable to domains in which there are a large number of requestable objects, and where there is a significant probability that a requested object will be requested again. HRW has now been adopted by the multicast routing protocol PIMv2 as its mechanism for clients to identify Rendezvous Points.
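The HRW mapping itself is simple enough to sketch. The following is a generic rendition using SHA-1 as the weight function; the paper's actual weight function may differ, and the names here are illustrative.

```python
import hashlib

def hrw_server(object_name, servers):
    """Highest Random Weight mapping: hash (object, server) for each
    candidate server and pick the server with the largest hash.  The choice
    depends only on the names, so all clients agree on the mapping without
    coordination, and removing a server only remaps the objects that were
    assigned to it."""
    def weight(server):
        digest = hashlib.sha1(f"{object_name}:{server}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(servers, key=weight)

servers = ["proxy-a", "proxy-b", "proxy-c"]
print(hrw_server("http://example.com/index.html", servers))
```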
Caching in the World Wide Web improves response time, reduces network bandwidth demand and reduces load on origin web servers. Caches achieve these benefits by storing copies of recently requested objects near end users, avoiding future need to transfer those objects. Cached objects are usually served more quickly and do not consume external network or server resources. Before returning an object, a cache must guess if the object it holds is still consistent with the original copy of the object. The cache may choose to validate the object's consistency with the origin server or may serve it directly to the user. If the cache must communicate with the origin server the response will take longer than one directly from cache. This report analyzes the impact of cache consistency on the response time of client requests. The analysis divides cache responses into classes according to whether or not the cache communicated with a remote server and whether or not object data was served from the cache. Analysis of traces from deployed proxy cache servers demonstrates that a round trip to a remote server is the dominant factor for response time. This study concludes that improving cache consistency will reduce response time and allow a cache to serve more user requests.
Many of the caching mechanisms in HTTP, especially in HTTP/1.0, depend on header fields that carry absolute timestamp values. Errors in these values could lead to undetected cache incoherence, or to excessive cache misses. Using an extensive proxy trace, we looked for HTTP responses exhibiting several different categories of timestamp-related errors. A significant fraction of these responses have detectable errors in timestamp-based header fields.
World-Wide Web proxy servers that cache documents can potentially reduce three quantities: the number of requests that reach popular servers, the volume of network traffic resulting from document requests, and the latency that an end-user experiences in retrieving a document. This paper examines the first two using the measures of cache hit rate (or fraction of client-requested URLs returned by the proxy) and weighted cache hit rate (or fraction of client-requested bytes returned by the proxy). A client request for an uncached document may cause the removal of one or more cached documents. Variable document sizes and types allow a rich variety of policies to select a document for removal, in contrast to policies for CPU caches or demand paging, that manage homogeneous objects. We present a taxonomy of removal policies. Through trace-driven simulation, we determine the maximum possible hit rate and weighted hit rate that a cache could ever achieve, and the removal policy that maximizes hit rate and weighted hit rate. The experiments use five traces of 38 to 190 days of client URL requests. Surprisingly, the criteria used by current proxy-server removal policies (LRU and a proposal by Pitkow and Recker) are among the worst performing criteria in our simulation; instead, replacing documents based on size maximizes hit rate in each of the studied workloads.
Web caches can not only reduce network traffic and downloading latency, but can also affect the distribution of web traffic over the network through cost-aware caching. This paper introduces GreedyDual-Size, which incorporates locality with cost and size concerns in a simple and non-parameterized fashion for high performance. Trace-driven simulations show that with the appropriate cost definition, GreedyDual-Size outperforms existing web cache replacement algorithms in many aspects, including hit ratios, latency reduction and network cost reduction. In addition, GreedyDual-Size can potentially improve the performance of main-memory caching of Web documents.
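GreedyDual-Size is commonly described as keeping a credit H = L + cost/size for each cached object, evicting the object with the smallest H, and raising the global aging value L to that H. The sketch below follows that common description rather than the paper's exact formulation; the class and parameter names are illustrative.

```python
import heapq

class GreedyDualSizeCache:
    """Sketch of GreedyDual-Size.  With cost = 1 it favors keeping small
    objects (maximizing hit ratio); with cost = download latency or network
    cost it favors keeping expensive objects."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0
        self.entries = {}   # url -> (H, size)
        self.heap = []      # (H, url), stale entries removed lazily

    def access(self, url, size, cost=1.0):
        """Record a request; returns True on a cache hit."""
        hit = url in self.entries
        if hit:
            size = self.entries[url][1]
        else:
            if size > self.capacity:
                return False                     # never cache oversized objects
            while self.used + size > self.capacity:
                H, victim = heapq.heappop(self.heap)
                if victim in self.entries and self.entries[victim][0] == H:
                    self.L = H                   # "inflate" the aging clock
                    self.used -= self.entries[victim][1]
                    del self.entries[victim]
            self.used += size
        H = self.L + cost / size
        self.entries[url] = (H, size)
        heapq.heappush(self.heap, (H, url))
        return hit
```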
Replacement policies for general caching applications and Web caching in particular have been discussed extensively in the literature. Many ad-hoc policies have been proposed that attempt to take advantage of the retrieval latency of documents, their size, the popularity of references and temporal locality of requested documents. However, the problem of finding optimal replacement policies under these factors has not been pursued in any systematic manner. In this paper, we take a step in that direction: We first show, still under the Independent Reference Model, that a simple Markov stationary replacement policy, called the policy C0, minimizes the long-run average metric induced by non-uniform document costs when document eviction is optional. We then propose a framework for operating caching systems with multiple performance metrics. We do so by solving a constrained caching problem with a single constraint. The resulting constrained optimal replacement policy is obtained by simple randomization between two Markov stationary optimal replacement policies C0 but induced by different costs.
The HTTP/1.1 protocol is the result of four years of discussion and debate among a broad group of Web researchers and developers. It improves upon its phenomenally successful predecessor, HTTP/1.0, in numerous ways. We discuss the differences between HTTP/1.0 and HTTP/1.1, as well as some of the rationale behind these changes.
This is the presentation material from Jeff's invited talk at the 1999 USENIX Annual Technical Conference.
With the recent (draft) standardization of the HTTP/1.1 protocol on the Web, it is natural to ask what percentage of popular Web sites speak HTTP/1.1 and how compliant are these so-called HTTP/1.1 servers. We attempt to answer these questions through a series of experiments based on the protocol standard. The tests are run on a comprehensive list of popular Web sites to which a good fraction of the Web traffic is directed. Our experiments were conducted on a global extensible testing infrastructure that we built to answer the above questions. The same infrastructure will be used to answer questions like the percentage of the traffic flow that is end-to-end HTTP/1.1. Our results show reasons for concern on the state of HTTP/1.1 protocol compliancy: some servers do not properly support basic features such as the HEAD method, while many popular servers do not support some of the key features (such as persistent connections) that were added in HTTP/1.1. Perhaps most alarming of all, some servers crashed during testing. As a result we believe that changes to the protocol specification may be required.
Currently, most traffic in the Internet backbone and access networks is World Wide Web (WWW) traffic. On the basis of recent long-time traffic traces we present characteristics of WWW traffic for different flow levels, namely port-to-port, host-to-host, and total client access. Using flow duration distributions, we obtain estimates for the point in time when a shortcut connection should be established. Investigations of the correlation structure of different flow characteristics reveal that symmetrical connections occur even at high bit rates. A large number of TCP connections can therefore benefit from symmetric access network bandwidths although HTTP traffic on average is asymmetric.
Today's Web interactions are frequently short, with an increasing number of responses carrying only control information and no data. While HTTP uses client-initiated TCP for all Web interactions, TCP is not always well-suited for short interactions. Furthermore, client-initiated TCP handicaps the deployment of interception caches in the network because of the possibility of disrupted connections when some client packets bypass the cache on their way to the server. We propose a new transfer protocol for Web traffic, called Dual-transport HTTP (DHTTP), which splits the traffic between UDP and TCP channels. When choosing the TCP channel, it is the server who opens the connection back to the client. Among important aspects of DHTTP are adapting to bottleneck shifts between a server and the network and coping with the unreliable nature of UDP. The comparative performance study of DHTTP and HTTP using trace-driven simulation as well as testing real HTTP and DHTTP servers showed a significant performance advantage of DHTTP when the bottleneck is at the server and comparable performance when the bottleneck is in the network. By using server-initiated TCP, DHTTP also eliminates the possibility of disrupted TCP connections in the presence of interception caches thereby allowing unrestricted caching within backbones.
We describe our investigation of the effect of persistent connections, pipelining and link level document compression on our client and server HTTP implementations. A simple test setup is used to verify HTTP/1.1's design and understand HTTP/1.1 implementation strategies. We present TCP and real time performance data between the libwww robot and both the W3C's Jigsaw and Apache HTTP servers using HTTP/1.0, HTTP/1.1 with persistent connections, HTTP/1.1 with pipelined requests, and HTTP/1.1 with pipelined requests and deflate data compression. We also investigate whether the TCP Nagle algorithm has an effect on HTTP/1.1 performance. While somewhat artificial and possibly overstating the benefits of HTTP/1.1, we believe the tests and results approximate some common behavior seen in browsers. The results confirm that HTTP/1.1 is meeting its major design goals. Our experience has been that implementation details are very important to achieve all of the benefits of HTTP/1.1.
The workload of the global Internet is dominated by the Hypertext Transfer Protocol (HTTP), an application protocol used by World Wide Web clients and servers. Simulation studies of IP networks will require a model of the traffic patterns of the World Wide Web, in order to investigate the effects of this increasingly popular application. We have developed an empirical model of network traffic produced by HTTP. Instead of relying on server or client logs, our approach is based on packet traces of HTTP conversations. Through traffic analysis, we have determined statistics and distributions for higher-level quantities such as the size of HTTP files, the number of files per "Web page", and user browsing behavior. These quantities form a model that can then be used by simulations to mimic World Wide Web network applications.
A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Although Bloom filters allow false positives, for many applications the space savings outweigh this drawback when the probability of an error is sufficiently low. We introduce compressed Bloom filters, which improve performance when the Bloom filter is passed as a message, and its transmission size is a limiting factor. For example, Bloom filters have been suggested as a means for sharing Web cache information. In this setting, proxies do not share the exact contents of their caches, but instead periodically broadcast Bloom filters representing their cache. By using compressed Bloom filters, proxies can reduce the number of bits broadcast, the false positive rate, and/or the amount of computation per lookup. The cost is the processing time for compression and decompression, which can use simple arithmetic coding, and more memory use at the proxies, which utilize the larger uncompressed form of the Bloom filter.
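A minimal Bloom filter sketch, to make the membership-query trade-off concrete. The compression step proposed in the paper (a larger, sparser filter that is compressed, e.g. with arithmetic coding, before transmission) is only indicated in the comments; the sizes chosen here are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions set k bits per inserted key.
    A lookup reports "possibly present" if all k bits are set, so false
    positives are possible but false negatives are not.  A compressed Bloom
    filter would deliberately use a larger, sparser bit array and compress
    it before broadcasting it to peer proxies."""

    def __init__(self, n_bits=8192, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, key):
        for i in range(self.n_hashes):
            digest = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = BloomFilter()
bf.add("http://example.com/logo.gif")
print("http://example.com/logo.gif" in bf)   # True
print("http://example.com/other.gif" in bf)  # False (with high probability)
```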
While algorithms for cooperative proxy caching have been widely studied, little is understood about cooperative-caching performance in the large-scale World Wide Web environment. This paper uses both trace-based analysis and analytic modelling to show the potential advantages and drawbacks of inter-proxy cooperation. With our traces, we evaluate quantitatively the performance-improvement potential of cooperation between 200 small-organization proxies within a university environment, and between two large-organization proxies handling 23,000 and 60,000 clients, respectively. With our model, we extend beyond these populations to project cooperative caching behavior in regions with millions of clients. Overall, we demonstrate that cooperative caching has performance benefits only within limited population bounds. We also use our model to examine the implications of future trends in Web-access behavior and traffic.
This paper presents Cache Digest, a novel protocol and optimization technique for cooperative Web caching. Cache Digest allows proxies to make information about their cache contents available to peers in a compact form. A peer uses digests to identify neighbors that are likely to have a given document. Cache Digest is a promising alternative to traditional per-request query/reply schemes such as ICP.
We discuss the design ideas behind Cache Digest and its implementation in the Squid proxy cache. The performance of Cache Digest is compared to ICP using real-world Web caches operated by NLANR. Our analysis shows that Cache Digest outperforms ICP in several categories. Finally, we outline improvements to the techniques we are currently working on.
We describe the structure and functionality of the Internet Cache Protocol (ICP) and its implementation in the Squid Web Caching software. ICP is a lightweight message format used for communication among Web caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object.
We present background on the history of ICP, and discuss issues in ICP deployment, efficiency, security, and interaction with other aspects of Web traffic behavior. We catalog successes, failures, and lessons learned from using ICP to deploy a global Web cache hierarchy.
The sharing of caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless it is not widely deployed due to the overhead of existing protocols. In this paper we demonstrate the benefits of cache sharing, measure the overhead of the existing protocols, and propose a new protocol called "Summary Cache". In this new protocol, each proxy keeps a summary of the cache directory of each participating proxy, and checks these summaries for potential hits before sending any queries. Two factors contribute to our protocol's low overhead: the summaries are updated only periodically, and the directory representations are very economical, as low as 8 bits per entry. Using trace-driven simulations and a prototype implementation, we show that, compared to existing protocols such as the Internet Cache Protocol (ICP), Summary Cache reduces the number of inter-cache protocol messages by a factor of 40 to 65, reduces the bandwidth consumption by over 50%, eliminates 75% and 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratio as ICP. Hence Summary Cache scales to a large number of proxies.
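A schematic of the Summary Cache lookup path, assuming the peer summaries behave like the Bloom filters sketched earlier. The callables and data layout below are placeholders, not the paper's implementation.

```python
def fetch(url, local_cache, peer_summaries, fetch_from_peer, fetch_from_origin):
    """Sketch: serve locally if possible, otherwise consult the periodically
    updated (and possibly stale) summaries of peer caches and query only
    peers whose summary suggests a hit, falling back to the origin server."""
    if url in local_cache:
        return local_cache[url]
    for peer, summary in peer_summaries.items():
        if url in summary:                     # potential hit (may be a false positive)
            body = fetch_from_peer(peer, url)
            if body is not None:               # the summary was right
                local_cache[url] = body
                return body
    body = fetch_from_origin(url)
    local_cache[url] = body
    return body
```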
This paper develops a new method for prefetching Web pages into the client cache. Clients send reference information to Web servers, which aggregate the reference information in near-real-time and then disperse the aggregated information to all clients, piggybacked on GET responses. The information indicates how often hyperlink URLs embedded in pages have been previously accessed relative to the embedding page. Based on knowledge about which hyperlinks are generally popular, clients initiate prefetching of the hyperlinks and their embedded images according to any algorithm they prefer. Both client and server may cap the prefetching mechanism's space overhead and waste of network resources due to speculation. The result of these differences is improved prefetching: lower client latency (by 52.3%) and less wasted network bandwidth (24.0%).
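The client-side prefetch decision described above can be sketched as a simple threshold rule over the piggybacked usage counts. The threshold, budget, and data layout below are illustrative rather than the paper's.

```python
def links_to_prefetch(embedded_links, usage_counts, page_views,
                      threshold=0.25, budget=3):
    """Prefetch a hyperlink if it was followed from this page in at least
    `threshold` of previous views, capped at `budget` links to bound
    speculative use of network resources."""
    if page_views == 0:
        return []
    scored = [(usage_counts.get(link, 0) / page_views, link)
              for link in embedded_links]
    popular = sorted((s, l) for s, l in scored if s >= threshold)[::-1]
    return [link for _, link in popular[:budget]]

# A page viewed 200 times whose "news" link was followed 120 times:
print(links_to_prefetch(["/news", "/about"], {"/news": 120, "/about": 5}, 200))
# ['/news']
```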
An author's exclusive right to control the making of "copies" of original works is the fundamental building block of any copyright regime. Application of this concept to electronic documents on the global Internet is proving difficult, in light of the fact that "copying" -- duplicating the information stored in binary files -- is an essential step in the transmission of information across the Internet, and thus is required for any and all utilization of works of authorship in this environment. To the extent the transitory duplication of electronic files is characterized as copying for purposes of section 106 of the U.S. Copyright Act -- a view that, arguably, has been endorsed by several courts and, less arguably, by the federal government's National Information Infrastructure Working Group on Intellectual Property -- much of what takes place over electronic networks (including the Internet) is potentially subject to a claim of copyright infringement.
Copying and storing Web pages is vital to the Internet's survival -- but is it legal?
Can merely turning on a computer constitute copyright infringement? In the Ninth Circuit, yes.
On October 12, 1998, Congress passed the Digital Millennium Copyright Act (DMCA), a complex piece of legislation which makes major changes in U.S. copyright law to address the digitally networked environment. The President is expected to sign the DMCA shortly. This memorandum discusses the law's five titles ...
This article outlines the hidden dangers of network/web caching and the implementation of a global mesh of caches. According to the author, "the integrity of information will decline, and data security risks are sure to escalate. Both Web caching and the international hierarchical cache system set the stage for abuses such as: individual monitoring and surveillance, tampering, identity theft, censorship, intellectual property/copyright infringement, invasion of privacy, and taxation."