web-cache.com: Writings: Historical Documents

Web Caching Early Papers

Caching in Large Scale Distributed File Systems

Matthew Blaze

PhD thesis, Princeton University, 1992

This thesis examines the problem of cache organization for
very large-scale distributed file systems (DFSs). Conventional
DFSs, based on the client–server model, suffer from
bottlenecks when the total client load exceeds the server’s
capacity. Previous work has suggested that hierarchical
client organizations can ameliorate the problem somewhat,
but at the expense of a substantial increase in client
latency. An analysis of existing DFS workloads reveals that
there is considerable regularity in client file access
patterns and that widely shared files lend themselves
especially well to caching techniques. In particular, a
large proportion of “cache miss” traffic is for files
that are already copied in another client’s cache. If clients
can share these cached files, the server’s load can be
reduced by a potentially large margin, making larger-scale
systems possible. We introduce the notion of dynamic
hierarchical caching, in which adaptive client hierarchies
are constructed on a file-by-file basis. Trace-driven
simulation and workload-driven runs of a prototype file
system suggest that dynamic hierarchies can reduce server
load substantially without the client performance penalties
associated with more static schemes.
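
The idea of letting clients serve cache misses for one another can be sketched roughly as follows. This is a minimal illustration and not the thesis's design: the Server and Client classes, the locate and note_holder methods, and the policy of always picking the first known holder are all assumptions made for this example. It also ignores cache eviction and consistency.

```python
from collections import defaultdict


class Server:
    """Hypothetical file server that remembers which clients cache each file."""

    def __init__(self, files):
        self.files = dict(files)
        self.holders = defaultdict(list)  # path -> clients known to hold a copy
        self.transfers = 0                # file bodies the server itself sent

    def locate(self, path):
        """Return some client that caches `path`, or None if nobody does."""
        holders = self.holders[path]
        # Assumption: always pick the first holder. A real scheme would
        # spread the load across holders.
        return holders[0] if holders else None

    def note_holder(self, path, client):
        self.holders[path].append(client)

    def read(self, path, client):
        """Serve the file body directly and record the new holder."""
        self.transfers += 1
        self.note_holder(path, client)
        return self.files[path]


class Client:
    """Hypothetical client with a local cache that can also serve peers."""

    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.cache = {}

    def open(self, path):
        if path in self.cache:            # local hit, no network traffic
            return self.cache[path]
        peer = self.server.locate(path)   # small lookup, no file body sent
        if peer is not None:
            data = peer.cache[path]       # fetch from the peer's cache
            self.server.note_holder(path, self)
        else:
            data = self.server.read(path, self)
        self.cache[path] = data
        return data
```

With three clients reading the same widely shared file, only the first read transfers the file body from the server; the other two are satisfied from a peer's cache, so the hierarchy for that file forms on demand.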

Dynamic Hierarchical Caching for Large-Scale Distributed File Systems

Rafael Alonso and Matthew Blaze

Proceedings of the Twelfth International Conference on Distributed Computing Systems, June 1992

Most Distributed File Systems (DFSs) are based on a flat
client-server model in which each client interacts directly
with the file server for all file operations. While this
model works well for relatively small systems in which the
file server has adequate capacity for all its clients, it
does not scale to large numbers of clients or systems in
which the clients are connected to the server through
low-bandwidth links. Server traffic can be reduced substantially
if clients keep even a modest-sized cache of previously
read files. Intuitively, the benefits of caching can be
increased by organizing clients into a hierarchy, in which
only a small number of machines communicate directly with
the file server, providing intermediate caching services
to machines below them in the hierarchy. While this potentially
reduces server traffic for widely shared files, it can
introduce a significant delay for clients low in the hierarchy
for access to files with a low degree of sharing. This paper
describes a simple method for constructing dynamic hierarchies
on a file-by-file basis. The results of a trace-driven
simulation of a dynamic hierarchical filesystem are presented,
yielding a reduction in server traffic of a factor of more
than two for shared files compared with a flat scheme and
without a large increase in client access time. An algorithm
to maintain cache consistency with low overhead by detecting
missed cache invalidation messages is given.
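
One common way to detect missed invalidation messages is to number them and look for gaps. The sketch below assumes that approach; the abstract does not say how the paper's algorithm works, so the sequence numbers, the CachingClient class, and the choice to discard the whole cache on a gap are all assumptions made for illustration.

```python
class CachingClient:
    """Hypothetical client that detects lost invalidations via sequence gaps."""

    def __init__(self):
        self.cache = {}     # path -> cached file contents
        self.last_seq = 0   # sequence number of the last invalidation seen

    def on_invalidate(self, seq, path):
        """Handle invalidation number `seq` for `path`."""
        if seq <= self.last_seq:
            return  # duplicate or stale message, already accounted for
        if seq != self.last_seq + 1:
            # Gap: at least one message was lost, and we cannot tell which
            # entries it covered, so conservatively drop everything.
            self.cache.clear()
        else:
            self.cache.pop(path, None)
        self.last_seq = seq
```

For example, a client that receives messages 1 and then 3 knows that message 2 was lost and empties its cache instead of risking a stale read. The only per-message cost is one integer comparison.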

Alex – a global filesystem

Vincent Cate

Proceedings of the USENIX File Systems Workshop, pages 1-11, May 1992

Multi-level caching in distributed file systems – or – your cache ain’t nuthin’ but trash

D. Muntz and Peter Honeyman

USENIX Winter Conference, pages 305-313, January 1992

A case for caching file objects inside internetworks

Peter B. Danzig, Michael F. Schwartz, and Richard S. Hall

ACM SIGCOMM 93 Conference, pages 239-248, September 1993

World-wide web proxies

Ari Luotonen and Kevin Altis

Computer Networks and ISDN Systems. First International Conference on the World-Wide Web, Elsevier Science BV, 1994

A WWW proxy server, proxy for short, provides access to the
Web for people on closed subnets who can only access the
Internet through a firewall machine. The hypertext server
developed at CERN, cern_httpd, is capable of running as a
proxy, providing seamless external access to HTTP, Gopher,
WAIS and FTP. cern_httpd has had gateway features for a
long time, but only this spring they were extended to support
all the methods in the HTTP protocol used by WWW clients.
Clients don’t lose any functionality by going through a
proxy, except special processing they may have done for
nonnative Web protocols such as Gopher and FTP. A brand
new feature is caching performed by the proxy, resulting
in shorter response times after the first document fetch.
This makes proxies useful even to the people who do have
full Internet access and don’t really need the proxy just
to get out of their local subnet. This paper gives an
overview of proxies and reports their current status.
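
The caching behaviour described above, where only the first request for a document goes to the origin server, can be sketched as follows. This is not cern_httpd's implementation: the CachingProxy class and the injected fetch callable are hypothetical, and the sketch omits expiry and revalidation, which any real proxy cache needs.

```python
class CachingProxy:
    """Hypothetical proxy that remembers documents it has already fetched."""

    def __init__(self, fetch):
        self.fetch = fetch  # callable url -> bytes, stands in for the origin fetch
        self.cache = {}     # url -> document body
        self.hits = 0
        self.misses = 0

    def get(self, url):
        """Return the document, contacting the origin only on a cache miss."""
        if url in self.cache:
            self.hits += 1
            return self.cache[url]
        self.misses += 1
        body = self.fetch(url)
        self.cache[url] = body
        return body
```

Passing the fetch function in as a parameter keeps the sketch free of network code: a test can supply a stub that counts calls, while a real deployment would supply a function that performs the HTTP, Gopher, WAIS, or FTP request.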

The Harvest information discovery and access system

C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, and Michael F. Schwartz

Proceedings of the Second International World Wide Web Conference, pages 763-771, October 1994

Web traffic characterization: an assessment of the impact of caching documents from NCSA’s web server

Hans-Werner Braun and Kimberly Claffy

Second International World Wide Web Conference, October 1994

The case for geographical push-caching

James Gwertzman and Margo Seltzer

HotOS Conference, 1994

A Caching Relay for the World Wide Web

Steve Glassman

Proceedings of the First International WWW Conference, 1994

Invalidation in Large Scale Network Object Caches

Kurt Jeffery Worrell

Master's thesis, University of Colorado, December 1994

Intelligent Caching for World-Wide Web Objects

Duane Wessels

Master's thesis, University of Colorado, Boulder, February 1995

This thesis describes some software designed to improve
access to World-Wide Web (WWW) data on the global Internet.
The tools used for retrieving WWW objects allow users to
be unaware of where the data actually resides. Huge
inefficiencies result when objects are repeatedly transmitted
across relatively slow wide area network (WAN) connections.
A solution to this problem is to install object caches at
strategic places in the network. Caches are implemented on
proxy servers which act as intermediaries between local
clients and remote servers. Frequently accessed objects
will already be in the cache thereby speeding delivery time
to clients and reducing WAN traffic.

Application-level document caching in the Internet

Azer Bestavros, Robert Carter, Mark Crovella, Carlos Cunha, Abdelsalam Heddaya, and Sulaiman Mirdad

IEEE SDNE’96: The Second International Workshop on Services in Distributed and Networked Environments