Research in Client/Server Caching in Distributed File Systems

Collaborating Investigators on this Project:

The client-server environment has supplanted the traditional mainframe environment in many organizations. While this new paradigm presents many exciting computational opportunities, it also gives rise to new challenges concerning system design and performance. As both workstation clients and servers become increasingly well-resourced, and both users and their applications become increasingly demanding, a number of system design decisions need to be rethought if service expectations are to be met. Our research in this area has addressed a broad range of performance and design issues in client-server clusters.

The overall theme of our investigation of disk block caching in distributed client-server file systems was that caching at clients dramatically alters the workload presented to the file server, so traditional approaches to cache management may perform poorly at servers. The investigation was carried out in several stages, using as input to simulations detailed traces of file reference activity obtained through specifically designed probes we inserted into a Unix (HP-UX) kernel. We published three papers from this project, all of which are available on the DISCUS ftp Server.

The first stage of the investigation, described in a paper entitled Disk Cache Replacement Policies for Network Fileservers, dealt with the caching of read requests. Because the request stream presented to the server cache consists of misses from the client cache, the temporal locality on which traditional cache management depends has already been filtered out by the client cache. Locality-based replacement strategies such as LRU, while well suited to client caches, are therefore poor choices at server caches, where frequency-based approaches such as LFU may perform better.

A second paper, entitled Write Caching in Distributed File Systems, extended this work to cover write requests using special write caches. Concerns for data integrity have traditionally led to very conservative ``write-through'' approaches, in which the results of updates to cached blocks are propagated immediately to the file server, so the benefits of caching are often not available to write requests. The increasing availability of non-volatile memory technologies now makes it possible to delay the write-back of changed blocks without fear of compromising the integrity of the stored data. Delaying write-back improves performance in two ways: first, the effects of a series of operations on the same block can be reflected in a single write-back; second, optimization techniques can be applied to amortize the cost of disk access over multiple write-backs. Again, our results showed that approaches that perform well at client caches may be poor choices at server caches: at the client, locality-based approaches once more perform well, while at the server it is more important to consider the cost of disk access through a purging-based approach.

A third paper, entitled The Effect of Client Caching on File Server Workloads, addressed several remaining questions, including scalability. In a real client-server system, two effects disrupt the stream of requests presented to the server cache: a ``filtering'' effect due to the presence of client caches, and an ``interleaving'' effect due to the presence of multiple clients. Both factors were shown to affect the server workload significantly, with filtering the dominant effect.
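The two-level caching setup at the heart of this work can be illustrated with a small sketch: a client cache absorbs recency hits, and the server cache sees only the client's misses. The class names, toy traces, and simple LRU/LFU implementations below are illustrative assumptions, not the trace-driven simulator used in the papers.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Locality-based replacement: evict the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion/access order tracks recency

    def access(self, block):
        """Return True on a hit; on a miss, insert the block, evicting if full."""
        if block in self.store:
            self.store.move_to_end(block)  # refresh recency
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        self.store[block] = True
        return False


class LFUCache:
    """Frequency-based replacement: evict the least frequently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = set()
        self.freq = Counter()  # reference counts, including past residency

    def access(self, block):
        self.freq[block] += 1
        if block in self.resident:
            return True
        if len(self.resident) >= self.capacity:
            victim = min(self.resident, key=lambda b: self.freq[b])
            self.resident.remove(victim)
        self.resident.add(block)
        return False


def two_level_hits(trace, client, server):
    """Feed a block trace through the client cache; the server cache sees
    only the client's misses -- the 'filtered' stream discussed above."""
    client_hits = server_hits = 0
    for block in trace:
        if client.access(block):
            client_hits += 1
        elif server.access(block):
            server_hits += 1
    return client_hits, server_hits
```

With a tiny client cache in front, repeated references to a block tend to hit at the client, so the miss stream reaching the server carries little recency information; that is why a frequency-based policy like LFU can be the better fit there.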
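The benefit of delayed write-back can likewise be shown with a toy sketch: repeated writes to the same block coalesce into a single pending copy, and a flush issues one disk write per distinct dirty block. The class and its structure are illustrative assumptions (standing in for an NVRAM-backed write cache), not the purging-based mechanism evaluated in the paper.

```python
class WriteBackCache:
    """Toy delayed write-back cache: dirty blocks are held (as if in
    non-volatile memory) and flushed together, so a series of writes to
    one block costs only a single write-back."""

    def __init__(self):
        self.dirty = {}       # block number -> latest pending data
        self.write_backs = 0  # disk writes actually issued

    def write(self, block, data):
        # Coalesce: a later write simply overwrites the pending copy.
        self.dirty[block] = data

    def flush(self):
        # Visiting blocks in sorted order stands in for a disk-scheduling
        # optimization that amortizes access cost over the batch.
        for block in sorted(self.dirty):
            self.write_backs += 1  # one disk write per distinct dirty block
        self.dirty.clear()
```

Under write-through, four writes below would cost four disk accesses; with delayed write-back, the three writes to block 5 collapse into one, for two write-backs in total.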

Project Sponsors
