Advanced Networking Applications (ANA) 98-99

FINAL REPORT

1. Description of Project

This project focuses on the design and performance evaluation of the national Web caching infrastructure for CA*net II. The project builds upon our past work on workload characterization for Internet Web servers and simulation evaluation of document caching strategies for Web servers, and shifts the emphasis to distributed Web servers and/or proxy servers, wherein multiple geographically distributed machines function cohesively to provide a high performance Web service, in either a peer-to-peer or hierarchical organization. Interesting performance issues that arise relate to: caching performance with different cache management strategies; load balancing with different request dispatching policies; scalability of the design with number of users, number of server nodes, and network bandwidth; and the overall improvement in end-user response times.

Our work is simulation-based: we develop a trace-driven discrete-event simulator for distributed Web servers and drive it with empirical traces to evaluate design questions and performance issues. In addition to general research conclusions regarding distributed Web servers, it is expected that our project will produce specific recommendations regarding the design and deployment of a national Web caching architecture for CA*net II, in terms of cache configurations, cache management policies, scalability of the design, and overall system and network performance with different design alternatives.

There are many interesting performance issues to be addressed in the design of a geographically distributed Web caching infrastructure. At the highest level, the two main research questions are:

  • What should be the overall design of the Web caching infrastructure?
  • How should the caches be managed?

    At a finer level, additional research questions address the issues of cache configuration, cache performance, and overall system performance.

    Our study considers both peer-to-peer and multi-level architectures. In this context, issues of cache consistency, cache performance, load balancing, scalability, and overall system performance arise. We concentrate mainly on caching performance, load balancing policies, and cache management for distributed Web servers.

    A detailed, general purpose, discrete-event simulator for clustered Web servers will be developed as part of this project, and used to explore the research questions identified above, with parts of the study tailored to specific aspects of the CA*net II Web caching infrastructure, as appropriate. Empirical Web workload traces will be used in the study.
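To make the trace-driven approach concrete, the following is a minimal sketch of replaying a request trace through a single cache and measuring the hit ratio. It is not the project's simulator, which is a detailed discrete-event model; the function name `simulate_lru`, the `(url, size)` trace format, and the sample trace are assumptions made purely for illustration.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity_bytes):
    """Replay (url, size) requests through one LRU cache; return the hit ratio."""
    cache = OrderedDict()  # url -> size, least recently used entry first
    used = 0               # bytes currently stored
    hits = 0
    for url, size in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # document is too large to cache at all
        # Evict least recently used documents until the new one fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[url] = size
        used += size
    return hits / len(trace) if trace else 0.0

# Hypothetical six-request trace; real studies replay empirical access logs.
trace = [("/a", 40), ("/b", 40), ("/a", 40), ("/c", 40), ("/b", 40), ("/a", 40)]
small_cache_ratio = simulate_lru(trace, capacity_bytes=100)
large_cache_ratio = simulate_lru(trace, capacity_bytes=120)
```

Replaying the same trace against different cache sizes or policies is what allows design alternatives to be compared on identical workloads.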

    Upon completion of the project, our simulator will be made available to other researchers as a planning tool for Web cache and network configuration requirements.

    2. Project Results

    The project has been successful in all respects.

    We have created (and made available) software for two different versions of a Web caching simulator: the first is intended for modeling clustered Web servers, and the second is intended for modeling geographically distributed Web proxy caching hierarchies. Both versions are available here to project sponsors:

  • Clustered Web server simulator (available now upon request to carey@cs.usask.ca)
  • Distributed Web server simulator (available now upon request to carey@cs.usask.ca)

    Documentation is provided as well.

    We have conducted detailed simulations of clustered Web server architectures based on empirical Web server workload traces provided by Hewlett-Packard. These simulation experiments focussed on the tradeoffs between caching and load balancing performance in clustered Web servers. Preliminary results from this research were presented at the CANARIE Advanced Networks Workshop in Ottawa on December 15-16, 1998. This work led to a research paper "Achieving Load Balance and Effective Caching in Clustered Web Servers" (20 pages, PostScript, 1.9 MB), a version of which was published and presented at the NLANR Web Caching Workshop in San Diego, March 30-April 2, 1999.

    We have conducted a detailed workload characterization study of Web proxy workloads, using empirical workload traces from the University of Saskatchewan Web proxy cache (6 months of data), the CANARIE Web proxy cache (4 months of data) and the NLANR Web proxy cache (3 months of data) from the University of Illinois at Urbana-Champaign. A technical report describing our workload study is available here:

  • A. Mahanti and C. Williamson, "Web Proxy Workload Characterization" (30 pages, PostScript, 850 KB)

    In addition to his help with the workload characterization study, M.Sc. student Anirban Mahanti has written a paper entitled "The Limited Impact of Temporal Locality on Web Proxy Cache Performance". This paper discusses the changes in Web proxy workload characteristics across the levels of a Web caching hierarchy, and some of the implications on Web proxy cache performance. In particular, the large number of "one-timer" documents and the relatively high document modification rates (especially for popular documents) at high-level Web proxies suggest that caching can have limited effectiveness. This paper (available upon request) has been submitted for possible external publication. Further information on the status of this paper will be known in May 1999.

    We have conducted preliminary experiments with the distributed Web proxy caching hierarchy simulator, using empirical Web proxy traces as input, to evaluate candidate designs for the CA*net II Web caching infrastructure. We find the current hierarchical design with a single core cache rather ineffective, and recommend a geographically distributed "national" Web cache with four or five regional caches (see more discussion below).

    Six additional technical reports were produced by graduate students as course projects during this project. These reports have added to our understanding of Web proxy workloads, Web proxy caching strategies, and Web proxy server response time. Two of these students (Mudashiru Busari and Adeniyi Oke) are funded by TRLabs as part of this CANARIE project. The other two students (Linping Yu and Yanping Zhao) are funded by other sources, but are pursuing research topics in Web server and Web proxy performance. The student project reports are available here:

  • M. Busari, "Comparison of Cache Replacement Algorithms in Web Proxies" (12 pages, MS Word, 750 KB)
  • M. Busari, "Performance Issues in Web Proxies" (10 pages, PostScript, 300 KB)
  • A. Oke, "Web Proxy Response Time Analysis" (24 pages, MS Word, 150 KB, no graphs yet)
  • A. Oke, "Impact of Document Caching on Web Server Workloads" (10 pages, PostScript, 5.6 MB)
  • L. Yu, "Internet Web Proxies: Workload Characterization" (7 pages, PostScript, 250 KB)
  • Y. Zhao, "Caching Strategies for Web Proxies" (15 pages, MS Word, 90 KB)

    Last but not least, research interactions with Hewlett-Packard Research Laboratories (Palo Alto, California) have proved fruitful, resulting in the donation of two HP computers for use in this project. Together with the two computers purchased with the assistance of CANARIE, this makes four machines for use in this project.

    Regarding direct outcomes, the specific deliverables identified in our project proposal were:

  • completed simulator for distributed Web/proxy servers
  • performance study of cache management strategies
  • performance study of load balancing policies
  • research paper describing performance results
  • workload characterization study for Web caching proxies
  • release of simulator as planning tool for Web cache and network configuration performance studies
  • recommendations on CA*net II Web caching infrastructure

    As described above, all deliverables (and more!) have been met.

    The final deliverable concerns configuration recommendations for the CA*net II Web caching hierarchy. Our recommendations regarding this fall into two general categories. The first category recommends a distributed Web caching architecture, wherein the "primary" core caching nodes are actually implemented using 4 or 5 geographically distributed Web caching nodes, on a regional basis. This architecture provides a better avenue for load balancing, as well as for serving content with regional interest, and should result in overall improved response times for Web users, compared to an architecture with a single central caching node. The second set of recommendations concerns cache partitioning strategies that can make more effective use of the aggregate cache space than a simple strategy that allows document replication in many of the regional caches. This can improve overall cache hit ratio performance, and end user response time. For this purpose, we find static partitioning of the URL space to provide adequate cache performance, with minimal adverse impact on load balancing performance.
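As an illustration of static partitioning of the URL space, the sketch below assigns every URL to exactly one regional cache by hashing it. The report does not state how the partition was defined in our experiments, so the hash-based mapping, the function name `home_cache`, and the five cache names are assumptions for illustration only.

```python
import hashlib

# Hypothetical names for five regional caching nodes.
REGIONAL_CACHES = ["region-0", "region-1", "region-2", "region-3", "region-4"]

def home_cache(url, caches=REGIONAL_CACHES):
    """Map a URL to the single regional cache responsible for storing it."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a partition.
    index = int.from_bytes(digest[:8], "big") % len(caches)
    return caches[index]

# Every request for the same URL is directed to the same cache, so a document
# is stored at most once across the regional caches rather than replicated.
urls = [f"http://example.org/doc{i}.html" for i in range(1000)]
placement = {url: home_cache(url) for url in urls}
load = {name: 0 for name in REGIONAL_CACHES}
for name in placement.values():
    load[name] += 1
```

Because the mapping is fixed, the aggregate cache space holds distinct documents only, which is the source of the hit ratio improvement; the `load` counts give a rough view of how evenly the URL space is spread across nodes.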

    Effective cache management is of course crucial in this environment. We strongly recommend the use of frequency-based replacement policies for Web proxy caches, rather than the default recency-based LRU policy. This change can provide better protection against cache pollution from one-timers. We also find that the hit rates at the top-level Canadian cache are extremely low, due to cache filter effects at lower level caches. Furthermore, the hit rates at the top-level cache drop significantly as the regional-level caches grow in size. For these reasons, we are in fact somewhat skeptical of the overall effectiveness of a national caching hierarchy as such. Stated another way, given the choice of adding another 50 GB of storage at a single core cache, or adding another 10 GB to each of five regional caches, we would recommend the latter approach.
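The effect of one-timers on recency-based and frequency-based replacement can be seen in a small sketch. The code below compares a plain LRU policy with a simple in-cache LFU policy (ties broken by least recent use) on a synthetic trace in which two popular documents are interleaved with bursts of one-timers. The trace, the capacity of three documents, and both function names are assumptions for illustration; they are not taken from our simulator or our empirical traces.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Count hits for a cache holding `capacity` documents under LRU."""
    cache = OrderedDict()
    hits = 0
    for url in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[url] = True
    return hits

def lfu_hits(trace, capacity):
    """Count hits under in-cache LFU; ties go to the least recently used."""
    cache = {}  # url -> [reference count, time of last use]
    hits = 0
    for clock, url in enumerate(trace):
        if url in cache:
            hits += 1
            cache[url][0] += 1
            cache[url][1] = clock
        else:
            if len(cache) >= capacity:
                victim = min(cache, key=lambda u: (cache[u][0], cache[u][1]))
                del cache[victim]
            cache[url] = [1, clock]
    return hits

def make_trace(rounds=5):
    """Two popular documents requested twice per round, then three one-timers."""
    trace = []
    for r in range(rounds):
        trace += ["/popular-a", "/popular-b", "/popular-a", "/popular-b"]
        trace += [f"/one-timer-{r}-{i}" for i in range(3)]
    return trace

trace = make_trace()
lru = lru_hits(trace, capacity=3)
lfu = lfu_hits(trace, capacity=3)
```

On this synthetic trace the one-timers repeatedly push the popular documents out of the LRU cache, while under LFU each new one-timer displaces only the previous one-timer, so the popular documents stay resident.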

    As for "reach", this project has had an impact in several ways. First, the students and researchers involved in this project have learned a LOT about Web proxy caching performance. Second, our research results have been disseminated into the Web caching community, via presentations at both the CANARIE Advanced Networks Workshop in December 1998 and the NLANR Web Caching Workshop in March 1999. Third, we have had industrial interest from MTS and Nortel regarding ideas in our work on clustered Web servers. Finally, we have had successful ongoing research collaborations with HP regarding this work.

    The ultimate impact of this work will be felt by many Canadians if our work can influence the design and deployment of a more effective Web caching infrastructure for CA*net 3.

    2.1 Technical Objectives

    Proposed Deliverables (SoW)        Deliverables Achieved   Variations   Justify the variance with SoW
    --------------------------------   ---------------------   ----------   -----------------------------
    Complete Simulator                 Yes                     None         N/A
    Caching Study                      Yes                     None         N/A
    Load Balancing Study               Yes                     None         N/A
    Research Paper                     Yes                     None         N/A
    Proxy Workload Study               Yes                     None         N/A
    Release Simulator                  Yes                     None         N/A
    Final Report and Recommendations   Yes                     None         N/A

    2.2 Schedule Objectives

    Deliverable                        Planned Completion Date (SoW)   Actual Completion Date   Justify the variance with SoW Schedule
    --------------------------------   -----------------------------   ----------------------   --------------------------------------
    Complete Simulator                 August 31, 1998                 August 15, 1998          Fewer debugging problems than anticipated
    Caching Study                      September 30, 1998              October 15, 1998         Performed joint with load balancing study
    Load Balancing Study               October 30, 1998                October 15, 1998         Performed joint with caching study
    Research Paper                     December 31, 1998               January 15, 1999         Revisions prior to paper submission deadline
    Proxy Workload Study               January 31, 1999                February 8, 1999         Collection of sufficient data
    Release Simulator                  March 31, 1999                  April 15, 1999           Two different versions to release
    Final Report and Recommendations   March 31, 1999                  May 9, 1999              Final exams, busy travel schedule

    2.3 Budget Objectives

    Cost Categories             Total Budget (SoW)   Actual Expenses   Variance*   Reasons
    -------------------------   ------------------   ---------------   ---------   -----------------------------------
    Labour **                   81,000               72,444            (8,556)     ASPA salary guidelines
    Direct Materials
    Special Purpose Equipment   12,000               11,806            (194)       Pricing quote only
    Sub-contractors
    Travel                      6,000                6,436             436         Four trips instead of three
    Others                      2,000                87                (1,913)     Research-related costs were minimal
    Project Total               101,000              90,773            (10,227)    See below

    * ($) under-budget, $ on-budget, $ over-budget
    ** Direct Labour includes Overhead and Fringe Benefits

    The project came out under-budget largely because the salary level paid to research staff member Greg Oster was less than the amount budgeted in the proposal. The amount established was a compromise between his former salary and our desired salary, tempered by the ASPA (Administrative and Supervisory Personnel Association) salary structure for other research support staff in our department, with similar training and experience. While paying him more would have been nice, it would have exceeded the ASPA guidelines (and some faculty salaries!).

    If it is possible to use the leftover funds to maintain Oster on staff for two more months of project-related work, we would be delighted to do so. However, we do realize that the project completion date was March 31, 1999.

    All other budget items came out close to the estimates given in the project proposal.

    The special purpose equipment purchased for the project (two computers) will remain within our research group at the University of Saskatchewan, and will be used in ongoing work.

    3. Contributions of Participants

    The participants in this project were Professor Carey Williamson (Principal Investigator), Professor Derek Eager, Professor Rick Bunt, full-time research staff member Greg Oster, and four graduate students (Mudashiru Busari, Anirban Mahanti, Adeniyi Oke, Jayakumar Srinivasan).

    Williamson was the Principal Investigator for the project, overseeing the technical and financial details, interacting with CANARIE, making sure that the project stayed on schedule, and filing the quarterly progress reports. He was a co-author on two research papers produced during the project, as well as the Web proxy workload characterization study. He supervised graduate student Busari, and co-supervised students Mahanti and Srinivasan (with Eager).

    Eager was the driving force behind much of the work on clustered Web servers. He contributed ideas to the simulation experiments, most notably the load balancing metrics, the service models, and the notion of affinity-based dispatching of requests. He was the primary contributing author for the paper published at the NLANR Web Caching Workshop. He co-supervised graduate students Mahanti and Srinivasan (with Williamson).

    Bunt contributed to the work on clustered Web servers, and was a co-author on the paper that appeared at the NLANR Web Caching Workshop. He also supervised graduate student Adeniyi Oke.

    Oster was the key contributor to the success of the whole project. He almost single-handedly completed the implementation, debugging, testing, validation, and optimization of the simulator for clustered Web servers, and was the lead designer and implementor of the simulator for Web proxy caching hierarchies (though graduate student Srinivasan contributed to the implementation, debugging, and testing of the simulator in the later stages). Oster produced the release of the two simulators in Spring 1999, along with necessary documentation. He was also in charge of the daily scripts for trace collection of Web proxy access logs from our three contributing sites, and had primary responsibility for the design and execution of simulation experiments, and for the analysis and reporting of simulation results.

    Among the graduate students, Mahanti played a lead role in developing and maintaining tools for reduction, storage, and analysis of Web proxy access logs. He contributed significantly to the Web proxy workload characterization study, which will in fact form a primary part of his M.Sc. thesis, expected in August 1999. Srinivasan contributed to the implementation and testing of the Web proxy caching simulator, including help with the design and implementation of inter-proxy communications and the streaming behaviour of proxies. Busari and Oke largely made use of collected Web workload traces to study specific performance problems related to proxy caching, response time analysis, and cache filter effects.

    4. Benefits

  • Direct Benefits

    This project provided full-time employment for research staff member Greg Oster for the duration of the project. Oster is a highly skilled individual, and this project made good use of his technical skills.

    The project also enabled four graduate students to receive TRLabs graduate scholarships in support of their research activities at the University of Saskatchewan.

  • Indirect Benefits

    This project has contributed significantly to the research on Web proxy workload and Web proxy caching in the Internet. Our analysis of Web proxy workloads from three levels of a caching hierarchy offers a unique perspective and insight into caching effectiveness. Our research results have been disseminated to the Web caching community through the NLANR Web Caching Workshop.

    Significant expertise in this area has been produced by this project. Three faculty members, one research staff member, and four graduate students have been directly influenced by this project, and plan to continue further research in this area, hopefully with industrial support.

    Our project has identified some guidelines and recommendations regarding cache management in such a context. These suggestions could prove useful in configuring a more effective national Web caching architecture for CA*net 3 in the future. This will be of benefit to CANARIE in the longer term.

    5. Future

    Our research activities on Web performance have certainly not come to a close, even though this project has. All three faculty members plan to continue research on Web and Internet performance, with a particular focus on the scalability and performance of the Web. We have four ongoing student research projects with TRLabs in this area. For example, Ph.D. student Jayakumar Srinivasan continues to explore novel techniques to improve the performance of Web proxy caching hierarchies, and other students may explore related issues, such as pre-fetching, novel Web caching algorithms, and synthetic models for Web proxy workload generation.

    We also plan to explore follow-on funding opportunities with CANARIE (e.g., Web-based support for electronic commerce applications).


    This page was last updated on May 10, 1999 by Carey Williamson.

    Copyright © 1997-8 CANARIE Inc.