Characterizing Web-based Video Sharing Workloads

S. Mitra, M. Agrawal, A. Yadav, N. Carlsson, D. Eager, and A. Mahanti, ACM Transactions on the Web, Vol. 5, No. 2 (May 2011), pp. 8:1-8:27.

Video sharing services that allow ordinary Web users to upload video clips of their choice and watch video clips uploaded by others have recently become very popular. This paper identifies invariants in video sharing workloads, through comparison of the workload characteristics of four popular video sharing services. Our traces contain meta-data on approximately 1.8 million videos which together have been viewed approximately 6 billion times. Using these traces, we study the similarities and differences in use of several Web 2.0 features such as ratings, comments, favorites, and propensity of uploading content. In general, we find that active contribution, such as video uploading and rating of videos, is much less prevalent than passive use. While uploaders in general are skewed with respect to the number of videos they upload, the fraction of multi-time uploaders is found to differ by a factor of two between two of the sites. The distributions of life-time measures of video popularity are found to have heavy-tailed forms that are similar across the four sites. Finally, we consider implications for system design of the identified invariants. To gain further insight into caching in video sharing systems, and the relevance to caching of life-time popularity measures, we gathered an additional data set tracking views to a set of approximately 1.3 million videos from one of the services, over a twelve-week period. We find that life-time popularity measures have some relevance for large cache (hot set) sizes (i.e., a hot set defined according to one of these measures is indeed relatively "hot"), but that this relevance substantially decreases as cache size decreases, owing to churn in video popularity.
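
The hot-set idea in the last sentence of the abstract can be illustrated with a small sketch. It is not the paper's methodology: the function name, the dictionary layout, and the toy view counts are all hypothetical, chosen only to show how one might measure the share of views in a measurement period that go to a hot set picked by life-time views.

```python
def hot_set_hit_ratio(lifetime_views, period_views, cache_size):
    """Fraction of period views that go to the `cache_size` videos
    with the highest life-time view counts (illustrative sketch only)."""
    # Rank video ids by life-time views, most viewed first, and keep the top ones.
    hot_set = sorted(lifetime_views, key=lifetime_views.get, reverse=True)[:cache_size]
    total = sum(period_views.values())
    if total == 0:
        return 0.0
    # Count the period views that fall on videos in the hot set.
    hits = sum(period_views.get(video, 0) for video in hot_set)
    return hits / total


# Hypothetical toy data (not taken from the datasets below).
lifetime = {"a": 1000, "b": 800, "c": 50, "d": 10}
period = {"a": 5, "b": 40, "c": 50, "d": 5}

for size in (1, 2, 4):
    print(size, hot_set_hit_ratio(lifetime, period, size))
```

In this toy data a hot set of one video captures only a small share of the period's views because the most viewed video over its life-time is no longer the most viewed in the period, while larger hot sets capture more. This mirrors the qualitative effect the abstract attributes to churn, but it says nothing about the actual magnitudes in the traces.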


The datasets used in our paper are made available here for use by the wider research community. The datasets consist of publicly available meta-data associated with videos from the Dailymotion, Veoh, Metacafe, and Yahoo! Video Web sites. If you use our datasets in your research, please drop Siddharth Mitra a line at "sidmitra DOT del AT gmail dot com", and include a reference to our paper in your work.

Dailymotion Music Category (collected 22 March 2008; 1,194,186 videos, cf. Section 4.1.1)

Yahoo! (collected 13-15 March 2008; 99,207 videos, cf. Section 4.1.2)

Metacafe (collected April 2008; 239,250 videos, cf. Section 4.1.3)

Veoh (collected 18 March 2008; 269,531 videos, cf. Section 4.1.3)

Dailymotion Longitudinal Dataset (cf. Section 6.3)