Characterizing and Modeling Popularity of User-generated Videos
Y. Borghol, S. Mitra, S. Ardon, N. Carlsson, D. Eager, and A. Mahanti,
Proc. 29th IFIP WG 7.3 Int'l. Symp. on Computer Performance, Modeling, Measurements and Evaluation (IFIP Performance 2011),
Amsterdam, Netherlands, Oct. 2011, to appear.
This paper develops a framework for studying the popularity
dynamics of user-generated videos,
presents a characterization of the popularity dynamics,
and proposes a model that captures the key properties of these dynamics.
We illustrate the biases that some choices of sampling technique
can introduce into the analysis; however, sampling from
recently-uploaded videos yields a dataset that appears to be
unbiased. Using a dataset that tracks the view counts of a sample of recently-uploaded
YouTube videos over the first eight months of their lifetime, we study
the popularity dynamics. We find that the relative popularities of the videos within our
dataset are highly non-stationary,
owing primarily to large differences in the time from upload until peak popularity is reached,
and secondarily to popularity oscillation.
We propose a model that can accurately capture the popularity
dynamics of collections of recently-uploaded videos as they age,
including key measures such as hot-set churn statistics and the evolution
of the viewing-rate and total-views distributions over time.
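Hot-set churn can be made concrete using cumulative weekly view counts like those in the datasets below. The following is a minimal sketch under illustrative assumptions, not the paper's exact definitions: the weekly viewing rate is taken as the difference between consecutive cumulative counts, the "hot set" for a week is assumed to be the k videos that gained the most views that week (ties broken by video id), and churn is the fraction of one week's hot set that is missing from the next week's. The function names and the parameter k are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def weekly_views(snapshots: Sequence[int]) -> List[int]:
    """Views gained each week, from cumulative counts [VIEWS, VIEWS_1, ..., VIEWS_n]."""
    return [later - earlier for earlier, later in zip(snapshots, snapshots[1:])]


def hot_set(rates: Dict[str, int], k: int) -> Set[str]:
    """Assumed hot set: the k videos with the most views in one week (ties by id)."""
    ranked = sorted(rates, key=lambda vid: (-rates[vid], vid))
    return set(ranked[:k])


def hot_set_churn(videos: Dict[str, Sequence[int]], k: int) -> List[float]:
    """Fraction of week t's hot set that is absent from week t+1's hot set.

    videos maps a video id to its cumulative view counts; assumes a non-empty
    collection and 0 < k <= number of videos.
    """
    per_video = {vid: weekly_views(s) for vid, s in videos.items()}
    n_weeks = min(len(r) for r in per_video.values())
    hot = [hot_set({vid: r[w] for vid, r in per_video.items()}, k) for w in range(n_weeks)]
    return [len(current - following) / k for current, following in zip(hot, hot[1:])]
```

For example, with cumulative counts `{"a": [0, 10, 12, 13], "b": [0, 1, 9, 20], "c": [0, 5, 6, 7]}` and `k=1`, the hot set moves from `a` to `b` after the first week and then stays put, so the churn sequence is `[1.0, 0.0]`. High churn across weeks is one simple symptom of non-stationary relative popularity.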
Datasets
The datasets used in our paper are
made available here for use by the wider research community.
The datasets consist of publicly available meta-data associated
with videos from
the YouTube Web site. Please refer to
Section 3 of our paper for a description of the data collection methodology and a summary of the datasets.
If you use our datasets in your research, please drop Anirban Mahanti
a line at "anirban dot mahanti AT-SIGN gmail dot com", and include a
reference to our paper in your work.
Recently-uploaded Videos
- Download recently-uploaded data file.
- Format: "VID", "UPLOADED", "VIEWS", "VIEWS_1", "VIEWS_2", ... , "VIEWS_34", "SOURCELIST"
VID is the video identifier,
UPLOADED is the number of minutes since the video was uploaded, as measured at the time of first meta-data collection,
VIEWS is the number of views to the video at the time of first meta-data collection,
VIEWS_1 is the number of views to the video one week after first collection,
VIEWS_2 is the number of views to the video two weeks after first collection,
and so on. In total, there are 34 weekly snapshots of view counts (VIEWS_1 through VIEWS_34), each exactly one week apart.
The first meta-data collection occurred between 27 July and 2 August 2008.
SOURCELIST is "recent", indicating that the video belongs to the recently-uploaded dataset.
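A minimal sketch for loading a data file in the format listed above. It assumes, since this is not stated here, that the file is comma-separated text with one video per line in the listed field order, values possibly double-quoted, and an optional header row whose first field is VID. The names `VideoRecord` and `read_records` are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, List

N_SNAPSHOTS = 34  # VIEWS_1 ... VIEWS_34


@dataclass
class VideoRecord:
    vid: str
    uploaded_min: int  # minutes since upload, at first meta-data collection
    views: List[int]   # cumulative counts [VIEWS, VIEWS_1, ..., VIEWS_34]
    source: str        # SOURCELIST value, e.g. "recent"


def read_records(lines: Iterable[str]) -> Iterator[VideoRecord]:
    """Parse rows laid out as VID, UPLOADED, VIEWS, VIEWS_1..VIEWS_34, SOURCELIST.

    Assumes comma-separated, optionally double-quoted values; blank lines and a
    header row whose first field is VID are skipped.
    """
    # skipinitialspace lets a quoted value follow ", " as well as ","
    for row in csv.reader(lines, skipinitialspace=True):
        fields = [x.strip() for x in row]
        if not fields or fields[0].upper() == "VID":
            continue  # blank line or header row
        if len(fields) != 4 + N_SNAPSHOTS:
            raise ValueError(f"expected {4 + N_SNAPSHOTS} fields, got {len(fields)}")
        yield VideoRecord(
            vid=fields[0],
            uploaded_min=int(fields[1]),
            views=[int(x) for x in fields[2:3 + N_SNAPSHOTS]],
            source=fields[3 + N_SNAPSHOTS],
        )
```

Usage: `with open(path, newline="") as f: records = list(read_records(f))`, where `path` is wherever the downloaded file was saved. If the actual file uses another delimiter, the `csv.reader` call is the place to adjust it.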
Keyword-search Videos
- Download keyword-search data file.
- Format: "VID", "UPLOADED", "VIEWS", "VIEWS_1", "VIEWS_2", ... , "VIEWS_34", "SOURCELIST"
VID is the video identifier,
UPLOADED is the number of minutes since the video was uploaded, as measured at the time of first meta-data collection,
VIEWS is the number of views to the video at the time of first meta-data collection,
VIEWS_1 is the number of views to the video one week after first collection,
VIEWS_2 is the number of views to the video two weeks after first collection,
and so on. In total, there are 34 weekly snapshots of view counts (VIEWS_1 through VIEWS_34), each exactly one week apart.
The first meta-data collection occurred between 27 July and 2 August 2008.
SOURCELIST is "search", indicating that the video belongs to the keyword-search dataset.
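Because UPLOADED gives each video's age at first collection and the snapshots are exactly one week apart, a video's age at snapshot k is UPLOADED + 10080k minutes. Aligning videos by age rather than by calendar week may help when comparing the two datasets, since keyword-search videos presumably need not be recent uploads. The sketch below, with a hypothetical helper name, takes the cumulative counts in the order [VIEWS, VIEWS_1, ..., VIEWS_34] and returns, for each week, the video's age in days at the end of that week together with the views gained during it.

```python
from typing import List, Sequence, Tuple

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY  # 10080


def age_and_weekly_views(uploaded_min: int, views: Sequence[int]) -> List[Tuple[float, int]]:
    """For k = 1..len(views)-1, return (age in days at snapshot k, views gained in week k).

    views holds cumulative counts [VIEWS, VIEWS_1, ...]; snapshot k is taken k weeks
    after first collection, when the video was already uploaded_min minutes old.
    A drop in the reported count is passed through as a negative increment.
    """
    result = []
    for k in range(1, len(views)):
        age_days = (uploaded_min + k * MINUTES_PER_WEEK) / MINUTES_PER_DAY
        result.append((age_days, views[k] - views[k - 1]))
    return result
```

For instance, a video that was one day old at first collection (`uploaded_min=1440`) with cumulative counts `[5, 105, 135]` yields `[(8.0, 100), (15.0, 30)]`.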