iEpi: A Robust & Versatile Smartphone-Based Epidemiological Data Collection System

Joint work with Prof. Kevin Stanley.

Click here to see my YouTube playlist with videos for our iEpi (Ethica Health) bootcamp.

Click here to see information on the latest version of the iEpi system (the Ethica Health platform), supporting both iPhones and Android devices.

Notice: 2017 Bootcamp and Incubator on Understanding Health Behavior using Smartphones and Wearables: Visitors interested in the materials available here may also be interested in the upcoming 2017 Bootcamp and Incubator on Understanding Health Behavior using Smartphones and Wearables. The bootcamp will be held in early August 2017 and will incorporate both systematic and hands-on coverage of a wide variety of topics in smartphone-based data collection (and use of data from associated wearables); participants will receive hands-on assistance in building studies customized to their research interests, bringing such studies from conceptualization to the analysis phase with cutting-edge technologies.

Click here for brief sketches of how iEpi can offer insight in different health subdomains.

While large volumes of epidemiological surveillance data are collected by public health authorities and researchers, such data suffers from some shortcomings. Collection of most information relies on individual self-reporting, which can be notoriously unreliable. Physically measured information (such as that collected by NHANES III, the Canadian Health Measures Survey and some other surveys) is traditionally costly and burdensome to acquire.

The rise of sensor-bearing smartphones offers the potential for enabling data collection from ubiquitous epidemiological sensing applications downloaded and run by volunteers -- all the while cross-linking this data to results from questionnaires delivered on the same devices, either at randomly selected times (as in classical ecological momentary assessments [EMAs]) or triggered by context.
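To make the idea of randomly timed EMAs concrete, the sketch below draws prompt times for one day. It is an illustration only, not iEpi's scheduling code: the function name, the 09:00-21:00 prompting window, and the choice of one random prompt per equal-length block of the day (a stratified random design) are all assumptions made for the example.

```python
import random

def random_ema_prompt_minutes(n_prompts, start_minute=9 * 60, end_minute=21 * 60, rng=None):
    """Pick one random prompt time (in minutes after midnight) per block.

    The prompting window [start_minute, end_minute) is split into n_prompts
    equal blocks, and one time is drawn uniformly within each block, so the
    prompts are random but spread across the whole day.
    All names and default values here are hypothetical.
    """
    rng = rng or random.Random()
    block = (end_minute - start_minute) / n_prompts
    return [int(start_minute + i * block + rng.random() * block)
            for i in range(n_prompts)]

# Example: four prompts between 09:00 and 21:00, with a fixed seed so the
# schedule is reproducible.
prompts = random_ema_prompt_minutes(4, rng=random.Random(0))
for minute in prompts:
    print(f"{minute // 60:02d}:{minute % 60:02d}")
```

A context-triggered EMA would replace the random draw with a rule evaluated against incoming sensor data.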

Ethica iEpi is our 3rd generation health monitoring app designed to run in the background on both Google Android 4.x+ smartphones and iPhones. iEpi is designed to automatically record and cross-link a wide variety of minute-level resolution sensor data and data from study-specific on-device surveys (EMAs) issued by the app, as well as crowdsourced events proactively signaled by a participant. iEpi EMAs can contain a variety of types of questions as well as camera input (e.g., for dietary intake, medication dosage, test results, message exposure), and can be optionally triggered by context or user request (e.g., pressing a button indicating eating, administration of medication, or occurrence of a specific type of ideation or symptom). Reflecting the benefits of having an interface customized to each particular study, iEpi supports a highly reconfigurable interface, so that its functioning can be closely tailored to particular studies. Moreover, iEpi can be easily adapted without programming to change the volume and type of sensor and survey data collected both between and adaptively within deployments.

Common information that can be collected by iEpi's sensors includes participant location (with GPS), physical activity level (using accelerometers), sedentary behavior (using phone orientation and accelerometers), dietary intake (using the phone's camera), inter-participant proximity/social networks (via Bluetooth), aspects of vehicular context, etc. In some deployments, the system has also been adapted to collect data from external devices, such as weight scales. All such data can be automatically cross-linked to survey data collected via the app from the same respondent, which can be triggered by data picked up by the sensors (e.g., based on participant location, social context, or physical activity) or by participant action (e.g., taking an image of a meal, testing blood sugar, or taking medication).
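One simple way to picture the cross-linking of survey and sensor data is as a time-window join per participant. The sketch below is an assumption-laden illustration rather than a description of iEpi's internals: the tuple layouts, the function name, and the 5-minute look-back window are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def link_surveys_to_sensor_records(surveys, sensor_records, window_seconds=300):
    """Attach to each survey the same participant's recent sensor records.

    surveys:        list of (participant_id, unix_time, answers)
    sensor_records: list of (participant_id, unix_time, reading)
    A record is linked when its timestamp lies within window_seconds
    before the survey time (inclusive at both ends).
    """
    # Group sensor records by participant, in time order.
    by_participant = {}
    for rec in sorted(sensor_records, key=lambda r: (r[0], r[1])):
        by_participant.setdefault(rec[0], []).append(rec)

    linked = []
    for survey in surveys:
        participant_id, survey_time, _answers = survey
        records = by_participant.get(participant_id, [])
        times = [r[1] for r in records]
        lo = bisect_left(times, survey_time - window_seconds)
        hi = bisect_right(times, survey_time)
        linked.append((survey, records[lo:hi]))
    return linked

# Hypothetical example data.
sensor_records = [
    ("p1", 100, {"steps": 12}),
    ("p1", 380, {"steps": 40}),
    ("p1", 900, {"steps": 3}),
    ("p2", 390, {"steps": 7}),
]
surveys = [("p1", 400, {"ate_meal": True})]
linked = link_surveys_to_sensor_records(surveys, sensor_records)
```

Here the single survey from participant p1 at time 400 is linked to p1's records at times 100 and 380, but not to the later record or to p2's data.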

While the data collected by the smartphone app side of iEpi is typically streamed off of the devices for external analysis, the app is designed from the ground up to be highly secure as well as robust in the absence of connectivity. To ensure confidentiality, data is stored on the phone in an encrypted fashion. Transparently and invisibly to the user, data collected by iEpi is opportunistically uploaded in encrypted form as the participant comes in contact with cell-phone and WiFi based networks.
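The store-and-forward behavior described above can be sketched as a small queue that holds records until a network is reachable. This is a minimal illustration under stated assumptions, not iEpi's implementation: the class and method names are hypothetical, encryption is out of scope (payloads are treated as already-encrypted opaque bytes), and the in-memory queue stands in for on-device storage.

```python
from collections import deque

class OpportunisticUploader:
    """Buffer already-encrypted payloads and upload them when connected."""

    def __init__(self, send):
        # send: callable taking bytes and returning True on success
        # (a hypothetical stand-in for the real network call).
        self._send = send
        self._pending = deque()

    def record(self, encrypted_payload):
        """Queue a payload; works with or without connectivity."""
        self._pending.append(encrypted_payload)

    def flush(self, connected):
        """Upload queued payloads in order; return how many were sent."""
        sent = 0
        while connected and self._pending:
            if not self._send(self._pending[0]):
                break  # keep the payload and retry on a later flush
            self._pending.popleft()
            sent += 1
        return sent

    def pending_count(self):
        return len(self._pending)

# Example with a fake transport that always succeeds.
received = []

def fake_send(payload):
    received.append(payload)
    return True

uploader = OpportunisticUploader(fake_send)
uploader.record(b"ciphertext-1")
uploader.record(b"ciphertext-2")
sent_offline = uploader.flush(connected=False)  # nothing leaves the device
sent_online = uploader.flush(connected=True)    # both payloads are uploaded
```

A payload is removed from the queue only after a successful send, so a dropped connection never loses data.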

iEpi has been deployed in support of health research in a number of geographic regions and socio-demographic contexts across North America. Applicants have successfully employed iEpi in diverse studies lasting 1-4 months. iEpi has enjoyed successful deployment in many university populations, in several lower-income communities, in both high-density urban and rural regions, and for work ranging from institutional through to broader epidemiological studies, varying widely in scale. When paired with transmission models, the system has successfully supported new insights on contagious disease spread and social determinants of health.

In what way could iEpi help advance the research objectives of health research? There are diverse areas of possible contribution, but we highlight two here. 1) Assessing symptomology (both clinical and subclinical) on a 24x7 basis, and assessing its temporal relationship to risk factors and exposures recorded by sensors (e.g., physical activity, pose, some social context, location) or EMAs (medication compliance, self-reported stressors, second-hand smoke exposure, dietary intake, other aspects of social context). 2) Use of such monitoring to enhance the speed, reliability, and depth of learning from implemented interventions. More specifically, when an intervention succeeds or fails, because of limitations on traditional measurement instruments (e.g., their limited accuracy in measuring changes in medication compliance, physical activity, dietary behavior, socialization and mixing patterns, mobility, communicational behavior), there is often limited understanding of the specific pathways of effect by which such success was realized or thwarted. A tool such as iEpi is designed to inform an accurate understanding of the particular pathways by which an intervention affects important outcome measures (e.g., frequency and intensity of pain associated with osteoarthritis). Regardless of whether an intervention is successful or not in the end, the learning from it can be much deeper, more reliable and quicker by virtue of being able to examine which pathways were affected, by how much, and how soon (e.g., allowing observers to distinguish pathways that were successfully nudged vs. pathways that became a bottleneck and thereby stymied change in the outcomes of interest, or particular pathways that exerted disproportionate impact for interventions that did have an effect). Regardless of the success of the original intervention, securing such understanding from it can be of great value in devising more reliable interventions.

We see our iEpi system as a natural complement to our computational models; the two work together to yield very powerful decision-making tools. Data from iEpi helps to ground our models with a profusion (~1.5M records per participant per month) of detailed, longitudinal data at the individual level. Various types of modeling we apply -- agent-based and aggregate simulation models, as well as inferential and statistical models -- help to "make sense" of this data, and to relate it to the choices that need to be made.

Those interested in more concrete discussion about textured application of iEpi and similar smartphone technologies to different specific subdomains of health may find this document of interest. The document discusses the use of iEpi -- and, by extension, similar smartphone-based sensing, EMA and crowdsourcing systems -- in the Obesity, Chronic Disease, Tobacco-Related Disease, Communicable Disease, Mental Health, Environmental Epidemiology, and Health Services research areas.

The latest version of iEpi (Ethica iEpi) is used centrally in our laboratory, but is also being offered commercially through Ethica Data Systems. To get a feel for the system, we encourage signing up here for a free trial. Sample studies, instruments and demonstration data are available in the areas of physical activity/weight/built environment, waterborne illness, foodborne illness, communicable illness, air quality and zoonoses (Lyme disease and West Nile virus), with other sample studies to be rolled out shortly.

For further information, please see the following streaming videos:

Big Data and Dynamic Modeling presentation at Deakin Modeling and Chronic Disease Master Class 2015 (sponsored by Sax Institute and Deakin University).

My presentation from the Institute for Systems Science and Health 2011 demonstrates how we can leverage such data using 3 systems science modeling techniques.

Presentation delivered at the 2012 Annual Meeting for the Society for Epidemiological Research focuses on how sensing can inform the design of rich simulation models, but also comments on the synergy between sensing and dynamic models.



Aiello, A.E., Simanek, A.M., Eisenberg, M.C., Walsh, A.R., Davis, B., Volz, E., Cheng, C., Rainey, J.J., Uzicanin, A., Gao, H., Osgood, N., Knowles, D., Stanley, K., Tarter, K., Monto, A.S. "Design and Methods of a Social Network Isolation Study for Reducing Respiratory Infection Transmission: The eX-FLU Cluster Randomized Trial." Accepted by Epidemics, January 19, 2016.

Marshall, D., Burgos-Liz, L., Pasupathy, K., Padula, W., IJzerman, M., Wong, P., Higashi, M., Engbers, J., Wiebe, S., Crown, W., Osgood, N. (2015). Transforming Healthcare Delivery: Integrating Dynamic Simulation Modelling and Big Data in Health Economics and Outcomes Research. PharmacoEconomics: 1-12.

Knowles, D.L., Stanley, K.G., Osgood, N.D. 2014. A Field-Validated Architecture for the Collection of Health-Relevant Behavioural Data. Oral presentation and full paper publication in Proceedings of the IEEE International Conference on Healthcare Informatics 2014 (ICHI 2014). pp. 79-88. Verona, Italy, September 15-17, 2014.

Knowles, D.L., Stanley, K.G., Osgood, N.D. 2014. Seddacco: An Extensible Language in Support of Mass Collection of Health Behavior Data. Oral presentation and publication in ACM SIGKDD Workshop on Health Informatics (HI-KDD 2014). 8pp. New York City, August 24, 2014.

Qian, W., Osgood, N.D., Stanley, K.G. Integrating epidemiological modeling and surveillance data feeds: a Kalman filter based approach. Oral presentation and publication in Proceedings of the 2014 International Social Computing, Behavioral Modeling and Prediction Conference (SBP14), Washington DC, pp. 145-152. April 2-4, 2014.

Qian W., Stanley K., Osgood, N. 2012. The Impact of Spatial Resolution and Representation on Human Mobility Predictability. Accepted Dec. 7, 2012 as a full paper in The 12th International Symposium on Web and Wireless Geographical Information Systems (W2GIS 2013), 4-5 April 2013, Banff, Alberta, Canada.

Hashemian M., Qian W., Stanley K.G., Osgood, N.D. 2012 Temporal aggregation impacts on epidemiological simulations employing microcontact data. BMC Medical Informatics and Decision Making 2012, (12)132, 20pp (plus figures).

Hashemian, M., Knowles, D., Calver, J., Qian, W., Bullock, M., Bell, S., Mandryk, R.L., Osgood, N.D., Stanley, K.G. 2012. "iEpi: An End to End Solution for Collecting, Conditioning and Utilizing Epidemiologically Relevant Data." Accepted March 27, 2012 by the 2nd ACM International Workshop on Pervasive Wireless Healthcare. June 11-14, 2012. Hilton Head, South Carolina.

Hashemian, M., Stanley, K.G., Knowles D.L., Calver J., Osgood, N.D. 2011. "Human Network Data Collection in the Wild: The Epidemiological Utility of Micro-contact and Location Data". Accepted for publication as a full paper in Proceedings of the ACM SIGHIT International Health Informatics Symposium (IHI 2012). January 28-30, 2012, Miami, FL. 10pp.

Stanley, K., Osgood, N. "The Potential of Sensor-Based Monitoring as a Health Care, Health Promotion and Research Tool". Invited Editorial in Annals of Family Medicine. 4pp. In press. Accepted May 27, 2011.

Hashemian, M., Stanley, K., Osgood, N. 2012. "Leveraging H1N1 infection transmission modeling with proximity network microdata." Accepted April 26, 2012 by BMC Medical Informatics and Decision Making. 39pp.

Hashemian, M., Stanley, K., and Osgood, N. 2010. "Flunet: Automated tracking of contacts during flu season." Proceedings of the 6th International workshop on Wireless Network Measurements (WiNMee 2010), 557-562, 6pp.

Some example images produced from iEpi data are shown below.

Within the diagram below, nodes are wifi locations. Two wifi nodes are considered connected if at least one participant detected them in the same 5-minute timeslot with a requisite signal strength. The nodes are shown with independent horizontal and vertical spans. The length of the horizontal axis for a node varies in proportion to the density of nonparticipants (people per unit time) detected at that node. The length of the vertical axis of a node is proportional to the density of participants detected at that node.
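The connection rule for wifi nodes can be sketched in a few lines. This is an illustration of the stated rule only, not the code used to produce the diagram: the scan tuple layout, the function name, and the -80 dBm signal-strength threshold are assumptions, since the requisite signal strength is not given here.

```python
from collections import defaultdict
from itertools import combinations

SLOT_SECONDS = 5 * 60   # 5-minute timeslots, as in the description above
MIN_RSSI_DBM = -80      # hypothetical "requisite signal strength" threshold

def wifi_co_detection_edges(scans, min_rssi=MIN_RSSI_DBM, slot_seconds=SLOT_SECONDS):
    """Return the set of undirected edges between wifi nodes.

    scans: iterable of (participant_id, unix_time_seconds, wifi_id, rssi_dbm).
    Two wifi nodes are connected if at least one participant detected both
    in the same timeslot with signal strength of at least min_rssi.
    """
    # (participant, timeslot index) -> wifi nodes detected strongly enough
    seen = defaultdict(set)
    for participant_id, t, wifi_id, rssi in scans:
        if rssi >= min_rssi:
            seen[(participant_id, int(t // slot_seconds))].add(wifi_id)

    edges = set()
    for wifi_ids in seen.values():
        # Sorting makes each undirected edge appear once, as (smaller, larger).
        for a, b in combinations(sorted(wifi_ids), 2):
            edges.add((a, b))
    return edges

# Hypothetical example scans.
example_scans = [
    ("p1", 0,   "AP-A", -60),
    ("p1", 120, "AP-B", -70),  # same timeslot as AP-A for p1: edge A-B
    ("p1", 400, "AP-C", -65),  # next timeslot: no edge to A or B
    ("p2", 10,  "AP-C", -90),  # below threshold: ignored
    ("p2", 20,  "AP-A", -50),
]
edges = wifi_co_detection_edges(example_scans)
```

With these example scans, the only edge is between AP-A and AP-B.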

The following diagram depicts likelihood of infection in different locations in Saskatoon, as judged by a transmission model for a hypothetical influenza-like illness.

In the diagram below, a rough proxy for physical activity (based on accelerometer readings from participants' cellphones) was used to estimate levels of physical activity observed throughout Saskatoon over a one-month period.

The diagram below depicts wifi locations by non-participant time density (size) and count of distinct non-participants seen at each location (brightness). Lines show the association between a participant and the particular locations at which they were present.

Wifi locations here are shown as circles, each with area proportional to non-participant density. Brightness indicates the count of distinct non-participants seen at a location. Participants are shown in red, with connections to the locations with which they were associated.