Ethica/iEpi: A Robust & Versatile Smartphone-Based Epidemiological Data Collection System

Click here for a 30-minute hands-on video introduction to Ethica. Click here to see my YouTube playlist with videos from our 2017 Ethica Health bootcamp (Sponsor: Sax Institute, Sydney AU). Videos from our 2016 bootcamp may also be of interest.

Click here to see information on the latest version of the Ethica/iEpi system (the Ethica Health platform), supporting both iPhones and Android devices.

Notice: Registration is now open for the 2019 Bootcamp & Incubator on Understanding Health Behavior using Smartphones and Wearables. Visitors interested in the materials available here may also be interested in this upcoming event. The bootcamp will be held June 24-26, 2019 at the University of Saskatchewan and will incorporate both systematic and hands-on coverage of a wide variety of topics in smartphone-based data collection (and use of data from associated wearables); participants will receive hands-on assistance in building studies customized to their research interests, bringing such studies from conceptualization to the analysis phase with cutting-edge technologies. Click here to Register

Click here for brief sketches of how Ethica/iEpi can offer insight in different health subdomains.

While large volumes of epidemiological surveillance data are collected by public health authorities and researchers, such data suffers from some shortcomings. Collection of most information relies on individual self-reporting, which can be notoriously unreliable. Physically measured information (such as that collected by NHANES III, the Canadian Health Measures Survey and some other surveys) is traditionally costly and burdensome to acquire.

The rise of sensor-bearing smartphones offers the potential for enabling data collection from ubiquitous epidemiological sensing applications downloaded and run by volunteers -- all the while cross-linking this data to results from questionnaires delivered on the same devices, either at randomly selected times (as in classical ecological momentary assessments [EMAs]) or triggered by context.

Ethica is a third-generation health monitoring system for smartphones (iPhones & Android) and wearables. Ethica is designed to automatically record and cross-link a wide variety of minute-level resolution sensor data from the smartphone and from external sensors, data from study-specific on-device surveys (ecological momentary assessments [EMAs]) issued by the app, and crowdsourced data triggered (e.g., via button press) by the respondent. Ethica EMAs can contain a variety of types of questions as well as camera input (e.g., for dietary intake, medication dosage, test results, message exposure), and can be optionally triggered by context or user request (e.g., pressing a button indicating eating, administration of medication, or occurrence of a specific type of ideation or symptom). Reflecting the benefits of having an interface customized to each particular context, Ethica supports a highly reconfigurable interface, so that its functioning can be closely tailored to the particular needs of the client. Moreover, Ethica can be easily adapted without programming to change the volume and type of sensor and survey data collected both between and adaptively within deployments. In addition to measuring sensor data from smartphones, Ethica now further supports a broad and growing set of wearable devices.

iEpi/Ethica has been used for over 100 studies worldwide, and is available in 6 languages. iEpi/Ethica has been deployed in support of health research in a wide range of geographic regions and socio-demographic contexts across North America, Europe, and Australia, including -- but not limited to -- projects at University of Michigan [multiple studies], Columbia University, Harvard School of Public Health [multiple studies], Drexel University, University of North Carolina, San Francisco State University, University of Western Sydney [multiple studies], Baylor College of Medicine, University of Loughborough, University of Regina [multiple studies], Wilfrid Laurier University, University of New Mexico, University of Calgary, Memorial University, Boston University, Université de Montréal, and Università della Svizzera italiana [Lugano]. It has enjoyed several deployments to study infectious disease dynamics using microcontact data (including person-place and person-person transmission) and as part of a CDC-funded randomized controlled trial, and is currently being deployed for a larger-scale trial project for a large US-based basic income study. Additional funded projects have focused on applying the system to better explicate the multi-factorial impact of highly trained service dogs on outcomes for veterans with PTSD, to elucidate the interaction between e-cigarettes and traditional tobacco products, to collect context-informed patient-reported outcomes for COPD, to enhance knowledge of antecedents and triggers of suicidal ideation amongst in-ward psychiatry patients and (separately) those in the community, to elevate understanding of community food procurement patterns, to assess the accuracy of traditional and smartphone-based methods for tracking foodborne illness spread, and to understand mental health challenges in communities in Canada’s north.
Ethica has enjoyed successful deployment in many university populations, in a variety of lower-income communities, in both high-density urban and rural regions, and for work ranging from institutional through to broader epidemiological studies, varying widely in scale from dozens to thousands of participants.

Common information that can be collected by Ethica/iEpi's sensors includes participant location (with GPS), physical activity level (using accelerometers), sedentary behavior (using phone orientation and accelerometers), dietary intake (using the phone's camera), inter-participant proximity/social networks (via Bluetooth), aspects of vehicular context, etc. In some deployments, the system has also been adapted to collect data from external devices, such as weight scales. All such data can be automatically cross-linked to survey data collected via the app from the same respondent, which can be triggered by data picked up by the sensors (e.g., based on participant location, social context, or physical activity) or by participant action (e.g., taking an image of a meal, testing blood sugar, or using medication).

While the data collected by the smartphone app side of Ethica/iEpi is typically streamed off of the devices for external analysis, the app is designed from the ground up to be highly secure as well as robust in the absence of connectivity. To ensure confidentiality, data is stored on the phone in an encrypted fashion. Transparently and invisibly to the user, data collected by Ethica/iEpi is opportunistically uploaded in encrypted form as the participant comes in contact with cell-phone and WiFi based networks.

Ethica/iEpi has been deployed in support of health research in a number of geographic regions and socio-demographic contexts across North America. Applicants have successfully employed Ethica/iEpi in diverse studies lasting 1-4 months. Ethica/iEpi has enjoyed successful deployment in many university populations, in several lower-income communities, in both high-density urban and rural regions, and for work ranging from institutional through to broader epidemiological studies, varying widely in scale. When paired with transmission models, the system has successfully supported new insights on contagious disease spread and social determinants of health.

In what way could Ethica/iEpi help advance the objectives of health research? There are diverse areas of possible contribution, but we highlight two here. 1) Assessing symptomology (both clinical and subclinical) on a 24x7 basis, and assessing its temporal relationship to risk factors and exposures recorded by sensors (e.g., physical activity, pose, some social context, location) or EMAs (medication compliance, self-reported stressors, second-hand smoke exposure, dietary intake, other aspects of social context). 2) Use of such monitoring to enhance the speed, reliability, and depth of learning from implemented interventions. More specifically, when an intervention succeeds or fails, because of limitations of traditional measurement instruments (e.g., their limited accuracy in measuring changes in medication compliance, physical activity, dietary behavior, socialization and mixing patterns, mobility, communicational behavior), there is often limited understanding of the specific pathways of effect by which such success was realized or thwarted. A tool such as Ethica/iEpi is designed to inform an accurate understanding of the particular pathways by which an intervention affects important outcome measures (e.g., frequency and intensity of pain associated with osteoarthritis). Regardless of whether an intervention is successful or not in the end, the learning from it can be much deeper, more reliable and quicker by virtue of being able to examine which pathways were affected, by how much, and how soon (e.g., allowing observers to distinguish pathways that were successfully nudged vs. pathways that became a bottleneck and thereby stymied change in the outcomes of interest, or particular pathways that exerted disproportionate impact for interventions that did have effect). Regardless of the success of the original intervention, securing such understanding from it can be of great value in devising more reliable interventions.

We see our Ethica/iEpi system as a natural complement to our computational models; the two work together to yield very powerful decision-making tools. Data from Ethica/iEpi helps to ground our models with a profusion of detailed, longitudinal data at the individual level. Various types of modeling we apply -- agent-based and aggregate simulation models, as well as inferential and statistical models -- help to "make sense" of this data, and to relate it to the choices that need to be made. In order to ground models in incoming data, we secure particularly pronounced benefit by combining dynamic models with machine-learning tools such as sequential Monte Carlo methods (particle filtering) and Particle MCMC methods.

The machine learning/computational statistics based sequential Monte Carlo technique of Particle Filtering allows for a model that learns as new data becomes available, by recurrently regrounding estimates in the model against observed data. This provides strong support for probabilistically estimating the current state of the system (e.g., people at different points in the health continuum), (probabilistically) projecting forward, and probabilistic assessment of intervention tradeoffs. We can think of Particle Filtering as providing value in many ways: 1) As a means of avoiding “blind” models, by allowing models to be brought back in line with observed conditions, rather than diverging increasingly from the observed world. 2) As a means of integrating a theory (in the form of a model) and one or more empirical time series to provide an integrated, “tomographic” multi-dimensional picture of the state of the system. 3) As a way of learning from data as it arrives, sharpening our understanding of both the state of the system and the evolution of uncertain parameters (or, with MCMC, static parameter values).
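
As a rough illustration of the recurrent regrounding described above, the following is a minimal bootstrap particle filter sketch in Python. It is not the models or code used with Ethica/iEpi: the discrete-time SIR model, the population of 1,000, the recovery rate, the Gaussian observation model on daily new cases, and all function names are assumptions chosen only for illustration.

```python
import math
import random


def sir_step(s, i, r, beta, gamma=0.2):
    """Advance a simple discrete-time SIR model by one day (illustrative model)."""
    n = s + i + r
    new_inf = min(s, beta * s * i / n)
    new_rec = gamma * i
    return s - new_inf, i + new_inf - new_rec, r + new_rec, new_inf


def simulate_observations(beta, days):
    """Generate noise-free daily new-case counts from a known beta (synthetic data)."""
    s, i, r = 990.0, 10.0, 0.0
    observations = []
    for _ in range(days):
        s, i, r, new_inf = sir_step(s, i, r, beta)
        observations.append(new_inf)
    return observations


def particle_filter(observations, n_particles=500, seed=0):
    """Return the mean of beta across particles after each observation."""
    rng = random.Random(seed)
    # Each particle is one hypothesis about the state (S, I, R) and about beta.
    particles = [
        {"s": 990.0, "i": 10.0, "r": 0.0, "beta": rng.uniform(0.1, 1.0)}
        for _ in range(n_particles)
    ]
    beta_means = []
    for y in observations:
        sd = max(2.0, 0.2 * y)  # assumed observation noise
        weights = []
        for p in particles:
            # Predict: let beta drift as a random walk, then step the model.
            p["beta"] *= math.exp(rng.gauss(0.0, 0.05))
            p["s"], p["i"], p["r"], new_inf = sir_step(
                p["s"], p["i"], p["r"], p["beta"]
            )
            # Weight: how well does this particle explain today's count?
            likelihood = math.exp(-0.5 * ((y - new_inf) / sd) ** 2)
            weights.append(likelihood + 1e-300)  # guard against all-zero weights
        # Resample: particles consistent with the data are duplicated,
        # inconsistent ones tend to be dropped.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        particles = [dict(p) for p in chosen]
        beta_means.append(sum(p["beta"] for p in particles) / n_particles)
    return beta_means


if __name__ == "__main__":
    data = simulate_observations(beta=0.5, days=20)
    estimates = particle_filter(data)
    print("final mean beta estimate:", round(estimates[-1], 3))
```

Each pass through the loop is one regrounding step: the model projects every particle forward, the new observation reweights the particles, and resampling pulls the ensemble back toward trajectories that match what was observed. With synthetic data of this kind, the mean of beta across particles would be expected to settle near the value used to generate the observations.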

Those interested in more concrete discussion about textured application of Ethica/iEpi and similar smartphone technologies to different specific subdomains of health may find this document of interest. The document discusses the application of Ethica/iEpi -- and, by extension, similar smartphone-based sensing, EMA and crowdsourcing systems -- to the Obesity, Chronic Disease, Tobacco-Related Disease, Communicable Disease, Mental Health, Environmental Epidemiology, and Health Services research areas.

The latest version of Ethica/iEpi is used centrally in our laboratory, but is also being offered commercially through Ethica Data Systems. To get a feel for the system, we encourage signing up here for a free trial. Sample studies, instruments and demonstration data are available in the areas of physical activity/weight/built environment, waterborne illness, foodborne illness, communicable illness, air quality and zoonoses (Lyme disease and West Nile virus).

For further information, please see the following streaming videos:

Broad playlist of videos on use of Ethica, and advanced material on more sophisticated analysis tools for Ethica data.

First glimpse of Ethica interface presentation at Combining Data Science and Systems Science (Big Data and Dynamic Modeling) for Health.

Ethica Geographic Heat Map and Kibana and Survey Responses presentation at Combining Data Science and Systems Science (Big Data and Dynamic Modeling) for Health

Big Data and Dynamic Modeling presentation at Deakin Modeling and Chronic Disease Master Class 2015 (sponsored by Sax Institute and Deakin University).

My presentation from the Institute for Systems Science and Health 2011 demonstrates how we can leverage such data using 3 systems science modeling techniques.

Presentation delivered at the 2012 Annual Meeting for the Society for Epidemiological Research focuses on how sensing can inform the design of rich simulation models, but also comments on the synergy between sensing and dynamic models.



Seitzinger PJ, Tataryn J, Osgood N, Waldner C. 2019. Foodborne Outbreak Investigation: Effect of Recall Inaccuracies on Food Histories. Journal of Food Protection, 82(6), June 2019. pp.931-939.

Seitzinger P, Osgood N, Martin W, Tataryn J, Waldner C. 2019. Compliance Rates, Advantages, and Drawbacks of a Smartphone-Based Method of Collecting Food History and Foodborne Illness Data. Journal of Food Protection, 82(6), June 2019. pp.1061-1070.

McLean, A., Osgood, N., Newstead-Angel, J., Stanley, K., Knowles, D., van der Kamp, W., Qian, W., and Dyck, R. Chapter in Lau, F., Bartle-Clar, J., Bliss, G., Brycki, E., Courtney, K., Kuo, A. Building research capacity: results of a feasibility study using a novel mHealth epidemiological data collection system within a gestational diabetes population. 2017. Building Capacity for Health Informatics in the Future, IOS Press, Inc. ISBN 978-1-61499-741-2. [e-Book version also available; ISBN 978-1-61499-742-9] 234:238. p228.

McPhee-Knowles S., Osgood N. 2016. Agent-based Models and Health Oriented Mobile Technologies. Chapter in Kaplan G.A., Diez Roux A., Galea S., Simon C.P., Editors, Growing Inequality: Bridging Complex Systems, Health Disparities, and Population Health. Oxford University.

Seitzinger P., Osgood N., Martin W., Tataryn J., Waldner C. 2019. Feasibility of Smartphone-Based Technology to Support the Collection of Food History and Illness Data. Accepted by Journal of Food Protection January 24, 2019. (JFP-18-548R)

Paul, T., Stanley, K.G. and Osgood, N.D., 2018. Multiscale entropy rate analysis of complex mobile agents. Royal Society Open Science, 5(10), p.180488.

Katapally, T.R., Bhawra, J., Leatherdale, S.T., Ferguson, L., Longo, J., Rainham, D., Larouche, R. and Osgood, N. 2018. The SMART Study, a Mobile Health and Citizen Science Methodological Platform for Active Living Surveillance, Integrated Knowledge Translation, and Policy Interventions: Longitudinal Study. JMIR public health and surveillance, 4(1).

Stanley K., Bell S., Kreuger L.K., Bhowmik P., Shojaati N., Elliot A., Osgood N.D. 2016. Opportunistic natural experiments using digital telemetry: a transit disruption case study. International Journal of Geographical Information Science (2016): 1-20.

Aiello, A.E., Simanek, A.M., Eisenberg M.C., Walsh A.R., Davis, B., Volz, E., Cheng, C., Rainey, J.J.; Uzicanin, A., Gao, H. ; Osgood, N. ; Knowles, D. , Stanley, K., Tarter K., Monto, A.S. "Design and Methods of a Social Network Isolation Study for Reducing Respiratory Infection Transmission: The eX-FLU Cluster Randomized Trial." Accepted by Epidemics, January 19, 2016.

Marshall, D., Burgos-Liz, L., Pasupathy, K., Padula, W., IJzerman, M., Wong, P., Higashi, M., Engbers, J., Wiebe, S., Crown, W., Osgood, N. (2015). Transforming Healthcare Delivery: Integrating Dynamic Simulation Modelling and Big Data in Health Economics and Outcomes Research. PharmacoEconomics: 1-12.

Knowles, D.L., Stanley, K.G., Osgood, N.D. 2014. A Field-Validated Architecture for the Collection of Health-Relevant Behavioural Data. Oral presentation and full paper publication in Proceedings of the IEEE International Conference on Healthcare Informatics 2014 (ICHI 2014). pp. 79-88. Verona, Italy, September 15-17, 2014.

Knowles, D.L., Stanley, K.G., Osgood, N.D. 2014. Seddacco: An Extensible Language in Support of Mass Collection of Health Behavior Data. Oral presentation and publication in ACM SIGKDD Workshop on Health Informatics (HI-KDD 2014). 8pp. New York City, August 24, 2014.

Qian, W., Osgood, N.D., Stanley, K.G. Integrating epidemiological modeling and surveillance data feeds: a Kalman filter based approach. Oral presentation and publication in Proceedings of the 2014 International Social Computing, Behavioral Modeling and Prediction Conference (SBP14), Washington DC, pp. 145-152. April 2-4, 2014.

Qian W., Stanley K., Osgood, N. 2012. The Impact of Spatial Resolution and Representation on Human Mobility Predictability. Accepted Dec. 7, 2012 as a full paper in The 12th International Symposium on Web and Wireless Geographical Information Systems (W2GIS 2013), 4-5 April 2013, Banff, Alberta, Canada.

Hashemian M., Qian W., Stanley K.G., Osgood, N.D. 2012 Temporal aggregation impacts on epidemiological simulations employing microcontact data. BMC Medical Informatics and Decision Making 2012, (12)132, 20pp (plus figures).

Hashemian, M., Knowles, D., Calver, J., Qian, W., Bullock M., Bell, S., Mandryk, R.L., Osgood, N.D., Stanley, K.G. 2012. "iEpi: An End to End Solution for Collecting, Conditioning and Utilizing Epidemiologically Relevant Data." Accepted March 27, 2012 by the 2nd ACM International Workshop on Pervasive Wireless Healthcare. June 11-14, 2012. Hilton Head, South Carolina.

Hashemian, M., Stanley, K.G., Knowles D.L., Calver J., Osgood, N.D. 2011. "Human Network Data Collection in the Wild: The Epidemiological Utility of Micro-contact and Location Data". Accepted for publication as a full paper in Proceedings of the ACM SIGHIT International Health Informatics Symposium (IHI 2012). January 28-30, 2012, Miami, FL. 10pp.

Stanley, K., Osgood, N. "The Potential of Sensor-Based Monitoring as a Health Care, Health Promotion and Research Tool". Invited Editorial in Annals of Family Medicine. 4pp. In press. Accepted May 27, 2011.

Hashemian, M., Stanley, K., Osgood, N. 2012. "Leveraging H1N1 infection transmission modeling with proximity network microdata." Accepted April 26, 2012 by BMC Medical Informatics and Decision Making. 39pp.

Hashemian, M., Stanley, K., and Osgood, N. 2010. "Flunet: Automated tracking of contacts during flu season." Proceedings of the 6th International workshop on Wireless Network Measurements (WiNMee 2010), 557-562, 6pp.

Some example images produced from iEpi data are shown below.

Within the diagram below, nodes are wifi locations. Two wifi nodes are considered connected if at least one participant detected them in the same 5-minute timeslot with a requisite signal strength. The nodes are shown with independent horizontal and vertical spans. The length of the horizontal axis for a node varies in proportion to the density of nonparticipants (people per unit time) detected at that node. The length of the vertical axis of a node is proportional to the density of participants detected at that node.
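
The connection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(participant_id, timestamp_seconds, wifi_id, rssi_dbm)`, the function name `build_wifi_graph`, and the -80 dBm cutoff standing in for the "requisite signal strength" are all hypothetical and are not taken from the iEpi data format.

```python
from collections import defaultdict
from itertools import combinations


def build_wifi_graph(detections, slot_seconds=300, min_rssi=-80):
    """Return the set of undirected edges between co-detected wifi nodes.

    detections: iterable of (participant_id, timestamp_seconds, wifi_id, rssi_dbm).
    Two wifi nodes are linked if a single participant detected both within the
    same time slot (5 minutes by default) at or above the signal threshold.
    """
    # (participant, slot index) -> wifi nodes seen strongly enough in that slot
    seen = defaultdict(set)
    for participant, timestamp, wifi_id, rssi in detections:
        if rssi >= min_rssi:
            slot = int(timestamp // slot_seconds)
            seen[(participant, slot)].add(wifi_id)

    edges = set()
    for wifi_ids in seen.values():
        # Sorting gives each undirected edge one canonical (a, b) ordering.
        for a, b in combinations(sorted(wifi_ids), 2):
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    sample = [
        ("p1", 10, "A", -60),
        ("p1", 200, "B", -70),   # same 5-minute slot as A for p1
        ("p1", 400, "C", -65),   # next slot, so not linked to A or B
        ("p2", 20, "C", -90),    # below the assumed signal threshold
        ("p2", 30, "A", -50),
    ]
    print(build_wifi_graph(sample))
```

One participant is enough to create an edge, which matches the "at least one participant" wording; counting how many participants support each edge would be a natural extension if edge weights were wanted.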

The following diagram depicts likelihood of infection in different locations in Saskatoon, as judged by a transmission model for a hypothetical influenza-like illness.

In the diagram below, a rough proxy for physical activity (based on accelerometer readings from participants' cellphones) was used to estimate levels of physical activity observed throughout Saskatoon over a one-month period.

The diagram below depicts wifi locations by non-participant time density (size) and count of distinct non-participants seen at each location (brightness). Lines show association between a participant and particular locations at which they were present.

Wifi locations here are shown as circles, each with area proportional to non-participant density. Brightness indicates count of distinct non-participants seen at a location. Participants are shown in red, with connections to the locations with which they were associated.