The Cross-Roads of Algorithmic Fairness, Accountability, and Transparency in Predictive Analytics

Speaker: Dr. Mykola Pechenizkiy, TU Eindhoven
Date: Friday, August 23, 2019 @ 3:00pm. Doors open at 2:30pm
Location: Thorv 124 

Seminar Abstract

Modern machine learning techniques contribute to the massive automation of data-driven decision making and decision support. It is becoming better understood and accepted, in particular due to the new General Data Protection Regulation (GDPR), that deployed predictive models may need to be audited. Regardless of whether we deal with so-called black-box models (e.g., deep learning) or more interpretable models (e.g., decision trees), answering even basic questions like “why is this model giving this answer?” and “how do particular features affect the model output?” is nontrivial. In reality, auditors need tools not just to explain the decision logic of an algorithm, but also to uncover and characterize undesired or unlawful biases in predictive model performance; for example, by law, hiring decisions cannot be influenced by race or gender. In this talk I will give a brief overview of the different facets of comprehensibility of predictive analytics and reflect on the current state of the art and the further research needed to gain a deeper understanding of what it means for predictive analytics to be truly transparent and accountable. I will also reflect on the need to study the utility of methods for interpretable predictive analytics.

Biography

Mykola Pechenizkiy is Professor of Data Mining at the Department of Mathematics and Computer Science, TU Eindhoven. His core expertise and research interests are in predictive analytics and its application to real-world problems in industry, medicine, and education. At the Data Science Center (DSC/e), he leads the Responsible Data Science interdisciplinary research program, which aims to develop techniques for informed, accountable, and transparent analytics. As principal investigator of several data science projects, he aims to develop foundations for next-generation predictive analytics and to demonstrate their ecological validity in practice. Over the past decade, he has co-authored more than 100 peer-reviewed publications and served on the program committees of leading data mining and AI conferences.

 

Supporting Source Code Search with Context-Aware and Semantics-Driven Query Reformulation

Speaker: Mohammad Masudur Rahman, Ph.D. Candidate
Date: Thursday, August 29, 2019 @ 3:00pm. Doors open at 2:30pm
Location: Thorv 129 

Seminar Abstract

Software bugs and failures cost trillions of dollars every year and can lead to deadly accidents. During maintenance, software developers fix numerous bugs and implement hundreds of new features by making the necessary changes to existing software code. Once an issue report is assigned to a developer, she chooses a few important keywords from the report as a search query and then attempts to find the exact locations in the code that need to be either repaired or enhanced. As part of this maintenance, developers also often formulate ad hoc queries on the fly and attempt to locate reusable code on the Internet that could assist them in either bug fixing or feature implementation. Unfortunately, even experienced developers often fail to construct the right search queries. Even when developers come up with a few ad hoc queries, most of them require frequent modifications, which cost significant development time and effort. Thus, constructing an appropriate query for localizing software bugs, programming concepts, or even reusable code is a major challenge. In this thesis, we overcome this query construction challenge with six studies and develop a novel, effective code search solution that assists developers in localizing the software code of interest during software maintenance. In particular, we reformulate a given search query (1) by designing novel keyword selection algorithms that outperform the traditional alternatives, (2) by leveraging the bug report quality paradigm and source document structures, which were previously overlooked, and (3) by exploiting the crowd knowledge and word semantics derived from the Stack Overflow Q&A site, which were previously untapped. Our experiments using 5,000+ search queries suggest that our proposed approach can significantly improve the given queries through automated query reformulation.
Comparison with 10+ existing studies on bug localization, concept location, and Internet-scale code search suggests that our approach can outperform state-of-the-art approaches by a significant margin.

Biography

Masud Rahman is a Ph.D. candidate at the University of Saskatchewan, advised by Dr. Chanchal K. Roy. He received an M.Sc. in Computer Science/Software Engineering from the U of S in 2014. Masud is interested in software change automation, with a particular focus on bug localization, concept location, Internet-scale code search, and code review. In his work, Software Engineering meets Information Retrieval, Machine/Deep Learning, Data Mining, and Large-scale Data Analytics. To date, he has co-authored 29 high-quality research papers. One of his papers was recently nominated for the TCSE Distinguished Paper Award. His work has been featured on the Stack Overflow Blog and has attracted industry collaborators. Mr. Rahman has received prestigious awards such as the Keith Geddes Award, President Gold Medal, ACM CAPS Award (2017 and 2019), NSERC Industry Engage Grant, SK Innovation & Opportunity Scholarship, and International Dean's Scholarship. He is a regular reviewer and sub-reviewer for top Software Engineering conferences and journals.

More details: http://www.usask.ca/~masud.rahma