M.Sc. Thesis Defences

A Constraint Logic Programming Approach to Predicting the Three-Dimensional Yeast Genome

In order for all of a cell's genetic information to fit inside its nucleus, the chromosomes must undergo extensive folding and organization. Just as in origami, where the same piece of paper folded in different ways can take on different forms and potential functions, different genomic organizations (or architectures) may be related to different nuclear functions. Until recently, it was impossible to investigate this relationship comprehensively because there were no high-resolution, high-throughput techniques for identifying genomic architectures. The recent development of Hi-C, a derivative of chromosome conformation capture, has made it possible to detect the complete set of interactions occurring within (intra-interactions) and between (inter-interactions) chromosomes in the nucleus. Many computational methods have been proposed that use Hi-C data to infer the approximate three-dimensional (3D) architecture of the genome. However, the genomic architecture also affects other types of nuclear interactions, and techniques exist that can capture and measure them. Unfortunately, it would be extremely difficult to incorporate these additional datasets into the existing tools. To overcome this, a novel application of constraint logic programming (CLP) was used to develop a new program for predicting the 3D genomic architecture.
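To make the intra-/inter-interaction distinction concrete, the sketch below aggregates Hi-C read pairs into binned contact counts and tallies intra- versus inter-chromosomal pairs. It is a minimal illustration that assumes read pairs have already been mapped to (chromosome, position) coordinates; the function name `bin_contacts`, the tuple layout, and the bin size are hypothetical and are not taken from the thesis or from any particular Hi-C pipeline.

```python
from collections import Counter


def bin_contacts(read_pairs, bin_size):
    """Aggregate Hi-C read pairs into binned contact counts (illustrative).

    read_pairs: iterable of (chrom_a, pos_a, chrom_b, pos_b) tuples.
    Returns (counts, n_intra, n_inter), where counts maps an unordered pair
    of (chrom, bin_index) loci to the number of supporting read pairs.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    counts = Counter()
    n_intra = n_inter = 0
    for chrom_a, pos_a, chrom_b, pos_b in read_pairs:
        locus_a = (chrom_a, pos_a // bin_size)
        locus_b = (chrom_b, pos_b // bin_size)
        # Store each pair in a canonical order so (a, b) and (b, a) coincide.
        key = (locus_a, locus_b) if locus_a <= locus_b else (locus_b, locus_a)
        counts[key] += 1
        if chrom_a == chrom_b:
            n_intra += 1  # both ends on the same chromosome
        else:
            n_inter += 1  # ends on different chromosomes
    return counts, n_intra, n_inter
```

For example, with a 5 kb bin size, read pairs linking positions 100 and 5,200 on the same chromosome (in either order) land in the same entry of `counts`, and each is tallied as intra-chromosomal.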

The unique representation used in this program lends itself well to the future incorporation of additional genomic datasets. This thesis investigates the most efficient way to represent and solve the constraint satisfaction problem of the 3D genome. The program was used to predict a 3D logical model of the yeast genome, and the results were visualized using Cytoscape. The model was then validated biologically through a literature search, which showed that the prediction recapitulated key documented features of the yeast genome. Future work will use this tool as a computational framework, extending it to incorporate additional genomic datasets and information into the prediction and visualization of the 3D genomic architecture. The CLP program developed here is a step towards a better understanding of the elusive relationship between the 3D structure of the genome and various nuclear functions.
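The abstract does not describe the program's actual encoding, so the following is only a toy illustration of what a constraint satisfaction formulation of genome folding can look like. It uses plain Python backtracking in place of a CLP solver and assumes a hypothetical representation: the genomic bins of a single chromosome placed on a 3D integer lattice, constrained by chain connectivity, excluded volume, and a maximum distance for each Hi-C contact. `fold_chain`, `max_sq_dist`, and the lattice itself are illustrative assumptions.

```python
# Six unit steps on a cubic lattice (hypothetical representation).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def sq_dist(a, b):
    """Squared Euclidean distance between two lattice points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fold_chain(n_bins, contacts, max_sq_dist=2):
    """Place bins 0..n_bins-1 of one chromosome on a 3D lattice.

    Illustrative constraints:
      * consecutive bins sit on adjacent lattice points (connectivity);
      * no two bins share a lattice point (excluded volume);
      * each contact pair (i, j) lies within sqrt(max_sq_dist) units.
    Returns a list of (x, y, z) coordinates, or None if none exists.
    """
    if n_bins <= 0:
        return []
    # Index each contact by its later bin so it is checked as soon as
    # both endpoints have been placed.
    by_later = {}
    for i, j in contacts:
        if not (0 <= i < n_bins and 0 <= j < n_bins):
            raise ValueError(f"contact ({i}, {j}) is out of range")
        if i != j:
            by_later.setdefault(max(i, j), []).append(min(i, j))

    coords = [(0, 0, 0)]  # pin bin 0 at the origin to remove translations
    occupied = {coords[0]}

    def extend(k):
        """Depth-first search: try every free neighbour for bin k."""
        if k == n_bins:
            return True
        x, y, z = coords[-1]
        for dx, dy, dz in STEPS:
            cand = (x + dx, y + dy, z + dz)
            if cand in occupied:
                continue  # excluded volume
            if any(sq_dist(cand, coords[lo]) > max_sq_dist
                   for lo in by_later.get(k, [])):
                continue  # a Hi-C contact would be too far apart
            coords.append(cand)
            occupied.add(cand)
            if extend(k + 1):
                return True
            coords.pop()  # backtrack
            occupied.remove(cand)
        return False

    return list(coords) if extend(1) else None
```

A CLP system would state the same constraints declaratively and let the solver propagate them (for example, pruning positions that can no longer reach a future contact partner), whereas this sketch only checks each contact once both of its endpoints are placed.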

M.Sc. Thesis Defence: Kim Mackay

Monday, August 22, 2016 @ 1:30 pm 

Applications of Machine Learning to Predicting Selection in Antibody Phage Display


Antibodies form an essential component of the immune system and have important scientific and clinical applications owing to their ability to bind strongly and specifically to biomolecular targets (e.g. proteins). To produce antibodies for these applications, researchers can use a wet-lab technique called antibody phage display, which starts with a library of diverse antibody fragments and selects and amplifies the fragments that bind to the target of interest. Combining antibody phage display with next-generation sequencing (NGS) has the potential to yield greater insight into the selection process.

The research goals of this thesis were to (1) extract meaningful patterns from antibody phage display sequence data using tools and techniques from machine learning (an area of artificial intelligence well suited to recognizing patterns in large datasets); (2) predict outcomes of antibody phage display using these patterns; and (3) reverse-engineer the resulting prediction tools to gain greater insight into the learned patterns and the selection process.

To achieve these goals, antibody phage display data produced by the Geyer lab (U of S) using two libraries (F and S) was used to train and then compare several machine learning models: a naive Bayes network (NB), a linear model (LM), an artificial neural network (ANN), a support vector machine (SVM) with a radial basis function kernel (RBF-SVM), an SVM with a string kernel (SSK-SVM), and a random forest (RF). The ANN, SVM, and RF models had the best average classification accuracy (81.5%), but no single classifier in this group performed significantly better than the others.

The antibody phage display data was then grouped according to which library (F or S) had been used in the experiment. Data from library F was used to train the two SVMs, and data from library S was used to test them. Trained on library F and tested on library S, the SVMs achieved an average classification accuracy of 66.7%, significantly better than would be expected from assigning classes at random. The two SVM models trained on library F were then deconstructed to identify which features discriminate positive from negative predictions. The predictions of the RBF-SVM were found to be highly dependent on the molecular weight of the relevant binding region (i.e. CDRH3).
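Two quantities in this abstract can be made concrete. The first is the feature the RBF-SVM relied on, the molecular weight of a CDRH3 peptide, estimated below as the sum of approximate average residue masses plus one water molecule. The second is what "significantly better than random" can mean: the abstract names neither the statistical test nor the size of the library S test set, so this sketch assumes a one-sided exact binomial test against 50% chance (balanced binary classes) and treats any counts passed in as placeholders. The names `peptide_mw` and `binomial_p_value` are hypothetical; this is not the thesis's feature-extraction or statistics code.

```python
from math import comb

# Approximate average residue masses (Da) of the 20 standard amino acids,
# i.e. free amino-acid mass minus one water.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def peptide_mw(seq):
    """Average molecular weight (Da) of a linear peptide such as a CDRH3."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    # Residues are joined by peptide bonds; add back one water for the termini.
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def binomial_p_value(correct, total, chance=0.5):
    """One-sided exact binomial test: P(X >= correct) for a classifier that
    guesses correctly with probability `chance` on each test example."""
    if not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total")
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )
```

For instance, `peptide_mw("GG")` returns about 132.12 Da (glycylglycine), and `binomial_p_value(correct, total)` gives the probability that a coin-flip classifier would do at least as well on `total` test sequences; a small value supports the "better than random" claim.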

M.Sc. Thesis Defence: Daniel Hogan

Monday, September 1, 2016 @ 9:00 am