Document Type
Thesis
Degree
Master of Science
Major
Computer Science
Date of Defense
10-31-2019
Graduate Advisor
Dr. Badri Adhikari
Committee
Dr. Badri Adhikari
Dr. Uday Chakraborty
Dr. Sharlee Climer
Abstract
As in most domains where machine learning methods are applied, careful feature engineering is critical when developing deep learning algorithms for the protein folding problem. Unlike in domains such as computer vision and natural language processing, however, feature engineering has not been rigorously studied for protein folding. Recent research has highlighted that an input feature known as the precision matrix is the most informative for predicting the inter-residue contact map, which is the key to building three-dimensional models. In this work, we study the significance of the precision matrix feature when very deep residual networks are trained. Using a standard dataset of 3456 proteins, known as the DeepCov set, we trained multiple deep residual networks and tested our models on an independent test dataset of 150 proteins. On this test dataset, we find that precision matrix features deliver 3.7% more precise long-range contacts than the benchmark covariance matrix features in our recently published method, DEEPCON. In addition to validating the finding that the precision matrix is more informative, we also find that the significance of the precision matrix is reduced when deeper residual network models are trained. Our method, DEEPCON-PRE, i.e., DEEPCON with the precision matrix as the input feature, is available at https://github.com/nachammai779/Deepcon_Precision.
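For readers unfamiliar with the term, a precision matrix is the inverse of a covariance matrix. The following is a minimal sketch of that relationship only, not the DEEPCON-PRE feature pipeline: the function name `precision_from_covariance`, the `shrinkage` parameter, and the random toy data are assumptions made for illustration, and the regularization scheme used in the thesis may differ.

```python
import numpy as np


def precision_from_covariance(cov, shrinkage=0.1):
    """Invert a covariance matrix after shrinking it toward a scaled identity.

    Hypothetical helper for illustration; the shrinkage step is an assumed
    regularizer that keeps the matrix invertible.
    """
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    # Scaled identity with the same average variance as cov.
    target = (np.trace(cov) / n) * np.eye(n)
    regularized = (1.0 - shrinkage) * cov + shrinkage * target
    return np.linalg.inv(regularized)


# Toy data (assumption): 200 random samples of 5 variables,
# standing in for real alignment-derived statistics.
rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 5))
cov = np.cov(samples, rowvar=False)          # 5 x 5 covariance matrix
prec = precision_from_covariance(cov, 0.1)   # 5 x 5 precision matrix
```

With `shrinkage=0.0` the function reduces to a plain matrix inverse, so the product of the result with `cov` is approximately the identity.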
Recommended Citation
Palaniappan, Nachammai, "DEEPCON-PRE: Improved protein contact map prediction using inverse covariance and deep residual networks" (2019). Theses. 393.
https://irl.umsl.edu/thesis/393
Included in
Artificial Intelligence and Robotics Commons, Medical Biomathematics and Biometrics Commons