Document Type

Thesis

Degree

Master of Science

Major

Computer Science

Date of Defense

10-31-2019

Graduate Advisor

Dr. Badri Adhikari

Committee

Dr. Badri Adhikari

Dr. Uday Chakraborty

Dr. Sharlee Climer

Abstract

As in most domains where machine learning is applied, careful feature engineering is critical when developing deep learning algorithms for the protein folding problem. Unlike in domains such as computer vision and natural language processing, feature engineering has not been rigorously studied for protein folding. Recent research has highlighted that input features known as the precision matrix are the most informative for predicting the inter-residue contact map, which is key to building three-dimensional models. In this work, we study the significance of the precision matrix feature when very deep residual networks are trained. Using a standard dataset of 3456 proteins, known as the DeepCov set, we trained multiple deep residual networks and tested our models on an independent test dataset of 150 proteins. On this test dataset, we find that precision matrix features yield 3.7% higher precision for long-range contacts than the benchmark covariance matrix features of our recently published method, DEEPCON. In addition to validating the finding that the precision matrix is more informative, we also find that its advantage diminishes as deeper residual network models are trained. Our method, DEEPCON-PRE, i.e., DEEPCON with the precision matrix as the input feature, is available at https://github.com/nachammai779/Deepcon_Precision.
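For readers unfamiliar with the two feature types compared in the abstract, the following is a minimal, generic sketch of how a precision matrix relates to a covariance matrix: it is the inverse of the covariance matrix, usually computed after some regularization. The abstract does not describe how the thesis computes its features, so the function name `covariance_and_precision`, the `ridge` parameter, and the random toy data are all assumptions made for illustration and are not the DEEPCON-PRE pipeline.

```python
import numpy as np


def covariance_and_precision(samples, ridge=0.1):
    """Return the sample covariance and a ridge-regularized precision matrix.

    `samples` has shape (n_samples, n_features). The `ridge` value is an
    illustrative assumption, not a setting taken from the thesis.
    """
    samples = np.asarray(samples, dtype=float)
    # Sample covariance between feature columns.
    cov = np.cov(samples, rowvar=False)
    # Add a small multiple of the identity so the matrix is safely invertible.
    regularized = cov + ridge * np.eye(cov.shape[0])
    # The precision matrix is the inverse of the (regularized) covariance.
    precision = np.linalg.inv(regularized)
    return cov, precision


# Toy data only: 50 random samples with 4 features.
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 4))
cov, prec = covariance_and_precision(toy)
```

Off-diagonal covariance entries reflect both direct and indirect correlations between features. Under a Gaussian model, off-diagonal precision entries reflect only the dependence that remains after conditioning on all other features, which is one common explanation for why precision-based inputs can be more informative for contact prediction.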
