Document Type

Thesis

Degree

Master of Science

Major

Computer Science

Date of Defense

10-16-2019

Graduate Advisor

Badri Adhikari

Committee

Badri Adhikari

Sharlee Climer

Mark Hauschild

Abstract

The protein folding problem, also known as protein structure prediction, is the task of building three-dimensional protein models from their one-dimensional amino acid sequences. Methods used successfully in the most recent CASP challenge have demonstrated that predicting a protein's inter-residue distances is key to solving this problem. Various deep learning algorithms, including fully convolutional neural networks and residual networks, have been developed to solve the distance prediction problem. In this work, we develop a hybrid method based on residual networks and capsule networks. We demonstrate that our method predicts distances more accurately than the algorithms used in state-of-the-art methods. Using a standard dataset of 3420 training proteins and an independent dataset of 150 test proteins, we show that our method predicts distances 51.06% more accurately than a standard residual network method when the accuracy of all long-range distances is evaluated using mean absolute error. To further validate our results, we demonstrate that three-dimensional models built from the distances predicted by our method are more accurate than models built from the distances predicted by residual networks. Overall, our results highlight, for the first time, the potential of capsule-residual hybrid networks for solving the protein inter-residue distance prediction problem.
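
The evaluation metric named in the abstract, mean absolute error over long-range distances, can be illustrated with a minimal sketch. This is not the thesis code. It assumes that "long-range" means residue pairs at least 24 positions apart in the sequence (a common convention that the abstract does not state) and that an "X% more accurate" figure is read as a relative reduction in mean absolute error. The function names long_range_mae and relative_improvement are hypothetical.

```python
def long_range_mae(true_dist, pred_dist, min_separation=24):
    """Mean absolute error over residue pairs (i, j) with j - i >= min_separation.

    true_dist and pred_dist are square, same-sized nested lists of distances.
    The default threshold of 24 is an assumed convention, not taken from the thesis.
    """
    n = len(true_dist)
    if len(pred_dist) != n:
        raise ValueError("distance maps must have the same size")
    errors = []
    for i in range(n):
        # Only the upper triangle is visited, so each pair is counted once.
        for j in range(i + min_separation, n):
            errors.append(abs(true_dist[i][j] - pred_dist[i][j]))
    if not errors:
        raise ValueError("no residue pairs at the requested separation")
    return sum(errors) / len(errors)


def relative_improvement(baseline_mae, new_mae):
    """Percentage reduction in error relative to a baseline (assumed reading)."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Toy data only: a 30-residue "protein" whose true distance is |i - j|,
# with one prediction off by 4 everywhere and another off by 2 everywhere.
n = 30
true_map = [[float(abs(i - j)) for j in range(n)] for i in range(n)]
baseline_map = [[d + 4.0 for d in row] for row in true_map]
hybrid_map = [[d + 2.0 for d in row] for row in true_map]

baseline_mae = long_range_mae(true_map, baseline_map)
hybrid_mae = long_range_mae(true_map, hybrid_map)
print(baseline_mae, hybrid_mae, relative_improvement(baseline_mae, hybrid_mae))
```

The toy distance maps exist only to exercise the functions. They carry no relation to the 3420-protein training set or the 150-protein test set described above.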
