Master of Science
The protein folding problem, also known as protein structure prediction, is the task of building a three-dimensional model of a protein from its one-dimensional amino acid sequence. New methods used successfully in the most recent CASP (Critical Assessment of protein Structure Prediction) challenge have demonstrated that predicting a protein's inter-residue distances is key to solving this problem. Various deep learning algorithms, including fully convolutional neural networks and residual networks, have been developed to solve the distance prediction problem. In this work, we develop a hybrid method based on residual networks and capsule networks. We demonstrate that our method can predict distances more accurately than the algorithms used in state-of-the-art methods. Using a standard dataset of 3420 training proteins and an independent dataset of 150 test proteins, we show that our method predicts distances 51.06% more accurately than a standard residual network method when the accuracy of all long-range distances is evaluated using mean absolute error. To further validate our results, we demonstrate that three-dimensional models built using the distances predicted by our method are more accurate than models built using the distances predicted by residual networks. Overall, our results highlight, for the first time, the potential of capsule-residual hybrid networks for solving the protein inter-residue distance prediction problem.
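The evaluation metric named above, mean absolute error (MAE) over long-range distances, can be made concrete with a short sketch. It assumes that "long-range" means a sequence separation of at least 24 residues (the convention CASP uses for long-range contacts) and that true and predicted distances are given as full L × L matrices in angstroms. The abstract does not state the exact threshold or whether any distance cutoff is applied, so `long_range_mae` and its `min_sep` parameter are hypothetical illustrations, not the author's evaluation code.

```python
def long_range_mae(true_dist, pred_dist, min_sep=24):
    """Mean absolute error over residue pairs (i, j) with j - i >= min_sep.

    Hypothetical helper: min_sep=24 is an assumed long-range threshold
    (the CASP long-range contact convention), not taken from the thesis.
    true_dist and pred_dist are square L x L lists of lists. Only the
    upper triangle is read, because distance maps are symmetric.
    """
    n = len(true_dist)
    if len(pred_dist) != n or any(len(row) != n for row in true_dist + pred_dist):
        raise ValueError("distance maps must be square and the same size")

    total = 0.0
    count = 0
    for i in range(n):
        # Start j at i + min_sep so only long-range pairs are scored.
        for j in range(i + min_sep, n):
            total += abs(true_dist[i][j] - pred_dist[i][j])
            count += 1

    if count == 0:
        raise ValueError("protein too short: no long-range pairs to score")
    return total / count
```

For a 30-residue protein, only the 21 pairs with separation of 24 or more contribute, so errors on short- and medium-range pairs do not affect the score. Per-protein values computed this way would then typically be averaged over the test set.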
Dillon, Andrew, "Protein Inter-Residue Distance Prediction Using Residual and Capsule Networks" (2019). Theses. 436.