Document Type

Thesis

Degree

Master of Science

Major

Computer Science

Date of Defense

4-18-2018

Graduate Advisor

Dr. Sharlee Climer

Committee

Dr. Cezary Janikow

Dr. Sanjiv Bhatia

Dr. Mark Hauschild

Dr. Sharlee Climer

Abstract

The problem of community structure identification has been an extensively investigated area for biology, physics, social sciences, and computer science in recent years for studying the properties of networks representing complex relationships. Most traditional methods, such as K-means and hierarchical clustering, are based on the assumption that communities have spherical configurations. Lately, Genetic Algorithms (GA) are being utilized for efficient community detection without imposing sphericity. GAs are machine learning methods which mimic natural selection and scale with the complexity of the network. However, traditional GA approaches employ a representation method that dramatically increases the solution space to be searched by introducing redundancies. They also utilize a crossover operator which imposes a linear ordering that is not suitable for community detection.

The algorithm presented here is a framework to detect communities for complex biological networks that removes both redundancies and linearity. We also introduce a novel operator, named Gene Repair. This algorithm is unique as it is a flexible community detection technique aimed at maximizing the value of any given mathematical objective for the network. We reduce the memory requirements by representing chromosomes as a 3-dimensional bit array. Furthermore, in order to increase diversity while retaining promising chromosomes, we use natural selection process based on tournament selection with elitism. Additionally, our approach doesn’t require prior information about the number of true communities in the network. We apply our novel algorithm to benchmark datasets and also to a network representing a large cohort of AD cases and controls.

By utilizing this efficient and flexible implementation that is cognizant of characteristics for networks representing complex disease genetics, we sift out communities representing patterns of interacting genetic variants that are associated with this enigmatic disease.

Share

COinS