Detection of Bacterial Pathogens by Whole-Genome Sequencing Using a New Composite Genome Representation

Document Type



Master of Science


Biochemistry & Biotechnology

Date of Defense


Graduate Advisor

Chung Wong, Ph.D., Chairperson


Wendy Olivas, Ph.D.

Benjamin Bythell, Ph.D.



Every day, millions of nucleotides are sequenced by high-throughput next-generation sequencing (NGS) machines. As sequencers become faster and more accurate and the cost of DNA sequencing continues to fall, many novel applications of DNA sequencing are being considered. These range from cancer studies to the detection of pathogens in body fluids or environmental samples. We have created a new composite reference genome representation, along with associated programs, that lets us use the large amounts of high-throughput sequencing data from metagenomic samples to rapidly determine whether pathogenic bacterial strains are present. The method applies two algorithms in sequence. First, a genome integration algorithm constructs new composite genomes as new reference genomes become available. Second, a detection algorithm uses the resulting composite reference genomes to detect the presence of pathogenic bacteria. In our simulation tests, the method detected a bacterium in metagenomic samples (human gut or oral) at loads as low as 0.1%. The main advantage of this method is that it runs on modest hardware: it was developed on laptops with 4 GB of RAM and can run on laptop computers coupled with low-cost, handheld, high-throughput DNA sequencers. This allows researchers in target geographical areas to rapidly identify a pathogenic bacterium or bacterial contamination in mixed or metagenomic samples from living systems, soil, or water.
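The abstract does not say how the composite genome is represented or how reads are matched against it. The following is only an illustrative sketch of the two-step idea (integrate genomes, then detect), and it rests on an assumption not made in the abstract: the composite reference is modeled as a k-mer index in which each k-mer points to the strains that contain it. A read counts as a pathogen hit when most of its k-mers occur in the index. All names and parameters here (`CompositeIndex`, `add_genome`, `read_matches`, `detect`, `k`, `min_fraction`, `min_reads`) are hypothetical, and the sequences are toy data rather than results from the thesis.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class CompositeIndex:
    """Hypothetical composite reference: each k-mer maps to the strains containing it."""

    def __init__(self, k=21):
        self.k = k  # assumed k-mer length; the abstract does not specify one
        self.index = defaultdict(set)

    def add_genome(self, strain, sequence):
        """Integrate one more reference genome, e.g. a newly released strain."""
        for kmer in kmers(sequence.upper(), self.k):
            self.index[kmer].add(strain)

    def read_matches(self, read, min_fraction=0.8):
        """True if at least min_fraction of the read's k-mers occur in the index."""
        read_kmers = list(kmers(read.upper(), self.k))
        if not read_kmers:
            return False  # read is shorter than k
        hits = sum(1 for km in read_kmers if km in self.index)
        return hits / len(read_kmers) >= min_fraction

    def detect(self, reads, min_fraction=0.8, min_reads=1):
        """Return (estimated load, present?) for a sample given as a list of reads."""
        matched = sum(1 for r in reads if self.read_matches(r, min_fraction))
        load = matched / len(reads) if reads else 0.0
        return load, matched >= min_reads


if __name__ == "__main__":
    idx = CompositeIndex(k=5)  # tiny k so the toy sequences stay readable
    idx.add_genome("strain_A", "ACGTTGCATGCCGATAGCTTACG")

    variant_read = "TGCATGCCGATTGCTT"  # carries a SNP present only in strain_B
    print(idx.read_matches(variant_read))  # False: only 7 of its 12 5-mers match

    idx.add_genome("strain_B", "ACGTTGCATGCCGATTGCTTACG")
    print(idx.read_matches(variant_read))  # True once strain_B is integrated

    # Toy "metagenome": 1 pathogen read among 1,000 reads, i.e. a 0.1% load
    sample = ["TGCATGCCGATAGCTT"] + ["G" * 20] * 999
    print(idx.detect(sample))  # (0.001, True)
```

In this sketch, integration is incremental: calling `add_genome` for a new strain extends the existing index without rebuilding it, which loosely mirrors the abstract's description of constructing new composite genomes as reference genomes become available. The variant read shows why integrating several strains matters, since a read carrying a strain-specific SNP fails the match threshold until that strain is added.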
