Focus: A Graph Mining and Assembly Platform for the Discovery and Extraction of Biological Features in Next Generation Sequencing Reads
Advisor Information
Hesham Ali
Location
UNO Criss Library, Room 231
Presentation Type
Oral Presentation
Start Date
6-3-2015 1:30 PM
End Date
6-3-2015 1:45 PM
Abstract
Next Generation Sequencing (NGS) has recently emerged as the main technology behind the majority of bioinformatics and biomedical research projects. Although assembling the reads produced by NGS remains a difficult task, extracting useful knowledge from these relatively short sequences is quickly becoming one of the most exciting and challenging problems in bioinformatics. Most current assemblers rely on the assembly graph as the foundational model for representing NGS reads. However, the assembly graph is used primarily to organize NGS data for assembly, even though, as a structural model, it could serve as the basis of an expanded model that captures genomic structural features intrinsic to the input dataset. In this research, we propose a new graph approach that not only assembles NGS reads but also mines valuable biological knowledge in the process. We demonstrate that a wealth of biologically relevant information can be uncovered from our model’s structural features, including ambiguous graph nodes, which have previously been considered stumbling blocks for many NGS tools. In addition, we explore graph characteristics that lead to the discovery of biologically relevant features in NGS datasets, including rRNA sequences. We also investigate how the assembly graph under the proposed approach can be used to analyze comparative genomics data. The ability to extract information directly from NGS reads and from the structural features of their assembly graphs will provide a powerful method of analyzing genomic data and lead to new biological discoveries.