Information Systems and Quantitative Analysis Faculty Publications

An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

Julia Warnke, University of Nebraska at Omaha
Hesham Ali, University of Nebraska at Omaha

Document Type

Article

Publication Date

2012

Publication Title

BMC Bioinformatics

Abstract

Background: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly.

Results: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among the reads in a dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level, our algorithm can generate clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. We previously applied the algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper, we extend it to large simulated and real Illumina datasets to demonstrate that it is practical for both sequencing technologies.

Conclusions: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies.
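
The idea of representing reads with a series of progressively coarser overlap graphs can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not say how nodes are merged, so the greedy heavy-edge matching used here, the use of overlap length as the edge weight, and the names `coarsen` and `build_levels` are all assumptions made for illustration.

```python
from collections import defaultdict


def coarsen(edges, clusters):
    """One coarsening pass using greedy heavy-edge matching (an assumed heuristic).

    edges:    dict mapping frozenset({u, v}) -> overlap weight
    clusters: dict mapping node id -> set of original read ids it represents
    Returns (new_edges, new_clusters) for the next, coarser level.
    """
    merged_into = {}
    # Visit the heaviest overlaps first; ties are broken by node ids.
    ordered = sorted(edges.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for pair, _weight in ordered:
        u, v = sorted(pair)
        if u not in merged_into and v not in merged_into:
            merged_into[u] = u
            merged_into[v] = u  # v is absorbed into u

    # Unmatched nodes carry over unchanged.
    rep = {node: merged_into.get(node, node) for node in clusters}

    new_clusters = defaultdict(set)
    for node, reads in clusters.items():
        new_clusters[rep[node]] |= reads

    # Re-attach edges to the merged nodes, summing parallel edges.
    new_edges = defaultdict(int)
    for pair, weight in edges.items():
        u, v = pair
        if rep[u] != rep[v]:
            new_edges[frozenset((rep[u], rep[v]))] += weight
    return dict(new_edges), dict(new_clusters)


def build_levels(edges, reads, max_levels):
    """Return a list of (edges, clusters) pairs, finest level first."""
    clusters = {r: {r} for r in reads}
    levels = [(edges, clusters)]
    for _ in range(max_levels):
        if not edges:
            break
        edges, clusters = coarsen(edges, clusters)
        levels.append((edges, clusters))
    return levels


# Toy overlap graph: four reads on a path, weights are hypothetical overlap lengths.
toy_edges = {
    frozenset(("r1", "r2")): 50,
    frozenset(("r2", "r3")): 30,
    frozenset(("r3", "r4")): 40,
}
levels = build_levels(toy_edges, ["r1", "r2", "r3", "r4"], max_levels=5)
for depth, (_, level_clusters) in enumerate(levels):
    print(depth, sorted(sorted(c) for c in level_clusters.values()))
```

Each entry in `levels` pairs a reduced graph with the read clusters it represents, so a cluster set can be read off at any level of granularity.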

Comments

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0). The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Recommended Citation

Warnke, Julia and Ali, Hesham, "An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads" (2012). Information Systems and Quantitative Analysis Faculty Publications. 52.
https://digitalcommons.unomaha.edu/isqafacpub/52

Included in

Bioinformatics Commons