An Analysis of de Novo Genome Assembly Algorithms

Presenter Type

UNO Undergraduate Student

Major/Field of Study

Bioinformatics

Advisor Information

Kathryn Cooper

Location

CEC RM #201/205/209

Presentation Type

Poster

Poster Size

48x36 inches

Start Date

March 22, 2024, 1:00 PM

End Date

March 22, 2024, 2:15 PM

Abstract

For decades, the field of bioinformatics has been interested in the computational problem of de novo genome assembly. De novo genome assembly is the act of reconstructing an organism’s genome from small, fragmented reads without an existing genome for reference. As Whole Genome Shotgun (WGS) sequencing yields ever larger numbers of reads, assembling a genome becomes a harder computational problem, especially when there is no reference genome to map to. Programmers face an uphill journey in developing algorithms that are efficient, effective, and accurate. A variety of computational methods are used, with varying degrees of success and popularity. I have selected three assemblers to review based on time and space complexity, sustained use over the last decade, and viability in small-scale research projects: MaSuRCA, SGA, and JR-Assembler. MaSuRCA has remained in use over the last decade owing to its hybrid assembly techniques and its accuracy. SGA was innovative and popular for its time and space efficiency but has been replaced by newer derivative algorithms in the last five years. JR-Assembler revived the abandoned extended-reads assembly technique and was conservative in both time and space. It was used briefly in the first few years after its creation but did not sustain preferred use in the scientific community and is no longer updated. From observing three algorithms with varying degrees of success, it is clear that the scientific community values accuracy most and is at times willing to compromise on time and space efficiency.

This document is currently not available here.
