An Analysis of de Novo Genome Assembly Algorithms
Presenter Type
UNO Undergraduate Student
Major/Field of Study
Bioinformatics
Advisor Information
Kathryn Cooper
Location
CEC RM #201/205/209
Presentation Type
Poster
Poster Size
48x36 inches
Start Date
22-3-2024 1:00 PM
End Date
22-3-2024 2:15 PM
Abstract
For decades, the field of bioinformatics has been interested in the computational problem of de novo genome assembly. De novo genome assembly is the act of reconstructing an organism’s genome from small, fragmented reads without an existing genome for reference. As Whole Genome Shotgun (WGS) sequencing develops to yield ever larger numbers of reads, the computational problem of assembling a genome grows, especially when there is no reference genome to map to. Programmers face an uphill journey to develop algorithms that are efficient, effective, and accurate. A variety of computational methods are used, with varying degrees of success and popularity. I selected three assemblers to review based on time and space complexity, sustained use over the last decade, and viability in small-scale research projects: MaSuRCA, SGA, and JR-Assembler. MaSuRCA has maintained use over the last decade through its hybrid assembly techniques and its accuracy. SGA was innovative and popular for its time and space efficiency but has been replaced by newer derivative algorithms in the last five years. JR-Assembler revived an abandoned assembly technique based on extending reads and was conservative in time and space. It was used briefly in the first few years after its creation but did not sustain preferred use in the scientific community and thus stopped being updated. Observing these three algorithms, with their varying degrees of success, indicates that the scientific community values accuracy most and is at times willing to compromise on time and space efficiency.