Reworking BugSeq-er2 to streamline metagenomics binning using DAS Tool

Presenter Type

UNO Undergraduate Student

Major/Field of Study

Bioinformatics

Other

Bioinformatics

Advisor Information

Jonathan B. Clayton

Location

CEC RM #201/205/209

Presentation Type

Poster

Poster Size

36x48in

Start Date

22-3-2024 1:00 PM

End Date

22-3-2024 2:15 PM

Abstract

The bacterial composition of the gut microbiome is thought to be at the root of many health issues in organisms and has become an important topic of investigation in the scientific community. With trillions of microbes across thousands of species present in a single human gut, it is nearly impossible to investigate and observe differences manually, without the aid of bioinformatics. Many labs build their own data analysis pipelines to suit their individual needs, often using open-source programs provided by bioinformaticians all over the world. The Clayton Lab has created a pipeline, BugSeq-er2, that processes next-generation sequencing data generated from biological samples and returns information about bacterial composition. Within the pipeline, sequence reads are analyzed and then organized into groups known as bins, which tell the lab which bacterial genomes remain in the gut microbiome after each run of the experiment. The binning section of the prototype pipeline ran too slowly for the pipeline to be used efficiently in the lab and lacked an effective file management scheme: the prototype took more than 12 hours to run, and the final data files were poorly organized, making data retrieval inefficient. To fix the output file management, the code needed to handle an arbitrary number (N) of output files and then concatenate (join) them into a single output file that is easy for the user to retrieve. To do this, we used the workflow management system Snakemake, together with additional bash scripting, to restructure the rules in the existing code. Through these changes, we found and removed redundant file structures, reorganized the binning process, and significantly reduced the total runtime of the BugSeq-er2 pipeline.
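The file-handling step described in the abstract, gathering an unknown number of per-bin output files and joining them into one, can be illustrated with a minimal Python sketch. This is not the BugSeq-er2 code, which the abstract says was written with Snakemake rules and bash scripting; the function name `concatenate_outputs`, the `bin_*.tsv` file pattern, and the directory layout are all hypothetical and chosen only for illustration.

```python
from pathlib import Path
import shutil


def concatenate_outputs(input_dir, pattern, combined_path):
    """Join every file in input_dir that matches pattern into combined_path.

    Hypothetical helper for illustration; returns the source files in the
    order they were written to the combined file.
    """
    combined_path = Path(combined_path)
    # Sort by name for a reproducible order (lexicographic, so bin_10 sorts
    # before bin_2). Skip the combined file itself in case it sits in the
    # same directory and matches the pattern.
    sources = sorted(
        p for p in Path(input_dir).glob(pattern)
        if p.is_file() and p.resolve() != combined_path.resolve()
    )
    combined_path.parent.mkdir(parents=True, exist_ok=True)
    with combined_path.open("wb") as out:
        for src in sources:
            with src.open("rb") as handle:
                # Stream each file so large outputs are never held in memory.
                shutil.copyfileobj(handle, out)
    return sources
```

A call such as `concatenate_outputs("results/bins", "bin_*.tsv", "results/all_bins.tsv")` (paths assumed) would work for any number of matching files, including zero, because the list of inputs is discovered at run time rather than hard-coded.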
