Designing a modular and customizable Snakemake pipeline specialized for 16S data analysis
Presenter Type
UNO Undergraduate Student
Major/Field of Study
Biology
Other
Biology
Advisor Information
Jonathan Clayton
Location
CEC RM #201/205/209
Presentation Type
Poster
Poster Size
48"x36"
Start Date
22-3-2024 1:00 PM
End Date
22-3-2024 2:15 PM
Abstract
Designing a modular and customizable Snakemake pipeline specialized for 16S data analysis
Chris H. Schinzel^a, Jordan B. Hernandez^a,b,c, Katherine M. Cooper^d, Paul A. Ayayee^a, Jonathan B. Clayton^a,b,c,e,f,g
^a Department of Biology, University of Nebraska at Omaha, Omaha, NE, USA
^b Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, NE, USA
^c Callitrichid Research Center, University of Nebraska at Omaha, Omaha, NE, USA
^d School of Interdisciplinary Informatics, College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA
^e Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, NE, USA
^f Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, USA
^g Primate Microbiome Project, University of Nebraska-Lincoln, Lincoln, NE, USA
16S rRNA gene sequencing has become the de facto method for studying gut microbial communities. As a result, several software packages, such as QIIME2, DADA2, and Mothur, have been developed, each implementing the analysis pipeline in its own way and offering features the others lack. This variety has made 16S analysis difficult to approach, particularly for researchers who are new to the field, while more experienced researchers may find it hard to customize their analyses. To address this, our lab group has developed ampliConda, an automated data pipeline designed to give the end user complete control over the 16S analysis process. The pipeline is built with Snakemake, a workflow manager that automates data analysis with an emphasis on reproducibility and modularity. Snakemake also conserves computing resources and streamlines the analysis: it works backward from the pipeline's requested final outputs to the raw input data and skips any step whose outputs are already up to date. Currently, both QIIME2 and the R-based DADA2 package have been integrated into the pipeline, making components of the analysis framework, such as the reference database and read-trimming settings, customizable. Our next step is to integrate additional 16S packages into a single environment in which the end user can build and automate a pipeline entirely of their choosing. The long-term goal of ampliConda is to become a hub where scientists of all experience levels can perform customized analyses of 16S sequencing data.
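The point about Snakemake tracing the pipeline's end goal back to the beginning and skipping unnecessary steps can be illustrated with a toy model. The Python sketch below is a simplified assumption of how such backward planning works, not Snakemake's or ampliConda's actual code: the `Rule` class, the `plan` function, and all rule and file names are hypothetical, and only file modification times are considered, whereas Snakemake itself also handles wildcards, parallel execution, and other rerun triggers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical, simplified stand-in for a Snakemake rule."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def plan(target: str, rules: list[Rule], mtimes: dict[str, float]) -> list[str]:
    """Work backward from `target` and return, in execution order, the names
    of the rules that must run. `mtimes` maps each existing file to its
    modification time; a missing key means the file does not exist yet."""
    producer = {out: rule for rule in rules for out in rule.outputs}
    decided: dict[str, bool] = {}  # rule name -> whether it will run
    schedule: list[str] = []

    def will_change(path: str) -> bool:
        """Return True if `path` will be (re)created during this run."""
        rule = producer.get(path)
        if rule is None:
            # Raw input (e.g. sequencing reads): no rule makes it, so it must exist.
            if path not in mtimes:
                raise FileNotFoundError(path)
            return False
        if rule.name in decided:
            return decided[rule.name]
        # Resolve every upstream dependency first (depth-first, backward from the target).
        upstream_changed = [will_change(p) for p in rule.inputs]
        if any(out not in mtimes for out in rule.outputs):
            outdated = True  # an output is missing
        else:
            oldest_output = min(mtimes[out] for out in rule.outputs)
            newest_input = max(
                (mtimes[p] for p in rule.inputs if p in mtimes), default=float("-inf")
            )
            outdated = newest_input > oldest_output  # an input is newer than an output
        run = outdated or any(upstream_changed)
        decided[rule.name] = run
        if run:
            schedule.append(rule.name)  # appended only after its dependencies
        return run

    will_change(target)
    return schedule


# Hypothetical three-step 16S workflow; rule and file names are illustrative only.
rules = [
    Rule("trim_reads", ("raw.fastq",), ("trimmed.fastq",)),
    Rule("denoise", ("trimmed.fastq",), ("asv_table.tsv",)),
    Rule("assign_taxonomy", ("asv_table.tsv", "reference.fasta"), ("taxonomy.tsv",)),
]
existing = {"raw.fastq": 1, "reference.fasta": 1, "trimmed.fastq": 2, "asv_table.tsv": 3}
print(plan("taxonomy.tsv", rules, existing))  # ['assign_taxonomy']: earlier steps are skipped
```

If `raw.fastq` is later given a newer timestamp than `trimmed.fastq`, the same call schedules all three rules, because a change near the start of the workflow propagates to every downstream step.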