Date of Award


Document Type


Degree Name

Master of Science (MS)


Computer Science

First Advisor

Prof. Hesham Ali

Second Advisor

Prof. Dhundy Bastola

Third Advisor

Prof. Guoqing Lu


Comparing biological sequences remains one of the most vital activities in Bioinformatics. Comparing biological sequences would address the relatedness between species, and find similar structures that might lead to similar functions.

Sequence alignment is the default method, and has been used in the domain for over four decades. It gained a lot of trust, but limitations and even failure has been reported, especially with the new generated genomes. These new generated genomes have bigger size, and to some extent suffer errors. Such errors come mainly as a result from the sequencing machine. These sequencing errors should be considered when submitting sequences to GenBank, for sequence comparison, it is often hard to address or even trace this problem.

Alignment-based methods would fail with such errors, and even if biologists still trust them, reports showed failure with these methods.

The poor results of alignment-based methods with erratic sequences, motivated researchers in the domain to look for alternatives. These alternative methods are alignment-free, and would overcome the shortcomings of alignment-based methods. The work of this thesis is based on alignment-free methods, and it conducts an in-depth study to evaluate these methods, and find the right domain’s application for them. The right domain for alignment-free methods could be by applying them to data that were subjected to manufactured errors, and test the methods provide better comparison results with data that has naturally severe errors. The two techniques used in this work are compression-based and motif-based (or k-mer based, or signal based). We also addressed the selection of the used motifs in the second technique, and how to progress the results by selecting specific motifs that would enhance the quality of results.

In addition, we applied an alignment-free method to a different domain, which is gene prediction. We are using alignment-free in gene prediction to speed up the process of providing high quality results, and predict accurate stretches in the DNA sequence, which would be considered parts of genes.


A Thesis Presented to the Department of Computer Science And the Faculty of the Graduate College University of Nebraska In Partial Fulfillment of the Requirements for the Degree Master of Science University of Nebraska at Omaha. Copyright 2011 Ramez Mina.