Date of Award
12-1-2003
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Dr. Hesham Ali
Abstract
The availability of large fragments of genomic DNA makes it possible to apply comparative genomics to the identification of protein-coding regions. We conducted a comparative analysis of homologous genomic sequences from organisms at different evolutionary distances and found that non-coding regions are conserved between closely related organisms. In contrast, more distantly related organisms show much less intron similarity but also less conservation of exon structure. We sought to illuminate the impact of evolutionary distance on the performance of our gene-finding program, which is based on cross-species sequence comparison. Based on our findings and on training with data sets, we propose a model in which coding sequences are identified by comparing sequences from multiple species, both close and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared with those obtained by other popular gene prediction programs. Provided that sequences from other species at appropriate evolutionary distances can be found, this approach could be applied to newly sequenced organisms for which no species-dependent statistical models are available.
Recommended Citation
Chen, Rong, "On Gene Prediction by Cross-Species Comparative Sequenced Analysis." (2003). Student Work. 3302.
https://digitalcommons.unomaha.edu/studentwork/3302
Comments
A Thesis Presented to the Department of Computer Science and the Faculty of the Graduate College University of Nebraska In Partial Fulfillment of the Requirements for the Degree Master of Science University of Nebraska at Omaha. Copyright 2003 Rong Chen.