Date of Award
Master of Science (MS)
Dr. Hesham H. Ali
Motivation: Clustering algorithms are widely used in bioinformatics, having been applied to a range of problems from the analysis of gene expression to the building of phylogenetic trees. Biological data often describe parallel and spontaneous processes such as molecular interactions and genome evolution. To capture these features, we propose a new clustering algorithm that employs the concept of message passing.

Methods: Inspired by a real-world situation in which people who have never met can form groups by exchanging messages, Message Passing Clustering (MPC) allows data objects to communicate with each other and produces clusters in parallel, thereby making the clustering process intrinsic. MPC has two further advantages over traditional clustering methods: it is relatively straightforward to understand and implement, and it takes into account both local and global structure. We have proved that MPC shares similarity with Hierarchical Clustering (HC) but offers significantly improved performance.

Results: To validate the MPC method, we analyzed 35 sets of simulated dynamic gene expression data, achieving a 95% hit rate with 639 of 674 genes correctly clustered. We also applied MPC to real data sets to build a phylogenetic tree for 34 strains from nine species of Mycobacterium and to cluster 698 genes from a yeast cell-cycle database. The results show higher classification accuracies than traditional clustering methods.
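
The idea described under Methods, in which objects exchange messages and several clusters form in the same round, can be illustrated with a rough sketch. The abstract does not specify the message contents, the distance measure, or the stopping rule, so everything below is an assumption made for illustration and not the thesis's actual algorithm: each cluster "messages" its nearest neighbour, pairs that choose each other merge within the same round, distances use average linkage, and the names `average_linkage`, `message_passing_round`, and `mpc_sketch` are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance between members of clusters a and b (assumed linkage)."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def message_passing_round(clusters, points, max_merges):
    """One round: each cluster names its nearest neighbour; mutual pairs merge."""
    n = len(clusters)
    # Compute each pairwise distance once so that d(i, j) == d(j, i) exactly.
    d = {(i, j): average_linkage(clusters[i], clusters[j], points)
         for i, j in combinations(range(n), 2)}

    def between(i, j):
        return d[(i, j)] if i < j else d[(j, i)]

    # The "message": every cluster points at its closest other cluster.
    nearest = [min((j for j in range(n) if j != i), key=lambda j: between(i, j))
               for i in range(n)]
    # Pairs that point at each other, closest pairs first.
    mutual = sorted((between(i, nearest[i]), i, nearest[i])
                    for i in range(n)
                    if i < nearest[i] and nearest[nearest[i]] == i)

    merged, used = [], set()
    for _, i, j in mutual[:max_merges]:
        merged.append(clusters[i] + clusters[j])
        used.update((i, j))
    # Clusters that did not merge this round are carried over unchanged.
    merged.extend(c for k, c in enumerate(clusters) if k not in used)
    return merged


def mpc_sketch(points, n_clusters):
    """Repeat rounds until n_clusters remain (assumed stopping rule)."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        clusters = message_passing_round(clusters, points, len(clusters) - n_clusters)
    return clusters


points = [(0.0,), (0.1,), (5.0,), (5.2,), (10.0,)]
result = mpc_sketch(points, 3)
print(sorted(sorted(c) for c in result))  # [[0, 1], [2, 3], [4]]
```

In this toy run both close pairs merge in a single round, whereas a standard agglomerative hierarchical procedure would merge one pair per step. That is the sense in which the sketch forms clusters "in parallel".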
Geng, Huimin, "A New Approach to Clustering Biological Data Using Message Passing." (2001). Student Work. 3547.