Author

Ping Yang

Date of Award

5-1-2004

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Dr. Zhengxin Chen

Abstract

This thesis-equivalent project presents an empirical comparison of four k-medoid based clustering algorithms: CLARA, CLARANS, IMCMRS and GCA. All four algorithms were implemented in C++ on the Linux platform. A variety of data sets were used in the experiments to evaluate the performance of the algorithms. The data set factors considered are cluster shape, asymmetry, distinction, randomness, data size, outliers, dimension and overlap. The evaluation criteria are execution time and silhouette width. From the results of the experiments, in which Euclidean distance was adopted as the dissimilarity measure, we conclude that convex cluster shapes of small size have the same effect on the clustering quality of the three algorithms other than GCA. The factors of asymmetry, distinction, randomness, data size, outliers, dimension and overlap affect the four algorithms differently. In general, CLARANS outperforms the other algorithms, but its execution time increases significantly as the data size grows. CLARA achieves satisfactory clustering quality while remaining efficient on larger data sets, but it is not suitable for high-cluster, multi-dimensional data. The experiments also show that IMCMRS is the most time-efficient of the four algorithms. It is better suited to less separated clusters, high-dimensional Gauss-Markov data and overlapping clusters. To a certain degree, IMCMRS is also more resistant to outliers. GCA is the most time-consuming algorithm when the number of two-dimensional objects is below about 5500. Its clustering quality is acceptable but decreases dramatically as the asymmetry factor increases. To improve on GCA, a modified genetic clustering algorithm (IGCA) was developed. Experiments showed that this algorithm has improved performance.
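As an illustration of the quality criterion named in the abstract, the following is a minimal C++ sketch of the average silhouette width computed with Euclidean distance. It follows the standard textbook definition and is not taken from the thesis code: the names `Point`, `euclidean` and `averageSilhouetteWidth`, the label encoding (integers in the range 0 to k-1), and the convention that objects in singleton clusters score 0 are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical representation: one object is a vector of coordinates.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& p, const Point& q) {
    assert(p.size() == q.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < p.size(); ++d) {
        const double diff = p[d] - q[d];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Average silhouette width over all objects.
// labels[i] is the cluster index of points[i], assumed to lie in [0, k).
// For object i: a = mean distance to the other members of its own cluster,
// b = smallest mean distance to the members of any other cluster,
// s(i) = (b - a) / max(a, b). Objects in singleton clusters contribute 0.
double averageSilhouetteWidth(const std::vector<Point>& points,
                              const std::vector<int>& labels, int k) {
    assert(points.size() == labels.size());
    const std::size_t n = points.size();
    if (n == 0) return 0.0;

    std::vector<std::size_t> clusterSize(k, 0);
    for (int label : labels) {
        assert(label >= 0 && label < k);
        ++clusterSize[label];
    }

    const double inf = std::numeric_limits<double>::infinity();
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Sum of distances from object i to the members of each cluster.
        std::vector<double> distSum(k, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) distSum[labels[j]] += euclidean(points[i], points[j]);
        }

        const int own = labels[i];
        if (clusterSize[own] <= 1) continue;  // singleton: s(i) = 0

        const double a = distSum[own] / static_cast<double>(clusterSize[own] - 1);
        double b = inf;
        for (int c = 0; c < k; ++c) {
            if (c != own && clusterSize[c] > 0) {
                b = std::min(b, distSum[c] / static_cast<double>(clusterSize[c]));
            }
        }
        if (b == inf) continue;  // only one non-empty cluster: treat s(i) as 0

        const double denom = std::max(a, b);
        if (denom > 0.0) total += (b - a) / denom;
    }
    return total / static_cast<double>(n);
}
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that many objects sit closer to a neighbouring cluster than to their own. The sketch computes all pairwise distances, so its cost grows quadratically with the number of objects.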

Comments

A Thesis-equivalent Project Presented to the Department of Computer Science and the Faculty of the Graduate College, University of Nebraska, In Partial Fulfillment of the Requirements for the Degree Master of Science, University of Nebraska at Omaha. Copyright 2004 Ping Yang.

