Date of Award

11-19-2010

Document Type

Thesis

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Dr. Qiuming Zhu

Abstract

The proliferation of the Semantic Web depends on ontologies for knowledge sharing, semantic annotation, data fusion, and descriptions of data for machine interpretation. However, ontologies are difficult to create and maintain. In addition, their structure and content may vary depending on the application and domain. Several methods described in the literature have been used to create ontologies from various data sources, such as structured data in databases or unstructured text found in text documents or HTML documents. Various data mining techniques, natural language processing methods, syntactical analysis, machine learning methods, and other techniques have been used in building ontologies with automated and semi-automated processes. Due to the vast amount of unstructured text and its continued proliferation, the problem of constructing ontologies from text has attracted considerable research attention. However, the constructed ontologies may be noisy, with missing and incorrect knowledge. Thus, ontology construction continues to be a challenging research problem. The goal of this research is to investigate a new method for guiding a process of extracting and assembling candidate terms into domain-specific concepts and relationships. The process is part of an overall semi-automated system for creating ontologies from unstructured text sources and is driven by the user’s goals in an incremental process. The system applies natural language processing techniques and uses a series of syntactical analysis tools for extracting grammatical relations from a list of text terms representing the parts of speech of a sentence. The extraction process focuses on evaluating the subject-predicate-object sequences of the text for potential concept-relation-concept triples to be built into an ontology. Users can guide the system by selecting seedling concept-relation-concept triples to assist in building concepts from the extracted domain-specific terms.
As a result, the ontology building process develops into an incremental one that allows the user to interact with the system, to guide the development of an ontology, and to tailor the ontology for the user’s application needs. The main contribution of this work is the implementation and evaluation of a new semi-automated methodology for constructing domain-specific ontologies from unstructured text corpora.
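
The extraction step described in the abstract can be illustrated with a toy sketch. This is not the dissertation's implementation: the abstract does not name its tools, so the function names `extract_triples` and `filter_by_seeds`, the coarse tags `NOUN` and `VERB`, and the noun-verb-noun heuristic are all assumptions made for illustration. The sketch takes a sentence that has already been part-of-speech tagged, proposes candidate concept-relation-concept triples, and keeps those that share a concept with a user-chosen seed triple.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]       # (word, coarse POS tag); tag names are assumed
Triple = Tuple[str, str, str]  # (subject concept, relation, object concept)


def extract_triples(tagged: List[Tagged]) -> List[Triple]:
    """Propose candidate triples from one tagged sentence.

    Hypothetical heuristic: group adjacent nouns (or adjacent verbs) into
    chunks, then report every noun-chunk, verb-chunk, noun-chunk sequence.
    """
    chunks: List[Tuple[str, str]] = []  # (tag, chunk text)
    prev_tag = None
    for word, tag in tagged:
        if tag in ("NOUN", "VERB"):
            if tag == prev_tag:
                # Extend the current chunk, e.g. "data" + "packets".
                kind, text = chunks[-1]
                chunks[-1] = (kind, text + " " + word)
            else:
                chunks.append((tag, word))
        # Any other tag ends the current chunk and is otherwise ignored.
        prev_tag = tag

    triples: List[Triple] = []
    for (k1, s), (k2, p), (k3, o) in zip(chunks, chunks[1:], chunks[2:]):
        if (k1, k2, k3) == ("NOUN", "VERB", "NOUN"):
            triples.append((s, p, o))
    return triples


def filter_by_seeds(triples: List[Triple], seeds: List[Triple]) -> List[Triple]:
    """Keep candidates whose subject or object matches a seed concept."""
    seed_concepts = {c.lower() for s, _, o in seeds for c in (s, o)}
    return [
        t for t in triples
        if t[0].lower() in seed_concepts or t[2].lower() in seed_concepts
    ]


sentence = [
    ("The", "DET"), ("router", "NOUN"), ("forwards", "VERB"),
    ("the", "DET"), ("data", "NOUN"), ("packets", "NOUN"),
]
candidates = extract_triples(sentence)
kept = filter_by_seeds(candidates, [("router", "connects", "network")])
print(kept)
```

In an incremental workflow of the kind the abstract describes, a user would review the kept triples, promote some of them to new seeds, and rerun the filter over the remaining candidates.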

Comments

A DISSERTATION Presented to the Faculty of The Graduate College at the University of Nebraska In Partial Fulfillment of Requirements For the Degree of Doctor of Philosophy. Copyright 2010 William L. Sousan.
