Date of Award

8-2013

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Dr. Harvey Siy

Second Advisor

Dr. Robin Gandhi

Third Advisor

Dr. Parvathi Chundi

Abstract

Software maintenance takes two thirds of the life cycle of the project. Bug fixes are an important part of software maintenance. Bugs are tracked using online tools like Bugzilla. It has been noted that around 10% of fixes are buggy fixes. Many bugs are documented as fixed when they are not actually fixed, thus reducing the reliability of the software. The overlooked bugs are critical as they take more resources to fix when discovered, and since they are not documented, the reality is that defect are still present and reduce reliability of software. There have been very few studies in understanding these bugs. The best way to understand these bugs is to mine software repositories. To generalize findings we need a large number of bug information and a wide category of software projects. To solve the problem, a web crawler collected around a million bug reports from online repositories, and extracted important attributes of the bug reports. We selected four algorithms: Bayesian network, NaiveBayes, C4.5 decision tree, and Alternating decision tree. We achieved a decent amount of accuracy in predicting reopened bugs across a wide range of projects. Using AdaBoost, we analyzed the most important factors responsible for the bugs and categorized them in three categories of reputation of committer, complex units, and insufficient knowledge of defect.

Comments

A Thesis Presented to the Department of Computer Science And the Faculty of the Graduate College University of Nebraska In Partial Fulfillment Of the Requirements for the Degree Master of Science. Copyright 2013 Rishikesh Gawade.

Files over 3MB may be slow to open. For best results, right-click and select "save as..."

Share

COinS