Date of Award

12-2024

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Mathematics

First Advisor

Dr. Xiaoyue Cheng

Abstract

Medicare, a public health insurance program, provides essential coverage to older adults and individuals with certain disabilities but suffers significant financial losses from fraudulent activities such as false claims and upcoding. To combat this, machine learning techniques, including XGBoost and Random Forest, were employed to detect fraudulent behavior in Medicare data provided by Mutual of Omaha. A key challenge in fraud detection is the imbalance in datasets, with legitimate claims vastly outnumbering fraudulent ones. This study addressed the issue using cost-sensitive methods and sampling techniques, including Random Oversampling, SMOTE, and Random Undersampling. A baseline model without any imbalance techniques was also evaluated to assess whether the models themselves could effectively handle the data imbalance. Extensive feature engineering was conducted to enhance predictive strength, focusing on identifying fraudulent claims at the individual claim level. A novel data-splitting strategy was introduced to ensure claims from the same provider did not appear in multiple splits, maintaining evaluation integrity. The inclusion and exclusion of dummy variables were also explored to optimize model generalizability. The results demonstrated that models with dummy variables generally outperformed those without in terms of predictive strength. While XGBoost models, both with and without dummy variables, encountered challenges in achieving high recall on the testing data, Random Forest models exhibited stronger generalization across the testing dataset. Ultimately, the Random Forest baseline model with dummy variables was selected as the best-performing model. These findings underscore the critical role of dummy variables and newly engineered features in enhancing fraud detection.
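
The provider-level split mentioned in the abstract, combined with Random Oversampling applied to the training data only, might look roughly like the sketch below. It is an illustration in plain Python under stated assumptions, not the thesis's implementation: the field names `provider_id` and `is_fraud`, the function names, the 25% test fraction, and the toy claims are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_provider(claims, test_fraction=0.25, seed=0):
    """Split claims so that each provider's claims land in exactly one split."""
    by_provider = defaultdict(list)
    for claim in claims:
        by_provider[claim["provider_id"]].append(claim)

    # Shuffle providers (not individual claims), then cut the provider list.
    providers = sorted(by_provider)
    random.Random(seed).shuffle(providers)
    n_test = max(1, round(len(providers) * test_fraction))
    test_providers = set(providers[:n_test])

    train = [c for p in providers if p not in test_providers for c in by_provider[p]]
    test = [c for p in providers if p in test_providers for c in by_provider[p]]
    return train, test


def random_oversample(train, seed=0):
    """Duplicate randomly chosen fraud claims until both classes are the same size."""
    rng = random.Random(seed)
    fraud = [c for c in train if c["is_fraud"] == 1]
    legit = [c for c in train if c["is_fraud"] == 0]
    if not fraud or len(fraud) >= len(legit):
        return list(train)  # nothing to balance
    extra = [rng.choice(fraud) for _ in range(len(legit) - len(fraud))]
    return train + extra


# Hypothetical toy data: 8 providers, 5 claims each, 1 fraudulent claim per provider.
claims = [
    {"provider_id": f"P{p}", "claim_id": f"P{p}-{i}", "is_fraud": int(i == 0)}
    for p in range(8)
    for i in range(5)
]

train, test = split_by_provider(claims)
balanced = random_oversample(train)  # resample the training split only

train_ids = {c["provider_id"] for c in train}
test_ids = {c["provider_id"] for c in test}
print("providers shared between splits:", len(train_ids & test_ids))
print("train/test/balanced sizes:", len(train), len(test), len(balanced))
```

Splitting on providers rather than on claims keeps provider-specific patterns from leaking into the test set, and resampling after the split leaves the test data at its original class ratio.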

Comments

The author holds the copyright to this work. Reach out to the author directly for any reuse or permissions.
