Date of Award
12-2024
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Mathematics
First Advisor
Dr. Xiaoyue Cheng
Abstract
Medicare, a public health insurance program, provides essential coverage to older adults and individuals with certain disabilities but suffers significant financial losses from fraudulent activities such as false claims and upcoding. To combat this, machine learning techniques, including XGBoost and Random Forest, were employed to detect fraudulent behavior in Medicare data provided by Mutual of Omaha. A key challenge in fraud detection is the imbalance in datasets, with legitimate claims vastly outnumbering fraudulent ones. This study addressed the issue using cost-sensitive methods and sampling techniques, including Random Oversampling, SMOTE, and Random Undersampling. A baseline model without any imbalance techniques was also evaluated to assess whether the models themselves could effectively handle the data imbalance. Extensive feature engineering was conducted to enhance predictive strength, focusing on identifying fraudulent claims at the individual claim level. A novel data-splitting strategy was introduced to ensure claims from the same provider did not appear in multiple splits, maintaining evaluation integrity. The inclusion and exclusion of dummy variables were also explored to optimize model generalizability. The results demonstrated that models with dummy variables generally outperformed those without in terms of predictive strength. While XGBoost models, both with and without dummy variables, encountered challenges in achieving high recall on the testing data, Random Forest models exhibited stronger generalization across the testing dataset. Ultimately, the Random Forest baseline model with dummy variables was selected as the best-performing model. These findings underscore the critical role of dummy variables and newly engineered features in enhancing fraud detection.
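The provider-level splitting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the split was implemented, so this is an assumption-laden illustration and not the thesis's actual procedure: the field names (`claim_id`, `provider_id`, `is_fraud`), the function name `split_by_provider`, the 20% test fraction, and the synthetic claims are all hypothetical. Note that the fraction here applies to providers, not to individual claims.

```python
import random
from collections import defaultdict


def split_by_provider(claims, test_fraction=0.2, seed=0):
    """Split claims so that each provider's claims land in exactly one split.

    Hypothetical helper: `claims` is a list of dicts that each carry a
    `provider_id` key. `test_fraction` is the share of providers (not
    claims) assigned to the test split.
    """
    # Group claims by the provider that submitted them.
    by_provider = defaultdict(list)
    for claim in claims:
        by_provider[claim["provider_id"]].append(claim)

    # Shuffle providers reproducibly, then assign whole providers to a split.
    providers = sorted(by_provider)
    random.Random(seed).shuffle(providers)
    n_test = max(1, round(len(providers) * test_fraction))

    test = [c for p in providers[:n_test] for c in by_provider[p]]
    train = [c for p in providers[n_test:] for c in by_provider[p]]
    return train, test


# Synthetic example data (invented for illustration only):
# 200 claims spread evenly over 10 providers.
claims = [
    {"claim_id": i, "provider_id": f"P{i % 10}", "is_fraud": int(i % 25 == 0)}
    for i in range(200)
]

train, test = split_by_provider(claims)
train_providers = {c["provider_id"] for c in train}
test_providers = {c["provider_id"] for c in test}
```

Because whole providers are assigned to a split, a model evaluated on `test` never sees claims from a provider it was trained on. Under this kind of split, any resampling step such as random oversampling or SMOTE would normally be applied to the training portion only, so that the test data keeps its original class balance.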
Recommended Citation
Aitbayeva, Akbota, "Detecting Fraudulent Medicare Claims With Imbalanced Data Classification" (2024). Mathematics Theses, Dissertations, Research and Student Creative Activity. 4.
https://digitalcommons.unomaha.edu/mathstudent/4
Comments
The author holds the copyright to this work. Reach out to the author directly for any reuse or permissions.