Date of Award
12-2025
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Data Science
First Advisor
Dr. Emilio Rivera
Second Advisor
Dr. Christian Haas
Abstract
This project aims to strengthen Gallup’s survey data preprocessing by integrating machine learning methods to detect inconsistent responses, complementing existing rule-based checks. The study benchmarks techniques such as Isolation Forest, clustering-based outlier scores, and autoencoders against current approaches to spot inconsistencies more efficiently and with reduced manual effort.
We will evaluate models using a dataset of survey responses previously flagged for inconsistency by Gallup, focusing on accuracy, coverage, and efficiency, with results validated through expert review. The goal is to automate the classification of responses into pass, review, or fail categories and to reduce manual review time by about 25% (roughly 15 hours per week) while maintaining the accuracy needed for reliable survey data.
Integrating machine learning into preprocessing delivers cleaner data to the weighting team, reduces the risk of distorted weights, and improves subgroup estimates. For both methodologists and analysts, it strengthens diagnostic capabilities, saves time, and enhances the reliability of survey results.
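The pass, review, or fail triage described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it stands in a simple distance-from-centroid outlier score for the Isolation Forest, clustering-based, and autoencoder detectors the abstract names. The example columns (age, years employed), the function names, and both thresholds are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def outlier_scores(rows):
    """Stand-in anomaly score: Euclidean distance of each standardized
    response from the centroid (the origin after standardization)."""
    return [sqrt(sum(v * v for v in row)) for row in standardize(rows)]


def triage(score, review_at, fail_at):
    """Map an anomaly score to one of three categories.
    Both thresholds are hypothetical and would need tuning against
    expert-reviewed responses."""
    if score >= fail_at:
        return "fail"
    if score >= review_at:
        return "review"
    return "pass"


# Hypothetical responses: (age, years_employed).
# The last row is internally inconsistent (age 22 with 40 years employed).
responses = [(34, 10), (45, 20), (29, 5), (52, 30), (38, 15), (41, 18), (22, 40)]

scores = outlier_scores(responses)
labels = [triage(s, review_at=1.5, fail_at=2.0) for s in scores]

for row, score, label in zip(responses, scores, labels):
    print(row, round(score, 2), label)
```

Only responses labeled "review" would go to a human reviewer under this scheme, which is where a reduction in manual review time would come from. Any detector that produces a numeric anomaly score could replace `outlier_scores` without changing the triage step.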
Recommended Citation
Ndungutse, Patrick, "Improving Data Quality in Survey Research: The Role of Machine Learning in Handling Inconsistent Responses" (2025). Information Systems and Quantitative Analysis Theses, Dissertations, and Student Creative Activity. 4.
https://digitalcommons.unomaha.edu/isqastudent/4
Comments
The author holds the copyright to this work and any reuse or permissions must be obtained from them directly.