Date of Award

12-2025

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Data Science

First Advisor

Dr. Emilio Rivera

Second Advisor

Dr. Christian Haas

Abstract

This project aims to strengthen Gallup’s survey data preprocessing by adding machine learning methods that detect inconsistent responses and complement the existing rule-based checks. The study benchmarks techniques such as Isolation Forest, clustering-based outlier scores, and autoencoders against the current approach to determine whether they identify inconsistencies more efficiently and with less manual effort.

We will evaluate the models on a dataset of survey responses that Gallup previously flagged as inconsistent, measuring accuracy, coverage, and efficiency, with results validated through expert review. The goal is to automatically classify each response as pass, review, or fail, reducing manual review time by about 25% (roughly 15 hours per week) while maintaining the accuracy needed for reliable survey data.
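
As an illustration of the pass, review, or fail classification described above, the following is a minimal sketch of how a per-response anomaly score could be mapped to a stage. It is not the thesis's implementation: the function names, the assumption that scores are scaled to [0, 1] with higher values meaning more suspicious, and both cutoff values are hypothetical, since the abstract does not specify any thresholds.

```python
from typing import List

# Hypothetical cutoffs chosen only for illustration; the abstract does not
# state how scores are scaled or where the stage boundaries lie.
REVIEW_THRESHOLD = 0.5
FAIL_THRESHOLD = 0.8


def triage(score: float,
           review_at: float = REVIEW_THRESHOLD,
           fail_at: float = FAIL_THRESHOLD) -> str:
    """Map an anomaly score in [0, 1] (higher = more suspicious) to a stage."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if review_at > fail_at:
        raise ValueError("review_at must not exceed fail_at")
    if score >= fail_at:
        return "fail"      # confidently inconsistent: reject automatically
    if score >= review_at:
        return "review"    # ambiguous: route to a human reviewer
    return "pass"          # consistent enough to skip manual review


def triage_all(scores: List[float]) -> List[str]:
    """Apply the same cutoffs to a batch of scores."""
    return [triage(s) for s in scores]
```

In a sketch like this, only the middle band reaches a human reviewer, so the width of that band is what trades manual review time against the risk of misclassifying a response automatically.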

Integrating machine learning into preprocessing delivers cleaner data to the weighting team, which reduces the risk of distorted weights and improves subgroup estimates. For methodologists and analysts, it strengthens diagnostic capabilities, saves time, and improves the reliability of survey results.

Comments

The author holds the copyright to this work; permission for any reuse must be obtained from the author directly.
