"A MapReduce Algorithm for Finding Hotspots of Topics from Time Stamped" by Ashwathy Ashokan

Student Work

Title

A MapReduce Algorithm for Finding Hotspots of Topics from Time Stamped Documents

Author

Ashwathy Ashokan, University of Nebraska at Omaha

Date of Award

5-2013

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Dr. Parvathi Chundi

Second Advisor

Dr. Sanjukta Bhowmick

Third Advisor

Dr. Ilze Zigurs

Abstract

Hotspots of a word or topic are time periods with a burst of activity in a time-stamped document set. Identifying and analyzing hotspots of topics has been an important area of research. Finding hotspots of topics requires processing the contents of documents, which is often time consuming. In this thesis, we explore MapReduce-style algorithms for computing hotspots of topics. MapReduce is a distributed parallel programming model and an associated implementation for processing and analyzing large datasets. The user specifies a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real-world tasks are expressible in this model, and this thesis explores the feasibility of implementing the hotspot algorithm using MapReduce. We design map and reduce functions for preprocessing the documents and for the hotspot computation. We implement the functions in Hadoop (a MapReduce framework from the Apache Software Foundation) and conduct several experiments to assess the benefits of a MapReduce-style implementation versus a simple sequential implementation.
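
The map and reduce roles described in the abstract can be illustrated with a small sequential sketch in Python. This is not the thesis's Hadoop implementation: the function names (map_doc, reduce_counts, run_mapreduce, hotspots), the sample documents, and in particular the hotspot rule (a period counts as hot when the word's frequency exceeds its mean across periods) are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def map_doc(period, text):
    """Map: turn one (period, text) document into ((word, period), 1) pairs."""
    for word in text.lower().split():
        yield (word, period), 1


def reduce_counts(key, values):
    """Reduce: merge all intermediate values that share one intermediate key."""
    return key, sum(values)


def run_mapreduce(docs):
    """Sequential stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for period, text in docs:
        for key, value in map_doc(period, text):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())


def hotspots(counts, word, periods):
    """Assumed rule (not from the thesis): hot if the count exceeds the mean."""
    series = [counts.get((word, p), 0) for p in periods]
    threshold = mean(series)
    return [p for p, c in zip(periods, series) if c > threshold]


# Hypothetical time-stamped documents: (period, text).
docs = [
    ("2013-01", "hadoop cluster setup"),
    ("2013-02", "hadoop hadoop mapreduce hadoop"),
    ("2013-02", "hadoop job tuning"),
    ("2013-03", "weather report"),
]
counts = run_mapreduce(docs)
print(hotspots(counts, "hadoop", ["2013-01", "2013-02", "2013-03"]))
```

In a real MapReduce framework the grouping step inside run_mapreduce is performed by the shuffle phase across machines; only the map and reduce functions are supplied by the user.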

Comments

A Thesis Presented to the Department of Computer Science and the Faculty of the Graduate College, University of Nebraska, in Partial Fulfillment of the Requirements for the Degree Master of Science, University of Nebraska at Omaha. Copyright 2013 Ashwathy Ashokan.

Recommended Citation

Ashokan, Ashwathy, "A MapReduce Algorithm for Finding Hotspots of Topics from Time Stamped Documents" (2013). Student Work. 2870.
https://digitalcommons.unomaha.edu/studentwork/2870

Included in

Computer Sciences Commons
