Automatic classification of news stories – A machine learning approach
Date
2016-04
Abstract
Humans are good at classifying things because our brains are adept at understanding
contextual nuances. Machines, however, need to be fed the right features to achieve
reasonably good levels of classification. Classifying text manually is a time-consuming and
expensive process, especially in the information age, where the success of cloud computing
and big data, the resurgence of the internet of things, and unprecedented population growth
have together led to an explosion in the amount of data that we have to deal with:
approximately 2.5 quintillion bytes every 24 hours (Walker, 2015). This thesis explores the
efficiency of two well-known machine learning classification algorithms, Naïve Bayes and
Support Vector Machines, in classifying news stories, an important subset of the global
repositories of information. The findings indicate that using machine learning to classify
news stories is not easy but is feasible and, if done properly, can yield accuracy rates of
at least 70%. These results translate into significant
time savings that cannot be achieved by manual classification and are a precursor to other
machine learning techniques such as recommendation, clustering and sentiment analysis.
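
To make the idea of feeding a machine "the right features" concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is an illustration only and not the implementation used in the thesis: the function names, the whitespace tokenizer, and the four toy headlines are all hypothetical, and the Support Vector Machine side of the comparison is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude feature extractor."""
    return text.lower().split()


def train_naive_bayes(labelled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # Words never seen in training carry no evidence.
            count = word_counts[label][word]
            # Smoothed log likelihood of the word given the class.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy headlines, invented for illustration (not thesis data).
TRAINING_DOCS = [
    ("striker scores late goal in cup final", "sports"),
    ("coach praises team after league win", "sports"),
    ("parliament passes new budget bill", "politics"),
    ("president meets ministers over election law", "politics"),
]

model = train_naive_bayes(TRAINING_DOCS)
prediction = classify("team wins cup final", model)
print(prediction)
```

With a training set this small the example says nothing about achievable accuracy; it only shows how word counts per category become the features that drive the classification decision.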
Description
Thesis submitted to the Department of Computer Science, Ashesi University College, in partial fulfillment of the requirements for the Bachelor of Science degree in Computer Science, April 2016
Type
Thesis
Keywords
machine learning, classification algorithms, Naïve Bayes classification algorithm, Support Vector Machines classification algorithm