Automatic classification of news stories – A machine learning approach
Humans are good at classifying things because our brains are adept at understanding contextual nuances. Machines, however, need to be fed the right features to achieve reasonably good levels of classification. Classifying text manually is a time-consuming and expensive process, especially in the information age, where the success of cloud computing and big data, the resurgence of the internet of things, and unprecedented population growth have together led to an explosion in the amount of data that we have to deal with: approximately 2.5 quintillion bytes every 24 hours (Walker, 2015). This thesis explores the efficiency of two well-known machine learning classification algorithms, Naïve Bayes and Support Vector Machines, in classifying news stories, an important subset of the global repositories of information. The findings of this study indicate that using machine learning to classify news stories is not easy but is feasible and, if done properly, can yield accuracy rates of at least 70%. These results translate into significant time savings that cannot be achieved by manual classification and are a precursor to other machine learning techniques such as recommendation, clustering and sentiment analysis.
Thesis submitted to the Department of Computer Science, Ashesi University College, in partial fulfillment of the requirements for the Bachelor of Science degree in Computer Science, April 2016
machine learning, classification algorithms, Naïve Bayes classification algorithm, Support Vector Machines classification algorithm