Evaluating and choosing a machine learning algorithm for classifying road surface quality data
Date
2018-04
Authors
Abeo, Anthony Anabila
Abstract
Considering the importance of roads to a community, stakeholders (governments,
motorists, etc.) need up-to-date information about the state of roads for decision making.
This problem inspired Vorgbe’s (2014) work in implementing a machine learning
classifier that could accurately classify roads as “good”, “fair” or “bad”. This
information can then be visualised on Google Maps. However, because his algorithm
fails to classify some roads accurately, this project evaluates five
classification algorithms to determine which one is best for classifying road surface
quality data.
To do this, we collected x, y, z acceleration and location data, extracted the desired
features from it, performed 10-fold cross-validation training on the data to choose the
best model, and then tested on a new set of examples to determine which model
classifies the data most accurately. From the data available, the decision tree model produced
the best performance, correctly classifying 97% of bad roads, 81%
of fair roads and 93% of good roads. The overall accuracy on the test set is
92%, with a precision of 92% and a recall of 90%. This means that this model is more
likely to assign a new data point to its true class. The other
algorithms (Logistic Regression, Random Forests, Support Vector Machines and
Nearest Neighbour) performed well when classifying the “good” and “bad” road data
but misclassified the “fair” road data as “good”.
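The procedure in the abstract (10-fold cross-validation to choose a model, then evaluation on unseen examples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the data is synthetic, the single feature and its class centres are invented, and two simple stand-in classifiers (nearest neighbour and nearest centroid) take the place of the five algorithms the project evaluated. All function and variable names are hypothetical.

```python
import random
from collections import Counter

LABELS = ["good", "fair", "bad"]

def make_synthetic(n_per_class, rng):
    """Hypothetical data: one feature per window (e.g. spread of z-acceleration)."""
    centres = {"good": 0.2, "fair": 0.6, "bad": 1.2}  # made-up values
    data = [([rng.gauss(centres[c], 0.15)], c)
            for c in LABELS for _ in range(n_per_class)]
    rng.shuffle(data)
    return data

def nearest_neighbour(train):
    """Fit a 1-nearest-neighbour classifier; returns a predict function."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0][0] - x[0]))[1]
    return predict

def nearest_centroid(train):
    """Fit a nearest-centroid classifier; returns a predict function."""
    sums, counts = Counter(), Counter()
    for x, y in train:
        sums[y] += x[0]
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in counts}
    def predict(x):
        return min(centroids, key=lambda y: abs(centroids[y] - x[0]))
    return predict

def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def cross_val_accuracy(fit, data, k=10):
    """Mean accuracy over k folds: train on k-1 folds, score on the held-out fold."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(fit(train), folds[i]))
    return sum(scores) / k

def per_class_recall(predict, rows):
    """Share of each true class that the model labels correctly."""
    hits, totals = Counter(), Counter()
    for x, y in rows:
        totals[y] += 1
        hits[y] += predict(x) == y
    return {c: hits[c] / totals[c] for c in LABELS}

rng = random.Random(0)
train_set = make_synthetic(100, rng)  # used for cross-validation
test_set = make_synthetic(30, rng)    # new examples, never seen during selection

candidates = {"nearest_neighbour": nearest_neighbour,
              "nearest_centroid": nearest_centroid}
cv_scores = {name: cross_val_accuracy(fit, train_set)
             for name, fit in candidates.items()}

# Choose the model with the best cross-validated accuracy, refit, then test.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name](train_set)
test_accuracy = accuracy(best_model, test_set)
test_recall = per_class_recall(best_model, test_set)

print(cv_scores, best_name, test_accuracy, test_recall)
```

Keeping the test examples out of the cross-validation step is what lets the final per-class figures serve as an estimate of performance on new roads; a per-class breakdown such as `test_recall` is also where a weakness on one class, like the "fair" roads mentioned above, would become visible.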
Description
Applied project submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, April 2018
Type
Applied Project
Keywords
algorithm, road quality data