Analysis of credit card fraud detection methods
Technological advancements in finance have brought about many ways to buy and sell over the internet. With the rapid spread of digitization worldwide, more people rely on the internet as their main medium for conducting transactions. Credit cards are one of the main modes of payment for these transactions, and fraudulent use of these cards has caused considerable harm to many institutions in the online marketplace. Unfortunately, detecting such fraud is not straightforward, because it involves two main issues: (1) credit card fraud data are highly imbalanced, and (2) the profiles of fraudulent and genuine users change over time. This project tackles the data imbalance problem by combining resampling techniques with three different classification algorithms to detect instances of credit card fraud. The dataset used contained 284,315 genuine transactions and 492 fraudulent transactions, so fraud made up only about 0.17% of the data. Such data can bias classification algorithms towards the majority class (genuine transactions), since many of these algorithms assume a roughly equal number of examples for each class. Hence, different resampling techniques were applied to the data before Logistic Regression, Naive Bayes and K-Nearest Neighbor classifiers were used to predict whether a transaction was fraudulent. The K-Nearest Neighbor classifier achieved the best performance in terms of F1 score and area under the precision-recall curve (PR AUC) when combined with the Neighborhood Cleaning Rule for undersampling, with values of 82.5% and 81% respectively.
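The abstract does not include code, so what follows is a minimal pure-Python sketch, not the thesis's implementation, of two ideas it relies on. First, a simplified two-class version of the Neighborhood Cleaning Rule with k = 3: a majority-class (genuine) example that is misclassified by its three nearest neighbours is removed, and when a minority-class (fraud) example is misclassified, its majority-class neighbours are removed instead. Libraries such as imbalanced-learn provide a fuller `NeighbourhoodCleaningRule`, which differs in details. Second, an F1 score on the fraud class, applied to the class counts reported above to show why plain accuracy is misleading on such data. The helper names and the toy points are hypothetical.

```python
import math
from collections import Counter


def nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i] by Euclidean distance, excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))  # stable sort: ties keep index order
    return others[:k]


def vote(y, idx):
    """Majority label among the examples at the given indices (a k-nearest-neighbour vote)."""
    return Counter(y[j] for j in idx).most_common(1)[0][0]


def neighbourhood_cleaning_rule(X, y, minority=1, k=3):
    """Simplified two-class Neighborhood Cleaning Rule (hypothetical helper); returns indices to keep."""
    to_remove = set()
    for i in range(len(X)):
        nn = nearest(X, i, k)  # neighbours are always found in the full, uncleaned data
        if vote(y, nn) == y[i]:
            continue  # the example agrees with its neighbours, so nothing is removed
        if y[i] != minority:
            # Majority example misclassified by its neighbours: drop it
            to_remove.add(i)
        else:
            # Minority example misclassified: drop the majority examples around it
            to_remove.update(j for j in nn if y[j] != minority)
    return [i for i in range(len(X)) if i not in to_remove]


def f1_score(y_true, y_pred, positive=1):
    """F1 score for the positive (fraud) class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: a 3x3 grid of genuine points (label 0), one genuine point at (5, 5)
# inside a small fraud cluster (label 1), and one stray fraud point next to the grid.
X = [(gx, gy) for gx in range(3) for gy in range(3)] + [(5, 5), (5, 6), (6, 5), (6, 6), (2.6, 1.3)]
y = [0] * 10 + [1] * 4

keep = neighbourhood_cleaning_rule(X, y)
print(keep)
print(Counter(y[i] for i in keep))

# Accuracy paradox with the class counts from the dataset: always predict "genuine"
y_true = [0] * 284_315 + [1] * 492
y_pred = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 4), f1_score(y_true, y_pred))
```

On the toy data, the rule removes the genuine point sitting inside the fraud cluster and the three genuine points nearest the stray fraud point, leaving 6 genuine and 4 fraudulent examples. The all-genuine baseline reaches about 99.83% accuracy yet has an F1 score of 0 on the fraud class, which is why fraud-focused metrics such as F1 and PR AUC are used for evaluation instead of accuracy.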
Undergraduate thesis submitted to the Department of Computer Science, Ashesi University, in partial fulfillment of the Bachelor of Science degree in Computer Science, May 2021
credit card fraud, fraud data, data resampling