Random Forests & Credit Card Fraud

Research paper investigating the effectiveness of random forests in detecting credit card fraud.

This group project explored how effectively Random Forest classifiers can detect credit card fraud. In this domain, fraudulent transactions make up less than 1% of the data, and this class imbalance makes detection extremely difficult.

We investigated:

  • The impact of data balancing methods such as undersampling, oversampling and SMOTE on model performance.
  • The role of model refinement techniques like boosting (XGBoost) and bagging and…
  • How feature selection and hyperparameter tuning affect accuracy and recall.
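
As a rough illustration of the first item, random undersampling can be sketched in plain Python. This is a minimal sketch, not the project's actual pipeline: the function name `undersample`, the `ratio` parameter (majority rows kept per minority row) and the toy data are all assumptions made for the example.

```python
import random


def undersample(rows, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class rows (label 0) so that at most
    `ratio` majority rows remain per minority-class row (label 1).

    `ratio` is a hypothetical knob for this sketch; a value above 1
    keeps more legitimate transactions than a fully balanced set.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more majority rows than actually exist.
    keep_n = min(len(majority), int(ratio * len(minority)))
    # Keep every minority row plus a random subset of majority rows,
    # preserving the original row order.
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (not from the dataset): 1,000 transactions, 1% fraud.
labels = [1 if i % 100 == 0 else 0 for i in range(1000)]
rows = [[float(i)] for i in range(1000)]

X_bal, y_bal = undersample(rows, labels, ratio=3.0)
print(len(y_bal), sum(y_bal))  # 40 rows kept, 10 of them fraud
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.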

Using the IEEE-CIS Fraud Detection dataset (590k+ transactions), we found that:

  • The baseline Random Forest achieved high accuracy (98.7%) but poor recall (0.48), missing many fraud cases.
  • Limited undersampling and feature selection improved recall to 0.65, achieving the best F1-score of 0.77.
  • Boosting methods like XGBoost provided additional gains in recall and robustness.
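
The gap between accuracy and recall reported above follows from how the metrics are defined. The sketch below computes them from confusion-matrix counts; the counts are made up for illustration and are not the project's results. It also rearranges the F1 formula to show the precision implied by a given F1 and recall, assuming both figures describe the fraud class of the same model and are rounded.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denom


# Hypothetical counts: 10,000 transactions, 100 of them fraud (1%).
# The model catches 48 frauds, misses 52, and raises 10 false alarms.
acc, prec, rec, f1 = classification_metrics(tp=48, fp=10, fn=52, tn=9890)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.2f} f1={f1:.4f}")

# Precision implied by the rounded figures F1 = 0.77 and recall = 0.65.
print(f"implied precision={implied_precision(0.77, 0.65):.2f}")
```

With so few positive cases, the true negatives dominate accuracy, so a model can miss about half of all fraud and still score above 99% on this toy example. Recall and F1 are therefore the more informative measures here.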