The dataset was taken from Kaggle. This Credit Card Fraud Detection dataset contains credit card transactions made in September 2013 by European cardholders. It covers only two days of transactions and is highly imbalanced: it contains 492 frauds out of 284,807 transactions, so fraud accounts for roughly 0.17% of all transactions.
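
The imbalance can be checked with simple arithmetic. The following is a minimal sketch that uses only the two counts quoted above; the variable names are illustrative and not part of the dataset.

```python
# Counts quoted in the dataset description.
fraud_count = 492
total_transactions = 284_807

# Share of fraudulent vs. legitimate transactions, as percentages.
fraud_share_pct = 100 * fraud_count / total_transactions
legit_share_pct = 100 - fraud_share_pct

print(f"Fraud:      {fraud_share_pct:.3f}% of transactions")
print(f"Legitimate: {legit_share_pct:.3f}% of transactions")
```

With fewer than two fraud cases per thousand transactions, the positive class is rare enough that the choice of evaluation metric matters a great deal.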

Photo by Avery Evans on Unsplash

Due to confidentiality issues, the original features and their background information are not disclosed. A PCA transformation was performed on the input variables. …

Natural Language Processing (NLP) is the subfield of computer science concerned with making computer systems understand human language as humans naturally speak and type it. It applies machine learning algorithms to text and speech.

For example, we can use NLP to build systems for speech recognition, document summarization, machine translation, spam detection, named entity recognition, question answering, autocomplete, predictive typing, and so on.

Photo by Andy Kelly on Unsplash


Tokenization is the process of converting a sequence of characters into a sequence of tokens. Take, for example, the sentence “This is a sample example for tokenization”. Here, each word would be a token.
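
Word-level tokenization of that sentence can be sketched in a few lines. This is only an illustration using Python's standard `re` module; the `tokenize` helper is a hypothetical name, and real NLP libraries apply more elaborate rules.

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    # \w+ matches runs of letters, digits, and underscores.
    return re.findall(r"\w+", text)

sentence = "This is a sample example for tokenization"
tokens = tokenize(sentence)
print(tokens)
```

Each of the seven words in the sentence becomes one token.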

Sometimes tokenization…

Fraud detection is a vital part of banking today, since finance is a major sector in our lives. In this article, we will look at powerful classification models and even use neural networks at the end.

Photo by rupixen.com on Unsplash

Before going further, I highly recommend checking out the previous parts of this case study to better understand the scenario:

Part 1: Credit Card Fraud Detection: In-Depth Study: Data Preparation

Part 2: Credit Card Fraud Detection: In-Depth Study: Evaluating the Classification Model

Part 3: Credit Card Fraud Detection: Logistic Regression

Let's assume from Part 3 that the logistic model would be…

Logistic Regression:

It is commonly used to estimate the probability that an instance belongs to a particular class.

For example, what is the probability that a transaction is fraudulent?

If the estimated probability is more than 50%, the model predicts that the instance belongs to that class; otherwise, it predicts that it does not.

This makes it a binary classifier.
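
The 50% decision rule can be sketched as follows. The `classify` helper and the sample probabilities are made up for illustration; they do not come from the fitted model.

```python
def classify(probability, threshold=0.5):
    """Return 1 (fraud) if the probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0

# Hypothetical estimated fraud probabilities for four transactions.
probabilities = [0.03, 0.49, 0.51, 0.97]
labels = [classify(p) for p in probabilities]
print(labels)
```

The threshold is left as a parameter because, on imbalanced data such as fraud detection, a value other than 0.5 is sometimes chosen to trade precision against recall.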

How are the probabilities calculated?

Just like Linear Regression, the Logistic Regression model computes a weighted sum of the input features plus a bias term, but instead of outputting the result directly, it passes the result through a logistic function.
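
As a rough sketch of that computation, the snippet below takes a weighted sum of the features plus a bias and passes it through the logistic (sigmoid) function. The weights, bias, and feature values are invented for illustration and are not taken from any trained model.

```python
import math

def sigmoid(t):
    """Logistic function 1 / (1 + e^-t), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def predict_proba(features, weights, bias):
    """Weighted sum of the features plus bias, squashed into (0, 1)."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical parameters and one hypothetical instance.
weights = [0.8, -1.2]
bias = -0.5
features = [2.0, 0.5]

probability = predict_proba(features, weights, bias)
print(f"Estimated probability: {probability:.4f}")
```

Whatever the value of the weighted sum, the output always lies between 0 and 1, which is what allows it to be read as a probability.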

Source: Hands-on Machine Learning with Scikit-Learn

hθ = Hypothesis Function using model…

A major part of building an effective model is evaluating it. The most frequently used metric is accuracy. However, high accuracy doesn't mean that the model performs well in every situation: it can be misleading, for example on datasets with imbalanced classes.
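
To see how accuracy can mislead, consider a hypothetical model that labels every transaction as legitimate. The sketch below assumes 492 fraud cases among 284,807 transactions, the counts quoted for the Kaggle credit card dataset.

```python
# Labels: 1 = fraud, 0 = legitimate.
n_total = 284_807
n_fraud = 492
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A naive "model" that never predicts fraud.
y_pred = [0] * n_total

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / n_total
fraud_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"Accuracy: {accuracy:.4%}")
print(f"Fraud cases caught: {fraud_caught} of {n_fraud}")
```

The accuracy is above 99.8% even though not a single fraud case is detected, which is why the other metrics are needed.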


As the analysis is focused on credit card fraud detection, we will evaluate the performance of the model based on the few metrics listed below:

1. Confusion Matrix

2. Accuracy, Recall, Precision and F1-Score

3. ROC Curve and AUC
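
The first two items in the list can be sketched in plain Python. The labels below are a small made-up example, not results from the fraud model, and ROC/AUC is left out because it requires predicted scores rather than hard labels.

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))

# Confusion matrix cells.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, left alone

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # of the flagged cases, how many were fraud
recall = tp / (tp + fn)     # of the fraud cases, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

This sketch assumes at least one predicted and one actual positive; otherwise precision or recall would divide by zero and would need a guard.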

Confusion Matrix

A confusion matrix provides more insightful details on the performance…

Nikhil Thapa

Data Science Enthusiast. I love developing data products and solving challenging real-world problems using data.
