The dataset was taken from Kaggle. This Credit Card Fraud Detection dataset contains credit card transactions made in September 2013 by European cardholders. It covers only two days of transactions and is highly imbalanced: it contains 492 frauds out of 284,807 transactions, so fraud accounts for about 0.17% of all transactions.
Due to confidentiality issues, the original features and background information are not disclosed. A PCA transformation was performed on the input variables. …
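As a quick check of the imbalance figure, the arithmetic can be sketched in a few lines of Python. The two counts are the ones quoted in the dataset description; the variable names are only illustrative.

```python
# Counts quoted in the dataset description.
n_fraud = 492
n_total = 284_807

# Fraction of all transactions that are fraudulent.
fraud_share = n_fraud / n_total

# How many legitimate transactions there are for every fraudulent one.
legit_per_fraud = (n_total - n_fraud) / n_fraud

print(f"Fraud share: {fraud_share:.2%}")
print(f"Legitimate transactions per fraud: {legit_per_fraud:.0f}")
```

In other words, there are several hundred legitimate transactions for every fraudulent one, which is why the choice of evaluation metric matters so much for this dataset.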
Natural Language Processing (NLP) is the subfield of computer science concerned with enabling computer systems to understand human language as people naturally speak and type it. It is used to apply machine learning algorithms to text and speech.
For example, we can use NLP to create systems like speech recognition, document summarization, machine translation, spam detection, named entity recognition, question answering, autocomplete, predictive typing, and so on.
Tokenization is the process of converting a sequence of characters into a sequence of tokens. For example, in the sentence “This is a sample example for tokenization”, each word would be a token.
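A minimal sketch of word-level tokenization on that sentence, using only Python's standard library. The `tokenize` helper and its regular expression are illustrative assumptions; real NLP libraries use more elaborate rules for punctuation, contractions, and so on.

```python
import re

def tokenize(text):
    """Split text into word tokens (runs of letters, digits, or apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)

tokens = tokenize("This is a sample example for tokenization")
print(tokens)
```

Here every word becomes one token, so the seven-word sentence yields seven tokens.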
Fraud detection is a vital part of banking today, as finance plays a major role in our lives. In this article, we will look at a powerful classification model and even use Neural Networks at the end.
Before going further, I highly recommend checking out the previous parts of this case study to better understand the scenario.
Let's assume, from Part 3, that the Logistic model would be…
Logistic Regression is commonly used to estimate the probability that an instance belongs to a particular class.
For example, it answers the question: what is the probability that a transaction is fraudulent?
If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class; otherwise, it predicts that it does not.
It is basically a binary classifier.
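The decision rule can be sketched as a simple threshold on the estimated probability. The `predict_class` helper and the sample probabilities are hypothetical and serve only to illustrate the 50% cut-off.

```python
def predict_class(probability, threshold=0.5):
    """Return 1 (fraud) if the probability exceeds the threshold, else 0 (not fraud)."""
    return 1 if probability > threshold else 0

# Hypothetical fraud probabilities for four transactions.
probabilities = [0.02, 0.49, 0.51, 0.97]
predictions = [predict_class(p) for p in probabilities]
print(predictions)
```

The threshold is left as a parameter because, in fraud detection, one may prefer a lower cut-off to catch more fraud at the cost of more false alarms.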
How are the probabilities calculated?
Just like Linear Regression, the Logistic Regression model computes a weighted sum of the input features plus a bias term, but instead of outputting this result directly, it passes it through the logistic (sigmoid) function, which maps it to a value between 0 and 1.
hθ = Hypothesis Function using model…
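To make the computation concrete, here is a minimal sketch of the probability estimate: a weighted sum of the features plus a bias, passed through the logistic function. The weights, bias, and feature values are made-up numbers for illustration, not parameters learned from the fraud dataset.

```python
import math

def sigmoid(t):
    """Logistic function: maps any real number to a value between 0 and 1."""
    # Not hardened against overflow for very large negative t.
    return 1.0 / (1.0 + math.exp(-t))

def predict_proba(features, weights, bias):
    """Estimated probability of the positive class for one instance."""
    # Weighted sum of the inputs plus the bias, as in Linear Regression.
    score = bias + sum(w * x for w, x in zip(weights, features))
    # Squash the score into a probability.
    return sigmoid(score)

# Hypothetical parameters and a single hypothetical transaction.
weights = [0.8, -1.2]
bias = -0.5
features = [2.0, 0.5]

probability = predict_proba(features, weights, bias)
print(f"Estimated fraud probability: {probability:.3f}")
```

A score of 0 maps to a probability of exactly 0.5, so the sign of the weighted sum decides which side of the 50% threshold the instance falls on.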
A major part of building an effective model is evaluating it. The most frequently used metric is ‘Accuracy’, but high accuracy does not mean that the model performs well in every situation. Accuracy can be misleading, for example on datasets with imbalanced classes.
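To see why accuracy can mislead here, consider a hypothetical "model" that labels every transaction as not fraud. The sketch below uses the class counts from the dataset description (492 frauds out of 284,807 transactions). The baseline itself is an assumption made purely for illustration.

```python
# Class counts from the dataset description.
n_total = 284_807
n_fraud = 492

# A trivial baseline that predicts "not fraud" for every transaction
# gets all legitimate transactions right and all frauds wrong.
true_negatives = n_total - n_fraud
true_positives = 0

accuracy = (true_negatives + true_positives) / n_total
recall = true_positives / n_fraud  # share of frauds actually caught

print(f"Accuracy: {accuracy:.2%}")
print(f"Recall:   {recall:.2%}")
```

The baseline scores well above 99% accuracy while catching no fraud at all, which is why accuracy alone is not enough for this problem.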
As the analysis is focused on credit card fraud detection, we will evaluate the performance of the model based on the few metrics listed below:
1. Confusion Matrix
2. Accuracy, Recall, Precision and F1-Score
3. ROC Curve and AUC
Confusion Matrix provides more insightful details on the performance…
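As a rough illustration of how the confusion matrix feeds the other metrics, the sketch below counts the four outcomes and derives accuracy, precision, recall, and F1-score from them. The labels and predictions are a tiny made-up sample (1 = fraud, 0 = not fraud), and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

# Made-up ground truth and predictions for ten transactions.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)

accuracy = safe_div(tp + tn, tp + tn + fp + fn)
precision = safe_div(tp, tp + fp)   # of the flagged transactions, how many were fraud
recall = safe_div(tp, tp + fn)      # of the actual frauds, how many were flagged
f1 = safe_div(2 * precision * recall, precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this sample the model is right 70% of the time overall but catches only half of the frauds, a gap that the confusion matrix makes visible and accuracy hides.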
Data Science Enthusiast. I love developing data products and solving challenging real world problems using data