rawoolsiddhi/fraud_transaction_detection

🕵️ Fraud Transaction Detection

Live app: https://fraudtransactiondetection-uee8kzouwxas9pc3rve5mc.streamlit.app/

This is an end-to-end machine learning project to detect fraudulent financial transactions using a stacking ensemble model (Random Forest + XGBoost). It includes preprocessing, modeling, evaluation, prediction, and a user-friendly Streamlit web app.

Key Features

  • Real-time transaction risk evaluation
  • Multi-model approach for improved accuracy
  • Comprehensive feature engineering pipeline
  • Scalable architecture for production deployment
  • Detailed performance metrics and monitoring

Architecture Overview

The system employs a modular design with clear separation of concerns, ensuring maintainability and scalability.

flowchart TD
    subgraph Input["Input Layer"]
        A[Transaction Data]
    end

    subgraph Processing["Processing Pipeline"]
        B[Feature Engineering]
        C[Model Selection]
        D[Prediction Engine]
    end

    subgraph Models["ML Models"]
        E[XGBoost Model]
        F[KNN Model]
        G[SVM Model]
    end

    subgraph Output["Results"]
        H[Fraud Score]
        I[Alert System]
    end

    A --> B
    B --> C
    C --> D
    E --> D
    F --> D
    G --> D
    D --> H
    H --> I

    classDef input fill:#93c47d,stroke:#6aa84f,color:#000
    classDef process fill:#6fa8dc,stroke:#3d85c6,color:#000
    classDef model fill:#e06666,stroke:#cc0000,color:#000
    classDef result fill:#ffd966,stroke:#bf9000,color:#000

    class A input
    class B,C,D process
    class E,F,G model
    class H,I result

The architecture diagram above illustrates the system's four main layers:

  • Green: Input layer handling raw transaction data
  • Blue: Processing pipeline managing feature engineering and model selection
  • Red: ML models (XGBoost, KNN, SVM) contributing to predictions
  • Yellow: Results layer generating fraud scores and triggering alerts

Each transaction flows through the feature engineering pipeline before being evaluated by multiple machine learning models. The Prediction Engine combines outputs from all models to generate a final fraud score, which triggers alerts based on configurable thresholds.
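The combination step described above can be sketched as follows. This is a minimal illustration, not the repository's Prediction Engine: the weighted-average rule, the function names `combine_scores` and `evaluate`, and the default threshold of 0.5 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScoredTransaction:
    fraud_score: float  # combined probability in [0, 1]
    alert: bool         # True when the score reaches the threshold


def combine_scores(model_probs: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-model fraud probabilities (assumed combination rule)."""
    if not model_probs:
        raise ValueError("at least one model probability is required")
    if weights is None:
        # Equal weights unless the caller supplies their own.
        weights = {name: 1.0 for name in model_probs}
    total = sum(weights[name] for name in model_probs)
    return sum(prob * weights[name] for name, prob in model_probs.items()) / total


def evaluate(model_probs: Dict[str, float],
             threshold: float = 0.5,
             weights: Optional[Dict[str, float]] = None) -> ScoredTransaction:
    """Combine model outputs and flag the transaction if the score crosses the threshold."""
    score = combine_scores(model_probs, weights)
    return ScoredTransaction(fraud_score=score, alert=score >= threshold)


# Hypothetical per-model probabilities for a single transaction.
result = evaluate({"xgboost": 0.9, "knn": 0.6, "svm": 0.3}, threshold=0.5)
```

Raising `threshold` trades recall for precision, which is why it is exposed as a parameter rather than hard-coded.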

Technical Implementation

Core Components

fraud_transaction_detection/
├── src/
│   ├── features/
│   │   └── feature_engineering.py
│   ├── models/
│   │   ├── xgboost_model.py
│   │   ├── knn_model.py
│   │   └── svm_model.py
│   ├── pipeline/
│   │   └── prediction_pipeline.py
│   └── utils/
│       └── metrics.py
└── notebooks/
    └── analysis.ipynb

Model Comparison

| Model | Balanced Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| XGBoost | 0.881 ± 0.017 | 0.963 ± 0.007 | 0.763 ± 0.035 | 0.851 ± 0.023 |
| KNN | 0.705 ± 0.037 | 0.942 ± 0.022 | 0.409 ± 0.074 | 0.568 ± 0.073 |
| SVM | 0.595 ± 0.013 | 1.000 ± 0.000 | 0.190 ± 0.026 | 0.319 ± 0.037 |

Performance

  • XGBoost: Accuracy ~96%, Precision ~92%, AUC ~0.76
  • Model validated using 5‑fold cross‑validation
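As a rough sketch of what 5-fold cross-validation involves, the snippet below assigns row indices to folds and yields train/test splits. Stratifying by the fraud label is an assumption here (the README only says "5-fold"), chosen because fraud is rare in this dataset; the function names are hypothetical and only the standard library is used.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign row indices to k folds so each fold keeps roughly the same class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the rows of one class, then deal them out round-robin.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_splits(labels: Sequence[int], k: int = 5,
                      seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Toy labels: 10 fraudulent rows among 1,000 (made-up data for illustration).
labels = [1] * 10 + [0] * 990
splits = list(train_test_splits(labels, k=5))
```

Each model would be fitted on `train` and scored on `test` once per split, and the per-fold scores then summarized.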

Performance Metrics

| Metric | Value | Notes |
|---|---|---|
| Balanced Accuracy | 0.881 ± 0.017 | Based on cross-validation |
| Precision | 0.963 ± 0.007 | On unseen data |
| Recall | 0.763 ± 0.035 | On unseen data |
| F1 Score | 0.851 ± 0.023 | Overall performance measure |
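For reference, the metrics above follow the standard definitions sketched below. These helpers are illustrative and are not taken from the repository's `metrics.py`. As a sanity check, the F1 score recomputed from the reported XGBoost precision and recall agrees with the table value.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged transactions that are actually fraudulent."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of fraudulent transactions that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Share of legitimate transactions that were left unflagged."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of recall and specificity; robust to heavy class imbalance."""
    return (recall(tp, fn) + specificity(tn, fp)) / 2


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# XGBoost values from the table: precision 0.963, recall 0.763.
print(round(f1_score(0.963, 0.763), 3))  # 0.851
```

Balanced accuracy is the more informative headline figure here because, with roughly 0.8% fraud, plain accuracy stays high even for a model that flags nothing.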

The data structure is shown in the following diagram:

erDiagram
    TRANSACTION }o--|| CUSTOMER : "belongs to"
    TRANSACTION }o--|| TERMINAL : "processed by"
    
    CUSTOMER {
        string CUSTOMER_ID
    }
    
    TERMINAL {
        string TERMINAL_ID
    }
    
    TRANSACTION {
        string TRANSACTION_ID
        float TX_AMOUNT
        int TX_TIME_SECONDS
        int TX_TIME_DAYS
        boolean TX_FRAUD
        string TX_FRAUD_SCENARIO
        int TX_HOUR
        int TX_DAY_OF_WEEK
        int TX_WEEK_OF_YEAR
        boolean TX_IS_WEEKEND
    }

The diagram shows how the transaction data is organized:

  • Each transaction belongs to exactly one customer and is processed by exactly one terminal
  • Fields are grouped into logical categories:
    • CUSTOMER: Customer identification
    • TERMINAL: Terminal identification
    • TRANSACTION: Core transaction details and fraud information
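One way to mirror this structure in code is a typed record plus a grouping helper, sketched below. The lowercase attribute names and the `transactions_by_customer` helper are illustrative choices, not part of the repository.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    customer_id: str   # exactly one customer per transaction
    terminal_id: str   # exactly one terminal per transaction
    tx_amount: float
    tx_time_seconds: int
    tx_fraud: bool


def transactions_by_customer(txs: Iterable[Transaction]) -> Dict[str, List[Transaction]]:
    """Group transactions by customer; one customer can own many transactions."""
    grouped: Dict[str, List[Transaction]] = defaultdict(list)
    for tx in txs:
        grouped[tx.customer_id].append(tx)
    return dict(grouped)


# Made-up rows for illustration only.
sample = [
    Transaction("t1", "c1", "term9", 12.50, 60, False),
    Transaction("t2", "c1", "term3", 980.00, 120, True),
    Transaction("t3", "c2", "term9", 45.10, 180, False),
]
by_customer = transactions_by_customer(sample)
```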

Dataset

The project uses a simulated dataset (~1.8M transactions, ~0.8% fraud) generated using real-world distribution patterns.

The data attributes are summarized in the following table:

| Field Name | Purpose | Format | Use Case | Importance |
|---|---|---|---|---|
| TRANSACTION_ID | Unique identifier for each transaction | String | Transaction tracking and audit trails | Essential for fraud investigation |
| CUSTOMER_ID | Identifies customer account | String | Customer behavior analysis | Critical for pattern recognition |
| TERMINAL_ID | Identifies processing terminal | String | Terminal security monitoring | Key for location tracking |
| TX_AMOUNT | Transaction amount in currency | Float | Fraud detection algorithms | Primary fraud indicator |
| TX_TIME_SECONDS | Precise transaction timestamp | Integer | Micro-level pattern analysis | Enables rapid fraud detection |
| TX_TIME_DAYS | Transaction timestamp in days | Integer | Long-term pattern analysis | Supports trend analysis |
| TX_HOUR | Hour of day (0-23) | Integer | Peak fraud hour identification | Time-based risk assessment |
| TX_DAY_OF_WEEK | Day of week (1-7) | Integer | Weekly pattern analysis | Behavioral pattern recognition |
| TX_WEEK_OF_YEAR | Week number in year (1-52) | Integer | Seasonal pattern analysis | Long-term trend analysis |
| TX_IS_WEEKEND | Weekend indicator | Boolean | Weekend vs. weekday analysis | Risk profile adjustment |
| TX_FRAUD | Fraud status indicator | Boolean | Fraud detection training | Core target variable |
| TX_FRAUD_SCENARIO | Fraud type categorization | String | Fraud pattern analysis | Strategy development |
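The time-based fields in the table could be derived from `TX_TIME_SECONDS` along the lines below. This is a sketch under stated assumptions: that `TX_TIME_SECONDS` counts seconds from a dataset start date, that the start date is the hypothetical `START_DATE` used here, and that day of week runs from 1 (Monday) to 7 (Sunday). The repository's `feature_engineering.py` may do this differently.

```python
from datetime import datetime, timedelta
from typing import Dict, Union

# Hypothetical dataset start; the real start date is not stated in the README.
START_DATE = datetime(2024, 1, 1)

SECONDS_PER_DAY = 86_400


def derive_time_features(tx_time_seconds: int,
                         start: datetime = START_DATE) -> Dict[str, Union[int, bool]]:
    """Derive calendar features from a seconds-since-start offset."""
    ts = start + timedelta(seconds=tx_time_seconds)
    day_of_week = ts.isoweekday()  # 1 = Monday ... 7 = Sunday
    return {
        "TX_TIME_DAYS": tx_time_seconds // SECONDS_PER_DAY,
        "TX_HOUR": ts.hour,
        "TX_DAY_OF_WEEK": day_of_week,
        "TX_WEEK_OF_YEAR": ts.isocalendar()[1],  # ISO week number
        "TX_IS_WEEKEND": day_of_week >= 6,
    }


# Five days, 13 hours and 30 seconds after the assumed start date.
features = derive_time_features(5 * SECONDS_PER_DAY + 13 * 3600 + 30)
```

Note that ISO week numbering can yield week 53 in some years, so a pipeline relying on a strict 1-52 range would need to handle that case.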

Requirements

pip install -r requirements.txt

The Streamlit application architecture is shown in the following diagram:

flowchart TD
    subgraph Client["Client Side"]
        Browser["Web Browser"]
    end
    
    subgraph Streamlit["Streamlit Application"]
        App["Streamlit App"]
        Model["Fraud Detection Model"]
        Cache["Cached Data"]
    end
    
    subgraph Storage["Data Storage"]
        Pickle["Pickle Files"]
        Data["Transaction Data"]
    end
    
    Browser -->|"HTTP Request"| App
    App -->|"Load Model"| Model
    Model -->|"Request Data"| Cache
    Cache -->|"If Not Found"| Pickle
    Pickle -->|"Load Data"| Data
    Data -->|"Cache"| Cache
    Model -->|"Predictions"| App
    App -->|"Results"| Browser
    
    classDef client fill:#f9f9f9,stroke:#333,stroke-width:2px,color:#000000
    classDef app fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000000
    classDef storage fill:#fff3e0,stroke:#333,stroke-width:2px,color:#000000
    
    class Browser client
    class App,Model,Cache app
    class Pickle,Data storage
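The cache-then-pickle flow in the diagram can be sketched with the standard library alone. In a Streamlit app the `st.cache_resource` decorator is the usual tool for this job; whether the repository uses it is not stated, so the example below substitutes `functools.lru_cache` to stay dependency-free. The function name `load_model` and the file path passed to it are hypothetical.

```python
import pickle
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_model(path: str):
    """Unpickle the model on first use; later calls with the same path hit the cache."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.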

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Run tests: pytest src/tests/
  4. Submit pull request with documentation updates

Feel free to connect: https://linkedin.com/in/siddhi-rawool-783059248
