ML PRACTITIONER · COMPUTER VISION

Forged Signature Detection

Offline Signature Verification using Machine Learning

Role

Co-Developer & Researcher

Team

2 Members

Stack

Python · OpenCV · SVC · Random Forest · CNN

Project Overview

This project explored offline signature verification as a behavioral biometric problem, focusing on detecting forged signatures using static images rather than dynamic signing data.

The goal was to design, implement, and evaluate multiple classification approaches to understand how different models perform when distinguishing between genuine and forged handwritten signatures.

Problem Context

Handwritten signatures remain widely used for identity verification in legal, financial, and administrative processes. Manual verification is subjective and inconsistent, especially at scale.

Offline signature verification presents additional challenges because only static images are available, requiring models to rely on structural and textural features rather than dynamic motion data.

Data Sources

The project used two datasets so that results were not tied to a single source:

Personal Dataset: A custom handwritten signature dataset created for initial experimentation, consisting of genuine and forged samples.
CEDAR Dataset: A publicly available dataset containing genuine and forged Latin-script signatures from 55 individuals.

This combination allowed controlled testing followed by evaluation on a larger, standardized dataset.

Solution Design

Preprocessing

Signature images were preprocessed to make them consistent before feature extraction. Steps included grayscale conversion, Gaussian blurring to suppress noise, Otsu thresholding to binarize the image, and contour extraction to isolate the signature strokes and normalize the signature region.

Feature Extraction

For traditional machine learning models (SVC, Random Forest), structured features were extracted to capture both geometric structure and texture:

Bounding box dimensions and aspect ratio
Hu moments
Histogram of Oriented Gradients (HOG)
Local Binary Patterns (LBP)

Models Evaluated

Three classification approaches were implemented and compared:

Support Vector Classifier (SVC): Uses extracted features.
Random Forest: Uses extracted features.
Convolutional Neural Network (CNN): Feature learning handled directly by the network.
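The two feature-based classifiers could be set up with scikit-learn roughly as follows. This is a sketch under assumptions: the helper name `build_classical_models`, the RBF kernel, the feature scaling step, the number of trees, and the synthetic stand-in data are all illustrative and not taken from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_classical_models(random_state=0):
    """Return the two feature-based classifiers, untrained."""
    # SVC is sensitive to feature scale, so standardize inside a pipeline
    svc = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    # Tree ensembles do not need scaling
    forest = RandomForestClassifier(n_estimators=200, random_state=random_state)
    return {"svc": svc, "random_forest": forest}


# Synthetic stand-in for extracted features: label 0 = genuine, 1 = forged
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 20)), rng.normal(3.0, 1.0, (40, 20))])
y = np.array([0] * 40 + [1] * 40)

models = build_classical_models()
for name, model in models.items():
    model.fit(X, y)
    print(name, "training accuracy:", model.score(X, y))
```

Putting the scaler inside the pipeline means it is fitted only on training data whenever the pipeline is fitted, which avoids leaking test statistics into training.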

Model Evaluation & Results

Models were evaluated using an 80/20 train-test split.
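An 80/20 hold-out evaluation of this kind could be sketched as below. The helper name `evaluate_holdout`, the stratified split, the fixed random seed, and the synthetic data are assumptions; the source states only the 80/20 ratio.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size=0.2, random_state=0):
    """Fit on 80% of the data and return (accuracy, test set size)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y,
        test_size=test_size,
        stratify=y,  # assumption: keep the genuine/forged ratio in both parts
        random_state=random_state,
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, len(y_test)


# Synthetic stand-in for feature vectors: label 0 = genuine, 1 = forged
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [rng.normal(0.0, 1.0, (50, 20)), rng.normal(3.0, 1.0, (50, 20))]
)
y_demo = np.array([0] * 50 + [1] * 50)

acc, n_test = evaluate_holdout(
    RandomForestClassifier(n_estimators=100, random_state=0), X_demo, y_demo
)
print(f"accuracy on {n_test} held-out samples: {acc:.3f}")
```

For signature data, one might additionally keep each writer's samples on one side of the split (for example with scikit-learn's `GroupShuffleSplit`) to test generalization to unseen writers; the source does not say which protocol was used.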

SVC Accuracy: ~87.5%
Random Forest Accuracy: ~87.5%
CNN Accuracy: ~75%

The CNN's lower accuracy most likely reflects the limited dataset size, which constrains deep learning models in offline signature verification. The feature-based methods were less affected and reached higher accuracy on the same data.
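Because dataset size is central to this interpretation, it can help to see how much an accuracy figure can move on a small test set. The sketch below computes a Wilson score interval for a proportion. The counts used in the usage lines (7 of 8 and 70 of 80) are hypothetical examples that both equal 87.5%; the project's actual test set size is not reported here.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for an accuracy of correct/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical counts: the same 87.5% accuracy at two test set sizes
small = wilson_interval(7, 8)
large = wilson_interval(70, 80)
print("n=8: ", small)
print("n=80:", large)
```

The interval for the smaller test set is much wider, which is why accuracy differences between models measured on few samples should be read with caution.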

Key Insights

Given well-engineered features, the traditional ML models outperformed the CNN in these experiments.
The CNN would likely need a larger dataset to outperform the feature-based methods in this offline setting.
Structural handwriting features (shape, gradient, and texture descriptors) proved effective for distinguishing forged signatures.
Explainable feature-based approaches are valuable in fraud-sensitive applications.