ARSHPREET
KAUR
BA · DATA ANALYST

Mining Incident Funnel Analysis

Stack
Python, pandas, matplotlib, plotly, numpy

Project Overview

This project examined 8 years of safety incident records across 48 mines to answer one question: do incidents that enter the management pipeline ever actually get resolved - or do they stall somewhere in the system?

The analysis covers the full lifecycle of 2,129 validated incidents from 2017 to 2025, identifying where the pipeline breaks down, which sites carry the most unresolved risk, and what organisations should do about it.

Business Context

Large organisations - mining, government, healthcare, logistics - all run incident reporting systems. Reports get filed. The assumption is that something happens next: an investigation opens, a corrective action is assigned, the case closes.

But that assumption is rarely measured. Without tracking pipeline completion end to end, organisations have no way of knowing whether their incident management system is working or simply collecting reports.

Real incident data from large organisations is rarely publicly available - it is sensitive, proprietary, and subject to regulatory restrictions. Rather than work with a generic or oversimplified dataset, I used AI to synthesise a realistic dataset modelled on how mining incident management systems actually operate, including the data quality problems such systems typically have: missing values, duplicate records, and malformed source files.

The goal was not to analyse a specific company. It was to build and demonstrate an analytical methodology that could be applied directly to real organisational data.

Solution Design

I designed the analysis end to end - from broken source data through to a management-ready presentation and an interactive dashboard.

The pipeline involved five stages: data repair, cleaning and validation, funnel analysis, time and SLA analysis, and risk ranking. Each stage was built to be reusable and defensible - not just to produce a finding, but to produce one that could be questioned and still hold up under scrutiny.

How the Analysis Works

1. Data Repair

Four of the six source files had a CSV formatting bug - multi-word values were incorrectly quoted, causing columns to collapse on import. Rather than hardcoding values, I built a repair script using the header column count as the source of truth, and a targeted parser using the mine short code as a positional anchor. Clean files were saved and used throughout the rest of the analysis.
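The repair idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column layout, the sample rows, the function names `repair_line` and `repair_csv`, and the assumption that the mine short code sits immediately after the free-text column are all hypothetical. The sketch strips stray quotes, splits on commas, checks the result against the header's column count, and uses the mine code to find where a comma-containing text value ends.

```python
def repair_line(line, n_cols, text_col, mine_codes):
    """Return exactly n_cols fields, or None if the line cannot be repaired."""
    # Drop the stray quotes, then split on every comma.
    fields = [f.strip() for f in line.replace('"', "").split(",")]
    if len(fields) == n_cols:
        return fields
    surplus = len(fields) - n_cols
    if surplus > 0:
        # Assumption: the mine short code directly follows the free-text
        # column, so it should appear `surplus` positions later than usual.
        anchor = text_col + 1 + surplus
        if anchor < len(fields) and fields[anchor] in mine_codes:
            merged = ", ".join(fields[text_col:anchor])
            return fields[:text_col] + [merged] + fields[anchor:]
    return None


def repair_csv(raw_text, text_col, mine_codes):
    """Split raw CSV text into (header, repaired rows, rejected raw lines)."""
    lines = [ln for ln in raw_text.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split(",")]
    repaired, rejected = [], []
    for line in lines[1:]:
        fixed = repair_line(line, len(header), text_col, mine_codes)
        if fixed is None:
            rejected.append(line)  # keep the raw line for manual review
        else:
            repaired.append(fixed)
    return header, repaired, rejected


# Hypothetical sample: an unclosed quote before each multi-word description.
SAMPLE = """incident_id,description,mine_code,severity
INC-001,"Rock Fall,NTH,Fatal Risk
INC-002,"Slip, trip or fall,STH,Near Miss
INC-003,Broken row
"""

header, repaired, rejected = repair_csv(SAMPLE, text_col=1, mine_codes={"NTH", "STH"})
```

Rows that still do not match the header count are returned separately rather than guessed at, which keeps the repair auditable.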

2. Cleaning & Validation

The raw dataset contained 7,500 rows representing only 2,376 unique incidents. After deduplication, the 247 records that failed validation were quarantined rather than deleted, preserving the audit trail. The remaining working dataset of 2,129 incidents was enriched by joining four lookup tables covering mines, departments, hazards, and severity levels.
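In pandas, the deduplicate, quarantine, and enrich steps might look something like the sketch below. The column names, the two validity rules, and the single `mines` lookup table are assumptions made for illustration; the write-up does not list the actual rules or schema.

```python
import pandas as pd

# Hypothetical raw extract with duplicate rows and two kinds of bad record.
raw = pd.DataFrame({
    "incident_id": ["I1", "I1", "I2", "I3", "I3"],
    "mine_code": ["NTH", "NTH", "STH", "XXX", "XXX"],
    "reported_date": ["2021-03-01", "2021-03-01", None, "2021-05-10", "2021-05-10"],
})
mines = pd.DataFrame({
    "mine_code": ["NTH", "STH"],
    "mine_name": ["North Pit", "South Pit"],
})

# 1. One row per incident.
deduped = raw.drop_duplicates(subset="incident_id", keep="first")

# 2. Assumed validity rules: a reported date exists and the mine is known.
has_date = deduped["reported_date"].notna()
known_mine = deduped["mine_code"].isin(mines["mine_code"])
is_valid = has_date & known_mine

# 3. Quarantine invalid records with a reason instead of dropping them.
quarantine = deduped[~is_valid].copy()
quarantine["reason"] = has_date[~is_valid].map(
    {True: "unknown mine_code", False: "missing reported_date"}
)

# 4. Enrich the valid records; validate guards against duplicated lookup keys.
working = deduped[is_valid].merge(
    mines, on="mine_code", how="left", validate="many_to_one"
)
```

Because every deduplicated record ends up in either `working` or `quarantine`, the two row counts always sum back to the number of unique incidents, which makes the cleaning step easy to reconcile.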

3. Funnel Analysis

Six pipeline stages were defined - Reported, Logged, Investigation Started, Action Assigned, Action Completed, and Case Closed. A reusable build_funnel() function calculated completion rates and stage-to-stage drop-off for any slice of data, enabling analysis by severity, mine, department, and hazard category.
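The body of build_funnel() is not shown here, so the following is only one plausible shape for it. It assumes each stage is stored as a date column where a missing value means the incident never reached that stage; the column names and sample data are hypothetical.

```python
import pandas as pd

# Assumed column names for the six stages, in pipeline order.
STAGES = [
    "reported", "logged", "investigation_started",
    "action_assigned", "action_completed", "case_closed",
]


def build_funnel(df, stages=STAGES):
    """Count incidents reaching each stage and the drop-off between stages."""
    counts = df[stages].notna().sum()
    funnel = pd.DataFrame({"stage": stages, "count": counts.to_numpy()})
    # Completion relative to the first stage (NaN if the slice is empty).
    funnel["pct_of_reported"] = funnel["count"] / funnel["count"].iloc[0] * 100
    # Drop-off relative to the immediately preceding stage.
    prev = funnel["count"].shift(1)
    funnel["drop_off"] = (prev - funnel["count"]).fillna(0).astype(int)
    funnel["drop_off_pct"] = (funnel["drop_off"] / prev * 100).fillna(0.0)
    return funnel


# Hypothetical sample: four incidents at different points in the pipeline.
incidents = pd.DataFrame({
    "severity": ["Fatal Risk", "Near Miss", "Near Miss", "Near Miss"],
    "reported": ["2024-01-01"] * 4,
    "logged": ["2024-01-02"] * 4,
    "investigation_started": ["2024-01-03", "2024-01-05", None, None],
    "action_assigned": ["2024-01-04", "2024-01-09", None, None],
    "action_completed": ["2024-01-06", None, None, None],
    "case_closed": ["2024-01-07", None, None, None],
})

overall = build_funnel(incidents)
# The same function applied to any slice, here one funnel per severity level.
by_severity = {sev: build_funnel(g) for sev, g in incidents.groupby("severity")}
```

Keeping the function free of any grouping logic is what lets it be reused for mines, departments, or hazard categories: the caller slices the data and passes each slice in.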

4. Time & SLA Analysis

Elapsed days between each stage transition were calculated for every incident that reached both stages. SLA compliance was measured against regulatory targets by severity level - 3 days for Fatal Risk through to 30 days for Near Miss.
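A rough sketch of the elapsed-time and SLA calculation is shown below. Only the two targets stated above (3 days for Fatal Risk, 30 days for Near Miss) are included; targets for the other severity levels would need to be added. The column names are hypothetical, and the sketch assumes the SLA clock runs from reported to case closed, which the write-up does not specify.

```python
import pandas as pd

# Only the two targets stated in the text; other severities are omitted.
SLA_DAYS = {"Fatal Risk": 3, "Near Miss": 30}

# Hypothetical sample data.
incidents = pd.DataFrame({
    "severity": ["Fatal Risk", "Fatal Risk", "Near Miss", "Near Miss"],
    "reported": ["2023-01-01", "2023-01-10", "2023-02-01", "2023-02-01"],
    "case_closed": ["2023-01-03", "2023-01-20", "2023-02-15", None],
})
for col in ["reported", "case_closed"]:
    incidents[col] = pd.to_datetime(incidents[col])

# Elapsed days for one transition; missing end dates produce NaN.
incidents["days_to_close"] = (incidents["case_closed"] - incidents["reported"]).dt.days

# Keep only incidents that reached both stages, then compare to the target.
reached = incidents.dropna(subset=["days_to_close"]).copy()
reached["sla_target"] = reached["severity"].map(SLA_DAYS)
reached["within_sla"] = reached["days_to_close"] <= reached["sla_target"]

# Percentage of incidents within SLA, per severity level.
compliance = reached.groupby("severity")["within_sla"].mean() * 100
```

Note the consequence of restricting to incidents that reached both stages: an incident that never closes is excluded from the compliance figure rather than counted as a breach, so compliance should be read alongside the funnel completion rates.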

5. Risk Ranking

A composite risk score was calculated for each mine using four weighted factors: fatal risk event count, serious injury count, total incident volume, and unresolved rate. The ranking provides a prioritised, actionable list of sites requiring management attention.
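One way such a composite score could be built is sketched below. The weights, the min-max scaling, the column names, and the sample mines are all assumptions for illustration; the actual weights and normalisation used in the project are not stated.

```python
import pandas as pd

# Hypothetical weights for the four factors (chosen here to sum to 1).
WEIGHTS = {
    "fatal_risk_count": 0.4,
    "serious_injury_count": 0.3,
    "total_incidents": 0.1,
    "unresolved_rate": 0.2,
}

# Hypothetical per-mine summary table.
mine_stats = pd.DataFrame({
    "mine_code": ["NTH", "STH", "EST"],
    "fatal_risk_count": [4, 0, 2],
    "serious_injury_count": [10, 0, 5],
    "total_incidents": [100, 20, 60],
    "unresolved_rate": [0.9, 0.5, 0.7],
}).set_index("mine_code")


def risk_score(df, weights):
    """Weighted sum of min-max scaled factors, one score per mine."""
    factors = df[list(weights)]
    # Guard against division by zero when a factor is constant across mines.
    span = (factors.max() - factors.min()).replace(0, 1)
    scaled = (factors - factors.min()) / span
    # The weights Series aligns with the factor columns by name.
    return (scaled * pd.Series(weights)).sum(axis=1)


ranking = risk_score(mine_stats, WEIGHTS).sort_values(ascending=False)
```

Scaling each factor to the 0 to 1 range before weighting stops a large-magnitude factor such as total incident volume from dominating a rate expressed as a fraction.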

Key Findings

  • Only 1 in 6 incidents ever reaches Case Closed - a rate that has not improved across 8 years of data.
  • The largest single drop in the pipeline is between Logged and Investigation Started - 769 incidents stall here across the dataset. Nothing downstream can happen until this bottleneck is addressed.
  • Fatal Risk incidents close at 60%. Near Miss incidents close at 3%. The 1,100+ unresolved near misses represent the cheapest and most effective point of intervention the organisation is systematically skipping.
  • SLA compliance hovers at approximately 50% across all severity levels - indicating a systemic workflow delay, not an isolated failure.

Impact & Outcomes

  • Delivered a management-ready 8-slide presentation with findings, recommendations, owners, timelines, and success criteria
  • Built an interactive HTML dashboard for non-technical stakeholders to explore findings independently
  • Produced a fully documented Python notebook suitable for replication on real organisational data
  • Identified four prioritised recommendations with measurable success criteria - including a near miss closure target and a site audit framework for the five highest-risk mines