Machine Learning-based Electricity Theft Detection Considering Customer Consumption Pattern and Geographical Condition

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Electricity theft continues to pose significant challenges for power utilities, especially in developing countries, resulting in considerable non-technical losses and operational inefficiencies. This study presents a machine learning-based framework for electricity theft detection that integrates customer consumption pattern features, such as tariff category, contracted power, and service type, with geographic context, including transformer-level fraud rates and district-level poverty indices. The model is trained and tested on real-world data from PLN, derived from on-site inspections conducted between 2019 and 2023 through the Electricity Usage Enforcement Program and comprising over 6.7 million rows. Fifteen classifiers are compared under stratified 10-fold cross-validation on a 70 percent training split, with the remaining 30 percent of the data held out for final testing; synthetic oversampling is avoided to preserve the genuine data distribution. The top model, a Gradient Boosting Classifier, achieves an F1-score of 0.92 and an AUC of 0.85 on the holdout set, detecting 93% of all inspections, both theft and non-theft, at 92% precision. Feature-importance and confusion-matrix analyses confirm that the framework excels at minimizing false positives while surfacing the most informative risk indicators for targeted inspections. By relying solely on real inspection data and scalable preprocessing pipelines, this approach provides utilities with an intelligent, data-driven tool for proactive fraud prevention, optimized resource allocation, and significant reduction of non-technical losses.
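
The evaluation protocol described in the abstract (a stratified 70/30 holdout, stratified 10-fold cross-validation on the training portion, and precision/recall/F1 scoring) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `stratified_holdout`, `stratified_folds`, and `precision_recall_f1` are hypothetical, the labels are made-up toy data, and the class ratio is an arbitrary assumption. In practice a library such as scikit-learn (`train_test_split` with `stratify=`, `StratifiedKFold`) would likely be used instead.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.30, seed=0):
    """Split row indices into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        n_test = round(len(class_indices) * test_fraction)
        test.extend(class_indices[:n_test])
        train.extend(class_indices[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(indices, labels, k=10, seed=0):
    """Deal the given indices into k folds, class by class, round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for class_indices in by_class.values():
        rng.shuffle(class_indices)
        for position, idx in enumerate(class_indices):
            folds[position % k].append(idx)
    return folds


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (assumption): 1 = theft found at inspection, 0 = no theft.
labels = [1] * 200 + [0] * 800

# 70/30 stratified holdout, then 10 stratified folds on the training part only.
train_idx, test_idx = stratified_holdout(labels, test_fraction=0.30)
folds = stratified_folds(train_idx, labels, k=10)

for fold_number, fold in enumerate(folds, start=1):
    theft_share = sum(labels[i] for i in fold) / len(fold)
    print(f"fold {fold_number}: {len(fold)} rows, theft share {theft_share:.2f}")
```

Each fold would serve once as the validation set while a candidate classifier is fitted on the other nine; the 30 percent holdout stays untouched until the final comparison. Stratifying both steps keeps the theft share constant across splits, which matters when no oversampling is applied.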

Original language: English
Title of host publication: 2025 International Conference on Data Science and Its Applications, ICoDSA 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 534-539
Number of pages: 6
ISBN (Electronic): 9798331598549
Publication status: Published - 2025
Event: 8th International Conference on Data Science and Its Applications, ICoDSA 2025 - Hybrid, Jakarta, Indonesia
Duration: 3 Jul 2025 to 5 Jul 2025

Publication series

Name: 2025 International Conference on Data Science and Its Applications, ICoDSA 2025

Conference

Conference: 8th International Conference on Data Science and Its Applications, ICoDSA 2025
Country/Territory: Indonesia
City: Hybrid, Jakarta
Period: 3/07/25 to 5/07/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 10 - Reduced Inequalities

Keywords

  • behavioral features
  • electricity theft detection
  • geospatial analysis
  • gradient boosting
  • machine learning
