Baselight

SGCC Electricity Theft Detection

Imbalanced dataset for the classification problem of electricity theft detection

@kaggle.bensalem14_sgcc_dataset


About this Dataset

Overview

The State Grid Corporation of China (SGCC) dataset with 1000 records was used in the
model. It is a key resource in the field of power distribution and management, with a large and
varied set of data about electricity transport and grid operations. The data covers historical and
real-time energy use, grid infrastructure, the integration of green energy, and grid performance.
It helps make power distribution networks more reliable and efficient by supporting demand
prediction, grid monitoring, and fault detection. Researchers, energy providers, and lawmakers
can use it to study electricity usage trends, the health of the grid, and the integration of green
energy sources, which in turn helps the electric power industry develop data-driven strategies.

Description

The electricity theft detection dataset released by the State Grid Corporation of China (SGCC), data set.csv, contains 1037 columns and 42,372 rows of electricity consumption from 1 January 2014 to 30 October 2016. The first column is an alphanumeric consumer ID. Columns 2 through 1036 give the daily electricity consumption. The last column, named flag, holds the labels as 0 and 1 values. The small version of the dataset, datasetsmall.csv, contains only the electricity consumption for January 2014.

Features

  • 'MM/DD/YYYY': The electricity consumption on the given day.
  • CONS_NO: Consumer number, a customer ID of string type.
  • FLAG: 0 indicates no theft and 1 indicates theft.
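
As a minimal sketch of how these three kinds of columns might be separated when reading the CSV, the snippet below assumes the header uses `CONS_NO`, `FLAG`, and 'MM/DD/YYYY' date strings exactly as listed above. The sample rows and the `parse_value` and `split_row` helpers are hypothetical and only illustrate the layout.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample in the layout described above; not real SGCC data.
SAMPLE = """CONS_NO,01/02/2014,01/01/2014,FLAG
A1,2.5,1.0,0
B2,,3.0,1
"""

def parse_value(text):
    """Return a float, or None when the cell is empty or reads 'NaN'."""
    text = text.strip()
    if text == "" or text.lower() == "nan":
        return None
    return float(text)

def split_row(row):
    """Split one CSV row into (consumer id, label, date-sorted readings)."""
    readings = []
    for key, value in row.items():
        if key in ("CONS_NO", "FLAG"):
            continue
        # Every remaining column name is assumed to be an 'MM/DD/YYYY' date.
        day = datetime.strptime(key, "%m/%d/%Y").date()
        readings.append((day, parse_value(value)))
    # Sort on the date only, so missing (None) readings are never compared.
    readings.sort(key=lambda pair: pair[0])
    return row["CONS_NO"], int(row["FLAG"]), readings

records = [split_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Each entry in `records` then holds the consumer ID, the 0/1 label, and the daily readings in chronological order, regardless of the column order in the file.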

Useful for

  • Binary Classification: The dataset is mainly intended for binary classification of electricity theft.
  • Imbalanced Datasets Processing: Useful for exploring class balancing methods.
  • Time Series Forecasting: Can be used to forecast electricity consumption on a given day.
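
One simple class balancing method is to weight each class inversely to its frequency. The sketch below is an illustration of that idea, not a method prescribed by the dataset. The `balanced_class_weights` helper and the toy label list are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy labels (assumption): nine "no theft" rows and one "theft" row.
labels = [0] * 9 + [1] * 1
weights = balanced_class_weights(labels)
```

The rarer class receives the larger weight, so a classifier that accepts per-class weights penalizes mistakes on theft cases more heavily.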

Notes

  • This dataset contains missing values.
  • This dataset has dates of the form "MM/DD/YYYY".
  • This dataset requires slight cleaning.
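
One possible cleaning step for the missing values is linear interpolation along each consumer's daily series. This is a sketch under the assumption that missing readings have already been loaded as `None`. The `fill_missing` helper is hypothetical, and other imputation strategies may suit the data better.

```python
def fill_missing(values):
    """Linearly interpolate None gaps; edge gaps take the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from
    filled = list(values)
    # Leading and trailing gaps copy the first and last known readings.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps are filled on a straight line between their neighbours.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

series = [None, 1.0, None, None, 4.0, None]
cleaned = fill_missing(series)
```

A series with no known readings at all is returned unchanged, so such rows can be dropped or handled separately.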

Tables

Data Set

@kaggle.bensalem14_sgcc_dataset.data_set
  • 80.66 MB
  • 42372 rows
  • 1036 columns

CREATE TABLE data_set (
  "n_1_1_2014" DOUBLE,
  "n_1_2_2014" DOUBLE,
  "n_1_3_2014" DOUBLE,
  "n_1_4_2014" DOUBLE,
  "n_1_5_2014" DOUBLE,
  "n_1_6_2014" DOUBLE,
  "n_1_7_2014" DOUBLE,
  "n_1_8_2014" DOUBLE,
  "n_1_9_2014" DOUBLE,
  "n_1_10_2014" DOUBLE,
  "n_1_11_2014" DOUBLE,
  "n_1_12_2014" DOUBLE,
  "n_1_13_2014" DOUBLE,
  "n_1_14_2014" DOUBLE,
  "n_1_15_2014" DOUBLE,
  "n_1_16_2014" DOUBLE,
  "n_1_17_2014" DOUBLE,
  "n_1_18_2014" DOUBLE,
  "n_1_19_2014" DOUBLE,
  "n_1_20_2014" DOUBLE,
  "n_1_21_2014" DOUBLE,
  "n_1_22_2014" DOUBLE,
  "n_1_23_2014" DOUBLE,
  "n_1_24_2014" DOUBLE,
  "n_1_25_2014" DOUBLE,
  "n_1_26_2014" DOUBLE,
  "n_1_27_2014" DOUBLE,
  "n_1_28_2014" DOUBLE,
  "n_1_29_2014" DOUBLE,
  "n_1_30_2014" DOUBLE,
  "n_1_31_2014" DOUBLE,
  "n_2_1_2014" DOUBLE,
  "n_2_2_2014" DOUBLE,
  "n_2_3_2014" DOUBLE,
  "n_2_4_2014" DOUBLE,
  "n_2_5_2014" DOUBLE,
  "n_2_6_2014" DOUBLE,
  "n_2_7_2014" DOUBLE,
  "n_2_8_2014" DOUBLE,
  "n_2_9_2014" DOUBLE,
  "n_2_10_2014" DOUBLE,
  "n_2_11_2014" DOUBLE,
  "n_2_12_2014" DOUBLE,
  "n_2_13_2014" DOUBLE,
  "n_2_14_2014" DOUBLE,
  "n_2_15_2014" DOUBLE,
  "n_2_16_2014" DOUBLE,
  "n_2_17_2014" DOUBLE,
  "n_2_18_2014" DOUBLE,
  "n_2_19_2014" DOUBLE,
  "n_2_20_2014" DOUBLE,
  "n_2_21_2014" DOUBLE,
  "n_2_22_2014" DOUBLE,
  "n_2_23_2014" DOUBLE,
  "n_2_24_2014" DOUBLE,
  "n_2_25_2014" DOUBLE,
  "n_2_26_2014" DOUBLE,
  "n_2_27_2014" DOUBLE,
  "n_2_28_2014" DOUBLE,
  "n_3_1_2014" DOUBLE,
  "n_3_2_2014" DOUBLE,
  "n_3_3_2014" DOUBLE,
  "n_3_4_2014" DOUBLE,
  "n_3_5_2014" DOUBLE,
  "n_3_6_2014" DOUBLE,
  "n_3_7_2014" DOUBLE,
  "n_3_8_2014" DOUBLE,
  "n_3_9_2014" DOUBLE,
  "n_3_10_2014" DOUBLE,
  "n_3_11_2014" DOUBLE,
  "n_3_12_2014" DOUBLE,
  "n_3_13_2014" DOUBLE,
  "n_3_14_2014" DOUBLE,
  "n_3_15_2014" DOUBLE,
  "n_3_16_2014" DOUBLE,
  "n_3_17_2014" DOUBLE,
  "n_3_18_2014" DOUBLE,
  "n_3_19_2014" DOUBLE,
  "n_3_20_2014" DOUBLE,
  "n_3_21_2014" DOUBLE,
  "n_3_22_2014" DOUBLE,
  "n_3_23_2014" DOUBLE,
  "n_3_24_2014" DOUBLE,
  "n_3_25_2014" DOUBLE,
  "n_3_26_2014" DOUBLE,
  "n_3_27_2014" DOUBLE,
  "n_3_28_2014" DOUBLE,
  "n_3_29_2014" DOUBLE,
  "n_3_30_2014" DOUBLE,
  "n_3_31_2014" DOUBLE,
  "n_4_1_2014" DOUBLE,
  "n_4_2_2014" DOUBLE,
  "n_4_3_2014" DOUBLE,
  "n_4_4_2014" DOUBLE,
  "n_4_5_2014" DOUBLE,
  "n_4_6_2014" DOUBLE,
  "n_4_7_2014" DOUBLE,
  "n_4_8_2014" DOUBLE,
  "n_4_9_2014" DOUBLE,
  "n_4_10_2014" DOUBLE
);

Datasetsmall

@kaggle.bensalem14_sgcc_dataset.datasetsmall
  • 2.01 MB
  • 25863 rows
  • 28 columns

CREATE TABLE datasetsmall (
  "n_01_01_2014" DOUBLE,
  "n_01_02_2014" DOUBLE,
  "n_01_03_2014" DOUBLE,
  "n_01_04_2014" DOUBLE,
  "n_01_05_2014" DOUBLE,
  "n_01_06_2014" DOUBLE,
  "n_01_07_2014" DOUBLE,
  "n_01_08_2014" DOUBLE,
  "n_01_09_2014" DOUBLE,
  "n_01_10_2014" DOUBLE,
  "n_01_11_2014" DOUBLE,
  "n_01_12_2014" DOUBLE,
  "n_1_13_2014" DOUBLE,
  "n_1_14_2014" DOUBLE,
  "n_1_15_2014" DOUBLE,
  "n_1_16_2014" DOUBLE,
  "n_1_17_2014" DOUBLE,
  "n_1_18_2014" DOUBLE,
  "n_1_19_2014" DOUBLE,
  "n_1_20_2014" DOUBLE,
  "n_1_21_2014" DOUBLE,
  "n_1_22_2014" DOUBLE,
  "n_1_23_2014" DOUBLE,
  "n_1_24_2014" DOUBLE,
  "n_1_25_2014" DOUBLE,
  "n_1_26_2014" DOUBLE,
  "cons_no" VARCHAR,
  "flag" BIGINT
);
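
The date columns in both tables follow an `n_<month>_<day>_<year>` naming pattern, sometimes zero-padded (`n_01_02_2014`) and sometimes not (`n_1_13_2014`). The month-day-year order is an assumption inferred from the "MM/DD/YYYY" note and from names such as `n_1_13_2014`. A hypothetical helper, `column_to_date`, could map these names back to calendar dates like this:

```python
from datetime import date

def column_to_date(name):
    """Convert 'n_1_13_2014' or 'n_01_02_2014' to a date; None for other columns."""
    parts = name.split("_")
    if len(parts) != 4 or parts[0] != "n":
        return None
    try:
        # int() accepts both zero-padded and unpadded fields.
        month, day, year = (int(p) for p in parts[1:])
        return date(year, month, day)
    except ValueError:
        return None

columns = ["n_01_02_2014", "n_1_13_2014", "cons_no", "flag"]
date_columns = {c: column_to_date(c) for c in columns if column_to_date(c)}
```

Columns that do not match the pattern, such as `cons_no` and `flag`, are left out of `date_columns`.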
