Retail Dataset

Massive Dataset for Retail Store Analysis

@kaggle.matteo2002_retail_dataset

About this Dataset

Dataset Description

This dataset contains 436,689 records of commercial transactions from a retail store/marketplace, recorded between December 2014 and December 2015. Each row represents the purchase of a single product within a specific invoice.

Main Columns:

  1. InvoiceID: Unique identifier for the invoice (categorical).
  2. CustomerID: Unique identifier for the customer (categorical).
  3. date: Date of the transaction (temporal), useful for time series and seasonality analysis.
  4. item: Product description (text/categorical).
  5. quantity: Quantity purchased for the individual product (numeric, discrete).
  6. price: Unit price of the product (numeric, continuous).
  7. type: Sales channel, e.g., online or supermarket (categorical).
  8. category: Product category, useful for grouping and pattern mining (categorical).
  9. total_quantity: Total quantity of products purchased in the same invoice (numeric, discrete).
  10. customer_type: Type of customer, e.g., private or wholesaler (categorical).
  11. product_id: Unique identifier for the product (categorical).

Potential Data Mining Applications:

  • Market Basket Analysis / Association Rules: Discover frequent combinations of products purchased together.
  • Customer Segmentation: Cluster customers based on purchase quantity, customer type, and product categories.
  • Forecasting: Predict future sales for products or categories.
  • Anomaly Detection: Detect unusual transactions in terms of quantity or price.
  • Sales Analysis: Identify trends, top-selling products, and seasonal patterns.
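
As a starting point for the market basket idea above, the sketch below counts how many invoices contain each unordered pair of products. The rows are made-up (invoiceid, product_id) pairs used only for illustration, and `pair_support` is a hypothetical helper name. Real input would come from the purchases table.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (invoiceid, product_id) rows, not taken from the dataset.
rows = [
    (1, 10), (1, 11), (1, 12),
    (2, 10), (2, 11),
    (3, 11), (3, 12),
]

def pair_support(rows):
    """Count the number of invoices that contain each unordered product pair."""
    baskets = defaultdict(set)
    for invoice_id, product_id in rows:
        baskets[invoice_id].add(product_id)  # a set ignores repeated lines
    counts = Counter()
    for items in baskets.values():
        # sorted() gives every pair a canonical (low, high) ordering
        counts.update(combinations(sorted(items), 2))
    return counts

pair_counts = pair_support(rows)
print(pair_counts.most_common())
```

Pair counts like these are only the first step of association rule mining. Dividing by the number of invoices gives support, and dedicated algorithms such as Apriori or FP-Growth scale better on the full table.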

General Characteristics:

  • Large dataset (436,689 rows) with a mix of online and in-store sales.
  • Combination of numeric and categorical variables, suitable for classification, clustering, and association rule mining.
  • Includes a temporal variable for sequential and predictive analysis.
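
To illustrate how the temporal variable might feed a seasonality or forecasting analysis, here is a minimal sketch that sums quantity per calendar month. The timestamps and quantities are invented, the `YYYY-MM-DD HH:MM:SS` string format is an assumption about how the date column is exported, and `monthly_quantity` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (date, quantity) rows; the timestamp format is an assumption.
rows = [
    ("2014-12-03 09:15:00", 2),
    ("2014-12-20 17:40:00", 5),
    ("2015-01-11 12:00:00", 1),
]

def monthly_quantity(rows):
    """Sum quantity per 'YYYY-MM' bucket, returned in chronological order."""
    totals = defaultdict(int)
    for timestamp, quantity in rows:
        month = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m")
        totals[month] += quantity
    return dict(sorted(totals.items()))

monthly = monthly_quantity(rows)
print(monthly)
```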

Tables

Customers

@kaggle.matteo2002_retail_dataset.customers
  • 50.49 kB
  • 8,237 rows
  • 2 columns
CREATE TABLE customers (
  "customerid" BIGINT,
  "customer_type" VARCHAR
);

Invoice Items

@kaggle.matteo2002_retail_dataset.invoice_items
  • 2.38 MB
  • 436,689 rows
  • 5 columns
CREATE TABLE invoice_items (
  "invoiceid" BIGINT,
  "product_id" BIGINT,
  "quantity" BIGINT,
  "price" DOUBLE,
  "line_total" DOUBLE
);
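
The page does not say how line_total is derived. Assuming it is meant to equal quantity × price, a quick consistency check could look like the sketch below. It loads a few invented rows into an in-memory SQLite database as a stand-in for the hosted table, and the 0.005 tolerance is an arbitrary choice to absorb floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same schema as shown above; SQLite is only a local stand-in engine.
conn.execute("""
CREATE TABLE invoice_items (
  "invoiceid" BIGINT,
  "product_id" BIGINT,
  "quantity" BIGINT,
  "price" DOUBLE,
  "line_total" DOUBLE
)
""")
# Hypothetical rows; the last one deliberately breaks the assumed rule.
conn.executemany(
    "INSERT INTO invoice_items VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 2, 1.5, 3.0), (1, 11, 1, 4.0, 4.0), (2, 10, 3, 1.5, 5.0)],
)

# Assumption under test: line_total == quantity * price (within tolerance).
mismatches = conn.execute("""
SELECT invoiceid, product_id
FROM invoice_items
WHERE ABS(line_total - quantity * price) > 0.005
""").fetchall()
print(mismatches)
```

If the real table returns many mismatches, the assumption is probably wrong (for example, discounts or taxes may be included) rather than the data being faulty.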

Products

@kaggle.matteo2002_retail_dataset.products
  • 99.76 kB
  • 4,033 rows
  • 4 columns
CREATE TABLE products (
  "product_id" BIGINT,
  "item" VARCHAR,
  "category" VARCHAR,
  "price" DOUBLE
);

Purchases

@kaggle.matteo2002_retail_dataset.purchases
  • 1.51 MB
  • 436,689 rows
  • 5 columns
CREATE TABLE purchases (
  "invoiceid" BIGINT,
  "date" TIMESTAMP,
  "customerid" BIGINT,
  "product_id" BIGINT,
  "quantity" BIGINT
);
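
The column list in the description reads like one flat table, while the data is split across the four tables above. The sketch below shows one way the flat view might be rebuilt by joining purchases to products and customers, with total_quantity derived as the per-invoice sum of quantity. All inserted rows are invented, the join keys are inferred from the matching column names, and SQLite stands in for the hosted engine. The sales channel column (type) is left out because none of the four schemas contains it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers ("customerid" BIGINT, "customer_type" VARCHAR);
CREATE TABLE products ("product_id" BIGINT, "item" VARCHAR,
                       "category" VARCHAR, "price" DOUBLE);
CREATE TABLE purchases ("invoiceid" BIGINT, "date" TIMESTAMP,
                        "customerid" BIGINT, "product_id" BIGINT,
                        "quantity" BIGINT);
-- Hypothetical sample rows, not taken from the dataset.
INSERT INTO customers VALUES (1, 'private'), (2, 'wholesaler');
INSERT INTO products VALUES (10, 'milk', 'dairy', 1.5),
                            (11, 'bread', 'bakery', 2.0);
INSERT INTO purchases VALUES
  (100, '2015-01-05 10:00:00', 1, 10, 2),
  (100, '2015-01-05 10:00:00', 1, 11, 1),
  (101, '2015-01-06 09:30:00', 2, 10, 12);
""")

flat_rows = conn.execute("""
SELECT p.invoiceid, p.customerid, p.date, pr.item, p.quantity, pr.price,
       pr.category, t.total_quantity, c.customer_type, p.product_id
FROM purchases AS p
JOIN products  AS pr ON pr.product_id = p.product_id
JOIN customers AS c  ON c.customerid = p.customerid
JOIN (SELECT invoiceid, SUM(quantity) AS total_quantity
      FROM purchases
      GROUP BY invoiceid) AS t ON t.invoiceid = p.invoiceid
ORDER BY p.invoiceid, p.product_id
""").fetchall()

for row in flat_rows:
    print(row)
```

Inner joins drop purchases whose customer or product is missing from the lookup tables. A LEFT JOIN would keep them if that matters for the analysis.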
