Dataset Description
This dataset contains 436,689 records of commercial transactions from a retail store or marketplace, recorded between December 2014 and December 2015. Each row represents the purchase of a single product within a specific invoice, so an invoice with several products spans several rows.
Main Columns:
- InvoiceID: Unique identifier for the invoice, shared by all rows that belong to the same invoice (categorical).
- CustomerID: Unique identifier for the customer (categorical).
- date: Date of the transaction (temporal), useful for time series and seasonality analysis.
- item: Product description (text/categorical).
- quantity: Quantity purchased for the individual product (numeric, discrete).
- price: Unit price of the product (numeric, continuous).
- type: Sales channel, e.g., online or supermarket (categorical).
- category: Product category, useful for grouping and pattern mining (categorical).
- total_quantity: Total quantity of products purchased in the same invoice (numeric, discrete).
- customer_type: Type of customer, e.g., private or wholesaler (categorical).
- product_id: Unique identifier for the product (categorical).
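A minimal sketch of how the columns above might be typed and sanity-checked with pandas. The rows are invented for illustration and are not taken from the dataset; the helper names `prepare` and `invoices_with_quantity_mismatch` are hypothetical, and the check assumes that total_quantity equals the sum of quantity over all rows of an invoice, which the column description suggests but does not guarantee.

```python
import pandas as pd

# Toy rows that follow the column layout described above.
# The values are invented for illustration only.
rows = pd.DataFrame(
    {
        "InvoiceID": ["A1", "A1", "B2"],
        "CustomerID": ["C10", "C10", "C20"],
        "date": ["2014-12-01", "2014-12-01", "2015-03-15"],
        "item": ["milk", "bread", "milk"],
        "quantity": [2, 1, 5],
        "price": [1.50, 2.00, 1.50],
        "type": ["online", "online", "supermarket"],
        "category": ["dairy", "bakery", "dairy"],
        "total_quantity": [3, 3, 5],
        "customer_type": ["private", "private", "wholesaler"],
        "product_id": ["P1", "P2", "P1"],
    }
)

CATEGORICAL_COLUMNS = [
    "InvoiceID", "CustomerID", "type", "category", "customer_type", "product_id",
]


def prepare(df):
    """Return a copy with parsed dates, categorical dtypes, and a per-row total."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    for col in CATEGORICAL_COLUMNS:
        out[col] = out[col].astype("category")
    # Assumption: price is the unit price, so the row value is quantity * price.
    out["line_total"] = out["quantity"] * out["price"]
    return out


def invoices_with_quantity_mismatch(df):
    """List invoices whose summed quantity differs from the stated total_quantity."""
    grouped = df.groupby("InvoiceID", observed=True)
    summed = grouped["quantity"].sum()
    stated = grouped["total_quantity"].first()
    return summed[summed != stated].index.tolist()


prepared = prepare(rows)
mismatches = invoices_with_quantity_mismatch(prepared)
print(prepared.dtypes)
print("Invoices with inconsistent totals:", mismatches)
```

Running the same check on the full data would show whether total_quantity can be treated as a derived column or carries independent information.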
Potential Data Mining Applications:
- Market Basket Analysis / Association Rules: Discover frequent combinations of products purchased together.
- Customer Segmentation: Cluster customers based on purchase quantity, customer type, and product categories.
- Forecasting: Predict future sales for products or categories.
- Anomaly Detection: Detect unusual transactions in terms of quantity or price.
- Sales Analysis: Identify trends, top-selling products, and seasonal patterns.
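As an illustration of the market basket idea listed above, the following sketch counts how often two items appear on the same invoice and reports the share of invoices containing each pair (its support). It is a simple pair-counting approach, not a full association rule algorithm such as Apriori; the rows are invented, and `pair_support` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Invented example rows using the InvoiceID and item columns.
rows = pd.DataFrame(
    {
        "InvoiceID": ["A1", "A1", "A1", "B2", "B2", "C3"],
        "item": ["milk", "bread", "eggs", "milk", "bread", "milk"],
    }
)


def pair_support(df):
    """Map each item pair to the fraction of invoices that contain both items."""
    # Collect the distinct items of every invoice.
    baskets = {}
    for invoice, item in zip(df["InvoiceID"], df["item"]):
        baskets.setdefault(invoice, set()).add(item)

    # Count each unordered pair once per invoice; sorting keeps pair keys stable.
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))

    n_invoices = len(baskets)
    return {pair: count / n_invoices for pair, count in counts.items()}


support = pair_support(rows)
for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

On data of this size, a dedicated frequent-itemset implementation would likely be preferable, since the number of candidate pairs grows quickly with basket size.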
General Characteristics:
- Large dataset with a mix of online and in-store sales.
- Combination of numeric and categorical variables, suitable for classification, clustering, and association rule mining.
- Includes a temporal variable for sequential and predictive analysis.
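A short sketch of how the date column could feed a time series analysis: rows are aggregated into monthly revenue. The rows are invented, `monthly_revenue` is a hypothetical helper, and revenue is assumed to be quantity multiplied by the unit price.

```python
import pandas as pd

# Invented example rows using the date, quantity, and price columns.
rows = pd.DataFrame(
    {
        "date": ["2014-12-05", "2014-12-20", "2015-01-10", "2015-03-02"],
        "quantity": [2, 1, 4, 3],
        "price": [1.50, 2.00, 0.50, 1.00],
    }
)


def monthly_revenue(df):
    """Sum quantity * price per calendar month (months without rows are omitted)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out["revenue"] = out["quantity"] * out["price"]
    return out.groupby(out["date"].dt.to_period("M"))["revenue"].sum()


monthly = monthly_revenue(rows)
print(monthly)
```

Months with no transactions do not appear in the result, so a forecasting model that expects a regular series would need those gaps filled first.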