Name: Uplift Modeling, Marketing Campaign Data
Creator: Kaggle
License: https://creativecommons.org/publicdomain/zero/1.0/

About this Dataset

Uplift Modeling, Marketing Campaign Data

Context

Uplift modeling is an important yet relatively new area of machine learning research that aims to explain and estimate the causal impact of a treatment at the individual level. In the digital advertising industry, the treatment is exposure to different ads, and uplift modeling is used to direct marketing efforts toward the users for whom they are most effective. The data is a collection of 13 million samples from a randomized controlled trial, scaling up previously available datasets by a factor of 590.
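Formally, the individual-level quantity that uplift models target is usually written as a conditional average treatment effect. In standard notation (the symbols are ours, not the dataset's), with features $x$, treatment indicator $T$ and a binary outcome $Y$ (a visit or a conversion):

```latex
\tau(x) = \mathbb{P}(Y = 1 \mid X = x,\, T = 1) - \mathbb{P}(Y = 1 \mid X = x,\, T = 0)
```

Because treatment is assigned at random, the first term can be estimated from the treated group and the second from the control group.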

Content

The dataset was created by the Criteo AI Lab. It consists of more than 13M rows (13,979,592 in the table below), each representing a user with 12 features, a treatment indicator, 2 binary labels (visits and conversions) and an exposure flag. A positive label means the user visited or converted on the advertiser's website during the two-week test period. The global treatment ratio is 84.6%: advertisers usually keep only a small control population, because withholding ads from users costs them potential revenue.
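With 84.6% of users treated, raw conversion counts are dominated by the treated group, so average uplift has to be computed from per-group rates. A minimal sketch of that difference-in-means estimate, assuming rows are (treatment, label) pairs where the label is either visit or conversion; the toy rows are invented for illustration:

```python
from typing import Iterable, Tuple


def treatment_ratio_and_ate(rows: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (treatment ratio, difference-in-means uplift).

    Each row is (treatment, label); label is visit or conversion (0/1).
    """
    n_t = n_c = y_t = y_c = 0
    for treatment, label in rows:
        if treatment == 1:
            n_t += 1
            y_t += label
        else:
            n_c += 1
            y_c += label
    if n_t == 0 or n_c == 0:
        raise ValueError("need at least one treated and one control row")
    ratio = n_t / (n_t + n_c)
    # Compare rates, not counts: treated users vastly outnumber controls.
    ate = y_t / n_t - y_c / n_c
    return ratio, ate


# Toy data: 8 treated users (2 converted), 2 control users (0 converted).
rows = [(1, 1), (1, 0), (1, 0), (1, 1), (1, 0), (1, 0), (1, 0), (1, 0), (0, 0), (0, 0)]
print(treatment_ratio_and_ate(rows))  # (0.8, 0.25)
```

Because the data was sub-sampled non-uniformly for privacy, an average uplift computed this way on the full table describes the benchmark, not the original campaign's incrementality.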

Following is a detailed description of the features:

f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
treatment: treatment group (1 = treated, 0 = control)
conversion: whether a conversion occurred for this user (binary, label)
visit: whether a visit occurred for this user (binary, label)
exposure: treatment effect, whether the user has been effectively exposed (binary)
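The treatment column records group assignment, while exposure records whether the user was effectively shown an ad, so the two can differ. A minimal sketch for measuring how often assignment to the treated group led to actual exposure, assuming rows are (treatment, exposure) pairs read from the table; the toy rows are invented:

```python
from typing import Iterable, Tuple


def exposure_rate_among_treated(rows: Iterable[Tuple[int, int]]) -> float:
    """Share of treated users (treatment == 1) with exposure == 1.

    Each row is (treatment, exposure); control rows are ignored.
    """
    treated = exposed = 0
    for treatment, exposure in rows:
        if treatment == 1:
            treated += 1
            exposed += exposure
    if treated == 0:
        raise ValueError("no treated rows")
    return exposed / treated


rows = [(1, 1), (1, 0), (1, 0), (1, 1), (0, 0)]
print(exposure_rate_among_treated(rows))  # 0.5
```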


Starter Kernels

HistGradientBoostingClassifier Base Model
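The starter kernel uses a HistGradientBoostingClassifier as its base model. One common way to turn any classifier into an uplift model is the two-model ("T-learner") approach: fit one model on treated users and one on control users, then score each user by the difference of the two predicted probabilities. The sketch below is not the kernel's code; to stay dependency-free it swaps in a hypothetical binned-rate estimator on a single feature (`fit_binned_rate`) where a classifier such as HistGradientBoostingClassifier would normally go. The toy rows are invented.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, int, int]  # (single feature value, treatment, label)


def fit_binned_rate(xs: Sequence[float], ys: Sequence[int], bin_width: float) -> Callable[[float], float]:
    """Hypothetical stand-in base learner: positive rate per bin of one feature.

    Bins without training data fall back to the overall positive rate.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # bin -> [positives, total]
    for x, y in zip(xs, ys):
        b = int(x // bin_width)
        counts[b][0] += y
        counts[b][1] += 1
    overall = sum(ys) / len(ys)

    def predict(x: float) -> float:
        pos, total = counts.get(int(x // bin_width), (0, 0))
        return pos / total if total else overall

    return predict


def t_learner(rows: Sequence[Row], bin_width: float = 1.0) -> Callable[[float], float]:
    """Fit separate models on treated and control rows; uplift(x) = p1(x) - p0(x)."""
    treated = [(x, y) for x, t, y in rows if t == 1]
    control = [(x, y) for x, t, y in rows if t == 0]
    if not treated or not control:
        raise ValueError("need both treated and control rows")
    m1 = fit_binned_rate([x for x, _ in treated], [y for _, y in treated], bin_width)
    m0 = fit_binned_rate([x for x, _ in control], [y for _, y in control], bin_width)
    return lambda x: m1(x) - m0(x)


rows = [
    (0.2, 1, 1), (0.5, 1, 0), (0.3, 0, 1), (0.7, 0, 0),  # bin 0: no effect
    (1.2, 1, 1), (1.5, 1, 1), (1.3, 0, 0), (1.8, 0, 0),  # bin 1: strong effect
]
uplift = t_learner(rows)
print(uplift(0.4), uplift(1.4))  # 0.0 1.0
```

The binned stand-in only sees one feature; a real base model would be trained on all twelve features f0 to f11.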

Acknowledgement

The data was provided for the paper "A Large Scale Benchmark for Uplift Modeling":

https://s3.us-east-2.amazonaws.com/criteo-uplift-dataset/large-scale-benchmark.pdf

Eustache Diemert
CAIL
e.diemert@criteo.com
Artem Betlei
CAIL & Université Grenoble Alpes
a.betlei@criteo.com
Christophe Renaudin
CAIL
c.renaudin@criteo.com
Massih-Reza Amini
Université Grenoble Alpes
massih-reza.amini@imag.fr

For privacy reasons the data has been sub-sampled non-uniformly so that the original incrementality level cannot be deduced from the dataset while preserving a realistic, challenging benchmark. Feature names have been anonymized and their values randomly projected so as to keep predictive power while making it practically impossible to recover the original features or user context.

Inspiration

Related uses include, but are not limited to:

Uplift modeling
Interactions between features and treatment
Heterogeneity of treatment
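Exploiting treatment heterogeneity means ranking users by predicted uplift and targeting the top of the ranking. A minimal check, assuming scores come from any uplift model, is the observed uplift within the top fraction of users: treated conversion rate minus control conversion rate among the highest-scored users. This is a simplified proxy of our own; fuller evaluations typically use uplift or Qini curves, which this sketch does not reproduce. The toy data is invented.

```python
from typing import Sequence, Tuple


def uplift_at_top_fraction(scored: Sequence[Tuple[float, int, int]], fraction: float) -> float:
    """Observed uplift among the top `fraction` of users ranked by predicted uplift.

    Each item is (predicted uplift score, treatment, label).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)
    top = ranked[: max(1, int(len(ranked) * fraction))]
    treated = [y for _, t, y in top if t == 1]
    control = [y for _, t, y in top if t == 0]
    if not treated or not control:
        raise ValueError("top segment must contain treated and control users")
    return sum(treated) / len(treated) - sum(control) / len(control)


scored = [
    (0.9, 1, 1), (0.8, 0, 0), (0.7, 1, 1), (0.6, 0, 0),
    (0.4, 1, 0), (0.3, 0, 1), (0.2, 1, 0), (0.1, 0, 1),
]
print(uplift_at_top_fraction(scored, 0.5), uplift_at_top_fraction(scored, 1.0))  # 1.0 0.0
```

A useful model shows a clearly higher observed uplift in its top segment than over the whole population.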

Tables

Criteo Uplift V2–1

@kaggle.arashnic_uplift_modeling.criteo_uplift_v2_1

215.08 MB
13979592 rows
16 columns


CREATE TABLE criteo_uplift_v2_1 (
  "f0" DOUBLE,
  "f1" DOUBLE,
  "f2" DOUBLE,
  "f3" DOUBLE,
  "f4" DOUBLE,
  "f5" DOUBLE,
  "f6" DOUBLE,
  "f7" DOUBLE,
  "f8" DOUBLE,
  "f9" DOUBLE,
  "f10" DOUBLE,
  "f11" DOUBLE,
  "treatment" BIGINT,
  "conversion" BIGINT,
  "visit" BIGINT,
  "exposure" BIGINT
);
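A minimal stdlib sketch for reading an export of this table into typed Python rows that mirror the schema above, assuming a CSV file whose header row names the 16 columns. The function name `read_rows` is hypothetical, and the 0/1 check follows the binary descriptions of treatment, conversion, visit and exposure.

```python
import csv
import io
from typing import Dict, Iterator, TextIO, Union

FEATURES = [f"f{i}" for i in range(12)]                          # DOUBLE columns
INT_COLUMNS = ["treatment", "conversion", "visit", "exposure"]  # BIGINT columns, all binary


def read_rows(handle: TextIO) -> Iterator[Dict[str, Union[float, int]]]:
    """Yield one dict per CSV row, cast to the column types in the schema."""
    reader = csv.DictReader(handle)
    missing = set(FEATURES + INT_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for raw in reader:
        row: Dict[str, Union[float, int]] = {c: float(raw[c]) for c in FEATURES}
        for c in INT_COLUMNS:
            value = int(raw[c])
            if value not in (0, 1):
                raise ValueError(f"{c} must be 0 or 1, got {value}")
            row[c] = value
        yield row


# Usage with an in-memory sample instead of a real file.
sample = io.StringIO(
    ",".join(FEATURES + INT_COLUMNS) + "\n"
    + ",".join(["0.5"] * 12 + ["1", "0", "1", "1"]) + "\n"
)
rows = list(read_rows(sample))
print(len(rows), rows[0]["f0"], rows[0]["visit"])  # 1 0.5 1
```

Since `read_rows` is a generator, the full table can be streamed from an open file (opened with `newline=""`, as the csv module recommends) without holding all rows in memory.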
