Video Ad Engagement Prediction: 3 Million Labeled Impressions Dataset
Description
This dataset consists of 3 million labeled advertising auction lines, one per ad impression, intended for Machine Learning work on predicting user engagement with video ads.
The dataset is the work of Cyrille Dubarry and was initially used for a Machine Learning class competition at École Polytechnique.
Objective
The goal is to predict how long a user will watch a video advertisement. Each entry in the dataset, identified by a unique auction_id, represents an individual ad impression and includes contextual information about the user, the publisher, and the advertiser.
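Because the target is a duration, the task can be framed as regression. The following is a minimal sketch of one possible baseline, not the method used in the original competition: it predicts the user's past average (the user_average_seconds_played column) when it is available and falls back to the training mean otherwise. The RMSE metric, the helper names, and all numeric values are assumptions made for illustration; the source does not specify an evaluation metric.

```python
import math

def fit_global_mean(train_targets):
    """Mean of seconds_played over the training rows (fallback prediction)."""
    return sum(train_targets) / len(train_targets)

def predict(user_avg, global_mean):
    """Use the user's past average if present, else the global mean."""
    return user_avg if user_avg is not None else global_mean

def rmse(y_true, y_pred):
    """Root mean squared error; an assumed metric, not one given by the dataset."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Invented toy values, only to make the sketch runnable.
train_seconds_played = [10.0, 20.0, 0.0, 30.0]
global_mean = fit_global_mean(train_seconds_played)

# None stands for a null user_average_seconds_played (user with no history).
test_user_avg = [12.0, None, 5.0]
test_seconds_played = [10.0, 15.0, 9.0]

preds = [predict(u, global_mean) for u in test_user_avg]
score = rmse(test_seconds_played, preds)
print(preds, round(score, 3))
```

A baseline like this gives a reference score that any model using the remaining features should beat.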
Content and Features
- auction_id: unique id for identifying each line
- timestamp: the timestamp (in seconds) of the ad impression
- creative_duration: the total duration of the video creative that was played
- campaign_id: the advertising campaign id
- advertiser_id: the advertiser id
- placement_id: the id of a zone in the web page where the video was played
- placement_language: the language of this zone
- website_id: the corresponding website id
- referer_deep_three: the URL of the page where the video was played, truncated at its 3rd level
- ua_country: the country of the user who watched the video
- ua_os: the user's Operating System
- ua_browser: the user's internet browser
- ua_browser_version: the user's browser version
- ua_device: the user's device
- user_average_seconds_played: the average duration the user watched video ads in the past. It can be null if the user never watched any ad.
- seconds_played: the observed time the video has been watched. This is the quantity we are trying to predict.
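As a rough illustration of how these columns might be turned into typed features, the sketch below parses a few rows with Python's standard library. Several things here are assumptions rather than facts about the dataset: that it is delivered as CSV, that a null user_average_seconds_played appears as an empty field, that timestamp is a Unix epoch value, and that creative_duration is in the same unit as seconds_played. The sample rows and the ua_device values are invented.

```python
import csv
import io
from datetime import datetime, timezone

# Invented sample rows (subset of columns), only to make the sketch runnable.
SAMPLE_CSV = """auction_id,timestamp,creative_duration,ua_device,user_average_seconds_played,seconds_played
a1,1500000000,30,Phone,12.5,10
a2,1500003600,20,PersonalComputer,,20
"""

def parse_row(row):
    """Convert one raw CSV row (all strings) into typed features."""
    raw_avg = row["user_average_seconds_played"]
    has_history = raw_avg != ""  # assumption: null is stored as an empty field
    duration = float(row["creative_duration"])
    played = float(row["seconds_played"])
    # Assumption: timestamp is Unix epoch seconds; hour of day is taken in UTC.
    hour = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc).hour
    return {
        "auction_id": row["auction_id"],
        "hour_utc": hour,
        "ua_device": row["ua_device"],
        "has_history": has_history,
        "user_average_seconds_played": float(raw_avg) if has_history else None,
        "seconds_played": played,
        # Fraction of the creative that was watched (hypothetical derived feature).
        "completion_ratio": played / duration if duration > 0 else None,
    }

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for r in rows:
    print(r)
```

Keeping an explicit has_history flag, instead of silently replacing the null with zero, lets a model distinguish new users from users who watched very little.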
Use Case
This dataset is intended for data scientists and researchers building predictive models of user engagement with video advertisements. It can be used to study how factors such as device type, a user's past viewing behaviour, and ad placement influence how long an ad is watched.
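One simple way to look at the influence of a single factor is to compare the mean seconds_played across its values. The sketch below does this for ua_device. The helper name mean_by_key, the device labels, and the durations are all invented for illustration and do not come from the dataset.

```python
from collections import defaultdict

# Invented (ua_device, seconds_played) pairs, only to make the sketch runnable.
impressions = [
    ("Phone", 5.0),
    ("Phone", 15.0),
    ("Tablet", 20.0),
    ("PersonalComputer", 30.0),
    ("PersonalComputer", 10.0),
]

def mean_by_key(pairs):
    """Average the values for each key in an iterable of (key, value) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

device_means = mean_by_key(impressions)
for device, mean_seconds in sorted(device_means.items()):
    print(device, mean_seconds)
```

The same helper could be applied to other categorical columns such as placement_id or ua_browser; differences in group means are descriptive only and do not by themselves establish a causal effect.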
License
This dataset is shared under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication, which allows for unrestricted use, adaptation, and distribution in any medium for any purpose.