Articles Sharing And Reading From CI&T DeskDrop
Logs of user interactions on shared articles for content recommender systems
@kaggle.gspmoreira_articles_sharing_reading_from_cit_deskdrop
Deskdrop is an internal communications platform developed by CI&T, focused on companies using Google G Suite. Among other features, this platform allows companies' employees to share relevant articles with their peers and collaborate around them.
This rich and rare dataset contains a real sample of 12 months of logs (Mar. 2016 - Feb. 2017) from CI&T's internal communication platform (DeskDrop).
It contains about 73k logged user interactions on more than 3k public articles shared on the platform.
This dataset features some distinctive characteristics:
We thank CI&T for the support and permission to share a sample of real usage data from its internal communication platform: Deskdrop.
The two main approaches for Recommender Systems are Collaborative Filtering and Content-Based Filtering.
In the RecSys community, there are some popular datasets with user ratings on items (explicit feedback), such as MovieLens and the Netflix Prize, which are useful for Collaborative Filtering techniques.
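Collaborative filtering needs only the user-item feedback matrix, not item attributes. As a rough illustration, here is a minimal sketch of item-item collaborative filtering on a tiny, hypothetical explicit-rating matrix of the MovieLens kind. Two items are compared by the cosine between their rating vectors, and a missing rating is predicted as the similarity-weighted average of the user's other ratings. The data and the names `ratings`, `cosine` and `predict` are illustrative assumptions, not part of this dataset.

```python
from math import sqrt

# Hypothetical explicit ratings (user -> {item: rating}), MovieLens-style.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2},
    "carol": {"m2": 5, "m3": 1},
}


def item_vectors(ratings):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (missing = 0)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(ratings, user, target):
    """Item-item CF: similarity-weighted average of the user's other ratings."""
    items = item_vectors(ratings)
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = cosine(items[target], items[item])
        num += s * r
        den += abs(s)
    return num / den if den else 0.0
```

For example, `predict(ratings, "bob", "m3")` gives about 3.06. The prediction sits between bob's ratings of 2 and 4, weighted toward `m1` because `m3` is more similar to `m1` than to `m2`.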
However, it is very difficult to find open datasets with additional item attributes that would allow the application of Content-Based Filtering techniques or hybrid approaches, especially in the domain of ephemeral textual items (e.g. articles and news).
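Content-based filtering relies on item attributes instead, such as the `title` and `text` columns in this dataset. Here is a minimal sketch that assumes plain TF-IDF over lowercase word tokens, which is one common choice among many. Each article becomes a sparse TF-IDF vector, and a user profile is the sum of the vectors of the articles the user interacted with. Unseen articles are then ranked by cosine similarity to that profile. The toy articles and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical articles standing in for the "title"/"text" columns.
articles = {
    1: "machine learning for recommender systems",
    2: "deep learning and machine learning models",
    3: "cloud computing with google cloud platform",
}


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        vectors[doc_id] = {
            term: (count / len(toks)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(docs, seen, k=1):
    """Rank unseen documents by similarity to the user's profile vector."""
    vectors = tfidf_vectors(docs)
    profile = {}
    for doc_id in seen:  # profile = sum of the vectors of seen articles
        for term, w in vectors[doc_id].items():
            profile[term] = profile.get(term, 0.0) + w
    candidates = [d for d in docs if d not in seen]
    candidates.sort(key=lambda d: cosine(profile, vectors[d]), reverse=True)
    return candidates[:k]
```

A reader of article 1 gets article 2 recommended first, because the two share "machine" and "learning". Article 3 shares no terms with article 1. No ratings from other users are needed. This is why item attributes matter for ephemeral items that have little interaction history.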
News datasets are also reported in the academic literature as very sparse. Because users are usually not required to log in to news portals, user IDs are based on device cookies. This makes it hard to track a user's page visits across different portals, browsing sessions and devices.
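Sparsity is commonly measured as the fraction of empty cells in the user-item matrix, 1 - |observed user-item pairs| / (|users| x |items|). The sketch below shows why cookie-based IDs make this worse. When one reader's visits are split across two device cookies, the matrix gains a user row without gaining any interactions, so sparsity rises. The function name and the toy IDs are hypothetical.

```python
def sparsity(interactions):
    """Fraction of the user-item matrix with no observed interaction.

    `interactions` is an iterable of (user_id, item_id) pairs; repeated
    pairs (e.g. several views of one article) count once.
    """
    pairs = set(interactions)
    users = {u for u, _ in pairs}
    items = {i for _, i in pairs}
    if not users or not items:
        return 1.0
    return 1.0 - len(pairs) / (len(users) * len(items))


# Hypothetical: two logged-in readers who each read articles 1 and 2.
logged_in = [("u1", 1), ("u1", 2), ("u2", 1), ("u2", 2)]

# Same visits, but u1 is only known through two device cookies.
cookie_based = [("u1-phone", 1), ("u1-laptop", 2), ("u2", 1), ("u2", 2)]

print(sparsity(logged_in), sparsity(cookie_based))
```

The logged-in matrix is fully dense. The cookie-based matrix has a third "user", so a third of its cells are empty even though no visit was lost.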
This difficult scenario for research and experiments on content recommender systems was the main motivation for sharing this dataset.
CREATE TABLE shared_articles (
"timestamp" BIGINT,
"eventtype" VARCHAR,
"contentid" BIGINT,
"authorpersonid" BIGINT,
"authorsessionid" BIGINT,
"authoruseragent" VARCHAR,
"authorregion" VARCHAR,
"authorcountry" VARCHAR,
"contenttype" VARCHAR,
"url" VARCHAR,
"title" VARCHAR,
"text" VARCHAR,
"lang" VARCHAR
);

CREATE TABLE users_interactions (
"timestamp" BIGINT,
"eventtype" VARCHAR,
"contentid" BIGINT,
"personid" BIGINT,
"sessionid" BIGINT,
"useragent" VARCHAR,
"userregion" VARCHAR,
"usercountry" VARCHAR
);
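To see how the two tables relate, here is a minimal sketch that loads a trimmed copy of the schema into an in-memory SQLite database through Python's standard `sqlite3` module. It inserts a few hypothetical rows and counts distinct readers per article by joining on `contentid`. Three things are assumptions, labeled in the code: the `eventtype` labels used in the rows (`CONTENT SHARED`, `CONTENT REMOVED`, `VIEW`, `LIKE`), the claim that `timestamp` stores Unix epoch seconds, and the reduced column set. The constant and function names are also illustrative.

```python
import sqlite3

# Trimmed copy of the two tables above (only the columns used here).
# SQLite accepts BIGINT/VARCHAR as column type names.
SCHEMA = """
CREATE TABLE shared_articles (
  "timestamp" BIGINT, "eventtype" VARCHAR, "contentid" BIGINT,
  "title" VARCHAR, "lang" VARCHAR
);
CREATE TABLE users_interactions (
  "timestamp" BIGINT, "eventtype" VARCHAR, "contentid" BIGINT,
  "personid" BIGINT, "sessionid" BIGINT
);
"""

# Hypothetical rows; the eventtype labels are assumed, not read from the dataset.
ARTICLES = [
    (1459193988, "CONTENT SHARED", 10, "Article A", "en"),
    (1459194146, "CONTENT SHARED", 20, "Article B", "pt"),
    (1459300000, "CONTENT REMOVED", 20, "Article B", "pt"),
]
INTERACTIONS = [
    (1465413032, "VIEW", 10, 1, 100),
    (1465413046, "VIEW", 10, 2, 200),
    (1465413100, "LIKE", 10, 1, 100),
    (1465413200, "VIEW", 20, 2, 200),
]

# Distinct readers and total events per article. Keeping only the
# 'CONTENT SHARED' row per article stops the join from double counting.
READERS_PER_ARTICLE = """
SELECT a.title, COUNT(DISTINCT i.personid) AS readers, COUNT(*) AS events
FROM users_interactions AS i
JOIN shared_articles AS a ON a.contentid = i.contentid
WHERE a.eventtype = 'CONTENT SHARED'
GROUP BY a.contentid, a.title
ORDER BY readers DESC, a.title
"""

# Assumes "timestamp" holds Unix epoch seconds.
FIRST_DAY = """SELECT date(MIN("timestamp"), 'unixepoch') FROM users_interactions"""


def build_db(articles, interactions):
    """Create an in-memory SQLite database and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO shared_articles VALUES (?, ?, ?, ?, ?)", articles)
    conn.executemany("INSERT INTO users_interactions VALUES (?, ?, ?, ?, ?)", interactions)
    return conn


conn = build_db(ARTICLES, INTERACTIONS)
print(conn.execute(READERS_PER_ARTICLE).fetchall())
print(conn.execute(FIRST_DAY).fetchone()[0])
```

Without the `eventtype` filter, the removed copy of "Article B" would join with each of its interactions a second time and inflate its event count. Any real analysis should first check which event types actually occur in each table.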