USA News Dataset

Information of news articles and click histories

@kaggle.vinayakshanawad_us_news_dataset

About this Dataset

Problem Description

Construct two types of models: (A) a deep learning classifier, such as an LSTM or a similar model, that predicts the category of a news article from its title and abstract, and (B) a recommendation system that recommends the posts a user is most likely to click.

The dataset consists of two files -- (1) user_news_clicks.csv, and (2) news_text.csv.

Model A, the deep learning classifier, requires only the news_text.csv dataset. The goal is to predict the ‘category’ label using the ‘title’ and ‘abstract’ columns. Model B, the recommendation system, requires only user_news_clicks.csv; you may also use news_text.csv if you like, but it is not necessary for this exercise. The goal is to recommend news articles that users are likely to click.
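
As a rough illustration of how the ‘title’ and ‘abstract’ columns might be prepared for an LSTM-style classifier, the sketch below joins the two fields, builds a word-level vocabulary, and encodes each article as a fixed-length list of integer ids. The tokenizer, the `max_len` value, the reserved ids 0 and 1, the function names, and the sample rows are all assumptions made for illustration; none of them come from the dataset.

```python
import re
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids for padding and unknown words


def tokenize(text):
    """Lowercase the text and split it into runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts, min_count=1):
    """Map each sufficiently frequent token to an integer id starting at 2."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    kept = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: idx for idx, tok in enumerate(kept, start=2)}


def encode(text, vocab, max_len=8):
    """Convert text to a list of exactly max_len ids (truncate or pad)."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Made-up rows standing in for news_text.csv records.
rows = [
    {"title": "Stocks rally", "abstract": "Markets closed higher on Friday."},
    {"title": "Home team wins", "abstract": "A late goal decided the match."},
]
texts = [row["title"] + " " + row["abstract"] for row in rows]
vocab = build_vocab(texts)
encoded = [encode(text, vocab) for text in texts]
```

The resulting id lists could then be fed to an embedding layer followed by an LSTM in whichever deep learning library is chosen.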

Data Description

In news_text.csv - each record consists of three attributes and a target variable:

  • Category - The dataset contains many news categories; as requested, we need only 3 of them: news, sports and finance
  • news_id - Identification number of the news
  • title - Title of the news
  • abstract - Abstract of the news
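
A minimal sketch of reading the news file and keeping only the three requested categories is shown here. It assumes the CSV header uses the lowercase names `news_id`, `category`, `title`, and `abstract`, and that category values are spelled `news`, `sports`, and `finance`; check both against the actual file. The helper name `load_news` and the sample rows are hypothetical.

```python
import csv
import io

KEEP = {"news", "sports", "finance"}  # the three categories named above


def load_news(handle, keep=KEEP):
    """Read news rows and keep only those whose category is in `keep`."""
    reader = csv.DictReader(handle)
    return [row for row in reader if row["category"].strip().lower() in keep]


# Made-up sample in place of the real file; with the real data, pass
# open("news_text.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "news_id,category,title,abstract\n"
    "N1,sports,Home team wins,A late goal decided the match.\n"
    "N2,lifestyle,Ten garden tips,Simple ideas for spring.\n"
    "N3,finance,Stocks rally,Markets closed higher on Friday.\n"
)
news = load_news(sample)
kept_ids = [row["news_id"] for row in news]
```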

In user_news_clicks.csv - each record consists of two attributes and a target variable:

  • click - Whether the user clicked the article
  • user_id - Identification number of the user
  • item - Identification number of an item
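
One possible baseline for Model B, using only these three columns, is a neighbourhood-style recommender: for a given user, unseen items are scored by how many clicked items that user shares with the other users who clicked them. This is a sketch of one simple technique, not a prescribed method. It assumes `click` is encoded as 1 (clicked) or 0 (not clicked); the function names and sample rows are made up.

```python
from collections import Counter, defaultdict


def build_histories(clicks):
    """Group clicked items by user, ignoring rows where click is not 1."""
    histories = defaultdict(set)
    for user_id, item, click in clicks:
        if click == 1:
            histories[user_id].add(item)
    return histories


def recommend(user_id, histories, k=2):
    """Rank unseen items by the click overlap with users who clicked them."""
    seen = histories.get(user_id, set())
    scores = Counter()
    for other, items in histories.items():
        if other == user_id:
            continue
        overlap = len(seen & items)
        if overlap:
            for item in items - seen:
                scores[item] += overlap
    # Sort by score (descending), then item id, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# Made-up (user_id, item, click) rows standing in for user_news_clicks.csv.
clicks = [
    ("U1", "N1", 1), ("U1", "N2", 1),
    ("U2", "N1", 1), ("U2", "N2", 1), ("U2", "N3", 1),
    ("U3", "N2", 1), ("U3", "N4", 1), ("U3", "N5", 0),
]
histories = build_histories(clicks)
top_items = recommend("U1", histories)
```

A user with no click history gets an empty list from this sketch, so a fallback such as the most-clicked items overall would be needed for new users.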

Goals

  • Design the deep learning classifier and the recommendation system models
  • Build and train the models using a Python deep learning library such as TensorFlow or PyTorch
  • Test the models’ performance using a set of metrics
  • Report on the performance of the models
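
The source does not name specific metrics. As one hedged possibility, the sketch below computes accuracy and macro-averaged F1 for the classifier and a hit rate at k for the recommender. The choice of metrics, the function names, and the sample labels are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def hit_rate_at_k(recommended, clicked, k):
    """Fraction of users with at least one clicked item in their top-k list."""
    hits = sum(
        1 for user, items in recommended.items()
        if set(items[:k]) & clicked.get(user, set())
    )
    return hits / len(recommended)


# Made-up labels and recommendations for illustration only.
y_true = ["news", "sports", "finance", "sports"]
y_pred = ["news", "sports", "sports", "sports"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
hr = hit_rate_at_k(
    {"U1": ["N3", "N4"], "U2": ["N5", "N6"]},
    {"U1": {"N4"}, "U2": {"N9"}},
    k=2,
)
```

Macro F1 weights the three categories equally, which can be more informative than accuracy alone if the categories are imbalanced.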

Instructions

NOTE: We do not need to use the entire dataset if resources are limited. Feel free to sample.
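
One way to sample the click data, sketched under assumptions, is to draw a random subset of users and keep every row for those users, so that each retained click history stays complete. The helper name `sample_users`, the `fraction` and `seed` parameters, and the sample rows are hypothetical.

```python
import random


def sample_users(clicks, fraction, seed=0):
    """Keep all rows belonging to a random subset of users."""
    users = sorted({user_id for user_id, _, _ in clicks})
    n_keep = max(1, round(len(users) * fraction))
    chosen = set(random.Random(seed).sample(users, n_keep))
    return [row for row in clicks if row[0] in chosen]


# Made-up (user_id, item, click) rows standing in for user_news_clicks.csv.
clicks = [
    ("U1", "N1", 1), ("U1", "N2", 0),
    ("U2", "N1", 1),
    ("U3", "N3", 1), ("U3", "N4", 1),
    ("U4", "N2", 0),
]
subset = sample_users(clicks, fraction=0.5)
```

Fixing the seed keeps the sample reproducible between runs.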

  • For Model A, use only the top 3 categories (news, sports, and finance) for model training and validation.
  • Code and build models A and B using a Python library such as PyTorch or TensorFlow
