Problem Description
Construct two types of models -- (A) a deep learning classifier, such as an LSTM or a similar model, to predict the category of a news article given its title and abstract, and (B) a recommendation system to recommend the posts that a user is most likely to click.
The dataset consists of two files -- (1) user_news_clicks.csv, and (2) news_text.csv.
Model A, the deep learning classifier, requires only the news_text.csv dataset. The goal is to predict the ‘category’ label using the ‘title’ and ‘abstract’ columns. Model B, the recommendation system, requires only user_news_clicks.csv; you can also use news_text.csv if you’d like, though it is not necessary for this exercise. The goal is to recommend news articles that users are likely to click.
Data Description
In news_text.csv, each record consists of three attributes and a target variable:
- Category - The dataset contains many news categories; as requested, we need to use only 3 of them: news, sports, and finance
- news_id - Identification number of the news
- title - Title of the news
- abstract - Abstract of the news
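As a rough illustration of preparing this file for Model A, the sketch below keeps only the three requested categories and joins the title and abstract into one input string. It is a minimal sketch, not a prescribed pipeline: the lowercase header names (category, news_id, title, abstract), the exact label spellings, and the helper name load_classifier_rows are assumptions, and the in-memory sample is made up to stand in for news_text.csv.

```python
import csv
import io

# Assumption: the three labels appear in the file spelled as in this brief.
TARGET_CATEGORIES = {"news", "sports", "finance"}


def load_classifier_rows(csv_file):
    """Return (text, label) pairs for the three target categories.

    Assumes the header row uses the names category, news_id, title, abstract.
    """
    rows = []
    for record in csv.DictReader(csv_file):
        label = (record.get("category") or "").strip().lower()
        if label not in TARGET_CATEGORIES:
            continue  # drop every other category
        title = (record.get("title") or "").strip()
        abstract = (record.get("abstract") or "").strip()
        # Join title and abstract into a single input string for the model.
        rows.append((f"{title} {abstract}".strip(), label))
    return rows


# Made-up sample standing in for news_text.csv.
sample = io.StringIO(
    "category,news_id,title,abstract\n"
    "sports,N1,Team wins final,A late goal decided the match.\n"
    "lifestyle,N2,Ten garden tips,\n"
    "finance,N3,Markets rally,Stocks rose after the report.\n"
)
pairs = load_classifier_rows(sample)
print(pairs)
```

For the real file, the same function could be called on a handle opened with open("news_text.csv", newline="", encoding="utf-8"), provided the header names match the assumptions above.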
In user_news_clicks.csv, each record consists of two attributes and a target variable:
- click - Whether the user clicked the article
- user_id - Identification number of the user
- item - Identification number of an item
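One simple way to use these three columns for Model B is a popularity baseline, which gives a reference point before training a learned recommender. The sketch below is only that baseline, not the required model. It assumes click is encoded as 1 or 0 and that the header names are click, user_id, and item; the function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def load_clicks(csv_file):
    """Map each user to the set of items they clicked."""
    clicked = defaultdict(set)
    for record in csv.DictReader(csv_file):
        # Assumption: click is stored as 1 (clicked) or 0 (not clicked).
        if record["click"].strip() == "1":
            clicked[record["user_id"]].add(record["item"])
    return clicked


def recommend_popular(clicked, user_id, k=2):
    """Recommend the k most-clicked items the user has not clicked yet."""
    popularity = Counter(item for items in clicked.values() for item in items)
    seen = clicked.get(user_id, set())
    # Sort by click count (descending), breaking ties by item id.
    ranked = sorted(popularity, key=lambda item: (-popularity[item], item))
    return [item for item in ranked if item not in seen][:k]


# Made-up sample standing in for user_news_clicks.csv.
sample = io.StringIO(
    "click,user_id,item\n"
    "1,U1,N1\n"
    "0,U1,N2\n"
    "1,U2,N1\n"
    "1,U2,N3\n"
    "1,U3,N3\n"
    "1,U3,N4\n"
)
clicked = load_clicks(sample)
recommendations = recommend_popular(clicked, "U1")
print(recommendations)
```

A learned recommender (for example, an embedding-based model in PyTorch or TensorFlow) would be expected to beat this baseline on held-out clicks.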
Goals
- Design the deep learning classifier and the recommendation system models
- Build and train the models using a Python deep learning library such as TensorFlow or PyTorch
- Test each model’s performance using a set of metrics
- Report on the performance of each model
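The brief does not name specific metrics, so the sketch below shows one possible set: accuracy and macro-averaged F1 for the classifier, and hit rate at k for the recommender. These choices, the function names, and the toy inputs are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hit_rate_at_k(recommended, held_out, k):
    """Fraction of users whose held-out clicked item is in their top-k list."""
    hits = sum(held_out[u] in recommended.get(u, [])[:k] for u in held_out)
    return hits / len(held_out)


# Toy labels and recommendations, made up for illustration.
y_true = ["news", "sports", "finance", "news"]
y_pred = ["news", "sports", "news", "news"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)

recommended = {"U1": ["N3", "N4"], "U2": ["N4", "N2"]}
held_out = {"U1": "N4", "U2": "N1"}
hr = hit_rate_at_k(recommended, held_out, k=2)
print(acc, f1, hr)
```

Macro F1 is shown alongside accuracy because the three categories may not be equally frequent, in which case accuracy alone can look better than the model really is.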
Instructions
NOTE: We do not need to use the entire dataset if resources are limited. Feel free to sample.
- For Model A, use only the top 3 categories (news, sports, and finance) for model training and validation.
- Code and build models A and B using a Python library such as PyTorch or TensorFlow
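As a starting point for Model A, here is a minimal PyTorch sketch of an LSTM classifier with three output classes and a single training step. It is one possible design under stated assumptions, not the expected solution: the class name NewsLSTMClassifier, the layer sizes, the vocabulary size of 1000, and the random token ids (standing in for tokenized title and abstract text) are all hypothetical.

```python
import torch
from torch import nn


class NewsLSTMClassifier(nn.Module):
    """Embedding -> LSTM -> linear layer over the final hidden state."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=3):
        super().__init__()
        # Index 0 is reserved for padding (an assumption of this sketch).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)   # hidden: (1, batch, hidden_dim)
        return self.classifier(hidden[-1])     # (batch, num_classes)


torch.manual_seed(0)
model = NewsLSTMClassifier(vocab_size=1000)

# Random token ids and labels standing in for a tokenized batch of 4 articles.
token_ids = torch.randint(1, 1000, (4, 20))
labels = torch.tensor([0, 1, 2, 0])  # e.g. 0=news, 1=sports, 2=finance

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step: forward pass, loss, backward pass, parameter update.
logits = model(token_ids)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

With real data, sequences of different lengths would need padding, and the final hidden state would then include the padded positions unless the batch is packed (for example with torch.nn.utils.rnn.pack_padded_sequence) before it is passed to the LSTM.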