Baselight

Topic Modeling For Research Articles

NLP Topic Modelling based on Research Articles.

@kaggle.blessondensil294_topic_modeling_for_research_articles

About this Dataset

Context

After the lockdown was announced in the country in March, we started with a one-day hackathon called Janatahack, inspired by the Janata Curfew, to begin our war against the pandemic. Given the amazing response and the demand for more, we continued the hackathons every weekend. Janatahack is now a phenomenon in which many esteemed members of our community regularly participate to showcase their machine learning skills, share their approaches and, more importantly, learn how to apply machine learning and predictive analytics to new domains such as agriculture, banking, IoT, and forecasting.

This time we bring you a 10-day extravaganza launching on India's Independence Day, 15 August 2020. It is open to all data practitioners, from beginners in data science to experienced data scientists. Register today to test your skills and earn AV Points. The theme, the problem statement, and the dataset will be released on Independence Day, so stay tuned and register to receive all updates about the event.

Content

Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging, or topic modelling, assigns identifying labels to research articles, which facilitates recommendation and search.

Given the abstract and title for a set of research articles, predict the topics for each article included in the test set.

Note that a research article can have more than one topic. The research article abstracts and titles are sourced from the following 6 topics:

  1. Computer Science

  2. Physics

  3. Mathematics

  4. Statistics

  5. Quantitative Biology

  6. Quantitative Finance
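
Because an article can carry several of these topics at once, the task is naturally framed as multi-label classification. The following is a minimal sketch of one way to represent an article's topics as a six-element indicator list. The names `TOPICS`, `encode_topics`, and `decode_topics` are hypothetical helpers introduced for illustration, and the 0/1 encoding is an assumption based on the per-topic columns of the train table.

```python
# Topic names, written the way the train table names its label columns.
TOPICS = [
    "computer_science",
    "physics",
    "mathematics",
    "statistics",
    "quantitative_biology",
    "quantitative_finance",
]


def encode_topics(topics):
    """Return a 0/1 indicator list with one entry per topic, in TOPICS order."""
    unknown = set(topics) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if name in topics else 0 for name in TOPICS]


def decode_topics(indicators):
    """Return the topic names whose indicator is set."""
    if len(indicators) != len(TOPICS):
        raise ValueError("expected one indicator per topic")
    return [name for name, flag in zip(TOPICS, indicators) if flag]


# Hypothetical article tagged with two topics (not taken from the dataset).
example = encode_topics({"computer_science", "statistics"})
print(example)                 # [1, 0, 0, 1, 0, 0]
print(decode_topics(example))  # ['computer_science', 'statistics']
```

With this representation, a prediction for one article is a list of six independent yes/no decisions, not a single class.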

Acknowledgements

Thanks to Analytics Vidhya for the dataset.

Tables

Test

@kaggle.blessondensil294_topic_modeling_for_research_articles.test
  • 5.63 MB
  • 8989 rows
  • 3 columns

CREATE TABLE test (
  "id" BIGINT,
  "title" VARCHAR,
  "abstract" VARCHAR
);

Train

@kaggle.blessondensil294_topic_modeling_for_research_articles.train
  • 13.12 MB
  • 20972 rows
  • 9 columns

CREATE TABLE train (
  "id" BIGINT,
  "title" VARCHAR,
  "abstract" VARCHAR,
  "computer_science" BIGINT,
  "physics" BIGINT,
  "mathematics" BIGINT,
  "statistics" BIGINT,
  "quantitative_biology" BIGINT,
  "quantitative_finance" BIGINT
);
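
As a rough illustration of how the train schema might be explored, the sketch below recreates the table in an in-memory SQLite database, inserts three made-up rows, and counts labels per topic and the number of multi-topic articles. It assumes each topic column holds 0 or 1, which the schema alone does not confirm. The rows and the helper functions `topic_counts` and `multi_topic_rows` are hypothetical and not part of the dataset.

```python
import sqlite3

TOPICS = [
    "computer_science",
    "physics",
    "mathematics",
    "statistics",
    "quantitative_biology",
    "quantitative_finance",
]


def topic_counts(conn):
    """Sum each topic column to count the train rows carrying that label."""
    sums = ", ".join(f'SUM("{t}")' for t in TOPICS)
    row = conn.execute(f"SELECT {sums} FROM train").fetchone()
    return dict(zip(TOPICS, row))


def multi_topic_rows(conn):
    """Count train rows that carry more than one topic label."""
    total = " + ".join(f'"{t}"' for t in TOPICS)
    query = f"SELECT COUNT(*) FROM train WHERE {total} > 1"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE train (
      "id" BIGINT,
      "title" VARCHAR,
      "abstract" VARCHAR,
      "computer_science" BIGINT,
      "physics" BIGINT,
      "mathematics" BIGINT,
      "statistics" BIGINT,
      "quantitative_biology" BIGINT,
      "quantitative_finance" BIGINT
    )
    """
)

# Made-up rows for illustration only; assumes labels are stored as 0/1.
toy_rows = [
    (1, "Toy title A", "Toy abstract A", 1, 0, 0, 1, 0, 0),
    (2, "Toy title B", "Toy abstract B", 0, 1, 0, 0, 0, 0),
    (3, "Toy title C", "Toy abstract C", 0, 0, 1, 1, 0, 0),
]
conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", toy_rows)

counts = topic_counts(conn)
multi = multi_topic_rows(conn)
print(counts)
print(multi)
```

Run against the real table, the same two queries would show how imbalanced the topics are and how often articles carry more than one label.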
