Baselight

NLP On Research Articles

Multi Label Classification using NLP on Research Articles

@kaggle.vetrirah_janatahack_independence_day_2020_ml_hackathon

About this Dataset

Context

Topic Modeling for Research Articles
Researchers have access to large online archives of scientific articles. As a consequence, finding relevant articles has become more difficult. Tagging, or topic modelling, assigns identifying labels to research articles, which facilitates recommendation and search.

Content

Given the abstract and title for a set of research articles, predict the topics for each article included in the test set.

Note that a research article can have more than one topic. The abstracts and titles are drawn from the following 6 topics:

  1. Computer Science
  2. Physics
  3. Mathematics
  4. Statistics
  5. Quantitative Biology
  6. Quantitative Finance
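
Because an article can carry several of these topics at once, a common way to represent its labels is a fixed-length vector of 0/1 flags, one per topic. The following is a minimal sketch of that encoding, not part of the original dataset description. The helper names `encode_topics` and `decode_topics` are hypothetical; the topic identifiers follow the column names used in the tables on this page.

```python
# Topic identifiers, in the column order used by the tables on this page.
TOPICS = [
    "computer_science",
    "physics",
    "mathematics",
    "statistics",
    "quantitative_biology",
    "quantitative_finance",
]


def encode_topics(topics):
    """Return a list of 0/1 flags, one slot per entry in TOPICS."""
    unknown = set(topics) - set(TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if name in topics else 0 for name in TOPICS]


def decode_topics(flags):
    """Inverse of encode_topics: list the topic names whose flag is 1."""
    if len(flags) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} flags, got {len(flags)}")
    return [name for name, flag in zip(TOPICS, flags) if flag == 1]


# An article tagged with two topics (illustrative, not taken from the data).
example_flags = encode_topics({"computer_science", "statistics"})
```

With this representation, predicting topics amounts to predicting six independent yes/no flags per article rather than choosing a single class.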

Acknowledgements

https://datahack.analyticsvidhya.com/contest/janatahack-independence-day-2020-ml-hackathon/#ProblemStatement

Inspiration

https://datahack.analyticsvidhya.com/contest/janatahack-independence-day-2020-ml-hackathon/#ProblemStatement

Tables

Sample Submission

@kaggle.vetrirah_janatahack_independence_day_2020_ml_hackathon.sample_submission
  • 56.25 KB
  • 8989 rows
  • 7 columns

CREATE TABLE sample_submission (
  "id" BIGINT,
  "computer_science" BIGINT,
  "physics" BIGINT,
  "mathematics" BIGINT,
  "statistics" BIGINT,
  "quantitative_biology" BIGINT,
  "quantitative_finance" BIGINT
);
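
As an illustration of the schema above, the sketch below writes predictions in the same seven-column layout. It assumes a CSV file with the normalized column names shown in this table; the original contest file may spell its headers differently. The function name `write_submission` and the article ids in the usage lines are hypothetical.

```python
import csv
import io

# Column order taken from the sample_submission schema on this page.
SUBMISSION_COLUMNS = [
    "id",
    "computer_science",
    "physics",
    "mathematics",
    "statistics",
    "quantitative_biology",
    "quantitative_finance",
]


def write_submission(predictions, out):
    """Write one row per article: the id followed by six 0/1 topic flags.

    predictions maps an article id to a list of six flags, ordered like
    SUBMISSION_COLUMNS[1:].
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(SUBMISSION_COLUMNS)
    for article_id in sorted(predictions):
        flags = predictions[article_id]
        if len(flags) != 6 or any(flag not in (0, 1) for flag in flags):
            raise ValueError(f"bad flags for article {article_id}: {flags}")
        writer.writerow([article_id, *flags])


# Usage with made-up ids and predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission({101: [0, 1, 0, 0, 0, 0], 100: [1, 0, 0, 1, 0, 0]}, buffer)
submission_lines = buffer.getvalue().splitlines()
```

Passing an open file object instead of the `io.StringIO` buffer would write the same content to disk.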

Test

@kaggle.vetrirah_janatahack_independence_day_2020_ml_hackathon.test
  • 5.63 MB
  • 8989 rows
  • 3 columns

CREATE TABLE test (
  "id" BIGINT,
  "title" VARCHAR,
  "abstract" VARCHAR
);

Train

@kaggle.vetrirah_janatahack_independence_day_2020_ml_hackathon.train
  • 13.12 MB
  • 20972 rows
  • 9 columns

CREATE TABLE train (
  "id" BIGINT,
  "title" VARCHAR,
  "abstract" VARCHAR,
  "computer_science" BIGINT,
  "physics" BIGINT,
  "mathematics" BIGINT,
  "statistics" BIGINT,
  "quantitative_biology" BIGINT,
  "quantitative_finance" BIGINT
);
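
A first look at a table with this schema often involves joining the two text columns into one model input and checking how the six labels are distributed. The sketch below shows one way to do that on rows represented as dictionaries keyed by the column names above. The functions `model_input` and `label_summary` are hypothetical helpers, and the two sample rows are made up for illustration; they are not taken from the dataset.

```python
from collections import Counter

# Label columns as listed in the train schema on this page.
LABEL_COLUMNS = [
    "computer_science",
    "physics",
    "mathematics",
    "statistics",
    "quantitative_biology",
    "quantitative_finance",
]


def model_input(row):
    """Join title and abstract into a single text string."""
    return f"{row['title'].strip()} {row['abstract'].strip()}"


def label_summary(rows):
    """Count articles per label and articles with more than one label."""
    per_label = Counter()
    multi_label_rows = 0
    for row in rows:
        active = [col for col in LABEL_COLUMNS if row[col] == 1]
        per_label.update(active)
        if len(active) > 1:
            multi_label_rows += 1
    return per_label, multi_label_rows


# Made-up rows in the shape of the train table.
sample_rows = [
    {"id": 1, "title": "A title ", "abstract": " An abstract.",
     "computer_science": 1, "physics": 0, "mathematics": 0,
     "statistics": 1, "quantitative_biology": 0, "quantitative_finance": 0},
    {"id": 2, "title": "Another title", "abstract": "Another abstract.",
     "computer_science": 0, "physics": 1, "mathematics": 0,
     "statistics": 0, "quantitative_biology": 0, "quantitative_finance": 0},
]
counts, multi = label_summary(sample_rows)
```

The per-label counts give a quick sense of class imbalance, and the multi-label count shows how often an article belongs to more than one topic.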
