MIDAS Hand-Annotated News
A Corpus of Physician-Defined Topics for Data Science and Machine Learning
@kaggle.thedevastator_midas_hand_annotated_news
By [source]
This dataset is a hand-annotated collection of news articles covering five physician-defined topics: childhood obesity, mental health, diabetes, children in care, and infectious diseases, including coronavirus. It is provided in three formats (TXT, CSV, and JSON) to support research in data science and machine learning.
The source material consists of 2020 news articles, supplied as TXT files for convenience. Each article has been manually annotated with up to 10 Medical Subject Headings (MeSH) describing its content. In addition, a JSON file is provided for use with evaluation tools that measure classifier accuracy.
The dataset columns are x (article ID, integer), y (article text, string), and z[0] through z[8] (MeSH headings 1 to 9, strings).
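The annotation tables in the schema dump at the end of this page (eus, fin, inf, ire, nir) store each article's headings in wide form, as mesh_heading_1 through mesh_heading_10 with matching mesh_id_* columns. One quick way to check how many of the 10 slots are actually used is to count the non-empty heading columns per article. This is a sketch assuming SQLite or DuckDB; annot_demo and its rows are hypothetical stand-ins (mesh_id_* omitted), and blank cells are treated as missing on the assumption that a CSV import may load them as empty strings rather than NULL.

```sql
-- Hypothetical stand-in for one annotation table (mesh_id_* columns omitted).
CREATE TABLE annot_demo (
    "n" BIGINT,
    "article_name" VARCHAR,
    "mesh_heading_1" VARCHAR,
    "mesh_heading_2" VARCHAR,
    "mesh_heading_3" VARCHAR,
    "mesh_heading_4" VARCHAR,
    "mesh_heading_5" VARCHAR,
    "mesh_heading_6" VARCHAR,
    "mesh_heading_7" VARCHAR,
    "mesh_heading_8" VARCHAR,
    "mesh_heading_9" VARCHAR,
    "mesh_heading_10" VARCHAR
);

-- Made-up rows; heading columns not listed here default to NULL.
INSERT INTO annot_demo ("n", "article_name", "mesh_heading_1", "mesh_heading_2", "mesh_heading_3")
VALUES
    (1, 'article_a', 'Pediatric Obesity', 'Child', NULL),
    (2, 'article_b', 'Diabetes Mellitus', '', NULL),
    (3, 'article_c', 'Mental Health', 'Adolescent', 'Schools');

-- One row per article with the number of filled heading slots (0 to 10).
-- COALESCE maps NULL to '' so NULL and empty cells are both skipped.
CREATE VIEW headings_per_article AS
SELECT
    "n",
    "article_name",
    (CASE WHEN COALESCE("mesh_heading_1", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_2", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_3", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_4", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_5", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_6", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_7", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_8", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_9", '') <> '' THEN 1 ELSE 0 END
   + CASE WHEN COALESCE("mesh_heading_10", '') <> '' THEN 1 ELSE 0 END
    ) AS n_headings
FROM annot_demo;
```

To see the distribution, run `SELECT n_headings, COUNT(*) AS n_articles FROM headings_per_article GROUP BY n_headings ORDER BY n_headings;` after replacing annot_demo with one of the real tables.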
The TXT files hold the source text of every news article for each of the physician-defined topics. They are useful for an overview of all articles on a topic when no additional annotation is needed.
The CSV files contain the hand annotations, linking each news article to up to 10 MeSH headings. This makes them a good starting point for quickly extracting summaries and insights about different health topics. Because all MeSH headings for an article sit in one table row, they also help researchers compare how multiple topics intersect in the same article over time. The CSV columns are: Article ID; Article Text; MeSH Heading 1 through 9; and the topic label used for evaluation on the test set.
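To compare how topics intersect within the same article, one option is to reshape the wide heading columns into one row per (article, heading) and self-join that list to count heading pairs. This is a sketch for SQLite or DuckDB, not part of the dataset: cooc_demo, its rows, and the view names are hypothetical, and only three of the ten mesh_heading_* columns are reproduced, so extend the UNION ALL for the rest when running it on the real annotation tables.

```sql
-- Hypothetical demo rows in the annotation-table layout (first three heading slots only).
CREATE TABLE cooc_demo (
    "article_name" VARCHAR,
    "mesh_heading_1" VARCHAR,
    "mesh_heading_2" VARCHAR,
    "mesh_heading_3" VARCHAR
);

INSERT INTO cooc_demo VALUES
    ('article_a', 'Child', 'Pediatric Obesity', 'Schools'),
    ('article_b', 'Child', 'Pediatric Obesity', NULL),
    ('article_c', 'Child', 'Mental Health', '');

-- Wide-to-long: one row per (article, heading), skipping NULL and empty slots.
-- DISTINCT stops a heading entered twice for one article from counting twice.
CREATE VIEW article_heading AS
SELECT DISTINCT article_name, heading
FROM (
    SELECT "article_name" AS article_name, "mesh_heading_1" AS heading FROM cooc_demo
    UNION ALL
    SELECT "article_name", "mesh_heading_2" FROM cooc_demo
    UNION ALL
    SELECT "article_name", "mesh_heading_3" FROM cooc_demo
) AS slots
WHERE COALESCE(heading, '') <> '';

-- Number of articles in which each unordered pair of headings co-occurs.
CREATE VIEW heading_pairs AS
SELECT a.heading AS heading_a,
       b.heading AS heading_b,
       COUNT(*) AS n_articles
FROM article_heading AS a
JOIN article_heading AS b
  ON a.article_name = b.article_name
 AND a.heading < b.heading
GROUP BY a.heading, b.heading;
```

The condition `a.heading < b.heading` keeps each unordered pair once and drops self-pairs, so with the demo rows the pair (Child, Pediatric Obesity) is counted in two articles.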
Finally, the JSON file provides input for evaluating NLP models, such as deep learning classifiers, that label new articles from media sources based on their content and on the annotations made by healthcare experts. The JSON file contains:
- headers for the key components of each article, with values that add detail about their contents, such as domains (for example, whether sentiment analysis is included or excluded);
- subject heading keys and values, i.e. relevant keywords and phrases used in the article body, section titles, and headlines;
- metadata structure and entity location identifiers, including geographical details of the physical locations cited in the article.

This supports researchers developing machine learning models that could later be integrated into day-to-day workflows on healthcare platforms.
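Whatever the exact layout of the JSON file, evaluating a multi-label classifier comes down to comparing predicted and gold (article, heading) pairs. The sketch below computes micro-averaged precision, recall, and F1, pooling all pairs before dividing. It assumes SQLite or DuckDB, assumes each table lists every pair at most once, and uses hypothetical table names (gold_demo, pred_demo) with made-up rows; it does not read the JSON file itself.

```sql
-- Hypothetical gold (annotator) and predicted (classifier) labels,
-- one row per (article, heading) pair, each pair listed at most once.
CREATE TABLE gold_demo (article_name VARCHAR, heading VARCHAR);
CREATE TABLE pred_demo (article_name VARCHAR, heading VARCHAR);

INSERT INTO gold_demo VALUES
    ('article_a', 'Child'),
    ('article_a', 'Pediatric Obesity'),
    ('article_b', 'Diabetes Mellitus');

INSERT INTO pred_demo VALUES
    ('article_a', 'Child'),
    ('article_a', 'Mental Health'),
    ('article_b', 'Diabetes Mellitus'),
    ('article_b', 'Obesity');

-- Micro-averaging: true positives and totals are pooled over all articles.
-- NULLIF avoids division by zero when a table is empty.
CREATE VIEW micro_scores AS
SELECT
    tp,
    n_pred,
    n_gold,
    CAST(tp AS DOUBLE) / NULLIF(n_pred, 0) AS precision_micro,
    CAST(tp AS DOUBLE) / NULLIF(n_gold, 0) AS recall_micro,
    CAST(2 * tp AS DOUBLE) / NULLIF(n_pred + n_gold, 0) AS f1_micro
FROM (
    SELECT
        (SELECT COUNT(*)
           FROM pred_demo AS p
           JOIN gold_demo AS g
             ON p.article_name = g.article_name
            AND p.heading = g.heading) AS tp,
        (SELECT COUNT(*) FROM pred_demo) AS n_pred,
        (SELECT COUNT(*) FROM gold_demo) AS n_gold
) AS counts;
```

With the demo rows this gives tp = 2, precision 0.5, recall about 0.667, and F1 about 0.571 (2 * 2 / (4 + 3)). The F1 formula 2tp / (n_pred + n_gold) is algebraically the same as the harmonic mean of precision and recall.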
Combined, these formats give insight into complex areas involving not only diseases but also the many other factors that influence people's physical and mental wellbeing. Whether you want to extract summaries and tips from the hand-annotated articles or dig into the highly technical subject matter a research project may require, the dataset suits both newcomers and experienced professionals.
- Training a supervised machine learning model on the annotated MeSH headings to predict which medical topics a news article covers.
- Developing AI computer vision technology that can identify and analyze images in order to better categorize them according to their medical topics.
- Building algorithms that detect connections between the medical topics of different articles in real time, giving health professionals, journalists, and researchers faster access to meaningful insights into healthcare trends and developments.
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: re_IRE.csv
| Column name | Description |
|---|---|
| x | Article title. (String) |
| y | Article text. (String) |
| z[0] | MeSH heading 1. (String) |
| z[1] | MeSH heading 2. (String) |
| z[2] | MeSH heading 3. (String) |
| z[3] | MeSH heading 4. (String) |
| z[4] | MeSH heading 5. (String) |
| z[5] | MeSH heading 6. (String) |
| z[6] | MeSH heading 7. (String) |
| z[7] | MeSH heading 8. (String) |
| z[8] | MeSH heading 9. (String) |
File: re_INF.csv
| Column name | Description |
|---|---|
| x | Article title. (String) |
| y | Article text. (String) |
| z[0] | MeSH heading 1. (String) |
| z[1] | MeSH heading 2. (String) |
| z[2] | MeSH heading 3. (String) |
| z[3] | MeSH heading 4. (String) |
| z[4] | MeSH heading 5. (String) |
| z[5] | MeSH heading 6. (String) |
| z[6] | MeSH heading 7. (String) |
| z[7] | MeSH heading 8. (String) |
| z[8] | MeSH heading 9. (String) |
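Both files use the same wide layout, with one z[i] column per heading slot. As a sketch (SQLite or DuckDB assumed), the view below reshapes that layout into one row per (article, slot, heading), which is easier to count, filter, or join. The schema dump below declares x, y, and z_0 to z_8 as DOUBLE in re_ire and re_inf, whereas the tables above describe them as strings, so the view filters only with IS NOT NULL, which works for either type. zcols_demo and its rows are hypothetical and follow the string description.

```sql
-- Hypothetical demo table in the x / y / z_0..z_8 layout.
CREATE TABLE zcols_demo (
    "x" VARCHAR, "y" VARCHAR,
    "z_0" VARCHAR, "z_1" VARCHAR, "z_2" VARCHAR,
    "z_3" VARCHAR, "z_4" VARCHAR, "z_5" VARCHAR,
    "z_6" VARCHAR, "z_7" VARCHAR, "z_8" VARCHAR
);

-- Made-up rows; unlisted z columns default to NULL.
INSERT INTO zcols_demo ("x", "y", "z_0", "z_1")
VALUES
    ('Title A', 'Text of article A', 'Child', 'Pediatric Obesity'),
    ('Title B', 'Text of article B', 'Diabetes Mellitus', NULL);

-- Wide-to-long, keeping the slot index so the original z[i] position survives.
-- For VARCHAR columns you may also want to drop '' values in each WHERE clause.
CREATE VIEW z_long AS
SELECT "x" AS article, 0 AS slot, "z_0" AS heading FROM zcols_demo WHERE "z_0" IS NOT NULL
UNION ALL SELECT "x", 1, "z_1" FROM zcols_demo WHERE "z_1" IS NOT NULL
UNION ALL SELECT "x", 2, "z_2" FROM zcols_demo WHERE "z_2" IS NOT NULL
UNION ALL SELECT "x", 3, "z_3" FROM zcols_demo WHERE "z_3" IS NOT NULL
UNION ALL SELECT "x", 4, "z_4" FROM zcols_demo WHERE "z_4" IS NOT NULL
UNION ALL SELECT "x", 5, "z_5" FROM zcols_demo WHERE "z_5" IS NOT NULL
UNION ALL SELECT "x", 6, "z_6" FROM zcols_demo WHERE "z_6" IS NOT NULL
UNION ALL SELECT "x", 7, "z_7" FROM zcols_demo WHERE "z_7" IS NOT NULL
UNION ALL SELECT "x", 8, "z_8" FROM zcols_demo WHERE "z_8" IS NOT NULL;
```

The explicit UNION ALL is used instead of a dialect-specific UNPIVOT so the same view runs in both SQLite and DuckDB.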
CREATE TABLE eus (
"n" BIGINT, -- #
"article_name" VARCHAR,
"mesh_heading_1" VARCHAR,
"mesh_id_1" VARCHAR,
"mesh_heading_2" VARCHAR,
"mesh_id_2" VARCHAR,
"mesh_heading_3" VARCHAR,
"mesh_id_3" VARCHAR,
"mesh_heading_4" VARCHAR,
"mesh_id_4" VARCHAR,
"mesh_heading_5" VARCHAR,
"mesh_id_5" VARCHAR,
"mesh_heading_6" VARCHAR,
"mesh_id_6" VARCHAR,
"mesh_heading_7" VARCHAR,
"mesh_id_7" VARCHAR,
"mesh_heading_8" VARCHAR,
"mesh_id_8" VARCHAR,
"mesh_heading_9" VARCHAR,
"mesh_id_9" VARCHAR,
"mesh_heading_10" VARCHAR,
"mesh_id_10" VARCHAR,
"unnamed_22" VARCHAR, -- Unnamed: 22
"unnamed_23" VARCHAR -- Unnamed: 23
);

CREATE TABLE f1_eus (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE f1_fin (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE f1_inf (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE f1_ire (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE f1_nir (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE fin (
"n" BIGINT, -- #
"article_name" VARCHAR,
"mesh_heading_1" VARCHAR,
"mesh_id_1" VARCHAR,
"mesh_heading_2" VARCHAR,
"mesh_id_2" VARCHAR,
"mesh_heading_3" VARCHAR,
"mesh_id_3" VARCHAR,
"mesh_heading_4" VARCHAR,
"mesh_id_4" VARCHAR,
"mesh_heading_5" VARCHAR,
"mesh_id_5" VARCHAR,
"mesh_heading_6" VARCHAR,
"mesh_id_6" VARCHAR,
"mesh_heading_7" VARCHAR,
"mesh_id_7" VARCHAR,
"mesh_heading_8" VARCHAR,
"mesh_id_8" VARCHAR,
"mesh_heading_9" VARCHAR,
"mesh_id_9" VARCHAR,
"mesh_heading_10" VARCHAR,
"mesh_id_10" VARCHAR
);

CREATE TABLE inf (
"n" BIGINT, -- #
"article_name" VARCHAR,
"mesh_heading_1" VARCHAR,
"mesh_id_1" VARCHAR,
"mesh_heading_2" VARCHAR,
"mesh_id_2" VARCHAR,
"mesh_heading_3" VARCHAR,
"mesh_id_3" VARCHAR,
"mesh_heading_4" VARCHAR,
"mesh_id_4" VARCHAR,
"mesh_heading_5" VARCHAR,
"mesh_id_5" VARCHAR,
"mesh_heading_6" VARCHAR,
"mesh_id_6" VARCHAR,
"mesh_heading_7" VARCHAR,
"mesh_id_7" VARCHAR,
"mesh_heading_8" VARCHAR,
"mesh_id_8" VARCHAR,
"mesh_heading_9" VARCHAR,
"mesh_id_9" VARCHAR,
"mesh_heading_10" VARCHAR,
"mesh_id_10" VARCHAR
);

CREATE TABLE ire (
"n" BIGINT, -- #
"article_name" VARCHAR,
"mesh_heading_1" VARCHAR,
"mesh_id_1" VARCHAR,
"mesh_heading_2" VARCHAR,
"mesh_id_2" VARCHAR,
"mesh_heading_3" VARCHAR,
"mesh_id_3" VARCHAR,
"mesh_heading_4" VARCHAR,
"mesh_id_4" VARCHAR,
"mesh_heading_5" VARCHAR,
"mesh_id_5" VARCHAR,
"mesh_heading_6" VARCHAR,
"mesh_id_6" VARCHAR,
"mesh_heading_7" VARCHAR,
"mesh_id_7" VARCHAR,
"mesh_heading_8" VARCHAR,
"mesh_id_8" VARCHAR,
"mesh_heading_9" VARCHAR,
"mesh_id_9" VARCHAR,
"mesh_heading_10" VARCHAR,
"mesh_id_10" VARCHAR,
"unnamed_22" VARCHAR, -- Unnamed: 22
"unnamed_23" VARCHAR, -- Unnamed: 23
"unnamed_24" VARCHAR, -- Unnamed: 24
"unnamed_25" VARCHAR -- Unnamed: 25
);

CREATE TABLE nir (
"n" BIGINT, -- #
"article_name" VARCHAR,
"mesh_heading_1" VARCHAR,
"mesh_id_1" VARCHAR,
"mesh_heading_2" VARCHAR,
"mesh_id_2" VARCHAR,
"mesh_heading_3" VARCHAR,
"mesh_id_3" VARCHAR,
"mesh_heading_4" VARCHAR,
"mesh_id_4" VARCHAR,
"mesh_heading_5" VARCHAR,
"mesh_id_5" VARCHAR,
"mesh_heading_6" VARCHAR,
"mesh_id_6" VARCHAR,
"mesh_heading_7" VARCHAR,
"mesh_id_7" VARCHAR,
"mesh_heading_8" VARCHAR,
"mesh_id_8" VARCHAR,
"mesh_heading_9" VARCHAR,
"mesh_id_9" VARCHAR,
"mesh_heading_10" VARCHAR,
"mesh_id_10" VARCHAR
);

CREATE TABLE pr_eus (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE pr_fin (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE pr_inf (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE pr_ire (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE pr_nir (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE re_eus (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE re_fin (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE re_inf (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE re_ire (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);

CREATE TABLE re_nir (
"x" DOUBLE,
"y" DOUBLE,
"z_0" DOUBLE, -- Z[0]
"z_1" DOUBLE, -- Z[1]
"z_2" DOUBLE, -- Z[2]
"z_3" DOUBLE, -- Z[3]
"z_4" DOUBLE, -- Z[4]
"z_5" DOUBLE, -- Z[5]
"z_6" DOUBLE, -- Z[6]
"z_7" DOUBLE, -- Z[7]
"z_8" DOUBLE -- Z[8]
);
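The five annotation tables above (eus, fin, inf, ire, nir) share the n, article_name, mesh_heading_* and mesh_id_* columns, but eus also has unnamed_22 and unnamed_23, and ire has unnamed_22 to unnamed_25, so combining them with SELECT * and UNION ALL would fail on the differing column counts. Below is a sketch (SQLite or DuckDB assumed) that stacks the tables by naming the shared columns explicitly and tags each row with its source table; eus_demo, fin_demo, and their rows are hypothetical and keep only a few of the real columns.

```sql
-- Hypothetical stand-ins: eus_demo keeps an extra unnamed_* column and
-- fin_demo does not, mirroring the mismatch in the real schema.
CREATE TABLE eus_demo (
    "n" BIGINT,
    "article_name" VARCHAR,
    "mesh_heading_1" VARCHAR,
    "unnamed_22" VARCHAR
);
CREATE TABLE fin_demo (
    "n" BIGINT,
    "article_name" VARCHAR,
    "mesh_heading_1" VARCHAR
);

INSERT INTO eus_demo VALUES
    (1, 'article_a', 'Child', NULL),
    (2, 'article_b', 'Mental Health', NULL);
INSERT INTO fin_demo VALUES
    (1, 'article_c', 'Diabetes Mellitus');

-- Name the shared columns explicitly; SELECT * would yield 4 vs 3 columns.
CREATE VIEW all_annotations AS
SELECT 'eus' AS source_table, "n", "article_name", "mesh_heading_1" FROM eus_demo
UNION ALL
SELECT 'fin', "n", "article_name", "mesh_heading_1" FROM fin_demo;

-- Number of annotated articles contributed by each source table.
CREATE VIEW articles_per_table AS
SELECT source_table, COUNT(*) AS n_articles
FROM all_annotations
GROUP BY source_table;
```

On the real data, add inf, ire, and nir as further UNION ALL branches and list the remaining shared columns. If n is a per-file row number, as its original header "#" suggests, it is only unique together with source_table.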