
Symptoms To Diseases

@kaggle.abhishekgodara_symptoms_to_diseases


Symptom-Based Disease Dataset

This dataset is part of the Digital Diagnosis Project, an AI-based initiative to create a comprehensive, machine-readable symptom–disease dataset for research, experimentation, ML models and medical NLP tasks.

It combines two versions of the same data source:

A structured, raw dataset with 713 diseases and 377 binary symptom columns.

A processed, NLP-ready dataset with 254 diseases and natural-language symptom descriptions.

Together, they form one of the most versatile open-source datasets for both classical ML and deep learning (Transformer-based) medical research.

It is designed for classical machine learning tasks such as multi-label classification, for NLP tasks, and for fine-tuning LLMs.

First Dataset (data.csv)

| Attribute | Description |
|-----------|-------------|
| Rows (Diseases) | 713 |
| Columns (Symptoms) | 377 |
| Data Type | Binary (0 = symptom absent, 1 = symptom present) |
| Target Variable | Disease |
| Use Case | ML-based disease prediction |

Sample data:

| disease | fever | cough | headache | nausea | chest_pain | ... |
|---------|-------|-------|----------|--------|------------|-----|
| influenza | 1 | 1 | 1 | 0 | 0 | ... |
| migraine | 0 | 0 | 1 | 1 | 0 | ... |
| heart_attack | 0 | 0 | 0 | 1 | 1 | ... |
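As a minimal sketch of how the binary layout could be used, the snippet below reads rows shaped like the sample above and ranks diseases by Jaccard similarity against a set of observed symptoms. It is an illustration only: the inline CSV copies the three sample rows rather than the real data.csv, the comma delimiter and one-row-per-disease layout are assumptions, and `load_symptom_matrix` and `rank_diseases` are hypothetical helper names, not part of the dataset.

```python
import csv
import io

# Inline copy of the sample rows shown above (assumption: the real data.csv
# is comma-delimited and has the same "disease" column plus 377 symptom columns).
SAMPLE_CSV = """disease,fever,cough,headache,nausea,chest_pain
influenza,1,1,1,0,0
migraine,0,0,1,1,0
heart_attack,0,0,0,1,1
"""


def load_symptom_matrix(handle):
    """Return {disease: set of symptom names flagged 1} from a binary symptom CSV.

    Assumes one row per disease; a repeated disease would overwrite the earlier row.
    """
    reader = csv.DictReader(handle)
    matrix = {}
    for row in reader:
        disease = row.pop("disease")
        matrix[disease] = {name for name, flag in row.items() if flag.strip() == "1"}
    return matrix


def rank_diseases(matrix, observed):
    """Rank diseases by Jaccard similarity between observed and recorded symptoms."""
    observed = set(observed)
    scores = []
    for disease, symptoms in matrix.items():
        union = symptoms | observed
        score = len(symptoms & observed) / len(union) if union else 0.0
        scores.append((disease, score))
    # Highest similarity first; sorted() is stable, so ties keep file order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


matrix = load_symptom_matrix(io.StringIO(SAMPLE_CSV))
ranking = rank_diseases(matrix, {"headache", "nausea"})
print(ranking[0])  # ('migraine', 1.0)
```

To try this on the actual file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for data.csv. A similarity lookup like this is only a baseline; a trained classifier would normally replace `rank_diseases`.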

Second Dataset (final_symptoms_to_disease.csv)

| Attribute | Description |
|-----------|-------------|
| Rows (Diseases) | 254 |
| Format | Each row pairs a natural-language description of symptoms with its corresponding disease. |
| Data Type | Text + Label |
| Target Variable | Disease |
| Use Case | NLP and deep learning models such as BERT, BioBERT, DistilBERT, and LSTM |

Sample data:

| disease | symptom_text |
|---------|--------------|
| influenza | fever, cough, sore throat, and headache |
| asthma | persistent cough, chest tightness, wheezing |
| heart_attack | sudden chest pain, sweating, nausea |
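Before the text data can be fed to a model such as BERT or an LSTM, the disease labels usually need integer ids, and it can help to split each description into individual symptom phrases. The sketch below shows one way to do both with the standard library. It is an illustration only: the inline CSV copies the three sample rows, the quoting and comma delimiter are assumptions about final_symptoms_to_disease.csv, and `split_symptoms` and `load_examples` are hypothetical helper names.

```python
import csv
import io
import re

# Inline copy of the sample rows shown above (assumption: the real file uses
# the same two columns, with symptom_text quoted because it contains commas).
SAMPLE_CSV = '''disease,symptom_text
influenza,"fever, cough, sore throat, and headache"
asthma,"persistent cough, chest tightness, wheezing"
heart_attack,"sudden chest pain, sweating, nausea"
'''


def split_symptoms(text):
    """Split a comma-separated description into cleaned symptom phrases."""
    phrases = []
    for part in text.split(","):
        # Lowercase, trim, and drop a leading "and" left over from the final list item.
        phrase = re.sub(r"^\s*and\s+", "", part.strip().lower())
        if phrase:
            phrases.append(phrase)
    return phrases


def load_examples(handle):
    """Return a list of (symptom_text, disease) pairs from the text dataset."""
    reader = csv.DictReader(handle)
    return [(row["symptom_text"], row["disease"]) for row in reader]


examples = load_examples(io.StringIO(SAMPLE_CSV))

# Map each disease name to a stable integer id (alphabetical order).
label_to_id = {label: i for i, label in enumerate(sorted({d for _, d in examples}))}
encoded = [(text, label_to_id[label]) for text, label in examples]

print(split_symptoms(examples[0][0]))  # ['fever', 'cough', 'sore throat', 'headache']
print(encoded[0][1])                   # 2
```

The `encoded` pairs are in the shape most text-classification pipelines expect (raw text plus an integer label); tokenization for a specific model would be a separate step.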
