Symptoms To Diseases
@kaggle.abhishekgodara_symptoms_to_diseases
This dataset is part of the Digital Diagnosis Project, an AI-based initiative to create a comprehensive, machine-readable symptom–disease dataset for research, experimentation, ML models and medical NLP tasks.
It combines two versions of the same data source:
- A structured, raw dataset with 713 diseases and 377 binary symptom columns.
- A processed, NLP-ready dataset with 254 diseases and natural-language symptom descriptions.
Together, the two versions support both classical ML and deep learning (Transformer-based) medical research.
The dataset is designed for classical machine learning tasks such as multi-label classification, for NLP tasks, and for fine-tuning LLMs.
Structured (raw) dataset:

| Attribute | Description |
|---|---|
| Rows (Diseases) | 713 |
| Columns (Symptoms) | 377 |
| Data Type | Binary (0 = symptom absent, 1 = symptom present) |
| Target Variable | Disease |
| Use Case | ML-based disease prediction |
Sample rows from the structured dataset:
| disease | fever | cough | headache | nausea | chest_pain | ... |
|---|---|---|---|---|---|---|
| influenza | 1 | 1 | 1 | 0 | 0 | ... |
| migraine | 0 | 0 | 1 | 1 | 0 | ... |
| heart_attack | 0 | 0 | 0 | 1 | 1 | ... |
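
The binary layout above lends itself to simple distance-based matching. The following is a minimal sketch, not the project's own method: it parses the three sample rows shown above (the real file has 377 symptom columns) and returns the disease whose symptom vector is closest to a set of observed symptoms by Hamming distance. The names `SAMPLE_CSV`, `load_rows`, and `predict_disease` are hypothetical and chosen for illustration only.

```python
import csv
import io

# Toy rows copied from the sample table; the full dataset has 377 symptom columns.
SAMPLE_CSV = """disease,fever,cough,headache,nausea,chest_pain
influenza,1,1,1,0,0
migraine,0,0,1,1,0
heart_attack,0,0,0,1,1
"""


def load_rows(text):
    """Parse CSV text into (disease, {symptom: 0 or 1}) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        disease = record.pop("disease")
        rows.append((disease, {name: int(flag) for name, flag in record.items()}))
    return rows


def predict_disease(rows, observed):
    """Return the disease whose symptom vector has the smallest
    Hamming distance to `observed` (missing symptoms count as 0)."""
    def distance(vector):
        return sum(1 for name, flag in vector.items() if flag != observed.get(name, 0))

    return min(rows, key=lambda row: distance(row[1]))[0]


rows = load_rows(SAMPLE_CSV)
print(predict_disease(rows, {"headache": 1, "nausea": 1}))  # prints "migraine"
```

On the full dataset a trained classifier would normally replace this nearest-match rule; the sketch only shows how the binary columns map to feature vectors.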
Processed (NLP-ready) dataset:

| Attribute | Description |
|---|---|
| Rows (Diseases) | 254 |
| Format | Each row represents a natural-language description of symptoms and its corresponding disease. |
| Data Type | Text + Label |
| Target Variable | Disease |
| Use Case | NLP and deep learning models such as BERT, BioBERT, DistilBERT, and LSTM. |
Sample rows from the processed dataset:
| disease | symptom_text |
|---|---|
| influenza | fever, cough, sore throat, and headache |
| asthma | persistent cough, chest tightness, wheezing |
| heart_attack | sudden chest pain, sweating, nausea |
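
Text classifiers such as BERT-style models usually expect each label as an integer id. The following is a minimal sketch of that preparation step, using only the three sample rows shown above. The names `SAMPLES`, `build_label_map`, and `encode` are hypothetical, and the alphabetical id assignment is an assumption rather than something the dataset prescribes.

```python
# Toy (disease, symptom_text) pairs copied from the sample table.
SAMPLES = [
    ("influenza", "fever, cough, sore throat, and headache"),
    ("asthma", "persistent cough, chest tightness, wheezing"),
    ("heart_attack", "sudden chest pain, sweating, nausea"),
]


def build_label_map(samples):
    """Assign each distinct disease an integer id, in alphabetical order."""
    labels = sorted({disease for disease, _ in samples})
    return {disease: index for index, disease in enumerate(labels)}


def encode(samples, label_map):
    """Return (text, label_id) pairs ready to pass to a tokenizer and classifier."""
    return [(text, label_map[disease]) for disease, text in samples]


label_map = build_label_map(SAMPLES)
encoded = encode(SAMPLES, label_map)
print(label_map)   # {'asthma': 0, 'heart_attack': 1, 'influenza': 2}
print(encoded[0])  # ('fever, cough, sore throat, and headache', 2)
```

Sorting the labels keeps the mapping reproducible across runs, which matters when a saved model is later used to decode predictions back into disease names.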