Baselight

Diabetes Dataset

Predicting Diabetes Onset Based on Diagnostic Measures

@kaggle.hasibur013_diabetes_dataset

About this Dataset


This dataset contains diagnostic medical measurements collected for predicting the onset of diabetes from several health factors. It consists of 768 records of female patients, each described by 8 health-related attributes plus a binary Outcome label indicating whether the patient has diabetes (1) or not (0). The dataset can be used to train and test machine learning models for diabetes classification.
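As a starting point for the train/test workflow described above, here is a minimal Python sketch that uses only the standard library. It assumes the data is stored locally as a CSV file whose header row uses the column names listed on this page (any filename you use is hypothetical). It also uses a stratified split, so the train and test sets keep roughly the same proportion of Outcome = 1 cases.

```python
import csv
import random

# Column names as listed on this dataset page; assumed to match the CSV header.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"


def load_diabetes_csv(path):
    """Read the CSV into (X, y): X holds 8 floats per record, y the 0/1 Outcome."""
    X, y = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            X.append([float(row[name]) for name in FEATURES])
            y.append(int(row[TARGET]))
    return X, y


def stratified_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle each class separately so train and test keep a similar 0/1 ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        # Indices of all records belonging to this class, in random order.
        idx = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])

    def take(seq, positions):
        return [seq[i] for i in positions]

    return take(X, train_idx), take(X, test_idx), take(y, train_idx), take(y, test_idx)
```

Usage, assuming a hypothetical local file named diabetes.csv: `X, y = load_diabetes_csv("diabetes.csv")`, then `X_train, X_test, y_train, y_test = stratified_split(X, y)`. Stratifying is a design choice. It prevents an imbalanced Outcome column from producing a test set with very few positive cases.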

Dataset Columns:

  • Pregnancies: Number of times the patient has been pregnant.
  • Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test.
  • BloodPressure: Diastolic blood pressure (mm Hg).
  • SkinThickness: Triceps skinfold thickness (mm).
  • Insulin: 2-hour serum insulin (μU/ml).
  • BMI: Body mass index (weight in kg/(height in m)^2).
  • DiabetesPedigreeFunction: A score representing the patient’s diabetes pedigree (i.e., the likelihood of diabetes based on family history).
  • Age: Age of the patient (years).
  • Outcome: Binary class label (0 or 1), where 1 indicates the presence of diabetes and 0 its absence.
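In common practice with this dataset, a 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI is treated as a missing measurement rather than a real reading, because zero is not physiologically plausible for these columns. Pregnancies, by contrast, can legitimately be 0. The sketch below assumes that convention, which is worth checking against your copy of the data. It replaces such zeros with the median of the non-zero values in each column and expects each record as a dict keyed by the column names above.

```python
from statistics import median

# Assumption: a 0 in these columns means "not recorded", not a true measurement.
# Pregnancies and Outcome can genuinely be 0, so they are deliberately excluded.
ZERO_MEANS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros_with_median(rows, columns=ZERO_MEANS_MISSING):
    """Return copies of the row dicts with 0 replaced by the column's non-zero median."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] != 0]
        # Fall back to 0.0 if a column has no non-zero values at all.
        medians[col] = median(observed) if observed else 0.0
    cleaned = []
    for row in rows:
        fixed = dict(row)  # copy so the caller's data is left unchanged
        for col in columns:
            if fixed[col] == 0:
                fixed[col] = medians[col]
        cleaned.append(fixed)
    return cleaned
```

To avoid leaking information from the test set, the medians would ideally be computed on the training split only and then applied to both splits. As written, the function computes them from whatever rows it receives.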

Usage:

This dataset can be used for:

  • Building classification models to predict diabetes onset.
  • Exploratory data analysis to identify trends and correlations among health indicators.
  • Feature engineering and feature selection for healthcare-related datasets.
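For the exploratory-analysis and feature-selection uses listed above, one simple first look is the Pearson correlation between each attribute and Outcome. With a 0/1 outcome, this equals the point-biserial correlation. The sketch below ranks features by the absolute value of that correlation and assumes the data is already loaded as a list of numeric feature rows `X` and a list of 0/1 labels `y`. It only captures linear, one-feature-at-a-time association, so treat it as a screening aid rather than a substitute for model-based feature selection.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant column; reported as 0 in this sketch
    return cov / (sx * sy)


def rank_features_by_outcome_correlation(X, y, names):
    """Return (name, r) pairs sorted by |r|, strongest association with Outcome first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in X]
        scores.append((name, pearson(column, y)))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)
```

For example, call `rank_features_by_outcome_correlation(X, y, FEATURE_NAMES)`, where `FEATURE_NAMES` is a hypothetical list holding the eight column names in the same order as the columns of `X`.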

Source:

This dataset is adapted from data originally provided by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and is widely used in machine learning research on healthcare and medical diagnostics.
