Overview
Dataset Title: Diabetes Dataset 2019
Year: 2019
Variables (Columns)
Pregnancies: Number of times pregnant
Glucose: Plasma glucose concentration (mg/dL)
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skinfold thickness (mm)
Insulin: 2-hour serum insulin (μU/mL)
BMI: Body mass index (weight in kg / (height in m)^2)
DiabetesPedigreeFunction: Diabetes pedigree function (a measure of genetic influence)
Age: Age (years)
Outcome: Binary variable indicating the presence (1) or absence (0) of diabetes
Data Examples:
The dataset contains multiple rows, with each row representing an individual case or patient.
Each row includes information on the number of pregnancies, glucose levels, blood pressure, skin thickness, insulin levels, BMI, diabetes pedigree function, age, and outcome (diabetes presence or absence).
The dataset is intended for studying the relationship between various factors (e.g., pregnancies, glucose levels, BMI) and the presence or absence of diabetes.
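As a minimal sketch of the row layout described above, the following Python snippet parses CSV text into typed records. The column names come from the variable list. The file layout (a header row using exactly those names), the PatientRecord class, the parse_records helper, and the choice of integer versus float types are assumptions made for illustration. The two sample rows are made-up placeholder values, not values taken from the dataset.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """One dataset row (hypothetical container, not shipped with the dataset)."""
    pregnancies: int
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    diabetes_pedigree_function: float
    age: int
    outcome: int  # 1 = diabetes present, 0 = diabetes absent


def parse_records(csv_text: str) -> list[PatientRecord]:
    """Parse CSV text whose header row uses the column names listed above."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        outcome = int(row["Outcome"])
        if outcome not in (0, 1):
            raise ValueError(f"Outcome must be 0 or 1, got {outcome}")
        records.append(PatientRecord(
            pregnancies=int(row["Pregnancies"]),
            glucose=float(row["Glucose"]),
            blood_pressure=float(row["BloodPressure"]),
            skin_thickness=float(row["SkinThickness"]),
            insulin=float(row["Insulin"]),
            bmi=float(row["BMI"]),
            diabetes_pedigree_function=float(row["DiabetesPedigreeFunction"]),
            age=int(row["Age"]),
            outcome=outcome,
        ))
    return records


# Made-up placeholder rows for illustration only; NOT real dataset values.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.45,35,0
5,160,85,30,150,33.1,0.80,50,1
"""

records = parse_records(SAMPLE_CSV)
print(records[0])
```

Validating Outcome while parsing keeps malformed labels from reaching later analysis steps. To read an actual file, the same helper could be given the file's text contents, assuming the header matches.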
Diabetes Dataset Analysis
Exploratory Data Analysis (EDA): Explore the distributions, relationships, and summary statistics of the variables.
Predictive Modeling: Develop predictive models to determine the likelihood of diabetes based on the given variables.
Feature Importance: Assess the importance of each variable in predicting the presence or absence of diabetes.
Risk Assessment: Identify key risk factors associated with diabetes based on the dataset.
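The first two analysis steps above can be sketched with the Python standard library alone. This example compares per-outcome means (a simple EDA summary) and then fits a one-variable logistic regression by gradient descent (a simple predictive model). The source does not name a modeling method, so logistic regression is a choice made here for illustration. The helper names (mean_by_outcome, fit_logistic), the learning rate, and the step count are assumptions, and the four rows are made-up placeholder values, not dataset values.

```python
import math
from statistics import mean, pstdev

# Made-up placeholder rows for illustration only; NOT real dataset values.
rows = [
    {"Glucose": 100.0, "BMI": 24.0, "Outcome": 0},
    {"Glucose": 110.0, "BMI": 26.0, "Outcome": 0},
    {"Glucose": 150.0, "BMI": 32.0, "Outcome": 1},
    {"Glucose": 170.0, "BMI": 34.0, "Outcome": 1},
]


def mean_by_outcome(rows, column):
    """EDA helper: return {outcome: mean of column} for each outcome present."""
    groups = {}
    for row in rows:
        groups.setdefault(row["Outcome"], []).append(row[column])
    return {outcome: mean(values) for outcome, values in groups.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log loss
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# EDA: compare group means for two variables.
for column in ("Glucose", "BMI"):
    print(column, mean_by_outcome(rows, column))

# Predictive modeling: standardize glucose, then fit the model.
glucose = [row["Glucose"] for row in rows]
labels = [row["Outcome"] for row in rows]
mu, sigma = mean(glucose), pstdev(glucose)
z_scores = [(g - mu) / sigma for g in glucose]
w, b = fit_logistic(z_scores, labels)


def predict_probability(glucose_value):
    """Estimated probability of Outcome = 1 for a given glucose value."""
    return sigmoid(w * (glucose_value - mu) / sigma + b)


print(predict_probability(105.0), predict_probability(165.0))
```

With standardized inputs, the sign and size of the fitted weight give a rough indication of how strongly a variable is associated with the outcome, which is one simple way to approach the feature importance step. On the full dataset, a multi-variable model and a held-out test split would be needed before drawing any conclusions.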