Baselight

Cardiovascular Disease

predict the presence or absence of cardiovascular disease.

@kaggle.colewelkins_cardiovascular_disease

About this Dataset

Cardiovascular Disease

Cardiovascular Disease Dataset Description

Overview

This dataset consolidates information from two primary sources:

  1. UCI Machine Learning Repository - Heart Disease Dataset
  2. Kaggle - Heart Disease Dataset by YasserH

The primary aim is to predict the presence or absence of cardiovascular disease based on various patient metrics.

Variables

  • ID: Unique identifier for each patient.
  • age: Age of the patient in days.
  • age_years: Age of the patient in years (derived from age).
  • gender: Gender of the patient. Categorical variable (1: Female, 2: Male).
  • height: Height of the patient in centimeters.
  • weight: Weight of the patient in kilograms.
  • ap_hi: Systolic blood pressure.
  • ap_lo: Diastolic blood pressure.
  • cholesterol: Cholesterol levels. Categorical variable (1: Normal, 2: Above Normal, 3: Well Above Normal).
  • gluc: Glucose levels. Categorical variable (1: Normal, 2: Above Normal, 3: Well Above Normal).
  • smoke: Smoking status. Binary variable (0: Non-smoker, 1: Smoker).
  • alco: Alcohol intake. Binary variable (0: Does not consume alcohol, 1: Consumes alcohol).
  • active: Physical activity. Binary variable (0: Not physically active, 1: Physically active).
  • cardio: Presence or absence of cardiovascular disease. Target variable. Binary (0: Absence, 1: Presence).
  • bmi: Body Mass Index, derived from weight and height. Calculated as ( \text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2} ).
  • bp_category: Blood pressure category based on ap_hi and ap_lo. Categories include "Normal", "Elevated", "Hypertension Stage 1", "Hypertension Stage 2", and "Hypertensive Crisis".
  • bp_category_encoded: Encoded form of bp_category for machine learning purposes.

Data Source and Acknowledgment

Share link

Anyone who has the link will be able to view this.