Predict Survival Of Patients With Heart Failure

Heart Failure Clinical Records

@kaggle.rabieelkharoua_predict_survival_of_patients_with_heart_failure

About this Dataset

Quick Start 🚀: If you're not up for reading all of this, head straight to the file section. There, you'll find detailed explanations of the files and all the variables you need.

This dataset contains the medical records of 299 patients who had heart failure, collected during their follow-up period, where each patient profile has 13 clinical features.

Dataset Characteristics: Multivariate

Subject Area: Health and Medicine

Associated Tasks: Classification, Regression, Clustering

Feature Type: Integer, Real

Instances: 299

Features: 12 (plus the death event target, for 13 columns in total)

Dataset Information

A detailed description of the dataset can be found in the Dataset section of the following paper:

Title:
Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone

Authors:
Davide Chicco, Giuseppe Jurman

Source:
BMC Medical Informatics and Decision Making 20, 16 (2020)

DOI:
https://doi.org/10.1186/s12911-020-1023-5

Dataset Details

| Feature | Explanation | Measurement | Range |
|---|---|---|---|
| Age | Age of the patient | Years | [40, ..., 95] |
| Anaemia | Decrease of red blood cells or hemoglobin | Boolean | 0, 1 |
| High blood pressure | If a patient has hypertension | Boolean | 0, 1 |
| Creatinine phosphokinase (CPK) | Level of the CPK enzyme in the blood | mcg/L | [23, ..., 7861] |
| Diabetes | If the patient has diabetes | Boolean | 0, 1 |
| Ejection fraction | Percentage of blood leaving the heart at each contraction | Percentage | [14, ..., 80] |
| Sex | Woman or man | Binary | 0, 1 |
| Platelets | Platelets in the blood | kiloplatelets/mL | [25.01, ..., 850.00] |
| Serum creatinine | Level of creatinine in the blood | mg/dL | [0.50, ..., 9.40] |
| Serum sodium | Level of sodium in the blood | mEq/L | [114, ..., 148] |
| Smoking | If the patient smokes | Boolean | 0, 1 |
| Time | Follow-up period | Days | [4, ..., 285] |
| Death event (target) | If the patient died during the follow-up period | Boolean | 0, 1 |
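
The documented ranges can double as a quick sanity check on incoming records. The sketch below is one possible way to do that in plain Python. The field names (`age`, `serum_creatinine`, and so on) are hypothetical and may not match the column names in the actual file, and the file may store platelets on a different scale than the kiloplatelets/mL used in the table, so units should be checked before the bounds are applied.

```python
# Inclusive bounds copied from the feature table above.
# Keys are hypothetical field names; the real file may spell them differently.
RANGES = {
    "age": (40, 95),
    "creatinine_phosphokinase": (23, 7861),
    "ejection_fraction": (14, 80),
    "platelets": (25.01, 850.00),  # kiloplatelets/mL, as in the table
    "serum_creatinine": (0.50, 9.40),
    "serum_sodium": (114, 148),
    "time": (4, 285),
}
BINARY = ("anaemia", "high_blood_pressure", "diabetes", "sex", "smoking", "death_event")


def out_of_range(record):
    """Return the sorted names of fields whose values fall outside the documented ranges."""
    bad = []
    for name, (low, high) in RANGES.items():
        if name in record and not low <= record[name] <= high:
            bad.append(name)
    for name in BINARY:
        if name in record and record[name] not in (0, 1):
            bad.append(name)
    return sorted(bad)


# A made-up record for illustration only, not a row from the dataset.
example = {"age": 60, "ejection_fraction": 38, "serum_creatinine": 1.1, "sex": 1, "time": 300}
print(out_of_range(example))  # ['time'], because 300 days exceeds the documented maximum of 285
```

Fields missing from a record are skipped rather than flagged, which keeps the check usable on partial records.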

Statistical quantitative description of the category features

#: number of patients. %: percentage of patients. Full sample: 299 individuals. Dead patients: 96 individuals. Survived patients: 203 individuals.

| Category feature | Full sample (#) | Full sample (%) | Dead patients (#) |
|---|---|---|---|
| Anaemia (0: false) | 170 | 56.86 | 50 |
| Anaemia (1: true) | 129 | 43.14 | 46 |
| High blood pressure (0: false) | 194 | 64.88 | 57 |
| High blood pressure (1: true) | 105 | 35.12 | 39 |
| Diabetes (0: false) | 174 | 58.19 | 56 |
| Diabetes (1: true) | 125 | 41.81 | 40 |
| Sex (0: woman) | 105 | 35.12 | 34 |
| Sex (1: man) | 194 | 64.88 | 62 |
| Smoking (0: false) | 203 | 67.89 | 66 |
| Smoking (1: true) | 96 | 32.11 | 30 |
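
The percentages in the table follow directly from the counts and the full sample size of 299. As a small worked example, the sketch below recomputes them. The dictionary keys and the helper name `percentages` are illustrative choices, not names from the dataset.

```python
FULL_SAMPLE = 299

# (count for value 0, count for value 1) in the full sample, copied from the table above.
counts = {
    "anaemia": (170, 129),
    "high_blood_pressure": (194, 105),
    "diabetes": (174, 125),
    "sex": (105, 194),  # 0: woman, 1: man
    "smoking": (203, 96),
}


def percentages(zero_count, one_count, total=FULL_SAMPLE):
    """Convert a pair of counts into percentages of the full sample, rounded to 2 decimals."""
    if zero_count + one_count != total:
        raise ValueError("counts do not add up to the full sample size")
    return round(100 * zero_count / total, 2), round(100 * one_count / total, 2)


for feature, (zero_count, one_count) in counts.items():
    print(feature, percentages(zero_count, one_count))
```

For anaemia this gives 170 / 299 = 56.86% and 129 / 299 = 43.14%, matching the table.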

Statistical quantitative description of the numeric features

Full sample: 299 individuals. Dead patients: 96 individuals. Survived patients: 203 individuals. σ: standard deviation

| Numeric feature | Full sample median | Full sample mean | Full sample σ |
|---|---|---|---|
| Age | 60.00 | 60.83 | 11.89 |
| Creatinine phosphokinase | 250.00 | 581.80 | 970.29 |
| Ejection fraction | 38.00 | 38.08 | 11.83 |
| Platelets | 262.00 | 263.36 | 97.80 |
| Serum creatinine | 1.10 | 1.39 | 1.03 |
| Serum sodium | 137.00 | 136.60 | 4.41 |
| Time | 115.00 | 130.30 | 77.61 |
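
A gap between mean and median hints at a skewed distribution. One rough way to quantify this from summary statistics alone is Pearson's second skewness coefficient, 3 × (mean − median) / σ. The sketch below applies it to the full-sample values above. This is an illustrative calculation added here, not an analysis taken from the dataset description, and the function name `pearson_skew` is hypothetical.

```python
# (median, mean, standard deviation) for the full sample, copied from the table above.
summary = {
    "age": (60.00, 60.83, 11.89),
    "creatinine_phosphokinase": (250.00, 581.80, 970.29),
    "ejection_fraction": (38.00, 38.08, 11.83),
    "platelets": (262.00, 263.36, 97.80),
    "serum_creatinine": (1.10, 1.39, 1.03),
    "serum_sodium": (137.00, 136.60, 4.41),
    "time": (115.00, 130.30, 77.61),
}


def pearson_skew(median, mean, sigma):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sigma."""
    return 3 * (mean - median) / sigma


# Print features from most right-skewed to most left-skewed by this measure.
for name, stats in sorted(summary.items(), key=lambda item: -pearson_skew(*item[1])):
    print(f"{name}: {pearson_skew(*stats):+.2f}")
```

By this measure creatinine phosphokinase and serum creatinine stand out as right-skewed, which may be worth keeping in mind before applying methods that assume roughly symmetric inputs.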

Deep dive into the dataset

Dataset Overview:

  • Medical records of 299 heart failure patients collected at Faisalabad Institute of Cardiology and Allied Hospital in Faisalabad, Punjab, Pakistan, during April–December 2015.
  • Patients had left ventricular systolic dysfunction and were classified as NYHA classes III or IV.
  • The cohort consisted of 105 women and 194 men, aged between 40 and 95 years.

Features:

  • 13 features including clinical, body, and lifestyle information.
  • Binary features: anaemia, high blood pressure, diabetes, sex, and smoking.
  • Anaemia defined as haematocrit levels lower than 36%.
  • Definition of high blood pressure not provided in the dataset.
  • Creatinine phosphokinase (CPK) is the level of the CPK enzyme in the blood; high levels may indicate heart failure or injury.
  • Ejection fraction measures the percentage of blood pumped out by the left ventricle with each contraction.
  • Serum creatinine indicates kidney function; high levels may suggest renal dysfunction.
  • Serum sodium is the level of sodium in the blood; abnormal levels may indicate heart failure.
  • Death event feature used as target in binary classification study, indicating if the patient died or survived during the follow-up period (130 days on average).

Dataset Characteristics:

  • Dataset represented as a table with 299 rows (patients) and 13 columns (features).
  • Imbalance in the dataset with 203 survived patients and 96 dead patients.
  • Class distribution: 67.89% negatives (survived) and 32.11% positives (died).
  • Further details and changes to feature names available in the original dataset curator's publication.
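
The class imbalance noted above has two practical consequences that can be worked out from the counts alone: a classifier that always predicts "survived" already reaches about 68% accuracy, and a classifier may benefit from reweighting the classes. The sketch below computes both. The "balanced" weighting formula, total / (number of classes × class count), is one common heuristic chosen here for illustration and is not prescribed by the dataset description; the function names are hypothetical.

```python
SURVIVED, DIED = 203, 96  # class counts stated above


def majority_baseline_accuracy(counts):
    """Accuracy of a trivial classifier that always predicts the most frequent class."""
    return max(counts) / sum(counts)


def balanced_class_weights(counts):
    """Weights inversely proportional to class frequency: total / (n_classes * count)."""
    total = sum(counts)
    n_classes = len(counts)
    return [total / (n_classes * count) for count in counts]


counts = [SURVIVED, DIED]
print(f"majority baseline accuracy: {majority_baseline_accuracy(counts):.4f}")
weights = balanced_class_weights(counts)
print(f"weight for survived: {weights[0]:.3f}, weight for died: {weights[1]:.3f}")
```

Any model evaluated on this dataset should be compared against that baseline, and metrics that account for imbalance are likely to be more informative than accuracy alone.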
