Quick Start 🚀: If you're not up for reading all of this, head straight to the file section. There, you'll find detailed explanations of the files and all the variables you need.
This dataset contains the medical records of 299 patients with heart failure, collected during their follow-up period. Each patient profile has 13 clinical features (12 predictors plus the death event target).
Dataset Characteristics: Multivariate
Subject Area: Health and Medicine
Associated Tasks: Classification, Regression, Clustering
Feature Type: Integer, Real
Instances: 299
Features: 12
Dataset Information
A detailed description of the dataset can be found in the Dataset section of the following paper:
Title: Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone
Authors: Davide Chicco, Giuseppe Jurman
Source: BMC Medical Informatics and Decision Making 20, 16 (2020)
DOI: https://doi.org/10.1186/s12911-020-1023-5
Dataset Details
| Feature | Explanation | Measurement | Range |
|---|---|---|---|
| Age | Age of the patient | Years | [40, ..., 95] |
| Anaemia | Decrease of red blood cells or hemoglobin | Boolean | 0, 1 |
| High blood pressure | If the patient has hypertension | Boolean | 0, 1 |
| Creatinine phosphokinase (CPK) | Level of the CPK enzyme in the blood | mcg/L | [23, ..., 7861] |
| Diabetes | If the patient has diabetes | Boolean | 0, 1 |
| Ejection fraction | Percentage of blood leaving the heart at each contraction | Percentage | [14, ..., 80] |
| Sex | Woman or man | Binary | 0, 1 |
| Platelets | Platelets in the blood | kiloplatelets/mL | [25.01, ..., 850.00] |
| Serum creatinine | Level of creatinine in the blood | mg/dL | [0.50, ..., 9.40] |
| Serum sodium | Level of sodium in the blood | mEq/L | [114, ..., 148] |
| Smoking | If the patient smokes | Boolean | 0, 1 |
| Time | Follow-up period | Days | [4, ..., 285] |
| (target) death event | If the patient died during the follow-up period | Boolean | 0, 1 |
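The ranges in the table above can also serve as a sanity check when loading the data. The sketch below is a minimal, hypothetical validator. Its snake_case keys are assumptions derived from the feature names in the table, so the column names in your copy of the file may differ. It also assumes values use the units listed above (for example, platelets in kiloplatelets/mL). If your file stores a feature in another unit, rescale it first.

```python
# Minimal sketch: check one patient record against the ranges listed in the
# Dataset Details table. The snake_case keys are HYPOTHETICAL; adapt them to
# the actual column names and units of your copy of the data.
FEATURE_RANGES = {
    "age": (40, 95),
    "anaemia": (0, 1),
    "high_blood_pressure": (0, 1),
    "creatinine_phosphokinase": (23, 7861),
    "diabetes": (0, 1),
    "ejection_fraction": (14, 80),
    "sex": (0, 1),
    "platelets": (25.01, 850.00),  # assumed to be in kiloplatelets/mL
    "serum_creatinine": (0.50, 9.40),
    "serum_sodium": (114, 148),
    "smoking": (0, 1),
    "time": (4, 285),
    "death_event": (0, 1),
}

BOOLEAN_FEATURES = {"anaemia", "high_blood_pressure", "diabetes", "sex", "smoking", "death_event"}


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
            continue
        value = record[name]
        if name in BOOLEAN_FEATURES and value not in (0, 1):
            # Boolean/binary features must be exactly 0 or 1.
            problems.append(f"{name} must be 0 or 1, got {value}")
        elif not low <= value <= high:
            # Numeric features must fall inside the published range.
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems
```

For example, a record with `ejection_fraction=5` is reported as outside [14, 80]. A missing key is reported separately, so the check does not fail with a `KeyError`.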
Statistical quantitative description of the category features
#: number of patients. %: percentage of patients. Full sample: 299 individuals. Dead patients: 96 individuals. Survived patients: 203 individuals.
| Category feature | Full sample (#) | Full sample (%) | Dead patients (#) |
|---|---|---|---|
| Anaemia (0: false) | 170 | 56.86 | 50 |
| Anaemia (1: true) | 129 | 43.14 | 46 |
| High blood pressure (0: false) | 194 | 64.88 | 57 |
| High blood pressure (1: true) | 105 | 35.12 | 39 |
| Diabetes (0: false) | 174 | 58.19 | 56 |
| Diabetes (1: true) | 125 | 41.81 | 40 |
| Sex (0: woman) | 105 | 35.12 | 34 |
| Sex (1: man) | 194 | 64.88 | 62 |
| Smoking (0: false) | 203 | 67.89 | 66 |
| Smoking (1: true) | 96 | 32.11 | 30 |
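The table above keeps only three columns: the full-sample count, the full-sample percentage and the dead-patient count. Every patient either died or survived, so the survived count for each row is simply the full-sample count minus the dead count. The sketch below derives it and also checks that both levels of each feature add up to the group sizes. It additionally computes per-group percentages under an assumption: like the full-sample column, each percentage is taken relative to its own group's size (299, 96 or 203 patients). The function name `derive_row` is hypothetical.

```python
# Group sizes stated in the table caption.
TOTAL, DEAD, SURVIVED = 299, 96, 203


def derive_row(full_n: int, dead_n: int) -> dict:
    """Derive the survived count and per-group percentages for one category row.

    ASSUMPTION: each percentage is relative to its own group's size.
    """
    survived_n = full_n - dead_n  # dead and survived patients partition the sample
    return {
        "full_pct": round(100 * full_n / TOTAL, 2),
        "dead_pct": round(100 * dead_n / DEAD, 2),
        "survived_n": survived_n,
        "survived_pct": round(100 * survived_n / SURVIVED, 2),
    }


# (full-sample count, dead-patient count) for both levels of each feature,
# copied from the table above.
ROWS = {
    "Anaemia": [(170, 50), (129, 46)],
    "High blood pressure": [(194, 57), (105, 39)],
    "Diabetes": [(174, 56), (125, 40)],
    "Sex": [(105, 34), (194, 62)],
    "Smoking": [(203, 66), (96, 30)],
}

for feature, levels in ROWS.items():
    # Both levels of a binary feature must cover the whole sample and every dead patient.
    assert sum(full for full, _ in levels) == TOTAL, feature
    assert sum(dead for _, dead in levels) == DEAD, feature
```

For anaemia (0: false), `derive_row(170, 50)` gives a survived count of 120. Under the stated assumption, the dead and survived percentages are 52.08 and 59.11.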
Statistical quantitative description of the numeric features
Full sample: 299 individuals. Dead patients: 96 individuals. Survived patients: 203 individuals. σ: standard deviation
| Numeric feature | Full sample median | Full sample mean | Full sample σ |
|---|---|---|---|
| Age | 60.00 | 60.83 | 11.89 |
| Creatinine phosphokinase | 250.00 | 581.80 | 970.29 |
| Ejection fraction | 38.00 | 38.08 | 11.83 |
| Platelets | 262.00 | 263.36 | 97.80 |
| Serum creatinine | 1.10 | 1.39 | 1.03 |
| Serum sodium | 137.00 | 136.60 | 4.41 |
| Time | 115.00 | 130.30 | 77.61 |
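Comparing the mean with the median shows how skewed a feature is. For creatinine phosphokinase, the mean (581.80) is more than twice the median (250.00), which points to a long right tail. The sketch below shows how the three statistics in the table can be computed with Python's standard `statistics` module. It runs on toy values, not the real data. The table does not say whether σ is the sample or the population standard deviation, so both are computed.

```python
import statistics


def summarize(values: list[float]) -> dict:
    """Median, mean and standard deviation of one numeric feature."""
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 in the denominator).
        "sigma_sample": statistics.stdev(values),
        # Population standard deviation (n in the denominator).
        "sigma_population": statistics.pstdev(values),
    }


# Toy, right-skewed values (NOT taken from the dataset): one large value
# pulls the mean above the median, as happens for CPK in the table above.
toy = [1.0, 2.0, 3.0, 4.0, 10.0]
print(summarize(toy))
```

On heavily skewed features like CPK, the median is usually the more representative "typical" value.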
Deep dive into the dataset
Dataset Overview:
- Medical records of 299 heart failure patients collected at Faisalabad Institute of Cardiology and Allied Hospital in Faisalabad, Punjab, Pakistan, during April–December 2015.
- Patients had left ventricular systolic dysfunction and were classified as New York Heart Association (NYHA) class III or IV.
- The cohort consisted of 105 women and 194 men, aged between 40 and 95 years.
Features:
- 13 features (including the death event target) covering clinical, body, and lifestyle information.
- Binary features: anaemia, high blood pressure, diabetes, sex, and smoking.
- Anaemia defined as haematocrit levels lower than 36%.
- Definition of high blood pressure not provided in the dataset.
- Creatinine phosphokinase (CPK) is the level of the CPK enzyme in the blood; high levels may indicate heart failure or injury.
- Ejection fraction measures the percentage of blood pumped out by the left ventricle with each contraction.
- Serum creatinine indicates kidney function; high levels may suggest renal dysfunction.
- The serum sodium test measures the sodium level in the blood; abnormal levels may indicate heart failure.
- The death event feature is the target of the binary classification study. It indicates whether the patient died or survived during the follow-up period (130 days on average).
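Two of the definitions above can be written out directly. The sketch below is illustrative only, and the function names are hypothetical. The anaemia cut-off (haematocrit below 36%) is the one stated above. The ejection fraction uses the standard formula EF = (EDV - ESV) / EDV × 100, where EDV and ESV are the left ventricle's end-diastolic and end-systolic volumes. The dataset itself stores only the resulting anaemia flag and ejection fraction percentage, not the haematocrit readings or ventricular volumes.

```python
ANAEMIA_HAEMATOCRIT_CUTOFF = 36.0  # percent, as defined in the dataset description


def anaemia_flag(haematocrit_pct: float) -> int:
    """Return 1 (anaemia) if haematocrit is below 36%, else 0."""
    return 1 if haematocrit_pct < ANAEMIA_HAEMATOCRIT_CUTOFF else 0


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Percentage of the end-diastolic volume ejected by the left ventricle in one beat."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    stroke_volume = edv_ml - esv_ml  # blood pumped out in one contraction
    return 100.0 * stroke_volume / edv_ml
```

For example, a ventricle that fills to 120 mL and empties to 70 mL has an ejection fraction of about 41.7%, close to the dataset's median of 38%.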
Dataset Characteristics:
- Dataset represented as a table with 299 rows (patients) and 13 columns (features).
- The classes are imbalanced: 203 patients survived and 96 died.
- Class distribution: 67.89% negatives (survived), 32.11% positives (died).
- Further details, including changes to the feature names, are available in the publication by the original dataset curators.
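This imbalance matters when you judge a model. A classifier that always predicts "survived" already reaches about 67.89% accuracy. One common mitigation is to weight each class inversely to its frequency. The sketch below uses the n_samples / (n_classes × n_class) heuristic, the same formula scikit-learn applies for `class_weight="balanced"`. It is one option among several and is not the approach used in the original study. The labels are rebuilt from the class counts above (0 = survived, 1 = died).

```python
from collections import Counter

# Labels rebuilt from the class counts in the dataset description: 0 = survived, 1 = died.
labels = [0] * 203 + [1] * 96


def class_summary(y: list[int]) -> dict:
    """Share (%) and inverse-frequency weight for each class label."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {
        cls: {
            "share_pct": round(100 * c / n, 2),
            # "Balanced" weight: n_samples / (n_classes * n_class)
            "weight": n / (k * c),
        }
        for cls, c in sorted(counts.items())
    }


summary = class_summary(labels)
# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(s["share_pct"] for s in summary.values())
```

With these weights, each class contributes the same total weight (149.5) to a weighted loss. That is why metrics such as balanced accuracy or the Matthews correlation coefficient are more informative than plain accuracy here.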
Note 📝: If you find this dataset useful, please consider giving it an upvote! Your support is appreciated.