Heart Disease Dataset | Cleaned

Copied from the original Heart Disease Dataset and cleaned with advanced methods.

@kaggle.abdmental01_heart_disease_dataset

About this Dataset

This dataset is a copy of the original dataset. It has been preprocessed with advanced methods and cleaned of missing values.

Context

This is a multivariate dataset: each record combines several separate statistical variables. The original database includes 76 attributes, but all published studies use a subset of 14 of them, and the Cleveland database is the only one used by ML researchers to date. The attributes in that subset are age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting electrocardiographic results, maximum heart rate achieved, exercise-induced angina, oldpeak (ST depression induced by exercise relative to rest), the slope of the peak exercise ST segment, number of major vessels, thalassemia, and the predicted attribute.

The main task on this dataset is to predict, from a patient's attributes, whether that person has heart disease. A second, exploratory task is to analyze the data for insights that help in understanding the problem.
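
As a rough illustration of the prediction task, the sketch below turns the target column into a yes/no label. It assumes, following the original UCI documentation rather than anything stated on this page, that num is 0 when no disease is present and greater than 0 otherwise. The function name has_heart_disease is hypothetical.

```python
def has_heart_disease(num):
    """Map the target attribute `num` to a binary label.

    Assumption (not stated on this page): 0 means no heart disease,
    and any value above 0 means heart disease is present.
    """
    # Accept ints, floats, or numeric strings such as "2" or "2.0".
    return int(float(num)) > 0


# Made-up target values, for illustration only.
labels = [has_heart_disease(value) for value in (0, 1, "3")]
```

If the cleaned file encodes the target differently, only the comparison inside the function needs to change.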

Content

Column Descriptions:

  • id (Unique id for each patient)
  • age (Age of the patient in years)
  • origin (place of study)
  • sex (Male/Female)
  • cp (chest pain type: [typical angina, atypical angina, non-anginal, asymptomatic])
  • trestbps (resting blood pressure in mm Hg on admission to the hospital)
  • chol (serum cholesterol in mg/dl)
  • fbs (whether fasting blood sugar > 120 mg/dl)
  • restecg (resting electrocardiographic results)
    -- Values: [normal, stt abnormality, lv hypertrophy]
  • thalach: maximum heart rate achieved
  • exang: exercise-induced angina (True/False)
  • oldpeak: ST depression induced by exercise relative to rest
  • slope: the slope of the peak exercise ST segment
  • ca: number of major vessels (0-3) colored by fluoroscopy
  • thal: [normal; fixed defect; reversible defect]
  • num: the predicted attribute
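
The value lists above can be used for a quick sanity check on each record. The following is a minimal sketch, assuming the file stores the categorical values as the exact lowercase strings shown in the column descriptions and that each record is available as a dictionary keyed by column name. The names ALLOWED and validate_row are hypothetical.

```python
# Allowed categorical values, taken from the column descriptions.
# Assumption: the file uses these exact lowercase strings.
ALLOWED = {
    "cp": {"typical angina", "atypical angina", "non-anginal", "asymptomatic"},
    "restecg": {"normal", "stt abnormality", "lv hypertrophy"},
    "thal": {"normal", "fixed defect", "reversible defect"},
}


def validate_row(row):
    """Return a list of problems found in one record (a dict of column -> value)."""
    problems = []

    # Check each categorical column against its documented value list.
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"{column}: unexpected value {value!r}")

    # ca is documented as the number of major vessels, in the range 0-3.
    ca = row.get("ca")
    try:
        if not 0 <= float(ca) <= 3:
            problems.append(f"ca: out of range 0-3: {ca!r}")
    except (TypeError, ValueError):
        problems.append(f"ca: not numeric: {ca!r}")

    return problems
```

An empty list means the record matches the documented values for these four columns; the other columns are not checked here.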

Acknowledgements

Creators:

  1. Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
  2. University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
  3. University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
  4. V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.

Relevant Papers:

  • Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64, 304-310.
  • Aha, D. W., & Kibler, D. Instance-based prediction of heart-disease presence with the Cleveland database.
  • Gennari, J. H., Langley, P., & Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11-61.

Citation Request:

The authors of the databases have requested that any publications resulting from the use of the data include the names of the principal investigator responsible for the data collection at each institution. They would be:

  • Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.
  • University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
  • University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
  • V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.