MIT-BIH Arrhythmia Database (Simple CSVs)
Electrocardiograms (ECG/EKG) from 47 patients at Beth Israel hospital (1975-79)
@kaggle.protobioengineering_mit_bih_arrhythmia_database_modern_2023
A beginner-friendly version of the MIT-BIH Arrhythmia Database, which contains 48 electrocardiograms (EKGs) from 47 patients who were at Beth Israel Deaconess Medical Center in Boston, MA in 1975-1979.
There are 48 CSVs, each of which is a 30-minute electrocardiogram (EKG) from a single patient (records 201 and 202 are from the same patient). Data was collected at 360 Hz, meaning that 360 data points equal 1 second of time.
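The 360 Hz relationship can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset's tooling: the helper names are made up for this example, and the row count is only the nominal figure for exactly 30 minutes, so the actual CSVs may differ slightly.

```python
SAMPLE_RATE_HZ = 360   # sampling rate stated in the description
RECORD_MINUTES = 30    # nominal length of each record


def samples_to_seconds(n_samples, rate_hz=SAMPLE_RATE_HZ):
    """Convert a sample count (or a row index) to elapsed seconds."""
    return n_samples / rate_hz


def seconds_to_samples(seconds, rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in seconds to the nearest whole sample count."""
    return round(seconds * rate_hz)


# Nominal number of rows in an exactly-30-minute record
nominal_rows = seconds_to_samples(RECORD_MINUTES * 60)

print(samples_to_seconds(360))   # 360 samples -> 1.0 second
print(nominal_rows)              # 30 min * 60 s * 360 Hz = 648000
```

The same helpers are handy for slicing, e.g. the first 10 seconds of a record are the first `seconds_to_samples(10)` rows.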
Banner photo by Joshua Chehov on Unsplash.
EKGs, or electrocardiograms, measure the heart's function by looking at its electrical activity. The electrical activity in each part of the heart is supposed to happen in a particular order and intensity, creating that classic "heartbeat" line (or "QRS complex") you see on monitors in medical TV shows.
There are a few types of EKGs (4-lead, 5-lead, 12-lead, etc.), which give us varying detail about the heart. A 12-lead is one of the most detailed types of EKGs, as it allows us to get 12 different outputs or graphs, all looking at different, specific parts of the heart muscles.
This dataset only publishes two leads for each recording, since that is all that the original MIT-BIH database provided.
Check out Ninja Nerd's EKG Basics tutorial on YouTube to understand what each part of the QRS complex (or heartbeat) means from an electrical standpoint.
Each file's name is the ID of the patient (except for 201 and 202, which are the same person).
The two leads are often lead MLII and another lead such as V1, V2, or V5, though some records do not use MLII at all. MLII is the lead most often associated with the classic QRS complex (the medical name for a single heartbeat).
Milliseconds were calculated and added as a secondary index to each file. Calculations were made by dividing the index by 360 Hz, then multiplying by 1000 (index / 360 * 1000). The original index was preserved, since the calculated milliseconds may cause issues with the correlation and merging of data once digital signal processing (e.g. filtering) is applied. You are encouraged to try whichever index is most suitable for your analysis and/or recalculate a time index with Pandas' to_timedelta().
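As a rough sketch of the two indexing options described above, the snippet below recomputes the millisecond column and builds a timedelta index with Pandas. The DataFrame is a tiny synthetic stand-in, and the column names ("sample", "lead_a") are hypothetical; check the actual CSV headers before adapting it.

```python
import pandas as pd

SAMPLE_RATE_HZ = 360  # sampling rate stated in the description

# Synthetic stand-in for one record; column names are hypothetical.
df = pd.DataFrame({
    "sample": range(5),
    "lead_a": [0.00, 0.10, 0.50, 0.10, 0.00],
})

# Milliseconds as described: index / 360 * 1000
df["ms"] = df["sample"] / SAMPLE_RATE_HZ * 1000

# Alternative: a timedelta index computed from the original sample index.
df["elapsed"] = pd.to_timedelta(df["sample"] / SAMPLE_RATE_HZ, unit="s")
df = df.set_index("elapsed")

print(df)
```

Keeping the integer sample column alongside the time index leaves an exact key for merging, while the timedelta index is convenient for time-based operations such as resampling.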
Info about each of the 47 patients is available here, including age, gender, medications, diagnoses, etc.
Physionet has some online tutorials and tips for analyzing EKGs and other time series / digital signals.
Check out our notebook for opening and visualizing the data.
A write-up on how the data was converted from .dat to .csv files is available on Medium.com. Data was downloaded from the MIT-BIH Arrhythmia Database, then converted to CSV.
Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng in Med and Biol 20(3):45-50 (May-June 2001). (PMID: 11446209)
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.