Medical Students Dataset
Practise Preprocessing Issues.
@kaggle.slmsshk_medical_students_dataset
The Medical Students Dataset is a simulated dataset containing 100,000 rows and 12 columns. It is designed to mimic real-world data encountered in medical education and research, and it includes preprocessing issues commonly observed in such data: missing values, duplicates, and inconsistencies.
The dataset consists of the following columns:
StudentID: Unique identifier for each medical student.
Gender: Gender of the student (e.g., Male, Female).
Age: Age of the student in years.
Ethnicity: Ethnicity of the student.
Year: Academic year of the student.
University: Name of the university where the student is enrolled.
GPA: Grade Point Average of the student.
MCAT Score: Medical College Admission Test (MCAT) score of the student.
Clinical Experience: Indicator of whether the student has previous clinical experience (Yes/No).
Research Experience: Indicator of whether the student has previous research experience (Yes/No).
Publication Count: Number of publications attributed to the student.
Exam Score: Performance score on a standardized medical examination.
The dataset has been intentionally created to include various preprocessing issues, such as missing values, duplicates, and inconsistencies.
This dataset can be used for various purposes, including data cleaning and preprocessing exercises, exploring data analysis techniques, and evaluating machine learning algorithms. It provides an opportunity to practice handling real-world data challenges often encountered in the field of medical education and research.
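As a minimal sketch of the kind of cleaning exercise described above, the Python below removes exact duplicate records and fills missing values in a numeric column with the median. The sample records, the helper names `drop_duplicates` and `fill_missing_with_median`, and the choice of median imputation are assumptions made for illustration. None of them come from the dataset itself.

```python
from statistics import median

# Hypothetical sample records; the values are invented for illustration only.
rows = [
    {"student_id": 1.0, "age": 22.0, "gender": "Female"},
    {"student_id": 2.0, "age": None, "gender": "Male"},
    {"student_id": 1.0, "age": 22.0, "gender": "Female"},  # exact duplicate
    {"student_id": 3.0, "age": 30.0, "gender": None},
]


def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        # Sort by column name only, so None and str values are never compared.
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def fill_missing_with_median(records, column):
    """Replace None in a numeric column with the median of the observed values.

    Raises statistics.StatisticsError if the column has no observed values.
    """
    observed = [r[column] for r in records if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]


deduped = drop_duplicates(rows)
cleaned = fill_missing_with_median(deduped, "age")
```

Median imputation is only one option. Whether to impute, drop, or flag missing values depends on the analysis being practised.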
CREATE TABLE medical_students_dataset (
"student_id" DOUBLE,
"age" DOUBLE,
"gender" VARCHAR,
"height" DOUBLE,
"weight" DOUBLE,
"blood_type" VARCHAR,
"bmi" DOUBLE,
"temperature" DOUBLE,
"heart_rate" DOUBLE,
"blood_pressure" DOUBLE,
"cholesterol" DOUBLE,
"diabetes" VARCHAR,
"smoking" VARCHAR
);
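The schema above can be profiled for missing values and duplicate rows before any cleaning is attempted. The sketch below is one possible approach. It loads the table definition into an in-memory SQLite database through Python's `sqlite3` module. SQLite is a stand-in chosen for convenience, not necessarily the engine that produced the schema, and the three sample rows are invented purely for illustration.

```python
import sqlite3

# Table definition copied from the schema above.
SCHEMA = """
CREATE TABLE medical_students_dataset (
    "student_id" DOUBLE, "age" DOUBLE, "gender" VARCHAR, "height" DOUBLE,
    "weight" DOUBLE, "blood_type" VARCHAR, "bmi" DOUBLE, "temperature" DOUBLE,
    "heart_rate" DOUBLE, "blood_pressure" DOUBLE, "cholesterol" DOUBLE,
    "diabetes" VARCHAR, "smoking" VARCHAR
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Invented rows (not real data): one with missing values, one exact duplicate.
sample = [
    (1.0, 22.0, "Female", 165.0, 60.0, "A", 22.0, 98.6, 70.0, 120.0, 180.0, "No", "No"),
    (2.0, None, "Male", 180.0, None, "O", None, 98.2, 65.0, 118.0, 190.0, "No", "Yes"),
    (1.0, 22.0, "Female", 165.0, 60.0, "A", 22.0, 98.6, 70.0, 120.0, 180.0, "No", "No"),
]
placeholders = ",".join("?" * 13)
conn.executemany(
    f"INSERT INTO medical_students_dataset VALUES ({placeholders})", sample
)

# Column names come from the table metadata (field 1 of each PRAGMA row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(medical_students_dataset)")]

# COUNT(*) counts all rows; COUNT(col) skips NULLs, so the difference is the NULL count.
null_counts = {
    col: conn.execute(
        f'SELECT COUNT(*) - COUNT("{col}") FROM medical_students_dataset'
    ).fetchone()[0]
    for col in columns
}

# Rows beyond the first copy of each distinct record count as duplicates.
total_rows = conn.execute("SELECT COUNT(*) FROM medical_students_dataset").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM medical_students_dataset)"
).fetchone()[0]
duplicate_rows = total_rows - distinct_rows

print(null_counts)
print("duplicate rows:", duplicate_rows)
```

Counting nulls per column and comparing total rows with distinct rows gives a quick picture of how much cleaning the table needs.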