This dataset is designed for beginners to practice data cleaning and preprocessing techniques in machine learning. It contains 157 rows of student admission records, including duplicate rows, missing values, and data inconsistencies such as outliers and unrealistic values. It’s well suited to practicing common data preparation steps before applying machine learning algorithms.
The dataset simulates a university admission record system, where each student’s admission profile includes test scores, high school percentages, and admission status. The records contain realistic flaws of the kind often encountered in raw data, offering hands-on experience in data wrangling.
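A first step with data like this is usually to measure the flaws before fixing anything. The sketch below counts exact duplicate rows and missing values per column using only the Python standard library. The `audit` helper and the toy rows are hypothetical and invented for illustration; they are not taken from the dataset, and it is an assumption that missing values appear as `None` or an empty string.

```python
from collections import Counter


def audit(rows):
    """Count exact duplicate rows and missing values per column.

    `rows` is a list of dicts, one per record.
    Assumption: None or "" marks a missing value.
    """
    # Turn each row into a hashable key so identical rows can be counted.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    # Every occurrence beyond the first counts as a duplicate.
    duplicate_rows = sum(n - 1 for n in seen.values())

    missing = Counter()
    for r in rows:
        for col, val in r.items():
            if val is None or val == "":
                missing[col] += 1

    return {
        "rows": len(rows),
        "duplicate_rows": duplicate_rows,
        "missing": dict(missing),
    }


# Toy rows, invented for illustration only.
toy = [
    {"Age": 18, "City": "Lahore"},
    {"Age": 18, "City": "Lahore"},    # exact duplicate of the first row
    {"Age": None, "City": "Karachi"},  # missing age
    {"Age": 19, "City": ""},           # missing city
]
report = audit(toy)
print(report)
```

With pandas, roughly the same counts come from `df.duplicated().sum()` and `df.isna().sum()`.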
The dataset contains the following columns:
Name: Student's first name (Pakistani names).
Age: Age of the student (includes outliers and missing values).
Gender: Gender of the student (Male/Female).
Admission Test Score: Score obtained in the admission test (includes outliers and missing values).
High School Percentage: Student's final high school score as a percentage (includes outliers and missing values).
City: City of residence in Pakistan.
Admission Status: Whether the student was accepted or rejected.
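Given the columns above, one possible cleaning pass is: drop exact duplicates, treat unrealistic numeric values as missing, then fill missing numeric values with the column median. The following is a minimal sketch of that sequence in plain Python, not a prescribed method. The `clean` helper, the valid ranges, and the toy rows are all assumptions made for illustration. The description does not state the score scale or a plausible age range, and values read from a CSV file would first need converting from strings to numbers.

```python
from statistics import median

# Assumed plausible ranges; the dataset description does not define them.
VALID_RANGES = {
    "Age": (15, 40),
    "Admission Test Score": (0, 100),
    "High School Percentage": (0, 100),
}


def clean(rows, valid_ranges=VALID_RANGES):
    """Return a cleaned copy of `rows` (a list of dicts with numeric values)."""
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))  # copy so the input is not modified

    # 2. Treat out-of-range values as missing.
    for r in unique:
        for col, (lo, hi) in valid_ranges.items():
            v = r.get(col)
            if v is not None and not (lo <= v <= hi):
                r[col] = None

    # 3. Fill missing numeric values with the median of the remaining values.
    for col in valid_ranges:
        known = [r[col] for r in unique if r.get(col) is not None]
        if known:
            fill = median(known)
            for r in unique:
                if r.get(col) is None:
                    r[col] = fill
    return unique


# Toy rows, invented for illustration only.
toy = [
    {"Name": "Ali", "Age": 18, "Admission Test Score": 80, "High School Percentage": 75},
    {"Name": "Ali", "Age": 18, "Admission Test Score": 80, "High School Percentage": 75},
    {"Name": "Sara", "Age": -3, "Admission Test Score": 90, "High School Percentage": 110},
    {"Name": "Omar", "Age": 20, "Admission Test Score": None, "High School Percentage": 85},
]
cleaned = clean(toy)
for row in cleaned:
    print(row)
```

Median imputation is only one option. Dropping incomplete rows, or capping outliers instead of discarding them, may suit some exercises better. Whether to impute before or after splitting the data for modeling is also a choice worth making deliberately.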