Titanic - Machine Learning from Disaster Dataset
The Titanic dataset is a well-known introductory dataset for binary classification problems in machine learning. It is derived from the passenger manifest of the RMS Titanic and provides information about individuals on board. The goal is to predict the likelihood of survival based on passenger attributes.
The dataset contains the following columns:
- PassengerId: Unique identifier for each passenger.
- Survived: Target variable (0 = Did not survive, 1 = Survived).
- Pclass: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd).
- Name: Passenger's name, which includes a title (e.g. Mr, Mrs, Miss) that can be extracted as a feature.
- Sex: Sex of the passenger (male or female).
- Age: Age of the passenger in years.
- SibSp: Number of siblings and spouses aboard.
- Parch: Number of parents and children aboard.
- Ticket: Ticket number (can sometimes reveal grouping information).
- Fare: Ticket fare.
- Cabin: Cabin number (often incomplete).
- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
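As a rough illustration of how the Name, SibSp, and Parch columns might be turned into features, the sketch below pulls a title out of a name and computes a family size. It assumes names follow the "Surname, Title. Given names" pattern; the helper names `extract_title` and `family_size` are hypothetical and not part of the dataset or any library.

```python
import re

# Assumption: names look like "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")

def extract_title(name):
    """Return the text between the comma and the first period, or None."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None

def family_size(sibsp, parch):
    """Count the passenger plus the relatives recorded in SibSp and Parch."""
    return int(sibsp) + int(parch) + 1

title = extract_title("Braund, Mr. Owen Harris")
size = family_size("1", "0")
```

Rare titles could be grouped into a single category before modeling, but that is a design choice rather than something the dataset prescribes.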
The dataset is split into:
- train.csv: Includes features and the survival status for model training.
- test.csv: Includes features without the survival status for predictions.
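The difference between the two files can be handled with one loading routine that separates the Survived column when it is present. The following is a minimal sketch using only the standard library; `read_passengers` and `split_target` are hypothetical helper names, and the short in-memory strings stand in for the real files for illustration only.

```python
import csv
import io

def read_passengers(handle):
    """Parse CSV text from an open file handle into a list of dicts."""
    return list(csv.DictReader(handle))

def split_target(rows, target="Survived"):
    """Separate the target column from the features; labels is None if absent."""
    if not rows or target not in rows[0]:
        return rows, None
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [int(row[target]) for row in rows]
    return features, labels

# Tiny in-memory stand-ins for train.csv and test.csv (illustrative rows only).
train_text = "PassengerId,Survived,Pclass,Sex\n1,0,3,male\n2,1,1,female\n"
test_text = "PassengerId,Pclass,Sex\n892,3,male\n"

train_X, train_y = split_target(read_passengers(io.StringIO(train_text)))
test_X, test_y = split_target(read_passengers(io.StringIO(test_text)))
```

With the actual files, the same functions could be called on `open("train.csv", newline="")` and `open("test.csv", newline="")` in place of the `io.StringIO` objects.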
This dataset serves as a foundation for learning data preprocessing, feature engineering, and predictive modeling.
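One common preprocessing step is filling in missing values before training a model. The sketch below fills empty Age entries with the median of the known ages. It assumes rows are dicts of strings as produced by `csv.DictReader`, with missing ages appearing as empty strings; `impute_age` is a hypothetical helper, and median imputation is only one of several reasonable strategies.

```python
from statistics import median

def impute_age(rows):
    """Return copies of rows with Age as a float, filling gaps with the median."""
    known = [float(r["Age"]) for r in rows if r.get("Age")]
    # If no ages are known at all, missing values stay as None.
    fill = median(known) if known else None
    return [
        {**r, "Age": float(r["Age"]) if r.get("Age") else fill}
        for r in rows
    ]

# Illustrative rows only, not taken from the dataset.
sample = [{"Age": "22"}, {"Age": ""}, {"Age": "38"}]
filled = impute_age(sample)
```

The function returns new dicts rather than modifying its input, so the original rows remain available for comparison.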