Dataset: Titanic Data Set

About this Dataset


Detail Description:

The Titanic dataset describes passengers aboard the RMS Titanic, which sank on its maiden voyage in April 1912 after colliding with an iceberg. Each record covers one passenger: demographics, ticket class, cabin information, family relationships, fare details, and, most notably, whether the passenger survived.

Key attributes within the dataset include the following (the table also carries a PassengerId column, which is not described here):

Passenger Class (Pclass): This categorical variable indicates the ticket class of each passenger, from 1 (1st class) to 3 (3rd class), and is often read as a proxy for socioeconomic status.
Name: Name of the passenger.
Sex: Gender of passengers, categorized as male or female.
Age: Age of the passenger in years, stored as a floating-point value.
SibSp: Number of siblings/spouses aboard the Titanic, offering insight into family relationships.
Parch: Number of parents/children aboard the Titanic, indicating family size and composition.
Ticket: Ticket number, stored as text. Accommodation and fare details are recorded separately in the Cabin and Fare columns.
Fare: Fare paid by each passenger, which can be indicative of their ticket class and economic status.
Cabin: Cabin number or location, offering insights into passenger accommodations.
Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
Survived: This binary variable indicates whether a passenger survived the disaster (1) or not (0), serving as the primary outcome variable for analyses.
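
As a rough illustration of how these attributes map onto typed values, the sketch below parses CSV text into a small record type using only the Python standard library. The two sample rows are made up for illustration and are not real passenger records. The `Passenger` class, the `load_passengers` helper, and the assumption that missing values arrive as empty strings are all hypothetical; column names follow the lowercase names in the table definition.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Made-up rows for illustration only; these are NOT real passenger records.
SAMPLE_CSV = """passengerid,survived,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
1,0,3,"Example, Mr. A",male,30,1,0,T-0001,8.0,,S
2,1,1,"Example, Mrs. B",female,,1,0,T-0002,70.0,C10,C
"""

# Port codes as listed in the attribute description.
EMBARKED_PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


@dataclass
class Passenger:
    passengerid: int
    survived: bool
    pclass: int
    name: str
    sex: str
    age: Optional[float]
    sibsp: int
    parch: int
    ticket: str
    fare: float
    cabin: Optional[str]
    embarked: Optional[str]

    @classmethod
    def from_row(cls, row: dict) -> "Passenger":
        # Assumption: a missing value is represented by an empty string.
        return cls(
            passengerid=int(row["passengerid"]),
            survived=row["survived"] == "1",
            pclass=int(row["pclass"]),
            name=row["name"],
            sex=row["sex"],
            age=float(row["age"]) if row["age"] else None,
            sibsp=int(row["sibsp"]),
            parch=int(row["parch"]),
            ticket=row["ticket"],
            fare=float(row["fare"]),
            cabin=row["cabin"] or None,
            # Unknown or empty codes decode to None.
            embarked=EMBARKED_PORTS.get(row["embarked"]),
        )


def load_passengers(text: str) -> list:
    """Parse CSV text into a list of Passenger records."""
    return [Passenger.from_row(r) for r in csv.DictReader(io.StringIO(text))]


passengers = load_passengers(SAMPLE_CSV)
```

Keeping Age and Cabin as optional values makes gaps explicit, so later analysis steps have to decide how to handle them rather than silently treating them as zero or an empty string.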

Researchers and data analysts frequently use the Titanic dataset for purposes such as:

Exploratory data analysis to understand the demographic composition of passengers and their survival outcomes.
Predictive modeling to develop algorithms that predict the likelihood of survival based on passenger characteristics.
Feature engineering to derive new variables that may enhance predictive accuracy.
Hypothesis testing to investigate factors associated with survival rates, such as passenger class, gender, age, and family size.
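
Two of the uses above, exploratory analysis and feature engineering, can be sketched in a few lines of plain Python. The rows below are made up for illustration, so the resulting rates say nothing about the real data. The helper names `survival_rate_by` and `add_family_size`, and the derived `family_size` feature (the passenger plus SibSp plus Parch), are assumptions chosen for this example rather than part of the dataset.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT real passenger records.
ROWS = [
    {"survived": 1, "pclass": 1, "sex": "female", "sibsp": 1, "parch": 0},
    {"survived": 0, "pclass": 3, "sex": "male", "sibsp": 0, "parch": 0},
    {"survived": 1, "pclass": 3, "sex": "female", "sibsp": 0, "parch": 2},
    {"survived": 0, "pclass": 3, "sex": "male", "sibsp": 1, "parch": 1},
]


def survival_rate_by(rows, key):
    """Return {group value: fraction of passengers in that group who survived}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        bucket = counts[row[key]]
        bucket[0] += row["survived"]
        bucket[1] += 1
    return {group: survivors / total for group, (survivors, total) in counts.items()}


def add_family_size(rows):
    """Return copies of the rows with a derived family_size column.

    family_size is a hypothetical feature: the passenger themselves
    plus siblings/spouses (sibsp) plus parents/children (parch).
    """
    return [{**row, "family_size": row["sibsp"] + row["parch"] + 1} for row in rows]


rates_by_sex = survival_rate_by(ROWS, "sex")
rates_by_class = survival_rate_by(ROWS, "pclass")
rows_with_family = add_family_size(ROWS)
```

The same grouping function works for any categorical column, which makes it easy to compare candidate factors such as class, sex, or the derived family size before fitting a model.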

Overall, the Titanic dataset is a widely used resource for exploring data analysis techniques and teaching machine learning concepts against a well-known historical event. Its small size (891 rows and 12 columns in the train table) and rich contextual information make it a popular choice for both educational and research purposes within the data science community.

Tables

Train

@kaggle.zain280_titanic_data_set.train

42.13 KB
891 rows
12 columns


CREATE TABLE train (
  "passengerid" BIGINT,
  "survived" BIGINT,
  "pclass" BIGINT,
  "name" VARCHAR,
  "sex" VARCHAR,
  "age" DOUBLE,
  "sibsp" BIGINT,
  "parch" BIGINT,
  "ticket" VARCHAR,
  "fare" DOUBLE,
  "cabin" VARCHAR,
  "embarked" VARCHAR
);
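
To try the schema locally, one option is to load it into an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for whatever engine actually hosts the table, the three rows are made up for illustration, and the summary query is one example of a first look at the data, not a prescribed analysis.

```python
import sqlite3

# Table definition copied from above; SQLite accepts these type names.
SCHEMA = """
CREATE TABLE train (
  "passengerid" BIGINT,
  "survived" BIGINT,
  "pclass" BIGINT,
  "name" VARCHAR,
  "sex" VARCHAR,
  "age" DOUBLE,
  "sibsp" BIGINT,
  "parch" BIGINT,
  "ticket" VARCHAR,
  "fare" DOUBLE,
  "cabin" VARCHAR,
  "embarked" VARCHAR
);
"""

# Made-up rows for illustration only; these are NOT real passenger records.
SAMPLE_ROWS = [
    (1, 0, 3, "Example, Mr. A", "male", 30.0, 1, 0, "T-0001", 8.0, None, "S"),
    (2, 1, 1, "Example, Mrs. B", "female", None, 1, 0, "T-0002", 70.0, "C10", "C"),
    (3, 1, 3, "Example, Miss C", "female", 4.0, 0, 2, "T-0003", 12.0, None, "S"),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO train VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    SAMPLE_ROWS,
)

# Per-class passenger count, average fare, and number of rows without a cabin.
SUMMARY_QUERY = """
SELECT pclass,
       COUNT(*) AS passengers,
       AVG(fare) AS avg_fare,
       SUM(cabin IS NULL) AS missing_cabin
FROM train
GROUP BY pclass
ORDER BY pclass
"""
summary = conn.execute(SUMMARY_QUERY).fetchall()
```

Each tuple in `summary` holds the class, the passenger count, the average fare, and the count of rows with no cabin recorded. Other SQL engines may need small changes to the query, for example replacing `SUM(cabin IS NULL)` with a `CASE` expression.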