Airplane Crashes Since 1908

Full history of airplane crashes throughout the world, from 1908-present

@kaggle.saurograndi_airplane_crashes_since_1908

About this Dataset

Airplane Crashes and Fatalities Since 1908 (Full history of airplane crashes throughout the world, from 1908-present)

At the time this dataset was created on Kaggle (2016-09-09), the original version was hosted by Open Data by Socrata at https://opendata.socrata.com/Government/Airplane-Crashes-and-Fatalities-Since-1908/q2te-8cvq, but unfortunately it is no longer available. The dataset contains data on airplane accidents involving civil, commercial, and military transport worldwide from 1908-09-17 to 2009-06-08.

While applying for a data scientist job opportunity, I was asked the following questions on this dataset:

  1. Yearly, how many planes crashed? How many people were on board? How many survived? How many died?
  2. Highest number of crashes by operator and by type of aircraft.
  3. The ‘Summary’ field has the details about the crashes. Find the reasons for the crashes and categorize them into clusters, e.g. fire, shot down, weather (for blanks in the data, the category can be UNKNOWN). You are free to choose your own clusters, but there should be no more than 7.
  4. Find the number of crashed aircraft and the number of deaths for each category from the step above.
  5. Find any interesting trends or behaviors that you encounter when you analyze the dataset.

My solution was:

The following bar charts answer point 1 of the assignment, in particular:

  • the planes crashed per year
  • people aboard per year during crashes
  • people killed per year during crashes
  • people who survived per year during crashes
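
The yearly figures behind charts like these could be computed roughly as follows. This is only a sketch, not the code used for the original solution: the function name `yearly_totals` and the sample rows are hypothetical, and it assumes that survivors can be estimated as `aboard - fatalities` and that missing values count as zero.

```python
from collections import defaultdict


def yearly_totals(rows):
    """Aggregate crashes, people aboard, fatalities, and survivors per year.

    `rows` is an iterable of dicts with a "date" (ISO text such as
    "1908-09-17") plus numeric "aboard" and "fatalities" values.
    Assumption: a missing value (None) is treated as 0.
    """
    totals = defaultdict(
        lambda: {"crashes": 0, "aboard": 0.0, "fatalities": 0.0, "survived": 0.0}
    )
    for row in rows:
        year = int(str(row["date"])[:4])
        aboard = row.get("aboard") or 0.0
        dead = row.get("fatalities") or 0.0
        entry = totals[year]
        entry["crashes"] += 1
        entry["aboard"] += aboard
        entry["fatalities"] += dead
        # Assumption: survivors = aboard - fatalities, never below zero.
        entry["survived"] += max(aboard - dead, 0.0)
    return dict(sorted(totals.items()))


# Hypothetical rows for illustration only, not taken from the dataset.
sample_rows = [
    {"date": "2000-01-01", "aboard": 10.0, "fatalities": 4.0},
    {"date": "2000-05-01", "aboard": None, "fatalities": None},
    {"date": "2001-02-02", "aboard": 3.0, "fatalities": 3.0},
]
for year, entry in yearly_totals(sample_rows).items():
    print(year, entry)
```

The resulting dictionary has one entry per year, which maps directly onto the four bar charts listed above.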

The following answers address point 2 of the assignment:

  • Highest number of crashes by operator: Aeroflot with 179 crashes
  • By Type of aircraft: Douglas DC-3 with 334 crashes

I identified 7 clusters by applying k-means clustering to a matrix built from a text corpus, which I created with standard text-analysis preprocessing (plain text, punctuation removal, lowercasing, etc.).
The following table summarizes the number of crashes and deaths for each cluster.

  • Cluster 1: 258 crashes, 6368 deaths
  • Cluster 2: 500 crashes, 9408 deaths
  • Cluster 3: 211 crashes, 3513 deaths
  • Cluster 4: 1014 crashes, 14790 deaths
  • Cluster 5: 2749 crashes, 58826 deaths
  • Cluster 6: 195 crashes, 4439 deaths
  • Cluster 7: 341 crashes, 8135 deaths
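
The description above does not say which tools were used, so the following is only a minimal pure-Python sketch of the general approach: clean the text, build a term-count matrix, and run k-means (Lloyd's algorithm) on it. The function names, the tiny stopword list, and the example summaries are all hypothetical, and raw term counts are an assumption about how the matrix was built.

```python
import random
import string
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the author's list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "at", "was", "were"}
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def tokenize(text):
    """Lowercase, replace punctuation with spaces, split, drop stopwords."""
    cleaned = (text or "").lower().translate(_PUNCT_TO_SPACE)
    return [word for word in cleaned.split() if word not in STOPWORDS]


def term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    tokens = [tokenize(doc) for doc in docs]
    vocab = sorted({word for doc_tokens in tokens for word in doc_tokens})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for doc_tokens in tokens:
        row = [0.0] * len(vocab)
        for word, count in Counter(doc_tokens).items():
            row[index[word]] = float(count)
        matrix.append(row)
    return vocab, matrix


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # requires k <= len(points)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical summaries for illustration only.
docs = [
    "Engine failure. Engine fire on takeoff.",
    "Engine failure and fire on takeoff.",
    "Crashed into a mountain in fog.",
    "Crashed in fog, poor weather.",
]
vocab, matrix = term_matrix(docs)
print(kmeans(matrix, k=2))
```

On the real data, `k` would be 7 and `docs` would be the `summary` column. K-means depends on its random initialization, so cluster sizes from a sketch like this should not be expected to match the figures listed above.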

The following picture shows clusters using the first 2 principal components:

For each cluster, I summarize the most-used words and try to identify the cause of the crashes.

Cluster 1 (258):
aircraft, crashed, plane, shortly, taking.
Not much information about this cluster can be deduced using text analysis.

Cluster 2 (500):
aircraft, airport, altitude, crashed, crew, due, engine, failed, failure, fire, flight, landing, lost, pilot, plane, runway, takeoff, taking.
Engine failure on the runway after landing or takeoff

Cluster 3 (211):
aircraft, crashed, fog
Crash caused by fog

Cluster 4 (1014):
aircraft, airport, attempting, cargo, crashed, fire, land, landing, miles, pilot, plane, route, runway, struck, takeoff
Struck a cargo during landing or takeoff

Cluster 5 (2749):
accident, aircraft, airport, altitude, approach, attempting, cargo, conditions, control, crashed, crew, due, engine, failed, failure, feet, fire, flight, flying, fog, ground, killed, land, landing, lost, low, miles, mountain, pilot, plane, poor, route, runway, short, shortly, struck, takeoff, taking, weather
Struck a cargo due to engine failure or bad weather conditions, mainly fog

Cluster 6 (195):
aircraft, crashed, engine, failure, fire, flight, left, pilot, plane, runway
Engine failure on the runway

Cluster 7 (341):
accident, aircraft, altitude, cargo, control, crashed, crew, due, engine, failure, flight, landing, loss, lost, pilot, plane, takeoff
Engine failure during landing or takeoff
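
A per-cluster report like the one above (number of crashes, deaths, and most-used words) could be produced along these lines once every crash has a cluster label. This is a sketch under assumptions: `cluster_report`, the stopword list, and the sample inputs are hypothetical, and missing fatality values are counted as zero.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not the author's list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "at", "was", "were"}


def cluster_report(summaries, fatalities, labels, n_terms=5):
    """Return {label: {"crashes", "deaths", "top_terms"}} for each cluster."""
    crashes = Counter()
    deaths = defaultdict(float)
    terms = defaultdict(Counter)
    for text, dead, label in zip(summaries, fatalities, labels):
        crashes[label] += 1
        deaths[label] += dead or 0.0  # assumption: missing fatalities count as 0
        words = re.findall(r"[a-z]+", (text or "").lower())
        terms[label].update(w for w in words if w not in STOPWORDS)
    return {
        label: {
            "crashes": crashes[label],
            "deaths": deaths[label],
            "top_terms": [w for w, _ in terms[label].most_common(n_terms)],
        }
        for label in sorted(crashes)
    }


# Hypothetical inputs for illustration only.
report = cluster_report(
    summaries=["Engine failure. Engine fire.", None, "Crashed in fog, fog everywhere"],
    fatalities=[3.0, None, 2.0],
    labels=[0, 0, 1],
)
for label, info in report.items():
    print(label, info)
```

Reading the top terms still requires human judgment, as in the interpretations above: the word lists suggest a cause but do not establish it.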


Better solutions are welcome.

Thanks,
Sauro

Tables

Airplane Crashes And Fatalities Since 1908

@kaggle.saurograndi_airplane_crashes_since_1908.airplane_crashes_and_fatalities_since_1908
  • 881.68 KB
  • 5268 rows
  • 13 columns

CREATE TABLE airplane_crashes_and_fatalities_since_1908 (
  "date" TIMESTAMP,
  "time" VARCHAR,
  "location" VARCHAR,
  "operator" VARCHAR,
  "flight" VARCHAR,
  "route" VARCHAR,
  "type" VARCHAR,
  "registration" VARCHAR,
  "cn_in" VARCHAR,
  "aboard" DOUBLE,
  "fatalities" DOUBLE,
  "ground" DOUBLE,
  "summary" VARCHAR
);
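
As a rough illustration of querying this schema, the sketch below recreates the table in an in-memory SQLite database and ranks operators or aircraft types by crash count. SQLite is only a stand-in here and the hosting platform's SQL dialect may differ; the helper `top_by` and the inserted rows are hypothetical placeholders, not values from the dataset.

```python
import sqlite3

TABLE = "airplane_crashes_and_fatalities_since_1908"

conn = sqlite3.connect(":memory:")
conn.execute(
    f"""
    CREATE TABLE {TABLE} (
      "date" TIMESTAMP, "time" VARCHAR, "location" VARCHAR, "operator" VARCHAR,
      "flight" VARCHAR, "route" VARCHAR, "type" VARCHAR, "registration" VARCHAR,
      "cn_in" VARCHAR, "aboard" DOUBLE, "fatalities" DOUBLE, "ground" DOUBLE,
      "summary" VARCHAR
    )
    """
)

# Hypothetical placeholder rows for illustration only.
conn.executemany(
    f'INSERT INTO {TABLE} ("operator", "type", "fatalities") VALUES (?, ?, ?)',
    [
        ("Operator A", "Type X", 5.0),
        ("Operator A", "Type Y", 3.0),
        ("Operator B", "Type X", None),
    ],
)


def top_by(column, limit=1):
    """Rank values of `column` by crash count, with total fatalities."""
    if column not in {"operator", "type"}:
        raise ValueError("column must be 'operator' or 'type'")
    query = (
        f'SELECT "{column}", COUNT(*) AS crashes, '
        f'SUM(COALESCE("fatalities", 0)) AS deaths '
        f"FROM {TABLE} "
        f'WHERE "{column}" IS NOT NULL '
        f'GROUP BY "{column}" '
        f'ORDER BY crashes DESC, "{column}" '
        f"LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


print(top_by("operator"))
print(top_by("type", limit=2))
```

The `COALESCE` call matters because the numeric columns are nullable: without it, a group whose fatality values are all missing would sum to NULL rather than 0.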
