NJ Transit + Amtrak (NEC) Rail Performance
Granular performance data from 150k+ NJ Transit and Amtrak train trips
@kaggle.pranavbadami_nj_transit_amtrak_nec_performance
Granular performance data from 150k+ NJ Transit and Amtrak train trips
@kaggle.pranavbadami_nj_transit_amtrak_nec_performance
NJ Transit is the second largest commuter rail network in the United States by ridership; it spans New Jersey and connects the state to New York City. On the Northeast Corridor, the busiest passenger rail line in the United States, Amtrak also operates passenger rail service; together, NJ Transit and Amtrak operate nearly 750 trains across the NJ Transit rail network.
Despite serving over 300,000 riders on the average weekday, no granular, trip-level performance data is publicly available for the NJ Transit rail network or Amtrak. This datasets aims to publicly provide such data.
This dataset contains monthly CSVs covering the performance of nearly every train trip on the NJ Transit rail network.
As of May 19, 2019:
Since February of 2018, I have been running a scraper that gathers stop-level, minute resolution data for NJ Transit and Amtrak train trips operating on the NJ Transit rail network. This scraper gathers data every minute from the NJ Transit DepartureVision Real Time Train Status service. The raw, timestamped train status pages are stored in a data lake and then parsed into tabular form; the parser is implemented as a state machine.
For more details on these processes and ancillary meta data (such as schedules and station locations) from the NJ Transit Developer Portal, check out the project GitHub repo.
Lots of interesting, high-impact projects could be driven by this data:
For some more inspiration, you can check out Medium articles written by Michael Zhang and me with this data:
A special thanks to Michael Zhang for his valuable work on using and preparing this data, as well as general support throughout the project.
CREATE TABLE invalid_trains (
"date" VARCHAR,
"train_id" VARCHAR,
"reason" VARCHAR
);CREATE TABLE invalid_trains_05_01_19_05_18_20 (
"date" VARCHAR,
"train_id" VARCHAR,
"reason" VARCHAR
);CREATE TABLE n_2018_03 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" BIGINT,
"to" VARCHAR,
"to_id" BIGINT,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_04 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" BIGINT,
"to" VARCHAR,
"to_id" BIGINT,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_05 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_06 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_07 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_08 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_09 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_10 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_11 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2018_12 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" BIGINT,
"to" VARCHAR,
"to_id" BIGINT,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_01 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" BIGINT,
"to" VARCHAR,
"to_id" BIGINT,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_02 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_03 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_04 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" BIGINT,
"to" VARCHAR,
"to_id" BIGINT,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_05 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_06 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_07 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_08 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_09 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_10 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_11 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2019_12 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);CREATE TABLE n_2020_01 (
"date" TIMESTAMP,
"train_id" VARCHAR,
"stop_sequence" DOUBLE,
"from" VARCHAR,
"from_id" DOUBLE,
"to" VARCHAR,
"to_id" DOUBLE,
"scheduled_time" TIMESTAMP,
"actual_time" TIMESTAMP,
"delay_minutes" DOUBLE,
"status" VARCHAR,
"line" VARCHAR,
"type" VARCHAR
);Anyone who has the link will be able to view this.